
Basic Medical Data Exploration and Visualization: Heart Diseases

Source

In this lecture we’re going to learn how to use matplotlib and seaborn by working through an example. As always, the source author’s link is listed for reference. This page will evolve over time.

Dataset

The dataset we’ll use here is the Heart Disease Data Set, which loads below as 302 patient records. The full database has 75 attributes per patient, but this example uses only the 14 listed below.

The columns used include:

  1. age: age in years
  2. sex: sex
    • 1 = male
    • 0 = female
  3. cp: chest pain type
    • Value 1: typical angina
    • Value 2: atypical angina
    • Value 3: non-anginal pain
    • Value 4: asymptomatic
  4. trestbps: resting blood pressure (in mm Hg on admission to the hospital)
  5. chol: serum cholesterol in mg/dl
  6. fbs: fasting blood sugar > 120 mg/dl
    • 1 = true
    • 0 = false
  7. restecg: resting electrocardiographic results
    • Value 0: normal
    • Value 1: having ST-T wave abnormality (T wave inversions and/or ST elevation or depression of > 0.05 mV)
    • Value 2: showing probable or definite left ventricular hypertrophy by Estes’ criteria
  8. thalach: maximum heart rate achieved
  9. exang: exercise induced angina
    • 1 = yes
    • 0 = no
  10. oldpeak: ST depression induced by exercise relative to rest
  11. slope: the slope of the peak exercise ST segment
    • Value 1: upsloping
    • Value 2: flat
    • Value 3: downsloping
  12. ca: number of major vessels (0-3) colored by fluoroscopy
  13. thal:
    • 3 = normal
    • 6 = fixed defect
    • 7 = reversible defect
  14. num: diagnosis of heart disease (angiographic disease status)
    • Value 0: < 50% diameter narrowing
    • Value 1: > 50% diameter narrowing
columns = ["age", 
           "sex", 
           "cp", 
           "trestbps",
           "chol", 
           "fbs", 
           "restecg",
           "thalach",
           "exang", 
           "oldpeak",
           "slope", 
           "ca", 
           "thal", 
           "num"]
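
The codebook above can also be kept as lookup tables so that plots and tables show labels instead of numeric codes. The following is a minimal sketch on a made-up three-row frame; the dictionary names `sex_labels` and `cp_labels` and the frame `toy` are illustrative assumptions, not part of the original example.

```python
import pandas as pd

# Lookup tables copied from the codebook above (names are hypothetical).
sex_labels = {1: "male", 0: "female"}
cp_labels = {
    1: "typical angina",
    2: "atypical angina",
    3: "non-anginal pain",
    4: "asymptomatic",
}

# Made-up rows; the data file stores the codes as floats, so cast to int first.
toy = pd.DataFrame({"sex": [1.0, 0.0, 1.0], "cp": [4.0, 2.0, 3.0]})

decoded = toy.assign(
    sex=toy["sex"].astype(int).map(sex_labels),
    cp=toy["cp"].astype(int).map(cp_labels),
)
print(decoded)
```
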
# disable warnings for lecture
import warnings
warnings.filterwarnings('ignore')

Overview of the Data Set, Cleaning, and Viewing

import pandas as pd

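# import the data and see the basic description
# note: the file has no header row, so with the default settings read_csv
# treats the first record as the header; 302 of the file's 303 rows load as data
df = pd.read_csv("https://archive.ics.uci.edu/ml/machine-learning-databases/heart-disease/processed.cleveland.data")
df.columns = columns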
print("---- Describe ----")
print(df.describe())
---- Describe ----
              age         sex          cp    trestbps        chol         fbs  \
count  302.000000  302.000000  302.000000  302.000000  302.000000  302.000000   
mean    54.410596    0.678808    3.165563  131.645695  246.738411    0.145695   
std      9.040163    0.467709    0.953612   17.612202   51.856829    0.353386   
min     29.000000    0.000000    1.000000   94.000000  126.000000    0.000000   
25%     48.000000    0.000000    3.000000  120.000000  211.000000    0.000000   
50%     55.500000    1.000000    3.000000  130.000000  241.500000    0.000000   
75%     61.000000    1.000000    4.000000  140.000000  275.000000    0.000000   
max     77.000000    1.000000    4.000000  200.000000  564.000000    1.000000   

          restecg     thalach       exang     oldpeak       slope         num  
count  302.000000  302.000000  302.000000  302.000000  302.000000  302.000000  
mean     0.986755  149.605960    0.327815    1.035430    1.596026    0.940397  
std      0.994916   22.912959    0.470196    1.160723    0.611939    1.229384  
min      0.000000   71.000000    0.000000    0.000000    1.000000    0.000000  
25%      0.000000  133.250000    0.000000    0.000000    1.000000    0.000000  
50%      0.500000  153.000000    0.000000    0.800000    2.000000    0.000000  
75%      2.000000  166.000000    1.000000    1.600000    2.000000    2.000000  
max      2.000000  202.000000    1.000000    6.200000    3.000000    4.000000  
print('---- Info -----')
print(df.info())
---- Info -----
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 302 entries, 0 to 301
Data columns (total 14 columns):
 #   Column    Non-Null Count  Dtype  
---  ------    --------------  -----  
 0   age       302 non-null    float64
 1   sex       302 non-null    float64
 2   cp        302 non-null    float64
 3   trestbps  302 non-null    float64
 4   chol      302 non-null    float64
 5   fbs       302 non-null    float64
 6   restecg   302 non-null    float64
 7   thalach   302 non-null    float64
 8   exang     302 non-null    float64
 9   oldpeak   302 non-null    float64
 10  slope     302 non-null    float64
 11  ca        302 non-null    object 
 12  thal      302 non-null    object 
 13  num       302 non-null    int64  
dtypes: float64(11), int64(1), object(2)
memory usage: 33.2+ KB
None
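
The count of 302 entries reflects how the file was read: the data file appears to have no header row, so `read_csv` with its defaults uses the first record as column names. The sketch below reproduces the effect on a made-up three-row string (`raw` and the shortened `names` list are assumptions for illustration) and shows how `header=None` with `names=` keeps every row.

```python
import io
import pandas as pd

# Hypothetical three-row extract in the same header-less layout as the data file.
raw = "63.0,1.0,145.0\n67.0,1.0,160.0\n37.0,1.0,130.0\n"
names = ["age", "sex", "trestbps"]

# Default behaviour: the first data row is consumed as the header.
inferred = pd.read_csv(io.StringIO(raw))
inferred.columns = names

# header=None keeps every row; names= supplies the column labels.
explicit = pd.read_csv(io.StringIO(raw), header=None, names=names)

print(len(inferred), len(explicit))
```

Switching the lecture code to `header=None` would change the row count, so any later step that hard-codes 302 would need to be adjusted as well.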

We notice above that the ca and thal columns have dtype object, which means they contain non-numeric values that we’ll want to convert. Let’s take a look at the data.

df['thal'].unique()
array(['3.0', '7.0', '6.0', '?'], dtype=object)
df['ca'].unique()
array(['3.0', '2.0', '0.0', '1.0', '?'], dtype=object)

From the codebook above we see these are coded values stored as strings, with '?' marking a missing entry, so we can convert them to numbers.

# Replace every value greater than 0 with 1 to mark the presence of heart disease
df.loc[df['num'] > 0 , 'num'] = 1
# Convert the string codes to numbers; '?' becomes NaN and is then filled with 0
df.ca = pd.to_numeric(df.ca, errors='coerce').fillna(0)
df.thal = pd.to_numeric(df.thal, errors='coerce').fillna(0)
df['thal'].unique()
array([3., 7., 6., 0.])
df['ca'].unique()
array([3., 2., 0., 1.])
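
To see what `errors='coerce'` does in isolation, here is a small sketch on a made-up Series that mimics the raw ca strings (the variable names are illustrative assumptions). It counts the missing entries before they are filled, which is a useful number to record before overwriting them.

```python
import pandas as pd

# Toy column mimicking the raw 'ca' strings, including the '?' placeholder.
ca_raw = pd.Series(["3.0", "0.0", "?", "1.0"])

# Anything that cannot be parsed as a number becomes NaN.
ca_num = pd.to_numeric(ca_raw, errors="coerce")
n_missing = int(ca_num.isna().sum())

# Same fill strategy as the lecture code.
ca_filled = ca_num.fillna(0)

print(n_missing)
print(ca_filled.tolist())
```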

Now we can confirm that the remapped columns have the float64 and int64 datatypes.

print('---- Dtype ----')
print(df.dtypes)
---- Dtype ----
age         float64
sex         float64
cp          float64
trestbps    float64
chol        float64
fbs         float64
restecg     float64
thalach     float64
exang       float64
oldpeak     float64
slope       float64
ca          float64
thal        float64
num           int64
dtype: object

Next we’ll want to check for null values.

print('---- Null Data ----')
# count how many null values exist
print(df.isnull().sum())
---- Null Data ----
age         0
sex         0
cp          0
trestbps    0
chol        0
fbs         0
restecg     0
thalach     0
exang       0
oldpeak     0
slope       0
ca          0
thal        0
num         0
dtype: int64
# quickly check to see if there are any null values
print(df.isnull().values.any())
False

After this simple clean-up (converting non-numeric values to NaN and replacing NaN with 0), we can say our data is reasonably clean.
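
One caveat with filling by 0: the codebook lists only 3, 6 and 7 for thal, so 0 becomes an extra category, and for ca the value 0 is already a valid count, so filled rows can no longer be told apart from real zeros. As a hedged alternative (not what the original example does), the sketch below drops the missing rows or fills them with the most frequent valid code, using a made-up Series.

```python
import pandas as pd

# Made-up 'thal' strings with one missing entry.
thal = pd.to_numeric(pd.Series(["3.0", "7.0", "?", "3.0", "6.0"]), errors="coerce")

# Option 1: drop the rows with a missing code.
dropped = thal.dropna()

# Option 2: fill with the most frequent valid code (the mode ignores NaN).
most_common = thal.mode().iloc[0]
filled = thal.fillna(most_common)

print(len(dropped), most_common, filled.tolist())
```

Which option is appropriate depends on how many rows are affected and on what the downstream analysis assumes.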

First / Last 10 Rows

# print the first 10 and last 10
print('------ First 10 -------')
df.head(10)
------ First 10 -------
age sex cp trestbps chol fbs restecg thalach exang oldpeak slope ca thal num
0 67.0 1.0 4.0 160.0 286.0 0.0 2.0 108.0 1.0 1.5 2.0 3.0 3.0 1
1 67.0 1.0 4.0 120.0 229.0 0.0 2.0 129.0 1.0 2.6 2.0 2.0 7.0 1
2 37.0 1.0 3.0 130.0 250.0 0.0 0.0 187.0 0.0 3.5 3.0 0.0 3.0 0
3 41.0 0.0 2.0 130.0 204.0 0.0 2.0 172.0 0.0 1.4 1.0 0.0 3.0 0
4 56.0 1.0 2.0 120.0 236.0 0.0 0.0 178.0 0.0 0.8 1.0 0.0 3.0 0
5 62.0 0.0 4.0 140.0 268.0 0.0 2.0 160.0 0.0 3.6 3.0 2.0 3.0 1
6 57.0 0.0 4.0 120.0 354.0 0.0 0.0 163.0 1.0 0.6 1.0 0.0 3.0 0
7 63.0 1.0 4.0 130.0 254.0 0.0 2.0 147.0 0.0 1.4 2.0 1.0 7.0 1
8 53.0 1.0 4.0 140.0 203.0 1.0 2.0 155.0 1.0 3.1 3.0 0.0 7.0 1
9 57.0 1.0 4.0 140.0 192.0 0.0 0.0 148.0 0.0 0.4 2.0 0.0 6.0 0
#  Last 10 
print('------ Last 10 -------')
df.tail(10)
------ Last 10 -------
age sex cp trestbps chol fbs restecg thalach exang oldpeak slope ca thal num
292 63.0 1.0 4.0 140.0 187.0 0.0 2.0 144.0 1.0 4.0 1.0 2.0 7.0 1
293 63.0 0.0 4.0 124.0 197.0 0.0 0.0 136.0 1.0 0.0 2.0 0.0 3.0 1
294 41.0 1.0 2.0 120.0 157.0 0.0 0.0 182.0 0.0 0.0 1.0 0.0 3.0 0
295 59.0 1.0 4.0 164.0 176.0 1.0 2.0 90.0 0.0 1.0 2.0 2.0 6.0 1
296 57.0 0.0 4.0 140.0 241.0 0.0 0.0 123.0 1.0 0.2 2.0 0.0 7.0 1
297 45.0 1.0 1.0 110.0 264.0 0.0 0.0 132.0 0.0 1.2 2.0 0.0 7.0 1
298 68.0 1.0 4.0 144.0 193.0 1.0 0.0 141.0 0.0 3.4 2.0 2.0 7.0 1
299 57.0 1.0 4.0 130.0 131.0 0.0 0.0 115.0 1.0 1.2 2.0 1.0 7.0 1
300 57.0 0.0 2.0 130.0 236.0 0.0 2.0 174.0 0.0 0.0 2.0 1.0 3.0 1
301 38.0 1.0 3.0 138.0 175.0 0.0 0.0 173.0 0.0 0.0 1.0 0.0 3.0 0

Plotting Histograms

After reviewing the data in tabular form we want to visualize all of the data across the variables. We can do this easily with a histogram.

# import matplotlib
import matplotlib.pyplot as plt
%matplotlib inline
# using pandas to generate the plots
df.hist()

# using matplotlib to render (or show) the plot
plt.show()


# plot a histogram of every column on one large figure;
# passing figsize lets pandas build the grid of subplots itself
df.hist(figsize=(18, 18), bins=30)
plt.show()


With a simple histogram of each column, we can easily observe the distribution of the different attributes. Notice how easy it is to see which attributes are categorical (a few isolated bars) and which are continuous.
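
The same categorical-versus-continuous distinction can be checked numerically by counting distinct values per column. This is a minimal sketch on a made-up frame; the threshold `max_levels` is an arbitrary assumption chosen for illustration.

```python
import pandas as pd

# Made-up rows with one continuous and two coded columns.
toy = pd.DataFrame({
    "age": [67.0, 37.0, 41.0, 56.0, 62.0, 57.0],
    "fbs": [0.0, 0.0, 1.0, 0.0, 0.0, 1.0],
    "cp": [4.0, 3.0, 2.0, 2.0, 4.0, 4.0],
})

max_levels = 4  # assumed cut-off: columns with this many distinct values or fewer

# Count distinct values per column and keep the columns below the cut-off.
n_unique = toy.nunique()
categorical_like = n_unique[n_unique <= max_levels].index.tolist()

print(n_unique.to_dict())
print(categorical_like)
```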

We can look more closely at the distributions of age and fbs (fasting blood sugar). The age distribution closely resembles a Gaussian distribution, while fbs is a categorical value.

# import seaborn
import seaborn as sns

# a closer look at age
plt.figure(figsize=(8, 8))
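# histplot with kde=True replaces distplot, which is deprecated in recent seaborn
sns.histplot(df.age, kde=True, stat='density')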
plt.show()
plt.close('all')


# a closer look at fbs
plt.figure(figsize=(8, 8))
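# histplot with kde=True replaces distplot, which is deprecated in recent seaborn
sns.histplot(df.fbs, kde=True, stat='density')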
plt.show()


Variance-Covariance Matrix

We can calculate the variance-covariance matrix in a number of ways. First we’ll use NumPy and then we’ll use the built-in DataFrame function. Once calculated, we can observe that most attributes do not have a strong covariance relationship.

import numpy as np
from numpy import dot

# calculate the variance-covariance matrix
sample = df.values
# center each column by subtracting its mean
sample = sample - sample.mean(axis=0)
# divide by n - 1 to get the sample covariance, matching df.cov()
covv = dot(sample.T,sample)/(len(sample)-1)
plt.figure(figsize=(8,8))
sns.heatmap(covv)
plt.show()


# compare with built in 
plt.figure(figsize=(8,8))
sns.heatmap(df.cov())
plt.show()
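
A quick way to confirm that a manual covariance calculation agrees with the library one is to compare the two on random data. The sketch below assumes nothing about the heart disease data; `X` is a made-up 50 x 3 matrix.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))  # made-up data: 50 rows, 3 columns

# Manual sample covariance: center each column, then divide by n - 1.
Xc = X - X.mean(axis=0)
cov_manual = Xc.T @ Xc / (len(X) - 1)

# Library version; rowvar=False treats columns as variables.
cov_numpy = np.cov(X, rowvar=False)

print(np.allclose(cov_manual, cov_numpy))
```

The centering step divides the column sums by n (the mean), while the final step divides by n - 1; mixing the two up shifts every entry slightly.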


Correlation matrix

Similarly, the first image is created by manual NumPy calculation and the second using the built-in method. We can observe that some attributes are in fact strongly correlated with one another (especially heart disease and thal).

# calculate correlation matrix
sample = df.values
n = len(sample)
# centering matrix: subtracts the column mean from every row
centering_mat = np.eye(n) - np.ones((n, n))/n
# scale each column by its (population) standard deviation
std_matrix = np.diag(np.std(sample,0))
temp = dot(centering_mat,dot(sample, np.linalg.inv(std_matrix)  ))
temp = dot(temp.T,temp)/n

# plot
plt.figure(figsize=(13, 13))
sns.heatmap(np.around(temp,2),annot=True,fmt=".2f",cmap="Blues",annot_kws={"size": 15})
plt.show()


# correlation matrix using the built-in method
sns.set(font_scale=2)
plt.figure(figsize=(13,13))

sns.heatmap(df.corr().round(2),annot=True,fmt=".2f",cmap="Blues",annot_kws={"size": 15})
plt.show()
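
The correlation matrix is the covariance matrix of the standardized columns, which can be verified on random data. This is a minimal sketch with a made-up matrix `X`; it uses the population standard deviation (NumPy's default, `ddof=0`) and divides by n, mirroring the manual calculation above.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(50, 3))  # made-up data: 50 rows, 3 columns

# Standardize: subtract the column mean, divide by the population std.
Z = (X - X.mean(axis=0)) / X.std(axis=0)

# Average outer product of the standardized rows.
corr_manual = Z.T @ Z / len(X)

# Library version; rowvar=False treats columns as variables.
corr_numpy = np.corrcoef(X, rowvar=False)

print(np.allclose(corr_manual, corr_numpy))
```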


Interactive Histogram

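# plot age and resting blood pressure for people with vs. without heart disease
# (histplot with kde=True replaces distplot, which is deprecated in recent seaborn)
plt.figure(figsize=(13, 13))
sns.histplot(df.age[df.num==0], kde=True, stat='density', label='Age, no disease', color='blue')
sns.histplot(df.age[df.num==1], kde=True, stat='density', label='Age, disease', color='red')
sns.histplot(df.trestbps[df.num==0], kde=True, stat='density', label='trestbps, no disease', color='green')
sns.histplot(df.trestbps[df.num==1], kde=True, stat='density', label='trestbps, disease', color='violet')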
plt.legend()
plt.show()
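
Overlaid histograms show the shapes of the groups; a grouped summary gives the numbers behind them. The sketch below uses six rows copied from the first-10 table above (columns age, trestbps and num only), so the values are illustrative and not a summary of the full data set.

```python
import pandas as pd

# Six rows taken from the head(10) output above.
toy = pd.DataFrame({
    "age": [67.0, 37.0, 41.0, 62.0, 63.0, 53.0],
    "trestbps": [160.0, 130.0, 130.0, 140.0, 130.0, 140.0],
    "num": [1, 0, 0, 1, 1, 1],
})

# Mean and median of each measurement, split by diagnosis.
summary = toy.groupby("num")[["age", "trestbps"]].agg(["mean", "median"])
print(summary)
```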


%matplotlib inline
import pygal
from IPython.display import SVG, HTML
html_pygal = """
<!DOCTYPE html>
<html>
  <head>
  <script type="text/javascript" src="http://kozea.github.com/pygal.js/javascripts/svg.jquery.js"></script>
  <script type="text/javascript" src="http://kozea.github.com/pygal.js/javascripts/pygal-tooltips.js"></script>
    <!-- ... -->
  </head>
  <body>
    <figure>
      {pygal_render}
    </figure>
  </body>
</html>
"""

def to_bars(values, bins=100):
    # np.histogram returns len(counts) + 1 bin edges, so pair each count
    # with its own (start, end) edges instead of assuming a width of 1
    counts, edges = np.histogram(values, bins=bins)
    return [(int(c), float(lo), float(hi))
            for c, lo, hi in zip(counts, edges[:-1], edges[1:])]

hist = pygal.Histogram()
hist.add('No Disease age', to_bars(df.age[df.num==0].values))
hist.add('Disease age', to_bars(df.age[df.num==1].values))
hist.add('No Disease trestbps', to_bars(df.trestbps[df.num==0].values))
hist.add('Disease trestbps', to_bars(df.trestbps[df.num==1].values))

# render to a str (not bytes) so it can be embedded in the HTML template
HTML(html_pygal.format(pygal_render=hist.render(is_unicode=True)))
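
When building (height, start, end) bars from `np.histogram`, it helps to remember that the function returns one more edge than it returns counts. The following sketch, on a made-up array of ages, pairs each count with its own left and right edge; the names `ages` and `bars` are illustrative assumptions.

```python
import numpy as np

ages = np.array([29.0, 35.0, 41.0, 41.0, 54.0, 58.0, 63.0, 77.0])  # made-up values

counts, edges = np.histogram(ages, bins=4)

# One bar per bin: (height, left edge, right edge).
bars = [
    (int(c), float(lo), float(hi))
    for c, lo, hi in zip(counts, edges[:-1], edges[1:])
]

print(len(counts), len(edges))
for bar in bars:
    print(bar)
```

Using the real right edge keeps adjacent bars from overlapping when the bin width is smaller than 1, as it is with 100 bins over an age range.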

## Bar Plot / Box Plot / Pair Plot

Let’s first take a look at the average age of people who have heart disease versus those who do not. We can observe that, in this data set, the people with heart disease are slightly older on average.

```python
# average age of people with / without heart disease
# plt.figure(figsize=(8,8))
sns.barplot(x='num', y='age', data=df)
plt.show()
```

![png](09%20-%20Basic%20Medical%20Data%20Visualization_files/09%20-%20Basic%20Medical%20Data%20Visualization_41_0.png)

Again, when we create a box plot of age for people who have / do not have heart disease, we can observe that younger people are less likely to have heart disease.

```python
# box plot
# plt.figure(figsize=(8,8))
sns.boxplot(x="num", y='age', data=df)
plt.show()
```

![png](09%20-%20Basic%20Medical%20Data%20Visualization_files/09%20-%20Basic%20Medical%20Data%20Visualization_43_0.png)

And finally, I wanted to show the pair plot for a few of the attributes: age, thal, ca (number of major vessels), thalach (maximum heart rate achieved), and presence of heart disease. As seen in the correlation matrix, we can observe a strong negative correlation between age and thalach.

```python
# show pair plot (pairplot creates its own figure, so no plt.figure call is needed)
sns.pairplot(df[['age','thal','ca','thalach','num']],hue='num')
plt.show()
```

![png](09%20-%20Basic%20Medical%20Data%20Visualization_files/09%20-%20Basic%20Medical%20Data%20Visualization_45_1.png)

## Uniform Manifold Approximation and Projection embedding (UMAP) / t-distributed Stochastic Neighbor Embedding (t-SNE)

Run the following command from the terminal.

```bash
python Manifold_Approximation_and_Projection.py
```