Basic Medical Data Exploration Visualization Heart Diseases Source
In this lecture we'll learn how to use matplotlib and seaborn by working through the example below. As always, the source author's link is listed for reference. This page will evolve over time.
Dataset
The dataset we'll use here is the Heart Disease Data Set, which contains 302 patient records, each with 75 attributes. However, this example only uses 14 of them, which are listed below.
The columns used include:
age: age in years
sex: sex
cp: chest pain type
    Value 1: typical angina
    Value 2: atypical angina
    Value 3: non-anginal pain
    Value 4: asymptomatic
trestbps: resting blood pressure (in mm Hg on admission to the hospital)
chol: serum cholesterol in mg/dl
fbs: fasting blood sugar > 120 mg/dl
restecg: resting electrocardiographic results
    Value 0: normal
    Value 1: having ST-T wave abnormality (T wave inversions and/or ST elevation or depression of > 0.05 mV)
    Value 2: showing probable or definite left ventricular hypertrophy by Estes' criteria
thalach: maximum heart rate achieved
exang: exercise induced angina
oldpeak: ST depression induced by exercise relative to rest
slope: the slope of the peak exercise ST segment
    Value 1: upsloping
    Value 2: flat
    Value 3: downsloping
ca: number of major vessels (0-3) colored by fluoroscopy
thal: 3 = normal; 6 = fixed defect; 7 = reversible defect
num: diagnosis of heart disease (angiographic disease status)
    Value 0: < 50% diameter narrowing
    Value 1: > 50% diameter narrowing
columns = ["age",
           "sex",
           "cp",
           "trestbps",
           "chol",
           "fbs",
           "restecg",
           "thalach",
           "exang",
           "oldpeak",
           "slope",
           "ca",
           "thal",
           "num"]
# disable warnings for lecture
import warnings
warnings.filterwarnings('ignore')
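Overview of the Data Set, Cleaning, and Viewing
import pandas as pd
# import the data and see the basic description
# note: this file has no header row, so read_csv uses the first record as the
# header and that record is lost once the names are overwritten below;
# header=None with names=columns would keep it
df = pd.read_csv("https://archive.ics.uci.edu/ml/machine-learning-databases/heart-disease/processed.cleveland.data")
df.columns = columns
print("---- Describe ----")
print(df.describe())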
---- Describe ----
age sex cp trestbps chol fbs \
count 302.000000 302.000000 302.000000 302.000000 302.000000 302.000000
mean 54.410596 0.678808 3.165563 131.645695 246.738411 0.145695
std 9.040163 0.467709 0.953612 17.612202 51.856829 0.353386
min 29.000000 0.000000 1.000000 94.000000 126.000000 0.000000
25% 48.000000 0.000000 3.000000 120.000000 211.000000 0.000000
50% 55.500000 1.000000 3.000000 130.000000 241.500000 0.000000
75% 61.000000 1.000000 4.000000 140.000000 275.000000 0.000000
max 77.000000 1.000000 4.000000 200.000000 564.000000 1.000000
restecg thalach exang oldpeak slope num
count 302.000000 302.000000 302.000000 302.000000 302.000000 302.000000
mean 0.986755 149.605960 0.327815 1.035430 1.596026 0.940397
std 0.994916 22.912959 0.470196 1.160723 0.611939 1.229384
min 0.000000 71.000000 0.000000 0.000000 1.000000 0.000000
25% 0.000000 133.250000 0.000000 0.000000 1.000000 0.000000
50% 0.500000 153.000000 0.000000 0.800000 2.000000 0.000000
75% 2.000000 166.000000 1.000000 1.600000 2.000000 2.000000
max 2.000000 202.000000 1.000000 6.200000 3.000000 4.000000
print('---- Info -----')
print(df.info())
---- Info -----
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 302 entries, 0 to 301
Data columns (total 14 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 age 302 non-null float64
1 sex 302 non-null float64
2 cp 302 non-null float64
3 trestbps 302 non-null float64
4 chol 302 non-null float64
5 fbs 302 non-null float64
6 restecg 302 non-null float64
7 thalach 302 non-null float64
8 exang 302 non-null float64
9 oldpeak 302 non-null float64
10 slope 302 non-null float64
11 ca 302 non-null object
12 thal 302 non-null object
13 num 302 non-null int64
dtypes: float64(11), int64(1), object(2)
memory usage: 33.2+ KB
None
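The 302 rows reported above appear to follow from the file having no header row: a plain `pd.read_csv` call takes the first record as column names. Below is a minimal sketch of a loader that keeps every record and marks `?` as missing while reading. It runs on a few inline rows rather than the real URL; the inline rows and the names `raw` and `demo` are illustrative assumptions, not part of the original lecture.

```python
import io
import pandas as pd

columns = ["age", "sex", "cp", "trestbps", "chol", "fbs", "restecg",
           "thalach", "exang", "oldpeak", "slope", "ca", "thal", "num"]

# three illustrative rows in the same headerless, comma-separated layout
raw = io.StringIO(
    "63.0,1.0,1.0,145.0,233.0,1.0,2.0,150.0,0.0,2.3,3.0,0.0,6.0,0\n"
    "67.0,1.0,4.0,160.0,286.0,0.0,2.0,108.0,1.0,1.5,2.0,3.0,3.0,2\n"
    "52.0,1.0,3.0,172.0,199.0,1.0,0.0,162.0,0.0,0.5,1.0,?,7.0,0\n"
)

# header=None keeps the first record as data; names= supplies the column labels;
# na_values="?" turns the placeholder into NaN so the column is parsed as float
demo = pd.read_csv(raw, header=None, names=columns, na_values="?")

print(len(demo))                 # number of records kept
print(demo["ca"].isna().sum())   # how many '?' entries were found in ca
print(demo["ca"].dtype)
```

With this approach the `ca` and `thal` columns arrive as floats, so the later `pd.to_numeric` step would only be needed if the file were read without `na_values`.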
We notice above that the ca and thal columns have dtype object, which we'll likely want to remap to numbers. Let's take a look at their unique values (thal first, then ca).
array(['3.0', '7.0', '6.0', '?'], dtype=object)
array(['3.0', '2.0', '0.0', '1.0', '?'], dtype=object)
From the codebook above we see these are coded values that we can remap to numbers. The '?' entries are not in the codebook; they mark missing values.
# replace every value greater than 0 with 1 to mark the presence of heart disease
df.loc[df['num'] > 0, 'num'] = 1
# convert ca and thal to numbers; '?' becomes NaN and is then filled with 0
df['ca'] = pd.to_numeric(df['ca'], errors='coerce').fillna(0)
df['thal'] = pd.to_numeric(df['thal'], errors='coerce').fillna(0)
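To make the coercion step concrete, here is a small sketch on a toy column that mimics the string-coded `ca` values. The series contents are made up for illustration; only `pd.to_numeric`, `isna` and `fillna` are used.

```python
import pandas as pd

# toy column mimicking the string-coded values, including a '?' placeholder
ca = pd.Series(["3.0", "2.0", "0.0", "?", "1.0"])

numeric = pd.to_numeric(ca, errors="coerce")  # '?' cannot be parsed, so it becomes NaN
n_missing = int(numeric.isna().sum())         # count the missing entries before filling
filled = numeric.fillna(0)                    # same fill value as in the lecture code

print(n_missing)
print(filled.tolist())
```

Counting the NaN values before filling shows how many records are affected. Filling with 0 is only one option; dropping those rows or filling with a typical value are alternatives worth considering, since a filled 0 cannot be told apart from a genuine 0.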
Now we can confirm that the remapped columns have the datatypes float64 and int64.
print('---- Dtype ----')
print(df.dtypes)
---- Dtype ----
age float64
sex float64
cp float64
trestbps float64
chol float64
fbs float64
restecg float64
thalach float64
exang float64
oldpeak float64
slope float64
ca float64
thal float64
num int64
dtype: object
Next we'll want to check for null values.
print('---- Null Data ----')
# count how many null values exist
print(df.isnull().sum())
---- Null Data ----
age 0
sex 0
cp 0
trestbps 0
chol 0
fbs 0
restecg 0
thalach 0
exang 0
oldpeak 0
slope 0
ca 0
thal 0
num 0
dtype: int64
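# quickly check to see if there are any null values
print(df.isnull().values.any())
After this simple clean-up (coercing non-numeric values to NaN and replacing NaN with 0), the data contains no nulls. Note that the 0 fill is only a placeholder: for ca it is indistinguishable from a true count of 0, and for thal it is not one of the codebook values (3, 6, 7).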
First / Last 10 Rows
# print the first 10 and last 10
print('------ First 10 -------')
df.head(10)
    age  sex   cp  trestbps   chol  fbs  restecg  thalach  exang  oldpeak  slope   ca  thal  num
0  67.0  1.0  4.0     160.0  286.0  0.0      2.0    108.0    1.0      1.5    2.0  3.0   3.0    1
1  67.0  1.0  4.0     120.0  229.0  0.0      2.0    129.0    1.0      2.6    2.0  2.0   7.0    1
2  37.0  1.0  3.0     130.0  250.0  0.0      0.0    187.0    0.0      3.5    3.0  0.0   3.0    0
3  41.0  0.0  2.0     130.0  204.0  0.0      2.0    172.0    0.0      1.4    1.0  0.0   3.0    0
4  56.0  1.0  2.0     120.0  236.0  0.0      0.0    178.0    0.0      0.8    1.0  0.0   3.0    0
5  62.0  0.0  4.0     140.0  268.0  0.0      2.0    160.0    0.0      3.6    3.0  2.0   3.0    1
6  57.0  0.0  4.0     120.0  354.0  0.0      0.0    163.0    1.0      0.6    1.0  0.0   3.0    0
7  63.0  1.0  4.0     130.0  254.0  0.0      2.0    147.0    0.0      1.4    2.0  1.0   7.0    1
8  53.0  1.0  4.0     140.0  203.0  1.0      2.0    155.0    1.0      3.1    3.0  0.0   7.0    1
9  57.0  1.0  4.0     140.0  192.0  0.0      0.0    148.0    0.0      0.4    2.0  0.0   6.0    0
# Last 10
print('------ Last 10 -------')
df.tail(10)
      age  sex   cp  trestbps   chol  fbs  restecg  thalach  exang  oldpeak  slope   ca  thal  num
292  63.0  1.0  4.0     140.0  187.0  0.0      2.0    144.0    1.0      4.0    1.0  2.0   7.0    1
293  63.0  0.0  4.0     124.0  197.0  0.0      0.0    136.0    1.0      0.0    2.0  0.0   3.0    1
294  41.0  1.0  2.0     120.0  157.0  0.0      0.0    182.0    0.0      0.0    1.0  0.0   3.0    0
295  59.0  1.0  4.0     164.0  176.0  1.0      2.0     90.0    0.0      1.0    2.0  2.0   6.0    1
296  57.0  0.0  4.0     140.0  241.0  0.0      0.0    123.0    1.0      0.2    2.0  0.0   7.0    1
297  45.0  1.0  1.0     110.0  264.0  0.0      0.0    132.0    0.0      1.2    2.0  0.0   7.0    1
298  68.0  1.0  4.0     144.0  193.0  1.0      0.0    141.0    0.0      3.4    2.0  2.0   7.0    1
299  57.0  1.0  4.0     130.0  131.0  0.0      0.0    115.0    1.0      1.2    2.0  1.0   7.0    1
300  57.0  0.0  2.0     130.0  236.0  0.0      2.0    174.0    0.0      0.0    2.0  1.0   3.0    1
301  38.0  1.0  3.0     138.0  175.0  0.0      0.0    173.0    0.0      0.0    1.0  0.0   3.0    0
Plotting Histograms
After reviewing the data in tabular form, we want to visualize the distribution of every variable. We can do this easily with histograms.
# import matplotlib
import matplotlib.pyplot as plt
%matplotlib inline
# using pandas to generate the plots
df.hist()
# using matplotlib to render (or show) the plot
plt.show()
# draw a larger histogram grid with 30 bins for every column
fig = plt.figure(figsize=(18, 18))
ax = fig.gca()
df.hist(ax=ax, bins=30)
plt.show()
With a simple histogram of our data, we can easily observe the distribution of the different attributes. One thing to note is how easy it is to see which attributes are categorical (a few isolated bars) and which are continuous.
We can look a little more closely at the distributions of age and fbs (fasting blood sugar). The age distribution roughly resembles a Gaussian, while fbs is a binary categorical value.
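The visual impression of "categorical versus continuous" can also be checked numerically by counting distinct values per column. The sketch below uses a small made-up frame, and the cutoff `max_levels` is an arbitrary assumption chosen for illustration, not a rule from the lecture.

```python
import pandas as pd

# small made-up frame with one continuous and two coded columns
toy = pd.DataFrame({
    "age": [63, 67, 37, 41, 56, 62, 57, 53],
    "fbs": [1, 0, 0, 0, 0, 0, 0, 1],
    "cp":  [1, 4, 3, 2, 2, 4, 4, 4],
})

max_levels = 5  # assumed cutoff: at most this many distinct values counts as categorical

n_unique = toy.nunique()  # distinct values per column
categorical = n_unique[n_unique <= max_levels].index.tolist()
continuous = n_unique[n_unique > max_levels].index.tolist()

print(categorical)
print(continuous)
```

On the real frame the same idea would be `df.nunique()`; a suitable cutoff depends on the data, so it is worth looking at the counts before fixing one.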
# import seaborn
import seaborn as sns
# a closer look at age
# (distplot is deprecated since seaborn 0.11; histplot with kde=True is the closest replacement)
plt.figure(figsize=(8, 8))
sns.histplot(df.age, kde=True, stat='density')
plt.show()
plt.close('all')
# a closer look at fbs
plt.figure(figsize=(8, 8))
sns.histplot(df.fbs, kde=True, stat='density')
plt.show()
Variance-Covariance Matrix
We can calculate the variance-covariance matrix in a number of ways. First we'll use NumPy and then the built-in DataFrame method. Once calculated, we can observe that most pairs of attributes do not have a strong covariance relationship. Keep in mind that covariance depends on the units of each column, so large-valued columns such as chol dominate the heatmap.
import numpy as np
from numpy import dot
# calculate the Variance-Covariance Matrix
sample = df.values
n = len(sample)
# center each column: ones(n, n) @ sample / n repeats the column means on every row
sample = sample - dot(np.ones((n, n)), sample) / n
# sample covariance (divide by n - 1, matching df.cov())
covv = dot(sample.T, sample) / (n - 1)
plt.figure(figsize=(8, 8))
sns.heatmap(covv)
plt.show()
# compare with built in
plt.figure(figsize=(8, 8))
sns.heatmap(df.cov())
plt.show()
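One way to confirm that a manual covariance calculation agrees with a library routine is to compare the two numerically instead of by eye. The sketch below does this on random data (the array `X` is synthetic and stands in for `df.values`): it centers each column by its mean, divides the cross-product by n - 1, and compares against `np.cov`.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))  # synthetic stand-in for df.values: 50 rows, 4 columns

n = X.shape[0]
centered = X - X.mean(axis=0)                # subtract each column's mean
cov_manual = centered.T @ centered / (n - 1) # sample covariance

# np.cov treats rows as variables by default, so rowvar=False is needed here
cov_numpy = np.cov(X, rowvar=False)

print(np.allclose(cov_manual, cov_numpy))
```

Note that the centering step divides by n (it is a mean), while the covariance itself divides by n - 1; mixing the two up produces a matrix that no longer matches the built-in result.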
Correlation matrix
Similarly, the first image is created by a manual NumPy calculation and the second using the built-in method. We can observe that some attributes are noticeably correlated with one another (especially heart disease and thal).
# calculate the correlation matrix
sample = df.values
n = len(sample)
# centering matrix: I - (1/n) * ones(n, n)
centering_mat = np.eye(n) - np.ones((n, n)) / n
# diagonal matrix of the column standard deviations (population, ddof=0)
std_matrix = np.diag(np.std(sample, 0))
temp = dot(centering_mat, dot(sample, np.linalg.inv(std_matrix)))
temp = dot(temp.T, temp) / n
# plot
plt.figure(figsize=(13, 13))
sns.heatmap(np.around(temp, 2), annot=True, fmt=".2f", cmap="Blues", annot_kws={"size": 15})
plt.show()
# correlation matrix using the built-in method
sns.set(font_scale=2)
plt.figure(figsize=(13, 13))
sns.heatmap(df.corr().round(2), annot=True, fmt=".2f", cmap="Blues", annot_kws={"size": 15})
plt.show()
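The manual correlation calculation amounts to standardizing each column (subtract the mean, divide by the standard deviation) and averaging the cross-products. A short sketch on synthetic data, comparing that recipe with `np.corrcoef`; the array `X` and the induced relationship between its columns are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(60, 3))
X[:, 2] = 0.8 * X[:, 0] + 0.2 * X[:, 2]  # make column 2 depend on column 0

n = X.shape[0]
# standardize with the population standard deviation (ddof=0), then divide by n
Z = (X - X.mean(axis=0)) / X.std(axis=0)
corr_manual = Z.T @ Z / n

corr_numpy = np.corrcoef(X, rowvar=False)

print(np.allclose(corr_manual, corr_numpy))
```

The result does not depend on whether n or n - 1 is used, as long as the same choice is made for the standard deviation and for the final division.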
Interactive Histogram
# plot the people who have heart disease vs. those who do not
# (histplot with kde=True replaces the deprecated distplot)
plt.figure(figsize=(13, 13))
sns.histplot(df.age[df.num == 0], kde=True, stat='density', label='No Disease (age)', color='blue')
sns.histplot(df.age[df.num == 1], kde=True, stat='density', label='Disease (age)', color='red')
sns.histplot(df.trestbps[df.num == 0], kde=True, stat='density', label='No Disease (trestbps)', color='green')
sns.histplot(df.trestbps[df.num == 1], kde=True, stat='density', label='Disease (trestbps)', color='violet')
plt.legend()
plt.show()
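Overlaid histograms show the shape of each group, and a per-group summary puts numbers next to them. A minimal sketch with a made-up frame (the values are illustrative and not taken from the data set), using the same column names as the lecture:

```python
import pandas as pd

# made-up values; num is the 0/1 disease flag as in the lecture
toy = pd.DataFrame({
    "age":      [40, 50, 60, 55, 65, 70],
    "trestbps": [120, 130, 125, 140, 150, 145],
    "num":      [0, 0, 0, 1, 1, 1],
})

# mean of each measurement within the no-disease (0) and disease (1) groups
summary = toy.groupby("num")[["age", "trestbps"]].mean()
print(summary)
```

On the real frame the equivalent call would be `df.groupby("num")[["age", "trestbps"]].mean()`.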
%matplotlib inline
import pygal
from IPython.display import SVG, HTML
html_pygal = """
<!DOCTYPE html>
<html>
<head>
<script type="text/javascript" src="http://kozea.github.com/pygal.js/javascripts/svg.jquery.js"></script>
<script type="text/javascript" src="http://kozea.github.com/pygal.js/javascripts/pygal-tooltips.js"></script>
<!-- ... -->
</head>
<body>
<figure>
{pygal_render}
</figure>
</body>
</html>
"""
hist = pygal.Histogram()

def histogram_bars(values, bins=100):
    # pygal.Histogram expects one (height, start, end) tuple per bar;
    # start and end are consecutive bin edges returned by np.histogram
    count, division = np.histogram(values, bins=bins)
    return [(int(c), float(lo), float(hi))
            for c, lo, hi in zip(count, division[:-1], division[1:])]

hist.add('No Disease age', histogram_bars(df.age[df.num == 0].values))
hist.add('Disease age', histogram_bars(df.age[df.num == 1].values))
hist.add('No Disease trestbps', histogram_bars(df.trestbps[df.num == 0].values))
hist.add('Disease trestbps', histogram_bars(df.trestbps[df.num == 1].values))
# render to a unicode string; the default returns bytes, which would be embedded as b'...'
HTML(html_pygal.format(pygal_render=hist.render(is_unicode=True)))
[Figure: interactive pygal histogram of age and trestbps for patients with and without heart disease]

## Bar Plot / Box Plot / Pair Plot

Let's first take a look at the average age of people who have heart disease versus those who do not. We can observe that, in this data set, people with heart disease are slightly older on average.

```python
# average age of people with / without heart disease
# plt.figure(figsize=(8,8))
sns.barplot(x='num', y='age', data=df)
plt.show()
```

![png](09%20-%20Basic%20Medical%20Data%20Visualization_files/09%20-%20Basic%20Medical%20Data%20Visualization_41_0.png)

Again, when we create a box plot of age for people who do and do not have heart disease, we can observe that younger people are less likely to have heart disease.

```python
# box plot
# plt.figure(figsize=(8,8))
sns.boxplot(x="num", y='age', data=df)
plt.show()
```

![png](09%20-%20Basic%20Medical%20Data%20Visualization_files/09%20-%20Basic%20Medical%20Data%20Visualization_43_0.png)

And finally, I wanted to show the pair plot for a few of the attributes: age, thal, ca (number of major vessels colored by fluoroscopy), thalach (maximum heart rate achieved) and the presence of heart disease. As seen in the correlation matrix, we can observe a negative correlation between age and thalach.

```python
# show pair plot
# (pairplot creates its own figure, so no plt.figure call is needed)
sns.pairplot(df[['age','thal','ca','thalach','num']], hue='num')
plt.show()
```
![png](09%20-%20Basic%20Medical%20Data%20Visualization_files/09%20-%20Basic%20Medical%20Data%20Visualization_45_1.png)

## Uniform Manifold Approximation and Projection embedding (UMAP) / t-distributed Stochastic Neighbor Embedding (t-SNE)

Run the following command from the terminal.

```bash
python Manifold_Approximation_and_Projection.py
```