Introduction to Biostatistics and Bioinformatics: Regression and Correlation
![Page 1: Introduction to Biostatistics and Bioinformatics Regression and Correlation.](https://reader036.fdocuments.net/reader036/viewer/2022062304/56649f535503460f94c77761/html5/thumbnails/1.jpg)
Introduction to Biostatistics and Bioinformatics
Regression and Correlation
![Page 2: Introduction to Biostatistics and Bioinformatics Regression and Correlation.](https://reader036.fdocuments.net/reader036/viewer/2022062304/56649f535503460f94c77761/html5/thumbnails/2.jpg)
Learning Objectives
Regression – estimation of the relationship between variables
• Linear regression
• Assessing the assumptions
• Non-linear regression
![Page 3: Introduction to Biostatistics and Bioinformatics Regression and Correlation.](https://reader036.fdocuments.net/reader036/viewer/2022062304/56649f535503460f94c77761/html5/thumbnails/3.jpg)
Learning Objectives
Correlation
• Correlation coefficient quantifies the association strength
• Sensitivity to the distribution
![Page 4: Introduction to Biostatistics and Bioinformatics Regression and Correlation.](https://reader036.fdocuments.net/reader036/viewer/2022062304/56649f535503460f94c77761/html5/thumbnails/4.jpg)
Relationships
[Figure panels: Relationship; No Relationship]
![Page 5: Introduction to Biostatistics and Bioinformatics Regression and Correlation.](https://reader036.fdocuments.net/reader036/viewer/2022062304/56649f535503460f94c77761/html5/thumbnails/5.jpg)
Relationships
Linear Relationships
Non-Linear Relationship
![Page 6: Introduction to Biostatistics and Bioinformatics Regression and Correlation.](https://reader036.fdocuments.net/reader036/viewer/2022062304/56649f535503460f94c77761/html5/thumbnails/6.jpg)
Relationships
[Figure panels: Linear, Strong; Linear, Weak]
![Page 7: Introduction to Biostatistics and Bioinformatics Regression and Correlation.](https://reader036.fdocuments.net/reader036/viewer/2022062304/56649f535503460f94c77761/html5/thumbnails/7.jpg)
Linear Regression
[Figure panels: Linear, Strong; Linear, Weak; Non-Linear]
![Page 8: Introduction to Biostatistics and Bioinformatics Regression and Correlation.](https://reader036.fdocuments.net/reader036/viewer/2022062304/56649f535503460f94c77761/html5/thumbnails/8.jpg)
Linear Regression - Residuals
[Figure: residual plots for three panels: Linear, Strong; Linear, Weak; Non-Linear. Vertical axis: Residuals]
![Page 9: Introduction to Biostatistics and Bioinformatics Regression and Correlation.](https://reader036.fdocuments.net/reader036/viewer/2022062304/56649f535503460f94c77761/html5/thumbnails/9.jpg)
Linear Regression Model
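$$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$$
• $Y_i$: dependent variable
• $X_i$: independent variable
• $\beta_0$: intercept
• $\beta_1$: slope
• $\varepsilon_i$: random error
• $\beta_0 + \beta_1 X_i$ is the linear component; $\varepsilon_i$ is the random error component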
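To make the two components of the model concrete, here is a minimal simulation sketch. The intercept, slope, noise level, and sample size are hypothetical values chosen for illustration; they do not come from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)   # fixed seed so the sketch is reproducible

beta0, beta1 = 0.5, 2.0          # hypothetical intercept and slope
sigma = 0.1                      # hypothetical standard deviation of the random error
n = 200                          # hypothetical sample size

x = np.linspace(-1, 1, n)                  # independent variable X_i
eps = rng.normal(0.0, sigma, size=n)       # random error component
linear_part = beta0 + beta1 * x            # linear component
y = linear_part + eps                      # dependent variable Y_i

print(f"mean of the simulated errors: {eps.mean():.4f}")
```

Subtracting the linear component from `y` recovers the simulated errors exactly, which is what the residuals of a fitted line try to approximate.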
![Page 10: Introduction to Biostatistics and Bioinformatics Regression and Correlation.](https://reader036.fdocuments.net/reader036/viewer/2022062304/56649f535503460f94c77761/html5/thumbnails/10.jpg)
Linear Regression Assumptions
The relationship between the variables is linear.
![Page 11: Introduction to Biostatistics and Bioinformatics Regression and Correlation.](https://reader036.fdocuments.net/reader036/viewer/2022062304/56649f535503460f94c77761/html5/thumbnails/11.jpg)
Linear Regression Assumptions
Errors are independent, normally distributed with mean zero and constant variance.
![Page 12: Introduction to Biostatistics and Bioinformatics Regression and Correlation.](https://reader036.fdocuments.net/reader036/viewer/2022062304/56649f535503460f94c77761/html5/thumbnails/12.jpg)
Linear Regression Assumptions
[Figure: residual plots for two panels: Linear; Non-Linear. Vertical axis: Residuals]
![Page 13: Introduction to Biostatistics and Bioinformatics Regression and Correlation.](https://reader036.fdocuments.net/reader036/viewer/2022062304/56649f535503460f94c77761/html5/thumbnails/13.jpg)
Linear Regression Assumptions
[Figure: residual plots for two panels: Constant Variance; Variable Variance. Vertical axis: Residuals]
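One rough way to assess the zero-mean and constant-variance assumptions numerically is to compare the spread of the residuals at small and large values of the independent variable. The sketch below is an illustration only: the helper name `residual_check`, the simulated data, and the noise levels are assumptions, not part of the slides.

```python
import numpy as np
from scipy import stats

def residual_check(x, y):
    """Fit a line; return the residual mean and the upper/lower spread ratio."""
    fit = stats.linregress(x, y)
    residuals = y - (fit.intercept + fit.slope * x)
    upper = residuals[x > np.median(x)]    # residuals at large x
    lower = residuals[x <= np.median(x)]   # residuals at small x
    return residuals.mean(), upper.std() / lower.std()

rng = np.random.default_rng(4)
x = np.linspace(-1, 1, 2000)

# Constant variance: the same noise level everywhere
y_constant = x + 0.2 * rng.normal(size=x.size)

# Variable variance: the noise level grows from 0.05 to 0.45 along x
noise_scale = 0.05 + 0.2 * (x + 1)
y_variable = x + noise_scale * rng.normal(size=x.size)

mean_c, ratio_c = residual_check(x, y_constant)
mean_v, ratio_v = residual_check(x, y_variable)
print(f"constant variance: mean={mean_c:.1e}, spread ratio={ratio_c:.2f}")
print(f"variable variance: mean={mean_v:.1e}, spread ratio={ratio_v:.2f}")
```

A spread ratio near 1 is consistent with constant variance, while a ratio far from 1 suggests the variance changes with the independent variable. A residual plot remains the more informative check.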
![Page 14: Introduction to Biostatistics and Bioinformatics Regression and Correlation.](https://reader036.fdocuments.net/reader036/viewer/2022062304/56649f535503460f94c77761/html5/thumbnails/14.jpg)
![Page 15: Introduction to Biostatistics and Bioinformatics Regression and Correlation.](https://reader036.fdocuments.net/reader036/viewer/2022062304/56649f535503460f94c77761/html5/thumbnails/15.jpg)
Linear Regression – Estimating the Line
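$$\hat{Y}_i = \hat{\beta}_0 + \hat{\beta}_1 X_i$$
• $\hat{Y}_i$: estimated value
• $\hat{\beta}_0$: estimated intercept
• $\hat{\beta}_1$: estimated slope
• $X_i$: independent variable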
![Page 16: Introduction to Biostatistics and Bioinformatics Regression and Correlation.](https://reader036.fdocuments.net/reader036/viewer/2022062304/56649f535503460f94c77761/html5/thumbnails/16.jpg)
Least Squares Method
Find the slope and intercept, given measurements $(X_i, Y_i)$, $i = 1, \ldots, N$, that minimize the sum of the squares of the residuals.
![Page 17: Introduction to Biostatistics and Bioinformatics Regression and Correlation.](https://reader036.fdocuments.net/reader036/viewer/2022062304/56649f535503460f94c77761/html5/thumbnails/17.jpg)
Least Squares Method
$$S = \sum_{i=1}^{N} \varepsilon_i^2 = \sum_{i=1}^{N} (Y_i - \hat{Y}_i)^2$$
Find the slope and intercept, given measurements $(X_i, Y_i)$, $i = 1, \ldots, N$, that minimize the sum of the squares of the residuals.
![Page 18: Introduction to Biostatistics and Bioinformatics Regression and Correlation.](https://reader036.fdocuments.net/reader036/viewer/2022062304/56649f535503460f94c77761/html5/thumbnails/18.jpg)
Least Squares Method
Find the slope and intercept, given measurements $(X_i, Y_i)$, $i = 1, \ldots, N$, that minimize the sum of the squares of the residuals.
$$\frac{\partial S}{\partial \hat{\beta}_0} = 0, \qquad \frac{\partial S}{\partial \hat{\beta}_1} = 0$$
![Page 19: Introduction to Biostatistics and Bioinformatics Regression and Correlation.](https://reader036.fdocuments.net/reader036/viewer/2022062304/56649f535503460f94c77761/html5/thumbnails/19.jpg)
Least Squares Method
Find the slope and intercept, given measurements $(X_i, Y_i)$, $i = 1, \ldots, N$, that minimize the sum of the squares of the residuals.
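Setting the derivative of $S$ with respect to the slope to zero, with all sums running over $i = 1, \ldots, N$:

$$
\begin{aligned}
\frac{\partial S}{\partial \hat{\beta}_1}
&= \frac{\partial}{\partial \hat{\beta}_1} \sum (Y_i - \hat{Y}_i)^2 \\
&= \frac{\partial}{\partial \hat{\beta}_1} \sum \left(Y_i - (\hat{\beta}_0 + \hat{\beta}_1 X_i)\right)^2 \\
&= \frac{\partial}{\partial \hat{\beta}_1} \sum \left(Y_i^2 - 2 Y_i (\hat{\beta}_0 + \hat{\beta}_1 X_i) + (\hat{\beta}_0 + \hat{\beta}_1 X_i)(\hat{\beta}_0 + \hat{\beta}_1 X_i)\right) \\
&= \frac{\partial}{\partial \hat{\beta}_1} \sum \left(Y_i^2 - 2 Y_i (\hat{\beta}_0 + \hat{\beta}_1 X_i) + \hat{\beta}_0^2 + 2 \hat{\beta}_0 \hat{\beta}_1 X_i + \hat{\beta}_1^2 X_i^2\right) \\
&= \sum \left(-2 X_i Y_i + 2 \hat{\beta}_0 X_i + 2 \hat{\beta}_1 X_i^2\right) \\
&= \sum \left(-2 X_i Y_i + 2 (\bar{Y} - \hat{\beta}_1 \bar{X}) X_i + 2 \hat{\beta}_1 X_i^2\right) \\
&= -2 \sum X_i Y_i + 2 \bar{Y} \sum X_i - 2 \hat{\beta}_1 \bar{X} \sum X_i + 2 \hat{\beta}_1 \sum X_i^2 \\
&= -2N \left(\overline{XY} - \bar{X}\,\bar{Y} - \hat{\beta}_1 \left(\overline{X^2} - \bar{X}^2\right)\right) = 0
\end{aligned}
$$

Here $\bar{X} = \frac{1}{N}\sum X_i$, $\bar{Y} = \frac{1}{N}\sum Y_i$, $\overline{XY} = \frac{1}{N}\sum X_i Y_i$ and $\overline{X^2} = \frac{1}{N}\sum X_i^2$. The substitution $\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{X}$ follows from $\partial S / \partial \hat{\beta}_0 = 0$.

Result:

$$\hat{\beta}_1 = \frac{\overline{XY} - \bar{X}\,\bar{Y}}{\overline{X^2} - \bar{X}^2}, \qquad \hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{X}$$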
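The closed-form least squares estimates can be checked against `scipy.stats.linregress`. This is a minimal sketch; the function name `least_squares_line` and the simulated data are assumptions made for illustration.

```python
import numpy as np
from scipy import stats

def least_squares_line(x, y):
    """Return (slope, intercept) from the closed-form least squares result."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = x.mean(), y.mean()
    # slope = (mean(XY) - mean(X) mean(Y)) / (mean(X^2) - mean(X)^2)
    slope = ((x * y).mean() - x_mean * y_mean) / ((x ** 2).mean() - x_mean ** 2)
    # intercept = mean(Y) - slope * mean(X)
    intercept = y_mean - slope * x_mean
    return slope, intercept

rng = np.random.default_rng(1)
x = np.linspace(-1, 1, 50)
y = 0.3 + 1.5 * x + 0.1 * rng.normal(size=x.size)   # hypothetical data

slope, intercept = least_squares_line(x, y)
reference = stats.linregress(x, y)
print(f"closed form: slope={slope:.4f}, intercept={intercept:.4f}")
print(f"linregress:  slope={reference.slope:.4f}, intercept={reference.intercept:.4f}")
```

Both approaches minimize the same sum of squared residuals, so the two pairs of estimates should agree up to floating-point error.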
![Page 20: Introduction to Biostatistics and Bioinformatics Regression and Correlation.](https://reader036.fdocuments.net/reader036/viewer/2022062304/56649f535503460f94c77761/html5/thumbnails/20.jpg)
Linear Regression in Python
import scipy.stats as stats
slope,intercept,r_value,p_value,std_err = stats.linregress(x,y)
![Page 21: Introduction to Biostatistics and Bioinformatics Regression and Correlation.](https://reader036.fdocuments.net/reader036/viewer/2022062304/56649f535503460f94c77761/html5/thumbnails/21.jpg)
Linear Regression Example
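[Figure: scatter plot with fitted line (Linear, Strong) and the corresponding residual plot]
import numpy as np
import matplotlib.pyplot as plt
import scipy.stats as stats

points = 50  # assumed value; the slide does not define `points`

# Strong linear relationship: y = x plus small Gaussian noise
x = np.linspace(-1, 1, points)
y = x + 0.1 * np.random.normal(size=points)
slope, intercept, r_value, p_value, std_err = stats.linregress(x, y)
y_line = slope * x + intercept

# Scatter plot of the data with the fitted line
fig, ax1 = plt.subplots(1, figsize=(4, 4))
ax1.scatter(x, y, color='#4D0132', lw=0, s=60)
ax1.set_xlim([-1.5, 1.5])
ax1.set_ylim([-1.5, 1.5])
ax1.plot(x, y_line, color='red', lw=2)
fig.savefig('linear.png')

# Residuals (observed minus fitted values) on the same axis limits
fig, ax1 = plt.subplots(1, figsize=(4, 4))
ax1.scatter(x, y - y_line, color='#963725', lw=0, s=60)
ax1.set_xlim([-1.5, 1.5])
ax1.set_ylim([-1.5, 1.5])
fig.savefig('linear-residuals.png')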
![Page 22: Introduction to Biostatistics and Bioinformatics Regression and Correlation.](https://reader036.fdocuments.net/reader036/viewer/2022062304/56649f535503460f94c77761/html5/thumbnails/22.jpg)
Linear Regression Example
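[Figure: scatter plot with fitted line (Linear, Weak) and the corresponding residual plot]
# Uses np, plt, stats and points as in the strong example;
# only the noise scale (0.4 instead of 0.1) and the file names change.
x = np.linspace(-1, 1, points)
y = x + 0.4 * np.random.normal(size=points)
slope, intercept, r_value, p_value, std_err = stats.linregress(x, y)
y_line = slope * x + intercept

# Scatter plot of the data with the fitted line
fig, ax1 = plt.subplots(1, figsize=(4, 4))
ax1.scatter(x, y, color='#4D0132', lw=0, s=60)
ax1.set_xlim([-1.5, 1.5])
ax1.set_ylim([-1.5, 1.5])
ax1.plot(x, y_line, color='red', lw=2)
fig.savefig('linear-weak.png')

# Residuals (observed minus fitted values) on the same axis limits
fig, ax1 = plt.subplots(1, figsize=(4, 4))
ax1.scatter(x, y - y_line, color='#963725', lw=0, s=60)
ax1.set_xlim([-1.5, 1.5])
ax1.set_ylim([-1.5, 1.5])
fig.savefig('linear-weak-residuals.png')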
![Page 23: Introduction to Biostatistics and Bioinformatics Regression and Correlation.](https://reader036.fdocuments.net/reader036/viewer/2022062304/56649f535503460f94c77761/html5/thumbnails/23.jpg)
Linear Regression Example
Outlier
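A single outlier can noticeably change the fitted line, because least squares penalizes large residuals quadratically. The sketch below illustrates this with simulated data; the position of the outlier and the noise level are hypothetical and are not taken from the slide's figure.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)

# Clean data: y = x plus small Gaussian noise
x = np.linspace(-1, 1, 30)
y = x + 0.1 * rng.normal(size=x.size)
clean = stats.linregress(x, y)

# Hypothetical outlier: one extra point far below the trend at the right edge
x_out = np.append(x, 1.0)
y_out = np.append(y, -5.0)
with_outlier = stats.linregress(x_out, y_out)

print(f"slope without outlier: {clean.slope:.2f}")
print(f"slope with outlier:    {with_outlier.slope:.2f}")
```

In this setup one point out of 31 is enough to pull the slope well below the value fitted to the clean data.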
![Page 24: Introduction to Biostatistics and Bioinformatics Regression and Correlation.](https://reader036.fdocuments.net/reader036/viewer/2022062304/56649f535503460f94c77761/html5/thumbnails/24.jpg)
Regression – Non-linear data
Solution 1: Transformation
Solution 2: Non-linear Regression
$$\hat{Y}_i = f(X_i, \hat{\beta}_0, \hat{\beta}_1, \ldots)$$
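Both solutions can be sketched on simulated exponential data. The model form, the true parameter values, and the noise level below are assumptions chosen for illustration; the log transformation suits this particular model and is not a general recipe.

```python
import numpy as np
from scipy import stats
from scipy.optimize import curve_fit

rng = np.random.default_rng(5)

# Hypothetical non-linear data: y = a * exp(b * x) with multiplicative noise
a_true, b_true = 2.0, 1.5
x = np.linspace(0, 2, 60)
y = a_true * np.exp(b_true * x) * np.exp(0.02 * rng.normal(size=x.size))

# Solution 1: transformation. log(y) = log(a) + b * x is linear in x.
fit = stats.linregress(x, np.log(y))
a_transform, b_transform = np.exp(fit.intercept), fit.slope

# Solution 2: non-linear regression, fitting f(x, a, b) directly.
def exponential(x, a, b):
    return a * np.exp(b * x)

(a_nonlinear, b_nonlinear), _ = curve_fit(exponential, x, y, p0=(1.0, 1.0))

print(f"transformation: a={a_transform:.2f}, b={b_transform:.2f}")
print(f"non-linear fit: a={a_nonlinear:.2f}, b={b_nonlinear:.2f}")
```

The two approaches weight the data differently (the transformation fits residuals on the log scale), so their estimates are close but not identical.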
![Page 25: Introduction to Biostatistics and Bioinformatics Regression and Correlation.](https://reader036.fdocuments.net/reader036/viewer/2022062304/56649f535503460f94c77761/html5/thumbnails/25.jpg)
Correlation Coefficient
• A measure of the correlation between the two variables
• Quantifies the association strength
Pearson correlation coefficient:
$$r = \frac{\sum_i (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum_i (X_i - \bar{X})^2 \, \sum_i (Y_i - \bar{Y})^2}}$$
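The Pearson coefficient can be computed directly from its definition and compared with NumPy's built-in result. A minimal sketch; the function name `pearson_r` and the simulated data are assumptions for illustration.

```python
import numpy as np

def pearson_r(x, y):
    """Pearson correlation coefficient computed from its definition."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()   # deviations of X from its mean
    dy = y - y.mean()   # deviations of Y from its mean
    return (dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum())

rng = np.random.default_rng(6)
x = rng.normal(size=100)
y = x + 0.5 * rng.normal(size=100)   # hypothetical correlated data

r_manual = pearson_r(x, y)
r_numpy = np.corrcoef(x, y)[0, 1]
print(f"manual: {r_manual:.4f}, numpy: {r_numpy:.4f}")
```

Perfectly linear data give $r = 1$ for an increasing line and $r = -1$ for a decreasing one.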
![Page 26: Introduction to Biostatistics and Bioinformatics Regression and Correlation.](https://reader036.fdocuments.net/reader036/viewer/2022062304/56649f535503460f94c77761/html5/thumbnails/26.jpg)
Correlation Coefficient
![Page 27: Introduction to Biostatistics and Bioinformatics Regression and Correlation.](https://reader036.fdocuments.net/reader036/viewer/2022062304/56649f535503460f94c77761/html5/thumbnails/27.jpg)
Correlation Coefficient
![Page 28: Introduction to Biostatistics and Bioinformatics Regression and Correlation.](https://reader036.fdocuments.net/reader036/viewer/2022062304/56649f535503460f94c77761/html5/thumbnails/28.jpg)
Correlation Coefficient
![Page 29: Introduction to Biostatistics and Bioinformatics Regression and Correlation.](https://reader036.fdocuments.net/reader036/viewer/2022062304/56649f535503460f94c77761/html5/thumbnails/29.jpg)
Correlation Coefficient
![Page 30: Introduction to Biostatistics and Bioinformatics Regression and Correlation.](https://reader036.fdocuments.net/reader036/viewer/2022062304/56649f535503460f94c77761/html5/thumbnails/30.jpg)
Correlation Coefficient
![Page 31: Introduction to Biostatistics and Bioinformatics Regression and Correlation.](https://reader036.fdocuments.net/reader036/viewer/2022062304/56649f535503460f94c77761/html5/thumbnails/31.jpg)
Correlation Coefficient
Source: Wikipedia
![Page 32: Introduction to Biostatistics and Bioinformatics Regression and Correlation.](https://reader036.fdocuments.net/reader036/viewer/2022062304/56649f535503460f94c77761/html5/thumbnails/32.jpg)
Coefficient of Variation
Sample: $x_1, x_2, \ldots, x_n$
Mean: $\mu = \frac{1}{n} \sum_{i=1}^{n} x_i$
Variance: $\sigma^2 = \frac{1}{n} \sum_{i=1}^{n} (x_i - \mu)^2$
Coefficient of Variation (CV): $CV = \sigma / \mu$
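As a small worked example, the coefficient of variation is the standard deviation divided by the mean. The sketch below divides by $n$ when computing the variance; the function name and the sample values are hypothetical.

```python
import numpy as np

def coefficient_of_variation(values):
    """Standard deviation divided by the mean (variance divides by n)."""
    values = np.asarray(values, dtype=float)
    mean = values.mean()
    variance = ((values - mean) ** 2).mean()
    return np.sqrt(variance) / mean

sample = [2, 4, 4, 4, 5, 5, 7, 9]   # mean 5, variance 4, standard deviation 2
cv = coefficient_of_variation(sample)
print(f"CV = {cv:.2f}")
```

For this sample the mean is 5 and the standard deviation is 2, so the CV is 0.4. The CV is only meaningful for data measured on a scale with a true zero and a non-zero mean.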
![Page 33: Introduction to Biostatistics and Bioinformatics Regression and Correlation.](https://reader036.fdocuments.net/reader036/viewer/2022062304/56649f535503460f94c77761/html5/thumbnails/33.jpg)
Correlation Coefficient and CV
Uniform distribution
![Page 34: Introduction to Biostatistics and Bioinformatics Regression and Correlation.](https://reader036.fdocuments.net/reader036/viewer/2022062304/56649f535503460f94c77761/html5/thumbnails/34.jpg)
Correlation Coefficient and CV
[Figure panels: Uniform distribution; Normal distribution; Lognormal distribution]
![Page 35: Introduction to Biostatistics and Bioinformatics Regression and Correlation.](https://reader036.fdocuments.net/reader036/viewer/2022062304/56649f535503460f94c77761/html5/thumbnails/35.jpg)
Correlation Coefficient - Outliers
Outlier
![Page 36: Introduction to Biostatistics and Bioinformatics Regression and Correlation.](https://reader036.fdocuments.net/reader036/viewer/2022062304/56649f535503460f94c77761/html5/thumbnails/36.jpg)
Correlation Coefficient – Non-linear
Solutions:
• Transformation
• Rank correlation (Spearman, r=0.93)
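Both solutions can be illustrated on a monotonic but non-linear relationship. The data below are a hypothetical noise-free exponential curve, not the data behind the slide's figure, so the numbers will differ from the value quoted above.

```python
import numpy as np
from scipy import stats

# Hypothetical monotonic, non-linear relationship
x = np.linspace(0, 5, 50)
y = np.exp(x)

r_pearson, _ = stats.pearsonr(x, y)            # sensitive to non-linearity
rho_spearman, _ = stats.spearmanr(x, y)        # uses ranks only
r_log, _ = stats.pearsonr(x, np.log(y))        # Pearson after transformation

print(f"Pearson:                 {r_pearson:.3f}")
print(f"Spearman:                {rho_spearman:.3f}")
print(f"Pearson after log(y):    {r_log:.3f}")
```

Because the relationship is strictly increasing, the ranks of `x` and `y` agree exactly and the Spearman coefficient is 1, while the Pearson coefficient on the raw data is lower.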
![Page 37: Introduction to Biostatistics and Bioinformatics Regression and Correlation.](https://reader036.fdocuments.net/reader036/viewer/2022062304/56649f535503460f94c77761/html5/thumbnails/37.jpg)
Correlation Coefficient and p-value
Hypothesis: Is there a correlation?
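[Figure: three scatter plots, each annotated with a correlation coefficient r and a p-value p; the values are not recoverable from the transcript]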
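`scipy.stats.pearsonr` returns both the correlation coefficient and a p-value for the null hypothesis of no correlation. The sketch below uses simulated data (the slope, noise level, and sample sizes are assumptions) to show how the p-value depends on the number of points as well as on r.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)

# Hypothetical data with a moderate underlying correlation
x = rng.normal(size=500)
y = 0.5 * x + rng.normal(size=500)

# Same relationship, tested on the first 8 points and on all 500 points
r_small, p_small = stats.pearsonr(x[:8], y[:8])
r_large, p_large = stats.pearsonr(x, y)

print(f"n=8:   r={r_small:.2f}, p={p_small:.3g}")
print(f"n=500: r={r_large:.2f}, p={p_large:.3g}")
```

A small p-value indicates that the observed r would be unlikely if there were no correlation; it does not by itself say that the correlation is strong.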
![Page 38: Introduction to Biostatistics and Bioinformatics Regression and Correlation.](https://reader036.fdocuments.net/reader036/viewer/2022062304/56649f535503460f94c77761/html5/thumbnails/38.jpg)
Application: Analytical Measurements
[Figure: Measured Concentration (vertical axis) versus Theoretical Concentration (horizontal axis)]
![Page 39: Introduction to Biostatistics and Bioinformatics Regression and Correlation.](https://reader036.fdocuments.net/reader036/viewer/2022062304/56649f535503460f94c77761/html5/thumbnails/39.jpg)
A Few Characteristics of Analytical Measurements
Accuracy: Closeness of agreement between a test result and an accepted reference value.
Precision: Closeness of agreement between independent test results.
Robustness: Test precision given small, deliberate changes in test conditions (preanalytic delays, variations in storage temperature).
Lower limit of detection: The lowest amount of analyte that is statistically distinguishable from background or a negative control.
Limit of quantification: Lowest and highest concentrations of analyte that can be quantitatively determined with suitable precision and accuracy.
Linearity: The ability of the test to return values that are directly proportional to the concentration of the analyte in the sample.
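Several of these characteristics can be estimated with the tools from this lecture. The sketch below is an illustration under stated assumptions: all measurement values are made up, bias is used as a summary of accuracy, the CV of replicates as a summary of precision, and the detection limit is taken as the mean blank plus three standard deviations, which is one common convention and is not specified in the slides.

```python
import numpy as np
from scipy import stats

# Hypothetical measurements (not from the slides)
reference_value = 10.0
replicates = np.array([9.8, 10.1, 10.0, 9.9, 10.2])    # repeated test results
blanks = np.array([0.10, 0.20, 0.15, 0.05, 0.00])      # negative controls
theoretical = np.array([1.0, 2.0, 5.0, 10.0, 20.0])
measured = np.array([1.1, 1.9, 5.2, 9.8, 20.3])

# Accuracy: difference between the mean result and the reference value
bias = replicates.mean() - reference_value

# Precision: coefficient of variation of the replicates (sample SD, ddof=1)
cv_percent = 100 * replicates.std(ddof=1) / replicates.mean()

# Lower limit of detection: mean blank + 3 SD (assumed convention)
lod = blanks.mean() + 3 * blanks.std(ddof=1)

# Linearity: regression of measured on theoretical concentration
fit = stats.linregress(theoretical, measured)
r_squared = fit.rvalue ** 2

print(f"bias={bias:.3f}, CV={cv_percent:.2f}%, LOD={lod:.3f}")
print(f"slope={fit.slope:.3f}, intercept={fit.intercept:.3f}, r^2={r_squared:.4f}")
```

A slope near 1, an intercept near 0, and a high r² are consistent with a linear, proportional response over the tested range.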
![Page 40: Introduction to Biostatistics and Bioinformatics Regression and Correlation.](https://reader036.fdocuments.net/reader036/viewer/2022062304/56649f535503460f94c77761/html5/thumbnails/40.jpg)
Limit of Detection and Linearity
[Figure: Measured Concentration (vertical axis) versus Theoretical Concentration (horizontal axis)]
![Page 41: Introduction to Biostatistics and Bioinformatics Regression and Correlation.](https://reader036.fdocuments.net/reader036/viewer/2022062304/56649f535503460f94c77761/html5/thumbnails/41.jpg)
Precision and Accuracy
[Figure: two panels, each plotting Measured Concentration (vertical axis) against Theoretical Concentration (horizontal axis)]
![Page 42: Introduction to Biostatistics and Bioinformatics Regression and Correlation.](https://reader036.fdocuments.net/reader036/viewer/2022062304/56649f535503460f94c77761/html5/thumbnails/42.jpg)
Summary - Regression
Source: http://xkcdsw.com/content/img/2274.png
![Page 43: Introduction to Biostatistics and Bioinformatics Regression and Correlation.](https://reader036.fdocuments.net/reader036/viewer/2022062304/56649f535503460f94c77761/html5/thumbnails/43.jpg)
Summary - Correlation
![Page 44: Introduction to Biostatistics and Bioinformatics Regression and Correlation.](https://reader036.fdocuments.net/reader036/viewer/2022062304/56649f535503460f94c77761/html5/thumbnails/44.jpg)
Next Lecture: Experimental Design & Analysis
Experimental Design by Christine Ambrosino, www.hawaii.edu/fishlab/Nearside.htm