Powerpoint - Regression and Correlation Analysis
![Page 2: Powerpoint - Regression and Correlation Analysis](https://reader033.fdocuments.net/reader033/viewer/2022051013/5475a9adb4af9fbe0a8b5cea/html5/thumbnails/2.jpg)
Correlation Analysis
• A measure of association between two numerical variables.
• Example (positive correlation)
  o As soil fertility increases, rice grain yield also increases.
IRRI-PBGB-CRIL 2
![Page 3: Powerpoint - Regression and Correlation Analysis](https://reader033.fdocuments.net/reader033/viewer/2022051013/5475a9adb4af9fbe0a8b5cea/html5/thumbnails/3.jpg)
Example
For seven randomly selected plots, nitrogen content in the soil and the grain yield were recorded.

| Nitrogen Content (%) | Grain Yield (kg/ha) |
|---|---|
| 0.12 | 1652 |
| 0.14 | 2056 |
| 0.15 | 2598 |
| 0.16 | 2734 |
| 0.19 | 3238 |
| 0.22 | 4824 |
| 0.23 | 4858 |
![Page 4: Powerpoint - Regression and Correlation Analysis](https://reader033.fdocuments.net/reader033/viewer/2022051013/5475a9adb4af9fbe0a8b5cea/html5/thumbnails/4.jpg)
How would you describe the graph?
[Figure: scatter plot, "Grain Yield of Rice at Different Levels of Soil Nitrogen Content"; x-axis: Nitrogen Content (%), y-axis: Grain Yield (kg/ha)]
How “strong” is the linear relationship?
![Page 5: Powerpoint - Regression and Correlation Analysis](https://reader033.fdocuments.net/reader033/viewer/2022051013/5475a9adb4af9fbe0a8b5cea/html5/thumbnails/5.jpg)
Measuring the Relationship
Pearson’s Sample Correlation Coefficient, r, measures the direction and the strength of the linear association between two paired numerical variables.
![Page 6: Powerpoint - Regression and Correlation Analysis](https://reader033.fdocuments.net/reader033/viewer/2022051013/5475a9adb4af9fbe0a8b5cea/html5/thumbnails/6.jpg)
Direction of Association
[Figure: two scatter plots, one labeled Positive Correlation and one labeled Negative Correlation]
![Page 7: Powerpoint - Regression and Correlation Analysis](https://reader033.fdocuments.net/reader033/viewer/2022051013/5475a9adb4af9fbe0a8b5cea/html5/thumbnails/7.jpg)
Strength of Linear Association
| r value | Interpretation |
|---|---|
| 1 | perfect positive linear relationship |
| 0 | no linear relationship |
| -1 | perfect negative linear relationship |
![Page 8: Powerpoint - Regression and Correlation Analysis](https://reader033.fdocuments.net/reader033/viewer/2022051013/5475a9adb4af9fbe0a8b5cea/html5/thumbnails/8.jpg)
Strength of Linear Association
[Figure: scatter plots labeled No Linear Correlation and Perfect Linear Positive Correlation]
![Page 9: Powerpoint - Regression and Correlation Analysis](https://reader033.fdocuments.net/reader033/viewer/2022051013/5475a9adb4af9fbe0a8b5cea/html5/thumbnails/9.jpg)
Other Strengths of Association
| r value | Interpretation |
|---|---|
| 0.9 | strong association |
| 0.5 | moderate association |
| 0.25 | weak association |
![Page 10: Powerpoint - Regression and Correlation Analysis](https://reader033.fdocuments.net/reader033/viewer/2022051013/5475a9adb4af9fbe0a8b5cea/html5/thumbnails/10.jpg)
Other Strengths of Association
[Figure: scatter plots labeled Strong Positive Linear Correlation and Moderate Negative Linear Correlation]
![Page 11: Powerpoint - Regression and Correlation Analysis](https://reader033.fdocuments.net/reader033/viewer/2022051013/5475a9adb4af9fbe0a8b5cea/html5/thumbnails/11.jpg)
Formula
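$$r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{(n - 1)\, s_x s_y}$$

where
- $\sum$ = the sum
- $n$ = number of paired items
- $x_i$ = input variable
- $y_i$ = output variable
- $\bar{x}$ = x-bar = mean of the x’s
- $\bar{y}$ = y-bar = mean of the y’s
- $s_x$ = standard deviation of the x’s
- $s_y$ = standard deviation of the y’s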
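The quantities listed above can be combined in a few lines of code. The sketch below assumes the usual definition of Pearson's r, the sum of cross-products divided by (n - 1) times the two sample standard deviations, and applies it to the seven-plot nitrogen example. The function name `pearson_r` is a hypothetical helper, not part of any package named in the slides.

```python
import math

# Nitrogen content (%) and grain yield (kg/ha) for the seven plots in the example.
x = [0.12, 0.14, 0.15, 0.16, 0.19, 0.22, 0.23]
y = [1652, 2056, 2598, 2734, 3238, 4824, 4858]


def pearson_r(x, y):
    """Pearson's sample correlation coefficient (hypothetical helper)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sample standard deviations use the divisor n - 1.
    s_x = math.sqrt(sum((xi - x_bar) ** 2 for xi in x) / (n - 1))
    s_y = math.sqrt(sum((yi - y_bar) ** 2 for yi in y) / (n - 1))
    # Sum of cross-products of the deviations from the means.
    cross = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    return cross / ((n - 1) * s_x * s_y)


r = pearson_r(x, y)
print(f"r = {r:.3f}")
```

The result is close to 1, consistent with the strongly increasing pattern in the example data.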
![Page 12: Powerpoint - Regression and Correlation Analysis](https://reader033.fdocuments.net/reader033/viewer/2022051013/5475a9adb4af9fbe0a8b5cea/html5/thumbnails/12.jpg)
Correlation Coefficient (r)
r = 0 does not necessarily mean no relationship. The relationship may be nonlinear.
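A small made-up illustration of this point: below, y is completely determined by x (y = x squared), yet the linear correlation is zero because the relationship is symmetric around x = 0. The data are invented for the sketch and do not come from the slides.

```python
import math

# Invented data: y depends exactly on x, but not linearly.
x = [-3, -2, -1, 0, 1, 2, 3]
y = [xi ** 2 for xi in x]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Pearson's r written as cross-products over the root of the squared deviations.
cross = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
ss_x = sum((xi - x_bar) ** 2 for xi in x)
ss_y = sum((yi - y_bar) ** 2 for yi in y)
r = cross / math.sqrt(ss_x * ss_y)

print(r)  # zero linear correlation despite a perfect quadratic relationship
```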
![Page 13: Powerpoint - Regression and Correlation Analysis](https://reader033.fdocuments.net/reader033/viewer/2022051013/5475a9adb4af9fbe0a8b5cea/html5/thumbnails/13.jpg)
Correlation Coefficient
![Page 14: Powerpoint - Regression and Correlation Analysis](https://reader033.fdocuments.net/reader033/viewer/2022051013/5475a9adb4af9fbe0a8b5cea/html5/thumbnails/14.jpg)
Correlation Coefficient (r)
A significant r does not necessarily mean a strong linear relationship
![Page 15: Powerpoint - Regression and Correlation Analysis](https://reader033.fdocuments.net/reader033/viewer/2022051013/5475a9adb4af9fbe0a8b5cea/html5/thumbnails/15.jpg)
Correlation Coefficient
[Figure: scatter plot of Yield/plot against Tiller/plant; r = .25**, n = 234]
When the number of observations is large, a low r value may still be significant.
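One way to see why this happens is the t statistic commonly used to test a correlation, t = r·sqrt(n − 2) / sqrt(1 − r²). This test is an assumption of the sketch; the slides do not state which test produced the asterisks. For a fixed r, the statistic grows with n.

```python
import math


def t_statistic(r, n):
    """t statistic for testing a correlation against zero, with n - 2 df."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)


# Values from the slide: a weak correlation with many observations.
t_large_n = t_statistic(0.25, 234)

# The same r with only 12 observations (a made-up sample size for contrast).
t_small_n = t_statistic(0.25, 12)

# Share of variation explained, which does not depend on n.
r_squared = 0.25 ** 2

print(f"n = 234: t = {t_large_n:.2f}")
print(f"n = 12:  t = {t_small_n:.2f}")
print(f"r squared = {r_squared}")
```

With n = 234 the statistic is close to 4, well beyond the two-sided 1% critical value of about 2.6 for large samples, even though r² shows that only about 6% of the variation is explained.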
![Page 16: Powerpoint - Regression and Correlation Analysis](https://reader033.fdocuments.net/reader033/viewer/2022051013/5475a9adb4af9fbe0a8b5cea/html5/thumbnails/16.jpg)
Correlation Coefficient (r)
To conclude that two variables have a strong linear relationship, r should be both high and significant.
![Page 17: Powerpoint - Regression and Correlation Analysis](https://reader033.fdocuments.net/reader033/viewer/2022051013/5475a9adb4af9fbe0a8b5cea/html5/thumbnails/17.jpg)
Correlation Coefficient
[Figure: scatter plot of Yield (t/ha) against No. of spikelets/panicle; r = .90**, n = 60]
![Page 18: Powerpoint - Regression and Correlation Analysis](https://reader033.fdocuments.net/reader033/viewer/2022051013/5475a9adb4af9fbe0a8b5cea/html5/thumbnails/18.jpg)
Test of significance for r

| Degrees of Freedom | p = 0.05 | p = 0.01 | p = 0.001 |
|---|---|---|---|
| 1 | 0.997 | 1.000 | 1.000 |
| 2 | 0.950 | 0.990 | 0.999 |
| 3 | 0.878 | 0.959 | 0.991 |
| 4 | 0.811 | 0.917 | 0.974 |
| 5 | 0.755 | 0.875 | 0.951 |
| 6 | 0.707 | 0.834 | 0.925 |
| 7 | 0.666 | 0.798 | 0.898 |
| 8 | 0.632 | 0.765 | 0.872 |
| 9 | 0.602 | 0.735 | 0.847 |
| 10 | 0.576 | 0.708 | 0.823 |
| 11 | 0.553 | 0.684 | 0.801 |
| 12 | 0.532 | 0.661 | 0.780 |
| 13 | 0.514 | 0.641 | 0.760 |
| 14 | 0.497 | 0.623 | 0.742 |
| 15 | 0.482 | 0.606 | 0.725 |
| 16 | 0.468 | 0.590 | 0.708 |
| 17 | 0.456 | 0.575 | 0.693 |
| 18 | 0.444 | 0.561 | 0.679 |
| 19 | 0.433 | 0.549 | 0.665 |
| 20 | 0.423 | 0.537 | 0.652 |

r is significant if its absolute value is greater than the tabular value for n − 2 degrees of freedom, where n is the number of paired observations.
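The lookup can be sketched in code. The example below copies only the rows for 5 and 10 degrees of freedom from the table and assumes that the degrees of freedom are n − 2, the usual convention for a correlation between n pairs. `significance_level` is a hypothetical helper name.

```python
# Critical values of |r| copied from two rows of the table:
# degrees of freedom -> values at p = 0.05, 0.01, 0.001.
CRITICAL_R = {
    5: (0.755, 0.875, 0.951),
    10: (0.576, 0.708, 0.823),
}
P_LEVELS = (0.05, 0.01, 0.001)


def significance_level(r, n):
    """Return the smallest tabulated p at which r is significant, or None."""
    df = n - 2  # assumption: degrees of freedom for n paired observations
    best = None
    for p, critical in zip(P_LEVELS, CRITICAL_R[df]):
        # Critical values rise as p falls, so the last match is the smallest p.
        if abs(r) > critical:
            best = p
    return best


# Seven plots (5 df) with r = 0.99, as in the nitrogen and grain yield example.
print(significance_level(0.99, 7))
# A moderate correlation from the same number of plots is not significant.
print(significance_level(0.60, 7))
```

The comparison uses the absolute value of r, so a negative correlation is judged against the same tabular values.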
![Page 19: Powerpoint - Regression and Correlation Analysis](https://reader033.fdocuments.net/reader033/viewer/2022051013/5475a9adb4af9fbe0a8b5cea/html5/thumbnails/19.jpg)
CORRELATION ANALYSIS
PEARSON CORRELATION ANALYSIS

| | | Nitrogen.Content | Grain.Yield |
|---|---|---|---|
| Nitrogen.Content | Coef | 1 | 0.99 |
| | P-value | 1 | 1e-04 |
| Grain.Yield | Coef | 0.99 | 1 |
| | P-value | 1e-04 | 1 |
![Page 20: Powerpoint - Regression and Correlation Analysis](https://reader033.fdocuments.net/reader033/viewer/2022051013/5475a9adb4af9fbe0a8b5cea/html5/thumbnails/20.jpg)
Regression Analysis
![Page 21: Powerpoint - Regression and Correlation Analysis](https://reader033.fdocuments.net/reader033/viewer/2022051013/5475a9adb4af9fbe0a8b5cea/html5/thumbnails/21.jpg)
Scientific Question
What is the growth rate of a rice plant?
Growth rate can be defined as the change in height per unit of time.
![Page 22: Powerpoint - Regression and Correlation Analysis](https://reader033.fdocuments.net/reader033/viewer/2022051013/5475a9adb4af9fbe0a8b5cea/html5/thumbnails/22.jpg)
Data Collection
| DAS (days after seeding) | Height (cm) |
|---|---|
| 0 | 0 |
| 10 | 12 |
| 30 | 55 |
| 60 | 80 |
| 90 | 110 |
![Page 23: Powerpoint - Regression and Correlation Analysis](https://reader033.fdocuments.net/reader033/viewer/2022051013/5475a9adb4af9fbe0a8b5cea/html5/thumbnails/23.jpg)
Statistical Questions
• What is the relationship between age and height? Linear
• How do I describe or quantify the relationship? Regression
• Is the association significant? Statistical test
[Figure: scatter plot of Plant Height (cm) against Days after Seeding]
![Page 24: Powerpoint - Regression and Correlation Analysis](https://reader033.fdocuments.net/reader033/viewer/2022051013/5475a9adb4af9fbe0a8b5cea/html5/thumbnails/24.jpg)
Linear Regression
• A general method for estimating or describing the association between a continuous outcome (dependent) variable and one or more predictors in one equation.
  o One predictor: simple linear regression
  o Multiple predictors: multiple linear regression
![Page 25: Powerpoint - Regression and Correlation Analysis](https://reader033.fdocuments.net/reader033/viewer/2022051013/5475a9adb4af9fbe0a8b5cea/html5/thumbnails/25.jpg)
Statistical Model
Data = Model Fit + Residual

$$Y_i = \hat{Y}_i + \varepsilon_i$$

$$\hat{Y}_i = \beta_0 + \beta_1 X_i$$

where $\beta_0$ is the intercept and $\beta_1$ is the slope.

$$Y_i = \mu + \alpha_i + \varepsilon_i$$

[Figure: plot of Y against X]
![Page 26: Powerpoint - Regression and Correlation Analysis](https://reader033.fdocuments.net/reader033/viewer/2022051013/5475a9adb4af9fbe0a8b5cea/html5/thumbnails/26.jpg)
Least Squares Estimates
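$$Y_i = \hat{Y}_i + \varepsilon_i, \qquad \hat{Y}_i = \beta_0 + \beta_1 X_i$$

To estimate the intercept and slope, minimize the residual sum of squares (RSS):

$$RSS = \sum \varepsilon_i^2 = \sum (Y_i - \hat{Y}_i)^2 = \sum (Y_i - \beta_0 - \beta_1 X_i)^2$$

$$\frac{\partial RSS}{\partial \beta_0} = \frac{\partial \sum (Y_i - \beta_0 - \beta_1 X_i)^2}{\partial \beta_0} = -2 \sum (Y_i - \beta_0 - \beta_1 X_i) = 0 \;\Longrightarrow\; \hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{X}$$

$$\frac{\partial RSS}{\partial \beta_1} = \frac{\partial \sum (Y_i - \bar{Y} + \beta_1 \bar{X} - \beta_1 X_i)^2}{\partial \beta_1} = -2 \sum (X_i - \bar{X})(Y_i - \bar{Y} + \beta_1 \bar{X} - \beta_1 X_i) = 0 \;\Longrightarrow\; \hat{\beta}_1 = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sum (X_i - \bar{X})^2}$$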
We don’t have to do the estimation by hand. R/CropStat or other statistical packages can do the work for us.
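For readers who want to see the arithmetic once, here is a minimal sketch that applies the closed-form least squares estimates to the growth rate data (days after seeding and height) from the Data Collection slide. It uses plain Python rather than R or CropStat, and `fit_line` is a hypothetical helper name.

```python
# Growth rate data from the Data Collection slide.
das = [0, 10, 30, 60, 90]        # days after seeding
height = [0, 12, 55, 80, 110]    # plant height (cm)


def fit_line(x, y):
    """Least squares intercept and slope for simple linear regression."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Slope: sum of cross-products over the sum of squared deviations of x.
    slope = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
             / sum((xi - x_bar) ** 2 for xi in x))
    # Intercept: the fitted line passes through the point of means.
    intercept = y_bar - slope * x_bar
    return intercept, slope


b0, b1 = fit_line(das, height)
print(f"intercept = {b0:.6f}, slope = {b1:.6f}")
```

The values agree with the parameter estimates reported in the regression output for this data set (intercept about 4.91, slope about 1.22).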
![Page 27: Powerpoint - Regression and Correlation Analysis](https://reader033.fdocuments.net/reader033/viewer/2022051013/5475a9adb4af9fbe0a8b5cea/html5/thumbnails/27.jpg)
LINEAR REGRESSION ANALYSIS
Dependent Variable: Height

Analysis of Variance

| SV | Df | Sum Square | Mean Square | F value | Pr (>F) |
|---|---|---|---|---|---|
| DAS | 1 | 8201.389781 | 8201.389781 | 95.435198 | 0.002279 |
| Residuals | 3 | 257.810219 | 85.93674 | | |

Model Summary

| R Squared | Adj. R Squared |
|---|---|
| 0.969523 | 0.959364 |

Parameter Estimates

| Parameter | Estimate | Std. Error | t value | Pr (> \|t\|) |
|---|---|---|---|---|
| (Intercept) | 4.912409 | 6.311259 | 0.778356 | 0.493109 |
| DAS | 1.223358 | 0.125227 | 9.769094 | 0.002279 |
![Page 28: Powerpoint - Regression and Correlation Analysis](https://reader033.fdocuments.net/reader033/viewer/2022051013/5475a9adb4af9fbe0a8b5cea/html5/thumbnails/28.jpg)
Example: Growth Rate Data
Parameter Estimates

| Parameter | Estimate | Std. Error | t value | Pr (> \|t\|) |
|---|---|---|---|---|
| (Intercept) | 4.912409 | 6.311259 | 0.778356 | 0.493109 |
| DAS | 1.223358 | 0.125227 | 9.769094 | 0.002279 |

Intercept: The estimated height at age 0 is 4.9 cm (not significantly different from zero, p = 0.49).
Slope: The height increase per day after seeding is 1.223 cm.

Height = 4.9 + 1.223 DAS, r = 0.98

[Figure: Plant Height (cm) against Days after Seeding, with the fitted regression line]
![Page 29: Powerpoint - Regression and Correlation Analysis](https://reader033.fdocuments.net/reader033/viewer/2022051013/5475a9adb4af9fbe0a8b5cea/html5/thumbnails/29.jpg)
Prediction
Given the regression line, it can be predicted that the height at 40 days after seeding will be 53.8 cm.

Height = 4.9 + 1.223 DAS, r = 0.98

[Figure: Plant Height (cm) against Days after Seeding, with the fitted regression line]
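The predicted value follows from substituting DAS = 40 into the fitted line, using the rounded coefficients shown on the slide:

$$\widehat{\text{Height}} = 4.9 + 1.223 \times 40 = 4.9 + 48.92 = 53.82 \approx 53.8 \text{ cm}$$

Since 40 days lies inside the observed range of 0 to 90 days after seeding, this is an interpolation rather than an extrapolation.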
![Page 30: Powerpoint - Regression and Correlation Analysis](https://reader033.fdocuments.net/reader033/viewer/2022051013/5475a9adb4af9fbe0a8b5cea/html5/thumbnails/30.jpg)
Example: Growth Rate Data

Analysis of Variance

| SV | Df | Sum Square | Mean Square | F value | Pr (>F) |
|---|---|---|---|---|---|
| DAS | 1 | 8201.389781 | 8201.389781 | 95.435198 | 0.002279 |
| Residuals | 3 | 257.810219 | 85.93674 | | |

Model Summary

| R Squared | Adj. R Squared |
|---|---|
| 0.969523 | 0.959364 |

Sums of Squares

$$\sum (Y_i - \bar{Y})^2 = \sum (Y_i - \hat{Y}_i + \hat{Y}_i - \bar{Y})^2 = \sum (\hat{Y}_i - \bar{Y})^2 + \sum (Y_i - \hat{Y}_i)^2$$

$$SST = SSM + SSE$$

Degrees of freedom: n − 1 (SST) = 1 (SSM) + n − 2 (SSE)

$$R^2 = \frac{SSM}{SST} = \frac{\sum (\hat{Y}_i - \bar{Y})^2}{\sum (Y_i - \bar{Y})^2}$$

R² is the fraction of variation in Y explained by X.
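The decomposition can be checked numerically on the growth rate data. The sketch below recomputes the total, model, and error sums of squares, then R² and the F value, in plain Python; the variable names are choices made for this example.

```python
# Growth rate data: days after seeding and plant height (cm).
das = [0, 10, 30, 60, 90]
height = [0, 12, 55, 80, 110]

n = len(das)
x_bar = sum(das) / n
y_bar = sum(height) / n

# Least squares slope and intercept.
b1 = (sum((x - x_bar) * (y - y_bar) for x, y in zip(das, height))
      / sum((x - x_bar) ** 2 for x in das))
b0 = y_bar - b1 * x_bar
fitted = [b0 + b1 * x for x in das]

# Total, model, and error sums of squares.
sst = sum((y - y_bar) ** 2 for y in height)
ssm = sum((f - y_bar) ** 2 for f in fitted)
sse = sum((y - f) ** 2 for y, f in zip(height, fitted))

r_squared = ssm / sst
# F value: model mean square (1 df) over residual mean square (n - 2 df).
f_value = (ssm / 1) / (sse / (n - 2))

print(f"SST = {sst:.4f}, SSM = {ssm:.4f}, SSE = {sse:.4f}")
print(f"R squared = {r_squared:.6f}, F = {f_value:.4f}")
```

The model and residual sums of squares match the DAS and Residuals rows of the Analysis of Variance table, and SSM + SSE adds up to SST.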
![Page 31: Powerpoint - Regression and Correlation Analysis](https://reader033.fdocuments.net/reader033/viewer/2022051013/5475a9adb4af9fbe0a8b5cea/html5/thumbnails/31.jpg)
Linear Regression vs. ANOVA
ANOVA
Dependent: Continuous
Independent: Categorical

Linear regression
Dependent: Continuous
Independent: Continuous

Both are linear models: ANOVA and regression are the same thing!
![Page 32: Powerpoint - Regression and Correlation Analysis](https://reader033.fdocuments.net/reader033/viewer/2022051013/5475a9adb4af9fbe0a8b5cea/html5/thumbnails/32.jpg)
Misuse of Regression and Correlation Analysis
• Regression and correlation performed on spurious data can give significant results, but such results are not a valid indication of a linear relationship.
![Page 33: Powerpoint - Regression and Correlation Analysis](https://reader033.fdocuments.net/reader033/viewer/2022051013/5475a9adb4af9fbe0a8b5cea/html5/thumbnails/33.jpg)
Misuse of Regression and Correlation Analysis
• Extrapolation of results
  o The scope of the data is extended. Examples:
    § If the relationship between the yield of IR8 and stemborer incidence is extended to cover all rice varieties
    § If the relationship between grain yield and protein content from varietal trials is assumed to be applicable to other types of experiments, such as fertilizer trials
  o The functional relationship is assumed to hold beyond the range of X values tested
![Page 34: Powerpoint - Regression and Correlation Analysis](https://reader033.fdocuments.net/reader033/viewer/2022051013/5475a9adb4af9fbe0a8b5cea/html5/thumbnails/34.jpg)
Misuse of Regression and Correlation Analysis
[Figure: Grain Yield (kg/ha) against N-rate (kg/ha), with fitted line y = 23.751x + 4307.2, r = 0.987**]
There is no evidence that the linear relationship still holds above N = 180 kg/ha.
![Page 35: Powerpoint - Regression and Correlation Analysis](https://reader033.fdocuments.net/reader033/viewer/2022051013/5475a9adb4af9fbe0a8b5cea/html5/thumbnails/35.jpg)
Coefficient of Determination (R²)
• Percentage of the total variation that is explained by the linear function.
For example, with an R² value of 0.64, 64% [(0.64)(100) = 64] of the variation in the variable Y can be explained by the linear function of the variable X.
![Page 36: Powerpoint - Regression and Correlation Analysis](https://reader033.fdocuments.net/reader033/viewer/2022051013/5475a9adb4af9fbe0a8b5cea/html5/thumbnails/36.jpg)
Problems with R²
• R² tends to increase as additional variables are included in a regression equation, regardless of their true importance in determining the values of the dependent variable.

The adjusted R² (Ra²) compensates for this effect:

$$R_a^2 = 1 - \frac{n - 1}{n - (p + 1)}\,(1 - R^2)$$

where n = no. of observations and p = no. of independent variables.

• R² gives no information on the appropriateness of the model.
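As a quick check, the adjustment can be applied to the growth rate example. The sketch assumes the common form Ra² = 1 − (n − 1)/(n − (p + 1)) × (1 − R²), with n = 5 observations and p = 1 independent variable; `adjusted_r_squared` is a hypothetical helper name.

```python
def adjusted_r_squared(r_squared, n, p):
    """Adjusted R squared for n observations and p independent variables."""
    return 1 - (n - 1) / (n - (p + 1)) * (1 - r_squared)


# Growth rate example: R squared = 0.969523 with 5 observations, 1 predictor.
ra2 = adjusted_r_squared(0.969523, n=5, p=1)
print(f"adjusted R squared = {ra2:.6f}")

# With the same R squared, more predictors lead to a lower adjusted value.
ra2_two_predictors = adjusted_r_squared(0.969523, n=5, p=2)
print(f"with p = 2: {ra2_two_predictors:.6f}")
```

The first value matches the Adj. R Squared reported in the model summary for the growth rate data.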
![Page 37: Powerpoint - Regression and Correlation Analysis](https://reader033.fdocuments.net/reader033/viewer/2022051013/5475a9adb4af9fbe0a8b5cea/html5/thumbnails/37.jpg)
Problems with R²
[Figure: curvilinear data fitted by a straight line with high R²]
[Figure: segregated data fitted by a straight line with high R²]
For detecting these kinds of departures from the regression model, there is no substitute for plotting the data.
![Page 38: Powerpoint - Regression and Correlation Analysis](https://reader033.fdocuments.net/reader033/viewer/2022051013/5475a9adb4af9fbe0a8b5cea/html5/thumbnails/38.jpg)
Thank you!