Section 12.1 The Simple Regression Modellonghai/teaching/2019/stat245... · 2019. 12. 18. ·...


Section 12.1 The Simple Regression Model

1/61

A Motivating Example

Visual and musculoskeletal problems associated with the use of video display terminals (VDTs) have become rather common in recent years. Some researchers have focused on vertical gaze direction as a source of eye strain and irritation. This direction is known to be closely related to ocular surface area (OSA), so a method of measuring OSA is needed. The accompanying representative data on y = OSA (cm²) and x = width of the palpebral fissure (i.e., the horizontal width of the eye opening, in cm) is from the article “Analysis of Ocular Surface Area for Comfortable VDT Workstation Layout” (Ergonomics, 1996: 877–884). The order in which observations were obtained was not given, so for convenience they are listed in increasing order of x values.

2/61


Original Data

3/61


Scatterplot of Data

4/61

Another Example

Forest growth and decline phenomena throughout the world have attracted considerable public and scientific interest. The article “Relationships Among Crown Condition, Growth, and Stand Nutrition in Seven Northern Vermont Sugarbushes” (Canad. J. Forest Res., 1995: 386–397) included a scatter plot of y = mean crown dieback (%), one indicator of growth retardation, and x = soil pH (higher pH corresponds to less acidic soil), from which the following observations were taken:

5/61


Scatterplot of Data

6/61

A Linear Probabilistic Model

$Y = \beta_0 + \beta_1 x + \epsilon$, where $\epsilon \sim N(0, \sigma^2)$

Or in another expression

$Y_i \sim N(\beta_0 + \beta_1 x_i, \sigma^2)$, independently

Graphically

7/61

Understanding linear line and $\sigma^2$

$E(Y \mid x) = \mu_{Y \cdot x} = \beta_0 + \beta_1 x$

$V(Y \mid x) = \sigma^2_{Y \cdot x} = \sigma^2$

1) The population regression line is the line of mean Y values given fixed x.

2) The second sequence of equalities tells us that the amount of variability in the distribution of Y is the same at any particular x value as it is at any other x value. This is the property of homogeneous variation about the population regression line.
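The two properties above can be checked with a small simulation. This is only a sketch: the parameter values (beta0 = 1, beta1 = 2, sigma = 0.5), the seed, and the helper name `simulate_y` are assumptions made for illustration, not values from the slides.

```python
import random
import statistics

def simulate_y(x, beta0, beta1, sigma, n, rng):
    """Draw n observations of Y = beta0 + beta1*x + eps with eps ~ N(0, sigma^2)."""
    return [beta0 + beta1 * x + rng.gauss(0.0, sigma) for _ in range(n)]

rng = random.Random(245)              # fixed seed so the sketch is repeatable
beta0, beta1, sigma = 1.0, 2.0, 0.5   # hypothetical parameter values

for x in (1.0, 3.0):
    ys = simulate_y(x, beta0, beta1, sigma, 20000, rng)
    # The sample mean should sit near beta0 + beta1*x (the regression line),
    # while the sample standard deviation should sit near sigma at every x.
    print(x, round(statistics.mean(ys), 2), round(statistics.stdev(ys), 2))
```

The mean moves with x, but the spread does not: that is the homogeneous variation property.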

8/61

Understanding linear line and $\sigma^2$

9/61


Section 12.2 Estimating Model Parameters

10/61

Estimating Model Parameters

Intuition

11/61

PRINCIPLE OF LEAST SQUARES

12/61

Obtaining Least Squares Estimators
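As a sketch of what the least squares principle leads to, the code below uses the standard closed-form solution: slope = S_xy / S_xx and intercept = ybar - slope * xbar, which minimizes the sum of squared vertical deviations. The function name `least_squares` and the five data points are made up for illustration; they are not from the slides.

```python
def least_squares(x, y):
    """Return (b0, b1) minimizing sum((y_i - b0 - b1*x_i)^2)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = s_xy / s_xx            # estimated slope
    b0 = y_bar - b1 * x_bar     # estimated intercept
    return b0, b1

# Hypothetical toy data, for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
b0, b1 = least_squares(x, y)
print(b0, b1)
```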

13/61


14/61

Example

Global warming is a major issue, and CO2 emissions are an important part of the discussion. What is the effect of increased CO2 levels on the environment? In particular, what is the effect of these higher levels on the growth of plants and trees? The article “Effects of Atmospheric CO2 Enrichment on Biomass Accumulation and Distribution in Eldarica Pine Trees” (J. Experiment. Botany, 1994: 345–349) describes the results of growing pine trees with increasing levels of CO2 in the air. There were two trees at each of four levels of CO2 concentration, and the mass of each tree was measured after 11 months of the experiment. Here are the observations with x = atmospheric concentration of CO2 (μL/L, or ppm) and y = tree mass (kg), along with and . The mass measurements were read from a graph in the article.

15/61


Data and Estimates of Coefficients

16/61


Scatterplot with Estimated Regression Line

17/61

Estimating $\sigma^2$

Understanding

18/61

Fitted Values and Residuals

19/61

SSE and Estimator for $\sigma^2$

A Short-cut Formula for Computing SSE

Note: When using the above short-cut formula, $\hat{\beta}_0$ and $\hat{\beta}_1$ must be carried to many more digits than are wanted in SSE. Otherwise, large round-off error would occur.
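The round-off warning can be seen in a small sketch. It assumes the short-cut formula meant here is the common textbook identity SSE = sum(y^2) - b0 * sum(y) - b1 * sum(x*y), and it estimates the error variance by SSE / (n - 2). The data and variable names are hypothetical.

```python
# Hypothetical toy data, for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
n = len(x)

x_bar, y_bar = sum(x) / n, sum(y) / n
s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
b1 = s_xy / s_xx
b0 = y_bar - b1 * x_bar

# Direct definition: sum of squared residuals.
sse_direct = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))

def sse_shortcut(b0, b1):
    """Assumed short-cut identity; exact only for the least squares b0, b1."""
    sum_y2 = sum(yi ** 2 for yi in y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    return sum_y2 - b0 * sum(y) - b1 * sum_xy

sse_full = sse_shortcut(b0, b1)               # full-precision estimates
sse_rounded = sse_shortcut(b0, round(b1, 1))  # slope rounded to one decimal

sigma2_hat = sse_direct / (n - 2)             # estimate of the error variance
print(sse_direct, sse_full, sse_rounded, sigma2_hat)
```

With full-precision estimates the two computations agree; rounding the slope to one decimal makes the short-cut value negative, which is impossible for a sum of squares.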

20/61

Example

The article “Promising Quantitative Nondestructive Evaluation Techniques for Composite Materials” (Materials Eval., 1985: 561–565) reports on a study to investigate how the propagation of an ultrasonic stress wave through a substance depends on the properties of the substance. The accompanying data on fracture strength (x, as a percentage of ultimate tensile strength) and attenuation (y, in neper/cm, the decrease in amplitude of the stress wave) in fiberglass-reinforced polyester composites was read from a graph that appeared in the article. The simple linear regression model is suggested by the substantial linear pattern in the scatter plot.

21/61

Estimating Regression Line and $\sigma^2$

22/61

The Coefficient of Determination

Different Strength of Linear Effects

23/61


SSR = SST - SSE

24/61

The Coefficient of Determination
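A minimal sketch of the computation, assuming the usual definition r^2 = 1 - SSE/SST = SSR/SST with SST the total sum of squares of y about its mean. The data are hypothetical and the variable names are chosen for illustration.

```python
# Hypothetical toy data, for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
n = len(x)

x_bar, y_bar = sum(x) / n, sum(y) / n
s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
b1 = s_xy / s_xx
b0 = y_bar - b1 * x_bar

sst = sum((yi - y_bar) ** 2 for yi in y)                        # total variation
sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))   # unexplained variation
ssr = sst - sse                                                 # explained variation
r_squared = 1 - sse / sst

print(sst, sse, ssr, r_squared)
```

Here nearly all of the observed variation in y is explained by the fitted line, so r^2 is close to 1.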

25/61

Example

26/61

Reading Output of a Statistical Program

An example of MINITAB results. R produces similar output.

27/61

Section 12.3 Inferences About the Regression Coefficient

28/61

Simulated Estimates of

Look at the linear lines simulated in R.

Another one

29/61

Expressing $\hat{\beta}_1$ as a linear function of $Y_i$

30/61

Sampling Distribution of $\hat{\beta}_1$

Proof: on blackboard.
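While the proof is left to the blackboard, a simulation can make the result plausible. The sketch below assumes the standard facts that the slope estimator is unbiased with variance sigma^2 / S_xx under the model; the parameter values, seed, and function names are made up for illustration.

```python
import random
import statistics

def slope_estimate(x, y):
    """Least squares slope S_xy / S_xx."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    return s_xy / s_xx

def simulate_slopes(x, beta0, beta1, sigma, reps, seed):
    """Refit the line on `reps` simulated data sets with the same fixed x values."""
    rng = random.Random(seed)
    slopes = []
    for _ in range(reps):
        y = [beta0 + beta1 * xi + rng.gauss(0.0, sigma) for xi in x]
        slopes.append(slope_estimate(x, y))
    return slopes

x = [float(i) for i in range(1, 11)]      # fixed design points 1, ..., 10
sigma = 2.0                               # hypothetical error standard deviation
slopes = simulate_slopes(x, beta0=1.0, beta1=0.5, sigma=sigma, reps=4000, seed=12)

s_xx = sum((xi - statistics.mean(x)) ** 2 for xi in x)
# Compare the simulated mean and variance with beta1 and sigma^2 / S_xx.
print(statistics.mean(slopes), statistics.variance(slopes), sigma ** 2 / s_xx)
```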

31/61

Sampling Distribution of T

Proof:

32/61

A Confidence Interval for $\beta_1$

An upper or lower bound for $\beta_1$ is:
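A sketch of the two-sided interval, assuming the standard form: slope estimate plus or minus t critical value times s / sqrt(S_xx), with s^2 = SSE / (n - 2) and n - 2 degrees of freedom. The function name, the toy data, and the supplied critical value 3.182 (the 0.025 upper-tail t value for 3 degrees of freedom) are choices made for this illustration.

```python
import math

def slope_confidence_interval(x, y, t_crit):
    """Two-sided CI for the slope; t_crit must match n - 2 degrees of freedom."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(sse / (n - 2))          # estimate of sigma
    se_b1 = s / math.sqrt(s_xx)           # estimated standard error of the slope
    return b1 - t_crit * se_b1, b1 + t_crit * se_b1

# Hypothetical toy data, for illustration only (n = 5, so 3 degrees of freedom).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
lower, upper = slope_confidence_interval(x, y, t_crit=3.182)
print(lower, upper)
```

For a one-sided bound, the same standard error is used with the one-tailed critical value in place of the two-tailed one.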

33/61

Example

Is it possible to predict graduation rates from freshman test scores? Based on the average SAT score of entering freshmen at a university, can we predict the percentage of those freshmen who will get a degree there within six years? We use a random sample of 20 universities from the 248 national universities listed in the 2005 edition of America’s Best Colleges, published by U.S. News & World Report.

34/61


Scatterplot

35/61


36/61


A Closer Look at the Dataset

37/61

Hypothesis-Testing Procedures

38/61

Example

In the previous SAT score example, we want to test:

vs

39/61

Regression and ANOVA

Testing: $H_0: \beta_1 = 0$ vs $H_a: \beta_1 \neq 0$

SSR = SST - SSE

When $H_0$ is true, $F = \frac{SSR/1}{SSE/(n-2)} \sim F_{1, n-2}$.
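In simple linear regression the ANOVA F statistic for testing a zero slope equals the square of the t statistic for the slope, so the two tests agree. The sketch below checks this identity numerically, assuming F = (SSR/1) / (SSE/(n - 2)) and t = slope estimate / (s / sqrt(S_xx)). The data are hypothetical.

```python
import math

# Hypothetical toy data, for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
n = len(x)

x_bar, y_bar = sum(x) / n, sum(y) / n
s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
b1 = s_xy / s_xx
b0 = y_bar - b1 * x_bar

sst = sum((yi - y_bar) ** 2 for yi in y)
sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
ssr = sst - sse

mse = sse / (n - 2)                       # mean square error, n - 2 df
f_stat = (ssr / 1) / mse                  # ANOVA F statistic, 1 and n - 2 df
t_stat = b1 / (math.sqrt(mse) / math.sqrt(s_xx))  # t statistic for a zero slope

print(f_stat, t_stat ** 2)
```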

40/61

Reading SAS Output of Regression Analysis

For the SAT score example:

41/61

Section 12.4: Inferences Concerning $\mu_{Y \cdot x^*}$ and the Prediction of Future Y Values

42/61

Sampling Distribution of Predicted Mean of Y

43/61


Proof:

More on blackboard.

44/61

Inferences of Mean of Y given $x = x^*$

45/61

Example

Refer to the SAT score example.

, .

Let’s now calculate a confidence interval, using a 95% confidence level, for the mean graduation rate for all universities having an average freshman SAT of 1200; that is, a confidence interval for $\mu_{Y \cdot 1200} = \beta_0 + \beta_1(1200)$.

The interval is centered at

46/61

Results of CI

47/61

A Prediction Interval for a Future Value of Y

Mean and Variance

48/61


Sampling Distribution

:

The interpretation of the prediction level is that if the above PI is used repeatedly, in the long run the resulting intervals will actually contain the observed y values 100(1 − α)% of the time.
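To contrast the prediction interval with the confidence interval for the mean, here is a sketch assuming the standard half-widths: t * s * sqrt(1/n + (x* - xbar)^2 / S_xx) for the mean of Y at x*, and t * s * sqrt(1 + 1/n + (x* - xbar)^2 / S_xx) for a single future Y. The function name, data, and critical value 3.182 (3 degrees of freedom) are illustrative choices.

```python
import math

def intervals_at(x, y, x_star, t_crit):
    """Return (center, CI half-width for the mean, PI half-width for a new Y)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(sse / (n - 2))
    center = b0 + b1 * x_star                       # point estimate and prediction
    spread = 1 / n + (x_star - x_bar) ** 2 / s_xx   # grows as x* moves away from xbar
    ci_half = t_crit * s * math.sqrt(spread)
    pi_half = t_crit * s * math.sqrt(1 + spread)    # extra 1 accounts for the new error term
    return center, ci_half, pi_half

# Hypothetical toy data, for illustration only (n = 5, so 3 degrees of freedom).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
center, ci_half, pi_half = intervals_at(x, y, x_star=3.0, t_crit=3.182)
print(center, ci_half, pi_half)
```

Both intervals share the same center, but the prediction interval is always wider because a single future observation carries its own random error in addition to the uncertainty in the estimated line.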

49/61

Example

For the SAT example, let's calculate a 95% prediction interval for a graduation rate that would result from selecting a single university whose average SAT is 1200. Relevant quantities from that example are

The t critical value is 2.101. The 95% prediction interval is:

50/61


Section 12.5 Correlation

51/61

Definition of sample correlation coefficient
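A sketch of the computation, assuming the usual definition r = S_xy / sqrt(S_xx * S_yy). The function name and the small data sets are made up for illustration.

```python
import math

def sample_correlation(x, y):
    """Sample correlation coefficient r = S_xy / sqrt(S_xx * S_yy)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_yy = sum((yi - y_bar) ** 2 for yi in y)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    return s_xy / math.sqrt(s_xx * s_yy)

# Hypothetical toy data, for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
r = sample_correlation(x, y)
print(r, sample_correlation(y, x))   # r does not depend on which variable is called x
```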

52/61

Example

An accurate assessment of soil productivity is critical to rational land-use planning. Unfortunately, as the author of the article “Productivity Ratings Based on Soil Series” (Prof. Geographer, 1980: 158–163) argues, an acceptable soil productivity index is not so easy to come by. One difficulty is that productivity is determined partly by which crop is planted, and the relationship between yield of two different crops planted in the same soil may not be very strong. To illustrate, the article presents the accompanying data on corn yield x and peanut yield y (mT/ha) for eight different types of soil.

53/61


Calculating r

54/61

Properties of r

55/61

Examples of Different r

56/61

Rules of Thumb To State Strength of Linear Relationships

A frequently asked question is, “When can it be said that there is a strong correlation between the variables, and when is the correlation weak?” A reasonable rule of thumb is to say that the correlation is

• weak if 0 ≤ |r| ≤ 0.5,

• strong if 0.8 ≤ |r| ≤ 1, and

• moderate otherwise.

It may surprise you that r = 0.5 is considered weak, but r² = 0.25 implies that in a regression of y on x, only 25% of observed y variation would be explained by the model.
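The rule of thumb can be written as a small helper. This sketch assumes the thresholds apply to the absolute value of r, so that negative correlations are classified too, and that the boundaries 0.5 and 0.8 are inclusive (the text itself calls r = 0.5 weak). The function name is hypothetical.

```python
def correlation_strength(r):
    """Classify a sample correlation by the rule of thumb (applied to |r|)."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("a correlation must lie between -1 and 1")
    size = abs(r)
    if size <= 0.5:
        return "weak"
    if size >= 0.8:
        return "strong"
    return "moderate"

for value in (0.29, 0.5, -0.65, 0.9):
    # r**2 is the share of y variation explained in a regression of y on x.
    print(value, correlation_strength(value), round(value ** 2, 4))
```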

57/61

Correlation Coefficient

r is an estimate (observation) of the population parameter $\rho$. The random variable R is a function of both $X_i$ and $Y_i$.

58/61

Sampling Distribution of R

Assuming that $(X, Y)$ has a bivariate normal distribution.

Proof: The T defined here is equivalent to the T defined in Slide 32, where we have shown that T|X has a $t_{n-2}$ distribution for each X. It follows that the marginal distribution of T is $t_{n-2}$ too.

59/61

Example

Neurotoxic effects of manganese are well known and are usually caused by high occupational exposure over long periods of time. In the fields of occupational hygiene and environmental hygiene, the relationship between lipid peroxidation, which is responsible for deterioration of foods and damage to live tissue, and occupational exposure had not been previously reported. The article “Lipid Peroxidation in Workers Exposed to Manganese” (Scand. J. Work Environ. Health, 1996: 381–386) gave data on x = manganese concentration in blood (ppb) and y = concentration (μmol/L) of malondialdehyde, which is a stable product of lipid peroxidation, both for a sample of 22 workers exposed to manganese and for a control sample of 45 individuals. The value of r = 0.29, from which

The p-value for the two-tailed test = 0.052.
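A sketch of the test statistic behind this p-value, assuming the standard form T = R * sqrt(n - 2) / sqrt(1 - R^2) for testing zero population correlation, and assuming that r = 0.29 refers to the control sample of 45 individuals. That sample size gives a t value close to 2 on 43 degrees of freedom, which is in line with a two-tailed p-value near 0.05. The function name is hypothetical.

```python
import math

def correlation_t_statistic(r, n):
    """t statistic for testing zero population correlation, with n - 2 df."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

# Assumption: r = 0.29 comes from the control sample, n = 45.
t_stat = correlation_t_statistic(0.29, 45)
print(round(t_stat, 3), "on", 45 - 2, "degrees of freedom")
```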

60/61


Further Courses for Regression Analysis:

STAT 344: Applied Regression Analysis

Covers regression on multiple inputs, model checking, etc.

STAT 443: Linear Models

Covers the sampling distributions in a rigorous manner.

61/61