Names: Linear Regress and Correlation Coefficient



Names:______________________________________________ Linear Regress and Correlation Coefficient

The Correlation Coefficient, denoted by the letter r, is a number from -1 to 1 that measures the strength and direction of the correlation between two variables. It is used to measure the “goodness” of the fit line. The mathematical formula for computing r is:

$$r = \frac{n\sum xy - \sum x \sum y}{\sqrt{\left[n\sum x^2 - \left(\sum x\right)^2\right]\left[n\sum y^2 - \left(\sum y\right)^2\right]}}$$

where n is the number of pairs of data. (But don’t worry; we will use a graphing calculator instead.)

The Correlation Coefficient measures the strength and direction of the fit line. A positive r value tells us there is a positive correlation and a negative r value tells us there is a negative correlation. A value of 0 means there is no linear correlation. A line with a perfect fit would have an r value of 1 or -1. Strength is judged by the size of r, ignoring its sign:

• .75 ≤ |r| ≤ 1 indicates a strong fit line
• .25 ≤ |r| < .75 indicates a moderate fit line
• 0 < |r| < .25 indicates a weak fit line

[Figure: scale labeled Strong, No Correlation, Strong]
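For readers who want to see what the calculator is doing, here is a minimal Python sketch of r. It assumes the common sums-based textbook formula, where n is the number of data pairs; the function names `pearson_r` and `describe_fit` and the sample data are hypothetical, chosen only for illustration. The strength labels apply the worksheet's cutoffs to the size of r, ignoring its sign.

```python
import math


def pearson_r(xs, ys):
    """Correlation coefficient r for paired data (sums-based formula)."""
    n = len(xs)  # n is the number of pairs of data
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 pairs")
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    # The denominator is zero if every x (or every y) is the same value;
    # r is undefined in that case and this sketch does not handle it.
    denominator = math.sqrt(
        (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
    )
    return numerator / denominator


def describe_fit(r):
    """Label r using the worksheet's cutoffs on the size of r."""
    size = abs(r)
    if size == 0:
        return "no correlation"
    if size >= 0.75:
        strength = "strong"
    elif size >= 0.25:
        strength = "moderate"
    else:
        strength = "weak"
    direction = "positive" if r > 0 else "negative"
    return f"{strength} {direction}"


# Hypothetical data: hours studied vs. test score
hours = [1, 2, 3, 4, 5]
scores = [52, 60, 71, 79, 88]
r = pearson_r(hours, scores)
print(round(r, 4), describe_fit(r))
```

Separating the calculation from the labeling keeps the cutoffs easy to change if a different textbook uses different boundaries.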

1. Which correlation coefficient would indicate a strong negative relationship between the number of text messages sent and the age of the sender?

a. -0.95 b. -0.82 c. 0.05 d. 0.28

2. The correlation coefficient that models the relationship between the amount of time Jason spends working out and the amount of weight he loses is -0.9547. What is the correct interpretation of this number?

a. The correlation coefficient is close to 0. There is a weak positive correlation between the amount of time spent working out and the amount of weight lost.

b. The correlation coefficient is close to -1. There is a weak negative correlation between the amount of time spent working out and the amount of weight lost.

c. The correlation coefficient is close to -1. The correlation between the amount of time spent working out and the amount of weight lost cannot be determined.

d. The correlation coefficient is close to -1. There is a strong negative correlation between the amount of time spent working out and the amount of weight lost.


3. What type of correlation would you expect between a company’s advertising budget and its volume of sales? Why?

a. 0; relatively no correlation b. The correlation cannot be predicted c. Positive; advertising increases, sales increase. d. Negative; fewer sales, less money for advertising.

4. Victor, Vladimir, Venus, and Vivian each have a different set of data points. Each used the linear regression feature of the graphing calculator to find a linear function that models his/her data. The value of the correlation coefficient (r) associated with Victor’s function was -0.91, the value of r for Vladimir’s function was 0.73, the value of r for Venus’s function was -0.44, and the value of r for Vivian’s function was 0.88. Who has the BEST model for his or her data?

a. Venus b. Victor c. Vivian d. Vladimir

Creating a Residual Plot

A residual is the difference between an observed y-value and the y-value predicted by the fit line. A residual plot is a graph that shows the residuals on the vertical axis and the independent variable on the horizontal axis. A residual plot is used to analyze a line of best fit. Points that are randomly dispersed tightly around the horizontal axis indicate a good linear fit. Points that are random but loosely scattered indicate a weak fit. Points that are non-random and form a U-shape suggest that a non-linear model would be a better fit.
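As a rough illustration, the points of a residual plot can be computed as follows. This is a sketch only: the data and the fit line y = 2x + 1 are made up, and the helper name `residuals` is hypothetical. Each residual is the observed y minus the y predicted by the line.

```python
def residuals(xs, ys, slope, intercept):
    """Residual = observed y minus the y predicted by the line."""
    return [y - (slope * x + intercept) for x, y in zip(xs, ys)]


# Hypothetical data and a hypothetical fit line y = 2x + 1
xs = [1, 2, 3, 4, 5]
ys = [3.2, 4.9, 7.1, 8.8, 11.0]

res = residuals(xs, ys, slope=2, intercept=1)

# Each (x, residual) pair is one point on the residual plot:
# x goes on the horizontal axis, the residual on the vertical axis.
for x, e in zip(xs, res):
    print(f"x = {x}: residual = {e:+.1f}")
```

Residuals that bounce between small positive and negative values with no pattern are what a good linear fit looks like on such a plot.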


Linear Regression

Linear Regression (also called least squares linear regression) is a method for predicting the value of a dependent variable, y, based on the value of an independent variable, x.
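A minimal sketch of least squares regression in Python is shown here for illustration. It uses the standard closed-form formulas for the slope and intercept of the line that minimizes the sum of squared residuals; the function name `least_squares_line` and the sample data are hypothetical.

```python
def least_squares_line(xs, ys):
    """Return (slope, intercept) of the least squares line y = slope*x + intercept."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 pairs")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and sum of squared x deviations
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx  # undefined if all x values are equal
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data: x = hours studied, y = test score
xs = [1, 2, 3, 4, 5]
ys = [52, 60, 71, 79, 88]

slope, intercept = least_squares_line(xs, ys)

# Use the line to predict y for a new x value
x_new = 6
print(f"predicted y at x = {x_new}: {slope * x_new + intercept:.1f}")
```

The line always passes through the point (mean of x, mean of y), which is why the intercept can be found from the slope and the two means.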


Now let’s use the graphing calculator. First, let’s clear any data left in the calculator. Press Y= (on the top left). Delete any lines that are written. Highlight Plot1 (move your cursor over Plot1 and press ENTER).


And enter

(option 9)

• If your calculator does not show r² and r, come see me to have your diagnostics turned on.


Practice questions

1.

2.

3.


4.

a. c.

b. d.


5. The population of a bee hive is recorded monthly over a period of a year. The function y = 46x − 40 is determined to be a good fit for the data. The actual bee population recorded at the 7-month mark was 238. What is the residual for x = 7?

a. -44 b. -40 c. 40 d. 44

6.

a. Perfect correlation b. High correlation c. Low correlation d. No correlation

7.


Correlation vs Causation

Correlation is used in statistics to represent the strength and direction of a linear relationship between two random variables. A scatter plot is a graphical representation of data that can show different types of correlation. Sometimes two events seem directly linked because they are correlated, but in reality the two do not affect each other. Causation is a link between variables in which a change in one variable produces a change in the other variable. A correlation between two variables does not imply causation. It is possible that a common outside factor, called a confounding variable (or confound), produces the relationship indicated by the strong correlation.

For example:

According to this chart, as the number of pirates increases, global warming increases. Concluding from this that pirates cause global warming is clearly false.

Correlation does not necessarily mean causation. Even though there is a positive correlation between the two variables, there is no causation.
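The effect of a common outside factor can be sketched with a small simulation. Everything in it is hypothetical: temperature plays the role of the outside factor, and ice cream sales and sunburn counts are each generated from temperature plus random noise. Neither variable is computed from the other, yet the two come out strongly correlated.

```python
import math
import random


def pearson_r(xs, ys):
    """Correlation coefficient r, computed from deviations about the means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def simulate(n=500, seed=0):
    """Return r between two variables that share a hidden common cause."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    # Hidden outside factor (hypothetical): daily temperature
    temperature = [rng.uniform(60, 100) for _ in range(n)]
    # Both variables depend on temperature, not on each other
    ice_cream_sales = [t + rng.gauss(0, 5) for t in temperature]
    sunburn_cases = [t + rng.gauss(0, 5) for t in temperature]
    return pearson_r(ice_cream_sales, sunburn_cases)


print(f"r between ice cream sales and sunburn cases: {simulate():.2f}")
```

A strong r here says nothing about ice cream causing sunburns; it only reflects that both follow the same outside factor.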


TargetStrategies® © 2008 Evans Newton Incorporated AR04MA1050512-12 Last printed 10/31/08

Name ________________________________________________________________________

Data Correlation

Directions: Read each question. Circle the letter that contains the correct answer to the question or complete the problem in the space provided.

The following graph depicts a class’s results on a test, comparing the number of problems missed with the number of hours studied.

[Graph: Number Missed vs. Hours Studying]

1. The data represents which type of correlation?

A. Positive B. Negative C. Weak D. None

2. Which statement can be concluded about the data?

A. Studying more hours causes fewer missed problems.

B. There is a negative correlation between the number of hours studied and the number of questions missed.

C. Studying causes great test scores.

D. There is a causation but not a correlation between hours studied and the number of questions missed.


A survey of 100 families in each of 22 cities was taken to measure the average summer temperature and the percentage of residents that had pools in their yards. The results are shown below.

[Graph: Homes with Pools (percent) vs. Average Summer Temperature (in °F)]

3. The data represents which type of correlation?

A. Positive B. Negative C. Weak D. Zero

4. Which statement can be concluded about the data?

A. There is a causation between the number of homes with pools and the average summer temperature.

B. There is no correlation between the number of homes with pools and the average summer temperature.

C. Hot temperatures cause people to build more pools.

D. There is a correlation but not necessarily causation between the number of homes with pools and the average summer temperature.

5. Jason concluded that more pools are caused by hotter weather. This is incorrect for what reason?

A. The data shows no correlation.

B. The relationship could be related to outside factors, such as the cost of building pools.

C. There is a causation but no correlation between the variables.

D. The graph shows that higher temperatures cause fewer pools to be built.


At a ten-year reunion, the teachers took a survey of the students who took automotive class their senior year. They compared the students’ current average salary and their grade in auto class during their senior year. The results were plotted on the following graph.

[Graph: Average Salary (thousands of $) vs. Grade in Auto Class (percent)]

6. The data represents which type of correlation?

A. Positive B. Negative C. None D. Can’t be determined

7. Which statement can be concluded about the data?

A. Students with a higher salary had lower grades in auto class.

B. There is no correlation between the average salary and grade in auto class.

C. There is a negative correlation.

D. There is a correlation but not necessarily causation between the average salary and grade in auto class.

8. Which of the following statements about the data is correct?

A. There is a causation but no correlation.

B. There is a correlation but no causation.

C. There is both correlation and causation.

D. There is neither correlation nor causation.


Sam did research on the weight of two newborn babies, one who was fed formula and one who was fed juice. The results are represented on the following graph.

[Graph: Weight of Baby (pounds) vs. Week, with one series for Formula and one for Juice]

9. Describe the relationship between the diet and weight of each baby. Then explain what the graph seems to indicate about the relationship between diet and weight.

10. Explain the meaning of the following expression: “Correlation does not imply causation.” Provide a real-world example to support your explanation.