Post on 03-Jan-2016
Welcome to MM207
Unit 8 Seminar
Dr. R.
Correlation, Regression and Excel!
Correlation: the relationship between two variables, x and y.
• Scatter Plot
– The x variable is on the horizontal axis
– The y variable is on the vertical axis
– The scatter plot shows the location of each (x, y) pair
• Types of Relationships
– Positive: both x and y move in the same direction
– Negative: x and y move in opposite directions
– Zero: no pattern of movement in x and y
Scatter Plot for Example 3 on Page 498
Calculating the Correlation Coefficient
• The correlation coefficient is a numerical measurement that assesses the strength and direction of the relationship between paired data.
• -1 ≤ r ≤ 1, the correlation coefficient is always in this interval
• r is the Pearson Product Moment correlation coefficient
• But I will use Excel, as usual!
– r = (n ∑xy – (∑x)(∑y)) / ( √(n ∑x² – (∑x)²) · √(n ∑y² – (∑y)²) )
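For anyone who would rather check the formula outside Excel, here is a minimal Python sketch of the Pearson correlation coefficient built from the same sums (Σx, Σy, Σxy, Σx², Σy²). The function name `pearson_r` is made up for illustration, and the data are the advertising expenses and company sales from the regression example in this seminar.

```python
import math

# Advertising expenses (x) and company sales (y) from the regression example.
x = [2.4, 1.6, 2.0, 2.6, 1.4, 1.6, 2.0, 2.2]
y = [225, 184, 220, 240, 180, 184, 186, 215]


def pearson_r(xs, ys):
    """Pearson product moment correlation computed from the raw sums."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(a * b for a, b in zip(xs, ys))
    sum_x2 = sum(a * a for a in xs)
    sum_y2 = sum(b * b for b in ys)

    numerator = n * sum_xy - sum_x * sum_y
    denominator = (math.sqrt(n * sum_x2 - sum_x ** 2)
                   * math.sqrt(n * sum_y2 - sum_y ** 2))
    return numerator / denominator


r = pearson_r(x, y)
print(round(r, 3))
```

For this data set the result is about 0.913, a strong positive correlation.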
Using Excel for the Correlation
Compare this to the results shown on page 501
Statistical Significance of the Correlation Coefficient
• Table 11, page A28, gives the critical values for the correlation coefficient to be significant.
• If the absolute value of the calculated r is greater than the critical value in Table 11 for the chosen level (α = 0.05 or α = 0.01), the correlation is significant at that level.
• In our example there are 25 pairs of data, the critical values are 0.396 [α = 0.05] and 0.505 [α = 0.01], and our value is 0.979. Our correlation is statistically significant at the 0.01 level. [page 503]
• Statistical significance means that we can conclude the correlation is not 0, and that is all it means.
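The table lookup can be sketched as a small comparison in Python. This is only an illustration: the function name `significance_level` is hypothetical, the critical values are the ones quoted above for 25 pairs, and the code compares |r| so that negative correlations are handled the same way as positive ones.

```python
def significance_level(r, critical_05, critical_01):
    """Return the strictest level (0.01 or 0.05) at which r is significant.

    Returns None when |r| does not exceed either critical value.
    """
    if abs(r) > critical_01:
        return 0.01
    if abs(r) > critical_05:
        return 0.05
    return None


# Values from the example: n = 25 pairs, r = 0.979.
level = significance_level(0.979, critical_05=0.396, critical_01=0.505)
print(level)
```

The critical values still have to come from the table for the right number of pairs; the code only automates the comparison.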
Linear Regression and Prediction
• Assume that there is a tennis club that will let anyone play tennis but there is a flat fee for using the facility and then a per-hour fee for using the tennis courts.
• Now, this club tells us that the fee to use the club is $10.00 and the per-hour cost is $5.00.
• Can you calculate the cost of playing 2 hours of tennis?
• Can you estimate the cost to play 10 hours of tennis?
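The tennis club pricing is a straight line in disguise, and a short Python sketch makes that visible. The function name `tennis_cost` and its parameter names are invented for this illustration; the $10.00 flat fee and $5.00 per-hour rate come from the example.

```python
def tennis_cost(hours, base_fee=10.00, hourly_rate=5.00):
    """Total cost: a flat fee plus a per-hour charge."""
    return base_fee + hourly_rate * hours


print(tennis_cost(2))   # 20.0
print(tennis_cost(10))  # 60.0
```

The flat fee plays the role of the intercept and the hourly rate plays the role of the slope.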
Linear Regression and Prediction
• Now, to find out how much it will cost to play tennis, all you did was take $10 plus $5 times the number of hours, right?
• I might write this as Cost = rate × (hours) + base cost, which looks a lot like:
– Yhat = mx + b [page 514]
• The formulas to get m and b look like this:
– m = (n ∑xy – (∑x)(∑y)) / (n ∑x² – (∑x)²)
– b = ybar – m * xbar
• But, I will be using Excel.
Using Scatter Plot for the Regression Line
• Right click on a dot
• Select “Format Trendline”
• Type is Linear
• Click Display Equation on chart
• Click Display R-squared on chart
Using Regression for the Regression Line
Using the Regression Line to make a Prediction
• If Old Faithful erupts for 3.32 minutes, how long do we predict it will be before Old Faithful erupts again?
• From either procedure the equation is:
– Y = 12.481 (x) + 33.683
– Y = 12.481 (3.32) + 33.683
– Y = 41.44 + 33.683
– Y = 75.12 minutes
• We predict it will be 75.12 minutes until Old Faithful erupts again after an eruption lasting 3.32 minutes.
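Once the slope and intercept are known, a prediction is just plugging in x. Here is a minimal Python sketch using the Old Faithful equation above; the names `SLOPE`, `INTERCEPT`, and `predict_wait` are invented for illustration.

```python
# Coefficients of the regression line from the Excel output.
SLOPE = 12.481      # minutes of waiting per minute of eruption
INTERCEPT = 33.683  # minutes


def predict_wait(duration_minutes):
    """Predicted minutes until the next eruption, given the eruption duration."""
    return SLOPE * duration_minutes + INTERCEPT


print(round(predict_wait(3.32), 2))  # 75.12
```

The same function can be called with any other duration, though predictions are only reasonable for durations inside the range of the original data.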
Regression by Calculator
• Example 1, Page 514.
• Suppose we want to find a regression line for advertising expenses versus Company sales.
• We have the following data.
Expenses   2.4   1.6   2.0   2.6   1.4   1.6   2.0   2.2
Sales      225   184   220   240   180   184   186   215
Regression Analysis
x      y      x*y      x^2     y^2
2.4    225    540.0    5.76    50,625
1.6    184    294.4    2.56    33,856
2.0    220    440.0    4.00    48,400
2.6    240    624.0    6.76    57,600
1.4    180    252.0    1.96    32,400
1.6    184    294.4    2.56    33,856
2.0    186    372.0    4.00    34,596
2.2    215    473.0    4.84    46,225
Σx = 15.8   Σy = 1634   Σx*y = 3289.8   Σx^2 = 32.44   Σy^2 = 337,558
Regression
• m = (n*Σxy – (Σx)*(Σy)) / (n*Σx^2 – (Σx)^2)
• m = (8*(3289.8) – (15.8)*(1634)) / (8*(32.44) – (15.8)^2)
• m = 50.728745
• b = (Σy – m*Σx) / n
• b = (1634 – (50.728745)*(15.8)) / 8 = 104.06073
• Thus,
• y = (50.728745)*x + 104.06073
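The hand calculation can be double-checked with a short Python sketch that rebuilds the sums from the raw data and applies the same formulas for m and b. The function name `fit_line` is hypothetical; the data are the expenses and sales listed above.

```python
# Advertising expenses (x) and company sales (y) from the example.
x = [2.4, 1.6, 2.0, 2.6, 1.4, 1.6, 2.0, 2.2]
y = [225, 184, 220, 240, 180, 184, 186, 215]


def fit_line(xs, ys):
    """Least-squares slope m and intercept b from the raw sums."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(a * b for a, b in zip(xs, ys))
    sum_x2 = sum(a * a for a in xs)

    m = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    b = (sum_y - m * sum_x) / n
    return m, b


m, b = fit_line(x, y)
print(round(m, 4), round(b, 4))
```

Printing the slope and intercept to a few decimals makes it easy to compare them with the calculator work and with the equation Excel puts on the chart.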
Final Graph with Equation
(Thanks to Freakonomics by Steven D. Levitt http://freakonomics.blogs.nytimes.com)
HAVE A GREAT WEEK, EVERYONE!!