ROHANA BINTI ABDUL HAMID
INSTITUTE FOR ENGINEERING MATHEMATICS (IMK)
UNIVERSITI MALAYSIA PERLIS
CHAPTER 5
INTRODUCTION TO LINEAR REGRESSION
5.1 INTRODUCTION TO REGRESSION
Regression is a statistical procedure for establishing the relationship between two or more variables.
This is done by fitting a linear equation to the observed data.
The researcher then uses the regression line to see the trend in the data and to predict values.
There are 2 types of relationship:
• Simple (2 variables)
• Multiple (more than 2 variables)
THE SIMPLE LINEAR REGRESSION MODEL
The simple linear regression model is an equation that describes a dependent variable (Y) in terms of an independent variable (X) plus a random error ε:
$Y = \beta_0 + \beta_1 X + \varepsilon$
where
$\beta_0$ = intercept of the line with the Y-axis
$\beta_1$ = slope of the line
$\varepsilon$ = random error
The random error ε is the difference between a data point and the deterministic value $\beta_0 + \beta_1 X$.
The regression line is estimated from the collected data by fitting a straight line to the data set, which gives the equation of the estimated line:
$\hat{Y} = \hat{\beta}_0 + \hat{\beta}_1 X$
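The model can be illustrated by generating data from it. The sketch below is only an illustration: the parameter values (intercept 6, slope 2), the Gaussian shape of the random error, and the function name `simulate` are all assumptions, not part of the chapter.

```python
import random


def simulate(x_values, beta0, beta1, sigma, seed=0):
    """Generate (x, y, error) triples with y = beta0 + beta1*x + error.

    The error is drawn from a normal distribution with mean 0 and
    standard deviation sigma (an assumption made for this sketch).
    """
    rng = random.Random(seed)
    data = []
    for x in x_values:
        deterministic = beta0 + beta1 * x   # point on the true line
        error = rng.gauss(0.0, sigma)       # random error epsilon
        data.append((x, deterministic + error, error))
    return data


# Hypothetical parameter values, chosen only for illustration.
sample = simulate(range(-3, 4), beta0=6.0, beta1=2.0, sigma=1.0)
for x, y, error in sample:
    print(f"x={x:2d}  y={y:7.3f}  error={error:7.3f}")
```

Each printed error is the vertical distance between the data point and the deterministic value on the line.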
Example 5.1: (Determine the independent variable, X, and the dependent variable, Y)
1) A nutritionist studying weight loss programs might want to find out whether reducing carbohydrate intake can help a person reduce weight.
a) X is the carbohydrate intake (independent variable).
b) Y is the weight (dependent variable).
2) An entrepreneur might want to know whether increasing the cost of packaging his new product will have an effect on the sales volume.
a) X is the cost of packaging (independent variable).
b) Y is the sales volume (dependent variable).
5.2 SCATTER PLOT
A scatter plot is a graph of ordered pairs (x, y).
The purpose of a scatter plot is to describe visually the nature of the relationship between the independent variable, X, and the dependent variable, Y.
The independent variable, x, is plotted on the horizontal axis and the dependent variable, y, is plotted on the vertical axis.
SCATTER DIAGRAM
[Figure: three scatter diagrams of E(y) against x, each showing the regression line and its intercept β0.]
• Positive linear relationship: slope β1 is positive
• Negative linear relationship: slope β1 is negative
• No relationship: slope β1 is 0
5.3 LINEAR REGRESSION MODEL
A linear regression line can be developed by a freehand plot of the data.
Example 5.2:
The given table contains values for 2 variables, X and Y. Plot the given data and draw a freehand estimated regression line.
X -3 -2 -1 0 1 2 3
Y 1 2 3 5 8 11 12
5.4 LEAST SQUARES METHOD
• The least squares method is commonly used to determine the values of $\hat{\beta}_0$ and $\hat{\beta}_1$ that give the best fit of the estimated regression line to the sample data points.
• The straight line fitted to the data set is the line:
$\hat{Y} = \hat{\beta}_0 + \hat{\beta}_1 X$
y-Intercept for the Estimated Regression Equation:
$\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}$
where $\bar{x}$ is the mean of x and $\bar{y}$ is the mean of y.
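Slope for the Estimated Regression Equation:
$\hat{\beta}_1 = \frac{S_{xy}}{S_{xx}}$
where
$S_{xy} = \sum xy - \frac{(\sum x)(\sum y)}{n}$
$S_{xx} = \sum x^2 - \frac{(\sum x)^2}{n}$
$S_{yy} = \sum y^2 - \frac{(\sum y)^2}{n}$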
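The slope and intercept formulas translate directly into code. The sketch below applies them to the X, Y table given for the freehand plot in Section 5.3, so the freehand line can be compared with the least squares line. The function name `least_squares` is a hypothetical helper, not something defined in the chapter.

```python
def least_squares(xs, ys):
    """Return (beta0_hat, beta1_hat) using the Sxy / Sxx formulas."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    # Sxy = sum(xy) - (sum x)(sum y)/n
    s_xy = sum(x * y for x, y in zip(xs, ys)) - sum_x * sum_y / n
    # Sxx = sum(x^2) - (sum x)^2/n
    s_xx = sum(x * x for x in xs) - sum_x ** 2 / n
    beta1 = s_xy / s_xx                       # slope
    beta0 = sum_y / n - beta1 * (sum_x / n)   # intercept: y_bar - beta1 * x_bar
    return beta0, beta1


# Data from the freehand-plot table in Section 5.3.
xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [1, 2, 3, 5, 8, 11, 12]

beta0, beta1 = least_squares(xs, ys)
print(beta0, beta1)  # 6.0 2.0
```

For this table the fitted line is Y = 6 + 2X, since the sum of x is 0, Sxy = 56 and Sxx = 28.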
• Given any value $x_i$, the predicted value $\hat{y}_i$ of the dependent variable can be found by substituting $x_i$ into the equation
$\hat{Y} = \hat{\beta}_0 + \hat{\beta}_1 X$
Example 5.2:
Suppose we take a sample of seven households from a low to moderate income neighborhood and collect information on their incomes and food expenditures for the past month. The information obtained (in hundreds of ringgit Malaysia) is given below. Find the least squares regression line of food expenditure (Y) on income (X).
Income Food expenditure
35 9
49 15
21 7
39 11
15 5
28 8
25 9
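Solution:
$n = 7, \quad \sum x = 212, \quad \sum y = 64$
$\sum xy = 2150, \quad \sum x^2 = 7222, \quad \sum y^2 = 646$
$\bar{x} = \frac{212}{7} = 30.2857$
$\bar{y} = \frac{64}{7} = 9.1429$
$S_{xy} = \sum xy - \frac{(\sum x)(\sum y)}{n} = 2150 - \frac{(212)(64)}{7} = 211.7143$
$S_{xx} = \sum x^2 - \frac{(\sum x)^2}{n} = 7222 - \frac{(212)^2}{7} = 801.4286$
$\hat{\beta}_1 = \frac{S_{xy}}{S_{xx}} = \frac{211.7143}{801.4286} = 0.2642$
$\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x} = 9.1429 - (0.2642)(30.2857) = 1.1414$
The estimated regression model is
$\hat{Y} = \hat{\beta}_0 + \hat{\beta}_1 X$
$\hat{Y} = 1.1414 + 0.2642X$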
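The hand calculation can be checked with a few lines of code. This is a sketch using the seven income and food expenditure pairs from the example. The `predict` helper and the income value of 30 used at the end are hypothetical, added only to show how the fitted line is used.

```python
# Income (X) and food expenditure (Y), in hundreds of ringgit Malaysia.
incomes = [35, 49, 21, 39, 15, 28, 25]
food = [9, 15, 7, 11, 5, 8, 9]

n = len(incomes)
sum_x, sum_y = sum(incomes), sum(food)
sum_xy = sum(x * y for x, y in zip(incomes, food))
sum_x2 = sum(x * x for x in incomes)

s_xy = sum_xy - sum_x * sum_y / n
s_xx = sum_x2 - sum_x ** 2 / n

beta1 = s_xy / s_xx                      # estimated slope
beta0 = sum_y / n - beta1 * sum_x / n    # estimated intercept

print(f"n={n}, sum x={sum_x}, sum y={sum_y}, sum xy={sum_xy}, sum x^2={sum_x2}")
print(f"Sxy={s_xy:.4f}, Sxx={s_xx:.4f}")
print(f"slope={beta1:.4f}, intercept={beta0:.4f}")


def predict(income):
    """Predicted food expenditure for a given income (hypothetical helper)."""
    return beta0 + beta1 * income


# Hypothetical household with an income of 30 (hundreds of ringgit).
print(f"predicted food expenditure: {predict(30):.4f}")
```

Carrying full precision gives an intercept near 1.1422. The value 1.1414 in the solution comes from substituting the slope already rounded to 0.2642.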
5.5 INFERENCES OF ESTIMATED PARAMETERS
Simple linear regression involves two estimated parameters, β0 and β1.
A test of hypothesis is used to determine whether the independent variable has a significant relationship with the dependent variable.
The analysis of variance (ANOVA) method is one approach to testing the significance of the regression.
ANOVA table for Example 5.2
Source of variation | Sum of squares | Degrees of freedom | Mean square | f test
Regression | SSR = 0.2642(211.7143) = 55.9349 | 1 | MSR = 55.9349 | f = MSR/MSE = 55.9349/0.9844 = 56.8213
Error | SSE = 60.8571 - 55.9349 = 4.9222 | 7 - 2 = 5 | MSE = 4.9222/5 = 0.9844 |
Total | SST = 60.8571 | 7 - 1 = 6 | |
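The entries of the ANOVA table can be computed from the same sums used for the regression line. The sketch below assumes the relations implied by the table: SSR is the slope times Sxy, SST is Syy, and SSE is their difference. The function name `anova_simple_regression` is a hypothetical helper.

```python
def anova_simple_regression(xs, ys):
    """Return the ANOVA quantities for a simple linear regression as a dict."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    s_xy = sum(x * y for x, y in zip(xs, ys)) - sum_x * sum_y / n
    s_xx = sum(x * x for x in xs) - sum_x ** 2 / n
    s_yy = sum(y * y for y in ys) - sum_y ** 2 / n
    beta1 = s_xy / s_xx
    ssr = beta1 * s_xy       # regression sum of squares, 1 degree of freedom
    sst = s_yy               # total sum of squares, n - 1 degrees of freedom
    sse = sst - ssr          # error sum of squares, n - 2 degrees of freedom
    msr = ssr / 1
    mse = sse / (n - 2)
    return {"SSR": ssr, "SSE": sse, "SST": sst,
            "MSR": msr, "MSE": mse, "f": msr / mse}


# Household data (income, food expenditure) from the worked example.
incomes = [35, 49, 21, 39, 15, 28, 25]
food = [9, 15, 7, 11, 5, 8, 9]

table = anova_simple_regression(incomes, food)
for name, value in table.items():
    print(f"{name} = {value:.4f}")
```

The printed values differ slightly from the table in the later decimal places because the table uses the slope rounded to 0.2642.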
5.6 TEST OF SIGNIFICANCE
To determine whether X provides information in predicting Y, we proceed with testing the hypothesis.
Two tests are commonly used:
• t Test
• F Test
Each test follows the same four steps:
1. Determine the hypothesis
2. Determine the rejection region
3. Compute the test statistic
4. Conclusion
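t Test
1. Determine the hypothesis
$H_0: \beta_1 = 0$
$H_1: \beta_1 \neq 0$
2. Determine the rejection region
We reject $H_0$ if $t < -t_{\alpha/2,\, n-2}$ or $t > t_{\alpha/2,\, n-2}$
3. Compute the test statistic
$t = \frac{\hat{\beta}_1}{\sqrt{Var(\hat{\beta}_1)}}$
$Var(\hat{\beta}_1) = \frac{1}{n-2} \cdot \frac{S_{yy} - \hat{\beta}_1 S_{xy}}{S_{xx}}$
4. Conclusion
If we reject $H_0$, there is a significant relationship between variables X and Y.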
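The four steps of the t test can be sketched in code for the household data. Two things here are assumptions: the significance level α = 0.05 (the chapter does not fix one for this data set), and the critical value 2.571, which is the t-table entry for α/2 = 0.025 with 5 degrees of freedom, typed in by hand rather than computed.

```python
import math

# Household data (income, food expenditure) from the worked example.
incomes = [35, 49, 21, 39, 15, 28, 25]
food = [9, 15, 7, 11, 5, 8, 9]

n = len(incomes)
sum_x, sum_y = sum(incomes), sum(food)
s_xy = sum(x * y for x, y in zip(incomes, food)) - sum_x * sum_y / n
s_xx = sum(x * x for x in incomes) - sum_x ** 2 / n
s_yy = sum(y * y for y in food) - sum_y ** 2 / n

beta1 = s_xy / s_xx

# Step 3: test statistic t = beta1_hat / sqrt(Var(beta1_hat)).
var_beta1 = (s_yy - beta1 * s_xy) / ((n - 2) * s_xx)
t_stat = beta1 / math.sqrt(var_beta1)

# Step 2: rejection region for alpha = 0.05 (assumed), n - 2 = 5 degrees of freedom.
T_CRITICAL = 2.571  # t-table value for alpha/2 = 0.025, 5 d.f.

# Step 4: conclusion.
reject_h0 = abs(t_stat) > T_CRITICAL
print(f"t = {t_stat:.4f}, critical value = {T_CRITICAL}")
print("Reject H0" if reject_h0 else "Do not reject H0")
```

With these data the statistic lies well beyond the critical value, so H0 is rejected under the assumed α.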
F Test
1. Determine the hypothesis
$H_0: \beta_1 = 0$
$H_1: \beta_1 \neq 0$
2. Determine the rejection region
We reject $H_0$ if $f_{test} > F_{\alpha,\, 1,\, n-2}$
3. Compute the test statistic
$f_{test} = \frac{MSR}{MSE}$
4. Conclusion
If we reject $H_0$, there is a significant relationship between variables X and Y.
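A matching sketch for the F test on the household data follows. As with any table lookup, the critical value is an assumption supplied by hand: 6.61 is the F-table entry for α = 0.05 with 1 and 5 degrees of freedom, and α = 0.05 itself is assumed. The function name `f_statistic` is hypothetical.

```python
def f_statistic(xs, ys):
    """Return f = MSR / MSE for a simple linear regression of ys on xs."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    s_xy = sum(x * y for x, y in zip(xs, ys)) - sum_x * sum_y / n
    s_xx = sum(x * x for x in xs) - sum_x ** 2 / n
    s_yy = sum(y * y for y in ys) - sum_y ** 2 / n
    ssr = (s_xy / s_xx) * s_xy      # regression sum of squares
    sse = s_yy - ssr                # error sum of squares
    msr = ssr / 1                   # 1 degree of freedom
    mse = sse / (n - 2)             # n - 2 degrees of freedom
    return msr / mse


incomes = [35, 49, 21, 39, 15, 28, 25]
food = [9, 15, 7, 11, 5, 8, 9]

F_CRITICAL = 6.61  # F-table value for alpha = 0.05 (assumed), d.f. = (1, 5)

f_test = f_statistic(incomes, food)
reject_h0 = f_test > F_CRITICAL
print(f"f = {f_test:.4f}, critical value = {F_CRITICAL}")
print("Reject H0" if reject_h0 else "Do not reject H0")
```

In simple linear regression the F statistic equals the square of the t statistic, so both tests lead to the same conclusion.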
5.7 CORRELATION (r)
Correlation measures the strength of a linear relationship between two variables.
It is also known as Pearson's product moment coefficient of correlation.
The symbol for the sample coefficient of correlation is r.
Formula:
$r = \frac{S_{xy}}{\sqrt{S_{xx} S_{yy}}}$
Values of r: $-1 \leq r \leq 1$
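As a sketch, the formula for r can be evaluated for the household income and food expenditure data used earlier in the chapter. The function name `correlation` is a hypothetical helper.

```python
import math


def correlation(xs, ys):
    """Sample coefficient of correlation r = Sxy / sqrt(Sxx * Syy)."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    s_xy = sum(x * y for x, y in zip(xs, ys)) - sum_x * sum_y / n
    s_xx = sum(x * x for x in xs) - sum_x ** 2 / n
    s_yy = sum(y * y for y in ys) - sum_y ** 2 / n
    return s_xy / math.sqrt(s_xx * s_yy)


incomes = [35, 49, 21, 39, 15, 28, 25]
food = [9, 15, 7, 11, 5, 8, 9]

r = correlation(incomes, food)
print(f"r = {r:.4f}")
```

The result is close to +1, which indicates a strong positive linear relationship between income and food expenditure in this sample.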
5.8 COEFFICIENT OF DETERMINATION ($r^2$)
The coefficient of determination measures the proportion of the variation in the dependent variable (Y) that is explained by the regression line and the independent variable (X).
If r = 0.90, then $r^2$ = 0.81. This means that 81% of the variation in the dependent variable (Y) is accounted for by the variation in the independent variable (X).
The rest of the variation, 0.19 or 19%, is unexplained; this proportion is called the coefficient of nondetermination.
The formula for the coefficient of nondetermination is $1 - r^2$.
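A short sketch for the household data shows the coefficient of determination computed in two ways: by squaring r, and as the ratio SSR/SST from the ANOVA quantities. The two agree in simple linear regression. The variable names are illustrative only.

```python
# Household data (income, food expenditure) from the worked example.
incomes = [35, 49, 21, 39, 15, 28, 25]
food = [9, 15, 7, 11, 5, 8, 9]

n = len(incomes)
sum_x, sum_y = sum(incomes), sum(food)
s_xy = sum(x * y for x, y in zip(incomes, food)) - sum_x * sum_y / n
s_xx = sum(x * x for x in incomes) - sum_x ** 2 / n
s_yy = sum(y * y for y in food) - sum_y ** 2 / n

# Route 1: square the correlation coefficient, r^2 = Sxy^2 / (Sxx * Syy).
r_squared = s_xy ** 2 / (s_xx * s_yy)

# Route 2: explained variation over total variation, SSR / SST.
ssr = (s_xy / s_xx) * s_xy
sst = s_yy
explained_ratio = ssr / sst

nondetermination = 1 - r_squared   # unexplained proportion

print(f"r^2 = {r_squared:.4f}  (SSR/SST = {explained_ratio:.4f})")
print(f"coefficient of nondetermination = {nondetermination:.4f}")
```

Roughly 92% of the variation in food expenditure is explained by income in this sample, leaving about 8% unexplained.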
Exercise 1
The following table lists the midterm (X) and final exam (Y) scores of seven students in a statistics class.
1. Find the least squares regression line.
2. Calculate r and $r^2$, and explain the values.
3. Predict the final exam score of a student who got 60 marks on the midterm test.
4. Construct the ANOVA table. Do the data support the existence of a linear relationship between the midterm and final exam scores? Test using α = 0.05.
X 79 95 81 66 87 94 59
Y 85 97 78 76 94 84 67