Regression Analysis
Uploaded by aileen-rodriguez
![Page 2: Regression Analysis](https://reader030.fdocuments.net/reader030/viewer/2022032414/5681330e550346895d99cb7f/html5/thumbnails/2.jpg)
Regression
To express the relationship between two or more variables by a mathematical formula.
x: predictor (independent) variable
y: response (dependent) variable
Identify how y varies as a function of x.
y is also considered as a random variable.
Real-World Example:
Footwear impressions are commonly observed at crime scenes.
While there are numerous forensic properties that can be obtained
from these impressions, one in particular is the shoe size. The
detectives would like to be able to estimate the height of the
impression maker from the shoe size.
The relationship between shoe sizes and heights
![Page 3: Regression Analysis](https://reader030.fdocuments.net/reader030/viewer/2022032414/5681330e550346895d99cb7f/html5/thumbnails/3.jpg)
Shoe Size vs. Height
![Page 4: Regression Analysis](https://reader030.fdocuments.net/reader030/viewer/2022032414/5681330e550346895d99cb7f/html5/thumbnails/4.jpg)
Shoe Size vs. Height
What is the predictor?
What is the response?
Can the height be accurately estimated from the shoe size?
If a shoe size is 11, what would you advise the police?
What if the size is 7 or 12.5?
![Page 5: Regression Analysis](https://reader030.fdocuments.net/reader030/viewer/2022032414/5681330e550346895d99cb7f/html5/thumbnails/5.jpg)
General Regression Model
The systematic part m(x) is deterministic.
The error ε(x) is a random variable.
Measurement Error
Natural Variations
Additive
$$y(x) = m(x) + \varepsilon(x)$$
![Page 6: Regression Analysis](https://reader030.fdocuments.net/reader030/viewer/2022032414/5681330e550346895d99cb7f/html5/thumbnails/6.jpg)
Example: Sin Function
$$y(x) = A\sin(\omega x + \phi) + \varepsilon(x)$$
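The additive model can be simulated to see the deterministic and random parts separately. The following is a minimal Python sketch, assuming a sinusoidal systematic part and Gaussian noise. The amplitude, frequency, phase, noise level, and the function name `simulate_noisy_sine` are illustrative choices and do not come from the slides.

```python
import numpy as np

def simulate_noisy_sine(n=200, amplitude=2.0, omega=1.5, phase=0.3,
                        noise_sd=0.5, seed=0):
    """Draw n samples of y(x) = A*sin(omega*x + phase) + eps(x).

    All default parameter values are assumptions made for illustration.
    """
    rng = np.random.default_rng(seed)
    x = np.linspace(0.0, 2.0 * np.pi, n)
    m = amplitude * np.sin(omega * x + phase)  # systematic part m(x): deterministic
    eps = rng.normal(0.0, noise_sd, size=n)    # error eps(x): random and additive
    return x, m, m + eps

x, m, y = simulate_noisy_sine()
residual = y - m  # recovers the simulated error exactly, because it is additive
print("mean of error:", residual.mean())
print("std of error: ", residual.std())
```

In real data only `x` and `y` are observed; regression tries to recover `m` without knowing the error.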
![Page 7: Regression Analysis](https://reader030.fdocuments.net/reader030/viewer/2022032414/5681330e550346895d99cb7f/html5/thumbnails/7.jpg)
Standard Assumptions
![Page 8: Regression Analysis](https://reader030.fdocuments.net/reader030/viewer/2022032414/5681330e550346895d99cb7f/html5/thumbnails/8.jpg)
A1
![Page 9: Regression Analysis](https://reader030.fdocuments.net/reader030/viewer/2022032414/5681330e550346895d99cb7f/html5/thumbnails/9.jpg)
A2
![Page 10: Regression Analysis](https://reader030.fdocuments.net/reader030/viewer/2022032414/5681330e550346895d99cb7f/html5/thumbnails/10.jpg)
A3
![Page 11: Regression Analysis](https://reader030.fdocuments.net/reader030/viewer/2022032414/5681330e550346895d99cb7f/html5/thumbnails/11.jpg)
Back to Shoes
![Page 12: Regression Analysis](https://reader030.fdocuments.net/reader030/viewer/2022032414/5681330e550346895d99cb7f/html5/thumbnails/12.jpg)
Simple Linear Regression
$$m(x) = \beta_0 + \beta_1 x$$
![Page 13: Regression Analysis](https://reader030.fdocuments.net/reader030/viewer/2022032414/5681330e550346895d99cb7f/html5/thumbnails/13.jpg)
Model Parameters
![Page 14: Regression Analysis](https://reader030.fdocuments.net/reader030/viewer/2022032414/5681330e550346895d99cb7f/html5/thumbnails/14.jpg)
Derivation
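$$R(\beta_0, \beta_1) = \sum_{i=1}^{n} \left( y_i - \beta_0 - \beta_1 x_i \right)^2$$

$$\frac{\partial R}{\partial \beta_0} = 0 \;\Rightarrow\; -2 \sum_{i=1}^{n} \left( y_i - \beta_0 - \beta_1 x_i \right) = 0 \;\Rightarrow\; \beta_0 = \bar{y} - \beta_1 \bar{x}$$

$$\frac{\partial R}{\partial \beta_1} = 0 \;\Rightarrow\; -2 \sum_{i=1}^{n} x_i \left( y_i - \beta_0 - \beta_1 x_i \right) = 0$$

$$\Rightarrow\; \sum_{i=1}^{n} \left( x_i y_i - \bar{y} x_i + \beta_1 \bar{x} x_i - \beta_1 x_i^2 \right) = 0$$

$$\Rightarrow\; \beta_1 = \frac{\sum_{i=1}^{n} x_i y_i - n \bar{x} \bar{y}}{\sum_{i=1}^{n} x_i^2 - n \bar{x}^2}$$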
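The least-squares solution for the line, slope = (Σ x_i y_i − n x̄ ȳ) / (Σ x_i² − n x̄²) and intercept = ȳ − slope · x̄, translates directly into a few lines of code. This is a minimal Python sketch; the function name `fit_simple_linear` and the toy data are hypothetical and chosen only so the result can be checked by hand.

```python
import numpy as np

def fit_simple_linear(x, y):
    """Closed-form least-squares estimates (beta0, beta1) for y = beta0 + beta1*x."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.size
    x_bar, y_bar = x.mean(), y.mean()
    # Slope: (sum x_i*y_i - n*x_bar*y_bar) / (sum x_i^2 - n*x_bar^2)
    beta1 = (np.sum(x * y) - n * x_bar * y_bar) / (np.sum(x**2) - n * x_bar**2)
    # Intercept: y_bar - beta1*x_bar
    beta0 = y_bar - beta1 * x_bar
    return beta0, beta1

# Toy data lying exactly on the line y = 1 + 2x (made up for illustration).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = 1.0 + 2.0 * x
b0, b1 = fit_simple_linear(x, y)
print(b0, b1)  # intercept 1 and slope 2, up to floating-point rounding
```

Because the toy data contain no noise, the fit recovers the generating line; with noisy data the estimates scatter around the true values.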
![Page 15: Regression Analysis](https://reader030.fdocuments.net/reader030/viewer/2022032414/5681330e550346895d99cb7f/html5/thumbnails/15.jpg)
Standard Deviations
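$$\sigma^2 = \frac{1}{n-2} \sum_{i=1}^{n} \varepsilon_i^2$$

$$\sigma_{\beta_0} = \sigma \left[ \frac{1}{n} + \frac{\bar{x}^2}{\sum_{i=1}^{n} x_i^2 - n \bar{x}^2} \right]^{1/2}$$

$$\sigma_{\beta_1} = \sigma \left[ \frac{1}{\sum_{i=1}^{n} x_i^2 - n \bar{x}^2} \right]^{1/2}$$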
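The standard deviations of the intercept and slope can be computed from the residuals of the fitted line. This is a minimal Python sketch, assuming the usual estimate of the error variance with n − 2 degrees of freedom; the function name `simple_linear_standard_errors` and the synthetic data are hypothetical.

```python
import numpy as np

def simple_linear_standard_errors(x, y):
    """Return (sigma, se_beta0, se_beta1) for the least-squares line through (x, y)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.size
    x_bar, y_bar = x.mean(), y.mean()
    sxx = np.sum(x**2) - n * x_bar**2            # sum x_i^2 - n*x_bar^2
    beta1 = (np.sum(x * y) - n * x_bar * y_bar) / sxx
    beta0 = y_bar - beta1 * x_bar
    resid = y - (beta0 + beta1 * x)              # estimated errors
    sigma = np.sqrt(np.sum(resid**2) / (n - 2))  # two parameters were estimated
    se_beta0 = sigma * np.sqrt(1.0 / n + x_bar**2 / sxx)
    se_beta1 = sigma * np.sqrt(1.0 / sxx)
    return sigma, se_beta0, se_beta1

# Synthetic data (an assumption for illustration): y = 2 + 0.5x + N(0, 1).
rng = np.random.default_rng(0)
x = np.linspace(0.0, 10.0, 30)
y = 2.0 + 0.5 * x + rng.normal(0.0, 1.0, x.size)
sigma, se0, se1 = simple_linear_standard_errors(x, y)
print(sigma, se0, se1)
```

The slope's standard deviation shrinks as the x values spread out, which is why a wide range of predictor values gives a more reliable slope.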
![Page 16: Regression Analysis](https://reader030.fdocuments.net/reader030/viewer/2022032414/5681330e550346895d99cb7f/html5/thumbnails/16.jpg)
Polynomial Terms
Modeling the data as a line is not always adequate.
Polynomial Regression
This is still a linear model!
m(x) is linear in the coefficients β, even though it is nonlinear in x.
Danger of Overfitting
$$m(x) = \beta_0 + \beta_1 x + \dots + \beta_p x^p = \sum_{k=0}^{p} \beta_k x^k$$
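Since a polynomial model is linear in its coefficients, fitting it is an ordinary least-squares problem on the columns 1, x, x², ..., x^p. The sketch below is a hedged Python illustration, not the slides' own code: the helper names, the synthetic quadratic data, and the chosen degrees are assumptions. It also shows one face of overfitting: the training error never increases when the degree is raised, so a low training error alone does not indicate a good model.

```python
import numpy as np

def design_matrix(x, p):
    """Columns 1, x, x^2, ..., x^p, one row per observation."""
    return np.vander(np.asarray(x, dtype=float), p + 1, increasing=True)

def fit_polynomial(x, y, p):
    """Least-squares coefficients beta_0..beta_p (linear in beta, nonlinear in x)."""
    beta, *_ = np.linalg.lstsq(design_matrix(x, p), y, rcond=None)
    return beta

def training_sse(x, y, beta):
    """Sum of squared errors on the data used for fitting."""
    fitted = design_matrix(x, len(beta) - 1) @ beta
    return float(np.sum((y - fitted) ** 2))

# Synthetic data (assumed for illustration): quadratic trend plus unit noise.
rng = np.random.default_rng(1)
x = np.linspace(0.0, 5.0, 15)
y = x**2 + rng.normal(0.0, 1.0, x.size)

sse_by_degree = {p: training_sse(x, y, fit_polynomial(x, y, p)) for p in (1, 2, 9)}
for p, sse in sse_by_degree.items():
    print(f"degree {p}: training SSE = {sse:.3f}")
```

The degree-9 fit achieves the smallest training error here only because it has enough freedom to chase the noise.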
![Page 17: Regression Analysis](https://reader030.fdocuments.net/reader030/viewer/2022032414/5681330e550346895d99cb7f/html5/thumbnails/17.jpg)
Matrix Representation
$$y_i = \sum_{k=0}^{p} \beta_k x_i^k + \varepsilon_i$$

$$Y = X\beta + \varepsilon$$
![Page 18: Regression Analysis](https://reader030.fdocuments.net/reader030/viewer/2022032414/5681330e550346895d99cb7f/html5/thumbnails/18.jpg)
Matrix Representation
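$$R(\beta) = (Y - X\beta)^T (Y - X\beta) = Y^T Y - Y^T X \beta - \beta^T X^T Y + \beta^T X^T X \beta$$

$$\frac{\partial R}{\partial \beta} = 0 \;\Rightarrow\; X^T X \beta = X^T Y$$

$$\beta = \left( X^T X \right)^{-1} X^T Y$$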
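In matrix form, the least-squares coefficients solve the normal equations XᵀXβ = XᵀY. The following is a minimal Python sketch under stated assumptions: the function name `solve_normal_equations`, the quadratic design matrix, and the "true" coefficients are made up for illustration. It solves the linear system directly instead of forming the inverse explicitly, which is usually the numerically safer choice.

```python
import numpy as np

def solve_normal_equations(X, Y):
    """Solve (X^T X) beta = X^T Y for beta."""
    return np.linalg.solve(X.T @ X, X.T @ Y)

# Synthetic quadratic data (assumed): columns of X are 1, x, x^2.
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 5.0, 50)
X = np.column_stack([np.ones_like(x), x, x**2])
beta_true = np.array([0.5, -1.0, 2.0])
Y = X @ beta_true + rng.normal(0.0, 0.1, x.size)

beta_hat = solve_normal_equations(X, Y)
# Cross-check against NumPy's general least-squares solver.
beta_lstsq, *_ = np.linalg.lstsq(X, Y, rcond=None)
print(beta_hat)
print(beta_lstsq)
```

For ill-conditioned design matrices, such as high-degree polynomials, a solver like `np.linalg.lstsq` is generally preferable to the normal equations.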
![Page 19: Regression Analysis](https://reader030.fdocuments.net/reader030/viewer/2022032414/5681330e550346895d99cb7f/html5/thumbnails/19.jpg)
Model Comparison
Sum of Squares Total:

$$SST = \sum_{i=1}^{n} \left( y_i - \bar{y} \right)^2$$

Sum of Squares Error:

$$SSE = \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2$$
![Page 20: Regression Analysis](https://reader030.fdocuments.net/reader030/viewer/2022032414/5681330e550346895d99cb7f/html5/thumbnails/20.jpg)
R²

$$R^2 = \frac{SST - SSE}{SST} = 1 - \frac{SSE}{SST}$$

$$R^2_{adj} = 1 - \frac{SSE / \left( n - (p + 1) \right)}{SST / (n - 1)}$$
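Both goodness-of-fit measures follow directly from the two sums of squares: R² = 1 − SSE/SST, and the adjusted version divides SSE by n − (p + 1) and SST by n − 1. This is a small Python sketch; the function name `r_squared` and the five observations with their predictions are hypothetical numbers chosen so the arithmetic is easy to follow.

```python
import numpy as np

def r_squared(y, y_hat, p):
    """Return (R^2, adjusted R^2) for predictions y_hat from a model with
    p predictor terms plus an intercept."""
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    n = y.size
    sst = np.sum((y - y.mean()) ** 2)   # total sum of squares
    sse = np.sum((y - y_hat) ** 2)      # error sum of squares
    r2 = 1.0 - sse / sst
    r2_adj = 1.0 - (sse / (n - (p + 1))) / (sst / (n - 1))
    return r2, r2_adj

# Made-up observations and predictions: SST = 10 and SSE = 0.1.
y = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y_hat = np.array([1.1, 1.9, 3.2, 3.8, 5.0])
r2, r2_adj = r_squared(y, y_hat, p=1)
print(r2, r2_adj)  # R^2 = 0.99; adjusted R^2 is slightly lower
```

The adjusted value penalizes extra parameters, so it can decrease when a term that barely lowers SSE is added.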
![Page 21: Regression Analysis](https://reader030.fdocuments.net/reader030/viewer/2022032414/5681330e550346895d99cb7f/html5/thumbnails/21.jpg)
Example
[Figure: scatter plot of Y against X for X from 0 to 5, with data generated as Y = X² + N(0,1), and two fitted curves]
Linear fit: Y = -3.6029 + 4.8802X, R² = 0.9131
Quadratic fit: Y = 0.7341 - 0.4303X + 1.0621X², R² = 0.9880
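An experiment of this kind can be repeated in a few lines. The sketch below is a hedged Python illustration, assuming 50 evenly spaced points on [0, 5] and an arbitrary random seed; neither is stated on the slide, so the fitted coefficients and R² values will differ from the numbers shown there.

```python
import numpy as np
from numpy.polynomial import polynomial as P

def fit_and_score(x, y, degree):
    """Fit a polynomial of the given degree; return (coefficients, R^2)."""
    coefs = P.polyfit(x, y, degree)     # coefficients ordered beta_0, beta_1, ...
    y_hat = P.polyval(x, coefs)
    sse = np.sum((y - y_hat) ** 2)
    sst = np.sum((y - y.mean()) ** 2)
    return coefs, 1.0 - sse / sst

# Data generated as Y = X^2 + N(0, 1); sample size and seed are assumptions.
rng = np.random.default_rng(7)
x = np.linspace(0.0, 5.0, 50)
y = x**2 + rng.normal(0.0, 1.0, x.size)

coefs_lin, r2_lin = fit_and_score(x, y, 1)
coefs_quad, r2_quad = fit_and_score(x, y, 2)
print("linear   :", coefs_lin, "R^2 =", r2_lin)
print("quadratic:", coefs_quad, "R^2 =", r2_quad)
```

The quadratic fit cannot have a lower R² than the linear one on the same data, because the line is a special case of the quadratic with the x² coefficient set to zero.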
![Page 22: Regression Analysis](https://reader030.fdocuments.net/reader030/viewer/2022032414/5681330e550346895d99cb7f/html5/thumbnails/22.jpg)
Tricky Relationship
[Figure: Fitness plotted against Exercise Time, with separate groups for Youth and Elderly]
![Page 23: Regression Analysis](https://reader030.fdocuments.net/reader030/viewer/2022032414/5681330e550346895d99cb7f/html5/thumbnails/23.jpg)
Violent Crime vs. Video Game
[Figure: yearly trends, 2000 to 2012, for Violent Crime, Aggravated Assault, Robbery, Murder & Manslaughter, Forcible Rape, and Video Game Sales]
![Page 24: Regression Analysis](https://reader030.fdocuments.net/reader030/viewer/2022032414/5681330e550346895d99cb7f/html5/thumbnails/24.jpg)
Is this true?
![Page 25: Regression Analysis](https://reader030.fdocuments.net/reader030/viewer/2022032414/5681330e550346895d99cb7f/html5/thumbnails/25.jpg)
Where did the time go?
![Page 26: Regression Analysis](https://reader030.fdocuments.net/reader030/viewer/2022032414/5681330e550346895d99cb7f/html5/thumbnails/26.jpg)
![Page 27: Regression Analysis](https://reader030.fdocuments.net/reader030/viewer/2022032414/5681330e550346895d99cb7f/html5/thumbnails/27.jpg)
Summary
Regression is one of the oldest data mining techniques.
Probably the first thing that you want to try on a new data set.
No need to do programming!
Matlab, Excel …
Quality of Regression
R²
Residual Plot
Cross Validation
What you should learn after class:
The Influence of Outliers
Confidence Interval
Nonlinear Regression
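Cross validation, listed above as a quality check, estimates how well a model predicts data it was not fitted on. The following is a minimal k-fold sketch in Python, assuming polynomial models and synthetic quadratic data; the function name `kfold_mse`, the number of folds, and the candidate degrees are illustrative choices, not part of the slides.

```python
import numpy as np
from numpy.polynomial import polynomial as P

def kfold_mse(x, y, degree, k=5, seed=0):
    """Mean squared prediction error of a polynomial fit, averaged over k folds."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(x.size)            # shuffle before splitting
    errors = []
    for fold in np.array_split(idx, k):
        train = np.setdiff1d(idx, fold)      # every index not in this fold
        coefs = P.polyfit(x[train], y[train], degree)
        pred = P.polyval(x[fold], coefs)     # predict the held-out points
        errors.append(np.mean((y[fold] - pred) ** 2))
    return float(np.mean(errors))

# Synthetic data (assumed for illustration): Y = X^2 + N(0, 1).
rng = np.random.default_rng(42)
x = np.linspace(0.0, 5.0, 60)
y = x**2 + rng.normal(0.0, 1.0, x.size)

cv_mse = {d: kfold_mse(x, y, d) for d in (1, 2, 6)}
best_degree = min(cv_mse, key=cv_mse.get)
for d, mse in cv_mse.items():
    print(f"degree {d}: cross-validated MSE = {mse:.3f}")
print("degree with lowest cross-validated error:", best_degree)
```

Unlike R² on the training data, the cross-validated error can rise when the model becomes too flexible, which makes it a practical guard against overfitting.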