Lecture 1 Intro to Spatial and Temporal Data · Lecture 1 Intro to Spatial and Temporal Data...
Transcript of Lecture 1 Intro to Spatial and Temporal Data · Lecture 1 Intro to Spatial and Temporal Data...
![Page 1: Lecture 1 Intro to Spatial and Temporal Data · Lecture 1 Intro to Spatial and Temporal Data Author: Dennis Sun Stanford University Stats 253 Created Date: 6/22/2015 5:03:30 PM ...](https://reader030.fdocuments.net/reader030/viewer/2022041117/5f2d380a3f672b0c381ccb5c/html5/thumbnails/1.jpg)
Lecture 1Intro to Spatial and Temporal Data
Dennis SunStanford University
Stats 253
June 22, 2015
![Page 2: Lecture 1 Intro to Spatial and Temporal Data · Lecture 1 Intro to Spatial and Temporal Data Author: Dennis Sun Stanford University Stats 253 Created Date: 6/22/2015 5:03:30 PM ...](https://reader030.fdocuments.net/reader030/viewer/2022041117/5f2d380a3f672b0c381ccb5c/html5/thumbnails/2.jpg)
1 What is Spatial and Temporal Data?
2 TrendModeling
3 Omitted Variables
4 Overview of this Class
![Page 3: Lecture 1 Intro to Spatial and Temporal Data · Lecture 1 Intro to Spatial and Temporal Data Author: Dennis Sun Stanford University Stats 253 Created Date: 6/22/2015 5:03:30 PM ...](https://reader030.fdocuments.net/reader030/viewer/2022041117/5f2d380a3f672b0c381ccb5c/html5/thumbnails/3.jpg)
1 What is Spatial and Temporal Data?
2 TrendModeling
3 Omitted Variables
4 Overview of this Class
![Page 4: Lecture 1 Intro to Spatial and Temporal Data · Lecture 1 Intro to Spatial and Temporal Data Author: Dennis Sun Stanford University Stats 253 Created Date: 6/22/2015 5:03:30 PM ...](https://reader030.fdocuments.net/reader030/viewer/2022041117/5f2d380a3f672b0c381ccb5c/html5/thumbnails/4.jpg)
Temporal Data
Temporal data are also called time series.
●
●
●
●●●
●●●●●
●●
●
●
●●
●●●●
●
●
●
●
●●●●●●
●
●
●
●
●
●
●
●●●
●●●●
●
●
●
●
●
●
●
●●●●●●
●
●
●
●
●●
●●●●●
●
●●
●
●
●
●
●●●●●●
●
●
●●
●
●●●●●●●
●
●
●
●
●
●
●
●●●●●
●
●
●
●
●
●●●●●●
●
●
●
●
●
●
●
●
●●●●●
●
●
●●
●
●
●●●●●
●
●
●
●
●
●
●
●●●●●
●
●
●
●
●
●●●●●●●●
●
●
●
●
●
●●●●●
●
●
●
●
●
●●●
●
●●●●
●
●
●
●
●
●
●●
●
●●●
●●
●
●
●
●
●
●●●●●
●
●
●
●●●●
●●●●●●
●
●●
●
●●
●●●●●●
●
●
1995 2000 2005 2010 2015
02
46
810
1214
Monthly Rainfall in San Francisco
![Page 5: Lecture 1 Intro to Spatial and Temporal Data · Lecture 1 Intro to Spatial and Temporal Data Author: Dennis Sun Stanford University Stats 253 Created Date: 6/22/2015 5:03:30 PM ...](https://reader030.fdocuments.net/reader030/viewer/2022041117/5f2d380a3f672b0c381ccb5c/html5/thumbnails/5.jpg)
Spatial DataSpatial observations can be areal units...
Percent of votes for GeorgeW. Bush in 2004 election.
![Page 6: Lecture 1 Intro to Spatial and Temporal Data · Lecture 1 Intro to Spatial and Temporal Data Author: Dennis Sun Stanford University Stats 253 Created Date: 6/22/2015 5:03:30 PM ...](https://reader030.fdocuments.net/reader030/viewer/2022041117/5f2d380a3f672b0c381ccb5c/html5/thumbnails/6.jpg)
Spatial Data
...or points in space.
San Jose house prices from zillow.com
![Page 7: Lecture 1 Intro to Spatial and Temporal Data · Lecture 1 Intro to Spatial and Temporal Data Author: Dennis Sun Stanford University Stats 253 Created Date: 6/22/2015 5:03:30 PM ...](https://reader030.fdocuments.net/reader030/viewer/2022041117/5f2d380a3f672b0c381ccb5c/html5/thumbnails/7.jpg)
What do the two have in common?
●
●
●
●●●
●●●●●
●●
●
●
●●
●●●●
●
●
●
●
●●●●●●
●
●
●
●
●
●
●
●●●
●●●●
●
●
●
●
●
●
●
●●●●●●
●
●
●
●
●●
●●●●●
●
●●
●
●
●
●
●●●●●●
●
●
●●
●
●●●●●●●
●
●
●
●
●
●
●
●●●●●
●
●
●
●
●
●●●●●●
●
●
●
●
●
●
●
●
●●●●●
●
●
●●
●
●
●●●●●
●
●
●
●
●
●
●
●●●●●
●
●
●
●
●
●●●●●●●●
●
●
●
●
●
●●●●●
●
●
●
●
●
●●●
●
●●●●
●
●
●
●
●
●
●●
●
●●●
●●
●
●
●
●
●
●●●●●
●
●
●
●●●●
●●●●●●
●
●●
●
●●
●●●●●●
●
●
1995 2000 2005 2010 2015
02
46
810
1214
Observations that are close in time or space are similar.
![Page 8: Lecture 1 Intro to Spatial and Temporal Data · Lecture 1 Intro to Spatial and Temporal Data Author: Dennis Sun Stanford University Stats 253 Created Date: 6/22/2015 5:03:30 PM ...](https://reader030.fdocuments.net/reader030/viewer/2022041117/5f2d380a3f672b0c381ccb5c/html5/thumbnails/8.jpg)
Why is this the case?
Common or similar factors drive observations that are nearby intime and space.• Themeteorological phenomena that drive rainfall (e.g., ElNiño) in onemonth typically lasts a fewmonths.
• Religion and race are strong predictors of voters’ choices.These are likely to be similar in nearby regions.
• School quality is a strong predictor of house prices. Nearbyhouses belong to the same school district.
To make this precise, assume that each observation yi can bemodeled as a function of predictors xi:
yi = f(xi)︸ ︷︷ ︸trend
+ εi︸︷︷︸noise
![Page 9: Lecture 1 Intro to Spatial and Temporal Data · Lecture 1 Intro to Spatial and Temporal Data Author: Dennis Sun Stanford University Stats 253 Created Date: 6/22/2015 5:03:30 PM ...](https://reader030.fdocuments.net/reader030/viewer/2022041117/5f2d380a3f672b0c381ccb5c/html5/thumbnails/9.jpg)
1 What is Spatial and Temporal Data?
2 TrendModeling
3 Omitted Variables
4 Overview of this Class
![Page 10: Lecture 1 Intro to Spatial and Temporal Data · Lecture 1 Intro to Spatial and Temporal Data Author: Dennis Sun Stanford University Stats 253 Created Date: 6/22/2015 5:03:30 PM ...](https://reader030.fdocuments.net/reader030/viewer/2022041117/5f2d380a3f672b0c381ccb5c/html5/thumbnails/10.jpg)
LinearModels
• Wewill focus on themost commonmodel for the trend, alinearmodel:
f(xi) = xTi β,
although there are others (loess, splines, etc.).• We estimateβ by ordinary least squares (OLS)
β̂def= argmin
β
n∑i=1
(yi − xTi β)
2
= argminβ||y −Xβ||2
= (XTX)−1XTy
• Is this a good estimator?
![Page 11: Lecture 1 Intro to Spatial and Temporal Data · Lecture 1 Intro to Spatial and Temporal Data Author: Dennis Sun Stanford University Stats 253 Created Date: 6/22/2015 5:03:30 PM ...](https://reader030.fdocuments.net/reader030/viewer/2022041117/5f2d380a3f672b0c381ccb5c/html5/thumbnails/11.jpg)
Properties of OLS
If we assume that y = Xβ + ε, whereE[ε|X] = 0, thenβ̂ = (XTX)−1XTy
= (XTX)−1XT (Xβ + ε)
= β + (XTX)−1XT ε.
Then,E[β̂|X] = β + E[(XTX)−1XT ε|X] = β, so the OLSestimator is unbiased.In fact, it is the “best” linear unbiased estimator. (More on this nexttime.)
![Page 12: Lecture 1 Intro to Spatial and Temporal Data · Lecture 1 Intro to Spatial and Temporal Data Author: Dennis Sun Stanford University Stats 253 Created Date: 6/22/2015 5:03:30 PM ...](https://reader030.fdocuments.net/reader030/viewer/2022041117/5f2d380a3f672b0c381ccb5c/html5/thumbnails/12.jpg)
Example: House Prices in FloridaCall:lm(formula = price ~ size + beds + baths + new, data = houses)
Residuals:Min 1Q Median 3Q Max
-215.747 -30.833 -5.574 18.800 164.471
Coefficients:Estimate Std. Error t value Pr(>|t|)
(Intercept) -28.84922 27.26116 -1.058 0.29262size 0.11812 0.01232 9.585 1.27e-15 ***beds -8.20238 10.44984 -0.785 0.43445baths 5.27378 13.08017 0.403 0.68772new 54.56238 19.21489 2.840 0.00553 **---Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 54.25 on 95 degrees of freedomMultiple R-squared: 0.7245, Adjusted R-squared: 0.713F-statistic: 62.47 on 4 and 95 DF, p-value: < 2.2e-16
![Page 13: Lecture 1 Intro to Spatial and Temporal Data · Lecture 1 Intro to Spatial and Temporal Data Author: Dennis Sun Stanford University Stats 253 Created Date: 6/22/2015 5:03:30 PM ...](https://reader030.fdocuments.net/reader030/viewer/2022041117/5f2d380a3f672b0c381ccb5c/html5/thumbnails/13.jpg)
Where do the standard errors come from?If we further assumeVar[ε|X] = σ2I , then we can calculate:
Var[β̂|X] = Var[β + (XTX)−1XT ε|X
]=((XTX)−1XT
)Var [ε|X]
((XTX)−1XT
)T︸ ︷︷ ︸X(XTX)−1
= σ2((XTX)−1XT
) (X(XTX)−1
)= σ2(XTX)−1.
Since β̂ is a random vector, this is a covariancematrix:
Var(β̂) =
Var(β̂1) Cov(β̂1, β̂2) ... Cov(β̂1, β̂p)
Cov(β̂2, β̂1) Var(β̂2) ... Cov(β̂2, β̂p)... ... . . . ...Cov(β̂p, β̂1) Cov(β̂p, β̂2) ... Var(β̂p)
.
The square root of the diagonal elements give us the standarderrors, i.e., SE(β̂j) =
√Var(β̂j).
![Page 14: Lecture 1 Intro to Spatial and Temporal Data · Lecture 1 Intro to Spatial and Temporal Data Author: Dennis Sun Stanford University Stats 253 Created Date: 6/22/2015 5:03:30 PM ...](https://reader030.fdocuments.net/reader030/viewer/2022041117/5f2d380a3f672b0c381ccb5c/html5/thumbnails/14.jpg)
1 What is Spatial and Temporal Data?
2 TrendModeling
3 Omitted Variables
4 Overview of this Class
![Page 15: Lecture 1 Intro to Spatial and Temporal Data · Lecture 1 Intro to Spatial and Temporal Data Author: Dennis Sun Stanford University Stats 253 Created Date: 6/22/2015 5:03:30 PM ...](https://reader030.fdocuments.net/reader030/viewer/2022041117/5f2d380a3f672b0c381ccb5c/html5/thumbnails/15.jpg)
What happens if we omit a variable?• Suppose the following model for house prices is correct:
pricei = β0 + β1 · sizei + β2 · newi︸ ︷︷ ︸trend
+ εi︸︷︷︸noise
,
whereE[ε|size, new] = 0 andVar[ε|size, new] ∝ I .• Suppose we don’t actually have data about whether a house is
new or not.• Weomit it from our model, so new becomes part of the noise.
pricei = β0 + β1 · sizei︸ ︷︷ ︸trend
+β2 · newi + εi︸ ︷︷ ︸noise
,
Is this a problem?• We are fine as long as
E[noise | size] = 0 Var[noise | size] ∝ I
![Page 16: Lecture 1 Intro to Spatial and Temporal Data · Lecture 1 Intro to Spatial and Temporal Data Author: Dennis Sun Stanford University Stats 253 Created Date: 6/22/2015 5:03:30 PM ...](https://reader030.fdocuments.net/reader030/viewer/2022041117/5f2d380a3f672b0c381ccb5c/html5/thumbnails/16.jpg)
Omitted Variable Bias
Suppose the first condition is violated, i.e.,E[noise | size] 6= 0, i.e.,E[β2 · new + ε | size] 6= 0.
SinceE[ε | size] = 0, this meansE[β2 · new | size] 6= 0.
Two things have to happen for this situation to occur:• β2 6= 0: The omitted variable is relevant for predicting theresponse.
• E[new | size] 6= 0: The omitted variable is correlated with apredictor in the model.
Omitted variables are also called confounders.SinceE[noise | size] 6= 0, β̂1 is no longer unbiased for β1.
![Page 17: Lecture 1 Intro to Spatial and Temporal Data · Lecture 1 Intro to Spatial and Temporal Data Author: Dennis Sun Stanford University Stats 253 Created Date: 6/22/2015 5:03:30 PM ...](https://reader030.fdocuments.net/reader030/viewer/2022041117/5f2d380a3f672b0c381ccb5c/html5/thumbnails/17.jpg)
Correlated Noise
• Suppose we are reasonably convinced that new is notcorrelated with size in our dataset.
• So wewill be able to obtain an unbiased estimator for theeffect of size on house prices.
• But in order for the standard errors to be valid, we needVar[β2 · new + ε | size] ∝ I.
• This depends on whetherVar[new | size] ∝ I,
but chances are:Cov[newi, newj | size] 6= 0.
![Page 18: Lecture 1 Intro to Spatial and Temporal Data · Lecture 1 Intro to Spatial and Temporal Data Author: Dennis Sun Stanford University Stats 253 Created Date: 6/22/2015 5:03:30 PM ...](https://reader030.fdocuments.net/reader030/viewer/2022041117/5f2d380a3f672b0c381ccb5c/html5/thumbnails/18.jpg)
A Simulation Study
Suppose we have n = 20 observations fromyt = βxt + εt, β = 1
where εt is correlated (generated from an AR(1) process).
Here are theOLS estimates β̂ obtained over 10000 simulations.
0
300
600
900
0 1 2OLS Estimates
coun
t
According to the simulations:
E[β̂|x] ≈ 1, so β̂ is unbiased.SE[β̂|x] ≈ .15.
![Page 19: Lecture 1 Intro to Spatial and Temporal Data · Lecture 1 Intro to Spatial and Temporal Data Author: Dennis Sun Stanford University Stats 253 Created Date: 6/22/2015 5:03:30 PM ...](https://reader030.fdocuments.net/reader030/viewer/2022041117/5f2d380a3f672b0c381ccb5c/html5/thumbnails/19.jpg)
A Simulation Study
Suppose we have n = 20 observations fromyt = βxt + εt, β = 1
where εt is correlated (generated from an AR(1) process).
Here are the naive SEs from calling the lm function in R.
0
500
1000
0.0 0.5 1.0 1.5OLS Standard Errors
coun
t OLS does not estimate thestandard error appropriately.
![Page 20: Lecture 1 Intro to Spatial and Temporal Data · Lecture 1 Intro to Spatial and Temporal Data Author: Dennis Sun Stanford University Stats 253 Created Date: 6/22/2015 5:03:30 PM ...](https://reader030.fdocuments.net/reader030/viewer/2022041117/5f2d380a3f672b0c381ccb5c/html5/thumbnails/20.jpg)
1 What is Spatial and Temporal Data?
2 TrendModeling
3 Omitted Variables
4 Overview of this Class
![Page 21: Lecture 1 Intro to Spatial and Temporal Data · Lecture 1 Intro to Spatial and Temporal Data Author: Dennis Sun Stanford University Stats 253 Created Date: 6/22/2015 5:03:30 PM ...](https://reader030.fdocuments.net/reader030/viewer/2022041117/5f2d380a3f672b0c381ccb5c/html5/thumbnails/21.jpg)
Why study spatial and temporal statistics?
• The focus of this class will be supervised learningyi = f(xi) + εi
when the error is correlated.• Wewill assume that the omitted variables do not lead to bias(E[ε|X] = 0).
• If the omitted variables all have a spatial or temporalstructure, then we can try to model it explicitly:
Cov[εi, εj |X] = g(d(i, j)).
• This will allow us to (1) obtain correct inferences for thevariables in the model and (2) obtain a more efficientestimator than theOLS estimator.
![Page 22: Lecture 1 Intro to Spatial and Temporal Data · Lecture 1 Intro to Spatial and Temporal Data Author: Dennis Sun Stanford University Stats 253 Created Date: 6/22/2015 5:03:30 PM ...](https://reader030.fdocuments.net/reader030/viewer/2022041117/5f2d380a3f672b0c381ccb5c/html5/thumbnails/22.jpg)
Course Requirements
• We’ll have 3 homeworks, which will be coding / data analysis.• We’ll also have 3 in-class quizzes, which will go over theconceptual issues.
• These will be graded on a check / resubmit basis.• For those taking the class for a letter grade, the grade will bebased primarily on a final project.
![Page 23: Lecture 1 Intro to Spatial and Temporal Data · Lecture 1 Intro to Spatial and Temporal Data Author: Dennis Sun Stanford University Stats 253 Created Date: 6/22/2015 5:03:30 PM ...](https://reader030.fdocuments.net/reader030/viewer/2022041117/5f2d380a3f672b0c381ccb5c/html5/thumbnails/23.jpg)
Structure of the Class
• This class will meetMonday,Wednesday, Friday at 2:15pm forthe first four weeks.
• The last four weeks will be dedicated to your final project. Iwill schedule individual meetings with students, and theremaybe sporadic lectures covering topics of interest to the class.
![Page 24: Lecture 1 Intro to Spatial and Temporal Data · Lecture 1 Intro to Spatial and Temporal Data Author: Dennis Sun Stanford University Stats 253 Created Date: 6/22/2015 5:03:30 PM ...](https://reader030.fdocuments.net/reader030/viewer/2022041117/5f2d380a3f672b0c381ccb5c/html5/thumbnails/24.jpg)
CourseWebsite
• The course website is stats253.stanford.edu.• All materials (syllabus, lecture slides, homeworks) will beposted here.
• All homework will be submitted through this course website.