Lecture 1 Intro to Spatial and Temporal Data · Lecture 1 Intro to Spatial and Temporal Data...

Lecture 1Intro to Spatial and Temporal Data

Dennis SunStanford University

Stats 253

June 22, 2015

1 What is Spatial and Temporal Data?

2 TrendModeling

3 Omitted Variables

4 Overview of this Class

2 TrendModeling

3 Omitted Variables

Temporal Data

Temporal data are also called time series.

●●●

●●●●●

●●

●●●●

●●●●●●

●●●

●●●●

●●●●●●

●●

●●●●●

●●

●●●●●●

●●

●●●●●●●

●●●●●

●●●●●●

●●●●●

●●

●●●●●

●●●●●●●●

●●●●●

●●●

●●●●

●●

●●●

●●

●●●●●

●●●●

●●●●●●

●●

●●●●●●

1995 2000 2005 2010 2015

Monthly Rainfall in San Francisco

Spatial DataSpatial observations can be areal units...

Percent of votes for GeorgeW. Bush in 2004 election.

Spatial Data

...or points in space.

San Jose house prices from zillow.com

What do the two have in common?

●●●

●●●●●

●●

●●●●

●●●●●●

●●●

●●●●

●●●●●●

●●

●●●●●

●●

●●●●●●

●●

●●●●●●●

●●●●●

●●●●●●

●●●●●

●●

●●●●●

●●●●●●●●

●●●●●

●●●

●●●●

●●

●●●

●●

●●●●●

●●●●

●●●●●●

●●

●●●●●●

1995 2000 2005 2010 2015

Observations that are close in time or space are similar.

Why is this the case?

Common or similar factors drive observations that are nearby intime and space.• Themeteorological phenomena that drive rainfall (e.g., ElNiño) in onemonth typically lasts a fewmonths.

• Religion and race are strong predictors of voters’ choices.These are likely to be similar in nearby regions.

• School quality is a strong predictor of house prices. Nearbyhouses belong to the same school district.

To make this precise, assume that each observation yi can bemodeled as a function of predictors xi:

yi = f(xi)︸︷︷︸trend

+ εi︸︷︷︸noise

2 TrendModeling

3 Omitted Variables

LinearModels

• Wewill focus on themost commonmodel for the trend, alinearmodel:

f(xi) = xTi β,

although there are others (loess, splines, etc.).• We estimateβ by ordinary least squares (OLS)

β̂def= argmin

n∑i=1

(yi − xTi β)

= argminβ||y −Xβ||2

= (XTX)−1XTy

• Is this a good estimator?

Properties of OLS

If we assume that y = Xβ + ε, whereE[ε|X] = 0, thenβ̂ = (XTX)−1XTy

= (XTX)−1XT (Xβ + ε)

= β + (XTX)−1XT ε.

Then,E[β̂|X] = β + E[(XTX)−1XT ε|X] = β, so the OLSestimator is unbiased.In fact, it is the “best” linear unbiased estimator. (More on this nexttime.)

Example: House Prices in FloridaCall:lm(formula = price ~ size + beds + baths + new, data = houses)

Residuals:Min 1Q Median 3Q Max

-215.747 -30.833 -5.574 18.800 164.471

Coefficients:Estimate Std. Error t value Pr(>|t|)

(Intercept) -28.84922 27.26116 -1.058 0.29262size 0.11812 0.01232 9.585 1.27e-15 ***beds -8.20238 10.44984 -0.785 0.43445baths 5.27378 13.08017 0.403 0.68772new 54.56238 19.21489 2.840 0.00553 **---Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 54.25 on 95 degrees of freedomMultiple R-squared: 0.7245, Adjusted R-squared: 0.713F-statistic: 62.47 on 4 and 95 DF, p-value: < 2.2e-16

Where do the standard errors come from?If we further assumeVar[ε|X] = σ2I , then we can calculate:

Var[β̂|X] = Var[β + (XTX)−1XT ε|X

]=((XTX)−1XT

)Var [ε|X]

((XTX)−1XT

)T︸︷︷︸X(XTX)−1

= σ2((XTX)−1XT

) (X(XTX)−1

)= σ2(XTX)−1.

Since β̂ is a random vector, this is a covariancematrix:

Var(β̂) =

Var(β̂1) Cov(β̂1, β̂2) ... Cov(β̂1, β̂p)

Cov(β̂2, β̂1) Var(β̂2) ... Cov(β̂2, β̂p)... ... . . . ...Cov(β̂p, β̂1) Cov(β̂p, β̂2) ... Var(β̂p)

The square root of the diagonal elements give us the standarderrors, i.e., SE(β̂j) =

√Var(β̂j).

2 TrendModeling

3 Omitted Variables

What happens if we omit a variable?• Suppose the following model for house prices is correct:

pricei = β0 + β1 · sizei + β2 · newi︸︷︷︸trend

+ εi︸︷︷︸noise

whereE[ε|size, new] = 0 andVar[ε|size, new] ∝ I .• Suppose we don’t actually have data about whether a house is

new or not.• Weomit it from our model, so new becomes part of the noise.

pricei = β0 + β1 · sizei︸︷︷︸trend

+β2 · newi + εi︸︷︷︸noise

Is this a problem?• We are fine as long as

E[noise | size] = 0 Var[noise | size] ∝ I

Omitted Variable Bias

Suppose the first condition is violated, i.e.,E[noise | size] 6= 0, i.e.,E[β2 · new + ε | size] 6= 0.

SinceE[ε | size] = 0, this meansE[β2 · new | size] 6= 0.

Two things have to happen for this situation to occur:• β2 6= 0: The omitted variable is relevant for predicting theresponse.

• E[new | size] 6= 0: The omitted variable is correlated with apredictor in the model.

Omitted variables are also called confounders.SinceE[noise | size] 6= 0, β̂1 is no longer unbiased for β1.

Correlated Noise

• Suppose we are reasonably convinced that new is notcorrelated with size in our dataset.

• So wewill be able to obtain an unbiased estimator for theeffect of size on house prices.

• But in order for the standard errors to be valid, we needVar[β2 · new + ε | size] ∝ I.

• This depends on whetherVar[new | size] ∝ I,

but chances are:Cov[newi, newj | size] 6= 0.

A Simulation Study

Suppose we have n = 20 observations fromyt = βxt + εt, β = 1

where εt is correlated (generated from an AR(1) process).

Here are theOLS estimates β̂ obtained over 10000 simulations.

0 1 2OLS Estimates

According to the simulations:

E[β̂|x] ≈ 1, so β̂ is unbiased.SE[β̂|x] ≈ .15.

A Simulation Study

Suppose we have n = 20 observations fromyt = βxt + εt, β = 1

where εt is correlated (generated from an AR(1) process).

Here are the naive SEs from calling the lm function in R.

0.0 0.5 1.0 1.5OLS Standard Errors

t OLS does not estimate thestandard error appropriately.

2 TrendModeling

3 Omitted Variables

Why study spatial and temporal statistics?

• The focus of this class will be supervised learningyi = f(xi) + εi

when the error is correlated.• Wewill assume that the omitted variables do not lead to bias(E[ε|X] = 0).

• If the omitted variables all have a spatial or temporalstructure, then we can try to model it explicitly:

Cov[εi, εj |X] = g(d(i, j)).

• This will allow us to (1) obtain correct inferences for thevariables in the model and (2) obtain a more efficientestimator than theOLS estimator.

Course Requirements

• We’ll have 3 homeworks, which will be coding / data analysis.• We’ll also have 3 in-class quizzes, which will go over theconceptual issues.

• These will be graded on a check / resubmit basis.• For those taking the class for a letter grade, the grade will bebased primarily on a final project.

Structure of the Class

• This class will meetMonday,Wednesday, Friday at 2:15pm forthe first four weeks.

• The last four weeks will be dedicated to your final project. Iwill schedule individual meetings with students, and theremaybe sporadic lectures covering topics of interest to the class.

CourseWebsite

• The course website is stats253.stanford.edu.• All materials (syllabus, lecture slides, homeworks) will beposted here.

• All homework will be submitted through this course website.

Lecture 1 Intro to Spatial and Temporal Data · Lecture 1 Intro to Spatial and Temporal Data...

Documents

Transcript of Lecture 1 Intro to Spatial and Temporal Data · Lecture 1 Intro to Spatial and Temporal Data...

Spatial and Temporal Temperature Distributions in ...gwri.calpoly.edu/media/Research_Papers/Spatial_and_Temporal... · Spatial and Temporal Temperature Distributions in Municipal

Temporal and spatial comparisons

Where is the Universe?. Cosmological Questions Temporal Temporal Spatial Spatial Compositional Compositional…

Intro to Spatial data

Lesage Spatial Intro

Spatial-Temporal Relation Networks for Multi-Object Trackingopenaccess.thecvf.com/content_ICCV_2019/papers/Xu_Spatial-Temporal... · Spatial-Temporal Relation Networks for Multi-Object

Lecture 1 Intro to Spatial and Temporal Processes · Lecture 1 Intro to Spatial and Temporal Processes Dennis Sun Stats 253 June 23, 2014 Dennis Sun Stats 253 { Lecture 1 June 23,

THE ROLE OF SPATIAL AND TEMPORAL STRUCTURE FOR … et al_The Rol… · temporal components can increase forecast accuracy substantially. Likewise, signiﬁcant spatial and temporal

Kahru 2012 Spatial and Temporal

Modeling Spatial, Temporal and Spatio-Temporal Data in Object … · 2003-10-29 · Modeling Spatial, Temporal and Spatio-Temporal Data in Object-Relational Database Systems Vom Fachbereich

A Spatial-temporal POI Data Model for Implementing Location …3dgis.uos.ac.kr/papers/A Spatial-temporal POI Data Model... · 2017-01-23 · A Spatial-temporal POI Data Model for

2014 11 lucene spatial temporal update

Data Mining – Intro. Course Overview Spatial Databases Temporal Databases Spatio-Temporal Databases Data Mining.

Spatial Temporal Activity Recorder (STAR)

Spatial and Temporal Patterns of Terrorist Attacks by ETA ... · Spatial and Temporal Patterns of Terrorist Attacks by ETA 1970 to ... examine spatial and temporal patterns of terrorist

Spatial and Spatio-temporal Epidemiology

Temporal and Spatial Invasion

Spatial and temporal variations of nutrien

Temporal and Spatial Locality

A spatial-temporal analysis of matching in feature-based ...summit.sfu.ca/system/files/iritems1/5792/b15284670.pdf · A SPATIAL-TEMPORAL ANALYSIS OF MATCHING IN ... A Spatial-Temporal