A Brief Introduction to Statistical Forecasting

Kevin Werner

Outline

• Principal Component Theory
• Applications
• Z Score
• VIPER

Basic Forecast Methods

• Statistical regression: [Figure: scatterplot of Apr-Jul streamflow (% avg) against May 1 snowpack (% avg), S Fork Rio Grande, Colo]
• Simulation modeling: [Figure: schematic with the labels Snowpack, Soil water, Snow, Rainfall, Runoff, Heat]

Credit: Tom Pagano

The General Linear Regression Model

$$Y = b_0 + \sum_{i=1}^{n} b_i X_i$$

where:
Y = dependent variable
X_i = independent variables
b_i = regression coefficients
n = number of independent variables

Credit: Dave Garen
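The model above can be fitted by ordinary least squares. The following is a minimal sketch, not part of the original slides: the data are synthetic, and the names (`true_b0`, `true_b`, `n_years`) are made up for illustration.

```python
import numpy as np

# Synthetic data for illustration only: 3 predictors, 40 "years".
rng = np.random.default_rng(0)
n_years, n_pred = 40, 3
X = rng.normal(size=(n_years, n_pred))
true_b0 = 5.0
true_b = np.array([2.0, -1.0, 0.5])
Y = true_b0 + X @ true_b + rng.normal(scale=0.1, size=n_years)

# Prepend a column of ones so the intercept b0 is estimated with the b_i.
A = np.column_stack([np.ones(n_years), X])
coef, *_ = np.linalg.lstsq(A, Y, rcond=None)
b0, b = coef[0], coef[1:]

# Fitted values: Y_hat = b0 + sum_i b_i * X_i
Y_hat = b0 + X @ b
```

Because the synthetic predictors here are independent of each other, the estimated `b0` and `b` land close to the values used to generate the data.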

The Problem

If X’s are intercorrelated, they contain redundant information, and the b’s cannot be meaningfully estimated.

However, we don’t want to have to throw out most of the X’s but prefer to retain them for robustness.

$$Y = b_0 + \sum_{i=1}^{n} b_i X_i$$

Credit: Dave Garen

Example

Streamflow = b0 + b1 * (Snotel A) + b2 * (Snotel B)

-> Snotel sites are very highly correlated with each other
-> An optimal b1 and b2 will be difficult to determine since the correlation is so strong
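One way to put a number on this difficulty is the variance inflation factor (VIF). This diagnostic is not mentioned in the slides and is brought in here only as an illustration. For a regression with two predictors, VIF = 1 / (1 - r^2), where r is the correlation between the predictors. The sketch below uses made-up records for two hypothetical sites.

```python
import numpy as np

rng = np.random.default_rng(1)
n_years = 30

# Hypothetical snow water equivalent at two nearby sites:
# site B is site A plus a small amount of independent noise.
snotel_a = rng.normal(loc=20.0, scale=5.0, size=n_years)
snotel_b = snotel_a + rng.normal(scale=0.5, size=n_years)

r = np.corrcoef(snotel_a, snotel_b)[0, 1]

# Variance inflation factor for a two-predictor regression: the factor by
# which the sampling variance of b1 (and b2) grows compared with
# uncorrelated predictors.
vif = 1.0 / (1.0 - r**2)

print(f"r = {r:.3f}, VIF = {vif:.1f}")
```

A large VIF means that many different (b1, b2) pairs fit the data almost equally well, which is the sense in which the coefficients are hard to determine.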

The Solution

Possibilities:
1) Pre-combine X’s into composite index(es), e.g., Z-score method
2) Principal components regression

These are similar in concept but differ in the mathematics.

$$Y = b_0 + \sum_{i=1}^{n} b_i X_i$$

Credit: Dave Garen

Principal Components Analysis

Principal components regression is just like standard regression except the independent variables are principal components rather than the original X variables.

Principal components are linear combinations of the X’s.

Credit: Dave Garen

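Principal Components Analysis

Each principal component is a weighted sum of all the X’s:

$$
\begin{aligned}
PC_1 &= \sum_{j=1}^{n} e_{1j} X_j \\
PC_2 &= \sum_{j=1}^{n} e_{2j} X_j \\
&\;\;\vdots \\
PC_n &= \sum_{j=1}^{n} e_{nj} X_j
\end{aligned}
$$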

Credit: Dave Garen


The e’s are the elements of the eigenvectors, which are derived from a matrix equation whose input is the correlation matrix of all the X’s with each other.

Principal components are new variables that are not correlated with each other.

The principal components transformation is equivalent to a rotation of axes.

Credit: Dave Garen
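These properties can be checked numerically. The sketch below is an illustration with made-up data, not the procedure from the slides: it takes the eigenvectors of the correlation matrix of two correlated variables, forms the principal components, and confirms that they are uncorrelated. In the code, `eigvecs[j, k]` plays the role of the weight on X_(j+1) in PC_(k+1).

```python
import numpy as np

rng = np.random.default_rng(2)
n_years = 50

# Made-up correlated predictors (illustration only).
x1 = rng.normal(size=n_years)
x2 = 0.8 * x1 + 0.6 * rng.normal(size=n_years)
X = np.column_stack([x1, x2])

# Standardize the X's, since the eigenvectors come from the correlation matrix.
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
R = np.corrcoef(X, rowvar=False)

# eigh is meant for symmetric matrices and returns eigenvalues in ascending
# order, so reorder to put the largest-variance component first.
eigvals, eigvecs = np.linalg.eigh(R)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Each column of PC is one principal component (a weighted sum of the X's).
PC = Z @ eigvecs

# Correlation between the components: off-diagonal entries are ~0.
pc_corr = np.corrcoef(PC, rowvar=False)
```

The eigenvector matrix is orthonormal (`eigvecs.T @ eigvecs` is the identity), which is why the transformation amounts to a rotation of the axes, possibly combined with a reflection.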


[Figure: scatterplot of X1 and X2; R² = 0.698, R = 0.836]

PC1 = e11 X1 + e12 X2

PC2 = e21 X1 + e22 X2

Credit: Dave Garen


The eigenvectors (weights) are based solely on the intercorrelations among the X’s and have no knowledge of Y (in contrast to Z-score, for which the opposite is true).

Principal components can be used for purely descriptive purposes, but we want to use them as independent variables in a regression.

Credit: Dave Garen

Credit: Dennis Hartmann

Principal Components Analysis -- Example

Independent Variables:

X1 – X5 Snow water equivalent at 5 stations

X6 – X10 Water year to date precipitation at 5 stations

X11 Antecedent streamflow

X12 Climate teleconnection index

Credit: Dave Garen

Correlation Matrix

     X1   X2   X3   X4   X5   X6   X7   X8   X9   X10  X11  X12  Y
X1   1.0  .72  .67  .76  .81  .54  .31  .54  .38  .50  .18  .64  .65
X2        1.0  .67  .45  .80  .62  .45  .47  .31  .49  .14  .39  .60
X3             1.0  .49  .72  .84  .76  .86  .68  .85  .48  .56  .80
X4                  1.0  .62  .42  .26  .36  .56  .38  .28  .59  .68
X5                       1.0  .62  .49  .51  .44  .62  .32  .59  .73
X6                            1.0  .93  .87  .83  .90  .63  .43  .85
X7                                 1.0  .82  .85  .90  .67  .32  .76
X8                                      1.0  .74  .84  .64  .39  .70
X9                                           1.0  .80  .70  .49  .84
X10                                               1.0  .64  .46  .79
X11                                                    1.0  .36  .51
X12                                                         1.0  .64

Credit: Dave Garen

First Five Eigenvectors

        PC1     PC2     PC3     PC4     PC5
X1      0.265   0.444   0.004   0.074  -0.104
X2      0.249   0.325  -0.483  -0.030   0.315
X3      0.335   0.016  -0.178   0.149  -0.314
X4      0.229   0.353   0.456  -0.595  -0.009
X5      0.287   0.332  -0.148   0.120   0.412
X6      0.339  -0.168  -0.162  -0.106  -0.040
X7      0.308  -0.329  -0.150  -0.058  -0.015
X8      0.317  -0.197  -0.114   0.027  -0.261
X9      0.304  -0.240   0.299  -0.313  -0.103
X10     0.330  -0.197  -0.197   0.072  -0.129
X11     0.235  -0.349   0.351   0.168   0.692
X12     0.232   0.262   0.473   0.675  -0.212
% var.  62.7    15.8    7.8     3.8     3.2

Credit: Dave Garen
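The "% var." row can be read cumulatively to see how much of the total variance the leading components capture. The short sketch below is an illustration added here. It uses only the five percentages from the table plus one standard fact: the eigenvalues of a correlation matrix sum to the number of variables (12 in this example).

```python
import numpy as np

pct_var = np.array([62.7, 15.8, 7.8, 3.8, 3.2])  # "% var." row, PC1..PC5
n_vars = 12                                       # X1..X12

# Running total of variance explained by the first k components.
cumulative = np.cumsum(pct_var)

# The eigenvalues of a correlation matrix sum to n_vars, so each eigenvalue
# equals its variance fraction times n_vars.
eigenvalues = pct_var / 100.0 * n_vars
```

The first two components together account for about 78.5% of the variance in the twelve predictors, and the first five for about 93.3%.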

Principal Components Regression Procedure

• Try the PC’s in order
• Test for regression coefficient significance (t-test)
• Stop at first insignificant component
• Transform regression coefficients to be in terms of original variables
• Sign test: coefficient signs must be the same as the sign of the correlation with Y

Credit: Dave Garen
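The steps above can be sketched in a few lines of NumPy. This is a hedged illustration rather than the VIPER implementation. The function names `pcr_fit` and `passes_sign_test`, the fixed cutoff `t_crit=2.0` (standing in for a proper t-distribution critical value), and the synthetic data are all assumptions made for the example. It also assumes there are more years than predictors plus one.

```python
import numpy as np

def pcr_fit(X, y, t_crit=2.0):
    """Principal components regression: add PCs in order, stop at the first
    one whose coefficient fails the t-test, then express the model in terms
    of the original X variables. t_crit is an assumed cutoff."""
    n, p = X.shape
    mean = X.mean(axis=0)
    std = X.std(axis=0, ddof=1)
    Z = (X - mean) / std

    # Eigenvectors of the correlation matrix, largest eigenvalue first.
    eigvals, eigvecs = np.linalg.eigh(np.corrcoef(X, rowvar=False))
    eigvecs = eigvecs[:, np.argsort(eigvals)[::-1]]
    PC = Z @ eigvecs

    kept = 0
    for k in range(1, p + 1):
        A = np.column_stack([np.ones(n), PC[:, :k]])
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ coef
        sigma2 = resid @ resid / (n - k - 1)          # residual variance
        se_new = np.sqrt(sigma2 * np.linalg.inv(A.T @ A)[-1, -1])
        if abs(coef[-1] / se_new) < t_crit:           # t-test on newest PC
            break
        kept = k

    # Refit with the retained PCs, then transform back to the original X's.
    A = np.column_stack([np.ones(n), PC[:, :kept]])
    gamma, *_ = np.linalg.lstsq(A, y, rcond=None)
    b = (eigvecs[:, :kept] @ gamma[1:]) / std
    b0 = gamma[0] - b @ mean
    return b0, b, kept

def passes_sign_test(X, y, b):
    """True if every coefficient has the same sign as corr(X_j, Y)."""
    r = np.array([np.corrcoef(X[:, j], y)[0, 1] for j in range(X.shape[1])])
    return bool(np.all(np.sign(b) == np.sign(r)))

# Synthetic example: three intercorrelated predictors driven by one signal.
rng = np.random.default_rng(3)
n = 40
signal = rng.normal(size=n)
X = np.column_stack([signal + 0.3 * rng.normal(size=n) for _ in range(3)])
X = X * 5.0 + 20.0
y = 100.0 + 30.0 * signal + 5.0 * rng.normal(size=n)

b0, b, kept = pcr_fit(X, y)
forecast = b0 + X @ b
sign_ok = passes_sign_test(X, y, b)
```

Because the principal components are uncorrelated, adding a new component does not change the coefficients of the ones already in the model, which is what makes the "try in order, stop at the first insignificant one" rule workable.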

Summary

• Principal components analysis is a standard multivariate statistical procedure

• Can be used for descriptive purposes to reduce the dimensionality of correlated variables

• Can be taken a step further to provide new, non-correlated independent variables for regression

• PC’s taken in order, subject to t-test and sign test

• Final model is expressed in terms of original X variables

Credit: Dave Garen

Soil Moisture at the interannual timescale

• Another example demonstrating the importance of land surface processes in the climate system (Werner, 1999):
– GCM run with and without an active land surface model in South America to explore the importance of land surface processes in climate system variability in the Nordeste region.
– Both simulations include a full atmospheric model, a slab ocean model (no ocean dynamics), and a dynamic land surface model everywhere except tropical South America in the Data Land simulation.

• Modeled variability
– The full dynamic land surface model simulation contains variability resembling observed variability, with a connection between NH and SH SSTs.
– The fixed land surface model shows no connected variability between NH and SH SSTs.

Resources

• Dave Garen VIPER slides
• Dennis Hartmann lecture notes (http://www.atmos.washington.edu/~dennis/)

What does z-score regression do?

1. Combines predictors into weighted indices, emphasizing good stations, minimizing bad ones.
2. Compensates for missing data with remaining data.
3. Regresses index against target predictand.

Credit: Tom Pagano
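A minimal sketch of these three steps follows. The slides do not say how the station weights are computed, so weighting each station by its squared correlation with the predictand is an assumption made here for illustration. The function name `zscore_index` and the data are likewise made up.

```python
import numpy as np

def zscore_index(X, y):
    """Weighted mean of station z-scores for each year, skipping missing
    values (NaN). The weighting scheme is an assumption, not from the slides."""
    mean = np.nanmean(X, axis=0)
    std = np.nanstd(X, axis=0, ddof=1)
    Z = (X - mean) / std

    # Assumed weights: squared correlation of each station with the
    # predictand, using only the years in which that station reported.
    weights = np.empty(X.shape[1])
    for j in range(X.shape[1]):
        ok = ~np.isnan(X[:, j])
        weights[j] = np.corrcoef(X[ok, j], y[ok])[0, 1] ** 2

    # Where a station is missing, give it zero weight for that year so the
    # remaining stations carry the index.
    present = ~np.isnan(Z)
    w = np.where(present, weights, 0.0)
    index = (np.where(present, Z, 0.0) * w).sum(axis=1) / w.sum(axis=1)
    return index, weights

# Synthetic example: two informative stations, one poor station, some gaps.
rng = np.random.default_rng(4)
n = 30
signal = rng.normal(size=n)
X = np.column_stack([
    signal + 0.2 * rng.normal(size=n),   # good station
    signal + 0.4 * rng.normal(size=n),   # decent station
    rng.normal(size=n),                  # poor station, unrelated to target
])
X[3, 0] = np.nan
X[10, 1] = np.nan
y = 50.0 + 10.0 * signal + rng.normal(size=n)

index, weights = zscore_index(X, y)

# Step 3: simple linear regression of the predictand on the composite index.
slope, intercept = np.polyfit(index, y, 1)
```

In this example the poor station receives a much smaller weight than the other two, and the years with a missing value still get an index from the stations that did report.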

What is a z-score?

A z-score is a “normalized anomaly”:

$$Z = \frac{\text{value} - \text{average}}{\text{standard deviation}}$$

Credit: Tom Pagano

What is a z-score?


[Figure: example distribution with average = 60 and standard deviation = 15]

Z = (90 - 60)/15 = +2

Credit: Tom Pagano
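The same calculation in code, as a small illustration (the function name `z_score` is chosen here for the example):

```python
def z_score(value, average, stdev):
    """Normalized anomaly: how many standard deviations a value is from the average."""
    return (value - average) / stdev

# The slide's example: a value of 90 with average 60 and standard deviation 15.
z = z_score(90, 60, 15)
print(z)  # 2.0
```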

How good are the results?

Under conditions of serially complete data and relatively “normal” conditions, PCA and Z-Score are effectively indistinguishable.*

Skill and behavior are similar to the official published outlooks.**

However… Any tool is a weapon if you hold it right. (aka “A fool with a tool is still a tool”)

* Viper technical note: 1 basin
** Pagano dissertation: 29 basins

Credit: Tom Pagano

Super Quick Primer on VIPER

The Viper Main Interface: Layout and interpretation

Credit: Tom Pagano


[Screenshots: annotated walkthrough of the Viper main interface. Labeled areas:]

• Selecting predictors and predictands
• Global month changes
• Predictors: quality, availability
• Historical statistics
• Forecast vs observed time series
• Station availability, weights
• Fcst vs obs scatterplot
• Helper variable
• Scatterplot/Forecast progression
• Probability bounds
• Settings

There’s more if you scroll right: Relate any variable to another

Credit: Tom Pagano
