A Brief Introduction to Statistical Forecasting Kevin Werner.
![Page 1: A Brief Introduction to Statistical Forecasting](https://reader035.fdocuments.net/reader035/viewer/2022070605/5870a1dd1a28ab431c8b4b65/html5/thumbnails/1.jpg)
A Brief Introduction to Statistical Forecasting
Kevin Werner
![Page 2: A Brief Introduction to Statistical Forecasting](https://reader035.fdocuments.net/reader035/viewer/2022070605/5870a1dd1a28ab431c8b4b65/html5/thumbnails/2.jpg)
Outline
• Principal Component Theory
• Applications
• Z Score
• VIPER
![Page 3: A Brief Introduction to Statistical Forecasting](https://reader035.fdocuments.net/reader035/viewer/2022070605/5870a1dd1a28ab431c8b4b65/html5/thumbnails/3.jpg)
Basic Forecast Methods
Statistical regression
[Figure: scatterplot of Apr-Jul streamflow (% avg) against May 1 snowpack (% avg), S Fork Rio Grande, Colo]
Simulation modeling
[Figure: diagram labeled Snowpack, Soil water, Snow, Rainfall, Runoff, Heat]
Credit: Tom Pagano
![Page 4: A Brief Introduction to Statistical Forecasting](https://reader035.fdocuments.net/reader035/viewer/2022070605/5870a1dd1a28ab431c8b4b65/html5/thumbnails/4.jpg)
The General Linear Regression Model
$$Y = b_0 + \sum_{i=1}^{n} b_i X_i$$
where:
$Y$ = dependent variable
$X_i$ = independent variables
$b_i$ = regression coefficients
$n$ = number of independent variables
Credit: Dave Garen
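As a minimal sketch of fitting this model by ordinary least squares, assuming NumPy is available. The function name and the data values are hypothetical, chosen only so the known coefficients can be recovered; they do not come from the slides.

```python
import numpy as np

def fit_linear_regression(X, y):
    """Return (b0, b) minimizing the squared error of y ~ b0 + X @ b."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Prepend a column of ones so the intercept b0 is estimated too.
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef[0], coef[1:]

# Hypothetical data: two predictors, six years, built from known coefficients.
X = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 4.0],
              [4.0, 3.0], [5.0, 6.0], [6.0, 5.0]])
y = 10.0 + 2.0 * X[:, 0] + 0.5 * X[:, 1]

b0, b = fit_linear_regression(X, y)
print(round(float(b0), 3), np.round(b, 3))
```

Because the example data are noise-free, the fit should return the coefficients used to build `y`.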
![Page 5: A Brief Introduction to Statistical Forecasting](https://reader035.fdocuments.net/reader035/viewer/2022070605/5870a1dd1a28ab431c8b4b65/html5/thumbnails/5.jpg)
The Problem
If X’s are intercorrelated, they contain redundant information, and the b’s cannot be meaningfully estimated.
However, we don’t want to have to throw out most of the X’s but prefer to retain them for robustness.
$$Y = b_0 + \sum_{i=1}^{n} b_i X_i$$
Credit: Dave Garen
![Page 6: A Brief Introduction to Statistical Forecasting](https://reader035.fdocuments.net/reader035/viewer/2022070605/5870a1dd1a28ab431c8b4b65/html5/thumbnails/6.jpg)
Example
Streamflow = b0 + b1 * (Snotel A) + b2 * (Snotel B)
-> Snotel sites are very well correlated
-> An optimal b1 and b2 will be difficult to determine since the correlation is so strong
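One common way to quantify this difficulty is the variance inflation factor (VIF), a standard diagnostic that the slides do not mention. For a regression with exactly two predictors whose correlation is r, the sampling variance of each coefficient is inflated by 1 / (1 - r^2) relative to uncorrelated predictors. A small sketch, with a hypothetical function name:

```python
def variance_inflation_factor(r):
    """VIF for either predictor in a two-predictor regression with correlation r."""
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and 1")
    return 1.0 / (1.0 - r * r)

# Illustrative correlations only; these are not measured Snotel values.
for r in (0.0, 0.5, 0.9, 0.99):
    print(f"r = {r:4.2f}  VIF = {variance_inflation_factor(r):7.2f}")
```

As r approaches 1 the factor grows without bound, which is why b1 and b2 become hard to pin down.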
![Page 7: A Brief Introduction to Statistical Forecasting](https://reader035.fdocuments.net/reader035/viewer/2022070605/5870a1dd1a28ab431c8b4b65/html5/thumbnails/7.jpg)
The Solution
Possibilities:
1) Pre-combine X’s into composite index(es), e.g., Z-score method
2) Principal components regression
These are similar in concept but differ in the mathematics.
$$Y = b_0 + \sum_{i=1}^{n} b_i X_i$$
Credit: Dave Garen
![Page 8: A Brief Introduction to Statistical Forecasting](https://reader035.fdocuments.net/reader035/viewer/2022070605/5870a1dd1a28ab431c8b4b65/html5/thumbnails/8.jpg)
Principal Components Analysis
Principal components regression is just like standard regression except the independent variables are principal components rather than the original X variables.
Principal components are linear combinations of the X’s.
Credit: Dave Garen
![Page 9: A Brief Introduction to Statistical Forecasting](https://reader035.fdocuments.net/reader035/viewer/2022070605/5870a1dd1a28ab431c8b4b65/html5/thumbnails/9.jpg)
Principal Components Analysis
Each principal component is a weighted sum of all the X’s:
$$PC_1 = \sum_{j=1}^{n} e_{1j} X_j$$
$$PC_2 = \sum_{j=1}^{n} e_{2j} X_j$$
$$\vdots$$
$$PC_n = \sum_{j=1}^{n} e_{nj} X_j$$
Credit: Dave Garen
![Page 10: A Brief Introduction to Statistical Forecasting](https://reader035.fdocuments.net/reader035/viewer/2022070605/5870a1dd1a28ab431c8b4b65/html5/thumbnails/10.jpg)
Principal Components Analysis
The e’s are called eigenvectors, derived from a matrix equation whose input is the correlation matrix of all the X’s with each other.
Principal components are new variables that are not correlated with each other.
The principal components transformation is equivalent to a rotation of axes.
Credit: Dave Garen
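The statements above can be checked numerically. The following is a sketch, assuming NumPy and using synthetic data (the function name and the random inputs are illustrative, not from the slides): it takes the eigenvectors of the correlation matrix, forms the components from standardized variables, and confirms that the components are uncorrelated.

```python
import numpy as np

def principal_components(X):
    """PCA on the correlation matrix of X (rows = years, columns = variables)."""
    X = np.asarray(X, dtype=float)
    # Standardize each column so the analysis uses correlations, not covariances.
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    R = np.corrcoef(X, rowvar=False)
    # eigh is meant for symmetric matrices; eigenvalues come back ascending.
    eigvals, eigvecs = np.linalg.eigh(R)
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Column k of eigvecs holds the weights of the k-th principal component.
    pcs = Z @ eigvecs
    return eigvals, eigvecs, pcs

# Synthetic predictors: two strongly related columns plus one independent column.
rng = np.random.default_rng(0)
base = rng.normal(size=200)
X = np.column_stack([base + 0.3 * rng.normal(size=200),
                     base + 0.3 * rng.normal(size=200),
                     rng.normal(size=200)])

eigvals, eigvecs, pcs = principal_components(X)
pc_corr = np.corrcoef(pcs, rowvar=False)
print(np.round(100 * eigvals / eigvals.sum(), 1))  # percent variance per component
print(np.round(pc_corr, 6))                        # off-diagonal entries near zero
```

The eigenvector matrix is orthogonal, which is the sense in which the transformation is a rotation of axes.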
![Page 11: A Brief Introduction to Statistical Forecasting](https://reader035.fdocuments.net/reader035/viewer/2022070605/5870a1dd1a28ab431c8b4b65/html5/thumbnails/11.jpg)
Principal Components Analysis
[Figure: scatterplot of X2 versus X1; R² = 0.698, R = 0.836]
PC1 = e11 X1 + e12 X2
PC2 = e21 X1 + e22 X2
Credit: Dave Garen
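For two variables the rotation can be worked out by hand. The following is a sketch under two assumptions that the slide does not state: that X1 and X2 are standardized, and that the R reported for the figure is their correlation r. The symbol C for the correlation matrix is introduced here for illustration.

```latex
C = \begin{pmatrix} 1 & r \\ r & 1 \end{pmatrix},
\qquad \lambda_1 = 1 + r, \qquad \lambda_2 = 1 - r

e_{11} = e_{12} = \tfrac{1}{\sqrt{2}}, \qquad
e_{21} = \tfrac{1}{\sqrt{2}}, \quad e_{22} = -\tfrac{1}{\sqrt{2}}

PC_1 = \tfrac{1}{\sqrt{2}} (X_1 + X_2), \qquad
PC_2 = \tfrac{1}{\sqrt{2}} (X_1 - X_2)

\frac{\lambda_1}{\lambda_1 + \lambda_2} = \frac{1 + r}{2}
= \frac{1.836}{2} \approx 0.918 \quad \text{for } r = 0.836
```

Under these assumptions the new axes are the old ones rotated by 45 degrees, and the first component carries about 92% of the total variance. The overall sign of each eigenvector is arbitrary.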
![Page 12: A Brief Introduction to Statistical Forecasting](https://reader035.fdocuments.net/reader035/viewer/2022070605/5870a1dd1a28ab431c8b4b65/html5/thumbnails/12.jpg)
Principal Components Analysis
The eigenvectors (weights) are based solely on the intercorrelations among the X’s and have no knowledge of Y (in contrast to Z-score, for which the opposite is true).
Principal components can be used for purely descriptive purposes, but we want to use them as independent variables in a regression.
Credit: Dave Garen
![Page 13: A Brief Introduction to Statistical Forecasting](https://reader035.fdocuments.net/reader035/viewer/2022070605/5870a1dd1a28ab431c8b4b65/html5/thumbnails/13.jpg)
Credit: Dennis Hartmann
![Page 14: A Brief Introduction to Statistical Forecasting](https://reader035.fdocuments.net/reader035/viewer/2022070605/5870a1dd1a28ab431c8b4b65/html5/thumbnails/14.jpg)
Principal Components Analysis -- Example
Independent Variables:
X1 – X5 Snow water equivalent at 5 stations
X6 – X10 Water year to date precipitation at 5 stations
X11 Antecedent streamflow
X12 Climate teleconnection index
Credit: Dave Garen
![Page 15: A Brief Introduction to Statistical Forecasting](https://reader035.fdocuments.net/reader035/viewer/2022070605/5870a1dd1a28ab431c8b4b65/html5/thumbnails/15.jpg)
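Correlation Matrix

| | X1 | X2 | X3 | X4 | X5 | X6 | X7 | X8 | X9 | X10 | X11 | X12 | Y |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| X1 | 1.0 | .72 | .67 | .76 | .81 | .54 | .31 | .54 | .38 | .50 | .18 | .64 | .65 |
| X2 | | 1.0 | .67 | .45 | .80 | .62 | .45 | .47 | .31 | .49 | .14 | .39 | .60 |
| X3 | | | 1.0 | .49 | .72 | .84 | .76 | .86 | .68 | .85 | .48 | .56 | .80 |
| X4 | | | | 1.0 | .62 | .42 | .26 | .36 | .56 | .38 | .28 | .59 | .68 |
| X5 | | | | | 1.0 | .62 | .49 | .51 | .44 | .62 | .32 | .59 | .73 |
| X6 | | | | | | 1.0 | .93 | .87 | .83 | .90 | .63 | .43 | .85 |
| X7 | | | | | | | 1.0 | .82 | .85 | .90 | .67 | .32 | .76 |
| X8 | | | | | | | | 1.0 | .74 | .84 | .64 | .39 | .70 |
| X9 | | | | | | | | | 1.0 | .80 | .70 | .49 | .84 |
| X10 | | | | | | | | | | 1.0 | .64 | .46 | .79 |
| X11 | | | | | | | | | | | 1.0 | .36 | .51 |
| X12 | | | | | | | | | | | | 1.0 | .64 |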
Credit: Dave Garen
![Page 16: A Brief Introduction to Statistical Forecasting](https://reader035.fdocuments.net/reader035/viewer/2022070605/5870a1dd1a28ab431c8b4b65/html5/thumbnails/16.jpg)
First Five Eigenvectors

| | PC1 | PC2 | PC3 | PC4 | PC5 |
|---|---|---|---|---|---|
| X1 | 0.265 | 0.444 | 0.004 | 0.074 | -0.104 |
| X2 | 0.249 | 0.325 | -0.483 | -0.030 | 0.315 |
| X3 | 0.335 | 0.016 | -0.178 | 0.149 | -0.314 |
| X4 | 0.229 | 0.353 | 0.456 | -0.595 | -0.009 |
| X5 | 0.287 | 0.332 | -0.148 | 0.120 | 0.412 |
| X6 | 0.339 | -0.168 | -0.162 | -0.106 | -0.040 |
| X7 | 0.308 | -0.329 | -0.150 | -0.058 | -0.015 |
| X8 | 0.317 | -0.197 | -0.114 | 0.027 | -0.261 |
| X9 | 0.304 | -0.240 | 0.299 | -0.313 | -0.103 |
| X10 | 0.330 | -0.197 | -0.197 | 0.072 | -0.129 |
| X11 | 0.235 | -0.349 | 0.351 | 0.168 | 0.692 |
| X12 | 0.232 | 0.262 | 0.473 | 0.675 | -0.212 |
| % var. | 62.7 | 15.8 | 7.8 | 3.8 | 3.2 |
Credit: Dave Garen
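A quick way to read the "% var." row is to accumulate it. This short sketch uses only the five percentages listed for PC1 through PC5; the variable names are illustrative.

```python
from itertools import accumulate

# Percent of variance explained by PC1..PC5, taken from the "% var." row.
percent_variance = [62.7, 15.8, 7.8, 3.8, 3.2]

# Running total, rounded to one decimal to match the precision of the inputs.
cumulative = [round(total, 1) for total in accumulate(percent_variance)]
for k, total in enumerate(cumulative, start=1):
    print(f"First {k} PC(s): {total:.1f}% of variance")
```

The first two components together account for 78.5% of the variance of the twelve predictors, and the first five for 93.3%.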
![Page 17: A Brief Introduction to Statistical Forecasting](https://reader035.fdocuments.net/reader035/viewer/2022070605/5870a1dd1a28ab431c8b4b65/html5/thumbnails/17.jpg)
Principal Components Regression Procedure
• Try the PC’s in order
• Test for regression coefficient significance (t-test)
• Stop at first insignificant component
• Transform regression coefficients to be in terms of original variables
• Sign test – coefficient signs must be same as correlation with Y
Credit: Dave Garen
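The steps above can be sketched as follows, assuming NumPy. This is an illustration, not the implementation used in any operational tool: the function name `pcr_fit`, the synthetic data, and the fixed threshold `t_crit = 2.0` (standing in for a proper t-test critical value) are all assumptions.

```python
import numpy as np

def pcr_fit(X, y, t_crit=2.0):
    """Principal components regression; returns (b0, b, kept, passes_sign_test)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    mean, std = X.mean(axis=0), X.std(axis=0, ddof=1)
    Z = (X - mean) / std

    # Eigenvectors of the predictor correlation matrix, largest variance first.
    eigvals, eigvecs = np.linalg.eigh(np.corrcoef(X, rowvar=False))
    eigvecs = eigvecs[:, np.argsort(eigvals)[::-1]]
    pcs = Z @ eigvecs

    # Try the PCs in order; stop at the first insignificant coefficient.
    kept = 0
    for k in range(1, p + 1):
        dof = n - (k + 1)
        if dof <= 0:
            break
        A = np.column_stack([np.ones(n), pcs[:, :k]])
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ coef
        cov = (resid @ resid / dof) * np.linalg.inv(A.T @ A)
        t_newest = coef[k] / np.sqrt(cov[k, k])
        if abs(t_newest) < t_crit:  # assumed threshold, not an exact t-test
            break
        kept = k

    # Refit with the retained components only.
    A = np.column_stack([np.ones(n), pcs[:, :kept]])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)

    # Transform coefficients back to the original X variables.
    b = (eigvecs[:, :kept] @ coef[1:]) / std
    b0 = coef[0] - b @ mean

    # Sign test: each coefficient should share the sign of corr(X_j, y).
    corr_y = np.array([np.corrcoef(X[:, j], y)[0, 1] for j in range(p)])
    passes_sign_test = bool(np.all(np.sign(b) == np.sign(corr_y)))
    return b0, b, kept, passes_sign_test

# Synthetic example: three correlated predictors driven by one shared signal.
rng = np.random.default_rng(1)
signal = rng.normal(size=40)
X = np.column_stack([50 + 10 * (signal + 0.2 * rng.normal(size=40))
                     for _ in range(3)])
y = 100 + 20 * signal + 5 * rng.normal(size=40)

b0, b, kept, ok = pcr_fit(X, y)
y_hat = b0 + X @ b
print(kept, ok, np.round(b, 3))
```

Because the components are uncorrelated, adding a later component does not change the coefficients of the earlier ones, which is what makes the in-order search straightforward.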
![Page 18: A Brief Introduction to Statistical Forecasting](https://reader035.fdocuments.net/reader035/viewer/2022070605/5870a1dd1a28ab431c8b4b65/html5/thumbnails/18.jpg)
Summary
• Principal components analysis is a standard multivariate statistical procedure
• Can be used for descriptive purposes to reduce the dimensionality of correlated variables
• Can be taken a step further to provide new, non-correlated independent variables for regression
• PC’s taken in order, subject to t-test and sign test
• Final model is expressed in terms of original X variables
Credit: Dave Garen
![Page 19: A Brief Introduction to Statistical Forecasting](https://reader035.fdocuments.net/reader035/viewer/2022070605/5870a1dd1a28ab431c8b4b65/html5/thumbnails/19.jpg)
Soil Moisture at the interannual timescale
• Another example demonstrating the importance of land surface processes in the climate system (Werner, 1999):
– GCM run with and without an active land surface model in South America to explore the importance of land surface processes in climate system variability in the Nordeste region.
– Both simulations include a full atmospheric model, a slab ocean model (no ocean dynamics), and a dynamic land surface model everywhere except tropical South America in the Data Land simulation.
![Page 20: A Brief Introduction to Statistical Forecasting](https://reader035.fdocuments.net/reader035/viewer/2022070605/5870a1dd1a28ab431c8b4b65/html5/thumbnails/20.jpg)
Soil Moisture at the interannual timescale
• Modeled variability
– Full dynamic land surface model simulation contains variability resembling observed variability, with a connection between NH and SH SSTs.
– Fixed land surface model shows no connected variability between NH and SH SSTs.
![Page 21: A Brief Introduction to Statistical Forecasting](https://reader035.fdocuments.net/reader035/viewer/2022070605/5870a1dd1a28ab431c8b4b65/html5/thumbnails/21.jpg)
Resources
• Dave Garen VIPER slides
• Dennis Hartmann lecture notes (http://www.atmos.washington.edu/~dennis/)
![Page 22: A Brief Introduction to Statistical Forecasting](https://reader035.fdocuments.net/reader035/viewer/2022070605/5870a1dd1a28ab431c8b4b65/html5/thumbnails/22.jpg)
What does z-score regression do?
1. Combines predictors into weighted indices, emphasizing good stations, minimizing bad ones.
2. Compensates for missing data with remaining data.
3. Regresses index against target predictand.
Credit: Tom Pagano
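The three steps can be sketched as below, assuming NumPy. The slides do not say how the weights are chosen, so this sketch assumes each station is weighted by its squared correlation with the predictand and that all stations are positively related to it. The function names and data are hypothetical.

```python
import numpy as np

def zscore_index(X, y):
    """Weighted composite of station z-scores; NaN marks a missing value."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Standardize each station using only the years it reported.
    Z = (X - np.nanmean(X, axis=0)) / np.nanstd(X, axis=0, ddof=1)
    # Assumed weighting: squared correlation of each station with the predictand.
    weights = np.empty(X.shape[1])
    for j in range(X.shape[1]):
        have = ~np.isnan(X[:, j])
        weights[j] = np.corrcoef(X[have, j], y[have])[0, 1] ** 2
    # A missing station gets zero weight that year; the rest are renormalized.
    present = ~np.isnan(Z)
    w = np.where(present, weights, 0.0)
    index = np.sum(np.where(present, Z, 0.0) * w, axis=1) / w.sum(axis=1)
    return index, weights

def regress_on_index(index, y):
    """Simple linear regression of the predictand on the composite index."""
    slope, intercept = np.polyfit(index, y, 1)
    return intercept, slope

# Hypothetical data: three stations, six years, two missing reports.
X = np.array([[10.0, 22.0, 31.0],
              [14.0, 25.0, np.nan],
              [8.0, 18.0, 27.0],
              [16.0, 30.0, 40.0],
              [12.0, np.nan, 33.0],
              [18.0, 33.0, 44.0]])
y = np.array([95.0, 120.0, 80.0, 150.0, 110.0, 170.0])

index, weights = zscore_index(X, y)
intercept, slope = regress_on_index(index, y)
print(np.round(index, 2), np.round(weights, 2))
```

Every year still gets an index value as long as at least one station reported, which is how the method compensates for missing data.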
![Page 23: A Brief Introduction to Statistical Forecasting](https://reader035.fdocuments.net/reader035/viewer/2022070605/5870a1dd1a28ab431c8b4b65/html5/thumbnails/23.jpg)
What is a z-score?
A z-score is a “normalized anomaly”:
Z = (value - average) / standard deviation
Credit: Tom Pagano
![Page 26: A Brief Introduction to Statistical Forecasting](https://reader035.fdocuments.net/reader035/viewer/2022070605/5870a1dd1a28ab431c8b4b65/html5/thumbnails/26.jpg)
What is a z-score?
A z-score is a “normalized anomaly”:
Z = (value - average) / standard deviation
[Figure: example with avg = 60 and stdev = 15; other labels shown: 30, 135]
Z = (90 – 60)/15 = +2
Credit: Tom Pagano
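The worked example translates directly into code. A minimal sketch, with a hypothetical function name:

```python
def z_score(value, average, standard_deviation):
    """Normalized anomaly: distance from the average in standard deviations."""
    if standard_deviation <= 0:
        raise ValueError("standard deviation must be positive")
    return (value - average) / standard_deviation

# Values from the slide: value 90, average 60, standard deviation 15.
print(z_score(90, 60, 15))  # 2.0
```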
![Page 27: A Brief Introduction to Statistical Forecasting](https://reader035.fdocuments.net/reader035/viewer/2022070605/5870a1dd1a28ab431c8b4b65/html5/thumbnails/27.jpg)
How good are the results?
Under conditions of serially complete data and relatively “normal” conditions, PCA and Z-Score are effectively indistinguishable*
Skill and behavior are similar to the official published outlooks**
However… Any tool is a weapon if you hold it right. (aka “A fool with a tool is still a tool”)
*Viper technical note - 1 basin
** Pagano dissertation – 29 basins
Credit: Tom Pagano
![Page 28: A Brief Introduction to Statistical Forecasting](https://reader035.fdocuments.net/reader035/viewer/2022070605/5870a1dd1a28ab431c8b4b65/html5/thumbnails/28.jpg)
Super Quick Primer on VIPER
![Page 34: A Brief Introduction to Statistical Forecasting](https://reader035.fdocuments.net/reader035/viewer/2022070605/5870a1dd1a28ab431c8b4b65/html5/thumbnails/34.jpg)
The Viper Main Interface: Layout and interpretation
• Selecting predictors and predictands
• Predictors: quality, availability
• Probability bounds
• Forecast vs observed time series
• Station availability, weights
• Fcst vs obs scatterplot
• Helper variable
• Scatterplot/Forecast progression
• Settings
• Global month changes
• Historical statistics
Credit: Tom Pagano
![Page 35: A Brief Introduction to Statistical Forecasting](https://reader035.fdocuments.net/reader035/viewer/2022070605/5870a1dd1a28ab431c8b4b65/html5/thumbnails/35.jpg)
The Viper Main Interface: Layout and interpretation
There’s more if you scroll right: relate any variable to another.
Credit: Tom Pagano