Quantile Regression - University of Washington

23
Quantile Regression Caren Marzban Applied Physics Lab., Department of Statistics Univ. of Washington, Seattle, WA, USA 98195 CAPS, University of Oklahoma, Norman, OK Abstract The prediction from most regression models - be it multiple regression, neural networks, trees, etc. - is a point estimate of the conditional mean of a response (i.e., quantity being predicted), given a set of predictors. However, the conditional mean measures only the ”center” of the conditional distribution of the response. A more complete summary of the conditional distribution is provided by its quantiles. The 0.5 quantile (i.e., the median) can serve as a measure of the center, and the 0.9 quantile marks the value of the response below which resides 90quantile and the 0.05 quantile serves as the 90thereby conveying uncertainty. Quantiles arise naturally in environmental sciences. For example, one may desire to know the lowest level (e.g., 0.1 quantile) of a river, given the amount of snowpack; or the highest tempera- ture (e.g., the 0.9 quantile), given cloud cover. Recent advances in computing allow the development of regression models for predicting a given quantile of the conditional distribution, both parametrically and nonparametrically. The general approach is called Quantile Regression, but the methodology (of conditional quantile estimation) applies to any statistical model, be it multiple regression, support vector machines, or random forests. In this talk, the principles of quantile regression are reviewed and the methodology is illustrated through several examples. The technique and the examples display many of the features common in both machine learning and statistics. Motivation Q: Given data y i ,i =1, 2, ..., n, what prediction α minimizes MSE E = 1 n n i 2 i = 1 n n i (y i - α) 2 ? 1

Transcript of Quantile Regression - University of Washington

Page 1: Quantile Regression - University of Washington

Quantile Regression

Caren Marzban

Applied Physics Lab., Department of Statistics

Univ. of Washington, Seattle, WA, USA 98195

CAPS, University of Oklahoma, Norman, OK

Abstract

The prediction from most regression models - be it multiple regression, neural networks, trees, etc. -is a point estimate of the conditional mean of a response (i.e., quantity being predicted), given a set ofpredictors. However, the conditional mean measures only the ”center” of the conditional distribution ofthe response. A more complete summary of the conditional distribution is provided by its quantiles. The0.5 quantile (i.e., the median) can serve as a measure of the center, and the 0.9 quantile marks the valueof the response below which resides 90quantile and the 0.05 quantile serves as the 90thereby conveyinguncertainty. Quantiles arise naturally in environmental sciences. For example, one may desire to knowthe lowest level (e.g., 0.1 quantile) of a river, given the amount of snowpack; or the highest tempera-ture (e.g., the 0.9 quantile), given cloud cover. Recent advances in computing allow the developmentof regression models for predicting a given quantile of the conditional distribution, both parametricallyand nonparametrically. The general approach is called Quantile Regression, but the methodology (ofconditional quantile estimation) applies to any statistical model, be it multiple regression, support vectormachines, or random forests. In this talk, the principles of quantile regression are reviewed and themethodology is illustrated through several examples. The technique and the examples display many ofthe features common in both machine learning and statistics.

Motivation

Q: Given data yi, i = 1, 2, ..., n, what prediction α minimizes MSE

E =1

n

n∑i

ε2i =

1

n

n∑i(yi − α)2 ?

1

Page 2: Quantile Regression - University of Washington

A: The sample mean of y. Proof:

dE

dα∼ 1

n

n∑i(yi − α) ∼ 1

n

n∑i

yi − α ⇐⇒ α =1

n

n∑i

yi .

Q: Given data yi, i = 1, 2, ..., n, what prediction α minimizes the MAE

E =1

n

n∑i|εi| =

1

n

n∑i|yi − α| ?

A: The sample median of y. Proof: Similar.

Q: Given data (xi, yi), what prediction y(x) = α0 + α1x minimizes MSE

E =1

n

n∑i

εi =1

n

n∑i[yi − (α0 + α1 xi)]

2?

A: The conditional sample mean of y. I.e.,

y(x) = α0 + α1x = mean of y, given x

2

Page 3: Quantile Regression - University of Washington

Moving on ...

E[y|x] = conditional mean = location of conditional distribution.

Width of conditional distribution ∼ uncertainty.

Why conditional mean? Why not conditional median?

And if median (i.e., 0.5 quantile), why not other quantiles?

Why not output = 0.1, ..., 0.5, ..., 0.9 conditional quantile (given x)?

Hence Quantile Regression (QR).

Effectively, QR produces the whole conditional distribution of y.

With it one can estimate and conduct inference on conditional quantile func-

tions. (∼ Extreme Value Theory).

90% Prediction Interval (not CI) = 0.95 quantile - 0.05 quantile.

3

Page 4: Quantile Regression - University of Washington

History

Google counts:

meteorology regression 655,000

meteorology “artificial intelligence” 495,000

meteorology “neural networks” 153,000

meteorology “quantile regression” 296

QR: Roger Koenker and Gilbert Bassett (1978), Econometrica.

QR Neural Nets: James W. Taylor (2000), Journal of Forecasting.

QR Forests: Nicolai Meinshausen (2006), Jour. of Machine Learning Research.

Nonparametric QR: Takeuchi, Le, Sears, Smola, (2006), Jour. of Machine

Learning Research.

In meteorology:

QR: John Bjrnar Bremnes (2004): Probabilistic forecasts of precipitation in

terms of quantiles using NWP model output. Mon. Wea. Rev.

Friedrichs and Hense (2007): Statistical Downscaling of Extreme Precipitation

Events Using Censored Quantile Regression. Mon. Wea. Rev.

Jeremie Juban , Lionel Fugon, George Kariniotakis (2007): Probabilistic short-

term wind power forecasting based on kernel density estimators. European

Wind Energy Conference. (QR Forests)

4

Page 5: Quantile Regression - University of Washington

Quantile Regression (Forests)

Instead of minimizing∑n

i [yi − (α0 + α1 xi)]2, minimize

n∑i

f (yi − (α0 + α1 xi))

where

f (y − q) =

β (y − q) y ≥ q

(1− β) (q − y) y < q

to obtain the βth quantile.

Instead of

E[y|x] = α0 + α1x

QR gives, for each quantile

Q[y|x] = α0 + α1x

Nonparametric QR gives

Q[y|x] = splines

QR Forests∗ give

Q[y|x] = piece-wise constant

∗ Breiman (2001): Take a bootstrap sample, develop a regression tree, repeat,

average the outputs over the samples.

5

Page 6: Quantile Regression - University of Washington

Examples

1) No. tornadoes/month, 1950-1998 (technically, not regression).

12-month period filtered out with running median filter.

Courtesy of Joseph Schaefer, Storm Prediction Center.

2) This month’s vs. last month’s number of tornadoes.

3) Global temperature, 1851-1996 (Ibid).

http://www.image.ucar.edu/GSP/Data/Climate/observed.dat

4) Max water-level (above NAVD88) vs. Max 1-hr change, 23 sites in south-

western Louisiana and southeastern Texas following Hurricane Rita, Sep 2005.

http://pubs.usgs.gov/ds/2006/220/ (Table 3).

5) Six-minute obs. water-level vs. water temperature, San Fransisco, CA,

12/29/07-12/31/07

http://tidesonline.nos.noaa.gov/data read.shtml?station info=9414290+San+Francisco,+CA

6) Annual stream flow vs. prcp, 1948-1997, Big Sioux, Dell Rapids, SD.

Courtesy of Dennis Lettenmaier, Univ. of Washington.

7) Avg. (over 1931-1960) Jan. daily-min. temp. for 56 US cities, vs. lat. lon.

Code: R (packages: splines, quantreg, quantregForest).

Plot 0.05, 0.25, 0.5, 0.75, 0.95 quantiles, and the least-square fit (dashed).

Disclaimer: All substantive conclusions are cursory.

6

Page 7: Quantile Regression - University of Washington

Tornado Time Series

0.2 0.4 0.6 0.8

15

25

35

(Intercept)

oo

o

o o

oo

o

o

0.2 0.4 0.6 0.8

0.0

50

.08

x

o

o oo

o oo

o o

No. of tornadoes is increasing with time.

The quantiles fan out slightly.

7

Page 8: Quantile Regression - University of Washington

●●●●

●●●

●●●●●●●●●●

●●

●●●●●

●●●●●●●●●●●●●●●

●●●

●●●●●●

●●●●●●

●●●●●

●●●●

●●●●●●●

●●●●●●●

●●●●

●●●

●●●●●●●●●●●

●●

●●●●●

●●●●

●●●●●

●●●●

●●●●●●

●●●●

●●●

●●●●●●●●●●●

●●

●●●

●●●

●●●●●●●

●●●●●●●

●●●●

●●●●●●●●●●●●●

●●●●

●●●●●

●●●●

●●●●

●●●●

●●●

●●●

●●

●●●

●●●●●

●●●

●●

●●●●●●●

●●●●

●●●●●●●

●●

●●●●

●●●●●●●●●

●●

●●●●●

●●●

●●●●●●●●

●●●

●●

●●●

●●●●●

●●●●●

●●●●●

●●●●

●●●

●●●●●●

●●●●●●●●●●●●

●●●●●

●●●●●●●●●●●●

●●●●●

●●●●●

●●●●●

●●●●●●

●●●●

●●●●

●●

●●●●●●●●●

●●

●●

●●●

●●●●

●●●●●●●●

●●

●●●●

●●●●

●●●●●●●

●●●●●

●●●●●

●●

●●●●●●

●●●●●

●●

●●

●●●●●●●●●●●●

●●●●●

●●

●●●

●●

●●●●●●

●●

●●●●●

●●●●

●●

●●●

●●●●●

●●●●●●●●●●

●●●●

●●●●

●●

●●●●●

●●●●

●●●●●●●●●●●

●●

●●●●●

0 100 200 300 400 500 600

2040

6080

100

x

y

The max (i.e., 0.95 quantile) number of tornadoes grows faster than the min

(i.e., 0.05 quantile) number of tornadoes.

More realistically, min is constant in time but max gows?

8

Page 9: Quantile Regression - University of Washington

This Month’s vs. Last Month’s No. of Tornadoes

0.2 0.4 0.6 0.8

15

25

35

(Intercept)

oo

o

o o

oo

o

o

0.2 0.4 0.6 0.8

0.0

50

.08

x

o

o oo

o oo

o o

More/less tornadoes yesterday means more/less tornadoes today.

Center of distribution shifts up, but it spreads out too.

Median below mean, etc., i.e., distribution has a long tail on the high-y side.

9

Page 10: Quantile Regression - University of Washington

● ●●

●●

●●●●●

●●

●●

●●

● ●●

●●●●

●●● ●

● ●

●●

●●●

●●

●●

●●

●●

●●

● ●

●●

●●

●●

●●●

●●

●●

●●

●●

● ●

●●

●●●

●●

●●

●●

●●●

●●

●●

●●

●●●●●

● ●

●●

●●

●●

●●

●●●

●●

●●

●●●

●●

●●

●●

●●

●●

●●

●●

●●●

●●●●

●●

●●

●●

●●●

●●

●●●

●●

●●

●●

●●

●●

●●

●●●

●●

●●

●●

●●

●●

●●

●●

●●●

●●

●●●

● ●

●●

●●

●●

●●

● ●

0 100 200 300 400

010

020

030

040

0

x

y

Shape of conditional distribution of y changes with x.

10

Page 11: Quantile Regression - University of Washington

Global Temperature

0.2 0.4 0.6 0.8

−0

.5−

0.2

0.1

(Intercept)

o

oo o

oo

oo

o

0.2 0.4 0.6 0.8

0.0

02

50

.00

45

x

o o o

oo

o o o o

Global temperature is increasing with time.

No fanning, i.e., whole conditional distribution shifts up.

See next.

11

Page 12: Quantile Regression - University of Washington

●●●

●●

●●

●●

●●●

●●●●

●●

●●

●●

●●

●●

0 50 100 150

−0.2

0.00.2

0.40.6

x

y

●●●

●●

●●

●●

●●●

●●●●

●●

●●

●●

●●

●●

0 50 100 150

−0.2

0.0

0.2

0.4

0.6

x

y

Global temperature is increasing/undulating.

The central spread (i.e., 0.75 - 0.25 quantile) is constant.

The full spread (i.e., 0.95 - 0.05 quantile) shrinks with time.

12

Page 13: Quantile Regression - University of Washington

Hurricane Rita

0.2 0.4 0.6 0.8

02

04

0(Intercept)

o o o o o o o o

o

0.2 0.4 0.6 0.8

−0

.20

.41

.0

x

oo o o o o o

o

o

●●

●●

4 6 8 10 12 14

01

23

45

6

x

y

There is a positive relationship between max water-level and max change in

water-level.

Insufficient data for anything else!

13

Page 14: Quantile Regression - University of Washington

●●

●●

4 6 8 10 12 14

01

23

45

6

x

y

Crossing Quantiles?

Probably overfitting!

14

Page 15: Quantile Regression - University of Washington

Water Level vs. Temperature at San Fransisco

0.2 0.4 0.6 0.8

10

02

00

30

0(Intercept)

o

o

o o oo

o

o

o

0.2 0.4 0.6 0.8

−6

−4

−2

x

o

o

o o oo

o

o

o

●●●

●●

●●●●●

●●●●●

●●●

●●●

●●●●

●●●●●●●

●●●●●●

●●●●●●●●●●●●●●●●●●●●●●●●●

●●●●

●●●●●●●●●●●●●●

●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●

●●●●●●●●●●●

●●●●●●●●●●●●●●●●●●●●●●●●●●●●

●●

●●●

●●●

●●

●●

●●●

●●

●●

●●

●●●●●

●●●

●●●●●●● ● ●●●● ●●

●● ●● ● ●●●

●●●●●●

●●●●

●●

●●

●●

●●

●●●●●●

●●●

●●

●●●●●●●●

●●●●●

●●●

●●●●●●●●●

●●●●●●●●●●●●●●

●●●●●●● ●●●●●

●●●●●●●

●●●●●●●●●●●●●

●●●

●●●●●●●●●● ● ●

●●●

●●●●

●●●

●●

●●●●

●●●

●●●

●●●●●

●●

●●

●●●

●●

●●●●

●●●

● ●●●●●●

●●●●●

●●●●●●●●●

●●●●●

●●

●●●●

●●

●●

●●

●●

●●●

●●●

●●● ●

●●●

● ●●●●●●●●●

50.5 51.0 51.5 52.0

02

46

x

y

Surge inversely related to water temperature.

Reverse-fanning, i.e., less uncertainty in water-level at high temperatures.

15

Page 16: Quantile Regression - University of Washington

●●●

●●

●●●●●

●●●●●

●●●

●●●

●●●●

●●●●●●●

●●●●●●

●●●●●●●●●●●●●●●●●●●●●●●●●

●●●●

●●●●●●●●●●●●●●

●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●

●●●●●●●●●●●

●●●●●●●●●●●●●●●●●●●●●●●●●●●●

●●

●●●

●●●

●●

●●

●●●

●●

●●

●●

●●●●●

●●●

●●●●●●● ● ●●●● ●●

●● ●● ● ●●●

●●●●●●

●●●●

●●

●●

●●

●●

●●●●●●

●●●

●●

●●●●●●●●

●●●●●

●●●

●●●●●●●●●

●●●●●●●●●●●●●●

●●●●●●● ●●●●●

●●●●●●●

●●●●●●●●●●●●●

●●●

●●●●●●●●●● ● ●

●●●

●●●●

●●●

●●

●●●●

●●●

●●●

●●●●●

●●

●●

●●●

●●

●●●●

●●●

● ●●●●●●

●●●●●

●●●●●●●●●

●●●●●

●●

●●●●

●●

●●

●●

●●

●●●

●●●

●●● ●

●●●

● ●●●●●●●●●

50.5 51.0 51.5 52.0

02

46

x

y

Nonlinear, but why?

Shape of distribution of y changes with x.

16

Page 17: Quantile Regression - University of Washington

Stream Flow vs. Precipitation

x

Fre

quen

cy

4 6 8 10

02

46

810

y

Fre

quen

cy

0 500 1000 1500

02

46

810

log(y)

Fre

quen

cy

3 4 5 6 7

01

23

45

6

Transform stream flow to logarithm of stream flow.

17

Page 18: Quantile Regression - University of Washington

0.2 0.4 0.6 0.8

02

46

(Intercept)

oo o o o o o o

o

0.2 0.4 0.6 0.8

−0

.20

.41

.0

x

o o oo o o o o

o

●●

● ●

4 5 6 7 8 9 10

34

56

7

x

log(

y)

Increasing prcp leads to increasing flow.

But not for the lowest quantile (0.05)! The upper quartile (0.95) doesn’t

increase as fast as the middle portion of the distribution.

18

Page 19: Quantile Regression - University of Washington

●●

● ●

4 5 6 7 8 9 10

34

56

7

x

log(

y)

The middle portion of the distribution of flow shifts up with prcp.

Beware of extrapolation (0.05 quantile)!

19

Page 20: Quantile Regression - University of Washington

Flow vs. Prcp with Quantile Random Forrest

●●

●●

4 5 6 7 8 9 10

34

56

7

x

log(

y)

Abrupt shifts in flow when prcp is about 5.5 and 7.5.

Again, lower quantile (0.05) is mostly insensitive to prcp.

20

Page 21: Quantile Regression - University of Washington

Jan. Min. Temperatures vs. Lat. and Lon.

60 70 80 90 100 110

2530

3540

45

180−lon

lat

60 70 80 90 100 110

2530

3540

45

180−lon

lat

60 70 80 90 100 110

2530

3540

45

180−lon

lat

Quantiles: 0.99 (top), 0.5 (middle), 0.01 (bottom).

Top vs. median: SE and W mostly same; SW and NE warmer in top.

Bottom vs. median: SW and East of Mississippi cooler in bottom; No gradient

across Mexican border. (Acknowledment: Scott Sandgathe)

21

Page 22: Quantile Regression - University of Washington

Conclusions I

“Ordinary” regression, linear or not, provides only a single summary measure

for the conditional distribution of response (predictand) - the mean - given the

predictor.

Quantile regression provides a more complete picture of the conditional distri-

bution.

The idea naturally lends itself to all the generalizations that have followed

ordinary regression.

Many problems would actually benefit more from the prediction of extreme

values (rather than the mean), anyway.

Even when the expected value is of interest, QR assays uncertainy via predic-

tion intervals.

22

Page 23: Quantile Regression - University of Washington

Juvenile Conclusions

The number of tornadoes is on the rise.

Max grows faster than min, the latter being mostly constant.

The whole shape of the conditional distribution of the number of tornadoes

today varies as a function of the number of tornadoes yesterday.

Global temperature is on the rise.

The variation (between years) is dropping.

Max water level is positively associated with max change in water level.

Beware of overfitting.

Water level is negatively associated with water temperature.

The shape of the conditional distribution of the former varies with the latter.

Increased prcp leads to increased stream flow (for one station).

But not for the lowest quantiles of stream flow.

The strong temperature gradient across the Mexican border exists only in the

median, and not in the tails of the distribution.

23