Forecasting Part 1perfeval.epfl.ch/printMe/forecastPost.pdf · Forecasting = finding conditional...

Forecasting Part 1

JY Le Boudec

1 March 2015

Contents

1. What is forecasting?
2. Linear Regression
3. Estimation error vs Prediction interval
4. Avoiding Overfitting
5. Use of Bootstrap

1. What is forecasting?

Assume you have been able to define the nature of the load for your study. It remains to have an idea about its intensity.

It is impossible to forecast without error.

The good engineer should:
- Forecast what can be forecast
- Give uncertainty intervals

The rest is outside our control.

Forecasting = finding conditional distribution of future given past

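Assume we observe some data. We have observed $Y_1, \dots, Y_t$ and want to forecast $Y_{t+\ell}$.

A full forecast is the conditional distribution of $Y_{t+\ell}$ given $Y_1, \dots, Y_t$.

A point forecast is (e.g.) the mean, i.e. $\hat{Y}_t(\ell) = E(Y_{t+\ell} \mid Y_1, \dots, Y_t)$ (or the median).

A prediction interval at level 95% is $[A, B]$ such that $P(A \le Y_{t+\ell} \le B \mid Y_1, \dots, Y_t) = 0.95$.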
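The definitions above can be made concrete with a small sketch. It assumes that we can draw samples from the conditional distribution of the future value; the log-normal distribution and the helper name `summarize_forecast` are hypothetical choices for illustration only. The point forecast is then the sample mean (or median) and the prediction interval is read off the empirical quantiles.

```python
import random
import statistics

def summarize_forecast(samples, level=0.95):
    """Return (mean, median, (low, high)) for draws of the future value."""
    ordered = sorted(samples)
    n = len(ordered)
    tail = (1.0 - level) / 2.0
    low = ordered[int(tail * n)]                        # lower empirical quantile
    high = ordered[min(n - 1, int((1.0 - tail) * n))]   # upper empirical quantile
    return statistics.fmean(ordered), statistics.median(ordered), (low, high)

rng = random.Random(0)
# Hypothetical conditional distribution of the future value: log-normal draws.
draws = [rng.lognormvariate(0.0, 0.5) for _ in range(20000)]

mean, median, (low, high) = summarize_forecast(draws)
# Fraction of draws that fall inside the interval; should be close to 0.95.
coverage = sum(low <= x <= high for x in draws) / len(draws)
print(mean, median, low, high, coverage)
```

For a skewed distribution such as this one, the mean and the median differ, so the choice of point forecast matters.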

2. Use of Regression Models

Simple, often used.
Based on a model fitted over the past, assumed to hold in the future.

Prediction

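We have obtained the model

$Y_t = f(t, \theta) + \epsilon_t$

with $\epsilon_t$ iid.

The conditional distribution of $Y_{t+\ell}$ given $Y_1, \dots, Y_t$ is the distribution of

$\hat{Y}_t(\ell) + \epsilon_{t+\ell}$, with $\hat{Y}_t(\ell) = f(t+\ell, \theta)$,

because $\epsilon_{t+\ell}$ is independent of $Y_1, \dots, Y_t$ (iid assumption).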

Virus Growth Data

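We have obtained the model (in log scale)

$Y_t = \log a + \alpha t + \epsilon_t$ with $\epsilon_t \sim$ iid Laplace($\lambda$), $\lambda = 6.2205$

A 95%-prediction interval is $\hat{Y}_t(\ell) \pm \eta$,

where $\hat{Y}_t(\ell)$ is the point forecast in log scale and $\eta$ is the 97.5% quantile of the Laplace($\lambda$) distribution.

In natural scale:
Point prediction: $e^{\hat{Y}_t(\ell)}$
95%-prediction interval: $[e^{\hat{Y}_t(\ell) - \eta},\; e^{\hat{Y}_t(\ell) + \eta}]$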

[Figure: virus growth data with fitted model and prediction intervals, shown in natural scale and in log scale; $\lambda = 6.2205$]

Prediction interval at time 25

PI = [19942 ; 52248]
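The numbers on this slide can be cross-checked with a short calculation. The sketch below assumes that the interval is symmetric in log scale with half-width equal to the 97.5% quantile of the Laplace distribution, which is $\ln(20)/\lambda$; under that assumption the ratio of the two bounds in natural scale is $e^{2\eta}$ and the point prediction is their geometric mean. The variable names are illustrative.

```python
import math

lam = 6.2205                        # Laplace parameter quoted on the slide
eta = math.log(20.0) / lam          # solves 1 - 0.5 * exp(-lam * eta) = 0.975

pi_low, pi_high = 19942.0, 52248.0  # prediction interval at time 25, from the slide

# In log scale the interval is (point forecast) +/- eta, so in natural scale
# high / low = exp(2 * eta).
ratio_from_model = math.exp(2.0 * eta)
ratio_from_slide = pi_high / pi_low

# Under the same assumption, the point prediction is the geometric mean of the bounds.
point_forecast = math.sqrt(pi_low * pi_high)

print(eta, ratio_from_model, ratio_from_slide, point_forecast)
```

The point prediction sits closer to the lower bound than to the upper bound, which is the asymmetry expected in natural scale.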

Say what is true, for this model

A. The width of the prediction interval is constant and equal to $2 \times 1.96\,\sigma$
B. A is true and $\sigma$ is the root mean square of the residuals up to time $t$
C. A is true and $\sigma$ is the root mean square of the forecast errors if we apply the model up to time $t$
D. B and C
E. None of the above
F. I don't know

[Bar chart: audience poll results for answers A to F; vote shares as extracted: 60%, 0%, 0%, 0%, 40%, 0%]

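Solution

The 95%-prediction interval is $\hat{Y}_t(\ell) \pm 1.96\,\sigma$.

The model is fitted with least squares, therefore $\sigma$ is the root mean square of the residuals (Thm 3.1).

Note that the residuals are equal to the forecast errors: for each past time $s \le t$, the residual is the observed value minus the value forecast by the fitted model.

Answer D.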
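As an illustration of this solution, here is a minimal sketch of a least-squares fit with Gaussian noise. The data (a linear trend with noise of standard deviation 2) and the helper names `fit_line` and `prediction_interval` are hypothetical; the slide's model has more terms. The sketch estimates $\sigma$ as the root mean square of the residuals and shows that the interval width does not depend on the lead time.

```python
import math
import random

def fit_line(ts, ys):
    """Least-squares fit of y = a + b * t; returns (a, b)."""
    n = len(ts)
    t_bar = sum(ts) / n
    y_bar = sum(ys) / n
    sxx = sum((t - t_bar) ** 2 for t in ts)
    sxy = sum((t - t_bar) * (y - y_bar) for t, y in zip(ts, ys))
    b = sxy / sxx
    return y_bar - b * t_bar, b

rng = random.Random(1)
ts = list(range(1, 201))
# Hypothetical data: linear trend plus iid Gaussian noise with sigma = 2.
ys = [10.0 + 0.3 * t + rng.gauss(0.0, 2.0) for t in ts]

a, b = fit_line(ts, ys)
residuals = [y - (a + b * t) for t, y in zip(ts, ys)]
# Root mean square of the residuals, used as the estimate of sigma.
sigma_hat = math.sqrt(sum(e * e for e in residuals) / len(residuals))

def prediction_interval(lead):
    """95% prediction interval for the value `lead` steps after the last observation."""
    point = a + b * (ts[-1] + lead)
    return point - 1.96 * sigma_hat, point + 1.96 * sigma_hat

widths = []
for lead in (1, 10, 50):
    low, high = prediction_interval(lead)
    widths.append(high - low)
print(sigma_hat, widths)
```

This sketch ignores the uncertainty on the fitted coefficients, so the width is exactly $2 \times 1.96\,\hat{\sigma}$ for every lead time.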

Say what is true, for this model

A. In log scale the width of prediction intervals is constant and is equal to $2\eta$, where $\eta$ is the 97.5%-quantile of Laplace($\lambda$)
B. A is true and $1/\lambda$ is the mean square of log-scale residuals
C. A is true and $1/\lambda$ is the mean of the absolute value of log-scale residuals
D. None of the above
E. I don't know

[Bar chart: audience poll results for answers A to E; vote shares as extracted: 20%, 0%, 0%, 53%, 27%]

Solution

A is true because the model in which we believe assumes Laplace noise; further, $1/\lambda$ is the mean of the absolute value of the residuals (Thm 3.2). Answer C.

Note that the residuals are also the forecast errors (in log scale).

Note that in natural scale, the width of the prediction interval is not constant (and the interval is not symmetric).

What is the 97.5% quantile of the Laplace($\lambda$) distribution?

[Answer options A to F (formulas in $\lambda$) and G. I don't know; bar chart of audience poll results, vote shares as extracted: 0%, 6%, 0%, 47%, 24%, 12%, 12%]

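Solution

$\lambda$ is a scale parameter of the Laplace distribution, hence the quantile should scale like $1/\lambda$

(hint: to simulate Laplace noise, with proba ½ you do $X = -\frac{1}{\lambda}\ln U$ and with proba ½ you do $X = +\frac{1}{\lambda}\ln U$, where $U$ is uniform on $(0,1)$)

All answers except D are thus impossible. Answer D.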

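Solution

From the CDF of Laplace we obtain $1 - \frac{1}{2} e^{-\lambda \eta} = 0.975$, which gives $\eta = \frac{\ln 20}{\lambda} \approx \frac{3.00}{\lambda}$.

Note that the 95%-prediction interval for Laplace noise is $\hat{Y}_t(\ell) \pm \eta$ where $\eta$ is the 97.5%-quantile, because the pdf is symmetric.

We can also obtain $\eta$ by computing the 95%-quantile of the absolute value of Laplace noise, which is an exponential RV, i.e. solve for $\eta$: $e^{-\lambda \eta} = 0.05$.

Thus $\eta = -\frac{\ln 0.05}{\lambda} \approx \frac{3.00}{\lambda}$.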
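The closed form can be checked by simulation. The sketch below follows the idea of the hint: Laplace noise is drawn as an exponential magnitude with a random sign. The helper name `laplace_noise`, the seed, and the sample size are arbitrary choices made for this illustration.

```python
import math
import random

def laplace_noise(lam, rng):
    """Draw Laplace(lam) noise: an Exp(lam) magnitude with a random sign."""
    magnitude = rng.expovariate(lam)   # mean of the magnitude is 1 / lam
    return magnitude if rng.random() < 0.5 else -magnitude

lam = 6.2205
rng = random.Random(2)
draws = sorted(laplace_noise(lam, rng) for _ in range(200000))

eta_exact = math.log(20.0) / lam                  # from the CDF, about 3.00 / lam
eta_empirical = draws[int(0.975 * len(draws))]    # empirical 97.5% quantile
# Fraction of draws inside [-eta, +eta]; should be close to 0.95 by symmetry.
inside = sum(-eta_exact <= x <= eta_exact for x in draws) / len(draws)
print(eta_exact, eta_empirical, inside)
```

Changing `lam` rescales both the exact and the empirical quantile by the same factor, which is the scaling argument used in the first solution.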

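3. How about the estimation error?

In practice we estimate the model parameter $\theta$ from $y_1, \dots, y_t$.

When computing the forecast, we pretend $\theta$ is known, and thus make an estimation error (i.e. we ignore confidence intervals on $\theta$; it is hoped that the estimation error is much less than the prediction interval).

Let us return to an example we already saw. Assume we observe $Y_1, \dots, Y_t$ and want to forecast $Y_{t+1}$. Assume that we believe in the model $Y_i \sim$ iid $N(\mu, \sigma^2)$. We estimate and obtain $\hat{\mu}$, $\hat{\sigma}$.

Point prediction for $Y_{t+1}$: if we ignore estimation uncertainty, $\hat{\mu}$; if we account for estimation uncertainty, $\hat{\mu}$.

95%-prediction interval for $Y_{t+1}$ if we ignore estimation uncertainty: $\hat{\mu} \pm 1.96\,\hat{\sigma}$

Thm 2.6 says that an exact interval that accounts for estimation uncertainty is $\hat{\mu} \pm \eta' \sqrt{1 + \frac{1}{t}}\,\hat{\sigma}$, where $\eta'$ is the 97.5% quantile of Student's $t_{t-1}$ distribution; compare to $\hat{\mu} \pm 1.96\,\hat{\sigma}$.

The estimation error decays in $1/\sqrt{t}$ and is small for large $t$.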
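A small simulation can show how much the estimation error matters. The sketch below measures how often the naive interval $\hat{\mu} \pm 1.96\,\hat{\sigma}$ contains the next value when the data are iid standard normal (a hypothetical choice), using the sample standard deviation with divisor $t-1$ (also an assumption of this sketch). It also prints the factor $\sqrt{1 + 1/t}$ that appears in the exact interval.

```python
import math
import random

def naive_coverage(t, trials, rng):
    """Fraction of trials in which mu_hat +/- 1.96 * sigma_hat contains the next value."""
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(0.0, 1.0) for _ in range(t)]
        mu_hat = sum(sample) / t
        # Sample standard deviation with divisor t - 1.
        sigma_hat = math.sqrt(sum((x - mu_hat) ** 2 for x in sample) / (t - 1))
        next_value = rng.gauss(0.0, 1.0)
        if abs(next_value - mu_hat) <= 1.96 * sigma_hat:
            hits += 1
    return hits / trials

rng = random.Random(3)
cov_small = naive_coverage(5, 4000, rng)     # few observations: estimation error matters
cov_large = naive_coverage(200, 4000, rng)   # many observations: close to nominal 95%
inflation = {t: math.sqrt(1.0 + 1.0 / t) for t in (5, 200)}
print(cov_small, cov_large, inflation)
```

With only a handful of observations the naive interval is too narrow; with a few hundred, ignoring the estimation error costs very little.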

Exact Formulas exist for Linear Regression with LS


Take‐Home Message

When we use a fitted model there is some uncertainty that adds to the prediction intervals.

In most cases we can ignore the model uncertainty because it impacts the prediction intervals only marginally.

In some rare cases (e.g. linear regression with Gaussian errors) there are exact formulas.

4. The Overfitting Problem

Assume we want to improve our model by adding more parameters: add a polynomial term + more harmonics

[Figure: fitted models for two parameter settings, labelled "0, 1" and "10, 3"]

Prediction for the better model

This is the overfitting problem: a better fit is not the best predictor; in the extreme case, a model can fit exactly the data and is unable to model it.

How to avoid overfitting
Method 1: use of test data
Method 2: information criterion

Method 2: Information Criteria

We saw that the likelihood can be used to define a score function for the model fitting phase, e.g. for a LS model.

To avoid overfitting, add a penalty term to the score.
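The penalty idea can be sketched with the two most common criteria, AIC and BIC, in their standard textbook form: minus twice the maximized log-likelihood plus $2k$ (AIC) or $k \ln n$ (BIC), where $k$ is the number of parameters. The slide's own formulas did not survive extraction, so treating them as these standard forms is an assumption. The sketch fits polynomials of increasing degree by least squares to hypothetical linear data; harmonics are left out for brevity, and `solve` and `fit_poly` are illustrative helpers.

```python
import math
import random

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def fit_poly(ts, ys, degree):
    """Least-squares polynomial fit via normal equations; returns (coefficients, RSS)."""
    rows = [[t ** j for j in range(degree + 1)] for t in ts]
    gram = [[sum(r[i] * r[j] for r in rows) for j in range(degree + 1)]
            for i in range(degree + 1)]
    rhs = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(degree + 1)]
    coef = solve(gram, rhs)
    rss = sum((y - sum(c * v for c, v in zip(coef, r))) ** 2 for r, y in zip(rows, ys))
    return coef, rss

rng = random.Random(4)
n = 80
ts = [i / (n - 1) for i in range(n)]                      # time rescaled to [0, 1]
ys = [1.0 + 5.0 * t + rng.gauss(0.0, 0.5) for t in ts]    # hypothetical linear data

scores = {}
for degree in range(0, 6):
    _, rss = fit_poly(ts, ys, degree)
    k = degree + 2                        # polynomial coefficients plus the noise variance
    # -2 * (maximized Gaussian log-likelihood), with sigma^2 estimated by rss / n.
    minus2loglik = n * math.log(rss / n) + n * (1.0 + math.log(2.0 * math.pi))
    scores[degree] = {"rss": rss,
                      "aic": minus2loglik + 2 * k,
                      "bic": minus2loglik + k * math.log(n)}

best_aic = min(scores, key=lambda d: scores[d]["aic"])
best_bic = min(scores, key=lambda d: scores[d]["bic"])
print(best_aic, best_bic)
```

The residual sum of squares can only decrease as the degree grows, so the fit alone always prefers the largest model; the penalty term is what stops the selection from drifting there.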

Best Model for Internet Data, d=1, h up to 10

Information criteria are able to identify the best model

Best Model for Internet Data, h=3, d up to 10

Information criteria are not able to identify the best model; the polynomial models are not a good class of models

Say what is true

A. When doing the fit and if we use an information criterion, we can use all data available up to time $t$
B. When doing the fit and if we use a score + test data we can use all data available up to time $t$
C. A and B
D. None
E. I don't know

[Bar chart: audience poll results for answers A to E; vote shares as extracted: 75%, 6%, 6%, 13%, 0%]

Solution

A is true.

B is not true: if we use test data we need to keep a subset of the data for testing the prediction accuracy. We should not use this subset of data for fitting the model, otherwise the prediction performance is not properly assessed.

Answer A.
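The test-data method can be sketched as follows. The data, the split point, and the two candidate models are hypothetical: a straight line fitted by least squares, and a "memorizer" that returns the training value at the nearest training time and therefore fits the training data exactly. Only the first part of the data is used for fitting; the held-out part is used to compare prediction accuracy.

```python
import random

def fit_line(ts, ys):
    """Least-squares fit of y = a + b * t; returns (a, b)."""
    n = len(ts)
    t_bar = sum(ts) / n
    y_bar = sum(ys) / n
    sxx = sum((t - t_bar) ** 2 for t in ts)
    sxy = sum((t - t_bar) * (y - y_bar) for t, y in zip(ts, ys))
    b = sxy / sxx
    return y_bar - b * t_bar, b

def mse(model, ts, ys):
    """Mean squared error of `model` (a function of time) on the given data."""
    return sum((y - model(t)) ** 2 for t, y in zip(ts, ys)) / len(ts)

rng = random.Random(5)
ts = list(range(1, 201))
ys = [10.0 + 0.3 * t + rng.gauss(0.0, 2.0) for t in ts]   # hypothetical data

split = 160                                   # fit on the first 160 points only
train_t, train_y = ts[:split], ys[:split]
test_t, test_y = ts[split:], ys[split:]       # held out for assessing predictions

a, b = fit_line(train_t, train_y)

def linear(t):
    return a + b * t

lookup = dict(zip(train_t, train_y))

def memorizer(t):
    """Return the training value observed at the training time nearest to t."""
    nearest = min(train_t, key=lambda s: abs(s - t))
    return lookup[nearest]

train_err = {"linear": mse(linear, train_t, train_y),
             "memorizer": mse(memorizer, train_t, train_y)}
test_err = {"linear": mse(linear, test_t, test_y),
            "memorizer": mse(memorizer, test_t, test_y)}
print(train_err, test_err)
```

The memorizer has zero error on the data it was fitted to and a much larger error on the held-out data, which is the extreme case of overfitting described earlier.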

5. Use of Bootstrap

Assume we have a prediction model $Y_t = f(t, \theta) + \epsilon_t$.

The estimation of $\theta$ is done assuming some distribution for $\epsilon_t$. Assume this distribution is only approximately known; we can improve the prediction intervals if we use a better approximation of this distribution.

For example, we can use the principle of the Bootstrap, i.e. estimate the distribution of $\epsilon_t$ by its empirical distribution.

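Assume the $\epsilon_t$ are iid and apply Theorem 2.5 to $\epsilon_{t+\ell}$.

This gives the algorithm:
1. Estimate $\theta$ by some method
2. Estimate the residuals $e_s = y_s - f(s, \hat{\theta})$, $s = 1, \dots, t$
3. Compute a prediction interval $[m, M]$ for the noise from the order statistics of the residuals (Thm 2.5)
4. Prediction interval for $Y_{t+\ell}$: $[\hat{Y}_t(\ell) + m,\; \hat{Y}_t(\ell) + M]$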
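The four steps can be sketched as follows. The data (a linear trend with skewed, centred exponential noise) and the helper names are hypothetical. For step 3 the sketch picks the order statistics of rank $\lfloor 0.025(n+1) \rfloor$ and $\lceil 0.975(n+1) \rceil$ among the $n$ residuals, a common choice for a 95% interval; the exact indices used on the slide are not known, so this is an assumption.

```python
import math
import random

def fit_line(ts, ys):
    """Least-squares fit of y = a + b * t; returns (a, b)."""
    n = len(ts)
    t_bar = sum(ts) / n
    y_bar = sum(ys) / n
    sxx = sum((t - t_bar) ** 2 for t in ts)
    sxy = sum((t - t_bar) * (y - y_bar) for t, y in zip(ts, ys))
    b = sxy / sxx
    return y_bar - b * t_bar, b

rng = random.Random(6)
ts = list(range(1, 201))
# Hypothetical data: linear trend plus skewed noise (exponential, shifted to mean 0).
ys = [5.0 + 0.2 * t + (rng.expovariate(1.0) - 1.0) for t in ts]

# Step 1: estimate the model parameters (here by least squares).
a, b = fit_line(ts, ys)

# Step 2: estimate the residuals and sort them.
residuals = sorted(y - (a + b * t) for t, y in zip(ts, ys))

# Step 3: interval [m, M] for the noise from order statistics (ranks are 1-based).
n = len(residuals)
j = math.floor(0.025 * (n + 1))
k = math.ceil(0.975 * (n + 1))
m, M = residuals[j - 1], residuals[k - 1]

# Step 4: shift the point forecast by m and M.
def prediction_interval(lead):
    """Bootstrap-style 95% prediction interval `lead` steps after the last observation."""
    point = a + b * (ts[-1] + lead)
    return point + m, point + M

low, high = prediction_interval(10)
print(m, M, low, high)
```

Because the noise in this example is skewed, the interval extends further above the point forecast than below it, something a Gaussian or Laplace assumption would not capture.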

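Example

For this example, the bootstrap (done in log scale) gives asymmetric prediction interval

[Figure: prediction intervals obtained with the bootstrap and assuming Laplace noise]

Example

For this example, the bootstrap gives slightly smaller intervals than the ones based on Gaussian noise

[Figure: prediction intervals obtained assuming Gaussian noise and with the bootstrap]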
