Forecasting Part 1 · perfeval.epfl.ch/printMe/forecastPost.pdf · Forecasting = finding conditional...
Forecasting Part 1
JY Le Boudec
1 March 2015
Contents
1. What is forecasting?
2. Linear Regression
3. Estimation error vs Prediction interval
4. Avoiding Overfitting
5. Use of Bootstrap
1. What is forecasting?
Assume you have been able to define the nature of the load for your study.
It remains to have an idea about its intensity.
It is impossible to forecast without error.
The good engineer should forecast what can be forecast and give uncertainty intervals.
The rest is outside our control.
Forecasting = finding conditional distribution of future given past
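Assume we observe some data.
We have observed $Y_1, \dots, Y_t$ and want to forecast $Y_{t+\ell}$.
A full forecast is the conditional distribution of $Y_{t+\ell}$ given $Y_1, \dots, Y_t$.
A point forecast is (e.g.) the mean, i.e. $E(Y_{t+\ell} \mid Y_1, \dots, Y_t)$ (or the median).
A prediction interval at level 95% is an interval $[A, B]$ such that $P(A \le Y_{t+\ell} \le B \mid Y_1, \dots, Y_t) = 0.95$.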
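As an illustration of these definitions, the sketch below summarizes a set of draws from the conditional distribution of the future value: the mean and median as point forecasts, and the 2.5% and 97.5% empirical quantiles as a 95% prediction interval. The function name `summarize_forecast` and the sample values are hypothetical; in practice the draws would come from whatever model is believed to describe the data.

```python
from statistics import mean, median, quantiles

def summarize_forecast(samples):
    """Summarize draws from the conditional distribution of the future value."""
    # n=40 yields 39 cut points at 2.5%, 5%, ..., 97.5%
    cuts = quantiles(samples, n=40)
    return {
        "mean": mean(samples),            # one possible point forecast
        "median": median(samples),        # another possible point forecast
        "interval_95": (cuts[0], cuts[-1]),
    }

# Hypothetical draws, for illustration only
samples = [float(v) for v in range(1, 401)]
summary = summarize_forecast(samples)
print(summary)
```

The full forecast is the whole list of draws; the point forecast and the interval are only two summaries of it.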
2. Use of Regression Models
Simple, often used.
Based on a model fitted over the past, assumed to hold in the future.
Prediction
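We have obtained the model, with an iid noise term $\epsilon_t$.
The conditional distribution of $Y_{t+\ell}$ given $Y_1, \dots, Y_t$ is that of
$Y_{t+\ell} = \hat{Y}_t(\ell) + \epsilon_{t+\ell}$,
where $\hat{Y}_t(\ell)$ is the value of the fitted model at time $t+\ell$ and $\epsilon_{t+\ell}$ has the same distribution as the noise in the model,
because $\epsilon_{t+\ell}$ is independent of $Y_1, \dots, Y_t$ (iid assumption).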
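A minimal sketch of prediction with a regression model, assuming a straight-line trend fitted by least squares and iid Gaussian noise (both are assumptions made for this illustration, as are the function names and the data). The noise standard deviation is estimated by the root mean square of the residuals, and the estimation uncertainty on the parameters is ignored.

```python
from math import sqrt
from statistics import NormalDist

def fit_linear_trend(y):
    """Least-squares fit of y[s] = a + b*s for s = 0..len(y)-1."""
    n = len(y)
    s_mean = (n - 1) / 2
    y_mean = sum(y) / n
    sxx = sum((s - s_mean) ** 2 for s in range(n))
    sxy = sum((s - s_mean) * (v - y_mean) for s, v in enumerate(y))
    b = sxy / sxx
    a = y_mean - b * s_mean
    residuals = [v - (a + b * s) for s, v in enumerate(y)]
    sigma = sqrt(sum(r * r for r in residuals) / n)  # root mean square of residuals
    return a, b, sigma

def forecast(y, lag, level=0.95):
    """Point forecast and prediction interval `lag` steps after the last sample."""
    a, b, sigma = fit_linear_trend(y)
    point = a + b * (len(y) - 1 + lag)
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for level 0.95
    return point, (point - z * sigma, point + z * sigma)

# Hypothetical data, for illustration only
y = [1.2, 2.9, 5.1, 7.0, 8.8, 11.1]
point, (lo, hi) = forecast(y, lag=2)
print(point, lo, hi)
```

Under these assumptions the interval width does not depend on the lag, because the same noise distribution is assumed to hold at every future time.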
Virus Growth Data
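We have obtained a model in log scale with noise $\epsilon_t \sim$ iid Laplace($\lambda$), $\lambda = 6.2205$.
A 95%-prediction interval is $\hat{Y}_t(\ell) \pm \eta$, where $\hat{Y}_t(\ell)$ is the log-scale point prediction and $\eta$ is the 97.5% quantile of the Laplace($\lambda$) distribution.
In natural scale:
Point prediction: $e^{\hat{Y}_t(\ell)}$
95%-prediction interval: $[e^{\hat{Y}_t(\ell) - \eta}, e^{\hat{Y}_t(\ell) + \eta}]$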
[Figure: data and prediction shown in natural scale and in log scale; $\lambda = 6.2205$]
Prediction interval at time 25
PI = [19942 ; 52248]
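The interval quoted above can be checked numerically. The sketch below assumes that 6.2205 is the parameter $\lambda$ of Laplace noise with density $(\lambda/2) e^{-\lambda |x|}$ in log scale, so that the half-width of the log-scale interval is $\eta = -\ln(0.05)/\lambda$. The helper name `natural_scale_interval` is hypothetical, and the log-scale point prediction is recovered here from the quoted bounds rather than from the fitted model.

```python
from math import exp, log

lam = 6.2205                 # assumed to be the Laplace parameter quoted on the slide
eta = -log(0.05) / lam       # 97.5% quantile of Laplace(lam), about 3.00 / lam

def natural_scale_interval(log_point, eta):
    """Map the log-scale interval log_point +/- eta back to natural scale."""
    return exp(log_point - eta), exp(log_point + eta)

# Consistency check against the interval quoted for time 25
lo, hi = 19942, 52248
half_width_log = (log(hi) - log(lo)) / 2
print(eta, half_width_log)   # the two values should nearly coincide

log_point = (log(hi) + log(lo)) / 2   # midpoint of the interval in log scale
print(natural_scale_interval(log_point, eta))
```

The interval is symmetric around the point prediction in log scale only; after exponentiation the upper part is wider than the lower part.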
Say what is true, for this model
A. The width of the prediction interval is constant and equal to $2 \times 1.96\,\sigma$
B. A is true and $\sigma$ is the root mean square of the residuals up to time $t$
C. A is true and $\sigma$ is the root mean square of the forecast errors if we apply the model up to time $t$
D. B and C
E. None of the above
F. I don't know
[Figure: poll results for the question above; response shares 60%, 0%, 0%, 0%, 40%, 0%]
Solution
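The 95%-prediction interval is $\hat{Y}_t(\ell) \pm 1.96\,\sigma$, where $\hat{Y}_t(\ell)$ is the point forecast.
The model is fitted with least squares, therefore $\sigma$ is the root mean square of the residuals (Thm 3.1).
Note that the residuals are equal to the forecast errors: forecast error = observation minus forecast = residual.
Answer D.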
Say what is true, for this model
A. In log scale the width of prediction intervals is constant, with half-width equal to the 97.5%-quantile of Laplace($\lambda$)
B. A is true and $1/\lambda$ is the mean square of log-scale residuals
C. A is true and $1/\lambda$ is the mean of the absolute value of log-scale residuals
D. None of the above
E. I don't know
[Figure: poll results for the question above; response shares 20%, 0%, 0%, 53%, 27%]
Solution
A is true because the model in which we believe assumes Laplace noise; further, $1/\lambda$ is the mean of the absolute value of the residuals (Thm 3.2). Answer C.
Note that the residuals are also the forecast errors (in log scale).
Note that in natural scale, the width of the prediction interval is not constant (and the interval is not symmetric).
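The estimate of the Laplace parameter can be sketched in a few lines, assuming the noise density is $(\lambda/2) e^{-\lambda |x|}$, for which the maximum likelihood estimate of $1/\lambda$ is the mean absolute residual. The function name and the residual values are hypothetical.

```python
def laplace_rate_from_residuals(residuals):
    """Estimate lambda for Laplace noise with density (lambda/2) * exp(-lambda * |x|)."""
    mean_abs = sum(abs(r) for r in residuals) / len(residuals)
    return 1.0 / mean_abs   # 1/lambda is the mean absolute residual

# Hypothetical log-scale residuals, for illustration only
residuals = [0.10, -0.25, 0.05, -0.15, 0.20]
lam_hat = laplace_rate_from_residuals(residuals)
print(lam_hat)
```

Compare with the least-squares case, where the noise scale is estimated by the root mean square of the residuals instead of their mean absolute value.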
What is the 97.5% quantile of the Laplace($\lambda$) distribution?
A. to F. [candidate formulas, not recoverable from the extracted text]
G. I don't know
[Figure: poll results for the question above; response shares 0%, 6%, 0%, 47%, 24%, 12%, 12%]
Solution
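$1/\lambda$ is a scale parameter of the Laplace($\lambda$) distribution, hence the quantile should scale like $1/\lambda$
(hint: to simulate Laplace noise, draw $E = -\ln(U)/\lambda$ with $U$ uniform on $(0,1)$; with proba 1/2 you do $\epsilon = E$ and with proba 1/2 you do $\epsilon = -E$).
All answers except D are thus impossible. Answer D.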
Solution
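From the CDF of Laplace($\lambda$) we obtain $1 - \frac{1}{2} e^{-\lambda \eta} = 0.975$, which gives $\eta = -\ln(0.05)/\lambda \approx 3.00/\lambda$.
Note that the 95%-prediction interval for Laplace noise is $\hat{Y}_t(\ell) \pm \eta$, where $\hat{Y}_t(\ell)$ is the point prediction and $\eta$ is the 97.5%-quantile, because the pdf is symmetric.
We can also obtain $\eta$ by computing the 95%-quantile of the absolute value of Laplace noise, which is an exponential RV, i.e. solve $1 - e^{-\lambda \eta} = 0.95$ for $\eta$.
Thus $\eta = -\ln(0.05)/\lambda \approx 3.00/\lambda$.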
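The quantile can also be checked by simulation. The sketch below assumes the Laplace($\lambda$) density $(\lambda/2) e^{-\lambda |x|}$ and draws the noise as an exponential magnitude with a random sign; the function names, the seed and the value $\lambda = 2$ are arbitrary choices made for this illustration.

```python
import random
from math import log

def laplace_sample(lam, rng):
    """Draw Laplace(lam) noise: an exponential magnitude with a random sign."""
    magnitude = -log(1.0 - rng.random()) / lam   # exponential with rate lam
    return magnitude if rng.random() < 0.5 else -magnitude

def laplace_quantile_975(lam):
    """Closed form: solve 1 - 0.5 * exp(-lam * eta) = 0.975 for eta."""
    return -log(0.05) / lam

rng = random.Random(0)       # fixed seed so the run is reproducible
lam = 2.0
draws = sorted(laplace_sample(lam, rng) for _ in range(200_000))
empirical = draws[int(0.975 * len(draws))]
print(empirical, laplace_quantile_975(lam))
```

The empirical value should be close to $3.00/\lambda$, and doubling $\lambda$ halves it, which is the scaling argument used in the first solution.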
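3. How about the estimation error?
In practice we estimate the model parameter $\theta$ from $Y_1, \dots, Y_t$. When computing the forecast, we pretend $\theta$ is known, and thus make an estimation error (i.e. we ignore confidence intervals on $\theta$; it is hoped that the estimation error is much less than the prediction interval).
Let us return to an example we already saw. Assume we observe $Y_1, \dots, Y_t$ and want to forecast $Y_{t+1}$. Assume that we believe in the model $Y_t \sim$ iid $N(\mu, \sigma^2)$. We estimate and obtain $\hat\mu$, $\hat\sigma$.
Point prediction for $Y_{t+1}$ if we ignore estimation uncertainty: $\hat\mu$; if we account for estimation uncertainty, it is also $\hat\mu$.
95%-prediction interval for $Y_{t+1}$ if we ignore estimation uncertainty: $\hat\mu \pm 1.96\,\hat\sigma$.
Thm 2.6 says that an exact interval that accounts for estimation uncertainty is $\hat\mu \pm \xi \sqrt{1 + 1/t}\,\hat\sigma$, where $\xi$ is the 97.5% quantile of Student's $t_{t-1}$ distribution; compare to $\hat\mu \pm 1.96\,\hat\sigma$.
The estimation error shrinks as $t$ grows and is small for large $t$.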
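To get a feel for how small the effect is, the sketch below assumes the iid normal model, for which the standard exact prediction interval is $\hat\mu \pm \xi \sqrt{1 + 1/t}\,\hat\sigma$ with $\xi$ a Student quantile. It evaluates only the factor $\sqrt{1 + 1/t}$; the Student quantile is left out because it is not available in the Python standard library. The function name is hypothetical.

```python
from math import sqrt

def inflation_factor(t):
    """Factor by which the half-width grows when the mean is estimated from t samples."""
    return sqrt(1 + 1 / t)

for t in (10, 100, 1000):
    widening_percent = 100 * (inflation_factor(t) - 1)
    print(t, round(inflation_factor(t), 5), round(widening_percent, 3))
```

With 100 observations the widening from this factor is about half a percent, which is why the estimation error is often ignored for long series.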
Exact Formulas exist for Linear Regression with LS
Take‐Home Message
When we use a fitted model there is some uncertainty that adds to the prediction intervals.
In most cases we can ignore the model uncertainty because it impacts the prediction intervals only marginally.
In some rare cases (e.g. linear regression with Gaussian errors) there are exact formulas.
4. The Overfitting Problem
Assume we want to improve our model by adding more parameters: add a polynomial term + more harmonics.
[Figure: fits of two models, with panels labelled "0, 1" and "10, 3"]
Prediction for the better model
This is the overfitting problem: a better fit is not the best predictor. In the extreme case, a model can fit the data exactly and yet be unable to model it.
How to avoid overfitting
Method 1: use of test data
Method 2: information criterion
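Method 1 can be sketched on a toy example. The code below fits two models on the first 8 points of a hypothetical series (a linear trend plus small fixed perturbations) and scores them on the 4 points held out at the end. The second model is a Lagrange polynomial through every training point, chosen here as an extreme case of a model that fits the data exactly; all names and data are made up for this illustration.

```python
def fit_line(ts, ys):
    """Least-squares straight line through the training points."""
    n = len(ts)
    t_mean = sum(ts) / n
    y_mean = sum(ys) / n
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in zip(ts, ys))
    sxx = sum((t - t_mean) ** 2 for t in ts)
    slope = sxy / sxx
    intercept = y_mean - slope * t_mean
    return lambda t: intercept + slope * t

def fit_interpolant(ts, ys):
    """Lagrange polynomial through every training point (zero training error)."""
    def predict(t):
        total = 0.0
        for i, (ti, yi) in enumerate(zip(ts, ys)):
            weight = 1.0
            for k, tk in enumerate(ts):
                if k != i:
                    weight *= (t - tk) / (ti - tk)
            total += yi * weight
        return total
    return predict

def rmse(model, ts, ys):
    """Root mean square error of a model on the given points."""
    return (sum((model(t) - y) ** 2 for t, y in zip(ts, ys)) / len(ts)) ** 0.5

# Hypothetical data: linear trend 2 + 0.5 t plus small fixed perturbations
noise = [0.3, -0.2, 0.1, -0.4, 0.2, -0.1, 0.3, -0.3, 0.2, -0.1, 0.1, -0.2]
ts = list(range(12))
ys = [2 + 0.5 * t + e for t, e in zip(ts, noise)]
train_t, train_y = ts[:8], ys[:8]   # used for fitting
test_t, test_y = ts[8:], ys[8:]     # held out, used only for scoring

for name, fit in (("line", fit_line), ("interpolant", fit_interpolant)):
    model = fit(train_t, train_y)
    print(name, rmse(model, train_t, train_y), rmse(model, test_t, test_y))
```

The interpolant has zero error on the training points and a very large error on the held-out points, while the straight line has a small error on both.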
Method 2: Information Criteria
We saw that the likelihood can be used to define a score function for the model fitting phase (e.g. for a LS model).
To avoid overfitting, add a penalty term to the score.
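As a sketch of a penalized score, the code below computes two common information criteria, AIC and BIC, for a least-squares fit under the assumption of iid Gaussian noise. The formulas are the usual textbook ones (minus twice the maximized log-likelihood plus $2k$ or $k \ln n$); whether they match the exact definitions used in the lecture is an assumption. The function name and the candidate models are hypothetical.

```python
from math import log, pi

def gaussian_ls_scores(rss, n, k):
    """AIC and BIC for a least-squares fit with Gaussian noise.

    rss: residual sum of squares, n: number of observations,
    k: number of fitted parameters (the counting convention is up to the user).
    """
    sigma2 = rss / n                                   # ML estimate of the noise variance
    minus_two_loglik = n * (log(2 * pi * sigma2) + 1)  # -2 * maximized log-likelihood
    return {
        "AIC": minus_two_loglik + 2 * k,
        "BIC": minus_two_loglik + k * log(n),
    }

# Hypothetical candidates fitted on n = 100 points: (label, rss, parameters)
candidates = [("small", 120.0, 3), ("large", 118.0, 12)]
for label, rss, k in candidates:
    print(label, gaussian_ls_scores(rss, n=100, k=k))
```

The larger model fits slightly better (smaller residual sum of squares), but the penalty outweighs the gain, so both criteria prefer the smaller model in this example.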
Best Model for Internet Data, d=1, h up to 10
Information criteria are able to identify the best model
Best Model for Internet Data, h=3, d up to 10
Information criteria are not able to identify the best model; the polynomial models are not a good class of models
Say what is true
A. When doing the fit and if we use an information criterion, we can use all data available up to time $t$
B. When doing the fit and if we use a score + test data we can use all data available up to time $t$
C. A and B
D. None
E. I don't know
[Figure: poll results for the question above; response shares 75%, 6%, 6%, 13%, 0%]
Solution
A is true.
B is not true: if we use test data we need to keep a subset of the data for testing the prediction accuracy. We should not use this subset of data for fitting the model, otherwise the prediction performance is not properly assessed.
Answer A.
5. Use of Bootstrap
Assume we have a prediction model with a noise term $\epsilon_t$.
The estimation of $\theta$ is done assuming some distribution for $\epsilon_t$.
Assume this distribution is only approximately known; we can improve the prediction intervals if we use a better approximation of this distribution.
For example, we can use the principle of the Bootstrap, i.e. estimate the distribution of $\epsilon_t$ by its empirical distribution.
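Assume the noise terms $\epsilon_t$ are iid and apply theorem 2.5 to $\epsilon_{t+\ell}$.
This gives the algorithm:
1. Estimate $\theta$ by some method
2. Estimate the residuals
3. (Thm 2.5) Compute a prediction interval for the noise from the empirical distribution of the residuals
4. Prediction interval for $Y_{t+\ell}$: add this interval to the point prediction $\hat{Y}_t(\ell)$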
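The algorithm can be sketched as follows, assuming the residuals are iid. The interval for the noise is taken from the order statistics of the residuals at ranks $\lfloor (n+1)\alpha/2 \rfloor$ and $\lceil (n+1)(1-\alpha/2) \rceil$, which is the standard distribution-free prediction interval; it may differ in detail from the statement of Thm 2.5. The function name and the residual values are hypothetical, and the level is passed as a `Fraction` to avoid rounding problems when computing the ranks.

```python
from fractions import Fraction
from math import ceil, floor

def residual_prediction_interval(point_forecast, residuals, alpha=Fraction(5, 100)):
    """Prediction interval at level 1 - alpha from the empirical residuals."""
    n = len(residuals)
    j = floor((n + 1) * alpha / 2)         # lower rank (1-based)
    k = ceil((n + 1) * (1 - alpha / 2))    # upper rank (1-based)
    if j < 1 or k > n:
        raise ValueError("too few residuals for this level")
    ordered = sorted(residuals)
    # Shift the interval for the noise by the point forecast
    return point_forecast + ordered[j - 1], point_forecast + ordered[k - 1]

# Hypothetical residuals (n = 99), for illustration only
residuals = [float(r) for r in range(-49, 50)]
print(residual_prediction_interval(10.0, residuals))
```

At level 95% this needs at least 39 residuals; with fewer, the requested ranks fall outside the sample and the function raises an error.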
Example
For this example, the bootstrap (done in log scale) gives an asymmetric prediction interval.
[Figure: prediction intervals from the bootstrap and assuming Laplace noise]
Example
For this example, the bootstrap gives slightly smaller intervals than the ones based on Gaussian noise.
[Figure: prediction intervals assuming Gaussian noise and from the bootstrap]