Forecasting Hierarchical Time Series

description

Talk given at the University of Sydney

Transcript of Forecasting Hierarchical Time Series

Page 1: Forecasting Hierarchical Time Series

Total
  A: AA, AB, AC
  B: BA, BB, BC
  C: CA, CB, CC

Rob J Hyndman

Forecasting hierarchical time series

Page 2: Forecasting Hierarchical Time Series

Outline

1 Hierarchical time series

2 Forecasting framework

3 Optimal forecasts

4 Approximately optimal forecasts

5 Application to Australian tourism

6 hts package for R

7 References

Forecasting hierarchical time series Hierarchical time series 2

Page 3: Forecasting Hierarchical Time Series

Introduction

Total
  A: AA, AB, AC
  B: BA, BB, BC
  C: CA, CB, CC

Examples

Manufacturing product hierarchies
Net labour turnover
Pharmaceutical sales
Tourism demand by region and purpose

Page 8: Forecasting Hierarchical Time Series

Forecasting the PBS

Page 9: Forecasting Hierarchical Time Series

ATC drug classification

A Alimentary tract and metabolism
B Blood and blood forming organs
C Cardiovascular system
D Dermatologicals
G Genito-urinary system and sex hormones
H Systemic hormonal preparations, excluding sex hormones and insulins
J Anti-infectives for systemic use
L Antineoplastic and immunomodulating agents
M Musculo-skeletal system
N Nervous system
P Antiparasitic products, insecticides and repellents
R Respiratory system
S Sensory organs
V Various

Page 10: Forecasting Hierarchical Time Series

ATC drug classification

A Alimentary tract and metabolism (14 classes)
A10 Drugs used in diabetes (84 classes)
A10B Blood glucose lowering drugs
A10BA Biguanides
A10BA02 Metformin

Page 12: Forecasting Hierarchical Time Series

Australian tourism

Also split by purpose of travel:

Holiday
Visits to friends and relatives
Business
Other

Page 13: Forecasting Hierarchical Time Series

Hierarchical/grouped time series

A hierarchical time series is a collection of several time series that are linked together in a hierarchical structure.

Example: Pharmaceutical products are organized in a hierarchy under the Anatomical Therapeutic Chemical (ATC) Classification System.

A grouped time series is a collection of time series that are aggregated in a number of non-hierarchical ways.

Example: Australian tourism demand is grouped by region and purpose of travel.

Page 17: Forecasting Hierarchical Time Series

Hierarchical/grouped time series

Forecasts should be "aggregate consistent", unbiased, minimum variance.

Existing methods:
Bottom-up
Top-down
Middle-out

How to compute forecast intervals?

Most research is concerned about relative performance of existing methods.

There is no research on how to deal with forecasting grouped time series.

Page 25: Forecasting Hierarchical Time Series

Top-down method

Advantages

Works well in presence of low counts.
Single forecasting model easy to build.
Provides reliable forecasts for aggregate levels.

Disadvantages

Loss of information, especially individual series dynamics.
Distribution of forecasts to lower levels can be difficult.
No prediction intervals.

Page 31: Forecasting Hierarchical Time Series

Bottom-up method

Advantages

No loss of information.
Better captures dynamics of individual series.

Disadvantages

Large number of series to be forecast.
Constructing forecasting models is harder because of noisy data at bottom level.
No prediction intervals.

Page 36: Forecasting Hierarchical Time Series

A new approach

We propose a new statistical framework for forecasting hierarchical time series which:

1 provides point forecasts that are consistent across the hierarchy;
2 allows for correlations and interaction between series at each level;
3 provides estimates of forecast uncertainty which are consistent across the hierarchy;
4 allows for ad hoc adjustments and inclusion of covariates at any level.

Page 45: Forecasting Hierarchical Time Series

Hierarchical data

Total: A, B, C

$Y_t$: observed aggregate of all series at time $t$.
$Y_{X,t}$: observation on series $X$ at time $t$.
$\mathbf{B}_t$: vector of all series at bottom level in time $t$.

$$
\mathbf{Y}_t = [Y_t, Y_{A,t}, Y_{B,t}, Y_{C,t}]' =
\underbrace{\begin{bmatrix}
1 & 1 & 1 \\
1 & 0 & 0 \\
0 & 1 & 0 \\
0 & 0 & 1
\end{bmatrix}}_{S}
\underbrace{\begin{bmatrix}
Y_{A,t} \\ Y_{B,t} \\ Y_{C,t}
\end{bmatrix}}_{\mathbf{B}_t}
$$

$$\mathbf{Y}_t = S\mathbf{B}_t$$
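As a small illustration of the summing matrix for the Total/A/B/C hierarchy, the following Python sketch multiplies S by a bottom-level vector. The numbers in `B_t` and the helper name `aggregate` are made up for this example and are not from the talk.

```python
# Summing matrix S for the hierarchy Total -> A, B, C.
# Rows follow the stacking order [Total, A, B, C]; columns follow [A, B, C].
S = [
    [1, 1, 1],  # Total = A + B + C
    [1, 0, 0],  # A
    [0, 1, 0],  # B
    [0, 0, 1],  # C
]


def aggregate(S, bottom):
    """Return Y_t = S B_t for one time period."""
    return [sum(s * b for s, b in zip(row, bottom)) for row in S]


# Illustrative bottom-level observations (made-up numbers).
B_t = [10, 20, 30]
Y_t = aggregate(S, B_t)
print(Y_t)  # [60, 10, 20, 30]
```

The first entry of `Y_t` is the aggregate; the remaining entries repeat the bottom-level values.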

Page 48: Forecasting Hierarchical Time Series

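Hierarchical data

Total
  A: AX, AY, AZ
  B: BX, BY, BZ
  C: CX, CY, CZ

$$
\mathbf{Y}_t =
\begin{bmatrix}
Y_t \\ Y_{A,t} \\ Y_{B,t} \\ Y_{C,t} \\ Y_{AX,t} \\ Y_{AY,t} \\ Y_{AZ,t} \\ Y_{BX,t} \\ Y_{BY,t} \\ Y_{BZ,t} \\ Y_{CX,t} \\ Y_{CY,t} \\ Y_{CZ,t}
\end{bmatrix}
=
\underbrace{\begin{bmatrix}
1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 \\
1 & 1 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 1 & 1 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 1 & 1 & 1 \\
1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1
\end{bmatrix}}_{S}
\underbrace{\begin{bmatrix}
Y_{AX,t} \\ Y_{AY,t} \\ Y_{AZ,t} \\ Y_{BX,t} \\ Y_{BY,t} \\ Y_{BZ,t} \\ Y_{CX,t} \\ Y_{CY,t} \\ Y_{CZ,t}
\end{bmatrix}}_{\mathbf{B}_t}
$$

$$\mathbf{Y}_t = S\mathbf{B}_t$$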
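For larger hierarchies the summing matrix need not be typed by hand. The sketch below builds it from a parent-to-children mapping. The function name `summing_matrix` and the dictionary layout are assumptions made for this example, not part of the talk or of the hts package.

```python
def summing_matrix(tree, root="Total"):
    """Build S for a hierarchy given as {parent: [children]}.

    Returns (row_labels, bottom_labels, S). Rows are ordered level by level
    from the root down; columns follow the bottom-level series.
    """
    # Breadth-first traversal gives the level-by-level stacking order.
    order, queue = [], [root]
    while queue:
        node = queue.pop(0)
        order.append(node)
        queue.extend(tree.get(node, []))
    bottom = [n for n in order if n not in tree]

    def leaves(node):
        """Set of bottom-level series that sit underneath `node`."""
        kids = tree.get(node)
        if not kids:
            return {node}
        return set().union(*(leaves(k) for k in kids))

    S = [[1 if b in leaves(n) else 0 for b in bottom] for n in order]
    return order, bottom, S


tree = {
    "Total": ["A", "B", "C"],
    "A": ["AX", "AY", "AZ"],
    "B": ["BX", "BY", "BZ"],
    "C": ["CX", "CY", "CZ"],
}
rows, bottom, S = summing_matrix(tree)
for label, row in zip(rows, S):
    print(f"{label:>5}", row)
```

For this tree the result has 13 rows and 9 columns: one row of ones for Total, three rows for A, B and C, and a 9 x 9 identity block for the bottom level.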

Page 51: Forecasting Hierarchical Time Series

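Grouped data

Total
  A: AX, AY
  B: BX, BY

Total
  X: AX, BX
  Y: AY, BY

$$
\mathbf{Y}_t =
\begin{bmatrix}
Y_t \\ Y_{A,t} \\ Y_{B,t} \\ Y_{X,t} \\ Y_{Y,t} \\ Y_{AX,t} \\ Y_{AY,t} \\ Y_{BX,t} \\ Y_{BY,t}
\end{bmatrix}
=
\underbrace{\begin{bmatrix}
1 & 1 & 1 & 1 \\
1 & 1 & 0 & 0 \\
0 & 0 & 1 & 1 \\
1 & 0 & 1 & 0 \\
0 & 1 & 0 & 1 \\
1 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 \\
0 & 0 & 1 & 0 \\
0 & 0 & 0 & 1
\end{bmatrix}}_{S}
\underbrace{\begin{bmatrix}
Y_{AX,t} \\ Y_{AY,t} \\ Y_{BX,t} \\ Y_{BY,t}
\end{bmatrix}}_{\mathbf{B}_t}
$$

$$\mathbf{Y}_t = S\mathbf{B}_t$$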
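A grouped structure has no single tree, but the summing matrix can still be written down by listing which bottom-level series each aggregate contains. The following Python sketch does this for the A/B by X/Y example; the variable names are illustrative assumptions.

```python
bottom = ["AX", "AY", "BX", "BY"]

# Each aggregate is defined by the bottom-level series it sums.
# A/B and X/Y are two different groupings of the same four series.
aggregates = {
    "Total": ["AX", "AY", "BX", "BY"],
    "A": ["AX", "AY"],
    "B": ["BX", "BY"],
    "X": ["AX", "BX"],
    "Y": ["AY", "BY"],
}

# Stack the aggregates first, then the bottom-level series themselves.
rows = list(aggregates) + bottom
members = {**aggregates, **{b: [b] for b in bottom}}
S = [[1 if b in members[r] else 0 for b in bottom] for r in rows]

for label, row in zip(rows, S):
    print(f"{label:>5}", row)
```

The same relation Y_t = S B_t then holds, even though the rows for X and Y cut across the A/B grouping.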

Page 52: Forecasting Hierarchical Time Series

2 Forecasting framework

Page 53: Forecasting Hierarchical Time Series

Forecasting notation

Let $\hat{\mathbf{Y}}_n(h)$ be the vector of initial $h$-step forecasts, made at time $n$, stacked in same order as $\mathbf{Y}_t$. (They may not add up.)

Hierarchical forecasting methods of the form:

$$\tilde{\mathbf{Y}}_n(h) = SP\hat{\mathbf{Y}}_n(h)$$

for some matrix $P$.

$P$ extracts and combines base forecasts $\hat{\mathbf{Y}}_n(h)$ to get bottom-level forecasts.
$S$ adds them up.
Revised reconciled forecasts: $\tilde{\mathbf{Y}}_n(h)$.
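The two-step structure (P maps base forecasts to bottom-level forecasts, S adds them up) can be sketched in a few lines of Python. The base forecasts and the particular matrix `P` below are illustrative choices for this example only; any 3 x 4 matrix would give revised forecasts that add up.

```python
def matvec(M, v):
    """Multiply matrix M (a list of rows) by vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def reconcile(S, P, y_hat):
    """Return the revised forecasts S P y_hat."""
    bottom = matvec(P, y_hat)  # P: base forecasts -> bottom-level forecasts
    return matvec(S, bottom)   # S: add the bottom level up the hierarchy


# Hierarchy Total -> A, B, C, stacked as [Total, A, B, C].
S = [[1, 1, 1], [1, 0, 0], [0, 1, 0], [0, 0, 1]]

# One illustrative P (an assumption for this sketch).
P = [
    [0.25, 0.75, -0.25, -0.25],
    [0.25, -0.25, 0.75, -0.25],
    [0.25, -0.25, -0.25, 0.75],
]

# Made-up base forecasts that do not add up: 30 + 40 + 20 = 90, not 100.
y_hat = [100.0, 30.0, 40.0, 20.0]

y_tilde = reconcile(S, P, y_hat)
print(y_tilde)  # [97.5, 32.5, 42.5, 22.5]
```

Whatever P is used, the first entry of the result equals the sum of the other three, because S is applied last.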

Page 59: Forecasting Hierarchical Time Series

Bottom-up forecasts

$$\tilde{\mathbf{Y}}_n(h) = SP\hat{\mathbf{Y}}_n(h)$$

Bottom-up forecasts are obtained using

$$P = [\,0 \mid I\,],$$

where $0$ is null matrix and $I$ is identity matrix.

$P$ matrix extracts only bottom-level forecasts from $\hat{\mathbf{Y}}_n(h)$.

$S$ adds them up to give the bottom-up forecasts.
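A minimal Python sketch of the bottom-up choice of P for the Total/A/B/C hierarchy follows. The base forecasts are made-up numbers, and the variable names are assumptions for this example.

```python
def matvec(M, v):
    """Multiply matrix M (a list of rows) by vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


# Hierarchy Total -> A, B, C, stacked as [Total, A, B, C].
S = [[1, 1, 1], [1, 0, 0], [0, 1, 0], [0, 0, 1]]
n_all, n_bottom = len(S), len(S[0])
n_upper = n_all - n_bottom

# P = [0 | I]: a zero block for the upper levels, identity for the bottom level.
P = [
    [0] * n_upper + [1 if i == j else 0 for j in range(n_bottom)]
    for i in range(n_bottom)
]

# Made-up base forecasts; the Total forecast (100) is ignored by bottom-up.
y_hat = [100, 30, 40, 20]

bottom = matvec(P, y_hat)
y_tilde = matvec(S, bottom)
print(P)        # [[0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
print(y_tilde)  # [90, 30, 40, 20]
```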

Page 62: Forecasting Hierarchical Time Series

Top-down forecasts

$$\tilde{\mathbf{Y}}_n(h) = SP\hat{\mathbf{Y}}_n(h)$$

Top-down forecasts are obtained using

$$P = [\,p \mid 0\,]$$

where $p = [p_1, p_2, \dots, p_{m_K}]'$ is a vector of proportions that sum to one.

$P$ distributes forecasts of the aggregate to the lowest level series.

Different methods of top-down forecasting lead to different proportionality vectors $p$.
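The top-down choice of P can be sketched the same way. The proportions in `p` and the base forecasts are made-up values for illustration; how p is obtained depends on the top-down method used.

```python
def matvec(M, v):
    """Multiply matrix M (a list of rows) by vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


# Hierarchy Total -> A, B, C, stacked as [Total, A, B, C].
S = [[1, 1, 1], [1, 0, 0], [0, 1, 0], [0, 0, 1]]
n_bottom = len(S[0])

# Illustrative proportions for A, B, C (assumed values; they sum to one).
p = [0.25, 0.5, 0.25]

# P = [p | 0]: only the Total forecast is used, split according to p.
P = [[p_i] + [0] * n_bottom for p_i in p]

# Made-up base forecasts; the forecasts for A, B, C are ignored by top-down.
y_hat = [100, 30, 40, 20]

bottom = matvec(P, y_hat)
y_tilde = matvec(S, bottom)
print(bottom)   # [25.0, 50.0, 25.0]
print(y_tilde)  # [100.0, 25.0, 50.0, 25.0]
```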

Page 65: Forecasting Hierarchical Time Series

General properties: bias

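$$\tilde{\mathbf{Y}}_n(h) = SP\hat{\mathbf{Y}}_n(h)$$

Assume: base forecasts $\hat{\mathbf{Y}}_n(h)$ are unbiased:

$$E[\hat{\mathbf{Y}}_n(h) \mid \mathbf{Y}_1, \dots, \mathbf{Y}_n] = E[\mathbf{Y}_{n+h} \mid \mathbf{Y}_1, \dots, \mathbf{Y}_n]$$

Let $\hat{\mathbf{B}}_n(h)$ be bottom level base forecasts with $\beta_n(h) = E[\hat{\mathbf{B}}_n(h) \mid \mathbf{Y}_1, \dots, \mathbf{Y}_n]$.
Then $E[\hat{\mathbf{Y}}_n(h)] = S\beta_n(h)$.
We want the revised forecasts to be unbiased: $E[\tilde{\mathbf{Y}}_n(h)] = SPS\beta_n(h) = S\beta_n(h)$.
Result will hold provided $SPS = S$.
True for bottom-up, but not for any top-down method or middle-out method.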
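The condition SPS = S is easy to check numerically for a given hierarchy. The sketch below does so for the Total/A/B/C example with a bottom-up P and one top-down P; the proportions and the function name `preserves_unbiasedness` are assumptions made for illustration.

```python
def matmul(A, B):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def preserves_unbiasedness(S, P):
    """Check the condition S P S = S."""
    return matmul(matmul(S, P), S) == S


# Hierarchy Total -> A, B, C, stacked as [Total, A, B, C].
S = [[1, 1, 1], [1, 0, 0], [0, 1, 0], [0, 0, 1]]

# Bottom-up: P = [0 | I].
P_bottom_up = [[0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]

# Top-down: P = [p | 0] with illustrative proportions p = (0.25, 0.5, 0.25).
P_top_down = [[0.25, 0, 0, 0], [0.5, 0, 0, 0], [0.25, 0, 0, 0]]

print(preserves_unbiasedness(S, P_bottom_up))  # True
print(preserves_unbiasedness(S, P_top_down))   # False
```

For the top-down case the row of SPS belonging to A is (0.25, 0.25, 0.25) rather than (1, 0, 0), which is why the condition fails.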

Page 73: Forecasting Hierarchical Time Series

General properties: variance

$$\tilde{\mathbf{Y}}_n(h) = SP\hat{\mathbf{Y}}_n(h)$$

Let variance of base forecasts $\hat{\mathbf{Y}}_n(h)$ be given by

$$\Sigma_h = V[\hat{\mathbf{Y}}_n(h) \mid \mathbf{Y}_1, \dots, \mathbf{Y}_n]$$

Then the variance of the revised forecasts is given by

$$V[\tilde{\mathbf{Y}}_n(h) \mid \mathbf{Y}_1, \dots, \mathbf{Y}_n] = SP\Sigma_h P'S'.$$

This is a general result for all existing methods.
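As a worked instance of the variance formula, the following Python sketch evaluates S P Σ_h P' S' for the bottom-up P on the Total/A/B/C hierarchy. The diagonal `Sigma` is a made-up covariance matrix chosen only to keep the arithmetic readable.

```python
def matmul(A, B):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    """Return the transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*A)]


def revised_variance(S, P, Sigma):
    """Return S P Sigma P' S', the variance of the revised forecasts."""
    SP = matmul(S, P)
    return matmul(matmul(SP, Sigma), transpose(SP))


# Hierarchy Total -> A, B, C, stacked as [Total, A, B, C].
S = [[1, 1, 1], [1, 0, 0], [0, 1, 0], [0, 0, 1]]
P_bottom_up = [[0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]

# Made-up base forecast covariance: variance 4 for Total, 1 for A, B, C,
# and no correlation between base forecast errors (an assumption).
Sigma = [[4, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]

V = revised_variance(S, P_bottom_up, Sigma)
for row in V:
    print(row)
# [3, 1, 1, 1]
# [1, 1, 0, 0]
# [1, 0, 1, 0]
# [1, 0, 0, 1]
```

With bottom-up, the revised Total has variance 3, the sum of the bottom-level variances; the base variance of 4 for Total plays no role because P discards the Total forecast.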

Page 76: Forecasting Hierarchical Time Series

3 Optimal forecasts

Page 77: Forecasting Hierarchical Time Series

Forecasts

Key idea: forecast reconciliation
Ignore structural constraints and forecast every series of interest independently.
Adjust forecasts to impose constraints.

Let $\hat{\mathbf{Y}}_n(h)$ be the vector of initial $h$-step forecasts, made at time $n$, stacked in same order as $\mathbf{Y}_t$.

$\mathbf{Y}_t = S\mathbf{B}_t$. So $\hat{\mathbf{Y}}_n(h) = S\beta_n(h) + \varepsilon_h$.

$\beta_n(h) = E[\mathbf{B}_{n+h} \mid \mathbf{Y}_1, \dots, \mathbf{Y}_n]$.
$\varepsilon_h$ has zero mean and covariance $\Sigma_h$.
Estimate $\beta_n(h)$ using GLS?

Forecasting hierarchical time series Optimal forecasts 23

Page 92: Forecasting Hierarchical Time Series

Optimal combination forecasts

$$\tilde{Y}_n(h) = S\hat{\beta}_n(h) = S(S'\Sigma_h^{\dagger}S)^{-1}S'\Sigma_h^{\dagger}\hat{Y}_n(h)$$

($\tilde{Y}_n(h)$: revised forecasts; $\hat{Y}_n(h)$: initial forecasts.)

$\Sigma_h^{\dagger}$ is generalized inverse of $\Sigma_h$.

Optimal $P = (S'\Sigma_h^{\dagger}S)^{-1}S'\Sigma_h^{\dagger}$

Revised forecasts unbiased: $SPS = S$.
Revised forecasts minimum variance:

$$V[\tilde{Y}_n(h) \mid Y_1, \dots, Y_n] = SP\Sigma_h P'S' = S(S'\Sigma_h^{\dagger}S)^{-1}S'$$

Problem: $\Sigma_h$ hard to estimate.
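To make the GLS formulas concrete, here is a minimal sketch assuming NumPy, a hypothetical one-level hierarchy (Total = A + B + C), and a made-up positive definite covariance matrix, so that the generalized inverse reduces to the ordinary inverse. It checks unbiasedness (SPS = S) and that the two variance expressions agree.

```python
import numpy as np

# Hypothetical hierarchy: Total = A + B + C (rows: Total, A, B, C).
S = np.array([[1, 1, 1],
              [1, 0, 0],
              [0, 1, 0],
              [0, 0, 1]], dtype=float)

# Made-up covariance of the base forecast errors. It is symmetric and
# strictly diagonally dominant, hence positive definite and invertible.
Sigma = np.array([[4.0, 0.5, 0.5, 0.5],
                  [0.5, 2.0, 0.2, 0.1],
                  [0.5, 0.2, 1.5, 0.3],
                  [0.5, 0.1, 0.3, 1.0]])
Sigma_inv = np.linalg.inv(Sigma)

# Optimal P = (S' Sigma^-1 S)^-1 S' Sigma^-1
P = np.linalg.solve(S.T @ Sigma_inv @ S, S.T @ Sigma_inv)

# Incoherent base forecasts: the bottom level sums to 105, not 100.
y_hat = np.array([100.0, 40.0, 35.0, 30.0])
y_tilde = S @ P @ y_hat  # revised forecasts

# Variance of the revised forecasts, computed two ways.
var_general = S @ P @ Sigma @ P.T @ S.T
var_gls = S @ np.linalg.inv(S.T @ Sigma_inv @ S) @ S.T

print(np.round(y_tilde, 3))
print(np.allclose(S @ P @ S, S), np.allclose(var_general, var_gls))
```

The revised forecasts are coherent by construction because they are S times a vector of bottom-level estimates.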

Page 93: Forecasting Hierarchical Time Series

4 Approximately optimal forecasts

Page 94: Forecasting Hierarchical Time Series

Optimal combination forecasts

$$\tilde{Y}_n(h) = S(S'\Sigma_h^{\dagger}S)^{-1}S'\Sigma_h^{\dagger}\hat{Y}_n(h)$$

($\tilde{Y}_n(h)$: revised forecasts; $\hat{Y}_n(h)$: base forecasts.)

Solution 1: OLS
Assume $\varepsilon_h \approx S\varepsilon_{B,h}$ where $\varepsilon_{B,h}$ is the forecast error at bottom level.

Then $\Sigma_h \approx S\Omega_h S'$ where $\Omega_h = V(\varepsilon_{B,h})$.

If Moore-Penrose generalized inverse used, then $(S'\Sigma_h^{\dagger}S)^{-1}S'\Sigma_h^{\dagger} = (S'S)^{-1}S'$.

$$\tilde{Y}_n(h) = S(S'S)^{-1}S'\hat{Y}_n(h)$$
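The reduction from GLS to OLS can be verified numerically. This is a sketch assuming NumPy, a hypothetical one-level hierarchy, and an arbitrary made-up bottom-level covariance Omega; the `rcond` cutoff passed to `np.linalg.pinv` is a choice made here so that the numerically zero singular value of the singular matrix is discarded.

```python
import numpy as np

# Hypothetical hierarchy: Total = A + B + C (rows: Total, A, B, C).
S = np.array([[1, 1, 1],
              [1, 0, 0],
              [0, 1, 0],
              [0, 0, 1]], dtype=float)

# Made-up positive definite covariance of the bottom-level errors.
Omega = np.array([[2.0, 0.3, 0.1],
                  [0.3, 1.0, 0.2],
                  [0.1, 0.2, 1.5]])

# Under the assumption eps_h = S eps_{B,h}, Sigma_h = S Omega S'.
# It is 4 x 4 but only has rank 3, so it has no ordinary inverse.
Sigma = S @ Omega @ S.T
Sigma_pinv = np.linalg.pinv(Sigma, rcond=1e-10)  # Moore-Penrose inverse

P_gls = np.linalg.solve(S.T @ Sigma_pinv @ S, S.T @ Sigma_pinv)
P_ols = np.linalg.solve(S.T @ S, S.T)

same = bool(np.allclose(P_gls, P_ols))
print(same)  # True: Omega drops out of the weights
```

Changing the entries of Omega (keeping it positive definite) leaves P unchanged, which is the sense in which the weights do not depend on the covariance structure.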

Page 100: Forecasting Hierarchical Time Series

Optimal combination forecasts

$$\tilde{Y}_n(h) = S(S'S)^{-1}S'\hat{Y}_n(h)$$

GLS = OLS.

Optimal weighted average of initial forecasts.

Optimal reconciliation weights are $S(S'S)^{-1}S'$.

Weights are independent of the data and of the covariance structure of the hierarchy!

Page 105: Forecasting Hierarchical Time Series

Optimal combination forecasts

$$\tilde{Y}_n(h) = S(S'S)^{-1}S'\hat{Y}_n(h)$$

Hierarchy: Total, with children A, B, C.

Weights:

$$S(S'S)^{-1}S' = \begin{bmatrix} 0.75 & 0.25 & 0.25 & 0.25 \\ 0.25 & 0.75 & -0.25 & -0.25 \\ 0.25 & -0.25 & 0.75 & -0.25 \\ 0.25 & -0.25 & -0.25 & 0.75 \end{bmatrix}$$
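The weights for the Total, A, B, C hierarchy can be reproduced in a few lines. This sketch assumes NumPy; the base forecasts `y_hat` are made-up numbers chosen so that the bottom level does not add up to the Total.

```python
import numpy as np

# Summing matrix for Total = A + B + C (rows: Total, A, B, C).
S = np.array([[1, 1, 1],
              [1, 0, 0],
              [0, 1, 0],
              [0, 0, 1]], dtype=float)

# OLS reconciliation weights S (S'S)^-1 S'.
W = S @ np.linalg.solve(S.T @ S, S.T)
print(np.round(W, 2))

# Hypothetical base forecasts for Total, A, B, C (40 + 35 + 30 = 105).
y_hat = np.array([100.0, 40.0, 35.0, 30.0])
y_tilde = W @ y_hat
print(y_tilde)  # [101.25  38.75  33.75  28.75]
```

The discrepancy of 5 between the Total forecast and the sum of A, B, C is spread across all four series, and the revised forecasts add up exactly.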

Page 106: Forecasting Hierarchical Time Series

Optimal combination forecasts

Hierarchy: Total; A (AA, AB, AC); B (BA, BB, BC); C (CA, CB, CC).

Weights (rows and columns ordered Total, A, B, C, AA, AB, AC, BA, BB, BC, CA, CB, CC):

$$S(S'S)^{-1}S' = \left[\begin{array}{rrrrrrrrrrrrr}
0.69 & 0.23 & 0.23 & 0.23 & 0.08 & 0.08 & 0.08 & 0.08 & 0.08 & 0.08 & 0.08 & 0.08 & 0.08 \\
0.23 & 0.58 & -0.17 & -0.17 & 0.19 & 0.19 & 0.19 & -0.06 & -0.06 & -0.06 & -0.06 & -0.06 & -0.06 \\
0.23 & -0.17 & 0.58 & -0.17 & -0.06 & -0.06 & -0.06 & 0.19 & 0.19 & 0.19 & -0.06 & -0.06 & -0.06 \\
0.23 & -0.17 & -0.17 & 0.58 & -0.06 & -0.06 & -0.06 & -0.06 & -0.06 & -0.06 & 0.19 & 0.19 & 0.19 \\
0.08 & 0.19 & -0.06 & -0.06 & 0.73 & -0.27 & -0.27 & -0.02 & -0.02 & -0.02 & -0.02 & -0.02 & -0.02 \\
0.08 & 0.19 & -0.06 & -0.06 & -0.27 & 0.73 & -0.27 & -0.02 & -0.02 & -0.02 & -0.02 & -0.02 & -0.02 \\
0.08 & 0.19 & -0.06 & -0.06 & -0.27 & -0.27 & 0.73 & -0.02 & -0.02 & -0.02 & -0.02 & -0.02 & -0.02 \\
0.08 & -0.06 & 0.19 & -0.06 & -0.02 & -0.02 & -0.02 & 0.73 & -0.27 & -0.27 & -0.02 & -0.02 & -0.02 \\
0.08 & -0.06 & 0.19 & -0.06 & -0.02 & -0.02 & -0.02 & -0.27 & 0.73 & -0.27 & -0.02 & -0.02 & -0.02 \\
0.08 & -0.06 & 0.19 & -0.06 & -0.02 & -0.02 & -0.02 & -0.27 & -0.27 & 0.73 & -0.02 & -0.02 & -0.02 \\
0.08 & -0.06 & -0.06 & 0.19 & -0.02 & -0.02 & -0.02 & -0.02 & -0.02 & -0.02 & 0.73 & -0.27 & -0.27 \\
0.08 & -0.06 & -0.06 & 0.19 & -0.02 & -0.02 & -0.02 & -0.02 & -0.02 & -0.02 & -0.27 & 0.73 & -0.27 \\
0.08 & -0.06 & -0.06 & 0.19 & -0.02 & -0.02 & -0.02 & -0.02 & -0.02 & -0.02 & -0.27 & -0.27 & 0.73
\end{array}\right]$$
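For the two-level hierarchy the summing matrix is 13 x 9. One way to build it, sketched here with NumPy, is to stack a row of ones (Total), a Kronecker product (A, B, C) and an identity matrix (the nine bottom-level series). The construction via `np.kron` is an illustrative choice, not something prescribed by the talk.

```python
import numpy as np

n_groups, n_per_group = 3, 3          # A, B, C with three children each
n_bottom = n_groups * n_per_group     # AA, AB, ..., CC

total_row = np.ones((1, n_bottom))                               # Total
group_rows = np.kron(np.eye(n_groups), np.ones((1, n_per_group)))  # A, B, C
bottom_rows = np.eye(n_bottom)                                   # AA ... CC

S = np.vstack([total_row, group_rows, bottom_rows])  # shape (13, 9)

# OLS reconciliation weights S (S'S)^-1 S'.
W = S @ np.linalg.solve(S.T @ S, S.T)

np.set_printoptions(linewidth=200, suppress=True)
print(np.round(W, 2))
```

W is the projection onto the column space of S, so its trace equals the number of bottom-level series (9), which gives a quick sanity check on the diagonal 0.69, 0.58 (three times) and 0.73 (nine times).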

Page 108: Forecasting Hierarchical Time Series

Features

Forget “bottom up” or “top down”. This approach combines all forecasts optimally.

Method outperforms bottom-up and top-down, especially for middle levels.

Covariates can be included in initial forecasts.

Adjustments can be made to initial forecasts at any level.

Very simple and flexible method. Can work with any hierarchical or grouped time series.

Conceptually easy to implement: OLS on base forecasts.

Page 114: Forecasting Hierarchical Time Series

Challenges

OLS solution: $\tilde{Y}_n(h) = S(S'S)^{-1}S'\hat{Y}_n(h)$

Computational difficulties in big hierarchies due to size of the $S$ matrix and singular behavior of $(S'S)$.
Need to estimate covariance matrix to produce prediction intervals.
Assumption might be unrealistic.
Ignores covariance matrix in computing point forecasts.

Page 118: Forecasting Hierarchical Time Series

Optimal combination forecasts

Solution 2: Rescaling
Suppose we rescale the original forecasts by $\Lambda$, reconcile using OLS, and backscale:

$$\tilde{Y}^*_n(h) = S(S'\Lambda^2 S)^{-1}S'\Lambda^2\hat{Y}_n(h).$$

If $\Lambda = (\Sigma_h^{\dagger})^{1/2}$, we get the GLS solution.

Approximately optimal solution:

$$\Lambda = \left[\operatorname{diagonal}(\Sigma_1^{\dagger})\right]^{1/2}$$

That is, $\Lambda$ contains inverse one-step forecast standard deviations.

OLS solution (Solution 1): $\tilde{Y}_n(h) = S(S'S)^{-1}S'\hat{Y}_n(h)$
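The rescaling idea amounts to weighted least squares. The sketch below assumes NumPy, a hypothetical one-level hierarchy, and made-up one-step forecast standard deviations; it takes Lambda to hold the inverse standard deviations, as described above, and confirms that "rescale, OLS, backscale" matches the closed-form expression.

```python
import numpy as np

# Hypothetical hierarchy: Total = A + B + C (rows: Total, A, B, C).
S = np.array([[1, 1, 1],
              [1, 0, 0],
              [0, 1, 0],
              [0, 0, 1]], dtype=float)

# Made-up one-step forecast standard deviations for Total, A, B, C.
sd = np.array([5.0, 3.0, 2.0, 1.0])
Lam = np.diag(1.0 / sd)   # Lambda: inverse standard deviations
Lam2 = Lam @ Lam          # Lambda^2: inverse variances

y_hat = np.array([100.0, 40.0, 35.0, 30.0])  # hypothetical base forecasts

# Closed form: S (S' Lambda^2 S)^-1 S' Lambda^2 y_hat
P_wls = np.linalg.solve(S.T @ Lam2 @ S, S.T @ Lam2)
y_star = S @ P_wls @ y_hat

# Same thing in three steps: rescale, OLS on the rescaled problem, backscale.
beta, *_ = np.linalg.lstsq(Lam @ S, Lam @ y_hat, rcond=None)
y_star_steps = S @ beta

print(np.round(y_star, 3))
print(np.allclose(y_star, y_star_steps))
```

Series with small forecast standard deviations receive large weights, so their base forecasts are adjusted least.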

Page 122: Forecasting Hierarchical Time Series

Optimal combination forecasts

Solution 3: Averaging
If the bottom level error series are approximately uncorrelated and have similar variances, then $\Lambda^2$ is inversely proportional to the number of series making up each element of $Y$.

So set $\Lambda^2$ to be the inverse row sums of $S$.

$$\tilde{Y}^*_n(h) = S(S'\Lambda^2 S)^{-1}S'\Lambda^2\hat{Y}_n(h)$$
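With Lambda squared set to the inverse row sums of S, no covariance estimate is needed at all. A minimal sketch, assuming NumPy, a hypothetical one-level hierarchy, and made-up base forecasts:

```python
import numpy as np

# Hypothetical hierarchy: Total = A + B + C (rows: Total, A, B, C).
S = np.array([[1, 1, 1],
              [1, 0, 0],
              [0, 1, 0],
              [0, 0, 1]], dtype=float)

# Each row sum counts the bottom-level series making up that element of Y.
row_sums = S.sum(axis=1)          # [3, 1, 1, 1]
Lam2 = np.diag(1.0 / row_sums)    # Lambda^2 = inverse row sums

P_avg = np.linalg.solve(S.T @ Lam2 @ S, S.T @ Lam2)
W_avg = S @ P_avg

y_hat = np.array([100.0, 40.0, 35.0, 30.0])  # hypothetical base forecasts
y_star = W_avg @ y_hat

print(np.round(W_avg[0], 2))   # weights for the Total: [0.5 0.5 0.5 0.5]
print(np.round(y_star, 4))     # Total is revised to 102.5
```

Compared with OLS, where the Total's own forecast gets weight 0.75, this weighting trusts the aggregate less because it is made up of more series.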

Page 124: Forecasting Hierarchical Time Series

5 Application to Australian tourism

Page 126: Forecasting Hierarchical Time Series

Application to Australian tourism

Quarterly data on visitor nights
Domestic visitor nights from 1998–2006
Data from: National Visitor Survey, based on annual interviews of 120,000 Australians aged 15+, collected by Tourism Research Australia.

Page 127: Forecasting Hierarchical Time Series

Application to Australian tourism

Also split by purpose of travel:
Holiday
Visits to friends and relatives
Business
Other

Page 135: Forecasting Hierarchical Time Series

Exponential smoothing methods

                              Seasonal Component
Trend Component               N (None)   A (Additive)   M (Multiplicative)
N  (None)                     N,N        N,A            N,M
A  (Additive)                 A,N        A,A            A,M
Ad (Additive damped)          Ad,N       Ad,A           Ad,M
M  (Multiplicative)           M,N        M,A            M,M
Md (Multiplicative damped)    Md,N       Md,A           Md,M

N,N: Simple exponential smoothing
A,N: Holt’s linear method
Ad,N: Additive damped trend method
M,N: Exponential trend method
Md,N: Multiplicative damped trend method
A,A: Additive Holt-Winters’ method
A,M: Multiplicative Holt-Winters’ method

Page 137: Forecasting Hierarchical Time Series

Exponential smoothing methods

There are 15 separate exponential smoothing methods.
Each can have an additive or multiplicative error, giving 30 separate models.
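The counts of 15 methods and 30 models follow directly from the taxonomy: five trend types times three seasonal types, each with two error types. A small sketch using only the Python standard library (the label strings are just the abbreviations used in the table):

```python
from itertools import product

trend_types = ["N", "A", "Ad", "M", "Md"]   # none, additive, damped, ...
seasonal_types = ["N", "A", "M"]            # none, additive, multiplicative
error_types = ["A", "M"]                    # additive, multiplicative

# Methods are (trend, seasonal) pairs; models add the error type in front.
methods = [f"{t},{s}" for t, s in product(trend_types, seasonal_types)]
models = [f"{e},{t},{s}"
          for e, t, s in product(error_types, trend_types, seasonal_types)]

print(len(methods), len(models))  # 15 30
print(models[:3])                 # ['A,N,N', 'A,N,A', 'A,N,M']
```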

Page 144: Forecasting Hierarchical Time Series

Exponential smoothing methods

                               Seasonal component
Trend component                N (None)   A (Additive)   M (Multiplicative)
N  (None)                      N,N        N,A            N,M
A  (Additive)                  A,N        A,A            A,M
Ad (Additive damped)           Ad,N       Ad,A           Ad,M
M  (Multiplicative)            M,N        M,A            M,M
Md (Multiplicative damped)     Md,N       Md,A           Md,M

General notation: ETS (Error, Trend, Seasonal), which also reads as ExponenTial Smoothing.

Examples:

A,N,N: Simple exponential smoothing with additive errors
A,A,N: Holt’s linear method with additive errors
M,A,M: Multiplicative Holt-Winters’ method with multiplicative errors

Innovations state space models

- All ETS models can be written in innovations state space form (IJF, 2002).
- Additive and multiplicative versions give the same point forecasts but different prediction intervals.
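As a worked illustration of the innovations state space form, the simplest pair of models can be written out as below. This is the usual textbook form of ETS(A,N,N) and ETS(M,N,N) rather than something shown on the slide; the symbols are assumptions of this sketch: \ell_t is the level, \alpha the smoothing parameter and \varepsilon_t the innovation with variance \sigma^2.

```latex
\begin{align*}
\text{ETS(A,N,N):}\quad y_t &= \ell_{t-1} + \varepsilon_t,
  & \ell_t &= \ell_{t-1} + \alpha\,\varepsilon_t, \\
\text{ETS(M,N,N):}\quad y_t &= \ell_{t-1}\,(1 + \varepsilon_t),
  & \ell_t &= \ell_{t-1}\,(1 + \alpha\,\varepsilon_t), \\
\text{both:}\quad \hat{y}_{t+h\mid t} &= \ell_t .
\end{align*}
```

For a given \alpha and \ell_t the point forecast is the same in both cases, but the one-step forecast variance is \sigma^2 in the additive model and \ell_t^2 \sigma^2 in the multiplicative model, so the prediction intervals differ.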

Page 145: Forecasting Hierarchical Time Series

Automatic forecasting

From Hyndman et al. (IJF, 2002):

- Apply each of 30 models that are appropriate to the data. Optimize parameters and initial values using MLE (or some other criterion).
- Select best method using AIC:
      AIC = −2 log(Likelihood) + 2p
  where p = # parameters.
- Produce forecasts using best method.
- Obtain prediction intervals using underlying state space model.
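The selection step can be sketched in base R with just two of the candidate models. This is a simplified illustration, not the implementation behind the talk: the function names, the simulated series and the choice to count the error variance in p are all assumptions made here, and only ETS(A,N,N) and ETS(A,A,N) are compared.

```r
# Sketch only: two additive-error ETS candidates fitted by Gaussian MLE.
# Smoothing parameters are mapped into (0, 1) with plogis() so that optim()
# can search without constraints.

# One-step-ahead errors of ETS(A,N,N); par = (logit(alpha), initial level)
ann_errors <- function(par, y) {
  alpha <- plogis(par[1])
  level <- par[2]
  e <- numeric(length(y))
  for (t in seq_along(y)) {
    e[t] <- y[t] - level
    level <- level + alpha * e[t]
  }
  e
}

# One-step-ahead errors of ETS(A,A,N);
# par = (logit(alpha), logit(beta), initial level, initial trend)
aan_errors <- function(par, y) {
  alpha <- plogis(par[1])
  beta <- plogis(par[2])
  level <- par[3]
  trend <- par[4]
  e <- numeric(length(y))
  for (t in seq_along(y)) {
    e[t] <- y[t] - (level + trend)
    level <- level + trend + alpha * e[t]
    trend <- trend + beta * e[t]
  }
  e
}

# AIC = -2 log(Likelihood) + 2p, with the error variance set to its MLE.
# For Gaussian errors, maximising the likelihood is the same as minimising SSE.
fit_aic <- function(errors_fn, start, y) {
  opt <- optim(start, function(par) sum(errors_fn(par, y)^2))
  n <- length(y)
  sigma2 <- opt$value / n
  loglik <- -0.5 * n * (log(2 * pi * sigma2) + 1)
  p <- length(start) + 1  # smoothing parameters, initial states, sigma^2
  -2 * loglik + 2 * p
}

set.seed(1)
y <- 20 + 1:60 + rnorm(60)  # simulated series with a linear trend

aic <- c("A,N,N" = fit_aic(ann_errors, c(0, y[1]), y),
         "A,A,N" = fit_aic(aan_errors, c(0, -2, y[1], y[2] - y[1]), y))
best <- names(which.min(aic))
print(aic)
print(best)
```

On a trended series like this one the model with a trend component should win despite its two extra parameters, which is the trade-off the 2p penalty is meant to arbitrate.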

Page 149: Forecasting Hierarchical Time Series

Base forecasts

[Figures: Domestic tourism forecasts, visitor nights plotted against year (1998 to 2008), one panel per series: Total, NSW, VIC, Nth.Coast.NSW, Metro.QLD, Sth.WA, X201.Melbourne, X402.Murraylands, X809.Daly.]

Page 158: Forecasting Hierarchical Time Series

Hierarchy: states, zones, regions

MAPE by forecast horizon (h)

                    h=1     h=2     h=4     h=6     h=8   Average

Top Level: Australia
  Bottom-up        3.79    3.58    4.01    4.55    4.24      4.06
  OLS              3.83    3.66    3.88    4.19    4.25      3.94
  Scaling          3.68    3.56    3.97    4.57    4.25      4.04
  Averaging        3.76    3.60    4.01    4.58    4.22      4.06

Level 1: States
  Bottom-up       10.70   10.52   10.85   11.46   11.27     11.03
  OLS             11.07   10.58   11.13   11.62   12.21     11.35
  Scaling         10.44   10.17   10.47   10.97   10.98     10.67
  Averaging       10.59   10.36   10.69   11.27   11.21     10.89

Based on a rolling forecast origin with at least 12 observations in the training set.

Page 159: Forecasting Hierarchical Time Series

Hierarchy: states, zones, regions

MAPE by forecast horizon (h)

                    h=1     h=2     h=4     h=6     h=8   Average

Level 2: Zones
  Bottom-up       14.99   14.97   14.98   15.69   15.65     15.32
  OLS             15.16   15.06   15.27   15.74   16.15     15.48
  Scaling         14.63   14.62   14.68   15.17   15.25     14.94
  Averaging       14.79   14.79   14.85   15.46   15.49     15.14

Bottom Level: Regions
  Bottom-up       33.12   32.54   32.26   33.74   33.96     33.18
  OLS             35.89   33.86   34.26   36.06   37.49     35.43
  Scaling         31.68   31.22   31.08   32.41   32.77     31.89
  Averaging       32.84   32.20   32.06   33.44   34.04     32.96

Based on a rolling forecast origin with at least 12 observations in the training set.

Page 160: Forecasting Hierarchical Time Series

Groups: Purpose, states, capital

MAPE by forecast horizon (h)

                    h=1     h=2     h=4     h=6     h=8   Average

Top Level: Australia
  Bottom-up        3.48    3.30    4.04    4.56    4.58      4.03
  OLS              3.80    3.64    3.94    4.22    4.35      3.95
  Scaling          3.65    3.45    4.00    4.52    4.57      4.04
  Averaging        3.59    3.33    3.99    4.56    4.58      4.04

Level 1: Purpose of travel
  Bottom-up        8.14    8.37    9.02    9.39    9.52      8.95
  OLS              7.94    7.91    8.66    8.66    9.29      8.54
  Scaling          7.99    8.10    8.59    9.09    9.43      8.71
  Averaging        8.04    8.21    8.79    9.25    9.44      8.82

Based on a rolling forecast origin with at least 12 observations in the training set.

Page 161: Forecasting Hierarchical Time Series

Groups: Purpose, states, capital

MAPE by forecast horizon (h)

                    h=1     h=2     h=4     h=6     h=8   Average

Level 2: States
  Bottom-up       21.34   21.75   22.39   23.26   23.31     22.58
  OLS             22.17   21.80   23.53   23.15   23.90     22.99
  Scaling         21.49   21.62   22.20   23.13   23.25     22.51
  Averaging       21.38   21.61   22.30   23.17   23.24     22.51

Bottom Level: Capital city versus other
  Bottom-up       31.97   31.65   32.19   33.70   33.47     32.62
  OLS             32.31   30.92   32.41   33.35   34.13     32.55
  Scaling         32.12   31.36   32.18   33.36   33.43     32.52
  Averaging       31.92   31.39   32.04   33.51   33.39     32.49

Based on a rolling forecast origin with at least 12 observations in the training set.
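The evaluation scheme behind these tables, a rolling forecast origin with at least 12 training observations, can be sketched in base R as follows. The function names and the naive forecaster are hypothetical stand-ins introduced for illustration; the talk itself used ETS base forecasts and reconciled them before scoring.

```r
# Hypothetical stand-in forecaster: repeat the last observed value h times
naive_forecast <- function(train, h) rep(train[length(train)], h)

# Rolling forecast origin: the training set starts with min_train
# observations and grows by one observation at each step. Returns the mean
# absolute percentage error (in percent) for each horizon 1..h_max.
rolling_mape <- function(y, h_max, min_train = 12,
                         forecaster = naive_forecast) {
  n <- length(y)
  origins <- min_train:(n - h_max)
  ape <- matrix(NA_real_, nrow = length(origins), ncol = h_max)
  for (k in seq_along(origins)) {
    end <- origins[k]
    fc <- forecaster(y[1:end], h_max)
    actual <- y[(end + 1):(end + h_max)]
    ape[k, ] <- 100 * abs((actual - fc) / actual)
  }
  colMeans(ape)
}

y <- 100 + 1:40                   # toy series with a steady upward trend
res <- rolling_mape(y, h_max = 8)
print(round(res, 2))
```

With a trending series and a naive forecaster the error grows with the horizon, which is the general pattern one would also expect to read across the columns of a MAPE table.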

Page 163: Forecasting Hierarchical Time Series

hts package for R

hts: Hierarchical and grouped time series
Methods for analysing and forecasting hierarchical and grouped time series

Version:     3.01
Depends:     forecast
Imports:     SparseM
Published:   2013-05-07
Author:      Rob J Hyndman, Roman A Ahmed, and Han Lin Shang
Maintainer:  Rob J Hyndman <Rob.Hyndman at monash.edu>
License:     GPL-2 | GPL-3 [expanded from: GPL (≥ 2)]

Page 165: Forecasting Hierarchical Time Series

Example using R

    library(hts)

    # bts is a matrix containing the bottom level time series
    # g describes the grouping/hierarchical structure
    y <- hts(bts, g=c(1,1,2,2))

Resulting hierarchy:

    Total
      A
        AX  AY
      B
        BX  BY
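To make the role of g concrete, the sketch below builds the summing matrix for this hierarchy in base R. It assumes that g assigns each bottom-level series to its parent group, which is how the slide's tree reads; the series labels and the bottom-level values are made up for illustration, and this is not the package's own code.

```r
g <- c(1, 1, 2, 2)                    # parent group of each bottom-level series
bottom <- c("AX", "AY", "BX", "BY")   # labels taken from the tree above

# One indicator row per group: 1 where the bottom series belongs to the group
groups <- sort(unique(g))
middle <- t(vapply(groups, function(k) as.numeric(g == k),
                   numeric(length(g))))

# Summing matrix: total row, group rows, then an identity block for the
# bottom level itself
S <- rbind(rep(1, length(g)), middle, diag(length(g)))
dimnames(S) <- list(c("Total", LETTERS[groups], bottom), bottom)
print(S)

# Hypothetical bottom-level values for one time period
b <- c(AX = 3, AY = 5, BX = 2, BY = 4)
all_levels <- drop(S %*% b)           # every series in the hierarchy
print(all_levels)
```

Multiplying S by the bottom-level values returns every series in the hierarchy, so any set of bottom-level forecasts pushed through S adds up by construction.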

Page 166: Forecasting Hierarchical Time Series

Example using R

    library(hts)

    # bts is a matrix containing the bottom level time series
    # g describes the grouping/hierarchical structure
    y <- hts(bts, g=c(1,1,2,2))

    # Forecast 10-step-ahead using optimal combination method
    # ETS used for each series by default
    fc <- forecast(y, h=10)

Page 167: Forecasting Hierarchical Time Series

Example using R

    library(hts)

    # bts is a matrix containing the bottom level time series
    # g describes the grouping/hierarchical structure
    y <- hts(bts, g=c(1,1,2,2))

    # Forecast 10-step-ahead using OLS combination method
    # ETS used for each series by default
    fc <- forecast(y, h=10)

    # Select your own methods
    # (mymethod is a placeholder for a user-supplied function that
    # returns h point forecasts for one series)
    ally <- allts(y)
    allf <- matrix(NA, nrow=10, ncol=ncol(ally))
    for(i in seq_len(ncol(ally)))
        allf[,i] <- mymethod(ally[,i], h=10)
    allf <- ts(allf, start=2004)

    # Reconcile forecasts so they add up
    fc2 <- combinef(allf, Smatrix(y))
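What the reconciliation step does can be sketched in base R for a single forecast horizon. This is an illustration of the OLS combination idea, projecting the base forecasts onto the column space of the summing matrix, and not the internals of combinef; the base forecast values are invented for the example.

```r
# Summing matrix for the Total / A, B / AX, AY, BX, BY hierarchy
S <- rbind(c(1, 1, 1, 1),
           c(1, 1, 0, 0),
           c(0, 0, 1, 1),
           diag(4))
rownames(S) <- c("Total", "A", "B", "AX", "AY", "BX", "BY")

# Hypothetical base forecasts for one horizon. They are incoherent:
# the bottom level sums to 14, but Total is 15 and A is 8.5.
yhat <- c(Total = 15, A = 8.5, B = 6, AX = 3, AY = 5, BX = 2, BY = 4)

# OLS combination: regress the base forecasts on S, then map the fitted
# bottom-level values back through S.
beta <- solve(crossprod(S), crossprod(S, yhat))
ytilde <- drop(S %*% beta)
print(round(ytilde, 3))
```

The reconciled values use information from every level, yet Total equals A plus B and each group equals the sum of its children, which is the "add up" property the comment in the slide's code refers to.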

Page 168: Forecasting Hierarchical Time Series

hts function

Usage
    hts(y, g)
    gts(y, g, hierarchical=FALSE)

Arguments
    y             Multivariate time series containing the bottom level series.
    g             Group matrix indicating the group structure, with one column for each series when completely disaggregated, and one row for each grouping of the time series.
    hierarchical  Indicates if the grouping matrix should be treated as hierarchical.

Details
    hts is simply a wrapper for gts(y,g,TRUE). Both return an object of class gts.

Page 169: Forecasting Hierarchical Time Series

forecast.gts function

Usage
    forecast(object, h,
        method = c("comb", "bu", "mo", "tdgsf", "tdgsa", "tdfp", "all"),
        fmethod = c("ets", "rw", "arima"), level, positive = FALSE,
        xreg = NULL, newxreg = NULL, ...)

Arguments
    object    Hierarchical time series object of class gts.
    h         Forecast horizon.
    method    Method for distributing forecasts within the hierarchy.
    fmethod   Forecasting method to use.
    level     Level used for "middle-out" method (when method="mo").
    positive  If TRUE, forecasts are forced to be strictly positive.
    xreg      When fmethod = "arima", a vector or matrix of external regressors, which must have the same number of rows as the original univariate time series.
    newxreg   When fmethod = "arima", a vector or matrix of external regressors for the forecast period, which must have the same number of rows as the forecast horizon h.
    ...       Other arguments passed to ets or auto.arima.
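The method codes are not explained on the slide. Assuming "tdgsa" and "tdgsf" denote the Gross-Sohl top-down methods A and F (an assumption made here, not stated in the talk), the two sets of disaggregation proportions can be sketched in base R as follows. The history and the top-level forecast are invented numbers.

```r
# Hypothetical history: 4 bottom-level series observed over 6 periods
bts <- cbind(AX = c(3, 4, 3, 5, 4, 5),
             AY = c(5, 5, 6, 6, 7, 7),
             BX = c(2, 2, 3, 2, 3, 3),
             BY = c(4, 5, 4, 5, 5, 6))
total <- rowSums(bts)

# Method A: average of the historical proportions
# (each row of bts is divided by that period's total, then columns averaged)
p_a <- colMeans(bts / total)

# Method F: proportions of the historical averages
p_f <- colSums(bts) / sum(total)

# Top-down: split a hypothetical top-level forecast using the proportions
total_fc <- 22
bottom_fc_a <- total_fc * p_a
bottom_fc_f <- total_fc * p_f
print(round(rbind(A = bottom_fc_a, F = bottom_fc_f), 2))
```

Either set of proportions sums to one, so the disaggregated forecasts always add back to the top-level forecast; the two methods differ only in how much weight each historical period receives.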

Page 170: Forecasting Hierarchical Time Series

Utility functions

    allts(y)      Returns all series in the hierarchy
    Smatrix(y)    Returns the summing matrix
    combinef(f)   Combines initial forecasts optimally

Page 171: Forecasting Hierarchical Time Series

More information

Vignette on CRAN

Page 174: Forecasting Hierarchical Time Series

References

RJ Hyndman, RA Ahmed, G Athanasopoulos, and HL Shang (2011). “Optimal combination forecasts for hierarchical time series”. Computational Statistics and Data Analysis 55(9), 2579–2589.

RJ Hyndman, RA Ahmed, and HL Shang (2013). hts: Hierarchical time series. cran.r-project.org/package=hts.

RJ Hyndman and G Athanasopoulos (2013). Forecasting: principles and practice. OTexts. OTexts.org/fpp/.

Papers and R code: robjhyndman.com

Email: [email protected]