DCPs in Forecasting

Edward Kambour, Senior Scientist

Roxy Cramer, Scientist

Posted on 14-Dec-2015

Confidential

Forecasting Background

- The booking period is broken down into intervals during which the underlying demand process is stable
- Handles heterogeneity in the arrival rates
- Addresses the small numbers problem
  - Signal to noise
  - Sample sizes

DCP Forecasting

- Aggregate all transactions that occur during an interval of the booking process
- Use historical aggregated bookings to forecast the arrival rate during the DCP
- Forecast the arrival rate for any given day in the interval by breaking up the DCP forecast
  - Assume constant arrival rate
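The three steps on this slide can be sketched in Python. This is a minimal illustration rather than the authors' implementation: the function name `dcp_daily_forecast`, the data layout (one list of daily bookings per departure date), the use of a plain historical mean as the DCP forecast, and the toy numbers are all assumptions made for the example.

```python
def dcp_daily_forecast(history, dcp_bounds):
    """Forecast daily arrival rates from DCP-level aggregates.

    history: one list per departure date, holding bookings per day out.
    dcp_bounds: (start, end) day-out ranges, end exclusive (hypothetical layout).
    """
    num_days = len(history[0])
    daily = [0.0] * num_days
    for start, end in dcp_bounds:
        # Step 1: aggregate all bookings inside the DCP for each departure date.
        totals = [sum(row[start:end]) for row in history]
        # Step 2: forecast the DCP total (here the historical mean, an assumption).
        dcp_forecast = sum(totals) / len(totals)
        # Step 3: split the DCP forecast evenly, assuming a constant arrival rate.
        for day in range(start, end):
            daily[day] = dcp_forecast / (end - start)
    return daily


# Toy, made-up history: 2 departure dates, 4 days out, two 2-day DCPs.
history = [[2, 4, 1, 0], [4, 2, 1, 2]]
print(dcp_daily_forecast(history, [(0, 2), (2, 4)]))  # [3.0, 3.0, 1.0, 1.0]
```

Every day inside a DCP receives the same forecast, which is exactly the constant-arrival-rate assumption stated on the slide.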

Small Numbers Problem

- Signal to noise
  - Finer granularity implies a lower signal to noise ratio
  - For Poisson data, the SNR = sqrt(mean)
  - Problematic for detecting demand shifts, seasonal trends, and holiday effects
- Aggregating to the DCP level increases the signal to noise ratio
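A small numeric check of the signal to noise claim, as a sketch. It assumes the SNR is defined as mean divided by standard deviation, that daily bookings are independent Poisson counts, and that the rate is constant within the DCP; the function name `poisson_snr` is illustrative.

```python
import math


def poisson_snr(rate_per_day, days=1):
    """SNR (mean / standard deviation) of Poisson bookings summed over `days` days."""
    mean = rate_per_day * days  # a sum of independent Poisson counts is Poisson
    return mean / math.sqrt(mean)  # variance equals the mean, so this is sqrt(mean)


print(poisson_snr(1.0))      # a single day at 1 booking per day
print(poisson_snr(1.0, 10))  # a 10-day DCP at the same daily rate
```

Under these assumptions, aggregating m days multiplies the SNR by sqrt(m).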

Small Numbers (cont.)

- Sample Size
  - Aggregating m different days into a DCP increases the sample size by a factor of m
  - Using a 10 day DCP results in having 10 observations per departure date
  - Leads to superior forecast accuracy because we use more information about the demand process

Example 1

- 5 Day Booking period
- Constant Poisson arrival rate
  - 1 per day
- Examine forecast accuracy
  - 5 DCPs
  - Single DCP

Example 1 Booking Curve

[Figure: Booking Curve. x-axis: Days Prior (0 to 5); y-axis: Total Bookings (0 to 6).]

Example 1: Forecasting

- Suppose we have observations for n departure dates
- Forecast the number of bookings between 4 and 5 days out
  - Single DCP: constant arrival rate
    - Average number of bookings over all the days out
  - 5 DCPs
    - Average number of bookings between days 4 and 5

Example 1: Forecast Accuracy

- Both estimators are unbiased
- Single DCP estimate is based on a sample size of 5n
  - Variance = 1/(5n), MSE = 1/(5n)
- 5 DCP estimate is based on a sample size of n
  - Variance = 1/n, MSE = 1/n
- The Single DCP estimate is more accurate
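The two variances follow from the Poisson property that the variance equals the mean. A short derivation, assuming bookings are independent across days and departure dates; the notation $X_{ij}$ (bookings on day out $j$ for departure date $i$) and $\lambda = 1$ (the daily arrival rate) is introduced here for illustration.

```latex
\begin{align*}
\hat{\lambda}_{\text{5 DCP}} &= \frac{1}{n}\sum_{i=1}^{n} X_{i5},
  & \operatorname{Var}\bigl(\hat{\lambda}_{\text{5 DCP}}\bigr)
  &= \frac{n\lambda}{n^{2}} = \frac{1}{n}, \\
\hat{\lambda}_{\text{single}} &= \frac{1}{5n}\sum_{i=1}^{n}\sum_{j=1}^{5} X_{ij},
  & \operatorname{Var}\bigl(\hat{\lambda}_{\text{single}}\bigr)
  &= \frac{5n\lambda}{(5n)^{2}} = \frac{1}{5n}.
\end{align*}
```

Because both estimators are unbiased, each MSE equals its variance.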

Example 1: Simulation

5 historical departure dates

                              Days Out
Arrival Date   0 to 1   1 to 2   2 to 3   3 to 4   4 to 5
1              1        0        1        1        3
2              1        0        0        1        1
3              1        0        1        2        3
4              0        0        1        0        0
5              2        0        3        3        3

5 DCP          1        0        1.2      1.4      2
Single DCP     1.12     1.12     1.12     1.12     1.12

Example 1: Simulation Forecast Errors

- Single DCP: MSE = 0.0144, MAE = 0.12
- 5 DCPs: MSE = 0.44, MAE = 0.52

Days Out       0 to 1   1 to 2   2 to 3   3 to 4   4 to 5
5 DCP          1        0        1.2      1.4      2
Single DCP     1.12     1.12     1.12     1.12     1.12
Truth          1        1        1        1        1

5 DCP Error    0        -1       0.2      0.4      1
Single Error   0.12     0.12     0.12     0.12     0.12
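The forecasts and error measures for Example 1 can be reproduced from the five simulated departure dates. The sketch below uses the booking counts from the Example 1 simulation table; the variable and function names are illustrative.

```python
# Bookings from the Example 1 simulation: rows are departure dates, columns are days out.
bookings = [
    [1, 0, 1, 1, 3],
    [1, 0, 0, 1, 1],
    [1, 0, 1, 2, 3],
    [0, 0, 1, 0, 0],
    [2, 0, 3, 3, 3],
]
truth = 1.0  # true arrival rate per day in Example 1

n, days = len(bookings), len(bookings[0])
# 5 DCPs: one forecast per day out (column means).
five_dcp = [sum(row[d] for row in bookings) / n for d in range(days)]
# Single DCP: one pooled forecast shared by every day out.
single = [sum(map(sum, bookings)) / (n * days)] * days


def mse(forecast):
    """Mean squared error against the true rate."""
    return sum((f - truth) ** 2 for f in forecast) / len(forecast)


def mae(forecast):
    """Mean absolute error against the true rate."""
    return sum(abs(f - truth) for f in forecast) / len(forecast)


print(round(mse(single), 4), round(mae(single), 2))      # 0.0144 0.12
print(round(mse(five_dcp), 2), round(mae(five_dcp), 2))  # 0.44 0.52
```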

Example 2

- 10 Day Booking period
- Constant Poisson arrival rate over the first 5 days and the last 5 days
  - 1 per day in the first 5
  - 5 per day in the last 5
- Examine forecast accuracy
  - 10 DCPs
  - 2 DCPs
  - Single DCP

Example 2 Booking Curve

[Figure: Booking Curve. x-axis: Days Prior (0 to 10); y-axis: Total Bookings (0 to 35).]

Example 2: Forecasting

- Suppose we have observations for n departure dates
- Forecast the number of bookings between 4 and 5 days out
  - Single DCP: constant arrival rate
    - Average number of bookings over all the days out
  - 2 DCPs: constant arrival rate from 5-10 and 0-5 days out
    - Average number of bookings from 0-5 days out
  - 10 DCPs
    - Average number of bookings between days 4 and 5

Example 2: Forecast Accuracy

- 10 DCPs and 2 DCPs are unbiased
- Single DCP will overestimate for 5-10 days out and underestimate for 0-5 days out (Absolute Bias = 2)
- Single DCP, sample size of 10n
  - Variance = 3/(10n), MSE = 3/(10n) + 4
- 2 DCP, sample size of 5n
  - Variance = 1/n, MSE = 1/n
- 10 DCP estimate is based on a sample size of n
  - Variance = 5/n, MSE = 5/n
- The 2 DCP estimate is most accurate
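The variance, bias and MSE figures for the forecast between 4 and 5 days out can be checked with exact arithmetic. This is a sketch under the Example 2 assumptions (independent Poisson counts, 5 per day for 0-5 days out, 1 per day for 5-10 days out); the function name `forecast_mse` and the choice n = 5 are illustrative.

```python
from fractions import Fraction


def forecast_mse(rates, dcp, target, n):
    """Exact (variance, bias, MSE) of the DCP-average forecast for one day out.

    rates: true daily Poisson rates indexed by days out.
    dcp: (start, end) range of days out pooled together, end exclusive.
    target: index of the day out being forecast.
    n: number of historical departure dates.
    """
    start, end = dcp
    block = rates[start:end]
    m = len(block)
    # Variance of the mean of m*n independent Poisson counts.
    variance = Fraction(sum(block), m * m * n)
    # The pooled mean estimates the average rate in the DCP, not the target's rate.
    bias = Fraction(sum(block), m) - rates[target]
    return variance, bias, variance + bias ** 2


rates = [5] * 5 + [1] * 5  # Example 2 rates, indexed by days out
n = 5
for name, dcp in [("Single DCP", (0, 10)), ("2 DCPs", (0, 5)), ("10 DCPs", (4, 5))]:
    print(name, *forecast_mse(rates, dcp, target=4, n=n))
```

With n = 5 the MSEs come out as 203/50, 1/5 and 1, matching 3/(10n) + 4, 1/n and 5/n.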

Example 2: Simulation

5 historical departure dates

                                        Days Out
Date          0-1    1-2    2-3    3-4    4-5    5-6    6-7    7-8    8-9    9-10
1             4      7      1      5      1      0      0      0      2      1
2             11     6      9      4      10     1      0      0      1      2
3             5      4      8      6      7      1      1      2      0      3
4             7      9      7      5      2      0      2      0      0      2
5             6      10     2      5      1      0      2      1      0      0

10 DCP        6.6    7.2    5.4    5      4.2    0.4    1      0.6    0.6    1.6
2 DCP         5.68   5.68   5.68   5.68   5.68   0.84   0.84   0.84   0.84   0.84
Single DCP    3.26   3.26   3.26   3.26   3.26   3.26   3.26   3.26   3.26   3.26

Example 2: Simulation Forecast Errors

- Single DCP: MSE = 4.07, MAE = 2
- 10 DCPs: MSE = 0.92, MAE = 0.7
- 2 DCPs: MSE = 0.24, MAE = 0.42

Days Out      0-1    1-2    2-3    3-4    4-5    5-6    6-7    7-8    8-9    9-10
10 DCP        6.6    7.2    5.4    5      4.2    0.4    1      0.6    0.6    1.6
2 DCP         5.68   5.68   5.68   5.68   5.68   0.84   0.84   0.84   0.84   0.84
Single DCP    3.26   3.26   3.26   3.26   3.26   3.26   3.26   3.26   3.26   3.26
Truth         5      5      5      5      5      1      1      1      1      1

10 DCP Error  1.6    2.2    0.4    0      -0.8   -0.6   0      -0.4   -0.4   0.6
2 DCP Error   0.68   0.68   0.68   0.68   0.68   -0.16  -0.16  -0.16  -0.16  -0.16
Single Error  -1.74  -1.74  -1.74  -1.74  -1.74  2.26   2.26   2.26   2.26   2.26
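The three sets of error measures can be recomputed from the simulated bookings for any DCP structure. In this sketch a structure is described by a list of boundaries in days out, which is an illustrative convention; the booking counts are those of the Example 2 simulation.

```python
# Bookings from the Example 2 simulation: rows are departure dates, columns are days out.
bookings = [
    [4, 7, 1, 5, 1, 0, 0, 0, 2, 1],
    [11, 6, 9, 4, 10, 1, 0, 0, 1, 2],
    [5, 4, 8, 6, 7, 1, 1, 2, 0, 3],
    [7, 9, 7, 5, 2, 0, 2, 0, 0, 2],
    [6, 10, 2, 5, 1, 0, 2, 1, 0, 0],
]
truth = [5] * 5 + [1] * 5  # true daily rates, indexed by days out


def forecast(bookings, edges):
    """Per-day forecast when days [edges[k], edges[k+1]) are pooled into one DCP."""
    out = []
    for start, end in zip(edges[:-1], edges[1:]):
        pooled = sum(sum(row[start:end]) for row in bookings)
        out += [pooled / (len(bookings) * (end - start))] * (end - start)
    return out


def errors(fc):
    """Return (MSE, MAE) of a forecast against the true rates, rounded to 2 places."""
    diffs = [f - t for f, t in zip(fc, truth)]
    mse = sum(d * d for d in diffs) / len(diffs)
    mae = sum(abs(d) for d in diffs) / len(diffs)
    return round(mse, 2), round(mae, 2)


structures = [("Single DCP", [0, 10]), ("2 DCPs", [0, 5, 10]), ("10 DCPs", list(range(11)))]
for name, edges in structures:
    print(name, errors(forecast(bookings, edges)))
```

The 2 DCP structure has the smallest error on both measures because its boundary coincides with the change in arrival rate.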

10 DCP Booking Curve

[Figure: Booking Curve. x-axis: Days Prior (0 to 10); y-axis: Total Bookings (0 to 35).]

2 DCP Booking Curve

[Figure: Booking Curve. x-axis: Days Prior (0 to 10); y-axis: Total Bookings (0 to 35).]

Finding the Best DCP Structure

- Gather data for numerous departure dates
- Fit every possible DCP structure and select the one that has the smallest Mean Squared Error (MSE)
  - The structure with the smallest MSE will generally be the one with the fewest DCPs and negligible bias.
- Recall that the MSE = Variance + Bias^2
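An exhaustive search over DCP structures can be sketched for a short booking period. To keep the example exact, it scores each structure with the theoretical MSE (variance plus squared bias) under known Poisson rates, using the Example 2 rates and n = 5; with real data the rates are unknown and the MSE has to be estimated, so treat this as an illustration of the search and not as the selection procedure itself. The function names are hypothetical. A booking period of m days has 2^(m-1) contiguous structures, so brute force is only practical when m is small.

```python
from fractions import Fraction
from itertools import combinations


def structures(num_days):
    """Yield every contiguous DCP structure as a list of (start, end) day ranges."""
    interior = range(1, num_days)
    for k in range(num_days):
        for cuts in combinations(interior, k):
            edges = (0, *cuts, num_days)
            yield list(zip(edges[:-1], edges[1:]))


def average_mse(rates, structure, n):
    """Theoretical MSE (variance + bias^2) averaged over all days out."""
    total = Fraction(0)
    for start, end in structure:
        block = rates[start:end]
        m = len(block)
        mean = Fraction(sum(block), m)              # what the pooled forecast estimates
        variance = Fraction(sum(block), m * m * n)  # mean of m*n Poisson counts
        total += sum(variance + (mean - r) ** 2 for r in block)
    return total / len(rates)


rates = [5] * 5 + [1] * 5  # assumed known rates (Example 2), indexed by days out
best = min(structures(len(rates)), key=lambda s: average_mse(rates, s, n=5))
print(best, average_mse(rates, best, n=5))
```

For these rates the search selects the two-DCP structure split at 5 days out: finer structures add variance without removing bias, and any DCP that straddles the rate change adds bias.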

DCP Selection Algorithm

- Configure the DCP question into a multiple linear regression with indicator predictors
- Utilize the change point regression methodology from McLaren et al. (2000)
  - Minimizes the estimated Expected MSE (risk), Eubank (1988)
  - Utilizes a mixture of Backward Elimination, Draper and Smith (1981), and Regression by Leaps and Bounds, Furnival and Wilson (1974)
- Extend the method to partition the MSE into its variance and squared bias components
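One way to read "a multiple linear regression with indicator predictors" is to give each candidate DCP a 0/1 column marking the days it covers. The sketch below makes that assumption; the change point regression in McLaren et al. (2000) may parameterize the problem differently, and the subset-selection step (backward elimination, leaps and bounds) is not shown. With disjoint indicator columns, X'X is diagonal, so each least-squares coefficient is simply the mean response within its DCP. The response used here is the mean bookings per day out from the Example 2 simulation (the "10 DCP" row).

```python
def indicator_design(num_days, edges):
    """One 0/1 column per DCP: X[d][k] = 1 if day d lies in [edges[k], edges[k+1])."""
    return [
        [1 if edges[k] <= d < edges[k + 1] else 0 for k in range(len(edges) - 1)]
        for d in range(num_days)
    ]


def least_squares(X, y):
    """OLS coefficients when the columns of X are disjoint indicators (X'X diagonal)."""
    coefs = []
    for k in range(len(X[0])):
        xty = sum(row[k] * yi for row, yi in zip(X, y))  # k-th entry of X'y
        xtx = sum(row[k] * row[k] for row in X)          # k-th diagonal entry of X'X
        coefs.append(xty / xtx)
    return coefs


# Mean bookings per day out from the Example 2 simulation.
y = [6.6, 7.2, 5.4, 5, 4.2, 0.4, 1, 0.6, 0.6, 1.6]
X = indicator_design(len(y), [0, 5, 10])  # two DCPs: 0-5 and 5-10 days out
print([round(b, 2) for b in least_squares(X, y)])  # [5.68, 0.84]
```

The fitted coefficients reproduce the 2 DCP forecasts from Example 2, which shows how choosing a DCP structure becomes a question of which indicator columns to keep in the regression.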

Real Data Booking Curve

[Figure: Aggregated Booking Curve. x-axis: Days Prior to Departure; y-axis: Bookings.]

Real Fitted Booking Curve

[Figure: Aggregated Booking Curve. x-axis: Days Prior to Departure; y-axis: Bookings.]

Real Booking Curves

[Figure: Aggregated Booking Curve. x-axis: Days Prior to Departure; y-axis: Bookings.]

Considerations

- Business rules and requirements
- Application specific requirements
- Concerns about the proportion of demand in each DCP
  - Don’t want to “put all the eggs in one basket”
- Day of Week issues
- Long haul versus short haul

Robustness

- Yields a mathematical starting point
- Finds best “sub-optimal” structures
- Quantifies the effect of using different DCP structures

Conclusion

- The number of DCPs is important
  - Too many leads to low SNR and high forecast error
  - Too few leads to biased forecasts, and hence high forecast error
- Want constant arrival rate throughout a DCP interval
  - Examine historical booking curves
  - Keep in mind the randomness involved

Technical References

Draper, N. and Smith, H. (1981). Applied Regression Analysis. Wiley, New York.

Eubank, R. L. (1988). Spline Smoothing and Nonparametric Regression. Marcel Dekker, Inc., New York.

Furnival, G. M. and Wilson, R. W. (1974). Regression by Leaps and Bounds. Technometrics, 16, 499-511.

McLaren, C. E., Kambour, E. L., McLachlan, G. J., Lukaski, H. C., Li, X., Brittenham, G. E., and McLaren, G. D. (2000). Patient-specific Analysis of Sequential Haematological Data by Multiple Linear Regression and Mixture Modelling. Statistics in Medicine, 19, 83-98.