Predictive analytics, forecasting and marketing · 16/11/2016 · Market Forecast Target ... -...

Predictive analytics, forecasting and marketing Advances in promotional modelling and demand forecasting

Nikolaos Kourentzes

Lancaster University

Stockholm School of Economics

16/11/2016

About the speaker

Associate Professor at Dept. Management Science Lancaster University Management School, UK Lancaster Centre for Forecasting

Editorial Board of the International Journal of Forecasting, the leading academic journal in the field of forecasting

Research interests and consulting experience in various fields of forecasting, including:

– Business Forecasting and Demand Planning

– Promotional Modelling

– Supply Chain Forecasting and Bullwip effect

Long experience in applied research projects with industry in various sectors, including: retailing, FMCG manufacturers, pharmaceuticals, media among others. Research blog: http://nikolaos.kourentzes.com

2/75

http://nikolaos.kourentzes.com/

Agenda

1. Business forecasting and uncertainty

2. Forecasting challenges from a modelling view

3. Innovations:

3a. Temporal Hierarchies

3b. Promotional Modelling

3c. Using online search information

3/75

Forecasting & decision making

Decision making in organisations has at its core an element of forecasting

Accurate forecasts lead to reduced uncertainty better decisions

Forecasts maybe implicit or explicit

Forecasts aims to provide information about the future, conditional on historical and current knowledge Company targets and plans aim to provide direction towards a desirable future.

Present

Target

Forecast

Forecast

Difference between targets and forecasts, at different horizons, provide useful feedback

4/75


Future`

Market

Forecast Target

• Forecasts match targets – Great! Do they also match

organisational strategy and objectives?

• Forecasts to do match targets

‒ Revise plans/target to reflect probable (forecasted) outcomes.

‒ Act to invalidate forecasts change market to make your target the most probably outcome

• Our forecasts will always be wrong: we took actions since we produced them!

5/75


Forecasts are central in decisions relating to:

• Inventory management

• Promotional and marketing activities

• Logistics

• Human resource planning

• Purchasing and procurement

• Cash flow management

• Building new production/storage unit

• Entering new markets

• …

Accurate forecasts can support - Decision making - Identifying and capitalising on opportunities - Cost saving

6/75

Forecasting & uncertainty

Statistical forecasts are (often) able to provide a point forecast and a degree of

uncertainty, reflected in the prediction intervals Probability that true value will lie

within bounds.

Additional useful information will lead to tighter prediction intervals

Less uncertainty about the future

7/75


Therefore, it is crucial to identify and capture useful information to introduce in our

forecasts. Such information may be:

• Past univariate information (trend, season, …)

• Past exogenous information (events, price, competition, …)

• Future exogenous information (…)

The nature of this information does not have to be well-structured or even quantitative

(initially). Often it comprises of judgemental forecasts, though targets should be kept

separate.

Companies often have a wealth of available information, but do not use it effectively (or

at all!). Advances in forecasting allow us to take steps in using it reducing uncertainty

about the future taking better decisions.

8/75

Forecasting & uncertainty Let us assume we improved our forecast accuracy by 20%, so what?

Consider two products, one with low variability and one with high

… and some statistical forecasts

Let’s calculate how much stock we should have

9/75


For this example lets us always keep 1 week of stock

Great!

High service level

Do I really need all that stock? I always sell

about 300.

More variability, I probably need all this

stock

Stock = Forecast + Days of Supply

10/75

Forecasting & uncertainty Can we do any better? Given a statistical forecast we can estimate the uncertainty of its predictions this is essentially how accurate are our forecasts!

This uncertainty is invaluable for decision

making.

The point forecast will most probably be wrong, but the prediction intervals tell us

within which values the future values will probably

be.

With this we can calculate the inventory

11/75


The correct safety stock is associated with the forecast error (accuracy) Now the stock levels reflect how accurate are our forecasts and allow us to achieve the desired service level

Stock automatically adjusts to forecasting accuracy/uncertainty

The previous topics focused on increasing accuracy and automation Impact on inventory! Impact on financial

position!

Stock = Forecast + Service Level * Uncertainty

12/75

Agenda



3. Innovations:




13/75

How do we build & models now?

• This is by no means a resolved question, but there are some reliable approaches

• Take the example of exponential smoothing family:

• Considered one of the most reliable and robust methods for automatic univariate forecasting .

• It is a family of methods: ETS (error type, trend type, seasonality type)

• Error: Additive or Multiplicative

• Trend: None or Additive or Multiplicative, Linear or Damped/Exponential

• Seasonality: None or Additive or Multiplicative

• Adequate for a most types of time series.

• Within the state space framework we can select and fit model parameters automatically and reliably.

14/75

How do we build & models now?

None Additive Multiplicative

None

Additive

Additive Damped

Multiplicative

Multiplicative Damped

Trend

Seasonality

tht

ttt

LF

FaaAL

)1(

11 tsttt LaSAL

stttt SLAS )1()(

ksttkt SLF

11 )1()( tttt TLLT

)( 1

1

t

t

tt T

L

AS

stS )1(

hstt

h

i

i

tht STLF

1

))(1( 11

tt

st

tt TLa

S

AaL

Use likelihood to find smoothing and initial values. Use some information criterion (AICc, BIC, etc) to choose appropriate model per series.

15/75

Any issues with current practice?

Issues with automatic modelling:

• Model selection How good is the best fit model? How reliable?

• Sampling uncertainty Identified model/parameters stable as new data appear?

• Model uncertainty Appropriate model structure and parameters?

• Transparency/Trust Practitioners do not trust systems that change substantially

0 20 40 60 80 100 1200

2000

4000

6000

8000

10000

ETS(A,Ad,M) - AIC: 1262.33

0 20 40 60 80 100 1200

2000

4000

6000

8000

10000

ETS(M,Ad,A) - AIC: 1273.45

Months

16/75

Any issues with current practice?

What can go wrong in parameter and model selection:

• Business time series are often short Limited data

• Estimation of parameters can fail miserably (for monthly data optimise up to 18 parameters, with often no more than 36 observations)

• Model selection can fail as well (30 models over-fitting?)

• Both optimisation and model selection are myopic Focus on data fitting in the past, rather than ‘forecastability’

• Special cases:

10 20 30 40 50 60 7080

100

120

140

160

180

200

220

Sale

s

Month

Demand Fit Forecast

True model: Additive trend, additive seasonality Identified model: No trend, additive seasonality Why? In-sample variance explained mostly by seasonality

17/75

Characteristics of a good solution

• Provide accurate forecasts (i.e. narrow prediction intervals).

• Being consistent: the nature of the forecast should not change, unless there are substantial changes in the market increase trust to users.

• Be relatively transparent increase trust to the users and insights in the market.

• Be automatic: it is common to have to deal with several thousand of time series cannot model in detail each one separately.

• Underlying specification of the forecasting model should match the desired business objective.

18/75

Agenda



3. Innovations:




19/75

Forecasting model & objective Decisions need to be aligned:

• Operational short-term decisions

• Tactical medium-term decisions

• Strategic long-term decisions

Shorter term plans are bottom-up and based mainly on statistical forecasts & expert

adjustments.

Longer term plans are top-down and based mainly on managerial expertise factoring in

unstructured information and organisational environment.

Given different sources of information (and views) forecasts will differ plans and

decisions not aligned.

Objective: construct a framework to reconcile forecasts of different levels and eventually

align decisions less waste & costs, agility to take advantage of opportunities.

20/75

Long term forecasting

• We know that different forecasting models are better for different forecast horizons

• We also know that it helps to forecast long horizons using aggregate data

Forecasting a quarter ahead using daily data is `adventurous’ (90 steps ahead)

Forecasting a quarter ahead using quarterly data is easier (1 step ahead)

• At different data frequencies different components of the series dominate.

0 20 40 60 80 100 1200

2000

4000

6000

8000

10000

ETS(M,Ad,A) - AIC: 1273.45

Months

0 2 4 6 8 102

3

4

5

6

7

8x 10

4 ETS(A,Ad,N) - AIC: 85.92

Years

These forecasts often do not agree, which one is `correct’?

21/75

A different take on modelling: temporal tricks!

Traditionally we model time series at the frequency that we sampled them or take decisions.

However, a time series can be view in many different ways, adapting the notion of product

hierarchies to temporal hierarchies:

The advantages of temporal hierarchies can be highlighted by examining the data at both time and frequency domains.

22/75

How temporal aggregation changes the series

As an example let us use the following time series:

23/75

How temporal aggregation changes the series

Seasonal diagrams

24/75

The Idea

• Temporal aggregation strengthens and attenuates different elements of the

series:

at an aggregate level trend/cycle is easy to distinguish

at a disaggregate level high frequency elements like seasonality typically

dominate.

• Modelling a time series at a very disaggregate level (e.g. weekly) short-

term forecast. The opposite is true for aggregate levels (e.g. annual)

• Propose Temporal Hierarchies that provide a framework to optimally

combine information from various levels (irrespective of forecasting method)

to:

• reconcile forecasts

• avoid over-reliance on a single planning level

• avoid over-reliance on a single forecasting method/model

25/75

Temporal aggregation and forecasting

• It is not new, but the question has been at which single level to model the

time series. Econometrics have investigate the question for decades

inconclusive

• Supply chain applications: ADIDA beneficial to slow and fast moving items

forecast accuracy (like everything… not always!):

• Step 1: Temporally aggregate time series to the appropriate level

• Step 2: Forecast

• Step 3: Disaggregate forecast and use

• Selection of aggregation level No theoretical grounding for general

case, but good understanding for AR(1)/MA(1)/ARMA(1,1) cases.

26/75

Temporal aggregation and forecasting

Recently there has been a resurgence in using temporal aggregation for

forecasting.

• Non-overlapping temporal aggregation is a moving average filter.

• Filters high frequency components: noise, seasonality, etc.

• Reduces intermittency.

• But can increase complexity:

• Loss of estimation efficiency

• Complicates dynamics of underlying (ARIMA) process

• Identifiable process converge to IMA - often IMA(1,1)

• What is the best temporal aggregation to work on? Traditionally try to

identify a single best level.

27/75

Multiple temporal aggregation

What if we do not select an aggregation level? use multiple

10 20 30 40100

120

140

160

180

200

Period

Dem

and

Aggregation level 1 ETS(A,N,A) ETS(A,M,A)

ETS(A,M,N) ETS(A,A,N)

Issues: • Different model • Different length • Combination

2 4 6 8 10 12 14 16100

120

140

160

180

200

PeriodD

em

and

Aggregation level 3

1 2 3 4 5 6 7100

120

140

160

180

200

Period

Dem

and

Aggregation level 7

1 1.5 2 2.5 3 3.5 4100

120

140

160

180

200

Period

Dem

and

Aggregation level 12

28/75

Multiple temporal aggregation

Forecast combination:

• Forecast combination is widely considered as beneficial for forecast accuracy

• Simple combination methods (average, median) considered robust, relatively accurate to more complex methods

Issue:

• If there are different model types to be combined then the resulting forecast does not fit well at any component!

10 20 30 40100

120

140

160

180

200

Period

Dem

and

10 20 30 40100

120

140

160

180

200

Period

Dem

and

29/75

10 20 30 40 50 600

5000

10000

15000

5 10 15 20 25 300

5000

10000

15000

5 10 15 200

5000

10000

15000

1 2 3 4 5 60

5000

10000

15000

1 2 3 4 5 60

5000

10000

15000

1 2 3 4 50

5000

10000

15000

…

𝑦𝑡[1]

𝑦𝑡[2]

𝑦𝑡[3]

𝑦𝑡[10]

𝑦𝑡[11]

𝑦𝑡[12]

0 10 20 30 40 50 60 700

5000

10000

15000

Fit state space ETS

10 20 30 40 50 60-44.26

-44.24

-44.2210 20 30 40 50 60

2000

4000

6000

10 20 30 40 50 600.5

1

1.5

2

Save states

Level

Trend

Season

Fit state space ETS

0 5 10 15 20 25 30 352000

4000

6000

8000

10000

5 10 15 20 25 301

1.2

1.45 10 15 20 25 30

-96.27

-96.26

-96.255 10 15 20 25 30

2000

4000

6000

Save states

Level

Trend

Season

Aggregate

30/75

Transform states to additive and to original sampling frequency

Combine states (components)

Produce forecasts

31/75

Step 1: Aggregation

.

.

.

.

.

.

1Y

2Y

3Y

KY

2k

3k

Kk

Step 2: Forecasting

ETS

Model Selection

ETS

Model Selection

ETS

Model Selection

ETS

Model Selection

.

.

.

.

.

.

1b 1

s

2b 2

s

3b 3

s

Kb K

s

2l

3l

Kl

1l

Step 3: Combination

+

l

b

s

1Y

1

K

1

K

1

K

Strengthens and attenuates components

Estimation of parameters at multiple levels

Robustness on model selection and parameterisation

Multiple Aggregation Prediction Algorithm (MAPA)

32/75


33/75


MAPA was developed to take advantage of temporal aggregation and hierarchies:

• MAPA provides a framework to better identify and estimate the different time series components better forecasts

• On average outperforms ETS, one of the most widely used, robust and accurate univariate forecasting models in supply chain forecasting

• It provides reconciled forecasts across planning levels and forecast horizons

• Robust against model selection and parameterisation issues

• Shown to be useful for fast moving items, promotional modelling and intermittent time series forecasting.

MAPA is available for R, in the MAPA package: http://cran.r-project.org/web/packages/MAPA/index.html Its intermittent demand counterpart is available in the tsintermittent package: http://cran.r-project.org/web/packages/tsintermittent/index.html Examples and interactive demos for both are available at my blog: http://nikolaos.kourentzes.com

34/75

http://cran.r-project.org/web/packages/MAPA/index.html




http://cran.r-project.org/web/packages/tsintermittent/index.html





Temporal Hierarchies: A modelling framework

• MAPA demonstrated the strength of the approach, but it is not general:

• How to incorporate forecasts from any model/method?

• How to incorporate judgement?

• We can introduce a general framework for temporal hierarchies that borrows

many elements from cross-sectional hierarchies

• Eventually we will get to cross-temporal hierarchies, also touted as the ‘one-

number’ forecast, i.e. a reconciled forecast across planning horizons and

product/customer/location groups.

35/75

Cross-sectional and Temporal Hierarchies

• We know how to do cross-sectional hierarchies

• Top-down, bottom-up, middle-out

• Optimal combinations

Total

UK Spain

Product A Product B Product A Product B

We know how to do this!

… then we know how to do this as well, with some small-print!

but we have to correct for the different scales at each aggregation level, which is not that difficult due to the imposed temporal structure. For that we need to calculate the

covariance matrix between the forecast errors at different aggregation levels. There are easy and not so easy ways to do this.

36/75

Some evidence that it actually works!

Comparison with other M3 results (symmetric Mean Absolute Percentage Error):

• Monthly dataset ‒ Temporal (ETS based): 13.61% ‒ ETS: 14.45% [Hyndman et al., 2002]

‒ MAPA: 13.69% [Kourentzes et al., 2014]

‒ Theta: 13.85% (best original performance) [Makridakis & Hibon, 2000]

• Quarterly dataset

‒ Temporal (ETS based): 9.70% ‒ ETS: 9.94% [Hyndman et al., 2002]

‒ MAPA: 9.58% [Kourentzes et al., 2014]

‒ Theta: 8.96% (best original performance) [Makridakis & Hibon, 2000]

Detailed results available if you are interested at the end of the presentation!

37/75

Application: Predicting A&E admissions

Collect weekly data for UK A&E wards. 13 time series: covering different types of emergencies and different severities (measured as time to treatment) Span from week 45 2010 (7th Nov 2010) to week 24 2015 (7th June 2015) Series are at England level (not local authorities).

Accurately predict to support staffing and training decisions. Aligning the short and long term forecasts is important for consistency of planning and budgeting. Test set: 52 weeks. Rolling origin evaluation. Forecast horizons of interest: t+1, t+4, t+52 (1 week, 1 month, 1 year). Evaluation MASE (relative to base model) As a base model auto.arima (forecast package R) is used.

38/75

Application: Predicting A&E admissions Total Emergency Admissions via A&E

Red is the prediction of the base model (ARIMA) Blue is the temporal hierarchy reconciled forecasts (based on ARIMA)

Observe how information is `borrowed’ between temporal levels. Base models for instance provide very poor weekly and annual forecasts

39/75

Application: Predicting A&E admissions

• Accuracy gains at all planning horizons

• Crucially, forecasts are reconciled leading to aligned plans

40/75

Hierarchical forecasting & decision making

Hierarchical (or grouped) forecasting can improve accuracy, but their true strength lies in the reconciliation of the forecasts aligning forecasts is crucial for decision making.

Is the reconciliation achieved useful for decision making?

Cross-sectional Temporal

• Reconcile across different items. • Units may change at different levels of

hierarchy. • Suppose an electricity demand hierarchy:

lower and higher levels have same units. All levels relevant for decision making.

• Suppose a supply chain hierarchy. Weekly sales of SKU are useful. Weekly sales of organisation are not! Needed at different time scale.

• Reconcile across time units/horizons. • Units of items do not change. • Consider our application. NHS admissions

short and long term are useful for decision making.

• Suppose a supply chain hierarchy. Weekly sales of SKU is useful for operations. Yearly sales of a single SKU may be useful, but often not!

• Operational Tactical Strategic forecasts.

41/75

Cross-temporal hierarchies

Temporal hierarchies permit aligning operational, tactical and strategic planning, while offering accuracy gains useful for decision making BUT there can be cases that strategic level forecasts are not required for each item, but at an aggregate level.

Let us consider tourism demand for Australia as an example. Local authorities can make use of detailed forecasts (temporal/spatial) but at a country level weekly forecasts are of limited use. • Temporal: tactical strategic • Cross-sectional: local country

Cross temporal can support decisions at both dimensions: • Tactical/local; • strategic/local; • tactical/country; • strategic/country

42/75

56 (bottom level) quarterly tourism demand series • 6 years in-sample • 3 years out-of-sample

horizon: up to 2 years • rolling origin evaluation

MAPE %

Cross-temporal hierarchical forecasts: • Most accurate • Most complete

reconciliation (one number forecast)

• Flexible decision making support

43/75

Production ready?

• Multiple Aggregation Prediction Algorithm (MAPA) ‒ Kourentzes, N.; Petropoulos, F. & Trapero, J. R. Improving forecasting by estimating

time series structural components across multiple frequencies. International Journal of Forecasting, 2014, 30, 291-302 (Details)

‒ Petropoulos, F. & Kourentzes, N. Improving forecasting via multiple temporal aggregation. Foresight: The International Journal of Applied Forecasting, 2014, 2014, 12-17 (Easier introduction!)

‒ Petropoulos, F & Kourentzes, N. Forecast combinations for intermittent demand. Journal of the Operational Research Society , 2014, 66.6, 914-924. (Intermittent)

‒ Kourentzes, N. & Petropoulos, F. Forecasting with multivariate temporal aggregation: The case of promotional modelling. International Journal of Production Economics, 2015. (Promotional modelling)

‒ R package on CRAN: MAPA (and tsintermittent for slow movers) ‒ All papers, code and examples available at my website

(http://nikolaos.kourentzes.com)

• Temporal Hierarchies ‒ Working paper at my blog! ‒ R package on CRAN: thief

44/75


Agenda



3. Innovations:




45/75

What is the promotional modelling problem?

20 40 60 80 100 1200

1000

2000

3000

40005 - Product: 1080215 | Trend: 0 - Season 13: 0 - Season 52: 1

Dem

and

Promocode

Product:1080215

Adcode: 401

Adcode: 406

Adcode: 408

Adcode: 410

Adcode: 412

20 40 60 80 100 1200

0.1

0.2

0.3

0.4

0.5

Dis

count

%

What is the future demand? Normal demand? With promotion(s)?

Price Discount

SKU demand

46/75

A practical approach

• Use statistics for structured tasks that can be automated • Use human expertise for events

Promotional and advertising activity is one of the main reasons for adjusting statistical forecasts

Statistical Baseline Forecast

Expert

Judgement

Final Forecast

Time series methods: e.g. exponential

smoothing

Special events: e.g. promotions,

calendar events, etc

Demand planning forecast

47/75

Reason for using judgment

% indicating important

or extremely important

Promotional & advertising activity 51.4

Price change 45.5

Holidays 36.0

Insufficient inventories 33.3

Changes in regulations 32.7

Government policy 27.9

Substitute products produced by your company 22.5

Activity by competitors (promotions, advertising etc) 20.7

International crises 18.0

Weather 16.7

Insufficient inventories of competitors 16.2

Sporting events 11.8

Strikes 9.9

Other 36.4

% indicating

importance

Why do companies adjust forecasts?

48/75

Promotional modelling basics

What is the best forecast for this time series?

No additional information?

Use univariate modelling; e.g. exponential smoothing

18-Aug-2007 05-Jan-2008 24-May-2008 11-Oct-2008 28-Feb-2009 18-Jul-2009100

150

200

250

300

350

400

450

500

550

Units


150

200

250

300

350

400

450

500

550

Units

Lets assume a company goes on with this forecast

Take a look at the inventory side of things

(Also, how do you forecast the next promotion?)


150

200

250

300

350

400

450

500

550

Units Safety stock

Out-of-stock: Promotion wasted

Overstocking

erroneously

49/75


What is the best forecast for this time series?

Unstructured information Use univariate modelling + judgemental adjustments

(S&OP meetings = planners + marketers)


150

200

250

300

350

400

450

500

550

Units


150

200

250

300

350

400

450

500

550

Units

Enough stock, but...


150

200

250

300

350

400

450

500

550

Units Safety stock

Wrong!

Promotional modelling requires building formal

statistical models that capture promotions in their

forecasts (systematically elasticities = decision

making) and therefore provide correct safety stocks

50/75


If inventory management does not account for the uncertainty (due to

events/promotions) correctly, then…

51/75

A statistical approach

We can build a forecast using statistical models, enhanced by: • Promotions • Price • Store • Events • Competition • …

10 20 30 40 50 60 70 80 90 100500

1000

1500

2000

2500

3000

3500

y(t) = 6.91 * Promo1(t)0.55 * Promo

2(t)0.69 * Promo

3(t)0.17 * Promo

2(t-1)-0.20 * Promo

3(t-1)-0.24; adj. R2 = 0.970

Period

Sale

s

-4 -2 0 2 40

5

10

15

20

25Histogram of standardised residuals

Standardised residuals

Fre

quency

-200 0 200 400500

1000

1500

2000

2500

3000

3500

Residuals

Actu

als

Scatterplot of residuals vs actuals

Promo_1 Promo_2 Promo_3-0.4

-0.2

0

0.2

0.4

0.6

0.8

1

KeepPromo

3

Positive

interaction

Variable

Coeff

icie

nt-

1

Promotional impact

Actuals

Fit

Prediction

Promo(t)

Promo(t-1)

52/75

A statistical approach

What can go wrong? A lot!

• Effects across SKUs, categories, …

• Cannibalisation, how to identify it?

• Multiple effects multicollinearity (i.e. things happening together and

then it is difficult to identify which causes what)

• Modelling with limited data (or no data!)

• Scalability run the model for multiple time series within reasonable

times

53/75

Promotional forecasting: a case study

Used a subset of 60 SKUs:

• Sales • One-step-ahead system forecasts (SF) • One-step-ahead adjusted or final forecasts (FF) • Promotional information:

1. Price cuts 2. Feature advertising 3. Shelf display 4. Product category (22 categories) 5. Customer (2 main customers) 6. Days promoted in each week

Data withheld for out-of-sample forecast evaluation Some products do not contain promotions in the parameterisation (historical) sample

54/75


Periods highlighted in grey have some type of promotion

55/75


The advanced promotional model contains the following elements:

• Principal component analysis of promotional inputs: The 26 promotional inputs are combined a new composite inputs that are no longer multicollinear → Only a few new inputs are needed now, simplifying the model.

• Pooled regression: When an SKU does not have promotions in each history, we pool together information from other SKUs (the product category information is available to the model) to identify an average promotional effect. When enough promotional history for an SKU is available we do not pool information from other products

• Dynamics of promotions: Carry-over effects of promotions, after they are finished, are captured

• Dynamics of the time series: Demand dynamics are modelled for both promotional and non-promotional periods.

56/75


A series of benchmark models are used to assess the performance of the proposed promotional model

• System Forecast (SF): The baseline forecast of the case study company

• Final Forecast (FF): This is the SF adjusted by human experts in the company

• Naïve: A simple benchmark that assumes no structure in the demand data

• Exponential Smoothing (SES): An established benchmark that has been shown to perform well in supply chain forecasting problems. It cannot capture promotions

• Last Like promotion (LL): This is a SES forecast adjusted by the last observed promotion. When a promotion occurs the forecast is adjusted by the impact of the last observed promotion

Note that only FF and LL can capture promotions from our benchmarks

57/75

Promotional forecasting: a case study Forecast accuracy under promotions

Positive adjustments Negative adjustments

High error

Low error

Number of forecasts

Size of adjustment

DR: Advanced Promotional Model

58/75


59/75

Forecast accuracy under promotions High error

Low error DR is always better under promotions

Human judgment performs

inconsistently

LL has overall poor accuracy

DR is more accurate than human experts and statistical benchmarks in promotional periods


Forecast accuracy in non-promotional periods High error

Low error

DR is models accurately carry-over promotional

effects

Human judgment performs poorly No reason for

positive adjustments?

LL has overall poor accuracy

DR marginally worse that SF

60/75

DR accurately captures carry-over promotional effects. SF is marginally better because the forecasts are based on adjusted historical demand. DR is modelled in raw data.


0.607 0.652 0.762 0.629 0.612 0.601 0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

SF FF Naïve SES LL DR

1.069 1.327 1.092 0.904 1.122 0.868 0

0.2

0.4

0.6

0.8

1

1.2

1.4


0.671 0.745 0.808 0.667 0.682 0.638 0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9


High error

Low error

Overall Promotions No promotions

Be

st

Be

st

Be

st

• Advanced promotional model is consistently the most accurate

• Substantially outperforms current case study company practice (FF)

• Major improvements during promotional periods

+7.8%

+34.6%

+14.4%

0.0% 10.0% 20.0% 30.0% 40.0%

No promotions

Promotions

Overall

Improvement of DR over FF

61/75


1. Advanced statistical promotional model significantly outperform human experts and statistical benchmarks

2. Company can benefit from increased automation of the forecasting process and reduced effort of experts to maintain forecasts, while accuracy is increased

3. Can produce promotional forecasts when there is no or limited promotional history for an SKU

4. Overcomes limitations of previous promotional models

5. Paper available: Trapero, J R., Kourentzes N. and Fildes R. On the identification of sales forecasting models in the presence of promotions. Journal of the Operational Research Society, 2014, 66.2, 299-307. (available at my blog)

62/75

Temporal hierarchies & promotions

Multivariate MAPA allows capturing the effects of external variables at different temporal scales (short vs. long-term effects). However, temporal aggregation may increase further the multicollinearity between variables, making the estimation of ci problematic. Consider the example of binary dummies, which will tend to become similar at high levels of temporal aggregation (e.g. promotional dummies at annual level). • Use principal components to mitigate this problem • Principal components construct composite orthogonal inputs • Useful also when various promotional campaigns happen simultaneously

63/75

Case study: major UK manufacturer of cider drinks. • 12 SKUs of a popular cider brand • Promotions at customer/store level – multiple at the same time • Weekly data: 86 observations for estimation, 18 observations for test • Predict t+4, t+8 and t+12 periods ahead (1, 2 and 3 months). • Rolling origin evaluation.

Case study

64/75

Case study

Bias – sME Method Horizon: 4 Horizon: 8 Horizon: 12 Naive -0.139 -0.194 -0.282 ETS -0.248 -0.288 -0.374 MAPA -0.229 -0.272 -0.410 Regression -0.305 -0.317 -0.482 ETSx -0.198 -0.164 -0.227 MAPAx -0.112 -0.100 -0.213

Accuracy - sMAE Method Horizon: 4 Horizon: 8 Horizon: 12 Naive 0.743 0.818 0.704 ETS 0.708 0.776 0.702 MAPA 0.678 0.758 0.737 Regression 0.611 0.659 0.714 ETSx 0.675 0.666 0.615 MAPAx 0.532 0.545 0.548

t+4 t+8 t+130.5

0.55

0.6

0.65

0.7

0.75

0.8

0.85

0.9

Horizon

sM

AE

Naive

ETS

MAPA

Regression

ETSx

MAPAx

t+4 t+8 t+130.5

0.55

0.6

0.65

0.7

0.75

0.8

0.85

0.9

Horizon

sM

AE

Naive

ETS

MAPA

Regression

ETSx

MAPAx

scaled Mean Absolute Error

Superior bias and error variance safety stock implications

65/75

Agenda



3. Innovations:




66/75

Using consumer search behaviour

• Another way to manage uncertainty is to use additional external information this must be leading sales, so that it is available for the forecasting model to use.

• Nowadays there is abundant information about how people search online, share posts, consume media, etc.

• Can this information help us produce better forecasts?

• Operational forecasting: evidence suggests that there is little gain, if any, given the model complexity.

• New product launches and lifecycle modelling promising!

67/75

Case study: forecasting computer games

02

04

060

80

10

0

Week since Launch

Peak S

cale

Call of Duty 3

−50 −20 1 20 40 60 80 100 130 160 190 220 250 280 310 340 370 400 430

Actual SalesGoogle Trends

02

04

060

80

10

0

Week since Launch

Peak S

ca

le

Wii Sports

−40 −20 1 20 40 60 80 100 120 140 160 180 200 220 240 260 280 300 320 340 360 380 400 420

Actual SalesGoogle Trends

68/75

Different generations of games 0

600000

Sale

s

j=1 j=2 j=3 j=4 j=5 j=6 j=7

Sales for Assassin's Creed

040

80

Peak s

cale

d s

earc

h tra

ffic

j=1 j=2 j=3 j=4 j=5 j=6 j=7

Google Trends

69/75

The idea

• Gamers may search online for a game for multiple reasons.

• Pre-launch: news, pre-views, buy

• After-launch: reviews, updates, walkthroughs, buy, …

• We are interested in information that can be known well in advance, so that it can support meaningful decisions.

• Enhance lifecycle models (Bass) with pre-launch online search information (and from previous generations) increase confidence & accuracy, given minimal data availability.

• Focus on “market size” parameter.

70/75

Modelling process 0

200000

Sale

s

j=1 j=2 j=3 j=4

Sales

040

80

Peak s

cale

d s

earc

h tra

ffic

j=1 j=2 j=3 j=4

Search traffic

0200000

Sale

s

j=1

Sales

040

80

Peak s

cale

d s

earc

h tra

ffic

j=1

Search traffic

0200000

Sale

s

pj = pj−1

qj = qj−1

mj = Search Traffic

Actuals

Forecast

j=1 j=2

Sales

040

80

Peak s

cale

d s

earc

h tra

ffic

t

j=1 j=2

Search traffic

0200000

Sale

s

pj = pj−1

qj = qj−1

mj = Search Traffic

Actuals

Forecast

j=1 j=2 j=3

Sales

040

80

Peak s

cale

d s

earc

h tra

ffic

t

j=1 j=2 j=3

Search traffic

0200000

Sale

s

Actuals

Forecast

j=1 j=2 j=3 j4

Sales

040

80

Peak s

cale

d s

earc

h tra

ffic

t

j=1 j=2 j=3 j=4

Search traffic

71/75

Case study: evaluation

• Use 6 games series (43 games in total)

• Bass model with alternative estimations for market size:

• Naïve (same parameter as last generation)

• Naïve + difference between generations

• Linear trend for the parameter

• AR(1) process for the parameter

• Parameter is a function of google search data

• “Optimal”: estimate model once all data is available.

• Use relative mean absolute errors (denominator is with google search), error <1 means that it is improving.

72/75

Case study: results

Naïve Naïve Diff. Linear Trend

AR(1) Optimal

1.020 1.087 1.086 1.070 0.866

Horizon: 1 week

<1 = better

Naïve Naïve Diff. Linear Trend

AR(1) Optimal

1.007 1.143 1.088 1.033 0.861

Horizon: 6 weeks

<1 = better

In all comparable cases there are gains in using online search information! This is ongoing work…

73/75

Conclusions

• Managing uncertainty is one of the main objectives of business forecasting leads to superior decision making.

• We can do that by: • Better capturing historical structure of sales data • Using additional information (e.g. promotions, discounts, online search)

• Temporal hierarchies joins operational, tactical and strategic decision

making by reconciling forecasts `one-number’ forecast superior decision making.

• Statistical promotional modelling accurate and automatic at SKU level leaves human experts to focus on unstructured information.

• Online search behaviour gains in modelling sales using pre-launch information superior forecasts, but can we shape future demand?

74/75

Thank you for your attention! Questions?

Nikolaos Kourentzes

email: [email protected]

blog: http://nikolaos.kourentzes.com

Published, working papers and code available at my blog!

75/75

mailto:[email protected]


Appendix

Detailed M3 results for temporal hierarchies


BU: Bottom-Up; WLSH: Hierarchy scaling; WLSv: Variance scaling; WLSs: Structural scaling

756 series, forecast t+1 - t+8 quarters ahead

M3 quarterly dataset % error change over base % error change over base

ETS ARIMA


ETS ARIMA

BU: Bottom-Up; WLSH: Hierarchy scaling; WLSv: Variance scaling; WLSs: Structural scaling

1453 series, forecast t+1 - t+18 months ahead

M3 monthly dataset % error change over base % error change over base

Appendix

Calculation details for temporal hierarchies

Temporal Hierarchies - Notation

Non-overlapping temporal aggregation to kth level:

Annual

Semi-annual

Quarterly

Observations at each aggregation level

Temporal Hierarchies - Notation

Collecting the observations from the different levels in a column:

We can define a “summing” matrix S so that:

Lowest level observations

Annual

Semi-annual

Quarter

Example: Monthly

Aggregation levels k are selected so that we do not get fractional seasonalities

Temporal Hierarchies - Forecasting

We can arrange the forecasts from each level in a similar fashion:

The reconciliation model is: Reconciliation errors how much the

forecasts across levels do not agree

Summing matrix

Reconciled forecasts

Unknown conditional means of the future values at lowest level

The reconciliation error has zero mean and covariance matrix


If was known then we can write (GLS estimator):

But in general it is not know, so we need to estimate it. It can be shown that is not identifiable (you need to know the reconciled forecasts, before you reconcile them), however:

Reconciliation errors

Covariance of forecast errors

So our problem becomes:


All we need now is an estimation of W

Sample covariance of

in-sample errors

In principle this is fine, but its sample size is controlled by the number of top-level (annual) observations. For example 104 observations at weekly level, results in just 2 sample points (2 years). So the estimation of is typically weak in practice.


We propose three ways to estimate it, with increasing simplifying assumptions. Using as example quarterly data the approximations are: Hierarchy variance scaling Series variance scaling Structural scaling

Diagonal of covariance matrix less elements to

estimate

Assume within level equal variances. This is

what conventional forecasting does.

Increases sample size.

Assume proportional error variances. No need for estimates can be used when

unknown (e.g. expert forecasts).

Predictive analytics, forecasting and marketing · 16/11/2016 · Market Forecast Target ... -...

Documents

Transcript of Predictive analytics, forecasting and marketing · 16/11/2016 · Market Forecast Target ... -...