1009 Predictive Forecasting Using SAP BPC 10 on SAP HANA With Data Visualization[1]
Predictive analytics, forecasting and marketing · 16/11/2016 · Market Forecast Target ... -...
Transcript of Predictive analytics, forecasting and marketing · 16/11/2016 · Market Forecast Target ... -...
Predictive analytics, forecasting and marketing Advances in promotional modelling and demand forecasting
Nikolaos Kourentzes
Lancaster University
Stockholm School of Economics
16/11/2016
About the speaker
Associate Professor at Dept. Management Science Lancaster University Management School, UK Lancaster Centre for Forecasting
Editorial Board of the International Journal of Forecasting, the leading academic journal in the field of forecasting
Research interests and consulting experience in various fields of forecasting, including:
– Business Forecasting and Demand Planning
– Promotional Modelling
– Supply Chain Forecasting and Bullwip effect
Long experience in applied research projects with industry in various sectors, including: retailing, FMCG manufacturers, pharmaceuticals, media among others. Research blog: http://nikolaos.kourentzes.com
2/75
Agenda
1. Business forecasting and uncertainty
2. Forecasting challenges from a modelling view
3. Innovations:
3a. Temporal Hierarchies
3b. Promotional Modelling
3c. Using online search information
3/75
Forecasting & decision making
Decision making in organisations has at its core an element of forecasting
Accurate forecasts lead to reduced uncertainty better decisions
Forecasts maybe implicit or explicit
Forecasts aims to provide information about the future, conditional on historical and current knowledge Company targets and plans aim to provide direction towards a desirable future.
Present
Target
Forecast
Forecast
Difference between targets and forecasts, at different horizons, provide useful feedback
4/75
Forecasting & decision making
Future`
Market
Forecast Target
• Forecasts match targets – Great! Do they also match
organisational strategy and objectives?
• Forecasts to do match targets
‒ Revise plans/target to reflect probable (forecasted) outcomes.
‒ Act to invalidate forecasts change market to make your target the most probably outcome
• Our forecasts will always be wrong: we took actions since we produced them!
5/75
Forecasting & decision making
Forecasts are central in decisions relating to:
• Inventory management
• Promotional and marketing activities
• Logistics
• Human resource planning
• Purchasing and procurement
• Cash flow management
• Building new production/storage unit
• Entering new markets
• …
Accurate forecasts can support - Decision making - Identifying and capitalising on opportunities - Cost saving
6/75
Forecasting & uncertainty
Statistical forecasts are (often) able to provide a point forecast and a degree of
uncertainty, reflected in the prediction intervals Probability that true value will lie
within bounds.
Additional useful information will lead to tighter prediction intervals
Less uncertainty about the future
7/75
Forecasting & uncertainty
Therefore, it is crucial to identify and capture useful information to introduce in our
forecasts. Such information may be:
• Past univariate information (trend, season, …)
• Past exogenous information (events, price, competition, …)
• Future exogenous information (…)
The nature of this information does not have to be well-structured or even quantitative
(initially). Often it comprises of judgemental forecasts, though targets should be kept
separate.
Companies often have a wealth of available information, but do not use it effectively (or
at all!). Advances in forecasting allow us to take steps in using it reducing uncertainty
about the future taking better decisions.
8/75
Forecasting & uncertainty Let us assume we improved our forecast accuracy by 20%, so what?
Consider two products, one with low variability and one with high
… and some statistical forecasts
Let’s calculate how much stock we should have
9/75
Forecasting & uncertainty
For this example lets us always keep 1 week of stock
Great!
High service level
Do I really need all that stock? I always sell
about 300.
More variability, I probably need all this
stock
Stock = Forecast + Days of Supply
10/75
Forecasting & uncertainty Can we do any better? Given a statistical forecast we can estimate the uncertainty of its predictions this is essentially how accurate are our forecasts!
This uncertainty is invaluable for decision
making.
The point forecast will most probably be wrong, but the prediction intervals tell us
within which values the future values will probably
be.
With this we can calculate the inventory
11/75
Forecasting & uncertainty
The correct safety stock is associated with the forecast error (accuracy) Now the stock levels reflect how accurate are our forecasts and allow us to achieve the desired service level
Stock automatically adjusts to forecasting accuracy/uncertainty
The previous topics focused on increasing accuracy and automation Impact on inventory! Impact on financial
position!
Stock = Forecast + Service Level * Uncertainty
12/75
Agenda
1. Business forecasting and uncertainty
2. Forecasting challenges from a modelling view
3. Innovations:
3a. Temporal Hierarchies
3b. Promotional Modelling
3c. Using online search information
13/75
How do we build & models now?
• This is by no means a resolved question, but there are some reliable approaches
• Take the example of exponential smoothing family:
• Considered one of the most reliable and robust methods for automatic univariate forecasting .
• It is a family of methods: ETS (error type, trend type, seasonality type)
• Error: Additive or Multiplicative
• Trend: None or Additive or Multiplicative, Linear or Damped/Exponential
• Seasonality: None or Additive or Multiplicative
• Adequate for a most types of time series.
• Within the state space framework we can select and fit model parameters automatically and reliably.
14/75
How do we build & models now?
None Additive Multiplicative
None
Additive
Additive Damped
Multiplicative
Multiplicative Damped
Trend
Seasonality
tht
ttt
LF
FaaAL
)1(
11 tsttt LaSAL
stttt SLAS )1()(
ksttkt SLF
11 )1()( tttt TLLT
)( 1
1
t
t
tt T
L
AS
stS )1(
hstt
h
i
i
tht STLF
1
))(1( 11
tt
st
tt TLa
S
AaL
Use likelihood to find smoothing and initial values. Use some information criterion (AICc, BIC, etc) to choose appropriate model per series.
15/75
Any issues with current practice?
Issues with automatic modelling:
• Model selection How good is the best fit model? How reliable?
• Sampling uncertainty Identified model/parameters stable as new data appear?
• Model uncertainty Appropriate model structure and parameters?
• Transparency/Trust Practitioners do not trust systems that change substantially
0 20 40 60 80 100 1200
2000
4000
6000
8000
10000
ETS(A,Ad,M) - AIC: 1262.33
0 20 40 60 80 100 1200
2000
4000
6000
8000
10000
ETS(M,Ad,A) - AIC: 1273.45
Months
16/75
Any issues with current practice?
What can go wrong in parameter and model selection:
• Business time series are often short Limited data
• Estimation of parameters can fail miserably (for monthly data optimise up to 18 parameters, with often no more than 36 observations)
• Model selection can fail as well (30 models over-fitting?)
• Both optimisation and model selection are myopic Focus on data fitting in the past, rather than ‘forecastability’
• Special cases:
10 20 30 40 50 60 7080
100
120
140
160
180
200
220
Sale
s
Month
Demand Fit Forecast
True model: Additive trend, additive seasonality Identified model: No trend, additive seasonality Why? In-sample variance explained mostly by seasonality
17/75
Characteristics of a good solution
• Provide accurate forecasts (i.e. narrow prediction intervals).
• Being consistent: the nature of the forecast should not change, unless there are substantial changes in the market increase trust to users.
• Be relatively transparent increase trust to the users and insights in the market.
• Be automatic: it is common to have to deal with several thousand of time series cannot model in detail each one separately.
• Underlying specification of the forecasting model should match the desired business objective.
18/75
Agenda
1. Business forecasting and uncertainty
2. Forecasting challenges from a modelling view
3. Innovations:
3a. Temporal Hierarchies
3b. Promotional Modelling
3c. Using online search information
19/75
Forecasting model & objective Decisions need to be aligned:
• Operational short-term decisions
• Tactical medium-term decisions
• Strategic long-term decisions
Shorter term plans are bottom-up and based mainly on statistical forecasts & expert
adjustments.
Longer term plans are top-down and based mainly on managerial expertise factoring in
unstructured information and organisational environment.
Given different sources of information (and views) forecasts will differ plans and
decisions not aligned.
Objective: construct a framework to reconcile forecasts of different levels and eventually
align decisions less waste & costs, agility to take advantage of opportunities.
20/75
Long term forecasting
• We know that different forecasting models are better for different forecast horizons
• We also know that it helps to forecast long horizons using aggregate data
Forecasting a quarter ahead using daily data is `adventurous’ (90 steps ahead)
Forecasting a quarter ahead using quarterly data is easier (1 step ahead)
• At different data frequencies different components of the series dominate.
0 20 40 60 80 100 1200
2000
4000
6000
8000
10000
ETS(M,Ad,A) - AIC: 1273.45
Months
0 2 4 6 8 102
3
4
5
6
7
8x 10
4 ETS(A,Ad,N) - AIC: 85.92
Years
These forecasts often do not agree, which one is `correct’?
21/75
A different take on modelling: temporal tricks!
Traditionally we model time series at the frequency that we sampled them or take decisions.
However, a time series can be view in many different ways, adapting the notion of product
hierarchies to temporal hierarchies:
The advantages of temporal hierarchies can be highlighted by examining the data at both time and frequency domains.
22/75
How temporal aggregation changes the series
As an example let us use the following time series:
23/75
How temporal aggregation changes the series
Seasonal diagrams
24/75
The Idea
• Temporal aggregation strengthens and attenuates different elements of the
series:
at an aggregate level trend/cycle is easy to distinguish
at a disaggregate level high frequency elements like seasonality typically
dominate.
• Modelling a time series at a very disaggregate level (e.g. weekly) short-
term forecast. The opposite is true for aggregate levels (e.g. annual)
• Propose Temporal Hierarchies that provide a framework to optimally
combine information from various levels (irrespective of forecasting method)
to:
• reconcile forecasts
• avoid over-reliance on a single planning level
• avoid over-reliance on a single forecasting method/model
25/75
Temporal aggregation and forecasting
• It is not new, but the question has been at which single level to model the
time series. Econometrics have investigate the question for decades
inconclusive
• Supply chain applications: ADIDA beneficial to slow and fast moving items
forecast accuracy (like everything… not always!):
• Step 1: Temporally aggregate time series to the appropriate level
• Step 2: Forecast
• Step 3: Disaggregate forecast and use
• Selection of aggregation level No theoretical grounding for general
case, but good understanding for AR(1)/MA(1)/ARMA(1,1) cases.
26/75
Temporal aggregation and forecasting
Recently there has been a resurgence in using temporal aggregation for
forecasting.
• Non-overlapping temporal aggregation is a moving average filter.
• Filters high frequency components: noise, seasonality, etc.
• Reduces intermittency.
• But can increase complexity:
• Loss of estimation efficiency
• Complicates dynamics of underlying (ARIMA) process
• Identifiable process converge to IMA - often IMA(1,1)
• What is the best temporal aggregation to work on? Traditionally try to
identify a single best level.
27/75
Multiple temporal aggregation
What if we do not select an aggregation level? use multiple
10 20 30 40100
120
140
160
180
200
Period
Dem
and
Aggregation level 1 ETS(A,N,A) ETS(A,M,A)
ETS(A,M,N) ETS(A,A,N)
Issues: • Different model • Different length • Combination
2 4 6 8 10 12 14 16100
120
140
160
180
200
PeriodD
em
and
Aggregation level 3
1 2 3 4 5 6 7100
120
140
160
180
200
Period
Dem
and
Aggregation level 7
1 1.5 2 2.5 3 3.5 4100
120
140
160
180
200
Period
Dem
and
Aggregation level 12
28/75
Multiple temporal aggregation
Forecast combination:
• Forecast combination is widely considered as beneficial for forecast accuracy
• Simple combination methods (average, median) considered robust, relatively accurate to more complex methods
Issue:
• If there are different model types to be combined then the resulting forecast does not fit well at any component!
10 20 30 40100
120
140
160
180
200
Period
Dem
and
10 20 30 40100
120
140
160
180
200
Period
Dem
and
29/75
10 20 30 40 50 600
5000
10000
15000
5 10 15 20 25 300
5000
10000
15000
5 10 15 200
5000
10000
15000
1 2 3 4 5 60
5000
10000
15000
1 2 3 4 5 60
5000
10000
15000
1 2 3 4 50
5000
10000
15000
…
𝑦𝑡[1]
𝑦𝑡[2]
𝑦𝑡[3]
𝑦𝑡[10]
𝑦𝑡[11]
𝑦𝑡[12]
0 10 20 30 40 50 60 700
5000
10000
15000
Fit state space ETS
10 20 30 40 50 60-44.26
-44.24
-44.2210 20 30 40 50 60
2000
4000
6000
10 20 30 40 50 600.5
1
1.5
2
Save states
Level
Trend
Season
Fit state space ETS
0 5 10 15 20 25 30 352000
4000
6000
8000
10000
5 10 15 20 25 301
1.2
1.45 10 15 20 25 30
-96.27
-96.26
-96.255 10 15 20 25 30
2000
4000
6000
Save states
Level
Trend
Season
Aggregate
30/75
Transform states to additive and to original sampling frequency
Combine states (components)
Produce forecasts
31/75
Step 1: Aggregation
.
.
.
.
.
.
1Y
2Y
3Y
KY
2k
3k
Kk
Step 2: Forecasting
ETS
Model Selection
ETS
Model Selection
ETS
Model Selection
ETS
Model Selection
.
.
.
.
.
.
1b 1
s
2b 2
s
3b 3
s
Kb K
s
2l
3l
Kl
1l
Step 3: Combination
+
l
b
s
1Y
1
K
1
K
1
K
Strengthens and attenuates components
Estimation of parameters at multiple levels
Robustness on model selection and parameterisation
Multiple Aggregation Prediction Algorithm (MAPA)
32/75
Multiple Aggregation Prediction Algorithm (MAPA)
33/75
Multiple Aggregation Prediction Algorithm (MAPA)
MAPA was developed to take advantage of temporal aggregation and hierarchies:
• MAPA provides a framework to better identify and estimate the different time series components better forecasts
• On average outperforms ETS, one of the most widely used, robust and accurate univariate forecasting models in supply chain forecasting
• It provides reconciled forecasts across planning levels and forecast horizons
• Robust against model selection and parameterisation issues
• Shown to be useful for fast moving items, promotional modelling and intermittent time series forecasting.
MAPA is available for R, in the MAPA package: http://cran.r-project.org/web/packages/MAPA/index.html Its intermittent demand counterpart is available in the tsintermittent package: http://cran.r-project.org/web/packages/tsintermittent/index.html Examples and interactive demos for both are available at my blog: http://nikolaos.kourentzes.com
34/75
Temporal Hierarchies: A modelling framework
• MAPA demonstrated the strength of the approach, but it is not general:
• How to incorporate forecasts from any model/method?
• How to incorporate judgement?
• We can introduce a general framework for temporal hierarchies that borrows
many elements from cross-sectional hierarchies
• Eventually we will get to cross-temporal hierarchies, also touted as the ‘one-
number’ forecast, i.e. a reconciled forecast across planning horizons and
product/customer/location groups.
35/75
Cross-sectional and Temporal Hierarchies
• We know how to do cross-sectional hierarchies
• Top-down, bottom-up, middle-out
• Optimal combinations
Total
UK Spain
Product A Product B Product A Product B
We know how to do this!
… then we know how to do this as well, with some small-print!
but we have to correct for the different scales at each aggregation level, which is not that difficult due to the imposed temporal structure. For that we need to calculate the
covariance matrix between the forecast errors at different aggregation levels. There are easy and not so easy ways to do this.
36/75
Some evidence that it actually works!
Comparison with other M3 results (symmetric Mean Absolute Percentage Error):
• Monthly dataset ‒ Temporal (ETS based): 13.61% ‒ ETS: 14.45% [Hyndman et al., 2002]
‒ MAPA: 13.69% [Kourentzes et al., 2014]
‒ Theta: 13.85% (best original performance) [Makridakis & Hibon, 2000]
• Quarterly dataset
‒ Temporal (ETS based): 9.70% ‒ ETS: 9.94% [Hyndman et al., 2002]
‒ MAPA: 9.58% [Kourentzes et al., 2014]
‒ Theta: 8.96% (best original performance) [Makridakis & Hibon, 2000]
Detailed results available if you are interested at the end of the presentation!
37/75
Application: Predicting A&E admissions
Collect weekly data for UK A&E wards. 13 time series: covering different types of emergencies and different severities (measured as time to treatment) Span from week 45 2010 (7th Nov 2010) to week 24 2015 (7th June 2015) Series are at England level (not local authorities).
Accurately predict to support staffing and training decisions. Aligning the short and long term forecasts is important for consistency of planning and budgeting. Test set: 52 weeks. Rolling origin evaluation. Forecast horizons of interest: t+1, t+4, t+52 (1 week, 1 month, 1 year). Evaluation MASE (relative to base model) As a base model auto.arima (forecast package R) is used.
38/75
Application: Predicting A&E admissions Total Emergency Admissions via A&E
Red is the prediction of the base model (ARIMA) Blue is the temporal hierarchy reconciled forecasts (based on ARIMA)
Observe how information is `borrowed’ between temporal levels. Base models for instance provide very poor weekly and annual forecasts
39/75
Application: Predicting A&E admissions
• Accuracy gains at all planning horizons
• Crucially, forecasts are reconciled leading to aligned plans
40/75
Hierarchical forecasting & decision making
Hierarchical (or grouped) forecasting can improve accuracy, but their true strength lies in the reconciliation of the forecasts aligning forecasts is crucial for decision making.
Is the reconciliation achieved useful for decision making?
Cross-sectional Temporal
• Reconcile across different items. • Units may change at different levels of
hierarchy. • Suppose an electricity demand hierarchy:
lower and higher levels have same units. All levels relevant for decision making.
• Suppose a supply chain hierarchy. Weekly sales of SKU are useful. Weekly sales of organisation are not! Needed at different time scale.
• Reconcile across time units/horizons. • Units of items do not change. • Consider our application. NHS admissions
short and long term are useful for decision making.
• Suppose a supply chain hierarchy. Weekly sales of SKU is useful for operations. Yearly sales of a single SKU may be useful, but often not!
• Operational Tactical Strategic forecasts.
41/75
Cross-temporal hierarchies
Temporal hierarchies permit aligning operational, tactical and strategic planning, while offering accuracy gains useful for decision making BUT there can be cases that strategic level forecasts are not required for each item, but at an aggregate level.
Let us consider tourism demand for Australia as an example. Local authorities can make use of detailed forecasts (temporal/spatial) but at a country level weekly forecasts are of limited use. • Temporal: tactical strategic • Cross-sectional: local country
Cross temporal can support decisions at both dimensions: • Tactical/local; • strategic/local; • tactical/country; • strategic/country
42/75
56 (bottom level) quarterly tourism demand series • 6 years in-sample • 3 years out-of-sample
horizon: up to 2 years • rolling origin evaluation
MAPE %
Cross-temporal hierarchical forecasts: • Most accurate • Most complete
reconciliation (one number forecast)
• Flexible decision making support
43/75
Production ready?
• Multiple Aggregation Prediction Algorithm (MAPA) ‒ Kourentzes, N.; Petropoulos, F. & Trapero, J. R. Improving forecasting by estimating
time series structural components across multiple frequencies. International Journal of Forecasting, 2014, 30, 291-302 (Details)
‒ Petropoulos, F. & Kourentzes, N. Improving forecasting via multiple temporal aggregation. Foresight: The International Journal of Applied Forecasting, 2014, 2014, 12-17 (Easier introduction!)
‒ Petropoulos, F & Kourentzes, N. Forecast combinations for intermittent demand. Journal of the Operational Research Society , 2014, 66.6, 914-924. (Intermittent)
‒ Kourentzes, N. & Petropoulos, F. Forecasting with multivariate temporal aggregation: The case of promotional modelling. International Journal of Production Economics, 2015. (Promotional modelling)
‒ R package on CRAN: MAPA (and tsintermittent for slow movers) ‒ All papers, code and examples available at my website
(http://nikolaos.kourentzes.com)
• Temporal Hierarchies ‒ Working paper at my blog! ‒ R package on CRAN: thief
44/75
Agenda
1. Business forecasting and uncertainty
2. Forecasting challenges from a modelling view
3. Innovations:
3a. Temporal Hierarchies
3b. Promotional Modelling
3c. Using online search information
45/75
What is the promotional modelling problem?
20 40 60 80 100 1200
1000
2000
3000
40005 - Product: 1080215 | Trend: 0 - Season 13: 0 - Season 52: 1
Dem
and
Promocode
Product:1080215
Adcode: 401
Adcode: 406
Adcode: 408
Adcode: 410
Adcode: 412
20 40 60 80 100 1200
0.1
0.2
0.3
0.4
0.5
Dis
count
%
What is the future demand? Normal demand? With promotion(s)?
Price Discount
SKU demand
46/75
A practical approach
• Use statistics for structured tasks that can be automated • Use human expertise for events
Promotional and advertising activity is one of the main reasons for adjusting statistical forecasts
Statistical Baseline Forecast
Expert
Judgement
Final Forecast
Time series methods: e.g. exponential
smoothing
Special events: e.g. promotions,
calendar events, etc
Demand planning forecast
47/75
Reason for using judgment
% indicating important
or extremely important
Promotional & advertising activity 51.4
Price change 45.5
Holidays 36.0
Insufficient inventories 33.3
Changes in regulations 32.7
Government policy 27.9
Substitute products produced by your company 22.5
Activity by competitors (promotions, advertising etc) 20.7
International crises 18.0
Weather 16.7
Insufficient inventories of competitors 16.2
Sporting events 11.8
Strikes 9.9
Other 36.4
% indicating
importance
Why do companies adjust forecasts?
48/75
Promotional modelling basics
What is the best forecast for this time series?
No additional information?
Use univariate modelling; e.g. exponential smoothing
18-Aug-2007 05-Jan-2008 24-May-2008 11-Oct-2008 28-Feb-2009 18-Jul-2009100
150
200
250
300
350
400
450
500
550
Units
18-Aug-2007 05-Jan-2008 24-May-2008 11-Oct-2008 28-Feb-2009 18-Jul-2009100
150
200
250
300
350
400
450
500
550
Units
Lets assume a company goes on with this forecast
Take a look at the inventory side of things
(Also, how do you forecast the next promotion?)
18-Aug-2007 05-Jan-2008 24-May-2008 11-Oct-2008 28-Feb-2009 18-Jul-2009100
150
200
250
300
350
400
450
500
550
Units Safety stock
Out-of-stock: Promotion wasted
Overstocking
erroneously
49/75
Promotional modelling basics
What is the best forecast for this time series?
Unstructured information Use univariate modelling + judgemental adjustments
(S&OP meetings = planners + marketers)
18-Aug-2007 05-Jan-2008 24-May-2008 11-Oct-2008 28-Feb-2009 18-Jul-2009100
150
200
250
300
350
400
450
500
550
Units
18-Aug-2007 05-Jan-2008 24-May-2008 11-Oct-2008 28-Feb-2009 18-Jul-2009100
150
200
250
300
350
400
450
500
550
Units
Enough stock, but...
18-Aug-2007 05-Jan-2008 24-May-2008 11-Oct-2008 28-Feb-2009 18-Jul-2009100
150
200
250
300
350
400
450
500
550
Units Safety stock
Wrong!
Promotional modelling requires building formal
statistical models that capture promotions in their
forecasts (systematically elasticities = decision
making) and therefore provide correct safety stocks
50/75
Promotional modelling basics
If inventory management does not account for the uncertainty (due to
events/promotions) correctly, then…
51/75
A statistical approach
We can build a forecast using statistical models, enhanced by: • Promotions • Price • Store • Events • Competition • …
10 20 30 40 50 60 70 80 90 100500
1000
1500
2000
2500
3000
3500
y(t) = 6.91 * Promo1(t)0.55 * Promo
2(t)0.69 * Promo
3(t)0.17 * Promo
2(t-1)-0.20 * Promo
3(t-1)-0.24; adj. R2 = 0.970
Period
Sale
s
-4 -2 0 2 40
5
10
15
20
25Histogram of standardised residuals
Standardised residuals
Fre
quency
-200 0 200 400500
1000
1500
2000
2500
3000
3500
Residuals
Actu
als
Scatterplot of residuals vs actuals
Promo_1 Promo_2 Promo_3-0.4
-0.2
0
0.2
0.4
0.6
0.8
1
KeepPromo
3
Positive
interaction
Variable
Coeff
icie
nt-
1
Promotional impact
Actuals
Fit
Prediction
Promo(t)
Promo(t-1)
52/75
A statistical approach
What can go wrong? A lot!
• Effects across SKUs, categories, …
• Cannibalisation, how to identify it?
• Multiple effects multicollinearity (i.e. things happening together and
then it is difficult to identify which causes what)
• Modelling with limited data (or no data!)
• Scalability run the model for multiple time series within reasonable
times
53/75
Promotional forecasting: a case study
Used a subset of 60 SKUs:
• Sales • One-step-ahead system forecasts (SF) • One-step-ahead adjusted or final forecasts (FF) • Promotional information:
1. Price cuts 2. Feature advertising 3. Shelf display 4. Product category (22 categories) 5. Customer (2 main customers) 6. Days promoted in each week
Data withheld for out-of-sample forecast evaluation Some products do not contain promotions in the parameterisation (historical) sample
54/75
Promotional forecasting: a case study
Periods highlighted in grey have some type of promotion
55/75
Promotional forecasting: a case study
The advanced promotional model contains the following elements:
• Principal component analysis of promotional inputs: The 26 promotional inputs are combined a new composite inputs that are no longer multicollinear → Only a few new inputs are needed now, simplifying the model.
• Pooled regression: When an SKU does not have promotions in each history, we pool together information from other SKUs (the product category information is available to the model) to identify an average promotional effect. When enough promotional history for an SKU is available we do not pool information from other products
• Dynamics of promotions: Carry-over effects of promotions, after they are finished, are captured
• Dynamics of the time series: Demand dynamics are modelled for both promotional and non-promotional periods.
56/75
Promotional forecasting: a case study
A series of benchmark models are used to assess the performance of the proposed promotional model
• System Forecast (SF): The baseline forecast of the case study company
• Final Forecast (FF): This is the SF adjusted by human experts in the company
• Naïve: A simple benchmark that assumes no structure in the demand data
• Exponential Smoothing (SES): An established benchmark that has been shown to perform well in supply chain forecasting problems. It cannot capture promotions
• Last Like promotion (LL): This is a SES forecast adjusted by the last observed promotion. When a promotion occurs the forecast is adjusted by the impact of the last observed promotion
Note that only FF and LL can capture promotions from our benchmarks
57/75
Promotional forecasting: a case study Forecast accuracy under promotions
Positive adjustments Negative adjustments
High error
Low error
Number of forecasts
Size of adjustment
DR: Advanced Promotional Model
58/75
Promotional forecasting: a case study
59/75
Forecast accuracy under promotions High error
Low error DR is always better under promotions
Human judgment performs
inconsistently
LL has overall poor accuracy
DR is more accurate than human experts and statistical benchmarks in promotional periods
Promotional forecasting: a case study
Forecast accuracy in non-promotional periods High error
Low error
DR is models accurately carry-over promotional
effects
Human judgment performs poorly No reason for
positive adjustments?
LL has overall poor accuracy
DR marginally worse that SF
60/75
DR accurately captures carry-over promotional effects. SF is marginally better because the forecasts are based on adjusted historical demand. DR is modelled in raw data.
Promotional forecasting: a case study
0.607 0.652 0.762 0.629 0.612 0.601 0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
SF FF Naïve SES LL DR
1.069 1.327 1.092 0.904 1.122 0.868 0
0.2
0.4
0.6
0.8
1
1.2
1.4
SF FF Naïve SES LL DR
0.671 0.745 0.808 0.667 0.682 0.638 0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
SF FF Naïve SES LL DR
High error
Low error
Overall Promotions No promotions
Be
st
Be
st
Be
st
• Advanced promotional model is consistently the most accurate
• Substantially outperforms current case study company practice (FF)
• Major improvements during promotional periods
+7.8%
+34.6%
+14.4%
0.0% 10.0% 20.0% 30.0% 40.0%
No promotions
Promotions
Overall
Improvement of DR over FF
61/75
Promotional forecasting: a case study
1. Advanced statistical promotional model significantly outperform human experts and statistical benchmarks
2. Company can benefit from increased automation of the forecasting process and reduced effort of experts to maintain forecasts, while accuracy is increased
3. Can produce promotional forecasts when there is no or limited promotional history for an SKU
4. Overcomes limitations of previous promotional models
5. Paper available: Trapero, J R., Kourentzes N. and Fildes R. On the identification of sales forecasting models in the presence of promotions. Journal of the Operational Research Society, 2014, 66.2, 299-307. (available at my blog)
62/75
Temporal hierarchies & promotions
Multivariate MAPA allows capturing the effects of external variables at different temporal scales (short vs. long-term effects). However, temporal aggregation may increase further the multicollinearity between variables, making the estimation of ci problematic. Consider the example of binary dummies, which will tend to become similar at high levels of temporal aggregation (e.g. promotional dummies at annual level). • Use principal components to mitigate this problem • Principal components construct composite orthogonal inputs • Useful also when various promotional campaigns happen simultaneously
63/75
Case study: major UK manufacturer of cider drinks. • 12 SKUs of a popular cider brand • Promotions at customer/store level – multiple at the same time • Weekly data: 86 observations for estimation, 18 observations for test • Predict t+4, t+8 and t+12 periods ahead (1, 2 and 3 months). • Rolling origin evaluation.
Case study
64/75
Case study
Bias – sME Method Horizon: 4 Horizon: 8 Horizon: 12 Naive -0.139 -0.194 -0.282 ETS -0.248 -0.288 -0.374 MAPA -0.229 -0.272 -0.410 Regression -0.305 -0.317 -0.482 ETSx -0.198 -0.164 -0.227 MAPAx -0.112 -0.100 -0.213
Accuracy - sMAE Method Horizon: 4 Horizon: 8 Horizon: 12 Naive 0.743 0.818 0.704 ETS 0.708 0.776 0.702 MAPA 0.678 0.758 0.737 Regression 0.611 0.659 0.714 ETSx 0.675 0.666 0.615 MAPAx 0.532 0.545 0.548
t+4 t+8 t+130.5
0.55
0.6
0.65
0.7
0.75
0.8
0.85
0.9
Horizon
sM
AE
Naive
ETS
MAPA
Regression
ETSx
MAPAx
t+4 t+8 t+130.5
0.55
0.6
0.65
0.7
0.75
0.8
0.85
0.9
Horizon
sM
AE
Naive
ETS
MAPA
Regression
ETSx
MAPAx
scaled Mean Absolute Error
Superior bias and error variance safety stock implications
65/75
Agenda
1. Business forecasting and uncertainty
2. Forecasting challenges from a modelling view
3. Innovations:
3a. Temporal Hierarchies
3b. Promotional Modelling
3c. Using online search information
66/75
Using consumer search behaviour
• Another way to manage uncertainty is to use additional external information this must be leading sales, so that it is available for the forecasting model to use.
• Nowadays there is abundant information about how people search online, share posts, consume media, etc.
• Can this information help us produce better forecasts?
• Operational forecasting: evidence suggests that there is little gain, if any, given the model complexity.
• New product launches and lifecycle modelling promising!
67/75
Case study: forecasting computer games
02
04
060
80
10
0
Week since Launch
Peak S
cale
Call of Duty 3
−50 −20 1 20 40 60 80 100 130 160 190 220 250 280 310 340 370 400 430
Actual SalesGoogle Trends
02
04
060
80
10
0
Week since Launch
Peak S
ca
le
Wii Sports
−40 −20 1 20 40 60 80 100 120 140 160 180 200 220 240 260 280 300 320 340 360 380 400 420
Actual SalesGoogle Trends
68/75
Different generations of games 0
600000
Sale
s
j=1 j=2 j=3 j=4 j=5 j=6 j=7
Sales for Assassin's Creed
040
80
Peak s
cale
d s
earc
h tra
ffic
j=1 j=2 j=3 j=4 j=5 j=6 j=7
Google Trends
69/75
The idea
• Gamers may search online for a game for multiple reasons.
• Pre-launch: news, pre-views, buy
• After-launch: reviews, updates, walkthroughs, buy, …
• We are interested in information that can be known well in advance, so that it can support meaningful decisions.
• Enhance lifecycle models (Bass) with pre-launch online search information (and from previous generations) increase confidence & accuracy, given minimal data availability.
• Focus on “market size” parameter.
70/75
Modelling process 0
200000
Sale
s
j=1 j=2 j=3 j=4
Sales
040
80
Peak s
cale
d s
earc
h tra
ffic
j=1 j=2 j=3 j=4
Search traffic
0200000
Sale
s
j=1
Sales
040
80
Peak s
cale
d s
earc
h tra
ffic
j=1
Search traffic
0200000
Sale
s
pj = pj−1
qj = qj−1
mj = Search Traffic
Actuals
Forecast
j=1 j=2
Sales
040
80
Peak s
cale
d s
earc
h tra
ffic
t
j=1 j=2
Search traffic
0200000
Sale
s
pj = pj−1
qj = qj−1
mj = Search Traffic
Actuals
Forecast
j=1 j=2 j=3
Sales
040
80
Peak s
cale
d s
earc
h tra
ffic
t
j=1 j=2 j=3
Search traffic
0200000
Sale
s
Actuals
Forecast
j=1 j=2 j=3 j4
Sales
040
80
Peak s
cale
d s
earc
h tra
ffic
t
j=1 j=2 j=3 j=4
Search traffic
71/75
Case study: evaluation
• Use 6 games series (43 games in total)
• Bass model with alternative estimations for market size:
• Naïve (same parameter as last generation)
• Naïve + difference between generations
• Linear trend for the parameter
• AR(1) process for the parameter
• Parameter is a function of google search data
• “Optimal”: estimate model once all data is available.
• Use relative mean absolute errors (denominator is with google search), error <1 means that it is improving.
72/75
Case study: results
Naïve Naïve Diff. Linear Trend
AR(1) Optimal
1.020 1.087 1.086 1.070 0.866
Horizon: 1 week
<1 = better
Naïve Naïve Diff. Linear Trend
AR(1) Optimal
1.007 1.143 1.088 1.033 0.861
Horizon: 6 weeks
<1 = better
In all comparable cases there are gains in using online search information! This is ongoing work…
73/75
Conclusions
• Managing uncertainty is one of the main objectives of business forecasting leads to superior decision making.
• We can do that by: • Better capturing historical structure of sales data • Using additional information (e.g. promotions, discounts, online search)
• Temporal hierarchies joins operational, tactical and strategic decision
making by reconciling forecasts `one-number’ forecast superior decision making.
• Statistical promotional modelling accurate and automatic at SKU level leaves human experts to focus on unstructured information.
• Online search behaviour gains in modelling sales using pre-launch information superior forecasts, but can we shape future demand?
74/75
Thank you for your attention! Questions?
Nikolaos Kourentzes
email: [email protected]
blog: http://nikolaos.kourentzes.com
Published, working papers and code available at my blog!
75/75
Appendix
Detailed M3 results for temporal hierarchies
Some evidence that it actually works!
BU: Bottom-Up; WLSH: Hierarchy scaling; WLSv: Variance scaling; WLSs: Structural scaling
756 series, forecast t+1 - t+8 quarters ahead
M3 quarterly dataset % error change over base % error change over base
ETS ARIMA
Some evidence that it actually works!
ETS ARIMA
BU: Bottom-Up; WLSH: Hierarchy scaling; WLSv: Variance scaling; WLSs: Structural scaling
1453 series, forecast t+1 - t+18 months ahead
M3 monthly dataset % error change over base % error change over base
Appendix
Calculation details for temporal hierarchies
Temporal Hierarchies - Notation
Non-overlapping temporal aggregation to kth level:
Annual
Semi-annual
Quarterly
Observations at each aggregation level
Temporal Hierarchies - Notation
Collecting the observations from the different levels in a column:
We can define a “summing” matrix S so that:
Lowest level observations
Annual
Semi-annual
Quarter
Example: Monthly
Aggregation levels k are selected so that we do not get fractional seasonalities
Temporal Hierarchies - Forecasting
We can arrange the forecasts from each level in a similar fashion:
The reconciliation model is: Reconciliation errors how much the
forecasts across levels do not agree
Summing matrix
Reconciled forecasts
Unknown conditional means of the future values at lowest level
The reconciliation error has zero mean and covariance matrix
Temporal Hierarchies - Forecasting
If was known then we can write (GLS estimator):
But in general it is not know, so we need to estimate it. It can be shown that is not identifiable (you need to know the reconciled forecasts, before you reconcile them), however:
Reconciliation errors
Covariance of forecast errors
So our problem becomes:
Temporal Hierarchies - Forecasting
All we need now is an estimation of W
Sample covariance of
in-sample errors
In principle this is fine, but its sample size is controlled by the number of top-level (annual) observations. For example 104 observations at weekly level, results in just 2 sample points (2 years). So the estimation of is typically weak in practice.
Temporal Hierarchies - Forecasting
We propose three ways to estimate it, with increasing simplifying assumptions. Using as example quarterly data the approximations are: Hierarchy variance scaling Series variance scaling Structural scaling
Diagonal of covariance matrix less elements to
estimate
Assume within level equal variances. This is
what conventional forecasting does.
Increases sample size.
Assume proportional error variances. No need for estimates can be used when
unknown (e.g. expert forecasts).