Submitted to Management Science manuscript
High Definition Price elasticity of demand for current Consumer Packaged Goods and Retail challenges: a
new hyperparameter optimization strategy for dynamic linear models
V. Sanchez-Gil, D. Pizarroso
Data Science at Wise Athena, [email protected], wiseathena.com
R. A. Queralt, J. M. Lopez Zafra
CUNEF, [email protected]; [email protected]
A new strategy has been developed for finding the best input variances of the state transitions for
any dynamic linear model based on its predictive capability. This new approach, here called the Minimum
Predictive Error (MinPE) criterion, has been applied to measuring price elasticity of demand in order to
evaluate the sensitivity of consumers to daily price changes. The method has been compared to static linear
models using real data from consumer packaged goods companies. The results show that subtle oscillations
in price elasticity of demand are essential for correctly modeling the observed sales. Compared with the
input variances provided by the Maximum Likelihood Estimation (MLE) method, those estimated with the
proposed criterion not only increase the predictive capability of the model but also uncover different and
consistent solutions that MLE rejects. Moreover, subtle daily deviations in the price elasticity of
demand could be unraveled only with the MinPE criterion.
Key words : price-demand elasticity, Kalman filter, dynamic linear model, DLM, predictive elasticity, CPG
companies, Maximum Likelihood Estimation, MLE
1. Introduction
Since the 19th century, price elasticity of demand has been used as a main indicator for
describing the behavior of products in the market. It is a valuable metric to evaluate
product sensitivity to price changes, and is essential in pricing and trade promotions for
companies all over the world (Heerde et al. (2013), Kocabiyikoglu and Popescu (2011)).
This is reflected in the several manuscripts recently published about price elasticity of
demand for food (Andreyeva et al. (2010)), water (Espey et al. (1997)), gasoline (Hughes
et al. (2006)), housing (Green et al. (2005)) and online products (Granados et al. (2011)).
Author: New hyperparameter optimization strategy for dynamic linear models2 Article submitted to Management Science; manuscript no.
Equally relevant is demand forecasting, which has a major impact on stock and production
planning, inventory management and new product launches in fields as diverse as
energy (Suganthi and Samuel (2012)), tourism (Law (2000)), retail (Ma et al. (2016))
and grocery stores (Ali et al. (2009)). Nowadays, decision-making processes in Consumer
Packaged Goods (CPG) and Retail companies strongly depend on accurate predictive models for
demand and price elasticity.
Even though studies of elasticity dynamics have provided valuable evidence of its variation
over time and of its importance for pricing products optimally (Simon (1979), Simon (1989),
Liu and Hanssens (1981), Fibich et al. (2005)), the literature offers no specific computational
method for estimating these time variations. The present study aims to fill this gap.
Among the many models available, Dynamic Linear Models (DLM) with time-varying
parameters (Young (2011)) prove extremely useful for evaluating underlying unobserved
variables such as the price elasticity of demand and its variations over time. In combination
with the Kalman Filter (KF), dynamic linear models provide insightful solutions in many
different business problems (Beravs et al. (2012), Vaquero et al. (2017), Li et al. (2004),
Fei et al. (2011)). Since DLMs are very sensitive to their input variances, Maximum
Likelihood Estimation (MLE) has been widely used for decades to find the best
possible combination of variances (Myung (2003), McCulloch (1997)).
In this article, we propose a new hyperparameter optimization strategy for dynamic
linear models, called MinPE, as an alternative to the classic MLE. The performance of MinPE
in predicting demand and price elasticity has been compared with that of a dynamic linear
model with MLE and of a static linear model, based on real prices and
quantities demanded for 83 products.
2. Method description
2.1. Demand modeling
The demand of a product is affected by its price, seasonal effects, marketing campaigns,
cannibalization and competitor effects coming from other products, among other factors (Kapoor
and Ravi (2016)). Since the aim of this study is to estimate fluctuations in the price elasticity
of demand, we assume that the daily demand of a product in the market at time t,
defined as qt, is mainly affected by its price pt and its weekly seasonality. It can be linearly
modeled as:
$$\log q_t = \beta_t \log p_t + \gamma_{1,t} d_{1,t} + \gamma_{2,t} d_{2,t} + \gamma_{3,t} d_{3,t} + \gamma_{4,t} d_{4,t} + \gamma_{5,t} d_{5,t} + \gamma_{6,t} d_{6,t} + \alpha_t \tag{1}$$
where βt is the price elasticity of demand, d1,t, d2,t . . . d6,t are binary variables for the days
from Monday to Saturday and αt is an additive value for a given day t. A Sunday variable is not
required as it is collinear with the remaining days of the week. Long-term seasonality effects (monthly,
yearly, and so on), cannibalization and competitor effects could be added to the model. It is
important to note that this demand model can be generalized to any aggregation level,
from daily sales of a product in a store to monthly sales of one category of goods using a
weighted monthly price (Hoch et al. (1995)).
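As a minimal sketch of how eq. (1) turns a price and a weekday into a demand value, the snippet below evaluates its right-hand side for a single day. The function name `log_demand` and all numerical values are illustrative assumptions, not estimates from this study.

```python
import math

def log_demand(price, weekday, beta, gammas, alpha):
    """Right-hand side of eq. (1) for one day.

    weekday: 0 = Monday ... 6 = Sunday (Sunday is the baseline, no dummy).
    gammas:  six weekly-seasonality coefficients, Monday to Saturday.
    """
    dummies = [1.0 if weekday == i else 0.0 for i in range(6)]
    seasonal = sum(g * d for g, d in zip(gammas, dummies))
    return beta * math.log(price) + seasonal + alpha

# Illustrative values only (hypothetical, not taken from the paper).
beta, alpha = -1.5, 9.0
gammas = [0.00, 0.05, 0.05, 0.10, 0.20, 0.30]

q_full = math.exp(log_demand(2.00, 5, beta, gammas, alpha))   # Saturday, full price
q_promo = math.exp(log_demand(1.80, 5, beta, gammas, alpha))  # Saturday, 10% discount
uplift = q_promo / q_full - 1.0   # equals (1.80 / 2.00) ** beta - 1
```

Because the model is log-linear, the relative uplift of a discount depends only on the price ratio and on βt: the seasonal and additive terms cancel out.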
2.2. Static linear model (SLM)
As a first approximation, we use a static linear model where all time-dependent linear
parameters from eq. [1], βt, γ1,t. . . γ6,t and αt, are constant over time. Therefore, a constant
descriptive value for price elasticity of demand over a range of dates can be easily obtained
with a multiple linear regression from real historical data. Due to its simplicity, the SLM
approach is one of the most common procedures followed by companies for estimating price
elasticities (Shy (2008)). Nevertheless, this approach is unstable and very sensitive to local
effects when it is fitted over short periods of time.
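The static fit described above can be sketched as an ordinary least-squares regression of log qt on log pt, six weekday dummies and an intercept. This is only an illustration in plain Python: the helper names `solve` and `fit_static` and the synthetic data are assumptions, and the study itself reports using R.

```python
import math

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x

def fit_static(prices, quantities, weekdays):
    """OLS estimate of [beta, gamma_1..gamma_6, alpha] with constant coefficients."""
    rows = [[math.log(p)] + [1.0 if wd == i else 0.0 for i in range(6)] + [1.0]
            for p, wd in zip(prices, weekdays)]
    y = [math.log(q) for q in quantities]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)   # normal equations (X'X) b = X'y

# Synthetic, noise-free data generated with beta = -2 and alpha = 8.
weekdays = [t % 7 for t in range(28)]
prices = [2.0 + 0.1 * (t % 5) for t in range(28)]
quantities = [math.exp(-2.0 * math.log(p) + 8.0) for p in prices]
coef = fit_static(prices, quantities, weekdays)
beta_hat, alpha_hat = coef[0], coef[-1]
```

On noise-free data the regression recovers the generating coefficients up to rounding; on short real windows the same estimate can swing widely, which is the instability mentioned above.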
2.3. Dynamic linear model with Kalman Filter
The second approach considered in this analysis is a dynamic linear model (DLM)
where all linear parameters in eq. [1] are time-dependent. In contrast to SLM, DLM provides
time series for βt, γ1,t . . . γ6,t and αt that are both continuous and smooth.
For this study, we propose a State-Space Model (SSM) with an integrated random walk
(Young (2011)) for the price elasticity βt, whose time variations come from its derivative
δt, and elementary random walks for the weekly seasonality γ1,t . . . γ6,t and the additive value αt
(see appendix A). The state vector θt, which contains statistical noise, changes randomly over
time according to the state transition equations. Every time a new observation is made, the
Kalman filter (KF) provides an educated guess of the current value of the state vector based on
joint probability distributions (Kalman (1960), Grewal (2011)). The observation
and transition equations are mathematically described as:
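$$\log q_t = F_t \theta_t + \varepsilon_t, \qquad \varepsilon_t \sim N(0, \sigma_q^2) \tag{2}$$

$$\theta_t = G_t \theta_{t-1} + \omega_t, \qquad \omega_t \sim N(0, W_t) \tag{3}$$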
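where the 9-dimensional state vector θt and the matrices Ft, Gt and Wt are defined as:

$$\theta_t = \begin{bmatrix} \beta_t \\ \delta_t \\ \gamma_{1,t} \\ \vdots \\ \gamma_{6,t} \\ \alpha_t \end{bmatrix}, \qquad
F_t = \begin{bmatrix} \log p_t & 0 & d_{1,t} & d_{2,t} & d_{3,t} & d_{4,t} & d_{5,t} & d_{6,t} & 1 \end{bmatrix} \tag{4}$$

$$G_t = \begin{bmatrix}
1 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1
\end{bmatrix}, \qquad
W_t = \begin{bmatrix}
\sigma_\beta^2 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & \sigma_\delta^2 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & \sigma_\gamma^2 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & \sigma_\gamma^2 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & \sigma_\gamma^2 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & \sigma_\gamma^2 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & \sigma_\gamma^2 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & \sigma_\gamma^2 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & \sigma_\alpha^2
\end{bmatrix} \tag{5}$$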
As can be seen in the Wt matrix, the six weekly seasonality terms are assumed to share
the same transition variance, $\sigma_\gamma^2 = \sigma_{\gamma,1}^2 = \dots = \sigma_{\gamma,6}^2$.
Since $\sigma_\beta^2$ is fixed at zero (see appendix A), the three variances of the state
transitions to be optimized are $\sigma_\delta^2$, $\sigma_\gamma^2$ and $\sigma_\alpha^2$.
More detailed information about the selected SSM can be found in appendix A.
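To make the state-space formulation concrete, the following sketch builds Gt and Wt as in eqs. (4)-(5) and performs one predict-and-update step of a textbook Kalman filter for a scalar observation. It is not the implementation used in this study (which relies on the R library dlm); the function names, the initial state and the observation noise `var_obs` are assumptions chosen for illustration.

```python
import math

def mat_vec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def mat_mul(a, b):
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]

def transpose(m):
    return [list(col) for col in zip(*m)]

def build_g_w(var_delta, var_gamma, var_alpha, var_beta=0.0):
    """G and W of eqs. (4)-(5) for the state [beta, delta, gamma_1..6, alpha]."""
    n = 9
    g = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    g[0][1] = 1.0                      # beta_t = beta_{t-1} + delta_{t-1}
    diag = [var_beta, var_delta] + [var_gamma] * 6 + [var_alpha]
    w = [[diag[i] if i == j else 0.0 for j in range(n)] for i in range(n)]
    return g, w

def kalman_step(m, c, f_row, y, g, w, var_obs):
    """One predict + update step for a scalar observation y = f_row . theta + noise."""
    n = len(m)
    a = mat_vec(g, m)                                      # prior mean G m
    r = mat_mul(mat_mul(g, c), transpose(g))               # G C G'
    r = [[r[i][j] + w[i][j] for j in range(n)] for i in range(n)]
    rf = mat_vec(r, f_row)                                 # R F'
    q = sum(f * x for f, x in zip(f_row, rf)) + var_obs    # forecast variance
    e = y - sum(f * x for f, x in zip(f_row, a))           # one-step forecast error
    gain = [x / q for x in rf]
    m_new = [ai + k * e for ai, k in zip(a, gain)]
    c_new = [[r[i][j] - gain[i] * gain[j] * q for j in range(n)] for i in range(n)]
    return m_new, c_new, e, q

# Hypothetical starting point: beta = -1, delta = 0, no seasonality, alpha = 9.
g, w = build_g_w(1e-7, 1e-7, 1e-7)
m0 = [-1.0, 0.0] + [0.0] * 6 + [9.0]
c0 = [[1e-2 if i == j else 0.0 for j in range(9)] for i in range(9)]
f_row = [math.log(2.0), 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0]  # a Monday at price 2.0
m1, c1, err, q_var = kalman_step(m0, c0, f_row, y=8.5, g=g, w=w, var_obs=0.05)
```

After the update, the fitted value `f_row . m1` lies between the prior forecast and the observation; how far it moves depends on the ratio between the state uncertainty and `var_obs`, which is why the transition variances shape the resulting curves so strongly.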
A DLM with KF can forecast further into the future through a sequence of generation steps
without observation steps, but it is extremely sensitive to the chosen variances of the
state transitions, which define the variability and the shape of the resulting state vector
curves. For this reason, different strategies are available for finding the best
combination of transition variances. In this article, we compare the classical Maximum
Likelihood Estimation (MLE) strategy for optimizing the variances with the new approach, called
Minimum Predictive Error (MinPE).
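The "generation steps without observation steps" can be illustrated on the two-dimensional sub-state (β, δ) alone. The sketch below propagates the mean and covariance of the integrated random walk with no new data; the function name and the starting values are assumptions for illustration, with the transition variance of β set to zero as in appendix A.

```python
import math

def forecast_elasticity(beta, delta, c_bb, c_bd, c_dd, var_delta, steps):
    """Propagate mean and covariance of (beta, delta) with no new observations.

    Uses G = [[1, 1], [0, 1]] and W = diag(0, var_delta).
    Returns a list of (beta mean, beta variance) per forecast step.
    """
    path = []
    for _ in range(steps):
        beta, delta = beta + delta, delta
        c_bb, c_bd, c_dd = c_bb + 2.0 * c_bd + c_dd, c_bd + c_dd, c_dd + var_delta
        path.append((beta, c_bb))
    return path

# Hypothetical filtered state at the end of the train period.
path = forecast_elasticity(beta=-1.30, delta=-0.001,
                           c_bb=0.0, c_bd=0.0, c_dd=0.0,
                           var_delta=1e-9, steps=30)
beta_30, var_30 = path[-1]
```

The forecast mean of β extrapolates the last slope linearly, while its variance grows with the horizon at a rate set by the transition variance of δ, so the choice of that variance directly controls how quickly forecasts become uncertain.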
2.3.1. Maximum Likelihood Estimation (DLM+MLE)
Maximum Likelihood Estimation (MLE) finds the combination of hyperparameters for a given model
that maximizes the likelihood of observing the real historical data (White (1982), Akaike (1998)).
In this case, MLE automatically finds the values of the variances $\sigma_\delta^2$, $\sigma_\gamma^2$
and $\sigma_\alpha^2$ that best fit the observed demand values q. The starting variances chosen for
the MLE optimization are $\sigma_{\mathrm{init},\delta}^2 = \sigma_{\mathrm{init},\gamma}^2 = \sigma_{\mathrm{init},\alpha}^2 = 10^{-5}$
and the maximum number of models tried per product is 364.
The actual number of trained models varies from 133 to 364, since the convergence criterion
can be reached before all combinations have been calculated. On average, 203
models are tested per product.
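For reference, the quantity that MLE maximizes can be written through the standard prediction-error decomposition of the Kalman filter. The symbols $m_t$, $C_t$ (filtered mean and covariance), $a_t$, $R_t$ (prior mean and covariance), $e_t$ and $Q_t$ (one-step forecast error and its variance) are notation assumed here and do not appear in the article:

$$a_t = G_t m_{t-1}, \qquad R_t = G_t C_{t-1} G_t^{\top} + W_t, \qquad e_t = \log q_t - F_t a_t, \qquad Q_t = F_t R_t F_t^{\top} + \sigma_q^2$$

$$\ell(\sigma_\delta^2, \sigma_\gamma^2, \sigma_\alpha^2) = -\frac{1}{2} \sum_{t=1}^{T} \left[ \log(2\pi Q_t) + \frac{e_t^2}{Q_t} \right]$$

Written this way, the likelihood scores one-step-ahead errors inside the observed period, each weighted by its own forecast variance, whereas the criterion introduced next scores multi-day forecasts on held-out data.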
2.3.2. Minimum Predictive Error (DLM+MinPE)
In contrast, the Minimum Predictive Error criterion finds the combination of parameters for a
given model that minimizes the predictive error in the test dataset (Sanchez-Gil et al. (2019)).
The main idea behind this approach is that the higher the predictive capability, the more robust
the model is. As is standard practice in machine learning, historical data are split into two
groups: train and test. The train dataset contains most of the available data and is known to the
DLM and KF for estimating the state vector θt. The remaining data make up the
test dataset, a small fraction of the available data (usually the most recent) that is hidden from
the model for testing purposes. In this case, the first 365 days and the last 30
days are used as train and test datasets, respectively.
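A chronological split of this kind can be sketched in a few lines; the helper name `split_train_test` is an assumption, and the series below is a placeholder for the 395 daily observations of one product.

```python
def split_train_test(series, n_test=30):
    """Keep the most recent n_test points for testing, the rest for training."""
    if not 0 < n_test < len(series):
        raise ValueError("n_test must be between 1 and len(series) - 1")
    return series[:-n_test], series[-n_test:]

days = list(range(395))             # placeholder for 395 daily observations
train, test = split_train_test(days)
```

Unlike a random split, the order of the observations is preserved, so the test period always lies strictly after the train period.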
All combinations from a finite list of possible values of the required transition
variances are tried, and for every combination the predictive error of the trained model
is evaluated on the test dataset following the workflow diagram shown in figure 1. The
error of the forecasting model is measured with the mean
absolute percentage error, mape, over the last N test days:

$$\mathrm{mape}(q, \hat{q}) = \frac{1}{N} \sum_{j=1}^{N} \left| \frac{q_j - \hat{q}_j}{q_j} \right| \tag{6}$$

where $\hat{q}_j$ is the value predicted for a given combination of transition variances and
$q_j$ is the observed one. Other error metrics such as mae, wmae, mse or rmse
could also be used in this new approach.
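Eq. (6) translates directly into code. The sketch below is a plain-Python version with made-up numbers; it assumes that every observed value is non-zero, since the metric is undefined otherwise.

```python
def mape(observed, predicted):
    """Mean absolute percentage error of eq. (6); observed values must be non-zero."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs((q - q_hat) / q)
               for q, q_hat in zip(observed, predicted)) / len(observed)

observed = [100.0, 200.0, 400.0]     # made-up daily sales
predicted = [110.0, 180.0, 400.0]    # made-up forecasts
error = mape(observed, predicted)    # (0.10 + 0.10 + 0.00) / 3
```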
Figure 1 Flow diagram of the MinPE hyperparameter optimization strategy applied to demand prediction and price
elasticity estimation.
In our demand model, the lists of possible values for the three transition variances
$\sigma_\delta^2$, $\sigma_\gamma^2$ and $\sigma_\alpha^2$ are given by powers of ten from $10^{-5}$
to $10^{-10}$, from $\sigma_\delta^2 \cdot 10^{0}$ to $\sigma_\delta^2 \cdot 10^{-5}$ and from
$\sigma_\delta^2 \cdot 10^{0}$ to $\sigma_\delta^2 \cdot 10^{-5}$, respectively. The ranges differ on
purpose: for $\sigma_\gamma^2$ and $\sigma_\alpha^2$, only values equal to or lower than (but never
higher than) $\sigma_\delta^2$ are tested in order to enable β fluctuations. Therefore,
6 × 6 × 6 = 216 models are trained per product to find the optimal
combination of variances (the same order of magnitude as the mean of 203 attempts per SKU
under the MLE criterion). In complex models with a higher number of input variances,
testing enough combinations in reasonable computational times would require efficient
search algorithms.
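The grid described above, and the selection rule applied to it, can be sketched as follows. `variance_grid` enumerates the 216 combinations; `min_pe_search` takes a caller-supplied function that is assumed to train the DLM with the given variances and return its test mape. Here a toy stand-in (`toy_error`, purely hypothetical) replaces that function so the example runs on its own.

```python
import math
from itertools import product

def variance_grid():
    """All (var_delta, var_gamma, var_alpha) combinations of section 2.3.2."""
    grid = []
    for exp_delta, shift_gamma, shift_alpha in product(range(5, 11), range(6), range(6)):
        var_delta = 10.0 ** -exp_delta                  # 1e-5 ... 1e-10
        grid.append((var_delta,
                     var_delta * 10.0 ** -shift_gamma,  # var_delta * 1e0 ... 1e-5
                     var_delta * 10.0 ** -shift_alpha))
    return grid

def min_pe_search(grid, test_error):
    """Return the combination with the smallest test error (exhaustive search)."""
    return min(grid, key=test_error)

def toy_error(variances):
    """Stand-in for 'fit on train, forecast, compute mape on test' (hypothetical)."""
    target = (1e-7, 1e-7, 1e-7)
    return sum(abs(math.log10(v) - math.log10(t)) for v, t in zip(variances, target))

grid = variance_grid()
best = min_pe_search(grid, toy_error)
```

An exhaustive search is affordable with three variances; with more hyperparameters the grid grows geometrically, which is the motivation for the efficient search algorithms mentioned above.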
3. Input Data
The input data used in this article for testing and comparing the different approaches, provided
by Wise Athena ©, are daily sell-out prices and sales at a given point of sale. Wise Athena ©
helps CPG companies to increase their sales and margins by optimizing pricing and trade
promotion through machine learning models fed with very granular data from dozens of
CPG companies in different countries. These input data have been anonymized for
confidentiality reasons. Among all the available pieces of information, the most recent
data aggregated at Stock-Keeping Unit (SKU) level for 83 selected products from different
countries have been used for an easy comparison of results (83 time series from August
2017 to September 2018 with 395 data points each). In the case of DLM+MinPE, the train
dataset contains 365 points from 31st August 2017 to 31st August 2018 and the test
dataset contains 30 points corresponding to 1st-30th September 2018. Two example time
series are shown in figure 2, where the dashed vertical line indicates the border between
train and test datasets. The remaining input data are attached as additional information
in the Supplementary Materials section.
[Figure 2: two panels, SKU1 and SKU2, each plotting daily units sold q (above) and daily price p (below) against date.]
Figure 2 Daily units sold q (above) and the corresponding daily prices p (below) of two example products. In
both examples, the vertical dashed line indicates the border between train (left) and test (right) datasets.
4. Results
All calculations have been performed with the R programming language and the R library dlm
(R Development Core Team (2008), Petri (2010), Petri et al. (2009)) on 8 QEMU virtual
processors (cpu64-rhel6) with a total memory size of 118 GiB supplied by Wise Athena ©.
Under these conditions, the computer processing times per SKU for SLM, DLM+MLE and
DLM+MinPE are 0.05, 14.29 and 27.39 seconds, respectively. SLM is therefore around 300 and
500 times faster than dynamic linear models with the MLE and MinPE criteria, respectively. As
might be expected, training around 200 models with time-dependent variables to provide a winning
combination of input variances is a lengthy task. In the case of DLM+MinPE, performing
all steps described in figure 1 for a given combination of input variances takes about 0.13
seconds. It is important to note that machines with a large memory are not
strictly necessary, since the most relevant requirement for the dlm library is CPU
performance.
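A quick consistency check of these timings, using only figures quoted in this article (216 MinPE models per product and a mean of 203 MLE models per product, both from section 2.3):

$$\frac{14.29}{0.05} \approx 286, \qquad \frac{27.39}{0.05} \approx 548, \qquad \frac{27.39\ \mathrm{s}}{216} \approx 0.127\ \mathrm{s}, \qquad \frac{14.29\ \mathrm{s}}{203} \approx 0.070\ \mathrm{s}$$

The first two ratios are the speed-ups of SLM over DLM+MLE and DLM+MinPE. The third agrees with the reported cost of about 0.13 seconds per MinPE combination, and the fourth suggests that a single MLE evaluation is cheaper on average than a single MinPE evaluation, which must also produce and score a 30-day forecast.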
[Figure 3: three panels (SLM, DLM+MLE, DLM+MinPE) plotting the variance of β on a logarithmic axis against the mean β, with a color scale for the test mape.]
Figure 3 Mean price elasticity of demand β versus its variance $\sigma^2(\beta)$ for the 83 goods under study. The color scale
denotes the SKU mape prediction error in test. Points aligned at the bottom correspond to SKUs with constant β
($\sigma^2(\beta) < 10^{-8}$). DLM+MinPE is the only method that always provides significant fluctuations for price elasticity.
Regarding demand estimation performance, static linear models
exhibit higher mape errors than dynamic linear models in both train and test datasets (see
table 1). This shows that the time variations of β, γ and α in eq. 1 are important for
correctly describing demand (as is clearly shown in the train dataset of figure 5). Although MLE
maximizes the likelihood of observing the input data, it is not the method that provides the best
fit to the train dataset. This can be attributed both to local optima and to the fact that
only a small fraction of the entire state space is scanned by MLE. In contrast, the MinPE
criterion is the best at describing the demand in the train dataset for most of the products.
This could indicate that uncorrelated combinations of input variances help to sample
different regions of the state space faster. Besides, imposing an extra requirement in the
convergence criteria (the MinPE criterion) reduces the number of equivalent solutions that
correctly fit train and test data at the same time. Unsurprisingly, DLM+MinPE is the best approach at
predicting demand on the test dataset, since the MinPE criterion optimizes precisely this value.
Table 1 Mean absolute percentage errors (mape) in the train and test datasets for the three
approaches considered, each followed by the number of SKUs for which that approach yields the smallest
error (winning). In both datasets, MinPE is the model with the smallest error and is the winning
model for most of the 83 products under study.

                 SLM       DLM+MLE   DLM+MinPE
mape train       19.59%    17.31%    17.10%
winning train    0         29        54
mape test        31.81%    30.01%    25.69%
winning test     5         0         78
Concerning price elasticity estimation, table 2 shows the percentage of products with
constant elasticity (β time series with $\sigma^2 < 10^{-8}$ are considered constant) according
to SLM, DLM+MLE and DLM+MinPE. Whereas SLM always provides static price elasticities,
DLM+MLE also yields constant results for 56 of the 83 products. Conversely,
all elasticities estimated with DLM+MinPE display fluctuations of varying intensity. In
14 of the remaining 27 cases, for which β estimated with MLE varies over time,
MLE and MinPE provide similar results (see Supplementary Materials). In all cases, the
elasticities provided by the three methods are of the same order of magnitude (between -5.5
and 0.5). For a large majority of SKUs, the non-constant price elasticity estimated by
DLM+MinPE is the best one at describing and predicting the real demand. All these results
for the 83 products under study are summarized in fig. 3.
Table 2 Percentage of SKUs for which the estimated price elasticity of demand β is constant over time under the
three approaches considered (β time series with $\sigma^2 < 10^{-8}$ are considered constant). DLM+MinPE is the only
approach that systematically detects β fluctuations for all tested SKUs.

               SLM     DLM+MLE   DLM+MinPE
constant β     100%    67%       0%
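The classification behind table 2 can be sketched as a variance threshold on each estimated β series. Whether the study uses the population or the sample variance is not stated, so the population variance below is an assumption, as are the function name and the two example series.

```python
from statistics import pvariance

def is_constant(beta_series, threshold=1e-8):
    """True when the variance of the beta time series is below the threshold."""
    return pvariance(beta_series) < threshold

flat = [-1.30] * 395                                       # e.g. a static estimate
drifting = [-1.40 + 0.20 * t / 394 for t in range(395)]    # slow linear drift

flags = [is_constant(flat), is_constant(drifting)]
share_mle = round(100 * 56 / 83)   # 56 constant SKUs out of 83, as a percentage
```

With the counts reported in the text, 56 constant series out of 83 round to the 67% shown for DLM+MLE.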
Finally, it is important to highlight the convergence issues found with MLE for a small
fraction of products not included in this study (less than 1%). In these cases, MLE does
not provide any valid solution, as the iterative process fails to converge for any
of the trained models (Mantel and Myers (1971)). The MinPE criterion overcomes this problem
since it simply selects the model with the lowest test error.
4.1. Two SKU examples: a detailed review
For a better understanding, this section includes a more detailed examination of the results
provided by all three approaches for both products shown in figure 2.
The SKU1 input data show that the largest price promotion, in October 2017, did not bring a
relevant sales increase. However, the two price promotions around July 2018 and the multiple
price reductions in September 2018 brought visible demand increases at a lower investment
(see figure 2). This change in price sensitivity over time is consistent with the price elasticity
of demand provided by DLM+MinPE (see fig. 4), where β of SKU1 decreases continuously
during the last 13 months. Although MLE provides a slightly better fit to the
train dataset (see table 3), MinPE finds a solution of similar accuracy which also
unveils price elasticity changes over time. For this sample product SKU1, the optimal $\sigma_\delta^2$
provided by MinPE is more than three orders of magnitude higher than the variance found by MLE
(see table 3).

[Figure 4: two panels, SKU1 and SKU2, plotting β against date for DLM+MinPE, SLM and DLM+MLE.]
Figure 4 Estimated price elasticities of demand β for the two example products according to SLM (red),
DLM+MLE (blue) and DLM+MinPE (yellow) models. In both examples, the vertical dashed line indicates the border
between train (left) and test (right) datasets, and subtle changes in elasticity can be observed only with the
MinPE criterion.
Table 3 From top to bottom, mape prediction errors in the train and test datasets and the optimal
combinations of input variances $\sigma_\delta^2$, $\sigma_\gamma^2$, $\sigma_\alpha^2$ provided by SLM, DLM+MLE and
DLM+MinPE for both SKU examples.

                     SKU1                                      SKU2
                     SLM      DLM+MLE          DLM+MinPE       SLM       DLM+MLE          DLM+MinPE
mape train           24.20%   21.77%           22.81%          100.36%   33.89%           33.18%
mape test            38.10%   38.17%           37.80%          46.13%    20.21%           11.25%
$\sigma_\delta^2$    -        8.0 · 10^-13     10^-9           -         3.5 · 10^-15     10^-7
$\sigma_\gamma^2$    -        7.9 · 10^-11     10^-13          -         4.3 · 10^-15     10^-7
$\sigma_\alpha^2$    -        2.3 · 10^-4      10^-11          -         7.5 · 10^-3      10^-7
In the case of SKU2, figure 4 shows that MinPE provides a radically different solution,
incompatible with the other two approaches. Whereas β provided by DLM+MinPE indicates
that SKU2 is an elastic product with values around -3, SLM and DLM+MLE estimate
that it is barely price-sensitive, with positive β values close to zero. Input data from figure
2 display price promotions in October 2017, January 2018 and September 2018 that positively
affect demand, indicating an elastic behavior in full agreement with the DLM+MinPE
results. Moreover, demand estimation using the MinPE criterion yields the lowest errors
(see table 3) and the best fit, especially in the test dataset (see figure 5). The price
reduction and its associated demand increase in the test dataset (see figure 2) can only
be explained by a negative and relatively large price elasticity. As table 3 shows, according
to MLE most of the demand fluctuations of SKU2 come from the additive term α, while
MinPE finds the same value, $10^{-7}$, for the three optimized variances. In this specific example,
the best combination of input variances (not explored by MLE) can be found only with the MinPE
criterion.
[Figure 5: two panels, Train SKU2 and Test SKU2, plotting demand q against date for DLM+MinPE, SLM and DLM+MLE.]
Figure 5 Demand prediction in the train and test datasets using the SLM (red), DLM+MLE (blue) and
DLM+MinPE (yellow) approaches compared with the real sales provided (grey). In both datasets, the MinPE criterion
yields the smallest predictive errors.
Ultimately, fluctuations in the variables β, γ and α that define demand in equation
1 (price elasticity, weekly seasonality and additive terms) have a deep impact on
short- and long-term pricing decisions. In figure 6, three demand curves of product
SKU1 on three different days show distinct behavioral changes during the year, depending
on the daily values estimated by DLM+MinPE. As dynamic linear models can predict demand
and price elasticity values in the near future (generation steps without observations),
pricing strategies based on DLM+MinPE results could help CPG and Retail companies
determine the optimal price in every promotional cycle.
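A daily demand curve of this kind follows from solving eq. (1) for q at fixed β, γ and α. The sketch below compares two hypothetical days; the parameter values are illustrative choices in a plausible range for SKU1, not the estimates of this study, and `demand_curve` is an assumed helper name.

```python
import math

def demand_curve(prices, beta, gamma, alpha):
    """q(p) = exp(alpha + gamma) * p ** beta, i.e. eq. (1) solved for q on one day."""
    return [math.exp(alpha + gamma) * p ** beta for p in prices]

prices = [1.5, 2.0, 2.5]

# Two hypothetical days with slightly different elasticities.
curve_early = demand_curve(prices, beta=-1.31, gamma=0.0, alpha=9.42)
curve_late = demand_curve(prices, beta=-1.41, gamma=0.0, alpha=9.42)

# Demand gained by moving from the highest to the lowest price on each day.
gain_early = curve_early[0] / curve_early[-1]
gain_late = curve_late[0] / curve_late[-1]
```

Even a change of 0.1 in β alters the payoff of the same price cut, which is why small daily fluctuations in elasticity matter when choosing promotional prices.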
[Figure 6: three demand curves q(p) for SKU1 on 15th Sept 17, 12th March 18 and 16th Sept 18 (above) and the β, γ and α time series against date (below).]
Figure 6 Demand curve evolution over time (on three different days marked with grey points below) and the
three time series (price elasticity of demand β, weekly seasonality γ and the additive value α) calculated with
DLM+MinPE for SKU1. The third demand curve and the time series to the right of the vertical dashed line are
predictive values within the test dataset.
5. Conclusions
In this article, it has been shown that fluctuations over time are important for
correctly modeling the demand of a product. It has also been shown that, among the three
tested approaches, daily changes in elasticity can be observed only with a dynamic linear
model using the new MinPE criterion. These fluctuations open the possibility of more
effective everyday and promotional pricing strategies for CPG and Retail companies, which
could exploit these small changes in price sensitivity.
Although the MinPE criterion was roughly twice as computationally expensive as MLE in our
tests (27.39 versus 14.29 seconds per SKU), it is a more robust and relatively easy to implement
alternative to MLE for optimizing input variances in DLM. Exploring solutions which provide
small errors in train as well as in test according to the MinPE criterion yields distinctive
results that are more consistent with the observed input data. This happens thanks to better
sampling of the state space. Finally, demand prediction errors using DLM+MinPE are lower
than those provided by SLM and DLM+MLE for the vast majority of the
products tested.
Appendix A: State-Space model (SSM)
The observation equation for our proposed demand model follows eq. [1] with an additive term, $\varepsilon_t$, arising
from the zero-mean observation noise:

$$\log q_t = \beta_t \log p_t + \gamma_{1,t} d_{1,t} + \gamma_{2,t} d_{2,t} + \gamma_{3,t} d_{3,t} + \gamma_{4,t} d_{4,t} + \gamma_{5,t} d_{5,t} + \gamma_{6,t} d_{6,t} + \alpha_t + \varepsilon_t \tag{7}$$

where $\varepsilon_t \sim N(0, \sigma_q^2)$ follows a normal distribution with variance $\sigma_q^2$, which is directly
calculated from the observed q time series.
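The time evolution of the linear parameters is mathematically described by the following nine state transition
equations:

$$\beta_t = \beta_{t-1} + \delta_{t-1} + \omega_{t,\beta}, \qquad \omega_{t,\beta} \sim N(0, \sigma_\beta^2) \tag{8}$$

$$\delta_t = \delta_{t-1} + \omega_{t,\delta}, \qquad \omega_{t,\delta} \sim N(0, \sigma_\delta^2) \tag{9}$$

$$\gamma_{1,t} = \gamma_{1,t-1} + \omega_{t,\gamma_1}, \qquad \omega_{t,\gamma_1} \sim N(0, \sigma_{\gamma_1}^2) \tag{10}$$

$$\vdots \tag{11}$$

$$\gamma_{6,t} = \gamma_{6,t-1} + \omega_{t,\gamma_6}, \qquad \omega_{t,\gamma_6} \sim N(0, \sigma_{\gamma_6}^2) \tag{12}$$

$$\alpha_t = \alpha_{t-1} + \omega_{t,\alpha}, \qquad \omega_{t,\alpha} \sim N(0, \sigma_\alpha^2) \tag{13}$$

where the price elasticity β follows an integrated random walk driven by its derivative δ, and the remaining
state variables δ, γ1 . . . γ6 and α follow simple random walks. For the sake of simplicity, we assume
equal variances for the six components of the weekly seasonality, $\sigma_\gamma^2 = \sigma_{\gamma_1}^2 = \dots = \sigma_{\gamma_6}^2$.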
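The observation and transition equations can be written with vectors and matrices to give a condensed
view of our proposed SSM:

$$\log q_t = F_t \theta_t + \varepsilon_t, \qquad \varepsilon_t \sim N(0, \sigma_q^2) \tag{14}$$

$$\theta_t = G_t \theta_{t-1} + \omega_t, \qquad \omega_t \sim N(0, W_t) \tag{15}$$

where the 9-dimensional state vector θt and the matrices Ft, Gt and Wt are defined as:

$$\theta_t = \begin{bmatrix} \beta_t \\ \delta_t \\ \gamma_{1,t} \\ \vdots \\ \gamma_{6,t} \\ \alpha_t \end{bmatrix}, \qquad
F_t = \begin{bmatrix} \log p_t & 0 & d_{1,t} & d_{2,t} & d_{3,t} & d_{4,t} & d_{5,t} & d_{6,t} & 1 \end{bmatrix} \tag{16}$$

$$G_t = \begin{bmatrix}
1 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1
\end{bmatrix}, \qquad
W_t = \begin{bmatrix}
\sigma_\beta^2 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & \sigma_\delta^2 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & \sigma_\gamma^2 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & \sigma_\gamma^2 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & \sigma_\gamma^2 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & \sigma_\gamma^2 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & \sigma_\gamma^2 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & \sigma_\gamma^2 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & \sigma_\alpha^2
\end{bmatrix} \tag{17}$$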
Finally, the initial conditions for the state vector, $\beta_0$, $\gamma_{1,0} \dots \gamma_{6,0}$ and $\alpha_0$, and the
initial variances $\sigma_{0,\gamma_1}^2 \dots \sigma_{0,\gamma_6}^2$ and $\sigma_{0,\alpha}^2$ are taken from a multiple
linear regression of eq. [1].
The initial values for the derivative of the price elasticity, $\delta_0$ and $\sigma_{0,\delta}^2$, are chosen for convenience as zero.
Similarly, we choose $\sigma_\beta^2 = 0$ and $\sigma_{0,\beta}^2 = 0$, since all price elasticity variations over time are
directly derived from its derivative δ. Consequently, the proposed demand model depends only on three input variances of
the state transitions: $\sigma_\delta^2$, $\sigma_\gamma^2$ and $\sigma_\alpha^2$.
Acknowledgments
The authors gratefully acknowledge A. Montoro, A. Vazquez, J. Rodríguez, M.J. Martín, M. Romero
and J.M. Lopez-Zafra for their useful comments and corrections. Vicente Sanchez-Gil and Daniel Pizarroso
wish to acknowledge the support of Wise Athena © in facilitating this research.
References
Akaike, H. 1998. Information theory and an extension of the maximum likelihood principle. Springer Series
in Statistics (Perspectives in Statistics) .
Ali, Ozden Gur, Serpil Sayın, Tom Van Woensel, Jan Fransoo. 2009. Sku demand forecasting in the presence
of promotions. Expert Systems with Applications 36(10) 12340–12348.
Andreyeva, Tatiana, Michael W. Long, Kelly D. Brownell. 2010. The impact of food prices on consumption:
A systematic review of research on the price elasticity of demand for food. Am J Public Health 100(2)
216–222.
Beravs, Tadej, Janez Podobnik, Marko Munih. 2012. Three-axial accelerometer calibration using kalman
filter covariance matrix for online estimation of optimal sensor orientation. IEEE Transactions on
Instrumentation and Measurement 61(9).
Espey, M., J. Espey, W. D. Shaw. 1997. Price elasticity of residential demand for water: A metaanalysis.
Water Resources Research 33(6).
Fei, Xiang, Chung-Cheng Lu, Ke Liu. 2011. A bayesian dynamic linear model approach for real-time
short-term freeway travel time prediction. Transportation Research Part C: Emerging Technologies 19(6)
1306–1318.
Fibich, Gadi, Arieh Gavious, Oded Lowengart. 2005. The dynamics of price elasticity of demand in the
presence of reference price effects. Journal of the Academy of Marketing Science 33(1) 66.
Granados, Nelson, Alok Gupta, Robert J. Kauffman. 2011. Online and offline demand and price elasticities:
Evidence from the air travel industry. Information Systems Research 23(1). doi:10.1287/isre.1100.0312.
Green, Richard K., Stephen Malpezzi, Stephen K. Mayo. 2005. Metropolitan-specific estimates of the price
elasticity of supply of housing, and their sources. American Economic Review 95(2) 334–339.
Grewal, Mohinder S. 2011. Kalman Filtering. Springer Berlin Heidelberg, Berlin, Heidelberg.
doi:10.1007/978-3-642-04898-2_321.
Heerde, Harald J. Van, Maarten J. Gijsenberg, Marnik G. Dekimpe, Jan-Benedict E.M. Steenkamp. 2013.
Price and advertising effectiveness over the business cycle. Journal of Marketing Research 50(2)
177–193. doi:10.1509/jmr.10.0414.
Hoch, Stephen J., Byung-Do Kim, Alan L. Montgomery, Peter E. Rossi. 1995. Determinants of store-level
price elasticity. Journal of Marketing Research 32(1) 17–29.
Hughes, Jonathan E., Christopher R. Knittel, Daniel Sperling. 2006. Evidence of a shift in the short-run price
elasticity of gasoline demand. The Energy Journal, International Association for Energy Economics
29(1) 113–134.
Kalman, R. E. 1960. A new approach to linear filtering and prediction problems. Journal of Basic Engineering
35–45.
Kapoor, Mudit, Shamika Ravi. 2016. Elasticity of intertemporal substitution in consumption in the presence
of inertia: Empirical evidence from a natural experiment. Management Science 63(12).
doi:10.1287/mnsc.2016.2564.
Kocabiyikoglu, Ayse, Ioana Popescu. 2011. An elasticity approach to the newsvendor with price-sensitive
demand. Operations Research 59(2). doi:10.1287/opre.1100.0890.
Law, Rob. 2000. Back-propagation learning in improving the accuracy of neural network-based tourism
demand forecasting. Tourism Management 21(4) 331–340.
Li, Gang, Haiyan Song, Stephen F Witt. 2004. Modeling tourism demand: A dynamic linear aids approach.
Journal of Travel Research 43(2) 141–150.
Liu, Lon-Mu, Dominique Hanssens. 1981. A bayesian approach to time-varying cross-sectional regression
models. Journal of Econometrics 15 341–356. doi:10.1016/0304-4076(81)90099-3.
Ma, Shaohui, Robert Fildes, Tao Huang. 2016. Demand forecasting with high dimensional data: The case of
sku retail sales forecasting with intra-and inter-category promotional information. European Journal
of Operational Research 249(1) 245–257.
Mantel, Nathan, Max Myers. 1971. Problems of convergence of maximum likelihood iterative procedures in
multiparameter situations. Journal of the American Statistical Association 66(335) 484–491.
McCulloch, Charles E. 1997. Maximum likelihood algorithms for generalized linear mixed models. Journal
of the American statistical Association 92(437) 162–170.
Myung, In Jae. 2003. Tutorial on maximum likelihood estimation. Journal of mathematical Psychology 47(1)
90–100.
Petri, Giovanni. 2010. An r package for dynamic linear models. Journal of Statistical Software 36(12) 1–16.
Petri, Giovanni, Sonia Petrone, Patrizia Campagnoli. 2009. Dynamic Linear Models with R. Springer.
R Development Core Team. 2008. R: A Language and Environment for Statistical Computing . R Foundation
for Statistical Computing, Vienna, Austria. URL http://www.R-project.org. ISBN 3-900051-07-0.
Sanchez-Gil, V., D. Pizarroso, R. A. Queralt, J. M. Lopez-Zafra. 2019. Minimum predictive error: a new
hyperparameter optimization strategy for dynamic linear models. Working paper.
Shy, Oz. 2008. How to Price: A Guide to Pricing Techniques and Yield Management . Cambridge University
Press.
Simon, Hermann. 1979. Dynamics of price elasticity and brand life cycles: An empirical study. Journal of
Marketing Research 16(4) 439–452.
Simon, Hermann. 1989. Price Management . North Holland Elsevier Science Publishers, Amsterdam.
Suganthi, L, Anand A Samuel. 2012. Energy models for demand forecasting: a review. Renewable and
sustainable energy reviews 16(2) 1223–1240.
Vaquero, Victor, Ivan del Pino, Francesc Moreno-Noguer, Joan Solà, Alberto Sanfeliu. 2017. Deconvolutional
networks for point-cloud vehicle detection and tracking in driving scenarios. 2017 European Conference
on Mobile Robots (ECMR).
White, Halbert. 1982. Maximum likelihood estimation of misspecified models. Econometrica, The
Econometric Society 50(1) 1–25.
Young, Peter C. 2011. Recursive Estimation and Time-Series Analysis: An Introduction for the Student and
Practitioner . Springer.