Post on 19-Sep-2020
Optimizing the return of advertising through time series forecasting and decomposition methods
Miguel Jerez (*), Alfredo García-Hiernaux, Sonia Sotoca
Universidad Complutense de Madrid
Abstract: This paper discusses the application of time series analysis,
forecasting and decomposition methods to optimize the return of advertising.
To this end, we assume that there exists a dynamic log-linear relationship
between investment and response. We show that this assumption is supported by
economic theory and very often by the data analyzed. In this framework, we formulate
three key questions for a decision-maker: How can one measure the return of
advertising? Which is the best timing for investments when the response is seasonal?
How much should a given advertiser invest? The answers and discussion of these
relevant questions are structured around a practical case based on the monthly sales of
Lydia Pinkham’s vegetable compound and the corresponding advertising investments.
Keywords: Marketing, Advertising, Forecasting, Time series decomposition, Lydia
Pinkham, Seasonality
JEL classification: C530; M37; M31
(*) Corresponding author. Departamento de Fundamentos del Análisis Económico II. Facultad de Ciencias Económicas. Campus de Somosaguas. 28223 Madrid (SPAIN). Email: mjerez@ccee.ucm.es Tel: (+34) 91 394 23 61, fax: (+34) 91 394 25 91.
1. Introduction.
While cross-section statistics and econometrics are now standard tools of
marketing, time series methods are not so widely applied. Perhaps it is our fault: much
effort has been devoted to specification, but few examples illustrate how time series
models may help the marketer.
This paper discusses the application of time series forecasting and
decomposition methods to improve the return of advertising. To this end, we assume
that there exists a dynamic log-linear relationship between advertising and response
(from now on “sales”) and, in this framework, we address three key questions: how can
one measure the return of advertising? which is the best timing for investments when
the response is seasonal? and how much should a given advertiser invest? The
discussion is structured around a practical case based on the monthly sales of Lydia
Pinkham’s vegetable compound and the corresponding advertising investments.
Section 2 provides some background on the dataset employed. Section 3
discusses the consequences of specifying a linear or log-linear relationship between
sales and advertising and Section 4 describes the model building process. Sections 5, 6
and 7 provide a step-by-step discussion of how one can apply a log-linear model to
address the three issues previously described and, finally, Section 8 provides some
concluding remarks and summarizes the lessons that can be drawn from the
previous exercises.
2. The dataset.
Now we will show how to apply time series decomposition methods to improve
the return of advertising. To this end, we will use the famous monthly series of sales
and advertising of the Lydia Pinkham vegetable compound, see Figure 1.
[Insert Figure 1]
This product was introduced in 1873. It was an herbal extract in an 18-20 percent
alcoholic solution, and was considered to be effective against “all those painful
Complaints and Weaknesses so common to our best female population”. Additional
medical claims followed its commercial success. The company gained strong public
exposure because of controversies around the product ingredients and a large court case,
which made this dataset public. The firm was finally sold in 1968, but some medicinal
products with the generic “Lydia Pinkham” brand can still be acquired today by direct
order.
These series are important in empirical research about advertising for several
reasons: (a) the product is a frequently purchased, low-cost consumer nondurable, a
class of products especially interesting to marketing research; (b) advertising,
primarily in newspapers, was the only marketing instrument used by the company; (c)
price changes were small and infrequent; (d) the distribution, mainly through drug
wholesalers, remained fairly stable; (e) there were no direct competitors for
this product, so the market under study can be considered a closed sales-advertising
system; and, last, (f) public sales and advertising datasets are scarce for
confidentiality reasons. Because of these convenient features, this dataset has been modeled many
times, by early researchers such as Palda (1964) or Bhattacharyya (1982) and, more
recently, by Kim (2005) or Smith, Naik and Tsai (2006).
3. Data transformations and their implications in a sales-advertising framework.
Simple inspection of Figure 1 shows that both series display seasonal
fluctuations and a downward drift. Their profile also suggests that there is a positive
relationship between level and volatility, so that the higher their level, the higher their
volatility. This impression is confirmed by the mean-standard deviation plots shown in
Figure 2, which suggest that a logarithmic transformation may improve their statistical
properties (Box and Cox, 1964) by: (a) stabilizing the data volatility, (b) making it
independent of the level, (c) linearizing the relationship between both series and (d)
contributing to its Gaussianity.
[Insert Figure 2]
In addition, log-transforming the data has other advantages when
modeling a sales-advertising system. To discuss this idea, consider the following
regressions:
$$R_t = \beta_0 + \beta_1 A_t^1 + \beta_2 A_t^2 + S_t + E_t \qquad (3.1)$$
$$\ln R_t = \alpha_0 + \alpha_1 \ln A_t^1 + \alpha_2 \ln A_t^2 + s_t + \varepsilon_t \qquad (3.2)$$
where $R_t$ denotes the response at time $t$, $A_t^1$ and $A_t^2$ are advertising investments of type
1 and 2, $S_t$ and $s_t$ are seasonal components and, last, the terms $E_t$ and $\varepsilon_t$ are white
noise errors.
If the “true” model were the linear equation (3.1), this would imply that: (a)
advertising has constant returns to scale, (b) the return generated by $A_t^1$ does not
depend on the return of $A_t^2$ and vice versa, and (c) ROI is independent of the specific period
$t$ in which the investment is made and, specifically, is not affected by the seasonal
cycle. These implications contradict the practical experience of most marketers.
On the other hand, the log-linear specification has radically different
implications. As is well known: (a) if $\alpha_1$ and $\alpha_2$ are less than one, (3.2) implies
decreasing returns to scale and (b) in a log-linear framework all the terms considered
interact multiplicatively with each other. To see this, take the antilog of both sides of (3.2),
obtaining:
$$R_t = \exp(\alpha_0)\,(A_t^1)^{\alpha_1}\,(A_t^2)^{\alpha_2}\,\exp(s_t)\,\exp(\varepsilon_t) \qquad (3.3)$$
where $\exp(\cdot)$ denotes the exponential function.
The previously discussed implications have important consequences for optimal
investment decisions. For example, in the linear model (3.1) if $\beta_1 > \beta_2$ then the whole
budget should be allocated to the type 1 investment. Also the timing of investment is
irrelevant, as the expected response does not depend on $t$. On the other hand, it is easy
to see that expression (3.3) implies, for example, that the expected return of an increase
in $A_t^1$ depends not only on the level of $A_t^2$, but also on the phase of the seasonal cycle
$\exp(s_t)$. Because of this, the log-linear specification implies that diversifying over
different investment types and seasons is optimal.
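A short worked derivation may make this contrast concrete. Differentiating (3.1) and (3.3) with respect to the type 1 investment, and treating all the remaining terms as given, yields:

$$\frac{\partial R_t}{\partial A_t^1} = \beta_1 \quad \text{in (3.1)}, \qquad \frac{\partial R_t}{\partial A_t^1} = \alpha_1 \exp(\alpha_0)\,(A_t^1)^{\alpha_1 - 1}\,(A_t^2)^{\alpha_2}\,\exp(s_t)\,\exp(\varepsilon_t) = \alpha_1 \frac{R_t}{A_t^1} \quad \text{in (3.3)}$$

so the marginal response is constant in the linear model while, in the log-linear model with $\alpha_1, \alpha_2 > 0$, it grows with $A_t^2$ and with the seasonal factor $\exp(s_t)$ and, when $\alpha_1 < 1$, falls as $A_t^1$ increases.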
Therefore, we can conclude that log-transforming the data is very important to
model the relationships between sales and investment over time, for both statistical and
practical reasons. The log-linear specification also has some inflexibilities and shortcomings. For example, it
cannot capture increasing returns for the first units invested, as S-shaped functions do,
see e.g. Hanssens, Parsons and Schultz (2003, Chapter 3), but it provides a realistic
operational framework with a small cost in complexity.
4. Model building.
Following the notation and ideas of the previous section, we developed the
model-building exercise using the logged and scaled value of sales, $r_t = \ln R_t \times 100$, and
the corresponding logged advertising, $a_t = \ln A_t \times 100$. The logged series were
multiplied by 100 to improve the scaling of the estimates.
On this basis, the modeling exercise can be viewed as finding a statistically
adequate representation for the terms in the decomposition:
$$r_t = r_t^a + r_t^{\varepsilon} \qquad (4.1)$$
where $r_t^a$ is the deterministic, or input-related, component, and $r_t^{\varepsilon}$ is the stochastic, or
error-related, component. Following Box, Jenkins and Reinsel (2008) these terms can
be parametrized, respectively, with a rational transfer function
$$r_t^a = \frac{\omega_0 - \omega_1 B - \omega_2 B^2 - \ldots - \omega_r B^r}{1 - \delta_1 B - \delta_2 B^2 - \ldots - \delta_s B^s}\, a_t \qquad (4.2)$$
where $B$ is the backshift operator, such that for any sequence $w_t$: $B^k w_t = w_{t-k}$, $k = 0, \pm 1, \pm 2, \ldots$
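On the other hand, the error-related term can be modeled as an ARIMA process:
$$r_t^{\varepsilon} = \frac{(1 - \theta_1 B - \theta_2 B^2 - \ldots - \theta_q B^q)\,(1 - \Theta_1 B^S - \Theta_2 B^{2S} - \ldots - \Theta_Q B^{Q \cdot S})}{(1 - B)^d\,(1 - B^S)^D\,(1 - \phi_1 B - \phi_2 B^2 - \ldots - \phi_p B^p)\,(1 - \Phi_1 B^S - \Phi_2 B^{2S} - \ldots - \Phi_P B^{P \cdot S})}\, \varepsilon_t \qquad (4.3)$$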
To choose the parametrization in (4.1)-(4.3) we: (a) did a standard univariate
analysis of log advertising, which yielded an IMA(1,1)x(1,1)12 specification, (b)
prewhitened both series using this model (Box, Jenkins and Reinsel, 2008, Chapter 11)
and (c) computed the corresponding sample cross-correlations, see Figure 3.
[Insert Figure 3]
Figure 3.a displays two significant cross-correlations at lags 0-1, perhaps mixed
with a rough sinusoidal response between current sales and lagged advertising. On the
other hand, Figure 3.b shows no substantial feedback.
We estimated several models within the family (4.1)-(4.3) coherent with the
pattern in Figure 3.a. In all cases, the observations between November 1957 and
February 1958 were treated as missing values because they were outliers. Finally, we
chose the following specification for the transfer function component (4.2):
$$r_t^a = \frac{\underset{(.022)}{.048} + \underset{(.022)}{.016}\,B + \underset{(.026)}{.043}\,B^2}{1 - \underset{(.121)}{.713}\,B + \underset{(.087)}{.751}\,B^2}\, a_t \qquad (4.4)$$
where the figures in parentheses are standard errors. Note that:
1) The roots of the polynomials in the numerator and denominator of (4.4) are,
respectively, -.186±1.040i and .475±1.052i so, even though this model is
heavily parametrized, there are no redundant dynamic factors in this term.
2) Model (4.4) implies an impulse-response function with both positive and negative
values, see Figure 4. In a sales-advertising system this can happen when the
product has a loyal customer base and advertising accelerates
consumption, therefore changing the distribution of re-stocking purchases
over time.
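The two remarks above can be checked numerically. The following is a minimal sketch in plain Python, not the E4 code used in the paper: it takes the point estimates in (4.4), computes the roots of both polynomials, and expands the impulse-response weights with the recursion implied by multiplying both sides by the denominator. The helper names `quadratic_roots` and `impulse_response` are hypothetical.

```python
import cmath

# Point estimates taken from (4.4).
num = [0.048, 0.016, 0.043]   # numerator: .048 + .016 B + .043 B^2
den = [1.0, -0.713, 0.751]    # denominator: 1 - .713 B + .751 B^2


def quadratic_roots(c0, c1, c2):
    """Roots of the polynomial c0 + c1*B + c2*B**2 in B."""
    disc = cmath.sqrt(c1 * c1 - 4 * c2 * c0)
    return ((-c1 + disc) / (2 * c2), (-c1 - disc) / (2 * c2))


def impulse_response(num, den, n_lags):
    """First n_lags weights of num(B)/den(B), from den(B) * nu(B) = num(B)."""
    nu = []
    for j in range(n_lags):
        value = num[j] if j < len(num) else 0.0
        for i in range(1, len(den)):
            if j - i >= 0:
                value -= den[i] * nu[j - i]
        nu.append(value / den[0])
    return nu


num_roots = quadratic_roots(*num)
den_roots = quadratic_roots(*den)
nu = impulse_response(num, den, 60)
gain = sum(num) / sum(den)    # long-term gain, i.e. num(1)/den(1)

print("numerator roots:  ", num_roots)
print("denominator roots:", den_roots)
print("first weights:    ", [round(w, 4) for w in nu[:8]])
print("long-term gain:   ", round(gain, 3))
```

Under these estimates the roots agree with the values quoted in remark 1, the fourth weight is already negative, and the gain rounds to 0.103, the value given in the caption of Figure 4.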
[Insert Figure 4]
As for the error-related term (4.3), the chosen specification was an
IMA(1,1)x(1,1)12 process, with the following results:
$$r_t^{\varepsilon} = \frac{(1 - \underset{(.029)}{.900}\,B)\,(1 - \underset{(.063)}{.628}\,B^{12})}{(1 - B)\,(1 - B^{12})}\, \hat{\varepsilon}_t\,; \qquad \hat{\sigma}_{\varepsilon} = 7.536 \qquad (4.5)$$
where $\varepsilon_t$ is a homoscedastic white noise error. Note that a standard hypothesis test
would not reject the null $\theta = 1$, thus implying either overdifferencing or a deterministic
trend. However, one must take into account that, under this null, Gaussian maximum-likelihood
(ML) estimates display a so-called “pile-up” effect, in which the probability
density under the null converges to a large positive value, so standard testing is biased
towards non-rejection. The test proposed by Davis, Chen and Dunsmuir (1994, Section
3) takes this distortion into account and, in this case, safely rejects $\theta = 1$ in favor of
$\theta < 1$ at the 5% significance level.
Finally an analysis of residual autocorrelations did not suggest any additional
structure. The corresponding value of the Q-statistic computed with the first ten
autocorrelations (9.69) confirms this impression. Also, there were no significant values
or patterns in the cross-correlation function between the residuals and the prewhitened
input and the corresponding Q-statistic, also computed with ten lags (6.94), was small
enough to consider that (4.3) captures all linear relationships between these series.
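For readers who want to reproduce this kind of diagnostic, the sketch below computes a portmanteau statistic from a residual series. It assumes the Q-statistic takes the Ljung and Box (1978) form listed in the references; the text does not state which variant was used, and the function names are hypothetical.

```python
def sample_autocorrelations(x, max_lag):
    """Sample autocorrelations of x at lags 1..max_lag."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    c0 = sum(d * d for d in dev)
    return [
        sum(dev[t] * dev[t - k] for t in range(k, n)) / c0
        for k in range(1, max_lag + 1)
    ]


def ljung_box_q(x, max_lag):
    """Ljung-Box statistic: n (n + 2) * sum over k of rho_k^2 / (n - k)."""
    n = len(x)
    rho = sample_autocorrelations(x, max_lag)
    return n * (n + 2) * sum(r * r / (n - k) for k, r in enumerate(rho, start=1))


# Tiny hand-checkable example (not the paper's residuals).
print(ljung_box_q([1.0, -1.0, 1.0, -1.0], max_lag=1))
```

In practice the statistic would be compared with a chi-square critical value whose degrees of freedom account for the number of estimated parameters.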
5. Estimating the return of advertising.
One of the basic uses of a sales-advertising model is estimating the ROI of
advertising. Building on previous results, it seems natural to do this using the terms in
the decomposition (4.1), whose estimation requires solving the difference equation
(4.4). We did this using the algorithm proposed by Casals, Jerez and Sotoca (2010),
which provides orthogonal estimates for $r_t^a$ and $r_t^{\varepsilon}$.
Figure 5 shows the profile of the sales series versus a “baseline”, computed as
$\exp(r_t^{\varepsilon}/100)$. This baseline can therefore be interpreted as the expected value of sales
if the advertising investment were kept at a null steady state. According to this
interpretation, the grey area would be an estimate of the sales generated by advertising.
[Insert Figure 5]
These numbers lead to some interesting conclusions. Total sales during the
sample period were roughly 10 million US$, with total advertising of 4.8 million
(48.4% of the previous figure). The estimate of sales generated by advertising resulting
from Figure 5 is 4.5 million, so the ROI is -.3 million US$ or, in percentage terms,
-3.1% of sales. Advertising in this case therefore did not create value for the firm.
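The arithmetic behind these figures is simple enough to spell out. The sketch below uses only the rounded values quoted in the text, in million US$; the function name `advertising_roi` is hypothetical.

```python
def advertising_roi(sales_generated, advertising, total_sales):
    """Return (ROI in money units, ROI as a percentage of total sales)."""
    roi = sales_generated - advertising
    return roi, 100.0 * roi / total_sales


# Rounded figures quoted in the text, in million US$.
roi, roi_pct = advertising_roi(sales_generated=4.5, advertising=4.8, total_sales=10.0)
print(f"ROI: {roi:.1f} million US$ ({roi_pct:.1f}% of sales)")
```

With these rounded inputs the percentage comes out at about -3.0%, slightly different from the -3.1% reported, which presumably rests on the unrounded data.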
The superimposed regression lines in Figure 5 suggest that advertising could
have been inefficient, as the negative slope of the sales line is more pronounced than
that of the baseline, implying that actual sales deteriorated more rapidly than the
underlying market conditions.
6. Pro-seasonal vs. anti-seasonal investment.
To investigate the cause of this poor performance we further decomposed
the error-related component into its structural components:
$$r_t^{\varepsilon} = t_t + c_t + s_t + \varepsilon_t \qquad (6.1)$$
where the addends on the right-hand side of (6.1) are, respectively, the trend ($t_t$), cycle
($c_t$), seasonal ($s_t$) and irregular ($\varepsilon_t$) components. Substituting (6.1) in (4.1) and taking
the antilog of both sides of the resulting expression, we see that sales can be written as:
$$R_t = \exp(r_t^a) \cdot \exp(t_t) \cdot \exp(c_t) \cdot \exp(s_t) \cdot \exp(\varepsilon_t) \qquad (6.2)$$
Assuming for simplicity that $\exp(t_t) = \exp(c_t) = 1$, it is clear that investing a
given budget when the seasonal factor, $\exp(s_t)$, is larger should provide a higher return
than investing when it is smaller.
Figure 6 compares the advertising and seasonal factors in (6.2), the latter
computed with the method proposed by Casals, Jerez and Sotoca (2002). The sample
correlation between both components is -.87, so advertising was systematically
increased in the “troughs” of the seasonal cycle and decreased in the “peaks”. Taking
into account equation (6.2) above, it is clear that this anti-seasonal budget allocation does
not maximize sales.
[Insert Figure 6]
It is easy to measure the inefficiency of an anti-seasonal budget distribution
through a simple counterfactual experiment. To this end, we: (a) re-distributed the
annual advertising investment as a direct proportion of the multiplicative seasonal
component, (b) backcasted the corresponding sales, and (c) predicted the ROI of
advertising in this new scenario.
In formal terms, assume that advertising budgets are allocated for $T$
consecutive periods (typically one year) and consider the investment sequences given
by the following vectors:
$$a_{t:t+T-1} = \begin{bmatrix} a_t & a_{t+1} & \ldots & a_{t+T-1} \end{bmatrix} \qquad (6.3)$$
$$a^*_{t:t+T-1} = \begin{bmatrix} a^*_t & a^*_{t+1} & \ldots & a^*_{t+T-1} \end{bmatrix} \qquad (6.4)$$
where $a_{t:t+T-1}$ includes the logs of the investments actually made in $T$ consecutive
periods while $a^*_{t:t+T-1}$ characterizes an alternative or “counterfactual” budget allocation
such that total expenditure in both is the same and $a_i^* \propto \exp(s_i)$. Consider also the
impulse-response function implied by (4.4) and denoted by:
$$\nu(B) = \nu_0 + \nu_1 B + \nu_2 B^2 + \ldots$$
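Under these conditions, the deterministic component can be written as the
product of a vector of log advertising values and a Toeplitz matrix of impulse-response
weights:
$$r_t^a = \begin{bmatrix} a_{t-k} & a_{t-k+1} & \ldots & a_{t-1} & a_t & a_{t+1} & \ldots & a_{t+T-1} \end{bmatrix} \begin{bmatrix} \nu_0 & \nu_1 & \nu_2 & \ldots & \nu_{T+k} \\ 0 & \nu_0 & \nu_1 & \ldots & \nu_{T+k-1} \\ 0 & 0 & \nu_0 & \ldots & \nu_{T+k-2} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \ldots & \nu_0 \end{bmatrix} \qquad (6.5)$$
and, therefore, it is straightforward to write the deterministic component under the counterfactual budget allocation as:
$$r_t^{a*} = \begin{bmatrix} a_{t-k} & a_{t-k+1} & \ldots & a_{t-1} & a^*_t & a^*_{t+1} & \ldots & a^*_{t+T-1} \end{bmatrix} \begin{bmatrix} \nu_0 & \nu_1 & \nu_2 & \ldots & \nu_{T+k} \\ 0 & \nu_0 & \nu_1 & \ldots & \nu_{T+k-1} \\ 0 & 0 & \nu_0 & \ldots & \nu_{T+k-2} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \ldots & \nu_0 \end{bmatrix} \qquad (6.6)$$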
so, according to (6.2), the expected sales in the counterfactual scenario would be:
$$R_t^* = \exp(r_t^{a*}) \cdot \exp(t_t) \cdot \exp(c_t) \cdot \exp(s_t) \cdot \exp(\varepsilon_t) \qquad (6.7)$$
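The counterfactual experiment can be sketched in a few lines of plain Python. This is an illustration under strong simplifying assumptions, not the procedure implemented in E4: the seasonal pattern, the impulse-response weights and the budget are hypothetical, trend, cycle and irregular components are set to zero, pre-sample advertising is ignored, and spending is measured in units for which every monthly amount exceeds one. All function names are hypothetical.

```python
import math


def toeplitz_response(a, nu):
    """r^a_t = sum_j nu[j] * a[t-j]: the row vector a times the upper-triangular
    Toeplitz matrix of impulse-response weights (pre-sample values ignored)."""
    return [
        sum(nu[j] * a[t - j] for j in range(min(t + 1, len(nu))))
        for t in range(len(a))
    ]


def proportional_allocation(budget, seasonal):
    """Spread `budget` in proportion to the multiplicative seasonal factor exp(s_t)."""
    weights = [math.exp(s) for s in seasonal]
    total = sum(weights)
    return [budget * w / total for w in weights]


def sales_generated(spend, seasonal, nu):
    """Sum over t of exp(s_t) * (exp(r^a_t / 100) - 1), with a_t = 100 * ln(spend_t)."""
    a = [100.0 * math.log(x) for x in spend]
    r_a = toeplitz_response(a, nu)
    return sum(math.exp(s) * (math.exp(r / 100.0) - 1.0) for s, r in zip(seasonal, r_a))


# HYPOTHETICAL inputs, chosen only for illustration.
seasonal = [0.2 * math.sin(2 * math.pi * m / 12) for m in range(12)]
nu = [0.05, 0.03, 0.02]
budget = 1200.0

pro = proportional_allocation(budget, seasonal)                 # pro-seasonal
anti = proportional_allocation(budget, [-s for s in seasonal])  # anti-seasonal

print("pro-seasonal: ", sales_generated(pro, seasonal, nu))
print("anti-seasonal:", sales_generated(anti, seasonal, nu))
```

Both allocations spend exactly the same total, so any difference between the two printed values comes only from the timing of the investment.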
In our case, the alternative pro-seasonal allocation increased the estimate of total
sales generated by advertising by 13%, from 4.5 to 5.1 million US$. As the total amount
invested in both scenarios is 4.8 million US$, these estimates imply that investing
against seasonality generated a negative ROI (4.5 - 4.8 = -.3 million US$) while the
expected ROI of a pro-seasonal investment policy would have been positive (5.1 - 4.8 =
.3 million US$).
[Insert Figure 7]
7. Optimal investment budget.
A remarkable feature of Lydia Pinkham’s dataset is that, over the period
considered, advertising investment was more than 48% of sales. Therefore, after
estimating the loss of ROI that can be attributed to the anti-seasonal investment policy,
it is natural to wonder if the advertising-to-sales ratio was too high.
To test this hypothesis we defined a new experiment in which we computed the
counterfactual ROI corresponding to a dense grid of advertising budgets, from 0 to 6.5
million US$. The results are shown in Figure 8,
[Insert Figure 8]
where point “A” corresponds to the improved pro-seasonal allocation of the original
investment, 4.8 million US$ in 78 months, and point “B” corresponds to a pro-seasonal
allocation of the optimal investment: 1.0 million US$ in the same period.
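The grid experiment has a simple structure that can be sketched as follows. The response function used here is purely hypothetical, a power function chosen so that the optimum can be verified by hand, and it does not reproduce the estimated model or the figures reported above; `best_budget` and `toy_roi` are illustrative names.

```python
def best_budget(roi_fn, lower, upper, steps):
    """Evaluate roi_fn on an evenly spaced grid; return (budget, roi) at the maximum."""
    grid = [lower + (upper - lower) * i / steps for i in range(steps + 1)]
    return max(((b, roi_fn(b)) for b in grid), key=lambda pair: pair[1])


def toy_roi(budget, scale=3.0, elasticity=0.5):
    """HYPOTHETICAL response: sales generated = scale * budget**elasticity,
    so ROI = sales generated - budget."""
    return scale * budget ** elasticity - budget


budget, roi = best_budget(toy_roi, 0.0, 6.5, 130)
print(f"best budget on the grid: {budget:.2f}, ROI: {roi:.2f}")
```

For this toy function the analytical optimum is (scale × elasticity)^(1/(1 − elasticity)) = 2.25, which lies on the grid. In the actual exercise `roi_fn` would be replaced by the backcast of sales generated under a pro-seasonal allocation of each budget, minus the budget itself.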
8. Concluding remarks.
We have shown how to use a time series model relating sales and advertising to
solve three practical issues: measuring the ROI of advertising, assessing budget
allocation over the seasonal cycle and, last, optimizing the budget size.
Regarding the first problem (ROI measurement), our proposal consists in deriving
estimates for the direct return from the advertising-related components of the sales
series. These measures can be computed “by hand” in the simplest cases, or using the
general decomposition procedure of Casals, Jerez and Sotoca (2010) when the
relationships are complex.
Second, Sections 6 and 7 show how to perform a counterfactual analysis to assess
the efficiency of the investment timing policy and the budget size. The basic idea
consists in comparing the results actually obtained with “backcasts” computed by feeding
alternative investment scenarios to the model.
The results obtained in this particular case indicate that advertising performance
can be substantially improved by reallocating investments over time and adjusting the
budget to an efficient level. Obviously the practical relevance of these results for a
company that was sold in 1968 is limited but, according to our professional experience,
the same improvement opportunities exist in many firms operating today and the
approach explained here can help them bring this potential to fruition.
The computational procedures employed in this research are implemented in the
E4 functions “detcomp” (decomposition of a vector of time series into input-related and
error-related components) and “e4trend” (structural decomposition of a vector of time
series). E4 is a MATLAB toolbox for time series modeling, which can be downloaded
at: www.ucm.es/info/icae/e4. The source code for all the functions in the toolbox is
freely provided under the terms of the GNU General Public License. This site also
includes a complete user manual and other reference materials.
References:
Akaike, H. (1973). Information Theory and an Extension of the Maximum Likelihood
Principle, in B.N. Petrov and F. Csaki (eds.), Proc. 2nd International Symposium on
Information Theory, 267-281, Akademia Kiado, Budapest.
Bhattacharyya, M.N. (1982). Lydia Pinkham Data Remodelled, Journal of Time Series
Analysis, 3, 81-102.
Box, G.E.P. and D.R. Cox (1964). An analysis of transformations. Journal of the Royal
Statistical Society, Series B 26, 2, 211–252.
Box, G.E.P., G.M. Jenkins and G.C. Reinsel (2008). Time Series Analysis:
Forecasting and Control. Wiley, New York.
Casals, J., M. Jerez and S. Sotoca (2002). An Exact Multivariate Model-based Structural
Decomposition, Journal of the American Statistical Association, 97, 458, 553-564.
Casals, J., M. Jerez and S. Sotoca (2010). Decomposition of a State-Space Model with
Inputs, Journal of Statistical Computation and Simulation, 80, 9, 979-992.
Davis, R.A., M. Chen and W.T.M. Dunsmuir (1994). Inference for MA(1) processes
with a unit root on or near the unit circle, Probability and Mathematical Statistics,
15, 227–242.
Hanssens, D.M., L.J. Parsons and R.L. Schultz (2003). Market Response Models:
Econometric and Time Series Analysis, International Series in Quantitative
Marketing, vol. 12, Kluwer Academic Publishers, Dordrecht (Netherlands).
Kim, Jae H. (2005). Investigating the advertising-sales relationship in the Lydia
Pinkham data: a bootstrap approach, Applied Economics, 37, 3, 347 – 354.
Ljung, G. and G.E.P. Box (1978). On a Measure of Lack of Fit in Time Series Models,
Biometrika, 65, 297-303.
Palda, K. (1964). The Measurement of Cumulative Advertising Effects, Prentice-Hall,
Englewood Cliffs (NJ).
Smith, A., P.A. Naik and C. Tsai (2006). Markov-switching model selection using
Kullback–Leibler divergence, Journal of Econometrics, 134, 2, 553-577.
Acknowledgements: This research was partially funded by Ministerio de Economía y
Competitividad through the Grant ECO2011-23972.
Figure 1: Monthly series of sales and advertising of the Lydia Pinkham vegetable
compound from January 1954 to June 1960 (78 monthly values). Source: Palda (1964).
Figure 2.a: Mean-standard deviation plot of sales, with a LS regression line
superimposed.
Figure 2.b: Mean-standard deviation plot of advertising, with a LS regression line
superimposed.
Figure 3.a. Sample cross-correlation function between current sales and lagged
advertising. The lines at 0.2 are the limits of a 5% individual significance test. The
Q-statistic is 23.40, with a p-value smaller than 1%, thus rejecting the absence of a causal
relationship.
Figure 3.b. Sample cross-correlation function between current advertising and lagged
sales. The Q-statistic is 12.19, with a p-value of 11.5% and, therefore, does not reject
the absence of a causal relationship.
Figure 4: Impulse-response function implied by the transfer function model. The
implied cycle and damping factor are six months and 50% approximately. The
long-term gain is 0.103.
Figure 5: Sales versus its baseline, computed as the exponential of the error-related
component. LS regression lines were added as an indication of the velocity of their
downward drift.
Figure 6: Multiplicative seasonality versus the advertising-related component of sales.
Figure 7: Budget optimization.