
2nd INTERNATIONAL CONFERENCE ON SUPPLY CHAINS

Forecasting the consumption and the purchase of a drug

1st Angeliki Papana, 2nd Dimitris Folinas and 3rd Anestis Fotiadis

1 University of Macedonia, Greece

2 Department of Logistics, Alexander TEI of Thessaloniki, Greece

3 I-Shou University, Taiwan

1 [email protected], 2 [email protected], 3 [email protected]

Abstract

In this study, we demonstrate the usefulness of time series forecasting methods on very short time series. Specifically, we apply some basic time series forecasting methods in order to predict the future consumption and purchase of the drug RAPILYSIN LYPDINJ 2X1.16G/VIAL (RL). The available data are monthly measurements of the consumption and purchase of the drug RL from the General Hospital of Katerini and cover the period 2009-2011, i.e. three years. Tools from univariate time series analysis and forecasting are introduced, discussed and applied based on the type of the available data. Based on the accuracy of the forecasts, the most effective method is fitting a simple seasonal exponential smoothing model.

Keywords: logistics, purchase, demand forecasting, drug, Greece.

1. Introduction

A synchronized and responsive flow of products and services is the goal of supply chain planning, while demand planning is the first step of supply chain planning that determines the effectiveness of manufacturing and logistics operations in the chain. A demand forecast is the prediction of the quantity of a product or service that will be purchased. Demand forecasting is essential for corporations and organizations such as hospitals in order to assess future capacity requirements.

There are two approaches to demand forecasting, i.e. the qualitative approach and the quantitative one. Qualitative methods are usually used in ambiguous situations and when little data exist, and they rely on the intuition and experience of experts. Quantitative methods of demand forecasting involve many techniques that incorporate the information from past or current data, e.g. regression methods, extrapolation methods, neural networks and data mining techniques. The statistical methods tend to be superior in general, although there are occasions when model-based methods are not practical. The best demand forecast may be determined using a multi-functional approach that considers previous sales of the product as well as factors based on marketing and finance. The choice of the proper forecast method depends on the available data, the size of the data and the type of the data.

To choose the demand forecast method, one should first determine the use of the forecast, i.e. stock availability. The time horizon of the forecast (short-term, medium-term or long-term predictions) is also important in deciding on the forecast method that should be used. Finally, one should always validate the results.

Figure 1: Pharmaceutical decision framework

[Figure 1 labels: Doctors; Hospital President; Patient Allergies or Characteristics; Ministerial Decisions; Hospital Scientific Committee; Legislations; Financial Problems; Limited IT Knowledge; Limited Managerial Knowledge; Ministerial Decisions]

Concerning the monthly orders of drugs in the General Hospital of Katerini, the influential factors are the doctors with their medical decisions. Usually the heads of the departments (Director of Pathology, Cardiology, etc.) advise the Director of Pharmacy on the more ‘effective’ drugs within a category of similar drugs. The President of the institution influences the decision as the chairman of the board and is informed by the Director of Pharmacy of the monthly cost of purchasing drugs. Financial targets set by the Ministry of Health may impose lower costs. The fact that some patients may not be able to receive a treatment because of allergies or special characteristics can influence the final decision after consultation with the attending doctors. The decisions of the Ministry of Health decisively affect the decisions of the pharmacy. A scientific committee of doctors, which is informed of the needs of the hospital's patients, also operates in each hospital and affects the final decision.


Limiting factors that affect the final decision on the monthly orders of drugs include legislative decisions that determine the preference for an order through the procurement system. In parallel, the Ministry may impose some limiting factors. The first online auctions for active substances recently took place in Greece. As a result, the entire drug supply system is changing, as most managers will start ordering the active substances of the drugs rather than specific drugs by name. The economic problems that Greece is currently facing clearly influence all decisions. The continued need to reduce costs leads to the supply of cheaper drugs of dubious quality. Over the last two years there has been a constant effort to introduce electronic data processing in all pharmacies and to provide statistics to the Ministry of Health, but unfortunately the older employees of the pharmacy and their fear of contact with technology hinder the electronic operation of the pharmacy. Figure 1 summarizes the influential and limiting factors for the final decision of the hospital concerning the monthly orders of drugs.

In this study, we will introduce time series methods, which are suitable for short term predictions. These methods search for patterns in the time series and extrapolate these patterns into the future. Time-series forecasting is a form of extrapolation in that it involves fitting a model to a set of data and then using that model outside the range of data to which it has been fitted. Forecasting of time series is a very difficult task as it is hard to recognize the underlying patterns and relationships due to noise and random and unexpected changes.

2. Time Series Forecasting Methods

A time series is a set of evenly spaced, continuous, numerical data obtained at regular time periods. In time series forecasting methods, the forecast is based only on past values and assumes that the factors that influenced the past and the present will continue to influence the future. If future values of a time series can be predicted exactly from its past values, then the series is deterministic. If the future of a time series can only be partly determined by past values, then the time series is stochastic or random.

Some basic univariate linear time series methods are the moving average method, exponential smoothing (Brown, 1956), the Auto-Regressive Moving Average model (Huang and Yang, 1995), the Auto-Regressive Integrated Moving Average model (Box and Jenkins, 1976), the Random Walk model, time series decomposition and the Z-Chart. The simple moving average is a series of arithmetic means and is applicable when the data present no trend. Exponential smoothing is an averaging method that reacts more strongly to recent changes in demand by assigning larger weights to recent observations. Extensions of exponential smoothing have been developed in order to take trend and seasonality into account. The Auto-Regressive Moving Average (ARMA) model is appropriate for stationary data, when a system is a function of a series of unobserved shocks as well as its own past behaviour. The Auto-Regressive Integrated Moving Average (ARIMA) model is a more complex method that handles trend and seasonality, but requires larger data sets. The Random Walk model assumes that from one time period to the next, the original time series merely takes a random "step" away from its previous value. It is usually used when the data present an irregular behaviour, e.g. irregular growth. Time series decomposition adjusts for seasonality by multiplying the normal forecast by a seasonal factor. Another method of short-term forecasting is the use of a Z-Chart. It is assumed that the basic principles that dominate the data do not alter, or alter along an anticipated course, and that any underlying trends at present will continue. More complex nonlinear methods have also been developed for time series prediction; however, these methods usually require larger data sets as they have more free parameters to estimate.

The minimum time series length one needs to make ‘good’ forecasts using a statistical model depends on the type of the model and the amount of random variation in the data. From a purely statistical point of view, it is always necessary to have more observations than parameters. The minimum requirements apply when the amount of random variation in the data is very small. Real data often contain a lot of random variation and therefore sample size requirements increase accordingly. Therefore, the number of available data affects the choice of the corresponding forecasting method.

In order to decide which forecast method is the best for each data set, one should know and understand the different types of methods and recognize the different components in the data. However, one should always validate the forecasts. In order to check the accuracy of the forecasts and whether they are unbiased and efficient, we need to measure the prediction error, i.e. the difference between the actual time series and the forecasts. For this purpose, many statistical measures have been developed, such as the mean square error, root mean squared error, cumulative forecast error, mean absolute percent error, etc. We also display the original and fitted values of each method for the three years in order to assess their performance visually.

3. Forecasting the consumption and the purchase of the drug RAPILYSIN LYPDINJ 2X1.16G/VIAL

This study is part of a larger ongoing research project which has been conducted during the last 2 years. The purpose of the main survey was to investigate the influence of the economic crisis on the logistics services sector in Greece. The analysis revealed that the 3PLs have been significantly affected by the crisis and that these effects have influenced all the main functional areas of logistics management (procurement, warehousing, inventory management, transportation and distribution) as well as the main logistics philosophies and practices. These findings gave birth to two central questions:

The data examined here are the time series of the consumption and purchase of the drug RAPILYSIN LYPDINJ 2X1.16G/VIAL (RL), which is a drug from the cardiology department indicated for the thrombolytic treatment of suspected strokes. The data are for the years 2009-2011. The first step in any time series analysis is to plot the observations versus time, in order to observe any important features of the data such as trend, seasonality, outliers, discontinuities etc. The time plots of our data are displayed in Figure 2.

Figure 2: The time plots of (a) the consumption and (b) the purchase of the drug RL

[Figure 2: two panels with months (0 to 35) on the horizontal axis; (a) consumption, vertical axis 0 to 12000; (b) purchase, vertical axis 0 to 3 x 10^4]

An important issue in time series analysis is the stationarity of the data. A stationary time series is one whose statistical properties such as mean, variance, autocorrelation, etc. are constant over time. Most statistical forecasting methods are based on the assumption that the time series are stationary or can be made stationary using mathematical transformations. In order to test the stationarity of the data, we implemented the Augmented Dickey-Fuller test (Dickey & Fuller, 1979), which rejected the unit-root null hypothesis in favour of the alternative one, i.e. suggested that both time series are stationary. Therefore, we do not need to transform the original time series.

Let us denote by $\{x_t\}$, $t = 1, \dots, N$, the observed time series. The sample autocovariance coefficient at lag $k = 0, 1, 2, \dots$ is

$$c_k = \frac{1}{N} \sum_{t=1}^{N-k} (x_t - \bar{x})(x_{t+k} - \bar{x}),$$

where $\bar{x}$ is the sample mean, and the sample autocorrelation coefficient at lag $k$ is $r_k = c_k / c_0$. We proceed by estimating the sample autocovariance coefficient and the correlogram, i.e. the graph of $r_k$ versus $k$. The correlogram provides information on the type of the data, i.e. whether data are deterministic, stochastic or random. For example, several significant coefficients at low lags provide evidence that the data do not come from a purely random process. Values of $r_k$ outside the range $\pm 2/\sqrt{N}$ are regarded as significantly different from zero.
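The definitions of the sample autocovariance and autocorrelation can be sketched in a few lines of Python. This is only an illustration: the function names `sample_autocorrelation` and `significant_lags` are hypothetical, and the short alternating series is made up and is not the hospital data.

```python
import math


def sample_autocorrelation(x, max_lag):
    """Return [r_0, ..., r_max_lag] with r_k = c_k / c_0."""
    n = len(x)
    mean = sum(x) / n

    def autocovariance(k):
        # c_k = (1/N) * sum_{t=1}^{N-k} (x_t - mean) * (x_{t+k} - mean)
        return sum((x[t] - mean) * (x[t + k] - mean) for t in range(n - k)) / n

    c0 = autocovariance(0)
    return [autocovariance(k) / c0 for k in range(max_lag + 1)]


def significant_lags(x, max_lag):
    """Lags k >= 1 whose |r_k| exceeds the approximate bound 2 / sqrt(N)."""
    bound = 2 / math.sqrt(len(x))
    r = sample_autocorrelation(x, max_lag)
    return [k for k in range(1, max_lag + 1) if abs(r[k]) > bound]


# Made-up alternating series, used only to exercise the functions.
demo = [1.0, -1.0] * 6
print(sample_autocorrelation(demo, 2))
print(significant_lags(demo, 2))
```

Note that the divisor is N at every lag, as in the definition of the sample autocovariance, rather than N - k.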

The correlograms of the consumption and the purchase of the drug RL are displayed in Figure 3. We can observe that for the consumption of the drug, two significant sample autocorrelation coefficients $r_k$ exist, for lag 6 and lag 8, while for the purchase of the drug $r_k$ is significantly different from zero only for lag 3. From the two correlograms we can conclude that the data are stationary and present no trend. The two time series may be random, as only two and one significant $r_k$ values exist, respectively; however, the time series may also be characterized by seasonal fluctuations, in which case the correlogram also exhibits oscillations at the same frequency. If the series are truly random, then only an occasional autocorrelation should be larger than two standard errors in magnitude. The interpretation of a correlogram is a difficult task, especially when N is so small.

Figure 3: The correlogram of (a) the consumption and (b) the purchase of the drug RL

[Figure 3: two panels, (a) and (b), with lags (0 to 10) on the horizontal axis and sample autocorrelation (-0.5 to 1) on the vertical axis]

In order to decide whether there is a cyclic component in our data, we use seasonal subseries plots (Cleveland, 1993), a tool for detecting seasonality in a time series. Such a plot is only useful if the period of the seasonality is already known. Since our data are monthly, the period is considered to be 12. The seasonal subseries plots of the consumption and the purchase of the drug are displayed in Figure 4. From the plots, no apparent seasonality is observed for the two variables.

Figure 4: The seasonal subseries plots of (a) the consumption and (b) the purchase of the drug RL

[Figure 4: two panels with months (Jan to Dec) on the horizontal axis; (a) consumption, vertical axis 0 to 10000; (b) purchase, vertical axis 0 to 2.5 x 10^4]
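The grouping behind a seasonal subseries plot can be sketched as follows: the observations are split by their position within the season (for monthly data, all Januaries, all Februaries, and so on) and each group is summarized by its mean. The function name `seasonal_subseries` and the toy series with period 3 are assumptions made only for this illustration.

```python
def seasonal_subseries(x, period=12):
    """Group x by position within the season.

    Returns one (values, mean) pair per seasonal position.
    """
    if len(x) < period:
        raise ValueError("need at least one full season of data")
    result = []
    for position in range(period):
        values = x[position::period]  # every period-th value from this position
        result.append((values, sum(values) / len(values)))
    return result


# Made-up series with period 3, used only to show the grouping.
demo = [5, 7, 9, 6, 8, 10]
for position, (values, mean) in enumerate(seasonal_subseries(demo, period=3)):
    print(position, values, mean)
```

Clear differences between the group means, relative to the spread within each group, would hint at seasonality; with only three values per month, as in a three-year monthly series, such a comparison is necessarily rough.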

After examining the existence of trend or seasonality components of the data, we will introduce some basic forecasting methods that can be used for the available data and we will find the most appropriate one by comparing their forecast accuracy. Let us consider that we have data up to time N and make forecasts about the future by fitting a model to the data. The accuracy of the forecasts is then tested by a statistical measure. Let us again denote by $X_t$ the real value of the time series at period $t$, by $F_t$ its forecast at time $t$, and by $e_t$ the forecast error, i.e. $e_t = X_t - F_t$. The mean square error (MSE) of the forecast is defined as

$$\mathrm{MSE} = \frac{1}{N} \sum_{t=1}^{N} e_t^2 .$$

The first time series forecasting method introduced here is the simple moving average. This method is suitable for data that present no trend, seasonality or cyclic components. The forecasts are estimated as the mean of the K previous values of the time series

$$F_t = \frac{1}{K} \sum_{i=t-K}^{t-1} X_i .$$

The larger K is, the stronger the smoothing of the random fluctuations in the values of the variable and the smaller the effect of possible extreme values of the time series.
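A minimal sketch of the simple moving average forecast and of the MSE computation is given below. The names `moving_average_forecasts` and `mse` are hypothetical and the numbers are made up. One assumption should be noted: the first K periods have no forecast, so the MSE here is averaged over the periods for which a forecast exists.

```python
def moving_average_forecasts(x, k):
    """Forecast each value from period K onwards as the mean of the K previous values."""
    return [sum(x[t - k:t]) / k for t in range(k, len(x))]


def mse(actual, forecast):
    """Mean square error over the periods for which a forecast exists."""
    errors = [a - f for a, f in zip(actual, forecast)]
    return sum(e * e for e in errors) / len(errors)


# Made-up series, used only to exercise the functions.
demo = [10.0, 12.0, 14.0, 13.0, 11.0, 15.0]
k = 3
forecasts = moving_average_forecasts(demo, k)
print(forecasts)
print(mse(demo[k:], forecasts))
```

Repeating the last two lines for several values of K gives a comparison of the kind reported in Table 1.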

Table 1: MSE from the simple moving average method for the consumption and the purchase of the drug RL for K=3, 4, 5, respectively

MSE            K=3             K=4             K=5
Consumption    7.6957 × 10^6   7.5640 × 10^6   7.9501 × 10^6
Purchase       8.9091 × 10^7   8.7103 × 10^7   9.2444 × 10^7

From the MSE values, it is evident that the simple moving average method is ineffective in forecasting the two variables. In Figure 5, the original time series and their fitted values are displayed for K=3. The MSE values are large partly because the values of the time series themselves are large.

Figure 5: Plots of (a) the consumption and (b) the purchase of the drug RL and their fitted values from simple moving average method for K=3

[Figure 5: two panels with month (0 to 35) on the horizontal axis; (a) consumption and forecast, vertical axis 0 to 10000; (b) purchase and forecast, vertical axis 0 to 3 x 10^4]

The simple exponential smoothing method takes into account all previous observations but gives greater weight to more recent observations. This method is again suitable for data with no trend or seasonality. The forecast is estimated from the equation

$$F_{t+1} = \alpha X_t + \alpha(1-\alpha) X_{t-1} + \alpha(1-\alpha)^2 X_{t-2} + \dots + \alpha(1-\alpha)^m X_{t-m} + (1-\alpha)^{m+1} F_{t-m},$$

where $\alpha$ is a smoothing constant which takes values in [0,1]. The parameter $\alpha$ reflects the weight given to each observation of the time series. Values of $\alpha$ closer to 1 give greater weight to the most recent data values. This equation is difficult to use; however, it can be rewritten in the recursive form

$$F_{t+1} = \alpha X_t + (1-\alpha) F_t,$$

where $F_{t+1}$ is given as a linear combination of the current real value $X_t$ and the previous exponentially smoothed forecast $F_t$. The value of $\alpha$ is selected so as to give the smallest MSE.
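The recursive form and the selection of the smoothing constant by MSE can be sketched as follows. The function names `ses_forecasts` and `best_alpha` are hypothetical, the data are made up, and the starting value (first forecast equal to the first observation) is an assumption, since the initialization is not specified in the text.

```python
def ses_forecasts(x, alpha, initial=None):
    """One-step-ahead forecasts aligned with x.

    Uses F_{t+1} = alpha * X_t + (1 - alpha) * F_t.
    Assumption: the first forecast equals the first observation unless given.
    """
    forecasts = [x[0] if initial is None else initial]
    for value in x[:-1]:
        forecasts.append(alpha * value + (1 - alpha) * forecasts[-1])
    return forecasts


def best_alpha(x, grid):
    """Return the value of alpha in grid that gives the smallest MSE."""
    def score(alpha):
        forecasts = ses_forecasts(x, alpha)
        return sum((xi - fi) ** 2 for xi, fi in zip(x, forecasts)) / len(x)

    return min(grid, key=score)


# Made-up series, used only to exercise the functions.
print(ses_forecasts([10.0, 20.0, 10.0, 20.0], 0.5))
print(best_alpha([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], [0.1, 0.5, 0.9]))
```

For a steadily increasing series such as the second example, the largest value in the grid wins because the forecast lags behind the data less; for noisy data without trend, smaller values are usually preferred.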

The MSE values from the simple exponential smoothing method for the consumption and the purchase of the drug for α=0.1, 0.3, 0.5, 0.7, 0.9, respectively, are displayed in Table 2. Although the MSE values are again large, we can see that the simple exponential smoothing method for large α simulates the oscillations of the original series but with some slight lag. In Figure 6, the plots for the consumption of the drug RL and its fitted values are displayed for α=0.3 and α=0.9.

Table 2: MSE from the simple exponential smoothing method for the consumption and the purchase of the drug RL for α=0.1, 0.3, 0.5, 0.7, 0.9, respectively

MSE            α=0.1           α=0.3           α=0.5           α=0.7           α=0.9
Consumption    7.8859 × 10^6   7.3903 × 10^6   8.3950 × 10^6   9.8519 × 10^6   1.1794 × 10^7
Purchase       7.5347 × 10^7   8.4867 × 10^7   1.0061 × 10^8   1.2102 × 10^8   1.4600 × 10^8

Figure 6: Plots of the consumption of the drug RL and its fitted values from simple exponential smoothing method (a) for α=0.3 and (b) α=0.9, respectively

[Figure 6: two panels, (a) and (b), each showing consumption and forecast, with months (0 to 35) on the horizontal axis and consumption (0 to 10000) on the vertical axis]

The Random Walk model, $Y_t = Y_{t-1} + \varepsilon_t$, predicts that the value at time $t$ will be equal to the last period's value plus a stochastic (non-systematic) component that is white noise, which means that $\varepsilon_t$ is independent and identically distributed with mean zero and variance $\sigma^2$. The forecasting model suggested is $Y_t - Y_{t-1} = \varepsilon_t$ or $Y_t - Y_{t-1} = \alpha$, where $\alpha$ is the mean of the first differences, i.e. the average change from one period to the next. If we rearrange this equation, we get $Y_t = Y_{t-1} + \alpha$. In other words, we predict that this period's value will equal last period's value plus a constant representing the average change between periods. Therefore, the random walk model assumes that from one period to the next, the original time series merely takes a random "step" away from its last recorded position. When $\alpha = 0$, the ‘best’ forecast of the next value is the same as the most recent value. This is a very simple method, but it is often quite sensible and has been widely applied to economic data, even though one may expect that more complicated methods will generally be superior. The means of the first differences of the consumption and of the purchase of RL are 79.1003 and 0, respectively. Therefore, the two forecast models are $Y_t = Y_{t-1} + 79.1003$ and $Y_t = Y_{t-1}$, respectively. The MSE values for the two data sets are $1.2992 \times 10^7$ and $1.6029 \times 10^8$, respectively. The original and fitted values from the random walk model are displayed in Figure 7.

Figure 7: Plots of the (a) consumption and (b) purchase of the drug RL and their fitted values from the random walk model, respectively

[Figure 7: two panels with months (0 to 35) on the horizontal axis; (a) consumption and forecast, vertical axis 0 to 12000; (b) purchase and forecast, vertical axis 0 to 3 x 10^4]
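The random walk forecast with a constant step can be sketched as below: the constant is estimated as the mean of the first differences and each fitted value is the previous observation plus that constant. The function name `random_walk_fit` is hypothetical and the four data points are made up.

```python
def random_walk_fit(y):
    """Return (drift, fitted), where fitted[i] is the forecast of y[i + 1].

    drift is the mean of the first differences; each forecast is the
    previous observation plus drift.
    """
    diffs = [b - a for a, b in zip(y, y[1:])]
    drift = sum(diffs) / len(diffs)
    fitted = [previous + drift for previous in y[:-1]]
    return drift, fitted


# Made-up series, used only to exercise the function.
demo = [100.0, 110.0, 105.0, 130.0]
drift, fitted = random_walk_fit(demo)
errors = [actual - f for actual, f in zip(demo[1:], fitted)]
print(drift, fitted)
print(sum(e * e for e in errors) / len(errors))  # MSE over the fitted periods
```

Because the first differences telescope, the mean step equals the last value minus the first, divided by N - 1; a series that ends where it started therefore has a mean step of zero and the forecast reduces to the previous value.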

The next method is to fit to the data an Auto-Regressive model of order p, denoted as AR(p). The general form of the AR(p) model is $X_t = \varphi_0 + \varphi_1 X_{t-1} + \varphi_2 X_{t-2} + \dots + \varphi_p X_{t-p} + Z_t$, where $Z_t$ is the error term. Thus the value at time t depends linearly on the last p values and the model looks like a regression model. The order of the AR model is selected using the partial auto-correlation function or the Akaike information criterion (Akaike, 1974). We will implement here the simplest example of an AR(p) process, i.e. the AR(1) model $X_t = \varphi_0 + \varphi_1 X_{t-1} + Z_t$. The AR model is fitted by least squares regression, which finds the parameter values that minimize the sum of squared errors for each data set. The estimated coefficients of the AR(1) models are $\varphi_0 = 368.202$, $\varphi_1 = 0.171$ for the consumption and $\varphi_0 = 1103.91$, $\varphi_1 = 0.168$ for the purchase. The MSE values for the two data sets are $5.7821 \times 10^6$ and $6.1048 \times 10^7$, respectively. The original and fitted values from the AR(1) model are displayed in Figure 8.

Figure 8: Plots of the (a) consumption and (b) purchase of the drug RL and their fitted values from the AR(1) model, respectively

[Figure 8: two panels with month (0 to 35) on the horizontal axis; (a) consumption and forecast, vertical axis 0 to 10000; (b) purchase and forecast, vertical axis 0 to 3 x 10^4]
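Fitting an AR(1) model by least squares amounts to a simple linear regression of each value on the previous one. The sketch below uses the closed-form regression estimates; the function name `fit_ar1` is hypothetical, and the example series is generated without noise from made-up coefficients so that the fit can be checked by eye.

```python
def fit_ar1(x):
    """Least squares fit of X_t = phi0 + phi1 * X_{t-1} + Z_t; returns (phi0, phi1)."""
    previous, current = x[:-1], x[1:]
    n = len(previous)
    mean_prev = sum(previous) / n
    mean_curr = sum(current) / n
    sxx = sum((p - mean_prev) ** 2 for p in previous)
    sxy = sum((p - mean_prev) * (c - mean_curr) for p, c in zip(previous, current))
    phi1 = sxy / sxx
    phi0 = mean_curr - phi1 * mean_prev
    return phi0, phi1


# Noise-free series generated from X_t = 2 + 0.5 * X_{t-1} (made-up coefficients).
demo = [10.0]
for _ in range(4):
    demo.append(2 + 0.5 * demo[-1])

phi0, phi1 = fit_ar1(demo)
fitted = [phi0 + phi1 * value for value in demo[:-1]]  # forecasts of demo[1:]
print(phi0, phi1)
print(fitted)
```

With real data the error term is not zero, so the fitted values will only approximate the observations and the MSE is computed from the differences between `demo[1:]` and `fitted`.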

Finally, the best model to fit the data is found to be the simple seasonal exponential smoothing model. This model is appropriate for series with no trend and a seasonal effect that is constant over time. As the data are monthly, the number of periods in a seasonal interval is $s = 12$. Simple seasonal exponential smoothing has two components, the level $L(t)$ and the season $S(t)$:

$$L(t) = \alpha \, (X(t) - S(t-s)) + (1-\alpha) \, L(t-1)$$

$$S(t) = \delta \, (X(t) - L(t)) + (1-\delta) \, S(t-s)$$

$$\hat{X}_t(k) = L(t) + S(t+k-s)$$

where $\alpha$ is the level smoothing weight, $\delta$ is the season smoothing weight and $\hat{X}_t(k)$ is the forecast made at time $t$ for $k$ periods ahead. By fitting the data to the simple seasonal exponential smoothing model, we estimated for the consumption the parameters $\alpha = 0.1$, $\delta = 0$ and for the purchase $\alpha = 0.1$, $\delta = 1.762 \times 10^{-5}$. The corresponding MSE values for these models are $4.3245 \times 10^6$ and $4.1437 \times 10^7$. Therefore, the simple seasonal exponential smoothing model gives the smallest MSE among the methods considered, and these are the best forecast models for the available data. In Figure 9, we display the observed (original) values of the data, the fitted values from the simple seasonal exponential smoothing model, and the forecasted values for the next six months.

Figure 9: Plots of the (a) consumption and (b) purchase of the drug RL, their fitted values from the simple seasonal exponential smoothing model and their forecasts for the next six months, respectively.
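The level and season recursions of the simple seasonal exponential smoothing model can be sketched as follows. The function name `simple_seasonal_smoothing` is hypothetical and the data are made up. The initialization is an assumption, since it is not described in the text: the level starts at the mean of the first season and the seasonal terms at the deviations of the first season from that mean.

```python
def simple_seasonal_smoothing(x, s, alpha, delta, horizon):
    """Return (fitted, forecasts) for the simple seasonal model.

    Assumed initialization: level = mean of the first season,
    seasonal terms = first-season values minus that mean.
    """
    if len(x) < s:
        raise ValueError("need at least one full season of data")
    level = sum(x[:s]) / s
    season = [x[i] - level for i in range(s)]  # latest S for each seasonal position
    fitted = []
    for t, value in enumerate(x):
        pos = t % s
        fitted.append(level + season[pos])  # one-step forecast L(t-1) + S(t-s)
        new_level = alpha * (value - season[pos]) + (1 - alpha) * level
        season[pos] = delta * (value - new_level) + (1 - delta) * season[pos]
        level = new_level
    n = len(x)
    # k-step forecast: last level plus the latest seasonal term for that position
    forecasts = [level + season[(n + k) % s] for k in range(horizon)]
    return fitted, forecasts


# Made-up, perfectly periodic series with s = 4.
demo = [10.0, 20.0, 30.0, 20.0] * 3
fitted, forecasts = simple_seasonal_smoothing(demo, s=4, alpha=0.5, delta=0.5, horizon=4)
print(fitted)
print(forecasts)
```

With monthly data one would call the function with `s=12` and `horizon=6` to obtain six months of forecasts; in practice α and δ would be chosen by minimizing the MSE of the fitted values, for example over a grid.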

4. Conclusions

This work concentrates on finding ‘best’ point forecasts using MSE. Although the available data are so few, we could still find a model that seems to be able to simulate the oscillations of the original data. More advanced forecast methods cannot be used when the available data are so few. However, simple methods have proved to be better than more advanced ones in some cases. In order to evaluate and compare the forecast methods, the easiest way is to compare only the accuracy of the methods, based on the fitted values of each method/model.

In practice, different statistical measures for forecast accuracy may give different results. Therefore, it is important to check which method each statistical measure suggests and whether there is a ‘significant’ difference among the methods. In practice, we also often need to produce interval forecasts, in order to better assess future uncertainty.



In conclusion, there is no method/model suitable for all types of data and all contexts. For any forecasting problem, one should put in a reasonable amount of effort to get a good forecast. The analyst should get appropriate background information and carefully define the objectives, i.e. the type of forecast required. The following steps are essential in order to decide on the forecasting method. Make a time plot of the data and inspect it carefully in order to look for trend, seasonal variation, outliers, etc. Pre-process the data if necessary by correcting obvious errors, adjusting outliers and imputing missing observations. Whatever method/model is selected to make forecasts, the analyst needs to carry out post-fitting checks to verify the adequacy of the forecast. Then the method/model selected can be used to actually make forecasts. Finally, plot the forecasts on a time plot of the data and check that they look intuitively reasonable.

References

Akaike H. A new look at the statistical model identification. IEEE Transactions on Automatic Control 19 (6), 716–723, 1974.

Brown, R.G. Exponential Smoothing for Predicting Demand. Cambridge, Massachusetts: Arthur D. Little Inc. pp. 15, 1956.

Box G.E.P. and G. Jenkins. Time Series Analysis: Forecasting and Control. Holden-Day, 1976.

Cleveland W.S. Visualizing Data, Hobart Press, 1993.

Dickey D.A. and W.A. Fuller. Distribution of the estimators for autoregressive time series with a unit root. Journal of the American Statistical Association 74, 427–431, 1979.

Huang C. and H. Yang. A Time Series Approach to Short Term Load Forecasting through Evolutionary Programming Structures. Proceedings of the International Conference on Energy Management and Power Delivery (EMPD'95), Vol. 2, 583-588, 1995.
