Applied Time Series Analysis: A Practical Guide to Modeling and Forecasting
Time Series Forecasting Modeling CMG12
![Page 1: Time Series Forecasting Modeling CMG12](https://reader034.fdocuments.net/reader034/viewer/2022042723/587854eb1a28ab68198b70d1/html5/thumbnails/1.jpg)
“And” or “Or”?
CMG 12, December 4, 2012
Alex Gilgur, Josep Ferrandiz, Matthew Beason
![Page 2: Time Series Forecasting Modeling CMG12](https://reader034.fdocuments.net/reader034/viewer/2022042723/587854eb1a28ab68198b70d1/html5/thumbnails/2.jpg)
OVERVIEW
• Definitions: Loose but True
• Business Case
• So what’s the problem?
• How much traffic do you have to support?
• Regression
• Can you support the traffic at time T?
• Forecasting
• What If…
• Solution
• Real-World Use Case
• A Digression about Regression
• Conclusions
• Acknowledgments
• Q&A
Business Demands
• 1000s hosts * 100s apps * 10s metrics
• 12 months * 4.5 weeks * 7 days of the week * 24 hours
[Figures: charts labeled HIGH, LOW, and R; only axis ticks survive]
![Page 3: Time Series Forecasting Modeling CMG12](https://reader034.fdocuments.net/reader034/viewer/2022042723/587854eb1a28ab68198b70d1/html5/thumbnails/3.jpg)
DEFINITIONS: LOOSE BUT TRUE
Regression: a black box, y = f(x): x goes in, y comes out
[Figure: "Q vs. BM": Traffic vs. Business Metric; series: Q, Mdl, Lo 90%, Hi 90%]
![Page 4: Time Series Forecasting Modeling CMG12](https://reader034.fdocuments.net/reader034/viewer/2022042723/587854eb1a28ab68198b70d1/html5/thumbnails/4.jpg)
Statistics = The art of torturing data until they talk to you
Time Series: a black box, y = f(t): t goes in, y comes out
[Figure: Max Daily Concurrency, 9/24/2011 to 12/3/2011, annotated with Level, Trend, Seasonality, and Events]
![Page 5: Time Series Forecasting Modeling CMG12](https://reader034.fdocuments.net/reader034/viewer/2022042723/587854eb1a28ab68198b70d1/html5/thumbnails/5.jpg)
TSA and Regression allow us to reconstruct the y given the x and/or the t, and the parameters
![Page 6: Time Series Forecasting Modeling CMG12](https://reader034.fdocuments.net/reader034/viewer/2022042723/587854eb1a28ab68198b70d1/html5/thumbnails/6.jpg)
DEFINITIONS: CONTINUED
Forecasting = The art of meaningful reflection on the past
Forecasting: Predicting the future based on the past
[Figure: <pool ABCD>: peak-hour busy threads of <app1234>, 9/24/2011 to 2/21/2012]
R, SAS, ForecastPro...
FOR (Level, Trend, Seasonality, Events):
• Compute a Weighted Moving Average (WMA)
• Extend it 1 point
• Add that point to the WMA
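The loop on this slide (compute a weighted moving average, extend it one point, add that point back) can be sketched as follows. This is a minimal illustration, not what R, SAS, or ForecastPro actually do: the function names, the weights, and the data are all made up, and it handles only the level component.

```python
def wma(values, weights):
    """Weighted average of the last len(weights) values.

    The last weight applies to the newest value.
    """
    if len(values) < len(weights):
        raise ValueError("need at least as many values as weights")
    window = values[-len(weights):]
    return sum(w * v for w, v in zip(weights, window)) / sum(weights)


def wma_forecast(history, weights, horizon):
    """Extend the series one point at a time, feeding each forecast back in."""
    series = list(history)
    for _ in range(horizon):
        series.append(wma(series, weights))  # extend by 1 point, then reuse it
    return series[len(history):]


# Hypothetical data and weights, chosen only for illustration.
history = [100.0, 110.0, 120.0, 130.0]
weights = [1, 2, 3]  # newest point weighted most
forecast = wma_forecast(history, weights, horizon=3)
print(forecast)
```

A plain WMA of a growing series always lags behind it, which is one reason the slide repeats the loop for trend, seasonality, and events rather than for the level alone.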
![Page 7: Time Series Forecasting Modeling CMG12](https://reader034.fdocuments.net/reader034/viewer/2022042723/587854eb1a28ab68198b70d1/html5/thumbnails/7.jpg)
BUSINESS CASE
• You have a web site
• You know your business metric behavior
  • can forecast it
  • can simulate it
• You need to size the servers while minimizing the cost
  • CPU
  • Memory
  • Worker threads
  • Storage
  • Network
So what’s the problem?
![Page 8: Time Series Forecasting Modeling CMG12](https://reader034.fdocuments.net/reader034/viewer/2022042723/587854eb1a28ab68198b70d1/html5/thumbnails/8.jpg)
“It’s complicated”
A black box: y = f(x)
The same black box: y = f(t)
y = f(x, t) + ε(t)
BM, Q, X, and R as time series:
Q(BM, t) = X(BM, t) * R(BM, t)
• X = throughput (TPS)
• R = response time
• Q = concurrency (traffic)
• BM = business metric
How much traffic do you need to support?
![Page 9: Time Series Forecasting Modeling CMG12](https://reader034.fdocuments.net/reader034/viewer/2022042723/587854eb1a28ab68198b70d1/html5/thumbnails/9.jpg)
HOW MUCH TRAFFIC DO YOU HAVE TO SUPPORT?
A better question is…
• t = time
The complexity of the relationships is enormous:
• BM = f(t)
• X = f(BM)
• R = f(X, BM)
• Q = R * X = f(R, X)
Q(BM, t) = X(BM, t) * R[X(BM, t), BM, t]
Tools: 1: Enter Regression
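To make the chain BM = f(t), X = f(BM), R = f(X, BM), Q = R * X concrete, here is a toy sketch. The relation Q = X * R is commonly known as Little's Law; every functional form and coefficient below is a hypothetical placeholder, not something taken from the slides.

```python
def business_metric(t):
    """BM = f(t): hypothetical linear growth of the business metric."""
    return 1000.0 + 50.0 * t


def throughput(bm):
    """X = f(BM): hypothetical 0.02 transactions per second per unit of BM."""
    return 0.02 * bm


def response_time(x, bm):
    """R = f(X, BM): hypothetical base latency (s) that grows with load.

    bm is accepted to mirror the slide's signature but is unused here.
    """
    return 0.2 + 0.001 * x


def concurrency(t):
    """Q = X * R, with both factors driven by BM(t)."""
    bm = business_metric(t)
    x = throughput(bm)
    r = response_time(x, bm)
    return x * r


q0 = concurrency(0)
q12 = concurrency(12)
print(q0, q12)
```

Even in this toy setup, a 60% rise in BM between t = 0 and t = 12 raises Q by roughly 69%, because response time also grows with throughput. That compounding is what makes sizing from the business metric alone unreliable.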
![Page 10: Time Series Forecasting Modeling CMG12](https://reader034.fdocuments.net/reader034/viewer/2022042723/587854eb1a28ab68198b70d1/html5/thumbnails/10.jpg)
TOOLS: 2: A WORD FOR FORECASTING
• If we cannot regress it, we forecast it.
• Not an Excel-style regression to time
• Not a point forecast: we need the prediction interval
Holt-Winters and ARIMA are standard tools; new methods are being developed.
Can you support the traffic that you will have at time T?
A simple example: [Figure panels: Holt-Winters, ARIMA]
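A rough sketch of a forecast that returns an interval rather than a single point. It uses Holt's linear (level plus trend) exponential smoothing, which is Holt-Winters without the seasonal component, and a crude constant-width band taken from the in-sample one-step errors. The smoothing constants, the z value for a roughly 90% band, and the data are assumptions for illustration. A real Holt-Winters or ARIMA implementation would widen the interval with the horizon.

```python
import statistics


def holt_forecast(series, alpha, beta, horizon, z=1.645):
    """Return (low, point, high) tuples for 1..horizon steps ahead."""
    if len(series) < 2:
        raise ValueError("need at least two observations")
    level, trend = series[0], series[1] - series[0]
    errors = []
    for y in series[1:]:
        predicted = level + trend              # one-step-ahead forecast
        errors.append(y - predicted)
        new_level = alpha * y + (1 - alpha) * predicted
        trend = beta * (new_level - level) + (1 - beta) * trend
        level = new_level
    sd = statistics.pstdev(errors)             # spread of in-sample errors
    result = []
    for h in range(1, horizon + 1):
        point = level + h * trend
        result.append((point - z * sd, point, point + z * sd))
    return result


# Hypothetical peak-hour concurrency values.
series = [52.0, 55.0, 53.0, 58.0, 61.0, 60.0, 64.0, 67.0]
intervals = holt_forecast(series, alpha=0.5, beta=0.3, horizon=4)
for low, point, high in intervals:
    print(round(low, 1), round(point, 1), round(high, 1))
```

For capacity sizing, the upper bound is usually the number of interest, not the point forecast.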
![Page 11: Time Series Forecasting Modeling CMG12](https://reader034.fdocuments.net/reader034/viewer/2022042723/587854eb1a28ab68198b70d1/html5/thumbnails/11.jpg)
MORE SERIOUS CASES:
There are cases where regression would not have worked
http://robjhyndman.com/papers/complex-seasonality/
http://forecastingprinciples.com/
![Page 12: Time Series Forecasting Modeling CMG12](https://reader034.fdocuments.net/reader034/viewer/2022042723/587854eb1a28ab68198b70d1/html5/thumbnails/12.jpg)
• Exponentially Weighted Moving Average (HW)
• Auto Regressive Integrated Moving Average
FOR (Level, Trend, Seasonality, Events):
• Extend it 1 point
• Add that point to the Time Series
![Page 13: Time Series Forecasting Modeling CMG12](https://reader034.fdocuments.net/reader034/viewer/2022042723/587854eb1a28ab68198b70d1/html5/thumbnails/13.jpg)
IF BM = f(t) …
Can you support the traffic that you will have at time T?
Q(BM(t), t) = X(BM(t), t) * R{X[BM(t), t], BM, t}
1. Forecast the BM; get the value at time T
2. Build a regression of performance metrics to BM
   i. How good is the regression?
   ii. How do we measure the goodness of the regression?
Q = f(BM) + ε
• ε is the residuals
• If the fit is good, ε is small => R² is high
What if the R² is OK, but…
• we used a linear model on quadratic data?
• we missed a pattern in the data?
What if ε is time-dependent? Q(t) = f[BM(t)] + ε(t)
We need to outsmart the model
![Page 14: Time Series Forecasting Modeling CMG12](https://reader034.fdocuments.net/reader034/viewer/2022042723/587854eb1a28ab68198b70d1/html5/thumbnails/14.jpg)
AN ILLUSTRATION (SOTTO VOCE)
• Tried to fit a quadratic model: R² = 0.995
• Obviously missed a trend: the data are cubic
• R² alone is not good enough
![Page 15: Time Series Forecasting Modeling CMG12](https://reader034.fdocuments.net/reader034/viewer/2022042723/587854eb1a28ab68198b70d1/html5/thumbnails/15.jpg)
Here the missed trend may not matter, but it’s only an illustration
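The same effect can be reproduced with a simpler variant: a straight line fitted to mildly quadratic data (the slide's case is a quadratic fitted to cubic data). The sketch below uses synthetic data and hypothetical helper names. It shows R² above 0.99 while the residuals remain strongly autocorrelated, meaning they still carry a pattern the model missed.

```python
def ols_line(xs, ys):
    """Ordinary least squares for y = a0 + a1 * x; returns (a0, a1)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a1 = sxy / sxx
    return my - a1 * mx, a1


def r_squared(ys, fitted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    my = sum(ys) / len(ys)
    ss_res = sum((y - f) ** 2 for y, f in zip(ys, fitted))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


def lag1_autocorr(res):
    """Lag-1 autocorrelation; near 0 for pattern-free residuals."""
    m = sum(res) / len(res)
    num = sum((a - m) * (b - m) for a, b in zip(res, res[1:]))
    den = sum((a - m) ** 2 for a in res)
    return num / den


# Synthetic, noise-free data with a small quadratic term.
xs = list(range(1, 21))
ys = [100 + 10 * x + 0.2 * x * x for x in xs]

a0, a1 = ols_line(xs, ys)
fitted = [a0 + a1 * x for x in xs]
residuals = [y - f for y, f in zip(ys, fitted)]
r2 = r_squared(ys, fitted)
rho = lag1_autocorr(residuals)
print(round(r2, 4), round(rho, 2))
```

A residual autocorrelation this far from zero is the signal that there is still something left to forecast in ε.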
![Page 16: Time Series Forecasting Modeling CMG12](https://reader034.fdocuments.net/reader034/viewer/2022042723/587854eb1a28ab68198b70d1/html5/thumbnails/16.jpg)
SOLUTION: FORECAST THE RESIDUALS!
Forecast IV; build regression; forecast residuals; add it all together
(IV = independent variable, DV = dependent variable)
1. Start with the DATA: DV(t), IV(t)
2. Is DV == f(IV)?
   • NO: generate TSA FORECASTS for IV and DV; project IV and DV to t = T independently
   • YES:
     a. Generate the DV(IV) REGRESSION
     b. Generate TSA FORECASTS for the residuals and for IV
     c. Project to t = T
     d. Combine DV[IV(t = T)] + Residuals(t = T)
3. Done
DV(t) = f[IV(t), t] |t = T* + ε(t) |t = T*
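A minimal sketch of the YES branch of this flow, under loud assumptions: the regression is a straight line, Holt's linear smoothing stands in for whatever TSA forecaster is actually used, and the data are synthetic. Function names and smoothing constants are hypothetical.

```python
def ols_line(xs, ys):
    """Ordinary least squares for y = a0 + a1 * x; returns (a0, a1)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a1 = sxy / sxx
    return my - a1 * mx, a1


def holt_point(series, horizon, alpha=0.5, beta=0.3):
    """Stand-in TSA forecast: Holt's linear smoothing, point forecast only."""
    level, trend = series[0], series[1] - series[0]
    for y in series[1:]:
        prev_level = level
        level = alpha * y + (1 - alpha) * (level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
    return level + horizon * trend


def forecast_dv(iv, dv, horizon):
    """Return (regression part, residual part) of the DV forecast."""
    a0, a1 = ols_line(iv, dv)                       # DV(IV) regression
    residuals = [y - (a0 + a1 * x) for x, y in zip(iv, dv)]
    iv_at_T = holt_point(iv, horizon)               # forecast the IV
    res_at_T = holt_point(residuals, horizon)       # forecast the residuals
    return a0 + a1 * iv_at_T, res_at_T


# Synthetic data: DV depends on IV and also drifts with time.
iv = [10.0, 12.0, 11.0, 13.0, 12.0, 14.0, 13.0, 15.0]
dv = [5.0 + 2.0 * x + 0.5 * t for t, x in enumerate(iv)]

regression_part, residual_part = forecast_dv(iv, dv, horizon=3)
combined = regression_part + residual_part          # add it all together
print(regression_part, residual_part, combined)
```

When the regression already explains the data, the residual part comes out near zero and the combined forecast falls back to the plain regression.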
![Page 17: Time Series Forecasting Modeling CMG12](https://reader034.fdocuments.net/reader034/viewer/2022042723/587854eb1a28ab68198b70d1/html5/thumbnails/17.jpg)
TRADITIONAL SOLUTION:
A real-life example: size the worker threads for an application for the next year
[Figure panels: BM, Response Time, Throughput, Traffic]
![Page 18: Time Series Forecasting Modeling CMG12](https://reader034.fdocuments.net/reader034/viewer/2022042723/587854eb1a28ab68198b70d1/html5/thumbnails/18.jpg)
REGRESSION IS OPTIMIZATION
DV = f(IV, A): A = arg min(ε); ε = DV|predict - DV
Averages: OLS: simple algebra; CI from StDev
• Linear: a0 + a1 * IV
• Polynomial: a0 + a1 * IV + a2 * IV^2 + …
• Exponential: a0 * exp(a1 * IV)
• Logarithmic: a0 * log(a1 * IV)
• Power: a0 * IV ^ a1
[Figure: "Q vs. BM": Traffic vs. Business Metric; series: Q, Mdl, Lo 90%, Hi 90%]
![Page 19: Time Series Forecasting Modeling CMG12](https://reader034.fdocuments.net/reader034/viewer/2022042723/587854eb1a28ab68198b70d1/html5/thumbnails/19.jpg)
[Figure: Concurrency (Q) vs. Business Metric with a linear fit: y = 0.241x + 24.215, R² = 0.03376]
95%ile?
![Page 20: Time Series Forecasting Modeling CMG12](https://reader034.fdocuments.net/reader034/viewer/2022042723/587854eb1a28ab68198b70d1/html5/thumbnails/20.jpg)
library(quantreg)
Mdl = rq(DV ~ IV, tau = 0.95)   # quantile regression of DV on IV at the 95th percentile
DV_bar = predict(Mdl)           # fitted 95th-percentile values of DV
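To show what "regression is optimization" means for the 95th percentile, the sketch below minimizes the pinball (quantile) loss instead of squared error. It is not how quantreg's rq works internally. It is a brute-force search over lines through pairs of data points, which is sufficient for a single predictor because an optimal quantile line can always be chosen to pass through two observations. The data and function names are made up.

```python
from itertools import combinations


def pinball_loss(residuals, tau):
    """Quantile loss: under-predictions cost tau, over-predictions 1 - tau."""
    return sum(tau * r if r >= 0 else (tau - 1) * r for r in residuals)


def quantile_line(xs, ys, tau):
    """Brute-force y = a0 + a1 * x minimizing the pinball loss.

    O(n^3), so only suitable for small illustrative data sets.
    """
    points = list(zip(xs, ys))
    best = None
    for (x1, y1), (x2, y2) in combinations(points, 2):
        if x1 == x2:
            continue                      # vertical line, not a function of x
        a1 = (y2 - y1) / (x2 - x1)
        a0 = y1 - a1 * x1
        loss = pinball_loss([y - (a0 + a1 * x) for x, y in points], tau)
        if best is None or loss < best[0]:
            best = (loss, a0, a1)
    return best[1], best[2]


# Hypothetical concurrency (ys) vs. business metric (xs).
xs = list(range(1, 21))
noise = [0, 3, -2, 5, 1, -4, 2, 8, -1, 0, 4, -3, 6, 1, -2, 9, 0, 3, -5, 2]
ys = [10 + 2 * x + e for x, e in zip(xs, noise)]

a0, a1 = quantile_line(xs, ys, tau=0.95)
above = sum(1 for x, y in zip(xs, ys) if y > a0 + a1 * x + 1e-9)
print(a0, a1, above)
```

With tau = 0.95 and 20 points, at most one observation ends up strictly above the fitted line, which is the behavior wanted when sizing for peaks rather than averages.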
![Page 21: Time Series Forecasting Modeling CMG12](https://reader034.fdocuments.net/reader034/viewer/2022042723/587854eb1a28ab68198b70d1/html5/thumbnails/21.jpg)
EXAMPLE (CONTINUED)
Size the worker threads for an application for the next year
• Forecast BM
• Build Regression Q ~ BM
• Forecast Residuals
Q(t) = f[BM(t)] + ε(t)
![Page 22: Time Series Forecasting Modeling CMG12](https://reader034.fdocuments.net/reader034/viewer/2022042723/587854eb1a28ab68198b70d1/html5/thumbnails/22.jpg)
FINISHING TOUCHES
[Figure panels a) and b)]
• Red = regular regression
• Blue = our method
• Green and black = data
• Grey = prediction interval bounds
![Page 23: Time Series Forecasting Modeling CMG12](https://reader034.fdocuments.net/reader034/viewer/2022042723/587854eb1a28ab68198b70d1/html5/thumbnails/23.jpg)
CONCLUSIONS
• Downsides:
  • It is an extra step in building the projection, increasing the runtime of computing the models.
  • If the regression model is good, then the residuals are unforecastable.
• Advantages:
  • It is a very robust method:
    • No worries about the data not being suitable for the regression: missed trend and periodicity in the residuals will be picked up by the TSA forecasts.
  • It is a versatile method:
    • Regression and TSA forecasting combined give us more control in tuning regression and TSA models than regression by itself and TSA forecasting by itself.
  • TSA forecast of residuals can only be inappropriate if the regression is good.
    • Then the weight (significance) of the residuals is negligible compared with the actual data.
    • There are forecasting methods even for unforecastable data.
  • Forecast replacement for nonlinear time series data:
    • Linear is too conservative
    • Exponential is too optimistic
    • Quadratic regression to time
    • Forecast residuals
There is no reason not to use it
![Page 24: Time Series Forecasting Modeling CMG12](https://reader034.fdocuments.net/reader034/viewer/2022042723/587854eb1a28ab68198b70d1/html5/thumbnails/24.jpg)
ACKNOWLEDGMENTS
• Co-authors and reviewers:
  • Dr. Josep Ferrandiz
  • Matthew Beason
• A big thank-you goes to
  • Dr. Igor Trubin, who inspired this paper at CMG’11
  • Mike Perka, who has been my guide on this journey into the world of IT data