Josefina López Herrera
Advisors:
François Cellier
Gabriela Cembrano
IOC - UA - IRI
Time Series Prediction Using
Inductive Reasoning Techniques
Table of Contents
• Main Contributions.
• Antecedents.
• Time Series Analysis Techniques.
• Fuzzy Inductive Reasoning (FIR) for Time Series Analysis.
• Time Series Characteristics.
• Conclusions and Future Research.
Contributions
• Evaluation of Prediction Error.
• Confidence Measures for Prediction in FIR.
• Dynamic Mask Allocation.
• Estimation of Horizon of Predictability.
• Applications:
  – Early Warning Using Smart Sensors.
  – Signal Predictive Control Using FIR.
Antecedents
• George Klir at the State University of New York (Uyttenhove 1978, Klir 1985).
• François Cellier at the University of Arizona (Cellier and Yandell 1987, D. Li and Cellier 1990, Cellier 1991, Cellier et al. 1996, Cellier et al. 1998).
• Rafael Huber and Gabriela Cembrano at the IRI Institute (UPC-CSIC)
PhD. Dissertations UPC-UA
• Angela Nebot Castells (1994)
Qualitative Modeling and Simulation of Biomedical Systems using FIR
• Francisco Múgica (1995)
Diseño Sistemático de Controladores Difusos Usando Razonamiento Inductivo
• Alvaro de Albornoz Bueno (1996)
Inductive Reasoning and Reconstruction Analysis: Two Complementary Tools for Qualitative Fault Monitoring of Large-Scale Systems
Time Series Analysis Techniques
[Diagram: Time Series Analysis Techniques, with blocks labeled Linear Models, Non-Linear Models, Fuzzy Logic, Pattern-Based Approaches, and FIR]
Linear Models
• Stochastic time series.
• Stationarity is assumed.
• Prefiltering of data may be necessary.
• Probabilistic reasoning.
• Ljung 1999, Brockwell and Davis 1991, 1996, Box and Jenkins 1994.
Non-Linear Models
• Parametric Models, Learning Techniques
• At least Quasi-stationary
• Deterministic Elements
• State Space Models (Casdagli and Eubank 1992)
• Neural Networks (Weigend and Gershenfeld 1994)
• Hybrid Models (Delgado 1998, Telecom 1994)
Fuzzy Logic
• Non-parametric Models, Synthesized Techniques
• At least Quasi-stationary, Deterministic Elements
• Fuzzy Neural Networks (Jang 1997)
• FIR (López et al. 1996)
• Mixed Models: Burr 1998, Takagi and Sugeno 1991
FIR
• Fuzzification: Conversion to qualitative variables (Fuzzy Recoding)
• Qualitative Modeling: Find the best qualitative relationship between inputs and outputs (Fuzzy Modeling)
• Qualitative Simulation: Forecasting of future qualitative outputs (Fuzzy Simulation)
• Defuzzification: Conversion to quantitative variables (Regeneration)
[Figure: Qualitative Modeling and Qualitative Simulation in FIR. Panel labels: Raw Data Matrix, Optimal Mask, Behavior Matrix, Input Pattern, Matched Input Pattern, Distance Computation (Euclidean, dj), 5-Nearest Neighbors, Output Forecast Computation fi = F(W*5-NN-out), Forecast Value (Class, Member, Side)]
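The forecasting step named in the figure (Euclidean distance computation, selection of the 5 nearest neighbors, and a weighted combination of their outputs) can be sketched as follows. This is only an illustration on plain numeric patterns: the function name `knn_forecast` and the inverse-distance weighting are assumptions, since the slides do not spell out how the weights W are computed, and real FIR works on class, membership, and side values rather than raw numbers.

```python
import numpy as np

def knn_forecast(inputs, outputs, query, k=5):
    """Forecast an output as a distance-weighted mean of the k nearest neighbors."""
    inputs = np.asarray(inputs, dtype=float)
    outputs = np.asarray(outputs, dtype=float)
    query = np.asarray(query, dtype=float)
    # Euclidean distance d_j between the query and every stored input pattern.
    d = np.sqrt(((inputs - query) ** 2).sum(axis=1))
    nearest = np.argsort(d)[:k]
    # Assumed weighting: inverse distance, normalized so the weights sum to 1.
    w = 1.0 / (d[nearest] + 1e-12)
    w /= w.sum()
    return float(w @ outputs[nearest])

# Hypothetical behavior matrix: one input column, one output column.
stored_inputs = [[0.0], [1.0], [2.0], [3.0], [4.0], [10.0]]
stored_outputs = [0.0, 1.0, 2.0, 3.0, 4.0, 10.0]
forecast = knn_forecast(stored_inputs, stored_outputs, [2.0])
```

The far-away pattern at 10.0 is excluded by the neighbor selection, so the forecast is driven by the five patterns closest to the query.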
Time Series Forecasting
• In univariate time series, only a single variable has been observed, the future values of which are to be predicted on the basis of their own past.
[Figure: mask candidate matrix for a univariate series, with a single column u and rows for the times t - 14δt, ..., t - 7δt, ..., t - δt, t]
• In this case, the mask candidate matrix has n rows and one column. The autocorrelation function is used to decide the depth of the mask.
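A minimal sketch of how the autocorrelation function can point to a useful mask depth. The synthetic series with a 7-sample period is a hypothetical stand-in for real data, and picking the single lag with the largest autocorrelation is an assumed heuristic, not necessarily the procedure used in the thesis.

```python
import numpy as np

def autocorrelation(x, max_lag):
    """Sample autocorrelation of x for lags 0..max_lag."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    denom = np.dot(x, x)
    return np.array([np.dot(x[:len(x) - k], x[k:]) / denom
                     for k in range(max_lag + 1)])

# Hypothetical series with a period of 7 samples.
t = np.arange(350)
series = 10.0 + np.sin(2 * np.pi * t / 7)

acf = autocorrelation(series, max_lag=21)
# Strongest non-zero lag; the mask should reach back at least this far.
best_lag = int(np.argmax(acf[1:])) + 1
```

In practice one would inspect all lags with high autocorrelation (for example 7 and 14 in a weekly pattern) and choose the mask depth so that they are all covered.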
Characteristics of Time Series
Property (series)            Opposite property (series)
Natural (B, L)               Synthetic (V)
Stationary (L, V)            Non-stationary (B)
Time invariant (L, V)        Time varying (B)
Low dimensional (L, V)       Stochastic (B)
Clean (B, V, L)              Noisy
Short (B)                    Long (L, V)
Documented (B, L, V)         Blind
Linear                       Non-linear (B, L, V)
Scalar (B, L, V)             Vector
Single recording (B, L, V)   Multiple recordings
Continuous (B, L, V)         Discrete
Dormant                      Active (B, L, V)

B: Barcelona water demand time series
V: Van-der-Pol oscillator time series
L: chaotic intensity pulsation of a single-mode far-infrared NH3 laser beam
Weigend and Gershenfeld 1994
Water Demand Prediction
• Data: daily demand in Barcelona, Jan 1985 - Nov 1986.
• The process is quasi-stationary, and its variance is roughly constant.
Water Demand Prediction
• The water demand on any given day is strongly correlated with the demand seven days earlier.
• Autocorrelation function of daily demand series.
Water Demand Prediction
• The result of prediction was:
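[Figure: mask obtained for the daily demand series, with a single column u and rows for the times t - 14δt, ..., t - 7δt, ..., t - δt, t]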
Prediction Error
$$\mathrm{err}_{\mathrm{tot},i} = 25.0 \cdot (\mathrm{err}_{\mathrm{mean},i} + \mathrm{err}_{\mathrm{std},i} + \mathrm{err}_{\mathrm{dyn},i})$$
$$\mathrm{err}_{\mathrm{dyn},i} = \mathrm{mean}(\mathrm{err}_{\mathrm{abs},i}(t) + \mathrm{err}_{\mathrm{sim},i}(t))$$
$$\mathrm{err}_{\mathrm{mean},i} = \frac{\mathrm{abs}(\mathrm{mean}(y(t)) - \mathrm{mean}(\hat{y}_i(t)))}{\max(\mathrm{abs}(\mathrm{mean}(y(t))), \mathrm{abs}(\mathrm{mean}(\hat{y}_i(t))))}$$
$$\mathrm{err}_{\mathrm{std},i} = \frac{\mathrm{abs}(\mathrm{std}(y(t)) - \mathrm{std}(\hat{y}_i(t)))}{\max(\mathrm{abs}(\mathrm{std}(y(t))), \mathrm{abs}(\mathrm{std}(\hat{y}_i(t))))}$$
$$y_{\max} = \max(y(t), \hat{y}_i(t)), \qquad y_{\min} = \min(y(t), \hat{y}_i(t))$$
$$y_{\mathrm{norm}}(t) = \frac{y(t) - y_{\min}}{\max(y_{\max} - y_{\min}, \varepsilon)}, \qquad \hat{y}_{\mathrm{norm},i}(t) = \frac{\hat{y}_i(t) - y_{\min}}{\max(y_{\max} - y_{\min}, \varepsilon)}$$
$$\mathrm{err}_{\mathrm{abs},i}(t) = \mathrm{abs}(y_{\mathrm{norm}}(t) - \hat{y}_{\mathrm{norm},i}(t))$$
$$\mathrm{sim}_i(t) = \frac{\min(y_{\mathrm{norm}}(t), \hat{y}_{\mathrm{norm},i}(t))}{\max(y_{\mathrm{norm}}(t), \hat{y}_{\mathrm{norm},i}(t))}$$
$$\mathrm{err}_{\mathrm{sim},i}(t) = 1.0 - \mathrm{sim}_i(t)$$
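The error measure from the Prediction Error slides can be sketched in a few lines. The function name `fir_prediction_error`, the `eps` guards against division by zero, and the treatment of the overall factor as a `scale` parameter are assumptions; the default of 25.0 turns the sum of four terms, each between 0 and 1, into a percentage, which matches the magnitude of the errors reported in the tables.

```python
import numpy as np

def fir_prediction_error(y, y_hat, scale=25.0, eps=1e-12):
    """Combined mean, standard-deviation, and dynamic error of a forecast."""
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)

    def rel_diff(a, b):
        # Relative difference of two scalars; eps is an added guard.
        return abs(a - b) / max(abs(a), abs(b), eps)

    err_mean = rel_diff(y.mean(), y_hat.mean())
    err_std = rel_diff(y.std(), y_hat.std())

    # Normalize both series to [0, 1] using their joint range.
    y_min = min(y.min(), y_hat.min())
    y_max = max(y.max(), y_hat.max())
    span = max(y_max - y_min, eps)
    y_norm = (y - y_min) / span
    y_hat_norm = (y_hat - y_min) / span

    err_abs = np.abs(y_norm - y_hat_norm)
    hi = np.maximum(y_norm, y_hat_norm)
    lo = np.minimum(y_norm, y_hat_norm)
    sim = np.ones_like(hi)                      # 0/0 is treated as similarity 1
    np.divide(lo, hi, out=sim, where=hi > 0)
    err_sim = 1.0 - sim

    err_dyn = np.mean(err_abs + err_sim)
    return scale * (err_mean + err_std + err_dyn)

perfect = fir_prediction_error([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
mirrored = fir_prediction_error([1.0, 2.0, 3.0], [3.0, 2.0, 1.0])
```

A perfect forecast gives an error of zero, while the mirrored forecast has the right mean and standard deviation and is penalized only through the dynamic term.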
Qualitative Simulation with FIR
$$Y = \begin{pmatrix}
y(t-4\delta t) & y(t-3\delta t,1) & y(t-2\delta t,2) & y(t-\delta t,3) & y(t,4) \\
y(t-3\delta t) & y(t-2\delta t,1) & y(t-\delta t,2) & y(t,3) & y(t+\delta t,4) \\
y(t-2\delta t) & y(t-\delta t,1) & y(t,2) & y(t+\delta t,3) & y(t+2\delta t,4) \\
y(t-\delta t) & y(t,1) & y(t+\delta t,2) & y(t+2\delta t,3) & y(t+3\delta t,4) \\
y(t) & y(t+\delta t,1) & y(t+2\delta t,2) & y(t+3\delta t,3) & y(t+4\delta t,4) \\
y(t+\delta t) & y(t+2\delta t,1) & y(t+3\delta t,2) & y(t+4\delta t,3) & y(t+5\delta t,4) \\
y(t+2\delta t) & y(t+3\delta t,1) & y(t+4\delta t,2) & y(t+5\delta t,3) & y(t+6\delta t,4) \\
y(t+3\delta t) & y(t+4\delta t,1) & y(t+5\delta t,2) & y(t+6\delta t,3) & y(t+7\delta t,4) \\
y(t+4\delta t) & y(t+5\delta t,1) & y(t+6\delta t,2) & y(t+7\delta t,3) & y(t+8\delta t,4) \\
y(t+5\delta t) & y(t+6\delta t,1) & y(t+7\delta t,2) & y(t+8\delta t,3) & y(t+9\delta t,4)
\end{pmatrix}$$
$y(t + n\delta t, k)$: prediction for time $t + n\delta t$ using k steps.
First column: real data. Remaining columns: predicted data.
Comparison of FIR with other Methodologies for the Barcelona Water Demand Time Series

Methodology       ARIMA    NAR      ANN      FIR
New error         7.88     9.15     8.74     4.6
Relative error    4.21%    3.9%     4.1%     3.7%
(without intervention analysis)

Related Investigation
Box-Jenkins *)    Quevedo et al. 1988    4.5% relative error
ANN *)            Griñó 1992             4.2% relative error
*) with intervention analysis
Confidence Measures
[Diagram: Confidence Measures, with blocks labeled Crisp, Fuzzy Logic, Proximity, and Similarity]
Sources of Uncertainty in Predictions
• Dispersion among neighbors in input space.
• Uncertainty related to quantity of measurements.
• Dispersion among neighbors in output space.
• Uncertainty related to quality of measurements.
Proximity Measure
• This measure is based on the distance between the testing input state and the training input states of its five nearest neighbors in the experience database, and on the distances among the output states of those five nearest neighbors.
Similarity Measure
• (Dubois and Prade 1980).
$$S_1(A, B) = \frac{|A \cap B|}{|A \cup B|}$$
If A = B, then S1(A,B) = 1.0.
If A and B are disjoint, then S1(A,B) = 0.0.
• This measure is defined without the explicit use of a distance function; it is based on intersection, union, and cardinality.
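A small sketch of a set-based similarity built from intersection, union, and cardinality. It assumes fuzzy sets given as membership vectors over a common discrete universe, with min as intersection, max as union, and the sum of memberships as cardinality; these are common choices, not necessarily the ones used in the thesis, and the function name is hypothetical.

```python
import numpy as np

def s1_similarity(mu_a, mu_b):
    """Cardinality of the intersection divided by cardinality of the union."""
    mu_a = np.asarray(mu_a, dtype=float)
    mu_b = np.asarray(mu_b, dtype=float)
    union = np.maximum(mu_a, mu_b).sum()
    if union == 0.0:
        return 1.0  # assumption: two empty sets count as identical
    return float(np.minimum(mu_a, mu_b).sum() / union)

same = s1_similarity([0.2, 0.8, 1.0], [0.2, 0.8, 1.0])
disjoint = s1_similarity([1.0, 0.0, 0.0], [0.0, 0.5, 1.0])
partial = s1_similarity([0.5, 0.5], [1.0, 0.0])
```

The two limiting cases from the slide are reproduced: identical sets give 1.0 and disjoint sets give 0.0.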
Similarity Measure
• The similarity of the ith m-input of the jth nearest neighbor to the testing m-input based on intersection can be defined as follows:
$$\mathrm{sim}_i^j = \frac{\min(q_i^j, q_i)}{\max(q_i^j, q_i)}$$
where the q_i are normalized values in the range from 0 to 1.
• The overall similarity of the jth neighbor is defined as the average similarity of all its m-inputs in the input space:
$$\mathrm{sim}_{in}^j = \frac{1}{n} \sum_{i=1}^{n} \mathrm{sim}_i^j$$
Similarity Measure
• The similarity of the jth neighbor to the estimated testing m-output based on intersection can be defined as follows:
$$\mathrm{sim}_{out}^j = \frac{\min(q_{out}^j, q_{out})}{\max(q_{out}^j, q_{out})}$$
• A confidence value based on similarity measures can thus be defined:
$$\mathrm{conf}_{sim} = \sum_{j=1}^{5} w_{rel}^j \cdot \mathrm{sim}_{in}^j \cdot \mathrm{sim}_{out}^j$$
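The similarity-based confidence can be sketched as below. The sketch reads the slide's confidence formula as a sum over the five neighbors of relative weight times input similarity times output similarity; that reading, the function names, and the example numbers are assumptions, and the relative weights are simply passed in because the slides do not show how they are obtained.

```python
import numpy as np

def ratio_similarity(a, b):
    """Element-wise min/max ratio of normalized values in [0, 1]."""
    a, b = np.broadcast_arrays(np.asarray(a, dtype=float),
                               np.asarray(b, dtype=float))
    hi = np.maximum(a, b)
    lo = np.minimum(a, b)
    sim = np.ones(hi.shape)                   # 0/0 is treated as similarity 1
    np.divide(lo, hi, out=sim, where=hi > 0)
    return sim

def similarity_confidence(q_in_neighbors, q_in_test,
                          q_out_neighbors, q_out_est, w_rel):
    """Weighted sum over neighbors of input similarity times output similarity."""
    # Average the per-input similarities of each neighbor (rows = neighbors).
    sim_in = ratio_similarity(q_in_neighbors, q_in_test).mean(axis=1)
    sim_out = ratio_similarity(q_out_neighbors, q_out_est)
    return float(np.sum(np.asarray(w_rel, dtype=float) * sim_in * sim_out))

# Hypothetical example: 5 neighbors, 2 inputs each, equal relative weights.
weights = [0.2] * 5
conf_same = similarity_confidence(np.full((5, 2), 0.6), [0.6, 0.6],
                                  np.full(5, 0.8), 0.8, weights)
conf_half = similarity_confidence(np.full((5, 2), 0.5), [1.0, 1.0],
                                  np.full(5, 0.8), 0.8, weights)
```

With weights that sum to one, the confidence stays between 0 and 1 and reaches 1 only when every neighbor matches the test pattern in both input and output.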
FIR Confidence Measures for NH3 Time Series
• Deterministic process
• Similarity and Proximity
FIR Confidence Measures for Barcelona Time Series
• Stochastic Process with deterministic elements.
• The relationship between the prediction error and the confidence measures is less evident.
• The two are positively correlated.
Evaluation of Confidence Measures
• The similarity measure is more sensitive to the prediction error because the similarity measure preserves the qualitative difference between a new input state and its neighbors in the experience data base.
• The confidence measures are indicators of how well the series may be fitted by an autoregressive or deterministic model.
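Dynamic Mask Allocation in Fuzzy Inductive Reasoning (DMAFIR)
[Diagram: n FIR models (FIR Mask #1, FIR Mask #2, ..., FIR Mask #n) run in parallel on the input y with sampling label Ts. Model i delivers yi, the predicted output using mask mi, and ci, its estimated confidence. A Mask Selector picks the best mask from c1, c2, ..., cn and drives a Switch Selector that passes the corresponding prediction on as the output.]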
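The selection logic of DMAFIR can be sketched as follows: several models forecast in parallel, each forecast comes with a confidence, and the forecast with the highest confidence is the one reported. The function name and the example numbers are hypothetical.

```python
def select_best_forecast(forecasts, confidences):
    """Return (index, forecast) of the model with the highest confidence."""
    if not forecasts or len(forecasts) != len(confidences):
        raise ValueError("need one confidence per forecast")
    best = max(range(len(confidences)), key=confidences.__getitem__)
    return best, forecasts[best]

# Hypothetical forecasts y1..y3 and confidences c1..c3 at two time steps.
steps = [
    ([10.0, 12.0, 11.0], [0.6, 0.9, 0.7]),
    ([10.5, 12.5, 11.5], [0.8, 0.5, 0.7]),
]
reported = [select_best_forecast(y, c) for y, c in steps]
```

Because the choice is repeated at every step, the reported series can switch from one mask to another as the confidences change.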
Optimal and Suboptimal Mask for Barcelona Time Series
Dynamic Mask Allocation Applied to Barcelona Time Series
Prediction and Simulation
• FIR predictions use different masks to predict values n steps into the future, avoiding the use of already predicted (contaminated) data in the predictions.
• FIR simulations use the optimal mask of the single-step prediction recursively, minimizing the distance of extrapolation at the expense of recursively using already contaminated data.
Qualitative Prediction
[Figure: masks used for qualitative prediction. Panels: 1-step prediction, 2-step prediction, 3-step prediction. Each panel shows the mask candidate matrix and the resulting optimal mask over rows reaching from t - 21δt to t.]
Simulation and Prediction
• Without dynamic mask allocation for Barcelona time series.
• Comparison of FIR qualitative simulation and prediction with dynamic mask allocation for Barcelona time series.
DMAFIR Algorithm to Predict Time Series with Multiple Regimes
• The behavioral patterns change between segments.
• The Van-der-Pol oscillator series is introduced. This oscillator is described by the following second-order differential equation:
$$\ddot{x} - \mu (1 - x^2) \dot{x} + x = 0$$
• By choosing the outputs of the two integrators as the two state variables, $x_1 = x$ and $x_2 = \dot{x}$, the following state-space model is obtained:
$$\dot{x}_1 = x_2$$
$$\dot{x}_2 = \mu (1 - x_1^2) x_2 - x_1$$
$$y = x_2$$
$x_2$: Output Time Series
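A minimal sketch of how such a series can be generated numerically from the standard Van-der-Pol state-space form. The fixed-step fourth-order Runge-Kutta integrator, the initial state, and the function name are assumptions; the sampling interval of 0.05 and the record count of 953 are the values quoted later for the time-varying series. Both states are returned, so either column can be taken as the output series.

```python
import numpy as np

def van_der_pol_series(mu, n_samples, dt=0.05, x0=(0.1, 0.0)):
    """Integrate the Van-der-Pol oscillator and return an (n_samples, 2) array."""
    def f(state):
        x1, x2 = state
        return np.array([x2, mu * (1.0 - x1 ** 2) * x2 - x1])

    state = np.array(x0, dtype=float)
    out = np.empty((n_samples, 2))
    for k in range(n_samples):
        out[k] = state
        # One classical Runge-Kutta step of length dt.
        k1 = f(state)
        k2 = f(state + 0.5 * dt * k1)
        k3 = f(state + 0.5 * dt * k2)
        k4 = f(state + dt * k3)
        state = state + dt / 6.0 * (k1 + 2.0 * k2 + 2.0 * k3 + k4)
    return out

trajectory = van_der_pol_series(mu=1.5, n_samples=953)
x1_series = trajectory[:, 0]
x2_series = trajectory[:, 1]
```

Repeating the call with mu set to 2.5 and 3.5 gives series for the other two regimes; concatenating segments would mimic a multiple-regime series.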
DMAFIR Algorithm to Predict Time Series with Multiple Regimes
Regime      Mask                                                      Quality
μ = 1.5     $y = \tilde{f}(y(t - \delta t), y(t - 47\delta t))$       0.9342
μ = 2.5     $y = \tilde{f}(y(t - \delta t))$                          0.9085
μ = 3.5     $y = \tilde{f}(y(t - \delta t))$                          0.9146
* the input/output behaviors will be different because of the different training data used by the two models
Prediction Errors for Van-der-Pol Series
• During prediction, FIR looks for five good neighbors, but it only encounters four that are truly pertinent.

               Series μ=1.5   Series μ=2.5   Series μ=3.5
Model μ=1.5    2.63           6.76           10.39
Model μ=2.5    2.96           0.97           4.65
Model μ=3.5    4.27           2.57           1.83

• The values along the diagonal are smallest and the values in the two remaining corners are largest.
One-day Predictions of the Van-der-Pol Multiple Regimes Series.
• A time series was constructed in which μ assumes a value of 1.5 during the first time segment, followed by a value of 2.5 during the second time segment, followed by 3.5 during the third. The multiple regimes series consists of 553 samples.
Prediction Errors for Multiple Regimes Van-der-Pol Series
Model      error
μ = 1.5    5.8759
μ = 2.5    2.2978
μ = 3.5    1.9317
DMAFIR     1.1195
• The model obtained for μ = 1.5 cannot predict the higher peaks of the second and third time segments very well.
• The DMAFIR error demonstrates that this new technique can indeed be successfully applied to the problem of predicting time series that operate in multiple regimes.
Variable Structure System Prediction with DMAFIR
• A time-varying system exhibits an entire spectrum of different behavioral patterns. To demonstrate DMAFIR's ability to deal with time-varying systems, the Van-der-Pol oscillator is used. A series was generated in which μ changes its value continuously in the range from 1.0 to 3.5. The time series contains 953 records sampled using a sampling interval of 0.05.
One-day Predictions of the Van-der-Pol Time-varying Series Using DMAFIR
with the Similarity Confidence Measure
Model            error
for μ = 1.5      5.7431
for μ = 2.5      1.4864
for μ = 3.5      1.8791
DMAFIR           1.2997
• Prediction Errors for Time-varying Van-der-Pol Series.
Predicting the Predictability Horizon
• The errors are likely to accumulate during iterative predictions of future values of a time series.
• It is thus of much interest to the user of such a tool to be able to assess the quality of predictions made not only locally, but as a function of time.
• When the predictions depend on previously predicted data points, those data points are themselves already associated with a degree of uncertainty.
• In the first step of a multiple-step prediction, the predicted value depends entirely on measurement data.
• The local error can be indirectly estimated using the proximity or similarity measure.
• Either measure can easily be extended to become an estimator of accumulated confidence.
Water Demand of the City of Barcelona Multiple Step simulation using FIR
$$y(t) = \tilde{f}(y(t - \delta t), y(t - 7\delta t), y(t - 14\delta t))$$
$$c_a(t) = c_l(t) \cdot \frac{c_a(t - \delta t) + c_a(t - 7\delta t) + c_a(t - 14\delta t)}{3}$$
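The accumulated confidence for the water demand mask can be sketched as a simple recursion: the local confidence of each step is multiplied by the average accumulated confidence of the three mask inputs at lags 1, 7, and 14. Assigning an accumulated confidence of 1.0 to measured (not predicted) data is an assumption, as are the function name and the example values.

```python
def accumulated_confidence(local_conf, lags=(1, 7, 14)):
    """Propagate local confidences over a multi-step simulation."""
    acc = []
    for step, c_l in enumerate(local_conf):
        # Inputs that reach back before the first predicted step are
        # measurements; they are assumed to carry full confidence.
        past = [acc[step - lag] if step - lag >= 0 else 1.0 for lag in lags]
        acc.append(c_l * sum(past) / len(past))
    return acc

flat = accumulated_confidence([1.0] * 5)
decaying = accumulated_confidence([0.9, 0.9])
```

In the first step all three inputs are measurements, so the accumulated confidence equals the local one; from the second step onward, contaminated inputs pull the value down.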
Conclusions
• The predictions made by CIR (Causal Inductive Reasoning) were not significantly better.
• The confidence measures of FIR are an indirect estimate of the prediction error.
• A new formula has been proposed to assess the error of predictions of a univariate time series.
• FIR filters out what it considers to be noise.
• FIR provides the model automatically; it requires neither a significant development effort nor knowledge about the nature of the process from which the series was derived.
• The confidence measures provide at least a statistical estimate for the quality of the prediction.
• Several suboptimal masks are used to make forecasts of the same time series in parallel. Each forecast is accompanied by an estimate of its quality. In each step, the forecast that shows the highest confidence value is kept as the true forecast to be reported back to the user.
• A set of formulae has been devised to estimate the effects of data contamination on the accumulated confidence over multiple prediction steps.
• FIR is a robust methodology: following López et al. (1996), some UPC groups use FIR as the prediction module in an optimization tool for water distribution networks (Quevedo et al. 1999).
Publications
• Cellier, F. and J. López (1995). Causal Inductive Reasoning. A new paradigm for data-driven qualitative simulation of continuous-time dynamical systems. Systems Analysis Modelling Simulation 18(1), pp. 26-43.
• Cellier, F., J. López, A. Nebot, G. Cembrano (1996). Means for estimating the forecasting error in Fuzzy Inductive Reasoning. ESM'96: European Simulation Multiconference, Budapest, Hungary, June 2-6, pp. 654-660.
• López, J., G. Cembrano, F. Cellier (1996). Time series prediction using Fuzzy Inductive Reasoning. ESM'96: European Simulation Multiconference, Budapest, Hungary, June 2-6, pp. 765-770.
• Cellier, F., J. López, A. Nebot, G. Cembrano (1998). Confidence measures in Fuzzy Inductive Reasoning. International Journal of General Systems, in print.
• López, J., F. Cellier (1999). Improving the Forecasting Capability of Fuzzy Inductive Reasoning by Means of Dynamic Mask Allocation. ESM'99: European Simulation Multiconference, in print.
• López, J., F. Cellier, G. Cembrano, L. Ljung (1999). Estimating the horizon of predictability in time series predictions using inductive modeling tools. International Journal of General Systems, submitted for publication.
Future Research
• Use of time-series predictors in the design of smart sensors with look-ahead capabilities. If a sensor with look-ahead capability can anticipate the crossing of a critical threshold, it may issue an early warning that might enable the plant operator to do something about the problem before it ever occurs. (Appendix A)
• The design of signal predictive controllers that make use of smart sensors of the class introduced in Appendix A, to improve the control performance of feedback control systems.