Lujing. 1. Time Series Forecasting for Dynamic Environments : The DyFor Genetic Program Model 2....


1. Time Series Forecasting for Dynamic Environments : The DyFor Genetic Program Model

2. Forecasting time series using a methodology based on autoregressive integrated moving average and genetic programming

3. Forecasting nonlinear time series of energy consumption using a hybrid dynamic model

4. Genetic programming-based voice activity detection


Abstract

Review of Existing Time Series Forecasting Methods

The DyFor GP Model


Several studies have applied genetic programming (GP) to the task of forecasting with favorable results. However, these studies, like those applying other techniques, have assumed a static environment.


If a time series is produced in a nonstatic environment, frequently only the recent historical data that correspond to the current environment are analyzed and historical data that come from previous environments are ignored.


This study investigates the development of a new “dynamic” GP model that is specifically tailored for forecasting in nonstatic environments.

This Dynamic Forecasting Genetic Program (DyFor GP) model incorporates features that allow it to adapt to changing environments automatically as well as retain knowledge learned from previously encountered environments.


The DyFor GP model is tested for forecasting efficacy on both simulated and actual time series including the U.S. Gross Domestic Product and Consumer Price Index Inflation.


Classical Methods:
1) exponential smoothing methods;
2) regression methods;
3) autoregressive integrated moving average (ARIMA) methods;
4) threshold methods;
5) generalized autoregressive conditionally heteroskedastic (GARCH) methods.


Modern Heuristic Methods:
1) methods based on neural networks (NNs);
2) methods based on evolutionary computation (GP).


As discussed in Section II, existing forecasting methods rely, to some degree, on human judgment to designate an appropriate analysis window (i.e., the correct number of historical data to be analyzed).


Consider the following example. Suppose the time series given in Fig. 4 is to be analyzed and forecast.


As depicted in the figure, this time series consists of two segments each with a different underlying data generating process.

The first segment’s process represents an older environment that no longer exists but may contain patterns that can be learned and exploited when forecasting the current environment.

The second segment’s underlying process represents the current environment and is valid for forecasting future data.


This is accomplished in the following way.

1) Select two initial window sizes, one of size n and one of size n+i, where n and i are positive integers.

2) Run dynamic generations at the beginning of the historical data with window sizes n and n+i, use the best solution for each of these two independent runs to predict a number of future data points, and measure their predictive accuracy.


3) Select another two window sizes based on which window size had better accuracy. For example, if the smaller of the two window sizes (size n) predicted more accurately, then choose two new window sizes, one of size n and one of size n-i. If the larger of the two window sizes (size n+i) predicted more accurately, then choose window sizes n+i and n+2i.


4) Slide the analysis window to include the next time series observation. Use the two selected window sizes to run another two dynamic generations, predict future data, and measure their prediction accuracy.

5) Repeat the previous two steps until the analysis window reaches the end of historical data.
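The five steps can be sketched in a few lines of Python. This is only an illustrative sketch, not the DyFor GP itself: the function names `mean_forecaster` and `adapt_window` are hypothetical, a dynamic GP generation is replaced by a trivial stand-in that predicts the window mean, and two things the steps leave open are assumed here (ties expand the window, and the window never shrinks below `min_size`).

```python
def mean_forecaster(window):
    """Hypothetical stand-in for a dynamic GP generation: predict the window mean."""
    return sum(window) / len(window)


def adapt_window(series, n, i, horizon=1, min_size=1, fit=mean_forecaster):
    """Slide an analysis window over `series`, resizing it by predictive accuracy.

    Returns a list of (end, small, large) tuples: the two window sizes that
    were compared when the analysis window ended just before index `end`.
    """
    small, large = n, n + i          # step 1: two initial window sizes
    end = large                      # windows cover series[end - size:end]
    history = []
    while end + horizon <= len(series):
        target = series[end:end + horizon]
        errors = []
        for size in (small, large):  # steps 2 and 4: one run per window size
            window = series[max(0, end - size):end]
            pred = fit(window)
            errors.append(sum(abs(pred - y) for y in target) / horizon)
        history.append((end, small, large))
        # step 3: choose the next pair of sizes from the more accurate window
        if errors[0] < errors[1]:
            if small - i >= min_size:
                small, large = small - i, small   # contract
        else:
            small, large = large, large + i       # expand (ties expand: an assumption)
        end += 1                     # step 4: slide to the next observation
    return history


# Two segments with different generating processes, as in the example.
series = [1.0] * 20 + [10.0] * 20
sizes = [small for _, small, _ in adapt_window(series, n=4, i=1)]
```

On this two-segment series the smaller window size grows while the window stays inside the first segment, shrinks while it spans both segments, and grows again once it lies entirely in the second segment.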


However, after several window slides, when the data analysis window spans data from both the first and second segments, it is likely that the window adjustment reverses direction. Figs. 7 and 8 show this phenomenon.


In Fig. 7, win1 and win2 have sizes of 4 and 5, respectively. Because the prediction data, pred, lie inside the second segment, it is likely that the dynamic generation involving analysis window win1 has better prediction accuracy than that involving win2, since win1 includes less data produced by a process that is no longer in effect. If this is so, the two new window sizes selected for win1 and win2 are sizes 3 and 4, respectively. Thus, as the analysis window slides to incorporate the next time series value, it also contracts to include a smaller number of inappropriate data. This contraction is shown in Fig. 8.


After the data analysis window slides past the end of the first segment, it is likely to expand again to encompass a greater number of appropriate data. Figs. 9 and 10 depict this expansion.


As illustrated in the above example, the DyFor GP uses predictive accuracy to adapt the size of its analysis window automatically.

When the underlying process is stable (i.e., the analysis window is contained inside a single segment), the window size is likely to expand.

When the underlying process shifts (i.e., the analysis window spans more than one segment), the window size is likely to contract.


Abstract

ARIMA Model

Hybrid Forecasting Model

The Model Development


The autoregressive integrated moving average (ARIMA), which is a conventional statistical method, is employed in many fields to construct models for forecasting time series. Although ARIMA can be adopted to obtain a highly accurate linear forecasting model, it cannot accurately forecast nonlinear time series.


This study proposes a hybrid forecasting model for nonlinear time series by combining ARIMA with genetic programming (GP). Finally, some real data sets are adopted to demonstrate the effectiveness of the proposed forecasting model.


Box and Jenkins presented the ARIMA model in 1970. The method has been widely used in financial, economic, and social scientific fields.

In the ARIMA(p, d, q) model, p is the order of auto-regression, d is the order of differencing, and q is the order of the moving average process.

Generally speaking, the ARIMA model can be represented as a linear combination of the past observations and past errors as follows:


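$$(1 - \phi_1 B - \phi_2 B^2 - \cdots - \phi_p B^p)(1 - B)^d\, y_t = c + (1 - \theta_1 B - \theta_2 B^2 - \cdots - \theta_q B^q)\,\varepsilon_t$$

where $y_t$ is the actual value, $B$ is the backward shift operator, $c$ is the constant item, $\varepsilon_t$ is the random error at time $t$, and $\phi_p$ and $\theta_q$ are the coefficients of the model, which can be estimated using the least-squares method.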
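As a worked illustration of the operator notation, consider the special case ARIMA(1, 1, 1). The symbols $c$ (constant), $\varepsilon_t$ (random error), $\phi_1$ and $\theta_1$ (coefficients) are notation assumed for this example. Expanding the backward shift operator, with $B y_t = y_{t-1}$, gives an explicit recursion in past observations and past errors:

$$
\begin{aligned}
(1 - \phi_1 B)(1 - B)\, y_t &= c + (1 - \theta_1 B)\,\varepsilon_t \\
\bigl(1 - (1 + \phi_1) B + \phi_1 B^2\bigr)\, y_t &= c + \varepsilon_t - \theta_1 \varepsilon_{t-1} \\
y_t &= c + (1 + \phi_1)\, y_{t-1} - \phi_1\, y_{t-2} + \varepsilon_t - \theta_1\, \varepsilon_{t-1}
\end{aligned}
$$

The last line shows the sense in which the model is a linear combination of past observations and past errors.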


Several investigations have developed some hybrid forecasting models that combine different methods to reduce the forecast error.


The hybrid models can be expressed as follows:

$$y_t = L_t + N_t \qquad (1)$$

where $y_t$ represents the original positive time series at time $t$; $L_t$ represents the linear component, and $N_t$ is the nonlinear component of the model.


The residuals can be obtained using the ARIMA model:

$$r_t = y_t - \hat{L}_t \qquad (2)$$

where $r_t$ is estimated using nonlinear methods such as GP, and $\hat{L}_t$ is the forecasted value of $L_t$, estimated using the ARIMA model.


Accordingly, the residual can be rewritten as follows:

$$r_t = f(r_{t-1}, r_{t-2}, \ldots, r_{t-n}) + \varepsilon_t \qquad (3)$$

where $f(r_{t-1}, r_{t-2}, \ldots, r_{t-n})$ represents the nonlinear function that is constructed using GP and $\varepsilon_t$ is the random error term. The hybrid model for forecasting time series is:

$$\hat{y}_t = \hat{L}_t + \hat{N}_t \qquad (4)$$


This study proposes a novel hybrid forecasting model, which combines ARIMA to model the linear component ($L_t$) of a time series and GP to model the nonlinear component ($N_t$), to improve the accuracy of ARIMA forecasting.


The proposed hybrid approach is as follows:

Step 1: The ARIMA model is utilized to model the linear component of the time series. That is, $\hat{L}_t$ is obtained by using the ARIMA model.

Step 2: From Step 1, the residuals from the ARIMA model can be obtained. The residuals are modeled by the GP model in Eq. (3). That is, $\hat{N}_t$ is the forecast value of Eq. (3) obtained by using GP.

Step 3: Using Eq. (4), forecasts of the hybrid model are obtained by adding the forecasted values of the linear and nonlinear components yielded in Step 1 and Step 2, respectively.
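The three steps can be illustrated with a small Python sketch of the decomposition "linear forecast plus nonlinear residual forecast". It is not the authors' implementation and both components are deliberately swapped for simple stand-ins: a least-squares AR(1) fit plays the role of ARIMA, and a least-squares fit on the squared lagged residual plays the role of the GP-evolved function f. The names `fit_line` and `hybrid_fit` are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx if sxx else 0.0
    return my - b * mx, b


def hybrid_fit(y):
    """Return in-sample (L_hat, N_hat, y_hat) for indices 2 .. len(y) - 1."""
    # Step 1: linear component. AR(1) stand-in for ARIMA (an assumption).
    a, b = fit_line(y[:-1], y[1:])
    L = [a + b * prev for prev in y[:-1]]          # L_hat for indices 1 .. T-1

    # Step 2: residuals r_t = y_t - L_hat_t, modelled from the lagged residual.
    r = [yt - lt for yt, lt in zip(y[1:], L)]
    feats = [prev ** 2 for prev in r[:-1]]         # stand-in nonlinear feature
    c, d = fit_line(feats, r[1:])
    N = [c + d * f for f in feats]                 # N_hat for indices 2 .. T-1

    # Step 3: hybrid forecast = linear forecast + nonlinear forecast.
    y_hat = [lt + nt for lt, nt in zip(L[1:], N)]
    return L[1:], N, y_hat


# A nonlinear toy series (logistic map), used only for illustration.
y = [0.3]
for _ in range(49):
    y.append(3.9 * y[-1] * (1.0 - y[-1]))

L_hat, N_hat, y_hat = hybrid_fit(y)
sse_linear = sum((yt - lt) ** 2 for yt, lt in zip(y[2:], L_hat))
sse_hybrid = sum((yt - ht) ** 2 for yt, ht in zip(y[2:], y_hat))
```

Because the residual model is fitted by least squares with an intercept, the in-sample squared error of the hybrid forecast cannot exceed that of the linear forecast alone. Out-of-sample gains are not guaranteed.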


Abstract

Energy Consumption Models

Hybrid Dynamic Grey Forecasting


Energy consumption is an important index of the economic development of a country. Rapid changes in industry and the economy strongly affect energy consumption.

Although traditional statistical approaches yield accurate forecasts of energy consumption, they may suffer from several limitations such as the need for large data sets and the assumption of a linear formula.

This work describes a novel hybrid dynamic approach that combines a dynamic grey model with genetic programming to forecast energy consumption.


3.2.1. GM(1,1) forecasting model

This model can be constructed as follows:

Step 1: Obtain positive time-series data as follows:

$$y^{(0)} = [\,y^{(0)}(1), y^{(0)}(2), y^{(0)}(3), \ldots, y^{(0)}(n)\,], \quad n \ge 4$$

Step 2: Apply the accumulated generating operator (AGO) to the original time-series data (i.e., $y^{(0)}$) to obtain the accumulated time-series $y^{(1)}$ as follows:

$$y^{(1)} = [\,y^{(1)}(1), y^{(1)}(2), y^{(1)}(3), \ldots, y^{(1)}(n)\,]$$

where $y^{(1)}(1) = y^{(0)}(1)$ and $y^{(1)}(n) = \sum_{m=1}^{n} y^{(0)}(m)$.


Step 3: Construct GM(1,1) using a grey differential equation,

$$y^{(0)}(t) + a\, z^{(1)}(t) = u$$

where $a$ and $u$ denote the grey parameters of the GM(1,1) model, and $z^{(1)}(t)$ represents the average of $y^{(1)}(t-1)$ and $y^{(1)}(t)$. The grey parameters of the grey differential equation can be estimated using the ordinary least squares (OLS) method.


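Step 4: Substitute the estimated parameters ($\hat{a}$ and $\hat{u}$) into the grey differential equation and then obtain the GM(1,1) forecasting equation using the inverse AGO (IAGO) technique, in the following exponential form:

$$\hat{y}^{(0)}(t) = \hat{y}^{(1)}(t) - \hat{y}^{(1)}(t-1) = \left(y^{(0)}(1) - \frac{\hat{u}}{\hat{a}}\right)\left(1 - e^{\hat{a}}\right) e^{-\hat{a}(t-1)}, \quad t = 2, 3, \ldots$$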
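Steps 1 to 4 are compact enough to sketch in plain Python. The sketch assumes the standard GM(1,1) form: the grey differential equation y0(t) + a*z1(t) = u, OLS estimation of a and u, and the forecasting equation y0_hat(t) = (y0(1) - u/a)(1 - e^a)e^(-a(t-1)). The function name `gm11` is hypothetical, and the code does not guard against the degenerate case a = 0.

```python
import math


def gm11(y0):
    """Fit GM(1,1) to a positive series y0 with at least four points.

    Returns (a, u, forecast), where forecast(t) gives the fitted or
    predicted value at 1-based time t >= 2.
    """
    if len(y0) < 4:
        raise ValueError("GM(1,1) needs at least four data points")

    # Step 2: accumulated generating operator (AGO), y1(t) = sum of y0(1..t).
    y1, total = [], 0.0
    for value in y0:
        total += value
        y1.append(total)

    # Step 3: background values z1(t) = average of y1(t-1) and y1(t), then OLS
    # on y0(t) = -a * z1(t) + u for t = 2 .. n.
    z = [(y1[t] + y1[t - 1]) / 2.0 for t in range(1, len(y0))]
    targets = y0[1:]
    mz = sum(z) / len(z)
    my = sum(targets) / len(targets)
    szz = sum((v - mz) ** 2 for v in z)
    szy = sum((v - mz) * (w - my) for v, w in zip(z, targets))
    slope = szy / szz
    a = -slope
    u = my - slope * mz

    # Step 4: forecasting equation after the inverse AGO.
    def forecast(t):
        return (y0[0] - u / a) * (1.0 - math.exp(a)) * math.exp(-a * (t - 1))

    return a, u, forecast


# Toy data: a series growing by 10% per period (illustrative only).
data = [2.0 * 1.1 ** t for t in range(5)]
a, u, forecast = gm11(data)
next_value = forecast(len(data) + 1)   # one step beyond the sample
```

For a purely exponential series like this one, the fitted curve tracks the data closely, which is why GM(1,1) is commonly applied to short, smoothly growing series.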


3.2.2. Dynamic GM(1,1) model

Some studies have developed dynamic GM(1,1) models (DGM(1,1)) to increase the forecasting accuracy of GM(1,1).

In the DGM(1,1) model, $y^{(0)}(k+1)$ is predicted using GM(1,1) and $y^{(0)} = [\,y^{(0)}(1), y^{(0)}(2), y^{(0)}(3), \ldots, y^{(0)}(k)\,]$, where $k < n$. Following the determination of $y^{(0)}(k+1)$, $y^{(0)}(k+1)$ is added to the original time-series, and $y^{(0)}(1)$ is removed from the original time-series to yield a new series

$$y_1^{(0)} = [\,y^{(0)}(2), y^{(0)}(3), y^{(0)}(4), \ldots, y^{(0)}(k+1)\,]$$


The predicted value of $y^{(0)}(k+2)$ can be obtained using the new series $y_1^{(0)}$. The evaluation procedure is continued to obtain $y^{(0)}(k+l)$ for $l = 3, 4, 5, \ldots, n-k-1$.
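The rolling mechanism can be sketched as a loop that refits GM(1,1) on the most recent k points and predicts one step ahead. This is a minimal sketch under stated assumptions: the helper names `gm11_next` and `rolling_gm11` are hypothetical, the observed value (rather than the predicted one) is appended to the window at each roll, and the degenerate case a = 0 is not handled.

```python
import math


def gm11_next(window):
    """One-step-ahead GM(1,1) forecast of the value that follows `window`."""
    # AGO series and background values.
    y1, total = [], 0.0
    for value in window:
        total += value
        y1.append(total)
    z = [(y1[t] + y1[t - 1]) / 2.0 for t in range(1, len(window))]
    targets = window[1:]

    # OLS on y0(t) = -a * z1(t) + u.
    mz = sum(z) / len(z)
    my = sum(targets) / len(targets)
    slope = (sum((v - mz) * (w - my) for v, w in zip(z, targets))
             / sum((v - mz) ** 2 for v in z))
    a, u = -slope, my - slope * mz

    t = len(window) + 1              # 1-based index of the next point
    return (window[0] - u / a) * (1.0 - math.exp(a)) * math.exp(-a * (t - 1))


def rolling_gm11(y0, k):
    """Roll a k-point window over y0; preds[j] is the forecast of y0[k + j]."""
    preds = []
    for start in range(len(y0) - k):
        window = y0[start:start + k]  # oldest point dropped, newest added
        preds.append(gm11_next(window))
    return preds


data = [2.0 * 1.1 ** t for t in range(10)]   # illustrative toy series
preds = rolling_gm11(data, k=4)
```

Each prediction uses only the k most recent observations, so the fitted parameters can drift as the series changes.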


This section describes a novel nonlinear hybrid dynamic forecasting model that combines the dynamic grey model with GP. The proposed model is derived as follows:


Step 1: Assume that the original time-series of energy consumption data is $y_t$ (n data points), and that $\hat{y}_t$ is predicted using a novel DGM(1,1) model (NDGM(1,1)). Because GM(1,1) requires at least four data points to construct the forecasting model, in the first rolling, $\hat{y}^{(0)}(k+1)$ can be determined from the series

$$\bigl(y^{(0)}(k),\, y^{(0)}(k-1),\, y^{(0)}(k-2),\, y^{(0)}(k-3)\bigr)$$


In the second rolling, $\hat{y}^{(0)}(k+2)$ can be determined from

$$\bigl(y^{(0)}(k+1),\, y^{(0)}(k),\, y^{(0)}(k-1),\, y^{(0)}(k-2)\bigr)$$

Moreover, in each rolling cycle, the newly predicted values of the original data $(\hat{y}^{(0)}(k+1), \hat{y}^{(0)}(k+2), \ldots)$ are determined using the GM(1,1) model. The residual series of the NDGM(1,1) model can be expressed as

$$e_t = y_t - \hat{y}_t$$


Step 2: In each rolling cycle of NDGM(1,1), construct the model for forecasting the error using the nonlinear function $f$, determined by GP as follows:

$$r_{i,j} = f(r_{i,j-1}, r_{i,j-2}, r_{i,j-3}, r_{i,j-4}) + \varepsilon_{i,j}, \quad (i, j) = (1, 5), (2, 6), \ldots, (n-4, n)$$

where $r_{i,j}$ denotes the jth point estimate of NDGM(1,1) that is conditioned in the ith rolling cycle; the series $(r_{i,j-1}, r_{i,j-2}, r_{i,j-3}, r_{i,j-4})$ represents the errors of the ith rolling cycle and can be obtained using the GM(1,1) model in the four periods; $\varepsilon_{i,j}$ represents a random error.


In the GP model, the input variables are the lagging residual series $(r_{i,j-1}, r_{i,j-2}, r_{i,j-3}, r_{i,j-4})$ and the output variable is $r_{i,j}$.

To reduce the forecasting error, the fitness function in GP is defined as follows:

$$\text{Minimize: } \sum_{j=5}^{n} \bigl| r_{i,j} - \hat{r}_{i,j} \bigr|, \quad i = 1, 2, \ldots, n-4$$
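To make the GP inputs and the fitness measure concrete, the sketch below builds training samples of four lagged residuals and scores a candidate function by its summed absolute error. It is a simplification under stated assumptions: the residuals are treated as one flat series rather than indexed by rolling cycle, the error measure is taken to be the sum of absolute errors, and the names `lagged_samples`, `fitness`, `persistence` and `trend` are hypothetical. No GP search is performed. The candidates are written by hand to show how one would be evaluated.

```python
def lagged_samples(residuals, lags=4):
    """Pair each residual with its `lags` predecessors, most recent first."""
    samples = []
    for j in range(lags, len(residuals)):
        inputs = tuple(residuals[j - m] for m in range(1, lags + 1))
        samples.append((inputs, residuals[j]))
    return samples


def fitness(candidate, samples):
    """Sum of absolute errors of `candidate` over the samples (lower is better)."""
    return sum(abs(target - candidate(*inputs)) for inputs, target in samples)


# Illustrative residual series and two hand-written candidate functions.
residuals = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
samples = lagged_samples(residuals)


def persistence(r1, r2, r3, r4):
    return r1                      # "the next residual equals the last one"


def trend(r1, r2, r3, r4):
    return r1 + (r1 - r2)          # continue the most recent step


scores = {"persistence": fitness(persistence, samples),
          "trend": fitness(trend, samples)}
```

A GP run would generate and recombine many such candidate functions, keeping those with the lowest score.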


Step 3: Express the hybrid dynamic forecasting model that combines the NDGM(1,1) model and the GP model as follows:

$$\hat{y} = \hat{y}_t + \hat{e}_t$$

where $\hat{y}$ denotes the forecasted value of $y$; $\hat{y}_t$ represents the series $(\hat{y}^{(0)}(5), \hat{y}^{(0)}(6), \ldots, \hat{y}^{(0)}(n))$; and $\hat{e}_t$ represents the series $\hat{r}_{1,5}, \hat{r}_{2,6}, \ldots, \hat{r}_{n-4,n}$.


Abstract

Definition of GP-VAD algorithm


A voice activity detector (VAD) is a classifier whose output is 1 or 0, indicating, respectively, the presence of voice or of silence (noise) in each speech frame.

A voice activity detection (VAD) algorithm is generated by using genetic programming (GP). The inputs of this VAD are the parameters extracted from the speech signals according to the ITU-T G.729B VAD standard.

The GP-based VAD algorithm (GP-VAD) is evaluated using the AURORA-2 database.


GP-VAD employs the same five parameters extracted by G.729B within each 10 ms frame:

a) the full-band energy, $E_f$;
b) the full-band energy difference, $\Delta E_f$;
c) the low-band energy difference, $\Delta E_l$;
d) the zero-crossing rate difference, $\Delta ZC$;
e) the spectral distortion, $\Delta S$.

Let Y(n) be the GP-VAD decision at frame n. The previous decisions Y(n-1) and Y(n-2) are also incorporated as inputs.


For the GP-VAD approach, the five preparatory steps mentioned above were defined as follows:

a) Function set.
b) Terminal set.
c) Fitness measure.
d) Control parameters.
e) The termination criterion.


There are arithmetic functions and logical functions. The function set is F = {+, -, *, %, AND, OR, NOT, GT}, where % is protected division, i.e., it returns 1 when division by zero takes place and otherwise returns the normal quotient. The AND, OR, NOT, and greater-than (GT) functions return the values 1 or 0 instead of Boolean values.
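The non-arithmetic members of the function set are simple to write down. The sketch below is one possible Python rendering. The names `pdiv`, `AND`, `OR`, `NOT` and `GT` are chosen for illustration, and treating any nonzero argument as "true" in the logical functions is an assumption, since the slides only state that these functions return 1 or 0.

```python
def pdiv(a, b):
    """Protected division (%): return 1 on division by zero, else a / b."""
    return 1.0 if b == 0 else a / b


def AND(a, b):
    """Logical AND on numeric inputs; nonzero counts as true (assumption)."""
    return 1 if (a != 0 and b != 0) else 0


def OR(a, b):
    """Logical OR on numeric inputs; nonzero counts as true (assumption)."""
    return 1 if (a != 0 or b != 0) else 0


def NOT(a):
    """Logical NOT on a numeric input; nonzero counts as true (assumption)."""
    return 1 if a == 0 else 0


def GT(a, b):
    """Greater-than, returning 1 or 0 instead of a Boolean."""
    return 1 if a > b else 0


# A hand-written example expression over hypothetical inputs, showing how
# arithmetic and logical nodes compose because every node returns a number.
energy_diff, zc_diff, prev_decision = 3.5, 0.0, 1
decision = OR(GT(pdiv(energy_diff, zc_diff), 2.0), AND(prev_decision, NOT(0)))
```

Returning numbers rather than Booleans keeps the function set closed: any node's output can feed any other node, which GP crossover and mutation rely on.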


There are three types of terminals: numerical constants, the parameters extracted by G.729B, and the VAD decisions in the two previous frames.

The terminal set is

$$T = \{\Re,\ E_f,\ \Delta E_f,\ \Delta E_l,\ \Delta ZC,\ \Delta S,\ Y(n-1),\ Y(n-2)\},$$

where $\Re$ represents the set of floating-point constants from -100.0 to 100.0.


Every GP-VAD program tree i in the population is evaluated according to the error rate per unit, defined as:

$$E(i) = E_S(i) + k\,E_N(i)$$

where $E_S$ and $E_N$ are the detection error rates on speech and non-speech frames, respectively, and the constant k is set to give more importance to $E_S$ than to $E_N$, typically k = 0.4.


The fitness function, f(i), is defined as:

$$f(i) = 1 - E(i) - 0.1\,Q(i)$$

where Q(i) is a penalty function, which penalizes individuals whose number of transitions between speech and non-speech stages, $NT_{GPVAD}$, is greater than the target number of transitions of the model used as a reference, $NT_{REF}$. The penalty function is defined as

$$Q(i) = \mathrm{Max}\left\{ \frac{NT_{GPVAD} - NT_{REF}}{NT_{REF}},\ 0 \right\}$$
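The fitness computation can be checked with a few lines of Python. The sketch assumes the forms E(i) = E_S(i) + k*E_N(i), f(i) = 1 - E(i) - 0.1*Q(i), and Q(i) = Max{(NT_GPVAD - NT_REF)/NT_REF, 0}. The function names and the sample numbers are hypothetical.

```python
def error_rate(e_s, e_n, k=0.4):
    """Weighted error E = E_S + k * E_N (speech errors weigh more when k < 1)."""
    return e_s + k * e_n


def penalty(nt_gpvad, nt_ref):
    """Q = relative excess of speech/non-speech transitions, floored at 0."""
    return max((nt_gpvad - nt_ref) / nt_ref, 0.0)


def fitness(e_s, e_n, nt_gpvad, nt_ref, k=0.4):
    """f = 1 - E - 0.1 * Q; higher is better."""
    return 1.0 - error_rate(e_s, e_n, k) - 0.1 * penalty(nt_gpvad, nt_ref)


# Hypothetical individual: 10% speech errors, 20% non-speech errors, and
# 12 transitions where the reference model has 10.
score = fitness(e_s=0.10, e_n=0.20, nt_gpvad=12, nt_ref=10)
```

For these numbers E = 0.10 + 0.4 * 0.20 = 0.18 and Q = 0.2, so the fitness is 1 - 0.18 - 0.02 = 0.80. An individual with no more transitions than the reference incurs no penalty.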


The control parameters were set as follows: a) population size, 300; b) maximum number of generations, 500; c) maximum depth size, 17; d) tournament selection size, 2; e) crossover probability, 0.9; f) standard mutation probability, 0.05.


The termination criterion was: a) the maximum number of generations; b) the result corresponded to the best evolved individual.
