ARMA-Stochastic Time Series Modeling

8/2/2019 ARMA-Stochastic Time Series Modeling


Contents

Abstract

Chapter 1 Critical Reviews:

1.1 Stochastic Time Series Modeling, Simulation & Prediction

1.2 Regression Analysis Time Series Modeling & Simulation

1.3 Chaotic Time Series without Rule Based Fuzzy Logic (FL), Mackey Glass Simulation with FL and Prediction

1.4 Rule Based Fuzzy Logic Time Series Prediction, Modeling and Simulation

1.5 Artificial Neural Network Time Series (ANNTS) Modeling, Simulation & Prediction

1.6 Thesis Plan



1.1 Stochastic Time Series Modeling, Simulation & Prediction

A method of forecasting wind power output a few hours in advance, from a wind

power generator that supplies a power and energy system, is required to ensure

efficient utilization of the power. Time series modeling of wind speed has been the

subject of many discussions because of the interest in wind as an alternative form of

energy. When the records of wind speed are incomplete or of too short a duration or the

handling and storage of large values of the data are not desirable, then a time series

model is needed. Since wind power is a function of wind speed, simulations of power

are generally derived from simulations of speed. Wind speed simulations can be done

with Monte Carlo methods that rely solely on the estimated parameters of the marginal

distribution of wind speeds.

The multiplicative ARMA (autoregressive moving average) models to generate

hourly series of global radiation by Mora-Lopez and Sidrach-de-Cardona (1998),

stochastic simulation using ARIMA (autoregressive integrated moving average)

modeling of solar irradiation by Craggs et al (1999) and a time-dependent autoregressive

Gaussian model (TAG) for generating synthetic hourly radiation by Aguiar and Collares-Pereira

(1992) are important contributions from the modeling and simulation point of view.

Lalarukh and Jafri (1999) used an ARMA process on hourly global radiation data,

performed stochastic modeling through the MTM (Markov Transition Matrix) and

generated synthetic sequences of hourly global solar irradiation for Quetta, Pakistan.

They found the MTM approach relatively better as a simulator than ARMA

modeling. However, their ARMA analysis to simulate and forecast hourly averaged

wind speed for Quetta, Pakistan also yielded good results, Lalarukh and Jafri (1997).

Several non-Gaussian distributions have been suggested as appropriate models for

wind speed. These models include the inverse Gaussian distribution Bardsley (1980), the

lognormal distribution Luna and Church (1974), the gamma distribution Sherlock

(1951), the Weibull distribution Hennessey (1977); Justus et al (1976); Stewart and

Essenwanger (1978) and Takle and Brown (1978) and the squared normal distribution

Carlin and Haslett (1982). We have seen from our previous studies Nasir et al (1991);

Raza and Jafri (1987) and Brown (1981) that the Weibull distribution fits the actual



wind speed frequencies quite well. However, the use of inverse Gaussian distribution on

wind data Bardsley (1980) ignores the positive correlations between consecutive

observations of wind speed. Failure to take this autocorrelation into account leads to

underestimation of the variances of the time averages of wind speeds. Moreover, the long

runs of high and low wind speeds that are characteristic of such data do not occur

frequently enough in simulated data when wind speeds are assumed to be uncorrelated

over time.
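The effect of ignoring autocorrelation on the variance of time averages can be illustrated with a minimal sketch. It is not the procedure of any cited study: for simplicity it uses a Gaussian AR(1) series rather than a wind speed distribution, and the coefficient 0.8, the 24-hour block and the helper names are assumptions made only for this illustration.

```python
import random
import statistics


def ar1_series(n, phi, rng):
    """Stationary Gaussian AR(1) series with zero mean and unit marginal variance."""
    innovation_sd = (1.0 - phi * phi) ** 0.5
    x = rng.gauss(0.0, 1.0)
    series = []
    for _ in range(n):
        series.append(x)
        x = phi * x + rng.gauss(0.0, innovation_sd)
    return series


def variance_of_block_means(series, block):
    """Sample variance of non-overlapping block averages (e.g. daily means of hourly data)."""
    means = [statistics.fmean(series[i:i + block])
             for i in range(0, len(series) - block + 1, block)]
    return statistics.variance(means)


rng = random.Random(0)
hours_per_day, n_days = 24, 2000
correlated = ar1_series(hours_per_day * n_days, phi=0.8, rng=rng)    # persistent series
uncorrelated = ar1_series(hours_per_day * n_days, phi=0.0, rng=rng)  # same marginal, no memory

var_correlated = variance_of_block_means(correlated, hours_per_day)
var_uncorrelated = variance_of_block_means(uncorrelated, hours_per_day)
print(f"variance of daily means, phi=0.8: {var_correlated:.4f}")
print(f"variance of daily means, phi=0.0: {var_uncorrelated:.4f}")
```

Both series have the same marginal variance, yet the daily means of the persistent series vary far more, which is the underestimation described above when independence is assumed.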

To overcome this problem Chou and Corotis (1981) and Goh and Nathan (1979)

have attempted to incorporate autocorrelation into wind speed models, but they do not

consider the Gaussian shape of transformed wind speed distributions and its

corresponding statistics. Some of the studies have neglected the non-Gaussian shape of

the wind speed distribution. Brown et al (1984) suggested methods to take into account

the autocorrelated nature of wind speed, the diurnal non-stationarity and the non-Gaussian

shape of the wind speed distribution so that forecasting of hourly averaged wind speed could

be done. Brown et al (1982), in their previous study, also indicated the need for

standardization to remove diurnal non-stationarity. Diurnal variations in wind speed

occur as a natural phenomenon Jafri et al (1989) and, as mentioned in a paper by Kamal

and Jafri (1996), standardization corresponds to smoothing of a profile, such as that of a

Gaussian distribution obtained after transforming a non-Gaussian shape to an

approximately Gaussian shape, i.e., by bringing scattered data points close to the profile.

We carried out this standardization procedure in the present study for hourly averaged

wind data of Quetta, Pakistan for a period of twenty years, i.e., 1985-2004, before using

the ARMA process.

Jafri (1996a) established that the hierarchical random process is a Markovian

random process, which can be characterized by a scaling probability distribution. A

generating function for such a process was obtained. These observations can be

successfully applied to chaotic time series Jafri (1996b) to overcome the non-stationarity

in the ARMA process, but this would require handy stochastic simulation techniques. Jafri

(1996b) suggested that the chaotic time series is deterministic in both Bayesian and non-Bayesian

statistics. Jafri (1995) developed a first order Markov transition matrix

(MTM) for the non-Gaussian nature of wind speed of Quetta for 1985 and suggested a



Gaussian form of MTM sequence to yield HAWS (hourly averaged wind speed)

sequences. The same work was extended to wind speed data for a period of

twenty years, i.e., 1985-2004. Needless to mention, the simulation of wind data using

the MTM Jafri (2001) is relatively difficult compared to simulation of solar radiation data

Lalarukh and Jafri (1999). The number of iterations exceeds a certain limit, thus

causing the HAWS and DAWS (daily averaged wind speed) sequences to become

cumbersome and entangled. Jafri (1995; 2001) also found the autocorrelation coefficient for

wind data, which shows levels of persistence in wind speed frequencies and of wind

speed magnitudes when compared with diurnal variations over daily averaged wind speed

(DAWS) sequences.
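The idea of a first order Markov transition matrix can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the procedure of Jafri (1995; 2001): the wind speeds are made-up values, the 1 m/s bins, the five states and the function names are hypothetical, and a state that is never left in the sample is given a uniform row.

```python
import random


def to_state(speed, bin_width, n_states):
    """Map a wind speed to a discrete state index (equal-width bins, top bin open-ended)."""
    return min(int(speed // bin_width), n_states - 1)


def fit_mtm(states, n_states):
    """Estimate first order transition probabilities from consecutive state pairs."""
    counts = [[0] * n_states for _ in range(n_states)]
    for current, following in zip(states, states[1:]):
        counts[current][following] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        if total:
            matrix.append([c / total for c in row])
        else:
            # Assumption: a state with no observed transitions gets a uniform row.
            matrix.append([1.0 / n_states] * n_states)
    return matrix


def simulate(matrix, start, length, rng):
    """Generate a synthetic state sequence by sampling each row of the matrix."""
    sequence = [start]
    for _ in range(length - 1):
        row = matrix[sequence[-1]]
        sequence.append(rng.choices(range(len(matrix)), weights=row)[0])
    return sequence


# Made-up hourly wind speeds in m/s, used only to exercise the functions.
speeds = [2.1, 2.8, 3.5, 4.2, 3.9, 3.1, 2.2, 1.4, 1.9, 2.7, 3.8, 4.6]
states = [to_state(v, bin_width=1.0, n_states=5) for v in speeds]
mtm = fit_mtm(states, n_states=5)
synthetic = simulate(mtm, start=states[0], length=24, rng=random.Random(1))
print(synthetic)
```

A synthetic speed sequence would then be obtained by drawing a value inside the bin of each simulated state; that step is omitted here.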

Blanchard and Desrochers (1984) and Brown et al (1984) employed a class of

parametric time series models called autoregressive moving average processes (ARMA)

of Box and Jenkins (1976). Such processes have been employed to model many

meteorological time series Katz and Skaggs (1981). The model of Blanchard and

Desrochers (1984) takes into account high autocorrelation and allows a time series to be

generated which preserves all the main characteristics of the data; it does not require

any assumption about the wind speed distribution. In fact, a larger class of seasonal

models includes ARIMA models Blanchard and Desrochers (1984). Sfetos (2002) studied

the linear ARIMA models and feed-forward artificial neural networks (FFANN). He

selected the model order in the ARIMA process by minimizing the evaluation set

error. He suggested multi-step forecasting and subsequent

averaging to generate mean hourly predictions of wind data. The ARIMA models have

been critically analyzed by Jain and Lungu (2002). They considered both non-seasonal

and seasonal ARIMA models by using stochastic components. They also sought to

determine the persistence patterns, if any, of the stochastic components.

We know the model of Chou and Corotis (1981) is based on Weibull distribution

and does not require stationarity in the data. McWilliams and Sprevak (1982a) described

a new version of an existing time series modeling procedure Box and Jenkins (1976)

from which the distributions of wind speeds and wind directions are obtained McWilliams

et al (1979) and McWilliams and Sprevak (1982b). Their model incorporates diurnal

variations observed in wind speed in such a manner that the time series of wind speed



components remain stationary; the sample autocorrelation functions for the series have

identical stochastic behavior as far as the second order statistics are concerned, thus

reducing the problem to modeling a single Gaussian series. This model is corrected for

autocorrelation functions, to account for diurnal variations. One point is

clear: they did not use a transformation of hourly averaged wind speed. Instead, they

considered annual deterministic variations μ(t) and σ²(t), which are modeled by a harmonic

series representation to account for the diurnal variation of wind speed. In our

view, diurnal variation Jafri et al (1989) should be employed in model development

in a manner similar to McWilliams and Sprevak (1982b).

We followed the approach of Daniel and Chen (1991), which consists of first fitting

ARMA processes of various orders to hourly averaged wind speed (HAWS) data which

have been transformed to make their distribution approximately Gaussian and standardized

to remove the so-called diurnal non-stationarity. We did not favor the procedures of

transformation and standardization, but preferred this approach because the

model is capable of using wind data of more than one year. The primary

advantage of including more than one year of data in the model development is the

increased reliability of the estimates of the model parameters.
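The sequence of steps described above (transform, standardize by hour of day, then fit an autoregressive structure) can be sketched as follows. This is only an illustration under stated assumptions: the data are synthetic stand-ins for HAWS, the square root is one possible power transformation rather than the one used in the cited work, and only the AR(1) coefficient is estimated, from the lag-1 autocorrelation (the Yule-Walker estimate), in place of a full ARMA fit.

```python
import math
import random
import statistics


def standardize_by_hour(values, period=24):
    """Subtract the hour-of-day mean and divide by the hour-of-day standard deviation."""
    out = [0.0] * len(values)
    for hour in range(period):
        group = values[hour::period]
        mu, sd = statistics.fmean(group), statistics.stdev(group)
        for j, v in enumerate(group):
            out[hour + j * period] = (v - mu) / sd
    return out


def autocorr(series, lag):
    """Sample autocorrelation at the given lag."""
    mu = statistics.fmean(series)
    num = sum((a - mu) * (b - mu) for a, b in zip(series, series[lag:]))
    den = sum((a - mu) ** 2 for a in series)
    return num / den


# Synthetic stand-in for HAWS: a diurnal cycle plus persistent noise (made-up numbers).
rng = random.Random(42)
noise, speeds = 0.0, []
for t in range(24 * 60):
    noise = 0.7 * noise + rng.gauss(0.0, 0.5)
    speeds.append(max(0.1, 4.0 + 2.0 * math.sin(2.0 * math.pi * t / 24.0) + noise))

transformed = [math.sqrt(v) for v in speeds]   # assumed power transformation
z = standardize_by_hour(transformed)           # removes the diurnal mean and spread
phi_hat = autocorr(z, 1)                       # Yule-Walker estimate of the AR(1) coefficient
print(f"estimated AR(1) coefficient: {phi_hat:.3f}")
```

Using several years of data simply lengthens each hour-of-day group, which is why the parameter estimates become more reliable.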

We used MINITAB (version 11) for ARMA, non-seasonal ARIMA and seasonal

ARIMA modeling and simulation. ARIMA models are used to model a special class of

non-stationary series. Seasonal ARIMA (SARIMA) models are used to incorporate

cyclic components in models. In other words, ARIMA models are, in theory, the most

general class of (parsimonious) models for forecasting a time series which can be

made stationary by transformations such as differencing and logging. SARIMA has the same

structure as ARIMA. We used both non-seasonal and seasonal models on hourly

averaged wind data of 1985-2004. For non-seasonal ARIMA modeling and simulation,

the six options, i.e., random walk (ARIMA(0,1,0)), differenced first order autoregressive

model (ARIMA(1,1,0)), SES with constant (ARIMA(0,1,1)), linear exponential smoothing (LES)

without constant (ARIMA(0,2,1) or (0,2,2)) and mixed ARIMA(1,1,1), are tried for each

month and for the four seasons. Non-seasonal ARIMA(0,1,1), which deals with exponential

growth and a constant, incorporates the simple exponential smoothing (SES) model. The MA(1)



coefficient corresponds to 1 - α in the SES formula. The term α is called the training

parameter. For LES without constant, the MA(1) coefficient corresponds to 2(1 - α).
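The correspondence between SES and ARIMA(0,1,1) without constant can be checked numerically. The sketch below is illustrative only: the data values and the smoothing constant 0.3 are made up, and the function names are hypothetical. It uses the identity that the SES forecast α·y(t-1) + (1 - α)·f(t-1) equals y(t-1) - θ·e(t-1) with θ = 1 - α and e(t-1) = y(t-1) - f(t-1).

```python
def ses_forecasts(y, alpha):
    """One-step-ahead SES forecasts; f[t] is the forecast of y[t]."""
    f = [y[0]]  # assumed initialisation: first forecast equals first observation
    for t in range(1, len(y)):
        f.append(alpha * y[t - 1] + (1.0 - alpha) * f[t - 1])
    return f


def arima011_forecasts(y, theta):
    """One-step-ahead forecasts of ARIMA(0,1,1) without constant."""
    f = [y[0]]
    for t in range(1, len(y)):
        previous_error = y[t - 1] - f[t - 1]
        f.append(y[t - 1] - theta * previous_error)
    return f


y = [5.0, 5.4, 4.9, 6.1, 5.8, 6.4, 6.0]   # made-up observations
alpha = 0.3                                # hypothetical smoothing constant
ses = ses_forecasts(y, alpha)
arima = arima011_forecasts(y, theta=1.0 - alpha)
largest_gap = max(abs(a - b) for a, b in zip(ses, arima))
print(f"largest difference between the two forecasts: {largest_gap:.2e}")
```

The two recursions agree to rounding error, which is the sense in which the MA(1) coefficient plays the role of 1 - α.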

For seasonal ARIMA (SARIMA) modeling and simulation, the seven

options, i.e., SARIMA(0,1,1)x(0,1,1)12, SARIMA(0,0,0)x(0,1,0)12 with constant,

SARIMA(0,1,0)x(0,1,0)12, SARIMA(1,0,1)x(0,1,1)12 with constant, SARIMA following

SES with α = 0.4772 and Brown's SARIMA (LES) with α = 0.2106, are tried for each

month only. The most often used SARIMA model is SARIMA(0,1,1)x(0,1,1)12, which

strictly follows seasonal exponential smoothing. SARIMA(0,1,0)x(0,1,0)12 is also

known as the seasonal random trend (SRT) model. The alternative to the SRT model is the seasonal

random walk model, i.e., SARIMA(1,0,0)x(0,1,0)12. There is, of course, a difference

between seasonal and simple exponential models. The value θ = 1 - α is used in

exponential smoothing formulas. The best option is selected by considering the

minimum chi-squared value at the 5% significance level.

1.2 Regression Analysis Time Series Modeling & Simulation

Regression is strictly correlation analysis, accomplished with time and

sometimes without time series. The modern interpretations and fundamental concepts of

regression analysis are thoroughly presented by Gujarati (1988), Siegel (1997), Rawlings

(1988) and Newton (1988). All kinds of regression analysis can be accomplished by the

least squares regression technique, which minimizes the discrepancy between data points

and the fit Chapra and Canale (1990). It comprises linear regression (LR), polynomial

regression (PR), multiple linear regression (MLR), general linear least squares (GLLS)

and non-linear regression (NLR). For NLR, the least squares technique is used. The Gauss-Seidel

technique cannot be employed because the normal equations are not diagonally

dominant. NLR analysis is sometimes very useful for fitting, but it also requires minimization

of the sum of the squares of the residuals (SSR). This analysis is only carried out on a

single independent variable; therefore, multiple parameters which are interrelated with

each other, such as in MLR, cannot be studied. However, NLR analysis has an advantage

over PR because it exploits iteration. For NLR analysis, the Gauss-Newton method has

some shortcomings, such as slow convergence, wide oscillations, i.e., changing directions,

and sometimes divergence Draper and Smith (1981). These shortcomings were overcome



by other methods such as the steepest descent and the Levenberg-Marquardt techniques

Trabea and Shaltout (2000). However, PR can be applied in some cases, especially when the data are

distributed like a parabola or a cubic polynomial, because it depends

on a single variable, such as PRATS, in our case. Trabea and Shaltout (2000) studied

correlation of global solar radiation with meteorological parameters like mean daily

maximum temperature, mean daily relative humidity, mean daily sea level pressure, mean

daily vapour pressure and hours of bright sunshine, by using MLR analysis. The

correlation, the regression coefficient and the standard error were estimated. But they did

not consider the interdependence of the meteorological parameters. Rapti (2000)

developed mathematical correlation of atmospheric turbidity with specific humidity and

of diffuse radiation with atmospheric turbidity for maritime and for continental air

masses. This study does not include any statistical correlations.
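The iterative nature of NLR discussed in this section, and the way damping tames the oscillation and divergence of plain Gauss-Newton steps, can be sketched with a small Levenberg-Marquardt loop. This is a minimal illustration under assumptions, not the method of any cited study: the model y = a·exp(b·x), the noise-free data, the starting values and the factor-of-ten damping schedule are all chosen only for the example.

```python
import math


def fit_exponential(xs, ys, a=1.0, b=0.0, damping=1e-3, iterations=200):
    """Fit y = a * exp(b * x) by minimising the SSR with damped Gauss-Newton steps."""

    def ssr(a_, b_):
        return sum((y - a_ * math.exp(b_ * x)) ** 2 for x, y in zip(xs, ys))

    current = ssr(a, b)
    for _ in range(iterations):
        # Accumulate J^T J and J^T r for the two parameters.
        jaa = jab = jbb = ga = gb = 0.0
        for x, y in zip(xs, ys):
            e = math.exp(b * x)
            r = y - a * e
            da, db = e, a * x * e            # partial derivatives of the model
            jaa += da * da
            jab += da * db
            jbb += db * db
            ga += da * r
            gb += db * r
        # Damped normal equations: (J^T J + damping * diag(J^T J)) step = J^T r
        maa, mbb = jaa * (1.0 + damping), jbb * (1.0 + damping)
        det = maa * mbb - jab * jab
        if det == 0.0:
            break
        step_a = (mbb * ga - jab * gb) / det
        step_b = (maa * gb - jab * ga) / det
        try:
            trial = ssr(a + step_a, b + step_b)
        except OverflowError:
            trial = math.inf
        if trial < current:                  # accept: behave more like Gauss-Newton
            a, b, current = a + step_a, b + step_b, trial
            damping /= 10.0
        else:                                # reject: behave more like steepest descent
            damping *= 10.0
    return a, b, current


xs = [0.5 * i for i in range(7)]
ys = [2.0 * math.exp(0.5 * x) for x in xs]   # noise-free data generated with a = 2, b = 0.5
a_hat, b_hat, final_ssr = fit_exponential(xs, ys)
print(f"a = {a_hat:.4f}, b = {b_hat:.4f}, SSR = {final_ssr:.2e}")
```

From the starting point used here the undamped step overshoots and increases the SSR, so the first few steps are rejected until the damping is large enough.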

Ilyas and Nasir (2000) developed a relationship between humidity and

temperature and found a Gaussian trend. The best fit to the experimental data, as suggested

by them, is as follows:

$$H_{th} = H_o\, e^{\,k \ln^{2} T_o} \qquad (1)$$

where H_th is the theoretical humidity, H_o and T_o are the experimental values of humidity

and temperature, respectively, and k is a constant for the fit. Hussain, Jafri and Kamal [10]

used regression modeling of weather data and found PRATS relatively better than PR.

Ilyas (2000) found an inverse Gaussian relationship for the percentage cumulative frequency

of sunshine hours and solar energy, i.e.,

$$f_{cum}(\%) = k \exp\left[0.5 \left(\frac{E}{E_{th}}\right)^{2}\right] \qquad (2)$$

where the constant k is evaluated from an exponential expression in x, ln f_cum and n, and

$$x = \left(\frac{E}{E_{th}}\right)^{2} \qquad (3)$$



In Eqs. (2) and (3), the symbols E, E_th, x and n represent solar energy, threshold solar energy,

the square of the ratio of solar energy to its threshold value and the total number of data,

respectively.

The overall behavior of humidity on temperature and solar energy on its

cumulative frequency of sunshine hours shows a reversal, i.e., the former is Gaussian and

the latter is inverse Gaussian. We tried to establish the best fit to our diverse data by using

regression analysis. Kamal and Jafri (1999) developed stochastic modeling and generated

synthetic sequences of hourly global solar irradiation. They also found the Markov

transition matrices (MTM) approach relatively better as a simulator compared to

the Autoregressive Moving Average (ARMA) process. The time series models to simulate

and forecast hourly averaged wind speed (HAWS) were presented by Kamal and Jafri

(1997). They also simulated the Weibull distribution of HAWS Kamal and Jafri

(1996). With the use of the triangulation method and statistical correlation from regression

equations, solar radiation was estimated at locations where there was no observatory,

and the estimates were found to be very reliable Raza and Kamal (2002). Jafri (2007) recently performed fuzzy

logic time series (FLTS) prediction modeling on HAWS. Needless to mention,

regression modeling, despite its many shortcomings, is a better predictor. The fuzzy

regression analysis is defined as the model which includes the fuzziness (uncertainty) in

itself Tanaka and Ishibuchi (1992). Ozawa et al (1997) used the fuzzy autoregressive

(AR) model to describe fuzzy time series, which cannot be dealt with

by stochastic models. The fuzzy time series analysis was proposed by Watada (1992).

1.3 Chaotic Time series without Rule Based Fuzzy logic (FL), Mackey

Glass Simulation with FL and Prediction

The original fuzzy logic (FL) pioneered by Lotfi Zadeh (1965) has been around

for forty years, and yet it is unable to handle uncertainties. Zadeh introduced the concept

of a fuzzy set, a set whose boundary is not sharp or precise. This concept contrasts with

the classical concept of a set, recently called a crisp set, whose boundary is required to be

precise. Probability and fuzzy sets describe different kinds of uncertainty. Probability

is the theory of sets. It deals with the likelihood of relevant events or with the expectation

of a future event based on something now known (outcome of a random event) while the



fuzziness is not the uncertainty of expectation. Fuzzy set theory, on the other hand, is not

concerned with events. It is concerned with concepts. A rule based fuzzy logic system

(FLS) is a powerful design methodology to minimize the effect of uncertainty Mendel

(2001). Model free designs are artificial neural networks (ANN) and fuzzy logic (FL). The

fuzzy logic (FL) rules are extracted from numerical data and are then combined with

linguistic knowledge. The richness of fuzzy logic is that there are enormous numbers of

possibilities that lead to many non-linear mappings of an input data vector into a scalar

output. In model free approaches, the associated model is a representation of an architecture

to solve a specific problem. With a model based approach in fuzzy logic, one can pursue the

truth or a close approximation to it. FLSs employ 500 rules for the one pass (OP) and

sixteen rules for the back propagation (BP) steepest descent methods of design, respectively.

We followed a model free approach, i.e., fuzzy logic on hourly wind speed data to

predict future values, i.e., consequents from antecedents (past values). Single stage

forecasting for the chaotic wind data time series will be used.
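For readers unfamiliar with the Mackey-Glass series named in the heading of this section, a minimal simulation sketch is given below. It is not taken from the source: the delay, the coefficients 0.2 and 0.1, the exponent 10, the constant initial history and the Euler step are common benchmark choices assumed here for illustration.

```python
def mackey_glass(n, tau=17, beta=0.2, gamma=0.1, power=10, dt=0.1, x0=1.2):
    """Euler integration of dx/dt = beta*x(t-tau)/(1 + x(t-tau)**power) - gamma*x(t).

    Returns n samples taken at unit time intervals.
    """
    steps_per_unit = round(1 / dt)
    delay = tau * steps_per_unit
    history = [x0] * (delay + 1)        # assumed constant history before t = 0
    samples = []
    for i in range(n * steps_per_unit):
        x_now, x_delayed = history[-1], history[-1 - delay]
        dx = beta * x_delayed / (1.0 + x_delayed ** power) - gamma * x_now
        history.append(x_now + dt * dx)
        if (i + 1) % steps_per_unit == 0:
            samples.append(history[-1])
    return samples


series = mackey_glass(300)
print(min(series), max(series))
```

The resulting series is bounded and irregular, which is why it is widely used as a test signal for single stage forecasters before they are applied to measured data.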

1.4 Rule based Fuzzy Logic Time series Prediction, Modeling and

Simulation

Rule based fuzzy logic systems (FLSs), a powerful design methodology, minimize

the effect of uncertainty Mendel (2001). The two most popular FLSs used by engineers

today are the Mamdani and Takagi-Sugeno-Kang (TSK) systems. Both are characterized

by IF-THEN rules and have the same antecedent structures. They differ in the structure of

the consequents. The consequent of a Mamdani rule is a fuzzy set, whereas the

consequent of a TSK rule is a function. The type-1 TSK FLSs have been widely used in

control and other applications Terano et al (1994). The output of a type-1 TSK forecaster

is obtained without a defuzzification step. Liang and Mendel (1999; 2000) developed type-2

TSK FLSs. The FLS forecasters comprise singleton type-1 (with virtually no

uncertainties), non-singleton type-1 (with uncertainties), singleton type-2, type-1 non-singleton

type-2, type-2 non-singleton type-2, type-1 TSK and type-2 TSK Mendel

(2001). The rule based fuzzy logic systems (FLSs), both type-1 and type-2, handle

uncertainties because modeling and minimization of uncertainties can be accomplished.

If all uncertainties disappear, type-2 FL reduces to type-1 FL, in much the same manner



that if randomness disappears, probability reduces to determinism. For basic singleton

type-1 FLSs, we assume that there are no uncertainties; all fuzzy sets are of type-1, and

measurements are perfect and treated as crisp values, i.e., as singletons. Non-singleton

FLSs, by contrast, do not treat the measurements as crisp values, i.e., uncertainties are inherently present. A FLS

that is described completely in terms of type-1 fuzzy sets is called a type-1 FLS. Type-1

FLSs are unable to directly handle rule uncertainties, because they use type-1 fuzzy sets

that are certain. Therefore, a better way to handle uncertainties is to use a type-2 FLS.

But, a non-singleton type-1 FLS is a type-1 FLS whose inputs are modeled as type-1

fuzzy numbers; hence, it can be used to handle uncertainties. Moreover, the type-1 FL, in

its applications, deciphers rule based systems as a powerful design methodology.

The rules of a non-singleton type-1 FLS are the same as those for a singleton

type-1 FLS Mendel (2001). The difference lies in the fuzzifier, which treats the inputs as

type-1 fuzzy sets, and the effect of this on the inference block. The output of the

inference block will again be a type-1 fuzzy set. The type-1 FLS, both for singleton and

non-singleton, is shown in Fig.1. So the defuzzifiers that are described for a singleton

type-1 FLS apply as well to a non-singleton type-1 FLS Mendel (2001).

We know that non-stationarity (randomness) inherently exists in our wind data

Jafri (2005); Kamal and Jafri (1996); therefore, uncertainties or randomness cannot be

reduced. It can be handled properly with a non-singleton type-1 FLS, so there

appears to be no reason to use a type-2 FLS.

We recently performed fuzzy logic (FL) time series prediction modeling on

hourly averaged wind speed (HAWS) data of 1985-2004 and used Mackey-Glass

simulation, for Quetta, Pakistan. We shall use the same results of wind data with the

application of a rule based type-1 FLS. We used the MATLAB M-files which are available at

URL:http://sipi.usc.edu/~mendle/software. The M-files are available in three folders:

type-1 FLSs, general type-2 FLSs and interval type-2 FLSs. We used, in this study, the

following type-1 FLSs:

- Singleton Mamdani type-1 FLS

  sfls_type1.m: compute the output(s) of a singleton type-1 FLS when the antecedent membership functions are Gaussian

  train_sfls_type1.m: tune the parameters of a singleton type-1 FLS when the antecedent membership functions are Gaussian, using some input-output training data

- Non-singleton Mamdani type-1 FLS

  nsfls_type1.m: compute the output(s) of a non-singleton type-1 FLS when the antecedent membership functions are Gaussian and the input sets are Gaussian

  train_nsfls_type1.m: tune the parameters of a non-singleton type-1 FLS when the antecedent membership functions are Gaussian, using some input-output training data

We avoid the extraneous matter on the development and historical background of

rule- based FLSs because we are concerned only with use of FLSs in time series. The

exhaustive literature and indeed critical review on rule-based FLSs are available in the

form of a book by M. Mendel (2001). However, we shall deliberate on fundamental rules

extracted from the data under consideration. The rules in fuzzy logic time-series are

usually extracted from designing the FLSs. Prior to 1992, all FLSs reported in the open

literature fixed the parameters, such as the type of fuzzification, composition,

implication, t-norm (operators for fuzzy intersection), defuzzification (produces crisp

output) and membership functions, arbitrarily, e.g., the locations and spreads of the

membership functions were chosen by the designer independent of the numerical training

data. Then, at the first IEEE Conference on Fuzzy Systems, held in San Diego in 1992,

three different groups of researchers, i.e., Horikawa et al (1992), Jang (1992) and Wang

and Mendel (1992), presented the same idea: tune the parameters of a FLS using the

numerical training data. Since that time, quite a few adaptive training procedures have

been published. Because tuning of free parameters had been done in feed forward neural

networks (FFNN) long before it was done in a FLS, a tuned FLS has also come to be

known as a neural fuzzy system. Designing a FLS Mendel and Mouzouris (1997) can be

viewed as approximating a function or fitting a complex surface in a multidimensional

space. Given a set of input-output pairs, tuning is essentially equivalent to determining a

system that provides an optimal fit to input-output pairs, with respect to a cost function

(tuning algorithm). Utilizing concepts from real analysis, Mouzouris and Mendel have

proven that a non-singleton FLS can uniformly approximate any continuous function on a

compact set. Although the proof of approximation Mendel and Mouzouris (1997)

provides some insight, it does not tell us how to choose the parameters of the non-



singleton FLS, nor does it tell us how many basis functions will be needed to achieve

such performance. The latter are accomplished through design. The design of FLSs

requires one-pass (OP), least squares, back-propagation (steepest descent, BP), SVD-QR

(a matrix tool in numerical linear algebra used in signal processing,

extracting fuzzy rules, reducing fuzzy rules and modeling the fuzzy rules) and iterative

design methods.

The forecasting of time series with rule-based FLS designs employs

only two methods, i.e., the one pass (OP) and back propagation (BP) methods.

The OP design constructs 500 rules for the antecedent and consequent membership

functions. We set the value of the standard deviation equal to 0.1 for all Gaussians in a

pre-defined OP design. However, the OP design is exhaustive compared to the BP design in FLSs.

In contrast, the BP design constructs only 16 rules for the antecedent and consequent

membership functions. The initial values of the standard deviation of the Gaussian

membership functions are all set equal to 0.5240 in a pre-defined BP design. The BP

design, in many respects, is better than OP, Mendel (2001). The predefined values of

all four antecedent membership functions and of the centers of the consequent

membership functions (ȳ^l, height defuzzifier) for each of the corresponding 16 rules in a BP

design for FLSs are used in the form of a matrix as an input. We choose the height

defuzzifier (ȳ^l, or centers of the consequent membership functions) to be a random

number from the interval (0,1). After training with the BP design, the FLS forecaster

was fixed. We use the learning parameter α = 0.2 in the BP design.

With tractable learning laws, we set the learning parameters. Alpha stable statistics model

the impulsiveness as a parameterized family of probability density functions. Additive

fuzzy systems can filter impulsive noise from signals. With α < 2 one gets impulsive

noise, and the noise has infinite variance. The alpha in these statistics is an exponent parameter.

With α = 2, we get the classical Gaussian case, i.e., exponential tail and finite variance.

The predefined initial mean (center) values of the antecedent membership functions,

along with the height defuzzifiers (mean values of the consequent membership functions) and

the standard deviations of the Gaussian antecedent membership

functions, in the form of a matrix, as shown in tables 1 and 2, are used for determining the values of the singleton

consequent membership functions, i.e., f_s(s_k), for 600 hourly training wind data and 120

or 144 testing wind data, respectively.

The predefined final mean (center) values of the antecedent membership functions,

along with the height defuzzifiers (mean values of the consequent membership functions)

and the standard deviations of the Gaussian antecedent membership functions, in the

form of a matrix, after six epochs of training, as shown in tables 2 and 3, are used for

determining the values of the non-singleton consequent membership functions, i.e., f_ns(s_k), for

600 hourly training data and 120 or 144 testing data, respectively. In both cases, 600

training wind data and 120 or 144 testing data for all four antecedent membership

functions are used as an input matrix, X, in sfls_type1.m and nsfls_type1.m, respectively.

For training as well as for testing data, we calculated the predicted values Jafri (2005);

Jafri et al (2005). It is difficult to reproduce all predicted values and the values of the

consequent membership functions for singleton and non-singleton type-1 FLSs in this

manuscript. Therefore, we will compare the root mean square errors, i.e., RMSEs (BP) with

RMSEns (BP), only for testing data.

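$$RMSE_{s} = \sqrt{\frac{1}{120}\sum_{k=600}^{719}\left[s(k+1) - f_{s}\left(x^{(k)}\right)\right]^{2}} \qquad (4)$$

$$RMSE_{ns} = \sqrt{\frac{1}{120}\sum_{k=600}^{719}\left[s(k+1) - f_{ns}\left(x^{(k)}\right)\right]^{2}} \qquad (5)$$

where

$$x^{(k)} = [\,x(t-18),\ x(t-12),\ x(t-6),\ x(t)\,]^{T}, \qquad s(k+1) = x(t+6)$$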
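The construction of the input vectors and targets, and the RMSE computed from them, can be sketched as follows. This is an illustration only: the series is a made-up ramp, the forecaster is a simple persistence rule standing in for a tuned FLS, and the function names are hypothetical.

```python
import math


def make_pairs(series, lags=(18, 12, 6, 0), horizon=6):
    """Build ([x(t-18), x(t-12), x(t-6), x(t)], x(t+6)) pairs from a series."""
    pairs = []
    for t in range(max(lags), len(series) - horizon):
        inputs = [series[t - lag] for lag in lags]
        pairs.append((inputs, series[t + horizon]))
    return pairs


def rmse(pairs, forecaster):
    """Root mean square error of a forecaster over input-target pairs."""
    squared = [(target - forecaster(inputs)) ** 2 for inputs, target in pairs]
    return math.sqrt(sum(squared) / len(squared))


series = [float(i) for i in range(40)]        # made-up ramp, for checking the indexing
pairs = make_pairs(series)
persistence = lambda inputs: inputs[-1]       # stand-in forecaster: repeat x(t)
error = rmse(pairs, persistence)
print(pairs[0], error)
```

On the ramp every persistence forecast is off by exactly six units, so the RMSE equals 6, which gives a quick check that the lags and the horizon are indexed as intended.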

It is worth mentioning that training pairs are obtained along with the testing data; therefore,

the analysis of testing data will be the same as for training data. We input the predefined initial

mean values of all antecedent membership functions (table 1) in the case of a singleton type-1

FLS because we assume that there are no uncertainties in the data. However, we cannot

totally ignore the noisy measurement environment; therefore, we tested our final FLS

forecasters on noisy testing data, i.e.,



x(k) = s(k) + n(k)          (6)

where n(k) is 0 dB (decibel) uniformly distributed noise.
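Equation (6) with 0 dB noise can be sketched numerically. The sketch rests on assumptions that the source does not state: signal power is taken as the mean squared deviation from the mean, the noise is uniform on a symmetric interval [-a, a] (variance a²/3), and the clean signal is a made-up sinusoid.

```python
import math
import random


def power(values):
    """Mean squared deviation from the mean (the power definition assumed here)."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def add_uniform_noise(signal, snr_db, rng):
    """Return x(k) = s(k) + n(k) with zero-mean uniform noise at the requested SNR."""
    noise_power = power(signal) / (10.0 ** (snr_db / 10.0))
    half_width = math.sqrt(3.0 * noise_power)   # uniform on [-a, a] has variance a^2 / 3
    return [s + rng.uniform(-half_width, half_width) for s in signal]


rng = random.Random(7)
s = [5.0 + 2.0 * math.sin(2.0 * math.pi * k / 24.0) for k in range(24 * 500)]
x = add_uniform_noise(s, snr_db=0.0, rng=rng)

noise = [xk - sk for xk, sk in zip(x, s)]
measured_snr_db = 10.0 * math.log10(power(s) / power(noise))
print(f"measured SNR: {measured_snr_db:.3f} dB")
```

At 0 dB the noise power equals the signal power, so this is a demanding test for a forecaster; a Monte Carlo study repeats the draw with independent seeds.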

We accomplished this task for a Monte Carlo set of 60 realizations. After each

epoch we used the testing data to see how each FLS performed by computing RMSEs (BP) and

RMSEns (BP), respectively, by using equation (4). This entire process was repeated 60

times using 60 independent sets of mean and standard deviation of 720 or 744 hourly

averaged wind data. The predefined RMSEs (BP), Mendel (2001), for each of the six

epochs of tuning are:

RMSEs (BP) = {.0548, .0431, .0322, .0261, .0237, .0232}          (7)

The non-singleton FLS shares most of the same parameters as the singleton FLS.

So we shall use the partially dependent BP design approach. In BP design we use only

two fuzzy sets for each of the four antecedents, so that there are only 16 rules. Each rule

is characterized by eight antecedent membership function parameters (the mean and

standard deviation for each of the four Gaussian membership functions) and one

consequent parameter, ȳ. More specifically, we initially chose the means of each

antecedent's two Gaussian membership functions as m_x - 2σ_x or m_x + 2σ_x,

respectively, and the standard deviations of these membership functions as 2σ_x.
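The rule structure just described can be sketched as a forward pass of a singleton type-1 FLS. This is not the code of the M-files mentioned earlier: it is a minimal illustration assuming product inference and height defuzzification, with hypothetical values for m_x, σ_x and the input vector, and with consequent centres drawn at random from the unit interval as initial values.

```python
import itertools
import math
import random


def gaussian(x, mean, sigma):
    """Gaussian membership grade of x."""
    return math.exp(-0.5 * ((x - mean) / sigma) ** 2)


def build_rules(m_x, sigma_x, rng, n_antecedents=4):
    """Two fuzzy sets per antecedent give 2**4 = 16 rules."""
    low, high = m_x - 2.0 * sigma_x, m_x + 2.0 * sigma_x
    rules = []
    for means in itertools.product((low, high), repeat=n_antecedents):
        sigmas = (2.0 * sigma_x,) * n_antecedents
        y_bar = rng.random()               # initial consequent centre, drawn at random
        rules.append((means, sigmas, y_bar))
    return rules


def sfls_output(x, rules):
    """Height defuzzifier: firing-level weighted average of the consequent centres."""
    numerator = denominator = 0.0
    for means, sigmas, y_bar in rules:
        firing = math.prod(gaussian(xi, m, s) for xi, m, s in zip(x, means, sigmas))
        numerator += firing * y_bar
        denominator += firing
    return numerator / denominator


rules = build_rules(m_x=5.0, sigma_x=1.5, rng=random.Random(3))   # hypothetical statistics
x = [4.2, 5.1, 6.3, 5.6]                                          # hypothetical input vector
forecast = sfls_output(x, rules)
print(len(rules), forecast)
```

Each rule thus carries eight antecedent parameters and one consequent parameter, and back propagation tuning would adjust all nine by steepest descent on the squared forecast error.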

For the non-singleton type-1 FLS, we modeled each of the four noisy input

measurements using a Gaussian membership function. Two choices are possible: (1) use

a different standard deviation for each of the four input measurement membership

functions, or (2) use the same standard deviation for each of the four input measurement

membership functions. We tried both approaches and got similar results because the

additive noise n(k) is stationary. The predefined average values and standard deviations

Mendel (2001) of RMSEs (BP) and RMSEns (BP) are shown in fig. 2 for each of the 6

epochs.



1.5 Artificial Neural Network Time Series (ANNTS) Modeling,

Simulation & Prediction

The McCulloch-Pitts neuron is the earliest artificial neuron, described with fixed

weights, a threshold activation function and a fixed discrete (non zero) time step for the

transmission of a signal from one neuron to the next McCulloch and Pitts (1943). A

processing unit is termed as a neuron or node. An artificial neural network (ANN) is an

information processing paradigm that is inspired by biological nervous systems, such

as the brain, and the way they process information. A biological neuron has three types of

components that are of particular interest in understanding an artificial neuron: its

dendrites, soma and axon. The dendrites receive signals from neighboring neurons. The

signals are electric impulses that are transmitted across a synaptic gap by means of a

chemical process. The synapse is a connection between neurons where their membranes

almost touch and signals are transmitted from one to the other by chemical

neurotransmitters. The soma, or cell body, sums the incoming signals, fires when

sufficient input is received and transmits a signal over its axon to other cells. The axon is

a long fiber over which a biological neuron transmits its output signals to other neurons.

Neural networks are computer algorithms that process information in a manner loosely

modeled on the nervous system. They learn from the past to predict the

future and offer solutions when explicit algorithms and modules are unavailable or too

cumbersome. Representative data for the neural network are gathered and training algorithms

are invoked to automatically learn the structure of data. There are many types of network

ranging from simple Boolean networks (perceptron), to complex self-organizing

networks (Kohonen Networks),to networks modeling thermodynamic properties

(Boltzmann machines) Haykins (1994).There are nearly as many training methods as

there are network types but some of the more popular ones include back propagation, the

delta rule and Kohonen learning. A standard network architecture consists of several

layers of neurons.
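The McCulloch-Pitts neuron described above (fixed weights and a threshold activation) can be sketched in a few lines. The weights and thresholds below are illustrative choices that reproduce the logical AND and OR functions; they are assumptions made for this example and are not taken from the cited work.

```python
def mcculloch_pitts(inputs, weights, threshold):
    """Return 1 if the weighted sum of the inputs reaches the threshold, else 0."""
    net = sum(w * x for w, x in zip(weights, inputs))
    return 1 if net >= threshold else 0


# All four binary input patterns, in the order (0,0), (0,1), (1,0), (1,1).
patterns = [(a, b) for a in (0, 1) for b in (0, 1)]

# With unit weights, a threshold of 2 behaves like AND and a threshold of 1 like OR.
and_outputs = [mcculloch_pitts(p, (1, 1), 2) for p in patterns]
or_outputs = [mcculloch_pitts(p, (1, 1), 1) for p in patterns]
```

Because the weights are fixed, such a unit does not learn. The training methods mentioned above (back propagation, the delta rule) adjust the weights instead.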

An ANN is configured for a specific application, such as pattern recognition or

data classification, through a learning process. Learning in biological systems involves

adjustments to the synaptic connections that exist between the neurons. This is true of

ANNs as well. We shall concentrate only on ANN simulations, which appear to be a



recent development. The discipline itself was established before the advent of

computers. Many important advances in ANNs were reported during the five decades following

their introduction in 1943, along with periods of frustration among researchers Fausett (1994).

Recently, neural networks (NNs) have enjoyed a resurgence of interest and have begun to emerge

as an entirely novel approach for the modeling of complex and non-linear phenomena Hertz et

al (1991); Bishop (1995); Candill and Butler (1993); Whitley (1995); Connor et al (1994);

Dorffner (1996); Ababarnal et al (1993); Gershenfeld and Weigend (1993); Fahlman and

Lebiere (1990); Kanter et al (1995); Eisentein et al (1995); Bengio et al (1995); Fessant

et al (1995); Ruiz-Suarez et al (1995) and Boznar et al (1993). NNs are

particularly useful when problems are driven by data rather than by concept or theory. To

date, NNs have yielded many successful applications in areas as diverse as finance,

medicine, engineering, geology and physics; indeed, wherever there are problems of

prediction or classification, neural networks are being introduced. ANN models have been

applied to problems involving runoff forecasting and weather prediction Kang et al

(1993), groundwater reclamation Ranjethan and

Eheart (1993), predicting average air temperature Cook and Wolfe (1991), predicting

precipitation Kalogirou et al (1998) and forecasting of price increments Castiglione

(2002). There has been intensive research on NNs Engel and Broeck (2000); Kinzel

(1999); Gardner and Dorling (1998); Kulkarni et al (1997); Edwards et al (1997); Geva

(1998); Giles et al (2001); Khotanzad et al; Biehl and Caticha (2001); Schroder and

Kinzel (1998); EinDor and Kanter (1998); Priel and Kanter (2000); Al-Hujazi and

Al-Nashash (1996); Hertz and Krogh (1991); Andreas et al (1994) and Azoff (1994).

Prediction of time series is an important application of NNs. Since 1995, time series

prediction by NNs has been studied extensively, Kalogirou et al (1998); Castiglione

(2002); Engel and Broeck (2000); Kinzel (1999); Gardner and Dorling (1998); Kulkarni et al

(1997); Edwards et al (1997); Geva (1998); Giles et al (2001); Khotanzad and

Abaye (1997); Biehl and Caticha (2001); Schroder and Kinzel (1998); EinDor and

Kanter (1998); Priel and Kanter (2000); Al-Hujazi and Al-Nashash (1996); Hertz et al

(1991); Andreas et al (1994); Azoff (1994); Gately (1996); Refenes et al (1997);

Mohandes et al (1998); Zhang et al (1998) and Hill et al (1996). Detecting trends and

patterns in financial data is of great interest to the business world to support the decision



making process through time series forecasting, e.g., with neural networks Lin et al

(1995). Wind speed is generally a highly non-linear phenomenon Kamal and Jafri

(1996a) and Kamal and Jafri (1997). ANNs have recently been used successfully in

prediction of wind speed/energy Mohandes et al (1998); Kariniotakis et al (1996); Li et al

(1997); Shuhui et al (2001); Sfetsos (2002) and Kamal (2004). ANNs trained

on a time series are supposed to achieve two things: first, to predict the time series many

time steps ahead, and second, to learn the rule that produced it. Prediction and learning are

not necessarily related to each other, especially for chaotic time series Freking et al

(2005).

Burney (1999) studied artificial neural networks (ANNs) with emphasis on predictive

data mining. Burney and Jilani (2001) applied ANN methods to forecasting of the

stock exchange. They used supervised ANNs to predict stock exchange share rates

Burney and Jilani (2003). The most notable work on ANNs was the

comparison of first and second order algorithms, Burney et al (2004). More and Deo

(2003) employed neural networks to forecast daily, weekly and monthly

wind speed. Both feed-forward (FF) and recurrent networks (RN) were used and

trained on past data in an autoregressive (AR) manner using back propagation (BP) and

cascade correlation (CC) algorithms. They concluded that the CC algorithm yields more

accurate forecasts than BP.
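Training a feed-forward network "in the autoregressive manner" means that the previous few values of the series form the input and the next value is the target. The sketch below illustrates this with plain online back propagation on a single hidden layer. It is not the implementation of More and Deo (2003), and it does not cover recurrent networks or cascade correlation. The number of lags, hidden units, learning rate and the synthetic sine series are all assumptions chosen for illustration.

```python
import math
import random


def make_lagged_pairs(series, n_lags):
    """Build (inputs, target) pairs: the previous n_lags values predict the next one."""
    return [(series[t - n_lags:t], series[t]) for t in range(n_lags, len(series))]


def forward(w_hidden, w_out, x):
    """One forward pass: tanh hidden layer followed by a linear output unit."""
    hidden = [math.tanh(sum(w * xi for w, xi in zip(row[:-1], x)) + row[-1])
              for row in w_hidden]
    output = sum(w * h for w, h in zip(w_out[:-1], hidden)) + w_out[-1]
    return hidden, output


def train(pairs, n_hidden=4, epochs=200, lr=0.05, seed=0):
    """Train by online back propagation; return weights and per-epoch mean squared error."""
    rng = random.Random(seed)
    n_in = len(pairs[0][0])
    # Each hidden row stores n_in input weights followed by one bias term.
    w_hidden = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)]
                for _ in range(n_hidden)]
    w_out = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]
    mse_history = []
    for _ in range(epochs):
        squared_error = 0.0
        for x, target in pairs:
            hidden, output = forward(w_hidden, w_out, x)
            err = output - target
            squared_error += err * err
            for j in range(n_hidden):
                # Error sent back through output weight j and the tanh derivative.
                delta = err * w_out[j] * (1.0 - hidden[j] ** 2)
                for i in range(n_in):
                    w_hidden[j][i] -= lr * delta * x[i]
                w_hidden[j][-1] -= lr * delta
                w_out[j] -= lr * err * hidden[j]
            w_out[-1] -= lr * err
        mse_history.append(squared_error / len(pairs))
    return w_hidden, w_out, mse_history


# Synthetic stand-in for a wind speed record (an assumption, not real data).
series = [math.sin(0.3 * t) for t in range(120)]
pairs = make_lagged_pairs(series, n_lags=3)
w_hidden, w_out, mse_history = train(pairs)
_, one_step_forecast = forward(w_hidden, w_out, series[-3:])
```

Forecasts several steps ahead could be produced by feeding each forecast back in as the newest lagged input, although errors then accumulate with the horizon.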

Based on this critical analysis and review of ANNs, we are of the opinion that ANNs yield better

forecasts than the traditional stochastic ARIMA time series model. We have not been

able to find any relevant research article pertaining to ANNs in the Journal of the American

Statistical Association over the last two decades. Recent research suggests that forecasting

with ANNs can be a promising alternative to the traditional ARMA structure. Zhang

(2003) presented a hybrid ARIMA and neural network model. Org et al (2005) worked on

model identification of ARIMA using genetic algorithms. Pai and Lin (2005) obtained

stock price forecasts using a hybrid ARIMA and support vector machines model. With

hybridization of intelligent techniques such as ANNs, fuzzy systems and evolutionary

algorithms, one could expect relatively better time series

prediction. Valenzuela et al (2008) exploited hybridization of

intelligent techniques and ARIMA models for time series prediction. A critical survey on



neural networks in business forecasting reflects the modeling issues that arise in

forecasting applications Zhang (2004).

1.6 Thesis Plan

Through critical analysis and review of various time series modeling, simulation and prediction methods, we have been able to identify neglected areas of research as

well as areas that have been overemphasized. It has been realized that statistical

techniques like ARMA, ARIMA, non-seasonal ARIMA and seasonal ARIMA

have limited capabilities when modeling time series data. Likewise, regression

analysis time series modeling and simulation has considerable limitations. In this

situation, we shall generalize statistical techniques and accomplish

modeling of time series wind data.

We shall compare Markov Transition Matrices (MTM) with stochastic time series models. By comparing statistical and generalized techniques for

stochastic time series, we expect to obtain pertinent and useful results. The finer

statistical details are useful for identifying a proper stochastic time series model, such as

the comparison of MTM with ARMA as a simulator, the suitability of short-range

versus long-range prediction, the stochastic simulator in ARIMA and indeed the

heteroscedasticity/homoscedasticity tests in regression analysis of time series, partly

on some weather data.

We find that recent trends in modeling and simulation of time series are limited to the feed-forward back propagation neural network (FFBPNN); therefore, we shall apply

FFBPNN to our data.

We shall apply singleton and non-singleton type-1 back propagation (BP) designed sixteen-rule fuzzy logic system (FLS) on hourly averaged wind data,

which, to our knowledge, nobody has attempted to date.

We shall also use design-free fuzzy logic to obtain predictions on wind data, which again, to our knowledge, has never been done to date.

We shall perform Mackey-Glass simulation on wind data.

There are diverse categories of time series modeling, such as neuro-fuzzy logic Burney et al

(2006), Burney and Jilani (2007), second order modeling of fuzzy time series Tsai

& Wu (1999), multivariate fuzzy logic Jilani and Burney (2007), autoregressive



fuzzy logic Kezuhiro et al (1997), fuzzy predictor by extrapolating a time series

and parallel structure fuzzy system Kim et al (2001) which would, of course, have

extensive applications in business and trade related activities, risk assessments

and small scale weather or climate predictions.
