GENETIC PROGRAMMING AND ITS APPLICATION IN REAL-TIME RUNOFF FORECASTING



JOURNAL OF THE AMERICAN WATER RESOURCES ASSOCIATION
VOL. 37, NO. 2, AMERICAN WATER RESOURCES ASSOCIATION, APRIL 2001

GENETIC PROGRAMMING AND ITS APPLICATION IN REAL-TIME RUNOFF FORECASTING1

Soon Thiam Khu, Shie-Yui Liong, Wadan Babovic, Henrik Madsen, and Nitin Muttil2

ABSTRACT: Genetic programming (GP), a relatively new evolutionary technique, is demonstrated in this study to evolve codes for the solution of problems. First, a simple example in the area of symbolic regression is considered. GP is then applied to real-time runoff forecasting for the Orgeval catchment in France. In this study, GP functions as an error updating scheme to complement a rainfall-runoff model, MIKE 11/NAM. Hourly runoff forecasts of different updating intervals are performed for forecast horizons of up to nine hours. The results show that the proposed updating scheme is able to predict the runoff quite accurately for all updating intervals considered and particularly for updating intervals not exceeding the time of concentration of the catchment. The results are also compared with those of an earlier study, by the World Meteorological Organization, in which autoregression and Kalman filter were used as the updating methods. Comparisons show that GP is a better updating tool for real-time flow forecasting. Another important finding from this study is that nondimensionalizing the variables enhances the symbolic regression process significantly.
(KEY TERMS: genetic programming; evolutionary algorithms; rainfall-runoff; real-time forecasting; updating; regression.)

INTRODUCTION

One of the central challenges of computer science is to get a computer to perform a task without telling it how to do it. In hydrologic engineering, the challenge is to derive a model that relates two or more physical processes without knowing the actual mechanics of conversion. Genetic Programming (GP) addresses the first challenge by providing a method that automatically creates a working computer program from a high-level statement of the problem. GP achieves this automatic program discovery (also known as program synthesis or program induction) by genetically breeding a population of computer programs using principles of Darwinian natural selection and biologically inspired operations.

GP can also be applied to infer models in hydrologic engineering problems such as rainfall-runoff modeling or runoff forecasting. In problems where complete understanding of the physical process is lacking or the process is too complicated to be modeled, GP may offer some assistance or insight. One application area of GP is real-time runoff forecasting, where incorporating knowledge of the prediction errors of past forecasts into forecasting models of different horizons can greatly improve the models' performance.

In runoff forecasting, information on the immediate past and current states of meteorological conditions and those of the catchment is essential to forecast the catchment's response for different forecast horizons. When applied in a real-time mode, it is necessary to modify or update the forecast based on current information such as observed discharges. There are four updating approaches (Refsgaard, 1997) that update either: (1) the input parameters, (2) the state variables, (3) the model parameters, or (4) the output variables. The most commonly used scheme is updating the output variables, or error correction. This approach is adopted in this study.

1Paper No. 99178 of the Journal of the American Water Resources Association. Discussions are open until December 1, 2001.

2Respectively, Lecturer, Civil Engineering, School of Engineering and Computer Science, Harrison Bldg., University of Exeter, North Park Road, Exeter, United Kingdom EX4 4QF; Associate Professor, Department of Civil Engineering, National University of Singapore, 10 Kent Ridge Crescent, Singapore 119260; Principal Scientist and Research Engineer, DHI Water and Environment, Agern Alle 11, DK-2970 Horsholm, Denmark; and Research Scholar, Department of Civil Engineering, National University of Singapore, 10 Kent Ridge Crescent, Singapore 119260 (E-Mail/Liong: [email protected]).


GENETIC PROGRAMMING

Genetic Programming (GP) is a relatively new domain-independent method for evolving computer programs to solve, or approximately solve, problems (Koza, 1992). In engineering applications, GP is frequently applied to model structure identification problems. In such applications, GP is used to infer the underlying structure of either a natural or experimental process in order to model the process numerically. GP inferred models have the advantages of generating simple parsimonious expressions and offering some possible interpretations to the underlying process.

GP began as an attempt to discover how computers could learn to solve problems without being explicitly programmed to do so. One successful application of GP in automatic program discovery is that of symbolic regression, instead of the traditional numerical regression. This makes the application of GP even more relevant in fields where large amounts of data are accumulating in machine-readable form. For example, GP has been applied to predict chaotic financial time series (Oakley and Howard, 1994); to predict occurrence of sunspots (Lee and Suzuki, 1995); to solve various hydraulics problems, such as rainfall-runoff relationship from synthetic data, sediment transport modeling, salt intrusion in estuaries and flow over a flexible vegetated bed (Babovic, 1996; Babovic and Abbott, 1997b); and to emulate the rainfall-runoff process (Whigham and Crapper, 1999).

GP belongs to a class of probabilistic search procedures known as Evolutionary Algorithms (EAs) which includes Genetic Algorithms (GA) (Holland, 1975), Evolutionary Programming (EP) (Fogel et al., 1966) and Evolutionary Strategy (ES) (Schwefel, 1981). These techniques use computational models of natural evolutionary process for the development of computer based problem-solving systems. All evolutionary algorithms function by simulating the evolution of individual structures via processes of reproductive variation and fitness based selection. The techniques have become extremely popular due to their success at searching complex nonlinear spaces and their robustness in practical applications.

Basic Principles of Genetic Symbolic Regression

Genetic Symbolic Regression (GSR) is a special application of GP in the field of symbolic regression. In traditional numerical regression, one predetermines the functional form, either linear, polynomial, or nonlinear, and the task is to determine the coefficients. In symbolic regression, the task is to both find a suitable functional form and determine the coefficients. Hence, GSR involves finding a mathematical expression, in symbolic form, relating a finite sample of values of a set of independent variables ($x_i$) and a set of dependent variables ($y_i$).

GSR can be viewed as an extension of Genetic Algorithm (GA) in terms of the basic principles of operations. Like GA, GSR works with a number of solution sets, known collectively as a population, rather than a single solution at any one time. Working with a large number of solution sets gives both techniques the advantage of avoiding the possibility of getting trapped in local optima. There are, however, two major differences between GP and GA. They are:

1. GSR works with two sets of variables, instead of one set of variables as in GA. One set of variables, known as the terminal set, contains the independent variables and constants, {x}, similar to GA. The other set, known as the functional set, contains the basic operators used to form the function, f( ). For example, the function set may contain the following operators {+, -, *, /, ^, log, sin, tanh, exp} depending on the perceived degree of complexity of the regression. Thus, the symbolic regression is performed using these two variable sets and it is possible to derive a large number of possible functional relationships to fit the data.

2. In most EAs, the length of the solution set is normally fixed. In GP, however, the length is allowed to vary from one solution set to another. This variation in length is due to the two genetic operators, crossover and mutation. The flexibility in the structure length increases the search space significantly.

The solution sets in each iteration are collectively known as a generation. In GP, the size of a population does not have to be the same from one generation to the next. The solutions of the very first generation are usually generated through a random process. However, those of the subsequent generations are generated through genetic operations. Each possible solution set can be represented and visualized in either parse tree form or in Polish notation (Lukasiewicz, 1957) as shown in Figure 1. As the population evolves, new solution sets replace the older ones and are supposed to perform better. The solution sets in a population associated with the best fit individuals will, on average, be reproduced more often than the less-fit solution sets. This is known as the Darwinian principle of the "survival of the fittest."


(i) A simple expression: a + (b * c)

(ii) as Polish notation (prefix): + * b c a

(iii) as reverse Polish notation (postfix): b c * a + or a b c * +

(iv) as Parse tree:

        +
       / \
      *   a
     / \
    b   c

Figure 1. Different Forms of Representation in Genetic Programming.
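The equivalence of the representations in Figure 1 can be illustrated with a short sketch. The nested-tuple encoding and the helper names (`to_prefix`, `to_postfix`, `evaluate`) are assumptions made for illustration only; they are not taken from GPkernel or any other GP software.

```python
import operator

# Hypothetical encoding: a tree is either a leaf (variable name) or a tuple
# (operator, left_subtree, right_subtree).
OPS = {"+": operator.add, "-": operator.sub,
       "*": operator.mul, "/": operator.truediv}


def to_prefix(tree):
    """Polish (prefix) notation: operator first, then its operands."""
    if isinstance(tree, str):
        return [tree]
    op, left, right = tree
    return [op] + to_prefix(left) + to_prefix(right)


def to_postfix(tree):
    """Reverse Polish (postfix) notation: operands first, then the operator."""
    if isinstance(tree, str):
        return [tree]
    op, left, right = tree
    return to_postfix(left) + to_postfix(right) + [op]


def evaluate(tree, env):
    """Evaluate the tree for the variable values given in env."""
    if isinstance(tree, str):
        return env[tree]
    op, left, right = tree
    return OPS[op](evaluate(left, env), evaluate(right, env))


# The parse tree of Figure 1: (b * c) + a, which equals a + (b * c).
tree = ("+", ("*", "b", "c"), "a")
print(" ".join(to_prefix(tree)))    # + * b c a
print(" ".join(to_postfix(tree)))   # b c * a +
print(evaluate(tree, {"a": 1.0, "b": 2.0, "c": 3.0}))  # 7.0
```

Because addition is commutative, the tree with the product on the left evaluates to the same value as a + (b * c), which is why both postfix strings in Figure 1 are valid.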

The basic procedure of GP, Figure 2, can be described as follows:

1. Generate the set of initial population.
2. Evaluate each parse tree and assign the fitness.
3. Form the temporary population by selecting candidates according to their fitness. This temporary population is called the mating pool. Candidates with higher fitness are given greater probabilities to mate, to produce children or offspring.
4. Choose pairs of parse trees from the temporary mating pool randomly for mating and apply the genetic operator called crossover. Crossover is the exchange of genetic material (such as fitness, composition) between two selected candidates.
5. Select a crossover site where the material will be exchanged randomly, thereby resulting in the creation of offspring.
6. Apply another genetic operator known as mutation, which randomly changes the genetic information of the candidate.
7. Copy the resultant chromosomes into the new population.
8. Evaluate the performance of the new population.
9. Repeat steps 3-8 until a predetermined criterion is reached.

Figure 2. Basic Procedure of Genetic Programming.

Selection, Crossover and Mutation

Selection is the process of altering the fitness of the individual with respect to the whole generation's average performance. This is an important step because it determines directly the individual's chances of its representation in the next generation. A common selection method is fitness ranking. Individuals are sorted according to the fitness values and ranked accordingly. Another selection method is the tournament selection. This method fills the mating pool without the need of fitness mapping. Pairs of individuals are picked at random from the population. The individual with a higher fitness value is copied into the mating pool directly. This is repeated until the pool is filled.
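A minimal sketch of tournament selection as described above follows. The function name, its parameters, and the toy population are hypothetical. It assumes that a larger fitness value is better, so when RMSE is the objective one would pass, for example, the negative RMSE as fitness.

```python
import random


def tournament_selection(population, fitness, pool_size,
                         tournament_size=2, rng=None):
    """Fill a mating pool by repeated tournaments (hypothetical helper).

    In each tournament, `tournament_size` distinct individuals are drawn at
    random and the one with the highest fitness is copied into the pool.
    """
    rng = rng or random.Random()
    pool = []
    while len(pool) < pool_size:
        contestants = rng.sample(range(len(population)), tournament_size)
        winner = max(contestants, key=lambda i: fitness[i])
        pool.append(population[winner])
    return pool


# Toy population of expressions with made-up fitness values.
population = ["a+b", "a*b", "a-b", "a/b"]
fitness = [0.1, 0.9, 0.4, 0.2]
pool = tournament_selection(population, fitness, pool_size=6,
                            rng=random.Random(0))
print(pool)
```

Because contestants are drawn without replacement, the least fit individual can never win a tournament, while fitter individuals appear in the pool more often on average. Setting `tournament_size=3` would correspond to the tournament size listed in the parameter tables of this paper.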

Crossover is the first process of producing new individuals from the selected individuals in the mating pool. It takes two individuals and prunes their branch at some randomly chosen position, into two segments (Figure 3a). Exchanging the segments produces two new possible solution sets (Figure 3b). The two new solution sets or children inherit some characteristics from their parents and genetic information is thereby exchanged in the process. As the process continues, the fitness of the whole population increases and converges to finding the near optimal solution set.

Figure 3. Crossover in Genetic Programming: (a) parents cut at the crossover sites; (b) children after exchanging segments.
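Subtree crossover can be sketched on the two parents given in direct algebraic form with Figure 3, a + (b * c) and (b + a) * ((a * c) - d). The nested-tuple encoding, the path convention, and all function names are assumptions for illustration, not the GPkernel implementation.

```python
import random

# Hypothetical encoding: leaf = variable name, node = (operator, left, right).
# A path is a tuple of child indices (1 = left, 2 = right) from the root.


def subtree_paths(tree, path=()):
    """Yield the path to every node of the tree, root included."""
    yield path
    if isinstance(tree, tuple):
        for i, child in enumerate(tree[1:], start=1):
            yield from subtree_paths(child, path + (i,))


def get_subtree(tree, path):
    for i in path:
        tree = tree[i]
    return tree


def replace_subtree(tree, path, new):
    """Return a copy of tree with the subtree at path replaced by new."""
    if not path:
        return new
    i = path[0]
    return tree[:i] + (replace_subtree(tree[i], path[1:], new),) + tree[i + 1:]


def crossover(p1, p2, path1, path2):
    """Exchange the subtrees found at path1 in p1 and path2 in p2."""
    s1, s2 = get_subtree(p1, path1), get_subtree(p2, path2)
    return replace_subtree(p1, path1, s2), replace_subtree(p2, path2, s1)


def random_crossover(p1, p2, rng):
    """Pick both crossover sites uniformly at random."""
    path1 = rng.choice(list(subtree_paths(p1)))
    path2 = rng.choice(list(subtree_paths(p2)))
    return crossover(p1, p2, path1, path2)


parent1 = ("+", "a", ("*", "b", "c"))                        # a + (b * c)
parent2 = ("*", ("+", "b", "a"), ("-", ("*", "a", "c"), "d"))  # (b+a)*((a*c)-d)

# Cutting both parents at their right-hand branch reproduces the figure:
child1, child2 = crossover(parent1, parent2, (2,), (2,))
print(child1)  # a + ((a * c) - d)
print(child2)  # (b + a) * (b * c)
```

The children generally differ in size from their parents, which is the variable solution length noted earlier as a difference between GP and GA.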

Mutation is the random alteration of the individual parse tree at the branch or node level (Figure 4). Mutation is a mechanism that perturbs the parse tree structure. It does not usually change the tree structure but the information content in the parse tree. Therefore, it explores a new domain and serves to free the search from the possibility of being trapped in local optima. It should be noted that mutation can be destructive, causing rapid degradation of relatively "fit" solution sets if the probability of mutation is set too high. Depending on the strategy adopted, mutation rate can be low according to GA-type mutation or high according to ES-type mutation.

Figure 4. Mutation in Genetic Programming.
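A node-level mutation that changes the information content but not the shape of the tree can be sketched as follows. The operator list, the function name, and the per-node mutation probability are illustrative assumptions; the paper does not specify how GPkernel applies its mutation rate.

```python
import random

FUNCTIONS = ["+", "-", "*", "/"]  # assumed functional set for this sketch


def point_mutation(tree, rate, rng):
    """Return a copy of tree in which each operator node is replaced by a
    different operator with probability `rate`; leaves and shape are kept."""
    if isinstance(tree, str):
        return tree
    op, left, right = tree
    if rng.random() < rate:
        op = rng.choice([f for f in FUNCTIONS if f != op])
    return (op,
            point_mutation(left, rate, rng),
            point_mutation(right, rate, rng))


original = ("+", "a", ("*", "b", "c"))  # a + (b * c)
unchanged = point_mutation(original, 0.0, random.Random(0))
mutated = point_mutation(original, 1.0, random.Random(0))
print(unchanged)
print(mutated)
```

With a rate of 0 the tree is returned unchanged, and with a rate of 1 every operator is altered, which illustrates why a high mutation probability can quickly degrade fit solution sets.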

The genetic programming introduced here is one of the simplest forms available. For a more complete presentation of GP, refer to Babovic and Abbott (1997a) or Babovic and Keijzer (2000). Instead of Polish notation, GP solution sets can also be represented in other forms such as directed acyclic graphs (Handley, 1994), linear representation (Perkis, 1994), and directed graphs (Poli, 1996). The evolution processes can also vary, such as using automatically defined functions (Koza, 1994) and adaptive representation through learning (Rosca and Ballard, 1996). These details can be found, for example, in Koza (1992, 1994), Kinnear (1994), Angeline and Kinnear (1996), and Langdon (1998).

AN EXAMPLE OF GENETIC SYMBOLIC REGRESSION

An example is shown here to illustrate the concept of GSR. The problem of interest is to infer the Bernoulli equation for a steady, one-dimensional fluid flow:

$$E = z + \frac{p}{\gamma} + \frac{v^2}{2g} = \text{const} \tag{1}$$

where z = vertical distance above a datum (m); p = pressure (N/m2); v = velocity (m/s); g = Earth's gravitational acceleration (9.81 m/s2); $\gamma$ = specific weight of water (9810 N/m3).

[Figure 3 panels, in direct algebraic form. (a) Parent 1: a + (b * c); Parent 2: (b + a) * ((a * c) - d). (b) Child 1: a + ((a * c) - d); Child 2: (b + a) * (b * c). Figure 4 panels: a + (b * c) before mutation; a + (b / c) after mutation.]

From Equation (1), 1000 sets of different combinations of z, p, and v are generated using a standard random number generator. The values of the energy head, E, are then computed correspondingly. It should be noted that this example was used in Keijzer and Babovic (1999), where problems involving variables of different dimensions are discussed. For example, if variable z were to be multiplied with variable v, then the resulting expression has a dimension of square-of-length/time, which is inconsistent with the dimension of variable E (length). In their paper, such occurrences of dimensional inconsistency are penalized, to ensure dimensional consistency of the resulting GP expression.

In this study, however, to circumvent the problem of dimensional inconsistency, each of the values of z, p, and v is normalized or nondimensionalized by using each variable's maximum value. By doing so, Equation (1) is then transformed to a new form:

$$E_{max}\,\bar{E} = z_{max}\,\bar{z} + \frac{p_{max}}{\gamma}\,\bar{p} + \frac{v_{max}^2}{2g}\,\bar{v}^2 = \text{const} \tag{2}$$

or

$$\bar{E} = C_1\,\bar{z} + C_2\,\bar{p} + C_3\,\bar{v}^2 = \text{const} \tag{3}$$

where the nondimensionalized variables are given by:

$$\bar{E} = \frac{E}{E_{max}} \tag{4a}$$

$$\bar{z} = \frac{z}{z_{max}} \tag{4b}$$

$$\bar{p} = \frac{p}{p_{max}} \tag{4c}$$

$$\bar{v} = \frac{v}{v_{max}} \tag{4d}$$

and the coefficients are:

$$C_1 = \frac{z_{max}}{E_{max}} \tag{5a}$$

$$C_2 = \frac{p_{max}}{\gamma\,E_{max}} \tag{5b}$$

$$C_3 = \frac{v_{max}^2}{2g\,E_{max}} \tag{5c}$$

The terminal set used is the set of nondimensionalized variables {$\bar{z}$, $\bar{p}$, $\bar{v}$} while the functional set is {+, -, *, /}. Crossover is performed by randomly choosing the subtree insertion location. The objective function is to find the minimum of the root mean square error (RMSE) of the predicted energy head. The other GP relevant parameters and their values are shown in Table 1. More details of the parameters can be found in Keijzer and Babovic (1999). The genetic programming software used in this study is GPkernel, developed at the Danish Hydraulic Institute. The initial population is generated with use of a random number generator. The size of the tree of the initial population is constrained to a maximum of 15 levels and the subsequent tree size is constrained to 45 levels. This restriction is necessary since GP has the tendency to infer a Fourier expansion-type function if the tree size is not limited. This type of expansion, although it may fit the data very well, does not add value in the function interpretation.

TABLE 1. Genetic Programming Parameters Used in Bernoulli Equation Example.

Parameter                    Value
Size of Parent               1000
Size of Children             1000
Tournament Size              3
Crossover Rate               1.0
Mutation Rate                0.3
Maximum Initial Tree Size    15
Maximum Tree Size            45

Ten different runs are performed, each using a different seed to generate the random numbers. Each time, GPkernel is run for 15 minutes. Figure 5 shows the average root mean square error (RMSE) of each generation in the GSR runs. It should be noted that the average RMSE, for all 10 runs, decreases rapidly as the generation progresses and reaches values near zero for most of the runs. As a result, exact formulae (up to three significant figures) are produced from each run.

The above simple example illustrates the capability of GP, or the GSR technique, to infer the correct functional relationship when there are no errors in the raw data. The use of GSR as a new updating procedure combined with rainfall-runoff simulation models is considered next.
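The nondimensionalization of the Bernoulli example can be checked numerically with the sketch below. The sampling ranges for z, p, and v and the random seed are hypothetical, since the paper does not state them; the sketch only verifies that normalizing each variable by its maximum turns Equation (1) into a relation with constant coefficients $C_1$, $C_2$, $C_3$ among the normalized variables.

```python
import math
import random

G = 9.81        # gravitational acceleration (m/s2)
GAMMA = 9810.0  # specific weight of water (N/m3)

rng = random.Random(1)
# 1000 random (z, p, v) combinations; the ranges are assumptions.
samples = [(rng.uniform(0.0, 10.0),     # z (m)
            rng.uniform(0.0, 1.0e5),    # p (N/m2)
            rng.uniform(0.0, 5.0))      # v (m/s)
           for _ in range(1000)]

# Energy head from the Bernoulli equation.
E = [z + p / GAMMA + v ** 2 / (2.0 * G) for z, p, v in samples]

z_max = max(s[0] for s in samples)
p_max = max(s[1] for s in samples)
v_max = max(s[2] for s in samples)
E_max = max(E)

# Coefficients of the nondimensionalized relation.
C1 = z_max / E_max
C2 = p_max / (GAMMA * E_max)
C3 = v_max ** 2 / (2.0 * G * E_max)


def rmse(predicted, observed):
    n = len(observed)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(predicted, observed)) / n)


E_bar = [e / E_max for e in E]
predicted = [C1 * (z / z_max) + C2 * (p / p_max) + C3 * (v / v_max) ** 2
             for z, p, v in samples]
error = rmse(predicted, E_bar)
print(C1, C2, C3, error)
```

The RMSE is zero up to floating-point rounding, which is the target a symbolic regression run on the normalized data would be expected to approach.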

Figure 5. Rapid Convergence of GSR for Bernoulli Equation Example (average RMSE plotted against generation, 0 to 100).

GP APPLICATION IN REAL-TIME RUNOFF FORECASTING

A forecasting system is a system that takes information on the past and current states of meteorological conditions and those of the catchment as inputs and forecasts the catchment's response into the future. In real-time forecasting, however, the original forecast values may be updated or modified as measured data become available and, thus, prediction errors can be determined and used to improve forecasting. In real-time runoff forecasting with rainfall-runoff simulation models, forecasted rainfall time series up to the desired runoff forecast horizon must be available. The required rainfall time series within the runoff forecast horizon may be estimated with, for example, a nonlinear prediction method. In this study, the measured rainfall time series, at any runoff forecast horizon, is made available to evaluate the performance of the proposed GSR based error updating scheme.

The focus of this study is to: (1) compare the forecasts of a calibrated rainfall-runoff model (e.g., MIKE 11/NAM) with and without the GSR based error updating scheme; and (2) suggest how far in the future (i.e., the maximum forecast horizon) the GSR-based error updating scheme can be used with confidence.

The catchment simulation model used in this study, NAM, is a lumped, conceptual rainfall-runoff model which forms part of the MIKE 11 river modeling system (Havno et al., 1995). The NAM model has been developed by the Hydrological Section of the Institute of Hydrodynamics and Hydraulic Engineering at the Technical University of Denmark (Nielsen and Hansen, 1973). The model can be defined as a deterministic, conceptual, lumped type model with moderate input data requirement. The MIKE 11/NAM model consists of a set of linked mathematical statements describing, in a simplified quantitative form, the behavior of the land phase of the hydrological cycle. The input data requirements are the catchment size, precipitation, potential evapotranspiration, and temperature (if snow modeling is included). It operates by continuously accounting for the moisture content in the snow, surface, subsurface and ground water storages. The model can be calibrated using historical data by adjusting the model parameters. In the present application, the eight most important model parameters were calibrated.

Figure 6 shows the schematic diagram of the proposed error updating using GSR. The MIKE 11/NAM model is first used to simulate the discharge, QSIM, for the whole period of interest based on the meteorological data. The proposed procedure is then used to compute the prediction error, $\varepsilon_t$, by comparing the simulated discharge, $QSIM_t$, with the observed discharge, $QOBS_t$, for time t. The new simulated or improved discharge, $QIMP_t$, is computed by adjusting $QSIM_t$ for each forecast lead-time within the forecast horizon.

Figure 6. Schematic Diagram of Updating Procedure. [Flow: meteorological data such as rainfall feed the rainfall-runoff simulation model (MIKE 11/NAM), whose output is the simulated runoff (QSIM); QSIM and the observed runoff (QOBS) enter the updating procedure (genetic symbolic regression), whose output is the improved runoff (QIMP).]

Mathematically, the measured discharge, $QOBS_t$, at time t, can be expressed as:

$$QOBS_t = QSIM_t + \varepsilon_t \tag{6a}$$

or

$$\varepsilon_t = QOBS_t - QSIM_t \tag{6b}$$

GSR is used to infer the functional relationship, F( ), between the simulated discharges and the past and present simulation errors. For a lead time of one hour, the functional relationship for GSR prediction error, $\varepsilon_{t+1}$, may be expressed as follows:

$$\varepsilon_{t+1} = F\{QSIM_{t+1}, QSIM_t, \ldots, QSIM_{t-4}, \varepsilon_t, \ldots, \varepsilon_{t-4}\} \tag{7a}$$

and the forecast improved discharge, $QIMP_{t+1}$, can be obtained from:

$$QIMP_{t+1} = QSIM_{t+1} + \varepsilon_{t+1} \tag{7b}$$

For lead times of 2, 3, ..., a hours, the recursive form of Equation (7) can be written as:

$$\varepsilon_{t+a} = F\{QSIM_{t+a}, \ldots, QSIM_{t+a-5}, \varepsilon_{t+a-1}, \ldots, \varepsilon_{t+a-5}\} \tag{8a}$$

$$QIMP_{t+a} = QSIM_{t+a} + \varepsilon_{t+a} \tag{8b}$$

QSIM and $\varepsilon$ of the immediate past five time steps are included in the functional relationship since the catchment's time of concentration varies up to a maximum of five hours (i.e., five time steps) (WMO, 1992).

It should be noted that the values of $\varepsilon$ in Equation (8a) may be either the actual errors at instances when measured data are available or GSR derived errors.

The real-time flood forecasting with updating procedure for a one-hour lead-time can be summarized as follows:

1. The NAM model is first calibrated using an automatic calibration routine [e.g., Accelerated Convergence Genetic Algorithm (ACGA)] (Liong et al., 1998) on the entire runoff period from 1972-1974.

2. The prediction errors, denoted by $\varepsilon$, between the NAM simulated and observed runoff for each time interval, are computed.

3. Ten storm events, representing high flow regimes with minimum discharge of 4 m3/s, are selected from the calibration period, for the symbolic regression using GP. This minimum discharge criterion is also used in the selection of the verification data set.

4. GSR is then used to derive the functional relationship between the present prediction error, $\varepsilon$, and the NAM simulated discharge, QSIM, and the past prediction errors, $\varepsilon$, as given in Equation (7a).

5. The improved simulated discharge, QIMP, is finally calculated, using Equation (7b), and compared with the measured discharge.
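The recursive use of the error model for lead times beyond one hour can be sketched as below. Everything named here is an assumption for illustration: the function `forecast_improved`, the window length of five lags, the argument order passed to `F`, and in particular the stand-in error model `decay`, which is a simple persistence rule and not the relationship inferred by GSR in this study. Errors up to time t are taken as measured; for later steps the predicted errors are fed back in place of measurements.

```python
def forecast_improved(qsim, errors, t, horizon, F, n_lags=5):
    """Return improved discharges for lead times 1..horizon (sketch).

    qsim    : simulated discharges, indexable up to t + horizon
    errors  : known errors QOBS - QSIM for indices 0..t
    F       : error model taking (recent QSIM values, recent errors),
              both ordered from most recent to oldest
    """
    eps = list(errors[: t + 1])          # measured errors up to time t
    qimp = []
    for a in range(1, horizon + 1):
        k = t + a                        # forecast time index
        q_window = qsim[k - n_lags: k + 1][::-1]   # QSIM at k, k-1, ...
        e_window = eps[k - n_lags: k][::-1]        # errors at k-1, k-2, ...
        e_hat = F(q_window, e_window)
        eps.append(e_hat)                # predicted error reused at next step
        qimp.append(qsim[k] + e_hat)     # improved discharge
    return qimp


def decay(q_window, e_window):
    """Hypothetical stand-in error model: half of the most recent error."""
    return 0.5 * e_window[0]


qsim = [1.0] * 10
errors = [0.0, 0.0, 0.0, 0.0, 0.8]       # last measured error at t = 4
qimp = forecast_improved(qsim, errors, t=4, horizon=3, F=decay)
print(qimp)
```

When a new observation arrives, the corresponding predicted error would be replaced by the measured one before the next forecast is issued, which is how different updating intervals could be emulated with this sketch.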

In this study, the catchment under consideration is the Orgeval catchment in France (Figure 7), which has been studied extensively in the World Meteorological Organization's intercomparison project (WMO, 1992). The catchment is located about 80 km east of Paris and the main river that drains the catchment runoff is the Orgeval. The catchment has an area of about 104 km2. The catchment comprises mainly rural area, with only 1 percent of the total area urban and 18 percent of the total area forest.

Figure 7. Location of the Orgeval Catchment.


In this study, ten storm events (denoted as storms S1-S10) from a 1972-1974 hourly flow record were selected for training the GSR (Figure 8a) while a total of six storm events (denoted as storms S11-S16) between 1979-1980 (Figure 8b) were selected and used for the verification of the updating procedure. Figures 9a and 9b show the observed and NAM simulated hydrographs for two of the selected GSR training events. These ten storm events represent high flow regimes that are the focus of GSR training in this study. It should be noted that the maximum peak discharge of the GSR training storms is 7.38 m3/s while those of the verification storm events range from 10 m3/s to 29 m3/s.

Figure 8. Hydrographs of (a) Training and (b) Verification Storm Events.

Following the example in the previous section, both the dependent variable, $\varepsilon$, and the independent variables, QSIM and $\varepsilon$, are nondimensionalized using their respective maximum values. Therefore, the terminal set in GSR for the one-hour lead time, for example, is given by the normalized values of the QSIM and $\varepsilon$ terms in Equation (7a), and the functional set is given by the basic algebraic operators {+, -, *, /}. Henceforth, all variables used in the study are normalized variables and the bar sign on each variable is therefore omitted. The objective function searches for the minimum of the root mean square error (RMSE) of the predicted error, $\varepsilon$. In this study, the GP program, GPkernel, ran for 30 minutes on a Pentium II 300 PC. The other GP relevant parameters and their values are shown in Table 2. The population size of the parent and children are both set at 3000. It is to be noted that, from various runs attempted, it was difficult to achieve good prediction accuracy with a smaller population size.

Parameter Value

Size of Parent 3000

Size of Children 3000

Tournament Size 3

Crossover Rate 1.0

Mutation Rate 0.3

Maximum Initial Tree Size 15

Maximum Tree Size 30

Discussion and Analysis of Results

The best functional form, with the minimumRMSE, resulting from GSR is as follows:

or

= 0.009 + 1.61lc —O.644ci+0.087s(QSIM_2 —QSIM_i) (9a)

QIMP+1 = QSIM+1 + 0.009 + 1.6k1 —O.644g+0.087Et(QSIMt_2 —QSIM_i) (9b)

Equation (9a) shows a certain degree of similaritywith the commonly used autoregressive form with theexception of the fourth term, an interaction termbetween the simulated discharges and a predictionerror. This interaction term is significant in rectifyingunderprediction or overprediction trends of the simu-lation model. Equation (9b) also shows that only sim-ulated discharges andlor prediction errors of up tothree previous time-steps are important. This impliesthat, for data used to derive the above functional rela-tionship, the catchment's time of concentration maybeabout three hours.

Table 3 shows the root mean square errors (RMSE) of the various prediction horizons for each of the verification storm events. From this table, it can be seen that the RMSE of each event is lower than, or of the same order of magnitude as, that of the simulation model (NAM). Up to a four-hour lead time, for all storm events considered except storm event S11, the RMSE values of NAM+GSR are consistently lower than those resulting from NAM only. Thus, the proposed updating GSR can be used up to a lead time of four hours with high confidence. Figures 10 and 11 show the performance of the proposed procedure with different updating frequencies of two hours, four

TABLE 2. Genetic Programming Parameters Used in Real-Time Runoff Forecasting Example.

Figure 9. Comparison of Observed and Simulated Hydrographs for Two Training Storm Events: (a) S2 and (b) S7.


TABLE 3. Root Mean Square Error of Testing Storms for Different Prediction Lead-Times.

Lead-Time    Average RMSE
(hours)      Storm S11   Storm S12   Storm S13   Storm S14   Storm S15   Storm S16   Average for Six Storms
1            0.248       0.163       0.188       0.679       0.635       0.596       0.418
(NAM)        (0.878)     (2.675)     (1.810)     (4.515)     (3.199)     (1.657)     (2.456)
2            0.561       0.409       0.377       1.624       1.190       0.960       0.853
3            0.847       0.679       0.607       2.609       1.546       1.274       1.260
4            1.046       0.943       0.838       3.557       1.749       1.554       1.614
5            1.116       1.190       1.072       4.373       1.798       1.746       1.882
6            1.075       1.415       1.270       4.991       1.821       1.874       2.074
7            1.024       1.617       1.440       5.274       1.971       1.903       2.205
8            0.994       1.789       1.573       5.331       2.251       1.805       2.290
9            0.991       1.923       1.686       5.641       2.504       1.656       2.400

hours and six hours for two verification storm events. They show clearly that the performance of the GSR error updating method is acceptable for all the updating frequencies.
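The reading of Table 3 can be checked with a short script. The sketch below copies the RMSE values from the table and, for each storm, counts how many consecutive lead times (starting at one hour) give an RMSE for NAM+GSR below that of NAM alone. The variable and function names are illustrative assumptions.

```python
# RMSE of NAM without updating, per verification storm (Table 3, row "(NAM)").
NAM_ONLY = {
    "S11": 0.878, "S12": 2.675, "S13": 1.810,
    "S14": 4.515, "S15": 3.199, "S16": 1.657,
}

# RMSE of NAM+GSR for lead times of 1 to 9 hours (Table 3, one list per storm).
NAM_GSR = {
    "S11": [0.248, 0.561, 0.847, 1.046, 1.116, 1.075, 1.024, 0.994, 0.991],
    "S12": [0.163, 0.409, 0.679, 0.943, 1.190, 1.415, 1.617, 1.789, 1.923],
    "S13": [0.188, 0.377, 0.607, 0.838, 1.072, 1.270, 1.440, 1.573, 1.686],
    "S14": [0.679, 1.624, 2.609, 3.557, 4.373, 4.991, 5.274, 5.331, 5.641],
    "S15": [0.635, 1.190, 1.546, 1.749, 1.798, 1.821, 1.971, 2.251, 2.504],
    "S16": [0.596, 0.960, 1.274, 1.554, 1.746, 1.874, 1.903, 1.805, 1.656],
}


def longest_beneficial_lead(updated, baseline):
    """Number of consecutive lead times, counted from one hour, for which
    the updated forecast has a strictly lower RMSE than the baseline."""
    lead = 0
    for value in updated:
        if value >= baseline:
            break
        lead += 1
    return lead


beneficial = {
    storm: longest_beneficial_lead(NAM_GSR[storm], NAM_ONLY[storm])
    for storm in NAM_ONLY
}

if __name__ == "__main__":
    for storm, lead in beneficial.items():
        print(f"{storm}: NAM+GSR beats NAM up to a lead time of {lead} h")
```

Every storm other than S11 reaches at least four hours, and S11 reaches three hours, which matches the four-hour limit stated in the text.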

The World Meteorological Organization (WMO) conducted a workshop in 1988 and published a report entitled "Simulated Real-Time Inter-Comparison of Hydrological Models" (WMO, 1992). The WMO study compared the performances of 14 different simulation models and updating procedures. The study found that the error updating procedure NAMS11 (Havno et al., 1995) and the state updating procedure NAMKAL (Storm et al., 1988) yielded the best performance on the French Orgeval catchment. Briefly, NAMS11 applies: (1) the NAM rainfall-runoff simulation model and the MIKE11 hydrodynamic module; and (2) an error correction technique based on a first-order autoregressive model. NAMKAL is a modified NAM model, reformulated in state space form and updated with an extended Kalman filtering algorithm.
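For readers unfamiliar with error correction by a first-order autoregressive model, the idea can be sketched as follows. This is a generic textbook AR(1) scheme under stated assumptions, not the NAMS11 implementation: the coefficient is estimated by least squares from past errors, the error is assumed to have zero mean, and the function names are hypothetical.

```python
def fit_ar1(errors):
    """Least-squares estimate of phi in e[t] = phi * e[t-1] (zero-mean AR(1))."""
    if len(errors) < 2:
        raise ValueError("at least two past errors are required")
    numerator = sum(curr * prev for prev, curr in zip(errors, errors[1:]))
    denominator = sum(prev * prev for prev in errors[:-1])
    if denominator == 0.0:
        raise ValueError("past errors are all zero; phi is undefined")
    return numerator / denominator


def ar1_error_forecast(last_error, phi, lead):
    """Error expected `lead` steps ahead: phi**lead times the latest error."""
    return (phi ** lead) * last_error


def corrected_forecast(simulated, last_error, phi, lead):
    """Simulated discharge plus the AR(1) error forecast for that lead time."""
    return simulated + ar1_error_forecast(last_error, phi, lead)


if __name__ == "__main__":
    # Hypothetical past errors (observed minus simulated discharge).
    past_errors = [1.0, 0.5, 0.25, 0.125]
    phi = fit_ar1(past_errors)
    print("phi =", phi)
    print("2-step error forecast =", ar1_error_forecast(past_errors[-1], phi, 2))
```

When the absolute value of the coefficient is below one, the correction decays geometrically with lead time, so such a scheme contributes most at short forecast horizons.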

In the WMO study, forecasts were updated every fourth hour. Thus, their results are now compared with this study's proposed GSR-based updating scheme for a four-hour runoff forecast horizon. The choice of the updating interval coincides with the conclusion drawn earlier from Table 3. Figures 12(a) and 12(b) show the average RMSE of two verification storm events (storms S12 and S15) resulting from the various updating schemes: NAM+GSR, NAMS11, and NAMKAL. These two storm events are the same as those chosen in Figures 10 and 11. Figure 12(a) shows that for storm event S12, the proposed NAM+GSR performs better in the first three hours than NAMS11 and NAMKAL, while in the fourth hour they all perform equally. Figure 12(b), however, shows clearly that for storm event S15, NAM+GSR is consistently better than the two other techniques.

CONCLUSIONS

A relatively new evolutionary technique, known as genetic programming (GP), has been introduced. GP was used to evolve codes for the solution of problems. A simple example of the Bernoulli equation was used to illustrate how GP symbolically regresses or infers the relationship between the input and output variables. An important conclusion from this study is that non-dimensionalizing the variables prior to the symbolic regression process significantly enhances the success of GSR.

GP was then applied to the problem of real-time runoff forecasting for the Orgeval catchment in France. GP functions as an error updating procedure complementing the rainfall-runoff model, MIKE11/NAM. Ten storm events were used to infer the relationship between the NAM simulated runoff and the corresponding prediction error. That relationship was subsequently used for real-time forecasting of six storm events.

The results indicate that the proposed methodology is able to forecast different storm events with great accuracy for different updating intervals. The forecast hydrograph performs well even for a long forecast horizon of up to nine hours. However, for practical applications in real-time runoff forecasting, the

Figure 10. Updating Every a Hours and Forecasting Up to Six Hours for Verification Storm Event S12: (a) a = Two Hours; (b) a = Four Hours; and (c) a = Six Hours.

Figure 11. Updating Every a Hours and Forecasting Up to Six Hours for Verification Storm Event S15: (a) a = Two Hours; (b) a = Four Hours; and (c) a = Six Hours.

Figure 12. Average RMSE of Two Verification Storm Events: (a) S12 and (b) S15. [Legend entries: NAM without updating, GP+GSR, NAMS11, NAMKAL; horizontal axis: forecast lead time (hrs).]

ACKNOWLEDGMENTS

This work was jointly sponsored by the National University of Singapore (under research project RP 3972705) and Danish Hydraulic Institute [under the talent project No. 9800463, Data to Knowledge (D2K), funded by the Danish Technical Research Council (STVF)]. The software tool used in this study, GPkernel, was developed as a part of the D2K effort and is available at http://www.d2k.dk. Part of the work was conducted by the first author during his study leave at Danish Hydraulic Institute (DHI).

LITERATURE CITED

Angeline, P. J. and K. E. Kinnear, 1996. Advances in Genetic Programming 2. MIT Press, Cambridge, Massachusetts.

Babovic, V., 1996. Emergence, Evolution, Intelligence; Hydroinformatics. Balkema Publishers, Rotterdam.

Babovic, V. and M. B. Abbott, 1997a. The Evolution of Equations From Hydraulic Data, Part I: Theory. Journal of Hydraulic Research 35(3).

Babovic, V. and M. B. Abbott, 1997b. The Evolution of Equations From Hydraulic Data, Part II: Applications. Journal of Hydraulic Research 35(3):411-430.

Babovic, V. and M. Keijzer, 2000. Genetic Programming as a Model Induction Engine. Journal of Hydroinformatics 2(1).

Fogel, L. J., A. J. Owens, and M. J. Walsh, 1966. Artificial Intelligence Through Simulated Evolution. John Wiley, New York, New York.

Handley, S., 1994. On the Use of a Directed Acyclic Graph to Represent a Population of Computer Programs. In: Proceedings of the 1994 IEEE World Congress on Computational Intelligence. IEEE Press, pp. 154-159.

Havno, K., M. N. Madsen, and J. Dorje, 1995. MIKE11 - A Generalised River Modelling Package. In: Computer Models of Watershed Hydrology, V. P. Singh (Editor). Water Resources Publications, pp. 733-782.

Holland, J. H., 1975. Adaptation in Natural and Artificial Systems. University of Michigan Press, Ann Arbor, Michigan.

Keijzer, M. and V. Babovic, 1999. Dimensionally Aware Genetic Programming. In: Proceedings of the Genetic and Evolutionary Computation Conference, GECCO-99, W. B. Daida, A. E. Eiben, M. H. Garzon, V. Honavar, M. Jakiela, and R. E. Smith (Editors). Morgan Kaufmann, pp. 1065-1076.

Kinnear, K. E., 1994. Advances in Genetic Programming. The MIT Press, Cambridge, Massachusetts.

Koza, J. R., 1992. Genetic Programming: On the Programming of Computers by Means of Natural Selection. The MIT Press, Cambridge, Massachusetts.

Koza, J. R., 1994. Genetic Programming 2: Automatic Discovery of Reusable Programs. The MIT Press, Cambridge, Massachusetts.

Langdon, W. B., 1998. Genetic Programming and Data Structures. Kluwer Academic Publishers, Norwell, Massachusetts.

Lee, G. Y. and A. Suzuki, 1995. Genetic Programming Approach for Time Series Analysis and Prediction. Journal of Graduate School and Faculty of Engineering, University of Tokyo (B) 43(2):201-220.

Liong, S. Y., S. T. Khu, and W. T. Chan, 1998. Derivation of Pareto Front With Accelerated Convergence Genetic Algorithm, ACGA. In: Proceedings of the Third International Conference on Hydroinformatics, V. Babovic and L. C. Larsen (Editors). Volume 2, pp. 889-897.

Lukasiewicz, J., 1957. Aristotle's Syllogistic From the Standpoint of Modern Formal Logic. Clarendon Press, Oxford, United Kingdom.

Nielsen, S. A. and E. Hansen, 1973. Numerical Simulation of Rainfall Runoff Process on a Daily Basis. Nordic Hydrology 4:171-190.

Oakley, N. and E. Howard, 1994. The Application of Genetic Programming to the Investigation of Short, Noisy, Chaotic Data Series. In: Evolutionary Programming, Lecture Notes in Computer Sciences, T. C. Fogarty (Editor). No. 865, Springer-Verlag, pp. 320-332.

Perkis, T., 1994. Stack-Based Genetic Programming. In: Proceedings of the 1994 IEEE World Congress on Computational Intelligence, Vol. 1, IEEE Press, pp. 148-153.

Poli, R., 1996. Parallel Distributed Genetic Programming. Technical Report CSRP-96-15, School of Computer Science, University of Birmingham, United Kingdom.


updating interval should be less than or equal to the time of concentration of the catchment. The results were also compared with two established updating methods, autoregression and the Kalman filter. Comparisons show that the proposed scheme, NAM+GSR, is comparable to these methods for real-time runoff forecasting.


Refsgaard, J. C., 1997. Validation and Intercomparison of Different Updating Procedures for Real-Time Forecasting. Nordic Hydrology 28:65-84.

Rosca, J. P. and D. H. Ballard, 1996. Discovery of Subroutines in Genetic Programming. In: Advances in Genetic Programming 2, P. J. Angeline and K. E. Kinnear (Editors). The MIT Press, Cambridge, Massachusetts, pp. 177-202.

Schwefel, H. P., 1981. Numerical Optimization of Computer Models. John Wiley, Chichester, United Kingdom.

Storm, B., K. H. Jensen, and J. C. Refsgaard, 1988. Estimation of Catchment Rainfall Uncertainty and Its Influence on Runoff Prediction. Nordic Hydrology 19:77-88.

Whigham, P. A. and P. F. Crapper, 1999. Modelling Rainfall-Runoff Relationships Using Genetic Programming. Special Issue of Journal of Mathematical and Computer Modelling (in press).

WMO, 1992. Simulated Real-Time Inter-Comparison of Hydrological Models. WMO Operational Hydrology Report No. 38, WMO No. 779. World Meteorological Organization, Geneva, Switzerland.
