Cancun Arima


Time Series Analysis of Water Quality Parameters in an Estuary using Box-Jenkins ARIMA Models and Cross-Correlation Techniques

Hongbing Sun and Manfred Koch Department of Geology, Temple University, Philadelphia, PA, 19122

The Apalachicola Bay is one of the most productive estuaries in Florida. Variations of salinity in the bay directly influence the aquatic habitats and their productivity. Variables that appear to affect the salinity include tidal elevation, wind and current velocity, precipitation and discharge of the Apalachicola River. ARIMA and dynamic regression transfer models using the Box-Jenkins methodology and crosscorrelation techniques are employed to analyze the various data time series. The rational distributed lag transfer functions between hourly variations of tidal water levels and salinity allow the forecasting of short-term fluctuations of the salinity, whereas the multivariate correlation analyses of the daily salinity with the river discharge, the wind stresses, the water levels and currents and the precipitation shed light on the important control variables. Several interesting conclusions as to the hydrodynamics and the water quality of the bay can be drawn from the identification of auto- and crosscorrelations and the appropriate ARIMA models. Fluctuations of tidal water levels result only in short-term periodic variations of salinity, with a linear transfer function whose highest-order coefficient is at lag two. The crosscorrelation analysis shows that the Apalachicola River, being the major fresh water source of the bay, strongly affects the currents and the salinity in the bay area over the long term. Though regional precipitation controls the amount of river discharge and groundwater seepage, its effect on the daily variations of salinity is statistically insignificant, in contrast to daily wind stress. Salinity is positively correlated with western currents since most of the oceanic flow enters the bay from the east. A lag between the daily discharge and salinity indicates that up to a week is required for the peak of the inflow fresh water to mix or flush through the exit of the bay.

1. Introduction

The Apalachicola Bay is located in northwest Florida, adjacent to the Gulf of Mexico (Fig. 1), and is one of the most productive estuaries both in Florida and the entire northern hemisphere. It yields 90 percent of the oysters consumed in Florida and 10 percent of those consumed in the United States (Johnson, 1993). Variations of salinity in the bay area directly control seafood production, as well as the overall ecological system of the bay, including peripheral marsh zones which serve as important fish nursery areas. The main factors controlling the variations in salinity are the discharge of the Apalachicola River, the seawater level, the regional precipitation, the activities of winds and gulf currents, and to a lesser extent, the surface water runoff and the groundwater recharge. The critical salinity concentrations that arise from these mixing processes create a unique habitat environment for oysters to grow in the Apalachicola Bay. Minor seasonal variations of the above control variables are often sufficient to destabilize temporarily the balance of the bay ecosystem in a detrimental manner (Johnson, 1993).

Using methods of structural univariate time series analysis within the Box-Jenkins framework (Box and Jenkins, 1976), including linear transfer function models, together with simple correlation analysis, the present paper attempts to quantify some of the interacting variables that affect the short- and long-term variations of the salinity of the bay. An extension of the correlation approach to a full multivariate analysis of the control variables, and long-term forecasting of the salinity variations using a Kalman filter, is carried out by Sun et al. (1996).

2. Data

Both salinity and water elevation data were collected at various stations by the Northwest Florida Water Management District (NWFWMD) at half-hour intervals from April 1993 to August 1994 and are shown partly in Fig. 2. Daily precipitation and discharges of the Apalachicola River at the Sumatra gauge were obtained from the Florida Climate Center and the USGS, respectively, and are depicted in Fig. 3. Salinity records from a total of 20 stations were analyzed and short gaps in the data series were filled using cubic splines. Missing tidal records were generated by means of a least-squares harmonic analysis that included 25 major tidal constituents. Data records with long gaps were eliminated entirely.
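The least-squares harmonic reconstruction of missing tidal records can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure actually used in the study: only two constituents (M2 and S2, with their standard periods of about 12.42 h and 12.00 h) are fitted instead of 25, the water-level record is synthetic, and the function names `fit_harmonics` and `predict` are hypothetical.

```python
import math

# Periods in hours of two semi-diurnal constituents (illustrative subset;
# the study itself used 25 constituents).
PERIODS_H = {"M2": 12.4206, "S2": 12.0}

def design_row(t):
    """Row [1, cos(w1 t), sin(w1 t), cos(w2 t), sin(w2 t)] for time t in hours."""
    row = [1.0]
    for period in PERIODS_H.values():
        w = 2.0 * math.pi / period
        row += [math.cos(w * t), math.sin(w * t)]
    return row

def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x

def fit_harmonics(times, levels):
    """Least-squares coefficients from the normal equations (A^T A) x = A^T y."""
    rows = [design_row(t) for t in times]
    k = len(rows[0])
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    aty = [sum(r[i] * y for r, y in zip(rows, levels)) for i in range(k)]
    return solve(ata, aty)

def predict(coeffs, t):
    """Evaluate the fitted harmonic series at time t (hours)."""
    return sum(c * v for c, v in zip(coeffs, design_row(t)))

def true_level(t):
    """Synthetic water level (m) used only to exercise the fit."""
    return (0.2 + 0.5 * math.cos(2 * math.pi * t / 12.4206 - 0.3)
            + 0.2 * math.sin(2 * math.pi * t / 12.0))

# Half-hourly record over 30 days with a one-day gap (hours 240 to 264).
times = [0.5 * i for i in range(30 * 48)]
observed = [(t, true_level(t)) for t in times if not 240.0 <= t < 264.0]
coeffs = fit_harmonics([t for t, _ in observed], [y for _, y in observed])
filled = [predict(coeffs, t) for t in times if 240.0 <= t < 264.0]
```

Because the synthetic record contains no noise and lies entirely in the span of the fitted harmonics, the filled values reproduce the hidden ones almost exactly; with real data the residual would reflect weather-driven variations that harmonics cannot capture.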

For a detailed discussion of the raw data and its interpretation with respect to the general hydrodynamics and circulation in the Apalachicola Bay based on observations and numerical models (Jones et al., 1994), see Sun et al. (1996). Here it suffices to state that (1) because of the entrance of the Apalachicola River, the west side of the bay is less salty than the east side, (2) because of the fresh/salt water density stratification, bottom water is saltier than surface water, and (3) currents are generally stronger at the river mouth, the bay's east entrance, and at the exits and cuts of the estuary where the bottom of the bay is mainly covered by sandy sediments (Brooks, 1973). The results of the basic correlation analysis carried out in Section 4 will, in fact, corroborate some of these modeling inferences.

3. Box-Jenkins ARIMA- and linear transfer function models for the short-term fluctuations of salinity, water levels and currents

Univariate time series analysis using Box-Jenkins ARIMA models is a major tool in hydrology and has been used extensively, mainly for the prediction of surface water processes, such as precipitation and streamflow events (cf. Govindasamy, 1991; Maidment, 1993; Salas, 1993). On the other hand, in contrast to economic forecasting, the use of multivariate dynamic regression or transfer models between different variables has been somewhat more limited in hydrological applications (cf. Cuen and Snyder, 1986).

3.1 Univariate Box-Jenkins ARIMA models: Methodology

A general non-seasonal/seasonal ARIMA $(p,d,q)(P,D,Q)_S$ model (with non-seasonal parameters p, d, q, seasonal parameters P, D, Q, and seasonality S) can be written in the following form:

$$\phi_p(B)\,\phi_P(B^S)\,\nabla^d\,\nabla_S^D\, z_t = C + \theta_q(B)\,\theta_Q(B^S)\, a_t \qquad (1)$$

Here $\phi_p(B)$ and $\phi_P(B^S)$ are, respectively, the series of the non-seasonal and seasonal autoregressive (AR) components of order p and P of the time series $z_t$; $\theta_q(B)$ and $\theta_Q(B^S)$ are the moving average (MA) components of the random shock $a_t$. For example, $\phi_p(B)$ can be written as $\phi_p(B) = 1 - \phi_1 B - \phi_2 B^2 - \dots - \phi_p B^p$, and similar expressions hold for the other three series. B is the backshift operator, defined as $B^k z_t = z_{t-k}$. $\nabla^d$ and $\nabla_S^D$ are the non-seasonal and seasonal difference operators, defined as $\nabla^d = (1 - B)^d$ and $\nabla_S^D = (1 - B^S)^D$, respectively. They are used to make the time series $z_t$ stationary, by either non-seasonal or, in the presence of seasonality, seasonal differencing with lag-time S. Finally, C is a constant term of regression (see Bowerman and O’Connel, 1987; and Pankraz, 1991, for details).

As prescribed by the Box-Jenkins formalism, the build-up of the general ARIMA model (1) for a time series requires three stages: identification, diagnostic checking and estimation. The first two stages involve the computation of the total and partial autocorrelations; a graphical check of the latter to determine the positions of spikes and their general decay pattern for the various lagged values $z_{t-k}$ (k=1,...,p); and the application of t-tests of statistical significance to ascertain which lag coefficients do not contribute to the tentative ARIMA model. The latter is then estimated in the third stage by maximum likelihood least squares regression. The Box-Jenkins formalism is implemented computationally, for example, in the SAS statistical package (SAS Inc., 1993), which has been used throughout this study.

3.2 Auto- and crosscorrelations of salinity, water levels and current velocities

As a first step in the ARIMA process, auto- and crosscorrelations are computed for the short-term fluctuations of hourly, semi-diurnal and diurnal variations of the salinity, tidal water levels and flow currents. As shown by the autocorrelations (Figs. 4a and 4b), the major periodicities of the water levels and the salinity are of semi-diurnal nature. The crosscorrelations of the salinity versus both the water levels and the current velocities for different stations projected onto the principal SW-NE elongation of the bay (Figs. 4c and 4d) provide information on possible 'flushing' and fresh/salt water mixing processes that occur in the bay. Thus, the statistically significant crosscorrelations between the salinity and the water levels at station s398 (at the mouth of the Apalachicola River) and s391 (at the isthmus between St. George Island and Dog Island) (Fig. 4c) can be taken as evidence of the tidal mixing of the fresher river water with saline water that enters the bay in the northeast during high tide.
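The sample auto- and crosscorrelations used in the identification stage, together with the approximate 2σ significance limits drawn in correlograms, can be sketched as follows. This is an illustrative sketch only: the two hourly series are synthetic (a 12 h cosine water level and a salinity series that follows it two hours later), the function names are hypothetical, and the limit 2/√n is the usual white-noise approximation rather than anything taken from the study's SAS output.

```python
import math

def autocorr(x, max_lag):
    """Sample autocorrelation r_k for k = 0..max_lag."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    c0 = sum(d * d for d in dev)
    return [sum(dev[t] * dev[t + k] for t in range(n - k)) / c0
            for k in range(max_lag + 1)]

def crosscorr(x, y, lag):
    """Sample crosscorrelation between x_t and y_{t+lag} (lag >= 0)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    dx = [v - mx for v in x]
    dy = [v - my for v in y]
    sx = math.sqrt(sum(d * d for d in dx))
    sy = math.sqrt(sum(d * d for d in dy))
    return sum(dx[t] * dy[t + lag] for t in range(n - lag)) / (sx * sy)

# Synthetic hourly series over 30 days (not bay data).
n = 24 * 30
level = [math.cos(2 * math.pi * t / 12.0) for t in range(n)]
salinity = [20.0 + 3.0 * math.cos(2 * math.pi * (t - 2) / 12.0)
            for t in range(n)]

acf = autocorr(level, 24)
ccf = [crosscorr(level, salinity, k) for k in range(12)]
bound = 2.0 / math.sqrt(n)  # approximate 2-sigma limit under white noise

# Lag (in hours) of the strongest positive autocorrelation beyond lag 0,
# and lag at which salinity follows the water level most closely.
peak_acf_lag = max(range(1, 25), key=lambda k: acf[k])
peak_ccf_lag = max(range(12), key=lambda k: ccf[k])
```

In this toy case the autocorrelation peaks again at the 12 h period and the crosscorrelation peaks at the imposed two-hour delay, which is the kind of pattern one reads off correlograms when looking for semi-diurnal periodicity and response lags.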
On the other hand, although the water levels and the current velocities crosscorrelate significantly (see Sun et al., 1996, for details), the crosscorrelations between the salinity itself at station s395 and the current velocities at station s392 are statistically insignificant (Fig. 4d). This might be due to the fact that both of these stations are located within the bay proper and are thus experiencing less relevant short-term fluctuations of the salinity.

3.3 Transfer function models between salinity and tidal water heights

We forego a discussion of the third step (estimation) in the build-up of the ARIMA models for the time series of salinity, tidal water heights and flow currents (see Sun et al., 1996), and proceed directly to the estimation of linear transfer functions (LTF), or dynamic regression models, between tidal water levels and the salinity. A linear transfer function model between the dependent variable $Y_t$ (salinity) and the independent variable $X_t$ (tidal heights) can be written in the general form

$$Y_t = C + v(B)\, X_t + N_t \qquad (2)$$

where $v(B)$ is the rational distributed lag transfer, or impulse response, function, given more specifically as $v(B) = \beta\, w(B)\, B^b / \delta(B)$, with $w(B)$ and $\delta(B)$ polynomial delay functions of the form $w(B) = 1 + w_1 B + w_2 B^2 + w_3 B^3 + \dots$ and $\delta(B) = 1 + \delta_1 B + \delta_2 B^2 + \delta_3 B^3 + \dots$, and where $B^b$ specifies the “dead-time” of lag b. Thus b is the delay-time before $Y_t$ responds to $X_t$ at all. $w(B)$ then denotes the further-delayed MA response of $Y_t$ to $X_t$, and $\delta(B)$ the AR response of $Y_t$ to itself. C and β are constants and $N_t$ is the stochastic disturbance. The latter can itself be described as a general seasonal ARIMA process (1), with an MA random shock component $a_t$ (cf. Pankraz, 1991).
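How a rational distributed lag transfer function acts on an input series can be sketched with a short recursion. The sketch below is illustrative only: it computes the deterministic part $u_t = v(B) X_t$ from $\delta(B)\, u_t = \beta\, w(B)\, X_{t-b}$ using the sign convention of the polynomials given in the text, the coefficient values are invented for the example, and `transfer_response` is a hypothetical helper name.

```python
def transfer_response(x, beta, w, delta, b):
    """Deterministic part u_t = v(B) x_t with v(B) = beta * w(B) * B^b / delta(B).

    w and delta hold the coefficients after the leading 1, following the
    sign convention w(B) = 1 + w[0] B + w[1] B^2 + ... and
    delta(B) = 1 + delta[0] B + delta[1] B^2 + ...
    Values before the start of the record are taken as zero.
    """
    w_full = [1.0] + list(w)
    u = []
    for t in range(len(x)):
        # Moving-average part acting on the input delayed by the dead time b.
        ma = sum(wk * x[t - b - k]
                 for k, wk in enumerate(w_full) if t - b - k >= 0)
        # Autoregressive part, from delta(B) u_t = beta * w(B) x_{t-b}.
        ar = sum(dj * u[t - 1 - j]
                 for j, dj in enumerate(delta) if t - 1 - j >= 0)
        u.append(beta * ma - ar)
    return u

# A unit impulse at t = 0 reveals the impulse response weights v_0, v_1, ...
# Coefficients are made up: w(B) = 1 + 0.5 B, delta(B) = 1 - 0.6 B, b = 2.
impulse = [1.0] + [0.0] * 7
weights = transfer_response(impulse, beta=2.0, w=[0.5], delta=[-0.6], b=2)
```

The first two weights are zero because of the dead time b = 2; the response then starts at β and decays geometrically through the AR denominator. Adding the constant C and a disturbance series $N_t$ to `u` would give a simulated $Y_t$ in the sense of equation (2).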

From the analysis of the prewhitened input $X_t$ (water heights) and output $Y_t$ (salinity) series, tentative LTF models were identified and fine-tuned after diagnostic checking for various bay stations. Because of the large, non-stationary variability of the salinity, the latter was differenced (operator ∇) and subjected to a so-called Box-Cox natural logarithm transformation. As a representative example, the best LTF model for station s385, which is located at the east entrance of the bay (see Fig. 1), is:

(3)

The highest AR terms in the two denominators of (3) indicate that salinity levels are persistent in themselves for up to two lag-times (hours). The lags 11 and 5 of the MA terms approximate the periods of the three major semi-diurnal tidal constituents M2, S2 and N2, and of the constituents M4 and M6, respectively. Similar forms for the LTF were obtained for most of the other salinity stations in the bay, though with lag-term coefficients that vary from one station location to another.

3.4 One-step ahead forecast of the salinity

Knowing the dynamic regression model (3) between $X_t$ and $Y_t$, an explicit forecast model for $Y_t$ as a function of earlier values of both $Y_{t-k}$ and $X_{t-k}$ can be formulated. This can be considered the fourth and final step of the Box-Jenkins technique, namely forecasting. A special case of forecasting is the one-step ahead forecast, whereby all observed variables $Y_t$ and $X_t$ up to time t are used to predict $Y_t$ at time t+1. The results are shown for stations s392 and s385 in Fig. 5. One notices a close match between the forecast and the observed salinity data series.

4. Correlation analysis of long-term variations of the salinity

One can expect long-term variations of the salinity in the bay to be controlled by (1) the discharge of the Apalachicola River, (2) the wind stress, (3) the water levels, (4) the direction and speed of the bay currents, and (5) the amount of direct precipitation. To analyze these effects quantitatively, crosscorrelations of the daily averages of the salinity with these control variables were computed.

4.1 Discharge of the Apalachicola River

Discharges of the river comprise the major source of fresh water for the bay, with an annual average inflow of 650 m³/s. Figs. 6a and 6b show that for two stations at different locations away from the mouth of the river, unsurprisingly, distributions of the salinity are strongly and negatively correlated with the discharge. Moreover, negative correlation coefficients are stronger in the SW-portion of the bay, with peaks at shorter lag-times (station s385 close to the river mouth) than in the NE-section (station s398). The peak lag-time is indicative of the time it takes to mix or to flush out the fresh water of the river. A similar analysis for the other stations (see Fig. 1) shows the longest lag-times of about six days for stations that are located in the SE 'cul-de-sac' of the bay. The correlation coefficients are generally higher for the surficial than for the bottom salinities. This is due to the fact that surface water is more mobile than bottom water.
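Reading a mixing or flushing time off the lag of the strongest negative crosscorrelation can be sketched as below. Everything in this sketch is an assumption made for illustration: the daily discharge is a synthetic autoregressive series fluctuating around the 650 m³/s mean quoted above, the half-hourly salinity is constructed to respond three days later, and `daily_means` and `lagged_corr` are hypothetical helper names.

```python
import math
import random

def daily_means(values, per_day):
    """Average consecutive blocks of per_day samples (48 for half-hourly data)."""
    return [sum(values[i:i + per_day]) / per_day
            for i in range(0, len(values) - per_day + 1, per_day)]

def lagged_corr(x, y, lag):
    """Correlation between x_t and y_{t+lag}, using overlapping samples only."""
    xs, ys = x[:len(x) - lag], y[lag:]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    return sxy / math.sqrt(sxx * syy)

random.seed(0)
days = 400

# Synthetic daily discharge (m^3/s): AR(1) fluctuations around 650.
discharge = [650.0]
for _ in range(days - 1):
    discharge.append(650.0 + 0.9 * (discharge[-1] - 650.0)
                     + random.gauss(0.0, 30.0))

# Synthetic half-hourly salinity that drops 3 days after discharge rises.
true_lag = 3
half_hourly = []
for d in range(days):
    q = discharge[max(d - true_lag, 0)]
    for _ in range(48):
        half_hourly.append(25.0 - 0.01 * (q - 650.0) + random.gauss(0.0, 0.5))

salinity = daily_means(half_hourly, 48)
corr = {lag: lagged_corr(discharge, salinity, lag) for lag in range(0, 11)}
peak_lag = min(corr, key=corr.get)  # lag of the most negative coefficient
significant = abs(corr[peak_lag]) > 2.0 / math.sqrt(days)
```

The lag of the most negative coefficient recovers the delay built into the synthetic data; applied to observed series, the same scan over lags would yield station-dependent peak lags of the kind discussed above.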

4.2 Wind stress

Wind stress is a major driving force for current velocities in the bay area and should, therefore, have an effect on the mixing and flushing times of the seawater that flows into the bay from the east. Because the bay is shallow (averaging 2.7 meters in depth), wind stress also changes the water levels (tidal ranges) of the bay. In the crosscorrelation analysis, the speeds and directions of the average daily winds are projected onto the elongated SW-NE axis of the bay, such that winds toward the NE are associated with positive signs and those toward the SW with negative signs. As shown in Figs. 6c and 6d, the signs and significance of the crosscorrelation coefficients vary with station location. For s385 at the far east side of the bay, positive coefficients demonstrate that salinities are generally higher when the wind is blowing toward the NE, due to the inflow from East Pass (see Fig. 1). In contrast, for stations in the center of the bay, such as s416, the negative crosscorrelation coefficients are evidence that the salinity is generally higher when the wind is blowing toward the SW.

4.3 Water levels

Because of daily averaging, diurnal and semi-diurnal tidal variations in the water elevations in the bay are eliminated, so that the effects on the salinity of longer-term seasonal fluctuations of the water levels, which arise from persistent weather fronts and storms, can be investigated. Figs. 7a and 7b show that mostly positive correlation coefficients exist between these two variables, i.e. high water levels correlate with high salinity and vice versa. Note again that differences are observed for various station locations in the bay, with station s424 in the center of the bay showing a stronger effect than s386 at the east entrance.

4.4 Current velocities

Similar to the water levels, the long-term, daily-averaged current velocities (projected onto the NE-axis of the bay) are affected mainly by persistent seasonal weather patterns and storm events. However, there is also a strong influence of the river discharge on the NE-current velocities that extends far into the NE-section of the bay, as illustrated in Fig. 7c by the significant positive correlation between these two variables for station s392. This may partly explain the significant negative crosscorrelations that are observed between the NE current velocity and the salinity at that station (and for most of the other sites in the bay) in Fig. 7d.

4.5 Precipitation

Although slightly negative correlations of the salinity with the direct precipitation over the bay area were observed, they turned out to be statistically insignificant according to a t-test. It appears that the small dilution of seawater by fresh rainfall water, especially during storm events, is masked by a rise of the water table (as supported by positive correlations between precipitation and water levels) that brings more salty seawater into the bay. This is also consistent with the positive correlations found earlier between the salinity and the water levels.

5. Conclusions

The results of the correlation analysis described above point out the most significant control variables that appear to affect the long-term variations of the salinity in the Apalachicola Bay. Simple correlation analysis is thus able to provide some valuable insight into the basic physical mechanisms that govern the dynamics and water quality of an estuary without using complex hydrodynamical models. It is demonstrated that the Apalachicola River has a strong effect on both the currents and the salinity in the bay area over the long term. Though precipitation controls the river discharge, its impact on the daily variations of the salinity is statistically insignificant, in contrast to daily wind stress. Salinity is positively correlated with southwestern currents that enter the bay from the east. A lag between the daily river discharge and the salinity indicates that up to a week is required for the peak of the inflow fresh water to flush through the exit of the bay.

However, for a more detailed analysis it is necessary to quantify the interdependencies that exist between the various salinity-controlling variables. For example, precipitation increases the discharge of the river, which in turn has been found to significantly influence the salinity. The winds trigger changes both of the water level and of the current velocities in the bay. Understanding these mutual interdependencies in more detail requires a full multivariate state space modeling approach. Once the state space model is set up, it can be used for the long-term forecasting of the salinity variations by incorporating, for example, a Kalman filter. This has been done by Sun et al. (1996), who show that statistically reliable predictions for the salinity of up to one week can be obtained using this technique.

References

[1] Bowerman, B.L. and R.T. O’Connel, Time Series Forecasting, Duxbury Press, Boston, MA, 1987.
[2] Box, G.E.P. and G.M. Jenkins, Time Series Analysis: Forecasting and Control, Holden-Day, San Francisco, CA, 1976.
[3] Cuen, R.H. and W.M. Snyder, Hydrological Modeling: Statistical Methods and Applications, Prentice Hall, Englewood Cliffs, NJ, 1986.
[4] Brooks, H., Geological Oceanography, In: A Summary of Knowledge of the Eastern Gulf of Mexico, FIO, St. Petersburg, FL, 1973.
[5] Govindasamy, R., Univariate Box-Jenkins forecasts of water discharge in the Missouri river, Int. Journ. Wat. Resour. Develop., 7, 168-175, 1991.
[6] Jones, W.K., B. Galpin, T.S. Wu and R.H. Weisberg, Preliminary Circulation Simulations in Apalachicola Bay, FL, Water Resources Special Report, NWFWMD, Tallahassee, FL, 1994.
[7] Johnson, V., Apalachicola Bay: Endangered Estuary, Florida Water, 2, 1993.
[8] Maidment, D.R., Handbook of Hydrology, McGraw Hill, New York, NY, 1993.
[9] Pankraz, A., Forecasting with Dynamic Regression Models, Wiley-Interscience Publication, New York, NY, 1991.
[10] SAS Institute Inc., SAS/ETS User's Guide, 2nd Edition, Cary, NC, 1993.
[11] Salas, J.D., Analysis and modeling of hydrological time series, In: Handbook of Hydrology, Maidment, D.R. (ed), McGraw Hill, New York, NY, 1993.
[12] Sun, H., M. Koch and J. K. William, Multivariate analysis of salinity variations and forecasting with Kalman filter techniques in the Apalachicola Bay, Florida, Journ. Hydraulic Eng., 1996 (In Press).
[13] USGS, Water Resources Data Florida, Water Year 1993-1994, USGS, Tallahassee, FL, 1994.

Figures

Figure 1: Study area of the Apalachicola Bay, showing the locations of the current- and salinity meters, tidal stations, and wind- and river streamflow gauges.

Figure 2. Hourly salinity and tidal elevation time series from 4/1/1993 to 8/1994. a). Salinity at station s386; b). Salinity at station s392; c). Water levels at station s391. The dotted line indicates data filled in by regression of the major tidal harmonics.

Figure 3. Average daily time series data for the five major variables which affect the salinity variations from 4/1/1993 to 8/31/1994. a). Precipitation; b). Apalachicola River discharge; c). water levels for 3 selected stations; d). wind speed projected onto the SW-NE 63° major axis of the bay (positive in NE, negative in SW direction); e). current velocity for station s392 projected onto the SW-NE 63° major axis.

Figure 4. Left column: Autocorrelations of the hourly tidal water levels and salinity for two stations. a) Water level at s391; b) salinity at s389. Right column: Crosscorrelations of hourly salinity variation versus tidal elevation and current velocity. c) Salinity at s398 versus water level at s391; d) salinity at s385 versus current at s392. The horizontal lines delineate the 2σ confidence interval of the correlation coefficients.

Figure 5. One-step ahead predicted hourly salinity time series using the LTF model versus observed hourly salinity data for May 1994 to August 1994. a) s385; b) s392

Figure 6: Left column: Crosscorrelations of average daily salinity values at stations s385 (a) and s398 (b) versus Apalachicola River daily stream discharge. Right column: Correlations of salinity versus wind stress at stations s385 (c) and s416 (d).

Figure 7: Left column: Crosscorrelations of daily salinity variations versus daily water levels for stations s386 (a) and s424 (b). Right column: Correlations between currents and river discharge (c) and salinity versus current velocities (d) for s392.