
Design of a Sampling Network for an Estuary in the Colombian Caribbean, Using Geostatistical Methods.

1. INTRODUCTION

In environmental statistics, model-based and design-based approaches are used to solve the problem of determining the sample size and the location of the sampling sites (Caselton & Zidek, 1984; Aldworth & Cressie, 1999; Groenigen, 2000; Caeiro et al., 2003). For spatial-mean prediction over a local region, the ordinary kriging predictor (a model-based approach) is better than classical design-based estimators when an appropriate model is chosen (Aldworth & Cressie, 1999). In this situation, good designs tend to spread points uniformly over the region (McBratney et al., 1981; Olea, 1984; Cox et al., 1997). Accordingly, the problem of designing sampling networks for local estimation reduces to establishing, for regular-grid sampling networks of different sizes, the relationship between the maximum prediction variances and their associated costs. Because kriging variances are influenced by spatial correlation, it is very important to have good estimates of the semivariogram parameters (Groenigen, 2000). When model-based geostatistics (Diggle & Ribeiro, 2000) is assumed, maximum likelihood (ML) is preferred over ordinary least squares for estimating the parameters of the spatial correlation model (Stein, 1999).

Ramón Giraldo H. (1, 2)

(1) Ph.D. Student. Statistical and Operational Research. Polytechnic University of Catalonia, Barcelona, Spain.
(2) Associate Professor. Statistics Department. National University of Colombia, Bogotá, Colombia.

REFERENCES

1. Aldworth, J. & N. Cressie. 1999. Sampling designs and prediction methods for Gaussian spatial processes. In "Multivariate analysis, design of experiments and survey sampling" (S. Ghosh, ed.), pp. 1-54. Marcel Dekker Inc., New York, USA.

2. Caeiro, S., Painho, M., Goovaerts, P., Costa, H. & S. Sousa. 2003. Spatial sampling design for sediment quality assessment in estuaries. Environ. Mod. & Soft., 18: 853-859.

3. Caselton, W. & J. Zidek. 1984. Optimal monitoring networks designs. Statistics & Probability Letters, 2: 223-227.

4. Cox, D., Cox, L. & K. Ensor. 1997. Spatial sampling and the environment: some issues and directions. Environ. Ecol. Stat., 4: 219-233.

5. Diggle, P. & P. Ribeiro. 2000. Model based geostatistics. 14 Sinape, Brasil.

6. Groenigen, J. 2000. The influence of variogram parameters on optimal sampling schemes for mapping by kriging. Geoderma, 97: 223-236.

7. McBratney, A., Webster, R. & T. Burgess. 1981. The design of optimal sampling schemes for local estimation and mapping of regionalized variables I. Computers and Geosciences, 7(4): 331-334.

8. Olea, R. 1984. Sampling design optimization for spatial functions. Math. Geol., 16: 369-392.

9. Ribeiro, P. & P. Diggle. 2001. geoR. Package for geostatistical data analysis. R-NEWS, Vol 1, No 2, 15-18.

10. Stein, M. 1999. Interpolation of spatial data. Some theory for kriging. Springer.

ABSTRACT

A network for monitoring physicochemical and biological variables in the Ciénaga Grande de Santa Marta (CGSM) estuary, located on the Caribbean coast of Colombia, was designed. Initially, a set of 114 sampling points was chosen to measure the variables considered (Fig. 1.a). Based on these data, a spatial autocorrelation structure was estimated for each variable using the Matérn model and maximum likelihood. Some variables were assumed to be Gaussian; others had to be transformed in order to obtain Gaussian processes. Then, for networks of different sizes, the kriging prediction variances were calculated using the fitted autocorrelation models. Comparing the prediction variances of the different networks with their associated costs made it possible to establish a set of sampling sites that, at a reasonable cost, substantially reduces the prediction error for the variables of interest.

Key Words: Estuary, geostatistics, Gaussian processes, ML estimation, sampling networks.

3. RESULTS

The fitted Matérn models (Table 1) show strong spatial dependence for some variables (temperature, nitrites, salinity) in the area. The ranges are relatively large, given that the distance between the extreme north and south of the system (the longest distance) is not more than 20 km. The nugget was not greater than 50% of the sill. This is desirable if the spatial correlation model is to describe reality adequately (Caeiro et al., 2003).
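The nugget-to-sill statement can be checked with a few lines of arithmetic. The sketch below assumes that the two middle columns of Table 1 are the partial sill and the nugget (the parameterization reported by geoR), so that the total sill is their sum; the pairing of the last five rows with variable names follows the reading order of the table and is also an assumption.

```python
# (variable, partial sill sigma^2, nugget tau^2), values as read from Table 1.
# Assumption: the pairing of names and values for the first five rows is inferred.
table1 = [
    ("Depth", 0.13, 0.00),
    ("Temperature", 9.03, 0.14),
    ("Salinity", 21.8, 0.00),
    ("Log (Oxygen)", 0.20, 0.00),
    ("Suspended Solids", 2158.0, 0.01),
    ("Log (Nitrites)", 0.71, 0.18),
    ("Silicates", 2089.0, 1810.0),
    ("Chlorophyll a", 1271.0, 0.00),
    ("Chlorophyll c", 7.76, 0.00),
]

# Share of the total sill (partial sill + nugget) that is due to the nugget.
nugget_share = {name: tau2 / (sigma2 + tau2) for name, sigma2, tau2 in table1}
largest = max(nugget_share, key=nugget_share.get)

for name, share in nugget_share.items():
    print(f"{name:18s} {100 * share:5.1f}%")
```

Under this reading, silicates is the variable with the largest nugget share, and it remains below 50% of the total sill.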

2. MATERIALS AND METHODS

The information used for the analysis was obtained in March 1997 at the CGSM (Fig. 1). Water samples from the surface of the water column were analyzed for the following variables: temperature (°C), salinity, total suspended solids (mg l⁻¹), depth (m), silicates (µmol l⁻¹), chlorophyll “a” (µg l⁻¹), dissolved oxygen (mg l⁻¹), nitrites (µmol l⁻¹) and chlorophyll “c” (µg l⁻¹). Between 103 and 114 observations were obtained for each variable. The data were collected throughout the system by systematic sampling on squares of 4 km². For each variable, the spatial autocorrelation structure was estimated by ML, assuming Gaussian processes and correlation models of the Matérn family (Diggle & Ribeiro, 2004). Dissolved oxygen and nitrites were log-transformed, and chlorophyll “c” was transformed with a Box-Cox transformation with λ = 0.35. Sampling networks were simulated with distances of 2 (the observed network), 3, 4, 5 and 6 km between points (Fig. 1), and kriging prediction over 1000 unsampled points was carried out with each one. The corresponding mean prediction variances of each variable were estimated and related to the costs associated with each sampling density. The final decision on the proposed sampling network was based on practical criteria founded on the relationship between prediction variance and cost. The analysis was carried out using the geoR package (Ribeiro & Diggle, 2001).
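The comparison of networks described above can be illustrated with a minimal sketch in plain Python. It is not the geoR code used in the study: it assumes a hypothetical 12 km square region instead of the estuary outline, 60 random prediction points instead of 1000, and an exponential covariance (the Matérn model with κ = 0.5) with made-up parameters sigma2, tau2 and phi. It computes the mean ordinary kriging variance, including the nugget, for regular grids of each spacing.

```python
import math
import random

def exp_cov(h, sigma2, phi):
    # Matern covariance with kappa = 0.5, i.e. the exponential model.
    return sigma2 * math.exp(-h / phi)

def solve(a, b):
    # Gauss-Jordan elimination with partial pivoting; b has one column per right-hand side.
    n = len(a)
    m = [ra[:] + rb[:] for ra, rb in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        p = m[col][col]
        m[col] = [v / p for v in m[col]]
        for r in range(n):
            f = m[r][col]
            if r != col and f != 0.0:
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [row[n:] for row in m]

def mean_ok_variance(sites, targets, sigma2, tau2, phi):
    # Ordinary kriging system: [C 1; 1' 0] [w; mu] = [c0; 1], solved for all targets at once.
    n = len(sites)
    a = [[0.0] * (n + 1) for _ in range(n + 1)]
    for i in range(n):
        for j in range(n):
            a[i][j] = exp_cov(math.dist(sites[i], sites[j]), sigma2, phi)
        a[i][i] += tau2          # nugget on the diagonal
        a[i][n] = a[n][i] = 1.0  # unbiasedness constraint
    b = [[exp_cov(math.dist(s, t), sigma2, phi) for t in targets] for s in sites]
    b.append([1.0] * len(targets))
    x = solve(a, b)
    total = 0.0
    for t in range(len(targets)):
        # Kriging variance: C(0) - w'c0 - mu (the last row of b is 1, so it adds mu).
        total += sigma2 + tau2 - sum(x[i][t] * b[i][t] for i in range(n + 1))
    return total / len(targets)

random.seed(1)
extent = 12000  # metres; hypothetical square region
targets = [(random.uniform(0, extent), random.uniform(0, extent)) for _ in range(60)]

variances = {}
for spacing in (2000, 3000, 4000, 5000, 6000):
    sites = [(x, y) for x in range(0, extent + 1, spacing)
             for y in range(0, extent + 1, spacing)]
    # sigma2, tau2 and phi are illustrative values, not estimates from Table 1.
    variances[spacing] = mean_ok_variance(sites, targets, sigma2=1.0, tau2=0.1, phi=3000.0)
    print(spacing, len(sites), round(variances[spacing], 3))
```

Because the 4 km and 6 km grids are subsets of the 2 km grid in this toy layout, their mean kriging variance cannot be smaller than that of the 2 km grid; relating these variances to a cost per sampling site gives the kind of variance-cost comparison used for the final decision.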

Figure 2. Increase in precision (% reduction in standard prediction error) of each sampling network with respect to the 6000 m network (fewest sampling points). [Plot: x-axis, distance between sampling points (2000 m, 3000 m, 4000 m, 5000 m, 6000 m); y-axis, % increase in precision with respect to the 6000 m network (0 to 40); series: salinity, temperature, oxygen, chlorophyll “a”, silicates, chlorophyll “c”, depth, solids, nitrites.]

Figure 3. Sampling cost for each variable in the five sampling networks. [Plot: x-axis, distance between sampling points (2000 m, 3000 m, 4000 m, 5000 m, 6000 m); y-axis, sampling cost in US $ (0 to 500); series: salinity, temperature, oxygen, chlorophyll “a”, silicates, chlorophyll “c”, depth, solids, nitrites.]

Variable                      φ (range)   σ² (sill)   τ² (nugget)   κ
Depth                         1720        0.13        0.00          0.50
Temperature                   14520       9.03        0.14          0.70
Salinity                      10000       21.8        0.00          0.60
Log (Oxygen)                  11390       0.20        0.00          0.55
Suspended Solids              8296        2158        0.01          0.21
Log (Nitrites)                12434       0.71        0.18          0.50
Silicates                     7240        2089        1810          0.50
Chlorophyll “a”               4842        1271        0.00          0.44
Chlorophyll “c” (λ = 0.35)    2000        7.76        0.00          0.41

Table 1. ML estimation of spatial correlation Matérn model.

Salinity is the variable for which the greatest gain in precision (35%) was obtained when changing from the least dense network to the densest (Fig. 2). Other variables, such as temperature, dissolved oxygen, silicates and chlorophyll “a”, had increases in precision between 15.9% and 23.8% (Fig. 2). Finally, for depth, nitrites, total suspended solids and chlorophyll “c”, the increase in precision was only between 5.7% and 10.1% (Fig. 2). As expected, when the intermediate networks (grid distances of 3, 4 and 5 km) are compared with the 6 km network, the relative increase in precision is much smaller.
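The precision gains quoted above follow the definition in the caption of Figure 2: the percentage reduction in standard prediction error relative to the 6000 m network. A minimal sketch of that calculation is given below; the function name and the two standard errors are hypothetical and serve only to illustrate the arithmetic, not to reproduce values from the study.

```python
def precision_gain(se_network, se_reference):
    # Percent reduction in standard prediction error relative to the reference network.
    return 100.0 * (1.0 - se_network / se_reference)

# Hypothetical standard prediction errors: denser network vs. 6000 m reference network.
gain = precision_gain(se_network=0.65, se_reference=1.0)
print(f"{gain:.1f}% increase in precision")
```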

The sampling costs associated with each variable differed across sampling densities (Fig. 3). Temperature, depth and salinity had much lower costs than the other variables. For some of the variables (dissolved oxygen, silicates and chlorophyll), going from the 3000 m network to the 2000 m network increased the sampling cost by about US $240.

Hence, for temperature and salinity it would be more convenient to sample intensively (the densest network), as this would increase the efficiency by a considerable percentage (Fig. 2) while increasing costs by only about US $90 (Fig. 3). For depth, even though the sampling costs do not increase significantly (Fig. 3), it is preferable to sample on the least dense network, given that the efficiency increases by at most 7% with the other networks (Fig. 2). For nitrites, total suspended solids and chlorophyll “c”, the efficiency increases only slightly with network density, whereas the costs increase considerably, especially for the 2000 m network (Fig. 3). Hence the less dense networks (5000 m and 6000 m between sampling points) are the most adequate for monitoring these variables.

For dissolved oxygen, silicates and chlorophyll “a”, the decision is complex, given that both the costs (Fig. 3) and the efficiencies (Fig. 2) increase considerably as the number of samples increases.

A global analysis of the increases in cost and in efficiency shows that the 2 km network is the least advisable: compared with the 3 km network, costs increase sharply (more than 200%) while the relative efficiency increases by only 4.9%. Since, in relative terms, the changes in efficiency and in cost when going from one network to another with more points are similar (with the exception of the 2 km network), the networks with 4 km and 5 km between sampling points should be considered the most advisable, given that they are more efficient than the 6 km network at slightly higher cost.

The suggestions given in the foregoing paragraphs about the optimum sampling arrangement for monitoring the variables considered in the ecosystem are not absolute. In the final analysis, many purely empirical criteria were used in comparing the cost and statistical efficiency functions. Nevertheless, the agencies that make the final decision should have a tool that allows them to plan the most adequate monitoring strategy for the future.

Figure 1. Sampling networks for which the prediction variances were estimated for each variable. Distances between sampling points: a) 2 km; b) 3 km; c) 4 km; d) 5 km; e) 6 km.