Bayesian design methods for improving the effectiveness of monitoring coral reefs
Thilan AWLP a,b,g,*, Peterson EE a,b,f, Menendez P c,e, Caley MJ a,b, Drovandi C a,b, Mellin C c,d, McGree JM a,b
a School of Mathematical Sciences, Science and Engineering Faculty, Queensland University of Technology, Brisbane, Queensland, Australia
b Australian Research Council Centre of Excellence for Mathematical and Statistical Frontiers (ACEMS), Australia
c Australian Institute of Marine Sciences, Townsville, Queensland, Australia
d The Environment Institute and School of Biological Sciences, University of Adelaide, Adelaide, South Australia 5005, Australia
e School of Mathematics and Physics, Brisbane, Australia
f Institute for Future Environments, Queensland University of Technology, Brisbane, Australia
g Department of Mathematics, University of Ruhuna, Sri Lanka
*Corresponding Author: Email: [email protected]; Tel:+61(0)410372540;
Fax: +61 7 3138 2310;
Postal address: School of Mathematical Sciences, Science and Engineering Faculty,
Queensland University of Technology, 2 George Street, Brisbane, QLD 4000.
ABSTRACT
Survey design underpins our ability to successfully monitor and manage the environment. There are
two basic design types: static designs, which remain fixed over time, and adaptive designs, which can
change over time. An advantage of adaptive designs is that changes can be made as more is learned
about the system, ensuring that informative data are collected in an on-going manner. Here, we propose
a model-based adaptive design approach that incorporates spatial and disturbance information when
monitoring large-scale environmental systems. We apply this new approach to derive sampling designs
for monitoring coral reef systems within Australia’s Great Barrier Reef, and show that these adaptive
designs can provide twice the amount of information as designs found using previously proposed
methods from the literature. As such, we suggest our new methods can be used to enhance the
effectiveness and efficiency of environmental monitoring initiatives.
Key words: Adaptive design; Coral bleaching; Coral cover; Cyclone impacts; Great Barrier Reef.
1. Introduction
The health and the long-term resilience of coral reefs around the world are at risk due to rising
environmental and human impacts (Hoegh-Guldberg et al., 2007, Hughes et al., 2003, Jackson et
al., 2001). The Great Barrier Reef (GBR) is currently one of the best managed and monitored
natural wonders of the world with a view to safeguarding its health from anthropogenic
disturbances (Pandolfi et al., 2005). However, environmental pressures such as climate change
resulting in coral bleaching, crown of thorns starfish (CoTS) outbreaks, and cyclones can
compromise the health of the GBR (Hoegh-Guldberg et al., 2007, Sweatman et al., 2011,
Vercelloni et al., 2017). By effectively monitoring such ecological systems, it should be possible
to identify their vulnerabilities and the potential causes to inform the development of management
practices and/or policies to reduce the impact of disturbances and foster more resilient ecosystems.
The Australian Institute of Marine Science (AIMS) has been monitoring coral reefs in the GBR
since 1983 through the Long-term Monitoring Program (LTMP). The LTMP collects data that are
used to infer reef health and condition (Sweatman et al., 2008). Samples are collected from benthic
communities on selected reefs which are representative of the benthic communities in each of the
GBR regions (Jonker et al., 2008). The LTMP is based on a static design in that data are gathered
from predetermined reefs, and sites within reefs that do not change over time (De'ath and
Fabricius, 2010, Sweatman et al., 2011). As such, the LTMP does not incorporate knowledge
gained from previous years of data collection, nor does it allow for disturbance data to be included
when selecting reefs for future surveys (Miller et al., 2003). Thus, there is the potential to enhance
these current monitoring practices using an adaptive sampling regime, which provides a way to
incorporate new information when selecting reefs and/or sites to collect data.
The past twenty years have seen the rapid development of adaptive design methods particularly
in the field of clinical trials (McGree et al., 2012, Weir et al., 2007), and to a lesser extent in
astrophysics (Ford, 2008, Loredo, 2004, Loredo et al., 2012) and environmental monitoring (Falk
et al., 2014). To our knowledge, Kang et al. (2016) was the first study to introduce adaptive design
for improving the effectiveness of monitoring in the GBR. Adaptive design methods were
proposed that allow information accumulated over time to inform where and when samples should
be collected on the GBR in an ongoing manner (Morgan et al., 2014). Kang et al. (2016) treated
the problem of finding adaptive designs over time as an optimisation problem, and found designs
to lower experimental costs and increase the information gained from the collected data. The
authors described monitoring objectives through a utility function that characterized the expected
worth of the monitoring data obtained given a particular design (Chaloner and Verdinelli, 1995).
For illustration, sampling on the Cook-Lizard region of the GBR was considered, and they
demonstrated value in being able to adapt sampling over time. However, one limitation of this
work was that the adaptive designs were derived based on an overly simplified model for coral
cover that did not capture spatial variability or temporal disturbance information. As such, the
adaptive designs found by the authors may be sub-optimal as it is possible that important
components that affect the health of coral reef systems were not considered.
In this paper, we propose a design framework to incorporate spatial variability when modelling
coral cover and the effect of time-varying disturbances such as CoTS outbreaks when deriving
adaptive designs for monitoring the GBR. To evaluate our proposed framework, we consider
adaptive design methods for visiting fewer LTMP sites, and assess the impact this has on the
information obtained. Further, we compare our adaptive designs with those derived from recently
proposed methods in the literature (Kang et al., 2016). To conclude, we discuss how adaptive
sampling methods can improve the effectiveness of reef monitoring programs and provide
guidance for where samples should be collected to efficiently gather information about reef health.
2. Material and methods
A Bayesian design framework is proposed in this paper, comprising three key
components (Figure 1 shows these components and how they
link together). The first component involves quantifying prior information about the ecological
process being monitored (Figure 1a). For this purpose, we fit a statistical model to the LTMP hard
coral cover data (i.e. the proportion of the sea floor occupied by hard coral, without accounting
for three dimensional overlap) that accounts for spatial dependency and important environmental
and disturbance covariates (i.e. predictors). In the second component, this prior information is
exploited to assess the usefulness of a proposed design (Figure 1b). This involves mathematically
defining the monitoring objective via a utility function (Chaloner and Verdinelli, 1995), and
targeting data collection to inform this objective. Finally, we evaluate our proposed methods by
comparing our designs with the LTMP design and those found using recently proposed methods
(Kang et al., 2016) across a variety of future scenarios (Figure 1c). In the following sections, we describe
each of these components in more detail.
Figure 1: Diagram of the proposed Bayesian adaptive design framework. This consists of three key components: (a) Quantifying prior information, (b) Assessing the usefulness of a design, and (c) Optimisation and evaluation of the design.
[Figure 1 panels. (a) Quantifying prior information: LTMP data/covariates; fit a statistical model; obtain posterior distribution of model parameters; form prior for design. (b) Assessing the usefulness of a design: propose design; estimate or approximate the expected utility. (c) Optimisation and evaluation of the design: optimise design; reef monitoring scenarios (1. Comparison with Kang et al. (2016); 2. Impact of reduced sampling; 3. Impacts of different disturbance conditions).]
2.1 Quantifying prior information
For undertaking adaptive design, we consider a Bayesian inference framework due to the
mathematically rigorous handling of uncertainty and the availability of important utility
functions. Further, a Bayesian framework provides an opportunity to incorporate knowledge
gained from historical data into the formation of the design through a prior distribution. In
Bayesian methods, a prior represents the uncertainty about a quantity/parameter of interest.
Such prior information can be created using a number of methods including model fitting,
expert opinion, and knowledge gained through a literature review. In this study, we fit a model
to the existing LTMP data to obtain prior information for the Bayesian adaptive design. The
data and model used for this purpose are described next.
2.1.1 LTMP data and design
The LTMP provides a semi-continuous record of change in reef communities over the last three
decades across six regions of the GBR (Townsville, Cairns, Capricorn Bunkers, Whitsunday,
Swain, and Cooktown/Lizard island) (Sweatman et al., 2008). Here, we focus on the
Whitsunday region due to the relatively large amount of data being available (Figure 2) and
the large and diverse range of disturbances that have occurred in this region over time (Osborne
et al., 2011, Vercelloni et al., 2017).
As part of the LTMP sampling design, 5 coral cover observations are collected from each site,
and three sites are sampled on each reef (Jonker et al., 2008). A total of three reefs are sampled
in each of the inner, middle, and outer reef habitats (5 observations × 3 sites × 3 reefs × 3
habitats). In some years, however, surveys could only be partly completed due to bad weather,
resulting in fewer observations. Consequently, the data set used in this study contained a total
of 1077 observations collected over the sampling years of 2002, 2004, 2005, 2007, 2009, 2011,
2013, and 2015.
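As a worked check of these figures (arithmetic on the numbers stated above, not an additional result), a complete survey year yields 135 observations, so eight complete survey years would yield 1080; the 1077 observations in the data set are three short of this, consistent with a small number of incomplete surveys:

$$5 \times 3 \times 3 \times 3 = 135, \qquad 135 \times 8 = 1080, \qquad 1080 - 1077 = 3.$$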
At each site, the LTMP samples 5 permanent 50 × 1 m² transects at a depth of 6 m and 9 m, each
separated by at least 10 m and parallel to the reef crest. Fifty images are taken from each transect
using video frames (from 2006 onwards) or digital photographs at 1 m intervals (prior to 2006).
A site-level coral cover estimate is obtained by projecting five points onto each of 40 randomly
selected images (Jonker et al., 2008), which are subsequently classified manually by a marine
scientist (Beijbom et al., 2015).
2.1.2 Covariate data
We considered a number of potential covariates in our statistical model, which represent
physico-chemical conditions, topographic position, and natural and anthropogenic disturbances
known to have a direct or indirect influence on coral cover (Table 1). Plots of the spatial and
temporal distribution of coral cover and these covariates are provided in Appendix A.
Figure 2: Survey sites for the Long-Term Monitoring Program in the Whitsunday region, one of six regions of the Great Barrier Reef. The Whitsunday region is divided into three shelf-positions: inner- (Hayman, Langford-bird, and Broder Island), middle- (19131S, 19138S, and 20104S), and outer-shelf (Slate, Hyde, and Rebe) reefs. Survey sites are represented by red dots. A small amount of jitter was added to the locations for visualisation purposes.
Covariates | Description | Source | Spatial resolution | Temporal resolution
Time | Sampling years | NA | NA | 2002-2015
Cyclone exposure | The number of hours each grid cell was exposed to potentially damaging seas: 0 = No cyclone effects, 1 = Some cyclone effects | Puotinen et al. (2016) | 0.01° | 2002-2015
Bleaching exposure | 0 = No coral bleaching, 1 = ≥1% coral bleached | Matthews et al. (2019) | 0.01° | 2002
CoTS | Mean A. solaris densities | Matthews et al. (2019) | 0.01° | 2002-2015
Shelf position | Position of reefs on the continental shelf: 1 = inshore/inner shelf; 2 = middle shelf; 3 = outer shelf | GBRMPA (2014) | 0.005° | Great Barrier Reef Zoning Plan 2003
Bathymetry | Depth below sea level (meters) | Beaman (2017) | 0.0003° | 2017
Opened reef | Protected areas where no fishing is allowed: 1 = no-take, 0 = otherwise | GBRMPA (2014) | 0.005° | Great Barrier Reef Zoning Plan 2003
Sea surface temperature anomaly (SSTA) | Difference between measured Sea Surface Temperature (SST) and the monthly long-term mean SST (°C) | BOM (2014) | 0.01° | The monthly long-term mean SST for 2002-2015
Light attenuation | Attenuation coefficient (between 0 and 1): the rate of decrease of light penetrating the water column with depth | CERF (2009) | 0.01° | 1997-2009
Chlorophyll | Long term mean concentration (µg/m³) of chlorophyll A pigments in the water column | CERF (2009) | 0.01° | 1997-2009
CRS_T_AV | Temperature (mean °C) at the sea surface | Dunn (2009) | 0.01° | 1960-2006
Primary | Primary flood plume frequency (weeks occurred/total weeks) during wet season (max = 26) | Delvin et al. (2012) | 0.01° | 2007-2013
Secondary | Secondary flood plume: representing chlorophyll dominated plume | Delvin et al. (2012) | 0.01° | 2007-2013
Tertiary | Representing further extent of plume, as delineated by salinity less than 34 ppt | Delvin et al. (2012) | 0.01° | 2007-2013
Table 1. Summary of the potential covariates considered in the coral cover model. The spatial resolution is recorded in decimal degrees.
2.1.3 Statistical model for coral cover
We fit a spatial Beta regression model to the LTMP coral cover data as such a model can be
considered for bounded data (i.e. proportions) and can accommodate a variety of distributional
forms including symmetric and skewed distributions (Figure 1a, Fit a statistical model). The
model was parameterized in terms of a mean and a precision parameter, with a probability
density function defined as follows:
$$p(y \mid \mu, \phi) = \frac{\Gamma(\phi)}{\Gamma(\mu\phi)\,\Gamma((1-\mu)\phi)}\, y^{\mu\phi - 1} (1 - y)^{(1-\mu)\phi - 1}, \quad 0 < y < 1, \tag{1}$$
where $y$ represents coral cover, $\Gamma(\cdot)$ denotes the gamma function, $\mu$ ($0 < \mu < 1$) is the mean, and $\phi$ ($\phi > 0$) is
the precision parameter. Accordingly, we assume that $y_{ijt} \sim \text{Beta}(\mu_{it}, \phi)$, where $y_{ijt}$ denotes
the $j$th datum, from the $i$th site, in the $t$th sampling year, where $i = 1, \ldots, n$,
$j = 1, \ldots, 5$ and $t = 1, \ldots, 8$. To account for potential relationships between coral cover and
covariates (Table 1) and for spatial dependence (i.e. autocorrelation through space), the
following regression structure was assumed for mean coral cover $\mu_{it}$:
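The mean and precision parameterisation of the Beta density in Equation (1) corresponds to the usual shape parameters $\mu\phi$ and $(1-\mu)\phi$. The following is a minimal sketch of that correspondence using only the Python standard library; the function names are chosen here for illustration and are not taken from the authors' implementation.

```python
import math


def beta_mean_precision_logpdf(y, mu, phi):
    """Log-density of a Beta distribution with mean mu and precision phi."""
    if not (0.0 < y < 1.0 and 0.0 < mu < 1.0 and phi > 0.0):
        raise ValueError("need 0 < y < 1, 0 < mu < 1 and phi > 0")
    a = mu * phi            # first shape parameter
    b = (1.0 - mu) * phi    # second shape parameter
    log_norm = math.lgamma(phi) - math.lgamma(a) - math.lgamma(b)
    return log_norm + (a - 1.0) * math.log(y) + (b - 1.0) * math.log(1.0 - y)


def beta_variance(mu, phi):
    """Variance implied by the parameterisation: mu * (1 - mu) / (1 + phi)."""
    return mu * (1.0 - mu) / (1.0 + phi)
```

For a fixed mean, a larger precision gives a smaller variance, which is why $\phi$ is interpreted as a precision.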
$$g(\mu_{it}) = \beta_0 + d_i^{\top} \beta + x_{it}^{\top} \gamma + \beta_T\, \text{Time}_t + z_i, \tag{2}$$
where $g(\cdot)$ is a logit link function (Lagos-Alvarez et al., 2017), $\beta_0$ is the intercept, $d$ is the
matrix of static site-specific covariates (e.g. Inner-, Middle-, and Outer-shelf, Chlorophyll, and
CRS_T_AV) with $i$th row $d_i$, $\beta$ is the vector of regression coefficients for the site-specific covariates, $x$ is
the matrix of time-varying covariates (e.g. CoTS, Bleaching, and Cyclone) with row $x_{it}$ for site $i$ in year $t$, $\gamma$ is the vector of
regression coefficients for time-varying covariates, and $\beta_T$ is the regression coefficient for
Time. The precision parameter $\phi$ was assumed unknown and common across the Whitsunday region
(Ferrari and Cribari-Neto, 2004). In order to capture the spatial variability in coral cover, we
included a spatially correlated random effect, $z_i$, in the model. We assumed that $z = (z_1, \ldots, z_n)$ follows a
multivariate Normal distribution, $z \mid \sigma^2, \rho \sim \text{MVN}(0, \Sigma)$, where $\Sigma$ is based on a Gaussian
covariance function (Ecker and Gelfand, 1997):
$$\Sigma_{ii'} = \sigma^2 \exp\left\{-\left(\frac{h_{ii'}}{\rho}\right)^2\right\}, \quad i, i' = 1, \ldots, n, \tag{3}$$
where $h_{ii'}$ is the distance between sites $i$ and $i'$, $\sigma^2$ ($\sigma^2 > 0$) is the variance of the spatial
process (i.e. the partial sill) and $\rho > 0$ is the range parameter.
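To make the spatial covariance concrete, the sketch below builds a covariance matrix from site coordinates using one common form of the Gaussian covariance function, partial sill times exp(-(distance / range)^2). That exact form, the function name, and the example coordinates are assumptions made here for illustration; the parameterisation used in the study may differ by a constant inside the exponent.

```python
import math


def gaussian_covariance(coords, partial_sill, range_par):
    """Covariance matrix with entries partial_sill * exp(-(h / range_par)^2),
    where h is the Euclidean distance between two sites."""
    n = len(coords)
    cov = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            h = math.dist(coords[i], coords[j])
            cov[i][j] = partial_sill * math.exp(-((h / range_par) ** 2))
    return cov


# Hypothetical coordinates (decimal degrees) for three sites.
example_coords = [(148.90, -20.05), (148.95, -20.05), (149.60, -19.80)]
example_cov = gaussian_covariance(example_coords, partial_sill=0.0025, range_par=0.33)
```

The diagonal equals the partial sill, and the covariance between two sites decays smoothly towards zero as their separation grows relative to the range parameter.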
Within a Bayesian framework, we are interested in the posterior distribution of the parameters $\theta$,
defined as $p(\theta \mid y, d, x) \propto p(y \mid \theta, d, x)\, p(x \mid d, \psi)\, p(\theta)$, where $d$ is the sampling design (i.e.
static site-specific covariate values), $x$ represents time-varying covariates, $p(x \mid d, \psi)$ is the
distribution of the time-varying covariates depending on parameter $\psi$ (see Section 2.2.2),
$p(y \mid \theta, d, x)$ is the likelihood function and $p(\theta)$ is the prior distribution of $\theta$.
2.1.4 Obtaining the posterior distribution of model parameters
To undertake a Bayesian analysis, the prior distribution needs to be defined. We chose a weakly
informative, multivariate Normal prior for $\theta$, which includes the regression coefficients,
log of variance (i.e. log of the reciprocal of the precision), and the log of the covariance
parameters (i.e. partial sill and range). Approximating the posterior distribution for a model
like the one defined above can be computationally expensive, particularly when covariate
selection needs to be undertaken. Therefore, we approximated the posterior distribution using
Laplace-based methods, via a Monte Carlo approximation to the (full data) likelihood (Faraway
et al., 2018, Long et al., 2013, McGree et al., 2016, Overstall et al., 2018). Please see Appendix
B for additional details.
To determine which covariates should appear in the model, forward stepwise model-selection
was undertaken. Specifically, we started with the null model (intercept only) and then included
covariates (Table 1) one at-a-time to determine which covariates (if any) improved the model
fit as determined by the posterior model probability (MacKay, 2003). This process was
repeated until no further improvement in the model fit was observed. The final model identified
using this procedure was then checked in terms of goodness-of-fit via posterior predictive
checks, which proved to be satisfactory. The posterior distribution of the parameters from the
final model could then be used as the prior information (Figure 1a) to find adaptive designs for
monitoring (Figure 1b), and this is discussed in the next section.
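The forward stepwise procedure described above can be sketched as a greedy loop. This is a minimal illustration rather than the authors' code: `score` is a hypothetical user-supplied function that returns a model-fit criterion for a set of covariates (for example, a log posterior model probability), with larger values indicating a better model.

```python
def forward_stepwise(candidates, score):
    """Greedy forward selection of covariates.

    candidates: iterable of covariate names.
    score: hypothetical callable mapping a tuple of covariate names to a
           number, where higher means a better-supported model.
    """
    selected = []
    remaining = list(candidates)
    best = score(tuple(selected))  # null (intercept-only) model
    while remaining:
        # Score every model formed by adding one more covariate.
        trial_scores = {c: score(tuple(selected + [c])) for c in remaining}
        winner = max(trial_scores, key=trial_scores.get)
        if trial_scores[winner] <= best:
            break  # no single addition improves the fit
        selected.append(winner)
        remaining.remove(winner)
        best = trial_scores[winner]
    return selected, best
```

In practice each call to `score` requires fitting a model, which is why a fast posterior approximation matters when many candidate covariates are considered.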
2.2 Assessing the usefulness of a design
This section describes the approach used to assess the usefulness of a given design in
addressing specific monitoring objectives. This relates to the second component of our
Bayesian adaptive design framework as shown in Figure 1b.
2.2.1 Propose a design
Within our design framework, a sampling design defines locations for data collection. As
shown in Equation (2), this constitutes defining the (static) site-specific covariates used in
modelling coral cover. Let $d_i$ denote the site-specific covariates for the $i$th site, then the LTMP
design can be defined as $d = (d_1, d_2, \ldots, d_i, \ldots, d_n)$. The other covariates that appear in Equation
(2) can vary through time. Accordingly, we will not optimise these covariates but rather optimise
the design over the distribution of these covariates. This is discussed below.
In the context of Bayesian experimental design, a utility function $u(d, \theta, y)$ is used to quantify
the worth of observing data $y$ from design $d$ in terms of achieving a specific monitoring
objective (e.g. estimate trends or the impact of disturbances). As the notation indicates, the
utility function $u(d, \theta, y)$ depends on $\theta$ and $y$; however, these are unknown a priori. Thus, this
uncertainty needs to be integrated out to form an expected utility function $U(d)$ before it can
be used in Bayesian design. Such an expected utility can be defined as follows:
$$U(d) = \int_{y} \int_{\theta} u(d, \theta, y)\, p(y \mid \theta, d)\, p(\theta)\, \mathrm{d}\theta\, \mathrm{d}y, \tag{4}$$
where the optimal design is defined as the design that maximises the above expected utility
function.
As mentioned above, in natural ecosystems such as coral reefs, there are additional
uncertainties associated with time-varying covariates (e.g. where and when disturbances will
occur). To account for this, the expectation in (4) is also taken with respect to the distribution
of time-varying covariates as follows:
$$U(d) = \int_{x} \int_{y} \int_{\theta} u(d, \theta, y, x)\, p(y \mid \theta, d, x)\, p(\theta)\, p(x \mid d, \psi)\, \mathrm{d}\theta\, \mathrm{d}y\, \mathrm{d}x. \tag{5}$$
To capture the uncertainty about the time-varying covariates, an assumption must be made
about the distribution of the as yet unobserved time-varying covariates; in this case, that they
follow a distribution $p(x \mid d, \psi)$ (see Section 2.2.2). Thus, the above expected utility is not
evaluated based on specific values of these time-varying covariates, but rather evaluated across
their distribution.
In order to precisely estimate trends and the impact of disturbances, we adopted a parameter
estimation utility function called the Kullback-Leibler divergence (KLD) between the prior and
posterior distribution (Kullback and Leibler, 1951), which is defined as follows:
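$$u(d, y, x) = \int_{\theta} p(\theta \mid y, d, x) \log p(y \mid \theta, d, x)\, \mathrm{d}\theta - \log p(y \mid d, x), \tag{6}$$
where $p(y \mid d, x) = \int_{\theta} p(y \mid \theta, d, x)\, p(\theta)\, \mathrm{d}\theta$ is the marginal likelihood. This utility does not
depend on $\theta$ because it is integrated out, and so it is denoted as $u(d, y, x)$. Thus, we seek
a design that maximizes Equation (5) where the utility is given in Equation (6).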
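For readers who want the intermediate step, the two-term form of the KLD utility follows from applying Bayes' theorem inside the logarithm of the divergence between posterior and prior. The notation below ($\theta$ for parameters, $y$ for data, $d$ for the design, $x$ for time-varying covariates) is chosen here for illustration, and the derivation assumes the prior $p(\theta)$ does not depend on $d$ or $x$:

$$
\begin{aligned}
\int p(\theta \mid y, d, x) \log \frac{p(\theta \mid y, d, x)}{p(\theta)}\, \mathrm{d}\theta
&= \int p(\theta \mid y, d, x) \log \frac{p(y \mid \theta, d, x)}{p(y \mid d, x)}\, \mathrm{d}\theta \\
&= \int p(\theta \mid y, d, x) \log p(y \mid \theta, d, x)\, \mathrm{d}\theta - \log p(y \mid d, x).
\end{aligned}
$$

The first line uses $p(\theta \mid y, d, x) = p(y \mid \theta, d, x)\, p(\theta) / p(y \mid d, x)$; the second uses the fact that $\log p(y \mid d, x)$ is constant in $\theta$ and the posterior integrates to one.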
2.2.2 Estimate or approximate the expected utility function
In general, the expectation defined by Equation (5) does not have a closed form solution, and
therefore needs to be approximated. One common approach is to use Monte Carlo integration
as follows (Ryan, 2003):
$$U(d) \approx \frac{1}{B} \sum_{b=1}^{B} u(d, y_b, x_b). \tag{7}$$
This approach to approximate the expected utility of a given design is outlined in Algorithm
1. In Equation (7), $B$ is the controlling parameter for the Monte Carlo approximation and is
typically large (e.g. 500), and $\theta_b \sim p(\theta)$, $x_b \sim p(x \mid d, \psi)$, $y_b \sim p(y \mid \theta_b, d, x_b)$
(Algorithm 1, lines 2-5). As our utility function (defined in Equation (6)) is a function of the
posterior distribution, posterior distributions need to be approximated or sampled from in
order to approximate the expected utility. Further, this evaluation needs to be undertaken for
each proposed design, which imposes significant computational demands (Ryan et al.,
2016). Thus, for computational efficiency, we again adopt the Laplace approximation within
the Monte Carlo approximation to the expected utility (Algorithm 1, line 6) (Faraway et al.,
2018, Long et al., 2013, McGree et al., 2016, Overstall et al., 2018).
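Algorithm 1. Implementing the Bayesian adaptive sampling scheme.
Algorithm: Approximating expected utility functions
1. Initialise
2. For $b = 1$ to $B$ do
3. Simulate $\theta_b \sim p(\theta)$
4. Simulate $x_b \sim p(x \mid d, \psi)$
5. Simulate $y_b \sim p(y \mid \theta_b, d, x_b)$
6. Estimate $p(\theta \mid y_b, d, x_b)$ via Laplace approximation
7. Evaluate KLD utility $u(d, y_b, x_b)$
8. Store $u(d, y_b, x_b)$
9. End For
10. Output $\frac{1}{B} \sum_{b=1}^{B} u(d, y_b, x_b)$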
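The structure of Algorithm 1 can be illustrated with a deliberately simplified toy problem. The sketch below is not the spatial Beta regression model: it assumes a single parameter with a Normal prior, Normal observation noise, and one Bernoulli disturbance indicator per observation, so that the posterior and the KLD are available in closed form and no Laplace approximation is needed. All names and default values are hypothetical.

```python
import math
import random


def kld_normal(m1, v1, m0, v0):
    """KL divergence from the prior N(m0, v0) to the posterior N(m1, v1)."""
    return 0.5 * (math.log(v0 / v1) + (v1 + (m1 - m0) ** 2) / v0 - 1.0)


def expected_utility(n_obs, p_disturb, B=5000, m0=0.0, v0=1.0, s2=1.0, seed=1):
    """Monte Carlo estimate of the expected KLD utility for a toy design.

    Toy model (an assumption for illustration): y_k = theta * x_k + noise,
    where x_k ~ Bernoulli(p_disturb) marks whether a disturbance occurs and
    theta is the disturbance effect with prior N(m0, v0).
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(B):                                            # line 2
        theta = rng.gauss(m0, math.sqrt(v0))                      # line 3
        x = [1 if rng.random() < p_disturb else 0 for _ in range(n_obs)]  # line 4
        y = [theta * xk + rng.gauss(0.0, math.sqrt(s2)) for xk in x]      # line 5
        # Line 6: exact conjugate posterior stands in for the Laplace step.
        v1 = 1.0 / (1.0 / v0 + sum(x) / s2)
        m1 = v1 * (m0 / v0 + sum(xk * yk for xk, yk in zip(x, y)) / s2)
        total += kld_normal(m1, v1, m0, v0)                       # lines 7-8
    return total / B                                              # line 10
```

In this toy setting a design is more informative about the disturbance effect when disturbances are more likely to be observed, which mirrors the reason for averaging the utility over the distribution of time-varying covariates instead of fixing them at particular values.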
To evaluate the above approximation to the expected utility function, time-varying covariates
($x$) need to be simulated (line 4). Thus, distributions $p(x \mid d, \psi)$ for these are needed. In order to
find such distributions, the existing LTMP data were analysed. For categorical covariates (i.e.
bleaching and cyclone impacts; Table 1), the proportion of observed occurrences of each
disturbance were estimated for each site, and the outcome (disturbance or not) was assumed to
follow a Bernoulli distribution. In contrast, CoTS density is a continuous covariate with many
zeros (i.e. no observation of CoTS). To develop a distribution for such data, we first determined
the proportion of sites where no observations of CoTS were recorded (Zeileis et al., 2008), and
the outcome (CoTS density zero or not) was assumed to follow a Bernoulli distribution. To
obtain the distribution of non-zero CoTS data (i.e. the mean CoTS densities), a Log-normal
distribution was estimated. Then, to simulate CoTS data, we first generated a random number
$r_i$, $i = 1, \ldots, n$, between 0 and 1, and if $r_i \le \pi_i$ (i.e. the proportion of CoTS = 0 at the
site), we set CoTS = 0, otherwise we generated CoTS data from the fitted Log-normal
distribution. These distributions were then used to simulate time-varying covariates to
approximate the expected utility as shown in Algorithm 1.
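The simulation scheme for the time-varying covariates can be sketched as follows, using only the Python standard library. The function names are hypothetical, and the sketch assumes the zero-inflation probability is the proportion of zero CoTS records at the site and that the Log-normal is specified by the mean and standard deviation of log density.

```python
import random


def simulate_disturbance(p_occurs, rng):
    """Bernoulli draw: 1 if the disturbance occurs, 0 otherwise."""
    return 1 if rng.random() < p_occurs else 0


def simulate_cots(p_zero, log_mean, log_sd, rng):
    """Zero-inflated Log-normal draw of CoTS density at one site.

    p_zero: assumed probability that no CoTS are recorded at the site.
    log_mean, log_sd: mean and standard deviation of log density for
                      non-zero records.
    """
    if rng.random() <= p_zero:
        return 0.0
    return rng.lognormvariate(log_mean, log_sd)


rng = random.Random(2024)
# One simulated year at a single site, with illustrative parameter values.
bleaching = simulate_disturbance(0.12, rng)
cyclone = simulate_disturbance(0.25, rng)
cots = simulate_cots(0.62, -4.44, 2.80, rng)
```

Repeating these draws for every site gives one realisation of the time-varying covariates for use in line 4 of Algorithm 1.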
2.3 Optimisation and evaluation of the design
This section describes the third component of our Bayesian adaptive design framework. Given
we are now able to approximate the expected utility of a given design, the next step is to
optimise this expected utility through the choice of the design. The procedure used for this
optimisation is described next, along with a number of approaches to evaluate the subsequently
found designs.
2.3.1 Optimise the design
In the examples that follow, we will optimise designs within reef monitoring scenarios where
there are a number of sites to choose from. Thus, there will be a large but fixed number of
potentially optimal designs. Enumerating all possible designs would be computationally
infeasible, so we employ an optimisation algorithm. For searching within a fixed number of
sites (i.e. a discrete design space), the coordinate-exchange algorithm (Meyer and Nachtsheim,
1995) can be used. This algorithm begins with a random design (i.e. a random selection of
sites), which is then optimised, one site at-a-time. In practice, this means holding all but one
site fixed, and then iteratively substituting each alternative site for the one unfixed site and
calculating the expected utility of the design. The included site that maximizes the expected
utility is then selected for inclusion into the design. This process is then repeated for all sites
in the design. As optimal choices for each dimension may change depending on what other
sites have been selected, the algorithm iteratively cycles through the whole design a fixed
number of times (i.e. maximum number of iterations) or until no further improvement is
observed in the expected utility.
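A minimal sketch of a coordinate-exchange search over a discrete set of candidate sites is given below. It is an illustration under simplifying assumptions, not the authors' implementation: `utility` is a hypothetical function returning the (approximate) expected utility of a list of sites, it is treated as deterministic, and a site may appear at most once in a design.

```python
import random


def coordinate_exchange(candidates, k, utility, max_iter=10, seed=0):
    """Search for a k-site design that maximises utility(design).

    candidates: list of available site labels.
    utility: hypothetical callable mapping a list of sites to a number.
    """
    rng = random.Random(seed)
    design = rng.sample(candidates, k)  # random starting design
    best = utility(design)
    for _ in range(max_iter):
        improved = False
        for pos in range(k):            # optimise one position at a time
            for site in candidates:
                if site in design:
                    continue            # skip sites already in the design
                trial = design[:pos] + [site] + design[pos + 1:]
                u = utility(trial)
                if u > best:            # keep the exchange if it helps
                    design, best, improved = trial, u, True
        if not improved:
            break                       # a full pass gave no improvement
    return design, best
```

Because the expected utility in the paper is estimated by Monte Carlo, comparisons between exchanges are noisy in practice; using common random numbers or repeated evaluations is one way to stabilise them, although that detail is not specified in the text above.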
2.3.2 Reef monitoring scenarios
In order to evaluate the optimal designs, we firstly consider future disturbance patterns that are
consistent with historical patterns, and find optimal designs using our approach and the
approach from Kang et al. (2016). Secondly, we explore the performance of our designs in
comparison to the LTMP design. This comparison is undertaken with respect to reduced
sampling scenarios and a variety of different future disturbance patterns.
Comparison with Kang et al. (2016) designs
We compared our designs to those found by using the methods proposed by Kang et al. (2016).
To find adaptive designs using the approach of Kang et al. (2016), we used their proposed
linear model (with no spatial effects) within our Bayesian adaptive design framework. The
resulting designs were then evaluated with respect to our Beta regression model with spatial
random effects (Equation (10)). Because evaluation of the expected utility is stochastic, each design
was evaluated 20 times using independent draws from the prior predictive distribution and
for the time-varying covariates. Then, to quantify the information loss (or gain) when using the
approach of Kang et al. (2016), the design efficiency was evaluated as follows:
$$E_k = \frac{U_k(d)}{U_k(d^*)}, \quad k = 1, 2, \ldots, 20, \tag{8}$$
where $U_k(d)$ and $U_k(d^*)$ are the $k$th evaluations of the expected utilities of the optimal design
under the linear modelling approach from Kang et al. (2016) and the optimal design under our
spatial model, respectively. Then, the average efficiency ($\bar{E}$) was evaluated as the mean of
$E_k$ ($k = 1, 2, \ldots, 20$). Such an efficiency can be interpreted as the proportion of sampling
required under design $d$ to achieve an equivalent amount of information under design $d^*$. An
average efficiency less than one will suggest that our designs are expected to provide more
information than those based on methods from Kang et al. (2016) and vice versa for an average
efficiency of greater than one.
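The efficiency calculation amounts to a ratio of paired utility evaluations followed by an average. A minimal sketch, with hypothetical function and argument names:

```python
def design_efficiencies(utilities_d, utilities_d_star):
    """Ratio of paired expected-utility evaluations for two designs."""
    if len(utilities_d) != len(utilities_d_star):
        raise ValueError("both designs need the same number of evaluations")
    return [u / u_star for u, u_star in zip(utilities_d, utilities_d_star)]


def average_efficiency(utilities_d, utilities_d_star):
    """Mean of the per-evaluation efficiencies."""
    ratios = design_efficiencies(utilities_d, utilities_d_star)
    return sum(ratios) / len(ratios)
```

Here `utilities_d` would hold the 20 evaluations for the competing design and `utilities_d_star` the 20 evaluations for the reference design. Swapping the two arguments gives the inverse form of the efficiency.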
Impacts of reduced sampling
To further evaluate our proposed design framework, we optimised designs under reduced
sampling scenarios. This allowed us to determine which reefs/sites could potentially be
dropped from the LTMP, and to explore the consequences of doing so. For this purpose, two
approaches were undertaken: 1) dropping the least informative reef from the LTMP design and
2) dropping the least informative site from each reef within the LTMP design. First, to
determine the least informative reef, the approximate expected utility was evaluated for all
combinations of reefs where one reef was omitted. Then, the design that yielded the largest
utility was inspected to determine which reef was missing, and then proposed as the least
informative reef. Second, we similarly investigated the impact of dropping the least informative
site from each reef (see Table 2 for a description of sites). For this latter investigation, the
optimisation of the design was performed by using the coordinate exchange algorithm as
described in Section 2.3.1. Such an optimisation approach is not needed for the first
investigation as there are relatively few designs to choose from, so an exhaustive search was
employed.
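The exhaustive search for the least informative reef can be sketched as below. The reef-to-site mapping follows Table 2; `utility` is a hypothetical function returning the approximate expected utility of a list of sites, and the helper name is chosen here for illustration.

```python
REEF_SITES = {
    "19131S": [19, 20, 21],
    "19138S": [13, 14, 15],
    "20104S": [10, 11, 12],
    "Broder Island reef": [1, 2, 3],
    "Hayman Island reef": [7, 8, 9],
    "Hyde reef": [22, 23, 24],
    "Langford-bird reef": [4, 5, 6],
    "Rebe reef": [16, 17, 18],
    "Slate reef": [25, 26, 27],
}


def least_informative_reef(reef_sites, utility):
    """Evaluate every design that omits one reef and return the omitted reef
    whose removal leaves the largest utility, together with that utility."""
    best_reef, best_u = None, float("-inf")
    for reef in reef_sites:
        design = [s for r, sites in reef_sites.items() if r != reef for s in sites]
        u = utility(design)
        if u > best_u:
            best_reef, best_u = reef, u
    return best_reef, best_u
```

With nine reefs there are only nine candidate designs, so every one can be evaluated directly; the site-level problem has far more combinations, which is where the coordinate-exchange search becomes useful.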
Reef names | Reef numbers | Site numbers
19131S | 1 | 19, 20, 21
19138S | 2 | 13, 14, 15
20104S | 3 | 10, 11, 12
Broder Island reef | 4 | 1, 2, 3
Hayman Island reef | 5 | 7, 8, 9
Hyde reef | 6 | 22, 23, 24
Langford-bird reef | 7 | 4, 5, 6
Rebe reef | 8 | 16, 17, 18
Slate reef | 9 | 25, 26, 27
Table 2: The Whitsunday region’s reefs and corresponding site numbers.
To compare our optimal designs with the LTMP, design efficiency was again used. However,
as we will be exploring optimal designs under reduced sampling (when compared to the
LTMP), the inverse of the above efficiency was evaluated as follows:
$$E_k = \frac{U_k(d^*)}{U_k(d_{\text{LTMP}})}, \quad k = 1, 2, \ldots, 20, \tag{9}$$
where $U_k(d^*)$ and $U_k(d_{\text{LTMP}})$ are the $k$th evaluations of expected utilities of the optimal design
$d^*$ and the LTMP design $d_{\text{LTMP}}$, respectively. The interpretation of the resulting average efficiency
is as given above, with an average efficiency close to one meaning little information is expected
to be lost by using our reduced sampling designs when compared to the LTMP design.
Impacts of different disturbance conditions
To evaluate the performance of our optimal designs under different disturbance patterns, we
considered two different disturbance scenarios. In the first scenario, we considered disturbance
conditions consistent with historical data in the Whitsunday region (Table 4). In the second
scenario, we created four schemes, where CoTS disturbance conditions varied as follows:
i. One site from each reef affected,
ii. All the sites in inshore reefs affected,
iii. All the sites in middle-shelf reefs affected,
iv. All the sites in outer-shelf reefs affected.
Under scheme (i), we randomly selected one site from each of the nine reefs in the
Whitsunday region, and changed the probability of CoTS disturbance at this site to 1. Under
schemes (ii), (iii), and (iv), we changed this CoTS disturbance proportion to 1 for each of
the inshore, middle-shelf, and outer-shelf sites, respectively. In order to find the optimal
designs under these scenarios, we followed the procedure described in Section 2.3.1. We
compared our optimal designs against the performance of the LTMP design. To do so, we again
evaluated the design efficiency as given in Equation (8).
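The four CoTS schemes can be expressed as modifications to a per-site disturbance probability. The sketch below is illustrative only: the shelf assignments follow the Figure 2 caption and the site numbers follow Table 2, while the dictionary representation, the function name, and the baseline values are assumptions. It works with the probability that CoTS are present; if the stored quantity is instead the proportion of zero CoTS records, the affected sites would be set to 0.

```python
import random

# (shelf position, site numbers) for each reef.
REEFS = {
    "Broder Island reef": ("inner", [1, 2, 3]),
    "Langford-bird reef": ("inner", [4, 5, 6]),
    "Hayman Island reef": ("inner", [7, 8, 9]),
    "20104S": ("middle", [10, 11, 12]),
    "19138S": ("middle", [13, 14, 15]),
    "19131S": ("middle", [19, 20, 21]),
    "Rebe reef": ("outer", [16, 17, 18]),
    "Hyde reef": ("outer", [22, 23, 24]),
    "Slate reef": ("outer", [25, 26, 27]),
}


def cots_scheme(base_prob, scheme, rng=None):
    """Return a copy of base_prob (site -> CoTS probability) under a scheme."""
    probs = dict(base_prob)
    if scheme == "i":
        if rng is None:
            rng = random.Random(0)
        for _, sites in REEFS.values():
            probs[rng.choice(sites)] = 1.0   # one random site per reef
    else:
        target = {"ii": "inner", "iii": "middle", "iv": "outer"}[scheme]
        for shelf, sites in REEFS.values():
            if shelf == target:
                for s in sites:
                    probs[s] = 1.0           # every site on that shelf
    return probs
```

The modified probabilities would then replace the historical ones when simulating time-varying covariates for the expected utility.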
3. Results
3.1 Quantifying prior information
The most appropriate coral cover model found based on the procedure outlined in Section 2.1.4
can be described as follows:
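$$
\begin{aligned}
\text{logit}(\mu_{it}) = {} & \beta_0 + \beta_1\, \text{Middle-shelf}_i + \beta_2\, \text{Outer-shelf}_i + \beta_3\, \text{Opened Reef}_i \\
& + \beta_4\, \text{Bathymetry}_i + \beta_5\, \text{Chlorophyll}_i + \beta_6\, \text{CRS\_T\_AV}_i + \gamma_1\, \text{Cyclone}_{it} \\
& + \gamma_2\, \text{Bleaching}_{it} + \gamma_3 \log \text{CoTS}_{it} + \beta_T\, \text{Time}_t + z_i, \quad i = 1, \ldots, n \text{ and } t = 1, \ldots, 8.
\end{aligned}
\tag{10}
$$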
The baseline categories for the shelf position and open/closed to fishing are inshore and open
for fishing respectively, and are incorporated into the intercept. A summary of the posterior
distributions of the parameters for the above model is given in Table 3. The posterior means
and standard deviations are shown with 95% credible intervals. The 95% credible intervals excluded zero for all parameters
except the coefficients for Time, Middle-shelf, and log CoTS. In general, these results are
consistent with what other similar studies have reported (Kang et al., 2016). However, some
variation is expected as we are only focusing on a particular region on the GBR, and we are
fitting a different model.
3.2 Assessing the usefulness of a design
Table 4 shows the estimated parameters for the distributions of time-varying covariates for
each site. These distributions were used to simulate time-varying covariates ($x$) for impacts of
bleaching, cyclones, and CoTS (Algorithm 1, line 4). Once time-varying covariates were
simulated, the above coral cover model (Equation (10)) was used to simulate hard coral cover
data (Algorithm 1, line 5).
Parameter | Mean | Standard deviation | Lower bound of 95% credible interval | Upper bound of 95% credible interval
Intercept | -1.27 | 0.08 | -1.43 | -1.12
Time | -0.04 | 0.03 | -0.10 | 0.01
Middle-shelf | 0.15 | 0.08 | -0.01 | 0.32
Outer-shelf | 0.91 | 0.21 | 0.50 | 1.31
log CoTS | -0.01 | 0.01 | -0.02 | 0.00
Opened reef | 0.28 | 0.09 | 0.11 | 0.45
Cyclone | -0.45 | 0.05 | -0.55 | -0.35
Bleaching | -0.22 | 0.07 | -0.35 | -0.08
Bathymetry | -0.11 | 0.02 | -0.15 | -0.06
Chlorophyll | -0.80 | 0.10 | -0.99 | -0.61
CRS_T_AV | -0.23 | 0.05 | -0.33 | -0.13
log variance | -2.52 | 0.04 | -2.61 | -2.44
log partial sill | -5.98 | 0.48 | -6.93 | -5.03
log range | -1.12 | 0.06 | -1.24 | -1.00
Table 3: Summary of the posterior distributions of the model parameters.
3.3 Optimisation and evaluation of the design
3.3.1 Reef monitoring scenarios
Comparison with Kang et al. (2016)
The mean efficiency of the optimal design found by using methods from Kang et al. (2016)
compared to the optimal design found by using the spatial model described in this paper was
approximately 47%. This means that approximately twice as much sampling is needed using the optimal
design found using methods from Kang et al. (2016) when compared to the optimal design
found using the methods proposed in this paper in order to achieve an equivalent amount of
information about trends in coral cover and the impact of disturbances.
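The step from an efficiency of 47% to "approximately twice as much sampling" is simple arithmetic, sketched below. The sketch assumes, as the interpretation in the text does, that the reciprocal of the efficiency gives the relative sampling effort needed to match the reference design; the exact definition of efficiency is Equation (8), and the function name is hypothetical.

```python
def relative_sampling_effort(efficiency: float) -> float:
    """Sampling effort needed to match the reference design (1.0 = same effort)."""
    if efficiency <= 0.0:
        raise ValueError("efficiency must be positive")
    return 1.0 / efficiency


effort = relative_sampling_effort(0.47)
print(f"relative effort at 47% efficiency: {effort:.2f}")
```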
It is worth noting that an efficiency of less than 100% is expected here as both designs were evaluated based on the Beta regression model (i.e. the model assumed when finding our design). However, of note is the substantial reduction in the performance of a design when it is found assuming a different model is appropriate for coral cover. This suggests that the choice of model has important implications for determining optimal designs, so we provide justification for why our model is preferred over the linear model of Kang et al. (2016). Our model is supported by the posterior model probabilities and the posterior predictive checks (see Appendix C). Further, our model allows observations collected closer together (in space) to be correlated rather than being treated as independent (as in the model from Kang et al., 2016). Given the nature of coral cover, such correlation seems more reasonable than assuming independence, and this is supported by statistical measures such as the posterior model probabilities.

| Site number | Bleaching proportion | Cyclone proportion | CoTS proportion | log CoTS mean | log CoTS standard deviation |
|---|---|---|---|---|---|
| 1 | 0.12 | 0.25 | 0.62 | -4.44 | 2.80 |
| 2 | 0.12 | 0.25 | 0.62 | -4.60 | 2.99 |
| 3 | 0.12 | 0.25 | 0.62 | -4.83 | 3.27 |
| 4 | 0.12 | 0.12 | 0.62 | -3.84 | 2.88 |
| 5 | 0.12 | 0.12 | 0.62 | -3.82 | 2.86 |
| 6 | 0.12 | 0.12 | 0.62 | -3.80 | 2.85 |
| 7 | 0.12 | 0.12 | 0.62 | -4.31 | 2.85 |
| 8 | 0.12 | 0.12 | 0.62 | -4.29 | 2.81 |
| 9 | 0.12 | 0.12 | 0.62 | -4.30 | 2.83 |
| 10 | 0.12 | 0.37 | 0.75 | -5.45 | 3.90 |
| 11 | 0.13 | 0.38 | 0.77 | -5.28 | 3.73 |
| 12 | 0.12 | 0.37 | 0.75 | -5.28 | 3.73 |
| 13 | 0.12 | 0.37 | 0.75 | -9.70 | 1.41 |
| 14 | 0.12 | 0.37 | 0.75 | -9.62 | 1.40 |
| 15 | 0.12 | 0.37 | 0.75 | -9.39 | 1.40 |
| 16 | 0.12 | 0.50 | 0.75 | -6.90 | 2.43 |
| 17 | 0.13 | 0.49 | 0.77 | -6.65 | 2.27 |
| 18 | 0.12 | 0.50 | 0.75 | -6.43 | 2.12 |
| 19 | 0.12 | 0.25 | 0.75 | -8.45 | 1.47 |
| 20 | 0.12 | 0.25 | 0.75 | -8.14 | 1.47 |
| 21 | 0.12 | 0.25 | 0.75 | -7.96 | 1.47 |
| 22 | 0.12 | 0.37 | 0.75 | -7.17 | 2.18 |
| 23 | 0.12 | 0.37 | 0.75 | -6.86 | 2.05 |
| 24 | 0.12 | 0.37 | 0.75 | -6.62 | 1.96 |
| 25 | 0.12 | 0.37 | 0.75 | -8.26 | 2.28 |
| 26 | 0.12 | 0.37 | 0.75 | -8.26 | 2.28 |
| 27 | 0.13 | 0.36 | 0.77 | -8.26 | 2.28 |

Table 4: The estimated parameters for the distributions of time-varying disturbance covariates at each site in the Whitsunday region. The second and third columns display the proportions of observing bleaching and cyclone at each site, respectively. The proportions where CoTS impact was recorded at each site are displayed in the fourth column. The last two columns display the means and standard deviations of the Log-normal distributions fitted to the non-zero CoTS data at each site.
Impact of reduced sampling
Here we evaluated the effect of reduced sampling when compared to the LTMP in the
Whitsunday region by dropping reefs and sites. The results from dropping reefs can be seen in
Figure 3. These results indicate that the design without Hayman Island reef (Figure 3a, Design
choice d5), and the design without both Hayman Island reef and Rebe reef (Figure 3b, Design
choice d2) still retain around 89% and 81% mean efficiencies, respectively. This suggests that
little information is expected to be lost (when compared to the LTMP) if data are not collected
on the Hayman Island and Rebe reefs. Interestingly, some designs remain more than 75%
efficient even after dropping three reefs (i.e. 33% of the sampling effort; Figure 3c).
Figure 3: Efficiencies of designs after dropping (a) one, (b) two, and (c) three reefs in the Whitsunday region of the Great Barrier Reef. Design choices represent the designs formed after dropping one, two, or three reefs. The black horizontal line is the 75% efficiency level.
Hayman and Langford-bird reefs are located in inshore habitat and are in close proximity (Figure
4), while the remaining inshore reef, Broder Island, is relatively isolated. As our model can
capture the spatial variability, this may be the reason that Hayman reef was identified as the
least informative reef in the Whitsunday region. That is, information about the coral cover of
this reef can be obtained from neighbouring reefs. A similar pattern can be seen in the outer-
shelf habitat where Hyde and Rebe reefs are close to each other. Thus, Rebe was identified as
the second least informative reef. Out of interest, we also compared these designs with those
based on the linear model proposed by Kang et al. (2016). Our designs appear
to exploit the spatial dependence in coral cover, while those based on the linear model do not
(see Appendix C for further details).
Figure 4: Visualisation of spatial locations of the two least informative reefs (sites) in the Whitsunday region of the Great Barrier Reef. Sites on these two reefs are displayed in red. A small amount of jitter was added for visualisation purposes.
To further evaluate the effect of visiting fewer LTMP sites, we considered the effects of
dropping the least informative site from each reef. The corresponding optimal design retains
the following sites (see Table 2 for more details):
2, 3, 4, 5, 8, 9, 11, 12, 13, 14, 16, 18, 19, 20, 22, 23, 26, 27.
This design maintained an approximate mean efficiency of 85% despite retaining only 66.7%
of the original sampling effort. The spatial locations of the retained/dropped sites from each
reef are shown in Figure 5. When considering the optimal design, there can be one or more
contributing factors towards observing one site as less informative compared to the other two
sites on a given reef. These factors may include distance between reefs/sites (spatial effect in
the model), differences in covariate values between reefs/sites, and prior uncertainty about
estimated effects (Table 3), so all such factors should be considered when determining why
certain reefs/sites were not selected within the optimal design.
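The retained-site list above can be checked against the reported sampling effort with a few lines of code. The sketch assumes that the 27 sites are numbered consecutively in groups of three per reef, which is consistent with the site numbers quoted in the text for individual reefs; the reef index used here is a hypothetical label, not the reef names of Table 2.

```python
RETAINED = [2, 3, 4, 5, 8, 9, 11, 12, 13, 14, 16, 18, 19, 20, 22, 23, 26, 27]
N_SITES = 27
SITES_PER_REEF = 3  # assumption: sites 1-3 on the first reef, 4-6 on the second, ...

# Sites in the LTMP design that do not appear in the retained list.
dropped = sorted(set(range(1, N_SITES + 1)) - set(RETAINED))

# Group the dropped sites by a hypothetical reef index (1 to 9).
dropped_per_reef = {}
for site in dropped:
    reef_index = (site - 1) // SITES_PER_REEF + 1
    dropped_per_reef.setdefault(reef_index, []).append(site)

retained_fraction = len(RETAINED) / N_SITES

print("dropped sites:", dropped)
print("dropped per reef:", dropped_per_reef)
print(f"retained sampling effort: {100 * retained_fraction:.1f}%")
```

Under the numbering assumption, exactly one site is dropped from each of the nine reefs, and 18 of 27 sites corresponds to the 66.7% of sampling effort quoted in the text.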
Figure 5: Spatial locations of the reefs/sites in the Whitsunday region of the Great Barrier Reef after dropping the least informative site from each reef. Red triangles denote dropped sites from each reef. A small amount of jitter was added for visualisation purposes.
For Broder Island reef, the optimal design retains sites 2 and 3. As all three sites on this reef
share similar features (Figure 6a and 6b), it is difficult to explain why site 1 was dropped over
the other two sites. However, it may be related to the distance between sites. That is, short
distances imply sites are related, so it may be that more information is obtained from sites that
are further apart. This is similar for sites on the Langford-bird and Hayman Island reefs. In
contrast, on 20104S reef, sites 11 and 12 seem to have quite dissimilar bathymetries (Figure
6b), and thus these two sites are retained in the optimal design. For 19138S reef, sites 14 and
15 share similar features (Figure 6b). Therefore, the optimal design drops one of them (site
15) and retains the two most dissimilar sites. Likewise, sites 16 and 18 are retained from the
Rebe reef due to dissimilarities in their bathymetry and mean temperature values (Figure 6b).
In summary, these results show that sites appear to be retained/dropped depending on
heterogeneities in site-specific features as this would allow the effect of these covariates to be
estimated more precisely.
Figure 6: (a) Distributions of time-varying disturbance proportions and (b) distributions of other covariates at each site in the Whitsunday region of the Great Barrier Reef. Reef names corresponding to numbers given here are shown in Table 2.
Impacts of different disturbance conditions
To find designs that vary over time depending on the effects of environmental disturbances,
two scenarios were developed. In Scenario 1, environmental disturbances were simulated to
match the historical data in the Whitsunday region. In this scenario, the mean efficiency of the
LTMP was only 41% when compared to the optimal design. This confirms that the optimal
design provides highly informative data compared to the LTMP when disturbance patterns
similar to historical patterns are observed. To understand how design points were distributed,
spatial locations of the sites in the optimal design are shown with the current LTMP sites
(Figure 7).
To help interpret the results in Figure 7, a dot plot was produced (see Figure 8) which shows
the number of visits to each site under the aforementioned optimal design. The optimal design
does not visit all the sites in the Whitsunday region. Instead, the results suggest that collecting
more data from some selected sites is more informative. To explain these
differences in the number of visits to different sites, some potential factors can be considered
at the habitat, reef, and individual site levels.
Figure 7: Spatial locations of sites in the optimal design in the Whitsunday region of the Great Barrier Reef when disturbance patterns match historical disturbance patterns in the region. The Whitsunday region is depicted in three parts as Inner- (a), Middle- (b), and Outer-shelf (c) habitats. Frequency refers to the number of visits to a site. A small amount of jitter was added for visualisation purposes.
Within a habitat, there are more visits to the sites on a reef that is far away from the other reefs
in the same habitat (Figure 7). Furthermore, when two reefs are in close proximity, the optimal
design proposes fewer visits to the sites on either of the reefs. At the reef scale, Hayman
(sites 7, 8, and 9) is the only reef open to fishing in inshore habitat. Thus, there are
more visits to the sites on this reef in order to capture the underlying contrast of this reef
compared to the others. Similarly, Slate reef is the only reef closed to fishing in outer-shelf
habitat, and the optimal design collects more data from the sites on this reef. At the site scale,
sites 11 and 27 are the most diverse sites (in terms of covariates) in the Whitsunday region
(Figure 6a and 6b). To capture this dissimilarity, our optimal design visits these two sites more
often compared to the other sites in the region (Figure 8). Overall, the optimal design collects
more data from reefs/sites that are quite dissimilar from others.
Figure 8: Sites in the optimal design and the number of visits to each site in the Whitsunday region of the Great Barrier Reef when disturbance patterns match historical disturbance patterns in the region.
In Scenario 2, we determined optimal designs subject to CoTS disturbance under four sampling
schemes (i.e. one site from each reef affected, all the sites in inshore reefs affected, all the sites
in middle-shelf reefs affected, and all the sites in outer-shelf reefs affected) as described in
Section 2.3.2. The mean efficiencies of the LTMP with respect to the optimal designs were
47%, 48%, 50%, and 51% for these four schemes, respectively. Figures 9 and 10 visualise the
sites in the optimal designs in the Whitsunday region under these four schemes. In Figure 11,
dot plots show the number of visits to CoTS affected/unaffected sites under these four schemes.
It is evident from these figures that the optimal designs do not visit all the CoTS affected
sites.
Under scheme (i), in inner-shelf reefs, the optimal design does not visit one of the CoTS
affected sites (site 5) (Figure 9a). This site is located on Langford-bird reef, and the sites on this
reef have similar features (Figure 6a and 6b) except for the CoTS affected proportion. As CoTS
is not a significant covariate (at the 95% credible level) in the model used for design selection
(Table 3), the contrast of site-specific features of site 5 against other sites might not be
substantial enough for it to be selected in the optimal design. Further, the optimal design has
the highest number of visits to a CoTS affected site (site 21 on 19131S reef), which is in the
middle-shelf (Figure 9b). As this reef is close to 19138S reef, the optimal design visits neither the
affected site nor any other sites on 19138S reef. It is interesting to note that the optimal design
visits only the CoTS affected site on 20104S reef, which is located further away from the
remaining two reefs in the middle-shelf. In the outer-shelf, the optimal design does not visit
the CoTS affected site on Hyde reef (Figure 9c). One explanation for this may be that the
optimal design has collected more data from the CoTS affected site on nearby Rebe reef.
Figure 9: Spatial locations of sites in the optimal designs in the Whitsunday region of the Great Barrier Reef under CoTS disturbance for one selected site on each reef ((a), (b), and (c)) and for all inshore-shelf sites ((d), (e), and (f)). In each panel, the Whitsunday region is divided into three parts based on habitat as Inner- (left), Middle- (middle), and Outer-shelf (right). Red dots represent the CoTS affected sites and black dots represent unaffected sites. Frequency represents the number of visits to a site. A small amount of jitter was added for visualisation purposes.
In the optimal design under scheme (ii), where we considered all inshore sites as affected sites,
the optimal design does not visit all the affected sites (Figure 9d). Instead fewer sites are visited
in the affected area compared to the current LTMP design where all the sites would be visited.
A similar pattern of sampled reefs was found in the middle-shelf (scheme (iii)) and outer-shelf (scheme (iv))
habitats as in the inner-shelf (Figures 10 and 11). Overall, these results indicate that
the optimal design provides much more informative data than the current LTMP design,
with reduced resources.
Figure 10: Spatial locations of sites in the optimal designs in the Whitsunday region of the Great Barrier Reef under CoTS disturbance for all middle-shelf sites ((a), (b), and (c)) and for all outer-shelf sites ((d), (e), and (f)). In each panel, the Whitsunday region is divided into three parts based on habitat as Inner- (left), Middle- (middle), and Outer-shelf (right). Red dots represent CoTS affected sites and black dots represent unaffected sites. Frequency represents the number of visits to a site. A small amount of jitter was added for visualisation purposes.
Figure 11: Number of visits to CoTS affected and unaffected sites under the four schemes considered in this study. These four schemes include (a) one site from each reef affected, (b) all sites in inshore reefs affected, (c) all sites in middle-shelf reefs affected, and (d) all sites in outer-shelf reefs affected.
4. Discussion
In this paper, we focused on improving the effectiveness of reef monitoring in a Bayesian
experimental design context through reducing monitoring costs or resources and increasing the
information gained for addressing specified monitoring objectives. The present study makes
several contributions with respect to sampling designs for monitoring the GBR and potentially
other reef ecosystems, using an approach that could be applied to ecosystem monitoring more
broadly. First, this paper demonstrates the use of time-varying covariates such as cyclone
impacts, bleaching, and CoTS outbreaks when sampling locations are selected for the coming
year. Second, the model used for design selection has been enhanced through the incorporation
of spatial random effects, which has contributed to a gain of almost twice the amount of
information when compared to designs found using methods from Kang et al. (2016). These
design innovations have the potential to significantly improve the knowledge captured
regarding the ecological dynamics in coral cover, and thus to improve the effectiveness of reef
monitoring.
One of the objectives of the current study was to compare the effect of having fewer
LTMP sites in the Whitsunday region, either by removing one reef or removing one site from
each reef. Most notably, removing these reefs and sites did not result in substantial loss of
information about coral cover parameters. For example, removing one site from each reef
resulted in the retention of 85% of the information obtained using the fixed LTMP design. Our
other objective was to find designs that could change over time depending on reef condition.
With this approach, the designs found do not visit all LTMP sites, but instead collected more
data from some specific sites. Our results suggest that the level of sampling effort in the LTMP
could be better spent in other areas of the reef. As travel costs make up a significant portion of
monitoring costs (Hill and Wilkinson, 2004), our findings could facilitate reduced monitoring
costs, allowing these resources to be used in other studies.
There is scope to extend the methods presented here in future research. For example, in this
work, we did not consider the serial correlation of time-varying covariates, or the correlations
that may exist between such variables. The effects of such correlations could be explored in
future studies, and potentially could lead to more informative experimental designs.
Furthermore, while this study assessed the objective of maximising the precision in parameter
estimation, it would be straightforward to extend this approach to evaluate other functions such
as accurate predictions at un-sampled sites. Additional monitoring objectives related to the
LTMP experimental design could also be incorporated through the utility function, and
financial/time constraints could also be imposed. Lastly, previous research has shown that
inferences can change depending on the spatial scale and extent of spatial smoothing that is
considered (Kang et al., 2014, Kang et al., 2013). It would be interesting, therefore, to explore
whether such changes have a significant impact upon the chosen optimal design.
5. Acknowledgements
The corresponding author of this study is supported by the Australian Technology Network of
Universities Industry Doctoral Training Centre (ATN IDTC) Scholarship. Drovandi C and
Mellin C were supported by the Australian Research Council’s Discovery Early Career
Researcher Award funding scheme (DE160100741 and DE140100701). We would like to
thank the Centre of Excellence in Mathematical and Statistical Frontiers (ACEMS) and the
Australian Institute of Marine Science. We are also immensely grateful to Terry Walshe, who
provided expertise that greatly assisted in the initial stages of this study. Finally, we thank
Samuel Clifford and Alan Pearse for their help and advice regarding LTMP data. Author
contributions: McGree JM, Peterson EE, and Thilan AWLP, designed the research; Thilan
AWLP performed the research and wrote the paper; McGree JM, Menendez P, Peterson EE,
Caley MJ, Drovandi C, and Mellin C provided research conception and critical review of
manuscript drafts.
7. References
Beaman, R. J. (2017-12-10). High-resolution depth model for the Great Barrier Reef - 30 m.
https://ecat.ga.gov.au/geonetwork/srv/eng/search#!0f4e635c-81ec-46d0-9c99-65e5fe0b8c01.
Beijbom, O., Edmunds, P. J., Roelfsema, C., Smith, J., Kline, D. I., Neal, B. P., Dunlap, M. J., Moriarty, V., Fan, T.-Y. & Tan, C.-J. (2015). Towards automated annotation of benthic survey images: Variability of human experts and operational modes of automation. PloS one, 10(7). https://doi.org/10.1371/journal.pone.0130312.
BOM. (2014). EReefs marine water quality dashboard data product specification. Bureau of Meteorology. http://www.bom.gov.au/environment/activities/mwqd/documents/data-product-specification.pdf.
CERF. (2009). Marine Biodiversity Hub. https://www.nespmarine.edu.au/.
Chaloner, K., & Verdinelli, I. (1995). Bayesian experimental design: A review. Statistical Science, 273-304. https://doi.org/10.1214/ss/1177009939.
De'ath, G., & Fabricius, K. (2010). Water quality as a regional driver of coral biodiversity and macroalgae on the Great Barrier Reef. Ecological Applications, 20(3), 840-850. https://doi.org/10.1890/08-2023.1.
Devlin, M., Schroeder, T., McKinna, L., Brodie, J., Brando, V., & Dekker, A. (2012). Monitoring and mapping of flood plumes in the Great Barrier Reef based on in situ and remote sensing observations. Environmental Remote Sensing and Systems Analysis, 147-191: Taylor and Francis Group, CRC Press.
Dunn, J. R. (2009). CSIRO Atlas of Regional Seas (CARS) Database. http://www.marine.csiro.au/~dunn/cars2009/.
Ecker, M. D., & Gelfand, A. E. (1997). Bayesian variogram modeling for an isotropic spatial process. Journal of Agricultural, Biological, and Environmental Statistics, 347-369. https://doi.org/10.2307/1400508.
Falk, M. G., McGree, J. M., & Pettitt, A. N. (2014). Sampling designs on stream networks using the pseudo-Bayesian approach. Environmental and ecological statistics, 21(4), 751-773. https://doi.org/10.1007/s10651-014-0279-2.
Faraway, J. J., Wang, X., & Ryan, Y. Y. (2018). Bayesian Regression Modeling with INLA: Chapman and Hall/CRC. https://doi.org/10.1201/9781351165761.
Ferrari, S. L. P., & Cribari-Neto, F. (2004). Beta regression for modelling rates and proportions. Journal of Applied Statistics, 31(7), 799-815. https://doi.org/10.1080/0266476042000214501.
Ford, E. B. (2008). Adaptive scheduling algorithms for planet searches. The Astronomical Journal, 135(3), 1008. https://doi.org/10.1088/0004-6256/135/3/1008.
GBRMPA. (2014). Great Barrier Reef (GBR) Features (Reef boundaries, QLD Mainland, Islands, Cays, Rocks, and Dry Reefs) shapefile. Great Barrier Reef Marine Park Authority GeoPortal. https://eatlas.org.au/data/uuid/ac8e8e4f-fc0e-4a01-9c3d-f27e4a8fac3c.
Hill, J., & Wilkinson, C. (2004). Methods for ecological monitoring of coral reefs. Australian Institute of Marine Science, Townsville, 117.
Hoegh-Guldberg, O., Mumby, P. J., Hooten, A. J., Steneck, R. S., Greenfield, P., Gomez, E., Harvell, C. D., Sale, P. F., Edwards, A. J., Caldeira, K., Knowlton, N., Eakin, C. M., Iglesias-Prieto, R., Muthiga, N., Bradbury, R. H., Dubi, A. & Hatziolos, M. E. (2007). Coral reefs under rapid climate change and ocean acidification. Science, 318, 1737-42. https://doi.org/10.1126/science.1152509.
Hughes, T. P., Baird, A. H., Bellwood, D. R., Card, M., Connolly, S. R., Folke, C., Grosberg, R., Hoegh-Guldberg, O., Jackson, J. B. & Kleypas, J. (2003). Climate change, human impacts, and the resilience of coral reefs. Science, 301, 929-933. https://doi.org/10.1126/science.1085046.
Jackson, J. B., Kirby, M. X., Berger, W. H., Bjorndal, K. A., Botsford, L. W., Bourque, B. J., Bradbury, R. H., Cooke, R., Erlandson, J. & Estes, J. A. (2001). Historical overfishing and the recent collapse of coastal ecosystems. Science, 293, 629-637. https://doi.org/10.1126/science.1059199.
Jonker, M. M., Johns, K. K., & Osborne, K. K. (2008). Surveys of benthic reef communities using underwater digital photography and counts of juvenile corals. In Long-Term Monitoring of the Great Barrier Reef Standard Operational Procedure Number 10; Australian Institute of Marine Science: Townsville, Australia. https://www.aims.gov.au/docs/research/monitoring/reef/sops.html.
Kang, S. Y., McGree, J., Baade, P., & Mengersen, K. (2014). An investigation of the impact of various geographical scales for the specification of spatial dependence. Journal of Applied Statistics, 41(11), 2515-2538. https://doi.org/10.1080/02664763.2014.920779.
Kang, S. Y., McGree, J., & Mengersen, K. (2013). The impact of spatial scales and spatial smoothing on the outcome of bayesian spatial model. PLoS One, 8(10). https://doi.org/10.1371/journal.pone.0075957.
Kang, S. Y., McGree, J. M., Drovandi, C. C., Caley, M. J., & Mengersen, K. L. (2016). Bayesian adaptive design: improving the effectiveness of monitoring of the Great Barrier Reef. Ecol Appl, 26(8), 2635-2646. https://doi.org/10.1002/eap.1409.
Kullback, S., & Leibler, R. A. (1951). On information and sufficiency. The Annals of Mathematical Statistics, 22(1), 79-86.
Lagos-Alvarez, B. M., Fustos-Toribio, R., Figueroa-Zúñiga, J., & Mateu, J. (2017). Geostatistical mixed beta regression: a Bayesian approach. Stochastic Environmental Research and Risk Assessment, 31(2), 571-584. https://doi.org/10.1007/s00477-016-1308-5.
Long, Q., Scavino, M., Tempone, R., & Wang, S. J. (2013). Fast estimation of expected information gains for Bayesian experimental designs based on Laplace approximations. Computer Methods in Applied Mechanics and Engineering, 259, 24-39. https://doi.org/10.1016/j.cma.2013.02.017.
Loredo, T. J. (2004). Bayesian adaptive exploration. In AIP Conference Proceedings (Vol. 707, 330-346). https://doi.org/10.1063/1.1751377.
Loredo, T. J., Berger, J. O., Chernoff, D. F., Clyde, M. A., & Liu, B. (2012). Bayesian methods for analysis and adaptive scheduling of exoplanet observations. Statistical Methodology, 9(1-2), 101-114. https://doi.org/10.1016/j.stamet.2011.07.005.
MacKay, D. J. C. (2003). Information theory, inference and learning algorithms: Cambridge University Press.
Matthews, S. A., Mellin, C., Macneil, A., Heron, S. F., Skirving, W., Puotinen, M., Devlin, M. J. & Pratchett, M. (2019). High‐resolution characterization of the abiotic environment and disturbance regimes on the Great Barrier Reef, 1985–2017. Ecology. https://doi.org/10.1002/ecy.2574.
McGree, J. M., Drovandi, C. C., Thompson, M., Eccleston, J., Duffull, S., Mengersen, K., Pettitt, A. N. & Goggin, T. (2012). Adaptive Bayesian compound designs for dose finding studies. Journal of Statistical Planning and Inference, 142, 1480-1492. https://doi.org/10.1016/j.jspi.2011.12.029.
McGree, J. M., Drovandi, C. C., White, G., & Pettitt, A. N. (2016). A pseudo-marginal sequential Monte Carlo algorithm for random effects models in Bayesian sequential design. Statistics and Computing, 26(5), 1121-1136. https://doi.org/10.1007/s11222-015-9596-z.
Meyer, R. K., & Nachtsheim, C. J. (1995). The coordinate-exchange algorithm for constructing exact optimal experimental designs. Technometrics, 37(1), 60-69. https://doi.org/10.2307/1269153.
Miller, I., Jonker, M., & Coleman, G. (2003). Crown-of-thorns starfish and coral surveys using the manta tow and SCUBA search techniques: Australian Institute of Marine Science Townsville, Australia.
Morgan, C. C., Huyck, S., Jenkins, M., Chen, L., Bedding, A., Coffey, C. S., Gaydos, B. & Wathen, J. K. (2014). Adaptive design: results of 2012 survey on perception and use. Therapeutic Innovation & Regulatory Science, 48, 473-481. https://doi.org/10.1177/2168479014522468.
Osborne, K., Dolman, A. M., Burgess, S. C., & Johns, K. A. (2011). Disturbance and the dynamics of coral cover on the Great Barrier Reef (1995–2009). PloS one, 6(3). https://doi.org/10.1371/journal.pone.0017516.
Overstall, A. M., McGree, J. M., & Drovandi, C. C. (2018). An approach for finding fully Bayesian optimal designs using normal-based approximations to loss functions. Statistics and Computing, 28(2), 343-358. https://doi.org/10.1007/s11222-017-9734-x.
Pandolfi, J. M., Jackson, J. B., Baron, N., Bradbury, R. H., Guzman, H. M., Hughes, T. P., Kappel, C., Micheli, F., Ogden, J. C., Possingham, H. P., & Sala E. (2005). Are US coral reefs on the slippery slope to slime? Science, 307(5716), 1725-6. https://doi.org/10.1126/science.1104258.
Puotinen, M., Maynard, J. A., Beeden, R., Radford, B., & Williams, G. J. (2016). A robust operational model for predicting where tropical cyclone waves damage coral reefs. Sci Rep, 6(6). https://doi.org/10.1038/srep26009.
Ryan, E. G., Drovandi, C. C., McGree, J. M., & Pettitt, A. N. (2016). A Review of Modern Computational Algorithms for Bayesian Optimal Design. International Statistical Review, 84(1), 128-154. https://doi.org/10.1111/insr.12107.
Ryan, K. J. (2003). Estimating expected information gains for experimental designs with application to the random fatigue-limit model. Journal of Computational and Graphical Statistics, 12(3), 585-603. https://doi.org/10.1198/1061860032012.
Sweatman, H., Cheal, A., Coleman, G., Emslie, M., Johns, K., Jonker, M., Miller, I., & Osborne, K. (2008). In Long-term monitoring of the Great Barrier Reef. Status report number 8 (Vol. 369 pp): Australian Institute of Marine Science Townsville, Australia. https://eatlas.org.au/content/long-term-monitoring-great-barrier-reef-status-report-no-8-aims-ltmp.
Sweatman, H., Delean, S., & Syms, C. (2011). Assessing loss of coral cover on Australia's Great Barrier Reef over two decades, with implications for longer-term trends. Coral Reefs, 30(2), 521-531. https://doi.org/10.1007/s00338-010-0715-1.
Vercelloni, J., Caley, M. J., & Mengersen, K. (2017). Crown-of-thorns starfish undermine the resilience of coral populations on the Great Barrier Reef. Global Ecology and Biogeography, 26(7), 846-853. https://doi.org/10.1111/geb.12590.
Weir, C. J., Spiegelhalter, D. J., & Grieve, A. P. (2007). Flexible design and efficient implementation of adaptive dose-finding studies. Journal of Biopharmaceutical Statistics, 17(6), 1033-1050. https://doi.org/10.1080/10543400701643947.
Zeileis, A., Kleiber, C., & Jackman, S. (2008). Regression models for count data in R. Journal of Statistical Software, 27(8), 1-25. http://dx.doi.org/10.18637/jss.v027.i08.
Appendix A - Spatial and Temporal Distributions of coral cover and covariates considered in the model
It is apparent from Figure A.1 that coral cover was moderate to low on the surveyed reefs.
Cyclones occurred in some regions of the GBR during the years 2009, 2011, 2013, 2014, and
2015. Coral bleaching is visible over the GBR only during 2002 (Figure A.3). CoTS outbreaks
were present on only a few reefs during the sampled years. The remaining figures in Appendix
A represent the spatial and temporal distribution of site-specific covariates.
Figure A.1: Spatial and temporal distribution of coral cover at sites within the reefs considered under the Long-Term Monitoring Program of the Great Barrier Reef.
Figure A.2: Spatial and temporal distribution of cyclone impacts at sites within the reefs considered under the Long-Term Monitoring Program of the Great Barrier Reef. Cyclone impacts have been aggregated to two levels in order to overcome the limitation of not having enough data in each level to estimate effect sizes with reasonable precision. Zero represents no cyclone effects and one represents some cyclone effects.
Figure A.3: Spatial and temporal distribution of coral bleaching at sites within the reefs considered under the Long-Term Monitoring Program of the Great Barrier Reef. Bleaching data have been aggregated to two levels in order to overcome the limitation of not having enough data in each level to estimate effect sizes with reasonable precision. Zero represents no coral bleaching and one represents 1% or more coral bleached.
Figure A.4: Spatial and temporal distribution of Crown of Thorns Starfish impacts at sites within the reefs considered under the Long-Term Monitoring Program of the Great Barrier Reef.
Figure A.5: Spatial and temporal distribution of reefs open and closed to fishing at sites within the reefs considered under the Long-Term Monitoring Program of the Great Barrier Reef.
Figure A.6: Bathymetry of sites within the reefs considered under the Long-Term Monitoring Program of the Great Barrier Reef.
Figure A.7: Spatial and temporal distribution of mean temperature at sites within the reefs considered under the Long-Term Monitoring Program of the Great Barrier Reef.
Figure A.8: Spatial and temporal distribution of long-term mean Chlorophyll A concentration (µg/m³) in the water column at sites within the reefs considered under the Long-Term Monitoring Program of the Great Barrier Reef.
Appendix B – Further details about statistical methods
This section describes the estimation of the posterior distribution, posterior model probabilities, and utility approximation as implemented in this paper.
Posterior estimation
We chose a weakly informative multivariate Normal distribution for the parameter θ with
means (0, …, 0, -2, -2, 0) and a diagonal variance-covariance matrix with diagonal values
(100,…,100, 1, 1, 2). The corresponding components in each vector represent means and
variances of each regression coefficient, log partial sill, log range, and log variance,
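respectively. Given a design $d$ with response data $y$, the posterior distribution can be defined
as $p(\theta \mid y, d, X) \propto p(y \mid \theta, d, X)\, p(\theta)$, where $p(y \mid \theta, d, X)$ is the likelihood function
and $X$ denotes the covariates. As random effects are included in the model, the likelihood is expressed as follows:
$$p(y \mid \theta, d, X) = \int p(y \mid \theta, b, d, X)\, p(b \mid \theta, d)\, \mathrm{d}b, \tag{B.1}$$
where $p(y \mid \theta, b, d, X)$ is the likelihood conditional on the random effects $b$ and $p(b \mid \theta, d)$ is the
distribution of the random effects. In general, the above integral cannot be solved analytically,
so we used Monte Carlo methods (McGree et al., 2016) to approximate the likelihood as
follows:
$$p(y \mid \theta, d, X) \approx \frac{1}{N} \sum_{i=1}^{N} p(y \mid \theta, b_i, d, X), \tag{B.2}$$
where $b_i \sim p(b \mid \theta, d)$ for $i = 1, \ldots, N$. This approximation can be used to approximate the posterior
distribution via the Laplace approximation as follows:
$$p(\theta \mid y, d, X) \approx \mathrm{MVN}(\theta \mid \theta^*, \Sigma^*), \tag{B.3}$$
where $\theta^*$ denotes the mode of the posterior distribution $p(\theta \mid y, d, X)$ and $\Sigma^* = [-H(\theta^*)]^{-1}$ denotes
the variance-covariance matrix, being the inverse of the negative Hessian matrix $H$ of the log posterior
density evaluated at $\theta^*$.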
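As an illustration of the Monte Carlo approximation in Equation (B.2), the following minimal sketch averages the conditional likelihood over draws of a random effect. It deliberately uses a toy Gaussian model with a single scalar random effect rather than the spatial Beta regression model of this paper, because the toy marginal likelihood is available in closed form and can be compared with the estimate. The function names, the parameter `tau`, and all numerical values are assumptions made for the example.

```python
import math
import random


def normal_pdf(x, mean, sd):
    """Density of a univariate Normal distribution."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def mc_likelihood(y, theta, tau, n_draws=20000, seed=1):
    """Monte Carlo estimate of the integral of p(y | theta, b) p(b) over b.

    Toy model (an assumption, not the paper's model):
    y | theta, b ~ Normal(theta + b, 1) and b ~ Normal(0, tau^2).
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_draws):
        b = rng.gauss(0.0, tau)                 # draw b_i from the random-effect distribution
        total += normal_pdf(y, theta + b, 1.0)  # likelihood conditional on b_i
    return total / n_draws                      # average over the draws


def exact_likelihood(y, theta, tau):
    """Closed-form marginal of the toy model: y ~ Normal(theta, 1 + tau^2)."""
    return normal_pdf(y, theta, math.sqrt(1.0 + tau * tau))


estimate = mc_likelihood(y=0.3, theta=0.0, tau=0.5)
exact = exact_likelihood(y=0.3, theta=0.0, tau=0.5)
print(f"Monte Carlo: {estimate:.4f}, exact: {exact:.4f}")
```

In the toy model the two values should agree closely, and the Monte Carlo error shrinks as the number of draws increases.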
Posterior model probabilities
Let $m = 1, \ldots, K$ index the models considered for the coral cover data. The parameter $\theta_m$ of the $m$th
model includes the regression coefficients for covariates included in the $m$th model, log of
variance (i.e. log of the reciprocal of the precision), and the log of covariance parameters (i.e.
partial sill and range). The posterior distribution of $\theta_m$ can be defined as
$$p(\theta_m \mid m, y, d, X) = \frac{p(y \mid \theta_m, m, d, X)\, p(\theta_m \mid m)}{p(y \mid m, d, X)}, \tag{B.4}$$
where
$$p(y \mid m, d, X) = \int p(y \mid \theta_m, m, d, X)\, p(\theta_m \mid m)\, \mathrm{d}\theta_m \tag{B.5}$$
is called the marginal likelihood or the model evidence for model $m$, $p(y \mid \theta_m, m, d, X)$ is the
likelihood of model $m$ and $p(\theta_m \mid m)$ is the prior distribution of $\theta_m$ under model $m$. The posterior
model probability is then given by (MacKay, 2003):
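$$p(m \mid y, d, X) = \frac{p(y \mid m, d, X)\, p(m)}{p(y \mid d, X)}, \tag{B.6}$$
where $p(m)$ is the prior model probability and the term in the denominator is given by
$$p(y \mid d, X) = \sum_{m'} p(y \mid m', d, X)\, p(m'). \tag{B.7}$$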
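The Laplace approximation can be used to form an approximation to $p(y \mid m, d, X)$ as follows
$$p(y \mid m, d, X) \approx (2\pi)^{q_m/2} \det(\Sigma_m^*)^{1/2}\, p(y \mid \theta_m^*, m, d, X)\, p(\theta_m^* \mid m), \tag{B.8}$$
where $q_m$ is the dimension of the parameter vector of the $m$th model, $\theta_m^*$ is the mode of the
posterior density and $\Sigma_m^* = [-H(\theta_m^*)]^{-1}$ is the inverse of the negative Hessian matrix evaluated at
$\theta_m^*$ (Overstall et al., 2018; Ryan, 2003). Thus, posterior
model probabilities can be estimated by substituting this approximation into Equation (B.6).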
We denote the mean and variance-covariance matrix for the multivariate Normal posterior
distribution for the preferred model as $\theta_0^*$ and $\Sigma_0^*$, respectively. It is this posterior
distribution that is considered as the prior distribution for design selection.
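The calculation in Equations (B.6) to (B.8) can be sketched as follows. This is a minimal illustration, not the implementation used in this paper. It assumes that the log-likelihood and log-prior at the posterior mode, and the log-determinant of the Laplace covariance matrix, have already been computed for each model. The function names and the numerical log evidences are hypothetical. Working on the log scale avoids numerical underflow when evidences are very small.

```python
import math


def laplace_log_evidence(log_lik_at_mode, log_prior_at_mode, log_det_cov, dim):
    """Log of the Laplace approximation to the model evidence.

    Returns log[(2*pi)^(dim/2) * det(Sigma*)^(1/2) * p(y | theta*) * p(theta*)],
    where log_det_cov is log det(Sigma*) and dim is the number of parameters.
    """
    return (0.5 * dim * math.log(2.0 * math.pi)
            + 0.5 * log_det_cov
            + log_lik_at_mode
            + log_prior_at_mode)


def posterior_model_probs(log_evidences, prior_probs):
    """Normalise evidence times prior model probability across models.

    Uses the log-sum-exp trick: subtract the largest log term before
    exponentiating so that the weights stay in a safe numerical range.
    """
    log_terms = [le + math.log(pp) for le, pp in zip(log_evidences, prior_probs)]
    shift = max(log_terms)
    weights = [math.exp(t - shift) for t in log_terms]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical log evidences for two competing models with equal prior probability.
log_evidences = [-100.0, -100.0 + math.log(3.0)]
probs = posterior_model_probs(log_evidences, [0.5, 0.5])
print(probs)
```

In this made-up example the second model has three times the evidence of the first, so with equal prior model probabilities it receives a posterior model probability of three quarters.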
Utility approximation
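We adopted the KLD utility, which can be expressed as follows:
$$U(d, y) = \int p(\theta \mid y, d) \left[ \log p(y \mid \theta, d) - \log p(y \mid d) \right] \mathrm{d}\theta. \tag{B.9}$$
This utility can be extended to incorporate time-varying covariates as follows:
$$U(d, y, X) = \int p(\theta \mid y, d, X) \left[ \log p(y \mid \theta, d, X) - \log p(y \mid d, X) \right] \mathrm{d}\theta. \tag{B.10}$$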
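In order to evaluate the approximation to the expected utility (see Equation (7)), posterior
distributions need to be approximated. We approximate these posterior distributions using the
Laplace approximation. As these will be multivariate Normal distributions, we denote these
with means $\theta^*$ and covariance matrices $\Sigma^*$. Then, writing $\theta_0^*$ and $\Sigma_0^*$ for the mean and
covariance matrix of the multivariate Normal prior and $q$ for the dimension of $\theta$, the KLD between
the prior and posterior distribution can be evaluated analytically as follows:
$$U(d, y, X) = \frac{1}{2} \left[ \operatorname{tr}\!\left( (\Sigma_0^*)^{-1} \Sigma^* \right) + (\theta_0^* - \theta^*)^{\top} (\Sigma_0^*)^{-1} (\theta_0^* - \theta^*) - q + \ln \frac{\det \Sigma_0^*}{\det \Sigma^*} \right]. \tag{B.11}$$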
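The closed-form divergence between two multivariate Normal distributions, as used in Equation (B.11), can be computed with a few lines of linear algebra. The sketch below is an illustration under the assumption that the prior and posterior means and covariance matrices are available as arrays. The function name `kld_mvn` and the example values are hypothetical. Linear solves and log-determinants are used instead of explicit inverses and determinants for numerical stability.

```python
import numpy as np


def kld_mvn(mean_post, cov_post, mean_prior, cov_prior):
    """Kullback-Leibler divergence KL(posterior || prior) for two multivariate Normals."""
    mean_post = np.asarray(mean_post, dtype=float)
    mean_prior = np.asarray(mean_prior, dtype=float)
    cov_post = np.asarray(cov_post, dtype=float)
    cov_prior = np.asarray(cov_prior, dtype=float)

    q = mean_post.size                   # dimension of the parameter vector
    diff = mean_prior - mean_post

    # tr(cov_prior^{-1} cov_post), computed with a solve rather than an inverse
    trace_term = np.trace(np.linalg.solve(cov_prior, cov_post))
    # (mean_prior - mean_post)^T cov_prior^{-1} (mean_prior - mean_post)
    quad_term = diff @ np.linalg.solve(cov_prior, diff)
    # ln(det cov_prior / det cov_post), via log-determinants
    _, logdet_prior = np.linalg.slogdet(cov_prior)
    _, logdet_post = np.linalg.slogdet(cov_post)

    return 0.5 * (trace_term + quad_term - q + logdet_prior - logdet_post)


# Hypothetical one-parameter example: prior N(0, 1), posterior N(1, 0.25).
utility = kld_mvn([1.0], [[0.25]], [0.0], [[1.0]])
print(utility)
```

The divergence is zero when the posterior equals the prior and grows as the data shift the mean or shrink the covariance, which is why designs with larger values are regarded as more informative.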
Appendix C – Model comparison
To compare the linear model from Kang et al. (2016) to our proposed model, we first compared
the 95% posterior predictive intervals of both models (Figure C1 and Figure C2). As can be seen,
both models appear to capture the average behaviour of coral cover. However, they differ
in how they capture variability. The linear model yields intervals that contain all
of the data, whereas for the Beta regression model 4% of the data fall outside these intervals. This
suggests that the Beta regression model is preferred over the linear model, as we expect about 5% of
the data to lie outside 95% intervals. Second, we evaluated the posterior model probabilities
of our model and the linear model in Kang et al. (2016). This yielded a posterior model
probability of approximately one for the Beta regression model, providing strong evidence that
it is preferred over the linear model.
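The coverage comparison described above amounts to counting the share of observations that fall outside their 95% posterior predictive intervals. A minimal sketch of that count is given below. The function name and the data are made up for illustration and are not taken from the Long-Term Monitoring Program.

```python
def fraction_outside(observed, lower, upper):
    """Return the share of observations outside their [lower, upper] intervals."""
    if not observed or not (len(observed) == len(lower) == len(upper)):
        raise ValueError("inputs must be non-empty and of equal length")
    outside = sum(1 for y, lo, hi in zip(observed, lower, upper) if y < lo or y > hi)
    return outside / len(observed)


# Made-up coral cover proportions and interval bounds, for illustration only.
observed = [0.12, 0.30, 0.45, 0.08, 0.60]
lower = [0.05, 0.20, 0.25, 0.10, 0.40]
upper = [0.25, 0.50, 0.40, 0.30, 0.70]
share = fraction_outside(observed, lower, upper)
print(share)
```

For well-calibrated 95% intervals this share should be close to 0.05 in a large data set. A share of zero, as for the linear model, indicates intervals that are wider than necessary.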
Figure C1: Scatter plot of arcsine square root transformed coral cover proportions versus coded years with posterior median (black) and 95% posterior predictive interval (red) when using the model from Kang et al. (2016).
Figure C2: Scatter plot of coral cover proportions versus coded years with posterior median (black) and 95% posterior predictive interval (red) when using the spatial model.
Qualitative comparison of the optimal designs
In this section, we compare the designs found under our Beta regression model and the linear
model proposed by Kang et al. (2016). One of the main extensions of our model is the inclusion
of a spatial random effect that accounts for the fact that observations collected in space may
not be independent. To explore this, we can inspect the estimated range parameter reported in
Table 3. On a standardized distance scale, the estimated range is 0.33. After un-standardizing
this value, the estimated range is approximately 12.76 km. This implies that coral cover
at reefs/sites separated by less than 12.76 km is spatially correlated, whereas coral cover at
reefs/sites farther apart than 12.76 km is not. With this in mind, we compare designs selected based
on the linear and Beta regression models. Based on our model, Hayman reef
is selected as one of the least informative reefs (Figure C3). This reef is within 12.76 km of
Langford-bird reef, implying that information about Hayman reef can be obtained by sampling
at Langford-bird reef. In contrast, Broder Island, which is relatively isolated in space, was
selected as the least informative reef based on the linear model. Thus, the designs found based on our
modelling approach appear to leverage the information that can be obtained due to significant
spatial dependence in coral cover, leading to the exclusion of reefs within a close vicinity to
others.
Figure C3: Visualisation of spatial locations of the two least informative reefs (sites) in the
Whitsunday region of the Great Barrier Reef. Reefs (sites) removed based on the Beta
regression model are shown in red while those based on the linear regression model are circled
in green. A small amount of jitter was added for visualisation purposes.