Bayesian design methods for improving the effectiveness of monitoring coral reefs

Thilan AWLP (a,b,g,*), Peterson EE (a,b,f), Menendez P (c,e), Caley MJ (a,b), Drovandi C (a,b), Mellin C (c,d), McGree JM (a,b)

(a) School of Mathematical Sciences, Science and Engineering Faculty, Queensland University of Technology, Brisbane, Queensland, Australia

(b) Australian Research Council Centre of Excellence for Mathematical and Statistical Frontiers (ACEMS), Australia

(c) Australian Institute of Marine Sciences, Townsville, Queensland, Australia

(d) The Environment Institute and School of Biological Sciences, University of Adelaide, Adelaide, South Australia 5005, Australia

(e) School of Mathematics and Physics, Brisbane, Australia

(f) Institute for Future Environments, Queensland University of Technology, Brisbane, Australia

(g) Department of Mathematics, University of Ruhuna, Sri Lanka

*Corresponding Author: Email: [email protected]; Tel:+61(0)410372540;

Fax: +61 7 3138 2310;

Postal address: School of Mathematical Sciences, Science and Engineering Faculty,

Queensland University of Technology, 2 George Street, Brisbane, QLD 4000.

ABSTRACT

Survey design underpins our ability to successfully monitor and manage the environment. There are

two basic design types: static designs, which remain fixed over time, and adaptive designs, which can

change over time. An advantage of adaptive designs is that changes can be made as more is learned

about the system, ensuring that informative data are collected in an on-going manner. Here, we propose

a model-based adaptive design approach that incorporates spatial and disturbance information when

monitoring large-scale environmental systems. We apply this new approach to derive sampling designs

for monitoring coral reef systems within Australia’s Great Barrier Reef, and show that these adaptive

designs can provide twice the amount of information as designs found using previously proposed

methods from the literature. As such, we suggest our new methods can be used to enhance the

effectiveness and efficiency of environmental monitoring initiatives.

Key words: Adaptive design; Coral bleaching; Coral cover; Cyclone impacts; Great Barrier Reef.

1. Introduction

The health and the long-term resilience of coral reefs around the world are at risk due to rising

environmental and human impacts (Hoegh-Guldberg et al., 2007, Hughes et al., 2003, Jackson et

al., 2001). The Great Barrier Reef (GBR) is currently one of the best managed and monitored

natural wonders of the world with a view to safeguarding its health from anthropogenic

disturbances (Pandolfi et al., 2005). However, environmental pressures such as climate change

resulting in coral bleaching, crown of thorns starfish (CoTS) outbreaks, and cyclones can

compromise the health of the GBR (Hoegh-Guldberg et al., 2007, Sweatman et al., 2011,

Vercelloni et al., 2017). By effectively monitoring such ecological systems, it should be possible

to identify their vulnerabilities and the potential causes to inform the development of management

practices and/or policies to reduce the impact of disturbances and foster more resilient ecosystems.

The Australian Institute of Marine Science (AIMS) has been monitoring coral reefs in the GBR

since 1983 through the Long-term Monitoring Program (LTMP). The LTMP collects data that are

used to infer reef health and condition (Sweatman et al., 2008). Samples are collected from benthic

communities on selected reefs which are representative of the benthic communities in each of the

GBR regions (Jonker et al., 2008). The LTMP is based on a static design in that data are gathered

from predetermined reefs, and sites within reefs that do not change over time (De'ath and

Fabricius, 2010, Sweatman et al., 2011). As such, the LTMP does not incorporate knowledge

gained from previous years of data collection, nor does it allow for disturbance data to be included

when selecting reefs for future surveys (Miller et al., 2003). Thus, there is the potential to enhance

these current monitoring practices using an adaptive sampling regime, which provides a way to

incorporate new information when selecting reefs and/or sites to collect data.

The past twenty years have seen the rapid development of adaptive design methods particularly

in the field of clinical trials (McGree et al., 2012, Weir et al., 2007), and to a lesser extent in

astrophysics (Ford, 2008, Loredo, 2004, Loredo et al., 2012) and environmental monitoring (Falk

et al., 2014). To our knowledge, Kang et al. (2016) was the first study to introduce adaptive design

for improving the effectiveness of monitoring in the GBR. Adaptive design methods were

proposed that allow information accumulated over time to inform where and when samples should

be collected on the GBR in an ongoing manner (Morgan et al., 2014). Kang et al. (2016) treated

the problem of finding adaptive designs over time as an optimisation problem, and found designs

to lower experimental costs and increase the information gained from the collected data. The

authors described monitoring objectives through a utility function that characterized the expected

worth of the monitoring data obtained given a particular design (Chaloner and Verdinelli, 1995).

For illustration, sampling on the Cook-Lizard region of the GBR was considered, and they

demonstrated value in being able to adapt sampling over time. However, one limitation of this

work was that the adaptive designs were derived based on an overly simplified model for coral

cover that did not capture spatial variability or temporal disturbance information. As such, the

adaptive designs found by the authors may be sub-optimal as it is possible that important

components that affect the health of coral reef systems were not considered.

In this paper, we propose a design framework to incorporate spatial variability when modelling

coral cover and the effect of time-varying disturbances such as CoTS outbreaks when deriving

adaptive designs for monitoring the GBR. To evaluate our proposed framework, we consider

adaptive design methods for visiting fewer LTMP sites, and assess the impact this has on the

information obtained. Further, we compare our adaptive designs with those derived from recently

proposed methods in the literature (Kang et al., 2016). To conclude, we discuss how adaptive

sampling methods can improve the effectiveness of reef monitoring programs and provide

guidance for where samples should be collected to efficiently gather information about reef health.

2. Material and methods

A Bayesian design framework is proposed in this paper, comprising three key

components (see Figure 1 for a diagram of these components and how they

link together). The first component involves quantifying prior information about the ecological

process being monitored (Figure 1a). For this purpose, we fit a statistical model to the LTMP hard

coral cover data (i.e. the proportion of the sea floor occupied by hard coral, without accounting

for three dimensional overlap) that accounts for spatial dependency and important environmental

and disturbance covariates (i.e. predictors). In the second component, this prior information is

exploited to assess the usefulness of a proposed design (Figure 1b). This involves mathematically

defining the monitoring objective via a utility function (Chaloner and Verdinelli, 1995), and

targeting data collection to inform this objective. Finally, we evaluate our proposed methods by

comparing our designs with the LTMP design and those found using recently proposed methods

(Kang et al., 2016) across a variety of future scenarios (Figure 1c). In the next section, we describe

each of these components in more detail.

(a) Quantifying prior information: LTMP data/covariates; fit a statistical model; obtain posterior distribution of model parameters; form prior for design.

(b) Assessing the usefulness of a design: propose design; estimate or approximate the expected utility.

(c) Optimisation and evaluation of the design: optimise design; reef monitoring scenarios (1. Comparison with Kang et al. (2016); 2. Impact of reduced sampling; 3. Impacts of different disturbance conditions).

Figure 1: Diagram of the proposed Bayesian adaptive design framework. This consists of three key components: (a) Quantifying prior information, (b) Assessing the usefulness of a design, and (c) Optimisation and evaluation of the design.

2.1 Quantifying prior information

For undertaking adaptive design, we consider a Bayesian inference framework due to the

mathematically rigorous handling of uncertainty and the availability of important utility

functions. Further, a Bayesian framework provides an opportunity to incorporate knowledge

gained from historical data into the formation of the design through a prior distribution. In

Bayesian methods, a prior represents the uncertainty about a quantity/parameter of interest.

Such prior information can be created using a number of methods including model fitting,

expert opinion, and knowledge gained through a literature review. In this study, we fit a model

to the existing LTMP data to obtain prior information for the Bayesian adaptive design. The

data and model used for this purpose are described next.

2.1.1 LTMP data and design

The LTMP provides a semi-continuous record of change in reef communities over the last three

decades across six regions of the GBR (Townsville, Cairns, Capricorn Bunkers, Whitsunday,

Swain, and Cooktown/Lizard island) (Sweatman et al., 2008). Here, we focus on the

Whitsunday region due to the relatively large amount of data being available (Figure 2) and

the large and diverse range of disturbances that have occurred in this region over time (Osborne

et al., 2011, Vercelloni et al., 2017).

As part of the LTMP sampling design, 5 coral cover observations are collected from each site,

and three sites are sampled on each reef (Jonker et al., 2008). A total of three reefs are sampled

in each of the inner, middle, and outer reef habitats (5 observations × 3 sites × 3 reefs × 3

habitats). In some years, however, surveys could only be partly completed due to bad weather,

resulting in fewer observations. Consequently, the data set used in this study contained a total

of 1077 observations collected over the sampling years of 2002, 2004, 2005, 2007, 2009, 2011,

2013, and 2015.

At each site, the LTMP samples 5 permanent 50x1m2 transects at a depth of 6m and 9m each

separated by at least 10m and parallel to the reef crest. Fifty images are taken from each transect

using video frames (from 2006 onwards) or digital photographs at 1m intervals (prior to 2006).

A site-level coral cover estimate is obtained by projecting five points onto each of 40 randomly

selected images (Jonker et al., 2008), which are subsequently classified manually by a marine

scientist (Beijbom et al., 2015).

2.1.2 Covariate data

We considered a number of potential covariates in our statistical model, which represent

physico-chemical conditions, topographic position, and natural and anthropogenic disturbances

known to have a direct or indirect influence on coral cover, see Table 1. Plots of the spatial and

temporal distribution of coral cover and these covariates are provided in Appendix A.

Figure 2: Survey sites for the Long-Term Monitoring Program in the Whitsunday region, one of six regions of the Great Barrier Reef. The Whitsunday region is divided into three shelf-positions: inner- (Hayman, Langford-bird, and Broder Island), middle- (19131S, 19138S, and 20104S), and outer-shelf (Slate, Hyde, and Rebe) reefs. Survey sites are represented by red dots. A small amount of jitter was added to the locations for visualisation purposes.

| Covariate | Description | Source | Spatial resolution | Temporal resolution |
|---|---|---|---|---|
| Time | Sampling years | NA | NA | 2002-2015 |
| Cyclone exposure | The number of hours each grid cell was exposed to potentially damaging seas: 0 = No cyclone effects, 1 = Some cyclone effects | Puotinen et al. (2016) | 0.01° | 2002-2015 |
| Bleaching exposure | 0 = No coral bleaching, 1 = ≥1% coral bleached | Matthews et al. (2019) | 0.01° | 2002 |
| CoTS | Mean A. solaris densities | Matthews et al. (2019) | 0.01° | 2002-2015 |
| Shelf position | Position of reefs on the continental shelf: 1 = inshore/inner shelf; 2 = middle shelf; 3 = outer shelf | GBRMPA (2014) | 0.005° | Great Barrier Reef Zoning Plan 2003 |
| Bathymetry | Depth below sea level (meters) | Beaman (2017) | 0.0003° | 2017 |
| Opened reef | Protected areas where no fishing is allowed. 1 = no-take, 0 = otherwise | GBRMPA (2014) | 0.005° | Great Barrier Reef Zoning Plan 2003 |
| Sea surface temperature anomaly (SSTA) | Difference between measured Sea Surface Temperature (SST) and the monthly long-term mean SST (°C) | BOM (2014) | 0.01° | The monthly long-term mean SST for 2002-2015 |
| Light attenuation | Attenuation coefficient (between 0 and 1): the rate of decrease of light penetrating the water column with depth | CERF (2009) | 0.01° | 1997-2009 |
| Chlorophyll | Long term mean concentration (µg/m³) of chlorophyll A pigments in the water column | CERF (2009) | 0.01° | 1997-2009 |
| CRS_T_AV | Temperature (mean ºC) at the sea surface | Dunn (2009) | 0.01° | 1960-2006 |
| Primary | Primary flood plume frequency (weeks occurred/total weeks) during wet season (max = 26) | Delvin et al. (2012) | 0.01° | 2007-2013 |
| Secondary | Secondary flood plume: representing chlorophyll dominated plume | | 0.01° | 2007-2013 |
| Tertiary | Representing further extent of plume, as delineated by salinity less than 34ppt | | 0.01° | 2007-2013 |

Table 1. Summary of the potential covariates considered in the coral cover model. The spatial resolution is recorded in decimal degrees.

2.1.3 Statistical model for coral cover

We fit a spatial Beta regression model to the LTMP coral cover data as such a model can be

considered for bounded data (i.e. proportions) and can accommodate a variety of distributional

forms including symmetric and skewed distributions (Figure 1a, Fit a statistical model). The

model was parameterized in terms of a mean and a precision parameter, with a probability

density function defined as follows:

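$$f(y \mid \mu, \phi) = \frac{\Gamma(\phi)}{\Gamma(\mu\phi)\,\Gamma((1-\mu)\phi)}\, y^{\mu\phi-1}(1-y)^{(1-\mu)\phi-1}, \quad 0 < y < 1, \tag{1}$$

where $y$ represents coral cover, $\Gamma(\cdot)$ denotes the gamma function, $\mu$ ($0 < \mu < 1$) is the mean, and $\phi$ ($\phi > 0$) is the precision parameter. Accordingly, we assume that $y_{ijt} \sim \text{Beta}(\mu_{it}, \phi)$, where $y_{ijt}$ denotes the $j$th datum, from the $i$th site, in the $t$th sampling year, where $i = 1, \ldots, n$, $j = 1, \ldots, 5$ and $t = 1, \ldots, 8$. To account for potential relationships between coral cover and covariates (Table 1) and for spatial dependence (i.e. autocorrelation through space), the following regression structure was assumed for mean coral cover $\mu_{it}$:

$$g(\mu_{it}) = \beta_0 + \boldsymbol{d}_i\boldsymbol{\beta} + \boldsymbol{x}_{it}\boldsymbol{\gamma} + \alpha\,\text{Time}_t + w_i, \tag{2}$$

where $g(\cdot)$ is a logit link function (Lagos-Alvarez et al., 2017), $\beta_0$ is the intercept, $\boldsymbol{d}$ is the matrix of static site-specific covariates (e.g. Inner-, Middle-, and Outer-shelf, Chlorophyll, and CRS_T_AV) with $i$th row $\boldsymbol{d}_i$, $\boldsymbol{\beta}$ is the vector of regression coefficients for the site-specific covariates, $\boldsymbol{x}$ is the matrix of time-varying covariates (e.g. CoTS, Bleaching, and Cyclone) with rows $\boldsymbol{x}_{it}$, $\boldsymbol{\gamma}$ is the vector of regression coefficients for time-varying covariates, and $\alpha$ is the regression coefficient for Time. The precision parameter $\phi$ was assumed unknown and common across the Whitsunday region (Ferrari and Cribari-Neto, 2004). In order to capture the spatial variability in coral cover, we included a spatially correlated random effect, $\boldsymbol{w} = (w_1, \ldots, w_n)$, in the model. We assumed that $\boldsymbol{w}$ follows a multivariate Normal distribution, $\boldsymbol{w} \mid \sigma^2, \rho \sim \text{MVN}(\boldsymbol{0}, \boldsymbol{\Sigma})$, where $\boldsymbol{\Sigma}$ is based on a Gaussian covariance function (Ecker and Gelfand, 1997):

$$\Sigma_{ii'} = \sigma^2 \exp\left\{-\left(h_{ii'}/\rho\right)^2\right\}, \quad i, i' = 1, \ldots, n, \tag{3}$$

where $h_{ii'}$ is the distance between sites $i$ and $i'$, $\sigma^2$ ($\sigma^2 > 0$) is the variance of the spatial process (i.e. the partial sill) and $\rho > 0$ is the range parameter.

Within a Bayesian framework, we are interested in the posterior distribution of the parameters $\boldsymbol{\theta}$, defined as $p(\boldsymbol{\theta} \mid \boldsymbol{y}, \boldsymbol{d}, \boldsymbol{x}) \propto p(\boldsymbol{y} \mid \boldsymbol{\theta}, \boldsymbol{d}, \boldsymbol{x})\, p(\boldsymbol{x} \mid \boldsymbol{d}, \boldsymbol{\psi})\, p(\boldsymbol{\theta})$, where $\boldsymbol{d}$ is the sampling design (i.e. static site-specific covariate values), $\boldsymbol{x}$ represents time-varying covariates, $p(\boldsymbol{x} \mid \boldsymbol{d}, \boldsymbol{\psi})$ is the distribution of the time-varying covariates depending on parameter $\boldsymbol{\psi}$ (see Section 2.2.2), $p(\boldsymbol{y} \mid \boldsymbol{\theta}, \boldsymbol{d}, \boldsymbol{x})$ is the likelihood function and $p(\boldsymbol{\theta})$ is the prior distribution of $\boldsymbol{\theta}$.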
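As a minimal sketch of the mean/precision parameterisation described above, the following Python code evaluates the Beta log-density with shape parameters $\mu\phi$ and $(1-\mu)\phi$ and maps a linear predictor to a mean through the inverse logit link. The function names `beta_logpdf` and `inv_logit` are illustrative choices, not names used by the authors, and the spatial random effect is not included here.

```python
import math


def beta_logpdf(y, mu, phi):
    """Log-density of a Beta distribution with mean mu and precision phi.

    Uses the standard shape parameters a = mu * phi and b = (1 - mu) * phi.
    """
    a = mu * phi
    b = (1.0 - mu) * phi
    return (math.lgamma(phi) - math.lgamma(a) - math.lgamma(b)
            + (a - 1.0) * math.log(y) + (b - 1.0) * math.log(1.0 - y))


def inv_logit(eta):
    """Inverse of the logit link: maps a linear predictor to a mean in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-eta))


# Example: a linear predictor of 0 corresponds to a mean coral cover of 0.5.
mu_example = inv_logit(0.0)
log_density = beta_logpdf(0.3, mu_example, 2.0)
```

With `mu = 0.5` and `phi = 2` both shape parameters equal one, so the density is uniform on (0, 1) and the log-density is zero, which gives a quick sanity check of the parameterisation.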

2.1.4 Obtaining the posterior distribution of model parameters

To undertake a Bayesian analysis, the prior distribution needs to be defined. We chose a weakly

informative, multivariate Normal prior for $\boldsymbol{\theta}$, which includes the regression coefficients,

log of variance (i.e. log of the reciprocal of the precision), and the log of the covariance

parameters (i.e. partial sill and range). Approximating the posterior distribution for a model

like the one defined above can be computationally expensive, particularly when covariate

selection needs to be undertaken. Therefore, we approximated the posterior distribution using

Laplace-based methods, via a Monte Carlo approximation to the (full data) likelihood (Faraway

et al., 2018, Long et al., 2013, McGree et al., 2016, Overstall et al., 2018). Please see Appendix

B for additional details.

To determine which covariates should appear in the model, forward stepwise model-selection

was undertaken. Specifically, we started with the null model (intercept only) and then included

covariates (Table 1) one at-a-time to determine which covariates (if any) improved the model

fit as determined by the posterior model probability (MacKay, 2003). This process was

repeated until no further improvement in the model fit was observed. The final model identified

using this procedure was then checked in terms of goodness-of-fit via posterior predictive

checks, which proved to be satisfactory. The posterior distribution of the parameters from the

final model could then be used as the prior information (Figure 1a) to find adaptive designs for

monitoring (Figure 1b), and this is discussed in the next section.
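The forward stepwise procedure described above can be sketched as a greedy loop. This is an illustration only: `score` stands in for whatever model-fit criterion is used (the paper uses the posterior model probability), and the covariate gains and the penalty in the toy scorer below are made-up numbers, not results from the LTMP data.

```python
def forward_stepwise(candidates, score):
    """Greedy forward selection: add one covariate at a time while the score improves."""
    selected = []
    best = score(selected)                      # null (intercept-only) model
    remaining = list(candidates)
    while remaining:
        # Score every model obtained by adding one remaining covariate.
        trials = {c: score(selected + [c]) for c in remaining}
        winner = max(trials, key=trials.get)
        if trials[winner] <= best:              # no further improvement: stop
            break
        selected.append(winner)
        best = trials[winner]
        remaining.remove(winner)
    return selected, best


# Hypothetical scorer: each covariate adds a fixed gain, minus a complexity penalty.
TOY_GAINS = {"Cyclone": 5.0, "Bleaching": 2.0, "SSTA": 0.1}


def toy_score(covariates):
    return sum(TOY_GAINS[c] for c in covariates) - 0.5 * len(covariates)


selected, best = forward_stepwise(list(TOY_GAINS), toy_score)
```

In this toy setting the loop adds the two covariates with the largest gains and stops when the third no longer improves the score.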

2.2 Assessing the usefulness of a design

This section describes the approach used to assess the usefulness of a given design in

addressing specific monitoring objectives. This relates to the second component of our

Bayesian adaptive design framework as shown in Figure 1b.

2.2.1 Propose a design

Within our design framework, a sampling design defines locations for data collection. As

shown in Equation (2), this constitutes defining the (static) site-specific covariates used in

modelling coral cover. Let $\boldsymbol{d}_i$ denote the site-specific covariates for the $i$th site, then the LTMP

design can be defined as $\boldsymbol{d} = (\boldsymbol{d}_1, \boldsymbol{d}_2, \ldots, \boldsymbol{d}_i, \ldots, \boldsymbol{d}_n)$. The other covariates that appear in Equation

(2) can vary through time. Accordingly, we will not optimise these covariates but rather optimise

the design over the distribution of these covariates. This is discussed below.

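In the context of Bayesian experimental design, a utility function $u(\boldsymbol{d}, \boldsymbol{\theta}, \boldsymbol{y})$ is used to quantify the worth of observing data $\boldsymbol{y}$ from design $\boldsymbol{d}$ in terms of achieving a specific monitoring objective (e.g. estimate trends or the impact of disturbances). As the notation indicates, the utility function $u(\boldsymbol{d}, \boldsymbol{\theta}, \boldsymbol{y})$ depends on $\boldsymbol{\theta}$ and $\boldsymbol{y}$; however, these are unknown a priori. Thus, this uncertainty needs to be integrated out to form an expected utility function $U(\boldsymbol{d})$ before it can be used in Bayesian design. Such an expected utility can be defined as follows:

$$U(\boldsymbol{d}) = \int_{\boldsymbol{Y}} \int_{\boldsymbol{\Theta}} u(\boldsymbol{d}, \boldsymbol{\theta}, \boldsymbol{y})\, p(\boldsymbol{y} \mid \boldsymbol{\theta}, \boldsymbol{d})\, p(\boldsymbol{\theta})\, d\boldsymbol{\theta}\, d\boldsymbol{y}, \tag{4}$$

where the optimal design is defined as the design that maximises the above expected utility function.

As mentioned above, in natural ecosystems such as coral reefs, there are additional uncertainties associated with time-varying covariates (e.g. where and when disturbances will occur). To account for this, the expectation in (4) is also taken with respect to the distribution of time-varying covariates as follows:

$$U(\boldsymbol{d}) = \int_{\boldsymbol{X}} \int_{\boldsymbol{Y}} \int_{\boldsymbol{\Theta}} u(\boldsymbol{d}, \boldsymbol{\theta}, \boldsymbol{y}, \boldsymbol{x})\, p(\boldsymbol{y} \mid \boldsymbol{\theta}, \boldsymbol{d}, \boldsymbol{x})\, p(\boldsymbol{x} \mid \boldsymbol{d}, \boldsymbol{\psi})\, p(\boldsymbol{\theta})\, d\boldsymbol{\theta}\, d\boldsymbol{y}\, d\boldsymbol{x}. \tag{5}$$

To capture the uncertainty about the time-varying covariates $\boldsymbol{x}$, an assumption must be made about the distribution of the as yet unobserved time-varying covariates; in this case, that they follow a distribution $p(\boldsymbol{x} \mid \boldsymbol{d}, \boldsymbol{\psi})$, see Section 2.2.2. Thus, the above expected utility is not evaluated based on specific values of these time-varying covariates, but rather evaluated across their distribution.

In order to precisely estimate trends and the impact of disturbances, we adopted a parameter estimation utility function called the Kullback-Leibler divergence (KLD) between the prior and posterior distribution (Kullback and Leibler, 1951), which is defined as follows:

$$u(\boldsymbol{d}, \boldsymbol{y}, \boldsymbol{x}) = \int_{\boldsymbol{\Theta}} p(\boldsymbol{\theta} \mid \boldsymbol{y}, \boldsymbol{d}, \boldsymbol{x}) \left[\log p(\boldsymbol{y} \mid \boldsymbol{\theta}, \boldsymbol{d}, \boldsymbol{x}) - \log p(\boldsymbol{y} \mid \boldsymbol{d}, \boldsymbol{x})\right] d\boldsymbol{\theta}, \tag{6}$$

where $p(\boldsymbol{y} \mid \boldsymbol{d}, \boldsymbol{x}) = \int_{\boldsymbol{\Theta}} p(\boldsymbol{y} \mid \boldsymbol{\theta}, \boldsymbol{d}, \boldsymbol{x})\, p(\boldsymbol{\theta})\, d\boldsymbol{\theta}$ is the marginal likelihood. This utility does not depend on $\boldsymbol{\theta}$ because it is integrated out, and so it will be denoted as $u(\boldsymbol{d}, \boldsymbol{y}, \boldsymbol{x})$. Thus, we seek a design that maximizes Equation (5) where the utility is given in Equation (6).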
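The link between the Kullback-Leibler divergence and the log-likelihood form of the utility follows from Bayes' theorem. A short derivation is given below, writing $\theta$ for the parameters, $y$ for the data, $d$ for the design and $x$ for the time-varying covariates (notation assumed here for illustration), and assuming the distribution of $x$ does not depend on $\theta$:

```latex
\begin{align*}
\mathrm{KL}\big(p(\theta \mid y, d, x)\,\|\,p(\theta)\big)
  &= \int_{\Theta} p(\theta \mid y, d, x)\,
     \log \frac{p(\theta \mid y, d, x)}{p(\theta)}\, d\theta \\
  &= \int_{\Theta} p(\theta \mid y, d, x)\,
     \log \frac{p(y \mid \theta, d, x)}{p(y \mid d, x)}\, d\theta
     \qquad \text{(Bayes' theorem)} \\
  &= \int_{\Theta} p(\theta \mid y, d, x)\,
     \big[\log p(y \mid \theta, d, x) - \log p(y \mid d, x)\big]\, d\theta .
\end{align*}
```

The second line uses $p(\theta \mid y, d, x) / p(\theta) = p(y \mid \theta, d, x) / p(y \mid d, x)$, so the utility can be evaluated from the likelihood and the marginal likelihood without an explicit prior density ratio.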

2.2.2 Estimate or approximate the expected utility function

In general, the expectation defined by Equation (5) does not have a closed form solution, and

therefore needs to be approximated. One common approach is to use Monte Carlo integration

as follows (Ryan, 2003):

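$$U(\boldsymbol{d}) \approx \frac{1}{B} \sum_{b=1}^{B} u(\boldsymbol{d}, \boldsymbol{y}_b, \boldsymbol{x}_b). \tag{7}$$

This approach to approximate the expected utility of a given design is outlined in Algorithm 1. In Equation (7), $B$ is the controlling parameter for the Monte Carlo approximation and is typically large (i.e. 500), and $\boldsymbol{\theta}_b \sim p(\boldsymbol{\theta})$, $\boldsymbol{x}_b \sim p(\boldsymbol{x} \mid \boldsymbol{d}, \boldsymbol{\psi})$, $\boldsymbol{y}_b \sim p(\boldsymbol{y} \mid \boldsymbol{\theta}_b, \boldsymbol{d}, \boldsymbol{x}_b)$ (Algorithm 1, lines 2-5). As our utility function (defined in Equation (6)) is a function of the posterior distribution, $B$ posterior distributions need to be approximated or sampled from in order to approximate the expected utility. Further, this evaluation needs to be undertaken for each proposed design, which imposes significant computational demands (Ryan et al., 2016). Thus, for computational efficiency, we again adopt the Laplace approximation within the Monte Carlo approximation to the expected utility (Algorithm 1, line 6) (Faraway et al., 2018, Long et al., 2013, McGree et al., 2016, Overstall et al., 2018).

Algorithm 1. Implementing the Bayesian adaptive sampling scheme.

Algorithm: Approximating expected utility functions

1. Initialise the prior distribution $p(\boldsymbol{\theta})$
2. For $b = 1$ to $B$ do
3. Simulate $\boldsymbol{\theta}_b \sim p(\boldsymbol{\theta})$
4. Simulate $\boldsymbol{x}_b \sim p(\boldsymbol{x} \mid \boldsymbol{d}, \boldsymbol{\psi})$
5. Simulate $\boldsymbol{y}_b \sim p(\boldsymbol{y} \mid \boldsymbol{\theta}_b, \boldsymbol{d}, \boldsymbol{x}_b)$
6. Estimate $p(\boldsymbol{\theta} \mid \boldsymbol{y}_b, \boldsymbol{d}, \boldsymbol{x}_b)$ via Laplace approximation
7. Evaluate KLD utility $u(\boldsymbol{d}, \boldsymbol{y}_b, \boldsymbol{x}_b)$
8. Store $u(\boldsymbol{d}, \boldsymbol{y}_b, \boldsymbol{x}_b)$
9. End For
10. Output $\frac{1}{B} \sum_{b=1}^{B} u(\boldsymbol{d}, \boldsymbol{y}_b, \boldsymbol{x}_b)$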
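The loop in Algorithm 1 can be sketched in Python as follows. This is an illustration under strong simplifying assumptions, not the authors' implementation: the coral cover model is replaced by a toy Gaussian model (prior N(0, 1), one observation with unit noise variance, and a Bernoulli(0.25) time-varying covariate that doubles the slope), chosen because its posterior is exactly Gaussian. The Laplace approximation in line 6 is therefore exact here and the KLD has a closed form. All function names are hypothetical.

```python
import math
import random


def expected_utility(design, sample_prior, sample_x, sample_y, utility,
                     n_draws=500, rng=None):
    """Monte Carlo estimate of U(d): average utility over simulated (theta, x, y)."""
    rng = rng or random.Random()
    total = 0.0
    for _ in range(n_draws):
        theta = sample_prior(rng)               # line 3: theta_b ~ p(theta)
        x = sample_x(design, rng)               # line 4: x_b ~ p(x | d, psi)
        y = sample_y(theta, design, x, rng)     # line 5: y_b ~ p(y | theta_b, d, x_b)
        total += utility(design, y, x)          # lines 6-8: posterior, then KLD utility
    return total / n_draws                      # line 10: Monte Carlo average


# Toy stand-in model (an assumption made only for this sketch).
def sample_prior(rng):
    return rng.gauss(0.0, 1.0)


def sample_x(design, rng):
    return 1 if rng.random() < 0.25 else 0


def sample_y(theta, design, x, rng):
    return rng.gauss(theta * design * (1 + x), 1.0)


def kld_utility(design, y, x):
    """KL divergence from the N(0, 1) prior to the exact Gaussian posterior."""
    slope = design * (1 + x)
    post_var = 1.0 / (1.0 + slope ** 2)
    post_mean = post_var * slope * y
    return 0.5 * (post_var + post_mean ** 2 - 1.0 - math.log(post_var))


estimate = expected_utility(2.0, sample_prior, sample_x, sample_y, kld_utility,
                            n_draws=5000, rng=random.Random(1))
```

For this toy model the expected utility is available analytically as the average of $\tfrac{1}{2}\log(1 + s^2)$ over the two possible slopes $s$, which provides a check on the Monte Carlo estimate. For the Beta regression model no such closed form exists, which is why a posterior approximation is needed inside the loop.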

To evaluate the above approximation to the expected utility function, time-varying covariates

($\boldsymbol{x}$) need to be simulated (line 4). Thus, distributions $p(\boldsymbol{x} \mid \boldsymbol{d}, \boldsymbol{\psi})$ for these are needed. In order to

find such distributions, the existing LTMP data were analysed. For categorical covariates (i.e.

bleaching and cyclone impacts; Table 1), the proportion of observed occurrences of each

disturbance was estimated for each site, and the outcome (disturbance or not) was assumed to

follow a Bernoulli distribution. In contrast, CoTS density is a continuous covariate with many

zeros (i.e. no observation of CoTS). To develop a distribution for such data, we first determined

the proportion of sites where no observations of CoTS were recorded (Zeileis et al., 2008), and

the outcome (CoTS density zero or not) was assumed to follow a Bernoulli distribution. To

obtain the distribution of non-zero CoTS data (i.e. the mean CoTS densities), a Log-normal

distribution was estimated. Then, to simulate CoTS data, we first generated a random number

$r_i$, $i = 1, \ldots, n$, between 0 and 1, and if $r_i < p_i$ (i.e. proportion of CoTS=0 at the

site), we set CoTS = 0, otherwise we generated CoTS data from the fitted Log-normal

distribution. These distributions were then used to simulate time-varying covariates to

approximate the expected utility as shown in Algorithm 1.
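The simulation of time-varying covariates described above can be sketched as follows, using the values listed for site 1 in Table 4. One assumption is made explicit: the "CoTS proportion" column is read here as the proportion of zero CoTS observations, following the description in the text. The dictionary keys and function name are illustrative.

```python
import random

# Values for site 1 as listed in Table 4. Reading "CoTS proportion" as the
# proportion of zero observations is an assumption of this sketch.
SITE_1 = {
    "bleaching": 0.12,
    "cyclone": 0.25,
    "cots_zero": 0.62,
    "log_cots_mean": -4.44,
    "log_cots_sd": 2.80,
}


def simulate_covariates(site, rng):
    """Draw one set of time-varying covariates (bleaching, cyclone, CoTS) for a site."""
    bleaching = 1 if rng.random() < site["bleaching"] else 0   # Bernoulli draw
    cyclone = 1 if rng.random() < site["cyclone"] else 0       # Bernoulli draw
    if rng.random() < site["cots_zero"]:
        cots = 0.0                                             # no CoTS observed
    else:
        # Non-zero densities follow the fitted Log-normal distribution.
        cots = rng.lognormvariate(site["log_cots_mean"], site["log_cots_sd"])
    return bleaching, cyclone, cots


rng = random.Random(42)
draws = [simulate_covariates(SITE_1, rng) for _ in range(20000)]
```

Each call produces one realisation of the covariates for a site, which is the role of line 4 in Algorithm 1.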

2.3 Optimisation and evaluation of the design

This section describes the third component of our Bayesian adaptive design framework. Given

we are now able to approximate the expected utility of a given design, the next step is to

optimise this expected utility through the choice of the design. The procedure used for this

optimisation is described next, along with a number of approaches to evaluate the subsequently

found designs.

2.3.1 Optimise the design

In the examples that follow, we will optimise designs within reef monitoring scenarios where

there are a number of sites to choose from. Thus, there will be a large but fixed number of

potentially optimal designs. Enumerating all possible designs would be computationally

infeasible, so we employ an optimisation algorithm. For searching within a fixed number of

sites (i.e. a discrete design space), the coordinate-exchange algorithm (Meyer and Nachtsheim,

1995) can be used. This algorithm begins with a random design (i.e. a random selection of

sites), which is then optimised, one site at-a-time. In practice, this means holding all but one

site fixed, and then iteratively substituting each alternative site for the one unfixed site and

calculating the expected utility of the design. The included site that maximizes the expected

utility is then selected for inclusion into the design. This process is then repeated for all sites

in the design. As optimal choices for each dimension may change depending on what other

sites have been selected, the algorithm iteratively cycles through the whole design a fixed

number of times (i.e. maximum number of iterations) or until no further improvement is

observed in the expected utility.
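A minimal sketch of the coordinate-exchange search over a discrete set of candidate sites is given below. It assumes a deterministic `expected_utility` callable for simplicity, whereas the expected utility in this paper is a stochastic Monte Carlo estimate. The function name, the rule that a site cannot appear twice in a design, and the toy utility are illustrative choices.

```python
import random


def coordinate_exchange(candidates, design_size, expected_utility,
                        max_iter=10, rng=None):
    """Optimise a design one site at a time over a discrete candidate set."""
    rng = rng or random.Random()
    design = rng.sample(candidates, design_size)      # random starting design
    best = expected_utility(design)
    for _ in range(max_iter):
        improved = False
        for pos in range(design_size):                # hold all but one site fixed
            others = set(design[:pos] + design[pos + 1:])
            for site in candidates:                   # try each alternative site
                if site in others or site == design[pos]:
                    continue
                trial = design[:pos] + [site] + design[pos + 1:]
                value = expected_utility(trial)
                if value > best:                      # keep the best substitution
                    design, best, improved = trial, value, True
        if not improved:                              # converged: a full pass with no gain
            break
    return design, best


# Toy utility: a design is worth the sum of its site numbers (illustration only).
sites = list(range(1, 28))
design, value = coordinate_exchange(sites, 3, lambda d: float(sum(d)),
                                    rng=random.Random(0))
```

With the additive toy utility the search recovers the three highest-numbered sites. With a noisy utility estimate, a larger Monte Carlo sample or repeated evaluations would be needed before accepting a substitution.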

2.3.2 Reef monitoring scenarios

In order to evaluate the optimal designs, we firstly consider future disturbance patterns that are

consistent with historical patterns, and find optimal designs using our approach and the

approach from Kang et al. (2016). Secondly, we explore the performance of our designs in

comparison to the LTMP design. This comparison is undertaken with respect to reduced

sampling scenarios and a variety of different future disturbance patterns.

Comparison with Kang et al. (2016) designs

We compared our designs to those found by using the methods proposed by Kang et al. (2016).

To find adaptive designs using the approach of Kang et al. (2016), we used their proposed

linear model (with no spatial effects) within our Bayesian adaptive design framework. The

resulting designs were then evaluated with respect to our Beta regression model with spatial

random effects (Eq. 10). Given evaluation of the expected utility is stochastic, for each design,

it was evaluated 20 times using independent draws from the prior predictive distribution and

for the time-varying covariates. Then, to quantify the information loss (or gain) when using the

approach of Kang et al. (2016), the design efficiency was evaluated as follows:

$$E_k = \frac{U_k(\boldsymbol{d})}{U_k(\boldsymbol{d}^*)}, \quad k = 1, 2, \cdots, 20, \tag{8}$$

where $U_k(\boldsymbol{d})$ and $U_k(\boldsymbol{d}^*)$ are the $k$th evaluations of the expected utilities of the optimal design under the linear modelling approach from Kang et al. (2016) and the optimal design under our spatial model, respectively. Then, the average efficiency ($\bar{E}$) was evaluated as the mean of $E_k$ ($k = 1, 2, \cdots, 20$). Such an efficiency can be interpreted as the proportion of sampling required under design $\boldsymbol{d}$ to achieve an equivalent amount of information under design $\boldsymbol{d}^*$. An average efficiency less than one will suggest that our designs are expected to provide more information than those based on methods from Kang et al. (2016) and vice versa for an average efficiency of greater than one.
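The average efficiency can be computed from paired evaluations of the two expected utilities. The sketch below assumes the efficiency is the ratio of the two expected utility estimates for each evaluation; the function name and the example values are made up for illustration.

```python
def average_efficiency(numerator_utils, denominator_utils):
    """Mean of the per-evaluation ratios of two sets of expected utility estimates."""
    if len(numerator_utils) != len(denominator_utils):
        raise ValueError("both designs need the same number of evaluations")
    ratios = [u / v for u, v in zip(numerator_utils, denominator_utils)]
    return sum(ratios) / len(ratios)


# Hypothetical evaluations for a competing design and a reference design.
competing = [1.0, 2.0, 1.5]
reference = [2.0, 4.0, 3.0]
eff = average_efficiency(competing, reference)
```

In this example every ratio is one half, so the competing design is expected to provide half the information of the reference design.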

Impacts of reduced sampling

To further evaluate our proposed design framework, we optimised designs under reduced

sampling scenarios. This allowed us to determine which reefs/sites could potentially be

dropped from the LTMP, and to explore the consequences of doing so. For this purpose, two

approaches were undertaken: 1) dropping the least informative reef from the LTMP design and

2) dropping the least informative site from each reef within the LTMP design. First, to

determine the least informative reef, the approximate expected utility was evaluated for all

combinations of reefs where one reef was omitted. Then, the design that yielded the largest

utility was inspected to determine which reef was missing, and then proposed as the least

informative reef. Second, we similarly investigated the impact of dropping the least informative

site from each reef (see Table 2 for a description of sites). For this latter investigation, the

optimisation of the design was performed by using the coordinate exchange algorithm as

described in Section 2.3.1. Such an optimisation approach is not needed for the first

investigation as there are relatively few designs to choose from, so an exhaustive search was

employed.
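The exhaustive search for the least informative reef can be sketched as below, using the reef-to-site mapping listed in Table 2. The `expected_utility` argument is a placeholder for the Monte Carlo approximation of the expected utility, and the toy utility used here (the sum of site numbers) is purely illustrative.

```python
# Reef-to-site mapping as listed in Table 2.
REEF_SITES = {
    "19131S": [19, 20, 21],
    "19138S": [13, 14, 15],
    "20104S": [10, 11, 12],
    "Broder Island reef": [1, 2, 3],
    "Hayman Island reef": [7, 8, 9],
    "Hyde reef": [22, 23, 24],
    "Langford-bird reef": [4, 5, 6],
    "Rebe reef": [16, 17, 18],
    "Slate reef": [25, 26, 27],
}


def least_informative_reef(reef_sites, expected_utility):
    """Evaluate every design that omits one reef; the best one reveals the reef to drop."""
    utilities = {}
    for omitted in reef_sites:
        design = sorted(site
                        for reef, sites in reef_sites.items() if reef != omitted
                        for site in sites)
        utilities[omitted] = expected_utility(design)
    # The omitted reef whose removal leaves the largest utility is least informative.
    return max(utilities, key=utilities.get), utilities


def toy_utility(design):
    """Placeholder utility for illustration: sum of the site numbers in the design."""
    return float(sum(design))


reef, utilities = least_informative_reef(REEF_SITES, toy_utility)
```

Only nine designs are evaluated, which is why no optimisation algorithm is needed for this investigation.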

| Reef names | Reef numbers | Site numbers |
|---|---|---|
| 19131S | 1 | 19, 20, 21 |
| 19138S | 2 | 13, 14, 15 |
| 20104S | 3 | 10, 11, 12 |
| Broder Island reef | 4 | 1, 2, 3 |
| Hayman Island reef | 5 | 7, 8, 9 |
| Hyde reef | 6 | 22, 23, 24 |
| Langford-bird reef | 7 | 4, 5, 6 |
| Rebe reef | 8 | 16, 17, 18 |
| Slate reef | 9 | 25, 26, 27 |

Table 2: The Whitsunday region’s reefs and corresponding site numbers.

To compare our optimal designs with the LTMP, design efficiency was again used. However,

as we will be exploring optimal designs under reduced sampling (when compared to the

LTMP), the inverse of the above efficiency was evaluated as follows:

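$$E_k = \frac{U_k(\boldsymbol{d}^*)}{U_k(\boldsymbol{d})}, \quad k = 1, 2, \cdots, 20, \tag{9}$$

where $U_k(\boldsymbol{d}^*)$ and $U_k(\boldsymbol{d})$ are the $k$th evaluations of expected utilities of the optimal design

$\boldsymbol{d}^*$ and the LTMP design $\boldsymbol{d}$, respectively. The interpretation of the resulting average efficiency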

is as given above with an average efficiency close to one meaning little information is expected

to be lost by using our reduced sampling designs when compared to the LTMP design.

Impacts of different disturbance conditions

To evaluate the performance of our optimal designs under different disturbance patterns, we

considered two different disturbance scenarios. In the first scenario, we considered disturbance

conditions consistent with historical data in the Whitsunday region (Table 4). In the second

scenario, we created four schemes, where CoTS disturbance conditions varied as follows:

i. One site from each reef affected,

ii. All the sites in inshore reefs affected,

iii. All the sites in middle-shelf reefs affected,

iv. All the sites in outer-shelf reefs affected.

Under the scheme (i), we randomly selected one site from each of the nine reefs in the

Whitsunday region, and changed the probability of CoTS disturbance at this site to 1. Under

the schemes (ii), (iii), and (iv), we changed this CoTS disturbance proportion to 1 for each of

the inshore, middle-shelf, and outer-shelf sites, respectively. In order to find the optimal

designs under these scenarios, we followed the procedure described in Section 2.3.1. We

compared our optimal designs against the performance of the LTMP design. To do so, we again

evaluated the design efficiency as given in Equation (8).

3. Results

3.1 Quantifying prior information

The most appropriate coral cover model found based on the procedure outlined in Section 2.1.4

can be described as follows:

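$$\begin{aligned}
\text{logit}(\mu_{it}) = {} & \beta_0 + \beta_1\,\text{Middle-shelf}_i + \beta_2\,\text{Outer-shelf}_i + \beta_3\,\text{Opened Reef}_i \\
& + \beta_4\,\text{Bathymetry}_i + \beta_5\,\text{Chlorophyll}_i + \beta_6\,\text{CRS\_T\_AV}_i + \gamma_1\,\text{Cyclone}_{it} \\
& + \gamma_2\,\text{Bleaching}_{it} + \gamma_3 \log \text{CoTS}_{it} + \alpha\,\text{Time}_t + w_i, \quad i = 1, \ldots, n \text{ and } t = 1, \ldots, 8.
\end{aligned} \tag{10}$$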

The baseline categories for the shelf position and open/closed to fishing are inshore and open

for fishing respectively, and are incorporated into the intercept. A summary of the posterior

distributions of the parameters for the above model is given in Table 3. The posterior means

and standard deviations are shown with 95% credible intervals. All parameters were significant

except the coefficients for Time, Middle-shelf, and log CoTS. In general, these results are

consistent with what other similar studies have reported (Kang et al., 2016). However, some

variation is expected as we are only focusing on a particular region on the GBR, and we are

fitting a different model.

3.2 Assessing the usefulness of a design

Table 4 shows the estimated parameters for the distributions of time-varying covariates for

each site. These distributions were used to simulate time-varying covariates ($\boldsymbol{x}$) for impacts of

bleaching, cyclones, and CoTS (Algorithm 1, line 4). Once time-varying covariates were

simulated, the above coral cover model (Equation (10)) was used to simulate hard coral cover

data (Algorithm 1, line 5).
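The two simulation steps described above can be sketched in Python as follows. This is an illustration under stated assumptions rather than the authors' implementation: it uses the Table 4 values for site 1 and the Table 3 posterior means, leaves all site-specific covariates at their baseline, omits the spatial random effect, switches the log CoTS term off when no CoTS impact is drawn, and uses a hypothetical Beta precision `phi`. The Beta distribution is written in the mean-precision form of Ferrari and Cribari-Neto (2004).

```python
import numpy as np

# Table 4, site 1: disturbance proportions and log-normal parameters for non-zero CoTS.
P_BLEACH, P_CYCLONE, P_COTS = 0.12, 0.25, 0.62
LOG_COTS_MEAN, LOG_COTS_SD = -4.44, 2.80

# Table 3 posterior means for the terms used here. Site-specific covariates are
# left at baseline (ASSUMPTION: inshore, open to fishing, other covariates zero).
BETA = {"intercept": -1.27, "time": -0.04, "cyclone": -0.45,
        "bleaching": -0.22, "log_cots": -0.01}

def simulate_covariates(n, rng):
    """Draw n realisations of the time-varying disturbance covariates for one site."""
    bleaching = rng.binomial(1, P_BLEACH, size=n)
    cyclone = rng.binomial(1, P_CYCLONE, size=n)
    cots_present = rng.binomial(1, P_COTS, size=n).astype(bool)
    # Non-zero CoTS impacts are log-normal, so log CoTS is normal.
    # ASSUMPTION: the log CoTS term is set to 0 when no impact is drawn.
    log_cots = np.where(cots_present,
                        rng.normal(LOG_COTS_MEAN, LOG_COTS_SD, size=n), 0.0)
    return bleaching, cyclone, log_cots

def simulate_cover(bleaching, cyclone, log_cots, time, rng, phi=20.0):
    """Draw coral cover from a Beta model with mean mu and (hypothetical) precision phi."""
    eta = (BETA["intercept"] + BETA["time"] * time
           + BETA["cyclone"] * cyclone + BETA["bleaching"] * bleaching
           + BETA["log_cots"] * log_cots)
    mu = 1.0 / (1.0 + np.exp(-eta))  # inverse logit
    return rng.beta(mu * phi, (1.0 - mu) * phi)

rng = np.random.default_rng(2019)
bleaching, cyclone, log_cots = simulate_covariates(1000, rng)
cover = simulate_cover(bleaching, cyclone, log_cots, time=1.0, rng=rng)
```

In a full implementation the parameters would be drawn from the prior rather than fixed at posterior means, and the covariates would be simulated for every site in the candidate design.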

| Parameter | Mean | Standard deviation | Lower bound of 95% credible interval | Upper bound of 95% credible interval |
|---|---|---|---|---|
| Intercept | -1.27 | 0.08 | -1.43 | -1.12 |
| Time | -0.04 | 0.03 | -0.10 | 0.01 |
| Middle-shelf | 0.15 | 0.08 | -0.01 | 0.32 |
| Outer-shelf | 0.91 | 0.21 | 0.50 | 1.31 |
| log CoTS | -0.01 | 0.01 | -0.02 | 0.00 |
| Opened reef | 0.28 | 0.09 | 0.11 | 0.45 |
| Cyclone | -0.45 | 0.05 | -0.55 | -0.35 |
| Bleaching | -0.22 | 0.07 | -0.35 | -0.08 |
| Bathymetry | -0.11 | 0.02 | -0.15 | -0.06 |
| Chlorophyll | -0.80 | 0.10 | -0.99 | -0.61 |
| CRS_T_AV | -0.23 | 0.05 | -0.33 | -0.13 |
| log variance | -2.52 | 0.04 | -2.61 | -2.44 |
| log partial sill | -5.98 | 0.48 | -6.93 | -5.03 |
| log range | -1.12 | 0.06 | -1.24 | -1.00 |

Table 3: Summary of the posterior distributions of the model parameters.
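The statement about which coefficients are significant can be checked directly against the interval bounds in Table 3. The short Python sketch below flags every regression coefficient whose 95% credible interval covers zero; it assumes that a rounded bound of 0.00 (as for log CoTS) counts as covering zero, which matches the interpretation given in the text.

```python
# 95% credible interval bounds for the regression coefficients in Table 3.
intervals = {
    "Intercept": (-1.43, -1.12),
    "Time": (-0.10, 0.01),
    "Middle-shelf": (-0.01, 0.32),
    "Outer-shelf": (0.50, 1.31),
    "log CoTS": (-0.02, 0.00),
    "Opened reef": (0.11, 0.45),
    "Cyclone": (-0.55, -0.35),
    "Bleaching": (-0.35, -0.08),
    "Bathymetry": (-0.15, -0.06),
    "Chlorophyll": (-0.99, -0.61),
    "CRS_T_AV": (-0.33, -0.13),
}

def covers_zero(lower, upper):
    """True when zero lies inside the closed interval [lower, upper]."""
    return lower <= 0.0 <= upper

not_significant = [name for name, (lower, upper) in intervals.items()
                   if covers_zero(lower, upper)]
print(not_significant)  # ['Time', 'Middle-shelf', 'log CoTS']
```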

3.3 Optimisation and evaluation of the design

3.3.1 Reef monitoring scenarios

Comparison with Kang et al. (2016)

The mean efficiency of the optimal design found using the methods of Kang et al. (2016), relative to the optimal design found using the spatial model described in this paper, was approximately 47%. This means that roughly twice as much sampling is needed using the optimal design found with the methods of Kang et al. (2016), compared with the optimal design found using the methods proposed in this paper, in order to achieve an equivalent amount of

information about trends in coral cover and the impact of disturbances.
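The step from an efficiency of 47% to "twice as much sampling" is a reciprocal, as the small Python sketch below shows. It assumes that the efficiency of Equation (8) can be read as information gained per unit of sampling effort, so that the effort needed to match the reference design scales as one over the efficiency; the function name is illustrative.

```python
def relative_sampling_effort(efficiency):
    """Sampling effort needed, relative to the reference design, to match its information."""
    if efficiency <= 0.0:
        raise ValueError("efficiency must be positive")
    return 1.0 / efficiency

factor = relative_sampling_effort(0.47)
print(f"{factor:.2f}")  # 2.13
```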

It is worth noting that an efficiency of less than 100% is expected here, as both designs were evaluated under the Beta regression model (i.e. the model assumed when finding our design). Of note, however, is the substantial reduction in the performance of a design when it is found assuming that a different model is appropriate for coral cover. This suggests that the choice of model has significant implications for determining optimal designs, so we provide justification for why our model is preferred over the linear model of Kang et al. (2016). Our model is supported by the posterior model probabilities and the posterior predictive checks (see Appendix C). Further, our model allows observations collected closer together in space to be correlated rather than treated as independent (as in the model of Kang et al., 2016). Given the nature of coral cover, such correlation seems more reasonable than assuming independence, and this is supported by statistical measures such as the posterior model probabilities.

| Site number | Bleaching proportion | Cyclone proportion | CoTS proportion | log CoTS mean | log CoTS standard deviation |
|---|---|---|---|---|---|
| 1 | 0.12 | 0.25 | 0.62 | -4.44 | 2.80 |
| 2 | 0.12 | 0.25 | 0.62 | -4.60 | 2.99 |
| 3 | 0.12 | 0.25 | 0.62 | -4.83 | 3.27 |
| 4 | 0.12 | 0.12 | 0.62 | -3.84 | 2.88 |
| 5 | 0.12 | 0.12 | 0.62 | -3.82 | 2.86 |
| 6 | 0.12 | 0.12 | 0.62 | -3.80 | 2.85 |
| 7 | 0.12 | 0.12 | 0.62 | -4.31 | 2.85 |
| 8 | 0.12 | 0.12 | 0.62 | -4.29 | 2.81 |
| 9 | 0.12 | 0.12 | 0.62 | -4.30 | 2.83 |
| 10 | 0.12 | 0.37 | 0.75 | -5.45 | 3.90 |
| 11 | 0.13 | 0.38 | 0.77 | -5.28 | 3.73 |
| 12 | 0.12 | 0.37 | 0.75 | -5.28 | 3.73 |
| 13 | 0.12 | 0.37 | 0.75 | -9.70 | 1.41 |
| 14 | 0.12 | 0.37 | 0.75 | -9.62 | 1.40 |
| 15 | 0.12 | 0.37 | 0.75 | -9.39 | 1.40 |
| 16 | 0.12 | 0.50 | 0.75 | -6.90 | 2.43 |
| 17 | 0.13 | 0.49 | 0.77 | -6.65 | 2.27 |
| 18 | 0.12 | 0.50 | 0.75 | -6.43 | 2.12 |
| 19 | 0.12 | 0.25 | 0.75 | -8.45 | 1.47 |
| 20 | 0.12 | 0.25 | 0.75 | -8.14 | 1.47 |
| 21 | 0.12 | 0.25 | 0.75 | -7.96 | 1.47 |
| 22 | 0.12 | 0.37 | 0.75 | -7.17 | 2.18 |
| 23 | 0.12 | 0.37 | 0.75 | -6.86 | 2.05 |
| 24 | 0.12 | 0.37 | 0.75 | -6.62 | 1.96 |
| 25 | 0.12 | 0.37 | 0.75 | -8.26 | 2.28 |
| 26 | 0.12 | 0.37 | 0.75 | -8.26 | 2.28 |
| 27 | 0.13 | 0.36 | 0.77 | -8.26 | 2.28 |

Table 4: The estimated parameters for the distributions of time-varying disturbance covariates at each site in the Whitsunday region. The second and third columns display the proportions of observations in which bleaching and cyclone impacts, respectively, were recorded at each site. The fourth column displays the proportion of observations in which CoTS impact was recorded at each site. The last two columns display the means and standard deviations of the Log-normal distributions fitted to the non-zero CoTS data at each site.

Impact of reduced sampling

Here we evaluated the effect of reduced sampling when compared to the LTMP in the

Whitsunday region by dropping reefs and sites. The results from dropping reefs can be seen in

Figure 3. These results indicate that the design without Hayman Island reef (Figure 3a, Design

choice d5), and the design without both Hayman Island reef and Rebe reef (Figure 3b, Design

choice d2) still retain around 89% and 81% mean efficiencies, respectively. This suggests that

little information is expected to be lost (when compared to the LTMP) if data are not collected

on the Hayman Island and Rebe reefs. Interestingly, some designs remain more than 75%

efficient even after dropping three reefs (i.e. 33% of the sampling effort; Figure 3c).

Figure 3: Efficiencies of designs after dropping (a) one, (b) two, and (c) three reefs in the Whitsunday region of the Great Barrier Reef. Design choices represent the designs formed after dropping one, two, or three reefs. The black horizontal line is the 75% efficiency level.

Hayman and Langford-bird reefs are located in inshore habitat and are in close proximity (Figure 4), while the remaining inshore reef, Broder Island, is relatively isolated. As our model can

capture the spatial variability, this may be the reason that Hayman reef was identified as the

least informative reef in the Whitsunday region. That is, information about the coral cover of

this reef can be obtained from neighbouring reefs. A similar pattern can be seen in the outer-

shelf habitat where Hyde and Rebe reefs are close to each other. Thus, Rebe was identified as

the second least informative reef. Out of interest, we also compared these designs with those

based on the linear model proposed by Kang et al. (2016). It was found that our designs appear

to exploit the spatial dependence in coral cover while those based on the linear model do not,

see Appendix C for further details.
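To illustrate why nearby reefs carry overlapping information, the Python sketch below evaluates a spatial correlation that decays with distance. The exponential covariance form is an assumption made for illustration (the covariance function is not restated in this section), the partial sill and range are plug-in values obtained by exponentiating the posterior means of their logarithms in Table 3, and the distance units are left unspecified because they depend on how site coordinates were scaled.

```python
import math

# Plug-in values from Table 3 (exponentiated posterior means of the logs).
PARTIAL_SILL = math.exp(-5.98)
RANGE = math.exp(-1.12)

def exponential_covariance(distance, partial_sill=PARTIAL_SILL, range_=RANGE):
    """ASSUMED exponential covariance: partial_sill * exp(-distance / range)."""
    return partial_sill * math.exp(-distance / range_)

def correlation(distance):
    """Spatial correlation between two sites separated by the given distance."""
    return exponential_covariance(distance) / exponential_covariance(0.0)

# Hypothetical separations, in the same (unspecified) units as the range parameter.
for h in (0.0, 0.1, 0.5, 1.0):
    print(h, round(correlation(h), 3))
```

Under such a model, a site close to its neighbours has a high correlation with them, so its observations add relatively little new information.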

Figure 4: Visualisation of spatial locations of the two least informative reefs (sites) in the Whitsunday region of the Great Barrier Reef. Sites on these two reefs are displayed in red. A small amount of jitter was added for visualisation purposes.

To further evaluate the effect of visiting fewer LTMP sites, we considered the effects of

dropping the least informative site from each reef. The corresponding optimal design retains the following sites (see Table 2 for more details):

2, 3, 4, 5, 8, 9, 11, 12, 13, 14, 16, 18, 19, 20, 22, 23, 26, 27.

This design maintained an approximate mean efficiency of 85% despite retaining only 66.7%

of the original sampling effort. The spatial locations of the retained/dropped sites from each

reef are shown in Figure 5. When considering the optimal design, there can be one or more

contributing factors towards observing one site as less informative compared to the other two

sites on a given reef. These factors may include distance between reefs/sites (spatial effect in

the model), differences in covariate values between reefs/sites, and prior uncertainty about

estimated effects (Table 3), so all such factors should be considered when determining why

certain reefs/sites were not selected within the optimal design.
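The retained-site list above can be turned into the list of dropped sites with a few lines of Python. The sketch assumes that the 27 sites are numbered consecutively with three sites per reef (sites 1 to 3 on the first reef, 4 to 6 on the second, and so on), which is consistent with the site numbers quoted in the text; the reef index used here is a position in that ordering, not a reef name.

```python
from collections import Counter

retained = [2, 3, 4, 5, 8, 9, 11, 12, 13, 14, 16, 18,
            19, 20, 22, 23, 26, 27]
N_SITES = 27
SITES_PER_REEF = 3  # ASSUMPTION: consecutive numbering, three sites per reef

dropped = [site for site in range(1, N_SITES + 1) if site not in retained]

def reef_index(site):
    """Position of the reef (1 to 9) that a site belongs to under the assumed numbering."""
    return (site - 1) // SITES_PER_REEF + 1

dropped_per_reef = Counter(reef_index(site) for site in dropped)
retained_fraction = len(retained) / N_SITES

print(dropped)                      # [1, 6, 7, 10, 15, 17, 21, 24, 25]
print(round(retained_fraction, 3))  # 0.667
```

Exactly one site is dropped per reef, and the retained fraction reproduces the 66.7% of the original sampling effort reported above.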

Figure 5: Spatial locations of the reefs/sites in the Whitsunday region of the Great Barrier Reef after dropping the least informative site from each reef. Red triangles denote dropped sites from each reef. A small amount of jitter was added for visualisation purposes.

For Broder Island reef, the optimal design retains sites 2 and 3. As all three sites on this reef

share similar features (Figure 6a and 6b), it is difficult to explain why site 1 was dropped over

the other two sites. However, it may be related to the distance between sites. That is, short

distances imply sites are related, so it may be that more information is obtained from sites that

are further apart. The same may hold for sites on the Langford-bird and Hayman Island reefs. In contrast, on 20104S reef, sites 11 and 12 appear to have quite dissimilar bathymetries (Figure 6b), and thus these two sites are retained in the optimal design. On 19138S reef, sites 14 and 15 share similar features (Figure 6b). Therefore, the optimal design drops one of them (site 15) and retains the two most dissimilar sites. Likewise, sites 16 and 18 are retained from the

Rebe reef due to dissimilarities in their bathymetry and mean temperature values (Figure 6b).

In summary, these results show that sites appear to be retained/dropped depending on

heterogeneities in site-specific features as this would allow the effect of these covariates to be

estimated more precisely.

Figure 6: (a) Distributions of time-varying disturbance proportions and (b) distributions of other covariates at each site in the Whitsunday region of the Great Barrier Reef. Reef names corresponding to numbers given here are shown in Table 2.

Impacts of different disturbance conditions

To find designs that vary over time depending on the effects of environmental disturbances,

two scenarios were developed. In Scenario 1, environmental disturbances were simulated to

match the historical data in the Whitsunday region. In this scenario, the mean efficiency of the

LTMP was only 41% when compared to the optimal design. This confirms that the optimal

design provides highly informative data compared to the LTMP when disturbance patterns

similar to historical patterns are observed. To understand how design points were distributed,

spatial locations of the sites in the optimal design are shown with the current LTMP sites

(Figure 7).

To help interpret the results in Figure 7, a dot plot was produced (see Figure 8) which shows

the number of visits to each site under the aforementioned optimal design. The optimal design

does not visit all the sites in the Whitsunday region. Instead, the results suggest that collecting

more data from some selected sites provides more informative data. To explain these differences in the number of visits to different sites, potential factors can be considered at the habitat, reef, and individual site levels.

Figure 7: Spatial locations of sites in the optimal design in the Whitsunday region of the Great Barrier Reef when disturbance patterns match historical disturbance patterns in the region. The Whitsunday region is depicted in three parts as Inner- (a), Middle- (b), and Outer-shelf (c) habitats. Frequency refers to the number of visits to a site. A small amount of jitter was added for visualisation purposes.

Within a habitat, there are more visits to the sites on a reef that is far away from the other reefs

in the same habitat (Figure 7). Furthermore, when two reefs are in close proximity, the optimal

design proposes fewer visits to the sites on either of the reefs. If we turn to the reef scale, Hayman

is the only reef (sites 7, 8, and 9) which is open for fishing in inshore habitat. Thus, there are

more visits to the sites in this reef in order to capture the underlying contrast of this reef

compared to the others. Similarly, Slate reef is the only reef closed to fishing in outer-shelf

habitat, and the optimal design collects more data from the sites on this reef. At the site scale,

sites 11 and 27 are the most diverse sites (in terms of covariates) in the Whitsunday region

(Figure 6a and 6b). To capture this dissimilarity, our optimal design visits these two sites more

often compared to the other sites in the region (Figure 8). Overall, the optimal design collects

more data from reefs/sites that are quite dissimilar from others.

Figure 8: Sites in the optimal design and the number of visits to each site in the Whitsunday region of the Great Barrier Reef when disturbance patterns match historical disturbance patterns in the region.

In Scenario 2, we determined optimal designs subject to CoTS disturbance under four sampling

schemes (i.e. one site from each reef affected, all the sites in inshore reefs affected, all the sites

in middle-shelf reefs affected, and all the sites in outer-shelf reefs affected) as described in

Section 2.3.2. The mean efficiencies of the LTMP with respect to the optimal designs were

47%, 48%, 50%, and 51% for these four schemes, respectively. Figures 9 and 10 visualise the

sites in the optimal designs in the Whitsunday region under these four schemes. In Figure 11,

dot plots show the number of visits to CoTS affected/unaffected sites under these four schemes.

It is evident from these figures that the optimal designs do not visit all the CoTS affected

sites.

Under scheme (i), in inner-shelf reefs, the optimal design does not visit one of the CoTS affected sites (site 5; Figure 9a). This site is located on Langford-bird reef, and the sites on this reef have similar features (Figures 6a and 6b) except for the CoTS affected proportion. As CoTS is not a significant covariate (at the 95% credible level) in the model used for design selection

(Table 3), the contrast of site-specific features of site 5 against other sites might not be

substantial enough for it to be selected in the optimal design. Further, the optimal design has

the highest number of visits to a CoTS affected site (site 21 on 9131S reef), which is in middle-

shelf (Figure 9b). As this reef is close to 19138S reef, the optimal design visits neither the

affected site nor any other sites on 19138S reef. It is interesting to note that the optimal design

visits only the CoTS affected site on 20104S reef, which is located further away from the

remaining two reefs in the middle-shelf. In the outer-shelf, the optimal design does not visit

the CoTS affected site on Hyde reef (Figure 9c). One explanation for this may be that the

optimal design has collected more data from the CoTS affected site on nearby Rebe reef.

Figure 9: Spatial locations of sites in the optimal designs in the Whitsunday region of the Great Barrier Reef under CoTS disturbance for one selected site on each reef ((a), (b), and (c)) and for all inshore-shelf sites ((d), (e), and (f)). In each panel, the Whitsunday region is divided into three parts based on habitat as Inner- (left), Middle- (middle), and Outer-shelf (right). Red dots represent the CoTS affected sites and black dots represent unaffected sites. Frequency represents the number of visits to a site. A small amount of jitter was added for visualisation purposes.

Under scheme (ii), where all inshore sites were treated as affected, the optimal design does not visit all the affected sites (Figure 9d). Instead, fewer sites are visited in the affected area compared to the current LTMP design, where all the sites would be visited. A similar pattern of sampled reefs was found in the middle- (scheme (iii)) and outer-shelf (scheme (iv)) habitats as in the inner-shelf (Figures 10 and 11). Overall, these results indicate that the optimal design provides much more informative data compared to the current LTMP design with reduced resources.

Figure 10: Spatial locations of sites in the optimal designs in the Whitsunday region of the Great Barrier Reef under CoTS disturbance for all middle-shelf sites ((a), (b), and (c)) and for all outer-shelf sites ((d), (e), and (f)). In each panel, the Whitsunday region is divided into three parts based on habitat as Inner- (left), Middle- (middle), and Outer-shelf (right). Red dots represent CoTS affected sites and black dots represent unaffected sites. Frequency represents the number of visits to a site. A small amount of jitter was added for visualisation purposes.

Figure 11: Number of visits to CoTS affected and unaffected sites under the four schemes considered in this study. These four schemes include (a) one site from each reef affected, (b) all sites in inshore reefs affected, (c) all sites in middle-shelf reefs affected, and (d) all sites in outer-shelf reefs affected.

4. Discussion

In this paper, we focused on improving the effectiveness of reef monitoring in a Bayesian

experimental design context through reducing monitoring costs or resources and increasing the

information gained for addressing specified monitoring objectives. The present study makes

several contributions with respect to sampling designs for monitoring the GBR and potentially

other reef ecosystems, using an approach that could be applied to ecosystem monitoring more

broadly. First, this paper demonstrates the use of time-varying covariates such as cyclone

impacts, bleaching, and CoTS outbreaks when sampling locations are selected for the coming

year. Second, the model used for design selection has been enhanced through the incorporation

of spatial random effects, which has contributed to a gain of almost twice the amount of

information when compared to designs found using methods from Kang et al. (2016). These

design innovations have the potential to significantly improve the knowledge captured

regarding the ecological dynamics in coral cover, and thus improve the effectiveness of reef

monitoring.

One of the objectives of the current study was to compare the effect of having fewer

LTMP sites in the Whitsunday region, either by removing one reef or removing one site from

each reef. Most notably, removing these reefs and sites did not result in substantial loss of

information about coral cover parameters. For example, removing one site from each reef

resulted in the retention of 85% of the information obtained using the fixed LTMP design. Our

other objective was to find designs that could change over time depending on reef condition.

With this approach, the designs found do not visit all LTMP sites, but instead collect more

data from some specific sites. Our results suggest that the level of sampling effort in the LTMP

could be better spent in other areas of the reef. As travel costs make up a significant portion of

monitoring costs (Hill and Wilkinson, 2004), our findings could facilitate reduced monitoring

costs, allowing these resources to be used in other studies.

There is scope to extend the methods presented here in future research. For example, in this

work, we did not consider the serial correlation of time-varying covariates, or the correlations

that may exist between such variables. The effects of such correlations could be explored in

future studies, and potentially could lead to more informative experimental designs.

Furthermore, while this study assessed the objective of maximising the precision in parameter

estimation, it would be straightforward to extend this approach to evaluate other functions such

as accurate predictions at un-sampled sites. Additional monitoring objectives related to the

LTMP experimental design could also be incorporated through the utility function, and

financial/time constraints could also be imposed. Lastly, previous research has shown that

inferences can change depending on the spatial scale and extent of spatial smoothing that is

considered (Kang et al., 2014, Kang et al., 2013). It would be interesting, therefore, to explore

whether such changes have a significant impact upon the chosen optimal design.

5. Acknowledgements

The corresponding author of this study is supported by the Australian Technology Network of

Universities Industry Doctoral Training Centre (ATN IDTC) Scholarship. Drovandi C and Mellin C were supported by the Australian Research Council’s Discovery Early Career Researcher Award funding scheme (DE160100741 and DE140100701). We would like to

thank the Centre of Excellence in Mathematical and Statistical Frontiers (ACEMS) and the

Australian Institute of Marine Science. We are also immensely grateful to Terry Walshe, who

provided expertise that greatly assisted in the initial stages of this study. Finally, thanks to

Samuel Clifford and Alan Pearse for their help and advice regarding LTMP data. Author

contributions: McGree JM, Peterson EE, and Thilan AWLP designed the research; Thilan

AWLP performed the research and wrote the paper; McGree JM, Menendez P, Peterson EE,

Caley MJ, Drovandi C, and Mellin C provided research conception and critical review of

manuscript drafts.

7. References

Beaman, R. J. (2017-12-10). High-resolution depth model for the Great Barrier Reef - 30 m.

https://ecat.ga.gov.au/geonetwork/srv/eng/search#!0f4e635c-81ec-46d0-9c99-65e5fe0b8c01.

Beijbom, O., Edmunds, P. J., Roelfsema, C., Smith, J., Kline, D. I., Neal, B. P., Dunlap, M. J., Moriarty, V., Fan, T.-Y. & Tan, C.-J. (2015). Towards automated annotation of benthic survey images: Variability of human experts and operational modes of automation. PloS one, 10(7). https://doi.org/10.1371/journal.pone.0130312.

BOM. (2014). EReefs marine water quality dashboard data product specification. Bureau of Meteorology. http://www.bom.gov.au/environment/activities/mwqd/documents/data-product-specification.pdf.

CERF. (2009). Marine Biodiversity Hub. https://www.nespmarine.edu.au/.

Chaloner, K., & Verdinelli, I. (1995). Bayesian experimental design: A review. Statistical Science, 273-304. https://doi.org/10.1214/ss/1177009939.

De'ath, G., & Fabricius, K. (2010). Water quality as a regional driver of coral biodiversity and macroalgae on the Great Barrier Reef. Ecological Applications, 20(3), 840-850. https://doi.org/10.1890/08-2023.1.

Devlin, M., Schroeder, T., McKinna, L., Brodie, J., Brando, V., & Dekker, A. (2012). Monitoring and mapping of flood plumes in the Great Barrier Reef based on in situ and remote sensing observations. Environmental Remote Sensing and Systems Analysis, 147-191: Taylor and Francis Group, CRC Press.

Dunn, J. R. (2009). CSIRO Atlas of Regional Seas (CARS) Database. http://www.marine.csiro.au/~dunn/cars2009/.

Ecker, M. D., & Gelfand, A. E. (1997). Bayesian variogram modeling for an isotropic spatial process. Journal of Agricultural, Biological, and Environmental Statistics, 347-369. https://doi.org/10.2307/1400508.

Falk, M. G., McGree, J. M., & Pettitt, A. N. (2014). Sampling designs on stream networks using the pseudo-Bayesian approach. Environmental and ecological statistics, 21(4), 751-773. https://doi.org/10.1007/s10651-014-0279-2.

Faraway, J. J., Wang, X., & Ryan, Y. Y. (2018). Bayesian Regression Modeling with INLA: Chapman and Hall/CRC. https://doi.org/10.1201/9781351165761.

Ferrari, S. L. P., & Cribari-Neto, F. (2004). Beta regression for modelling rates and proportions. Journal of Applied Statistics, 31(7), 799-815. https://doi.org/10.1080/0266476042000214501.

Ford, E. B. (2008). Adaptive scheduling algorithms for planet searches. The Astronomical Journal, 135(3), 1008. https://doi.org/10.1088/0004-6256/135/3/1008.

GBRMPA. (2014). Great Barrier Reef (GBR) Features (Reef boundaries, QLD Mainland, Islands, Cays, Rocks, and Dry Reefs) shapefile. Great Barrier Reef Marine Park Authority GeoPortal. https://eatlas.org.au/data/uuid/ac8e8e4f-fc0e-4a01-9c3d-f27e4a8fac3c.

Hill, J., & Wilkinson, C. (2004). Methods for ecological monitoring of coral reefs. Australian Institute of Marine Science, Townsville, 117.

Hoegh-Guldberg, O., Mumby, P. J., Hooten, A. J., Steneck, R. S., Greenfield, P., Gomez, E., Harvell, C. D., Sale, P. F., Edwards, A. J., Caldeira, K., Knowlton, N., Eakin, C. M., Iglesias-Prieto, R., Muthiga, N., Bradbury, R. H., Dubi, A. & Hatziolos, M. E. (2007). Coral reefs under rapid climate change and ocean acidification. Science, 318, 1737-42. https://doi.org/10.1126/science.1152509.

Hughes, T. P., Baird, A. H., Bellwood, D. R., Card, M., Connolly, S. R., Folke, C., Grosberg, R., Hoegh-Guldberg, O., Jackson, J. B. & Kleypas, J. (2003). Climate change, human impacts, and the resilience of coral reefs. Science, 301, 929-933. https://doi.org/10.1126/science.1085046.

Jackson, J. B., Kirby, M. X., Berger, W. H., Bjorndal, K. A., Botsford, L. W., Bourque, B. J., Bradbury, R. H., Cooke, R., Erlandson, J. & Estes, J. A. (2001). Historical overfishing and the recent collapse of coastal ecosystems. Science, 293, 629-637. https://doi.org/10.1126/science.1059199.

Jonker, M. M., Johns, K. K., & Osborne, K. K. (2008). Surveys of benthic reef communities using underwater digital photography and counts of juvenile corals. In Long-Term Monitoring of the Great Barrier Reef Standard Operational Procedure Number 10; Australian Institute of Marine Science: Townsville, Australia. https://www.aims.gov.au/docs/research/monitoring/reef/sops.html.

Kang, S. Y., McGree, J., Baade, P., & Mengersen, K. (2014). An investigation of the impact of various geographical scales for the specification of spatial dependence. Journal of Applied Statistics, 41(11), 2515-2538. https://doi.org/10.1080/02664763.2014.920779.

Kang, S. Y., McGree, J., & Mengersen, K. (2013). The impact of spatial scales and spatial smoothing on the outcome of bayesian spatial model. PLoS One, 8(10). https://doi.org/10.1371/journal.pone.0075957.

Kang, S. Y., McGree, J. M., Drovandi, C. C., Caley, M. J., & Mengersen, K. L. (2016). Bayesian adaptive design: improving the effectiveness of monitoring of the Great Barrier Reef. Ecol Appl, 26(8), 2635-2646. https://doi.org/10.1002/eap.1409.

Kullback, S., & Leibler, R. A. (1951). On information and sufficiency. The Annals of Mathematical Statistics, 22(1), 79-86.

Lagos-Alvarez, B. M., Fustos-Toribio, R., Figueroa-Zúñiga, J., & Mateu, J. (2017). Geostatistical mixed beta regression: a Bayesian approach. Stochastic Environmental Research and Risk Assessment, 31(2), 571-584. https://doi.org/10.1007/s00477-016-1308-5.

Long, Q., Scavino, M., Tempone, R., & Wang, S. J. (2013). Fast estimation of expected information gains for Bayesian experimental designs based on Laplace approximations. Computer Methods in Applied Mechanics and Engineering, 259, 24-39. https://doi.org/10.1016/j.cma.2013.02.017.

Loredo, T. J. (2004). Bayesian adaptive exploration. In AIP Conference Proceedings (Vol. 707, 330-346). https://doi.org/10.1063/1.1751377.

Loredo, T. J., Berger, J. O., Chernoff, D. F., Clyde, M. A., & Liu, B. (2012). Bayesian methods for analysis and adaptive scheduling of exoplanet observations. Statistical Methodology, 9(1-2), 101-114. https://doi.org/10.1016/j.stamet.2011.07.005.

MacKay, D. J. C. (2003). Information theory, inference and learning algorithms: Cambridge University Press.

Matthews, S. A., Mellin, C., Macneil, A., Heron, S. F., Skirving, W., Puotinen, M., Devlin, M. J. & Pratchett, M. (2019). High‐resolution characterization of the abiotic environment and disturbance regimes on the Great Barrier Reef, 1985–2017. Ecology. https://doi.org/10.1002/ecy.2574.

McGree, J. M., Drovandi, C. C., Thompson, M., Eccleston, J., Duffull, S., Mengersen, K., Pettitt, A. N. & Goggin, T. (2012). Adaptive Bayesian compound designs for dose finding studies. Journal of Statistical Planning and Inference, 142, 1480-1492. https://doi.org/10.1016/j.jspi.2011.12.029.

McGree, J. M., Drovandi, C. C., White, G., & Pettitt, A. N. (2016). A pseudo-marginal sequential Monte Carlo algorithm for random effects models in Bayesian sequential design. Statistics and Computing, 26(5), 1121-1136. https://doi.org/10.1007/s11222-015-9596-z.

Meyer, R. K., & Nachtsheim, C. J. (1995). The coordinate-exchange algorithm for constructing exact optimal experimental designs. Technometrics, 37(1), 60-69. https://doi.org/10.2307/1269153.

Miller, I., Jonker, M., & Coleman, G. (2003). Crown-of-thorns starfish and coral surveys using the manta tow and SCUBA search techniques: Australian Institute of Marine Science Townsville, Australia.

Morgan, C. C., Huyck, S., Jenkins, M., Chen, L., Bedding, A., Coffey, C. S., Gaydos, B. & Wathen, J. K. (2014). Adaptive design: results of 2012 survey on perception and use. Therapeutic Innovation & Regulatory Science, 48, 473-481. https://doi.org/10.1177%2F2168479014522468.

Osborne, K., Dolman, A. M., Burgess, S. C., & Johns, K. A. (2011). Disturbance and the dynamics of coral cover on the Great Barrier Reef (1995–2009). PloS one, 6(3). https://doi.org/10.1371/journal.pone.0017516.

Overstall, A. M., McGree, J. M., & Drovandi, C. C. (2018). An approach for finding fully Bayesian optimal designs using normal-based approximations to loss functions. Statistics and Computing, 28(2), 343-358. https://doi.org/10.1007/s11222-017-9734-x.

Pandolfi, J. M., Jackson, J. B., Baron, N., Bradbury, R. H., Guzman, H. M., Hughes, T. P., Kappel, C., Micheli, F., Ogden, J. C., Possingham, H. P., & Sala E. (2005). Are US coral reefs on the slippery slope to slime? Science, 307(5716), 1725-6. https://doi.org/10.1126/science.1104258.

Puotinen, M., Maynard, J. A., Beeden, R., Radford, B., & Williams, G. J. (2016). A robust operational model for predicting where tropical cyclone waves damage coral reefs. Sci Rep, 6(6). https://doi.org/10.1038/srep26009.

Ryan, E. G., Drovandi, C. C., McGree, J. M., & Pettitt, A. N. (2016). A Review of Modern Computational Algorithms for Bayesian Optimal Design. International Statistical Review, 84(1), 128-154. https://doi.org/10.1111/insr.12107.

Ryan, K. J. (2003). Estimating expected information gains for experimental designs with application to the random fatigue-limit model. Journal of Computational and Graphical Statistics, 12(3), 585-603. https://doi.org/10.1198/1061860032012.

Sweatman H., Cheal A., Coleman G., Emslie M., Johns K., Jonker M., Miller I., & Osborne K. (2008). In Long-term monitoring of the Great Barrier Reef. Status report number 8 (Vol. 369 pp): Australian Institute of Marine Science Townsville, Australia. https://eatlas.org.au/content/long-term-monitoring-great-barrier-reef-status-report-no-8-aims-ltmp.

Sweatman, H., Delean, S., & Syms, C. (2011). Assessing loss of coral cover on Australia's Great Barrier Reef over two decades, with implications for longer-term trends. Coral Reefs, 30(2), 521-531. https://doi.org/10.1007/s00338-010-0715-1.

Vercelloni, J., Caley, M. J., & Mengersen, K. (2017). Crown-of-thorns starfish undermine the resilience of coral populations on the Great Barrier Reef. Global Ecology and Biogeography, 26(7), 846-853. https://doi.org/10.1111/geb.12590.

Weir, C. J., Spiegelhalter, D. J., & Grieve, A. P. (2007). Flexible design and efficient implementation of adaptive dose-finding studies. Journal of Biopharmaceutical Statistics, 17(6), 1033-1050. https://doi.org/10.1080/10543400701643947.

Zeileis, A., Kleiber, C., & Jackman, S. (2008). Regression models for count data in R. Journal of Statistical Software, 27(8), 1-25. http://dx.doi.org/10.18637/jss.v027.i08.

Appendix A - Spatial and Temporal Distributions of coral cover and covariates considered in the model

It is apparent from Figure A.1 that coral cover was moderate to low on the surveyed reefs.

Cyclones occurred in some regions of the GBR during the years 2009, 2011, 2013, 2014, and

2015. Coral bleaching is visible over the GBR only during 2002 (Figure A.3). CoTS outbreaks were present on only a few reefs during the sampled years. The remaining figures in Appendix

A represent the spatial and temporal distribution of site-specific covariates.

Figure A.1: Spatial and temporal distribution of coral cover at sites within the reefs considered under the Long-Term Monitoring Program of the Great Barrier Reef.

Figure A.2: Spatial and temporal distribution of cyclone impacts at sites within the reefs considered under the Long-Term Monitoring Program of the Great Barrier Reef. Cyclone impacts have been aggregated to two levels in order to overcome the limitation of not having enough data in each level to estimate effect sizes with reasonable precision. Zero represents no cyclone effects and one represents some cyclone effects.

Figure A.3: Spatial and temporal distribution of coral bleaching at sites within the reefs considered under the Long-Term Monitoring Program of the Great Barrier Reef. Bleaching data have been aggregated to two levels in order to overcome the limitation of not having enough data in each level to estimate effect sizes with reasonable precision. Zero represents no coral bleaching and one represents 1% or more coral bleached.

Figure A.4: Spatial and temporal distribution of Crown of Thorns Starfish impacts at sites within the reefs considered under the Long-Term Monitoring Program of the Great Barrier Reef.


Figure A.5: Spatial and temporal distribution of reefs open and closed to fishing at sites within the reefs considered under the Long-Term Monitoring Program of the Great Barrier Reef.


Figure A.6: Bathymetry of sites within the reefs considered under the Long-Term Monitoring Program of the Great Barrier Reef.


Figure A.7: Spatial and temporal distribution of mean temperature at sites within the reefs considered under the Long-Term Monitoring Program of the Great Barrier Reef.


Figure A.8: Spatial and temporal distribution of long-term mean Chlorophyll A concentration (µg/m³) in the water column at sites within the reefs considered under the Long-Term Monitoring Program of the Great Barrier Reef.


Appendix B – Further details about statistical methods

This section describes the estimation of the posterior distribution, posterior model probabilities, and utility approximation as implemented in this paper.

Posterior estimation

We chose a weakly informative multivariate Normal prior distribution for the parameter θ with means (0, …, 0, -2, -2, 0) and a diagonal variance-covariance matrix with diagonal values (100, …, 100, 1, 1, 2). The corresponding components in each vector represent the means and variances of each regression coefficient, log partial sill, log range, and log variance, respectively. Given a design $d$ with response data $y$ and covariates $X$, the posterior distribution can be defined as $p(\theta \mid y, d, X) \propto p(y \mid \theta, d, X)\, p(\theta)$, where $p(y \mid \theta, d, X)$ is the likelihood function. As random effects are included in the model, the likelihood is expressed as follows:

$$p(y \mid \theta, d, X) = \int p(y \mid \theta, z, d, X)\, p(z \mid \theta, d)\, \mathrm{d}z, \tag{B.1}$$

where $p(y \mid \theta, z, d, X)$ is the likelihood conditional on the random effects $z$ and $p(z \mid \theta, d)$ is the distribution of the random effects. In general, the above integral cannot be solved analytically, so we used Monte Carlo methods (McGree et al., 2016) to approximate the likelihood as follows:

$$p(y \mid \theta, d, X) \approx \frac{1}{N} \sum_{i=1}^{N} p(y \mid \theta, z_i, d, X), \tag{B.2}$$

where $z_i \sim p(z \mid \theta, d)$. This approximation can be used to approximate the posterior distribution via the Laplace approximation as follows:

$$p(\theta \mid y, d, X) \approx \mathrm{MVN}(\theta \mid \theta^*, \Sigma^*), \tag{B.3}$$

where $\theta^*$ denotes the mode of the posterior distribution $p(\theta \mid y, d, X)$ and $\Sigma^* = [-H(\theta^*)]^{-1}$ denotes the variance-covariance matrix, being the inverse of the negative Hessian matrix $H$ of the log posterior density evaluated at $\theta^*$.
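The Monte Carlo approximation in Equation (B.2) can be illustrated with a minimal Python sketch. It does not use the Beta regression model with spatial random effects from this paper. Instead it assumes a toy Gaussian model with a single random intercept, chosen because its marginal likelihood is known in closed form and the estimate can therefore be checked. The function names, the toy model, and the number of draws are all assumptions made for illustration.

```python
import math
import random

def normal_pdf(x, mean, var):
    """Density of a Normal(mean, var) distribution at x."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def mc_likelihood(y, beta, tau2, n_draws, rng):
    """Monte Carlo estimate of the likelihood with the random effect integrated out.

    Toy model (an assumption, not the paper's model):
        y | beta, z ~ Normal(beta + z, 1),   z ~ Normal(0, tau2).
    """
    total = 0.0
    for _ in range(n_draws):
        z = rng.gauss(0.0, math.sqrt(tau2))      # draw a random effect from its distribution
        total += normal_pdf(y, beta + z, 1.0)    # likelihood conditional on that draw
    return total / n_draws                        # average over the draws

rng = random.Random(2019)
y, beta, tau2 = 0.5, 0.0, 1.0
estimate = mc_likelihood(y, beta, tau2, n_draws=20000, rng=rng)

# For this toy model the integral is available exactly: y ~ Normal(beta, 1 + tau2).
exact = normal_pdf(y, beta, 1.0 + tau2)
print(f"Monte Carlo: {estimate:.4f}, exact: {exact:.4f}")
```

In a realistic setting the estimate would be computed on the log scale and reused inside an optimiser to locate the posterior mode required by the Laplace approximation.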


Posterior model probabilities

Let $m = 1, \ldots, M$ index the models considered for the coral cover data. The parameter $\theta_m$ of the $m$th model includes the regression coefficients for covariates included in the $m$th model, the log of the variance (i.e. log of the reciprocal of the precision), and the log of the covariance parameters (i.e. partial sill and range). The posterior distribution of $\theta_m$ can be defined as

$$p(\theta_m \mid m, y, d, X) = \frac{p(y \mid m, \theta_m, d, X)\, p(\theta_m \mid m)}{p(y \mid m, d, X)}, \tag{B.4}$$

where

$$p(y \mid m, d, X) = \int p(y \mid m, \theta_m, d, X)\, p(\theta_m \mid m)\, \mathrm{d}\theta_m \tag{B.5}$$

is called the marginal likelihood or the model evidence for model $m$, $p(y \mid m, \theta_m, d, X)$ is the likelihood of model $m$ and $p(\theta_m \mid m)$ is the prior distribution of the parameters of model $m$. The posterior model probability is then given by (MacKay, 2003):

$$p(m \mid y, d, X) = \frac{p(y \mid m, d, X)\, p(m)}{p(y \mid d, X)}, \tag{B.6}$$

where $p(m)$ is the prior model probability and the term in the denominator is given by

$$p(y \mid d, X) = \sum_{m=1}^{M} p(y \mid m, d, X)\, p(m). \tag{B.7}$$

The Laplace approximation can be used to form an approximation to $p(y \mid m, d, X)$ as follows:

$$p(y \mid m, d, X) \approx (2\pi)^{k_m/2} \det(\Sigma_m^*)^{1/2}\, p(y \mid m, \theta_m^*, d, X)\, p(\theta_m^* \mid m), \tag{B.8}$$

where $k_m$ is the dimension of the parameter vector of the $m$th model, $\theta_m^*$ is the mode of the posterior density and $\Sigma_m^* = [-H(\theta_m^*)]^{-1}$ (Overstall et al., 2018, Ryan, 2003). Thus, posterior model probabilities can be estimated by substituting this approximation into Equation (B.6). We denote the mean and variance-covariance matrix for the multivariate Normal posterior distribution for the preferred model as $\theta_0^*$ and $\Sigma_0^*$, respectively. It is this posterior distribution that is considered as the prior distribution for design selection.
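The two steps described above, a Laplace approximation to each model evidence (Equation (B.8)) followed by normalisation into posterior model probabilities (Equation (B.6)), can be sketched in Python for a one-parameter toy problem. Everything in the sketch is an assumption for illustration: the data values, the two candidate models, the helper names, and the use of finite-difference Newton steps to find the mode. The toy model is Gaussian, for which the Laplace approximation is exact, so the result can be compared with the closed-form evidence.

```python
import math

def log_normal_pdf(x, mean, var):
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)

def laplace_log_evidence(log_joint, start, h=1e-4, max_iter=50):
    """1-D Laplace approximation to log of the integral of exp(log_joint)."""
    theta = start
    for _ in range(max_iter):  # Newton steps with finite-difference derivatives
        f_plus, f_mid, f_minus = log_joint(theta + h), log_joint(theta), log_joint(theta - h)
        grad = (f_plus - f_minus) / (2.0 * h)
        hess = (f_plus - 2.0 * f_mid + f_minus) / h ** 2
        step = grad / hess
        theta -= step
        if abs(step) < 1e-9:
            break
    hess = (log_joint(theta + h) - 2.0 * log_joint(theta) + log_joint(theta - h)) / h ** 2
    var_star = -1.0 / hess  # inverse of the negative Hessian at the mode
    # log[(2*pi)^(1/2) * var_star^(1/2) * likelihood(mode) * prior(mode)]
    return log_joint(theta) + 0.5 * math.log(2.0 * math.pi * var_star)

def posterior_model_probs(log_evidences, prior_probs):
    """Normalise evidence x prior on the log scale for numerical stability."""
    log_w = [le + math.log(p) for le, p in zip(log_evidences, prior_probs)]
    top = max(log_w)
    w = [math.exp(v - top) for v in log_w]
    return [v / sum(w) for v in w]

y = [0.8, 1.3, 0.4, 1.1, 0.9]   # hypothetical data
prior_var = 4.0

# Model 1: y_i ~ Normal(theta, 1) with prior theta ~ Normal(0, prior_var).
def log_joint_1(theta):
    return sum(log_normal_pdf(v, theta, 1.0) for v in y) + log_normal_pdf(theta, 0.0, prior_var)

log_ev_1 = laplace_log_evidence(log_joint_1, start=0.0)
# Model 0: theta fixed at zero, so the evidence is simply the likelihood.
log_ev_0 = sum(log_normal_pdf(v, 0.0, 1.0) for v in y)

# Closed-form evidence of model 1, used only to check the approximation.
n, s1, s2 = len(y), sum(y), sum(v * v for v in y)
exact_1 = (-0.5 * n * math.log(2.0 * math.pi) - 0.5 * math.log(1.0 + n * prior_var)
           - 0.5 * (s2 - prior_var * s1 ** 2 / (1.0 + n * prior_var)))

probs = posterior_model_probs([log_ev_0, log_ev_1], [0.5, 0.5])
```

Working with log evidences and subtracting the maximum before exponentiating avoids underflow when evidences differ by many orders of magnitude, which is common for data sets of realistic size.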

Utility approximation

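We adopted the KLD utility, which can be expressed as follows:

$$U(d, y) = \int p(\theta \mid y, d) \log p(y \mid \theta, d)\, \mathrm{d}\theta - \log p(y \mid d). \tag{B.9}$$

This utility can be extended to incorporate time-varying covariates $X$ as follows:

$$U(d, y, X) = \int p(\theta \mid y, d, X) \log p(y \mid \theta, d, X)\, \mathrm{d}\theta - \log p(y \mid d, X). \tag{B.10}$$

In order to evaluate the approximation to the expected utility (see Equation (7)), posterior distributions need to be approximated. We approximate these posterior distributions using the Laplace approximation. As these will be multivariate Normal distributions, we denote these with means $\theta_1^*$ and covariance matrices $\Sigma_1^*$. Then, with $\theta_0^*$ and $\Sigma_0^*$ denoting the mean and variance-covariance matrix of the multivariate Normal prior distribution and $k$ the dimension of $\theta$, the KLD between the prior and posterior distribution can be evaluated analytically as follows:

$$U(d, y, X) = \frac{1}{2}\left[\operatorname{tr}\!\left((\Sigma_0^*)^{-1}\Sigma_1^*\right) + (\theta_0^* - \theta_1^*)^{\top}(\Sigma_0^*)^{-1}(\theta_0^* - \theta_1^*) - k + \ln\frac{\det \Sigma_0^*}{\det \Sigma_1^*}\right]. \tag{B.11}$$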
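The closed-form KLD between two multivariate Normal distributions used in Equation (B.11) can be sketched in Python as follows. The sketch relies only on the standard library and uses a Cholesky factorisation for the inverse and determinant. The helper names and the numerical values for the prior and posterior are hypothetical and chosen for illustration, not taken from the fitted model.

```python
import math

def cholesky(a):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = math.sqrt(s) if i == j else s / low[j][j]
    return low

def chol_solve(low, b):
    """Solve A x = b given the Cholesky factor of A."""
    n = len(low)
    u = [0.0] * n
    for i in range(n):                      # forward substitution
        u[i] = (b[i] - sum(low[i][k] * u[k] for k in range(i))) / low[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):            # backward substitution
        x[i] = (u[i] - sum(low[k][i] * x[k] for k in range(i + 1, n))) / low[i][i]
    return x

def log_det(low):
    return 2.0 * sum(math.log(low[i][i]) for i in range(len(low)))

def kld_mvn(mean_post, cov_post, mean_prior, cov_prior):
    """KLD from a multivariate Normal prior to a multivariate Normal posterior."""
    k = len(mean_prior)
    low_prior, low_post = cholesky(cov_prior), cholesky(cov_post)
    # Trace of inverse(prior covariance) times posterior covariance, column by column.
    trace = sum(chol_solve(low_prior, [cov_post[i][j] for i in range(k)])[j] for j in range(k))
    diff = [a - b for a, b in zip(mean_prior, mean_post)]
    quad = sum(d * s for d, s in zip(diff, chol_solve(low_prior, diff)))
    return 0.5 * (trace + quad - k + log_det(low_prior) - log_det(low_post))

# Hypothetical prior and Laplace-approximated posterior for a two-parameter model.
prior_mean, prior_cov = [0.0, -2.0], [[1.0, 0.2], [0.2, 1.0]]
post_mean, post_cov = [0.3, -1.8], [[0.4, 0.05], [0.05, 0.5]]
utility = kld_mvn(post_mean, post_cov, prior_mean, prior_cov)
```

A larger value indicates that the data moved the posterior further from the prior, that is, the design was more informative about the parameters. The value is zero when the posterior equals the prior.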


Appendix C

Model comparison

To compare the linear model from Kang et al. (2016) to our proposed model, we first compared the 95% posterior predictive intervals of both models (Figure C1 and Figure C2). As can be seen, both models appear to capture the average behaviour of coral cover. However, they differ in how they capture variability. The linear model yields intervals that contain all of the data, whereas for the Beta regression model 4% of the data fall outside these intervals. As we expect 5% of the data to lie outside 95% intervals, this suggests that the Beta regression model is preferred over the linear model. Second, we evaluated the posterior model probabilities of our model and the linear model in Kang et al. (2016). This yielded a posterior model probability of approximately one for the Beta regression model, providing strong evidence that it is preferred over the linear model.
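The coverage check described above, counting the share of observations that fall outside their 95% posterior predictive intervals, can be sketched in Python. The data and predictive draws below are simulated from a single Beta distribution purely for illustration. They are hypothetical stand-ins, not the coral cover data or either fitted model, and the function names are assumptions.

```python
import random

def quantile(sorted_vals, q):
    """Linear-interpolation quantile of an already sorted list."""
    pos = q * (len(sorted_vals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    frac = pos - lo
    return sorted_vals[lo] * (1.0 - frac) + sorted_vals[hi] * frac

def fraction_outside(observed, predictive_draws, level=0.95):
    """Share of observations outside their central predictive interval.

    predictive_draws[i] holds posterior predictive draws for observation i.
    """
    alpha = (1.0 - level) / 2.0
    outside = 0
    for y, draws in zip(observed, predictive_draws):
        s = sorted(draws)
        if y < quantile(s, alpha) or y > quantile(s, 1.0 - alpha):
            outside += 1
    return outside / len(observed)

rng = random.Random(1)
# Hypothetical well-calibrated case: data and predictive draws share one Beta(2, 5) law.
observed = [rng.betavariate(2, 5) for _ in range(1000)]
draws = [[rng.betavariate(2, 5) for _ in range(200)] for _ in observed]
frac_out = fraction_outside(observed, draws)
print(f"Fraction outside 95% intervals: {frac_out:.3f}")
```

For a well-calibrated model this fraction should be close to 0.05. A value near zero, as reported for the linear model, indicates intervals that are wider than the data warrant.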

Figure C1: Scatter plot of arcsine square root transformed coral cover proportions versus coded years with posterior median (black) and 95% posterior predictive interval (red) when using the model from Kang et al. (2016).


Figure C2: Scatter plot of coral cover proportions versus coded years with posterior median (black) and 95% posterior predictive interval (red) when using the spatial model.

Qualitative comparison of the optimal designs

In this section, we compare the designs found under our Beta regression model and the linear model proposed by Kang et al. (2016). One of the main extensions of our model is the inclusion of the spatial random effect, which accounts for the fact that observations collected in space may not be independent. To explore this, we can inspect the estimated range parameter reported in Table 3. On a standardized distance scale, the estimated range is 0.33. After un-standardizing this value, the estimated range is approximately 12.76 km. This implies that coral cover at reefs/sites separated by less than 12.76 km is spatially correlated, whereas coral cover at reefs/sites farther apart than 12.76 km is not. With this in mind, we compare designs selected based on the linear and Beta regression models. As can be seen, based on our model, Hayman reef is selected as one of the least informative reefs (Figure C3). This reef is within 12.76 km of Langford-Bird reef, implying that information about Hayman reef can be obtained by sampling at Langford-Bird reef. In contrast, Border Island, which is relatively isolated in space, was selected as the least informative reef based on the linear model. Thus, the designs found based on our modelling approach appear to leverage the information available through spatial correlation in coral cover, leading to the exclusion of reefs in close vicinity to others.


Figure C3: Visualisation of spatial locations of the two least informative reefs (sites) in the

Whitsunday region of the Great Barrier Reef. Reefs (sites) removed based on the Beta

regression model are shown in red while those based on the linear regression model are circled

in green. A small amount of jitter was added for visualisation purposes.
