Margaret Donald Thesis

Using Bayesian methods for the estimation of uncertainty in complex

statistical models

Margaret Donald

Bachelor of Arts (Hon), University of Melbourne

Master of Applied Statistics, Macquarie University

Submitted in fulfilment of the requirements

of the degree of Doctor of Philosophy

August 25, 2011

Discipline of Mathematical Sciences

Faculty of Science and Technology

Queensland University of Technology

Principal supervisor: Professor Kerrie Mengersen, Queensland University of Technology

Associate supervisor: Professor Anthony Pettitt, Queensland University of Technology

Abstract

The research objectives of this thesis were to contribute to Bayesian statistical methodology,

specifically to risk assessment methodology and to spatial and spatio-temporal

methodology, by modelling error structures using complex hierarchical models.

Specifically, I hoped to consider two applied areas, and use these applications as a spring-

board for developing new statistical methods as well as undertaking analyses which might give

answers to particular applied questions.

Thus, this thesis considers a series of models, firstly in the context of risk assessments for

recycled water, and secondly in the context of water usage by crops. The research objective

was to model error structures using hierarchical models in two problems: firstly, risk assessment

analyses for wastewater, and secondly, a four-dimensional dataset used to assess differences

between cropping systems over time and over three spatial dimensions.

The aim was to use the simplicity and insight afforded by Bayesian networks to develop

appropriate models for risk scenarios, and again to use Bayesian hierarchical models to explore

the necessarily complex modelling of four dimensional agricultural data.

The specific objectives of the research were to develop a method for the calculation of

credible intervals for the point estimates of Bayesian networks; to develop a model structure to

incorporate all the experimental uncertainty associated with various constants thereby allowing

the calculation of more credible credible intervals for a risk assessment; to model a single day’s

data from the agricultural dataset which satisfactorily captured the complexities of the data; to

build a model for several days’ data, in order to consider how the full data might be modelled;


and finally to build a model for the full four dimensional dataset and to consider the time-

varying nature of the contrast of interest, having satisfactorily accounted for possible spatial and

temporal autocorrelations.

This work forms five papers, two of which have been published, with two submitted, and

the final paper still in draft.

The first two objectives were met by recasting the risk assessments as directed acyclic

graphs (DAGs). In the first case, we elicited uncertainty for the conditional probabilities needed

by the Bayesian net, incorporated these into a corresponding DAG, and used Markov chain

Monte Carlo (MCMC) to find credible intervals for all the scenarios and outcomes of interest. In

the second case, we incorporated the experimental data underlying the risk assessment constants

into the DAG, and also treated some of that data as needing to be modelled as an ‘errors-in-

variables’ problem [Fuller, 1987]. This illustrated a simple method for the incorporation of

experimental error into risk assessments.
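The idea of attaching credible intervals to a Bayesian network's point estimates can be sketched as follows. This is only an illustration under stated assumptions: it uses plain forward Monte Carlo sampling rather than the MCMC in WinBUGS used in the thesis, the two-node network (exposure, illness) is hypothetical, and the Beta parameters stand in for elicited uncertainty and are invented for the example.

```python
import random


def credible_interval(samples, level=0.95):
    """Return the equal-tailed interval from a list of draws."""
    ordered = sorted(samples)
    n = len(ordered)
    lo_idx = int((1 - level) / 2 * (n - 1))
    hi_idx = int((1 + level) / 2 * (n - 1))
    return ordered[lo_idx], ordered[hi_idx]


def simulate_marginal(n_draws=20000, seed=1):
    """Draw the marginal P(ill) for a hypothetical two-node network.

    Each conditional probability is given a Beta distribution
    (illustrative values only) instead of a single fixed number.
    """
    rng = random.Random(seed)
    draws = []
    for _ in range(n_draws):
        p_exposed = rng.betavariate(2, 18)               # P(exposed)
        p_ill_exposed = rng.betavariate(3, 27)           # P(ill | exposed)
        p_ill_unexposed = rng.betavariate(1, 99)         # P(ill | not exposed)
        # Marginalise over the parent node for this draw.
        draws.append(p_exposed * p_ill_exposed
                     + (1 - p_exposed) * p_ill_unexposed)
    return draws


draws = simulate_marginal()
point = sum(draws) / len(draws)
lower, upper = credible_interval(draws)
print(f"P(ill): mean {point:.4f}, 95% interval ({lower:.4f}, {upper:.4f})")
```

With fixed conditional probabilities the network would return only the single point estimate; sampling the probabilities gives a distribution for each outcome, from which an interval can be read off.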

In considering one day of the three-dimensional agricultural data, it became clear that geo-

statistical models or conditional autoregressive (CAR) models over the three dimensions were

not the best way to approach the data. Instead, CAR models were used with neighbours only in

the same depth layer. This gave flexibility to the model, allowing both the spatially structured

and non-structured variances to differ at all depths. We call this model the CAR layered model.

Given the experimental design, the fixed part of the model could have been modelled as a set of

means by treatment and by depth, but doing so allows little insight into how the treatment effects

vary with depth. Hence, a number of essentially non-parametric approaches were taken to see

the effects of depth on treatment, with the model of choice incorporating an errors-in-variables

approach for depth in addition to a non-parametric smooth. The statistical contribution here was

the introduction of the CAR layered model, the applied contribution the analysis of moisture

over depth and estimation of the contrast of interest together with its credible intervals. These

models were fitted using WinBUGS [Lunn et al., 2000].
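The layered neighbourhood structure described above can be sketched as a block-diagonal precision matrix. This is a minimal illustration, not the thesis code: it assumes an intrinsic CAR form (precision proportional to D - W) within each layer, a hypothetical 2 x 3 grid of sites with shared-edge neighbours, and made-up layer precisions. The function names are invented for the example.

```python
def grid_adjacency(n_rows, n_cols):
    """Shared-edge adjacency matrix for sites on an n_rows x n_cols grid."""
    n = n_rows * n_cols
    w = [[0] * n for _ in range(n)]
    for r in range(n_rows):
        for c in range(n_cols):
            i = r * n_cols + c
            if c + 1 < n_cols:          # neighbour to the right
                j = i + 1
                w[i][j] = w[j][i] = 1
            if r + 1 < n_rows:          # neighbour below
                j = i + n_cols
                w[i][j] = w[j][i] = 1
    return w


def layered_car_precision(w, layer_precisions):
    """Block-diagonal precision: one intrinsic CAR block per depth layer.

    Sites are neighbours only within the same layer, and each layer
    has its own spatial precision, so cross-layer entries stay zero.
    """
    n = len(w)
    size = n * len(layer_precisions)
    q = [[0.0] * size for _ in range(size)]
    for d, tau in enumerate(layer_precisions):
        offset = d * n
        for i in range(n):
            q[offset + i][offset + i] = tau * sum(w[i])   # neighbour count
            for j in range(n):
                if w[i][j]:
                    q[offset + i][offset + j] = -tau
    return q


w = grid_adjacency(2, 3)                      # hypothetical site layout
q = layered_car_precision(w, [1.0, 4.0])      # two depth layers
print(len(q), q[0][0], q[6][6], q[0][6])
```

Because the blocks do not interact, the spatially structured variance is free to differ at every depth, which is the flexibility the layered model is meant to provide.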

The work in the fifth paper deals with the fact that with large datasets, the use of WinBUGS


becomes more problematic because of its highly correlated term by term updating. In this

work, we introduce a Gibbs sampler with block updating for the CAR layered model. The

Gibbs sampler was implemented by Chris Strickland using pyMCMC [Strickland, 2010]. This

framework is then used to consider five days' data, and we show that moisture in the soil for

all the various treatments reaches levels particular to each treatment at a depth of 200 cm and

thereafter stays constant, albeit with increasing variances with depth.

In an analysis across three spatial dimensions and across time, there are many interactions

of time and the spatial dimensions to be considered. Hence, we chose to use a daily model

and to repeat the analysis at all time points, effectively creating an interaction model of time

by the daily model. Such an approach allows great flexibility. However, this approach does

not allow insight into the way in which the parameter of interest varies over time. Hence, a

two-stage approach was also used, with estimates from the first-stage being analysed as a set of

time series. We see this spatio-temporal interaction model as being a useful approach to data

measured across three spatial dimensions and time, since it does not assume additivity of the

random spatial or temporal effects.

Bibliography

Fuller, W. A. (1987). Measurement error models. New York: Wiley.

Lunn, D. J., A. Thomas, N. Best, and D. Spiegelhalter (2000). WinBUGS - A Bayesian mod-

elling framework: Concepts, structure, and extensibility. Statistics and Computing 10(4),

325–337.

Strickland, C. (2010). pyMCMC: a statistical package for Bayesian MCMC analysis. Journal

of Computational and Graphical Statistics, 1–46. Submitted August 2010.

Statement of Original Authorship

The work contained in this thesis has not been previously submitted for a degree or diploma at

any other higher educational institution. To the best of my knowledge and belief, the thesis con-

tains no material previously published or written by another person except where due reference

is made.

Signed:

Date:


List of Publications arising from this

Thesis

The following publications form the basis for this thesis. They have either been published,

submitted for publication, or are in preparation.

Chapter 3: Bayesian Network for Risk of Diarrhoea Associated with the Use of Recycled

Water, Risk Analysis, 29 (12) 1672-1685, 2009.

Chapter 4: Incorporating Parameter uncertainty into Quantitative Microbial Risk Assess-

ment, Journal of Water and Health, 9 (1) 10-26, 2011, published online, October 2010.

Chapter 5: A Bayesian analysis of an agricultural field trial with three spatial dimensions,

submitted August 2010, Computational Statistics & Data Analysis.

Chapter 6: Comparison of three dimensional profiles over time, submitted December 2010,

Journal of Applied Statistics.

Chapter 7: Four dimensional spatio-temporal analysis of an agricultural field trial, in prepa-

ration.


Acknowledgements

I would like to thank Professor Kerrie Mengersen, firstly, for offering me a scholarship, without

which, I might not have found the courage to undertake and to persist in this, and secondly, for

her unfailing support and encouragement, her capacity for always focussing on the task, and her

ability to listen and let nature take its course. Thank you, Kerrie, for everything.

I would also like to extend my deep appreciation to Clair Alston for her friendship, support

and help, and to Chris Strickland for his cheery help in the programming and mathematics of

Chapter 5. And to the members of BRAG and the other denizens of room O415, my thanks also.

The generosity of all my collaborators in this research has been an extraordinary experience.

Special thanks are due to Anne-Marie Clements for her continued and continuing encour-

agement of this pursuit, and her generosity in giving me a place to complete this work. To Ann

Eyland for her never-failing support of my statistical endeavours. And my thanks also to Ellis

Roberts and John Evans, my first statistical mentors. Ellis’ voice is heard in the concerns of this

thesis.

I also wish to thank Dr Maureen Aitken who gave me shelter and succour at the Women’s

College within the University of Queensland. And finally, my thanks to my daughters, Nicole

& Rachel who encouraged me to undertake this study, far from home.


Contents

1 Introduction 33

1.1 Overall objectives of this research . . . . . . . . . . . . . . . . . . . . . . . . 33

1.2 Research Aims . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36

1.3 Structure of Thesis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37

1.3.1 Case study 1a: Bayesian network . . . . . . . . . . . . . . . . . . . . 39

1.3.2 Case study 1b: QMRA . . . . . . . . . . . . . . . . . . . . . . . . . . 39

1.3.3 Case study: Field trial data . . . . . . . . . . . . . . . . . . . . . . . . 40

1.3.4 Agricultural data: Crop cycles and treatment layout . . . . . . . . . . . 43

References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46

2 Literature Review 49

2.1 Bayesian networks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49

2.1.1 Graphical models and Bayesian networks . . . . . . . . . . . . . . . . 49

2.1.2 Bayesian networks: applications . . . . . . . . . . . . . . . . . . . . . 55

2.2 Risk Assessments for Pathogens . . . . . . . . . . . . . . . . . . . . . . . . . 56

2.2.1 Risk Assessment methodologies . . . . . . . . . . . . . . . . . . . . . 56

2.2.2 Data for a risk assessment . . . . . . . . . . . . . . . . . . . . . . . . 59

2.3 Spatio-temporal modelling . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63

2.3.1 Two dimensional Lattice data analyses . . . . . . . . . . . . . . . . . . 66

2.3.2 Agricultural studies with measurements at different depths . . . . . . . 69


2.3.3 Spatio-temporal data analyses . . . . . . . . . . . . . . . . . . . . . . 70

2.3.4 Four dimensional spatio-temporal data analyses . . . . . . . . . . . . . 73

2.4 Addendum: The dynamic risk assessment model . . . . . . . . . . . . . . . . 75

References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78

3 Paper One: Network for Risk of Diarrhoea Associated with the Use of Recycled

Water 101

3.1 Preamble . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101

3.2 Network for Risk of Diarrhoea Associated with the Use of Recycled Water . . . 104

3.3 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105

3.4 Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 106

3.4.1 Development of a conceptual model . . . . . . . . . . . . . . . . . . . 106

3.4.2 Determination of prior probabilities . . . . . . . . . . . . . . . . . . . 108

3.4.3 Set up and use of models . . . . . . . . . . . . . . . . . . . . . . . . . 108

3.4.4 Model Validation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111

3.5 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 112

3.5.1 Constructed BN . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 112

3.5.2 Model 1: Analysis of the BN without uncertainty . . . . . . . . . . . . 113

3.5.3 Model 2: Analysis of the BN with uncertainty . . . . . . . . . . . . . . 114

3.6 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115

3.6.1 Framework . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115

3.6.2 Internal validation . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115

3.6.3 Discussion of the results . . . . . . . . . . . . . . . . . . . . . . . . . 117

3.7 Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 119

3.8 Figures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 123

References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 127

3.9 Addendum . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 131


4 Paper Two: Incorporating parameter uncertainty into Quantitative Microbial Risk

Assessment (QMRA) 135

4.1 Preamble . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 135

4.2 Incorporating parameter uncertainty into Quantitative Microbial Risk Assess-

ment (QMRA) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 138

4.3 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 139

4.4 Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 140

4.4.1 Standard QMRA methodology . . . . . . . . . . . . . . . . . . . . . . 140

4.4.2 The extended QMRA model . . . . . . . . . . . . . . . . . . . . . . . 142

4.4.3 Data for the extended model . . . . . . . . . . . . . . . . . . . . . . . 143

4.5 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 150

4.6 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 152

4.7 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 155

4.8 Figures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 156

4.9 Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 170

References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 173

5 Paper Three: An analysis of a field trial with three spatial dimensions 177

5.1 Preamble . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 177

5.2 A Bayesian analysis of an agricultural field trial with three spatial dimensions . 179

5.3 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 180

5.4 Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 181

5.4.1 Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 181

5.4.2 Spatial Correlation . . . . . . . . . . . . . . . . . . . . . . . . . . . . 183

5.4.3 Treatment (fixed) effects . . . . . . . . . . . . . . . . . . . . . . . . . 186

5.4.4 Choice of Priors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 189

5.4.5 Model comparisons . . . . . . . . . . . . . . . . . . . . . . . . . . . . 190


5.4.6 Implementation Details . . . . . . . . . . . . . . . . . . . . . . . . . . 190

5.5 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 190

5.5.1 Assessing presence of spatial correlation . . . . . . . . . . . . . . . . 190

5.5.2 Determining neighbourhoods and random components . . . . . . . . . 191

5.6 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 195

5.7 Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 199

5.8 Figures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 203

References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 207

6 Paper Four: Comparison of three dimensional profiles over time 211

6.1 Paper Four: Comparison of three dimensional profiles over time . . . . . . . . 214

6.2 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 214

6.3 Case Study . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 215

6.4 Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 216

6.4.1 Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 216

6.4.2 Computation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 219

6.4.3 Fixed effects . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 220

6.4.4 Contrast and parameter comparisons . . . . . . . . . . . . . . . . . . . 221

6.5 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 221

6.5.1 Model choice . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 221

6.5.2 Variance components . . . . . . . . . . . . . . . . . . . . . . . . . . . 222

6.5.3 Depth segments and dates . . . . . . . . . . . . . . . . . . . . . . . . 223

6.5.4 Point by point contrasts . . . . . . . . . . . . . . . . . . . . . . . . . . 223

6.5.5 Spatial residual components, ψ . . . . . . . . . . . . . . . . . . . . . . 224

6.6 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 224

6.7 Appendix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 227

6.7.1 Sampling β. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 227


6.7.2 Sampling σ. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 228

6.7.3 Sampling ψ. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 228

6.7.4 Sampling τ. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 229

6.7.5 Sampling ρ. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 230

6.8 Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 231

6.9 Figures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 235

References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 242

7 Paper Five: Four dimensional spatio-temporal analysis of an agricultural dataset 247

7.1 Four dimensional spatio-temporal analysis of an agricultural dataset . . . . . . 249

7.2 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 249

7.3 Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 251

7.4 Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 252

7.5 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 258

7.6 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 262

7.6.1 Modelling spatio-temporal data . . . . . . . . . . . . . . . . . . . . . 262

7.6.2 Model Comparisons: Problems . . . . . . . . . . . . . . . . . . . . . 265

7.7 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 266

7.8 Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 268

7.9 Figures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 272

7.9.1 Contrasts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 272

References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 281

8 Conclusions and further work 287

8.1 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 287

8.2 Future Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 291

References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 296


Appendices 299

A 301

A.1 WinBUGS code for the model 2 Bayesian net of Paper 1 . . . . . . . . . . . . 301

B Supplementary materials for Chapter Six 307

B.1 Supplementary tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 307

B.2 Supplementary Graphs: Contour Graphs for the spatial residuals . . . . . . . . 334

C Supplementary graphs and tables for Chapter 7 375

C.1 Graphs: Method 1, Method 2 random walk and penalised spline smoothed models . . . 375

C.2 Final estimates and credible intervals for the contrast of long fallow cropping

versus response cropping . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 398

Full Reference List 407

List of Figures

1.1 Site treatments for the agricultural data of chapters 5-7. Details of the 9 treat-

ments are given in Section 5.4.1 and again in Chapter 7 in Section 7.3 in the

description of the four-dimensional dataset. . . . . . . . . . . . . . . . . . . . 44

1.2 Crop cycles for the cropping treatments (Treatments 1-6). The vertical line in-

dicates the date for the data analysed in Chapter 5. The three dimensional data

are described in Section 5.4.1 and again in Section 7.3 of Chapter 7 as a four-

dimensional dataset. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45

2.1 An undirected graph for which x ⊥ y|z. . . . . . . . . . . . . . . . . . . . . . . 50

2.2 Four differing directed acyclic graphs (DAGs) with the same (undirected) struc-

ture as the undirected graph of Figure 2.1. . . . . . . . . . . . . . . . . . . . . 51

2.3 The dynamic model of Eisenberg et al. [2002]. Schematic diagram of trans-

mission model. t, independent variable representing time. Solid lines repre-

sent movement of individuals from one state to another. Dashed lines represent

movement of pathogens either directly from infectious host to susceptible host

or indirectly via the environment. State variables and parameters are defined in

the text.. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77

3.1 Conceptual model of Cook and Roser1 . . . . . . . . . . . . . . . . . . . . . . 124


3.2 Bayesian network based on the conceptual model: Node numbering is that used

in the text and in the WinBUGS model . . . . . . . . . . . . . . . . . . . . . . 125

3.3 Relative risks for each age group (0-4, 5-64, 65+) and for the entire population

(All) for each risk scenario, estimated from the BN (Model 1) and by MCMC

(Model 2), with 95% credible intervals from Model 2. . . . . . . . . . . . . . . 126

3.4 Distribution of the probability of being infected with gastroenteritis. . . . . . . 133

3.5 Distribution of the probability of being infected with gastroenteritis when the

endpoint distribution fails. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 133

4.1 Model for a QMRA for surface vegetable irrigated with treated wastewater. Ob-

served data nodes shown in white, parameter nodes in green, and outcome nodes

in a light grey. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 157

4.2 Model for the part of the standard QMRA implemented here. Observed data

nodes shown in white, parameter nodes in green, and outcome nodes in a light

grey. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 158

4.3 Schematic Model for the directed acyclic graph implemented in WinBUGS for

estimation of parameters and risk. Observed data nodes (1,3,4,7) are shown in

white. Unknown parameter nodes to be estimated (2,6) in green, and outcome

nodes (5,8) in a light grey. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 159

4.4 Sunlight hours for January/June 2008 at Perth airport. . . . . . . . . . . . . . 160

4.5 Dose-Response curve with uncertainty for S.anatum: $P = 1 - (1 + \mathrm{Dose}/\beta)^{-\alpha}$.

The bounding curves are the 95% credible intervals from the MCMC simulation. 161

4.6 Graphical model for Dose-Response estimated with error in measurement and

error in individual dosage: Measured dose is the observed batch dose, Batch

dose is the unobserved true batch dose, individual dose is the true unobserved

individual dose. The observation of an individual’s infection status is assumed

to be without error. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 162


4.7 Graphical model for a risk assessment which includes the parameters for dose-

response based on the errors-in-variables concept. . . . . . . . . . . . . . . . . 163

4.8 Dose-response curve parameters (α, β): Posterior distribution for logα vs logβ/1000

using log uniform priors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 164

4.9 Die-off distributions for S.typhimurium: fixed effects pooled variance model

$N_t = N_0 e^{-kt}$, $k > 0$. Note that for die-off $k > 0$. . . . . . . . . . . . . . . . . . . . 165

4.10 Summer & Winter: Probability of infection - constant (the line) vs varying (the

dots) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 166

4.11 Summer & Winter: Probability of infection - Constant vs Varying by ranked

initial pathogen numbers groups. . . . . . . . . . . . . . . . . . . . . . . . . . 167

4.12 Probability of infection (no die-off) against ranked initial pathogen numbers

groups: using constant vs varying parameters for Beta binomial distribution. . . 168

4.13 Comparison of the Dose-response curves for S.anatum with 95% credible inter-

vals, estimated with & without “errors-in-variables”. . . . . . . . . . . . . . . 169

5.1 95% credible intervals for the contrast differences based on the cubic radial

bases model with errors-in-measurement (graphed where the 95% CI did not

cover zero). The lines with the widest tops and tails show “Long Fallow - Re-

sponse Cropping”, with the thinnest “Lucerne - Native Pastures”, and those with

medium width “Crop - Pasture”. . . . . . . . . . . . . . . . . . . . . . . . . . 203

5.2 Fixed effects curves for errors-in-variables model: Linear spline treatment ef-

fects & 95% credible intervals, CAR model, sites 1-54. The true depths are

those implied by the errors-in-measurement model. For each treatment there

are 6 sites, each with the same treatment curve. . . . . . . . . . . . . . . . . . 204

5.3 Fixed effects curves for errors-in-variables model: Cubic radial bases model

showing estimates at the nominal depth. Depth has been jittered to allow credi-

ble intervals to be seen. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 205


5.4 95% CI for the ratio of square root of the spatial variance to that of the non-

spatial variance at the fifteen depths: Cubic radial bases model with errors-in-

measurement for depth. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 206

6.1 Cumulative distribution curves for the posterior distribution of the deviance, for

(date 4) September 23, 1998. The solid line represents that for the saturated

model, the middle broken line that for the 3-knot linear spline, and the more

coarsely broken line on the left that for the 5-knot linear spline model. . . . . . 235

6.2 Square root of non-spatial variances, by date and depth. Credible intervals are

staggered in date order. Note the comparatively smaller variances at the shal-

lower depths for Date 4. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 236

6.3 Square root of spatial variance, by date and depth. Credible intervals are stag-

gered in date order. Note the comparatively smaller variances at the shallower

depths for Date 4. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 237

6.4 Contrast: Long Fallow - Response cropping. Credible intervals are staggered in

date order. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 238

6.5 Contrast: Cropping - Pastures. Credible intervals are staggered in date order. . 239

6.6 Contrast: Lucerne mixtures - Native pastures. Credible intervals are staggered

in date order. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 240

6.7 Spatial residual components at depth 240 cm. . . . . . . . . . . . . . . . . . . 241

7.1 Long fallowing vs Response cropping at all depths. Saturated model. Point

estimates from the MCMC iterates of the full model (Method 1). . . . . . . . . 272

7.2 Long fallowing vs Response cropping. Saturated model. Contour graph from

the point estimates from the MCMC iterates of the full model (Method 1). . . . 273

7.3 Long fallowing vs Response cropping at depth 100 for all trial dates. Saturated

model. Point estimates & 95%CIs from MCMC iterates from the full model. . . 274


7.4 Long fallowing vs Response cropping at depth 100 for all trial dates. Penalised

spline smooth across dates. Point estimates & 95%CIs. . . . . . . . . . . . . . 274

7.5 Long fallowing vs Response cropping at depth 100 for all trial dates. Regres-

sion model (Equation 7.2) fitting 27 time-varying covariates. Point estimates &

95%CIs. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 275

7.6 Long fallowing vs Response cropping at depth 100 for all trial dates. Random

Walk of order one. Point estimates & 95%CIs. . . . . . . . . . . . . . . . . . . 275

7.7 Spatially structured and unstructured standard deviations & 95% credible

intervals at depth 100 cm. The spatial standard deviations are shown in blue, the

unstructured standard deviations in green. . . . . . . . . . . . . . . . . . . . . 276

7.8 Spatially structured and unstructured standard deviations & 95% credible in-

tervals at depth 220 cm. The spatial standard deviations are shown dotted, the

unstructured standard deviations in green. . . . . . . . . . . . . . . . . . . . . 277

7.9 Long fallowing vs Response cropping at depth 140 for all trial dates (AR1 fit).

Point estimates & 95%CIs. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 277

7.10 Long fallowing vs Response cropping at depth 140 for all trial dates (RW1 fit

using weights which are reciprocals of the time intervals). Point estimates &

95%CIs. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 278

7.11 Long fallowing vs Response cropping at depth 140 for all trial dates (RW2 fit).

Point estimates & 95%CIs. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 278

7.12 Long fallowing vs Response cropping at depth 140 for all trial dates (RW1 fit

with t dist df=10). Point estimates & 95%CIs. . . . . . . . . . . . . . . . . . 279

7.13 Long fallowing vs Response cropping at depth 140 for all trial dates. Random

walk with 97% missing data. Random walk precision fixed at 2241. See Ta-

ble 7.1. Point estimates & 95%CIs. . . . . . . . . . . . . . . . . . . . . . . . . 279

7.14 Non-parametric penalised spline smooths. (Fits for the contrasts at the 7 depths.) 280


8.1 Random walk of order one & 95% credible intervals at depth 100 cm. Fitted to

12 posterior contrast estimates at each time point. . . . . . . . . . . . . . . . . 294

8.2 Contaminated observational error: Random walk of order one & 95% credible

intervals at depth 100 cm. Fitted to 12 posterior contrast estimates at each time

point. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 295

B.1 Spatial random components: Day 1, Depth 20 cm. . . . . . . . . . . . . . . . . 334




B.5 Spatial random components: Day 1, Depth 100 cm. . . . . . . . . . . . . . . . 336










































































C.1 Long fallowing vs Response cropping at depth 100 for all trial days. Saturated

model. Summary of MCMC iterates from the full model for the contrast.

Estimates & 95% CIs. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 376

C.2 Long fallowing vs Response cropping at depth 100 cm for all trial days. Non-

parametric penalised spline smooth across dates. Estimates & 95% CIs. . . . . 376

C.3 Long fallowing vs Response cropping at depth 100 cm for all trial days. Random

Walk of order one. Estimates & 95% CIs. . . . . . . . . . . . . . . . . . . . . 377



mates & 95% CIs. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 377



C.6 Long fallowing vs Response cropping at depth 120 cm for all trial days. Random




mates & 95% CIs. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 379



C.9 Long fallowing vs Response cropping at depth 220 for all trial days. Random




mates & 95% CIs. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 380







model. summary of MCMC iterates from the full model. Estimates & 95% CIs. 382




Walk of order two. Estimates & 95% CIs. . . . . . . . . . . . . . . . . . . . . 383













C.22 100 cm: Long fallowing vs Response cropping with Prior 5 precisions applied

to the random walk model. Estimates & 95% CIs. . . . . . . . . . . . . . . . . 388














C.29 Square root of variances & 95% credible intervals at depth 100 cm. Unstruc-

tured: green with broader bars, spatial: blue with narrower bars. . . . . . . . . 392













C.36 Square root of unstructured variance: Days by Depth. Contour graph smooth. . 396

C.37 Square root of spatially structured variances: Days by Depth. Contour graph

smooth. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 396

C.38 ρ & 95% credible intervals. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 397

List of Tables

3.1 Settings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 120

3.2 Sensitivity of two nodes: Gastroenteritis & Endpoint Distribution . . . . . . . . 121

3.3 Model Comparisons: Marginal probabilities & 95% credible intervals . . . . . 121

3.4 Model comparisons for Gastroenteritis under various conditions . . . . . . . . 122

3.5 Expected subgroup sizes for BN (Model 2) . . . . . . . . . . . . . . . . . . . 132

4.1 Settings for constant parameters . . . . . . . . . . . . . . . . . . . . . . . . . 170

4.2 Summary statistics for p(infected) over groupings . . . . . . . . . . . . . . . . 171

4.3 Summary statistics for groupings: Group Initial Pathogen Numbers / Doses . . 172

5.1 Comparing spatial neighbourhood modelling. Treatment effects model is iden-

tical for all models (Orthogonal polynomial degree 8). Models have 15 spatial

variance components (σ2d), and one homogeneous variance component (τ2), ex-

cept where otherwise stated. . . . . . . . . . . . . . . . . . . . . . . . . . . . 199

5.2 Values of Moran’s I for each depth layer. A normal approximation is used for

testing significance. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 200

5.3 Comparing Fixed Effects modelling. Random components for all models are

given by 4 neighbour CAR with 15 depth variances (σ2d), and one homogeneous

variance component (τ2). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 201


5.4 Contrasts at nominal depths: Cubic radial bases model where depth is measured

with error. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 202

6.1 Summary of DICs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 231

6.2 Estimates for ρ in the spatial precision matrix . . . . . . . . . . . . . . . . . . 232

6.3 Differences in ρ across the five time periods. . . . . . . . . . . . . . . . . . . . 232

6.4 Slopes for segment 200 cm - 300 cm for each treatment . . . . . . . . . . . . . 233

6.5 Signs for contrasts with 95% credible intervals not including zero, for each date.

Positive (+) and negative (−) values indicated. . . . . . . . . . . . . . . . . . . 234

7.1 Various priors used for the precisions of the timeseries models of Method 2 . . 268

7.2 Summary of DICs for Contrast 1 (Long fallowing vs Response cropping) at

Depth 140 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 269

7.3 DICs for Long fallowing vs Response cropping: 1st order autoregressive models

vs simple regression model . . . . . . . . . . . . . . . . . . . . . . . . . . . . 269

7.4 DICs for Long fallowing vs Response cropping: random walk model compar-

isons, using Prior 2. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 270

7.5 Square root of the Signal to Noise ratio for the RW models . . . . . . . . . . . 270

7.6 R2, pD and DIC for the RW(1) weighted models using priors 3-5 . . . . . . . . 270

7.7 Root mean square predicted error for RW1 models under Priors 1 & 2 . . . . . 271

8.1 Comparison of some fits for the contrast Long fallowing vs Response cropping

at Depth 100. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 294

B.1 Differences in σ for depths from 20cm to 100cm . . . . . . . . . . . . . . . . 308

B.2 Differences in σ for depths from 120 cm to 200 cm . . . . . . . . . . . . . . . 309

B.3 Differences in σ for depths from 220 cm to 300 cm . . . . . . . . . . . . . . . 310

B.4 Differences in κ for depths from 20cm to 100cm . . . . . . . . . . . . . . . . . 311

B.5 Differences in κ for depths from 120 cm to 200 cm . . . . . . . . . . . . . . . 312


B.6 Differences in κ for depths from 220 cm to 300 cm . . . . . . . . . . . . . . . 313

B.7 Differences in slope from 200 cm - 300 cm for each treatment across days . . . 314



B.10 Differences in slopes for each treatment on day 1 . . . . . . . . . . . . . . . . 317





B.15 Slopes for segment 200 cm - 300 cm for each treatment . . . . . . . . . . . . . 322

B.16 Slopes for segment 200 cm - 300 cm for Groupings . . . . . . . . . . . . . . . 323

B.17 Differences in slopes for each group across days . . . . . . . . . . . . . . . . . 324

B.18 Contrasts compared between days . . . . . . . . . . . . . . . . . . . . . . . . 325



B.21 Contrasts (1) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 328

B.22 Contrasts (2) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 329

B.23 Contrasts (3) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 330

B.24 Contrasts (4) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 331

B.25 Contrasts (5) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 332

B.26 Contrasts (6) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 333

C.1 Contrast estimates for depth 100 cm: Long fallow cropping vs Response cropping . . . 399








Chapter 1

Introduction

1.1 Overall objectives of this research

While Bayes’ theorem has been known since its posthumous publication in 1763 [Bellhouse,

2004], its enthusiastic use in statistics is a modern phenomenon, based on the advent of fast

computers and clever algorithms. This explosion of Bayesian statistics has allowed a far more

natural way of analysing data.

Equally, because of the simplicity of the Gibbs sampler, we can conceptualise our data as a

sequence of probabilities conditioned on something being the case. This allows the specification

and solution of complex problems. As Jordan [2004] says in discussing graphical models, “What

is perhaps most distinctive about the graphical model approach is its naturalness in formulating

probabilistic models of complex phenomena in applied fields, while maintaining control over

the computational cost associated with these models.”

With the advent of fast computers and the development of Markov chain Monte-Carlo

methodology, the ability to integrate the complex integrals implied by the use of Bayes the-

orem has led to a flowering of Bayesian statistics, where probability statements are made in

terms of the probability of observing a parameter value given the data. This seems more natural

than finding the likelihood of the data given the parameters, a more classical mode of inference.


Additionally, the Gibbs sampler [Geman and Geman, 1984] has allowed complex models to

be expressed in terms of conditional probabilities, most of which are easy to envisage and ex-

press, and which often form a natural way of seeing the connections between various quantities.

A Gibbs sampler generates an instance of each variable (or set of variables) in turn, conditional

on the values of the other variables. The resulting sequence of samples forms a Markov chain

whose stationary distribution is the desired joint distribution, so that, once the chain has

converged, its draws form a (dependent) sample from that distribution. See, e.g., Gelman et al. [1995].
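As a minimal illustration of the updating scheme just described (not code from this thesis), the following Python sketch runs a Gibbs sampler for a standard bivariate normal with correlation rho, a toy target chosen because both full conditionals are known normal distributions. The function name, the value rho = 0.8, the run length and the burn-in are all illustrative assumptions.

```python
import math
import random


def gibbs_bivariate_normal(rho, n_iter=20000, burn_in=2000, seed=1):
    """Gibbs sampler for a standard bivariate normal with correlation rho.

    Full conditionals: x | y ~ N(rho * y, 1 - rho**2), and symmetrically
    y | x ~ N(rho * x, 1 - rho**2).
    """
    rng = random.Random(seed)
    cond_sd = math.sqrt(1.0 - rho ** 2)
    x, y = 0.0, 0.0  # arbitrary starting values
    draws = []
    for it in range(n_iter):
        x = rng.gauss(rho * y, cond_sd)  # update x given the current y
        y = rng.gauss(rho * x, cond_sd)  # update y given the new x
        if it >= burn_in:
            draws.append((x, y))
    return draws


draws = gibbs_bivariate_normal(rho=0.8)
n = len(draws)
mean_x = sum(x for x, _ in draws) / n
mean_y = sum(y for _, y in draws) / n
# Sample correlation of the retained draws, to compare with rho.
sxy = sum((x - mean_x) * (y - mean_y) for x, y in draws)
sxx = sum((x - mean_x) ** 2 for x, _ in draws)
syy = sum((y - mean_y) ** 2 for _, y in draws)
corr = sxy / math.sqrt(sxx * syy)
```

Successive draws are correlated, so the retained sample carries less information than the same number of independent draws would.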

The research in this thesis is based entirely on the possibility of setting up complex condi-

tional models to describe data and using Gibbs sampling to fit such models. As Dunson [2001]

says: “A major advantage of the Bayesian MCMC approach is its extreme flexibility. Using

MCMC techniques, it is straightforward to fit realistic models to complex data sets with mea-

surement error, censored or missing observations, multilevel or serial correlation structures, and

multiple endpoints. It is typically much more difficult to develop and justify the theoretical

properties of frequentist procedures for fitting such models.”

A standard criticism of Bayesian statistics is that it is subjective, requiring, as it does, prior

probabilities (prior beliefs). However, long ago, Box [1980] pointed out that any fitting of a

model requires some sort of prior belief, saying “the need for probabilities expressing prior

belief has often been thought of, not as a necessity for all scientific inference, but rather as a

feature peculiar to Bayesian inference. This seems to come from the curious idea that an outright

assumption does not count as a prior belief....The model is the prior in the wide sense that it is

a probability statement of all the assumptions currently to be tentatively entertained a priori.”

Hence, the objective of this research was to make a contribution to Bayesian statistics and

to use Bayesian methodologies in an applied setting.

In this thesis, we use two application areas to motivate our statistical and applied contribu-

tions within Bayesian statistics. Firstly we consider risk assessments for recycled water using

graphical models, and secondly we consider models for an agricultural cropping system, where

measurements are taken over four dimensions, the aim being both to answer the substantive


questions and to contribute to Bayesian methodologies.

For the risk assessments for recycled water, we use directed acyclic graphs (DAGs), firstly

as Bayesian nets (BNs), and secondly, very much in the terms of Jordan [2004], where they provide

computationally and conceptually simple ways of envisaging complex problems.

Answers for the cropping systems are found using spatial and spatio-temporal models where

spatial and temporal autocorrelations must be accounted for in any assessment of difference.

In the first case study (Chapter 3.2), we describe a method for building credible intervals for

the point estimates of queries in a Bayesian network, and use the method to describe more fully

the marginal and conditional probabilities and relative risks of diarrhoea arising from various

scenarios.

In the second case study (Chapter 4.2), we postulate a risk assessment framework for Salmonella

infections, and recast it, firstly as a graphical model, and secondly as a graphical model with

nodes for all the data on which the risk assessment is predicated together with nodes for the

parameters which describe those data. This simple modification allows all the experimental un-

certainty of the underlying data to be incorporated into the risk assessment. The second model

is shown to produce estimates very different from those of the first model which used the ‘plug-in’

estimates. Additionally, the dose-response data used in the risk assessment are used to form

a submodel within the complete risk model. In the submodel, the probability of infection and

its underlying parameters are calculated on the basis of an errors-in-variables model, which is

advocated as a more realistic use of the data. WinBUGS [Lunn et al., 2000], a program based

on directed acyclic graphs (DAGs), is used to implement the models.

The third case study considers an agricultural dataset, with the usual row/column framework

for the experimental plots of agricultural data. This dataset involves a third spatial dimension,

depth, and was collected over a five-year period with 61 days of moisture measurements at 15

depths in the soil. The first concern was to model a single day’s data, addressing the questions of

concern, while accounting for spatial correlation. This led to the paper of Chapter 5.2, in which

fixed models (with 45-135 terms) are fitted, together with complex random components, to give


more realistic estimates of the uncertainty associated with the contrasts of interest. Chapters 6.1

and 7.1 take a daily model for the agricultural data and apply it to (1) five days of data, and

(2) the full dataset, in a two-stage analysis. Chapters 6 and 7 utilise a new general purpose

software framework, pyMCMC [Strickland, 2010], to fit the models. This exploits the sparse

neighbourhood matrices typical of spatial data.

Chapter 5.2 introduces our conditional autoregressive (CAR) layered model, and fits many

differing fixed and spatial models using WinBUGS [Lunn et al., 2000], the purpose being to

determine a model which might describe all 61 days’ data.

Updating in WinBUGS (using the Gibbs sampler) is observation by observation, with conditional

probabilities defined on an observation by observation basis. Chapters 6-7 use the more

efficient framework of Rue and Held [2005], which allows the exploitation of the sparse ad-

jacency matrices of Gaussian Markov Random Fields (GMRFs) to block update the spatial

components. In particular, this exploits the fact that the precision matrix of multivariate Gaus-

sian models describes the independence structure of the various elements [Whittaker, 1990].

The adjacency structures used to describe spatial dependence thus give rise to simple graphical

structures describing the data, and to more efficient estimation.
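To indicate why a sparse precision matrix makes block updating cheap, the sketch below draws a whole vector from N(0, Q^{-1}) in one step by factorising Q = L L^T and solving L^T x = z with z ~ N(0, I), which is the standard Cholesky-based approach for GMRFs. It is only a toy under stated assumptions: a one-dimensional chain of six sites with a tridiagonal Q (2.5 on the diagonal, -1 between adjacent sites), whereas the spatial models of this thesis use grid neighbourhoods and other software. All function and variable names are hypothetical.

```python
import math
import random


def tridiagonal_cholesky(diag, off):
    """Cholesky factor L of a tridiagonal precision matrix Q = L L^T.

    diag[i] = Q[i][i] and off[i] = Q[i+1][i]. Returns the diagonal and
    sub-diagonal of L, which is itself lower bidiagonal.
    """
    n = len(diag)
    l_diag = [0.0] * n
    l_sub = [0.0] * (n - 1)
    l_diag[0] = math.sqrt(diag[0])
    for i in range(n - 1):
        l_sub[i] = off[i] / l_diag[i]
        l_diag[i + 1] = math.sqrt(diag[i + 1] - l_sub[i] ** 2)
    return l_diag, l_sub


def sample_gmrf(diag, off, rng):
    """Draw x ~ N(0, Q^{-1}) in one block by solving L^T x = z, z ~ N(0, I)."""
    l_diag, l_sub = tridiagonal_cholesky(diag, off)
    n = len(diag)
    z = [rng.gauss(0.0, 1.0) for _ in range(n)]
    x = [0.0] * n
    x[n - 1] = z[n - 1] / l_diag[n - 1]
    for i in range(n - 2, -1, -1):  # back substitution, O(n) work
        x[i] = (z[i] - l_sub[i] * x[i + 1]) / l_diag[i]
    return x, z


# Hypothetical precision matrix for a chain of 6 sites.
n_sites = 6
q_diag = [2.5] * n_sites
q_off = [-1.0] * (n_sites - 1)
x, z = sample_gmrf(q_diag, q_off, random.Random(42))
```

Because L inherits the band structure of Q, both the factorisation and the solve cost a number of operations proportional to the number of sites, rather than the cubic cost of working with a dense covariance matrix.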

1.2 Research Aims

The aim of this thesis was to contribute to Bayesian statistical methodology by contributing to

risk assessment statistical methodology, and to spatial and spatio-temporal methodology, and

then to apply the new methodologies to risk assessments for recycled water in Western Australia,

and to the assessment of differences between cropping systems over time and over three spatial

dimensions.

The specific objectives were

1. to develop a method for the calculation of credible intervals for the point estimates of

Bayesian networks;


2. to develop an appropriate DAG structure to allow the calculation of more credible credible

intervals for a risk assessment;

3. to build a model for a single day’s data from the agricultural dataset which satisfactorily captures the

complexities of the data;

4. to build a model for several days’ data, in order to consider how the full data might be

modelled;

5. to consider the full four dimensional dataset and the time-varying nature of the contrast

of interest.

1.3 Structure of Thesis

This thesis has been written as a series of papers included as Chapters 3 to 7, which have been

published by, or submitted to, journals, or which are in preparation, and which are here presented

in their entirety. Chapters 3 and 4 look at graphical models representing risks from re-used water,

loosely based on a Western Australian context. Both show methods for addressing uncertainty

more effectively via the use of graphical models. These two chapters address research objectives

(1) and (2), and have been published as papers. Chapters 5 to 7 address the uncertainty arising

from spatial correlations in three and four dimensional data, where the data address the problem

of salination of soils due to agricultural practices. Chapters 5 & 6 have been submitted to

journals. Chapter 7 describes work still to be crafted as a paper.

Chapter 2 comprises a Literature Review which covers Bayesian nets, graphical models, and

risk assessments for water, and provides the background and foundation for the methodological

component of the first two papers of the thesis, giving a more detailed basis than is available

in the papers. It includes a review of spatial correlation models and spatio-temporal models,

together with their use in an agricultural and three and four dimensional context, which again,

provides a more detailed background for the papers of Chapters 5-7.


Chapter 3 addresses the research aims by introducing a method for determining the uncer-

tainty associated with the point estimates in a Bayesian net, and uses this method in an applied

context to find credible intervals for marginal and conditional probabilities and relative risks for

various scenarios associated with the use of recycled water.

Chapter 4 advocates the inclusion of all experimental data on which a risk assessment may

be based, into a directed acyclic graph. This permits the estimation of the parameters simultaneously

with the risk assessment, and thereby automatically incorporates all the experimental uncertainty

into the risk assessment. Thus, this chapter takes a typical Quantitative Microbial Risk

Assessment (QMRA), recasts the flow diagram as a graphical model, and adds further nodes

to allow the estimation of all parameters required by the risk assessment. In addition, for one

set of parameters, an errors-in-variables model is included. Creating a single graphical model

using all the data from which plug-in estimates are found allows more realistic estimation of

uncertainty associated with the estimated risk. By adding Markov chain to Monte Carlo, this

chapter contributes to risk assessment methodology. Its applied contribution is to introduce an

errors-in-variables model for the estimation of dose-response parameters.

Chapter 5 takes a three-dimensional spatial dataset and, using WinBUGS [Lunn et al., 2000],

compares models for the soil moisture profiles along the depth dimension by cropping treatment

(the fixed part of the model) and models for the spatial autocorrelation structure. An errors-in-

covariates model is also considered, since it has the capacity to model the fact that measured

depth does not represent depth within the soil profile. We introduce the layered CAR model and

show that the spatial autocorrelations are best modelled when depth is separated from the hori-

zontal dimensions. Here, too, for the regression component of the model, an interval censored

errors-in-variables model is fitted. Using conditional autoregressive (CAR) models, which work

from the simple sparse precision matrix, rather than using kriging models which work via (typically)

dense covariance structures, complex fixed effects models may be fitted, even within the

inefficient observation by observation updating of WinBUGS.

Chapters 6 and 7 extend the modelling of Chapter 5 into the time domain, and use the sparse


graphical structure of the adjacency matrices to estimate the models more efficiently. Chapter 6

is an exploratory foray into the field of spatio-temporal modelling in the context of agricultural

data, and attempts to see what might remain constant over time. It also presents a block updating

Gibbs sampler for the problem. Chapter 7 fits a complex model to the full agricultural dataset.

The model is complex in its fixed part but more particularly in its error structure where CAR

spatial autocorrelation models are used for each day and each depth (the layered CAR model)

with a first order adjacency structure defined across the horizontal layer. The purpose of the

model was to describe a contrast between treatments over time. The modelling methods used

were both a one- and a two-stage analysis. In the second stage analysis, contrast estimates from

the first stage are explored using various time series models. This chapter also shows that for

random walk models of order one, where there is just one observation for two error sources, the

model estimates may be overly sensitive to the priors for the variances.
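For orientation, a random walk of order one with observation error is commonly written in the following generic form. The notation is an assumption made here for illustration, and the parameterisation used in Chapter 7 may differ.

```latex
\begin{align*}
y_t   &= \mu_t + \varepsilon_t, & \varepsilon_t &\sim N(0, \sigma^2_{\varepsilon}), \\
\mu_t &= \mu_{t-1} + \eta_t,    & \eta_t        &\sim N(0, \sigma^2_{\eta}), \qquad t = 2, \dots, T.
\end{align*}
```

Each time point contributes a single observation y_t but two error terms, the observation error and the random walk innovation, so the data may carry limited information for separating the two variances, which is consistent with the sensitivity to their priors noted above.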

An overview of the research, together with some issues which arise, is provided in Chapter

8.

1.3.1 Case study 1a: Bayesian network

Starting from a conceptual model which represented the factors and pathways by which recycled

water may pose a risk of contracting gastroenteritis, a graphical model (Bayesian net) was cre-

ated, which was quantified by an expert. The contribution to Bayesian net methodology is that

all model predictions, whether risk or relative risk estimates, are expressed as credible intervals,

instead of simple point estimates.

1.3.2 Case study 1b: QMRA

Quantitative Microbial Risk Assessments (QMRA) are the method of choice for the estimation

of health risks from pathogens. A typical QMRA is considered, and rather than working from

a set of plug-in parameters, we show how to estimate all such parameters contemporaneously

within the risk assessment, thereby incorporating all the parameter uncertainty arising from


the experiments from which these parameters are estimated. The method is illustrated by a

case study that involves incorporating three disparate datasets into an MCMC graphical model

framework. The contribution here is to recognise that any and all primary data underlying a

risk assessment should be incorporated into a graphical model to estimate risk, and, equally

importantly, that the dose-response model should be fitted as an errors-in-variables model.

1.3.3 Case study: Field trial data

The viability of rainfed grain cropping on the Liverpool Plains (New South Wales, south eastern

Australia) is threatened by salination of land and water resources. Salination is caused by exces-

sive deep drainage below the plant root zone which mobilises sometimes vast sub-soil stores of

salt deposited at the time of soil formation. Deep drainage occurs when rain infiltrates already

wet soil that has insufficient capacity to store the additional water. This excess saline water may

produce water logging and shallow saline water tables or may discharge at lower points in the

landscape or into surface- or ground-waters [Broughton, 1994]. When saline ground waters en-

croach on the crop root zone, the salt kills germinating crops or reduces yields depending on salt

concentrations and rainfall [Daniells et al., 2001]. This excess water is usually due to a combination

of above average rainfall falling onto land farmed using long fallow cropping practices (i.e.

the land is kept as bare fallow for about 2/3 of the time). Although long-fallow cropping usually

results in good grain yields for each crop, average yields over time are generally less than yields

from more intensive, but somewhat more risky systems. To overcome both the problems of

excess water in the landscape under long fallow cropping, and the risk of poor crop yields due

to insufficient water supply between successive crops where cropping is frequent, a practice of

planting a crop, appropriate for the time of year, crop health and economic considerations, in

response to soil water content (opportunity or response cropping) is being increasingly adopted

by farmers.

This problem was addressed by running a cropping experiment which compared essentially

three types of cropping. The primary question for the scientists was whether response cropping


gives lower moisture values∗ both at the intermediate and greater depths, in comparison with

long fallowing, and whether this is sustained over different stages of the cropping cycle. Sub-

sidiary questions addressed here are the comparison of all cropping treatments with all pasture

treatments, and the comparison of the lucerne pasture mixtures with the native grass pasture.

Figure 1.1 shows the layout of the 9 treatments described in Sections 5.4.1 and 7.3 of Chapters

5 & 7, and Figure 1.2 shows the cycles of crops for treatments 1-6 throughout the five year

experiment.

The complexity of the data motivated the use of Bayesian methods to account for spatial

correlation and to answer the substantive question as to which cropping system was the most

viable form of agriculture.

This led to two submitted papers, the first of which explored possible models for describing

a single day’s data. This paper explored possible fixed models for the treatment curves with

depth, together with possible neighbourhood models for a CAR prior description of spatial

autocorrelation, and AR models in the horizontal directions. A major conclusion was that for

three dimensional data where an observation may be seen to have up to 26 immediate neighbours

in three dimensional space, there are immense advantages in not using depth neighbours, instead

allowing neighbours to be defined only at the same depth. This permits the possibility of having

differing spatial and non-spatial variances for each depth, a possibility which was found to

better describe the data. It also obviates the problem that neighbours in the depth dimension are

markedly closer to one another than neighbours across the horizontal layer, which gives rise to

the problem of choosing suitable weights. We call this model the CAR layered model.
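The contrast between the two neighbourhood definitions can be sketched in a few lines of Python. The grid below is hypothetical (4 rows by 5 columns of plots, with the 15 depths of the trial), and the function names are illustrative only. The layered definition uses first-order (four-neighbour) adjacency within a depth layer, while the full three-dimensional definition takes every site one step away in any direction.

```python
from itertools import product


def in_grid(site, shape):
    """True if every coordinate of site lies inside the grid."""
    return all(0 <= s < n for s, n in zip(site, shape))


def _neighbours(shape, steps):
    """Map each site to the in-grid sites reached by the given steps."""
    sites = product(*(range(n) for n in shape))
    return {
        site: [
            nb
            for nb in (tuple(a + b for a, b in zip(site, step)) for step in steps)
            if in_grid(nb, shape)
        ]
        for site in sites
    }


def layered_neighbours(shape):
    """First-order (4-neighbour) adjacency within each depth layer only."""
    steps = [(-1, 0, 0), (1, 0, 0), (0, -1, 0), (0, 1, 0)]
    return _neighbours(shape, steps)


def cube_neighbours(shape):
    """Every site one step away in any direction, including depth."""
    steps = [s for s in product((-1, 0, 1), repeat=3) if s != (0, 0, 0)]
    return _neighbours(shape, steps)


# Hypothetical layout: (rows, columns, depths).
shape = (4, 5, 15)
layered = layered_neighbours(shape)
cube = cube_neighbours(shape)

max_layered = max(len(v) for v in layered.values())  # interior site, one layer
max_cube = max(len(v) for v in cube.values())        # interior site, 3-D
```

Since no layered neighbour pair crosses depths, each depth can be given its own variance parameters without any need to weight vertical against horizontal distances.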

The second agricultural paper took the lessons learned from the day’s modelling exercise and

considered five days of data. In doing this, the modelling framework of WinBUGS, exploited

so successfully in the earlier paper, was found inadequate to this larger data analysis task, since

models for these five days failed to converge to sensible estimates. An alternative approach was

to develop software specific to the task, and use block updating to eliminate the problem of very

∗Moisture in the soil is measured by the log(neutron count ratio), a surrogate for moisture. For details see Ringrose-Voase et al. [2003].


highly correlated MCMC chains. Working with Chris Strickland, modules for the CAR models

were built and incorporated into an MCMC modelling framework written in Python which uses

the libraries of Anderson et al. [1999], Blackford et al. [2002], Lawson et al. [1979], NumPy

Community [2010], together with methods based on Krylov subspace methods from Simpson

et al. [2008]. Rather than the improper CAR models of the earlier paper, the CAR model used

was the CAR proper model of Gelfand and Vounatsou [2003], but again modelled as a CAR

layered model.

The last agricultural paper (Chapter 7.1, in draft) analyses the full agricultural dataset,

basing the analysis on the work described in Chapter 6. Given the differences in spatial and

temporal variances, it made sense to analyse the dataset as a series of daily models, rather than to

assume, a priori, some time-varying data structure. Using the modelling framework developed in

Chapter 6, this chapter fits a complex model to the treatments, and models the contrast estimates

in a one- and two-stage analysis to answer the question as to whether response cropping results

in less moist soils than long fallow cropping. This chapter also considers difficulties associated

with random walk models of order one, which posed problems for model choice.


1.3.4 Agricultural data: Crop cycles and treatment layout

These figures could not be included in the papers of Chapter 5 - Chapter 7, but are included

here to complete the description of the agricultural data in those chapters. Figure 1.2 shows the

crops sown over the five year period of the trial, together with their treatment identification

(1-6). Figure 1.1 shows the layout of the 9 treatments in the field.


Figure 1.1 Site treatments for the agricultural data of chapters 5-7. Details of the 9 treatments are given in Section 5.4.1 and again in Chapter 7 in Section 7.3 in the description of the four-dimensional dataset.


Figure 1.2 Crop cycles for the cropping treatments (Treatments 1-6). The vertical line indicates the date for the data analysed in Chapter 5. The three dimensional data are described in Section 5.4.1 and again in Section 7.3 of Chapter 7 as a four-dimensional dataset.


Bibliography

Anderson, E., Z. Bai, C. Bischof, S. Blackford, J. Demmel, J. Dongarra, J. Du Croz, A. Greenbaum,

S. Hammarling, A. McKenney, and D. Sorensen (1999). LAPACK Users’ Guide: Third

Edition (22 Aug 1999 ed.). Philadelphia: Society for Industrial and Applied Mathematics

(SIAM).

Bellhouse, D. R. (2004). The Reverend Thomas Bayes, FRS: A biography to celebrate the

tercentenary of his birth. Statistical Science 19(1), 3–43.

Blackford, L., J. Demmel, J. Dongarra, I. Duff, S. Hammarling, G. Henry, M. Heroux, L. Kauf-

man, A. Lumsdaine, and A. Petitet (2002). An updated set of basic linear algebra subprograms

(BLAS). ACM Transactions on Mathematical Software (TOMS) 28(2), 135–151.

Box, G. E. P. (1980). Sampling and Bayes’ inference in scientific modelling and robustness.

Journal of the Royal Statistical Society. Series A (General) 143(4), 383–430.

Broughton, A. (1994). Mooki River Catchment hydrogeological investigation and dryland salin-

ity studies - Liverpool Plains, TS94.026. Technical report, New South Wales Department of

Water Resources.

Daniells, I. G., J. F. Holland, R. R. Young, C. L. Alston, and A. L. Bernardi (2001). Relationship

between yield of grain sorghum (Sorghum bicolor) and soil salinity under field conditions.

Australian Journal of Experimental Agriculture 41, 211–217.

Dunson, D. (2001). Commentary: practical advantages of Bayesian analysis of epidemiologic

data. American Journal of Epidemiology 153(12), 1222–1226.

Gelfand, A. E. and P. Vounatsou (2003). Proper multivariate conditional autoregressive models

for spatial data analysis. Biostatistics 4(1), 11–25.


Gelman, A., J. B. Carlin, H. S. Stern, and D. B. Rubin (1995). Bayesian data analysis. Texts in

statistical science. London: Chapman & Hall.

Geman, S. and D. Geman (1984). Stochastic relaxation, Gibbs distributions and the Bayesian

restoration of images. IEEE Transactions on Pattern Analysis and Machine Intelligence 6,

721–741.

Jordan, M. I. (2004). Graphical models. Statistical Science 19(1), 140–155.

Lawson, C. L., R. J. Hanson, D. R. Kincaid, and F. T. Krogh (1979). Basic Linear Algebra Subprograms

for Fortran usage. ACM Trans. Math. Software 5(3), 324–325.



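Lunn, D. J., A. Thomas, N. Best, and D. Spiegelhalter (2000). WinBUGS - a Bayesian modelling

framework: concepts, structure, and extensibility. Statistics and Computing 10(4),

325–337.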

NumPy Community (2010). NumPy Reference Manual: Release

1.5.0.dev8106. Available online: http://docs.scipy.org/doc/. Accessed: February

9, 2010.

Ringrose-Voase, A., R. R. Young, Z. Payder, N. Huth, A. Bernardi, H. Cresswell, B. Keating,

J. Scott, M. Stauffacher, R. Banks, J. Holland, R. Johnston, T. Green, L. Gregory, I. Daniells,

R. Farquharson, R. Drinkwater, S. Heidenreich, and S. Donaldson (2003). Deep drainage

under different land uses in the Liverpool Plains Catchment. Technical Report 3, Agricultural

Resource Management Report Series, NSW Agriculture Orange.

Rue, H. and L. Held (2005). Gaussian Markov random fields : theory and applications. Boca

Raton: Chapman & Hall/CRC.

Simpson, D. P., I. W. Turner, and A. N. Pettitt (2008). Fast sampling from a Gaussian Markov

random field using Krylov subspace approaches. QUT Eprints 14376 (Brisbane), 1–17. Avail-

able online: http://eprints.qut.edu.au.


Strickland, C. (2010). pyMCMC: a statistical package for Bayesian MCMC analysis. Journal

of Computational and Graphical Statistics, 1–46. submitted August, 2010.

Whittaker, J. (1990). Graphical Models in Multivariate Statistics. Chichester (England); New

York: Wiley.

Chapter 2

Literature Review

The literature review is divided into a review of two areas (1) Bayesian networks & Risk As-

sessment, and (2) Spatio-temporal modelling.

Within the Bayesian network literature we consider some theoretical literature in the pursuit

of credible intervals for point estimates. In the risk assessment section, we also demonstrate a

search for data for use in a risk assessment for recycled water. (However, having hunted, we

preferred not to undertake the planned risk assessment.)

2.1 Bayesian networks

2.1.1 Graphical models and Bayesian networks

A fundamental concept of graphical models is that of conditional independence. Two random

variables x and y are independent iff p(x, y) = p(x)p(y). This is written as x ⊥ y. Two variables x

and y are conditionally independent given z iff p(x, y|z) = p(x|z)p(y|z). This is written as x ⊥ y|z.

Note that x and y may be marginally dependent despite being conditionally independent given

z. This relationship of conditional independence is captured in the undirected graphical model

of Figure 2.1. (Explanation adapted from Rue and Held [2005].)

Figure 2.1 An undirected graph for which x ⊥ y|z.

For multivariate normal data, conditional independence of x and y given the remaining variables corresponds to a zero in the corresponding off-diagonal entry of the precision matrix [Rue and Held, 2005, Whittaker, 1990]. Undirected graphical models, as descriptions of conditional independence, have been used to determine data models by Darroch et al. [1980], Edwards [1995], Wermuth and Cox [1998].
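As a small numerical illustration of the link between the precision matrix and conditional independence, the following Python sketch builds a precision matrix for three Gaussian variables ordered (x, z, y) with a zero in the (x, y) position, and inverts it exactly. The matrix values and the helper name `invert` are hypothetical choices made for this example only; they are not taken from the cited sources.

```python
from fractions import Fraction

def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination (exact arithmetic)."""
    n = len(matrix)
    # Augment the matrix with the identity matrix.
    aug = [[Fraction(v) for v in row] + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        # Find a row with a nonzero pivot and swap it into place.
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col and aug[r][col] != 0:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

# Hypothetical precision matrix, variables ordered (x, z, y).
# The zero in position (x, y) encodes x ⊥ y | z.
Q = [[2, -1, 0],
     [-1, 2, -1],
     [0, -1, 2]]

Sigma = invert(Q)        # covariance matrix implied by Q
cov_xy = Sigma[0][2]     # marginal covariance of x and y
```

Here `cov_xy` is nonzero although `Q[0][2]` is zero, so x and y are marginally dependent while being conditionally independent given z.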

A Bayesian network, sometimes called a probabilistic network, is a directed acyclic graph

(DAG) that represents a set of random variables and their conditional dependencies.

Figure 2.2 illustrates four differing DAGs that share the undirected structure of Figure 2.1. The conditional independence of x and y given z, shown in Figure 2.1, holds for the DAGs in which z is not a common child of x and y. Where x and y are both parents of z, they are marginally independent but, in general, dependent given z. In graph theory, x and y are called nodes and represent random

variables x and y. The topmost graph of Figure 2.2 may be read as ‘x explains z, which explains

y’. Or more causal language may be used: the graph second from the bottom may be read as ‘x

and y cause z’. In the bottommost diagram of Figure 2.2, the directed edge from z to x models

a causal relationship between z and x. When variables are connected in this way, the variable

from which the edge originates is called the parent and the variable to which the edge leads is

called the child. If V represents the set of n variables (x1, x2, ..., xn) of a network, then the joint probability of the network is

\[ p(V) = p(x_1, x_2, \ldots, x_n) = \prod_{i=1}^{n} p(x_i \mid \mathrm{Parents}(x_i)). \]

Figure 2.2 Four differing directed acyclic graphs (DAGs) with the same (undirected) structure as the undirected graph of Figure 2.1.

Thus, in Figure 2.2, for the topmost and bottommost figures, we have

p(x, y, z) = p(x)p(z|x)p(y|z),

p(x, y, z) = p(z)p(x|z)p(y|z).

These equations express the conditional independence properties inherent in the structure of

these two DAGs. An excellent reference which gives the theoretical background of Bayesian

net software, proofs and examples is Cowell et al. [2001].
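The first of these factorisations can be checked numerically. The sketch below assumes binary variables and entirely hypothetical conditional probability tables for the chain x → z → y; it builds the joint distribution from p(x)p(z|x)p(y|z) and verifies that p(x, y|z) = p(x|z)p(y|z) for every configuration. The function names are illustrative only.

```python
from itertools import product

# Hypothetical conditional probability tables for binary variables (values 0/1)
# in the chain x -> z -> y.
p_x = {0: 0.7, 1: 0.3}
p_z_given_x = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.2, 1: 0.8}}   # p_z_given_x[x][z]
p_y_given_z = {0: {0: 0.6, 1: 0.4}, 1: {0: 0.1, 1: 0.9}}   # p_y_given_z[z][y]

def joint(x, z, y):
    """Joint probability from the factorisation p(x) p(z|x) p(y|z)."""
    return p_x[x] * p_z_given_x[x][z] * p_y_given_z[z][y]

# The joint probabilities must sum to one over all eight configurations.
total = sum(joint(x, z, y) for x, z, y in product((0, 1), repeat=3))

def cond_indep_given_z(tol=1e-12):
    """Check p(x, y | z) = p(x | z) p(y | z) for every configuration."""
    for z in (0, 1):
        p_z = sum(joint(x, z, y) for x, y in product((0, 1), repeat=2))
        for x, y in product((0, 1), repeat=2):
            p_xy = joint(x, z, y) / p_z
            p_x_z = sum(joint(x, z, yy) for yy in (0, 1)) / p_z
            p_y_z = sum(joint(xx, z, y) for xx in (0, 1)) / p_z
            if abs(p_xy - p_x_z * p_y_z) > tol:
                return False
    return True
```

Calling `cond_indep_given_z()` returns `True` for any choice of tables in this chain, since the factorisation itself forces the conditional independence.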

In another useful reference, Korb and Nicholson [2004] emphasize the notion of ‘knowledge

engineering’, where Bayesian networks are used to encode structural relationships elicited from

experts. In addition to the theoretical bases for Bayesian net software, they describe techniques

for eliciting and verifying the resultant expert systems.

Bayesian nets have become a standard tool in artificial intelligence [Cowell et al., 2001],

and in the building of expert systems, which involves several stages. The first stage involves

‘natural judgements of relevance or irrelevance...’ with ‘missing edges in the graph encod(ing)

the irrelevance properties.’ [Cowell et al., 2001]. Having elicited a structure from experts,

the graph must be quantified by eliciting sufficient conditional probabilities to specify the joint

distribution.

Some important papers in the development of the theory of Bayesian nets are Andersen

et al. [1989], Boerlage [1992], Cowell and Dawid [1992], Jensen [2001], Jensen et al. [1995],

Lauritzen and Spiegelhalter [1988].

Pearl [1988] introduced the notion of d-separation (illustrated in the bottommost network of Figure 2.2), which was important in developing the first algorithms for Bayesian networks and their use for artificial intelligence. The seminal paper by Lauritzen and Spiegelhalter [1988] developed efficient algorithms for transfer of evidence within causal networks. Other key

contributors and contributions to the development of the theory and the software to implement

Bayesian nets were Cowell and Dawid [1992], Dawid [1992], Dawid et al. [1995]. Spiegel-

halter et al. [1993] show how parts of a network are updated with evidence using a Dirichlet

prior, while Jensen [1994] discusses how the various threads are drawn together to implement

Hugin [Hugin Expert A/S, 2007]. The Hugin webpage www.hugin.com/developer/Publications/ [Hugin Expert A/S, 2007] also lists various papers which underlie its implementation.

Laskey [1995] looks at a measure of sensitivity for nodes via differentiation. Castillo et al.

[1997], Castillo et al. [1997] also propose measures for sensitivity analysis of Bayesian nets.

Jensen et al. [1991] consider measures of data conflict.

In addition to their use as expert systems, Bayesian nets may be derived from data which

allows elucidation of relationships in complex multivariate data. This can be thought of as being

a development of the work of Whittaker [1990], where the structure of the precision matrix for

multivariate normal models is used to determine the graphical structure of the model. This work

was continued and extended to categorical data in the software MIM [Edwards, 1995]. In the

context of Bayesian nets and directed as opposed to undirected models, Steck [2001] developed

the necessary path condition (NPC) algorithm to determine directionality, when using data to

‘recover’ the DAG structure of a Bayesian net. The work by Boerlage [1992] on ‘link strength’

also develops the data mining capacity of Bayesian nets. Lauritzen [1995] develops an EM

algorithm for fitting data to a predetermined Bayesian net. Neapolitan and Jiang [2007] further

detail types of learning for BNs. However, while this is an extremely valuable area of research

for finding meaningful parsimonious models to explain multivariate data, this is not a direction

of my research.

There is some work relating to the uncertainty associated with nets implied by data, see,

for example, the chapter on structural learning in Cowell et al. [2001]. Van Allen et al. [2001,


2008] demonstrate a method whereby uncertainty may be propagated through a network. They

show that under certain conditions a query response is asymptotically Gaussian and provide

its mean and asymptotic variance. However, our work differs somewhat from theirs: we elicit

credible intervals for all the probabilities of the conditional probability tables, and represent this

uncertainty by using beta distributed priors, and, in general, the probabilities we use are close

to 0 and 1. Van Allen et al. [2008] populate their nets with Dirichlet mixtures generated by the

conditional probability tables, where the majority of the probabilities are not close to 0 or 1.
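To illustrate the general idea of attaching a credible interval to a network query, the following is a minimal Monte Carlo sketch. It is not the method developed in this thesis, nor that of Van Allen et al.; the two-node structure (treatment failure → illness), the beta parameters and the function name `simulate_query` are all assumptions made for the example.

```python
import random

def simulate_query(n_draws=20000, seed=1):
    """Sample a query probability under beta-distributed table entries."""
    rng = random.Random(seed)
    draws = []
    for _ in range(n_draws):
        # Hypothetical beta priors; each stands in for an elicited probability
        # together with its credible interval (parameters are illustrative only).
        p_fail = rng.betavariate(2, 198)        # P(treatment failure), near 0
        p_ill_fail = rng.betavariate(30, 70)    # P(illness | failure)
        p_ill_ok = rng.betavariate(1, 999)      # P(illness | no failure), near 0
        # Marginalise over the parent node to obtain the query P(illness).
        draws.append(p_fail * p_ill_fail + (1 - p_fail) * p_ill_ok)
    draws.sort()
    lower = draws[int(0.025 * n_draws)]
    median = draws[n_draws // 2]
    upper = draws[int(0.975 * n_draws) - 1]
    return lower, median, upper

lower, median, upper = simulate_query()
```

The pair (`lower`, `upper`) is an equal-tailed 95% interval for the query, obtained purely by simulation rather than by an asymptotic Gaussian approximation.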


2.1.2 Bayesian networks: applications

Bayesian networks are used very widely to envisage the consequences and risks in complex

systems. Thus, for example, Matias et al. [2007] use a BN to evaluate the environmental impact of a mine, Martin et al. [2009] use BNs to look at accidents from falls, and Rassmussen [1995] uses them to relate blood type to cattle parentage.

Within the environmental and water research literature Bayesian nets have also been widely

used. See, for example, Hamilton et al. [2007], Nicholson et al. [2003], Pike [2004], Pollino

and Hart [2005a,b], Pollino et al. [2007], Varis [1995, 1997, 1998], Kennett et al. [2001]. Most of

these examples integrate expert opinion and data to consider a problem. Pike [2004] is partic-

ularly relevant to the concerns of this thesis, in that he builds a net to describe water treatment

plant failures on the Susquehanna River, using treatment plant failure data.

Albert et al. [2008] use directed acyclic graphs, via WinBUGS [Lunn et al., 2000], data from

food diaries, from sampling chicken flocks for contamination, and broiler production together

with expert opinion to quantify the risk of contracting campylobacteriosis as a result of broiler

contamination. Their BN model integrates all these disparate data sources to produce a risk

estimate.

Influence diagrams integrated with Monte Carlo simulation were used by Casman et al.

[2000] to look at how long it might take to stem an outbreak of cryptosporidiosis taking into

account time for authorities (both health and water) to react and the likelihood and proportion

of the public’s compliance with advisories.

Object oriented Bayesian networks have been used for process control [Weidl et al., 2003,

2005], in a paper pulp plant. In this application, hidden Markov models and ‘soft’ Bayes classification were used to detect and anticipate particular types of failure in the plant.

Barker et al. [2002] build a BN for a Clostridium botulinum growth model combining contamination processes, thermal death kinetics for spores, germination and growth of cells, toxin

production and patterns of consumer behaviour, and comment that it ‘supports, and prioritises,

decisions and actions that minimise the chances and extent of detrimental events and maximise

opportunities for awareness and control’.

In the risk assessment of Chapter 3.2, the Bayesian net built fits into the knowledge engi-

neering framework of Korb and Nicholson [2004], where a net is built to describe and synthesize

qualitative expert knowledge, quantify it meaningfully and look at the implications of the model

built. Marcot [2006], Marcot et al. [2006] emphasize the need to use such implications to refine

the model. The net of Chapter 3.2 is built using expert knowledge. However, its main interest is

in the development of credible intervals for the risk ratios of interest.

2.2 Risk Assessments for Pathogens

In this section I explore the ‘how-to’ as described in influential books, as prescribed by the World

Health Organization, and by peak Australian water bodies, as well as the ‘how-to’ exemplified

and extended by various papers. I also look at the data mandated to be collected by regulation

(or law) in Western Australia, for the protection of the public and the environment. And finally,

I explore data which might be used for a risk assessment of a pathogen for recycled water in

Western Australia.

2.2.1 Risk Assessment methodologies

The literature for risk assessments and risk assessment methodology for drinking and receiving

waters is extensive. With respect to risks from pathogens, Haas and Eisenberg [2001] contribute a chapter to Fewtrell and Bartram [2001] (the WHO guidelines for water quality standards and health), which looks at two types of quantitative microbial risk assessment (QMRA), static and dynamic, and recommends establishing which of the two is appropriate to the pathogen and situation under consideration. The WHO guidelines [Fewtrell and Bartram,


2001] include a chapter on what constitutes an acceptable or tolerable risk [Hunter and Fewtrell,

2001]. And of course, there are many books on risk assessment, see, e.g. Burgman [2005], who

looks more generally at ecological risk modelling.

Looking at risk assessments associated with water, there are many papers, reports and guide-

lines, some of which mandate how a risk assessment should be undertaken. In the last decade

or more, the interest has shifted from chemicals with their lifetime, short-term and foetal risks,

to risks from pathogens and the potential for water-borne epidemics. ‘How-to’ manuals include

Roser et al. [2006], Petterson and Ashbolt [2006], Ashbolt et al. [2005]. (The last of these

proposes quantitative microbial risk assessments via a GUI front end programmed in Analytica

[Lumina Decision Systems, 2004].) An Australian example of a risk assessment associated with

wastewater is Roser et al. [2006].

The literature discussed thus far works within the paradigm of Haas et al. [1999], who sum-

marise much prior work, and give many tables of constants and rates for use in risk assessments.

An alternative approach is to model the risks in terms of exposure, latent infection, symp-

tomatic infection, immunity and (again) exposure. This draws on the work of Anderson and

May [1991] and earlier researchers analysing epidemics and vector-borne diseases. This model

is termed the ‘dynamic’ model by Olivieri et al. [2007] and is based on sets of partial differential

equations, whereby individuals progress through a Markov chain model in a sequence of expo-

sure, latent infection, symptomatic infection, immunity and re-exposure stages. This model is

sometimes called a compartmental model and is illustrated as Figure 2.3, together with its re-

lated partial differential equations 2.1. The model can be used to simulate risk [Eisenberg et al.,

1996], or to analyse outbreak data [Eisenberg et al., 1998]. (Further details of this model are given in an addendum to this literature review, Section 2.4.) Eisenberg et al. [1998] modelled

daily symptom onset, that is, the time scale was extremely fine and unlikely to be replicated in

any available Australian data, where monthly incidence data is probably the best one can hope

for. Thus, in most cases, if we were to analyse data using such a model, we would not need the

many latent infection stages used by Brookhart et al. [2002], Eisenberg et al. [1998]. Conversely, if we wished to use such a model as the basis of a risk simulation, the number of rates that must be known militates against its use.

The ‘static’ model, currently the most used in microbial risk assessments, requires a description

of the dose-response curve for the pathogen, together with a basis for calculating the likely

dose of the pathogen in the water being considered, while a ‘dynamic’ model run as a risk

simulation requires estimates of duration of immunity, pathogen shedding rates, latent infection

rates and more [Eisenberg et al., 1998]. Thus, both kinds of risk assessment have considerable

data requirements.
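For concreteness, two dose-response forms widely used in the QMRA literature (see, e.g., Haas et al. [1999]) are the exponential model and the approximate beta-Poisson model. The symbols below are notation introduced here for illustration: d is the mean ingested dose, and r, α and β are pathogen-specific parameters that must be estimated from feeding studies or outbreak data.

```latex
\[
  P_{\mathrm{inf}}(d) = 1 - \exp(-r\,d)
  \qquad \text{(exponential model)},
\]
\[
  P_{\mathrm{inf}}(d) \approx 1 - \left(1 + \frac{d}{\beta}\right)^{-\alpha}
  \qquad \text{(approximate beta-Poisson model)}.
\]
```

In both forms the probability of infection is zero at zero dose and increases monotonically towards one as the dose grows.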

In a related discussion about risks from stormwater, the Water Environment Research Foun-

dation et al. [2007] commented that the data available to parameterize MRA models were not

robust, and this was our feeling with respect to running a simulation study for wastewater risks

based on the various constants available.

Outbreak reports are of considerable use in pinpointing the causes of outbreaks and hence

potentially extremely useful for risk assessment. Papers and reports listing outbreaks through-

out the developed world are Hrudey et al. [2002], Karanis et al. [2007], Nadebaum et al. [2004],

Rizak and Hrudey [2007], Sinclair [2005]. The long list of incidents cited in Nadebaum et al.

[2004] shows that plant failures, human stupidity, extreme weather, and loss of corporate knowledge have led to catastrophic water-borne disease outbreaks via the drinking water supply in

Europe and the US. Thus, plant failures, rather than the perfect plant operation postulated in

many risk assessments should be an important component of any water-borne disease risk as-

sessment. Khan [2010], too, makes the point that ‘risks to public health are determined by the

performance reliability of the system’. Rizak and Hrudey [2007] in looking at current sampling

methodologies for clean water discuss the problem of “intermittent, event-driven contamination

or system failure”. Event-driven contamination is also discussed by Signor and Ashbolt [2006]

who examine significant rainfall events in the Sydney water catchment and find that they give

rise to potentially large numbers of Cryptosporidium oocysts in the water supply. They too,

remark with Rizak and Hrudey [2007] that routine monitoring may well be both misinforma-


tive and uninformative. Woo and Vicente [2003] examine the water-borne disease outbreaks at

Walkerton and North Battleford using the framework of Rasmussen [1997] and conclude that

to minimise risk, the players in the complex sociotechnical system of clean water supply must

be identified, the objectives at each level be explicit, systematic feedback between all levels be

required, and that the players at each level be both competent and committed to safety. Westrell

et al. [2003] address the issue of plant reliability in examining risks for a water treatment plant

in Sweden. Clearly, this should be a component of any risk assessment. The pity of it is, that

we typically undertake risk assessments before embarking on new projects, but not when tech-

nology is old, except in response to an identified disaster. However, I ignore this component

in a risk assessment for wastewater: reliability data for wastewater treatment plants in WA is

limited to the data available in the annual audit reports [Water Corporation, 2010], and relates

to overflows, nutrients and odour issues.

2.2.2 Data for a risk assessment

To undertake a risk assessment on the lines laid down by Haas et al. [1999] or Natural Resource

Management Ministerial Council et al. [2006] one needs

1. A description of the microbe numbers in either the waste water, or in the final treated

water.

2. Data or summarised log reductions (decimal elimination capacity, DEC) for each stage

of the water treatment. See, e.g. Hijnen et al. [2007, 2005, 2004].

3. Data or a description of the effect of other pathogen reducing processes such as die-off in

sunlight, e.g., Sidhu et al. [2008].

4. An amount of crop/water ingested by a person, under the risk scenario. One may use

survey data (see, e.g., Mons et al. [2007]) if available, or use choices made by other researchers, such as those of Tanaka et al. [1998].


5. Data or a description for the dose of the pathogen and the probability of infection or other

outcome.
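The five ingredients listed above can be chained in a simple Monte Carlo calculation. The sketch below is illustrative only: every distribution and parameter value, and the function name `simulate_infection_risk`, are assumptions made for the example, and the exponential dose-response form is one common choice rather than a recommendation.

```python
import math
import random

def simulate_infection_risk(n_draws=10000, seed=42):
    """Monte Carlo sketch of a static QMRA chain; all numbers are hypothetical."""
    rng = random.Random(seed)
    risks = []
    for _ in range(n_draws):
        # 1. Pathogen concentration in raw wastewater (organisms per litre),
        #    lognormal with a median of 1e4.
        conc = rng.lognormvariate(math.log(1e4), 1.0)
        # 2. Log10 reductions for two treatment stages.
        log_reduction = rng.uniform(1.0, 2.0) + rng.uniform(2.0, 3.0)
        # 3. Further log10 reduction from die-off (e.g. in sunlight).
        log_reduction += rng.uniform(0.5, 1.5)
        # 4. Volume of water ingested per exposure event (litres).
        volume = 0.001
        dose = conc * 10 ** (-log_reduction) * volume
        # 5. Exponential dose-response model with a hypothetical parameter r.
        r = 0.01
        risks.append(1.0 - math.exp(-r * dose))
    risks.sort()
    # Report the median and the 95th percentile of the per-event risk.
    return risks[n_draws // 2], risks[int(0.95 * n_draws)]

median_risk, p95_risk = simulate_infection_risk()
```

Replacing the fixed values by distributions, as is done here for the concentration and log reductions, is what allows the uncertainty in each input to carry through to the risk estimate.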

We focussed on the recycling plant at Subiaco wastewater treatment facility in WA, to con-

cretize the risk assessment problem. Subiaco is a class C plant where recycled water is destined

for non-potable uses [Isaac, 2008a, email], and therefore, post treatment water samples are tested

for E. coli only [Isaac, 2008b, email]. Results are owned and kept by the Water Corporation, and

sent to the WA Department of Health, and are not publicly available. Subiaco WWTP like many

other WWTPs produces an annual audit report which details discharge loads and concentra-

tions for total nitrogen, total phosphorus and some other parameters [Water Corporation, 2010].

No pathogens are reported. A good reason for the non-collection of both viral and microbial

pathogens is that they are difficult to detect in treated wastewater due to the presence of inhibitory

chemicals and suspended solids. Viruses, being present in small numbers, require very large

samples. Recovery methods are “too cumbersome or complicated to be used on a routine basis”

[Toze, 1999, 2004].

Another planned use for reclaimed wastewater in WA is for aquifer recharge, with water to

be reclaimed to potable standard and then to be injected into groundwaters for storage. The fate

of various microorganisms, when treated wastewater was discharged to groundwaters in Western Australia, is reported in Gordon and Toze [2003], Toze [2002, 2004], Toze et al. [2002, 2004],

while Toze et al. [2005] look at selected pathogens, at the sprinkler and pre- and post-irrigation

on the grass of McGillvray Oval, and find that Salmonella, while present in the sprinkler water

is not found in the grass (seven samples). Dillon et al. [2008] examine combined engineered

and aquifer treatment systems in water recycling, and give pathogen removal rates for various

treatments for aquifer recharge.

In a recycled water context, Petterson et al. [2001], Petterson and Ashbolt [2001], Petterson

[2002] looked at viruses and other pathogens and inactivation rates when contaminated water

was used to irrigate salad crops.


Log reduction data may be found for many pathogens in Hijnen et al. [2007, 2005, 2004].

However, Smeets et al. [2008], Smeets et al. [2008] find that the log reduction model does not work well for their Campylobacter source water and treated water data, and Smeets et al. [2007] find that reduction rates for Cryptosporidia are largely dependent on the source water, i.e., the

treated water concentrations are not a function of the source water concentrations. In using such

sources for an Australian risk assessment, we note that the very great differences in climate

(and possibly catchment waters), together with the differences between data for pilot plants

and data under actual conditions make these reduction rates subject to even greater uncertainty.

It remains a largely unanswered question as to whether experimental European and American

results for treating pathogens in water translate to the Australian environment. Sydney Water

(David Roser, 2008, pers.comm.) has collected pre- and post-treatment data for Cryptosporidia

and Giardia, but these data are not publicly available. Signor [2007] collected baseline and runoff data in a Sydney catchment for Cryptosporidia, Campylobacter, Giardia and E. coli to

consider risks to the water supply from heavy rainfall events. Toze et al. [2004] consider water

quality improvements for (among other things) pathogens with aquifer recharge. Again, such

data are not publicly available for reanalysis.

The majority of the proposed and actual recycling plants in WA are add-ons to existing

wastewater treatment plants, whose treated water now flows to the sea or a river. Heavy rainfall

events in a catchment, which can overload a wastewater treatment plant due to storm water

runoff, have little effect on the recycled part of the plant, which for the McGillvray project,

takes less than 4% of the total throughput of wastewater for reprocessing [Water Corporation,

2011a,b]. However, as we start to recycle greater and greater proportions of our wastewater,

high rainfall events may need to be factored in to recycling plant failures. Wastewater treatment

plants are required by law, bylaw or regulation to monitor their effluents for many contaminants,

but currently the routine monitoring regime only involves such things as suspended solids, total

nitrogen and total phosphorus [Water Corporation, 2010].


Many experiments to determine the likelihood of infection or of overt disease from a par-

ticular dose of a pathogen are summarised by Teunis et al. [1996]. Salmonella dose-response

relationships are typically based on the experiments of McCullough and Eisele [1951a,b,c,d]

from the early 1950s. Rotavirus dose-response equations are usually based on Ward et al. [1986], while those for Giardia are based on the experiments of Rendtorff [1954]. The errors in measurement of the Salmonella experiments are probably extremely large (Toze, 2009, pers. comm.), and the experiments themselves were conducted on adults (prisoners and ‘volunteers’). These volunteers form a particular cohort with immunities (potentially) quite different from any/most population subgroups today. Haas et al. [1999] show that the dose-response infection rates for Campylobacter jejuni taken from Teunis et al. [1996] are of the order of the infection rates calculated for the Milwaukee outbreak. However, depending on the pathogen, patterns of immunity within the population can vary considerably between groups [Tawk et al., 2006] and over time [Jacobsen and Koopman, 2004]. Blaser and Newman [1982], far earlier, point out the differing susceptibilities for different age groups, for the immuno-compromised versus the healthy,

for those who have had gastric surgery and other subgroups.

Salmonella data for a risk assessment

A number of experiments involving Salmonella die-off using some of the treatments for sewage

have been conducted [Karim et al., 2008, Palacios et al., 2001]. Wastewater effluents have been

sampled for Salmonella [Kinde et al., 1997].

Salmonellosis is a notifiable disease in Australia and summaries of cases and rates of cases

by month and by state are available at http://www9.health.gov.au/cda/Source/Rpt 5.cfm. However, these case numbers and rates always understate the true incidence. Hall et al. [2006] used surveys

to build a model to allow prediction of what fraction of those with a diarrhoeal disease might

make it through the multiple hurdles to be recorded in the national database. Hall [2004] reports

on Salmonellosis in a study of foodborne gastroenteritis in Australia. Hall and Kirk [2005] es-

timate how much enteritis is due to food, and in particular, what proportion of illness for each


pathogen may be attributed to food.

Given the non-availability of primary data for a risk assessment and the need to make many

assumptions, I chose to approach the risk assessment task from a largely hypothetical viewpoint.

Hence, the paper of Chapter 3.2 constructs, quantifies and looks at the implications of a BN for

the risk of diarrhoea associated with the use of recycled water, and in doing so, constructs

credible intervals for the point estimates of probabilities for the scenarios of interest. The paper

of Chapter 4.2 takes the flowchart of a risk assessment, converts it to a DAG containing all the

disparate data needed for the usual plug-in estimates, and shows that estimating these within the

risk assessment process contributes greater uncertainty to the risk estimates. It also shows how

to build an errors-in-variables model for the dose-response equations taking the view that this

type of model is a more appropriate model.

2.3 Spatio-temporal modelling

Motivating case-study for spatio-temporal modelling. Motivating this literature review is a

dataset supplied by the NSW Department of Agriculture, which consists of moisture measure-

ments taken at 15 depths from a field experiment laid out in 6 rows of 18 plots. The measure-

ments were made over a period of 5 years from June 26, 1995 to May 23, 2000, measurements

being made for 61 different dates, spaced roughly one month apart. The dimensions row, col-

umn, date and depth are essentially orthogonal. (Not all plots were measured on each sampling

day.) The experiment involved randomised complete blocks of 9 treatments, with measurements taken every 20 cm from 20 cm to 300 cm. The main concern was to assess the differences

between long fallow treatments (3 treatments) and opportunity cropping (2 treatments).

Thus, the data are lattice data measured over four dimensions. The purpose of the analysis

is to estimate the effects of the nine treatments on the profiles of soil moisture over depth and over

time, and to estimate the difference between long fallow cropping and opportunity cropping.

These data are supplied as point-referenced data. Specialist methods for the analysis of


spatial data are required since observations cannot be thought of as independent. Observations

which are close to each other in space (and/or in time) are typically correlated with each other

and this autocorrelation must be accounted for in the analysis.

Methods for the analysis of two dimensional spatial data: Point-referenced vs areal data.

There are two broad ways of thinking of spatial data, firstly as point-referenced data, where each

response is located at a point in space, and secondly, as areal data, where the response may be a

summary or aggregate of data for an area.

Generally, when data are point-referenced, analysts use point-referenced analysis methods

for dealing with auto-correlation, while where the data are summarised over a spatial area,

analysts use areal data methods, typically conditional autoregressive (CAR) models.

These data are supplied as point-referenced data and therefore each day and depth of these

data could be analysed using the spatial methods available for point-referenced data, such as

those suggested in, for example, Banerjee et al. [2004] or Cressie [1991], where various as-

sumptions are made about the data, such as smoothness and differentiability, which lead to vari-

ogram fitting, kriging and perhaps assessment of isotropy. Examples of applied point-referenced

spatial data analyses are Clements et al. [2008], Raso et al. [2006].

Cressie [1991], Schabenberger and Gotway [2005] distinguish data types by spatial domain.

Geostatistical data is such that the response variable Z(s) at point s is observable everywhere

within the spatial domain D. Thus, between any two sample locations, there are theoretically

an infinite number of possible samples. They contrast this with lattice and regional data, where

the domain D is fixed and discrete, and thus both non-random and countable, and comment

that such data usually represent areal regions, where the response is some aggregation over the

region. The distinction becomes important when change of support becomes an issue [Banerjee

et al., 2004, Gotway and Young, 2002, Schabenberger and Gotway, 2005].

The geostatistical model [Banerjee et al., 2004, Cressie, 1991] is based on a spatial process being weakly stationary. A spatial process $Y(s)$ is weakly stationary if $\mu(s) \equiv \mu$, where $\mu(s) = E(Y(s))$, and $\mathrm{Cov}(Y(s), Y(s+h)) = C(h)$ for all $h \in \Re^r$ such that $s$ and $s+h$ lie within $D \subset \Re^r$. A further type of stationarity is intrinsic stationarity, where $E(Y(s+h) - Y(s)) = 0$ and $E[(Y(s+h) - Y(s))^2] = \mathrm{Var}(Y(s+h) - Y(s)) = 2\gamma(h)$. That is, the variance of the difference is a function of the separation $h$ between the two points and nothing else. Weak stationarity implies intrinsic stationarity. In order to specify a stationary process, a valid covariance function must be provided. That is, $C(h) \equiv \mathrm{Cov}(Y(s), Y(s+h))$ must be such that for any finite set of sites $s_1, s_2, \ldots, s_n$ and for any $a_1, a_2, \ldots, a_n$,

\[ \mathrm{Var}\Big[\sum_{i} a_i Y(s_i)\Big] = \sum_{i,j} a_i a_j \,\mathrm{Cov}(Y(s_i), Y(s_j)) = \sum_{i,j} a_i a_j \, C(s_i - s_j) \ge 0, \]

with strict inequality if not all the $a_i$ are 0. That is, $C(h)$ must be a positive definite function. Generally, someone seeking to fit such a model chooses one of a variety of possible covariance functions which satisfy this condition in $\Re^r$ (usually in $\Re^2$). (This paragraph paraphrases text from Banerjee et al. [2004].) Geostatistical models allow prediction at points where data have not been observed.
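The positive definiteness condition can be explored numerically. The sketch below assumes an exponential covariance function, one of the standard valid choices, with hypothetical variance and range parameters. It evaluates the quadratic form for random coefficient vectors on a small grid of sites, and computes the semivariogram using the relation γ(h) = C(0) − C(h), which holds under weak stationarity. All names and values are illustrative.

```python
import math
import random

def exp_cov(h, sigma2=1.0, phi=2.0):
    """Exponential covariance C(h) = sigma2 * exp(-|h| / phi); parameters hypothetical."""
    return sigma2 * math.exp(-h / phi)

def semivariogram(h, sigma2=1.0, phi=2.0):
    """Under weak stationarity, gamma(h) = C(0) - C(h)."""
    return exp_cov(0.0, sigma2, phi) - exp_cov(h, sigma2, phi)

# Sites on a small 3 x 3 grid in the plane.
sites = [(i, j) for i in range(3) for j in range(3)]
# Covariance matrix from the Euclidean distance between each pair of sites.
C = [[exp_cov(math.dist(s, t)) for t in sites] for s in sites]

def quadratic_form(a, C):
    """Var[sum_i a_i Y(s_i)] = sum_ij a_i a_j C(s_i - s_j)."""
    n = len(a)
    return sum(a[i] * a[j] * C[i][j] for i in range(n) for j in range(n))

# Evaluate the variance of 200 random linear combinations of the site values.
rng = random.Random(0)
variances = [quadratic_form([rng.uniform(-1, 1) for _ in sites], C)
             for _ in range(200)]
```

Every entry of `variances` is positive, as required of a valid covariance function. A random check of this kind can expose an invalid function but cannot prove validity.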

Point-referenced data have been contrasted with ‘areal’ data, for which other special methods of analysis have been devised. However, while data may be aggregated by area, with the area being defined by a polygon, rather than a point, it is not the case that the data analysis method is necessarily dictated by the spatial referencing system. A Voronoi (or Dirichlet) tessellation (or polygon) (see, e.g., Green and Sibson [1978]) may be formed for any set of points,

and equally any polygon may be considered to have a point mass at some sort of centroid. Thus,

methods devised for one type of data or another may be used regardless of the original form of

the data. What matters is whether the method makes sense for the data at hand.

We note that for the agricultural data of our case-study, the ecological fallacy [Robinson,

1950, 2009], where an analysis of aggregate data is used for inference on the individuals being

aggregated, is not an issue: these data are essentially point data, unlike agricultural yield data

which are aggregates of the plot.

Areal data are often analysed using the notion of ‘neighbour’, using adjacency matrices and

corresponding weight matrices. Such analyses are based largely on the work of Besag [1974]

and Besag et al. [1991], where a local conditional specification determines a joint and global


distribution and allows spatial smoothing [Banerjee et al., 2004]. Conditional autoregressive

(CAR) models were advocated for use in agricultural contexts by Besag et al. [1995], Besag and

Higdon [1999].

The differences between CAR and geostatistical models may be more apparent than real.

Following Rue and Tjelmeland [2002], Hrafnkelsson and Cressie [2003] ‘calibrate’ a geostatistical

model using a Matérn covariance structure to a CAR model, and conclude that the CAR

model is faster to run. This is not surprising, since the CAR model employs a sparse precision

matrix, and requires no inversion of a covariance matrix. More recently, Besag and Mondal

[2005], Lindgren et al. [2011] show the equivalence of various geostatistical and CAR models.
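
The sparsity that makes CAR models fast can be seen in a small sketch. It assumes a regular lattice with first-order (rook) neighbours and a proper CAR precision matrix of the form Q = tau (D - rho W), where W is the adjacency matrix and D holds the neighbour counts. The function names and the values of `tau` and `rho` are hypothetical and serve only as illustration.

```python
def lattice_neighbours(n_row, n_col):
    """First-order (rook) neighbours for each cell of an n_row x n_col lattice."""
    nb = {}
    for r in range(n_row):
        for c in range(n_col):
            candidates = ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
            nb[r * n_col + c] = [rr * n_col + cc for rr, cc in candidates
                                 if 0 <= rr < n_row and 0 <= cc < n_col]
    return nb


def car_precision(nb, tau=1.0, rho=0.9):
    """Proper CAR precision Q = tau * (D - rho * W), stored as {(i, j): value}.

    ASSUMPTION: tau and rho are illustrative; only non-zero entries are stored.
    """
    Q = {}
    for i, js in nb.items():
        Q[(i, i)] = tau * len(js)      # diagonal: number of neighbours
        for j in js:
            Q[(i, j)] = -tau * rho     # off-diagonal: neighbours only
    return Q


nb = lattice_neighbours(10, 10)
Q = car_precision(nb)
n_sites = len(nb)
nonzero = len(Q)            # entries needed by the CAR precision matrix
dense = n_sites * n_sites   # entries in a full covariance matrix
print(nonzero, dense)
```

For this 10 by 10 lattice the precision matrix has 460 non-zero entries, against 10,000 entries in a dense covariance matrix for the same sites.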

CAR neighbourhood models have been widely used for areal data spatial analyses, particu-

larly in health or economics, where for privacy/ethical reasons, data are usually aggregated over

some administrative spatial unit. For examples, see Clements et al. [2008], Reich et al. [2007],

Song et al. [2011], all of whose papers use the convolution prior of Besag et al. [1991].

Two dimensional agricultural data have a whole set of methodologies specially devised for

them. These include the use of kriging models [Banerjee et al., 2004, Cressie, 1991]. Com-

mercial packages such as SAS [SAS Institute, 2004] and Vesper [Whelan et al., 2001] allow

geostatistical models to be fitted. Within WinBUGS [Lunn et al., 2000], both CAR and geosta-

tistical models may be fitted.

2.3.1 Two dimensional Lattice data analyses

Spatial ‘lattice’ data have a long history of methodologies, since agricultural trials are typically

laid out in lattices. Spatial effects in agriculture have been recognised as an issue for a long time

and this has led to considerable effort in experimental design to avoid spatially biased estimates.

However, a recognition that experimental design alone cannot overcome the problem, since

it may still be the case that errors are spatially correlated, gave rise to work such as that of

Papadakis [1937], and later to that of Bartlett [1978] who established that Papadakis’ approach

gave more efficient estimates. Baird and Mead [1991] review a number of methods for local


adjustments, and remark that the first difference plus errors method from Besag and Kempton

[1986] is equivalent to the linear variance approach of Williams [1986], and show this to be the

most efficient approach, or close thereto.

Cullis and Gleeson [1991] advocate the use of ARIMA methods for modelling both row

and column residuals. In a later version of this approach, Gilmour et al. [1997] fit a complete

blocks model and AR(1), AR(1) models as a starting point for their REML modelling, and

look at kriging graphs on the residuals to determine how the data may be better modelled by

the introduction of further ‘global’ extraneous random effects. This approach is continued in

the work of Durban et al. [2003] who in addition to the AR(1) models for local effects, fit

semiparametric smoothers separately by row and column. Singh et al. [2003] use a similar

approach but consider linear and cubic splines in various combinations with AR(1) models for

both rows and columns. Martin et al. [2006] consider efficient experimental designs for the

complex spatial analyses of Cullis and Gleeson [1991], Gilmour et al. [1997], and remark that

“Current practice in NSW Agriculture, Australia is to use a spatial model for the dependence

fitted using ASREML [Gilmour et al., 1995].”

Another set of possible ways of modelling is found in the mixed modelling approach of

Piepho et al. [2003, 2004], Piepho and Ogutu [2007], Piepho et al. [2008], Piepho and Williams

[2010]. Typically, a model based on the experimental design is fitted, together with global and

local spatial smooths as needed, see, e.g., Stefanova et al. [2009]. Such a model may deal

with anisotropy using the AR1*AR1 choice of Gilmour et al. [1997], or by fitting an anisotropic

kriging smooth as in Stefanova et al. [2009].
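
To make the anisotropy of the AR1*AR1 choice concrete, the sketch below assumes the usual separable form, in which the correlation between two plots is the product of a row term and a column term. The function name and the two correlation parameters are hypothetical and are not estimates from any of the cited papers.

```python
def ar1xar1_corr(plot_a, plot_b, rho_row, rho_col):
    """Separable AR(1) x AR(1) correlation between two plots given as (row, column).

    ASSUMPTION: correlation = rho_row**|row lag| * rho_col**|column lag|.
    """
    (r1, c1), (r2, c2) = plot_a, plot_b
    return rho_row ** abs(r1 - r2) * rho_col ** abs(c1 - c2)


rho_row, rho_col = 0.5, 0.8   # illustrative values only

# Two plots the same number of steps apart, but in different directions.
two_rows_apart = ar1xar1_corr((0, 0), (2, 0), rho_row, rho_col)
two_cols_apart = ar1xar1_corr((0, 0), (0, 2), rho_row, rho_col)
print(two_rows_apart, two_cols_apart)
```

Because the row and column parameters differ, plots at the same lag are more strongly correlated along one direction than the other, which an isotropic model cannot represent.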

Thus, the critical first choice for the three-dimensional modelling was to choose a framework

for two-dimensional modelling. Whatever model was chosen needed to be common to all

depths and days.

A disadvantage of the CAR modelling choice within the WinBUGS framework is that

weights must be chosen a priori and not estimated as in Besag and Higdon [1999] and Besag

and Mondal [2005]. The lattice framework, so typically found in agricultural data, generally


needs an anisotropic treatment such as that found in the Besag models already cited and in, for

example, the models of Gilmour et al. [1997], Stefanova et al. [2009].

The primary difficulty was the determination of the spatial model. Given that the data are

point referenced, an obvious choice for a spatial autocorrelation model was a kriging model,

such as that of Gotway and Cressie [1990]. However, the large number of terms (45-135) in

the fixed part of the model made such an approach impossible within the MCMC framework

of WinBUGS. Additionally, including depth in the calculation of distance, would have meant

greater difficulties in disentangling treatment effects over depth from spatial modelling consider-

ations. Software such as SAS PROC MIXED [SAS Institute, 2004] offers the possibility of both

kriging and the various correlation model structures of Gilmour et al. [1997] within a REML

or ML framework, but when a model is poorly specified or very complex, PROC MIXED can

be difficult to use, and neither SAS [SAS Institute, 2004] nor ASREML [Butler et al., 2007,

Gilmour et al., 2005], nor Genstat [VSN International, 2011], nor Vesper [Whelan et al., 2001]

is freely available. We had hoped to show that the CAR models used were comparable to the

AR(1), AR(1) basis models of Gilmour et al. [1997], Stefanova et al. [2009]. However, the

AR(1), AR(1) models could not be fitted within WinBUGS with the desired complexity,

but did show comparability with the best CAR models of Table 5.1 (∆DIC = 257). A general

consensus for agricultural lattice data is that they should be dealt with anisotropically.

Alternative methods for lattice data are considered by Besag and Kempton [1986], Besag

[1974], Besag et al. [1995], Besag and Higdon [1999] using variants of conditional autoregres-

sive (CAR) models. These methods for analysing spatially correlated data have been available

via the freely available software, WinBUGS, for some time and many papers have been written

using conditional auto-regressive (CAR) models to smooth spatial data, particularly in the field

of spatial epidemiology, see, for example, Bernardinelli et al. [1995], Clements et al. [2008],

Earnest et al. [2010], Elliott [2000].


2.3.2 Agricultural studies with measurements at different depths

There are a large number of soil studies which consider some soil characteristic at various depths

in the soil. Some do this in circumstances almost identical to this study. Thus, e.g., Wong et al.

[2008] look at soil depth profiles for soil organic carbon. The study analyses depth profiles

only. Macdonald et al. [2009] look at 3 soil profiles, taken from 0.1 to 1.8 m and use observed

means and standard deviations for each depth and soil for statistical testing. Sleutel et al. [2009]

compare three different forest soils after compositing. There is no spatial modelling in any

dimension for these data. Strahm et al. [2009] look at dissolved organic carbon and dissolved

organic nitrogen at two depths (20 cm and 100 cm) and compare two harvesting treatments

in forests, using monthly measurements taken over a period of 3 years, and compare various

characteristics via regressions on geometric means. Comparisons are made via t-tests using

observed means and standard deviations. Shillito et al. [2009] use REML and SAS PROC

MIXED [SAS Institute, 2004] to deal with two dimensional spatial correlation using kriging in

a study of potato yields as a response to a nitrogen fertilizer experiment.

Wong et al. [2008] fit depth profiles at 7 sites, accounting for correlation over depth by

allowing correlation to diminish as a power of distance and allowing for heterogeneous variances

between depths. For the depth profile curves they used the cubic spline approach of Verbyla

et al. [1999]. Nayyar et al. [2009] account for spatial variation using the mixed model approach

of Wang and Goonewardene [2004] and fit 4 depths as fixed categorical effects, ignoring their

orderedness and continuity, which is reasonable with just 4 depths. Differing variances for

different depths are not used.

Ayars et al. [2009] look at groundwater measurements over time at two depths in a field

trial. The combinations of different depths and treatments are treated as differing treatments.

The spatial dimensions of the data in the field are not used and the analysis becomes a time

series analysis of treatments over time, with autocorrelation modelled as an AR(1) process.

Thus, despite the papers above involving the third spatial dimension, they provide little in


the way of a paradigm for a three-dimensional analysis.

2.3.3 Spatio-temporal data analyses

When we consider two-dimensional spatial analyses and time, we find that analyses fall into the

following categories:

1. Simple description. See, e.g., Bell et al. [2007], Teschke et al. [2001] who use maps at

various timepoints as descriptive devices.

2. Spatial analyses at various timepoints.

3. Temporal analyses at various spatial sites. See, e.g., Lemos et al. [2007], who, having

failed to find much influence in terms of spatial proximity, model time sequences for each

site using time-series methods.

4. Spatio-temporal analyses with separable time and site effects.

5. Spatio-temporal analyses with both separable and non-separable time and site effects.

In group (4) we find Knorr-Held and Besag [1998], who in an early space-time model for epi-

demiological data, model the residuals from their fixed model as a spatial residual component

(which remains constant over time) and a time residual (which remains constant over space) plus

an unmodelled residual. Adebayo and Fahrmeir [2005], while including time varying covariates,

again include a common residual spatial component and a common residual time component.

Another model with complete separability of time and space effects is used by Crook et al.

[2003], who use the nonparametric penalised spline smooths available in BayesX [Belitz et al.,

2009a,b] to fit a smooth over time, in addition to smooths over covariates such as age, while

fitting the spatial CAR model of Besag et al. [1991]. The more complex models are generally

Bayesian and use either geostatistical methods or the CAR priors of Besag et al. [1991]. Models

with CAR priors typically partition the error term in the model, $\epsilon_{it}$, into $e_i$, $e_t$ and $e_{it}$, where the first

two error terms capture the structured spatial random effects and the structured temporal random

effects, and $e_{it}$ is a simple unstructured random effect with $e_{it} \sim N(0, \sigma^2)$. See, e.g., Adebayo

and Fahrmeir [2005], Crook et al. [2003], Knorr-Held and Besag [1998], Poncet et al. [2010],

Waller et al. [1997], where the last three papers use BayesX software [Belitz et al., 2009a,b] to

conduct the analysis. In these models, again, no structured time-space interaction effects are

fitted, only a final common error term. The structure of the group (4) models recognises that for

data aggregated over administrative units, as is so often the case for epidemiological data, the

commonalities of that area, in terms of demographic structure, but even in terms of various kinds

of pollution exposure are likely to be relatively constant for the time periods considered. And

again it is not unreasonable to assume that the time scale errors, which might be thought of as

the result of administrative changes, are common to the entire map. Thus, many spatio-temporal

analyses of epidemiological data use this framework.
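
The consequence of the separable structure used in the group (4) models can be sketched with a small simulation. This is an illustration under loud assumptions: independent normal draws stand in for the structured spatial and temporal priors, and the function name, dimensions and standard deviations are hypothetical.

```python
import random


def separable_errors(n_sites, n_times, sd_site=1.0, sd_time=1.0,
                     sd_noise=0.5, seed=0):
    """Simulate eps[i][t] = e_site[i] + e_time[t] + e_noise[i][t].

    ASSUMPTION: independent normal draws replace the structured (e.g. CAR or
    random-walk) priors that a real model of this kind would use.
    """
    rng = random.Random(seed)
    e_site = [rng.gauss(0, sd_site) for _ in range(n_sites)]
    e_time = [rng.gauss(0, sd_time) for _ in range(n_times)]
    return [[e_site[i] + e_time[t] + rng.gauss(0, sd_noise)
             for t in range(n_times)]
            for i in range(n_sites)]


# With the unstructured term switched off, the contrast between two sites is
# the same at every time point: the model has no space-time interaction.
eps = separable_errors(n_sites=4, n_times=6, sd_noise=0.0)
contrasts = [eps[0][t] - eps[1][t] for t in range(6)]
print(max(contrasts) - min(contrasts))
```

Any change over time in the difference between two sites can therefore only be absorbed by the unstructured term, which is why this structure suits slowly varying areal data better than data that change markedly between sampling days.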

In the group (5) models, in an analysis of epidemiological space time data, Abellan et al.

[2008] postulate a model with two CAR components, one for a constant spatial residual struc-

ture and another CAR model for the time component, but they fit the final space-time random

component as a Gaussian contamination mixture of large and small residuals, thereby allow-

ing the identification of sites and times which deviate from the common spatial residual and

common time residual of the models of group (4).

Assuncao et al. [2001] integrate time and space in a different way. They fit quadratics in time

which differ for each spatial location, but for which the coefficients are smoothed using CAR

priors. This elegant solution for very short time-sequence data allows the possibility of seeing

increasing and decreasing infection rates, while accounting for spatial closeness, and Assuncao

[2003], Assuncao et al. [2002] again use space-varying regression coefficients. In a further

variation, Yan and Clayton [2006] use the space-time interaction to define a set of space-time

separable clusters carrying a specific risk, and fit a final unstructured random effect.

A further very different approach to spatial modelling is that of Higdon [1998] who uses a

convolution approach with non-stationary dependence structures. This allows local anisotropies.

The approach was needed because the geostatistical models of Cressie [1991] are largely un-


workable with large datasets. Papers using this approach for spatial smoothing are Lemos and

Sanso [2009], Sahu and Challenor [2008], who give snapshots over time (group 2). A point

made by Higdon [1998] is that ocean datasets over time differ markedly from most other spatio-

temporal data in that measurements at a different time are not made at the same spatial location.

Looking at spatio-temporal analyses within an agricultural context, the analysis of Trought

and Bramley [2011] effectively fits all spatio-temporal interactions to look at the quality of grape

juice by site across time. Their strategy is to fit different curves across time for each site, and

then to look at spatial outcomes of their model by mapping (a group 3 type approach).

In considering longitudinal agricultural experiments, Piepho et al. [2004], Piepho and Ogutu

[2007], Piepho et al. [2008], Wang and Goonewardene [2004] and Brien and Demetrio [2009]

use mixed models within a REML framework to analyse their spatio-temporal data, and explic-

itly address the fitting of state-space models via standard software and REML. The fixed part of

their models is generally simple and the data are measured on two spatial dimensions.

When we move to a third spatial dimension (depth) the soil profile study of Macdonald et al.

[2009] does not use spatial information in the analysis. Other studies composite the soils from

different depths across soil types or treatment [Sleutel et al., 2009], while others [Nayyar et al.,

2009] use the mixed modelling framework advocated by Piepho et al. [2004]. Within a spatial

context only, Haskard et al. [2007] fit an anisotropic geostatistical model.

A major difference between the agricultural data of our study and epidemiological data

which is so often modelled using the convolution CAR prior of Besag et al. [1991] for the

spatially structured error, a structured temporal random term and an unstructured error with a

variance common over both space and time, is that the spatial units of epidemiological data tend

to vary slowly over the time scale of a few years. In contrast, the moisture data modelled here,

vary markedly from sampling day to sampling day, and it is clear that the simple separable vari-

ance decomposition used by so many epidemiological models, does not describe the data well.

For the agriculture data we consider, it is not reasonable to assume constant spatial residual com-

ponents over time as in Adebayo and Fahrmeir [2005], Knorr-Held and Besag [1998], neither


for the full time period of 5 years, nor for monthly time intervals. Nor does the elegant stability

model of Abellan et al. [2008] have anything to offer here. We chose to assume spatial residuals

differed for each time period, since it was difficult to postulate a reasonable relationship for the

evolution of the spatial residuals.

In moving to four dimensions, there are yet more possibilities for the decomposition of

the fixed and error parts of the model. However, in the context of differing treatments for

differing plots in the horizontal dimensions, with the same treatment along the depth profile

at each plot, and in the context of different scales between the depth measurements and the

distances between plots, it was a simple decision to exclude depth from the neighbourhood

error structures. If depth neighbours were to be included as neighbours with equal weights,

the horizontal layer information would be downweighted. If we were to weight using functions of distance,

the horizontal correlations would become effectively irrelevant. This choice to deal with the third

dimension differently is made by others analysing three dimensional data. Ridgway et al. [2002],

modelling ocean measurements, separate out the depth component in their loess data fits.
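
The effect of including depth neighbours with equal weights can be sketched as follows. The grid dimensions and the chosen cell are hypothetical, and the neighbourhood is assumed to be first order in each direction; this is not the neighbourhood code used for the models of this thesis.

```python
def neighbours(cell, shape, include_depth=False):
    """First-order neighbours of cell = (row, col, depth) on a regular grid."""
    r, c, d = cell
    n_row, n_col, n_depth = shape
    steps = [(-1, 0, 0), (1, 0, 0), (0, -1, 0), (0, 1, 0)]   # horizontal only
    if include_depth:
        steps += [(0, 0, -1), (0, 0, 1)]                      # above and below
    out = []
    for dr, dc, dd in steps:
        rr, cc, dz = r + dr, c + dc, d + dd
        if 0 <= rr < n_row and 0 <= cc < n_col and 0 <= dz < n_depth:
            out.append((rr, cc, dz))
    return out


shape = (6, 18, 15)   # hypothetical grid: rows, columns, depths
cell = (2, 5, 7)      # an interior cell

horizontal = neighbours(cell, shape)
with_depth = neighbours(cell, shape, include_depth=True)

# Share of the total neighbour weight carried by the horizontal layer
# when all neighbours receive equal weight.
share = len(horizontal) / len(with_depth)
print(len(horizontal), len(with_depth), share)
```

For an interior cell the horizontal neighbours carry the whole weight when depth is excluded, but only four of six equal weights when depth neighbours are admitted.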

2.3.4 Four dimensional spatio-temporal data analyses

Large four-dimensional datasets are found in the oceanographic literature and some different

approaches are given in Holbrook and Bindoff [2000], Lemos and Sanso [2009], Ridgway et al.

[2002]. The papers of both Holbrook and Bindoff [2000], Ridgway et al. [2002] consider in-

terpolation methods to create realistic grid points of data for further analysis. Ridgway et al.

[2002], who use loess quadratic fits, treat depth quite differently from the coordinates of latitude

and longitude (X, Y). The quadratic surface they fit to the latitude and longitude includes an

$XY$ term in addition to $X^2$ and $Y^2$ terms, but there are no crossed terms for depth. Additionally,

their weighting term which determines neighbours to be included treats depth differently from

the latitude and longitude coordinates.

What is clear from this literature review is that the decision to deal with depth separately

is a decision made by many before us and made for the same reasons, namely that the mea-


surements in the third dimension are made on a very different scale from those of the other

two dimensions. In considering the agricultural data of our study, the decision to fit a complete

spatio-temporal interaction model (the daily model fitted for each day) is appropriate when com-

mon spatial and temporal residuals are unlikely.

2.4 Addendum: The dynamic risk assessment model

The example (below, Figure 2.3) of a dynamic risk assessment model is taken directly from

Eisenberg et al. [2002].

“This conceptual modeling methodology is dynamic and population based; that is,

the risk of infection manifests at the population level. Specifically, in the transmis-

sion of infectious diseases (but not of diseases due to chemical exposure), the risk

of disease due to pathogen exposure depends on the disease status of the population

and potentially on the contact patterns within the population. Figure 2.3 is a dia-

gram of a transmission model for enteric pathogens. Each box represents one state

of the system. Five of the six states represent the epidemiologic states of the population:

$S$, susceptible; $E$, latent (infected but noninfectious and asymptomatic); $I_S$,

diseased (infectious and symptomatic); $I_A$, carrier (infectious but asymptomatic); $P$,

immune (either partial or complete). The sixth state, W, represents concentration of

pathogens in the environment. Members of a given state may move to another state

based on the causal relationships of the disease process. For example, members of

the population who are in the susceptible state may move to the diseased state after

exposure to a pathogenic agent.

To describe the epidemiology of enteric pathogen transmission, the conceptual

model includes both state variables and rate parameters. State variables ($S$, $E$,

$I_S$, $I_A$, and $P$) track the number of individuals in each of the states at any given

point in time and are defined such that $S + E + I_S + I_A + P = N$ (i.e., the sum of

the state variables equals the total population). The rate parameters determine the

movement of the population from one state to another. In general, the rate parameters

are β, the rate of transmission from a noninfected state, $S$, to an infected state,

$E$, due to both environmental (e.g., drinking water) and person-person exposure to

a pathogen; α, the rate of movement from exposure to illness; δ and σ, the rates of

recovery from an infectious state, $I_S$ or $I_A$, respectively, to the postinfection state,

$P$; γ, the rate of movement from the postinfection state (partial immunity), $P$, to

the susceptible state, $S$; ϕ, the rate of shedding of pathogens into the environment

by infectious individuals; and ξ, the per capita mortality rate of the pathogen in the

environment. An additional parameter in the model, ρ, represents the proportion

of asymptomatic infections. For more mathematical detail pertaining to the model,

see (the) publications, Brookhart et al. [2002], Eisenberg et al. [1996].”

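$$
\begin{aligned}
\frac{dS(t)}{dt} &= \gamma P(t) - (\beta + \eta[I_A(t) + I_S(t)])\,S(t) \\
\frac{dE_1(t)}{dt} &= (\beta + \eta[I_A(t) + I_S(t)])\,S(t) - \alpha E_1(t) \\
\frac{dE_2(t)}{dt} &= \alpha E_1(t) - \alpha E_2(t) \\
&\;\;\vdots \\
\frac{dE_k(t)}{dt} &= \alpha E_{k-1}(t) - \alpha E_k(t) \\
\frac{dI_A(t)}{dt} &= \rho\alpha E_k(t) - \delta I_A(t) \\
\frac{dI_S(t)}{dt} &= (1 - \rho)\alpha E_k(t) - \delta I_S(t) \\
\frac{dP(t)}{dt} &= \delta[I_A(t) + I_S(t)]
\end{aligned}
\tag{2.1}
$$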

The diagram of Eisenberg et al. [2002] does not match their text (quoted above). Figure 2.3

implies that δ = σ, and that the rate of recovery from the infectious state is the same whether

a person is asymptomatic or not, and this common rate, δ, is seen in equations (2.1). Again,

differing from the text, the infectious population, both symptomatic and asymptomatic, have

been assumed to be infecting the uninfected at a common rate, η, both in the diagram and in

the equation. A further difference is the sequence of ‘boxes’ in the latently infected group. This

corresponds to modelling a ‘distributed delay’ over the latent infection period. See Eisenberg

et al. [1998].
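
The behaviour of equations (2.1) can be explored with a short numerical sketch. Several assumptions are made and should be read as such: the parameter values, initial state and number of latent stages `k` are purely illustrative; a fixed-step fourth-order Runge-Kutta scheme is used for simplicity; and a loss term, minus γP(t), is added to the equation for P(t) so that the total population stays constant, as the quoted text requires. That term does not appear in the equation as printed above.

```python
def derivatives(state, p, k):
    """Right-hand side of equations (2.1); state = [S, E_1..E_k, I_A, I_S, P]."""
    S, E = state[0], state[1:1 + k]
    IA, IS, P = state[1 + k], state[2 + k], state[3 + k]
    force = p["beta"] + p["eta"] * (IA + IS)     # per-capita infection rate
    dS = p["gamma"] * P - force * S
    dE = [force * S - p["alpha"] * E[0]]
    dE += [p["alpha"] * (E[j - 1] - E[j]) for j in range(1, k)]
    dIA = p["rho"] * p["alpha"] * E[-1] - p["delta"] * IA
    dIS = (1 - p["rho"]) * p["alpha"] * E[-1] - p["delta"] * IS
    # ASSUMPTION: "- gamma * P" is added so that S + E + I_A + I_S + P = N holds;
    # the printed equation for P(t) contains only the delta term.
    dP = p["delta"] * (IA + IS) - p["gamma"] * P
    return [dS] + dE + [dIA, dIS, dP]


def rk4_step(state, dt, p, k):
    """One classical fourth-order Runge-Kutta step of length dt."""
    def shifted(base, slope, h):
        return [b + h * s for b, s in zip(base, slope)]
    k1 = derivatives(state, p, k)
    k2 = derivatives(shifted(state, k1, dt / 2), p, k)
    k3 = derivatives(shifted(state, k2, dt / 2), p, k)
    k4 = derivatives(shifted(state, k3, dt), p, k)
    return [s + dt / 6 * (a + 2 * b + 2 * c + d)
            for s, a, b, c, d in zip(state, k1, k2, k3, k4)]


# Illustrative values only; they are not estimates from any cited study.
params = dict(beta=0.001, eta=0.0005, alpha=0.5, delta=0.2, gamma=0.01, rho=0.3)
k = 3                                              # latent stages (distributed delay)
state = [990.0] + [0.0] * k + [5.0, 5.0, 0.0]      # S, E_1..E_k, I_A, I_S, P

for _ in range(1000):                              # 1000 steps of 0.1 time units
    state = rk4_step(state, 0.1, params, k)

total = sum(state)
print(total)
```

The chain of k latent compartments implements the distributed delay: an individual must pass through every stage before becoming infectious, so the latent period is a sum of k exponential waiting times rather than a single one.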

Figure 2.3 The dynamic model of Eisenberg et al. [2002]. Schematic diagram of transmission model. t, independent variable representing time. Solid lines represent movement of individuals from one state to another. Dashed lines represent movement of pathogens either directly from infectious host to susceptible host or indirectly via the environment. State variables and parameters are defined in the text.

[Figure 2.3 diagram not reproduced. Boxes: S(t), Susceptible; E_k, Latently infected; I_A(t), asymptomatic; I_S(t), symptomatic; P(t), Protected; W(t), Pathogen concentration in water.]


Bibliography

Abellan, J. J., S. Richardson, and N. Best (2008). Use of space-time models to investigate

the stability of patterns of disease (Mini-Monograph). Environmental Health Perspectives

116(8), 1111–1119.

Adebayo, S. B. and L. Fahrmeir (2005). Analysing child mortality in Nigeria with geoadditive

discrete-time survival models. Statistics in Medicine 24(5), 709–728.

Albert, I., E. Grenier, J.-B. Denis, and J. Rousseau (2008). Quantitative Risk Assessment from

Farm to Fork and Beyond: A Global Bayesian Approach Concerning Food-Borne Diseases.

Risk Analysis 28(2), 557–571.

Andersen, S., K. Olesen, F. Jensen, and F. Jensen (1989). Hugin - a shell for building Bayesian

belief universes for expert systems. In Eleventh International Joint Conference on Artificial

Intelligence, Detroit, Michigan, pp. 1080–1085.

Anderson, R. and R. May (1991). Infectious diseases of humans: dynamics and control. New

York: Oxford University Press.

Ashbolt, N. J., S. R. Petterson, T.-A. Stenstrom, C. Schonning, T. Westrell, and J. Ottoson

(2005). Microbial Risk Assessment (MRA) tool. Technical Report 2005:7, Chalmers

University of Technology.

Assuncao, R. M. (2003). Space varying coefficient models for small area data. Environ-

metrics 14(5), 453–473.

Assuncao, R. M., J. E. Potter, and S. M. Cavenaghi (2002). A Bayesian space varying parameter

model applied to estimating fertility schedules. Statistics in Medicine 21(14), 2057–2075.

Assuncao, R. M., I. A. Reis, and C. D. Oliveira (2001). Diffusion and prediction of Leishma-

niasis in a large metropolitan area in Brazil with a Bayesian space-time model. Statistics in

Medicine 20(15), 2319–2335.

Ayars, J. E., P. Shouse, and S. M. Lesch (2009). In situ use of groundwater by alfalfa. Agricul-

tural Water Management 96(11), 1579–1586.

Baird, D. and R. Mead (1991). The empirical efficiency and validity of two neighbour models.

Biometrics 47(4), 1473–1487.

Banerjee, S., B. P. Carlin, and A. E. Gelfand (2004). Hierarchical modeling and analysis for

spatial data. Monographs on statistics and applied probability. Boca Raton, London, New

York, Washington D.C.: Chapman & Hall.

Barker, G. C., N. L. C. Talbot, and M. W. Peck (2002). Risk assessment for Clostridium botulinum:

a network approach. International Biodeterioration & Biodegradation 50(3-4), 167–

175.

Bartlett, M. (1978). Nearest neighbour models in the analysis of field experiments. Journal of

the Royal Statistical Society. Series B (Methodological) 40(2), 147–174.

Belitz, C., A. Brezger, T. Kneib, and S. Lang (2009a). BayesX Software for Bayesian Infer-

ence in Structured Additive Regression Models Version 2.0.1 Reference Manual. Online at

http://www.stat.uni-muenchen.de/~bayesx/bayesx.html. Accessed: October 25,

2010.

Belitz, C., A. Brezger, T. Kneib, and S. Lang (2009b). BayesX Software for Bayesian Infer-

ence in Structured Additive Regression Models Version 2.0.1 Software Methodology Manual.

Online at http://www.stat.uni-muenchen.de/~bayesx/bayesx.html. Accessed:

October 25, 2010.

Bell, M., F. Dominici, K. Ebisu, S. Zeger, and J. Samet (2007). Spatial and temporal variation in


PM2.5 chemical composition in the United States for health effects studies. Environmental

Health Perspectives 115(7), 989–995.

Bernardinelli, L., D. Clayton, C. Pascutto, C. Montomoli, M. Ghislandi, and M. Songini (1995).

Bayesian analysis of space-time variation in disease risk. Statistics in Medicine 14(21-22),

2433–2443.

Besag, J. and R. Kempton (1986). Statistical analysis of field experiments using neighbouring

plots. Biometrics 42(2), 231–251.

Besag, J. E. (1974). Spatial interaction and the statistical analysis of lattice systems (with

discussion). J. R. Statist. Soc. B 36(2), 192–236.

Besag, J. E., P. Green, D. Higdon, and K. Mengersen (1995). Bayesian computation and stochas-

tic systems. Statistical Science 10(1), 3–41.

Besag, J. E. and D. Higdon (1999). Bayesian analysis of agricultural field experiments. Journal

of the Royal Statistical Society Series B-Statistical Methodology 61, 691–717. Part 4.

Besag, J. E. and D. Mondal (2005). First-order intrinsic autoregressions and the de Wijs process.

Biometrika 92(4), 909–920.

Besag, J. E., J. York, and A. Mollie (1991). Bayesian image restoration with applications in

spatial statistics (with discussion). Annals of the Institute of Statistical Mathematics 43, 1–

59.

Blaser, M. J. and L. S. Newman (1982). A review of human salmonellosis: I. Infective dose.

Reviews of infectious diseases 4(6), 1096–1106.

Boerlage, B. (1992). Link Strength in Bayesian Networks. Ph. D. thesis, University of British

Columbia, Canada.

Brien, C. J. and C. G. B. Demetrio (2009). Formulating mixed models for experiments, in-

cluding longitudinal experiments. Journal of Agricultural, Biological, and Environmental

Statistics 14(3), 253–280.

Brookhart, M. A., A. E. Hubbard, M. J. van der Laan, J. M. Colford Jr, and J. N. S. Eisenberg

(2002). Statistical estimation of parameters in a disease transmission model: analysis of a

Cryptosporidium outbreak. Statistics in Medicine 21, 3627–3638.

Burgman, M. (2005). Risks and Decisions for Conservation and Environmental Management.

New York: Cambridge University Press.

Butler, D. G., B. R. Cullis, A. R. Gilmour, and B. J. Gogel (2007). Analysis of Mixed Models for

S Language Environments, ASReml-R Reference Manual Release 2, Volume No. QE02001 of

Training and Development Series. Brisbane, Australia: Queensland Department of Primary

Industries and Fisheries.

Casman, E. A., B. Fischhoff, C. Palmgren, M. J. Small, and F. Wu (2000). An integrated risk

model of a drinking water borne cryptosporidiosis outbreak. Risk Analysis 20(4), 495–511.

Castillo, E., J. M. Gutierrez, and E. Castillo (1997). Sensitivity analysis in discrete Bayesian

networks. IEEE Transactions on Systems, Man & Cybernetics: Part A 27, 412–423.

Castillo, E., J. M. Gutierrez, A. S. Hadi, and C. Solares (1997). Symbolic propagation and

sensitivity analysis in Gaussian Bayesian networks with application to damage assessment.

Artificial Intelligence in Engineering 11, 173–181.

Clements, A., S. Brooker, U. Nyandindi, A. Fenwick, and L. Blair (2008). Bayesian spatial

analysis of a national urinary Schistosomiasis questionnaire to assist geographic targeting of

Schistosomiasis control in Tanzania, East Africa. International Journal for Parasitology 38,

401–415.


Clements, A. C., A. Garba, M. Sacko, S. Tour, R. Dembel, A. Landour, E. Bosque-Oliva, A. F.

Gabrielli, and A. Fenwick (2008). Mapping the probability of Schistosomiasis and associated

uncertainty, West Africa. Emerging Infectious Diseases 14(10), 1629–1632.

Cowell, R. G. and A. P. Dawid (1992). Fast retraction of evidence in a probabilistic expert

system. Statistics and Computing 2(1), 37–40.

Cowell, R. G., A. P. Dawid, S. L. Lauritzen, and D. J. Spiegelhalter (2001). Probabilistic

Networks and Expert Systems. Springer.

Cressie, N. A. C. (1991). Statistics for spatial data. Wiley series in probability and mathematical

statistics. Applied probability and statistics. New York: John Wiley.

Crook, A. M., L. Knorr-Held, and H. Hemingway (2003). Measuring spatial effects in time

to event data: a case study using months from angiography to coronary artery bypass graft

(CABG). Statistics in Medicine 22(18), 2943–2961.

Cullis, B. R. and A. C. Gleeson (1991). Spatial analysis of field experiments-an extension to

two dimensions. Biometrics 47, 1449–1460.

Darroch, J. N., S. L. Lauritzen, and T. P. Speed (1980). Markov fields and log-linear interaction

models for contingency tables. The Annals of Statistics 8(3), 522–539.

Dawid, A. P. (1992). Applications of a general propagation algorithm for probabilistic expert

systems. Statistics and Computing 2(1), 25–36.

Dawid, A. P., U. Kjaerulff, and S. L. Lauritzen (1995). Hybrid propagation in junction trees.

In Advances in Intelligent Computing - Ipmu ’94, Volume 945 of Lecture Notes in Computer

Science, pp. 87–97. Springer Verlag KG.

Dillon, P., D. Page, J. Vanderzalm, P. Pavelic, S. Toze, E. Bekele, J. Sidhu, H. Prommer, S. Hig-

ginson, R. Regel, S. Rinck-Pfeiffer, M. Purdie, C. Pitman, and T. Wintgens (2008). A critical

evaluation of combined engineered and aquifer treatment systems in water recycling. Water

Science & Technology - WST 57(5), 753–762.

Durban, M., C. A. Hackett, J. W. McNicol, A. C. Newton, W. T. B. Thomas, and I. D. Currie

(2003). The practical use of semiparametric models in field trials. Journal of Agricultural

Biological and Environmental Statistics 8(1), 48–66.

Earnest, A., J. R. Beard, G. Morgan, D. Lincoln, R. Summerhayes, D. Donoghue, T. Dunn,

D. Muscatello, and K. Mengersen (2010). Small area estimation of sparse disease counts

using shared component models-application to birth defect registry data in New South Wales,

Australia. Health & Place 16, 684–693.

Edwards, D. (1995). Introduction to Graphical Modelling. New York: Springer-Verlag.

Eisenberg, J., E. Seto, A. Olivieri, and R. Spear (1996). Quantifying water pathogen risk in an

epidemiological framework. Risk Analysis 16, 549–563.

Eisenberg, J. N. S., M. A. Brookhart, G. Rice, M. Brown, and J. M. Colford Jr (2002). Disease

transmission models for public health decision making: Analysis of epidemic and endemic

conditions caused by waterborne pathogens. Environmental Health Perspectives 110(8), 783–

790.

Eisenberg, J. N. S., E. Y. W. Seto, J. M. Colford Jr, A. Olivieri, and R. C. Spear (1998). An anal-

ysis of the Milwaukee cryptosporidiosis outbreak based on a dynamic model of the infection

process. Epidemiology 9(3), 255–263.

Elliott, P. (2000). Spatial epidemiology : methods and applications. Oxford medical publica-

tions. Oxford: Oxford University Press.

Fewtrell, L. and J. Bartram (2001). Water Quality: Guidelines, Standards and Health. London:

World Health Organisation.


Gilmour, A. R., B. R. Cullis, and A. P. Verbyla (1997). Accounting for natural and extrane-

ous variation in the analysis of field experiments. Journal of Agricultural Biological and

Environmental Statistics 2, 269–293.

Gilmour, A. R., B. J. Gogel, B. R. Cullis, and R. Thompson (2005). ASReml User Guide

Release 2.0. Technical report, VSN International Ltd, Hemel Hempstead, UK.

Gilmour, A. R., R. Thompson, and B. R. Cullis (1995). Average information REML: an efficient

algorithm for variance parameter estimation in linear mixed models. Biometrics 51(4), 1440–

1450.

Gordon, C. and S. Toze (2003). Influence of groundwater characteristics on the survival of

enteric viruses. Journal of Applied Microbiology 95(3), 536–544.

Gotway, C. A. and N. A. C. Cressie (1990). A spatial analysis of variance applied to soil-water

infiltration. Water resources research 26(11), 2695–2703.

Gotway, C. A. and L. J. Young (2002). Combining incompatible spatial data. Journal of the

American Statistical Association 97(458), 632–648.

Green, P. J. and R. Sibson (1978). Computing Dirichlet tessellations in the plane. Computer

Journal 21, 168–173.

Haas, C. and J. N. Eisenberg (2001). Risk assessment. In L. Fewtrell and J. Bartram (Eds.),

Water Quality: Guidelines, Standards and Health. WHO.

Haas, C. N., J. B. Rose, and C. P. Gerba (1999). Quantitative Microbial Risk Assessment. New

York: Wiley.

Hall, G. (2004). Results from the National Gastroenteritis Survey 2001–2002. Technical Report

NCEPH Working Paper Number 50, National Centre for Epidemiology & Population Health.

Hall, G. and M. Kirk (2005). Foodborne illness in Australia annual incidence circa 2000. Tech-

nical report, Australian Government Department of Health and Ageing.


Hall, G., J. Raupach, and K. Yohannes (2006). An estimate of under-reporting of foodborne notifiable

diseases: Salmonella, Campylobacter, Shiga toxin producing E. coli (STEC). Technical

report, National Centre for Epidemiology & Population Health.

Hamilton, G. S., F. Fielding, A. W. Chiffings, B. T. Hart, R. W. Johnstone, and K. L. Mengersen

(2007). Investigating the use of a Bayesian network to model the risk of Lyngbya majuscula

bloom initiation in Deception Bay, Queensland. Human and Ecological Risk Assessment 13(6),

1271–1287.

Haskard, K. A., B. R. Cullis, and A. P. Verbyla (2007). Anisotropic Matérn correlation and spatial

prediction using REML. Journal of Agricultural, Biological, and Environmental

Statistics 12(2), 147–160.

Higdon, D. (1998). A process-convolution approach to modelling temperatures in the North

Atlantic Ocean. Environmental and Ecological Statistics 5, 173–190.

Hijnen, W. A., Y. J. Dullemont, J. F. Schijven, A. J. Hanzens-Brouwer, M. Rosielle, and

G. Medema (2007). Removal and fate of Cryptosporidium parvum, Clostridium perfrin-

gens and small-sized centric diatoms (Stephanodiscus hantzschii) in slow sand filters. Water

Research 41, 2151–2162.

Hijnen, W. A. M., E. Beerendonk, and G. J. Medema (2005). Elimination of micro-organisms

by drinking water processes a review. Technical report, Kiwa N.V., Nieuwegein, The Nether-

lands.

Hijnen, W. A. M., E. Beerendonk, P. Smeets, and G. J. Medema (2004). Elimination of micro-

organisms by water treatment processes. Technical report, Kiwa N.V., Nieuwegein, The

Netherlands.

Hijnen, W. A. M., J. F. Schijven, P. Bonne, A. Visser, and G. J. Medema (2004). Elimination

of viruses, bacteria and protozoan oocysts by slow sand filtration. Water Science & Technol-

ogy 50(1), 147–154.


Holbrook, N. and N. Bindoff (2000). A statistically efficient mapping technique for four-

dimensional ocean temperature data. Journal of Atmospheric and Oceanic Technology 17(6),

831–846.

Hrafnkelsson, B. and N. Cressie (2003). Hierarchical modeling of count data with application

to nuclear fall-out. Environmental and Ecological Statistics 10, 179–200.

Hrudey, S. E., P. M. Huck, P. Payment, R. W. Gillham, and E. J. Hrudey (2002). Walkerton:

Lessons learned in comparison with waterborne outbreaks in the developed world. Journal

of Environmental Engineering and Science 1(6), 397–407.

Hugin Expert A/S (2007). Hugin 6.9. Available on: www.hugin.com. Accessed: November 6,

2008.

Hugin Expert A/S (2007). Hugin Expert - Publications. Available on:

www.hugin.com/developer/Publications/. Accessed: November 6, 2008.

Hunter, P. R. and L. Fewtrell (2001). Acceptable risk. In L. Fewtrell and J. Bartram (Eds.),

Water Quality: Guidelines, Standards and Health. WHO.

Isaac, D. (2008a). Email: June 27, 2008: Re: Fw: Recycled water: measurements required

under licence by the Health Department.

Isaac, D. (2008b). Fit for purpose guidelines for recycled water. email, received June 26, 2008.

Jacobsen, K. and J. Koopman (2004). Declining hepatitis A seroprevalence: a global review and

analysis. Epidemiology and Infection 132, 1005–1022.

Jensen, F. (1994). Implementation aspects of various propagation algorithms in Hugin. Techni-

cal Report Research Report R-94-2014, Department of Mathematics and Computer Science,

Aalborg University, Denmark, Aalborg, Denmark.

Jensen, F. (2001). Bayesian Networks and Decision Graphs. Springer.


Jensen, F. V., S. H. Aldenryd, and K. B. Jensen (1995). Sensitivity analysis in Bayesian net-

works. Lecture Notes in Artificial Intelligence 946, 243.

Jensen, F. V., B. Chamberlain, T. Nordahl, and F. Jensen (1991). Analysis in Hugin of data con-

flict. In Proceedings of the Sixth Annual Conference on Uncertainty in Artificial Intelligence,

UAI ’90, New York, NY, USA, pp. 519–528. Elsevier Science Inc.

Karanis, P., C. Kourenti, and H. Smith (2007). Waterborne transmission of protozoan parasites:

A worldwide review of outbreaks and lessons learnt. Journal of Water and Health 5(1), 1–38.

Karim, M. R., E. P. Glenn, and C. P. Gerba (2008). The effect of wetland vegetation on the

survival of Escherichia coli, Salmonella typhimurium, bacteriophage MS-2 and polio virus.

Journal of Water and Health 06(2), 167–175.

Kennett, R. J., K. B. Korb, and A. E. Nicholson (2001). Seebreeze prediction using Bayesian

networks: a case study. Lecture Notes in Computer Science 2035, 148–153.

Khan, S. J. (2010). Quantitative chemical exposure assessment for water recycling schemes.

Waterlines Report Series, No 27. Australian Government National Water Commission.

Kinde, H., M. Adelson, A. Ardans, E. H. Little, D. Willoughby, D. Berchtold, D. H. Read,

R. Breitmeyer, D. Kerr, R. Tarbell, and E. Hughes (1997). Prevalence of Salmonella in

municipal sewage treatment plant effluents in Southern California. Avian Diseases 41(2),

392–398.

Knorr-Held, L. and J. Besag (1998). Modeling risk from a disease in time and space. Statistics

in Medicine 17, 2045–2060.

Korb, K. B. and A. E. Nicholson (2004). Bayesian Artificial Intelligence. London: CRC Press.

Laskey, K. B. (1995). Sensitivity analysis for probability assessments in Bayesian networks.

IEEE Transactions on Systems, Man and Cybernetics 25, 901–909.


Lauritzen, S. (1995). The EM algorithm for graphical association models with missing data.

Computational Statistics & Data Analysis 19, 191–201.

Lauritzen, S. L. and D. J. Spiegelhalter (1988). Local computations with probabilities on graph-

ical structures and their application to expert systems. Journal of the Royal Statistical Society.

Series B (Methodological) 50(2), 157–224.

Lemos, R. T. and B. Sanso (2009). A spatio-temporal model for mean, anomaly, and trend

fields of North Atlantic sea surface temperature. Journal of the American Statistical Associ-

ation 104(485), 5–18.

Lemos, R. T., B. Sanso, and M. L. Huertos (2007). Spatially varying temperature trends in

a central California estuary. Journal of Agricultural, Biological, and Environmental Statis-

tics 12(3), 379–396.

Lindgren, F., H. Rue, and J. Lindstrom (2011). An explicit link between Gaussian fields and

Gaussian Markov random fields: the stochastic partial differential equation approach. Journal

of the Royal Statistical Society: Series B (Statistical Methodology) 73(4), 423–498.

Lumina Decision Systems (2004). Analytica. Available on:

www.lumina.com/ana/editiondescriptions.htm. Accessed: April 24, 2008.



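Lunn, D. J., A. Thomas, N. Best, and D. Spiegelhalter (2000). WinBUGS - a Bayesian modelling

framework: concepts, structure, and extensibility. Statistics and Computing 10(4), 325–337.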

Macdonald, B. C. T., J. K. Reynolds, A. S. Kinsela, R. J. Reilly, P. van Oploo, T. D. Waite, and

I. White (2009). Critical coagulation in sulfidic sediments from an east-coast Australian acid

sulfate landscape. Applied Clay Science 46(2), 166–175.

Marcot, B. G. (2006). Characterizing species at risk I: Modeling rare species under the North-

west Forest Plan. Ecology and Society 11(2), 10.


Marcot, B. G., P. A. Hohenlohe, S. Morey, R. Holmes, R. Molina, M. C. Turley, M. H. Huff,

and J. A. Laurence (2006). Characterizing species at risk II: Using Bayesian belief networks

as decision support tools to determine species conservation categories under the Northwest

Forest Plan. Ecology and Society 11(2), 12.

Martin, J. E., T. Rivas, J. M. Matías, J. Taboada, and A. Argüelles (2009). A Bayesian network

analysis of workplace accidents caused by falls from a height. Safety Science 47(2), 206–214.

Martin, R. J., N. Chauhan, J. A. Eccleston, and B. S. P. Chan (2006). Efficient experimental

designs when most treatments are unreplicated. Linear Algebra and its Applications 417(1),

163–182.

Matias, J. M., T. Rivas, C. Ordonez, J. Taboada, and J. M. Matias (2007). Assessing the envi-

ronmental impact of slate quarrying using Bayesian networks and GIS. In AIP Conference,

Volume 963, pp. 1285–1288.

McCullough, N. B. and C. W. Eisele (1951a). Experimental human salmonellosis: I. pathogenic-

ity of strains of Salmonella meleagridis and Salmonella anatum obtained from spray-dried

whole egg. The Journal of Infectious Diseases 88(3), 278–289.

McCullough, N. B. and C. W. Eisele (1951b). Experimental human salmonellosis: II. Immunity

studies following experimental illness with Salmonella meleagridis and Salmonella anatum.

The Journal of Immunology 66(5), 595–608.

McCullough, N. B. and C. W. Eisele (1951c). Experimental human salmonellosis: III.

Pathogenicity of strains of Salmonella newport, Salmonella derby, and Salmonella bareilly

obtained from spray-dried whole egg. The Journal of Infectious Diseases 89(3), 209–213.

McCullough, N. B. and C. W. Eisele (1951d). Experimental human salmonellosis: IV.

Pathogenicity of strains of Salmonella pullorum obtained from spray-dried whole egg. The

Journal of Infectious Diseases 89(3), 259–265.


Mons, M., J. Van der Wielen, E. Blokker, M. Sinclair, K. Hulshof, F. Dangendorf, P. Hunter,

and G. Medema (2007). Estimation of the consumption of cold tap water for microbiological

risk assessment: an overview of studies and statistical analysis of data. Journal of Water and

Health 5(1), 151–170.

Nadebaum, P., M. Chapman, R. Morden, and S. Rizak (2004). A guide to hazard identification

& risk assessment for drinking water supplies. Technical report, CRC for Water Quality and

Treatment.

Natural Resource Management Ministerial Council, Environment Protection and Heritage

Council, and Australian Health Ministers Conference (2006). Australian Guidelines for

Water Recycling: Managing health and environmental risks (Phase 1) 2006. Available on:

www.ephc.gov.au/taxonomy/term/39. Accessed: March 29, 2008.

Nayyar, A., C. Hamel, G. Lafond, B. D. Gossen, K. Hanson, and J. Germida (2009). Soil micro-

bial quality associated with yield reduction in continuous-pea. Applied Soil Ecology 43(1),

115–121.

Neapolitan, R. E. and X. Jiang (2007). Probabilistic Methods for Financial and Marketing

Informatics. Elsevier.

Nicholson, A., S. Watson, and C. Twardy (2003). Using Bayesian net-

works for water quality prediction in Sydney Harbour. Available

online: www.csse.monash.edu.au/bai/talks/NSWDEC.ppt. Accessed: March 27, 2008.

Olivieri, A. W., R. Danielson, J. N. Eisenberg, L. Johnson, V. Pon, R. Sakaji, R. Soller, J. A.

Soller, J. Stephenson, and C. Trese (2007). Evaluation of microbial risk assessment tech-

niques and applications in water reclamation. Technical report, Water Environment Research

Foundation (WERF), Alexandria, VA. Available online: www.werf.org/AM/.

Palacios, M. P., P. Lupiola, M. T. Tejedor, E. Del-Nero, A. Pardo, and L. Pita (2001). Climatic


effects on Salmonella survival in plant and soil irrigated with artificially inoculated wastewater:

preliminary results. Water Science and Technology 43(12), 103–108.

Papadakis, J. S. (1937). Méthode statistique pour des expériences sur champ. Bulletin scientifique

d’amélioration des plantes de Thessalonique 23, 30.

Pearl, J. (1988). Probabilistic reasoning in intelligent systems : networks of plausible inference.

San Mateo, California: Morgan Kaufmann Publishers.

Petterson, S. and N. Ashbolt (2001). Viral risks associated with wastewater reuse: modeling

virus persistence on wastewater irrigated salad crops. Water Science and Technology 43(12),

23–26.

Petterson, S., N. Ashbolt, and A. Sharma (2001). Microbial risks from wastewater irrigation of

salad crops: A screening-level risk assessment. Water Environment Research 73(6), 667–672.

Petterson, S. A. and N. J. Ashbolt (2006). WHO Guidelines for the safe use of wastewater

and excreta in agriculture microbial risk assessment section. Technical report, World Health

Organization.

Petterson, S. R. (2002). Microbial Risk Assessment of Wastewater Irrigated Salad Crops. Ph.

D. thesis, University of New South Wales.

Piepho, H. P., A. Buchse, and K. Emrich (2003). A hitchhiker’s guide to mixed models for

randomized experiments. Journal of Agronomy and Crop Science 189(5), 310–322.

Piepho, H. P., A. Buchse, and C. Richter (2004). A mixed modelling approach for randomized

experiments with repeated measures. Journal of Agronomy and Crop Science 190(4), 230–

247.

Piepho, H. P. and J. O. Ogutu (2007). Simple state-space models in a mixed model framework.

American Statistician 61(3), 224–232.


Piepho, H. P., C. Richter, and E. Williams (2008). Nearest neighbour adjustment and linear

variance models in plant breeding trials. Biometrical Journal 50(2), 164–189.

Piepho, H. P. and E. R. Williams (2010). Linear variance models for plant breeding trials. Plant

Breeding 129(1), 1–8.

Pike, W. A. (2004). Modeling drinking water quality violations with Bayesian networks. Journal

of the American Water Resources Association 40(6), 1563–1578.

Pollino, C. A. and B. T. Hart (2005a). Bayesian approaches can help make better sense of

ecotoxicological information in risk assessments. Australian Journal of Ecotoxicology 11,

57–58.

Pollino, C. A. and B. T. Hart (2005b). Bayesian decision networks - going beyond expert elici-

tation for parameterisation and evaluation of ecological endpoints. In A. Voinov, A. Jakeman,

and A. Rizzoli (Eds.), Third Biennial Meeting: Summit on Environmental Modelling and

Software, Burlington, USA.

Pollino, C. A., O. Woodberry, A. E. Nicholson, K. B. Korb, and B. T. Hart (2007). Param-

eterisation and evaluation of a Bayesian network for use in an ecological risk assessment.

Environmental Modelling and Software 22, 1140–1152.

Poncet, C., V. Lemesle, L. Mailleret, A. Bout, R. Boll, and J. Vaglio (2010). Spatio-temporal

analysis of plant pests in a greenhouse using a Bayesian approach. Agricultural and Forest

Entomology 12(3), 325–332.

Rasmussen, J. (1997). Risk management in a dynamic society: a modelling problem. Safety

Science 27(2-3), 183–213.

Raso, G., P. Vounatsou, L. Gosoniu, M. Tanner, E. K. N’Goran, and J. Utzinger (2006). Risk

factors and spatial patterns of hookworm infection among schoolchildren in a rural area of

western Côte d’Ivoire. International Journal for Parasitology 36(2), 201–210.


Rassmussen, L. (1995). Bayesian network for blood typing and parentage verification of cattle.

Technical report, Department of Mathematics and Computer Science, Aalborg University,

Denmark. Hugin reference Hugin 6.9.

Reich, B., J. Hodges, and B. Carlin (2007). Spatial analyses of periodontal data using condition-

ally autoregressive priors having two classes of neighbor relations. Journal of the American

Statistical Association 102(477), 44–55.

Rendtorff, R. (1954). The experimental transmission of human intestinal protozoan parasites:

II. Giardia lamblia cysts given in capsules. American Journal of Hygiene 59, 209–220.

Ridgway, K., J. Dunn, and J. Wilkin (2002). Ocean interpolation by four-dimensional weighted

least squares-application to the waters around Australasia. Journal of Atmospheric and

Oceanic Technology 19(9), 1357–1375.

Rizak, S. and S. Hrudey (2007). Strategic water quality monitoring for drinking water safety.

Technical Report 37, CRC for Water Quality and Treatment.

Robinson, W. (1950). Ecological correlations and the behavior of individuals. American Socio-

logical Review 15(3), 351–357.

Robinson, W. (2009). Ecological correlations and the behavior of individuals. International

Journal of Epidemiology 38(2), 337–341.

Roser, D., S. Khan, C. Davies, R. Signor, S. Petterson, and N. Ashbolt (2006). Screening

health risk assessment for the use of microfiltration-reverse osmosis treated tertiary effluent

for replacement of environmental flows. Technical Report CWWT Report 2006-20, Centre

for Water and Waste Technology, School of Civil and Environmental Engineering, University

of NSW.

Roser, D., S. Petterson, R. Signor, and N. Ashbolt (2006). How to implement QMRA? to

estimate baseline and hazardous event risks with management end uses in mind. Technical


report, MicroRisk project co-funded by the European Commission under the Fifth Framework

Programme, Theme 4: Energy, environment and sustainable development (contract EVK1-

CT-2002-00123).

Rue, H. and L. Held (2005). Gaussian Markov random fields : theory and applications. Boca

Raton: Chapman & Hall/CRC.

Rue, H. and H. Tjelmeland (2002). Fitting Gaussian Markov random fields to Gaussian fields.

Scandinavian Journal of Statistics 29(1), 31–49.

Sahu, S. K. and P. Challenor (2008). A space-time model for joint modeling of ocean tempera-

ture and salinity levels as measured by Argo floats. Environmetrics 19(5), 509–528.

SAS Institute (2004). SAS Version 9.1.3. Cary, NC., USA: SAS Institute Inc.

Schabenberger, O. and C. A. Gotway (2005). Statistical methods for spatial data analysis. Texts

in statistical science. Boca Raton: Chapman & Hall/CRC.

Shillito, R. M., D. J. Timlin, D. Fleisher, V. R. Reddy, and B. Quebedeaux (2009). Yield

response of potato to spatially patterned nitrogen application. Agriculture Ecosystems &

Environment 129(1-3), 107–116.

Sidhu, J. P. S., J. Hanna, and S. G. Toze (2008). Survival of enteric microorganisms on grass

surfaces irrigated with treated effluent. Journal of Water and Health 06(2), 255–262.

Signor, R. and N. Ashbolt (2006). Pathogen monitoring offers questionable protection against

drinking-water risks: a QMRA (Quantitative Microbial Risk Analysis) approach to assess management

strategies. Water Science and Technology 54, 261–268. Erratum in Water Science

and Technology 54(11-12), 451.

Signor, R. S. (2007). Microbial risk implications of rainfall-induced runoff events entering a

reservoir used as a drinking-water source. Journal of Water Supply Research and Technology

- AQUA 56, 515–531.


Sinclair, M. (2005). Strategic review of waterborne viruses. Technical report, CRC for Water

Quality and Treatment.

Singh, M., R. S. Malhotra, S. Ceccarelli, A. Sarker, S. Grando, and W. Erskine (2003). Spatial

variability models to improve dryland field trials. Experimental Agriculture 39(02), 151–160.

Sleutel, S., J. Vandenbruwane, A. De Schrijver, K. Wuyts, B. Moeskops, K. Verheyen, and

S. De Neve (2009). Patterns of dissolved organic carbon and nitrogen fluxes in deciduous and

coniferous forests under historic high nitrogen deposition. Biogeosciences 6(12), 2743–2758.

Smeets, P. W. M. H., Y. J. Dullemont, P. H. A. J. M. V. Gelder, J. C. V. Dijk, and G. J. Medema

(2008). Improved methods for modelling drinking water treatment in quantitative microbial

risk assessment; a case study of Campylobacter reduction by filtration and ozonation. Journal

of Water and Health 6(3), 301–314.

Smeets, P. W. M. H., G. J. Medema, Y. J. Dullemont, P. H. A. J. M. V. Gelder, and J. C. V. Dijk

(2008). Case study of Campylobacter reduction by filtration and ozonation. Journal of Water

and Health 6, 301–314.

Smeets, P. W. M. H., G. J. Medema, G. Stanfield, J. C. v. Dijk, and L. C. Rietveld (2007). How

can the UK statutory Cryptosporidium monitoring be used for quantitative risk assessment of

Cryptosporidium in drinking water? Journal of Water and Health 5(1 (Suppl)), 107–118.

Song, H.-R., A. Lawson, R. B. D’Agostino Jr, and A. D. Liese (2011). Modeling type 1 and type

2 diabetes mellitus incidence in youth: An application of Bayesian hierarchical regression for

sparse small area data. Spatial and Spatio-temporal Epidemiology 2(1), 23–33.

Spiegelhalter, D. J., A. P. Dawid, S. L. Lauritzen, and R. G. Cowell (1993). Bayesian analysis

in expert systems. Statistical Science 8(3), 219–247.

Steck, H. (2001). Constrained-Based Structural Learning in Bayesian Networks Using Finite

Data Sets. Ph. D. thesis, Institut fur der Informatik der Technischen Universitat.


Stefanova, K. T., A. B. Smith, and B. R. Cullis (2009). Enhanced diagnostics for the spatial

analysis of field trials. Journal of Agricultural Biological and Environmental Statistics 14(4),

392–410.

Strahm, B. D., R. B. Harrison, T. A. Terry, T. B. Harrington, A. B. Adams, and P. W. Footen

(2009). Changes in dissolved organic matter with depth suggest the potential for posthar-

vest organic matter retention to increase subsurface soil carbon pools. Forest Ecology and

Management 258(10), 2347–2352.

Tanaka, H., T. Asano, E. D. Schroeder, and G. Tchobanoglous (1998). Estimating the safety

of wastewater reclamation and reuse using enteric virus monitoring data. Water Environment

Research 70(1), 39–51.

Tawk, H. M., K. Vickery, L. Bisset, W. Selby, and Y. E. Cossart (2006). The impact of hepatitis

B vaccination in a western country: recall of vaccination and serological status in Australian

adults. Vaccine 24(8), 1095–1106.

Teschke, K., Y. Chow, K. Bartlett, A. Ross, and C. van Netten (2001). Spatial and temporal

distribution of airborne Bacillus thuringiensis var. kurstaki during an aerial spray program for

gypsy moth eradication. Environmental Health Perspectives 109(1), 47–54.

Teunis, P. F. M., O. van der Heijden, J. W. B. van der Giessen, and A. H. Havelaar (1996). The

dose-response relation in human volunteers for gastro-intestinal pathogens. Technical report,

National Institute of Public Health and the Environment (RIVM), Bilthoven, The Netherlands.

Toze, S. (1999). PCR and the detection of microbial pathogens in water and wastewater. Water

Research 33(17), 3545–3556.

Toze, S. (2002). Review of the risk of groundwater contamination from microbial pathogen due

to the infiltration of treated effluent to groundwater at the Bridgetown wastewater treatment

plant. A consultancy report to the Water Corporation, WA. Technical report, CSIRO.


Toze, S. (2004). Literature Review on the Fate of Viruses and Other Pathogens and Health Risks

in Non-Potable Reuse of Storm Water and Reclaimed Water. CSIRO. Accessed: February 1,

2011.

Toze, S., J. Hanna, and J. Sidhu (2005). Microbial monitoring of the McGillivray Oval direct

reuse scheme Report to the Water Corporation WA. Technical report, CSIRO.

Toze, S., J. Hanna, A. Smith, and W. Hick (2002). Halls Head indirect treated wastewater reuse

scheme. Technical report, CSIRO.

Toze, S., J. Hanna, T. Smith, L. Edmonds, and A. McCrow (2004). Determination of water

quality improvements due to the artificial recharge of treated effluent. In J. Steenworden and

T. Endreny (Eds.), IAHS Publications-Series of Proceedings and Reports: Wastewater reuse

and groundwater quality, Volume 285, pp. 53–60. Wallingford [Oxfordshire]: IAHS, 1981-.

Trought, M. C. T. and R. G. V. Bramley (2011). Vineyard variability in Marlborough, New

Zealand: characterising spatial and temporal changes in fruit composition and juice quality

in the vineyard. Australian Journal of Grape and Wine Research 17(1), 79–89.

Van Allen, T., R. Greiner, and P. Hooper (2001). Bayesian error-bars for Belief Net inference. In

Proceedings of the Seventeenth Conference on Uncertainty in Artificial Intelligence (UAI-01),

Seattle. Citeseer.

Van Allen, T., A. Singh, R. Greiner, and P. Hooper (2008). Quantifying the uncertainty of a

Belief Net response: Bayesian error-bars for Belief Net inference. Artificial Intelligence 172,

483–513.

Varis, O. (1995). Belief networks for modelling and assessment of environmental change. En-

vironmetrics 6, 439–444.

Varis, O. (1997). Bayesian decision analysis for environmental and resource management. En-

vironmental Modelling and Software 12(2-3), 177–185.


Varis, O. (1998). A belief network approach to optimization and parameter estimation: applica-

tion to resource and environmental management. Artificial Intelligence 101(1-2), 135–163.

Verbyla, A., B. Cullis, M. Kenward, and S. Welham (1999). The analysis of designed exper-

iments and longitudinal data by using smoothing splines. Journal of the Royal Statistical

Society: Series C (Applied Statistics) 48(3), 269–311.

VSN International (2011). Genstat. Available online:

http://www.vsni.co.uk/software/genstat/.

Waller, L. A., B. P. Carlin, H. Xia, and A. E. Gelfand (1997). Hierarchical spatio-temporal

mapping of disease rates. Journal of the American Statistical Association 92(438), 607–617.

Wang, L. A. and Z. Goonewardene (2004). The use of mixed models in the analysis of animal

experiments with repeated measures data. Canadian Journal of Animal Science 84(1), 1–11.

Ward, R., D. Bernstein, E. Young, J. Sherwood, D. Knowlton, and G. Schiff (1986). Human

Rotavirus studies in volunteers: determination of infectious dose and serological response to

infection. Journal of Infectious Diseases 154(5), 871–880.

Water Corporation (2010). Subiaco Wastewater Treatment Plant Annual Report 2009-10. Tech-

nical Report PM-3851463, Water Corporation, Perth, Western Australia.

Water Corporation (2011a). McGillivray Oval Irrigation Project. Available on-

line: http://www.watercorporation.com.au/M/mcgillivray_oval.cfm. Accessed:

February 2, 2011.

Water Corporation (2011b). Subiaco treatment plant. Available online:

http://www.watercorporation.com.au/W/wwtp_subiaco.cfm. Accessed: February

2, 2011.

Water Environment Research Foundation, A. Olivieri, and C. Summers (2007). Assessing risk


of pathogens in separate stormwater systems. Available online: http://www.werf.org/am/.

Accessed: February 17, 2011.

Weidl, G., A. Madsen, and E. Dahlquist (2003). Object oriented Bayesian network for industrial

process operation.

Weidl, G., A. L. Madsen, and S. S. Israelson (2005). Applications of object-oriented Bayesian

networks for condition monitoring, root cause analysis and decision support on operation of

complex continuous processes. Computers and Chemical Engineering 29, 1996–2009.

Wermuth, N. and D. R. Cox (1998). On association models defined over independence graphs.

Bernoulli 4(4), 477–495.

Westrell, T., O. Bergstedt, T. Stenstrom, and N. Ashbolt (2003). A theoretical approach to

assess microbial risks due to failures in drinking water systems. International Journal of

Environmental Health Research 13, 181–197.

Whelan, B. M., A. B. McBratney, and B. Minasny (2001). Vesper-spatial prediction software

for precision agriculture. In Third European Conference on Precision Agriculture. (G. Gre-

nier, S. Blackmore Eds.) pp. 139-144. Agro Montpellier, Ecole Nationale Agronomique de

Montpellier., pp. 18–20. Citeseer.

Whittaker, J. (1990). Graphical Models in Multivariate Statistics. Chichester (England); New

York: Wiley.

Williams, E. R. (1986). A neighbour model for field experiments. Biometrika 73(2), 279–287.

Wong, V. N. L., B. W. Murphy, T. B. Koen, R. S. B. Greene, and R. C. Dalal (2008). Soil organic

carbon stocks in saline and sodic landscapes. Australian Journal of Soil Research 46(4), 378–

389.

Woo, D. M. and K. J. Vicente (2003). Sociotechnical systems, risk management, and public


health: comparing the North Battleford and Walkerton outbreaks. Reliability Engineering &

System Safety 80(3), 253–269.

Yan, P. and M. K. Clayton (2006). A cluster model for space-time disease counts. Statistics in

Medicine 25(5), 867–881.

Statement of Contribution of Co-Authors for Thesis by Publication

The authors listed below certify that:

1. they meet the criteria for authorship, in that they have participated in the conception, execution, or interpretation of at least that part of the publication in their field of expertise;

2. they take public responsibility for their part of the publication, except for the responsible author who accepts overall responsibility for the publication;

3. there are no other authors according to these criteria;

4. potential conflicts of interest have been disclosed to (a) granting bodies, (b) the editor or publisher of journals or other publications, and (c) the head of the responsible academic unit, and

5. they agree to the use of the publication in the student's thesis and its publication on the Australasian Digital Thesis database consistent with any limitations set by publisher requirements.

In the case of Chapter 3:

Title: Bayesian Network for Risk of Diarrhoea Associated with the Use of Recycled Water

Journal: Risk Analysis

Status: Published 2009, 29(12), 1672–1685

Contributor    Statement of Contribution    Signature    Date

Margaret Donald    As first author, was responsible for the concept of the paper, data analysis, interpretation and the writing of all drafts.    [signature]    [date]

Dr Angus Cook    Was responsible for the definition of the network and for editorial comment.    [signature]    [date]

Professor Kerrie Mengersen    Was responsible for general advice and editorial comment.    [signature]    [date]

Principal supervisor's Confirmation

I have sighted email or other correspondence from all co-authors confirming their certifying authorship.

Name    Signature    Date

Chapter 3

Network for Risk of Diarrhoea

Associated with the Use of Recycled

Water

3.1 Preamble

This chapter has been written as a journal article, and addresses research objective (1) which

aimed to build credible intervals for point estimates of a Bayesian net (BN) in the context of a

risk assessment.

The technique used was to recast the BN as a DAG within WinBUGS [Lunn et al., 2000], and then to elicit priors for the uncertainty of the conditional probabilities required by the BN. Somewhat annoyingly, the method demands that the user postulate a sample size for the population at risk. The other simple idea was to recognise that conditioning on the realisation of a particular node was equivalent to subsetting the data. Even for small (and large) probabilities the method outlined would seem necessary, as the complex mixtures which give rise to the final point estimates of the proportions are not well approximated by a simple binomial distribution or a normal distribution. For further discussion see the addendum to this chapter (Section 3.9) and Figures 3.4-3.5.


WinBUGS code for the model 2 BN is found in Section A.

I am the principal author and the paper is presented here in its entirety, but with different bibliographic conventions from the journal, Risk Analysis, in which it was published. Angus Cook provided the framework and asked the key question about confidence limits; Kerrie Mengersen oversaw, guided and elicited ideas. Margaret Donald as first author was responsible for the concept of the paper, data analysis, interpretation, writing all drafts and addressing reviewers’ comments.

Title: Network for Risk of Diarrhoea Associated with the Use of Recycled Water

Authors: Margaret Donald^a, Angus Cook^b, Kerrie Mengersen^a.

^a School of Mathematical Sciences, Queensland University of Technology, GPO Box 2434, Brisbane, QLD 4001, Australia.

^b The University of Western Australia, 35 Stirling Highway, Crawley, WA 6009, Australia.

In this paper we take a Bayesian net, and consider various possible outcomes. The paper’s

contribution is to express those outcomes as credible intervals.

The conceptual model used represents the factors and pathways by which recycled water may pose a risk of contracting gastroenteritis. This was converted to a Bayesian net and quantified using Angus Cook’s expert opinion. Bayesian nets are an important aid in conceptualising complex relationships. The quantification of conditional probabilities and the consequences which flow from the net and those probabilities permits the possibility of adjusting both net and probabilities so that the consequences match our understandings and the data.

The method was to create Markov chain Monte Carlo samples for all nodes of the net, having recognised that the Directed Acyclic Graph (DAG) structure of Bayesian nets (BN) is replicated in the WinBUGS software [Lunn et al., 2000]. The technique involved eliciting uncertainty bounds for all elements of the Conditional Probability Tables (CPTs), and matching moments of the Beta distribution to those bounds, thereby adding an extra node to the DAG of the original BN for each element of any CPT.

There were a number of conditional outcomes of interest and these were found by forming


the subset which matched the condition within each MCMC iteration. To allow estimation

of the relevant ratios, a single MCMC iteration ran for a fixed population size of 50000 (a

number chosen to ensure that each condition had a reasonably sized denominator within the

iteration). Within the iteration, counts satisfying each outcome were found and ratios for the

relevant condition found. These were then able to be summarised to give 95% credible intervals.

This method is quite general and does not require the ability to calculate partial differentials

as do, for example, the methods of Van Allen et al. [2001] or Chan and Darwiche [2004]. (These

papers came to my attention after the paper had been accepted.)

Bibliography

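Chan, H. and A. Darwiche (2004). Sensitivity analysis in Bayesian networks: from single to multiple parameters. In UAI ‘04 Proceedings of the 20th Conference on Uncertainty in Artificial Intelligence, pp. 67–75. AUAI Press.

Lunn, D. J., A. Thomas, N. Best, and D. Spiegelhalter (2000). WinBUGS - a Bayesian modelling framework: concepts, structure, and extensibility. Statistics and Computing 10, 325–337.

Van Allen, T., R. Greiner, and P. Hooper (2001). Bayesian error-bars for belief net inference. In Proceedings of the Seventeenth Conference on Uncertainty in Artificial Intelligence (UAI-01), Seattle. Citeseer.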

3.2 Network for Risk of Diarrhoea Associated with the Use of Recycled Water

Margaret Donald^a, Kerrie Mengersen^a, Angus Cook^b

^a School of Mathematical Sciences, Queensland University of Technology, GPO Box 2434, Brisbane, QLD 4001, Australia.

^b The University of Western Australia, 35 Stirling Highway, Crawley, WA 6009, Australia.

Abstract

Estimating potential health risks associated with recycled (reused) water is highly complex given the

multiple factors affecting water quality. We take a conceptual model which represents the factors and

pathways by which recycled water may pose a risk of contracting gastroenteritis, convert the conceptual

model to a Bayesian net and quantify the model using one expert’s opinion. This allows us to make

various predictions as to the risks posed under various scenarios. Bayesian nets provide an additional

way of modelling the determinants of recycled water quality and elucidating their relative influence on

a given disease outcome. The important contribution to Bayesian net methodology is that all model

predictions, whether risk or relative risk estimates, are expressed as credible intervals.

Keywords

Bayesian nets, credible intervals, recycled water, gastroenteritis, expert opinion.


3.3 Introduction

With climate change and increasingly prolonged droughts affecting most states in Australia, interest has

developed in reusing waste water. Recycled water is currently being used in numerous schemes globally

[Anderson, 2007, Asano, 1998]. Sewage has been associated with the outbreak of waterborne diseases

since John Snow’s pioneering work on cholera [Snow, 1849, 1855] and continues to be associated with

the outbreak of enteric diseases world-wide, where cross-contamination of water distribution systems, or

poorly treated sewage-contaminated source waters continue to give rise to epidemics [Nadebaum et al.,

2004]. It is therefore of major public health significance to assess potential risks associated with the use

of recycled water.

Various forms of risk assessment have been used to determine the safety of recycled water. In particular, Quantitative Microbial Risk Assessment (QMRA) is currently the method of choice for assessing the risk of infection due to consumption of drinking water [Ashbolt et al., 2005, Haas and Eisenberg, 2001, Roser et al., 2006]. However, such assessments may often be limited by a paucity of data either for source or finished water (wastewater). There may also be limitations in the capacity of such risk assessments to determine which process components are contributing to disease risk. In this paper we adopt a supplementary analysis based on a Bayesian network (BN) to help inform the process of risk estimation. Networks also provide insight into possible problems arising in recycled water systems because starting conditions can be varied to explore a range of scenarios of interest. The model presented is not intended to serve as a replacement for a comprehensive QMRA, and indeed QMRA estimates may be used to guide the inputs into such a network.

A BN is a graphical model with an underlying probabilistic framework that characterises and quantifies an outcome of interest, and the variables and their interactions associated with this outcome. It is a form of directed acyclic graph (DAG). See Cowell et al. [2001] or Korb and Nicholson [2004] for more details. Bayesian networks have been used widely in environmental literature [Hamilton et al., 2007, Nicholson et al., 2003, Pike, 2004, Pollino and Hart, 2005a,b, Pollino et al., 2007, Varis, 1995, 1997, 1998]. Most of these examples integrate expert opinion and data to quantify the probability tables underlying the network.

Recycled water may differ considerably with respect to quality depending on the treatment and reuse purpose [Natural Resource Management Ministerial Council et al., 2006], and the possible exposure pathways will also differ. In this paper, we illustrate a generic Bayesian network for contaminants which may enter a recycled water scheme and consider the potential health risks with respect to enteric pathogens which may still remain in the recycled water component. This model may then be extended to many other contexts, including specific schemes and distribution systems with their own inputs (based on expert opinion and/or series of process data).

A general model representing the factors and pathways involved in such a process was developed by Angus Cook and David Roser∗. This conceptual framework comprises six components: recycled water and distribution pathways, exposure pathways and populations, cumulative end-user dose, identified toxicity and pathogenicity pathways, individual covariates and health endpoints. This model was used as the basis for the development of the BN as described in Section 3.4. An exposition of results arising from interrogation of the network is provided in Section 3.5, followed by a general discussion in Section 3.6.

The purpose of the paper is fourfold: firstly to illustrate a Bayesian network for the assessment of a

recycled water system; secondly, to quantify the model using one expert’s opinion; thirdly, to assess the

sensitivity of the model to its various parent nodes; and finally, to add uncertainty to the probabilities and

conditional probabilities of the nodes in order to examine the difference these uncertainties might make

to the model’s predictions.

A secondary purpose of the paper is to highlight the fact that a BN is a directed acyclic graph (DAG),

and as such can be represented not only by such purpose built software as Hugin [Hugin Expert A/S,

2007] and Netica [Norsys Software Corp., 2007], but also with any software which represents DAGs. In

this case, we have used WinBUGS [Lunn et al., 2000] as this software allows one to add uncertainty to

every node and prediction (including relative risks) of the BN.

3.4 Methods

3.4.1 Development of a conceptual model

A conceptual model developed by Cook and Roser (Figure 3.1) was represented as a Bayesian Network. The model was not designed to reflect a particular recycled water system, but to indicate the various factors (or nodes) that influence whether the standard of the water is likely to be classified as acceptable (‘safe’) or unacceptable (‘unsafe’). The Bayesian network comprises two distinct subnetworks describing water supply distribution endpoints and health endpoint, joined by a directed link. In this particular example, the health endpoint considered was gastroenteritis, although the framework is appropriate for a wider range of health outcomes.

∗Dr. David J. Roser, Centre for Water and Waste Technology, University of New South Wales, Australia. Dr. Angus G. Cook, School of Population Health, University of Western Australia. The conceptual model was prepared as part of the project “Assessing the Public Health Impacts of Recycled Water Use”, funded by the Western Australian Government through the Department of Water’s Water Fund.

Each node of the network was ascribed categories or ordinal levels. The underlying conditional probability tables (CPTs) and possible ranges for these probabilities were based on expert opinion in epidemiological water risk assessment (provided by Angus Cook). The structure of the network, the underlying probabilities and the resultant tentative model predictions were presented at two workshops: one consisting of environmental health authorities from the University of Western Australia School of Population Health and the Western Australian Department of Health, the other consisting of water and wastewater researchers from the Centre of Water and Wastewater Technology (University of New South Wales). Hereafter, this network is referred to as Model 1.

The nodes of this network may be variously defined. For example, the final outcome node of the

BN, ‘Gastroenteritis’ has a number of possible interpretations: the rate of gastroenteritis episodes per

person per year, or perhaps, the rate of gastroenteritis hospital admissions, where the gastroenteritis

is attributable to, or associated with, recycled water in this hypothetical model. Further discussion of

possible meanings for nodes is deferred until the discussion in Section 3.6.

The expert, Angus Cook, was asked to express uncertainty about all probabilities used in Model 1,

by specifying a 95% confidence interval for the probability. This information was incorporated in an

augmented BN (Model 2), where the elicited 95% confidence interval for the uncertainty for each binary

node was re-expressed as a beta distribution with parameters (α, β) that were determined on the basis of

the ranges of probability values provided. These beta distributions became the priors to the relevant nodes

in Model 2. To allow appropriate comparisons between Model 1 (without uncertainty) and Model 2 (with

uncertainty), the expected value of each prior in Model 2 was set to equal the value of the corresponding

probability in Model 1; hence the probabilities of Model 2, both conditional and marginal, are those

obtained from Model 1, provided the MCMC chain is sufficiently long for the desired accuracy. In all,

some 40 priors were needed in Model 2 to express the uncertainties associated with the conditional and

unconditional probabilities of Model 1.

To illustrate the difference between Model 1 and Model 2: In Model 1, the parent node 1 (Primary Source Water) was given a probability of .01 of meeting an acceptable standard. In Model 2, the probability of Primary Source Water meeting an acceptable standard is drawn from a Beta distribution, Beta(6.751, 668.37), whose expected value is .01, and which 95% of the time returns probability values between .004 and .02.
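The properties quoted for this prior can be checked by simulation. The following is a minimal Python sketch, not part of the WinBUGS code used for the paper; the function name `beta_summary`, the number of draws and the seed are illustrative assumptions.

```python
import random


def beta_summary(alpha, beta, draws=100_000, seed=1):
    """Monte Carlo mean and central 95% interval of a Beta(alpha, beta) prior."""
    rng = random.Random(seed)
    sample = sorted(rng.betavariate(alpha, beta) for _ in range(draws))
    mean = sum(sample) / draws
    lower = sample[int(0.025 * draws)]       # empirical 2.5th percentile
    upper = sample[int(0.975 * draws) - 1]   # empirical 97.5th percentile
    return mean, lower, upper


# Prior for node 1 (Primary Source Water) as given in the text.
mean, lower, upper = beta_summary(6.751, 668.37)
print(f"mean = {mean:.4f}, central 95% interval = ({lower:.4f}, {upper:.4f})")
```

The interval is read from the sorted draws, so its end points carry Monte Carlo error in the last reported digit.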

3.4.2 Determination of prior probabilities

In determining the beta priors†, we used the method of moments to determine (α, β). For details, see

Spiegelhalter et al. [1994] where Dirichlet priors were developed for use in a congenital heart disease

network. Dirichlet priors may be used in Hugin (as ‘experience’), but such priors are used when updating

a node within the network on a case by case basis. They do not add uncertainty to model predictions.

The age structure for the population to which the BN probabilities were applied was taken from Australian population data [Anon., 2008] for 2000. No priors were placed on the multinomial age distribution, as age structures evolve slowly in Australia.

3.4.3 Set up and use of models

Detailed description of the model setup

Model 1 is described fully by Figure 3.2, together with the conditional and unconditional probabilities given in Table 3.1 in column p. Properties and likelihood functions are well explained in such books as that by Cowell et al. [2001]. Model 2 is identical in framework, but the conditional and unconditional probabilities vary to have the corresponding mean of p and its variance (elicited from the expert), and are drawn from the Beta distribution with parameters (α, β).

The values of α and β were found by considering the upper and lower bounds in Table 3.3 as 95%ile limits for the distribution. These were used to deduce $2 \times 1.96 \times \sqrt{\mathrm{Var}(X)}$, and then the moments (the mean and the variance) were matched to solve for α and β, where

$$E(X) = \frac{\alpha}{\alpha+\beta}, \qquad \mathrm{Var}(X) = \frac{\alpha\beta}{(\alpha+\beta)^2(\alpha+\beta+1)}.$$
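The moment matching just described can be sketched in Python as follows. This is an illustration under stated assumptions rather than the code used for the paper: the function name `beta_from_mean_and_range` is hypothetical, and the width of the elicited range is taken to equal 2 × 1.96 standard deviations, as in the text. The inputs are the values quoted for node 1 in Section 3.5.1 (a probability of .01 with a 95% range of .005 to .02).

```python
def beta_from_mean_and_range(mean, lower, upper, z=1.96):
    """Match the mean and variance of a Beta(alpha, beta) to an elicited range.

    The range (lower, upper) is treated as spanning 2 * z standard deviations.
    """
    sd = (upper - lower) / (2 * z)
    variance = sd ** 2
    # From Var(X) = m(1 - m) / (alpha + beta + 1), with m = alpha / (alpha + beta).
    total = mean * (1 - mean) / variance - 1   # alpha + beta
    if total <= 0:
        raise ValueError("range too wide for a Beta distribution with this mean")
    return mean * total, (1 - mean) * total


alpha, beta = beta_from_mean_and_range(0.01, 0.005, 0.02)
print(f"alpha = {alpha:.3f}, beta = {beta:.2f}")
```

Solving the two moment equations gives α + β = m(1 − m)/Var(X) − 1, after which α = m(α + β) and β = (1 − m)(α + β).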

†$p \sim \mathrm{Beta}(\alpha, \beta) \Rightarrow f(p) = \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)}\, p^{\alpha-1}(1-p)^{\beta-1}$, $0 < p < 1$.

Thus, if we consider nodes 1 and 5 which feed into node 2, and use $X_i$ to describe the (0,1) output from node i, then under the simple Bayesian network (Model 1), and using B(π) to represent a Bernoulli distribution with parameter π, and Be(α, β) to represent a Beta distribution with parameters (α, β), we have

$$X_1 \sim \mathrm{B}(.01), \qquad X_5 \sim \mathrm{B}(.01),$$

and $X_2$ is given by

$$X_2 \mid X_1 = 0, X_5 = 0 \sim \mathrm{B}(.96), \qquad X_2 \mid X_1 = 0, X_5 = 1 \sim \mathrm{B}(.98),$$

$$X_2 \mid X_1 = 1, X_5 = 0 \sim \mathrm{B}(.99), \qquad X_2 \mid X_1 = 1, X_5 = 1 \sim \mathrm{B}(.98).$$
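To make the Model 1 specification concrete, the marginal probability that $X_2 = 1$ can be obtained by summing over the four parent configurations. The Python sketch below is illustrative only (the names `P_X1`, `P_X5`, `P_X2_GIVEN` and `marginal_x2` are hypothetical) and assumes, as in the display above, that nodes 1 and 5 are independent parents of node 2.

```python
from itertools import product

P_X1 = 0.01   # P(X1 = 1)
P_X5 = 0.01   # P(X5 = 1)
# P(X2 = 1 | X1 = x1, X5 = x5), keyed by (x1, x5).
P_X2_GIVEN = {(0, 0): 0.96, (0, 1): 0.98, (1, 0): 0.99, (1, 1): 0.98}


def bernoulli_pmf(p, x):
    """Probability that a Bernoulli(p) variable takes the value x."""
    return p if x == 1 else 1 - p


def marginal_x2():
    """P(X2 = 1), summing over all configurations of the two parents."""
    total = 0.0
    for x1, x5 in product((0, 1), repeat=2):
        weight = bernoulli_pmf(P_X1, x1) * bernoulli_pmf(P_X5, x5)
        total += weight * P_X2_GIVEN[(x1, x5)]
    return total


print(f"P(X2 = 1) = {marginal_x2():.6f}")
```

This is the same enumeration that BN software performs when it reports a marginal probability for a child node.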

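For Model 2, the p parameter of the Bernoulli distribution is, itself, stochastic and we have from Table 3.1 and Figure 3.2, and using the same naming convention as above,

$$p_1 \sim \mathrm{Be}(6.751, 668.37), \qquad X_1 \sim \mathrm{B}(p_1),$$

$$p_5 \sim \mathrm{Be}(.599, 59.25), \qquad X_5 \sim \mathrm{B}(p_5),$$

while $p_2$ and $X_2$ are given by

$$p_{2(1)} \sim \mathrm{Be}(55.687, 2.32), \qquad p_{2(2)} \sim \mathrm{Be}(117.083, 2.39),$$

$$p_{2(3)} \sim \mathrm{Be}(375.525, 3.79), \qquad p_{2(4)} \sim \mathrm{Be}(5.135, .01),$$

$$X_2 \mid X_1 = 0, X_5 = 0 \sim \mathrm{B}(p_{2(1)}), \qquad X_2 \mid X_1 = 0, X_5 = 1 \sim \mathrm{B}(p_{2(2)}),$$

$$X_2 \mid X_1 = 1, X_5 = 0 \sim \mathrm{B}(p_{2(3)}), \qquad X_2 \mid X_1 = 1, X_5 = 1 \sim \mathrm{B}(p_{2(4)}).$$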

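At Node 4 in Model 2, the output $X_4$ depends only on the values of $X_3$ and $X_7$. Thus,

$$X_4 \mid X_7 = 0, X_3 = 0 \sim \mathrm{B}(p_{4(1)}), \qquad X_4 \mid X_7 = 0, X_3 = 1 \sim \mathrm{B}(p_{4(2)}),$$

$$X_4 \mid X_7 = 1, X_3 = 0 \sim \mathrm{B}(p_{4(3)}), \qquad X_4 \mid X_7 = 1, X_3 = 1 \sim \mathrm{B}(p_{4(4)}),$$

where

$$p_{4(1)} \sim \mathrm{Be}(8.335, 3.57), \qquad p_{4(2)} \sim \mathrm{Be}(21.054, 5.26),$$

$$p_{4(3)} \sim \mathrm{Be}(30.217, 3.36), \qquad p_{4(4)} \sim \mathrm{Be}(152.358, .15).$$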

In Model 2, the entire network is embedded in a for-loop of a particular population size, N. This means that each simulation consists of N draws of X1 (and hence of every node in the network), thereby allowing us to calculate ratios and relative risks for that population in each of the 12000 simulations.
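The loop structure can be illustrated with a small Python sketch restricted to nodes 1, 5 and 2, using the Beta priors quoted above. It is an assumption-laden illustration, not the WinBUGS code of the paper: the function names, the seed, the choice of condition (X5 = 0) and the reduced population size and iteration count are all chosen here only to keep the example quick to run.

```python
import random

# Beta priors copied from the Model 2 description of nodes 1, 5 and 2.
PRIOR_P1 = (6.751, 668.37)
PRIOR_P5 = (0.599, 59.25)
PRIOR_P2 = {
    (0, 0): (55.687, 2.32),
    (0, 1): (117.083, 2.39),
    (1, 0): (375.525, 3.79),
    (1, 1): (5.135, 0.01),
}


def simulate_iteration(rng, population_size):
    """One iteration: draw the probabilities once, then loop over the population."""
    p1 = rng.betavariate(*PRIOR_P1)
    p5 = rng.betavariate(*PRIOR_P5)
    p2 = {key: rng.betavariate(a, b) for key, (a, b) in PRIOR_P2.items()}

    x2_total = 0       # individuals with X2 = 1 in the whole population
    subset_size = 0    # individuals satisfying the condition X5 = 0
    x2_in_subset = 0   # individuals with X2 = 1 within that subset
    for _ in range(population_size):
        x1 = int(rng.random() < p1)
        x5 = int(rng.random() < p5)
        x2 = int(rng.random() < p2[(x1, x5)])
        x2_total += x2
        if x5 == 0:
            subset_size += 1
            x2_in_subset += x2

    marginal = x2_total / population_size
    # The conditional ratio is undefined if nobody satisfies the condition.
    conditional = x2_in_subset / subset_size if subset_size else None
    return marginal, conditional


def credible_interval(values, level=0.95):
    """Empirical central interval from the simulated values."""
    ordered = sorted(values)
    tail = (1 - level) / 2
    lower = ordered[int(tail * len(ordered))]
    upper = ordered[int((1 - tail) * len(ordered)) - 1]
    return lower, upper


rng = random.Random(2009)
# Reduced from the paper's N = 50000 and 12000 iterations to keep the sketch fast.
results = [simulate_iteration(rng, population_size=2000) for _ in range(300)]
marginals = [m for m, _ in results]
conditionals = [c for _, c in results if c is not None]

print("mean P(X2 = 1):", sum(marginals) / len(marginals))
print("95% interval for P(X2 = 1):", credible_interval(marginals))
print("95% interval for P(X2 = 1 | X5 = 0):", credible_interval(conditionals))
```

Conditioning is done purely by counting within the subset that satisfies the condition, so no derivatives or analytic marginalisation are needed.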

Models 1 and 2 give marginal probabilities of the nodes and conditional probabilities of interest.

For example, scenarios that are explored include the probability of gastroenteritis for a child aged less

than five years, when the “cumulative dose” (CD) is acceptable, or the probability of gastroenteritis for

an adult when there is a failure to achieve the acceptable standard at the end point distribution. The



conditional probabilities were also represented as risks relative to the risk when the cumulative dose is

acceptable, for the group considered. All model predictions are thus represented initially in terms of

probabilities and are then expressed as relative risks of gastroenteritis, relative to the input nodes at the

safest settings (that is, least likely to lead to gastroenteritis). All relative risks are thus relative to the risk

posed for the comparable group when the cumulative dose (CD) is acceptable‡.

The impact of the various nodes on the final outcome node of ‘Gastroenteritis’ (and on the node ‘Endpoint Distribution’) was assessed using the mutual information and the variance of belief for the various nodes. The mutual information calculations are based on Pearl [1988], while the variance of belief calculations are based on Spiegelhalter [1989]. The mutual information for two nodes (X,Y) is a measure of the distance between P(X)P(Y) and P(X,Y)§, while the variance of belief is a variance measure which again measures the effect of one node on another.
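For readers who wish to compute the mutual information of two discrete nodes from their joint probability table, a short Python sketch follows. It is illustrative only: the function name `mutual_information` and the two example tables are hypothetical, and natural logarithms are assumed (BN packages may report the quantity in other units).

```python
from math import log


def mutual_information(joint):
    """Mutual information I(X, Y) from a joint table, joint[x][y] = P(X = x, Y = y)."""
    p_x = [sum(row) for row in joint]          # marginal distribution of X
    p_y = [sum(col) for col in zip(*joint)]    # marginal distribution of Y
    total = 0.0
    for x, row in enumerate(joint):
        for y, p_xy in enumerate(row):
            if p_xy > 0:                       # zero cells contribute nothing
                total += p_xy * log(p_xy / (p_x[x] * p_y[y]))
    return total


# Hypothetical tables: independent nodes, and perfectly dependent nodes.
independent = [[0.25, 0.25], [0.25, 0.25]]
dependent = [[0.5, 0.0], [0.0, 0.5]]
print(mutual_information(independent), mutual_information(dependent))
```

The measure is zero when P(X,Y) = P(X)P(Y) and grows as the joint distribution departs from the product of the marginals.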

The BN (Model 1) was analysed using both Netica [Norsys Software Corp., 2007] and Hugin [Hugin Expert A/S, 2007]. As discussed earlier, the BN is a directed acyclic graph and thus may also be analysed in a Bayesian Markov Chain Monte Carlo framework. Thus, uncertainty was added to all the underlying probabilities of Model 1, to give Model 2, using WinBUGS [Lunn et al., 2000], a Bayesian MCMC graphical package. The purpose of translating Model 1 to Model 2 was to associate uncertainty with the marginal and conditional probability and relative risk point estimates from the BN of Model 1. The BN software is a convenient framework in which to calculate such estimates, but does not provide uncertainty analysis and corresponding credible (or confidence) intervals. Thus, we used Netica and Hugin to draw inferences of interest (see Section 3.5).

We then built a somewhat more complex model (Model 2) in WinBUGS, which mimicked Model 1 in order to find posterior credible intervals for all point estimates. In particular, the conditional probabilities found under Model 1 are found by subsetting the sample at each iteration to satisfy the particular condition.

‡Relative Risk = Probability(gastroenteritis for the specific age group under scenario) / Probability(gastroenteritis for the specific age group when cumulative dose is acceptable).

§Defined as $I(X,Y) = \sum_{Y} P(Y) \sum_{X} P(X \mid Y) \log \frac{P(X,Y)}{P(X)P(Y)}$.

In order to quantify the variability of potential health outcomes, a population sample size is required, since beta distributions give rise to binomial outcomes. Although the marginal probabilities have the size of the whole population as their denominator, the conditional probabilities are based on the subset of the sample population which satisfies the condition. Thus, with small sample sizes, some relative risks may not always be able to be calculated. In this example, we needed to choose a population size that was sufficiently large that the various desired conditional probabilities and relative risks were always able to be estimated. A population size of 50000 was selected which enabled the construction of credible intervals for all relative risks and conditional probabilities found using the BN.
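The dependence on population size can be illustrated with a simple calculation. In the Python sketch below, the condition probability `q = 0.0005` is a purely hypothetical value chosen for illustration (it is not taken from the network), and the function names are likewise assumptions. If each individual independently satisfies a condition with probability q, the conditioning subset is empty with probability (1 − q)^N, in which case the conditional probability cannot be formed for that iteration.

```python
def expected_subset_size(q, population_size):
    """Expected number of individuals satisfying a condition of probability q."""
    return q * population_size


def probability_subset_empty(q, population_size):
    """Probability that nobody in the population satisfies the condition."""
    return (1.0 - q) ** population_size


q = 0.0005  # hypothetical probability of a rare conditioning event
for n in (1000, 10000, 50000):
    print(n, expected_subset_size(q, n), probability_subset_empty(q, n))
```

Under this hypothetical setting a population of 1000 would frequently leave the subset empty, whereas a population of 50000 would almost never do so.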

The MCMC simulations for Model 2 were run for 12000 iterations. There is no requirement for

burn-in since the starting distribution is the target distribution. Several runs showed that 12000 iterations

was a sufficient run length for estimates to be stable at the number of decimal places shown.

3.4.4 Model Validation

The models may be assessed by comparison with known outcomes (external validation), or by critical

consideration of how the components of the model interact to give conclusions, and consideration of the

sensitivity of the model to changes in inputs (internal validation).

In the absence of an identified outbreak, and related data, external validation of a risk assessment model is not usually feasible. For example, the Milwaukee Cryptosporidium outbreak was large and well-defined, and thereby triggered an enormous commitment of resources. This allowed the determination of the duration of contamination, the proportion of the population that was affected (from a random telephone survey), oocyst concentrations in the treated water, and so on [Arrowood et al., 2001, Brookhart et al., 2002, Eisenberg et al., 1996, 1998, Haas et al., 1999, MacKenzie et al., 1994, 1995]. With no data available for source waters for Cryptosporidium, rotavirus, Campylobacter or surrogates, a direct external validation was not possible. We did attempt external validation by consideration of the National Notifiable Diseases Surveillance System data¶, and calculated the disease rates for Cryptosporidiosis, Rotaviral enteritis, and Campylobacter infections per 100 000 people per year for the three age groups. However, such incidence rates are likely to be undercounts of disease rates, given the many hurdles to be crossed before being recorded in such a database. We also looked at the same three enteric infections, and infectious diarrhoea as a proportion of admissions to a group of NSW hospitals∥ for the three age groups, but without adjusting for population figures, the proportion of admissions affected by infectious diarrhoeas is not useful for validation of this model.

¶The National Notifiable Diseases Surveillance System data [National Notifiable Diseases Surveillance System, 2008] was queried to find both the rates and number of notifications by age group and sex for 2008. This allowed inference of baseline populations for each age group. The numbers of notifications by age group and the inferred population sizes were then summed for each age group to give the rates for the age groups used here.

∥The hospital admission rates are based on all admissions (678248, numbers per age group were 83699, 388113, 206436) over five years (from January 1, 2001 to December 1, 2005), from six hospitals in the South West Sydney Area Health Service. The principal diagnosis at discharge, coded by ICD10 [World Health Organization, 2008], was used to classify each admission. The ICD10 codes used were A08.0 (Rotaviral enteritis), A04.5 (Campylobacter enteritis), A07.2 (Cryptosporidiosis), while “Intestinal infectious diseases” are given by the ICD10 chapter codes of

Moreover, as indicated in the introduction, this is a generic model: it has not been built to specifically describe outbreaks of Cryptosporidium related diarrhoea. (Had that been so, the frequency of prolonged/wet weather events might have formed part of the network [Signor, 2007].) Again, had it been intended to describe rotavirus infections, a seasonal component would have been necessary, to account for the seasonality in outbreaks, and hence seasonality in sewage, and potentially, in the recycled water.

Finally, it has not been formulated for a particular site. A site-specific model would have had condi-

tional probabilities tailored to the site, and some aspects of a site-specific external validation might then

have been possible.

Internal validation of the model occurs in the discussion, where we consider the implications of

the structure. Thus, we consider the sensitivity of the model to changes in inputs. We also discuss

how the structure of the model implies certain assumptions and suggest potential structural changes to

accommodate the situation when these assumptions may not be satisfied.

However, this is a generic model. It has not been built to specifically describe the outbreaks of

Cryptosporidium related diarrhoea, for example. (Had that been so the frequency of prolonged/heavy

rainy weather events might have formed part of the network [Signor, 2007].)

3.5 Results

3.5.1 Constructed BN

The final agreed BN is displayed in Figure 3.2. It contains 14 nodes: 7 binary terminal parent nodes, 1 ternary parent node, and 6 other nodes containing a further 30 binary probabilities as described in Section 3.4. It consists of two subnetworks: one which describes various influences and processes on the wastewater to the point of distribution (‘Endpoint distribution’) and another which describes various aspects of the water and their influences on the population health outcome ‘Gastroenteritis’. This structure of the network, together with the probability settings (which are generally close to one or zero), is important when we consider the sensitivity of inference to the various parent nodes; see Section 3.2.

A00, A01, A02, A03, A04, A05, A06, A07, A08 and A09. The databases used were the NSW Department of Health HIE (“Health Information Exchange”) databases. The admission (disease count) excluded admissions within 3 days of an earlier admission.

Table 3.1 shows the values of the parameters (α, β) corresponding to the beta priors, calculated from,

and representing, the various elicited probabilities and ranges. The expert’s ranges are not shown but may

be back-calculated using moments and the method of Spiegelhalter et al. [1994]. Thus, Node 1 (“Primary

Source Water”) has (α, β) values as (6.751, 668.37). These correspond to the choice of a probability of .01

of meeting an acceptable standard and a 95% range for that probability as .005 to .02. The probabilities

(p) of Table 3.1 are used for the nodes in the BN (Model 1), and the values of (α, β) show settings for

the beta priors used in Model 2, which were chosen to achieve mean marginal probabilities equivalent to

those obtained under Model 1.

In the following sections we discuss the outcomes of the BN depicted in Figure 3.2 and quantified

using the probabilities of Table 3.1. In Section 3.5.2, we examine the network ignoring the uncertainty of

the probability estimates (Model 1) and then, in Section 3.5.3, assess the impact of adding this uncertainty

(Model 2).

3.5.2 Model 1: Analysis of the BN without uncertainty

Table 3.3 shows the marginal probabilities obtained under Model 1 for all nodes in the model, both the

simple and complex. (Note that the marginal probabilities for the simple nodes are those given by the

settings.) In this illustrative example, the probability of gastroenteritis over the entire population is .0208;

the probability of a cumulative dose that would be classified as “acceptable” is .9732; the probability that

the endpoint distribution meets defined ‘acceptable’ conditions is .9579; and the probability that the

pathogen load is acceptable is .9931.

One of the strengths of a BN is the ease with which a user may vary conditions and find conditional probabilities relating to scenarios of interest. We define a baseline risk as the risk when the system is running at its safest level, that is, when the cumulative dose is acceptable. Under these conditions, the probability of gastroenteritis is .0151 (over all age groups). The baseline risks are used as the denominator in estimating the relative risks. Thus, compared with this baseline, the overall relative risk (RR) of gastroenteritis obtained from the BN is .0208/.0151 or 1.38. In contrast, using the same baseline risk, if the endpoint distribution fails, the gastroenteritis probability becomes .0214 (RR of 1.42). If the distribution endpoint fails and unplanned/planned usage is unplanned, the gastroenteritis probability becomes .0236, i.e., a RR of 1.56.
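The relative risks quoted in this paragraph follow directly from the stated probabilities, as the short Python sketch below shows. The dictionary keys and the function name `relative_risk` are illustrative labels only; the probabilities are those given in the text.

```python
BASELINE = 0.0151  # P(gastroenteritis) when the cumulative dose is acceptable

# Whole-population probabilities of gastroenteritis under each scenario.
SCENARIOS = {
    "overall": 0.0208,
    "endpoint distribution fails": 0.0214,
    "endpoint distribution fails and usage unplanned": 0.0236,
}


def relative_risk(probability, baseline):
    """Risk under a scenario relative to the baseline risk."""
    return probability / baseline


relative_risks = {
    name: round(relative_risk(p, BASELINE), 2) for name, p in SCENARIOS.items()
}
for name, rr in relative_risks.items():
    print(f"{name}: RR = {rr}")
```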

For children under five years of age, the BN indicates that the gastroenteritis probability would be .0594, given the expert-based probabilities described in Section 3.5.1. Using again a baseline risk corresponding to the safest operation of the system (an acceptable cumulative dose), for which the probability is .0500, this translates to a RR of 1.19. In contrast, if the endpoint distribution fails, the gastroenteritis rate for children under five years old becomes .0605, giving a RR of 1.21, and if, in addition, the usage is unplanned, the rate for these children becomes .0641, a RR of 1.28. For those aged between 5 and 64 years, these relative risks are 1.51, 1.57 and 1.77 respectively, and for older adults (65+), they become 1.24, 1.27 and 1.36, respectively.

When the cumulative dose is acceptable, the probability of gastroenteritis in this hypothetical model

is an age-weighted average of .05, .01 and .03. Thus, even with perfect inputs from the parent nodes,

the final probability for gastroenteritis must lie between .01 and .05. Moreover under this model, the

marginal probabilities for gastroenteritis are .0594 for children under five years of age, .0151 for those

between 5 and 64, and .0372 for those of 65 and over, which indicates little change from the rates with

perfect drinking water, thereby indicating little impact from recycled water.

3.5.3 Model 2: Analysis of the BN with uncertainty

Analysis of Model 2 was undertaken in WinBUGS as described in Section 3.4. Using a population size of 50000 and 12000 iterations, the probabilities and relative risks of the BN (Model 1) were recovered and 95% credible intervals were found for all estimates of interest. Marginal probability point estimates obtained under Model 1 and Model 2 are given in Table 3.3, together with corresponding credible intervals. (For the ‘simple’ nodes, Table 3.3 shows the credible interval and thus the effective 95% range introduced by the Beta prior.) Table 3.4 shows the probabilities and relative risks of gastroenteritis, for the population as a whole, for various age groups, and under various conditions, together with the 95% credible intervals. (Figure 3.3 shows the relative risks for all age groups and scenarios.) The point estimates obtained under Model 1 are also given in Table 3.4.

The relative risks considered in Table 3.4 and Figure 3.3 have all been defined to have higher risk

than the baseline (which occurs when the cumulative dose is acceptable) and as such should be greater

than 1. When a condition subsets a small subgroup, the consequent uncertainty of estimation may re-

sult in a credible interval that spans 1. Thus, for example, when the endpoint distribution fails and

planned/unplanned usage is unplanned, the credible interval for the relative risk of gastroenteritis is (.3,

3.1). Similarly, for all age subgroups, when both endpoint distribution fails and planned/unplanned usage

is unplanned, the 95% credible intervals for the RRs are (0, 4.5), (0, 4.1) and (0,4.5) respectively. Other

relative risks, based on larger subsets, have narrower credible intervals that exclude 1; for example, the RR

across the whole population is 1.38 with 95% CI (1.3, 1.4) and that for children under five years of age

is 1.19 with 95% CI (1.1, 1.3).

From Table 3.4, we see that under an acceptable cumulative dose, the probabilities of infection for

each age group are .05, .01, .03, respectively. If we interpret these probabilities as the probability of a

person being infected per year, then these translate to rates of infection of 5000 (4300, 5800) cases, 1000

(900, 1100), and 3000 (2600, 3400) per 100000 persons per year, respectively, which increase under the

model to 5940 (5100, 6800), 1510 (1400, 1600), and 3720 (3300, 4200) per 100000 per year.

3.6 Discussion

3.6.1 Framework

It should be noted that we have used the MCMC capacity of WinBUGS in Model 2, although any Monte

Carlo framework would have been adequate. However, we wished to draw attention to the fact that

both Bayesian nets and Gibbs sampling take place in a directed acyclic graph (DAG) framework, with

simplifying Markovian properties, and to emphasize the common framework. We also chose to use this

framework, because it is simple, explicit and transparent, something which cannot be said of spreadsheet

frameworks, where considerable detective work is required to determine both what has been done and in

what sequence. The WinBUGS software has now been freely available for many years and is robust and

well-supported by its many users, so it seemed sensible to demonstrate its use as another, transparent,

option in the armoury of risk assessment.

3.6.2 Internal validation

Inspection of the number of links between nodes in the BN and the final outcome of ‘Gastroenteritis’

indicates that this network is deep, with ‘Primary Source Water’ being six links from ‘Gastroenteritis’.

With most probabilities near the extremes of zero and one, even a node at a depth of three from

a later node has too little influence to induce a substantive impact on that node. Thus, most of the network

describing the catchment up to the endpoint distribution has little influence on the findings with respect

to ‘Gastroenteritis’, while within the subnetwork leading to endpoint distribution, neither the ‘Other



Source Water’ nor the ‘Primary Source Water’ have much impact. This observation is supported by

the mutual information results of Table 3.2 which confirmed that the network comprises two strongly

distinct components, representing the water distribution subnetwork and the health outcome subnetwork.

Moreover, these results showed that other factors in the health outcome subnetwork largely mitigate the

impact of the endpoint distribution node.
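The depth of the network can be made explicit by counting links from each node back to the node of interest. The following is a minimal sketch, assuming the parent sets implied by the conditional probability tables in Table 3.1; the dictionary `PARENTS` and the function `links_to` are illustrative names, not part of the original analysis.

```python
from collections import deque

# Parent sets read from the conditional probability tables of Table 3.1.
PARENTS = {
    "Gastroenteritis": ["Cumulative Dose", "Age"],
    "Cumulative Dose": ["Pathogen Uptake", "Exposure Period", "Pathogen Load"],
    "Pathogen Load": ["Endpoint Distribution", "Planned/Unplanned Use"],
    "Endpoint Distribution": ["Other Planned/Unplanned Supply", "Storage"],
    "Storage": ["Reprocessing", "Primary Treatment"],
    "Primary Treatment": ["Primary Source Water", "Other Source Water"],
}


def links_to(target):
    """Breadth-first search over parent links: number of links from each ancestor to target."""
    distance = {target: 0}
    queue = deque([target])
    while queue:
        node = queue.popleft()
        for parent in PARENTS.get(node, []):
            if parent not in distance:
                distance[parent] = distance[node] + 1
                queue.append(parent)
    return distance


for name, depth in sorted(links_to("Gastroenteritis").items(), key=lambda kv: kv[1]):
    print(depth, name)
```

The resulting depths can be compared with the distance column of Table 3.2.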

A property of the structure of the network is that it essentially consists of two sub-networks linked

by only one node-to-node link from the endpoint distribution node to the pathogen load node. The first

subnet describes the water distribution, while the second describes the health effects. This impacts on the

sensitivity of the network to changes in inputs. The structure, combined with the depth of the network

and the more extreme probabilities associated with each node, results in the second half of the network

(the part which follows on from the endpoint distribution) being largely insensitive to any findings in the

earlier subnetwork, just as the distribution subnetwork shows little sensitivity to any node further than

two nodes away. A consequence is that there is little need to include additional complexity in the model

through, for example, feed-back loops since the nodes of such loops would be even further away and

even less likely to exert an influence.
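The attenuation of influence with depth can be illustrated by exact enumeration over the health outcome subnetwork. The sketch below is a hedged illustration rather than the original Hugin or WinBUGS computation: it assumes the Model 1 point estimates of Table 3.1, with state 1 denoting the state named in each table heading (Meets, Planned, Short, Low, Accept), and the function names are hypothetical.

```python
from itertools import product

P_AGE = [0.0671, 0.8096, 0.1233]  # 0-4, 5-64, 65+ years
# Probability of state 1 for the root nodes (planned use, low uptake, short exposure).
P_STATE_1 = {"puse": 0.90, "pu": 0.50, "ep": 0.90}
# p(Pathogen Load low | ED, Puse), keyed by (ED, Puse).
P_PL_LOW = {(0, 0): 0.700, (0, 1): 0.950, (1, 0): 0.970, (1, 1): 0.999}
# p(Cumulative Dose acceptable | PU, EP, PL), keyed by (PU, EP, PL).
P_CD_OK = {
    (0, 0, 0): 0.700, (0, 0, 1): 0.800, (0, 1, 0): 0.900, (0, 1, 1): 0.970,
    (1, 0, 0): 0.930, (1, 0, 1): 0.950, (1, 1, 0): 0.980, (1, 1, 1): 0.999,
}
# p(Gastroenteritis | CD, Age), keyed by CD, listed by age group.
P_GASTRO = {0: [0.40, 0.20, 0.30], 1: [0.05, 0.01, 0.03]}


def bernoulli(p_one, state):
    """Probability of `state` for a binary node whose state 1 has probability p_one."""
    return p_one if state == 1 else 1.0 - p_one


def p_gastro(ed, puse=None):
    """p(Gastroenteritis | ED = ed), optionally also conditioning on Planned/Unplanned Use."""
    puse_states = (0, 1) if puse is None else (puse,)
    total = 0.0
    for use, pu, ep, pl, cd in product(puse_states, (0, 1), (0, 1), (0, 1), (0, 1)):
        weight = 1.0 if puse is not None else bernoulli(P_STATE_1["puse"], use)
        weight *= bernoulli(P_STATE_1["pu"], pu) * bernoulli(P_STATE_1["ep"], ep)
        weight *= bernoulli(P_PL_LOW[(ed, use)], pl)
        weight *= bernoulli(P_CD_OK[(pu, ep, pl)], cd)
        total += weight * sum(a * g for a, g in zip(P_AGE, P_GASTRO[cd]))
    return total


# Baseline: age-weighted probability when the cumulative dose is acceptable.
BASELINE = sum(a * g for a, g in zip(P_AGE, P_GASTRO[1]))

print("ED meets:", round(p_gastro(1), 4))
print("ED fails:", round(p_gastro(0), 4), "RR:", round(p_gastro(0) / BASELINE, 2))
print("ED fails, unplanned use:", round(p_gastro(0, puse=0), 4))
```

Under these assumptions, switching the endpoint distribution node from meeting to failing its standard changes the probability of gastroenteritis by less than one case per thousand persons, which is the insensitivity described above.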

Within the health outcome subnet, the unplanned/planned use node clearly envisages water of a po-

tentially less than potable quality which may carry an unacceptable pathogen concentration. The structure

of this subnet presupposes that unplanned usage has the same probabilities for all segments of the population.

However, in light of the various proposed reuses, unplanned usage by joggers and swimmers, for example,

is more likely to be age-related than the consumption of contaminated foodstuffs, which

may apply more broadly across the population. This problem could be addressed by allowing the Age

node to also be a parent to the ‘Planned/Unplanned usage’ node. Furthermore, the population-wide inter-

pretation of the probabilities in the BN does not explicitly acknowledge that for many enteric diseases,

while the index case may have been exposed via water, subsequent cases may be largely due to person-to-

person contact. This could also be accommodated by a structural change to the network or a redefinition

of nodes.

Other structural models could also be considered. For example, Pike [2004] describes a BN which

shows the possibility of all treatment processing nodes being bypassed and which is able to be verified

by monitoring data. This BN has a flatter, less vertical structure than that of the BN used here and also

includes nodes with more than one child-node, with the advantage that nodes with low/high probabil-

ities are allowed greater possibility of being influential than in the BN used here. Pike’s network also

represents the possibility of complete failure to process (for example when a plant goes to bypass). In

any system where the whole of the wastewater plant output flows to a river and is reused downstream,

one should take into account the frequency of the sewage treatment plant going to bypass. However,

it is acknowledged that this is not a problem imposed by recycling, but rather a problem of wastewater

treatment.

3.6.3 Discussion of the results

The aim of this paper was to illustrate how the risks of gastroenteritis posed by the use of recycled

water could be represented using a Bayesian network. The network approach provides a full description

of relevant nodes, levels, probabilities and ranges of uncertainty for these probabilities, and allows us

to determine factors and links having most influence in the model. This summary is not intended to

provide a commentary on a particular recycled water system, but rather to indicate how risk may be

conceptualised in a network. In this context, we have used an expert’s opinion to populate the nodes, but

networks may also be built based upon group opinion or may use inputs drawn from more quantitative

sources, as in a QMRA.

Comparing the two models, it was apparent that substantial uncertainty was added by the inclusion

of the beta priors. The addition of uncertainty to the network has brought considerable variation to the

predictions. For example, the relative risk for a child of age less than five, when the endpoint distribution

fails is 1.21, but the 95% credible interval for the RR is wide (.5, 2.0), with a corresponding probability

point estimate of .0605 and 95% credible interval of (.024, .102). Public health implications and con-

sequent decisions may vary considerably on inspection of the range and bounds of such intervals. This

addition of credible intervals on point estimates of outcome probabilities and relative risks is arguably a

valuable addition to the analytical and inferential results of the BN.

The network approach allowed identification of the nodes that contributed most to the outcome of

gastroenteritis. These were cumulative dose, age, exposure period and pathogen intake. In summary,

based on the conceptual model and expert-based probabilities, the BN revealed an overall risk for gas-

troenteritis of 1.38 relative to that for an acceptable cumulative dose, with a 95% credible interval (1.3,

1.4) based on a population of 50000. The relative risks varied over age cohorts and, as expected, had

point estimates greater than one under adverse scenarios. For example, with a failure at the endpoint



distribution, the relative risk became 1.42 (1.0, 1.8).

With large populations, small percentage changes may represent many people with an increased

chance of contracting diarrhoea as a consequence of exposure to recycled water. Thus, .0208, the probability

of gastroenteritis under the default values, represents some 1040 cases in the population of 50000, with a 95% credible

interval of (950, 1100) cases. The addition of uncertainty to the predictions is a useful addition to the

potential inferences of the Bayesian net.

A difficulty for the suggested methodology for calculation of credible intervals occurs when the

condition implies a relatively small subset of the population. This may be overcome in two ways: by

considering just the subset and its links under the given condition, or by sampling from a sufficiently

large base population which allows the sampling to occur without null subsets. (Note that for a BN this

is never a problem: all conditional probabilities relating to the model may be found, regardless of the

closeness of a probability to zero or one, or of the depth of the network.)

The BN described in this paper was developed to reflect a conceptual model for health risks asso-

ciated with recycled water proposed by two experts in the field (Cook/Roser). This paper is intended to

contribute to the available tools for assessing this important environmental health issue, and in particular

to contribute a methodology for quantifying the uncertainty of point estimates arising from BNs.

3.7 Tables



Table 3.1 Settings

Node  Name   Value        p
14    Age¹   0-4 years    0.0671
             5-64 years   0.8096
             65+ years    0.1233

Simple nodes

Node  Name  Description                     Value    p     α        β
1     PSW   Primary Source Water            Meets²   0.01  6.751    668.37
5     OSW   Other Source Water              Meets    0.01  0.599    59.25
6     Rep   Reprocessing                    Meets    0.99  59.2524  0.59851
7     OPPS  Other planned/unplanned supply  Meets    0.80  48.372   12.093
8     Puse  Planned/Unplanned use           Planned  0.90  30.217   3.35744
10    EP    Exposure period                 Short    0.90  30.217   3.35744
12    PU    Pathogen uptake                 Low      0.50  47.52    47.52

Nodes with parents (one row per combination of parent states)

Node  Name                                Parent states   Row  p     α        β

2     Primary Treatment (Meets) (PT)      PSW  OSW
                                          0    0          1    .960  55.687   2.32
                                          0    1          2    .980  117.083  2.39
                                          1    0          3    .990  375.525  3.79
                                          1    1          4    .998  5.135    0.01

3     Storage (Meets)                     Rep  PT
                                          0    0          1    .800  48.372   12.09
                                          0    1          2    .990  59.252   0.6
                                          1    0          3    .900  30.217   3.36
                                          1    1          4    .990  239.98   2.42

4     Endpoint Distribution (Meets) (ED)  OPPS Storage
                                          0    0          1    .700  8.335    3.57
                                          0    1          2    .800  21.054   5.26
                                          1    0          3    .900  30.217   3.36
                                          1    1          4    .999  152.358  0.15

9     Pathogen Load (Low) (PL)            ED   Puse
                                          0    0          1    .700  8.335    3.57
                                          0    1          2    .950  68.391   3.6
                                          1    0          3    .970  172.529  5.34
                                          1    1          4    .999  152.358  0.15

11    Cumulative Dose (Accept³) (CD)      PU   EP   PL
                                          0    0    0     1    .700  8.335    3.57
                                          0    0    1     2    .800  7.068    1.77
                                          0    1    0     3    .900  30.217   3.36
                                          0    1    1     4    .970  172.529  5.34
                                          1    0    0     5    .930  92.103   6.93
                                          1    0    1     6    .950  68.391   3.6
                                          1    1    0     7    .980  117.083  2.39
                                          1    1    1     8    .999  152.358  0.15

13    Gastroenteritis (Yes)               CD   Age
                                          0    1          1    .400  8.82     13.23
                                          0    2          2    .200  12.093   48.37
                                          0    3          3    .300  10.456   24.4
                                          1    1          4    .050  3.6      68.39
                                          1    2          5    .010  3.739    375.53
                                          1    3          6    .030  5.33595  172.529

1. Based on the Australian Population census 2000. p values common to both Model 1 & 2.
2. Meets an acceptable standard.
3. An acceptable dose.
p gives the settings for Model 1.
α, β give the settings for the Beta prior for the corresponding node in Model 2.

Table 3.2 Sensitivity of two nodes: Gastroenteritis & Endpoint Distribution

Node                                 Mutual       Variance    Distance from
                                     Information  of Beliefs  Relevant Node

Gastroenteritis (to the findings at each node)

13  Gastroenteritis                  .14584       .0203551    0
11  Cumulative Dose                  .01498       .0011554    1
14  Age                              .00436       .0001595    1
10  Exposure Period                  .00137       .0000480    2
12  Pathogen Uptake                  .00068       .0000191    2
9   Pathogen Load                    .00002       .0000006    2
4   Endpoint Distribution            .00000       .0000000    3
8   Planned/Unplanned Use            .00000       .0000000    3
7   Other planned/unplanned supply   .00000       .0000000    4
1   Primary Source Water             .00000       .0000000    6
5   Other Source Water               .00000       .0000000    6
2   Primary Treatment                .00000       .0000000    5
6   Reprocessing                     .00000       .0000000    5
3   Storage                          .00000       .0000000    4

Endpoint Distribution (to the findings at earlier nodes-Half model)

4   Endpoint Distribution            .25206       .0403721    0
7   Other planned/unplanned supply   .08803       .0063370    1
3   Storage                          .00151       .0001320    1
2   Primary Treatment                .00005       .0000031    2
6   Reprocessing                     .00000       .0000000    2
5   Other Source Water               .00000       .0000000    3
1   Primary Source Water             .00000       .0000000    3

Table 3.3 Model Comparisons: Marginal probabilities & 95% credible intervals

                                                        Probability
Node  Name                            Value       Model 1  Model 2

Complex Nodes

2     Primary Treatment               Meets       .9605    .9605 (.959, .962)
3     Storage                         Meets       .9864    .9864 (.985, .987)
4     Endpoint Distribution           Meets       .9579    .9579 (.956, .960)
9     Pathogen Load                   Acceptable  .9931    .9931 (.992, .994)
11    Cumulative Dose                 Acceptable  .9732    .9732 (.972, .975)
13    Gastroenteritis                 Yes         .0208    .0207 (.019, .022)

Simple Nodes

1     Primary Source Water            Meets       .01      .01 (.004, .019)
5     Other Source Water              Meets       .01      .01 (.000, .046)
6     Reprocessing                    Meets       .99      .99 (.954, 1.000)
7     Other Planned/Unplanned Supply  Acceptable  .80      .80 (.692, .889)
8     Planned/Unplanned Use           Planned     .90      .90 (.778, .974)
10    Exposure period                 Short       .90      .90 (.780, .976)
12    Pathogen uptake                 Low         .50      .50 (.399, .600)

These probabilities are given to at most 3 significant figures since .9605 = 1 − .0395



Table 3.4 Model comparisons for Gastroenteritis under various conditions

p(Gastroenteritis)

Condition                                  Model 1  Model 2

(No conditions)                            .0208    .0207 (.019, .022)
    (0-4 yrs)                              .0594    .0594 (.051, .068)
    (5-64 yrs)                             .0151    .0150 (.014, .016)
    (65+ yrs)                              .0372    .0372 (.033, .042)

Cumulative Dose Acceptable                 .0151    .0151 (.014, .016)
    (0-4 yrs)                              .0500    .0501 (.043, .058)
    (5-64 yrs)                             .0100    .0099 (.009, .011)
    (65+ yrs)                              .0300    .0300 (.026, .034)

Endpoint Distribution Fails                .0214    .0214 (.015, .028)
    (0-4 yrs)                              .0605    .0605 (.024, .102)
    (5-64 yrs)                             .0157    .0156 (.010, .022)
    (65+ yrs)                              .0381    .0381 (.016, .063)

Endpoint Distribution Fails                .0236    .0236 (.005, .046)
& Planned/Unplanned Use is unplanned
    (0-4 yrs)                              .0641    .0639 (0, .222)
    (5-64 yrs)                             .0177    .0175 (0, .040)
    (65+ yrs)                              .0409    .0410 (0, .130)

RR(Gastroenteritis)

Condition                                  Model 1  Model 2

(No conditions)                            1.38     1.38 (1.3, 1.4)
    (0-4 yrs)                              1.19     1.19 (1.1, 1.3)
    (5-64 yrs)                             1.51     1.52 (1.4, 1.6)
    (65+ yrs)                              1.24     1.24 (1.2, 1.3)

Endpoint Distribution Fails                1.42     1.42 (1.0, 1.8)
    (0-4 yrs)                              1.21     1.21 (.5, 2.0)
    (5-64 yrs)                             1.57     1.58 (1.0, 2.2)
    (65+ yrs)                              1.27     1.27 (.6, 2.1)

Endpoint Distribution Fails                1.56     1.57 (.3, 3.1)
& Planned/Unplanned Use is unplanned
    (0-4 yrs)                              1.28     1.28 (0, 4.5)
    (5-64 yrs)                             1.77     1.78 (0, 4.1)
    (65+ yrs)                              1.36     1.37 (0, 4.5)

3.8 Figures



Figure 3.1 Conceptual model of Cook and Roser1

[Figure: flow diagram in six components. Component 1: recycled water processing and distribution pathways. Component 2: exposure pathways and populations (actual and potential). Component 3: cumulative end-user dose. Components 4 and 5: identified pathogens/chemicals, individual covariates, and toxicity/pathogenicity pathways. Component 6: health endpoints (projected and actual development of acute and long-term adverse health endpoints).]

Figure 3.2 Bayesian network based on the conceptual model: Node numbering is that used in the text and in the WinBUGS model



Figure 3.3 Relative risks for each age group (0-4, 5-64, 65+) and for the entire population (All) for each risk scenario, estimated from the BN (Model 1) and by MCMC (Model 2), with 95% credible intervals from Model 2.

Supporting information

The WinBUGS code for the model is available from the first author.

Acknowledgments

The models described in this paper have been developed as part of the research project “Assessing the

Public Health Impacts of Recycled Water Use” that received funding from the Western Australian Gov-

ernment through the Department of Water’s Water Fund. The project is led by the University of Western

Australia and partnered by Queensland University of Technology and the Western Australian Department

of Health.

Dr Ken Hillman and the Simpson Centre for Health Services Research, Liverpool Hospital, NSW

supplied the HIE data.

Bibliography

Anderson, J. M. (2007). AWA water recycling forum position paper: Water recycling to meet our water

needs. In S. J. Khan, R. M. Stuetz, and J. M. Anderson (Eds.), Water Reuse and Recycling 2007.

Sydney: UNSW Publishing & Printing Services.

Anon. (2008). Accessed, June, 2008: http://www.nationmaster.com/country/as-australia/.

Arrowood, M. J., P. J. Lammie, J. W. Priest, D. G. Addiss, M. R. Hurd, W. R. MacKenzie, A. C. McDon-

ald, M. S. Gradus, G. Linke, and E. Zembrowski (2001). Cryptosporidium parvum-specific antibody

responses among children residing in Milwaukee during the 1993 waterborne outbreak. Journal of

Infectious Diseases 183(9), 1373–1378.

Asano, T. (1998). Wastewater reclamation and reuse. Water Quality Management Library ; V. 10.

Lancaster, Pa.: Technomic Pub.

Ashbolt, N. J., S. R. Petterson, T.-A. Stenstrom, C. Schonning, T. Westrell, and J. Ottoson (2005). Microbial

Risk Assessment (MRA) tool. Technical Report 2005:7, Chalmers University of Technology.



Brookhart, M. A., A. E. Hubbard, M. J. van der Laan, J. M. Colford Jr, and J. N. S. Eisenberg (2002).

Statistical estimation of parameters in a disease transmission model: analysis of a Cryptosporidium

outbreak. Statistics in Medicine 21, 3627–3638.

Cowell, R. G., A. P. Dawid, S. L. Lauritzen, and D. J. Spiegelhalter (2001). Probabilistic Networks and

Expert Systems. Springer.

Eisenberg, J., E. Seto, A. Olivieri, and R. Spear (1996). Quantifying water pathogen risk in an epidemi-

ological framework. Risk Analysis 16, 549–563.

Eisenberg, J. N. S., E. Y. W. Seto, J. M. Colford Jr, A. Olivieri, and R. C. Spear (1998). An analysis

of the Milwaukee cryptosporidiosis outbreak based on a dynamic model of the infection process.

Epidemiology 9(3), 255–263.

Haas, C. and J. N. Eisenberg (2001). Risk assessment. In L. Fewtrell and J. Bartram (Eds.), Water

Quality: Guidelines, Standards and Health. WHO.

Haas, C. N., J. B. Rose, and C. P. Gerba (1999). Quantitative Microbial Risk Assessment. New York:

Wiley.

Hamilton, G. S., F. Fielding, A. W. Chiffings, B. T. Hart, R. W. Johnstone, and K. L. Mengersen (2007).

Investigating the use of a Bayesian network to model the risk of Lyngbya majuscula bloom initiation

in Deception Bay, Queensland. Human and Ecological Risk Assessment 13(6), 1271–1287.

Hugin Expert A/S (2007). Hugin 6.9. Available on: www.hugin.com. Accessed: November 6, 2008.


Lunn, D. J., A. Thomas, N. Best, and D. Spiegelhalter (2000). WinBUGS - A Bayesian modelling

framework: Concepts, structure, and extensibility. Statistics and Computing 10(4), 325–337.

MacKenzie, W., N. J. Hoxie, M. E. Proctor, M. S. Gradus, K. A. Blair, D. E. Peterson, J. J. Kazmierczak,

D. G. Addiss, K. R. Fox, J. B. Rose, and J. P. Davis (1994). A massive outbreak in Milwaukee

of cryptosporidium infection transmitted through the public water supply. New England Journal of

Medicine 331(3), 161–167.

MacKenzie, W. R., W. L. Schell, K. A. Blair, D. G. Addiss, D. E. Peterson, N. J. Hoxie, J. J. Kazmierczak,

and J. P. Davis (1995). A massive outbreak of waterborne cryptosporidium infection in Milwaukee,

Wisconsin: Recurrence of illness and risk of secondary transmission. Clinical Infectious Diseases 21,

57–62.

Nadebaum, P., M. Chapman, R. Morden, and S. Rizak (2004). A guide to hazard identification & risk

assessment for drinking water supplies. Technical report, CRC for Water Quality and Treatment.

National Notifiable Diseases Surveillance System (2008). National Notifiable Diseases Surveillance

System. Available on: http://www9.health.gov.au/cda/Source/CDA-index.cfm. Accessed:

April 9, 2008.

Natural Resource Management Ministerial Council, Environment Protection and Heritage Coun-

cil, and Australian Health Ministers Conference (2006). Australian Guidelines for Wa-

ter Recycling: Managing health and environmental risks (Phase1) 2006. Available on:


Nicholson, A., S. Watson, and C. Twardy (2003). Using Bayesian networks for water quality prediction

in Sydney Harbour. Available online: www.csse.monash.edu.au/bai/talks/NSWDEC.ppt.

Accessed: March 27, 2008.

Norsys Software Corp. (2007). Netica 3.25. Available online: www.norsys.com. Accessed February

15, 2008.

Pearl, J. (1988). Probabilistic reasoning in intelligent systems : networks of plausible inference. San

Mateo, California: Morgan Kaufmann Publishers.

Pike, W. A. (2004). Modeling drinking water quality violations with Bayesian networks. Journal of the

American Water Resources Association 40(6), 1563–1578.

Pollino, C. A. and B. T. Hart (2005a). Bayesian approaches can help make better sense of ecotoxicolog-

ical information in risk assessments. Australian Journal of Ecotoxicology 11, 57–58.

Pollino, C. A. and B. T. Hart (2005b). Bayesian decision networks - going beyond expert elicitation for

parameterisation and evaluation of ecological endpoints. In A. Voinov, A. Jakeman, and A. Rizzoli

(Eds.), Third Biennial Meeting: Summit on Environmental Modelling and Software, Burlington, USA.



Pollino, C. A., O. Woodberry, A. E. Nicholson, K. B. Korb, and B. T. Hart (2007). Parameterisation and

evaluation of a Bayesian network for use in an ecological risk assessment. Environmental Modelling

and Software 22, 1140–1152.

Roser, D., S. Petterson, R. Signor, and N. Ashbolt (2006). How to implement QMRA? to estimate

baseline and hazardous event risks with management end uses in mind. Technical report, MicroRisk

project co-funded by the European Commission under the Fifth Framework Programme, Theme 4:

Energy, environment and sustainable development (contract EVK1-CT-2002-00123).

Signor, R. S. (2007). Probabilistic Microbial Risk Assessment & Management Implications for Urban

Water Supply Systems. Ph. D. thesis, UNSW.

Snow, J. (1849). On the mode of communication of cholera. London: John Churchill.

Snow, J. (1855). On the mode of communication of cholera (2nd ed.). London: John Churchill.

Spiegelhalter, D. (1989). A unified approach to imprecision and sensitivity of beliefs in expert systems. In

L.N.Kanal (Ed.), Uncertainty in Artificial Intelligence 3. North Holland: Elsevier Science Publishers

B.V.

Spiegelhalter, D. J., N. L. Harris, K. Bull, and R. C. G. Franklin (1994). Empirical-evaluation of prior

beliefs about frequencies - methodology and a case-study in congenital heart-disease. Journal of the


Varis, O. (1995). Belief networks for modelling and assessment of environmental change. Environ-

metrics 6, 439–444.

Varis, O. (1997). Bayesian decision analysis for environmental and resource management. Environmental

Modelling and Software 12(2-3), 177–185.

Varis, O. (1998). A belief network approach to optimization and parameter estimation: application to

resource and environmental management. Artificial Intelligence 101(1-2), 135–163.

World Health Organization (2008). ICD-10 Classification of Diseases. Available online:

www.cdc.gov/nchs/data/dvs/2008Vol1.pdf. Accessed: April 10, 2008.

3.9 Addendum

Reconsidering the Model 2 BN for diarrhoea, that is, the BN for which not only probabilities, but also

uncertainty has been elicited for the conditional probability tables (Figure 3.2), we see that

p(Gastro = 1) = p(Gastro = 1|CD = 0, Age < 5)p(CD = 0)p(Age < 5)+

p(Gastro = 1|CD = 0, 5 ≤ Age < 65)p(CD = 0)p(5 ≤ Age < 65)+

p(Gastro = 1|CD = 0, Age ≥ 65)p(CD = 0)p(Age ≥ 65)+

p(Gastro = 1|CD = 1, Age < 5)p(CD = 1)p(Age < 5)+

p(Gastro = 1|CD = 1, 5 ≤ Age < 65)p(CD = 1)p(5 ≤ Age < 65)+

p(Gastro = 1|CD = 1, Age ≥ 65)p(CD = 1)p(Age ≥ 65)

(3.1)
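Equation 3.1 can be evaluated term by term at the point estimates to see which of its six components dominate the overall probability. This is a minimal sketch, assuming the Model 1 values of Tables 3.1 and 3.3 (with CD = 1 denoting an acceptable cumulative dose, so that p(CD = 0) = 1 − .9732); the names `equation_terms`, `P_CD` and `P_GASTRO` are illustrative only.

```python
# Point estimates from Tables 3.1 and 3.3; CD = 1 means the dose is acceptable.
P_AGE = {"<5": 0.0671, "5-64": 0.8096, ">=65": 0.1233}
P_CD = {0: 1.0 - 0.9732, 1: 0.9732}
P_GASTRO = {
    (0, "<5"): 0.40, (0, "5-64"): 0.20, (0, ">=65"): 0.30,
    (1, "<5"): 0.05, (1, "5-64"): 0.01, (1, ">=65"): 0.03,
}


def equation_terms():
    """The six products p(Gastro = 1 | CD, Age) p(CD) p(Age) on the right of Equation 3.1."""
    return {
        (cd, age): P_GASTRO[(cd, age)] * P_CD[cd] * P_AGE[age]
        for (cd, age) in P_GASTRO
    }


terms = equation_terms()
total = sum(terms.values())
for key, value in terms.items():
    print(key, round(value, 5), f"{value / total:.1%}")
print("p(Gastro = 1) =", round(total, 4))
```

In Model 2 each conditional probability in these products is replaced by a draw from its Beta prior, so the sketch reproduces only the central values of the six components.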

Van Allen et al. [2001, 2008] demonstrate a method whereby uncertainty of any kind may be prop-

agated through a network. They show that a query response is asymptotically Gaussian and provide its

mean value and asymptotic variance. However, Figures 3.4- 3.5 (where the population was taken as 50

000 and these figures represent 6000 MCMC iterations), show clearly defined mixtures of the proportions

satisfying (1) getting Gastroenteritis in any of the different agegroups, and (2) getting Gastroenteritis in

any of the different agegroups when the endpoint distribution (ED) fails.

Within the BN the probabilities for the age groups are given as constants without error. Thus, the

distribution of p(Gastro = 1) is a mixture of six distributions. If we consider the first right hand term of

Equation 3.1, the probability (p(Gastro = 1|CD = 0, Age < 5)) was set as a Beta(8.82, 13.23) distribution

(Table 3.1), which has a mean value of 0.4 with 95% of its distribution lying in the interval (0.21, 0.61).

The probability, p(CD = 0) has a mean of 0.027 with 95% of its distribution lying in the interval (0.025,

0.028)∗∗, while the p(Age < 5) equals 0.0671 (Table 3.1). In comparison, the fifth term, p(Gastro =

1|CD = 1, 5 ≤ Age < 65)p(CD = 1)p(5 ≤ Age < 65), uses p(Gastro = 1|CD = 1, 5 ≤ Age < 65),

which from Table 3.1 was set as a Beta(3.739, 375.53) with a mean of 0.01 and 95% of which lies in the

interval (0.003, 0.022), p(CD = 1), which from Table 3.3 has a mean of 0.973 with 95% of its distribution

in (0.972, 0.975), and p(5 ≤ Age < 65) is a constant 0.8096 (Table 3.1).

∗∗Derived by subtraction from 1 of the results for Cumulative Dose (acceptable) in Table 3.3

In embedding the essential DAG within the population loop, no evidence is added to the nodes. The

structure of the BN remains unchanged, as do the Beta priors reflecting the elicited uncertainty. We

would argue that with the relatively extreme binomial probabilities of the conditional probability tables,

plus their (elicited) relatively narrow 95% bands in this BN (Table 3.1), the structure of almost discrete

probability mixtures at each node is preserved. Van Allen et al. [2008] note that their normal result is

asymptotic only, and show examples which seem well approximated by beta distributions, but do not

envisage distributions such as that shown in Figure 3.5. From our work, it is clear that

querying a Bayesian net does not necessarily yield a normal or a beta distribution (or a binomial

distribution) in the case of a finite population, particularly when the evidence is overwhelmingly strong

in a particular direction. A further difference between our work and that of Van Allen et al. [2008]

is that they assume a Dirichlet distribution at each node of the conditional probability tables, and they

populate their nodes by simulation. The major difference between their work and ours possibly lies in the

extremely small populations in our BN for some of the queries, despite an overall population of 50,000,

which means that Van Allen’s asymptotics become irrelevant (as they, themselves, note when saying that

Beta distributions fit the error distributions of the queries better than the normal in some cases). As can

be seen from Table 3.5, an overall population of 50,000 gives rise to an expected count of 14 children

under the age of 5 when both the Endpoint distribution and Unplanned usage nodes fail. Van Allen et al.

[2008] argue for asymptotic results for any query within a BN. However, our work shows that it might be

foolish to rely on asymptotics with finite populations, when forming credible intervals for BNs.

Table 3.5 Expected subgroup sizes for BN (Model 2)

Condition                                               E(n) for the condition

All populations
    Base Rate                                           50000
    Cumulative dose acceptable                          48660
    Endpoint distribution fails                         2105
    Endpoint distribution fails & Unplanned usage fails 211

Age <5
    Base Rate                                           3405
    Cumulative dose acceptable                          3265
    Endpoint distribution fails                         141
    Endpoint distribution fails & Unplanned usage fails 14

For a population of 50000
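The conditional subgroup sizes, and the reason that very small subgroups produce credible intervals with a lower limit of zero, can be sketched as follows. The marginal probabilities are taken from Tables 3.1 and 3.3; treating the roughly 14 children in the smallest subgroup as independent Bernoulli trials with the Model 1 point estimate of .0641 from Table 3.4 is a simplifying assumption made for illustration, not the WinBUGS model itself, and the function names are hypothetical.

```python
N_POPULATION = 50000
P_DOSE_OK = 0.9732          # cumulative dose acceptable (Table 3.3)
P_ED_FAILS = 1.0 - 0.9579   # endpoint distribution fails (Table 3.3)
P_UNPLANNED = 1.0 - 0.90    # planned/unplanned use is unplanned (Table 3.1)
P_UNDER_5 = 0.0671          # age 0-4 years (Table 3.1)


def expected_size(*probabilities, n=N_POPULATION):
    """Expected subgroup size when the listed conditions are independent."""
    size = float(n)
    for p in probabilities:
        size *= p
    return size


def prob_no_cases(n, p):
    """Probability that none of n independent individuals, each with risk p, becomes a case."""
    return (1.0 - p) ** n


print(round(expected_size(P_DOSE_OK)))
print(round(expected_size(P_ED_FAILS)))
print(round(expected_size(P_ED_FAILS, P_UNPLANNED), 1))
print(round(expected_size(P_ED_FAILS, P_UNPLANNED, P_UNDER_5), 1))
print(round(prob_no_cases(14, 0.0641), 2))
```

Under these assumptions a substantial share of simulated subgroups of this size contain no cases at all, which is consistent with the lower limits of zero reported for the smallest subgroups in Table 3.4.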

Figure 3.4 Distribution of the probability of being infected with gastroenteritis.

Figure 3.5 Distribution of the probability of being infected with gastroenteritis when the end-point distribution fails.



Bibliography


Proceedings of the Seventeenth Conference on Uncertainty in Artificial Intelligence (UAI-01), Seattle.

Citeseer.

Van Allen, T., A. Singh, R. Greiner, and P. Hooper (2008). Quantifying the uncertainty of a Belief Net

response: Bayesian error-bars for Belief Net inference. Artificial Intelligence 172, 483–513.



1. they meet the criteria for authorship, in that they have participated in the conception, execution, or interpretation , of at least that part of the publication in their field of expertise;






Title: Incorporating parameter uncertainty into Quantitative Microbial Risk Assessment

Journal: Journal of Water and Health

Status: Accepted for publication

Contributor                 Statement of Contribution

Margaret Donald             Margaret Donald as first author was responsible for the concept of the paper, data analysis, interpretation and the writing of all drafts.

Dr Simon Toze               Was responsible for advice on microbiological issues and editorial comment

Dr Jatinder Singh           Was responsible for editorial comment

Dr Angus Cook               Was responsible for editorial comment

Professor Kerrie Mengersen  Was responsible for general advice and for editorial comment.

Principal Supervisor's Confirmation

I have sighted email or other correspondence from all co-authors confirming their certifying

authorship.

Name Signature Date

Chapter 4

Incorporating parameter uncertainty

into Quantitative Microbial Risk

Assessment (QMRA)

4.1 Preamble

This chapter has been written as a journal article, and addresses research objective (2). The purpose was

to build a graphical model based on the flow diagrams used in a typical Quantitative Microbial Risk As-

sessment (QMRA), and rather than using plug-in estimates allow the data behind those plug-in estimates

to contribute the appropriate uncertainties to the risk via a complex hierarchical model. Thus, the impor-

tant contribution is to not use plug-in estimates for the many parameters required, but to build a graphical

model to incorporate all the data used for estimation of such parameters. This means that we largely dis-

pense with assumptions about normality, and no longer ignore parameter estimates’ correlations, since

the estimation process incorporates these automatically into the risk assessment. Additionally, and im-

portantly, we include an errors-in-variables model within the graphical model to estimate the parameters

for the risk of infection equation based on McCullough and Eisele [1951]’s experiments. The software

used is WinBUGS [Lunn et al., 2000]. Hence, the methodology is easily accessible to any practitioner in

the field.

This paper does two things. Firstly, it incorporates disparate primary data to estimate the parameters

required in a risk assessment within the risk assessment itself, thereby automatically incorporating the patterns

of parameter correlations, which are not necessarily bivariate normal, together with all their uncertainty.

Haas et al. [1999] suggest dealing with this using a two-stage approach, with the parameter uncertainty

being captured by 2,000 bootstrap samples, or alternatively via rank correlations and a copula,

back-transforming to known marginal distributions [Haas, 1999]. Secondly, it uses an errors-in-variables

model for the infection model, which, despite all the analyses and reanalyses of these data (see, e.g., Haas

et al. [1999], Oscar [2004], Teunis et al. [1997]), has not been done before.

In our view, the method and the incorporation of an errors-in-variables model for the parameter

estimates for the risk of infection are important contributions to finding more realistic error estimates in

quantitative microbial risk assessments.

I am the principal author of the paper, which is reprinted here in its entirety, but with bibliographic conventions

different from those of the Journal of Water and Health, for which it has been accepted (May 2010). Jatinder

Singh provided the microbiological data and editorial comment. Simon Toze helped with interpretation

and understandings of the microbiological data and provided editorial comment. Kerrie Mengersen over-

saw and guided the exposition. Margaret Donald as first author was responsible for the concept of the

paper, data analysis, interpretation, writing all drafts and addressing reviewers’ comments.

Title: Incorporating parameter uncertainty into Quantitative Microbial Risk Assessment (QMRA)

Authors: Margaret Donald^a, Kerrie Mengersen^a, Simon Toze^b,c, Jatinder Singh^b, Angus Cook^d.

^a School of Mathematical Sciences, Queensland University of Technology, GPO Box 2434, Brisbane, QLD 4001, Australia.

^b CSIRO Land and Water, Queensland Biosciences Precinct, 306 Carmody Road, St Lucia, QLD 4067, Australia.

^c School of Population Health, University of Queensland, Herston Road, Herston, QLD 4006, Australia.

^d The University of Western Australia, 35 Stirling Highway, Crawley, WA 6009, Australia.

Bibliography

Haas, C. N. (1999). On modeling correlated random variables in risk assessment. Risk Analysis 6,

1205–1214.


Wiley.



McCullough, N. B. and C. W. Eisele (1951). Experimental human salmonellosis: I. pathogenicity of

strains of Salmonella meleagridis and Salmonella anatum obtained from spray-dried whole egg. The


Oscar, T. (2004). Dose-response model for 13 strains of Salmonella. Risk Analysis 24(1), 41–49.

Teunis, P. F. M., G. J. Medema, L. Kruidenier, and A. H. Havelaar (1997). Assessment of the risk

of infection by Cryptosporidium or Giardia in drinking water from a surface water source. Water

Research 31, 1333–1346.

4.2 Incorporating parameter uncertainty into Quantitative Microbial Risk Assessment (QMRA)

Abstract

Modern statistical models and computational methods can now incorporate uncertainty of the parameters

used in Quantitative Microbial Risk Assessments (QMRA). Many QMRAs use Monte Carlo methods,

but work from fixed estimates for means, variances, and other parameters. We illustrate the ease of

estimating all parameters contemporaneously with the risk assessment, incorporating all the parameter

uncertainty arising from the experiments from which these parameters are estimated. A Bayesian ap-

proach is adopted, using Markov Chain Monte Carlo Gibbs sampling (MCMC) via the freely available

software, WinBUGS.

The method and its ease of implementation are illustrated by a case study that involves incorporating

three disparate datasets into an MCMC framework. The probabilities of infection when the uncertainty

associated with parameter estimation is incorporated into a QMRA are shown to be considerably more

variable over various dose ranges than the analogous probabilities obtained when constants from the

literature are simply ‘plugged’ in as is done in most QMRAs. Neglecting these sources of uncertainty

may lead to erroneous decisions for public health and risk management.

Keywords

parameter uncertainty; MCMC; Quantitative Microbial Risk Assessment (QMRA); recycled water; risk

assessment; Salmonella spp.

4.3 Introduction

In Australia, Quantitative Microbial Risk Assessment (QMRA) is recommended as the method of choice

for assessing health risks from exposure to pathogens in recycled water, e.g., Natural Resource Manage-

ment Ministerial Council et al. [2006]. The particular application examined in this paper is the risk of

microbial infections associated with exposure to recycled water.

This paper presents a modification to the standard QMRA methodology, in which the risk assessor

typically finds various quantities of interest, such as dose-response, die-off and/or log-reduction parame-

ters, and plugs these quantities into the risk assessment model. There is often little acknowledgement of

the fact that these quantities are uncertain. We contrast this ‘plug-in’ approach with an approach based

on a Bayesian risk assessment model, in which all the data which have been used to produce the quanti-

ties of interest necessary to the risk assessment are included. The uncertainty associated with the model

parameters is therefore propagated throughout the analysis. This may be considered an extension of the

standard QMRA model.

To illustrate the approach, we consider the probability of a person becoming infected with Salmonella

spp. after being exposed to recycled wastewater. The scenario is not drawn from actuality but is designed

to illustrate the extension of the standard QMRA methodology. In the illustration, we ignore the problems

of dose estimation, and investigate the part of risk estimation for which we have data available.

The paper is arranged as follows. First a standard QMRA method is outlined, followed by a brief

description of the extended method, together with the datasets which will be used to illustrate it. The

conceptual and statistical models into which these data are incorporated are then detailed, and the results

are compared with those one would obtain without the incorporation of parameter uncertainty. This

case study demonstrates that considerable uncertainty is induced in the probability of infection when

this Bayesian approach is adopted. In the discussion, we elaborate the differences seen between the two

methods and note the simplicity of our method.



4.4 Methods

4.4.1 Standard QMRA methodology

A QMRA requires a knowledge of pathogen numbers at some stage of the treatment process, generally

in the influent. Estimates for log-reductions for various water treatment processes from pilot or other

studies are then needed to estimate pathogen numbers in the treated water. A mechanism of ingestion,

and an amount of the treated recycled water ingested must be postulated or found. This, together with the

pathogen numbers in the treated water, allows estimation of possible microbial doses. Finally, specifica-

tion of a dose-response curve for the microbe of interest is needed to allow estimation of the probability

of infection given a particular dose. A natural representation for a QMRA is via a graphical model such

as Figure 4.1.

The QMRA of Figure 4.1 shows the steps for assessing the risk associated with eating a crop irrigated

with recycled water. In such a figure, nodes without parents need information in order to run the risk

assessment. Thus, for a standard QMRA, reading down the figure and from left to right, we need

1. A description of the microbe numbers in either the waste water, or in the final treated water. Typi-

cally, if Salmonella spp. is sampled at all, it is sampled in the wastewater and may be described as

coming from a log normal distribution with mean, µ and possibly a standard deviation σ.

2. ‘Log reductions’ in order to estimate the microbial numbers in the treated water. Water treatments

are generally thought to reduce the numbers of pathogens at a rate proportional to the influent

numbers of the pathogen in the water. This may be expressed in terms of log base 10, when it may

be referred to as ‘log reduction’, or a decimal elimination capacity (DEC); see, e.g., Hijnen et al.

[2007, 2005, 2004]. However, the DEC is typically given by a single number, e.g., 3, which would

mean that $\log_{10} C_{\text{influent}} - \log_{10} C_{\text{effluent}} = 3$, where $C_{\text{influent}}$ is the number/L in the influent, and

$C_{\text{effluent}}$ the number/L in the effluent. Such a log reduction would imply that the effluent numbers

are one thousandth those of the influent. To find these, published or grey literature involving the

particular treatment type for a particular plant is searched.

3. A die-off constant k, or $T_{90}$, the time to 90% die-off. In the case study, where a field is irrigated with

recycled wastewater, it is expected that sunlight will kill particular microbes at a rate proportional

to their number, i.e., $dN/dt \propto N$, or $N_t = N_0 e^{-kt}$, where k is sometimes referred to as the die-off

constant. Other equations may be used, but this is a reasonably common approximation to die-off

for some organisms, and is a good fit for the data used in this case study. Sinton et al. [2007] use

a ‘shoulder’ equation∗, but as is common, the quantities in their various equations are given as

constants, with no error indicated.

4. Sunlight and shade hours, for the locality in which the recycled water is to be used.

5. A suitable amount of crop/water ingested by a person. One may use survey data if available, or

use choices made by other researchers, for example, Tanaka et al. [1998]. (Typically such data are

supplied as constants.)

6. An equation and the parameters which describe the dose-response, i.e., the probability of becoming

infected, having ingested a particular dose of the microbe. For Salmonella, the equation

usually used is the Beta-Poisson, and from p. 401 of Haas et al. [1999], the risk assessor would select

$\alpha = 0.3126$ and $N_{50} = 2.36 \times 10^4$, to give the probability of infection, P, from a given dose D,

where D is the number of microbes ingested, as

$$P = 1 - \left(1 + \frac{D}{N_{50}}\left(2^{1/\alpha} - 1\right)\right)^{-\alpha}.$$

In an alternative parameterisation, we have

$$P = 1 - \left(1 + \frac{D}{\beta}\right)^{-\alpha},$$

where $\beta = N_{50}/(2^{1/\alpha} - 1) \simeq 2884$, and $N_{50}$ is the

number of microbes giving a 50% probability of infection.
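The relation between the two parameterisations of the dose-response curve can be checked numerically. The following is a minimal sketch in Python, taking $N_{50}$ to be the dose at which P = 0.5, as defined in item 6; the function and constant names are illustrative assumptions and do not come from the paper.

```python
import math

def beta_from_n50(alpha: float, n50: float) -> float:
    """Convert the median infectious dose N50 into the beta of the beta-Poisson curve."""
    return n50 / (2.0 ** (1.0 / alpha) - 1.0)

def p_infection(dose: float, alpha: float, beta: float) -> float:
    """Approximate beta-Poisson probability of infection for `dose` ingested microbes."""
    return 1.0 - (1.0 + dose / beta) ** (-alpha)

# Plug-in constants quoted in the text from Haas et al. [1999].
ALPHA = 0.3126
N50 = 2.36e4
BETA = beta_from_n50(ALPHA, N50)
```

By construction, `p_infection(N50, ALPHA, BETA)` returns one half, which is a convenient check that a chosen pair (α, β) agrees with a quoted $N_{50}$.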

Thus, to perform a risk assessment, the risk assessor performs a Monte-Carlo simulation, working

through the graphical model (Figure 4.1). Starting at some stage in the water processing cycle, an initial

number/L of the pathogen is drawn from the water treatment distribution described by constants, µ0, σ0.

This number is then reduced by either the value obtained by drawing a log-reduction value from the DEC

distribution, described by µ1, σ1, or, if no distribution is given or able to be inferred, then reduced by

the DEC, µ1, for the process or processes. In the scenario considered, sunlight is expected to reduce

pathogen numbers, so the die-off equation is used to give a final pathogen number in water which, then,

together with a draw for the quantity of water ingested, gives the number of pathogens ingested. Finally,

the probability of infection is calculated, via a dose-response equation, and a final draw made from a

Bernoulli distribution to simulate the person’s infection status. This is repeated many times to simulate

the risk, resulting in a distribution of the simulated endpoint risk.
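The plug-in simulation just described might be sketched as follows. This is an illustration under stated assumptions rather than the code used for the paper: the function name, its arguments, and the use of a natural-log scale for the initial concentration are hypothetical, and every argument is treated as a fixed constant, which is exactly the feature of the standard approach that this paper questions.

```python
import math
import random

def simulate_plugin_qmra(n_draws, mu0, sigma0, dec_mean, dec_sd, k, hours,
                         litres, alpha, beta, seed=1):
    """Plug-in Monte Carlo risk estimate; all arguments are fixed constants."""
    rng = random.Random(seed)
    infected = 0
    for _ in range(n_draws):
        # Pathogens per litre before treatment (log-normal, natural-log scale).
        conc = math.exp(rng.gauss(mu0, sigma0))
        # Treatment: a log10 reduction drawn from the DEC distribution
        # (dec_sd = 0 reproduces a single fixed DEC).
        conc *= 10.0 ** (-rng.gauss(dec_mean, dec_sd))
        # Sunlight die-off with a fixed constant k over `hours`.
        conc *= math.exp(-k * hours)
        # Number of pathogens ingested.
        dose = conc * litres
        p_inf = 1.0 - (1.0 + dose / beta) ** (-alpha)
        # Bernoulli draw for the person's infection status.
        if rng.random() < p_inf:
            infected += 1
    return infected / n_draws
```

The returned proportion is a single point estimate of risk; repeating the whole simulation with different constants is the only way, in this scheme, to see the effect of parameter uncertainty.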

For the case study, we consider an abbreviated version of the QMRA of Figure 4.1. This is represented

by Figure 4.2. In this version, the information requirements enumerated above are limited to items

1 (distribution for treated water, not influent), 3 (die-off constants), 4 (sunlight hours), and 6 (dose-response

equation parameter constants). Table 4.1 shows the fixed constants used in the risk simulation

of the QMRA of Figure 4.2.

∗ $100[1 - [1 - \exp(-kT)]^n]$

As can be seen, this is not a risk assessment, since we abstract just a part of the full model, in order

to illustrate more clearly that much uncertainty may fail to be incorporated into risk assessments. In

partial justification, we note that it is generally not thought worthwhile to monitor the end-use water for

the pathogens of interest as it is believed that they will be present in such small quantities and will be

so diffuse within the water body that substantive positive results would only be obtained by processing

impractically large samples. Data on pathogen reductions or log reduction studies exist but have been col-

lected from typically small-scale, short-run experiments, usually in countries with very different climatic

conditions. Moreover, such data may be owned by private utilities and are either not publicly available,

or provided with minimal details. Thus, often only summary statistics or incomplete statistics (at best)

filter into the public domain.

If the risk of a particular health outcome needs to be estimated, available data are even more limited.

For example, Salmonella spp. have been linked with a number of outbreaks in the US, Europe and Japan

[Marks et al., 1998]. In Australia, limited data for Salmonella spp. numbers and their inactivation by

various wastewater treatment processes are available. The few studies available are those of Gibbs and

others, and these focus on Salmonella spp. in sludge, rather than within the water fraction [Gibbs, 1995,

Gibbs and Ho, 1993, Gibbs et al., 1995].

Thus, this study takes a small part of the risk assessment process and shows how it may be extended,

by embedding data within a Bayesian framework to estimate the corresponding parameters, and thereby

give better uncertainty estimates. The extended model for doing this is described in the next section.

4.4.2 The extended QMRA model

In the extended model, the small experiments which lead to the various required constants, are incorpo-

rated directly into the risk assessment process, allowing the uncertainty associated with the estimates to

be automatically incorporated into the risk assessment. Thus, Figure 4.3, the extended model, contains

two additional nodes (1 & 7) in comparison with Figure 4.2, the standard QMRA. These nodes repre-

sent the data which give rise to the ‘constants’ fed into the QMRA assessment of Figure 4.2, but in the

extended model are used to derive estimates of the random quantities that describe these data.

The starting point for the extended model is the graphical representation of the QMRA which is

seen to be a directed acyclic graph or DAG. Thus, a risk assessment may be embedded in a Bayesian

framework, thereby allowing parameters to be estimated simultaneously with the risk assessment. In

the extended model (Figure 4.3), parameters (supplied as constants under Figure 4.2) are both estimated

and used for the derivation of other quantities. Thus, the model descriptions at nodes (2) and (6) are

the explanatory models of the data at the new nodes (1) and (7), and also the means for estimation of

dose after die-off (node 5), and estimation of the probability of a person becoming infected (node 8).

In this Bayesian framework ‘prior’ probabilities (prior beliefs) for the parameters of the explanatory

models are needed, and uninformative priors are used in order that the parameter estimates and the

uncertainty associated with them will closely approximate the maximum likelihood solutions for each

set of parameters and data.

As with the standard QMRA model, a Monte Carlo approach is taken to analyse the extended model.

Here, however, given the Bayesian setup and additional information, a more formal Markov chain Monte

Carlo approach is used to estimate posterior distributions of the various quantities of interest such as

dose-response parameters, die-off parameters, and the risk of infection. The Bayesian framework used

was the freely available WinBUGS [Lunn et al., 2000]. This is described in more detail later, in the

context of the case study.

We now give a more detailed description of the data and the models used to explain them.

4.4.3 Data for the extended model

Three disparate sources of information are integrated into the model described above: die-off data for

S. typhimurium, dose-response data for S. anatum [Teunis et al., 1996], and a short run of weather data

from an Australian city, Perth [Bureau of Meteorology, 2010], giving the number of hours of sunlight in

a summer and a winter month. We also use a fictitious pathogen distribution for the treated water with

a range which allows the possibility of a 100% infection rate. These data sets and distributions are now

described in more detail.

Salmonella dose-response data (Figure 4.3, node 7). In considering the risks of Salmonella spp.

poisoning, we chose to use the S.anatum data presented in the report by Teunis et al. [1996], in which

infection curves were fitted by strain and species. These authors concluded that for S.anatum the three

strains could be grouped together to determine a single dose-response curve, using a likelihood ratio test.



Others have made different choices; thus Haas et al. [1999] used all thirteen species and strains detailed

in Teunis et al. [1996] and discarded some ‘outliers’, after similar testing, as did Oscar [2004]. Each of

these authors’ strategies gives a different set of quantities which the risk-assessor may use, but whatever

parameter estimates he/she uses, they are used without any error being associated with them. Our purpose

was not to determine a best model for dose-response for Salmonella, but to show how to incorporate the

uncertainty associated with the estimation of such model parameters into a risk assessment.

A further reason for using Salmonella dose-response data is that the data are best summarised by a

beta-Poisson dose-response curve, where the probability of infection for a given dose of D microbes is

given as $P(\text{infection}) = 1 - (1 + D/\beta)^{-\alpha}$, which is characterised by two parameters $(\alpha, \beta)$ which are

highly correlated.

Hence, there are two issues here in using these parameters in a risk assessment. Firstly, they are

typically included as point estimates without acknowledgment of uncertainty of specification or of their

correlation. However, even if this uncertainty were included, it is preferable to use the posterior distri-

bution of the two parameters directly instead of making the standard assumption of bivariate normality,

which Teunis et al. [1996] have shown does not hold.

Salmonella typhimurium die-off data (Figure 4.3, node 1). S. typhimurium die-off data from Sidhu

et al. [2008] were supplied by the authors. These data allow the estimation of die-off rates with uncertainty

for S. typhimurium under several conditions: winter/summer, sun/shade, grass/thatch. The available

dataset consists of 34 observations. The summer observations were taken over all the combinations of

conditions, but the winter data were for grass only and measured die-off in light and shade, thereby giving

six sets of experimental conditions, and potentially six die-off constants. In the experiment, grass irri-

gated with sterile effluent, was seeded with known numbers of S.typhimurium and samples of the grass

and thatch were then harvested at 1, 2, 4, 6, 7.3 and 9.3 hours after the initial seeding (in summer). Mi-

crobial numbers were counted and averaged for the samples taken at each harvest time and sample type

(sun/shade, grass/thatch). For the winter samples, harvest times were 1, 2, 4, 6, and 8 hours and grass

and thatch were not separated. For further details of the experiment, see Sidhu et al. [2008].

Die-off over time is expected to be proportional to the number of organisms. Thus, $dN/dt \propto N$. This

equation has the solution $N_t = N_0 e^{-kt}$, where k is positive. One can use any base for exponentiation and

this changes the constant k, often referred to as the ‘die-off’ constant. To avoid confusion about bases

for exponentiation, the constant used to express this equation may be given as $T_{90}$, or the time to 90%

die-off. Solving $0.10 = e^{-kT_{90}}$ gives $T_{90}$ in terms of k and vice versa.
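For a die-off constant expressed on the natural-log scale, the conversion between k and the time to 90% die-off follows directly from the equation above. A minimal sketch in Python, with hypothetical function names:

```python
import math

def t90_from_k(k: float) -> float:
    """Time to 90% die-off for a natural-log die-off constant k > 0."""
    return math.log(10.0) / k

def k_from_t90(t90: float) -> float:
    """Natural-log die-off constant implied by a time to 90% die-off."""
    return math.log(10.0) / t90
```

If k were instead reported on a base-10 scale, the same time would simply be 1/k, which is why quoting the time to 90% die-off avoids ambiguity about the base.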

Microbial count numbers are usually thought to be log normally distributed. Hence, the die-off

equation takes the form $\log N_t \sim N(\log N_0 - kt, \sigma^2)$ or, alternatively, $\log(N_t/N_0) \sim N(-kt, \sigma^2)$. We fit the

second version of this equation for each set of experimental conditions. This was done because greater

effort (in terms of replicates) had gone into finding the value of the initial seedings. The original complete

data had shown differences in the die-off rates for all combinations of sun/shade and winter/summer, but

none for thatch. Hence, we fit four die-off constants, $k_1, \ldots, k_4$, to the model $\log(N_{i,t}/N_{i,0}) \sim N(-k_i t, \sigma^2)$,

where i references the summer/winter, sun/shade combination, and a common (pooled) variance $\sigma^2$ is

used. Natural logs were used, and the die-off constants were not constrained to be positive; indeed, for 14%

of the posterior draws for winter shade the slope $-k$ was positive (that is, k was negative). When this occurred, a

zero was substituted in the corresponding decay equation in the risk modelling (see below), although

some evidence exists for the regrowth of Salmonella spp. on lettuce leaves under the right circumstances

[Brandl and Amundson, 2008].

The posterior estimates for die-off act on the treated water pathogen number node (‘initial dose’,

node 3 of Figure 4.3), together with the sunlight hours of node 4, to produce the dose after die-off at

node 5. The die-off calculation uses the maximum 17 hour period for winter and summer available for

die-off, based on the irrigation regime for the sports ovals in Perth where the experimental die-off data

were collected. Note that the various values of k are the die-off constants of item 3, in the description of

Standard QMRA Methodology.

Sunlight hours (Figure 4.1, Figure 4.2 and Figure 4.3, node 4). Die-off is a function of sun/shade,

summer/winter. The daily sunshine hours at Perth Airport, for January 2008, and for June 2008 were

supplied by the Australian Bureau of Meteorology (pers.comm, 2010). Rather than work with summary

statistics, or fit a distribution to these data, they were resampled. These data are clearly not normally

distributed (see Figure 4.4), nor are they expected to be, since the number of sunlight hours is bounded

by 0 and the number of possible hours of sunlight on a particular day at the latitude of Perth. Figure 4.4

indicates a mixture of rainy and sunny days and discretization. Given that the data are bounded and

possibly a mixture of distributions, it seemed more sensible to resample, rather than to fit and sample

from an arbitrary distribution.
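Resampling of this kind amounts to drawing one observed day, with replacement, at each iteration. A minimal sketch in Python follows; the function name is an assumption, and the daily values shown are invented placeholders, not the Bureau of Meteorology data.

```python
import random

def resample_sunlight(hours_by_season, season, rng):
    """Draw one day's sunlight hours by sampling the observed days with replacement."""
    return rng.choice(hours_by_season[season])

# Hypothetical daily sunshine hours for illustration only.
HOURS = {
    "summer": [12.9, 13.1, 11.4, 6.2, 13.0],
    "winter": [0.4, 7.8, 8.1, 2.3, 6.9],
}
```

Because each draw is an observed value, the bounds, the discretization and any mixture of rainy and sunny days in the data are preserved without a distributional assumption.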

Doses. (Item 5, Standard QMRA Methodology section and Figure 4.1). No data were used for the

person’s dose. This node is not included in the case study (Figures 4.2 & 4.3).



The Salmonella dose-response curve (Figure 4.5) shows that very high doses of S.anatum are re-

quired for infection. (Using the point estimates found by Teunis et al. [1996], a dose of 400 S. anatum

gives a probability of infection of .01, while for a dose of 1000 the probability of infection becomes .03.)

Given the probabilities of infection, it appears that the dose-response curve applies largely to healthy

adults. Since small children and the elderly are more likely to become ill under the same dosing regime,

a range of doses was induced (via the treated water pathogen numbers distribution) in order to see the

effect of parameter uncertainty over the full dose-response curve.

Under the models used in this case-study, the node “person’s microbe dose” of Figure 4.1 is equated

to “Dose after die-off” (Figures 4.2 & node 5 of Figure 4.3).

Treated water numbers. This node is a derived node in Figure 4.1 but an initial node in Figures

4.2 & 4.3. The pathogen numbers distribution in treated water was ascribed a (natural) log-uniform

distribution over the range (-1, 30). Thus, in this study, both consumption rates and numbers per L of the

pathogen in the recycled or the influent source water were ignored. Instead, an arbitrary distribution was

chosen for the Salmonella spp. numbers distribution in treated water, to allow the possibility of seeing

the effect of the uncertainty in die-off rates and the uncertainty in the dose-response parameters on the

estimate of probability of infection, under many possible scenarios.

Putting it all together

Conceptual Model. The directed acyclic graph for the extended model (the ‘conceptual model’) is given

by Figure 4.3. Here, the Sidhu et al. [2008] data (node 1) are explained by the regression model (equation

3) of node 2 which estimates the die-off parameters. For each iteration of the MCMC algorithm, the four

die-off rates are estimated; a dose sample is drawn from node 3; and a sample is drawn from the sunlight

hour data of node 4. At node 5, the die-off constants for this iteration are applied using (equations 4-8)

for a 17 hour day with the sunlight and shade hours from node 4. When the draw for the winter shade or

sunlight die-off parameter is negative, it is replaced by zero.

Independently, the Teunis et al. [1996] dose-response data for S. anatum (node 7) are explained by the

current estimates of (α, β) (node 6, and fitted using equations 1 & 2), which are used in the same MCMC

iteration (at node 8, using equations 9 & 10) to calculate the probability of infection, thus allowing a

single estimate of the probability of infection (and the infection status of an individual) at each iteration.

Statistical Model. Node 7 contains the dose-response data from Teunis et al. [1996], which may be

represented as $(D_i, N_i, X_i)$, $i = 1, \ldots, 19$, where $D_i$ is the ith dose, $N_i$ the number of subjects given the ith

dose, and $X_i$ the number of subjects infected by the ith dose.

These are explained by the dose-response equation (node 6, equations 1 & 2) with parameters (α, β).

Uninformative log-uniform priors are given for (α, β), and after burn-in, the posteriors for (α, β) are

essentially identical to the maximum likelihood estimates. Thus, nodes 6 & 7 are described by

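$$X_i \sim \text{Bin}(p_i, N_i), \qquad (4.1)$$

$$p_i = 1 - \left(1 + \frac{D_i}{\beta}\right)^{-\alpha}, \qquad (4.2)$$

with priors for $(\alpha, \beta)$ given by

$$\ln(\alpha) \sim U(-10, 15),$$

$$\ln(\beta) \sim U(-6, 20).$$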

The current MCMC simulation of the posteriors for (α, β) is passed to node 8, again using equations (1)

& (2) (but now in the form of (9) & (10)), to give a value for the probability of infection (and whether an

individual is infected), after sunlight die-off.
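Equations (4.1) and (4.2), together with the log-uniform priors, define an unnormalised log posterior for (ln α, ln β). The sketch below, in Python, shows one way this could be evaluated; it is not the WinBUGS code used for the paper, the function names are assumptions, and doses are assumed to be strictly positive.

```python
import math

def beta_poisson_loglik(ln_alpha, ln_beta, doses, n_subjects, n_infected):
    """Binomial log-likelihood of equations (4.1)-(4.2) for positive doses."""
    alpha, beta = math.exp(ln_alpha), math.exp(ln_beta)
    total = 0.0
    for d, n, x in zip(doses, n_subjects, n_infected):
        # log(1 - p_i) = -alpha * log(1 + D_i / beta)
        log_q = -alpha * math.log1p(d / beta)
        p = -math.expm1(log_q)
        log_binom = math.lgamma(n + 1) - math.lgamma(x + 1) - math.lgamma(n - x + 1)
        if x > 0:
            if p <= 0.0:
                return -math.inf  # infections observed but p underflowed to zero
            total += x * math.log(p)
        total += log_binom + (n - x) * log_q
    return total

def log_prior(ln_alpha, ln_beta):
    """Log of the uniform priors on ln(alpha) and ln(beta), up to a constant."""
    inside = -10.0 <= ln_alpha <= 15.0 and -6.0 <= ln_beta <= 20.0
    return 0.0 if inside else -math.inf
```

Since the priors are flat over wide ranges, the sum of the two functions is dominated by the likelihood, which is consistent with the posteriors lying close to the maximum likelihood estimates.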

Node 1 represents the die-off data, which may be considered as $(L_j, t_j, N_{0(j)}, N_{t(j)})$, $j = 1, \ldots, 34$. Here j

references each data point; $L_j \in \{1, \ldots, 6\}$ identifies the line (experimental condition) to which the

jth point belongs, there being 6 of these, corresponding to the number of different conditions of the

experiment; $t_j$ is the number of hours elapsed from the initial seeding (with count $N_{0(j)}$ on line $L_j$); and

$N_{t(j)}$ is the count at time $t_j$ for line $L_j$. In principle there is a die-off constant $k_{L(j)}$ for each line, $L_j = 1, \ldots, 6$, $j = 1, \ldots, 34$.

However, as discussed earlier, only four values $k_l$ are fit, one for each summer/winter and sun/shade combination, since the

grass/thatch in combination with sun/shade for summer did not need separate fits. The die-off regression

equations (node 2) which explain the die-off data are given by

$$\ln\left(\frac{N_{t(j)}}{N_{0(j)}}\right) \sim N(-t_j k_l, \sigma^2) \qquad (4.3)$$

with uninformative priors for $k_l$ ($l = 1, \ldots, 4$) and $\sigma^2$ given by

$$k_l \sim N(0, 1000),$$

$$\sigma^2 \sim IG(0.01, 0.01).$$

The posterior estimates for kl and σ2 in the MCMC simulation are used at node 5, to estimate the dose

after die-off from sunlight, based on the dose from node 3, and the sunlight hours from node 4.

At node 4 the season sunlight hours are sampled directly from the data $(S_m, h_m)$, where $S_m$ is the

season (winter/summer) and $h_m$ are the sunlight hours for the day of that season.

Let $D_0$ be the initial number of pathogens drawn from the treated water distribution (node 3) and the

number of hours of sunlight drawn in winter/summer be h (node 4). Then $D_{17}$, the number of pathogens

17 hours after irrigation, is drawn from

$$
\begin{aligned}
\log(D_{17}/D_0) &\sim N(-k_1 h - k_2(17 - h),\ \sigma^2) && \text{in winter, } k_1, k_2 \ge 0 && (4.4)\\
&\sim N(-k_2(17 - h),\ \sigma^2) && \text{where } k_1 < 0,\ k_2 \ge 0 && (4.5)\\
&\sim N(-k_1 h,\ \sigma^2) && \text{where } k_1 \ge 0,\ k_2 < 0 && (4.6)\\
&\sim N(0,\ \sigma^2) && \text{where } k_1, k_2 < 0 && (4.7)\\
\log(D_{17}/D_0) &\sim N(-k_3 h - k_4(17 - h),\ \sigma^2) && \text{in summer} && (4.8)
\end{aligned}
$$

where $k_1, \ldots, k_4$ and $\sigma^2$ are posterior draws from node 2. (Note that although there may be a possibility

of bacterial growth, this possibility was not permitted in the risk estimation, since where an estimate for

any winter die-off k value was negative, it was replaced by zero.) $D_0$, the initial dose (node 3: treated

water/effluent distribution), is drawn from a log-uniform distribution which allows the full curve for the

probability of infection to be seen:

$$\ln(D_0) \sim U(-1, 30).$$
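Equations (4.4) to (4.8) and the draw of the initial dose can be written compactly for a single iteration. The following Python sketch is illustrative only: the function names and the `clamp` switch are assumptions, and in the paper the values of k and σ are posterior draws supplied by the MCMC rather than arguments chosen by the user.

```python
import math
import random

def draw_initial_dose(rng):
    """Initial dose D0 with ln(D0) ~ U(-1, 30)."""
    return math.exp(rng.uniform(-1.0, 30.0))

def dose_after_dieoff(d0, h, k_sun, k_shade, sigma, rng, clamp=True, total_hours=17.0):
    """Dose after `total_hours`, of which `h` are sunlight hours.

    With clamp=True (winter, equations 4.4-4.7) a negative draw of k,
    i.e. apparent growth, is replaced by zero; clamp=False gives equation 4.8.
    """
    if clamp:
        k_sun = max(k_sun, 0.0)
        k_shade = max(k_shade, 0.0)
    mean = -k_sun * h - k_shade * (total_hours - h)
    # sigma is a standard deviation, the square root of the pooled variance.
    return d0 * math.exp(rng.gauss(mean, sigma))
```

Taking the maximum of each k and zero reproduces the four winter cases in one expression, since a clamped term simply drops out of the mean.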

$D_{17}$ then passes to node 8, where the probability of infection is calculated using the current posterior

estimates for $\alpha$ and $\beta$ (from node 6). Then $p_{\text{inf}}$, the probability of infection, and I (whether an individual

is infected or not, taking a value of 1 for infected, 0 for not infected), are given by

$$p_{\text{inf}} = 1 - \left(1 + \frac{D_{17}}{\beta}\right)^{-\alpha}, \qquad (4.9)$$

$$I \sim \text{Bin}(p_{\text{inf}}, 1). \qquad (4.10)$$

As noted earlier, the model described above and in Figure 4.3 was implemented in WinBUGS [Lunn

et al., 2000]. The initial distribution of the dose is drawn from a log uniform distribution to allow the

consequences of parameter uncertainty at any dose to be explicitly included. In the simulation, for the

draw of each dose, each parameter is drawn conditional on the data and all other associated parameters.

For the final results, a burn-in of 30,000 was used to reach the target distributions for dose and die-off,

with a further 10,000 iterations used for the ‘risk’ estimation. Two chains and Gelman-Rubin statistics

[Lunn et al., 2000] for each of the quantities of interest were used to verify convergence to the stationary

distribution.
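As an indication of what such a convergence check involves, the sketch below computes the classic variance-based potential scale reduction factor for chains of equal length. It is an assumption-laden illustration in Python, with a hypothetical function name, and is not necessarily the exact variant that WinBUGS reports.

```python
import statistics

def gelman_rubin(chains):
    """Classic potential scale reduction factor for equal-length chains."""
    n = len(chains[0])
    chain_means = [statistics.fmean(c) for c in chains]
    # W: average of the within-chain sample variances.
    within = statistics.fmean([statistics.variance(c) for c in chains])
    # B / n: sample variance of the chain means.
    between_over_n = statistics.variance(chain_means)
    pooled = (n - 1) / n * within + between_over_n
    return (pooled / within) ** 0.5
```

Values close to 1 suggest that the chains are sampling the same distribution, whereas values well above 1 indicate that the chains have not yet mixed.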

Further extensions to the model. The model described above can be extended in a number of ways.

We present here two further conceptual models, which again can be expressed as DAGs: an errors-in-

variables [Fuller, 1987, Wand, 2009] model for the estimation of the dose-response model (Figure 4.6),

and a DAG for the incorporation of the errors-in-variables model into the QMRA presented here (Fig-

ure 4.7). The errors-in-variables model estimates the parameters of the dose-response equation on the

assumption that the doses are measured with error, and is detailed below. Not surprisingly, the additional

uncertainty postulated in this model increases the uncertainty associated with the estimation of the prob-

ability of infection. This approach is appropriate: dose is measured with error and this should be taken

into account when estimating the dose-response curve, though this example is intended to be illustrative

rather than definitive. The second model, Figure 4.7, expands node 7 of Figure 4.3, and shows how the

errors-in-variables dose-response model would be integrated into the ‘risk assessment’ carried out in this

paper. That such a model can be easily fit and incorporated into a risk assessment, further justifies the

data-based risk assessment approach used here.

Estimating dose-response assuming errors in dose

McCullough and Eisele [1951] prepared batches of S.anatum for which the S.anatum count was

measured. In the model we present below, we recognise the difficulty of ascertaining such dosages.

Thus, we assume that the batch dose is measured with error, and that the individual’s true dose from the

batch is not the true batch dose. The individual then becomes infected or not infected. In the model,



the status of infected/not infected has been assumed to be measured with no error. Figure 4.6 shows a

schematic directed acyclic graph for this model.

Let the unobserved true dose of batch $b$ be $Z_b$, the unobserved true dose for individual $i$ subjected to

batch $b$ be $Y_{i(b)}$, the observed dose for batch $b$ be $X_b$, and the infected status of individual $i$ be $I_i$. There

were 19 dosage batches, and 114 individuals each received a dose from a particular batch. Then letting

$i(b)$ reference the individual $i$ receiving a dosage from batch $b$, and $p_i$ be the probability of individual $i$

becoming infected, we have

$$\log(Z_b) \sim N(0, \sigma^2)$$

$$\log(X_b) \sim N(\log(Z_b), 0.001)$$

where $b = 1, \ldots, 19$ and $\sigma^2 \sim IG(0.1, 0.1)$.

$$\log(Y_{i(b)}) \sim N(\log(Z_b), 0.001)$$

$$p_i = 1 - (1 + Y_{i(b)}/\beta)^{-\alpha}$$

$$I_i \sim \mathrm{Bernoulli}(p_i)$$

where $i = 1, \ldots, 114$. (All measured batch doses were divided by 1000 prior to fitting.)
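To make the generative structure of this model concrete, the following minimal Python sketch simulates data forward from it. It is an illustration only and is not the WinBUGS code used for the analysis. The function name `simulate_trial`, the equal allocation of six individuals to each of the 19 batches (114 = 19 × 6), and the parameter values supplied in the usage lines are assumptions made for the example; the second argument of each normal distribution is read as a variance, as in the text.

```python
import math
import random


def simulate_trial(alpha, beta, sigma2, n_batches=19, per_batch=6,
                   err_var=0.001, seed=1):
    """Forward-simulate the errors-in-variables dose-response model.

    per_batch=6 is an assumption (114 individuals over 19 batches);
    alpha, beta and sigma2 are fixed inputs here, not estimated.
    """
    rng = random.Random(seed)
    err_sd = math.sqrt(err_var)
    records = []
    for b in range(n_batches):
        log_z = rng.gauss(0.0, math.sqrt(sigma2))  # true batch dose, log scale
        log_x = rng.gauss(log_z, err_sd)           # observed batch dose, log scale
        for _ in range(per_batch):
            log_y = rng.gauss(log_z, err_sd)       # individual's true dose, log scale
            p = 1.0 - (1.0 + math.exp(log_y) / beta) ** (-alpha)
            infected = 1 if rng.random() < p else 0  # Bernoulli(p)
            records.append((b, log_x, log_y, p, infected))
    return records


# Illustrative values only: alpha and beta/1000 loosely based on Table 4.1,
# sigma2 chosen arbitrarily.
records = simulate_trial(alpha=0.451, beta=15.177, sigma2=4.0)
print(sum(r[4] for r in records), "of", len(records), "simulated individuals infected")
```

In the fitted model the direction of inference is reversed: the observed batch doses and infection statuses are the data, and the true doses together with the dose-response parameters are the unknowns.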

The assumed errors in measurement used here are possibly unrealistic, with the variance of the errors

for the true batch dose, and for the true individual dose being set at 0.001, but they do affect the model

and particularly the width of its credible intervals. (When we consider the rounding of McCullough and

Eisele’s S.anatum numbers, it is, however, more likely that we have underestimated the variance.)

4.5 Results

Figure 4.8 shows the bivariate posterior distribution of the dose response parameters (α, β); as indicated

earlier, this is unlikely to be bivariate normal and indeed this is apparent from the figure. The parameters

are highly correlated and the surface of the loglikelihood at the point of convergence is fairly flat (not

shown), which means that the values of the parameters estimated using conventional maximum likeli-

hood methods are somewhat dependent on the stopping rule for convergence. In terms of the methods

advocated in this paper, it would seem that the dose-response curve parameters are not distributed as a bi-

variate normal, and that to simulate such a distribution via some summary parameters would be relatively

difficult.

Figure 4.5 shows the dose-response curve distribution given by these parameters’ posterior distribu-

tion. This curve is created from the outputs of nodes 1 and 2 and shows the estimates of the probability

of infection based solely on the Teunis et al. [1996] data. The considerable variation of the probability

at low doses should be noted (not shown in the graph, but noted from the MCMC data). P(infection)

between $e^{0}$ and $e^{10}$ (approximately $2 \times 10^{4}$) ranges from almost zero to occasionally 0.5 for the same dose. Even for

a dose of 20 bacteria ($e^{3}$) some realisations show a probability of infection of 0.2. For extremely high

dose values, the majority of probabilities of infection are close to one, but occasionally the probability is

considerably less.

Figure 4.9 shows the distributions for the die-off parameters for summer/winter, sun/shade. These

are fairly symmetric, reflecting in part the model assumptions of normality, given the few data. Note that,

for shade in winter, a substantial proportion (14%) of the posterior die-off values are negative, which

could lead to the inference of no die-off under these conditions.
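As a rough guide to what the 14% figure implies, the sketch below assumes, purely for illustration, that the posterior for the winter shade die-off parameter is approximately normal; the actual posterior is obtained by MCMC and need not be normal. Under that assumption, 14% of the posterior mass below zero places the posterior mean a little more than one posterior standard deviation above zero. The helper names are ours, and `share_below_zero` shows how the proportion would be read directly from a set of posterior draws.

```python
from statistics import NormalDist


def sds_above_zero(prob_below_zero):
    """Posterior mean in units of posterior sd implied by P(k < 0), assuming normality."""
    return -NormalDist().inv_cdf(prob_below_zero)


def share_below_zero(draws):
    """Proportion of posterior draws below zero (valid for any sample)."""
    return sum(d < 0 for d in draws) / len(draws)


z = sds_above_zero(0.14)
print(round(z, 2))
```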

Figure 4.4 shows the daily sunlight data for winter and summer. In both months, it can be seen that

the majority of days were neither cloudy, overcast nor rainy, but attained the maximum possible number

of sunlight hours.

In Figure 4.10, the probabilities of infection for summer and winter (estimated with parameter un-

certainty - ‘Varying’) are contrasted with the corresponding estimates where all needed values have been

plugged in as constants (‘Constant’). This figure shows the addition of considerable variation when the

underlying data and their model are incorporated in the risk assessment model. When most of the prob-

abilities lie below 0.5, the additional uncertainty increases the range of probabilities thereby giving an

increased likelihood of infection. When most of the probabilities lie above 0.5, the added uncertainty

again gives a wider range and therefore includes lower probabilities of infection in comparison with the

model using a constant.

Box plots for the probability of infection for 15 initial dose groups (Figure 4.11) indicate that if

the infection probabilities are not close to zero or one, the uncertainty is very greatly increased. Thus,

including parameter uncertainty could make a very great difference to conclusions about risk. Table 4.3

gives summary statistics for the initial doses by grouping. Table 4.2 gives summary statistics for the

probability of infection for each of these groupings shown in the graphs (Figure 4.11). Thus, in Table 4.2,

in dose grouping 8, the mean probability of infection for winter when constants are used is 0.82 with a

90% CI (0.73, 0.89) compared with 0.78 (0.43, 0.96) for the varying parameters, again more than double

the spread. Table 4.3 shows that the initial dose range for dose grouping 8 is $6.4 \times 10^{6}$ to $5.52 \times 10^{7}$ with

a median dose of $1.89 \times 10^{7}$ cells. Looking at the summer scenarios for dose grouping 12 (Table 4.2), the



mean probability of infection is 0.25 with 90% CI (0.12, 0.40) using constants, compared with a mean

probability of infection of 0.30 (0.04, 0.72), when parameters are drawn with uncertainty from their

distributions. This equates to a difference in interval width of 0.68 versus 0.28. That is, using varying

parameters the credible interval covers two thirds of the probability scale, whereas using constants the

interval is one third of the scale, constituting a very substantial difference.
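The widths compared here follow directly from the interval end points. The short sketch below recomputes them for the two groupings discussed, using only the 90% intervals cited in the text from Table 4.2; the dictionary layout and the helper `width` are illustrative choices.

```python
def width(interval):
    """Width of a (lower, upper) interval."""
    lower, upper = interval
    return upper - lower


# 90% intervals quoted in the text (Table 4.2)
intervals = {
    ("winter", 8): {"constant": (0.73, 0.89), "varying": (0.43, 0.96)},
    ("summer", 12): {"constant": (0.12, 0.40), "varying": (0.04, 0.72)},
}

ratios = {}
for key, pair in intervals.items():
    w_const = width(pair["constant"])
    w_vary = width(pair["varying"])
    ratios[key] = w_vary / w_const
    print(key, round(w_const, 2), round(w_vary, 2), round(ratios[key], 1))
```

In both groupings the interval obtained with varying parameters is more than twice as wide as the one obtained with constants.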

The effect of the uncertainty induced by the uncertainty of the die-off parameters was so great that it

seemed useful to estimate the probability of infection for the initial dose (divided by 1,000) allowing no

die-off, to show the effect of the uncertainty induced by the dose-response parameters alone. Figure 4.12

shows the effect of including the uncertainty of the parameter estimates for dose-response when no die-off

is considered. The results for each grouping are shown in Tables 4.2 & 4.3. Thus, for dose grouping 8, using

constants only, the 90% interval is (0.16, 0.48), but (0.12, 0.56) with varying parameters, a difference in

width of 0.32 to 0.44. This is a considerable difference when the response lies between 0 and 1. From

Table 4.3, this group is seen to encompass doses from $6.4 \times 10^{3}$ to $5.5 \times 10^{4}$ cells.
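As a check on the scale of these figures, the plug-in curve can be evaluated at the two ends of the no die-off dose range for grouping 8. The sketch below uses the constants and the formula of Table 4.1; the function name `beta_poisson` is ours, and the calculation is illustrative rather than a reproduction of the analysis. The two probabilities it returns (roughly 0.15 and 0.50) bracket the 90% interval (0.16, 0.48) quoted for constant parameters, as would be expected if that interval reflects only the spread of doses within the group.

```python
def beta_poisson(dose, alpha, beta):
    """Dose-response curve of Table 4.1: P = 1 - (1 + dose/beta)^(-alpha)."""
    return 1.0 - (1.0 + dose / beta) ** (-alpha)


ALPHA, BETA = 0.451, 15177.0        # plug-in constants from Table 4.1
DOSE_MIN, DOSE_MAX = 6.4e3, 5.5e4   # no die-off dose range for grouping 8

p_min = beta_poisson(DOSE_MIN, ALPHA, BETA)
p_max = beta_poisson(DOSE_MAX, ALPHA, BETA)
print(round(p_min, 2), round(p_max, 2))
```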

Figure 4.13 shows the additional uncertainty in the probabilities of infection from the dose-response

model which results from the errors-in-variables model. The 95% credible intervals for the probability

of infection when error is assumed in the dose are considerably wider, but also, the mean probability

of infection has moved to the left, indicating a higher probability of infection for a dose, than when

the probability is estimated without recognising that the dose is measured with error. The additional

uncertainty operates to make the probability of infection markedly higher at lower doses. (Note that none

of the results discussed in the ‘risk assessment’ include this modification to nodes 6 & 7 of Figure 4.3.)

4.6 Discussion

The extended QMRA model was expressed as a Bayesian model and analysed using a simulation-based

approach, namely Markov Chain Monte Carlo (MCMC) to estimate distributions of the probability of

infection, thereby taking into account the uncertainty associated with parameter estimates needed in the

risk assessment, automatically and more satisfactorily. In general, when parameter uncertainty is taken

into account, it is typical to assume that the parameter estimate is normally distributed, which it may well

not be. The manner in which uncertainty is incorporated in the extended model allows the experimental

data to dictate the distribution of the parameter uncertainty, and allows the possibility of asymmetry

and long tails. The Bayesian framework permits embedding of several unrelated models in a single

risk assessment, via directed acyclic graphs (DAGs), and may be compared with more conventional risk

assessments, where parameter estimates and their associated distributional assumptions are used. As

examples of such risk assessments, see, e.g., Gerba et al. [2008], Pouillot et al. [2004], Tanaka et al.

[1998], Whiting and Buchanan [1997].

This approach may be compared with that proposed by other researchers [Haas et al., 1999, Teunis

et al., 1997] who prepare a bootstrap sample of parameter estimates for the dose-response curve, thereby

allowing for non-normality, prior to running the risk assessment. However, despite this being the method

recommended in Haas et al. [1999], most risk assessments take their dose-response parameters as con-

stants. It would seem that bootstrapping, choosing a size for the bootstrap sample, and incorporating the

resultant bootstrap sample into the simulation framework is generally discouraging for most practition-

ers. When the interest in a risk assessment involves tails of distributions or upper percentiles, it is critical

not to ignore the tail behaviours of the fitted distributions. The method we propose permits all such

asymmetries and uncertainties to be easily incorporated. In this particular case, where the dose-response

parameters are correlated, the problem of simulation is particularly difficult, since the two parameters are

not bivariate normal. Hence, when the dose-response curve is a beta-Poisson, some appropriate method

must be used to capture the bivariate behaviour. Haas [1999] proposes a further method based on rank

coefficients. This method is again complex to implement, whereas here we argue that using MCMC via

WinBUGS is not.
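The importance of keeping the two dose-response parameters together can be seen in a small deterministic sketch. The draws below are hypothetical: they lie on a narrow ridge along which log α and log β increase together, an assumed direction of dependence chosen only for illustration and not taken from Figure 4.8. Evaluating the curve on the joint pairs is compared with evaluating it on every combination of the same marginal values, which is what treating the parameters as independent would permit.

```python
import math


def beta_poisson(dose, alpha, beta):
    """P = 1 - (1 + dose/beta)^(-alpha)."""
    return 1.0 - (1.0 + dose / beta) ** (-alpha)


# Hypothetical joint draws (alpha, beta) on a ridge: log(beta) = 9 + 1.5 * log(alpha)
log_alphas = [-2.0 + 0.25 * j for j in range(9)]
joint = [(math.exp(a), math.exp(9.0 + 1.5 * a)) for a in log_alphas]
alphas = [a for a, _ in joint]
betas = [b for _, b in joint]

dose = 1.0e4
paired = [beta_poisson(dose, a, b) for a, b in joint]                 # dependence kept
crossed = [beta_poisson(dose, a, b) for a in alphas for b in betas]  # dependence ignored

range_paired = max(paired) - min(paired)
range_crossed = max(crossed) - min(crossed)
print(round(range_paired, 3), round(range_crossed, 3))
```

Because the curve increases with α and decreases with β, the crossed combinations include pairs that the joint distribution never produces, so the range of probabilities is wider when the dependence is ignored. Using the MCMC draws directly avoids having to describe this dependence by summary parameters.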

There are so few data used in the estimation of the die-off coefficients that there is little evidence of

asymmetry. Nonetheless, using data rather than previously estimated constants ensures that uncertainty

is propagated properly throughout the simulation. Had the ‘shoulder’ equation of Sinton et al. [2007]

been considered appropriate, the same problem of correlated parameter estimates for the curve fit would

again be as strongly evident as it is for the dose-response equation.

MCMC simulation and estimation have been available for some time, but are rarely used in the context

of risk assessment as described here. Kelly and Smith [2009] present a simple primer of MCMC methods

for this purpose, and, in particular, discuss its use in fitting hierarchical models, and in dealing appro-

priately with missing and uncertain data. Messner et al. [2001] use an MCMC approach to perform a

meta-analysis using hierarchical MCMC modelling to develop a dose-response curve for C.parvum. The

same approach is taken by Qian et al. [2003] who use MCMC to fit a hierarchical model to perform a

meta-analysis for various studies of protozoan inactivation by UV light. Delignette-Muller et al. [2006]



used a complex hierarchical model to describe the growth of Listeria in cold-smoked salmon, and then

used this to develop a further model for the time necessary to reach particular pathogen numbers, with

the second model importing the uncertainty implicit in the original data.

Paulo et al. [2005] undertook a risk assessment, which closely parallels our approach: the parameter

estimation for various submodels and the assessment of dietary exposure to pesticides (as a final node)

was accomplished in the one MCMC model. Like the models of Paulo et al. [2005], the model presented

in this paper differs substantially from the majority of risk assessments in two main ways. Firstly, it em-

beds the primary data (for dose-response, Salmonella die-off, and doses) within the simulation itself, thus

incorporating directly the uncertainties of the data, together with the (unknown) correlation structures.

That is, no summary data are used and no process is represented by a constant. Secondly, by putting

these together in a directed acyclic graph (DAG) and using MCMC, the model allows us to simultane-

ously estimate all the parameters currently used in a QMRA, together with the risks in which we are

interested.

Here, every parameter, however disparate, is estimated simultaneously with the risk simulation. This

permits all parameter uncertainty to be propagated throughout the risk assessment by incorporating all

relevant data seamlessly into the one directed acyclic graph. Thus, ideally, the data nodes might consist

of microbial cell numbers post treatment, dose-response data, and microbial numbers prior to treatment

(thereby allowing estimation of the log-reduction constants, and a potential comparison of the two meth-

ods of estimation), die-off data, and users’ consumption behaviour data. This method means that there is

no necessity for prior bootstrap simulations, as in Cullen and Frey [1999]’s “two-dimensional” approach

to fitting ‘uncertainty’ and ‘variability’: the models and the methods are explicit and transparent.

In summary, we have demonstrated a method for incorporating parameter uncertainty, which does

not require complex simulation methods. Where a risk assessor is trying to do more than arrive at a point

estimate, and is running Monte-Carlo simulations such as offered by @Risk [Palisade Corporation, 2008],

this method allows risk uncertainty to be satisfactorily described, without resorting to two-step estima-

tion procedures. It is also far more transparent than a spreadsheet approach where operations and their

sequencing can be difficult to discern. This method incorporates all the original data used to derive the

required parameters for a QMRA, into the QMRA, whereas in the more traditional approach these pa-

rameters are derived prior to undertaking the risk assessment and are ‘plugged’ in to the assessment. We

would recommend it as a simple, transparent method which should be incorporated into a risk assessor’s

armoury.

4.7 Conclusions

The aim of the study was twofold: (i) to indicate the potential problems arising from failure to include the

uncertainty of parameter estimates in risk assessments; and (ii) to illustrate the superiority of estimating

the parameters to be used in the risk assessment simultaneously with the risk assessment. When one

considers the “banana-shaped” bivariate graph for the dose-response parameters (α, β) and its long left

tail presented in Figure 4.8, there is little doubt that the simultaneous estimate of all parameters of interest

is a better methodology to use. The techniques and programs used to derive such estimates are now

readily available.

Our analysis indicated that, where dose ranges are either extremely large or small, estimating risk by

including the uncertainty in the underlying parameters makes little difference in the possible ranges for

the probability of infection. However, when the dose is within the range where the risk is neither very

close to one nor zero, the inclusion of uncertainty in the parameters may make a marked difference in the

possible ranges for the probability of infection.

Overall, the results of this study highlight the superiority of models developed directly from data

for finding more realistic estimates of uncertainty. In practical terms, we would advocate that workers

in this field report comprehensive data. Commonly, reported results only include a range and a mean,

occasionally a standard deviation, and often not even the number of observations used. These are gen-

erally insufficient to permit adequate estimation of risk. In addition, there is a failure to acknowledge,

let alone include, the uncertainty which results from small experiments. For the methodology advanced

in this paper, we would recommend, firstly, that all data from experiments leading to parameters needed

in a risk assessment, be in the public domain, particularly when their interpretation may have important

implications for public health. A major limitation imposed on this study was the inability to access data

collected by, or on behalf of, any Australian water utility, much of which is mandated by law or regula-

tion. Thus, our final recommendation is that such data be made publicly available. Journals may make a

difference in the short term, by insisting on this for data forming the basis of a published paper.



4.8 Figures

Figure 4.1 Model for a QMRA for surface vegetable irrigated with treated wastewater. Observed data nodes shown in white, parameter nodes in green, and outcome nodes in a light grey.



Figure 4.2 Model for the part of the standard QMRA implemented here. Observed data nodes shown in white, parameter nodes in green, and outcome nodes in a light grey.

Figure 4.3 Schematic Model for the directed acyclic graph implemented in WinBUGS for estimation of parameters and risk. Observed data nodes (1,3,4,7) are shown in white. Unknown parameter nodes to be estimated (2,6) in green, and outcome nodes (5,8) in a light grey.



Figure 4.4 Sunlight hours for January/June 2008 at Perth airport.

Figure 4.5 Dose-Response curve with uncertainty for S.anatum: $P = 1 - (1 + \mathrm{Dose}/\beta)^{-\alpha}$. The bounding curves are the 95% credible intervals from the MCMC simulation.



Figure 4.6 Graphical model for Dose-Response estimated with error in measurement and error in individual dosage: Measured dose is the observed batch dose, Batch dose is the unobserved true batch dose, individual dose is the true unobserved individual dose. The observation of an individual’s infection status is assumed to be without error.

Figure 4.7 Graphical model for a risk assessment which includes the parameters for dose-response based on the errors-in-variables concept.



Figure 4.8 Dose-response curve parameters (α, β): Posterior distribution for log α vs log β/1000 using log uniform priors

Figure 4.9 Die-off distributions for S.typhimurium: fixed effects pooled variance model $N_t = N_0 e^{-kt}$, $k > 0$. Note that for die-off $k > 0$.



Figure 4.10 Summer & Winter: Probability of infection - constant (the line) vs varying (the dots)

Figure 4.11 Summer & Winter: Probability of infection - Constant vs Varying by ranked initial pathogen numbers groups.

(Two panels of box plots, Constant vs Varying; horizontal axis: initial pathogen numbers, rank group; vertical axis: 0.0 to 1.0.)



Figure 4.12 Probability of infection (no die-off) against ranked initial pathogen numbers groups: using constant vs varying parameters for Beta binomial distribution.

Figure 4.13 Comparison of the Dose-response curves for S.anatum with 95% credible intervals, estimated with & without “errors-in-variables”.



4.9 Tables

Table 4.1 Settings for constant parameters

Constant  Description                             Value     Derivation

α         p(infection) = 1 − (1 + dose/β)^(−α)    .451      From Teunis et al. [1996]
β                                                 15177     (as above)

k1        Winter die-off constant (sunlight)      -.3010    An earlier run
k2        Winter die-off constant (shade)         -.1237    (as above)
k3        Summer die-off constant (sunlight)      -1.0390   (as above)
k4        Summer die-off constant (shade)         -.6457    (as above)

S_W       Sunlight hours (Winter)                 6.584     Mean June 2008 (Perth)
S_S       Sunlight hours (Summer)                 11.625    Mean January 2008 (Perth)

Table 4.2 Summary statistics for p(infected) over groupings

Group  Period      Type      Mean  std    median  q5    q95

7      No die-off  Constant  0.07  0.001  0.06    0.03  0.14
                   Varying   0.10  0.003  0.07    0.02  0.27

       Summer      Constant  0.00  0.000  0.00    0.00  0.00
                   Varying   0.00  0.000  0.00    0.00  0.00

       Winter      Constant  0.57  0.004  0.58    0.41  0.71
                   Varying   0.54  0.009  0.56    0.11  0.85

q5: 5th percentile of posterior distribution

q95: 95th percentile of posterior distribution



Table 4.3 Summary statistics for groupings: Group Initial Pathogen Numbers / Doses

Group  Mean Pathogens  std           median        Min           Max

0      1.21 × 10^0     2.70 × 10^-2  1.06 × 10^0   3.70 × 10^-1  2.81 × 10^0
1      8.53 × 10^0     1.80 × 10^-1  7.37 × 10^0   2.82 × 10^0   1.99 × 10^1
2      7.41 × 10^1     1.71 × 10^0   6.17 × 10^1   1.99 × 10^1   1.74 × 10^2
3      5.68 × 10^2     1.23 × 10^1   4.81 × 10^2   1.75 × 10^2   1.28 × 10^3
4      4.29 × 10^3     9.65 × 10^1   3.59 × 10^3   1.30 × 10^3   1.03 × 10^4
5      3.89 × 10^4     9.36 × 10^2   3.20 × 10^4   1.03 × 10^4   9.64 × 10^4
6      3.38 × 10^5     7.61 × 10^3   2.87 × 10^5   9.70 × 10^4   7.97 × 10^5
7      2.75 × 10^6     6.19 × 10^4   2.31 × 10^6   8.00 × 10^5   6.40 × 10^6
8      2.29 × 10^7     5.32 × 10^5   1.89 × 10^7   6.40 × 10^6   5.52 × 10^7
9      1.83 × 10^8     3.96 × 10^6   1.59 × 10^8   5.53 × 10^7   4.14 × 10^8
10     1.52 × 10^9     3.63 × 10^7   1.27 × 10^9   4.14 × 10^8   3.67 × 10^9
11     1.16 × 10^10    2.50 × 10^8   9.88 × 10^9   3.68 × 10^9   2.69 × 10^10
12     8.35 × 10^10    1.72 × 10^9   7.28 × 10^10  2.70 × 10^10  1.89 × 10^11
13     6.09 × 10^11    1.35 × 10^10  5.01 × 10^11  1.90 × 10^11  1.38 × 10^12
14     4.57 × 10^12    1.0 × 10^11   3.93 × 10^12  1.38 × 10^12  1.07 × 10^13

For the No die-off results, the doses are 1/1000th of these

Bibliography

Brandl, M. T. and R. Amundson (2008). Leaf age as a risk factor in contamination of lettuce with

Escherichia coli O157 : H7 and Salmonella enterica. Applied and Environmental Microbiology 74(8),

2298–2306.

Bureau of Meteorology (2010, April 15). 2010JR12235 *** Student/Request for Data, Forecasts

or other services/wa/Climate and Past Weather*** (JR- [SEC=UNCLASSIFIED]. email: cli-

[email protected].

Cullen, A. C. and H. C. Frey (1999). Probabilistic techniques in exposure assessment : a handbook for

dealing with variability and uncertainty in models and inputs. New York: Plenum Press.

Delignette-Muller, M. L., M. Cornu, R. Pouillot, and J. B. Denis (2006). Use of Bayesian modelling

in risk assessment: Application to growth of Listeria monocytogenes and food flora in cold-smoked

salmon. International Journal of Food Microbiology 106(2), 195–208.


Gerba, C. P., N. C.-d. Campo, J. P. Brooks, and I. L. Pepper (2008). Exposure and risk assessment of

Salmonella in recycled residuals. Water Science & Technology 57(7), 1061–1065.

Gibbs, R. A. (1995). Die-off of human pathogens in stored wastewater sludge and sludge applied to

land. Technical report, Urban Water Research Association of Australia, Water Services Association

of Australia, Melbourne.

Gibbs, R. A. and G. E. Ho (1993). Health risks from pathogens in untreated wastewater sludge: implica-

tions for Australian sludge management guidelines. Water 20(1), 17–22.

Gibbs, R. A., C. J. Hu, G. E. Ho, P. A. Phillips, and I. Unkovich (1995). Pathogen die-off in stored

wastewater sludge. Water Science & Technology 31(5-6), 91–95.


1205–1214.


Wiley.



Hijnen, W. A., Y. J. Dullemont, J. F. Schijven, A. J. Hanzens-Brouwer, M. Rosielle, and G. Medema

(2007). Removal and fate of Cryptosporidium parvum, Clostridium perfringens and small-sized centric

diatoms (Stephanodiscus hantzschii) in slow sand filters. Water Research 41, 2151–2162.

Hijnen, W. A. M., E. Beerendonk, and G. J. Medema (2005). Elimination of micro-organisms by drinking

water processes a review. Technical report, Kiwa N.V., Nieuwegein, The Netherlands.

Hijnen, W. A. M., E. Beerendonk, P. Smeets, and G. J. Medema (2004). Elimination of micro-organisms

by water treatment processes. Technical report, Kiwa N.V., Nieuwegein, The Netherlands.

Kelly, D. L. and C. L. Smith (2009). Bayesian inference in probabilistic risk assessment–The current

state of the art. Reliability Engineering & System Safety 94(2), 628–643. doi: 10.1016/j.ress.2008.07.002.



Marks, H. M., M. E. Coleman, C. T. J. Lin, and T. Roberts (1998). Topics in microbial risk assessment:

Dynamic flow tree process. Risk Analysis 18(3), 309–328.

McCullough, N. B. and C. W. Eisele (1951). Experimental human salmonellosis: I. pathogenicity of



Messner, M. J., C. L. Chappell, and P. C. Okhuysen (2001). Risk assessment for Cryptosporidium: A

hierarchical Bayesian analysis of human dose response data. Water Research 35(16), 3934–3940.






Palisade Corporation (2008). @Risk 5.0. Available online: www.palisade.com/risk/. Accessed:

October 22, 2009.

Paulo, M. J., H. v. d. Voet, M. J. W. Jansen, C. J. F. t. Braak, and J. D. v. Klaveren (2005). Risk assessment

of dietary exposure to pesticides using a Bayesian method. Pest Management Science 61(8), 759–766.

Pouillot, R., P. Beaudeau, J.-B. Denis, and F. Derouin (2004). A quantitative risk assessment of water-

borne Cryptosporidiosis in France using second-order Monte Carlo simulation. Risk Analysis 24(1),

1–17.

Qian, S. S., C. A. Stow, and M. E. Borsuk (2003). On Monte Carlo methods for Bayesian inference.

Ecological Modelling 159, 269.

Sidhu, J. P. S., J. Hanna, and S. G. Toze (2008). Survival of enteric microorganisms on grass surfaces

irrigated with treated effluent. Journal of Water and Health 06(2), 255–262.

Sinton, L., C. Hall, and R. Braithwaite (2007). Sunlight inactivation of Campylobacter jejuni and

Salmonella enterica, compared with Escherichia coli, in seawater and river water. Journal of Wa-

ter and Health 5(3), 357–365.

Tanaka, H., T. Asano, E. D. Schroeder, and G. Tchobanoglous (1998). Estimating the safety of wastew-

ater reclamation and reuse using enteric virus monitoring data. Water Environment Research 70(1),

39–51.



Research 31, 1333–1346.

Teunis, P. F. M., O. van der Heijden, J. W. B. van der Giessen, and A. H. Havelaar (1996). The dose-

response relation in human volunteers for gastro-intestinal pathogens. Technical report, National In-

stitute of Public Health and the Environment (RIVM), Bilthoven, The Netherlands.

Wand, M. P. (2009). Semiparametric and graphical models. Australian and New Zealand Journal of

Statistics 51(1), 9–41.

Whiting, R. C. and R. L. Buchanan (1997). Development of a quantitative risk assessment model for

Salmonella enteritidis in pasteurized liquid eggs. International Journal of Food Microbiology 36,

111–125.



1. they meet the criteria for authorship, in that they have participated in the conception, execution, or interpretation, of at least that part of the publication in their field of expertise;






Title: A Bayesian analysis of an agricultural field trial with three spatial dimensions

Journal: Computational Statistics and Data Analysis Status: Submitted September 2010

Contributor                  Statement of Contribution

Margaret Donald              Margaret Donald as first author was responsible for the concept of the paper, data analysis, interpretation and the writing of all drafts.

Dr Clair Alston              Was responsible for advice on measurements and their meaning and editorial comment.

Rick Young                   Was responsible for advice on the purpose and background to the field trial, advice on the meaning of statistical results and editorial comment.

Professor Kerrie Mengersen   Was responsible for general advice and editorial comment.


I have sighted email or other correspondence from all co-authors confirming their certifying authorship.

Chapter 5

A Bayesian analysis of an agricultural

field trial with three spatial dimensions

5.1 Preamble

This chapter satisfies research objective (3), where we aimed to build a satisfactory complex model for

one day’s data of the agricultural trial data. In this chapter, we consider various models for a single

day’s data from the agricultural trial. We compare many potential adjacency structures for a CAR model

[Besag, 1974, Besag et al., 1991] to model spatial autocorrelation in the data, as well as the AR1 × AR1

model of Gilmour et al. [1997]. For the fixed part of the model, we consider orthogonal polynomials,

linear splines, cubic splines and cubic radial bases to model the treatment curves along the depth dimen-

sion. Knowing that the measured depth does not measure the depth in the soil profile [Ringrose-Voase

et al., 2003], we also introduce an errors-in-variables model for modelling the treatment curves along the

depth dimension.

This chapter has been written as a journal article, of which I am the first author, and is presented here

in its entirety. It is reprinted here with its abstract, and with different bibliographic conventions from

Computational Statistics and Data Analysis, to which it was submitted in September 2010. Rick

Young provided the data, helped with all things agricultural, in addition to providing editorial comment.

Clair Alston provided major editorial advice in addition to advice on the collection and meaning of

the data. Kerrie Mengersen oversaw, helped with, and guided the exposition. As first author, I was

responsible for the concept of the paper, the data analysis, interpretation and the writing of all drafts as well

as the final version.

Title: A Bayesian analysis of an agricultural field trial with three spatial dimensions

Authors: Margaret Donald^a, Clair Alston^a, Rick Young^b, Kerrie Mengersen^a.


QLD 4001, Australia.

^b Tamworth Agricultural Institute, Industry & Investment NSW, 4 Marsden Park Road, Calala, NSW

2340, Australia.

Bibliography

Besag, J. E. (1974). Spatial interaction and the statistical analysis of lattice systems (with discussion). J.

R. Statist. Soc. B 36(2), 192–236.

Besag, J. E., J. York, and A. Mollié (1991). Bayesian image restoration with applications in spatial

statistics (with discussion). Annals of the Institute of Statistical Mathematics 43, 1–59.

Gilmour, A. R., B. R. Cullis, and A. P. Verbyla (1997). Accounting for natural and extraneous variation

in the analysis of field experiments. Journal of Agricultural Biological and Environmental Statistics 2,

269–293.

Ringrose-Voase, A., R. R. Young, Z. Payder, N. Huth, A. Bernardi, H. Cresswell, B. Keating, J. Scott,

M. Stauffacher, R. Banks, J. Holland, R. Johnston, T. Green, L. Gregory, I. Daniells, R. Farquharson,

R. Drinkwater, S. Heidenreich, and S. Donaldson (2003). Deep drainage under different land uses

in the Liverpool Plains Catchment. Technical Report 3, Agricultural Resource Management Report

Series, NSW Agriculture Orange.

5.2 A Bayesian analysis of an agricultural field trial with three spa-

tial dimensions

Abstract

Modern technology now has the ability to generate large datasets over space and time. These data

typically exhibit high autocorrelations over all dimensions. Generally, in the statistical modelling of such

data, modelling across time is made independent of the spatial dimensions. In a like manner, in three

dimensional space, when measurements are made over widely differing distances across the different

dimensions, it seems that better models may be fitted when the various spatial dimensions are separated.

Here, using an example of agricultural data collected over three dimensions, we see that the better model

is that in which depth is separated from the modelling within the horizontal layers. The field trial data

motivating the methods were collected to examine the behaviour of traditional cropping and to determine

a cropping system which could maximise water use for grain production while minimising leakage below

the crop root zone. They consist of moisture measurements made at 15 depths across 3 rows and 18

columns, in the lattice framework of an agricultural field.

Bayesian Conditional Autoregressive models are used to account for local site correlations. Con-

ditional autoregressive models have not been widely used in analyses of agricultural data. This paper

serves to illustrate the usefulness of these models in this field, along with the ease of implementation in

WinBUGS, a freely available software package.

The innovation is the fitting of separate conditional autoregressive models for each depth layer, while

simultaneously estimating depth profile functions for each site treatment. Modelling interest lay in how

best to model the treatment effect depth profiles, and in the choice of neighbourhood structure for the

spatial autocorrelation model. The favoured model fitted the treatment effects as splines over depth, and

treated depth, the basis for the regression model, as measured with error, while fitting CAR neighbourhood

models by depth layer. It is hierarchical, with separate conditional autoregressive spatial variance

components at each depth, and the fixed terms which involve an errors-in-measurement model treat depth

errors as interval-censored measurement error. The Bayesian framework permits transparent specification

and easy comparison of the various complex models.

Keywords

Bayesian, Conditional Autoregressive (CAR) models, Cubic radial bases, Errors-in-variables, Field


trial, Latent variables, Markov Chain Monte-Carlo (MCMC), Markov random field (MRF), Orthogonal

polynomials, Spatial autocorrelation, Splines, Variance components.

5.3 Introduction

In the past 20 years there has been a large uptake of Bayesian methods in many scientific fields, but

this trend is less prevalent in agriculture. The paper of Besag and Higdon [1999], which demonstrated

Bayesian methods for an agricultural field trial, has been cited approximately 360 times, but of these,

just 2 were published in an agricultural journal, and none analysed an agricultural field trial. Similarly,

in a search of Web of Science on July 31, 2009, 545 papers were found which cited Besag et al. [1991],

a seminal paper for conditional autoregressive (CAR) modelling. Of these, the majority (341) related to

health, disease or death in humans or animals, and again, none dealt with an agricultural field trial.

Almost 20 years on from Besag et al. [1991], we re-examine the advantages of the conditional autoregressive

(CAR) or Markov random field (MRF) models, described by Besag [1974] and elaborated

by him and co-workers in Besag et al. [1991] and applied to field trial data in Besag et al. [1995], Besag

and Higdon [1999]. Readily available software for CAR models allows simple specification of complex

random components, and simple calculation of complex quantities based on the model, permitting the

analyst to consider many differing models.

This paper is motivated by an increasing problem in agriculture, that of understanding the impact

of cropping regimes on water and, concomitantly, soil salinity. In many parts of the world the viability

of rainfed grain cropping is threatened by salination of land and water resources. Salination is caused

by excessive deep drainage below the plant root zone which mobilises sometimes vast sub-soil stores

of salt deposited at the time of soil formation. Deep drainage occurs when rain infiltrates already wet

soil that has insufficient capacity to store the additional water. This excess saline water may produce

water logging and shallow saline water tables, or may discharge at lower points in the landscape, or into

surface- or ground-waters [Broughton, 1994]. When saline ground waters encroach on the crop root zone,

the salt kills germinating crops or reduces yields depending on salt concentrations and rainfall [Daniells

et al., 2001]. The excess water is usually due to a combination of above average rainfall falling onto

land farmed using long fallow cropping practices, that is, the land is kept as bare fallow for about 2/3

of the time. Although long-fallow cropping usually results in good grain yields for each crop, average

yields over time are generally less than yields from more intensive, but somewhat more risky systems.

To overcome both the problems of excess water in the landscape under long fallow cropping, and the risk

of poor crop yields due to insufficient water supply between successive crops, when cropping is frequent,

a practice of planting a crop, appropriate for the time of year, crop health and economic considerations,

in response to soil water content (opportunity or response cropping) is being increasingly adopted by

farmers. When data are collected to consider the impact of cropping regimes, a substantive challenge of

this endeavour is the description of water patterns over space and depth.

This paper examines a single data set from a randomised complete block experiment, which comprises

soil moisture measurements taken in 3 spatial dimensions: row, column and depth. The presence

of spatial correlation is demonstrated, and various ways are considered for modelling it. We consider

several conditional autoregressive (CAR) models [Besag, 1974], with a complex variance structure. We

also consider an AR(1), AR(1) model such as those used by Gilmour et al. [1997] as base models, and

fitted here using Markov Chain Monte Carlo Gibbs sampling, and kriging models [Cressie, 1991].

Various models for the treatment effects along the depth dimension are considered, and include orthogonal

polynomials, linear and cubic splines, and in conjunction with the splines, an errors-in-variables

model for depth to account for shrinkage of soils on drying, and expansion on wetting.

The notable feature of the final models chosen is that the data are best modelled using CAR models in two

dimensions, not three.

5.4 Methods

5.4.1 Data

To test the efficiency of water use and the productivity of response cropping compared with long fallow

systems and traditional continuous winter cropping, a field experiment was established on a deep,

well structured and well drained, non-saline, cracking clay soil (Black Vertosol) in the upper reaches of

the Liverpool Plains catchment in New South Wales in south-eastern Australia. Accurate measurement

of soil water content was critical to the success of this work. This was measured from access tubes at

measurement sites in each of 18 experimental plots over 15 equal depth increments to 3 m. The neutron

scattering count [Ringrose-Voase et al., 2003] was taken at each depth in the access tube and controlled

by the neutron count taken from an access tube fitted into a drum of water after each set of field measurements.

The surrogate measurement for moisture used in the analysis here is the log transformation of the


ratio of raw neutron count data to the control count. The measurement sites were arranged as 3 rows with

18 columns per row. Thus, the dimensions row, column, and depth are essentially orthogonal.

Nine experimental treatments consisting of three fully phased cropping systems and three types of

perennial pasture were allocated as a randomised complete block design to the 18 plots each containing

3 of the 54 measurement sites [Ringrose-Voase et al., 2003, p23]. Treatments are described as follows.

1. Treatments 1-3. Long fallow wheat/sorghum rotation, where one wheat and one sorghum crop are

grown in three years with an intervening 10-14 month fallow period. The 3 treatments were each

of 3 phases of the long fallow 3 strip system. ‘Long Fallow 1’ started with wheat in the winter of

1994 followed by sorghum in summer, 1996. ‘Long Fallow 2’ started with sorghum in the summer

of 1995, and ‘Long Fallow 3’ started with wheat in the winter of 1995.

2. Treatment 4. Continuous cropping in winter with wheat and barley grown alternately.

3. Treatments 5 and 6. Response cropping, where an appropriate crop (either a winter crop or a

summer crop) was planted when the depth of moist soil exceeded a predetermined level. The two

response cropping treatments were differentiated by the sequence of crop types.

4. Treatments 7-9. Perennial pastures. The 3 treatments were lucerne (a deep rooted perennial forage

legume with high water use potential), lucerne grown with a winter growing perennial grass, and a

mixture of winter and summer growing perennial grasses.

The data used here were from the third year of the experiment, when the treatments had bedded down

and measurements reflected treatment effects.

Rainfall in the 6 months preceding these moisture measurements was very high (802 mm, compared

to the annual average of 684 mm). As a result, winter crops in treatments 3-7 had not substantially

depleted the stored soil water compared with previous years. The sorghum crop in treatment 1 had

recently been planted into a fully recharged soil profile. Plots for treatment 2 were lying fallow. In

contrast, pasture plots were much drier as they were depleted by 300-600mm coming into the winter.

The volumetric water content calculated using the neutron moisture meter calibration equations from

Ringrose-Voase et al. [2003] of the soils ranged from 26% under lucerne to 55% under long fallow in the

surface 50 cm, illustrating the large water holding capacity of these soils.

5.4.2 Spatial Correlation

A well-known problem of agricultural field trials is the variability in the fertility, physical and hydraulic

characteristics of the landscape on which the experiment is sited. In a field trial, spatial correlation in the

observations is expected since we anticipate that soil and drainage conditions of neighbouring plots are

likely to be more similar than those of plots further away. To reduce bias in measurements, treatments

within blocks are allocated at random. However, random allocation does not necessarily eliminate the

problem, particularly in small experiments. In addition, although the treatments form a randomised

complete block treatment, there are three measurements within each treatment plot. These measurements

within the plot are expected to be highly spatially correlated.

A further complication is that the treatment effect, moisture, is modelled as a fixed function of depth

(Section 5.4.3). With a poor fixed effects model, we could expect residuals for each site depth profile

to become autocorrelated. We used a potentially overfitted model (the 8 degree orthogonal polynomial)

as a base model for the comparisons of neighbourhood models (Table 5.1). Testing the base model

for autocorrelations, using SAS PROC AUTOREG [SAS Institute, 2004] showed autocorrelation in the

depth residuals for 2 of the 54 sites at the 1% significance level, whether 6, 8 or 10 degree polynomials

were used. Hence, any of these polynomial models seemed a reasonable choice for a base model.

In preliminary analyses we fitted kriging models [Banerjee et al., 2004, Cressie, 1991, Gotway and

Cressie, 1990] to the residuals by depth layer, after fitting a saturated model of all treatments by depths

(135 fixed terms). We considered covariograms by depth layer, and additionally allowed for anisotropy

across row and column. However, neither anisotropic nor isotropic kriging models appeared satisfactory:

for all depths, no covariogram showed evidence of being a function of increasing distance. Banerjee et al.

[2004], Stefanova et al. [2009, p73] indicate that such evidence is not always reliable.

Spatial correlation via a local neighbourhood definition was also considered. Moran’s I [Banerjee

et al., 2004, pp72–73] was used to explore the spatial association of the residuals described above, using

a neighbourhood matrix of equal weights for first-order neighbours in the same depth layer. Table 5.2

shows this as being statistically significant at the 10% level for ten of the fifteen depths. Neighbourhoods

were largely defined by depth layer because of the very great difference in scale of depth compared with

the distances across row and columns.
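
The calculation of Moran's I with equal (0,1) weights for first-order neighbours in one depth layer can be sketched as follows. This is a minimal illustration only: the 3 × 18 lattice matches the layout of the measurement sites, but the residual values are synthetic, the function names are our own, and the significance testing reported in Table 5.2 is not reproduced.

```python
def first_order_neighbours(n_rows, n_cols):
    """Map each site (r, c) to its directly adjacent sites in the same layer."""
    nbrs = {}
    for r in range(n_rows):
        for c in range(n_cols):
            cand = [(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)]
            nbrs[(r, c)] = [(a, b) for a, b in cand
                            if 0 <= a < n_rows and 0 <= b < n_cols]
    return nbrs


def morans_i(values, nbrs):
    """Moran's I for values keyed by site, using 0/1 neighbour weights."""
    n = len(values)
    mean = sum(values.values()) / n
    dev = {s: v - mean for s, v in values.items()}
    w_total = sum(len(k) for k in nbrs.values())        # sum of all weights
    cross = sum(dev[s] * dev[k] for s in nbrs for k in nbrs[s])
    ss = sum(d * d for d in dev.values())
    return (n / w_total) * (cross / ss)


nbrs = first_order_neighbours(3, 18)
# Synthetic residuals with a smooth trend along the columns (assumed data).
smooth = {(r, c): 0.1 * c for r in range(3) for c in range(18)}
# Synthetic checkerboard residuals: neighbours always have opposite signs.
checker = {(r, c): (-1.0) ** (r + c) for r in range(3) for c in range(18)}
i_smooth = morans_i(smooth, nbrs)
i_checker = morans_i(checker, nbrs)
```

Positive values of the statistic indicate that neighbouring residuals tend to be alike, which is the pattern a CAR component is designed to absorb; the checkerboard case gives the opposite extreme.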

In the light of the results of the Moran’s I analysis, conditional autoregressive (CAR) models were

adopted. They were more flexible than kriging models, since they employ sparse precision matrices [Rue


and Held, 2005], a far more efficient computational tool, rather than the dense covariance matrices used

in kriging. Recent research [Besag and Mondal, 2005, Lindgren et al., 2010] indicates that the distinction

between kriging and CAR models is more apparent than real.

Neighbourhoods and weights

CAR models are based on neighbourhoods. In a three dimensional space, there are many potential

choices for the neighbourhood of a point. A point in space could, for example, be thought of as being

surrounded by 26 neighbours of a 3 × 3 box, or by its first order neighbours in both spatial layer and

depth (6 neighbours). The major innovation of this paper is that, recognising the great differences in scale

between measurements in a horizontal layer and those at different depths, the neighbourhoods compared

were largely within the same depth layer.

As discussed in Section 5.4.2, CAR models depend not only on the definition of neighbourhood,

but also on the weights given to neighbours, which may often be distance based. Distance weights were

not used for the following reasons. For these data, neighbouring measurements across depth are 20

cm apart, while neighbouring measurements along rows are roughly 10 m apart when the measurement

sites are in the same subplot and roughly 30 metres apart when not. Along the columns, the distances

between neighbours are approximately 20 metres. Suppose that the reciprocal of the distance between

neighbouring measurement points is used as the weight. This would mean that points within a block

would weigh about 3 times as much as adjacent points not in the block, thus making averaging across

neighbours closer to averaging across a block. If depth neighbours were included, a depth neighbour

would weigh 50 times more than a neighbour within the block and 150 times more than a neighbour

from a neighbouring plot. This would effectively reduce the spatial analysis to a single dimension: depth,

and make it a depth neighbourhood analysis only. Distance weights could be discounted by raising the

distance to a fractional power, but any discounting power is arbitrary, and would increase the number of

models to be distinguished between, without adding insight.
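
The ratios quoted above follow directly from the approximate spacings. The short check below is illustrative only; the dictionary labels are our own, and the reciprocal-distance scheme is the hypothetical weighting being argued against, not one that was fitted.

```python
# Approximate spacings quoted in the text, in metres.
spacing = {
    "depth neighbour": 0.20,              # 20 cm between depth increments
    "same-subplot row neighbour": 10.0,
    "column neighbour": 20.0,
    "other-subplot row neighbour": 30.0,
}

# Reciprocal-distance weights (hypothetical scheme, not used in the analysis).
weight = {name: 1.0 / dist for name, dist in spacing.items()}

ratio_within_vs_between = (weight["same-subplot row neighbour"]
                           / weight["other-subplot row neighbour"])
ratio_depth_vs_within = (weight["depth neighbour"]
                         / weight["same-subplot row neighbour"])
ratio_depth_vs_between = (weight["depth neighbour"]
                          / weight["other-subplot row neighbour"])
```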

The choice of neighbourhood weights as (0,1), i.e., a neighbour or not a neighbour, allows a simple

choice of the best adjacency model, independent of weights. If distance weights were to be used, it would

be unclear whether it is the weights or the adjacency definition changing the fit of the model. For example,

a second-order neighbour model rejected by a (0,1) weighting scheme, may not differ greatly from the

first-order neighbour model when distance weights are used, since the distance weights will discount the

second-order neighbours. We preferred to determine appropriate neighbourhoods over which to average,

independent of weights.

Depth neighbours were used in one CAR neighbourhood model only (Table 5.1), as it was possible

that after accounting for treatment effects using depth (Section 5.4.3), neighbouring depth residuals might

be correlated, either naturally or because a poor fixed effects model induced depth correlation.

In weighting horizontal neighbours equally, the differences of scale between rows and columns are

ignored. Gilmour et al. [1997] deal with this by using a base model with AR(1) modelling for row

and column, while Besag and Higdon [1999], Besag and Mondal [2005] use differently weighted row

and column CAR neighbourhood models in which the weights are estimated. Both modelling strategies

recognise anisotropy, since a priori, it is unclear whether two experimental sites in adjacent rows which

are physically closer will have greater spatial correlation than two sites cultivated in the same column

but further apart. Under WinBUGS, weights may not be random quantities to be estimated, but must be

fixed. Our choice was above all to determine a suitable neighbourhood.

CAR spatial models

Let the sites be indexed by $i = 1, \ldots, 54$, at depths 20, 40, ..., 300 cm (indexed by $d = 1, \ldots, 15$). The moisture

value, $y_{i,d}$, is modelled using nine depth functions, $f_{j(i)}(d)$, one for each treatment $j$, determined by the site

index, $i$. These are functions of the depth (indexed by $d$). The residual from this fixed effects model is

modelled as the sum of a spatial residual component, $s_{i,d}$, and a non-spatial residual component, $\epsilon_{i,d}$, with

$\epsilon_{i,d} \sim N(0, \tau^2)$. The conditional mean of the spatial residual component is a weighted average of the

neighbouring spatial residuals [Besag and Kooperberg, 1995, Besag et al., 1991]. This local spatial smoothing specification ensures a global

specification via Brook's lemma [Banerjee et al., 2004], and allows us to account for spatial similarities.

For site $i$ at depth $d$, the full model is

$$y_{i,d} = f_{j(i)}(d) + s_{i,d} + \epsilon_{i,d}$$

where $f_{j(i)}(d)$ is the treatment effect for treatment $j$ at site $i$ and depth index $d$, and is a function of

depth. The conditional distribution of the spatial residual component, $s_{i,d}$, given its neighbours, $s_{k,d}$, is

$$s_{i,d} \mid s_{k,d},\ k \in \partial i \;\sim\; N\!\left( \sum_{k \in \partial i} \frac{w_{ik}\, s_{k,d}}{w_{i+}},\ \frac{\sigma_d^2}{w_{i+}} \right)$$


where $\partial i$ is the set of indices for the neighbours of site $i$, $w_{ik}$ is the weight of the $k$th neighbour of $i$,

$w_{i+}$ is the sum of the weights of the neighbours of $i$, and $\sigma_d^2$ is a variance component for the CAR model

at depth $d$. There is also a common homogeneous variance component across all depths, $\tau^2$.
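
With (0,1) weights, the sum of weights for a site is simply its number of neighbours, so the conditional mean is the plain average of the neighbouring spatial residuals and the conditional variance is the layer variance divided by the neighbour count. The sketch below illustrates this for a single depth layer. It assumes the first-order (maximum of 4) neighbourhood on the 3 × 18 lattice; the function names and the numerical values are hypothetical.

```python
def lattice_neighbours(n_rows, n_cols):
    """First-order neighbours (same depth layer) for each site (r, c)."""
    nbrs = {}
    for r in range(n_rows):
        for c in range(n_cols):
            cand = [(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)]
            nbrs[(r, c)] = [(a, b) for a, b in cand
                            if 0 <= a < n_rows and 0 <= b < n_cols]
    return nbrs


def car_conditional(site, s, nbrs, sigma2_d):
    """Conditional mean and variance of s[site] given its neighbours (0/1 weights)."""
    w_plus = len(nbrs[site])                       # sum of unit weights
    mean = sum(s[k] for k in nbrs[site]) / w_plus  # average of neighbouring values
    return mean, sigma2_d / w_plus


nbrs = lattice_neighbours(3, 18)
# Hypothetical spatial residuals for one depth layer: equal to the column index.
s = {(r, c): float(c) for r in range(3) for c in range(18)}
interior_mean, interior_var = car_conditional((1, 5), s, nbrs, sigma2_d=0.04)
corner_mean, corner_var = car_conditional((0, 0), s, nbrs, sigma2_d=0.04)
```

Sites on the edge of the lattice have fewer neighbours and therefore a larger conditional variance than interior sites.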

The majority of neighbourhood models compared are described by the CAR model given above.

However, two further random component models were fitted, one of which included a CAR model with

depth neighbours which therefore could only be given a single spatial variance component ($\sigma^2$). This

model allowed the final homogeneous residual term $\epsilon_{i,d}$ to be dependent on depth, with $\epsilon_{i,d} \sim N(0, \tau_d^2)$.

The final model considered had first-order neighbours in the same depth layer, with 15 spatial variance

components and 15 homogeneous variance components, one for each depth.

5.4.3 Treatment (fixed) effects

Interest lies in describing moisture as a function of both depth and treatment. For this reason, we model

depth effects using fixed terms in the model, rather than incorporating depth neighbours into the random

CAR component of the model.

Preliminary analyses in which possible spatial correlation was ignored showed that the data could

be described in terms of five groupings of the treatments using orthogonal polynomials of up to degree

8 for at least some of the groupings. However, we chose to fit polynomials and spline models for all 9

treatments, to permit all treatment effects to be seen, and made final comparisons across the groupings.

Soil moisture measurements were considered to be part of a continuum. To take advantage of

this continuity it was thought reasonable to approximate the treatment effects as continuous, preferably

smooth, functions of depth. We fitted orthogonal polynomials, linear and cubic splines, and cubic radial

bases to the depth, allowing all curves to vary across the bases by treatment.

We compared 9 treatment polynomials of degree 10, 8 and 6, with linear spline models having 3-5

internal knots, and cubic splines and cubic radial bases with 5 interior knots.

For model choice the Deviance Information Criterion (DIC) [Spiegelhalter et al., 2002] was used to

compare the goodness of fit of the various models with their differing fixed and random effects.

Each of these models may be expressed in the following way. At site $i$, with depth indexed by $d$ and

treatment $j(i)$, the fixed part of the model for $y_{i,d}$ is

$$f_{j(i)}(d) = X\beta \quad \text{in the case of the models used,}$$

with $X$ being a design matrix based on the treatments $j(i)$ and the basis functions of the depth index, $d$.

Spline and cubic radial bases models

Treatment effects across the depths were modelled as linear splines with varying numbers of knots (from

3-5), linear splines with 5 knots and depth considered to be measured with error, and cubic radial basis

functions [Ngo and Wand, 2004] with and without measurement error in depth. We chose to fit a measurement

error model [Fuller, 1987, Wand, 2009] as an alternative to the adjustment to depth used by

Ringrose-Voase et al. [2003] to account for soil shrinkage/expansion under drier/wetter conditions. While depth is

measured accurately, the depth within the soil profile is not, and it is the depth within the soil profile

which is of interest to the soil scientists. Additionally, fitting an errors-in-variables model allows the

possibility of a better fit for the spline model, and also provides a method for dealing with any residual

depth correlations. For the Errors-in-variables model, see Section 5.4.3.

For the 5 knot model, 5 equally spaced internal knots were chosen at depth indices 3.33, 5.67, 8, 10.33 and

12.67, on the scale d = 1, ..., 15. In this type of semi-parametric modelling, knots are typically chosen at quantiles of

the data. Thus, with equally spaced observations over depth, equally spaced knots are appropriate. The fit

using the cubic radial bases did not appear to improve with an increasing number of functions. Penalised

linear and cubic splines were fitted as described by Wand [2009].
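
The knot positions quoted above, and a linear spline basis built on them, can be reproduced as in the following sketch. It assumes the truncated-line form of the linear spline basis, which is one common choice; the penalty on the knot coefficients used in the penalised fits is not shown, and the function names are our own.

```python
def interior_knots(lo, hi, n_knots):
    """Equally spaced interior knots strictly between lo and hi."""
    step = (hi - lo) / (n_knots + 1)
    return [lo + step * (k + 1) for k in range(n_knots)]


def linear_spline_row(d, knots):
    """Basis row [1, d, (d - k1)+, ..., (d - kK)+] for a single depth index."""
    return [1.0, float(d)] + [max(0.0, d - k) for k in knots]


knots = interior_knots(1, 15, 5)
rounded_knots = [round(k, 2) for k in knots]
# One basis row per nominal depth index d = 1, ..., 15.
basis = [linear_spline_row(d, knots) for d in range(1, 16)]
```

Under an errors-in-variables model the same row function would be evaluated at the latent depth index rather than the nominal one, which is why the basis has to be recomputed whenever the latent depth changes.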

Additionally, cubic radial basis functions as defined in Ngo and Wand [2004] were fitted. These

involve the inversion of a matrix, and the use of matrix algebra. However, this matrix is fixed once the

knots have been chosen. Thus, the changing bases implied by an errors-in-variables model do not require

matrix inversion within an MCMC implementation.

For both the linear splines and cubic radial basis functions the half-Cauchy distributions recom-

mended by Marley and Wand [2010] were used as prior distributions for the variance of the coefficients.


Errors-in-measurement model

The proposed errors-in-measurement model postulates that the true depth index z is interval-censored

and is related to the observed depth index, d, in the following way:

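$$z_d \mid d \sim N(d, \sigma_z^2)\, I(z_{d-1}, z_{d+1}) \quad \text{for } d = 2, 3, \ldots, 14,$$

$$z_1 \mid d = 1 \sim N(1, \sigma_z^2)\, I(0, z_2),$$

$$z_{15} \mid d = 15 \sim N(15, \sigma_z^2)\, I(z_{14}, 16),$$

where the prior for $\sigma_z^2$ is

$$\sigma_z^2 \sim \text{Half-Cauchy}(1).$$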

The choice of Half-Cauchy(1) was dictated by the need to prevent initial values of z from moving

to the extremes (0, 16) and thereby removing the spline bases.

Both the spline models and the cubic radial bases model accommodate the measurement of depth

with error, giving a latent true depth of $z \times 20$ cm. Where an errors-in-variables model is used for depth, $d$

is replaced by $z$, the unobserved true depth index, and the knots are adjusted accordingly. The treatment

effects of the model, $f_{j(i)}(d)$, become $f_{j(i)}(z)$.
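
The ordering constraint in this specification can be illustrated by drawing each latent depth index from a normal distribution centred on its nominal index and truncated to lie between its current neighbours. The sketch below is a simplified illustration under stated assumptions, not the WinBUGS implementation: the standard deviation is fixed at a hypothetical value of 0.3 rather than given a prior, the moisture data play no part in the draws, and a simple rejection sampler stands in for proper truncated-normal sampling.

```python
import random


def draw_truncated_normal(mean, sd, lower, upper, rng, max_tries=100_000):
    """Draw from N(mean, sd^2) restricted to (lower, upper) by simple rejection."""
    for _ in range(max_tries):
        x = rng.gauss(mean, sd)
        if lower < x < upper:
            return x
    raise RuntimeError("rejection sampler failed; interval has negligible mass")


def sweep_latent_depths(z, sd, rng):
    """One pass over d = 1..15, redrawing each z_d between its current neighbours."""
    n = len(z)                                   # z[0] holds z_1, z[n-1] holds z_n
    for i in range(n):
        lower = 0.0 if i == 0 else z[i - 1]
        upper = float(n + 1) if i == n - 1 else z[i + 1]
        z[i] = draw_truncated_normal(i + 1, sd, lower, upper, rng)
    return z


rng = random.Random(2011)
z = [float(d) for d in range(1, 16)]             # start at the nominal indices
for _ in range(50):
    z = sweep_latent_depths(z, sd=0.3, rng=rng)
latent_depth_cm = [20.0 * v for v in z]          # latent true depth is z x 20 cm
```

Because every draw is confined between its neighbours, the latent depths can never change order, and the end points cannot escape the interval (0, 16).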

Contrasts of interest

It was important to assess whether response cropping would use rainfall (as stored soil water) more

efficiently than the traditional practices of long fallow cropping or continuous winter cropping. Other

important questions were to establish the patterns of water use and changes in soil water profiles under

response cropping compared with those under perennial pastures, which are noted for their ability to

respond to rainfall and to use available soil water at most times of the year.

Thus, three contrasts were considered: (1) the difference between the traditional long fallow treatments

(1,2,3) and the response cropping treatments (5,6), (2) the difference between cropping (treatments

1-6) and pastures (7-9), and (3) the difference between the lucerne treatments (7-8) and the perennial grass

treatment (9).
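
At a given depth, the three contrasts can be formed from the nine fitted treatment values. The sketch below assumes that each contrast is the difference between the unweighted means of the two treatment groups, which is our reading of the description above; the fitted values are hypothetical placeholders. In an MCMC setting the same calculation would be applied at every iteration to give a posterior distribution for each contrast.

```python
def mean(xs):
    """Arithmetic mean of a non-empty sequence."""
    return sum(xs) / len(xs)


def contrasts(effect):
    """Three group contrasts from {treatment number: fitted value} at one depth."""
    long_fallow = mean([effect[t] for t in (1, 2, 3)])
    response = mean([effect[t] for t in (5, 6)])
    cropping = mean([effect[t] for t in range(1, 7)])
    pasture = mean([effect[t] for t in (7, 8, 9)])
    lucerne = mean([effect[t] for t in (7, 8)])
    grass = effect[9]
    return {
        "long fallow - response": long_fallow - response,
        "cropping - pasture": cropping - pasture,
        "lucerne - grass": lucerne - grass,
    }


# Hypothetical fitted log-ratio moisture values at one depth (not trial results).
effect = {1: -0.2, 2: -0.2, 3: -0.2, 4: -0.3, 5: -0.4, 6: -0.4,
          7: -0.8, 8: -0.8, 9: -0.6}
result = contrasts(effect)
```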

The various long fallow treatments were out of phase. Thus, despite the interest being between

response cropping and long fallow cropping, all 9 depth treatment curves were fitted separately to allow

any differences between them to be seen, with treatments only grouped for comparisons.

Under errors-in-measurement models of depth, treatment comparisons were made at the nominal

depths.

5.4.4 Choice of Priors

The variances for the spatial residual components ($\sigma_d^2$) were given a common inverse gamma prior, and

the non-spatial variance component ($\tau^2$) an inverse gamma prior. Thus,

$$1/\sigma_d^2 \sim \text{Gamma}(a_1, b_1),$$

$$1/\tau^2 \sim \text{Gamma}(a_2, b_2), \quad \text{where}$$

$$a_l \sim \text{Gamma}(0.1, 0.1),$$

$$b_l \sim \text{Gamma}(0.1, 0.1), \quad l = 1, 2.$$

For the splines and cubic radial basis functions, the coefficients of fixed terms were assigned the prior

$N(0, \sigma_u^2)$, with $\sigma_u^2$ having a Half-Cauchy(25) prior. The latent depth variable, $z$, was assigned a prior of

$N(0, \sigma_z^2)$, with $\sigma_z^2$ having a Half-Cauchy(1) prior. These half-Cauchy choices are not restrictive, since

the median of the Half-Cauchy(1) is 1, the mode is 0, and the mid 90% of the distribution lies within

(0.08, 12.7), while for Half-Cauchy(25), the mid 90% lies within (1.97, 318).
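
These quantiles follow from the half-Cauchy distribution function, $F(x) = (2/\pi)\arctan(x/A)$ for scale $A$, and can be checked with a few lines of code. The function name below is our own.

```python
import math


def half_cauchy_quantile(p, scale):
    """Quantile of a half-Cauchy with the given scale: invert (2/pi) * atan(x/scale)."""
    return scale * math.tan(p * math.pi / 2.0)


median_1 = half_cauchy_quantile(0.50, 1.0)
mid90_1 = (half_cauchy_quantile(0.05, 1.0), half_cauchy_quantile(0.95, 1.0))
mid90_25 = (half_cauchy_quantile(0.05, 25.0), half_cauchy_quantile(0.95, 25.0))
```

Because the scale enters multiplicatively, the Half-Cauchy(25) interval is simply 25 times the Half-Cauchy(1) interval.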

The coefficients of the orthogonal polynomials were assigned priors of N(0, 3.3). This choice was

influenced by the number of fixed terms. With large numbers of terms, it was important to keep their sum

within numeric computing range during the burn-in. Given that their sum lay between -0.8 and -0.2, this

prior did not seem too restrictive.

Similar considerations applied to the choice of Gamma (0.1, 0.1) for the hyperpriors for the parameters

of the inverse gamma distributions from which the variances for the spatial and non-spatial random

components were drawn; a distribution with a mean of 1 and a variance of 10 (and mid 90% in (0,10))

seemed reasonable for these hyperpriors.
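
The stated mean and variance follow from the moments of the gamma distribution, assuming the shape and rate parameterisation used by WinBUGS. For $X \sim \text{Gamma}(a, b)$ with shape $a = 0.1$ and rate $b = 0.1$,

$$E(X) = \frac{a}{b} = \frac{0.1}{0.1} = 1, \qquad \text{Var}(X) = \frac{a}{b^2} = \frac{0.1}{0.01} = 10.$$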


5.4.5 Model comparisons

The choice of neighbourhood model is made using the Deviance Information Criterion (DIC) [Spiegelhalter

et al., 2002], available in WinBUGS, while using a common fixed specification (orthogonal polynomials

of degree 8 for each treatment). Similarly, the choice for the fixed effects is made using a common

CAR neighbourhood specification (a maximum of 4 possible neighbours in the same depth plane).
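
For readers unfamiliar with the criterion, the DIC is the posterior mean deviance plus the effective number of parameters, pD, where pD is the posterior mean deviance minus the deviance evaluated at the posterior means of the parameters [Spiegelhalter et al., 2002]. The sketch below illustrates the calculation on a deliberately simple model, a normal mean with known variance and synthetic data, which is unrelated to the models compared in this paper. With a single free parameter, pD should be close to 1.

```python
import math
import random


def deviance(y, mu, sigma):
    """-2 x log-likelihood of y under independent N(mu, sigma^2)."""
    n = len(y)
    rss = sum((v - mu) ** 2 for v in y)
    return n * math.log(2.0 * math.pi * sigma ** 2) + rss / sigma ** 2


def dic(y, mu_draws, sigma):
    """Return (Dbar, pD, DIC) for posterior draws of a single mean parameter."""
    d_bar = sum(deviance(y, m, sigma) for m in mu_draws) / len(mu_draws)
    d_hat = deviance(y, sum(mu_draws) / len(mu_draws), sigma)  # at posterior mean
    p_d = d_bar - d_hat
    return d_bar, p_d, d_bar + p_d


rng = random.Random(1)
sigma = 1.0
y = [rng.gauss(0.5, sigma) for _ in range(50)]       # synthetic data
y_bar = sum(y) / len(y)
# Under a flat prior the posterior of mu is N(y_bar, sigma^2 / n); sample it directly.
mu_draws = [rng.gauss(y_bar, sigma / math.sqrt(len(y))) for _ in range(20000)]
d_bar, p_d, dic_value = dic(y, mu_draws, sigma)
```

Smaller values of DIC indicate the preferred model, and differences in pD show how much of the fit is bought with additional effective parameters.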

5.4.6 Implementation Details

Initially, treatment effects were expressed in terms of design matrices, X, and MCMC iterations estimating

the treatment effects iterated over all 810 observations. However, within WinBUGS, it is more

useful to think of the fitted models as fitting a value for each of nine treatments at each of 15 depths (135

estimates), and using indices to assign this fitted value to each of the 6 site observations for a treatment.

This change speeds convergence and reduces memory requirements.
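
The indexing idea can be sketched outside WinBUGS as follows. The allocation of treatments to sites and the fitted values below are hypothetical placeholders (the trial used a randomised complete block layout); only the counts of 9 treatments, 15 depths and 54 sites come from the text.

```python
N_TREATMENTS, N_DEPTHS, N_SITES = 9, 15, 54

# Hypothetical allocation: site i receives treatment (i mod 9) + 1, so that each
# treatment appears at 6 sites. The real allocation was randomised within blocks.
treatment_of_site = [(i % N_TREATMENTS) + 1 for i in range(N_SITES)]

# One fitted value per treatment and depth (135 values); placeholder numbers.
fitted = {(j, d): -0.01 * j - 0.02 * d
          for j in range(1, N_TREATMENTS + 1)
          for d in range(1, N_DEPTHS + 1)}

# Look up each observation's fixed effect by index instead of multiplying
# through an 810-row design matrix.
mu = {(i, d): fitted[(treatment_of_site[i], d)]
      for i in range(N_SITES)
      for d in range(1, N_DEPTHS + 1)}
```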

Neighbourhood matrices, orthogonal polynomials, and the inverse matrices required by the cubic

radial bases were calculated outside WinBUGS and formed part of the data description. Spline bases

were calculated in WinBUGS, which was necessary when the errors-in-measurement model for depth was

used, as the bases change with each change in the latent depth variable.

All models were run as scripts, with at least 60,000 iterations for burn-in (140,000 for errors-in-

variables models) and 200,000 iterations in all for the more complex models. Models were set up with

two chains and Gelman-Rubin statistics were checked.
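
The idea behind the Gelman-Rubin check is to compare between-chain and within-chain variability for each monitored quantity. The sketch below implements the classic variance-ratio form of the statistic for one scalar parameter; it is an illustration with synthetic chains, and the version reported by WinBUGS is a modified form based on interval widths, so values need not agree exactly.

```python
import random


def gelman_rubin(chains):
    """Potential scale reduction factor for equal-length chains of one scalar."""
    m = len(chains)
    n = len(chains[0])
    means = [sum(c) / n for c in chains]
    grand = sum(means) / m
    # Between-chain variance of the chain means, scaled by chain length.
    b = n / (m - 1) * sum((mu - grand) ** 2 for mu in means)
    # Average within-chain variance.
    w = sum(sum((x - mu) ** 2 for x in c) / (n - 1)
            for c, mu in zip(chains, means)) / m
    var_hat = (n - 1) / n * w + b / n
    return (var_hat / w) ** 0.5


rng = random.Random(42)
# Two synthetic chains sampling the same distribution (converged case).
same = [[rng.gauss(0.0, 1.0) for _ in range(5000)] for _ in range(2)]
# Two synthetic chains stuck around different values (non-converged case).
shifted = [[rng.gauss(0.0, 1.0) for _ in range(5000)],
           [rng.gauss(3.0, 1.0) for _ in range(5000)]]
r_same = gelman_rubin(same)
r_shifted = gelman_rubin(shifted)
```

Values close to 1 are consistent with the chains having reached a common distribution, while values well above 1 indicate that further burn-in is needed.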

5.5 Results

5.5.1 Assessing presence of spatial correlation

The presence of spatial correlation was demonstrated by Moran’s I [Banerjee et al., 2004]. Table 5.2

shows this statistic as being statistically significant at the 10% level for ten of the fifteen depths. The

pattern of significance was also reflected in the significance of the ratio of spatial variance to non-spatial

variance differing from 1, with lower variability being shown at the central depths.

5.5.2 Determining neighbourhoods and random components

Table 5.1 compares several models with differing neighbourhood structures, and in some cases, differing

random components. In this table all models have the same fixed design, with orthogonal polynomials

of degree 8 for each treatment (the ‘base’ model). This polynomial model was used because it had been

shown to adequately model the treatments as a function of depth when using a single pooled error, and

examination of the residuals along the depth dimension had demonstrated that the model was not inducing

autocorrelated residuals by depth. Models are compared using the effective number of parameters

(pD) and Deviance Information Criterion (DIC). The essential set of comparisons is used to determine

an appropriate neighbourhood, and each model in this set has a single homogeneous variance component

(τ2), and 15 spatial variance components (σ2d). Three additional models are considered, one of which

includes depth neighbours and which therefore cannot be fitted with differing spatial variance components.

This model has 15 homogeneous variance components (τ2d). The second additional model has

the same variance structure but has a 4 neighbourhood CAR structure. The final model has 15 spatial

variance components and 15 homogeneous random variance components and the same 4 neighbour CAR

structure. The major set of comparisons is between models having the same variance structure: a random

component structure of 15 spatial random components (σ2d) and a common homogeneous variance (τ2).

Table 5.1 compares a base model together with models having the same variance component structure:

1. One common pooled variance for error across all sites and depths (the ‘base’ model);

2. CAR model with a maximum of two neighbours per site (along the row), 15 spatial variances, one

for each depth, and a single homogeneous error variance;

3. CAR model with a maximum of four neighbours per site (directly adjacent in row and column),

15 spatial variances, one for each depth, and a single homogeneous error variance;

4. CAR model with a maximum of eight neighbours per site (includes diagonally adjacent sites), 15

spatial variances, one for each depth, and a single homogeneous error variance;

5. AR(1), AR(1) model [Gilmour et al., 1997], with a different autocorrelation component for each

depth layer along the rows, and a common AR(1) component across the rows.

Table 5.1 also shows a further set of comparisons which allow the determination of the random

component structure, and also determine whether depth neighbours should be fitted within the CAR

modelling offered under WinBUGS.


1. CAR model with a maximum of 6 neighbours per site (2 of which are depth neighbours), one

spatial variance, and 15 homogeneous error variances, one for each depth;

2. CAR model with a maximum of 4 neighbours per site, one spatial variance, and 15 homogeneous

error variances, one for each depth;

3. CAR model with a maximum of 4 neighbours per site, 15 spatial variance components, and 15

homogeneous error variances, one for each depth.

Using the DIC criterion, the preferred model is that having a maximum of 4 neighbours in the same

layer, with the 15 differing spatial variances. Including depth neighbours with the one spatial variance

component and the 15 non-spatial variance components gave a poorer model than the 4 neighbour model

with the same random component structure. Somewhat surprisingly, the 30 variance component model

was less satisfactory than either of the models with the same first-order neighbourhood model and 16

variance components.

In Table 5.3 we examine various fixed effects models while using a common random component

specification. All models in this table use a homogeneous random component, and a CAR specification

with a maximum of 4 first-order neighbours in the same depth layer, and the CAR model for each depth

having a differing spatial variance.

1. The saturated model with 9 treatments × 15 depth terms;

2. Three orthogonal polynomial models of (a) degree 6, (b) degree 8, and (c) degree 10;

3. Three linear spline models with (a) 4 internal knots, (b) 4 internal knots, and the assumption of

errors in the depth measurement, and (c) 5 internal knots, and the assumption of errors in the depth

measurement;

4. Two cubic radial bases models with 5 internal knots and (a) no assumption of error in the depth

measurement, and (b) the assumption of errors in the depth measurement;

5. Cubic spline with 5 internal knots.

The three polynomial models were fitted to choose a good base model to allow comparison of the

CAR models, and show that the choice of an 8 degree polynomial model was a reasonable choice for the

comparisons of Table 5.1.


The poor fit of the saturated model (9 × 15 terms) reflects the biology of the system. Treatments be-

come increasingly irrelevant with depth, with the roots of each crop becoming unable to access moisture

at the deeper depths. When the errors-in-variables model is not fitted, the best of the orthogonal

polynomial models is that of degree 8, and the best of the spline models is that using cubic radial

bases.

The linear spline models with four and five knots and errors-in-measurement for depth, which are

roughly equivalent, provide a better fit than the simple linear spline model.

The cubic radial bases model with the 5 interior knots, and the same model with the errors-in-

measurement component provide the best of the spline basis fits. Despite the apparent lack of necessity

for the errors-in-variables modelling with these bases, the errors-in-measurement model is preferred since

this matches the known occurrence of soil shrinkage and expansion. The estimated differences between

true depth and nominal depth were effectively the same for both the linear spline model and the cubic

radial bases model.

Contrasts of interest were monitored at each depth. For the models without errors-in-measurement,

the contrasts were, as expected, sharper than those for the models with measurement error. However,

the patterns were largely the same. The major differences established between treatment groupings are

those for cropping versus pasture and for lucerne versus the perennial grasses, with cropping giving

the higher moisture values, perennial grasses the next highest values, and lucernes giving the lowest

moisture values. The differences are most marked at the shallower depths. The hoped-for difference

between response cropping and long fallowing was observed at the intermediate depths (from 160 cm to

200 cm for the polynomial model, and from 180 cm to 200 cm for the errors-in-variables models). See

Figure 5.1.
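Monitoring a contrast amounts to applying a set of weights to the treatment effects in every MCMC iteration and summarising the resulting draws. The sketch below shows this for one depth using simulated draws. The treatment order follows the legend of Figure 5.2, but the grouping of treatments 1 to 6 as "crop" and 7 to 9 as "pasture", the simulated values, and the function name are all assumptions made for illustration.

```python
import numpy as np

# Weights for "crop v pasture": mean of treatments 1-6 minus mean of 7-9.
# This grouping is an assumption based on the Figure 5.2 legend.
crop_v_pasture = np.array([1 / 6] * 6 + [-1 / 3] * 3)

def contrast_summary(treatment_draws, weights):
    """Posterior mean and 95% credible interval of a contrast at one depth.

    treatment_draws has one row per MCMC iteration and one column per treatment.
    """
    draws = treatment_draws @ weights
    lower, upper = np.percentile(draws, [2.5, 97.5])
    return draws.mean(), lower, upper

# Simulated stand-in for MCMC output: 5000 iterations, 9 treatments.
rng = np.random.default_rng(7)
hypothetical_means = np.array([0.5] * 6 + [0.2] * 3)
treatment_draws = hypothetical_means + 0.02 * rng.standard_normal((5000, 9))

estimate, lower, upper = contrast_summary(treatment_draws, crop_v_pasture)
print(f"estimate {estimate:.3f}, 95% CI ({lower:.3f}, {upper:.3f})")
```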

As expected, all fixed effects curves from the various fixed effects models with the 4 neighbour

hierarchical CAR model have wider credible intervals than those for the corresponding models with no

spatial correlation taken into account (not shown). The CAR analysis is more realistic in that spatial

correlation has been accounted for.

Figure 5.2 shows the linear spline fit for the treatment effects from the errors-in-measurement

model, again with the hierarchical 4 neighbour CAR spatial model. This graph shows great variation in

true depth where there is rapid drying of the profile. The credible intervals of Figure 5.2 also show

greater variability for the fixed component at both the shallower and deeper depths, which was

observed in all model fits. These fitted curves are essentially the same as those for the cubic radial bases

errors-in-measurement curve fit (Figure 5.3). Based on the DIC, the spline model without the errors-in-

measurements gives an inferior fit to the polynomial model when fitted without this, but an apparently

improved fit when we include the possibility of depth being measured with error. However, the cubic

radial bases fit without errors-in-measurement squanders fewer parameters, and has a lower DIC than the

other two models. Thus, if we were to judge on DIC alone, it would be the preferred model, but, given

that the errors-in-measurement model is appropriate, there is a strong case for preferring the cubic radial

bases model with errors-in-measurement.

At the greater depths, predicted moisture levels differ little between treatments. Predicted mois-

ture levels differ most markedly between treatments at the shallower depths, but again no difference is

observed between response cropping and long fallowing, at these depths.

Figure 5.4 shows the ratio of the standard deviations for the spatial neighbourhood residual compo-

nents to the standard deviation for the non-spatial variance, together with 95% credible intervals. Again

we see greater spatial variation at both the shallow and at the greater depths, with the smaller variance

components being at depths from 60 cm to 200 cm, for the various fixed models. The spatial variation is

not significantly different from the non-spatial variation at the shallower depths (from 20 cm to 140 cm),

while at the intermediate depths, from 160 cm to 200 cm, the spatial variance component is smaller than

the non-spatial variance component, with the spatial variation being larger than the non-spatial variation

at the greater depths (from 240 cm to 300 cm). This aspect of the spatial variation is consistent over the 2,

4 and 8 neighbour CAR models, and over the spline-basis and orthogonal polynomials. Clearly, the total

variation drives this, since the model with 15 homogeneous error variances and one spatial variance is

essentially equivalent to the model with 15 spatial variances and one homogeneous variance component

(Table 5.1). After fitting any fixed effects model, the total residual variation not accounted for by the fixed

model is greater at both shallower and deeper depths.

Table 5.4 gives the contrasts between treatment types at various depths under the errors-in-variables

cubic radial bases model. The contrasts are more tightly estimated under the model without errors-in-

measurement of depth. However, the significant contrasts closely parallel each other. The contrasts are

shown in Figure 5.1.


5.6 Discussion

In this paper, there were two critical issues: (1) to find an appropriate way of dealing with spatial cor-

relation, and (2) to find an appropriate model for the treatment effects. Having accomplished these two

goals, it then becomes possible to make inferences for the questions asked by the soil scientists.

The primary difficulty was the determination of the spatial model. Given that the data are point

referenced, an obvious choice of spatial model was a kriging model, such as that of Gotway and Cressie

[1990]. However, the large number of terms in the fixed part of the model made such an approach

impossible in the MCMC framework of WinBUGS. Additionally, including depth in the calculation

of distance, would have meant greater difficulties in disentangling treatment effects over depth from

spatial modelling considerations. Software such as SAS PROC MIXED [SAS Institute, 2004] offers the

possibility of both kriging and the various correlation model structures of Gilmour et al. [1997] within

a REML or ML framework, but when a model is poorly specified or very complex, PROC MIXED can

be difficult to use, and neither SAS nor the package of Gilmour et al. [2005] is freely available. We had

hoped to show that the CAR models used were comparable to the AR(1), AR(1) basis models of Gilmour

et al. [1997], Stefanova et al. [2009]. These could not be fitted within WinBUGS with the desired

complexity, but are comparable with the best CAR models of Table 5.1 (∆DIC = 257).

We used CAR models, first introduced by Besag [1974], and elaborated by Besag and coworkers

in Besag et al. [1995], Besag and Higdon [1999] for agricultural lattices. In recent work [Besag and

Mondal, 2005, Lindgren et al., 2010], CAR models have been shown to be closely related to kriging, but

whereas in kriging the highly dense covariance matrix is used, in a CAR model, a sparse precision matrix

is the basis for estimation.
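The contrast between a sparse precision matrix and a dense covariance matrix can be seen on a single horizontal layer. The sketch below is illustrative only: it assumes a 6 x 9 layout for the 54 sites and uses a proper CAR precision, Q = D - rho W, with an assumed dependence parameter rho = 0.95 so that Q can be inverted. The function name is ours.

```python
import numpy as np

def layer_adjacency(n_rows, n_cols):
    """First-order (row and column) neighbours on one horizontal layer."""
    index = np.arange(n_rows * n_cols).reshape(n_rows, n_cols)
    W = np.zeros((index.size, index.size))
    for a, b in [(index[:-1, :], index[1:, :]), (index[:, :-1], index[:, 1:])]:
        W[a.ravel(), b.ravel()] = 1.0
        W[b.ravel(), a.ravel()] = 1.0
    return W

W = layer_adjacency(6, 9)     # hypothetical 6 x 9 layout of 54 sites
rho = 0.95                    # assumed value; |rho| < 1 keeps Q invertible
Q = np.diag(W.sum(axis=1)) - rho * W   # precision: non-zero only for neighbours
Sigma = np.linalg.inv(Q)               # implied covariance: dense

nonzero_Q = np.count_nonzero(Q)
nonzero_Sigma = np.count_nonzero(np.abs(Sigma) > 1e-12)
print(nonzero_Q, "of", Q.size, "precision entries are non-zero")
print(nonzero_Sigma, "of", Sigma.size, "covariance entries are non-zero")
```

Only the diagonal and the neighbour pairs appear in Q, whereas every pair of sites has a non-zero covariance.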

Here, given that we wished to model the moisture measurements as a function of the depth, it made

good sense not to include depth in the CAR model specifications. The restriction of neighbours to the

same horizontal layer permitted the fitting of spatial residual components with differing variances, while

also avoiding the problem of the scale difference of depth when compared to row/column scale. Thus,

we were able to see (Table 5.1) that a model with 15 spatial variance components and one homogeneous

component (∆DIC = 300) was roughly equivalent to the model having 15 homogeneous components

and one spatial variance (∆DIC = 270), with both being better than the model which allowed both sets

of variance components to vary by depth (∆DIC = 76).
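As a hedged sketch, the preferred layered model can be written as below, assuming an intrinsic CAR prior with unit weights within each depth layer. The symbols $y_{sd}$ (moisture at site $s$ and depth $d$), $\mu_{sd}$ (fixed treatment effect), $u_{sd}$, $e_{sd}$, $n_s$ (number of same-layer neighbours of site $s$) and $\partial s$ (the set of those neighbours) are notation introduced here for illustration; $\sigma_d^2$ and $\tau^2$ are the variance components of Table 5.1.

```latex
\begin{align*}
y_{sd} &= \mu_{sd} + u_{sd} + e_{sd}, \qquad e_{sd} \sim N(0, \tau^2), \\
u_{sd} \mid u_{jd},\; j \in \partial s &\sim
  N\!\left( \frac{1}{n_s} \sum_{j \in \partial s} u_{jd},\; \frac{\sigma_d^2}{n_s} \right),
  \qquad d = 1, \dots, 15 .
\end{align*}
```

Under the same assumptions, a single $\sigma^2$ with depth-specific $\tau_d^2$ gives the alternative marked * in Table 5.1, and allowing both to vary by depth gives the model marked **.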



Interestingly, the inclusion of depth neighbours in a CAR first-order neighbourhood model, with a

single spatial variance component (σ²), and 15 homogeneous variance components (τ²_d), led to a poorer

model, leading us to believe that where measurements are made in 3 dimensions and taken at very differ-

ent scales, autocorrelations should be modelled separately. In future work, we would like to model depth

dimension autocorrelations with an AR(1) or ARIMA type model. However, here, with the use of an

errors-in-measurement model for the treatment effects, it would seem that the larger part of any residual

depth autocorrelation has been dealt with, while the failure to deal with rows and columns separately is

not grave.

A disadvantage of the CAR modelling choice was that within the WinBUGS framework, weights

must be chosen a priori and not estimated as in Besag and Higdon [1999], Besag and Mondal [2005].

The lattice framework, so typically found in agricultural data, needs an anisotropic treatment such as

that found in the Besag models already cited and in the models of Gilmour et al. [1997], Stefanova et al.

[2009].

We chose to model within a Bayesian framework for a number of reasons. The CAR models, readily

available in WinBUGS, were more flexible than potentially equivalent kriging models and other con-

tinuous space models available for point referenced data. Additionally, the WinBUGS framework for

MCMC analysis is both transparent and accessible to analysts of all skill levels. Analysis proceeds by

formulating the model as a set of conditional distributions and simulating realisations directly from the

posterior distributions of the parameters. Moreover, once the model has converged to the stationary dis-

tribution, most quantities of interest may be estimated. For example, the ratio of the square root of the

spatial variances to the overall non-spatial variance may be calculated in each MCMC iteration and the

samples monitored to find 95% posterior credible intervals.
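The calculation described here is straightforward once posterior draws are available. The following sketch uses simulated draws in place of real MCMC output; the gamma distributions, the array names and the function name are assumptions made purely for illustration.

```python
import numpy as np

def ratio_credible_interval(sigma2_draws, tau2_draws, level=0.95):
    """Per-depth credible interval for sqrt(spatial variance) / sqrt(non-spatial variance).

    sigma2_draws: array (n_iterations, n_depths) of spatial variance draws.
    tau2_draws:   array (n_iterations,) of non-spatial variance draws.
    """
    # The ratio is formed within each iteration, then summarised over iterations.
    ratio = np.sqrt(sigma2_draws) / np.sqrt(tau2_draws)[:, None]
    tail = 100 * (1 - level) / 2
    lower, upper = np.percentile(ratio, [tail, 100 - tail], axis=0)
    return lower, np.median(ratio, axis=0), upper

# Simulated stand-in for MCMC output: 4000 iterations, 15 depths.
rng = np.random.default_rng(1)
sigma2_draws = rng.gamma(shape=20.0, scale=0.0005, size=(4000, 15))
tau2_draws = rng.gamma(shape=50.0, scale=0.0002, size=4000)

lower, median, upper = ratio_credible_interval(sigma2_draws, tau2_draws)
for d, (lo, mid, hi) in enumerate(zip(lower, median, upper), start=1):
    print(f"depth {20 * d:3d} cm: {mid:.2f} ({lo:.2f}, {hi:.2f})")
```

An interval lying entirely above or below 1 would indicate spatial variation larger or smaller than the non-spatial variation at that depth.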

Having accounted for spatial variation via CAR modelling, the concern was to choose a treatments

effect model and estimate treatment differences. The polynomial models were useful as a base compar-

ison, since they had been shown to have minimally autocorrelated residuals along the depth dimension,

and treatment effects could have been fitted adequately using the 8 degree polynomial, the linear spline

with depth measured with error, or the cubic radial bases model (with or without error). One of the

strengths of the WinBUGS framework was that it was possible to fit an errors-in-variables model, and

there are two good reasons for fitting errors-in-variables models. Firstly, it is untenable in most regression

frameworks to believe that the response variable is measured with error, while the explanatory variable

is not. Secondly, and possibly more importantly in this instance, in an earlier report [Ringrose-Voase

et al., 2003] the researchers were applying a complex formula to the measured depth in order to find the

true depth. Thus, this ability to model true depth was a useful extension of the model.

Table 5.3 shows the near equivalence of a number of competing treatment models. The treatment

contrasts shown in Table 5.4 and Figure 5.1 are those from the cubic radial bases with errors-in-variables

model, and do not differ significantly from the corresponding graphs and tables for the linear-spline

model with errors in variables. There are significant differences between cropping and pastures, while the

contrast of interest, between response cropping and long fallow rotation, is observed at the intermediate

depths. However, Figure 5.1 shows sufficient difference between the two types of cropping in the critical

part of the profile, for response cropping to be recommended should such a difference be repeated in

further data. The differences are in the mid-depth range where moisture uptake is needed to prevent

salination.

For some time the methods of Besag [1974] for analysing spatially correlated data have been avail-

able via the freely available software, WinBUGS, and many papers have been written using conditional

auto-regressive (CAR) models to smooth spatial data, particularly in the field of spatial epidemiology, see,

for example, Bernardinelli et al. [1995], Clements et al. [2008], Earnest et al. [2010], Elliott [2000]. How-

ever, few authors analysing agricultural lattice data have chosen to use Markov Random Field methodol-

ogy where the data are point-referenced. An early paper promoting CAR methods for lattice plots was

Besag et al. [1995] which analysed strawberry data in a lattice plot.

Other methods for agricultural spatially correlated data include those of Cullis and Gleeson [1991]

and Cullis et al. [1989], which use ARIMA models to account for spatial autocorrelation and model the

variance components using REML. In a later version of this approach, Gilmour et al. [1997] fit a complete

blocks model and AR(1), AR(1) models as a starting point for their REML modelling, and look at kriging

graphs on the residuals to determine how the data may be better modelled by the introduction of further

‘global’ extraneous random effects. The general consensus of these various authors is that agricultural

lattice data should be dealt with anisotropically. This could not be done within the framework we used.

However, we believe that the complex modelling shown here may illustrate a general truth, that where

lattice points on a 3-dimensional grid are far from equally spaced, the data need to be considered via an

approach resembling that used here with layering being possible where the measurements are roughly

equally spaced.



In summary, this paper has extended the usual CAR modelling with a single spatial neighbourhood

matrix to a hierarchical CAR model with 15 different spatial variance components. It has also demon-

strated the richness available in modelling within a Bayesian framework, by combining this more com-

plex CAR spatial modelling framework with fixed effects treatment models with many terms, and more

importantly, an errors-in-variables model. We hope that this demonstration of the flexibility of CAR

models and their ease of fitting, together with the simplicity of fitting complex fixed effects models may

lead to greater use of CAR models and of Bayesian modelling in agricultural research.

Acknowledgments

This research was supported by the ARC Centre of Excellence in Complex Dynamic Systems and Con-

trol, and by QUT.

We thank Dr Alison Bowman of NSW Industry and Investment for her interest in, and support of

this work. We thank, too, Professor Matt Wand of the University of Wollongong whose generosity with

his understanding of non-parametric modelling led to the ‘semi-parametric’ modelling of the data, and

hence, to the errors-in-variables model.

This paper is dedicated to the memory of Julian Besag, a pioneer in this field of research, a teacher

and a friend.


5.7 Tables

Table 5.1 Comparing spatial neighbourhood modelling. Treatment effects model is identical for all models (Orthogonal polynomial degree 8). Models have 15 spatial variance components (σ²_d), and one homogeneous variance component (τ²), except where otherwise stated.

Description                                      pD     DIC   ∆DIC
Base model: No spatial component                 81   -2690      -
Linear CAR (maximum 2 horiz neighbours)         264   -2811    121
CAR (maximum 4 horiz neighbours)                358   -2990    300 †
CAR (maximum 8 horiz neighbours)                320   -2930    240
AR(1), AR(1)                                    945   -2947    257
CAR (maximum 4 horiz neighbours, 2 depth)*      109   -2752     62
CAR (maximum 4 horiz neighbours)*               110   -2960    270
CAR (maximum 4 horiz neighbours)**              121   -2766     76

∆DIC = DIC(Base Model) − DIC.
† indicates the favoured neighbourhood model.
* Models with 1 spatial variance & 15 non-spatial variance components.
** Model with 15 spatial & 15 non-spatial variance components.



Table 5.2 Values of Moran’s I for each depth layer. A normal approximation is used for testing significance.

Depth   Moran’s I   Prob
 20      -0.062     0.4730
 40      -0.063     0.4655
 60       0.210     0.0028 *
 80       0.204     0.0037 *
100       0.115     0.0907 *
120       0.125     0.0668 *
140       0.139     0.0434 *
160       0.107     0.1135
180      -0.002     0.9199
200       0.069     0.2844
220       0.189     0.0068 *
240       0.407     <.0001 *
260       0.242     0.0006 *
280       0.147     0.0329 *
300       0.176     0.0118 *

* Significant at α=10%


Table 5.3 Comparing Fixed Effects modelling. Random components for all models are given by 4 neighbour CAR with 15 depth variances (σ²_d), and one homogeneous variance component (τ²).

Deg/Knots   No. Terms   Type                              pD     DIC   ∆DIC
              135       Saturated Model (9 × 15 terms)   809   -2319      -

    6          63       Orthogonal poly                  297   -2970    651
    8          81                                        358   -2990    671 †
   10          99                                        371   -2967    648

    4          54       Linear Spline                    318   -2923    604
    4          54         (+error in depth)              369   -3002    683 †
    5          63         (+error in depth)              401   -2999    680 †

    5          81       Cubic radial bases               327   -2954    635
    5          81         (+error in depth)              368   -3013    694 †

    5          81       Cubic Spline                     257   -2769    450

† indicates the best fixed effects model of its type.
No. Terms is the number of fitted fixed effects terms.
pD, DIC given for the moisture value to allow comparison.
∆DIC: DIC(saturated model) − DIC.



Table 5.4 Contrasts at nominal depths: Cubic radial bases model where depth is measured with error.

Contrast                              Depth     Est        95% CI         Prob
Long Fallow v Opportunity cropping     180     0.033    0.004   0.063    0.0246
                                       200     0.033    0.002   0.066    0.0357

Crop v pasture                          20     0.440    0.401   0.484    <.0001
                                        40     0.372    0.346   0.399    <.0001
                                        60     0.314    0.286   0.340    <.0001
                                        80     0.270    0.244   0.294    <.0001
                                       100     0.236    0.212   0.261    <.0001
                                       120     0.202    0.178   0.228    <.0001
                                       140     0.164    0.142   0.188    <.0001
                                       160     0.130    0.106   0.154    <.0001
                                       180     0.106    0.082   0.129    <.0001
                                       200     0.093    0.069   0.118    <.0001
                                       220     0.090    0.065   0.116    <.0001
                                       240     0.092    0.065   0.120    <.0001
                                       260     0.092    0.062   0.122    <.0001
                                       280     0.085    0.050   0.119    <.0001
                                       300     0.071    0.009   0.127    0.0279

Lucernes v native pasture               20    -0.305   -0.369  -0.245    <.0001
                                        40    -0.301   -0.343  -0.254    <.0001
                                        60    -0.290   -0.333  -0.239    <.0001
                                        80    -0.273   -0.316  -0.228    <.0001
                                       100    -0.243   -0.290  -0.202    <.0001
                                       120    -0.188   -0.237  -0.146    <.0001
                                       140    -0.115   -0.155  -0.072    <.0001
                                       160    -0.054   -0.092  -0.006    0.0322

Only contrasts with CIs not containing zero shown.


5.8 Figures

Figure 5.1 95% credible intervals for the contrast differences based on the cubic radial bases model with errors-in-measurement (graphed where the 95% CI did not cover zero). The lines with the widest tops and tails show “Long Fallow - Response Cropping”, with the thinnest “Lucerne - Native Pastures”, and those with medium width “Crop - Pasture”. Horizontal axis: Depth (cm), 0 to 300.



Figure 5.2 Fixed effects curves for errors-in-variables model: Linear spline treatment effects & 95% credible intervals, CAR model, sites 1-54. The true depths are those implied by the errors-in-measurement model. For each treatment there are 6 sites, each with the same treatment curve. Horizontal axis: Estimated true depth (cm). Legend: 1. Long Fallow (1); 2. Long Fallow (2); 3. Long Fallow (3); 4. Continuous; 5. Response Cropping (1); 6. Response Cropping (2); 7. Pasture (lucerne-1); 8. Pasture (lucerne-2); 9. Pasture (native).


Figure 5.3 Fixed effects curves for errors-in-variables model: Cubic radial bases model showing estimates at the nominal depth. Depth has been jittered to allow credible intervals to be seen.



Figure 5.4 95% CI for the ratio of square root of the spatial variance to that of the non-spatial variance at the fifteen depths: Cubic radial bases model with errors-in-measurement for depth.


Bibliography

Banerjee, S., B. P. Carlin, and A. E. Gelfand (2004). Hierarchical modeling and analysis for spatial data.

Monographs on statistics and applied probability. Boca Raton, London, New York, Washington D.C.:

Chapman & Hall.

Bernardinelli, L., D. Clayton, C. Pascutto, C. Montomoli, M. Ghislandi, and M. Songini (1995). Bayesian

analysis of space-time variation in disease risk. Statistics in Medicine 14(21-22), 2433–2443.


R. Statist. Soc. B 36(2), 192–236.

Besag, J. E., P. Green, D. Higdon, and K. Mengersen (1995). Bayesian computation and stochastic

systems. Statistical Science 10(1), 3–41.

Besag, J. E. and D. Higdon (1999). Bayesian analysis of agricultural field experiments. Journal of the

Royal Statistical Society Series B-Statistical Methodology 61, 691–717. Part 4.

Besag, J. E. and C. Kooperberg (1995). On conditional and intrinsic autoregressions. Biometrika 82(4),

733–746.


Biometrika 92(4), 909–920.



Broughton, A. (1994). Mooki River Catchment hydrogeological investigation and dryland salinity studies

- Liverpool Plains, TS94.026. Technical report, New South Wales Department of Water Resources.

Clements, A. C., A. Garba, M. Sacko, S. Tour, R. Dembel, A. Landour, E. Bosque-Oliva, A. F. Gabrielli,

and A. Fenwick (2008). Mapping the probability of Schistosomiasis and associated uncertainty, West

Africa. Emerging Infectious Diseases 14(10), 1629–1632.





Cullis, B. R. and A. C. Gleeson (1991). Spatial analysis of field experiments-an extension to two dimen-

sions. Biometrics 47, 1449–1460.

Cullis, B. R., W. J. Lill, J. A. Fisher, B. J. Read, and A. C. Gleeson (1989). A new procedure for the

analysis of early generation variety trials. Journal of the Royal Statistical Society Series C Applied

Statistics 38(2), 361–375.

Daniells, I. G., J. F. Holland, R. R. Young, C. L. Alston, and A. L. Bernardi (2001). Relationship between

yield of grain sorghum (Sorghum bicolor) and soil salinity under field conditions. Australian Journal

of Experimental Agriculture 41, 211–217.

Earnest, A., J. R. Beard, G. Morgan, D. Lincoln, R. Summerhayes, D. Donoghue, T. Dunn, D. Muscatello,

and K. Mengersen (2010). Small area estimation of sparse disease counts using shared component

models-application to birth defect registry data in New South Wales, Australia. Health & Place 16,

684–693.

Elliott, P. (2000). Spatial epidemiology : methods and applications. Oxford medical publications. Ox-

ford: Oxford University Press.




269–293.

Gilmour, A. R., B. J. Gogel, B. R. Cullis, and R. Thompson (2005). ASReml User Guide Release 2.0.

Technical report, VSN International Ltd, Hemel Hempstead, UK.

Gotway, C. A. and N. A. C. Cressie (1990). A spatial analysis of variance applied to soil-water infiltration.

Water resources research 26(11), 2695–2703.

Lindgren, F., H. Rue, and J. Lindstrom (2010). An explicit link between Gaussian fields and Gaussian

Markov random fields: The SPDE approach. Journal of the Royal Statistical Society Series B, to

appear.

Marley, J. K. and M. P. Wand (2010). Non-standard semiparametric regression via BRugs. Journal of

Statistical Software 37(5), 1–30.


Ngo, L. and M. Wand (2004). Smoothing with mixed model software. Journal of Statistical Software 9,

1–56.






Rue, H. and L. Held (2005). Gaussian Markov random fields : theory and applications. Boca Raton:

Chapman & Hall/CRC.


Spiegelhalter, D. J., N. G. Best, B. P. Carlin, and A. van der Linde (2002). Bayesian measures of model

complexity and fit. Journal of the Royal Statistical Society. Series B (Statistical Methodology) 64(4),

583–639.

Stefanova, K. T., A. B. Smith, and B. R. Cullis (2009). Enhanced diagnostics for the spatial analysis of

field trials. Journal of Agricultural Biological and Environmental Statistics 14(4), 392–410.



Statement of Contribution of Co-Authors for Thesis by Publication

The authors listed below certify that:

1. they meet the criteria for authorship, in that they have participated in the conception, execution, or interpretation, of at least that part of the publication in their field of expertise;

2. they take public responsibility for their part of the publication, except for the responsible author who accepts overall responsibility for the publication;

3. there are no other authors of the publication according to these criteria;

4. potential conflicts of interest have been disclosed to granting bodies, the editor or publisher of journals or other publications, and the head of the responsible academic unit; and

5. they agree to the use of the publication in the student's thesis and its publication on the Australasian Digital Thesis database consistent with any limitations set by publisher requirements.

In the case of Chapter 6:

Title: Comparison of three dimensional profiles over time

Journal: Journal of Applied Statistics

Status: Submitted December 2010

Contributor: Statement of contribution

Margaret Donald: As first author, was responsible for the concept of the paper, data analysis, interpretation and the writing of all drafts, as well as the final version. Determined the final CAR model to be [illegible], thus contributing to the final [illegible] of the Gibbs sampler for the CAR layered model. Wrote its mathematical description and that of the full model. Programmed the DIC calculations and the neighbourhood matrices.

Dr Clair Alston: Was responsible for advice on measurements and their meaning, and editorial comment.

Dr Chris Strickland: Programmed and developed the Gibbs sampler for the CAR layer model in pyMCMC and supervised Margaret's description of it, in addition to supervising the mathematical description of the model.

Rick Young: Was responsible for advice on the purpose and background to the field trial, advice on the [illegible] of statistical results, and editorial comment.

Professor Kerrie Mengersen: Was responsible for general advice and editorial comment.

I have sighted email or other correspondence from all co-authors confirming their certifying authorship.

Name          Signature          Date

Chapter 6

Comparison of three dimensional

profiles over time

Preamble

This chapter addresses research objective (4) and fits a model to several days' data to allow some of the

complexities of modelling the full dataset to be explored. This chapter uses Gaussian Markov Random

Fields and their sparse matrix representation to allow efficient block updating of the spatial residual

components. Banerjee et al. [2004] show the pointwise conditional CAR specifications lead to a global

spatial specification via Brook [1964]. Rue and Held [2005] show that the pointwise specification is

equivalent to a global Gaussian Markov Random Field specification. Here, unlike Rue and Held [2005]

and Martino and Rue [2008], who divide the nodes into two disjoint sets, and update the one conditional

on the other, we update the full set of spatial components as a block using the Krylov subspace methods

of Simpson et al. [2008], Strickland et al. [2010].
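To make the idea of a block update concrete, the sketch below draws all components of a small Gaussian Markov random field at once from its precision matrix. Note that it uses a dense Cholesky factorisation in place of the Krylov subspace methods used in this chapter, purely because it is short and easy to check. The five-node chain graph, the value rho = 0.9 and the function name `sample_gmrf` are assumptions made for illustration.

```python
import numpy as np

def sample_gmrf(Q, rng, n_draws=1):
    """Draw from N(0, Q^{-1}) in one block using the Cholesky factor of Q."""
    L = np.linalg.cholesky(Q)                     # Q = L L^T
    z = rng.standard_normal((Q.shape[0], n_draws))
    return np.linalg.solve(L.T, z)                # x = L^{-T} z has covariance Q^{-1}

# Toy precision matrix: proper CAR on a chain of five sites.
W = np.diag(np.ones(4), k=1) + np.diag(np.ones(4), k=-1)
rho = 0.9                                          # assumed value; |rho| < 1
Q = np.diag(W.sum(axis=1)) - rho * W

rng = np.random.default_rng(42)
draws = sample_gmrf(Q, rng, n_draws=200_000)

# The empirical covariance of the draws should be close to the inverse of Q.
max_error = np.abs(np.cov(draws) - np.linalg.inv(Q)).max()
print(f"largest covariance discrepancy: {max_error:.3f}")
```

For the large sparse precision matrices that arise with many sites, depths and days, a sparse factorisation or an iterative method would replace the dense calls used here.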

There are currently two programs using the GMRF framework for fitting spatial models with block

updating. INLA, developed by Rue and collaborators, fits into an R framework and is a reasonably

transparent framework for fitting models of the kind discussed here. BayesX, developed by Belitz et al.

[2009a,b] provides a framework for fitting additive models, together with CAR spatial priors and is

somewhat easier to use. However, having fitted the complex models of Chapter 5.2, I wished to fit

similar models but for larger datasets. The software of BayesX and INLA proved somewhat difficult


to use with the complex CAR models which seemed to be needed. (For further discussion of this see

Chapter 7.1.)

Choosing to fit the desired models in pyMCMC [Strickland, 2010], a purpose-built MCMC frame-

work developed in Python, allowed considerably greater insight into the fitted models.

Appendix B gives tables for the differences in the various random components over time and

over depths, together with differences in slopes in the last linear segment of the depth curves for each

treatment and day, and contrast estimates. Contour curves for the spatial random errors for each day and

depth (75 graphs) are also given and show considerable continuity across the depths within a day.

I am the principal author and the paper is reprinted here with its abstract, but with different biblio-

graphic conventions from the Journal of Applied Statistics, to which it has been submitted. Rick Young

provided the data, helped with agricultural interpretations, in addition to providing editorial comment.

Chris Strickland programmed and developed the Gibbs sampler for the CAR layer model in pyMCMC

and checked my description of it given in Appendix 6.7, in addition to clarifying and verifying the

mathematical description of the model given in Section 6.4.1. My contribution to the development of the

sampler was to direct and describe what was needed. I also programmed the neighbourhood matrix and

DIC calculations, and wrote the descriptions of the model and the Gibbs sampler. Clair Alston provided

major editorial advice in addition to advice on the collection and meaning of the data. Kerrie Mengersen

oversaw, helped with, and guided the exposition. As first author, I was responsible for the concept of the

paper, the choice of CAR model, the data analysis, interpretation and the writing of all drafts as well as

the final version.

Title: Comparison of three dimensional profiles over time

Authors: Margaret Donald^a, Chris Strickland^a, Clair Alston^a, Rick Young^b, Kerrie Mengersen^a.




2340, Australia.


Bibliography



Chapman & Hall.

Belitz, C., A. Brezger, T. Kneib, and S. Lang (2009a). Bayesx Software for Bayesian Infer-


http://www.stat.uni-muenchen.de/˜bayesx/bayesx.html. Accessed: October 25, 2010.

Belitz, C., A. Brezger, T. Kneib, and S. Lang (2009b). Bayesx Software for Bayesian Inference in

Structured Additive Regression Models Version 2.0.1 Software Methodology Manual. Online at


Brook, D. (1964). On the distinction between the conditional probability and the joint probability ap-

proaches in the specification of nearest-neighbour systems. Biometrika 51(3-4), 481.

Martino, S. and H. Rue (2008). Implementing approximate Bayesian inference using Integrated Nested

Laplace Approximation: A manual for the INLA program. Citeseer.


Chapman & Hall/CRC.

Simpson, D. P., I. W. Turner, and A. N. Pettitt (2008). Fast sampling from a Gaussian Markov random

field using Krylov subspace approaches. QUT Eprints 14376 (Brisbane), 1–17. Available online:

http://eprints.qut.edu.au.

Strickland, C. (2010). pyMCMC: a statistical package for Bayesian MCMC analysis. Journal of Com-

putational and Graphical Statistics, 1–46. submitted August, 2010.

Strickland, C. M., D. P. Simpson, I. W. Turner, R. Denham, and K. L. Mengersen (2010). Fast Bayesian

analysis of spatial dynamic factor models for multi-temporal remotely sensed imagery.

6.1 Comparison of three dimensional profiles over time

Abstract

We describe an analysis for data collected on a three-dimensional spatial lattice with treatments applied at

the horizontal lattice points. Spatial correlation is accounted for using a conditional autoregressive (CAR)

model. Observations are defined as neighbours only if they are at the same nominal depth. This allows

the corresponding variance components to vary by depth. We use Markov Chain Monte Carlo (MCMC)

with block updating, together with Krylov subspace methods for efficient estimation of the model. The

method is applicable to both regular and irregular horizontal lattices and hence to data collected at any

set of horizontal sites for a set of depths or heights, for example, water column or soil profile data.

The model for the three-dimensional data is applied to agricultural trial data for five separate days taken

roughly six months apart, in order to determine possible relationships over time. The purpose of the trial

is to determine a form of cropping that leads to less moist soils in the root zone and beyond. We estimate

moisture for each date, depth and treatment accounting for spatial correlation and determine relationships

of these and other parameters over time.

keywords Bayesian; Conditional Autoregressive (CAR) models; depth profiles; Field trial; Linear

spline; Markov Chain Monte-Carlo (MCMC); Gaussian Markov random field (GMRF); Spatial autocor-

relation; Variance components.

6.2 Introduction

Despite numerous papers examining crop rotations and field experiments conducted over lengthy periods

of time, it remains a difficult problem to analyse such data satisfactorily, and in this case, the problem is

not that of a response measured over a surface, but that of a response measured over three dimensions

of the field. We describe field trial data from a long term crop rotation trial, conducted to determine

a cropping system which would maximise the use of stored water in the soil, and minimise the risk of

water leakage leaching the soil of its salts and endangering long-term agriculture on the Liverpool Plains

in New South Wales, Australia.

These data pose several problems: how to describe the treatment effect, how to account for spatial


autocorrelation, how to account for spatial correlation over depths, and what might be an appropriate

model over time.

Preliminary analyses for one date’s data indicate that CAR models [Besag and Higdon, 1999, Be-

sag et al., 1991] describe the local spatial autocorrelations well. The choice of CAR models is further

discussed in Section 6.6. We use an identical model structure for each date to analyse five dates of soil

moisture measurements taken six months apart over a period of two years. In using the same model,

we wish to determine which parameters are constant over the different dates, and which are not, in an

exploration of the data prior to fitting a time-space model, with the final purpose of data analysis being

to determine the best cropping system of those considered.

The model for the treatment effect assumes that each treatment determines a depth profile curve for

each date. These treatment effects are modelled as continuous curves along the depth dimension, with

different curves for each date and treatment. We use linear splines to model treatment effect over depth,

as this allows trend comparisons over segments of the curve.

We present a computationally efficient methodology for fitting spatially correlated agricultural data
that are three dimensional over space.

Section 6.3 describes the data used in the case-study. Section 6.4 describes the model and the com-

putational framework for its estimation. Section 6.4 also describes the methods used for comparisons

of contrasts between and within dates. Section 6.5 provides the results of the case-study. Section 6.6

provides a discussion of the methods and the results.

6.3 Case Study

The four dimensional data used in this case-study consist of moisture observations taken at 108 surface

treatment sites, 15 depths and over 5 different dates during a two-year period. The 108 measurement

sites are arranged as 6 rows with 18 columns per row. Hence, data at each time point consist of 1620

measurements at 108 sites over 15 depths.

The purpose of the field trial is to determine a cropping system which leads to lower moisture values

in the soils, in order to minimise the risk of deep drainage. More complete details of the trial may be

found in Ringrose-Voase et al. [2003]. Nine treatments are considered. These fall into three groups, long

fallow cropping, response cropping and pasture treatments.

The primary question of interest to crop scientists is whether response cropping gives lower moisture



values both at the intermediate and greater depths, in comparison with long fallow, and whether this

is sustained over different stages of the cropping cycle. Subsidiary questions addressed here are the

comparison of cropping treatments with pasture treatments, and the comparison of the lucerne pasture

mixtures with the native grass pasture.

The concern in this paper is to establish whether the various components of the model vary from date

to date, and to determine the cropping system best suited to the land.

The treatments are

1. Treatments 1-3: Long fallow wheat/sorghum rotation, where one wheat and one sorghum crop are

grown in three years with an intervening 10-14 month fallow period. The 3 treatments are each of

3 phases of the long fallow 3 strip system.

2. Treatment 4: Continuous cropping in winter with wheat and barley grown alternately.

3. Treatments 5 and 6: Response cropping, where an appropriate crop (either a winter or a summer

crop) is planted when the depth of moist soil exceeds a predetermined level.

4. Treatments 7-9: Perennial pastures. The three treatments are lucerne (a deep rooted perennial

forage legume with high water use potential), lucerne grown with a winter growing perennial

grass, and a mixture of winter and summer growing perennial grasses.

The dates of the observations are almost equally spaced over two years (July 23, 1997, December 4,

1997, April 28, 1998, September 23, 1998, and February 25, 1999).

6.4 Methods

6.4.1 Model

The model describes data collected over the three spatial dimensions, in particular, over a three dimen-

sional lattice, with the response variable arising from an experimental treatment applied at lattice points

on the horizontal plane. The model consists of a regression or fixed effect component, a spatial component

and an irregular component or residual error. For the spatial component we consider both non-stationary

and stationary spatial processes. The regression component is specified via the treatments (determined

by the horizontal planar spatial locations), and a set of basis functions over the third spatial dimension,

depth. As in the case of two-dimensional spatial models, we enumerate the horizontal spatial locations in


a particular order which determines the spatial neighbourhood matrix. To simplify the definition of the

matrices associated with the variances and precisions, the (n × 1) vector of observations, y, is arranged

as D× S observations, where D is the number of lattice points in the depth dimension, and S the number

of spatial locations in the horizontal plane.

The model for y is as follows

y = Xβ + ψ + ϵ, (6.1)

where X is an (n × p) design matrix, β is a (p × 1) vector of regression coefficients, ψ is an (n × 1) vector

that models spatial correlation at each depth and ϵ is an (n × 1) residual vector that is homogeneous

within each depth. The design matrix, X, models the treatment effects as continuous functions of depth

for each treatment. The spatial covariance is modelled using a Gaussian Markov random field (GMRF).

Stationary and non-stationary covariance structures are considered. A proper conditional autoregressive

(CAR) prior [Gelfand and Vounatsou, 2003] is used for the stationary case, while an intrinsic CAR prior

[Besag et al., 1991, Rue and Held, 2005] is used in the non-stationary case. The spatial variation for ψ is

captured either through a proper prior on ψ in the stationary case such that

$$\psi \sim N\left(0, \Omega(\rho, \tau)^{-1}\right),$$

or in the non-stationary case, through an improper prior

$$\psi \sim N\left(0, \Omega(\tau)^{-1}\right).$$

Points on the lattice are defined as neighbours only if they lie in the same horizontal layer. The precision
matrices, Ω(ρ, τ) and Ω(τ), are (n × n) block diagonal matrices that depend on the horizontal
neighbourhood structure, the (D × 1) vector of scaling coefficients, τ², and, in the stationary case, a spatial
dependence parameter ρ, where |ρ| < 1. In the non-stationary case ρ is not required. The block diagonal
structure permits D separate scaling coefficients, τ², that model differing variances at each depth for the
spatial components.

The error, ϵ, is an n × 1 vector, that is defined such that

ϵ ∼ N (0,Σ (σ)) ,



where Σ(σ) is an (n × n) diagonal covariance matrix that is a function of a (D × 1) vector σ² that allows
heterogeneity across depths in the non-spatial random component. The variance, Σ, is defined as

$$\Sigma = \mathrm{diag}\left(\sigma_1^2 I, \sigma_2^2 I, \ldots, \sigma_D^2 I\right),$$

where I is the S × S identity matrix. This structure arises from the ordering of y by depth and then by
spatial site. The residuals are modelled as having differing variances at each depth.

Two variables, depth and treatment, are used to describe the fixed effects. Here, depth is treated

as a continuous variable, and a set of basis functions is formed from it in order to fit splines. For the

case-study, the basis functions are linear splines, but other basis functions may be used [Ngo and Wand,

2004]. Treatment is a categorical variable, with T levels. The design matrix, X may be expressed as

X = A ⊗ B,

where A is a D×k matrix of k depth basis functions, ⊗ is the Kronecker product and B is an S ×T matrix

that matches the S horizontal sites with the appropriate set of T dummy variables for the site treatments.

The Kronecker product gives X as an n × p matrix with n = D × S and p = k × T .

The linear splines are defined as $z_j(d) = (d - \kappa_j)_+$, where

$$(d - \kappa_j)_+ = \begin{cases} d - \kappa_j, & d \ge \kappa_j, \\ 0, & d < \kappa_j, \end{cases}$$

for some knot sequence $\kappa_1, \ldots, \kappa_{k-2}$, and d = 1, 2, ..., D. The basis functions in A for each d are
$[1, d, z_1(d), z_2(d), \ldots, z_{k-2}(d)]$ [Ngo and Wand, 2004]. For the linear splines used in the case-study
of this paper, the number of basis functions, k, is the number of internal knots plus 2.
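As a minimal sketch of how A, B and X might be assembled with NumPy (an illustration under stated assumptions, not the thesis code): the helper name `linear_spline_basis` is hypothetical, and the cyclic allocation of treatments to sites is a placeholder because the field layout is not given here. The knots are the three-knot choice at depth indices 4, 7 and 10.

```python
import numpy as np

def linear_spline_basis(depths, knots):
    """Columns [1, d, (d - kappa_1)_+, ..., (d - kappa_{k-2})_+]."""
    d = np.asarray(depths, dtype=float)
    kappa = np.asarray(knots, dtype=float)
    hinge = np.clip(d[:, None] - kappa[None, :], 0.0, None)  # truncated lines
    return np.column_stack([np.ones_like(d), d, hinge])

D, S, T = 15, 108, 9                     # depths, sites, treatments (case study)
A = linear_spline_basis(np.arange(1, D + 1), knots=[4, 7, 10])   # D x k, k = 5

# Hypothetical treatment allocation: the real field layout is not given here.
treatment_of_site = np.arange(S) % T
B = np.eye(T)[treatment_of_site]         # S x T matrix of treatment dummies

# Rows of X are ordered by depth and then by site, matching the ordering of y.
X = np.kron(A, B)                        # (D*S) x (k*T)
```

With three internal knots k = 5, so X has 5 × 9 = 45 columns; the five-knot spline would give 7 × 9 = 63.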

For the stationary CAR prior the precision matrix is

$$\Omega(\rho, \tau) = \text{block diagonal}\left(Q/\tau_1^2, Q/\tau_2^2, \ldots, Q/\tau_D^2\right),$$

with τ² a (D × 1) vector of scaling coefficients permitting different variances at the D different depths, and
Q an S × S first order neighbourhood precision matrix common to each depth layer, and

$$Q = M - \rho W.$$

The neighbourhood matrix, W, is defined such that

$$w_{ij} = \begin{cases} 0, & i = j, \\ 1, & i \sim j \ (i, j \text{ are neighbours}), \\ 0, & \text{otherwise}, \end{cases}$$

and M is given by

$$M = \mathrm{diag}(n_1, n_2, \ldots, n_S),$$

where $n_i$ is the number of neighbours of site i. See Gelfand and Vounatsou [2003].

In the depth layered scheme used here the non-stationary CAR prior is defined as

$$\Omega(\tau) = \text{block diagonal}\left(R/\tau_1^2, R/\tau_2^2, \ldots, R/\tau_D^2\right),$$

with R an S × S first order neighbourhood precision matrix whose elements $r_{ij}$ are specified by

$$r_{ij} = \begin{cases} n_i, & i = j, \\ -1, & i \sim j, \\ 0, & \text{otherwise}, \end{cases}$$

where $n_i$ is the number of neighbours for site i [Rue and Held, 2005].
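The two precision structures can be sketched with SciPy sparse matrices. This is an illustration only: the helper `rook_adjacency`, the value of ρ and the scaling coefficients are assumptions, the 6 × 18 lattice follows the case study, and neighbours carry (1,0) weights so that the off-diagonal entries of M − ρW are negative for positive ρ.

```python
import numpy as np
import scipy.sparse as sp

def rook_adjacency(n_rows, n_cols):
    """1/0 first order adjacency matrix for sites numbered row by row."""
    along_row = sp.kron(sp.identity(n_rows), sp.eye(n_cols, k=1))
    along_col = sp.kron(sp.eye(n_rows, k=1), sp.identity(n_cols))
    upper = along_row + along_col
    return (upper + upper.T).tocsr()

W = rook_adjacency(6, 18)                   # 108 sites, as in the case study
n_i = np.asarray(W.sum(axis=1)).ravel()     # number of neighbours of each site
M = sp.diags(n_i)

rho = 0.4                                   # illustrative value with |rho| < 1
Q = (M - rho * W).tocsr()                   # proper (stationary) CAR structure
R = (M - W).tocsr()                         # intrinsic CAR structure, rows sum to zero

tau2 = np.linspace(0.5, 2.0, 15)            # hypothetical scaling coefficients, one per depth
Omega_proper = sp.block_diag([Q / t for t in tau2], format="csr")
Omega_intrinsic = sp.block_diag([R / t for t in tau2], format="csr")
```

Q is strictly diagonally dominant and hence positive definite, whereas R has rank S − 1 on a connected lattice, which is why the corresponding prior is improper.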

6.4.2 Computation

Computation is performed using a general-purpose MCMC software framework currently under development,
which allows block updating of parameters. Programming is in Python and uses the Fortran
and C libraries, LAPACK, BLAS, SciPy of Anderson et al. [1999], Blackford et al. [2002], and NumPy
Community [2010] respectively. Model parameters are partitioned into five blocks, (ψ, τ, β, σ, ρ), each
jointly sampled. Closed form samplers are used for all model parameters except ρ, where a
Metropolis-Hastings sampler is used. Block updating is found to be more efficient by various authors; see, for
example, Chib and Carlin [1999] and Pitt and Shephard [1999]. Liu et al. [1994] show theoretically that jointly
sampling parameters in a Gibbs scheme leads to a reduction in correlation in the associated Markov chain
in comparison with the individual sampling of parameters. Block updating typically means that MCMC
chains converge faster.

The conditional autoregressive models have a sparse precision matrix defined by the adjacency matrix.
The sparse matrix representation used here is the compressed sparse row format described by Saad

[2003]. Krylov subspace methods are used for updating [Simpson et al., 2008, Strickland et al., 2010].

Further computing details are given in Section 6.7, where block updating equations are given for the

posterior probabilities.

6.4.3 Fixed effects

Three models are considered for the regression component: a three-knot linear spline with knots at depth

indices 4, 7 & 10, a five-knot linear spline with equally spaced knots at 3.33, 5.67, 8, 10.33 and 12.67,

and a saturated model of 135 terms which fits a constant for each treatment by depth.

The linear splines allow discussion of trends across various depth segments. The five-knot linear

spline was the initial choice. However, with six segments defined over 15 depth points, linear trends

may not be seen because of the limited number (3) of different depths within a segment, so a three-knot

model was also considered. For comparison, a saturated model of treatments by depths or 9 × 15 = 135

parameters was also fitted.

Smooth continuous curves may be fitted using the generalised additive (GAM) framework of Hastie
and Tibshirani [1990], the random walk (order 2) (RW2) smoothing of INLA [Martino and Rue, 2009], or
the RW2 penalised splines of BayesX [Belitz et al., 2009a,b], which are described more fully in Brezger
and Lang [2006], Lang and Brezger [2004] and Kneib and Fahrmeir [2006]. Such frameworks seem
unnecessarily complicated for the problem here. (For example, a naive use of BayesX gave a default 20
knots across the 15 depth values.) Additionally, the use of linear splines allows comparisons of trends
over the linear segments of the curves.

Choice of the regression component and final model is made using the Deviance Information Crite-

rion (DIC) [Spiegelhalter et al., 2002], an information criterion based on the Deviance and adjusted by

an estimated number of parameters. The results of these comparisons are reinforced by the curves of the

posterior deviance distributions [Aitkin, 1997]. See Figure 6.1.
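The DIC calculation can be sketched from retained MCMC output as follows, using the definitions of Spiegelhalter et al. [2002]: pD is the posterior mean deviance minus the deviance at the posterior mean of the parameters, and DIC is the posterior mean deviance plus pD. The function name `dic_summary` and the toy normal model are assumptions made for illustration; they are not the thesis model.

```python
import numpy as np

def dic_summary(deviance_draws, deviance_at_posterior_mean):
    """DIC and effective number of parameters from MCMC deviance values."""
    d_bar = float(np.mean(deviance_draws))
    p_d = d_bar - deviance_at_posterior_mean      # effective number of parameters
    return {"Dbar": d_bar, "pD": p_d, "DIC": d_bar + p_d}

# Toy check, not the thesis model: y_i ~ N(mu, 1) with a flat prior on mu,
# so the posterior of mu is N(mean(y), 1/n) and pD should be close to 1.
rng = np.random.default_rng(1)
y = rng.normal(1.0, 1.0, size=50)
mu_draws = rng.normal(y.mean(), 1.0 / np.sqrt(y.size), size=20000)

def deviance(mu):
    """Minus twice the log likelihood for the toy model."""
    return np.sum((y - mu) ** 2) + y.size * np.log(2.0 * np.pi)

dev_draws = np.array([deviance(m) for m in mu_draws])
summary = dic_summary(dev_draws, deviance(mu_draws.mean()))
```

Models are then ranked by DIC, with smaller values preferred.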


6.4.4 Contrast and parameter comparisons

Output for each MCMC simulation after burn-in was kept for all model estimates. This permitted
post-hoc comparisons for any desired function both within and across the measurement dates. Contrasts of

interest are (1) Average Long fallow cropping minus average response cropping, (2) Average cropping

minus average pastures, and (3) Average lucernes minus native perennial pastures. These contrasts are

calculated for both the slope of the line segment from 200 cm to 300 cm, and for the moisture estimates

at each point in the depth profile.

Contrasts and slopes are compared across all combinations of dates, giving 10 comparisons for each

estimate. Comparisons within a date are formed by pairing the estimates from the same iteration. How-

ever, the estimates from each date’s model are independent. Hence, the across date comparisons are

formed after randomising the iterates.
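A hedged sketch of these post-hoc comparisons is given here. The arrays `date_a` and `date_b` are hypothetical stand-ins for retained MCMC output (one row per iteration, one column per treatment, at a single depth); the contrast weights follow the treatment groupings of Section 6.3.

```python
import numpy as np

rng = np.random.default_rng(0)
n_iter = 4000

# Hypothetical stand-ins for retained MCMC output at one depth:
# rows are iterations, columns are treatments 1-9.
date_a = rng.normal(0.30, 0.01, size=(n_iter, 9))
date_b = rng.normal(0.31, 0.01, size=(n_iter, 9))

contrasts = {
    "long fallow - response": np.array([1/3, 1/3, 1/3, 0, -1/2, -1/2, 0, 0, 0]),
    "cropping - pastures": np.array([1/6] * 6 + [-1/3] * 3),
    "lucernes - native": np.array([0, 0, 0, 0, 0, 0, 1/2, 1/2, -1]),
}

def interval(draws):
    """Posterior mean and equal-tailed 95% credible interval."""
    lo, hi = np.percentile(draws, [2.5, 97.5])
    return draws.mean(), lo, hi

w = contrasts["long fallow - response"]
within_a = date_a @ w                 # paired by iteration within a date
within_b = date_b @ w

# The two dates come from independent chains, so iterates are randomised
# before differencing rather than paired by iteration number.
across = rng.permutation(within_a) - within_b

est, lo, hi = interval(across)
excludes_zero = (lo > 0) or (hi < 0)
```

With five dates there are 10 such pairs, and the same pattern applies to slopes and variance components.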

We compare the variance components of the model in the same manner. For the random spatial

components a visual comparison only is made, using the 95% credible intervals for ψ for each site and

depth. For depths from 140 cm and onward, these credible intervals largely overlap.

6.5 Results

6.5.1 Model choice

The DIC (Table 6.1) indicates that the three-knot linear spline model with the stationary CAR prior is a

better model than the three-knot model with the non-stationary CAR, and a more appropriate model than

the stationary CAR prior saturated model or the five-knot linear spline on almost all of the five dates. On

the date (date 4, September 23, 1998) when the five-knot model is found to be the best, the three-knot

model is virtually equivalent. Clearly, tracking treatment effects at depths where they do not exist leads

to a poorer fit. However, it seems likely that the improvement observed with the five-knot model on

September 23, 1998 represents a slightly better fit at the shallower depths for that date. The three-knot

linear spline fits the bulk of the data well, but may be a poorer model for some dates at the shallower

depths covered by the first linear segment.

The Deviance curves for the three-knot linear spline, five-knot linear spline and saturated model

show the superiority of the five-knot linear spline model for date 4 (Figure 6.1). Plots of deviance curves



for all models and dates show the saturated model generally to be the poorest of the fitted models.

A useful byproduct of the DIC calculation is the calculation of pD, the effective number of parame-

ters. The saturated model contains 135 parameters and the variance components consist of 31 parameters,

but as can be seen from Table 6.1, pD is approximately equal to 5/8 of the degrees of freedom available

on any date. This proportion indicates that many of the spatial residual components might well be con-

sidered to be outliers of the CAR normal models [Spiegelhalter et al., 2002].

6.5.2 Variance components

Figures 6.2 and 6.3 show the square roots of the non-spatial and the spatial variance component
parameters, σ² and τ², and indicate that they mirror each other at the various depths and over the dates.

Figure 6.2 illustrates the need for a non-spatial variance component for each depth, but that these compo-

nents may be constant over dates. The comparisons to see whether the non-spatial variance component

for each depth differs across dates, show that all 100 of the possible comparisons across dates for depths

from 120 cm to 300 cm have 95% credible intervals which include zero. For depths from 20 cm to 100

cm (50 comparisons) just 9 differences have credible intervals not inclusive of zero, and these all involve

comparisons with date 4. These intervals indicate that the non-spatial variance component, σ², varies by

depth but not by date.

Figure 6.3 shows τ varying by depth, but being approximately constant across dates from depths 120

cm to 300 cm. Comparisons across dates show just 3 observed differences whose 95% credible interval

fails to include zero from a possible 100. There are apparently some differences across dates at the

shallower depths, with 25 of the 50 possible comparisons showing differences for depths from 20 cm to

100 cm, and these are generally differences with the τ values for date 4. The spatial variance components

vary by depth, but not generally by date. This is particularly true for depths from 120 cm to 300 cm.

The variance component graphs (Figures 6.2, 6.3) show very much lower variability in the mid-depth

range. Date 4 (September 23, 1998) shows considerably smaller variances for the shallower depths than

those for the other dates for both the spatial and non-spatial variance components.

Tables 6.2 and 6.3 show values and comparisons for the parameter ρ. Just one of the possible

10 comparisons across dates has a 95% credible interval which did not include zero. ρ appears to be

effectively the same across dates.


6.5.3 Depth segments and dates

The three-knot spline model consists of four linear segments for each treatment. Table B.15 shows the

95% credible intervals for the slope of the linear segment at the greatest depth (from 200 cm to 300 cm).

Almost all treatments show no trend in this segment. (The exceptions are treatment 8, a lucerne mixture

treatment which shows decreasing moisture in this line segment, and treatment 2 which on two of the

five dates shows increasing moisture.) In general, the last linear segment (from 200 cm - 300 cm) is

constant for all treatments over all dates. Hence, from about 200 cm depth and deeper, the treatments

would appear to no longer affect the moisture levels and moisture stays roughly constant but with greater

variability with increasing depth.

Contrasts between the dates for each treatment’s final slope give 95% credible intervals which include

zero for all treatments, except for treatments 1 and 9, which each show 4 of the 10 possible differences

between the dates’ final slopes as differing.

If we group long fallowing, response cropping and pastures and calculate a common final slope for

each grouping, these estimated slopes all have 95% credible intervals which include zero. Comparing

the contrasts for these grouped slopes across dates, no differences are found across dates. Mean moisture

levels from 200 cm to 300 cm do not change across the different dates for the various treatments and

types of cropping, but become more variable with depth.

6.5.4 Point by point contrasts

Figures 6.4-6.6 graph the point by point contrasts for all depths and dates. The most important of these,

Figure 6.4, shows the 95% credible intervals for the long fallow versus response cropping contrast as

generally differing across dates at the shallower depths, but overlapping for the depths from 200 cm to

300 cm.

Tables for contrasts for the three-knot linear model are given in the online supplementary materials.

Table 6.5 shows the sign of each contrast whose 95% credible interval does not contain zero.

The statistical evidence is that the treatments no longer affect the moisture values from the depth of

200 cm to 300 cm, and given that the moisture profile is effectively flat at these depths, it seems that

moisture levels after 200 cm are constant for their treatment, but have greater variability than at the mid-

depths. The contrast of long fallow cropping (treatments 1-3) versus response cropping (treatments 5

& 6) has 95% credible intervals that are positive, or nearly so, from 200 cm to 300 cm for all dates. Thus, it would



appear that for the five dates considered response cropping decreases moisture levels at the depth critical

for salination.

As expected, all values of the contrast ‘Crop vs pasture’ (the average of treatments 1-6 minus the

average of treatments 7-9) are positive for all dates and depths, with the difference being roughly constant

from 200 cm to 300 cm. That is, cropping leads to moister soil than pastures.

The lucerne pasture mixtures (treatments 7 & 8) perform consistently better than the native pastures

for depths greater than 100 cm. That is, at these depths, lucerne mixtures lead to drier soil than the native

pastures.

The differences discussed above are also shown in the saturated model contrast differences but not so

markedly. These same contrasts when compared across the dates show essentially no differences in the

depths from 200 cm to 300 cm.

6.5.5 Spatial residual components, ψ

As indicated in Section 6.4.4, no formal comparisons were made for the spatial residuals across dates.

Graphs of their 95% credible intervals were plotted to inspect overlap or non-overlap. For depths from

140 cm and deeper the credible intervals largely overlap. Figure 6.7 gives contour graphs for these spatial

residuals at the depth of 240 cm for the different dates. These show considerable consistency across dates.

6.6 Discussion

In considering longitudinal agricultural experiments, Piepho et al. [2004], Piepho and Ogutu [2007],

Piepho et al. [2008], Wang and Goonewardene [2004] and Brien and Demetrio [2009] use mixed models

within a REML framework to analyse their spatio-temporal data, and explicitly address the fitting of

state-space models via standard software and REML. The fixed part of their models is generally simple

and the data are measured on two spatial dimensions. Some soil profile studies [Macdonald et al., 2009]

do not use spatial information in the analysis. Some studies composite the soils from different depths

across soil types or treatment [Sleutel et al., 2009]. Others [Nayyar et al., 2009] use the mixed modelling

framework advocated by Piepho et al. [2004]. Roy and Blois [2008] is one of the few papers in an

agricultural context which uses conditional autoregressive models. The current methodology of choice

for agricultural data which accounts for spatial correlation would seem to be mixed modelling to describe


spatial and other variance components, using REML. Despite the work of Besag et al. [1995], Besag and

Higdon [1993, 1999] there has been almost no use of CAR models for agricultural analyses. We use

conditional autoregressive models for their simplicity and their capacity to allow reasonably complex

fixed model components. Working with the sparse precision matrix from the adjacency matrix rather

than from a dense covariance matrix permits efficient model fitting. Besag and Mondal [2005], Lindgren

et al. [2010] show the equivalence of various kriging and CAR models.

The use of block updating allows good mixing and the Krylov subspace methods exploit the sparse

structure of the precision matrix to give efficient sampling.

The choice to allow neighbours only at the same depth is made for several reasons. Firstly, with depth

an important part of the regression component, to include depth-neighbours would confound estimation

of the treatment effects. Secondly, and more importantly, it permits the fitting of differing variances for

the spatial components at each depth. Finally, using the obvious choice of distance weighted neighbours

would mean that with the great differences in scale between horizontal and vertical distances the neigh-

bourhood model would degenerate effectively into a depth neighbourhood model only, while using (1,0)

neighbours would also be difficult to justify. This consideration seems likely to apply in many agricultural

contexts where observations are made in three spatial dimensions. We use a first order neighbourhood

across the horizontal lattice with (1,0) weights.

We fitted the same model to five dates of data aiming to discover how best to fit a model for the full

data. It largely appears that several important parameters of the model (ρ, τ and σ) are constant across

dates for the depths which are of concern for salination. (If we should wish to model moisture at all

depths, classifying dates as wet or dry on the basis of previous rainfall, may be useful to distinguish such

dates as date 4, September 23, 1998.)

From the DIC values, we see that the simplification of the three-knot model, where a longer linear

segment at the deeper depths is used, has resulted in a better model. Clearly for depths from about 200

cm and greater, the various treatments no longer exercise a direct effect on the moisture content of the

soil. Rather, the moisture content remains approximately constant at whatever level it has reached by 200

cm, but with increasing variability with increasing depth. This is true for all five dates.

We have presented a methodology for the analysis of three dimensional lattice data sets, where the

distance between lattice points in one dimension is not commensurate with those in the other two, a

situation which often applies to water column, air column and soil studies. The method is applicable to



both regular and irregular lattices in the horizontal plane. We see it as applying to oceanographic and air

column data as well as three dimensional agricultural studies.

The analyses of the case study here have uncovered important features of the data. In particular, by

having taken out the spatially correlated components, they indicate that response cropping gives rise to

more satisfactory moisture levels than long fallow cropping below the root zone where the soils are at

greatest risk of salination.


6.7 Appendix

The joint posterior for the full set of unknown parameters is estimated by partitioning the parameters into

five blocks

(ψ, τ,β,σ, ρ) .

and a Gibbs sampling scheme is defined such that the jth step is

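1. Sample $\psi^{j}$ from $p(\psi \mid y, \tau^{j-1}, \beta^{j-1}, \sigma^{j-1}, \rho^{j-1})$,

2. Sample $\tau^{j}$ from $p(\tau \mid y, \psi^{j}, \beta^{j-1}, \sigma^{j-1}, \rho^{j-1})$,

3. Sample $\beta^{j}$ from $p(\beta \mid y, \psi^{j}, \tau^{j}, \sigma^{j-1}, \rho^{j-1})$,

4. Sample $\sigma^{j}$ from $p(\sigma \mid y, \psi^{j}, \tau^{j}, \beta^{j}, \rho^{j-1})$,

5. Sample $\rho^{j}$ from $p(\rho \mid y, \psi^{j}, \tau^{j}, \beta^{j}, \sigma^{j})$.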

Let S be the number of horizontal sites, D the number of different depths, and n the number of

observations.

The following subsections describe the sampling from each of the full conditional posteriors in the

scheme above.

6.7.1 Sampling β.

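We define $\tilde{y}$ such that

$$\tilde{y} = \Sigma^{-1/2}(y - \psi),$$

and $\tilde{X}$ such that

$$\tilde{X} = \Sigma^{-1/2} X.$$

The prior probability density function (pdf) for β is taken as

$$\beta \sim N\left(\underline{\beta}, \underline{V}^{-1}\right),$$

where $\underline{\beta}$ is the prior mean, and $\underline{V}$ is the prior precision. Thus the posterior distribution for β is given by

$$\beta \mid y, \sigma, \psi \sim N\left(\overline{\beta}, \overline{V}^{-1}\right),$$

where

$$\overline{V} = \tilde{X}^T \tilde{X} + \underline{V},$$

and

$$\overline{V}\,\overline{\beta} = \underline{V}\,\underline{\beta} + \tilde{X}^T \tilde{y}.$$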
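The update for β can be sketched in NumPy as below. This is a dense illustration under stated assumptions, not the thesis implementation: `sample_beta` and its argument names are hypothetical, Σ is passed as the length-n vector of its diagonal, and the toy data are simulated.

```python
import numpy as np

def sample_beta(y, X, psi, sigma2_obs, prior_mean, prior_prec, rng):
    """Draw beta from its Gaussian full conditional; also return its mean."""
    w = 1.0 / np.sqrt(sigma2_obs)              # diagonal of Sigma^{-1/2}
    y_t = w * (y - psi)                        # scaled response
    X_t = w[:, None] * X                       # scaled design matrix
    V_post = X_t.T @ X_t + prior_prec          # posterior precision
    rhs = prior_prec @ prior_mean + X_t.T @ y_t
    beta_post = np.linalg.solve(V_post, rhs)   # posterior mean
    L = np.linalg.cholesky(V_post)             # V_post = L L^T
    z = rng.standard_normal(rhs.size)
    # L^{-T} z has covariance (L L^T)^{-1}, the posterior covariance.
    return beta_post + np.linalg.solve(L.T, z), beta_post

# Simulated toy data, for illustration only.
rng = np.random.default_rng(2)
n, p = 200, 3
X = rng.normal(size=(n, p))
psi = rng.normal(scale=0.1, size=n)
sigma2_obs = np.repeat([0.5, 2.0], n // 2)
beta_true = np.array([1.0, -2.0, 0.5])
y = X @ beta_true + psi + rng.normal(size=n) * np.sqrt(sigma2_obs)

draw, beta_post = sample_beta(y, X, psi, sigma2_obs,
                              np.zeros(p), 1e-8 * np.eye(p), rng)
```

With a nearly flat prior the posterior mean reduces to weighted least squares on y − ψ, which gives a quick correctness check.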

6.7.2 Sampling σ.

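Let

$$\epsilon = y - X\beta - \psi,$$

and let the n × 1 residual vector be partitioned by depth, d, into D subvectors, $\epsilon_d$, such that

$$\epsilon = [\epsilon_1, \epsilon_2, \ldots, \epsilon_D],$$

where each $\epsilon_d$ is an S × 1 vector, with d = 1, 2, ..., D. The vector σ² is updated by updating each variance
component, $\sigma_d^2$, one at a time, using the following updating equations.

$$1/\sigma_d^2 \mid y, \beta, \psi \sim \mathrm{Gamma}\left(\overline{\nu}/2, \overline{s}/2\right),$$

where

$$\overline{\nu} = S + \underline{\nu},$$

and

$$\overline{s} = \underline{s} + \epsilon_d^T \epsilon_d,$$

and the common prior for each variance component, $\sigma_d^2$, the dth element of the vector σ², is

$$1/\sigma_d^2 \sim \mathrm{Gamma}\left(\underline{\nu}/2, \underline{s}/2\right), \quad d = 1, 2, \ldots, D.$$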
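A short sketch of this update follows. It assumes the second Gamma argument is a rate, which is what the conjugate form requires; NumPy's generator takes a scale, so the reciprocal is passed. The name `sample_sigma2`, the prior values and the simulated residuals are illustrative.

```python
import numpy as np

def sample_sigma2(resid, D, S, nu0, s0, rng):
    """Draw the D depth-specific residual variances from their full conditionals."""
    eps = resid.reshape(D, S)                  # row d holds the S residuals at depth d
    shape = (S + nu0) / 2.0
    rate = (s0 + np.sum(eps ** 2, axis=1)) / 2.0
    precision = rng.gamma(shape, 1.0 / rate)   # NumPy's gamma takes scale = 1 / rate
    return 1.0 / precision

# Simulated residuals ordered by depth and then site, for illustration only.
rng = np.random.default_rng(3)
D, S = 3, 5000
true_sigma2 = np.array([1.0, 4.0, 9.0])
resid = (rng.standard_normal((D, S)) * np.sqrt(true_sigma2)[:, None]).ravel()

sigma2_draw = sample_sigma2(resid, D, S, nu0=0.01, s0=0.01, rng=rng)
```

Because the residuals are ordered by depth and then by site, a single reshape recovers the D subvectors and all depths are drawn in one vectorised call.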

6.7.3 Sampling ψ.

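Define

$$\tilde{y} = y - X\beta.$$

This gives

$$\tilde{y} \mid \psi, \beta, \sigma, \rho, \tau \sim N(\psi, \Sigma).$$

Hence,

$$p(\psi \mid y, \beta, \sigma, \rho, \tau) \propto p(\tilde{y} \mid \ldots) \times p(\psi \mid \rho, \tau) \propto \exp\left\{-\tfrac{1}{2}\left(\psi^T \Sigma^{-1} \psi + \psi^T \Omega \psi - 2\psi^T \Sigma^{-1} \tilde{y}\right)\right\},$$

and thus

$$\psi \mid y, \beta, \sigma, \rho, \tau \sim N\left(\overline{\psi}, \overline{\Omega}^{-1}\right),$$

where

$$\overline{\Omega} = \Omega + \Sigma^{-1},$$

and

$$\overline{\Omega}\,\overline{\psi} = \Sigma^{-1} \tilde{y}.$$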
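The following sketch illustrates the update for ψ on a toy problem. The posterior mean is obtained with SciPy's conjugate gradient solver, which is one Krylov subspace method acting on the sparse posterior precision. The draw itself uses a dense Cholesky factor for brevity; this is a swapped-in simplification, not the Krylov sampler of Simpson et al. [2008]. All names and values are assumptions.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import cg

def sample_psi(y_tilde, Omega, sigma2_obs, rng):
    """Draw psi from N(psi_post, Omega_post^{-1}); also return psi_post."""
    Omega_post = (Omega + sp.diags(1.0 / sigma2_obs)).tocsr()
    rhs = y_tilde / sigma2_obs                    # Sigma^{-1} (y - X beta)
    psi_post, info = cg(Omega_post, rhs)          # sparse Krylov solve for the mean
    if info != 0:
        raise RuntimeError("conjugate gradient did not converge")
    L = np.linalg.cholesky(Omega_post.toarray())  # dense factor: small problems only
    z = rng.standard_normal(rhs.size)
    return psi_post + np.linalg.solve(L.T, z), psi_post

# Toy problem: 4 sites on a line, D = 2 depths, illustrative parameter values.
W = sp.eye(4, k=1) + sp.eye(4, k=-1)
n_i = np.asarray(W.sum(axis=1)).ravel()
Q = sp.diags(n_i) - 0.4 * W
Omega = sp.block_diag([Q / 0.5, Q / 2.0], format="csr")
sigma2_obs = np.repeat([1.0, 0.25], 4)

rng = np.random.default_rng(4)
y_tilde = rng.normal(size=8)
psi_draw, psi_post = sample_psi(y_tilde, Omega, sigma2_obs, rng)
```

For the full 1620-dimensional problem the dense factorisation would be replaced by a sparse or Krylov-based sampler, since only products with the sparse precision matrix are then needed.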

6.7.4 Sampling τ.

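The elements $\tau_d^2$ of the vector τ², d = 1, 2, ..., D, are updated one at a time as follows. The n × 1 vector
ψ is partitioned into D subvectors $\psi_d$ (the spatial residuals at depth d), and (see Section 6.4.1)

$$\Omega(\rho, \tau) = \text{block diagonal}\left(Q/\tau_1^2, Q/\tau_2^2, \ldots, Q/\tau_D^2\right).$$

Let the prior pdf for $\tau_d^2$ be given by

$$1/\tau_d^2 \sim \mathrm{Gamma}\left(\underline{a}/2, \underline{b}/2\right),$$

with $(\underline{a}, \underline{b})$ as hyperparameters.

This gives the updating posterior probability density function for $\tau_d^2$ as

$$1/\tau_d^2 \mid \psi, \rho \sim \mathrm{Gamma}\left(\overline{a}/2, \overline{b}/2\right),$$

where

$$\overline{a} = \underline{a} + S, \quad \text{and} \quad \overline{b} = \underline{b} + \psi_d^T Q \psi_d,$$

for

$$\tau^2 = [\tau_1^2, \tau_2^2, \ldots, \tau_D^2].$$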
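A minimal sketch of this update, again treating the second Gamma argument as a rate. The function name `sample_tau2`, the chain-graph neighbourhood and the prior values are assumptions chosen only to make the example self-contained.

```python
import numpy as np

def sample_tau2(psi, Q, D, a0, b0, rng):
    """Draw the D depth-specific spatial scaling coefficients tau_d^2."""
    S = Q.shape[0]
    psi_d = psi.reshape(D, S)                       # row d holds psi at depth d
    quad = np.einsum("ds,ds->d", psi_d @ Q, psi_d)  # psi_d^T Q psi_d for each depth
    precision = rng.gamma((a0 + S) / 2.0, 2.0 / (b0 + quad))  # scale = 1 / rate
    return 1.0 / precision

# Toy check on a chain of S sites, with psi_d simulated from N(0, tau_d^2 Q^{-1}).
S, D, rho = 400, 2, 0.4
W = np.diag(np.ones(S - 1), 1) + np.diag(np.ones(S - 1), -1)
Q = np.diag(W.sum(axis=1)) - rho * W
true_tau2 = np.array([0.5, 2.0])

rng = np.random.default_rng(5)
L = np.linalg.cholesky(Q)                           # Q = L L^T
z = rng.standard_normal((D, S))
psi = (np.linalg.solve(L.T, z.T).T * np.sqrt(true_tau2)[:, None]).ravel()

tau2_draw = sample_tau2(psi, Q, D, a0=0.01, b0=0.01, rng=rng)
```

The quadratic form only needs products with Q, so with a sparse Q the cost per depth is linear in the number of neighbour pairs.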

6.7.5 Sampling ρ.

From Section 6.4.1, the precision matrix for the spatial components is Ω(ρ, τ). The prior for ρ is taken as

$$\rho \sim \mathrm{Beta}(\alpha, \beta),$$

with (α, β) the hyperparameters.

Hence, the posterior probability density function for ρ is given by

$$p(\rho \mid \psi, \tau) \propto |\Omega(\rho, \tau)|^{1/2} \rho^{\alpha - 1} (1 - \rho)^{\beta - 1} \exp\left\{-\tfrac{1}{2} \psi^T \Omega(\rho, \tau) \psi\right\},$$

and ρ is sampled via a Metropolis-Hastings update.
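One way such an update might look is a random-walk Metropolis-Hastings step, sketched below. The proposal, step size, toy neighbourhood and placeholder values of ψ are all assumptions; the thesis does not state which proposal was used. The log-determinant uses log|Ω(ρ, τ)| = D log|M − ρW| plus a term in τ that does not depend on ρ and is dropped.

```python
import numpy as np

def log_post_rho(rho, psi_d, M, W, tau2, alpha, beta):
    """Log of the unnormalised full conditional of rho, on (0, 1)."""
    if not 0.0 < rho < 1.0:
        return -np.inf                        # outside the support of the Beta prior
    Q = M - rho * W
    _, logdet_Q = np.linalg.slogdet(Q)
    D = psi_d.shape[0]
    quad = np.einsum("ds,ds->d", psi_d @ Q, psi_d)
    return (0.5 * D * logdet_Q
            + (alpha - 1.0) * np.log(rho)
            + (beta - 1.0) * np.log1p(-rho)
            - 0.5 * np.sum(quad / tau2))

def mh_step_rho(rho, step, rng, **model):
    """One symmetric random-walk Metropolis-Hastings update for rho."""
    proposal = rho + step * rng.standard_normal()
    log_ratio = log_post_rho(proposal, **model) - log_post_rho(rho, **model)
    if np.log(rng.uniform()) < log_ratio:
        return proposal, True
    return rho, False

# Toy set-up: chain of 30 sites, 3 depths, placeholder spatial effects.
S, D = 30, 3
W = np.diag(np.ones(S - 1), 1) + np.diag(np.ones(S - 1), -1)
M = np.diag(W.sum(axis=1))
rng = np.random.default_rng(6)
model = dict(psi_d=rng.normal(size=(D, S)), M=M, W=W,
             tau2=np.ones(D), alpha=2.0, beta=2.0)

rho, chain = 0.5, []
for _ in range(500):
    rho, _accepted = mh_step_rho(rho, 0.1, rng, **model)
    chain.append(rho)
```

Proposals outside (0, 1) receive log density −∞ and are always rejected, so the chain stays inside the support of the prior.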


6.8 Tables

Table 6.1 Summary of DICs

Date      Model      pD      DIC
Date 1    D135(S)    1065    -5850
          K5(S)      1019    -5875
          K3(S)      1032    -5915 *
          K3(NS)      959    -5657
Date 2    D135(S)    1061    -5952
          K5(S)      1039    -6038
          K3(S)      1048    -6049 *
          K3(NS)      904    -5560
Date 3    D135(S)    1049    -5885
          K5(S)      1034    -5961
          K3(S)      1044    -5996 *
          K3(NS)      906    -5509
Date 4    D135(S)    1093    -6570
          K5(S)      1070    -6623 *
          K3(S)      1064    -6619
          K3(NS)     1011    -6507
Date 5    D135(S)    1053    -6321
          K5(S)      1024    -6378
          K3(S)      1024    -6396 *
          K3(NS)      973    -6214

D135(S): Saturated model, 9 × 15 terms. K5(S): 5-knot linear spline. K3(S): 3-knot linear spline. K3(NS): 3-knot linear spline - Intrinsic CAR.

(S): Stationary CAR. (NS): Non-stationary (Intrinsic CAR).

pD: Estimated number of parameters.



Table 6.2 Estimates for ρ in the spatial precision matrix

Date                  ρ       95% CI
July 23, 1997         .461    (.373, .550)
December 4, 1997      .385    (.304, .477)
April 28, 1998        .375    (.289, .462)
September 23, 1998    .346    (.266, .429)
February 25, 1999     .325    (.246, .413)

Table 6.3 Differences in ρ across the five time periods.

Day1    Day2    est      q025      q975     Sig
1       2       0.077     0.000    0.156    *
1       3       0.087     0.008    0.162    *
1       4       0.115     0.032    0.195    *
1       5       0.136     0.058    0.213    *
2       3       0.010    -0.068    0.085
2       4       0.039    -0.039    0.122
2       5       0.060    -0.017    0.135
3       4       0.029    -0.049    0.114
3       5       0.050    -0.024    0.124
4       5       0.021    -0.054    0.098

Est = ρ(Day1) − ρ(Day2)


Table 6.4 Slopes for segment 200 cm - 300 cm for each treatment

Treatment    Day (Date)    Est       q025      q975      Sig
1            1             -0.002    -0.010     0.005
1            2              0.001    -0.006     0.008
1            3             -0.001    -0.008     0.005
1            4             -0.005    -0.012     0.001
1            5             -0.002    -0.009     0.005
2            1             -0.005    -0.013     0.002
2            2             -0.004    -0.011     0.003
2            3              0.010     0.003     0.017    *
2            4              0.012     0.005     0.019    *
2            5             -0.003    -0.010     0.003
3            1              0.006    -0.001     0.014
3            2              0.004    -0.003     0.011
3            3              0.005    -0.002     0.012
3            4              0.002    -0.005     0.009
3            5              0.001    -0.006     0.008
4            1             -0.005    -0.012     0.002
4            2             -0.004    -0.010     0.003
4            3             -0.004    -0.011     0.002
4            4             -0.004    -0.010     0.002
4            5             -0.005    -0.011     0.001
5            1              0.001    -0.007     0.008
5            2             -0.000    -0.007     0.007
5            3              0.002    -0.005     0.009
5            4              0.000    -0.006     0.007
5            5             -0.000    -0.007     0.006
6            1              0.004    -0.004     0.012
6            2              0.001    -0.006     0.008
6            3              0.005    -0.003     0.012
6            4              0.003    -0.004     0.010
6            5             -0.004    -0.011     0.003
7            1              0.003    -0.003     0.010
7            2              0.003    -0.003     0.010
7            3              0.005    -0.002     0.011
7            4              0.003    -0.003     0.009
7            5              0.002    -0.004     0.008
8            1             -0.009    -0.016    -0.003    *
8            2             -0.009    -0.016    -0.002    *
8            3             -0.009    -0.015    -0.003    *
8            4             -0.008    -0.014    -0.002    *
8            5             -0.011    -0.017    -0.005    *
9            1              0.004    -0.003     0.011
9            2              0.003    -0.004     0.010
9            3              0.006    -0.001     0.013
9            4              0.006    -0.001     0.012
9            5              0.005    -0.001     0.012

* indicates 95% credible interval does not include zero.



Table 6.5 Signs for contrasts with 95% credible intervals not including zero, for each date. Positive (+) and negative (−) values indicated.

Contrast Depth Date: 1 2 3 4 5

Long Fallow - Response 20 + - + +

40 + + + +

60 + + +

80 + + + +

100 + + + + +

120 + + + +

140 + - + +

160 + + + +

180 + + + +

200 + + + +

220 + + + + +

240 + + + + +

260 + + + + +

280 + +

300 +

Cropping - Pastures 20 + + + + +

40 + + + + +

60 + + + + +

80 + + + + +

100 + + + + +

120 + + + + +

140 + + + + +

160 + + + + +

180 + + + + +

200 + + + + +

220 + + + + +

240 + + + + +

260 + + + + +

280 + + + + +

300 + + + + +

Lucerne mixtures - Native 20 - -

40 - - -

60 - - - -

80 - - - -

100 - - - - -

120 - - - - -

140 - - - - -

160 - - - - -

180 - - - - -

200 - - - - -

220 - - - - -

240 - - - - -

260 - - - - -

280 - - - - -

300 - - - - -

6.9 Figures

Figure 6.1 Cumulative distribution curves for the posterior distribution of the deviance, for (Date 4) September 23, 1998. The solid line represents that for the saturated model, the middle broken line that for the 3-knot linear spline, and the more coarsely broken line on the left that for the 5-knot linear spline model.

[Figure 6.1: plot not reproduced. Horizontal axis: Deviance. Legend: Design, K3, K5.]


Figure 6.2 Square root of non-spatial variances, by date and depth. Credible intervals are staggered in date order. Note the comparatively smaller variances at the shallower depths for Date 4.

[Figure 6.2: plot not reproduced. Horizontal axis: Depth.]

Figure 6.3 Square root of spatial variance, by date and depth. Credible intervals are staggered in date order. Note the comparatively smaller variances at the shallower depths for Date 4.

[Figure 6.3: plot not reproduced. Horizontal axis: Depth.]


Figure 6.4 Contrast: Long Fallow - Response cropping. Credible intervals are staggered in date order.

[Figure 6.4: plot not reproduced. Horizontal axis: Depth. Legend: July 23, 1997; December 4, 1997; April 28, 1998; September 23, 1998; February 25, 1999.]


Figure 6.5 Contrast: Cropping - Pastures. Credible intervals are staggered in date order.


Figure 6.6 Contrast: Lucerne mixtures - Native pastures. Credible intervals are staggered in date order.

[Figure 6.6: plot not reproduced. Horizontal axis: Depth. Legend: July 23, 1997; December 4, 1997; April 28, 1998; September 23, 1998; February 25, 1999.]

Figure 6.7 Spatial residual components at depth 240 cm.

[Figure 6.7: contour panels not reproduced. Panel titles: July 23, 1997; December 4, 1997; April 28, 1998; September 23, 1998; February 25, 1999.]



Bibliography

Aitkin, M. (1997). The calibration of P-values, posterior Bayes factors and the AIC from the posterior

distribution of the likelihood. Statistics and Computing 7, 253–261.

Anderson, E., Z. Bai, C. Bischof, S. Blackford, J. Demmel, J. Dongarra, J. D. Croz, A. Greenbaum,

S. Hammarling, A. McKenney, and D. Sorensen (1999). LAPACK Users’ Guide: Third Edition (22

Aug 1999 ed.). Philadelphia: Society for Industrial and Applied Mathematics (SIAM).









Besag, J. E. and D. Higdon (1993). Bayesian inference for agricultural field experiments. Bull. Inst.

Internat. Statist 55(Book 1), 121–136.




Biometrika 92(4), 909–920.



Blackford, L., J. Demmel, J. Dongarra, I. Duff, S. Hammarling, G. Henry, M. Heroux, L. Kaufman,

A. Lumsdaine, and A. Petitet (2002). An updated set of basic linear algebra subprograms (BLAS).

ACM Transactions on Mathematical Software (TOMS) 28(2), 135–151.


Brezger, A. and S. Lang (2006). Generalized structured additive regression based on Bayesian P-splines.

Computational Statistics and Data Analysis 50(4), 967–991.

Brien, C. J. and C. G. B. Demetrio (2009). Formulating mixed models for experiments, including longitu-

dinal experiments. Journal of Agricultural, Biological, and Environmental Statistics 14(3), 253–280.

Chib, S. and B. P. Carlin (1999). On MCMC sampling in hierarchical longitudinal models. Statistics and

Computing 9, 17–26.

Gelfand, A. E. and P. Vounatsou (2003). Proper multivariate conditional autoregressive models for spatial

data analysis. Biostatistics 4(1), 11–25.

Hastie, T. and R. Tibshirani (1990). Generalized additive models (1st ed.). Monographs on statistics and

applied probability. London ; New York: Chapman and Hall.

Kneib, T. and L. Fahrmeir (2006). Structured additive regression for categorical spacetime data: A mixed

model approach. Biometrics 62(1), 109–118.

Lang, S. and A. Brezger (2004). Bayesian P-splines. Journal of Computational and Graphical Statis-

tics 13(1), 183–212.



appear.

Liu, J. S., W. H. Wong, and A. Kong (1994). Covariance structure of the Gibbs sampler with applications

to the comparisons of estimators and augmentations schemes. Journal of the Royal Statistical Society,

Series B 57(1), 157–169.

Macdonald, B. C. T., J. K. Reynolds, A. S. Kinsela, R. J. Reilly, P. van Oploo, T. D. Waite, and I. White

(2009). Critical coagulation in sulfidic sediments from an east-coast Australian acid sulfate landscape.

Applied Clay Science 46(2), 166–175.

Martino, S. and H. Rue (2009). R Package: INLA. Department of Mathematical Sciences NTNU,

Norway.



Nayyar, A., C. Hamel, G. Lafond, B. D. Gossen, K. Hanson, and J. Germida (2009). Soil microbial

quality associated with yield reduction in continuous-pea. Applied Soil Ecology 43(1), 115–121.


1–56.

NumPy Community (2010). NumPy Reference Manual: Release 1.5.0.dev8106. Available online: http://docs.scipy.org/doc/. Accessed: February 9, 2010.

Piepho, H. P., A. Buchse, and C. Richter (2004). A mixed modelling approach for randomized experi-

ments with repeated measures. Journal of Agronomy and Crop Science 190(4), 230–247.

Piepho, H. P. and J. O. Ogutu (2007). Simple state-space models in a mixed model framework. American

Statistician 61(3), 224–232.

Piepho, H. P., C. Richter, and E. Williams (2008). Nearest neighbour adjustment and linear variance

models in plant breeding trials. Biometrical Journal 50(2), 164–189.

Pitt, M. and N. Shephard (1999). Analytic convergence rates and parameterisation issues for the Gibbs

sampler applied to state space models. Journal of Time Series Analysis 20, 63–85.






Roy, V. and S. d. Blois (2008). Evaluating hedgerow corridors for the conservation of native forest herb

diversity. Biological Conservation 141, 298–307.


Chapman & Hall/CRC.

Saad, Y. (2003). Iterative methods for sparse linear systems. Society for Industrial and Applied Mathe-

matics. [electronic resource].





Sleutel, S., J. Vandenbruwane, A. De Schrijver, K. Wuyts, B. Moeskops, K. Verheyen, and S. De Neve

(2009). Patterns of dissolved organic carbon and nitrogen fluxes in deciduous and coniferous forests

under historic high nitrogen deposition. Biogeosciences 6(12), 2743–2758.



583–639.


analysis of spatial dynamic factor models for multi-temporal remotely sensed imagery.

Wang, L. A. and Z. Goonewardene (2004). The use of mixed models in the analysis of animal experiments

with repeated measures data. Canadian Journal of Animal Science 84(1), 1–11.

Statement of Contribution of Co-Authors for Thesis by Publication

The authors listed below certify that:

1. they meet the criteria for authorship in that they have participated in the conception, execution, or interpretation, of at least that part of the publication in their field of expertise;

2. they take public responsibility for their part of the publication, except for the responsible author who accepts overall responsibility;

3. there are no other authors according to these criteria;

4. potential conflicts of interest have been disclosed to (a) granting bodies, (b) the editor or publisher of journals or other publications, and (c) the head of the responsible academic unit; and

5. they agree to the use of the publication in the student's thesis and its publication on the Australasian Digital Thesis database consistent with any limitations set by publisher requirements.

In the case of Chapter 7:

Title: Four dimensional spatio-temporal analysis. Status: In preparation

Contributor: Statement of contribution

Margaret Donald: As first author, I was responsible for the concept of the paper, data analysis, interpretation and the writing of all drafts.

Dr Clair Alston: Was responsible for advice on measurements and their meaning and editorial comment.

Dr Chris Strickland: Programmed & developed the Gibbs sampler used for the CAR layered model, in pyMCMC.

Rick Young: Will be responsible for advice on the purpose and background to the field trial, advice on the meaning of statistical results and editorial comment.

Professor Kerrie Mengersen: Was responsible for general advice and editorial comment.

Principal Supervisor's Confirmation

I have sighted email or other correspondence from all co-authors confirming their certifying authorship.

Name        Signature        Date

Chapter 7

Four dimensional spatio-temporal

analysis of an agricultural dataset

Preamble

This chapter addresses research objective (5) to fit a complex spatio-temporal model to the full agri-

cultural dataset. It uses the model and modelling software discussed in Chapter 6.1 to fit the saturated

treatment by depth model to 56 days data from the agricultural trial data. Thus, here we eschew the aim

of simplifying the treatment curves by depth, taking the view that the contrasts are better estimated by

fitting means to all treatments and depths. The data are analysed as a single analysis, which is created

by fitting the model of Chapter 6.1 at each date. And, secondly, in a two stage analysis, where estimates

for the contrast of interest are taken from the full model and used in a series of time series analyses to

consider the contrast over the complete trial.

Appendix C shows all the fits from Method 1 together with the penalised spline (over time) models

which are included to show the seasonality at the shallower depths dampening until there is virtually no

seasonality at the greater depths. The random walk fits show the fits from the chosen time series model.

The appendix includes figures for all these models for depths 100 cm to 220 cm.

I am the principal author and the paper is given with its abstract. This paper is in preparation. Rick

Young provided the data and will provide agricultural and other editorial comment when the statistical

content has been finalised. Chris Strickland programmed the various samplers for the posteriors in pyMCMC.

I programmed the precision matrix and DIC calculations. Clair Alston provided editorial advice

in addition to advice on the collection and meaning of the data. Kerrie Mengersen oversaw, helped with,

and guided the exposition. As first author, I am responsible for the concept of the paper, the data analysis,

interpretation and the writing of all drafts.

Title: A four dimensional spatio-temporal analysis of an agricultural field trial

Authors: Margaret Donald^a, Chris Strickland^a, Clair Alston^a, Rick Young^b, Kerrie Mengersen^a.




2340, Australia.

7.1 Four dimensional spatio-temporal analysis

Abstract

While a variety of statistical models now exist for the spatio-temporal analysis of two-dimensional data

collected over time, there are few published examples of analogous models for the spatial analysis of

data taken over four dimensions, namely space, height or depth, and time. When taking account of the

autocorrelation of data within and between dimensions, the notion of closeness often differs for each

of the dimensions. In this paper, we consider a number of approaches to the analysis of such a dataset

arising from an agricultural experiment which explores the impact of different cropping regimes on soil

moisture. The proposed models vary in their representation of the spatial correlation in the data, its

assumed temporal pattern and the choice of conditional autoregressive (CAR) and other priors. The

sensitivity of random walk of order 1 models to priors and their effect on fit and hence the deviance

information criterion (DIC) is also discussed.

In terms of the substantive question, we find that response cropping is generally more effective than

long fallow cropping in reducing soil moisture at the depths considered (100 cm to 220 cm). We also find

that there may be a problem with random walks of order one, in that they are extremely sensitive to the

priors, and it is unclear how to choose priors to give a meaningful fit.

7.2 Introduction

Where observations are collected from a series of sites, at a series of time points, observations taken

close to each other in either time or space may be autocorrelated. Highly positively correlated obser-

vations reduce the number of effective observations, and testing which fails to take this autocorrelation

into account will often report erroneous significant relationships. Hence methods have been developed

for both spatial and temporal models to account for autocorrelation. In many applications, the spatial

autocorrelations are the focus of interest, but here, we wish only to account for them.

Spatio-temporal data are often analysed using models where spatial and temporal autocorrelation

effects are separable, and with an assumption of no structure in the time by space error interaction term

(Section 7.6). This is particularly common for spatio-temporal epidemiological analyses.



We consider a dataset where spatial autocorrelation effects are not constant over time, space, nor over

the fourth dimension, depth. The data are agricultural data from a lattice plot arrangement with differing

experimental treatments by plot. The spatial dimension, depth, reflects the experimental treatment for the

plot. The dataset is described in more detail in Section 7.3.

In accounting for spatial correlations we use the convolution conditional autoregressive (CAR) prior

model of Besag et al. [1991], but with the proper CAR prior of Gelfand and Vounatsou [2003], rather

than geostatistical modelling. There are two reasons for doing this. Firstly kriging models are slow to

converge in a Bayesian setting when datasets are large [Higdon, 1998], whereas neighbourhood models

which define a sparse precision matrix are relatively fast, both because of their sparseness and because

they require no repeated matrix inversions. Secondly, Besag and Mondal [2005] and Lindgren et al.

[2010] show an equivalence between the two types of model. This is supported by Hrafnkelsson and

Cressie [2003] and Rue and Tjelmeland [2002], who calibrate CAR models to kriging models.

Because of the complexity of the model and the size of the dataset (90,720 observations), the primary

method of analysis was to fit the full model as a series of daily models, using a block updating Gibbs

sampler and saving the Markov chain Monte Carlo iterates after burnin to allow estimation of the contrast

of interest for each depth and day and their credible intervals. Additionally, the contrast estimates are

modelled using time series methods, to gain insight into the time-varying process.

When many models are fitted to data, a simple comparison method is needed prior to any detailed

model assessment. We used the DIC of Spiegelhalter et al. [2002].

The purpose of the paper was to account for both the spatial and temporal autocorrelations in the

four dimensions of these data. It seemed unlikely for these data that an additive autocorrelation model

in time and space would be appropriate, and ignoring the time dimension, the question of dealing with

autocorrelations within the three-dimensional space also needed to be resolved. Finally, we wished to

form a contrast from the estimates and to describe it over time. Thus, the final objectives of the data

analysis were threefold: firstly, to estimate the contrast together with 95% credible intervals across time;

secondly, to understand the time-varying nature of the contrasts; and thirdly, to find appropriate credible

intervals for the contrasts when considered as time series.

In section 7.3 we discuss the data. In section 7.4 we outline the analysis methods and models.

Section 7.5 outlines the results. Section 7.6 provides a discussion of both methods and results.


7.3 Data

The four dimensional data consist of moisture observations taken at 108 surface treatment sites and 15 soil

depths from 20 cm to 300 cm, for 56 different dates spaced roughly equally over a period of almost five years. The

108 measurement sites are arranged as 6 rows with 18 columns per row. Hence, data at each time point

consist of 1620 measurements at 108 sites over 15 depths, while the entire dataset consists of 90,720

observations. The data were collected to determine a cropping system which would minimise water

leakage, and consisted essentially of three cropping systems, running in different phases, and giving rise

to 9 treatments. The first moisture measurements were made on June 26, 1995, the last on April 27, 2000.

Further details of the trial may be found in Ringrose-Voase et al. [2003].
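The layout arithmetic above can be checked with a short sketch. It enumerates the 6 × 18 lattice of sites and, anticipating the neighbourhood models used later, lists first-order neighbours within a layer. Treating "first order" as sites sharing an edge (rook adjacency) is an assumption made here for illustration, and the function name is hypothetical.

```python
def first_order_neighbours(n_rows=6, n_cols=18):
    """Map each site index to the sites sharing an edge with it (assumed rook adjacency)."""
    neighbours = {}
    for r in range(n_rows):
        for c in range(n_cols):
            site = r * n_cols + c
            adjacent = []
            if r > 0:
                adjacent.append(site - n_cols)   # site in the row above
            if r < n_rows - 1:
                adjacent.append(site + n_cols)   # site in the row below
            if c > 0:
                adjacent.append(site - 1)        # site to the left
            if c < n_cols - 1:
                adjacent.append(site + 1)        # site to the right
            neighbours[site] = adjacent
    return neighbours

neighbours = first_order_neighbours()
n_i = [len(v) for v in neighbours.values()]      # neighbour counts per site
n_sites = len(neighbours)                        # 6 x 18 sites
n_per_date = n_sites * 15                        # sites x depths
n_total = n_per_date * 56                        # sites x depths x dates
```

Under this adjacency every site has between two and four neighbours, so any precision matrix built from it is very sparse.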

The primary question for crop scientists was whether response cropping gives lower moisture values

both at the intermediate and greater depths, in comparison with long fallow cropping, and whether this

is sustained over different stages of the cropping cycle. This contrast is calculated as the average of

treatments 1, 2 & 3 minus the average of treatments 5 & 6. The units of measurement for the contrast are

log(neutron count ratio), a surrogate measure for moisture. See Ringrose-Voase et al. [2003]. Note that

the higher the log(neutron count ratio), the moister the soil.

The treatments are

• Treatments 1-3: Long fallow wheat/sorghum rotation, where one wheat and one sorghum crop are

grown in three years with an intervening 10-14 month fallow period. The 3 treatments are each of

3 phases of the long fallow 3 strip system.

• Treatment 4: Continuous cropping in winter with wheat and barley grown alternately.

• Treatments 5 and 6: Response cropping, where an appropriate crop (either a winter or a summer

crop) is planted when the depth of moist soil exceeds a predetermined level.

• Treatments 7-9: Perennial pastures. The three treatments are lucerne (a deep rooted perennial

forage legume with high water use potential), lucerne grown with a winter growing perennial

grass, and a mixture of winter and summer growing perennial grasses.

In modelling the contrast evolving over time, log(rainfall+1), linear, quadratic and cubic effects over time, together with interactions of year with sine and cosine terms with periods of a year and a half-year, were considered as possible covariates. (See Section 7.4.)



7.4 Methods

Three methods were used to consider the contrast of interest. The data may be fitted as a single model

or they may be fitted as a two-stage model with the same model fitted for each date, allowing a date by

all model terms interaction model. An attractive possibility is to fit the structured additive models with

penalised spline smoothers of Fahrmeir et al. [2004] and Kneib [2006] to the full dataset using BayesX

software [Belitz et al., 2009a,b]. This methodology allows smoothing of estimates across unequally

spaced covariates and was particularly attractive given the unequal time intervals over which measurements

were made. It also permits the fitting of the CAR models of Besag et al. [1991].

The first method (Method 1) fits the same model separately at each date; taken together, the daily fits give a complete model for all the data. The second method (Method 2) takes the single contrast estimate for each date (and depth) from Method 1 and uses time series methods to give understanding of the time-varying nature of the contrasts. Finally, in Method 3, we used the structured additive models of Brezger and Lang [2006] and Fahrmeir et al. [2004] to fit a single model to the full dataset.

Method 1

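Let $y_{tid}$ be the response variable measured on date $t$, at site $i$ (of $S$ horizontal plot sites), at depth $d$ ($d = 1, \ldots, 15$). Let $j$ be the treatment given at site $i$.

Method 1 fits the same model at each date, $t$, which gives the full model as

\[
\begin{aligned}
y_{tid} &= f_{tj}(d) + \psi_{tid} + \epsilon_{tid}, \qquad \epsilon_{tid} \sim N(0, \sigma^2_{td}), \quad \text{with} \\
f_{tj}(d) &= \alpha_{tjd}, \\
\psi_{tid} \mid \psi_{ti'd},\, i \neq i' &\sim N\left(\rho_t \sum_{i' \in \partial i} \frac{\psi_{ti'd}}{n_i},\; \frac{\tau^2_{td}}{n_i}\right),
\end{aligned}
\tag{7.1}
\]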

where ni is the number of sites adjacent to site i, and i′ ∈ ∂i denotes that site i′ is a neighbour of site

i. Neighbours are defined as neighbours only within the same depth, and are first order neighbours

in all models. Note that this is the proper CAR model of Gelfand and Vounatsou [2003], and that ρt

is common across all depths for a given date, t. The function, ft j(d), is a function of d estimated for

each treatment and date. However, as shown above, a parameter was fitted for each treatment, date and

depth. From the treatment effects, the contrast of long fallowing versus response cropping is calculated as

(e_1 + e_2 + e_3)/3 − (e_5 + e_6)/2, where e_j indicates the estimate for treatment j. Thus, in Method 1, we fit the

saturated model, with a treatment mean for each of the 56 dates, 9 treatments and 15 depths, i.e., 56×9×15 = 7560 treatment

estimates, and find 56×15 = 840 contrast estimates together with their credible intervals.
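The contrast and its credible interval can be obtained from saved MCMC iterates by forming the contrast at every iteration and then summarising. The sketch below assumes the draws for one date and depth are held in an array with one column per treatment (treatments 1 to 9 in columns 0 to 8); the array layout, the function name `contrast_summary` and the simulated draws are hypothetical.

```python
import numpy as np

def contrast_summary(draws):
    """Posterior mean and 95% credible interval of the long fallow minus response contrast.

    draws: array of shape (n_iter, 9), treatment effects at one date and depth.
    """
    long_fallow = draws[:, 0:3].mean(axis=1)   # average of treatments 1, 2 and 3
    response = draws[:, 4:6].mean(axis=1)      # average of treatments 5 and 6
    contrast = long_fallow - response          # one contrast value per iteration
    lower, upper = np.quantile(contrast, [0.025, 0.975])
    return contrast.mean(), lower, upper

# Simulated draws standing in for saved iterates (illustrative values only).
rng = np.random.default_rng(1)
true_means = np.array([0.10, 0.12, 0.11, 0.05, 0.02, 0.04, 0.0, 0.0, 0.0])
draws = true_means + 0.01 * rng.normal(size=(4000, 9))
est, lower, upper = contrast_summary(draws)
n_contrasts = 56 * 15                          # one contrast per date and depth
```

Forming the contrast within each iteration, rather than from the marginal treatment summaries, carries the posterior correlation between treatment effects into the interval.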

Use of this method satisfies the first objective of Section 7.2 of estimating the contrast together with

credible intervals over time. Method 1 provides credible intervals for the contrast across time but provides

no insight into their time-varying nature. Hence, the need for Method 2 (satisfying objectives 2 and 3 of

Section 7.2).

Method 2

Method 2 takes the contrast estimates from Method 1, which correspond to 15 series over the dimension

of time, and fits time series models. We consider the time series for the depths from 100 cm to 220 cm

(7 series in all). The set of depth × time series could be modelled as a multivariate time series. However,

although there is clear evidence of continuity in the contrasts across the depths, it is unclear just how

one might wish to model the depth-varying component of such a multivariate random walk, or of a more

general dynamic linear model. There seemed no obvious simplification. Hence, we chose to model the

time series at each depth, each as a univariate series.

Within the framework of Method 2, we considered time series models, simple regression models and

combinations of these. The models fall into four classes: those using time series methods alone, such as the

random walk models and autoregression models (Equations 7.3-7.5); regression models

with time-varying covariates (Equation 7.2); a combination of an autoregressive model with regression

components using time-varying covariates; and penalised spline smooths over time (Equation 7.6).

The time series models assumed equally spaced observations over time, which was not the case. The

regression models used time-varying covariates, which included log(rainfall+1) and interactions of year

by sine and cosine terms with periods of a year and a half year. Smooths over time such as polyno-

mial smooths or penalised spline smooths typically allow insightful descriptions of the data. With these

data, where we expect seasonality and perhaps trends, the penalised spline fits can suggest explanatory

variables for a simple regression. The regression models and the regression models in combination with

an autoregressive model were an attempt to deal with the assumption of equal time-intervals and its

inadequacies.

Let Yt represent the contrast estimate at time, t. Within the framework of Method 2, the following



models were fitted:

A regression model which assumes errors are not autocorrelated.

\[
Y = (Y_t) \sim N(X\beta, V), \qquad 1/V \sim \mathrm{Gamma}(10^{-6}, 10^{-6}), \tag{7.2}
\]

where X is a design matrix of time-varying covariates, such as log(rainfall+1) and interactions of year

(as a factor) by sine and cosine terms with periods of a year and a half year.
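A design matrix of this kind could be assembled as below. This is a sketch under stated assumptions: time is measured in days, the annual period is taken as 365.25 days, an intercept column is included, and the function name and the small example arrays are hypothetical rather than taken from the trial data.

```python
import numpy as np

def design_matrix(day, rainfall, year):
    """Intercept, log(rainfall + 1), and year-by-harmonic interaction columns."""
    day = np.asarray(day, dtype=float)
    rainfall = np.asarray(rainfall, dtype=float)
    year = np.asarray(year)
    columns = [np.ones_like(day), np.log(rainfall + 1.0)]
    for level in np.unique(year):                 # year treated as a factor
        in_year = (year == level).astype(float)
        for period in (365.25, 365.25 / 2.0):     # annual and half-yearly cycles
            angle = 2.0 * np.pi * day / period
            columns.append(in_year * np.sin(angle))
            columns.append(in_year * np.cos(angle))
    return np.column_stack(columns)

# Five illustrative measurement dates spanning two calendar years.
day = np.array([0.0, 30.0, 200.0, 400.0, 500.0])
rainfall = np.array([0.0, 12.5, 3.0, 40.0, 0.0])
year = np.array([1995, 1995, 1996, 1996, 1996])
X = design_matrix(day, rainfall, year)
```

Each year contributes four columns (sine and cosine at two periods), which are zero for observations outside that year.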

The local level state-space model (random walk) of e.g., Commandeur and Koopman [2007], Harvey

[1989], West and Harrison [1997] was fitted.

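\[
\begin{aligned}
Y_t &= \mu_t + \nu_t, \qquad \nu_t \sim N(0, V), \\
\mu_t &= \mu_{t-1} + \omega_t, \qquad \omega_t \sim N(0, W), \\
1/V &\sim \mathrm{Gamma}(10^{-6}, 10^{-6}), \qquad 1/W \sim \mathrm{Gamma}(10^{-6}, 10^{-6}).
\end{aligned}
\tag{7.3}
\]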

In a further version of this model, t-distributions with 10 and 4 degrees of freedom were substituted for the

normal distributions for the observation and state errors.
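For intuition about the local level model with normal errors, the standard Kalman filter recursions give the filtered level and one-step-ahead predictions when V and W are treated as known. This is a simplification for illustration only: in the analysis described here the variances carry Gamma priors on their precisions and are estimated by MCMC. The function name, the diffuse initial variance `C0` and the example values are assumptions.

```python
import numpy as np

def local_level_filter(y, V, W, m0=0.0, C0=1e6):
    """Kalman filter for Y_t = mu_t + nu_t, mu_t = mu_{t-1} + omega_t, with V and W known."""
    m, C = m0, C0                       # mean and variance of the current level
    pred = np.empty(len(y))
    filt = np.empty(len(y))
    for t, y_t in enumerate(y):
        R = C + W                       # variance of mu_t given y_1, ..., y_{t-1}
        pred[t] = m                     # one-step-ahead prediction of y_t
        K = R / (R + V)                 # Kalman gain
        m = m + K * (y_t - m)           # filtered level after observing y_t
        C = (1.0 - K) * R               # filtered variance
        filt[t] = m
    return pred, filt

# A constant series: the filtered level should settle on the constant.
y = np.full(20, 0.3)
pred, filt = local_level_filter(y, V=0.01, W=0.001)
```

The ratio W/V controls how quickly the level follows the data, which is one way to see why the fit of a random walk model depends strongly on what is assumed about the two variances.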

An alternative formulation of the random walk model which uses CAR neighbourhood models was

also used. This formulation permits the neighbours to be weighted, and thus allows a correction for the

unequal time intervals. Hence, weighted random walk models of order 1 (RW1) and order 2 (RW2) were

fitted using WinBUGS [Lunn et al., 2000].

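\[
\begin{aligned}
Y_t &= \mu_t + \omega_t + \psi_t, \qquad \omega_t \sim N(0, V), \\
\psi_t \mid \psi_{t'},\, t \neq t' &\sim N\left(\frac{\sum_{t' \in \partial t} w_{t'} \psi_{t'}}{w_+},\; \frac{W}{w_+}\right), \quad \text{where} \\
w_{t'} &= 1/|t - t'|, \quad \text{and} \quad w_+ = \sum_{t' \in \partial t} w_{t'},
\end{aligned}
\tag{7.4}
\]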

with V , W defined as in Equation 7.3. The weight used is the reciprocal of the distance between neigh-

bours over the time scale.
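The effect of the weights can be seen by computing the conditional mean and variance of one time point given its neighbours. The sketch below assumes a first-order neighbourhood (the immediately preceding and following observation dates) and takes W as a known variance; the function name and the example times are hypothetical.

```python
import numpy as np

def weighted_rw1_conditional(psi, times, t_index, W):
    """Conditional mean and variance of psi[t_index] given its neighbours in time."""
    neighbours = [k for k in (t_index - 1, t_index + 1) if 0 <= k < len(times)]
    # Weight each neighbour by the reciprocal of its distance in time.
    w = np.array([1.0 / abs(times[t_index] - times[k]) for k in neighbours])
    w_plus = w.sum()
    mean = np.dot(w, psi[neighbours]) / w_plus
    variance = W / w_plus
    return mean, variance

# Three dates with gaps of 10 and 30 days: the nearer neighbour dominates.
times = np.array([0.0, 10.0, 40.0])
psi = np.array([1.0, 0.0, 5.0])
mean, variance = weighted_rw1_conditional(psi, times, 1, W=2.0)
```

Here the neighbour 10 days away receives three times the weight of the neighbour 30 days away, and the conditional variance grows as the gaps to the neighbours widen.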

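Autoregressive models [Box and Jenkins, 1976, Gamerman and Lopes, 2006] were also fitted, with

\[
\begin{aligned}
Y_t &\sim N(\mu_t, V), \qquad 1/V \sim \mathrm{Gamma}(10^{-6}, 10^{-6}), \\
\mu_t &= \alpha_0 + \alpha_1 Y_{t-1}, \quad \text{or,} \\
\mu_t &= \alpha_0 + \alpha_1 Y_{t-1} + X\beta, \quad \text{or,} \\
\mu_t &= \alpha_0 + \alpha_1 Y_{t-1} + \alpha_2 Y_{t-2}, \quad \text{or,} \\
\mu_t &= \alpha_0 + \alpha_1 Y_{t-1} + \alpha_{12} Y_{t-12}.
\end{aligned}
\tag{7.5}
\]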

Note that the simple regression models using time-varying covariates (Equation 7.2), and the autoregres-

sive models with time-varying covariates (Equation 7.5) were fitted in an attempt to remedy the problem

of a non-equally spaced time series. The assumption of equally spaced contrasts across time in the mod-

els of Equations 7.5 and 7.3 motivated the weighted random walk models of Equation 7.4. However, an

alternative was to fit missing data models. This was done for the random walk of order 1 model (Equa-

tion 7.3) only. For these data, the highest common factor of the time intervals was 1, which gives a time

series of largely missing data (56 observed of 1768 observations or 3.2% non-missing observations). The

missing data model was fitted for one depth only (140 cm), since the credible intervals for random walk

models were arbitrarily dependent on the priors for the precisions. (See Section 7.4 for the priors used

for the precisions and Section 7.6 for further discussion.)
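The size of the daily grid implied by the missing data model follows directly from the first and last measurement dates given in Section 7.3. The short check below uses only those two dates and the count of 56 observed dates; the variable names are illustrative.

```python
from datetime import date

first = date(1995, 6, 26)                 # first moisture measurements
last = date(2000, 4, 27)                  # last moisture measurements
n_days = (last - first).days + 1          # daily grid, both end dates included
n_observed = 56                           # dates on which measurements were made
pct_observed = 100.0 * n_observed / n_days
```

On a daily grid, all but 56 of the time points must therefore be imputed by the model.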

An additional way of dealing with these time series contrasts, and one which did not require the equal

time interval assumption, was to fit generalised additive models using penalised spline smooths over time

[Brezger and Lang, 2006, Fahrmeir et al., 2004]. Let the contrast, Yt, at date, t, and depth, d, be defined

as

\[
Y_t = f(t) + \epsilon_t, \qquad \epsilon_t \sim N(0, \sigma^2), \tag{7.6}
\]

for each depth, d, with f (t) being fitted as a penalised spline over time with a random walk penalty of

order 2. These models, like the regression model of Equation 7.2, do not account for autocorrelation over

the time dimension, but they use the unequally spaced dates of the contrasts.

The time series models of Method 2 capture only the time-varying variance of the contrast and fail

to reflect the spatial error. Hence, we experimented with precisions for the random walk model (see

Section 7.4), which might reflect the full error which includes the within date variability of the contrast,

in an attempt to satisfy objective 3 of Section 7.2.



Method 3

Finally, dealing with the full dataset, we fitted two additive structured models using the full 90,720

observations. The first model is defined as

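\[
\begin{aligned}
y_{tid} &= \sum_j f_j(d) + \sum_j f_j(t) + \psi_{id} + \epsilon_{tid}, \qquad \epsilon_{tid} \sim N(0, \sigma^2), \\
\psi_{id} \mid \psi_{i'd},\, i \neq i' &\sim N\left(\sum_{i' \in \partial i} \frac{\psi_{i'd}}{n_i},\; \frac{\tau^2_d}{n_i}\right),
\end{aligned}
\tag{7.7}
\]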

with ψid being drawn from a Gaussian Markov random field (GMRF) in a layered scheme as above, but

with the spatial components at a site (i) being common across dates, and the associated variances being

given by τ2d, i.e. specific to each depth and constant across dates. The term f j(d) denotes smoothed

treatment curves over depth, d, while f j(t) denotes smoothed treatment curves over time, t. Thus, in

the ‘fixed’ (non-parametric) part of the model, we have assumed simple additivity over time and depth,

while in modelling the variances, the model assumes constant spatial residuals at each depth (ψid)

from sampling date to sampling date. As can be seen from the definition, smoothed treatment profiles

across time were common to all depths, and smoothed treatment profiles across depths were common to

each date.

The second model is analogous to that of Method 1 and is defined by

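\[
\begin{aligned}
y_{tid} &= \sum_t \sum_j f_{j(t)}(d) + \psi_{tid} + \epsilon_{tid}, \qquad \epsilon_{tid} \sim N(0, \sigma^2_t), \\
\psi_{tid} \mid \psi_{ti'd},\, i \neq i' &\sim N\left(\sum_{i' \in \partial i} \frac{\psi_{ti'd}}{n_i},\; \frac{\tau^2_{td}}{n_i}\right).
\end{aligned}
\tag{7.8}
\]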

Thus, this model fits a penalised curve for each treatment over depth, for each timepoint, and models

site correlations using CAR models with different variances at each depth and day, together with a final

unstructured residual whose variance differs by day. Thus, the model from Equation 7.8 is again a consid-

erable simplification of the model of Method 1. The CAR residual structure is the same, but is coupled

with an unstructured variance common to each day, whereas the unstructured variance components of

Method 1 (Equation 7.1) differ by date and depth. Probably more importantly, it fits a series of penalised

smooths across the depth dimension (by date and by treatment), while there is no smoothing along depth

in the saturated model of Method 1.


Priors

Priors for the Method 1 precisions, for both the structured and unstructured residual components,

were Gamma(5, .005), with priors for the fixed coefficients being normal with mean zero and variance 10.

For the Method 2 models, which fitted the contrast across the time dimension, the priors for the

coefficients were generally specified as a diffuse normal prior N(0, 10^6). Priors for the precision terms

of these models were initially set as Gamma(10^-6, 10^-6). However, almost all the models of Method 2

were rerun with priors for the precisions of Gamma(10^-4, 10^-4), and final model choice was made using

models with this prior.

However, given that we wanted a meaningful temporal description of the contrast together with

appropriate credible intervals, we experimented with various ways of apportioning crude estimates of the

total error observed in the model from Method 1. Table 7.1 gives the settings for the 5 different priors

used for the various models of Method 2, and Priors 3-5 of this table show three schemes for apportioning

the error.
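Table 7.1 states that the Gamma(a, b) parameters for the total precision were obtained by the method of moments from the posterior mean and 95% credible interval of Method 1. A minimal sketch of such a calculation follows. It assumes a shape and rate parameterisation (mean a/b, variance a/b^2) and, as a further assumption not stated in the text, approximates the standard deviation from the interval width using a normal approximation. Both function names are hypothetical.

```python
import math


def sd_from_interval(lower, upper, z=1.96):
    """Approximate a standard deviation from a 95% interval (normal approximation)."""
    return (upper - lower) / (2 * z)


def gamma_shape_rate(mean, sd):
    """Method-of-moments Gamma(shape, rate) with the given mean and sd.

    Uses mean = shape / rate and variance = shape / rate**2.
    """
    shape = (mean / sd) ** 2
    rate = mean / sd ** 2
    return shape, rate


# Illustrative only: the sd is back-calculated so that the shape is 6.934.
shape, rate = gamma_shape_rate(1395.0, 1395.0 / math.sqrt(6.934))
```

The tabulated values are consistent with this parameterisation in that a/b is close to the mean precision at each depth (for example, 6.934/.004971 is approximately 1395 at 100 cm).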

Priors for the Method 3 models were set as the default BayesX software priors, with all precisions

having a Gamma(.001,.001) prior.

Model Comparisons

We adopted the Deviance Information Criterion (DIC) of Spiegelhalter et al. [2002] as the method for

model comparison. Thus, we planned to compare the full models of Method 1 and Method 3 via the

DIC, in addition to choosing a model from the many models of Method 2. Within the Method 2 models,

only the models fitted using WinBUGS were compared.
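For reference, the DIC and the estimated number of parameters (pD) of Spiegelhalter et al. [2002] can be computed from MCMC output as in the minimal sketch below: pD is the posterior mean deviance minus the deviance evaluated at the posterior mean of the parameters, and the DIC is the posterior mean deviance plus pD. The function name and the numbers are illustrative assumptions, not output from any model in this chapter.

```python
def dic_summary(deviance_samples, deviance_at_posterior_mean):
    """Return (pD, DIC) from sampled deviances and the plug-in deviance."""
    mean_deviance = sum(deviance_samples) / len(deviance_samples)
    p_d = mean_deviance - deviance_at_posterior_mean  # effective number of parameters
    dic = mean_deviance + p_d                         # equivalently plug-in deviance + 2 * pD
    return p_d, dic


# Toy deviances, for illustration only.
p_d, dic = dic_summary([10.0, 12.0, 14.0], 9.0)
```

Nothing in this definition prevents pD from being negative, which happens whenever the plug-in deviance exceeds the posterior mean deviance.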

With the problems observed when comparing random walk models of order one (Section 7.5), we

looked at the root mean square of predictive error to try and resolve the problem. This is defined as

\[
\sqrt{\left( y_{t+1} - E(y_{t+1} \mid y_1, y_2, \ldots, y_t, \theta) \right)^2},
\]

and is used in Table 7.7.
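A minimal sketch of this criterion is given below. It assumes that the squared one-step-ahead errors are averaged over the time points before the square root is taken, and that the one-step-ahead predictive means have already been computed elsewhere. The function name and the inputs are hypothetical.

```python
import math


def rmspe(observed, predicted):
    """Root mean square of one-step-ahead predictive errors.

    observed[k] is y_{t+1} and predicted[k] is E(y_{t+1} | y_1, ..., y_t, theta).
    """
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(y - p) ** 2 for y, p in zip(observed, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Toy values, for illustration only.
error = rmspe([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])
```

Evaluating the function once per MCMC iterate would give a distribution of values that could be summarised by a median and quartiles.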



Computational details

The model of Method 1 which produced the contrast estimates used in Method 2 was fitted using custom-built

software, pyMCMC [Strickland, 2010], which used block updating. Its daily models had a

6,000 iterate burnin and 16,000 iterates in all. (Fewer burnin iterates were needed because of the block

updating.)

The models of Method 2 were fitted using BayesX [Belitz et al., 2009a,b] or WinBUGS [Lunn et al.,

2000]. The BayesX software was used because it offered penalised smooths over time, and because actual

dates could be used in the fit. It was thought that such models would offer insight into the seasonality

and/or trends in the data. WinBUGS was used because of its transparency and its robustness as

well-established software.

When fitted using WinBUGS, the univariate time series models of Method 2 were run with 2 chains and 120,000 iterates with a

100,000 iterate burnin, and Gelman-Rubin statistics were checked. (This burnin was

unnecessarily large.) Models fitted using BayesX, Equation 7.6, were run with a 10,000 iterate burnin

and 60,000 iterates in all (which was probably unnecessary, given that this software uses block updating

[Brezger and Lang, 2006]).
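The Gelman-Rubin check mentioned above compares between-chain and within-chain variability. The sketch below computes the basic potential scale reduction factor for a single scalar parameter from equal-length chains. It is a simplified version (no chain splitting and no degrees-of-freedom correction) and is not the WinBUGS implementation; the function name is hypothetical.

```python
import math
import statistics


def gelman_rubin(chains):
    """Basic potential scale reduction factor for one scalar parameter."""
    m = len(chains)
    n = len(chains[0])
    if m < 2 or n < 2 or any(len(c) != n for c in chains):
        raise ValueError("need at least two chains of equal length >= 2")
    chain_means = [statistics.fmean(c) for c in chains]
    within = statistics.fmean([statistics.variance(c) for c in chains])
    between = n * statistics.variance(chain_means)
    pooled = (n - 1) / n * within + between / n  # pooled posterior variance estimate
    return math.sqrt(pooled / within)


# Two toy chains that disagree strongly about the location of the parameter.
r_hat = gelman_rubin([[0.0, 1.0, 0.0, 1.0], [10.0, 11.0, 10.0, 11.0]])
```

Values close to 1 are consistent with the chains sampling the same distribution, while values well above 1, as in the toy example, indicate that they have not mixed.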

Geweke diagnostics [Geweke, 1992] for convergence and Raftery-Lewis estimates for accuracy

[Raftery and Lewis, 1992] were checked and found to be satisfactory for all models, except where noted

otherwise.

7.5 Results

Estimates for the contrasts at all depths and their 95% credible intervals from Method 1 are given in the

supplementary materials of Appendix C, as are graphs of the fits for the time series at all 7 depths. The Method

1 estimates and credible intervals are those which best satisfy the first objective of the analysis.
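A contrast estimate and its credible interval can be obtained directly from MCMC output by forming the contrast at each iterate and then summarising the resulting draws. The sketch below illustrates this under the assumption that draws of the two treatment means at a given depth and date are available as equal-length lists. The names and numbers are hypothetical and the quantile rule is simple linear interpolation.

```python
def quantile(sorted_values, q):
    """Linear-interpolation quantile of an already sorted list, 0 <= q <= 1."""
    position = q * (len(sorted_values) - 1)
    lower = int(position)
    upper = min(lower + 1, len(sorted_values) - 1)
    fraction = position - lower
    return sorted_values[lower] * (1 - fraction) + sorted_values[upper] * fraction


def contrast_summary(treatment_a, treatment_b, level=0.95):
    """Posterior mean and equal-tailed credible interval of (a - b)."""
    if len(treatment_a) != len(treatment_b) or not treatment_a:
        raise ValueError("inputs must be non-empty and of equal length")
    # Difference taken iterate by iterate, so posterior correlation is respected.
    contrast = sorted(a - b for a, b in zip(treatment_a, treatment_b))
    tail = (1 - level) / 2
    mean = sum(contrast) / len(contrast)
    return mean, quantile(contrast, tail), quantile(contrast, 1 - tail)


# Toy draws, for illustration only.
mean, lower, upper = contrast_summary(
    [0.30, 0.32, 0.31, 0.35, 0.33],
    [0.28, 0.29, 0.30, 0.30, 0.31],
)
```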

Figure 7.1 shows the point estimates from Method 1 for the contrasts at depths 100 cm to 220 cm.

A careful reading of this graph shows a continuity of the contrast estimates across time and depth. The

same data are graphed again as a contour graph of moisture over day and depth (Figure 7.2) in order to

emphasize the continuity of the contrast estimates across time and depth.

Various fits for the contrasts at depth 100 cm are shown in Figures 7.3- 7.6. Figure 7.3 shows the fit

for Method 1, and when compared to the three fits of Figures 7.4- 7.6, can be seen to have much wider


credible intervals. This fit is thought to give the most appropriate credible intervals, for the reasons given

below. Figure 7.4 shows the penalised smooth from Method 2 Equation 7.6, and illustrates the seasonality

observed in these models at the shallower depths. Figure 7.5 shows the 28 term regression fit from

Equation 7.2 of Method 2, and echoes the penalised spline fit of Figure 7.4, but with the discontinuities

expected in a model with interactions of year by periodics. Figure 7.6 shows the random walk of order 1

model of Method 2, and with its more abrupt jumps the seasonality displayed in the two earlier graphs is

less obvious.

Figures 7.7 and 7.8 show the square roots of the spatial and the unstructured variances for each

date at 100 cm and 220 cm, estimated using the model of Method 1. Not unexpectedly, the variances

at the shallower depths show greater variability across the sampling dates (Figure 7.7). The comparable

graphs across all depths show a decreasing variability with increasing depth of these parameters across

the sampling dates (Figure 7.8). The variability in these parameters justifies the choice to fit the same

model across all sampling dates (thereby allowing all parameters of the original model to vary by date),

since a description of their evolution across time was not obvious a-priori.

Comparisons (Table 7.2) of the time series models of Method 2 are given for the contrast at depth 140

cm and were used to consider various models and ways of dealing with the unequal time spacing. Some

of the models compared are shown in Figures 7.9- 7.13. These figures show the poorer fits of the models

with the poorer DICs. This table indicates that the AR(1) and AR(2) models are essentially equivalent,

that the AR(1)(12) model is a poor model (with its negative estimate for the number of parameters), and

that the better models are the random walk models. The table indicates that the RW(1) or the RW(2)

distance weighted models are the best of those models compared. Table 7.2 shows the DIC and pD

varying for differing priors for the random walk models, but not for the other models fitted under Method

2. It indicates that the RW1 models give overfitted models, with the estimated number of parameters

exceeding the number of points fitted when the more diffuse priors of Prior 1 are used. Thus, the decision

was made to use models fitted under Prior 2 as the basis of model choice. Ideally, neither the DIC nor

the estimated number of parameters (pD) should depend on the specification of the priors. This issue is

discussed further in Section 7.6.2.

Table 7.3 compares the models fitted using Equations 7.2 and 7.5. This table shows that additional

periodic covariates improve the fit of the AR1 model at depths of 100 & 120 cm, but for all other depths

the simple AR1 model accounts adequately for the data without the need for rainfall or periodics. This is



not surprising, since the time series models take out the random shocks that might be explained by such

terms. Note that the model AR1+5, a covariate model in combination with an AR1 model, posits (somewhat

unrealistically) the same amplitudes across the years for the cyclical behaviour, but (realistically)

posits a common time of year for the yearly maxima and minima. This compares with the year by period

interaction model which permits different amplitudes for the periodics for each year and different times of

year for the maxima and minima. This table also gives the DICs for the regression model with interaction

terms of year by periodics (24 terms), together with a cubic over time, giving 27 time covariates. Not

surprisingly, with 28 parameters to fit 56 observations, these models from Table 7.3 show the regression

model doing better than the AR1 models for all depths (except 200 & 220 cm), and they also compare

reasonably well with those of Table 7.4, although they are not as good as the best of the random walk

models.

Table 7.4 shows the DICs & pDs for the random walk models of order 1 & 2, both weighted and

unweighted. The weighted RW1 model appears to be the best model for the depths from 100-160 cm,

while the unweighted RW2 model appears best for 200-220 cm. The downweighting of points further

away in the weighted models generally leads to a greater estimated number of parameters and a better fit

at the shallower depths. Similarly, the weighted RW2 models generally improve the DIC and increase

the estimated number of parameters by downweighting points further away. That is, the weighted models

decrease the smooths of the unweighted models. Ideally, we would have preferred a common time series

description for the contrast at all depths. However, the final time series model chosen for the shallower

depths is the RW1 model (from 100-160 cm), and for the remaining depths the RW2 model.
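To make the effect of the weighting concrete, the sketch below gives the conditional mean of an interior state of a first order random walk when neighbours are weighted by the inverse of the time interval separating them. This is an illustration under the assumption that the weights enter as precisions of the increments; the exact form used in the thesis is set by Equations 7.3 and 7.4, which are not reproduced here. The function name and inputs are hypothetical.

```python
def rw1_interior_mean(times, states, i):
    """Conditional mean of states[i] given its two neighbours in time.

    Each neighbour is weighted by the inverse of its time distance from
    point i, so that a closer neighbour has more influence.
    """
    if not 0 < i < len(times) - 1:
        raise ValueError("i must index an interior time point")
    w_before = 1.0 / (times[i] - times[i - 1])
    w_after = 1.0 / (times[i + 1] - times[i])
    weighted_sum = w_before * states[i - 1] + w_after * states[i + 1]
    return weighted_sum / (w_before + w_after)


# Unequal spacing: the earlier neighbour is one unit away, the later one two.
value = rw1_interior_mean([0.0, 1.0, 3.0], [0.0, 99.0, 3.0], 1)
```

With equal spacing the function reduces to the plain average of the two neighbours, which is the unweighted RW1 case.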

Random walk models allow the calculation of the ratio (W/V) between the two types of variance in

the model (Equations 7.3 and 7.4) which is the signal to noise ratio [West and Harrison, 1997]. The signal

to noise ratio for the different depths is tabulated in Table 7.5 and shows a clear gradient, with

the shallow depths having higher signal to noise ratios than the deeper depths, with perhaps three

different depth strata (100 cm, 120-180 cm, and 200-220 cm).
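Since the models are parameterised in terms of precisions, the square root of the signal to noise ratio can be computed at each MCMC iterate and then summarised. The sketch below assumes the convention of West and Harrison [1997], in which W is the variance of the random walk (system) error and V the variance of the observational error, so that W/V equals the observational precision divided by the random walk precision. The function name and the draws are hypothetical.

```python
import math
import statistics


def sqrt_signal_to_noise(obs_precisions, rw_precisions):
    """Per-iterate sqrt(W / V), given precisions tau_V = 1/V and tau_W = 1/W."""
    if len(obs_precisions) != len(rw_precisions):
        raise ValueError("inputs must be of equal length")
    return [math.sqrt(tau_v / tau_w)
            for tau_v, tau_w in zip(obs_precisions, rw_precisions)]


# Toy precision draws, for illustration only.
draws = sqrt_signal_to_noise([400.0, 900.0, 1600.0], [100.0, 100.0, 100.0])
median_ratio = statistics.median(draws)
```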

The penalised spline fits from Equation 7.6, Figure 7.14, show seasonal peaks and troughs which

vary in amplitude across the years, thereby suggesting interactions of year by sine and cosine terms with

periods of a year and half year. These curves also show the periodic behaviour dampening with increasing

depth. We found significant terms in the simple regression model with rainfall and interactions of year by

half-yearly periodics, but such models showed the expected problems of sine curves being too smooth at

their peaks and troughs, with disjuncts at the year breaks, thus giving rise to serially correlated errors.

We included these models because they gave a basis for comparison with the WinBUGS models where

equality of time intervals was assumed, and allowed the possibility of a correction when used in combination

with the AR models. Figure 7.4 gives the smoothed model of Equation 7.6 and credible intervals for

the contrast at 100 cm. Figure 7.6 shows the RW1 fit which was the final model choice.
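The periodic covariates discussed above (sine and cosine terms with periods of a year and a half year, interacted with year) can be sketched as follows. The scaling of the date, the year length of 365.25 days and the function names are assumptions made for illustration. With six yearly blocks of four terms, the interaction block has the 24 columns mentioned for the regression model.

```python
import math


def harmonic_terms(day_of_year, days_per_year=365.25):
    """Yearly and half-yearly sine and cosine terms for one sampling date."""
    x = 2 * math.pi * day_of_year / days_per_year
    return [math.sin(x), math.cos(x), math.sin(2 * x), math.cos(2 * x)]


def year_by_harmonic_row(year_index, n_years, day_of_year):
    """One design-matrix row for the year-by-periodic interaction.

    Only the block of four columns belonging to the observation's year is
    non-zero, so amplitudes and phases may differ from year to year.
    """
    row = [0.0] * (4 * n_years)
    row[4 * year_index:4 * year_index + 4] = harmonic_terms(day_of_year)
    return row


row = year_by_harmonic_row(year_index=2, n_years=6, day_of_year=100)
```

Because each year has its own coefficients, the fitted curve need not join up at the year boundaries, which matches the discontinuities noted for the regression fit.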

The missing data model, motivated by objective 3 (Section 7.2), was based on Equation 7.3 since

this was found to be close to the best model of those fitted. This was fitted because the approximation

of equally spaced time intervals in the WinBUGS models seemed a gross oversimplification. Figure 7.13

shows one outcome of the attempt to construct credible intervals which would reflect the spatial variability

by adjusting the priors for the two variances of an RW1 model. This fit uses Prior 5. Given that

Table 7.6 shows the essentially arbitrary nature of such an undertaking (see below), there was little point

in fitting such models to the contrasts at other depths.

A further outcome of attempting to partition the variances of an RW1 model was the set of comparisons

of Table 7.6 which shows DIC results for three priors, together with R2 (calculated using the fit

without the spatial and the unstructured error), and pD (the estimated number of parameters). Table 7.6

shows that the choice of priors dictates the goodness of fit: thus, fits using prior 3 have an R2 ranging from

12% to 33%, while prior 4 gives fits with an R2 of about 80%, and prior 5 an R2 of 99%. In constructing

the priors 3-5 for use in Equation 7.3, mean τ is an estimate for the total precision estimated from

Method 1. Placing a fixed prior on the precision for the random walk error (Prior 3) resulted in poor fits.

Attempting to partition the precision estimate between the observational and random walk errors (Prior

4) gave posterior estimates of the ratio, r, which were essentially identical to its prior, and a slightly better

fit. Allocating the diffuse prior to the observational error (Prior 5) gave entirely unsatisfactory overfits to

the data with estimates of the number of parameters exceeding the number of observations. Table 7.2 and

Table 7.6 both showed that the model comparison criterion was dependent on the priors for the random

walk models.

We then considered the root mean squared predictive error (Section 7.4) and compared the RW1

models under Prior 1 & Prior 2. (See Table 7.7.) Under this criterion, Prior 2 would seem to give the

better model at depths 100 cm - 180 cm, and Prior 1 the better model at depths 200 cm & 220 cm. This

does not resolve the problem, in that the overfitted models are chosen at depths 200 cm & 220 cm, but in

any case, we are still left with the problem of arbitrary fit and the choice of suitable priors. Hence, we



were unable to satisfy objective 3, since any choice was arbitrarily dependent on the priors chosen for the

precisions.

Tests for convergence for the first model of Method 3 (Equation 7.7) showed failure to converge

using Geweke’s test [Geweke, 1992]. This model is a major simplification of the model, Equation 7.1,

from Method 1, and it is not surprising that it failed to converge. See Section 7.4. The more complex

model from Method 3 (Equation 7.8) also failed to converge. We decided not to pursue the modelling

strategy of Method 3. Reformulation to remove the simple additivity may have helped the convergence

problems, but more importantly, it was felt that smoothing the treatment effects prior to calculating the

contrast could lead to biased contrast estimates, i.e., that it was not the treatment curves which should be

smoothed but the contrast curve. For further discussion of this issue, see Section 7.6.

All models from Methods 1 & 2 showed successful convergence using Geweke statistics [Geweke,

1992] and Raftery-Lewis [Raftery and Lewis, 1992] for the quantities of interest. For the random walk

models, which were felt to best model the contrast estimates, we assessed the residuals (residuals from

lack of fit, observational error and system error) for serial correlation. No serial correlation in the residuals

was found.

Figures C.1-C.21 in the supplementary materials of Appendix C show the fits for all time series from

Method 1, together with the penalised smooths of Method 2 Equation 7.6 which show periodicity at the

shallower depths which decreases with increasing depth. The RW final model fits are also shown.

Overall, we concluded that long fallow cropping generally led to moister soils over the experiment,

with both point estimates and 95% credible intervals generally being positive. We also found a problem

with the random walk models of order 1, with estimates and measures of fit being extremely sensitive to

the choice of priors for the variances of the observational and random walk error components.

7.6 Discussion

7.6.1 Modelling spatio-temporal data

Many spatio-temporal papers remain largely descriptive [Bell et al., 2007, Teschke et al., 2001] using

maps at several timepoints, with the maps essentially being descriptive devices. The more complex

models are generally Bayesian and use either geostatistical methods or the convolution CAR prior of

Besag et al. [1991]. Models with CAR priors typically partition the error term in the model, ϵit as ei, et


and eit, where the first two error terms capture the structured spatial random effects and the structured

temporal random effects, and eit is a simple unstructured random effect with eit ∼ N(0, σ2). See, e.g.,

Adebayo and Fahrmeir [2005], Crook et al. [2003], Knorr-Held and Besag [1998], Poncet et al. [2010],

Waller et al. [1997], where the last three papers use BayesX software [Belitz et al., 2009a,b] to conduct

the analysis.

Assuncao et al. [2001] fit quadratics over time which differ for each spatial location, and for which the

coefficients are smoothed using CAR priors, but both space and time are separable. This elegant solution

to modelling very short time sequence data allows the possibility of seeing increasing and decreasing

rates, while accounting for spatial closeness. Assuncao [2003], Assuncao et al. [2002] again use space-

varying regression coefficients on their quadratic models in time. Yan and Clayton [2006] use the space-

time interaction to define a set of space-time separable clusters carrying a specific risk, and fit a final

unstructured random effect.

Abellan et al. [2008] decompose the error term into a structured temporal effect, a structured spatial

random effect, and a time by space interaction random effect which is a mixture of two Gaussians and

thus equivalent to an outlier or contaminant model. This allows the identification of sites (areas) and

times which fail to fit the common temporal and common spatial patterns.

Some environmental papers which are not simply descriptive use Higdon [1998]’s method of convolution

with Gaussian kernels, which allows for non-stationary spatial smoothing, and give snapshots

over time [Lemos and Sanso, 2009, Sahu and Challenor, 2008] or, having failed to find much influence in

terms of spatial proximity, model time sequences for each site using time series methods [Lemos et al.,

2007].

Looking at spatio-temporal analyses within an agricultural context, the analysis of Trought and

Bramley [2011] considers the quality of grape juice by site across time. Their strategy is to fit different

curves across time for each site, and then to look at spatial outcomes of their model by mapping. In

considering longitudinal agricultural experiments, Piepho et al. [2004], Piepho and Ogutu [2007], Piepho

et al. [2008], Wang and Goonewardene [2004] and Brien and Demetrio [2009] use mixed models within a

REML framework to analyse their spatio-temporal data, and fit state-space models via standard software

and REML. The fixed part of their models is generally straightforward and the data are measured on two

spatial dimensions. Moving to the spatial dimension of depth, the soil profile study of Macdonald et al.

[2009] does not use spatial information in the analysis. Other studies composite the soils from different



depths across soil types or treatment [Sleutel et al., 2009], while others [Nayyar et al., 2009] use the

mixed modelling framework advocated by Piepho et al. [2004]. Within a spatial context, Haskard et al.

[2007] fit an anisotropic geostatistical model.

A major difference between these agricultural data and the epidemiological data which is so often

modelled using an additive common spatially structured error, an additive common structured temporal

random term (and an unstructured error with a variance common over both space and time), is that the

spatial units of epidemiological data tend to vary slowly over time scales of a few years. Additionally,

administrative time shocks may often be thought to be constant across a map, and hence this simple

modelling structure works well. In contrast, the agricultural data modelled here vary markedly from

sampling date to sampling date, and it is clear that the simple separable variance decomposition used by

so many epidemiological models does not describe the data well.

In moving to four dimensions, there are yet more possibilities for the decomposition of the fixed

and residual parts of the model. However, in the context of differing treatments for differing plots in the

horizontal dimensions, with the same treatment along the depth profile at each plot, and in the context of

different scales between the depth measurements and the distances between plots, it was a simple decision

to treat depth differently from the horizontal dimensions. This same choice to treat the third spatial

dimension differently is made by others with three dimensional spatial data. Ridgway et al. [2002],

modelling ocean temperatures and other ocean parameters, separate out the depth component in their

loess data fit.

We excluded depth from the neighbourhood error structures. If depth neighbours were to be included

as neighbours with equal weights, the horizontal layer information would be downweighted. If we weight

using functions of distance, the horizontal correlations become effectively irrelevant. A useful property

of defining neighbours as neighbours only within the same depth layer is that the CAR model is then

permitted to have differing variances across the depths. For our model (Method 1, Equation 7.1), both the

homogeneous and spatial variance components differ by depth, and while no formal tests were conducted

this flexibility in the model appeared useful.

In the fixed part of the model, the choice to fit all treatment by date by depth means, rather than

to find a parsimonious model over the depths, was dictated by the view that smoothing of the different

treatments and then calculating a contrast was not an appropriate way to calculate the contrast estimates.

A false simplification of any of the treatment curves, which are measured at just 15 depths, could lead


to greater or lesser estimates of the contrast. This differs from the informal/formal comparisons of racial

differences after semiparametric fitting of the longitudinal bone density by race [Fong et al., 2010, Wand,

2009] where there are many observations at points in the dimension (age) in which the curves are to be

simplified. The assumption of continuity, on which any smooth is made, is better justified for their bone

density analyses with the data’s many points of support on the dimension to be smoothed.

The Method 1 model is an interaction of date with the daily model. The daily models are independent

of one another, which allows us to sum the DICs and the pDs over all 56 daily models and thus allows

the possibility of comparing DICs with the models of Method 3, Equations 7.7 and 7.8, where all 90,720

observations are modelled at once. We had planned to use this to compare the Method 3 models fitted to

the full dataset, with the fit achieved by the daily models from Method 1. However, the Method 3 models

failed to converge, and this was not done.

Method 1 gives appropriate 95% credible intervals for the contrast estimates, but no insight into the

way in which these contrasts vary over time. The modelling strategy adopted in Method 2 attempts to

remedy that by fitting time-varying covariates and by using time series methods. Two-stage models do

not account for the treatment effect variation observed in the Method 1 fits. However, they do allow us to

see what level of complexity may be required to account for the time-varying nature of the contrasts.

Figure 7.13 shows the fit for one of the missing data models. However, there is little point in

fitting a missing data model when the posterior variance may be arbitrarily determined by the choice of a

prior for a precision, as it is when only 3% of the observations are not missing. (For further discussion, see

Section 7.6.2.)

7.6.2 Model Comparisons: Problems

Where competing models are suggested, the preference for model comparison is to use some summary

statistic of the analysis fit such as the AIC [Akaike, 1973] (used for geostatistical model comparisons by

Hoeting et al. [2006]), the BIC [Schwarz, 1978] or the DIC [Spiegelhalter et al., 2002]. When WinBUGS

is used for model fitting, an obvious choice for a model comparison criterion is the DIC, which has the

added advantage of estimating the number of parameters used by the model. Table 7.6 shows DICs for

random walk models of order 1 where the only modelling difference is in the priors used for the two

precisions. The differing priors make differences to the fit (R2) and to the DIC.

In arguing the case for the DIC, a CAR model is explicitly discussed in Spiegelhalter et al. [2002].



Their model has a CAR normal spatial prior, but the unstructured error component is Poisson, and therefore

dictated by the estimates for the Poisson rate, which are themselves modified by the spatial CAR

prior. For the agricultural data here, the CAR (and RW) models (both of which have two variance components)

gave DICs which were highly sensitive to the choice of priors. Table 7.2 shows that the models

with the more diffuse priors at the observation level are apparently the better models. Graphs of the fitted

values and their credible intervals (not shown) show they also give closer fits. The DICs of the autoregressive

models are unaffected by the prior choice for the error, but the random walk models of order

one often have estimates for the number of parameters which are greater than the number of observations

used. Additionally, estimates for the number of parameters change with choice of prior, as does the DIC.

This is not a problem of the criterion choice. Calculation of the BIC, which is also based on the final

fit, gives essentially the same preferred models. What is happening with these very diffuse priors is that

the random walk model of order one becomes effectively a saturated model in which each observation

becomes the fitted value. A model which joins the dots, however, gives no insight into the data. Our

problem is to find prior distributions reflecting ignorance, the ‘statistical holy grail’ talked of by Fienberg

[2006]. The convolution prior of Besag et al. [1991] which works so well for spatial epidemiological

data, works less well when all the model components are normal and there is a single observation to be

partitioned into structured and unstructured error, and a maximum of two neighbours, as is the case for

the RW(1) models here.
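The way in which a first order random walk model can degenerate into a saturated fit may be seen in a small sketch. Conditional on known variances and equal time spacing (both simplifying assumptions), the posterior mean of the random walk states solves (I + λQ)μ = y, where Q is the RW1 structure matrix and λ = V/W is the ratio of observational to random walk variance. As λ tends to zero the fitted values reproduce the observations, and as λ grows they shrink to the overall mean. The function name and the data are hypothetical.

```python
def rw1_smooth(y, lam):
    """Posterior mean of RW1 states given y, for a fixed lam = V / W.

    Solves the tridiagonal system (I + lam * Q) mu = y with the Thomas
    algorithm; Q has diagonal (1, 2, ..., 2, 1) and off-diagonals -1.
    """
    n = len(y)
    if n < 2:
        raise ValueError("need at least two observations")
    diag = [1.0 + lam * (1 if i in (0, n - 1) else 2) for i in range(n)]
    off = [-lam] * (n - 1)
    c = [0.0] * (n - 1)  # modified upper diagonal
    d = [0.0] * n        # modified right-hand side
    c[0] = off[0] / diag[0]
    d[0] = y[0] / diag[0]
    for i in range(1, n):
        denom = diag[i] - off[i - 1] * c[i - 1]
        if i < n - 1:
            c[i] = off[i] / denom
        d[i] = (y[i] - off[i - 1] * d[i - 1]) / denom
    mu = [0.0] * n
    mu[-1] = d[-1]
    for i in range(n - 2, -1, -1):  # back substitution
        mu[i] = d[i] - c[i] * mu[i + 1]
    return mu


y = [1.0, 3.0, 2.0, 5.0]
joined_dots = rw1_smooth(y, 0.0)  # random walk variance dominates
flat_fit = rw1_smooth(y, 1e6)     # observational variance dominates
```

In this sketch `joined_dots` equals the data and `flat_fit` is essentially constant at the sample mean, so priors that allow the posterior for λ to sit near zero amount to fitting each observation exactly.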

Our final view was that the purpose of the modelling across time was to develop insight into the time-

varying nature of the contrast estimates. Credible intervals for the time series models are not realistic,

since they do not include the spatial variance. From the DIC comparisons of the modelling in WinBUGS,

we believe that a random walk model of order 1 with inverse distance based weights for neighbours is

the best of the models considered. There is evidence of periodicity at the shallower levels (see Table 7.3)

and this is also shown by the penalised spline smooths of Equation 7.6 and illustrated in Figure 7.14, with

their double periodics per year at the shallower levels. At the depths of 200 cm and 220 cm these peaks

and troughs have largely disappeared, with the curves showing a possibly increasing trend with time.

7.7 Conclusions

Our purpose was to account for spatial and temporal autocorrelations in the context of four-dimensional

data. The model of Method 1 forms the basis for the analyses within this paper. It fits a fixed parameter


for moisture at every combination of depth, date and treatment. Its error structure is complex, with an

unstructured error at every depth, date and site, and having variances differing by depth and date. The

spatial structured error is fitted across each horizontal layer and ignores depth neighbours. The variance

of these structured spatial errors also differs by depth and date. Comparisons with three dimensional

CAR neighbourhood models (not shown here) show that this separation of the two-dimensional

plot arrangement from the depth dimension gave better descriptions of the data.

The simple expedient of fitting the data as a series of daily models allowed the maximum possible

complexity in terms of the experiment and was a useful approach to modelling the full dataset. By

fitting what is an interaction model by date at all levels of the daily model, we were able to explore the

variability of the data effectively, and believe that some of the curiosities of the variability at some depths

need further elucidation. At the shallower levels, they appear to be following cyclical and long term

trends. At the greater depths, seasonal variation is less visible. See Figure 7.8.

The method of defining neighbours within a horizontal layer has potentially wide applicability in

three and four dimensional agricultural datasets, where the plot and treatment are defined by the two-

dimensional surface coordinates. It may also be applicable in measurements made over the ocean

where variables may also be measured at depth, again a situation where the differences in latitude and

longitude between measurements far outweigh the differences in the depth dimension.

The analysis shows that response cropping delivers lower moisture levels for most times of the year,

in contrast to long fallow cropping. At the shallower depths, not surprisingly, this contrast exhibits

considerable cyclicity which attenuates with depth. Given the final choice of model to determine whether

response cropping delivers less moist soils, it appears that the temporal component adds little or no

additional uncertainty to the estimates.

This paper also illustrates that choosing priors for random walk models of order one can cause

some problems in data modelling, with some choices making the prior on the observational error highly

informative.



7.8 Tables

Table 7.1 Various priors used for the precisions of the timeseries models of Method 2

           Precision for observational error    Precision for random walk error*
Prior 1    ∼ Gamma(.000001,.000001)             ∼ Gamma(.000001,.000001)
Prior 2    ∼ Gamma(.0001,.0001)                 ∼ Gamma(.0001,.0001)
Prior 3    mean τ                               ∼ Gamma(.000001,.000001)
Prior 4    total ∗ r                            total ∗ (1 − r)
Prior 5    ∼ Gamma(.000001,.000001)             mean τ

total ∼ Gamma(a, b), r ∼ Beta(1, 1)
a, b calculated via method of moments from mean & 95% CI for posterior in Method 1
* Priors 1 & 2 were also used for other timeseries models.

Depth (cm)    Mean τ    a          b
100           1395      6.934      .004971
120           1759      6.024      .003425
140           2241      12.413     .005538
160           3019      52.316     .017327
180           3226      87.249     .027045
200           3201      180.410    .056354
220           2175      82.412     .037894


Table 7.2 Summary of DICs for Contrast 1 (Long fallowing vs Response cropping) at Depth 140

                              Prior 1           Prior 2
Model                         pD    DIC         pD    DIC
Regression                    30    -377
AR(1)                          4    -343         4    -343
AR(1)(12)                     -2    -356        -2    -355
AR(2)                          4    -343         5    -342
RW(1)                         69    -435        36    -379
RW(1) (weighted)              73    -468 *      40    -392 *
RW(1) (t10 distribution)      73    -450        39    -378
RW(1) (t4 distribution)       74    -451        41    -375
RW(2)                         20    -370        23    -373
RW(2) (weighted)              26    -390        43    -395 *
RW(1) (1768 time points)      49    -304 (Prior 5)

* Best model
Prior 1: both precision priors Gamma(0.000001,0.000001)
Prior 2: both precision priors Gamma(0.0001,0.0001)

Table 7.3 DICs for Long fallowing vs Response cropping: 1st order autoregressive models vs simple regression model

         AR1 with rainfall*    AR1           AR1+5         Regression(28)
Depth    pD    DIC             pD    DIC     pD    DIC     pD    DIC
100       5    -279             4    -278     9    -289    30    -315
120       5    -301             4    -303     9    -306    30    -344
140       5    -341             4    -343     9    -342    30    -377
160       5    -386             4    -386     9    -386    30    -425
180       5    -414             4    -414     9    -410    30    -433
200       5    -449             4    -450     9    -444    30    -449
220       5    -457             4    -458     9    -450    30    -455

* Covariate: log(rainfall+1)
(AR1 + 5) Covariates: log(rainfall+1), sin(x), cos(x), sin(2x), cos(2x), x=date/2π
(Regression(28)) Covariates: x, x*x, x*x*x, year*(sin(x), cos(x), sin(2x), cos(2x))



Table 7.4 DICs for Long fallowing vs Response cropping: random walk model comparisons, using Prior 2.

         RW1             RW1 (W)         RW2             RW2 (W)
Depth    pD    DIC       pD    DIC       pD    DIC       pD    DIC
100      47    -342      47    -346 *    21    -332      38    -340
120      39    -348      42    -360 *    26    -323      40    -360 *
140      36    -379      40    -392 *    23    -373      43    -395 *
160      34    -413      38    -424 *    25    -417      43    -419
180      32    -433      36    -434      25    -439 *    43    -419
200      28    -457 *    34    -448      24    -458 *    42    -424
220      28    -461 *    35    -452      24    -463 *    43    -427

(W): inverse time interval weights.
* Indicates the better models.

Table 7.5 Square root of the Signal to Noise ratio for the RW models

Depth (cm)    SN ratio    95% CI
100           6.1         (2.1, 27.0)
120           3.3         (1.2, 19.2)
140           3.2         (1.2, 14.5)
160           3.3         (1.4, 11.0)
180           2.3         (1.1, 7.8)
200           1.0         (.5, 2.8)
220           1.0         (.5, 3.1)

Table 7.6 R2, pD and DIC for the RW(1) weighted models using priors 3-5

             Prior 3              Prior 4              Prior 5
Depth (cm)   R2     pD   DIC      R2     pD   DIC      R2     pD    DIC

100          33%    13   -255     80%    36   -258     99%    100   -411
120          23%     9   -277     79%    35   -279     99%     97   -421
140          12%     6   -299     80%    35   -271     99%     94   -433
160          16%     5   -323     83%    35   -251     100%    90   -446
180          19%     5   -332     85%    34   -246     99%     89   -448
200          27%     4   -337     86%    34   -239     99%     89   -448
220          18%     3   -318     82%    34   -225     99%     94   -434


Table 7.7 Root mean square predicted error for RW1 models under Priors 1 & 2

Depth   Prior     Median   25%ile   75%ile

100     Prior 1   0.020    0.019    0.020
100     Prior 2   0.019    0.019    0.020
120     Prior 1   0.016    0.015    0.016
120     Prior 2   0.015    0.014    0.016
140     Prior 1   0.011    0.011    0.011
140     Prior 2   0.010    0.010    0.011
160     Prior 1   0.0075   0.0073   0.0077
160     Prior 2   0.0073   0.0070   0.0077
180     Prior 1   0.0057   0.0054   0.0059
180     Prior 2   0.0056   0.0054   0.0059
200     Prior 1   0.0037   0.0035   0.0039
200     Prior 2   0.0041   0.0039   0.0043
220     Prior 1   0.0035   0.0033   0.0037
220     Prior 2   0.0039   0.0037   0.0041



Figure 7.1 Long fallowing vs Response cropping at all depths. Saturated model. Point estimates from the MCMC iterates of the full model (Method 1).

7.9 Figures

7.9.1 Contrasts


Figure 7.2 Long fallowing vs Response cropping. Saturated model. Contour graph from the point estimates from the MCMC iterates of the full model (Method 1).



Figure 7.3 Long fallowing vs Response cropping at depth 100 for all trial dates. Saturated model. Point estimates & 95% CIs from MCMC iterates from the full model.

Figure 7.4 Long fallowing vs Response cropping at depth 100 for all trial dates. Penalised spline smooth across dates. Point estimates & 95% CIs.


Figure 7.5 Long fallowing vs Response cropping at depth 100 for all trial dates. Regression model (Equation 7.2) fitting 27 time-varying covariates. Point estimates & 95% CIs.

Figure 7.6 Long fallowing vs Response cropping at depth 100 for all trial dates. Random walk of order one. Point estimates & 95% CIs.




Figure 7.7 Spatially structured and unstructured standard deviations & 95% credible intervals at depth 100 cm. The spatial standard deviations are shown in blue, the unstructured standard deviations in green.


Figure 7.8 Spatially structured and unstructured standard deviations & 95% credible intervals at depth 220 cm. The spatial standard deviations are shown dotted, the unstructured standard deviations in green.

Figure 7.9 Long fallowing vs Response cropping at depth 140 for all trial dates (AR1 fit). Point estimates & 95% CIs.

Figure 7.10 Long fallowing vs Response cropping at depth 140 for all trial dates (RW1 fit using weights which are reciprocals of the time intervals). Point estimates & 95% CIs.

Figure 7.11 Long fallowing vs Response cropping at depth 140 for all trial dates (RW2 fit). Point estimates & 95% CIs.


Figure 7.12 Long fallowing vs Response cropping at depth 140 for all trial dates (RW1 fit with t distribution, df=10). Point estimates & 95% CIs.

Figure 7.13 Long fallowing vs Response cropping at depth 140 for all trial dates. Random walk with 97% missing data. Random walk precision fixed at 2241. See Table 7.1. Point estimates & 95% CIs.




Figure 7.14 Non-parametric penalised spline smooths. (Fits for the contrasts at the 7 depths.)


Bibliography

Abellan, J. J., S. Richardson, and N. Best (2008). Use of space-time models to investigate the stability of

patterns of disease.(Mini-Monograph). Environmental Health Perspectives 116(8), 1111–1119.

Adebayo, S. B. and L. Fahrmeir (2005). Analysing child mortality in Nigeria with geoadditive discrete-

time survival models. Statistics in Medicine 24(5), 709–728.

Akaike, H. (1973). Information theory and an extension of the maximum likelihood principle. In

B. N. Petrov and F. Csaki (Eds.), Second International Symposium on Information Theory, Akademiai

Kiado, Budapest, Hungary.

Assuncao, R. M. (2003). Space varying coefficient models for small area data. Environmetrics 14(5),

453–473.

Assuncao, R. M., J. E. Potter, and S. M. Cavenaghi (2002). A Bayesian space varying parameter model

applied to estimating fertility schedules. Statistics in Medicine 21(14), 2057–2075.

Assuncao, R. M., I. A. Reis, and C. D. Oliveira (2001). Diffusion and prediction of Leishmaniasis in

a large metropolitan area in Brazil with a Bayesian space-time model. Statistics in Medicine 20(15),

2319–2335.







Bell, M., F. Dominici, K. Ebisu, S. Zeger, and J. Samet (2007). Spatial and temporal variation in PM2.5

chemical composition in the United States for health effects studies. Environmental Health Perspec-

tives 115(7), 989–995.


Biometrika 92(4), 909–920.





Box, G. E. P. and G. M. Jenkins (1976). Time series analysis : forecasting and control (Rev. ed.).

Holden-Day series in time series analysis and digital processing. San Francisco: Holden-Day.





Commandeur, J. J. F. and S. J. Koopman (2007). An introduction to state space time series analysis.

Practical econometrics. Oxford New York: Oxford University Press.

Crook, A. M., L. Knorr-Held, and H. Hemingway (2003). Measuring spatial effects in time to event data:

a case study using months from angiography to coronary artery bypass graft (CABG). Statistics in

Medicine 22(18), 2943–2961.

Fahrmeir, L., T. Kneib, and S. Lang (2004). Penalized structured additive regression for space-time data:

A Bayesian perspective. Statistica Sinica 14, 731–761.

Fienberg, S. E. (2006). When did Bayesian inference become “Bayesian”? Bayesian Analysis 1, 1–40.

Fong, Y., H. Rue, and J. Wakefield (2010). Bayesian inference for generalized linear mixed models.

Biostatistics 11(3), 397–412.

Gamerman, D. and H. F. Lopes (2006). Markov chain Monte Carlo : stochastic simulation for Bayesian

inference (2nd ed.). London ; New York: Chapman & Hall.



Geweke, J. (1992). Evaluating the accuracy of sampling-based approaches to the calculation of posterior

moments. Bayesian Statistics 4, 169–188.

Harvey, A. C. (1989). Forecasting, structural time series models and the Kalman filter. Cambridge

England: Cambridge University Press.


Haskard, K. A., B. R. Cullis, and A. P. Verbyla (2007). Anisotropic Matern correlation and spatial

prediction using REML. Journal of Agricultural, Biological, and Environmental Statistics 12(2),

147–160.

Higdon, D. (1998). A process-convolution approach to modelling temperatures in the North Atlantic

Ocean. Environmental and Ecological Statistics 5, 173–190.

Hoeting, J. A., R. A. Davis, A. A. Merton, and S. E. Thompson (2006). Model selection for geostatistical

models. Ecological Applications 16(1), 87–98.

Hrafnkelsson, B. and N. Cressie (2003). Hierarchical modeling of count data with application to nuclear

fall-out. Environmental and Ecological Statistics 10, 179–200.

Kneib, T. (2006). Geoadditive hazard regression for interval censored survival times. Computational

Statistics and Data Analysis 51, 777–792.

Knorr-Held, L. and J. Besag (1998). Modeling risk from a disease in time and space. Statistics in

Medicine 17, 2045–2060.

Lemos, R. T. and B. Sanso (2009). A spatio-temporal model for mean, anomaly, and trend fields of North

Atlantic sea surface temperature. Journal of the American Statistical Association 104(485), 5–18.

Lemos, R. T., B. Sanso, and M. L. Huertos (2007). Spatially varying temperature trends in a central

California estuary. Journal of Agricultural, Biological, and Environmental Statistics 12(3), 379–396.



appear.
















Poncet, C., V. Lemesle, L. Mailleret, A. Bout, R. Boll, and J. Vaglio (2010). Spatio-temporal analysis

of plant pests in a greenhouse using a Bayesian approach. Agricultural and Forest Entomology 12(3),

325–332.

Raftery, A. and S. Lewis (1992). How many iterations in the Gibbs sampler? In J. Bernardo, J. Berger,

A. Dawid, and A. Smith (Eds.), Bayesian Statistics 4. Oxford: Oxford University Press.

Ridgway, K., J. Dunn, and J. Wilkin (2002). Ocean interpolation by four-dimensional weighted least

squares-application to the waters around Australasia. Journal of Atmospheric and Oceanic Technol-

ogy 19(9), 1357–1375.






Rue, H. and H. Tjelmeland (2002). Fitting Gaussian Markov random fields to Gaussian fields. Scandi-

navian Journal of Statistics 29(1), 31–49.

Sahu, S. K. and P. Challenor (2008). A space-time model for joint modeling of ocean temperature and

salinity levels as measured by Argo floats. Environmetrics 19(5), 509–528.

Schwarz, G. (1978). Estimating the dimension of a model. The Annals of Statistics 6(2), 461–464.







583–639.



Teschke, K., Y. Chow, K. Bartlett, A. Ross, and C. van Netten (2001). Spatial and temporal distribution of

airborne Bacillus thuringiensis var. kurstaki during an aerial spray program for gypsy moth eradication.

Environmental Health Perspectives 109(1), 47–54.

Trought, M. C. T. and R. G. V. Bramley (2011). Vineyard variability in Marlborough, New Zealand:

characterising spatial and temporal changes in fruit composition and juice quality in the vineyard.

Australian Journal of Grape and Wine Research 17(1), 79–89.

Waller, L. A., B. P. Carlin, H. Xia, and A. E. Gelfand (1997). Hierarchical spatio-temporal mapping of

disease rates. Journal of the American Statistical Association 92(438), 607–617.





West, M. and J. Harrison (1997). Bayesian forecasting and dynamic models (2nd ed.). Springer series in

statistics. New York: Springer.


Medicine 25(5), 867–881.

Chapter 8

Conclusions and further work

This chapter provides a brief overview of the thesis and thoughts for further work.

8.1 Conclusions

The aim of this research was firstly to contribute to Bayesian statistical methodology, by contributing to

risk assessment methodology, and to spatial and spatio-temporal methodology, and it seemed this might

be possible by modelling error structures using complex hierarchical models. A further, parallel, aim

was to contribute by applying these new methodologies in the areas of risk analyses for recycled water,

and in the assessment of differences between cropping systems over time, taking account of possible

autocorrelations over the three spatial dimensions and over time.

The statistical contributions made in this thesis are

• the development of methods for

– forming credible intervals for the point estimates of queries in Bayesian nets, having first

elicited the uncertainty for all the net’s various conditional probabilities;

– incorporating experimental uncertainty into risk assessments;

• the introduction of the layered CAR model for three (spatial) dimensional data (in combination

with complex regression models);


• the introduction of Chris Strickland’s Gibbs sampler for block updating the layered CAR model

(described in Chapter 6);

• the introduction of a complex time by space interaction model, through repeated use of the layered

CAR model, thereby providing a model in which space and time effects are additive in neither

the fixed nor the error components.

Thus, we first considered how to build credible intervals for a Bayesian net (Chapter 3), and illus-

trated a simple method, whereby, having elicited uncertainty about the various conditional probability

tables, credible intervals may be found for the complex mixture distributions which result, for any of the

marginal or conditional probabilities and relative risks under any desired scenario in a Bayesian net. In an

addendum (Chapter 3.9), we show that this can produce very different results from the credible intervals put

forward by Van Allen et al. [2001, 2008].
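The idea of propagating elicited uncertainty in the conditional probability tables through to a credible interval for a query can be sketched in a few lines. This is a minimal illustration only, not the WinBUGS implementation of Chapter 3: it assumes a hypothetical two-node net A -> B, hypothetical Beta parameters for the elicited uncertainty, and hypothetical function names (`draw_query`, `credible_interval`).

```python
import random

def draw_query(rng, prior_a, cpt_b):
    """One draw of P(B=1) and of a relative risk for a two-node net A -> B."""
    p_a = rng.betavariate(*prior_a)                       # uncertain P(A=1)
    p_b_given = [rng.betavariate(*ab) for ab in cpt_b]    # P(B=1|A=0), P(B=1|A=1)
    p_b = (1 - p_a) * p_b_given[0] + p_a * p_b_given[1]   # marginal query
    rr = p_b_given[1] / p_b_given[0]                      # relative risk, A=1 vs A=0
    return p_b, rr

def credible_interval(draws, level=0.95):
    """Equal-tailed interval from the empirical quantiles of the draws."""
    s = sorted(draws)
    lo = s[int((1 - level) / 2 * (len(s) - 1))]
    hi = s[int((1 + level) / 2 * (len(s) - 1))]
    return lo, hi

rng = random.Random(1)
prior_a = (8, 2)              # hypothetical elicited Beta for P(A=1)
cpt_b = [(2, 18), (6, 14)]    # hypothetical Betas for P(B=1|A=0), P(B=1|A=1)
draws = [draw_query(rng, prior_a, cpt_b) for _ in range(5000)]
p_b_ci = credible_interval([d[0] for d in draws])
rr_ci = credible_interval([d[1] for d in draws])
```

Each draw of the tables gives one value of the query, so the spread of the draws reflects the mixture distribution induced by the elicited uncertainty.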

Secondly, in Chapter 4, we showed that uncertainty in QMRAs was more appropriately addressed
by incorporating all the primary data into a DAG, and dealing with dose-response data via an
errors-in-variables model within that DAG. This chapter illustrated that, with no change of assumptions,
estimating the parameters which describe the models used in a risk assessment simultaneously with the
risk assessment itself may markedly increase the width of the credible intervals for risks. Posterior
credible intervals found using the complex DAG structure reflect the experimental uncertainties of the
small scale experiments on which most QMRAs are based, and we recommend this as the preferred
method of undertaking QMRAs. A further contribution of this chapter was to consider dose-response
data as an errors-in-variables problem, which again leads to markedly greater estimates of risk at lower doses.

In considering the agricultural data, the concern was to make a contribution within both the theoretical
and the applied literature. We took spatial methods currently used in two dimensions and extended
them to three spatial dimensions to build a new flexible model, the CAR layered model.
The layering performs three functions: it permits differing spatial and non-spatial variances in each layer;
it permits treatment effects to be modelled independently of spatial correlation along the depth dimension;
and it dispenses with the issue of choosing weights when the distances in the depth dimension are not of
the same order as those in the horizontal dimensions.

In Chapter 5, we confined consideration to essentially three ways of accounting for spatial correlation


and outlined our explorations in describing data in a three-dimensional space. This work was fundamental

to our later analyses of Chapters 6 and 7. The methodological contribution was to extend two-dimensional

modelling to three dimensions in a way which was suitable to the experimental data, by the introduction

of the CAR layered model.

Chapter 5 also found appropriate models to describe the treatment curves along the depth dimension

and appropriate models for the neighbourhood structure for CAR models chosen to account for spatial

autocorrelation. The final structure chosen, which allowed neighbours to be neighbours only at the same

depth, was found to capture very flexibly the differing variances at the different depths, while at the

same time being computationally simpler than a CAR model with a 1620 × 1620 precision matrix. This

chapter also demonstrated that very complex models may be fitted within the CAR modelling framework

of WinBUGS, with the final chosen model fitting essentially 15 different depth CAR models while at

the same time fitting cubic radial basis functions with a latent error model in the depth dimension as a

parametric component.
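The neighbourhood structure described above, in which sites are neighbours only within the same depth, can be sketched as follows. This is an illustration rather than the WinBUGS or pyMCMC code used in the thesis: it assumes the 6×18 layout and the 15 depth layers mentioned in the text, assumes that first order neighbours are the sites sharing an edge on the grid, and the function names are hypothetical.

```python
def grid_neighbours(n_rows, n_cols):
    """First-order (shared-edge) neighbours on a grid, sites numbered row by row."""
    nbrs = {}
    for r in range(n_rows):
        for c in range(n_cols):
            candidates = ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
            nbrs[r * n_cols + c] = [
                rr * n_cols + cc
                for rr, cc in candidates
                if 0 <= rr < n_rows and 0 <= cc < n_cols
            ]
    return nbrs

def layered_neighbours(n_rows, n_cols, n_depths):
    """Repeat the within-layer structure at every depth; no links between depths."""
    layer = grid_neighbours(n_rows, n_cols)
    size = n_rows * n_cols
    return {
        d * size + site: [d * size + t for t in targets]
        for d in range(n_depths)
        for site, targets in layer.items()
    }

# Assumed dimensions: 6 rows x 18 columns at each of 15 depths = 1620 sites.
nbrs = layered_neighbours(6, 18, 15)
n_sites = len(nbrs)
n_links = sum(len(v) for v in nbrs.values()) // 2   # each pair is listed twice
```

Because no neighbour pair crosses a layer, the 1620 × 1620 structure matrix is block diagonal with fifteen 108 × 108 blocks, which is what allows each layer to carry its own spatial and unstructured variance.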

The attempt to find parsimonious models to describe the behaviour of the treatments along the depth

dimension gave insight into the attenuation of the treatment effects with depth, and also showed the

variances varying from greater to smaller and then to greater with depth, in a relatively smooth way.

These insights were critical to an understanding of possible models for the full data.

Chapter 6 built on the work of Chapter 5 and compared the contrast of interest over five time periods.

We found that the errors-in-variables model with linear splines with various knot schemes could not be

sensibly fitted in WinBUGS as a single model for the full 5 day dataset. (Though, in retrospect, this may

well have been due to the very heavy tailed priors used for the latent variable for depth. Instead of the

Half-Cauchy of Marley and Wand [2010], the Gamma prior of Wakefield et al. [2000] may be a more

successful choice.) Given that WinBUGS failed us, a different computational choice was made. Thus,

the paper of Chapter 6 describes a block updating Gibbs sampler implemented in purpose-built software

(pyMCMC of Strickland [2010]), and uses the proper CAR prior of Gelfand and Vounatsou

[2003] to build the layered CAR model. Again, the first order adjacency matrix within each depth, which

had been found to be the best performer in the comparisons of Chapter 5, was used. The methodological

contribution of this chapter is the development together with Chris Strickland of the block updating Gibbs

sampler for the CAR layered model, and its description. This block updating Gibbs sampler permits the

possibility of analysing large datasets.


The applied contribution of Chapter 6 was to show that by a depth of about 200 cm, the trends in the

moisture observations had flattened. That is, from this depth onward moisture was constant at a level

dependent on treatment, although its variance continued to increase with depth.

In Chapter 7, we situated our work within the spatio-temporal modelling literature. We found that

treating depth differently from the horizontal spatial dimensions was echoed in the oceanic work of

Ridgway et al. [2002], while in the kernel convolution modelling of Sahu and Challenor [2008] which

considered oceanic temperatures and salinity, separate models are built for three depths. We fitted a

model with different treatment means for each day and depth, together with the (modified) CAR spatial

model of Chapter 5, at each sampling date, to give a complex spatio-temporal model which gave point

estimates for the contrast of interest together with 95% credible intervals. We also treated the point

estimates for the contrast as the starting point for time series modelling, and determined that a weighted

random walk of order 1 best described most of the depth series over time. In this chapter, we argued

that in the context of an agricultural experiment, it is not sensible to smooth treatment curves prior to

calculation of a contrast. We also demonstrated and argued that data arising from a three-dimensional (in

space) agricultural experiment over time are likely to have a complex error structure. We went some way

towards an adequate description.
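The weighted random walk of order one referred to above can be written in state-space form. The notation below is introduced here for illustration and is an assumption, not a transcription of the Chapter 7 equations: $y_t$ denotes the point estimate of the contrast at sampling date $d_t$, $\mu_t$ the underlying level, and the weights $w_t$ are taken to be the reciprocals of the time intervals, as in the caption of Figure 7.10.

```latex
\begin{align*}
y_t &= \mu_t + \varepsilon_t, & \varepsilon_t &\sim N(0, \sigma^2_{\varepsilon}), \\
\mu_t \mid \mu_{t-1} &\sim N\!\left(\mu_{t-1},\; \sigma^2_{\eta} / w_t\right), & w_t &= \frac{1}{d_t - d_{t-1}} .
\end{align*}
```

Under this form the variance of an increment, $\sigma^2_{\eta}(d_t - d_{t-1})$, grows linearly with the gap between sampling dates. If the usual state-space definition is intended, the signal to noise ratio of Table 7.5 corresponds to $\sigma^2_{\eta} / \sigma^2_{\varepsilon}$, reported there as a square root.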

It may be argued that we could/should have fitted kriging models with sparsity created in the variance

matrices by thresholding. This could have been done, but my feeling is that with a layout of 6×18 rows

by columns, spatial continuities are harder to see and model, with signal to noise ratios being more

difficult to disentangle. Equally, we could have used the Gaussian kernel smooths of Higdon [1998],

but this may be viewed as just another set of weights with a bandwidth to be chosen. Both of these

possibilities seem worthwhile but probably only when there are considerably more than 6 measured

points along a dimension.

Using a single point estimate for the contrast for each day and depth led to a problem of model choice,

when random walk models of order one were amongst those being considered. These models would seem

to be extremely sensitive to the choice of priors for the structured and observational error. This posed a

major problem for our modelling and the problem is not resolved by moving from a model choice based

on the smallest DIC to a model choice based on minimal forecasting error. Doubtless, we could choose

a grid for the precision priors, or some other sampling process for the priors, but it appears to remain

a considerable task to minimise the error, and having completed the task, the possibility remains that a


model chosen this way might still be a join-the-dots model.

In working through potential models for the paper of Chapter 6, it seemed remarkable that none of

the various models tried bettered the three knot model in terms of the DIC. Given that a single observation

has its error partitioned into two components, an unstructured and a structured error, it may again be the

case that the partitioning of the two error types is sensitive to the priors when the data are Gaussian,

and the neighbours are few. This problem does not arise for epidemiological data which typically model

a proportion or a count and thus the unstructured error is binomial or Poisson, dictated by the mean

structure, itself dependent on the spatially structured error, giving rise to a complex interplay which

determines both the unstructured error and the spatially structured error. The possibility of differing fits

with differing priors needs to be explored in the case of the CAR spatial models also.

In Chapter 7, we presented a way of modelling over the dimensions of time and space, which allowed

space-time interactions, in contrast to the many models where space and time effects are additive. By

using the CAR layered model repeatedly, we showed how a full interaction spatio-temporal model might

be fitted and the contrast of interest be found, and how it behaved over time. We applied this methodology

to 90,720 observations, and found periodic behaviour in the contrast difference at the shallower levels,

and that response cropping generally led to less moist soils than long fallowing, although this difference

did not always have a 95% credible interval which failed to include zero.

8.2 Future Work

The Half-Cauchy priors of Chapter 5 failed to generate good fits when we tried to fit the full five-day data

of Chapter 6. In retrospect, this work should be reviewed using the Gamma(.5,.0005) prior of Wakefield

et al. [2000].

The neighbourhood models should be calibrated to a kriging model [Hrafnkelsson and Cressie,

2003], or via the work of Lindgren et al. [2011].

The fits (together with the DIC) need to be explored via simulation for the case where both structured

and unstructured error are Gaussian. There seems to be a problem of extreme sensitivity to precision

priors when there are two errors with only one observation at each timepoint and few neighbours, when

the model is Gaussian. This problem may also arise in the CAR spatial model where again there is

one observation at each spatial point. In our model, both error components are Gaussian, unlike many

instances of CAR models for epidemiological data where the fitted values are frequently binomial or


Poisson.

The final four-dimensional modelling of Chapter 7 may be thought of as unsatisfactory in a number

of ways. Fitting the full model as a set of daily models restricts the ability to describe what is happening

over time. Thus, ideally, the two-stage model of Chapter 7 should be fitted as a single model for all the

reasons given by Gerlach et al. [2000]. As in Abellan et al. [2008], the quantity of interest together with

an appropriate prior should be embedded in a full model. To see just how to do this, the two-stage model

needs to be more fully exploited.

Chapter 7 noted a problem with the random walk models (which were felt to best describe the time-

varying behaviour of the contrast of interest). A simple solution (but somewhat ad-hoc) for the non-

identifiability of the time series random walk models, would be to use repeated observations at each time

point. This solution has legitimacy since within each day there are 12 repeats of each treatment and hence

a whole series of possible calculations for each contrast within a day. Thus, it seems not inappropriate to

take 12 MCMC simulations of the contrast for a day and fit these to the various time series models. Such

contrasts have already been spatially adjusted. In Figure 8.1, we show the fit of an RW1 model together

with the mean contrast value for the day. This model had a pD of 53, in other words, almost all the date

degrees of freedom.

Thinking further about this, we note that the MCMC iterates form a sample of the distribution of the

contrast at each day and depth. We should be able to use a subsample of these MCMC samples to fit the

second-stage model to get appropriate credible intervals and a meaningful description in the dimension

of time. The question is, do we need a sample of 10000, 100 or 12?
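The subsampling step can be sketched as below. This is a hedged illustration: the stored iterates are replaced by synthetic normal draws, the function name `subsample_iterates` is hypothetical, and the second-stage time series fit itself is not shown.

```python
import random
from statistics import mean, stdev

def subsample_iterates(iterates_by_date, n_keep, seed=0):
    """Draw n_keep iterates without replacement at each sampling date."""
    rng = random.Random(seed)
    return {date: rng.sample(draws, n_keep)
            for date, draws in iterates_by_date.items()}

# Synthetic stand-in for stored MCMC output: 10000 iterates at each of 5 dates.
gen = random.Random(42)
iterates = {date: [gen.gauss(0.02 + 0.005 * date, 0.01) for _ in range(10000)]
            for date in range(5)}

# Summaries of the subsamples that would be passed to the second-stage model.
summaries = {}
for n_keep in (12, 100):
    sub = subsample_iterates(iterates, n_keep)
    summaries[n_keep] = {date: (mean(v), stdev(v)) for date, v in sub.items()}
```

Comparing the second-stage fits obtained from the different subsample sizes is then one way to judge how many iterates are needed.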

Table 8.1 compares random samples of two different sizes (drawn from the MCMC iterates) and

two second-stage models at depth 100 cm, and shows that for a simple random walk of order one, sample

sizes of 12 and 100 produce essentially equivalent results. Comparing the DIC for models with the same

sample size, we see the superiority of the RW1 model at both sample sizes in comparison with the AR1

plus covariates model. The pD shows that the effective number of parameters is unchanged for the fits

from sample size to sample size, and estimates of the standard errors are constant from sample size to

sample size, which shows that both 12 and 100 are sufficiently large samples to represent the distribution

of the contrast under the models’ assumptions.

However, with a sample of the contrast at each time point, more complex time series models, with

differing observational variances at each time point should be fitted. For example, one possibility is to fit a


two variance observational error (a contaminated mixture model), and an example is shown in Figure 8.2.

(Figure 8.1 shows the fit of the single variance RW1 model.) More realistically, the variance of the

variance probably varies continuously, and models to reflect this should be fitted. Such a model with its

additional complexity will need a larger sample size of the MCMC iterates in order to be estimated, and

again, the problem arises of how large this sample size should be.
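One way to write the two variance (contaminated) observational error model is given below. The notation is an assumption introduced for illustration: $y_{tj}$ is the $j$th sampled contrast value at time point $t$, $z_t$ is a latent indicator of a high-variance time point, and $\pi$ is the contamination probability. Whether the indicator should be attached to time points or to individual observations is not specified in the text; the time point version is shown.

```latex
\begin{align*}
y_{tj} \mid \mu_t, z_t &\sim N\!\left(\mu_t,\; (1 - z_t)\,\sigma^2_{1} + z_t\,\sigma^2_{2}\right),
  \qquad j = 1, \dots, n, \\
z_t &\sim \mathrm{Bernoulli}(\pi), \qquad \sigma^2_{2} > \sigma^2_{1}, \\
\mu_t \mid \mu_{t-1} &\sim N\!\left(\mu_{t-1},\; \sigma^2_{\eta}\right).
\end{align*}
```

Setting $\pi = 0$ recovers the single variance RW1 model, while a model in which the observational variance varies continuously over time would replace the two-point mixture with a further latent process.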

The graphs of the estimates of square roots of the structural (spatial) variances and the unstructured

variance components from Equation 7.1 are shown in Figures C.29-C.35 in Appendix C.2 of Chapter 7.

These show something of the complexity of modelling longitudinal experimental agricultural data.

A further issue for future work is that the 90,720 measurements used in the analysis of Chapter 7

did not include the days for which no measurements at all were taken for treatments 7 & 8. Within the

structure of a repeated daily model, with no postulated distribution over time, it did not seem sensible to

fit a distribution for the missing data, when information about an entire treatment group was missing. In

an integrated model, where variation over time is incorporated into the model structure, such data could

meaningfully be included. Thus, a missing data model fits into the need for an integrated space time

interaction model for the data.

Further exploration of the modelling of the contrasts as multivariate time series would seem to be an

appropriate way to start to build a fully integrated model.

Considering yet again the issue of the random walk models of order one, a further way of exploring

this problem would be to normalise the response variable prior to fitting, in order to change the apparent

non-informativeness of the priors. That is, a prior is uninformative relative to the scale of the data.
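The effect of scale on a precision prior can be made concrete with a small sketch. The helper names are hypothetical and the numbers are illustrative only: a response with a standard deviation of about 0.02 has a precision of about 2500, whereas the same response standardised to unit variance has a precision of about 1, so a given Gamma prior on the precision sits very differently relative to the two scales.

```python
from statistics import mean, stdev

def standardise(y):
    """Centre and scale the response; return the scaled values with the constants used."""
    m, s = mean(y), stdev(y)
    return [(v - m) / s for v in y], m, s

def unstandardise(z, m, s):
    """Map fitted values on the standardised scale back to the original scale."""
    return [v * s + m for v in z]

def original_scale_precision(precision_z, s):
    """A precision on the standardised scale, expressed on the original scale."""
    return precision_z / s ** 2

y = [0.021, 0.018, 0.035, 0.027, 0.012]   # illustrative contrast values
z, m, s = standardise(y)
y_back = unstandardise(z, m, s)
```

Fitting on the standardised scale and back-transforming leaves the model unchanged but alters where a fixed prior places its mass relative to the data.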


Figure 8.1 Random walk of order one & 95% credible intervals at depth 100 cm. Fitted to 12 posterior contrast estimates at each time point.

Table 8.1 Comparison of some fits for the contrast Long fallowing vs Response cropping at Depth 100.

Model   Sample size   pD   DIC     s                   s2

AR1+5    12            9   -2994   .026 (.025, .027)
AR1+5   100            9   -3190   .022 (.021, .024)
RW1      12           53   -3619   .021 (.017, .026)   .016 (.015, .017)
RW1     100           52   -3585   .021 (.017, .026)   .016 (.015, .017)

s: square root of the posterior observational variance (in the RW1 models)
s2: square root of the posterior system variance (in the RW1 models)


Figure 8.2 Contaminated observational error: Random walk of order one & 95% credible intervals at depth 100 cm. Fitted to 12 posterior contrast estimates at each time point.


Bibliography





Gerlach, R., C. Carter, and R. Kohn (2000). Efficient Bayesian inference for dynamic mixture models.

Journal of the American Statistical Association 95(451), 818–828.






Markov random fields: the stochastic partial differential equation approach. Journal of the Royal

Statistical Society: Series B (Statistical Methodology) 73(4), 423–498.





ogy 19(9), 1357–1375.







Citeseer.




Wakefield, J., N. Best, and L. Waller (2000). Bayesian approaches to disease mapping. In P. Elliott,

J. Wakefield, N. Best, and D. Briggs (Eds.), Spatial Epidemiology: Methods and Applications, pp.

104–127. Oxford: Oxford University Press.

Appendices


Appendix A

Some mathematical models

A.1 WinBUGS code for the model 2 Bayesian net of Paper 1

# AngusOriginal260808CI

# Age structure modified to reflect Australian population of 2000

model {
  for (i in 1:N) {
    node[i,1] ~ dbeta(a[1],b[1])                  # Primary source water
    PSW[i] ~ dbern(node[i,1])
    node[i,5] ~ dbeta(a[5],b[5])                  # Other source water
    OSW[i] ~ dbern(node[i,5])
    node[i,6] ~ dbeta(a[6],b[6])                  # Reprocessing
    Rep[i] ~ dbern(node[i,6])
    node[i,7] ~ dbeta(a[7],b[7])                  # Other planned/unplanned supply
    OPPS[i] ~ dbern(node[i,7])
    node2[i] <- 2*PSW[i] + OSW[i] + 1             # Primary treatment
    X[i,2] ~ dbeta(aa2[node2[i]],bb2[node2[i]])
    PT[i] ~ dbern(X[i,2])
    node[i,3] <- 2*Rep[i] + PT[i] + 1             # Storage
    X[i,3] ~ dbeta(aa3[node[i,3]],bb3[node[i,3]])
    Storage[i] ~ dbern(X[i,3])
    node[i,4] <- 2*OPPS[i] + Storage[i] + 1       # Endpoint distribution
    X[i,4] ~ dbeta(aa4[node[i,4]],bb4[node[i,4]])
    ED[i] ~ dbern(X[i,4])
    node[i,8] ~ dbeta(a[8],b[8])                  # Planned/unplanned Use
    Puse[i] ~ dbern(node[i,8])
    node[i,10] ~ dbeta(a[10],b[10])               # Exposure period (Short)
    EP[i] ~ dbern(node[i,10])
    node[i,12] ~ dbeta(a[12],b[12])               # Pathogen uptake (Low)
    PU[i] ~ dbern(node[i,12])
    Age[i] ~ dcat(p[])
    node[i,9] <- 2*ED[i] + Puse[i] + 1            # Pathogen Load
    X[i,9] ~ dbeta(aa9[node[i,9]],bb9[node[i,9]])
    PL[i] ~ dbern(X[i,9])
    node[i,11] <- 4*PU[i] + 2*EP[i] + PL[i] + 1   # Cumulative dose
    X[i,11] ~ dbeta(aa11[node[i,11]],bb11[node[i,11]])
    CD[i] ~ dbern(X[i,11])
    node[i,13] <- 3*CD[i] + Age[i]                # Gastroenteritis (Yes)
    X[i,13] ~ dbeta(aa13[node[i,13]],bb13[node[i,13]])
    Gastro[i] ~ dbern(X[i,13])

    # Partitioning the sample
    CD1[i] <- step(CD[i]-1)
    ED0[i] <- step(-ED[i])
    Puse0[i] <- step(-Puse[i])
    CD1Gastro[i] <- CD1[i]*Gastro[i]
    ED0Gastro[i] <- ED0[i]*Gastro[i]
    Puse0Gastro[i] <- Puse0[i]*Gastro[i]
    ED0Puse0Gastro[i] <- Puse0Gastro[i]*ED0[i]
    ED0Puse0[i] <- Puse0[i]*ED0[i]
    Age1[i] <- equals(Age[i],1)
    Age2[i] <- equals(Age[i],2)
    Age3[i] <- equals(Age[i],3)
    CD1Age1[i] <- step(CD[i]*equals(Age[i],1)-1)
    CD1Age2[i] <- step(CD[i]*equals(Age[i],2)-1)
    CD1Age3[i] <- step(CD[i]*equals(Age[i],3)-1)
    CD1Age1Gastro[i] <- CD1Age1[i]*Gastro[i]
    CD1Age2Gastro[i] <- CD1Age2[i]*Gastro[i]
    CD1Age3Gastro[i] <- CD1Age3[i]*Gastro[i]
    ED0Age1[i] <- ED0[i]*equals(Age[i],1)
    ED0Age2[i] <- ED0[i]*equals(Age[i],2)
    ED0Age3[i] <- ED0[i]*equals(Age[i],3)
    ED0Age1Gastro[i] <- ED0Age1[i]*Gastro[i]
    ED0Age2Gastro[i] <- ED0Age2[i]*Gastro[i]
    ED0Age3Gastro[i] <- ED0Age3[i]*Gastro[i]
    ED0Puse0Age1[i] <- ED0Puse0[i]*equals(Age[i],1)
    ED0Puse0Age2[i] <- ED0Puse0[i]*equals(Age[i],2)
    ED0Puse0Age3[i] <- ED0Puse0[i]*equals(Age[i],3)
    ED0Puse0Age1Gastro[i] <- ED0Puse0Age1[i]*Gastro[i]
    ED0Puse0Age2Gastro[i] <- ED0Puse0Age2[i]*Gastro[i]
    ED0Puse0Age3Gastro[i] <- ED0Puse0Age3[i]*Gastro[i]
    Age1Gastro[i] <- Age1[i]*Gastro[i]
    Age2Gastro[i] <- Age2[i]*Gastro[i]
    Age3Gastro[i] <- Age3[i]*Gastro[i]
  }

  e[1] <- sum(ED0Age1[])        # E(n) for ED=0, Age=1
  e[2] <- sum(ED0Age2[])        # E(n) for ED=0, Age=2
  e[3] <- sum(ED0Age3[])        # E(n) for ED=0, Age=3
  e[4] <- sum(ED0Puse0Age1[])   # E(n) for ED=0, Puse=0, Age=1
  e[5] <- sum(ED0Puse0Age2[])   # E(n) for ED=0, Puse=0, Age=2
  e[6] <- sum(ED0Puse0Age3[])   # E(n) for ED=0, Puse=0, Age=3

  r[1] <- sum(PT[])/N           # p for PT: node 2
  r[2] <- sum(Storage[])/N      # p for Storage: node 3
  r[3] <- sum(ED[])/N           # p for Endpoint distribution: node 4
  r[4] <- sum(PL[])/N           # prop for PL: node 9
  r[5] <- sum(CD[])/N           # prop for CD: node 11
  r[6] <- sum(Gastro[])/N       # prop for Gastro: node 13

  # Conditional probabilities
  r[7] <- sum(CD1Gastro[])/sum(CD1[])
  # prob for Gastro: All population groups (CD acceptable)
  r[8] <- sum(ED0Gastro[])/sum(ED0[])
  # prob for Gastro: All population groups (ED fails)
  r[20] <- sum(ED0Puse0Gastro[])/sum(ED0Puse0[])
  # prob for Gastro: All population groups (ED fails & Puse=0)
  r[9] <- r[6]/r[7]             # RR Gastro: All population groups: CD v CD1
  r[10] <- r[8]/r[7]            # RR Gastro: All population groups: ED0 v CD1
  r[21] <- r[20]/r[7]           # RR Gastro: All population groups: ED0Puse0 v CD1

  r[11] <- sum(CD1Age1Gastro[])/sum(CD1Age1[])
  # prob for Gastro: <5 (CD acceptable)
  r[12] <- sum(CD1Age2Gastro[])/sum(CD1Age2[])
  # prob for Gastro: 5-64 (CD acceptable)
  r[13] <- sum(CD1Age3Gastro[])/sum(CD1Age3[])
  # prob for Gastro: 65+ (CD acceptable)
  r[14] <- sum(Age1Gastro[])/sum(Age1[])   # prob for Gastro: <5
  r[15] <- sum(Age2Gastro[])/sum(Age2[])   # prob for Gastro: 5-64
  r[16] <- sum(Age3Gastro[])/sum(Age3[])   # prob for Gastro: 65+
  r[17] <- r[14]/r[11]          # RR Gastro: <5: CD v CD1
  r[18] <- r[15]/r[12]          # RR Gastro: 5-64: CD v CD1
  r[19] <- r[16]/r[13]          # RR Gastro: 65+: CD v CD1

  r[22] <- sum(ED0Age1Gastro[])/sum(ED0Age1[])
  # prob for Gastro: (ED fails & Age<5)
  r[23] <- sum(ED0Age2Gastro[])/sum(ED0Age2[])
  # prob for Gastro: (ED fails & Age 5-64)
  r[24] <- sum(ED0Age3Gastro[])/sum(ED0Age3[])
  # prob for Gastro: (ED fails & Age 65+)
  r[25] <- r[22]/r[11]          # RR Gastro: <5: ED0 v CD1
  r[26] <- r[23]/r[12]          # RR Gastro: 5-64: ED0 v CD1
  r[27] <- r[24]/r[13]          # RR Gastro: 65+: ED0 v CD1

  r[28] <- sum(ED0Puse0Age1Gastro[])/sum(ED0Puse0Age1[])
  # prob for Gastro: (ED fails Puse=0 & Age<5)
  r[29] <- sum(ED0Puse0Age2Gastro[])/sum(ED0Puse0Age2[])
  # prob for Gastro: (ED fails Puse=0 & Age 5-64)
  r[30] <- sum(ED0Puse0Age3Gastro[])/sum(ED0Puse0Age3[])
  # prob for Gastro: (ED fails Puse=0 & Age 65+)
  r[31] <- r[28]/r[11]          # RR Gastro: <5: ED0Puse0 v CD1
  r[32] <- r[29]/r[12]          # RR Gastro: 5-64: ED0Puse0 v CD1
  r[33] <- r[30]/r[13]          # RR Gastro: 65+: ED0Puse0 v CD1

  for (k in 1:4) {
    aa2[k] ~ dgamma(.01,.01)
    bb2[k] ~ dgamma(.01,.01)
    aa3[k] ~ dgamma(.01,.01)
    bb3[k] ~ dgamma(.01,.01)
    aa4[k] ~ dgamma(.01,.01)
    bb4[k] ~ dgamma(.01,.01)
    aa9[k] ~ dgamma(.01,.01)
    bb9[k] ~ dgamma(.01,.01)
  }
  for (j in 1:6) {
    aa13[j] ~ dgamma(.01,.01)
    bb13[j] ~ dgamma(.01,.01)
  }
  for (jj in 1:8) {
    aa11[jj] ~ dgamma(.01,.01)
    bb11[jj] ~ dgamma(.01,.01)
  }
}

# Data
list(N=50000,
  p=c(.0671, .8096, .1233),
  a=c(6.751, NA, NA, NA, .599, 59.2524, 48.372, 30.217, NA, 30.217, NA, 47.52),
  b=c(668.37, NA, NA, NA, 59.25, .59851, 12.093, 3.35744, NA, 3.35744, NA, 47.52),
  aa2=c(55.687, 117.083, 375.525, 5.135),
  bb2=c(2.32, 2.39, 3.79, .01),
  aa3=c(48.372, 59.252, 30.217, 239.98),
  bb3=c(12.09, .60, 3.36, 2.42),
  aa4=c(8.335, 21.054, 30.217, 152.358),
  bb4=c(3.57, 5.26, 3.36, .15),
  aa9=c(8.335, 68.391, 172.529, 152.358),
  bb9=c(3.57, 3.60, 5.34, .15),
  aa11=c(8.335, 7.068, 30.217, 172.529, 92.103, 68.391, 117.083, 152.358),
  bb11=c(3.57, 1.77, 3.36, 5.34, 6.93, 3.60, 2.39, .15),
  aa13=c(8.82, 12.093, 10.456, 3.6, 3.739, 5.33595),
  bb13=c(13.23, 48.37, 24.40, 68.39, 375.53, 172.529)
)
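The listing above repeatedly turns the binary states of a node's parents into a row index (for example `4*PU[i] + 2*EP[i] + PL[i] + 1`), draws that row's probability from a Beta distribution, and then estimates conditional probabilities as ratios of counts. The following Python sketch illustrates that mechanism on a two-node net only. It is not the thesis model: the function names `cpt_row` and `simulate` and all parameter values in the usage note are hypothetical and chosen purely for illustration.

```python
import random


def cpt_row(*parents):
    """Map binary parent states (most significant first) to a 1-based row.

    Mirrors index expressions such as 2*PSW + OSW + 1 in the WinBUGS listing.
    """
    index = 0
    for state in parents:
        index = 2 * index + state
    return index + 1


def simulate(n, parent_ab, child_ab, seed=1):
    """Forward-sample a hypothetical two-node net n times.

    parent_ab: (a, b) Beta parameters for the parent's probability.
    child_ab:  list of (a, b) pairs, one per row of the child's table.
    Returns {parent_state: proportion of draws with child == 1}.
    """
    rng = random.Random(seed)
    count = {0: 0, 1: 0}
    events = {0: 0, 1: 0}
    for _ in range(n):
        # Draw the probability first, then the Bernoulli outcome,
        # as the listing does with dbeta followed by dbern.
        p = rng.betavariate(*parent_ab)
        parent = 1 if rng.random() < p else 0
        a, b = child_ab[cpt_row(parent) - 1]
        x = rng.betavariate(a, b)
        child = 1 if rng.random() < x else 0
        count[parent] += 1
        events[parent] += child
    # Conditional proportions, analogous to sum(CD1Gastro[])/sum(CD1[]).
    return {s: events[s] / count[s] for s in (0, 1) if count[s] > 0}
```

With illustrative values, `simulate(20000, (5, 5), [(2, 8), (8, 2)])` returns the two conditional proportions, and their ratio plays the role of the relative risks `r[9]`, `r[10]`, and so on in the listing.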

Appendix B

Supplementary materials for Chapter Six

B.1 Supplementary tables

The following tables summarise the elements of the model that remain constant across the dates considered. The symbols are those used in Chapter 6: σ is the square root of the unstructured variance on a particular date and at a particular depth, and κ is the square root of the spatially structured variance on a particular date and at a particular depth.
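As a rough illustration of how rows of the form (Day1, Day2, Est, q025, q975, Sig) can be produced from MCMC output, the Python sketch below takes posterior draws of a variance for each day, converts them to the square-root scale, and summarises every pairwise difference. It rests on several assumptions that are not stated in this appendix: that Est is the posterior mean, that the difference is Day1 minus Day2 (which appears consistent with Tables B.16 and B.17), and that quantiles are linearly interpolated. The function names are hypothetical.

```python
import math
from itertools import combinations


def quantile(sorted_values, q):
    """Linearly interpolated quantile of an already sorted list."""
    pos = q * (len(sorted_values) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    return sorted_values[lo] + (pos - lo) * (sorted_values[hi] - sorted_values[lo])


def pairwise_differences(variance_draws):
    """Summarise differences in sqrt(variance) for every pair of days.

    variance_draws: {day: [one variance draw per MCMC iteration]}.
    Returns rows (day1, day2, est, q025, q975, sig), where sig is True
    when the 95% credible interval does not include zero.
    """
    rows = []
    for day1, day2 in combinations(sorted(variance_draws), 2):
        # Assumed sign convention: Day1 minus Day2, iteration by iteration.
        diffs = sorted(
            math.sqrt(v1) - math.sqrt(v2)
            for v1, v2 in zip(variance_draws[day1], variance_draws[day2])
        )
        est = sum(diffs) / len(diffs)  # assumed: Est is the posterior mean
        q025 = quantile(diffs, 0.025)
        q975 = quantile(diffs, 0.975)
        sig = q025 > 0 or q975 < 0
        rows.append((day1, day2, est, q025, q975, sig))
    return rows
```

Differencing within each iteration, rather than differencing two separately computed summaries, keeps any posterior correlation between the two days in the interval.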


Table B.1 Differences in σ for depths from 20 cm to 100 cm

Depth  Day1  Day2     Est    q025    q975  Sig
 20    1     2      0.011  -0.029   0.071
 20    1     3      0.007  -0.042   0.067
 20    1     4      0.027   0.002   0.083  *
 20    1     5     -0.011  -0.062   0.058
 20    2     3     -0.004  -0.048   0.031
 20    2     4      0.016   0.001   0.046  *
 20    2     5     -0.022  -0.066   0.022
 20    3     4      0.020   0.001   0.062  *
 20    3     5     -0.018  -0.065   0.035
 20    4     5     -0.038  -0.076  -0.005  *
 40    1     2      0.013  -0.069   0.068
 40    1     3      0.018  -0.068   0.069
 40    1     4      0.052   0.008   0.083  *
 40    1     5      0.030  -0.019   0.066
 40    2     3      0.005  -0.083   0.083
 40    2     4      0.040   0.004   0.100  *
 40    2     5      0.017  -0.026   0.082
 40    3     4      0.034   0.004   0.103  *
 40    3     5      0.012  -0.027   0.083
 40    4     5     -0.023  -0.040  -0.006  *
 60    1     2      0.006  -0.040   0.044
 60    1     3      0.004  -0.051   0.045
 60    1     4      0.029   0.002   0.056  *
 60    1     5      0.022  -0.005   0.050
 60    2     3     -0.002  -0.057   0.048
 60    2     4      0.023  -0.001   0.060
 60    2     5      0.016  -0.008   0.054
 60    3     4      0.025  -0.001   0.072
 60    3     5      0.018  -0.009   0.066
 60    4     5     -0.007  -0.017   0.004
 80    1     2     -0.003  -0.028   0.021
 80    1     3     -0.004  -0.035   0.022
 80    1     4      0.008  -0.008   0.027
 80    1     5      0.013  -0.002   0.031
 80    2     3     -0.001  -0.033   0.028
 80    2     4      0.011  -0.007   0.034
 80    2     5      0.016  -0.002   0.038
 80    3     4      0.012  -0.008   0.041
 80    3     5      0.017  -0.002   0.044
 80    4     5      0.005  -0.005   0.015
100    1     2      0.001  -0.014   0.017
100    1     3     -0.002  -0.020   0.015
100    1     4      0.004  -0.009   0.018
100    1     5      0.009  -0.002   0.022
100    2     3     -0.003  -0.020   0.013
100    2     4      0.002  -0.010   0.016
100    2     5      0.008  -0.002   0.020
100    3     4      0.005  -0.009   0.022
100    3     5      0.011  -0.001   0.026
100    4     5      0.006  -0.003   0.015


Table B.2 Differences in σ for depths from 120 cm to 200 cm

Depth  Day1  Day2     Est    q025    q975  Sig
120    1     2      0.004  -0.006   0.015
120    1     3      0.001  -0.010   0.013
120    1     4      0.003  -0.007   0.014
120    1     5      0.007  -0.002   0.018
120    2     3     -0.003  -0.014   0.007
120    2     4     -0.001  -0.010   0.008
120    2     5      0.003  -0.004   0.011
120    3     4      0.002  -0.008   0.013
120    3     5      0.006  -0.002   0.016
120    4     5      0.004  -0.004   0.013
140    1     2      0.002  -0.004   0.009
140    1     3      0.001  -0.006   0.007
140    1     4     -0.000  -0.007   0.007
140    1     5      0.003  -0.003   0.009
140    2     3     -0.001  -0.008   0.005
140    2     4     -0.002  -0.009   0.004
140    2     5      0.001  -0.005   0.007
140    3     4     -0.001  -0.008   0.006
140    3     5      0.002  -0.004   0.009
140    4     5      0.003  -0.003   0.010
160    1     2      0.001  -0.005   0.007
160    1     3      0.001  -0.004   0.007
160    1     4      0.000  -0.005   0.006
160    1     5      0.002  -0.003   0.008
160    2     3      0.000  -0.005   0.006
160    2     4     -0.001  -0.006   0.005
160    2     5      0.001  -0.004   0.006
160    3     4     -0.001  -0.006   0.005
160    3     5      0.001  -0.004   0.006
160    4     5      0.002  -0.004   0.007
180    1     2      0.000  -0.005   0.006
180    1     3      0.001  -0.004   0.006
180    1     4      0.001  -0.004   0.006
180    1     5      0.001  -0.004   0.007
180    2     3      0.001  -0.004   0.006
180    2     4      0.001  -0.004   0.006
180    2     5      0.001  -0.004   0.006
180    3     4     -0.000  -0.005   0.005
180    3     5      0.000  -0.004   0.005
180    4     5      0.000  -0.004   0.005
200    1     2      0.000  -0.005   0.006
200    1     3     -0.000  -0.006   0.005
200    1     4      0.000  -0.005   0.006
200    1     5      0.001  -0.005   0.006
200    2     3     -0.000  -0.006   0.005
200    2     4      0.000  -0.005   0.005
200    2     5      0.001  -0.005   0.006
200    3     4      0.001  -0.005   0.006
200    3     5      0.001  -0.004   0.006
200    4     5      0.000  -0.005   0.006


Table B.3 Differences in σ for depths from 220 cm to 300 cm

Depth  Day1  Day2     Est    q025    q975  Sig
220    1     2     -0.004  -0.012   0.003
220    1     3     -0.005  -0.013   0.002
220    1     4     -0.004  -0.011   0.003
220    1     5     -0.005  -0.012   0.002
220    2     3     -0.001  -0.009   0.008
220    2     4      0.001  -0.007   0.009
220    2     5     -0.001  -0.009   0.008
220    3     4      0.001  -0.007   0.010
220    3     5      0.000  -0.008   0.009
220    4     5     -0.001  -0.009   0.007
240    1     2     -0.002  -0.011   0.007
240    1     3     -0.002  -0.012   0.007
240    1     4     -0.003  -0.013   0.007
240    1     5     -0.002  -0.011   0.007
240    2     3     -0.001  -0.010   0.009
240    2     4     -0.001  -0.011   0.009
240    2     5     -0.000  -0.010   0.009
240    3     4     -0.001  -0.011   0.009
240    3     5      0.000  -0.009   0.010
240    4     5      0.001  -0.009   0.011
260    1     2      0.002  -0.009   0.015
260    1     3      0.003  -0.009   0.015
260    1     4      0.003  -0.009   0.015
260    1     5      0.003  -0.009   0.015
260    2     3      0.000  -0.011   0.011
260    2     4      0.001  -0.010   0.011
260    2     5      0.001  -0.010   0.011
260    3     4      0.000  -0.010   0.011
260    3     5      0.000  -0.010   0.011
260    4     5      0.000  -0.010   0.011
280    1     2      0.000  -0.018   0.021
280    1     3      0.001  -0.017   0.022
280    1     4      0.002  -0.016   0.022
280    1     5      0.001  -0.016   0.022
280    2     3      0.001  -0.016   0.019
280    2     4      0.002  -0.015   0.019
280    2     5      0.001  -0.015   0.019
280    3     4      0.001  -0.016   0.017
280    3     5      0.000  -0.016   0.017
280    4     5     -0.000  -0.017   0.016
300    1     2     -0.004  -0.031   0.019
300    1     3     -0.002  -0.027   0.020
300    1     4     -0.002  -0.028   0.021
300    1     5     -0.004  -0.030   0.020
300    2     3      0.002  -0.025   0.030
300    2     4      0.002  -0.025   0.030
300    2     5      0.000  -0.028   0.030
300    3     4      0.000  -0.026   0.026
300    3     5     -0.001  -0.029   0.025
300    4     5     -0.001  -0.029   0.026


Table B.4 Differences in κ for depths from 20 cm to 100 cm

Depth  Day1  Day2     Est    q025    q975  Sig
 20    1     2     -0.043  -0.081  -0.012  *
 20    1     3      0.006  -0.026   0.032
 20    1     4      0.035   0.007   0.054  *
 20    1     5      0.015  -0.020   0.042
 20    2     3      0.049   0.021   0.080  *
 20    2     4      0.078   0.055   0.105  *
 20    2     5      0.058   0.028   0.091  *
 20    3     4      0.029   0.009   0.044  *
 20    3     5      0.009  -0.017   0.033
 20    4     5     -0.020  -0.039  -0.003  *
 40    1     2     -0.020  -0.053   0.022
 40    1     3     -0.027  -0.059   0.019
 40    1     4      0.015   0.002   0.038  *
 40    1     5      0.008  -0.007   0.032
 40    2     3     -0.007  -0.052   0.043
 40    2     4      0.035   0.004   0.062  *
 40    2     5      0.028  -0.004   0.056
 40    3     4      0.042   0.005   0.068  *
 40    3     5      0.035  -0.003   0.062
 40    4     5     -0.007  -0.013  -0.002  *
 60    1     2     -0.009  -0.026   0.011
 60    1     3     -0.014  -0.034   0.011
 60    1     4      0.012   0.002   0.023  *
 60    1     5      0.010  -0.001   0.022
 60    2     3     -0.005  -0.029   0.021
 60    2     4      0.020   0.004   0.034  *
 60    2     5      0.018   0.002   0.032  *
 60    3     4      0.025   0.004   0.042  *
 60    3     5      0.023   0.002   0.040  *
 60    4     5     -0.002  -0.005   0.000
 80    1     2     -0.003  -0.011   0.006
 80    1     3     -0.007  -0.018   0.004
 80    1     4      0.005  -0.000   0.011
 80    1     5      0.007   0.001   0.012  *
 80    2     3     -0.005  -0.016   0.007
 80    2     4      0.008   0.001   0.015  *
 80    2     5      0.009   0.002   0.016  *
 80    3     4      0.012   0.002   0.022  *
 80    3     5      0.014   0.004   0.023  *
 80    4     5      0.002  -0.001   0.004
100    1     2      0.001  -0.004   0.005
100    1     3     -0.002  -0.007   0.004
100    1     4      0.003  -0.001   0.007
100    1     5      0.005   0.002   0.009  *
100    2     3     -0.002  -0.008   0.003
100    2     4      0.002  -0.001   0.006
100    2     5      0.004   0.001   0.008  *
100    3     4      0.005   0.000   0.009  *
100    3     5      0.007   0.002   0.011  *
100    4     5      0.002  -0.000   0.004


Table B.5 Differences in κ for depths from 120 cm to 200 cm

Depth  Day1  Day2     Est    q025    q975  Sig
120    1     2      0.002  -0.001   0.005
120    1     3      0.000  -0.003   0.004
120    1     4      0.002  -0.001   0.005
120    1     5      0.003   0.001   0.006  *
120    2     3     -0.002  -0.004   0.001
120    2     4     -0.000  -0.002   0.002
120    2     5      0.001  -0.000   0.003
120    3     4      0.001  -0.001   0.004
120    3     5      0.003   0.001   0.006  *
120    4     5      0.002  -0.000   0.004
140    1     2      0.001  -0.001   0.002
140    1     3      0.000  -0.002   0.002
140    1     4     -0.000  -0.002   0.002
140    1     5      0.001  -0.000   0.002
140    2     3     -0.001  -0.002   0.001
140    2     4     -0.001  -0.002   0.001
140    2     5      0.000  -0.001   0.001
140    3     4     -0.000  -0.002   0.002
140    3     5      0.001  -0.000   0.002
140    4     5      0.001  -0.000   0.002
160    1     2      0.000  -0.001   0.001
160    1     3      0.000  -0.001   0.002
160    1     4      0.000  -0.001   0.001
160    1     5      0.001  -0.000   0.002
160    2     3      0.000  -0.001   0.001
160    2     4     -0.000  -0.001   0.001
160    2     5      0.000  -0.001   0.001
160    3     4     -0.000  -0.001   0.001
160    3     5      0.000  -0.001   0.001
160    4     5      0.000  -0.001   0.001
180    1     2      0.000  -0.001   0.001
180    1     3      0.000  -0.001   0.001
180    1     4      0.000  -0.001   0.001
180    1     5      0.000  -0.001   0.001
180    2     3      0.000  -0.001   0.001
180    2     4      0.000  -0.001   0.001
180    2     5      0.000  -0.001   0.001
180    3     4      0.000  -0.001   0.001
180    3     5      0.000  -0.001   0.001
180    4     5      0.000  -0.001   0.001
200    1     2      0.000  -0.001   0.001
200    1     3      0.000  -0.001   0.001
200    1     4      0.000  -0.001   0.001
200    1     5      0.000  -0.001   0.001
200    2     3     -0.000  -0.001   0.001
200    2     4      0.000  -0.001   0.001
200    2     5      0.000  -0.001   0.001
200    3     4      0.000  -0.001   0.001
200    3     5      0.000  -0.001   0.001
200    4     5      0.000  -0.001   0.001


Table B.6 Differences in κ for depths from 220 cm to 300 cm

Depth  Day1  Day2     Est    q025    q975  Sig
220    1     2     -0.001  -0.003   0.000
220    1     3     -0.001  -0.003   0.000
220    1     4     -0.001  -0.003   0.000
220    1     5     -0.001  -0.003   0.000
220    2     3     -0.000  -0.002   0.002
220    2     4      0.000  -0.002   0.002
220    2     5     -0.000  -0.002   0.002
220    3     4      0.000  -0.002   0.002
220    3     5     -0.000  -0.002   0.002
220    4     5     -0.000  -0.002   0.002
240    1     2     -0.000  -0.002   0.002
240    1     3     -0.001  -0.003   0.002
240    1     4     -0.001  -0.003   0.002
240    1     5     -0.000  -0.002   0.002
240    2     3     -0.000  -0.003   0.002
240    2     4     -0.000  -0.003   0.002
240    2     5      0.000  -0.002   0.002
240    3     4     -0.000  -0.003   0.002
240    3     5      0.001  -0.002   0.003
240    4     5      0.001  -0.002   0.003
260    1     2      0.003  -0.001   0.006
260    1     3      0.003  -0.001   0.006
260    1     4      0.002  -0.001   0.006
260    1     5      0.003  -0.001   0.007
260    2     3      0.000  -0.003   0.003
260    2     4     -0.000  -0.003   0.003
260    2     5      0.000  -0.003   0.003
260    3     4     -0.000  -0.003   0.003
260    3     5      0.000  -0.003   0.003
260    4     5      0.000  -0.002   0.003
280    1     2      0.005  -0.002   0.012
280    1     3      0.006  -0.000   0.013
280    1     4      0.007   0.000   0.014  *
280    1     5      0.007  -0.000   0.013
280    2     3      0.001  -0.004   0.006
280    2     4      0.002  -0.003   0.007
280    2     5      0.001  -0.004   0.007
280    3     4      0.001  -0.004   0.006
280    3     5      0.000  -0.005   0.005
280    4     5     -0.000  -0.005   0.004
300    1     2     -0.000  -0.010   0.010
300    1     3     -0.000  -0.010   0.010
300    1     4     -0.001  -0.011   0.009
300    1     5     -0.002  -0.012   0.009
300    2     3     -0.000  -0.011   0.011
300    2     4     -0.001  -0.012   0.010
300    2     5     -0.002  -0.013   0.010
300    3     4     -0.001  -0.012   0.010
300    3     5     -0.001  -0.012   0.010
300    4     5     -0.001  -0.012   0.011


Table B.7 Differences in slope from 200 cm - 300 cm for each treatment across days

Treatment  Day1  Day2     Est    q025    q975  Sig
1          1     2     -0.002  -0.016   0.012
1          1     3      0.014   0.000   0.028  *
1          1     4      0.020   0.007   0.034  *
1          1     5      0.001  -0.012   0.015
1          2     3      0.017   0.003   0.030  *
1          2     4      0.023   0.009   0.036  *
1          2     5      0.004  -0.010   0.017
1          3     4      0.006  -0.007   0.019
1          3     5     -0.013  -0.026   0.000
1          4     5     -0.019  -0.032  -0.006  *
2          1     2     -0.006  -0.020   0.008
2          1     3     -0.002  -0.016   0.012
2          1     4     -0.001  -0.015   0.013
2          1     5     -0.006  -0.019   0.008
2          2     3      0.004  -0.010   0.018
2          2     4      0.005  -0.009   0.018
2          2     5      0.000  -0.013   0.014
2          3     4      0.001  -0.012   0.014
2          3     5     -0.004  -0.017   0.010
2          4     5     -0.004  -0.018   0.008
3          1     2     -0.002  -0.016   0.012
3          1     3     -0.000  -0.014   0.014
3          1     4      0.004  -0.010   0.018
3          1     5     -0.000  -0.014   0.014
3          2     3      0.002  -0.012   0.015
3          2     4      0.006  -0.007   0.019
3          2     5      0.002  -0.012   0.015
3          3     4      0.004  -0.009   0.018
3          3     5      0.000  -0.013   0.013
3          4     5     -0.004  -0.017   0.009
4          1     2     -0.004  -0.019   0.010
4          1     3      0.000  -0.014   0.014
4          1     4      0.003  -0.011   0.017
4          1     5     -0.001  -0.016   0.013
4          2     3      0.005  -0.009   0.018
4          2     4      0.007  -0.007   0.020
4          2     5      0.003  -0.010   0.017
4          3     4      0.003  -0.011   0.016
4          3     5     -0.001  -0.015   0.012
4          4     5     -0.004  -0.017   0.009




Treatment  Day1  Day2     Est    q025    q975  Sig
5          1     2     -0.006  -0.020   0.008
5          1     3     -0.000  -0.014   0.014
5          1     4      0.002  -0.012   0.017
5          1     5     -0.008  -0.022   0.006
5          2     3      0.006  -0.008   0.020
5          2     4      0.008  -0.005   0.022
5          2     5     -0.002  -0.016   0.012
5          3     4      0.003  -0.011   0.016
5          3     5     -0.008  -0.022   0.005
5          4     5     -0.011  -0.024   0.002
6          1     2     -0.003  -0.017   0.010
6          1     3      0.000  -0.013   0.014
6          1     4      0.002  -0.011   0.016
6          1     5     -0.001  -0.015   0.012
6          2     3      0.004  -0.010   0.017
6          2     4      0.006  -0.008   0.019
6          2     5      0.002  -0.011   0.015
6          3     4      0.002  -0.011   0.015
6          3     5     -0.002  -0.015   0.012
6          4     5     -0.004  -0.016   0.009




Treatment  Day1  Day2     Est    q025    q975  Sig
7          1     2     -0.003  -0.016   0.010
7          1     3     -0.001  -0.014   0.013
7          1     4      0.005  -0.008   0.018
7          1     5     -0.002  -0.015   0.012
7          2     3      0.003  -0.010   0.015
7          2     4      0.008  -0.005   0.021
7          2     5      0.001  -0.012   0.014
7          3     4      0.005  -0.008   0.018
7          3     5     -0.001  -0.014   0.012
7          4     5     -0.006  -0.019   0.007
8          1     2     -0.004  -0.018   0.010
8          1     3      0.001  -0.013   0.016
8          1     4      0.005  -0.009   0.019
8          1     5      0.002  -0.012   0.015
8          2     3      0.005  -0.008   0.019
8          2     4      0.009  -0.005   0.023
8          2     5      0.006  -0.008   0.019
8          3     4      0.004  -0.010   0.017
8          3     5      0.000  -0.013   0.014
8          4     5     -0.003  -0.017   0.010
9          1     2     -0.004  -0.018   0.010
9          1     3     -0.016  -0.030  -0.003  *
9          1     4     -0.022  -0.036  -0.008  *
9          1     5     -0.007  -0.021   0.007
9          2     3     -0.013  -0.026   0.001
9          2     4     -0.018  -0.031  -0.004  *
9          2     5     -0.003  -0.017   0.010
9          3     4     -0.005  -0.018   0.008
9          3     5      0.009  -0.004   0.022
9          4     5      0.015   0.001   0.028  *


Table B.10 Differences in slopes for each treatment on day 1

Day  Trt1  Trt2     Est    q025    q975  Sig
1    1     2      0.003  -0.007   0.013
1    1     3     -0.008  -0.019   0.002
1    1     4      0.003  -0.008   0.013
1    1     5     -0.003  -0.013   0.008
1    1     6     -0.006  -0.017   0.004
1    1     7     -0.006  -0.016   0.004
1    1     8      0.007  -0.002   0.017
1    1     9     -0.006  -0.016   0.004
1    2     3     -0.011  -0.022  -0.001  *
1    2     4     -0.000  -0.010   0.010
1    2     5     -0.006  -0.016   0.005
1    2     6     -0.009  -0.019   0.002
1    2     7     -0.009  -0.018   0.001
1    2     8      0.004  -0.005   0.014
1    2     9     -0.009  -0.019   0.002
1    3     4      0.011   0.002   0.021  *
1    3     5      0.006  -0.005   0.016
1    3     6      0.002  -0.008   0.013
1    3     7      0.003  -0.007   0.013
1    3     8      0.016   0.006   0.025  *
1    3     9      0.003  -0.007   0.013
1    4     5     -0.006  -0.016   0.004
1    4     6     -0.009  -0.019   0.002
1    4     7     -0.009  -0.018   0.001
1    4     8      0.004  -0.005   0.014
1    4     9     -0.009  -0.018   0.001
1    5     6     -0.003  -0.014   0.008
1    5     7     -0.003  -0.013   0.007
1    5     8      0.010   0.000   0.020  *
1    5     9     -0.003  -0.013   0.007
1    6     7      0.000  -0.010   0.011
1    6     8      0.013   0.003   0.023  *
1    6     9      0.000  -0.010   0.010
1    7     8      0.013   0.004   0.022  *
1    7     9     -0.000  -0.010   0.010
1    8     9     -0.013  -0.023  -0.003  *




Day  Trt1  Trt2     Est    q025    q975  Sig
2    1     2      0.005  -0.004   0.015
2    1     3     -0.002  -0.012   0.007
2    1     4      0.005  -0.005   0.014
2    1     5      0.002  -0.008   0.011
2    1     6      0.000  -0.010   0.010
2    1     7     -0.002  -0.012   0.007
2    1     8      0.010   0.001   0.019  *
2    1     9     -0.002  -0.012   0.008
2    2     3     -0.008  -0.017   0.002
2    2     4     -0.000  -0.010   0.009
2    2     5     -0.004  -0.013   0.006
2    2     6     -0.005  -0.015   0.005
2    2     7     -0.007  -0.016   0.001
2    2     8      0.005  -0.004   0.014
2    2     9     -0.007  -0.017   0.003
2    3     4      0.007  -0.002   0.016
2    3     5      0.004  -0.006   0.014
2    3     6      0.003  -0.007   0.013
2    3     7      0.000  -0.009   0.009
2    3     8      0.013   0.004   0.022  *
2    3     9      0.001  -0.009   0.010
2    4     5     -0.003  -0.013   0.006
2    4     6     -0.005  -0.014   0.005
2    4     7     -0.007  -0.016   0.002
2    4     8      0.005  -0.003   0.014
2    4     9     -0.007  -0.015   0.002
2    5     6     -0.002  -0.011   0.009
2    5     7     -0.004  -0.013   0.005
2    5     8      0.009  -0.000   0.018
2    5     9     -0.004  -0.013   0.006
2    6     7     -0.002  -0.012   0.007
2    6     8      0.010   0.000   0.020  *
2    6     9     -0.002  -0.011   0.007
2    7     8      0.012   0.004   0.021  *
2    7     9      0.000  -0.009   0.010
2    8     9     -0.012  -0.021  -0.003  *




Day  Trt1  Trt2     Est    q025    q975  Sig
3    1     2     -0.011  -0.021  -0.002  *
3    1     3     -0.006  -0.016   0.003
3    1     4      0.003  -0.006   0.013
3    1     5     -0.003  -0.013   0.007
3    1     6     -0.006  -0.016   0.003
3    1     7     -0.006  -0.015   0.003
3    1     8      0.008  -0.001   0.017
3    1     9     -0.007  -0.017   0.002
3    2     3      0.005  -0.004   0.014
3    2     4      0.014   0.005   0.024  *
3    2     5      0.008  -0.001   0.018
3    2     6      0.005  -0.004   0.015
3    2     7      0.005  -0.003   0.014
3    2     8      0.019   0.010   0.028  *
3    2     9      0.004  -0.005   0.014
3    3     4      0.010   0.001   0.019  *
3    3     5      0.003  -0.006   0.013
3    3     6      0.000  -0.009   0.010
3    3     7      0.001  -0.008   0.010
3    3     8      0.014   0.005   0.023  *
3    3     9     -0.001  -0.010   0.009
3    4     5     -0.006  -0.016   0.003
3    4     6     -0.009  -0.019   0.000
3    4     7     -0.009  -0.018  -0.000  *
3    4     8      0.005  -0.004   0.013
3    4     9     -0.010  -0.019  -0.001  *
3    5     6     -0.003  -0.013   0.007
3    5     7     -0.003  -0.012   0.006
3    5     8      0.011   0.001   0.020  *
3    5     9     -0.004  -0.013   0.005
3    6     7      0.000  -0.009   0.009
3    6     8      0.014   0.004   0.023  *
3    6     9     -0.001  -0.010   0.008
3    7     8      0.014   0.005   0.022  *
3    7     9     -0.001  -0.011   0.008
3    8     9     -0.015  -0.024  -0.006  *




Day  Trt1  Trt2     Est    q025    q975  Sig
4    1     2     -0.017  -0.027  -0.008  *
4    1     3     -0.007  -0.017   0.002
4    1     4     -0.001  -0.011   0.008
4    1     5     -0.006  -0.015   0.004
4    1     6     -0.008  -0.018   0.001
4    1     7     -0.008  -0.017   0.001
4    1     8      0.003  -0.006   0.012
4    1     9     -0.011  -0.021  -0.002  *
4    2     3      0.010   0.001   0.020  *
4    2     4      0.016   0.007   0.025  *
4    2     5      0.012   0.002   0.021  *
4    2     6      0.009  -0.000   0.019
4    2     7      0.010   0.001   0.018  *
4    2     8      0.020   0.011   0.029  *
4    2     9      0.007  -0.002   0.016
4    3     4      0.006  -0.003   0.015
4    3     5      0.002  -0.008   0.011
4    3     6     -0.001  -0.011   0.008
4    3     7     -0.001  -0.009   0.008
4    3     8      0.010   0.001   0.019  *
4    3     9     -0.004  -0.013   0.006
4    4     5     -0.004  -0.014   0.005
4    4     6     -0.007  -0.017   0.002
4    4     7     -0.007  -0.015   0.002
4    4     8      0.004  -0.005   0.013
4    4     9     -0.010  -0.018  -0.001  *
4    5     6     -0.003  -0.012   0.007
4    5     7     -0.002  -0.011   0.007
4    5     8      0.008  -0.001   0.017
4    5     9     -0.005  -0.014   0.004
4    6     7      0.001  -0.008   0.010
4    6     8      0.011   0.002   0.020  *
4    6     9     -0.002  -0.011   0.006
4    7     8      0.010   0.002   0.019  *
4    7     9     -0.003  -0.012   0.006
4    8     9     -0.014  -0.022  -0.005  *




Day  Trt1  Trt2     Est    q025    q975  Sig
5    1     2      0.002  -0.007   0.011
5    1     3     -0.003  -0.012   0.006
5    1     4      0.003  -0.006   0.012
5    1     5     -0.001  -0.011   0.008
5    1     6      0.002  -0.007   0.011
5    1     7     -0.004  -0.013   0.005
5    1     8      0.009  -0.000   0.018
5    1     9     -0.007  -0.017   0.002
5    2     3     -0.004  -0.014   0.005
5    2     4      0.001  -0.008   0.011
5    2     5     -0.003  -0.013   0.006
5    2     6      0.001  -0.009   0.010
5    2     7     -0.006  -0.015   0.003
5    2     8      0.007  -0.001   0.017
5    2     9     -0.009  -0.018   0.001
5    3     4      0.006  -0.003   0.015
5    3     5      0.001  -0.008   0.011
5    3     6      0.005  -0.005   0.015
5    3     7     -0.002  -0.010   0.007
5    3     8      0.012   0.003   0.020  *
5    3     9     -0.005  -0.014   0.004
5    4     5     -0.004  -0.013   0.005
5    4     6     -0.001  -0.010   0.009
5    4     7     -0.007  -0.016   0.001
5    4     8      0.006  -0.003   0.015
5    4     9     -0.010  -0.019  -0.001  *
5    5     6      0.004  -0.006   0.013
5    5     7     -0.003  -0.012   0.006
5    5     8      0.010   0.002   0.020  *
5    5     9     -0.006  -0.015   0.003
5    6     7     -0.007  -0.015   0.002
5    6     8      0.007  -0.003   0.016
5    6     9     -0.010  -0.019  -0.001  *
5    7     8      0.013   0.005   0.022  *
5    7     9     -0.003  -0.012   0.006
5    8     9     -0.016  -0.025  -0.008  *


Table B.15 Slopes for segment 200 cm - 300 cm for each treatment

Treatment  Day (Date)     Est    q025    q975  Sig
1          1           -0.002  -0.010   0.005
1          2            0.001  -0.006   0.008
1          3           -0.001  -0.008   0.005
1          4           -0.005  -0.012   0.001
1          5           -0.002  -0.009   0.005
2          1           -0.005  -0.013   0.002
2          2           -0.004  -0.011   0.003
2          3            0.010   0.003   0.017  *
2          4            0.012   0.005   0.019  *
2          5           -0.003  -0.010   0.003
3          1            0.006  -0.001   0.014
3          2            0.004  -0.003   0.011
3          3            0.005  -0.002   0.012
3          4            0.002  -0.005   0.009
3          5            0.001  -0.006   0.008
4          1           -0.005  -0.012   0.002
4          2           -0.004  -0.010   0.003
4          3           -0.004  -0.011   0.002
4          4           -0.004  -0.010   0.002
4          5           -0.005  -0.011   0.001
5          1            0.001  -0.007   0.008
5          2           -0.000  -0.007   0.007
5          3            0.002  -0.005   0.009
5          4            0.000  -0.006   0.007
5          5           -0.000  -0.007   0.006
6          1            0.004  -0.004   0.012
6          2            0.001  -0.006   0.008
6          3            0.005  -0.003   0.012
6          4            0.003  -0.004   0.010
6          5           -0.004  -0.011   0.003
7          1            0.003  -0.003   0.010
7          2            0.003  -0.003   0.010
7          3            0.005  -0.002   0.011
7          4            0.003  -0.003   0.009
7          5            0.002  -0.004   0.008
8          1           -0.009  -0.016  -0.003  *
8          2           -0.009  -0.016  -0.002  *
8          3           -0.009  -0.015  -0.003  *
8          4           -0.008  -0.014  -0.002  *
8          5           -0.011  -0.017  -0.005  *
9          1            0.004  -0.003   0.011
9          2            0.003  -0.004   0.010
9          3            0.006  -0.001   0.013
9          4            0.006  -0.001   0.012
9          5            0.005  -0.001   0.012

* indicates 95% credible interval does not include zero.


Table B.16 Slopes for segment 200 cm - 300 cm for Groupings

Cropping           Day     Est    q025    q975  Sig
Long fallowing     1    -0.000  -0.005   0.004
Long fallowing     2     0.000  -0.004   0.004
Long fallowing     3     0.005   0.000   0.009  *
Long fallowing     4     0.003  -0.001   0.007
Long fallowing     5    -0.001  -0.006   0.002
Response cropping  1     0.002  -0.003   0.008
Response cropping  2     0.000  -0.004   0.006
Response cropping  3     0.003  -0.002   0.008
Response cropping  4     0.002  -0.003   0.007
Response cropping  5    -0.002  -0.007   0.003
Pastures           1    -0.001  -0.005   0.004
Pastures           2    -0.001  -0.005   0.003
Pastures           3     0.000  -0.003   0.004
Pastures           4     0.000  -0.004   0.004
Pastures           5    -0.001  -0.005   0.003


Table B.17 Differences in slopes for each group across days

Group              Day1  Day2     Est    q025    q975  Sig
Long fallowing     1     2     -0.001  -0.007   0.006
Long fallowing     1     3     -0.005  -0.011   0.001
Long fallowing     1     4     -0.003  -0.009   0.003
Long fallowing     1     5      0.001  -0.005   0.007
Long fallowing     2     3     -0.004  -0.010   0.002
Long fallowing     2     4     -0.003  -0.009   0.003
Long fallowing     2     5      0.002  -0.004   0.008
Long fallowing     3     4      0.002  -0.004   0.007
Long fallowing     3     5      0.006   0.000   0.012  *
Long fallowing     4     5      0.005  -0.001   0.010
Response cropping  1     2      0.002  -0.006   0.009
Response cropping  1     3     -0.001  -0.008   0.007
Response cropping  1     4      0.000  -0.007   0.008
Response cropping  1     5      0.004  -0.003   0.012
Response cropping  2     3     -0.003  -0.010   0.004
Response cropping  2     4     -0.001  -0.008   0.006
Response cropping  2     5      0.003  -0.004   0.010
Response cropping  3     4      0.001  -0.006   0.008
Response cropping  3     5      0.005  -0.001   0.013
Response cropping  4     5      0.004  -0.003   0.011
Pastures           1     2      0.000  -0.006   0.006
Pastures           1     3     -0.001  -0.007   0.005
Pastures           1     4     -0.001  -0.006   0.005
Pastures           1     5      0.000  -0.005   0.006
Pastures           2     3     -0.001  -0.007   0.004
Pastures           2     4     -0.001  -0.007   0.004
Pastures           2     5      0.000  -0.005   0.006
Pastures           3     4      0.000  -0.005   0.006
Pastures           3     5      0.001  -0.004   0.007
Pastures           4     5      0.001  -0.004   0.006


Table B.18 Contrasts compared between days

Contrast Depth Day1 1 2 3 4 5

Long Fallow - Response 100 1 + + + +

2 -3 -4 -5 -

Long Fallow - Response 120 1 + + + +

2 -3 - -4 - +

5 -

Long Fallow - Response 140 1 + +

2 - + -3 - - - -4 + + +

5 + -


2 - + -3 - - - -4 + + +

5 + -


2 - +

3 - - - -4 +

5 +

Long Fallow - Response 200 12345

Long Fallow - Response 220 123 -45 +

Long Fallow - Response 240 123 -45 +







Cropping - Pastures 100 1 + -2 - - - -3 + -4 + -5 + + + +

Cropping - Pastures 120 1 + + - -2 - - -3 - - -4 + + + -5 + + + +

Cropping - Pastures 140 1 + - -2 - -3 - - -4 + + + -5 + + + +

Cropping - Pastures 160 1 + - -2 - -3 - - -4 + + + -5 + + + +

Cropping - Pastures 180 1 - -2 - -3 - -4 + + + -5 + + + +

Cropping - Pastures 200 1 - -2 -3 -4 + -5 + + + +

Cropping - Pastures 220 1 - -2 -3 -4 + -5 + + + +

Cropping - Pastures 240 1 - -2 -3 -4 +

5 + + +

Cropping - Pastures 260 1 - -2 -34 +

5 + +

Cropping - Pastures 280 12345

Cropping - Pastures 300 12345




Lucerne mixtures - Native 100 1 - - + +

2 + - + +

3 + + + +

4 - - - -5 - - - +

Lucerne mixtures - Native 120 1 - - +

2 + - + +

3 + + + +

4 - - - -5 - - +

Lucerne mixtures - Native 140 1 -2 +

3 + + +

4 - -5 -

Lucerne mixtures - Native 160 1 -23 + + +

4 -5 -


45

Lucerne mixtures - Native 200 1 - -234 +

5 +

Lucerne mixtures - Native 220 1 - - -23 +

4 +

5 +


5

Lucerne mixtures - Native 260 12345




Table B.21 Contrasts (1)

Contrast                Depth     Est    q025    q975  Sig
Long Fallow - Response   20     0.203   0.151   0.254  *
Long Fallow - Response   20    -0.069  -0.134  -0.003  *
Long Fallow - Response   20     0.225   0.171   0.278  *
Long Fallow - Response   20     .       .       .
Long Fallow - Response   20     0.123   0.087   0.161  *
Long Fallow - Response   40     0.171   0.137   0.204  *
Long Fallow - Response   40     .       .       .
Long Fallow - Response   40     0.166   0.132   0.203  *
Long Fallow - Response   40     0.012   0.001   0.023  *
Long Fallow - Response   40     0.088   0.064   0.112  *
Long Fallow - Response   60     0.139   0.117   0.162  *
Long Fallow - Response   60     .       .       .
Long Fallow - Response   60     0.108   0.083   0.134  *
Long Fallow - Response   60     .       .       .
Long Fallow - Response   60     0.052   0.038   0.067  *
Long Fallow - Response   80     0.108   0.081   0.135  *
Long Fallow - Response   80     0.030   0.002   0.058  *
Long Fallow - Response   80     0.050   0.019   0.081  *
Long Fallow - Response   80     .       .       .
Long Fallow - Response   80     0.017   0.000   0.034  *
Long Fallow - Response  100     0.084   0.066   0.102  *
Long Fallow - Response  100     0.024   0.006   0.042  *
Long Fallow - Response  100     0.028   0.007   0.048  *
Long Fallow - Response  100     0.020   0.007   0.032  *
Long Fallow - Response  100     0.017   0.006   0.028  *
Long Fallow - Response  120     0.060   0.047   0.073  *
Long Fallow - Response  120     0.018   0.006   0.030  *
Long Fallow - Response  120     .       .       .
Long Fallow - Response  120     0.031   0.020   0.042  *
Long Fallow - Response  120     0.017   0.008   0.027  *
Long Fallow - Response  140     0.036   0.021   0.053  *
Long Fallow - Response  140     .       .       .
Long Fallow - Response  140    -0.016  -0.031  -0.001  *
Long Fallow - Response  140     0.042   0.027   0.057  *
Long Fallow - Response  140     0.017   0.004   0.030  *




Contrast                Depth     Est    q025    q975  Sig
Long Fallow - Response  160     0.033   0.022   0.043  *
Long Fallow - Response  160     0.014   0.005   0.024  *
Long Fallow - Response  160     .       .       .
Long Fallow - Response  160     0.034   0.024   0.044  *
Long Fallow - Response  160     0.020   0.012   0.029  *
Long Fallow - Response  180     0.029   0.020   0.038  *
Long Fallow - Response  180     0.016   0.007   0.024  *
Long Fallow - Response  180     .       .       .
Long Fallow - Response  180     0.026   0.018   0.035  *
Long Fallow - Response  180     0.023   0.015   0.032  *
Long Fallow - Response  200     0.025   0.012   0.038  *
Long Fallow - Response  200     0.017   0.005   0.030  *
Long Fallow - Response  200     .       .       .
Long Fallow - Response  200     0.019   0.006   0.031  *
Long Fallow - Response  200     0.027   0.014   0.039  *
Long Fallow - Response  220     0.022   0.013   0.032  *
Long Fallow - Response  220     0.017   0.008   0.027  *
Long Fallow - Response  220     0.011   0.001   0.021  *
Long Fallow - Response  220     0.020   0.010   0.029  *
Long Fallow - Response  220     0.027   0.018   0.036  *
Long Fallow - Response  240     0.020   0.009   0.030  *
Long Fallow - Response  240     0.017   0.008   0.027  *
Long Fallow - Response  240     0.013   0.003   0.023  *
Long Fallow - Response  240     0.021   0.011   0.031  *
Long Fallow - Response  240     0.028   0.019   0.038  *
Long Fallow - Response  260     0.017   0.003   0.032  *
Long Fallow - Response  260     0.017   0.004   0.031  *
Long Fallow - Response  260     0.014   0.001   0.028  *
Long Fallow - Response  260     0.022   0.008   0.036  *
Long Fallow - Response  260     0.029   0.016   0.042  *
Long Fallow - Response  280     .       .       .
Long Fallow - Response  280     .       .       .
Long Fallow - Response  280     .       .       .
Long Fallow - Response  280     0.023   0.004   0.042  *
Long Fallow - Response  280     0.030   0.011   0.048  *
Long Fallow - Response  300     .       .       .
Long Fallow - Response  300     .       .       .
Long Fallow - Response  300     .       .       .
Long Fallow - Response  300     .       .       .
Long Fallow - Response  300     0.030   0.006   0.054  *




Contrast             Depth     Est    q025    q975  Sig
Cropping - Pastures   20     0.362   0.324   0.398  *
Cropping - Pastures   20     0.429   0.382   0.475  *
Cropping - Pastures   20     0.279   0.239   0.319  *
Cropping - Pastures   20     0.119   0.107   0.131  *
Cropping - Pastures   20     0.489   0.462   0.516  *
Cropping - Pastures   40     0.335   0.311   0.359  *
Cropping - Pastures   40     0.365   0.335   0.395  *
Cropping - Pastures   40     0.277   0.251   0.303  *
Cropping - Pastures   40     0.166   0.158   0.174  *
Cropping - Pastures   40     0.433   0.416   0.450  *
Cropping - Pastures   60     0.309   0.293   0.324  *
Cropping - Pastures   60     0.301   0.284   0.320  *
Cropping - Pastures   60     0.275   0.257   0.294  *
Cropping - Pastures   60     0.213   0.204   0.222  *
Cropping - Pastures   60     0.377   0.366   0.387  *
Cropping - Pastures   80     0.282   0.263   0.302  *
Cropping - Pastures   80     0.237   0.217   0.258  *
Cropping - Pastures   80     0.273   0.250   0.296  *
Cropping - Pastures   80     0.260   0.246   0.273  *
Cropping - Pastures   80     0.320   0.308   0.333  *
Cropping - Pastures  100     0.232   0.219   0.245  *
Cropping - Pastures  100     0.197   0.184   0.211  *
Cropping - Pastures  100     0.218   0.203   0.233  *
Cropping - Pastures  100     0.228   0.218   0.237  *
Cropping - Pastures  100     0.273   0.265   0.282  *
Cropping - Pastures  120     0.182   0.173   0.191  *
Cropping - Pastures  120     0.157   0.149   0.166  *
Cropping - Pastures  120     0.163   0.154   0.173  *
Cropping - Pastures  120     0.196   0.188   0.204  *
Cropping - Pastures  120     0.227   0.220   0.234  *
Cropping - Pastures  140     0.133   0.121   0.144  *
Cropping - Pastures  140     0.117   0.107   0.128  *
Cropping - Pastures  140     0.108   0.097   0.119  *
Cropping - Pastures  140     0.164   0.153   0.175  *
Cropping - Pastures  140     0.180   0.170   0.190  *




Contrast             Depth     Est    q025    q975  Sig
Cropping - Pastures  160     0.113   0.105   0.120  *
Cropping - Pastures  160     0.104   0.098   0.111  *
Cropping - Pastures  160     0.098   0.091   0.105  *
Cropping - Pastures  160     0.138   0.131   0.145  *
Cropping - Pastures  160     0.153   0.147   0.160  *
Cropping - Pastures  180     0.093   0.087   0.100  *
Cropping - Pastures  180     0.091   0.085   0.097  *
Cropping - Pastures  180     0.088   0.081   0.094  *
Cropping - Pastures  180     0.113   0.107   0.119  *
Cropping - Pastures  180     0.127   0.121   0.133  *
Cropping - Pastures  200     0.074   0.064   0.083  *
Cropping - Pastures  200     0.078   0.069   0.087  *
Cropping - Pastures  200     0.077   0.068   0.086  *
Cropping - Pastures  200     0.087   0.078   0.096  *
Cropping - Pastures  200     0.100   0.091   0.109  *
Cropping - Pastures  220     0.074   0.067   0.081  *
Cropping - Pastures  220     0.078   0.071   0.086  *
Cropping - Pastures  220     0.079   0.072   0.086  *
Cropping - Pastures  220     0.088   0.081   0.096  *
Cropping - Pastures  220     0.099   0.092   0.106  *
Cropping - Pastures  240     0.075   0.067   0.082  *
Cropping - Pastures  240     0.079   0.071   0.086  *
Cropping - Pastures  240     0.081   0.074   0.089  *
Cropping - Pastures  240     0.090   0.082   0.097  *
Cropping - Pastures  240     0.097   0.090   0.105  *
Cropping - Pastures  260     0.075   0.064   0.086  *
Cropping - Pastures  260     0.080   0.069   0.090  *
Cropping - Pastures  260     0.084   0.073   0.094  *
Cropping - Pastures  260     0.091   0.080   0.101  *
Cropping - Pastures  260     0.096   0.086   0.106  *
Cropping - Pastures  280     0.076   0.060   0.090  *
Cropping - Pastures  280     0.080   0.066   0.094  *
Cropping - Pastures  280     0.086   0.072   0.099  *
Cropping - Pastures  280     0.092   0.078   0.106  *
Cropping - Pastures  280     0.095   0.081   0.108  *
Cropping - Pastures  300     0.076   0.056   0.095  *
Cropping - Pastures  300     0.081   0.063   0.098  *
Cropping - Pastures  300     0.088   0.070   0.106  *
Cropping - Pastures  300     0.093   0.075   0.111  *
Cropping - Pastures  300     0.093   0.076   0.111  *




Contrast                   Depth     Est    q025    q975  Sig
Lucerne mixtures - Native   20     .       .       .
Lucerne mixtures - Native   20     .       .       .
Lucerne mixtures - Native   20     .       .       .
Lucerne mixtures - Native   20    -0.175  -0.197  -0.154  *
Lucerne mixtures - Native   20    -0.230  -0.277  -0.182  *
Lucerne mixtures - Native   40    -0.102  -0.144  -0.061  *
Lucerne mixtures - Native   40     .       .       .
Lucerne mixtures - Native   40     .       .       .
Lucerne mixtures - Native   40    -0.235  -0.248  -0.220  *
Lucerne mixtures - Native   40    -0.236  -0.266  -0.205  *
Lucerne mixtures - Native   60    -0.155  -0.182  -0.126  *
Lucerne mixtures - Native   60    -0.092  -0.124  -0.059  *
Lucerne mixtures - Native   60     .       .       .
Lucerne mixtures - Native   60    -0.294  -0.309  -0.279  *
Lucerne mixtures - Native   60    -0.243  -0.261  -0.223  *
Lucerne mixtures - Native   80    -0.207  -0.241  -0.173  *
Lucerne mixtures - Native   80    -0.161  -0.197  -0.125  *
Lucerne mixtures - Native   80     .       .       .
Lucerne mixtures - Native   80    -0.354  -0.378  -0.330  *
Lucerne mixtures - Native   80    -0.249  -0.271  -0.227  *
Lucerne mixtures - Native  100    -0.170  -0.193  -0.147  *
Lucerne mixtures - Native  100    -0.135  -0.158  -0.111  *
Lucerne mixtures - Native  100    -0.036  -0.062  -0.010  *
Lucerne mixtures - Native  100    -0.274  -0.289  -0.258  *
Lucerne mixtures - Native  100    -0.199  -0.213  -0.184  *
Lucerne mixtures - Native  120    -0.133  -0.148  -0.118  *
Lucerne mixtures - Native  120    -0.108  -0.123  -0.094  *
Lucerne mixtures - Native  120    -0.051  -0.067  -0.035  *
Lucerne mixtures - Native  120    -0.194  -0.208  -0.181  *
Lucerne mixtures - Native  120    -0.149  -0.160  -0.137  *
Lucerne mixtures - Native  140    -0.096  -0.115  -0.076  *
Lucerne mixtures - Native  140    -0.082  -0.100  -0.064  *
Lucerne mixtures - Native  140    -0.066  -0.085  -0.047  *
Lucerne mixtures - Native  140    -0.114  -0.133  -0.096  *
Lucerne mixtures - Native  140    -0.098  -0.115  -0.082  *




Contrast                   Depth     Est    q025    q975  Sig
Lucerne mixtures - Native  160    -0.082  -0.095  -0.070  *
Lucerne mixtures - Native  160    -0.071  -0.083  -0.059  *
Lucerne mixtures - Native  160    -0.056  -0.068  -0.044  *
Lucerne mixtures - Native  160    -0.086  -0.099  -0.074  *
Lucerne mixtures - Native  160    -0.076  -0.087  -0.065  *
Lucerne mixtures - Native  180    -0.069  -0.080  -0.057  *
Lucerne mixtures - Native  180    -0.060  -0.071  -0.049  *
Lucerne mixtures - Native  180    -0.046  -0.057  -0.035  *
Lucerne mixtures - Native  180    -0.059  -0.070  -0.048  *
Lucerne mixtures - Native  180    -0.053  -0.064  -0.043  *
Lucerne mixtures - Native  200    -0.055  -0.071  -0.038  *
Lucerne mixtures - Native  200    -0.049  -0.066  -0.033  *
Lucerne mixtures - Native  200    -0.036  -0.052  -0.019  *
Lucerne mixtures - Native  200    -0.031  -0.047  -0.015  *
Lucerne mixtures - Native  200    -0.031  -0.046  -0.015  *
Lucerne mixtures - Native  220    -0.062  -0.074  -0.049  *
Lucerne mixtures - Native  220    -0.055  -0.068  -0.043  *
Lucerne mixtures - Native  220    -0.044  -0.056  -0.031  *
Lucerne mixtures - Native  220    -0.039  -0.051  -0.027  *
Lucerne mixtures - Native  220    -0.040  -0.052  -0.028  *
Lucerne mixtures - Native  240    -0.068  -0.082  -0.054  *
Lucerne mixtures - Native  240    -0.061  -0.074  -0.048  *
Lucerne mixtures - Native  240    -0.052  -0.065  -0.038  *
Lucerne mixtures - Native  240    -0.048  -0.060  -0.034  *
Lucerne mixtures - Native  240    -0.050  -0.062  -0.037  *
Lucerne mixtures - Native  260    -0.075  -0.095  -0.055  *
Lucerne mixtures - Native  260    -0.067  -0.085  -0.049  *
Lucerne mixtures - Native  260    -0.060  -0.077  -0.042  *
Lucerne mixtures - Native  260    -0.056  -0.074  -0.038  *
Lucerne mixtures - Native  260    -0.060  -0.077  -0.042  *
Lucerne mixtures - Native  280    -0.082  -0.109  -0.054  *
Lucerne mixtures - Native  280    -0.073  -0.098  -0.048  *
Lucerne mixtures - Native  280    -0.068  -0.092  -0.044  *
Lucerne mixtures - Native  280    -0.064  -0.089  -0.041  *
Lucerne mixtures - Native  280    -0.069  -0.093  -0.045  *
Lucerne mixtures - Native  300    -0.089  -0.123  -0.053  *
Lucerne mixtures - Native  300    -0.079  -0.111  -0.047  *
Lucerne mixtures - Native  300    -0.076  -0.107  -0.045  *
Lucerne mixtures - Native  300    -0.073  -0.104  -0.042  *
Lucerne mixtures - Native  300    -0.079  -0.109  -0.047  *


Figure B.1 Spatial random components: Day 1, Depth 20 cm.

B.2 Supplementary Graphs: Contour Graphs for the spatial residuals


[Figures: contour graphs of the spatial residuals. Horizontal axis 5050 to 5200; vertical axis 4800 to 5050.]

Appendix C

Supplementary graphs and tables for Chapter 7

C.1 Graphs: Method 1, Method 2 random walk and penalised spline smoothed models

The fits from Method 1 are those with the more realistic 95% credible intervals. The penalised spline (over time) models are included to show the seasonality at the shallower depths dampening until there is virtually no seasonality at the greater depths. The random walk fits are those from the chosen time series model.
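The graphs and tables in this appendix report each contrast as an estimate with a 95% credible interval summarised from MCMC iterates. The summary step can be sketched as below. This is a minimal illustration under stated assumptions, not the code used for the thesis: it assumes the estimate is the posterior mean and the interval is the equal-tailed pair of 2.5% and 97.5% quantiles (matching the column names q025 and q975 used in the tables). The function names and the simulated draws are hypothetical.

```python
import numpy as np


def summarise_contrast(draws):
    """Summarise MCMC draws of a contrast as (estimate, q025, q975).

    Assumption: the estimate is the posterior mean and the interval is the
    equal-tailed 95% credible interval (2.5% and 97.5% quantiles).
    """
    draws = np.asarray(draws, dtype=float)
    if draws.ndim != 1 or draws.size == 0:
        raise ValueError("draws must be a non-empty 1-D sequence")
    estimate = float(draws.mean())
    q025, q975 = (float(q) for q in np.quantile(draws, [0.025, 0.975]))
    return estimate, q025, q975


def excludes_zero(q025, q975):
    """True when the interval lies entirely on one side of zero."""
    return q025 > 0.0 or q975 < 0.0


# Hypothetical draws standing in for the MCMC iterates of one contrast
# (one depth, one trial day). The values are simulated, not thesis data.
rng = np.random.default_rng(0)
fake_draws = rng.normal(loc=0.05, scale=0.01, size=10_000)
est, lo, hi = summarise_contrast(fake_draws)
print(f"estimate={est:.3f}  q025={lo:.3f}  q975={hi:.3f}  "
      f"excludes zero: {excludes_zero(lo, hi)}")
```

The `excludes_zero` helper is only a reading aid: an interval lying wholly above or below zero indicates a contrast for which the two cropping systems differ at that depth and day.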


Figure C.1 Long fallowing vs Response cropping at depth 100 cm for all trial days. Saturated model. Summary of MCMC iterates from the full model for the contrast. Estimates & 95% CIs.

Figure C.2 Long fallowing vs Response cropping at depth 100 cm for all trial days. Non-parametric penalised spline smooth across dates. Estimates & 95% CIs.


Figure C.3 Long fallowing vs Response cropping at depth 100 cm for all trial days. Random Walk of order one. Estimates & 95% CIs.





Figure C.6 Long fallowing vs Response cropping at depth 120 cm for all trial days. Random Walk of order one. Estimates & 95% CIs.



Figure C.9 Long fallowing vs Response cropping at depth 220 cm for all trial days. Random Walk of order one. Estimates & 95% CIs.





Figure C.12 Long fallowing vs Response cropping at depth 160 cm for all trial days. Random Walk of order one. Estimates & 95% CIs.



Figure C.13 Long fallowing vs Response cropping at depth 180 cm for all trial days. Saturated model. Summary of MCMC iterates from the full model. Estimates & 95% CIs.




Figure C.15 Long fallowing vs Response cropping at depth 180 cm for all trial days. Random Walk of order two. Estimates & 95% CIs.






Figure C.22 100 cm: Long fallowing vs Response cropping with Prior 5 precisions applied to the random walk model. Estimates & 95% CIs.

Random walk models with Prior 5 precisions



Figure C.29 Square root of variances & 95% credible intervals at depth 100 cm. Unstructured: green with broader bars, spatial: blue with narrower bars.

Square root of the spatially structured and the unstructured variances





Figure C.36 Square root of unstructured variance: Days by Depth. Contour graph smooth.

Figure C.37 Square root of spatially structured variances: Days by Depth. Contour graph smooth.


Figure C.38 ρ & 95% credible intervals.


C.2 Final estimates and credible intervals for the contrast of long fallow cropping versus response cropping


Table C.1 Contrast estimates for depth 100 cm: Long fallow cropping vs Response cropping

Depth  Year  Estimate    q025    q975
100    1995     0.053   0.033   0.073
100    1995     0.049   0.029   0.070
100    1995     0.038   0.018   0.058
100    1995     0.027   0.008   0.047
100    1995    -0.023  -0.051   0.006
100    1995     0.017  -0.016   0.050
100    1996     0.021  -0.001   0.044
100    1996     0.006  -0.012   0.025
100    1996     0.002  -0.021   0.025
100    1996     0.038   0.008   0.070
100    1996     0.099   0.052   0.147
100    1996     0.096   0.043   0.148
100    1996     0.099   0.044   0.153
100    1996     0.100   0.047   0.153
100    1996     0.069   0.016   0.122
100    1996     0.069   0.034   0.104
100    1996     0.046   0.008   0.084
100    1997     0.046   0.009   0.083
100    1997     0.023  -0.014   0.061
100    1997     0.024  -0.006   0.054
100    1997     0.027   0.000   0.054
100    1997     0.075   0.046   0.105
100    1997     0.081   0.048   0.115
100    1997     0.085   0.051   0.120
100    1997     0.087   0.053   0.121
100    1997     0.085   0.051   0.119
100    1997     0.084   0.049   0.118
100    1997     0.057   0.022   0.091
100    1997     0.029  -0.003   0.061
100    1997     0.032   0.000   0.062
100    1998     0.029   0.000   0.059
100    1998     0.017  -0.012   0.046
100    1998     0.025  -0.006   0.057
100    1998     0.025  -0.011   0.060
100    1998     0.071   0.034   0.106
100    1998     0.047   0.007   0.086
100    1998     0.022  -0.005   0.048
100    1998     0.015  -0.012   0.041
100    1998     0.014  -0.009   0.039
100    1998     0.005  -0.017   0.028
100    1998     0.006  -0.016   0.029
100    1999     0.017  -0.004   0.037
100    1999     0.016  -0.005   0.037
100    1999     0.026   0.006   0.048
100    1999     0.037   0.016   0.059
100    1999     0.040   0.019   0.061
100    1999     0.041   0.019   0.062
100    1999     0.040   0.018   0.062
100    1999     0.024   0.002   0.046
100    1999     0.018  -0.005   0.042
100    1999     0.016  -0.008   0.041
100    1999     0.015  -0.008   0.039
100    2000     0.015  -0.005   0.036
100    2000     0.054   0.033   0.075
100    2000     0.072   0.051   0.094
100    2000     0.086   0.066   0.107




120    1995     0.027   0.010   0.044
120    1995     0.027   0.010   0.044
120    1995     0.020   0.003   0.037
120    1995     0.016  -0.001   0.033
120    1995    -0.026  -0.047  -0.005
120    1995    -0.005  -0.027   0.016
120    1996     0.028   0.007   0.050
120    1996     0.013  -0.005   0.032
120    1996     0.011  -0.010   0.032
120    1996     0.013  -0.015   0.041
120    1996     0.071   0.035   0.109
120    1996     0.065   0.022   0.107
120    1996     0.071   0.025   0.117
120    1996     0.079   0.036   0.122
120    1996     0.060   0.012   0.109
120    1996     0.061   0.024   0.098
120    1996     0.044   0.006   0.080
120    1997     0.058   0.020   0.095
120    1997     0.038   0.000   0.077
120    1997     0.033   0.001   0.065
120    1997     0.029   0.002   0.055
120    1997     0.046   0.020   0.074
120    1997     0.060   0.033   0.089
120    1997     0.060   0.032   0.087
120    1997     0.057   0.027   0.086
120    1997     0.056   0.024   0.087
120    1997     0.058   0.028   0.087
120    1997     0.042   0.013   0.071
120    1997     0.024  -0.001   0.048
120    1997     0.027   0.002   0.052
120    1998     0.028   0.004   0.051
120    1998    -0.001  -0.025   0.022
120    1998    -0.003  -0.028   0.022
120    1998     0.005  -0.023   0.033
120    1998     0.043   0.013   0.072
120    1998     0.029  -0.003   0.060
120    1998     0.037   0.011   0.063
120    1998     0.029   0.004   0.053
120    1998     0.029   0.006   0.053
120    1998     0.018  -0.004   0.040
120    1998     0.012  -0.009   0.033
120    1999     0.010  -0.010   0.031
120    1999     0.020  -0.001   0.040
120    1999     0.016  -0.005   0.036
120    1999     0.033   0.014   0.052
120    1999     0.033   0.013   0.052
120    1999     0.037   0.017   0.056
120    1999     0.032   0.013   0.052
120    1999     0.029   0.009   0.049
120    1999     0.021   0.002   0.041
120    1999     0.020   0.001   0.040
120    1999     0.024   0.005   0.044
120    2000     0.016  -0.004   0.037
120    2000     0.048   0.028   0.069
120    2000     0.065   0.045   0.086
120    2000     0.067   0.047   0.086




140    1995     0.010  -0.007   0.027
140    1995     0.012  -0.006   0.029
140    1995     0.005  -0.013   0.022
140    1995     0.010  -0.007   0.027
140    1995    -0.015  -0.033   0.004
140    1995    -0.005  -0.024   0.013
140    1996     0.020  -0.002   0.041
140    1996     0.016  -0.005   0.036
140    1996     0.008  -0.012   0.027
140    1996     0.003  -0.020   0.027
140    1996     0.030  -0.001   0.061
140    1996     0.028  -0.006   0.062
140    1996     0.033  -0.002   0.068
140    1996     0.046   0.013   0.080
140    1996     0.038   0.000   0.076
140    1996     0.041   0.009   0.072
140    1996     0.031  -0.001   0.065
140    1997     0.050   0.021   0.079
140    1997     0.038   0.010   0.066
140    1997     0.029   0.004   0.053
140    1997     0.024   0.003   0.045
140    1997     0.026   0.006   0.047
140    1997     0.040   0.019   0.061
140    1997     0.036   0.015   0.057
140    1997     0.044   0.022   0.067
140    1997     0.043   0.017   0.071
140    1997     0.036   0.014   0.059
140    1997     0.027   0.003   0.050
140    1997     0.013  -0.006   0.033
140    1997     0.018  -0.002   0.039
140    1998     0.019  -0.002   0.040
140    1998    -0.004  -0.022   0.014
140    1998    -0.009  -0.028   0.011
140    1998    -0.011  -0.032   0.010
140    1998     0.014  -0.007   0.035
140    1998     0.014  -0.008   0.036
140    1998     0.043   0.020   0.067
140    1998     0.042   0.021   0.064
140    1998     0.044   0.023   0.066
140    1998     0.021   0.001   0.042
140    1998     0.020  -0.000   0.039
140    1999     0.013  -0.006   0.032
140    1999     0.020   0.002   0.038
140    1999     0.017  -0.001   0.035
140    1999     0.022   0.004   0.040
140    1999     0.023   0.005   0.041
140    1999     0.022   0.003   0.041
140    1999     0.021   0.003   0.039
140    1999     0.019   0.001   0.037
140    1999     0.018   0.000   0.035
140    1999     0.020   0.001   0.038
140    1999     0.015  -0.002   0.033
140    2000     0.014  -0.004   0.033
140    2000     0.025   0.007   0.043
140    2000     0.047   0.028   0.067
140    2000     0.046   0.025   0.066




160    1995     0.002  -0.014   0.019
160    1995     0.001  -0.015   0.018
160    1995     0.003  -0.013   0.020
160    1995     0.004  -0.012   0.021
160    1995    -0.010  -0.026   0.008
160    1995    -0.003  -0.019   0.014
160    1996     0.011  -0.007   0.031
160    1996     0.011  -0.006   0.029
160    1996     0.008  -0.009   0.026
160    1996     0.000  -0.019   0.020
160    1996     0.005  -0.015   0.027
160    1996     0.012  -0.009   0.032
160    1996     0.011  -0.010   0.033
160    1996     0.023   0.003   0.043
160    1996     0.024   0.003   0.045
160    1996     0.022   0.001   0.043
160    1996     0.022   0.000   0.043
160    1997     0.027   0.006   0.048
160    1997     0.026   0.006   0.046
160    1997     0.027   0.008   0.046
160    1997     0.025   0.007   0.043
160    1997     0.018  -0.001   0.036
160    1997     0.029   0.010   0.048
160    1997     0.021   0.003   0.039
160    1997     0.026   0.006   0.045
160    1997     0.026   0.005   0.046
160    1997     0.026   0.008   0.045
160    1997     0.021   0.001   0.040
160    1997     0.011  -0.007   0.029
160    1997     0.012  -0.006   0.030
160    1998     0.013  -0.005   0.031
160    1998     0.007  -0.010   0.024
160    1998    -0.009  -0.025   0.007
160    1998    -0.009  -0.026   0.009
160    1998     0.002  -0.015   0.020
160    1998     0.003  -0.014   0.021
160    1998     0.036   0.018   0.054
160    1998     0.038   0.020   0.056
160    1998     0.041   0.022   0.058
160    1998     0.026   0.009   0.044
160    1998     0.022   0.005   0.040
160    1999     0.020   0.003   0.037
160    1999     0.016  -0.001   0.033
160    1999     0.018   0.002   0.035
160    1999     0.019   0.000   0.038
160    1999     0.016  -0.001   0.032
160    1999     0.019   0.002   0.036
160    1999     0.016  -0.002   0.032
160    1999     0.022   0.005   0.039
160    1999     0.019   0.002   0.034
160    1999     0.014  -0.002   0.031
160    1999     0.021   0.005   0.037
160    2000     0.014  -0.002   0.030
160    2000     0.016   0.000   0.032
160    2000     0.029   0.013   0.045
160    2000     0.030   0.011   0.048




180    1995    -0.000  -0.016   0.016
180    1995     0.003  -0.014   0.019
180    1995     0.002  -0.015   0.019
180    1995     0.006  -0.011   0.023
180    1995     0.001  -0.016   0.018
180    1995    -0.002  -0.018   0.015
180    1996     0.007  -0.011   0.024
180    1996     0.009  -0.008   0.026
180    1996     0.011  -0.006   0.028
180    1996     0.008  -0.009   0.025
180    1996     0.003  -0.016   0.021
180    1996     0.009  -0.010   0.028
180    1996     0.009  -0.010   0.027
180    1996     0.015  -0.004   0.034
180    1996     0.017  -0.003   0.036
180    1996     0.021   0.002   0.040
180    1996     0.018  -0.001   0.037
180    1997     0.022   0.004   0.041
180    1997     0.021   0.003   0.039
180    1997     0.024   0.006   0.042
180    1997     0.028   0.011   0.045
180    1997     0.021   0.003   0.038
180    1997     0.022   0.003   0.041
180    1997     0.022   0.004   0.040
180    1997     0.025   0.007   0.044
180    1997     0.011  -0.013   0.036
180    1997     0.022   0.004   0.040
180    1997     0.020   0.001   0.040
180    1997     0.015  -0.003   0.032
180    1997     0.014  -0.004   0.032
180    1998     0.017  -0.000   0.035
180    1998     0.014  -0.004   0.031
180    1998    -0.002  -0.019   0.016
180    1998    -0.004  -0.021   0.012
180    1998    -0.000  -0.018   0.017
180    1998     0.007  -0.010   0.024
180    1998     0.026   0.009   0.044
180    1998     0.025   0.008   0.042
180    1998     0.035   0.018   0.052
180    1998     0.027   0.010   0.044
180    1998     0.034   0.017   0.051
180    1999     0.021   0.004   0.038
180    1999     0.024   0.008   0.040
180    1999     0.020   0.004   0.037
180    1999     0.020   0.004   0.036
180    1999     0.014  -0.002   0.031
180    1999     0.017   0.001   0.034
180    1999     0.019   0.003   0.036
180    1999     0.018   0.002   0.035
180    1999     0.019   0.003   0.035
180    1999     0.017   0.001   0.032
180    1999     0.012  -0.004   0.028
180    2000     0.011  -0.004   0.027
180    2000     0.015  -0.001   0.031
180    2000     0.022   0.006   0.038
180    2000     0.024   0.008   0.040




200    1995     0.001  -0.016   0.018
200    1995     0.003  -0.014   0.019
200    1995     0.006  -0.012   0.023
200    1995     0.004  -0.013   0.021
200    1995     0.003  -0.015   0.020
200    1995     0.005  -0.013   0.023
200    1996     0.009  -0.009   0.027
200    1996     0.007  -0.011   0.024
200    1996     0.006  -0.012   0.023
200    1996     0.011  -0.007   0.028
200    1996     0.003  -0.014   0.021
200    1996     0.010  -0.007   0.029
200    1996     0.011  -0.006   0.029
200    1996     0.014  -0.003   0.032
200    1996     0.014  -0.003   0.032
200    1996     0.013  -0.004   0.031
200    1996     0.016  -0.001   0.034
200    1997     0.013  -0.004   0.030
200    1997     0.021   0.003   0.040
200    1997     0.016  -0.001   0.034
200    1997     0.021   0.003   0.038
200    1997     0.022   0.004   0.040
200    1997     0.022   0.004   0.039
200    1997     0.017  -0.001   0.034
200    1997     0.023   0.004   0.041
200    1997     0.026   0.008   0.043
200    1997     0.022   0.005   0.040
200    1997     0.019   0.000   0.038
200    1997     0.014  -0.004   0.032
200    1997     0.016  -0.001   0.032
200    1998     0.015  -0.003   0.032
200    1998     0.017  -0.001   0.035
200    1998     0.012  -0.006   0.031
200    1998     0.009  -0.009   0.027
200    1998     0.007  -0.011   0.025
200    1998     0.011  -0.006   0.029
200    1998     0.019   0.003   0.038
200    1998     0.014  -0.002   0.032
200    1998     0.025   0.007   0.043
200    1998     0.026   0.009   0.044
200    1998     0.028   0.012   0.045
200    1999     0.017   0.001   0.034
200    1999     0.027   0.010   0.044
200    1999     0.023   0.005   0.041
200    1999     0.015  -0.001   0.032
200    1999     0.017   0.001   0.034
200    1999     0.020   0.003   0.036
200    1999     0.019   0.003   0.035
200    1999     0.018   0.002   0.035
200    1999     0.019   0.003   0.035
200    1999     0.012  -0.004   0.029
200    1999     0.016  -0.000   0.033
200    2000     0.017  -0.001   0.034
200    2000     0.017   0.000   0.034
200    2000     0.017   0.001   0.034
200    2000     0.016   0.000   0.033




220    1995     0.014  -0.006   0.033
220    1995     0.014  -0.005   0.033
220    1995     0.016  -0.004   0.036
220    1995     0.013  -0.006   0.032
220    1995     0.013  -0.006   0.033
220    1995     0.016  -0.004   0.035
220    1996     0.015  -0.006   0.036
220    1996     0.023   0.004   0.043
220    1996     0.025   0.005   0.043
220    1996     0.024   0.004   0.043
220    1996     0.021   0.002   0.040
220    1996     0.023   0.004   0.043
220    1996     0.025   0.005   0.045
220    1996     0.021   0.002   0.040
220    1996     0.022   0.003   0.042
220    1996     0.024   0.005   0.044
220    1996     0.025   0.006   0.045
220    1997     0.032   0.012   0.051
220    1997     0.032   0.013   0.053
220    1997     0.026   0.007   0.045
220    1997     0.027   0.008   0.046
220    1997     0.021   0.001   0.041
220    1997     0.030   0.011   0.050
220    1997     0.025   0.005   0.045
220    1997     0.030   0.010   0.050
220    1997     0.023   0.003   0.042
220    1997     0.028   0.008   0.048
220    1997     0.026   0.005   0.047
220    1997     0.026   0.003   0.049
220    1997     0.028   0.006   0.050
220    1998     0.027   0.004   0.050
220    1998     0.024   0.001   0.048
220    1998     0.024   0.002   0.047
220    1998     0.023  -0.000   0.047
220    1998     0.015  -0.008   0.038
220    1998     0.019  -0.005   0.042
220    1998     0.031   0.008   0.054
220    1998     0.027   0.005   0.049
220    1998     0.031   0.008   0.055
220    1998     0.033   0.010   0.055
220    1998     0.035   0.012   0.058
220    1999     0.034   0.009   0.059
220    1999     0.032   0.008   0.055
220    1999     0.029   0.006   0.052
220    1999     0.030   0.007   0.053
220    1999     0.031   0.007   0.054
220    1999     0.026   0.003   0.049
220    1999     0.033   0.010   0.056
220    1999     0.030   0.007   0.053
220    1999     0.023   0.001   0.045
220    1999     0.024   0.002   0.045
220    1999     0.027   0.005   0.048
220    2000     0.022  -0.000   0.044
220    2000     0.021  -0.002   0.043
220    2000     0.023   0.000   0.045
220    2000     0.024   0.002   0.046

Full Reference List

Full Reference List



Adebayo, S. B. and L. Fahrmeir (2005). Analysing child mortality in Nigeria with geoadditive discrete-

time survival models. Statistics in Medicine 24(5), 709–728.

Aitkin, M. (1997). The calibration of P-values, posterior Bayes factors and the AIC from the posterior

distribution of the likelihood. Statistics and Computing 7, 253–261.

Akaike, H. (1973). Information theory and an extension of the maximum likelihood principle. In

B. Petrox and F. Caski (Eds.), Second International Symposium on Information Theory, Akademia

Kiado, Budapest, Hungary.

Albert, I., E. Grenier, J.-B. Denis, and J. Rousseau (2008). Quantitative Risk Assessment from Farm

to Fork and Beyond: A Global Bayesian Approach Concerning Food-Borne Diseases. Risk Analy-

sis 28(2), 557–571.

Andersen, S., K. Olesen, F. Jensen, and F. Jensen (1989). Hugin - a shell for building Bayesian belief

universes for expert systems. In Eleventh International Joint Conference on Artificial Intelligence,

Detroit, Michigan, pp. 1080–1085.

Anderson, E., Z. Bai, C. Bischof, S. Blackford, J. Demmel, J. Dongarra, J. D. Croz, A. Greenbaum,

S. Hammarling, A. McKenney, and D. Sorensen (1999). LAPACK Users’ Guide: Third Edition (22

Aug 1999 ed.). Philadelphia: Society for Industrial and Applied Mathematics (SIAM).

407


Anderson, J. M. (2007). AWA water recycling forum position paper: Water recycling to meet our water

needs. In S. J. Khan, R. M. Stuetz, and J. M. Anderson (Eds.), Water Reuse and Recycling 2007.

Sydney: UNSW Publishing & Printing Services.

Anderson, R. and R. May (1991). Infectious diseases of humans: dynamics and control. New York:

Oxford University Press.

Anon. (2008). Accessed, June, 2008: http://www.nationmaster.com/country/as-australia/.

Arrowood, M. J., P. J. Lammie, J. W. Priest, D. G. Addiss, M. R. Hurd, W. R. MacKenzie, A. C. McDon-

ald, M. S. Gradus, G. Linke, and E. Zembrowski (2001). Cryptosporidium parvum-specific antibody

responses among children residing in Milwaukee during the 1993 waterborne outbreak. Journal of

Infectious Diseases 183(9), 1373–1378.

Asano, T. (1998). Wastewater reclamation and reuse. Water Quality Management Library ; V. 10.

Lancaster, Pa.: Technomic Pub.

Ashbolt, N. J., S. R. Petterson, T.-A. Stenstrom, C. Schonning, T. Westrell, and J. Ottoson (2005). Mi-

crobial Risk Assessment (MRA) tool. Technical Report 2005:7, Chalmers University of Tech-

nology.

Assuncao, R. M. (2003). Space varying coefficient models for small area data. Environmetrics 14(5),

453–473.

Assuncao, R. M., J. E. Potter, and S. M. Cavenaghi (2002). A Bayesian space varying parameter model

applied to estimating fertility schedules. Statistics in Medicine 21(14), 2057–2075.

Assuncao, R. M., I. A. Reis, and C. D. Oliveira (2001). Diffusion and prediction of Leishmaniasis in

a large metropolitan area in Brazil with a Bayesian space-time model. Statistics in Medicine 20(15),

2319–2335.

Ayars, J. E., P. Shouse, and S. M. Lesch (2009). In situ use of groundwater by alfalfa. Agricultural Water

Management 96(11), 1579–1586.

Baird, D. and R. Mead (1991). The empirical efficiency and validity of two neighbour models. Biomet-

rics 47(4), 1473–1487.




Chapman & Hall.

Barker, G. C., N. L. C. Talbot, and M. W. Peck (2002). Risk assessment for Clostridium botulinum: a

network approach. International Biodeterioration & Biodegradation 50(3-4), 167–175.

Bartlett, M. (1978). Nearest neighbour models in the analysis of field experiments. Journal of the Royal

Statistical Society. Series B (Methodological) 40(2), 147–174.

Belitz, C., A. Brezger, T. Kneib, and S. Lang (2009a). BayesX Software for Bayesian Infer-



Belitz, C., A. Brezger, T. Kneib, and S. Lang (2009b). BayesX Software for Bayesian Inference in



Bell, M., F. Dominici, K. Ebisu, S. Zeger, and J. Samet (2007). Spatial and temporal variation in PM2.5

chemical composition in the United States for health effects studies. Environmental Health Perspec-

tives 115(7), 989–995.

Bellhouse, D. R. (2004). The Reverend Thomas Bayes, FRS: A biography to celebrate the tercentenary

of his birth. Statistical Science 19(1), 3–43.

Bernardinelli, L., D. Clayton, C. Pascutto, C. Montomoli, M. Ghislandi, and M. Songini (1995). Bayesian

analysis of space-time variation in disease risk. Statistics in Medicine 14(21-22), 2433–2443.

Besag, J. and R. Kempton (1986). Statistical analysis of field experiments using neighbouring plots.

Biometrics 42(2), 231–251.


R. Statist. Soc. B 36(2), 192–236.




Besag, J. E. and D. Higdon (1993). Bayesian inference for agricultural field experiments. Bull. Inst.

Internat. Statist 55(Book 1), 121–136.



Besag, J. E. and C. Kooperberg (1995). On conditional and intrinsic autoregressions. Biometrika 82(4),

733–746.


Biometrika 92(4), 909–920.



Blackford, L., J. Demmel, J. Dongarra, I. Duff, S. Hammarling, G. Henry, M. Heroux, L. Kaufman,

A. Lumsdaine, and A. Petitet (2002). An updated set of basic linear algebra subprograms (BLAS).

ACM Transactions on Mathematical Software (TOMS) 28(2), 135–151.

Blaser, M. J. and L. S. Newman (1982). A review of human salmonellosis: I. Infective dose. Reviews of

infectious diseases 4(6), 1096–1106.

Boerlage, B. (1992). Link Strength in Bayesian Networks. Ph. D. thesis, University of British Columbia,

Canada.

Box, G. E. P. (1980). Sampling and Bayes’ inference in scientific modelling and robustness. Journal of

the Royal Statistical Society. Series A (General) 143(4), 383–430.

Box, G. E. P. and G. M. Jenkins (1976). Time series analysis : forecasting and control (Rev. ed.).

Holden-Day series in time series analysis and digital processing. San Francisco: Holden-Day.

Brandl, M. T. and R. Amundson (2008). Leaf age as a risk factor in contamination of lettuce with

Escherichia coli O157 : H7 and Salmonella enterica. Applied and Environmental Microbiology 74(8),

2298–2306.






Brook, D. (1964). On the distinction between the conditional probability and the joint probability ap-

proaches in the specification of nearest-neighbour systems. Biometrika 51(3-4), 481.

Brookhart, M. A., A. E. Hubbard, M. J. van der Laan, J. M. Colford Jr, and J. N. S. Eisenberg (2002).

Statistical estimation of parameters in a disease transmission model: analysis of a Cryptosporidium

outbreak. Statistics in Medicine 21, 3627–3638.

Broughton, A. (1994). Mooki River Catchment hydrogeological investigation and dryland salinity studies

- Liverpool Plains, TS94.026. Technical report, New South Wales Department of Water Resources.

Bureau of Meteorology (2010, April 15). 2010JR12235 *** Student/Request for Data, Forecasts

or other services/wa/Climate and Past Weather*** (JR- [SEC=UNCLASSIFIED]. email: cli-

[email protected].

Burgman, M. (2005). Risks and Decisions for Conservation and Environmental Management. New York:

Cambridge University Press.

Butler, D. G., B. R. Cullis, A. R. Gilmour, and B. J. Gogel (2007). Analysis of Mixed Models for S

Language Environments, ASReml-R Reference Manual Release 2, Volume No. QE02001 of Training

and Development Series. Brisbane, Australia: Queensland Department of Primary Industries and

Fisheries.

Casman, E. A., B. Fischhoff, C. Palmgren, M. J. Small, and F. Wu (2000). An integrated risk model of a

drinking water borne cryptosporidiosis outbreak. Risk Analysis 20(4), 495–511.

Castillo, E., J. M. Gutierrez, and E. Castillo (1997). Sensitivity analysis in discrete Bayesian networks.

IEEE Transactions on Systems, Man & Cybernetics: Part A 27, 412–423.

Castillo, E., J. M. Gutierrez, A. S. Hadi, and C. Solares (1997). Symbolic propagation and sensitivity

analysis in Gaussian Bayesian networks with application to damage assessment. Artificial Intelligence

in Engineering 11, 173–181.


Chan, H. and A. Darwiche (2004). Sensitivity analysis in Bayesian networks: from single to multiple

parameters. In UAI ’04 Proceedings of the 20th Conference on Uncertainty in Artificial Intelligence,

pp. 67–75. AUAI Press.

Chib, S. and B. P. Carlin (1999). On MCMC sampling in hierarchical longitudinal models. Statistics and

Computing 9, 17–26.

Clements, A., S. Brooker, U. Nyandindi, A. Fenwick, and L. Blair (2008). Bayesian spatial analysis

of a national urinary Schistosomiasis questionnaire to assist geographic targeting of Schistosomiasis

control in Tanzania, East Africa. International Journal for Parasitology 38, 401–415.

Clements, A. C., A. Garba, M. Sacko, S. Touré, R. Dembelé, A. Landouré, E. Bosque-Oliva, A. F. Gabrielli,

and A. Fenwick (2008). Mapping the probability of Schistosomiasis and associated uncertainty, West

Africa. Emerging Infectious Diseases 14(10), 1629–1632.

Commandeur, J. J. F. and S. J. Koopman (2007). An introduction to state space time series analysis.

Practical econometrics. Oxford New York: Oxford University Press.

Cowell, R. G. and A. P. Dawid (1992). Fast retraction of evidence in a probabilistic expert system.

Statistics and Computing 2(1), 37–40.

Cowell, R. G., A. P. Dawid, S. L. Lauritzen, and D. J. Spiegelhalter (2001). Probabilistic Networks and

Expert Systems. Springer.



Crook, A. M., L. Knorr-Held, and H. Hemingway (2003). Measuring spatial effects in time to event data:

a case study using months from angiography to coronary artery bypass graft (CABG). Statistics in

Medicine 22(18), 2943–2961.

Cullen, A. C. and H. C. Frey (1999). Probabilistic techniques in exposure assessment : a handbook for

dealing with variability and uncertainty in models and inputs. New York: Plenum Press.

Cullis, B. R. and A. C. Gleeson (1991). Spatial analysis of field experiments-an extension to two dimen-

sions. Biometrics 47, 1449–1460.


Cullis, B. R., W. J. Lill, J. A. Fisher, B. J. Read, and A. C. Gleeson (1989). A new procedure for the

analysis of early generation variety trials. Journal of the Royal Statistical Society Series C Applied

Statistics 38(2), 361–375.

Daniells, I. G., J. F. Holland, R. R. Young, C. L. Alston, and A. L. Bernardi (2001). Relationship between

yield of grain sorghum (Sorghum bicolor) and soil salinity under field conditions. Australian Journal

of Experimental Agriculture 41, 211–217.

Darroch, J. N., S. L. Lauritzen, and T. P. Speed (1980). Markov fields and log-linear interaction models

for contingency tables. The Annals of Statistics 8(3), 522–539.

Dawid, A. P. (1992). Applications of a general propagation algorithm for probabilistic expert systems.

Statistics and Computing 2(1), 25–36.

Dawid, A. P., U. Kjaerulff, and S. L. Lauritzen (1995). Hybrid propagation in junction trees. In Advances

in Intelligent Computing - IPMU ’94, Volume 945 of Lecture Notes in Computer Science, pp. 87–97.

Springer Verlag KG.

Delignette-Muller, M. L., M. Cornu, R. Pouillot, and J. B. Denis (2006). Use of Bayesian modelling

in risk assessment: Application to growth of Listeria monocytogenes and food flora in cold-smoked

salmon. International Journal of Food Microbiology 106(2), 195–208.

Dillon, P., D. Page, J. Vanderzalm, P. Pavelic, S. Toze, E. Bekele, J. Sidhu, H. Prommer, S. Higginson,

R. Regel, S. Rinck-Pfeiffer, M. Purdie, C. Pitman, and T. Wintgens (2008). A critical evaluation of

combined engineered and aquifer treatment systems in water recycling. Water Science & Technology

- WST 57(5), 753–762.

Dunson, D. (2001). Commentary: practical advantages of Bayesian analysis of epidemiologic data.

American Journal of Epidemiology 153(12), 1222–1226.

Durban, M., C. A. Hackett, J. W. McNicol, A. C. Newton, W. T. B. Thomas, and I. D. Currie (2003).

The practical use of semiparametric models in field trials. Journal of Agricultural Biological and

Environmental Statistics 8(1), 48–66.

Earnest, A., J. R. Beard, G. Morgan, D. Lincoln, R. Summerhayes, D. Donoghue, T. Dunn, D. Muscatello,

and K. Mengersen (2010). Small area estimation of sparse disease counts using shared component


models-application to birth defect registry data in New South Wales, Australia. Health & Place 16,

684–693.

Edwards, D. (1995). Introduction to Graphical Modelling. New York: Springer-Verlag.

Eisenberg, J., E. Seto, A. Olivieri, and R. Spear (1996). Quantifying water pathogen risk in an epidemi-

ological framework. Risk Analysis 16, 549–563.

Eisenberg, J. N. S., M. A. Brookhart, G. Rice, M. Brown, and J. M. Colford Jr (2002). Disease transmis-

sion models for public health decision making: Analysis of epidemic and endemic conditions caused

by waterborne pathogens. Environmental Health Perspectives 110(8), 783–790.

Eisenberg, J. N. S., E. Y. W. Seto, J. M. Colford Jr, A. Olivieri, and R. C. Spear (1998). An analysis

of the Milwaukee cryptosporidiosis outbreak based on a dynamic model of the infection process.


Elliott, P. (2000). Spatial epidemiology : methods and applications. Oxford medical publications. Ox-

ford: Oxford University Press.

Fahrmeir, L., T. Kneib, and S. Lang (2004). Penalized structured additive regression for space-time data:

A Bayesian perspective. Statistica Sinica 14, 731–761.

Fewtrell, L. and J. Bartram (2001). Water Quality: Guidelines, Standards and Health. London: World

Health Organisation.

Fienberg, S. E. (2006). When did Bayesian inference become “Bayesian”? Bayesian Analysis 1, 1–40.

Fong, Y., H. Rue, and J. Wakefield (2010). Bayesian inference for generalized linear mixed models.

Biostatistics 11(3), 397–412.


Gamerman, D. and H. F. Lopes (2006). Markov chain Monte Carlo : stochastic simulation for Bayesian

inference (2nd ed.). London ; New York: Chapman & Hall.




Gelman, A., J. B. Carlin, H. S. Stern, and D. B. Rubin (1995). Bayesian data analysis. Texts in statistical

science. London: Chapman & Hall.

Geman, S. and D. Geman (1984). Stochastic relaxation, Gibbs distributions and the Bayesian restoration

of images. IEEE Transactions on Pattern Analysis and Machine Intelligence 6, 721–741.

Gerba, C. P., N. C.-d. Campo, J. P. Brooks, and I. L. Pepper (2008). Exposure and risk assessment of

Salmonella in recycled residuals. Water Science & Technology 57(7), 1061–1065.

Gerlach, R., C. Carter, and R. Kohn (2000). Efficient Bayesian inference for dynamic mixture models.

Journal of the American Statistical Association 95(451), 818–828.

Geweke, J. (1992). Evaluating the accuracy of sampling-based approaches to the calculation of posterior

moments. Bayesian Statistics 4, 169–188.

Gibbs, R. A. (1995). Die-off of human pathogens in stored wastewater sludge and sludge applied to

land. Technical report, Urban Water Research Association of Australia, Water Services Association

of Australia, Melbourne.

Gibbs, R. A. and G. E. Ho (1993). Health risks from pathogens in untreated wastewater sludge: implica-

tions for Australian sludge management guidelines. Water 20(1), 17–22.

Gibbs, R. A., C. J. Hu, G. E. Ho, P. A. Phillips, and I. Unkovich (1995). Pathogen die-off in stored

wastewater sludge. Water Science & Technology 31(5-6), 91–95.



269–293.

Gilmour, A. R., B. J. Gogel, B. R. Cullis, and R. Thompson (2005). ASReml User Guide Release 2.0.

Technical report, VSN International Ltd, Hemel Hempstead, UK.

Gilmour, A. R., R. Thompson, and B. R. Cullis (1995). Average information REML: an efficient algo-

rithm for variance parameter estimation in linear mixed models. Biometrics 51(4), 1440–1450.

Gordon, C. and S. Toze (2003). Influence of groundwater characteristics on the survival of enteric viruses.

Journal of Applied Microbiology 95(3), 536–544.


Gotway, C. A. and N. A. C. Cressie (1990). A spatial analysis of variance applied to soil-water infiltration.

Water resources research 26(11), 2695–2703.

Gotway, C. A. and L. J. Young (2002). Combining incompatible spatial data. Journal of the American

Statistical Association 97(458), 632–648.

Green, P. J. and R. Sibson (1978). Computing Dirichlet tessellations in the plane. Computer Journal 21,

168–173.

Haas, C. and J. N. Eisenberg (2001). Risk assessment. In L. Fewtrell and J. Bartram (Eds.), Water

Quality: Guidelines, Standards and Health. WHO.


1205–1214.


Wiley.

Hall, G. (2004). Results from the National Gastroenteritis Survey 2001–2002. Technical Report NCEPH

Working Paper Number 50, National Centre for Epidemiology & Population Health.

Hall, G. and M. Kirk (2005). Foodborne illness in Australia annual incidence circa 2000. Technical

report, Australian Government Department of Health and Ageing.

Hall, G., J. Raupach, and K. Yohannes (2006). An estimate of under-reporting of foodborne notifiable

diseases: Salmonella, Campylobacter, Shiga toxin producing E. coli (STEC). Technical report, National

Centre for Epidemiology & Population Health.

Hamilton, G. S., F. Fielding, A. W. Chiffings, B. T. Hart, R. W. Johnstone, and K. L. Mengersen (2007).

Investigating the use of a Bayesian network to model the risk of Lyngbya majuscula bloom initiation

in Deception Bay, Queensland. Ecological Risk Assessment 13(6), 1271–1287.

Harvey, A. C. (1989). Forecasting, structural time series models and the Kalman filter. Cambridge

England: Cambridge University Press.


Haskard, K. A., B. R. Cullis, and A. P. Verbyla (2007). Anisotropic Matérn correlation and spatial

prediction using REML. Journal of Agricultural, Biological, and Environmental Statistics 12(2),

147–160.

Hastie, T. and R. Tibshirani (1990). Generalized additive models (1st ed.). Monographs on statistics and

applied probability. London ; New York: Chapman and Hall.



Hijnen, W. A., Y. J. Dullemont, J. F. Schijven, A. J. Hanzens-Brouwer, M. Rosielle, and G. Medema

(2007). Removal and fate of Cryptosporidium parvum, Clostridium perfringens and small-sized centric

diatoms (Stephanodiscus hantzschii) in slow sand filters. Water Research 41, 2151–2162.

Hijnen, W. A. M., E. Beerendonk, and G. J. Medema (2005). Elimination of micro-organisms by drinking

water processes: a review. Technical report, Kiwa N.V., Nieuwegein, The Netherlands.

Hijnen, W. A. M., E. Beerendonk, P. Smeets, and G. J. Medema (2004). Elimination of micro-organisms

by water treatment processes. Technical report, Kiwa N.V., Nieuwegein, The Netherlands.

Hijnen, W. A. M., J. F. Schijven, P. Bonne, A. Visser, and G. J. Medema (2004). Elimination of viruses,

bacteria and protozoan oocysts by slow sand filtration. Water Science & Technology 50(1), 147–154.

Hoeting, J. A., R. A. Davis, A. A. Merton, and S. E. Thompson (2006). Model selection for geostatistical

models. Ecological Applications 16(1), 87–98.

Holbrook, N. and N. Bindoff (2000). A statistically efficient mapping technique for four-dimensional

ocean temperature data. Journal of Atmospheric and Oceanic Technology 17(6), 831–846.



Hrudey, S. E., P. M. Huck, P. Payment, R. W. Gillham, and E. J. Hrudey (2002). Walkerton: Lessons

learned in comparison with waterborne outbreaks in the developed world. Journal of Environmental

Engineering and Science 1(6), 397–407.

Hugin Expert A/S (2007). Hugin 6.9. Available on: www.hugin.com. Accessed: November 6, 2008.


Hugin Expert A/S (2007). Hugin Expert - Publications. Available on:

www.hugin.com/developer/Publications/. Accessed: November 6, 2008.

Hunter, P. R. and L. Fewtrell (2001). Acceptable risk. In L. Fewtrell and J. Bartram (Eds.), Water Quality:

Guidelines, Standards and Health. WHO.

Isaac, D. (2008a). Email: June 27, 2008: Re: Fw: Recycled water: measurements required under licence

by the Health Department.

Isaac, D. (2008b). Fit for purpose guidelines for recycled water. email, received June 26, 2008.

Jacobsen, K. and J. Koopman (2004). Declining hepatitis A seroprevalence: a global review and analysis.

Epidemiology and Infection 132, 1005–1022.

Jensen, F. (1994). Implementation aspects of various propagation algorithms in Hugin. Research Report

R-94-2014, Department of Mathematics and Computer Science, Aalborg University,

Denmark, Aalborg, Denmark.

Jensen, F. (2001). Bayesian Networks and Decision Graphs. Springer.

Jensen, F. V., S. H. Aldenryd, and K. B. Jensen (1995). Sensitivity analysis in Bayesian networks. Lecture

Notes in Artificial Intelligence 946, 243.

Jensen, F. V., B. Chamberlain, T. Nordahl, and F. Jensen (1991). Analysis in Hugin of data conflict. In

Proceedings of the Sixth Annual Conference on Uncertainty in Artificial Intelligence, UAI ’90, New

York, NY, USA, pp. 519–528. Elsevier Science Inc.

Jordan, M. I. (2004). Graphical models. Statistical Science 19(1), 140–155.

Karanis, P., C. Kourenti, and H. Smith (2007). Waterborne transmission of protozoan parasites: A

worldwide review of outbreaks and lessons learnt. Journal of Water and Health 5(1), 1–38.

Karim, M. R., E. P. Glenn, and C. P. Gerba (2008). The effect of wetland vegetation on the survival of

Escherichia coli, Salmonella typhimurium, bacteriophage MS-2 and polio virus. Journal of Water and

Health 6(2), 167–175.


Kelly, D. L. and C. L. Smith (2009). Bayesian inference in probabilistic risk assessment: The current

state of the art. Reliability Engineering & System Safety 94(2), 628–643. doi:

10.1016/j.ress.2008.07.002.

Kennett, R. J., K. B. Korb, and A. E. Nicholson (2001). Seabreeze prediction using Bayesian networks:

a case study. Lecture Notes in Computer Science 2035, 148–153.

Khan, S. J. (2010). Quantitative chemical exposure assessment for water recycling schemes. Waterlines

Report Series, No 27. Australian Government National Water Commission.

Kinde, H., M. Adelson, A. Ardans, E. H. Little, D. Willoughby, D. Berchtold, D. H. Read, R. Breitmeyer,

D. Kerr, R. Tarbell, and E. Hughes (1997). Prevalence of Salmonella in municipal sewage treatment

plant effluents in Southern California. Avian Diseases 41(2), 392–398.

Kneib, T. (2006). Geoadditive hazard regression for interval censored survival times. Computational

Statistics and Data Analysis 51, 777–792.

Kneib, T. and L. Fahrmeir (2006). Structured additive regression for categorical space-time data: A mixed

model approach. Biometrics 62(1), 109–118.

Knorr-Held, L. and J. Besag (1998). Modeling risk from a disease in time and space. Statistics in

Medicine 17, 2045–2060.


Lang, S. and A. Brezger (2004). Bayesian P-splines. Journal of Computational and Graphical Statis-

tics 13(1), 183–212.

Laskey, K. B. (1995). Sensitivity analysis for probability assessments in Bayesian networks. IEEE

Transactions on Systems, Man and Cybernetics 25, 901–909.

Lauritzen, S. (1995). The EM algorithm for graphical association models with missing data. Computa-

tional Statistics & Data Analysis 19, 191–201.

Lauritzen, S. L. and D. J. Spiegelhalter (1988). Local computations with probabilities on graphical

structures and their application to expert systems. Journal of the Royal Statistical Society. Series B

(Methodological) 50(2), 157–224.


Lawson, C. L., R. J. Hanson, D. R. Kincaid, and F. T. Krogh (1979). Basic Linear Algebra Subprograms for

Fortran usage. ACM Trans. Math. Software 5(3), 324–325.

Lemos, R. T. and B. Sanso (2009). A spatio-temporal model for mean, anomaly, and trend fields of North

Atlantic sea surface temperature. Journal of the American Statistical Association 104(485), 5–18.

Lemos, R. T., B. Sanso, and M. L. Huertos (2007). Spatially varying temperature trends in a central

California estuary. Journal of Agricultural, Biological, and Environmental Statistics 12(3), 379–396.


Markov random fields: the stochastic partial differential equation approach. Journal of the Royal

Statistical Society: Series B (Statistical Methodology) 73(4), 423–498.

Liu, J. S., W. H. Wong, and A. Kong (1994). Covariance structure of the Gibbs sampler with applications

to the comparisons of estimators and augmentation schemes. Journal of the Royal Statistical Society,

Series B 57(1), 157–169.

Lumina Decision Systems (2004). Analytica. Available on:

www.lumina.com/ana/editiondescriptions.htm. Accessed: April 24, 2008.






MacKenzie, W., N. J. Hoxie, M. E. Proctor, M. S. Gradus, K. A. Blair, D. E. Peterson, J. J. Kazmierczak,

D. G. Addiss, K. R. Fox, J. B. Rose, and J. P. Davis (1994). A massive outbreak in Milwaukee

of cryptosporidium infection transmitted through the public water supply. New England Journal of

Medicine 331(3), 161–167.

MacKenzie, W. R., W. L. Schell, K. A. Blair, D. G. Addiss, D. E. Peterson, N. J. Hoxie, J. J. Kazmierczak,

and J. P. Davis (1995). A massive outbreak of waterborne cryptosporidium infection in Milwaukee,

Wisconsin: Recurrence of illness and risk of secondary transmission. Clinical Infectious Diseases 21,

57–62.


Marcot, B. G. (2006). Characterizing species at risk I: Modeling rare species under the Northwest Forest

Plan. Ecology and Society 11(2), 10.

Marcot, B. G., P. A. Hohenlohe, S. Morey, R. Holmes, R. Molina, M. C. Turley, M. H. Huff, and J. A.

Laurence (2006). Characterizing species at risk II: Using Bayesian belief networks as decision support

tools to determine species conservation categories under the Northwest Forest Plan. Ecology and

Society 11(2), 12.

Marks, H. M., M. E. Coleman, C. T. J. Lin, and T. Roberts (1998). Topics in microbial risk assessment:

Dynamic flow tree process. Risk Analysis 18(3), 309–328.



Martin, J. E., T. Rivas, J. M. Matías, J. Taboada, and A. Argüelles (2009). A Bayesian network analysis of

workplace accidents caused by falls from a height. Safety Science 47(2), 206–214.

Martin, R. J., N. Chauhan, J. A. Eccleston, and B. S. P. Chan (2006). Efficient experimental designs when

most treatments are unreplicated. Linear Algebra and its Applications 417(1), 163–182.

Martino, S. and H. Rue (2008). Implementing approximate Bayesian inference using Integrated Nested

Laplace Approximation: A manual for the INLA program. Citeseer.

Martino, S. and H. Rue (2009). R Package: INLA. Department of Mathematical Sciences NTNU,

Norway.

Matias, J. M., T. Rivas, C. Ordonez, J. Taboada, and J. M. Matias (2007). Assessing the environmental

impact of slate quarrying using Bayesian networks and GIS. In AIP Conference, Volume 963, pp.

1285–1288.

McCullough, N. B. and C. W. Eisele (1951a). Experimental human salmonellosis: I. pathogenicity of



McCullough, N. B. and C. W. Eisele (1951b). Experimental human salmonellosis: II. Immunity studies

following experimental illness with Salmonella meleagridis and Salmonella anatum. The Journal of

Immunology 66(5), 595–608.


McCullough, N. B. and C. W. Eisele (1951c). Experimental human salmonellosis: III. Pathogenicity of

strains of Salmonella newport, Salmonella derby, and Salmonella bareilly obtained from spray-dried

whole egg. The Journal of Infectious Diseases 89(3), 209–213.

McCullough, N. B. and C. W. Eisele (1951d). Experimental human salmonellosis: IV. Pathogenicity

of strains of Salmonella pullorum obtained from spray-dried whole egg. The Journal of Infectious

Diseases 89(3), 259–265.

Messner, M. J., C. L. Chappell, and P. C. Okhuysen (2001). Risk assessment for Cryptosporidium: A

hierarchical Bayesian analysis of human dose response data. Water Research 35(16), 3934–3940.

Mons, M., J. Van der Wielen, E. Blokker, M. Sinclair, K. Hulshof, F. Dangendorf, P. Hunter, and

G. Medema (2007). Estimation of the consumption of cold tap water for microbiological risk as-

sessment: an overview of studies and statistical analysis of data. Journal of Water and Health 5(1),

151–170.

Nadebaum, P., M. Chapman, R. Morden, and S. Rizak (2004). A guide to hazard identification & risk

assessment for drinking water supplies. Technical report, CRC for Water Quality and Treatment.

National Notifiable Diseases Surveillance System (2008). National Notifiable Diseases Surveillance

System. Available on: http://www9.health.gov.au/cda/Source/CDA-index.cfm. Accessed:

April 9, 2008.







Neapolitan, R. E. and X. Jiang (2007). Probabilistic Methods for Financial and Marketing Informatics.

Elsevier.


1–56.


Nicholson, A., S. Watson, and C. Twardy (2003). Using Bayesian networks for water quality prediction

in Sydney Harbour. Available online: www.csse.monash.edu.au/bai/talks/NSWDEC.ppt. Ac-

cessed: March 27, 2008.

Norsys Software Corp. (2007). Netica 3.25. Available online: www.norsys.com. Accessed February

15, 2008.

NumPy Community (2010, February 9). NumPy Reference Manual: Release 1.5.0.dev8106. Avail-

able online: http://docs.scipy.org/doc/. Accessed: February 9, 2010.

Olivieri, A. W., R. Danielson, J. N. Eisenberg, L. Johnson, V. Pon, R. Sakaji, R. Soller, J. A. Soller,

J. Stephenson, and C. Trese (2007). Evaluation of microbial risk assessment techniques and appli-

cations in water reclamation. Technical report, Water Environment Research Foundation (WERF),

Alexandria, VA. Available online: www.werf.org/AM/.


Palacios, M. P., P. Lupiola, M. T. Tejedor, E. Del-Nero, A. Pardo, and L. Pita (2001). Climatic effects

on Salmonella survival in plant and soil irrigated with artificially inoculated wastewater: preliminary

results. Water Science and Technology 43(12), 103–108.

Palisade Corporation (2008). @RISK 5.0. Available online: www.palisade.com/risk/. Accessed:

October 22, 2009.

Papadakis, J. S. (1937). Méthode statistique pour des expériences sur champ. Bulletin scientifique d’amélio-

ration des plantes de Thessalonique 23, 30.

Paulo, M. J., H. van der Voet, M. J. W. Jansen, C. J. F. ter Braak, and J. D. van Klaveren (2005). Risk assessment

of dietary exposure to pesticides using a Bayesian method. Pest Management Science 61(8), 759–766.

Pearl, J. (1988). Probabilistic reasoning in intelligent systems : networks of plausible inference. San

Mateo, California: Morgan Kaufmann Publishers.

Petterson, S. and N. Ashbolt (2001). Viral risks associated with wastewater reuse: modeling virus per-

sistence on wastewater irrigated salad crops. Water Science and Technology 43(12), 23–26.


Petterson, S., N. Ashbolt, and A. Sharma (2001). Microbial risks from wastewater irrigation of salad

crops: A screening-level risk assessment. Water Environment Research 73(6), 667–672.

Petterson, S. A. and N. J. Ashbolt (2006). WHO Guidelines for the safe use of wastewater and excreta in

agriculture microbial risk assessment section. Technical report, World Health Organization.

Petterson, S. R. (2002). Microbial Risk Assessment of Wastewater Irrigated Salad Crops. Ph. D. thesis,

University of New South Wales.

Piepho, H. P., A. Buchse, and K. Emrich (2003). A hitchhiker’s guide to mixed models for randomized

experiments. Journal of Agronomy and Crop Science 189(5), 310–322.







Piepho, H. P. and E. R. Williams (2010). Linear variance models for plant breeding trials. Plant Breed-

ing 129(1), 1–8.

Pike, W. A. (2004). Modeling drinking water quality violations with Bayesian networks. Journal of the

American Water Resources Association 40(6), 1563–1578.

Pitt, M. and N. Shephard (1999). Analytic convergence rates and parameterisation issues for the Gibbs

sampler applied to state space models. Journal of Time Series Analysis 20, 63–85.

Pollino, C. A. and B. T. Hart (2005a). Bayesian approaches can help make better sense of ecotoxicolog-

ical information in risk assessments. Australian Journal of Ecotoxicology 11, 57–58.

Pollino, C. A. and B. T. Hart (2005b). Bayesian decision networks - going beyond expert elicitation for

parameterisation and evaluation of ecological endpoints. In A. Voinov, A. Jakeman, and A. Rizzoli

(Eds.), Third Biennial Meeting: Summit on Environmental Modelling and Software, Burlington, USA.


Pollino, C. A., O. Woodberry, A. E. Nicholson, K. B. Korb, and B. T. Hart (2007). Parameterisation and

evaluation of a Bayesian network for use in an ecological risk assessment. Environmental Modelling

and Software 22, 1140–1152.

Poncet, C., V. Lemesle, L. Mailleret, A. Bout, R. Boll, and J. Vaglio (2010). Spatio-temporal analysis

of plant pests in a greenhouse using a Bayesian approach. Agricultural and Forest Entomology 12(3),

325–332.

Pouillot, R., P. Beaudeau, J.-B. Denis, and F. Derouin (2004). A quantitative risk assessment of water-

borne Cryptosporidiosis in France using second-order Monte Carlo simulation. Risk Analysis 24(1),

1–17.

Qian, S. S., C. A. Stow, and M. E. Borsuk (2003). On Monte Carlo methods for Bayesian inference.

Ecological Modelling 159, 269.

Raftery, A. and S. Lewis (1992). How many iterations in the Gibbs sampler? In J. Bernardo, J. Berger,

A. Dawid, and A. Smith (Eds.), Bayesian Statistics 4. Oxford: Oxford University Press.

Rasmussen, J. (1997). Risk management in a dynamic society: a modelling problem. Safety Science 27(2-

3), 183–213.

Raso, G., P. Vounatsou, L. Gosoniu, M. Tanner, E. K. N’Goran, and J. Utzinger (2006). Risk factors and

spatial patterns of hookworm infection among schoolchildren in a rural area of western Côte d’Ivoire.

International Journal for Parasitology 36(2), 201–210.

Rassmussen, L. (1995). Bayesian network for blood typing and parentage verification of cattle. Technical

report, Department of Mathematics and Computer Science, Aalborg University, Denmark. Hugin

reference Hugin 6.9.

Reich, B., J. Hodges, and B. Carlin (2007). Spatial analyses of periodontal data using conditionally

autoregressive priors having two classes of neighbor relations. Journal of the American Statistical

Association 102(477), 44–55.

Rendtorff, R. (1954). The experimental transmission of human intestinal protozoan parasites: II. Giardia

lamblia cysts given in capsules. American Journal of Hygiene 59, 209–220.




ogy 19(9), 1357–1375.






Rizak, S. and S. Hrudey (2007). Strategic water quality monitoring for drinking water safety. Technical

Report 37, CRC for Water Quality and Treatment.

Robinson, W. (1950). Ecological correlations and the behavior of individuals. American Sociological

Review 15(3), 351–357.

Robinson, W. (2009). Ecological correlations and the behavior of individuals. International Journal of


Roser, D., S. Khan, C. Davies, R. Signor, S. Petterson, and N. Ashbolt (2006). Screening health risk assessment for the use of microfiltration-reverse osmosis treated tertiary effluent for replacement of environmental flows. Technical Report CWWT Report 2006-20, Centre for Water and Waste Technology, School of Civil and Environmental Engineering, University of NSW.

Roser, D., S. Petterson, R. Signor, and N. Ashbolt (2006). How to implement QMRA? to estimate

baseline and hazardous event risks with management end uses in mind. Technical report, MicroRisk

project co-funded by the European Commission under the Fifth Framework Programme, Theme 4:

Energy, environment and sustainable development (contract EVK1-CT-2002-00123).

Roy, V. and S. de Blois (2008). Evaluating hedgerow corridors for the conservation of native forest herb diversity. Biological Conservation 141, 298–307.


Chapman & Hall/CRC.


Rue, H. and H. Tjelmeland (2002). Fitting Gaussian Markov random fields to Gaussian fields. Scandinavian Journal of Statistics 29(1), 31–49.

Saad, Y. (2003). Iterative methods for sparse linear systems. Society for Industrial and Applied Mathematics. [electronic resource].




Schabenberger, O. and C. A. Gotway (2005). Statistical methods for spatial data analysis. Texts in

statistical science. Boca Raton: Chapman & Hall/CRC.

Schwarz, G. (1978). Estimating the dimension of a model. The Annals of Statistics 6(2), 461–464.

Shillito, R. M., D. J. Timlin, D. Fleisher, V. R. Reddy, and B. Quebedeaux (2009). Yield response of potato to spatially patterned nitrogen application. Agriculture, Ecosystems & Environment 129(1-3), 107–116.

Sidhu, J. P. S., J. Hanna, and S. G. Toze (2008). Survival of enteric microorganisms on grass surfaces irrigated with treated effluent. Journal of Water and Health 6(2), 255–262.

Signor, R. and N. Ashbolt (2006). Pathogen monitoring offers questionable protection against drinking-water risks: a QMRA (Quantitative Microbial Risk Analysis) approach to assess management strategies. Erratum in Water Science and Technology 54 (11-12):451. Water Science and Technology 54, 261–268.

Signor, R. S. (2007a). Microbial risk implications of rainfall-induced runoff events entering a reservoir used as a drinking-water source. Journal of Water Supply: Research and Technology - AQUA 56, 515–531.

Signor, R. S. (2007b). Probabilistic Microbial Risk Assessment & Management Implications for Urban

Water Supply Systems. Ph. D. thesis, UNSW.





Sinclair, M. (2005). Strategic review of waterborne viruses. Technical report, CRC for Water Quality

and Treatment.

Singh, M., R. S. Malhotra, S. Ceccarelli, A. Sarker, S. Grando, and W. Erskine (2003). Spatial variability models to improve dryland field trials. Experimental Agriculture 39(2), 151–160.

Sinton, L., C. Hall, and R. Braithwaite (2007). Sunlight inactivation of Campylobacter jejuni and Salmonella enterica, compared with Escherichia coli, in seawater and river water. Journal of Water and Health 5(3), 357–365.




Smeets, P. W. M. H., Y. J. Dullemont, P. H. A. J. M. van Gelder, J. C. van Dijk, and G. J. Medema (2008). Improved methods for modelling drinking water treatment in quantitative microbial risk assessment; a case study of Campylobacter reduction by filtration and ozonation. Journal of Water and Health 6(3), 301–314.

Smeets, P. W. M. H., G. J. Medema, Y. J. Dullemont, P. H. A. J. M. van Gelder, and J. C. van Dijk (2008). Case study of Campylobacter reduction by filtration and ozonation. Journal of Water and Health 6, 301–314.

Smeets, P. W. M. H., G. J. Medema, G. Stanfield, J. C. van Dijk, and L. C. Rietveld (2007). How can the UK statutory Cryptosporidium monitoring be used for quantitative risk assessment of Cryptosporidium in drinking water? Journal of Water and Health 5(1 (Suppl)), 107–118.

Snow, J. (1849). On the mode of communication of cholera. London: John Churchill.

Snow, J. (1855). On the mode of communication of cholera (2nd ed.). London: John Churchill.


Song, H.-R., A. Lawson, R. B. D’Agostino Jr, and A. D. Liese (2011). Modeling type 1 and type 2

diabetes mellitus incidence in youth: An application of Bayesian hierarchical regression for sparse

small area data. Spatial and Spatio-temporal Epidemiology 2(1), 23–33.

Spiegelhalter, D. (1989). A unified approach to imprecision and sensitivity of beliefs in expert systems. In L. N. Kanal (Ed.), Uncertainty in Artificial Intelligence 3. North Holland: Elsevier Science Publishers B.V.



583–639.



583–639.

Spiegelhalter, D. J., A. P. Dawid, S. L. Lauritzen, and R. G. Cowell (1993). Bayesian analysis in expert


Spiegelhalter, D. J., N. L. Harris, K. Bull, and R. C. G. Franklin (1994). Empirical-evaluation of prior

beliefs about frequencies - methodology and a case-study in congenital heart-disease. Journal of the


Steck, H. (2001). Constraint-Based Structural Learning in Bayesian Networks Using Finite Data Sets. Ph. D. thesis, Institut für Informatik der Technischen Universität.

Stefanova, K. T., A. B. Smith, and B. R. Cullis (2009). Enhanced diagnostics for the spatial analysis of field trials. Journal of Agricultural, Biological, and Environmental Statistics 14(4), 392–410.

Strahm, B. D., R. B. Harrison, T. A. Terry, T. B. Harrington, A. B. Adams, and P. W. Footen (2009). Changes in dissolved organic matter with depth suggest the potential for postharvest organic matter retention to increase subsurface soil carbon pools. Forest Ecology and Management 258(10), 2347–2352.





analysis of spatial dynamic factor models for multitemporal remotely sensed imagery. Journal of the

Royal Statistical Society: Series C (Applied Statistics) 60(1), 109–124.

Tanaka, H., T. Asano, E. D. Schroeder, and G. Tchobanoglous (1998). Estimating the safety of wastewater reclamation and reuse using enteric virus monitoring data. Water Environment Research 70(1), 39–51.

Tawk, H. M., K. Vickery, L. Bisset, W. Selby, and Y. E. Cossart (2006). The impact of hepatitis B

vaccination in a western country: recall of vaccination and serological status in Australian adults.

Vaccine 24(8), 1095–1106.

Teschke, K., Y. Chow, K. Bartlett, A. Ross, and C. van Netten (2001). Spatial and temporal distribution of

airborne Bacillus thuringiensis var. kurstaki during an aerial spray program for gypsy moth eradication.

Environmental Health Perspectives 109(1), 47–54.



Research 31, 1333–1346.

Teunis, P. F. M., O. van der Heijden, J. W. B. van der Giessen, and A. H. Havelaar (1996). The dose-response relation in human volunteers for gastro-intestinal pathogens. Technical report, National Institute of Public Health and the Environment (RIVM), Bilthoven, The Netherlands.

Toze, S. (1999). PCR and the detection of microbial pathogens in water and wastewater. Water Research 33(17), 3545–3556.

Toze, S. (2002). Review of the risk of groundwater contamination from microbial pathogen due to

the infiltration of treated effluent to groundwater at the Bridgetown wastewater treatment plant. A

consultancy report to the Water Corporation, WA. Technical report, CSIRO.

Toze, S. (2004). Literature Review on the Fate of Viruses and Other Pathogens and Health Risks in

Non-Potable Reuse of Storm Water and Reclaimed Water. CSIRO. Accessed: February 1, 2011.

Toze, S., J. Hanna, and J. Sidhu (2005). Microbial monitoring of the McGillivray Oval direct reuse scheme. Report to the Water Corporation, WA. Technical report, CSIRO.


Toze, S., J. Hanna, A. Smith, and W. Hick (2002). Halls Head indirect treated wastewater reuse scheme.

Technical report, CSIRO.

Toze, S., J. Hanna, T. Smith, L. Edmonds, and A. McCrow (2004). Determination of water quality improvements due to the artificial recharge of treated effluent. In J. Steenvoorden and T. Endreny (Eds.), IAHS Publications-Series of Proceedings and Reports: Wastewater reuse and groundwater quality, Volume 285, pp. 53–60. Wallingford [Oxfordshire]: IAHS, 1981-.

Trought, M. C. T. and R. G. V. Bramley (2011). Vineyard variability in Marlborough, New Zealand:

characterising spatial and temporal changes in fruit composition and juice quality in the vineyard.

Australian Journal of Grape and Wine Research 17(1), 79–89.



Citeseer.



Varis, O. (1995). Belief networks for modelling and assessment of environmental change. Environmetrics 6, 439–444.

Varis, O. (1997). Bayesian decision analysis for environmental and resource management. Environmental

Modelling and Software 12(2-3), 177–185.

Varis, O. (1998). A belief network approach to optimization and parameter estimation: application to

resource and environmental management. Artificial Intelligence 101(1-2), 135–163.

Verbyla, A., B. Cullis, M. Kenward, and S. Welham (1999). The analysis of designed experiments

and longitudinal data by using smoothing splines. Journal of the Royal Statistical Society: Series C

(Applied Statistics) 48(3), 269–311.

VSN International (2011). Genstat. Available online: http://www.vsni.co.uk/software/genstat/.

Wakefield, J., N. Best, and L. Waller (2000). Bayesian approaches to disease mapping. In P. Elliott,

J. Wakefield, N. Best, and D. Briggs (Eds.), Spatial Epidemiology: Methods and Applications, pp.

104–127. Oxford: Oxford University Press.


Waller, L. A., B. P. Carlin, H. Xia, and A. E. Gelfand (1997). Hierarchical spatio-temporal mapping of

disease rates. Journal of the American Statistical Association 92(438), 607–617.





Ward, R., D. Bernstein, E. Young, J. Sherwood, D. Knowlton, and G. Schiff (1986). Human Rotavirus

studies in volunteers: determination of infectious dose and serological response to infection. Journal

of Infectious Diseases 154(5), 871–880.

Water Corporation (2010). Subiaco Wastewater Treatment Plant Annual Report 2009-10. Technical

Report PM-3851463, Water Corporation, Perth, Western Australia.

Water Corporation (2011a). McGillivray Oval Irrigation Project. Available online:

http://www.watercorporation.com.au/M/mcgillivray_oval.cfm. Accessed: February

2, 2011.

Water Corporation (2011b). Subiaco treatment plant. Available online:

http://www.watercorporation.com.au/W/wwtp_subiaco.cfm. Accessed: February 2,

2011.

Water Environment Research Foundation, A. Olivieri, and C. Summers (2007). Assessing risk of

pathogens in separate stormwater systems. Available online: http://www.werf.org/am/. Accessed:

February 17, 2011.

Weidl, G., A. Madsen, and E. Dahlquist (2003). Object oriented Bayesian network for industrial process

operation.

Weidl, G., A. L. Madsen, and S. S. Israelson (2005). Applications of object-oriented Bayesian networks

for condition monitoring, root cause analysis and decision support on operation of complex continuous

processes. Computers and Chemical Engineering 29, 1996–2009.

Wermuth, N. and D. R. Cox (1998). On association models defined over independence graphs. Bernoulli 4(4), 477–495.


West, M. and J. Harrison (1997). Bayesian forecasting and dynamic models (2nd ed.). Springer series in

statistics. New York: Springer.

Westrell, T., O. Bergstedt, T. Stenstrom, and N. Ashbolt (2003). A theoretical approach to assess microbial risks due to failures in drinking water systems. International Journal of Environmental Health Research 13, 181–197.

Whelan, B. M., A. B. McBratney, and B. Minasny (2001). Vesper-spatial prediction software for precision agriculture. In G. Grenier and S. Blackmore (Eds.), Third European Conference on Precision Agriculture, pp. 139–144. Agro Montpellier, Ecole Nationale Agronomique de Montpellier, pp. 18–20. Citeseer.

Whiting, R. C. and R. L. Buchanan (1997). Development of a quantitative risk assessment model for

Salmonella enteritidis in pasteurized liquid eggs. International Journal of Food Microbiology 36,

111–125.

Whittaker, J. (1990). Graphical Models in Multivariate Statistics. Chichester (England); New York:

Wiley.

Williams, E. R. (1986). A neighbour model for field experiments. Biometrika 73(2), 279–287.

Wong, V. N. L., B. W. Murphy, T. B. Koen, R. S. B. Greene, and R. C. Dalal (2008). Soil organic carbon

stocks in saline and sodic landscapes. Australian Journal of Soil Research 46(4), 378–389.

Woo, D. M. and K. J. Vicente (2003). Sociotechnical systems, risk management, and public health: comparing the North Battleford and Walkerton outbreaks. Reliability Engineering & System Safety 80(3), 253–269.

World Health Organization (2008). ICD-10 Classification of Diseases. Available online:

www.cdc.gov/nchs/data/dvs/2008Vol1.pdf. Accessed: April 10, 2008.


Medicine 25(5), 867–881.
