Margaret Donald Thesis
Using Bayesian methods for the estimation of uncertainty in complex
statistical models
Margaret Donald
Bachelor of Arts (Hon), University of Melbourne
Master of Applied Statistics, Macquarie University
Submitted in fulfilment of the requirements
of the degree of Doctor of Philosophy
August 25, 2011
Discipline of Mathematical Sciences
Faculty of Science and Technology
Queensland University of Technology
Principal supervisor: Professor Kerrie Mengersen, Queensland University of Technology
Associate supervisor: Professor Anthony Pettitt, Queensland University of Technology
Abstract
The research objectives of this thesis were to contribute to Bayesian statistical methodology,
in particular to risk assessment methodology and to spatial and spatio-temporal methodology,
by modelling error structures using complex hierarchical models.
Specifically, I hoped to consider two applied areas, and use these applications as a
springboard for developing new statistical methods as well as undertaking analyses which might give
answers to particular applied questions.
Thus, this thesis considers a series of models, firstly in the context of risk assessments for
recycled water, and secondly in the context of water usage by crops. The research objective
was to model error structures using hierarchical models in two problems: firstly, risk assessment
analyses for wastewater, and secondly, in a four dimensional dataset, assessing differences
between cropping systems over time and over three spatial dimensions.
The aim was to use the simplicity and insight afforded by Bayesian networks to develop
appropriate models for risk scenarios, and again to use Bayesian hierarchical models to explore
the necessarily complex modelling of four dimensional agricultural data.
The specific objectives of the research were to develop a method for the calculation of
credible intervals for the point estimates of Bayesian networks; to develop a model structure to
incorporate all the experimental uncertainty associated with various constants thereby allowing
the calculation of more credible credible intervals for a risk assessment; to model a single day’s
data from the agricultural dataset which satisfactorily captured the complexities of the data; to
build a model for several days’ data, in order to consider how the full data might be modelled;
and finally to build a model for the full four dimensional dataset and to consider the time-
varying nature of the contrast of interest, having satisfactorily accounted for possible spatial and
temporal autocorrelations.
This work forms five papers, two of which have been published, with two submitted, and
the final paper still in draft.
The first two objectives were met by recasting the risk assessments as directed, acyclic
graphs (DAGs). In the first case, we elicited uncertainty for the conditional probabilities needed
by the Bayesian net, incorporated these into a corresponding DAG, and used Markov chain
Monte Carlo (MCMC) to find credible intervals for all the scenarios and outcomes of interest. In
the second case, we incorporated the experimental data underlying the risk assessment constants
into the DAG, and also treated some of that data as needing to be modelled as an
‘errors-in-variables’ problem [Fuller, 1987]. This illustrated a simple method for the incorporation of
experimental error into risk assessments.
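As a minimal illustration of the first idea, the sketch below propagates uncertainty in the conditional probabilities of a toy two-node network (Exposure -> Illness) through to an interval for the marginal probability of illness. It uses plain forward Monte Carlo rather than the MCMC used in the thesis, and the network, the function names and all Beta parameters are hypothetical, chosen only for illustration.

```python
import random


def percentile(sorted_vals, q):
    """Nearest-rank quantile (0 <= q <= 1) of an already sorted list."""
    idx = int(round(q * (len(sorted_vals) - 1)))
    return sorted_vals[min(len(sorted_vals) - 1, max(0, idx))]


def risk_interval(n_draws=20000, seed=1):
    """Mean and 95% interval for P(illness) in a toy network Exposure -> Illness.

    Each conditional probability is given a Beta distribution to express
    elicited uncertainty. All Beta parameters are hypothetical.
    """
    rng = random.Random(seed)
    draws = []
    for _ in range(n_draws):
        p_exposed = rng.betavariate(2, 18)              # P(exposed)
        p_ill_exposed = rng.betavariate(3, 27)          # P(ill | exposed)
        p_ill_unexposed = rng.betavariate(1, 99)        # P(ill | not exposed)
        # Marginal probability of illness by the law of total probability.
        p_ill = p_exposed * p_ill_exposed + (1 - p_exposed) * p_ill_unexposed
        draws.append(p_ill)
    draws.sort()
    mean = sum(draws) / len(draws)
    return mean, percentile(draws, 0.025), percentile(draws, 0.975)


mean, lower, upper = risk_interval()
print(f"P(illness): mean={mean:.4f}, 95% interval=({lower:.4f}, {upper:.4f})")
```

With fixed conditional probabilities the network would return a single point estimate; sampling them gives a distribution for the same quantity, from which an interval can be read.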
In considering one day of the three-dimensional agricultural data, it became clear that
geostatistical models or conditional autoregressive (CAR) models over the three dimensions were
not the best way to approach the data. Instead, CAR models were used with neighbours only in
the same depth layer. This gave flexibility to the model, allowing both the spatially structured
and non-structured variances to differ at all depths. We call this model the CAR layered model.
Given the experimental design, the fixed part of the model could have been modelled as a set of
means by treatment and by depth, but doing so allows little insight into how the treatment effects
vary with depth. Hence, a number of essentially non-parametric approaches were taken to see
the effects of depth on treatment, with the model of choice incorporating an errors-in-variables
approach for depth in addition to a non-parametric smooth. The statistical contribution here was
the introduction of the CAR layered model, the applied contribution the analysis of moisture
over depth and estimation of the contrast of interest together with its credible intervals. These
models were fitted using WinBUGS [Lunn et al., 2000].
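The layering idea can be sketched as follows: neighbourhoods are defined within a depth layer only, and each layer has its own spatial precision. The sketch assumes an intrinsic CAR prior with first-order (rook) neighbours on a rectangular grid. The 6 x 9 layout, the precision values and the function names are hypothetical, and this is not the WinBUGS code used in the thesis.

```python
def layer_adjacency(n_rows, n_cols):
    """Rook-neighbour lists for sites on an n_rows x n_cols grid in ONE depth layer."""
    adj = {}
    for r in range(n_rows):
        for c in range(n_cols):
            nbrs = []
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < n_rows and 0 <= cc < n_cols:
                    nbrs.append(rr * n_cols + cc)
            adj[r * n_cols + c] = nbrs
    return adj


def icar_precision(adj, tau):
    """Intrinsic CAR precision matrix tau * (D - W) for a single layer."""
    n = len(adj)
    q = [[0.0] * n for _ in range(n)]
    for i, nbrs in adj.items():
        q[i][i] = tau * len(nbrs)      # D: number of neighbours on the diagonal
        for j in nbrs:
            q[i][j] = -tau             # W: adjacency, entering with a minus sign
    return q


def layered_precisions(n_rows, n_cols, taus):
    """One precision matrix per depth layer; no neighbour links cross layers."""
    adj = layer_adjacency(n_rows, n_cols)
    return [icar_precision(adj, tau) for tau in taus]


# Hypothetical layout: 54 sites on a 6 x 9 grid, three layers with different precisions.
layers = layered_precisions(6, 9, taus=[4.0, 1.0, 0.25])
print(len(layers), len(layers[0]))
```

Because the layers share no neighbour links, the joint spatial precision is block diagonal, which is what lets the spatial variance differ freely from one depth to the next.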
The work in the fifth paper deals with the fact that with large datasets, the use of WinBUGS
becomes more problematic because of its highly correlated term by term updating. In this
work, we introduce a Gibbs sampler with block updating for the CAR layered model. The
Gibbs sampler was implemented by Chris Strickland using pyMCMC [Strickland, 2010]. This
framework is then used to consider five days' data, and we show that moisture in the soil for
all the various treatments reaches levels particular to each treatment at a depth of 200 cm and
thereafter stays constant, albeit with increasing variances with depth.
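To show what block updating means in practice, the sketch below draws a whole vector of spatial effects jointly from a Gaussian full conditional N(Q^{-1} b, Q^{-1}), instead of updating one term at a time. It is a generic illustration in plain Python, not the pyMCMC implementation; the function names and the three-site example (chain neighbours, unit spatial precision, unit observation variance) are assumptions.

```python
import math
import random


def cholesky(a):
    """Lower-triangular L with L L^T = a, for a symmetric positive-definite matrix."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def forward_solve(low, b):
    """Solve L x = b by forward substitution."""
    x = []
    for i in range(len(b)):
        x.append((b[i] - sum(low[i][k] * x[k] for k in range(i))) / low[i][i])
    return x


def backward_solve(low, b):
    """Solve L^T x = b by back substitution."""
    n = len(b)
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(low[k][i] * x[k] for k in range(i + 1, n))
        x[i] = (b[i] - s) / low[i][i]
    return x


def draw_block(q, b, rng):
    """One joint draw from N(Q^{-1} b, Q^{-1}): the whole vector is updated at once."""
    low = cholesky(q)
    mean = backward_solve(low, forward_solve(low, b))
    z = [rng.gauss(0.0, 1.0) for _ in b]
    noise = backward_solve(low, z)     # L^{-T} z has covariance Q^{-1}
    return [m + e for m, e in zip(mean, noise)]


# Hypothetical full conditional for three sites in a chain:
# Q = (D - W) + I and b = y, with y = (1, 0, -1).
rng = random.Random(0)
q = [[2.0, -1.0, 0.0], [-1.0, 3.0, -1.0], [0.0, -1.0, 2.0]]
b = [1.0, 0.0, -1.0]
print(draw_block(q, b, rng))
```

A joint draw of this kind avoids the slow mixing that arises when strongly correlated neighbouring effects are updated singly, which is the difficulty described above for large datasets.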
In an analysis across three spatial dimensions and across time, there are many interactions
of time and the spatial dimensions to be considered. Hence, we chose to use a daily model
and to repeat the analysis at all time points, effectively creating an interaction model of time
by the daily model. Such an approach allows great flexibility. However, this approach does
not allow insight into the way in which the parameter of interest varies over time. Hence, a
two-stage approach was also used, with estimates from the first-stage being analysed as a set of
time series. We see this spatio-temporal interaction model as being a useful approach to data
measured across three spatial dimensions and time, since it does not assume additivity of the
random spatial or temporal effects.
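The second stage of the two-stage approach can be sketched as a smoother applied to the daily first-stage estimates. The example below computes the posterior mean of a first-order random walk given independent noisy estimates with known variances and a fixed random walk precision. Equal spacing of the dates, independence of the first-stage estimates, at least two time points, the function name and all numbers are assumptions made for illustration only.

```python
def rw1_smooth(estimates, variances, rw_precision):
    """Posterior mean of a first-order random walk given noisy first-stage estimates.

    Solves (W + lambda * K) theta = W y, where W holds the reciprocal variances,
    lambda is the random walk precision and K is the RW1 structure matrix
    (tridiagonal, diagonal 1, 2, ..., 2, 1 and off-diagonal -1).
    Requires at least two time points.
    """
    n = len(estimates)
    w = [1.0 / v for v in variances]
    diag = [w[t] + rw_precision * (1 if t in (0, n - 1) else 2) for t in range(n)]
    off = [-rw_precision] * (n - 1)
    rhs = [w[t] * estimates[t] for t in range(n)]
    # Thomas algorithm: forward elimination on the tridiagonal system.
    for t in range(1, n):
        m = off[t - 1] / diag[t - 1]
        diag[t] -= m * off[t - 1]
        rhs[t] -= m * rhs[t - 1]
    # Back substitution.
    theta = [0.0] * n
    theta[-1] = rhs[-1] / diag[-1]
    for t in range(n - 2, -1, -1):
        theta[t] = (rhs[t] - off[t] * theta[t + 1]) / diag[t]
    return theta


# Hypothetical first-stage contrast estimates and their posterior variances.
first_stage = [0.10, 0.45, 0.20, 0.55, 0.30]
first_stage_var = [0.04, 0.04, 0.09, 0.04, 0.04]
print(rw1_smooth(first_stage, first_stage_var, rw_precision=50.0))
```

A larger random walk precision pulls neighbouring time points together and gives a smoother trajectory, while a precision of zero returns the first-stage estimates unchanged.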
Bibliography
Fuller, W. A. (1987). Measurement error models. New York: Wiley.
Lunn, D. J., A. Thomas, N. Best, and D. Spiegelhalter (2000). WinBUGS - A Bayesian modelling
framework: Concepts, structure, and extensibility. Statistics and Computing 10(4),
325–337.
Strickland, C. (2010). pyMCMC: a statistical package for Bayesian MCMC analysis. Journal
of Computational and Graphical Statistics, 1–46. submitted August, 2010.
Statement of Original Authorship
The work contained in this thesis has not been previously submitted for a degree or diploma at
any other higher educational institution. To the best of my knowledge and belief, the thesis
contains no material previously published or written by another person except where due reference
is made.
Signed:
Date:
List of Publications arising from this
Thesis
The following publications form the basis for this thesis. They have either been published,
submitted for publication, or are in preparation.
Chapter 3: Bayesian Network for Risk of Diarrhoea Associated with the Use of Recycled
Water, Risk Analysis, 29 (12) 1672-1685, 2009.
Chapter 4: Incorporating parameter uncertainty into Quantitative Microbial Risk Assessment,
Journal of Water and Health, 9 (1) 10-26, 2011, published online, October 2010.
Chapter 5: A Bayesian analysis of an agricultural field trial with three spatial dimensions,
submitted August 2010, Computational Statistics & Data Analysis.
Chapter 6: Comparison of three dimensional profiles over time, submitted December 2010,
Journal of Applied Statistics.
Chapter 7: Four dimensional spatio-temporal analysis of an agricultural field trial, in
preparation.
Acknowledgements
I would like to thank Professor Kerrie Mengersen, firstly, for offering me a scholarship, without
which, I might not have found the courage to undertake and to persist in this, and secondly, for
her unfailing support and encouragement, her capacity for always focussing on the task, and her
ability to listen and let nature take its course. Thank you, Kerrie, for everything.
I would also like to extend my deep appreciation to Clair Alston for her friendship, support
and help, and to Chris Strickland for his cheery help in the programming and mathematics of
Chapter 5. And to the members of BRAG and the other denizens of room O415, my thanks also.
The generosity of all my collaborators in this research has been an extraordinary experience.
Special thanks are due to Anne-Marie Clements for her continued and continuing
encouragement of this pursuit, and her generosity in giving me a place to complete this work. To Ann
Eyland for her never-failing support of my statistical endeavours. And my thanks also to Ellis
Roberts and John Evans, my first statistical mentors. Ellis’ voice is heard in the concerns of this
thesis.
I also wish to thank Dr Maureen Aitken who gave me shelter and succour at the Women’s
College within the University of Queensland. And finally, my thanks to my daughters, Nicole
& Rachel who encouraged me to undertake this study, far from home.
Contents
1 Introduction
1.1 Overall objectives of this research
1.2 Research Aims
1.3 Structure of Thesis
1.3.1 Case study 1a: Bayesian network
1.3.2 Case study 1b: QMRA
1.3.3 Case study: Field trial data
1.3.4 Agricultural data: Crop cycles and treatment layout
References
2 Literature Review
2.1 Bayesian networks
2.1.1 Graphical models and Bayesian networks
2.1.2 Bayesian networks: applications
2.2 Risk Assessments for Pathogens
2.2.1 Risk Assessment methodologies
2.2.2 Data for a risk assessment
2.3 Spatio-temporal modelling
2.3.1 Two dimensional Lattice data analyses
2.3.2 Agricultural studies with measurements at different depths
2.3.3 Spatio-temporal data analyses
2.3.4 Four dimensional spatio-temporal data analyses
2.4 Addendum: The dynamic risk assessment model
References
3 Paper One: Network for Risk of Diarrhoea Associated with the Use of Recycled Water
3.1 Preamble
3.2 Network for Risk of Diarrhoea Associated with the Use of Recycled Water
3.3 Introduction
3.4 Methods
3.4.1 Development of a conceptual model
3.4.2 Determination of prior probabilities
3.4.3 Set up and use of models
3.4.4 Model Validation
3.5 Results
3.5.1 Constructed BN
3.5.2 Model 1: Analysis of the BN without uncertainty
3.5.3 Model 2: Analysis of the BN with uncertainty
3.6 Discussion
3.6.1 Framework
3.6.2 Internal validation
3.6.3 Discussion of the results
3.7 Tables
3.8 Figures
References
3.9 Addendum
4 Paper Two: Incorporating parameter uncertainty into Quantitative Microbial Risk Assessment (QMRA)
4.1 Preamble
4.2 Incorporating parameter uncertainty into Quantitative Microbial Risk Assessment (QMRA)
4.3 Introduction
4.4 Methods
4.4.1 Standard QMRA methodology
4.4.2 The extended QMRA model
4.4.3 Data for the extended model
4.5 Results
4.6 Discussion
4.7 Conclusions
4.8 Figures
4.9 Tables
References
5 Paper Three: An analysis of a field trial with three spatial dimensions
5.1 Preamble
5.2 A Bayesian analysis of an agricultural field trial with three spatial dimensions
5.3 Introduction
5.4 Methods
5.4.1 Data
5.4.2 Spatial Correlation
5.4.3 Treatment (fixed) effects
5.4.4 Choice of Priors
5.4.5 Model comparisons
5.4.6 Implementation Details
5.5 Results
5.5.1 Assessing presence of spatial correlation
5.5.2 Determining neighbourhoods and random components
5.6 Discussion
5.7 Tables
5.8 Figures
References
6 Paper Four: Comparison of three dimensional profiles over time
6.1 Paper Four: Comparison of three dimensional profiles over time
6.2 Introduction
6.3 Case Study
6.4 Methods
6.4.1 Model
6.4.2 Computation
6.4.3 Fixed effects
6.4.4 Contrast and parameter comparisons
6.5 Results
6.5.1 Model choice
6.5.2 Variance components
6.5.3 Depth segments and dates
6.5.4 Point by point contrasts
6.5.5 Spatial residual components, ψ
6.6 Discussion
6.7 Appendix
6.7.1 Sampling β
6.7.2 Sampling σ
6.7.3 Sampling ψ
6.7.4 Sampling τ
6.7.5 Sampling ρ
6.8 Tables
6.9 Figures
References
7 Paper Five: Four dimensional spatio-temporal analysis of an agricultural dataset
7.1 Four dimensional spatio-temporal analysis of an agricultural dataset
7.2 Introduction
7.3 Data
7.4 Methods
7.5 Results
7.6 Discussion
7.6.1 Modelling spatio-temporal data
7.6.2 Model Comparisons: Problems
7.7 Conclusions
7.8 Tables
7.9 Figures
7.9.1 Contrasts
References
8 Conclusions and further work
8.1 Conclusions
8.2 Future Work
References
Appendices
A
A.1 WinBUGS code for the model 2 Bayesian net of Paper 1
B Supplementary materials for Chapter Six
B.1 Supplementary tables
B.2 Supplementary Graphs: Contour Graphs for the spatial residuals
C Supplementary graphs and tables for Chapter 7
C.1 Graphs: Method 1, Method 2 random walk and penalised spline smoothed models
C.2 Final estimates and credible intervals for the contrast of long fallow cropping
versus response cropping
Full Reference List
List of Figures
1.1 Site treatments for the agricultural data of chapters 5-7. Details of the 9 treatments
are given in Section 5.4.1 and again in Chapter 7 in Section 7.3 in the
description of the four-dimensional dataset.
1.2 Crop cycles for the cropping treatments (Treatments 1-6). The vertical line indicates
the date for the data analysed in Chapter 5. The three dimensional data
are described in Section 5.4.1 and again in Section 7.3 of Chapter 7 as a
four-dimensional dataset.
2.1 An undirected graph for which x ⊥ y|z.
2.2 Four differing directed acyclic graphs (DAGs) with the same (undirected) structure
as the undirected graph of Figure 2.1.
2.3 The dynamic model of Eisenberg et al. [2002]. Schematic diagram of transmission
model. t, independent variable representing time. Solid lines represent
movement of individuals from one state to another. Dashed lines represent
movement of pathogens either directly from infectious host to susceptible host
or indirectly via the environment. State variables and parameters are defined in
the text.
3.1 Conceptual model of Cook and Roser1
3.2 Bayesian network based on the conceptual model: Node numbering is that used
in the text and in the WinBUGS model
3.3 Relative risks for each age group (0-4, 5-64, 65+) and for the entire population
(All) for each risk scenario, estimated from the BN (Model 1) and by MCMC
(Model 2), with 95% credible intervals from Model 2.
3.4 Distribution of the probability of being infected with gastroenteritis.
3.5 Distribution of the probability of being infected with gastroenteritis when the
endpoint distribution fails.
4.1 Model for a QMRA for surface vegetable irrigated with treated wastewater. Observed
data nodes shown in white, parameter nodes in green, and outcome nodes
in a light grey.
4.2 Model for the part of the standard QMRA implemented here. Observed data
nodes shown in white, parameter nodes in green, and outcome nodes in a light
grey.
4.3 Schematic Model for the directed acyclic graph implemented in WinBUGS for
estimation of parameters and risk. Observed data nodes (1,3,4,7) are shown in
white. Unknown parameter nodes to be estimated (2,6) in green, and outcome
nodes (5,8) in a light grey.
4.4 Sunlight hours for January/June 2008 at Perth airport.
4.5 Dose-Response curve with uncertainty for S. anatum: $P = 1 - (1 + \mathrm{Dose}/\beta)^{-\alpha}$.
The bounding curves are the 95% credible intervals from the MCMC simulation.
4.6 Graphical model for Dose-Response estimated with error in measurement and
error in individual dosage: Measured dose is the observed batch dose, Batch
dose is the unobserved true batch dose, individual dose is the true unobserved
individual dose. The observation of an individual’s infection status is assumed
to be without error.
4.7 Graphical model for a risk assessment which includes the parameters for
dose-response based on the errors-in-variables concept.
4.8 Dose-response curve parameters ($\alpha$, $\beta$): Posterior distribution for $\log\alpha$ vs $\log\beta/1000$
using log uniform priors
4.9 Die-off distributions for S. typhimurium: fixed effects pooled variance model
$N_t = N_0 e^{-kt}$, $k > 0$. Note that for die-off $k > 0$.
4.10 Summer & Winter: Probability of infection - constant (the line) vs varying (the
dots)
4.11 Summer & Winter: Probability of infection - Constant vs Varying by ranked
initial pathogen numbers groups.
4.12 Probability of infection (no die-off) against ranked initial pathogen numbers
groups: using constant vs varying parameters for Beta binomial distribution.
4.13 Comparison of the Dose-response curves for S. anatum with 95% credible
intervals, estimated with & without “errors-in-variables”.
5.1 95% credible intervals for the contrast differences based on the cubic radial
bases model with errors-in-measurement (graphed where the 95% CI did not
cover zero). The lines with the widest tops and tails show “Long Fallow -
Response Cropping”, with the thinnest “Lucerne - Native Pastures”, and those with
medium width “Crop - Pasture”.
5.2 Fixed effects curves for errors-in-variables model: Linear spline treatment
effects & 95% credible intervals, CAR model, sites 1-54. The true depths are
those implied by the errors-in-measurement model. For each treatment there
are 6 sites, each with the same treatment curve.
5.3 Fixed effects curves for errors-in-variables model: Cubic radial bases model
showing estimates at the nominal depth. Depth has been jittered to allow
credible intervals to be seen.
5.4 95% CI for the ratio of square root of the spatial variance to that of the
non-spatial variance at the fifteen depths: Cubic radial bases model with
errors-in-measurement for depth.
6.1 Cumulative distribution curves for the posterior distribution of the deviance, for
(date 4) September 23, 1998. The solid line represents that for the saturated
model, the middle broken line that for the 3-knot linear spline, and the more
coarsely broken line on the left that for the 5-knot linear spline model.
6.2 Square root of non-spatial variances, by date and depth. Credible intervals are
staggered in date order. Note the comparatively smaller variances at the
shallower depths for Date 4.
6.3 Square root of spatial variance, by date and depth. Credible intervals are
staggered in date order. Note the comparatively smaller variances at the shallower
depths for Date 4.
6.4 Contrast: Long Fallow - Response cropping. Credible intervals are staggered in
date order.
6.5 Contrast: Cropping - Pastures. Credible intervals are staggered in date order.
6.6 Contrast: Lucerne mixtures - Native pastures. Credible intervals are staggered
in date order.
6.7 Spatial residual components at depth 240 cm.
7.1 Long fallowing vs Response cropping at all depths. Saturated model. Point
estimates from the MCMC iterates of the full model (Method 1).
7.2 Long fallowing vs Response cropping. Saturated model. Contour graph from
the point estimates from the MCMC iterates of the full model (Method 1).
7.3 Long fallowing vs Response cropping at depth 100 for all trial dates. Saturated
model. Point estimates & 95%CIs from MCMC iterates from the full model.
7.4 Long fallowing vs Response cropping at depth 100 for all trial dates. Penalised
spline smooth across dates. Point estimates & 95%CIs.
7.5 Long fallowing vs Response cropping at depth 100 for all trial dates.
Regression model (Equation 7.2) fitting 27 time-varying covariates. Point estimates &
95%CIs.
7.6 Long fallowing vs Response cropping at depth 100 for all trial dates. Random
Walk of order one. Point estimates & 95%CIs.
7.7 Spatially structured and unstructured standard deviations & 95% credible
intervals at depth 100 cm. The spatial standard deviations are shown in blue, the
unstructured standard deviations in green.
7.8 Spatially structured and unstructured standard deviations & 95% credible
intervals at depth 220 cm. The spatial standard deviations are shown dotted, the
unstructured standard deviations in green.
7.9 Long fallowing vs Response cropping at depth 140 for all trial dates (AR1 fit).
Point estimates & 95%CIs.
7.10 Long fallowing vs Response cropping at depth 140 for all trial dates (RW1 fit
using weights which are reciprocals of the time intervals). Point estimates &
95%CIs.
7.11 Long fallowing vs Response cropping at depth 140 for all trial dates (RW2 fit).
Point estimates & 95%CIs.
7.12 Long fallowing vs Response cropping at depth 140 for all trial dates (RW1 fit
with t dist df=10). Point estimates & 95%CIs.
7.13 Long fallowing vs Response cropping at depth 140 for all trial dates. Random
walk with 97% missing data. Random walk precision fixed at 2241. See
Table 7.1. Point estimates & 95%CIs.
7.14 Non-parametric penalised spline smooths. (Fits for the contrasts at the 7 depths.)
8.1 Random walk of order one & 95% credible intervals at depth 100 cm. Fitted to
12 posterior contrast estimates at each time point.
8.2 Contaminated observational error: Random walk of order one & 95% credible
intervals at depth 100 cm. Fitted to 12 posterior contrast estimates at each time
point.
B.1 Spatial random components: Day 1, Depth 20 cm.
B.2 Spatial random components: Day 1, Depth 40 cm.
B.3 Spatial random components: Day 1, Depth 60 cm.
B.4 Spatial random components: Day 1, Depth 80 cm.
B.5 Spatial random components: Day 1, Depth 100 cm.
B.6 Spatial random components: Day 1, Depth 120 cm.
B.7 Spatial random components: Day 1, Depth 140 cm.
B.8 Spatial random components: Day 1, Depth 160 cm.
B.9 Spatial random components: Day 1, Depth 180 cm.
B.10 Spatial random components: Day 1, Depth 200 cm.
B.11 Spatial random components: Day 1, Depth 220 cm.
B.12 Spatial random components: Day 1, Depth 240 cm.
B.13 Spatial random components: Day 1, Depth 260 cm.
B.14 Spatial random components: Day 1, Depth 280 cm.
B.15 Spatial random components: Day 1, Depth 300 cm.
B.16 Spatial random components: Day 2, Depth 20 cm.
B.17 Spatial random components: Day 2, Depth 40 cm.
B.18 Spatial random components: Day 2, Depth 60 cm.
B.19 Spatial random components: Day 2, Depth 80 cm.
B.20 Spatial random components: Day 2, Depth 100 cm.
B.21 Spatial random components: Day 2, Depth 120 cm.
LIST OF FIGURES 23
B.22 Spatial random components: Day 2, Depth 140 cm. . . . . . . . . . . . . . . . 345
B.23 Spatial random components: Day 2, Depth 160 cm. . . . . . . . . . . . . . . . 345
B.24 Spatial random components: Day 2, Depth 180 cm. . . . . . . . . . . . . . . . 346
B.25 Spatial random components: Day 2, Depth 200 cm. . . . . . . . . . . . . . . . 346
B.26 Spatial random components: Day 2, Depth 220 cm. . . . . . . . . . . . . . . . 347
B.27 Spatial random components: Day 2, Depth 240 cm. . . . . . . . . . . . . . . . 347
B.28 Spatial random components: Day 2, Depth 260 cm. . . . . . . . . . . . . . . . 348
B.29 Spatial random components: Day 2, Depth 280 cm. . . . . . . . . . . . . . . . 348
B.30 Spatial random components: Day 2, Depth 300 cm. . . . . . . . . . . . . . . . 349
B.31 Spatial random components: Day 3, Depth 20 cm. . . . . . . . . . . . . . . . . 350
B.32 Spatial random components: Day 3, Depth 40 cm. . . . . . . . . . . . . . . . . 350
B.33 Spatial random components: Day 3, Depth 60 cm. . . . . . . . . . . . . . . . . 351
B.34 Spatial random components: Day 3, Depth 80 cm. . . . . . . . . . . . . . . . . 351
B.35 Spatial random components: Day 3, Depth 100 cm. . . . . . . . . . . . . . . . 352
B.36 Spatial random components: Day 3, Depth 120 cm. . . . . . . . . . . . . . . . 352
B.37 Spatial random components: Day 3, Depth 140 cm. . . . . . . . . . . . . . . . 353
B.38 Spatial random components: Day 3, Depth 160 cm. . . . . . . . . . . . . . . . 353
B.39 Spatial random components: Day 3, Depth 180 cm. . . . . . . . . . . . . . . . 354
B.40 Spatial random components: Day 3, Depth 200 cm. . . . . . . . . . . . . . . . 354
B.41 Spatial random components: Day 3, Depth 220 cm. . . . . . . . . . . . . . . . 355
B.42 Spatial random components: Day 3, Depth 240 cm. . . . . . . . . . . . . . . . 355
B.43 Spatial random components: Day 3, Depth 260 cm. . . . . . . . . . . . . . . . 356
B.44 Spatial random components: Day 3, Depth 280 cm. . . . . . . . . . . . . . . . 356
B.45 Spatial random components: Day 3, Depth 300 cm. . . . . . . . . . . . . . . . 357
B.46 Spatial random components: Day 4, Depth 20 cm. . . . . . . . . . . . . . . . . 358
B.47 Spatial random components: Day 4, Depth 40 cm. . . . . . . . . . . . . . . . . 358
B.48 Spatial random components: Day 4, Depth 60 cm. . . . . . . . . . . . . . . . . 359
24 LIST OF FIGURES
B.49 Spatial random components: Day 4, Depth 80 cm. . . . . . . . . . . . . . . . . 359
B.50 Spatial random components: Day 4, Depth 100 cm. . . . . . . . . . . . . . . . 360
B.51 Spatial random components: Day 4, Depth 120 cm. . . . . . . . . . . . . . . . 360
B.52 Spatial random components: Day 4, Depth 140 cm. . . . . . . . . . . . . . . . 361
B.53 Spatial random components: Day 4, Depth 160 cm. . . . . . . . . . . . . . . . 361
B.54 Spatial random components: Day 4, Depth 180 cm. . . . . . . . . . . . . . . . 362
B.55 Spatial random components: Day 4, Depth 200 cm. . . . . . . . . . . . . . . . 362
B.56 Spatial random components: Day 4, Depth 220 cm. . . . . . . . . . . . . . . . 363
B.57 Spatial random components: Day 4, Depth 240 cm. . . . . . . . . . . . . . . . 363
B.58 Spatial random components: Day 4, Depth 260 cm. . . . . . . . . . . . . . . . 364
B.59 Spatial random components: Day 4, Depth 280 cm. . . . . . . . . . . . . . . . 364
B.60 Spatial random components: Day 4, Depth 300 cm. . . . . . . . . . . . . . . . 365
B.61 Spatial random components: Day 5, Depth 20 cm. . . . . . . . . . . . . . . . . 366
B.62 Spatial random components: Day 5, Depth 40 cm. . . . . . . . . . . . . . . . . 366
B.63 Spatial random components: Day 5, Depth 60 cm. . . . . . . . . . . . . . . . . 367
B.64 Spatial random components: Day 5, Depth 80 cm. . . . . . . . . . . . . . . . . 367
B.65 Spatial random components: Day 5, Depth 100 cm. . . . . . . . . . . . . . . . 368
B.66 Spatial random components: Day 5, Depth 120 cm. . . . . . . . . . . . . . . . 368
B.67 Spatial random components: Day 5, Depth 140 cm. . . . . . . . . . . . . . . . 369
B.68 Spatial random components: Day 5, Depth 160 cm. . . . . . . . . . . . . . . . 369
B.69 Spatial random components: Day 5, Depth 180 cm. . . . . . . . . . . . . . . . 370
B.70 Spatial random components: Day 5, Depth 200 cm. . . . . . . . . . . . . . . . 370
B.71 Spatial random components: Day 5, Depth 220 cm. . . . . . . . . . . . . . . . 371
B.72 Spatial random components: Day 5, Depth 240 cm. . . . . . . . . . . . . . . . 371
B.73 Spatial random components: Day 5, Depth 260 cm. . . . . . . . . . . . . . . . 372
B.74 Spatial random components: Day 5, Depth 280 cm. . . . . . . . . . . . . . . . 372
B.75 Spatial random components: Day 5, Depth 300 cm. . . . . . . . . . . . . . . . 373
LIST OF FIGURES 25
C.1 Long fallowing vs Response cropping at depth 100 for all trial days. Saturated model. Summary of MCMC iterates from the full model for the contrast. Estimates & 95% CIs.
C.2 Long fallowing vs Response cropping at depth 100 cm for all trial days. Non-parametric penalised spline smooth across dates. Estimates & 95% CIs.
C.3 Long fallowing vs Response cropping at depth 100 cm for all trial days. Random Walk of order one. Estimates & 95% CIs.
C.4 Long fallowing vs Response cropping at depth 120 for all trial days. Saturated model. Summary of MCMC iterates from the full model for the contrast. Estimates & 95% CIs.
C.5 Long fallowing vs Response cropping at depth 120 cm for all trial days. Non-parametric penalised spline smooth across dates. Estimates & 95% CIs.
C.6 Long fallowing vs Response cropping at depth 120 cm for all trial days. Random Walk of order one. Estimates & 95% CIs.
C.7 Long fallowing vs Response cropping at depth 140 for all trial days. Saturated model. Summary of MCMC iterates from the full model for the contrast. Estimates & 95% CIs.
C.8 Long fallowing vs Response cropping at depth 140 cm for all trial days. Non-parametric penalised spline smooth across dates. Estimates & 95% CIs.
C.9 Long fallowing vs Response cropping at depth 220 for all trial days. Random Walk of order one. Estimates & 95% CIs.
C.10 Long fallowing vs Response cropping at depth 160 for all trial days. Saturated model. Summary of MCMC iterates from the full model for the contrast. Estimates & 95% CIs.
C.11 Long fallowing vs Response cropping at depth 160 cm for all trial days. Non-parametric penalised spline smooth across dates. Estimates & 95% CIs.
C.12 Long fallowing vs Response cropping at depth 160 for all trial days. Random Walk of order one. Estimates & 95% CIs.
C.13 Long fallowing vs Response cropping at depth 180 for all trial days. Saturated model. Summary of MCMC iterates from the full model. Estimates & 95% CIs.
C.14 Long fallowing vs Response cropping at depth 160 cm for all trial days. Non-parametric penalised spline smooth across dates. Estimates & 95% CIs.
C.15 Long fallowing vs Response cropping at depth 180 for all trial days. Random Walk of order two. Estimates & 95% CIs.
C.16 Long fallowing vs Response cropping at depth 200 for all trial days. Saturated model. Summary of MCMC iterates from the full model. Estimates & 95% CIs.
C.17 Long fallowing vs Response cropping at depth 160 cm for all trial days. Non-parametric penalised spline smooth across dates. Estimates & 95% CIs.
C.18 Long fallowing vs Response cropping at depth 200 for all trial days. Random Walk of order two. Estimates & 95% CIs.
C.19 Long fallowing vs Response cropping at depth 220 for all trial days. Saturated model. Summary of MCMC iterates from the full model. Estimates & 95% CIs.
C.20 Long fallowing vs Response cropping at depth 220 cm for all trial days. Non-parametric penalised spline smooth across dates. Estimates & 95% CIs.
C.21 Long fallowing vs Response cropping at depth 220 for all trial days. Random Walk of order two. Estimates & 95% CIs.
C.22 100 cm: Long fallowing vs Response cropping with Prior 5 precisions applied to the random walk model. Estimates & 95% CIs.
C.23 120 cm: Long fallowing vs Response cropping with Prior 5 precisions applied to the random walk model. Estimates & 95% CIs.
C.24 140 cm: Long fallowing vs Response cropping with Prior 5 precisions applied to the random walk model. Estimates & 95% CIs.
C.25 160 cm: Long fallowing vs Response cropping with Prior 5 precisions applied to the random walk model. Estimates & 95% CIs.
C.26 180 cm: Long fallowing vs Response cropping with Prior 5 precisions applied to the random walk model. Estimates & 95% CIs.
C.27 200 cm: Long fallowing vs Response cropping with Prior 5 precisions applied to the random walk model. Estimates & 95% CIs.
C.28 220 cm: Long fallowing vs Response cropping with Prior 5 precisions applied to the random walk model. Estimates & 95% CIs.
C.29 Square root of variances & 95% credible intervals at depth 100 cm. Unstructured: green with broader bars, spatial: blue with narrower bars.
C.30 Square root of variances & 95% credible intervals at depth 120 cm. Unstructured: green with broader bars, spatial: blue with narrower bars.
C.31 Square root of variances & 95% credible intervals at depth 140 cm. Unstructured: green with broader bars, spatial: blue with narrower bars.
C.32 Square root of variances & 95% credible intervals at depth 160 cm. Unstructured: green with broader bars, spatial: blue with narrower bars.
C.33 Square root of variances & 95% credible intervals at depth 180 cm. Unstructured: green with broader bars, spatial: blue with narrower bars.
C.34 Square root of variances & 95% credible intervals at depth 200 cm. Unstructured: green with broader bars, spatial: blue with narrower bars.
C.35 Square root of variances & 95% credible intervals at depth 220 cm. Unstructured: green with broader bars, spatial: blue with narrower bars.
C.36 Square root of unstructured variance: Days by Depth. Contour graph smooth.
C.37 Square root of spatially structured variances: Days by Depth. Contour graph smooth.
C.38 ρ & 95% credible intervals.
List of Tables
3.1 Settings
3.2 Sensitivity of two nodes: Gastroenteritis & Endpoint Distribution
3.3 Model Comparisons: Marginal probabilities & 95% credible intervals
3.4 Model comparisons for Gastroenteritis under various conditions
3.5 Expected subgroup sizes for BN (Model 2)
4.1 Settings for constant parameters
4.2 Summary statistics for p(infected) over groupings
4.3 Summary statistics for groupings: Group Initial Pathogen Numbers / Doses
5.1 Comparing spatial neighbourhood modelling. Treatment effects model is identical for all models (Orthogonal polynomial degree 8). Models have 15 spatial variance components (σ_d²), and one homogeneous variance component (τ²), except where otherwise stated.
5.2 Values of Moran’s I for each depth layer. A normal approximation is used for testing significance.
5.3 Comparing Fixed Effects modelling. Random components for all models are given by 4 neighbour CAR with 15 depth variances (σ_d²), and one homogeneous variance component (τ²).
5.4 Contrasts at nominal depths: Cubic radial bases model where depth is measured with error.
6.1 Summary of DICs
6.2 Estimates for ρ in the spatial precision matrix
6.3 Differences in ρ across the five time periods.
6.4 Slopes for segment 200 cm - 300 cm for each treatment
6.5 Signs for contrasts with 95% credible intervals not including zero, for each date. Positive (+) and negative (−) values indicated.
7.1 Various priors used for the precisions of the timeseries models of Method 2
7.2 Summary of DICs for Contrast 1 (Long fallowing vs Response cropping) at Depth 140
7.3 DICs for Long fallowing vs Response cropping: 1st order autoregressive models vs simple regression model
7.4 DICs for Long fallowing vs Response cropping: random walk model comparisons, using Prior 2.
7.5 Square root of the Signal to Noise ratio for the RW models
7.6 R², pD and DIC for the RW(1) weighted models using priors 3-5
7.7 Root mean square predicted error for RW1 models under Priors 1 & 2
8.1 Comparison of some fits for the contrast Long fallowing vs Response cropping at Depth 100.
B.1 Differences in σ for depths from 20 cm to 100 cm
B.2 Differences in σ for depths from 120 cm to 200 cm
B.3 Differences in σ for depths from 220 cm to 300 cm
B.4 Differences in κ for depths from 20 cm to 100 cm
B.5 Differences in κ for depths from 120 cm to 200 cm
B.6 Differences in κ for depths from 220 cm to 300 cm
B.7 Differences in slope from 200 cm - 300 cm for each treatment across days
B.8 Differences in slope from 200 cm - 300 cm for each treatment across days
B.9 Differences in slope from 200 cm - 300 cm for each treatment across days
B.10 Differences in slopes for each treatment on day 1
B.11 Differences in slopes for each treatment on day 2
B.12 Differences in slopes for each treatment on day 3
B.13 Differences in slopes for each treatment on day 4
B.14 Differences in slopes for each treatment on day 5
B.15 Slopes for segment 200 cm - 300 cm for each treatment
B.16 Slopes for segment 200 cm - 300 cm for Groupings
B.17 Differences in slopes for each group across days
B.18 Contrasts compared between days
B.19 Contrasts compared between days
B.20 Contrasts compared between days
B.21 Contrasts (1)
B.22 Contrasts (2)
B.23 Contrasts (3)
B.24 Contrasts (4)
B.25 Contrasts (5)
B.26 Contrasts (6)
C.1 Contrast estimates for depth 100 cm: Long fallow cropping vs Response cropping
C.2 Contrast estimates for depth 120 cm: Long fallow cropping vs Response cropping
C.3 Contrast estimates for depth 140 cm: Long fallow cropping vs Response cropping
C.4 Contrast estimates for depth 160 cm: Long fallow cropping vs Response cropping
C.5 Contrast estimates for depth 180 cm: Long fallow cropping vs Response cropping
C.6 Contrast estimates for depth 200 cm: Long fallow cropping vs Response cropping
C.7 Contrast estimates for depth 220 cm: Long fallow cropping vs Response cropping
Chapter 1
Introduction
1.1 Overall objectives of this research
While Bayes’ theorem has been known since its posthumous publication in 1763 [Bellhouse,
2004], its enthusiastic use in statistics is a modern phenomenon, based on the advent of fast
computers and clever algorithms. This explosion of Bayesian statistics has allowed a far more
natural way of analysing data.
Equally, because of the simplicity of the Gibbs sampler, we can conceptualise our data as a
sequence of probabilities conditioned on something being the case. This allows the specification
and solution of complex problems. As Jordan [2004] says in discussing graphical models, “What
is perhaps most distinctive about the graphical model approach is its naturalness in formulating
probabilistic models of complex phenomena in applied fields, while maintaining control over
the computational cost associated with these models.”
With the advent of fast computers and the development of Markov chain Monte Carlo
methodology, the ability to evaluate the complex integrals implied by the use of Bayes’ theorem
has led to a flowering of Bayesian statistics, where probability statements are made in
terms of the probability of observing a parameter value given the data. This seems more natural
than finding the likelihood of the data given the parameters, a more classical mode of inference.
Additionally, the Gibbs sampler [Geman and Geman, 1984] has allowed complex models to
be expressed in terms of conditional probabilities, most of which are easy to envisage and ex-
press, and which often form a natural way of seeing the connections between various quantities.
A Gibbs sampler generates an instance of each variable (or set of variables) in turn, conditional
on the current values of the other variables. The resulting sequence of samples forms a Markov chain
whose stationary distribution is the desired joint distribution, so that, once the chain has converged,
its values form a (dependent) sample from that distribution. See, e.g., Gelman et al. [1995].
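As a minimal illustration of the updating scheme just described (this is not code from the thesis), the following Python sketch runs a Gibbs sampler on a standard bivariate normal with correlation rho, a target for which both full conditionals are known normal distributions. The target distribution, the function names, the chain length and the burn-in are all assumptions chosen purely for illustration.

```python
import math
import random


def gibbs_bivariate_normal(rho, n_iter=20000, burn_in=1000, seed=1):
    """Gibbs sampler for a standard bivariate normal with correlation rho.

    Full conditionals (known in closed form for this toy target):
        x | y ~ N(rho * y, 1 - rho^2)
        y | x ~ N(rho * x, 1 - rho^2)
    """
    rng = random.Random(seed)
    cond_sd = math.sqrt(1.0 - rho * rho)
    x, y = 0.0, 0.0  # arbitrary starting values
    draws = []
    for it in range(n_iter + burn_in):
        x = rng.gauss(rho * y, cond_sd)  # update x given the current y
        y = rng.gauss(rho * x, cond_sd)  # update y given the new x
        if it >= burn_in:                # discard the burn-in iterations
            draws.append((x, y))
    return draws


def sample_correlation(draws):
    """Sample correlation of the retained (x, y) pairs."""
    n = len(draws)
    mx = sum(d[0] for d in draws) / n
    my = sum(d[1] for d in draws) / n
    sxy = sum((d[0] - mx) * (d[1] - my) for d in draws)
    sxx = sum((d[0] - mx) ** 2 for d in draws)
    syy = sum((d[1] - my) ** 2 for d in draws)
    return sxy / math.sqrt(sxx * syy)


draws = gibbs_bivariate_normal(rho=0.6)
```

The correlation of the retained draws should be close to the value of rho supplied, although successive draws are themselves autocorrelated, which is why the chain is a dependent rather than an independent sample.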
The research in this thesis is based entirely on the possibility of setting up complex condi-
tional models to describe data and using Gibbs sampling to fit such models. As Dunson [2001]
says: “A major advantage of the Bayesian MCMC approach is its extreme flexibility. Using
MCMC techniques, it is straightforward to fit realistic models to complex data sets with mea-
surement error, censored or missing observations, multilevel or serial correlation structures, and
multiple endpoints. It is typically much more difficult to develop and justify the theoretical
properties of frequentist procedures for fitting such models.”
A standard criticism of Bayesian statistics is that it is subjective, requiring, as it does, prior
probabilities (prior beliefs). However, long ago, Box [1980] pointed out that any fitting of a
model requires some sort of prior belief, saying “the need for probabilities expressing prior
belief has often been thought of, not as a necessity for all scientific inference, but rather as a
feature peculiar to Bayesian inference. This seems to come from the curious idea that an outright
assumption does not count as a prior belief....The model is the prior in the wide sense that it is
a probability statement of all the assumptions currently to be tentatively entertained a priori.”
Hence, the objective of this research was to make a contribution to Bayesian statistics and
to use Bayesian methodologies in an applied setting.
In this thesis, we use two application areas to motivate our statistical and applied contribu-
tions within Bayesian statistics. Firstly we consider risk assessments for recycled water using
graphical models, and secondly we consider models for an agricultural cropping system, where
measurements are taken over four dimensions, the aim being both to answer the substantive
questions and to contribute to Bayesian methodologies.
For the risk assessments for recycled water, we use directed acyclic graphs (DAGs), firstly
as Bayesian nets (BNs), and secondly, very much in the terms of Jordan [2004], as graphical models that
provide computationally and conceptually simple ways of envisaging complex problems.
Answers for the cropping systems are found using spatial and spatio-temporal models, in which
spatial and temporal autocorrelations must be accounted for in any assessment of difference.
In the first case study (Chapter 3.2), we describe a method for building credible intervals for
the point estimates of queries in a Bayesian network, and use the method to describe more fully
the marginal and conditional probabilities and relative risks of diarrhoea arising from various
scenarios.
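To make the idea of an interval around a Bayesian network query concrete, the following Python sketch treats the entries of a two-node network (exposure, then illness) as uncertain rather than fixed, draws them repeatedly, and summarises the resulting marginal probability and relative risk. It is only one plausible way of obtaining such intervals and is not claimed to be the method of Chapter 3; the network, the Beta distributions and all numerical values are hypothetical.

```python
import random


def summarise(values):
    """Median and equal-tailed 95% interval of a list of simulated values."""
    s = sorted(values)
    n = len(s)
    return s[n // 2], s[int(0.025 * n)], s[int(0.975 * n)]


def bn_credible_intervals(n_draws=10000, seed=2):
    """Simulate a two-node network: Exposure -> Illness.

    Each conditional probability table entry is given a Beta distribution
    (hypothetical values) instead of a single expert-supplied number.
    """
    rng = random.Random(seed)
    p_ill_draws, rel_risk_draws = [], []
    for _ in range(n_draws):
        p_exposed = rng.betavariate(20, 80)        # P(exposed)
        p_ill_exposed = rng.betavariate(10, 90)    # P(ill | exposed)
        p_ill_unexposed = rng.betavariate(5, 95)   # P(ill | not exposed)
        # Marginal probability of illness, summing over the parent node.
        p_ill = p_exposed * p_ill_exposed + (1.0 - p_exposed) * p_ill_unexposed
        p_ill_draws.append(p_ill)
        # Relative risk of illness, exposed versus unexposed.
        rel_risk_draws.append(p_ill_exposed / p_ill_unexposed)
    return summarise(p_ill_draws), summarise(rel_risk_draws)


(p_med, p_lo, p_hi), (rr_med, rr_lo, rr_hi) = bn_credible_intervals()
```

Each query is then reported as a median with lower and upper limits rather than as a single number.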
In the second case study (Chapter 4.2), we postulate a risk assessment framework for Salmonella
infections, and recast it, firstly as a graphical model, and secondly as a graphical model with
nodes for all the data on which the risk assessment is predicated together with nodes for the
parameters which describe those data. This simple modification allows all the experimental un-
certainty of the underlying data to be incorporated into the risk assessment. The second model
is shown to produce estimates very different from those of the first model, which used the ‘plug-in’
estimates. Additionally, the dose-response data used in the risk assessment are used to form
a submodel within the complete risk model. In the submodel, the probability of infection and
its underlying parameters are calculated on the basis of an errors-in-variables model, which is
advocated as a more realistic use of the data. WinBUGS [Lunn et al., 2000], a program based
on directed acyclic graphs (DAGs), is used to implement the models.
The third case study considers an agricultural dataset, with the usual row/column framework
for the experimental plots of agricultural data. This dataset involves a third spatial dimension,
depth, and was collected over a five-year period with 61 days of moisture measurements at 15
depths in the soil. The first concern was to model a single day’s data, addressing the questions of
concern, while accounting for spatial correlation. This led to the paper of Chapter 5.2, in which
fixed models (with 45-135 terms) are fitted, together with complex random components, to give
more realistic estimates of the uncertainty associated with the contrasts of interest. Chapters 6.1
and 7.1 take a daily model for the agricultural data and apply it to (1) five days of data, and
(2) the full dataset, in a two-stage analysis. Chapters 6 and 7 utilise a new general purpose
software framework, pyMCMC [Strickland, 2010], to fit the models. This exploits the sparse
neighbourhood matrices typical of spatial data.
Chapter 5.2 introduces our conditional autoregressive (CAR) layered model, and fits many
differing fixed and spatial models using WinBUGS [Lunn et al., 2000], the purpose being to
determine a model which might describe all 61 days’ data.
Updating in WinBUGS (using the Gibbs sampler) is observation by observation, with conditional
probabilities defined on an observation-by-observation basis. Chapters 6-7 use the more
efficient framework of Rue and Held [2005], which allows the exploitation of the sparse ad-
jacency matrices of Gaussian Markov Random Fields (GMRFs) to block update the spatial
components. In particular, this exploits the fact that the precision matrix of multivariate Gaus-
sian models describes the independence structure of the various elements [Whittaker, 1990].
The adjacency structures used to describe spatial dependence thus give rise to simple graphical
structures describing the data, and to more efficient estimation.
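The link between adjacency, sparsity and conditional independence can be seen in a small sketch. The Python code below builds the precision matrix of a proper CAR model on a small grid of plots with first-order (four-neighbour) adjacency, using the common parametrisation Q = kappa * (D - rho * W), where W is the adjacency matrix and D is diagonal with the neighbour counts. This parametrisation, the 3 x 4 grid and the parameter values are assumptions made for illustration and are not taken from the models of Chapters 5 to 7.

```python
def grid_adjacency(n_rows, n_cols):
    """First-order (rook) neighbours for plots on an n_rows x n_cols grid.

    Plots are numbered row by row, starting from 0.
    """
    neighbours = {}
    for r in range(n_rows):
        for c in range(n_cols):
            i = r * n_cols + c
            nbrs = []
            if r > 0:
                nbrs.append(i - n_cols)   # plot above
            if r < n_rows - 1:
                nbrs.append(i + n_cols)   # plot below
            if c > 0:
                nbrs.append(i - 1)        # plot to the left
            if c < n_cols - 1:
                nbrs.append(i + 1)        # plot to the right
            neighbours[i] = nbrs
    return neighbours


def car_precision(neighbours, rho, kappa=1.0):
    """Dense copy of the sparse precision Q = kappa * (D - rho * W)."""
    n = len(neighbours)
    Q = [[0.0] * n for _ in range(n)]
    for i, nbrs in neighbours.items():
        Q[i][i] = kappa * len(nbrs)       # diagonal: number of neighbours
        for j in nbrs:
            Q[i][j] = -kappa * rho        # non-zero only for neighbours
    return Q


nb = grid_adjacency(3, 4)
Q = car_precision(nb, rho=0.9)
# Count the non-zero entries to see how sparse the precision matrix is.
nonzero = sum(1 for row in Q for v in row if v != 0.0)
```

A zero in position (i, j) of Q means that plots i and j are conditionally independent given all the other plots, so the pattern of non-zero entries is exactly the neighbourhood graph. Even on this tiny grid most entries are zero, and the proportion of zeros grows with the size of the grid.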
1.2 Research Aims
The aim of this thesis was to contribute to Bayesian statistical methodology by contributing to
risk assessment statistical methodology, and to spatial and spatio-temporal methodology, and
then to apply the new methodologies to risk assessments for recycled water in Western Australia,
and to the assessment of differences between cropping systems over time and over three spatial
dimensions.
The specific objectives were
1. to develop a method for the calculation of credible intervals for the point estimates of
Bayesian networks;
2. to develop an appropriate DAG structure to allow the calculation of more credible credible
intervals for a risk assessment;
3. to develop a model for a single day’s data from the agricultural dataset which satisfactorily captures the
complexities of the data;
4. to build a model for several days’ data, in order to consider how the full data might be
modelled;
5. to consider the full four dimensional dataset and the time-varying nature of the contrast
of interest.
1.3 Structure of Thesis
This thesis has been written as a series of papers included as Chapters 3 to 7, which have been
published by, or submitted to, journals, or which are in preparation, and which are here presented
in their entirety. Chapters 3 and 4 look at graphical models representing risks from re-used water,
loosely based on a Western Australian context. Both show methods for addressing uncertainty
more effectively via the use of graphical models. These two chapters address research objectives
(1) and (2), and have been published as papers. Chapters 5 to 7 address the uncertainty arising
from spatial correlations in three and four dimensional data, where the data address the problem
of salination of soils due to agricultural practices. Chapters 5 & 6 have been submitted to
journals. Chapter 7 describes work still to be crafted as a paper.
Chapter 2 comprises a Literature Review which covers Bayesian nets, graphical models, and
risk assessments for water, and provides the background and foundation for the methodological
component of the first two papers of the thesis, giving a more detailed basis than is available
in the papers. It includes a review of spatial correlation models and spatio-temporal models,
together with their use in an agricultural and three and four dimensional context, which again,
provides a more detailed background for the papers of Chapters 5-7.
Chapter 3 addresses the research aims by introducing a method for determining the uncer-
tainty associated with the point estimates in a Bayesian net, and uses this method in an applied
context to find credible intervals for marginal and conditional probabilities and relative risks for
various scenarios associated with the use of recycled water.
Chapter 4 advocates the inclusion of all experimental data on which a risk assessment may
be based, into a directed acyclic graph. This permits the estimation of the parameters simultaneously
with the risk assessment, and thereby automatically includes all the experimental uncertainty
in the risk assessment. Thus, this chapter takes a typical Quantitative Microbial Risk
Assessment (QMRA), recasts the flow diagram as a graphical model, and adds further nodes
to allow the estimation of all parameters required by the risk assessment. In addition, for one
set of parameters, an errors-in-variables model is included. Creating a single graphical model
using all the data from which plug-in estimates are found allows more realistic estimation of
uncertainty associated with the estimated risk. By moving from Monte Carlo simulation to Markov chain Monte Carlo, this
chapter contributes to risk assessment methodology. Its applied contribution is to introduce an
errors-in-variables model for the estimation of dose-response parameters.
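The difference between a plug-in estimate and one that carries the experimental uncertainty through can be sketched in a few lines of Python. The example below is deliberately much simpler than the model of this chapter: it uses a single hypothetical dosing experiment, a conjugate Beta posterior under a uniform prior, and direct Monte Carlo sampling in place of an MCMC graphical model. The counts, the number of exposures and the function names are all assumptions made for illustration.

```python
import random


def risk_over_exposures(p_single, n_exposures):
    """Probability of at least one infection in n independent exposures."""
    return 1.0 - (1.0 - p_single) ** n_exposures


def compare_plug_in_and_propagated(k_infected, n_dosed, n_exposures,
                                   n_draws=10000, seed=3):
    """Return the plug-in risk and a 95% interval that reflects the
    uncertainty in the per-exposure infection probability."""
    # Plug-in: treat the observed proportion as if it were known exactly.
    plug_in = risk_over_exposures(k_infected / n_dosed, n_exposures)

    # Propagated: draw the infection probability from its posterior,
    # Beta(k + 1, n - k + 1) under a uniform prior, and push each draw
    # through the same risk calculation.
    rng = random.Random(seed)
    risks = sorted(
        risk_over_exposures(
            rng.betavariate(k_infected + 1, n_dosed - k_infected + 1),
            n_exposures,
        )
        for _ in range(n_draws)
    )
    lower = risks[int(0.025 * n_draws)]
    upper = risks[int(0.975 * n_draws)]
    return plug_in, lower, upper


# Hypothetical experiment: 3 of 20 subjects infected; 10 later exposures.
plug_in, lower, upper = compare_plug_in_and_propagated(3, 20, 10)
```

The plug-in calculation returns a single number, whereas the propagated version returns an interval whose width reflects how little a small experiment says about the underlying probability.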
Chapter 5 takes a three-dimensional spatial dataset and, using WinBUGS [Lunn et al., 2000],
compares models for the soil moisture profiles along the depth dimension by cropping treatment
(the fixed part of the model) and models for the spatial autocorrelation structure. An errors-in-
covariates model is also considered, since it has the capacity to model the fact that measured
depth does not represent depth within the soil profile. We introduce the layered CAR model and
show that the spatial autocorrelations are best modelled when depth is separated from the hori-
zontal dimensions. Here, too, for the regression component of the model, an interval censored
errors-in-variables model is fitted. Using conditional autoregressive (CAR) models, which work
from the simple sparse precision matrix, rather than kriging models, which work via (typically)
dense covariance structures, complex fixed effects models may be fitted, even within the
inefficient observation-by-observation updating of WinBUGS.
Chapters 6 and 7 extend the modelling of Chapter 5 into the time domain, and use the sparse
graphical structure of the adjacency matrices to estimate the models more efficiently. Chapter 6
is an exploratory foray into the field of spatio-temporal modelling in the context of agricultural
data, and attempts to see what might remain constant over time. It also presents a block updating
Gibbs sampler for the problem. Chapter 7 fits a complex model to the full agricultural dataset.
The model is complex in its fixed part but more particularly in its error structure where CAR
spatial autocorrelation models are used for each day and each depth (the layered CAR model)
with a first order adjacency structure defined across the horizontal layer. The purpose of the
model was to describe a contrast between treatments over time. The modelling methods used
were both a one- and a two-stage analysis. In the second stage analysis, contrast estimates from
the first stage are explored using various time series models. This chapter also shows that for
random walk models of order one, where there is just one observation for two error sources, the
model estimates may be overly sensitive to the priors for the variances.
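The sensitivity noted above can be illustrated with a small sketch, under the assumption of a model of the form y_t = x_t + e_t, with independent observation errors of precision tau_obs and a first-order random walk on x_t with precision tau_rw. Conditional on the two precisions, the posterior mean of x solves (tau_obs * I + tau_rw * R) x = tau_obs * y, where R is the tridiagonal random walk structure matrix. The Python code below solves this system directly; the function name and the example values are hypothetical, and the code is not the implementation used in Chapter 7.

```python
def rw1_posterior_mean(y, tau_obs, tau_rw):
    """Posterior mean of a first-order random walk given fixed precisions.

    Solves (tau_obs * I + tau_rw * R) x = tau_obs * y by the Thomas
    algorithm, where R has diagonal (1, 2, ..., 2, 1) and off-diagonal -1.
    Requires len(y) >= 2.
    """
    n = len(y)
    diag = [tau_obs + tau_rw * (1.0 if t in (0, n - 1) else 2.0)
            for t in range(n)]
    off = -tau_rw                      # sub- and super-diagonal entries
    rhs = [tau_obs * v for v in y]

    # Forward elimination.
    for t in range(1, n):
        w = off / diag[t - 1]
        diag[t] -= w * off
        rhs[t] -= w * rhs[t - 1]

    # Back substitution.
    x = [0.0] * n
    x[-1] = rhs[-1] / diag[-1]
    for t in range(n - 2, -1, -1):
        x[t] = (rhs[t] - off * x[t + 1]) / diag[t]
    return x


y = [1.0, 3.0, 2.0, 5.0, 4.0]          # hypothetical contrast estimates
rough = rw1_posterior_mean(y, tau_obs=1.0, tau_rw=1e-8)   # follows the data
smooth = rw1_posterior_mean(y, tau_obs=1.0, tau_rw=1e6)   # nearly constant
```

The fitted curve depends on the data only through the ratio of the two precisions: a small random walk precision reproduces the observations, while a large one flattens the fit towards their mean. With a single observation at each time point there is little information with which to separate the two error sources, so the priors placed on the precisions can largely determine where the fit falls between these extremes.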
An overview of the research, together with some issues which arise, is provided in Chapter
8.
1.3.1 Case study 1a: Bayesian network
Starting from a conceptual model which represented the factors and pathways by which recycled
water may pose a risk of contracting gastroenteritis, a graphical model (Bayesian net) was cre-
ated, which was quantified by an expert. The contribution to Bayesian net methodology is that
all model predictions, whether risk or relative risk estimates, are expressed as credible intervals,
instead of simple point estimates.
1.3.2 Case study 1b: QMRA
Quantitative Microbial Risk Assessments (QMRA) are the method of choice for the estimation
of health risks from pathogens. A typical QMRA is considered, and rather than working from
a set of plug-in parameters, we show how to estimate all such parameters contemporaneously
within the risk assessment, thereby incorporating all the parameter uncertainty arising from
the experiments from which these parameters are estimated. The method is illustrated by a
case study that involves incorporating three disparate datasets into an MCMC graphical model
framework. The contribution here is to recognise that any and all primary data underlying a
risk assessment should be incorporated into a graphical model to estimate risk, and, equally
importantly, that the dose-response model should be fitted as an errors-in-variables model.
1.3.3 Case study: Field trial data
The viability of rainfed grain cropping on the Liverpool Plains (New South Wales, south eastern
Australia) is threatened by salination of land and water resources. Salination is caused by exces-
sive deep drainage below the plant root zone which mobilises sometimes vast sub-soil stores of
salt deposited at the time of soil formation. Deep drainage occurs when rain infiltrates already
wet soil that has insufficient capacity to store the additional water. This excess saline water may
produce water logging and shallow saline water tables or may discharge at lower points in the
landscape or into surface- or ground-waters [Broughton, 1994]. When saline ground waters en-
croach on the crop root zone, the salt kills germinating crops or reduces yields depending on salt
concentrations and rainfall [Daniells et al., 2001]. This excess water is usually due to a combination
of above-average rainfall falling onto land farmed using long fallow cropping practices (i.e.,
the land is kept as bare fallow for about 2/3 of the time). Although long-fallow cropping usually
results in good grain yields for each crop, average yields over time are generally less than yields
from more intensive, but somewhat more risky systems. To overcome both the problems of
excess water in the landscape under long fallow cropping, and the risk of poor crop yields due
to insufficient water supply between successive crops where cropping is frequent, a practice of
planting a crop, appropriate for the time of year, crop health and economic considerations, in
response to soil water content (opportunity or response cropping) is being increasingly adopted
by farmers.
This problem was addressed by running a cropping experiment which compared essentially
three types of cropping. The primary question for the scientists was whether response cropping
gives lower moisture values∗ both at the intermediate and greater depths, in comparison with
long fallowing, and whether this is sustained over different stages of the cropping cycle. Sub-
sidiary questions addressed here are the comparison of all cropping treatments with all pasture
treatments, and the comparison of the lucerne pasture mixtures with the native grass pasture.
Figure 1.1 shows the layout of the 9 treatments described in Sections 5.4.1 and 7.3 of Chapters
5 & 7, and Figure 1.2 shows the cycles of crops for treatments 1-6 throughout the five year
experiment.
The complexity of the data motivated the use of Bayesian methods to account for spatial
correlation and to answer the substantive question as to which cropping system was the most
viable form of agriculture.
This led to two submitted papers, the first of which explored possible models for describing
a single day’s data. This paper explored possible fixed models for the treatment curves with
depth, together with possible neighbourhood models for a CAR prior description of spatial
autocorrelation, and AR models in the horizontal directions. A major conclusion was that for
three dimensional data where an observation may be seen to have up to 26 immediate neighbours
in three dimensional space, there are immense advantages in not using depth neighbours, instead
allowing neighbours to be defined only at the same depth. This permits the possibility of having
differing spatial and non-spatial variances for each depth, a possibility which was found to
better describe the data. It also obviates the problem that neighbours in the depth dimension are
markedly closer to one another than neighbours across the horizontal layer, which gives rise to
the problem of choosing suitable weights. We call this model the CAR layered model.
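The neighbourhood structure of such a layered model can be sketched in a few lines. The following is a minimal illustration only, assuming a small hypothetical rectangular grid (4 rows, 5 columns, 3 depths) and first-order neighbours within each depth layer; the function names and grid size are not from the thesis, and the matrix built is only the unscaled structure matrix of an intrinsic CAR prior, not the full model.

```python
from itertools import product


def layered_neighbours(n_rows, n_cols, n_depths):
    """Map each site (row, col, depth) to its first-order neighbours in the same depth layer."""
    neighbours = {}
    for r, c, d in product(range(n_rows), range(n_cols), range(n_depths)):
        candidates = [(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)]
        neighbours[(r, c, d)] = [
            (rr, cc, d)  # the depth index never changes: no depth neighbours
            for rr, cc in candidates
            if 0 <= rr < n_rows and 0 <= cc < n_cols
        ]
    return neighbours


def structure_matrix(neighbours):
    """Return Q = D - W as a dict holding only the non-zero entries."""
    q = {}
    for site, nbrs in neighbours.items():
        q[(site, site)] = len(nbrs)  # diagonal: number of neighbours
        for nb in nbrs:
            q[(site, nb)] = -1       # off-diagonal: -1 for each neighbour
    return q


# Hypothetical grid dimensions, chosen only to keep the example small.
nbrs = layered_neighbours(n_rows=4, n_cols=5, n_depths=3)
Q = structure_matrix(nbrs)
n_sites = len(nbrs)
density = len(Q) / (n_sites * n_sites)
print(f"{n_sites} sites, {len(Q)} non-zero entries, density {density:.3f}")
```

Because no entry links sites at different depths, the matrix is block diagonal with one block per depth, so each block could in principle be given its own variance parameters, which is the flexibility described above.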
The second agricultural paper took the lessons learned from the day’s modelling exercise and
considered five days of data. In doing this, the modelling framework of WinBUGS, exploited
so successfully in the earlier paper, was found inadequate to this larger data analysis task, since
models for these five days failed to converge to sensible estimates. An alternative approach was
to develop software specific to the task, and use block updating to eliminate the problem of very
∗Moisture in the soil is measured by the log(neutron count ratio), a surrogate for moisture. For details see Ringrose-Voase et al. [2003].
highly correlated MCMC chains. Working with Chris Strickland, modules for the CAR models
were built and incorporated into an MCMC modelling framework written in Python which uses
the libraries of Anderson et al. [1999], Blackford et al. [2002], Lawson et al. [1979], NumPy
Community [2010], together with methods based on Krylov subspace methods from Simpson
et al. [2008]. Rather than the improper CAR models of the earlier paper, the CAR model used
was the CAR proper model of Gelfand and Vounatsou [2003], but again modelled as a CAR
layered model.
The last agricultural paper (Chapter 7.1, in draft) analyses the full agricultural dataset,
basing the analysis on the work described in Chapter 6. Given the differences in spatial and
temporal variances, it made sense to analyse the dataset as a series of daily models, rather than to
assume, a priori, some time-varying data structure. Using the modelling framework developed in
Chapter 6, this chapter fits a complex model to the treatments, and models the contrast estimates
in a one- and two-stage analysis to answer the question as to whether response cropping results
in less moist soils than long fallow cropping. This chapter also considers problems associated
with random walk of order one models which posed problems for model choice.
1.3.4 Agricultural data: Crop cycles and treatment layout
These figures could not be included in the papers of Chapters 5 to 7, but are included
here to complete the description of the agricultural data in those chapters. Figure 1.2 shows the
crops sown over the five year period of the trial, together with their treatment identification
(1-6). Figure 1.1 shows the layout of the 9 treatments in the field.
Figure 1.1 Site treatments for the agricultural data of Chapters 5-7. Details of the 9 treatments are given in Section 5.4.1 and again in Chapter 7 in Section 7.3 in the description of the four-dimensional dataset.
Figure 1.2 Crop cycles for the cropping treatments (Treatments 1-6). The vertical line indicates the date for the data analysed in Chapter 5. The three dimensional data are described in Section 5.4.1 and again in Section 7.3 of Chapter 7 as a four-dimensional dataset.
Bibliography
Anderson, E., Z. Bai, C. Bischof, S. Blackford, J. Demmel, J. Dongarra, J. D. Croz, A. Green-
baum, S. Hammarling, A. McKenney, and D. Sorensen (1999). LAPACK Users’ Guide: Third
Edition (22 Aug 1999 ed.). Philadelphia: Society for Industrial and Applied Mathematics
(SIAM).
Bellhouse, D. R. (2004). The Reverend Thomas Bayes, FRS: A biography to celebrate the
tercentenary of his birth. Statistical Science 19(1), 3–43.
Blackford, L., J. Demmel, J. Dongarra, I. Duff, S. Hammarling, G. Henry, M. Heroux, L. Kauf-
man, A. Lumsdaine, and A. Petitet (2002). An updated set of basic linear algebra subprograms
(BLAS). ACM Transactions on Mathematical Software (TOMS) 28(2), 135–151.
Box, G. E. P. (1980). Sampling and Bayes’ inference in scientific modelling and robustness.
Journal of the Royal Statistical Society. Series A (General) 143(4), 383–430.
Broughton, A. (1994). Mooki River Catchment hydrogeological investigation and dryland salin-
ity studies - Liverpool Plains, TS94.026. Technical report, New South Wales Department of
Water Resources.
Daniells, I. G., J. F. Holland, R. R. Young, C. L. Alston, and A. L. Bernardi (2001). Relationship
between yield of grain sorghum (Sorghum bicolor) and soil salinity under field conditions.
Australian Journal of Experimental Agriculture 41, 211–217.
Dunson, D. (2001). Commentary: practical advantages of Bayesian analysis of epidemiologic
data. American Journal of Epidemiology 153(12), 1222–1226.
Gelfand, A. E. and P. Vounatsou (2003). Proper multivariate conditional autoregressive models
for spatial data analysis. Biostatistics 4(1), 11–25.
Gelman, A., J. B. Carlin, H. S. Stern, and D. B. Rubin (1995). Bayesian data analysis. Texts in
statistical science. London: Chapman & Hall.
Geman, S. and D. Geman (1984). Stochastic relaxation, Gibbs distributions and the Bayesian
restoration of images. IEEE Transactions on Pattern Analysis and Machine Intelligence 6,
721–741.
Jordan, M. I. (2004). Graphical models. Statistical Science 19(1), 140–155.
Lawson, C. L., R. J. Hanson, D. R. Kincaid, and F. T. Krogh (1979). Basic Linear Algebra Subprograms
for Fortran usage. ACM Trans. Math. Software 5(3), 324–325.
Lunn, D. J., A. Thomas, N. Best, and D. Spiegelhalter (2000). WinBUGS - A Bayesian mod-
elling framework: Concepts, structure, and extensibility. Statistics and Computing 10(4),
325–337.
NumPy Community (2010, February 9, 2010). NumPy Reference Manual: Release
1.5.0.dev8106. Available online: http://docs.scipy.org/doc/. Accessed: February
9, 2010.
Ringrose-Voase, A., R. R. Young, Z. Payder, N. Huth, A. Bernardi, H. Cresswell, B. Keating,
J. Scott, M. Stauffacher, R. Banks, J. Holland, R. Johnston, T. Green, L. Gregory, I. Daniells,
R. Farquharson, R. Drinkwater, S. Heidenreich, and S. Donaldson (2003). Deep drainage
under different land uses in the Liverpool Plains Catchment. Technical Report 3, Agricultural
Resource Management Report Series, NSW Agriculture Orange.
Rue, H. and L. Held (2005). Gaussian Markov random fields : theory and applications. Boca
Raton: Chapman & Hall/CRC.
Simpson, D. P., I. W. Turner, and A. N. Pettitt (2008). Fast sampling from a Gaussian Markov
random field using Krylov subspace approaches. QUT Eprints 14376 (Brisbane), 1–17. Avail-
able online: http://eprints.qut.edu.au.
Strickland, C. (2010). pyMCMC: a statistical package for Bayesian MCMC analysis. Journal
of Computational and Graphical Statistics, 1–46. submitted August, 2010.
Whittaker, J. (1990). Graphical Models in Multivariate Statistics. Chichester (England); New
York: Wiley.
Chapter 2
Literature Review
The literature review is divided into two areas: (1) Bayesian networks and risk assessment,
and (2) spatio-temporal modelling.
Within the Bayesian network literature we consider some theoretical literature in the pursuit
of credible intervals for point estimates. In the risk assessment section, we also demonstrate a
search for data for use in a risk assessment for recycled water. (However, having hunted, we
preferred not to undertake the planned risk assessment.)
2.1 Bayesian networks
2.1.1 Graphical models and Bayesian networks
A fundamental concept of graphical models is that of conditional independence. Two random
variables x and y are independent iff p(x, y) = p(x)p(y). This is written as x ⊥ y. Two variables x
and y are conditionally independent given z iff p(x, y|z) = p(x|z)p(y|z). This is written as x ⊥ y|z.
Note that x and y may be marginally dependent despite being conditionally independent given
z. This relationship of conditional independence is captured in the undirected graphical model
of Figure 2.1. (Explanation adapted from Rue and Held [2005].)
Figure 2.1 An undirected graph for which x ⊥ y|z.
For multivariate normal data, conditional independence of x and y corresponds to a zero in
the corresponding off-diagonal entry of the precision matrix [Rue and Held, 2005, Whittaker, 1990].
Undirected graphical models, as descriptions of conditional independence, have been used to
determine data models by Darroch et al. [1980], Edwards [1995] and Wermuth and Cox [1998].
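As a small worked illustration (an assumed example, not taken from the thesis), let x ~ N(0, 1), z = x + e1 and y = z + e2, where e1 and e2 are independent standard normal errors. This is a Gaussian chain in which x ⊥ y|z. Writing the variables in the order (x, z, y), the joint density gives the precision matrix Q directly, and inverting it gives the covariance matrix Σ:

\[
-2\log p(x,z,y) = x^2 + (z-x)^2 + (y-z)^2 + \text{const}
= \begin{pmatrix} x & z & y \end{pmatrix}
\begin{pmatrix} 2 & -1 & 0 \\ -1 & 2 & -1 \\ 0 & -1 & 1 \end{pmatrix}
\begin{pmatrix} x \\ z \\ y \end{pmatrix} + \text{const},
\qquad
\Sigma = Q^{-1} = \begin{pmatrix} 1 & 1 & 1 \\ 1 & 2 & 2 \\ 1 & 2 & 3 \end{pmatrix}.
\]

The (x, y) entry of Q is zero, reflecting the conditional independence, while the (x, y) entry of Σ is 1, so x and y remain marginally dependent. The precision matrix is sparse where the covariance matrix is dense.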
A Bayesian network, sometimes called a probabilistic network, is a directed acyclic graph
(DAG) that represents a set of random variables and their conditional dependencies.
Figure 2.2 illustrates four differing DAGs for the graphical model of Figure 2.1. The condi-
tional independence of x and y, defined by the relationship shown in Figure 2.1, is common to
the four DAGs of Figure 2.2. In graphical theory, x and y are called nodes and represent random
variables x and y. The topmost graph of Figure 2.2 may be read as ‘x explains z, which explains
y’. Or more causal language may be used: the graph second from the bottom may be read as ‘x
and y cause z’. In the bottommost diagram of Figure 2.2, the directed edge from z to x models
a causal relationship between z and x. When variables are connected in this way, the variable
from which the edge originates is called the parent and the variable to which the edge leads is
called the child.
Figure 2.2 Four differing directed acyclic graphs (DAGs) with the same (undirected) structure as the undirected graph of Figure 2.1.
If V represents the set of n variables (x1, x2, ..., xn) of a network, then the joint
probability of the network is
\[
p(V) = p(x_1, x_2, \ldots, x_n) = \prod_{i=1}^{n} p(x_i \mid \mathrm{Parents}(x_i)).
\]
Thus, in Figure 2.2, for the topmost and bottommost figures, we have
p(x, y, z) = p(x)p(z|x)p(y|z),
p(x, y, z) = p(z)p(x|z)p(y|z).
These equations express the conditional independence properties inherent in the structure of
these two DAGs. An excellent reference which gives the theoretical background of Bayesian
net software, proofs and examples is Cowell et al. [2001].
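The chain factorisation above can be checked numerically. The sketch below assumes three binary nodes with made-up conditional probability tables (all numbers are hypothetical); it builds the joint distribution from p(x)p(z|x)p(y|z) and confirms that x and y are conditionally independent given z while remaining marginally dependent.

```python
from itertools import product

# Hypothetical conditional probability tables for binary nodes (values 0/1).
p_x = {0: 0.7, 1: 0.3}
p_z_given_x = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.2, 1: 0.8}}  # p_z_given_x[x][z]
p_y_given_z = {0: {0: 0.6, 1: 0.4}, 1: {0: 0.1, 1: 0.9}}  # p_y_given_z[z][y]

# Joint distribution from the chain factorisation p(x) p(z|x) p(y|z).
joint = {
    (x, y, z): p_x[x] * p_z_given_x[x][z] * p_y_given_z[z][y]
    for x, y, z in product((0, 1), repeat=3)
}


def marginal(keep):
    """Sum the joint over every variable not named in `keep` (a subset of 'xyz')."""
    out = {}
    for (x, y, z), p in joint.items():
        values = {"x": x, "y": y, "z": z}
        key = tuple(values[name] for name in keep)
        out[key] = out.get(key, 0.0) + p
    return out


p_z, p_xz, p_yz = marginal("z"), marginal("xz"), marginal("yz")
p_xy, p_x_marg, p_y_marg = marginal("xy"), marginal("x"), marginal("y")

# Conditional independence: p(x, y | z) = p(x | z) p(y | z) for every configuration.
max_ci_gap = max(
    abs(joint[(x, y, z)] / p_z[(z,)]
        - (p_xz[(x, z)] / p_z[(z,)]) * (p_yz[(y, z)] / p_z[(z,)]))
    for x, y, z in product((0, 1), repeat=3)
)

# Marginal dependence: p(x, y) differs from p(x) p(y).
max_marginal_gap = max(
    abs(p_xy[(x, y)] - p_x_marg[(x,)] * p_y_marg[(y,)])
    for x, y in product((0, 1), repeat=2)
)
print("largest conditional-independence gap:", max_ci_gap)
print("largest marginal-independence gap:", max_marginal_gap)
```

The first gap is zero up to rounding error and the second is not, which is the behaviour the factorisation implies.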
In another useful reference, Korb and Nicholson [2004] emphasize the notion of ‘knowledge
engineering’, where Bayesian networks are used to encode structural relationships elicited from
experts. In addition to the theoretical bases for Bayesian net software, they describe techniques
for eliciting and verifying the resultant expert systems.
Bayesian nets have become a standard tool in artificial intelligence [Cowell et al., 2001],
and in the building of expert systems, which involves several stages. The first stage involves
‘natural judgements of relevance or irrelevance...’ with ‘missing edges in the graph encod(ing)
the irrelevance properties.’ [Cowell et al., 2001]. Having elicited a structure from experts,
the graph must be quantified by eliciting sufficient conditional probabilities to specify the joint
distribution.
Some important papers in the development of the theory of Bayesian nets are Andersen
et al. [1989], Boerlage [1992], Cowell and Dawid [1992], Jensen [2001], Jensen et al. [1995],
Lauritzen and Spiegelhalter [1988].
Pearl [1988] introduced the notion of d-separation (illustrated in the bottommost network
of Figure 2.2), which was important in developing the first algorithms for Bayesian networks
and their use for artificial intelligence. The seminal paper by Lauritzen and Spiegelhalter
[1988] developed efficient algorithms for transfer of evidence within causal networks. Other key
contributors and contributions to the development of the theory and the software to implement
Bayesian nets were Cowell and Dawid [1992], Dawid [1992], Dawid et al. [1995]. Spiegel-
halter et al. [1993] show how parts of a network are updated with evidence using a Dirichlet
prior, while Jensen [1994] discusses how the various threads are drawn together to implement
Hugin [Hugin Expert A/S, 2007]. The Hugin webpage www.hugin.com/developer/Publications/
[Hugin Expert A/S, 2007] also lists various papers which underlie its implementation.
Laskey [1995] looks at a measure of sensitivity for nodes via differentiation. Castillo et al.
[1997], Castillo et al. [1997] also propose measures for sensitivity analysis of Bayesian nets.
Jensen et al. [1991] consider measures of data conflict.
In addition to their use as expert systems, Bayesian nets may be derived from data, which
allows elucidation of relationships in complex multivariate data. This can be thought of as being
a development of the work of Whittaker [1990], where the structure of the precision matrix for
multivariate normal models is used to determine the graphical structure of the model. This work
was continued and extended to categorical data in the software MIM [Edwards, 1995]. In the
context of Bayesian nets and directed as opposed to undirected models, Steck [2001] developed
the necessary path condition (NPC) algorithm to determine directionality, when using data to
‘recover’ the DAG structure of a Bayesian net. The work by Boerlage [1992] on ‘link strength’
also develops the data mining capacity of Bayesian nets. Lauritzen [1995] develops an EM
algorithm for fitting data to a predetermined Bayesian net. Neapolitan and Jiang [2007] further
detail types of learning for BNs. However, while this is an extremely valuable area of research
for finding meaningful parsimonious models to explain multivariate data, this is not a direction
of my research.
There is some work relating to the uncertainty associated with nets implied by data, see,
for example, the chapter on structural learning in Cowell et al. [2001]. Van Allen et al. [2001,
2008] demonstrate a method whereby uncertainty may be propagated through a network. They
show that under certain conditions a query response is asymptotically Gaussian and provide
its mean and asymptotic variance. However, our work differs somewhat from theirs: we elicit
credible intervals for all the probabilities of the conditional probability tables, and represent this
uncertainty by using beta distributed priors, and, in general, the probabilities we use are close
to 0 and 1. Van Allen et al. [2008] populate their nets with Dirichlet mixtures generated by the
conditional probability tables, where the majority of the probabilities are not close to 0 or 1.
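One simple way to see how beta-distributed uncertainty on conditional probabilities turns a point estimate into an interval is plain Monte Carlo propagation. The sketch below is an assumption-laden illustration rather than the method developed in the thesis: the two-node net (contaminated, then ill), the node names and all Beta parameters are hypothetical.

```python
import random

random.seed(1)

# Hypothetical Beta(a, b) distributions expressing uncertainty about each
# conditional probability; the means are deliberately close to 0.
beta_params = {
    "p_contaminated": (2.0, 98.0),            # P(water contaminated)
    "p_ill_given_contaminated": (5.0, 95.0),  # P(ill | contaminated)
    "p_ill_given_clean": (2.0, 398.0),        # P(ill | not contaminated)
}


def draw_cpts():
    """Draw one set of conditional probabilities from the Beta distributions."""
    return {name: random.betavariate(a, b) for name, (a, b) in beta_params.items()}


def risk_of_illness(cpt):
    """Marginal P(ill) for the two-node net contaminated -> ill."""
    pc = cpt["p_contaminated"]
    return pc * cpt["p_ill_given_contaminated"] + (1.0 - pc) * cpt["p_ill_given_clean"]


def quantile(sorted_values, q):
    """Simple empirical quantile of an already sorted list, with 0 <= q <= 1."""
    index = min(len(sorted_values) - 1, max(0, round(q * (len(sorted_values) - 1))))
    return sorted_values[index]


risks, relative_risks = [], []
for _ in range(10_000):
    cpt = draw_cpts()
    risks.append(risk_of_illness(cpt))
    relative_risks.append(cpt["p_ill_given_contaminated"] / cpt["p_ill_given_clean"])

risks.sort()
relative_risks.sort()
risk_interval = (quantile(risks, 0.025), quantile(risks, 0.975))
rr_interval = (quantile(relative_risks, 0.025), quantile(relative_risks, 0.975))
print("95% interval for P(ill):", risk_interval)
print("95% interval for relative risk:", rr_interval)
```

Each draw is a fully specified net, so any query (risk or relative risk) can be recomputed per draw and summarised by its empirical quantiles.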
2.1.2 Bayesian networks: applications
Bayesian networks are used very widely to envisage the consequences and risks in complex
systems. Thus, for example, Matias et al. [2007] use a BN to evaluate the environmental impact
of a mine, Martin et al. [2009] use BNs to look at accidents from falls, and Rasmussen [1995] uses
them to relate blood type to cattle parentage. Barker et al. [2002] build a BN for a Clostridium
botulinum growth model, combining contamination processes, thermal death kinetics for spores,
germination and regrowth of cells, toxin production and patterns of consumer behaviour.
Within the environmental and water research literature Bayesian nets have also been widely
used. See, for example, Hamilton et al. [2007], Nicholson et al. [2003], Pike [2004], Pollino
and Hart [2005a,b], Pollino et al. [2007], Varis [1995, 1997, 1998] and Kennett et al. [2001]. Most of
these examples integrate expert opinion and data to consider a problem. Pike [2004] is partic-
ularly relevant to the concerns of this thesis, in that he builds a net to describe water treatment
plant failures on the Susquehanna River, using treatment plant failure data.
Albert et al. [2008] use directed acyclic graphs, via WinBUGS [Lunn et al., 2000], data from
food diaries, from sampling chicken flocks for contamination, and broiler production together
with expert opinion to quantify the risk of contracting campylobacteriosis as a result of broiler
contamination. Their BN model integrates all these disparate data sources to produce a risk
estimate.
Influence diagrams integrated with Monte Carlo simulation were used by Casman et al.
[2000] to look at how long it might take to stem an outbreak of cryptosporidiosis taking into
account time for authorities (both health and water) to react and the likelihood and proportion
of the public’s compliance with advisories.
Object oriented Bayesian networks have been used for process control in a paper pulp plant
[Weidl et al., 2003, 2005]. In this application, hidden Markov models and ‘soft’ Bayes
classification were used to detect and anticipate particular types of failure in the plant.
Barker et al. [2002] build a BBN for a Clostridium botulinum growth model, combining
contamination processes, thermal death kinetics for spores, germination and growth of cells, toxin
production and patterns of consumer behaviour, and comment that it ‘supports, and prioritises,
decisions and actions that minimise the chances and extent of detrimental events and maximise
opportunities for awareness and control’.
In the risk assessment of Chapter 3.2, the Bayesian net built fits into the knowledge engi-
neering framework of Korb and Nicholson [2004], where a net is built to describe and synthesize
qualitative expert knowledge, quantify it meaningfully and look at the implications of the model
built. Marcot [2006], Marcot et al. [2006] emphasize the need to use such implications to refine
the model. The net of Chapter 3.2 is built using expert knowledge. However, its main interest is
in the development of credible intervals for the risk ratios of interest.
2.2 Risk Assessments for Pathogens
In this section I explore the ‘how-to’ as described in influential books, as prescribed by the World
Health Organization, and by peak Australian water bodies, as well as the ‘how-to’ exemplified
and extended by various papers. I also look at the data mandated to be collected by regulation
(or law) in Western Australia, for the protection of the public and the environment. And finally,
I explore data which might be used for a risk assessment of a pathogen for recycled water in
Western Australia.
2.2.1 Risk Assessment methodologies
The literature for risk assessments and risk assessment methodology for drinking and receiving
waters is extensive. With respect to risks from pathogens, Haas and Eisenberg [2001] contribute
a chapter to Fewtrell and Bartram [2001] (the WHO Guidelines for water quality standards and
health), which looks at two types of quantitative microbial risk assessments (QMRA), static and
dynamic, and recommend establishing which of these two types of risk assessment is appropriate
to the pathogen and situation under consideration. The WHO guidelines [Fewtrell and Bartram,
2001] include a chapter on what constitutes an acceptable or tolerable risk [Hunter and Fewtrell,
2001]. And of course, there are many books on risk assessment, see, e.g. Burgman [2005], who
looks more generally at ecological risk modelling.
Looking at risk assessments associated with water, there are many papers, reports and guide-
lines, some of which mandate how a risk assessment should be undertaken. In the last decade
or more, the interest has shifted from chemicals with their lifetime, short-term and foetal risks,
to risks from pathogens and the potential for water-borne epidemics. ‘How-to’ manuals include
Roser et al. [2006], Petterson and Ashbolt [2006], Ashbolt et al. [2005]. (The last of these
proposes quantitative microbial risk assessments via a GUI front end programmed in Analytica
[Lumina Decision Systems, 2004].) An Australian example of a risk assessment associated with
wastewater is Roser et al. [2006].
The literature discussed thus far works within the paradigm of Haas et al. [1999], who sum-
marise much prior work, and give many tables of constants and rates for use in risk assessments.
An alternative approach is to model the risks in terms of exposure, latent infection, symp-
tomatic infection, immunity and (again) exposure. This draws on the work of Anderson and
May [1991] and earlier researchers analysing epidemics and vector-borne diseases. This model
is termed the ‘dynamic’ model by Olivieri et al. [2007] and is based on sets of partial differential
equations, whereby individuals progress through a Markov chain model in a sequence of expo-
sure, latent infection, symptomatic infection, immunity and re-exposure stages. This model is
sometimes called a compartmental model and is illustrated in Figure 2.3, together with its
related partial differential equations (2.1). The model can be used to simulate risk [Eisenberg et al.,
1996], or to analyse outbreak data [Eisenberg et al., 1998]. (Further details of this model are
given in an addendum to this literature review, Section 2.4.) Eisenberg et al. [1998] modelled
daily symptom onset, that is, the time scale was extremely fine and unlikely to be replicated in
any available Australian data, where monthly incidence data is probably the best one can hope
for. Thus, in most cases, if we were to analyse data using such a model, we would not need the
many latent infection stages used by Brookhart et al. [2002], Eisenberg et al. [1998]. Conversely, if
we wished to use such a model as the basis of a risk simulation, the number of rates required to
be known militates against its use.
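To make the cycle of stages concrete, the following is a deliberately simplified sketch of a compartmental model, not the equations (2.1) of the thesis. It assumes ordinary differential equations with constant per-capita rates, a single latent stage and simple Euler integration; the rate names and values are hypothetical.

```python
def simulate_compartments(rates, initial, n_steps, dt):
    """Euler integration of a four-compartment cycle:
    susceptible -> latent -> symptomatic -> immune -> susceptible."""
    sus, lat, sym, imm = initial
    history = [(sus, lat, sym, imm)]
    for _ in range(n_steps):
        sus_to_lat = rates["exposure"] * sus * dt  # exposed and latently infected
        lat_to_sym = rates["onset"] * lat * dt     # latent infections become symptomatic
        sym_to_imm = rates["recovery"] * sym * dt  # symptomatic cases become immune
        imm_to_sus = rates["waning"] * imm * dt    # immunity wanes (re-exposure possible)
        sus += imm_to_sus - sus_to_lat
        lat += sus_to_lat - lat_to_sym
        sym += lat_to_sym - sym_to_imm
        imm += sym_to_imm - imm_to_sus
        history.append((sus, lat, sym, imm))
    return history


# Hypothetical per-day rates and starting proportions, for illustration only.
rates = {"exposure": 0.002, "onset": 0.25, "recovery": 0.1, "waning": 0.01}
history = simulate_compartments(rates, initial=(1.0, 0.0, 0.0, 0.0), n_steps=2000, dt=0.5)
sus, lat, sym, imm = history[-1]
print(f"after {2000 * 0.5:.0f} days: S={sus:.4f} L={lat:.4f} I={sym:.4f} R={imm:.4f}")
```

Even this stripped-down version needs four rates to be supplied, which illustrates the data burden noted above; adding further latent stages adds further rates.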
The ‘static’ model, currently the most used in microbial risk assessments, requires a description
of the dose-response curve for the pathogen, together with a basis for calculating the likely
dose of the pathogen in the water being considered, while a ‘dynamic’ model run as a risk
simulation requires estimates of duration of immunity, pathogen shedding rates, latent infection
rates and more [Eisenberg et al., 1998]. Thus, both kinds of risk assessment have considerable
data requirements.
In a related discussion about risks from stormwater, the Water Environment Research Foun-
dation et al. [2007] commented that the data available to parameterize MRA models were not
robust, and this was our feeling with respect to running a simulation study for wastewater risks
based on the various constants available.
Outbreak reports are of considerable use in pinpointing the causes of outbreaks and hence
potentially extremely useful for risk assessment. Papers and reports listing outbreaks through-
out the developed world are Hrudey et al. [2002], Karanis et al. [2007], Nadebaum et al. [2004],
Rizak and Hrudey [2007], Sinclair [2005]. The long list of incidents cited in Nadebaum et al.
[2004] shows that plant failures, human stupidity, extreme weather, and loss of corporate
knowledge have led to catastrophic water-borne disease outbreaks via the drinking water supply in
Europe and the US. Thus, plant failures, rather than the perfect plant operation postulated in
many risk assessments, should be an important component of any water-borne disease risk
assessment. Khan [2010], too, makes the point that ‘risks to public health are determined by the
performance reliability of the system’. Rizak and Hrudey [2007] in looking at current sampling
methodologies for clean water discuss the problem of “intermittent, event-driven contamination
or system failure”. Event-driven contamination is also discussed by Signor and Ashbolt [2006]
who examine significant rainfall events in the Sydney water catchment and find that they give
rise to potentially large numbers of Cryptosporidium oocysts in the water supply. They, too,
remark, with Rizak and Hrudey [2007], that routine monitoring may well be both misinformative
and uninformative. Woo and Vicente [2003] examine the water-borne disease outbreaks at
Walkerton and North Battleford using the framework of Rasmussen [1997] and conclude that
to minimise risk, the players in the complex sociotechnical system of clean water supply must
be identified, the objectives at each level be explicit, systematic feedback between all levels be
required, and that the players at each level be both competent and committed to safety. Westrell
et al. [2003] address the issue of plant reliability in examining risks for a water treatment plant
in Sweden. Clearly, this should be a component of any risk assessment. The pity is that
we typically undertake risk assessments before embarking on new projects, but not when
technology is old, except in response to an identified disaster. However, I ignore this component
in a risk assessment for wastewater: reliability data for wastewater treatment plants in WA are
limited to the data available in the annual audit reports [Water Corporation, 2010], and relate
to overflows, nutrients and odour issues.
2.2.2 Data for a risk assessment
To undertake a risk assessment on the lines laid down by Haas et al. [1999] or Natural Resource
Management Ministerial Council et al. [2006] one needs
1. A description of the microbe numbers in either the waste water, or in the final treated
water.
2. Data or summarised log reductions of decimal elimination capacity (DEC) for each stage
of the water treatment. See, e.g. Hijnen et al. [2007, 2005, 2004].
3. Data or a description of the effect of other pathogen reducing processes such as die-off in
sunlight, e.g., Sidhu et al. [2008].
4. An amount of crop/water ingested by a person, under the risk scenario. One may use
survey data (see, e.g., Mons et al. [2007]) if available, or use choices made by other
researchers, such as those of Tanaka et al. [1998].
5. Data or a description for the dose of the pathogen and the probability of infection or other
outcome.
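The five inputs listed above can be chained into a conventional plug-in Monte Carlo risk calculation. The sketch below is illustrative only: every distribution and constant is hypothetical, and the exponential dose-response form is one commonly used choice assumed here, not a result from the thesis.

```python
import math
import random

random.seed(42)

N_DRAWS = 20_000
R_DOSE_RESPONSE = 0.004  # hypothetical exponential dose-response parameter


def one_draw():
    """Simulate one exposure event and return the probability of infection."""
    # 1. Organisms per litre in untreated wastewater (hypothetical lognormal).
    raw_conc = random.lognormvariate(math.log(1000.0), 1.0)
    # 2. Log10 reductions for two treatment stages (hypothetical normals, floored at 0).
    log_reduction = sum(
        max(0.0, random.gauss(mu, sd)) for mu, sd in [(2.0, 0.5), (1.5, 0.5)]
    )
    # 3. Additional log10 reduction from environmental die-off (hypothetical).
    log_reduction += max(0.0, random.gauss(0.5, 0.2))
    # 4. Litres ingested per exposure (hypothetical uniform range).
    volume = random.uniform(0.0001, 0.001)
    dose = raw_conc * 10.0 ** (-log_reduction) * volume
    # 5. Exponential dose-response: P(infection) = 1 - exp(-r * dose),
    #    computed with expm1 to stay accurate for very small doses.
    return -math.expm1(-R_DOSE_RESPONSE * dose)


risks = sorted(one_draw() for _ in range(N_DRAWS))
median = risks[N_DRAWS // 2]
lower, upper = risks[int(0.025 * N_DRAWS)], risks[int(0.975 * N_DRAWS)]
print(f"median risk per exposure {median:.2e}, 95% interval ({lower:.2e}, {upper:.2e})")
```

In this plug-in style the dose-response parameter is a fixed constant, so its own estimation uncertainty never reaches the final interval. That omission is the gap which estimating all parameters within a single graphical model is meant to close.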
We focussed on the recycling plant at Subiaco wastewater treatment facility in WA, to con-
cretize the risk assessment problem. Subiaco is a class C plant where recycled water is destined
for non-potable uses [Isaac, 2008a, email], and therefore, post-treatment water samples are tested
for E. coli only [Isaac, 2008b, email]. Results are owned and kept by the Water Corporation, and
sent to the WA Department of Health, and are not publicly available. Subiaco WWTP, like many
other WWTPs, produces an annual audit report which details discharge loads and
concentrations for total nitrogen, total phosphorus and some other parameters [Water Corporation, 2010].
No pathogens are reported. A good reason for the non-collection of both viral and microbial
pathogens is that they are difficult to detect in treated wastewater due to the presence of inhibitory
chemicals and suspended solids. Viruses, being present in small numbers, require very large
samples. Recovery methods are “too cumbersome or complicated to be used on a routine basis”
[Toze, 1999, 2004].
Another planned use for reclaimed wastewater in WA is for aquifer recharge, with water to
be reclaimed to potable standard and then to be injected into groundwaters for storage. The fate
of various microorganisms, when treated wastewater was discharged to groundwaters in Western
Australia, is reported in Gordon and Toze [2003], Toze [2002, 2004], Toze et al. [2002, 2004],
while Toze et al. [2005] look at selected pathogens, at the sprinkler and pre- and post-irrigation
on the grass of McGillvray Oval, and find that Salmonella, while present in the sprinkler water,
is not found in the grass (seven samples). Dillon et al. [2008] examine combined engineered
and aquifer treatment systems in water recycling, and give pathogen removal rates for various
treatments for aquifer recharge.
In a recycled water context, Petterson et al. [2001], Petterson and Ashbolt [2001], Petterson
[2002] looked at viruses and other pathogens and inactivation rates when contaminated water
was used to irrigate salad crops.
Log reduction data may be found for many pathogens in Hijnen et al. [2007, 2005, 2004].
However, Smeets et al. [2008], Smeets et al. [2008] find that the log reduction model does not
work well for their Campylobacter source water and treated water data, and Smeets et al. [2007]
find that reduction rates for Cryptosporidia are largely dependent on the source water, i.e., the
treated water concentrations are not a function of the source water concentrations. In using such
sources for an Australian risk assessment, we note that the very great differences in climate
(and possibly catchment waters), together with the differences between data for pilot plants
and data under actual conditions, make these reduction rates subject to even greater uncertainty.
It remains a largely unanswered question as to whether experimental European and American
results for treating pathogens in water translate to the Australian environment. Sydney Water
(David Roser, 2008, pers.comm.) has collected pre- and post-treatment data for Cryptosporidia
and Giardia, but these data are not publicly available. Signor [2007] collected baseline and
runoff data in a Sydney catchment for Cryptosporidia, Campylobacter, Giardia and E.coli to
consider risks to the water supply from heavy rainfall events. Toze et al. [2004] consider water
quality improvements for (among other things) pathogens with aquifer recharge. Again, such
data are not publicly available for reanalysis.
The majority of the proposed and actual recycling plants in WA are add-ons to existing
wastewater treatment plants, whose treated water now flows to the sea or a river. Heavy rainfall
events in a catchment, which can overload a wastewater treatment plant due to storm water
runoff, have little effect on the recycled part of the plant, which, for the McGillvray project,
takes less than 4% of the total throughput of wastewater for reprocessing [Water Corporation,
2011a,b]. However, as we start to recycle greater and greater proportions of our wastewater,
high rainfall events may need to be factored into recycling plant failures. Wastewater treatment
plants are required by law, bylaw or regulation to monitor their effluents for many contaminants,
but currently the routine monitoring regime only involves such things as suspended solids, total
nitrogen and total phosphorus [Water Corporation, 2010].
Many experiments to determine the likelihood of infection or of overt disease from a par-
ticular dose of a pathogen are summarised by Teunis et al. [1996]. Salmonella dose-response
relationships are typically based on the experiments of McCullough and Eisele [1951a,b,c,d]
from the early 1950s. Rotavirus dose-response equations are usually based on Ward et al. [1986],
while those for Giardia are based on the experiments of Rentdorff [1954]. The measurement
errors in the Salmonella experiments are probably extremely large (Toze, 2009, pers. comm.),
and the experiments themselves were conducted on adults (prisoners and ‘volunteers’). These
volunteers form a particular cohort with immunities (potentially) quite different from those of most
population subgroups today. Haas et al. [1999] show that the dose-response infection rates
for Campylobacter jejuni taken from Teunis et al. [1996] are of the order of the infection rates
calculated for the Milwaukee outbreak. However, depending on the pathogen, patterns of immunity
within the population can vary considerably between groups [Tawk et al., 2006] and over
time [Jacobsen and Koopman, 2004]. Blaser and Newman [1982], far earlier, point out the differing
susceptibilities of different age groups, of the immuno-compromised versus the healthy,
and of those who have had gastric surgery and other subgroups.
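To make concrete what a dose-response equation looks like, the following sketch codes two functional forms that are widely used in the microbial risk assessment literature: the exponential model and the approximate beta-Poisson model. The parameter values `r`, `alpha` and `beta` below are hypothetical, chosen only for illustration, and are not fitted to any of the experiments cited above.

```python
import math

def exponential_dose_response(dose, r):
    """Exponential model: P(infection) = 1 - exp(-r * dose)."""
    return 1.0 - math.exp(-r * dose)

def beta_poisson_dose_response(dose, alpha, beta):
    """Approximate beta-Poisson model: P(infection) = 1 - (1 + dose/beta)^(-alpha)."""
    return 1.0 - (1.0 + dose / beta) ** (-alpha)

# Hypothetical parameter values, for illustration only.
doses = [0.0, 1.0, 10.0, 100.0, 1000.0]
p_exponential = [exponential_dose_response(d, r=0.005) for d in doses]
p_beta_poisson = [beta_poisson_dose_response(d, alpha=0.3, beta=50.0) for d in doses]
```

Both curves start at zero for a zero dose and increase towards one. Any uncertainty in the fitted parameters, or differences in immunity between the experimental cohort and the target population, carries straight through to the estimated probability of infection.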
Salmonella data for a risk assessment
A number of experiments involving Salmonella die-off using some of the treatments for sewage
have been conducted [Karim et al., 2008, Palacios et al., 2001]. Wastewater effluents have been
sampled for Salmonella [Kinde et al., 1997].
Salmonellosis is a notifiable disease in Australia and summaries of cases and rates of cases
by month and by state are available at http://www9.health.gov.au/cda/Source/Rpt 5.cfm. However,
these case numbers and rates always understate the true incidence. Hall et al. [2006] used surveys
to build a model to predict what fraction of those with a diarrhoeal disease might
pass through the multiple hurdles to be recorded in the national database. Hall [2004] reports
on Salmonellosis in a study of foodborne gastroenteritis in Australia. Hall and Kirk [2005] es-
timate how much enteritis is due to food, and in particular, what proportion of illness for each
pathogen may be attributed to food.
Given the non-availability of primary data for a risk assessment and the need to make many
assumptions, I chose to approach the risk assessment task from a largely hypothetical viewpoint.
Hence, the paper of Chapter 3.2 constructs, quantifies and looks at the implications of a BN for
the risk of diarrhoea associated with the use of recycled water, and in doing so, constructs
credible intervals for the point estimates of probabilities for the scenarios of interest. The paper
of Chapter 4.2 takes the flowchart of a risk assessment, converts it to a DAG containing all the
disparate data needed for the usual plug-in estimates, and shows that estimating these within the
risk assessment process contributes greater uncertainty to the risk estimates. It also shows how
to build an errors-in-variables model for the dose-response equations taking the view that this
type of model is a more appropriate model.
2.3 Spatio-temporal modelling
Motivating case-study for spatio-temporal modelling. Motivating this literature review is a
dataset supplied by the NSW Department of Agriculture, which consists of moisture measure-
ments taken at 15 depths from a field experiment laid out in 6 rows of 18 plots. The measure-
ments were made over a period of 5 years from June 26, 1995 to May 23, 2000, measurements
being made for 61 different dates, spaced roughly one month apart. The dimensions row, col-
umn, date and depth are essentially orthogonal. (Not all plots were measured on each sampling
day.) The experiment involved randomised complete blocks of 9 treatments, with measurements
taken every 20 cm from 20 cm to 300 cm. The main concern was to assess the differences
between long fallow treatments (3 treatments) and opportunity cropping (2 treatments).
Thus, the data are lattice data measured over four dimensions. The purpose of the analysis
is to estimate the effects of the nine treatments on the profiles of soil moisture over depth and over
time, and to estimate the difference between long fallow cropping and opportunity cropping.
These data are supplied as point-referenced data. Specialist methods for the analysis of
spatial data are required since observations cannot be thought of as independent. Observations
which are close to each other in space (and/or in time) are typically correlated with each other
and this autocorrelation must be accounted for in the analysis.
Methods for the analysis of two dimensional spatial data: Point-referenced vs areal data.
There are two broad ways of thinking of spatial data, firstly as point-referenced data, where each
response is located at a point in space, and secondly, as areal data, where the response may be a
summary or aggregate of data for an area.
Generally, when data are point-referenced, analysts use point-referenced analysis methods
for dealing with auto-correlation, while where the data are summarised over a spatial area,
analysts use areal data methods, typically conditional autoregressive (CAR) models.
These data are supplied as point-referenced data and therefore each day and depth of these
data could be analysed using the spatial methods available for point-referenced data, such as
those suggested in, for example, Banerjee et al. [2004] or Cressie [1991], where various as-
sumptions are made about the data, such as smoothness and differentiability, which lead to vari-
ogram fitting, kriging and perhaps assessment of isotropy. Examples of applied point-referenced
spatial data analyses are Clements et al. [2008], Raso et al. [2006].
Cressie [1991], Schabenberger and Gotway [2005] distinguish data types by spatial domain.
Geostatistical data is such that the response variable Z(s) at point s is observable everywhere
within the spatial domain D. Thus, between any two sample locations, there are theoretically
an infinite number of possible samples. They contrast this with lattice and regional data, where
the domain D is fixed and discrete, and thus both non-random and countable, and comment
that such data usually represent areal regions, where the response is some aggregation over the
region. The distinction becomes important when change of support becomes an issue [Banerjee
et al., 2004, Gotway and Young, 2002, Schabenberger and Gotway, 2005].
The geostatistical model [Banerjee et al., 2004, Cressie, 1991] is based on a point process
being weakly stationary. A spatial process $Y(s)$ is weakly stationary if $\mu(s) \equiv \mu$, where
$\mu(s) = E(Y(s))$, and $\mathrm{Cov}(Y(s), Y(s + h)) = c(h)$ for all $h \in \Re^r$ where $s$ and $s + h$ lie
within $D \subset \Re^r$. A further type of stationarity is intrinsic stationarity, where
$E(Y(s + h) - Y(s)) = 0$, and $E(Y(s + h) - Y(s))^2 = \mathrm{Var}(Y(s + h) - Y(s)) = 2\gamma(h)$. That is, the
variance of the difference is a function of the separation $h$ between the two points and nothing else.
Weak stationarity implies intrinsic stationarity. In order to specify a stationary process a valid
covariance function must be provided. That is, $c(h) \equiv \mathrm{Cov}(Y(s), Y(s + h))$ is such that for any
finite set of sites $s_1, s_2, \ldots, s_n$ and for any $a_1, a_2, \ldots, a_n$,
\[
\mathrm{Var}\Big[\sum_{i} a_i Y(s_i)\Big] = \sum_{i,j} a_i a_j \mathrm{Cov}(Y(s_i), Y(s_j)) = \sum_{i,j} a_i a_j c(s_i - s_j) \ge 0,
\]
with strict inequality if not all the $a_i$ are 0. That is, $c(h)$ must be a positive definite function.
Generally, someone seeking to fit such a model chooses one of a variety of possible covariance
functions which satisfy this condition in $\Re^r$ (usually in $\Re^2$). (This paragraph paraphrases text
from Banerjee et al. [2004].) Geostatistical models allow prediction at points where data have
not been observed.
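The positive definiteness condition can be checked numerically for a particular covariance function. The sketch below is illustrative only: it assumes an isotropic exponential covariance, $c(h) = \sigma^2 \exp(-\|h\|/\phi)$, which is one common valid choice, and uses randomly placed hypothetical sites; the function names and parameter values are not taken from the thesis.

```python
import numpy as np

def exponential_covariance(sites, sigma2=1.0, phi=1.0):
    """Covariance matrix with entries c(s_i - s_j) = sigma2 * exp(-||s_i - s_j|| / phi)."""
    diffs = sites[:, None, :] - sites[None, :, :]
    dist = np.sqrt((diffs ** 2).sum(axis=-1))
    return sigma2 * np.exp(-dist / phi)

def semivariogram(h, sigma2=2.0, phi=3.0):
    """Under weak stationarity, gamma(h) = c(0) - c(h)."""
    return sigma2 - sigma2 * np.exp(-np.abs(h) / phi)

rng = np.random.default_rng(0)
sites = rng.uniform(0.0, 10.0, size=(30, 2))     # 30 hypothetical sites in the plane
C = exponential_covariance(sites, sigma2=2.0, phi=3.0)

# Positive definiteness: all eigenvalues of the symmetric matrix C are positive,
min_eigenvalue = np.linalg.eigvalsh(C).min()

# so that Var[sum_i a_i Y(s_i)] = a' C a > 0 for any non-zero vector a.
a = rng.normal(size=30)
variance_of_linear_combination = a @ C @ a
```

The `semivariogram` helper shows the sense in which weak stationarity implies intrinsic stationarity: the variance of a difference depends only on the separation, through $\gamma(h) = c(0) - c(h)$.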
Point-referenced data have been contrasted with ‘areal’ data, for which other special methods
of analysis have been devised. However, while data may be aggregated by area, with the
area being defined by a polygon rather than a point, it is not the case that the data analysis
method is necessarily dictated by the spatial referencing system. A Voronoi (or Dirichlet)
tessellation (or polygon) (see, e.g., Green and Sibson [1978]) may be formed for any set of points,
and equally any polygon may be considered to have a point mass at some sort of centroid. Thus,
methods devised for one type of data or another may be used regardless of the original form of
the data. What matters is whether the method makes sense for the data at hand.
We note that for the agricultural data of our case-study, the ecological fallacy [Robinson,
1950, 2009], where an analysis of aggregate data is used for inference on the individuals being
aggregated, is not an issue: these data are essentially point data, unlike agricultural yield data
which are aggregates of the plot.
Areal data are often analysed using the notion of ‘neighbour’, using adjacency matrices and
corresponding weight matrices. Such analyses are based largely on the work of Besag [1974]
and Besag et al. [1991], where a local conditional specification determines a joint and global
distribution and allows spatial smoothing [Banerjee et al., 2004]. Conditional autoregressive
(CAR) models were advocated for use in agricultural contexts by Besag et al. [1995], Besag and
Higdon [1999].
The differences between CAR and geostatistical models may be more apparent than real.
Following Rue and Tjelmeland [2002], Hrafnkelsson and Cressie [2003] ‘calibrate’ a geostatistical
model using a Matérn covariance structure to a CAR model, and conclude that the CAR
model is faster to run. This is not surprising, since the CAR model employs a sparse precision
matrix, and requires no inversion of the covariance matrix. More recently, Besag and Mondal
[2005], Lindgren et al. [2011] show the equivalence of various geostatistical and CAR models.
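The sparsity argument can be seen directly by building a CAR precision matrix for a small lattice. The sketch below is an illustration under stated assumptions: it uses first-order (rook) neighbours on a 6 by 18 lattice, the layout of the case-study field, and a proper CAR precision of the form $Q = D - \rho W$ with a hypothetical value of $\rho$. It is not the model fitted in the thesis.

```python
import numpy as np

def lattice_adjacency(n_rows, n_cols):
    """First-order (rook) adjacency matrix W for an n_rows x n_cols lattice."""
    n = n_rows * n_cols
    W = np.zeros((n, n))
    for r in range(n_rows):
        for c in range(n_cols):
            i = r * n_cols + c
            if r + 1 < n_rows:                      # neighbour in the next row
                j = (r + 1) * n_cols + c
                W[i, j] = W[j, i] = 1.0
            if c + 1 < n_cols:                      # neighbour in the next column
                j = r * n_cols + (c + 1)
                W[i, j] = W[j, i] = 1.0
    return W

W = lattice_adjacency(6, 18)
D = np.diag(W.sum(axis=1))          # number of neighbours of each plot
rho = 0.9                           # hypothetical spatial dependence parameter
Q = D - rho * W                     # proper CAR precision matrix (unit precision scale)

precision_nonzero_fraction = np.count_nonzero(Q) / Q.size
Sigma = np.linalg.inv(Q)            # implied covariance matrix, needed only for comparison
covariance_nonzero_fraction = np.count_nonzero(Sigma) / Sigma.size
```

Only the diagonal and the neighbour pairs are non-zero in `Q`, whereas the implied covariance matrix `Sigma` is dense. A geostatistical model specifies the covariance and has to invert or factorise it, while the CAR model works with the sparse precision directly.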
CAR neighbourhood models have been widely used for areal data spatial analyses, particu-
larly in health or economics, where for privacy/ethical reasons, data are usually aggregated over
some administrative spatial unit. For examples, see Clements et al. [2008], Reich et al. [2007],
Song et al. [2011], all of whose papers use the convolution prior of Besag et al. [1991].
Two dimensional agricultural data have a whole set of methodologies specially devised for
them. These include the use of kriging models [Banerjee et al., 2004, Cressie, 1991]. Com-
mercial packages such as SAS [SAS Institute, 2004] and Vesper [Whelan et al., 2001] allow
geostatistical models to be fitted. Within WinBUGS [Lunn et al., 2000], both CAR and geosta-
tistical models may be fitted.
2.3.1 Two dimensional Lattice data analyses
Spatial ‘lattice’ data have a long history of methodologies, since agricultural trials are typically
laid out in lattices. Spatial effects in agriculture have been recognised as an issue for a long time
and this has led to considerable effort in experimental design to avoid spatially biased estimates.
However, a recognition that experimental design alone cannot overcome the problem, since
it may still be the case that errors are spatially correlated, gave rise to work such as that of
Papadakis [1937], and later to that of Bartlett [1978] who established that Papadakis’ approach
gave more efficient estimates. Baird and Mead [1991] review a number of methods for local
adjustments, and remark that the first difference plus errors method from Besag and Kempton
[1986] is equivalent to the linear variance approach of Williams [1986] and show this to be the
most efficient approach, or close thereto.
Cullis and Gleeson [1991] advocate the use of ARIMA methods for modelling both row
and column residuals. In a later version of this approach, Gilmour et al. [1997] fit a complete
blocks model and AR(1), AR(1) models as a starting point for their REML modelling, and
look at kriging graphs on the residuals to determine how the data may be better modelled by
the introduction of further ‘global’ extraneous random effects. This approach is continued in
the work of Durban et al. [2003] who in addition to the AR(1) models for local effects, fit
semiparametric smoothers separately by row and column. Singh et al. [2003] use a similar
approach but consider linear and cubic splines in various combinations with AR(1) models for
both rows and columns. Martin et al. [2006] consider efficient experimental designs for the
complex spatial analyses of Cullis and Gleeson [1991], Gilmour et al. [1997], and remark that
“Current practice in NSW Agriculture, Australia is to use a spatial model for the dependence
fitted using ASREML [Gilmour et al., 1995].”
Another set of possible ways of modelling is found in the mixed modelling approach of
Piepho et al. [2003, 2004], Piepho and Ogutu [2007], Piepho et al. [2008], Piepho and Williams
[2010]. Typically, a model based on the experimental design is fitted, together with global and
local spatial smooths as needed, see, e.g., Stefanova et al. [2009]. Such a model may deal
with anisotropy using the AR(1) × AR(1) choice of Gilmour et al. [1997], or by fitting an anisotropic
kriging smooth as in Stefanova et al. [2009].
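The way a separable AR(1) by AR(1) structure handles anisotropy can be sketched as a Kronecker product of a row correlation matrix and a column correlation matrix, each with its own autocorrelation parameter. The values of `rho_row` and `rho_col` below are hypothetical, and the 6 by 18 layout is borrowed from the case-study field purely for illustration.

```python
import numpy as np

def ar1_correlation(n, rho):
    """AR(1) correlation matrix: corr(i, j) = rho ** |i - j|."""
    idx = np.arange(n)
    return rho ** np.abs(idx[:, None] - idx[None, :])

N_ROWS, N_COLS = 6, 18
rho_row, rho_col = 0.3, 0.7                     # hypothetical, direction-specific

R_row = ar1_correlation(N_ROWS, rho_row)
R_col = ar1_correlation(N_COLS, rho_col)

# Separable correlation over the whole field; plot index = row * N_COLS + column.
R = np.kron(R_row, R_col)

def plot_correlation(r1, c1, r2, c2):
    """Correlation between the residuals of plots (r1, c1) and (r2, c2)."""
    return R[r1 * N_COLS + c1, r2 * N_COLS + c2]
```

Under this structure the correlation between two plots is `rho_row ** |row lag| * rho_col ** |column lag|`, so that dependence can decay at different rates along rows and along columns.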
Thus, the critical first choice for the three dimensional modelling was to choose a framework
for two-dimensional modelling. Whatever model was chosen needed to be common to all
depths and days.
A disadvantage of the CAR modelling choice within the WinBUGS framework is that
weights must be chosen a priori and not estimated as in Besag and Higdon [1999] and Besag
and Mondal [2005]. The lattice framework, so typically found in agricultural data, generally
needs an anisotropic treatment such as that found in the Besag models already cited and in, for
example, the models of Gilmour et al. [1997], Stefanova et al. [2009].
The primary difficulty was the determination of the spatial model. Given that the data are
point referenced, an obvious choice for a spatial autocorrelation model was a kriging model,
such as that of Gotway and Cressie [1990]. However, the large number of terms (45-135) in
the fixed part of the model made such an approach impossible within the MCMC framework
of WinBUGS. Additionally, including depth in the calculation of distance would have meant
greater difficulties in disentangling treatment effects over depth from spatial modelling consider-
ations. Software such as SAS PROC MIXED [SAS Institute, 2004] offers the possibility of both
kriging and the various correlation model structures of Gilmour et al. [1997] within a REML
or ML framework, but when a model is poorly specified or very complex, PROC MIXED can
be difficult to use, and neither SAS [SAS Institute, 2004] nor ASREML [Butler et al., 2007,
Gilmour et al., 2005], nor Genstat [VSN International, 2011], nor Vesper [Whelan et al., 2001]
is freely available. We had hoped to show that the CAR models used were comparable to the
AR(1), AR(1) basis models of Gilmour et al. [1997], Stefanova et al. [2009]. However, the
AR(1), AR(1) models could not be fitted within WinBUGS with the desired complexity,
but did show comparability with the best CAR models of Table 5.1 (∆DIC = 257). A general
consensus for agricultural lattice data is that they should be dealt with anisotropically.
Alternative methods for lattice data are considered by Besag and Kempton [1986], Besag
[1974], Besag et al. [1995], Besag and Higdon [1999] using variants of conditional autoregres-
sive (CAR) models. These methods for analysing spatially correlated data have been available
via the freely available software, WinBUGS, for some time and many papers have been written
using conditional auto-regressive (CAR) models to smooth spatial data, particularly in the field
of spatial epidemiology, see, for example, Bernardinelli et al. [1995], Clements et al. [2008],
Earnest et al. [2010], Elliott [2000].
2.3.2 Agricultural studies with measurements at different depths
There are a large number of soil studies which consider some soil characteristic at various depths
in the soil. Some do this in circumstances almost identical to this study. Thus, e.g., Wong et al.
[2008] look at soil depth profiles for soil organic carbon. The study analyses depth profiles
only. Macdonald et al. [2009] look at 3 soil profiles, taken from 0.1 to 1.8 m and use observed
means and standard deviations for each depth and soil for statistical testing. Sleutel et al. [2009]
compare three different forest soils after compositing. There is no spatial modelling in any
dimension for these data. Strahm et al. [2009] look at dissolved organic carbon and dissolved
organic nitrogen at two depths (20 cm and 100 cm) and compare two harvesting treatments
in forests, using monthly measurements taken over a period of 3 years, and compare various
characteristics via regressions on geometric means. Comparisons are made via t-tests using
observed means and standard deviations. Shillito et al. [2009] use REML and SAS PROC
MIXED [SAS Institute, 2004] to deal with two dimensional spatial correlation using kriging in
a study of potato yields as a response to a nitrogen fertilizer experiment.
Wong et al. [2008] fit depth profiles at 7 sites, accounting for correlation over depth by
allowing correlation to diminish as a power of distance and allowing for heterogeneous variances
between depths. For the depth profile curves they used the cubic spline approach of Verbyla
et al. [1999]. Nayyar et al. [2009] account for spatial variation using the mixed model approach
of Wang and Goonewardene [2004] and fit 4 depths as fixed categorical effects, ignoring their
orderedness and continuity, which is reasonable with just 4 depths. Differing variances for
different depths are not used.
Ayars et al. [2009] look at groundwater measurements over time at two depths in a field
trial. The combinations of different depths and treatments are treated as differing treatments.
The spatial dimensions of the data in the field are not used and the analysis becomes a time
series analysis of treatments over time, with autocorrelation modelled as an AR(1) process.
Thus, despite the papers above involving the third spatial dimension, they provide little in
the way of a paradigm for a three-dimensional analysis.
2.3.3 Spatio-temporal data analyses
When we consider two-dimensional spatial analyses and time, we find that analyses fall into the
following categories:
1. Simple description. See, e.g., Bell et al. [2007], Teschke et al. [2001] who use maps at
various timepoints as descriptive devices.
2. Spatial analyses at various timepoints.
3. Temporal analyses at various spatial sites. See, e.g., Lemos et al. [2007], who, having
failed to find much influence in terms of spatial proximity, model time sequences for each
site using time series methods.
4. Spatio-temporal analyses with separable time and site effects.
5. Spatio-temporal analyses with both separable and non-separable time and site effects.
In group (4) we find Knorr-Held and Besag [1998], who in an early space-time model for epi-
demiological data, model the residuals from their fixed model as a spatial residual component
(which remains constant over time) and a time residual (which remains constant over space) plus
an unmodelled residual. Adebayo and Fahrmeir [2005], while including time varying covariates,
again include a common residual spatial component and a common residual time component.
Another model with complete separability of time and space effects is used by Crook et al.
[2003], who use the nonparametric penalised spline smooths available in BayesX [Belitz et al.,
2009a,b] to fit a smooth over time, in addition to smooths over covariates such as age, while
fitting the spatial CAR model of Besag et al. [1991]. The more complex models are generally
Bayesian and use either geostatistical methods or the CAR priors of Besag et al. [1991]. Models
with CAR priors typically partition the error term in the model, $\epsilon_{it}$, as $e_i$, $e_t$ and $e_{it}$, where the first
two error terms capture the structured spatial random effects and the structured temporal random
effects, and $e_{it}$ is a simple unstructured random effect with $e_{it} \sim N(0, \sigma^2)$. See, e.g., Adebayo
and Fahrmeir [2005], Crook et al. [2003], Knorr-Held and Besag [1998], Poncet et al. [2010],
Waller et al. [1997], where the last three papers use BayesX software [Belitz et al., 2009a,b] to
conduct the analysis. In these models, again, no structured time-space interaction effects are
fitted, only a final common error term. The structure of the group (4) models recognises that for
data aggregated over administrative units, as is so often the case for epidemiological data, the
commonalities of that area, in terms of demographic structure and even of various kinds
of pollution exposure, are likely to be relatively constant for the time periods considered. And
again it is not unreasonable to assume that the time scale errors, which might be thought of as
the result of administrative changes, are common to the entire map. Thus, many spatio-temporal
analyses of epidemiological data use this framework.
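What the separable decomposition implies can be seen in a small simulation. The sketch below is a simplification under stated assumptions: the site and time effects are drawn as independent normal variates standing in for the structured (for example, CAR) priors, and all sizes and variances are hypothetical. Its only purpose is to show that, apart from unstructured noise, the spatial pattern is the same at every time point.

```python
import numpy as np

rng = np.random.default_rng(1)
n_sites, n_times = 5, 4

e_site = rng.normal(size=n_sites)       # stand-in for the structured spatial effect e_i
e_time = rng.normal(size=n_times)       # stand-in for the structured temporal effect e_t
sigma = 0.1                             # hypothetical s.d. of the unstructured effect
e_noise = rng.normal(scale=sigma, size=(n_sites, n_times))   # e_it ~ N(0, sigma^2)

# Separable structured part: no space-time interaction.
structured = e_site[:, None] + e_time[None, :]
eps = structured + e_noise

# The contrast between any two sites is identical at every time point.
site_contrast_over_time = structured[0, :] - structured[1, :]
```

Because `site_contrast_over_time` is constant, such a model can only represent a map of spatial residuals that is carried unchanged through time, shifted up or down by a common time effect.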
In the group (5) models, in an analysis of epidemiological space time data, Abellan et al.
[2008] postulate a model with two CAR components, one for a constant spatial residual struc-
ture and another CAR model for the time component, but they fit the final space-time random
component as a Gaussian contamination mixture of large and small residuals, thereby allow-
ing the identification of sites and times which deviate from the common spatial residual and
common time residual of the models of group (4).
Assuncao et al. [2001] integrate time and space in a different way. They fit quadratics in time
which differ for each spatial location, but for which the coefficients are smoothed using CAR
priors. This elegant solution for very short time-sequence data allows the possibility of seeing
increasing and decreasing infection rates, while accounting for spatial closeness, and Assuncao
[2003], Assuncao et al. [2002] again use space-varying regression coefficients. In a further
variation, Yan and Clayton [2006] use the space-time interaction to define a set of space-time
separable clusters carrying a specific risk, and fit a final unstructured random effect.
A further very different approach to spatial modelling is that of Higdon [1998] who uses a
convolution approach with non-stationary dependence structures. This allows local anisotropies.
The approach was needed because the geostatistical models of Cressie [1991] are largely un-
workable with large datasets. Papers using this approach for spatial smoothing are Lemos and
Sanso [2009], Sahu and Challenor [2008], who give snapshots over time (group 2). A point
made by Higdon [1998] is that ocean datasets over time differ markedly from most other spatio-
temporal data in that measurements at a different time are not made at the same spatial location.
Looking at spatio-temporal analyses within an agricultural context, the analysis of Trought
and Bramley [2011] effectively fits all spatio-temporal interactions to look at the quality of grape
juice by site across time. Their strategy is to fit different curves across time for each site, and
then to look at spatial outcomes of their model by mapping (a group 3 type approach).
In considering longitudinal agricultural experiments, Piepho et al. [2004], Piepho and Ogutu
[2007], Piepho et al. [2008], Wang and Goonewardene [2004] and Brien and Demetrio [2009]
use mixed models within a REML framework to analyse their spatio-temporal data, and explic-
itly address the fitting of state-space models via standard software and REML. The fixed part of
their models is generally simple and the data are measured on two spatial dimensions.
When we move to a third spatial dimension (depth) the soil profile study of Macdonald et al.
[2009] does not use spatial information in the analysis. Other studies composite the soils from
different depths across soil types or treatment [Sleutel et al., 2009], while others [Nayyar et al.,
2009] use the mixed modelling framework advocated by Piepho et al. [2004]. Within a spatial
context only, Haskard et al. [2007] fit an anisotropic geostatistical model.
A major difference between the agricultural data of our study and epidemiological data
which are so often modelled using the convolution CAR prior of Besag et al. [1991] for the
spatially structured error, a structured temporal random term and an unstructured error with a
variance common over both space and time, is that the spatial units of epidemiological data tend
to vary slowly over the time scale of a few years. In contrast, the moisture data modelled here
vary markedly from sampling day to sampling day, and it is clear that the simple separable variance
decomposition used by so many epidemiological models does not describe the data well.
For the agricultural data we consider, it is not reasonable to assume constant spatial residual components
over time as in Adebayo and Fahrmeir [2005], Knorr-Held and Besag [1998], neither
for the full time period of 5 years, nor for monthly time intervals. Nor does the elegant stability
model of Abellan et al. [2008] have anything to offer here. We chose to assume spatial residuals
differed for each time period, since it was difficult to postulate a reasonable relationship for the
evolution of the spatial residuals.
In moving to four dimensions, there are yet more possibilities for the decomposition of
the fixed and error parts of the model. However, in the context of differing treatments for
differing plots in the horizontal dimensions, with the same treatment along the depth profile
at each plot, and in the context of different scales between the depth measurements and the
distances between plots, it was a simple decision to exclude depth from the neighbourhood
error structures. If depth neighbours were to be included as neighbours with equal weights,
the horizontal layer information would be downweighted. If we were to weight using functions of distance,
the horizontal correlations would become effectively irrelevant. This choice to deal with the third
dimension differently is made by others analysing three dimensional data. Ridgway et al. [2002],
modelling ocean measurements, separate out the depth component in their loess data fits.
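The weighting argument can be illustrated with simple arithmetic. The sketch below considers an interior measurement point with four neighbours in its own horizontal layer and one neighbour above and one below. The depth spacing of 0.2 m follows the 20 cm sampling interval described earlier; the plot spacing of 10 m is a hypothetical value assumed only for this illustration, as is the choice of inverse distance as the weight function.

```python
def horizontal_weight_share(horizontal_spacing, depth_spacing, weight):
    """Share of the total neighbour weight carried by the four in-layer
    neighbours of an interior point that also has one neighbour above
    and one neighbour below."""
    w_horizontal = 4 * weight(horizontal_spacing)
    w_vertical = 2 * weight(depth_spacing)
    return w_horizontal / (w_horizontal + w_vertical)

PLOT_SPACING_M = 10.0     # hypothetical distance between plot centres
DEPTH_SPACING_M = 0.2     # 20 cm between successive depths

# Equal weights for all six neighbours.
share_equal = horizontal_weight_share(
    PLOT_SPACING_M, DEPTH_SPACING_M, weight=lambda d: 1.0)

# Weights proportional to inverse distance.
share_inverse_distance = horizontal_weight_share(
    PLOT_SPACING_M, DEPTH_SPACING_M, weight=lambda d: 1.0 / d)
```

With equal weights the horizontal neighbours carry two thirds of the weight rather than all of it; with inverse-distance weights and these spacings they carry under 5%, so the depth neighbours dominate the smoothing.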
2.3.4 Four dimensional spatio-temporal data analyses
Large four-dimensional datasets are found in the oceanographic literature and some different
approaches are given in Holbrook and Bindoff [2000], Lemos and Sanso [2009], Ridgway et al.
[2002]. The papers of both Holbrook and Bindoff [2000] and Ridgway et al. [2002] consider interpolation
methods to create realistic grid points of data for further analysis. Ridgway et al.
[2002], who use Loess quadratic fits, treat depth quite differently from the coordinates of latitude
and longitude (X, Y). The quadratic surface they fit to the latitude and longitude includes an
XY term in addition to $X^2$ and $Y^2$ terms, but there are no crossed terms for depth. Additionally,
their weighting term which determines neighbours to be included treats depth differently from
the latitude and longitude coordinates.
What is clear from this literature review is that the decision to deal with depth separately
is a decision made by many before us and made for the same reasons, namely that the mea-
surements in the third dimension are made on a very different scale from those of the other
two dimensions. In considering the agricultural data of our study, the decision to fit a complete
spatio-temporal interaction model (the daily model fitted for each day) is appropriate when com-
mon spatial and temporal residuals are unlikely.
2.4 Addendum: The dynamic risk assessment model
The example (below, Figure 2.3) of a dynamic risk assessment model is taken directly from
Eisenberg et al. [2002].
“This conceptual modeling methodology is dynamic and population based; that is,
the risk of infection manifests at the population level. Specifically, in the transmis-
sion of infectious diseases (but not of diseases due to chemical exposure), the risk
of disease due to pathogen exposure depends on the disease status of the population
and potentially on the contact patterns within the population. Figure 2.3 is a dia-
gram of a transmission model for enteric pathogens. Each box represents one state
of the system. Five of the six states represent the epidemiologic states of the population:
$S$, susceptible; $E$, latent (infected but noninfectious and asymptomatic); $I_S$,
diseased (infectious and symptomatic); $I_A$, carrier (infectious but asymptomatic); $P$,
immune (either partial or complete). The sixth state, $W$, represents concentration of
pathogens in the environment. Members of a given state may move to another state
based on the causal relationships of the disease process. For example, members of
the population who are in the susceptible state may move to the diseased state after
exposure to a pathogenic agent.
To describe the epidemiology of enteric pathogen transmission, the conceptual
model includes both state variables and rate parameters. State variables (S, E,
IS, IA, and P) track the number of individuals in each of the states at any given
point in time and are defined such that S + E + IS + IA + P = N (i.e., the sum of
the state variables equals the total population). The rate parameters determine the
movement of the population from one state to another. In general, the rate parameters
are β, the rate of transmission from a noninfected state, S, to an infected state,
E, due to both environmental (e.g., drinking water) and person-person exposure to
a pathogen; α, the rate of movement from exposure to illness; δ and σ, the rates of
recovery from an infectious state, IS or IA, respectively, to the postinfection state,
P; γ, the rate of movement from the postinfection state (partial immunity), P, to
the susceptible state, S; ϕ, the rate of shedding of pathogens into the environment
by infectious individuals; and ξ, the per capita mortality rate of the pathogen in the
environment. An additional parameter in the model, ρ, represents the proportion
of asymptomatic infections. For more mathematical detail pertaining to the model,
see (the) publications, Brookhart et al. [2002], Eisenberg et al. [1996].”
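\begin{equation}
\begin{aligned}
\frac{dS(t)}{dt} &= \gamma P(t) - \bigl(\beta + \eta[I_A(t) + I_S(t)]\bigr) S(t) \\
\frac{dE_1(t)}{dt} &= \bigl(\beta + \eta[I_A(t) + I_S(t)]\bigr) S(t) - \alpha E_1(t) \\
\frac{dE_2(t)}{dt} &= \alpha E_1(t) - \alpha E_2(t) \\
&\;\;\vdots \\
\frac{dE_k(t)}{dt} &= \alpha E_{k-1}(t) - \alpha E_k(t) \\
\frac{dI_A(t)}{dt} &= \rho\alpha E_k(t) - \delta I_A(t) \\
\frac{dI_S(t)}{dt} &= (1 - \rho)\alpha E_k(t) - \delta I_S(t) \\
\frac{dP(t)}{dt} &= \delta[I_A(t) + I_S(t)]
\end{aligned}
\tag{2.1}
\end{equation}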
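To make the structure of system (2.1) concrete, the following is a minimal numerical sketch, not the implementation of Eisenberg et al. [2002]. It integrates the compartments S, E1, ..., Ek, IA, IS and P with a hand-written fourth-order Runge-Kutta step. All parameter values, the initial state, the step size and the names Rates, derivatives, rk4_step and simulate are hypothetical and chosen only for illustration. One assumption goes beyond (2.1) as printed: the sketch subtracts γP(t) in the P equation, so that the outflow from P that feeds S is balanced and the total S + E + IS + IA + P = N stays constant, as the quoted text requires. The pathogen state W(t) does not appear in (2.1) and is not modelled here.

```python
from dataclasses import dataclass


@dataclass
class Rates:
    """Rate parameters of system (2.1); the values used below are hypothetical."""
    beta: float   # environmental transmission rate (S -> E1)
    eta: float    # person-to-person transmission rate per infectious individual
    alpha: float  # progression rate out of each latent stage
    delta: float  # common recovery rate for IA and IS
    gamma: float  # rate of loss of protection (P -> S)
    rho: float    # proportion of infections that are asymptomatic


def derivatives(state, r, k):
    """Right-hand side of (2.1); state is [S, E1, ..., Ek, IA, IS, P]."""
    S = state[0]
    E = state[1:1 + k]
    IA, IS, P = state[1 + k], state[2 + k], state[3 + k]
    force = r.beta + r.eta * (IA + IS)  # common force of infection
    dS = r.gamma * P - force * S
    dE = [force * S - r.alpha * E[0]]
    dE += [r.alpha * E[j - 1] - r.alpha * E[j] for j in range(1, k)]
    dIA = r.rho * r.alpha * E[-1] - r.delta * IA
    dIS = (1 - r.rho) * r.alpha * E[-1] - r.delta * IS
    # Assumption: the "- gamma * P" loss term is NOT in (2.1) as printed;
    # it is added here so that the total population is conserved.
    dP = r.delta * (IA + IS) - r.gamma * P
    return [dS] + dE + [dIA, dIS, dP]


def rk4_step(state, r, k, h):
    """Advance the state by one classical Runge-Kutta step of size h."""
    def shifted(base, slope, scale):
        return [b + scale * s for b, s in zip(base, slope)]

    k1 = derivatives(state, r, k)
    k2 = derivatives(shifted(state, k1, h / 2), r, k)
    k3 = derivatives(shifted(state, k2, h / 2), r, k)
    k4 = derivatives(shifted(state, k3, h), r, k)
    return [x + h / 6 * (a + 2 * b + 2 * c + d)
            for x, a, b, c, d in zip(state, k1, k2, k3, k4)]


def simulate(initial, r, k, h, n_steps):
    """Return the list of states visited, including the initial state."""
    state = list(initial)
    path = [state]
    for _ in range(n_steps):
        state = rk4_step(state, r, k, h)
        path.append(state)
    return path


# Hypothetical scenario: N = 1000, ten infectious individuals, k = 3 latent stages.
rates = Rates(beta=0.001, eta=0.0005, alpha=0.5, delta=0.2, gamma=0.01, rho=0.4)
k = 3
initial = [990.0] + [0.0] * k + [5.0, 5.0, 0.0]
path = simulate(initial, rates, k, h=0.1, n_steps=1000)
final = path[-1]
```

Setting k = 1 reduces the latent chain to a single compartment. A library integrator such as SciPy's solve_ivp could replace the hand-written step; the pure-Python version is used here only to keep the sketch self-contained.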
The diagram of Eisenberg et al. [2002] does not match their text (quoted above). Figure 2.3
implies that δ = σ, that is, that the rate of recovery from the infectious state is the same whether
or not a person is symptomatic, and this common rate, δ, appears in Equations 2.1. Again
differing from the text, the infectious population, both symptomatic and asymptomatic, is
assumed to infect the uninfected at a common rate, η, both in the diagram and in
the equations. A further difference is the sequence of ‘boxes’ in the latently infected group, which
corresponds to modelling a ‘distributed delay’ over the latent infection period; see Eisenberg
et al. [1998].
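The effect of the sequence of latent boxes can be made explicit with a short worked calculation. It assumes the usual stochastic reading of a linear compartment chain, which the source does not spell out: an individual leaves each stage E_j at rate α independently of the other stages, and the symbol T for the total latent period is introduced here for illustration. Under that assumption, T is a sum of k independent exponential waiting times and so follows an Erlang distribution:

```latex
\begin{align*}
T &= T_1 + \dots + T_k, \qquad T_j \sim \text{Exponential}(\alpha) \text{ independently}, \\
f_T(t) &= \frac{\alpha^k t^{k-1} e^{-\alpha t}}{(k-1)!}, \qquad t \ge 0, \\
\mathrm{E}[T] &= \frac{k}{\alpha}, \qquad
\mathrm{Var}[T] = \frac{k}{\alpha^2}, \qquad
\frac{\sqrt{\mathrm{Var}[T]}}{\mathrm{E}[T]} = \frac{1}{\sqrt{k}}.
\end{align*}
```

With k = 1 the latent period is exponential. As k grows the coefficient of variation shrinks, so the delay between infection and infectiousness becomes more concentrated around its mean k/α.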
Figure 2.3 The dynamic model of Eisenberg et al. [2002]. Schematic diagram of transmission model. t, independent variable representing time. Solid lines represent movement of individuals from one state to another. Dashed lines represent movement of pathogens either directly from infectious host to susceptible host or indirectly via the environment. State variables and parameters are defined in the text.
[Figure not reproduced. Boxes: S(t), susceptible; Ek, latently infected; IA(t), asymptomatic; IS(t), symptomatic; P(t), protected; W(t), pathogen concentration in water.]
Bibliography
Abellan, J. J., S. Richardson, and N. Best (2008). Use of space-time models to investigate
the stability of patterns of disease (Mini-Monograph). Environmental Health
Perspectives 116(8), 1111–1119.
Adebayo, S. B. and L. Fahrmeir (2005). Analysing child mortality in Nigeria with geoadditive
discrete-time survival models. Statistics in Medicine 24(5), 709–728.
Albert, I., E. Grenier, J.-B. Denis, and J. Rousseau (2008). Quantitative Risk Assessment from
Farm to Fork and Beyond: A Global Bayesian Approach Concerning Food-Borne Diseases.
Risk Analysis 28(2), 557–571.
Andersen, S., K. Olesen, F. Jensen, and F. Jensen (1989). Hugin - a shell for building Bayesian
belief universes for expert systems. In Eleventh International Joint Conference on Artificial
Intelligence, Detroit, Michigan, pp. 1080–1085.
Anderson, R. and R. May (1991). Infectious diseases of humans: dynamics and control. New
York: Oxford University Press.
Ashbolt, N. J., S. R. Petterson, T.-A. Stenström, C. Schönning, T. Westrell, and J. Ottoson
(2005). Microbial Risk Assessment (MRA) tool. Technical Report 2005:7, Chalmers
University of Technology.
Assuncao, R. M. (2003). Space varying coefficient models for small area data. Environ-
metrics 14(5), 453–473.
Assuncao, R. M., J. E. Potter, and S. M. Cavenaghi (2002). A Bayesian space varying parameter
model applied to estimating fertility schedules. Statistics in Medicine 21(14), 2057–2075.
Assuncao, R. M., I. A. Reis, and C. D. Oliveira (2001). Diffusion and prediction of Leishmaniasis
in a large metropolitan area in Brazil with a Bayesian space-time model. Statistics in
Medicine 20(15), 2319–2335.
Ayars, J. E., P. Shouse, and S. M. Lesch (2009). In situ use of groundwater by alfalfa. Agricul-
tural Water Management 96(11), 1579–1586.
Baird, D. and R. Mead (1991). The empirical efficiency and validity of two neighbour models.
Biometrics 47(4), 1473–1487.
Banerjee, S., B. P. Carlin, and A. E. Gelfand (2004). Hierarchical modeling and analysis for
spatial data. Monographs on statistics and applied probability. Boca Raton, London, New
York, Washington D.C.: Chapman & Hall.
Barker, G. C., N. L. C. Talbot, and M. W. Peck (2002). Risk assessment for Clostridium botulinum:
a network approach. International Biodeterioration & Biodegradation 50(3-4), 167–175.
Bartlett, M. (1978). Nearest neighbour models in the analysis of field experiments. Journal of
the Royal Statistical Society. Series B (Methodological) 40(2), 147–174.
Belitz, C., A. Brezger, T. Kneib, and S. Lang (2009a). BayesX Software for Bayesian Inference
in Structured Additive Regression Models Version 2.0.1 Reference Manual. Online at
http://www.stat.uni-muenchen.de/~bayesx/bayesx.html. Accessed: October 25, 2010.
Belitz, C., A. Brezger, T. Kneib, and S. Lang (2009b). BayesX Software for Bayesian Inference
in Structured Additive Regression Models Version 2.0.1 Software Methodology Manual.
Online at http://www.stat.uni-muenchen.de/~bayesx/bayesx.html. Accessed: October 25, 2010.
Bell, M., F. Dominici, K. Ebisu, S. Zeger, and J. Samet (2007). Spatial and temporal variation in
PM2.5 chemical composition in the United States for health effects studies. Environmental
Health Perspectives 115(7), 989–995.
Bernardinelli, L., D. Clayton, C. Pascutto, C. Montomoli, M. Ghislandi, and M. Songini (1995).
Bayesian analysis of space-time variation in disease risk. Statistics in Medicine 14(21-22),
2433–2443.
Besag, J. and R. Kempton (1986). Statistical analysis of field experiments using neighbouring
plots. Biometrics 42(2), 231–251.
Besag, J. E. (1974). Spatial interaction and the statistical analysis of lattice systems (with
discussion). J. R. Statist. Soc. B 36(2), 192–236.
Besag, J. E., P. Green, D. Higdon, and K. Mengersen (1995). Bayesian computation and stochas-
tic systems. Statistical Science 10(1), 3–41.
Besag, J. E. and D. Higdon (1999). Bayesian analysis of agricultural field experiments. Journal
of the Royal Statistical Society Series B-Statistical Methodology 61, 691–717. Part 4.
Besag, J. E. and D. Mondal (2005). First-order intrinsic autoregressions and the de Wijs process.
Biometrika 92(4), 909–920.
Besag, J. E., J. York, and A. Mollié (1991). Bayesian image restoration with applications in
spatial statistics (with discussion). Annals of the Institute of Statistical Mathematics 43, 1–59.
Blaser, M. J. and L. S. Newman (1982). A review of human salmonellosis: I. Infective dose.
Reviews of infectious diseases 4(6), 1096–1106.
Boerlage, B. (1992). Link Strength in Bayesian Networks. Ph. D. thesis, University of British
Columbia, Canada.
Brien, C. J. and C. G. B. Demetrio (2009). Formulating mixed models for experiments, in-
cluding longitudinal experiments. Journal of Agricultural, Biological, and Environmental
Statistics 14(3), 253–280.
Brookhart, M. A., A. E. Hubbard, M. J. van der Laan, J. M. Colford Jr, and J. N. S. Eisenberg
(2002). Statistical estimation of parameters in a disease transmission model: analysis of a
Cryptosporidium outbreak. Statistics in Medicine 21, 3627–3638.
Burgman, M. (2005). Risks and Decisions for Conservation and Environmental Management.
New York: Cambridge University Press.
Butler, D. G., B. R. Cullis, A. R. Gilmour, and B. J. Gogel (2007). Analysis of Mixed Models for
S Language Environments, ASReml-R Reference Manual Release 2, Volume No. QE02001 of
Training and Development Series. Brisbane, Australia: Queensland Department of Primary
Industries and Fisheries.
Casman, E. A., B. Fischhoff, C. Palmgren, M. J. Small, and F. Wu (2000). An integrated risk
model of a drinking water borne cryptosporidiosis outbreak. Risk Analysis 20(4), 495–511.
Castillo, E., J. M. Gutierrez, and A. S. Hadi (1997). Sensitivity analysis in discrete Bayesian
networks. IEEE Transactions on Systems, Man & Cybernetics: Part A 27, 412–423.
Castillo, E., J. M. Gutierrez, A. S. Hadi, and C. Solares (1997). Symbolic propagation and
sensitivity analysis in Gaussian Bayesian networks with application to damage assessment.
Artificial Intelligence in Engineering 11, 173–181.
Clements, A., S. Brooker, U. Nyandindi, A. Fenwick, and L. Blair (2008). Bayesian spatial
analysis of a national urinary Schistosomiasis questionnaire to assist geographic targeting of
Schistosomiasis control in Tanzania, East Africa. International Journal for Parasitology 38,
401–415.
Clements, A. C., A. Garba, M. Sacko, S. Touré, R. Dembelé, A. Landouré, E. Bosque-Oliva, A. F.
Gabrielli, and A. Fenwick (2008). Mapping the probability of Schistosomiasis and associated
uncertainty, West Africa. Emerging Infectious Diseases 14(10), 1629–1632.
Cowell, R. G. and A. P. Dawid (1992). Fast retraction of evidence in a probabilistic expert
system. Statistics and Computing 2(1), 37–40.
Cowell, R. G., A. P. Dawid, S. L. Lauritzen, and D. J. Spiegelhalter (2001). Probabilistic
Networks and Expert Systems. Springer.
Cressie, N. A. C. (1991). Statistics for spatial data. Wiley series in probability and mathematical
statistics. Applied probability and statistics. New York: John Wiley.
Crook, A. M., L. Knorr-Held, and H. Hemingway (2003). Measuring spatial effects in time
to event data: a case study using months from angiography to coronary artery bypass graft
(CABG). Statistics in Medicine 22(18), 2943–2961.
Cullis, B. R. and A. C. Gleeson (1991). Spatial analysis of field experiments-an extension to
two dimensions. Biometrics 47, 1449–1460.
Darroch, J. N., S. L. Lauritzen, and T. P. Speed (1980). Markov fields and log-linear interaction
models for contingency tables. The Annals of Statistics 8(3), 522–539.
Dawid, A. P. (1992). Applications of a general propagation algorithm for probabilistic expert
systems. Statistics and Computing 2(1), 25–36.
Dawid, A. P., U. Kjaerulff, and S. L. Lauritzen (1995). Hybrid propagation in junction trees.
In Advances in Intelligent Computing - Ipmu ’94, Volume 945 of Lecture Notes in Computer
Science, pp. 87–97. Springer Verlag KG.
Dillon, P., D. Page, J. Vanderzalm, P. Pavelic, S. Toze, E. Bekele, J. Sidhu, H. Prommer, S. Hig-
ginson, R. Regel, S. Rinck-Pfeiffer, M. Purdie, C. Pitman, and T. Wintgens (2008). A critical
evaluation of combined engineered and aquifer treatment systems in water recycling. Water
Science & Technology - WST 57(5), 753–762.
Durban, M., C. A. Hackett, J. W. McNicol, A. C. Newton, W. T. B. Thomas, and I. D. Currie
(2003). The practical use of semiparametric models in field trials. Journal of Agricultural
Biological and Environmental Statistics 8(1), 48–66.
Earnest, A., J. R. Beard, G. Morgan, D. Lincoln, R. Summerhayes, D. Donoghue, T. Dunn,
D. Muscatello, and K. Mengersen (2010). Small area estimation of sparse disease counts
using shared component models-application to birth defect registry data in New South Wales,
Australia. Health & Place 16, 684–693.
Edwards, D. (1995). Introduction to Graphical Modelling. New York: Springer-Verlag.
Eisenberg, J., E. Seto, A. Olivieri, and R. Spear (1996). Quantifying water pathogen risk in an
epidemiological framework. Risk Analysis 16, 549–563.
Eisenberg, J. N. S., M. A. Brookhart, G. Rice, M. Brown, and J. M. Colford Jr (2002). Disease
transmission models for public health decision making: Analysis of epidemic and endemic
conditions caused by waterborne pathogens. Environmental Health Perspectives 110(8), 783–
790.
Eisenberg, J. N. S., E. Y. W. Seto, J. M. Colford Jr, A. Olivieri, and R. C. Spear (1998). An anal-
ysis of the Milwaukee cryptosporidiosis outbreak based on a dynamic model of the infection
process. Epidemiology 9(3), 255–263.
Elliott, P. (2000). Spatial epidemiology : methods and applications. Oxford medical publica-
tions. Oxford: Oxford University Press.
Fewtrell, L. and J. Bartram (2001). Water Quality: Guidelines, Standards and Health. London:
World Health Organisation.
Gilmour, A. R., B. R. Cullis, and A. P. Verbyla (1997). Accounting for natural and extrane-
ous variation in the analysis of field experiments. Journal of Agricultural Biological and
Environmental Statistics 2, 269–293.
Gilmour, A. R., B. J. Gogel, B. R. Cullis, and R. Thompson (2005). ASReml User Guide
Release 2.0. Technical report, VSN International Ltd, Hemel Hempstead, UK.
Gilmour, A. R., R. Thompson, and B. R. Cullis (1995). Average information REML: an efficient
algorithm for variance parameter estimation in linear mixed models. Biometrics 51(4), 1440–
1450.
Gordon, C. and S. Toze (2003). Influence of groundwater characteristics on the survival of
enteric viruses. Journal of Applied Microbiology 95(3), 536–544.
Gotway, C. A. and N. A. C. Cressie (1990). A spatial analysis of variance applied to soil-water
infiltration. Water resources research 26(11), 2695–2703.
Gotway, C. A. and L. J. Young (2002). Combining incompatible spatial data. Journal of the
American Statistical Association 97(458), 632–648.
Green, P. J. and R. Sibson (1978). Computing Dirichlet tessellations in the plane. Computer
Journal 21, 168–173.
Haas, C. and J. N. Eisenberg (2001). Risk assessment. In L. Fewtrell and J. Bartram (Eds.),
Water Quality: Guidelines, Standards and Health. WHO.
Haas, C. N., J. B. Rose, and C. P. Gerba (1999). Quantitative Microbial Risk Assessment. New
York: Wiley.
Hall, G. (2004). Results from the National Gastroenteritis Survey 2001–2002. Technical Report
NCEPH Working Paper Number 50, National Centre for Epidemiology & Population Health.
Hall, G. and M. Kirk (2005). Foodborne illness in Australia annual incidence circa 2000. Tech-
nical report, Australian Government Department of Health and Ageing.
Hall, G., J. Raupach, and K. Yohannes (2006). An estimate of under-reporting of foodborne
notifiable diseases: Salmonella, Campylobacter, Shiga toxin producing E. coli (STEC). Technical
report, National Centre for Epidemiology & Population Health.
Hamilton, G. S., F. Fielding, A. W. Chiffings, B. T. Hart, R. W. Johnstone, and K. L. Mengersen
(2007). Investigating the use of a Bayesian network to model the risk of Lyngbya majuscula
bloom initiation in Deception Bay, Queensland. Human and Ecological Risk Assessment 13(6),
1271–1287.
Haskard, K. A., B. R. Cullis, and A. P. Verbyla (2007). Anisotropic Matérn correlation and spatial
prediction using REML. Journal of Agricultural, Biological, and Environmental
Statistics 12(2), 147–160.
Higdon, D. (1998). A process-convolution approach to modelling temperatures in the North
Atlantic Ocean. Environmental and Ecological Statistics 5, 173–190.
Hijnen, W. A., Y. J. Dullemont, J. F. Schijven, A. J. Hanzens-Brouwer, M. Rosielle, and
G. Medema (2007). Removal and fate of Cryptosporidium parvum, Clostridium perfrin-
gens and small-sized centric diatoms (Stephanodiscus hantzschii) in slow sand filters. Water
Research 41, 2151–2162.
Hijnen, W. A. M., E. Beerendonk, and G. J. Medema (2005). Elimination of micro-organisms
by drinking water processes a review. Technical report, Kiwa N.V., Nieuwegein, The Nether-
lands.
Hijnen, W. A. M., E. Beerendonk, P. Smeets, and G. J. Medema (2004). Elimination of micro-
organisms by water treatment processes. Technical report, Kiwa N.V., Nieuwegein, The
Netherlands.
Hijnen, W. A. M., J. F. Schijven, P. Bonne, A. Visser, and G. J. Medema (2004). Elimination
of viruses, bacteria and protozoan oocysts by slow sand filtration. Water Science & Technol-
ogy 50(1), 147–154.
Holbrook, N. and N. Bindoff (2000). A statistically efficient mapping technique for four-
dimensional ocean temperature data. Journal of Atmospheric and Oceanic Technology 17(6),
831–846.
Hrafnkelsson, B. and N. Cressie (2003). Hierarchical modeling of count data with application
to nuclear fall-out. Environmental and Ecological Statistics 10, 179–200.
Hrudey, S. E., P. M. Huck, P. Payment, R. W. Gillham, and E. J. Hrudey (2002). Walkerton:
Lessons learned in comparison with waterborne outbreaks in the developed world. Journal
of Environmental Engineering and Science 1(6), 397–407.
Hugin Expert A/S (2007). Hugin 6.9. Available on: www.hugin.com. Accessed: November 6,
2008.
Hugin Expert A/S (2007). Hugin Expert - Publications. Available on:
www.hugin.com/developer/Publications/. Accessed: November 6, 2008.
Hunter, P. R. and L. Fewtrell (2001). Acceptable risk. In L. Fewtrell and J. Bartram (Eds.),
Water Quality: Guidelines, Standards and Health. WHO.
Isaac, D. (2008a). Email: June 27, 2008: Re: Fw: Recycled water: measurements required
under licence by the Health Department.
Isaac, D. (2008b). Fit for purpose guidelines for recycled water. email, received June 26, 2008.
Jacobsen, K. and J. Koopman (2004). Declining hepatitis A seroprevalence: a global review and
analysis. Epidemiology and Infection 132, 1005–1022.
Jensen, F. (1994). Implementation aspects of various propagation algorithms in Hugin. Techni-
cal Report Research Report R-94-2014, Department of Mathematics and Computer Science,
Aalborg University, Denmark, Aalborg, Denmark.
Jensen, F. (2001). Bayesian Networks and Decision Graphs. Springer.
Jensen, F. V., S. H. Aldenryd, and K. B. Jensen (1995). Sensitivity analysis in Bayesian net-
works. Lecture Notes in Artificial Intelligence 946, 243.
Jensen, F. V., B. Chamberlain, T. Nordahl, and F. Jensen (1991). Analysis in Hugin of data con-
flict. In Proceedings of the Sixth Annual Conference on Uncertainty in Artificial Intelligence,
UAI ’90, New York, NY, USA, pp. 519–528. Elsevier Science Inc.
Karanis, P., C. Kourenti, and H. Smith (2007). Waterborne transmission of protozoan parasites:
A worldwide review of outbreaks and lessons learnt. Journal of Water and Health 5(1), 1–38.
Karim, M. R., E. P. Glenn, and C. P. Gerba (2008). The effect of wetland vegetation on the
survival of Escherichia coli, Salmonella typhimurium, bacteriophage MS-2 and polio virus.
Journal of Water and Health 06(2), 167–175.
Kennett, R. J., K. B. Korb, and A. E. Nicholson (2001). Seabreeze prediction using Bayesian
networks: a case study. Lecture Notes in Computer Science 2035, 148–153.
Khan, S. J. (2010). Quantitative chemical exposure assessment for water recycling schemes.
Waterlines Report Series, No 27. Australian Government National Water Commission.
Kinde, H., M. Adelson, A. Ardans, E. H. Little, D. Willoughby, D. Berchtold, D. H. Read,
R. Breitmeyer, D. Kerr, R. Tarbell, and E. Hughes (1997). Prevalence of Salmonella in
municipal sewage treatment plant effluents in Southern California. Avian Diseases 41(2),
392–398.
Knorr-Held, L. and J. Besag (1998). Modeling risk from a disease in time and space. Statistics
in Medicine 17, 2045–2060.
Korb, K. B. and A. E. Nicholson (2004). Bayesian Artificial Intelligence. London: CRC Press.
Laskey, K. B. (1995). Sensitivity analysis for probability assessments in Bayesian networks.
IEEE Transactions on Systems, Man and Cybernetics 25, 901–909.
Lauritzen, S. (1995). The EM algorithm for graphical association models with missing data.
Computational Statistics & Data Analysis 19, 191–201.
Lauritzen, S. L. and D. J. Spiegelhalter (1988). Local computations with probabilities on graph-
ical structures and their application to expert systems. Journal of the Royal Statistical Society.
Series B (Methodological) 50(2), 157–224.
Lemos, R. T. and B. Sanso (2009). A spatio-temporal model for mean, anomaly, and trend
fields of North Atlantic sea surface temperature. Journal of the American Statistical Associ-
ation 104(485), 5–18.
Lemos, R. T., B. Sanso, and M. L. Huertos (2007). Spatially varying temperature trends in
a central California estuary. Journal of Agricultural, Biological, and Environmental Statis-
tics 12(3), 379–396.
Lindgren, F., H. Rue, and J. Lindstrom (2011). An explicit link between Gaussian fields and
Gaussian Markov random fields: the stochastic partial differential equation approach. Journal
of the Royal Statistical Society: Series B (Statistical Methodology) 73(4), 423–498.
Lumina Decision Systems (2004). Analytica. Available on:
www.lumina.com/ana/editiondescriptions.htm. Accessed: April 24, 2008.
Lunn, D. J., A. Thomas, N. Best, and D. Spiegelhalter (2000). WinBUGS - A Bayesian mod-
elling framework: Concepts, structure, and extensibility. Statistics and Computing 10(4),
325–337.
Macdonald, B. C. T., J. K. Reynolds, A. S. Kinsela, R. J. Reilly, P. van Oploo, T. D. Waite, and
I. White (2009). Critical coagulation in sulfidic sediments from an east-coast Australian acid
sulfate landscape. Applied Clay Science 46(2), 166–175.
Marcot, B. G. (2006). Characterizing species at risk I: Modeling rare species under the North-
west Forest Plan. Ecology and Society 11(2), 10.
Marcot, B. G., P. A. Hohenlohe, S. Morey, R. Holmes, R. Molina, M. C. Turley, M. H. Huff,
and J. A. Laurence (2006). Characterizing species at risk II: Using Bayesian belief networks
as decision support tools to determine species conservation categories under the Northwest
Forest Plan. Ecology and Society 11(2), 12.
Martin, J. E., T. Rivas, J. M. Matías, J. Taboada, and A. Argüelles (2009). A Bayesian network
analysis of workplace accidents caused by falls from a height. Safety Science 47(2), 206–214.
Martin, R. J., N. Chauhan, J. A. Eccleston, and B. S. P. Chan (2006). Efficient experimental
designs when most treatments are unreplicated. Linear Algebra and its Applications 417(1),
163–182.
Matias, J. M., T. Rivas, C. Ordonez, J. Taboada, and J. M. Matias (2007). Assessing the envi-
ronmental impact of slate quarrying using Bayesian networks and GIS. In AIP Conference,
Volume 963, pp. 1285–1288.
McCullough, N. B. and C. W. Eisele (1951a). Experimental human salmonellosis: I. pathogenic-
ity of strains of Salmonella meleagridis and Salmonella anatum obtained from spray-dried
whole egg. The Journal of Infectious Diseases 88(3), 278–289.
McCullough, N. B. and C. W. Eisele (1951b). Experimental human salmonellosis: II. Immunity
studies following experimental illness with Salmonella meleagridis and Salmonella anatum.
The Journal of Immunology 66(5), 595–608.
McCullough, N. B. and C. W. Eisele (1951c). Experimental human salmonellosis: III.
Pathogenicity of strains of Salmonella newport, Salmonella derby, and Salmonella bareilly
obtained from spray-dried whole egg. The Journal of Infectious Diseases 89(3), 209–213.
McCullough, N. B. and C. W. Eisele (1951d). Experimental human salmonellosis: IV.
Pathogenicity of strains of Salmonella pullorum obtained from spray-dried whole egg. The
Journal of Infectious Diseases 89(3), 259–265.
Mons, M., J. Van der Wielen, E. Blokker, M. Sinclair, K. Hulshof, F. Dangendorf, P. Hunter,
and G. Medema (2007). Estimation of the consumption of cold tap water for microbiological
risk assessment: an overview of studies and statistical analysis of data. Journal of Water and
Health 5(1), 151–170.
Nadebaum, P., M. Chapman, R. Morden, and S. Rizak (2004). A guide to hazard identification
& risk assessment for drinking water supplies. Technical report, CRC for Water Quality and
Treatment.
Natural Resource Management Ministerial Council, Environment Protection and Heritage
Council, and Australian Health Ministers Conference (2006). Australian Guidelines for
Water Recycling: Managing health and environmental risks (Phase 1) 2006. Available on:
www.ephc.gov.au/taxonomy/term/39. Accessed: March 29, 2008.
Nayyar, A., C. Hamel, G. Lafond, B. D. Gossen, K. Hanson, and J. Germida (2009). Soil micro-
bial quality associated with yield reduction in continuous-pea. Applied Soil Ecology 43(1),
115–121.
Neapolitan, R. E. and X. Jiang (2007). Probabilistic Methods for Financial and Marketing
Informatics. Elsevier.
Nicholson, A., S. Watson, and C. Twardy (2003). Using Bayesian networks
for water quality prediction in Sydney Harbour. Available online:
www.csse.monash.edu.au/bai/talks/NSWDEC.ppt. Accessed: March 27, 2008.
Olivieri, A. W., R. Danielson, J. N. Eisenberg, L. Johnson, V. Pon, R. Sakaji, R. Soller, J. A.
Soller, J. Stephenson, and C. Trese (2007). Evaluation of microbial risk assessment tech-
niques and applications in water reclamation. Technical report, Water Environment Research
Foundation (WERF), Alexandria, VA. Available online: www.werf.org/AM/.
Palacios, M. P., P. Lupiola, M. T. Tejedor, E. Del-Nero, A. Pardo, and L. Pita (2001). Climatic
effects on Salmonella survival in plant and soil irrigated with artificially inoculated wastewater:
preliminary results. Water Science and Technology 43(12), 103–108.
Papadakis, J. S. (1937). Méthode statistique pour des expériences sur champ. Bulletin scientifique
d’amélioration des plantes de Thessalonique 23, 30.
Pearl, J. (1988). Probabilistic reasoning in intelligent systems : networks of plausible inference.
San Mateo, California: Morgan Kaufmann Publishers.
Petterson, S. and N. Ashbolt (2001). Viral risks associated with wastewater reuse: modeling
virus persistence on wastewater irrigated salad crops. Water Science and Technology 43(12),
23–26.
Petterson, S., N. Ashbolt, and A. Sharma (2001). Microbial risks from wastewater irrigation of
salad crops: A screening-level risk assessment. Water Environment Research 73(6), 667–672.
Petterson, S. A. and N. J. Ashbolt (2006). WHO Guidelines for the safe use of wastewater
and excreta in agriculture microbial risk assessment section. Technical report, World Health
Organization.
Petterson, S. R. (2002). Microbial Risk Assessment of Wastewater Irrigated Salad Crops. Ph.
D. thesis, University of New South Wales.
Piepho, H. P., A. Buchse, and K. Emrich (2003). A hitchhiker’s guide to mixed models for
randomized experiments. Journal of Agronomy and Crop Science 189(5), 310–322.
Piepho, H. P., A. Buchse, and C. Richter (2004). A mixed modelling approach for randomized
experiments with repeated measures. Journal of Agronomy and Crop Science 190(4), 230–
247.
Piepho, H. P. and J. O. Ogutu (2007). Simple state-space models in a mixed model framework.
American Statistician 61(3), 224–232.
Piepho, H. P., C. Richter, and E. Williams (2008). Nearest neighbour adjustment and linear
variance models in plant breeding trials. Biometrical Journal 50(2), 164–189.
Piepho, H. P. and E. R. Williams (2010). Linear variance models for plant breeding trials. Plant
Breeding 129(1), 1–8.
Pike, W. A. (2004). Modeling drinking water quality violations with Bayesian networks. Journal
of the American Water Resources Association 40(6), 1563–1578.
Pollino, C. A. and B. T. Hart (2005a). Bayesian approaches can help make better sense of
ecotoxicological information in risk assessments. Australian Journal of Ecotoxicology 11,
57–58.
Pollino, C. A. and B. T. Hart (2005b). Bayesian decision networks - going beyond expert elici-
tation for parameterisation and evaluation of ecological endpoints. In A. Voinov, A. Jakeman,
and A. Rizzoli (Eds.), Third Biennial Meeting: Summit on Environmental Modelling and
Software, Burlington, USA.
Pollino, C. A., O. Woodberry, A. E. Nicholson, K. B. Korb, and B. T. Hart (2007). Param-
eterisation and evaluation of a Bayesian network for use in an ecological risk assessment.
Environmental Modelling and Software 22, 1140–1152.
Poncet, C., V. Lemesle, L. Mailleret, A. Bout, R. Boll, and J. Vaglio (2010). Spatio-temporal
analysis of plant pests in a greenhouse using a Bayesian approach. Agricultural and Forest
Entomology 12(3), 325–332.
Rasmussen, J. (1997). Risk management in a dynamic society: a modelling problem. Safety
Science 27(2-3), 183–213.
Raso, G., P. Vounatsou, L. Gosoniu, M. Tanner, E. K. N’Goran, and J. Utzinger (2006). Risk
factors and spatial patterns of hookworm infection among schoolchildren in a rural area of
western Côte d’Ivoire. International Journal for Parasitology 36(2), 201–210.
Rassmussen, L. (1995). Bayesian network for blood typing and parentage verification of cattle.
Technical report, Department of Mathematics and Computer Science, Aalborg University,
Denmark. Hugin reference Hugin 6.9.
Reich, B., J. Hodges, and B. Carlin (2007). Spatial analyses of periodontal data using condition-
ally autoregressive priors having two classes of neighbor relations. Journal of the American
Statistical Association 102(477), 44–55.
Rendtorff, R. (1954). The experimental transmission of human intestinal protozoan parasites:
II. Giardia lamblia cysts given in capsules. American Journal of Hygiene 59, 209–220.
Ridgway, K., J. Dunn, and J. Wilkin (2002). Ocean interpolation by four-dimensional weighted
least squares-application to the waters around Australasia. Journal of Atmospheric and
Oceanic Technology 19(9), 1357–1375.
Rizak, S. and S. Hrudey (2007). Strategic water quality monitoring for drinking water safety.
Technical Report 37, CRC for Water Quality and Treatment.
Robinson, W. (1950). Ecological correlations and the behavior of individuals. American Socio-
logical Review 15(3), 351–357.
Robinson, W. (2009). Ecological correlations and the behavior of individuals. International
Journal of Epidemiology 38(2), 337–341.
Roser, D., S. Khan, C. Davies, R. Signor, S. Petterson, and N. Ashbolt (2006). Screening
health risk assessment for the use of microfiltration-reverse osmosis treated tertiary effluent
for replacement of environmental flows. Technical Report CWWT Report 2006-20, Centre
for Water and Waste Technology, School of Civil and Environmental Engineering, University
of NSW.
Roser, D., S. Petterson, R. Signor, and N. Ashbolt (2006). How to implement QMRA? to
estimate baseline and hazardous event risks with management end uses in mind. Technical
report, MicroRisk project co-funded by the European Commission under the Fifth Framework
Programme, Theme 4: Energy, environment and sustainable development (contract EVK1-
CT-2002-00123).
Rue, H. and L. Held (2005). Gaussian Markov random fields : theory and applications. Boca
Raton: Chapman & Hall/CRC.
Rue, H. and H. Tjelmeland (2002). Fitting Gaussian Markov random fields to Gaussian fields.
Scandinavian Journal of Statistics 29(1), 31–49.
Sahu, S. K. and P. Challenor (2008). A space-time model for joint modeling of ocean tempera-
ture and salinity levels as measured by Argo floats. Environmetrics 19(5), 509–528.
SAS Institute (2004). SAS Version 9.1.3. Cary, NC., USA: SAS Institute Inc.
Schabenberger, O. and C. A. Gotway (2005). Statistical methods for spatial data analysis. Texts
in statistical science. Boca Raton: Chapman & Hall/CRC.
Shillito, R. M., D. J. Timlin, D. Fleisher, V. R. Reddy, and B. Quebedeaux (2009). Yield
response of potato to spatially patterned nitrogen application. Agriculture Ecosystems &
Environment 129(1-3), 107–116.
Sidhu, J. P. S., J. Hanna, and S. G. Toze (2008). Survival of enteric microorganisms on grass
surfaces irrigated with treated effluent. Journal of Water and Health 06(2), 255–262.
Signor, R. and N. Ashbolt (2006). Pathogen monitoring offers questionable protection against
drinking-water risks: a QMRA (Quantitative Microbial Risk Analysis) approach to assess
management strategies. Erratum in Water Science and Technology 54 (11-12):451. Water Science
and Technology 54, 261–268.
Signor, R. S. (2007). Microbial risk implications of rainfall-induced runoff events entering a
reservoir used as a drinking-water source. Journal of Water Supply Research and Technology
- AQUA 56, 515–531.
Sinclair, M. (2005). Strategic review of waterborne viruses. Technical report, CRC for Water
Quality and Treatment.
Singh, M., R. S. Malhotra, S. Ceccarelli, A. Sarker, S. Grando, and W. Erskine (2003). Spatial
variability models to improve dryland field trials. Experimental Agriculture 39(02), 151–160.
Sleutel, S., J. Vandenbruwane, A. De Schrijver, K. Wuyts, B. Moeskops, K. Verheyen, and
S. De Neve (2009). Patterns of dissolved organic carbon and nitrogen fluxes in deciduous and
coniferous forests under historic high nitrogen deposition. Biogeosciences 6(12), 2743–2758.
Smeets, P. W. M. H., Y. J. Dullemont, P. H. A. J. M. V. Gelder, J. C. V. Dijk, and G. J. Medema
(2008). Improved methods for modelling drinking water treatment in quantitative microbial
risk assessment; a case study of Campylobacter reduction by filtration and ozonation. Journal
of Water and Health 6(3), 301–314.
Smeets, P. W. M. H., G. J. Medema, Y. J. Dullemont, P. H. A. J. M. V. Gelder, and J. C. V. Dijk.
(2008). Case study of Campylobacter reduction by filtration and ozonation. Journal of Water
and Health 6, 301–314.
Smeets, P. W. M. H., G. J. Medema, G. Stanfield, J. C. v. Dijk, and L. C. Rietveld (2007). How
can the UK statutory Cryptosporidium monitoring be used for quantitative risk assessment of
Cryptosporidium in drinking water? Journal of Water and Health 5(1 (Suppl)), 107–118.
Song, H.-R., A. Lawson, R. B. D’Agostino Jr, and A. D. Liese (2011). Modeling type 1 and type
2 diabetes mellitus incidence in youth: An application of Bayesian hierarchical regression for
sparse small area data. Spatial and Spatio-temporal Epidemiology 2(1), 23–33.
Spiegelhalter, D. J., A. P. Dawid, S. L. Lauritzen, and R. G. Cowell (1993). Bayesian analysis
in expert systems. Statistical Science 8(3), 219–247.
Steck, H. (2001). Constrained-Based Structural Learning in Bayesian Networks Using Finite
Data Sets. Ph. D. thesis, Institut fur der Informatik der Technischen Universitat.
Stefanova, K. T., A. B. Smith, and B. R. Cullis (2009). Enhanced diagnostics for the spatial
analysis of field trials. Journal of Agricultural Biological and Environmental Statistics 14(4),
392–410.
Strahm, B. D., R. B. Harrison, T. A. Terry, T. B. Harrington, A. B. Adams, and P. W. Footen
(2009). Changes in dissolved organic matter with depth suggest the potential for posthar-
vest organic matter retention to increase subsurface soil carbon pools. Forest Ecology and
Management 258(10), 2347–2352.
Tanaka, H., T. Asano, E. D. Schroeder, and G. Tchobanoglous (1998). Estimating the safety
of wastewater reclamation and reuse using enteric virus monitoring data. Water Environment
Research 70(1), 39–51.
Tawk, H. M., K. Vickery, L. Bisset, W. Selby, and Y. E. Cossart (2006). The impact of hepatitis
B vaccination in a western country: recall of vaccination and serological status in Australian
adults. Vaccine 24(8), 1095–1106.
Teschke, K., Y. Chow, K. Bartlett, A. Ross, and C. van Netten (2001). Spatial and temporal
distribution of airborne Bacillus thuringiensis var. kurstaki during an aerial spray program for
gypsy moth eradication. Environmental Health Perspectives 109(1), 47–54.
Teunis, P. F. M., O. van der Heijden, J. W. B. van der Giessen, and A. H. Havelaar (1996). The
dose-response relation in human volunteers for gastro-intestinal pathogens. Technical report,
National Institute of Public Health and the Environment (RIVM), Bilthoven, The Netherlands.
Toze, S. (1999). PCR and the detection of microbial pathogens in water and wastewater. Water
Research 33(17), 3545–3556.
Toze, S. (2002). Review of the risk of groundwater contamination from microbial pathogen due
to the infiltration of treated effluent to groundwater at the Bridgetown wastewater treatment
plant. A consultancy report to the Water Corporation, WA. Technical report, CSIRO.
Toze, S. (2004). Literature Review on the Fate of Viruses and Other Pathogens and Health Risks
in Non-Potable Reuse of Storm Water and Reclaimed Water. CSIRO. Accessed: February 1,
2011.
Toze, S., J. Hanna, and J. Sidhu (2005). Microbial monitoring of the McGillivray Oval direct
reuse scheme Report to the Water Corporation WA. Technical report, CSIRO.
Toze, S., J. Hanna, A. Smith, and W. Hick (2002). Halls Head indirect treated wastewater reuse
scheme. Technical report, CSIRO.
Toze, S., J. Hanna, T. Smith, L. Edmonds, and A. McCrow (2004). Determination of water
quality improvements due to the artificial recharge of treated effluent. In J. Steenworden and
T. Endreny (Eds.), IAHS Publications-Series of Proceedings and Reports: Wastewater reuse
and groundwater quality, Volume 285, pp. 53–60. Wallingford [Oxfordshire]: IAHS, 1981-.
Trought, M. C. T. and R. G. V. Bramley (2011). Vineyard variability in Marlborough, New
Zealand: characterising spatial and temporal changes in fruit composition and juice quality
in the vineyard. Australian Journal of Grape and Wine Research 17(1), 79–89.
Van Allen, T., R. Greiner, and P. Hooper (2001). Bayesian error-bars for Belief Net inference. In
Proceedings of the Seventeenth Conference on Uncertainty in Artificial Intelligence (UAI-01),
Seattle. Citeseer.
Van Allen, T., A. Singh, R. Greiner, and P. Hooper (2008). Quantifying the uncertainty of a
Belief Net response: Bayesian error-bars for Belief Net inference. Artificial Intelligence 172,
483–513.
Varis, O. (1995). Belief networks for modelling and assessment of environmental change. En-
vironmetrics 6, 439–444.
Varis, O. (1997). Bayesian decision analysis for environmental and resource management. En-
vironmental Modelling and Software 12(2-3), 177–185.
Varis, O. (1998). A belief network approach to optimization and parameter estimation: applica-
tion to resource and environmental management. Artificial Intelligence 101(1-2), 135–163.
Verbyla, A., B. Cullis, M. Kenward, and S. Welham (1999). The analysis of designed exper-
iments and longitudinal data by using smoothing splines. Journal of the Royal Statistical
Society: Series C (Applied Statistics) 48(3), 269–311.
VSN International (2011). Genstat. Available online:
http://www.vsni.co.uk/software/genstat/.
Waller, L. A., B. P. Carlin, H. Xia, and A. E. Gelfand (1997). Hierarchical spatio-temporal
mapping of disease rates. Journal of the American Statistical Association 92(438), 607–617.
Wang, L. A. and Z. Goonewardene (2004). The use of mixed models in the analysis of animal
experiments with repeated measures data. Canadian Journal of Animal Science 84(1), 1–11.
Ward, R., D. Bernstein, E. Young, J. Sherwood, D. Knowlton, and G. Schiff (1986). Human
Rotavirus studies in volunteers: determination of infectious dose and serological response to
infection. Journal of Infectious Diseases 154(5), 871–880.
Water Corporation (2010). Subiaco Wastewater Treatment Plant Annual Report 2009-10. Tech-
nical Report PM-3851463, Water Corporation, Perth, Western Australia.
Water Corporation (2011a). McGillivray Oval Irrigation Project. Available on-
line: http://www.watercorporation.com.au/M/mcgillivray_oval.cfm. Accessed:
February 2, 2011.
Water Corporation (2011b). Subiaco treatment plant. Available online:
http://www.watercorporation.com.au/W/wwtp_subiaco.cfm. Accessed: February
2, 2011.
Water Environment Research Foundation, A. Olivieri, and C. Summers (2007). Assessing risk
of pathogens in separate stormwater systems. Available online: http://www.werf.org/am/.
Accessed: February 17, 2011.
Weidl, G., A. Madsen, and E. Dahlquist (2003). Object oriented Bayesian network for industrial
process operation.
Weidl, G., A. L. Madsen, and S. S. Israelson (2005). Applications of object-oriented Bayesian
networks for condition monitoring, root cause analysis and decision support on operation of
complex continuous processes. Computers and Chemical Engineering 29, 1996–2009.
Wermuth, N. and D. R. Cox (1998). On association models defined over independence graphs.
Bernoulli 4(4), 477–495.
Westrell, T., O. Bergstedt, T. Stenstrom, and N. Ashbolt (2003). A theoretical approach to
assess microbial risks due to failures in drinking water systems. International Journal of
Environmental Health Research 13, 181–197.
Whelan, B. M., A. B. McBratney, and B. Minasny (2001). Vesper-spatial prediction software
for precision agriculture. In Third European Conference on Precision Agriculture. (G. Gre-
nier, S. Blackmore Eds.) pp. 139-144. Agro Montpellier, Ecole Nationale Agronomique de
Montpellier., pp. 18–20. Citeseer.
Whittaker, J. (1990). Graphical Models in Multivariate Statistics. Chichester (England); New
York: Wiley.
Williams, E. R. (1986). A neighbour model for field experiments. Biometrika 73(2), 279–287.
Wong, V. N. L., B. W. Murphy, T. B. Koen, R. S. B. Greene, and R. C. Dalal (2008). Soil organic
carbon stocks in saline and sodic landscapes. Australian Journal of Soil Research 46(4), 378–
389.
Woo, D. M. and K. J. Vicente (2003). Sociotechnical systems, risk management, and public
health: comparing the North Battleford and Walkerton outbreaks. Reliability Engineering &
System Safety 80(3), 253–269.
Yan, P. and M. K. Clayton (2006). A cluster model for space-time disease counts. Statistics in
Medicine 25(5), 867–881.
Statement of Contribution of Co-Authors for Thesis by Publication
The authors listed below certify that:
1. they meet the criteria for authorship, in that they have participated in the conception, execution, or interpretation, of at least that part of the publication in their field of expertise;
2. they take public responsibility for their part of the publication, except for the responsible author who accepts overall responsibility for the publication;
3. there are no other authors according to these criteria;
4. potential conflicts of interest have been disclosed to (a) granting bodies, (b) the editor or publisher of journals or other publications, and (c) the head of the responsible academic unit, and
5. they agree to the use of the publication in the student's thesis and its publication on the Australasian Digital Thesis database consistent with any limitations set by publisher requirements.
In the case of Chapter 3:
Title: Bayesian Network for Risk of Diarrhoea Associated with the Use of Recycled Water
Journal: Risk Analysis Status: Published 2009, 29(12) 1672-1685
Contributor: Statement of Contribution
Margaret Donald: As first author, was responsible for the concept of the paper, data analysis, interpretation and the writing of all drafts.
Dr Angus Cook: Was responsible for the definition of the network and for editorial comment.
Professor Kerrie Mengersen: Was responsible for general advice and editorial comment.
Principal supervisor's Confirmation
I have sighted email or other correspondence from all co-authors confirming their certifying authorship.
Chapter 3
Network for Risk of Diarrhoea
Associated with the Use of Recycled
Water
3.1 Preamble
This chapter has been written as a journal article, and addresses research objective (1) which
aimed to build credible intervals for point estimates of a Bayesian net (BN) in the context of a
risk assessment.
The technique used was to recast the BN as a DAG within WinBUGS [Lunn et al., 2000],
and then to elicit priors for the uncertainty of the conditional priors required by the BN. Some-
what annoyingly, the method demands that the user postulate a sample size for the population at
risk. The other simple idea was to recognise that conditioning on the realisation of a particular
node was equivalent to subsetting the data. Even for small (and large) probabilities the method
outlined would seem necessary, as the complex mixtures which give rise to the final point esti-
mates of the proportions are not well approximated by a simple binomial distribution or a normal
distribution. For further discussion see the addendum to this chapter (Section 3.9) and Figures 3.4–3.5.
WinBUGS code for the model 2 BN is found in Section A.
I am the principal author and the paper is presented here in its entirety, but with different bib-
liographic conventions from the journal, Risk Analysis, in which it was published. Angus Cook
provided the framework and asked the key question about confidence limits, Kerrie Mengersen
oversaw, guided and elicited ideas. Margaret Donald as first author was responsible for the
concept of the paper, data analysis, interpretation, writing all drafts and addressing reviewers’
comments.
Title: Network for Risk of Diarrhoea Associated with the Use of Recycled Water
Authors: Margaret Donald^a, Angus Cook^b, Kerrie Mengersen^a.
^a School of Mathematical Sciences, Queensland University of Technology, GPO Box 2434, Brisbane, QLD 4001, Australia.
^b The University of Western Australia, 35 Stirling Highway, Crawley, WA 6009, Australia.
In this paper we take a Bayesian net, and consider various possible outcomes. The paper’s
contribution is to express those outcomes as credible intervals.
The conceptual model used represents the factors and pathways by which recycled water
may pose a risk of contracting gastroenteritis. This was converted to a Bayesian net and quanti-
fied using Angus Cook’s expert opinion. Bayesian nets are an important aid in conceptualising
complex relationships. The quantification of conditional probabilities and the consequences
which flow from the net and those probabilities permits the possibility of adjusting both net and
probabilities so that the consequences match our understandings and the data.
The method was to create Markov chain Monte Carlo samples for all nodes of the net, having
recognised that the Directed Acyclic Graph (DAG) structure of Bayesian nets (BN) is replicated
in the WinBUGS software [Lunn et al., 2000]. The technique involved eliciting uncertainty
bounds for all elements of the Conditional Probability Tables (CPTs), and matching moments
of the Beta distribution to those bounds, thereby adding an extra node to the DAG of the original
BN for each element of any CPT.
There were a number of conditional outcomes of interest and these were found by forming
the subset which matched the condition within each MCMC iteration. To allow estimation
of the relevant ratios, a single MCMC iteration ran for a fixed population size of 50000 (a
number chosen to ensure that each condition had a reasonably sized denominator within the
iteration). Within the iteration, counts satisfying each outcome were found and ratios for the
relevant condition found. These were then able to be summarised to give 95% credible intervals.
This method is quite general and does not require the ability to calculate partial differentials
as do, for example, the methods of Van Allen et al. [2001] or Chan and Darwiche [2004]. (These
papers came to my attention after the paper had been accepted.)
Bibliography
Chan, H. and A. Darwiche (2004). Sensitivity analysis in Bayesian networks: from single
to multiple parameters. In UAI ‘04 Proceedings of the 20th Conference on Uncertainty in
Artificial Intelligence, pp. 67–75. AUAI Press.
Lunn, D. J., A. Thomas, N. Best, and D. Spiegelhalter (2000). WinBUGS - A Bayesian mod-
elling framework: Concepts, structure, and extensibility. Statistics and Computing 10(4),
325–337.
Van Allen, T., R. Greiner, and P. Hooper (2001). Bayesian error-bars for Belief Net inference. In
Proceedings of the Seventeenth Conference on Uncertainty in Artificial Intelligence (UAI-01),
Seattle. Citeseer.
3.2 Network for Risk of Diarrhoea Associated with the Use of Re-
cycled Water
Margaret Donald^a, Kerrie Mengersen^a, Angus Cook^b
^a School of Mathematical Sciences, Queensland University of Technology, GPO Box 2434, Brisbane, QLD 4001, Australia.
^b The University of Western Australia, 35 Stirling Highway, Crawley, WA 6009, Australia.
Abstract
Estimating potential health risks associated with recycled (reused) water is highly complex given the
multiple factors affecting water quality. We take a conceptual model which represents the factors and
pathways by which recycled water may pose a risk of contracting gastroenteritis, convert the conceptual
model to a Bayesian net and quantify the model using one expert’s opinion. This allows us to make
various predictions as to the risks posed under various scenarios. Bayesian nets provide an additional
way of modelling the determinants of recycled water quality and elucidating their relative influence on
a given disease outcome. The important contribution to Bayesian net methodology is that all model
predictions, whether risk or relative risk estimates, are expressed as credible intervals.
Keywords
Bayesian nets, credible intervals, recycled water, gastroenteritis, expert opinion.
3.3 Introduction
With climate change and increasingly prolonged droughts affecting most states in Australia, interest has
developed in reusing waste water. Recycled water is currently being used in numerous schemes globally
[Anderson, 2007, Asano, 1998]. Sewage has been associated with the outbreak of waterborne diseases
since John Snow’s pioneering work on cholera [Snow, 1849, 1855] and continues to be associated with
the outbreak of enteric diseases world-wide, where cross-contamination of water distribution systems, or
poorly treated sewage-contaminated source waters continue to give rise to epidemics [Nadebaum et al.,
2004]. It is therefore of major public health significance to assess potential risks associated with the use
of recycled water.
Various forms of risk assessment have been used to determine the safety of recycled water. In particular,
Quantitative Microbial Risk Assessment (QMRA) is currently the method of choice for assessing
the risk of infection due to consumption of drinking water [Ashbolt et al., 2005, Haas and Eisenberg,
2001, Roser et al., 2006]. However, such assessments are often limited by a paucity of data for either source or
finished water (wastewater). There may also be limitations in the capacity of such risk assessments to determine
which process components are contributing to disease risk. In this paper we adopt a supplementary
analysis based on a Bayesian network (BN) to help inform the process of risk estimation. Networks also
provide insight into possible problems arising in recycled water systems because starting conditions can
be varied to explore a range of scenarios of interest. The model presented is not intended to serve as a
replacement for a comprehensive QMRA, and indeed QMRA estimates may be used to guide the inputs
into such a network.
A BN is a graphical model with an underlying probabilistic framework that characterises and quantifies
an outcome of interest, and the variables and their interactions associated with this outcome. It
is a form of directed acyclic graph (DAG). See Cowell et al. [2001] or Korb and Nicholson [2004] for
more details. Bayesian networks have been used widely in environmental literature [Hamilton et al.,
2007, Nicholson et al., 2003, Pike, 2004, Pollino and Hart, 2005a,b, Pollino et al., 2007, Varis, 1995,
1997, 1998]. Most of these examples integrate expert opinion and data to quantify the probability tables
underlying the network.
Recycled water may differ considerably with respect to quality depending on the treatment and reuse
purpose [Natural Resource Management Ministerial Council et al., 2006], and the possible exposure path-
ways will also differ. In this paper, we illustrate a generic Bayesian network for contaminants which may
enter a recycled water scheme and consider the potential health risks with respect to enteric pathogens
which may still remain in the recycled water component. This model may then be extended to many
other contexts, including specific schemes and distribution systems with their own inputs (based on ex-
pert opinion and/or series of process data).
A general model representing the factors and pathways involved in such a process was developed by
Angus Cook and David Roser∗. This conceptual framework comprises six components: recycled water
and distribution pathways, exposure pathways and populations, cumulative end-user dose, identified tox-
icity and pathogenicity pathways, individual covariates and health endpoints. This model was used as the
basis for the development of the BN as described in Section 3.4. An exposition of results arising from
interrogation of the network is provided in Section 3.5, followed by a general discussion in Section 3.6.
The purpose of the paper is fourfold: firstly to illustrate a Bayesian network for the assessment of a
recycled water system; secondly, to quantify the model using one expert’s opinion; thirdly, to assess the
sensitivity of the model to its various parent nodes; and finally, to add uncertainty to the probabilities and
conditional probabilities of the nodes in order to examine the difference these uncertainties might make
to the model’s predictions.
A secondary purpose of the paper is to highlight the fact that a BN is a directed acyclic graph (DAG),
and as such can be represented not only by such purpose built software as Hugin [Hugin Expert A/S,
2007] and Netica [Norsys Software Corp., 2007], but also with any software which represents DAGs. In
this case, we have used WinBUGS [Lunn et al., 2000] as this software allows one to add uncertainty to
every node and prediction (including relative risks) of the BN.
3.4 Methods
3.4.1 Development of a conceptual model
A conceptual model developed by Cook and Roser (Figure 3.1) was represented as a Bayesian Network.
The model was not designed to reflect a particular recycled water system, but to indicate the various
∗Dr. David J. Roser, Centre for Water and Waste Technology, University of New South Wales, Australia. Dr. Angus G. Cook, School of Population Health, University of Western Australia. The conceptual model was prepared as part of the project “Assessing the Public Health Impacts of Recycled Water Use”, funded by the Western Australian Government through the Department of Water’s Water Fund.
factors (or nodes) that influence whether the standard of the water is likely to be classified as acceptable
(‘safe’) or unacceptable (‘unsafe’). The Bayesian network comprises two distinct subnetworks describing
water supply distribution endpoints and health endpoint, joined by a directed link. In this particular
example, the health endpoint considered was gastroenteritis, although the framework is appropriate for a
wider range of health outcomes.
Each node of the network was ascribed categories or ordinal levels. The underlying conditional
probability tables (CPTs) and possible ranges for these probabilities were based on expert opinion in
epidemiological water risk assessment, provided by Angus Cook. The structure of the network, the
underlying probabilities and the resultant tentative model predictions were presented at two workshops:
one consisting of environmental health authorities from the University of Western Australia School of
Population Health and the Western Australian Department of Health, the other consisting of water and
wastewater researchers from the Centre of Water and Wastewater Technology (University of New South
Wales). Hereafter, this network is referred to as Model 1.
The nodes of this network may be variously defined. For example, the final outcome node of the
BN, ‘Gastroenteritis’ has a number of possible interpretations: the rate of gastroenteritis episodes per
person per year, or perhaps, the rate of gastroenteritis hospital admissions, where the gastroenteritis
is attributable to, or associated with, recycled water in this hypothetical model. Further discussion of
possible meanings for nodes is deferred until the discussion in Section 3.6.
The expert, Angus Cook, was asked to express uncertainty about all probabilities used in Model 1,
by specifying a 95% confidence interval for the probability. This information was incorporated in an
augmented BN (Model 2), where the elicited 95% confidence interval for the uncertainty for each binary
node was re-expressed as a beta distribution with parameters (α, β) that were determined on the basis of
the ranges of probability values provided. These beta distributions became the priors to the relevant nodes
in Model 2. To allow appropriate comparisons between Model 1 (without uncertainty) and Model 2 (with
uncertainty), the expected value of each prior in Model 2 was set to equal the value of the corresponding
probability in Model 1; hence the probabilities of Model 2, both conditional and marginal, are those
obtained from Model 1, provided the MCMC chain is sufficiently long for the desired accuracy. In all,
some 40 priors were needed in Model 2 to express the uncertainties associated with the conditional and
unconditional probabilities of Model 1.
To illustrate the difference between Model 1 and Model 2: in Model 1, the parent node 1 (Primary
Source Water) was given a probability of .01 of meeting an acceptable standard. In Model 2, the
probability of Primary Source Water meeting an acceptable standard is drawn from a Beta distribution,
Beta(6.751, 668.37), whose expected value is .01, and which 95% of the time returns probability values
between .004 and .02.
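The quoted prior can be checked numerically. The following is a minimal sketch, assuming only the Beta(6.751, 668.37) parameters given above; the function name, the number of draws and the seed are illustrative choices and are not part of the original analysis. It returns the analytic mean and a Monte Carlo estimate of the central 95% interval, which can then be compared with the range stated in the text.

```python
import random

ALPHA, BETA = 6.751, 668.37  # Beta parameters quoted in the text for node 1


def beta_summary(alpha, beta, draws=100_000, seed=1):
    """Return the analytic mean and a Monte Carlo central 95% interval."""
    rng = random.Random(seed)  # seed is an arbitrary illustrative choice
    sample = sorted(rng.betavariate(alpha, beta) for _ in range(draws))
    mean = alpha / (alpha + beta)
    lower = sample[int(0.025 * draws)]  # empirical 2.5th percentile
    upper = sample[int(0.975 * draws)]  # empirical 97.5th percentile
    return mean, lower, upper


mean, lower, upper = beta_summary(ALPHA, BETA)
print(f"mean={mean:.4f}, central 95% interval=({lower:.4f}, {upper:.4f})")
```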
3.4.2 Determination of prior probabilities
In determining the beta priors†, we used the method of moments to determine (α, β). For details, see
Spiegelhalter et al. [1994] where Dirichlet priors were developed for use in a congenital heart disease
network. Dirichlet priors may be used in Hugin (as ‘experience’), but such priors are used when updating
a node within the network on a case by case basis. They do not add uncertainty to model predictions.
The age structure for the population to which the BN probabilities were applied was taken from
Australian population data [Anon., 2008] for 2000. No priors were placed on the multinomial age distri-
bution, as age structures evolve slowly in Australia.
3.4.3 Set up and use of models
Detailed description of the model setup
Model 1 is described fully by Figure 3.2, together with the conditional and unconditional probabili-
ties given in Table 3.1 in column p. Properties and likelihood functions are well explained in such books
as that by Cowell et al. [2001]. Model 2 is identical in framework, but the conditional and unconditional
probabilities vary to have the corresponding mean of p and its variance (elicited from the expert), and are
drawn from the Beta distribution with parameters (α, β).
The values of α and β were found by considering the upper and lower bounds in Table 3.3 as 95%ile
limits for the distribution. These were used to deduce $2 \times 1.96 \times \sqrt{\mathrm{Var}(X)}$ and then the moments (the
mean and the variance) were matched to solve for α and β, where
$$E(X) = \frac{\alpha}{\alpha+\beta}, \qquad \mathrm{Var}(X) = \frac{\alpha\beta}{(\alpha+\beta)^2(\alpha+\beta+1)}.$$
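The moment matching described above can be sketched in a few lines. This is an illustrative reconstruction rather than the code used for the paper: it assumes the elicited bounds span 2 × 1.96 standard deviations, and the function names and the example inputs (mean 0.96, bounds 0.90 and 0.99) are hypothetical. Solving the two moment equations gives α + β = E(X)(1 − E(X))/Var(X) − 1, from which α and β follow.

```python
import math


def beta_from_bounds(mean, lower, upper, z=1.96):
    """Method-of-moments Beta(alpha, beta) from a mean and 95% bounds.

    Assumes upper - lower = 2 * z * sd, as described in the text.
    """
    sd = (upper - lower) / (2 * z)
    var = sd ** 2
    if not 0 < mean < 1 or var >= mean * (1 - mean):
        raise ValueError("no Beta distribution matches these moments")
    total = mean * (1 - mean) / var - 1  # this is alpha + beta
    return mean * total, (1 - mean) * total


def beta_moments(alpha, beta):
    """Mean and variance of a Beta(alpha, beta) distribution."""
    total = alpha + beta
    return alpha / total, alpha * beta / (total ** 2 * (total + 1))


# Hypothetical elicited values, for illustration only.
alpha, beta = beta_from_bounds(mean=0.96, lower=0.90, upper=0.99)
check_mean, check_var = beta_moments(alpha, beta)
print(alpha, beta, check_mean, math.sqrt(check_var))
```

Feeding the fitted parameters back through `beta_moments` recovers the requested mean and standard deviation, which is a convenient check on any elicited row.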
Thus, if we consider nodes 1 and 5 which feed into node 2, and use Xi to describe the (0,1) output
from node i, then under the simple Bayesian network (Model 1), and using B(π) to represent a Bernoulli
distribution with parameter π, and Be(α, β) to represent a Beta distribution with parameters (α, β), we
†$p \sim \mathrm{Beta}(\alpha, \beta) \Rightarrow f(p) = \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)}\, p^{\alpha-1}(1-p)^{\beta-1}, \quad 0 < p < 1$
have
X1 ∼ B(.01), X5 ∼ B(.01) and X2 is given by
X2|X1 = 0, X5 = 0 ∼ B(.96), X2|X1 = 0, X5 = 1 ∼ B(.98),
X2|X1 = 1, X5 = 0 ∼ B(.99), X2|X1 = 1, X5 = 1 ∼ B(.98).
For Model 2, the p parameter of the Bernoulli distribution is, itself, stochastic and we have from
Table 3.1 and Figure 3.2, and using the same naming convention as above
p1 ∼ Be(6.751, 668.37), X1 ∼ B(p1),
p5 ∼ Be(.599, 59.25), X5 ∼ B(p5), while p2 and X2 are given by
p2(1) ∼ Be(55.687, 2.32), p2(2) ∼ Be(117.083, 2.39),
p2(3) ∼ Be(375.525, 3.79), p2(4) ∼ Be(5.135, .01),
X2|X1 = 0, X5 = 0 ∼ B(p2(1)), X2|X1 = 0, X5 = 1 ∼ B(p2(2)),
X2|X1 = 1, X5 = 0 ∼ B(p2(3)), X2|X1 = 1, X5 = 1 ∼ B(p2(4)).
At Node 4 in Model 2, the output X4 depends only on the values of X3 and X7. Thus,
X4|X7 = 0, X3 = 0 ∼ B(p4(1)), X4|X7 = 0, X3 = 1 ∼ B(p4(2)),
X4|X7 = 1, X3 = 0 ∼ B(p4(3)), X4|X7 = 1, X3 = 1 ∼ B(p4(4)), where
p4(1) ∼ Be(8.335, 3.57), p4(2) ∼ Be(21.054, 5.26),
p4(3) ∼ Be(30.217, 3.36), p4(4) ∼ Be(152.358, .15).
In Model 2, the entire network is embedded in a for-loop of a particular population size, N. This
means that each simulation consists of X1 drawn N times, thereby allowing us to calculate ratios and
relative risks for that population (and all 12000 simulations).
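The structure of a single iteration can be sketched for the three-node fragment (nodes 1, 5 and 2) using the Beta parameters listed above. This is a plain Python illustration, not the WinBUGS code of the paper; the function names, the seed and the choice of condition are assumptions made for the example. The uncertain probabilities are drawn once per iteration, the N individuals are then drawn from the resulting Bernoulli distributions, and a conditional proportion is obtained by subsetting.

```python
import random


def simulate_iteration(n, rng):
    """One iteration of the three-node fragment (nodes 1, 5 and 2)."""
    # Draw the uncertain probabilities once per iteration (Model 2 priors).
    p1 = rng.betavariate(6.751, 668.37)
    p5 = rng.betavariate(0.599, 59.25)
    p2 = {
        (0, 0): rng.betavariate(55.687, 2.32),
        (0, 1): rng.betavariate(117.083, 2.39),
        (1, 0): rng.betavariate(375.525, 3.79),
        (1, 1): rng.betavariate(5.135, 0.01),
    }
    # Then draw the (0,1) node outputs for each of the n individuals.
    people = []
    for _ in range(n):
        x1 = int(rng.random() < p1)
        x5 = int(rng.random() < p5)
        x2 = int(rng.random() < p2[(x1, x5)])
        people.append((x1, x5, x2))
    return people


def proportion(people, condition=lambda person: True):
    """Share with X2 = 1 among individuals satisfying the condition."""
    subset = [person[2] for person in people if condition(person)]
    return sum(subset) / len(subset) if subset else None


rng = random.Random(2011)  # arbitrary seed, for reproducibility only
people = simulate_iteration(50_000, rng)
marginal = proportion(people)                                  # P(X2 = 1)
given_x1 = proportion(people, lambda person: person[0] == 1)   # P(X2 = 1 | X1 = 1)
```

Returning `None` for an empty subset makes explicit the case in which a conditional proportion cannot be calculated within an iteration.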
Models 1 and 2 give marginal probabilities of the nodes and conditional probabilities of interest.
For example, scenarios that are explored include the probability of gastroenteritis for a child aged less
than five years, when the “cumulative dose” (CD) is acceptable, or the probability of gastroenteritis for
an adult when there is a failure to achieve the acceptable standard at the end point distribution. The
conditional probabilities were also represented as risks relative to the risk when the cumulative dose is
acceptable, for the group considered. All model predictions are thus represented initially in terms of
probabilities and are then expressed as relative risks of gastroenteritis, relative to the input nodes at the
safest settings (that is, least likely to lead to gastroenteritis). All relative risks are thus relative to the risk
posed for the comparable group when the cumulative dose (CD) is acceptable‡.
The impact of the various nodes on the final outcome node of ‘Gastroenteritis’ (and on the node
‘Endpoint Distribution’) was assessed using the mutual information and the variance of belief for the
various nodes. The mutual information calculations are based on Pearl [1988], while the variance of belief calculations are
based on Spiegelhalter [1989]. The mutual information for two nodes (X, Y) is a measure of the distance
between P(X)P(Y) and P(X,Y)§, while the variance of belief is a variance measure which again measures
the effect of one node on another.
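For two discrete nodes, the mutual information can be computed directly from their joint probability table. The sketch below is illustrative only: the function name and the two example tables are hypothetical, and natural logarithms are assumed (BN packages may report the quantity in other units). It is zero when the nodes are independent and grows as P(X,Y) departs from P(X)P(Y).

```python
import math


def mutual_information(joint):
    """Mutual information (in nats) of two discrete nodes.

    joint[x][y] holds P(X = x, Y = y); the entries must sum to 1.
    """
    px = [sum(row) for row in joint]        # marginal P(X)
    py = [sum(col) for col in zip(*joint)]  # marginal P(Y)
    total = 0.0
    for x, row in enumerate(joint):
        for y, pxy in enumerate(row):
            if pxy > 0:  # cells with P(X, Y) = 0 contribute nothing
                total += pxy * math.log(pxy / (px[x] * py[y]))
    return total


# Hypothetical joint tables for two binary nodes.
independent = [[0.25, 0.25], [0.25, 0.25]]
dependent = [[0.5, 0.0], [0.0, 0.5]]
print(mutual_information(independent), mutual_information(dependent))
```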
The BN (Model 1) was analysed using both Netica [Norsys Software Corp., 2007] and Hugin [Hugin
Expert A/S, 2007]. As discussed earlier, the BN is a directed acyclic graph and thus may also be analysed
in a Bayesian Markov Chain Monte Carlo framework. Thus, uncertainty was added to all the underlying
probabilities of Model 1, to give Model 2, using WinBUGS [Lunn et al., 2000], a Bayesian MCMC
graphical package. The purpose of translating Model 1 to Model 2 was to associate uncertainty with the
marginal and conditional probability and relative risk point estimates from the BN of Model 1. The BN
software is a convenient framework in which to calculate such estimates, but does not provide uncertainty
analysis and corresponding credible (or confidence) intervals. Thus, we used Netica and Hugin to draw
inferences of interest (see Section 3.5).
We then built a somewhat more complex model (Model 2) in WinBUGS, which mimicked Model 1 in
order to find posterior credible intervals for all point estimates. In particular, the conditional probabili-
ties found under Model 1, are found by subsetting the sample at each iteration to satisfy the particular
condition.
In order to quantify the variability of potential health outcomes, a population sample size is required,
since beta distributions give rise to binomial outcomes. Although the marginal probabilities have the size
of the whole population as their denominator, the conditional probabilities are based on the subset of the
sample population which satisfies the condition. Thus, with small sample sizes, some relative risks may
not always be able to be calculated. In this example, we needed to choose a population size that was
sufficiently large that the various desired conditional probabilities and relative risks were always able
to be estimated. A population size of 50000 was selected which enabled the construction of credible
intervals for all relative risks and conditional probabilities found using the BN.
‡Relative Risk = Probability(gastroenteritis for the specific age group under scenario) / Probability(gastroenteritis for the specific age group when cumulative dose is acceptable).
§Defined as $I(X,Y) = \sum_{Y} P(Y) \sum_{X} P(X \mid Y) \log \frac{P(X,Y)}{P(X)P(Y)}$.
The MCMC simulations for Model 2 were run for 12000 iterations. No burn-in is required,
since the starting distribution is the target distribution. Several runs showed that 12000 iterations
was a sufficient run length for estimates to be stable to the number of decimal places shown.
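The sampling scheme described above can be sketched in Python. This is not the WinBUGS model: it is a hypothetical two-node network (a parent node and an illness node), with Beta parameters borrowed from Table 3.1 for illustration, and a much smaller population and run length than the 50000 and 12000 used in the text so that it runs quickly. All function and variable names are assumptions.

```python
import random


def simulate_iteration(rng, n_people):
    """One iteration: draw probabilities from Beta priors, then simulate people."""
    # Hypothetical parent node, reusing a Beta prior with mean .90 from Table 3.1.
    p_ok = rng.betavariate(30.217, 3.35744)
    # Illness probabilities given the parent state (5-64 years rows of Table 3.1).
    p_ill_ok = rng.betavariate(3.739, 375.53)
    p_ill_not_ok = rng.betavariate(12.093, 48.37)

    n_ok = n_not_ok = ill_ok = ill_not_ok = 0
    for _ in range(n_people):
        if rng.random() < p_ok:
            n_ok += 1
            ill_ok += rng.random() < p_ill_ok
        else:
            n_not_ok += 1
            ill_not_ok += rng.random() < p_ill_not_ok
    return n_ok, ill_ok, n_not_ok, ill_not_ok


def credible_interval(values, level=0.95):
    """Equal-tailed interval read off the sorted simulated values."""
    ordered = sorted(values)
    lower = ordered[int((1 - level) / 2 * (len(ordered) - 1))]
    upper = ordered[int((1 + level) / 2 * (len(ordered) - 1))]
    return lower, upper


def run(n_iter=300, n_people=2000, seed=1):
    rng = random.Random(seed)
    marginal, relative_risk, skipped = [], [], 0
    for _ in range(n_iter):
        n_ok, ill_ok, n_not_ok, ill_not_ok = simulate_iteration(rng, n_people)
        marginal.append((ill_ok + ill_not_ok) / n_people)
        # Conditional estimates come from subsets; an empty subset (or a zero
        # baseline) means the relative risk cannot be calculated this iteration.
        if n_ok == 0 or n_not_ok == 0 or ill_ok == 0:
            skipped += 1
            continue
        relative_risk.append((ill_not_ok / n_not_ok) / (ill_ok / n_ok))
    return marginal, relative_risk, skipped


marginal, relative_risk, skipped = run()
marginal_interval = credible_interval(marginal)
```

The `skipped` counter records iterations in which a relative risk could not be formed, which is the situation that motivates the choice of a large population size.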
3.4.4 Model Validation
The models may be assessed by comparison with known outcomes (external validation), or by critical
consideration of how the components of the model interact to give conclusions, and consideration of the
sensitivity of the model to changes in inputs (internal validation).
In the absence of an identified outbreak, and related data, external validation of a risk assessment
model is not usually feasible. For example, the Milwaukee Cryptosporidium outbreak was large
and well-defined, and thereby triggered an enormous commitment of resources. This allowed the
determination of the duration of contamination, the proportion of the population that was affected (from a
random telephone survey), oocyst concentrations in the treated water, and so on [Arrowood et al., 2001;
Brookhart et al., 2002; Eisenberg et al., 1996, 1998; Haas et al., 1999; MacKenzie et al., 1994, 1995].
With no data available for source waters for Cryptosporidium, rotavirus, Campylobacter or surrogates,
a direct external validation was not possible. We did attempt external validation by consideration of
the National Notifiable Diseases Surveillance System data¶, and calculated the disease rates for
cryptosporidiosis, rotaviral enteritis, and Campylobacter infections per 100 000 people per year for the three
age groups. However, such incidence rates are likely to be undercounts of disease rates, given the many
hurdles to be crossed before being recorded in such a database. We also looked at the same three enteric
infections, and infectious diarrhoea as a proportion of admissions to a group of NSW hospitals∥ for the
three age-groups, but without adjusting for population figures, the proportion of admissions affected by
infectious diarrhoeas is not useful for validation of this model.
¶The National Notifiable Diseases Surveillance System data [National Notifiable Diseases Surveillance System, 2008] was queried to find both the rates and number of notifications by age group and sex for 2008. This allowed inference of baseline populations for each age group. The numbers of notifications by age group and the inferred population sizes were then summed for each age group to give the rates for the age groups used here.
∥The hospital admission rates are based on all admissions (678248; numbers per age group were 83699, 388113, 206436) over five years (from January 1, 2001 to December 1, 2005), from six hospitals in the South West Sydney Area Health Service. The principal diagnosis at discharge, coded by ICD10 [World Health Organization, 2008], was used to classify each admission. The ICD10 codes used were A08.0 (Rotaviral enteritis), A04.5 (Campylobacter enteritis), A07.2 (Cryptosporidiosis), while “Intestinal infectious diseases” are given by the ICD10 chapter codes of
Moreover, as indicated in the introduction, this is a generic model: it has not been built to
specifically describe outbreaks of Cryptosporidium related diarrhoea. (Had that been so, the frequency of
prolonged/heavy wet weather events might have formed part of the network [Signor, 2007].) Again, had it been intended
to describe rotavirus infections, a seasonal component would have been necessary, to account for the
seasonality in outbreaks, and hence seasonality in sewage, and potentially, in the recycled water.
Finally, it has not been formulated for a particular site. A site-specific model would have had condi-
tional probabilities tailored to the site, and some aspects of a site-specific external validation might then
have been possible.
Internal validation of the model occurs in the discussion, where we consider the implications of
the structure. Thus, we consider the sensitivity of the model to changes in inputs. We also discuss
how the structure of the model implies certain assumptions and suggest potential structural changes to
accommodate the situation when these assumptions may not be satisfied.
3.5 Results
3.5.1 Constructed BN
The final agreed BN is displayed in Figure 3.2. It contains 14 nodes: 7 binary terminal parent nodes,
1 ternary parent node, and 6 other nodes containing a further 30 binary probabilities as described in
Section 3.4. It consists of two subnetworks: one which describes various influences and processes on
the wastewater to the point of distribution (‘Endpoint distribution’) and another which describes various
aspects of the water and their influences on the population health outcome ‘Gastroenteritis’. This struc-
ture of the network, together with the probability settings (which are generally close to one or zero), is
important when we consider the sensitivity of inference to the various parent nodes; see Section 3.2.
∥(continued) A00, A01, A02, A03, A04, A05, A06, A07, A08 and A09. The databases used were the NSW Department of Health HIE (“Health Information Exchange”) databases. The admission (disease count) excluded admissions within 3 days of an earlier admission.
Table 3.1 shows the values of the parameters (α, β) corresponding to the beta priors, calculated from,
and representing, the various elicited probabilities and ranges. The expert’s ranges are not shown but may
be back-calculated using moments and the method of Spiegelhalter et al. [1994]. Thus, Node 1 (“Primary
Source Water”) has (α, β) values as (6.751, 668.37). These correspond to the choice of a probability of .01
of meeting an acceptable standard and a 95% range for that probability as .005 to .02. The probabilities
(p) of Table 3.1 are used for the nodes in the BN (Model 1), and the values of (α, β) show settings for
the beta priors used in Model 2, which were chosen to achieve mean marginal probabilities equivalent to
those obtained under Model 1.
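The link between an elicited probability with its 95% range and the Beta parameters can be illustrated by moment matching. The sketch below is an assumption about the calculation rather than a confirmed statement of the method of Spiegelhalter et al. [1994]: it treats the 95% range as spanning ±1.96 standard deviations and matches the Beta mean and variance. Under that assumption it reproduces the Node 1 values quoted above to the precision shown. The function name is hypothetical.

```python
def beta_from_mean_and_range(mean, lower, upper, z=1.96):
    """Moment-match a Beta(alpha, beta) to a mean and an approximate 95% range.

    Assumes the range (lower, upper) spans z standard deviations either side,
    so sd = (upper - lower) / (2 z).
    """
    sd = (upper - lower) / (2.0 * z)
    # For a Beta distribution, variance = mean (1 - mean) / (alpha + beta + 1).
    total = mean * (1.0 - mean) / sd ** 2 - 1.0
    if total <= 0.0:
        raise ValueError("range is too wide for a Beta distribution with this mean")
    return mean * total, (1.0 - mean) * total


# Node 1 ("Primary Source Water"): probability .01 with 95% range .005 to .02.
alpha, beta = beta_from_mean_and_range(0.01, 0.005, 0.02)
```

Because alpha / (alpha + beta) equals the elicited probability by construction, the prior mean under Model 2 agrees with the point setting used in Model 1.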
In the following sections we discuss the outcomes of the BN depicted in Figure 3.2 and quantified
using the probabilities of Table 3.1. In Section 3.5.2, we examine the network ignoring the uncertainty of
the probability estimates (Model 1) and then, in Section 3.5.3, assess the impact of adding this uncertainty
(Model 2).
3.5.2 Model 1: Analysis of the BN without uncertainty
Table 3.3 shows the marginal probabilities obtained under Model 1 for all nodes in the model, both the
simple and complex. (Note that the marginal probabilities for the simple nodes are those given by the
settings.) In this illustrative example, the probability of gastroenteritis over the entire population is .0208;
the probability of a cumulative dose that would be classified as “acceptable” is .9732; the probability that
the endpoint distribution meets defined ‘acceptable’ conditions is .9579; and the probability that the
pathogen load is acceptable is .9931.
One of the strengths of a BN is the ease with which a user may vary conditions and find conditional
probabilities relating to scenarios of interest. We define a baseline risk as the risk when the system is
running at its safest level, that is when the cumulative dose is acceptable. Under these conditions, the
probability of gastroenteritis is .0151 (over all age groups). The baseline risks are used as the denomi-
nator in estimating the relative risks. Thus, compared with this baseline, the overall relative risk (RR) of
gastroenteritis obtained from the BN is .0208/.0151 or 1.38. In contrast, using the same baseline risk, if
the endpoint distribution fails, the gastroenteritis probability becomes .0214 (RR of 1.42). If the distri-
bution endpoint fails and unplanned/planned usage is unplanned, the gastroenteritis probability becomes
.0236, i.e., a RR of 1.56.
For children under five years of age, the BN indicates that the gastroenteritis probability would be
.0594, given the expert-based probabilities described in section 3.1. Using again a baseline risk corre-
sponding to the safest operation of the system (an acceptable cumulative dose), for which the probability
is .0500, this translates to a RR of 1.19. In contrast, if the endpoint distribution fails, the gastroenteritis
rate for children under five years old becomes .0605, giving a RR of 1.21, and if, in addition, the usage
is unplanned, the rate for these children becomes .0641, a RR of 1.28. For those aged between 5 and 64
years, these relative risks are 1.51, 1.57 and 1.77 respectively, and for older adults (65+), they become
1.24, 1.27 and 1.36, respectively.
When the cumulative dose is acceptable, the probability of gastroenteritis in this hypothetical model
is an age-weighted average of .05, .01 and .03. Thus, even with perfect inputs from the parent nodes,
the final probability for gastroenteritis must lie between .01 and .05. Moreover under this model, the
marginal probabilities for gastroenteritis are .0594 for children under five years of age, .0151 for those
between 5 and 64, and .0372 for those of 65 and over, which indicates little change from the rates with
perfect drinking water, thereby indicating little impact from recycled water.
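The age-weighted averages and relative risks quoted in this section can be checked with a few lines of Python, using the settings of Table 3.1 and the marginal probability of an acceptable cumulative dose from Table 3.3. The sketch assumes that cumulative dose is independent of age (Age is a parent of 'Gastroenteritis' only), and the variable names are ours. Results agree with the text up to rounding of the inputs.

```python
# Settings from Table 3.1 (age shares; P(gastroenteritis | dose, age)).
age_share = {"0-4": 0.0671, "5-64": 0.8096, "65+": 0.1233}
p_ill_dose_ok = {"0-4": 0.05, "5-64": 0.01, "65+": 0.03}
p_ill_dose_not_ok = {"0-4": 0.40, "5-64": 0.20, "65+": 0.30}

# Marginal probability of an acceptable cumulative dose (Table 3.3).
p_dose_ok = 0.9732

# Baseline: age-weighted average when the cumulative dose is acceptable.
baseline = sum(age_share[a] * p_ill_dose_ok[a] for a in age_share)

# Marginal probability by age group, by the law of total probability,
# assuming the cumulative dose does not depend on age.
by_age = {
    a: p_dose_ok * p_ill_dose_ok[a] + (1.0 - p_dose_ok) * p_ill_dose_not_ok[a]
    for a in age_share
}
overall = sum(age_share[a] * by_age[a] for a in age_share)

# Relative risks against the acceptable-dose baseline.
rr_overall = overall / baseline
rr_by_age = {a: by_age[a] / p_ill_dose_ok[a] for a in age_share}
```

The calculation makes explicit why the marginal probabilities stay close to the baseline: the unacceptable-dose rows carry a weight of only 1 − .9732.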
3.5.3 Model 2: Analysis of the BN with uncertainty
Analysis of Model 2 was undertaken in Winbugs as described in section 2. Using a population size of
50000 and 12000 iterations, the probabilities and relative risks of the BN (Model 1) were recovered and
95% credible intervals were found for all estimates of interest. Marginal probability point estimates ob-
tained under Model 1 and Model 2 are given in Table 3.3, together with corresponding credible intervals.
(For the ‘simple’ nodes, Table 3.3 shows the credible interval and thus the effective 95% range introduced
by the Beta prior.) Table 3.4 shows the probabilities and relative risks of gastroenteritis, for the population
as a whole, for various agegroups, and under various conditions, together with the 95% credible intervals.
(Figure 3.3 shows the relative risks for all agegroups and scenarios.) The point estimates obtained under
Model 1 are also given in Table 3.4.
The relative risks considered in Table 3.4 and Figure 3.3 have all been defined to have higher risk
than the baseline (which occurs when the cumulative dose is acceptable) and as such should be greater
than 1. When a condition subsets a small subgroup, the consequent uncertainty of estimation may re-
sult in a credible interval that spans 1. Thus, for example, when the endpoint distribution fails and
planned/unplanned usage is unplanned, the credible interval for the relative risk of gastroenteritis is (.3,
3.1). Similarly, for all age subgroups, when both endpoint distribution fails and planned/unplanned usage
is unplanned, the 95% credible intervals for the RRs are (0, 4.5), (0, 4.1) and (0, 4.5) respectively. Other
relative risks, based on larger subsets, have smaller credible intervals that exclude 1; for example, the RR
across the whole population is 1.38 with 95% CI (1.3, 1.4) and that for children under five years of age
is 1.19 with 95% CI (1.1, 1.3).
From Table 3.4, we see that under an acceptable cumulative dose, the probabilities of infection for
each age group are .05, .01, .03, respectively. If we interpret these probabilities as the probability of a
person being infected per year, then these translate to rates of infection of 5000 (4300, 5800), 1000
(900, 1100), and 3000 (2600, 3400) cases per 100000 persons per year, respectively, which increase under the
model to 5940 (5100, 6800), 1510 (1400, 1600), and 3720 (3300, 4200) per 100000 per year.
3.6 Discussion
3.6.1 Framework
It should be noted that we have used the MCMC capacity of Winbugs in Model 2, although any Monte
Carlo framework would have been adequate. However, we wished to draw attention to the fact that
both Bayesian nets and Gibbs sampling take place in a directed acyclic graph (DAG) framework, with
simplifying Markovian properties, and to emphasize the common framework. We also chose to use this
framework, because it is simple, explicit and transparent, something which cannot be said of spreadsheet
frameworks, where considerable detective work is required to determine both what has been done and in
what sequence. The Winbugs software has now been freely available for many years and is robust and
well-supported by its many users, so it seemed sensible to demonstrate its use as another, transparent,
option in the armoury of risk assessment.
3.6.2 Internal validation
Inspection of the number of links between nodes in the BN and the final outcome of ‘Gastroenteritis’,
indicates that this network is deep, with ‘Primary Source Water’ being six links from ‘Gastroenteritis’.
With most probabilities near to the extremes of zero and one, even a node only three links upstream
of a later node has no substantive impact on that node. Thus, most of the network
describing the catchment up to the endpoint distribution has little influence on the findings with respect
to ‘Gastroenteritis’, while within the subnetwork leading to endpoint distribution, neither the ‘Other
Source Water’ nor the ‘Primary Source Water’ have much impact. This observation is supported by
the mutual information results of Table 3.2 which confirmed that the network comprises two strongly
distinct components, representing the water distribution subnetwork and the health outcome subnetwork.
Moreover, these results showed that other factors in the health outcome subnetwork largely mitigate the
impact of the endpoint distribution node.
A property of the structure of the network is that it essentially consists of two sub-networks linked
by only one node-to-node link from the endpoint distribution node to the pathogen load node. The first
subnet describes the water distribution, while the second describes the health effects. This impacts on the
sensitivity of the network to changes in inputs. The structure, combined with the depth of the network
and the more extreme probabilities associated with each node, results in the second half of the network
(the part which follows on from the endpoint distribution) being largely insensitive to any findings in the
earlier subnetwork, just as the distribution subnetwork shows little sensitivity to any node further than
two nodes away. A consequence is that there is little need to include additional complexity in the model
through, for example, feed-back loops since the nodes of such loops would be even further away and
even less likely to exert an influence.
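The loss of sensitivity with depth can be illustrated on a simple chain of binary nodes. This is a sketch under simplifying assumptions: each node has a single parent (the nodes of the BN have two or three), and the conditional probabilities are hypothetical, chosen to resemble the .990/.900 pairs in Table 3.1. In such a chain, the effect of the first node on the last is the product of the per-link differences.

```python
def propagate(p_start, links):
    """Probability that the last node of a chain is 'on'.

    p_start is P(first node on); each link is a pair
    (P(child on | parent on), P(child on | parent off)).
    """
    p = p_start
    for p_if_on, p_if_off in links:
        p = p * p_if_on + (1.0 - p) * p_if_off
    return p


# Hypothetical chain of three links, each with probabilities .99 and .90.
links = [(0.99, 0.90)] * 3

# Change in the probability at depth k when the first node is switched
# from 'off' to 'on'.
effect_by_depth = [
    propagate(1.0, links[:k]) - propagate(0.0, links[:k]) for k in range(1, 4)
]
```

With a difference of .09 per link, the effect falls to .0081 at depth two and below .001 at depth three, which is consistent with the near-zero mutual information values for distant nodes in Table 3.2.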
Within the health outcome subnet, the unplanned/planned use node clearly envisages water of a po-
tentially less than potable quality which may carry an unacceptable pathogen concentration. The structure
of this subnet presupposes that unplanned usage has the same probabilities for all segments of the population.
However, in light of the various proposed reuses, unplanned usage by joggers and swimmers,
for example, is more likely to be age-related than the consumption of contaminated foodstuffs, which
may apply more broadly across the population. This problem could be addressed by allowing the Age
node to also be a parent to the ‘Planned/Unplanned usage’ node. Furthermore, the population-wide inter-
pretation of the probabilities in the BN does not explicitly acknowledge that for many enteric diseases,
while the index case may have been exposed via water, subsequent cases may be largely due to person-to-
person contact. This could also be accommodated by a structural change to the network or a redefinition
of nodes.
Other structural models could also be considered. For example, Pike [2004] describes a BN which
shows the possibility of all treatment processing nodes being bypassed and which is able to be verified
by monitoring data. This BN has a flatter, less vertical structure than that of the BN used here and also
includes nodes with more than one child-node, with the advantage that nodes with low/high probabilities
are allowed greater possibility of being influential than in the BN used here. Pike’s network also
represents the possibility of complete failure to process (for example when a plant goes to bypass). In
any system where the whole of the wastewater plant output flows to a river and is reused downstream,
one should take into account the frequency of the sewage treatment plant going to bypass. However,
it is acknowledged that this is not a problem imposed by recycling, but rather a problem of wastewater
treatment.
3.6.3 Discussion of the results
The aim of this paper was to illustrate how the risks of gastroenteritis posed by the use of recycled
water could be represented using a Bayesian network. The network approach provides a full description
of relevant nodes, levels, probabilities and ranges of uncertainty for these probabilities, and allows us
to determine factors and links having most influence in the model. This summary is not intended to
provide a commentary on a particular recycled water system, but rather to indicate how risk may be
conceptualised in a network. In this context, we have used an expert’s opinion to populate the nodes, but
networks may also be built based upon group opinion or may use inputs drawn from more quantitative
sources, as in a QMRA.
Comparing the two models, it was apparent that the inclusion of the beta priors added substantial
uncertainty, and hence considerable variation, to the predictions.
For example, the relative risk for a child of age less than five, when the endpoint distribution
fails is 1.21, but the 95% credible interval for the RR is wide (.5, 2.0), with a corresponding probability
point estimate of .0605 and 95% credible interval of (.024, .102). Public health implications and consequent
decisions may vary considerably on inspection of the range and bounds of such intervals. This
addition of credible intervals on point estimates of outcome probabilities and relative risks is arguably a
valuable addition to the analytical and inferential results of the BN.
The network approach allowed identification of the nodes that contributed most to the outcome of
gastroenteritis. These were cumulative dose, age, exposure period and pathogen uptake. In summary,
based on the conceptual model and expert-based probabilities, the BN revealed an overall risk for
gastroenteritis of 1.38 relative to that for an acceptable cumulative dose, with a 95% credible interval (1.3,
1.4) based on a population of 50000. The relative risks varied over age cohorts and, as expected, had
point estimates greater than one under adverse scenarios. For example, with a failure at the endpoint
distribution, the relative risk became 1.42 (1.0, 1.8).
With large populations, small percentage changes may represent many people with an increased
chance of contracting diarrhoea as a consequence of exposure to recycled water. Thus, .0208, the
probability of gastroenteritis under the default values, represents some 1040 cases in a population of 50000, with a 95% credible
interval of (950, 1100) cases. The addition of uncertainty to the predictions is a useful addition to the
potential inferences of the Bayesian net.
A difficulty for the suggested methodology for calculation of credible intervals occurs when the
condition implies a relatively small subset of the population. This may be overcome in two ways: by
considering just the subset and its links under the given condition, or by sampling from a sufficiently
large base population which allows the sampling to occur without null subsets. (Note that for a BN this
is never a problem: all conditional probabilities relating to the model may be found, regardless of the
closeness of a probability to zero or one, or of the depth of the network.)
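The size of the difficulty can be gauged with a rough calculation for the smallest subset considered in Table 3.4: children aged 0-4 when the endpoint distribution fails and usage is unplanned. The sketch below uses the point probabilities of Tables 3.1, 3.3 and 3.4 and ignores the Beta uncertainty. It also assumes the three conditions are independent, which is our reading of the network (the parents of 'Endpoint Distribution' are OPPS and Storage, while 'Planned/Unplanned use' and Age have no parents). The variable names are ours.

```python
n_people = 50000

p_endpoint_fails = 1.0 - 0.9579   # Table 3.3: P(Endpoint Distribution meets) = .9579
p_unplanned = 1.0 - 0.90          # Table 3.1: P(Planned use) = .90
p_age_0_4 = 0.0671                # Table 3.1: share of the population aged 0-4

# Expected number of simulated people satisfying all three conditions.
expected_subset = n_people * p_endpoint_fails * p_unplanned * p_age_0_4

# Expected gastroenteritis cases in that subset (Table 3.4: .0641).
p_ill_in_subset = 0.0641
expected_cases = expected_subset * p_ill_in_subset

# Approximate chance that a subset of this size contains no case at all.
p_no_cases = (1.0 - p_ill_in_subset) ** round(expected_subset)
```

Even with a population of 50000, the subset holds only about 14 people and often contains no case, which is consistent with the lower limits of 0 shown for these scenarios in Table 3.4.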
The BN described in this paper was developed to reflect a conceptual model for health risks asso-
ciated with recycled water proposed by two experts in the field (Cook/Roser). This paper is intended to
contribute to the available tools for assessing this important environmental health issue, and in particular
to contribute a methodology for quantifying the uncertainty of point estimates arising from BNs.
3.7 Tables
Table 3.1 Settings

Node  Name   Value        p
14    Age¹   0-4 years    0.0671
             5-64 years   0.8096
             65+ years    0.1233

Node  Name  Description                     Value    p     α        β
1     PSW   Primary Source Water            Meets²   0.01  6.751    668.37
5     OSW   Other Source Water              Meets    0.01  0.599    59.25
6     Rep   Reprocessing                    Meets    0.99  59.2524  0.59851
7     OPPS  Other planned/unplanned supply  Meets    0.80  48.372   12.093
8     Puse  Planned/Unplanned use           Planned  0.90  30.217   3.35744
10    EP    Exposure period                 Short    0.90  30.217   3.35744
12    PU    Pathogen uptake                 Low      0.50  47.52    47.52

Node 2: Primary Treatment (Meets) (PT)
PSW  OSW  Row  p     α        β
0    0    1    .960  55.687   2.32
0    1    2    .980  117.083  2.39
1    0    3    .990  375.525  3.79
1    1    4    .998  5.135    0.01

Node 3: Storage (Meets)
Rep  PT   Row  p     α        β
0    0    1    .800  48.372   12.09
0    1    2    .990  59.252   0.6
1    0    3    .900  30.217   3.36
1    1    4    .990  239.98   2.42

Node 4: Endpoint Distribution (Meets) (ED)
OPPS  Storage  Row  p     α        β
0     0        1    .700  8.335    3.57
0     1        2    .800  21.054   5.26
1     0        3    .900  30.217   3.36
1     1        4    .999  152.358  0.15

Node 9: Pathogen Load (Low) (PL)
ED   Puse  Row  p     α        β
0    0     1    .700  8.335    3.57
0    1     2    .950  68.391   3.6
1    0     3    .970  172.529  5.34
1    1     4    .999  152.358  0.15

Node 11: Cumulative Dose (Accept³) (CD)
PU   EP   PL   Row  p     α        β
0    0    0    1    .700  8.335    3.57
0    0    1    2    .800  7.068    1.77
0    1    0    3    .900  30.217   3.36
0    1    1    4    .970  172.529  5.34
1    0    0    5    .930  92.103   6.93
1    0    1    6    .950  68.391   3.6
1    1    0    7    .980  117.083  2.39
1    1    1    8    .999  152.358  0.15

Node 13: Gastroenteritis (Yes)
CD   Age  Row  p     α        β
0    1    1    .400  8.82     13.23
0    2    2    .200  12.093   48.37
0    3    3    .300  10.456   24.4
1    1    4    .050  3.6      68.39
1    2    5    .010  3.739    375.53
1    3    6    .030  5.33595  172.529

1. Based on the Australian Population census 2000. p values common to both Model 1 & 2.
2. Meets an acceptable standard.
3. An acceptable dose.
p gives the settings for Model 1.
α, β give the settings for the Beta prior for the corresponding node in Model 2.
Table 3.2 Sensitivity of two nodes: Gastroenteritis & Endpoint Distribution

Node  Name                            Mutual       Variance    Distance from
                                      Information  of Beliefs  Relevant Node

Gastroenteritis (to the findings at each node)
13    Gastroenteritis                 .14584       .0203551    0
11    Cumulative Dose                 .01498       .0011554    1
14    Age                             .00436       .0001595    1
10    Exposure Period                 .00137       .0000480    2
12    Pathogen Uptake                 .00068       .0000191    2
9     Pathogen Load                   .00002       .0000006    2
4     Endpoint Distribution           .00000       .0000000    3
8     Planned/Unplanned Use           .00000       .0000000    3
7     Other planned/unplanned supply  .00000       .0000000    4
1     Primary Source Water            .00000       .0000000    6
5     Other Source Water              .00000       .0000000    6
2     Primary Treatment               .00000       .0000000    5
6     Reprocessing                    .00000       .0000000    5
3     Storage                         .00000       .0000000    4

Endpoint Distribution (to the findings at earlier nodes; half model)
4     Endpoint Distribution           .25206       .0403721    0
7     Other planned/unplanned supply  .08803       .0063370    1
3     Storage                         .00151       .0001320    1
2     Primary Treatment               .00005       .0000031    2
6     Reprocessing                    .00000       .0000000    2
5     Other Source Water              .00000       .0000000    3
1     Primary Source Water            .00000       .0000000    3
Table 3.3 Model Comparisons: Marginal probabilities & 95% credible intervals

                                                  Probability
Node  Name                            Value       Model 1  Model 2

Complex Nodes
2     Primary Treatment               Meets       .9605    .9605 (.959, .962)
3     Storage                         Meets       .9864    .9864 (.985, .987)
4     Endpoint Distribution           Meets       .9579    .9579 (.956, .960)
9     Pathogen Load                   Acceptable  .9931    .9931 (.992, .994)
11    Cumulative Dose                 Acceptable  .9732    .9732 (.972, .975)
13    Gastroenteritis                 Yes         .0208    .0207 (.019, .022)

Simple Nodes
1     Primary Source Water            Meets       .01      .01 (.004, .019)
5     Other Source Water              Meets       .01      .01 (.000, .046)
6     Reprocessing                    Meets       .99      .99 (.954, 1.000)
7     Other Planned/Unplanned Supply  Acceptable  .80      .80 (.692, .889)
8     Planned/Unplanned Use           Planned     .90      .90 (.778, .974)
10    Exposure period                 Short       .90      .90 (.780, .976)
12    Pathogen uptake                 Low         .50      .50 (.399, .600)

These probabilities are given to at most 3 significant figures since .9605 = 1 − .0395
Table 3.4 Model comparisons for Gastroenteritis under various conditions

p(Gastroenteritis)
Condition                                Age group  Model 1  Model 2
(No conditions)                          All        .0208    .0207 (.019, .022)
                                         0-4 yrs    .0594    .0594 (.051, .068)
                                         5-64 yrs   .0151    .0150 (.014, .016)
                                         65+ yrs    .0372    .0372 (.033, .042)
Cumulative Dose Acceptable               All        .0151    .0151 (.014, .016)
                                         0-4 yrs    .0500    .0501 (.043, .058)
                                         5-64 yrs   .0100    .0099 (.009, .011)
                                         65+ yrs    .0300    .0300 (.026, .034)
Endpoint Distribution Fails              All        .0214    .0214 (.015, .028)
                                         0-4 yrs    .0605    .0605 (.024, .102)
                                         5-64 yrs   .0157    .0156 (.010, .022)
                                         65+ yrs    .0381    .0381 (.016, .063)
Endpoint Distribution Fails              All        .0236    .0236 (.005, .046)
& Planned/Unplanned Use is unplanned     0-4 yrs    .0641    .0639 (0, .222)
                                         5-64 yrs   .0177    .0175 (0, .040)
                                         65+ yrs    .0409    .0410 (0, .130)

RR(Gastroenteritis)
Condition                                Age group  Model 1  Model 2
(No conditions)                          All        1.38     1.38 (1.3, 1.4)
                                         0-4 yrs    1.19     1.19 (1.1, 1.3)
                                         5-64 yrs   1.51     1.52 (1.4, 1.6)
                                         65+ yrs    1.24     1.24 (1.2, 1.3)
Endpoint Distribution Fails              All        1.42     1.42 (1.0, 1.8)
                                         0-4 yrs    1.21     1.21 (.5, 2.0)
                                         5-64 yrs   1.57     1.58 (1.0, 2.2)
                                         65+ yrs    1.27     1.27 (.6, 2.1)
Endpoint Distribution Fails              All        1.56     1.57 (.3, 3.1)
& Planned/Unplanned Use is unplanned     0-4 yrs    1.28     1.28 (0, 4.5)
                                         5-64 yrs   1.77     1.78 (0, 4.1)
                                         65+ yrs    1.36     1.37 (0, 4.5)
3.8 Figures
Figure 3.1 Conceptual model of Cook and Roser1
[Figure: flow diagram of six linked components, including Component 1: Recycled water processing and distribution pathways; Component 2: Exposure pathways and populations (actual and potential); Component 3: Cumulative end-user dose; Component 6: Health endpoints.]
Figure 3.2 Bayesian network based on the conceptual model: Node numbering is that used in the text and in the WinBUGS model
Figure 3.3 Relative risks for each age group (0-4, 5-64, 65+) and for the entire population (All) for each risk scenario, estimated from the BN (Model 1) and by MCMC (Model 2), with 95% credible intervals from Model 2.
Supporting information
The WinBUGS code for the model is available from the first author.
Acknowledgments
The models described in this paper have been developed as part of the research project “Assessing the
Public Health Impacts of Recycled Water Use” that received funding from the Western Australian Gov-
ernment through the Department of Water’s Water Fund. The project is led by the University of Western
Australia and partnered by Queensland University of Technology and the Western Australian Department
of Health.
Dr Ken Hillman and the Simpson Centre for Health Services Research, Liverpool Hospital, NSW
supplied the HIE data.
Bibliography
Anderson, J. M. (2007). AWA water recycling forum position paper: Water recycling to meet our water
needs. In S. J. Khan, R. M. Stuetz, and J. M. Anderson (Eds.), Water Reuse and Recycling 2007.
Sydney: UNSW Publishing & Printing Services.
Anon. (2008). Accessed, June, 2008: http://www.nationmaster.com/country/as-australia/.
Arrowood, M. J., P. J. Lammie, J. W. Priest, D. G. Addiss, M. R. Hurd, W. R. MacKenzie, A. C. McDon-
ald, M. S. Gradus, G. Linke, and E. Zembrowski (2001). Cryptosporidium parvum-specific antibody
responses among children residing in Milwaukee during the 1993 waterborne outbreak. Journal of
Infectious Diseases 183(9), 1373–1378.
Asano, T. (1998). Wastewater reclamation and reuse. Water Quality Management Library ; V. 10.
Lancaster, Pa.: Technomic Pub.
Ashbolt, N. J., S. R. Petterson, T.-A. Stenstrom, C. Schonning, T. Westrell, and J. Ottoson (2005). Mi-
crobial Risk Assessment (MRA) tool. Technical Report Report 2005:7, Chalmers University of Tech-
nology.
Brookhart, M. A., A. E. Hubbard, M. J. van der Laan, J. M. Colford Jr, and J. N. S. Eisenberg (2002).
Statistical estimation of parameters in a disease transmission model: analysis of a Cryptosporidium
outbreak. Statistics in Medicine 21, 3627–3638.
Cowell, R. G., A. P. Dawid, S. L. Lauritzen, and D. J. Spiegelhalter (2001). Probabilistic Networks and
Expert Systems. Springer.
Eisenberg, J., E. Seto, A. Olivieri, and R. Spear (1996). Quantifying water pathogen risk in an epidemi-
ological framework. Risk Analysis 16, 549–563.
Eisenberg, J. N. S., E. Y. W. Seto, J. M. Colford Jr, A. Olivieri, and R. C. Spear (1998). An analysis
of the Milwaukee cryptosporidiosis outbreak based on a dynamic model of the infection process.
Epidemiology 9(3), 255–263.
Haas, C. and J. N. Eisenberg (2001). Risk assessment. In L. Fewtrell and J. Bartram (Eds.), Water
Quality: Guidelines, Standards and Health. WHO.
Haas, C. N., J. B. Rose, and C. P. Gerba (1999). Quantitative Microbial Risk Assessment. New York:
Wiley.
Hamilton, G. S., F. Fielding, A. W. Chiffings, B. T. Hart, R. W. Johnstone, and K. L. Mengersen (2007).
Investigating the use of a Bayesian network to model the risk of Lyngbya majuscula bloom initiation
in Deception Bay, Queensland. Human and Ecological Risk Assessment 13(6), 1271–1287.
Hugin Expert A/S (2007). Hugin 6.9. Available on: www.hugin.com. Accessed: November 6, 2008.
Korb, K. B. and A. E. Nicholson (2004). Bayesian Artificial Intelligence. London: CRC Press.
Lunn, D. J., A. Thomas, N. Best, and D. Spiegelhalter (2000). WinBUGS - A Bayesian modelling
framework: Concepts, structure, and extensibility. Statistics and Computing 10(4), 325–337.
MacKenzie, W., N. J. Hoxie, M. E. Proctor, M. S. Gradus, K. A. Blair, D. E. Peterson, J. J. Kazmierczak,
D. G. Addiss, K. R. Fox, J. B. Rose, and J. P. Davis (1994). A massive outbreak in Milwaukee
of cryptosporidium infection transmitted through the public water supply. New England Journal of
Medicine 331(3), 161–167.
MacKenzie, W. R., W. L. Schell, K. A. Blair, D. G. Addiss, D. E. Peterson, N. J. Hoxie, J. J. Kazmierczak,
and J. P. Davis (1995). A massive outbreak of waterborne cryptosporidium infection in Milwaukee,
Wisconsin: Recurrence of illness and risk of secondary transmission. Clinical Infectious Diseases 21,
57–62.
Nadebaum, P., M. Chapman, R. Morden, and S. Rizak (2004). A guide to hazard identification & risk
assessment for drinking water supplies. Technical report, CRC for Water Quality and Treatment.
National Notifiable Diseases Surveillance System (2008). National Notifiable Diseases Surveillance
System. Available on: http://www9.health.gov.au/cda/Source/CDA-index.cfm. Accessed:
April 9, 2008.
Natural Resource Management Ministerial Council, Environment Protection and Heritage Coun-
cil, and Australian Health Ministers Conference (2006). Australian Guidelines for Wa-
ter Recycling: Managing health and environmental risks (Phase1) 2006. Available on:
www.ephc.gov.au/taxonomy/term/39. Accessed: March 29, 2008.
Nicholson, A., S. Watson, and C. Twardy (2003). Using Bayesian networks for water quality prediction
in Sydney Harbour. Available online:www.csse.monash.edu.au/bai/talks/NSWDEC.ppt. Ac-
cessed: March 27,2008.
Norsys Software Corp. (2007). Netica 3.25. Available online: www.norsys.com. Accessed February
15, 2008.
Pearl, J. (1988). Probabilistic reasoning in intelligent systems : networks of plausible inference. San
Mateo, California: Morgan Kaufmann Publishers.
Pike, W. A. (2004). Modeling drinking water quality violations with Bayesian networks. Journal of the
American Water Resources Association 40(6), 1563–1578.
Pollino, C. A. and B. T. Hart (2005a). Bayesian approaches can help make better sense of ecotoxicolog-
ical information in risk assessments. Australian Journal of Ecotoxicology 11, 57–58.
Pollino, C. A. and B. T. Hart (2005b). Bayesian decision networks - going beyond expert elicitation for
parameterisation and evaluation of ecological endpoints. In A. Voinov, A. Jakeman, and A. Rizzoli
(Eds.), Third Biennial Meeting: Summit on Environmental Modelling and Software, Burlington, USA.
Pollino, C. A., O. Woodberry, A. E. Nicholson, K. B. Korb, and B. T. Hart (2007). Parameterisation and
evaluation of a Bayesian network for use in an ecological risk assessment. Environmental Modelling
and Software 22, 1140–1152.
Roser, D., S. Petterson, R. Signor, and N. Ashbolt (2006). How to implement QMRA? to estimate
baseline and hazardous event risks with management end uses in mind. Technical report, MicroRisk
project co-funded by the European Commission under the Fifth Framework Programme, Theme 4:
Energy, environment and sustainable development (contract EVK1-CT-2002-00123).
Signor, R. S. (2007). Probabilistic Microbial Risk Assessment & Management Implications for Urban
Water Supply Systems. Ph. D. thesis, UNSW.
Snow, J. (1849). On the mode of communication of cholera. London: John Churchill.
Snow, J. (1855). On the mode of communication of cholera (2nd Edition ed.). London: John Churchill.
Spiegelhalter, D. (1989). A unified approach to imprecision and sensitivity of beliefs in expert systems. In
L.N.Kanal (Ed.), Uncertainty in Artificial Intelligence 3. North Holland: Elsevier Science Publishers
B.V.
Spiegelhalter, D. J., N. L. Harris, K. Bull, and R. C. G. Franklin (1994). Empirical-evaluation of prior
beliefs about frequencies - methodology and a case-study in congenital heart-disease. Journal of the
American Statistical Association 89(426), 435–443.
Varis, O. (1995). Belief networks for modelling and assessment of environmental change. Environ-
metrics 6, 439–444.
Varis, O. (1997). Bayesian decision analysis for environmental and resource management. Environmental
Modelling and Software 12(2-3), 177–185.
Varis, O. (1998). A belief network approach to optimization and parameter estimation: application to
resource and environmental management. Artificial Intelligence 101(1-2), 135–163.
World Health Organization (2008). ICD-10 Classification of Diseases. Available online:
www.cdc.gov/nchs/data/dvs/2008Vol1.pdf. Accessed: April 10, 2008.
3.9 Addendum
Reconsidering the Model 2 BN for diarrhoea, that is, the BN for which not only probabilities, but also
uncertainty has been elicited for the conditional probability tables (Figure 3.2), we see that
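\begin{equation}
\begin{aligned}
p(\text{Gastro}=1) ={}& p(\text{Gastro}=1 \mid \text{CD}=0,\ \text{Age}<5)\,p(\text{CD}=0)\,p(\text{Age}<5) \\
&+ p(\text{Gastro}=1 \mid \text{CD}=0,\ 5 \le \text{Age}<65)\,p(\text{CD}=0)\,p(5 \le \text{Age}<65) \\
&+ p(\text{Gastro}=1 \mid \text{CD}=0,\ \text{Age} \ge 65)\,p(\text{CD}=0)\,p(\text{Age} \ge 65) \\
&+ p(\text{Gastro}=1 \mid \text{CD}=1,\ \text{Age}<5)\,p(\text{CD}=1)\,p(\text{Age}<5) \\
&+ p(\text{Gastro}=1 \mid \text{CD}=1,\ 5 \le \text{Age}<65)\,p(\text{CD}=1)\,p(5 \le \text{Age}<65) \\
&+ p(\text{Gastro}=1 \mid \text{CD}=1,\ \text{Age} \ge 65)\,p(\text{CD}=1)\,p(\text{Age} \ge 65)
\end{aligned}
\tag{3.1}
\end{equation}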
Van Allen et al. [2001, 2008] demonstrate a method whereby uncertainty of any kind may be propagated
through a network. They show that a query response is asymptotically Gaussian and provide its
mean value and asymptotic variance. However, Figures 3.4–3.5 (for which the population was taken as
50,000 and which represent 6000 MCMC iterations) show clearly defined mixtures of the proportions
(1) getting gastroenteritis in any of the different age groups, and (2) getting gastroenteritis in
any of the different age groups when the endpoint distribution (ED) fails.
Within the BN the probabilities for the age groups are given as constants without error. Thus, the
distribution of p(Gastro = 1) is a mixture of six distributions. If we consider the first right-hand term of
Equation 3.1, the probability p(Gastro = 1|CD = 0, Age < 5) was set as a Beta(8.82, 13.23) distribution
(Table 3.1), which has a mean value of 0.4 with 95% of its distribution lying in the interval (0.21, 0.61).
The probability p(CD = 0) has a mean of 0.027 with 95% of its distribution lying in the interval (0.025,
0.028)∗∗, while p(Age < 5) equals 0.0671 (Table 3.1). In comparison, the fifth term, p(Gastro =
1|CD = 1, 5 ≤ Age < 65)p(CD = 1)p(5 ≤ Age < 65), uses p(Gastro = 1|CD = 1, 5 ≤ Age < 65),
which from Table 3.1 was set as a Beta(3.739, 375.53) with a mean of 0.01 and 95% of which lies in the
interval (0.003, 0.022); p(CD = 1), which from Table 3.3 has a mean of 0.973 with 95% of its distribution
in (0.972, 0.975); and p(5 ≤ Age < 65), which is a constant 0.8096 (Table 3.1).
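The Beta summaries quoted above can be checked numerically. The following is a minimal sketch in Python (not the WinBUGS code used in the thesis); the function name `beta_summary`, the number of draws and the seed are illustrative assumptions, and the interval is taken as the central 95% interval estimated by simulation.

```python
import random

def beta_summary(a, b, n_draws=100_000, seed=1):
    """Return (exact mean, simulated 2.5% quantile, simulated 97.5% quantile) of Beta(a, b).

    Hypothetical helper for illustration only; the quantiles are Monte Carlo
    estimates, so they vary slightly with n_draws and seed.
    """
    rng = random.Random(seed)
    draws = sorted(rng.betavariate(a, b) for _ in range(n_draws))
    lower = draws[int(0.025 * n_draws)]
    upper = draws[int(0.975 * n_draws)]
    return a / (a + b), lower, upper

# Elicited priors quoted in the text (Table 3.1)
under5_cd0 = beta_summary(8.82, 13.23)     # p(Gastro = 1 | CD = 0, Age < 5)
mid_cd1 = beta_summary(3.739, 375.53)      # p(Gastro = 1 | CD = 1, 5 <= Age < 65)

for label, (mean, lower, upper) in [("Age < 5, CD = 0", under5_cd0),
                                    ("5 <= Age < 65, CD = 1", mid_cd1)]:
    print(f"{label}: mean {mean:.3f}, central 95% interval ({lower:.3f}, {upper:.3f})")
```

The two priors differ by more than an order of magnitude in their means, which is what makes the components of the mixture in Equation 3.1 so clearly separated.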
∗∗Derived by subtraction from 1 of the results for Cumulative Dose (acceptable) in Table 3.3
In embedding the essential DAG within the population loop, no evidence is added to the nodes. The
structure of the BN remains unchanged, as do the Beta priors reflecting the elicited uncertainty. We
would argue that with the relatively extreme binomial probabilities of the conditional probability tables,
plus their (elicited) relatively narrow 95% bands in this BN (Table 3.1), the structure of almost discrete
probability mixtures at each node is preserved. Van Allen et al. [2008] note that their normal result is
asymptotic only, and show examples which seem well approximated by beta distributions, but do not
envisage distributions such as that shown in Figure 3.5. Our work shows that, for a finite population,
querying a Bayesian net does not necessarily yield a normal or a beta distribution (or a binomial
distribution), particularly when the evidence is overwhelmingly strong
in a particular direction. A further difference between our work and that of Van Allen et al. [2008]
is that they assume a Dirichlet distribution at each node of the conditional probability tables, and they
populate their nodes by simulation. The major difference between their work and ours possibly lies in the
extremely small populations in our BN for some of the queries, despite an overall population of 50,000,
which means that Van Allen’s asymptotics become irrelevant (as they, themselves, note when saying that
Beta distributions fit the error distributions of the queries better than the normal in some cases). As can
be seen from Table 3.5, an overall population of 50,000 gives rise to an expected count of 14 children
under the age of 5 when both the Endpoint distribution and Unplanned usage nodes fail. Van Allen et al.
[2008] argue the asymptotics for any query within a BN. However, our work shows that it might be
foolish to rely on asymptotics with finite populations when forming credible intervals for BNs.
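The effect of a very small subgroup can be illustrated with a short simulation. This is a sketch under stated assumptions, not the thesis model: the Beta(8.82, 13.23) prior is borrowed from Table 3.1 purely as an example of an uncertain conditional probability, the function name `simulated_proportions` is hypothetical, and the subgroup sizes 14 and 3405 are the expected counts for children under 5 reported in Table 3.5.

```python
import random

def simulated_proportions(n_people, a, b, n_sims, seed=2):
    """Simulate the observed proportion infected in a subgroup of n_people.

    Each simulation draws an infection probability from Beta(a, b) and then
    a Bernoulli outcome for every person in the subgroup (illustrative only).
    """
    rng = random.Random(seed)
    proportions = []
    for _ in range(n_sims):
        p = rng.betavariate(a, b)
        cases = sum(rng.random() < p for _ in range(n_people))
        proportions.append(cases / n_people)
    return proportions

small = simulated_proportions(14, 8.82, 13.23, n_sims=5000)    # expected count when both nodes fail
large = simulated_proportions(3405, 8.82, 13.23, n_sims=300)   # all children under 5

print("distinct proportions, n = 14:  ", len(set(small)))
print("distinct proportions, n = 3405:", len(set(large)))
```

With 14 people the proportion can take at most 15 distinct values (0/14, 1/14, ..., 14/14), so its distribution is necessarily lumpy and cannot be well approximated by a normal or beta density, whatever the asymptotic theory says about large populations.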
Table 3.5 Expected subgroup sizes for BN (Model 2), for a population of 50000

Condition                                                    E(n)
All populations
  Base Rate                                                  50000
  Cumulative dose acceptable                                 48660
  Endpoint distribution fails                                 2105
  EndpointDistribution fails & UnplannedUsage fails            211
Age <5
  Base Rate                                                   3405
  Cumulative dose acceptable                                  3265
  Endpoint distribution fails                                  141
  EndpointDistribution fails & UnplannedUsage fails             14
Figure 3.4 Distribution of the probability of being infected with gastroenteritis.
Figure 3.5 Distribution of the probability of being infected with gastroenteritis when the end-point distribution fails.
Bibliography
Van Allen, T., R. Greiner, and P. Hooper (2001). Bayesian error-bars for Belief Net inference. In
Proceedings of the Seventeenth Conference on Uncertainty in Artificial Intelligence (UAI-01), Seattle.
Citeseer.
Van Allen, T., A. Singh, R. Greiner, and P. Hooper (2008). Quantifying the uncertainty of a Belief Net
response: Bayesian error-bars for Belief Net inference. Artificial Intelligence 172, 483–513.
Statement of Contribution of Co-Authors for Thesis by Publication
The authors listed below certify that:
1. they meet the criteria for authorship, in that they have participated in the conception, execution, or interpretation, of at least that part of the publication in their field of expertise;
2. they take public responsibility for their part of the publication, except for the responsible author who accepts overall responsibility for the publication;
3. there are no other authors according to these criteria;
4. potential conflicts of interest have been disclosed to (a) granting bodies, (b) the editor or publisher of journals or other publications, and (c) the head of the responsible academic unit, and
5. they agree to the use of the publication in the student's thesis and its publication on the Australasian Digital Thesis database consistent with any limitations set by publisher requirements.
In the case of Chapter 4:
Title: Incorporating parameter uncertainty into Quantitative Microbial Risk Assessment
Journal: Journal of Water and Health
Status: Accepted for publication

Contributor                      Statement of Contribution
Margaret Donald                  Margaret Donald as first author was responsible for the concept of the paper, data analysis, interpretation and the writing of all drafts.
Dr Simon Toze                    Was responsible for advice on microbiological issues and editorial comment
Dr Jatinder Singh                Was responsible for editorial comment
Dr Angus Cook                    Was responsible for editorial comment
Professor Kerrie Mengersen       Was responsible for general advice and for editorial comment.

Principal Supervisor's Confirmation
I have sighted email or other correspondence from all co-authors confirming their certifying
authorship.
Name Signature Date
Chapter 4
Incorporating parameter uncertainty
into Quantitative Microbial Risk
Assessment (QMRA)
4.1 Preamble
This chapter has been written as a journal article, and addresses research objective (2). The purpose was
to build a graphical model based on the flow diagrams used in a typical Quantitative Microbial Risk As-
sessment (QMRA), and rather than using plug-in estimates allow the data behind those plug-in estimates
to contribute the appropriate uncertainties to the risk via a complex hierarchical model. Thus, the impor-
tant contribution is to not use plug-in estimates for the many parameters required, but to build a graphical
model to incorporate all the data used for estimation of such parameters. This means that we largely dis-
pense with assumptions about normality, and no longer ignore parameter estimates’ correlations, since
the estimation process incorporates these automatically into the risk assessment. Additionally, and im-
portantly, we include an errors-in-variables model within the graphical model to estimate the parameters
for the risk of infection equation based on McCullough and Eisele [1951]’s experiments. The software
used is WinBUGS [Lunn et al., 2000]. Hence, the methodology is easily accessible to any practitioner in
the field.
This paper does several things. It incorporates disparate primary data to estimate the parameters
required in a risk assessment within the risk assessment, thereby incorporating automatically the patterns
of parameter correlations which are not necessarily bivariate normal, together with all their uncertainty.
Haas et al. [1999] suggest dealing with this using a two-stage approach, with the parameter uncertainty
being captured by 2,000 bootstrap samples, or alternatively via rank correlations and a copula,
back-transforming to known marginal distributions [Haas, 1999]. Secondly, it uses an errors-in-variables
model for the infection model, which, despite all the analyses and reanalyses of these data (see, e.g., Haas
et al. [1999], Oscar [2004], Teunis et al. [1997]), has not been done before.
In our view, the method and the incorporation of an errors-in-variables model for the parameter
estimates for the risk of infection are important contributions to finding more realistic error estimates in
quantitative microbial risk assessments.
I am the principal author of the paper, which is reprinted here in its entirety, but with different bibliographic
conventions from those of the Journal of Water and Health, for which it has been accepted (May 2010). Jatinder
Singh provided the microbiological data and editorial comment. Simon Toze helped with interpretation
and understanding of the microbiological data and provided editorial comment. Kerrie Mengersen oversaw
and guided the exposition. Margaret Donald as first author was responsible for the concept of the
paper, data analysis, interpretation, writing all drafts and addressing reviewers’ comments.
Title: Incorporating parameter uncertainty into Quantitative Microbial Risk Assessment (QMRA)
Authors: Margaret Donald (a), Kerrie Mengersen (a), Simon Toze (b, c), Jatinder Singh (b), Angus Cook (d).
(a) School of Mathematical Sciences, Queensland University of Technology, GPO Box 2434, Brisbane, QLD 4001, Australia.
(b) CSIRO Land and Water, Queensland Biosciences Precinct, 306 Carmody Road, St Lucia, QLD 4067, Australia.
(c) School of Population Health, University of Queensland, Herston Road, Herston, QLD 4006, Australia.
(d) The University of Western Australia, 35 Stirling Highway, Crawley, WA 6009, Australia.
Bibliography
Haas, C. N. (1999). On modeling correlated random variables in risk assessment. Risk Analysis 19(6),
1205–1214.
Haas, C. N., J. B. Rose, and C. P. Gerba (1999). Quantitative Microbial Risk Assessment. New York:
Wiley.
Lunn, D. J., A. Thomas, N. Best, and D. Spiegelhalter (2000). WinBUGS - A Bayesian modelling
framework: Concepts, structure, and extensibility. Statistics and Computing 10(4), 325–337.
McCullough, N. B. and C. W. Eisele (1951). Experimental human salmonellosis: I. pathogenicity of
strains of Salmonella meleagridis and Salmonella anatum obtained from spray-dried whole egg. The
Journal of Infectious Diseases 88(3), 278–289.
Oscar, T. (2004). Dose-response model for 13 strains of Salmonella. Risk Analysis 24(1), 41–49.
Teunis, P. F. M., G. J. Medema, L. Kruidenier, and A. H. Havelaar (1997). Assessment of the risk
of infection by Cryptosporidium or Giardia in drinking water from a surface water source. Water
Research 31, 1333–1346.
4.2 Incorporating parameter uncertainty into Quantitative Microbial Risk Assessment (QMRA)
Abstract
Modern statistical models and computational methods can now incorporate uncertainty of the parameters
used in Quantitative Microbial Risk Assessments (QMRA). Many QMRAs use Monte Carlo methods,
but work from fixed estimates for means, variances, and other parameters. We illustrate the ease of
estimating all parameters contemporaneously with the risk assessment, incorporating all the parameter
uncertainty arising from the experiments from which these parameters are estimated. A Bayesian ap-
proach is adopted, using Markov Chain Monte Carlo Gibbs sampling (MCMC) via the freely available
software, WinBUGS.
The method and its ease of implementation are illustrated by a case study that involves incorporating
three disparate datasets into an MCMC framework. The probabilities of infection when the uncertainty
associated with parameter estimation is incorporated into a QMRA are shown to be considerably more
variable over various dose ranges than the analogous probabilities obtained when constants from the
literature are simply ‘plugged’ in as is done in most QMRAs. Neglecting these sources of uncertainty
may lead to erroneous decisions for public health and risk management.
Keywords
parameter uncertainty; MCMC; Quantitative Microbial Risk Assessment (QMRA); recycled water; risk
assessment; Salmonella spp.
4.3 Introduction
In Australia, Quantitative Microbial Risk Assessment (QMRA) is recommended as the method of choice
for assessing health risks from exposure to pathogens in recycled water, e.g., Natural Resource Manage-
ment Ministerial Council et al. [2006]. The particular application examined in this paper is the risk of
microbial infections associated with exposure to recycled water.
This paper presents a modification to the standard QMRA methodology, in which the risk assessor
typically finds various quantities of interest, such as dose-response, die-off and/or log-reduction parame-
ters, and plugs these quantities into the risk assessment model. There is often little acknowledgement of
the fact that these quantities are uncertain. We contrast this ‘plug-in’ approach with an approach based
on a Bayesian risk assessment model, in which all the data which have been used to produce the quanti-
ties of interest necessary to the risk assessment are included. The uncertainty associated with the model
parameters is therefore propagated throughout the analysis. This may be considered an extension of the
standard QMRA model.
To illustrate the approach, we consider the probability of a person becoming infected with Salmonella
spp. after being exposed to recycled wastewater. The scenario is not drawn from actuality but is designed
to illustrate the extension of the standard QMRA methodology. In the illustration, we ignore the problems
of dose estimation, and investigate the part of risk estimation for which we have data available.
The paper is arranged as follows. First a standard QMRA method is outlined, followed by a brief
description of the extended method, together with the datasets which will be used to illustrate it. The
conceptual and statistical models into which these data are incorporated are then detailed, and the results
are compared with those one would obtain without the incorporation of parameter uncertainty. This
case study demonstrates that considerable uncertainty is induced in the probability of infection when
this Bayesian approach is adopted. In the discussion, we elaborate the differences seen between the two
methods and note the simplicity of our method.
4.4 Methods
4.4.1 Standard QMRA methodology
A QMRA requires a knowledge of pathogen numbers at some stage of the treatment process, generally
in the influent. Estimates for log-reductions for various water treatment processes from pilot or other
studies are then needed to estimate pathogen numbers in the treated water. A mechanism of ingestion,
and an amount of the treated recycled water ingested must be postulated or found. This, together with the
pathogen numbers in the treated water, allows estimation of possible microbial doses. Finally, specifica-
tion of a dose-response curve for the microbe of interest is needed to allow estimation of the probability
of infection given a particular dose. A natural representation for a QMRA is via a graphical model such
as Figure 4.1.
The QMRA of Figure 4.1 shows the steps for assessing the risk associated with eating a crop irrigated
with recycled water. In such a figure, nodes without parents need information in order to run the risk
assessment. Thus, for a standard QMRA, reading down the figure and from left to right, we need
1. A description of the microbe numbers in either the waste water, or in the final treated water. Typi-
cally, if Salmonella spp. is sampled at all, it is sampled in the wastewater and may be described as
coming from a log normal distribution with mean, µ and possibly a standard deviation σ.
2. ‘Log reductions’ in order to estimate the microbial numbers in the treated water. Water treatments
are generally thought to reduce the numbers of pathogens at a rate proportional to the influent
numbers of the pathogen in the water. This may be expressed in terms of log base 10, when it may
be referred to as ‘log reduction’, or a decimal elimination capacity (DEC); see, e.g., Hijnen et al.
[2007, 2005, 2004]. However, the DEC is typically given by a single number, e.g. 3, which would
mean that $\log_{10} C_{\text{influent}} - \log_{10} C_{\text{effluent}} = 3$, where $C_{\text{influent}}$ is the number/L in the influent, and
$C_{\text{effluent}}$ the number/L in the effluent. Such a log reduction would imply that the effluent numbers
are one thousandth those of the influent. To find these values, published or grey literature involving the
particular treatment type for a particular plant is searched.
3. A die-off constant k or T90 time to 90% die-off. In the case study, where a field is irrigated with
recycled wastewater, it is expected that sunlight will kill particular microbes at a rate proportional
to their number, i.e., $dN/dt \propto N$, or $N_t = N_0 e^{-kt}$, where k is sometimes referred to as the die-off
constant. Other equations may be used, but this is a reasonably common approximation to die-off
for some organisms, and is a good fit for the data used in this case study. Sinton et al. [2007] use
a ‘shoulder’ equation∗, but as is common, the quantities in their various equations are given as
constants, with no error indicated.
4. Sunlight and shade hours, for the locality in which the recycled water is to be used.
5. A suitable amount of crop/water ingested by a person. One may use survey data if available, or
use choices made by other researchers, for example, Tanaka et al. [1998]. (Typically such data are
supplied as constants.)
6. An equation and the parameters which describe the dose-response, i.e., the probability of becoming
infected, having ingested a particular dose of the microbe. For Salmonella, the equation
usually used is Beta-Poisson, and from p 401 of Haas et al. [1999], the risk assessor would select
α = 0.3126 and $N_{50} = 2.36 \times 10^{4}$, to give the probability of infection, P, from a given dose D,
where D is the number of microbes ingested, as: $P = 1 - \left(1 + \frac{D}{N_{50}}\left(2^{1/\alpha} - 1\right)\right)^{-\alpha}$. In an alternative
parameterisation, we have $P = 1 - \left(1 + \frac{D}{\beta}\right)^{-\alpha}$, where $\beta = N_{50}/\left(2^{1/\alpha} - 1\right) \simeq 2884$, and $N_{50}$ is the
number of microbes giving a 50% probability of infection, so that $D = N_{50}$ gives P = 0.5.
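The dose-response step in item 6 can be sketched as a small function. This is an illustrative Python sketch, not code from the paper; the function name `beta_poisson` is an assumption, and β is derived inside the function from the requirement that a dose of N50 gives a 50% probability of infection.

```python
def beta_poisson(dose, alpha, n50):
    """Approximate beta-Poisson probability of infection for a given dose.

    Uses the N50 parameterisation: beta is chosen so that dose == n50
    gives a probability of exactly one half (illustrative sketch).
    """
    beta = n50 / (2.0 ** (1.0 / alpha) - 1.0)
    return 1.0 - (1.0 + dose / beta) ** (-alpha)

# Plug-in constants quoted in the text from Haas et al. [1999]
ALPHA = 0.3126
N50 = 2.36e4

for dose in (1.0, 1e2, N50, 1e6):
    print(f"dose {dose:>10.0f}: P(infection) = {beta_poisson(dose, ALPHA, N50):.4f}")
```

Evaluating the function at the median infectious dose is a quick consistency check on whichever parameterisation is used.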
Thus, to perform a risk assessment, the risk assessor performs a Monte-Carlo simulation, working
through the graphical model (Figure 4.1). Starting at some stage in the water processing cycle, an initial
number/L of the pathogen is drawn from the water treatment distribution described by constants, µ0, σ0.
This number is then reduced by either the value obtained by drawing a log-reduction value from the DEC
distribution, described by µ1, σ1, or, if no distribution is given or able to be inferred, then reduced by
the DEC, µ1, for the process or processes. In the scenario considered, sunlight is expected to reduce
pathogen numbers, so the die-off equation is used to give a final pathogen number in water which, then,
together with a draw for the quantity of water ingested, gives the number of pathogens ingested. Finally,
the probability of infection is calculated, via a dose-response equation, and a final draw made from a
Bernoulli distribution to simulate the person’s infection status. This is repeated many times to simulate
the risk, resulting in a distribution of the simulated endpoint risk.
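The simulation loop just described can be sketched as follows. This is a minimal Python illustration of the plug-in approach, not the authors' implementation: every numerical constant in the example call is hypothetical, the function and argument names are assumptions, and the ingested quantity is held fixed rather than drawn from a distribution.

```python
import math
import random

def simulate_plug_in_risk(n_sims, mu0, sigma0, mu1, sigma1, k, sun_hours,
                          litres_ingested, alpha, n50, seed=0):
    """Monte Carlo estimate of infection risk using fixed (plug-in) constants."""
    rng = random.Random(seed)
    beta = n50 / (2.0 ** (1.0 / alpha) - 1.0)      # beta-Poisson scale from N50
    infected = 0
    for _ in range(n_sims):
        log10_conc = rng.gauss(mu0, sigma0)        # log10 pathogens/L before treatment
        log10_conc -= rng.gauss(mu1, sigma1)       # log reduction (DEC) draw
        conc = 10.0 ** log10_conc * math.exp(-k * sun_hours)   # exponential die-off
        dose = conc * litres_ingested              # number of microbes ingested
        p_infection = 1.0 - (1.0 + dose / beta) ** (-alpha)
        infected += rng.random() < p_infection     # Bernoulli draw for infection status
    return infected / n_sims

# Hypothetical constants, chosen only to make the sketch run
risk = simulate_plug_in_risk(n_sims=2000, mu0=6.0, sigma0=1.0, mu1=3.0, sigma1=0.5,
                             k=0.3, sun_hours=8.0, litres_ingested=0.1,
                             alpha=0.3126, n50=2.36e4)
print(f"simulated risk of infection: {risk:.4f}")
```

Note that every argument other than the random draws is a fixed number: nothing in this loop carries the uncertainty with which µ0, σ0, µ1, σ1, k, α or N50 were themselves estimated, which is the gap the extended model addresses.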
For the case study, we consider an abbreviated version of the QMRA of Figure 4.1. This is represented
by Figure 4.2. In this version, the information requirements enumerated above are limited to
1 (distribution for treated water, not influent), 3 (die-off constants), 4 (sunlight hours), and 6 (dose-
response equation parameter constants). Table 4.1 shows the fixed constants used in the risk simulation
of the QMRA of Figure 4.2.
∗The ‘shoulder’ equation of Sinton et al. [2007]: $100\left[1 - \left[1 - \exp(-kT)\right]^{n}\right]$
As can be seen, this is not a risk assessment, since we abstract just a part of the full model, in order
to illustrate more clearly that much uncertainty may fail to be incorporated into risk assessments. In
partial justification, we note that it is generally not thought worthwhile to monitor the end-use water for
the pathogens of interest as it is believed that they will be present in such small quantities and will be
so diffuse within the water body that substantive positive results would only be obtained by processing
impractically large samples. Data on pathogen reductions or log reduction studies exist but have been col-
lected from typically small-scale, short-run experiments, usually in countries with very different climatic
conditions. Moreover, such data may be owned by private utilities and are either not publicly available,
or provided with minimal details. Thus, often only summary statistics or incomplete statistics (at best)
filter into the public domain.
If the risk of a particular health outcome needs to be estimated, available data are even more limited.
For example, Salmonella spp. have been linked with a number of outbreaks in the US, Europe and Japan
[Marks et al., 1998]. In Australia, limited data for Salmonella spp. numbers and their inactivation by
various wastewater treatment processes are available. The few studies available are those of Gibbs and
others, and these focus on Salmonella spp. in sludge, rather than within the water fraction [Gibbs, 1995,
Gibbs and Ho, 1993, Gibbs et al., 1995].
Thus, this study takes a small part of the risk assessment process and shows how it may be extended,
by embedding data within a Bayesian framework to estimate the corresponding parameters, and thereby
give better uncertainty estimates. The extended model for doing this is described in the next section.
4.4.2 The extended QMRA model
In the extended model, the small experiments which lead to the various required constants, are incorpo-
rated directly into the risk assessment process, allowing the uncertainty associated with the estimates to
be automatically incorporated into the risk assessment. Thus, Figure 4.3, the extended model, contains
two additional nodes (1 & 7) in comparison with Figure 4.2, the standard QMRA. These nodes repre-
sent the data which give rise to the ‘constants’ fed into the QMRA assessment of Figure 4.2, but in the
extended model are used to derive estimates of the random quantities that describe these data.
The starting point for the extended model is the graphical representation of the QMRA which is
seen to be a directed acyclic graph or DAG. Thus, a risk assessment may be embedded in a Bayesian
framework, thereby allowing parameters to be estimated simultaneously with the risk assessment. In
the extended model (Figure 4.3), parameters (supplied as constants under Figure 4.2) are both estimated
and used for the derivation of other quantities. Thus, the model descriptions at nodes (2) and (6) are
the explanatory models of the data at the new nodes (1) and (7), and also the means for estimation of
dose after die-off (node 5), and estimation of the probability of a person becoming infected (node 8).
In this Bayesian framework ‘prior’ probabilities (prior beliefs) for the parameters of the explanatory
models are needed, and uninformative priors are used in order that the parameter estimates and the
uncertainty associated with them will closely approximate the maximum likelihood solutions for each
set of parameters and data.
As with the standard QMRA model, a Monte Carlo approach is taken to analyse the extended model.
Here, however, given the Bayesian setup and additional information, a more formal Markov chain Monte
Carlo approach is used to estimate posterior distributions of the various quantities of interest such as
dose-response parameters, die-off parameters, and the risk of infection. The Bayesian framework used
was the freely available WinBUGS [Lunn et al., 2000]. This is described in more detail later, in the
context of the case study.
We now give a more detailed description of the data and the models used to explain them.
4.4.3 Data for the extended model
Three disparate sources of information are integrated into the model described above: die-off data for
S. typhimurium, dose-response data for S. anatum [Teunis et al., 1996], and a short run of weather data
from an Australian city, Perth [Bureau of Meteorology, 2010], giving the number of hours of sunlight in
a summer and a winter month. We also use a fictitious pathogen distribution for the treated water with
a range which allows the possibility of a 100% infection rate. These data sets and distributions are now
described in more detail.
Salmonella dose-response data (Figure 4.3, node 7). In considering the risks of Salmonella spp.
poisoning, we chose to use the S.anatum data presented in the report by Teunis et al. [1996], in which
infection curves were fitted by strain and species. These authors concluded that for S.anatum the three
strains could be grouped together to determine a single dose-response curve, using a likelihood ratio test.
Others have made different choices; thus Haas et al. [1999] used all thirteen species and strains detailed
in Teunis et al. [1996] and discarded some ‘outliers’, after similar testing, as did Oscar [2004]. Each of
these authors’ strategies gives a different set of quantities which the risk-assessor may use, but whatever
parameter estimates he/she uses, they are used without any error being associated with them. Our purpose
was not to determine a best model for dose-response for Salmonella, but to show how to incorporate the
uncertainty associated with the estimation of such model parameters into a risk assessment.
A further reason for using Salmonella dose-response data is that the data are best summarised by a
beta-Poisson dose-response curve, where the probability of infection for a given dose of D microbes is
given as $P(\text{infection}) = 1 - (1 + D/\beta)^{-\alpha}$, which is characterised by two parameters (α, β) which are
highly correlated.
Hence, there are two issues here in using these parameters in a risk assessment. Firstly, they are
typically included as point estimates without acknowledgment of uncertainty of specification or of their
correlation. However, even if this uncertainty were included, it is preferable to use the posterior distri-
bution of the two parameters directly instead of making the standard assumption of bivariate normality,
which Teunis et al. [1996] have shown does not hold.
Salmonella typhimurium die-off data (Figure 4.3, node 1). S. typhimurium die-off data from Sidhu
et al. [2008] were supplied by the authors. These data allow the estimation of die-off rates, with
uncertainty, for S. typhimurium under several conditions: winter/summer, sun/shade, and grass/thatch. The
available dataset consists of 34 observations. The summer observations were taken over all the combinations of
conditions, but the winter data were for grass only and measured die-off in light and shade, thereby giving
six sets of experimental conditions, and potentially six die-off constants. In the experiment, grass
irrigated with sterile effluent was seeded with known numbers of S. typhimurium, and samples of the grass
and thatch were then harvested at 1, 2, 4, 6, 7.3 and 9.3 hours after the initial seeding (in summer).
Microbial numbers were counted and averaged for the samples taken at each harvest time and sample type
(sun/shade, grass/thatch). For the winter samples, harvest times were 1, 2, 4, 6, and 8 hours and grass
and thatch were not separated. For further details of the experiment, see Sidhu et al. [2008].
Die-off over time is expected to be proportional to the number of organisms. Thus, $dN/dt \propto N$. This
equation has the solution $N_t = N_0 e^{-kt}$, where k is positive. One can use any base for exponentiation and
this changes the constant k, often referred to as the ‘die-off’ constant. To avoid confusion about bases
for exponentiation, the constant used to express this equation may be given as $T_{90}$, or the time to 90%
die-off. Solving $0.10 = e^{-kT_{90}}$ gives $T_{90}$ in terms of k and vice versa.
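As a worked step, the conversion between the two constants follows directly from the exponential solution. The base-10 constant $k_{10}$ is a symbol introduced here purely for illustration.

$$0.10 = e^{-kT_{90}} \;\Rightarrow\; T_{90} = \frac{\ln 10}{k} \approx \frac{2.303}{k}, \qquad N_t = N_0\, 10^{-k_{10} t} \ \text{with}\ k_{10} = \frac{k}{\ln 10} \;\Rightarrow\; T_{90} = \frac{1}{k_{10}}.$$

For example, a natural-log die-off constant of k = 0.5 per hour would correspond to $T_{90} \approx 4.6$ hours, whichever base is used to write the decay equation.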
Microbial count numbers are usually thought to be log-normally distributed. Hence, the die-off
equation takes the form $\log N_t \sim N(\log N_0 - kt, \sigma^2)$ or, alternatively, $\log(N_t/N_0) \sim N(-kt, \sigma^2)$. We fit the
second version of this equation for each set of experimental conditions. This was done because greater
effort (in terms of replicates) had gone into finding the value of the initial seedings. The original complete
data had shown differences in the die-off rates for all combinations of sun/shade and winter/summer, but
none for thatch. Hence, we fit four die-off constants, $k_1, \ldots, k_4$, to the model $\log(N_{i,t}/N_{i,0}) \sim N(-k_i t, \sigma^2)$,
where i references the summer/winter, sun/shade combination, and a common (pooled) variance $\sigma^2$ is
used. Natural logs were used, and the die-off constant was not constrained to be positive; indeed, 14%
of the posterior estimates of k for winter shade were negative. When this occurred, a
zero was substituted in the corresponding decay equation in the risk modelling (see below), although
some evidence exists for the regrowth of Salmonella spp. on lettuce leaves under the right circumstances
[Brandl and Amundson, 2008].
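For intuition about the regression being fitted, the sketch below gives the ordinary least-squares analogue for a single experimental condition. It is not the Bayesian fit used in this chapter: the function name is an assumption, the observations are invented for the example, and the thesis model pools the variance over four conditions rather than estimating it from one.

```python
import math

def fit_dieoff(times, log_ratios):
    """Least-squares slope through the origin for log(N_t/N_0) = -k t + error."""
    sxx = sum(t * t for t in times)
    sxy = sum(t * y for t, y in zip(times, log_ratios))
    k_hat = -sxy / sxx
    resid = [y + k_hat * t for t, y in zip(times, log_ratios)]
    # One parameter is estimated, so n - 1 degrees of freedom remain
    sigma2_hat = sum(r * r for r in resid) / (len(times) - 1)
    return k_hat, sigma2_hat

# Hypothetical observations (hours, natural-log count ratio); NOT the Sidhu et al. data
times = [1, 2, 4, 6, 8]
log_ratios = [-0.4, -1.1, -1.9, -3.2, -3.9]
k_hat, sigma2_hat = fit_dieoff(times, log_ratios)
t90 = math.log(10) / k_hat  # time to 90% die-off implied by k_hat
print(k_hat, sigma2_hat, t90)
```

Nothing in this estimator forces the fitted k to be positive, which mirrors the unconstrained prior used in the Bayesian model.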
The posterior estimates for die-off act on the treated water pathogen number node (‘initial dose’,
node 3 of Figure 4.3), together with the sunlight hours of node 4, to produce the dose after die-off at
node 5. The die-off calculation uses the maximum 17 hour period for winter and summer available for
die-off, based on the irrigation regime for the sports ovals in Perth where the experimental die-off data
were collected. Note that the various values of k are the die-off constants of item 3, in the description of
Standard QMRA Methodology.
Sunlight hours (Figure 4.1, Figure 4.2 and Figure 4.3, node 4). Die-off is a function of sun/shade,
summer/winter. The daily sunshine hours at Perth Airport, for January 2008, and for June 2008 were
supplied by the Australian Bureau of Meteorology (pers.comm, 2010). Rather than work with summary
statistics, or fit a distribution to these data, they were resampled. These data are clearly not normally
distributed (see Figure 4.4), nor are they expected to be, since the number of sunlight hours is bounded
by 0 and the number of possible hours of sunlight on a particular day at the latitude of Perth. Figure 4.4
indicates a mixture of rainy and sunny days and discretization. Given that the data are bounded and
possibly a mixture of distributions, it seemed more sensible to resample, rather than to fit and sample
from an arbitrary distribution.
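Resampling of this kind amounts to drawing with replacement from the observed daily values. A minimal sketch follows; the function name and the listed hours are hypothetical and are not the Bureau of Meteorology data.

```python
import random

def resample_sunlight(hours, n_draws, rng):
    """Draw n_draws values with replacement from observed daily sunlight hours."""
    return [rng.choice(hours) for _ in range(n_draws)]

# Hypothetical daily sunlight hours for one month; NOT the observed Perth data
january_hours = [13.1, 12.8, 13.0, 4.2, 12.9, 0.5, 13.2, 11.7, 13.1, 12.6]
rng = random.Random(2008)
draws = resample_sunlight(january_hours, 1000, rng)
print(min(draws), max(draws))
```

Every draw is a value that was actually observed, so the bounds and any mixture of sunny and rainy days are preserved without choosing a parametric form.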
Doses. (Item 5, Standard QMRA Methodology section and Figure 4.1). No data were used for the
person’s dose. This node is not included in the case study (Figures 4.2 & 4.3).
The Salmonella dose-response curve (Figure 4.5) shows that very high doses of S. anatum are
required for infection. (Using the point estimates found by Teunis et al. [1996], a dose of 400 S. anatum
gives a probability of infection of 0.01, while for a dose of 1000 the probability of infection becomes 0.03.)
Given the probabilities of infection, it appears that the dose-response curve applies largely to healthy
adults. Since small children and the elderly are more likely to become ill under the same dosing regime,
a range of doses was induced (via the treated water pathogen numbers distribution) in order to see the
effect of parameter uncertainty over the full dose-response curve.
Under the models used in this case-study, the node “person’s microbe dose” of Figure 4.1 is equated
to “Dose after die-off” (Figures 4.2 & node 5 of Figure 4.3).
Treated water numbers. This node is a derived node in Figure 4.1 but an initial node in Figures
4.2 & 4.3. The pathogen numbers distribution in treated water was ascribed a (natural) log-uniform
distribution over the range (-1, 30). Thus, in this study, both consumption rates and numbers per L of the
pathogen in the recycled or the influent source water were ignored. Instead, an arbitrary distribution was
chosen for the Salmonella spp. numbers distribution in treated water, to allow the possibility of seeing
the effect of the uncertainty in die-off rates and the uncertainty in the dose-response parameters on the
estimate of probability of infection, under many possible scenarios.
Putting it all together
Conceptual Model. The directed acyclic graph for the extended model (the ‘conceptual model’) is given
by Figure 4.3. Here, the Sidhu et al. [2008] data (node 1) are explained by the regression model (equation
4.3) of node 2 which estimates the die-off parameters. For each iteration of the MCMC algorithm, the four
die-off rates are estimated; a dose sample is drawn from node 3; and a sample is drawn from the sunlight
hour data of node 4. At node 5, the die-off constants for this iteration are applied using equations 4.4-4.8
for a 17 hour day with the sunlight and shade hours from node 4. When the draw for the winter shade or
sunlight die-off parameter is negative, it is replaced by zero.
Independently, the Teunis et al. [1996] dose-response data for S. anatum (node 7) are explained by the
current estimates of (α, β) (node 6, and fitted using equations 4.1 & 4.2), which are used in the same MCMC
iteration (at node 8, using equations 4.9 & 4.10) to calculate the probability of infection, thus allowing a
single estimate of the probability of infection (and the infection status of an individual) at each iteration.
Statistical Model. Node 7 contains the dose-response data from Teunis et al. [1996] which may be
represented as $(D_i, N_i, X_i)$, $i = 1, \ldots, 19$, where $D_i$ is the ith dose, $N_i$ the number of subjects given the ith
dose, and $X_i$ the number of subjects infected by the ith dose.
These are explained by the dose-response equation (node 6, equations 4.1 & 4.2) with parameters (α, β).
Uninformative log-uniform priors are given for (α, β), and after burn-in, the posteriors for (α, β) are
essentially identical to the maximum likelihood estimates. Thus, nodes 6 & 7 are described by
$$X_i \sim \mathrm{Bin}(p_i, N_i), \tag{4.1}$$
$$p_i = 1 - \left(1 + \frac{D_i}{\beta}\right)^{-\alpha}, \tag{4.2}$$
with priors for (α, β) given by
$$\ln(\alpha) \sim U(-10, 15)$$
$$\ln(\beta) \sim U(-6, 20)$$
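The unnormalised log posterior implied by equations 4.1 and 4.2 and the log-uniform priors can be written out as below. This is a sketch under stated assumptions: the function name and the three data rows are hypothetical, all doses are assumed strictly positive, and the model in this chapter was fitted in WinBUGS rather than with code of this form.

```python
import math

def log_posterior(log_alpha, log_beta, data):
    """Unnormalised log posterior for (ln alpha, ln beta); data rows are (dose, n, x)."""
    # Uniform priors on the log scale, with the bounds given in the text
    if not (-10 <= log_alpha <= 15 and -6 <= log_beta <= 20):
        return -math.inf
    alpha, beta = math.exp(log_alpha), math.exp(log_beta)
    total = 0.0
    for dose, n, x in data:  # assumes dose > 0
        log_q = -alpha * math.log1p(dose / beta)  # log(1 - p_i)
        log_p = math.log(-math.expm1(log_q))      # log(p_i)
        # Binomial log-likelihood, dropping the constant log C(n, x)
        total += x * log_p + (n - x) * log_q
    return total

# Hypothetical (dose, subjects, infected) rows; NOT the Teunis et al. data
demo_data = [(1e3, 6, 0), (1e5, 6, 2), (1e7, 6, 6)]
print(log_posterior(-1.0, 10.0, demo_data))
```

A function of this shape is all that a generic MCMC sampler needs in order to explore the joint posterior of (α, β).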
The current MCMC simulation of the posteriors for (α, β) is passed to node 8, again using equations 4.1
& 4.2 (but now in the form of equations 4.9 & 4.10), to give a value for the probability of infection (and
whether an individual is infected), after sunlight die-off.
Node 1 represents the die-off data, which may be considered as $(L_j, t_j, N_{0(j)}, N_{t(j)})$, $j = 1, \ldots, 34$. Here j
references each data point, while $L_j \in \{1, \ldots, 6\}$ represents the line and experimental condition to which the
jth point belongs (there are 6 of these, corresponding to the number of different conditions of the
experiment), $t_j$ is the number of hours elapsed from the initial seeding (with count $N_{0(j)}$ on line $L_j$), and
$N_{t(j)}$ is the count at time $t_j$ for line $L_j$. The die-off constants are $k_{L(j)} = k_l$, $L_j = 1, \ldots, 6$, $j = 1, \ldots, 34$.
However, as discussed earlier, different values of k are fit for summer/winter and sun/shade, since the
grass/thatch in combination with sun/shade for summer did not need separate fits. The die-off regression
equations (node 2) which explain the die-off data are given by
$$\ln\left(\frac{N_{t(j)}}{N_{0(j)}}\right) \sim N(-t_j k_l, \sigma^2) \tag{4.3}$$
with uninformative priors for $k_l$ ($l = 1, \ldots, 4$) and $\sigma^2$, given by
$$k_l \sim N(0, 1000),$$
$$\sigma^2 \sim IG(0.01, 0.01).$$
The posterior estimates for kl and σ2 in the MCMC simulation are used at node 5, to estimate the dose
after die-off from sunlight, based on the dose from node 3, and the sunlight hours from node 4.
At node 4 the season sunlight hours are sampled directly from the data $(S_m, h_m)$, where $S_m$ is the
season (winter/summer) and $h_m$ are the sunlight hours for the day of that season.
Let $D_0$ be the initial number of pathogens drawn from the treated water distribution (node 3) and the
number of hours of sunlight drawn in winter/summer be h (node 4). Then $D_{17}$, the number of pathogens
17 hours after irrigation, is drawn from the following. In winter,
$$\log(D_{17}/D_0) \sim N(-k_1 h - k_2(17 - h), \sigma^2) \quad \text{where } k_1, k_2 \ge 0 \tag{4.4}$$
$$\log(D_{17}/D_0) \sim N(-k_2(17 - h), \sigma^2) \quad \text{where } k_1 < 0,\ k_2 \ge 0 \tag{4.5}$$
$$\log(D_{17}/D_0) \sim N(-k_1 h, \sigma^2) \quad \text{where } k_1 \ge 0,\ k_2 < 0 \tag{4.6}$$
$$\log(D_{17}/D_0) \sim N(0, \sigma^2) \quad \text{where } k_1, k_2 < 0 \tag{4.7}$$
and in summer,
$$\log(D_{17}/D_0) \sim N(-k_3 h - k_4(17 - h), \sigma^2) \tag{4.8}$$
where $k_1, \ldots, k_4$ and $\sigma^2$ are posterior draws from node 2. (Note that although there may be a possibility
of bacterial growth, this possibility was not permitted in the risk estimation since where an estimate for
any winter die-off k value was negative, it was replaced by zero.) $D_0$, the initial dose (node 3: treated
water/Effluent distribution), is drawn from a log uniform distribution which allows the full curve for the
probability of infection to be seen:
$$\ln(D_0) \sim U(-1, 30)$$
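One draw of the die-off step can be sketched as follows. The function and argument names, and the numerical values passed in, are assumptions for illustration; in the model itself the k and σ² values are posterior draws from node 2 and h is resampled from the sunlight data. The clamp at zero reproduces the four winter cases (4.4) to (4.7); it is also applied to summer values here, where it has no effect so long as the draws are positive.

```python
import math
import random

def dose_after_dieoff(d0, h, k_sun, k_shade, sigma2, rng, total_hours=17.0):
    """Draw D17 given initial dose d0 and h hours of sunlight out of total_hours."""
    # Negative die-off draws are replaced by zero, so growth is not permitted
    k_sun = max(k_sun, 0.0)
    k_shade = max(k_shade, 0.0)
    mean = -k_sun * h - k_shade * (total_hours - h)
    log_ratio = rng.gauss(mean, math.sqrt(sigma2))
    return d0 * math.exp(log_ratio)

rng = random.Random(1)
d0 = math.exp(rng.uniform(-1, 30))  # ln(D0) ~ U(-1, 30)
# Hypothetical winter draw: sun constant 0.5, shade constant -0.1 (clamped to 0)
d17 = dose_after_dieoff(d0, 6.0, 0.5, -0.1, 0.25, rng)
print(d0, d17)
```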
$D_{17}$ then passes to node 8, where the probability of infection is calculated using the current posterior
estimates for α and β (from node 6). Then $p_{\mathrm{inf}}$, the probability of infection, and I (whether an individual
is infected or not, taking a value of 1 for infected, 0 for not infected), are given by
$$p_{\mathrm{inf}} = 1 - \left(1 + \frac{D_{17}}{\beta}\right)^{-\alpha} \tag{4.9}$$
$$I \sim \mathrm{Bin}(p_{\mathrm{inf}}, 1). \tag{4.10}$$
As noted earlier, the model described above and in Figure 4.3 was implemented in WinBUGS [Lunn
et al., 2000]. The initial distribution of the dose is drawn from a log uniform distribution to allow the
consequences of parameter uncertainty at any dose to be explicitly included. In the simulation, for the
draw of each dose, each parameter is drawn conditional on the data and all other associated parameters.
For the final results, a burn-in of 30,000 was used to reach the target distributions for dose and die-off,
with a further 10,000 iterations used for the ‘risk’ estimation. Two chains and Gelman Rubin statistics
[Lunn et al., 2000] for each of the quantities of interest were used to verify convergence to the stationary
distribution.
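For readers unfamiliar with the convergence diagnostic, the basic form of the Gelman-Rubin potential scale reduction factor can be computed as below. This is a sketch of the textbook statistic only; the function name is an assumption, the chains are toy values, and WinBUGS reports its own variant of the diagnostic, which this code does not attempt to reproduce.

```python
from statistics import mean, variance

def gelman_rubin(chains):
    """Basic potential scale reduction factor for equal-length chains of one quantity."""
    n = len(chains[0])
    chain_means = [mean(c) for c in chains]
    b = n * variance(chain_means)          # between-chain variance
    w = mean(variance(c) for c in chains)  # mean within-chain variance
    var_plus = (n - 1) / n * w + b / n     # pooled estimate of the posterior variance
    return (var_plus / w) ** 0.5

# Toy chains: the first pair disagree about location, the second pair agree
separated = [[0.0, 1.0, 0.0, 1.0], [10.0, 11.0, 10.0, 11.0]]
overlapping = [[0.0, 1.0, 0.0, 1.0], [0.0, 1.0, 0.0, 1.0]]
print(gelman_rubin(separated), gelman_rubin(overlapping))
```

Values well above one indicate that the chains have not yet mixed; values close to one are consistent with convergence.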
Further extensions to the model. The model described above can be extended in a number of ways.
We present here two further conceptual models, which again can be expressed as DAGs: an errors-in-
variables [Fuller, 1987, Wand, 2009] model for the estimation of the dose-response model (Figure 4.6),
and a DAG for the incorporation of the errors-in-variables model into the QMRA presented here (Fig-
ure 4.7). The errors-in-variables model estimates the parameters of the dose-response equation on the
assumption that the doses are measured with error, and is detailed below. Not surprisingly, the additional
uncertainty postulated in this model increases the uncertainty associated with the estimation of the prob-
ability of infection. This approach is appropriate: dose is measured with error and this should be taken
into account when estimating the dose-response curve, though this example is intended to be illustrative
rather than definitive. The second model, Figure 4.7, expands node 7 of Figure 4.3, and shows how the
errors-in-variables dose-response model would be integrated into the ‘risk assessment’ carried out in this
paper. That such a model can be easily fit and incorporated into a risk assessment, further justifies the
data-based risk assessment approach used here.
Estimating dose-response assuming errors in dose
McCullough and Eisele [1951] prepared batches of S.anatum for which the S.anatum count was
measured. In the model we present below, we recognise the difficulty of ascertaining such dosages.
Thus, we assume that the batch dose is measured with error, and that the individual’s true dose from the
batch is not the true batch dose. The individual then becomes infected or not infected. In the model,
the status of infected/not infected has been assumed to be measured with no error. Figure 4.6 shows a
schematic directed acyclic graph for this model.
Let the unobserved true dose of batch b be $Z_b$, the unobserved true dose for individual i subjected to
batch b be $Y_{i(b)}$, the observed dose for batch b be $X_b$, and the infected status of individual i be $I_i$. There
were 19 dosage batches, and 114 individuals each received a dose from a particular batch. Then letting
i(b) reference the individual i receiving a dosage from batch b, and $p_i$ be the probability of individual i
becoming infected, we have
$$\log(Z_b) \sim N(0, \sigma^2)$$
$$\log(X_b) \sim N(\log(Z_b), 0.001)$$
where $b = 1, \ldots, 19$ and $\sigma^2 \sim IG(0.1, 0.1)$.
$$\log(Y_{i(b)}) \sim N(\log(Z_b), 0.001)$$
$$p_i = 1 - (1 + Y_{i(b)}/\beta)^{-\alpha}$$
$$I_i \sim \mathrm{Bernoulli}(p_i)$$
where $i = 1, \ldots, 114$. (All measured batch doses were divided by 1000 prior to fitting.)
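To make the layering of the errors-in-variables model concrete, the sketch below forward-simulates one data set from it. Several things here are assumptions for illustration: the function name, the values of α, β and σ², the use of six individuals per batch (114/19), and the reading of 0.001 as a variance, as the text describes it. It generates data from the model and does not fit it.

```python
import math
import random

def simulate_eiv(n_batches, per_batch, alpha, beta, sigma2, err_var, rng):
    """Simulate (observed batch dose, infection indicators) for each batch."""
    records = []
    for _ in range(n_batches):
        log_z = rng.gauss(0.0, math.sqrt(sigma2))          # true batch dose, log scale
        log_x = rng.gauss(log_z, math.sqrt(err_var))       # observed batch dose
        infected = []
        for _ in range(per_batch):
            log_y = rng.gauss(log_z, math.sqrt(err_var))   # individual's true dose
            p = -math.expm1(-alpha * math.log1p(math.exp(log_y) / beta))
            infected.append(1 if rng.random() < p else 0)  # Bernoulli(p)
        records.append((math.exp(log_x), infected))
    return records

# Hypothetical parameter values; doses are on the scaled (divided by 1000) scale
rng = random.Random(1951)
records = simulate_eiv(19, 6, alpha=0.3, beta=2.0, sigma2=4.0, err_var=0.001, rng=rng)
print(sum(sum(inf) for _, inf in records), "infected of", 19 * 6)
```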
The assumed errors in measurement used here are possibly unrealistic, with the variance of the errors
for the true batch dose, and for the true individual dose being set at 0.001, but they do affect the model
and particularly the width of its credible intervals. (When we consider the rounding of McCullough and
Eisele’s S.anatum numbers, it is, however, more likely that we have underestimated the variance.)
4.5 Results
Figure 4.8 shows the bivariate posterior distribution of the dose response parameters (α, β); as indicated
earlier, this is unlikely to be bivariate normal and indeed this is apparent from the figure. The parameters
are highly correlated and the surface of the loglikelihood at the point of convergence is fairly flat (not
shown), which means that the values of the parameters estimated using conventional maximum likeli-
hood methods are somewhat dependent on the stopping rule for convergence. In terms of the methods
advocated in this paper, it would seem that the dose-response curve parameters are not distributed as a bi-
variate normal, and that to simulate such a distribution via some summary parameters would be relatively
difficult.
Figure 4.5 shows the dose-response curve distribution given by these parameters’ posterior distribu-
tion. This curve is created from the outputs of nodes 1 and 2 and shows the estimates of the probability
of infection based solely on the Teunis et al. [1996] data. The considerable variation of the probability
at low doses should be noted (not shown in the graph, but noted from the MCMC data). $P(\text{infection})$
between $e^0$ and $e^{10}$ ($2 \times 10^5$) ranges from almost zero to occasionally 0.5 for the same dose. Even for
a dose of 20 bacteria ($e^3$) some realisations show a probability of infection of 0.2. For extremely high
dose values, the majority of probabilities of infection are close to one, but occasionally the probability is
considerably less.
Figure 4.9 shows the distributions for the die-off parameters for summer/winter, sun/shade. These
are fairly symmetric, reflecting in part the model assumptions of normality, given the few data. Note that,
for shade in winter, a substantial proportion (14%) of the posterior die-off values are negative, which
could lead to the inference of no die-off under these conditions.
Figure 4.4 shows the daily sunlight data for winter and summer. For both months, it can be seen that
the majority of days were not cloudy, overcast or rainy, but attained the maximum possible number
of sunlight hours.
In Figure 4.10, the probabilities of infection for summer and winter (estimated with parameter un-
certainty - ‘Varying’) are contrasted with the corresponding estimates where all needed values have been
plugged in as constants (‘Constant’). This figure shows the addition of considerable variation when the
underlying data and their model are incorporated in the risk assessment model. When most of the prob-
abilities lie below 0.5, the additional uncertainty increases the range of probabilities thereby giving an
increased likelihood of infection. When most of the probabilities lie above 0.5, the added uncertainty
again gives a wider range and therefore includes lower probabilities of infection in comparison with the
model using a constant.
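The contrast between ‘Constant’ and ‘Varying’ can be illustrated at a single dose. In this sketch every number is hypothetical, and the parameter draws are independent log-normal values standing in for posterior draws; the actual posterior of (α, β) is strongly correlated and was obtained by MCMC.

```python
import math
import random

def p_infection(dose, alpha, beta):
    """Beta-Poisson probability of infection."""
    return -math.expm1(-alpha * math.log1p(dose / beta))

def interval(values, lo=0.05, hi=0.95):
    """Empirical (lo, hi) quantiles by simple order statistics."""
    s = sorted(values)
    return s[int(lo * (len(s) - 1))], s[int(hi * (len(s) - 1))]

rng = random.Random(42)
dose = 5000.0
# 'Constant': plug-in point estimates (hypothetical values)
p_const = p_infection(dose, 0.3, 2000.0)
# 'Varying': hypothetical draws standing in for posterior samples of (alpha, beta)
draws = [(math.exp(rng.gauss(math.log(0.3), 0.4)),
          math.exp(rng.gauss(math.log(2000.0), 0.8))) for _ in range(5000)]
p_vary = [p_infection(dose, a, b) for a, b in draws]
low, high = interval(p_vary)
print(p_const, low, high)
```

With plug-in constants a fixed dose yields a single probability, whereas the varying parameters yield a 90% interval around it; the spread seen in the ‘Constant’ results of this chapter comes only from the dose and die-off draws.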
Box plots for the probability of infection for 15 initial dose groups (Figure 4.11) indicate that if
the infection probabilities are not close to zero or one, the uncertainty is very greatly increased. Thus,
including parameter uncertainty could make a very great difference to conclusions about risk. Table 4.3
gives summary statistics for the initial doses by grouping. Table 4.2 gives summary statistics for the
probability of infection for each of these groupings shown in the graphs (Figure 4.11). Thus, in Table 4.2,
in dose grouping 8, the mean probability of infection for winter when constants are used is 0.82 with a
90% CI (0.73, 0.89) compared with 0.78 (0.43, 0.96) for the varying parameters, again more than double
the spread. Table 4.3 shows that the initial dose range for dose grouping 8 is $6.4 \times 10^6$ to $5.52 \times 10^7$ with
a median dose of $1.89 \times 10^7$ cells. Looking at the summer scenarios for dose grouping 12 (Table 4.2), the
mean probability of infection is 0.25 with 90% CI (0.12, 0.40) using constants, compared with a mean
probability of infection of 0.30 (0.04, 0.72), when parameters are drawn with uncertainty from their
distributions. This equates to a difference in interval width of 0.68 versus 0.28. That is, using varying
parameters the credible interval covers two thirds of the probability scale, whereas using constants the
interval is one third of the scale, constituting a very substantial difference.
The effect of the uncertainty induced by the uncertainty of the die-off parameters was so great that it
seemed useful to estimate the probability of infection for the initial dose (divided by 1,000) allowing no
die-off, to show the effect of the uncertainty induced by the dose-response parameters alone. Figure 4.12
shows the effect of including the uncertainty of the parameter estimates for dose-response when no die-off
is considered. The results for each grouping are shown in Tables 4.2 & 4.3. Thus, for dose grouping 8, using
constants only, the 90% interval is (0.16, 0.48), but (0.12, 0.56) with varying parameters, a
width of 0.32 versus 0.44. This is a considerable difference when the response lies between 0 and 1. From
Table 4.3, this group is seen to encompass doses from $6.4 \times 10^3$ to $5.5 \times 10^4$ cells.
Figure 4.13 shows the additional uncertainty in the probabilities of infection from the dose-response
model which results from the errors-in-variables model. The 95% credible intervals for the probability
of infection when error is assumed in the dose are considerably wider, but also, the mean probability
of infection has moved to the left, indicating a higher probability of infection for a dose, than when
the probability is estimated without recognising that the dose is measured with error. The additional
uncertainty operates to make the probability of infection markedly higher at lower doses. (Note that none
of the results discussed in the ‘risk assessment’ include this modification to nodes 6 & 7 of Figure 4.3.)
4.6 Discussion
The extended QMRA model was expressed as a Bayesian model and analysed using a simulation-based
approach, namely Markov Chain Monte Carlo (MCMC) to estimate distributions of the probability of
infection, thereby taking into account the uncertainty associated with parameter estimates needed in the
risk assessment, automatically and more satisfactorily. In general, when parameter uncertainty is taken
into account, it is typical to assume that the parameter estimate is normally distributed, which it may well
not be. The manner in which uncertainty is incorporated in the extended model allows the experimental
data to dictate the distribution of the parameter uncertainty, and allows the possibility of asymmetry
and long tails. The Bayesian framework permits embedding of several unrelated models in a single
risk assessment, via directed acyclic graphs (DAGs), and may be compared with more conventional risk
assessments, where parameter estimates and their associated distributional assumptions are used. As
examples of such risk assessments, see, e.g., Gerba et al. [2008], Pouillot et al. [2004], Tanaka et al.
[1998], Whiting and Buchanan [1997].
This approach may be compared with that proposed by other researchers [Haas et al., 1999, Teunis
et al., 1997] who prepare a bootstrap sample of parameter estimates for the dose-response curve, thereby
allowing for non-normality, prior to running the risk assessment. However, despite this being the method
recommended in Haas et al. [1999], most risk assessments take their dose-response parameters as con-
stants. It would seem that bootstrapping, choosing a size for the bootstrap sample, and incorporating the
resultant bootstrap sample into the simulation framework is generally discouraging for most practition-
ers. When the interest in a risk assessment involves tails of distributions or upper percentiles, it is critical
not to ignore the tail behaviours of the fitted distributions. The method we propose permits all such
asymmetries and uncertainties to be easily incorporated. In this particular case, where the dose-response
parameters are correlated, the problem of simulation is particularly difficult, since the two parameters are
not bivariate normal. Hence, when the dose-response curve is a beta-Poisson, some appropriate method
must be used to capture the bivariate behaviour. Haas [1999] proposes a further method based on rank
coefficients. This method is again complex to implement, whereas here we argue that using MCMC via
WinBUGS is not.
There are so few data used in the estimation of the die-off coefficients that there is little evidence of
asymmetry. Nonetheless, using data rather than previously estimated constants ensures that uncertainty
is propagated properly throughout the simulation. Had the ‘shoulder’ equation of Sinton et al. [2007]
been considered appropriate, the same problem of correlated parameter estimates for the curve fit would
again be as strongly evident as it is for the dose-response equation.
MCMC simulation and estimation has been available for some time, but is rarely used in the context
of risk assessment as described here. Kelly and Smith [2009] present a simple primer of MCMC methods
for this purpose, and, in particular, discuss its use in fitting hierarchical models, and in dealing appro-
priately with missing and uncertain data. Messner et al. [2001] use an MCMC approach to perform a
meta-analysis using hierarchical MCMC modelling to develop a dose-response curve for C.parvum. The
same approach is taken by Qian et al. [2003] who use MCMC to fit a hierarchical model to perform a
meta-analysis for various studies of protozoan inactivation by UV light. Delignette-Muller et al. [2006]
used a complex hierarchical model to describe the growth of Listeria in cold-smoked salmon, and then
used this to develop a further model for the time necessary to reach particular pathogen numbers, with
the second model importing the uncertainty implicit in the original data.
Paulo et al. [2005] undertook a risk assessment, which closely parallels our approach: the parameter
estimation for various submodels and the assessment of dietary exposure to pesticides (as a final node)
was accomplished in the one MCMC model. Like the models of Paulo et al. [2005], the model presented
in this paper differs substantially from the majority of risk assessments in two main ways. Firstly, it em-
beds the primary data (for dose-response, Salmonella die-off, and doses) within the simulation itself, thus
incorporating directly the uncertainties of the data, together with the (unknown) correlation structures.
That is, no summary data are used and no process is represented by a constant. Secondly, by putting
these together in a directed acyclic graph (DAG) and using MCMC, the model allows us to simultane-
ously estimate all the parameters currently used in a QMRA, together with the risks in which we are
interested.
Here, every parameter, however disparate, is estimated simultaneously with the risk simulation. This
permits all parameter uncertainty to be propagated throughout the risk assessment by incorporating all
relevant data seamlessly into the one directed acyclic graph. Thus, ideally, the data nodes might consist
of microbial cell numbers post treatment, dose-response data, and microbial numbers prior to treatment
(thereby allowing estimation of the log-reduction constants, and a potential comparison of the two meth-
ods of estimation), die-off data, and users’ consumption behaviour data. This method means that there is
no necessity for prior bootstrap simulations, as in Cullen and Frey [1999]’s “two-dimensional” approach
to fitting ‘uncertainty’ and ‘variability’: the models and the methods are explicit and transparent.
In summary, we have demonstrated a method for incorporating parameter uncertainty, which does
not require complex simulation methods. Where a risk assessor is trying to do more than arrive at a point
estimate, and is running Monte-Carlo simulations such as offered by @Risk [Palisade Corporation, 2008],
this method allows risk uncertainty to be satisfactorily described, without resorting to two-step estima-
tion procedures. It is also far more transparent than a spreadsheet approach where operations and their
sequencing can be difficult to discern. This method incorporates all the original data used to derive the
required parameters for a QMRA, into the QMRA, whereas in the more traditional approach these pa-
rameters are derived prior to undertaking the risk assessment and are ‘plugged’ in to the assessment. We
would recommend it as a simple, transparent method which should be incorporated into a risk assessor’s
armoury.
4.7 Conclusions
The aim of the study was twofold: (i) to indicate the potential problems arising from failure to include the
uncertainty of parameter estimates in risk assessments; and (ii) to illustrate the superiority of estimating
the parameters to be used in the risk assessment simultaneously with the risk assessment. When one
considers the “banana-shaped” bivariate graph for the dose-response parameters (α, β) and its long left
tail presented in Figure 4.8, there is little doubt that the simultaneous estimate of all parameters of interest
is a better methodology to use. The techniques and programs used to derive such estimates are now
readily available.
Our analysis indicated that, where dose ranges are either extremely large or small, estimating risk by
including the uncertainty in the underlying parameters makes little difference in the possible ranges for
the probability of infection. However, when the dose is within the range where the risk is neither very
close to one nor zero, the inclusion of uncertainty in the parameters may make a marked difference in the
possible ranges for the probability of infection.
More generally, the results of this study highlight the superiority of models developed directly from data
for finding more realistic estimates of uncertainty. In practical terms, we would advocate that workers
in this field report comprehensive data. Commonly, reported results only include a range and a mean,
occasionally a standard deviation, and often not even the number of observations used. These are gen-
erally insufficient to permit adequate estimation of risk. In addition, there is a failure to acknowledge,
let alone include, the uncertainty which results from small experiments. For the methodology advanced
in this paper, we would recommend, firstly, that all data from experiments leading to parameters needed
in a risk assessment, be in the public domain, particularly when their interpretation may have important
implications for public health. A major limitation imposed on this study was the inability to access data
collected by, or on behalf of, any Australian water utility, much of which is mandated by law or regula-
tion. Thus, our final recommendation is that such data be made publicly available. Journals may make a
difference in the short term, by insisting on this for data forming the basis of a published paper.
4.8 Figures
Figure 4.1 Model for a QMRA for surface vegetable irrigated with treated wastewater. Observed data nodes shown in white, parameter nodes in green, and outcome nodes in a light grey.
Figure 4.2 Model for the part of the standard QMRA implemented here. Observed data nodes shown in white, parameter nodes in green, and outcome nodes in a light grey.
Figure 4.3 Schematic Model for the directed acyclic graph implemented in WinBUGS for estimation of parameters and risk. Observed data nodes (1,3,4,7) are shown in white. Unknown parameter nodes to be estimated (2,6) in green, and outcome nodes (5,8) in a light grey.
Figure 4.4 Sunlight hours for January/June 2008 at Perth airport.
4.8. Figures 161
Figure 4.5 Dose-Response curve with uncertainty for S.anatum: P = 1 − (1 + Dose/β)−α. Thebounding curves are the 95% credible intervals from the MCMC simulation.
.. / I
/ /
6 -
162CHAPTER 4. PAPER TWO: INCORPORATING PARAMETER UNCERTAINTY
INTO QUANTITATIVE MICROBIAL RISK ASSESSMENT (QMRA)
Figure 4.6 Graphical model for Dose-Response estimated with error in measurement and errorin individual dosage: Measured dose is the observed batch dose, Batch dose is theunobserved true batch dose, individual dose is the true unobserved individual dose.The observation of an individual’s infection status is assumed to be without error.
Figure 4.7 Graphical model for a risk assessment which includes the parameters for dose-response based on the errors-in-variables concept.
Figure 4.8 Dose-response curve parameters (α, β): Posterior distribution for log α vs log β/1000 using log uniform priors.
Figure 4.9 Die-off distributions for S. typhimurium: fixed effects pooled variance model $N_t = N_0 e^{-kt}$, $k > 0$. Note that for die-off $k > 0$.
Figure 4.10 Summer & Winter: Probability of infection - constant (the line) vs varying (the dots)
Figure 4.11 Summer & Winter: Probability of infection - Constant vs Varying by ranked initial pathogen numbers groups.
[Two panels; legend: Constant, Varying; x-axis: initial pathogen numbers rank group.]
Figure 4.12 Probability of infection (no die-off) against ranked initial pathogen numbers groups: using constant vs varying parameters for Beta binomial distribution.
Figure 4.13 Comparison of the Dose-response curves for S. anatum with 95% credible intervals, estimated with & without "errors-in-variables".
4.9 Tables
Table 4.1 Settings for constant parameters
Constant | Description | Value | Derivation
α | $p(\mathrm{infection}) = 1 - (1 + \mathrm{dose}/\beta)^{-\alpha}$ | .451 | From Teunis et al. [1996]
β | | 15177 | (as above)
k_1 | Winter die-off constant (sunlight) | -.3010 | An earlier run
k_2 | Winter die-off constant (shade) | -.1237 | (as above)
k_3 | Summer die-off constant (sunlight) | -1.0390 | (as above)
k_4 | Summer die-off constant (shade) | -.6457 | (as above)
S_W | Sunlight hours (Winter) | 6.584 | Mean June 2008 (Perth)
S_S | Sunlight hours (Summer) | 11.625 | Mean January 2008 (Perth)
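As an illustration of how the constants α and β in Table 4.1 enter the dose-response formula, the following minimal Python sketch evaluates the curve at a few doses. The function name `p_infection` and the chosen doses are hypothetical and serve only as an example; they are not taken from the WinBUGS implementation.

```python
# Constant-parameter dose-response curve using the values in Table 4.1.
ALPHA = 0.451    # alpha, Table 4.1 (Teunis et al. [1996])
BETA = 15177.0   # beta, Table 4.1


def p_infection(dose, alpha=ALPHA, beta=BETA):
    """Return P = 1 - (1 + dose/beta)^(-alpha) for a non-negative dose."""
    if dose < 0:
        raise ValueError("dose must be non-negative")
    return 1.0 - (1.0 + dose / beta) ** (-alpha)


if __name__ == "__main__":
    # Illustrative doses only (hypothetical, not taken from the thesis).
    for dose in (1e2, 1e4, 1e6, 1e8):
        print(f"dose = {dose:.0e}  P(infection) = {p_infection(dose):.3f}")
```

With constant parameters the curve is a single line; the "varying" results reported in this chapter instead draw α and β from their posterior distribution, which this sketch does not attempt.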
Table 4.2 Summary statistics for p(infected) over groupings
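Group | Period | Type | Mean | std | median | q5 | q95
7 | No die-off | Constant | 0.07 | 0.001 | 0.06 | 0.03 | 0.14
7 | No die-off | Varying | 0.10 | 0.003 | 0.07 | 0.02 | 0.27
7 | Summer | Constant | 0.00 | 0.000 | 0.00 | 0.00 | 0.00
7 | Summer | Varying | 0.00 | 0.000 | 0.00 | 0.00 | 0.00
7 | Winter | Constant | 0.57 | 0.004 | 0.58 | 0.41 | 0.71
7 | Winter | Varying | 0.54 | 0.009 | 0.56 | 0.11 | 0.85
8 | No die-off | Constant | 0.31 | 0.004 | 0.31 | 0.16 | 0.48
8 | No die-off | Varying | 0.34 | 0.005 | 0.33 | 0.12 | 0.56
8 | Summer | Constant | 0.00 | 0.000 | 0.00 | 0.00 | 0.00
8 | Summer | Varying | 0.00 | 0.000 | 0.00 | 0.00 | 0.00
8 | Winter | Constant | 0.82 | 0.002 | 0.83 | 0.73 | 0.89
8 | Winter | Varying | 0.78 | 0.006 | 0.82 | 0.43 | 0.96
9 | No die-off | Constant | 0.66 | 0.003 | 0.67 | 0.52 | 0.77
9 | No die-off | Varying | 0.67 | 0.004 | 0.68 | 0.49 | 0.81
9 | Summer | Constant | 0.00 | 0.000 | 0.00 | 0.00 | 0.00
9 | Summer | Varying | 0.01 | 0.001 | 0.00 | 0.00 | 0.04
9 | Winter | Constant | 0.93 | 0.001 | 0.93 | 0.90 | 0.95
9 | Winter | Varying | 0.91 | 0.003 | 0.93 | 0.76 | 0.99
10 | No die-off | Constant | 0.86 | 0.002 | 0.86 | 0.79 | 0.91
10 | No die-off | Varying | 0.85 | 0.002 | 0.86 | 0.76 | 0.94
10 | Summer | Constant | 0.01 | 0.000 | 0.01 | 0.00 | 0.01
10 | Summer | Varying | 0.04 | 0.004 | 0.01 | 0.00 | 0.20
10 | Winter | Constant | 0.97 | 0.000 | 0.97 | 0.96 | 0.98
10 | Winter | Varying | 0.95 | 0.002 | 0.97 | 0.87 | 1.00
11 | No die-off | Constant | 0.94 | 0.001 | 0.95 | 0.92 | 0.96
11 | No die-off | Varying | 0.94 | 0.001 | 0.95 | 0.87 | 0.99
11 | Summer | Constant | 0.05 | 0.001 | 0.04 | 0.02 | 0.10
11 | Summer | Varying | 0.12 | 0.006 | 0.05 | 0.01 | 0.48
11 | Winter | Constant | 0.99 | 0.000 | 0.99 | 0.98 | 0.99
11 | Winter | Varying | 0.98 | 0.001 | 0.99 | 0.93 | 1.00
12 | No die-off | Constant | 0.98 | 0.000 | 0.98 | 0.97 | 0.99
12 | No die-off | Varying | 0.97 | 0.001 | 0.98 | 0.93 | 1.00
12 | Summer | Constant | 0.25 | 0.003 | 0.24 | 0.12 | 0.40
12 | Summer | Varying | 0.30 | 0.008 | 0.26 | 0.04 | 0.72
12 | Winter | Constant | 1.00 | 0.000 | 1.00 | 0.99 | 1.00
12 | Winter | Varying | 0.99 | 0.001 | 1.00 | 0.96 | 1.00
q5: 5th percentile of posterior distribution
q95: 95th percentile of posterior distribution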
Table 4.3 Summary statistics for groupings: Group Initial Pathogen Numbers / Doses
Group | Mean Pathogens | std | median | Min | Max
0 | 1.21 × 10^0 | 2.70 × 10^−2 | 1.06 × 10^0 | 3.70 × 10^−1 | 2.81 × 10^0
1 | 8.53 × 10^0 | 1.80 × 10^−1 | 7.37 × 10^0 | 2.82 × 10^1 | 1.99 × 10^1
2 | 7.41 × 10^1 | 1.71 × 10^0 | 6.17 × 10^1 | 1.99 × 10^1 | 1.74 × 10^2
3 | 5.68 × 10^2 | 1.23 × 10^1 | 4.81 × 10^2 | 1.75 × 10^2 | 1.28 × 10^3
4 | 4.29 × 10^3 | 9.65 × 10^1 | 3.59 × 10^3 | 1.30 × 10^3 | 1.03 × 10^4
5 | 3.89 × 10^4 | 9.36 × 10^2 | 3.20 × 10^4 | 1.03 × 10^4 | 9.64 × 10^4
6 | 3.38 × 10^5 | 7.61 × 10^3 | 2.87 × 10^5 | 9.70 × 10^4 | 7.97 × 10^5
7 | 2.75 × 10^6 | 6.19 × 10^4 | 2.31 × 10^6 | 8.00 × 10^5 | 6.40 × 10^6
8 | 2.29 × 10^7 | 5.32 × 10^5 | 1.89 × 10^7 | 6.40 × 10^6 | 5.52 × 10^7
9 | 1.83 × 10^8 | 3.96 × 10^6 | 1.59 × 10^8 | 5.53 × 10^7 | 4.14 × 10^8
10 | 1.52 × 10^9 | 3.63 × 10^7 | 1.27 × 10^9 | 4.14 × 10^8 | 3.67 × 10^9
11 | 1.16 × 10^10 | 2.50 × 10^8 | 9.88 × 10^9 | 3.68 × 10^9 | 2.69 × 10^10
12 | 8.35 × 10^10 | 1.72 × 10^9 | 7.28 × 10^10 | 2.70 × 10^10 | 1.89 × 10^11
13 | 6.09 × 10^11 | 1.35 × 10^10 | 5.01 × 10^11 | 1.90 × 10^11 | 1.38 × 10^12
14 | 4.57 × 10^12 | 1.0 × 10^11 | 3.93 × 10^12 | 1.38 × 10^12 | 1.07 × 10^13
For the No die-off results, the doses are 1/1000th of these
Bibliography
Brandl, M. T. and R. Amundson (2008). Leaf age as a risk factor in contamination of lettuce with
Escherichia coli O157:H7 and Salmonella enterica. Applied and Environmental Microbiology 74(8),
2298–2306.
Bureau of Meteorology (2010, April 15). 2010JR12235 *** Student/Request for Data, Forecasts
or other services/wa/Climate and Past Weather*** (JR- [SEC=UNCLASSIFIED]. email: cli-
Cullen, A. C. and H. C. Frey (1999). Probabilistic techniques in exposure assessment: a handbook for
dealing with variability and uncertainty in models and inputs. New York: Plenum Press.
Delignette-Muller, M. L., M. Cornu, R. Pouillot, and J. B. Denis (2006). Use of Bayesian modelling
in risk assessment: Application to growth of Listeria monocytogenes and food flora in cold-smoked
salmon. International Journal of Food Microbiology 106(2), 195–208.
Fuller, W. A. (1987). Measurement error models. New York: Wiley.
Gerba, C. P., N. C.-d. Campo, J. P. Brooks, and I. L. Pepper (2008). Exposure and risk assessment of
Salmonella in recycled residuals. Water Science & Technology 57(7), 1061–1065.
Gibbs, R. A. (1995). Die-off of human pathogens in stored wastewater sludge and sludge applied to
land. Technical report, Urban Water Research Association of Australia, Water Services Association
of Australia, Melbourne.
Gibbs, R. A. and G. E. Ho (1993). Health risks from pathogens in untreated wastewater sludge: implica-
tions for Australian sludge management guidelines. Water 20(1), 17–22.
Gibbs, R. A., C. J. Hu, G. E. Ho, P. A. Phillips, and I. Unkovich (1995). Pathogen die-off in stored
wastewater sludge. Water Science & Technology 31(5-6), 91–95.
Haas, C. N. (1999). On modeling correlated random variables in risk assessment. Risk Analysis 6,
1205–1214.
Haas, C. N., J. B. Rose, and C. P. Gerba (1999). Quantitative Microbial Risk Assessment. New York:
Wiley.
Hijnen, W. A., Y. J. Dullemont, J. F. Schijven, A. J. Hanzens-Brouwer, M. Rosielle, and G. Medema
(2007). Removal and fate of Cryptosporidium parvum, Clostridium perfringens and small-sized centric
diatoms (Stephanodiscus hantzschii) in slow sand filters. Water Research 41, 2151–2162.
Hijnen, W. A. M., E. Beerendonk, and G. J. Medema (2005). Elimination of micro-organisms by drinking
water processes a review. Technical report, Kiwa N.V., Nieuwegein, The Netherlands.
Hijnen, W. A. M., E. Beerendonk, P. Smeets, and G. J. Medema (2004). Elimination of micro-organisms
by water treatment processes. Technical report, Kiwa N.V., Nieuwegein, The Netherlands.
Kelly, D. L. and C. L. Smith (2009). Bayesian inference in probabilistic risk assessment: The current
state of the art. Reliability Engineering & System Safety 94(2), 628–643. doi:
10.1016/j.ress.2008.07.002.
Lunn, D. J., A. Thomas, N. Best, and D. Spiegelhalter (2000). WinBUGS - A Bayesian modelling
framework: Concepts, structure, and extensibility. Statistics and Computing 10(4), 325–337.
Marks, H. M., M. E. Coleman, C. T. J. Lin, and T. Roberts (1998). Topics in microbial risk assessment:
Dynamic flow tree process. Risk Analysis 18(3), 309–328.
McCullough, N. B. and C. W. Eisele (1951). Experimental human salmonellosis: I. pathogenicity of
strains of Salmonella meleagridis and Salmonella anatum obtained from spray-dried whole egg. The
Journal of Infectious Diseases 88(3), 278–289.
Messner, M. J., C. L. Chappell, and P. C. Okhuysen (2001). Risk assessment for Cryptosporidium: A
hierarchical Bayesian analysis of human dose response data. Water Research 35(16), 3934–3940.
Natural Resource Management Ministerial Council, Environment Protection and Heritage Council,
and Australian Health Ministers Conference (2006). Australian Guidelines for Water Recycling:
Managing health and environmental risks (Phase 1) 2006. Available on:
www.ephc.gov.au/taxonomy/term/39. Accessed: March 29, 2008.
Oscar, T. (2004). Dose-response model for 13 strains of Salmonella. Risk Analysis 24(1), 41–49.
Palisade Corporation (2008). @RISK 5.0. Available online: www.palisade.com/risk/. Accessed:
October 22, 2009.
Paulo, M. J., H. v. d. Voet, M. J. W. Jansen, C. J. F. t. Braak, and J. D. v. Klaveren (2005). Risk assessment
of dietary exposure to pesticides using a Bayesian method. Pest Management Science 61(8), 759–766.
Pouillot, R., P. Beaudeau, J.-B. Denis, and F. Derouin (2004). A quantitative risk assessment of water-
borne Cryptosporidiosis in France using second-order Monte Carlo simulation. Risk Analysis 24(1),
1–17.
Qian, S. S., C. A. Stow, and M. E. Borsuk (2003). On Monte Carlo methods for Bayesian inference.
Ecological Modelling 159, 269.
Sidhu, J. P. S., J. Hanna, and S. G. Toze (2008). Survival of enteric microorganisms on grass surfaces
irrigated with treated effluent. Journal of Water and Health 06(2), 255–262.
Sinton, L., C. Hall, and R. Braithwaite (2007). Sunlight inactivation of Campylobacter jejuni and
Salmonella enterica, compared with Escherichia coli, in seawater and river water. Journal of Wa-
ter and Health 5(3), 357–365.
Tanaka, H., T. Asano, E. D. Schroeder, and G. Tchobanoglous (1998). Estimating the safety of wastew-
ater reclamation and reuse using enteric virus monitoring data. Water Environment Research 70(1),
39–51.
Teunis, P. F. M., G. J. Medema, L. Kruidenier, and A. H. Havelaar (1997). Assessment of the risk
of infection by Cryptosporidium or Giardia in drinking water from a surface water source. Water
Research 31, 1333–1346.
Teunis, P. F. M., O. van der Heijden, J. W. B. van der Giessen, and A. H. Havelaar (1996). The dose-
response relation in human volunteers for gastro-intestinal pathogens. Technical report, National In-
stitute of Public Health and the Environment (RIVM), Bilthoven, The Netherlands.
Wand, M. P. (2009). Semiparametric and graphical models. Australian and New Zealand Journal of
Statistics 51(1), 9–41.
Whiting, R. C. and R. L. Buchanan (1997). Development of a quantitative risk assessment model for
Salmonella enteritidis in pasteurized liquid eggs. International Journal of Food Microbiology 36,
111–125.
Statement of Contribution of Co-Authors for Thesis by Publication
The authors listed below certify that:
1. they meet the criteria for authorship, in that they have participated in the conception, execution, or interpretation, of at least that part of the publication in their field of expertise;
2. they take public responsibility for their part of the publication, except for the responsible author who accepts overall responsibility for the publication;
3. there are no other authors according to these criteria;
4. potential conflicts of interest have been disclosed to (a) granting bodies, (b) the editor or publisher of journals or other publications, and (c) the head of the responsible academic unit, and
5. they agree to the use of the publication in the student's thesis and its publication on the Australasian Digital Thesis database consistent with any limitations set by publisher requirements.
In the case of Chapter 5:
Title: A Bayesian analysis of an agricultural field trial with three spatial dimensions
Journal: Computational Statistics and Data Analysis Status: Submitted September 2010
Contributor: Statement of Contribution
Margaret Donald: As first author, was responsible for the concept of the paper, data analysis, interpretation and the writing of all drafts.
Dr Clair Alston: Was responsible for advice on measurements and their meaning and editorial comment.
Rick Young: Was responsible for advice on the purpose and background to the field trial, advice on the meaning of statistical results and editorial comment.
Professor Kerrie Mengersen: Was responsible for general advice and editorial comment.
Principal Supervisor's Confirmation
I have sighted email or other correspondence from all co-authors confirming their certifying authorship.
Name Signature Date
Chapter 5
A Bayesian analysis of an agricultural
field trial with three spatial dimensions
5.1 Preamble
This chapter satisfies research objective (3), where we aimed to build a satisfactory complex model for
a single day's data from the agricultural trial. To that end, we consider various models for these
data. We compare many potential adjacency structures for a CAR model
[Besag, 1974, Besag et al., 1991] to model spatial autocorrelation in the data, as well as the AR1, AR1
model of Gilmour et al. [1997]. For the fixed part of the model, we consider orthogonal polynomials,
linear splines, cubic splines and cubic radial bases to model the treatment curves along the depth dimen-
sion. Knowing that the measured depth does not measure the depth in the soil profile [Ringrose-Voase
et al., 2003], we also introduce an errors-in-variables model for modelling the treatment curves along the
depth dimension.
This chapter has been written as a journal article, of which I am the first author, and is presented here
in its entirety. It is reprinted with its abstract, but with bibliographic conventions different from those of
Computational Statistics and Data Analysis, to which it was submitted in September 2010. Rick
Young provided the data and helped with all things agricultural, in addition to providing editorial comment.
Clair Alston provided major editorial advice in addition to advice on the collection and meaning of
the data. Kerrie Mengersen oversaw, helped with, and guided the exposition. As first author, I was
responsible for the concept of the paper, the data analysis, interpretation and the writing of all drafts as well
as the final version.
Title: A Bayesian analysis of an agricultural field trial with three spatial dimensions
Authors: Margaret Donald^a, Clair Alston^a, Rick Young^b, Kerrie Mengersen^a.
^a School of Mathematical Sciences, Queensland University of Technology, GPO Box 2434, Brisbane,
QLD 4001, Australia.
^b Tamworth Agricultural Institute, Industry & Investment NSW, 4 Marsden Park Road, Calala, NSW
2340, Australia.
Bibliography
Besag, J. E. (1974). Spatial interaction and the statistical analysis of lattice systems (with discussion). J.
R. Statist. Soc. B 36(2), 192–236.
Besag, J. E., J. York, and A. Mollie (1991). Bayesian image restoration with applications in spatial
statistics (with discussion). Annals of the Institute of Statistical Mathematics 43, 1–59.
Gilmour, A. R., B. R. Cullis, and A. P. Verbyla (1997). Accounting for natural and extraneous variation
in the analysis of field experiments. Journal of Agricultural Biological and Environmental Statistics 2,
269–293.
Ringrose-Voase, A., R. R. Young, Z. Payder, N. Huth, A. Bernardi, H. Cresswell, B. Keating, J. Scott,
M. Stauffacher, R. Banks, J. Holland, R. Johnston, T. Green, L. Gregory, I. Daniells, R. Farquharson,
R. Drinkwater, S. Heidenreich, and S. Donaldson (2003). Deep drainage under different land uses
in the Liverpool Plains Catchment. Technical Report 3, Agricultural Resource Management Report
Series, NSW Agriculture Orange.
5.2 A Bayesian analysis of an agricultural field trial with three spatial dimensions
Abstract
Modern technology now has the ability to generate large datasets over space and time. These data
typically exhibit high autocorrelations over all dimensions. Generally, in the statistical modelling of such
data, modelling across time is made independent of the spatial dimensions. In a like manner, in three
dimensional space, when measurements are made over widely differing distances across the different
dimensions, it seems that better models may be fitted when the various spatial dimensions are separated.
Here, using an example of agricultural data collected over three dimensions, we see that the better model
is that in which depth is separated from the modelling within the horizontal layers. The field trial data
motivating the methods were collected to examine the behaviour of traditional cropping and to determine
a cropping system which could maximise water use for grain production while minimising leakage below
the crop root zone. They consist of moisture measurements made at 15 depths across 3 rows and 18
columns, in the lattice framework of an agricultural field.
Bayesian Conditional Autoregressive models are used to account for local site correlations. Con-
ditional autoregressive models have not been widely used in analyses of agricultural data. This paper
serves to illustrate the usefulness of these models in this field, along with the ease of implementation in
WinBUGS, a freely available software package.
The innovation is the fitting of separate conditional autoregressive models for each depth layer, while
simultaneously estimating depth profile functions for each site treatment. Modelling interest lay in how
best to model the treatment effect depth profiles, and in the choice of neighbourhood structure for the
spatial autocorrelation model. The favoured model fitted the treatment effects as splines over depth, and
treated depth, the basis for the regression model, as measured with error, while fitting CAR
neighbourhood models by depth layer. It is hierarchical, with separate conditional autoregressive spatial
variance components at each depth; the fixed terms involve an errors-in-measurement model which treats
depth errors as interval-censored measurement error. The Bayesian framework permits transparent
specification and easy comparison of the various complex models.
Keywords
Bayesian, Conditional Autoregressive (CAR) models, Cubic radial bases, Errors-in-variables, Field
trial, Latent variables, Markov Chain Monte-Carlo (MCMC), Markov random field (MRF), Orthogonal
polynomials, Spatial autocorrelation, Splines, Variance components.
5.3 Introduction
In the past 20 years there has been a large uptake of Bayesian methods in many scientific fields, but
this trend is less prevalent in agriculture. The paper of Besag and Higdon [1999], which demonstrated
Bayesian methods for an agricultural field trial, has been cited approximately 360 times, but of these,
just 2 were published in an agricultural journal, and none analysed an agricultural field trial. Similarly,
in a search of Web of Science on July 31, 2009, 545 papers were found which cited Besag et al. [1991],
a seminal paper for conditional autoregressive (CAR) modelling. Of these, the majority (341) related to
health, disease or death in humans or animals, and again, none dealt with an agricultural field trial.
Almost 20 years on from Besag et al. [1991], we re-examine the advantages of the conditional au-
toregressive (CAR) or Markov random field (MRF) models, described by Besag [1974] and elaborated
by him and co-workers in Besag et al. [1991] and applied to field trial data in Besag et al. [1995], Besag
and Higdon [1999]. Readily available software for CAR models allows simple specification of complex
random components, and simple calculation of complex quantities based on the model, permitting the
analyst to consider many differing models.
This paper is motivated by an increasing problem in agriculture, that of understanding the impact
of cropping regimes on water and, concomitantly, soil salinity. In many parts of the world the viability
of rainfed grain cropping is threatened by salination of land and water resources. Salination is caused
by excessive deep drainage below the plant root zone which mobilises sometimes vast sub-soil stores
of salt deposited at the time of soil formation. Deep drainage occurs when rain infiltrates already wet
soil that has insufficient capacity to store the additional water. This excess saline water may produce
water logging and shallow saline water tables, or may discharge at lower points in the landscape, or into
surface- or ground-waters [Broughton, 1994]. When saline ground waters encroach on the crop root zone,
the salt kills germinating crops or reduces yields depending on salt concentrations and rainfall [Daniells
et al., 2001]. The excess water is usually due to a combination of above average rainfall falling onto
land farmed using long fallow cropping practices, that is, the land is kept as bare fallow for about 2/3
of the time. Although long-fallow cropping usually results in good grain yields for each crop, average
yields over time are generally less than yields from more intensive, but somewhat more risky systems.
To overcome both the problem of excess water in the landscape under long fallow cropping, and the risk
of poor crop yields due to insufficient water supply between successive crops when cropping is frequent,
farmers are increasingly adopting opportunity (or response) cropping: planting a crop appropriate for the
time of year, crop health and economic considerations, in response to soil water content. When data are
collected to consider the impact of cropping regimes, a substantive challenge is the description of water
patterns over space and depth.
This paper examines a single data set from a randomised complete block experiment, which comprises
soil moisture measurements taken in three spatial dimensions: row, column and depth. The presence
of spatial correlation is demonstrated, and various ways of modelling it are considered. We consider
several conditional autoregressive (CAR) models [Besag, 1974] with a complex variance structure. We
also consider an AR(1), AR(1) model of the kind used as a base model by Gilmour et al. [1997],
fitted here using Markov Chain Monte Carlo Gibbs sampling, as well as kriging models [Cressie, 1991].
Various models for the treatment effects along the depth dimension are considered, and include or-
thogonal polynomials, linear and cubic splines, and in conjunction with the splines, an errors-in-variables
model for depth to account for shrinkage of soils on drying, and expansion on wetting.
The notable feature of the final models chosen is that the data are best modelled using CAR models in
two dimensions rather than three.
5.4 Methods
5.4.1 Data
To test the efficiency of water use and the productivity of response cropping compared with long fal-
low systems and traditional continuous winter cropping, a field experiment was established on a deep,
well structured and well drained, non-saline, cracking clay soil (Black Vertosol) in the upper reaches of
the Liverpool Plains catchment in New South Wales in south-eastern Australia. Accurate measurement
of soil water content was critical to the success of this work. This was measured from access tubes at
measurement sites in each of 18 experimental plots over 15 equal depth increments to 3 m. The neutron
scattering count [Ringrose-Voase et al., 2003] was taken at each depth in the access tube and controlled
by the neutron count taken from an access tube fitted into a drum of water after each set of field measure-
ments. The surrogate measurement for moisture used in the analysis here is the log transformation of the
ratio of raw neutron count data to the control count. The measurement sites were arranged as 3 rows with
18 columns per row. Thus, the dimensions row, column, and depth are essentially orthogonal.
Nine experimental treatments consisting of three fully phased cropping systems and three types of
perennial pasture were allocated as a randomised complete block design to the 18 plots each containing
3 of the 54 measurement sites [Ringrose-Voase et al., 2003, p23]. Treatments are described as follows.
1. Treatments 1-3. Long fallow wheat/sorghum rotation, where one wheat and one sorghum crop are
grown in three years with an intervening 10-14 month fallow period. The 3 treatments were each
of 3 phases of the long fallow 3 strip system. ‘Long Fallow 1’ started with wheat in the winter of
1994 followed by sorghum in summer, 1996. ‘Long Fallow 2’ started with sorghum in the summer
of 1995, and ‘Long Fallow 3’ started with wheat in the winter of 1995.
2. Treatment 4. Continuous cropping in winter with wheat and barley grown alternately.
3. Treatments 5 and 6. Response cropping, where an appropriate crop (either a winter crop or a
summer crop) was planted when the depth of moist soil exceeded a predetermined level. The two
response cropping treatments were differentiated by the sequence of crop types.
4. Treatments 7-9. Perennial pastures. The 3 treatments were lucerne (a deep rooted perennial forage
legume with high water use potential), lucerne grown with a winter growing perennial grass, and a
mixture of winter and summer growing perennial grasses.
The data used here were from the third year of the experiment, when the treatments had bedded down
and measurements reflected treatment effects.
Rainfall in the 6 months preceding these moisture measurements was very high (802 mm, compared
to the annual average of 684 mm). As a result, winter crops in treatments 3-7 had not substantially
depleted the stored soil water compared with previous years. The sorghum crop in treatment 1 had
recently been planted into a fully recharged soil profile. Plots for treatment 2 were lying fallow. In
contrast, pasture plots were much drier as they were depleted by 300-600 mm coming into the winter.
The volumetric water content calculated using the neutron moisture meter calibration equations from
Ringrose-Voase et al. [2003] of the soils ranged from 26% under lucerne to 55% under long fallow in the
surface 50 cm, illustrating the large water holding capacity of these soils.
5.4.2 Spatial Correlation
A well-known problem of agricultural field trials is the variability in the fertility, physical and hydraulic
characteristics of the landscape on which the experiment is sited. In a field trial, spatial correlation in the
observations is expected since we anticipate that soil and drainage conditions of neighbouring plots are
likely to be more similar than those of plots further away. To reduce bias in measurements, treatments
within blocks are allocated at random. However, random allocation does not necessarily eliminate the
problem, particularly in small experiments. In addition, although the treatments form a randomised
complete block design, there are three measurements within each treatment plot. These measurements
within the plot are expected to be highly spatially correlated.
A further complication is that the treatment effect, moisture, is modelled as a fixed function of depth
(Section 5.4.3). With a poor fixed effects model, we could expect residuals for each site depth profile
to become autocorrelated. We used a potentially overfitted model (the 8 degree orthogonal polynomial)
as a base model for the comparisons of neighbourhood models (Table 5.1). Testing the base model
for autocorrelations, using SAS PROC AUTOREG [SAS Institute, 2004] showed autocorrelation in the
depth residuals for 2 of the 54 sites at the 1% significance level, whether 6, 8 or 10 degree polynomials
were used. Hence, any of these polynomial models seemed a reasonable choice for a base model.
In preliminary analyses we fitted kriging models [Banerjee et al., 2004, Cressie, 1991, Gotway and
Cressie, 1990] to the residuals by depth layer, after fitting a saturated model of all treatments by depths
(135 fixed terms). We considered covariograms by depth layer, and additionally allowed for anisotropy
across row and column. However, neither anisotropic nor isotropic kriging models appeared satisfactory:
for all depths, no covariogram showed evidence of being a function of increasing distance. Banerjee et al.
[2004], Stefanova et al. [2009, p73] indicate that such evidence is not always reliable.
Spatial correlation via a local neighbourhood definition was also considered. Moran’s I [Banerjee
et al., 2004, pp72–73] was used to explore the spatial association of the residuals described above, using
a neighbourhood matrix of equal weights for first-order neighbours in the same depth layer. Table 5.2
shows this as being statistically significant at the 10% level for ten of the fifteen depths. Neighbourhoods
were largely defined by depth layer because of the very great difference in scale of depth compared with
the distances across row and columns.
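As a rough illustration of the statistic used here, the following Python sketch computes Moran's I for one depth layer of the 3 × 18 lattice, using equal (0,1) weights for first-order neighbours within the layer. The function names, the permutation test and the synthetic values are assumptions made for this example; they are not the procedure or the data behind Table 5.2.

```python
import numpy as np


def rook_weights(n_rows, n_cols):
    """Binary first-order (row/column adjacent) neighbour matrix for a lattice."""
    n = n_rows * n_cols
    W = np.zeros((n, n))
    for r in range(n_rows):
        for c in range(n_cols):
            i = r * n_cols + c
            if r + 1 < n_rows:            # neighbour in the next row
                j = (r + 1) * n_cols + c
                W[i, j] = W[j, i] = 1.0
            if c + 1 < n_cols:            # neighbour in the next column
                j = r * n_cols + c + 1
                W[i, j] = W[j, i] = 1.0
    return W


def morans_i(x, W):
    """Moran's I = (n / sum(W)) * (z' W z) / (z' z), with z the centred values."""
    z = np.asarray(x, dtype=float) - np.mean(x)
    return (z.size / W.sum()) * (z @ W @ z) / (z @ z)


def permutation_pvalue(x, W, n_perm=999, seed=0):
    """One-sided p-value for positive association, by permuting site labels."""
    rng = np.random.default_rng(seed)
    observed = morans_i(x, W)
    hits = sum(morans_i(rng.permutation(x), W) >= observed for _ in range(n_perm))
    return (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    W = rook_weights(3, 18)
    # Synthetic residuals with a smooth trend along the columns (illustrative only).
    residuals = np.tile(np.arange(18, dtype=float), 3)
    print(morans_i(residuals, W), permutation_pvalue(residuals, W))
```

In practice the function would be applied to the residuals of each of the fifteen depth layers in turn, giving one statistic per layer.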
In the light of the results of the Moran’s I analysis, conditional autoregressive (CAR) models were
adopted. They were more flexible than kriging models, since they employ sparse precision matrices [Rue
and Held, 2005], a far more efficient computational tool, rather than the dense covariance matrices used
in kriging. Recent research [Besag and Mondal, 2005, Lindgren et al., 2010] indicates that the distinction
between kriging and CAR models is more apparent than real.
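To make the remark about sparse precision matrices concrete, the sketch below builds the precision matrix of one common CAR variant, the intrinsic CAR, for a single 3 × 18 depth layer: Q = τ(D − W), where W is the (0,1) neighbour matrix and D is the diagonal matrix of neighbour counts. This is an assumption made for illustration; the models fitted in this paper may use a different CAR specification, and the function names are hypothetical.

```python
import numpy as np


def rook_adjacency(n_rows, n_cols):
    """Binary first-order neighbour matrix for an n_rows x n_cols lattice."""
    n = n_rows * n_cols
    W = np.zeros((n, n))
    for r in range(n_rows):
        for c in range(n_cols):
            i = r * n_cols + c
            if r + 1 < n_rows:
                W[i, i + n_cols] = W[i + n_cols, i] = 1.0
            if c + 1 < n_cols:
                W[i, i + 1] = W[i + 1, i] = 1.0
    return W


def icar_precision(W, tau=1.0):
    """Intrinsic CAR precision matrix Q = tau * (D - W), D = diag(row sums of W)."""
    D = np.diag(W.sum(axis=1))
    return tau * (D - W)


if __name__ == "__main__":
    Q = icar_precision(rook_adjacency(3, 18))
    nonzero = np.count_nonzero(Q)
    print(f"{nonzero} of {Q.size} entries are non-zero")
```

Only the diagonal and the entries for neighbouring pairs are non-zero, whereas a kriging covariance matrix for the same sites would generally be dense.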
Neighbourhoods and weights
CAR models are based on neighbourhoods. In a three dimensional space, there are many potential
choices for the neighbourhood of a point. A point in space could, for example, be thought of as being
surrounded by 26 neighbours of a 3 × 3 box, or by its first order neighbours in both spatial layer and
depth (6 neighbours). The major innovation of this paper is that, recognising the great differences in scale
between measurements in a horizontal layer and those at different depths, the neighbourhoods compared
were largely within the same depth layer.
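The neighbourhood definitions mentioned above can be sketched as follows for the 3 × 18 × 15 lattice of rows, columns and depths. The function name and the labels "layer", "first_order_3d" and "box" are hypothetical and chosen for this illustration; the sketch is not the adjacency code used to fit the models.

```python
from itertools import product

SHAPE = (3, 18, 15)  # rows, columns, depth layers


def neighbours(site, kind, shape=SHAPE):
    """List the neighbours of site = (row, column, depth) under one definition.

    kind = "layer":          first-order neighbours within the same depth layer
    kind = "first_order_3d": first-order neighbours in layer and depth (up to 6)
    kind = "box":            all sites in the surrounding 3 x 3 x 3 box (up to 26)
    """
    result = []
    for offset in product((-1, 0, 1), repeat=3):
        steps = sum(abs(v) for v in offset)
        if steps == 0:
            continue  # the site itself
        if kind == "layer":
            keep = steps == 1 and offset[2] == 0
        elif kind == "first_order_3d":
            keep = steps == 1
        elif kind == "box":
            keep = True
        else:
            raise ValueError(f"unknown neighbourhood kind: {kind}")
        candidate = tuple(s + o for s, o in zip(site, offset))
        # Discard candidates that fall outside the lattice.
        if keep and all(0 <= v < n for v, n in zip(candidate, shape)):
            result.append(candidate)
    return result


if __name__ == "__main__":
    interior = (1, 5, 7)
    for kind in ("layer", "first_order_3d", "box"):
        print(kind, len(neighbours(interior, kind)))
```

Sites on the edge of the lattice have fewer neighbours under every definition, which a CAR model accommodates through the neighbour counts.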
As discussed in Section 5.4.2, CAR models depend not only on the definition of neighbourhood,
but also on the weights given to neighbours, which may often be distance based. Distance weights were
not used for the following reasons. For these data, neighbouring measurements across depth are 20
cm apart, while neighbouring measurements along rows are roughly 10 m apart when the measurement
sites are in the same subplot and roughly 30 metres apart when not. Along the columns, the distances
between neighbours are approximately 20 metres. Suppose that the reciprocal of the distance between
neighbouring measurement points is used as the weight. This would mean that points within a block
would weigh about 3 times as much as adjacent points not in the block, thus making averaging across
neighbours closer to averaging across a block. If depth neighbours were included, a depth neighbour
would weigh 50 times more than a neighbour within the block and 150 times more than a neighbour
from a neighbouring plot. This would effectively reduce the spatial analysis to a single dimension: depth,
and make it a depth neighbourhood analysis only. Distance weights could be discounted by raising the
distance to a fractional power, but any discounting power is arbitrary, and would increase the number of
models to be distinguished between, without adding insight.
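The arithmetic behind these ratios can be checked with a short sketch. It assumes only the approximate spacings quoted above, converted to metres; the dictionary keys and the helper `weight_ratio` are illustrative names and not part of the original analysis.

```python
# Approximate spacings quoted in the text, converted to metres.
spacing_m = {
    "depth": 0.2,             # 20 cm between depth layers
    "row_same_block": 10.0,   # along a row, same subplot
    "row_other_block": 30.0,  # along a row, different subplot
    "column": 20.0,
}

# Reciprocal-distance weights, w = 1 / distance.
weights = {name: 1.0 / dist for name, dist in spacing_m.items()}


def weight_ratio(a, b):
    """How many times heavier neighbour type a is than neighbour type b."""
    return weights[a] / weights[b]


print(round(weight_ratio("row_same_block", "row_other_block"), 6))
print(round(weight_ratio("depth", "row_same_block"), 6))
print(round(weight_ratio("depth", "row_other_block"), 6))
```

The three printed ratios correspond to the factors of about 3, 50 and 150 discussed above.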
The choice of neighbourhood weights as (0,1), i.e., a neighbour or not a neighbour, allows a simple
choice of the best adjacency model, independent of weights. If distance weights were to be used, it would
be unclear whether it is the weights or the adjacency definition changing the fit of the model. For example,
a second-order neighbour model rejected under a (0,1) weighting scheme may not differ greatly from the
first-order neighbour model when distance weights are used, since the distance weights will discount the
second-order neighbours. We preferred to determine appropriate neighbourhoods over which to average,
independent of weights.
Depth neighbours were used in one CAR neighbourhood model only (Table 5.1), as it was possible
that after accounting for treatment effects using depth (Section 5.4.3), neighbouring depth residuals might
be correlated, either naturally or because a poorly specified fixed effects model induced depth correlation.
In weighting horizontal neighbours equally, the differences of scale between rows and columns are
ignored. Gilmour et al. [1997] deal with this by using a base model with AR(1) modelling for row
and column, while Besag and Higdon [1999], Besag and Mondal [2005] use differently weighted row
and column CAR neighbourhood models in which the weights are estimated. Both modelling strategies
recognise anisotropy, since a priori, it is unclear whether two experimental sites in adjacent rows which
are physically closer will have greater spatial correlation than two sites cultivated in the same column
but further apart. Under WinBUGS, weights may not be random quantities to be estimated, but must be
fixed. Our priority was therefore to determine a suitable neighbourhood.
CAR spatial models
Let the sites be indexed by $i = 1, \ldots, 54$, at depths 20, 40, ..., 300 cm (indexed by $d = 1, \ldots, 15$). The moisture
value, $y_{i,d}$, is modelled using nine depth functions, $f_{j(i)}(d)$, one for each treatment $j$, determined by the site
index, $i$. These are functions of the depth (indexed by $d$). The residual from this fixed effects model is
modelled as the sum of a spatial residual component, $s_{i,d}$, and a non-spatial residual component, $\epsilon_{i,d}$, with
$\epsilon_{i,d} \sim N(0, \tau^2)$. The conditional mean of the spatial residual component is a weighted average of the
neighbouring spatial residuals [Besag and Kooperberg, 1995, Besag et al., 1991]. This local spatial smoothing
specification ensures a global specification via Brook's lemma [Banerjee et al., 2004], and allows us to account
for spatial similarities.
For site $i$ at depth $d$, the full model is
$$y_{i,d} = f_{j(i)}(d) + s_{i,d} + \epsilon_{i,d},$$
where $f_{j(i)}(d)$ is the treatment effect for treatment $j$ at site $i$ and depth index $d$, and is a function of
depth. The conditional distribution of the spatial residual component, $s_{i,d}$, given its neighbours, $s_{k,d}$, is
$$s_{i,d} \mid s_{k,d},\, k \in \partial_i \;\sim\; N\!\left( \sum_{k \in \partial_i} \frac{w_{ik}\, s_{k,d}}{w_{i+}},\; \frac{\sigma^2_d}{w_{i+}} \right),$$
where $\partial_i$ is the set of indices for the neighbours of site $i$, $w_{ik}$ is the weight of the $k$th neighbour of $i$,
$w_{i+}$ is the sum of the weights of the neighbours of $i$, $\sigma^2_d$ is a variance component for the CAR model
at depth $d$, and $\tau^2$ is a common homogeneous variance component across all depths.
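As an illustration of the conditional specification, the following sketch computes the conditional mean and variance of one spatial residual under (0,1) weights. The 6 × 9 grid layout, the residual values and the function names are assumptions made for the example only; the actual field layout is not taken from the text.

```python
def grid_neighbours(n_rows, n_cols):
    """First-order (row/column adjacent) neighbours on an n_rows x n_cols grid."""
    neighbours = {}
    for r in range(n_rows):
        for c in range(n_cols):
            adjacent = []
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < n_rows and 0 <= cc < n_cols:
                    adjacent.append(rr * n_cols + cc)
            neighbours[r * n_cols + c] = adjacent
    return neighbours


def car_conditional(site, s, neighbours, sigma2_d):
    """Conditional mean and variance of s[site] given its neighbours (0/1 weights)."""
    nbrs = neighbours[site]
    w_plus = len(nbrs)  # sum of binary weights equals the number of neighbours
    mean = sum(s[k] for k in nbrs) / w_plus
    return mean, sigma2_d / w_plus


# Hypothetical 6 x 9 arrangement of the 54 sites within one depth layer.
neighbours = grid_neighbours(6, 9)
s = [0.1 * k for k in range(54)]  # made-up spatial residuals
mean, var = car_conditional(10, s, neighbours, sigma2_d=2.0)
```

With binary weights the conditional mean is the plain average of the neighbouring residuals, and the conditional variance shrinks as the number of neighbours grows.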
The majority of neighbourhood models compared are described by the CAR model given above.
However, two further random component models were fitted, one of which included a CAR model with
depth neighbours which therefore could only be given a single spatial variance component ($\sigma^2$). This
model allowed the final homogeneous residual term $\epsilon_{i,d}$ to be dependent on depth, with $\epsilon_{i,d} \sim N(0, \tau^2_d)$.
The final model considered had first-order neighbours in the same depth layer, with 15 spatial variance
components and 15 homogeneous variance components, one for each depth.
5.4.3 Treatment (fixed) effects
Interest lies in describing moisture as a function of both depth and treatment. For this reason, we model
depth effects using fixed terms in the model, rather than incorporating depth neighbours into the random
CAR component of the model.
Preliminary analyses in which possible spatial correlation was ignored showed that the data could
be described in terms of five groupings of the treatments using orthogonal polynomials of up to degree
8 for at least some of the groupings. However, we chose to fit polynomials and spline models for all 9
treatments, to permit all treatment effects to be seen, and made final comparisons across the groupings.
Soil moisture measurements were considered to be part of a continuum. To take advantage of
this continuity it was thought reasonable to approximate the treatment effects as continuous, preferably
smooth, functions of depth. We fitted orthogonal polynomials, linear and cubic splines, and cubic radial
bases to the depth, allowing all curves to vary across the bases by treatment.
We compared 9 treatment polynomials of degree 10, 8 and 6, with linear spline models having 3-5
internal knots, and cubic splines and cubic radial bases with 5 interior knots.
For model choice the Deviance Information Criterion (DIC) [Spiegelhalter et al., 2002] was used to
compare the goodness of fit of the various models with their differing fixed and random effects.
Each of these models may be expressed in the following way: At site, i, depth indexed by d, with
treatment j,
$$y_{i,d} = f_{j(i)}(d) = X\beta \quad \text{(in the case of the models used)},$$
with $X$ being a design matrix based on the treatments $j(i)$, and the basis functions of the depth index, $d$.
Spline and cubic radial bases models
Treatment effects across the depths were modelled as linear splines with varying numbers of knots (from
3-5), linear splines with 5 knots and depth considered to be measured with error, and cubic radial basis
functions [Ngo and Wand, 2004] with and without measurement error in depth. We chose to fit a measurement
error model [Fuller, 1987, Wand, 2009] as an alternative to the adjustment to depth used by
Ringrose-Voase et al. [2003] to account for soil shrinkage/expansion under drier/wetter conditions. While depth is
measured accurately, the depth within the soil profile is not, and it is the depth within the soil profile
which is of interest to the soil scientists. Additionally, fitting an errors-in-variables model allows the
possibility of a better fit for the spline model, and also provides a method for dealing with any residual
depth correlations. For the Errors-in-variables model, see Section 5.4.3.
For the 5 knot model, 5 equally spaced internal knots were chosen at d = 3.33, 5.67, 8, 10.33 and
12.67 on the depth index scale d = 1, ..., 15. In this type of semi-parametric modelling, knots are typically chosen at quantiles of
the data. Thus, with equally spaced observations over depth, equally spaced knots are appropriate. The fit
using the cubic radial bases did not appear to improve with an increasing number of functions. Penalised
linear and cubic splines were fitted as described by Wand [2009].
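The knot positions quoted above can be reproduced with a short sketch, which also builds a linear spline basis over the 15 depth indices. The truncated-line form $(d - \kappa)_+$ is assumed here as one common way of writing a linear spline basis; the function names are illustrative, and the penalisation of the coefficients is not shown.

```python
def equally_spaced_knots(lower, upper, n_knots):
    """Interior knots that split [lower, upper] into n_knots + 1 equal intervals."""
    step = (upper - lower) / (n_knots + 1)
    return [lower + step * (k + 1) for k in range(n_knots)]


def linear_spline_row(d, knots):
    """Truncated-line basis for one depth index: [1, d, (d - k1)+, ..., (d - kK)+]."""
    return [1.0, float(d)] + [max(0.0, d - k) for k in knots]


knots = equally_spaced_knots(1, 15, 5)
print([round(k, 2) for k in knots])

# One basis row per depth index d = 1, ..., 15.
basis = [linear_spline_row(d, knots) for d in range(1, 16)]
```

Because the truncated terms depend on the depth value, replacing d by a latent depth changes the basis, which is why such bases would need recomputing inside the sampler under an errors-in-variables model.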
Additionally, cubic radial basis functions as defined in Ngo and Wand [2004] were fitted. These
involve the inversion of a matrix, and the use of matrix algebra. However, this matrix is fixed once the
knots have been chosen. Thus, the changing bases implied by an errors-in-variables model do not require
matrix inversion within an MCMC implementation.
For both the linear splines and cubic radial basis functions the half-Cauchy distributions recom-
mended by Marley and Wand [2010] were used as prior distributions for the variance of the coefficients.
Errors-in-measurement model
The proposed errors-in-measurement model postulates that the true depth index $z$ is interval-censored
and is related to the observed depth index, $d$, in the following way:
$$z_d \mid d \sim N(d, \sigma^2_z)\, I(z_{d-1}, z_{d+1}) \quad \text{for } d = 2, 3, \ldots, 14,$$
$$z_1 \mid d = 1 \sim N(1, \sigma^2_z)\, I(0, z_2),$$
$$z_{15} \mid d = 15 \sim N(15, \sigma^2_z)\, I(z_{14}, 16),$$
where the prior for $\sigma^2_z$ is
$$\sigma^2_z \sim \text{Half-Cauchy}(1).$$
The choice of Half-Cauchy(1) was dictated by the need to prevent initial values of z from moving
to the extremes (0,16) and thereby removing the spline bases.
Both the spline models and the cubic radial bases model accommodate the measurement of depth
with error, giving a latent true depth of z × 20 cm. Where an errors-in-variables model is used for depth, d
is replaced by z, the unobserved true depth index, and the knots are adjusted accordingly. The treatment
effects of the model, $f_{j(i)}(d)$, become $f_{j(i)}(z)$.
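To see how the interval constraints keep the latent depth indices ordered, consider the following sketch. It updates each latent index in turn by rejection sampling from a normal centred on the nominal index and restricted to lie between its current neighbours. It uses no moisture data, so it illustrates the constraint only and is not a sampler for the posterior; the value of `sigma_z`, the function name and the `max_tries` guard are assumptions for the example.

```python
import random


def constrained_sweep(z, sigma_z, rng, max_tries=1000):
    """Update each latent depth index in turn, keeping z ordered within (0, 16)."""
    n = len(z)  # 15 depth indices, stored at positions 0..14
    for pos in range(n):
        d = pos + 1  # nominal depth index
        lower = z[pos - 1] if pos > 0 else 0.0
        upper = z[pos + 1] if pos < n - 1 else float(n + 1)
        for _ in range(max_tries):  # rejection sampling from the restricted normal
            draw = rng.gauss(d, sigma_z)
            if lower < draw < upper:
                z[pos] = draw
                break  # if no draw is accepted, the current value is kept
    return z


rng = random.Random(1)
z = [float(d) for d in range(1, 16)]  # start at the nominal depth indices
for _ in range(100):
    constrained_sweep(z, sigma_z=0.3, rng=rng)
```

Each accepted draw lies strictly between its neighbours, so the ordering of the latent indices is preserved from sweep to sweep.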
Contrasts of interest
It was important to assess whether response cropping would use rainfall (as stored soil water) more
efficiently than the traditional practices of long fallow cropping or continuous winter cropping. A further
aim was to establish the patterns of water use and changes in soil water profiles under
response cropping compared with those under perennial pastures, which are noted for their ability to
respond to rainfall and to use available soil water at most times of the year.
Thus, three contrasts were considered: (1) the difference between the traditional long fallow treat-
ments (1,2,3) and the response cropping treatments (5,6), (2) the difference between cropping (treatments
1-6) and pastures (7-9), and (3) the difference between the lucerne treatments (7-8) and the perennial grass
treatment (9).
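A minimal sketch of these three contrasts at a single depth is given below. It assumes that each contrast is the difference between group means of the nine fitted treatment values, which the text does not state explicitly; the fitted values and function names are made up. In an MCMC setting the same calculation would be applied to each posterior draw.

```python
def mean_of(values, treatments):
    """Average fitted value over a group of treatments (treatments numbered 1..9)."""
    return sum(values[t - 1] for t in treatments) / len(treatments)


def contrasts(values):
    """Three group contrasts at one depth from nine fitted treatment values."""
    return {
        "long_fallow_vs_response": mean_of(values, [1, 2, 3]) - mean_of(values, [5, 6]),
        "cropping_vs_pasture": mean_of(values, range(1, 7)) - mean_of(values, [7, 8, 9]),
        "lucerne_vs_grass": mean_of(values, [7, 8]) - mean_of(values, [9]),
    }


# Made-up fitted moisture values for treatments 1..9 at one depth.
example = contrasts([30, 30, 30, 28, 27, 27, 22, 22, 25])
```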
The various long fallow treatments were out of phase. Thus, although the main interest lay in the difference between
response cropping and long fallow cropping, all 9 depth treatment curves were fitted separately to allow
any differences between them to be seen, with treatments only grouped for comparisons.
Under errors-in-measurement models of depth, treatment comparisons were made at the nominal
depths.
5.4.4 Choice of Priors
The variances for the spatial residual components ($\sigma^2_d$) were given a common inverse gamma prior, and
the non-spatial variance component ($\tau^2$) an inverse gamma prior. Thus,
$$1/\sigma^2_d \sim \text{Gamma}(a_1, b_1),$$
$$1/\tau^2 \sim \text{Gamma}(a_2, b_2),$$
where
$$a_l \sim \text{Gamma}(0.1, 0.1), \quad b_l \sim \text{Gamma}(0.1, 0.1), \quad l = 1, 2.$$
For the splines and cubic radial basis functions, the coefficients of fixed terms were assigned the prior
$N(0, \sigma^2_u)$, with $\sigma^2_u$ having a Half-Cauchy(25) prior. The latent depth variable, $z$, was assigned a prior of
$N(0, \sigma^2_z)$, with $\sigma^2_z$ having a Half-Cauchy(1) prior. These half-Cauchy choices are not restrictive, since
the median of the Half-Cauchy(1) is 1, the mode is 0, and the mid 90% of the distribution lies within
(0.08, 12.7), while for Half-Cauchy(25), the mid 90% lies within (1.97, 318).
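The quoted quantiles follow from the half-Cauchy distribution function, $F(x) = (2/\pi)\arctan(x/A)$ for scale $A$, whose inverse is $x = A\tan(\pi p/2)$. The sketch below evaluates the 5%, 50% and 95% points for the two scales used; the function name is illustrative.

```python
import math


def half_cauchy_quantile(p, scale):
    """Quantile of a half-Cauchy with the given scale, from F(x) = (2/pi) atan(x/scale)."""
    return scale * math.tan(math.pi * p / 2.0)


for scale in (1, 25):
    lo, med, hi = (half_cauchy_quantile(p, scale) for p in (0.05, 0.5, 0.95))
    print(scale, round(lo, 2), round(med, 2), round(hi, 1))
```

The printed values agree with the intervals given above to the precision quoted.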
The coefficients of the orthogonal polynomials were assigned priors of N(0, 3.3). This choice was
influenced by the number of fixed terms. With large numbers of terms, it was important to keep their sum
within numeric computing range during the burn-in. Given that their sum lay between -0.8 and -0.2, this
prior did not seem too restrictive.
Similar considerations applied to the choice of Gamma (0.1, 0.1) for the hyperpriors for the parame-
ters of the inverse gamma distributions from which the variances for the spatial and non-spatial random
components were drawn; a distribution with a mean of 1 and a variance of 10 (and mid 90% in (0,10))
seemed reasonable for these hyperpriors.
5.4.5 Model comparisons
The choice of neighbourhood model is made using the Deviance Information Criterion (DIC) [Spiegelhalter
et al., 2002], available in WinBUGS, while using a common fixed specification (orthogonal polynomials
of degree 8 for each treatment). Similarly, the choice for the fixed effects is made using a common
CAR neighbourhood specification (a maximum of 4 possible neighbours in the same depth plane).
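For reference, the DIC of Spiegelhalter et al. [2002] combines the posterior mean deviance with the effective number of parameters, and smaller values are preferred. The sketch below shows the calculation from monitored deviance values; the numbers and the function name are made up for illustration and are not results from this study.

```python
def dic_summary(deviance_samples, deviance_at_posterior_mean):
    """Return (Dbar, pD, DIC) from monitored deviances and the plug-in deviance."""
    d_bar = sum(deviance_samples) / len(deviance_samples)  # posterior mean deviance
    p_d = d_bar - deviance_at_posterior_mean                # effective number of parameters
    return d_bar, p_d, d_bar + p_d                          # DIC = Dbar + pD


# Made-up deviance draws and plug-in deviance.
d_bar, p_d, dic = dic_summary([1010.0, 1000.0, 990.0, 1000.0], 980.0)
```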
5.4.6 Implementation Details
Initially, treatment effects were expressed in terms of design matrices, X, and MCMC iterations esti-
mating the treatment effects iterated over all 810 observations. However, within WinBUGS, it is more
useful to think of the fitted models as fitting a value for each of nine treatments at each of 15 depths (135
estimates), and using indices to assign this fitted value to each of the 6 site observations for a treatment.
This change speeds convergence and reduces memory requirements.
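The indexing idea can be sketched as follows: 135 fitted values (9 treatments by 15 depths) are looked up by treatment index for each of the 810 observations, rather than being produced by a design matrix with 810 rows. The allocation of treatments to sites and the fitted values below are hypothetical.

```python
N_TREATMENTS, N_DEPTHS, SITES_PER_TREATMENT = 9, 15, 6
N_SITES = N_TREATMENTS * SITES_PER_TREATMENT  # 54 sites

# Hypothetical allocation: site i (0-based) receives treatment i % 9.
treatment_of_site = [i % N_TREATMENTS for i in range(N_SITES)]

# One fitted value per treatment and depth: 9 x 15 = 135 estimates (made-up numbers).
fitted = [[100 * j + d for d in range(N_DEPTHS)] for j in range(N_TREATMENTS)]

# Look each observation's mean up by index: 54 x 15 = 810 observations.
mu = [[fitted[treatment_of_site[i]][d] for d in range(N_DEPTHS)]
      for i in range(N_SITES)]
```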
Neighbourhood matrices, orthogonal polynomials, and the inverse matrices required by the cubic
radial bases were calculated outside WinBUGS and formed part of the data description. Spline bases
were calculated in WinBUGS, which is necessary when the errors-in-measurement model for depth is
used, as the bases change with each change in the latent depth variable.
All models were run as scripts, with at least 60,000 iterations for burn-in (140,000 for errors-in-variables
models) and 200,000 iterations in all for the more complex models. Models were set up with
two chains and Gelman-Rubin statistics were checked.
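As a rough guide to what such a check involves, the sketch below computes the classic potential scale reduction factor for one scalar parameter from equal-length chains. This is the textbook form of the statistic; the diagnostic reported by WinBUGS may be computed differently, and the chains shown are made up.

```python
from statistics import mean, variance


def potential_scale_reduction(chains):
    """Classic Gelman-Rubin statistic for equal-length chains of one scalar parameter."""
    n = len(chains[0])
    chain_means = [mean(c) for c in chains]
    within = mean(variance(c) for c in chains)   # W: mean within-chain variance
    between = n * variance(chain_means)          # B: between-chain variance
    pooled = (n - 1) / n * within + between / n  # pooled variance estimate
    return (pooled / within) ** 0.5


agree = potential_scale_reduction([[1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 4.0]])
apart = potential_scale_reduction([[1.0, 2.0, 3.0, 4.0], [11.0, 12.0, 13.0, 14.0]])
```

Values close to 1 are consistent with the chains sampling the same distribution, whereas values well above 1, as for the second pair of chains, indicate that they have not mixed.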
5.5 Results
5.5.1 Assessing presence of spatial correlation
The presence of spatial correlation was demonstrated by Moran’s I [Banerjee et al., 2004]. Table 5.2
shows this statistic as being statistically significant at the 10% level for ten of the fifteen depths. The
pattern of significance was also reflected in the significance of the ratio of spatial variance to non-spatial
variance differing from 1, with lower variability being shown at the central depths.
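For readers unfamiliar with the statistic, the sketch below computes Moran's I for one layer using binary neighbourhood weights. The four-site example and the function name are illustrative only, and the significance test used for Table 5.2 is not reproduced.

```python
def morans_i(values, neighbours):
    """Moran's I with binary weights; neighbours maps each site to its adjacent sites."""
    n = len(values)
    centre = sum(values) / n
    dev = [v - centre for v in values]
    s0 = sum(len(nbrs) for nbrs in neighbours.values())  # sum of all weights
    cross = sum(dev[i] * dev[k] for i, nbrs in neighbours.items() for k in nbrs)
    return (n / s0) * cross / sum(x * x for x in dev)


# Four sites in a line, 0 - 1 - 2 - 3, with made-up residuals.
line = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
smooth = morans_i([1.0, 2.0, 3.0, 4.0], line)         # neighbours alike: positive
alternating = morans_i([1.0, -1.0, 1.0, -1.0], line)  # neighbours unlike: negative
```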
5.5.2 Determining neighbourhoods and random components
Table 5.1 compares several models with differing neighbourhood structures, and in some cases, differing
random components. In this table all models have the same fixed design, with orthogonal polynomials
of degree 8 for each treatment (the ‘base’ model). This polynomial model was used because it had been
shown to adequately model the treatments as a function of depth when using a single pooled error, and
examination of the residuals along the depth dimension had demonstrated that the model was not induc-
ing autocorrelated residuals by depth. Models are compared using the effective number of parameters
(pD) and Deviance Information Criterion (DIC). The essential set of comparisons is used to determine
an appropriate neighbourhood, and each model in this set has a single homogeneous variance component
($\tau^2$), and 15 spatial variance components ($\sigma^2_d$). Three additional models are considered, one of which
includes depth neighbours and which therefore cannot be fitted with differing spatial variance components.
This model has 15 homogeneous variance components ($\tau^2_d$). The second additional model has
the same variance structure but has a 4 neighbourhood CAR structure. The final model has 15 spatial
variance components and 15 homogeneous random variance components and the same 4 neighbour CAR
structure. The major set of comparisons is between models having the same variance structure: a random
component structure of 15 spatial random components ($\sigma^2_d$) and a common homogeneous variance ($\tau^2$).
Table 5.1 compares a base model together with models having the same variance component structure:
1. One common pooled variance for error across all sites and depths (the ‘base’ model);
2. CAR model with a maximum of two neighbours per site (along the row), 15 spatial variances, one
for each depth, and a single homogeneous error variance;
3. CAR model with a maximum of four neighbours per site (directly adjacent in row and column),
15 spatial variances, one for each depth, and a single homogeneous error variance;
4. CAR model with a maximum of eight neighbours per site (includes diagonally adjacent sites), 15
spatial variances, one for each depth, and a single homogeneous error variance;
5. AR(1), AR(1) model [Gilmour et al., 1997], with a different autocorrelation component for each
depth layer along the rows, and a common AR(1) component across the rows.
Table 5.1 also shows a further set of comparisons which allow the determination of the random
component structure, and also determine whether depth neighbours should be fitted within the CAR
modelling offered under WinBUGS.
1. CAR model with a maximum of 6 neighbours per site (2 of which are depth neighbours), one
spatial variance, and 15 homogeneous error variances, one for each depth;
2. CAR model with a maximum of 4 neighbours per site, one spatial variance, and 15 homogeneous
error variances, one for each depth;
3. CAR model with a maximum of 4 neighbours per site, 15 spatial variance components, and 15
homogeneous error variances, one for each depth.
Using the DIC criterion, the preferred model is that having a maximum of 4 neighbours in the same
layer, with the 15 differing spatial variances. Including depth neighbours with the one spatial variance
component and the 15 non-spatial variance components gave a poorer model than the 4 neighbour model
with the same random component structure. Somewhat surprisingly, the 30 variance component model
was less satisfactory than either of the models with the same first-order neighbourhood model and 16
variance components.
In Table 5.3 we examine various fixed effects models while using a common random component
specification. All models in this table use a homogeneous random component, and a CAR specification
with a maximum of 4 first-order neighbours in the same depth layer, and the CAR model for each depth
having a differing spatial variance.
1. The saturated model with 9 treatments × 15 depth terms;
2. Three orthogonal polynomial models of (a) degree 6, (b) degree 8, and (c) degree 10;
3. Three linear spline models with (a) 4 internal knots, (b) 4 internal knots, and the assumption of
errors in the depth measurement, and (c) 5 internal knots, and the assumption of errors in the depth
measurement;
4. Two cubic radial bases models with 5 internal knots and (a) no assumption of error in the depth
measurement, and (b) the assumption of errors in the depth measurement;
5. Cubic spline with 5 internal knots.
The three polynomial models were fitted to choose a good base model to allow comparison of the
CAR models, and show that a polynomial model of degree 8 was a reasonable choice for the
comparisons of Table 5.1.
The poor fit of the saturated model (9 × 15 terms) reflects the biology of the system. Treatments become
increasingly irrelevant with depth, with the roots of each crop becoming unable to access moisture
at the deeper depths. When the errors-in-variables model is not fitted, the best of the orthogonal
polynomial models is that of degree 8, and the best model overall is that using cubic radial
bases.
The linear spline models with four and five knots and errors-in-measurement for depth, which are
roughly equivalent, provide a better fit than the simple linear spline model.
The cubic radial bases model with the 5 interior knots, and the same model with the errors-in-
measurement component provide the best of the spline basis fits. Despite the apparent lack of necessity
for the errors-in-variables modelling with these bases, the errors-in-measurement model is preferred since
this matches the known occurrence of soil shrinkage and expansion. The estimated differences between
true depth and nominal depth were effectively the same for both the linear spline model and the cubic
radial bases model.
Contrasts of interest were monitored at each depth. For the models without errors-in-measurement,
the contrasts were, as expected, sharper than those for the models with measurement error. However,
the patterns were largely the same. The major differences established between treatment groupings are
those for cropping versus pasture and for lucerne versus the perennial grasses, with cropping giving
the higher moisture values, perennial grasses the next highest values, and lucernes giving the lowest
moisture values. The differences are most marked at the shallower depths. The hoped-for difference
between response cropping and long fallowing was observed at the intermediate depths (from 160 cm to
200 cm for the polynomial model, and from 180 cm to 200 cm for the errors-in-variables models). See
Figure 5.1.
As expected, all fixed effects curves from the various fixed effects models with the 4 neighbour
hierarchical CAR model have wider credible intervals than those for the corresponding models with no
spatial correlation taken into account (not shown). The CAR analysis is more realistic in that spatial
correlation has been accounted for.
Figure 5.2 shows the linear spline fit for the treatment effects from the errors-in-measurement
model, again with the hierarchical 4 neighbour CAR spatial model. This graph shows great variation in
true depth where there is rapid drying of the profile. The credible intervals of Figure 5.2 also show
greater variability for the fixed component at both the shallower and deeper depths, which was ob-
served in all model fits. These fitted curves are essentially the same as those for the cubic radial bases
errors-in-measurement curve fit (Figure 5.3). Based on the DIC, the spline model without errors-in-measurement
gives a poorer fit than the polynomial model, but an apparently
improved fit when we include the possibility of depth being measured with error. However, the cubic
radial bases fit without errors-in-measurement uses fewer effective parameters, and has a lower DIC than the
other two models. Thus, if we were to judge on DIC alone, it would be the preferred model, but, given
that the errors-in-measurement model is appropriate, there is a strong case for preferring the cubic radial
bases model with errors-in-measurement.
At the greater depths, predicted moisture levels differ little between treatments. Predicted mois-
ture levels differ most markedly between treatments at the shallower depths, but again no difference is
observed between response cropping and long fallowing, at these depths.
Figure 5.4 shows the ratio of the standard deviations for the spatial neighbourhood residual compo-
nents to the standard deviation for the non-spatial variance, together with 95% credible intervals. Again
we see greater spatial variation at both the shallow and at the greater depths, with the smaller variance
components being at depths from 60 cm to 200 cm, for the various fixed models. The spatial variation is
not significantly different from the non-spatial variation at the shallower depths (from 20 cm to 140 cm),
while at the intermediate depths, from 160 cm to 200 cm, the spatial variance component is smaller than
the non-spatial variance component, with the spatial variation being larger than the non-spatial variation
at the greater depths (from 240 cm to 300 cm). This aspect of the spatial variation is consistent over the 2,
4 and 8 neighbour CAR models, and over the spline-basis and orthogonal polynomials. Clearly, the total
variation drives this, since the model with 15 homogeneous error variances and one spatial variance is
essentially equivalent to the model with 15 spatial variances and one homogeneous variance component
(Table 5.1). After fitting any fixed effects model, the total residual variation not accounted for by the fixed
model is greater at both shallower and deeper depths.
Table 5.4 gives the contrasts between treatment types at various depths under the errors-in-variables
cubic radial bases model. The contrasts are more tightly estimated under the model without errors-in-
measurement of depth. However, the significant contrasts closely parallel each other. The contrasts are
shown in Figure 5.1.
5.6 Discussion
In this paper, there were two critical issues: (1) to find an appropriate way of dealing with spatial cor-
relation, and (2) to find an appropriate model for the treatment effects. Having accomplished these two
goals, it then becomes possible to make inferences for the questions asked by the soil scientists.
The primary difficulty was the determination of the spatial model. Given that the data are point
referenced, an obvious choice of spatial model was a kriging model, such as that of Gotway and Cressie
[1990]. However, the large number of terms in the fixed part of the model made such an approach
impossible in the MCMC framework of WinBUGS. Additionally, including depth in the calculation
of distance would have made it harder to disentangle treatment effects over depth from
spatial modelling considerations. Software such as SAS PROC MIXED [SAS Institute, 2004] offers the
possibility of both kriging and the various correlation model structures of Gilmour et al. [1997] within
a REML or ML framework, but when a model is poorly specified or very complex, PROC MIXED can
be difficult to use, and neither SAS nor the package of Gilmour et al. [2005] is freely available. We had
hoped to show that the CAR models used were comparable to the AR(1), AR(1) basis models of Gilmour
et al. [1997], Stefanova et al. [2009]. These could not be fitted within WinBUGS with the desired
complexity, but show comparability with the best CAR models of Table 5.1 (∆DIC = 257).
We used CAR models, first introduced by Besag [1974], and elaborated by Besag and coworkers
in Besag et al. [1995], Besag and Higdon [1999] for agricultural lattices. In recent work [Besag and
Mondal, 2005, Lindgren et al., 2010], CAR models have been shown to be closely related to kriging, but
whereas kriging uses a dense covariance matrix, a CAR model uses a sparse precision matrix
as the basis for estimation.
Here, given that we wished to model the moisture measurements as a function of the depth, it made
good sense not to include depth in the CAR model specifications. The restriction of neighbours to the
same horizontal layer permitted the fitting of spatial residual components with differing variances, while
also avoiding the problem of the scale difference of depth when compared to row/column scale. Thus,
we were able to see (Table 5.1) that a model with 15 spatial variance components and one homogeneous
component (∆DIC = 300) was roughly equivalent to the model having 15 homogeneous components
and one spatial variance (∆DIC = 270), with both being better than the model which allowed both sets
of variance components to vary by depth (∆DIC = 76).
Interestingly, the inclusion of depth neighbours in a CAR first-order neighbourhood model, with a
single spatial variance component ($\sigma^2$), and 15 homogeneous variance components ($\tau^2_d$), led to a poorer
model, leading us to believe that where measurements are made in 3 dimensions and taken at very differ-
ent scales, autocorrelations should be modelled separately. In future work, we would like to model depth
dimension autocorrelations with an AR(1) or ARIMA type model. However, here, with the use of an
errors-in-measurement model for the treatment effects, it would seem that the larger part of any residual
depth autocorrelation has been dealt with, while the failure to deal with rows and columns separately is
not grave.
A disadvantage of the CAR modelling choice was that within the WinBUGS framework, weights
must be chosen a priori and not estimated as in Besag and Higdon [1999], Besag and Mondal [2005].
The lattice framework, so typically found in agricultural data, needs an anisotropic treatment such as
that found in the Besag models already cited and in the models of Gilmour et al. [1997], Stefanova et al.
[2009].
We chose to model within a Bayesian framework for a number of reasons. The CAR models, readily
available in WinBUGS, were more flexible than potentially equivalent kriging models and other con-
tinuous space models available for point referenced data. Additionally, the WinBUGS framework for
MCMC analysis is both transparent and accessible to analysts of all skill levels. Analysis proceeds by
formulating the model as a set of conditional distributions and simulating realisations directly from the
posterior distributions of the parameters. Moreover, once the model has converged to the stationary distribution,
most quantities of interest may be estimated. For example, the ratio of the square root of each
spatial variance to the square root of the overall non-spatial variance may be calculated in each MCMC iteration and the
samples monitored to find 95% posterior credible intervals.
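A sketch of this kind of derived quantity is given below: the ratio of spatial to non-spatial standard deviations is computed draw by draw and then summarised by an equal-tailed interval. The draws are simulated stand-ins for monitored variance components, and the simple empirical quantile rule is an assumption made for the example.

```python
import math
import random


def credible_interval(samples, level=0.95):
    """Equal-tailed interval from posterior samples, using simple empirical quantiles."""
    ordered = sorted(samples)
    tail = (1.0 - level) / 2.0
    lo = ordered[int(math.floor(tail * (len(ordered) - 1)))]
    hi = ordered[int(math.ceil((1.0 - tail) * (len(ordered) - 1)))]
    return lo, hi


rng = random.Random(0)
# Made-up draws standing in for a spatial variance and the non-spatial variance.
sigma2_draws = [rng.uniform(0.5, 1.5) for _ in range(2000)]
tau2_draws = [rng.uniform(1.5, 2.5) for _ in range(2000)]

# The derived quantity is computed for each draw, then summarised.
ratio_draws = [math.sqrt(s2) / math.sqrt(t2) for s2, t2 in zip(sigma2_draws, tau2_draws)]
lo, hi = credible_interval(ratio_draws)
```

An interval that excludes 1 would suggest that the spatial and non-spatial standard deviations differ at that depth.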
Having accounted for spatial variation via CAR modelling, the concern was to choose a treatment
effects model and estimate treatment differences. The polynomial models were useful as a base comparison,
since they had been shown to have minimally autocorrelated residuals along the depth dimension,
and treatment effects could have been fitted adequately using the 8 degree polynomial, the linear spline
with depth measured with error, or the cubic radial bases model (with or without error). One of the
strengths of the WinBUGS framework was that it was possible to fit an errors-in-variables model, and
there are two good reasons for fitting errors-in-variables models: firstly, it is untenable in most regression
frameworks to believe that the response variable is measured with error, while the explanatory variable
is not, and secondly, and possibly more importantly in this instance, in an earlier report [Ringrose-Voase
et al., 2003] the researchers were applying a complex formula to the measured depth in order to find the
true depth. Thus, this ability to model true depth was a useful extension of the model.
Table 5.3 shows the near equivalence of a number of competing treatment models. The treatment
contrasts shown in Table 5.4 and Figure 5.1 are those from the cubic radial bases with errors-in-variables
model, and do not differ significantly from the corresponding graphs and tables for the linear-spline
model with errors in variables. There are significant differences between cropping and pastures, while the
contrast of interest, between response cropping and long fallow rotation, is observed at the intermediate
depths. However, Figure 5.1 shows sufficient difference between the two types of cropping in the critical
part of the profile, for response cropping to be recommended should such a difference be repeated in
further data. The differences are in the mid-depth range where moisture uptake is needed to prevent
salination.
For some time the methods of Besag [1974] for analysing spatially correlated data have been avail-
able via the freely available software, WinBUGS, and many papers have been written using conditional
auto-regressive (CAR) models to smooth spatial data, particularly in the field of spatial epidemiology, see,
for example, Bernardinelli et al. [1995], Clements et al. [2008], Earnest et al. [2010], Elliott [2000]. How-
ever, few authors analysing agricultural lattice data have chosen to use Markov Random Field methodol-
ogy where the data are point-referenced. An early paper promoting CAR methods for lattice plots was
Besag et al. [1995] which analysed strawberry data in a lattice plot.
Other methods for agricultural spatially correlated data include those of Cullis and Gleeson [1991]
and Cullis et al. [1989], which use ARIMA models to account for spatial autocorrelation and model the
variance components using REML. In a later version of this approach, Gilmour et al. [1997] fit a complete
blocks model and AR(1), AR(1) models as a starting point for their REML modelling, and look at kriging
graphs on the residuals to determine how the data may be better modelled by the introduction of further
‘global’ extraneous random effects. The general consensus of these various authors is that agricultural
lattice data should be dealt with anisotropically. This could not be done within the framework we used.
However, we believe that the complex modelling shown here may illustrate a general truth, that where
lattice points on a 3-dimensional grid are far from equally spaced, the data need to be considered via an
approach resembling that used here with layering being possible where the measurements are roughly
equally spaced.
In summary, this paper has extended the usual CAR modelling with a single spatial neighbourhood
matrix to a hierarchical CAR model with 15 different spatial variance components. It has also demon-
strated the richness available in modelling within a Bayesian framework, by combining this more com-
plex CAR spatial modelling framework with fixed effects treatment models with many terms, and more
importantly, an errors-in-variables model. We hope that this demonstration of the flexibility of CAR
models and their ease of fitting, together with the simplicity of fitting complex fixed effects models may
lead to greater use of CAR models and of Bayesian modelling in agricultural research.
Acknowledgments
This research was supported by the ARC Centre of Excellence in Complex Dynamic Systems and Con-
trol, and by QUT.
We thank Dr Alison Bowman of NSW Industry and Investment for her interest in, and support of
this work. We thank, too, Professor Matt Wand of the University of Wollongong whose generosity with
his understanding of non-parametric modelling led to the ‘semi-parametric’ modelling of the data, and
hence, to the errors-in-variables model.
This paper is dedicated to the memory of Julian Besag, a pioneer in this field of research, a teacher
and a friend.
5.7 Tables
Table 5.1 Comparing spatial neighbourhood modelling. Treatment effects model is identical for all models (Orthogonal polynomial degree 8). Models have 15 spatial variance components (σ²_d), and one homogeneous variance component (τ²), except where otherwise stated.

Description                                    pD     DIC     ∆DIC
Base model: No spatial component                81   -2690      -
Linear CAR (maximum 2 horiz neighbours)        264   -2811    121
CAR (maximum 4 horiz neighbours)               358   -2990    300 †
CAR (maximum 8 horiz neighbours)               320   -2930    240
AR(1), AR(1)                                   945   -2947    257
CAR (maximum 4 horiz neighbours, 2 depth)*     109   -2752     62
CAR (maximum 4 horiz neighbours)*              110   -2960    270
CAR (maximum 4 horiz neighbours)**             121   -2766     76

∆DIC = DIC(Base Model) − DIC.
† indicates the favoured neighbourhood model.
* Models with 1 spatial variance & 15 non-spatial variance components.
** Model with 15 spatial & 15 non-spatial variance components.
Table 5.2 Values of Moran’s I for each depth layer. A normal approximation is used for testing significance.

Depth   Moran’s I   Prob
  20     -0.062     0.4730
  40     -0.063     0.4655
  60      0.210     0.0028 *
  80      0.204     0.0037 *
 100      0.115     0.0907 *
 120      0.125     0.0668 *
 140      0.139     0.0434 *
 160      0.107     0.1135
 180     -0.002     0.9199
 200      0.069     0.2844
 220      0.189     0.0068 *
 240      0.407     <.0001 *
 260      0.242     0.0006 *
 280      0.147     0.0329 *
 300      0.176     0.0118 *

* Significant at α=10%
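As an illustration of the kind of calculation summarised in Table 5.2, the following minimal Python sketch computes Moran’s I for a single depth layer together with a p-value from the normal approximation. It is not the code used for the table: the rook (4 neighbour) binary weights, the moments under the normality assumption, the two-sided test and the function names `rook_neighbours` and `morans_i` are all assumptions made here for illustration, since the table does not state which weights were used.

```python
import math

def rook_neighbours(n_rows, n_cols):
    """Map each site index to the indices of its rook (4-neighbour) sites."""
    nbrs = {}
    for r in range(n_rows):
        for c in range(n_cols):
            i = r * n_cols + c
            nbrs[i] = [rr * n_cols + cc
                       for rr, cc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
                       if 0 <= rr < n_rows and 0 <= cc < n_cols]
    return nbrs

def morans_i(x, nbrs):
    """Moran's I with binary weights and a two-sided normal-approximation p-value."""
    n = len(x)
    mean = sum(x) / n
    z = [v - mean for v in x]
    s0 = sum(len(js) for js in nbrs.values())            # sum of all weights
    cross = sum(z[i] * z[j] for i, js in nbrs.items() for j in js)
    stat = (n / s0) * cross / sum(v * v for v in z)
    # Moments under the normality assumption, for symmetric binary weights.
    s1 = 2.0 * s0                                        # 0.5 * sum_ij (w_ij + w_ji)^2
    s2 = sum((2 * len(js)) ** 2 for js in nbrs.values()) # sum_i (w_i. + w_.i)^2
    expected = -1.0 / (n - 1)
    variance = ((n * n * s1 - n * s2 + 3 * s0 * s0)
                / ((n * n - 1) * s0 * s0)) - expected ** 2
    zscore = (stat - expected) / math.sqrt(variance)
    p_value = math.erfc(abs(zscore) / math.sqrt(2))      # two-sided tail probability
    return stat, p_value
```

For a 6 × 18 layer, `rook_neighbours(6, 18)` gives the neighbour lists and `morans_i(layer_values, nbrs)` returns the statistic and its p-value; a different choice of weights (for example, eight neighbours) changes both.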
Table 5.3 Comparing Fixed Effects modelling. Random components for all models are given by 4 neighbour CAR with 15 depth variances (σ²_d), and one homogeneous variance component (τ²).

Deg/Knots   No. Terms   Type                              pD     DIC     ∆DIC
    -          135      Saturated Model (9 × 15 terms)    809   -2319      -
    6           63      Orthogonal poly                   297   -2970    651
    8           81                                        358   -2990    671 †
   10           99                                        371   -2967    648
    4           54      Linear Spline                     318   -2923    604
    4           54      (+error in depth)                 369   -3002    683 †
    5           63      (+error in depth)                 401   -2999    680 †
    5           81      Cubic radial bases                327   -2954    635
    5           81      (+error in depth)                 368   -3013    694 †
    5           81      Cubic Spline                      257   -2769    450

† indicates the best fixed effects model of its type.
No. Terms is the number of fitted fixed effects terms.
pD, DIC given for the moisture value to allow comparison.
∆DIC: DIC(saturated model) − DIC.
Table 5.4 Contrasts at nominal depths: Cubic radial bases model where depth is measured with error.

Contrast                              Depth     Est      95% CI (lower, upper)   Prob
Long Fallow v Opportunity cropping     180     0.033      0.004     0.063        0.0246
                                       200     0.033      0.002     0.066        0.0357
Crop v pasture                          20     0.440      0.401     0.484        <.0001
                                        40     0.372      0.346     0.399        <.0001
                                        60     0.314      0.286     0.340        <.0001
                                        80     0.270      0.244     0.294        <.0001
                                       100     0.236      0.212     0.261        <.0001
                                       120     0.202      0.178     0.228        <.0001
                                       140     0.164      0.142     0.188        <.0001
                                       160     0.130      0.106     0.154        <.0001
                                       180     0.106      0.082     0.129        <.0001
                                       200     0.093      0.069     0.118        <.0001
                                       220     0.090      0.065     0.116        <.0001
                                       240     0.092      0.065     0.120        <.0001
                                       260     0.092      0.062     0.122        <.0001
                                       280     0.085      0.050     0.119        <.0001
                                       300     0.071      0.009     0.127        0.0279
Lucernes v native pasture               20    -0.305     -0.369    -0.245        <.0001
                                        40    -0.301     -0.343    -0.254        <.0001
                                        60    -0.290     -0.333    -0.239        <.0001
                                        80    -0.273     -0.316    -0.228        <.0001
                                       100    -0.243     -0.290    -0.202        <.0001
                                       120    -0.188     -0.237    -0.146        <.0001
                                       140    -0.115     -0.155    -0.072        <.0001
                                       160    -0.054     -0.092    -0.006        0.0322

Only contrasts with CIs not containing zero shown.
5.8 Figures
Figure 5.1 95% credible intervals for the contrast differences based on the cubic radial bases model with errors-in-measurement (graphed where the 95% CI did not cover zero). The lines with the widest tops and tails show “Long Fallow - Response Cropping”, with the thinnest “Lucerne - Native Pastures”, and those with medium width “Crop - Pasture”.
[Figure 5.1 plot; horizontal axis: Depth (cm), 0 to 300.]
Figure 5.2 Fixed effects curves for errors-in-variables model: Linear spline treatment effects & 95% credible intervals, CAR model, sites 1-54. The true depths are those implied by the errors-in-measurement model. For each treatment there are 6 sites, each with the same treatment curve.
[Figure 5.2 plot; horizontal axis: Estimated true depth (cm), 0 to 400. Legend: 1. Long Fallow (1); 2. Long Fallow (2); 3. Long Fallow (3); 4. Continuous; 5. Response Cropping (1); 6. Response Cropping (2); 7. Pasture (lucerne-1); 8. Pasture (lucerne-2); 9. Pasture (native).]
Figure 5.3 Fixed effects curves for errors-in-variables model: Cubic radial bases model showing estimates at the nominal depth. Depth has been jittered to allow credible intervals to be seen.
Figure 5.4 95% CI for the ratio of square root of the spatial variance to that of the non-spatial variance at the fifteen depths: Cubic radial bases model with errors-in-measurement for depth.
Bibliography
Banerjee, S., B. P. Carlin, and A. E. Gelfand (2004). Hierarchical modeling and analysis for spatial data.
Monographs on statistics and applied probability. Boca Raton, London, New York, Washington D.C.:
Chapman & Hall.
Bernardinelli, L., D. Clayton, C. Pascutto, C. Montomoli, M. Ghislandi, and M. Songini (1995). Bayesian
analysis of space-time variation in disease risk. Statistics in Medicine 14(21-22), 2433–2443.
Besag, J. E. (1974). Spatial interaction and the statistical analysis of lattice systems (with discussion). J.
R. Statist. Soc. B 36(2), 192–236.
Besag, J. E., P. Green, D. Higdon, and K. Mengersen (1995). Bayesian computation and stochastic
systems. Statistical Science 10(1), 3–41.
Besag, J. E. and D. Higdon (1999). Bayesian analysis of agricultural field experiments. Journal of the
Royal Statistical Society Series B-Statistical Methodology 61, 691–717. Part 4.
Besag, J. E. and C. Kooperberg (1995). On conditional and intrinsic autoregressions. Biometrika 82(4),
733–746.
Besag, J. E. and D. Mondal (2005). First-order intrinsic autoregressions and the de Wijs process.
Biometrika 92(4), 909–920.
Besag, J. E., J. York, and A. Mollie (1991). Bayesian image restoration with applications in spatial
statistics (with discussion). Annals of the Institute of Statistical Mathematics 43, 1–59.
Broughton, A. (1994). Mooki River Catchment hydrogeological investigation and dryland salinity studies
- Liverpool Plains, TS94.026. Technical report, New South Wales Department of Water Resources.
Clements, A. C., A. Garba, M. Sacko, S. Touré, R. Dembelé, A. Landouré, E. Bosque-Oliva, A. F. Gabrielli,
and A. Fenwick (2008). Mapping the probability of Schistosomiasis and associated uncertainty, West
Africa. Emerging Infectious Diseases 14(10), 1629–1632.
Cressie, N. A. C. (1991). Statistics for spatial data. Wiley series in probability and mathematical
statistics. Applied probability and statistics. New York: John Wiley.
Cullis, B. R. and A. C. Gleeson (1991). Spatial analysis of field experiments-an extension to two dimen-
sions. Biometrics 47, 1449–1460.
Cullis, B. R., W. J. Lill, J. A. Fisher, B. J. Read, and A. C. Gleeson (1989). A new procedure for the
analysis of early generation variety trials. Journal of the Royal Statistical Society Series C Applied
Statistics 38(2), 361–375.
Daniells, I. G., J. F. Holland, R. R. Young, C. L. Alston, and A. L. Bernardi (2001). Relationship between
yield of grain sorghum (Sorghum bicolor) and soil salinity under field conditions. Australian Journal
of Experimental Agriculture 41, 211–217.
Earnest, A., J. R. Beard, G. Morgan, D. Lincoln, R. Summerhayes, D. Donoghue, T. Dunn, D. Muscatello,
and K. Mengersen (2010). Small area estimation of sparse disease counts using shared component
models-application to birth defect registry data in New South Wales, Australia. Health & Place 16,
684–693.
Elliott, P. (2000). Spatial epidemiology : methods and applications. Oxford medical publications. Ox-
ford: Oxford University Press.
Fuller, W. A. (1987). Measurement error models. New York: Wiley.
Gilmour, A. R., B. R. Cullis, and A. P. Verbyla (1997). Accounting for natural and extraneous variation
in the analysis of field experiments. Journal of Agricultural Biological and Environmental Statistics 2,
269–293.
Gilmour, A. R., B. J. Gogel, B. R. Cullis, and R. Thompson (2005). ASReml User Guide Release 2.0.
Technical report, VSN International Ltd, Hemel Hempstead, UK.
Gotway, C. A. and N. A. C. Cressie (1990). A spatial analysis of variance applied to soil-water infiltration.
Water resources research 26(11), 2695–2703.
Lindgren, F., H. Rue, and J. Lindstrom (2010). An explicit link between Gaussian fields and Gaussian
Markov random fields: The SPDE approach. Journal of the Royal Statistical Society Series B, to
appear.
Marley, J. K. and M. P. Wand (2010). Non-standard semiparametric regression via BRugs. Journal of
Statistical Software 37(5), 1–30.
Ngo, L. and M. Wand (2004). Smoothing with mixed model software. Journal of Statistical Software 9,
1–56.
Ringrose-Voase, A., R. R. Young, Z. Payder, N. Huth, A. Bernardi, H. Cresswell, B. Keating, J. Scott,
M. Stauffacher, R. Banks, J. Holland, R. Johnston, T. Green, L. Gregory, I. Daniells, R. Farquharson,
R. Drinkwater, S. Heidenreich, and S. Donaldson (2003). Deep drainage under different land uses
in the Liverpool Plains Catchment. Technical Report 3, Agricultural Resource Management Report
Series, NSW Agriculture Orange.
Rue, H. and L. Held (2005). Gaussian Markov random fields : theory and applications. Boca Raton:
Chapman & Hall/CRC.
SAS Institute (2004). SAS Version 9.1.3. Cary, NC., USA: SAS Institute Inc.
Spiegelhalter, D. J., N. G. Best, B. P. Carlin, and A. van der Linde (2002). Bayesian measures of model
complexity and fit. Journal of the Royal Statistical Society. Series B (Statistical Methodology) 64(4),
583–639.
Stefanova, K. T., A. B. Smith, and B. R. Cullis (2009). Enhanced diagnostics for the spatial analysis of
field trials. Journal of Agricultural Biological and Environmental Statistics 14(4), 392–410.
Wand, M. P. (2009). Semiparametric and graphical models. Australian and New Zealand Journal of
Statistics 51(1), 9–41.
Statement of Contribution of Co-Authors for Thesis by Publication
The authors listed below certify that:
1. they meet the criteria for authorship, in that they have participated in the conception, execution, or interpretation, of at least that part of the publication in their field of expertise;
2. they take public responsibility for their part of the publication, except for the responsible author who accepts overall responsibility for the publication;
3. there are no other authors according to these criteria;
4. potential conflicts of interest have been disclosed to granting bodies, the editor or publisher of journals or other publications, and the head of the responsible academic unit; and
5. they agree to the use of the publication in the student's thesis and its publication on the Australasian Digital Thesis database consistent with any limitations set by publisher requirements.
In the case of Chapter 6:
Title: Comparison of three dimensional profiles over time
Journal: Journal of Applied Statistics
Status: Submitted December 2010

Contributor and statement of contribution (the signature and date columns of the original form are not reproduced):
Margaret Donald: As first author was responsible for the concept of the paper, data analysis, interpretation and the writing of all drafts, as well as the final version. Determined the final CAR model to be [illegible], thus contributing to the final [illegible] of the Gibbs sampler for the CAR layered model. Wrote its mathematical description and that of the full model. Programmed the DIC calculations and the neighbourhood matrices.
Dr Clair Alston: Was responsible for advice on measurements and their meaning and editorial comment.
Dr Chris Strickland: Programmed and developed the Gibbs sampler for the CAR layer model in pyMCMC and supervised Margaret's description of it, in addition to supervising the mathematical description of the model.
Rick Young: Was responsible for advice on the purpose and background to the field trial, advice on the [illegible] of statistical results and editorial comment.
Professor Kerrie Mengersen: Was responsible for general advice and editorial comment.

Principal Supervisor's Confirmation
I have sighted email or other correspondence from all co-authors confirming their certifying authorship.
Name, Signature, Date: [handwritten entries not reproduced]
Chapter 6
Comparison of three dimensional
profiles over time
Preamble
This chapter addresses research objective (4) and fits a model to several days' data to allow some of the
complexities of modelling the full dataset to be explored. It uses Gaussian Markov Random Fields and
their sparse matrix representation to allow efficient block updating of the spatial residual components.
Banerjee et al. [2004] show that the pointwise conditional CAR specifications lead to a global
spatial specification via Brook [1964]. Rue and Held [2005] show that the pointwise specification is
equivalent to a global Gaussian Markov Random Field specification. Here, unlike Rue and Held [2005]
and Martino and Rue [2008], who divide the nodes into two disjoint sets and update the one conditional
on the other, we update the full set of spatial components as a block using the Krylov subspace methods
of Simpson et al. [2008] and Strickland et al. [2010].
There are currently two programs using the GMRF framework for fitting spatial models with block
updating. INLA, developed by Rue and collaborators, fits into an R framework and is a reasonably
transparent framework for fitting models of the kind discussed here. BayesX, developed by Belitz et al.
[2009a,b] provides a framework for fitting additive models, together with CAR spatial priors and is
somewhat easier to use. However, having fitted the complex models of Chapter 5.2, I wished to fit
similar models but for larger datasets. The software of BayesX and INLA proved somewhat difficult
to use with the complex CAR models which seemed to be needed. (For further discussion of this see
Chapter 7.1.)
Choosing to fit the desired models in pyMCMC [Strickland, 2010], a purpose-built MCMC frame-
work developed in Python, allowed considerably greater insight into the fitted models.
Appendix B gives tables for the differences in the various random components over time and
over depths, together with differences in slopes in the last linear segment of the depth curves for each
treatment and day, and contrast estimates. Contour curves for the spatial random errors for each day and
depth (75 graphs) are also given and show considerable continuity across the depths within a day.
I am the principal author and the paper is reprinted here with its abstract, but with bibliographic
conventions that differ from those of the Journal of Applied Statistics, to which it has been submitted.
Rick Young provided the data and helped with agricultural interpretations, in addition to providing
editorial comment. Chris Strickland programmed and developed the Gibbs sampler for the CAR layer model
in pyMCMC and checked my description of it given in the appendix (Section 6.7), in addition to clarifying
and verifying the mathematical description of the model given in Section 6.4.1. My contribution to the
development of the sampler was to direct and describe what was needed. I also programmed the
neighbourhood matrix and DIC calculations, and wrote the descriptions of the model and the Gibbs sampler.
Clair Alston provided major editorial advice in addition to advice on the collection and meaning of the
data. Kerrie Mengersen oversaw, helped with, and guided the exposition. As first author, I was responsible
for the concept of the paper, the choice of CAR model, the data analysis, interpretation and the writing
of all drafts as well as the final version.
Title: Comparison of three dimensional profiles over time
Authors: Margaret Donald (a), Chris Strickland (a), Clair Alston (a), Rick Young (b), Kerrie Mengersen (a).
(a) School of Mathematical Sciences, Queensland University of Technology, GPO Box 2434, Brisbane,
QLD 4001, Australia.
(b) Tamworth Agricultural Institute, Industry & Investment NSW, 4 Marsden Park Road, Calala, NSW
2340, Australia.
Bibliography
Banerjee, S., B. P. Carlin, and A. E. Gelfand (2004). Hierarchical modeling and analysis for spatial data.
Monographs on statistics and applied probability. Boca Raton, London, New York, Washington D.C.:
Chapman & Hall.
Belitz, C., A. Brezger, T. Kneib, and S. Lang (2009a). BayesX Software for Bayesian Inference
in Structured Additive Regression Models Version 2.0.1 Reference Manual. Online at
http://www.stat.uni-muenchen.de/~bayesx/bayesx.html. Accessed: October 25, 2010.
Belitz, C., A. Brezger, T. Kneib, and S. Lang (2009b). BayesX Software for Bayesian Inference in
Structured Additive Regression Models Version 2.0.1 Software Methodology Manual. Online at
http://www.stat.uni-muenchen.de/~bayesx/bayesx.html. Accessed: October 25, 2010.
Brook, D. (1964). On the distinction between the conditional probability and the joint probability ap-
proaches in the specification of nearest-neighbour systems. Biometrika 51(3-4), 481.
Martino, S. and H. Rue (2008). Implementing approximate Bayesian inference using Integrated Nested
Laplace Approximation: A manual for the INLA program. Citeseer.
Rue, H. and L. Held (2005). Gaussian Markov random fields : theory and applications. Boca Raton:
Chapman & Hall/CRC.
Simpson, D. P., I. W. Turner, and A. N. Pettitt (2008). Fast sampling from a Gaussian Markov random
field using Krylov subspace approaches. QUT Eprints 14376 (Brisbane), 1–17. Available online:
http://eprints.qut.edu.au.
Strickland, C. (2010). pyMCMC: a statistical package for Bayesian MCMC analysis. Journal of Com-
putational and Graphical Statistics, 1–46. submitted August, 2010.
Strickland, C. M., D. P. Simpson, I. W. Turner, R. Denham, and K. L. Mengersen (2010). Fast Bayesian
analysis of spatial dynamic factor models for multi-temporal remotely sensed imagery.
6.1 Comparison of three dimensional profiles over time
Abstract
We describe an analysis for data collected on a three-dimensional spatial lattice with treatments applied at
the horizontal lattice points. Spatial correlation is accounted for using a conditional autoregressive (CAR)
model. Observations are defined as neighbours only if they are at the same nominal depth. This allows
the corresponding variance components to vary by depth. We use Markov Chain Monte Carlo (MCMC)
with block updating, together with Krylov subspace methods for efficient estimation of the model. The
method is applicable to both regular and irregular horizontal lattices and hence to data collected at any
set of horizontal sites for a set of depths or heights, for example, water column or soil profile data.
The model for the three-dimensional data is applied to agricultural trial data for five separate days taken
roughly six months apart, in order to determine possible relationships over time. The purpose of the trial
is to determine a form of cropping that leads to less moist soils in the root zone and beyond. We estimate
moisture for each date, depth and treatment accounting for spatial correlation and determine relationships
of these and other parameters over time.
Keywords: Bayesian; Conditional Autoregressive (CAR) models; depth profiles; Field trial; Linear
spline; Markov Chain Monte-Carlo (MCMC); Gaussian Markov random field (GMRF); Spatial
autocorrelation; Variance components.
6.2 Introduction
Despite numerous papers examining crop rotations and field experiments conducted over lengthy periods
of time, it remains difficult to analyse such data satisfactorily. Here the difficulty is compounded
because the response is measured not over a surface but over three spatial dimensions of the field.
We describe field trial data from a long term crop rotation trial, conducted to determine
a cropping system which would maximise the use of stored water in the soil, and minimise the risk of
water leakage leaching the soil of its salts and endangering long-term agriculture on the Liverpool Plains
in New South Wales, Australia.
These data pose several problems: how to describe the treatment effect, how to account for spatial
autocorrelation, how to account for spatial correlation over depths, and what might be an appropriate
model over time.
Preliminary analyses for one date’s data indicate that CAR models [Besag and Higdon, 1999, Be-
sag et al., 1991] describe the local spatial autocorrelations well. The choice of CAR models is further
discussed in Section 6.6. We use an identical model structure for each date to analyse five dates of soil
moisture measurements taken six months apart over a period of two years. In using the same model,
we wish to determine which parameters are constant over the different dates, and which are not, in an
exploration of the data prior to fitting a time-space model, with the final purpose of data analysis being
to determine the best cropping system of those considered.
The model for the treatment effect assumes that each treatment determines a depth profile curve for
each date. These treatment effects are modelled as continuous curves along the depth dimension, with
different curves for each date and treatment. We use linear splines to model treatment effect over depth,
as this allows trend comparisons over segments of the curve.
We present a computationally efficient methodology for fitting models to spatially correlated
agricultural data collected over three spatial dimensions.
Section 6.3 describes the data used in the case-study. Section 6.4 describes the model and the com-
putational framework for its estimation. Section 6.4 also describes the methods used for comparisons
of contrasts between and within dates. Section 6.5 provides the results of the case-study. Section 6.6
provides a discussion of the methods and the results.
6.3 Case Study
The four dimensional data used in this case-study consist of moisture observations taken at 108 surface
treatment sites, 15 depths and over 5 different dates during a two-year period. The 108 measurement
sites are arranged as 6 rows with 18 columns per row. Hence, data at each time point consist of 1620
measurements at 108 sites over 15 depths.
The purpose of the field trial is to determine a cropping system which leads to lower moisture values
in the soils, in order to minimise the risk of deep drainage. More complete details of the trial may be
found in Ringrose-Voase et al. [2003]. Nine treatments are considered. These fall into three groups, long
fallow cropping, response cropping and pasture treatments.
The primary question of interest to crop scientists is whether response cropping gives lower moisture
values both at the intermediate and greater depths, in comparison with long fallow, and whether this
is sustained over different stages of the cropping cycle. Subsidiary questions addressed here are the
comparison of cropping treatments with pasture treatments, and the comparison of the lucerne pasture
mixtures with the native grass pasture.
The concern in this paper is to establish whether the various components of the model vary from date
to date, and to determine the cropping system best suited to the land.
The treatments are
1. Treatments 1-3: Long fallow wheat/sorghum rotation, where one wheat and one sorghum crop are
grown in three years with an intervening 10-14 month fallow period. The 3 treatments are each of
3 phases of the long fallow 3 strip system.
2. Treatment 4: Continuous cropping in winter with wheat and barley grown alternately.
3. Treatments 5 and 6: Response cropping, where an appropriate crop (either a winter or a summer
crop) is planted when the depth of moist soil exceeds a predetermined level.
4. Treatments 7-9: Perennial pastures. The three treatments are lucerne (a deep rooted perennial
forage legume with high water use potential), lucerne grown with a winter growing perennial
grass, and a mixture of winter and summer growing perennial grasses.
The dates of the observations are almost equally spaced over two years (July 23, 1997, December 4,
1997, April 28, 1998, September 23, 1998, and February 25, 1999).
6.4 Methods
6.4.1 Model
The model describes data collected over the three spatial dimensions, in particular, over a three dimen-
sional lattice, with the response variable arising from an experimental treatment applied at lattice points
on the horizontal plane. The model consists of a regression or fixed effect component, a spatial component
and an irregular component or residual error. For the spatial component we consider both non-stationary
and stationary spatial processes. The regression component is specified via the treatments (determined
by the horizontal planar spatial locations), and a set of basis functions over the third spatial dimension,
depth. As in the case of two-dimensional spatial models, we enumerate the horizontal spatial locations in
a particular order which determines the spatial neighbourhood matrix. To simplify the definition of the
matrices associated with the variances and precisions, the (n × 1) vector of observations, y, is arranged
as D× S observations, where D is the number of lattice points in the depth dimension, and S the number
of spatial locations in the horizontal plane.
The model for y is as follows
y = Xβ + ψ + ϵ, (6.1)
where X is an (n × p) design matrix, β is a (p × 1) vector of regression coefficients, ψ is an (n × 1) vector
that models spatial correlation at each depth and ϵ is an (n × 1) residual vector that is homogeneous
within each depth. The design matrix, X, models the treatment effects as continuous functions of depth
for each treatment. The spatial covariance is modelled using a Gaussian Markov random field (GMRF).
Stationary and non-stationary covariance structures are considered. A proper conditional autoregressive
(CAR) prior [Gelfand and Vounatsou, 2003] is used for the stationary case, while an intrinsic CAR prior
[Besag et al., 1991, Rue and Held, 2005] is used in the non-stationary case. The spatial variation for ψ is
captured either through a proper prior on ψ in the stationary case such that
\[ \psi \sim N\left(0, \Omega(\rho, \tau)^{-1}\right), \]
or in the non-stationary case, through an improper prior
\[ \psi \sim N\left(0, \Omega(\tau)^{-1}\right). \]
Points on the lattice are defined as neighbours only if they lie in the same horizontal layer. The precision
matrices, Ω (ρ, τ) and Ω (τ), are (n × n) block diagonal matrices that depend on the horizontal
neighbourhood structure, the (D × 1) vector of scaling coefficients, τ², and, in the stationary case, a spatial
dependence parameter ρ, where |ρ| < 1. In the non-stationary case ρ is not required. The block diagonal
structure permits D separate scaling coefficients, τ², that model differing variances at each depth for the
spatial components.
The error, ϵ, is an n × 1 vector, that is defined such that
ϵ ∼ N (0,Σ (σ)) ,
where Σ (σ) is an (n × n) diagonal covariance matrix that is a function of a (D × 1) vector σ² that allows
heterogeneity across depths in the non-spatial random component. The variance, Σ, is defined as
\[ \Sigma = \mathrm{diag}\left(\sigma_1^2 I, \sigma_2^2 I, \ldots, \sigma_D^2 I\right), \]
where I is the S × S identity matrix. This structure arises from the ordering of y by depth and then by
spatial site. The residuals are modelled as having differing variances at each depth.
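The ordering of y and the resulting structure of Σ can be made concrete with a short sketch. The Python fragment below is illustrative only and is not taken from the analysis code: it assumes 0-based depth and site indices, the helper name `obs_index` is hypothetical, and the variances are made-up values.

```python
import numpy as np

D, S = 15, 108                       # depths and horizontal sites in the case study
sigma2 = np.linspace(0.5, 2.0, D)    # hypothetical depth variances, for illustration only

def obs_index(d, s, n_sites=S):
    """Position in y of the observation at depth index d and site index s (0-based)."""
    return d * n_sites + s

# Diagonal of Sigma = diag(sigma_1^2 I, ..., sigma_D^2 I): each variance repeated S times.
sigma_diag = np.repeat(sigma2, S)

# The same matrix written as the Kronecker product diag(sigma^2) (x) I_S.
sigma_kron = np.kron(np.diag(sigma2), np.eye(S))
```

Because Σ is diagonal, only the vector `sigma_diag` ever needs to be stored; the dense Kronecker form is built here purely to show that the two descriptions agree.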
Two variables, depth and treatment, are used to describe the fixed effects. Here, depth is treated
as a continuous variable, and a set of basis functions is formed from it in order to fit splines. For the
case-study, the basis functions are linear splines, but other basis functions may be used [Ngo and Wand,
2004]. Treatment is a categorical variable, with T levels. The design matrix, X may be expressed as
X = A ⊗ B,
where A is a D×k matrix of k depth basis functions, ⊗ is the Kronecker product and B is an S ×T matrix
that matches the S horizontal sites with the appropriate set of T dummy variables for the site treatments.
The Kronecker product gives X as an n × p matrix with n = D × S and p = k × T .
The linear splines are defined as $z_j(d) = (d - \kappa_j)_+$, where
\[ (d - \kappa_j)_+ = \begin{cases} d - \kappa_j, & d \ge \kappa_j, \\ 0, & d < \kappa_j, \end{cases} \]
for some knot sequence $\kappa_1, \ldots, \kappa_{k-2}$, and d = 1, 2, . . . , D. The basis functions in A for each d are
$[1, d, z_1(d), z_2(d), \ldots, z_{k-2}(d)]$ [Ngo and Wand, 2004]. For the linear splines used in the case-study
of this paper, the number of basis functions, k, is the number of internal knots plus 2.
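The construction of X can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions rather than the code used in the paper: the function name `linear_spline_basis` is hypothetical, the knots 4, 7 and 10 are those of the three-knot model of Section 6.4.3, and the allocation of treatments to sites is invented for the example and is not the trial layout.

```python
import numpy as np

def linear_spline_basis(depths, knots):
    """Columns [1, d, (d - kappa_1)_+, ..., (d - kappa_K)_+] for each depth d."""
    d = np.asarray(depths, dtype=float)
    kappa = np.asarray(knots, dtype=float)
    truncated = np.maximum(d[:, None] - kappa[None, :], 0.0)
    return np.column_stack([np.ones_like(d), d, truncated])

D, T, reps = 15, 9, 12                     # depths, treatments, sites per treatment
A = linear_spline_basis(np.arange(1, D + 1), knots=[4, 7, 10])   # D x k, with k = 5

# Hypothetical allocation of the T treatments to the S = 108 sites.
treatment_of_site = np.tile(np.arange(T), reps)
S = treatment_of_site.size
B = np.zeros((S, T))
B[np.arange(S), treatment_of_site] = 1.0   # one dummy variable per treatment

X = np.kron(A, B)                          # rows ordered by depth, then by site
```

With three internal knots, k = 5 and X has n = 15 × 108 = 1620 rows and p = 5 × 9 = 45 columns; the row for depth index d and site s carries the basis values for d in the columns belonging to the treatment applied at s.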
For the stationary CAR prior the precision matrix is
\[ \Omega(\rho, \tau) = \mathrm{block\ diagonal}\left(Q/\tau_1^2, Q/\tau_2^2, \ldots, Q/\tau_D^2\right), \]
with τ² a (D × 1) vector of scaling coefficients permitting different variances at the D different depths, and
Q an S × S first order neighbourhood precision matrix common to each depth layer, and
\[ Q = M - \rho W. \]
The neighbourhood matrix, W, is defined such that
\[ w_{ij} = \begin{cases} 0, & i = j, \\ 1, & i \sim j \ (i, j \text{ are neighbours}), \\ 0, & \text{otherwise}, \end{cases} \]
and M is given by
\[ M = \mathrm{diag}\left(n_1, n_2, \ldots, n_S\right), \]
where $n_i$ is the number of neighbours of site i, so that the off-diagonal elements of Q are −ρ for
neighbouring sites. See Gelfand and Vounatsou [2003].
In the depth layered scheme used here the non-stationary CAR prior is defined as
\[ \Omega(\tau) = \mathrm{block\ diagonal}\left(R/\tau_1^2, R/\tau_2^2, \ldots, R/\tau_D^2\right), \]
with R an S × S first order neighbourhood precision matrix whose elements $r_{ij}$ are specified by
\[ r_{ij} = \begin{cases} n_i, & i = j, \\ -1, & i \sim j, \\ 0, & \text{otherwise}, \end{cases} \]
where $n_i$ is the number of neighbours for site i [Rue and Held, 2005].
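A small sketch may help to fix these definitions. The Python fragment below builds the neighbourhood matrices for the 6 × 18 horizontal lattice of the case study and assembles the layered precision matrix. It is an illustration under assumptions, not the paper's code: W is taken to be the 0/1 adjacency matrix, so that Q = M − ρW reduces to R when ρ = 1; the values of ρ and τ² are made up; and dense arrays are used for readability where a real implementation would use sparse storage.

```python
import numpy as np

def first_order_adjacency(n_rows, n_cols):
    """Adjacency matrix W (w_ij = 1 for first order neighbours) on a regular lattice."""
    S = n_rows * n_cols
    W = np.zeros((S, S))
    for r in range(n_rows):
        for c in range(n_cols):
            i = r * n_cols + c
            if c + 1 < n_cols:               # neighbour to the right
                W[i, i + 1] = W[i + 1, i] = 1.0
            if r + 1 < n_rows:               # neighbour below
                W[i, i + n_cols] = W[i + n_cols, i] = 1.0
    return W

W = first_order_adjacency(6, 18)             # 6 rows x 18 columns = 108 sites
M = np.diag(W.sum(axis=1))                   # number of neighbours n_i on the diagonal
R = M - W                                    # intrinsic (non-stationary) CAR structure
rho = 0.9                                    # hypothetical spatial dependence value
Q = M - rho * W                              # proper (stationary) CAR structure

tau2 = np.linspace(0.5, 2.0, 15)             # hypothetical scaling coefficients per depth
Omega = np.kron(np.diag(1.0 / tau2), R)      # block diagonal (R/tau_1^2, ..., R/tau_D^2)
```

Each row of R sums to zero, which is why the intrinsic prior is improper, whereas Q is strictly diagonally dominant for |ρ| < 1 and therefore positive definite.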
6.4.2 Computation
Computation is performed using a general-purpose MCMC software framework currently under devel-
opment, which allows block updating of parameters. Programming is in Python and uses the Fortran
and C libraries, LAPACK, BLAS, SciPy of Anderson et al. [1999], Blackford et al. [2002], and NumPy
Community [2010] respectively. Model parameters are partitioned into five blocks, (ψ, τ,β,σ, ρ) , each
jointly sampled. Closed form samplers are used for all model parameters except ρ where a Metropolis
Hastings sampler is used. Block updating is found to be more efficient by various authors, see, for example,
Chib and Carlin [1999], Pitt and Shephard [1999]. Lui et al. [1994] show theoretically that jointly
sampling parameters in a Gibbs scheme leads to a reduction in correlation in the associated Markov chain
in comparison with the individual sampling of parameters. Block updating typically means that MCMC
chains converge faster.
The conditional autoregressive models have a sparse precision matrix defined by the adjacency matrix.
The sparse matrix representation used here is the compressed sparse row format described by Saad
[2003]. Krylov subspace methods are used for updating [Simpson et al., 2008, Strickland et al., 2010].
Further computing details are given in Section 6.7, where block updating equations are given for the
posterior probabilities.
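To make the storage format and the style of solver concrete, the sketch below stores a small CAR-type precision matrix in compressed sparse row (CSR) form and solves a linear system with conjugate gradients, one standard Krylov subspace method. It uses SciPy's general-purpose routines, not the framework described in the paper, and the five-site path graph and ρ = 0.4 are assumptions made for the example.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import cg

# Path graph on S sites: interior sites have two neighbours, end sites have one.
S, rho = 5, 0.4
W = sp.diags([np.ones(S - 1), np.ones(S - 1)], offsets=[-1, 1], format="csr")
M = sp.diags(np.asarray(W.sum(axis=1)).ravel())
Q = (M - rho * W).tocsr()

# CSR keeps only the non-zero entries, stored row by row.
print(Q.data)      # the non-zero values
print(Q.indices)   # the column index of each stored value
print(Q.indptr)    # offsets marking where each row starts in data/indices

# Conjugate gradients needs only matrix-vector products with Q,
# so the sparsity of Q is exploited at every iteration.
b = np.ones(S)
x, info = cg(Q, b)   # info == 0 signals convergence
print(info, np.abs(Q @ x - b).max())
```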
6.4.3 Fixed effects
Three models are considered for the regression component: a three-knot linear spline with knots at depth
indices 4, 7 & 10, a five-knot linear spline with equally spaced knots at 3.33, 5.67, 8, 10.33 and 12.67,
and a saturated model of 135 terms which fits a constant for each treatment by depth.
The linear splines allow discussion of trends across various depth segments. The five-knot linear
spline was the initial choice. However, with six segments defined over 15 depth points, linear trends
may not be seen because of the limited number (3) of different depths within a segment, so a three-knot
model was also considered. For comparison, a saturated model of treatments by depths or 9 × 15 = 135
parameters was also fitted.
Smooth continuous curves may be fitted using the generalised additive (GAM) framework of Hastie
and Tibshirani [1990], the random walk (order 2) (RW2) smoothing of INLA [Martino and Rue, 2009], or
the RW2 penalised splines of BayesX [Belitz et al., 2009a,b], which are described more fully in Brezger
and Lang [2006], Lang and Brezger [2004] and Kneib and Fahrmeir [2006]. Such frameworks seem
unnecessarily complicated for the problem here. (For example, a naive use of BayesX gave a default of 20
knots across the 15 depth values.) Additionally, the use of linear splines allows comparisons of trends
over the linear segments of the curves.
Choice of the regression component and final model is made using the Deviance Information Crite-
rion (DIC) [Spiegelhalter et al., 2002], an information criterion based on the Deviance and adjusted by
an estimated number of parameters. The results of these comparisons are reinforced by the curves of the
posterior deviance distributions [Aitkin, 1997]. See Figure 6.1.
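For readers unfamiliar with the criterion, the following sketch shows the standard calculation of Spiegelhalter et al. [2002] from MCMC output: pD is the posterior mean deviance minus the deviance at the posterior mean, and DIC is the posterior mean deviance plus pD. The toy model (a normal mean with known unit variance and a flat prior) and all names are assumptions for illustration and are unrelated to the models fitted in this paper.

```python
import numpy as np

def dic_from_draws(deviance_draws, deviance_at_posterior_mean):
    """Return (pD, DIC) from sampled deviances and the plug-in deviance."""
    mean_deviance = float(np.mean(deviance_draws))
    p_d = mean_deviance - deviance_at_posterior_mean
    return p_d, mean_deviance + p_d

rng = np.random.default_rng(0)
n = 50
y = rng.normal(1.0, 1.0, size=n)

# Toy posterior: with a flat prior and unit variance, mu | y ~ N(mean(y), 1/n).
mu_draws = rng.normal(y.mean(), 1.0 / np.sqrt(n), size=4000)

def deviance(mu):
    """Minus twice the log-likelihood of y under N(mu, 1)."""
    return np.sum((y - mu) ** 2) + n * np.log(2.0 * np.pi)

dev_draws = np.array([deviance(m) for m in mu_draws])
p_d, dic = dic_from_draws(dev_draws, deviance(mu_draws.mean()))
print(p_d, dic)
```

In this toy case the model has a single free parameter, so pD should come out close to one.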
6.4.4 Contrast and parameter comparisons
Output for each MCMC simulation after burn-in was kept for all model estimates. This permitted post hoc
comparisons for any desired function both within and across the measurement dates. Contrasts of
interest are (1) Average Long fallow cropping minus average response cropping, (2) Average cropping
minus average pastures, and (3) Average lucernes minus native perennial pastures. These contrasts are
calculated for both the slope of the line segment from 200 cm to 300 cm, and for the moisture estimates
at each point in the depth profile.
Contrasts and slopes are compared across all combinations of dates, giving 10 comparisons for each
estimate. Comparisons within a date are formed by pairing the estimates from the same iteration. How-
ever, the estimates from each date’s model are independent. Hence, the across date comparisons are
formed after randomising the iterates.
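The comparison scheme described above can be sketched as follows. The posterior draws here are simulated placeholders, not results from the case study, and the array layout and variable names are assumptions. The contrast weights follow the grouping given in the text (long fallow cropping as treatments 1-3, response cropping as treatments 5 and 6).

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(1)
n_iter, n_dates, n_treat = 2000, 5, 9

# Placeholder posterior draws: one (n_iter x n_treat) array per date.
draws = [rng.normal(0.3 + 0.01 * np.arange(n_treat), 0.02, size=(n_iter, n_treat))
         for _ in range(n_dates)]

# Long fallow (treatments 1-3) minus response cropping (treatments 5 and 6).
weights = np.zeros(n_treat)
weights[[0, 1, 2]] = 1.0 / 3.0
weights[[4, 5]] = -1.0 / 2.0

def interval_95(x):
    """Equal-tailed 95% credible interval from posterior draws."""
    return np.percentile(x, [2.5, 97.5])

# Within a date: pair estimates from the same iteration.
contrast = [d @ weights for d in draws]
within = [interval_95(c) for c in contrast]

# Across dates: the chains are independent, so shuffle before differencing.
across = {}
for a, b in combinations(range(n_dates), 2):
    diff = rng.permutation(contrast[a]) - rng.permutation(contrast[b])
    across[(a + 1, b + 1)] = interval_95(diff)

print(len(across))   # number of date pairs
```

Five dates give ten unordered pairs, which matches the 10 comparisons per estimate mentioned in the text.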
We compare the variance components of the model in the same manner. For the random spatial
components a visual comparison only is made, using the 95% credible intervals for ψ for each site and
depth. For depths from 140 cm and onward, these credible intervals largely overlap.
6.5 Results
6.5.1 Model choice
The DIC (Table 6.1) indicates that the three-knot linear spline model with the stationary CAR prior is a
better model than the three-knot model with the non-stationary CAR, and a more appropriate model than
the stationary CAR prior saturated model or the five-knot linear spline on almost all of the five dates. On
the date (date 4, September 23, 1998) when the five-knot model is found to be the best, the three-knot
model is virtually equivalent. Clearly, tracking treatment effects at depths where they do not exist leads
to a poorer fit. However, it seems likely that the improvement observed with the five-knot model on
September 23, 1998 represents a slightly better fit at the shallower depths for that date. The three-knot
linear spline fits the bulk of the data well, but may be a poorer model for some dates at the shallower
depths covered by the first linear segment.
The Deviance curves for the three-knot linear spline, five-knot linear spline and saturated model
show the superiority of the five-knot linear spline model for date 4 (Figure 6.1). Plots of deviance curves
for all models and dates show the saturated model generally to be the poorest of the fitted models.
A useful byproduct of the DIC calculation is the calculation of pD, the effective number of parame-
ters. The saturated model contains 135 parameters and the variance components consist of 31 parameters,
but as can be seen from Table 6.1, pD is approximately equal to 5/8 of the degrees of freedom available
on any date. This proportion indicates that many of the spatial residual components might well be con-
sidered to be outliers of the CAR normal models [Spiegelhalter et al., 2002].
6.5.2 Variance components
Figures 6.2 and 6.3 show the square roots of the non-spatial and the spatial variance component parameters,
σ² and τ², and indicate that they mirror each other at the various depths and over the dates.
Figure 6.2 illustrates the need for a non-spatial variance component for each depth, but that these compo-
nents may be constant over dates. The comparisons to see whether the non-spatial variance component
for each depth differs across dates, show that all 100 of the possible comparisons across dates for depths
from 120 cm to 300 cm have 95% credible intervals which include zero. For depths from 20 cm to 100
cm (50 comparisons) just 9 differences have credible intervals not inclusive of zero, and these all involve
comparisons with date 4. These intervals indicate that the non-spatial variance component, σ², varies by
depth but not by date.
Figure 6.3 shows τ varying by depth, but being approximately constant across dates from depths 120
cm to 300 cm. Comparisons across dates show just 3 observed differences whose 95% credible interval
fails to include zero from a possible 100. There are apparently some differences across dates at the
shallower depths, with 25 of the 50 possible comparisons showing differences for depths from 20 cm to
100 cm, and these are generally differences with the τ values for date 4. The spatial variance components
vary by depth, but generally not by date. This is particularly true for depths from 120 cm to 300 cm.
The variance component graphs (Figures 6.2, 6.3) show very much lower variability in the mid-depth
range. Date 4 (September 23, 1998) shows considerably smaller variances for the shallower depths than
those for the other dates for both the spatial and non-spatial variance components.
Tables 6.2 and 6.3 show values and comparisons for the parameter ρ. Just one of the possible
10 comparisons across dates has a 95% credible interval which did not include zero. ρ appears to be
effectively the same across dates.
6.5.3 Depth segments and dates
The three-knot spline model consists of four linear segments for each treatment. Table B.15 shows the
95% credible intervals for the slope of the linear segment at the greatest depth (from 200 cm to 300 cm).
Almost all treatments show no trend in this segment. (The exceptions are treatment 8, a lucerne mixture
treatment which shows decreasing moisture in this line segment, and treatment 2 which on two of the
five dates shows increasing moisture.) In general, the last linear segment (from 200 cm - 300 cm) is
constant for all treatments over all dates. Hence, from about 200 cm depth and deeper, the treatments
would appear to no longer affect the moisture levels and moisture stays roughly constant but with greater
variability with increasing depth.
Contrasts between the dates for each treatment’s final slope give 95% credible intervals which include
zero for all treatments, except for treatments 1 and 9, which each show 4 of the 10 possible differences
between the dates’ final slopes as differing.
If we group long fallowing, response cropping and pastures and calculate a common final slope for
each grouping, these estimated slopes all have 95% credible intervals which include zero. Comparing
the contrasts for these grouped slopes across dates, no differences are found across dates. Mean moisture
levels from 200 cm to 300 cm do not change across the different dates for the various treatments and
types of cropping, but become more variable with depth.
6.5.4 Point by point contrasts
Figures 6.4-6.6 graph the point by point contrasts for all depths and dates. The most important of these,
Figure 6.4, shows the 95% credible intervals for the long fallow versus response cropping contrast as
generally differing across dates at the shallower depths, but overlapping for the depths from 200 cm to
300 cm.
Tables for contrasts for the three-knot linear model are given in the online supplementary materials.
Table 6.5 shows the sign of each contrast whose 95% credible interval does not contain zero.
The statistical evidence is that the treatments no longer affect the moisture values from the depth of
200 cm to 300 cm, and given that the moisture profile is effectively flat at these depths, it seems that
moisture levels after 200 cm are constant for their treatment, but have greater variability than at the mid-
depths. The contrast of long fallow cropping (treatments 1-3) versus response cropping (treatments 5
& 6) has almost entirely positive 95% credible intervals from 200 cm to 300 cm for all dates. Thus, it would
appear that for the five dates considered response cropping decreases moisture levels at the depth critical
for salination.
As expected, all values of the ‘Crop vs pasture’ contrast (the average of treatments 1-6 minus the
average of treatments 7-9) are positive for all dates and depths, with the difference being roughly constant
from 200 cm to 300 cm. That is, cropping leads to moister soil than pastures.
The lucerne pasture mixtures (treatments 7 & 8) perform consistently better than the native pastures
for depths greater than 100 cm. That is, at these depths, lucerne mixtures lead to drier soil than the native
pastures.
The differences discussed above are also shown in the saturated model contrast differences but not so
markedly. These same contrasts when compared across the dates show essentially no differences in the
depths from 200 cm to 300 cm.
6.5.5 Spatial residual components, ψ
As indicated in Section 6.4.4, no formal comparisons were made for the spatial residuals across dates.
Graphs of their 95% credible intervals were plotted to inspect overlap or non-overlap. For depths from
140 cm and deeper the credible intervals largely overlap. Figure 6.7 gives contour graphs for these spatial
residuals at the depth of 240 cm for the different dates. These show considerable consistency across dates.
6.6 Discussion
In considering longitudinal agricultural experiments, Piepho et al. [2004], Piepho and Ogutu [2007],
Piepho et al. [2008], Wang and Goonewardene [2004] and Brien and Demetrio [2009] use mixed models
within a REML framework to analyse their spatio-temporal data, and explicitly address the fitting of
state-space models via standard software and REML. The fixed part of their models is generally simple
and the data are measured on two spatial dimensions. Some soil profile studies [Macdonald et al., 2009]
do not use spatial information in the analysis. Some studies composite the soils from different depths
across soil types or treatment [Sleutel et al., 2009]. Others [Nayyar et al., 2009] use the mixed modelling
framework advocated by Piepho et al. [2004]. Roy and Blois [2008] is one of the few papers in an
agricultural context which uses conditional autoregressive models. The current methodology of choice
for agricultural data that accounts for spatial correlation would seem to be mixed modelling to describe
spatial and other variance components, using REML. Despite the work of Besag et al. [1995], Besag and
Higdon [1993, 1999] there has been almost no use of CAR models for agricultural analyses. We use
conditional autoregressive models for their simplicity and their capacity to allow reasonably complex
fixed model components. Working with the sparse precision matrix from the adjacency matrix rather
than from a dense covariance matrix permits efficient model fitting. Besag and Mondal [2005], Lindgren
et al. [2010] show the equivalence of various kriging and CAR models.
The use of block updating allows good mixing and the Krylov subspace methods exploit the sparse
structure of the precision matrix to give efficient sampling.
The choice to allow neighbours only at the same depth is made for several reasons. Firstly, with depth
an important part of the regression component, to include depth-neighbours would confound estimation
of the treatment effects. Secondly, and more importantly, it permits the fitting of differing variances for
the spatial components at each depth. Finally, using the obvious choice of distance weighted neighbours
would mean that with the great differences in scale between horizontal and vertical distances the neigh-
bourhood model would degenerate effectively into a depth neighbourhood model only, while using (1,0)
neighbours would also be difficult to justify. This consideration seems likely to apply in many agricultural
contexts where observations are made in three spatial dimensions. We use a first order neighbourhood
across the horizontal lattice with (1,0) weights.
We fitted the same model to five dates of data aiming to discover how best to fit a model for the full
data. It largely appears that several important parameters of the model (ρ, τ and σ) are constant across
dates for the depths which are of concern for salination. (If we should wish to model moisture at all
depths, classifying dates as wet or dry on the basis of previous rainfall, may be useful to distinguish such
dates as date 4, September 23, 1998.)
From the DIC values, we see that the simplification of the three-knot model, where a longer linear
segment at the deeper depths is used, has resulted in a better model. Clearly for depths from about 200
cm and greater, the various treatments no longer exercise a direct effect on the moisture content of the
soil. Rather, the moisture content remains approximately constant at whatever level it has reached by 200
cm, but with increasing variability with increasing depth. This is true for all five dates.
We have presented a methodology for the analysis of three dimensional lattice data sets, where the
distance between lattice points in one dimension is not commensurate with those in the other two, a
situation which often applies to water column, air column and soil studies. The method is applicable to
both regular and irregular lattices in the horizontal plane. We see it as applying to oceanographic, and air
column data as well as three dimensional agricultural studies.
The analyses of the case study here have uncovered important features of the data. In particular, by
having taken out the spatially correlated components, they indicate that response cropping gives rise to
more satisfactory moisture levels than long fallow cropping below the root zone where the soils are at
greatest risk of salination.
6.7 Appendix
The joint posterior for the full set of unknown parameters is estimated by partitioning the parameters into
five blocks
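$$(\psi, \tau, \beta, \sigma, \rho),$$
and a Gibbs sampling scheme is defined such that the jth step is
1. Sample $\psi^{j}$ from $p(\psi \mid y, \tau^{j-1}, \beta^{j-1}, \sigma^{j-1}, \rho^{j-1})$,
2. Sample $\tau^{j}$ from $p(\tau \mid y, \psi^{j}, \beta^{j-1}, \sigma^{j-1}, \rho^{j-1})$,
3. Sample $\beta^{j}$ from $p(\beta \mid y, \psi^{j}, \tau^{j}, \sigma^{j-1}, \rho^{j-1})$,
4. Sample $\sigma^{j}$ from $p(\sigma \mid y, \psi^{j}, \tau^{j}, \beta^{j}, \rho^{j-1})$,
5. Sample $\rho^{j}$ from $p(\rho \mid y, \psi^{j}, \tau^{j}, \beta^{j}, \sigma^{j})$.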
Let S be the number of horizontal sites, D the number of different depths, and n the number of
observations.
The following subsections describe the sampling from each of the full conditional posteriors in the
scheme above.
6.7.1 Sampling β.
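We define $\tilde{y}$ such that
$$\tilde{y} = \Sigma^{-1/2}(y - \psi),$$
and $\tilde{X}$ such that
$$\tilde{X} = \Sigma^{-1/2} X.$$
The prior probability density function (pdf) for β is taken as
$$\beta \sim N(\underline{\beta}, \underline{V}^{-1}),$$
where $\underline{\beta}$ is the prior mean, and $\underline{V}$ is the prior precision. Thus the posterior distribution for β is given by
$$\beta \mid y, \sigma, \psi \sim N(\overline{\beta}, \overline{V}^{-1}),$$
where
$$\overline{V} = \tilde{X}^T \tilde{X} + \underline{V},$$
and
$$\overline{V}\,\overline{\beta} = \underline{V}\,\underline{\beta} + \tilde{X}^T \tilde{y}.$$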
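A dense-matrix sketch of this update is given below. It assumes Σ is diagonal, so that scaling by Σ^{-1/2} is an elementwise division by the observation standard deviations. The function name, argument names and the simulated data are assumptions for illustration. The posterior precision is the scaled cross-product of the design matrix plus the prior precision, and a draw is obtained from its Cholesky factor.

```python
import numpy as np

def sample_beta(y, X, psi, sigma2_obs, prior_mean, prior_prec, rng):
    """Draw beta from its Gaussian full conditional; returns (draw, posterior mean).

    sigma2_obs is the length-n diagonal of Sigma (assumed diagonal here).
    """
    w = 1.0 / np.sqrt(sigma2_obs)
    y_scaled = w * (y - psi)                 # Sigma^{-1/2} (y - psi)
    X_scaled = w[:, None] * X                # Sigma^{-1/2} X
    post_prec = X_scaled.T @ X_scaled + prior_prec
    rhs = prior_prec @ prior_mean + X_scaled.T @ y_scaled
    L = np.linalg.cholesky(post_prec)        # post_prec = L L^T
    post_mean = np.linalg.solve(L.T, np.linalg.solve(L, rhs))
    z = rng.standard_normal(len(rhs))
    # L^{-T} z has covariance (L L^T)^{-1}, the inverse posterior precision.
    return post_mean + np.linalg.solve(L.T, z), post_mean

rng = np.random.default_rng(0)
n, p = 200, 3
X = rng.standard_normal((n, p))
beta_true = np.array([1.0, -2.0, 0.5])
psi = np.zeros(n)
sigma2_obs = np.full(n, 0.01)
y = X @ beta_true + 0.1 * rng.standard_normal(n)
draw, post_mean = sample_beta(y, X, psi, sigma2_obs,
                              prior_mean=np.zeros(p),
                              prior_prec=1e-4 * np.eye(p), rng=rng)
print(post_mean, draw)
```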
6.7.2 Sampling σ.
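Let
$$\epsilon = y - X\beta - \psi,$$
and let the n × 1 residual vector be partitioned by depth, d, into D subvectors, $\epsilon_d$, such that
$$\epsilon = [\epsilon_1, \epsilon_2, \ldots, \epsilon_D],$$
where each $\epsilon_d$ is an S × 1 vector, with d = 1, 2, ..., D. The vector $\sigma^2$ is updated by updating each variance
component, $\sigma_d^2$, one at a time, using the following updating equations.
$$1/\sigma_d^2 \mid y, \beta, \psi \sim \text{Gamma}(\overline{\nu}/2, \overline{s}/2),$$
where
$$\overline{\nu} = S + \underline{\nu},$$
and
$$\overline{s} = \underline{s} + \epsilon_d^T \epsilon_d,$$
and the common prior for each variance component, $\sigma_d^2$, the dth element of the vector $\sigma^2$, is
$$1/\sigma_d^2 \sim \text{Gamma}(\underline{\nu}/2, \underline{s}/2), \quad d = 1, 2, \ldots, D.$$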
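The depth-by-depth update might be coded as in the sketch below. Two assumptions are made: the Gamma distribution is read in its shape-rate form, and the residual vector is ordered depth by depth so that it can be reshaped to D × S. The function and argument names are hypothetical.

```python
import numpy as np

def sample_sigma2(resid, D, S, nu0, s0, rng):
    """Draw the D non-spatial variances from their inverse-gamma full conditionals.

    resid is ordered by depth: resid[d * S:(d + 1) * S] holds the residuals at depth d.
    nu0 and s0 are the prior hyperparameters.
    """
    eps = resid.reshape(D, S)
    shape = (S + nu0) / 2.0
    rate = (s0 + np.sum(eps ** 2, axis=1)) / 2.0
    precision = rng.gamma(shape, 1.0 / rate)   # NumPy parameterises by scale = 1 / rate
    return 1.0 / precision

rng = np.random.default_rng(2)
D, S = 3, 2000
true_sd = np.array([0.1, 0.5, 1.0])
resid = (true_sd[:, None] * rng.standard_normal((D, S))).ravel()
sigma2 = sample_sigma2(resid, D, S, nu0=1.0, s0=0.01, rng=rng)
print(np.sqrt(sigma2))
```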
6.7.3 Sampling ψ.
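Define
$$y^{*} = y - X\beta.$$
This gives
$$y^{*} \mid \psi, \beta, \sigma, \rho, \tau \sim N(\psi, \Sigma).$$
Hence,
$$p(\psi \mid y, \beta, \sigma, \rho, \tau) \propto p(y^{*} \mid \ldots) \times p(\psi \mid \rho, \tau) \propto \exp\left\{-\tfrac{1}{2}\left(\psi^T \Sigma^{-1} \psi + \psi^T \Omega \psi - 2\psi^T \Sigma^{-1} y^{*}\right)\right\},$$
and thus
$$\psi \mid y, \beta, \sigma, \rho, \tau \sim N(\overline{\psi}, \overline{\Omega}^{-1}),$$
where
$$\overline{\Omega} = \Omega + \Sigma^{-1},$$
and
$$\overline{\Omega}\,\overline{\psi} = \Sigma^{-1} y^{*}.$$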
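The sketch below draws the spatial residuals from this Gaussian full conditional using a dense Cholesky factor. It is for illustration on small problems only: the paper reports using sparse matrices and Krylov subspace methods, which this sketch does not attempt to reproduce. Σ is assumed diagonal, and the names and test data are hypothetical.

```python
import numpy as np

def sample_psi(y_star, sigma2_obs, Omega, rng):
    """Draw psi from N(post_mean, post_prec^{-1}); returns (draw, post_mean).

    y_star is y - X beta, sigma2_obs the diagonal of Sigma, Omega the prior precision.
    """
    post_prec = Omega + np.diag(1.0 / sigma2_obs)      # Omega + Sigma^{-1}
    L = np.linalg.cholesky(post_prec)                  # post_prec = L L^T
    rhs = y_star / sigma2_obs                          # Sigma^{-1} y_star
    post_mean = np.linalg.solve(L.T, np.linalg.solve(L, rhs))
    z = rng.standard_normal(len(y_star))
    return post_mean + np.linalg.solve(L.T, z), post_mean

rng = np.random.default_rng(5)
n = 4
W = np.diag(np.ones(n - 1), 1) + np.diag(np.ones(n - 1), -1)   # path graph
Omega = np.diag(W.sum(axis=1)) - 0.4 * W
sigma2_obs = np.full(n, 0.5)
y_star = np.array([0.2, -0.1, 0.4, 0.0])

samples = np.array([sample_psi(y_star, sigma2_obs, Omega, rng)[0]
                    for _ in range(5000)])
_, post_mean = sample_psi(y_star, sigma2_obs, Omega, rng)
print(post_mean, samples.mean(axis=0))
```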
6.7.4 Sampling τ.
The elements $\tau_d^2$ of the vector $\tau^2$, d = 1, 2, ..., D, are updated one at a time as follows. The n × 1 vector
ψ is partitioned into D subvectors $\psi_d$ (the spatial residuals at depth d), and (see Section 6.4.1)
$$\Omega(\rho, \tau) = \text{block diagonal}\left(Q/\tau_1^2,\; Q/\tau_2^2,\; \ldots,\; Q/\tau_D^2\right).$$
Let the prior pdf for $\tau_d^2$ be given by
$$1/\tau_d^2 \sim \text{Gamma}(a/2, b/2),$$
with (a, b) as hyperparameters.
This gives the updating posterior probability density function for $\tau_d^2$ as
$$1/\tau_d^2 \mid \psi, \rho \sim \text{Gamma}(\overline{a}/2, \overline{b}/2),$$
where
$$\overline{a} = a + S, \quad \text{and} \quad \overline{b} = b + \psi_d^T Q \psi_d,$$
for
$$\tau^2 = [\tau_1^2, \tau_2^2, \ldots, \tau_D^2].$$
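A sketch of this update follows, again reading the Gamma distribution in shape-rate form (an assumption) and using a dense Q for brevity. The path-graph neighbourhood, ρ = 0.4 and the hyperparameter values are illustrative choices, and the function name is hypothetical.

```python
import numpy as np

def sample_tau2(psi, Q, D, a, b, rng):
    """Draw the D spatial scale parameters tau_d^2 from their full conditionals.

    psi is ordered by depth, so psi.reshape(D, S)[d] is the subvector psi_d.
    """
    S = Q.shape[0]
    psi_d = psi.reshape(D, S)
    quad = np.einsum("ds,st,dt->d", psi_d, Q, psi_d)   # psi_d^T Q psi_d for each d
    shape = (a + S) / 2.0
    rate = (b + quad) / 2.0
    return 1.0 / rng.gamma(shape, 1.0 / rate)          # NumPy uses scale = 1 / rate

rng = np.random.default_rng(3)
S, D = 50, 2
W = np.diag(np.ones(S - 1), 1) + np.diag(np.ones(S - 1), -1)
Q = np.diag(W.sum(axis=1)) - 0.4 * W
psi = rng.standard_normal(D * S)

tau2_draws = np.array([sample_tau2(psi, Q, D, a=1.0, b=1.0, rng=rng)
                       for _ in range(4000)])
print(tau2_draws.mean(axis=0))
```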
6.7.5 Sampling ρ.
From Section 6.4.1, the precision matrix for the spatial components is Ω(ρ, τ). The prior for ρ is taken as
$$\rho \sim \text{Beta}(\alpha, \beta),$$
with (α, β) the hyperparameters.
Hence, the posterior probability density function for ρ is given by
$$p(\rho \mid \psi, \tau) \propto |\Omega(\rho, \tau)|^{1/2}\, \rho^{\alpha - 1} (1 - \rho)^{\beta - 1} \exp\left\{-\tfrac{1}{2}\, \psi^T \Omega(\rho, \tau)\, \psi\right\},$$
and ρ is sampled via a Metropolis-Hastings update.
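One way such an update could be implemented is sketched below. The source states only that a Metropolis-Hastings update is used, so the Gaussian random-walk proposal, the step size, the 0/1 form of W and all names here are assumptions. The log posterior uses the block structure of Ω: up to terms not involving ρ, log |Ω| equals D log |Q(ρ)|, and the quadratic form is the sum over depths of ψ_d^T Q ψ_d / τ_d².

```python
import numpy as np

def log_post_rho(rho, psi_d, W, tau2, alpha, beta):
    """Log full conditional of rho up to an additive constant (dense W, 0/1 entries)."""
    if not 0.0 < rho < 1.0:
        return -np.inf                      # outside the support of the Beta prior
    D, S = psi_d.shape
    Q = np.diag(W.sum(axis=1)) - rho * W
    _, logdet_q = np.linalg.slogdet(Q)
    quad = np.einsum("ds,st,dt->d", psi_d, Q, psi_d) / tau2
    return (0.5 * D * logdet_q
            + (alpha - 1.0) * np.log(rho)
            + (beta - 1.0) * np.log1p(-rho)
            - 0.5 * quad.sum())

def mh_step(rho, step, rng, *args):
    """One random-walk Metropolis step; returns (new rho, accepted flag)."""
    proposal = rho + step * rng.standard_normal()
    log_ratio = log_post_rho(proposal, *args) - log_post_rho(rho, *args)
    if np.log(rng.uniform()) < log_ratio:
        return proposal, True
    return rho, False

rng = np.random.default_rng(4)
S, D = 30, 4
W = np.diag(np.ones(S - 1), 1) + np.diag(np.ones(S - 1), -1)   # path graph
psi_d = rng.standard_normal((D, S))
tau2 = np.ones(D)

rho, chain = 0.5, []
for _ in range(500):
    rho, _ = mh_step(rho, 0.1, rng, psi_d, W, tau2, 2.0, 2.0)
    chain.append(rho)
print(np.mean(chain))
```

Because the proposal is symmetric, the acceptance ratio reduces to the ratio of posterior densities, and proposals outside (0, 1) are rejected automatically.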
6.8 Tables
Table 6.1 Summary of DICs
Date     Model     pD     DIC
Date 1   D135(S)   1065   -5850
Date 1   K5(S)     1019   -5875
Date 1   K3(S)     1032   -5915 *
Date 1   K3(NS)    959    -5657
Date 2   D135(S)   1061   -5952
Date 2   K5(S)     1039   -6038
Date 2   K3(S)     1048   -6049 *
Date 2   K3(NS)    904    -5560
Date 3   D135(S)   1049   -5885
Date 3   K5(S)     1034   -5961
Date 3   K3(S)     1044   -5996 *
Date 3   K3(NS)    906    -5509
Date 4   D135(S)   1093   -6570
Date 4   K5(S)     1070   -6623 *
Date 4   K3(S)     1064   -6619
Date 4   K3(NS)    1011   -6507
Date 5   D135(S)   1053   -6321
Date 5   K5(S)     1024   -6378
Date 5   K3(S)     1024   -6396 *
Date 5   K3(NS)    973    -6214
D135(S): Saturated model, 9 × 15 terms.
K5(S): 5-knot linear spline.
K3(S): 3-knot linear spline.
K3(NS): 3-knot linear spline - Intrinsic CAR.
(S): Stationary CAR.
(NS): Non-stationary (Intrinsic CAR).
pD: Estimated number of parameters.
Table 6.2 Estimates for ρ in the spatial precision matrix
Date                  ρ      95% CI
July 23, 1997         .461   (.373, .550)
December 4, 1997      .385   (.304, .477)
April 28, 1998        .375   (.289, .462)
September 23, 1998    .346   (.266, .429)
February 25, 1999     .325   (.246, .413)
Table 6.3 Differences in ρ across the five time periods.
Day1   Day2   est     q025     q975    Sig
1      2      0.077   0.000    0.156   *
1      3      0.087   0.008    0.162   *
1      4      0.115   0.032    0.195   *
1      5      0.136   0.058    0.213   *
2      3      0.010   -0.068   0.085
2      4      0.039   -0.039   0.122
2      5      0.060   -0.017   0.135
3      4      0.029   -0.049   0.114
3      5      0.050   -0.024   0.124
4      5      0.021   -0.054   0.098
Est = ρ(Day1) − ρ(Day2)
Table 6.4 Slopes for segment 200 cm - 300 cm for each treatment
Treatment   Day (Date)   Est      q025     q975     Sig
1           1            -0.002   -0.010   0.005
1           2            0.001    -0.006   0.008
1           3            -0.001   -0.008   0.005
1           4            -0.005   -0.012   0.001
1           5            -0.002   -0.009   0.005
2           1            -0.005   -0.013   0.002
2           2            -0.004   -0.011   0.003
2           3            0.010    0.003    0.017    *
2           4            0.012    0.005    0.019    *
2           5            -0.003   -0.010   0.003
3           1            0.006    -0.001   0.014
3           2            0.004    -0.003   0.011
3           3            0.005    -0.002   0.012
3           4            0.002    -0.005   0.009
3           5            0.001    -0.006   0.008
4           1            -0.005   -0.012   0.002
4           2            -0.004   -0.010   0.003
4           3            -0.004   -0.011   0.002
4           4            -0.004   -0.010   0.002
4           5            -0.005   -0.011   0.001
5           1            0.001    -0.007   0.008
5           2            -0.000   -0.007   0.007
5           3            0.002    -0.005   0.009
5           4            0.000    -0.006   0.007
5           5            -0.000   -0.007   0.006
6           1            0.004    -0.004   0.012
6           2            0.001    -0.006   0.008
6           3            0.005    -0.003   0.012
6           4            0.003    -0.004   0.010
6           5            -0.004   -0.011   0.003
7           1            0.003    -0.003   0.010
7           2            0.003    -0.003   0.010
7           3            0.005    -0.002   0.011
7           4            0.003    -0.003   0.009
7           5            0.002    -0.004   0.008
8           1            -0.009   -0.016   -0.003   *
8           2            -0.009   -0.016   -0.002   *
8           3            -0.009   -0.015   -0.003   *
8           4            -0.008   -0.014   -0.002   *
8           5            -0.011   -0.017   -0.005   *
9           1            0.004    -0.003   0.011
9           2            0.003    -0.004   0.010
9           3            0.006    -0.001   0.013
9           4            0.006    -0.001   0.012
9           5            0.005    -0.001   0.012
* indicates 95% credible interval does not include zero.
Table 6.5 Signs for contrasts with 95% credible intervals not including zero, for each date. Positive (+) and negative (−) values indicated.
Contrast Depth Date: 1 2 3 4 5
Long Fallow - Response 20 + - + +
40 + + + +
60 + + +
80 + + + +
100 + + + + +
120 + + + +
140 + - + +
160 + + + +
180 + + + +
200 + + + +
220 + + + + +
240 + + + + +
260 + + + + +
280 + +
300 +
Cropping - Pastures 20 + + + + +
40 + + + + +
60 + + + + +
80 + + + + +
100 + + + + +
120 + + + + +
140 + + + + +
160 + + + + +
180 + + + + +
200 + + + + +
220 + + + + +
240 + + + + +
260 + + + + +
280 + + + + +
300 + + + + +
Lucerne mixtures - Native 20 - -
40 - - -
60 - - - -
80 - - - -
100 - - - - -
120 - - - - -
140 - - - - -
160 - - - - -
180 - - - - -
200 - - - - -
220 - - - - -
240 - - - - -
260 - - - - -
280 - - - - -
300 - - - - -
6.9 Figures
Figure 6.1 Cumulative distribution curves for the posterior distribution of the deviance, for (date 4) September 23, 1998. The solid line represents that for the saturated model, the middle broken line that for the 3-knot linear spline, and the more coarsely broken line on the left that for the 5-knot linear spline model.
[Plot not reproduced. Horizontal axis: Deviance. Legend: Design, K3, K5.]
Figure 6.2 Square root of non-spatial variances, by date and depth. Credible intervals are staggered in date order. Note the comparatively smaller variances at the shallower depths for Date 4.
[Plot not reproduced. Horizontal axis: Depth.]
Figure 6.3 Square root of spatial variance, by date and depth. Credible intervals are staggered in date order. Note the comparatively smaller variances at the shallower depths for Date 4.
[Plot not reproduced. Horizontal axis: Depth.]
Figure 6.4 Contrast: Long Fallow - Response cropping. Credible intervals are staggered in date order.
[Plot not reproduced. Horizontal axis: Depth. Legend: July 23, 1997; December 4, 1997; April 28, 1998; September 23, 1998; February 25, 1999.]
Figure 6.5 Contrast: Cropping - Pastures. Credible intervals are staggered in date order.
Figure 6.6 Contrast: Lucerne mixtures - Native pastures. Credible intervals are staggered in date order.
[Plot not reproduced. Horizontal axis: Depth. Legend: July 23, 1997; December 4, 1997; April 28, 1998; September 23, 1998; February 25, 1999.]
Figure 6.7 Spatial residual components at depth 240 cm.
[Contour plots not reproduced. Panels: July 23, 1997; December 4, 1997; April 28, 1998; September 23, 1998; February 25, 1999.]
Bibliography
Aitkin, M. (1997). The calibration of P-values, posterior Bayes factors and the AIC from the posterior
distribution of the likelihood. Statistics and Computing 7, 253–261.
Anderson, E., Z. Bai, C. Bischof, S. Blackford, J. Demmel, J. Dongarra, J. D. Croz, A. Greenbaum,
S. Hammarling, A. McKenney, and D. Sorensen (1999). LAPACK Users’ Guide: Third Edition (22
Aug 1999 ed.). Philadelphia: Society for Industrial and Applied Mathematics (SIAM).
Belitz, C., A. Brezger, T. Kneib, and S. Lang (2009a). BayesX Software for Bayesian Inference
in Structured Additive Regression Models Version 2.0.1 Reference Manual. Online at
http://www.stat.uni-muenchen.de/~bayesx/bayesx.html. Accessed: October 25, 2010.
Belitz, C., A. Brezger, T. Kneib, and S. Lang (2009b). BayesX Software for Bayesian Inference in
Structured Additive Regression Models Version 2.0.1 Software Methodology Manual. Online at
http://www.stat.uni-muenchen.de/~bayesx/bayesx.html. Accessed: October 25, 2010.
Besag, J. E., P. Green, D. Higdon, and K. Mengersen (1995). Bayesian computation and stochastic
systems. Statistical Science 10(1), 3–41.
Besag, J. E. and D. Higdon (1993). Bayesian inference for agricultural field experiments. Bull. Inst.
Internat. Statist 55(Book 1), 121–136.
Besag, J. E. and D. Higdon (1999). Bayesian analysis of agricultural field experiments. Journal of the
Royal Statistical Society Series B-Statistical Methodology 61, 691–717. Part 4.
Besag, J. E. and D. Mondal (2005). First-order intrinsic autoregressions and the de Wijs process.
Biometrika 92(4), 909–920.
Besag, J. E., J. York, and A. Mollie (1991). Bayesian image restoration with applications in spatial
statistics (with discussion). Annals of the Institute of Mathematical Statistics 43, 1–59.
Blackford, L., J. Demmel, J. Dongarra, I. Duff, S. Hammarling, G. Henry, M. Heroux, L. Kaufman,
A. Lumsdaine, and A. Petitet (2002). An updated set of basic linear algebra subprograms (BLAS).
ACM Transactions on Mathematical Software (TOMS) 28(2), 135–151.
Brezger, A. and S. Lang (2006). Generalized structured additive regression based on Bayesian P-splines.
Computational Statistics and Data Analysis 50(4), 967–991.
Brien, C. J. and C. G. B. Demetrio (2009). Formulating mixed models for experiments, including longitu-
dinal experiments. Journal of Agricultural, Biological, and Environmental Statistics 14(3), 253–280.
Chib, S. and B. P. Carlin (1999). On MCMC sampling in hierarchical longitudinal models. Statistics and
Computing 9, 17–26.
Gelfand, A. E. and P. Vounatsou (2003). Proper multivariate conditional autoregressive models for spatial
data analysis. Biostatistics 4(1), 11–25.
Hastie, T. and R. Tibshirani (1990). Generalized additive models (1st ed.). Monographs on statistics and
applied probability. London ; New York: Chapman and Hall.
Kneib, T. and L. Fahrmeir (2006). Structured additive regression for categorical spacetime data: A mixed
model approach. Biometrics 62(1), 109–118.
Lang, S. and A. Brezger (2004). Bayesian P-splines. Journal of Computational and Graphical Statis-
tics 13(1), 183–212.
Lindgren, F., H. Rue, and J. Lindstrom (2010). An explicit link between Gaussian fields and Gaussian
Markov random fields: The SPDE approach. Journal of the Royal Statistical Society Series B, to
appear.
Lui, J. S., W. H. Wong, and A. Kong (1994). Covariance structure of the Gibbs sampler with applications
to the comparisons of estimators and augmentations schemes. Journal of the Royal Statistical Society,
Series B 57(1), 157–169.
Macdonald, B. C. T., J. K. Reynolds, A. S. Kinsela, R. J. Reilly, P. van Oploo, T. D. Waite, and I. White
(2009). Critical coagulation in sulfidic sediments from an east-coast Australian acid sulfate landscape.
Applied Clay Science 46(2), 166–175.
Martino, S. and H. Rue (2009). R Package: INLA. Department of Mathematical Sciences NTNU,
Norway.
Nayyar, A., C. Hamel, G. Lafond, B. D. Gossen, K. Hanson, and J. Germida (2009). Soil microbial
quality associated with yield reduction in continuous-pea. Applied Soil Ecology 43(1), 115–121.
Ngo, L. and M. Wand (2004). Smoothing with mixed model software. Journal of Statistical Software 9,
1–56.
NumPy Community (2010). NumPy Reference Manual: Release 1.5.0.dev8106. Available
online: http://docs.scipy.org/doc/. Accessed: February 9, 2010.
Piepho, H. P., A. Buchse, and C. Richter (2004). A mixed modelling approach for randomized experi-
ments with repeated measures. Journal of Agronomy and Crop Science 190(4), 230–247.
Piepho, H. P. and J. O. Ogutu (2007). Simple state-space models in a mixed model framework. American
Statistician 61(3), 224–232.
Piepho, H. P., C. Richter, and E. Williams (2008). Nearest neighbour adjustment and linear variance
models in plant breeding trials. Biometrical Journal 50(2), 164–189.
Pitt, M. and N. Shephard (1999). Analytic convergence rates and parameterisation issues for the Gibbs
sampler applied to state space models. Journal of Time Series Analysis 20, 63–85.
Ringrose-Voase, A., R. R. Young, Z. Payder, N. Huth, A. Bernardi, H. Cresswell, B. Keating, J. Scott,
M. Stauffacher, R. Banks, J. Holland, R. Johnston, T. Green, L. Gregory, I. Daniells, R. Farquharson,
R. Drinkwater, S. Heidenreich, and S. Donaldson (2003). Deep drainage under different land uses
in the Liverpool Plains Catchment. Technical Report 3, Agricultural Resource Management Report
Series, NSW Agriculture Orange.
Roy, V. and S. d. Blois (2008). Evaluating hedgerow corridors for the conservation of native forest herb
diversity. Biological Conservation 141, 298–307.
Rue, H. and L. Held (2005). Gaussian Markov random fields : theory and applications. Boca Raton:
Chapman & Hall/CRC.
Saad, Y. (2003). Iterative methods for sparse linear systems. Society for Industrial and Applied Mathe-
matics. [electronic resource].
Simpson, D. P., I. W. Turner, and A. N. Pettitt (2008). Fast sampling from a Gaussian Markov random
field using Krylov subspace approaches. QUT Eprints 14376 (Brisbane), 1–17. Available online:
http://eprints.qut.edu.au.
Sleutel, S., J. Vandenbruwane, A. De Schrijver, K. Wuyts, B. Moeskops, K. Verheyen, and S. De Neve
(2009). Patterns of dissolved organic carbon and nitrogen fluxes in deciduous and coniferous forests
under historic high nitrogen deposition. Biogeosciences 6(12), 2743–2758.
Spiegelhalter, D. J., N. G. Best, B. P. Carlin, and A. van der Linde (2002). Bayesian measures of model
complexity and fit. Journal of the Royal Statistical Society. Series B (Statistical Methodology) 64(4),
583–639.
Strickland, C. M., D. P. Simpson, I. W. Turner, R. Denham, and K. L. Mengersen (2010). Fast Bayesian
analysis of spatial dynamic factor models for multi-temporal remotely sensed imagery.
Wang, L. A. and Z. Goonewardene (2004). The use of mixed models in the analysis of animal experiments
with repeated measures data. Canadian Journal of Animal Science 84(1), 1–11.
Statement of Contribution of Co-Authors for Thesis by Publication
The authors listed below have certified that:
1. they meet the criteria for authorship in that they have participated in the conception, execution, or
interpretation, of at least that part of the publication in their field of expertise;
2. they take public responsibility for their part of the publication, except for the responsible author who
accepts overall responsibility for the publication;
3. there are no other authors of the publication according to these criteria;
4. potential conflicts of interest have been disclosed to (a) granting bodies, (b) the editor or publisher of
journals or other publications, and (c) the head of the responsible academic unit; and
5. they agree to the use of the publication in the student's thesis and its publication on the Australasian
Digital Thesis database consistent with any limitations set by publisher requirements.
In the case of Chapter 7:
Title: Four dimensional spatio-temporal analysis of an agricultural dataset. Status: In preparation
Contributor: Statement of contribution
Margaret Donald: As first author, was responsible for the concept of the paper, data analysis,
interpretation and the writing of all drafts.
Dr Clair Alston: Was responsible for advice on measurements and their meaning, and editorial comment.
Dr Chris Strickland: Programmed & developed the Gibbs sampler used for the CAR layered model, in pyMCMC.
Rick Young: Will be responsible for advice on the purpose and background to the field trial, advice on the
meaning of statistical results, and editorial comment.
Professor Kerrie Mengersen: Was responsible for general advice and editorial comment.
Principal Supervisor's Confirmation
I have sighted email or other correspondence from all co-authors confirming their certifying authorship.
Name Signature Date
Chapter 7
Four dimensional spatio-temporal
analysis of an agricultural dataset
Preamble
This chapter addresses research objective (5), to fit a complex spatio-temporal model to the full
agricultural dataset. It uses the model and modelling software discussed in Chapter 6.1 to fit the saturated
treatment by depth model to the 56 days of data from the agricultural trial. Thus, here we eschew the aim
of simplifying the treatment curves by depth, taking the view that the contrasts are better estimated by
fitting means to all treatments and depths. The data are analysed in two ways: firstly, as a single analysis,
which is created by fitting the model of Chapter 6.1 at each date; and secondly, as a two stage analysis,
where estimates for the contrast of interest are taken from the full model and used in a series of time
series analyses to consider the contrast over the complete trial.
Appendix C shows all the fits from Method 1 together with the penalised spline (over time) models
which are included to show the seasonality at the shallower depths dampening until there is virtually no
seasonality at the greater depths. The random walk fits show the fits from the chosen time series model.
The appendix includes figures for all these models for depths 100 cm to 220 cm.
I am the principal author and the paper is given with its abstract. This paper is in preparation. Rick
Young provided the data and will provide agricultural and other editorial comment when the statistical
content has been finalised. Chris Strickland programmed the various samplers for the posteriors in
pyMCMC. I programmed the precision matrix and DIC calculations. Clair Alston provided editorial advice
in addition to advice on the collection and meaning of the data. Kerrie Mengersen oversaw, helped with,
and guided the exposition. As first author, I am responsible for concept of the paper, the data analysis,
interpretation and the writing of all drafts.
Title: A four dimensional spatio-temporal analysis of an agricultural field trial
Authors: Margaret Donald (a), Chris Strickland (a), Clair Alston (a), Rick Young (b), Kerrie Mengersen (a).
(a) School of Mathematical Sciences, Queensland University of Technology, GPO Box 2434, Brisbane,
QLD 4001, Australia.
(b) Tamworth Agricultural Institute, Industry & Investment NSW, 4 Marsden Park Road, Calala, NSW
2340, Australia.
7.1 Four dimensional spatio-temporal analysis
Abstract
While a variety of statistical models now exist for the spatio-temporal analysis of two-dimensional data
collected over time, there are few published examples of analogous models for the spatial analysis of
data taken over four dimensions, namely space, height or depth, and time. When taking account of the
autocorrelation of data within and between dimensions, the notion of closeness often differs for each
of the dimensions. In this paper, we consider a number of approaches to the analysis of such a dataset
arising from an agricultural experiment which explores the impact of different cropping regimes on soil
moisture. The proposed models vary in their representation of the spatial correlation in the data, its
assumed temporal pattern and the choice of conditional autoregressive (CAR) and other priors. The
sensitivity of random walk of order 1 models to priors and their effect on fit and hence the deviance
information criterion (DIC) is also discussed.
In terms of the substantive question, we find that response cropping is generally more effective than
long fallow cropping in reducing soil moisture at the depths considered (100 cm to 220 cm). We also find
that there may be a problem with random walks of order one, in that they are extremely sensitive to the
priors, and it is unclear how to choose priors to give a meaningful fit.
7.2 Introduction
Where observations are collected from a series of sites, at a series of time points, observations taken
close to each other in either time or space may be autocorrelated. Highly positively correlated obser-
vations reduce the number of effective observations, and testing which fails to take this autocorrelation
into account will often report erroneous significant relationships. Hence methods have been developed
for both spatial and temporal models to account for autocorrelation. In many applications, the spatial
autocorrelations are the focus of interest, but here, we wish only to account for them.
Spatio-temporal data are often analysed using models where spatial and temporal autocorrelation
effects are separable, and with an assumption of no structure in the time by space error interaction term
(Section 7.6). This is particularly common for spatio-temporal epidemiological analyses.
We consider a dataset where spatial autocorrelation effects are not constant over time, space, nor over
the fourth dimension, depth. The data are agricultural data from a lattice plot arrangement with differing
experimental treatments by plot. The spatial dimension, depth, reflects the experimental treatment for the
plot. The dataset is described in more detail in Section 7.3.
In accounting for spatial correlations we use the convolution conditional autoregressive (CAR) prior
model of Besag et al. [1991], but with the proper CAR prior of Gelfand and Vounatsou [2003], rather
than geostatistical modelling. There are two reasons for doing this. Firstly kriging models are slow to
converge in a Bayesian setting when datasets are large [Higdon, 1998], whereas neighbourhood models
which define a sparse precision matrix are relatively fast, both because of their sparseness and because
they require no repeated matrix inversions. Secondly, Besag and Mondal [2005] and Lindgren et al.
[2010] show an equivalence between the two types of model. This is supported by Hrafnkelsson and
Cressie [2003] and Rue and Tjelmeland [2002], who calibrate CAR models to kriging models.
Because of the complexity of the model and the size of the dataset (90,720 observations), the primary
method of analysis was to fit the full model as a series of daily models, using a block updating Gibbs
sampler and saving the Markov chain Monte Carlo iterates after burnin to allow estimation of the contrast
of interest for each depth and day and their credible intervals. Additionally, the contrast estimates are
modelled using time series methods, to gain insight into the time-varying process.
When many models are fitted to data, a simple comparison method is needed prior to any detailed
model assessment. We used the DIC of Spiegelhalter et al. [2002].
The purpose of the paper was to account for both the spatial and temporal autocorrelations in the
four dimensions of these data. It seemed unlikely for these data that an additive autocorrelation model
in time and space would be appropriate, and ignoring the time dimension, the question of dealing with
autocorrelations within the three-dimensional space also needed to be resolved. Finally, we wished to
form a contrast from the estimates and to describe it over time. Thus, the final objectives of the data
analysis were threefold: firstly, to estimate the contrast together with 95% credible intervals across time;
secondly, to understand the time-varying nature of the contrasts; and thirdly, to find appropriate credible
intervals for the contrasts when considered as time series.
In Section 7.3 we discuss the data. In Section 7.4 we outline the analysis methods and models.
Section 7.5 outlines the results. Section 7.6 provides a discussion of both methods and results.
7.3 Data
The four dimensional data consist of moisture observations taken at 108 surface treatment sites and 15 soil
depths from 20 cm to 300 cm, for 56 different dates spaced roughly equally over a period of almost five years. The
108 measurement sites are arranged as 6 rows with 18 columns per row. Hence, data at each time point
consist of 1620 measurements at 108 sites over 15 depths, while the entire dataset consists of 90,720
observations. The data were collected to determine a cropping system which would minimise water
leakage, and consisted essentially of three cropping systems, running in different phases, and giving rise
to 9 treatments. The first moisture measurements were made on June 26, 1995, the last on April 27, 2000.
Further details of the trial may be found in Ringrose-Voase et al. [2003].
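The dimensions quoted above can be checked with a few lines of arithmetic. The following Python sketch is illustrative only: the variable names are hypothetical, and the 20 cm spacing between depths is an assumption (it is consistent with 15 depths between 20 cm and 300 cm, and with 7 depths between 100 cm and 220 cm).

```python
# Layout of the field trial as described in the text.
N_ROWS, N_COLS = 6, 18                  # lattice of measurement sites
N_DATES = 56                            # sampling dates
depths_cm = list(range(20, 301, 20))    # assumed 20 cm spacing: 20, 40, ..., 300

n_sites = N_ROWS * N_COLS                       # sites per depth layer
obs_per_date = n_sites * len(depths_cm)         # measurements at one date
n_obs = obs_per_date * N_DATES                  # size of the full dataset

# Depths retained for the time series analyses (100 cm to 220 cm)
analysed_depths = [d for d in depths_cm if 100 <= d <= 220]

print(n_sites, obs_per_date, n_obs, len(analysed_depths))
```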
The primary question for crop scientists was whether response cropping gives lower moisture values
both at the intermediate and greater depths, in comparison with long fallow cropping, and whether this
is sustained over different stages of the cropping cycle. This contrast is calculated as the average of
treatments 1, 2 & 3 minus the average of treatments 5 & 6. The units of measurement for the contrast are
log(neutron count ratio), a surrogate measure for moisture. See Ringrose-Voase et al. [2003]. Note that
the higher the log(neutron count ratio), the moister the soil.
The treatments are
• Treatments 1-3: Long fallow wheat/sorghum rotation, where one wheat and one sorghum crop are
grown in three years with an intervening 10-14 month fallow period. The 3 treatments are each of
3 phases of the long fallow 3 strip system.
• Treatment 4: Continuous cropping in winter with wheat and barley grown alternately.
• Treatments 5 and 6: Response cropping, where an appropriate crop (either a winter or a summer
crop) is planted when the depth of moist soil exceeded a predetermined level.
• Treatments 7-9: Perennial pastures. The three treatments are lucerne (a deep rooted perennial
forage legume with high water use potential), lucerne grown with a winter growing perennial
grass, and a mixture of winter and summer growing perennial grasses.
In modelling the contrast evolving over time, log(rainfall+1), linear, quadratic and cubic effects over
time, and interactions of year with sine and cosine terms with periods of a year and a half-year were
considered as possibly useful covariates. (See Section 7.4.)
7.4 Methods
Three methods were used to consider the contrast of interest. The data may be fitted as a single model,
or as a two-stage model in which the same model is fitted at each date, which amounts to a model with an
interaction of date with every model term. An attractive possibility is to fit the structured additive
models with penalised spline smoothers of Fahrmeir et al. [2004] and Kneib [2006] to the full dataset using
BayesX software [Belitz et al., 2009a,b]. This methodology allows smoothing of estimates across unequally
spaced covariates and was particularly attractive given the unequal time intervals over which
measurements were made. It also permits the fitting of the CAR models of Besag et al. [1991].
The first method (Method 1) fits the same model at each date; taken together, the daily fits give a
complete model for all the data. The second method (Method 2) takes the contrast estimate for each date
and depth from Method 1 and uses time series methods to give an understanding of the time-varying
nature of the contrasts. Finally, in Method 3, we used the structured additive models of
Brezger and Lang [2006] and Fahrmeir et al. [2004] to fit a single model to the full dataset.
Method 1
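Let $y_{tid}$ be the response variable measured on date $t$, at site $i$ (of $S$ horizontal plot sites), at depth $d$
($d = 1, \ldots, 15$). Let $j$ be the treatment given at site $i$.
Method 1 fits the same model at each date, $t$, which gives the full model as
\[
\begin{aligned}
y_{tid} &= f_{tj}(d) + \psi_{tid} + \epsilon_{tid}, \qquad \epsilon_{tid} \sim N(0, \sigma^2_{td}), \quad \text{with} \\
f_{tj}(d) &= \alpha_{tjd}, \\
\psi_{tid} \mid \psi_{ti'd},\ i \neq i' &\sim N\!\left( \rho_t \sum_{i' \in \partial i} \frac{\psi_{ti'd}}{n_i},\ \frac{\tau^2_{td}}{n_i} \right),
\end{aligned}
\tag{7.1}
\]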
where $n_i$ is the number of sites adjacent to site $i$, and $i' \in \partial i$ denotes that site $i'$ is a neighbour of site
$i$. Neighbours are defined only within the same depth, and are first order neighbours
in all models. Note that this is the proper CAR model of Gelfand and Vounatsou [2003], and that $\rho_t$
is common across all depths for a given date, $t$. The function $f_{tj}(d)$ is a function of $d$ estimated for
each treatment and date. However, as shown above, a parameter was fitted for each treatment, date and
depth. From the treatment effects, the contrast of long fallowing versus response cropping is calculated as
$(e_1 + e_2 + e_3)/3 - (e_5 + e_6)/2$, where $e_j$ indicates the estimate for treatment $j$. Thus, in Method 1, we fit the
saturated model, with treatment means for each of the 56 dates at each depth, i.e., $56 \times 9 \times 15 = 7560$
treatment estimates (one for each date, $t$, treatment, $j$, and depth, $d$), and find $56 \times 15 = 840$ contrast
estimates together with their credible intervals.
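As a rough illustration of the neighbourhood structure in Equation 7.1, the following Python sketch builds the first order neighbours for a single depth layer of the 6 by 18 lattice, together with the precision matrix implied by the proper CAR conditionals, $(D - \rho A)/\tau^2$, where $D$ is the diagonal matrix of neighbour counts and $A$ is the adjacency matrix. It assumes that first order neighbours are sites sharing an edge (rook adjacency); the function names and the values of $\rho$ and $\tau^2$ are hypothetical, and this is not the pyMCMC implementation used for the analysis.

```python
def lattice_neighbours(n_rows, n_cols):
    """First order (edge-sharing) neighbours for each site of a rectangular lattice."""
    nb = {}
    for r in range(n_rows):
        for c in range(n_cols):
            candidates = [(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)]
            nb[r * n_cols + c] = [rr * n_cols + cc for rr, cc in candidates
                                  if 0 <= rr < n_rows and 0 <= cc < n_cols]
    return nb


def car_precision(nb, rho, tau2):
    """Dense copy of the proper CAR precision matrix (D - rho * A) / tau2."""
    n = len(nb)
    Q = [[0.0] * n for _ in range(n)]
    for i, neighbours in nb.items():
        Q[i][i] = len(neighbours) / tau2      # conditional precision n_i / tau2
        for k in neighbours:
            Q[i][k] = -rho / tau2             # one entry per neighbour pair
    return Q


nb = lattice_neighbours(6, 18)
Q = car_precision(nb, rho=0.9, tau2=1.0)      # illustrative parameter values
nonzeros = sum(1 for row in Q for value in row if value != 0.0)
print(len(nb), nonzeros, len(Q) ** 2)
```

Only a small fraction of the entries are nonzero, which is the sparseness referred to in Section 7.2; a practical implementation would store the matrix in a sparse format rather than as nested lists.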
Use of this method satisfies the first objective of Section 7.2 of estimating the contrast together with
credible intervals over time. Method 1 provides credible intervals for the contrast across time but provides
no insight into their time-varying nature. Hence, the need for Method 2 (satisfying objectives 2 and 3 of
Section 7.2).
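The calculation of the contrast and its credible interval from the saved MCMC iterates can be sketched as follows. This is a minimal Python illustration, not the code used for the analysis: the posterior draws are simulated from hypothetical treatment means, the helper names are invented, and the interval is a simple equal-tailed interval from the empirical quantiles.

```python
import random


def contrast(e):
    """Long fallow (treatments 1-3) minus response cropping (treatments 5-6).

    `e` maps treatment number to its estimate for one date and depth.
    """
    return (e[1] + e[2] + e[3]) / 3 - (e[5] + e[6]) / 2


def credible_interval(values, level=0.95):
    """Equal-tailed interval taken from the sorted draws."""
    s = sorted(values)
    n = len(s)
    return s[int((1 - level) / 2 * (n - 1))], s[int((1 + level) / 2 * (n - 1))]


# Hypothetical posterior draws for the nine treatment means at one date and depth
rng = random.Random(0)
assumed_means = {1: 0.30, 2: 0.28, 3: 0.32, 4: 0.25, 5: 0.20,
                 6: 0.22, 7: 0.15, 8: 0.16, 9: 0.18}
draws = [{j: rng.gauss(m, 0.02) for j, m in assumed_means.items()}
         for _ in range(10000)]

# Apply the contrast to every draw, then summarise
contrast_draws = [contrast(d) for d in draws]
post_mean = sum(contrast_draws) / len(contrast_draws)
lo, hi = credible_interval(contrast_draws)
print(round(post_mean, 3), round(lo, 3), round(hi, 3))
```

Forming the contrast draw by draw, rather than from the posterior means of the treatment effects, carries the posterior correlation between treatment estimates through to the interval.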
Method 2
Method 2 takes the contrast estimates from Method 1, which correspond to 15 series over the dimension
of time, and fits time series models. We consider the time series for the depths from 100 cm to 220 cm
(7 series in all). The set of depth × time series could be modelled as a multivariate time series. However,
although there is clear evidence of continuity in the contrasts across the depths, it is unclear just how
one might wish to model the depth-varying component of such a multivariate random walk, or of a more
general dynamic linear model. There seemed no obvious simplification. Hence, we chose to model the
time series at each depth, each as a univariate series.
Within the framework of Method 2, we considered time series models, simple regression models and
combinations of these. The models fall into four classes: those using time series methods alone, such as
the random walk and autoregressive models (Equations 7.3-7.5); regression models with time-varying
covariates (Equation 7.2); a combination of an autoregressive model with regression components using
time-varying covariates; and penalised spline smooths over time (Equation 7.6).
The time series models assumed equally spaced observations over time, which was not the case. The
regression models used time-varying covariates, which included log(rainfall+1) and interactions of year
by sine and cosine terms with periods of a year and a half year. Smooths over time such as polyno-
mial smooths or penalised spline smooths typically allow insightful descriptions of the data. With these
data, where we expect seasonality and perhaps trends, the penalised spline fits can suggest explanatory
variables for a simple regression. The regression models and the regression models in combination with
an autoregressive model were an attempt to deal with the assumption of equal time-intervals and its
inadequacies.
Let Yt represent the contrast estimate at time, t. Within the framework of Method 2, the following
models were fitted:
A regression model which assumes errors are not autocorrelated.
\[ \{Y_t\} = Y \sim N(X\beta, V), \qquad 1/V \sim \mathrm{Gamma}(10^{-6}, 10^{-6}), \tag{7.2} \]
where X is a design matrix of time-varying covariates, such as log(rainfall+1) and interactions of year
(as a factor) by sine and cosine terms with periods of a year and a half year.
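One way such a design matrix might be assembled is sketched below in Python. The sketch builds one row containing an intercept, a cubic polynomial in time, and the interactions of calendar year (six levels, 1995 to 2000) with sine and cosine terms of period one year and half a year, which gives 28 columns. The use of calendar years for the factor, the day-of-year phase, and the function and argument names are all assumptions made for illustration; rainfall is included only when a value is supplied.

```python
import math
from datetime import date

YEARS = range(1995, 2001)     # six calendar years spanned by the trial


def design_row(d, rainfall_mm=None, first=date(1995, 6, 26)):
    """Intercept, cubic in time, year-by-periodic interactions, optional rainfall."""
    t = (d - first).days / 365.25                     # years since first measurement
    phase = 2 * math.pi * d.timetuple().tm_yday / 365.25
    periodics = [math.sin(phase), math.cos(phase),            # period of one year
                 math.sin(2 * phase), math.cos(2 * phase)]    # period of half a year
    row = [1.0, t, t ** 2, t ** 3]
    for year in YEARS:
        indicator = 1.0 if d.year == year else 0.0            # year as a factor
        row.extend(indicator * p for p in periodics)
    if rainfall_mm is not None:
        row.append(math.log(rainfall_mm + 1))
    return row


row = design_row(date(1997, 3, 15))
row_with_rain = design_row(date(1997, 3, 15), rainfall_mm=12.0)
print(len(row), len(row_with_rain))
```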
The local level state-space model (random walk) of, e.g., Commandeur and Koopman [2007], Harvey
[1989], West and Harrison [1997] was fitted:
\[
\begin{aligned}
Y_t &= \mu_t + \nu_t, \qquad \nu_t \sim N(0, V), \\
\mu_t &= \mu_{t-1} + \omega_t, \qquad \omega_t \sim N(0, W), \\
1/V &\sim \mathrm{Gamma}(10^{-6}, 10^{-6}), \qquad 1/W \sim \mathrm{Gamma}(10^{-6}, 10^{-6}).
\end{aligned}
\tag{7.3}
\]
In a further version of this model, t-distributions with 10 and 4 degrees of freedom were substituted for the
normal distributions for the observation and state errors.
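To make the structure of the local level model concrete, the following Python sketch runs the standard Kalman filter recursions for Equation 7.3 with the variances $V$ and $W$ treated as known. This is illustrative only: in the chapter the precisions are given Gamma priors and estimated by MCMC, and the function name, starting values and data below are hypothetical.

```python
def local_level_filter(y, V, W, m0=0.0, C0=1e6):
    """Kalman filter for Y_t = mu_t + nu_t, mu_t = mu_{t-1} + omega_t.

    Returns filtered means, filtered variances and one-step-ahead forecasts.
    """
    m, C = m0, C0
    means, variances, forecasts = [], [], []
    for obs in y:
        a, R = m, C + W            # prior for mu_t given earlier observations
        forecasts.append(a)        # forecast of y_t made before seeing it
        Q = R + V                  # forecast variance
        K = R / Q                  # Kalman gain
        m = a + K * (obs - a)      # filtered mean of mu_t
        C = R * V / Q              # filtered variance of mu_t
        means.append(m)
        variances.append(C)
    return means, variances, forecasts


# Illustrative contrast values only
y_demo = [0.10, 0.14, 0.11, 0.15, 0.09]
means, variances, forecasts = local_level_filter(y_demo, V=0.01, W=0.01)
print([round(m, 3) for m in means])
```

With $W = 0$ and a diffuse starting variance the filtered mean reduces to the running average of the observations, while a large $W$ relative to $V$ makes the fitted level follow the observations closely.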
An alternative formulation of the random walk model which uses CAR neighbourhood models was
also used. This formulation permits the neighbours to be weighted, and thus allows a correction for the
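unequal time intervals. Hence, weighted random walk models of order 1 (RW1) and order 2 (RW2) were
fitted using WinBUGS [Lunn et al., 2000]:
\[
\begin{aligned}
Y_t &= \mu_t + \omega_t + \psi_t, \qquad \omega_t \sim N(0, V), \\
\psi_t \mid \psi_{t'},\ t \neq t' &\sim N\!\left( \frac{\sum_{t' \in \partial t} w_{t'} \psi_{t'}}{w_+},\ \frac{W}{w_+} \right), \quad \text{where} \\
w_{t'} &= 1/|t - t'|, \quad \text{and} \quad w_+ = \sum_{t' \in \partial t} w_{t'},
\end{aligned}
\tag{7.4}
\]
with $V$, $W$ defined as in Equation 7.3. The weight used is the reciprocal of the distance between
neighbours over the time scale.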
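The effect of the distance weights in Equation 7.4 can be seen in a small Python sketch of the conditional mean and variance of $\psi_t$ for an RW1 neighbourhood. It assumes that the neighbours of a time point are the immediately preceding and following sampling dates; the function name and the numerical values are hypothetical.

```python
def rw1_conditional(times, psi, index, W):
    """Conditional mean and variance of psi[index] given its neighbours in time.

    Weights are reciprocal time distances, w = 1 / |t - t'|.
    """
    neighbours = [k for k in (index - 1, index + 1) if 0 <= k < len(times)]
    weights = [1.0 / abs(times[index] - times[k]) for k in neighbours]
    w_plus = sum(weights)
    mean = sum(w * psi[k] for w, k in zip(weights, neighbours)) / w_plus
    return mean, W / w_plus


# Three sampling days with gaps of 10 and 30 days (illustrative values)
times = [0, 10, 40]
psi = [1.0, 0.0, 5.0]
mean_mid, var_mid = rw1_conditional(times, psi, 1, W=1.0)
print(mean_mid, var_mid)
```

In this example the nearer neighbour (10 days away) receives three times the weight of the more distant one (30 days away), so the conditional mean is pulled towards it.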
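Autoregressive models [Box and Jenkins, 1976, Gamerman and Lopes, 2006] were also fitted, with
\[
\begin{aligned}
Y_t &\sim N(\mu_t, V), \qquad 1/V \sim \mathrm{Gamma}(10^{-6}, 10^{-6}), \\
\mu_t &= \alpha_0 + \alpha_1 Y_{t-1}, \quad \text{or,} \\
\mu_t &= \alpha_0 + \alpha_1 Y_{t-1} + X\beta, \quad \text{or,} \\
\mu_t &= \alpha_0 + \alpha_1 Y_{t-1} + \alpha_2 Y_{t-2}, \quad \text{or,} \\
\mu_t &= \alpha_0 + \alpha_1 Y_{t-1} + \alpha_{12} Y_{t-12}.
\end{aligned}
\tag{7.5}
\]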
Note that the simple regression models using time-varying covariates (Equation 7.2), and the autoregres-
sive models with time-varying covariates (Equation 7.5) were fitted in an attempt to remedy the problem
of a non-equally spaced time series. The assumption of equally spaced contrasts across time in the mod-
els of Equations 7.5 and 7.3 motivated the weighted random walk models of Equation 7.4. However, an
alternative was to fit missing data models. This was done for the random walk of order 1 model (Equa-
tion 7.3) only. For these data, the highest common factor of the time intervals was 1, which gives a time
series of largely missing data (56 observed of 1768 observations or 3.2% non-missing observations). The
missing data model was fitted for one depth only (140 cm), since the credible intervals for random walk
models were arbitrarily dependent on the priors for the precisions. (See Section 7.4 for the priors used
for the precisions and Section 7.6 for further discussion.)
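The size of the daily grid used by the missing data model follows from the first and last measurement dates given in Section 7.3. The Python sketch below places observations on that grid, with `None` marking days without a measurement; the function name is hypothetical and the two values supplied are placeholders, since the individual sampling dates are not listed in the text.

```python
from datetime import date, timedelta

FIRST, LAST = date(1995, 6, 26), date(2000, 4, 27)


def daily_grid(observed):
    """Expand {date: value} onto a daily grid, with None for unobserved days."""
    n_days = (LAST - FIRST).days + 1          # both end dates included
    return [observed.get(FIRST + timedelta(days=k)) for k in range(n_days)]


# Placeholder values at the two end dates, for illustration only
series = daily_grid({FIRST: 0.10, LAST: 0.05})
n_days = len(series)
share_observed = 56 / n_days                  # 56 sampling dates in the trial
print(n_days, round(100 * share_observed, 1))
```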
An additional way of dealing with these time series contrasts, and one which did not require the equal
time interval assumption, was to fit generalised additive models using penalised spline smooths over time
[Brezger and Lang, 2006, Fahrmeir et al., 2004]. Let the contrast, Yt, at date, t, and depth, d, be defined
as
\[ Y_t = f(t) + \epsilon_t, \qquad \epsilon_t \sim N(0, \sigma^2), \tag{7.6} \]
for each depth, d, with f (t) being fitted as a penalised spline over time with a random walk penalty of
order 2. These models, like the regression model of Equation 7.2, do not account for autocorrelation over
the time dimension, but they use the unequally spaced dates of the contrasts.
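A random walk penalty of order 2 on a vector of spline coefficients corresponds to penalising squared second differences, that is, a penalty matrix $K = D'D$ where $D$ is the second-difference matrix. The Python sketch below constructs $K$ and shows that a linear sequence of coefficients is not penalised. It is a generic illustration of this penalty under that assumption and does not reproduce BayesX internals; the function name is hypothetical.

```python
def rw2_penalty(m):
    """Penalty matrix K = D'D, with D the (m - 2) x m second-difference matrix."""
    D = [[0.0] * m for _ in range(m - 2)]
    for r in range(m - 2):
        D[r][r], D[r][r + 1], D[r][r + 2] = 1.0, -2.0, 1.0
    return [[sum(D[r][a] * D[r][b] for r in range(m - 2)) for b in range(m)]
            for a in range(m)]


K = rw2_penalty(6)

# A linear trend in the coefficients has zero second differences
linear = [2.0 + 0.5 * k for k in range(6)]
penalty_on_linear = sum(linear[a] * K[a][b] * linear[b]
                        for a in range(6) for b in range(6))
print(K[0], penalty_on_linear)
```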
The time series models of Method 2 capture only the time-varying variance of the contrast and fail
to reflect the spatial error. Hence, we experimented with precisions for the random walk model (see
Section 7.4), which might reflect the full error which includes the within date variability of the contrast,
in an attempt to satisfy objective 3 of Section 7.2.
Method 3
Finally, dealing with the full dataset, we fitted two additive structured models using the full 90,720
observations. The first model is defined as
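\[
\begin{aligned}
y_{tid} &= \sum_j f_j(d) + \sum_j f_j(t) + \psi_{id} + \epsilon_{tid}, \qquad \epsilon_{tid} \sim N(0, \sigma^2), \\
\psi_{id} \mid \psi_{i'd},\ i \neq i' &\sim N\!\left( \sum_{i' \in \partial i} \frac{\psi_{i'd}}{n_i},\ \frac{\tau^2_d}{n_i} \right),
\end{aligned}
\tag{7.7}
\]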
with $\psi_{id}$ being drawn from a Gaussian Markov random field (GMRF) in a layered scheme as above, but
with the spatial components at a site ($i$) being common across dates, and the associated variances being
given by $\tau^2_d$, i.e. specific to each depth and constant across dates. The term $f_j(d)$ denotes smoothed
treatment curves over depth, $d$, while $f_j(t)$ denotes smoothed treatment curves over time, $t$. Thus, in
the 'fixed' (non-parametric) part of the model, we have assumed simple additivity over time and depth,
while in modelling the variances, the model assumes constant spatial residuals at each depth ($\psi_{id}$)
from sampling date to sampling date. As can be seen from the definition, smoothed treatment profiles
across time were common to all depths, and smoothed treatment profiles across depths were common to
each date.
The second model is analogous to that of Method 1 and is defined by
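\[
\begin{aligned}
y_{tid} &= \sum_t \sum_j f_{j(t)}(d) + \psi_{tid} + \epsilon_{tid}, \qquad \epsilon_{tid} \sim N(0, \sigma^2_t), \\
\psi_{tid} \mid \psi_{ti'd},\ i \neq i' &\sim N\!\left( \sum_{i' \in \partial i} \frac{\psi_{ti'd}}{n_i},\ \frac{\tau^2_{td}}{n_i} \right).
\end{aligned}
\tag{7.8}
\]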
Thus, this model fits a penalised curve for each treatment over depth, for each timepoint, and models
site correlations using CAR models with different variances at each depth and day, together with a final
unstructured residual whose variance differs by day. Thus, the model from Equation 7.8 is again a consid-
erable simplification of the model of Method 1. The CAR residual structure is the same, but is coupled
with an unstructured variance common to each day, whereas the unstructured variance components of
Method 1 (Equation 7.1) differ by date and depth. Probably more importantly, it fits a series of penalised
smooths across the depth dimension (by date and by treatment), while there is no smoothing along depth
in the saturated model of Method 1.
Priors
Priors for the precisions of both the structured and unstructured residual components of Method 1
were Gamma(5, 0.005), with priors for the fixed coefficients being normal with mean zero and variance 10.
For the Method 2 models, which fitted the contrast across the time dimension, the priors for the
coefficients were generally specified as a diffuse normal prior $N(0, 10^6)$. Priors for the precision terms
of these models were initially set as $\mathrm{Gamma}(10^{-6}, 10^{-6})$. However, almost all the models of Method 2
were rerun with priors for the precisions of $\mathrm{Gamma}(10^{-4}, 10^{-4})$, and final model choice was made using
models with this prior.
However, given that we wanted a meaningful temporal description of the contrast together with
appropriate credible intervals, we experimented with various ways of apportioning crude estimates of the
total error observed in the model from Method 1. Table 7.1 gives the settings for the 5 different priors
used for the various models of Method 2, and Priors 3-5 of this table show three schemes for apportioning
the error.
Priors for the Method 3 models were set as the default BayesX software priors, with all precisions
having a Gamma(.001,.001) prior.
Model Comparisons
We adopted the Deviance Information Criterion (DIC) of Spiegelhalter et al. [2002] as the method for
model comparison. Thus, we planned to compare the full models of Method 1 and Method 3 via the
DIC, in addition to choosing a model from the many models of Method 2. Within the Method 2 models,
only the models fitted using WinBUGS were compared.
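For reference, the DIC of Spiegelhalter et al. [2002] combines the posterior mean deviance, $\bar{D}$, with the estimated effective number of parameters, $p_D = \bar{D} - D(\bar{\theta})$, as $\mathrm{DIC} = \bar{D} + p_D$. The Python sketch below computes these quantities from a set of deviance values; the function name and the numbers are hypothetical and serve only to show the arithmetic.

```python
def dic_summary(deviance_draws, deviance_at_posterior_mean):
    """Posterior mean deviance, effective number of parameters, and DIC."""
    d_bar = sum(deviance_draws) / len(deviance_draws)    # mean deviance over draws
    p_d = d_bar - deviance_at_posterior_mean             # effective parameters
    return {"Dbar": d_bar, "pD": p_d, "DIC": d_bar + p_d}


# Illustrative deviance values only
print(dic_summary([10.0, 12.0, 14.0], 11.0))
```

Because $p_D$ is a difference of two deviances, it is negative whenever the deviance at the posterior means exceeds the posterior mean deviance.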
With the problems observed when comparing random walk models of order one (Section 7.5), we
looked at the root mean square of predictive error to try to resolve the problem. This is defined as
\[ \sqrt{\left( y_{t+1} - E(y_{t+1} \mid y_1, y_2, \ldots, y_t, \theta) \right)^2}, \]
and is used in Table 7.7.
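A minimal Python sketch of this quantity is given below. It assumes that the squared one-step-ahead errors are averaged over the time points before the square root is taken, which is what the name of the measure suggests; the function name, the naive forecast and the data values are hypothetical.

```python
import math


def rms_predictive_error(y, forecasts):
    """Root mean square of the one-step-ahead errors y[t] - forecasts[t]."""
    errors = [obs - f for obs, f in zip(y, forecasts)]
    return math.sqrt(sum(e * e for e in errors) / len(errors))


# Naive random walk forecast: predict each value by the previous one
y_demo = [0.10, 0.14, 0.11, 0.15]
rmspe = rms_predictive_error(y_demo[1:], y_demo[:-1])
print(round(rmspe, 4))
```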
Computational details
The model of Method 1 which produced the contrast estimates used in Method 2 was fitted using cus-
tom built software, pyMCMC [Strickland, 2010], which used block updating. Its daily models had a
6,000 iterate burnin and 16,000 iterates in all. (Fewer burnin iterates were needed because of the block
updating.)
The models of Method 2 were fitted using BayesX [Belitz et al., 2009a,b] or WinBUGS [Lunn et al.,
2000]. The BayesX software was used because it offered penalised smooths over time, and because actual
dates could be used in the fit. It was thought that such models would offer insight into the seasonality
and/or trends in the data. WinBUGS was used because of its transparency and its robustness as
well-established software.
The univariate time series models of Method 2 were run with 2 chains, 120,000 iterates with a
100,000 burnin when using WinBUGS and Gelman-Rubin statistics were checked. (This burnin was
unnecessarily large.) Models fitted using BayesX, Equation 7.6, were run with a 10,000 iterate burnin
and 60,000 iterates in all (which was probably unnecessary, given that this software uses block updating
[Brezger and Lang, 2006]).
Geweke diagnostics [Geweke, 1992] for convergence and Raftery-Lewis estimates for accuracy
[Raftery and Lewis, 1992] were checked and found to be satisfactory for all models, except where noted
otherwise.
7.5 Results
Estimates for the contrasts at all depths and their 95% credible intervals from Method 1 are given in the
supplementary materials of Appendix C, as are graphs of the fits for the time series at all 7 depths. The Method
1 estimates and credible intervals are those which best satisfy the first objective of the analysis.
Figure 7.1 shows the point estimates from Method 1 for the contrasts at depths 100 cm to 220 cm.
A careful reading of this graph shows a continuity of the contrast estimates across time and depth. The
same data are graphed again as a contour graph of moisture over day and depth (Figure 7.2) in order to
emphasize the continuity of the contrast estimates across time and depth.
Various fits for the contrasts at depth 100 cm are shown in Figures 7.3- 7.6. Figure 7.3 shows the fit
for Method 1, and when compared to the three fits of Figures 7.4- 7.6, can be seen to have much wider
credible intervals. This fit is thought to give the most appropriate credible intervals, for the reasons given
below. Figure 7.4 shows the penalised smooth from Method 2 Equation 7.6, and illustrates the seasonality
observed in these models at the shallower depths. Figure 7.5 shows the 28 term regression fit from
Equation 7.2 of Method 2, and echoes the penalised spline fit of Figure 7.4, but with the discontinuities
expected in a model with interactions of year by periodics. Figure 7.6 shows the random walk of order 1
model of Method 2, and with its more abrupt jumps the seasonality displayed in the two earlier graphs is
less obvious.
Figures 7.7 and 7.8 show the square roots of the spatial and the unstructured variances for each
date at 100 cm and 220 cm, estimated using the model of Method 1. Not unexpectedly, the variances
at the shallower depths show greater variability across the sampling dates (Figure 7.7). The comparable
graphs across all depths show a decreasing variability with increasing depth of these parameters across
the sampling dates (Figure 7.8). The variability in these parameters justifies the choice to fit the same
model across all sampling dates (thereby allowing all parameters of the original model to vary by date),
since a description of their evolution across time was not obvious a-priori.
Comparisons (Table 7.2) of the time series models of Method 2 are given for the contrast at depth 140
cm and were used to consider various models and ways of dealing with the unequal time spacing. Some
of the models compared are shown in Figures 7.9- 7.13. These figures show the poorer fits of the models
with the poorer DICs. This table indicates that the AR(1) and AR(2) models are essentially equivalent,
that the AR (1)(12) model is a poor model (with its negative estimate for the number of parameters), and
that the better models are the random walk models. The table indicates that the RW(1) or the RW(2)
distance weighted models are the best of those models compared. Table 7.2 shows the DIC and pD
varying for differing priors for the random walk models, but not for the other models fitted under Method
2. It indicates that the RW1 models give overfitted models, with the estimated number of parameters
exceeding the number of points fitted when the more diffuse priors of Prior 1 are used. Thus, the decision
was made to use models fitted under Prior 2 as the basis of model choice. Ideally, neither the DIC nor
the estimated number of parameters (pD) should depend on the specification of the priors. This issue is
discussed further in Section 7.6.2.
Table 7.3 compares the models fitted using Equations 7.2 and 7.5. This table shows that additional
periodic covariates improve the fit of the AR1 model at depths of 100 & 120 cm, but for all other depths
the simple AR1 model accounts adequately for the data without the need for rainfall or periodics. This is
not surprising, since the time series models take out the random shocks that might be explained by such
terms. Note that the model AR1+5, a covariate model in combination with an AR1 model, posits (some-
what unrealistically) the same amplitudes across the years for the cyclical behaviour, but (realistically)
posits a common time of year for the yearly maxima and minima. This compares with the year by period
interaction model which permits different amplitudes for the periodics for each year and different times of
year for the maxima and minima. This table also gives the DICs for the regression model with interaction
terms of year by periodics (24 terms), together with a cubic over time, giving 27 time covariates. Not
surprisingly, with 28 parameters to fit 56 observations, these models from Table 7.3 show the regression
model doing better than the AR1 models for all depths (except 200 & 220 cm), and they also compare
reasonably well with those of Table 7.4, although they are not as good as the best of the random walk
models.
Table 7.4 shows the DICs & pDs for the random walk models of order 1 & 2, both weighted and
unweighted. The weighted RW1 model appears to be the best model for the depths from 100-160 cm,
while the unweighted RW2 model appears best for 200-220 cm. The downweighting of points further
away in the weighted models generally leads to a greater estimated number of parameters and a better fit
at the shallower depths. Similarly, the weighted RW2 models generally improve the DIC and increase
the estimated number of parameters by downweighting points further away. That is, the weighted models
smooth less than the unweighted models. Ideally, we would have preferred a common time series
description for the contrast at all depths. However, the final time series model chosen for the shallower
depths is the RW1 model (from 100-160 cm), and for the remaining depths the RW2 model.
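A minimal sketch of the inverse time interval weighting follows. It assumes that the weighted first order random walk is specified through conditional means that are weighted averages of the temporal neighbours, as in a weighted intrinsic autoregression; the function names are hypothetical and the code is not the WinBUGS model itself.

```python
def inverse_interval_weights(times):
    """Weight between consecutive sampling dates: 1 / (time gap)."""
    return [1.0 / (later - earlier) for earlier, later in zip(times, times[1:])]


def rw1_conditional_mean(values, times, i):
    """Weighted mean of the temporal neighbours of point i (first order)."""
    weights = inverse_interval_weights(times)
    total = 0.0
    weight_sum = 0.0
    if i > 0:  # neighbour on the left
        total += weights[i - 1] * values[i - 1]
        weight_sum += weights[i - 1]
    if i < len(values) - 1:  # neighbour on the right
        total += weights[i] * values[i + 1]
        weight_sum += weights[i]
    return total / weight_sum
```

Under these assumptions the conditional mean of an interior point is the linear interpolation of its two neighbours, so a sampling date that is close in time has more influence than a distant one.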
Random walk models allow the calculation of the ratio (W/V) between the two types of variance in
the model (Equations 7.3 and 7.4), which is the signal to noise ratio [West and Harrison, 1997]. The signal
to noise ratio for the different depths is tabulated in Table 7.5 and shows a clear gradient, with the
shallow depths having higher ratios than the deeper depths, and with perhaps three
different depth strata (100 cm, 120-180 cm, and 200-220 cm).
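The calculation can be sketched as below, assuming that V denotes the observational variance and W the random walk (system) variance, and that the model is parameterised by precisions so that W/V is the observational precision divided by the random walk precision. The function names and the simple index-based quantiles are choices made for this sketch only.

```python
import math
import statistics


def signal_to_noise(tau_obs, tau_rw):
    """Return W/V, where V = 1/tau_obs and W = 1/tau_rw."""
    return tau_obs / tau_rw


def sqrt_sn_summary(tau_obs_draws, tau_rw_draws):
    """Median and central 95% interval of sqrt(W/V) over paired draws."""
    roots = sorted(
        math.sqrt(signal_to_noise(t_obs, t_rw))
        for t_obs, t_rw in zip(tau_obs_draws, tau_rw_draws)
    )
    n = len(roots)
    lower = roots[int(0.025 * (n - 1))]
    upper = roots[int(0.975 * (n - 1))]
    return statistics.median(roots), lower, upper
```

Applying the summary to paired MCMC draws of the two precisions gives a point estimate and interval of the kind reported in Table 7.5.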
The penalised spline fits from Equation 7.6, Figure 7.14, show seasonal peaks and troughs which
vary in amplitude across the years, thereby suggesting interactions of year by sine and cosine terms with
periods of a year and half year. These curves also show the periodic behaviour dampening with increasing
depth. We found significant terms in the simple regression model with rainfall and interactions of year by
half-yearly periodics, but such models showed the expected problems of sine curves being too smooth at
their peaks and troughs, with discontinuities at the year breaks, thus giving rise to serially correlated errors.
We included these models because they gave a basis for comparison with the WinBUGS models where
equality of time intervals was assumed, and allowed a possibility of a correction when used in combina-
tion with the AR models. Figure 7.4 gives the smoothed model of Equation 7.6 and credible intervals for
the contrast at 100 cm. Figure 7.6 shows the RW1 fit which was the final model choice.
The missing data model, motivated by objective 3 (Section 7.2), was based on Equation 7.3 since
this was found to be close to the best model of those fitted. This was fitted because the approximation
of equally spaced time intervals in the WinBUGS models seemed a gross oversimplification. Figure 7.13
shows one outcome of the attempt to construct credible intervals which would reflect the spatial vari-
ability by adjusting the priors for the two variances of an RW1 model. This fit uses Prior 5. Given that
Table 7.6 shows the essentially arbitrary nature of such an undertaking (see below), there was little point
in fitting such models to the contrasts at other depths.
A further outcome of attempting to partition the variances of an RW1 model was the set of compar-
isons of Table 7.6 which shows DIC results for three priors, together with R2 (calculated using the fit
without the spatial and the unstructured error), and pD (the estimated number of parameters). Table 7.6
shows that the choice of priors dictates the goodness of fit: thus, fits using prior 3 have an R2 ranging from
12% to 33%, while prior 4 gives fits with an R2 of about 80%, and prior 5 an R2 of 99%. In constructing
the priors 3-5 for use in Equation 7.3, mean τ is an estimate for the total precision estimated from
Method 1. Placing a fixed prior on the precision for the random walk error (Prior 3) resulted in poor fits.
Attempting to partition the precision estimate between the observational and random walk errors (Prior
4) gave posterior estimates of the ratio, r, which were essentially identical to its prior, and a slightly better
fit. Allocating the diffuse prior to the observational error (Prior 5) gave entirely unsatisfactory overfits to
the data with estimates of the number of parameters exceeding the number of observations. Table 7.2 and
Table 7.6 both showed that the model comparison criterion was dependent on the priors for the random
walk models.
We then considered the root mean squared predictive error (Section 7.4) and compared the RW1
models under Prior 1 & Prior 2. (See Table 7.7.) Under this criterion, Prior 2 would seem to give the
better model at depths 100 cm - 180 cm, and Prior 1 the better model at depths 200 cm & 220 cm. This
does not resolve the problem, in that the overfitted models are chosen at depths 200 cm & 220 cm, but in
any case, we are still left with the problem of arbitrary fit and the choice of suitable priors. Hence, we
were unable to satisfy objective 3, since any choice was arbitrarily dependent on the priors chosen for the
precisions.
Tests for convergence for the first model of Method 3 (Equation 7.7) showed failure to converge
using Geweke’s test [Geweke, 1992]. This model is a major simplification of the Method 1 model
(Equation 7.1), and it is not surprising that it failed to converge (see Section 7.4). The more complex
model from Method 3 (Equation 7.8) also failed to converge. We decided not to pursue the modelling
strategy of Method 3. Reformulation to remove the simple additivity may have helped the convergence
problems, but more importantly, it was felt that smoothing the treatment effects prior to calculating the
contrast could lead to biased contrast estimates, i.e., that it was not the treatment curves which should be
smoothed but the contrast curve. For further discussion of this issue, see Section 7.6.
All models from Methods 1 & 2 showed successful convergence using Geweke statistics [Geweke,
1992] and Raftery-Lewis [Raftery and Lewis, 1992] for the quantities of interest. For the random walk
models, which were felt to best model the contrast estimates, we assessed the residuals (residuals from
lack of fit, observational error and system error) for serial correlation. No serial correlation in the resid-
uals was found.
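For readers unfamiliar with the Geweke statistic, a simplified version is sketched below. It compares the mean of the early part of a chain with the mean of the late part. Note the assumption: the variances here are naive sample variances that treat the draws within each segment as independent, whereas the diagnostic of Geweke [1992] uses spectral density estimates to allow for autocorrelation. The function name and the 10% and 50% window fractions are choices made for this sketch.

```python
import math
import statistics


def geweke_z(chain, first=0.1, last=0.5):
    """Simplified Geweke z-score for one chain of MCMC draws."""
    n = len(chain)
    early = chain[: int(first * n)]
    late = chain[n - int(last * n):]
    # Naive standard errors (assumption: no autocorrelation correction).
    var_early = statistics.variance(early) / len(early)
    var_late = statistics.variance(late) / len(late)
    difference = statistics.mean(early) - statistics.mean(late)
    return difference / math.sqrt(var_early + var_late)
```

A chain whose early and late means agree gives a score near zero, while a chain that is still drifting gives a score of large magnitude.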
Figures C.1-C.21 in the supplementary materials of Section C show the fits for all time series from
Method 1, together with the penalised smooths of Method 2 Equation 7.6 which show periodicity at the
shallower depths which decreases with increasing depth. The RW final model fits are also shown.
Overall, we concluded that long fallow cropping generally led to moister soils over the experiment,
with both point estimates and 95% credible intervals generally being positive. We also found a problem
with the random walk models of order 1, with estimates and measures of fit being extremely sensitive to
the choice of priors for the variances of the observational and random walk error components.
7.6 Discussion
7.6.1 Modelling spatio-temporal data
Many spatio-temporal papers remain largely descriptive [Bell et al., 2007, Teschke et al., 2001] using
maps at several timepoints, with the maps essentially being descriptive devices. The more complex
models are generally Bayesian and use either geostatistical methods or the convolution CAR prior of
Besag et al. [1991]. Models with CAR priors typically partition the error term in the model, ϵ_it, as e_i, e_t
and e_it, where the first two error terms capture the structured spatial random effects and the structured
temporal random effects, and e_it is a simple unstructured random effect with e_it ∼ N(0, σ²). See, e.g.,
Adebayo and Fahrmeir [2005], Crook et al. [2003], Knorr-Held and Besag [1998], Poncet et al. [2010],
Waller et al. [1997], where the last three papers use BayesX software [Belitz et al., 2009a,b] to conduct
the analysis.
Assuncao et al. [2001] fit quadratics over time which differ for each spatial location, and for which the
coefficients are smoothed using CAR priors, but both space and time are separable. This elegant solution
to modelling very short time sequence data allows the possibility of seeing increasing and decreasing
rates, while accounting for spatial closeness. Assuncao [2003], Assuncao et al. [2002] again use space-
varying regression coefficients on their quadratic models in time. Yan and Clayton [2006] use the space-
time interaction to define a set of space-time separable clusters carrying a specific risk, and fit a final
unstructured random effect.
Abellan et al. [2008] decompose the error term into a structured temporal effect, a structured spatial
random effect, and a time by space interaction random effect which is a mixture of two Gaussians and
thus equivalent to an outlier or contaminant model. This allows the identification of sites (areas) and
times which fail to fit the common temporal and common spatial patterns.
Some environmental papers which are not simply descriptive use Higdon [1998]’s method of con-
volution with Gaussian kernels, which allows for non-stationary spatial smoothing, and give snapshots
over time [Lemos and Sanso, 2009, Sahu and Challenor, 2008] or, having failed to find much influence in
terms of spatial proximity, model time sequences for each site using time series methods [Lemos et al.,
2007].
Looking at spatio-temporal analyses within an agricultural context, the analysis of Trought and
Bramley [2011] considers the quality of grape juice by site across time. Their strategy is to fit differ-
ent curves across time for each site, and then to look at spatial outcomes of their model by mapping. In
considering longitudinal agricultural experiments, Piepho et al. [2004], Piepho and Ogutu [2007], Piepho
et al. [2008], Wang and Goonewardene [2004] and Brien and Demetrio [2009] use mixed models within a
REML framework to analyse their spatio-temporal data, and fit state-space models via standard software
and REML. The fixed part of their models is generally straightforward and the data are measured on two
spatial dimensions. Moving to the spatial dimension of depth, the soil profile study of Macdonald et al.
[2009] does not use spatial information in the analysis. Other studies composite the soils from different
depths across soil types or treatment [Sleutel et al., 2009], while others [Nayyar et al., 2009] use the
mixed modelling framework advocated by Piepho et al. [2004]. Within a spatial context, Haskard et al.
[2007] fit an anisotropic geostatistical model.
A major difference between these agricultural data and the epidemiological data which is so often
modelled using an additive common spatially structured error, an additive common structured temporal
random term (and an unstructured error with a variance common over both space and time), is that the
spatial units of epidemiological data tend to vary slowly over time scales of a few years. Additionally,
administrative time shocks may often be thought to be constant across a map, and hence this simple
modelling structure works well. In contrast, the agricultural data modelled here vary markedly from
sampling date to sampling date, and it is clear that the simple separable variance decomposition used by
so many epidemiological models does not describe the data well.
In moving to four dimensions, there are yet more possibilities for the decomposition of the fixed
and residual parts of the model. However, in the context of differing treatments for differing plots in the
horizontal dimensions, with the same treatment along the depth profile at each plot, and in the context of
different scales between the depth measurements and the distances between plots, it was a simple decision
to treat depth differently from the horizontal dimensions. This same choice to treat the third spatial
dimension differently is made by others with three dimensional spatial data. Ridgway et al. [2002],
modelling ocean temperatures and other ocean parameters, separate out the depth component in their
loess data fit.
We excluded depth from the neighbourhood error structures. If depth neighbours were to be included
as neighbours with equal weights, the horizontal layer information would be downweighted. If we weight
using functions of distance, the horizontal correlations become effectively irrelevant. A useful property
of defining neighbours as neighbours only within the same depth layer is that the CAR model is then
permitted to have differing variances across the depths. For our model (Method 1, Equation 7.1), both the
homogeneous and spatial variance components differ by depth, and while no formal tests were conducted
this flexibility in the model appeared useful.
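The neighbourhood definition can be sketched as follows for a hypothetical regular grid of plots. The grid layout, the rook (shared edge) adjacency and the indexing scheme are assumptions for illustration; the actual plot arrangement of the trial is not reproduced here.

```python
def layer_neighbours(n_rows, n_cols, n_depths):
    """Neighbour lists in which sites are adjacent only within a depth layer."""

    def index(row, col, depth):
        # Sites are numbered layer by layer, row by row within a layer.
        return depth * n_rows * n_cols + row * n_cols + col

    neighbours = {}
    for depth in range(n_depths):
        for row in range(n_rows):
            for col in range(n_cols):
                adjacent = []
                for d_row, d_col in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                    r, c = row + d_row, col + d_col
                    if 0 <= r < n_rows and 0 <= c < n_cols:
                        adjacent.append(index(r, c, depth))
                neighbours[index(row, col, depth)] = adjacent
    return neighbours
```

Because no neighbour list crosses a layer, the adjacency structure separates into one block per depth, which is what permits a separate spatial variance for each depth.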
In the fixed part of the model, the choice to fit all treatment by date by depth means, rather than
to find a parsimonious model over the depths, was dictated by the view that smoothing of the different
treatments and then calculating a contrast was not an appropriate way to calculate the contrast estimates.
A false simplification of any of the treatment curves, which are measured at just 15 depths, could lead
to greater or lesser estimates of the contrast. This differs from the informal/formal comparisons of racial
differences after semiparametric fitting of the longitudinal bone density by race [Fong et al., 2010, Wand,
2009] where there are many observations at points in the dimension (age) in which the curves are to be
simplified. The assumption of continuity, on which any smooth is made, is better justified for their bone
density analyses with the data’s many points of support on the dimension to be smoothed.
The Method 1 model is a date interaction model with the daily model. Each daily model is independent
of the others, which allows us to sum the DICs and the pDs over all 56 daily models and thus allows
the possibility of comparing DICs with the models of Method 3, Equations 7.7 and 7.8, where all 90,720
observations are modelled at once. We had planned to use this to compare the Method 3 models fitted to
the full dataset, with the fit achieved by the daily models from Method 1. However, the Method 3 models
failed to converge, and this was not done.
Method 1 gives appropriate 95% credible intervals for the contrast estimates, but no insight into the
way in which these contrasts vary over time. The modelling strategy adopted in Method 2 attempts to
remedy that by fitting time-varying covariates and by using time series methods. Two-stage models do
not account for the treatment effect variation observed in the Method 1 fits. However, they do allow us to
see what level of complexity may be required to account for the time-varying nature of the contrasts.
Figure 7.13 shows the fit for one of the missing data models. However, there is little point in
fitting a missing data model when the posterior variance may be set arbitrarily by the choice of a
prior for a precision, as it is when only 3% of the observations are present. (For further discussion, see
Section 7.6.2.)
7.6.2 Model Comparisons: Problems
Where competing models are suggested, the preference for model comparison is to use some summary
statistic of the analysis fit such as the AIC [Akaike, 1973] (used for geostatistical model comparisons by
Hoeting et al. [2006]), the BIC [Schwarz, 1978] or the DIC [Spiegelhalter et al., 2002]. When WinBUGS
is used for model fitting, an obvious choice for a model comparison criterion is the DIC, which has the
added advantage of estimating the number of parameters used by the model. Table 7.6 shows DICs for
random walk models of order 1 where the only modelling difference is in the priors used for the two
precisions. The differing priors make differences to the fit (R2) and to the DIC.
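For reference, the DIC and pD are obtained from the posterior mean deviance and the deviance evaluated at the posterior means of the parameters [Spiegelhalter et al., 2002]. A minimal sketch for a normal likelihood is given below; the function names are hypothetical, and the deviance draws would in practice come from the MCMC output.

```python
import math
import statistics


def normal_deviance(y, mu, tau):
    """Deviance (-2 log-likelihood) for independent normal observations."""
    n = len(y)
    sse = sum((yi - mi) ** 2 for yi, mi in zip(y, mu))
    return n * math.log(2.0 * math.pi) - n * math.log(tau) + tau * sse


def dic_summary(deviance_draws, deviance_at_posterior_mean):
    """Return Dbar, pD = Dbar - D(theta_bar) and DIC = Dbar + pD."""
    d_bar = statistics.mean(deviance_draws)
    p_d = d_bar - deviance_at_posterior_mean
    return {"Dbar": d_bar, "pD": p_d, "DIC": d_bar + p_d}
```

Since both terms depend on the posterior of the fitted means and precisions, any prior that changes how closely the fitted values follow the observations changes pD and the DIC together.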
In arguing the case for the DIC, a CAR model is explicitly discussed in Spiegelhalter et al. [2002].
Their model has a CAR normal spatial prior, but the unstructured error component is Poisson, and there-
fore dictated by the estimates for the Poisson rate, which are themselves modified by the spatial CAR
prior. For the agricultural data here, the CAR (and RW) models (both of which have two variance com-
ponents) gave DICs which were highly sensitive to the choice of priors. Table 7.2 shows that the models
with the more diffuse priors at the observation level are apparently the better models. Graphs of the fitted
values and their credible intervals (not shown) show they also give closer fits. The DICs of the autore-
gressive models are unaffected by the prior choice for the error, but the random walk models of order
one often have estimates for the number of parameters which are greater than the number of observations
used. Additionally, estimates for the number of parameters change with choice of prior, as does the DIC.
This is not a problem of the criterion choice. Calculation of the BIC, which is also based on the final
fit, gives essentially the same preferred models. What is happening with these very diffuse priors is that
the random walk model of order one becomes effectively a saturated model in which each observation
becomes the fitted value. A model which joins the dots, however, gives no insight into the data. Our
problem is to find prior distributions reflecting ignorance, the ‘statistical holy grail’ talked of by Fienberg
[2006]. The convolution prior of Besag et al. [1991] which works so well for spatial epidemiological
data, works less well when all the model components are normal and there is a single observation to be
partitioned into structured and unstructured error, and a maximum of two neighbours, as is the case for
the RW(1) models here.
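The saturation effect can be illustrated with a simplified calculation in which the two variances are held fixed rather than estimated. Under that assumption, and with a flat prior on the initial level, the posterior mean of an unweighted RW(1) level is a linear smoother of the data, and the trace of the smoother matrix is an analogue of the effective number of parameters (it is not the pD reported by WinBUGS). The function names are hypothetical and the series must have at least two points.

```python
def rw1_smooth(y, ratio):
    """Posterior mean of the RW(1) level for a fixed ratio W/V.

    Solves (I + (V/W) * K) mu = y, where K is the tridiagonal first
    difference penalty matrix, using the Thomas algorithm.
    """
    n = len(y)
    if n < 2:
        raise ValueError("need at least two observations")
    lam = 1.0 / ratio
    diag = [1.0 + lam * (1.0 if i in (0, n - 1) else 2.0) for i in range(n)]
    off = -lam
    c = [0.0] * n
    d = [0.0] * n
    c[0] = off / diag[0]
    d[0] = y[0] / diag[0]
    for i in range(1, n):  # forward elimination
        denom = diag[i] - off * c[i - 1]
        c[i] = off / denom
        d[i] = (y[i] - off * d[i - 1]) / denom
    mu = [0.0] * n
    mu[-1] = d[-1]
    for i in range(n - 2, -1, -1):  # back substitution
        mu[i] = d[i] - c[i] * mu[i + 1]
    return mu


def effective_parameters(n, ratio):
    """Trace of the smoother matrix, built one unit vector at a time."""
    total = 0.0
    for i in range(n):
        unit = [0.0] * n
        unit[i] = 1.0
        total += rw1_smooth(unit, ratio)[i]
    return total
```

As W/V grows, the trace approaches the number of observations and the fitted values approach the data, which is the join-the-dots behaviour described above; as W/V shrinks, the fit collapses towards a constant with a trace of one.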
Our final view was that the purpose of the modelling across time was to develop insight into the time-
varying nature of the contrast estimates. Credible intervals for the time series models are not realistic,
since they do not include the spatial variance. From the DIC comparisons of the modelling in WinBUGS,
we believe that a random walk model of order 1 with inverse distance based weights for neighbours is
the best of the models considered. There is evidence of periodicity at the shallower levels (see Table 7.3)
and this is also shown by the penalised spline smooths of Equation 7.6 and illustrated in Figure 7.14, with
their double periodics per year at the shallower levels. At the depths of 200 cm and 220 cm these peaks
and troughs have largely disappeared, with the curves showing a possibly increasing trend with time.
7.7 Conclusions
Our purpose was to account for spatial and temporal autocorrelations in the context of four-dimensional
data. The model of Method 1 forms the basis for the analyses within this paper. It fits a fixed parameter
for moisture at every combination of depth, date and treatment. Its error structure is complex, with an
unstructured error at every depth, date and site, and having variances differing by depth and date. The
spatial structured error is fitted across each horizontal layer and ignores depth neighbours. The variance
of these structured spatial errors also differs by depth and date. Comparisons with three dimensional
CAR neighbourhood models (not shown here) show that this separation of the two-dimensional
plot arrangement from the depth dimension gave better descriptions of the data.
The simple expedient of fitting the data as a series of daily models allowed the maximum possible
complexity in terms of the experiment and was a useful approach to modelling the full dataset. By
fitting what is an interaction model by date at all levels of the daily model, we were able to explore the
variability of the data effectively, and believe that some of the curiosities of the variability at some depths
need further elucidation. At the shallower levels, they appear to be following cyclical and long term
trends. At the greater depths, seasonal variation is less visible. See Figure 7.8.
The method of defining neighbours within a horizontal layer has potentially wide applicability in
three and four dimensional agricultural datasets, where the plot and treatment are defined by the two-
dimensional surface coordinates. It may also be applicable in measurements made over the ocean
where variables may also be measured at depth, again a situation where the differences in latitude and
longitude between measurements far outweigh the differences in the depth dimension.
The analysis shows that response cropping delivers lower moisture levels for most times of the year,
in contrast to long fallow cropping. At the shallower depths, not surprisingly, this contrast exhibits
considerable cyclicity which attenuates with depth. Given the final choice of model to determine whether
response cropping delivers less moist soils, it appears that the temporal component adds little or no
additional uncertainty to the estimates.
This paper also illustrates that choosing priors for random walk models of order one can cause
some problems in data modelling, with some choices making the prior on the observational error highly
informative.
7.8 Tables
Table 7.1 Various priors used for the precisions of the timeseries models of Method 2

          Precision for observational error   Precision for random walk error*
Prior 1   ∼ Gamma(.000001,.000001)            ∼ Gamma(.000001,.000001)
Prior 2   ∼ Gamma(.0001,.0001)                ∼ Gamma(.0001,.0001)
Prior 3   mean τ                              ∼ Gamma(.000001,.000001)
Prior 4   total ∗ r                           total ∗ (1 − r)
Prior 5   ∼ Gamma(.000001,.000001)            mean τ

total ∼ Gamma(a, b), r ∼ Beta(1, 1)
a, b calculated via method of moments from mean & 95% CI for posterior in Method 1
* Priors 1 & 2 were also used for other timeseries models.

Depth (cm)   Mean τ   a         b
100          1395     6.934     .004971
120          1759     6.024     .003425
140          2241     12.413    .005538
160          3019     52.316    .017327
180          3226     87.249    .027045
200          3201     180.410   .056354
220          2175     82.412    .037894
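A method of moments calculation of this kind can be sketched as below. The sketch assumes a Gamma(a, b) distribution with shape a and rate b (mean a/b, variance a/b²), which agrees with the tabulated values since a/b reproduces mean τ at each depth. How the standard deviation was derived from the 95% credible interval is not stated in the table notes; the normal approximation used here (interval width divided by 2 × 1.96) is an assumption, and the function name is hypothetical.

```python
def gamma_from_mean_and_ci(mean, lower, upper, z=1.96):
    """Shape a and rate b of a Gamma matched to a mean and a 95% interval.

    Assumption: the standard deviation is approximated from the interval
    width as (upper - lower) / (2 * z).
    """
    sd = (upper - lower) / (2.0 * z)
    variance = sd * sd
    a = mean * mean / variance  # shape
    b = mean / variance         # rate
    return a, b
```

For example, a mean of 100 with an interval of (80.4, 119.6) corresponds to a standard deviation of about 10 and hence to a shape of about 100 and a rate of about 1.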
Table 7.2 Summary of DICs for Contrast 1 (Long fallowing vs Response cropping) at Depth 140

                             Prior 1          Prior 2
Model                        pD    DIC        pD    DIC
Regression                   30    -377
AR(1)                         4    -343        4    -343
AR(1)(12)                    -2    -356       -2    -355
AR(2)                         4    -343        5    -342
RW(1)                        69    -435       36    -379
RW(1) (weighted)             73    -468 *     40    -392 *
RW(1) (t10 distribution)     73    -450       39    -378
RW(1) (t4 distribution)      74    -451       41    -375
RW(2)                        20    -370       23    -373
RW(2) (weighted)             26    -390       43    -395 *
RW(1) (1768 time points)     49    -304 (Prior 5)

* Best model
Prior 1: both precision priors Gamma(0.000001,0.000001)
Prior 2: both precision priors Gamma(0.0001,0.0001)
Table 7.3 DICs for Long fallowing vs Response cropping: 1st order autoregressive models vs simple regression model

         AR1 with rainfall*   AR1           AR1+5         Regression(28)
Depth    pD    DIC            pD    DIC     pD    DIC     pD    DIC
100       5    -279            4    -278     9    -289    30    -315
120       5    -301            4    -303     9    -306    30    -344
140       5    -341            4    -343     9    -342    30    -377
160       5    -386            4    -386     9    -386    30    -425
180       5    -414            4    -414     9    -410    30    -433
200       5    -449            4    -450     9    -444    30    -449
220       5    -457            4    -458     9    -450    30    -455

* Covariate: log(rainfall+1)
(AR1+5) Covariates: log(rainfall+1), sin(x), cos(x), sin(2x), cos(2x), x=date/2π
(Regression(28)) Covariates: x, x*x, x*x*x, year*(sin(x), cos(x), sin(2x), cos(2x))
Table 7.4 DICs for Long fallowing vs Response cropping: random walk model comparisons, using Prior 2.

         RW1             RW1 (W)         RW2             RW2 (W)
Depth    pD    DIC       pD    DIC       pD    DIC       pD    DIC
100      47    -342      47    -346 *    21    -332      38    -340
120      39    -348      42    -360 *    26    -323      40    -360 *
140      36    -379      40    -392 *    23    -373      43    -395 *
160      34    -413      38    -424 *    25    -417      43    -419
180      32    -433      36    -434      25    -439 *    43    -419
200      28    -457 *    34    -448      24    -458 *    42    -424
220      28    -461 *    35    -452      24    -463 *    43    -427

(W): inverse time interval weights.
* Indicates the better models.
Table 7.5 Square root of the Signal to Noise ratio for the RW models

Depth (cm)   SN ratio   95% CI
100          6.1        (2.1, 27.0)
120          3.3        (1.2, 19.2)
140          3.2        (1.2, 14.5)
160          3.3        (1.4, 11.0)
180          2.3        (1.1, 7.8)
200          1.0        (.5, 2.8)
220          1.0        (.5, 3.1)
Table 7.6 R2, pD and DIC for the RW(1) weighted models using priors 3-5

              Prior 3              Prior 4              Prior 5
Depth (cm)    R2     pD    DIC     R2     pD    DIC     R2     pD    DIC
100           33%    13    -255    80%    36    -258    99%    100   -411
120           23%     9    -277    79%    35    -279    99%     97   -421
140           12%     6    -299    80%    35    -271    99%     94   -433
160           16%     5    -323    83%    35    -251    100%    90   -446
180           19%     5    -332    85%    34    -246    99%     89   -448
200           27%     4    -337    86%    34    -239    99%     89   -448
220           18%     3    -318    82%    34    -225    99%     94   -434
Table 7.7 Root mean square predicted error for RW1 models under Priors 1 & 2

Depth   Prior     Median   25%ile   75%ile
100     Prior 1   0.020    0.019    0.020
100     Prior 2   0.019    0.019    0.020
120     Prior 1   0.016    0.015    0.016
120     Prior 2   0.015    0.014    0.016
140     Prior 1   0.011    0.011    0.011
140     Prior 2   0.010    0.010    0.011
160     Prior 1   0.0075   0.0073   0.0077
160     Prior 2   0.0073   0.0070   0.0077
180     Prior 1   0.0057   0.0054   0.0059
180     Prior 2   0.0056   0.0054   0.0059
200     Prior 1   0.0037   0.0035   0.0039
200     Prior 2   0.0041   0.0039   0.0043
220     Prior 1   0.0035   0.0033   0.0037
220     Prior 2   0.0039   0.0037   0.0041
7.9 Figures
7.9.1 Contrasts
Figure 7.1 Long fallowing vs Response cropping at all depths. Saturated model. Point estimates from the MCMC iterates of the full model (Method 1).
Figure 7.2 Long fallowing vs Response cropping. Saturated model. Contour graph from the point estimates from the MCMC iterates of the full model (Method 1).
Figure 7.3 Long fallowing vs Response cropping at depth 100 for all trial dates. Saturated model. Point estimates & 95% CIs from MCMC iterates from the full model.
Figure 7.4 Long fallowing vs Response cropping at depth 100 for all trial dates. Penalised spline smooth across dates. Point estimates & 95% CIs.
Figure 7.5 Long fallowing vs Response cropping at depth 100 for all trial dates. Regression model (Equation 7.2) fitting 27 time-varying covariates. Point estimates & 95% CIs.
Figure 7.6 Long fallowing vs Response cropping at depth 100 for all trial dates. Random walk of order one. Point estimates & 95% CIs.
Figure 7.7 Spatially structured and unstructured standard deviations & 95% credible intervals at depth 100 cm. The spatial standard deviations are shown in blue, the unstructured standard deviations in green.
Figure 7.8 Spatially structured and unstructured standard deviations & 95% credible intervals at depth 220 cm. The spatial standard deviations are shown dotted, the unstructured standard deviations in green.
Figure 7.9 Long fallowing vs Response cropping at depth 140 for all trial dates (AR1 fit). Point estimates & 95% CIs.
Figure 7.10 Long fallowing vs Response cropping at depth 140 for all trial dates (RW1 fit using weights which are reciprocals of the time intervals). Point estimates & 95% CIs.
Figure 7.11 Long fallowing vs Response cropping at depth 140 for all trial dates (RW2 fit). Point estimates & 95% CIs.
Figure 7.12 Long fallowing vs Response cropping at depth 140 for all trial dates (RW1 fit with t distribution, df=10). Point estimates & 95% CIs.
Figure 7.13 Long fallowing vs Response cropping at depth 140 for all trial dates. Random walk with 97% missing data. Random walk precision fixed at 2241. See Table 7.1. Point estimates & 95% CIs.
Figure 7.14 Non-parametric penalised spline smooths. (Fits for the contrasts at the 7 depths.)
Bibliography
Abellan, J. J., S. Richardson, and N. Best (2008). Use of space-time models to investigate the stability of
patterns of disease (Mini-Monograph). Environmental Health Perspectives 116(8), 1111–1119.
Adebayo, S. B. and L. Fahrmeir (2005). Analysing child mortality in Nigeria with geoadditive discrete-
time survival models. Statistics in Medicine 24(5), 709–728.
Akaike, H. (1973). Information theory and an extension of the maximum likelihood principle. In
B. Petrov and F. Csaki (Eds.), Second International Symposium on Information Theory, Akademiai
Kiado, Budapest, Hungary.
Assuncao, R. M. (2003). Space varying coefficient models for small area data. Environmetrics 14(5),
453–473.
Assuncao, R. M., J. E. Potter, and S. M. Cavenaghi (2002). A Bayesian space varying parameter model
applied to estimating fertility schedules. Statistics in Medicine 21(14), 2057–2075.
Assuncao, R. M., I. A. Reis, and C. D. Oliveira (2001). Diffusion and prediction of Leishmaniasis in
a large metropolitan area in Brazil with a Bayesian space-time model. Statistics in Medicine 20(15),
2319–2335.
Belitz, C., A. Brezger, T. Kneib, and S. Lang (2009a). BayesX Software for Bayesian Inference
in Structured Additive Regression Models Version 2.0.1 Reference Manual. Online at
http://www.stat.uni-muenchen.de/~bayesx/bayesx.html. Accessed: October 25, 2010.
Belitz, C., A. Brezger, T. Kneib, and S. Lang (2009b). BayesX Software for Bayesian Inference in
Structured Additive Regression Models Version 2.0.1 Software Methodology Manual. Online at
http://www.stat.uni-muenchen.de/~bayesx/bayesx.html. Accessed: October 25, 2010.
Bell, M., F. Dominici, K. Ebisu, S. Zeger, and J. Samet (2007). Spatial and temporal variation in PM2.5
chemical composition in the United States for health effects studies. Environmental Health
Perspectives 115(7), 989–995.
Besag, J. E. and D. Mondal (2005). First-order intrinsic autoregressions and the de Wijs process.
Biometrika 92(4), 909–920.
Besag, J. E., J. York, and A. Mollie (1991). Bayesian image restoration with applications in spatial
statistics (with discussion). Annals of the Institute of Statistical Mathematics 43, 1–59.
Box, G. E. P. and G. M. Jenkins (1976). Time series analysis : forecasting and control (Rev. ed.).
Holden-Day series in time series analysis and digital processing. San Francisco: Holden-Day.
Brezger, A. and S. Lang (2006). Generalized structured additive regression based on Bayesian P-splines.
Computational Statistics and Data Analysis 50(4), 967–991.
Brien, C. J. and C. G. B. Demetrio (2009). Formulating mixed models for experiments, including
longitudinal experiments. Journal of Agricultural, Biological, and Environmental Statistics 14(3), 253–280.
Commandeur, J. J. F. and S. J. Koopman (2007). An introduction to state space time series analysis.
Practical econometrics. Oxford New York: Oxford University Press.
Crook, A. M., L. Knorr-Held, and H. Hemingway (2003). Measuring spatial effects in time to event data:
a case study using months from angiography to coronary artery bypass graft (CABG). Statistics in
Medicine 22(18), 2943–2961.
Fahrmeir, L., T. Kneib, and S. Lang (2004). Penalized structured additive regression for space-time data:
A Bayesian perspective. Statistica Sinica 14, 731–761.
Fienberg, S. E. (2006). When did Bayesian inference become “Bayesian”? Bayesian Analysis 1, 1–40.
Fong, Y., H. Rue, and J. Wakefield (2010). Bayesian inference for generalized linear mixed models.
Biostatistics 11(3), 397–412.
Gamerman, D. and H. F. Lopes (2006). Markov chain Monte Carlo : stochastic simulation for Bayesian
inference (2nd ed.). London ; New York: Chapman & Hall.
Gelfand, A. E. and P. Vounatsou (2003). Proper multivariate conditional autoregressive models for spatial
data analysis. Biostatistics 4(1), 11–25.
Geweke, J. (1992). Evaluating the accuracy of sampling-based approaches to the calculation of posterior
moments. Bayesian Statistics 4, 169–188.
Harvey, A. C. (1989). Forecasting, structural time series models and the Kalman filter. Cambridge
England: Cambridge University Press.
Haskard, K. A., B. R. Cullis, and A. P. Verbyla (2007). Anisotropic Matern correlation and spatial
prediction using REML. Journal of Agricultural, Biological, and Environmental Statistics 12(2),
147–160.
Higdon, D. (1998). A process-convolution approach to modelling temperatures in the North Atlantic
Ocean. Environmental and Ecological Statistics 5, 173–190.
Hoeting, J. A., R. A. Davis, A. A. Merton, and S. E. Thompson (2006). Model selection for geostatistical
models. Ecological Applications 16(1), 87–98.
Hrafnkelsson, B. and N. Cressie (2003). Hierarchical modeling of count data with application to nuclear
fall-out. Environmental and Ecological Statistics 10, 179–200.
Kneib, T. (2006). Geoadditive hazard regression for interval censored survival times. Computational
Statistics and Data Analysis 51, 777–792.
Knorr-Held, L. and J. Besag (1998). Modeling risk from a disease in time and space. Statistics in
Medicine 17, 2045–2060.
Lemos, R. T. and B. Sanso (2009). A spatio-temporal model for mean, anomaly, and trend fields of North
Atlantic sea surface temperature. Journal of the American Statistical Association 104(485), 5–18.
Lemos, R. T., B. Sanso, and M. L. Huertos (2007). Spatially varying temperature trends in a central
California estuary. Journal of Agricultural, Biological, and Environmental Statistics 12(3), 379–396.
Lindgren, F., H. Rue, and J. Lindstrom (2010). An explicit link between Gaussian fields and Gaussian
Markov random fields: The SPDE approach. Journal of the Royal Statistical Society Series B, to
appear.
Lunn, D. J., A. Thomas, N. Best, and D. Spiegelhalter (2000). WinBUGS - A Bayesian modelling
framework: Concepts, structure, and extensibility. Statistics and Computing 10(4), 325–337.
Macdonald, B. C. T., J. K. Reynolds, A. S. Kinsela, R. J. Reilly, P. van Oploo, T. D. Waite, and I. White
(2009). Critical coagulation in sulfidic sediments from an east-coast Australian acid sulfate landscape.
Applied Clay Science 46(2), 166–175.
Nayyar, A., C. Hamel, G. Lafond, B. D. Gossen, K. Hanson, and J. Germida (2009). Soil microbial
quality associated with yield reduction in continuous-pea. Applied Soil Ecology 43(1), 115–121.
Piepho, H. P., A. Buchse, and C. Richter (2004). A mixed modelling approach for randomized
experiments with repeated measures. Journal of Agronomy and Crop Science 190(4), 230–247.
Piepho, H. P. and J. O. Ogutu (2007). Simple state-space models in a mixed model framework. American
Statistician 61(3), 224–232.
Piepho, H. P., C. Richter, and E. Williams (2008). Nearest neighbour adjustment and linear variance
models in plant breeding trials. Biometrical Journal 50(2), 164–189.
Poncet, C., V. Lemesle, L. Mailleret, A. Bout, R. Boll, and J. Vaglio (2010). Spatio-temporal analysis
of plant pests in a greenhouse using a Bayesian approach. Agricultural and Forest Entomology 12(3),
325–332.
Raftery, A. and S. Lewis (1992). How many iterations in the Gibbs sampler? In J. Bernardo, J. Berger,
A. Dawid, and A. Smith (Eds.), Bayesian Statistics 4. Oxford: Oxford University Press.
Ridgway, K., J. Dunn, and J. Wilkin (2002). Ocean interpolation by four-dimensional weighted least
squares-application to the waters around Australasia. Journal of Atmospheric and Oceanic
Technology 19(9), 1357–1375.
Ringrose-Voase, A., R. R. Young, Z. Payder, N. Huth, A. Bernardi, H. Cresswell, B. Keating, J. Scott,
M. Stauffacher, R. Banks, J. Holland, R. Johnston, T. Green, L. Gregory, I. Daniells, R. Farquharson,
R. Drinkwater, S. Heidenreich, and S. Donaldson (2003). Deep drainage under different land uses
in the Liverpool Plains Catchment. Technical Report 3, Agricultural Resource Management Report
Series, NSW Agriculture Orange.
Rue, H. and H. Tjelmeland (2002). Fitting Gaussian Markov random fields to Gaussian fields.
Scandinavian Journal of Statistics 29(1), 31–49.
Sahu, S. K. and P. Challenor (2008). A space-time model for joint modeling of ocean temperature and
salinity levels as measured by Argo floats. Environmetrics 19(5), 509–528.
Schwarz, G. (1978). Estimating the dimension of a model. The Annals of Statistics 6(2), 461–464.
Sleutel, S., J. Vandenbruwane, A. De Schrijver, K. Wuyts, B. Moeskops, K. Verheyen, and S. De Neve
(2009). Patterns of dissolved organic carbon and nitrogen fluxes in deciduous and coniferous forests
under historic high nitrogen deposition. Biogeosciences 6(12), 2743–2758.
Spiegelhalter, D. J., N. G. Best, B. P. Carlin, and A. van der Linde (2002). Bayesian measures of model
complexity and fit. Journal of the Royal Statistical Society. Series B (Statistical Methodology) 64(4),
583–639.
Strickland, C. (2010). pyMCMC: a statistical package for Bayesian MCMC analysis. Journal of
Computational and Graphical Statistics, 1–46. Submitted August, 2010.
Teschke, K., Y. Chow, K. Bartlett, A. Ross, and C. van Netten (2001). Spatial and temporal distribution of
airborne Bacillus thuringiensis var. kurstaki during an aerial spray program for gypsy moth eradication.
Environmental Health Perspectives 109(1), 47–54.
Trought, M. C. T. and R. G. V. Bramley (2011). Vineyard variability in Marlborough, New Zealand:
characterising spatial and temporal changes in fruit composition and juice quality in the vineyard.
Australian Journal of Grape and Wine Research 17(1), 79–89.
Waller, L. A., B. P. Carlin, H. Xia, and A. E. Gelfand (1997). Hierarchical spatio-temporal mapping of
disease rates. Journal of the American Statistical Association 92(438), 607–617.
Wand, M. P. (2009). Semiparametric and graphical models. Australian and New Zealand Journal of
Statistics 51(1), 9–41.
Wang, L. A. and Z. Goonewardene (2004). The use of mixed models in the analysis of animal experiments
with repeated measures data. Canadian Journal of Animal Science 84(1), 1–11.
West, M. and J. Harrison (1997). Bayesian forecasting and dynamic models (2nd ed.). Springer series in
statistics. New York: Springer.
Yan, P. and M. K. Clayton (2006). A cluster model for space-time disease counts. Statistics in
Medicine 25(5), 867–881.
Chapter 8
Conclusions and further work
This chapter provides a brief overview of the thesis and thoughts for further work.
8.1 Conclusions
The aim of this research was firstly to contribute to Bayesian statistical methodology, by contributing to
risk assessment methodology, and to spatial and spatio-temporal methodology, and it seemed this might
be possible by modelling error structures using complex hierarchical models. A further, parallel, aim
was to contribute by applying these new methodologies in the areas of risk analyses for recycled water,
and in the assessment of differences between cropping systems over time, taking account of possible
autocorrelations over the three spatial dimensions and over time.
The statistical contributions made in this thesis are
• the development of methods for
– forming credible intervals for the point estimates of queries in Bayesian nets, having first
elicited the uncertainty for all the net’s various conditional probabilities;
– incorporating experimental uncertainty into risk assessments;
• the introduction of the layered CAR model for three (spatial) dimensional data (in combination
with complex regression models);
• the introduction of Chris Strickland’s Gibbs sampler for block updating the layered CAR model
(described in Chapter 6);
• the introduction of a complex time by space interaction model, through repeated use of the layered
CAR model, thereby providing a model where space and time effects are additive in neither
the fixed nor the error components.
Thus, we first considered how to build credible intervals for a Bayesian net (Chapter 3), and illus-
trated a simple method, whereby, having elicited uncertainty about the various conditional probability
tables, credible intervals may be found for the complex mixture distributions which result, for any of the
marginal or conditional probabilities and relative risks under any desired scenario in a Bayesian net. In an
addendum (Chapter 3.9), we show that this can produce very different results from the credible intervals put
forward by Van Allen et al. [2001, 2008].
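The idea can be illustrated with a minimal sketch for a toy two-node net A -> B. This is not the WinBUGS model of Appendix A: the net, the Beta parameters (standing in for elicited uncertainty) and the function name are all assumptions made for illustration. Each draw samples the conditional probabilities, evaluates the query P(B=1) for that draw, and the interval is read from the percentiles of the resulting mixture.

```python
import random


def query_interval(n_draws=5000, seed=1):
    """Percentile interval for the query P(B=1) in a toy net A -> B.

    All Beta parameters below are hypothetical elicited values.
    """
    rng = random.Random(seed)
    draws = []
    for _ in range(n_draws):
        p_a = rng.betavariate(2, 8)             # uncertainty about P(A=1)
        p_b_given_a0 = rng.betavariate(1, 19)   # uncertainty about P(B=1 | A=0)
        p_b_given_a1 = rng.betavariate(12, 8)   # uncertainty about P(B=1 | A=1)
        # Law of total probability gives the marginal query for this draw.
        draws.append(p_a * p_b_given_a1 + (1 - p_a) * p_b_given_a0)
    draws.sort()
    lower = draws[int(0.025 * n_draws)]
    upper = draws[int(0.975 * n_draws) - 1]
    mean = sum(draws) / n_draws
    return lower, mean, upper


lower, mean, upper = query_interval()
print(f"P(B=1): mean {mean:.3f}, 95% interval ({lower:.3f}, {upper:.3f})")
```

A conditional probability or a relative risk could be handled in the same way, by computing the corresponding ratio within each draw before taking percentiles.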
Secondly, in Chapter 4, we showed that uncertainty in QMRAs was more appropriately addressed
by incorporating all the primary data into a DAG, and dealing with dose-response data via an errors-in-
variables model within that DAG. This chapter illustrated that with no change of assumptions, estimating
parameters which describe the models used in a risk assessment, simultaneously with the risk assessment,
may increase the size of the credible intervals for risks markedly. Posterior credible intervals found as
a result of using the complex DAG structure reflect the experimental uncertainties of the small scale
experiments on which most QMRAs are based, and we recommend that this should be the preferred
method of undertaking QMRAs. This chapter shows how to incorporate the experimental uncertainty
of the usually small scale experiments which lead to the parameter constants on which risk assessments
may be based. A further contribution of this chapter was to consider dose-response data as an errors-in-
variables problem, and again this leads to markedly greater estimates of risk at lower doses.
In considering the agricultural data, the concern was to make a contribution within both the theoretical
and the applied literature. We used spatial methods currently used for two-dimensions, and extended
them to the context of three spatial dimensions to build a new flexible model, the CAR layered model.
The layering performs three functions: it permits differing spatial and non-spatial variances in each layer;
it permits treatment effects to be modelled independently of spatial correlation along the depth dimension;
and it dispenses with the issue of choosing weights when the distances in the depth dimension are not of
the same order as those in the horizontal dimension.
In Chapter 5, we confined consideration to essentially three ways of accounting for spatial correlation
and outlined our explorations in describing data in a three-dimensional space. This work was fundamental
to our later analyses of Chapters 6 and 7. The methodological contribution was to extend two-dimensional
modelling to three-dimensions in a way which was suitable to the experimental data, by the introduction
of the CAR layered model.
Chapter 5 also found appropriate models to describe the treatment curves along the depth dimension
and appropriate models for the neighbourhood structure for CAR models chosen to account for spatial
autocorrelation. The final structure chosen, which allowed neighbours to be neighbours only at the same
depth, was found to capture very flexibly the differing variances at the different depths, while at the
same time being computationally simpler than a CAR model with a 1620 × 1620 precision matrix. This
chapter also demonstrated that very complex models may be fitted within the CAR modelling framework
of WinBUGS, with the final chosen model fitting essentially 15 different depth CAR models while at
the same time fitting cubic radial basis functions with a latent error model in the depth dimension as a
parametric component.
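The computational saving from restricting neighbours to the same depth can be sketched as follows. The sketch assumes a 6 × 18 grid of sites at each of 15 depths (giving the 1620 points mentioned above), first-order (rook) neighbours, and one common proper CAR parameterisation of the precision, tau * (D - rho * W); the function names and the per-depth precision values are hypothetical.

```python
def grid_neighbours(n_rows, n_cols):
    """First-order (rook) neighbours for sites on an n_rows x n_cols grid."""
    nbrs = {}
    for r in range(n_rows):
        for c in range(n_cols):
            site = r * n_cols + c
            nbrs[site] = [
                rr * n_cols + cc
                for rr, cc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
                if 0 <= rr < n_rows and 0 <= cc < n_cols
            ]
    return nbrs


def layer_precision(nbrs, tau, rho):
    """Sparse proper-CAR precision tau * (D - rho * W), stored as {(i, j): value}."""
    q = {}
    for i, js in nbrs.items():
        q[(i, i)] = tau * len(js)       # D holds the number of neighbours
        for j in js:
            q[(i, j)] = -tau * rho      # W is the 0/1 adjacency matrix
    return q


def layered_precision(nbrs, taus, rho):
    """Block-diagonal precision: one CAR block per depth, no links across depths."""
    n = len(nbrs)
    q = {}
    for k, tau in enumerate(taus):
        for (i, j), value in layer_precision(nbrs, tau, rho).items():
            q[(k * n + i, k * n + j)] = value
    return q


nbrs = grid_neighbours(6, 18)
taus = [1.0 + 0.1 * k for k in range(15)]   # hypothetical per-depth precisions
Q = layered_precision(nbrs, taus, rho=0.9)
dimension = 15 * len(nbrs)
print(dimension, len(Q), dimension ** 2)
```

Under these assumptions only a small fraction of the 1620 × 1620 entries are non-zero, and each depth keeps its own precision parameter, which is what allows the variances to differ from layer to layer.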
The attempt to find parsimonious models to describe the behaviour of the treatments along the depth
dimension gave insight into the attenuation of the treatment effects with depth, and also showed the
variances varying from greater to smaller and then to greater with depth, in a relatively smooth way.
These insights were critical to an understanding of possible models for the full data.
Chapter 6 built on the work of Chapter 5 and compared the contrast of interest over five time periods.
We found that the errors-in-variables model with linear splines with various knot schemes could not be
sensibly fitted in WinBUGS as a single model for the full 5 day dataset. (Though, in retrospect, this may
well have been due to the very heavy tailed priors used for the latent variable for depth. Instead of the
Half-Cauchy of Marley and Wand [2010], the Gamma prior of Wakefield et al. [2000] may be a more
successful choice.) Given that WinBUGS failed us, a different computational choice was made. Thus,
this paper describes a block updating Gibbs sampler used within purpose-built software for the Gibbs
sampling (pyMCMC of Strickland [2010]), and uses the CAR proper prior of Gelfand and Vounatsou
[2003], to build the layered CAR model. Again, the first order adjacency matrix within each depth, which
had been found to be the best performer in the comparisons of Chapter 5, was used. The methodological
contribution of this chapter is the development together with Chris Strickland of the block updating Gibbs
sampler for the CAR layered model, and its description. This block updating Gibbs sampler permits the
possibility of analysing large datasets.
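A block update of a Gaussian random field typically draws the whole vector at once from its full conditional, N(Q^{-1} b, Q^{-1}), using a Cholesky factor of the precision matrix Q. The sketch below shows that generic step only; it is not the pyMCMC implementation, it uses small dense matrices for readability, and all function names are hypothetical.

```python
import math
import random


def cholesky(a):
    """Lower-triangular L with L L^T = a, for a symmetric positive definite a."""
    n = len(a)
    l = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(l[i][k] * l[j][k] for k in range(j))
            l[i][j] = math.sqrt(s) if i == j else s / l[j][j]
    return l


def forward_solve(l, b):
    """Solve L y = b."""
    y = [0.0] * len(b)
    for i in range(len(b)):
        y[i] = (b[i] - sum(l[i][k] * y[k] for k in range(i))) / l[i][i]
    return y


def backward_solve(l, y):
    """Solve L^T x = y."""
    n = len(y)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(l[k][i] * x[k] for k in range(i + 1, n))) / l[i][i]
    return x


def draw_block(q, b, rng):
    """One joint draw from N(Q^{-1} b, Q^{-1}); also returns the conditional mean."""
    l = cholesky(q)
    mean = backward_solve(l, forward_solve(l, b))   # solves Q mean = b
    z = [rng.gauss(0.0, 1.0) for _ in b]
    noise = backward_solve(l, z)                    # covariance (L L^T)^{-1} = Q^{-1}
    return [m + e for m, e in zip(mean, noise)], mean


rng = random.Random(0)
q = [[2.0, -1.0, 0.0], [-1.0, 2.0, -1.0], [0.0, -1.0, 2.0]]
draw, mean = draw_block(q, [1.0, 0.0, 1.0], rng)
print(mean, draw)
```

For a layered model the precision is block diagonal, so in principle each depth could be factorised separately, and a sparse Cholesky routine would replace the dense one used here.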
The applied contribution of Chapter 6 was to show that by a depth of about 200 cm, the trends in the
moisture observations had flattened. That is, moisture was constant, at a level dependent on treatment, from
this depth onward, although with variance increasing with depth beyond this point.
In Chapter 7, we situated our work within the spatio-temporal modelling literature. We found that
treating depth differently from the horizontal spatial dimensions was echoed in the oceanic work of
Ridgway et al. [2002], while in the kernel convolution modelling of Sahu and Challenor [2008] which
considered oceanic temperatures and salinity, separate models are built for three depths. We fitted a
model with different treatment means for each day and depth, together with the (modified) CAR spatial
model of Chapter 5, at each sampling date, to give a complex spatio-temporal model which gave point
estimates for the contrast of interest together with 95% credible intervals. We also treated the point
estimates for the contrast as the starting point for time series modelling, and determined that a weighted
random walk of order 1 best described most of the depth series over time. In this chapter, we argued
that in the context of an agricultural experiment, it is not sensible to smooth treatment curves prior to
calculation of a contrast. We also demonstrated and argued that data arising from a three-dimensional (in
space) agricultural experiment over time are likely to have a complex error structure. We went some way
towards an adequate description.
It may be argued that we could/should have fitted kriging models with sparsity created in the variance
matrices by thresholding. This could have been done, but my feeling is that with a layout of 6×18 rows
by columns, spatial continuities are harder to see and model, with signal to noise ratios being more
difficult to disentangle. Equally, we could have used the Gaussian kernel smooths of Higdon [1998],
but this may be viewed as just another set of weights with a bandwidth to be chosen. Both of these
possibilities seem worthwhile but probably only when there are considerably more than 6 measured
points along a dimension.
Using a single point estimate for the contrast for each day and depth led to a problem of model choice,
when random walk models of order one were amongst those being considered. These models would seem
to be extremely sensitive to the choice of priors for the structured and observational error. This posed a
major problem for our modelling and the problem is not resolved by moving from a model choice based
on the smallest DIC to a model choice based on minimal forecasting error. Doubtless, we could choose
a grid for the precision priors, or some other sampling process for the priors, but it appears to remain
a considerable task to minimise the error, and having completed the task, the possibility remains that a
model chosen this way might still be a join-the-dots model.
In working through potential models for the paper of Chapter 6, it seemed remarkable that none of
the various models tried bettered the three knot model in terms of the DIC. Given that a single observation
has its error partitioned into two components, an unstructured and a structured error, it may again be the
case that the partitioning of the two error types is sensitive to the priors when the data are Gaussian,
and the neighbours are few. This problem does not arise for epidemiological data which typically model
a proportion or a count and thus the unstructured error is binomial or Poisson, dictated by the mean
structure, itself dependent on the spatially structured error, giving rise to a complex interplay which
determines both the unstructured error and the spatially structured error. The possibility of differing fits
with differing priors needs to be explored in the case of the CAR spatial models also.
In Chapter 7, we presented a way of modelling over the dimensions of time and space, which allowed
space-time interactions, in contrast to the many models where space and time effects are additive. By
using the CAR layered model repeatedly, we showed how a full interaction spatio-temporal model might
be fitted and the contrast of interest be found, and how it behaved over time. We applied this methodology
to 90,720 observations, and found periodic behaviour in the contrast difference at the shallower levels,
and that response cropping generally led to less moist soils than long fallowing, although this difference
did not always have a 95% credible interval which failed to include zero.
8.2 Future Work
The Half-Cauchy priors of Chapter 5 failed to generate good fits when we tried to fit the full five days data
of Chapter 6. In retrospect, this work should be reviewed using the Gamma(.5,.0005) prior of Wakefield
et al. [2000].
The neighbourhood models should be calibrated to a kriging model [Hrafnkelsson and Cressie,
2003], or via the work of Lindgren et al. [2011].
The fits (together with the DIC) need to be explored via simulation for the case where both structured
and unstructured error are Gaussian. There seems to be a problem of extreme sensitivity to precision
priors when there are two errors with only one observation at each timepoint and few neighbours, and when
the model is Gaussian. This problem may also arise in the CAR spatial model where again there is
one observation at each spatial point. In our model, both error components are Gaussian, unlike many
instances of CAR models for epidemiological data where the fitted values are frequently binomial or
Poisson.
The final four-dimensional modelling of Chapter 7 may be thought of as unsatisfactory in a number
of ways. Fitting the full model as a set of daily models restricts the ability to describe what is happening
over time. Thus, ideally, the two-stage model of Chapter 7 should be fitted as a single model for all the
reasons given by Gerlach et al. [2000]. As in Abellan et al. [2008], the quantity of interest together with
an appropriate prior should be embedded in a full model. To see just how to do this, the two-stage model
needs to be more fully exploited.
Chapter 7 noted a problem with the random walk models (which were felt to best describe the time-
varying behaviour of the contrast of interest). A simple solution (but somewhat ad-hoc) for the non-
identifiability of the time series random walk models, would be to use repeated observations at each time
point. This solution has legitimacy since within each day there are 12 repeats of each treatment and hence
a whole series of possible calculations for each contrast within a day. Thus, it seems not inappropriate to
take 12 MCMC simulations of the contrast for a day and fit these to the various time series models. Such
contrasts have already been spatially adjusted. In Figure 8.1, we show the fit of an RW1 model together
with the mean contrast value for the day. This model had a pD of 53, in other words, almost all the date
degrees of freedom.
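For reference, a random walk of order one fitted to repeated contrast values is commonly written as a local level model. The form below is a sketch in assumed notation (the symbols y_{tj}, \mu_t, \sigma_{obs} and \sigma_{sys} are not taken from Chapter 7), with j indexing the 12 contrast values available for day t:

```latex
\begin{align*}
y_{tj} &= \mu_t + \varepsilon_{tj}, & \varepsilon_{tj} &\sim N(0, \sigma^2_{\mathrm{obs}}), \quad j = 1, \dots, 12, \\
\mu_t  &= \mu_{t-1} + \eta_t,       & \eta_t           &\sim N(0, \sigma^2_{\mathrm{sys}}).
\end{align*}
```

In this form the repeated values within a day carry direct information about the observational variance, whereas with a single value per day the split between the observational and system variances rests largely on the priors.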
Thinking further about this, we note that the MCMC iterates form a sample of the distribution of the
contrast at each day and depth. We should be able to use a subsample of these MCMC samples to fit the
second-stage model to get appropriate credible intervals and a meaningful description in the dimension
of time. The question is, do we need a sample of 10000, 100 or 12?
Table 8.1 compares two random (sampled from the MCMC iterates) samples of different size and
two second-stage models at depth 100 cm, and shows that for a simple random walk of order one, sample
sizes of 12 and 100 produce essentially equivalent results. Comparing the DIC for models with the same
sample size, we see the superiority of the RW1 model at both sample sizes in comparison with the AR1
plus covariates model. The pD shows that the effective number of parameters is unchanged for the fits
from sample size to sample size, and estimates of the standard errors are constant from sample size to
sample size, which shows that both 12 and 100 are sufficiently large samples to represent the distribution
of the contrast under the models’ assumptions.
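The subsampling step itself is simple and might be sketched as below. The iterates here are simulated stand-ins for the posterior draws of the contrast at one day and depth; the values, the seed and the function name are assumptions for illustration only.

```python
import random
import statistics


def subsample_summary(iterates, size, rng):
    """Mean and standard deviation of a simple random subsample of MCMC iterates."""
    sub = rng.sample(iterates, size)   # sampling without replacement
    return statistics.mean(sub), statistics.stdev(sub)


rng = random.Random(42)
# Stand-in for 10,000 posterior iterates of the contrast at one day and depth
# (simulated here; real iterates would come from the first-stage fit).
iterates = [rng.gauss(-0.02, 0.02) for _ in range(10000)]

summary_12 = subsample_summary(iterates, 12, rng)
summary_100 = subsample_summary(iterates, 100, rng)
print(summary_12, summary_100)
```

Repeating this at every time point gives the replicated input for the second-stage time series model, so that fits at the two subsample sizes can be compared as in Table 8.1.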
However, with a sample of the contrast at each time point, more complex time series models, with
differing observational variances at each time point should be fitted. For example, one possibility is to fit a
two variance observational error (a contaminated mixture model), and an example is shown in Figure 8.2.
(Figure 8.1 shows the fit of the single variance RW1 model.) More realistically, the variance of the
variance probably varies continuously, and models to reflect this should be fitted. Such a model with its
additional complexity will need a larger sample size of the MCMC iterates in order to be estimated, and
again, the problem arises of how large this sample size should be.
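A two variance observational error of this kind can be written as a scale mixture of two zero-mean normals. The sketch below is a generic illustration rather than the model behind Figure 8.2: the standard deviations, the mixing weight and the function names are hypothetical.

```python
import math


def normal_pdf(x, sd):
    """Density of N(0, sd^2) at x."""
    return math.exp(-0.5 * (x / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))


def contaminated_pdf(e, sd_main, sd_outlier, weight):
    """Density of the error: N(0, sd_main^2) with probability 1 - weight,
    N(0, sd_outlier^2) with probability weight."""
    return (1.0 - weight) * normal_pdf(e, sd_main) + weight * normal_pdf(e, sd_outlier)


def mixture_variance(sd_main, sd_outlier, weight):
    """Variance of the zero-mean two-component mixture."""
    return (1.0 - weight) * sd_main ** 2 + weight * sd_outlier ** 2


# Hypothetical values: most errors are small, one in ten comes from the wide component.
var_mix = mixture_variance(0.02, 0.08, 0.1)
tail_mix = contaminated_pdf(0.1, 0.02, 0.08, 0.1)
tail_single = normal_pdf(0.1, math.sqrt(var_mix))
print(var_mix, tail_mix, tail_single)
```

With these values the mixture places more density in the tails than a single normal with the same overall variance, which is the behaviour a contaminated error model is intended to capture.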
The graphs of the estimates of square roots of the structural (spatial) variances and the unstructured
variance components from Equation 7.1 are shown in Figures C.29–C.35 in Appendix C.2 of Chapter 7.
These show something of the complexity of modelling longitudinal experimental agricultural data.
A further issue for future work is that the 90,720 measurements used in the analysis of Chapter 7
did not include the days for which no measurements at all were taken for treatments 7 & 8. Within the
structure of a repeated daily model, with no postulated distribution over time, it did not seem sensible to
fit a distribution for the missing data, when information about an entire treatment group was missing. In
an integrated model, where variation over time is incorporated into the model structure, such data could
meaningfully be included. Thus, a missing data model fits into the need for an integrated space time
interaction model for the data.
Further exploration of the modelling of the contrasts as multivariate time series would seem to be an
appropriate way to start to build a fully integrated model.
Considering yet again the issue of the random walk models of order one, a further way of exploring
this problem would be to normalise the response variable prior to fitting, in order to change the apparent
non-informativeness of the priors. That is, a prior is uninformative relative to the scale of the data.
Figure 8.1 Random walk of order one & 95% credible intervals at depth 100 cm. Fitted to 12 posterior contrast estimates at each time point.
Table 8.1 Comparison of some fits for the contrast Long fallowing vs Response cropping at Depth 100.
Model    Sample size    pD    DIC      s                    s2
AR1+5    12             9     -2994    .026 (.025, .027)
AR1+5    100            9     -3190    .022 (.021, .024)
RW1      12             53    -3619    .021 (.017, .026)    .016 (.015, .017)
RW1      100            52    -3585    .021 (.017, .026)    .016 (.015, .017)
s: square root of the posterior observational variance (in the RW1 models)
s2: square root of the posterior system variance (in the RW1 models)
Figure 8.2 Contaminated observational error: Random walk of order one & 95% credible intervals at depth 100 cm. Fitted to 12 posterior contrast estimates at each time point.
Bibliography
Abellan, J. J., S. Richardson, and N. Best (2008). Use of space-time models to investigate the stability of
patterns of disease (Mini-Monograph). Environmental Health Perspectives 116(8), 1111–1119.
Gelfand, A. E. and P. Vounatsou (2003). Proper multivariate conditional autoregressive models for spatial
data analysis. Biostatistics 4(1), 11–25.
Gerlach, R., C. Carter, and R. Kohn (2000). Efficient Bayesian inference for dynamic mixture models.
Journal of the American Statistical Association 95(451), 818–828.
Higdon, D. (1998). A process-convolution approach to modelling temperatures in the North Atlantic
Ocean. Environmental and Ecological Statistics 5, 173–190.
Hrafnkelsson, B. and N. Cressie (2003). Hierarchical modeling of count data with application to nuclear
fall-out. Environmental and Ecological Statistics 10, 179–200.
Lindgren, F., H. Rue, and J. Lindstrom (2011). An explicit link between Gaussian fields and Gaussian
Markov random fields: the stochastic partial differential equation approach. Journal of the Royal
Statistical Society: Series B (Statistical Methodology) 73(4), 423–498.
Marley, J. K. and M. P. Wand (2010). Non-standard semiparametric regression via BRugs. Journal of
Statistical Software 37(5), 1–30.
Ridgway, K., J. Dunn, and J. Wilkin (2002). Ocean interpolation by four-dimensional weighted least
squares-application to the waters around Australasia. Journal of Atmospheric and Oceanic
Technology 19(9), 1357–1375.
Sahu, S. K. and P. Challenor (2008). A space-time model for joint modeling of ocean temperature and
salinity levels as measured by Argo floats. Environmetrics 19(5), 509–528.
Strickland, C. (2010). pyMCMC: a statistical package for Bayesian MCMC analysis. Journal of
Computational and Graphical Statistics, 1–46. Submitted August, 2010.
Van Allen, T., R. Greiner, and P. Hooper (2001). Bayesian error-bars for Belief Net inference. In
Proceedings of the Seventeenth Conference on Uncertainty in Artificial Intelligence (UAI-01), Seattle.
Citeseer.
Van Allen, T., A. Singh, R. Greiner, and P. Hooper (2008). Quantifying the uncertainty of a Belief Net
response: Bayesian error-bars for Belief Net inference. Artificial Intelligence 172, 483–513.
Wakefield, J., N. Best, and L. Waller (2000). Bayesian approaches to disease mapping. In P. Elliott,
J. Wakefield, N. Best, and D. Briggs (Eds.), Spatial Epidemiology: Methods and Applications, pp.
104–127. Oxford: Oxford University Press.
Appendices
Appendix A
Some mathematical models
A.1 WinBUGS code for the model 2 Bayesian net of Paper 1
# AngusOriginal260808CI
# Age structure modified to reflect Australian population of 2000
Model for (i in 1:N)
node[i,1] ˜ dbeta( a[1],b[1]) # Primary source water
PSW[i] ˜ dbern(node[i,1])
node[i,5] ˜ dbeta(a[5],b[5]) # Other source water
OSW[i] ˜ dbern(node[i,5])
node[i,6] ˜ dbeta(a[6],b[6]) # Reprocessing
Rep[i] ˜ dbern(node[i,6])
node[i,7] ˜ dbeta(a[7],b[7]) # Other planned/unplanned supply
OPPS[i] ˜ dbern(node[i,7])
node2[i] <- 2*PSW[i]+ OSW[i] + 1 # Primary treatment
X[i,2] ˜ dbeta(aa2[node2[i]],bb2[node2[i]])
PT[i] ˜ dbern(X[i,2])
node[i,3] <- 2*Rep[i] + PT[i] + 1 # Storage
X[i,3] ˜ dbeta(aa3[node[i,3]],bb3[node[i,3]])
Storage[i] ˜ dbern(X[i,3])
node[i,4] <- 2*OPPS[i] + Storage[i] + 1 # Endpoint distribution
X[i,4] ˜ dbeta(aa4[node[i,4]],bb4[node[i,4]])
ED[i] ˜ dbern(X[i,4])
301
302 APPENDIX A.
node[i,8] ˜ dbeta(a[8],b[8]) # Planned/unplanned Use
Puse[i] ˜ dbern(node[i,8])
node[i,10] ˜ dbeta(a[10],b[10]) # Exposure period (Short)
EP[i] ˜ dbern(node[i,10])
node[i,12] ˜ dbeta(a[12],b[12]) # Pathogen uptake (Low)
PU[i] ˜ dbern(node[i,12])
Age[i] ˜ dcat(p[])
node[i,9] <- 2*ED[i] + Puse[i] + 1 # Pathogen Load
X[i,9] ˜ dbeta(aa9[node[i,9]],bb9[node[i,9]])
PL[i] ˜ dbern(X[i,9])
node[i,11] <- 4*PU[i] + 2*EP[i] + PL[i] + 1 # Cumulative dose
X[i,11] ˜ dbeta(aa11[node[i,11]],bb11[node[i,11]])
CD[i] ˜ dbern(X[i,11])
node[i,13] <- 3*CD[i] + Age[i] # Gastroenteritis (Yes)
X[i,13] ˜ dbeta(aa13[node[i,13]],bb13[node[i,13]])
Gastro[i] ˜ dbern(X[i,13])
# Partitioning the sample
CD1[i] <- step(CD[i]-1)
ED0[i] <- step(-ED[i])
Puse0[i] <- step(-Puse[i])
CD1Gastro[i] <- CD1[i]*Gastro[i]
ED0Gastro[i] <- ED0[i]*Gastro[i]
Puse0Gastro[i] <- Puse0[i]*Gastro[i]
ED0Puse0Gastro[i] <- Puse0Gastro[i]*ED0[i]
ED0Puse0[i] <- Puse0[i]*ED0[i]
Age1[i] <- equals(Age[i],1)
Age2[i] <- equals(Age[i],2)
Age3[i] <- equals(Age[i],3)
CD1Age1[i] <- step(CD[i]*equals(Age[i],1)-1)
CD1Age2[i] <- step(CD[i]*equals(Age[i],2)-1)
CD1Age3[i] <- step(CD[i]*equals(Age[i],3)-1)
CD1Age1Gastro[i] <- CD1Age1[i]*Gastro[i]
CD1Age2Gastro[i] <- CD1Age2[i]*Gastro[i]
CD1Age3Gastro[i] <- CD1Age3[i]*Gastro[i]
ED0Age1[i] <- ED0[i]*equals(Age[i],1)
A.1. WinBUGS code for the model 2 Bayesian net of Paper 1 303
ED0Age2[i] <- ED0[i]*equals(Age[i],2)
ED0Age3[i] <- ED0[i]*equals(Age[i],3)
ED0Age1Gastro[i] <- ED0Age1[i]*Gastro[i]
ED0Age2Gastro[i] <- ED0Age2[i]*Gastro[i]
ED0Age3Gastro[i] <- ED0Age3[i]*Gastro[i]
ED0Puse0Age1[i] <- ED0Puse0[i]*equals(Age[i],1)
ED0Puse0Age2[i] <- ED0Puse0[i]*equals(Age[i],2)
ED0Puse0Age3[i] <- ED0Puse0[i]*equals(Age[i],3)
ED0Puse0Age1Gastro[i] <- ED0Puse0Age1[i]*Gastro[i]
ED0Puse0Age2Gastro[i] <- ED0Puse0Age2[i]*Gastro[i]
ED0Puse0Age3Gastro[i] <- ED0Puse0Age3[i]*Gastro[i]
Age1Gastro[i] <- Age1[i]*Gastro[i]
Age2Gastro[i] <- Age2[i]*Gastro[i]
Age3Gastro[i] <- Age3[i]*Gastro[i]
e[1] <- sum(ED0Age1[]) # E(n) for ED=0, Age=1
e[2] <- sum(ED0Age2[]) # E(n) for ED=0, Age=2
e[3] <- sum(ED0Age3[]) # E(n) for ED=0, Age=3
e[4] <- sum(ED0Puse0Age1[]) # E(n) for ED=0, Puse=0, Age=1
e[5] <- sum(ED0Puse0Age2[]) # E(n) for ED=0, Puse=0, Age=2
e[6] <- sum(ED0Puse0Age3[]) # E(n) for ED=0, Puse=0, Age=3
r[1] <- sum(PT[])/N # p for PT: node 2
r[2] <- sum(Storage[])/N # p for Storage: node 3
r[3] <- sum(ED[])/N # p for Endpoint distribution:
# node 4
r[4] <- sum(PL[])/N # prop for PL: node 9
r[5] <- sum(CD[])/N # prop for CD: node 11
r[6] <- sum(Gastro[])/N # prop for Gastro: node 13
# Conditional probabilities
r[7] <- sum(CD1Gastro[])/sum(CD1[])
# prob for Gastro: All population groups (CD acceptable)
r[8] <- sum(ED0Gastro[])/sum(ED0[])
# prob for Gastro: All population groups (ED fails)
r[20] <- sum(ED0Puse0Gastro[])/sum(ED0Puse0[])
# prob for Gastro: All population groups (ED fails & Puse=0)
r[9] <- r[6]/r[7] # RR Gastro: All population groups:
# r[6]/ r[7] CD v CD1
r[10] <- r[8]/r[7] # RR Gastro: All population groups
# ED0 v CD1
r[21] <- r[20]/r[7] # RR Gastro: All population groups
# ED0Puse0 v CD1
r[11] <- sum(CD1Age1Gastro[])/sum(CD1Age1[])
# prob for Gastro: <5 (CD acceptable)
r[12] <- sum(CD1Age2Gastro[])/sum(CD1Age2[])
# prob for Gastro: 5-64 (CD acceptable)
r[13] <- sum(CD1Age3Gastro[])/sum(CD1Age3[])
# prob for Gastro: 65+ (CD acceptable)
r[14] <- sum(Age1Gastro[])/sum(Age1[]) # prob for Gastro: <5
r[15] <- sum(Age2Gastro[])/sum(Age2[]) # prob for Gastro: 5-64
r[16] <- sum(Age3Gastro[])/sum(Age3[]) # prob for Gastro: 65+
r[17] <- r[14]/r[11] # RR Gastro: <5: CD v CD1
r[18] <- r[15]/r[12] # RR Gastro: 5-64: CD v CD1
r[19] <- r[16]/r[13] # RR Gastro: 65+: CD v CD1
r[22] <- sum(ED0Age1Gastro[])/sum(ED0Age1[])
# prob for Gastro: (ED fails & Age<5)
r[23] <- sum(ED0Age2Gastro[])/sum(ED0Age2[])
# prob for Gastro: (ED fails & Age 5-64)
r[24] <- sum(ED0Age3Gastro[])/sum(ED0Age3[])
# prob for Gastro: (ED fails & Age 65+)
r[25] <- r[22]/r[11] # RR Gastro: <5: ED0 v CD1
r[26] <- r[23]/r[12] # RR Gastro: 5-64: ED0 v CD1
r[27] <- r[24]/r[13] # RR Gastro: 65+: ED0 v CD1
r[28] <- sum(ED0Puse0Age1Gastro[])/sum(ED0Puse0Age1[])
# prob for Gastro: (ED fails Puse=0 & Age<5)
r[29] <- sum(ED0Puse0Age2Gastro[])/sum(ED0Puse0Age2[])
# prob for Gastro: (ED fails Puse=0 & Age 5-64)
r[30] <- sum(ED0Puse0Age3Gastro[])/sum(ED0Puse0Age3[])
# prob for Gastro: (ED fails Puse=0 & Age 65+)
r[31] <- r[28]/r[11] # RR Gastro: <5: ED0Puse0 v CD1
r[32] <- r[29]/r[12] # RR Gastro: 5-64: ED0Puse0 v CD1
r[33] <- r[30]/r[13] # RR Gastro: 65+: ED0Puse0 v CD1
for (k in 1:4) {
   aa2[k] ~ dgamma(.01,.01)
   bb2[k] ~ dgamma(.01,.01)
   aa3[k] ~ dgamma(.01,.01)
   bb3[k] ~ dgamma(.01,.01)
   aa4[k] ~ dgamma(.01,.01)
   bb4[k] ~ dgamma(.01,.01)
   aa9[k] ~ dgamma(.01,.01)
   bb9[k] ~ dgamma(.01,.01)
}
for (j in 1:6) {
   aa13[j] ~ dgamma(.01,.01)
   bb13[j] ~ dgamma(.01,.01)
}
for (jj in 1:8) {
   aa11[jj] ~ dgamma(.01,.01)
   bb11[jj] ~ dgamma(.01,.01)
}
#Data
list(N=50000,
p=c(.0671, .8096, .1233),
a=c(6.751,NA, NA, NA,.599, 59.2524, 48.372, 30.217, NA,
30.217, NA, 47.52),
b=c(668.37, NA,NA, NA,59.25, .59851, 12.093, 3.35744, NA,
3.35744, NA, 47.52),
aa2=c(55.687, 117.083, 375.525,5.135),
bb2=c(2.32, 2.39, 3.79, .01),
aa3=c(48.372,59.252,30.217,239.98),
bb3=c(12.09,.60, 3.36, 2.42),
aa4=c(8.335,21.054, 30.217, 152.358),
bb4=c(3.57, 5.26, 3.36, .15),
aa9=c(8.335, 68.391, 172.529, 152.358),
bb9=c(3.57, 3.60, 5.34,.15),
aa11=c(8.335, 7.068, 30.217, 172.529, 92.103, 68.391, 117.083, 152.358),
bb11=c(3.57, 1.77, 3.36, 5.34, 6.93, 3.60, 2.39, .15),
aa13=c(8.82, 12.093, 10.456, 3.6, 3.739, 5.33595),
bb13=c(13.23, 48.37, 24.40, 68.39, 375.53, 172.529)
)
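The partitioned-sample ratios in the listing above (for example r[8], r[7] and r[10] <- r[8]/r[7]) can be illustrated outside WinBUGS. The following is a minimal Python sketch, not part of the thesis code: the function names and the toy arrays are hypothetical, and it only mirrors the arithmetic of the indicator sums, using the BUGS convention that step(x) is 1 when x >= 0.

```python
import numpy as np

def conditional_prob(event, condition):
    """Proportion of simulated individuals with `event` among those in `condition`."""
    event = np.asarray(event, dtype=bool)
    condition = np.asarray(condition, dtype=bool)
    return (event & condition).sum() / condition.sum()

def relative_risk(event, exposed, reference):
    """Ratio of two conditional proportions, as in r[10] <- r[8]/r[7]."""
    return conditional_prob(event, exposed) / conditional_prob(event, reference)

# Hypothetical toy sample of N = 8 simulated individuals (illustrative values only).
gastro = np.array([1, 0, 1, 0, 0, 1, 0, 0])
ed = np.array([0, 0, 0, 1, 1, 1, 1, 1])
cd = np.array([0, 1, 0, 1, 2, 1, 0, 1])

ed0 = ed <= 0   # mirrors ED0[i] <- step(-ED[i])
cd1 = cd >= 1   # mirrors CD1[i] <- step(CD[i]-1)

rr_ed0_vs_cd1 = relative_risk(gastro, ed0, cd1)
print(rr_ed0_vs_cd1)
```

In the WinBUGS model these ratios are recomputed at every MCMC iteration, so monitoring r[] yields a posterior distribution for each ratio rather than a single value.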
Appendix B
Supplementary materials for Chapter Six
B.1 Supplementary tables
These tables list the elements in the model which remain constant across the dates considered. The
symbols are those used in Chapter 6. Thus, σ represents the square root of the unstructured variance
on a particular date and at a particular depth, and κ the square root of the spatially structured variance on
a particular date and at a particular depth.
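The columns Est, q025, q975 and Sig used in the tables of this appendix can be computed from posterior draws along the following lines. This is a hedged Python sketch rather than the thesis's own code: it assumes Est is the posterior mean, that the two sets of draws come from the same MCMC iterations, and that the difference is taken as first minus second; none of these choices is stated in this appendix, and the simulated draws are hypothetical.

```python
import numpy as np

def summarise_difference(draws_a, draws_b):
    """Summarise the posterior of (a - b) from paired MCMC draws."""
    diff = np.asarray(draws_a) - np.asarray(draws_b)
    est = diff.mean()                                # assumed point estimate
    q025, q975 = np.quantile(diff, [0.025, 0.975])   # 95% credible interval
    # Flag with "*" when the interval excludes zero.
    sig = "*" if (q025 > 0 or q975 < 0) else ""
    return est, q025, q975, sig

# Hypothetical posterior draws of sigma on two dates (illustrative only).
rng = np.random.default_rng(0)
sigma_day_a = rng.normal(0.10, 0.01, size=5000)
sigma_day_b = rng.normal(0.05, 0.01, size=5000)

est, q025, q975, sig = summarise_difference(sigma_day_a, sigma_day_b)
print(f"{est:.3f} {q025:.3f} {q975:.3f} {sig}")
```

Working with the draws of the difference, rather than differencing two separate interval summaries, keeps any posterior correlation between the two quantities in the interval.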
Table B.1 Differences in σ for depths from 20cm to 100cm
Depth Day1 Day2 Est q025 q975 Sig
20 1 2 0.011 -0.029 0.071
20 1 3 0.007 -0.042 0.067
20 1 4 0.027 0.002 0.083 *
20 1 5 -0.011 -0.062 0.058
20 2 3 -0.004 -0.048 0.031
20 2 4 0.016 0.001 0.046 *
20 2 5 -0.022 -0.066 0.022
20 3 4 0.020 0.001 0.062 *
20 3 5 -0.018 -0.065 0.035
20 4 5 -0.038 -0.076 -0.005 *
40 1 2 0.013 -0.069 0.068
40 1 3 0.018 -0.068 0.069
40 1 4 0.052 0.008 0.083 *
40 1 5 0.030 -0.019 0.066
40 2 3 0.005 -0.083 0.083
40 2 4 0.040 0.004 0.100 *
40 2 5 0.017 -0.026 0.082
40 3 4 0.034 0.004 0.103 *
40 3 5 0.012 -0.027 0.083
40 4 5 -0.023 -0.040 -0.006 *
60 1 2 0.006 -0.040 0.044
60 1 3 0.004 -0.051 0.045
60 1 4 0.029 0.002 0.056 *
60 1 5 0.022 -0.005 0.050
60 2 3 -0.002 -0.057 0.048
60 2 4 0.023 -0.001 0.060
60 2 5 0.016 -0.008 0.054
60 3 4 0.025 -0.001 0.072
60 3 5 0.018 -0.009 0.066
60 4 5 -0.007 -0.017 0.004
80 1 2 -0.003 -0.028 0.021
80 1 3 -0.004 -0.035 0.022
80 1 4 0.008 -0.008 0.027
80 1 5 0.013 -0.002 0.031
80 2 3 -0.001 -0.033 0.028
80 2 4 0.011 -0.007 0.034
80 2 5 0.016 -0.002 0.038
80 3 4 0.012 -0.008 0.041
80 3 5 0.017 -0.002 0.044
80 4 5 0.005 -0.005 0.015
100 1 2 0.001 -0.014 0.017
100 1 3 -0.002 -0.020 0.015
100 1 4 0.004 -0.009 0.018
100 1 5 0.009 -0.002 0.022
100 2 3 -0.003 -0.020 0.013
100 2 4 0.002 -0.010 0.016
100 2 5 0.008 -0.002 0.020
100 3 4 0.005 -0.009 0.022
100 3 5 0.011 -0.001 0.026
100 4 5 0.006 -0.003 0.015
Table B.2 Differences in σ for depths from 120 cm to 200 cm
Depth Day1 Day2 Est q025 q975 Sig
120 1 2 0.004 -0.006 0.015
120 1 3 0.001 -0.010 0.013
120 1 4 0.003 -0.007 0.014
120 1 5 0.007 -0.002 0.018
120 2 3 -0.003 -0.014 0.007
120 2 4 -0.001 -0.010 0.008
120 2 5 0.003 -0.004 0.011
120 3 4 0.002 -0.008 0.013
120 3 5 0.006 -0.002 0.016
120 4 5 0.004 -0.004 0.013
140 1 2 0.002 -0.004 0.009
140 1 3 0.001 -0.006 0.007
140 1 4 -0.000 -0.007 0.007
140 1 5 0.003 -0.003 0.009
140 2 3 -0.001 -0.008 0.005
140 2 4 -0.002 -0.009 0.004
140 2 5 0.001 -0.005 0.007
140 3 4 -0.001 -0.008 0.006
140 3 5 0.002 -0.004 0.009
140 4 5 0.003 -0.003 0.010
160 1 2 0.001 -0.005 0.007
160 1 3 0.001 -0.004 0.007
160 1 4 0.000 -0.005 0.006
160 1 5 0.002 -0.003 0.008
160 2 3 0.000 -0.005 0.006
160 2 4 -0.001 -0.006 0.005
160 2 5 0.001 -0.004 0.006
160 3 4 -0.001 -0.006 0.005
160 3 5 0.001 -0.004 0.006
160 4 5 0.002 -0.004 0.007
180 1 2 0.000 -0.005 0.006
180 1 3 0.001 -0.004 0.006
180 1 4 0.001 -0.004 0.006
180 1 5 0.001 -0.004 0.007
180 2 3 0.001 -0.004 0.006
180 2 4 0.001 -0.004 0.006
180 2 5 0.001 -0.004 0.006
180 3 4 -0.000 -0.005 0.005
180 3 5 0.000 -0.004 0.005
180 4 5 0.000 -0.004 0.005
200 1 2 0.000 -0.005 0.006
200 1 3 -0.000 -0.006 0.005
200 1 4 0.000 -0.005 0.006
200 1 5 0.001 -0.005 0.006
200 2 3 -0.000 -0.006 0.005
200 2 4 0.000 -0.005 0.005
200 2 5 0.001 -0.005 0.006
200 3 4 0.001 -0.005 0.006
200 3 5 0.001 -0.004 0.006
200 4 5 0.000 -0.005 0.006
Table B.3 Differences in σ for depths from 220 cm to 300 cm
Depth Day1 Day2 Est q025 q975 Sig
220 1 2 -0.004 -0.012 0.003
220 1 3 -0.005 -0.013 0.002
220 1 4 -0.004 -0.011 0.003
220 1 5 -0.005 -0.012 0.002
220 2 3 -0.001 -0.009 0.008
220 2 4 0.001 -0.007 0.009
220 2 5 -0.001 -0.009 0.008
220 3 4 0.001 -0.007 0.010
220 3 5 0.000 -0.008 0.009
220 4 5 -0.001 -0.009 0.007
240 1 2 -0.002 -0.011 0.007
240 1 3 -0.002 -0.012 0.007
240 1 4 -0.003 -0.013 0.007
240 1 5 -0.002 -0.011 0.007
240 2 3 -0.001 -0.010 0.009
240 2 4 -0.001 -0.011 0.009
240 2 5 -0.000 -0.010 0.009
240 3 4 -0.001 -0.011 0.009
240 3 5 0.000 -0.009 0.010
240 4 5 0.001 -0.009 0.011
260 1 2 0.002 -0.009 0.015
260 1 3 0.003 -0.009 0.015
260 1 4 0.003 -0.009 0.015
260 1 5 0.003 -0.009 0.015
260 2 3 0.000 -0.011 0.011
260 2 4 0.001 -0.010 0.011
260 2 5 0.001 -0.010 0.011
260 3 4 0.000 -0.010 0.011
260 3 5 0.000 -0.010 0.011
260 4 5 0.000 -0.010 0.011
280 1 2 0.000 -0.018 0.021
280 1 3 0.001 -0.017 0.022
280 1 4 0.002 -0.016 0.022
280 1 5 0.001 -0.016 0.022
280 2 3 0.001 -0.016 0.019
280 2 4 0.002 -0.015 0.019
280 2 5 0.001 -0.015 0.019
280 3 4 0.001 -0.016 0.017
280 3 5 0.000 -0.016 0.017
280 4 5 -0.000 -0.017 0.016
300 1 2 -0.004 -0.031 0.019
300 1 3 -0.002 -0.027 0.020
300 1 4 -0.002 -0.028 0.021
300 1 5 -0.004 -0.030 0.020
300 2 3 0.002 -0.025 0.030
300 2 4 0.002 -0.025 0.030
300 2 5 0.000 -0.028 0.030
300 3 4 0.000 -0.026 0.026
300 3 5 -0.001 -0.029 0.025
300 4 5 -0.001 -0.029 0.026
Table B.4 Differences in κ for depths from 20cm to 100cm
Depth Day1 Day2 Est q025 q975 Sig
20 1 2 -0.043 -0.081 -0.012 *
20 1 3 0.006 -0.026 0.032
20 1 4 0.035 0.007 0.054 *
20 1 5 0.015 -0.020 0.042
20 2 3 0.049 0.021 0.080 *
20 2 4 0.078 0.055 0.105 *
20 2 5 0.058 0.028 0.091 *
20 3 4 0.029 0.009 0.044 *
20 3 5 0.009 -0.017 0.033
20 4 5 -0.020 -0.039 -0.003 *
40 1 2 -0.020 -0.053 0.022
40 1 3 -0.027 -0.059 0.019
40 1 4 0.015 0.002 0.038 *
40 1 5 0.008 -0.007 0.032
40 2 3 -0.007 -0.052 0.043
40 2 4 0.035 0.004 0.062 *
40 2 5 0.028 -0.004 0.056
40 3 4 0.042 0.005 0.068 *
40 3 5 0.035 -0.003 0.062
40 4 5 -0.007 -0.013 -0.002 *
60 1 2 -0.009 -0.026 0.011
60 1 3 -0.014 -0.034 0.011
60 1 4 0.012 0.002 0.023 *
60 1 5 0.010 -0.001 0.022
60 2 3 -0.005 -0.029 0.021
60 2 4 0.020 0.004 0.034 *
60 2 5 0.018 0.002 0.032 *
60 3 4 0.025 0.004 0.042 *
60 3 5 0.023 0.002 0.040 *
60 4 5 -0.002 -0.005 0.000
80 1 2 -0.003 -0.011 0.006
80 1 3 -0.007 -0.018 0.004
80 1 4 0.005 -0.000 0.011
80 1 5 0.007 0.001 0.012 *
80 2 3 -0.005 -0.016 0.007
80 2 4 0.008 0.001 0.015 *
80 2 5 0.009 0.002 0.016 *
80 3 4 0.012 0.002 0.022 *
80 3 5 0.014 0.004 0.023 *
80 4 5 0.002 -0.001 0.004
100 1 2 0.001 -0.004 0.005
100 1 3 -0.002 -0.007 0.004
100 1 4 0.003 -0.001 0.007
100 1 5 0.005 0.002 0.009 *
100 2 3 -0.002 -0.008 0.003
100 2 4 0.002 -0.001 0.006
100 2 5 0.004 0.001 0.008 *
100 3 4 0.005 0.000 0.009 *
100 3 5 0.007 0.002 0.011 *
100 4 5 0.002 -0.000 0.004
Table B.5 Differences in κ for depths from 120 cm to 200 cm
Depth Day1 Day2 Est q025 q975 Sig
120 1 2 0.002 -0.001 0.005
120 1 3 0.000 -0.003 0.004
120 1 4 0.002 -0.001 0.005
120 1 5 0.003 0.001 0.006 *
120 2 3 -0.002 -0.004 0.001
120 2 4 -0.000 -0.002 0.002
120 2 5 0.001 -0.000 0.003
120 3 4 0.001 -0.001 0.004
120 3 5 0.003 0.001 0.006 *
120 4 5 0.002 -0.000 0.004
140 1 2 0.001 -0.001 0.002
140 1 3 0.000 -0.002 0.002
140 1 4 -0.000 -0.002 0.002
140 1 5 0.001 -0.000 0.002
140 2 3 -0.001 -0.002 0.001
140 2 4 -0.001 -0.002 0.001
140 2 5 0.000 -0.001 0.001
140 3 4 -0.000 -0.002 0.002
140 3 5 0.001 -0.000 0.002
140 4 5 0.001 -0.000 0.002
160 1 2 0.000 -0.001 0.001
160 1 3 0.000 -0.001 0.002
160 1 4 0.000 -0.001 0.001
160 1 5 0.001 -0.000 0.002
160 2 3 0.000 -0.001 0.001
160 2 4 -0.000 -0.001 0.001
160 2 5 0.000 -0.001 0.001
160 3 4 -0.000 -0.001 0.001
160 3 5 0.000 -0.001 0.001
160 4 5 0.000 -0.001 0.001
180 1 2 0.000 -0.001 0.001
180 1 3 0.000 -0.001 0.001
180 1 4 0.000 -0.001 0.001
180 1 5 0.000 -0.001 0.001
180 2 3 0.000 -0.001 0.001
180 2 4 0.000 -0.001 0.001
180 2 5 0.000 -0.001 0.001
180 3 4 0.000 -0.001 0.001
180 3 5 0.000 -0.001 0.001
180 4 5 0.000 -0.001 0.001
200 1 2 0.000 -0.001 0.001
200 1 3 0.000 -0.001 0.001
200 1 4 0.000 -0.001 0.001
200 1 5 0.000 -0.001 0.001
200 2 3 -0.000 -0.001 0.001
200 2 4 0.000 -0.001 0.001
200 2 5 0.000 -0.001 0.001
200 3 4 0.000 -0.001 0.001
200 3 5 0.000 -0.001 0.001
200 4 5 0.000 -0.001 0.001
Table B.6 Differences in κ for depths from 220 cm to 300 cm
Depth Day1 Day2 Est q025 q975 Sig
220 1 2 -0.001 -0.003 0.000
220 1 3 -0.001 -0.003 0.000
220 1 4 -0.001 -0.003 0.000
220 1 5 -0.001 -0.003 0.000
220 2 3 -0.000 -0.002 0.002
220 2 4 0.000 -0.002 0.002
220 2 5 -0.000 -0.002 0.002
220 3 4 0.000 -0.002 0.002
220 3 5 -0.000 -0.002 0.002
220 4 5 -0.000 -0.002 0.002
240 1 2 -0.000 -0.002 0.002
240 1 3 -0.001 -0.003 0.002
240 1 4 -0.001 -0.003 0.002
240 1 5 -0.000 -0.002 0.002
240 2 3 -0.000 -0.003 0.002
240 2 4 -0.000 -0.003 0.002
240 2 5 0.000 -0.002 0.002
240 3 4 -0.000 -0.003 0.002
240 3 5 0.001 -0.002 0.003
240 4 5 0.001 -0.002 0.003
260 1 2 0.003 -0.001 0.006
260 1 3 0.003 -0.001 0.006
260 1 4 0.002 -0.001 0.006
260 1 5 0.003 -0.001 0.007
260 2 3 0.000 -0.003 0.003
260 2 4 -0.000 -0.003 0.003
260 2 5 0.000 -0.003 0.003
260 3 4 -0.000 -0.003 0.003
260 3 5 0.000 -0.003 0.003
260 4 5 0.000 -0.002 0.003
280 1 2 0.005 -0.002 0.012
280 1 3 0.006 -0.000 0.013
280 1 4 0.007 0.000 0.014 *
280 1 5 0.007 -0.000 0.013
280 2 3 0.001 -0.004 0.006
280 2 4 0.002 -0.003 0.007
280 2 5 0.001 -0.004 0.007
280 3 4 0.001 -0.004 0.006
280 3 5 0.000 -0.005 0.005
280 4 5 -0.000 -0.005 0.004
300 1 2 -0.000 -0.010 0.010
300 1 3 -0.000 -0.010 0.010
300 1 4 -0.001 -0.011 0.009
300 1 5 -0.002 -0.012 0.009
300 2 3 -0.000 -0.011 0.011
300 2 4 -0.001 -0.012 0.010
300 2 5 -0.002 -0.013 0.010
300 3 4 -0.001 -0.012 0.010
300 3 5 -0.001 -0.012 0.010
300 4 5 -0.001 -0.012 0.011
Table B.7 Differences in slope from 200 cm - 300 cm for each treatment across days
Treatment Day1 Day2 Est q025 q975 Sig
1 1 2 -0.002 -0.016 0.012
1 1 3 0.014 0.000 0.028 *
1 1 4 0.020 0.007 0.034 *
1 1 5 0.001 -0.012 0.015
1 2 3 0.017 0.003 0.030 *
1 2 4 0.023 0.009 0.036 *
1 2 5 0.004 -0.010 0.017
1 3 4 0.006 -0.007 0.019
1 3 5 -0.013 -0.026 0.000
1 4 5 -0.019 -0.032 -0.006 *
2 1 2 -0.006 -0.020 0.008
2 1 3 -0.002 -0.016 0.012
2 1 4 -0.001 -0.015 0.013
2 1 5 -0.006 -0.019 0.008
2 2 3 0.004 -0.010 0.018
2 2 4 0.005 -0.009 0.018
2 2 5 0.000 -0.013 0.014
2 3 4 0.001 -0.012 0.014
2 3 5 -0.004 -0.017 0.010
2 4 5 -0.004 -0.018 0.008
3 1 2 -0.002 -0.016 0.012
3 1 3 -0.000 -0.014 0.014
3 1 4 0.004 -0.010 0.018
3 1 5 -0.000 -0.014 0.014
3 2 3 0.002 -0.012 0.015
3 2 4 0.006 -0.007 0.019
3 2 5 0.002 -0.012 0.015
3 3 4 0.004 -0.009 0.018
3 3 5 0.000 -0.013 0.013
3 4 5 -0.004 -0.017 0.009
4 1 2 -0.004 -0.019 0.010
4 1 3 0.000 -0.014 0.014
4 1 4 0.003 -0.011 0.017
4 1 5 -0.001 -0.016 0.013
4 2 3 0.005 -0.009 0.018
4 2 4 0.007 -0.007 0.020
4 2 5 0.003 -0.010 0.017
4 3 4 0.003 -0.011 0.016
4 3 5 -0.001 -0.015 0.012
4 4 5 -0.004 -0.017 0.009
Table B.8 Differences in slope from 200 cm - 300 cm for each treatment across days
Treatment Day1 Day2 Est q025 q975 Sig
5 1 2 -0.006 -0.020 0.008
5 1 3 -0.000 -0.014 0.014
5 1 4 0.002 -0.012 0.017
5 1 5 -0.008 -0.022 0.006
5 2 3 0.006 -0.008 0.020
5 2 4 0.008 -0.005 0.022
5 2 5 -0.002 -0.016 0.012
5 3 4 0.003 -0.011 0.016
5 3 5 -0.008 -0.022 0.005
5 4 5 -0.011 -0.024 0.002
6 1 2 -0.003 -0.017 0.010
6 1 3 0.000 -0.013 0.014
6 1 4 0.002 -0.011 0.016
6 1 5 -0.001 -0.015 0.012
6 2 3 0.004 -0.010 0.017
6 2 4 0.006 -0.008 0.019
6 2 5 0.002 -0.011 0.015
6 3 4 0.002 -0.011 0.015
6 3 5 -0.002 -0.015 0.012
6 4 5 -0.004 -0.016 0.009
Table B.9 Differences in slope from 200 cm - 300 cm for each treatment across days
Treatment Day1 Day2 Est q025 q975 Sig
7 1 2 -0.003 -0.016 0.010
7 1 3 -0.001 -0.014 0.013
7 1 4 0.005 -0.008 0.018
7 1 5 -0.002 -0.015 0.012
7 2 3 0.003 -0.010 0.015
7 2 4 0.008 -0.005 0.021
7 2 5 0.001 -0.012 0.014
7 3 4 0.005 -0.008 0.018
7 3 5 -0.001 -0.014 0.012
7 4 5 -0.006 -0.019 0.007
8 1 2 -0.004 -0.018 0.010
8 1 3 0.001 -0.013 0.016
8 1 4 0.005 -0.009 0.019
8 1 5 0.002 -0.012 0.015
8 2 3 0.005 -0.008 0.019
8 2 4 0.009 -0.005 0.023
8 2 5 0.006 -0.008 0.019
8 3 4 0.004 -0.010 0.017
8 3 5 0.000 -0.013 0.014
8 4 5 -0.003 -0.017 0.010
9 1 2 -0.004 -0.018 0.010
9 1 3 -0.016 -0.030 -0.003 *
9 1 4 -0.022 -0.036 -0.008 *
9 1 5 -0.007 -0.021 0.007
9 2 3 -0.013 -0.026 0.001
9 2 4 -0.018 -0.031 -0.004 *
9 2 5 -0.003 -0.017 0.010
9 3 4 -0.005 -0.018 0.008
9 3 5 0.009 -0.004 0.022
9 4 5 0.015 0.001 0.028 *
Table B.10 Differences in slopes for each treatment on day 1
Day Trt1 Trt2 Est q025 q975 Sig
1 1 2 0.003 -0.007 0.013
1 1 3 -0.008 -0.019 0.002
1 1 4 0.003 -0.008 0.013
1 1 5 -0.003 -0.013 0.008
1 1 6 -0.006 -0.017 0.004
1 1 7 -0.006 -0.016 0.004
1 1 8 0.007 -0.002 0.017
1 1 9 -0.006 -0.016 0.004
1 2 3 -0.011 -0.022 -0.001 *
1 2 4 -0.000 -0.010 0.010
1 2 5 -0.006 -0.016 0.005
1 2 6 -0.009 -0.019 0.002
1 2 7 -0.009 -0.018 0.001
1 2 8 0.004 -0.005 0.014
1 2 9 -0.009 -0.019 0.002
1 3 4 0.011 0.002 0.021 *
1 3 5 0.006 -0.005 0.016
1 3 6 0.002 -0.008 0.013
1 3 7 0.003 -0.007 0.013
1 3 8 0.016 0.006 0.025 *
1 3 9 0.003 -0.007 0.013
1 4 5 -0.006 -0.016 0.004
1 4 6 -0.009 -0.019 0.002
1 4 7 -0.009 -0.018 0.001
1 4 8 0.004 -0.005 0.014
1 4 9 -0.009 -0.018 0.001
1 5 6 -0.003 -0.014 0.008
1 5 7 -0.003 -0.013 0.007
1 5 8 0.010 0.000 0.020 *
1 5 9 -0.003 -0.013 0.007
1 6 7 0.000 -0.010 0.011
1 6 8 0.013 0.003 0.023 *
1 6 9 0.000 -0.010 0.010
1 7 8 0.013 0.004 0.022 *
1 7 9 -0.000 -0.010 0.010
1 8 9 -0.013 -0.023 -0.003 *
Table B.11 Differences in slopes for each treatment on day 2
Day Trt1 Trt2 Est q025 q975 Sig
2 1 2 0.005 -0.004 0.015
2 1 3 -0.002 -0.012 0.007
2 1 4 0.005 -0.005 0.014
2 1 5 0.002 -0.008 0.011
2 1 6 0.000 -0.010 0.010
2 1 7 -0.002 -0.012 0.007
2 1 8 0.010 0.001 0.019 *
2 1 9 -0.002 -0.012 0.008
2 2 3 -0.008 -0.017 0.002
2 2 4 -0.000 -0.010 0.009
2 2 5 -0.004 -0.013 0.006
2 2 6 -0.005 -0.015 0.005
2 2 7 -0.007 -0.016 0.001
2 2 8 0.005 -0.004 0.014
2 2 9 -0.007 -0.017 0.003
2 3 4 0.007 -0.002 0.016
2 3 5 0.004 -0.006 0.014
2 3 6 0.003 -0.007 0.013
2 3 7 0.000 -0.009 0.009
2 3 8 0.013 0.004 0.022 *
2 3 9 0.001 -0.009 0.010
2 4 5 -0.003 -0.013 0.006
2 4 6 -0.005 -0.014 0.005
2 4 7 -0.007 -0.016 0.002
2 4 8 0.005 -0.003 0.014
2 4 9 -0.007 -0.015 0.002
2 5 6 -0.002 -0.011 0.009
2 5 7 -0.004 -0.013 0.005
2 5 8 0.009 -0.000 0.018
2 5 9 -0.004 -0.013 0.006
2 6 7 -0.002 -0.012 0.007
2 6 8 0.010 0.000 0.020 *
2 6 9 -0.002 -0.011 0.007
2 7 8 0.012 0.004 0.021 *
2 7 9 0.000 -0.009 0.010
2 8 9 -0.012 -0.021 -0.003 *
Table B.12 Differences in slopes for each treatment on day 3
Day Trt1 Trt2 Est q025 q975 Sig
3 1 2 -0.011 -0.021 -0.002 *
3 1 3 -0.006 -0.016 0.003
3 1 4 0.003 -0.006 0.013
3 1 5 -0.003 -0.013 0.007
3 1 6 -0.006 -0.016 0.003
3 1 7 -0.006 -0.015 0.003
3 1 8 0.008 -0.001 0.017
3 1 9 -0.007 -0.017 0.002
3 2 3 0.005 -0.004 0.014
3 2 4 0.014 0.005 0.024 *
3 2 5 0.008 -0.001 0.018
3 2 6 0.005 -0.004 0.015
3 2 7 0.005 -0.003 0.014
3 2 8 0.019 0.010 0.028 *
3 2 9 0.004 -0.005 0.014
3 3 4 0.010 0.001 0.019 *
3 3 5 0.003 -0.006 0.013
3 3 6 0.000 -0.009 0.010
3 3 7 0.001 -0.008 0.010
3 3 8 0.014 0.005 0.023 *
3 3 9 -0.001 -0.010 0.009
3 4 5 -0.006 -0.016 0.003
3 4 6 -0.009 -0.019 0.000
3 4 7 -0.009 -0.018 -0.000 *
3 4 8 0.005 -0.004 0.013
3 4 9 -0.010 -0.019 -0.001 *
3 5 6 -0.003 -0.013 0.007
3 5 7 -0.003 -0.012 0.006
3 5 8 0.011 0.001 0.020 *
3 5 9 -0.004 -0.013 0.005
3 6 7 0.000 -0.009 0.009
3 6 8 0.014 0.004 0.023 *
3 6 9 -0.001 -0.010 0.008
3 7 8 0.014 0.005 0.022 *
3 7 9 -0.001 -0.011 0.008
3 8 9 -0.015 -0.024 -0.006 *
Table B.13 Differences in slopes for each treatment on day 4
Day Trt1 Trt2 Est q025 q975 Sig
4 1 2 -0.017 -0.027 -0.008 *
4 1 3 -0.007 -0.017 0.002
4 1 4 -0.001 -0.011 0.008
4 1 5 -0.006 -0.015 0.004
4 1 6 -0.008 -0.018 0.001
4 1 7 -0.008 -0.017 0.001
4 1 8 0.003 -0.006 0.012
4 1 9 -0.011 -0.021 -0.002 *
4 2 3 0.010 0.001 0.020 *
4 2 4 0.016 0.007 0.025 *
4 2 5 0.012 0.002 0.021 *
4 2 6 0.009 -0.000 0.019
4 2 7 0.010 0.001 0.018 *
4 2 8 0.020 0.011 0.029 *
4 2 9 0.007 -0.002 0.016
4 3 4 0.006 -0.003 0.015
4 3 5 0.002 -0.008 0.011
4 3 6 -0.001 -0.011 0.008
4 3 7 -0.001 -0.009 0.008
4 3 8 0.010 0.001 0.019 *
4 3 9 -0.004 -0.013 0.006
4 4 5 -0.004 -0.014 0.005
4 4 6 -0.007 -0.017 0.002
4 4 7 -0.007 -0.015 0.002
4 4 8 0.004 -0.005 0.013
4 4 9 -0.010 -0.018 -0.001 *
4 5 6 -0.003 -0.012 0.007
4 5 7 -0.002 -0.011 0.007
4 5 8 0.008 -0.001 0.017
4 5 9 -0.005 -0.014 0.004
4 6 7 0.001 -0.008 0.010
4 6 8 0.011 0.002 0.020 *
4 6 9 -0.002 -0.011 0.006
4 7 8 0.010 0.002 0.019 *
4 7 9 -0.003 -0.012 0.006
4 8 9 -0.014 -0.022 -0.005 *
Table B.14 Differences in slopes for each treatment on day 5
Day Trt1 Trt2 Est q025 q975 Sig
5 1 2 0.002 -0.007 0.011
5 1 3 -0.003 -0.012 0.006
5 1 4 0.003 -0.006 0.012
5 1 5 -0.001 -0.011 0.008
5 1 6 0.002 -0.007 0.011
5 1 7 -0.004 -0.013 0.005
5 1 8 0.009 -0.000 0.018
5 1 9 -0.007 -0.017 0.002
5 2 3 -0.004 -0.014 0.005
5 2 4 0.001 -0.008 0.011
5 2 5 -0.003 -0.013 0.006
5 2 6 0.001 -0.009 0.010
5 2 7 -0.006 -0.015 0.003
5 2 8 0.007 -0.001 0.017
5 2 9 -0.009 -0.018 0.001
5 3 4 0.006 -0.003 0.015
5 3 5 0.001 -0.008 0.011
5 3 6 0.005 -0.005 0.015
5 3 7 -0.002 -0.010 0.007
5 3 8 0.012 0.003 0.020 *
5 3 9 -0.005 -0.014 0.004
5 4 5 -0.004 -0.013 0.005
5 4 6 -0.001 -0.010 0.009
5 4 7 -0.007 -0.016 0.001
5 4 8 0.006 -0.003 0.015
5 4 9 -0.010 -0.019 -0.001 *
5 5 6 0.004 -0.006 0.013
5 5 7 -0.003 -0.012 0.006
5 5 8 0.010 0.002 0.020 *
5 5 9 -0.006 -0.015 0.003
5 6 7 -0.007 -0.015 0.002
5 6 8 0.007 -0.003 0.016
5 6 9 -0.010 -0.019 -0.001 *
5 7 8 0.013 0.005 0.022 *
5 7 9 -0.003 -0.012 0.006
5 8 9 -0.016 -0.025 -0.008 *
Table B.15 Slopes for segment 200 cm - 300 cm for each treatment
Treatment Day (Date) Est q025 q975 Sig
1 1 -0.002 -0.010 0.005
1 2 0.001 -0.006 0.008
1 3 -0.001 -0.008 0.005
1 4 -0.005 -0.012 0.001
1 5 -0.002 -0.009 0.005
2 1 -0.005 -0.013 0.002
2 2 -0.004 -0.011 0.003
2 3 0.010 0.003 0.017 *
2 4 0.012 0.005 0.019 *
2 5 -0.003 -0.010 0.003
3 1 0.006 -0.001 0.014
3 2 0.004 -0.003 0.011
3 3 0.005 -0.002 0.012
3 4 0.002 -0.005 0.009
3 5 0.001 -0.006 0.008
4 1 -0.005 -0.012 0.002
4 2 -0.004 -0.010 0.003
4 3 -0.004 -0.011 0.002
4 4 -0.004 -0.010 0.002
4 5 -0.005 -0.011 0.001
5 1 0.001 -0.007 0.008
5 2 -0.000 -0.007 0.007
5 3 0.002 -0.005 0.009
5 4 0.000 -0.006 0.007
5 5 -0.000 -0.007 0.006
6 1 0.004 -0.004 0.012
6 2 0.001 -0.006 0.008
6 3 0.005 -0.003 0.012
6 4 0.003 -0.004 0.010
6 5 -0.004 -0.011 0.003
7 1 0.003 -0.003 0.010
7 2 0.003 -0.003 0.010
7 3 0.005 -0.002 0.011
7 4 0.003 -0.003 0.009
7 5 0.002 -0.004 0.008
8 1 -0.009 -0.016 -0.003 *
8 2 -0.009 -0.016 -0.002 *
8 3 -0.009 -0.015 -0.003 *
8 4 -0.008 -0.014 -0.002 *
8 5 -0.011 -0.017 -0.005 *
9 1 0.004 -0.003 0.011
9 2 0.003 -0.004 0.010
9 3 0.006 -0.001 0.013
9 4 0.006 -0.001 0.012
9 5 0.005 -0.001 0.012
* indicates 95% credible interval does not include zero.
Table B.16 Slopes for segment 200 cm - 300 cm for Groupings
Cropping Day Est q025 q975 Sig
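Long fallowing 1 -0.000 -0.005 0.004
Long fallowing 2 0.000 -0.004 0.004
Long fallowing 3 0.005 0.000 0.009 *
Long fallowing 4 0.003 -0.001 0.007
Long fallowing 5 -0.001 -0.006 0.002
Response cropping 1 0.002 -0.003 0.008
Response cropping 2 0.000 -0.004 0.006
Response cropping 3 0.003 -0.002 0.008
Response cropping 4 0.002 -0.003 0.007
Response cropping 5 -0.002 -0.007 0.003
Pastures 1 -0.001 -0.005 0.004
Pastures 2 -0.001 -0.005 0.003
Pastures 3 0.000 -0.003 0.004
Pastures 4 0.000 -0.004 0.004
Pastures 5 -0.001 -0.005 0.003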
Table B.17 Differences in slopes for each group across days
Group Day1 Day2 Est q025 q975 Sig
Long fallowing 1 2 -0.001 -0.007 0.006
Long fallowing 1 3 -0.005 -0.011 0.001
Long fallowing 1 4 -0.003 -0.009 0.003
Long fallowing 1 5 0.001 -0.005 0.007
Long fallowing 2 3 -0.004 -0.010 0.002
Long fallowing 2 4 -0.003 -0.009 0.003
Long fallowing 2 5 0.002 -0.004 0.008
Long fallowing 3 4 0.002 -0.004 0.007
Long fallowing 3 5 0.006 0.000 0.012 *
Long fallowing 4 5 0.005 -0.001 0.010
Response cropping 1 2 0.002 -0.006 0.009
Response cropping 1 3 -0.001 -0.008 0.007
Response cropping 1 4 0.000 -0.007 0.008
Response cropping 1 5 0.004 -0.003 0.012
Response cropping 2 3 -0.003 -0.010 0.004
Response cropping 2 4 -0.001 -0.008 0.006
Response cropping 2 5 0.003 -0.004 0.010
Response cropping 3 4 0.001 -0.006 0.008
Response cropping 3 5 0.005 -0.001 0.013
Response cropping 4 5 0.004 -0.003 0.011
Pastures 1 2 0.000 -0.006 0.006
Pastures 1 3 -0.001 -0.007 0.005
Pastures 1 4 -0.001 -0.006 0.005
Pastures 1 5 0.000 -0.005 0.006
Pastures 2 3 -0.001 -0.007 0.004
Pastures 2 4 -0.001 -0.007 0.004
Pastures 2 5 0.000 -0.005 0.006
Pastures 3 4 0.000 -0.005 0.006
Pastures 3 5 0.001 -0.004 0.007
Pastures 4 5 0.001 -0.004 0.006
Table B.18 Contrasts compared between days
Contrast Depth Day1 1 2 3 4 5
Long Fallow - Response 100 1 + + + +
2 -3 -4 -5 -
Long Fallow - Response 120 1 + + + +
2 -3 - -4 - +
5 -
Long Fallow - Response 140 1 + +
2 - + -3 - - - -4 + + +
5 + -
Long Fallow - Response 160 1 + +
2 - + -3 - - - -4 + + +
5 + -
Long Fallow - Response 180 1 + +
2 - +
3 - - - -4 +
5 +
Long Fallow - Response 200 12345
Long Fallow - Response 220 123 -45 +
Long Fallow - Response 240 123 -45 +
Long Fallow - Response 260 12345
Long Fallow - Response 280 12345
Long Fallow - Response 300 12345
Table B.19 Contrasts compared between days
Contrast Depth Day1 1 2 3 4 5
Cropping - Pastures 100 1 + -2 - - - -3 + -4 + -5 + + + +
Cropping - Pastures 120 1 + + - -2 - - -3 - - -4 + + + -5 + + + +
Cropping - Pastures 140 1 + - -2 - -3 - - -4 + + + -5 + + + +
Cropping - Pastures 160 1 + - -2 - -3 - - -4 + + + -5 + + + +
Cropping - Pastures 180 1 - -2 - -3 - -4 + + + -5 + + + +
Cropping - Pastures 200 1 - -2 -3 -4 + -5 + + + +
Cropping - Pastures 220 1 - -2 -3 -4 + -5 + + + +
Cropping - Pastures 240 1 - -2 -3 -4 +
5 + + +
Cropping - Pastures 260 1 - -2 -34 +
5 + +
Cropping - Pastures 280 12345
Cropping - Pastures 300 12345
Table B.20 Contrasts compared between days
Contrast Depth Day1 1 2 3 4 5
Lucerne mixtures - Native 100 1 - - + +
2 + - + +
3 + + + +
4 - - - -5 - - - +
Lucerne mixtures - Native 120 1 - - +
2 + - + +
3 + + + +
4 - - - -5 - - +
Lucerne mixtures - Native 140 1 -2 +
3 + + +
4 - -5 -
Lucerne mixtures - Native 160 1 -23 + + +
4 -5 -
Lucerne mixtures - Native 180 1 -23 +
45
Lucerne mixtures - Native 200 1 - -234 +
5 +
Lucerne mixtures - Native 220 1 - - -23 +
4 +
5 +
Lucerne mixtures - Native 240 1 -234 +
5
Lucerne mixtures - Native 260 12345
Lucerne mixtures - Native 280 12345
Lucerne mixtures - Native 300 12345
Table B.21 Contrasts (1)
Contrast Depth Est q025 q975 Sig
Long Fallow - Response 20 0.203 0.151 0.254 *
Long Fallow - Response 20 -0.069 -0.134 -0.003 *
Long Fallow - Response 20 0.225 0.171 0.278 *
Long Fallow - Response 20 . . .
Long Fallow - Response 20 0.123 0.087 0.161 *
Long Fallow - Response 40 0.171 0.137 0.204 *
Long Fallow - Response 40 . . .
Long Fallow - Response 40 0.166 0.132 0.203 *
Long Fallow - Response 40 0.012 0.001 0.023 *
Long Fallow - Response 40 0.088 0.064 0.112 *
Long Fallow - Response 60 0.139 0.117 0.162 *
Long Fallow - Response 60 . . .
Long Fallow - Response 60 0.108 0.083 0.134 *
Long Fallow - Response 60 . . .
Long Fallow - Response 60 0.052 0.038 0.067 *
Long Fallow - Response 80 0.108 0.081 0.135 *
Long Fallow - Response 80 0.030 0.002 0.058 *
Long Fallow - Response 80 0.050 0.019 0.081 *
Long Fallow - Response 80 . . .
Long Fallow - Response 80 0.017 0.000 0.034 *
Long Fallow - Response 100 0.084 0.066 0.102 *
Long Fallow - Response 100 0.024 0.006 0.042 *
Long Fallow - Response 100 0.028 0.007 0.048 *
Long Fallow - Response 100 0.020 0.007 0.032 *
Long Fallow - Response 100 0.017 0.006 0.028 *
Long Fallow - Response 120 0.060 0.047 0.073 *
Long Fallow - Response 120 0.018 0.006 0.030 *
Long Fallow - Response 120 . . .
Long Fallow - Response 120 0.031 0.020 0.042 *
Long Fallow - Response 120 0.017 0.008 0.027 *
Long Fallow - Response 140 0.036 0.021 0.053 *
Long Fallow - Response 140 . . .
Long Fallow - Response 140 -0.016 -0.031 -0.001 *
Long Fallow - Response 140 0.042 0.027 0.057 *
Long Fallow - Response 140 0.017 0.004 0.030 *
Table B.22 Contrasts (2)
Contrast Depth Est q025 q975 Sig
Long Fallow - Response 160 0.033 0.022 0.043 *
Long Fallow - Response 160 0.014 0.005 0.024 *
Long Fallow - Response 160 . . .
Long Fallow - Response 160 0.034 0.024 0.044 *
Long Fallow - Response 160 0.020 0.012 0.029 *
Long Fallow - Response 180 0.029 0.020 0.038 *
Long Fallow - Response 180 0.016 0.007 0.024 *
Long Fallow - Response 180 . . .
Long Fallow - Response 180 0.026 0.018 0.035 *
Long Fallow - Response 180 0.023 0.015 0.032 *
Long Fallow - Response 200 0.025 0.012 0.038 *
Long Fallow - Response 200 0.017 0.005 0.030 *
Long Fallow - Response 200 . . .
Long Fallow - Response 200 0.019 0.006 0.031 *
Long Fallow - Response 200 0.027 0.014 0.039 *
Long Fallow - Response 220 0.022 0.013 0.032 *
Long Fallow - Response 220 0.017 0.008 0.027 *
Long Fallow - Response 220 0.011 0.001 0.021 *
Long Fallow - Response 220 0.020 0.010 0.029 *
Long Fallow - Response 220 0.027 0.018 0.036 *
Long Fallow - Response 240 0.020 0.009 0.030 *
Long Fallow - Response 240 0.017 0.008 0.027 *
Long Fallow - Response 240 0.013 0.003 0.023 *
Long Fallow - Response 240 0.021 0.011 0.031 *
Long Fallow - Response 240 0.028 0.019 0.038 *
Long Fallow - Response 260 0.017 0.003 0.032 *
Long Fallow - Response 260 0.017 0.004 0.031 *
Long Fallow - Response 260 0.014 0.001 0.028 *
Long Fallow - Response 260 0.022 0.008 0.036 *
Long Fallow - Response 260 0.029 0.016 0.042 *
Long Fallow - Response 280 . . .
Long Fallow - Response 280 . . .
Long Fallow - Response 280 . . .
Long Fallow - Response 280 0.023 0.004 0.042 *
Long Fallow - Response 280 0.030 0.011 0.048 *
Long Fallow - Response 300 . . .
Long Fallow - Response 300 . . .
Long Fallow - Response 300 . . .
Long Fallow - Response 300 . . .
Long Fallow - Response 300 0.030 0.006 0.054 *
Table B.23 Contrasts (3)
Contrast Depth Est q025 q975 Sig
Cropping - Pastures 20 0.362 0.324 0.398 *
Cropping - Pastures 20 0.429 0.382 0.475 *
Cropping - Pastures 20 0.279 0.239 0.319 *
Cropping - Pastures 20 0.119 0.107 0.131 *
Cropping - Pastures 20 0.489 0.462 0.516 *
Cropping - Pastures 40 0.335 0.311 0.359 *
Cropping - Pastures 40 0.365 0.335 0.395 *
Cropping - Pastures 40 0.277 0.251 0.303 *
Cropping - Pastures 40 0.166 0.158 0.174 *
Cropping - Pastures 40 0.433 0.416 0.450 *
Cropping - Pastures 60 0.309 0.293 0.324 *
Cropping - Pastures 60 0.301 0.284 0.320 *
Cropping - Pastures 60 0.275 0.257 0.294 *
Cropping - Pastures 60 0.213 0.204 0.222 *
Cropping - Pastures 60 0.377 0.366 0.387 *
Cropping - Pastures 80 0.282 0.263 0.302 *
Cropping - Pastures 80 0.237 0.217 0.258 *
Cropping - Pastures 80 0.273 0.250 0.296 *
Cropping - Pastures 80 0.260 0.246 0.273 *
Cropping - Pastures 80 0.320 0.308 0.333 *
Cropping - Pastures 100 0.232 0.219 0.245 *
Cropping - Pastures 100 0.197 0.184 0.211 *
Cropping - Pastures 100 0.218 0.203 0.233 *
Cropping - Pastures 100 0.228 0.218 0.237 *
Cropping - Pastures 100 0.273 0.265 0.282 *
Cropping - Pastures 120 0.182 0.173 0.191 *
Cropping - Pastures 120 0.157 0.149 0.166 *
Cropping - Pastures 120 0.163 0.154 0.173 *
Cropping - Pastures 120 0.196 0.188 0.204 *
Cropping - Pastures 120 0.227 0.220 0.234 *
Cropping - Pastures 140 0.133 0.121 0.144 *
Cropping - Pastures 140 0.117 0.107 0.128 *
Cropping - Pastures 140 0.108 0.097 0.119 *
Cropping - Pastures 140 0.164 0.153 0.175 *
Cropping - Pastures 140 0.180 0.170 0.190 *
Table B.24 Contrasts (4)
Contrast Depth Est q025 q975 Sig
Cropping - Pastures 160 0.113 0.105 0.120 *
Cropping - Pastures 160 0.104 0.098 0.111 *
Cropping - Pastures 160 0.098 0.091 0.105 *
Cropping - Pastures 160 0.138 0.131 0.145 *
Cropping - Pastures 160 0.153 0.147 0.160 *
Cropping - Pastures 180 0.093 0.087 0.100 *
Cropping - Pastures 180 0.091 0.085 0.097 *
Cropping - Pastures 180 0.088 0.081 0.094 *
Cropping - Pastures 180 0.113 0.107 0.119 *
Cropping - Pastures 180 0.127 0.121 0.133 *
Cropping - Pastures 200 0.074 0.064 0.083 *
Cropping - Pastures 200 0.078 0.069 0.087 *
Cropping - Pastures 200 0.077 0.068 0.086 *
Cropping - Pastures 200 0.087 0.078 0.096 *
Cropping - Pastures 200 0.100 0.091 0.109 *
Cropping - Pastures 220 0.074 0.067 0.081 *
Cropping - Pastures 220 0.078 0.071 0.086 *
Cropping - Pastures 220 0.079 0.072 0.086 *
Cropping - Pastures 220 0.088 0.081 0.096 *
Cropping - Pastures 220 0.099 0.092 0.106 *
Cropping - Pastures 240 0.075 0.067 0.082 *
Cropping - Pastures 240 0.079 0.071 0.086 *
Cropping - Pastures 240 0.081 0.074 0.089 *
Cropping - Pastures 240 0.090 0.082 0.097 *
Cropping - Pastures 240 0.097 0.090 0.105 *
Cropping - Pastures 260 0.075 0.064 0.086 *
Cropping - Pastures 260 0.080 0.069 0.090 *
Cropping - Pastures 260 0.084 0.073 0.094 *
Cropping - Pastures 260 0.091 0.080 0.101 *
Cropping - Pastures 260 0.096 0.086 0.106 *
Cropping - Pastures 280 0.076 0.060 0.090 *
Cropping - Pastures 280 0.080 0.066 0.094 *
Cropping - Pastures 280 0.086 0.072 0.099 *
Cropping - Pastures 280 0.092 0.078 0.106 *
Cropping - Pastures 280 0.095 0.081 0.108 *
Cropping - Pastures 300 0.076 0.056 0.095 *
Cropping - Pastures 300 0.081 0.063 0.098 *
Cropping - Pastures 300 0.088 0.070 0.106 *
Cropping - Pastures 300 0.093 0.075 0.111 *
Cropping - Pastures 300 0.093 0.076 0.111 *
Table B.25 Contrasts (5)
Contrast Depth Est q025 q975 Sig
Lucerne mixtures - Native 20 . . .
Lucerne mixtures - Native 20 . . .
Lucerne mixtures - Native 20 . . .
Lucerne mixtures - Native 20 -0.175 -0.197 -0.154 *
Lucerne mixtures - Native 20 -0.230 -0.277 -0.182 *
Lucerne mixtures - Native 40 -0.102 -0.144 -0.061 *
Lucerne mixtures - Native 40 . . .
Lucerne mixtures - Native 40 . . .
Lucerne mixtures - Native 40 -0.235 -0.248 -0.220 *
Lucerne mixtures - Native 40 -0.236 -0.266 -0.205 *
Lucerne mixtures - Native 60 -0.155 -0.182 -0.126 *
Lucerne mixtures - Native 60 -0.092 -0.124 -0.059 *
Lucerne mixtures - Native 60 . . .
Lucerne mixtures - Native 60 -0.294 -0.309 -0.279 *
Lucerne mixtures - Native 60 -0.243 -0.261 -0.223 *
Lucerne mixtures - Native 80 -0.207 -0.241 -0.173 *
Lucerne mixtures - Native 80 -0.161 -0.197 -0.125 *
Lucerne mixtures - Native 80 . . .
Lucerne mixtures - Native 80 -0.354 -0.378 -0.330 *
Lucerne mixtures - Native 80 -0.249 -0.271 -0.227 *
Lucerne mixtures - Native 100 -0.170 -0.193 -0.147 *
Lucerne mixtures - Native 100 -0.135 -0.158 -0.111 *
Lucerne mixtures - Native 100 -0.036 -0.062 -0.010 *
Lucerne mixtures - Native 100 -0.274 -0.289 -0.258 *
Lucerne mixtures - Native 100 -0.199 -0.213 -0.184 *
Lucerne mixtures - Native 120 -0.133 -0.148 -0.118 *
Lucerne mixtures - Native 120 -0.108 -0.123 -0.094 *
Lucerne mixtures - Native 120 -0.051 -0.067 -0.035 *
Lucerne mixtures - Native 120 -0.194 -0.208 -0.181 *
Lucerne mixtures - Native 120 -0.149 -0.160 -0.137 *
Lucerne mixtures - Native 140 -0.096 -0.115 -0.076 *
Lucerne mixtures - Native 140 -0.082 -0.100 -0.064 *
Lucerne mixtures - Native 140 -0.066 -0.085 -0.047 *
Lucerne mixtures - Native 140 -0.114 -0.133 -0.096 *
Lucerne mixtures - Native 140 -0.098 -0.115 -0.082 *
Table B.26 Contrasts (6)
Contrast                    Depth  Est     q025    q975    Sig
Lucerne mixtures - Native   160    -0.082  -0.095  -0.070  *
Lucerne mixtures - Native   160    -0.071  -0.083  -0.059  *
Lucerne mixtures - Native   160    -0.056  -0.068  -0.044  *
Lucerne mixtures - Native   160    -0.086  -0.099  -0.074  *
Lucerne mixtures - Native   160    -0.076  -0.087  -0.065  *
Lucerne mixtures - Native   180    -0.069  -0.080  -0.057  *
Lucerne mixtures - Native   180    -0.060  -0.071  -0.049  *
Lucerne mixtures - Native   180    -0.046  -0.057  -0.035  *
Lucerne mixtures - Native   180    -0.059  -0.070  -0.048  *
Lucerne mixtures - Native   180    -0.053  -0.064  -0.043  *
Lucerne mixtures - Native   200    -0.055  -0.071  -0.038  *
Lucerne mixtures - Native   200    -0.049  -0.066  -0.033  *
Lucerne mixtures - Native   200    -0.036  -0.052  -0.019  *
Lucerne mixtures - Native   200    -0.031  -0.047  -0.015  *
Lucerne mixtures - Native   200    -0.031  -0.046  -0.015  *
Lucerne mixtures - Native   220    -0.062  -0.074  -0.049  *
Lucerne mixtures - Native   220    -0.055  -0.068  -0.043  *
Lucerne mixtures - Native   220    -0.044  -0.056  -0.031  *
Lucerne mixtures - Native   220    -0.039  -0.051  -0.027  *
Lucerne mixtures - Native   220    -0.040  -0.052  -0.028  *
Lucerne mixtures - Native   240    -0.068  -0.082  -0.054  *
Lucerne mixtures - Native   240    -0.061  -0.074  -0.048  *
Lucerne mixtures - Native   240    -0.052  -0.065  -0.038  *
Lucerne mixtures - Native   240    -0.048  -0.060  -0.034  *
Lucerne mixtures - Native   240    -0.050  -0.062  -0.037  *
Lucerne mixtures - Native   260    -0.075  -0.095  -0.055  *
Lucerne mixtures - Native   260    -0.067  -0.085  -0.049  *
Lucerne mixtures - Native   260    -0.060  -0.077  -0.042  *
Lucerne mixtures - Native   260    -0.056  -0.074  -0.038  *
Lucerne mixtures - Native   260    -0.060  -0.077  -0.042  *
Lucerne mixtures - Native   280    -0.082  -0.109  -0.054  *
Lucerne mixtures - Native   280    -0.073  -0.098  -0.048  *
Lucerne mixtures - Native   280    -0.068  -0.092  -0.044  *
Lucerne mixtures - Native   280    -0.064  -0.089  -0.041  *
Lucerne mixtures - Native   280    -0.069  -0.093  -0.045  *
Lucerne mixtures - Native   300    -0.089  -0.123  -0.053  *
Lucerne mixtures - Native   300    -0.079  -0.111  -0.047  *
Lucerne mixtures - Native   300    -0.076  -0.107  -0.045  *
Lucerne mixtures - Native   300    -0.073  -0.104  -0.042  *
Lucerne mixtures - Native   300    -0.079  -0.109  -0.047  *
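The Est, q025, q975 and Sig columns of the table above can be read as summaries of posterior draws for each contrast. The following is a minimal sketch of how such a row might be computed, not the code used in the thesis. It assumes that Est is the posterior mean, that q025 and q975 are the 2.5% and 97.5% posterior quantiles, and that the asterisk marks a 95% credible interval excluding zero. The function names and the simulated draws are hypothetical.

```python
import random
import statistics


def quantile(sorted_vals, p):
    """Linearly interpolated quantile of an already-sorted list."""
    pos = p * (len(sorted_vals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    frac = pos - lo
    return sorted_vals[lo] * (1 - frac) + sorted_vals[hi] * frac


def summarise_contrast(draws):
    """Summarise posterior draws of one contrast (hypothetical helper).

    Assumption: Est is the posterior mean; the thesis may use another
    point summary such as the median.
    """
    vals = sorted(draws)
    est = statistics.fmean(vals)
    q025 = quantile(vals, 0.025)
    q975 = quantile(vals, 0.975)
    # Assumed meaning of "Sig": the 95% interval does not contain zero.
    sig = "*" if (q025 > 0 or q975 < 0) else ""
    return {"Est": est, "q025": q025, "q975": q975, "Sig": sig}


# Simulated draws standing in for MCMC iterates (not thesis data).
random.seed(1)
fake_draws = [random.gauss(-0.08, 0.006) for _ in range(4000)]
summary = summarise_contrast(fake_draws)
print(summary)
```

Under these assumptions, a contrast whose draws lie well below zero is flagged with an asterisk, as in every row of the table above.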
334 APPENDIX B. SUPPLEMENTARY MATERIALS FOR CHAPTER SIX
Figure B.1 Spatial random components: Day 1, Depth 20 cm.
B.2 Supplementary Graphs: Contour Graphs for the spatial residuals
Figure B.2 Spatial random components: Day 1, Depth 40 cm.
Figure B.3 Spatial random components: Day 1, Depth 60 cm.
Figure B.4 Spatial random components: Day 1, Depth 80 cm.
Figure B.5 Spatial random components: Day 1, Depth 100 cm.
Figure B.6 Spatial random components: Day 1, Depth 120 cm.
Figure B.7 Spatial random components: Day 1, Depth 140 cm.
Figure B.8 Spatial random components: Day 1, Depth 160 cm.
Figure B.9 Spatial random components: Day 1, Depth 180 cm.
Figure B.10 Spatial random components: Day 1, Depth 200 cm.
Figure B.11 Spatial random components: Day 1, Depth 220 cm.
Figure B.12 Spatial random components: Day 1, Depth 240 cm.
Figure B.13 Spatial random components: Day 1, Depth 260 cm.
Figure B.14 Spatial random components: Day 1, Depth 280 cm.
Figure B.15 Spatial random components: Day 1, Depth 300 cm.
Figure B.16 Spatial random components: Day 2, Depth 20 cm.
Figure B.17 Spatial random components: Day 2, Depth 40 cm.
Figure B.18 Spatial random components: Day 2, Depth 60 cm.
Figure B.19 Spatial random components: Day 2, Depth 80 cm.
Figure B.20 Spatial random components: Day 2, Depth 100 cm.
Figure B.21 Spatial random components: Day 2, Depth 120 cm.
Figure B.22 Spatial random components: Day 2, Depth 140 cm.
Figure B.23 Spatial random components: Day 2, Depth 160 cm.
Figure B.24 Spatial random components: Day 2, Depth 180 cm.
Figure B.25 Spatial random components: Day 2, Depth 200 cm.
Figure B.26 Spatial random components: Day 2, Depth 220 cm.
Figure B.27 Spatial random components: Day 2, Depth 240 cm.
Figure B.28 Spatial random components: Day 2, Depth 260 cm.
Figure B.29 Spatial random components: Day 2, Depth 280 cm.
Figure B.30 Spatial random components: Day 2, Depth 300 cm.
Figure B.31 Spatial random components: Day 3, Depth 20 cm.
Figure B.32 Spatial random components: Day 3, Depth 40 cm.
Figure B.33 Spatial random components: Day 3, Depth 60 cm.
Figure B.34 Spatial random components: Day 3, Depth 80 cm.
Figure B.35 Spatial random components: Day 3, Depth 100 cm.
Figure B.36 Spatial random components: Day 3, Depth 120 cm.
Figure B.37 Spatial random components: Day 3, Depth 140 cm.
Figure B.38 Spatial random components: Day 3, Depth 160 cm.
Figure B.39 Spatial random components: Day 3, Depth 180 cm.
Figure B.40 Spatial random components: Day 3, Depth 200 cm.
Figure B.41 Spatial random components: Day 3, Depth 220 cm.
Figure B.42 Spatial random components: Day 3, Depth 240 cm.
Figure B.43 Spatial random components: Day 3, Depth 260 cm.
Figure B.44 Spatial random components: Day 3, Depth 280 cm.
Figure B.45 Spatial random components: Day 3, Depth 300 cm.
Figure B.46 Spatial random components: Day 4, Depth 20 cm.
Figure B.47 Spatial random components: Day 4, Depth 40 cm.
Figure B.48 Spatial random components: Day 4, Depth 60 cm.
Figure B.49 Spatial random components: Day 4, Depth 80 cm.
Figure B.50 Spatial random components: Day 4, Depth 100 cm.
Figure B.51 Spatial random components: Day 4, Depth 120 cm.
Figure B.52 Spatial random components: Day 4, Depth 140 cm.
Figure B.53 Spatial random components: Day 4, Depth 160 cm.
Figure B.54 Spatial random components: Day 4, Depth 180 cm.
Figure B.55 Spatial random components: Day 4, Depth 200 cm.
Figure B.56 Spatial random components: Day 4, Depth 220 cm.
Figure B.57 Spatial random components: Day 4, Depth 240 cm.
Figure B.58 Spatial random components: Day 4, Depth 260 cm.
Figure B.59 Spatial random components: Day 4, Depth 280 cm.
Figure B.60 Spatial random components: Day 4, Depth 300 cm.
Figure B.61 Spatial random components: Day 5, Depth 20 cm.
Figure B.62 Spatial random components: Day 5, Depth 40 cm.
Figure B.63 Spatial random components: Day 5, Depth 60 cm.
Figure B.64 Spatial random components: Day 5, Depth 80 cm.
Figure B.65 Spatial random components: Day 5, Depth 100 cm.
Figure B.66 Spatial random components: Day 5, Depth 120 cm.
Figure B.67 Spatial random components: Day 5, Depth 140 cm.
Figure B.68 Spatial random components: Day 5, Depth 160 cm.
Figure B.69 Spatial random components: Day 5, Depth 180 cm.
Figure B.70 Spatial random components: Day 5, Depth 200 cm.
Figure B.71 Spatial random components: Day 5, Depth 220 cm.
Figure B.72 Spatial random components: Day 5, Depth 240 cm.
Figure B.73 Spatial random components: Day 5, Depth 260 cm.
Figure B.74 Spatial random components: Day 5, Depth 280 cm.
Figure B.75 Spatial random components: Day 5, Depth 300 cm.
Appendix C
Supplementary graphs and tables for
Chapter 7
C.1 Graphs: Method 1, Method 2 random walk and penalised spline
smoothed models
The fits from Method 1 are those with the more realistic 95% credible intervals. The penalised spline
(over time) models are included to show the seasonality at the shallower depths dampening until there
is virtually no seasonality at the greater depths. The random walk fits are those from the chosen
time-series model.
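For reference, the random walks of order one and order two named in the figure captions of this appendix are usually defined through their increments. The block below gives the standard textbook forms for equally spaced time points. The symbols are assumptions made for this note (β_t for the contrast at trial date t, σ² for the innovation variance) and need not match the notation or the exact specification used in Chapter 7.

```latex
% Assumed notation: \beta_t is the contrast at trial date t,
% \sigma^2 is the innovation variance; time points equally spaced.
\begin{align*}
\text{RW1:}\quad & \beta_t - \beta_{t-1} \sim \mathrm{N}(0, \sigma^2), \\
\text{RW2:}\quad & \beta_t - 2\beta_{t-1} + \beta_{t-2} \sim \mathrm{N}(0, \sigma^2).
\end{align*}
```

The order-one form penalises departures from a constant level and the order-two form penalises departures from a linear trend, so order-two fits generally appear smoother.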
Figure C.1 Long fallowing vs Response cropping at depth 100 cm for all trial days. Saturated model. Summary of MCMC iterates from the full model for the contrast. Estimates & 95% CIs.
Figure C.2 Long fallowing vs Response cropping at depth 100 cm for all trial days. Non-parametric penalised spline smooth across dates. Estimates & 95% CIs.
Figure C.3 Long fallowing vs Response cropping at depth 100 cm for all trial days. Random Walk of order one. Estimates & 95% CIs.
Figure C.4 Long fallowing vs Response cropping at depth 120 cm for all trial days. Saturated model. Summary of MCMC iterates from the full model for the contrast. Estimates & 95% CIs.
Figure C.5 Long fallowing vs Response cropping at depth 120 cm for all trial days. Non-parametric penalised spline smooth across dates. Estimates & 95% CIs.
Figure C.6 Long fallowing vs Response cropping at depth 120 cm for all trial days. Random Walk of order one. Estimates & 95% CIs.
Figure C.7 Long fallowing vs Response cropping at depth 140 cm for all trial days. Saturated model. Summary of MCMC iterates from the full model for the contrast. Estimates & 95% CIs.
Figure C.8 Long fallowing vs Response cropping at depth 140 cm for all trial days. Non-parametric penalised spline smooth across dates. Estimates & 95% CIs.
Figure C.9 Long fallowing vs Response cropping at depth 220 cm for all trial days. Random Walk of order one. Estimates & 95% CIs.
Figure C.10 Long fallowing vs Response cropping at depth 160 cm for all trial days. Saturated model. Summary of MCMC iterates from the full model for the contrast. Estimates & 95% CIs.
Figure C.11 Long fallowing vs Response cropping at depth 160 cm for all trial days. Non-parametric penalised spline smooth across dates. Estimates & 95% CIs.
Figure C.12 Long fallowing vs Response cropping at depth 160 cm for all trial days. Random Walk of order one. Estimates & 95% CIs.
Figure C.13 Long fallowing vs Response cropping at depth 180 cm for all trial days. Saturated model. Summary of MCMC iterates from the full model. Estimates & 95% CIs.
Figure C.14 Long fallowing vs Response cropping at depth 160 cm for all trial days. Non-parametric penalised spline smooth across dates. Estimates & 95% CIs.
Figure C.15 Long fallowing vs Response cropping at depth 180 cm for all trial days. Random Walk of order two. Estimates & 95% CIs.
Figure C.16 Long fallowing vs Response cropping at depth 200 cm for all trial days. Saturated model. Summary of MCMC iterates from the full model. Estimates & 95% CIs.
Figure C.17 Long fallowing vs Response cropping at depth 160 cm for all trial days. Non-parametric penalised spline smooth across dates. Estimates & 95% CIs.
Figure C.18 Long fallowing vs Response cropping at depth 200 cm for all trial days. Random Walk of order two. Estimates & 95% CIs.
Figure C.19 Long fallowing vs Response cropping at depth 220 cm for all trial days. Saturated model. Summary of MCMC iterates from the full model. Estimates & 95% CIs.
Figure C.20 Long fallowing vs Response cropping at depth 220 cm for all trial days. Non-parametric penalised spline smooth across dates. Estimates & 95% CIs.
Figure C.21 Long fallowing vs Response cropping at depth 220 cm for all trial days. Random Walk of order two. Estimates & 95% CIs.
Figure C.22 100 cm: Long fallowing vs Response cropping with Prior 5 precisions applied to the random walk model. Estimates & 95% CIs.
Random walk models with Prior 5 precisions
Figure C.23 120 cm: Long fallowing vs Response cropping with Prior 5 precisions applied to the random walk model. Estimates & 95% CIs.
Figure C.24 140 cm: Long fallowing vs Response cropping with Prior 5 precisions applied to the random walk model. Estimates & 95% CIs.
Figure C.25 160 cm: Long fallowing vs Response cropping with Prior 5 precisions applied to the random walk model. Estimates & 95% CIs.
Figure C.26 180 cm: Long fallowing vs Response cropping with Prior 5 precisions applied to the random walk model. Estimates & 95% CIs.
Figure C.27 200 cm: Long fallowing vs Response cropping with Prior 5 precisions applied to the random walk model. Estimates & 95% CIs.
Figure C.28 220 cm: Long fallowing vs Response cropping with Prior 5 precisions applied to the random walk model. Estimates & 95% CIs.
Figure C.29 Square root of variances & 95% credible intervals at depth 100 cm. Unstructured: green with broader bars, spatial: blue with narrower bars.
Square root of the spatially structured and the unstructured variances
Figure C.30 Square root of variances & 95% credible intervals at depth 120 cm. Unstructured: green with broader bars, spatial: blue with narrower bars.
Figure C.31 Square root of variances & 95% credible intervals at depth 140 cm. Unstructured: green with broader bars, spatial: blue with narrower bars.
Figure C.32 Square root of variances & 95% credible intervals at depth 160 cm. Unstructured: green with broader bars, spatial: blue with narrower bars.
Figure C.33 Square root of variances & 95% credible intervals at depth 180 cm. Unstructured: green with broader bars, spatial: blue with narrower bars.
Figure C.34 Square root of variances & 95% credible intervals at depth 200 cm. Unstructured: green with broader bars, spatial: blue with narrower bars.
Figure C.35 Square root of variances & 95% credible intervals at depth 220 cm. Unstructured: green with broader bars, spatial: blue with narrower bars.
Figure C.36 Square root of unstructured variance: Days by Depth. Contour graph smooth.
Figure C.37 Square root of spatially structured variances: Days by Depth. Contour graph smooth.
Figure C.38 ρ & 95% credible intervals.
C.2 Final estimates and credible intervals for the contrast of long
fallow cropping versus response cropping
Table C.1 Contrast estimates for depth 100 cm: Long fallow cropping vs Response cropping
Depth Year  Estimate     q025     q975
100   1995     0.053    0.033    0.073
               0.049    0.029    0.070
               0.038    0.018    0.058
               0.027    0.008    0.047
              -0.023   -0.051    0.006
               0.017   -0.016    0.050
100   1996     0.021   -0.001    0.044
               0.006   -0.012    0.025
               0.002   -0.021    0.025
               0.038    0.008    0.070
               0.099    0.052    0.147
               0.096    0.043    0.148
               0.099    0.044    0.153
               0.100    0.047    0.153
               0.069    0.016    0.122
               0.069    0.034    0.104
               0.046    0.008    0.084
100   1997     0.046    0.009    0.083
               0.023   -0.014    0.061
               0.024   -0.006    0.054
               0.027    0.000    0.054
               0.075    0.046    0.105
               0.081    0.048    0.115
               0.085    0.051    0.120
               0.087    0.053    0.121
               0.085    0.051    0.119
               0.084    0.049    0.118
               0.057    0.022    0.091
               0.029   -0.003    0.061
               0.032    0.000    0.062
100   1998     0.029    0.000    0.059
               0.017   -0.012    0.046
               0.025   -0.006    0.057
               0.025   -0.011    0.060
               0.071    0.034    0.106
               0.047    0.007    0.086
               0.022   -0.005    0.048
               0.015   -0.012    0.041
               0.014   -0.009    0.039
               0.005   -0.017    0.028
               0.006   -0.016    0.029
100   1999     0.017   -0.004    0.037
               0.016   -0.005    0.037
               0.026    0.006    0.048
               0.037    0.016    0.059
               0.040    0.019    0.061
               0.041    0.019    0.062
               0.040    0.018    0.062
               0.024    0.002    0.046
               0.018   -0.005    0.042
               0.016   -0.008    0.041
               0.015   -0.008    0.039
100   2000     0.015   -0.005    0.036
               0.054    0.033    0.075
               0.072    0.051    0.094
               0.086    0.066    0.107
Table C.2 Contrast estimates for depth 120 cm: Long fallow cropping vs Response cropping
Depth Year  Estimate     q025     q975
120   1995     0.027    0.010    0.044
               0.027    0.010    0.044
               0.020    0.003    0.037
               0.016   -0.001    0.033
              -0.026   -0.047   -0.005
              -0.005   -0.027    0.016
120   1996     0.028    0.007    0.050
               0.013   -0.005    0.032
               0.011   -0.010    0.032
               0.013   -0.015    0.041
               0.071    0.035    0.109
               0.065    0.022    0.107
               0.071    0.025    0.117
               0.079    0.036    0.122
               0.060    0.012    0.109
               0.061    0.024    0.098
               0.044    0.006    0.080
120   1997     0.058    0.020    0.095
               0.038    0.000    0.077
               0.033    0.001    0.065
               0.029    0.002    0.055
               0.046    0.020    0.074
               0.060    0.033    0.089
               0.060    0.032    0.087
               0.057    0.027    0.086
               0.056    0.024    0.087
               0.058    0.028    0.087
               0.042    0.013    0.071
               0.024   -0.001    0.048
               0.027    0.002    0.052
120   1998     0.028    0.004    0.051
              -0.001   -0.025    0.022
              -0.003   -0.028    0.022
               0.005   -0.023    0.033
               0.043    0.013    0.072
               0.029   -0.003    0.060
               0.037    0.011    0.063
               0.029    0.004    0.053
               0.029    0.006    0.053
               0.018   -0.004    0.040
               0.012   -0.009    0.033
120   1999     0.010   -0.010    0.031
               0.020   -0.001    0.040
               0.016   -0.005    0.036
               0.033    0.014    0.052
               0.033    0.013    0.052
               0.037    0.017    0.056
               0.032    0.013    0.052
               0.029    0.009    0.049
               0.021    0.002    0.041
               0.020    0.001    0.040
               0.024    0.005    0.044
120   2000     0.016   -0.004    0.037
               0.048    0.028    0.069
               0.065    0.045    0.086
               0.067    0.047    0.086
Table C.3 Contrast estimates for depth 140 cm: Long fallow cropping vs Response cropping
Depth Year  Estimate     q025     q975
140   1995     0.010   -0.007    0.027
               0.012   -0.006    0.029
               0.005   -0.013    0.022
               0.010   -0.007    0.027
              -0.015   -0.033    0.004
              -0.005   -0.024    0.013
140   1996     0.020   -0.002    0.041
               0.016   -0.005    0.036
               0.008   -0.012    0.027
               0.003   -0.020    0.027
               0.030   -0.001    0.061
               0.028   -0.006    0.062
               0.033   -0.002    0.068
               0.046    0.013    0.080
               0.038    0.000    0.076
               0.041    0.009    0.072
               0.031   -0.001    0.065
140   1997     0.050    0.021    0.079
               0.038    0.010    0.066
               0.029    0.004    0.053
               0.024    0.003    0.045
               0.026    0.006    0.047
               0.040    0.019    0.061
               0.036    0.015    0.057
               0.044    0.022    0.067
               0.043    0.017    0.071
               0.036    0.014    0.059
               0.027    0.003    0.050
               0.013   -0.006    0.033
               0.018   -0.002    0.039
140   1998     0.019   -0.002    0.040
              -0.004   -0.022    0.014
              -0.009   -0.028    0.011
              -0.011   -0.032    0.010
               0.014   -0.007    0.035
               0.014   -0.008    0.036
               0.043    0.020    0.067
               0.042    0.021    0.064
               0.044    0.023    0.066
               0.021    0.001    0.042
               0.020   -0.000    0.039
140   1999     0.013   -0.006    0.032
               0.020    0.002    0.038
               0.017   -0.001    0.035
               0.022    0.004    0.040
               0.023    0.005    0.041
               0.022    0.003    0.041
               0.021    0.003    0.039
               0.019    0.001    0.037
               0.018    0.000    0.035
               0.020    0.001    0.038
               0.015   -0.002    0.033
140   2000     0.014   -0.004    0.033
               0.025    0.007    0.043
               0.047    0.028    0.067
               0.046    0.025    0.066
Table C.4 Contrast estimates for depth 160 cm: Long fallow cropping vs Response cropping
Depth Year  Estimate     q025     q975
160   1995     0.002   -0.014    0.019
               0.001   -0.015    0.018
               0.003   -0.013    0.020
               0.004   -0.012    0.021
              -0.010   -0.026    0.008
              -0.003   -0.019    0.014
160   1996     0.011   -0.007    0.031
               0.011   -0.006    0.029
               0.008   -0.009    0.026
               0.000   -0.019    0.020
               0.005   -0.015    0.027
               0.012   -0.009    0.032
               0.011   -0.010    0.033
               0.023    0.003    0.043
               0.024    0.003    0.045
               0.022    0.001    0.043
               0.022    0.000    0.043
160   1997     0.027    0.006    0.048
               0.026    0.006    0.046
               0.027    0.008    0.046
               0.025    0.007    0.043
               0.018   -0.001    0.036
               0.029    0.010    0.048
               0.021    0.003    0.039
               0.026    0.006    0.045
               0.026    0.005    0.046
               0.026    0.008    0.045
               0.021    0.001    0.040
               0.011   -0.007    0.029
               0.012   -0.006    0.030
160   1998     0.013   -0.005    0.031
               0.007   -0.010    0.024
              -0.009   -0.025    0.007
              -0.009   -0.026    0.009
               0.002   -0.015    0.020
               0.003   -0.014    0.021
               0.036    0.018    0.054
               0.038    0.020    0.056
               0.041    0.022    0.058
               0.026    0.009    0.044
               0.022    0.005    0.040
160   1999     0.020    0.003    0.037
               0.016   -0.001    0.033
               0.018    0.002    0.035
               0.019    0.000    0.038
               0.016   -0.001    0.032
               0.019    0.002    0.036
               0.016   -0.002    0.032
               0.022    0.005    0.039
               0.019    0.002    0.034
               0.014   -0.002    0.031
               0.021    0.005    0.037
160   2000     0.014   -0.002    0.030
               0.016    0.000    0.032
               0.029    0.013    0.045
               0.030    0.011    0.048
Table C.5 Contrast estimates for depth 180 cm: Long fallow cropping vs Response cropping
Depth Year  Estimate     q025     q975
180   1995    -0.000   -0.016    0.016
               0.003   -0.014    0.019
               0.002   -0.015    0.019
               0.006   -0.011    0.023
               0.001   -0.016    0.018
              -0.002   -0.018    0.015
180   1996     0.007   -0.011    0.024
               0.009   -0.008    0.026
               0.011   -0.006    0.028
               0.008   -0.009    0.025
               0.003   -0.016    0.021
               0.009   -0.010    0.028
               0.009   -0.010    0.027
               0.015   -0.004    0.034
               0.017   -0.003    0.036
               0.021    0.002    0.040
               0.018   -0.001    0.037
180   1997     0.022    0.004    0.041
               0.021    0.003    0.039
               0.024    0.006    0.042
               0.028    0.011    0.045
               0.021    0.003    0.038
               0.022    0.003    0.041
               0.022    0.004    0.040
               0.025    0.007    0.044
               0.011   -0.013    0.036
               0.022    0.004    0.040
               0.020    0.001    0.040
               0.015   -0.003    0.032
               0.014   -0.004    0.032
180   1998     0.017   -0.000    0.035
               0.014   -0.004    0.031
              -0.002   -0.019    0.016
              -0.004   -0.021    0.012
              -0.000   -0.018    0.017
               0.007   -0.010    0.024
               0.026    0.009    0.044
               0.025    0.008    0.042
               0.035    0.018    0.052
               0.027    0.010    0.044
               0.034    0.017    0.051
180   1999     0.021    0.004    0.038
               0.024    0.008    0.040
               0.020    0.004    0.037
               0.020    0.004    0.036
               0.014   -0.002    0.031
               0.017    0.001    0.034
               0.019    0.003    0.036
               0.018    0.002    0.035
               0.019    0.003    0.035
               0.017    0.001    0.032
               0.012   -0.004    0.028
180   2000     0.011   -0.004    0.027
               0.015   -0.001    0.031
               0.022    0.006    0.038
               0.024    0.008    0.040
Table C.6 Contrast estimates for depth 200 cm: Long fallow cropping vs Response cropping
Depth Year  Estimate     q025     q975
200   1995     0.001   -0.016    0.018
               0.003   -0.014    0.019
               0.006   -0.012    0.023
               0.004   -0.013    0.021
               0.003   -0.015    0.020
               0.005   -0.013    0.023
200   1996     0.009   -0.009    0.027
               0.007   -0.011    0.024
               0.006   -0.012    0.023
               0.011   -0.007    0.028
               0.003   -0.014    0.021
               0.010   -0.007    0.029
               0.011   -0.006    0.029
               0.014   -0.003    0.032
               0.014   -0.003    0.032
               0.013   -0.004    0.031
               0.016   -0.001    0.034
200   1997     0.013   -0.004    0.030
               0.021    0.003    0.040
               0.016   -0.001    0.034
               0.021    0.003    0.038
               0.022    0.004    0.040
               0.022    0.004    0.039
               0.017   -0.001    0.034
               0.023    0.004    0.041
               0.026    0.008    0.043
               0.022    0.005    0.040
               0.019    0.000    0.038
               0.014   -0.004    0.032
               0.016   -0.001    0.032
200   1998     0.015   -0.003    0.032
               0.017   -0.001    0.035
               0.012   -0.006    0.031
               0.009   -0.009    0.027
               0.007   -0.011    0.025
               0.011   -0.006    0.029
               0.019    0.003    0.038
               0.014   -0.002    0.032
               0.025    0.007    0.043
               0.026    0.009    0.044
               0.028    0.012    0.045
200   1999     0.017    0.001    0.034
               0.027    0.010    0.044
               0.023    0.005    0.041
               0.015   -0.001    0.032
               0.017    0.001    0.034
               0.020    0.003    0.036
               0.019    0.003    0.035
               0.018    0.002    0.035
               0.019    0.003    0.035
               0.012   -0.004    0.029
               0.016   -0.000    0.033
200   2000     0.017   -0.001    0.034
               0.017    0.000    0.034
               0.017    0.001    0.034
               0.016    0.000    0.033
Table C.7 Contrast estimates for depth 220 cm: Long fallow cropping vs Response cropping
Depth Year  Estimate     q025     q975
220   1995     0.014   -0.006    0.033
               0.014   -0.005    0.033
               0.016   -0.004    0.036
               0.013   -0.006    0.032
               0.013   -0.006    0.033
               0.016   -0.004    0.035
220   1996     0.015   -0.006    0.036
               0.023    0.004    0.043
               0.025    0.005    0.043
               0.024    0.004    0.043
               0.021    0.002    0.040
               0.023    0.004    0.043
               0.025    0.005    0.045
               0.021    0.002    0.040
               0.022    0.003    0.042
               0.024    0.005    0.044
               0.025    0.006    0.045
220   1997     0.032    0.012    0.051
               0.032    0.013    0.053
               0.026    0.007    0.045
               0.027    0.008    0.046
               0.021    0.001    0.041
               0.030    0.011    0.050
               0.025    0.005    0.045
               0.030    0.010    0.050
               0.023    0.003    0.042
               0.028    0.008    0.048
               0.026    0.005    0.047
               0.026    0.003    0.049
               0.028    0.006    0.050
220   1998     0.027    0.004    0.050
               0.024    0.001    0.048
               0.024    0.002    0.047
               0.023   -0.000    0.047
               0.015   -0.008    0.038
               0.019   -0.005    0.042
               0.031    0.008    0.054
               0.027    0.005    0.049
               0.031    0.008    0.055
               0.033    0.010    0.055
               0.035    0.012    0.058
220   1999     0.034    0.009    0.059
               0.032    0.008    0.055
               0.029    0.006    0.052
               0.030    0.007    0.053
               0.031    0.007    0.054
               0.026    0.003    0.049
               0.033    0.010    0.056
               0.030    0.007    0.053
               0.023    0.001    0.045
               0.024    0.002    0.045
               0.027    0.005    0.048
220   2000     0.022   -0.000    0.044
               0.021   -0.002    0.043
               0.023    0.000    0.045
               0.024    0.002    0.046
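The q025 and q975 columns in Tables C.1 to C.7 bound a 95% credible interval for each contrast. As a minimal sketch, assuming these bounds are the 2.5% and 97.5% quantiles of posterior draws and that the point estimate is a posterior mean (the tables do not state which summary was used), a row of this kind could be computed as follows. The function names and the simulated draws are hypothetical illustrations and are not taken from the thesis.

```python
import random
import statistics


def quantile(sorted_draws, p):
    """Linear-interpolation quantile of an already sorted list (0 <= p <= 1)."""
    position = p * (len(sorted_draws) - 1)
    lower = int(position)
    upper = min(lower + 1, len(sorted_draws) - 1)
    weight = position - lower
    return sorted_draws[lower] * (1 - weight) + sorted_draws[upper] * weight


def summarise_contrast(draws):
    """Return (estimate, q025, q975) for posterior draws of one contrast.

    Assumption: the point estimate is the posterior mean; the tables do not
    say whether a mean or a median was reported.
    """
    ordered = sorted(draws)
    return (
        statistics.fmean(ordered),
        quantile(ordered, 0.025),
        quantile(ordered, 0.975),
    )


def excludes_zero(q025, q975):
    """True when the equal-tailed 95% interval lies wholly on one side of zero."""
    return q025 > 0 or q975 < 0


# Hypothetical draws standing in for MCMC output of a single contrast.
rng = random.Random(1)
fake_draws = [rng.gauss(0.05, 0.01) for _ in range(4000)]
estimate, q025, q975 = summarise_contrast(fake_draws)
print(round(estimate, 3), round(q025, 3), round(q975, 3), excludes_zero(q025, q975))
```

Under this reading, a row whose interval excludes zero, such as the first 1995 row of Table C.1 (0.033 to 0.073), would conventionally be taken as evidence of a difference between the two cropping systems at that depth and date, whereas a row such as the first 1996 row of Table C.1 (-0.001 to 0.044) would not.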
Full Reference List
Full Reference List
Abellan, J. J., S. Richardson, and N. Best (2008). Use of space-time models to investigate the stability of
patterns of disease.(Mini-Monograph). Environmental Health Perspectives 116(8), 1111–1119.
Adebayo, S. B. and L. Fahrmeir (2005). Analysing child mortality in Nigeria with geoadditive discrete-
time survival models. Statistics in Medicine 24(5), 709–728.
Aitkin, M. (1997). The calibration of P-values, posterior Bayes factors and the AIC from the posterior
distribution of the likelihood. Statistics and Computing 7, 253–261.
Akaike, H. (1973). Information theory and an extension of the maximum likelihood principle. In
B. Petrox and F. Caski (Eds.), Second International Symposium on Information Theory, Akademia
Kiado, Budapest, Hungary.
Albert, I., E. Grenier, J.-B. Denis, and J. Rousseau (2008). Quantitative Risk Assessment from Farm
to Fork and Beyond: A Global Bayesian Approach Concerning Food-Borne Diseases. Risk Analy-
sis 28(2), 557–571.
Andersen, S., K. Olesen, F. Jensen, and F. Jensen (1989). Hugin - a shell for building Bayesian belief
universes for expert systems. In Eleventh International Joint Conference on Artificial Intelligence,
Detroit, Michigan, pp. 1080–1085.
Anderson, E., Z. Bai, C. Bischof, S. Blackford, J. Demmel, J. Dongarra, J. D. Croz, A. Greenbaum,
S. Hammarling, A. McKenney, and D. Sorensen (1999). LAPACK Users’ Guide: Third Edition (22
Aug 1999 ed.). Philadelphia: Society for Industrial and Applied Mathematics (SIAM).
407
408 APPENDIX C. SUPPLEMENTARY GRAPHS AND TABLES FOR CHAPTER 7
Anderson, J. M. (2007). AWA water recycling forum position paper: Water recycling to meet our water
needs. In S. J. Khan, R. M. Stuetz, and J. M. Anderson (Eds.), Water Reuse and Recycling 2007.
Sydney: UNSW Publishing & Printing Services.
Anderson, R. and R. May (1991). Infectious diseases of humans: dynamics and control. New York:
Oxford University Press.
Anon. (2008). Accessed, June, 2008: http://www.nationmaster.com/country/as-australia/.
Arrowood, M. J., P. J. Lammie, J. W. Priest, D. G. Addiss, M. R. Hurd, W. R. MacKenzie, A. C. McDon-
ald, M. S. Gradus, G. Linke, and E. Zembrowski (2001). Cryptosporidium parvum-specific antibody
responses among children residing in Milwaukee during the 1993 waterborne outbreak. Journal of
Infectious Diseases 183(9), 1373–1378.
Asano, T. (1998). Wastewater reclamation and reuse. Water Quality Management Library ; V. 10.
Lancaster, Pa.: Technomic Pub.
Ashbolt, N. J., S. R. Petterson, T.-A. Stenstrom, C. Schonning, T. Westrell, and J. Ottoson (2005). Mi-
crobial Risk Assessment (MRA) tool. Technical Report Report 2005:7, Chalmers University of Tech-
nology.
Assuncao, R. M. (2003). Space varying coefficient models for small area data. Environmetrics 14(5),
453–473.
Assuncao, R. M., J. E. Potter, and S. M. Cavenaghi (2002). A Bayesian space varying parameter model
applied to estimating fertility schedules. Statistics in Medicine 21(14), 2057–2075.
Assuncao, R. M., I. A. Reis, and C. D. Oliveira (2001). Diffusion and prediction of Leishmaniasis in
a large metropolitan area in Brazil with a Bayesian space-time model. Statistics in Medicine 20(15),
2319–2335.
Ayars, J. E., P. Shouse, and S. M. Lesch (2009). In situ use of groundwater by alfalfa. Agricultural Water
Management 96(11), 1579–1586.
Baird, D. and R. Mead (1991). The empirical efficiency and validity of two neighbour models. Biomet-
rics 47(4), 1473–1487.
FULL REFERENCE LIST 409
Banerjee, S., B. P. Carlin, and A. E. Gelfand (2004). Hierarchical modeling and analysis for spatial data.
Monographs on statistics and applied probability. Boca Raton, London, New York, Washington D.C.:
Chapman & Hall.
Barker, G. C., N. L. C. Talbot, and M. W. Peck (2002). Risk assessment for Clostridium botulinum: a
network approach. International Biodeterioration & Biodegradation 50(3-4), 167–175.
Bartlett, M. (1978). Nearest neighbour models in the analysis of field experiments. Journal of the Royal
Statistical Society. Series B (Methodological) 40(2), 147–174.
Belitz, C., A. Brezger, T. Kneib, and S. Lang (2009a). BayesX Software for Bayesian Infer-
ence in Structured Additive Regression Models Version 2.0.1 Reference Manual. Online at
http://www.stat.uni-muenchen.de/˜bayesx/bayesx.html. Accessed: October 25, 2010.
Belitz, C., A. Brezger, T. Kneib, and S. Lang (2009b). BayesX Software for Bayesian Inference in
Structured Additive Regression Models Version 2.0.1 Software Methodology Manual. Online at
http://www.stat.uni-muenchen.de/˜bayesx/bayesx.html. Accessed: October 25, 2010.
Bell, M., F. Dominici, K. Ebisu, S. Zeger, and J. Samet (2007). Spatial and temporal variation in PM2. 5
chemical composition in the United States for health effects studies. Environmental Health Perspec-
tives 115(7), 989–995.
Bellhouse, D. R. (2004). The Reverend Thomas Bayes, FRS: A biography to celebrate the tercentenary
of his birth. Statistical Science 19(1), 3–43.
Bernardinelli, L., D. Clayton, C. Pascutto, C. Montomoli, M. Ghislandi, and M. Songini (1995). Bayesian
analysis of space-time variation in disease risk. Statistics in Medicine 14(21-22), 2433–2443.
Besag, J. and R. Kempton (1986). Statistical analysis of field experiments using neighbouring plots.
Biometrics 42(2), 231–251.
Besag, J. E. (1974). Spatial interaction and the statistical analysis of lattice systems (with discussion). J.
R. Statist. Soc. B 36(2), 192–236.
Besag, J. E., P. Green, D. Higdon, and K. Mengersen (1995). Bayesian computation and stochastic
systems. Statistical Science 10(1), 3–41.
410 APPENDIX C. SUPPLEMENTARY GRAPHS AND TABLES FOR CHAPTER 7
Besag, J. E. and D. Higdon (1993). Bayesian inference for agricultural field experiments. Bull. Inst.
Internat. Statist 55(Book 1), 121–136.
Besag, J. E. and D. Higdon (1999). Bayesian analysis of agricultural field experiments. Journal of the
Royal Statistical Society Series B-Statistical Methodology 61, 691–717. Part 4.
Besag, J. E. and C. Kooperberg (1995). On conditional and intrinsic autoregressions. Biometrika 82(4),
733–746.
Besag, J. E. and D. Mondal (2005). First-order intrinsic autoregressions and the de Wijs process.
Biometrika 92(4), 909–920.
Besag, J. E., J. York, and A. Mollie (1991). Bayesian image restoration with applications in spatial
statistics (with discussion). Annals of the Institute of Mathematical Statistics 43, 1–59.
Blackford, L., J. Demmel, J. Dongarra, I. Duff, S. Hammarling, G. Henry, M. Heroux, L. Kaufman,
A. Lumsdaine, and A. Petitet (2002). An updated set of basic linear algebra subprograms (BLAS).
ACM Transactions on Mathematical Software (TOMS) 28(2), 135–151.
Blaser, M. J. and L. S. Newman (1982). A review of human salmonellosis: I. Infective dose. Reviews of
infectious diseases 4(6), 1096–1106.
Boerlage, B. (1992). Link Strength in Bayesian Networks. Ph. D. thesis, University of British Columbia,
Canada.
Box, G. E. P. (1980). Sampling and Bayes’ inference in scientific modelling and robustness. Journal of
the Royal Statistical Society. Series A (General) 143(4), 383–430.
Box, G. E. P. and G. M. Jenkins (1976). Time series analysis : forecasting and control (Rev. ed.).
Holden-Day series in time series analysis and digital processing. San Francisco: Holden-Day.
Brandl, M. T. and R. Amundson (2008). Leaf age as a risk factor in contamination of lettuce with
Escherichia coli O157 : H7 and Salmonella enterica. Applied and Environmental Microbiology 74(8),
2298–2306.
Brezger, A. and S. Lang (2006). Generalized structured additive regression based on Bayesian P-splines.
Computational Statistics and Data Analysis 50(4), 967–991.
FULL REFERENCE LIST 411
Brien, C. J. and C. G. B. Demetrio (2009). Formulating mixed models for experiments, including longitu-
dinal experiments. Journal of Agricultural, Biological, and Environmental Statistics 14(3), 253–280.
Brook, D. (1964). On the distinction between the conditional probability and the joint probability ap-
proaches in the specification of nearest-neighbour systems. Biometrika 51(3-4), 481.
Brookhart, M. A., A. E. Hubbard, M. J. v. d. Laan, J. John M. Colford, and J. N. S. Eisenberg (2002).
Statistical estimation of parameters in a disease transmission model: analysis of a Cryptosporidium
outbreak. Statistics in Medicine 21, 3627–3638.
Broughton, A. (1994). Mooki River Catchment hydrogeological investigation and dryland salinity studies
- Liverpool Plains, TS94.026. Technical report, New South Wales Department of Water Resources.
Bureau of Meteorology (2010, April 15). 2010JR12235 *** Student/Request for Data, Forecasts
or other services/wa/Climate and Past Weather*** (JR- [SEC=UNCLASSIFIED]. email: cli-
Burgman, M. (2005). Risks and Decisions for Conservation and Environmental Management. New York:
Cambridge University Press.
Butler, D. G., B. R. Cullis, A. R. Gilmour, and B. J. Gogel (2007). Analysis of Mixed Models for S
Language Environments, ASReml-R Reference Manual Release 2, Volume No. QE02001 of Training
and Development Series. Brisbane, Australia: Queensland Department of Primary Industries and
Fisheries.
Casman, E. A., B. Fischhoff, C. Palmgren, M. J. Small, and F. Wu (2000). An integrated risk model of a
drinking water borne cryptosporidiosis outbreak. Risk Analysis 20(4), 495–511.
Castillo, E., J. M. Gutierrez, and E. Castillo (1997). Sensitivity analysis in discrete Bayesian networks.
IEEE Transactions on Systems, Man & Cybernetics: Part A 27, 412–423.
Castillo, E., J. M. Gutierrez, A. S. Hadi, and C. Solares (1997). Symbolic propagation and sensitivity
analysis in Gaussian Bayesian networks with application to damage assessment. Artificial Intelligence
in Engineering 11, 173–181.
412 APPENDIX C. SUPPLEMENTARY GRAPHS AND TABLES FOR CHAPTER 7
Chan, H. and A. Darwiche (2004). Sensitivity analysis in Bayesian networks: from single to multiple
parameters. In UAI ‘04 Proceedings of the 20th Conference on Uncertainty in Artificial Intelligence,
pp. 67–75. AUAI Press.
Chib, S. and B. P. Carlin (1999). On MCMC sampling in hierarchical longitudinal models. Statistics and
Computing 9, 17–26.
Clements, A., S. Brooker, U. Nyandindi, A. Fenwick, and L. Blair (2008). Bayesian spatial analysis
of a national urinary Schistosomiasis questionnaire to assist geographic targeting of Schistosomiasis
control in Tanzania, East Africa. International Journal for Parasitology 38, 401–415.
Clements, A. C., A. Garba, M. Sacko, S. Tour, R. Dembel, A. Landour, E. Bosque-Oliva, A. F. Gabrielli,
and A. Fenwick (2008). Mapping the probability of Schistosomiasis and associated uncertainty, West
Africa. Emerging Infectious Diseases 14(10), 1629–1632.
Commandeur, J. J. F. and S. J. Koopman (2007). An introduction to state space time series analysis.
Practical econometrics. Oxford New York: Oxford University Press.
Cowell, R. G. and A. P. Dawid (1992). Fast retraction of evidence in a probabilistic expert system.
Statistics and Computing 2(1), 37–40.
Cowell, R. G., A. P. Dawid, S. L. Lauritzen, and D. J. Spiegelhalter (2001). Probabilistic Networks and
Expert Systems. Springer.
Cressie, N. A. C. (1991). Statistics for spatial data. Wiley series in probability and mathematical
statistics. Applied probability and statistics. New York: John Wiley.
Crook, A. M., L. Knorr-Held, and H. Hemingway (2003). Measuring spatial effects in time to event data:
a case study using months from angiography to coronary artery bypass graft (CABG). Statistics in
Medicine 22(18), 2943–2961.
Cullen, A. C. and H. C. Frey (1999). Probabilistic techniques in exposure assessment : a handbook for
dealing with variability and uncertainty in models and inputs. New York: Plenum Press.
Cullis, B. R. and A. C. Gleeson (1991). Spatial analysis of field experiments-an extension to two dimen-
sions. Biometrics 47, 1449–1460.
FULL REFERENCE LIST 413
Cullis, B. R., W. J. Lill, J. A. Fisher, B. J. Read, and A. C. Gleeson (1989). A new procedure for the
analysis of early generation variety trials. Journal of the Royal Statistical Society Series C Applied
Statistics 38(2), 361–375.
Daniells, I. G., J. F. Holland, R. R. Young, C. L. Alston, and A. L. Bernardi (2001). Relationship between
yield of grain sorghum (Sorghum bicolor) and soil salinity under field conditions. Australian Journal
of Experimental Agriculture 41, 211–217.
Darroch, J. N., S. L. Lauritzen, and T. P. Speed (1980). Markov fields and log-linear interaction models
for contingency tables. The Annals of Statistics 8(3), 522–539.
Dawid, A. P. (1992). Applications of a general propagation algorithm for probabilistic expert systems.
Statistics and Computing 2(1), 25–36.
Dawid, A. P., U. Kjaerulff, and S. L. Lauritzen (1995). Hybrid propagation in junction trees. In Advances
in Intelligent Computing - Ipmu ’94, Volume 945 of Lecture Notes in Computer Science, pp. 87–97.
Springer Verlag KG.
Delignette-Muller, M. L., M. Cornu, R. Pouillot, and J. B. Denis (2006). Use of Bayesian modelling
in risk assessment: Application to growth of Listeria monocytogenes and food flora in cold-smoked
salmon. International Journal of Food Microbiology 106(2), 195–208.
Dillon, P., D. Page, J. Vanderzalm, P. Pavelic, S. Toze, E. Bekele, J. Sidhu, H. Prommer, S. Higginson,
R. Regel, S. Rinck-Pfeiffer, M. Purdie, C. Pitman, and T. Wintgens (2008). A critical evaluation of
combined engineered and aquifer treatment systems in water recycling. Water Science & Technology
- WST 57(5), 753–762.
Dunson, D. (2001). Commentary: practical advantages of Bayesian analysis of epidemiologic data.
American Journal of Epidemiology 153(12), 1222–1226.
Durban, M., C. A. Hackett, J. W. McNicol, A. C. Newton, W. T. B. Thomas, and I. D. Currie (2003).
The practical use of semiparametric models in field trials. Journal of Agricultural Biological and
Environmental Statistics 8(1), 48–66.
Earnest, A., J. R. Beard, G. Morgan, D. Lincoln, R. Summerhayes, D. Donoghue, T. Dunn, D. Muscatello,
and K. Mengersen (2010). Small area estimation of sparse disease counts using shared component
414 APPENDIX C. SUPPLEMENTARY GRAPHS AND TABLES FOR CHAPTER 7
models-application to birth defect registry data in New South Wales, Australia. Health & Place 16,
684–693.
Edwards, D. (1995). Introduction to Graphical Modelling. New York: Springer-Verlag.
Eisenberg, J., E. Seto, A. Olivieri, and R. Spear (1996). Quantifying water pathogen risk in an epidemi-
ological framework. Risk Analysis 16, 549–563.
Eisenberg, J. N. S., M. A. Brookhart, G. Rice, M. Brown, and J. M. Colford Jr (2002). Disease transmis-
sion models for public health decision making: Analysis of epidemic and endemic conditions caused
by waterborne pathogens. Environmental Health Perspectives 110(8), 783–790.
Eisenberg, J. N. S., E. Y. W. Seto, J. M. Colford Jr, A. Olivieri, and R. C. Spear (1998). An analysis
of the Milwaukee cryptosporidiosis outbreak based on a dynamic model of the infection process.
Epidemiology 9(3), 255–263.
Elliott, P. (2000). Spatial epidemiology : methods and applications. Oxford medical publications. Ox-
ford: Oxford University Press.
Fahrmeir, L., T. Kneib, and S. Lang (2004). Penalized structured additive regression for space-time data:
A Bayesian perspective. Statistica Sinica 14, 731–761.
Fewtrell, L. and J. Bartram (2001). Water Quality: Guidelines, Standards and Health. London: World
Health Organisation.
Fienberg, S. E. (2006). When did Bayesian inference become “Bayesian”? Bayesian Analysis 1, 1–40.
Fong, Y., H. Rue, and J. Wakefield (2010). Bayesian inference for generalized linear mixed models.
Biostatistics 11(3), 397–412.
Fuller, W. A. (1987). Measurement error models. New York: Wiley.
Gamerman, D. and H. F. Lopes (2006). Markov chain Monte Carlo : stochastic simulation for Bayesian
inference (2nd ed.). London ; New York: Chapman & Hall.
Gelfand, A. E. and P. Vounatsou (2003). Proper multivariate conditional autoregressive models for spatial
data analysis. Biostatistics 4(1), 11–25.
FULL REFERENCE LIST 415
Gelman, A., J. B. Carlin, H. S. Stern, and D. B. Rubin (1995). Bayesian data analysis. Texts in statistical
science. London: Chapman & Hall.
Geman, S. and D. Geman (1984). Stochastic relaxation, Gibbs distributions and the Bayesian restoration
of images. IEEE Transactions on Pattern Analysis and Machine Intelligence 6, 721–741.
Gerba, C. P., N. C.-d. Campo, J. P. Brooks, and I. L. Pepper (2008). Exposure and risk assessment of
Salmonella in recycled residuals. Water Science & Technology 57(7), 1061–1065.
Gerlach, R., C. Carter, and R. Kohn (2000). Efficient Bayesian inference for dynamic mixture models.
Journal of the American Statistical Association 95(451), 818–828.
Geweke, J. (1992). Evaluating the accuracy of sampling-based approaches to the calculation of posterior
moments. Bayesian Statistics 4, 169–188.
Gibbs, R. A. (1995). Die-off of human pathogens in stored wastewater sludge and sludge applied to
land. Technical report, Urban Water Research Association of Australia, Water Services Association
of Australia, Melbourne.
Gibbs, R. A. and G. E. Ho (1993). Health risks from pathogens in untreated wastewater sludge: implica-
tions for Australian sludge management guidelines. Water 20(1), 17–22.
Gibbs, R. A., C. J. Hu, G. E. Ho, P. A. Phillips, and I. Unkovich (1995). Pathogen die-off in stored
wastewater sludge. Water Science & Technology 31(5-6), 91–95.
Gilmour, A. R., B. R. Cullis, and A. P. Verbyla (1997). Accounting for natural and extraneous variation
in the analysis of field experiments. Journal of Agricultural Biological and Environmental Statistics 2,
269–293.
Gilmour, A. R., B. J. Gogel, B. R. Cullis, and R. Thompson (2005). ASReml User Guide Release 2.0.
Technical report, VSN International Ltd, Hemel Hempstead, UK.
Gilmour, A. R., R. Thompson, and B. R. Cullis (1995). Average information REML: an efficient algo-
rithm for variance parameter estimation in linear mixed models. Biometrics 51(4), 1440–1450.
Gordon, C. and S. Toze (2003). Influence of groundwater characteristics on the survival of enteric viruses.
Journal of Applied Microbiology 95(3), 536–544.
416 APPENDIX C. SUPPLEMENTARY GRAPHS AND TABLES FOR CHAPTER 7
Gotway, C. A. and N. A. C. Cressie (1990). A spatial analysis of variance applied to soil-water infiltration.
Water resources research 26(11), 2695–2703.
Gotway, C. A. and L. J. Young (2002). Combining incompatible spatial data. Journal of the American
Statistical Association 97(458), 632–648.
Green, P. J. and R. Sibson (1978). Computing Dirichlet tessellations in the plane. Computer Journal 21,
168–173.
Haas, C. and J. N. Eisenberg (2001). Risk assessment. In L. Fewtrell and J. Bartram (Eds.), Water
Quality: Guidelines, Standards and Health. WHO.
Haas, C. N. (1999). On modeling correlated random variables in risk assessment. Risk Analysis 6,
1205–1214.
Haas, C. N., J. B. Rose, and C. P. Gerba (1999). Quantitative Microbial Risk Assessment. New York:
Wiley.
Hall, G. (2004). Results from the National Gastroenteritis Survey 2001 2002. Technical Report NCEPH
Working Paper Number 50, National Centre for Epidemiology & Population Health.
Hall, G. and M. Kirk (2005). Foodborne illness in Australia annual incidence circa 2000. Technical
report, Australian Government Department of Health and Ageing.
Hall, G., J. Raupach, and K. Yohannes (2006). An estimate of under-reporting of foodborne notifiable
diseases: Salmonella Campylobacter Shiga toxin producing E. coli (STEC). Technical report, National
Centre for Epidemiology & Population Health.
Hamilton, G. S., F. Fielding, A. W. Chiffings, B. T. Hart, R. W. Johnstone, and K. L. Mengersen (2007).
Investigating the use of a Bayesian network to model the risk of Lyngbya majuscula bloom initiation
in Deception Bay, Queensland. Ecological Risk Assessment 13(6), 1271–1287.
Harvey, A. C. (1989). Forecasting, structural time series models and the Kalman filter. Cambridge
England: Cambridge University Press.
FULL REFERENCE LIST 417
Haskard, K. A., B. R. Cullis, and A. P. Verbyla (2007). Anisotropic Matern correlation and spatial
prediction using REML. Journal of Agricultural, Biological, and Environmental Statistics 12(2),
147–160.
Hastie, T. and R. Tibshirani (1990). Generalized additive models (1st ed.). Monographs on statistics and
applied probability. London ; New York: Chapman and Hall.
Higdon, D. (1998). A process-convolution approach to modelling temperatures in the North Atlantic
Ocean. Environmental and Ecological Statistics 5, 173–190.
Hijnen, W. A., Y. J. Dullemont, J. F. Schijven, A. J. Hanzens-Brouwer, M. Rosielle, and G. Medema
(2007). Removal and fate of Cryptosporidium parvum, Clostridium perfringens and small-sized centric
diatoms (Stephanodiscus hantzschii) in slow sand filters. Water Research 41, 2151–2162.
Hijnen, W. A. M., E. Beerendonk, and G. J. Medema (2005). Elimination of micro-organisms by drinking
water processes a review. Technical report, Kiwa N.V., Nieuwegein, The Netherlands.
Hijnen, W. A. M., E. Beerendonk, P. Smeets, and G. J. Medema (2004). Elimination of micro-organisms
by water treatment processes. Technical report, Kiwa N.V., Nieuwegein, The Netherlands.
Hijnen, W. A. M., J. F. Schijven, P. Bonne, A. Visser, and G. J. Medema (2004). Elimination of viruses,
bacteria and protozoan oocysts by slow sand filtration. Water Science & Technology 50(1), 147–154.
Hoeting, J. A., R. A. Davis, A. A. Merton, and S. E. Thompson (2006). Model selection for geostatistical
models. Ecological Applications 16(1), 87–98.
Holbrook, N. and N. Bindoff (2000). A statistically efficient mapping technique for four-dimensional
ocean temperature data. Journal of Atmospheric and Oceanic Technology 17(6), 831–846.
Hrafnkelsson, B. and N. Cressie (2003). Hierarchical modeling of count data with application to nuclear
fall-out. Environmental and Ecological Statistics 10, 179–200.
Hrudey, S. E., P. M. Huck, P. Payment, R. W. Gillham, and E. J. Hrudey (2002). Walkerton: Lessons
learned in comparison with waterborne outbreaks in the developed world. Journal of Environmental
Engineering and Science 1(6), 397–407.
Hugin Expert A/S (2007). Hugin 6.9. Available on: www.hugin.com. Accessed: November 6, 2008.
418 APPENDIX C. SUPPLEMENTARY GRAPHS AND TABLES FOR CHAPTER 7
Hugin Expert A/S (2007). Hugin Expert - Publications. Available on:
www.hugin.com/developer/Publications/. Accessed: November 6, 2008.
Hunter, P. R. and L. Fewtrell (2001). Acceptable risk. In L. Fewtrell and J. Bartram (Eds.), Water Quality:
Guidelines, Standards and Health. WHO.
Isaac, D. (2008a). Email: June 27,2008: Re: Fw: Recycled water: measurements required under licence
by the Health Department.
Isaac, D. (2008b). Fit for purpose guidelines for recycled water. email, received June 26, 2008.
Jacobsen, K. and J. Koopman (2004). Declining hepatitis A seroprevalence: a global review and analysis.
Epidemiology and Infection 132, 1005–1022.
Jensen, F. (1994). Implementation aspects of various propagation algorithms in Hugin. Technical Report
Research Report R-94-2014, Department of Mathematics and Computer Science, Aalborg University,
Denmark, Aalborg, Denmark.
Jensen, F. (2001). Bayesian Networks and Decision Graphs. Springer.
Jensen, F. V., S. H. Aldenryd, and K. B. Jensen (1995). Sensitivity analysis in Bayesian networks. Lecture
Notes in Artificial Intelligence 946, 243.
Jensen, F. V., B. Chamberlain, T. Nordahl, and F. Jensen (1991). Analysis in Hugin of data conflict. In
Proceedings of the Sixth Annual Conference on Uncertainty in Artificial Intelligence, UAI ’90, New
York, NY, USA, pp. 519–528. Elsevier Science Inc.
Jordan, M. I. (2004). Graphical models. Statistical Science 19(1), 140–155.
Karanis, P., C. Kourenti, and H. Smith (2007). Waterborne transmission of protozoan parasites: A
worldwide review of outbreaks and lessons learnt. Journal of Water and Health 5(1), 1–38.
Karim, M. R., E. P. Glenn, and C. P. Gerba (2008). The effect of wetland vegetation on the survival of
Escherichia coli, Salmonella typhimurium, bacteriophage MS-2 and polio virus. Journal of Water and
Health 6(2), 167–175.
Kelly, D. L. and C. L. Smith (2009). Bayesian inference in probabilistic risk assessment–The current
state of the art. Reliability Engineering & System Safety 94(2), 628–643. ISSN 0951-8320. doi:
10.1016/j.ress.2008.07.002.
Kennett, R. J., K. B. Korb, and A. E. Nicholson (2001). Seabreeze prediction using Bayesian networks:
a case study. Lecture Notes in Computer Science 2035, 148–153.
Khan, S. J. (2010). Quantitative chemical exposure assessment for water recycling schemes. Waterlines
Report Series, No 27. Australian Government National Water Commission.
Kinde, H., M. Adelson, A. Ardans, E. H. Little, D. Willoughby, D. Berchtold, D. H. Read, R. Breitmeyer,
D. Kerr, R. Tarbell, and E. Hughes (1997). Prevalence of Salmonella in municipal sewage treatment
plant effluents in Southern California. Avian Diseases 41(2), 392–398.
Kneib, T. (2006). Geoadditive hazard regression for interval censored survival times. Computational
Statistics and Data Analysis 51, 777–792.
Kneib, T. and L. Fahrmeir (2006). Structured additive regression for categorical space-time data: A mixed
model approach. Biometrics 62(1), 109–118.
Knorr-Held, L. and J. Besag (1998). Modeling risk from a disease in time and space. Statistics in
Medicine 17, 2045–2060.
Korb, K. B. and A. E. Nicholson (2004). Bayesian Artificial Intelligence. London: CRC Press.
Lang, S. and A. Brezger (2004). Bayesian P-splines. Journal of Computational and Graphical Statis-
tics 13(1), 183–212.
Laskey, K. B. (1995). Sensitivity analysis for probability assessments in Bayesian networks. IEEE
Transactions on Systems, Man and Cybernetics 25, 901–909.
Lauritzen, S. (1995). The EM algorithm for graphical association models with missing data. Computa-
tional Statistics & Data Analysis 19, 191–201.
Lauritzen, S. L. and D. J. Spiegelhalter (1988). Local computations with probabilities on graphical
structures and their application to expert systems. Journal of the Royal Statistical Society. Series B
(Methodological) 50(2), 157–224.
Lawson, C. L., R. J. Hanson, D. R. Kincaid, and F. T. Krogh (1979). Basic Linear Algebra Subprograms for
Fortran usage. ACM Trans. Math. Software 5(3), 324–325.
Lemos, R. T. and B. Sanso (2009). A spatio-temporal model for mean, anomaly, and trend fields of North
Atlantic sea surface temperature. Journal of the American Statistical Association 104(485), 5–18.
Lemos, R. T., B. Sanso, and M. L. Huertos (2007). Spatially varying temperature trends in a central
California estuary. Journal of Agricultural, Biological, and Environmental Statistics 12(3), 379–396.
Lindgren, F., H. Rue, and J. Lindstrom (2011). An explicit link between Gaussian fields and Gaussian
Markov random fields: the stochastic partial differential equation approach. Journal of the Royal
Statistical Society: Series B (Statistical Methodology) 73(4), 423–498.
Liu, J. S., W. H. Wong, and A. Kong (1994). Covariance structure of the Gibbs sampler with applications
to the comparisons of estimators and augmentation schemes. Journal of the Royal Statistical Society,
Series B 57(1), 157–169.
Lumina Decision Systems (2004). Analytica. Available on:
www.lumina.com/ana/editiondescriptions.htm. Accessed: April 24, 2008.
Lunn, D. J., A. Thomas, N. Best, and D. Spiegelhalter (2000). WinBUGS - A Bayesian modelling
framework: Concepts, structure, and extensibility. Statistics and Computing 10(4), 325–337.
Macdonald, B. C. T., J. K. Reynolds, A. S. Kinsela, R. J. Reilly, P. van Oploo, T. D. Waite, and I. White
(2009). Critical coagulation in sulfidic sediments from an east-coast Australian acid sulfate landscape.
Applied Clay Science 46(2), 166–175.
MacKenzie, W., N. J. Hoxie, M. E. Proctor, M. S. Gradus, K. A. Blair, D. E. Peterson, J. J. Kazmierczak,
D. G. Addiss, K. R. Fox, J. B. Rose, and J. P. Davis (1994). A massive outbreak in Milwaukee
of cryptosporidium infection transmitted through the public water supply. New England Journal of
Medicine 331(3), 161–167.
MacKenzie, W. R., W. L. Schell, K. A. Blair, D. G. Addiss, D. E. Peterson, N. J. Hoxie, J. J. Kazmierczak,
and J. P. Davis (1995). A massive outbreak of waterborne cryptosporidium infection in Milwaukee,
Wisconsin: Recurrence of illness and risk of secondary transmission. Clinical Infectious Diseases 21,
57–62.
Marcot, B. G. (2006). Characterizing species at risk I: Modeling rare species under the Northwest Forest
Plan. Ecology and Society 11(2), 10.
Marcot, B. G., P. A. Hohenlohe, S. Morey, R. Holmes, R. Molina, M. C. Turley, M. H. Huff, and J. A.
Laurence (2006). Characterizing species at risk II: Using Bayesian belief networks as decision support
tools to determine species conservation categories under the Northwest Forest Plan. Ecology and
Society 11(2), 12.
Marks, H. M., M. E. Coleman, C. T. J. Lin, and T. Roberts (1998). Topics in microbial risk assessment:
Dynamic flow tree process. Risk Analysis 18(3), 309–328.
Marley, J. K. and M. P. Wand (2010). Non-standard semiparametric regression via BRugs. Journal of
Statistical Software 37(5), 1–30.
Martin, J. E., T. Rivas, J. M. Matías, J. Taboada, and A. Argüelles (2009). A Bayesian network analysis of
workplace accidents caused by falls from a height. Safety Science 47(2), 206–214.
Martin, R. J., N. Chauhan, J. A. Eccleston, and B. S. P. Chan (2006). Efficient experimental designs when
most treatments are unreplicated. Linear Algebra and its Applications 417(1), 163–182.
Martino, S. and H. Rue (2008). Implementing approximate Bayesian inference using Integrated Nested
Laplace Approximation: A manual for the INLA program. Citeseer.
Martino, S. and H. Rue (2009). R Package: INLA. Department of Mathematical Sciences NTNU,
Norway.
Matias, J. M., T. Rivas, C. Ordonez, J. Taboada, and J. M. Matias (2007). Assessing the environmental
impact of slate quarrying using Bayesian networks and GIS. In AIP Conference, Volume 963, pp.
1285–1288.
McCullough, N. B. and C. W. Eisele (1951a). Experimental human salmonellosis: I. Pathogenicity of
strains of Salmonella meleagridis and Salmonella anatum obtained from spray-dried whole egg. The
Journal of Infectious Diseases 88(3), 278–289.
McCullough, N. B. and C. W. Eisele (1951b). Experimental human salmonellosis: II. Immunity studies
following experimental illness with Salmonella meleagridis and Salmonella anatum. The Journal of
Immunology 66(5), 595–608.
McCullough, N. B. and C. W. Eisele (1951c). Experimental human salmonellosis: III. Pathogenicity of
strains of Salmonella newport, Salmonella derby, and Salmonella bareilly obtained from spray-dried
whole egg. The Journal of Infectious Diseases 89(3), 209–213.
McCullough, N. B. and C. W. Eisele (1951d). Experimental human salmonellosis: IV. Pathogenicity
of strains of Salmonella pullorum obtained from spray-dried whole egg. The Journal of Infectious
Diseases 89(3), 259–265.
Messner, M. J., C. L. Chappell, and P. C. Okhuysen (2001). Risk assessment for Cryptosporidium: A
hierarchical Bayesian analysis of human dose response data. Water Research 35(16), 3934–3940.
Mons, M., J. Van der Wielen, E. Blokker, M. Sinclair, K. Hulshof, F. Dangendorf, P. Hunter, and
G. Medema (2007). Estimation of the consumption of cold tap water for microbiological risk as-
sessment: an overview of studies and statistical analysis of data. Journal of Water and Health 5(1),
151–170.
Nadebaum, P., M. Chapman, R. Morden, and S. Rizak (2004). A guide to hazard identification & risk
assessment for drinking water supplies. Technical report, CRC for Water Quality and Treatment.
National Notifiable Diseases Surveillance System (2008). National Notifiable Diseases Surveillance
System. Available on: http://www9.health.gov.au/cda/Source/CDA-index.cfm. Accessed:
April 9, 2008.
Natural Resource Management Ministerial Council, Environment Protection and Heritage Coun-
cil, and Australian Health Ministers Conference (2006). Australian Guidelines for Wa-
ter Recycling: Managing health and environmental risks (Phase 1) 2006. Available on:
www.ephc.gov.au/taxonomy/term/39. Accessed: March 29, 2008.
Nayyar, A., C. Hamel, G. Lafond, B. D. Gossen, K. Hanson, and J. Germida (2009). Soil microbial
quality associated with yield reduction in continuous-pea. Applied Soil Ecology 43(1), 115–121.
Neapolitan, R. E. and X. Jiang (2007). Probabilistic Methods for Financial and Marketing Informatics.
Elsevier.
Ngo, L. and M. Wand (2004). Smoothing with mixed model software. Journal of Statistical Software 9,
1–56.
Nicholson, A., S. Watson, and C. Twardy (2003). Using Bayesian networks for water quality prediction
in Sydney Harbour. Available online: www.csse.monash.edu.au/bai/talks/NSWDEC.ppt. Accessed:
March 27, 2008.
Norsys Software Corp. (2007). Netica 3.25. Available online: www.norsys.com. Accessed February
15, 2008.
NumPy Community (2010). NumPy Reference Manual: Release 1.5.0.dev8106. Available online:
http://docs.scipy.org/doc/. Accessed: February 9, 2010.
Olivieri, A. W., R. Danielson, J. N. Eisenberg, L. Johnson, V. Pon, R. Sakaji, R. Soller, J. A. Soller,
J. Stephenson, and C. Trese (2007). Evaluation of microbial risk assessment techniques and appli-
cations in water reclamation. Technical report, Water Environment Research Foundation (WERF),
Alexandria, VA. Available online: www.werf.org/AM/.
Oscar, T. (2004). Dose-response model for 13 strains of Salmonella. Risk Analysis 24(1), 41–49.
Palacios, M. P., P. Lupiola, M. T. Tejedor, E. Del-Nero, A. Pardo, and L. Pita (2001). Climatic effects
on Salmonella survival in plant and soil irrigated with artificially inoculated wastewater: preliminary
results. Water Science and Technology 43(12), 103–108.
Palisade Corporation (2008). @RISK 5.0. Available online: www.palisade.com/risk/. Accessed:
October 22, 2009.
Papadakis, J. S. (1937). Méthode statistique pour des expériences sur champ. Bulletin scientifique
d’amélioration des plantes de Thessalonique 23, 30.
Paulo, M. J., H. van der Voet, M. J. W. Jansen, C. J. F. ter Braak, and J. D. van Klaveren (2005). Risk assessment
of dietary exposure to pesticides using a Bayesian method. Pest Management Science 61(8), 759–766.
Pearl, J. (1988). Probabilistic reasoning in intelligent systems: networks of plausible inference. San
Mateo, California: Morgan Kaufmann Publishers.
Petterson, S. and N. Ashbolt (2001). Viral risks associated with wastewater reuse: modeling virus per-
sistence on wastewater irrigated salad crops. Water Science and Technology 43(12), 23–26.
Petterson, S., N. Ashbolt, and A. Sharma (2001). Microbial risks from wastewater irrigation of salad
crops: A screening-level risk assessment. Water Environment Research 73(6), 667–672.
Petterson, S. A. and N. J. Ashbolt (2006). WHO Guidelines for the safe use of wastewater and excreta in
agriculture microbial risk assessment section. Technical report, World Health Organization.
Petterson, S. R. (2002). Microbial Risk Assessment of Wastewater Irrigated Salad Crops. Ph. D. thesis,
University of New South Wales.
Piepho, H. P., A. Buchse, and K. Emrich (2003). A hitchhiker’s guide to mixed models for randomized
experiments. Journal of Agronomy and Crop Science 189(5), 310–322.
Piepho, H. P., A. Buchse, and C. Richter (2004). A mixed modelling approach for randomized experi-
ments with repeated measures. Journal of Agronomy and Crop Science 190(4), 230–247.
Piepho, H. P. and J. O. Ogutu (2007). Simple state-space models in a mixed model framework. American
Statistician 61(3), 224–232.
Piepho, H. P., C. Richter, and E. Williams (2008). Nearest neighbour adjustment and linear variance
models in plant breeding trials. Biometrical Journal 50(2), 164–189.
Piepho, H. P. and E. R. Williams (2010). Linear variance models for plant breeding trials. Plant Breed-
ing 129(1), 1–8.
Pike, W. A. (2004). Modeling drinking water quality violations with Bayesian networks. Journal of the
American Water Resources Association 40(6), 1563–1578.
Pitt, M. and N. Shephard (1999). Analytic convergence rates and parameterisation issues for the Gibbs
sampler applied to state space models. Journal of Time Series Analysis 20, 63–85.
Pollino, C. A. and B. T. Hart (2005a). Bayesian approaches can help make better sense of ecotoxicolog-
ical information in risk assessments. Australian Journal of Ecotoxicology 11, 57–58.
Pollino, C. A. and B. T. Hart (2005b). Bayesian decision networks - going beyond expert elicitation for
parameterisation and evaluation of ecological endpoints. In A. Voinov, A. Jakeman, and A. Rizzoli
(Eds.), Third Biennial Meeting: Summit on Environmental Modelling and Software, Burlington, USA.
Pollino, C. A., O. Woodberry, A. E. Nicholson, K. B. Korb, and B. T. Hart (2007). Parameterisation and
evaluation of a Bayesian network for use in an ecological risk assessment. Environmental Modelling
and Software 22, 1140–1152.
Poncet, C., V. Lemesle, L. Mailleret, A. Bout, R. Boll, and J. Vaglio (2010). Spatio-temporal analysis
of plant pests in a greenhouse using a Bayesian approach. Agricultural and Forest Entomology 12(3),
325–332.
Pouillot, R., P. Beaudeau, J.-B. Denis, and F. Derouin (2004). A quantitative risk assessment of water-
borne Cryptosporidiosis in France using second-order Monte Carlo simulation. Risk Analysis 24(1),
1–17.
Qian, S. S., C. A. Stow, and M. E. Borsuk (2003). On Monte Carlo methods for Bayesian inference.
Ecological Modelling 159, 269.
Raftery, A. and S. Lewis (1992). How many iterations in the Gibbs sampler? In J. Bernardo, J. Berger,
A. Dawid, and A. Smith (Eds.), Bayesian Statistics 4. Oxford: Oxford University Press.
Rasmussen, J. (1997). Risk management in a dynamic society: a modelling problem. Safety Science 27(2-
3), 183–213.
Raso, G., P. Vounatsou, L. Gosoniu, M. Tanner, E. K. N’Goran, and J. Utzinger (2006). Risk factors and
spatial patterns of hookworm infection among schoolchildren in a rural area of western Côte d’Ivoire.
International Journal for Parasitology 36(2), 201–210.
Rassmussen, L. (1995). Bayesian network for blood typing and parentage verification of cattle. Technical
report, Department of Mathematics and Computer Science, Aalborg University, Denmark. Hugin
reference Hugin 6.9.
Reich, B., J. Hodges, and B. Carlin (2007). Spatial analyses of periodontal data using conditionally
autoregressive priors having two classes of neighbor relations. Journal of the American Statistical
Association 102(477), 44–55.
Rendtorff, R. (1954). The experimental transmission of human intestinal protozoan parasites: II. Giardia
lamblia cysts given in capsules. American Journal of Hygiene 59, 209–220.
Ridgway, K., J. Dunn, and J. Wilkin (2002). Ocean interpolation by four-dimensional weighted least
squares-application to the waters around Australasia. Journal of Atmospheric and Oceanic Technol-
ogy 19(9), 1357–1375.
Ringrose-Voase, A., R. R. Young, Z. Payder, N. Huth, A. Bernardi, H. Cresswell, B. Keating, J. Scott,
M. Stauffacher, R. Banks, J. Holland, R. Johnston, T. Green, L. Gregory, I. Daniells, R. Farquharson,
R. Drinkwater, S. Heidenreich, and S. Donaldson (2003). Deep drainage under different land uses
in the Liverpool Plains Catchment. Technical Report 3, Agricultural Resource Management Report
Series, NSW Agriculture Orange.
Rizak, S. and S. Hrudey (2007). Strategic water quality monitoring for drinking water safety. Technical
Report 37, CRC for Water Quality and Treatment.
Robinson, W. (1950). Ecological correlations and the behavior of individuals. American Sociological
Review 15(3), 351–357.
Robinson, W. (2009). Ecological correlations and the behavior of individuals. International Journal of
Epidemiology 38(2), 337–341.
Roser, D., S. Khan, C. Davies, R. Signor, S. Petterson, and N. Ashbolt (2006). Screening health risk
assessment for the use of microfiltration-reverse osmosis treated tertiary effluent for replacement of
environmental flows. Technical Report CWWT Report 2006-20, Centre for Water and Waste Technol-
ogy, School of Civil and Environmental Engineering, University of NSW.
Roser, D., S. Petterson, R. Signor, and N. Ashbolt (2006). How to implement QMRA? to estimate
baseline and hazardous event risks with management end uses in mind. Technical report, MicroRisk
project co-funded by the European Commission under the Fifth Framework Programme, Theme 4:
Energy, environment and sustainable development (contract EVK1-CT-2002-00123).
Roy, V. and S. de Blois (2008). Evaluating hedgerow corridors for the conservation of native forest herb
diversity. Biological Conservation 141, 298–307.
Rue, H. and L. Held (2005). Gaussian Markov random fields: theory and applications. Boca Raton:
Chapman & Hall/CRC.
Rue, H. and H. Tjelmeland (2002). Fitting Gaussian Markov random fields to Gaussian fields. Scandi-
navian Journal of Statistics 29(1), 31–49.
Saad, Y. (2003). Iterative methods for sparse linear systems. Society for Industrial and Applied Mathe-
matics. [electronic resource].
Sahu, S. K. and P. Challenor (2008). A space-time model for joint modeling of ocean temperature and
salinity levels as measured by Argo floats. Environmetrics 19(5), 509–528.
SAS Institute (2004). SAS Version 9.1.3. Cary, NC., USA: SAS Institute Inc.
Schabenberger, O. and C. A. Gotway (2005). Statistical methods for spatial data analysis. Texts in
statistical science. Boca Raton: Chapman & Hall/CRC.
Schwarz, G. (1978). Estimating the dimension of a model. The Annals of Statistics 6(2), 461–464.
Shillito, R. M., D. J. Timlin, D. Fleisher, V. R. Reddy, and B. Quebedeaux (2009). Yield response of
potato to spatially patterned nitrogen application. Agriculture Ecosystems & Environment 129(1-3),
107–116.
Sidhu, J. P. S., J. Hanna, and S. G. Toze (2008). Survival of enteric microorganisms on grass surfaces
irrigated with treated effluent. Journal of Water and Health 6(2), 255–262.
Signor, R. and N. Ashbolt (2006). Pathogen monitoring offers questionable protection against drinking-
water risks: a QMRA (Quantitative Microbial Risk Analysis) approach to assess management strategies.
Erratum in Water Science and Technology 54 (11-12):451. Water Science and Technology 54, 261–
268.
Signor, R. S. (2007a). Microbial risk implications of rainfall-induced runoff events entering a reservoir
used as a drinking-water source. Journal of Water Supply Research and Technology - AQUA 56, 515–
531.
Signor, R. S. (2007b). Probabilistic Microbial Risk Assessment & Management Implications for Urban
Water Supply Systems. Ph. D. thesis, UNSW.
Simpson, D. P., I. W. Turner, and A. N. Pettitt (2008). Fast sampling from a Gaussian Markov random
field using Krylov subspace approaches. QUT Eprints 14376 (Brisbane), 1–17. Available online:
http://eprints.qut.edu.au.
Sinclair, M. (2005). Strategic review of waterborne viruses. Technical report, CRC for Water Quality
and Treatment.
Singh, M., R. S. Malhotra, S. Ceccarelli, A. Sarker, S. Grando, and W. Erskine (2003). Spatial variability
models to improve dryland field trials. Experimental Agriculture 39(2), 151–160.
Sinton, L., C. Hall, and R. Braithwaite (2007). Sunlight inactivation of Campylobacter jejuni and
Salmonella enterica, compared with Escherichia coli, in seawater and river water. Journal of Wa-
ter and Health 5(3), 357–365.
Sleutel, S., J. Vandenbruwane, A. De Schrijver, K. Wuyts, B. Moeskops, K. Verheyen, and S. De Neve
(2009). Patterns of dissolved organic carbon and nitrogen fluxes in deciduous and coniferous forests
under historic high nitrogen deposition. Biogeosciences 6(12), 2743–2758.
Smeets, P. W. M. H., Y. J. Dullemont, P. H. A. J. M. van Gelder, J. C. van Dijk, and G. J. Medema (2008).
Improved methods for modelling drinking water treatment in quantitative microbial risk assessment; a
case study of Campylobacter reduction by filtration and ozonation. Journal of Water and Health 6(3),
301–314.
Smeets, P. W. M. H., G. J. Medema, Y. J. Dullemont, P. H. A. J. M. van Gelder, and J. C. van Dijk (2008).
Case study of Campylobacter reduction by filtration and ozonation. Journal of Water and Health 6,
301–314.
Smeets, P. W. M. H., G. J. Medema, G. Stanfield, J. C. van Dijk, and L. C. Rietveld (2007). How can the UK
statutory Cryptosporidium monitoring be used for quantitative risk assessment of Cryptosporidium in
drinking water? Journal of Water and Health 5(1 (Suppl)), 107–118.
Snow, J. (1849). On the mode of communication of cholera. London: John Churchill.
Snow, J. (1855). On the mode of communication of cholera (2nd ed.). London: John Churchill.
Song, H.-R., A. Lawson, R. B. D’Agostino Jr, and A. D. Liese (2011). Modeling type 1 and type 2
diabetes mellitus incidence in youth: An application of Bayesian hierarchical regression for sparse
small area data. Spatial and Spatio-temporal Epidemiology 2(1), 23–33.
Spiegelhalter, D. (1989). A unified approach to imprecision and sensitivity of beliefs in expert systems. In
L.N.Kanal (Ed.), Uncertainty in Artificial Intelligence 3. North Holland: Elsevier Science Publishers
B.V.
Spiegelhalter, D. J., N. G. Best, B. P. Carlin, and A. van der Linde (2002). Bayesian measures of model
complexity and fit. Journal of the Royal Statistical Society. Series B (Statistical Methodology) 64(4),
583–639.
Spiegelhalter, D. J., A. P. Dawid, S. L. Lauritzen, and R. G. Cowell (1993). Bayesian analysis in expert
systems. Statistical Science 8(3), 219–247.
Spiegelhalter, D. J., N. L. Harris, K. Bull, and R. C. G. Franklin (1994). Empirical-evaluation of prior
beliefs about frequencies - methodology and a case-study in congenital heart-disease. Journal of the
American Statistical Association 89(426), 435–443.
Steck, H. (2001). Constraint-Based Structural Learning in Bayesian Networks Using Finite Data Sets.
Ph. D. thesis, Institut für Informatik der Technischen Universität.
Stefanova, K. T., A. B. Smith, and B. R. Cullis (2009). Enhanced diagnostics for the spatial analysis of
field trials. Journal of Agricultural Biological and Environmental Statistics 14(4), 392–410.
Strahm, B. D., R. B. Harrison, T. A. Terry, T. B. Harrington, A. B. Adams, and P. W. Footen (2009).
Changes in dissolved organic matter with depth suggest the potential for postharvest organic matter
retention to increase subsurface soil carbon pools. Forest Ecology and Management 258(10), 2347–
2352.
Strickland, C. (2010). pyMCMC: a statistical package for Bayesian MCMC analysis. Journal of Com-
putational and Graphical Statistics, 1–46. submitted August, 2010.
Strickland, C. M., D. P. Simpson, I. W. Turner, R. Denham, and K. L. Mengersen (2011). Fast Bayesian
analysis of spatial dynamic factor models for multitemporal remotely sensed imagery. Journal of the
Royal Statistical Society: Series C (Applied Statistics) 60(1), 109–124.
Tanaka, H., T. Asano, E. D. Schroeder, and G. Tchobanoglous (1998). Estimating the safety of wastew-
ater reclamation and reuse using enteric virus monitoring data. Water Environment Research 70(1),
39–51.
Tawk, H. M., K. Vickery, L. Bisset, W. Selby, and Y. E. Cossart (2006). The impact of hepatitis B
vaccination in a western country: recall of vaccination and serological status in Australian adults.
Vaccine 24(8), 1095–1106.
Teschke, K., Y. Chow, K. Bartlett, A. Ross, and C. van Netten (2001). Spatial and temporal distribution of
airborne Bacillus thuringiensis var. kurstaki during an aerial spray program for gypsy moth eradication.
Environmental Health Perspectives 109(1), 47–54.
Teunis, P. F. M., G. J. Medema, L. Kruidenier, and A. H. Havelaar (1997). Assessment of the risk
of infection by Cryptosporidium or Giardia in drinking water from a surface water source. Water
Research 31, 1333–1346.
Teunis, P. F. M., O. van der Heijden, J. W. B. van der Giessen, and A. H. Havelaar (1996). The dose-
response relation in human volunteers for gastro-intestinal pathogens. Technical report, National In-
stitute of Public Health and the Environment (RIVM), Bilthoven, The Netherlands.
Toze, S. (1999). PCR and the detection of microbial pathogens in water and wastewater. Water Re-
search 33(17), 3545–3556.
Toze, S. (2002). Review of the risk of groundwater contamination from microbial pathogen due to
the infiltration of treated effluent to groundwater at the Bridgetown wastewater treatment plant. A
consultancy report to the Water Corporation, WA. Technical report, CSIRO.
Toze, S. (2004). Literature Review on the Fate of Viruses and Other Pathogens and Health Risks in
Non-Potable Reuse of Storm Water and Reclaimed Water. CSIRO. Accessed: February 1, 2011.
Toze, S., J. Hanna, and J. Sidhu (2005). Microbial monitoring of the McGillivray Oval direct reuse
scheme Report to the Water Corporation WA. Technical report, CSIRO.
Toze, S., J. Hanna, A. Smith, and W. Hick (2002). Halls Head indirect treated wastewater reuse scheme.
Technical report, CSIRO.
Toze, S., J. Hanna, T. Smith, L. Edmonds, and A. McCrow (2004). Determination of water quality
improvements due to the artificial recharge of treated effluent. In J. Steenworden and T. Endreny
(Eds.), IAHS Publications-Series of Proceedings and Reports: Wastewater reuse and groundwater
quality, Volume 285, pp. 53–60. Wallingford [Oxfordshire]: IAHS, 1981-.
Trought, M. C. T. and R. G. V. Bramley (2011). Vineyard variability in Marlborough, New Zealand:
characterising spatial and temporal changes in fruit composition and juice quality in the vineyard.
Australian Journal of Grape and Wine Research 17(1), 79–89.
Van Allen, T., R. Greiner, and P. Hooper (2001). Bayesian error-bars for Belief Net inference. In
Proceedings of the Seventeenth Conference on Uncertainty in Artificial Intelligence (UAI-01), Seattle.
Citeseer.
Van Allen, T., A. Singh, R. Greiner, and P. Hooper (2008). Quantifying the uncertainty of a Belief Net
response: Bayesian error-bars for Belief Net inference. Artificial Intelligence 172, 483–513.
Varis, O. (1995). Belief networks for modelling and assessment of environmental change. Environ-
metrics 6, 439–444.
Varis, O. (1997). Bayesian decision analysis for environmental and resource management. Environmental
Modelling and Software 12(2-3), 177–185.
Varis, O. (1998). A belief network approach to optimization and parameter estimation: application to
resource and environmental management. Artificial Intelligence 101(1-2), 135–163.
Verbyla, A., B. Cullis, M. Kenward, and S. Welham (1999). The analysis of designed experiments
and longitudinal data by using smoothing splines. Journal of the Royal Statistical Society: Series C
(Applied Statistics) 48(3), 269–311.
VSN International (2011). Genstat. Available online: http://www.vsni.co.uk/software/genstat/.
Wakefield, J., N. Best, and L. Waller (2000). Bayesian approaches to disease mapping. In P. Elliott,
J. Wakefield, N. Best, and D. Briggs (Eds.), Spatial Epidemiology: Methods and Applications, pp.
104–127. Oxford: Oxford University Press.
Waller, L. A., B. P. Carlin, H. Xia, and A. E. Gelfand (1997). Hierarchical spatio-temporal mapping of
disease rates. Journal of the American Statistical Association 92(438), 607–617.
Wand, M. P. (2009). Semiparametric and graphical models. Australian and New Zealand Journal of
Statistics 51(1), 9–41.
Wang, L. A. and Z. Goonewardene (2004). The use of mixed models in the analysis of animal experiments
with repeated measures data. Canadian Journal of Animal Science 84(1), 1–11.
Ward, R., D. Bernstein, E. Young, J. Sherwood, D. Knowlton, and G. Schiff (1986). Human Rotavirus
studies in volunteers: determination of infectious dose and serological response to infection. Journal
of Infectious Diseases 154(5), 871–880.
Water Corporation (2010). Subiaco Wastewater Treatment Plant Annual Report 2009-10. Technical
Report PM-3851463, Water Corporation, Perth, Western Australia.
Water Corporation (2011a). McGillivray Oval Irrigation Project. Available online:
http://www.watercorporation.com.au/M/mcgillivray_oval.cfm. Accessed: February
2, 2011.
Water Corporation (2011b). Subiaco treatment plant. Available online:
http://www.watercorporation.com.au/W/wwtp_subiaco.cfm. Accessed: February 2,
2011.
Water Environment Research Foundation, A. Olivieri, and C. Summers (2007). Assessing risk of
pathogens in separate stormwater systems. Available online: http://www.werf.org/am/. Accessed:
February 17, 2011.
Weidl, G., A. Madsen, and E. Dahlquist (2003). Object oriented Bayesian network for industrial process
operation.
Weidl, G., A. L. Madsen, and S. S. Israelson (2005). Applications of object-oriented Bayesian networks
for condition monitoring, root cause analysis and decision support on operation of complex continuous
processes. Computers and Chemical Engineering 29, 1996–2009.
Wermuth, N. and D. R. Cox (1998). On association models defined over independence graphs.
Bernoulli 4(4), 477–495.
West, M. and J. Harrison (1997). Bayesian forecasting and dynamic models (2nd ed.). Springer series in
statistics. New York: Springer.
Westrell, T., O. Bergstedt, T. Stenstrom, and N. Ashbolt (2003). A theoretical approach to assess micro-
bial risks due to failures in drinking water systems. International Journal of Environmental Health
Research 13, 181–197.
Whelan, B. M., A. B. McBratney, and B. Minasny (2001). Vesper-spatial prediction software for precision
agriculture. In G. Grenier and S. Blackmore (Eds.), Third European Conference on Precision Agriculture,
pp. 139–144. Agro Montpellier, Ecole Nationale Agronomique de Montpellier, pp. 18–20. Citeseer.
Whiting, R. C. and R. L. Buchanan (1997). Development of a quantitative risk assessment model for
Salmonella enteritidis in pasteurized liquid eggs. International Journal of Food Microbiology 36,
111–125.
Whittaker, J. (1990). Graphical Models in Multivariate Statistics. Chichester (England); New York:
Wiley.
Williams, E. R. (1986). A neighbour model for field experiments. Biometrika 73(2), 279–287.
Wong, V. N. L., B. W. Murphy, T. B. Koen, R. S. B. Greene, and R. C. Dalal (2008). Soil organic carbon
stocks in saline and sodic landscapes. Australian Journal of Soil Research 46(4), 378–389.
Woo, D. M. and K. J. Vicente (2003). Sociotechnical systems, risk management, and public health: com-
paring the North Battleford and Walkerton outbreaks. Reliability Engineering & System Safety 80(3),
253–269.
World Health Organization (2008). ICD-10 Classification of Diseases. Available online:
www.cdc.gov/nchs/data/dvs/2008Vol1.pdf. Accessed: April 10, 2008.
Yan, P. and M. K. Clayton (2006). A cluster model for space-time disease counts. Statistics in
Medicine 25(5), 867–881.