Post on 21-Dec-2015
Simplification of Mechanistic Models
Neil Crout
School of Bioscience
University of Nottingham
Context
• Interests in the prediction of contaminant transfer in the environment
• Originally followed rather ‘mechanistic’ approaches
• Drifted gradually to increasingly empirical models
• ‘The proposers seem to believe that they will improve predictions if they remove all understanding of processes...’
- anonymous grant reviewer
Mechanistic Models - Model Complexity
• Environmental systems are complex
– Many interacting processes
– Data is often limited
• Even detailed process based models are simplifications of the real systems
• Judgements are being made about the appropriate level of detail in models
• Often this is done in a rather ad hoc fashion
• Not much use of model selection methods etc
Model Selection/Model Averaging etc
• Methods exist for choosing the most predictively reliable model
• Or averaging over a family of models
• Why aren’t they applied much?
• They require a family of alternative models for a system
• These are not easy to create for mechanistic models (unlike linear models)
• Can we find ways to automatically simplify mechanistic models?
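For least-squares fits, two of the criteria compared later in these slides (AIC and BIC) can be computed from the residual sum of squares, and model averaging can use weights derived from them. The sketch below assumes Gaussian errors and drops additive constants; the function names and the three-model family (its RSS values and parameter counts) are hypothetical and purely illustrative, not results from the talk.

```python
import numpy as np

def aic(rss, n, k):
    """AIC for a least-squares fit (Gaussian errors, constants dropped)."""
    return n * np.log(rss / n) + 2 * k

def bic(rss, n, k):
    """BIC for a least-squares fit (Gaussian errors, constants dropped)."""
    return n * np.log(rss / n) + k * np.log(n)

def akaike_weights(criteria):
    """Relative weights for averaging over a family of models."""
    delta = np.asarray(criteria, dtype=float) - np.min(criteria)
    w = np.exp(-0.5 * delta)
    return w / w.sum()

# Hypothetical family: three models fitted to the same n = 50 observations.
n = 50
rss_values = [12.0, 12.5, 20.0]   # made-up residual sums of squares
n_params = [6, 3, 1]              # made-up numbers of adjustable parameters

aic_values = [aic(r, n, k) for r, k in zip(rss_values, n_params)]
bic_values = [bic(r, n, k) for r, k in zip(rss_values, n_params)]
weights = akaike_weights(aic_values)
best = int(np.argmin(aic_values))  # index of the model with the lowest AIC

print("AIC:", np.round(aic_values, 2))
print("BIC:", np.round(bic_values, 2))
print("weights:", np.round(weights, 3), "best model index:", best)
```

Both criteria penalise the number of adjustable parameters, so a model with a slightly larger RSS but far fewer parameters can still be preferred.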
Example: Radiocaesium Contamination
Radiocaesium Plant Uptake Model
[Figure: model structure diagram. Labelled variables: pH, mcamg, Ex-K, %clay, %OM, mNH4, Kdclay, mK, Kdhumus, Kdl, TF, CEChumus, RIPclay, Kexhumus, CECclay]
‘Automatic Simplification’
• Starting from this ‘mechanistic’ model create ‘simpler’ models and compare performance
• Recipe:
– Identify model variables to be removed
– Replace them (one at a time) with the mean value they attain over an un-simplified run
– The variable whose replacement gives the best performance is then permanently replaced
– (Refit any adjustable parameters)
– Repeat with the remaining variables until they are all replaced by constants
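The recipe above can be sketched in a few lines. This is a minimal illustration, not the implementation used in the talk: `run_model` is a hypothetical three-variable toy model (not the radiocaesium model), the data are synthetic, performance is measured by RSS, and the optional refitting step is omitted.

```python
import numpy as np

def run_model(x, fixed=None):
    """Toy stand-in for a mechanistic model (hypothetical).

    Returns the prediction and every intermediate variable. Any variable
    named in `fixed` is replaced by the given constant.
    """
    fixed = fixed or {}
    v = {}
    v["a"] = fixed.get("a", 1.0 + 0.5 * x[:, 0])
    v["b"] = fixed.get("b", np.exp(-x[:, 1]))
    v["c"] = fixed.get("c", 2.0 + 0.01 * x[:, 2])  # barely varies
    pred = np.broadcast_to(v["a"] * v["c"] + v["b"], x.shape[:1])
    return pred, v

def rss(obs, pred):
    """Residual sum of squares (one possible performance measure)."""
    return float(np.sum((obs - pred) ** 2))

def simplify(x, obs, variables, score=rss):
    """Greedily replace model variables by their un-simplified-run means."""
    _, full = run_model(x)
    means = {name: float(np.mean(full[name])) for name in variables}
    fixed, remaining, sequence = {}, list(variables), []
    while remaining:
        # Try replacing each remaining variable, one at a time.
        trial_scores = {}
        for name in remaining:
            trial = dict(fixed)
            trial[name] = means[name]
            pred, _ = run_model(x, trial)
            trial_scores[name] = score(obs, pred)
        # Permanently replace the variable whose removal performs best.
        best = min(trial_scores, key=trial_scores.get)
        fixed[best] = means[best]
        remaining.remove(best)
        sequence.append((best, trial_scores[best]))
    return sequence

# Synthetic data: full-model output plus noise (illustrative only).
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, size=(200, 3))
obs = run_model(x)[0] + rng.normal(0.0, 0.05, size=200)

sequence = simplify(x, obs, ["a", "b", "c"])
for name, value in sequence:
    print(f"replaced {name}: RSS = {value:.3f}")
```

Passing a different `score` function (for example one that penalises the number of remaining variables) would yield the sequence for another selection criterion.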
Example Results for MDL
[Figure: MDL (y-axis, 0 to 70) against replaced model variable (x-axis), in order: Full Model, RIP_clay, pH, m_camg, kd_humus, CEC_humus, CEC_clay, Kx_humus, theta_humus, Kx_soil, mNH4, theta_clay]
Simplification sequences for various selection criteria
red = plausible models (subjectively)
bold = lowest criteria value
MDL          ICOMP        BIC          AIC          RSS
RIP_clay     RIP_clay     RIP_clay     pH           pH
pH           pH           pH           m_camg       m_camg
m_camg       kd_humus     kd_humus     CEC_humus    CEC_humus
kd_humus     m_camg       m_camg       kd_humus     kd_humus
CEC_humus    CEC_humus    CEC_humus    RIP_clay     CEC_clay
CEC_clay     CEC_clay     CEC_clay     CEC_clay     RIP_clay
Kx_humus     Kx_soil      Kx_soil      Kx_soil      Kx_soil
theta_humus  mNH4         mNH4         mNH4         mNH4
Kx_soil      theta_clay   theta_clay   theta_clay   theta_clay
mNH4         kd_clay      kd_clay      kd_clay      kd_clay
theta_clay   Kdl          Kdl          Kdl          Kdl
kd_clay      theta_humus  Kx_humus     Kx_humus     theta_humus
Kdl          Kx_humus     theta_humus  theta_humus  Kx_humus
1:1 Comparison for Model 0
[Figure: Observed (y-axis) against Predicted (x-axis); both axes run from -3.0 to 6.0]
1:1 Comparison for Model 0 and Model E
[Figure: Observed (y-axis) against Predicted (x-axis); both axes run from -3.0 to 6.0; series: Model 0, Model E]
1:1 Graph for Transfer Factor (Model 0)
[Figure: Observed log(TF) (y-axis) against Predicted log(TF) (x-axis); both axes run from -3.0 to 2.5; legend: Observed]
1:1 Graph for Transfer Factor (Model 0 & E)
[Figure: Observed log(TF) (y-axis) against Predicted log(TF) (x-axis); both axes run from -3.0 to 2.5; series: Model 0, Model E]
Spatial Application: TF over England & Wales
• Principal application is for spatial prediction of uptake to crops
• This example uses the geochemical atlas of England & Wales (E&W)
• Data from c. 6000 soil samples (5x5km resolution)
Distribution of TF across England & Wales
[Figure: Frequency (England & Wales) (y-axis, 0 to 4000) against log(TF) (x-axis, -3.0 to 2.0); series: Model 0, Model E]
Summary
• Want to start with some mechanistic credibility
• Recognise the risk of over-fitting
• Automatically work back to simpler models iteratively (quick, easy) and compare their performance
• Obviously could be made more sophisticated
• Is it ‘just an abuse of statistical measures’?
- paraphrasing an anonymous grant reviewer