Testing Predictive Performance of Ecological Niche Models
A. Townsend Peterson, STOLEN FROM Richard Pearson
Niche Model Validation
• Diverse challenges …
– Not a single loss function or optimality criterion
– Different uses demand different criteria
– In particular, relative weights applied to omission and commission errors in evaluating models
• Nakamura: “which way is relevant to adopt is not a mathematical question, but rather a question for the user”
– Asymmetric loss functions
Where do I get testing data????
Model calibration and evaluation strategies: resubstitution
(after Araújo et al. 2005 Gl. Ch. Biol.)
[Figure: diagram of the strategy. Labels: 100% of all available data; Calibration, Evaluation, Projection; Same region, Different region, Different time, Different resolution.]
Model calibration and evaluation strategies: independent validation
(after Araújo et al. 2005 Gl. Ch. Biol.)
[Figure: diagram of the strategy. Labels: 100% of all available data; Calibration, Evaluation, Projection; Same region, Different region, Different time, Different resolution.]
Model calibration and evaluation strategies: data splitting
(after Araújo et al. 2005 Gl. Ch. Biol.)
[Figure: diagram of the strategy. Labels: Calibration data and Test data (70% and 30%); Calibration, Evaluation, Projection; Same region, Different region, Different time, Different resolution.]
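A minimal sketch of data splitting, assuming the 70% share is the calibration set and the 30% share is the test set, and that records are partitioned at random. The function name `split_records` and the coordinates are hypothetical, made up for illustration.

```python
import random

def split_records(records, calibration_fraction=0.7, seed=0):
    """Randomly partition occurrence records into calibration and test sets."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # fixed seed so the split is repeatable
    n_cal = round(len(shuffled) * calibration_fraction)
    return shuffled[:n_cal], shuffled[n_cal:]

# Hypothetical occurrence records as (longitude, latitude) pairs.
records = [(-95.0 + i * 0.1, 38.0 + i * 0.05) for i in range(20)]
calibration, test = split_records(records)
print(len(calibration), len(test))
```

A random split keeps test points in the same region as the calibration points, so it says little about transfer to a different region, time, or resolution.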
Types of Error
The four types of results that are possible when testing a distribution model
(see Pearson NCEP module 2007)
Presence-absence confusion matrix

                     Recorded present      Recorded (or assumed) absent
Predicted present    a (true positive)     b (false positive)
Predicted absent     c (false negative)    d (true negative)
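A short sketch of how the four cells might be tallied from test data, assuming observations and thresholded predictions are coded as 1 (present) and 0 (absent). The function name and the example vectors are hypothetical.

```python
def confusion_matrix(observed, predicted):
    """Return (a, b, c, d) from parallel sequences of 0/1 values."""
    a = b = c = d = 0
    for obs, pred in zip(observed, predicted):
        if pred and obs:
            a += 1  # true positive
        elif pred and not obs:
            b += 1  # false positive (commission error)
        elif obs:
            c += 1  # false negative (omission error)
        else:
            d += 1  # true negative
    return a, b, c, d

# Hypothetical test set: 4 recorded presences, 6 recorded (or assumed) absences.
observed = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
a, b, c, d = confusion_matrix(observed, predicted)
print(a, b, c, d)
```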
Thresholding
Selecting a decision threshold (p/a data)
(Liu et al. 2005 Ecography 28:385-393)
[Figure: Kappa (y-axis, 0 to 0.7) plotted against threshold (x-axis, 0 to 1).]
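One way to read a Kappa-versus-threshold curve is to pick the threshold where Kappa peaks. The sketch below assumes that rule (it is one option among several, not necessarily the one recommended in the cited paper), scans a hypothetical grid of candidate thresholds, and uses made-up scores.

```python
def kappa(a, b, c, d):
    """Cohen's Kappa from the four confusion-matrix cells."""
    n = a + b + c + d
    expected = ((a + c) * (a + b) + (b + d) * (c + d)) / n
    if n == expected:  # degenerate table; avoid division by zero
        return 0.0
    return ((a + d) - expected) / (n - expected)

def kappa_at(threshold, scores, observed):
    """Kappa when sites scoring >= threshold are predicted present."""
    a = b = c = d = 0
    for score, obs in zip(scores, observed):
        pred = score >= threshold
        if pred and obs:
            a += 1
        elif pred:
            b += 1
        elif obs:
            c += 1
        else:
            d += 1
    return kappa(a, b, c, d)

# Hypothetical model output and presence/absence records.
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.2, 0.1, 0.05]
observed = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
candidates = [i / 10 for i in range(11)]
best = max(candidates, key=lambda t: kappa_at(t, scores, observed))
print(best, kappa_at(best, scores, observed))
```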
Selecting a decision threshold (p/a data)
Omission (proportion of presences predicted absent): c/(a + c)
Commission (proportion of absences predicted present): b/(b + d)
Selecting a decision threshold (p-o data)
[Figure: omission rate (y-axis, 0 to 1) plotted against threshold (x-axis, 0 to 100), with the LPT and T10 thresholds marked.]
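A hedged sketch of two presence-only thresholds. It assumes that LPT means the lowest predicted value at any presence record and that T10 means the value that omits the lowest-scoring 10% of presence records; the slide does not spell out either definition. Function names and scores (on a 0 to 100 scale) are hypothetical.

```python
def lowest_presence_threshold(presence_scores):
    """LPT (assumed definition): lowest score at any presence record."""
    return min(presence_scores)

def fixed_omission_threshold(presence_scores, omission_percent=10):
    """T10 by default (assumed definition): omit the lowest 10% of presences."""
    ordered = sorted(presence_scores)
    n_omitted = len(ordered) * omission_percent // 100
    return ordered[n_omitted]

def omission_rate(presence_scores, threshold):
    """Proportion of presence records scoring below the threshold."""
    return sum(s < threshold for s in presence_scores) / len(presence_scores)

# Hypothetical predicted values at 20 presence records.
presence_scores = [12, 35, 41, 48, 52, 55, 60, 63, 66, 70,
                   72, 75, 78, 80, 83, 85, 88, 90, 93, 97]
lpt = lowest_presence_threshold(presence_scores)
t10 = fixed_omission_threshold(presence_scores)
print(lpt, omission_rate(presence_scores, lpt))
print(t10, omission_rate(presence_scores, t10))
```

By construction LPT gives zero omission on the records used to set it, while T10 trades a fixed 10% omission for a smaller predicted area.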
Threshold-dependent Tests (= loss functions)
Presence-absence test statistics
Proportion (%) correctly predicted (or ‘accuracy’, or ‘correct classification rate’):
(a + d)/(a + b + c + d)
Cohen’s Kappa:
\[
k = \frac{(a + d) - \dfrac{(a + c)(a + b) + (b + d)(c + d)}{n}}{n - \dfrac{(a + c)(a + b) + (b + d)(c + d)}{n}}, \qquad n = a + b + c + d
\]
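A worked example may help here. The counts below are hypothetical (a = 40, b = 10, c = 5, d = 45, so n = 100), and Kappa is taken in its usual form: observed agreement minus chance agreement, divided by n minus chance agreement.

```latex
\begin{align*}
\text{accuracy} &= \frac{a + d}{a + b + c + d} = \frac{40 + 45}{100} = 0.85 \\
\text{chance agreement} &= \frac{(a + c)(a + b) + (b + d)(c + d)}{n}
  = \frac{45 \cdot 50 + 55 \cdot 50}{100} = 50 \\
k &= \frac{(a + d) - 50}{n - 50} = \frac{85 - 50}{100 - 50} = 0.70
\end{align*}
```

Kappa comes out lower than accuracy because half of the 100 cases would be classified correctly by chance alone given these marginal totals.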
Presence-only test statistics
Proportion of observed presences correctly predicted (or ‘sensitivity’, or ‘true positive fraction’):
a/(a + c)
Proportion of observed presences incorrectly predicted (or ‘omission rate’, or ‘false negative fraction’):
c/(a + c)
Presence-only test statistics: testing for statistical significance
Leaf-tailed gecko (Uroplatus): U. sikorae
Success rate: 4 from 7; proportion predicted present: 0.231; binomial p = 0.0546
Success rate: 6 from 7; proportion predicted present: 0.339; binomial p = 0.008
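The two p-values can be approximately reproduced with an exact one-tailed binomial test, assuming the null hypothesis is that each test record falls in the predicted-present area with probability equal to the proportion of the area predicted present. That reading of the test is an assumption; the function name is hypothetical.

```python
from math import comb

def binomial_upper_tail(successes, trials, p):
    """P(X >= successes) for X ~ Binomial(trials, p)."""
    return sum(
        comb(trials, k) * p**k * (1 - p) ** (trials - k)
        for k in range(successes, trials + 1)
    )

# Values taken from the two U. sikorae examples.
p_first = binomial_upper_tail(4, 7, 0.231)
p_second = binomial_upper_tail(6, 7, 0.339)
print(round(p_first, 4), round(p_second, 3))
```

The results land close to the quoted 0.0546 and 0.008; small differences in the last digit are to be expected if the proportions were rounded before being reported.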
Absence-only test statistics
Proportion of observed (or assumed) absences correctly predicted (or ‘specificity’, or ‘true negative fraction’):
d/(b + d)
Proportion of observed (or assumed) absences incorrectly predicted (or ‘commission rate’, or ‘false positive fraction’):
b/(b + d)
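The presence-only and absence-only statistics can be gathered in one small helper. This is a sketch with hypothetical counts; the function name is made up for illustration.

```python
def rates(a, b, c, d):
    """Threshold-dependent rates from the four confusion-matrix cells."""
    return {
        "sensitivity": a / (a + c),  # presences correctly predicted
        "omission": c / (a + c),     # presences predicted absent
        "specificity": d / (b + d),  # absences correctly predicted
        "commission": b / (b + d),   # absences predicted present
    }

# Hypothetical counts: 45 recorded presences, 55 recorded (or assumed) absences.
r = rates(a=40, b=10, c=5, d=45)
for name, value in r.items():
    print(f"{name}: {value:.3f}")
```

Sensitivity and omission sum to 1, as do specificity and commission, so each pair carries the same information.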
AUC: a threshold-independent test statistic
sensitivity = a/(a + c) (= 1 – omission rate)
specificity = d/(b + d); 1 – specificity = fraction of absences predicted present
[Figure: ROC plot of sensitivity (y-axis, 0 to 1) against 1 – specificity (x-axis, 0 to 1), with frequency histograms of predicted probability of occurrence (0 to 1) for the set of ‘absences’ and the set of ‘presences’.]
Threshold-independent assessment: The Receiver Operating Characteristic (ROC) Curve
[Figure: three panels labeled A, B and C.]
(check out: http://www.anaesthetist.com/mnm/stats/roc/Findex.htm)
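A sketch of two equivalent ways to get the AUC: the fraction of presence-absence pairs in which the presence scores higher (ties count half), and the trapezoid area under the ROC points of sensitivity against 1 – specificity. Function names and scores are hypothetical.

```python
def auc_by_pairs(presence_scores, absence_scores):
    """Probability that a random presence outscores a random absence."""
    wins = 0.0
    for p in presence_scores:
        for q in absence_scores:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(presence_scores) * len(absence_scores))

def roc_points(presence_scores, absence_scores):
    """(1 - specificity, sensitivity) at every observed score, high to low."""
    thresholds = sorted(set(presence_scores) | set(absence_scores), reverse=True)
    points = [(0.0, 0.0)]
    for t in thresholds:
        sens = sum(s >= t for s in presence_scores) / len(presence_scores)
        fpf = sum(s >= t for s in absence_scores) / len(absence_scores)
        points.append((fpf, sens))
    return points

def trapezoid_area(points):
    """Area under a piecewise-linear curve through the given points."""
    return sum(
        (x1 - x0) * (y0 + y1) / 2
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    )

# Hypothetical predicted probabilities of occurrence.
presences = [0.9, 0.8, 0.7, 0.55]
absences = [0.6, 0.4, 0.3, 0.2, 0.1, 0.05]
auc_pairs = auc_by_pairs(presences, absences)
auc_curve = trapezoid_area(roc_points(presences, absences))
print(auc_pairs, auc_curve)
```

An AUC of 0.5 corresponds to a model that ranks presences and absences no better than chance; 1.0 corresponds to perfect separation of the two histograms.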