Stacked generalization of statistical learners – a case study with soil iron content in Brazil

Pedometrics 2017, Wageningen, NL, Thursday 29 June

Parallel session on Machine learning for soil mapping (5H), chaired by Laura Poggio

A. (Alessandro) Samuel-Rosa* & R. S. D. Dalmolin

* [email protected]


25 years ago

Machine learning for soil mapping (5H): Stacked generalization of statistical learners – a case study with soil iron in Brazil


Nowadays

Model-based Gaussian and robust geostatistics (georob)

Andreas Papritz (May 9, 2017)

Nowadays


● Legacy soil data

● Suboptimal geographic/feature coverage

● Extrapolation/reference area method

25 years ago


Stacked generalization (regression)


Goals

1) Combine statistical learners, and
2) Improve generalization accuracy

Learning error vs. generalization error?

Stacked generalization (regression)


Strategy

1) Metamodel with cross-validation predictions as covariates
2) Constrained metamodel coefficients to drop redundant covariates/models
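The strategy above can be sketched in a few lines of plain Python. This is a minimal illustration, not the authors' implementation: it assumes the coefficient constraint is non-negativity with no intercept (the slide does not say which constraint was used), and `kfold_indices`, `out_of_fold_predictions`, `nnls_coordinate_descent`, `fit_mean` and `fit_line` are hypothetical helpers. The two toy learners stand in for the six learners of the case study; the redundant one should receive a weight close to zero.

```python
import random


def kfold_indices(n, k, seed=0):
    """Randomly split record indices 0..n-1 into k folds of (almost) equal size."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def out_of_fold_predictions(X, y, learners, k=10, seed=0):
    """Metamodel covariates: one column of k-fold CV predictions per learner.

    Each learner is a function fit(X_train, y_train) returning a predict(x) function.
    """
    n = len(y)
    Z = [[0.0] * len(learners) for _ in range(n)]
    for test_idx in kfold_indices(n, k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(n) if i not in held_out]
        X_train = [X[i] for i in train_idx]
        y_train = [y[i] for i in train_idx]
        for j, fit in enumerate(learners):
            predict = fit(X_train, y_train)
            for i in test_idx:
                Z[i][j] = predict(X[i])  # record i was not used to fit this model
    return Z


def nnls_coordinate_descent(Z, y, n_sweeps=500):
    """Fit y ~ Z w subject to w >= 0 (no intercept) by cyclic coordinate descent."""
    n, p = len(Z), len(Z[0])
    w = [0.0] * p
    r = list(y)  # residuals y - Z w for the current w (w is all zeros at the start)
    col_ss = [sum(Z[i][j] ** 2 for i in range(n)) for j in range(p)]
    for _ in range(n_sweeps):
        for j in range(p):
            if col_ss[j] == 0.0:
                continue
            # Exact 1-D least-squares step for w[j], then projection onto w[j] >= 0
            step = sum(Z[i][j] * r[i] for i in range(n)) / col_ss[j]
            new_wj = max(0.0, w[j] + step)
            delta = new_wj - w[j]
            if delta != 0.0:
                for i in range(n):
                    r[i] -= delta * Z[i][j]
                w[j] = new_wj
    return w


def fit_mean(X, y):
    """Toy learner: always predicts the training mean."""
    m = sum(y) / len(y)
    return lambda x: m


def fit_line(X, y):
    """Toy learner: simple linear regression on the first covariate."""
    xs = [row[0] for row in X]
    mx, my = sum(xs) / len(xs), sum(y) / len(y)
    slope = sum((a - mx) * (b - my) for a, b in zip(xs, y)) / sum((a - mx) ** 2 for a in xs)
    intercept = my - slope * mx
    return lambda x: intercept + slope * x[0]


if __name__ == "__main__":
    X = [[float(i)] for i in range(40)]
    y = [2.0 * row[0] + 1.0 for row in X]
    Z = out_of_fold_predictions(X, y, [fit_mean, fit_line], k=10)
    weights = nnls_coordinate_descent(Z, y)
    print(weights)  # close to [0.0, 1.0]: the mean learner gets almost no weight
```

Coordinate descent is used here only because it keeps the sketch dependency-free; any non-negative least-squares solver would serve the same purpose. To predict at new locations, the base learners would be refitted on all learning data and their predictions combined with `weights`.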

Case study


● Brazilian soil database

● 5770 soil profiles (22,981 records)

● 70% learning / 30% evaluation

● iron ~ depth + taxon + colour + parent + carbon + clay + ph

1. Linear regression with stepwise selection
2. Multivariate adaptive regression splines
3. Regression random forest
4. Single-hidden-layer neural network
5. Weighted k-nearest neighbor regression
6. Support vector machine with polynomial kernel

10-fold cross-validation

Metamodel

(Meta)Model evaluation


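Learner        ME      MSE    MAE   RMSE   AVE   Coef.
rf           0.56  2369.58  27.82  48.68  0.55  0.5378
svm         -5.24  2600.78  28.69  51.00  0.51  0.2773
kknn        -0.19  2427.48  28.75  49.27  0.54  0.0796
mars        -0.19  2577.03  29.84  50.76  0.51  0.0752
nnet         0.17  2721.52  31.76  52.17  0.49  0.0403
lm          -0.31  2875.50  32.70  53.62  0.46  0.0000
Metamodel   -0.53  2349.93  27.63  48.48  0.56  1.0101

ME: mean error; MSE: mean squared error; MAE: mean absolute error; RMSE: root mean squared error; AVE: amount of variance explained; Coef.: metamodel coefficient of each learner (sum of the coefficients in the Metamodel row).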
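The validation statistics in the table can be computed as sketched below. This is an illustrative sketch, not the authors' code: it assumes errors are defined as predicted minus observed (the sign convention of ME is not given on the slide) and that AVE is the amount of variance explained, 1 - SSE/SST. `evaluation_metrics` is a hypothetical helper name. The final loop checks that each reported RMSE is the square root of the reported MSE, rounded to two decimals.

```python
import math


def evaluation_metrics(observed, predicted):
    """ME, MSE, MAE, RMSE and AVE of predictions against observed values.

    Assumption: errors are predicted minus observed, so a negative ME means
    the learner underpredicts on average.
    """
    n = len(observed)
    errors = [p - o for o, p in zip(observed, predicted)]
    mse = sum(e * e for e in errors) / n
    mean_obs = sum(observed) / n
    total_ss = sum((o - mean_obs) ** 2 for o in observed)
    return {
        "ME": sum(errors) / n,                    # bias
        "MSE": mse,
        "MAE": sum(abs(e) for e in errors) / n,
        "RMSE": math.sqrt(mse),
        "AVE": 1.0 - n * mse / total_ss,          # 1 - SSE / SST
    }


if __name__ == "__main__":
    print(evaluation_metrics([1.0, 2.0, 3.0, 4.0], [2.0, 2.0, 3.0, 3.0]))

    # Consistency check on the table: RMSE = sqrt(MSE), to two decimals
    table = {"rf": (2369.58, 48.68), "svm": (2600.78, 51.00), "kknn": (2427.48, 49.27),
             "mars": (2577.03, 50.76), "nnet": (2721.52, 52.17), "lm": (2875.50, 53.62),
             "Metamodel": (2349.93, 48.48)}
    for learner, (mse, rmse) in table.items():
        assert abs(math.sqrt(mse) - rmse) < 0.005, learner
```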

Evaluation


Conclusions


● Stack of learners is superior
  – Always?

● Easy to compute prediction error variance
  – Standard regression/classification formulas

● Environmental interpretation
  – Danger zone!?

● Cannot work miracles
  – Data quality/quantity, diversity of learners
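The point about prediction error variance can be made concrete under a simplifying assumption: if the metamodel is treated as an ordinary least-squares regression of the observed iron content $\mathbf{y}$ on the $n \times p$ matrix $\mathbf{Z}$ of cross-validation predictions, the textbook formula gives the prediction error variance at a new location whose base-learner predictions are $\mathbf{z}_0$. The symbols are introduced here for illustration, and the formula ignores both the constraint on the coefficients and the uncertainty of the base learners themselves, so it is an approximation rather than the authors' stated method.

$$
\operatorname{Var}\left(y_0 - \mathbf{z}_0^\top \hat{\mathbf{w}}\right) = \sigma^2 \left(1 + \mathbf{z}_0^\top \left(\mathbf{Z}^\top \mathbf{Z}\right)^{-1} \mathbf{z}_0\right),
\qquad
\hat{\sigma}^2 = \frac{\lVert \mathbf{y} - \mathbf{Z}\hat{\mathbf{w}} \rVert^2}{n - p}
$$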


This project was developed under the auspices of the Postgraduate Program in Soil Science of the Federal University of Santa Maria as part of the National Postdoctoral Program (PNPD) of the Coordination for the Improvement of Higher Education Personnel (CAPES) of the Ministry of Education of Brazil.