A Method for the More Accurate Measurement and Communication of Model Error


Page 1: A Method for the More Accurate Measurement and Communication of Model Error

A Method for the More Accurate Measurement and Communication of Model Error

Scott Fortmann-Roe
University of California, Berkeley

Page 2: A Method for the More Accurate Measurement and Communication of Model Error

Predictions

1) More accurate assessment of prediction error

2) More accurate models

Inferences

3) More accurate measures of significance

4) Altered inferences and conclusions

Page 3: A Method for the More Accurate Measurement and Communication of Model Error

Measures: R2, p-value, AIC

Accuracy

Accessibility

Adaptability

Page 4: A Method for the More Accurate Measurement and Communication of Model Error

The Method: A3

Page 5: A Method for the More Accurate Measurement and Communication of Model Error

Applications

Page 6: A Method for the More Accurate Measurement and Communication of Model Error

Housing Market

Predicting housing price based on house and market attributes

Harrison D, Rubinfeld DL (1978) Hedonic housing prices and the demand for clean air. Journal of Environmental Economics and Management 5: 81–102.

Page 7: A Method for the More Accurate Measurement and Communication of Model Error

Variable        Coefficient  Std. Error  t-Value  p-Value
(Intercept)           7.767       4.989    1.557     0.12
AGE                  -0.015       0.014   -1.096     0.27
ROOMS                 7.006       0.412   17.015   < 0.01
NOX                 -13.314       3.903   -3.412   < 0.01
PUPIL/TEACHER        -1.116       0.148   -7.544   < 0.01
HIGHWAY              -0.025       0.043   -0.584     0.56

Adjusted R2: 0.60; p-Value < 0.01
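As a quick reading aid for the table above: in standard linear regression output, each t-value is the coefficient divided by its standard error. The sketch below recomputes that ratio from the rounded values on the slide. The dictionary layout and the names `rows` and `t_values` are illustrative only, and small differences from the reported t-values are expected because the slide shows rounded inputs.

```python
# Values copied from the linear regression slide: (coefficient, std. error, reported t-value).
rows = {
    "(Intercept)":   (7.767, 4.989, 1.557),
    "AGE":           (-0.015, 0.014, -1.096),
    "ROOMS":         (7.006, 0.412, 17.015),
    "NOX":           (-13.314, 3.903, -3.412),
    "PUPIL/TEACHER": (-1.116, 0.148, -7.544),
    "HIGHWAY":       (-0.025, 0.043, -0.584),
}

# t-value = coefficient / standard error
t_values = {name: coef / se for name, (coef, se, _) in rows.items()}

for name, (_, _, reported) in rows.items():
    print(f"{name:14s} computed {t_values[name]:8.3f}   reported {reported:8.3f}")
```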

Page 8: A Method for the More Accurate Measurement and Communication of Model Error

Variable        Coefficient   CrVa R2  p-Value
-Full Model-                   59.3 %   < 0.01
(Intercept)           7.767   - 0.1 %     0.39
AGE                  -0.015   + 0.0 %     0.22
ROOMS                 7.006  + 22.9 %   < 0.01
NOX                 -13.314   + 0.8 %   < 0.01
PUPIL/TEACHER        -1.116   + 4.6 %   < 0.01
HIGHWAY              -0.025   - 0.2 %     1.00

A3: Linear Model
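The slides do not spell out how the CrVa R2 column is computed. The sketch below shows one plausible reading, under two loudly stated assumptions: that "CrVa R2" is a cross-validated R2 (one minus the out-of-fold squared error of the model divided by that of a mean-only baseline), and that each per-variable row is the drop in that score when the variable is removed. It uses synthetic data and a plain least-squares fit; every function name here (`fit_ols`, `predict`, `cv_r2`) is hypothetical, and the p-value column is not reproduced.

```python
import random

def fit_ols(X, y):
    """Least-squares coefficients (intercept first) via the normal equations."""
    rows = [[1.0] + list(r) for r in X]
    k = len(rows[0])
    # Augmented matrix [X'X | X'y]
    A = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * t for r, t in zip(rows, y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting
    for c in range(k):
        p = max(range(c, k), key=lambda i: abs(A[i][c]))
        A[c], A[p] = A[p], A[c]
        for i in range(k):
            if i != c:
                f = A[i][c] / A[c][c]
                A[i] = [a - f * b for a, b in zip(A[i], A[c])]
    return [A[i][k] / A[i][i] for i in range(k)]

def predict(beta, X):
    return [beta[0] + sum(b * v for b, v in zip(beta[1:], r)) for r in X]

def cv_r2(X, y, folds=5, seed=0):
    """Assumed definition: 1 - SSE(model) / SSE(training-fold mean), both out of fold."""
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)
    sse_model = sse_null = 0.0
    for f in range(folds):
        test = idx[f::folds]
        held_out = set(test)
        train = [i for i in idx if i not in held_out]
        y_train = [y[i] for i in train]
        beta = fit_ols([X[i] for i in train], y_train)
        mean_train = sum(y_train) / len(y_train)
        for i, pred in zip(test, predict(beta, [X[i] for i in test])):
            sse_model += (y[i] - pred) ** 2
            sse_null += (y[i] - mean_train) ** 2
    return 1.0 - sse_model / sse_null

# Synthetic data: y depends on x1 only; x2 is pure noise.
rng = random.Random(1)
X = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(200)]
y = [3.0 * x1 + rng.gauss(0, 1) for x1, _ in X]

full = cv_r2(X, y)
contribution = {}
for j, name in enumerate(["x1", "x2"]):
    reduced = [[v for m, v in enumerate(r) if m != j] for r in X]
    # Assumed reading of a per-variable row: full-model score minus score without it.
    contribution[name] = full - cv_r2(reduced, y)

print(f"full model CrVa R2: {full:.3f}")
for name, delta in contribution.items():
    print(f"{name}: {delta:+.3f}")
```

Under this reading, a variable that carries no out-of-sample signal can receive a contribution of zero or slightly below zero, which would be consistent with the negative entries in the table. Swapping `fit_ols` for another learner leaves the scoring logic unchanged.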

Page 9: A Method for the More Accurate Measurement and Communication of Model Error

Variable         CrVa R2  p-Value
-Full Model-      74.3 %   < 0.01
AGE              - 1.5 %     0.01
ROOMS           + 20.4 %   < 0.01
NOX              + 6.3 %   < 0.01
PUPIL/TEACHER    - 1.4 %   < 0.01
HIGHWAY          - 2.6 %     0.03

A3: Random Forest Model

Page 10: A Method for the More Accurate Measurement and Communication of Model Error

Linear Regression
CrVa R2: 0.593
Significant at p = 0.05: ROOMS, NOX, PUPIL/TEACHER
Not significant at p = 0.05: AGE, HIGHWAY

Random Forest
CrVa R2: 0.743
Significant at p = 0.05: AGE, ROOMS, NOX, PUPIL/TEACHER, HIGHWAY
Not significant at p = 0.05: none listed

Support Vector Machines
CrVa R2: 0.711
Significant at p = 0.05: AGE, ROOMS, NOX, PUPIL/TEACHER
Not significant at p = 0.05: HIGHWAY

Page 11: A Method for the More Accurate Measurement and Communication of Model Error

Environmental Productivity

Measure utility of an ecosystem based on different physical attributes

Maestre FT, Quero JL, Gotelli NJ, Escudero A, Ochoa V, et al. (2012) Plant Species Richness and Ecosystem Multifunctionality in Global Drylands. Science 335: 214–218.

Page 12: A Method for the More Accurate Measurement and Communication of Model Error

Variable     Coefficient  Std. Error  t-Value  p-Value
(Intercept)       1.0080       0.175    5.772   < 0.01
SR                0.0099       0.004    2.351     0.02
SLO               0.0176       0.006    3.139   < 0.01
SAC              -0.0174       0.002   -8.523   < 0.01
C1               -0.0209       0.039   -0.537     0.59
C2               -0.0677       0.053   -1.285     0.20
C3                0.0348       0.036    0.979     0.33
C4               -0.2663       0.038   -7.005   < 0.01
LAT               0.0024       0.001    1.797     0.07
LONG             -0.0019       0.001   -3.474   < 0.01
ELE              -0.0002       0.000   -3.887   < 0.01

Adjusted R2: 0.56; p-Value < 0.01

Page 13: A Method for the More Accurate Measurement and Communication of Model Error

Variable      Coefficient   CrVa R2  p-Value
-Full Model-                 52.5 %   < 0.01
(Intercept)         1.008   + 7.2 %   < 0.01
SR                  0.010   + 0.8 %     0.01
SLO                 0.018   + 1.7 %     0.01
SAC                -0.017  + 16.3 %   < 0.01
C1                 -0.021   - 0.5 %     0.91
C2                 -0.068   + 0.0 %     0.15
C3                  0.035   - 0.2 %     0.28
C4                 -0.266  + 10.8 %   < 0.01
LAT                 0.002   + 0.2 %     0.09
LONG               -0.002   + 2.4 %   < 0.01
ELE                 0.000   + 3.0 %   < 0.01

A3: Linear Model

Page 14: A Method for the More Accurate Measurement and Communication of Model Error

Variable       CrVa R2  p-Value
-Full Model-    68.3 %   < 0.01
SR             + 1.2 %   < 0.01
SLO            - 1.3 %     0.95
SAC            + 4.0 %   < 0.01
C1             + 1.8 %   < 0.01
C2            - 0.04 %     0.02
C3             + 0.3 %     0.16
C4             + 0.6 %   < 0.01
LAT            + 0.5 %   < 0.01
LONG           + 0.2 %     0.02
ELE            + 0.4 %     0.02

A3: Random Forest Model

Page 15: A Method for the More Accurate Measurement and Communication of Model Error

[Figure: Relative Importance Predicting Productivity. Chart over SR, SLO, SAC, C1, C2, C3, C4 and ELE (vertical axis 0 to 1) comparing two series: Maestre et al. relative importance (Fig. 2A) and relative importance using random forests.]

Page 16: A Method for the More Accurate Measurement and Communication of Model Error

Applications Recap

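The random forest models explained an additional 15-16% of the squared error relative to the linear models (CrVa R2 rose from 59.3% to 74.3% for housing and from 52.5% to 68.3% for ecosystem productivity)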

Significantly altered inferences and conclusions about the underlying systems
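The 15-16% figure in the recap can be checked against the full-model CrVa R2 values reported on the A3 slides. The short sketch below takes the random forest score minus the linear model score for each application; the dictionary keys are illustrative labels, not names from the slides.

```python
# Full-model CrVa R2 values (in percent) copied from the A3 slides.
cv_r2_percent = {
    "housing":  {"linear": 59.3, "random_forest": 74.3},
    "drylands": {"linear": 52.5, "random_forest": 68.3},
}

# Gain in percentage points from switching the linear model for a random forest.
gains = {
    name: round(scores["random_forest"] - scores["linear"], 1)
    for name, scores in cv_r2_percent.items()
}

print(gains)  # → {'housing': 15.0, 'drylands': 15.8}
```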

Page 17: A Method for the More Accurate Measurement and Communication of Model Error

Questions?