Why Model? Make predictions or forecasts where we don’t have data.



Why Model?

• Make predictions or forecasts where we don’t have data


Linear Regression

Source: Wikipedia


Modeling Process

• Observe
• Define Theory/Type of Model
• Design Experiment
• Collect Data
• Select Model
• Evaluate the Model
• Qualify Data
• Estimate Parameters
• Publish Results


Bouncing Balls
• Observation: balls bounce more when dropped from a greater height
• Theory: there is a linear relationship between the height of a drop and the number of bounces

Source: people.rit.edu


Bouncing Balls (cont.)

• Experimental Design?
• Collect Data?
• Qualify Data?
• Select Model:
– Start with linear regression


Parameter Estimation

• Excel spreadsheet
• X, Y columns
• Add “trend line”
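Outside Excel, a linear trend line can be reproduced with the ordinary least-squares formulas. The sketch below is a minimal Python version, assuming paired lists of drop heights (X) and bounce counts (Y); the function name `fit_line` and the sample values are hypothetical, chosen only so the answer is easy to check.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = sum of cross-deviations / sum of squared x-deviations.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    slope = sxy / sxx
    # The least-squares line always passes through (mean_x, mean_y).
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical drop heights (X column) and bounce counts (Y column).
heights = [0.5, 1.0, 1.5, 2.0, 2.5]
bounces = [3, 5, 7, 9, 11]
slope, intercept = fit_line(heights, bounces)
print(f"y = {slope:.4f}x + {intercept:.4f}")  # prints "y = 4.0000x + 1.0000"
```

Excel's linear trend line is also a least-squares fit, so for the same X, Y columns the two should agree up to rounding.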


Definitions

Horizontal axis: used to create the prediction
– Independent variable
– Predictor variable
– Covariate
– Explanatory variable
– Control variable
– Typically a raster
– Examples:
• Temperature, aspect, SST, precipitation

Vertical axis: what we are trying to predict
– Dependent variable
– Response variable
– Measured value
– Explained variable
– Outcome
– Typically an attribute of points
– Examples:
• Height, abundance, percent, diversity, …


Linear Regression: Assumptions
• Predictors are error free
• Linearity of response to predictors
• Constant variance within and for all predictors (homoscedasticity)
• Independence of errors
• Lack of multicollinearity
• Also:
– All points are equally important
– Residuals are normally distributed (or close)
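The "lack of multicollinearity" assumption can be screened before fitting by checking how strongly the predictors correlate with one another. A minimal Python sketch, assuming two hypothetical predictor columns sampled at the same points; the 0.7 cutoff is an assumed rule of thumb, not a value from these slides.

```python
import math

def pearson_r(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

# Hypothetical predictor columns (e.g. two rasters sampled at the same points).
elevation = [100, 200, 300, 400, 500]
temperature = [20.0, 19.4, 18.7, 18.1, 17.5]

r = pearson_r(elevation, temperature)
THRESHOLD = 0.7  # assumed rule-of-thumb cutoff, not from the slides
if abs(r) > THRESHOLD:
    print(f"r = {r:.3f}: predictors are strongly correlated; consider dropping one")
else:
    print(f"r = {r:.3f}: correlation is below the assumed threshold")
```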


Linear Regression 

 


Normal Distribution

[Figure: normal distribution curve, with tails extending to negative infinity and to positive infinity]
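For reference, the standard form of the normal probability density with mean μ and standard deviation σ (symbol names assumed here); it is positive for every x, which is why its tails extend to negative and positive infinity:

```latex
f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right),
\qquad -\infty < x < \infty
```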


Linear Data Fitted with a Linear Model

The points should fall along a diagonal line when the residuals are normally distributed
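A normal probability (Q-Q) plot pairs each sorted residual with the value a normal distribution would predict at the same rank; normally distributed residuals then fall near a diagonal line. The Python sketch below computes those pairs with the standard library's `statistics.NormalDist`. The function name `qq_points`, the plotting position (i - 0.5)/n (one common choice among several), and the residual values are all assumptions for illustration.

```python
from statistics import NormalDist, mean, stdev

def qq_points(residuals):
    """Pair sorted residuals with theoretical normal quantiles."""
    n = len(residuals)
    ordered = sorted(residuals)
    # Use a normal with the residuals' own mean and spread, so that
    # normally distributed residuals fall near the 1:1 diagonal.
    dist = NormalDist(mean(residuals), stdev(residuals))
    # Plotting position (i - 0.5) / n keeps probabilities strictly in (0, 1).
    theoretical = [dist.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical, ordered))

# Hypothetical residuals from a fitted line.
residuals = [-1.2, -0.4, 0.1, 0.3, 0.9, -0.6, 0.5, 0.4]
for expected, observed in qq_points(residuals):
    print(f"{expected:7.3f} {observed:7.3f}")
```

Plotting the second column against the first (for example with Excel's scatter chart) gives the Q-Q plot.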


Non-Linear Data Fitted with a Linear Model

This shows the residuals are not normally distributed
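The same problem shows up numerically: fitting a straight line to curved data leaves residuals with a systematic pattern instead of random scatter. A minimal Python sketch, assuming hypothetical data that follow y = x² exactly; `fit_line` is a hypothetical helper implementing ordinary least squares.

```python
def fit_line(xs, ys):
    """Least-squares (slope, intercept)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, my - slope * mx

xs = list(range(11))          # 0, 1, ..., 10
ys = [x ** 2 for x in xs]     # hypothetical curved (quadratic) response
slope, intercept = fit_line(xs, ys)
residuals = [y - (slope * x + intercept) for x, y in zip(xs, ys)]
# Residuals are positive at both ends and negative in the middle (a U shape),
# a sign that a straight line is the wrong form for this data.
print([round(r, 1) for r in residuals])
# prints [15.0, 6.0, -1.0, -6.0, -9.0, -10.0, -9.0, -6.0, -1.0, 6.0, 15.0]
```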


Homoscedasticity

• Residuals have the same normal distribution throughout the range of the data
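A rough numerical check of homoscedasticity is to compare the residual variance in the lower and upper halves of the x range. The Python sketch below does this; it is a simplified illustration, not a formal statistical test, and the function name `spread_ratio` and the residual values are hypothetical.

```python
from statistics import pvariance

def spread_ratio(xs, residuals):
    """Ratio of residual variance in the upper half of x to the lower half."""
    pairs = sorted(zip(xs, residuals))
    half = len(pairs) // 2  # with an odd count, the middle point is skipped
    lower = [r for _, r in pairs[:half]]
    upper = [r for _, r in pairs[-half:]]
    return pvariance(upper) / pvariance(lower)

# Hypothetical residuals whose spread grows with x (heteroscedastic).
xs = [1, 2, 3, 4, 5, 6, 7, 8]
residuals = [0.1, -0.1, 0.2, -0.2, 0.8, -0.8, 1.6, -1.6]
ratio = spread_ratio(xs, residuals)
# A ratio near 1 is consistent with constant variance; a ratio far from 1
# suggests the residual spread changes across the range of the data.
print(f"variance ratio (upper/lower) = {ratio:.1f}")
```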


Ordinary Least Squares
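Ordinary least squares picks the intercept and slope that minimize the sum of squared residuals. For a single predictor the minimizing values have a closed form; the symbols β₀, β₁, x̄, ȳ below are standard notation assumed here, not taken from the slide:

```latex
\min_{\beta_0,\beta_1} \sum_{i=1}^{n} \left(y_i - \beta_0 - \beta_1 x_i\right)^2
\quad\Longrightarrow\quad
\hat\beta_1 = \frac{\sum_{i=1}^{n}(x_i-\bar x)(y_i-\bar y)}{\sum_{i=1}^{n}(x_i-\bar x)^2},
\qquad
\hat\beta_0 = \bar y - \hat\beta_1 \bar x
```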


Linear Regression

[Figure: regression line with a residual labeled]
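In the usual notation (assumed here), the model, the fitted value, and the residual for observation i are below; the residual is the vertical distance from the observed point to the fitted line:

```latex
y_i = \beta_0 + \beta_1 x_i + \varepsilon_i,
\qquad
\hat y_i = \hat\beta_0 + \hat\beta_1 x_i,
\qquad
e_i = y_i - \hat y_i
```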


Parameter Estimation


 

 

 


Evaluate the Model



Evaluation

• Find the highest performing model in Excel for the golf ball data

• https://www.youtube.com/watch?v=fss3i1XMMIY


“Goodness of fit”

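A common goodness-of-fit measure, and the one Excel can display on a trend line, is the coefficient of determination, R² = 1 - SS_res / SS_tot: the share of the variation in Y that the model explains. A minimal Python sketch, assuming the predictions already come from a fitted model; the function name `r_squared` and the values are hypothetical.

```python
def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    # Unexplained variation: squared distances from the fitted values.
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    # Total variation: squared distances from the mean of the observations.
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1 - ss_res / ss_tot

observed = [2.0, 4.1, 5.9, 8.2, 9.8]
predicted = [2.0, 4.0, 6.0, 8.0, 10.0]  # e.g. from the line y = 2x at x = 1..5
r2 = r_squared(observed, predicted)
print(f"R^2 = {r2:.4f}")
```

An R² near 1 means the line explains almost all of the variation in Y; an R² near 0 means it explains almost none.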


 

[Chart: linear trend line y = 0.0024x + 0.4347, R² = 0.0051; x axis 0 to 35, y axis 0 to 1.2]


 

[Chart: linear trend line y = 1.0029x + 0.4188, R² = 0.999; x axis 0 to 35, y axis 0 to 35]


Two Approaches

• Hypothesis Testing
– Is a hypothesis supported or not?
– What is the chance that what we are seeing is random?
• Which is the best model?
– Assumes the hypothesis is true (implied)
– Model may or may not support the hypothesis
• Data mining
– Discouraged in spatial modeling
– Can lead to erroneous conclusions


Significance (p-value)

• H0 – Null hypothesis (flat line)
• Hypothesis – regression line not flat
• The smaller the p-value, the more evidence we have against H0
– This supports our hypothesis, but does not prove it is true
• Definition: the probability of getting the observed sample result, or a result “more extreme,” assuming H0 is true
• Informally: how likely a relationship this strong would be to arise by chance alone if there were no real relationship

http://www.childrensmercy.org/stats/definitions/pvalue.htm
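The definition above ("a result more extreme, assuming H0 is true") can be made concrete with a permutation test. This is a swapped-in technique, not necessarily the method used in the course: shuffling Y breaks any real pairing with X, so the shuffled slopes show what a flat-line H0 would produce. A minimal Python sketch; the function names, the data, and the 5,000 shuffles are hypothetical choices.

```python
import random

def slope(xs, ys):
    """Least-squares slope of ys on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

def permutation_p_value(xs, ys, n_perm=5000, seed=0):
    """Two-sided p-value for H0: slope = 0 (flat line), by shuffling ys."""
    rng = random.Random(seed)
    observed = abs(slope(xs, ys))
    shuffled = list(ys)
    extreme = 0
    for _ in range(n_perm):
        # Under H0 the pairing of x and y is arbitrary, so every
        # shuffle is as plausible as the observed data.
        rng.shuffle(shuffled)
        if abs(slope(xs, shuffled)) >= observed:
            extreme += 1
    # Adding 1 to both counts keeps the estimate from being exactly 0.
    return (extreme + 1) / (n_perm + 1)

xs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
ys = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1, 18.0, 20.2]  # hypothetical
p = permutation_p_value(xs, ys)
print(f"p = {p:.4f}")
```

A small p here means almost no shuffled data set produced a slope as steep as the observed one.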


Confidence Intervals

• If the sampling were repeated many times, about 95 percent of the 95% confidence intervals built this way would contain the true value

• Methods:
– Moments (mean, variance)
– Likelihood
– Significance tests (p-values)
– Bootstrapping
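Of the methods listed, bootstrapping is the easiest to sketch without statistical tables: resample the (x, y) pairs with replacement, refit the slope each time, and keep the middle 95% of the resampled slopes (the percentile method). A minimal Python sketch; the function names, data, 2,000 resamples, and seed are hypothetical choices.

```python
import random

def slope(xs, ys):
    """Least-squares slope of ys on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

def bootstrap_slope_ci(xs, ys, n_boot=2000, level=0.95, seed=0):
    """Percentile bootstrap confidence interval for the regression slope."""
    rng = random.Random(seed)
    pairs = list(zip(xs, ys))
    slopes = []
    while len(slopes) < n_boot:
        # Resample (x, y) pairs with replacement, keeping each pair together.
        sample = [rng.choice(pairs) for _ in pairs]
        sample_x = [x for x, _ in sample]
        if len(set(sample_x)) < 2:
            continue  # slope is undefined when every x is the same; redraw
        slopes.append(slope(sample_x, [y for _, y in sample]))
    slopes.sort()
    tail = (1 - level) / 2
    lower = slopes[int(tail * n_boot)]
    upper = slopes[int((1 - tail) * n_boot) - 1]
    return lower, upper

xs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
ys = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1, 18.0, 20.2]  # hypothetical
lower, upper = bootstrap_slope_ci(xs, ys)
print(f"slope = {slope(xs, ys):.3f}, 95% CI = ({lower:.3f}, {upper:.3f})")
```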


Model Evaluation

• Parameter sensitivity
• Ground truthing
• Uncertainty in data AND predictors
– Spatial
– Temporal
– Attributes/Measurements
• Alternative models
• Alternative parameters
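Ground truthing can be sketched as a hold-out check: fit the model on some observations, then measure its prediction error on points it never saw. The Python sketch below uses root-mean-square error (RMSE); the helper names, the data, and the choice of holding back every third point are hypothetical.

```python
import math

def fit_line(xs, ys):
    """Least-squares (slope, intercept)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, my - slope * mx

def rmse(observed, predicted):
    """Root-mean-square error between observations and predictions."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)

# Hypothetical observations; every third point is held back as "ground truth".
xs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
ys = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1, 18.0, 20.2]
held_idx = {2, 5, 8}
train = [(x, y) for i, (x, y) in enumerate(zip(xs, ys)) if i not in held_idx]
held_out = [(x, y) for i, (x, y) in enumerate(zip(xs, ys)) if i in held_idx]

slope, intercept = fit_line([x for x, _ in train], [y for _, y in train])
predictions = [slope * x + intercept for x, _ in held_out]
error = rmse([y for _, y in held_out], predictions)
print(f"hold-out RMSE = {error:.3f}")
```

The same comparison against an independent data set is one way to check "model validated against other data."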


Robust models
• Domain/scope is well defined
• Data is well understood
• Uncertainty is documented
• Model can be tied to the phenomenon
• Model validated against other data
• Sensitivity testing completed
• Conclusions are within the domain/scope or are “possibilities”
• See: https://www.youtube.com/watch?v=HuyMQ-S9jGs


Modeling Process II

• Investigate
• Find Data
• Select Model
• Evaluate the Model
• Qualify Data
• Estimate Parameters
• Publish Results


Research Papers
• Introduction
– Background
– Goal
• Methods
– Area of interest
– Data “sources”
– Modeling approaches
– Evaluation methods
• Results
– Figures
– Tables
– Summary results
• Discussion
– What did you find?
– Broader impacts
– Related results
• Conclusion
– Next steps
• Acknowledgements
– Who helped?
• References
– Include long URLs