Plans to improve estimators to better utilize panel data

John Coulston, Southern Research Station

Forest Inventory and Analysis

Background and Motivation

• Symposium session on combining panel data. Recommendation: “…any serious attempt at defining an estimation system for analysis of changes and trends over time must explicitly account for time in the assumed underlying model…adopt and encourage an inferential model for FIA that places time on an equal footing with area…”

• Putting the “A” back in FIA – Clutter 2006.

Examples and approach

• Forest area change in Georgia from 1998-2007

• Spatial realization of forest age structure in Alabama in 2007

• Use an appropriate technique for the question posed

Why reinvent the wheel?

• Some analytical alternatives to Bechtold and Patterson 2005 for the annual forest inventory

  – Mixed estimator (Van Deusen 1999, 2002)

    • Current estimates – flexible underlying trend

  – Mixed model (Smith & Conkling 2005)

    • Current estimates and significance of annual change – linear trend

  – Random Forest (Breiman 2001, Crookston & Finley 2008)

    • Machine learning approach to classification and regression. Implemented in temporal map based estimation.

Is there a trend in Georgia forest area from 1998-2007?

[Figure: map of Georgia with labeled units: Southwest, Southeast, Central, North Central, North.]

Mixed Estimator

$$\bar{y}_t = \mu_t + e_t$$

where

$\bar{y}_t$ = mean value of plots at time t

$\mu_t$ = random coefficient over time (described by a transition equation)

$e_t$ = independent random error

Mixed Model

Stratified Estimate

$$\bar{y} = \sum_{h} w_h \bar{y}_h$$

where

$\bar{y}_h$ = mean plot-level value in stratum h

$w_h$ = weight of stratum h (typically defined with remotely sensed information, NLCD for example)
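As a minimal sketch of the stratified estimate, the weighted sum can be written in a few lines of Python. The function name, the two strata, and all numbers are hypothetical illustrations, and the check that the weights sum to 1 is an assumption not stated on the slide.

```python
def stratified_mean(weights, stratum_means):
    """Return sum over strata h of w_h * ybar_h.

    Assumption (not from the slide): the stratum weights sum to 1.
    """
    if len(weights) != len(stratum_means):
        raise ValueError("need one weight per stratum mean")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("stratum weights must sum to 1")
    return sum(w * y for w, y in zip(weights, stratum_means))


# Hypothetical example: two strata taken from a land cover map.
weights = [0.6, 0.4]          # w_h: share of the area in each stratum
stratum_means = [0.9, 0.1]    # ybar_h: mean plot-level proportion forest
estimate = stratified_mean(weights, stratum_means)
print(round(estimate, 3))
```

With these made-up inputs the estimate is 0.6 × 0.9 + 0.4 × 0.1 = 0.58.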


Example: forest area trends in GA 1998-2007

[Figure: proportion forest (y-axis, 0.50 to 0.75) by year (x-axis, 1998 to 2006), one panel per unit: Central, North, North Central, South East, South West. Series: mixed estimation, mixed model, simple random sample, stratified estimation.]

Example: forest area trends in GA 1998-2007

[Figure: S.E.(proportion forest) (y-axis, 0.010 to 0.025) by year (x-axis, 1998 to 2006), one panel per unit: Central, North, North Central, South East, South West.]


Typical “sampling error” approach

[Figure: proportion forest (y-axis, 0.51 to 0.55) by year (x-axis, 2000 to 2005).]

Hypothesis:

H0: Δpf = 0

H1: Δpf ≠ 0

Approach: Sampling errors overlap, so no significant change.

Issues: Type II errors; failure to leverage repeated measures.
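The Type II error issue can be illustrated with a small sketch. It compares the "do the intervals overlap" rule with a direct test on the difference between two estimates. All numbers are hypothetical, the 1.96 multiplier and the normal approximation are assumptions, and the `cov` argument is only a stand-in for the positive covariance that remeasured plots would contribute.

```python
import math


def intervals_overlap(est1, se1, est2, se2, z=1.96):
    """True if the two approximate 95% intervals overlap."""
    lo1, hi1 = est1 - z * se1, est1 + z * se1
    lo2, hi2 = est2 - z * se2, est2 + z * se2
    return lo1 <= hi2 and lo2 <= hi1


def difference_test(est1, se1, est2, se2, cov=0.0):
    """Two-sided normal-approximation test of H0: est2 - est1 = 0.

    cov is the covariance between the two estimates (0 if independent).
    """
    se_diff = math.sqrt(se1 ** 2 + se2 ** 2 - 2.0 * cov)
    z_value = (est2 - est1) / se_diff
    p_value = math.erfc(abs(z_value) / math.sqrt(2.0))
    return z_value, p_value


# Hypothetical proportion-forest estimates for two years.
est_a, se_a = 0.545, 0.008
est_b, se_b = 0.520, 0.008

overlap = intervals_overlap(est_a, se_a, est_b, se_b)
z_indep, p_indep = difference_test(est_a, se_a, est_b, se_b)
# Assumed correlation of 0.5 between the two estimates (repeated measures).
z_paired, p_paired = difference_test(est_a, se_a, est_b, se_b,
                                     cov=0.5 * se_a * se_b)
print(overlap, round(p_indep, 3), round(p_paired, 4))
```

In this made-up case the intervals overlap, yet the test on the difference rejects H0 at the 0.05 level, and the evidence is stronger still once the covariance from remeasurement is included.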

Explicitly testing for change

• If trend is “sufficiently linear” then the mixed model can be used to test

  • H0: b1 = 0

  • H1: b1 ≠ 0

Unit             b1        t-value    Prob > |t|
Southeast        -0.05%    -0.730     0.466
Southwest         0.13%     1.094     0.274
Central           0.00%    -0.004     0.997
North Central    -0.31%    -2.489     0.013
North            -0.21%    -2.401     0.017

[Figure: proportion forest (y-axis, 0.50 to 0.75) by year (x-axis, 1998 to 2006) for the North and North Central units.]


• Recall the mixed model: b1 is the slope (change in y over time).
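The logic of testing H0: b1 = 0 can be sketched with an ordinary least-squares line fitted to annual estimates. This is a simplified stand-in, not the mixed model used in the talk: it ignores plot-level random effects, the p-value uses a normal approximation rather than a t distribution, and the annual values are hypothetical.

```python
import math


def slope_test(years, values):
    """Fit y = b0 + b1 * year by least squares and test H0: b1 = 0.

    Returns (b1, t_value, p_value); p_value is a two-sided
    normal approximation (an assumption made to stay in the stdlib).
    """
    n = len(years)
    x_mean = sum(years) / n
    y_mean = sum(values) / n
    sxx = sum((x - x_mean) ** 2 for x in years)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(years, values))
    b1 = sxy / sxx
    b0 = y_mean - b1 * x_mean
    residual_ss = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(years, values))
    se_b1 = math.sqrt(residual_ss / (n - 2) / sxx)
    t_value = b1 / se_b1
    p_value = math.erfc(abs(t_value) / math.sqrt(2.0))
    return b1, t_value, p_value


# Hypothetical annual proportion-forest estimates for one unit.
years = list(range(1998, 2008))
values = [0.62, 0.61, 0.62, 0.60, 0.60, 0.59, 0.60, 0.58, 0.58, 0.57]

b1, t_value, p_value = slope_test(years, values)
print(f"b1 = {b1:.4%} per year, t = {t_value:.2f}, p = {p_value:.3g}")
```

A negative b1 with a small p-value would be read, as in the table, as evidence of a declining trend over the period.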

Example 2: Spatial realization of forest age structure in Alabama in 2007

• Using a time series of Landsat images, identify the disturbance year and magnitude for each pixel.

• Calibrate the disturbance year and magnitude information to FIA age class information based on:

$$C_{jz} = f\left(X_z,\; Y_z,\; M_{z(j-d)},\; (j-d)_z,\; F_{jz}\right)$$

where

$C_{jz}$ = the age class for location z at time j

$X_z$ = longitude of location z

$Y_z$ = latitude of location z

$M_{z(j-d)}$ = magnitude of last disturbance in year j-d at location z

$(j-d)_z$ = the number of years since the last disturbance at location z

$F_{jz}$ = land cover in year j at location z

Random Forest Algorithm

Learning algorithm

Each tree is constructed using the following algorithm:

1. Let the number of training cases be N, and the number of variables in the classifier be M.

2. We are told the number m of input variables to be used to determine the decision at a node of the tree; m should be much less than M.

3. Choose a training set for this tree by choosing N times with replacement from all N available training cases (i.e. take a bootstrap sample). Use the rest of the cases to estimate the error of the tree, by predicting their classes.

4. For each node of the tree, randomly choose m variables on which to base the decision at that node. Calculate the best split based on these m variables in the training set.

5. Each tree is fully grown and not pruned (as may be done in constructing a normal tree classifier).
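Steps 3 and 4 above can be sketched in Python. The sketch covers only the two sampling steps (the bootstrap draw with its left-out cases, and the random choice of m variables at a node), not split selection or tree growth. The function names and the values of N, M, and m are hypothetical; M = 5 simply mirrors the five predictors in the calibration function.

```python
import random


def bootstrap_sample(n_cases, rng):
    """Step 3: draw N case indices with replacement.

    Returns the in-bag indices and the left-out ("out-of-bag") indices,
    which can be used to estimate the error of the tree.
    """
    in_bag = [rng.randrange(n_cases) for _ in range(n_cases)]
    out_of_bag = sorted(set(range(n_cases)) - set(in_bag))
    return in_bag, out_of_bag


def choose_split_variables(n_variables, m, rng):
    """Step 4: pick m distinct candidate variables for one node."""
    return rng.sample(range(n_variables), m)


rng = random.Random(42)   # fixed seed so the sketch is repeatable
N, M, m = 100, 5, 2       # hypothetical sizes

in_bag, out_of_bag = bootstrap_sample(N, rng)
candidates = choose_split_variables(M, m, rng)
print(len(in_bag), len(out_of_bag), candidates)
```

On average roughly a third of the cases are left out of each bootstrap sample, which is what makes the per-tree error estimate in step 3 possible.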

Accuracy of Age Class Map

Rows: FIA data. Columns: Random Forest model.

FIA data           0-8    9-16   17-24    25+    non-forest
0-8                233      46      10     35            37
9-16                51     323      42    101            47
17-24               22      34     192    117            21
25+                 38      30      45   1265            16
non-forest          29      29      17    103          1710

User's accuracy    62%     70%     63%    78%           93%

Overall accuracy: 81%
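The accuracy figures can be rechecked from the counts. In this sketch, user's accuracy is taken as the diagonal count divided by the column (mapped class) total, and overall accuracy as the diagonal sum divided by the grand total. Treating rows as FIA data and columns as Random Forest predictions is an assumption about the table layout; it reproduces the reported percentages.

```python
classes = ["0-8", "9-16", "17-24", "25+", "non-forest"]

# Rows: FIA data class; columns: Random Forest model class (assumed layout).
matrix = [
    [233, 46, 10, 35, 37],
    [51, 323, 42, 101, 47],
    [22, 34, 192, 117, 21],
    [38, 30, 45, 1265, 16],
    [29, 29, 17, 103, 1710],
]

k = len(classes)
column_totals = [sum(matrix[i][j] for i in range(k)) for j in range(k)]
diagonal = [matrix[j][j] for j in range(k)]

# User's accuracy: share of pixels mapped to a class that agree with FIA.
users_accuracy = [diagonal[j] / column_totals[j] for j in range(k)]
overall_accuracy = sum(diagonal) / sum(column_totals)

users_pct = [round(100 * a) for a in users_accuracy]
overall_pct = round(100 * overall_accuracy)
print(dict(zip(classes, users_pct)), overall_pct)
```

The rounded results are 62, 70, 63, 78, and 93 percent by class and 81 percent overall, matching the table.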

Conclusions

• No one technique could answer the two questions posed.

• Use the appropriate methodology or combination of methodologies to address your question.

• From the examples, time should be explicitly accounted for when doing trend analysis or making “current” estimates.

• Leverage the longitudinal (repeated measure) data when possible.

• The temporally indifferent method currently used by FIA does generally provide estimates with smaller standard error. However, it is not a current estimate, and the estimate should be tied to the approximate mid-point of the cycle, not the end year.

• All demonstrated techniques run using the R statistical package, which can be directly linked to either internal Oracle tables or FIADB.
