Bayesian modeling for ordinal substrate size using EPA stream data Megan Dailey Higgs Jennifer...

  • date post

    20-Dec-2015

Page 1: Bayesian modeling for ordinal substrate size using EPA stream data Megan Dailey Higgs Jennifer Hoeting Brian Bledsoe* Department of Statistics, Colorado.

Bayesian modeling for ordinal substrate size using EPA stream data

Megan Dailey Higgs, Jennifer Hoeting, Brian Bledsoe*

Department of Statistics, Colorado State University
*Department of Civil Engineering, Colorado State University

A spatial model for ordered categorical data

Page 2: Substrate size in streams

► Influences in-stream physical habitat
► Often indicative of stream health
► EPA collected data at 485 sites in Washington and Oregon between 1994 and 2004

Page 3: Data Collection Protocol

► At a site:
  • 11 transects x 5 points along each transect
  • Choose particle under the sharp end of a stick
  • Visually estimate and classify size

Page 4: Creating the response

► For a site:
  • Transform the original size classes to $\log_{10}$(geometric mean) for all sample points
  • Find the median for the site
► Geometric mean

Page 5: The response

► $Y_i$ = median[$\log_{10}$(geometric mean)] for site $i$
► Transformation provides a more symmetric, continuous-like variable
  • Typically modeled as a continuous variable
  • Predictive models have performed poorly
► Response is an ordered categorical variable
  • 12 categories (6 with very few observations)
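The construction of the response described above can be sketched in Python. This is only an illustration: the size-class names and diameter bounds (in mm) are hypothetical placeholders rather than the EPA protocol's class table, and the geometric mean is assumed to be taken over each class's lower and upper diameter bound.

```python
import math
import statistics

# Hypothetical size classes: name -> (lower, upper) particle diameter in mm.
# Illustrative placeholders only; not the EPA protocol's class table.
SIZE_CLASSES = {
    "sand": (0.06, 2.0),
    "fine_gravel": (2.0, 16.0),
    "coarse_gravel": (16.0, 64.0),
    "cobble": (64.0, 250.0),
}

def log10_geometric_mean(size_class):
    """log10 of the geometric mean of a class's diameter bounds (assumed definition)."""
    lower, upper = SIZE_CLASSES[size_class]
    return math.log10(math.sqrt(lower * upper))

def site_response(classes_at_site):
    """Y_i: median of log10(geometric mean) over all sample points at a site."""
    return statistics.median(log10_geometric_mean(c) for c in classes_at_site)

# A toy site with 5 sample points (a real site would have 11 x 5 = 55).
points = ["sand", "cobble", "fine_gravel", "coarse_gravel", "coarse_gravel"]
y_i = site_response(points)
```

Because every sample point takes one of a finite set of class values, the site median also falls on a finite, ordered set of values, which is consistent with treating the response as ordered categorical.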

Page 6: Ordered categorical data

► $Y_i$ is a categorical response variable with $K$ ordered values: $\{1, \dots, K\}$
► Modeling objectives:
  • Explain the variation in the ordered response from covariate(s)
  • Incorporate the spatial dependence
  • Estimate, predict, and create maps of $\Pr(Y_i \le k)$ and $\Pr(Y_i = k)$

Page 7: Formulating the spatial model

Non-spatial model for ordered categorical data
  • Albert & Chib (1993, 1997)
+
Spatial model for binary and count data
  • Diggle, Tawn, & Moyeed (1998)
  • Gelfand & Ravishanker (1998)
=
Spatial model for ordered categorical data

• Generalized geostatistical models with a latent Gaussian process
• Metropolis-Hastings within Gibbs sampling approach

Page 8: Latent variable formulation

► Define latent variable, $Z_i$, such that $Z_i = X_i'\beta + \varepsilon_i$
  • $\varepsilon_i \sim N(0, 1)$ for the probit model
  • $\varepsilon_i \sim$ Standard Logistic for the logit model
► Define the categorical response, $Y_i \in \{1, \dots, K\}$, using $Z_i$ and ordered cut-points, $\theta = (\theta_1, \dots, \theta_{K-1})$, where $0 = \theta_1 < \theta_2 < \dots < \theta_{K-1} < \theta_K = \infty$
  • $Y_i = 1$ if $Z_i < \theta_1$
  • $Y_i = k$ if $\theta_{k-1} \le Z_i < \theta_k$
  • $Y_i = K$ if $Z_i \ge \theta_{K-1}$
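As a minimal sketch of this latent variable construction, the snippet below draws probit latent variables and thresholds them at the cut-points. The coefficient values, cut-points, sample size, and covariate are all hypothetical choices made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical values for illustration: K = 4 categories, one covariate.
beta = np.array([0.5, 1.0])            # (intercept, slope)
theta = np.array([0.0, 1.0, 2.0])      # cut-points theta_1..theta_{K-1}, theta_1 = 0

n = 200
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])   # design matrix with rows X_i'

# Probit version: Z_i = X_i' beta + eps_i with eps_i ~ N(0, 1).
z = X @ beta + rng.normal(size=n)

# Y_i = 1 + (number of cut-points <= Z_i), which reproduces
# Y_i = 1 if Z_i < theta_1, Y_i = k if theta_{k-1} <= Z_i < theta_k,
# and Y_i = K if Z_i >= theta_{K-1}.
y = 1 + np.searchsorted(theta, z, side="right")
```

Counting the cut-points at or below each $Z_i$ avoids writing out one comparison per category and keeps the half-open intervals exactly as defined on the slide.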

Page 9: Latent variable formulation

► Thus,
  $\Pr(Y_i \le k \mid \theta, \beta) = \Pr(Z_i < \theta_k)$
  $\Pr(Y_i = k \mid \theta, \beta) = \Pr(\theta_{k-1} \le Z_i < \theta_k)$
► If $Z_i \sim N(X_i'\beta, 1)$, then
  $\Pr(Y_i \le k \mid \theta, \beta) = \Phi(\theta_k - X_i'\beta)$
  $\Pr(Y_i = k \mid \theta, \beta) = \Phi(\theta_k - X_i'\beta) - \Phi(\theta_{k-1} - X_i'\beta)$
  where $\Phi$ is the N(0,1) cdf
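The probit category probabilities can be computed directly from these formulas. The sketch below uses only the Python standard library; padding the cut-points with $-\infty$ on the left is a convenience introduced here so that the first category is also a difference of two cdf values, and the numeric inputs are hypothetical.

```python
from statistics import NormalDist

def category_probs(eta, theta):
    """Pr(Y = k) for k = 1..K in the probit model.

    eta   : linear predictor X_i' beta
    theta : finite cut-points (theta_1, ..., theta_{K-1}), increasing
    """
    Phi = NormalDist().cdf
    # Pad with -inf and +inf (theta_K) so every category is a cdf difference.
    cuts = [float("-inf"), *theta, float("inf")]
    cdf = [Phi(c - eta) for c in cuts]
    return [cdf[k] - cdf[k - 1] for k in range(1, len(cuts))]

# Hypothetical linear predictor and cut-points for K = 4 categories.
probs = category_probs(eta=0.5, theta=[0.0, 1.0, 2.0])
```

The returned list has one entry per category and sums to one, since the padded cdf values run from 0 to 1.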

Page 10: Spatial cumulative model

► $Z_i = X_i'\beta + W_i + \varepsilon_i$ is the latent variable, where
  • $\varepsilon_i \sim N(0, 1)$
  • $W \sim N(0, \sigma^2 H(\phi))$, with $(H(\phi))_{ij} = \rho(s_i - s_j; \phi)$
  • $Z_i \mid \beta, W_i \sim N(X_i'\beta + W_i, 1)$
► $\Pr(Y_i \le k \mid \beta, \theta, W_i) = \Pr(Z_i < \theta_k) = \Phi(\theta_k - X_i'\beta - W_i)$
  where $\theta = (\theta_1, \dots, \theta_K)$ is a vector of cut-points such that $0 = \theta_1 < \theta_2 < \dots < \theta_{K-1} < \theta_K = \infty$
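The slide states only the cumulative probability. The individual category probabilities follow by differencing adjacent cut-points, exactly as in the non-spatial case; a short worked version, for $2 \le k \le K$ and conditioning on the spatial effect $W_i$:

```latex
\begin{aligned}
\Pr(Y_i = k \mid \beta, \theta, W_i)
  &= \Pr(Y_i \le k \mid \beta, \theta, W_i) - \Pr(Y_i \le k-1 \mid \beta, \theta, W_i) \\
  &= \Phi(\theta_k - X_i'\beta - W_i) - \Phi(\theta_{k-1} - X_i'\beta - W_i)
\end{aligned}
```

For $k = 1$ the second term is absent, so $\Pr(Y_i = 1 \mid \beta, \theta, W_i) = \Phi(\theta_1 - X_i'\beta - W_i)$, and for $k = K$ the first term equals 1 because $\theta_K = \infty$.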

Page 11: Fitting the spatial model

► The likelihood
► Estimating $\beta = (\beta_0, \beta_1)$, the covariance parameters $(\sigma^2, \phi)$, and $\theta = (\theta_2, \dots, \theta_{K-1})$
► Transform $\theta$ to real-valued, unrestricted cut-points, with first component $\log(\theta_2)$ and $k$-th component $\log(\theta_k - \theta_{k-1})$
► MCMC sampling
  • Metropolis-Hastings within Gibbs sampling
  • Priors:
    ► $\beta$: flat and conjugate Normal
    ► $\sigma^2$ and $\phi$: independent uniform priors
    ► Transformed cut-points: multivariate normal
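The cut-point transformation and one Metropolis step for it can be sketched as follows. This is a hedged illustration, not the authors' sampler: the name `gamma` for the transformed cut-points, the random-walk proposal, the step size, and the independent normal prior with `prior_sd = 10` (a diagonal special case of a multivariate normal) are all assumptions, and the linear predictor `eta` (covariate effect plus spatial effect) is held fixed as it would be within one Gibbs cycle.

```python
import math
import random
from statistics import NormalDist

PHI = NormalDist().cdf

def to_unrestricted(theta):
    """theta = [theta_2, ..., theta_{K-1}] (theta_1 = 0 fixed) -> log increments."""
    prev, out = 0.0, []
    for t in theta:
        out.append(math.log(t - prev))
        prev = t
    return out

def to_cutpoints(gamma):
    """Inverse map: cumulative sums of exp(gamma) give increasing cut-points > 0."""
    total, out = 0.0, []
    for g in gamma:
        total += math.exp(g)
        out.append(total)
    return out

def log_post(gamma, y, eta, prior_sd=10.0):
    """Ordinal probit log-likelihood plus an independent normal log-prior (assumed)."""
    cuts = [float("-inf"), 0.0, *to_cutpoints(gamma), float("inf")]
    ll = 0.0
    for yi, ei in zip(y, eta):
        p = PHI(cuts[yi] - ei) - PHI(cuts[yi - 1] - ei)
        ll += math.log(max(p, 1e-300))  # guard against log(0)
    return ll - sum(g * g for g in gamma) / (2.0 * prior_sd ** 2)

def mh_update(gamma, y, eta, step=0.1, rng=random):
    """One random-walk Metropolis update of gamma with eta held fixed."""
    proposal = [g + rng.gauss(0.0, step) for g in gamma]
    log_ratio = log_post(proposal, y, eta) - log_post(gamma, y, eta)
    if rng.random() < math.exp(min(0.0, log_ratio)):
        return proposal, True
    return gamma, False

# Round trip on hypothetical cut-points theta_2 = 1.0, theta_3 = 2.5 (K = 4).
gamma = to_unrestricted([1.0, 2.5])
theta_back = to_cutpoints(gamma)
```

Working on the log-increment scale means any real-valued proposal maps back to a valid, strictly increasing set of cut-points, so no proposal has to be rejected for violating the ordering constraint.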

Page 12: Simulated data

► Simulated data at a subset of the original locations (n = 82)
  • Cluster infill around the 82 sites (n = 120)
► Spatial process:
  • $W$ is a stationary Gaussian process with $E[W(s)] = 0$ and $\mathrm{Cov}[W(s_i), W(s_j)] = \sigma^2 \rho(s_i - s_j; \phi)$
  • Exponential correlation function: $\rho(d) = \exp(-\phi d)$
► Covariate:
  • Distance weighted stream power
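A simulation of this kind can be sketched with NumPy. Everything numeric here is a stand-in: the site coordinates are random points rather than the EPA locations, the covariate is a standard normal draw rather than distance weighted stream power, distances are Euclidean, and the parameter names `sigma2` and `phi` (variance and correlation decay) along with their values are assumptions made for illustration.

```python
import numpy as np

def exponential_corr(coords, phi):
    """Correlation matrix with entries exp(-phi * d_ij), d_ij = Euclidean distance."""
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    return np.exp(-phi * dist)

rng = np.random.default_rng(1)

# Hypothetical inputs: 30 random site locations and made-up parameter values.
n = 30
coords = rng.uniform(0.0, 10.0, size=(n, 2))
sigma2, phi = 1.5, 0.4

H = exponential_corr(coords, phi)
# Draw the spatial effect W ~ N(0, sigma2 * H) through a Cholesky factor;
# the small diagonal jitter is only for numerical stability.
L = np.linalg.cholesky(sigma2 * H + 1e-10 * np.eye(n))
W = L @ rng.normal(size=n)

# Latent variable Z_i = X_i' beta + W_i + eps_i, then threshold at the cut-points.
beta = np.array([0.5, 1.0])
theta = np.array([0.0, 1.0, 2.0])          # theta_1 = 0, K = 4 categories
X = np.column_stack([np.ones(n), rng.normal(size=n)])
z = X @ beta + W + rng.normal(size=n)
y = 1 + np.searchsorted(theta, z, side="right")
```

Swapping in a different correlation function or a stream-based distance would only change `exponential_corr`; the rest of the simulation is unaffected.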

Page 13: Preliminary Results

► Posterior quantities
  • Based on 1000 iterations (burn-in = 1000)
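Posterior means and standard deviations of a category probability can be computed from saved sampler output along these lines. The sketch assumes one reading of the slide (2000 total iterations with the first 1000 discarded), uses fabricated draws in place of real MCMC output, and the function and variable names are hypothetical.

```python
import random
import statistics
from statistics import NormalDist

PHI = NormalDist().cdf

def prob_category(k, eta, cuts):
    """Pr(Y = k) with cuts = [-inf, theta_1, ..., theta_{K-1}, +inf]."""
    return PHI(cuts[k] - eta) - PHI(cuts[k - 1] - eta)

def posterior_summary(draws, k, burn_in):
    """Posterior mean and SD of Pr(Y = k) at one site from saved MCMC draws.

    draws : list of (eta, cuts) pairs, one per iteration, where eta is the
            covariate effect plus spatial effect for that iteration.
    """
    kept = draws[burn_in:]
    probs = [prob_category(k, eta, cuts) for eta, cuts in kept]
    return statistics.fmean(probs), statistics.stdev(probs)

# Fabricated "draws" standing in for sampler output (purely illustrative).
random.seed(42)
inf = float("inf")
draws = [(random.gauss(0.5, 0.2), [-inf, 0.0, 1.0, 2.0, inf]) for _ in range(2000)]
mean_p2, sd_p2 = posterior_summary(draws, k=2, burn_in=1000)
```

Repeating this for every site gives the two surfaces, posterior mean and posterior SD, that can then be mapped.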

Page 14: Posterior mean of the spatial process

Page 15: Posterior SD of the spatial process

Page 16: Posterior mean and SD for $\Pr(Y_i = 2)$

Page 17: Posterior mean and SD for $\Pr(Y_i = 5)$

Page 18: Posterior mean and SD for $\Pr(Y_i \le 5)$

Page 19: Future Work

► Convergence and mixing for the spatial model
► Models and methods for large data sets
  • Spectral parameterization of the spatial process
    ► Wikle (2002), Paciorek & Ryan (2005), Royle & Wikle (2005)
  • Importance sampling
    ► Gelfand & Ravishanker (1998), Gelfand, Ravishanker, & Ecker (2000)
  • Sub-sampling
► Investigate different spatial correlation functions and distance metrics
  • Traditional
  • Stream based
► Model selection for the spatial model

Page 20: Funding and Affiliations

FUNDING/DISCLAIMER
The work reported here was developed under the STAR Research Assistance Agreement CR-829095 awarded by the U.S. Environmental Protection Agency (EPA) to Colorado State University. This presentation has not been formally reviewed by EPA. The views expressed here are solely those of the authors and STARMAP, the Program they represent. EPA does not endorse any products or commercial services mentioned in this presentation.

Megan’s research is also partially supported by the PRIMES National Science Foundation Grant DGE-0221595003.

Page 21: Thank you

Page 23: Subset of data ($n_{small} = 82$)

Page 24: Sample path plot - Example

Page 25: Surface for estimating $(\sigma^2, \phi)$

Page 26: Sample path plot – Avoiding plateau