Boilé M., M.M. Golias, & S. Ivey. Contents Introduction Motivation Case study.

A Bayesian hierarchical network for truck demand modeling

Boilé M., M.M. Golias, & S. Ivey

Innovations in Freight Demand Modeling and Data: A Transportation Research Board SHRP2 Symposium

September 14-15, Crowne Plaza Hotel-Dulles Airport, Washington, DC

Contents: Introduction, Motivation, Case study

Introduction

Freight demand modeling & regression techniques: one of the best and worst tools we have. Problems come from:

• Data
• Misleading performance measures

When data is limited, regression techniques cannot perform well (after all, they are pattern-recognition techniques)

Even worse, we sometimes rely on training-based measures of performance

Typical regression

Input

•We want to predict Y

•We believe that a number of known inputs X can predict Y based on a function

Black box (or not)

•Run an algorithm to select which of the variables we selected are actually meaningful and what the parameters of the function are

Output

•Obtain a model and performance measures

•Use the model

Although it is not really the case, we assume that Y is a linear function of X
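Written out for observation i with p candidate inputs, the linear assumption reads as follows. The symbols β (coefficients), ε (error term), n and p are notation introduced here for illustration; they are not taken from the slides.

```latex
Y_i = \beta_0 + \sum_{j=1}^{p} \beta_j X_{ij} + \varepsilon_i, \qquad i = 1, \dots, n
```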

Is this assumption correct? Given our usual data availability, non-linear models will not (necessarily) perform better.

What are the best inputs? Two mentalities:

• Throw in whatever you can find
• Use some rationale

Input

•We want to predict Y

•We believe that a number of known inputs X can predict Y based on a function

There are a number of algorithms for regression:

• Most of them select some of the X's (variable selection)
• Some of them add constraints to the Y's (constrained regression)
• Some of them add constraints to the effect of the X's (shrinkage techniques)
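The slides do not say which shrinkage technique, if any, the authors used. Ridge regression is chosen here only as one common example of the idea. The sketch below uses NumPy and penalizes the squared size of the coefficients, so larger penalties pull the estimates toward zero. The function name `ridge_fit`, the synthetic data and the penalty values are all hypothetical.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Ridge regression: shrink coefficients toward zero via a lam * ||beta||^2 penalty.

    The intercept is not penalized: X and y are centered first and the
    intercept is recovered from the means afterwards.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    p = X.shape[1]
    # Closed-form ridge solution: (Xc'Xc + lam * I)^-1 Xc'yc
    beta = np.linalg.solve(Xc.T @ Xc + lam * np.eye(p), Xc.T @ yc)
    intercept = y_mean - x_mean @ beta
    return intercept, beta

# Hypothetical synthetic data: 30 observations, 5 candidate predictors,
# only the first of which actually drives y.
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 5))
y = 2.0 * X[:, 0] + rng.normal(scale=0.5, size=30)

for lam in (0.0, 10.0, 100.0):
    b0, beta = ridge_fit(X, y, lam)
    print(lam, np.round(beta, 3))
```

With `lam = 0` this reduces to ordinary least squares. As `lam` grows, the overall size of the coefficient vector shrinks, which is what "constraining the effect of the X's" means in practice.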

Black box (or not)

Run an algorithm to select which of the variables we selected are actually meaningful and what the parameters of the function are

Two main measures of performance:

• How well does the model fit the data (R²)?
• Are the model and its inputs significant (p-values)?
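For reference, the usual definition of R² compares the residual variation with the total variation around the mean. Here ŷ_i is the model prediction and ȳ the sample mean, which is standard notation rather than notation from the slides. When this formula is evaluated on a hold-out sample it is not bounded below by zero, so a poorly generalizing model can score negative.

```latex
R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}
```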

When many independent variables are used, variable selection techniques can lead to models with deceptively high R²
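Forward selection is one common variable-selection heuristic. It can be sketched in a few lines of NumPy: starting from an empty model, greedily add whichever remaining predictor lowers the residual sum of squares the most. The helper names, the synthetic data and the fixed `max_vars` stopping rule are assumptions made for illustration. Practical stepwise procedures usually stop on a significance test or an information criterion instead.

```python
import numpy as np

def rss(X, y, cols):
    """Residual sum of squares of an OLS fit (with intercept) on the given columns."""
    A = np.column_stack([np.ones(len(y))] + [X[:, j] for j in cols])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return float(resid @ resid)

def forward_selection(X, y, max_vars):
    """Greedy forward selection: at each step add the predictor that lowers RSS most.

    Fits O(p * max_vars) candidate models instead of all 2**p subsets.
    The fixed max_vars stopping rule is a simplifying assumption.
    """
    selected = []
    remaining = list(range(X.shape[1]))
    while remaining and len(selected) < max_vars:
        best = min(remaining, key=lambda j: rss(X, y, selected + [j]))
        selected.append(best)
        remaining.remove(best)
    return selected

# Hypothetical synthetic data: 8 candidate predictors, only columns 2 and 5 matter.
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 8))
y = 3.0 * X[:, 2] - 2.0 * X[:, 5] + rng.normal(scale=0.5, size=100)
print(forward_selection(X, y, max_vars=2))
```

Because the RSS of a least-squares fit never increases when a column is added, letting `max_vars` grow will keep raising the training R². This is exactly why training-based R² flatters models built by selection.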

Some accept performance measures based on data used to train the model (not such a good idea)

Some use what is called a hold out sample (more appropriate)
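The gap between training-based and hold-out measures can be shown with a small NumPy simulation. Ordinary least squares is fitted on synthetic data that has many candidate predictors relative to observations, and R² is then compared on the estimation data and on a 10% hold-out. The sample size, the choice of 34 predictors and the seed are hypothetical, and this is not the authors' data.

```python
import numpy as np

def r_squared(y, y_hat):
    """R^2 = 1 - SS_res / SS_tot (can be negative on a hold-out sample)."""
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return 1.0 - ss_res / ss_tot

def ols_fit(X, y):
    """Ordinary least squares with an intercept column."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def ols_predict(coef, X):
    return coef[0] + X @ coef[1:]

rng = np.random.default_rng(42)
n, p = 40, 34                             # hypothetical: few observations, many predictors
X = rng.normal(size=(n, p))
y = 1.5 * X[:, 0] + rng.normal(size=n)    # only the first predictor matters

# 90% / 10% estimation-validation split
idx = rng.permutation(n)
train, test = idx[:36], idx[36:]

coef = ols_fit(X[train], y[train])
r2_train = r_squared(y[train], ols_predict(coef, X[train]))
r2_test = r_squared(y[test], ols_predict(coef, X[test]))
print(f"training R^2: {r2_train:.2f}, hold-out R^2: {r2_test:.2f}")
```

With 35 parameters and 36 estimation points, the model nearly interpolates the training data, so its training R² is close to 1. The hold-out R² is far lower and often negative.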

Output

• Obtain a model and performance measures

• Use the model

Data, data, data

Selection of input: we need data

Performance measures: we need data

Testing of the model: we need data

So what can we do when we have limited data?

Simulation looks like a good approach that has worked in other areas

Markov Chain Monte Carlo Simulation

• Typical regression: a linear model with variable selection. A closed-form solution is combined with a heuristic (e.g., backward selection, forward selection) instead of going through all possible subsets

• Instead of a closed-form solution, we can assign prior distributions to the variables (we can also do that for the parameters, but let's talk about that some other time) and use simulation (more precisely, MCMC simulation) to obtain their posterior distributions

• Why use simulation?
  • The integrals are intractable
  • MCMC simulation takes us from the priors to the posteriors
• Is it better? Only one way to find out!
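To illustrate how MCMC "goes from the priors to the posteriors", here is random-walk Metropolis, the simplest MCMC algorithm, applied to a plain Bayesian linear regression in NumPy. This is not the authors' model, which was built in BUGS and includes variable selection. The diffuse normal priors, the flat prior on log σ, the step size, the iteration counts and the synthetic data are all assumptions.

```python
import numpy as np

def log_posterior(beta, log_sigma, X, y, prior_sd=100.0):
    """Log posterior (up to a constant) of a Bayesian linear regression.

    Likelihood: y ~ Normal(X @ beta, sigma^2).
    Assumed diffuse priors: beta_j ~ Normal(0, prior_sd^2), flat prior on log(sigma).
    """
    sigma = np.exp(log_sigma)
    resid = y - X @ beta
    log_lik = -len(y) * log_sigma - 0.5 * np.sum(resid ** 2) / sigma ** 2
    log_prior = -0.5 * np.sum(beta ** 2) / prior_sd ** 2
    return log_lik + log_prior

def metropolis(X, y, n_iter=20000, step=0.05, seed=0):
    """Random-walk Metropolis sampler over (beta, log_sigma)."""
    rng = np.random.default_rng(seed)
    p = X.shape[1]
    state = np.zeros(p + 1)                  # beta (p values) followed by log_sigma
    current = log_posterior(state[:p], state[p], X, y)
    draws = np.empty((n_iter, p + 1))
    for t in range(n_iter):
        proposal = state + step * rng.normal(size=p + 1)
        cand = log_posterior(proposal[:p], proposal[p], X, y)
        # Accept with probability min(1, posterior ratio); symmetric proposal
        if np.log(rng.uniform()) < cand - current:
            state, current = proposal, cand
        draws[t] = state
    return draws

# Hypothetical synthetic data: intercept 1.0, slope 2.0, noise sd 0.5.
rng = np.random.default_rng(3)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X @ np.array([1.0, 2.0]) + rng.normal(scale=0.5, size=n)

draws = metropolis(X, y)
posterior_mean = draws[5000:, :2].mean(axis=0)   # discard the first 5000 draws as burn-in
print(np.round(posterior_mean, 2))
```

No integral is ever computed. The chain only needs the posterior up to a constant, and averages over its draws approximate the posterior quantities. That is the property that makes MCMC attractive when the integrals are intractable.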

Case Study

Prediction of truck volumes on state highways in New Jersey

Major assumption: truck volumes can be predicted given socioeconomic data surrounding the highway

Case study: Data

Dependent dataset: 270 locations throughout NJ (long- and short-duration classification counts)
• Long-duration counts: Weigh-in-Motion (WIM) locations
• Short-duration vehicle classification counts
• Vehicle classes 5 through 13 (FHWA classification)

34 independent variables:
• Population
• Number of employees (11 SIC codes)
• Sales volume (11 SIC codes)
• Number of establishments (11 SIC codes)

Case Study: Traffic counts by roadway class

Functional Class (FC)                                              Counts (# observations)
A: 1, 2 (Rural interstate and major arterials)                      31
B: 6, 7, 8, 9 (Rural minor arterials, collectors, and local)        51
C: 11 (Urban interstate)                                            29
D: 12 (Urban expressways and parkways)                              20
E: 14 (Urban major arterials)                                       59
F: 16, 17, 19 (Urban minor arterials, collectors, and local)        80

Table 1. Clustered Dataset by Highway FC and Count Availability

Case Study: Bandwidth of sections
• Uniform highway sections
• Major interchanges, roadway functionality, geometry
• Nine different bandwidths (0.25, 0.50, 0.75, 1.0, 1.25, 1.5, 2, 3 and 5 miles)
• Nine different models were estimated for each FC
• Comparing the models shows the sensitivity to the increasing size of the area
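One plausible way to build the independent variables for each bandwidth is to sum a socioeconomic attribute over all records lying within that distance of a highway section. The NumPy sketch below does this under simplifying assumptions. Each section is reduced to a single reference point, the slides' actual section buffers are not reproduced, and coordinates are planar and in miles. The function name and example data are hypothetical.

```python
import numpy as np

BANDWIDTHS_MI = (0.25, 0.50, 0.75, 1.0, 1.25, 1.5, 2, 3, 5)

def aggregate_within_bandwidth(section_xy, points_xy, values, bandwidth):
    """Sum attribute values of all points within `bandwidth` miles of each section.

    section_xy: (m, 2) section reference points (a simplification of a section buffer)
    points_xy:  (k, 2) locations of socioeconomic records
    values:     (k,)   attribute to aggregate, e.g. number of employees
    Planar coordinates in miles are assumed.
    """
    # Pairwise distances between every section and every record: shape (m, k)
    d = np.linalg.norm(section_xy[:, None, :] - points_xy[None, :, :], axis=2)
    return (d <= bandwidth).astype(float) @ values

# Hypothetical example: two sections and four establishments.
sections = np.array([[0.0, 0.0], [10.0, 0.0]])
establishments = np.array([[0.2, 0.1], [0.9, 0.0], [2.5, 0.0], [10.0, 4.0]])
employees = np.array([50.0, 120.0, 300.0, 80.0])

features = {bw: aggregate_within_bandwidth(sections, establishments, employees, bw)
            for bw in BANDWIDTHS_MI}
print(features[1.0])
```

Each bandwidth yields its own design matrix. Estimating one model per bandwidth, as the slides describe, then shows how sensitive the fit is to the size of the surrounding area.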

Model

What do we want to achieve:
1. Select the most appropriate X's out of a pool of candidate predictors
2. Constrain the values of Y
3. Constrain the influence of the selected X's

A priori, none of the variables can explain truck volumes

The dependent variable can only take positive values

Diffuse priors
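The slide's own equations did not survive extraction. The specification below is one hypothetical way to encode the three goals and is not necessarily the authors' model.

- A truncated-normal likelihood keeps Y positive (goal 2).
- Binary inclusion indicators γ_j select the X's (goal 1).
- Zero-centred priors on the coefficients constrain their influence and express the prior belief that no variable explains truck volumes (goal 3).
- Diffuse priors are placed on the intercept and the precision.

The symbols γ_j, π, τ² and the numeric hyperparameters are assumptions.

```latex
\begin{aligned}
y_i &\sim \mathcal{N}(\mu_i, \sigma^2)\,\mathbb{1}(y_i > 0) \\
\mu_i &= \beta_0 + \sum_{j=1}^{p} \gamma_j \beta_j x_{ij} \\
\gamma_j &\sim \operatorname{Bernoulli}(\pi), \qquad \beta_j \sim \mathcal{N}(0, \tau^2) \\
\beta_0 &\sim \mathcal{N}(0, 10^{6}), \qquad \sigma^{-2} \sim \operatorname{Gamma}(0.001, 0.001)
\end{aligned}
```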

Results

Models compared:
• Bayesian model (BRM)
• Stepwise linear regression (SLR)
• Statewide model (4-step planning model) (SWTM)

Cross-validation with a 90%/10% estimation-validation dataset split

R² values

FC        SWTM   SLR estimation   SLR validation   BRM estimation   BRM validation
A: 1-2    0.58   0.97             0.09             0.3              0.29
B: 6-9    0.48   0.84             0.12             0.64             0.58
C: 11     0.07   0.92             0.14             0.28             0.27
D: 12     0.34   0.99             0.05             0.1              0.27
E: 14     0.04   0.13             0.16             0.19             0.47
F: 16-19  0.13   0.59             0.25             0.25             0.32

Usability for Practitioners

BUGS: the best thing since sliced bread!!!

It's free and easy to use

http://www.mrc-bsu.cam.ac.uk/bugs/winbugs/contents.shtml




