Empirical Analysis of Bayesian Kernel Methods for Modeling Count Data

Molly Stam Floyd, Hiba Baroud, and Kash Barker
University of Oklahoma, mstam, hbaroud, [email protected]

Abstract - Bayesian models are used for estimation and forecasting in a wide range of application areas. One extension of such methods is the Bayesian kernel model, which integrates the Bayesian conjugate prior with kernel functions. This paper empirically analyzes the performance of Bayesian kernel models when applied to count data. The analysis is performed with several data sets with different characteristics regarding the numbers of observations and predictors. While the size of the data and the number of predictors vary across data sets, the predictors are all continuous in this study. The Poisson Bayesian kernel model is applied to each data set and compared to the Poisson generalized linear model. The measures of goodness of fit used are the deviance and the log-likelihood functional value, and the computation is done by dividing the data into training and testing sets; for the Bayesian kernel model, a tuning set is also used to optimize the parameters of the kernel function. The Bayesian kernel approach tends to outperform classical count data models for smaller data sets with a small number of predictors. The analysis conducted in this paper is an initial step towards the validation of the Poisson Bayesian kernel model. This type of model can be useful in risk analysis applications in which data sources are scarce and can help in analytical and data-driven decision making.

Index Terms - Bayesian kernel models, Count data, Goodness of fit, Poisson regression

INTRODUCTION

In many situations, the likelihood of an event is found with the average rate at which the event occurs, and often that rate is a function of characteristics surrounding the event. To integrate the impacts of both the component characteristics and any prior failure information, we propose to use a Bayesian kernel model as an approach to a more accurate estimation of the rate of occurrence of an event. More specifically, we use an extended version of this method, the Poisson Bayesian kernel model, to accommodate count data and estimate the rate of occurrence. This paper provides an empirical analysis of this model using different types of data sets and measures of goodness of fit to validate the accuracy of the model in comparison to more classical approaches.

Kernel methods, first introduced in a pattern recognition setting several decades ago [1], have found popularity across a number of data mining domains, including bioinformatics [2, 3], sensing [4, 5], and financial risk management and forecasting [6, 7], among many others. Kernel functions are used to map input data, for which no pattern can be recognized, to a higher dimensional space, where patterns are more readily detected. Such functions enable algorithms designed to detect relationships among data in the higher dimensional space, including least squares regression and support vector machine (SVM) classification [8-10]. Integrating Bayesian methods with kernel methods has recently garnered attention [11-14], as Bayesian methods make use of previous data to estimate posterior probability distributions of the parameter of interest given that it follows a specific prior distribution.

The integration of Bayesian and kernel methods enables a classification algorithm which provides probabilistic outcomes as opposed to deterministic outcomes (e.g., those resulting from SVM classification). That is, rather than assigning a class to a data point, Bayesian kernel methods assign a probability that the data point belongs to a particular class. Several extensions to Bayesian kernel models have appeared, including (i) the relevance vector machine (RVM), which assumes a Gaussian distribution for the probability to be estimated [15, 16], and (ii) non-Gaussian distributions for binary problems [17-19].

This paper analyzes a similar approach to model count data, whereby the outcome estimated is the rate of occurrence of a certain event rather than the classification of a data point into a deterministic class. The prior distribution of the rate is assumed to follow a gamma distribution, and the notion of the conjugate prior is used to construct the posterior distribution, whose parameters depend on the kernel function. Section 2 provides a review of appropriate literature. Section 3 details the development of the Bayesian kernel approach for count data. Section 4 discusses the goodness of fit measures used in the empirical analysis presented in Section 5, and Section 6 provides concluding remarks.

BACKGROUND

I. Bayes Rule and the Conjugate Prior

The classic Bayes rule assumes that a prior probability for an event of interest, A, is given as P(A), and a likelihood of event B conditioned on the occurrence of A is given as P(B | A). With these probabilities, along with P(B), one can calculate the posterior distribution for the event of interest given knowledge of B, or P(A | B), shown in Eq. (1).

P(A \mid B) = \frac{P(B \mid A)\, P(A)}{P(B)}    (1)

This manifests itself, for example, when we want to develop a posterior distribution for a parameter of interest from (i) the prior distribution for that parameter, and (ii) the data describing that parameter in the form of a likelihood function, which is a conditional likelihood of obtaining the data given what we understand about the parameter. In such a case, the denominator does not depend on the parameter of interest and can be excluded from the Bayes rule equation when maximum likelihood calculations are performed.

More specifically, in the SVM framework, consider a function t that maps input data x to a value corresponding to its binary class (y = +1, -1). Given a training set of data, a posterior probability distribution for this function t can be estimated as being proportional to its prior distribution multiplied by the likelihood function, as depicted in Eq. (2).

P(t \mid \mathbf{x}) \propto P(\mathbf{x} \mid t)\, P(t)    (2)

An important notion used in the Bayesian framework is conjugate distributions, which assume that the posterior P(t | x) and prior P(t) distributions are from the same family of distributions. For example, in the non-Gaussian extension of the Bayesian kernel models, MacKenzie et al. [19] use the Beta-Bernoulli conjugate prior. Having the prior and posterior follow the same distribution ensures that the overall data properties are kept while modifying the details of the distribution, such as the parameters, to better explain the trends.

II. Gaussian Bayesian Kernel Models and Non-Gaussian Extensions

For an m × d data matrix X with rows corresponding to m data points, each with d attributes, the function t(X) can be thought of as a random vector of length m. Gaussian Bayesian kernel models assume the vector-valued function t follows a multivariate normal distribution with mean vector 0 and covariance matrix K, where the matrix K is positive definite and matrix element K_ij is the kernel function k(x_i, x_j) between the ith and jth data points. The multivariate normal distribution for the realization of t is found in Eq. (3), where t is a vector-valued variable of length m [16].

P(\mathbf{t}) = (2\pi)^{-m/2}\, |K|^{-1/2} \exp\left(-\tfrac{1}{2}\, \mathbf{t}^{T} K^{-1} \mathbf{t}\right)    (3)

The first term in the probability density function does not depend on the parameter t, and hence the prior distribution can be further reduced to P(\mathbf{t}) \propto \exp\left(-\tfrac{1}{2}\, \mathbf{t}^{T} K^{-1} \mathbf{t}\right).

In the case of a binary classification, an appropriate likelihood function would be based on the logit function shown in Eqs. (4) and (5).

P(y_i = +1 \mid t_i) = \frac{1}{1 + e^{-t_i}}    (4)

P(y_i = -1 \mid t_i) = \frac{1}{1 + e^{t_i}}    (5)

The posterior distribution is then the product of the likelihood function and the prior distribution for a data set of m data points, found in Eq. (6). To estimate the parameter of interest, t, Eq. (6) is maximized (or its negative log is minimized) using any of several optimization algorithms (e.g., the Newton-Raphson method).

P(\mathbf{t} \mid \mathbf{y}) \propto \left[\prod_{i=1}^{m} P(y_i \mid t_i)\right] \exp\left(-\tfrac{1}{2}\, \mathbf{t}^{T} K^{-1} \mathbf{t}\right)    (6)

An extension to the basic Bayesian kernel model is the non-Gaussian Bayesian kernel model [17-19], which can improve predictive accuracy for certain problems where a Gaussian distribution for model parameters should not realistically be assumed. MacKenzie et al. [19] highlight some of the drawbacks of using the Gaussian distribution for binary classification problems, use a beta conjugate prior, and offer an alternative likelihood function to the logit, expanding previous work done on non-Gaussian kernel models [17, 18] by introducing a more generalized model.
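To make the estimation step above concrete, the following is a minimal sketch of the Newton-Raphson maximization of the posterior in Eq. (6) for the Gaussian model, assuming a small NumPy data set with labels in {-1, +1} and a radial basis function kernel. The helper names and default values are illustrative assumptions, not the authors' implementation.

    import numpy as np

    def rbf_kernel(X1, X2, sigma=1.0):
        # Pairwise squared distances, then the radial basis function as in Eq. (14).
        d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(axis=2)
        return np.exp(-d2 / (2.0 * sigma ** 2))

    def map_estimate_t(X, y, sigma=1.0, jitter=1e-6, n_iter=25):
        """Newton-Raphson maximization of Eq. (6): logit likelihood for y in {-1, +1}
        times a zero-mean Gaussian prior with covariance matrix K."""
        m = X.shape[0]
        K = rbf_kernel(X, X, sigma) + jitter * np.eye(m)
        K_inv = np.linalg.inv(K)
        t = np.zeros(m)
        for _ in range(n_iter):
            p = 1.0 / (1.0 + np.exp(-y * t))        # P(y_i | t_i) under the logit model
            grad = y * (1.0 - p) - K_inv @ t        # gradient of the log posterior
            W = np.diag(p * (1.0 - p))              # negative Hessian of the log likelihood
            H = -(W + K_inv)                        # Hessian of the log posterior
            t = t - np.linalg.solve(H, grad)        # Newton-Raphson step
        return t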

III. Methods for Count Data

One of the classical approaches used to analyze count data is the Poisson Generalized Linear Model (GLM) [20, 21]. The Poisson GLM assumes that the rate to be estimated has an exponential relationship with a linear combination of the covariates, with a coefficient for each attribute, as shown in Eq. (7).

\lambda(\mathbf{x}) = \exp\left(\mathbf{x}^{T} \boldsymbol{\beta}\right)    (7)
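As an illustration of Eq. (7), the following minimal sketch fits a Poisson GLM by maximum likelihood using statsmodels on synthetic data; the data, coefficient values, and the inclusion of an intercept are assumptions for the example, not the paper's data sets.

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(0)
    X = rng.normal(size=(50, 4))                  # 50 observations, 4 continuous predictors
    beta_true = np.array([0.3, -0.2, 0.5, 0.1])
    y = rng.poisson(np.exp(X @ beta_true))        # counts drawn from the Poisson rate of Eq. (7)

    # Fit the Poisson GLM by maximum likelihood (the log link is the default).
    glm = sm.GLM(y, sm.add_constant(X), family=sm.families.Poisson())
    fit = glm.fit()
    rate_hat = fit.predict(sm.add_constant(X))    # estimated rate for each observation
    print(fit.params, fit.deviance)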

More sophisticated models analyze count data within a Bayesian framework, such as the Bayesian analysis of the Poisson model using the Gamma-Poisson conjugate prior, which will be further discussed in the next section. Extensions to this model include the analysis of the parameters of the gamma prior distribution [22].

Other extensions to Bayesian Poisson methods consider hierarchical models [23]. The proposed model is then based on the multivariate Poisson-log normal distribution with a hierarchical Bayesian application. This multivariate distribution is used to model discrete multiple count data and is shown in Eq. (8), where N_M(\boldsymbol{\theta}_i \mid \boldsymbol{\mu}, T^{-1}) is the M-dimensional multivariate normal distribution, the mean vector is represented by μ, and T is the inverse of the covariance matrix. The hyper-prior parameters R and π are known.

y_{ij} \mid \theta_{ij} \sim \text{Poisson}\left(e^{\theta_{ij}}\right), \qquad \boldsymbol{\theta}_i \sim N_M\left(\boldsymbol{\mu}, T^{-1}\right)    (8)

The model is advantageous in that it can model joint responses and can detect relationships among the categories of count variables. However, Markov Chain Monte Carlo methods were utilized to make inferences about the model parameters, which can oftentimes be complex.

The model discussed and illustrated in this paper is simple enough to avoid expensive computations but detailed enough to overcome issues in basic Bayesian approaches, such as the Gamma-Poisson conjugate prior, and in count regression models, such as the GLM.

POISSON BAYESIAN KERNEL MODEL

Bayesian kernel methods estimate the rate of occurrence of the event rather than estimating a deterministic value for the number of times the event is estimated to occur. A common distribution to model count data within a Bayesian framework is the Gamma-Poisson conjugate prior. The development of the Poisson Bayesian kernel method discussed here is found in [24].

It is assumed that the parameter to be estimated is the rate of occurrence, λ, which follows a Gamma prior distribution with parameters α and β, as shown in Eq. (9).

P(\lambda) = \frac{\beta^{\alpha}}{\Gamma(\alpha)}\, \lambda^{\alpha - 1} e^{-\beta \lambda}    (9)

For the likelihood function, the product of the Poisson density function, shown in Eq. (10), is used, since this is a Gamma-Poisson conjugate prior approach.

P(\mathbf{y} \mid \lambda) = \prod_{i=1}^{m} \frac{\lambda^{y_i} e^{-\lambda}}{y_i!}    (10)

Thus, the posterior distribution is the product of Eqs. (9) and (10). The posterior distribution is also a gamma distribution, with parameters \alpha' = \alpha + \sum_{i=1}^{m} y_i and \beta' = \beta + m. This result is the basic Gamma-Poisson Bayesian approach, which assumes the notion of exchangeability, meaning that for different sets of training and testing data, the resulting posterior parameters will be similar since they are a function of the prior parameters, the size of the data set, and the summation of all the data points. The characteristics of each outcome are not taken into consideration in this case, but rather the overall properties of the data set [19].

Rearranging the product of the likelihood function and the prior distribution function results in the Gamma distribution in Eq. (11).

P(\lambda \mid \mathbf{y}) \propto \lambda^{\alpha + \sum_{i=1}^{m} y_i - 1}\, e^{-(\beta + m)\lambda}    (11)
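As a minimal numeric illustration of the conjugate update behind Eq. (11), with hypothetical counts and prior values that are not taken from the paper's data sets:

    import numpy as np

    y = np.array([2, 0, 3, 1, 2])      # hypothetical observed counts
    alpha0, beta0 = 1.0, 1.0           # hypothetical Gamma prior parameters

    # Gamma-Poisson conjugate update: alpha' = alpha + sum(y), beta' = beta + m.
    alpha_post = alpha0 + y.sum()      # 1 + 8 = 9
    beta_post = beta0 + len(y)         # 1 + 5 = 6
    rate_hat = alpha_post / beta_post  # posterior mean of the rate, 1.5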

Using the same argument as above, the parameters for the Bayesian kernel model for counts are expressed in Eqs. (12) and (13), where K is the m × m kernel matrix, Y is an m × 1 vector containing the output data associated with the m observations of X, and V is an m × 1 vector containing ones.

\boldsymbol{\alpha}' = \alpha V + K Y    (12)

\boldsymbol{\beta}' = \beta V + K V    (13)

With the addition of the kernel function, the new data point is compared with the training set, and according to the similarities of the attributes, new values for the parameters of the posterior distribution are computed. The choice of the type of kernel function depends on the application and the model user. For the purpose of the empirical analysis conducted in this paper, we use the most popular kernel function, the radial basis function in Eq. (14), where K_{ij} is one entry in the matrix representing the kernel function between the ith and jth data points. Note that in the data sets used in this empirical study, all predictors are continuous variables. The radial basis function parameter, σ, is tuned to obtain an optimal value that either maximizes the log-likelihood function or minimizes the deviance; details on the tuning of this parameter are discussed in the next section.

K_{ij} = k(\mathbf{x}_i, \mathbf{x}_j) = \exp\left(-\frac{\|\mathbf{x}_i - \mathbf{x}_j\|^2}{2\sigma^2}\right)    (14)

The rate for the new data point then follows a Gamma distribution with parameters α' and β'. As a point estimate for this parameter, we consider the expected value of the posterior distribution, shown in Eq. (15) as the ratio of the gamma distribution parameters α' and β'.

\hat{\lambda} = \frac{\alpha'}{\beta'}    (15)

Note that a different point estimate for the rate can be used, such as the median, the mode, or the variance, depending on the type of problem and the model user.
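The following is a minimal sketch of the prediction step in Eqs. (12)-(15) as reconstructed above, assuming a radial basis function kernel and the zero-valued prior parameters used later in the empirical analysis; the function names and data handling are illustrative, not the authors' implementation.

    import numpy as np

    def rbf_kernel(A, B, sigma):
        # Radial basis function of Eq. (14) between every row of A and every row of B.
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
        return np.exp(-d2 / (2.0 * sigma ** 2))

    def pbk_predict(X_train, y_train, X_new, sigma, alpha0=0.0, beta0=0.0):
        """Poisson Bayesian kernel estimate of the rate for each new point:
        posterior parameters as in Eqs. (12)-(13), point estimate as in Eq. (15)."""
        k = rbf_kernel(X_new, X_train, sigma)          # similarities to the training set
        alpha_post = alpha0 + k @ y_train              # alpha' = alpha + k(x)^T Y
        beta_post = beta0 + k @ np.ones(len(y_train))  # beta'  = beta  + k(x)^T V
        return alpha_post / beta_post                  # posterior mean rate, Eq. (15)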

GOODNESS OF FIT MEASURES

The purpose of this paper is to empirically test the Poisson Bayesian kernel model to determine how well it fits different data sets in comparison to another classical method for modeling count data, the Poisson generalized linear model (GLM) [20, 21]. The Poisson GLM, presented in the background section in Eq. (7), assumes that the rate to be estimated has an exponential relationship with a linear combination of the covariates, with a coefficient for each attribute.

The functional values of two metrics are used to compare the two models. The first metric is the deviance, which computes the difference in the log-likelihood function between the fitted model and the saturated model, as in Eq. (16), where n is the size of the testing set, y_i is the true value of the ith data point, and \hat{\lambda}_i is the estimated rate for that data point.

D = 2 \sum_{i=1}^{n} \left[\ell(y_i; y_i) - \ell(\hat{\lambda}_i; y_i)\right]    (16)

The deviance is the generalized form of the sum of squared errors used in the linear regression model; it is a metric that analyzes the discrepancy between the observed and estimated values. The deviance for a Poisson regression model is represented in Eq. (17), where y_i \log(y_i / \hat{\lambda}_i) = 0 when y_i = 0. We use it to assess how well the fitted values represent the observed rates of occurrence in both the Poisson GLM and the Poisson Bayesian kernel model.

D = 2 \sum_{i=1}^{n} \left[y_i \log\left(\frac{y_i}{\hat{\lambda}_i}\right) - (y_i - \hat{\lambda}_i)\right]    (17)

The second metric used is the functional value of the log-likelihood, shown in Eq. (18), which is to be maximized. Note that the Poisson GLM coefficient estimates are computed such that the likelihood is maximized. The Poisson Bayesian kernel model is fitted given a tuned parameter, σ, of the radial basis kernel function in Eq. (14). This parameter is optimized such that the log-likelihood function is maximized.

\ell = \sum_{i=1}^{n} \left[y_i \log \hat{\lambda}_i - \hat{\lambda}_i - \log(y_i!)\right]    (18)

To determine the robustness of the tuning of this parameter and its influence on the estimated posterior parameters, α' and β', in Eqs. (12) and (13), respectively, the metrics are also computed for σ tuned to minimize the deviance. Note that for the analysis of both metrics, we discard components that are independent of the model, such as the multiplication by 2 in the deviance and the \log(y_i!) term in the log-likelihood function.
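A minimal sketch of the two metrics in Eqs. (17) and (18), assuming NumPy arrays of observed counts and estimated rates; the model-independent terms that are discarded in the comparisons above (the factor of 2 and log(y_i!)) are kept here so the functions are complete.

    import numpy as np
    from scipy.special import gammaln

    def poisson_deviance(y, rate):
        # Eq. (17); the y*log(y/rate) term is defined as 0 when y = 0.
        y = np.asarray(y, dtype=float)
        rate = np.asarray(rate, dtype=float)
        term = np.zeros_like(y)
        pos = y > 0
        term[pos] = y[pos] * np.log(y[pos] / rate[pos])
        return 2.0 * np.sum(term - (y - rate))

    def poisson_log_likelihood(y, rate):
        # Eq. (18); log(y!) is computed as gammaln(y + 1) and does not depend on the model.
        y = np.asarray(y, dtype=float)
        rate = np.asarray(rate, dtype=float)
        return np.sum(y * np.log(rate) - rate - gammaln(y + 1))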

EMPIRICAL ANALYSIS

The model discussed above is applied to several data sets [25-27], and its performance is compared to the Poisson GLM using the metrics discussed in the previous section. A brief description of the data sets is found in Table I. The first three data sets are similar in terms of the number of predictors and the size of the data, while the fourth set has a larger number of predictors for a small data set, and the fifth is a large data set with a small number of predictors. Note that the number of predictors is held constant across the models to ensure consistency in the comparison, though future research could consider the goodness of fit of each model given the number of predictors required to explain the rate of occurrence and achieve the same level of accuracy. Also, the prior parameters are assumed to be equal to zero, α = β = 0. Testing is performed for 100 trials on 30% of the data, with 50% of the data used as a training set and 20% as a tuning set for computing the unknown parameter, σ, in the kernel function. The training and the tuning sets were then combined into one training set to perform the testing. For each of the two models, the estimated rate of occurrence is computed for the testing set and used to evaluate the deviance and the log-likelihood functional value given the observed values. This process is repeated 100 times, where, at each iteration, random samples of the training, tuning, and testing sets are chosen. Tables II and III provide a summary of the analysis.
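The following minimal sketch illustrates a single trial of this procedure, reusing the pbk_predict, poisson_deviance, and poisson_log_likelihood helpers sketched earlier; the splitting proportions follow the text, while the grid of candidate σ values and the random number generator are assumptions for the example. Averaging the two returned metrics over 100 such trials mirrors the structure of Tables II and III.

    import numpy as np

    def one_trial(X, y, sigma_grid, rng):
        # 50% training, 20% tuning, 30% testing, sampled at random each trial.
        m = len(y)
        idx = rng.permutation(m)
        n_tr, n_tu = int(0.5 * m), int(0.2 * m)
        tr, tu, te = idx[:n_tr], idx[n_tr:n_tr + n_tu], idx[n_tr + n_tu:]

        # Tune sigma on the tuning set by maximizing the log-likelihood of Eq. (18).
        best_sigma = max(
            sigma_grid,
            key=lambda s: poisson_log_likelihood(y[tu], pbk_predict(X[tr], y[tr], X[tu], s)),
        )

        # Combine training and tuning sets, then evaluate on the testing set.
        fit_idx = np.concatenate([tr, tu])
        rate_te = pbk_predict(X[fit_idx], y[fit_idx], X[te], best_sigma)
        return poisson_deviance(y[te], rate_te), poisson_log_likelihood(y[te], rate_te)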

The deviance and log-likelihood values presented in the tables below are the average values of the goodness of fit measures evaluated over the 100 trials. PBK refers to the Poisson Bayesian kernel model and PGLM refers to the Poisson GLM. Recall that a model with a smaller deviance and a larger log-likelihood functional value fits the data better.

TABLE I
DESCRIPTION OF DATA SETS IN THE POISSON BAYESIAN KERNEL MODEL VALIDATION STUDY

Data set | Number of attributes | Data set size | Dependent variable | Predictors
Crime | 4 | 50 | Crime rate | Race, percentage of high school graduates, percentage below poverty level, percentage with a single parent
Murder | 4 | 51 | Murder rate | Race, percentage of high school graduates, percentage below poverty level, percentage with a single parent
Murder in Metropolitan | 4 | 51 | Murder rate in metropolitan areas | Race, percentage of high school graduates, percentage below poverty level, percentage with a single parent
Mussels | 8 | 45 | Number of species of mussels | Area, number of stepping stones (intermediate rivers) to 4 major species-source river systems, concentration of nitrate, solid residue, concentration of hydronium
Customer | 5 | 110 | Number of customers visiting a store from a particular region | Number of housing units in the region, average household income in the region, average housing unit age in the region, distance to the nearest competitor, distance to the store

Overall, there are three out of five data sets for which the Poisson Bayesian kernel model performs better than the Poisson GLM, and in particular, those three cases are all among the four small data sets. Both the deviance and the log-likelihood behave similarly for all the data sets and lead to the same conclusion about model performance.

The deviance tends to be larger whenever we have a small data set and a small number of predictors, and in both cases where we have the largest deviances among all data sets (Crime and Murder in Metropolitan area), the Poisson Bayesian kernel model performed better than the Poisson GLM. While conclusions might not be definitive without further analysis, the Poisson Bayesian kernel model initially appears to be a good model when we have a small data set with a small number of predictors, a situation known to cause issues with regression modeling [21].

TABLE II
GOODNESS OF FIT MEASURES (MAXIMIZING THE LOG-LIKELIHOOD)

Data Set | Deviance (PBK) | Deviance (PGLM) | Log-Likelihood (PBK) | Log-Likelihood (PGLM)
Crime | 79.8 | 130.1 | 2632.5 | 2582.1
Murder | 23.4 | 7.6 | 172.1 | 187.9
Murder in Metropolitan area | 56.7 | 64.2 | 3190.7 | 3183.3
Mussels | 13.9 | 17.3 | 207.6 | 204.2
Customer | 24.2 | 18.6 | 567.7 | 573.2

Recall that the radial basis function parameter, σ, was initially tuned such that the log-likelihood is maximized, which complies with the estimation method of the Poisson GLM [21]. In order to assess the robustness of the tuning process and its impact on the empirical analysis and the goodness of fit measures, we perform the same computation using a σ tuned such that the deviance is minimized; the results of the computation are summarized in Table III.

TABLE III
GOODNESS OF FIT MEASURES (MINIMIZING THE DEVIANCE)

Data Set | Deviance (PBK) | Deviance (PGLM) | Log-Likelihood (PBK) | Log-Likelihood (PGLM)
Crime | 90.2 | 130.1 | 2662.1 | 2582.1
Murder | 24.1 | 7.6 | 171.4 | 187.9
Murder in Metropolitan area | 55.6 | 64.2 | 3191.9 | 3183.3
Mussels | 16.6 | 17.3 | 204.9 | 204.2
Customer | 41.9 | 18.6 | 550.0 | 573.2

Although the values of the goodness of fit measures are different for the Poisson Bayesian kernel model, the conclusion regarding the performance of the model is the same under both the deviance and the log-likelihood function. Note that these new values are compared with the same values of deviance and log-likelihood for the Poisson GLM, as maximum likelihood estimation is popular and generally used for GLMs [21]. This suggests that the tuning process is robust enough that the difference in the goodness of fit measures is insignificant and did not result in any change in the conclusion of the analysis.

CONCLUDING REMARKS

The Poisson Bayesian kernel model is presented in this paper and empirically tested and compared with the classical Poisson GLM. Both models were used to fit several data sets having different characteristics in terms of the size of the data and the number of predictors.

The evaluation of the performance of each model is based on the values of the deviance and the log-likelihood function. Based on the results obtained, the Poisson Bayesian kernel model outperformed the Poisson GLM in the majority of the data sets. Also, the Poisson Bayesian kernel model might be a better model for small-sized data sets having few predictors. Such a result can be very useful in risk analysis applications to estimate the rate of occurrence of a certain disruption in transportation systems or power grids. In such cases data sources can be scarce due to the lack of occurrence of the event and of the possible factors that might cause a disruption, and a more accurate estimation of the rate of disruption can help save lives and lead to more efficient preparedness and recovery investment and allocation.

The results presented in this paper serve as an initial step in the validation process of the Poisson Bayesian kernel model. Future research will investigate the impact of changing the number of predictors across the models analyzed and look into comparisons with other types of count data models, in addition to considering other measures for testing the goodness of fit.

REFERENCES

[1] Aizerman, M., Braverman, E., and Rozonoer, L. 1964. "Theoretical foundations of the potential function method in pattern recognition learning," Automation and Remote Control, 25, pp. 821-837.
[2] Schölkopf, B., Guyon, I., and Weston, J. 2003. "Statistical Learning and Kernel Methods in Bioinformatics," IOS Press, Amsterdam, The Netherlands, pp. 1-21.
[3] Ben-Hur, A. and Noble, W. S. 2005. "Kernel methods for predicting protein-protein interactions," Bioinformatics, 21(Suppl. 1), pp. i38-i46.
[4] Arias, P., Randall, G., and Sapiro, G. 2007. "Connecting the Out-of-sample and Preimage Problems in Kernel Methods." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Minneapolis, Minnesota, pp. 18-23.
[5] Camps-Valls, G., Rojo-Alvarez, J. L., and Martinez-Ramon, M. 2006. Kernel Methods in Bioengineering, Signal and Image Processing. Hershey, PA: IGI Global.
[6] Wang, L. and Zhu, J. 2010. "Financial market forecasting using a two-step kernel learning method for the support vector regression," Annals of Operations Research, 174, pp. 103-120.
[7] Mitschele, A., Chalup, S., Schlottmann, F., and Seese, D. 2006. "Applications of Kernel Methods in Financial Risk Management." Computing in Economics and Finance, Society for Computational Economics, no. 317.
[8] Cherkassky, V. and Mulier, F. 1998. Learning from Data: Concepts, Theory, and Methods. Hoboken, NJ: Wiley.
[9] Cristianini, N. and Shawe-Taylor, J. 2000. An Introduction to Support Vector Machines and Other Kernel-based Learning Methods. Cambridge, UK: Cambridge University Press.
[10] Hastie, T., Tibshirani, R., and Friedman, J. 2001. The Elements of Statistical Learning: Data Mining, Inference, and Prediction. New York, NY: Springer.
[11] Seeger, M. 2000. "Bayesian model selection for support vector machines, Gaussian processes and other kernel classifiers." In Solla, S. A., Leen, T. K., and Müller, K. R. (eds.), Advances in Neural Information Processing Systems. Cambridge, MA: MIT Press, pp. 603-609.
[12] Bishop, C. M. and Tipping, M. E. 2003. "Bayesian regression and classification." In Suykens, J. A. K., Horváth, G., Basu, S., Micchelli, C., and Vandewalle, J. (eds.), Advances in Learning Theory: Methods, Models and Applications. IOS Press, Amsterdam, pp. 267-288.
[13] Mallick, B. K., Ghosh, D., and Ghosh, M. 2005. "Bayesian classification of tumours by using gene expression data," Journal of the Royal Statistical Society, Series B, 67(2), pp. 219-234.
[14] Zhang, Z., Dai, G., and Jordan, M. I. 2011. "Bayesian generalized kernel mixed models," Journal of Machine Learning Research, 12, pp. 111-139.
[15] Tipping, M. E. 2001. "Sparse Bayesian Learning and the Relevance Vector Machine," Journal of Machine Learning Research, 1, pp. 211-244.
[16] Schölkopf, B. and Smola, A. J. 2002. Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond. MIT Press, Cambridge, MA.
[17] Montesano, L. and Lopes, M. 2009. "Learning grasping affordances from local visual descriptors." Proceedings of the 8th IEEE International Conference on Development and Learning, Shanghai, China, pp. 1-6.
[18] Mason, M. and Lopes, M. 2011. "Robot self-initiative and personalization by learning through repeated interactions," Proceedings of the 6th ACM/IEEE International Conference on Human-Robot Interaction, Lausanne, Switzerland, pp. 433-440.
[19] MacKenzie, C. A., Trafalis, T. B., and Barker, K. "Bayesian Kernel Methods for Non-Gaussian Distributions." In revision.
[20] Cameron, A. C. and Trivedi, P. K. 1986. "Econometric Models Based on Count Data: Comparisons and Applications of Some Estimators and Tests," Journal of Applied Econometrics, 1(1), pp. 29-53.
[21] Cameron, A. C. and Trivedi, P. K. 1998. Regression Analysis of Count Data. Cambridge University Press, Cambridge, UK.
[22] Winkelmann, R. 2008. "Chapter 8: Bayesian Analysis of Count Data." In Econometric Analysis of Count Data, 5th edition, Springer-Verlag, Berlin Heidelberg.
[23] Tunaru, R. 2002. "Hierarchical Bayesian Models for Multiple Count Data," Austrian Journal of Statistics, 31(2-3), pp. 221-229.
[24] Baroud, H., Barker, K., Lurvey, R., and MacKenzie, C. A. 2013. "Bayesian Kernel Models for Disruptive Event Data," Proceedings of the ISERC, San Juan, Puerto Rico, pp. 1777-1785.
[25] Agresti, A. and Finlay, B. 2008. Statistical Methods for the Social Sciences, 4th edition, Prentice Hall.
[26] Sepkoski, J. J. and Rex, M. A. 1974. "Distribution of Freshwater Mussels: Coastal Rivers as Biogeographic Islands," Systematic Zoology, 23(2), pp. 165-188.
[27] Kutner, M. H., Nachtsheim, C., Neter, J., and Li, W. 2005. Applied Linear Statistical Models, 5th edition, New York: McGraw-Hill/Irwin.

AUTHOR INFORMATION

Molly Stam Floyd is an Undergraduate Research Assistant in the School of Industrial and Systems Engineering at the University of Oklahoma. She will earn a B.S. in Industrial Engineering in May 2014 and will pursue graduate studies thereafter. Her research interests lie in the resilience of disaster recovery and humanitarian relief networks, and her research has been funded by the Experimental Program to Stimulate Competitive Research (EPSCoR).

Hiba Baroud is a Ph.D. Candidate and Graduate Research Assistant in the School of Industrial and Systems Engineering at the University of Oklahoma. She came to OU following B.S. and M.S. degrees in Actuarial Science from Notre Dame University, Lebanon and the University of Waterloo, respectively. Her research interests include statistical modeling for risk analysis and decision making.

Kash Barker is an Assistant Professor in the School of Industrial and Systems Engineering at the University of Oklahoma. His research interests primarily lie in the reliability, resilience, and economic impact of infrastructure networks, and his work has been funded by the National Science Foundation and the Army Research Office, among others. He earned B.S. and M.S. degrees in Industrial Engineering from the University of Oklahoma and a Ph.D. in Systems Engineering at the University of Virginia.