On Learning Parsimonious Predictive Models for High...

33
On Learning Parsimonious Predictive Models for High Dimensional Data Sets Xue Bai 12 and Rema Padman 1 1 The H. John Heinz III School of Public Policy and Management 2 School of Computer Science Carnegie Mellon University Pittsburgh PA 15213 {xb, rpadman}@cmu.edu

Transcript of On Learning Parsimonious Predictive Models for High...

Page 1: On Learning Parsimonious Predictive Models for High ...gkmc.utah.edu/winter2005/papers/Bai_slides.pdf · On Learning Parsimonious Predictive Models for High Dimensional Data Sets

On Learning Parsimonious Predictive Models for High Dimensional Data Sets

Xue Bai12

and Rema Padman1

1 The H. John Heinz III School of Public Policy and Management 2 School of Computer ScienceCarnegie Mellon University

Pittsburgh PA 15213 {xb, rpadman}@cmu.edu

Page 2: On Learning Parsimonious Predictive Models for High ...gkmc.utah.edu/winter2005/papers/Bai_slides.pdf · On Learning Parsimonious Predictive Models for High Dimensional Data Sets

March 11-12, 2005 1st IS Winter Conference 2

Outline

• Motivation

• Problem definition

• Methodology

• Evaluation

• Contributions and conclusion

Page 3: On Learning Parsimonious Predictive Models for High ...gkmc.utah.edu/winter2005/papers/Bai_slides.pdf · On Learning Parsimonious Predictive Models for High Dimensional Data Sets

March 11-12, 2005 1st IS Winter Conference 3

Motivation

• High Dimensional Data sets with relatively few cases arise in– Health care

– Text mining

– Internet marketing 2,936235Patronage Prediction

1,4007,716Sentiment Extraction

326779Cancer Diagnosis

#Samples#VariablesProblem Type

Page 4: On Learning Parsimonious Predictive Models for High ...gkmc.utah.edu/winter2005/papers/Bai_slides.pdf · On Learning Parsimonious Predictive Models for High Dimensional Data Sets

March 11-12, 2005 1st IS Winter Conference 4

Internet Movie Database: Movie reviewshttp://www.cs.cornell.edu/people/pabo/movie-review-data/

Extract the main text of the reviews using MixUp(Cohen, 2004)

Vector representation for the reviews

I ,Robot →→→→ [“ aah”=2,…, “awesome”=1, … ]

Positive / Negative

discrete1,40034,000Reviewer opinions

Target Variable (Y)

Variable Types (Xi)

#Samples#VariablesProblem Type

Page 5: On Learning Parsimonious Predictive Models for High ...gkmc.utah.edu/winter2005/papers/Bai_slides.pdf · On Learning Parsimonious Predictive Models for High Dimensional Data Sets

March 11-12, 2005 1st IS Winter Conference 5

Prostate Cancer Database (American Association for Cancer Research)

http://cancerres.aacrjournals.org/misc/terms.shtml

Yes / Nodiscretized326779Cancer diagnosis

Target Variable (Y)

Variable Types (Xi)

#Samples#VariablesProblem Type

Page 6: On Learning Parsimonious Predictive Models for High ...gkmc.utah.edu/winter2005/papers/Bai_slides.pdf · On Learning Parsimonious Predictive Models for High Dimensional Data Sets

March 11-12, 2005 1st IS Winter Conference 6

Importance and Challenges • Challenges

– dimensional difficulties

– limited data

• Importance– the impact of the complexity of a model (or simply the

number of variables) on • thespeed of computation• the quality of decisions• operational costs• understandability and user acceptanceof the decision model

Page 7: On Learning Parsimonious Predictive Models for High ...gkmc.utah.edu/winter2005/papers/Bai_slides.pdf · On Learning Parsimonious Predictive Models for High Dimensional Data Sets

March 11-12, 2005 1st IS Winter Conference 7

Problem definition• To build effective

prediction models from high dimensional and insufficient data

• Our approach– Hybrid methods combining

• Bayesian graphic models• Heuristic optimization

data

Hybrid methods

Model &prediction

Page 8: On Learning Parsimonious Predictive Models for High ...gkmc.utah.edu/winter2005/papers/Bai_slides.pdf · On Learning Parsimonious Predictive Models for High Dimensional Data Sets

March 11-12, 2005 1st IS Winter Conference 8

The goal

• To find a parsimonious Markov Blanket Directed Acyclic Graph (MB DAG) that:

– Has substantially fewer predictor variables

– Encodes the conditional dependence relations

– Provides good prediction

Page 9: On Learning Parsimonious Predictive Models for High ...gkmc.utah.edu/winter2005/papers/Bai_slides.pdf · On Learning Parsimonious Predictive Models for High Dimensional Data Sets

March 11-12, 2005 1st IS Winter Conference 9

Methodology

• TS/MB - Tabu Search enhanced Markov Blanketprocedure:

– 1st stage: Learning the structure of an initial MB DAG using conditional independence tests (Spirtes, Glymour and Scheines, 1993) and causal reasoning(Pearl, 1988, Spirtes, Glymour and Scheines, 2000)

– 2nd stage: Using Tabu Search (TS) heuristic (Glover, 1989) to improve the predictive structure of the MB DAG

Page 10: On Learning Parsimonious Predictive Models for High ...gkmc.utah.edu/winter2005/papers/Bai_slides.pdf · On Learning Parsimonious Predictive Models for High Dimensional Data Sets

March 11-12, 2005 1st IS Winter Conference 10

A Bayesian Network

–A graphical representation of the joint probability distribution P of a set of random variables S

Page 11: On Learning Parsimonious Predictive Models for High ...gkmc.utah.edu/winter2005/papers/Bai_slides.pdf · On Learning Parsimonious Predictive Models for High Dimensional Data Sets

March 11-12, 2005 1st IS Winter Conference 11

A Bayesian Network

• Markov factorization ofP according toS

Page 12: On Learning Parsimonious Predictive Models for High ...gkmc.utah.edu/winter2005/papers/Bai_slides.pdf · On Learning Parsimonious Predictive Models for High Dimensional Data Sets

March 11-12, 2005 1st IS Winter Conference 12

• Markov Blanket– MB: a smallest subset of variables inS such that Y is

independent of S\MB, conditional on the variables inMB– Equals the set of parents of Y , children of Y, and the

parents of children of Y

Page 13: On Learning Parsimonious Predictive Models for High ...gkmc.utah.edu/winter2005/papers/Bai_slides.pdf · On Learning Parsimonious Predictive Models for High Dimensional Data Sets

March 11-12, 2005 1st IS Winter Conference 13

Advantage of Markov Blanket Search

Page 14: On Learning Parsimonious Predictive Models for High ...gkmc.utah.edu/winter2005/papers/Bai_slides.pdf · On Learning Parsimonious Predictive Models for High Dimensional Data Sets

March 11-12, 2005 1st IS Winter Conference 14

Representation and Background Knowledge (cont.)

• Tabu Search (Glover, 1989)

– guide traditional local search methods to escape local optima

– the use of adaptive memory

Page 15: On Learning Parsimonious Predictive Models for High ...gkmc.utah.edu/winter2005/papers/Bai_slides.pdf · On Learning Parsimonious Predictive Models for High Dimensional Data Sets

March 11-12, 2005 1st IS Winter Conference 15

Methodology• Two stage algorithm:

1. 1st stage: Learning an initial MB DAG

InitialMBsearch {

1.1 Finding the adjacency structure 1.2 Pruning the graph1.3 Orienting edges1.4 Transforming into an MB DAG}

Page 16: On Learning Parsimonious Predictive Models for High ...gkmc.utah.edu/winter2005/papers/Bai_slides.pdf · On Learning Parsimonious Predictive Models for High Dimensional Data Sets

March 11-12, 2005 1st IS Winter Conference 16

Algorithm: 1.1 finding adjacencies

Page 17: On Learning Parsimonious Predictive Models for High ...gkmc.utah.edu/winter2005/papers/Bai_slides.pdf · On Learning Parsimonious Predictive Models for High Dimensional Data Sets

March 11-12, 2005 1st IS Winter Conference 17

Algorithm: 1.2 pruning redundant nodes and edges

Page 18: On Learning Parsimonious Predictive Models for High ...gkmc.utah.edu/winter2005/papers/Bai_slides.pdf · On Learning Parsimonious Predictive Models for High Dimensional Data Sets

March 11-12, 2005 1st IS Winter Conference 18

Algorithm: 1.3 orienting edges

Page 19: On Learning Parsimonious Predictive Models for High ...gkmc.utah.edu/winter2005/papers/Bai_slides.pdf · On Learning Parsimonious Predictive Models for High Dimensional Data Sets

March 11-12, 2005 1st IS Winter Conference 19

Algorithm: 1.4 pruning the graph into an MB DAG

Page 20: On Learning Parsimonious Predictive Models for High ...gkmc.utah.edu/winter2005/papers/Bai_slides.pdf · On Learning Parsimonious Predictive Models for High Dimensional Data Sets

March 11-12, 2005 1st IS Winter Conference 20

Algorithm: 2nd Stage – Tabu Search inMarkov Blanket Space

• Search space– The space of Markov Blanket DAGs

• Neighborhood– The set of new Markov Blanket DAGs that can

be constructed via one feasiblemove from the current Markov Blanket DAG

Page 21: On Learning Parsimonious Predictive Models for High ...gkmc.utah.edu/winter2005/papers/Bai_slides.pdf · On Learning Parsimonious Predictive Models for High Dimensional Data Sets

March 11-12, 2005 1st IS Winter Conference 21

• Movesedge deletion: (a) to (b), edge addition: (b) to (c),edge reversal: (c) to (d), edge reversal with node pruning: (d) to (e).

Algorithm: 2nd Stage – Tabu Search inMarkov Blanket Space

Page 22: On Learning Parsimonious Predictive Models for High ...gkmc.utah.edu/winter2005/papers/Bai_slides.pdf · On Learning Parsimonious Predictive Models for High Dimensional Data Sets

March 11-12, 2005 1st IS Winter Conference 22

Special features of Tabu Search

– Tabu List• A list of most recent moves

– Tabu Tenure• 7 for all kinds of moves

• Why Tabu Search?

• Non-improving solution

• Revisit

Page 23: On Learning Parsimonious Predictive Models for High ...gkmc.utah.edu/winter2005/papers/Bai_slides.pdf · On Learning Parsimonious Predictive Models for High Dimensional Data Sets

March 11-12, 2005 1st IS Winter Conference 23

Evaluations in Three Different Domains

• Case study I, II and IIII. Text mining domain (Movie reviewers’

opinion classification from text data)

II. Health care domain (Prostate Cancer diagnosis from serum protein data)

III. Electronic commerce domain (Customers’response prediction from DealTime data )

Page 24: On Learning Parsimonious Predictive Models for High ...gkmc.utah.edu/winter2005/papers/Bai_slides.pdf · On Learning Parsimonious Predictive Models for High Dimensional Data Sets

March 11-12, 2005 1st IS Winter Conference 24

Design of Experimental Parameters

• Design of Experimental Parameters:

1,2,3

Depth of search

0.001,0.005,0.01,0.05

Type I,Type II

AUC, Accuracy

90/10, 80/20Config-urations

AlphaStarting Solution

Scoring Criteria

Data-splits(training/testing)

Para-meters

Page 25: On Learning Parsimonious Predictive Models for High ...gkmc.utah.edu/winter2005/papers/Bai_slides.pdf · On Learning Parsimonious Predictive Models for High Dimensional Data Sets

March 11-12, 2005 1st IS Winter Conference 25

Case I - Movie reviewer opinions classification

• Data source: Internet Movie Database

• http://www.cs.cornell.edu/people/pabo/movie-review-data/

• Originally selected and studied by Pang et al., 2002

Positive / Negative

binary1,4007,716Reviewer opinions

Target Variable (Y)

Variable Types (Xi)

#Samples#VariablesProblem Type

Page 26: On Learning Parsimonious Predictive Models for High ...gkmc.utah.edu/winter2005/papers/Bai_slides.pdf · On Learning Parsimonious Predictive Models for High Dimensional Data Sets

March 11-12, 2005 1st IS Winter Conference 26

Computational Result I (Movie review data)

Average performance when using the full set of variables as input

Page 27: On Learning Parsimonious Predictive Models for High ...gkmc.utah.edu/winter2005/papers/Bai_slides.pdf · On Learning Parsimonious Predictive Models for High Dimensional Data Sets

March 11-12, 2005 1st IS Winter Conference 27

Computational Result II (Movie review data)Average performance when using the same number of variables as identifiedby TS/MB as input for all the other classifiers, selected by information gain criterion

Page 28: On Learning Parsimonious Predictive Models for High ...gkmc.utah.edu/winter2005/papers/Bai_slides.pdf · On Learning Parsimonious Predictive Models for High Dimensional Data Sets

March 11-12, 2005 1st IS Winter Conference 28

Computational Result III (Movie review data)

Average performance when using the exact same variables as identified by TS/MB as input

Page 29: On Learning Parsimonious Predictive Models for High ...gkmc.utah.edu/winter2005/papers/Bai_slides.pdf · On Learning Parsimonious Predictive Models for High Dimensional Data Sets

March 11-12, 2005 1st IS Winter Conference 29

The MB DAG learned from the Movie Review data

Page 30: On Learning Parsimonious Predictive Models for High ...gkmc.utah.edu/winter2005/papers/Bai_slides.pdf · On Learning Parsimonious Predictive Models for High Dimensional Data Sets

March 11-12, 2005 1st IS Winter Conference 30

Early-Stage Results from Financial News Corpora

• Data source: Infonic http://www.infonic.com/ (Lipski, 2003)– Mergers and acquisitions (M & A, 600 documents)

– Finance (600 documents)

– Mixed news (600 documents)

Positive /neutral/ Negative

binary60011,221Finance

Positive /neutral/ Negative

binary60015,686Mixed news

Positive /neutral/ Negative

binary60010,532M & A

Target Variable (Y)

Variable Types (X i)

#Samples#VariablesProblem Type

Page 31: On Learning Parsimonious Predictive Models for High ...gkmc.utah.edu/winter2005/papers/Bai_slides.pdf · On Learning Parsimonious Predictive Models for High Dimensional Data Sets

March 11-12, 2005 1st IS Winter Conference 31

Early-Stage Results from Financial News Corpora

The average performance, with three possible opinions

Page 32: On Learning Parsimonious Predictive Models for High ...gkmc.utah.edu/winter2005/papers/Bai_slides.pdf · On Learning Parsimonious Predictive Models for High Dimensional Data Sets

March 11-12, 2005 1st IS Winter Conference 32

Contributions1. Learning the Markov Blanket rather than Bayesian

Network for classification– Reduces the search space and problem complexity– Theoretically correct in limit

2. Search directly for prediction score– Vs. criteria such as BIC

3. Tabu search among the Markov Blanket space– Tabu search dynamic memory is the key to improve the

classification4. The effectiveness, robustness and superiority are tested

extensively on diverse real world data– No previous research on graphical classification models

has been empirically validated

Page 33: On Learning Parsimonious Predictive Models for High ...gkmc.utah.edu/winter2005/papers/Bai_slides.pdf · On Learning Parsimonious Predictive Models for High Dimensional Data Sets

March 11-12, 2005 1st IS Winter Conference 33

Conclusions• TS/MB classifier:

– reduces the set of predictor variables

– encodes the conditional dependencies among features

– provides excellent classification results

• Theoretically correct in limit • Empirically effective with finite samples