An Overview and Example of Data Mining

URI Dept of Computer Science and Statistics - An Overview and Example of Data Mining - Larose. An Overview and Example of Data Mining. Daniel T. Larose, Ph.D., Professor of Statistics; Director, Data Mining @CCSU; Editor, Wiley Series on Methods and Applications in Data Mining. [email protected] www.math.ccsu.edu/larose. University of Rhode Island, Department of Computer Science and Statistics, March 30, 2007

description

University of Rhode Island Department of Computer Science and Statistics March 30, 2007. An Overview and Example of Data Mining. Daniel T. Larose, Ph.D. Professor of Statistics Director, Data Mining @CCSU Editor, Wiley Series on Methods and Applications in Data Mining - PowerPoint PPT Presentation

Transcript of An Overview and Example of Data Mining

Page 1: An Overview and Example of Data Mining

An Overview and Example of Data Mining

Daniel T. Larose, Ph.D.
Professor of Statistics
Director, Data Mining @CCSU
Editor, Wiley Series on Methods and Applications in Data Mining

[email protected] www.math.ccsu.edu/larose

University of Rhode Island
Department of Computer Science and Statistics

March 30, 2007

Page 2: An Overview and Example of Data Mining

Overview
• Part One:
– A Brief Overview of Data Mining
• Part Two:
– An Example of Data Mining:
– Modeling Response to Direct Mail Marketing
• But first, a shameless plug …

Page 3: An Overview and Example of Data Mining

Master of Science in DM at CCSU: Faculty
• Dr. Roger Bilisoly (from Ohio State Univ., Statistics)
– Text Mining, Intro to Data Mining
• Dr. Darius Dziuda (from Warsaw Polytechnic Univ, CS)
– Data Mining for Genomics and Proteomics, Biomarker Discovery
• Dr. Zdravko Markov (from Sofia Univ, CS)
– Data Mining (CS perspective), Machine Learning
• Dr. Daniel Miller (from UConn, Statistics)
– Applied Multivariate Analysis, Mathematical Statistics II, Intro to Data Mining
• Dr. Krishna Saha (from Univ of Windsor, Statistics)
– Intro to Data Mining using R
• Dr. Daniel Larose (Program Director) (from UConn, Statistics)
– Intro to Data Mining, Data Mining Methods, Applied Data Mining, Web Mining

Page 4: An Overview and Example of Data Mining

Master of Science in DM at CCSU: Program (36 credits)
• Core Courses (27 credits). All available online.
– Stat 521 Introduction to Data Mining (4 cr)
– Stat 522 Data Mining Methods (4 cr)
– Stat 523 Applied Data Mining (4 cr)
– Stat 525 Web Mining
– Stat 526 Data Mining for Genomics and Proteomics
– Stat 527 Text Mining
– Stat 416 Mathematical Statistics II
– Stat 570 Applied Multivariate Analysis
• Electives (6 credits. Choose two)
– CS 570 Topics in Artificial Intelligence: Machine Learning
– CS 580 Topics in Advanced Database: Data Mining
– Stat 455 Experimental Design
– Stat 551 Applied Stochastic Processes
– Stat 567 Linear Models
– Stat 575 Mathematical Statistics III
– Stat 529 Current Issues in Data Mining
• Capstone Requirement: Stat 599 Thesis (3 credits)

Page 5: An Overview and Example of Data Mining

Master of Science in DM at CCSU
• Only MS in DM that is entirely online.
• Some courses available on campus.
• Student must come to CCSU to present Thesis
• We reach students in about 30 US States and a dozen foreign countries
• Half of our students already have master’s degrees
• About 15% already have Ph.D.’s
• Typical student is a mid-career professional
• Backgrounds are diverse: Computer Science, Engineering, Finance, Chemistry, Database Admin, Statistics, etc.
• www.ccsu.edu/datamining

Page 6: An Overview and Example of Data Mining

Graduate Certificate in Data Mining
• 18 Credits:
• Required Courses (12 credits)
– Stat 521 Introduction to Data Mining
– Stat 522 Data Mining Methods and Models
– Stat 523 Applied Data Mining
• Elective Courses (6 credits. Choose Two):
– Stat 525 Web Mining
– Stat 526 Data Mining for Genomics and Proteomics
– Stat 527 Text Mining
– Stat 529 Current Issues in Data Mining
– Some other graduate-level data mining or statistics course, with approval of advisor.
• No Mathematical Statistics requirement.

Page 7: An Overview and Example of Data Mining

Material for Part I Drawn From:
Discovering Knowledge in Data: An Introduction to Data Mining (Wiley, 2005)
• Chapter 1. An Introduction to Data Mining
• Chapter 2. Data Preprocessing
• Chapter 3. Exploratory Data Analysis
• Chapter 4. Statistical Approaches to Estimation and Prediction
• Chapter 5. K-Nearest Neighbor
• Chapter 6. Decision Trees
• Chapter 7. Neural Networks
• Chapter 8. Hierarchical and K-Means Clustering
• Chapter 9. Kohonen Networks
• Chapter 10. Association Rules
• Chapter 11. Model Evaluation Techniques

Page 8: An Overview and Example of Data Mining

Material for Part II Drawn From:
Data Mining Methods and Models (Wiley, 2006)
• Chapter 1. Dimension Reduction Methods
• Chapter 2. Regression Modeling
• Chapter 3. Multiple Regression and Model Building
• Chapter 4. Logistic Regression
• Chapter 5. Naïve Bayes Classification and Bayesian Networks
• Chapter 6. Genetic Algorithms
• Chapter 7. Case Study: Modeling Response to Direct-Mail Marketing

Page 9: An Overview and Example of Data Mining

No Material Drawn From:
Data Mining the Web: Uncovering Patterns in Web Content, Structure, and Usage (Wiley, April 2007)
• Part One: Web Structure Mining
– Information Retrieval and Web Search
– Hyperlink-Based Ranking
• Part Two: Web Content Mining
– Clustering
– Evaluating Clustering
– Classification
• Part Three: Web Usage Mining
– Data Preprocessing,
– Exploratory Data Analysis,
– Association Rules, Clustering, and Classification for Web Usage Mining
• With Dr. Zdravko Markov, Computer Science, CCSU

Page 10: An Overview and Example of Data Mining

Call for Book Proposals
Wiley Series on Methods and Applications in Data Mining
• Suggested topics:
– Data Mining in Bioinformatics
– Emerging Techniques in Data Mining (e.g., SVM)
– Data Mining with Evolutionary Algorithms
– Drug Discovery Using Data Mining
– Mining Data Streams
– Visual Analysis in Data Mining
• Books in press:
– Data Mining for Genomics and Proteomics, by Darius Dziuda
– Practical Text Mining Using Perl, by Roger Bilisoly
• Contact Series Editor at [email protected]

Page 11: An Overview and Example of Data Mining

What is Data Mining?
• “Data mining is the analysis of (often large) observational data sets to find unsuspected relationships and to summarize the data in novel ways that are both understandable and useful to the data owner.”
– David Hand, Heikki Mannila & Padhraic Smyth, Principles of Data Mining, MIT Press, 2001

Page 12: An Overview and Example of Data Mining

Why Data Mining?
• “We are drowning in information but starved for knowledge.”
– John Naisbitt, Megatrends, 1984.
• “The problem is that there are not enough trained human analysts available who are skilled at translating all of this data into knowledge, and thence up the taxonomy tree into wisdom.”
– Daniel Larose, Discovering Knowledge in Data: An Introduction to Data Mining, Wiley, 2005.

Page 13: An Overview and Example of Data Mining

Need for Human Direction
• Automation is no substitute for human supervision and input.
– Humans need to be actively involved at every phase of the data mining process.
• “Rather than asking where humans fit into data mining, we should instead inquire about how we may design data mining into the very human process of problem solving.”
– Daniel Larose, Discovering Knowledge in Data: An Introduction to Data Mining, Wiley, 2005.

Page 14: An Overview and Example of Data Mining

“Data Mining is Easy to Do Badly”
• Black box software
– Powerful, “easy-to-use” data mining algorithms
– Makes their misuse dangerous.
– Too easy to point and click your way to disaster.
• What is needed:
– An understanding of the underlying algorithmic and statistical model structures.
– An understanding of which algorithms are most appropriate in which situations and for which types of data.

Page 15: An Overview and Example of Data Mining

CRISP-DM: Cross-Industry Standard Process for Data Mining

Page 16: An Overview and Example of Data Mining

CRISP: DM as a Process
1. Business / Research Understanding Phase
   Enunciate your objectives
2. Data Understanding Phase: EDA
3. Data Preparation Phase: Preprocessing
4. Modeling Phase: Fun and interesting!
5. Evaluation Phase
   Confluence of results? Objectives met?
6. Deployment Phase: Use results to solve problem.
   If desired: Use lessons learned to reformulate business / research objective.

Page 17: An Overview and Example of Data Mining

What About Data Dredging?

Data Dredging
“A sufficiently exhaustive search will certainly throw up patterns of some kind. Many of these patterns will simply be a product of random fluctuations, and will not represent any underlying structure.”
– David J. Hand, Data Mining: Statistics and More? The American Statistician, May 1998.

Page 18: An Overview and Example of Data Mining

Guarding Against Data Dredging: Cross-Validation is the Key
• Partition the data into a training set and a test set.
• If the pattern shows up in both data sets, the probability that it represents noise decreases.
• More generally, may use n-fold cross-validation.
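The partitioning idea on this slide can be sketched in a few lines of Python. This is only an illustration under assumed names (`n_fold_splits`, `n_records`, `n_folds` are hypothetical and do not come from the presentation): each record lands in exactly one test fold, and the remaining folds form the training set for that round.

```python
import random

def n_fold_splits(n_records, n_folds, seed=0):
    """Yield (train_idx, test_idx) index lists for n-fold cross-validation."""
    indices = list(range(n_records))
    random.Random(seed).shuffle(indices)   # shuffle so folds are random
    folds = [indices[i::n_folds] for i in range(n_folds)]
    for k in range(n_folds):
        test_idx = folds[k]
        # Every fold except the k-th goes into the training set.
        train_idx = [i for j, fold in enumerate(folds) if j != k for i in fold]
        yield train_idx, test_idx

# Hypothetical toy size: 10 records split into 5 folds.
splits = list(n_fold_splits(n_records=10, n_folds=5))
```

A pattern found on a training portion would then be checked on the matching held-out portion; the simple training/test partition on the slide corresponds to keeping a single such split.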

Page 19: An Overview and Example of Data Mining

Inference and Huge Data Sets
• Hypothesis testing becomes sensitive at the huge sample sizes prevalent in data mining applications.
– Even very tiny effects will be found significant.
– So, data mining tends to de-emphasize inference
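A small numeric sketch may make the sample-size point concrete. It assumes a one-sample z-test with a known standard deviation, and the effect size, standard deviation, and sample sizes are invented for illustration; none of them come from the presentation.

```python
import math

def z_statistic(effect, sd, n):
    """z = effect / standard error, with standard error = sd / sqrt(n)."""
    return effect / (sd / math.sqrt(n))

def two_sided_p(z):
    """Two-sided p-value under the standard normal distribution."""
    return math.erfc(abs(z) / math.sqrt(2))

# Hypothetical tiny effect: one hundredth of a standard deviation.
effect, sd = 0.01, 1.0
for n in (1_000, 10_000_000):
    z = z_statistic(effect, sd, n)
    print(f"n={n:>10,}  z={z:6.2f}  p={two_sided_p(z):.3g}")
```

With the same tiny effect, the test is nowhere near significance at the smaller sample size and overwhelmingly significant at the larger one, which is why significance alone says little at data mining scale.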

Page 20: An Overview and Example of Data Mining

Need for Transparency and Interpretability
• Data mining models should be transparent
– Results should be interpretable by humans
• Decision Trees are transparent
• Neural Networks tend to be opaque
• If a customer complains about why he/she was turned down for credit, we should be able to explain why, without saying “Our neural net said so.”

Page 21: An Overview and Example of Data Mining

Part Two: Modeling Response to Direct Mail Marketing
Business Understanding Phase:
– Clothing Store Purchase Data
• Results of a direct mail marketing campaign
• Task: Construct a classification model
– For classifying customers as either responders or non-responders to the marketing campaign,
– To reduce costs and increase return-on-investment

Page 22: An Overview and Example of Data Mining

Data Understanding: The Clothing Store dataset

List of fields in the dataset (28,799 customers, 51 fields)
• Customer ID: Unique, encrypted customer identification
• Zip Code
• Number of purchase visits
• Total net sales
• Average amount spent per visit
• Amount spent at each of four different franchises (four variables)
• Amount spent in the past month, the past three months, and the past six months
• Amount spent the same period last year
• Gross margin percentage
• Number of marketing promotions on file
• Number of days the customer has been on file
• Number of days between purchases
• Markdown percentage on customer purchases
• Number of different product classes purchased
• Number of coupons used by the customer
• Total number of individual items purchased by the customer
• Number of stores the customer shopped at
• Number of promotions mailed in the past year
• Number of promotions responded to in the past year
• Promotion response rate for the past year
• Product uniformity (Low score = diverse spending patterns)
• Lifetime average time between visits
• Microvision® Lifestyle Cluster Type
• Percent of Returns
• Flag: Credit card user
• Flag: Valid phone number on file
• Flag: Web shopper
• 15 variables providing the percentages spent by the customer on specific classes of clothing, including sweaters, knit tops, knit dresses, blouses, jackets, career pants, casual pants, shirts, dresses, suits, outerwear, jewelry, fashion, legwear, and the collectibles line. Also a variable showing the brand of choice (encrypted).
• Target variable: Response to promotion

Page 23: An Overview and Example of Data Mining

Data Preparation and EDA Phase
• Not covered in this presentation.

Page 24: An Overview and Example of Data Mining

Modeling Strategy
• Apply principal components analysis to address multicollinearity.
• Apply cluster analysis. Briefly profile clusters.
• Balance the training data set.
• Establish baseline model performance
– In terms of expected profit per customer contacted.
• Apply classification algorithms to training data set:
– CART
– C5.0 (C4.5)
– Neural networks
– Logistic regression.

Page 25: An Overview and Example of Data Mining

Modeling Strategy continued
• Evaluate each model using test data set.
• Apply misclassification costs in line with cost benefit table.
• Apply overbalancing as a surrogate for misclassification costs.
– Find best overbalancing proportion.
• Combine predictions from four models
– Using model voting.
– Using mean response probabilities.
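The two ways of combining models named above can be sketched as follows. This is a minimal illustration, not the presentation's implementation: the function names, the 0.5 threshold, and the four response probabilities are all hypothetical.

```python
def vote(predictions, min_votes):
    """Classify as responder (1) when at least min_votes models predict 1."""
    return 1 if sum(predictions) >= min_votes else 0

def mean_probability(probabilities, threshold=0.5):
    """Average the models' response probabilities, then apply a threshold."""
    mean_p = sum(probabilities) / len(probabilities)
    return mean_p, (1 if mean_p >= threshold else 0)

# Hypothetical response probabilities for one customer from the four models
# (CART, C5.0, neural network, logistic regression); values are invented.
probs = [0.62, 0.55, 0.48, 0.30]
hard = [1 if p >= 0.5 else 0 for p in probs]   # per-model classifications

lenient = vote(hard, min_votes=2)              # contact if any two models agree
strict = vote(hard, min_votes=3)               # require a clear majority
mean_p, by_mean = mean_probability(probs)
```

The `min_votes` setting is a design choice: a lower bar contacts more customers and risks more false positives, while a higher bar does the opposite.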

Page 26: An Overview and Example of Data Mining

Principal Components Analysis (PCA)
• Multicollinearity does not degrade prediction accuracy.
– But muddles individual predictor coefficients.
• Interested in predictor characteristics, customer profiling, etc?
– Then PCA is required.
• But, if interested solely in classification (prediction, estimation),
– PCA not strictly required.

Page 27: An Overview and Example of Data Mining

Report Two Model Sets:
• Model Set A:
– Includes principal components
– All purpose model set
• Model Set B:
– Includes correlated predictors, not principal components
– Use restricted to classification

Page 28: An Overview and Example of Data Mining

Principal Components Analysis (PCA)
• Seven correlated variables.
– Two components extracted
– Account for 87% of variability
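To show what "components account for a share of variability" means, here is a deliberately reduced sketch with only two standardized variables, not the seven used in the presentation. For two standardized variables with correlation r, the correlation matrix has eigenvalues 1 + r and 1 - r, so the first component explains (1 + |r|) / 2 of the total variance. The data values and function names are hypothetical.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def first_component_share(r):
    """Share of variance explained by the first principal component of two
    standardized variables: largest eigenvalue (1 + |r|) over the total (2)."""
    return (1 + abs(r)) / 2

# Hypothetical, strongly correlated predictors (invented values).
purchase_visits = [1, 2, 3, 4, 5, 6]
total_net_sales = [12, 19, 33, 41, 48, 62]

r = pearson_r(purchase_visits, total_net_sales)
share = first_component_share(r)
```

The more strongly the predictors are correlated, the closer this share gets to 1, which is why a few components can stand in for a larger set of correlated variables.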

Page 29: An Overview and Example of Data Mining

Principal Components Analysis (PCA)
• Principal Component 1:
– Purchasing Habits
– Customer general purchasing habits
– Expect component to be strongly indicative of response
• Principal Component 2:
– Promotion Contacts
– Unclear whether component will be associated with response
• Components validated by test data set

Page 30: An Overview and Example of Data Mining

BIRCH Clustering Algorithm
• Requires only one pass through data set
– Scalable for large data sets
• Benefit: Analyst need not pre-specify number of clusters
• Drawback: Sensitive to initial records encountered
– Leads to widely variable cluster solutions
• Requires “outer loop” to find consistent cluster solution
• Zhang, Ramakrishnan and Livny, BIRCH: A New Data Clustering Algorithm and Its Applications, Data Mining and Knowledge Discovery 1, 1997.

Page 31: An Overview and Example of Data Mining

BIRCH Clusters
• Cluster 3 shows:
– Higher response for flag predictors
– Higher averages for numeric predictors

Variable                                     Cluster 1   Cluster 2   Cluster 3
z ln Purchase Visits                           –0.575      –0.570       1.011
z ln Total Net Sales                           –0.177      –0.804       0.971
z sqrt Spending Last One Month                 –0.279      –0.314       0.523
z ln Lifetime Average Time Between Visits       0.455       0.484      –0.835
z ln Product Uniformity                         0.493       0.447      –0.834
z sqrt # Promotion Responses in Past Year      –0.480      –0.573       0.950
z sqrt Spending on Sweaters                    –0.486       0.261       0.116

Page 32: An Overview and Example of Data Mining

BIRCH Clusters
• Cluster 3 has highest response rate (red).
– Cluster 1: 7.6%
– Cluster 2: 7.1%
– Cluster 3: 33.0%

Page 33: An Overview and Example of Data Mining

Balancing the Data
• For “rare” classes, provides more equitable distribution.
• Drawback: Loss of data:
– Here, 40% of non-responders randomly omitted
– All responders retained
– Proportion of responders increases from 16.58% to 24.76%
• Test data set should never be balanced
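The balancing step can be sketched as below. The record layout (a dict with a `response` key), the function names, and the 1,000-record toy data set are assumptions for illustration. The toy data only roughly mimics the proportions on the slide and is not meant to reproduce the reported 24.76% exactly.

```python
import random

def balance(records, drop_fraction, seed=0):
    """Keep all responders; randomly omit drop_fraction of non-responders."""
    rng = random.Random(seed)
    responders = [r for r in records if r["response"] == 1]
    non_responders = [r for r in records if r["response"] == 0]
    n_keep = round(len(non_responders) * (1 - drop_fraction))
    return responders + rng.sample(non_responders, n_keep)

def responder_share(records):
    """Proportion of records whose response flag is 1."""
    return sum(r["response"] for r in records) / len(records)

# Hypothetical training set: 166 responders among 1,000 customers (16.6%).
training = [{"response": 1}] * 166 + [{"response": 0}] * 834
balanced = balance(training, drop_fraction=0.40)

before = responder_share(training)   # share before balancing
after = responder_share(balanced)    # share after dropping 40% of non-responders
```

Only the training set goes through `balance`; in line with the slide, the test set would be left at its natural class proportions.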

Page 34: An Overview and Example of Data Mining


False Positive vs. False Negative: Which is Worse?

• For direct mail marketing, a false negative error is probably worse than a false positive.
• Generate misclassification costs based on the observed data.
  – Construct cost-benefit table

Page 35: An Overview and Example of Data Mining


Decision Cost / Benefit Analysis

Outcome          Classified   Actual   Cost       Rationale
True Negative    No           No       $0         No contact made; no revenue lost
True Positive    Yes          Yes      -$26.40    (Anticipated revenue) – (Cost of contact)
False Negative   No           Yes      $28.40     Loss of anticipated revenue
False Positive   Yes          No       $2.00      Cost of contact

Page 36: An Overview and Example of Data Mining


Establish Baseline Model Performance

• Benchmarks
  – “Don’t Send a Marketing Promotion to Anyone” Model
  – “Send a Marketing Promotion to Everyone” Model
• Will compare candidate models against this baseline error rate.

Model                 TN       TP            FN            FP           Overall      Overall Cost
                      Cost $0  Cost -$26.40  Cost $28.40   Cost $2.00   Error Rate
“Don’t Send Anyone”   5908     0             1151          0            16.3%        $32,688.40 ($4.63 per customer)
“Send to Everyone”    0        1151          0             5908         83.7%        -$18,570.40 (-$2.63 per customer)
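The arithmetic behind the two benchmark rows can be reproduced directly from the cost/benefit table. The sketch below is illustrative; the names `COSTS` and `evaluate` are hypothetical, and a negative cost is read as a profit.

```python
# Per-record costs from the cost/benefit table (negative cost = profit).
COSTS = {"TN": 0.00, "TP": -26.40, "FN": 28.40, "FP": 2.00}

def evaluate(tn, tp, fn, fp, costs=COSTS):
    """Return (overall cost, cost per customer, overall error rate)."""
    n = tn + tp + fn + fp
    total = (tn * costs["TN"] + tp * costs["TP"]
             + fn * costs["FN"] + fp * costs["FP"])
    error_rate = (fn + fp) / n
    return total, total / n, error_rate

# Confusion-matrix counts taken from the benchmark table.
dont_send = evaluate(tn=5908, tp=0, fn=1151, fp=0)
send_all = evaluate(tn=0, tp=1151, fn=0, fp=5908)

for name, (total, per_customer, err) in [("Don't send anyone", dont_send),
                                         ("Send to everyone", send_all)]:
    print(f"{name}: total {total:,.2f}, per customer {per_customer:.2f}, "
          f"error rate {err:.1%}")
```

The same function applies to every model table that follows: plug in the four counts and compare the per-customer cost against -$2.63.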

Page 37: An Overview and Example of Data Mining


Model Set A (With 50% Balancing)

• No model beats benchmark of $2.63 profit per customer
• Misclassification costs had not been applied

Model                 TN       TP            FN             FP             Overall      Overall Cost
                      Cost $0  Cost -$26.40  Cost $28.40    Cost $2.00     Error Rate   per Customer
Neural Network        4694     672           479 (9.3%)     1214 (64.4%)   24.0%        -$0.24
CART                  4348     829           322 (6.9%)     1560 (65.3%)   26.7%        -$1.36
C5.0                  4465     782           369 (7.6%)     1443 (64.9%)   25.7%        -$1.03
Logistic Regression   4293     872           279 (6.1%)     1615 (64.9%)   26.8%        -$1.68

• Now define FN cost = $28.40, FP cost = $2.00
  – Outperformed baseline “Send to everyone” model

Model                 TN       TP            FN             FP             Overall      Overall Cost
                      Cost $0  Cost -$26.40  Cost $28.40    Cost $2.00     Error Rate   per Customer
CART                  754      1147          4 (0.5%)       5154 (81.8%)   73.1%        -$2.81
C5.0                  858      1143          8 (0.9%)       5050 (81.5%)   71.7%        -$2.81

Page 38: An Overview and Example of Data Mining


Model Set A: Effect of Misclassification Costs

• For the 447 highlighted records:
  – Only 20.8% responded.
  – But model predicts positive response.
  – Due to high false negative misclassification cost.
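A short expected-cost calculation shows why mailing a group with only a 20.8% response rate is still the cheaper decision. This is a sketch that assumes the decision is made by comparing expected costs under the cost/benefit table; the break-even probability is derived here and does not appear on the slides.

```python
# Costs from the cost/benefit table (negative cost = profit).
COST_TN, COST_TP, COST_FN, COST_FP = 0.00, -26.40, 28.40, 2.00

def expected_cost_mail(p):
    """Expected cost of mailing a customer who responds with probability p."""
    return p * COST_TP + (1 - p) * COST_FP

def expected_cost_skip(p):
    """Expected cost of not mailing that same customer."""
    return p * COST_FN + (1 - p) * COST_TN

p = 0.208  # response rate of the highlighted records
mail = expected_cost_mail(p)
skip = expected_cost_skip(p)

# Response probability at which the two decisions cost the same.
break_even = (COST_FP - COST_TN) / ((COST_FP - COST_TP) + (COST_FN - COST_TN))

print(f"mail: {mail:.2f}, skip: {skip:.2f}, break-even p: {break_even:.4f}")
```

Under these costs, mailing is the cheaper choice whenever the response probability exceeds roughly 3.5%, which the highlighted records clear easily.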

Page 39: An Overview and Example of Data Mining


Model Set A: PCA Component 1 is Best Predictor

• First principal component ($F-PCA-1), Purchasing Habits, represents both the root node split and the secondary split
  – Most important factor for predicting response

Page 40: An Overview and Example of Data Mining


Over-Balancing as a Surrogate for Misclassification Costs

• Software limitation:
  – Neural network and logistic regression models in Clementine lack methods for applying misclassification costs
• Over-balancing is an alternate method which can achieve similar results
• Starves the classifier of instances of non-response

Page 41: An Overview and Example of Data Mining


Over-Balancing as a Surrogate for Misclassification Costs

• Neural network model results
  – Three over-balanced models outperform baseline
• Properly applied, over-balancing can be used as a surrogate for misclassification costs

Model                          TN       TP            FN             FP             Overall      Overall Cost
                               Cost $0  Cost -$26.40  Cost $28.40    Cost $2.00     Error Rate   per Customer
No Balancing (16.3% - 83.7%)   5865     124           1027 (14.9%)   43 (25.7%)     15.2%        +$3.68
50% - 50% Balancing            4694     672           479 (9.3%)     1214 (64.4%)   24.0%        -$0.24
65% - 35% Over-Balancing       1918     1092          59 (3.0%)      3990 (78.5%)   57.4%        -$2.72
80% - 20% Over-Balancing       1032     1129          22 (2.1%)      4876 (81.2%)   69.4%        -$2.75
90% - 10% Over-Balancing       592      1141          10 (1.7%)      5316 (82.3%)   75.4%        -$2.72
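Over-balancing to a chosen ratio can be sketched as follows. The reading that the first percentage is the responder share (so "80% - 20%" means 80% responders) is inferred from the "No Balancing (16.3% - 83.7%)" row; the function name and the synthetic class sizes are hypothetical.

```python
import random

def over_balance(responders, non_responders, responder_share, seed=0):
    """Keep all responders; subsample non-responders to reach the target share."""
    n_keep = round(len(responders) * (1 - responder_share) / responder_share)
    n_keep = min(n_keep, len(non_responders))  # cannot keep more than exist
    rng = random.Random(seed)
    return responders + rng.sample(non_responders, n_keep)

# Hypothetical training classes: 1 marks a responder, 0 a non-responder.
responders = [1] * 1000
non_responders = [0] * 5000

sample = over_balance(responders, non_responders, responder_share=0.80)
print(len(sample), sum(sample) / len(sample))
```

As with ordinary balancing, only the training data would be resampled this way.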

Page 42: An Overview and Example of Data Mining


Over-Balancing as a Surrogate for Misclassification Costs

• Apply 80% - 20% over-balancing to the other models.

Model                 TN       TP            FN             FP             Overall      Overall Cost
                      Cost $0  Cost -$26.40  Cost $28.40    Cost $2.00     Error Rate   per Customer
Neural Network        885      1132          19 (2.1%)      5023 (81.6%)   71.4%        -$2.73
CART                  1724     1111          40 (2.3%)      4184 (79.0%)   59.8%        -$2.81
C5.0                  1467     1116          35 (2.3%)      4441 (79.9%)   63.4%        -$2.77
Logistic Regression   2389     1106          45 (1.8%)      3519 (76.1%)   50.5%        -$2.96

Page 43: An Overview and Example of Data Mining


Combination Models: Voting

• Smooths out strengths and weaknesses of each model
  – Each model supplies a prediction for each record
  – Count the votes for each record
• Disadvantage of combination models:
  – Lack of easy interpretability
• Four competing combination models…

Page 44: An Overview and Example of Data Mining


Combination Models: Voting

Mail a Promotion only if:
• All four models predict response
  – Protects against false positives
  – All four classification algorithms must agree on a positive prediction
• At least three models predict response
• At least two models predict response
• Any model predicts response
  – Protects against false negatives
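The four voting rules differ only in the number of positive votes required, which a minimal sketch makes explicit. The function name `vote`, the rule labels, and the example predictions are hypothetical.

```python
def vote(predictions, min_votes):
    """Mail if at least min_votes models predict response (1 = response)."""
    return sum(predictions) >= min_votes

# Hypothetical predictions for one record from the four models
# (neural network, CART, C5.0, logistic regression).
record_predictions = [1, 0, 1, 1]

rules = {"all four": 4, "at least three": 3, "at least two": 2, "any": 1}
decisions = {name: vote(record_predictions, k) for name, k in rules.items()}
print(decisions)
```

Raising the required vote count trades false positives for false negatives, which is why the strictest and loosest rules sit at opposite ends of the list above.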

Page 45: An Overview and Example of Data Mining


Combination Models: Voting

• None beat the logistic regression model: $2.96 profit per customer
• Perhaps combination models will do better with Model Collection B…

Combination Model                                                TN       TP            FN             FP             Overall      Overall Cost
                                                                 Cost $0  Cost -$26.40  Cost $28.40    Cost $2.00     Error Rate   per Customer
Mail a Promotion Only if All Four Models Predict Response        2772     1067          84 (2.9%)      3136 (74.6%)   45.6%        -$2.76
Mail a Promotion Only if Three or Four Models Predict Response   1936     1115          36 (1.8%)      3972 (78.1%)   56.8%        -$2.90
Mail a Promotion Only if At Least Two Models Predict Response    1207     1135          16 (1.3%)      4701 (80.6%)   66.8%        -$2.85
Mail a Promotion if Any Model Predicts Response                  550      1148          3 (0.5%)       5358 (82.4%)   75.9%        -$2.76

Page 46: An Overview and Example of Data Mining


Model Collection B: Non-PCA Models

• Models retain correlated variables
  – Use restricted to prediction only
• Since the correlated variables are highly predictive
  – Expect Collection B will outperform the PCA models

Page 47: An Overview and Example of Data Mining


Model Collection B: CART and C5.0

• Using misclassification costs, and 50% balancing
• Both models outperform the best PCA model

Model                 TN       TP            FN             FP             Overall      Overall Cost
                      Cost $0  Cost -$26.40  Cost $28.40    Cost $2.00     Error Rate   per Customer
CART                  1645     1140          11 (0.7%)      4263 (78.9%)   60.5%        -$3.01
C5.0                  1562     1147          4 (0.3%)       4346 (79.1%)   61.6%        -$3.04

Page 48: An Overview and Example of Data Mining


Model Collection B: Over-Balancing

• Apply over-balancing as a surrogate for misclassification costs for all models
• Best performance thus far.

Model                 TN       TP            FN             FP             Overall      Overall Cost
                      Cost $0  Cost -$26.40  Cost $28.40    Cost $2.00     Error Rate   per Customer
Neural Network        1301     1123          28 (2.1%)      4607 (80.4%)   65.7%        -$2.78
CART                  2780     1100          51 (1.8%)      3128 (74.0%)   45.0%        -$3.02
C5.0                  2640     1121          30 (1.1%)      3268 (74.5%)   46.7%        -$3.15
Logistic Regression   2853     1110          41 (1.4%)      3055 (73.3%)   43.9%        -$3.12

Page 49: An Overview and Example of Data Mining


Combination Models: Voting

• Combine the four models via voting and 80%-20% over-balancing
• Synergy: Combination model outperforms any individual model.

Combination Model                                                TN       TP            FN             FP             Overall      Overall Cost
                                                                 Cost $0  Cost -$26.40  Cost $28.40    Cost $2.00     Error Rate   per Customer
Mail a Promotion Only if All Four Models Predict Response        3307     1065          86 (2.5%)      2601 (70.9%)   38.1%        -$2.90
Mail a Promotion Only if Three or Four Models Predict Response   2835     1111          40 (1.4%)      3073 (73.4%)   44.1%        -$3.12
Mail a Promotion Only if At Least Two Models Predict Response    2357     1133          18 (0.7%)      3551 (75.8%)   50.6%        -$3.16
Mail a Promotion if Any Model Predicts Response                  1075     1145          6 (0.6%)       4833 (80.8%)   68.6%        -$2.89

Page 50: An Overview and Example of Data Mining


Combining Models Using Mean Response Probabilities

• Combine the confidences that each model reports for its decisions
  – Allows finer tuning of the decision space
• Derive a new variable:
  – Mean Response Probability (MRP):
    • Average of response confidences of the four models.
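A minimal sketch of deriving MRP for one record follows. The slides do not spell out how a model's reported confidence is turned into a response probability, so the mapping used here is an assumption: the confidence itself when the model predicts response, and one minus the confidence otherwise. The function names and example outputs are likewise hypothetical.

```python
from statistics import mean

def response_probability(predicts_response, confidence):
    """Assumed mapping from (prediction, confidence) to P(response)."""
    return confidence if predicts_response else 1.0 - confidence

def mean_response_probability(model_outputs):
    """Average the per-model response probabilities for one record."""
    return mean(response_probability(p, c) for p, c in model_outputs)

# Hypothetical (prediction, confidence) pairs from the four models.
outputs = [(True, 0.70), (False, 0.60), (True, 0.55), (True, 0.80)]

mrp = mean_response_probability(outputs)
mail = mrp >= 0.51  # example partition; any cut-off can be tried here
print(f"MRP = {mrp:.4f}, mail = {mail}")
```

Unlike voting, which allows only five outcomes (zero to four votes), MRP is continuous, so the cut-off can be placed anywhere between 0 and 1.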

Page 51: An Overview and Example of Data Mining


Combining Models Using Mean Response Probabilities

• Multi-modality due to the discontinuity of the transformation used in derivation of MRP

Page 52: An Overview and Example of Data Mining


Combining Models Using Mean Response Probabilities

• Where shall we define response vs. non-response?
  – Recall that FN is 14.2 times worse than FP
  – Set partitions on the low side => fewer FN decisions are made

Page 53: An Overview and Example of Data Mining


Combining Models Using Mean Response Probabilities

• Optimal partition: near 50%.
• Mail a promotion to a prospective customer only if the mean response probability is at least 50%
• Best model in case study.
  – MRP = 0.51
    • $3.1744 profit
  – “send to everyone”
    • $2.63 profit
  – 20.7% profit enhancement (54.44 cents)

MRP Partition   TN       TP            FN             FP             Overall      Overall Cost
                Cost $0  Cost -$26.40  Cost $28.40    Cost $2.00     Error Rate   per Customer
0.95            5648     353           798 (12.4%)    260 (42.4%)    15.0%        +$1.96
0.85            3810     994           157 (4.0%)     2098 (67.8%)   31.9%        -$2.49
0.65            2995     1104          47 (1.5%)      2913 (72.5%)   41.9%        -$3.11
0.54            2796     1113          38 (1.3%)      3112 (73.7%)   44.6%        -$3.13
0.52            2738     1121          30 (1.1%)      3170 (73.9%)   45.3%        -$3.1736
0.51            2686     1123          28 (1.0%)      3222 (74.2%)   46.0%        -$3.1744
0.50            2625     1125          26 (1.0%)      3283 (74.5%)   46.9%        -$3.1726
0.46            2493     1129          22 (0.9%)      3415 (75.2%)   48.7%        -$3.166
0.42            2369     1133          18 (0.8%)      3539 (75.7%)   50.4%        -$3.162
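The choice of partition and the quoted profit enhancement can be checked from the confusion-matrix counts in the table. The sketch below uses a subset of the rows; the variable and function names are hypothetical, and the costs are those of the cost/benefit table.

```python
COST_TP, COST_FN, COST_FP = -26.40, 28.40, 2.00  # a true negative costs $0

def cost_per_customer(tn, tp, fn, fp):
    """Overall cost divided by the number of customers scored."""
    return (tp * COST_TP + fn * COST_FN + fp * COST_FP) / (tn + tp + fn + fp)

# MRP partition -> (TN, TP, FN, FP), copied from the table.
partitions = {
    0.95: (5648, 353, 798, 260),
    0.65: (2995, 1104, 47, 2913),
    0.52: (2738, 1121, 30, 3170),
    0.51: (2686, 1123, 28, 3222),
    0.50: (2625, 1125, 26, 3283),
    0.42: (2369, 1133, 18, 3539),
}

costs = {cut: cost_per_customer(*counts) for cut, counts in partitions.items()}
best_cut = min(costs, key=costs.get)  # lowest cost = highest profit

baseline = cost_per_customer(tn=0, tp=1151, fn=0, fp=5908)  # send to everyone
enhancement = (costs[best_cut] - baseline) / baseline

print(f"best partition: {best_cut}, cost per customer: {costs[best_cut]:.4f}")
print(f"profit enhancement over send-to-everyone: {enhancement:.1%}")
```

The profit curve is very flat between 0.50 and 0.52, so the precise cut-off matters far less than keeping it low enough to avoid false negatives.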

Page 54: An Overview and Example of Data Mining


Summary

• For more on this Case Study, see Data Mining Methods and Models (Wiley, 2006)
• So, the best part about all this is:
  – Data mining is fun!
  – If you love to play with data, and you love to construct and evaluate models, then data mining is for you.