IBM SPSS Modeler 14.2 Data Mining Concepts Introduction to Undirected Data Mining: Association...

IBM SPSS Modeler 14.2 Data Mining Concepts Introduction to Undirected Data Mining: Association Analysis Prepared by David Douglas, University of Arkansas Hosted by the University of Arkansas 1 IBM SPSS


Page 1: IBM SPSS Modeler 14.2

Data Mining Concepts
Introduction to Undirected Data Mining: Association Analysis

Prepared by David Douglas, University of Arkansas
Hosted by the University of Arkansas

Page 2: Association Analysis

Also referred to as

◦ Affinity Analysis
◦ Market Basket Analysis (MBA)

For MBA, this basically means finding what is being purchased together.

• Association rules represent patterns without a specific target; thus they are undirected or unsupervised data mining
• Fits in the exploratory category of data mining


Page 3: Association Rules

Other potential uses

◦ Items purchased on a credit card give insight into the next product or service purchased
◦ Help determine bundles for telecoms
◦ Help bankers identify customers for other services
◦ Unusual combinations of things like insurance claims may need further investigation
◦ Medical histories may give indications of complications or helpful combinations for patients


Page 4: Defining MBA

MBA data
◦ Customers
◦ Purchases (baskets or item sets)
◦ Items

Figure 9-3 set of tables
◦ Purchase (Order) is the fundamental data structure
◦ Individual items are line items
◦ Product: descriptive info
◦ Customer info can be helpful


Page 5: Levels of Data

Adapted from Berry & Linoff


Page 6: MBA

The three levels of data are important for MBA. They can be used to answer a number of questions:

◦ Average number of baskets per customer per time unit
◦ Average number of unique items per customer
◦ Average number of items per basket
◦ For a given product, what is the proportion of customers who have ever purchased the product?
◦ For a given product, what is the average number of baskets per customer that include the item?
◦ For a given product, what is the average quantity purchased in an order when the product is purchased?

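Several of the questions above can be answered by a single pass over line-item records. The following is a minimal sketch in Python, not anything taken from the slides or from SPSS Modeler: the `LineItem` record, the sample data, and the `basket_summary` function are all hypothetical, the time unit is ignored (all orders are treated as one period), and each product is assumed to appear at most once per order.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class LineItem:
    """One line of an order (hypothetical schema for illustration)."""
    order_id: int
    customer_id: int
    product: str
    quantity: int


# Hypothetical sample data: three orders placed by two customers.
LINE_ITEMS = [
    LineItem(1, 100, "orange juice", 2),
    LineItem(1, 100, "soda", 1),
    LineItem(2, 100, "orange juice", 1),
    LineItem(3, 200, "milk", 1),
    LineItem(3, 200, "orange juice", 3),
    LineItem(3, 200, "detergent", 1),
]


def basket_summary(line_items, product):
    """Summarize customer-, order- and item-level figures for one product."""
    orders_by_customer = defaultdict(set)    # customer -> order ids
    products_by_customer = defaultdict(set)  # customer -> distinct products
    products_by_order = defaultdict(set)     # order -> distinct products
    quantities = []                          # quantity of `product` per order

    for li in line_items:
        orders_by_customer[li.customer_id].add(li.order_id)
        products_by_customer[li.customer_id].add(li.product)
        products_by_order[li.order_id].add(li.product)
        if li.product == product:
            quantities.append(li.quantity)

    n_customers = len(orders_by_customer)
    n_orders = len(products_by_order)
    buyers = sum(1 for items in products_by_customer.values() if product in items)

    return {
        "baskets_per_customer": n_orders / n_customers,
        "unique_items_per_customer":
            sum(len(s) for s in products_by_customer.values()) / n_customers,
        "items_per_basket":
            sum(len(s) for s in products_by_order.values()) / n_orders,
        "share_of_customers_buying": buyers / n_customers,
        "avg_quantity_when_bought":
            sum(quantities) / len(quantities) if quantities else 0.0,
    }


summary = basket_summary(LINE_ITEMS, "orange juice")
for name, value in summary.items():
    print(f"{name}: {value}")
```

Note that "items per basket" here counts distinct products, not total quantity; a real analysis would need to state which of the two is meant.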

Page 7: Item Popularity

• Most common item in one-item baskets
• Most common item in multi-item baskets
• Most common items among repeat customers
• Change in buying patterns of an item over time
• Buying pattern for an item by region
• Time and geography are two of the most important attributes of MBA data

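The first two popularity questions differ only in which baskets are counted. A small sketch, assuming each basket is available as a set of item names; the `BASKETS` data and the `most_common_item` helper are made up for illustration:

```python
from collections import Counter

# Hypothetical baskets; each basket is the set of items in one order.
BASKETS = [
    {"milk"},
    {"milk"},
    {"soda"},
    {"orange juice", "soda"},
    {"orange juice", "detergent"},
    {"orange juice", "milk", "window cleaner"},
]


def most_common_item(baskets, multi_item):
    """Return (item, count) for the most frequent item among one-item baskets
    (multi_item=False) or multi-item baskets (multi_item=True), else None."""
    counts = Counter(
        item
        for basket in baskets
        if (len(basket) > 1) == multi_item
        for item in basket
    )
    return counts.most_common(1)[0] if counts else None


single = most_common_item(BASKETS, multi_item=False)
multi = most_common_item(BASKETS, multi_item=True)
print("one-item baskets:", single)
print("multi-item baskets:", multi)
```

In this made-up data the two answers differ (milk versus orange juice), which is the kind of contrast these questions are meant to surface.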

Page 8: Tracking Market Interventions

Adapted from Berry & Linoff


Page 9: Association Rules

Actionable Rules
◦ Wal-Mart customers who purchase Barbie dolls have a 60 percent likelihood of also purchasing one of three types of candy bars

Trivial Rules
◦ Customers who purchase maintenance agreements are very likely to purchase a large appliance

Inexplicable Rules
◦ When a new hardware store opens, one of the most commonly sold items is toilet cleaners

Adapted from Berry & Linoff


Page 10: What exactly is an Association Rule?

Of the form:

IF antecedent THEN consequent

IF (orange juice, milk) THEN (bread, bacon)

Rules include measures of support and confidence


Page 11: How good is an Association Rule?

• Transactions can be converted to co-occurrence matrices
• Co-occurrence tables highlight simple patterns
• Confidence and support can be directly determined from a co-occurrence table
• Or by counting via SQL, etc.
• DM software makes the presentation easy


Page 12: Co-Occurrence Table

      OJ    WC    Milk  Soda  Det
OJ
WC    -
Milk  -     -
Soda  -     -     -
Det   -     -     -     -

(OJ = orange juice, WC = window cleaner, Det = detergent)

Customer  Items
1         Orange juice, soda
2         Milk, orange juice, window cleaner
3         Orange juice, detergent
4         Orange juice, detergent, soda
5         Window cleaner, milk


Page 13: Co-Occurrence Table

      OJ    WC    Milk  Soda  Det
OJ    4     1     1     2     2
WC    -     2     2     0     0
Milk  -     -     2     0     0
Soda  -     -     -     2     1
Det   -     -     -     -     2

(OJ = orange juice, WC = window cleaner, Det = detergent)

Customer  Items
1         Orange juice, soda
2         Milk, orange juice, window cleaner
3         Orange juice, detergent
4         Orange juice, detergent, soda
5         Window cleaner, milk

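The filled-in table can be reproduced by counting, for every pair of items, the transactions that contain both. The sketch below is one way to do that in Python; the `co_occurrence` function and the variable names are illustrative, and the item names use the table's abbreviations.

```python
from itertools import combinations_with_replacement

ITEMS = ["OJ", "WC", "Milk", "Soda", "Det"]

# The five customer transactions from the slide, abbreviated as in the table.
TRANSACTIONS = [
    {"OJ", "Soda"},
    {"Milk", "OJ", "WC"},
    {"OJ", "Det"},
    {"OJ", "Det", "Soda"},
    {"WC", "Milk"},
]


def co_occurrence(transactions, items):
    """Count, for each unordered pair (a, b), the transactions containing both.

    The diagonal entry (a, a) is simply the number of transactions with a.
    """
    table = {}
    for a, b in combinations_with_replacement(items, 2):
        table[(a, b)] = sum(1 for t in transactions if a in t and b in t)
    return table


table = co_occurrence(TRANSACTIONS, ITEMS)

# Print the upper triangle; the lower triangle mirrors it and is shown as "-".
print("      " + "".join(f"{name:>6}" for name in ITEMS))
for i, row in enumerate(ITEMS):
    cells = [
        "     -" if j < i else f"{table[(row, col)]:>6}"
        for j, col in enumerate(ITEMS)
    ]
    print(f"{row:<6}" + "".join(cells))
```

Only the upper triangle is stored because co-occurrence is symmetric: the count for (OJ, Soda) is the same as for (Soda, OJ).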

Page 14: Confidence, Support and Lift

Support for the rule
  (# records with both antecedent and consequent) / (Total # records)

Confidence for the rule
  (# records with both antecedent and consequent) / (# records of the antecedent)

Expected Confidence
  (# records of the consequent) / (Total # records)

Lift
  Confidence / Expected Confidence

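These four definitions translate directly into counting code. The following is a sketch under the assumption that each transaction is a Python set of item names; `rule_metrics` is a hypothetical helper, not a function of SPSS Modeler, and it is applied here to the five transactions from the co-occurrence slides with the rule "if milk then window cleaner".

```python
def rule_metrics(transactions, antecedent, consequent):
    """Compute support, confidence, expected confidence and lift for the rule
    IF antecedent THEN consequent, following the definitions on the slide."""
    antecedent, consequent = set(antecedent), set(consequent)
    total = len(transactions)
    n_antecedent = sum(1 for t in transactions if antecedent <= t)
    n_consequent = sum(1 for t in transactions if consequent <= t)
    n_both = sum(1 for t in transactions if (antecedent | consequent) <= t)
    if total == 0 or n_antecedent == 0 or n_consequent == 0:
        raise ValueError("antecedent and consequent must each occur at least once")

    support = n_both / total
    confidence = n_both / n_antecedent
    expected_confidence = n_consequent / total
    return {
        "support": support,
        "confidence": confidence,
        "expected_confidence": expected_confidence,
        "lift": confidence / expected_confidence,
    }


TRANSACTIONS = [
    {"orange juice", "soda"},
    {"milk", "orange juice", "window cleaner"},
    {"orange juice", "detergent"},
    {"orange juice", "detergent", "soda"},
    {"window cleaner", "milk"},
]

metrics = rule_metrics(TRANSACTIONS, {"milk"}, {"window cleaner"})
print(metrics)
```

Milk and window cleaner each appear in 2 of the 5 transactions and always together, so the rule has support 2/5, confidence 2/2, expected confidence 2/5 and lift 2.5. A lift above 1 means the consequent is more likely when the antecedent is present than it is overall.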

Page 15: Confidence and Support

Rule: If soda then orange juice
◦ From the co-occurrence table, soda and orange juice occur together 2 times (out of 5 total transactions)
◦ Thus, support for the rule is 2/5 or 40%
◦ Confidence for the rule: soda occurs 2 times, so confidence of orange juice given soda is 2/2 or 100%
◦ Lift for the rule: Confidence / Expected Confidence
◦ confidence = 100%; expected confidence = 80% (orange juice occurs in 4 of 5 transactions); lift = 1.0/.8 = 1.25

Rule: If orange juice then soda
◦ Support for the rule is the same: 40%
◦ Orange juice occurs 4 times, so confidence of soda given orange juice is 2/4 or 50%
◦ Expected confidence = 40% (soda occurs in 2 of 5 transactions); lift = .5/.4 = 1.25

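The same arithmetic can be checked using nothing but the counts read off the co-occurrence table. This is a small sketch; the constant names and `metrics_from_counts` are hypothetical. It applies the definitions from the Confidence, Support and Lift slide, where expected confidence is the consequent's share of all transactions (4/5 for orange juice, 2/5 for soda).

```python
# Counts read off the co-occurrence table for the five transactions.
TOTAL = 5
COUNT = {"OJ": 4, "Soda": 2}  # diagonal cells: transactions containing the item
BOTH = 2                      # OJ/Soda cell: transactions containing both


def metrics_from_counts(n_both, n_antecedent, n_consequent, total):
    """Return (support, confidence, expected confidence, lift) from raw counts."""
    support = n_both / total
    confidence = n_both / n_antecedent
    expected = n_consequent / total
    return support, confidence, expected, confidence / expected


# Rule: if soda then orange juice.
soda_to_oj = metrics_from_counts(BOTH, COUNT["Soda"], COUNT["OJ"], TOTAL)
# Rule: if orange juice then soda.
oj_to_soda = metrics_from_counts(BOTH, COUNT["OJ"], COUNT["Soda"], TOTAL)

for name, m in [("soda -> OJ", soda_to_oj), ("OJ -> soda", oj_to_soda)]:
    support, confidence, expected, lift = m
    print(f"{name}: support={support:.2f} confidence={confidence:.2f} "
          f"expected={expected:.2f} lift={lift:.2f}")
```

Support and lift come out the same in both directions (0.40 and 1.25), while confidence depends on which item is the antecedent (1.00 versus 0.50). Lift is symmetric because it equals the joint frequency divided by the product of the two individual frequencies.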

Page 16: Building Association Rules

Adapted from Berry & Linoff


Page 17: Product Hierarchies


Page 18: Lessons Learned

• MBA is complex and no one technique is powerful enough to provide all the answers.
• Three levels: order (basket), line items and customer
• MBA can answer a number of questions
• Association rules are the most common technique for MBA
• Generate rules: support, confidence and lift

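As a closing illustration of "generate rules", the sketch below enumerates every one-item-to-one-item rule over the five example transactions and keeps those that clear minimum support and confidence thresholds. It is a brute-force illustration only: the `pair_rules` function and the threshold values are assumptions, and this is not the algorithm SPSS Modeler uses, nor does it handle multi-item antecedents or scale to large item sets.

```python
from itertools import permutations

TRANSACTIONS = [
    {"orange juice", "soda"},
    {"milk", "orange juice", "window cleaner"},
    {"orange juice", "detergent"},
    {"orange juice", "detergent", "soda"},
    {"window cleaner", "milk"},
]


def pair_rules(transactions, min_support=0.4, min_confidence=0.5):
    """Return (antecedent, consequent, support, confidence, lift) tuples for
    all single-item rules meeting both thresholds, sorted by lift (highest first)."""
    total = len(transactions)
    items = sorted(set().union(*transactions))
    count = {item: sum(1 for t in transactions if item in t) for item in items}

    rules = []
    for a, c in permutations(items, 2):
        both = sum(1 for t in transactions if a in t and c in t)
        support = both / total
        if support < min_support:
            continue
        confidence = both / count[a]
        if confidence < min_confidence:
            continue
        lift = confidence / (count[c] / total)
        rules.append((a, c, support, confidence, lift))
    return sorted(rules, key=lambda r: r[4], reverse=True)


rules = pair_rules(TRANSACTIONS)
for a, c, support, confidence, lift in rules:
    print(f"IF {a} THEN {c}: support={support:.2f} "
          f"confidence={confidence:.2f} lift={lift:.2f}")
```

With these thresholds, only the three item pairs that occur together at least twice survive, giving six directed rules; the milk and window cleaner rules rank first because their lift is highest.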