Fast Algorithms for Mining Association Rules Rakesh Agrawal and Ramakrishnan Srikant VLDB '94...

12
Fast Algorithms for Mining Association Rules Rakesh Agrawal and Ramakrishnan Srikant VLDB '94 presented by kurt partridge cse 590db oct 4, 1999

Transcript of Fast Algorithms for Mining Association Rules Rakesh Agrawal and Ramakrishnan Srikant VLDB '94...

Page 1: Fast Algorithms for Mining Association Rules Rakesh Agrawal and Ramakrishnan Srikant VLDB '94 presented by kurt partridge cse 590db oct 4, 1999.

Fast Algorithms for Mining Association Rules

Rakesh Agrawal and Ramakrishnan Srikant

VLDB '94

presented by

kurt partridge

cse 590db

oct 4, 1999

Page 2: Fast Algorithms for Mining Association Rules Rakesh Agrawal and Ramakrishnan Srikant VLDB '94 presented by kurt partridge cse 590db oct 4, 1999.

2

DB of "Basket Data"

TID items

100 1 3 4

200 2 3 5

300 1 2 3 5

400 2 5

Mining Association RulesMining Association Rules

ons transactiall

containing nstransactiosupport

containing nstransactio

containing nstransactio confidence

ca

a

ca

ISIS

IS

ISIS

association rules

{1} => {3}

{2,3} => {5}

{2,5} => {3}

association rule metrics:

Page 3: Fast Algorithms for Mining Association Rules Rakesh Agrawal and Ramakrishnan Srikant VLDB '94 presented by kurt partridge cse 590db oct 4, 1999.

3

• Step I: Find all itemsets with minimum support (minsup)

• Step II: Generate rules from minsup'ed itemsets

General StrategyGeneral Strategy

support itemsets

0.25 {4}, {1,2}, {1,4}, {1,5}, {3,4},{1,3,4}, {1,2,3}, {1,2,5}, {1,3,5},{1,2,3,5}

0.5 {1}, {1,3}, {2,3}, {3,5}, {2,3,5}0.75 {2}, {3}, {5}, {2,5}

support confidence rules

0.5 66% {3}=>{1}, {3}=>{2}, {2}=>{3}, {3}=>{5},{5}=>{3}, {5}=>{2,3}, {3}=>{2,5},{2}=>{3,5}, {5,2}=>{3}, {5,3}=>{2}

0.5 100% {1}=>{3}, {5,3}=>{2}, {2,3}=>{5}

0.75 100% {5}=>{2}, {2}=>{5}

TID items

100 1 3 4

200 2 3 5300 1 2 3 5

400 2 5

Page 4: Fast Algorithms for Mining Association Rules Rakesh Agrawal and Ramakrishnan Srikant VLDB '94 presented by kurt partridge cse 590db oct 4, 1999.

4

Step I: Finding Minsup ItemsetsStep I: Finding Minsup Itemsets

• Key fact:

Adding items to an itemset never increases its support

• General Strategy: Proceed inductively on itemset size

• Apriori Algorithm:

1. Base case: Begin with all minsup itemsets of size 1 (L1)

2. Without peeking at the DB, generate candidate itemsets ofsize k (Ck) from Lk-1

3. Remove candidate itemsets that contain unsupported subsets

4. Further refine Ck using the database to produce Lk

repe

at

Page 5: Fast Algorithms for Mining Association Rules Rakesh Agrawal and Ramakrishnan Srikant VLDB '94 presented by kurt partridge cse 590db oct 4, 1999.

5

Algorithm to Guess ItemsetsAlgorithm to Guess Itemsets

• Naïve way:

• Extend all itemsets with all possible items

• More sophisticated:

• Join Lk-1 with itself, adding only a single, final item

• e.g.: {1 2 3}, {1 2 4}, {1 3 4}, {1 3 5}, {2, 3, 4} produces{1 2 3 4} and {1 3 4 5}

• Remove itemsets with an unsupported subset

• e.g.: {1 3 4 5} has an unsupported subset: {1 4 5} if minsup = 50%

• Use the database to further refine Ck

Page 6: Fast Algorithms for Mining Association Rules Rakesh Agrawal and Ramakrishnan Srikant VLDB '94 presented by kurt partridge cse 590db oct 4, 1999.

6

ExampleExampleTID items

100 1 3 4

200 2 3 5300 1 2 3 5

400 2 5

Page 7: Fast Algorithms for Mining Association Rules Rakesh Agrawal and Ramakrishnan Srikant VLDB '94 presented by kurt partridge cse 590db oct 4, 1999.

7

Part II: Generating RulesPart II: Generating Rules

• Key fact:

Moving items from the antecedent to the consequent never changes support, and never increases confidence

• Algorithm

• For each itemset IS with minsup:

• Find all minconf rules with a single consequent of the form (IS - L1 => L1 )

• Guess candidate consequents Ck by appending items from IS - Lk-1 to Lk-1

• Verify confidence of each rule IS - Ck => Ck using known itemset support valuesre

peat

Page 8: Fast Algorithms for Mining Association Rules Rakesh Agrawal and Ramakrishnan Srikant VLDB '94 presented by kurt partridge cse 590db oct 4, 1999.

8

Other DetailsOther Details

• Itemset hash trees for subset testing

• Buffering

• Variations

• Fewer database passes, itemsets from multiple iterations

• AprioriTID -- exclude unnecessary database records

• AprioriHybrid -- use either Apriori or AprioriTID

• Future Work:

• Multiple ISA Taxonomies

• constraints on rules (e.g. # of items)

Page 9: Fast Algorithms for Mining Association Rules Rakesh Agrawal and Ramakrishnan Srikant VLDB '94 presented by kurt partridge cse 590db oct 4, 1999.

9

Subsequent PapersSubsequent Papers

• Mining sequenced rules

• Finding "interesting" rules

• Efficiently handling long itemsets

• Integration with query optimizers

• Adjustments to handle dense/relational databases

• Apply constraints to further filter association rules

Page 10: Fast Algorithms for Mining Association Rules Rakesh Agrawal and Ramakrishnan Srikant VLDB '94 presented by kurt partridge cse 590db oct 4, 1999.

10

QuestionsQuestions

• How are rules ranked? Do the minsup and minconf find interesting rules? Do they omit any interesting rules?

• What about maximum support?

• How well will this approach work for other problems (e.g. clustering, classification)?

Page 11: Fast Algorithms for Mining Association Rules Rakesh Agrawal and Ramakrishnan Srikant VLDB '94 presented by kurt partridge cse 590db oct 4, 1999.

11

AprioriApriori

Page 12: Fast Algorithms for Mining Association Rules Rakesh Agrawal and Ramakrishnan Srikant VLDB '94 presented by kurt partridge cse 590db oct 4, 1999.

12

AprioriApriori

• Join operation

• Subset filtering