Fast Algorithms for Mining Association Rules Rakesh Agrawal and Ramakrishnan Srikant VLDB '94...
-
Upload
hortense-hunter -
Category
Documents
-
view
220 -
download
0
Transcript of Fast Algorithms for Mining Association Rules Rakesh Agrawal and Ramakrishnan Srikant VLDB '94...
![Page 1: Fast Algorithms for Mining Association Rules Rakesh Agrawal and Ramakrishnan Srikant VLDB '94 presented by kurt partridge cse 590db oct 4, 1999.](https://reader036.fdocuments.net/reader036/viewer/2022082819/56649f295503460f94c426ae/html5/thumbnails/1.jpg)
Fast Algorithms for Mining Association Rules
Rakesh Agrawal and Ramakrishnan Srikant
VLDB '94
presented by
kurt partridge
cse 590db
oct 4, 1999
![Page 2: Fast Algorithms for Mining Association Rules Rakesh Agrawal and Ramakrishnan Srikant VLDB '94 presented by kurt partridge cse 590db oct 4, 1999.](https://reader036.fdocuments.net/reader036/viewer/2022082819/56649f295503460f94c426ae/html5/thumbnails/2.jpg)
2
DB of "Basket Data"
TID items
100 1 3 4
200 2 3 5
300 1 2 3 5
400 2 5
Mining Association RulesMining Association Rules
ons transactiall
containing nstransactiosupport
containing nstransactio
containing nstransactio confidence
ca
a
ca
ISIS
IS
ISIS
association rules
{1} => {3}
{2,3} => {5}
{2,5} => {3}
association rule metrics:
![Page 3: Fast Algorithms for Mining Association Rules Rakesh Agrawal and Ramakrishnan Srikant VLDB '94 presented by kurt partridge cse 590db oct 4, 1999.](https://reader036.fdocuments.net/reader036/viewer/2022082819/56649f295503460f94c426ae/html5/thumbnails/3.jpg)
3
• Step I: Find all itemsets with minimum support (minsup)
• Step II: Generate rules from minsup'ed itemsets
General StrategyGeneral Strategy
support itemsets
0.25 {4}, {1,2}, {1,4}, {1,5}, {3,4},{1,3,4}, {1,2,3}, {1,2,5}, {1,3,5},{1,2,3,5}
0.5 {1}, {1,3}, {2,3}, {3,5}, {2,3,5}0.75 {2}, {3}, {5}, {2,5}
support confidence rules
0.5 66% {3}=>{1}, {3}=>{2}, {2}=>{3}, {3}=>{5},{5}=>{3}, {5}=>{2,3}, {3}=>{2,5},{2}=>{3,5}, {5,2}=>{3}, {5,3}=>{2}
0.5 100% {1}=>{3}, {5,3}=>{2}, {2,3}=>{5}
0.75 100% {5}=>{2}, {2}=>{5}
TID items
100 1 3 4
200 2 3 5300 1 2 3 5
400 2 5
![Page 4: Fast Algorithms for Mining Association Rules Rakesh Agrawal and Ramakrishnan Srikant VLDB '94 presented by kurt partridge cse 590db oct 4, 1999.](https://reader036.fdocuments.net/reader036/viewer/2022082819/56649f295503460f94c426ae/html5/thumbnails/4.jpg)
4
Step I: Finding Minsup ItemsetsStep I: Finding Minsup Itemsets
• Key fact:
Adding items to an itemset never increases its support
• General Strategy: Proceed inductively on itemset size
• Apriori Algorithm:
1. Base case: Begin with all minsup itemsets of size 1 (L1)
2. Without peeking at the DB, generate candidate itemsets ofsize k (Ck) from Lk-1
3. Remove candidate itemsets that contain unsupported subsets
4. Further refine Ck using the database to produce Lk
repe
at
![Page 5: Fast Algorithms for Mining Association Rules Rakesh Agrawal and Ramakrishnan Srikant VLDB '94 presented by kurt partridge cse 590db oct 4, 1999.](https://reader036.fdocuments.net/reader036/viewer/2022082819/56649f295503460f94c426ae/html5/thumbnails/5.jpg)
5
Algorithm to Guess ItemsetsAlgorithm to Guess Itemsets
• Naïve way:
• Extend all itemsets with all possible items
• More sophisticated:
• Join Lk-1 with itself, adding only a single, final item
• e.g.: {1 2 3}, {1 2 4}, {1 3 4}, {1 3 5}, {2, 3, 4} produces{1 2 3 4} and {1 3 4 5}
• Remove itemsets with an unsupported subset
• e.g.: {1 3 4 5} has an unsupported subset: {1 4 5} if minsup = 50%
• Use the database to further refine Ck
![Page 6: Fast Algorithms for Mining Association Rules Rakesh Agrawal and Ramakrishnan Srikant VLDB '94 presented by kurt partridge cse 590db oct 4, 1999.](https://reader036.fdocuments.net/reader036/viewer/2022082819/56649f295503460f94c426ae/html5/thumbnails/6.jpg)
6
ExampleExampleTID items
100 1 3 4
200 2 3 5300 1 2 3 5
400 2 5
![Page 7: Fast Algorithms for Mining Association Rules Rakesh Agrawal and Ramakrishnan Srikant VLDB '94 presented by kurt partridge cse 590db oct 4, 1999.](https://reader036.fdocuments.net/reader036/viewer/2022082819/56649f295503460f94c426ae/html5/thumbnails/7.jpg)
7
Part II: Generating RulesPart II: Generating Rules
• Key fact:
Moving items from the antecedent to the consequent never changes support, and never increases confidence
• Algorithm
• For each itemset IS with minsup:
• Find all minconf rules with a single consequent of the form (IS - L1 => L1 )
• Guess candidate consequents Ck by appending items from IS - Lk-1 to Lk-1
• Verify confidence of each rule IS - Ck => Ck using known itemset support valuesre
peat
![Page 8: Fast Algorithms for Mining Association Rules Rakesh Agrawal and Ramakrishnan Srikant VLDB '94 presented by kurt partridge cse 590db oct 4, 1999.](https://reader036.fdocuments.net/reader036/viewer/2022082819/56649f295503460f94c426ae/html5/thumbnails/8.jpg)
8
Other DetailsOther Details
• Itemset hash trees for subset testing
• Buffering
• Variations
• Fewer database passes, itemsets from multiple iterations
• AprioriTID -- exclude unnecessary database records
• AprioriHybrid -- use either Apriori or AprioriTID
• Future Work:
• Multiple ISA Taxonomies
• constraints on rules (e.g. # of items)
![Page 9: Fast Algorithms for Mining Association Rules Rakesh Agrawal and Ramakrishnan Srikant VLDB '94 presented by kurt partridge cse 590db oct 4, 1999.](https://reader036.fdocuments.net/reader036/viewer/2022082819/56649f295503460f94c426ae/html5/thumbnails/9.jpg)
9
Subsequent PapersSubsequent Papers
• Mining sequenced rules
• Finding "interesting" rules
• Efficiently handling long itemsets
• Integration with query optimizers
• Adjustments to handle dense/relational databases
• Apply constraints to further filter association rules
![Page 10: Fast Algorithms for Mining Association Rules Rakesh Agrawal and Ramakrishnan Srikant VLDB '94 presented by kurt partridge cse 590db oct 4, 1999.](https://reader036.fdocuments.net/reader036/viewer/2022082819/56649f295503460f94c426ae/html5/thumbnails/10.jpg)
10
QuestionsQuestions
• How are rules ranked? Do the minsup and minconf find interesting rules? Do they omit any interesting rules?
• What about maximum support?
• How well will this approach work for other problems (e.g. clustering, classification)?
![Page 11: Fast Algorithms for Mining Association Rules Rakesh Agrawal and Ramakrishnan Srikant VLDB '94 presented by kurt partridge cse 590db oct 4, 1999.](https://reader036.fdocuments.net/reader036/viewer/2022082819/56649f295503460f94c426ae/html5/thumbnails/11.jpg)
11
AprioriApriori
![Page 12: Fast Algorithms for Mining Association Rules Rakesh Agrawal and Ramakrishnan Srikant VLDB '94 presented by kurt partridge cse 590db oct 4, 1999.](https://reader036.fdocuments.net/reader036/viewer/2022082819/56649f295503460f94c426ae/html5/thumbnails/12.jpg)
12
AprioriApriori
• Join operation
• Subset filtering