Apriori algorithm
KAIST Knowledge Service Engineering
Data Mining Lab.
Apriori Algorithm
Jung Hoon Kim
N5, Room 2239 E-mail: [email protected]
2014.01.07
Introduction
Frequent pattern and association rule mining is one of the few exceptions to emerge from machine learning
Apriori algorithm
AprioriTid algorithm
AprioriAll algorithm
FP-Tree algorithm
Notation
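Let $I = \{i_1, i_2, \dots, i_m\}$ be a set of items
TID = Transaction ID
Support of a rule $X \Rightarrow Y$: the fraction of transactions that contain $X \cup Y$
Confidence of a rule $X \Rightarrow Y$: $\mathrm{supp}(X \cup Y) / \mathrm{supp}(X)$
$L_k$: Set of large k-itemsets
$C_k$: Set of candidate k-itemsets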
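The support and confidence measures named in the notation can be illustrated with a minimal sketch. The function names and the toy transactions below are assumptions made for illustration; they do not come from the slides.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item of `itemset`."""
    itemset = frozenset(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Confidence of the rule antecedent => consequent."""
    both = frozenset(antecedent) | frozenset(consequent)
    return support(both, transactions) / support(antecedent, transactions)


# Hypothetical toy database (not from the slides).
transactions = [
    frozenset({"bread", "milk"}),
    frozenset({"bread", "butter"}),
    frozenset({"bread", "milk", "butter"}),
    frozenset({"milk"}),
]

print(support({"bread", "milk"}, transactions))       # 2 of 4 transactions
print(confidence({"bread"}, {"milk"}, transactions))  # (2/4) / (3/4)
```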
Principle
Downward closure property.
If an itemset is frequent, then all of its subsets must also be frequent.
Conversely, if an itemset is not frequent, none of its supersets can be frequent.
Apriori algorithm
Pseudo code
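The pseudo code on this slide did not survive extraction. As a stand-in, here is a minimal Python sketch of the level-wise Apriori loop (join, prune by downward closure, count with one database scan per level). The function name `apriori`, the use of an absolute support count `min_count`, and the four-transaction toy database are assumptions for illustration, and the candidates are kept in a plain set rather than a hash tree.

```python
from itertools import combinations


def apriori(transactions, min_count):
    """Return {itemset: support count} for every itemset with count >= min_count."""
    transactions = [frozenset(t) for t in transactions]

    # Pass 1: count single items to obtain the large 1-itemsets.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    large = {s: c for s, c in counts.items() if c >= min_count}
    result = dict(large)

    k = 2
    while large:
        prev = set(large)
        # Join step: unions of two large (k-1)-itemsets that have exactly k items.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step (downward closure): every (k-1)-subset must itself be large.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in prev for s in combinations(c, k - 1))
        }
        # One database scan to count the surviving candidates.
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        large = {s: c for s, c in counts.items() if c >= min_count}
        result.update(large)
        k += 1
    return result


# Hypothetical toy database (not from the slides), minimum support count 2.
db = [[1, 3, 4], [2, 3, 5], [1, 2, 3, 5], [2, 5]]
frequent = apriori(db, min_count=2)
for itemset, count in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
```

In this toy run the candidate {1, 2, 3} is never counted, because its subset {1, 2} is not large; that is the pruning the downward closure property allows.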
Example
Discussion
Too many database scans make the computation expensive.
minsup and minconf need to be specified in advance.
A hash tree is used to store the candidate itemsets.
Sometimes a trie structure is adopted instead to store the itemsets.
AprioriTid
Goal: avoid scanning the database repeatedly.
Principle
1) The database is not used at all for counting the support of candidate itemsets after the first pass.
2) The candidate itemsets are generated the same way as in the Apriori algorithm.
3) Another set is generated, each member of which holds the TID of a transaction and the large itemsets present in that transaction. This set is used to count the support of each candidate itemset.
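The three principles above can be sketched roughly as follows. This is an illustrative simplification, not the slide's own code: the name `apriori_tid`, the dictionary `tid_sets`, and the toy database are assumptions. Here each TID entry stores the candidate itemsets found in that transaction, and a candidate k-itemset is treated as present when all of its (k-1)-subsets appear in the entry from the previous pass.

```python
from itertools import combinations


def apriori_tid(transactions, min_count):
    """transactions: dict mapping TID -> iterable of items."""
    # First pass: the only time the raw database is read.
    # Each entry maps a TID to the 1-itemsets present in that transaction.
    tid_sets = {
        tid: {frozenset([i]) for i in items}
        for tid, items in transactions.items()
    }
    counts = {}
    for entry in tid_sets.values():
        for s in entry:
            counts[s] = counts.get(s, 0) + 1
    large = {s: c for s, c in counts.items() if c >= min_count}
    result = dict(large)

    k = 2
    while large:
        prev = set(large)
        # Candidate generation is the same join + prune as in Apriori.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        candidates = {
            c for c in candidates
            if all(frozenset(s) in prev for s in combinations(c, k - 1))
        }
        counts = {c: 0 for c in candidates}
        next_tid_sets = {}
        for tid, entry in tid_sets.items():
            # A candidate is in the transaction if all of its (k-1)-subsets
            # were recorded for this TID in the previous pass.
            present = {
                c for c in candidates
                if all(frozenset(s) in entry for s in combinations(c, k - 1))
            }
            for c in present:
                counts[c] += 1
            if present:  # TIDs with no candidates drop out of later passes
                next_tid_sets[tid] = present
        tid_sets = next_tid_sets
        large = {s: c for s, c in counts.items() if c >= min_count}
        result.update(large)
        k += 1
    return result


# Hypothetical toy database keyed by TID (not from the slides).
db = {100: [1, 3, 4], 200: [2, 3, 5], 300: [1, 2, 3, 5], 400: [2, 5]}
frequent = apriori_tid(db, min_count=2)
print(frequent[frozenset({2, 3, 5})])
```

Because entries with no surviving candidates are dropped, the structure scanned in later passes can become much smaller than the original database.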
AprioriTid
AprioriTid
AprioriTid
FP-Growth
To avoid scanning the database multiple times: the cost of database scans is too high!
To avoid generating lots of candidates: in the Apriori algorithm, the bottleneck is candidate generation.
How can we solve these problems?
FP-Growth
The algorithm is simple:
1. Scan the database once, find frequent 1-itemsets (single item patterns)
2. Sort the frequent items in frequency descending order to get the F-list (F-list = f-c-a-b-m-p)
3. Scan the DB again, construct the FP-tree
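The three steps can be sketched as below. The slide's transaction table was not preserved, so the five transactions used here are an assumption: a commonly used textbook database that is consistent with the F-list f-c-a-b-m-p at a minimum support count of 3. The names `FPNode`, `build_fp_tree`, and `header` are also illustrative.

```python
from collections import Counter


class FPNode:
    def __init__(self, item, parent):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}  # item -> FPNode


def build_fp_tree(transactions, min_count, order=None):
    """Return (root, header); header maps each frequent item to its tree nodes."""
    # Step 1: first scan, count items and keep the frequent ones.
    counts = Counter(item for t in transactions for item in set(t))
    frequent = {i: c for i, c in counts.items() if c >= min_count}
    # Step 2: F-list in descending frequency (ties broken alphabetically
    # unless an explicit order is supplied).
    if order is None:
        order = sorted(frequent, key=lambda i: (-frequent[i], i))
    else:
        order = [i for i in order if i in frequent]
    rank = {item: r for r, item in enumerate(order)}

    # Step 3: second scan, insert each transaction's frequent items in F-list order.
    root = FPNode(None, None)
    header = {item: [] for item in order}
    for t in transactions:
        path = sorted((i for i in set(t) if i in rank), key=rank.__getitem__)
        node = root
        for item in path:
            child = node.children.get(item)
            if child is None:
                child = FPNode(item, node)
                node.children[item] = child
                header[item].append(child)
            child.count += 1  # shared prefixes only bump the counter
            node = child
    return root, header


# Assumed example database (the slide's own table was lost in extraction).
db = [list("facdgimp"), list("abcflmo"), list("bfhjo"), list("bcksp"), list("afcelpmn")]
# The tie order f-c-a-b-m-p is taken from the slide's F-list.
root, header = build_fp_tree(db, min_count=3, order=list("fcabmp"))
print({item: [n.count for n in nodes] for item, nodes in header.items()})
```

Passing the F-list explicitly only fixes the order among items with equal counts; any consistent order produces a valid FP-tree, though the shape of the tree may differ.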
FP-Growth Algorithm
FP-Tree
Scanning the transaction with TID=100
FP-Tree
Scanning the transaction with TID=200
FP-Tree
Final FP-Tree
Mining an FP-Tree
I. Forming conditional pattern bases
II. Constructing conditional FP-trees
III. Recursively mining conditional FP-trees
Conditional pattern base
Treat each frequent item as a suffix pattern and collect the prefix paths that co-occur with it.
For example, for m:
<f, c, a> : support 2
<f, c, a, b> : support 1
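A conditional pattern base can be reproduced with a short sketch. As a shortcut, this version reads the prefixes directly from the F-list-ordered transactions instead of walking node links in the FP-tree; the aggregated result is the same. The transactions are an assumed textbook database consistent with the F-list f-c-a-b-m-p, and the function name is hypothetical.

```python
from collections import Counter


def conditional_pattern_base(transactions, flist, item):
    """Map each prefix path that precedes `item` to the number of times it occurs."""
    rank = {x: r for r, x in enumerate(flist)}
    base = Counter()
    for t in transactions:
        # Keep only frequent items, ordered as in the F-list.
        ordered = sorted((x for x in set(t) if x in rank), key=rank.__getitem__)
        if item in ordered:
            prefix = tuple(ordered[:ordered.index(item)])
            if prefix:  # an empty prefix contributes nothing
                base[prefix] += 1
    return dict(base)


# Assumed example database (the slide's own table was lost in extraction).
db = [list("facdgimp"), list("abcflmo"), list("bfhjo"), list("bcksp"), list("afcelpmn")]
flist = list("fcabmp")

print(conditional_pattern_base(db, flist, "m"))
print(conditional_pattern_base(db, flist, "p"))
```

For m this yields the two prefix paths with counts 2 and 1, in line with the example on the slide.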
Conditional pattern tree
{m}’s conditional pattern tree
Pseudo Code
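The pseudo code for this slide was not preserved. The sketch below shows the same pattern-growth recursion in simplified form: instead of a compressed conditional FP-tree, each conditional database is kept as a plain list of (prefix path, count) pairs. That substitution, the name `fp_growth`, and the example transactions are assumptions made for illustration.

```python
from collections import Counter


def fp_growth(weighted, min_count, suffix=()):
    """weighted: list of (items, count) pairs. Returns {itemset: support count}."""
    # Count items in this (conditional) database, weighting by path count.
    counts = Counter()
    for items, n in weighted:
        for i in set(items):
            counts[i] += n
    # F-list of the conditional database: descending frequency.
    flist = sorted((i for i in counts if counts[i] >= min_count),
                   key=lambda i: (-counts[i], i))
    rank = {i: r for r, i in enumerate(flist)}

    patterns = {}
    for item in flist:
        new_suffix = (item,) + suffix
        patterns[frozenset(new_suffix)] = counts[item]
        # Conditional pattern base of `item`: its prefix in every ordered path.
        base = []
        for items, n in weighted:
            ordered = sorted((i for i in set(items) if i in rank),
                             key=rank.__getitem__)
            if item in ordered:
                prefix = ordered[:ordered.index(item)]
                if prefix:
                    base.append((prefix, n))
        # Recursively mine the conditional database; no candidates are generated.
        patterns.update(fp_growth(base, min_count, new_suffix))
    return patterns


# Assumed example database (the slide's own table was lost in extraction).
db = [list("facdgimp"), list("abcflmo"), list("bfhjo"), list("bcksp"), list("afcelpmn")]
patterns = fp_growth([(t, 1) for t in db], min_count=3)
for itemset, count in sorted(patterns.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print("".join(sorted(itemset)), count)
```

A real FP-growth implementation would build a conditional FP-tree at each level so that shared prefixes are stored once; the list-based version trades that compression for brevity.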
Conclusion
In data mining, association rules are useful for analyzing and predicting customer behavior. They play an important part in shopping basket data analysis, product clustering, catalog design and store layout.
Thank you