Apriori algorithm

KAIST Knowledge Service Engineering, Data Mining Lab.
Apriori Algorithm
Jung Hoon Kim, N5, Room 2239, E-mail: [email protected]
2014.01.07

Transcript of Apriori algorithm

Page 1: Apriori algorithm


Page 2: Apriori algorithm

Introduction

Frequent pattern and association rule mining is one of the few exceptions among data mining techniques: it did not emerge from machine learning.

- Apriori algorithm
- AprioriTid algorithm
- AprioriAll algorithm
- FP-Tree algorithm


Page 3: Apriori algorithm

Notation

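- $I = \{i_1, i_2, \dots, i_m\}$: a set of items; each transaction $T \subseteq I$
- TID: transaction ID, the unique identifier of each transaction
- Support: $\mathrm{sup}(X)$ is the fraction of transactions that contain itemset $X$; the support of a rule $X \Rightarrow Y$ is $\mathrm{sup}(X \cup Y)$
- Confidence: $\mathrm{conf}(X \Rightarrow Y) = \mathrm{sup}(X \cup Y) / \mathrm{sup}(X)$
- $L_k$: set of large (frequent) k-itemsets
- $C_k$: set of candidate k-itemsets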
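A minimal sketch of the two measures, assuming transactions are Python frozensets and support is the fraction of transactions containing an itemset. The function names `support` and `confidence` and the toy basket data are hypothetical.

```python
from typing import FrozenSet, List

def support(itemset: FrozenSet[str], db: List[FrozenSet[str]]) -> float:
    """Fraction of transactions that contain every item of `itemset`."""
    return sum(1 for t in db if itemset <= t) / len(db)

def confidence(x: FrozenSet[str], y: FrozenSet[str], db: List[FrozenSet[str]]) -> float:
    """Confidence of the rule X => Y, i.e. support(X | Y) / support(X)."""
    return support(x | y, db) / support(x, db)

# Hypothetical toy database of market baskets.
db = [frozenset(t) for t in (
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers", "beer"},
    {"bread", "milk", "diapers"},
)]

print(support(frozenset({"bread", "milk"}), db))                    # 0.5
print(confidence(frozenset({"diapers"}), frozenset({"beer"}), db))  # 0.666... (= 2/3)
```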


Page 4: Apriori algorithm

Principle


Downward closure property: if an itemset is frequent, then all of its subsets must also be frequent.

Equivalently, if an itemset is not frequent, none of its supersets can be frequent.
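This property is what lets Apriori discard a candidate before counting it. A minimal sketch of that check, assuming itemsets are frozensets and `frequent_prev` holds the frequent (k-1)-itemsets; the function name and example sets are hypothetical.

```python
from itertools import combinations
from typing import FrozenSet, Set

def has_infrequent_subset(candidate: FrozenSet[int], frequent_prev: Set[FrozenSet[int]]) -> bool:
    """True if some (k-1)-subset of the k-itemset `candidate` is not frequent.

    By downward closure such a candidate cannot be frequent, so it can be
    pruned without scanning the database.
    """
    k = len(candidate)
    return any(frozenset(sub) not in frequent_prev
               for sub in combinations(candidate, k - 1))

# Hypothetical frequent 2-itemsets.
L2 = {frozenset(s) for s in ({1, 3}, {2, 3}, {2, 5}, {3, 5})}
print(has_infrequent_subset(frozenset({2, 3, 5}), L2))  # False: keep the candidate
print(has_infrequent_subset(frozenset({1, 2, 3}), L2))  # True: {1, 2} is not frequent
```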

Page 5: Apriori algorithm

Apriori algorithm

Pseudo code
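A minimal Python sketch of the level-wise Apriori loop: candidate generation by join and prune, then one database pass per level to count support. It assumes integer items, frozenset itemsets, and minsup given as an absolute count. The names `apriori_gen` and `apriori` follow common usage but are assumptions here, as is the toy database.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, FrozenSet, List, Set

Itemset = FrozenSet[int]

def apriori_gen(prev: Set[Itemset], k: int) -> Set[Itemset]:
    """Candidate k-itemsets: join L_{k-1} with itself, then prune by downward closure."""
    rows = [tuple(sorted(s)) for s in prev]
    candidates: Set[Itemset] = set()
    for a in rows:
        for b in rows:
            # Join: same first k-2 items, and a's last item precedes b's last item.
            if a[:-1] == b[:-1] and a[-1] < b[-1]:
                c = frozenset(a + (b[-1],))
                # Prune: every (k-1)-subset must itself be large.
                if all(frozenset(s) in prev for s in combinations(c, k - 1)):
                    candidates.add(c)
    return candidates

def apriori(db: List[Set[int]], min_count: int) -> Dict[Itemset, int]:
    """All large itemsets with their support counts (minsup as an absolute count)."""
    counts = Counter(item for t in db for item in t)
    large = {frozenset([i]): n for i, n in counts.items() if n >= min_count}  # L_1
    result = dict(large)
    k = 2
    while large:
        candidates = apriori_gen(set(large), k)                              # C_k
        tally: Counter = Counter()
        for t in db:                      # one full database pass per level
            for c in candidates:
                if c <= t:
                    tally[c] += 1
        large = {c: n for c, n in tally.items() if n >= min_count}          # L_k
        result.update(large)
        k += 1
    return result

# Hypothetical toy database and minsup count.
db = [{1, 3, 4}, {2, 3, 5}, {1, 2, 3, 5}, {2, 5}]
freq = apriori(db, min_count=2)
print(sorted(tuple(sorted(s)) for s in freq))
# [(1,), (1, 3), (2,), (2, 3), (2, 3, 5), (2, 5), (3,), (3, 5), (5,)]
```

Each iteration of the `while` loop scans the whole database once, which is the cost the later AprioriTid and FP-Growth variants try to avoid.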


Page 6: Apriori algorithm

Example


Page 7: Apriori algorithm

Discussion

- Too many database scans make the computation expensive.
- minsup and minconf must be specified in advance.
- A hash tree is used to store the candidate itemsets.
- Some implementations use a trie structure to store the itemsets instead.
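Of the two structures mentioned, the sketch below shows the trie (prefix tree) variant, not the hash tree. It assumes items are comparable and each candidate is stored as a path of sorted items; `CandidateTrie` and its methods are hypothetical names. Each transaction is walked through the trie once, so every stored candidate it contains is counted without testing candidates one by one.

```python
from typing import Dict, Iterable, List, Optional, Tuple

class TrieNode:
    def __init__(self) -> None:
        self.children: Dict[int, "TrieNode"] = {}
        self.count: Optional[int] = None   # set (to 0) only where a candidate ends

class CandidateTrie:
    """Prefix tree that stores each candidate itemset as a path of sorted items."""

    def __init__(self, candidates: Iterable[Iterable[int]]) -> None:
        self.root = TrieNode()
        for c in candidates:
            node = self.root
            for item in sorted(c):
                node = node.children.setdefault(item, TrieNode())
            node.count = 0

    def count_transaction(self, transaction: Iterable[int]) -> None:
        """Add 1 to every stored candidate that is a subset of `transaction`."""
        self._walk(self.root, sorted(set(transaction)), 0)

    def _walk(self, node: TrieNode, items: List[int], start: int) -> None:
        if node.count is not None:
            node.count += 1
        # Only follow children whose item occurs later in the sorted transaction.
        for i in range(start, len(items)):
            child = node.children.get(items[i])
            if child is not None:
                self._walk(child, items, i + 1)

    def counts(self) -> Dict[Tuple[int, ...], int]:
        """Map each stored candidate (as a sorted tuple) to its current count."""
        out: Dict[Tuple[int, ...], int] = {}
        stack: List[Tuple[TrieNode, Tuple[int, ...]]] = [(self.root, ())]
        while stack:
            node, path = stack.pop()
            if node.count is not None:
                out[path] = node.count
            for item, child in node.children.items():
                stack.append((child, path + (item,)))
        return out

# Hypothetical candidate 2-itemsets and transactions.
trie = CandidateTrie([{1, 2}, {1, 3}, {1, 5}, {2, 3}, {2, 5}, {3, 5}])
for t in [{1, 3, 4}, {2, 3, 5}, {1, 2, 3, 5}, {2, 5}]:
    trie.count_transaction(t)
print(trie.counts()[(2, 5)])  # 3
```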


Page 8: Apriori algorithm

AprioriTid

Goal: avoid scanning the database repeatedly.

Principle
1) After the first pass, the database is not used at all for counting the support of candidate itemsets.
2) The candidate itemsets are generated the same way as in the Apriori algorithm.
3) Another set, $\bar{C}_k$, is generated; each member holds the TID of a transaction together with the potentially large (candidate) k-itemsets present in that transaction. This set is used to count the support of each candidate itemset.
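A minimal sketch of AprioriTid, assuming integer items, frozenset itemsets, and minsup as an absolute count; the function names and the toy database are hypothetical. The raw transactions are read only to count 1-itemsets and build $\bar{C}_1$. Afterwards a candidate is counted for an entry of $\bar{C}_{k-1}$ when the two (k-1)-subsets obtained by dropping one of its last two items are both listed in that entry.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, FrozenSet, List, Set, Tuple

Itemset = FrozenSet[int]

def apriori_gen(prev: Set[Itemset], k: int) -> Set[Itemset]:
    """Candidate generation as in Apriori: join L_{k-1} with itself, then prune."""
    rows = [tuple(sorted(s)) for s in prev]
    out: Set[Itemset] = set()
    for a in rows:
        for b in rows:
            if a[:-1] == b[:-1] and a[-1] < b[-1]:
                c = frozenset(a + (b[-1],))
                if all(frozenset(s) in prev for s in combinations(c, k - 1)):
                    out.add(c)
    return out

def apriori_tid(db: Dict[int, Set[int]], min_count: int) -> Dict[Itemset, int]:
    # The only pass over the raw database: count 1-itemsets and build C-bar_1.
    counts = Counter(i for items in db.values() for i in items)
    large = {frozenset([i]): n for i, n in counts.items() if n >= min_count}
    result = dict(large)
    c_bar: List[Tuple[int, Set[Itemset]]] = [
        (tid, {frozenset([i]) for i in items}) for tid, items in db.items()]
    k = 2
    while large:
        candidates = apriori_gen(set(large), k)
        tally: Counter = Counter()
        next_bar: List[Tuple[int, Set[Itemset]]] = []
        for tid, present in c_bar:
            c_t = set()
            for c in candidates:
                s = sorted(c)
                # c is in this transaction exactly when c minus its last item and
                # c minus its second-to-last item are both in the entry.
                if frozenset(s[:-1]) in present and frozenset(s[:-2] + s[-1:]) in present:
                    c_t.add(c)
            tally.update(c_t)
            if c_t:                       # transactions with no candidates drop out
                next_bar.append((tid, c_t))
        c_bar = next_bar                  # C-bar_k replaces the database from now on
        large = {c: n for c, n in tally.items() if n >= min_count}
        result.update(large)
        k += 1
    return result

# Hypothetical toy database keyed by TID.
db = {100: {1, 3, 4}, 200: {2, 3, 5}, 300: {1, 2, 3, 5}, 400: {2, 5}}
freq = apriori_tid(db, min_count=2)
print(freq[frozenset({2, 3, 5})])  # 2
```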


Page 9: Apriori algorithm

AprioriTid


Page 10: Apriori algorithm

AprioriTid


Page 11: Apriori algorithm

AprioriTid


Page 12: Apriori algorithm

FP-Growth

- Avoid multiple database scans: scanning the database is too costly.
- Avoid generating large numbers of candidates: in the Apriori algorithm, candidate generation is the bottleneck.

How can these problems be solved?


Page 13: Apriori algorithm

FP-Growth

The algorithm is simple:

1. Scan the database once and find the frequent 1-itemsets (single-item patterns).

2. Sort the frequent items in descending order of frequency to obtain the f-list (here, F-list = f-c-a-b-m-p).

3. Scan the database again and construct the FP-tree.
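A minimal sketch of steps 1-3, assuming transactions are lists of single-character item names; `FPNode` and `build_fp_tree` are hypothetical names. The header table keeps, for each frequent item, the list of tree nodes carrying it (the node-links used later for mining). Ties in frequency are broken alphabetically here, which is an assumption: the slides order f before c, and the order of tied items does not change which patterns are eventually found.

```python
from collections import Counter
from typing import Dict, List, Optional

class FPNode:
    def __init__(self, item: Optional[str], parent: Optional["FPNode"]) -> None:
        self.item = item
        self.parent = parent
        self.count = 0
        self.children: Dict[str, "FPNode"] = {}

def build_fp_tree(db: List[List[str]], min_count: int):
    """Return (root, header table of node-links, f-list) after two database scans."""
    # Scan 1: count items and keep the frequent ones, most frequent first.
    counts = Counter(item for t in db for item in set(t))
    f_list = sorted((i for i, n in counts.items() if n >= min_count),
                    key=lambda i: (-counts[i], i))      # ties: alphabetical (assumed)
    rank = {item: r for r, item in enumerate(f_list)}
    root = FPNode(None, None)
    header: Dict[str, List[FPNode]] = {item: [] for item in f_list}
    # Scan 2: insert each transaction's frequent items in f-list order,
    # sharing the prefix with earlier transactions where possible.
    for t in db:
        node = root
        for item in sorted((i for i in set(t) if i in rank), key=rank.__getitem__):
            child = node.children.get(item)
            if child is None:
                child = FPNode(item, node)
                node.children[item] = child
                header[item].append(child)   # node-link for this item
            child.count += 1
            node = child
    return root, header, f_list

# Hypothetical toy database whose frequent items (min count 3) are f, c, a, b, m, p.
db = [list("facdgimp"), list("abcflmo"), list("bfhjo"), list("bcksp"), list("afcelpmn")]
root, header, f_list = build_fp_tree(db, min_count=3)
print(f_list)                                           # ['c', 'f', 'a', 'b', 'm', 'p']
print({i: n.count for i, n in root.children.items()})  # {'c': 4, 'f': 1}
```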


Page 14: Apriori algorithm

FP-Growth Algorithm


Page 15: Apriori algorithm

FP-Tree

Scanning the transaction with TID=100


Page 16: Apriori algorithm

FP-Tree

Scanning the transaction with TID=200


Page 17: Apriori algorithm

FP-Tree

Final FP-Tree


Page 18: Apriori algorithm

Mining an FP-Tree

I. Forming conditional pattern bases
II. Constructing conditional FP-trees
III. Recursively mining the conditional FP-trees
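The three steps can be sketched as one recursive function. This is a minimal FP-growth sketch without the single-path shortcut, assuming transactions are lists of item names and minsup is an absolute count; every name here is hypothetical. The tree is built from weighted itemsets so that the same helper builds both the initial FP-tree (weight 1 per transaction) and each conditional FP-tree (weights from the conditional pattern base).

```python
from collections import Counter
from typing import Dict, FrozenSet, List, Optional, Tuple

class FPNode:
    def __init__(self, item: Optional[str], parent: Optional["FPNode"]) -> None:
        self.item, self.parent, self.count = item, parent, 0
        self.children: Dict[str, "FPNode"] = {}

Weighted = List[Tuple[List[str], int]]   # (items, count) pairs

def build_tree(weighted: Weighted, min_count: int):
    """Build an (optionally conditional) FP-tree; return its header table and item counts."""
    counts: Counter = Counter()
    for items, n in weighted:
        for i in set(items):
            counts[i] += n
    order = {i: (-c, i) for i, c in counts.items() if c >= min_count}  # f-list order
    root = FPNode(None, None)
    header: Dict[str, List[FPNode]] = {i: [] for i in order}
    for items, n in weighted:
        node = root
        for i in sorted((x for x in set(items) if x in order), key=order.__getitem__):
            if i not in node.children:
                node.children[i] = FPNode(i, node)
                header[i].append(node.children[i])
            node = node.children[i]
            node.count += n
    return header, counts

def fp_growth(weighted: Weighted, min_count: int,
              suffix: FrozenSet[str] = frozenset(),
              out: Optional[Dict[FrozenSet[str], int]] = None) -> Dict[FrozenSet[str], int]:
    out = {} if out is None else out
    header, counts = build_tree(weighted, min_count)
    for item in header:
        pattern = suffix | {item}
        out[pattern] = counts[item]
        # I. Conditional pattern base: the prefix path above every node holding `item`.
        base: Weighted = []
        for node in header[item]:
            path, p = [], node.parent
            while p.item is not None:        # walk up to (but not including) the root
                path.append(p.item)
                p = p.parent
            if path:
                base.append((path, node.count))
        # II + III. Build the conditional FP-tree from the base and mine it recursively.
        if base:
            fp_growth(base, min_count, pattern, out)
    return out

# Hypothetical toy database (one string of single-letter items per transaction).
db = ["facdgimp", "abcflmo", "bfhjo", "bcksp", "afcelpmn"]
patterns = fp_growth([(list(t), 1) for t in db], min_count=3)
print(len(patterns), patterns[frozenset("fcam")])  # 18 3
```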


Page 19: Apriori algorithm

Conditional pattern base

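The conditional pattern base of a frequent item (taken as the suffix pattern) is the set of prefix paths in the FP-tree that co-occur with it.

For example, the conditional pattern base of m is:
m : <f, c, a> : support 2
m : <f, c, a, b> : support 1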
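A minimal sketch of forming a conditional pattern base. It assumes the transactions have already been reduced to their frequent items in f-c-a-b-m-p order; the node class, function names, the five ordered transactions, and the minimum support count of 3 are assumptions for illustration. The base is collected by following the item's node-links and walking parent pointers up to the root, weighting each prefix path by the count of the item's node. With these ordered transactions it reproduces the pattern base of m given on this slide, and counting items in that base shows why b drops out of m's conditional FP-tree.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Optional, Tuple

class FPNode:
    def __init__(self, item: Optional[str], parent: Optional["FPNode"]) -> None:
        self.item, self.parent, self.count = item, parent, 0
        self.children: Dict[str, "FPNode"] = {}

def insert_ordered(root: FPNode, header: Dict[str, List[FPNode]], items: List[str]) -> None:
    """Insert one transaction whose frequent items are already in f-list order."""
    node = root
    for item in items:
        if item not in node.children:
            node.children[item] = FPNode(item, node)
            header[item].append(node.children[item])   # node-link
        node = node.children[item]
        node.count += 1

def conditional_pattern_base(header: Dict[str, List[FPNode]], item: str) -> Dict[Tuple[str, ...], int]:
    """Prefix paths that co-occur with `item`, each weighted by that node's count."""
    base: Dict[Tuple[str, ...], int] = {}
    for node in header[item]:
        path: List[str] = []
        parent = node.parent
        while parent is not None and parent.item is not None:   # stop at the root
            path.append(parent.item)
            parent = parent.parent
        if path:
            key = tuple(reversed(path))
            base[key] = base.get(key, 0) + node.count
    return base

# Assumed input: five transactions already reduced to frequent items in f-c-a-b-m-p order.
ordered = [list("fcamp"), list("fcabm"), list("fb"), list("cbp"), list("fcamp")]
root = FPNode(None, None)
header: Dict[str, List[FPNode]] = defaultdict(list)
for t in ordered:
    insert_ordered(root, header, t)

base = conditional_pattern_base(header, "m")
print(base)  # {('f', 'c', 'a'): 2, ('f', 'c', 'a', 'b'): 1}

# Items kept in m's conditional FP-tree: count in the base >= 3 (assumed min count).
item_counts: Counter = Counter()
for path, n in base.items():
    for i in path:
        item_counts[i] += n
print({i: n for i, n in item_counts.items() if n >= 3})  # {'f': 3, 'c': 3, 'a': 3}
```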


Page 20: Apriori algorithm

Conditional pattern tree

{m}’s conditional pattern tree


Page 21: Apriori algorithm

Pseudo Code


Page 22: Apriori algorithm

Conclusion

In data mining, association rules are useful for analyzing and predicting customer behavior. They play an important part in shopping basket data analysis, product clustering, catalog design and store layout.


Page 23: Apriori algorithm


Thank you