Apriori algorithm

KAIST Knowledge Service Engineering

Data Mining Lab.

Apriori Algorithm
Jung Hoon Kim

N5, Room 2239 E-mail: [email protected]

2014.01.07

Introduction

Frequent pattern and association rule mining is one of the few exceptions to emerge from machine learning

Apriori algorithm
AprioriTid algorithm
AprioriAll algorithm
FP-Tree algorithm



Notation

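Let $I = \{i_1, i_2, \ldots, i_m\}$ be a set of items
TID = Transaction ID
Support: $\mathrm{supp}(X) = |\{T \in D : X \subseteq T\}| / |D|$, where $D$ is the set of transactions
Confidence: $\mathrm{conf}(X \Rightarrow Y) = \mathrm{supp}(X \cup Y) / \mathrm{supp}(X)$
$L_k$: Set of large k-itemsets
$C_k$: Set of candidate k-itemsets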
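Support and confidence from the notation above can be computed directly from a list of transactions. The following is a minimal sketch; the function names and the toy database `toy_db` are hypothetical and chosen only for illustration.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item of `itemset`."""
    items = set(itemset)
    return sum(1 for t in transactions if items <= set(t)) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Support of the whole rule divided by the support of its antecedent.

    Assumes the antecedent occurs in at least one transaction.
    """
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)


# Hypothetical toy database, one set of items per transaction.
toy_db = [{1, 3, 4}, {2, 3, 5}, {1, 2, 3, 5}, {2, 5}]

print(support({2, 5}, toy_db))       # share of transactions containing both 2 and 5
print(confidence({2}, {5}, toy_db))  # how often 5 appears when 2 appears
```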



Principle



Downward closure property:

If an itemset is frequent, then all of its subsets must also be frequent.

If an itemset is not frequent, none of its supersets can be frequent.
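The downward closure property is what allows candidates to be pruned before any counting. A minimal sketch of that check, with a hypothetical helper name and a hypothetical set of frequent 2-itemsets:

```python
from itertools import combinations


def has_infrequent_subset(candidate, frequent_smaller):
    """True if some (k-1)-subset of the k-itemset `candidate` is not frequent.

    By downward closure such a candidate cannot be frequent, so it can be
    discarded without scanning the database.
    """
    k = len(candidate)
    return any(
        frozenset(subset) not in frequent_smaller
        for subset in combinations(candidate, k - 1)
    )


# Hypothetical frequent 2-itemsets.
large_2 = {frozenset(p) for p in [(1, 3), (2, 3), (2, 5), (3, 5)]}

print(has_infrequent_subset(frozenset({1, 2, 3}), large_2))  # {1, 2} is missing
print(has_infrequent_subset(frozenset({2, 3, 5}), large_2))  # all subsets present
```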

Apriori algorithm

Pseudo code
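The level-wise loop of Apriori can be sketched in Python as follows. This is an illustrative version, not the pseudo code from the original slide: it uses an absolute minimum count instead of a relative minsup, generates candidates by pairwise union rather than the ordered join, and counts by plain subset tests instead of a hash tree. The names `apriori` and `toy_db` are hypothetical.

```python
from itertools import combinations


def apriori(transactions, min_count):
    """Return {frozenset: support count} for all frequent itemsets."""
    transactions = [frozenset(t) for t in transactions]

    # Pass 1: count single items to obtain the large 1-itemsets.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_count}
    frequent = dict(current)

    k = 2
    while current:
        # Join: unite pairs of large (k-1)-itemsets into k-itemsets.
        # Prune: keep a candidate only if all its (k-1)-subsets are large.
        candidates = set()
        for a, b in combinations(list(current), 2):
            union = a | b
            if len(union) == k and all(
                frozenset(s) in current for s in combinations(union, k - 1)
            ):
                candidates.add(union)

        # One database scan per level to count the surviving candidates.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        current = {s: c for s, c in counts.items() if c >= min_count}
        frequent.update(current)
        k += 1
    return frequent


# Hypothetical toy database with a minimum count of 2.
toy_db = [{1, 3, 4}, {2, 3, 5}, {1, 2, 3, 5}, {2, 5}]
result = apriori(toy_db, min_count=2)
for itemset, count in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
```

Each pass of the `while` loop corresponds to one scan of the database, which is the cost the later algorithms try to reduce.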



Example



Discussion

Repeated database scans make the computation expensive.
minsup and minconf must be specified in advance.
A hash tree is used to store the candidate itemsets.

Some implementations use a trie structure instead to store the itemsets.



AprioriTid

Goal: avoid scanning the database repeatedly.

Principle
1) The database is not used at all for counting the support of candidate itemsets after the first pass.
2) The candidate itemsets are generated the same way as in the Apriori algorithm.
3) Another set is generated, in which each member holds the TID of a transaction and the large itemsets present in that transaction. This set is used to count the support of each candidate itemset.
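The three principles above can be sketched as follows. This is a simplified illustration under stated assumptions: the per-transaction set is held in a dictionary named `tid_sets` (a hypothetical name), a candidate is considered present in a transaction when all of its (k-1)-subsets were present in the previous pass, and an absolute minimum count is used. The TIDs and items in `toy_db` are hypothetical.

```python
from itertools import combinations


def apriori_tid(transactions, min_count):
    """transactions: {tid: items}. Returns {frozenset: support count}."""
    # First (and only) pass over the raw database: for each TID, store the
    # 1-itemsets it contains.
    tid_sets = {
        tid: {frozenset([i]) for i in items} for tid, items in transactions.items()
    }
    counts = {}
    for itemsets in tid_sets.values():
        for s in itemsets:
            counts[s] = counts.get(s, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_count}
    frequent = dict(current)

    k = 2
    while current:
        # Candidate generation, same idea as in Apriori (join and prune).
        candidates = set()
        for a, b in combinations(list(current), 2):
            union = a | b
            if len(union) == k and all(
                frozenset(s) in current for s in combinations(union, k - 1)
            ):
                candidates.add(union)

        # Count support using only the per-TID sets from the previous pass.
        counts = dict.fromkeys(candidates, 0)
        next_tid_sets = {}
        for tid, itemsets in tid_sets.items():
            present = {
                c for c in candidates
                if all(frozenset(s) in itemsets for s in combinations(c, k - 1))
            }
            for c in present:
                counts[c] += 1
            if present:  # transactions with no candidates drop out
                next_tid_sets[tid] = present
        tid_sets = next_tid_sets

        current = {s: n for s, n in counts.items() if n >= min_count}
        frequent.update(current)
        k += 1
    return frequent


# Hypothetical toy database keyed by TID, minimum count of 2.
toy_db = {100: {1, 3, 4}, 200: {2, 3, 5}, 300: {1, 2, 3, 5}, 400: {2, 5}}
result = apriori_tid(toy_db, min_count=2)
print(result[frozenset({2, 3, 5})])
```

Because entries with no surviving candidates are dropped, the per-TID set tends to shrink in later passes, which is where the saving over rescanning the database comes from.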



AprioriTid



FP-Growth

To avoid scanning the database multiple times
the cost of scanning the database is too high!

To avoid generating lots of candidates
in the Apriori algorithm, the bottleneck is candidate generation

How can we solve these problems?



FP-Growth

The algorithm is simple:

1. Scan the database once, find frequent 1-itemsets (single item patterns)

2. Sort the frequent items in descending order of frequency to obtain the f-list (F-list = f-c-a-b-m-p)

3. Scan the DB again and construct the FP-tree
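The three steps above can be sketched as follows. The transaction database `db` and the minimum count of 3 are assumptions: they are not given in the text and were chosen so that the frequent items match the f-list f-c-a-b-m-p. Since several items tie on frequency, the sketch takes the f-list as an explicit input instead of deriving the tie order. The class and function names are hypothetical.

```python
from collections import Counter


class FPNode:
    """One node of the FP-tree: an item, its count, and its children."""

    def __init__(self, item, parent):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}


def frequent_item_counts(transactions, min_count):
    """Step 1: one scan to count items and keep the frequent ones."""
    counts = Counter(item for t in transactions for item in set(t))
    return {item: c for item, c in counts.items() if c >= min_count}


def build_fp_tree(transactions, f_list):
    """Step 3: second scan, inserting each transaction in f-list order."""
    rank = {item: i for i, item in enumerate(f_list)}
    root = FPNode(None, None)
    header = {item: [] for item in f_list}  # item -> nodes carrying that item
    for t in transactions:
        ordered = sorted((i for i in set(t) if i in rank), key=rank.get)
        node = root
        for item in ordered:
            child = node.children.get(item)
            if child is None:
                child = FPNode(item, node)
                node.children[item] = child
                header[item].append(child)
            child.count += 1
            node = child
    return root, header


# Assumed example database (TID 100 to 500), minimum count 3.
db = [
    list("facdgimp"),
    list("abcflmo"),
    list("bfhjo"),
    list("bcksp"),
    list("afcelpmn"),
]
counts = frequent_item_counts(db, min_count=3)
f_list = list("fcabmp")  # Step 2: order taken from the slide
root, header = build_fp_tree(db, f_list)
print(counts)
print({item: [n.count for n in nodes] for item, nodes in header.items()})
```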


FP-Growth Algorithm



FP-Tree

Scanning the transaction with TID=100



FP-Tree

Scanning the transaction with TID=200



FP-Tree

Final FP-Tree



Mining an FP-Tree

I. forming conditional pattern bases
II. constructing conditional FP-trees
III. recursively mining conditional FP-trees



Conditional pattern base

The conditional pattern base of a frequent item is the set of prefix paths that co-occur with that item as their suffix.

For example:
m : <f, c, a> : support 2
m : <f, c, a, b> : support 1
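A conditional pattern base can be illustrated without the tree by collecting, for every transaction that contains the item, the items that precede it in f-list order. This gives the same prefixes that walking the FP-tree upward from the item's nodes would give. The ordered transactions below are an assumed example consistent with the f-list f-c-a-b-m-p, and the function name is hypothetical.

```python
from collections import Counter


def conditional_pattern_base(ordered_transactions, item):
    """Map each prefix that precedes `item` to the number of times it occurs."""
    base = Counter()
    for t in ordered_transactions:
        if item in t:
            prefix = tuple(t[: t.index(item)])
            if prefix:  # an empty prefix contributes no pattern
                base[prefix] += 1
    return dict(base)


# Assumed transactions, already filtered to frequent items and sorted in
# f-list order (f-c-a-b-m-p).
ordered_db = [
    list("fcamp"),
    list("fcabm"),
    list("fb"),
    list("cbp"),
    list("fcamp"),
]

print(conditional_pattern_base(ordered_db, "m"))
print(conditional_pattern_base(ordered_db, "p"))
```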



Conditional pattern tree

{m}’s conditional pattern tree
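To illustrate how a conditional tree is derived from a conditional pattern base, the sketch below counts each item over the base, drops the items that fall below the minimum count, and merges the remaining paths. The minimum count of 3 is an assumption, the function name is hypothetical, and the result is represented as a dictionary of merged paths rather than as tree nodes.

```python
from collections import Counter


def conditional_fp_paths(pattern_base, min_count):
    """Return the merged paths of the conditional FP-tree as {path: count}."""
    # Count every item, weighted by the support of the prefix it occurs in.
    counts = Counter()
    for prefix, n in pattern_base.items():
        for item in prefix:
            counts[item] += n

    # Remove infrequent items from each prefix and merge identical paths.
    paths = Counter()
    for prefix, n in pattern_base.items():
        path = tuple(i for i in prefix if counts[i] >= min_count)
        if path:
            paths[path] += n
    return dict(paths)


# Conditional pattern base of m, as listed on the previous slide.
m_base = {("f", "c", "a"): 2, ("f", "c", "a", "b"): 1}
m_paths = conditional_fp_paths(m_base, min_count=3)
print(m_paths)
```

With these inputs, b occurs only once in the base and is dropped, so the two prefixes collapse into a single path.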



Pseudo Code
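The recursive mining step can be sketched in Python as below. This is a simplification, not the pseudo code from the original slide: instead of building a compressed FP-tree at each level, it keeps every conditional pattern base as a plain list of (items, count) pairs and recurses on those. The pattern-growth logic is the same, but the memory savings of the tree are not reproduced. The database `db` and the minimum count of 3 are assumptions, and the function name is hypothetical.

```python
from collections import Counter


def fp_growth(pattern_base, min_count, suffix=()):
    """pattern_base: list of (items, count). Returns {frozenset: support}."""
    # Count items in this (conditional) base and keep the frequent ones.
    counts = Counter()
    for items, n in pattern_base:
        for item in set(items):
            counts[item] += n
    frequent = {i: c for i, c in counts.items() if c >= min_count}

    # Order every entry by descending frequency (ties broken by item name).
    ranked = sorted(frequent, key=lambda i: (-frequent[i], i))
    rank = {item: r for r, item in enumerate(ranked)}
    ordered = [
        (sorted((i for i in set(items) if i in rank), key=rank.get), n)
        for items, n in pattern_base
    ]

    result = {}
    for item, c in frequent.items():
        new_suffix = suffix + (item,)
        result[frozenset(new_suffix)] = c
        # Conditional pattern base of `item`: the prefixes that precede it.
        cond_base = []
        for items, n in ordered:
            if item in items:
                prefix = items[: items.index(item)]
                if prefix:
                    cond_base.append((tuple(prefix), n))
        # Recursively mine the conditional base with the extended suffix.
        result.update(fp_growth(cond_base, min_count, new_suffix))
    return result


# Assumed example database, minimum count 3.
db = [
    list("facdgimp"),
    list("abcflmo"),
    list("bfhjo"),
    list("bcksp"),
    list("afcelpmn"),
]
patterns = fp_growth([(tuple(t), 1) for t in db], min_count=3)
print(len(patterns))
print(patterns[frozenset("fcam")])
```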



Conclusion

In data mining, association rules are useful for analyzing and predicting customer behavior. They play an important part in shopping basket data analysis, product clustering, catalog design and store layout.



Thank you
