Top Down FP-Growth for Association Rule Mining


Transcript of Top Down FP-Growth for Association Rule Mining

1

Top Down FP-Growth for Association Rule Mining

By Ke Wang

2

Introduction

• Classically, for a rule A → B:
  – support: computed as count(AB)
    • frequent: if it passes the minimum support threshold
  – confidence: computed as count(AB) / count(A)
    • confident: if it passes the minimum confidence threshold
• How to mine association rules?
  – find all frequent patterns
  – generate rules from the frequent patterns

3

Introduction

• Limitations of current research
  – use a uniform minimum support threshold
  – use only support as the pruning measure
• Our contribution
  – improve efficiency
  – adopt multiple minimum supports
  – introduce confidence pruning

4

Related work -- Frequent pattern mining

• Apriori algorithm
  – method: use the anti-monotone property of support for pruning, i.e.
    • if a length-k pattern is infrequent, its length-(k+1) super-patterns can never be frequent
• FP-growth algorithm -- better than Apriori
  – method:
    • build an FP-tree to store the database
    • mine the FP-tree in bottom-up order
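The anti-monotone pruning step can be illustrated with a brute-force Apriori sketch (not the authors' implementation; the function name and the set-based representation are illustrative):

```python
from itertools import combinations

def apriori(transactions, minsup):
    """Brute-force Apriori: grow frequent itemsets level by level,
    pruning any candidate that has an infrequent subset (anti-monotone)."""
    items = sorted({i for t in transactions for i in t})
    # Level 1: frequent single items
    freq = {frozenset([i]) for i in items
            if sum(i in t for t in transactions) >= minsup}
    result = set(freq)
    k = 2
    while freq:
        # Join level-(k-1) sets into length-k candidates, then drop any
        # candidate containing an infrequent (k-1)-subset.
        cands = {a | b for a in freq for b in freq if len(a | b) == k}
        cands = {c for c in cands
                 if all(frozenset(s) in freq for s in combinations(c, k - 1))}
        freq = {c for c in cands
                if sum(c <= t for t in transactions) >= minsup}
        result |= freq
        k += 1
    return result

# The example database used later in these slides
db = [{'b', 'e'}, {'a', 'b', 'c', 'e'}, {'b', 'c', 'e'}, {'a', 'c', 'd'}, {'a'}]
patterns = apriori(db, minsup=2)
```

On this database the sketch yields the same nine frequent patterns listed on the completeness slide.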

5

Related work -- Association rule mining

• Fast algorithms that try to guarantee completeness of the frequent patterns
• Parallel algorithms & association-rule-based query languages
• Various association rule mining problems
  – multi-level, multi-dimensional rules
  – constraints on specific items

6

TD-FP-Growth for frequent pattern mining

• Similar tree structure to FP-growth
  – compressed tree that stores the database
  – nodes on each path of the tree are globally ordered
• Different mining method vs. FP-growth
  – FP-growth: bottom-up tree mining
  – TD-FP-Growth: top-down tree mining

7

TD-FP-Growth for frequent pattern mining

Construct an FP-tree:

Database: (b, e), (a, b, c, e), (b, c, e), (a, c, d), (a); minsup = 2

[Figure: the FP-tree built from this database (root with branches such as a: 3 and b: 2) and header table H, with one entry per frequent item -- a: 3, b: 3, c: 3, e: 3 -- each holding entry value, count, and side-link]

8

TD-FP-Growth for frequent pattern mining

FP-growth: bottom-up mining

Database: (b, e), (a, b, c, e), (b, c, e), (a, c, d), (a); minsup = 2

Mining order: e, c, b, a

e's conditional pattern base: (b: 1), (b: 1, c: 1), (a: 1, b: 1, c: 1)

[Figure: the FP-tree and header table H from the previous slide, each entry's head of node-link pointing to that item's nodes in the tree]

9

TD-FP-Growth for frequent pattern mining

• FP-growth: bottom-up mining

e's conditional pattern base: (b: 1), (b: 1, c: 1), (a: 1, b: 1, c: 1)

[Figure: e's conditional FP-tree -- root with path b: 3, c: 2 -- and its header table with node-links for items b and c]

Drawback!
• both e's conditional pattern base and conditional FP-tree are stored in memory
• e's conditional FP-tree is mined recursively
• conditional pattern bases and FP-trees are built for all other items and their super-patterns
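How a conditional pattern base is derived can be sketched directly from the example database (a simplified sketch, not FP-growth itself: it scans transactions instead of walking node-links, and the function name is illustrative):

```python
# Global frequency order (a, b, c, e); d has count 1 < minsup and is dropped.
ORDER = ['a', 'b', 'c', 'e']

db = [['b', 'e'], ['a', 'b', 'c', 'e'], ['b', 'c', 'e'], ['a', 'c', 'd'], ['a']]

def conditional_pattern_base(db, item):
    """Collect the prefix path preceding `item` in each transaction,
    after sorting the transaction in the global order and dropping
    infrequent items; each prefix carries count 1 here."""
    base = []
    for t in db:
        path = [i for i in ORDER if i in t]
        if item in path:
            prefix = tuple(path[:path.index(item)])
            if prefix:
                base.append(prefix)
    return base

# e's conditional pattern base: (b: 1), (a, b, c: 1), (b, c: 1)
e_base = conditional_pattern_base(db, 'e')
```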

10

TD-FP-Growth for frequent pattern mining

• TD-FP-Growth: adopt a top-down mining strategy
  – motivation: avoid building the extra databases and sub-trees that FP-growth builds
  – method: process nodes on an upper level before those on a lower level
  – result: modifications to upper-level nodes never affect lower-level nodes

See example

11

TD-FP-Growth for frequent pattern mining

Database: (b, e), (a, b, c, e), (b, c, e), (a, c, d), (a); minsup = 2

Mining order: a, b, c, e

[Figure: CT-tree and header table H -- the same tree as before (root with branches a: 3 and b: 2), with header entries a: 3, b: 3, c: 3, e: 3, each holding entry value, count, and side-link]

12

CT-tree for frequent pattern mining

Database: (b, e), (a, b, c, e), (b, c, e), (a, c, d), (a); minsup = 2

[Figure: CT-tree and header table H (entries a: 3, b: 3, c: 3, e: 3 with side-links), together with the sub-header-table H_c built while mining item c, holding entries a: 2 and b: 2]

13

CT-tree for frequent pattern mining

• Completeness
  – for entry i in H, we mine all the frequent patterns that end with item i, no more and no less
• Complete set of frequent patterns:
  {a}
  {b}
  {c}, {b, c}, {a, c}
  {e}, {b, e}, {c, e}, {b, c, e}
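This complete set can be verified by exhaustive support counting over the example database (a brute-force check, not part of the algorithm):

```python
from itertools import combinations

db = [{'b', 'e'}, {'a', 'b', 'c', 'e'}, {'b', 'c', 'e'}, {'a', 'c', 'd'}, {'a'}]
minsup = 2
items = {'a', 'b', 'c', 'd', 'e'}

# Count support of every candidate itemset and keep the frequent ones
frequent = {frozenset(s)
            for k in range(1, len(items) + 1)
            for s in combinations(sorted(items), k)
            if sum(set(s) <= t for t in db) >= minsup}

expected = [{'a'}, {'b'}, {'c'}, {'b', 'c'}, {'a', 'c'},
            {'e'}, {'b', 'e'}, {'c', 'e'}, {'b', 'c', 'e'}]
assert frequent == {frozenset(s) for s in expected}
```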

14

TD-FP-Growth for frequent pattern mining

• Compared to FP-growth, TD-FP-Growth is:
  – space-saving:
    • only one tree and a few header tables
    • no extra databases or sub-trees
  – time-saving:
    • does not build extra databases and sub-trees
    • walks up each path only once to update count information for nodes on the tree and build sub-header-tables

15

TD-FP-Growth for association rule mining

• Assumptions:
  – there is a class attribute in the database
  – items in the class attribute are called class items; the others are non-class items
  – each transaction is associated with one class item
  – only a class item appears on the right-hand side of a rule

Transaction ID   non-class-attribute   class-attribute
1                a, b…                 C1
2                d…                    C2
3                e, d, f…              C3
…                …                     …

example rule: a, b → Ci

16

TD-FP-Growth for association rule mining -- multiple minimum supports

• Why?
  – with a uniform minimum support, the count computation considers only the number of appearances
  – a uniform minimum support is unfair to items that appear less often but are worth more
    • e.g. responder vs. non-responder
• How?
  – use a different support threshold for each class

17

TD-FP-Growth for association rule mining -- multiple minimum supports

• multiple vs. uniform
  – class sizes: C1: 4, C2: 2
  – rules with relative minsup = 50%, proportional to each class (the multiplier in the performance study)
  – uniform minimum support: absolute minsup = 1
    • 11-node tree, 23 rules
  – multiple minimum supports: absolute minsup1 = 2, absolute minsup2 = 1
    • 7-node tree, 9 rules
    • more effective and space-saving
    • time-saving --- shown in the performance study

Database:
c, f, C1
b, e, C2
b, e, f, C1
a, c, f, C1
c, e, C2
b, c, d, C1
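The per-class absolute thresholds in this example follow from the relative minsup (a sketch; rounding by ceiling is an assumption consistent with the slide's numbers):

```python
import math

# Class-labeled transactions from the slide
db = [(['c', 'f'], 'C1'), (['b', 'e'], 'C2'), (['b', 'e', 'f'], 'C1'),
      (['a', 'c', 'f'], 'C1'), (['c', 'e'], 'C2'), (['b', 'c', 'd'], 'C1')]

rel_minsup = 0.5  # 50%, proportional to each class

# Count transactions per class
class_counts = {}
for _, c in db:
    class_counts[c] = class_counts.get(c, 0) + 1

# Per-class absolute threshold: ceil(rel_minsup * class size)
# C1 has 4 transactions -> minsup1 = 2;  C2 has 2 -> minsup2 = 1
abs_minsup = {c: math.ceil(rel_minsup * n) for c, n in class_counts.items()}
```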

18

TD-FP-Growth for association rule mining -- confidence pruning

• Motivation
  – make use of the other constraint on association rules, confidence, to speed up mining
• Method
  – confidence itself is not anti-monotone
  – introduce an acting constraint of confidence, which is anti-monotone
  – push it inside the mining process

19

TD-FP-Growth for association rule mining -- confidence pruning

conf(A → B) = count(AB) / count(A) >= minconf
⇒ count(AB) >= count(A) * minconf
⇒ count(AB) >= minsup * minconf   (anti-monotone & weaker, since count(A) >= minsup for frequent A)

--- the acting constraint of confidence for the original confidence constraint of rule A → B

• the support of a rule is computed as count(A)
• count(AB): the class-count of itemset A related to class B
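The acting constraint reduces to a simple per-class test, sketched below (function name and dict representation are illustrative):

```python
def passes_acting_constraint(class_counts, minsup, minconf):
    """Necessary condition for itemset A to still yield some confident
    rule A -> B: at least one class-count count(AB) must reach
    minsup * minconf. Weaker than the real confidence test, but
    anti-monotone, so it can prune during mining."""
    return any(c >= minsup * minconf for c in class_counts.values())

# Item e from the next slide's example: count(e, C1) = 1, count(e, C2) = 1.
# Both fall below minsup * minconf = 2 * 0.6 = 1.2, so mining for e stops.
e_alive = passes_acting_constraint({'C1': 1, 'C2': 1}, minsup=2, minconf=0.6)

# Item f (count(f, C1) = 3) survives the check.
f_alive = passes_acting_constraint({'C1': 3, 'C2': 0}, minsup=2, minconf=0.6)
```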

20

TD-FP-Growth for association rule mining -- confidence pruning

Database (minsup = 2, minconf = 60%):
c, f, C1
b, e, C2
b, e, f, C1
a, c, f, C1
a, c, d, C2

Header table H: count(i) = count(i, C1) + count(i, C2)

Entry value i   count(i)   count(i, C1)   count(i, C2)   side-link
a               2          1              1              …
b               2          1              1              …
c               3          2              1              …
e               2          1              1              …
f               3          3              0              …

count(e) >= minsup; however, both count(e, C1) and count(e, C2) < minsup * minconf:

terminate mining for e!

If there were no confidence pruning, sub-header-table H_e would be built:

Entry value i   count(i)   count(i, C1)   count(i, C2)   side-link
b               2          1              1              …
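The class-count header table above can be reproduced from the example transactions (a sketch; the slide illustrates the acting constraint on item e, and the same check here also rules out a and b, which have the same class-counts as e):

```python
from collections import defaultdict

# Class-labeled transactions from the slide; minsup = 2, minconf = 60%
db = [(['c', 'f'], 'C1'), (['b', 'e'], 'C2'), (['b', 'e', 'f'], 'C1'),
      (['a', 'c', 'f'], 'C1'), (['a', 'c', 'd'], 'C2')]
minsup, minconf = 2, 0.6

# One header-table entry per item, holding count(i, C1) and count(i, C2)
header = defaultdict(lambda: {'C1': 0, 'C2': 0})
for items, cls in db:
    for i in items:
        header[i][cls] += 1

# count(i) = count(i, C1) + count(i, C2); the acting constraint keeps an
# item only if some class-count reaches minsup * minconf (= 1.2 here)
survivors = {i for i, cc in header.items()
             if sum(cc.values()) >= minsup
             and any(v >= minsup * minconf for v in cc.values())}
```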

21

Performance

• Several data sets chosen from the UC Irvine Machine Learning Repository: http://www.ics.uci.edu/~mlearn/MLRepository.html

name of dataset   # of transactions   # of items in each transaction   class distribution                                        # of distinct items
Dna-train         2000                61                               23.2%, 24.25%, 52.55%                                     240
Connect-4         67557               43                               9.55%, 24.62%, 65.83%                                     126
Forest            581012              13                               0.47%, 1.63%, 2.99%, 3.53%, 6.15%, 36.36%, 48.76%         15916

22

Performance -- frequent pattern mining

[Chart: results on Forest -- y-axis 0 to 100 vs. support threshold 0% to 10%; series: CT-tree, FP-growth, Apriori]

23

Performance -- mine rules with multiple minimum supports

[Chart: multiple sup on Forest -- y-axis 10 to 100000 (log scale) vs. multiplier 0% to 5% (minconf = 90%); series: CT-multi-sup, Apri-uni-sup, CT-uni-sup]

• relative minsup, proportional to each class
• FP-growth is only for frequent pattern mining

24

Performance -- mine rules with confidence pruning

[Chart: conf-pruning on Forest -- y-axis 0 to 300 vs. support threshold 0.04% to 0.10% (minconf = 90%); series: CT-conf-prune, Apriori, CT-no-conf-prune]

25

Conclusions and future work

• Conclusions about the TD-FP-Growth algorithm
  – more efficient at finding both frequent patterns and association rules
  – more effective at mining rules by using multiple minimum supports
  – introduces a new pruning method, confidence pruning, and pushes it inside the mining process, further speeding up mining

26

Conclusions and future work

• Future work
  – explore other constraint-based association rule mining methods
  – mine association rules with an item concept hierarchy
  – apply TD-FP-Growth to applications based on association rule mining
    • clustering
    • classification

27

Reference

• (1) R. Agrawal, T. Imielinski, and A. Swami. Mining association rules between sets of items in large databases. Proc. 1993 ACM-SIGMOD Int. Conf. on Management of Data (SIGMOD'93), pages 207-216, Washington, D.C., May 1993.
• (2) U. M. Fayyad, G. Piatetsky-Shapiro, P. Smyth, and R. Uthurusamy (eds.). Advances in Knowledge Discovery and Data Mining. AAAI/MIT Press, 1996.
• (3) H. Toivonen. Sampling large databases for association rules. Proc. 1996 Int. Conf. Very Large Data Bases (VLDB'96), pages 134-145, Bombay, India, September 1996.
• (4) R. Agrawal and R. Srikant. Mining sequential patterns. Proc. 1995 Int. Conf. Data Engineering (ICDE'95), pages 3-14, Taipei, Taiwan, March 1995.
• (5) J. Han, J. Pei, and Y. Yin. Mining frequent patterns without candidate generation. Proc. 2000 ACM-SIGMOD Int. Conf. on Management of Data (SIGMOD'00), pages 1-12, Dallas, TX, May 2000.
• (6) J. Han, J. Pei, G. Dong, and K. Wang. Efficient computation of iceberg cubes with complex measures. Proc. 2001 ACM-SIGMOD Int. Conf., Santa Barbara, CA, May 2001.

And more!