Mining top k frequent closed itemsets
Mining top-k frequent closed itemsets over data streams using the sliding window model
Author: Pauray S.M. Tsai
Publication: ESA 2010
Presenter: Yuan-Chung Chang
Outline
• Introduction
• Motivation
• Mining top-k frequent closed itemsets
• FCI_max algorithm
• Example for FCI_max algorithm
• Conclusion
Introduction
With the emergence of new applications, the data we process are no longer static but arrive as continuous, dynamic data streams.
Because the data in streams arrive at high speed and are continuous and unbounded, data stream mining faces three challenges.
• First, each item in a stream can be examined only once.
• Second, although the data are generated continuously, the available memory is limited.
• Third, the mining result should be generated as quickly as possible.
Introduction (cont.)
In the database community, one of the major applications is mining association rules in large transaction databases.
Traditional association rule mining has two problems.
• First, a minimum support must be specified for mining.
• Second, mining usually generates a large number of association rules, which makes the results difficult to use in practice.
Introduction (cont.)
In the data stream environment, the problem of mining frequent itemsets becomes more complicated.
Traditional algorithms for mining frequent itemsets cannot satisfy the requirement of examining each item in a stream only once.
How to maintain frequent itemsets effectively over data streams is another important issue.
• Because data are generated continuously, currently frequent itemsets may become infrequent, and currently infrequent itemsets may become frequent.
• Memory is limited, so we cannot keep all the itemsets and their related information in memory.
Introduction (cont.)
The time models for data stream mining mainly include the landmark model (2002), the tilted-time window model (2003) and the sliding window model (2006).
• The landmark model considers all the data from a specified point in time to the current time.
• The tilted-time window model is a variation of the landmark model.
• The sliding window model focuses on recent data, from the current moment back to a specified point in time.
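As a rough illustration of how the landmark and sliding window models differ, the Python sketch below selects the transactions each model would consider. The function names, the timestamps (in minutes) and the toy stream are assumptions made for this example and do not come from the paper.

```python
def landmark_view(stream, landmark, now):
    """Landmark model: everything from a fixed starting point up to now."""
    return [(t, items) for t, items in stream if landmark <= t <= now]


def sliding_view(stream, span, now):
    """Sliding window model: only the most recent `span` minutes."""
    return [(t, items) for t, items in stream if now - span < t <= now]


# Hypothetical stream of (timestamp, itemset) pairs.
stream = [
    (1, frozenset("ab")),
    (4, frozenset("abc")),
    (9, frozenset("ad")),
    (12, frozenset("b")),
    (18, frozenset("ae")),
]

landmark_result = landmark_view(stream, landmark=0, now=18)
sliding_result = sliding_view(stream, span=10, now=18)

print(len(landmark_result))            # 5: all transactions since the landmark
print([t for t, _ in sliding_result])  # [9, 12, 18]: only the last 10 minutes
```

Under the landmark model the set of considered transactions only grows, while under the sliding window model old transactions drop out as time advances.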
Motivation
The two problems of traditional association rule mining also exist in the data stream environment: specifying an appropriate minimum support and reducing the number of frequent itemsets.
The idea of mining frequent closed itemsets was first proposed in 1999.
Motivation (cont.)
An alternative approach, mining the top-k frequent closed itemsets of length no less than min_l without specifying a minimum support, was proposed in 2005.
The mining result presents only frequent closed itemsets of length no less than min_l, so information about closed itemsets with high support but short length is lost.
In fact, the longer a closed itemset is, the smaller its support.
In this paper, the author proposes an efficient single-pass algorithm, FCI_max, to discover the top-k frequent closed itemsets of length no more than max_l, using a sliding window technique.
Motivation (cont.)
For mining top-k frequent closed itemsets of length no less than min_l (2005):
Case 1: Mining top-3 frequent closed itemsets with min_l = 2.
• The mining result is {ab:7, abc:6, ad:4}. Excluded because shorter than min_l: {a:8}.
Case 2: Mining top-3 frequent closed itemsets with min_l = 3.
• The mining result is {abc:6, abcd:3, abe:2, ace:2}. Excluded because shorter than min_l: {a:8}, {ab:7}, {ad:4}.
Motivation (cont.)
For mining top-k frequent closed itemsets of length no more than max_l (2010):
Case 3: Mining top-4 frequent closed itemsets with max_l = 3.
• The mining result is {a:8, ab:7, abc:6, ad:4}.
Case 4: Mining top-4 frequent closed itemsets with max_l = 2.
• The mining result is {a:8, ab:7, ad:4, ae:3}.
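The contrast between the min_l and max_l constraints can be reproduced on a small scale. The sketch below is a brute-force illustration only, not the FCI_max algorithm; the six-transaction database is invented for this example (it is not the data behind the cases above), and the helper names `closed_itemsets` and `top_k` are hypothetical. Ties at the cut-off are broken arbitrarily here rather than reported together.

```python
from itertools import combinations


def closed_itemsets(transactions):
    """Brute-force enumeration of closed itemsets with their supports."""
    items = sorted(set().union(*transactions))
    support = {}
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            candidate = frozenset(combo)
            count = sum(1 for t in transactions if candidate <= t)
            if count:
                support[candidate] = count
    # An itemset is closed if no proper superset has the same support.
    return {
        s: c
        for s, c in support.items()
        if not any(s < other and c == oc for other, oc in support.items())
    }


def top_k(closed, k, min_l=1, max_l=None):
    """Pick the k closed itemsets with highest support within the length bounds."""
    picked = [
        (s, c)
        for s, c in closed.items()
        if len(s) >= min_l and (max_l is None or len(s) <= max_l)
    ]
    picked.sort(key=lambda sc: (-sc[1], len(sc[0]), sorted(sc[0])))
    return [("".join(sorted(s)), c) for s, c in picked[:k]]


# Hypothetical toy database, one frozenset of items per transaction.
transactions = [
    frozenset("abc"),
    frozenset("abc"),
    frozenset("ab"),
    frozenset("ad"),
    frozenset("abcd"),
    frozenset("a"),
]

closed = closed_itemsets(transactions)
with_min_l = top_k(closed, k=2, min_l=2)
with_max_l = top_k(closed, k=2, max_l=2)

print(with_min_l)  # [('ab', 4), ('abc', 3)]
print(with_max_l)  # [('a', 6), ('ab', 4)]
```

With min_l = 2 the itemset a, which has the highest support, never appears in the result; with max_l = 2 it is ranked first.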
Mining top-k frequent closed itemsets
The author uses the sliding window model shown in Fig. 1 for the following discussion.
Mining top-k frequent closed itemsets (cont.)
• The number of windows: $n$
• The time covered by each window: $t$
• Items in a window: $\{x_1, x_2, \ldots, x_m\}$
• The sliding windows: $\{W_{i1}, W_{i2}, \ldots, W_{in}\}$
• The set of identifiers of transactions containing itemset $\{x_1, x_2, \ldots, x_m\}$ in window $W_{ij}$: $SP_{ij}(\{x_1, x_2, \ldots, x_m\})$
• The union of $SP_{ij}(\{x_1, x_2, \ldots, x_m\})$ over $j = 1, \ldots, n$: $CS_i(\{x_1, x_2, \ldots, x_m\})$
• The number of transaction identifiers in $CS_i(\{x_1, x_2, \ldots, x_m\})$: $|CS_i(\{x_1, x_2, \ldots, x_m\})|$
• The top-k 1-itemsets by $|CS_i|$: $\{S_1, S_2, \ldots, S_k\}$
• The current top-k frequent closed itemsets are denoted as a set $P$. The initial value of $P$ is $\{S_1, S_2, \ldots, S_k\}$.
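The notation above can be made concrete with a short Python sketch. It assumes each window is a dictionary from transaction identifier to itemset and that identifiers are unique across windows; the function names `sp`, `cs` and `initial_top_k`, the tie-breaking by item name, and the toy windows are assumptions for illustration, not details taken from the paper.

```python
def sp(window, itemset):
    """SP_ij: identifiers of transactions in one window that contain `itemset`."""
    return {tid for tid, items in window.items() if itemset <= items}


def cs(windows, itemset):
    """CS_i: union of SP_ij over all windows of the current sliding window."""
    result = set()
    for window in windows:
        result |= sp(window, itemset)
    return result


def initial_top_k(windows, k):
    """Initial P: the k single items with the largest |CS_i| (ties by item name)."""
    items = set()
    for window in windows:
        for transaction in window.values():
            items |= transaction
    ranked = sorted(items, key=lambda x: (-len(cs(windows, frozenset([x]))), x))
    return [frozenset([x]) for x in ranked[:k]]


# Hypothetical sliding window made of three windows.
windows = [
    {1: frozenset("ab"), 2: frozenset("abc")},
    {3: frozenset("ad"), 4: frozenset("bc")},
    {5: frozenset("ab")},
]

cs_ab = cs(windows, frozenset("ab"))
p_initial = initial_top_k(windows, k=2)

print(sorted(cs_ab))                   # [1, 2, 5]
print(len(cs_ab))                      # 3, the support count of {a, b}
print([sorted(s) for s in p_initial])  # [['a'], ['b']]
```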
Mining top-k frequent closed itemsets (cont.)
The detailed algorithm for mining top-k frequent closed itemsets with max_l is the FCI_max algorithm.
Example for FCI_max algorithm
Assume the number of windows is 4 and each window covers 5 minutes.
Assume the number of frequent closed itemsets to find is k = 5 and the maximum length of a frequent closed itemset is max_l = 4.
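To make the window parameters concrete, the following Python sketch keeps the 4 most recent 5-minute windows and drops older ones as new transactions arrive. It only illustrates window maintenance under these settings and is not the FCI_max algorithm; the `SlidingWindow` class, the alignment of windows to multiples of 5 minutes, the assumption that transactions arrive in time order, and the sample transactions are all hypothetical.

```python
from collections import deque

N_WINDOWS = 4       # number of windows kept, as in the example
WINDOW_MINUTES = 5  # time covered by each window, as in the example


class SlidingWindow:
    def __init__(self, n=N_WINDOWS, width=WINDOW_MINUTES):
        self.n = n
        self.width = width
        self.windows = deque()  # entries: (window index, {tid: items})

    def add(self, tid, minute, items):
        """Add a transaction; assumes transactions arrive in time order."""
        index = minute // self.width
        if not self.windows or self.windows[-1][0] != index:
            self.windows.append((index, {}))
        # Expire windows that fall outside the last n window slots.
        while self.windows[0][0] <= index - self.n:
            self.windows.popleft()
        self.windows[-1][1][tid] = frozenset(items)

    def support(self, itemset):
        """Count transactions in the current windows that contain `itemset`."""
        itemset = frozenset(itemset)
        return sum(
            1
            for _, window in self.windows
            for items in window.values()
            if itemset <= items
        )


sw = SlidingWindow()
sw.add(1, 0, "ab")    # window 0
sw.add(2, 3, "abc")   # window 0
sw.add(3, 6, "ad")    # window 1
sw.add(4, 11, "ab")   # window 2
sw.add(5, 17, "b")    # window 3
support_before = sw.support("ab")
sw.add(6, 21, "ab")   # window 4 starts, window 0 expires
support_after = sw.support("ab")

print(support_before)  # 3
print(support_after)   # 2
```

The drop from 3 to 2 shows how an itemset's support can fall when the oldest window expires, even though a new supporting transaction has just arrived.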
Conclusion
In this paper, the author proposes an efficient single-pass algorithm, FCI_max, to discover the top-k frequent closed itemsets of length no more than max_l.
Replacing the minimum support with a maximum length resolves the problem of losing information about itemsets with short length but high support.
The FCI_max algorithm does not need to store the support counts of all itemsets at each time point.
It uses dynamic computation to generate the frequent closed itemsets and their related information, which lets it discover the top-k frequent closed itemsets efficiently in the data stream environment.
Thank you for listening
Q & A