FPPM algorithm
1 I NAME OF PRESENTER
An Efficient Approach to Mine Flexible Periodic Patterns in
Time Series Databases
Supervised by
Dr. Chowdhury Farhan Ahmed, Associate Professor
Md. Samiullah, Lecturer
Presented by
Ashis Kumar Chanda
Swapnil Saha
Department of Computer Science and Engineering, University of Dhaka
Topics to be covered
1. Introduction
2. Problem Definitions
3. Existing Algorithms
4. Motivation
5. Contribution
6. The Proposed Algorithm
7. Experimental Results
8. Conclusion
Introduction
Data Mining
- Extracting hidden patterns or structure
- Gaining information from huge data
Example: Periodic amounts of money withdrawn within fixed time intervals from an ATM booth in a specific location
Day   Time slot       Money amount (million)
Sun   12 am - 8 am    2
      8 am - 4 pm     6
      4 pm - 12 am    9
Mon   12 am - 8 am    1.2
      8 am - 4 pm     12
      4 pm - 12 am    9
Thu   12 am - 8 am    1.5
      8 am - 4 pm     3
      4 pm - 12 am    4.5
Introduction (cont.)
Flexible Periodic Pattern: a periodic pattern that skips one or a few intermediate characters or events that are not interesting from the user's point of view.
Example: Consider T = {abc adc abc}. The flexible pattern F = ‘a*c’ stands for ‘abc’ or ‘adc’, where ‘*’ indicates any unimportant intermediate event.
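The example above can be checked with a minimal Python sketch. It assumes the series is already split into period-length segments and that ‘*’ stands for exactly one skipped event; the helper names matches and support are illustrative and do not come from the slides.

```python
def matches(pattern, segment):
    """Return True if segment fits pattern; '*' accepts any single event."""
    if len(pattern) != len(segment):
        return False
    return all(p == "*" or p == s for p, s in zip(pattern, segment))


def support(pattern, segments):
    """Fraction of segments that fit the pattern."""
    return sum(matches(pattern, seg) for seg in segments) / len(segments)


# T = {abc adc abc} from the slide, written as three segments.
T = ["abc", "adc", "abc"]
print(support("abc", T))  # the exact pattern fits 2 of the 3 segments
print(support("a*c", T))  # the flexible pattern fits all 3 segments
```

The exact pattern ‘abc’ misses the middle segment, while ‘a*c’ covers every segment, which is the point of allowing a skipped event.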
Problem Definition
Flexible Pattern Mining: Given a time series database as a sequence of n characters or events, S = {e1, e2, e3 ... en}, a user-specified maximum event skipping threshold ϴ, and a support threshold σ, mine all possible flexible periodic sequences of events FP = {e1, e2, e3 ... ei} in S, where i ≤ n, that satisfy σ, considering variable starting positions st and allowing at most ϴ unimportant intermediate events.
Existing Algorithms
Most notable algorithms:
Algorithm                          Mechanism                                 Authors              Year  Drawbacks
Effective periodic pattern mining  Apriori based sequential pattern mining   Nishi et al.         2013  Huge candidate set, false pattern generation
CONV                               Convolution process                       M. G. Elfeky et al.  2005  Fails in insertion, deletion process
WARP                               Time warping technique                    M. G. Elfeky et al.  2005  Only detects segment periodicity
STNR                               Suffix tree                               Rasheed et al.       2010  Lack of skipping intermediate events
Motivation
- Avoid the Apriori based approach, which generates a huge candidate set
- Vary starting positions in generated sequences
- Detect all three types of periodicity in one run
Contribution
- Reduced redundant patterns
- Developed a new algorithm using a suffix tree like data structure to generate Flexible Periodic Patterns
- Also proposed a new periodicity detection algorithm
- Capable of mining all three types of periodicity in a single run
- Considered variable starting positions from the given time series sequence
Terms & Definitions
• Occurrence vector
• Confidence
T = {acbd afbd agbd}, with positions numbered 0 to 11
Occurrence vector:
• occ_vec[a] = [0, 4, 8]
• occ_vec[c] = [1]
• occ_vec[b] = [2, 6, 10]
Confidence of ‘a’:
• actual periodicity = 3
• perfect periodicity = 3
• Confidence = 3 / 3
Confidence of ‘c’:
• actual periodicity = 1
• perfect periodicity = 3
• Confidence = 1 / 3
Perfect periodicity = (endpos – stpos + 1) / period
Confidence = actual periodicity / perfect periodicity
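The two definitions can be tried out with a short Python sketch. Several points are assumptions rather than statements from the slides: the period of 4 is inferred from the occurrence vectors, endpos is taken as the position where a terminator would sit (the length of T), the division is rounded down, and the ratio is capped at 1 because this rounding can undercount near the end of the sequence. These choices were made because they reproduce the values 3 / 3 and 1 / 3 listed above.

```python
def occurrence_vectors(text):
    """Map each event to the sorted list of positions where it occurs."""
    occ_vec = {}
    for pos, event in enumerate(text):
        occ_vec.setdefault(event, []).append(pos)
    return occ_vec


def confidence(occ, period, endpos):
    """actual periodicity / perfect periodicity, anchored at the first occurrence."""
    stpos = occ[0]
    # Occurrences that fall exactly on stpos, stpos + period, stpos + 2 * period, ...
    actual = sum(1 for pos in occ if (pos - stpos) % period == 0)
    # Assumption: integer (floor) division, endpos = index of the terminator.
    perfect = (endpos - stpos + 1) // period
    return min(1.0, actual / perfect) if perfect else 0.0


T = "acbdafbdagbd"          # T = {acbd afbd agbd} without the spaces
occ_vec = occurrence_vectors(T)
period = 4                  # inferred from occ_vec[a] = [0, 4, 8]
endpos = len(T)             # assumed terminator position

print(occ_vec["a"], confidence(occ_vec["a"], period, endpos))
print(occ_vec["c"], confidence(occ_vec["c"], period, endpos))
```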
Terms & Definitions
• Occurrence vector
• Confidence
• Length vector
• Ladder factor
[Fig: SSES tree for T = {abb$}, nodes A1 to A9, single-symbol edges a, b, $]
Support threshold, σ = 50%
Length vector:
• len_vec[A2] = [3]
• len_vec[A6] = [2, 1]
Ladder factor:
• lad_fact[A2] = 3
• lad_fact[A6] = 2
lad_fact = nth max(len_vec), where n = size of len_vec * σ
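The ladder factor formula can be written as a few lines of Python. The slide does not say how a fractional n is rounded; the sketch assumes rounding up with a minimum of 1, since that reproduces lad_fact[A2] = 3 and lad_fact[A6] = 2 for σ = 50%. The function name ladder_factor is illustrative.

```python
import math


def ladder_factor(len_vec, sigma):
    """Return the n-th largest entry of len_vec, with n = size of len_vec * sigma."""
    # Assumption: a fractional n is rounded up, and n is never below 1.
    n = max(1, math.ceil(len(len_vec) * sigma))
    return sorted(len_vec, reverse=True)[n - 1]


print(ladder_factor([3], 0.5))     # len_vec[A2]
print(ladder_factor([2, 1], 0.5))  # len_vec[A6]
```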
The Proposed Algorithm
Key Features:
- Apply discretization technique on given database
- Construct the Single Symbol Edge based Suffix (SSES) tree
- Calculate occurrence vector at the time of construction
- Traverse the tree level-wise
- Mine patterns following joining property
- Check each generated pattern through the proposed periodicity detection algorithm
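The slides do not say which discretization technique is applied. As one possibility only, the sketch below maps numeric readings to event symbols using fixed cut points; the cut points, the symbols and the readings are all hypothetical values chosen for illustration.

```python
from bisect import bisect_right


def discretize(values, cut_points, symbols):
    """Turn numeric readings into a string of event symbols.

    cut_points must be sorted; a value below cut_points[0] gets symbols[0],
    a value in [cut_points[0], cut_points[1]) gets symbols[1], and so on.
    """
    if len(symbols) != len(cut_points) + 1:
        raise ValueError("need exactly one more symbol than cut points")
    return "".join(symbols[bisect_right(cut_points, v)] for v in values)


# Hypothetical readings and thresholds, not taken from the slides.
readings = [1.0, 5.0, 9.5, 2.0, 6.0, 8.0]
print(discretize(readings, cut_points=[3, 8], symbols="abc"))
```

The resulting string is the kind of event sequence on which the SSES tree is then built.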
SSES Tree Construction
T = {abcabbabb$}
Period = 3
[Figure: SSES tree for T, rooted at "root", with single-symbol edges a, b, c, $; nodes numbered 1 to 45 in the first tree and 1 to 17 in the second tree, shown with Period = 3]
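A rough Python sketch of the construction follows. It assumes that the SSES tree behaves like a suffix trie in which every edge carries a single symbol, that each node stores the start positions of the substring spelled from the root (its occurrence vector), and that the depth can be capped at the period. The names Node, build_sses_tree, find and count_nodes are illustrative and not taken from the slides.

```python
class Node:
    def __init__(self):
        self.children = {}  # single symbol -> child Node
        self.occ_vec = []   # start positions of the substring ending at this node


def build_sses_tree(text, max_depth=None):
    """Insert every suffix of text symbol by symbol, recording start positions."""
    root = Node()
    for start in range(len(text)):
        stop = len(text) if max_depth is None else min(len(text), start + max_depth)
        node = root
        for symbol in text[start:stop]:
            node = node.children.setdefault(symbol, Node())
            node.occ_vec.append(start)
    return root


def find(root, pattern):
    """Follow pattern from the root; return the node reached, or None."""
    node = root
    for symbol in pattern:
        node = node.children.get(symbol)
        if node is None:
            return None
    return node


def count_nodes(node):
    """Number of nodes below node (the root itself is not counted)."""
    return sum(1 + count_nodes(child) for child in node.children.values())


T = "abcabbabb$"
full_tree = build_sses_tree(T)
capped_tree = build_sses_tree(T, max_depth=3)  # depth capped at the period
print(count_nodes(full_tree), count_nodes(capped_tree))
print(find(capped_tree, "b").occ_vec)
print(find(capped_tree, "bb").occ_vec)
```

Under these assumptions the uncapped tree for T has 45 nodes below the root and the tree capped at depth 3 has 17, and the node reached by b holds the occurrence vector [1, 4, 5, 7, 8].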
Algorithm Demonstration
σ = 50%
[Figure: SSES tree with single-symbol edges a, b, c, $, nodes numbered 1 to 17, and position labels L1, L4, L5, L7, L8]

Occurrence vector calculation
Unique event   occ_vec
b              [1, 4, 5, 7, 8]

Confidence calculation
Pattern   occ_vec     Confidence   Status
b         [1, 4, 7]   100%         √

Patterns
Pattern   occ_vec
b         [1, 4, 5, 7, 8]
Algorithm Demonstration
σ = 50%
[Figure: SSES tree with single-symbol edges a, b, c, $ and position labels L1, L4, L5, L7, L8]

Occurrence vector calculation
Unique event   occ_vec
c              [1]
b              [4, 7]
a              [5]

Confidence calculation
Pattern   occ_vec   Confidence   Status
bc        [1]       33%          ✗
ba        [5]       100%         √
bb        [4, 7]    100%         √

Patterns
Pattern   occ_vec
b         [1, 4, 5, 7, 8]
Join:
ba        [5]
b*        [1, 4, 5, 7]
bb        [4, 7]
Algorithm Demonstration
σ = 50%
[Figure: SSES tree with single-symbol edges a, b, c, $ and position labels L1, L4, L5, L7, L8]

Occurrence vector calculation
Unique event   occ_vec
a              [1]
a              [1, 4]
b              [5]

Confidence calculation
Pattern   occ_vec   Confidence   Status
bba       [4]       50%          √
b*a       [1, 4]    66%          √
baa       []        0%           ✗
bab       [5]       100%         √
bbb       []        0%           ✗
b*b       [5]       100%         √

Patterns
Pattern   occ_vec
Join:
ba        [5]
b*        [1, 4, 5, 7]
bb        [4, 7]
b*a       [1, 4]
bab       [5]
bba       [4]
b*b       [5]
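The occurrence vector step of the demonstration can be imitated directly on the string, without the tree. The Python sketch below groups the occurrences of a pattern by the event that follows it, and also collects the occurrences that can be extended by one skipped event (‘*’). It assumes T = abcabbabb$ and that the terminator $ can neither extend a pattern nor be skipped; the function name extend is illustrative.

```python
def extend(text, pattern_len, occ_vec, terminator="$"):
    """Split a pattern's occurrences by the next event.

    Returns (children, wildcard): children maps each following event to the
    start positions that continue with it; wildcard lists the start positions
    that are followed by any event other than the terminator.
    """
    children = {}
    wildcard = []
    for start in occ_vec:
        nxt = start + pattern_len
        if nxt >= len(text) or text[nxt] == terminator:
            continue  # nothing left to append after this occurrence
        children.setdefault(text[nxt], []).append(start)
        wildcard.append(start)
    return children, wildcard


T = "abcabbabb$"
children, b_star = extend(T, 1, [1, 4, 5, 7, 8])  # extend pattern b
print(children)  # occurrence vectors of bc, bb and ba
print(b_star)    # occurrence vector of b*

grandchildren, _ = extend(T, 2, b_star)           # extend pattern b*
print(grandchildren)  # occurrence vectors of b*a and b*b
```

The printed vectors agree with the occurrence vectors listed in the demonstration: c [1], b [4, 7], a [5] and b* [1, 4, 5, 7] for the first join, then a [1, 4] and b [5] below b*.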
Final Result
T = {abcabbabb$}
[Figure: SSES tree for T with single-symbol edges a, b, c, $ and nodes numbered 1 to 17]

Mined patterns
Pattern   occ_vec
a         [0, 3, 6]
ab        [0, 3, 6]
abb       [3, 6]
a*b       [3, 6]
b         [1, 4, 7]
bb        [4, 7]
ba        [5]
b*a       [1, 4]
bab       [5]
b*b       [5]
bba       [4]
c         [2]
ca        [2]
cab       [2]
c*b       [2]
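The whole procedure can be approximated by a small level-wise miner in Python. This is a simplified sketch, not the authors' implementation: it scans the string instead of traversing the SSES tree, anchors each periodic run at the first occurrence of the pattern, caps pattern length at the period, counts ϴ as the total number of ‘*’ in a pattern, takes endpos as the index of $, rounds perfect periodicity down, and caps confidence at 1. All function names are illustrative.

```python
def occurrences(text, pattern, terminator="$"):
    """Start positions where pattern fits; '*' accepts any non-terminator event."""
    body = text.rstrip(terminator)
    size = len(pattern)
    return [
        start
        for start in range(len(body) - size + 1)
        if all(p == "*" or p == s for p, s in zip(pattern, body[start:start + size]))
    ]


def periodic_run(occ_vec, period):
    """Occurrences lying at multiples of period from the first occurrence."""
    if not occ_vec:
        return []
    first = occ_vec[0]
    return [pos for pos in occ_vec if (pos - first) % period == 0]


def confidence(run, period, endpos):
    """actual periodicity / perfect periodicity (floor division, capped at 1)."""
    if not run:
        return 0.0
    perfect = (endpos - run[0] + 1) // period
    return min(1.0, len(run) / perfect) if perfect else 0.0


def mine(text, period, sigma, max_skips=1, terminator="$"):
    """Grow patterns level by level, keeping those with confidence >= sigma."""
    endpos = text.index(terminator)
    alphabet = sorted(set(text) - {terminator})
    mined = {}
    frontier = [""]
    while frontier:
        next_frontier = []
        for prefix in frontier:
            gaps = [""]
            if prefix and prefix.count("*") < max_skips:
                gaps.append("*")  # join through one skipped event
            for gap in gaps:
                for symbol in alphabet:
                    candidate = prefix + gap + symbol
                    if len(candidate) > period:
                        continue
                    run = periodic_run(occurrences(text, candidate), period)
                    if confidence(run, period, endpos) >= sigma:
                        mined[candidate] = run
                        next_frontier.append(candidate)
        frontier = next_frontier
    return mined


mined = mine("abcabbabb$", period=3, sigma=0.5)
for pattern in sorted(mined):
    print(pattern, mined[pattern])
```

With T = abcabbabb$, period 3, σ = 50% and at most one skipped event, this sketch returns the same 15 patterns and occurrence vectors as the final result, and rejects bc, abc and a*c, whose confidence is 33%.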
Experimental Result
Conclusion
Summary:
- Mine Flexible Periodic Patterns using a suffix tree like structure
- Improve performance by pruning the tree
- Consider variable starting positions in the given time sequence
Future Works:
- Improve the proposed procedure with noise-resilient features
- Develop an efficient way to run the algorithm in parallel on time series databases
- Reduce memory consumption
References
1. Mohamed G. Elfeky, Walid G. Aref, and Ahmed K. Elmagarmid. Periodicity detection in time series databases. IEEE Trans. Knowl. Data Eng., 17(7):875-887, 2005.
2. Faraz Rasheed, Mohammed Al-Shalalfa, and Reda Alhajj. Adapting machine learning technique for periodicity detection in nucleosomal locations in sequences. In IDEAL, pages 870-879, 2007.
3. Manziba Akanda Nishi, Chowdhury Farhan Ahmed, Md. Samiullah, and Byeong-Soo Jeong. Effective periodic pattern mining in time series databases. Expert Syst. Appl., 40(8):3015-3027, 2013.
4. Mohamed G. Elfeky, Walid G. Aref, and Ahmed K. Elmagarmid. WARP: Time warping for periodicity detection. In ICDM, pages 138-145, 2005.
5. Dan Gusfield. Algorithms on Strings, Trees, and Sequences - Computer Science and Computational Biology. Cambridge University Press, 1997.
6. Jiawei Han and Micheline Kamber. Data Mining: Concepts and Techniques. Morgan Kaufmann, 2000.
7. Piotr Indyk, Nick Koudas, and S. Muthukrishnan. Identifying representative trends in massive time series data sets using sketches. In VLDB, pages 363-372, 2000.
8. Roman M. Kolpakov and Gregory Kucherov. Finding maximal repetitions in a word in linear time. In FOCS, pages 596-604, 1999.
9. Jian Pei, Jiawei Han, Behzad Mortazavi-Asl, Helen Pinto, Qiming Chen, Umeshwar Dayal, and Meichun Hsu. PrefixSpan: Mining sequential patterns by prefix-projected growth. In ICDE, pages 215-224, 2001.
10. Faraz Rasheed, Mohammed Al-Shalalfa, and Reda Alhajj. Efficient periodicity mining in time series databases using suffix trees. IEEE Trans. Knowl. Data Eng., 23(1):79-94, 2011.
11. Faraz Rasheed and Reda Alhajj. STNR: A suffix tree based noise resilient algorithm for periodicity detection in time series databases. Appl. Intell., 32(3):267-278, 2010.
12. Esko Ukkonen. On-line construction of suffix trees. Algorithmica, 14(3):249-260, 1995.
13. Andreas S. Weigend and Neil A. Gershenfeld. Time Series Prediction: Forecasting the Future and Understanding the Past. Addison-Wesley, 1994.
14. Huei-Wen Wu and Anthony J. T. Lee. Mining closed flexible patterns in time-series databases. Expert Syst. Appl., 37(3):2098-2107, 2010.
15. Ramakrishnan Srikant and Rakesh Agrawal. Mining sequential patterns: Generalizations and performance improvements. In EDBT, pages 3-17, 1996.
16. Anthony K. H. Tung, Hongjun Lu, Jiawei Han, and Ling Feng. Breaking the barrier of transactions: Mining inter-transaction association rules. In KDD, pages 297-301, 1999.
17. Chang Sheng, Wynne Hsu, and Mong-Li Lee. Mining dense periodic patterns in time series data. In ICDE, page 115, 2006.
18. Heikki Mannila, Hannu Toivonen, and A. Inkeri Verkamo. Discovery of frequent episodes in event sequences. Data Min. Knowl. Discov., 1(3):259-289, 1997.
19. Sheng Ma and Joseph L. Hellerstein. Mining partially periodic event patterns with unknown periods. In ICDE, pages 205-214, 2001.
20. Earl F. Glynn, Jie Chen, and Arcady R. Mushegian. Detecting periodic patterns in unevenly spaced gene expression time series using Lomb-Scargle periodograms. Bioinformatics, 22(3):310-316, 2006.
21. Walid G. Aref, Mohamed G. Elfeky, and Ahmed K. Elmagarmid. Incremental, online, and merge mining of partial periodic patterns in time-series databases. IEEE Trans. Knowl. Data Eng., 16(3):332-342, 2004.
Questions?
Thank You