Modul 8: Sequential Pattern Mining. TUGAS 2 ewinarko.staff.ugm.ac.id/datamining/tugas2.
-
Upload
angela-armstrong -
Category
Documents
-
view
229 -
download
1
Transcript of Modul 8: Sequential Pattern Mining. TUGAS 2 ewinarko.staff.ugm.ac.id/datamining/tugas2.
![Page 1: Modul 8: Sequential Pattern Mining. TUGAS 2 ewinarko.staff.ugm.ac.id/datamining/tugas2.](https://reader035.fdocuments.net/reader035/viewer/2022062221/56649edc5503460f94bec0bc/html5/thumbnails/1.jpg)
Modul 8:Sequential Pattern Mining
![Page 2: Modul 8: Sequential Pattern Mining. TUGAS 2 ewinarko.staff.ugm.ac.id/datamining/tugas2.](https://reader035.fdocuments.net/reader035/viewer/2022062221/56649edc5503460f94bec0bc/html5/thumbnails/2.jpg)
TUGAS 2
ewinarko.staff.ugm.ac.id/datamining/tugas2
![Page 3: Modul 8: Sequential Pattern Mining. TUGAS 2 ewinarko.staff.ugm.ac.id/datamining/tugas2.](https://reader035.fdocuments.net/reader035/viewer/2022062221/56649edc5503460f94bec0bc/html5/thumbnails/3.jpg)
Terminology Item Itemset Sequence (Customer-sequence) Subsequence Support for a sequence Large/frequent sequence
![Page 4: Modul 8: Sequential Pattern Mining. TUGAS 2 ewinarko.staff.ugm.ac.id/datamining/tugas2.](https://reader035.fdocuments.net/reader035/viewer/2022062221/56649edc5503460f94bec0bc/html5/thumbnails/4.jpg)
Example
Q. How to find the sequential patterns?
![Page 5: Modul 8: Sequential Pattern Mining. TUGAS 2 ewinarko.staff.ugm.ac.id/datamining/tugas2.](https://reader035.fdocuments.net/reader035/viewer/2022062221/56649edc5503460f94bec0bc/html5/thumbnails/5.jpg)
Example
Item
Itemset
Transaction
![Page 6: Modul 8: Sequential Pattern Mining. TUGAS 2 ewinarko.staff.ugm.ac.id/datamining/tugas2.](https://reader035.fdocuments.net/reader035/viewer/2022062221/56649edc5503460f94bec0bc/html5/thumbnails/6.jpg)
Example (cont.)
Sequence
3-Sequence
![Page 7: Modul 8: Sequential Pattern Mining. TUGAS 2 ewinarko.staff.ugm.ac.id/datamining/tugas2.](https://reader035.fdocuments.net/reader035/viewer/2022062221/56649edc5503460f94bec0bc/html5/thumbnails/7.jpg)
Subsequence
![Page 8: Modul 8: Sequential Pattern Mining. TUGAS 2 ewinarko.staff.ugm.ac.id/datamining/tugas2.](https://reader035.fdocuments.net/reader035/viewer/2022062221/56649edc5503460f94bec0bc/html5/thumbnails/8.jpg)
Example (cont.)
<(30) (90)> is supported by customer 1 and 4
<30 (40 70)> is supported by customer 2 and 4
customer 1 and 4 contain <(30) (90)>
![Page 9: Modul 8: Sequential Pattern Mining. TUGAS 2 ewinarko.staff.ugm.ac.id/datamining/tugas2.](https://reader035.fdocuments.net/reader035/viewer/2022062221/56649edc5503460f94bec0bc/html5/thumbnails/9.jpg)
Example (cont.)
Q. Find the large/frequent sequences with minimum support set to 25%:
-Frequent sequence = The sequence with minimum support
<(30)>, <(40)>, <(70)>, <(90)><(30) (40)>, <(30) (70)>, <(40 70)>
![Page 10: Modul 8: Sequential Pattern Mining. TUGAS 2 ewinarko.staff.ugm.ac.id/datamining/tugas2.](https://reader035.fdocuments.net/reader035/viewer/2022062221/56649edc5503460f94bec0bc/html5/thumbnails/10.jpg)
Examples of Sequence DataSequence Database
Sequence Element (Transaction)
Event(Item)
Customer Purchase history of a given customer
A set of items bought by a customer at time t
Books, diary products, CDs, etc
Web Data Browsing activity of a particular Web visitor
A collection of files viewed by a Web visitor after a single mouse click
Home page, index page, contact info, etc
Event data History of events generated by a given sensor
Events triggered by a sensor at time t
Types of alarms generated by sensors
Genome sequences
DNA sequence of a particular species
An element of the DNA sequence
Bases A,T,G,C
![Page 11: Modul 8: Sequential Pattern Mining. TUGAS 2 ewinarko.staff.ugm.ac.id/datamining/tugas2.](https://reader035.fdocuments.net/reader035/viewer/2022062221/56649edc5503460f94bec0bc/html5/thumbnails/11.jpg)
Examples of Sequence
Web sequence:
< {Homepage} {Electronics} {Digital Cameras} {Canon Digital Camera} {Shopping Cart} {Order Confirmation} {Return to Shopping} >
Sequence of initiating events causing the nuclear accident at 3-mile Island:(http://stellar-one.com/nuclear/staff_reports/summary_SOE_the_initiating_event.htm)
< {clogged resin} {outlet valve closure} {loss of feedwater} {condenser polisher outlet valve shut} {booster pumps trip} {main waterpump trips} {main turbine trips} {reactor pressure increases}>
Sequence of books checked out at a library:<{Fellowship of the Ring} {The Two Towers} {Return of the King}>
![Page 12: Modul 8: Sequential Pattern Mining. TUGAS 2 ewinarko.staff.ugm.ac.id/datamining/tugas2.](https://reader035.fdocuments.net/reader035/viewer/2022062221/56649edc5503460f94bec0bc/html5/thumbnails/12.jpg)
GSP algorithm
![Page 13: Modul 8: Sequential Pattern Mining. TUGAS 2 ewinarko.staff.ugm.ac.id/datamining/tugas2.](https://reader035.fdocuments.net/reader035/viewer/2022062221/56649edc5503460f94bec0bc/html5/thumbnails/13.jpg)
Candidate generation
Contains 2 phase: Join phase and Prune phase
Join phase: Ck = Fk-1 x Fk-1
A sequence s1 and s2 in Fk-1 can be joined if the subsequence obtained by dropping the first item of s1 is the same as the subsequence obtained by dropping the last item of s2.
The resulting sequence is the sequence s1 extended by the last item in s2. The added item becomes a separate element if it was a separate
element in s2, and part of element s1 otherwise
![Page 14: Modul 8: Sequential Pattern Mining. TUGAS 2 ewinarko.staff.ugm.ac.id/datamining/tugas2.](https://reader035.fdocuments.net/reader035/viewer/2022062221/56649edc5503460f94bec0bc/html5/thumbnails/14.jpg)
Candidate Generation Examples
Merging the sequences w1=<{1} {2 3} {4}> and w2 =<{2 3} {4 5}> will produce the candidate sequence < {1} {2 3} {4 5}> because the last two events in w2 (4 and 5) belong to the same element
Merging the sequences w1=<{1} {2 3} {4}> and w2 =<{2 3} {4} {5}> will produce the candidate sequence < {1} {2 3} {4} {5}> because the last two events in w2 (4 and 5) do not belong to the same element
We do not have to merge the sequences w1 =<{1} {2 6} {4}> and w2 =<{1} {2} {4 5}> to produce the candidate < {1} {2 6} {4 5}> because if the latter is a viable candidate, then it can be obtained by merging w1 with < {1} {2 6} {5}>
![Page 15: Modul 8: Sequential Pattern Mining. TUGAS 2 ewinarko.staff.ugm.ac.id/datamining/tugas2.](https://reader035.fdocuments.net/reader035/viewer/2022062221/56649edc5503460f94bec0bc/html5/thumbnails/15.jpg)
Pruning phase: Delete candidate sequences that have an infrequent (k-1)-
subsequence.
![Page 16: Modul 8: Sequential Pattern Mining. TUGAS 2 ewinarko.staff.ugm.ac.id/datamining/tugas2.](https://reader035.fdocuments.net/reader035/viewer/2022062221/56649edc5503460f94bec0bc/html5/thumbnails/16.jpg)
GSP Example
< {1} {2} {3} >< {1} {2 5} >< {1} {5} {3} >< {2} {3} {4} >< {2 5} {3} >< {3} {4} {5} >< {5} {3 4} >
< {1} {2} {3} {4} >< {1} {2 5} {3} >< {1} {5} {3 4} >< {2} {3} {4} {5} >< {2 5} {3 4} >
< {1} {2 5} {3} >
Frequent3-sequences
CandidateGeneration
CandidatePruning
![Page 17: Modul 8: Sequential Pattern Mining. TUGAS 2 ewinarko.staff.ugm.ac.id/datamining/tugas2.](https://reader035.fdocuments.net/reader035/viewer/2022062221/56649edc5503460f94bec0bc/html5/thumbnails/17.jpg)
Database Example
![Page 18: Modul 8: Sequential Pattern Mining. TUGAS 2 ewinarko.staff.ugm.ac.id/datamining/tugas2.](https://reader035.fdocuments.net/reader035/viewer/2022062221/56649edc5503460f94bec0bc/html5/thumbnails/18.jpg)
The mining result
![Page 19: Modul 8: Sequential Pattern Mining. TUGAS 2 ewinarko.staff.ugm.ac.id/datamining/tugas2.](https://reader035.fdocuments.net/reader035/viewer/2022062221/56649edc5503460f94bec0bc/html5/thumbnails/19.jpg)
The Algorithm Apriori Five phases
Sort phase Large itemset phase Transformation phase Sequence phase Maximal phase
![Page 20: Modul 8: Sequential Pattern Mining. TUGAS 2 ewinarko.staff.ugm.ac.id/datamining/tugas2.](https://reader035.fdocuments.net/reader035/viewer/2022062221/56649edc5503460f94bec0bc/html5/thumbnails/20.jpg)
Sort the database with customer-id as the major key and transaction-time as the minor key
Sort phase
![Page 21: Modul 8: Sequential Pattern Mining. TUGAS 2 ewinarko.staff.ugm.ac.id/datamining/tugas2.](https://reader035.fdocuments.net/reader035/viewer/2022062221/56649edc5503460f94bec0bc/html5/thumbnails/21.jpg)
Find the large itemset. Itemsets mapping
Litemset phase
![Page 22: Modul 8: Sequential Pattern Mining. TUGAS 2 ewinarko.staff.ugm.ac.id/datamining/tugas2.](https://reader035.fdocuments.net/reader035/viewer/2022062221/56649edc5503460f94bec0bc/html5/thumbnails/22.jpg)
Transformation phase
Deleting non-large itemsets Mapping large itemsets to integers
![Page 23: Modul 8: Sequential Pattern Mining. TUGAS 2 ewinarko.staff.ugm.ac.id/datamining/tugas2.](https://reader035.fdocuments.net/reader035/viewer/2022062221/56649edc5503460f94bec0bc/html5/thumbnails/23.jpg)
Sequence phase Use the set of litemsets to find the desired sequence. Two families of algorithms:
Count-all:
Algorithm AprioriAll Count-some:
Algorithm AprioriSome, Algorithm DynamicSome
![Page 24: Modul 8: Sequential Pattern Mining. TUGAS 2 ewinarko.staff.ugm.ac.id/datamining/tugas2.](https://reader035.fdocuments.net/reader035/viewer/2022062221/56649edc5503460f94bec0bc/html5/thumbnails/24.jpg)
AprioriAll The basic method to mine sequential patterns Based on the Apriori algorithm. Count all the large sequences, including non-maximal
sequences. Use Apriori-generate function to generate candidate
sequence.
![Page 25: Modul 8: Sequential Pattern Mining. TUGAS 2 ewinarko.staff.ugm.ac.id/datamining/tugas2.](https://reader035.fdocuments.net/reader035/viewer/2022062221/56649edc5503460f94bec0bc/html5/thumbnails/25.jpg)
AprioriAll (cont.)
L1 = {large 1-sequences}; // Result of the phasefor ( k=2; Lk-1≠Φ; k++) do begin Ck = New candidate generate from Lk-1 foreach customer-sequence c in the database do Increment the count of all candidates in Ck that are contained in cLk = Candidates in Ck with minimum support.EndAnswer=Maximal Sequences in UkLk;
![Page 26: Modul 8: Sequential Pattern Mining. TUGAS 2 ewinarko.staff.ugm.ac.id/datamining/tugas2.](https://reader035.fdocuments.net/reader035/viewer/2022062221/56649edc5503460f94bec0bc/html5/thumbnails/26.jpg)
Apriori Candidate Generation
generate candidates for pass using only the large sequences found in the previous pass and then makes a pass over the data to find their support.
![Page 27: Modul 8: Sequential Pattern Mining. TUGAS 2 ewinarko.staff.ugm.ac.id/datamining/tugas2.](https://reader035.fdocuments.net/reader035/viewer/2022062221/56649edc5503460f94bec0bc/html5/thumbnails/27.jpg)
Algorithm: Lk the set of all large k-sequences
Ck the set of candidate k-sequences
Apriori Candidate Generation
insert into Ck
select p.litemset1, p.litemset2,…, p.litemsetk-1, q.litemsetk-1
from Lk-1 p, Lk-1 qwhere p.litemset1=q.litemset1,…, p.litemsetk-2=q.litemsetk-2;
forall sequences cCk do forall (k-1)-subsequences s of c do if (sLk-1) then delete c from Ck;
![Page 28: Modul 8: Sequential Pattern Mining. TUGAS 2 ewinarko.staff.ugm.ac.id/datamining/tugas2.](https://reader035.fdocuments.net/reader035/viewer/2022062221/56649edc5503460f94bec0bc/html5/thumbnails/28.jpg)
Example: Transformed Customer Sequences
Apriori Candidate Generation
<{1 5}{2}{3}{4}><{1}{3}{4}{3 5}><{1}{2}{3}{4}>
<{1}{3}{5}><{4}{5}>
next step: find the large 1-sequences
With minimum set to 25%
![Page 29: Modul 8: Sequential Pattern Mining. TUGAS 2 ewinarko.staff.ugm.ac.id/datamining/tugas2.](https://reader035.fdocuments.net/reader035/viewer/2022062221/56649edc5503460f94bec0bc/html5/thumbnails/29.jpg)
next step: find the large 2-sequences
Sequence Support
<1>
<2>
<3>
<4>
<5>
<{1 5}{2}{3}{4}><{1}{3}{4}{3 5}><{1}{2}{3}{4}>
<{1}{3}{5}><{4}{5}>
ExampleLarge 1-Sequence
4
2
4
4
2
![Page 30: Modul 8: Sequential Pattern Mining. TUGAS 2 ewinarko.staff.ugm.ac.id/datamining/tugas2.](https://reader035.fdocuments.net/reader035/viewer/2022062221/56649edc5503460f94bec0bc/html5/thumbnails/30.jpg)
next step: find the large 3-sequences
Sequence Support
<1 2> 2
<1 3> 4
<1 4> 3
<1 5> 3
<2 3> 2
<2 4> 2
<3 4> 3
<3 5> 2
<4 5> 2
<{1 5}{2}{3}{4}><{1}{3}{4}{3 5}><{1}{2}{3}{4}>
<{1}{3}{5}><{4}{5}>
ExampleLarge 2-Sequence
![Page 31: Modul 8: Sequential Pattern Mining. TUGAS 2 ewinarko.staff.ugm.ac.id/datamining/tugas2.](https://reader035.fdocuments.net/reader035/viewer/2022062221/56649edc5503460f94bec0bc/html5/thumbnails/31.jpg)
next step: find the large 4-sequences
Sequence Support
<1 2 3> 2
<1 2 4> 2
<1 3 4> 3
<1 3 5> 2
<2 3 4> 2
<{1 5}{2}{3}{4}><{1}{3}{4}{3 5}><{1}{2}{3}{4}>
<{1}{3}{5}><{4}{5}>
ExampleLarge 3-Sequence
![Page 32: Modul 8: Sequential Pattern Mining. TUGAS 2 ewinarko.staff.ugm.ac.id/datamining/tugas2.](https://reader035.fdocuments.net/reader035/viewer/2022062221/56649edc5503460f94bec0bc/html5/thumbnails/32.jpg)
next step: find the maximal sequential pattern
Sequence Support
<1 2 3 4> 2<{1 5}{2}{3}{4}><{1}{3}{4}{3 5}><{1}{2}{3}{4}>
<{1}{3}{5}><{4}{5}>
ExampleLarge 4-Sequence
![Page 33: Modul 8: Sequential Pattern Mining. TUGAS 2 ewinarko.staff.ugm.ac.id/datamining/tugas2.](https://reader035.fdocuments.net/reader035/viewer/2022062221/56649edc5503460f94bec0bc/html5/thumbnails/33.jpg)
Maximal phase Find the maximum sequences among the set of large
sequences. In some algorithms, this phase is combined with the
sequence phase.
![Page 34: Modul 8: Sequential Pattern Mining. TUGAS 2 ewinarko.staff.ugm.ac.id/datamining/tugas2.](https://reader035.fdocuments.net/reader035/viewer/2022062221/56649edc5503460f94bec0bc/html5/thumbnails/34.jpg)
Maximal phase Algorithm:
S the set of all litemsets n the length of the longest sequence
for (k = n; k > 1; k--) do foreach k-sequence sk do Delete from S all subsequences of sk
![Page 35: Modul 8: Sequential Pattern Mining. TUGAS 2 ewinarko.staff.ugm.ac.id/datamining/tugas2.](https://reader035.fdocuments.net/reader035/viewer/2022062221/56649edc5503460f94bec0bc/html5/thumbnails/35.jpg)
Sequence Support
<1 2 3 4> 2
Example
Sequence Support
<1> 4
<2> 2
<3> 4
<4> 4
<5> 2
Sequence Support
<1 2> 2
<1 3> 4
<1 4> 3
<1 5> 3
<2 3> 2
<2 4> 2
<3 4> 3
<3 5> 2
<4 5> 2
Sequence Support
<1 2 3> 2
<1 2 4> 2
<1 3 4> 3
<1 3 5> 2
<2 3 4> 2
Find the maximal large sequences