When to Update the Sequential Patterns of Stream Data?

Posted on 05-Jan-2016



When to Update the Sequential Patterns of Stream Data? Q. Zheng, K. Xu, and S. Ma, in Proc. of the 7th Pacific-Asia Conference on Knowledge Discovery and Data Mining, 2003. Adviser: Jia-Ling Koh. Speaker: Shu-Ning Shin. Date: 2004.8.12.

Transcript of When to Update the Sequential Patterns of Stream Data?

1

When to Update the Sequential Patterns of Stream Data?

Q. Zheng, K. Xu, and S. Ma, in Proc. of the 7th Pacific-Asia Conference on Knowledge Discovery and Data Mining, 2003.

Adviser: Jia-Ling Koh

Speaker: Shu-Ning Shin

Date: 2004.8.12

2

Introduction

An experimental method, called TPD (Tradeoff between Performance and Difference), is proposed to decide when to update the sequential patterns of stream data by making a tradeoff between the performance of incremental updating algorithms and the difference between the old and new sequential patterns.

3

Stream Data Model (1)

Stream event: Ei = <ei, tn>, where ei is the stream event type and tn is the time at which the event type occurs.

Stream tuple: Qi = ((ek1, ek2, …, ekm), ti) = (Ek1, Ek2, …, Ekm)

Length of a stream tuple: |Qi| = |(ek1, ek2, …, ekm)| = m

4

Stream Data Model (2)

Stream queue: Sij = <Qi, Qi+1, …, Qj> = <(Ei1, …, Eik) … (Ej1, …, Ejm)>, where ti < ti+1 < … < tj

Length of a queue: |Sij| = |<Qi, Qi+1, …, Qj>| = j - i + 1

Stream viewing window: Wk = <Qm, …, Qn | d = n - m + 1>

Size of a viewing window: |Wk| = n - m + 1 = d
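The tuple, queue, and window definitions above can be sketched as plain Python data structures. This is a minimal illustration under stated assumptions, not code from the paper: the names StreamTuple, is_stream_queue, and viewing_window are hypothetical, event types are assumed to be strings, and times are assumed to be integers.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class StreamTuple:
    """Q_i = ((e_k1, ..., e_km), t_i): event types observed at one time t_i (hypothetical name)."""
    event_types: Tuple[str, ...]
    time: int

    def __len__(self) -> int:
        # |Q_i| = m, the number of event types in the tuple
        return len(self.event_types)


def is_stream_queue(tuples: List[StreamTuple]) -> bool:
    """A stream queue S_ij needs strictly increasing times t_i < t_(i+1) < ... < t_j."""
    return all(a.time < b.time for a, b in zip(tuples, tuples[1:]))


def viewing_window(queue: List[StreamTuple], m: int, n: int) -> List[StreamTuple]:
    """W_k = <Q_m, ..., Q_n | d = n - m + 1>, with 1-based indices as on the slides."""
    if not 1 <= m <= n <= len(queue):
        raise ValueError("window bounds must satisfy 1 <= m <= n <= |queue|")
    return queue[m - 1:n]  # |W_k| = n - m + 1 = d


queue = [StreamTuple(("e2",), 1), StreamTuple(("e3", "e6"), 2), StreamTuple(("e7",), 3)]
print(is_stream_queue(queue))            # True
print(len(queue[1]))                     # 2
print(len(viewing_window(queue, 2, 3)))  # 2
```

Using 1-based bounds in viewing_window keeps the indices aligned with the slide notation Qm, …, Qn, at the cost of a conversion to Python's 0-based slicing.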

5

Stream Data Model (3)

occur(seqm, Wk): the number of times seqm occurs in Wk, where seqm = <ei1, ei2, …, eim> and Wk is a stream viewing window.

support(seqm, Wk) = occur(seqm, Wk) / |Wk|
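The slides do not say exactly how occurrences of a sequence are counted inside a window, so the sketch below makes an explicit ASSUMPTION: an occurrence is counted once per starting tuple, where the first event type appears in that tuple and the remaining event types can be matched in order in strictly later tuples. Each tuple is reduced to its set of event types, and the window contents are hypothetical, loosely modeled on the example slide's event list. The function names are illustrative only.

```python
from typing import List, Sequence, Set


def occurs_from(window: List[Set[str]], start: int, seq: Sequence[str]) -> bool:
    """True if seq[0] is in window[start] and seq[1:] can be matched, in order,
    in strictly later tuples (greedy earliest matching)."""
    if seq[0] not in window[start]:
        return False
    pos = start
    for event in seq[1:]:
        pos += 1
        while pos < len(window) and event not in window[pos]:
            pos += 1
        if pos == len(window):
            return False
    return True


def occur(seq: Sequence[str], window: List[Set[str]]) -> int:
    """ASSUMPTION: occurrences are counted by their starting tuple in the window."""
    return sum(occurs_from(window, i, seq) for i in range(len(window)))


def support(seq: Sequence[str], window: List[Set[str]]) -> float:
    """support(seq, W_k) = occur(seq, W_k) / |W_k|."""
    return occur(seq, window) / len(window)


# Hypothetical 7-tuple window, each tuple reduced to its set of event types
window = [{"e2"}, {"e5"}, {"e1"}, {"e3", "e6"}, {"e7"}, {"e9"}, {"e10"}]
print(occur(["e2", "e7"], window))              # 1
print(round(support(["e2", "e7"], window), 3))  # 0.143
```

Greedy earliest matching is sufficient here: if the remaining event types can be matched in order at all, matching each one as early as possible never rules out a later match.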

6

Stream Data Model – Example

S18 = <Q1, Q2, Q3, Q4, Q5, Q6, Q7, Q8>

S18 = <E2, E5, E1, (E3, E6), E7, E9, E10>

W5 = <Q1, Q2, Q3, Q4, Q5, Q6, Q7 | d = 7>

7

Sliding Stream Viewing Window

ΔWi: incremental window, i = 0, 1, 2, 3, …

ΔW0: initial window; Wi+1 = Wi + ΔWi+1

|ΔW1| / |W0|: incremental ratio of the stream data
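A minimal sketch of the window update Wi+1 = Wi + ΔWi+1 and of the incremental ratio |ΔW1| / |W0|, taking the update formula literally (the increment is appended, so the window grows). Tuples are represented by hypothetical string labels, and the function names slide and incremental_ratio are illustrative, not from the paper.

```python
from typing import List


def slide(window: List[str], increment: List[str]) -> List[str]:
    """W_(i+1) = W_i + ΔW_(i+1): append the incremental window to the current window."""
    return window + increment


def incremental_ratio(initial: List[str], increment: List[str]) -> float:
    """|ΔW_1| / |W_0|: size of the first increment relative to the initial window."""
    return len(increment) / len(initial)


# Hypothetical tuples: an initial window of 20 and an increment of 6
w0 = [f"Q{i}" for i in range(1, 21)]
dw1 = [f"Q{i}" for i in range(21, 27)]
w1 = slide(w0, dw1)
print(len(w1), incremental_ratio(w0, dw1))  # 26 0.3
```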

8

Estimation of the difference between the old and new sequential patterns

Difference:

LWk: old frequent sequences in Wk

LWk+1: new frequent sequences in Wk+1

LWk Δ LWk+1: symmetric difference of the old and new frequent sequences

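$$
d(L_{W_k}, L_{W_{k+1}}) =
\begin{cases}
\dfrac{|L_{W_k} \,\Delta\, L_{W_{k+1}}|}{|L_{W_k}| + |L_{W_{k+1}}|}, & \text{if } L_{W_k} \neq \emptyset,\\
0, & \text{otherwise.}
\end{cases}
$$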
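The difference measure can be computed directly with Python set operations. This sketch assumes d = |LWk Δ LWk+1| / (|LWk| + |LWk+1|) when LWk is non-empty and d = 0 otherwise, with each frequent sequence represented as a tuple of event types (an illustrative choice). The pattern sets in the usage lines are hypothetical.

```python
from typing import Set, Tuple

Pattern = Tuple[str, ...]


def difference(old: Set[Pattern], new: Set[Pattern]) -> float:
    """d = |old Δ new| / (|old| + |new|) if old is non-empty, else 0."""
    if not old:
        return 0.0
    return len(old ^ new) / (len(old) + len(new))  # ^ is set symmetric difference


old = {("a",), ("b",), ("a", "b")}
new = {("a",), ("b",), ("c",)}
print(round(difference(old, new), 3))  # 0.333
```

Under this form d stays in [0, 1]: it is 0 when the old and new pattern sets are identical and 1 when they are disjoint.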

9

The Algorithm of Updating Sequential Patterns (IUS) (1)

The IUS algorithm uses the frequent and negative border sequences in DB and db as the candidates for computing the new frequent sequences and negative border sequences in the updated database U.

DB: the original database, which contains old time-related data.

db: the increment database, which contains new time-related data.

dd: the decrement database, which contains the time-related data deleted from DB.

U: the updated database. When the database is incrementally updated, U = DB + db; when it is decrementally updated, U = DB - dd.

Support(F, X): the support of the sequence F in database X, where X ∈ {db, dd, DB, U}.

Min_supp: minimum support threshold of a frequent sequence.

Min_nbd_supp: minimum support threshold of a negative border sequence.

CX: candidate sequences in database X, where X ∈ {db, dd, DB, U}.

LX: frequent sequences in database X, where X ∈ {db, dd, DB, U}.

NBD(X) = CX - LX, where NBD(X) consists of the sequences in database X whose subsequences are all frequent and whose support is lower than Min_supp but greater than Min_nbd_supp.
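The split of candidates CX into frequent sequences LX and negative border sequences NBD(X) can be sketched as follows, given a precomputed support for every candidate. This is an illustration under assumptions, not the IUS code: each sequence is a tuple of single event types, "frequent" means support at least Min_supp, and the helper names and support values are hypothetical.

```python
from itertools import combinations
from typing import Dict, Set, Tuple

Seq = Tuple[str, ...]


def proper_subsequences(seq: Seq) -> Set[Seq]:
    """All non-empty proper subsequences of seq, with the original order preserved."""
    n = len(seq)
    return {tuple(seq[i] for i in idx)
            for k in range(1, n)
            for idx in combinations(range(n), k)}


def split_candidates(supports: Dict[Seq, float], min_supp: float, min_nbd_supp: float):
    """Return (L_X, NBD(X)): frequent candidates, and infrequent candidates whose
    support lies in (min_nbd_supp, min_supp) and whose subsequences are all frequent."""
    frequent = {s for s, sup in supports.items() if sup >= min_supp}
    border = {s for s, sup in supports.items()
              if min_nbd_supp < sup < min_supp
              and proper_subsequences(s) <= frequent}
    return frequent, border


# Hypothetical candidate supports
supports = {("a",): 0.5, ("b",): 0.4, ("c",): 0.05, ("a", "b"): 0.15, ("b", "c"): 0.15}
L_X, NBD_X = split_candidates(supports, min_supp=0.2, min_nbd_supp=0.1)
print(sorted(L_X))  # [('a',), ('b',)]
print(NBD_X)        # {('a', 'b')}
```

Here ("b", "c") is excluded from the border because its subsequence ("c",) is not frequent, and ("c",) is excluded because its support is not above Min_nbd_supp.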

10

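IUS (2)

Property 1: Let B be a frequent sequence in Wk. If $A \subseteq B$ (A is a subsequence of B), we have $occur(A, DB) \ge occur(B, DB)$.

Property 2: For a sequence $S$ with $S \in L_U$, where $U = DB + db$, we have $S \in L_{DB}$ or $S \in L_{db}$.

Proof: Assume that $S \notin L_{DB}$ and $S \notin L_{db}$, i.e., $occur(S, DB) < Min\_supp \cdot |DB|$ and $occur(S, db) < Min\_supp \cdot |db|$. Adding the two inequalities gives $occur(S, DB + db) < Min\_supp \cdot |DB + db|$, so $Support(S, U) < Min\_supp$, which contradicts $S \in L_U$.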
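Property 2 can be checked by brute force on toy data. The sketch below ASSUMES the classic database model (DB and db are lists of data sequences, and occur(S, X) counts the data sequences in X that contain S as a subsequence), so occurrences in DB + db are the sum of those in DB and db. The data is random and the function names are hypothetical; the check illustrates why frequent sequences of U need only be sought among those of DB and db.

```python
import random
from itertools import product
from typing import List, Sequence, Set, Tuple

Seq = Tuple[str, ...]


def contains(data_seq: Sequence[str], pattern: Seq) -> bool:
    """True if pattern is a (not necessarily contiguous) subsequence of data_seq."""
    it = iter(data_seq)
    return all(event in it for event in pattern)  # 'in' consumes the iterator


def frequent(db: List[List[str]], candidates: List[Seq], min_supp: float) -> Set[Seq]:
    """L_X: candidates whose support occur(S, X) / |X| is at least min_supp."""
    return {c for c in candidates
            if sum(contains(d, c) for d in db) >= min_supp * len(db)}


random.seed(0)
alphabet = "abcd"
candidates = [p for k in (1, 2) for p in product(alphabet, repeat=k)]
DB = [[random.choice(alphabet) for _ in range(4)] for _ in range(30)]
db = [[random.choice(alphabet) for _ in range(4)] for _ in range(10)]

min_supp = 0.4
L_DB, L_db = frequent(DB, candidates, min_supp), frequent(db, candidates, min_supp)
L_U = frequent(DB + db, candidates, min_supp)
# Property 2: every sequence frequent in U = DB + db is frequent in DB or in db
assert L_U <= L_DB | L_db
print(len(L_U), len(L_DB | L_db))
```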

11

IUS – using the stream data model

Wk: the original stream viewing window, which contains old time-related data.

ΔWk+1: the incremental stream viewing window, which contains new time-related data.

Wk+1: the updated stream viewing window. When the stream data are incrementally updated, Wk+1 = Wk + ΔWk+1.

Support(F, X): the support of the sequence F in stream viewing window X, where X ∈ {Wk+1, Wk, ΔWk+1}.

Min_supp: minimum support threshold of a frequent sequence.

Min_nbd_supp: minimum support threshold of a negative border sequence.

CX: candidate sequences in stream viewing window X, where X ∈ {Wk+1, Wk, ΔWk+1}.

LX: frequent sequences in stream viewing window X, where X ∈ {Wk+1, Wk, ΔWk+1}.

NBD(X) = CX - LX, where NBD(X) consists of the sequences in stream viewing window X whose subsequences are all frequent and whose support is lower than Min_supp but greater than Min_nbd_supp. Note that X ∈ {Wk+1, Wk, ΔWk+1}.

12

IUS – Algorithm (1)

13

IUS – Algorithm (2)

14

Tradeoff between Performance and Difference (TPD) (1)

Use the speedup to measure the performance of IUS:

Speedup = execution time of Robust_search / execution time of IUS

Use the difference to measure the change between the old and the new frequent sequences.

Use Min-Max normalization:
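The normalization formula itself is not reproduced on this slide, so the sketch below assumes the standard min-max form v' = (v - min) / (max - min) * (new_max - new_min) + new_min, with a default target range of [0, 1]. It also shows the speedup ratio as defined above. The function names and the sample values are hypothetical.

```python
from typing import List


def speedup(t_robust_search: float, t_ius: float) -> float:
    """Speedup = execution time of Robust_search / execution time of IUS."""
    return t_robust_search / t_ius


def min_max(values: List[float], new_min: float = 0.0, new_max: float = 1.0) -> List[float]:
    """Standard min-max normalization into [new_min, new_max] (assumed form)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [new_min for _ in values]  # constant curve: map everything to new_min
    return [(v - lo) / (hi - lo) * (new_max - new_min) + new_min for v in values]


print(speedup(12.0, 3.0))         # 4.0
print(min_max([2.0, 4.0, 6.0]))   # [0.0, 0.5, 1.0]
```

Normalizing both the speedup values and the difference values this way puts the two curves on the same scale, which is what allows them to be compared in one graph.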

15

TPD (2)

The TPD method plots the speedup curve and the difference curve, both as functions of the size of the incremental window, in the same graph on the same scale.

The intersection points of the two curves give the suitable range of the incremental ratio (relative to the initial window) for IUS.
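Locating the intersection can be sketched as finding where the normalized speedup minus the normalized difference changes sign, interpolating linearly between sampled increment sizes. This assumes both curves are sampled at the same increment sizes; the function name crossing_point and the sample curves are hypothetical and are not the paper's measurements.

```python
from typing import List, Optional


def crossing_point(xs: List[float], speedup_n: List[float], diff_n: List[float]) -> Optional[float]:
    """First x where the normalized speedup and difference curves cross, using linear
    interpolation between sampled points; None if they never meet."""
    gaps = [s - d for s, d in zip(speedup_n, diff_n)]
    for i in range(len(xs) - 1):
        g0, g1 = gaps[i], gaps[i + 1]
        if g0 == 0:
            return xs[i]
        if g0 * g1 < 0:
            # Zero of the linear interpolant of (speedup - difference) on [xs[i], xs[i+1]]
            return xs[i] + (xs[i + 1] - xs[i]) * g0 / (g0 - g1)
    return xs[-1] if gaps and gaps[-1] == 0 else None


# Hypothetical normalized curves sampled at increment sizes 1..4
xs = [1, 2, 3, 4]
print(round(crossing_point(xs, [1.0, 0.6, 0.2, 0.0], [0.0, 0.2, 0.4, 1.0]), 3))  # 2.667
```

Dividing the returned increment size by |W0| gives the incremental ratio; reporting the bracketing interval [xs[i], xs[i+1]] instead of a single point would give a range, as the experiments do.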

16

Experiments

A set of experiments was conducted to find when to update the sequential patterns of stream data.

Environment: DELL PC Server with 2 Pentium II CPUs, 512 MB memory, and a 16 GB disk; operating system: Red Hat Linux 6.0.

Data1: alarms in GSM networks, containing 194 alarm types and 100k alarm events. The times of the alarm events in Data1 range from 2001-08-11-18 to 2001-08-13-17.

17

Experiment 1 – on Data1, |initial window| = 20k

Intersection point: 6K. Suitable range of the incremental ratio of the initial window: 30% of W0.

18

Experiment 2 – on Data1, |initial window| = 40k

Intersection point: 9K~10K. Suitable range of the incremental ratio of the initial window: 22.5%~25% of W0.

19

Experiment 3 – on Data1, |initial window| = 50k

Intersection point: 15K~18K. Suitable range of the incremental ratio of the initial window: 30%~36% of W0.

20

Experiment 4 – on Data1, |initial window| = 60k

Intersection point: 10K~12K. Suitable range of the incremental ratio of the initial window: 16.7%~20% of W0.
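The percentages reported in Experiments 1 to 4 follow from dividing each intersection point by the initial window size. The short check below reproduces that arithmetic using only the numbers on the slides (sizes in thousands, as reported); the helper name ratio_range is hypothetical.

```python
from typing import Tuple


def ratio_range(w0_k: float, lo_k: float, hi_k: float) -> Tuple[float, float]:
    """Intersection range expressed as a percentage of the initial window size |W0|."""
    return 100 * lo_k / w0_k, 100 * hi_k / w0_k


# (|W0|, intersection low, intersection high) in thousands, as reported on the slides
experiments = {1: (20, 6, 6), 2: (40, 9, 10), 3: (50, 15, 18), 4: (60, 10, 12)}
for exp, (w0, lo, hi) in experiments.items():
    low, high = ratio_range(w0, lo, hi)
    print(f"Experiment {exp}: {low:.1f}% to {high:.1f}% of W0")
# Experiment 1: 30.0% to 30.0% of W0
# Experiment 2: 22.5% to 25.0% of W0
# Experiment 3: 30.0% to 36.0% of W0
# Experiment 4: 16.7% to 20.0% of W0
```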

21

Conclusion

Using the TPD method, it is shown experimentally that the suitable incremental ratio for updating is about 20 to 30 percent of the size of the initial window for the IUS algorithm.