BRAID: Stream Mining through Group Lag Correlations
![Page 1: BRAID: Stream Mining through Group Lag Correlations](https://reader035.fdocuments.net/reader035/viewer/2022062321/56812c23550346895d908c11/html5/thumbnails/1.jpg)
BRAID: Stream Mining through Group Lag Correlations
Yasushi Sakurai Spiros Papadimitriou Christos Faloutsos
SIGMOD 2005
![Page 2: BRAID: Stream Mining through Group Lag Correlations](https://reader035.fdocuments.net/reader035/viewer/2022062321/56812c23550346895d908c11/html5/thumbnails/2.jpg)
Introduction
Lag correlations, for example:
Higher amounts of fluoride in water → fewer dental cavities some years later
Goal: monitor multiple numerical streams, determine which pairs are correlated with a lag, and report the value of that lag
![Page 3: BRAID: Stream Mining through Group Lag Correlations](https://reader035.fdocuments.net/reader035/viewer/2022062321/56812c23550346895d908c11/html5/thumbnails/3.jpg)
Introduction
Given $k$ numerical sequences $X_1, \ldots, X_k$, report all pairs $X_i$ and $X_j$ in which $X_i$ follows $X_j$ with lag $l$
![Page 4: BRAID: Stream Mining through Group Lag Correlations](https://reader035.fdocuments.net/reader035/viewer/2022062321/56812c23550346895d908c11/html5/thumbnails/4.jpg)
Introduction
![Page 5: BRAID: Stream Mining through Group Lag Correlations](https://reader035.fdocuments.net/reader035/viewer/2022062321/56812c23550346895d908c11/html5/thumbnails/5.jpg)
Introduction
This paper proposes BRAID, which handles data streams of semi-infinite length and offers:
Any-time processing, and fast
Nimble
Accurate
Small resource consumption
![Page 6: BRAID: Stream Mining through Group Lag Correlations](https://reader035.fdocuments.net/reader035/viewer/2022062321/56812c23550346895d908c11/html5/thumbnails/6.jpg)
Proposed method
Data stream $X$: $\{x_1, \ldots, x_t, \ldots, x_n\}$, where $x_n$ is the most recent value
$R(0)$: correlation of $X$ and $Y$, which have the same length $n$ and zero lag
Correlation coefficient $\rho$:
$$\rho = \frac{\sum_{t=1}^{n} (x_t - \bar{x})(y_t - \bar{y})}{\sqrt{\sum_{t=1}^{n} (x_t - \bar{x})^2}\,\sqrt{\sum_{t=1}^{n} (y_t - \bar{y})^2}}$$
![Page 7: BRAID: Stream Mining through Group Lag Correlations](https://reader035.fdocuments.net/reader035/viewer/2022062321/56812c23550346895d908c11/html5/thumbnails/7.jpg)
Proposed method
For lag $l$, consider only the common part of $X$ and the shifted $Y$, which spans just $n - l$ time ticks
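The overlap described above can be sketched in Python. This is a minimal illustration, not the authors' code: the function name `lag_correlation` and its zero-variance fallback are assumptions made here. It pairs $x_t$ with $y_{t-l}$ and computes the ordinary correlation coefficient over the $n - l$ overlapping ticks.

```python
import math


def lag_correlation(x, y, lag):
    """Correlation of x and y over their overlap when x is delayed by `lag`.

    Pairs x[t] with y[t - lag], so only len(x) - lag ticks take part.
    """
    n = len(x)
    if len(y) != n:
        raise ValueError("x and y must have the same length")
    if not 0 <= lag <= n - 2:
        raise ValueError("lag must leave at least two overlapping ticks")

    xs = x[lag:]          # x_{l+1}, ..., x_n
    ys = y[:n - lag]      # y_1, ..., y_{n-l}
    m = n - lag           # number of overlapping ticks

    mean_x = sum(xs) / m
    mean_y = sum(ys) / m
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys))
    var_x = sum((a - mean_x) ** 2 for a in xs)
    var_y = sum((b - mean_y) ** 2 for b in ys)

    # Assumption: a constant segment is reported as uncorrelated.
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)
```

With `lag=0` the function reduces to the zero-lag correlation of the two full sequences.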
![Page 8: BRAID: Stream Mining through Group Lag Correlations](https://reader035.fdocuments.net/reader035/viewer/2022062321/56812c23550346895d908c11/html5/thumbnails/8.jpg)
Proposed method
![Page 9: BRAID: Stream Mining through Group Lag Correlations](https://reader035.fdocuments.net/reader035/viewer/2022062321/56812c23550346895d908c11/html5/thumbnails/9.jpg)
Proposed method
$R(l)$: correlation coefficient when $X$ is delayed by $l$
Score at lag $l$: $\mathrm{score}(l) = |R(l)|$
![Page 10: BRAID: Stream Mining through Group Lag Correlations](https://reader035.fdocuments.net/reader035/viewer/2022062321/56812c23550346895d908c11/html5/thumbnails/10.jpg)
Proposed method
For a large lag $l \approx n$, the original and shifted time sequences have too few overlapping values for $R(l)$ to be meaningful
Restrict the maximum lag $m$ to $n/2$
![Page 11: BRAID: Stream Mining through Group Lag Correlations](https://reader035.fdocuments.net/reader035/viewer/2022062321/56812c23550346895d908c11/html5/thumbnails/11.jpg)
Proposed method
Naive solution:
At time $n$, access all values of $X$ and $Y$ and compute $R(l)$ for every lag $l$ (= 0, 1, …)
Choose the earliest maximum score above the threshold $r$, or report no lag
The proposed solution is based on three major steps
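The naive procedure can be sketched as follows. The names `corr` and `naive_lag` are illustrative. Two assumptions are made here: the score is taken to be $|R(l)|$, and "earliest maximum" is read as the smallest lag whose score is a local maximum above the threshold. The maximum lag is capped at $n/2$ as on the earlier slide.

```python
import math


def corr(xs, ys):
    """Plain correlation coefficient of two equal-length lists."""
    m = len(xs)
    mean_x, mean_y = sum(xs) / m, sum(ys) / m
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys))
    var_x = sum((a - mean_x) ** 2 for a in xs)
    var_y = sum((b - mean_y) ** 2 for b in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def naive_lag(x, y, threshold):
    """Return the earliest lag with a local-maximum score above `threshold`.

    Recomputes R(l) from scratch for every lag 0..n/2, so each call
    costs O(n^2) time and needs all of x and y in memory.
    """
    n = len(x)
    max_lag = n // 2
    # Assumed score definition: absolute value of R(l).
    scores = [abs(corr(x[l:], y[:n - l])) for l in range(max_lag + 1)]

    for l, s in enumerate(scores):
        left = scores[l - 1] if l > 0 else -1.0
        right = scores[l + 1] if l < max_lag else -1.0
        if s > threshold and s >= left and s >= right:
            return l
    return None  # no lag correlation found
```

The quadratic cost per query and the need to store both sequences in full are what the rest of the method removes.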
![Page 12: BRAID: Stream Mining through Group Lag Correlations](https://reader035.fdocuments.net/reader035/viewer/2022062321/56812c23550346895d908c11/html5/thumbnails/12.jpg)
Proposed method
Some sufficient statistics are needed so that $R$ can be computed easily:
$S_x(1,n) = \sum_{t=1}^{n} x_t$: sum of $X$ over length $n$
$S_{xx}(1,n) = \sum_{t=1}^{n} x_t^2$: sum of squares of $X$ over length $n$
$S_{xy}(l) = \sum_{t=l+1}^{n} x_t\, y_{t-l}$: inner product of $X$ and $Y$ shifted by lag $l$
![Page 13: BRAID: Stream Mining through Group Lag Correlations](https://reader035.fdocuments.net/reader035/viewer/2022062321/56812c23550346895d908c11/html5/thumbnails/13.jpg)
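Proposed method
$R(l)$ is obtained from the sufficient statistics:
$$R(l) = \frac{S_{xy}(l) - \dfrac{S_x(l+1,n)\,S_y(1,n-l)}{n-l}}{\sqrt{S_{xx}(l+1,n) - \dfrac{S_x(l+1,n)^2}{n-l}}\;\sqrt{S_{yy}(1,n-l) - \dfrac{S_y(1,n-l)^2}{n-l}}}$$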
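A sketch of how these statistics could be maintained incrementally for one fixed lag is given below. The class `LagStats` and its method names are hypothetical, not taken from the paper. It keeps the five running sums (sum and sum of squares of each sequence over the overlap, plus the inner product) and uses the identities covariance = $S_{xy} - S_x S_y / m$ and variance = $S_{xx} - S_x^2 / m$ with $m = n - l$. Note that it must buffer the last `lag` values of `y`, which is the space cost the later slides address.

```python
import math
from collections import deque


class LagStats:
    """Running sufficient statistics for R(l) at one fixed lag l."""

    def __init__(self, lag):
        self.lag = lag
        self.m = 0                      # overlapping ticks seen so far (n - l)
        self.sx = self.sxx = 0.0        # sum and sum of squares of x over the overlap
        self.sy = self.syy = 0.0        # sum and sum of squares of y over the overlap
        self.sxy = 0.0                  # inner product of x_t and y_{t-l}
        self.recent_y = deque(maxlen=lag + 1)  # sliding window of y

    def update(self, x_new, y_new):
        """Consume one new tick (x_n, y_n) in O(1) time."""
        self.recent_y.append(y_new)
        if len(self.recent_y) <= self.lag:
            return                      # y_{n-l} does not exist yet
        y_old = self.recent_y[0]        # y_{n-l}, the partner of x_n
        self.m += 1
        self.sx += x_new
        self.sxx += x_new * x_new
        self.sy += y_old
        self.syy += y_old * y_old
        self.sxy += x_new * y_old

    def correlation(self):
        """R(l) from the five sums; 0.0 if it is undefined."""
        if self.m < 2:
            return 0.0
        cov = self.sxy - self.sx * self.sy / self.m
        var_x = self.sxx - self.sx ** 2 / self.m
        var_y = self.syy - self.sy ** 2 / self.m
        if var_x <= 0 or var_y <= 0:
            return 0.0
        return cov / math.sqrt(var_x * var_y)
```

Feeding every tick through `update` and calling `correlation` on demand gives the any-time behaviour for that one lag without rescanning the streams.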
![Page 14: BRAID: Stream Mining through Group Lag Correlations](https://reader035.fdocuments.net/reader035/viewer/2022062321/56812c23550346895d908c11/html5/thumbnails/14.jpg)
Proposed method
$R(l)$ can be estimated at any point in time; we only need to keep track of five sufficient statistics
However, computing the full cross-correlation function between two sequences still takes time linear in $n$
![Page 15: BRAID: Stream Mining through Group Lag Correlations](https://reader035.fdocuments.net/reader035/viewer/2022062321/56812c23550346895d908c11/html5/thumbnails/15.jpg)
Proposed method
We propose to keep track of only a geometric progression of lag values: $l = 0, 1, 2, \ldots, 2^i, \ldots$
Only $O(\log n)$ numbers need to be tracked, instead of the $O(n)$ that the naive solution requires
However, the space required still grows linearly with the length $n$
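To make the saving concrete, the short sketch below lists the lags that a geometric progression would probe. The function name `geometric_lags` is illustrative, and capping the lag at $n/2$ follows the restriction stated on an earlier slide.

```python
def geometric_lags(n):
    """Lags 0, 1, 2, 4, ..., 2^i that do not exceed the maximum lag n/2."""
    max_lag = n // 2
    lags = [0]
    lag = 1
    while lag <= max_lag:
        lags.append(lag)
        lag *= 2
    return lags


print(geometric_lags(64))            # [0, 1, 2, 4, 8, 16, 32]
print(len(geometric_lags(2 ** 20)))  # 21 lags instead of about half a million
```

The number of probed lags grows with $\log_2 n$, but each probed lag $l$ still needs the last $l$ values of the stream, which is why space remains linear at this stage.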
![Page 16: BRAID: Stream Mining through Group Lag Correlations](https://reader035.fdocuments.net/reader035/viewer/2022062321/56812c23550346895d908c11/html5/thumbnails/16.jpg)
Proposed method
To compute $R(l)$ at any time, we must keep a sliding window of size $l$; with $m = n/2$ this needs $O(n)$ space
Instead of operating only on the original time sequences, also compute smoothed versions of them by averaging over non-overlapping windows
![Page 17: BRAID: Stream Mining through Group Lag Correlations](https://reader035.fdocuments.net/reader035/viewer/2022062321/56812c23550346895d908c11/html5/thumbnails/17.jpg)
Proposed method
Window size: powers of $g = 2$
$X$: original time sequence
$A_x^h$: smoothed version with windows of length $2^h$
$A_x^0$ is the original sequence, $A_x^1$ consists of $n/2$ ticks, etc.
The sufficient statistics of $A_x^h$ need to be computed only every $2^h$ time ticks
At time $n$, $O(\log n)$ levels are needed; the sufficient statistics are computed for each level
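One way to maintain these smoothed levels on a stream is sketched below. The class `SmoothingLevels` is a hypothetical helper, not the authors' implementation. It relies on the fact that the average of a window of length $2^{h+1}$ equals the mean of the two consecutive level-$h$ averages it contains, so each level only has to remember one pending value.

```python
class SmoothingLevels:
    """Streaming averages over non-overlapping windows of length 2^h."""

    def __init__(self):
        # pending[h] holds the first half of the next level-(h+1) window,
        # or None while that window has not started.
        self.pending = []

    def update(self, value):
        """Consume one tick; return the (level, average) pairs completed now."""
        completed = []
        h = 0
        while True:
            completed.append((h, value))      # a level-h window just closed
            if h == len(self.pending):
                self.pending.append(None)     # first time this level is reached
            if self.pending[h] is None:
                self.pending[h] = value       # wait for the second half
                break
            # Two level-h averages are ready: merge them into level h+1.
            value = (self.pending[h] + value) / 2.0
            self.pending[h] = None
            h += 1
        return completed
```

Level $h$ produces a new value only every $2^h$ ticks, and after $n$ ticks only about $\log_2 n$ pending values are stored, which matches the level count stated on the slide.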
![Page 18: BRAID: Stream Mining through Group Lag Correlations](https://reader035.fdocuments.net/reader035/viewer/2022062321/56812c23550346895d908c11/html5/thumbnails/18.jpg)
![Page 19: BRAID: Stream Mining through Group Lag Correlations](https://reader035.fdocuments.net/reader035/viewer/2022062321/56812c23550346895d908c11/html5/thumbnails/19.jpg)
Proposed method
In contrast with small lags, the larger ones are probed only sparsely
Use a cubic spline to interpolate the missing correlation coefficients
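The interpolation step can be illustrated with a small, dependency-free cubic spline. The slides do not say which boundary conditions are used; the sketch below assumes a natural spline (zero second derivative at both ends), and the function name `natural_cubic_spline` is illustrative. The knots would be the probed lags and the values the correlation coefficients measured there.

```python
import bisect


def natural_cubic_spline(xs, ys):
    """Return a function interpolating (xs, ys) with a natural cubic spline.

    xs must be strictly increasing and contain at least two points.
    """
    n = len(xs) - 1
    if n < 1 or len(ys) != len(xs):
        raise ValueError("need at least two points and matching lengths")

    h = [xs[i + 1] - xs[i] for i in range(n)]
    alpha = [0.0] * (n + 1)
    for i in range(1, n):
        alpha[i] = 3.0 * (ys[i + 1] - ys[i]) / h[i] - 3.0 * (ys[i] - ys[i - 1]) / h[i - 1]

    # Forward sweep of the tridiagonal system for the second-order coefficients.
    diag = [1.0] * (n + 1)
    mu = [0.0] * (n + 1)
    z = [0.0] * (n + 1)
    for i in range(1, n):
        diag[i] = 2.0 * (xs[i + 1] - xs[i - 1]) - h[i - 1] * mu[i - 1]
        mu[i] = h[i] / diag[i]
        z[i] = (alpha[i] - h[i - 1] * z[i - 1]) / diag[i]

    # Back substitution; c[0] and c[n] stay 0 (natural boundary).
    b = [0.0] * n
    c = [0.0] * (n + 1)
    d = [0.0] * n
    for j in range(n - 1, -1, -1):
        c[j] = z[j] - mu[j] * c[j + 1]
        b[j] = (ys[j + 1] - ys[j]) / h[j] - h[j] * (c[j + 1] + 2.0 * c[j]) / 3.0
        d[j] = (c[j + 1] - c[j]) / (3.0 * h[j])

    def evaluate(x):
        # Pick the segment containing x, clamping to the first/last segment.
        j = max(0, min(n - 1, bisect.bisect_right(xs, x) - 1))
        dx = x - xs[j]
        return ys[j] + dx * (b[j] + dx * (c[j] + dx * d[j]))

    return evaluate
```

For example, `natural_cubic_spline([0, 1, 2, 4, 8, 16], scores)` yields a function that can be evaluated at the lags in between, such as 3, 5, or 12, to estimate correlation coefficients that were never measured directly.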
![Page 20: BRAID: Stream Mining through Group Lag Correlations](https://reader035.fdocuments.net/reader035/viewer/2022062321/56812c23550346895d908c11/html5/thumbnails/20.jpg)
Proposed method
$A_x^h(t)$: window average at time tick $t$ for level $h$
$A_x^0(t) \equiv x_t$
![Page 21: BRAID: Stream Mining through Group Lag Correlations](https://reader035.fdocuments.net/reader035/viewer/2022062321/56812c23550346895d908c11/html5/thumbnails/21.jpg)
Proposed method
Sufficient statistics:
![Page 22: BRAID: Stream Mining through Group Lag Correlations](https://reader035.fdocuments.net/reader035/viewer/2022062321/56812c23550346895d908c11/html5/thumbnails/22.jpg)
![Page 23: BRAID: Stream Mining through Group Lag Correlations](https://reader035.fdocuments.net/reader035/viewer/2022062321/56812c23550346895d908c11/html5/thumbnails/23.jpg)
Enhanced BRAID
For two sequences of size $\approx 2^{20}$, BRAID requires about $5 \cdot \log_2 2^{20} = 5 \cdot 20 = 100$ floating-point numbers, about 800 bytes
When more memory is available, we propose probing more lags while still using $O(\log n)$ space
Use a mix of arithmetic and geometric probing
![Page 24: BRAID: Stream Mining through Group Lag Correlations](https://reader035.fdocuments.net/reader035/viewer/2022062321/56812c23550346895d908c11/html5/thumbnails/24.jpg)
Enhanced BRAID
BRAID uses only one window at each smoothing level
We propose using $b > 1$ windows instead, e.g. $b = 4$
The algorithm described before is the special case $b = 1$; the exception is the bottom level, which has $2b$ coefficients
While computing $R(l)$, use a mixture of geometric and arithmetic progressions:
![Page 25: BRAID: Stream Mining through Group Lag Correlations](https://reader035.fdocuments.net/reader035/viewer/2022062321/56812c23550346895d908c11/html5/thumbnails/25.jpg)
Enhanced BRAID
Example of enhanced BRAID with $b = 4$
With $b = 1$, the enhanced algorithm is identical to the algorithm described before
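The exact probing formula did not survive in this transcript, so the following is only one plausible reading of "a mixture of geometric and arithmetic progressions". It is an assumption that the bottom level probes the $2b$ lags $0, \ldots, 2b-1$ and that each level $h \ge 1$ probes the $b$ lags $b \cdot 2^h, (b+1) \cdot 2^h, \ldots, (2b-1) \cdot 2^h$. The function name `enhanced_lags` is illustrative.

```python
def enhanced_lags(b, max_lag):
    """Lags probed with b windows per level (assumed scheme, see lead-in)."""
    # Bottom level: 2b consecutive lags (arithmetic progression with step 1).
    lags = list(range(min(2 * b, max_lag + 1)))
    h = 1
    # Higher levels: b lags per level, spaced 2^h apart, starting at b * 2^h.
    while b * 2 ** h <= max_lag:
        step = 2 ** h
        for i in range(b, 2 * b):
            lag = i * step
            if lag <= max_lag:
                lags.append(lag)
        h += 1
    return lags


print(enhanced_lags(1, 32))  # [0, 1, 2, 4, 8, 16, 32]
print(enhanced_lags(4, 32))  # [0, 1, ..., 7, 8, 10, 12, 14, 16, 20, 24, 28, 32]
```

Under this reading, `b = 1` reproduces the purely geometric lags of the basic method, while larger `b` fills in more lags at every level and still keeps the count proportional to $b \log n$.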
![Page 26: BRAID: Stream Mining through Group Lag Correlations](https://reader035.fdocuments.net/reader035/viewer/2022062321/56812c23550346895d908c11/html5/thumbnails/26.jpg)
![Page 27: BRAID: Stream Mining through Group Lag Correlations](https://reader035.fdocuments.net/reader035/viewer/2022062321/56812c23550346895d908c11/html5/thumbnails/27.jpg)
![Page 28: BRAID: Stream Mining through Group Lag Correlations](https://reader035.fdocuments.net/reader035/viewer/2022062321/56812c23550346895d908c11/html5/thumbnails/28.jpg)
![Page 29: BRAID: Stream Mining through Group Lag Correlations](https://reader035.fdocuments.net/reader035/viewer/2022062321/56812c23550346895d908c11/html5/thumbnails/29.jpg)
![Page 30: BRAID: Stream Mining through Group Lag Correlations](https://reader035.fdocuments.net/reader035/viewer/2022062321/56812c23550346895d908c11/html5/thumbnails/30.jpg)
![Page 31: BRAID: Stream Mining through Group Lag Correlations](https://reader035.fdocuments.net/reader035/viewer/2022062321/56812c23550346895d908c11/html5/thumbnails/31.jpg)
Conclusion
BRAID is proposed to detect lag correlations on streaming data:
At any time
Low resource consumption
High accuracy
![Page 32: BRAID: Stream Mining through Group Lag Correlations](https://reader035.fdocuments.net/reader035/viewer/2022062321/56812c23550346895d908c11/html5/thumbnails/32.jpg)
Thank you very much~