An Introduction to Change-Point Detection

Joseph L. Hellerstein
T.J. Watson Research Center, IBM Research
Hawthorne, New York

Fan Zhang
Department of Industrial Engineering and Operations Research
Columbia University, New York, New York

June 1998

Hellerstein and Zhang
Background and Motivations
• Most analysis and control assumes stationary stochastic processes: no change in
  - Mean
  - Variance
  - Covariances
• Bad things can happen to good processes
  - A router can fail in a network
  - A conveyor belt can stop on an assembly line
  - A bank can fail in an economy
• Need to determine when process parameters have changed in order to
  - Correct the process
  - Change control parameters
Mainframe Data
Web Server Data
[Figure: nine time-series panels over 24 hours (x-axis: time (hr)), one per web server metric: usr, sys, pdb, mdb, Ipkt/s, Opkts, Coll%, tcpIn/s, tcpOut/s]
Types of Change-Point Detection
• Off-line: Data are presented en masse
  - Identify stationary intervals
• On-line: Data are presented serially
  - Detect when the parameters of the process change
Outline
• Hypothesis testing and statistical background
• Off-line tests
• Theory for on-line tests
• On-line tests
• Practical considerations
• References
Hypothesis Testing
• Test assertions about parameters of a process (e.g., mean, variance, covariance)
  - $H_0$ (null hypothesis): normal situation (e.g., mean response time is 1 second)
  - $H_1$ (alternate hypothesis): abnormal situation (e.g., mean response time is [?] seconds)
Components of a Statistical Test
• $T$: a test statistic that is computed from the data
$$T(y) = f(y_1, \ldots, y_N)$$
• $d(T) \in \{0, 1\}$: a decision function that determines if the test statistic is within an acceptable range
  - 0: okay
  - 1: raise an alarm
• Observation: $d$ classifies values of $y$
Examples of Test Components
• Test statistics
  - $T(y) = \bar{y}$
  - $T(y) = \sum_i (y_i - \bar{y})^2$
• Decision functions: use a critical value (upper or lower limit)
$$d(T) = \begin{cases} 1 & \text{if } T < T_{LC} \\ 0 & \text{otherwise} \end{cases}$$
$$d(T) = \begin{cases} 1 & \text{if } T > T_{HC} \\ 0 & \text{otherwise} \end{cases}$$
$$d(T) = \begin{cases} 1 & \text{if } T < T_{LC} \text{ or } T > T_{HC} \\ 0 & \text{otherwise} \end{cases}$$
• Mixed test: $d(T) \in [0, 1]$
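The two test statistics and the two-sided decision function above can be sketched in a few lines of Python. This is only an illustration: the function names and the critical values in the usage lines are assumptions chosen for the example, not values from the slides.

```python
from statistics import fmean

def sample_mean(y):
    """Test statistic T(y) = y-bar."""
    return fmean(y)

def sum_sq_dev(y):
    """Test statistic T(y) = sum_i (y_i - y-bar)^2."""
    ybar = fmean(y)
    return sum((yi - ybar) ** 2 for yi in y)

def two_sided_decision(t, t_lc, t_hc):
    """d(T) = 1 (raise an alarm) if T < T_LC or T > T_HC, else 0 (okay)."""
    return 1 if (t < t_lc or t > t_hc) else 0

y = [1.0, 2.0, 3.0, 4.0]
print(sample_mean(y), sum_sq_dev(y))                  # 2.5 5.0
print(two_sided_decision(sample_mean(y), 0.5, 2.0))   # 1
```

The one-sided decision functions are the same check with only one of the two comparisons.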
Outcomes of Tests
• Raise an alarm if $d(T) = 1$

                   No Alarm         Alarm
  $H_0$ is true    OK               false positive
  $H_1$ is true    false negative   OK
Critical Regions
• Set of $y$ values for which $H_0$ is rejected
• Denoted by $C$
$$P(\text{false positive}) = \alpha = P(y \in C \mid H_0)$$
$$P(\text{false negative}) = \beta = P(y \in \bar{C} \mid H_1)$$
where $\bar{C}$ is the complement of $C$.
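As a worked example of these two probabilities, the sketch below assumes a single Gaussian observation with known standard deviation and a one-sided critical region $C = \{y > c\}$. The region shape, the function name, and the numbers are assumptions made for illustration.

```python
from statistics import NormalDist

def error_probabilities(mu0, mu1, sigma, c):
    """Return (alpha, beta) for the critical region C = {y > c}.

    alpha = P(y in C | H0), beta = P(y not in C | H1),
    with y ~ N(mu0, sigma^2) under H0 and y ~ N(mu1, sigma^2) under H1.
    """
    alpha = 1.0 - NormalDist(mu0, sigma).cdf(c)
    beta = NormalDist(mu1, sigma).cdf(c)
    return alpha, beta

alpha, beta = error_probabilities(mu0=0.0, mu1=3.0, sigma=1.0, c=1.5)
print(round(alpha, 4), round(beta, 4))   # both about 0.0668
```

Moving `c` to the right lowers `alpha` and raises `beta`, which is the trade-off a test design has to settle.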
Critical Regions
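Test Design
• Objective: select a test $\varphi$ that minimizes $\beta$, subject to the constraint that $\alpha$ is not too large.
• Power of a test provides a succinct way of expressing this objective
$$\pi_\varphi(\theta) = P(\varphi \text{ rejects } H_0 \mid \theta)$$
• Note that
$$\pi_\varphi(\theta) = \begin{cases} \alpha & \text{if } \theta \in H_0 \\ 1 - \beta & \text{if } \theta \in H_1 \end{cases}$$
• Ideal test
$$\pi_\varphi(\theta) = \begin{cases} 0 & \text{if } \theta \in H_0 \\ 1 & \text{if } \theta \in H_1 \end{cases}$$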
Notes
• Can always minimize $\alpha$ or $\beta$ separately by having a deterministic outcome to the test
Likelihood Function
• Transformation of the data
• Used in test statistics
• Indicates the probability (or density) of the data if the distribution is known
$$L(y) = \begin{cases} P(y \mid \theta) & \text{if } y \text{ is discrete} \\ f(y \mid \theta) & \text{if } y \text{ is continuous} \end{cases}$$
• Example: Normal distribution with mean $\mu$ and variance $\sigma^2$
$$L(y) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(y - \mu)^2}{2\sigma^2}\right)$$
where $H_0$ is specified in terms of $\mu$ and $\sigma^2$
• If observations are i.i.d.
$$L(y_1, \ldots, y_N) = L(y_1) \cdots L(y_N)$$
• For the normal, this means
$$L(y_1, \ldots, y_N) = \left(\frac{1}{\sigma\sqrt{2\pi}}\right)^N \exp\left(-\sum_{i=1}^{N} \frac{(y_i - \mu)^2}{2\sigma^2}\right)$$
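A minimal sketch of the i.i.d. normal likelihood, computed on the log scale because the product of many densities quickly becomes a very small number. The sample size, the seed, and the two candidate means are assumptions for illustration.

```python
import math
import random

def normal_log_likelihood(y, mu, sigma):
    """ln L(y_1, ..., y_N) for i.i.d. N(mu, sigma^2) observations."""
    n = len(y)
    return (-n * math.log(sigma * math.sqrt(2.0 * math.pi))
            - sum((yi - mu) ** 2 for yi in y) / (2.0 * sigma ** 2))

rng = random.Random(0)
y = [rng.gauss(0.0, 1.0) for _ in range(30)]      # data drawn from N(0, 1)

ll_correct = normal_log_likelihood(y, 0.0, 1.0)   # evaluated under N(0, 1)
ll_wrong = normal_log_likelihood(y, 3.0, 1.0)     # evaluated under N(3, 1)
# Dividing by ln(10) gives the base-10 exponent of the likelihood itself.
print(ll_correct / math.log(10), ll_wrong / math.log(10))
```

The likelihood under the distribution that generated the data is many orders of magnitude larger than under the wrong one.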
Likelihood Function For Correct
[Figure: N(0,1) likelihood values for an N(0,1) random variable; $L(y_1, \ldots, y_N)$ is approximately $10^{-21}$]
Likelihood Function For Incorrect
[Figure: N(0,1) likelihood values for an N(3,1) random variable; $L(y_1, \ldots, y_N)$ is approximately $10^{-72}$]
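Likelihood Ratio
• Indicates the relative probability (or density) of obtaining the data
$$\frac{L_1(y_i)}{L_0(y_i)}$$
• Often use the log of the likelihood ratio
$$s_i = \ln \frac{L_1(y_i)}{L_0(y_i)}$$
• Example: $N(\mu_0, \sigma^2)$ and $N(\mu_1, \sigma^2)$
$$s_i = \frac{1}{2\sigma^2}\left[(y_i - \mu_0)^2 - (y_i - \mu_1)^2\right] = \frac{v}{\sigma^2}(y_i - \mu_0) - \frac{v^2}{2\sigma^2} = \frac{b}{\sigma}\left(y_i - \mu_0 - \frac{v}{2}\right)$$
  - $v = \mu_1 - \mu_0$ is the change in magnitude
  - $b = v/\sigma$ is the signal-to-noise ratio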
Observations About Likelihood Ratios
Consider Gaussian $y_i$
• $s_i$ is Gaussian, since a linear combination of Gaussians is Gaussian
• If $\theta \in H_0$, then $E(s_i) < 0$
• If $\theta \in H_1$, then $E(s_i) > 0$
• If $H_0 = H_1$, then $E(s_i) = 0$
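Notes
• Under $H_0$, $E(s_i) < 0$ follows from
$$E\left[\frac{v}{\sigma^2}(y_i - \mu_0)\right] - \frac{v^2}{2\sigma^2} = 0 - \frac{v^2}{2\sigma^2} = -\frac{v^2}{2\sigma^2}$$
• Under $H_1$, $E(s_i) > 0$ follows from
$$E\left[\frac{v}{\sigma^2}(y_i - \mu_0)\right] - \frac{v^2}{2\sigma^2} = \frac{v^2}{\sigma^2} - \frac{v^2}{2\sigma^2} = \frac{v^2}{2\sigma^2}$$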
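The sign of $E(s_i)$ can be checked empirically. The sketch below simulates the Gaussian log-likelihood ratio under each hypothesis; the parameter values, sample size, and seed are assumptions chosen for illustration.

```python
import random
from statistics import fmean

def log_likelihood_ratio(y_i, mu0, mu1, sigma):
    """s_i = (v / sigma^2) * (y_i - mu0 - v/2), with v = mu1 - mu0."""
    v = mu1 - mu0
    return (v / sigma ** 2) * (y_i - mu0 - v / 2.0)

mu0, mu1, sigma = 0.0, 1.0, 1.0
rng = random.Random(1)
s_h0 = [log_likelihood_ratio(rng.gauss(mu0, sigma), mu0, mu1, sigma)
        for _ in range(10000)]
s_h1 = [log_likelihood_ratio(rng.gauss(mu1, sigma), mu0, mu1, sigma)
        for _ in range(10000)]

# Theory: the means are -v^2/(2 sigma^2) = -0.5 and +v^2/(2 sigma^2) = +0.5.
print(fmean(s_h0), fmean(s_h1))
```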
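Most Powerful Test
• Given $\theta_0$ and $\theta_1$ corresponding to $H_0$ and $H_1$
• Definition: $\varphi^*$ is a most powerful test iff
  - For all $\varphi$ such that $P(\varphi \text{ rejects } H_0 \mid \theta_0) \le P(\varphi^* \text{ rejects } H_0 \mid \theta_0)$
  - Then $P(\varphi^* \text{ rejects } H_0 \mid \theta_1) \ge P(\varphi \text{ rejects } H_0 \mid \theta_1)$
• Intuition for constructing $\varphi^*$: place first into the critical region those $y$ that have
  - the lowest probability under $H_0$
  - the highest probability under $H_1$
• Neyman-Pearson Lemma
  - $\varphi^*$ is a most powerful test if it is constructed as follows
$$y \in C \iff \frac{L_1(y)}{L_0(y)} \ge h$$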
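For two Gaussian hypotheses with a common variance, the Neyman-Pearson construction reduces to a threshold on $y$ itself. The sketch below assumes $\mu_1 > \mu_0$ and a single observation; the function names and the numbers are hypothetical.

```python
import math
from statistics import NormalDist

def likelihood_ratio(y, mu0, mu1, sigma):
    """L_1(y) / L_0(y) for N(mu1, sigma^2) against N(mu0, sigma^2)."""
    return NormalDist(mu1, sigma).pdf(y) / NormalDist(mu0, sigma).pdf(y)

def in_critical_region(y, mu0, mu1, sigma, h):
    """Neyman-Pearson rule: y is in C iff L_1(y) / L_0(y) >= h."""
    return likelihood_ratio(y, mu0, mu1, sigma) >= h

def y_threshold(mu0, mu1, sigma, h):
    """For mu1 > mu0, the rule above is equivalent to y >= this value."""
    v = mu1 - mu0
    return mu0 + v / 2.0 + sigma ** 2 * math.log(h) / v

print(y_threshold(0.0, 2.0, 1.0, 1.0))               # 1.0
print(in_critical_region(1.2, 0.0, 2.0, 1.0, 1.0))   # True
print(in_critical_region(0.8, 0.0, 2.0, 1.0, 1.0))   # False
```

Raising `h` moves the threshold to the right, shrinking the critical region and with it the false positive probability.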
Off-Line Tests
• View as constrained clustering
• Want homogeneous clusters
• Choose change points such that
  - Variance within a cluster is smaller than variance between clusters
• Assumes that only the mean changes
Example of Partitioning
[Figure: A 3-Partitioning. Time series of 30 observations (y-axis: -1 to 6) divided into four segments by three change points.]

  Segment    Mean   ASQ
  [1..5]     0.48   0.95
  [6..14]    2.05   1.72
  [15..19]   1.09   1.31
  [20..30]   3.83   5.20
An Approach to Off-Line Change-Point Detection
• Perspective: Locating change-points is equivalent to finding the optimal way to partition time-serial data
  - Homogeneous within a partition
  - Heterogeneous between partitions
• A range of indices is indicated by $[m..n]$ for $1 \le m \le n \le N$
• Detecting $k$ change-points results in a $k$-partitioning
  - $P = (P_1, \ldots, P_k)$, $1 < P_1 < P_2 < \cdots < P_k \le N$
• Approach is due to W.D. Fisher
Definitions
• Mean of a range of observations
$$\bar{y}[m..n] = \frac{y_m + \cdots + y_n}{n - m + 1}$$
• Adjusted sum of squares (degree of homogeneity) within a partition
$$ASQ[m..n] = \sum_{j=m}^{n} \left(y_j - \bar{y}[m..n]\right)^2$$
• Figure of merit for the change points identified is
$$D_P = ASQ[P_k..N] + \sum_{j=0}^{k-1} ASQ[P_j..P_{j+1} - 1], \quad \text{with } P_0 = 1$$
• $P^*$ is an optimal $k$-partitioning if there is no $k$-partitioning $P$ such that $D_P < D_{P^*}$
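The definitions above translate directly into code. This sketch keeps the 1-based, inclusive index ranges $[m..n]$ used on the slide; the function names and the toy data are assumptions for illustration.

```python
def asq(y, m, n):
    """Adjusted sum of squares ASQ[m..n] (1-based, inclusive indices)."""
    seg = y[m - 1:n]
    mean = sum(seg) / len(seg)
    return sum((v - mean) ** 2 for v in seg)

def figure_of_merit(y, change_points):
    """D_P for change points P = (P_1, ..., P_k), 1 < P_1 < ... < P_k <= N.

    Each segment starts at a change point (or at index 1) and ends just
    before the next change point (or at N).
    """
    starts = [1] + list(change_points)
    ends = [p - 1 for p in change_points] + [len(y)]
    return sum(asq(y, s, e) for s, e in zip(starts, ends))

y = [0.0, 0.0, 0.0, 5.0, 5.0, 5.0]
print(figure_of_merit(y, []))     # 37.5, no change points
print(figure_of_merit(y, [4]))    # 0.0, a change point at index 4
```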
Observations
• The computational complexity of finding an optimal $k$-partitioning is
$$\binom{N}{k}$$
• $D_P \ge 0$
• If $P$ is an $N$-partitioning, then $D_P = 0$
• Want a $k$-partitioning with
  - $k$ large enough to find the change points
  - $k$ small enough so that non-change points are avoided
Fisher Algorithm for Change-Point Detection
ChangePoints(first, last, CPList)
• Compute $Q^* = (Q_1)$, the optimal 1-partitioning
• Compute $T$ where
$$T = \frac{ASQ[first..last]}{ASQ[first..Q_1 - 1] + ASQ[Q_1..last]}$$
• If $T$ exceeds a critical value
  - Add $Q_1$ to CPList
  - ChangePoints(first, $Q_1 - 1$, CPList)
  - ChangePoints($Q_1$, last, CPList)
• Return
Results of Applying Fisher Algorithm to Mainframe Data
On-Line Change-Point Detection
• Introduction
• Shewhart test
• Average run length
• Geometric moving average test
• CUSUM test
Introduction to On-Line Change-Point Detection
• Data are presented serially
• Raise an alarm if a change is detected
  - $t_a$ is the time of the alarm
• Identify when the change occurred
  - $t_0$ is the time of the change-point, with $t_0 \le t_a$
Illustration of Concepts in On-Line Change-Point Detection
[Figure: Illustration of On-Line Detection. Time series of 30 observations (y-axis: -1 to 6) marking the change point $t_0$, the alarm time $t_a$, and the alarm delay between them.]
Formalization for On-Line Tests
• Let
  - $p$ be the actual distribution (density) function
  - $p_0$ be the distribution (density) under $H_0$
  - $p_1$ be the distribution (density) under $H_1$
• Consider $y_k$
  - $H_0$: $p(y_k \mid y_{k-1}, \ldots, y_1) = p_0(y_k \mid y_{k-1}, \ldots, y_1)$
  - $H_1$: There is a time $t_0$ such that
    * for $1 \le i \le t_0 - 1$: $p(y_i \mid y_{i-1}, \ldots, y_1) = p_0(y_i \mid y_{i-1}, \ldots, y_1)$
    * for $t_0 \le i \le k$: $p(y_i \mid y_{i-1}, \ldots, y_{t_0}) = p_1(y_i \mid y_{i-1}, \ldots, y_{t_0})$
• The alarm time $t_a$ is the smallest $k$ such that $H_1$ is chosen over $H_0$
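Test Statistic for On-Line Change-Points
• Sum of log likelihood ratios
$$S_i^k = \sum_{j=i}^{k} s_j$$
• Consider
  - $y_i$ is $N(\mu, \sigma^2)$, with $\mu = \mu_0$ under $H_0$ and $\mu = \mu_1$ under $H_1$
  - $m = k - i + 1$, the number of terms in the sum
• Then
  - $S_i^k = \frac{b}{\sigma} \sum_{j=i}^{k} \left(y_j - \mu_0 - \frac{v}{2}\right)$
  - $H_0$: $S_i^k$ is $N\left(-\frac{m v^2}{2\sigma^2},\; m b^2\right)$
  - $H_1$: $S_i^k$ is $N\left(\frac{m v^2}{2\sigma^2},\; m b^2\right)$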
Shewhart Algorithm for On-Line Change-Point Detection
• Operation
  - Take samples of fixed batch size $N$
  - Make a decision independently for each batch
• Do for $k = 1$ to $\infty$
  - Obtain the next $N$ samples
  - If $S_{N(k-1)+1}^{Nk} \ge h$
    * Raise an alarm
    * Exit
• Note: Granularity of detection is determined by $N$
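A minimal sketch of the batch loop for a Gaussian change in the mean, using the log-likelihood ratio $s_i = \frac{v}{\sigma^2}(y_i - \mu_0 - \frac{v}{2})$. The function names, the finite input list standing in for an unbounded stream, and the numbers are assumptions for illustration.

```python
def log_likelihood_ratio(y_i, mu0, mu1, sigma):
    """s_i for N(mu1, sigma^2) against N(mu0, sigma^2)."""
    v = mu1 - mu0
    return (v / sigma ** 2) * (y_i - mu0 - v / 2.0)

def shewhart_alarm_time(y, n, h, mu0, mu1, sigma):
    """Return the 1-based index of the last sample of the first batch whose
    summed log-likelihood ratio reaches h, or None if no batch does."""
    for k in range(1, len(y) // n + 1):
        batch = y[(k - 1) * n:k * n]          # samples N(k-1)+1 .. Nk
        s = sum(log_likelihood_ratio(yi, mu0, mu1, sigma) for yi in batch)
        if s >= h:
            return k * n
    return None

y = [0.0] * 10 + [2.0] * 10                   # mean shifts at sample 11
print(shewhart_alarm_time(y, n=5, h=2.0, mu0=0.0, mu1=2.0, sigma=1.0))   # 15
```

The alarm can only fire at a batch boundary, which is the granularity point made in the note above.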
How is h Determined?
• Ideally want
  - Short alarm delay if there is a change-point
  - Long time until an alarm if there is no change-point
• Criterion: Average Run Length (ARL)
  - Average number of observations until there is an alarm
• ARL is related to but different from the power of an off-line test
Computing Average Run Length for Shewhart Test
• $\alpha_0 = P(S_1^N \ge h \mid H_0)$
• $P(t_a = kN \mid H_0) = (1 - \alpha_0)^{k-1} \alpha_0$
• $ARL_0$: Time until an alarm if there is no change point
$$E(ARL_0) = \frac{N}{\alpha_0}$$
• $ARL_1$: Alarm delay (time from change-point until alarm is raised)
$$E(ARL_1) = \frac{N}{P(S_1^N \ge h \mid H_1)}$$
• Choose $h$ based on a desired $ARL_0$
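For the Gaussian mean-change case, both expectations can be evaluated in closed form, because a batch sum of $N$ log-likelihood ratios is $N(\mp N v^2 / 2\sigma^2,\; N b^2)$ under $H_0$ and $H_1$ respectively. The sketch below assumes that setting and a change that coincides with a batch boundary; the function name and numbers are hypothetical.

```python
import math
from statistics import NormalDist

def shewhart_arl(n, h, v, sigma):
    """Return (E[ARL_0], E[ARL_1]) for batch size n and threshold h."""
    b = v / sigma
    drift = n * v ** 2 / (2.0 * sigma ** 2)   # |mean| of the batch sum
    sd = math.sqrt(n) * abs(b)                # std. dev. of the batch sum
    alpha0 = 1.0 - NormalDist(-drift, sd).cdf(h)   # P(S >= h | H0)
    detect = 1.0 - NormalDist(drift, sd).cdf(h)    # P(S >= h | H1)
    return n / alpha0, n / detect

arl0, arl1 = shewhart_arl(n=5, h=0.0, v=1.0, sigma=1.0)
print(arl0, arl1)
for h in (0.0, 2.0, 4.0):
    print(h, shewhart_arl(5, h, 1.0, 1.0))
```

Raising `h` lengthens both run lengths, so a target for the false-alarm run length fixes `h` and the detection delay follows from it.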
Geometric Moving Average Control Charts
• Motivation
  - Give recent observations more weight
• Consider $0 < \alpha \le 1$
$$g_k = (1 - \alpha) g_{k-1} + \alpha s_k = \sum_{n=0}^{\infty} \alpha (1 - \alpha)^n s_{k-n}$$
• Decision for an alarm
$$t_a = \min\{k \mid g_k \ge h\}$$
• Obtaining $h$: Observe that under $H_0$
  - $g_k$ is $N\left(-\frac{v^2}{2\sigma^2},\; \frac{\alpha b^2}{2 - \alpha}\right)$
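The recursion is a one-line update per observation. In this sketch the input is a sequence of log-likelihood ratios $s_k$, and the starting value `g0 = 0.0` is an assumption (the slide's infinite-sum form has no explicit start).

```python
def gma_alarm_time(s, alpha, h, g0=0.0):
    """Return the first 1-based k with g_k >= h, or None.

    g_k = (1 - alpha) * g_{k-1} + alpha * s_k, starting from g0.
    """
    g = g0
    for k, s_k in enumerate(s, start=1):
        g = (1.0 - alpha) * g + alpha * s_k
        if g >= h:
            return k
    return None

s = [-1.0, -1.0, 1.0, 1.0, 1.0]
# g_1..g_4 = -0.5, -0.75, 0.125, 0.5625
print(gma_alarm_time(s, alpha=0.5, h=0.5))   # 4
```

With `alpha = 1` the statistic is just the latest $s_k$; smaller values average over a longer history.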
Notes
• Relate to Shewhart
• Geometric weighting ensures that the mean of $g_k$ is the same as that for $s_k$
• However, the variance is different
CUSUM (Cumulative Sum) Control Charts
• Motivation
  - $S_1^k$ has a negative drift under $H_0$
  - As a result, $ARL_1$ may be longer than necessary
• Strategy: Adjust $S_1^k$ so that it does not become too small
$$g_k = S_1^k - m_k$$
$$m_k = \min\{S_1^j \mid 1 \le j \le k\}$$
• Approach: $t_a = \min\{k \mid S_1^k \ge m_k + h\}$
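A minimal sketch of the CUSUM rule on a sequence of log-likelihood ratios. Returning the index after the running minimum as an estimate of the change time is a common convention rather than something stated on the slide, so treat it as an assumption.

```python
import math

def cusum_alarm(s, h):
    """Return (alarm_time, estimated_change_time), both 1-based, or None.

    Alarm at the first k with S_1^k - m_k >= h, where m_k is the minimum
    of S_1^j over 1 <= j <= k. The change estimate is the index just
    after the position where that minimum was reached.
    """
    total = 0.0            # S_1^k
    minimum = math.inf     # m_k
    argmin = 0
    for k, s_k in enumerate(s, start=1):
        total += s_k
        if total < minimum:
            minimum, argmin = total, k
        if total - minimum >= h:
            return k, argmin + 1
    return None

s = [-1.0, -1.0, -1.0, 2.0, 2.0, 2.0]
# S_1^k = -1, -2, -3, -1, 1, 3 and m_k = -1, -2, -3, -3, -3, -3
print(cusum_alarm(s, h=3.0))   # (5, 4)
```

Because the statistic is measured from the running minimum, a long quiet period before the change does not delay the alarm.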
Illustration of the Three On-Line Algorithms
Unknown Probability Distributions
• Situation
  - Can estimate $\theta_0$ from historical data
  - $\theta_1$ is unknown
• Generalized likelihood ratio
  - $U_i^k = \sup_{\theta_1} \{S_i^k(\theta_1)\}$
• For Gaussian distributions and changes in the mean
  - $U_i^k = \sup_{v} \left\{ \frac{v}{\sigma^2} \sum_{j=i}^{k} \left(y_j - \mu_0 - \frac{v}{2}\right) \right\}$
  - This can be solved explicitly
• Use $U_i^k$ instead of $S_i^k$
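One way to see the explicit solution: the expression inside the supremum is quadratic in $v$, so with $A = \sum_{j=i}^{k}(y_j - \mu_0)$ and $m = k - i + 1$ it is maximized at $\hat{v} = A/m$, giving $U_i^k = A^2 / (2\sigma^2 m)$. The sketch below computes both; the function name and the data are assumptions for illustration.

```python
def glr_statistic(y, i, k, mu0, sigma):
    """Return (U_i^k, v_hat) for a Gaussian change in the mean.

    Indices i..k are 1-based and inclusive. v_hat is the maximizing
    change magnitude, the mean of (y_j - mu0) over the window.
    """
    window = y[i - 1:k]
    m = len(window)
    a = sum(y_j - mu0 for y_j in window)
    v_hat = a / m
    return a ** 2 / (2.0 * sigma ** 2 * m), v_hat

u, v_hat = glr_statistic([1.0, 1.0, 1.0, 1.0], 1, 4, mu0=0.0, sigma=1.0)
print(u, v_hat)   # 2.0 1.0
```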
Handling Non-Stationary Data
• Suppose that the data vary with time of day or day of month
• Question: How do we separate normal variability from abnormal variability?
• Answer: Model the normal variability
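One simple way to model normal variability, offered only as an assumption about how this could be done, is to estimate a per-hour baseline from historical data and run change-point detection on the residuals. The function names and the data below are hypothetical.

```python
from collections import defaultdict
from statistics import fmean

def hourly_baseline(history):
    """history: iterable of (hour, value) pairs. Returns {hour: mean value}."""
    by_hour = defaultdict(list)
    for hour, value in history:
        by_hour[hour].append(value)
    return {hour: fmean(values) for hour, values in by_hour.items()}

def residuals(observations, baseline):
    """Subtract the baseline for each observation's hour of day."""
    return [value - baseline[hour] for hour, value in observations]

history = [(9, 10.0), (9, 12.0), (14, 20.0), (14, 22.0)]
baseline = hourly_baseline(history)
print(baseline)                                        # {9: 11.0, 14: 21.0}
print(residuals([(9, 11.5), (14, 30.0)], baseline))    # [0.5, 9.0]
```

The residuals are closer to stationary than the raw series, so the on-line tests can be applied to them.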
Normal Variability in Web Server Data
[Figure: httpop/s versus hour of day (x-axis: hour), one panel per day: Mon day 9, Tue day 10, Wed day 11, Thu day 12, Fri day 13, Sat day 14, Sun day 15. The weekday panels use a y-axis of 0 to 20; the Saturday and Sunday panels use 0 to 4.]
Summary
• Basics
  - Off-line vs. on-line change-point detection
  - Likelihood functions, ratios, log likelihood functions
  - Neyman-Pearson lemma
• Off-line change-point detection
  - Fisher algorithm
• On-line change-point detection
  - Shewhart
  - Geometric moving average
  - CUSUM
References
• Michele Basseville and Igor V. Nikiforov, Detection of Abrupt Changes: Theory and Application, Prentice Hall, 1993.
• W.D. Fisher, "On Grouping for Maximum Homogeneity," Journal of the American Statistical Association, 53:789-798, December 1958.
• Richard A. Johnson and Dean W. Wichern, Applied Multivariate Statistical Analysis, Prentice Hall.