sigt98


An Introduction to Change-Point Detection

Joseph L. Hellerstein
T.J. Watson Research Center
IBM Research
Hawthorne, New York

Fan Zhang
Department of Industrial Engineering and Operations Research
Columbia University
New York, New York

June 1998

Background and Motivations

• Most analysis and control assumes stationary stochastic processes: no change in
  - Mean
  - Variance
  - Covariances
• Bad things can happen to good processes
  - A router can fail in a network
  - A conveyor belt can stop on an assembly line
  - A bank can fail in an economy
• Need to determine when process parameters have changed in order to
  - Correct the process
  - Change control parameters


Mainframe Data

Web Server Data

[Figure: nine time-series panels, each plotted against time (hr) from 0 to 20. Panels: usr, sys, pdb, mdb, Ipkt/s, Opkts, Coll%, tcpIn/s, tcpOut/s.]

Types of Change-Point Detection

• Off-line: Data are presented en masse
  - Identify stationary intervals
• On-line: Data are presented serially
  - Detect when the parameters of the process change

Outline

• Hypothesis testing and statistical background
• Off-line tests
• Theory for on-line tests
• On-line tests
• Practical considerations
• References

Hypothesis Testing

• Test assertions about parameters of a process (e.g., mean, variance, covariance)
• $H_0$ (Null hypothesis): normal situation (e.g., mean response time is 1 second)
• $H_1$ (Alternate hypothesis): abnormal situation (e.g., mean response time is … seconds)

Components of a Statistical Test ($\delta$)

• $T$: A test statistic that is computed from the data
  $$T(y) = f(y_1, \ldots, y_N)$$
• $d(T) \in \{0, 1\}$: A decision function that determines if the test statistic is within an acceptable range
  - 0: Okay
  - 1: raise an alarm
• Observation: $d$ classifies values of $y$

Examples of Test Components

• Test statistics
  - $T(y) = \bar{y}$
  - $T(y) = \sum_i (y_i - \bar{y})^2$
• Decision functions: Use critical value (upper or lower limit)
  $$d(T) = \begin{cases} 1 & \text{if } T < T_{LC} \\ 0 & \text{otherwise} \end{cases}$$
  $$d(T) = \begin{cases} 1 & \text{if } T > T_{HC} \\ 0 & \text{otherwise} \end{cases}$$
  $$d(T) = \begin{cases} 1 & \text{if } T < T_{LC} \text{ or } T > T_{HC} \\ 0 & \text{otherwise} \end{cases}$$
• Mixed test: $d(T) \in [0, 1]$
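The two ingredients above, a test statistic and a decision function, can be sketched in a few lines of Python. This is only an illustration: the function names, the sample data, and the critical values 0.5 and 1.5 are assumptions chosen for the example, not values from the slides.

```python
from statistics import fmean

def sample_mean(y):
    """Test statistic T(y) = ybar."""
    return fmean(y)

def sum_sq_dev(y):
    """Test statistic T(y) = sum_i (y_i - ybar)^2."""
    ybar = fmean(y)
    return sum((v - ybar) ** 2 for v in y)

def decide_two_sided(t, t_lc, t_hc):
    """Decision function d(T): 1 (alarm) if T < t_lc or T > t_hc, else 0."""
    return 1 if (t < t_lc or t > t_hc) else 0

# Hypothetical response times in seconds and hypothetical critical values.
y = [0.9, 1.1, 1.0, 1.2, 0.8]
print(decide_two_sided(sample_mean(y), t_lc=0.5, t_hc=1.5))
```

The one-sided decision functions are the same check with only one of the two comparisons.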

Outcomes of Tests

• Raise an alarm if $d(T) = 1$

                  No Alarm          Alarm
$H_0$ is true     OK                false positive
$H_1$ is true     false negative    OK

Critical Regions

• Set of $y$ values for which $H_0$ is rejected
• Denoted by $C$

$$P(\text{false positive}) = \alpha_0 = P(y \in C \mid H_0)$$

$$P(\text{false negative}) = \alpha_1 = P(y \in \bar{C} \mid H_1)$$
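As a worked instance of these two probabilities, the sketch below assumes a specific setting that the slide does not state: Gaussian observations with known standard deviation, a test statistic equal to the sample mean of `n` observations, and a critical region of the form "sample mean above `t_hc`". All parameter names are hypothetical.

```python
from statistics import NormalDist

def error_rates(mu0, mu1, sigma, n, t_hc):
    """Return (false-positive, false-negative) probabilities for the
    test 'alarm if the sample mean of n observations exceeds t_hc'."""
    se = sigma / n ** 0.5                         # std. dev. of the sample mean
    p_fp = 1.0 - NormalDist(mu0, se).cdf(t_hc)    # P(y in C | H0)
    p_fn = NormalDist(mu1, se).cdf(t_hc)          # P(y not in C | H1)
    return p_fp, p_fn

print(error_rates(mu0=1.0, mu1=2.0, sigma=1.0, n=4, t_hc=1.5))
```

Moving `t_hc` upward lowers the false-positive probability and raises the false-negative probability, which is the trade-off the critical region encodes.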


Critical Regions

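Test Design

• Objective: select test $\delta$ that minimizes the false-negative probability $\alpha_1$ subject to the constraint that the false-positive probability $\alpha_0$ is not too large.
• Power of a test provides a succinct way of expressing this objective
  $$\beta(\theta) = P(\delta \text{ rejects } H_0 \mid \theta)$$
• Note that
  $$\beta(\theta) = \begin{cases} \alpha_0 & \text{if } \theta \in H_0 \\ 1 - \alpha_1 & \text{if } \theta \in H_1 \end{cases}$$
• Ideal test
  $$\beta(\theta) = \begin{cases} 0 & \text{if } \theta \in H_0 \\ 1 & \text{if } \theta \in H_1 \end{cases}$$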

Notes

• Can always minimize $\alpha_0$ or $\alpha_1$ separately by having a deterministic outcome to the test: never raising an alarm gives $\alpha_0 = 0$, and always raising one gives $\alpha_1 = 0$

Likelihood Function

• Transformation of the data
• Used in test statistics
• Indicates the probability (or density) of the data if the distribution is known
  $$L(y) = \begin{cases} P(y \mid \theta) & \text{if } y \text{ is discrete} \\ f(y \mid \theta) & \text{if } y \text{ is continuous} \end{cases}$$
• Example: Normal distribution with $\theta = (\mu, \sigma^2)$
  $$L(y) = \left( \frac{1}{\sigma\sqrt{2\pi}} \right) \exp\left( -\frac{(y - \mu)^2}{2\sigma^2} \right)$$
  where $H_0$ is specified in terms of $\mu$ and $\sigma^2$
• If observations are i.i.d.
  $$L(y_1, \ldots, y_N) = L(y_1) \cdots L(y_N)$$
• For the normal, this means
  $$L(y_1, \ldots, y_N) = \left( \frac{1}{\sigma\sqrt{2\pi}} \right)^N \exp\left( -\sum_{i=1}^{N} \frac{(y_i - \mu)^2}{2\sigma^2} \right)$$
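The product form above underflows quickly for long series, so in practice the logarithm is usually computed instead. The sketch below evaluates the base-10 log of the i.i.d. normal likelihood and compares data that match the assumed N(0,1) model with the same data shifted by 3. The sample size of 30 and the random seed are assumptions made only for this illustration.

```python
import math
import random

def log10_likelihood(y, mu=0.0, sigma=1.0):
    """log10 of the i.i.d. normal likelihood L(y_1, ..., y_N)."""
    n = len(y)
    ln_l = (-n * math.log(sigma * math.sqrt(2 * math.pi))
            - sum((v - mu) ** 2 for v in y) / (2 * sigma ** 2))
    return ln_l / math.log(10)

rng = random.Random(0)                        # fixed seed, illustration only
y_ok = [rng.gauss(0, 1) for _ in range(30)]   # data that match N(0,1)
y_shift = [v + 3 for v in y_ok]               # same data with the mean shifted to 3

# Both are evaluated under the N(0,1) model; the shifted data score far lower.
print(log10_likelihood(y_ok), log10_likelihood(y_shift))
```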

Likelihood Function For Correct

[Figure: N(0,1) likelihood values for an N(0,1) random variable. $L(y_1, \ldots, y_N)$ is approximately $10^{-21}$.]


Likelihood Function For Incorrect

[Figure: N(0,1) likelihood values for an N(3,1) random variable. $L(y_1, \ldots, y_N)$ is approximately $10^{-72}$.]

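Likelihood Ratio

• Indicates the relative probability (or density) of obtaining the data
  $$\frac{L_1(y_i)}{L_0(y_i)}$$
• Often use the log of the likelihood ratio
  $$s_i = \ln \frac{L_1(y_i)}{L_0(y_i)}$$
• Example: $N(\mu_0, \sigma^2)$ and $N(\mu_1, \sigma^2)$
  $$s_i = \frac{1}{2\sigma^2} \left[ (y_i - \mu_0)^2 - (y_i - \mu_1)^2 \right]$$
  $$= \frac{v}{\sigma^2}(y_i - \mu_0) - \frac{v^2}{2\sigma^2}$$
  $$= \frac{b}{\sigma}\left( y_i - \mu_0 - \frac{v}{2} \right)$$
  - $v = \mu_1 - \mu_0$ is the change in magnitude
  - $b = (\mu_1 - \mu_0)/\sigma$ is the signal to noise ratio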
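A quick numerical check of the Gaussian example: the closed form $(b/\sigma)(y_i - \mu_0 - v/2)$ should agree with the log of the ratio of the two normal densities. The function names below are hypothetical, and the densities come from Python's standard `statistics.NormalDist`.

```python
import math
from statistics import NormalDist

def llr(y, mu0, mu1, sigma):
    """Closed form s_i = (b / sigma) * (y - mu0 - v / 2)."""
    v = mu1 - mu0          # change in magnitude
    b = v / sigma          # signal to noise ratio
    return (b / sigma) * (y - mu0 - v / 2)

def llr_from_densities(y, mu0, mu1, sigma):
    """s_i computed directly as ln(L_1(y) / L_0(y))."""
    return math.log(NormalDist(mu1, sigma).pdf(y) / NormalDist(mu0, sigma).pdf(y))

print(llr(1.3, 0.0, 1.0, 2.0), llr_from_densities(1.3, 0.0, 1.0, 2.0))
```

The ratio is zero at the midpoint $\mu_0 + v/2$, positive on the $\mu_1$ side and negative on the $\mu_0$ side.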


Observations About Likelihood Ratios

Consider Gaussian $y_i$

• $s_i$ is Gaussian since a linear combination of Gaussians is Gaussian
• If $\theta \in H_0$, then $E(s_i) < 0$
• If $\theta \in H_1$, then $E(s_i) > 0$
• If $H_0 = H_1$, then $E(s_i) = 0$

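Notes

• $E(s_i) < 0$ under $H_0$ follows from $E(y_i - \mu_0) = 0$, so
  $$E\left[ \frac{v}{\sigma^2}(y_i - \mu_0) - \frac{v^2}{2\sigma^2} \right] = -\frac{v^2}{2\sigma^2}$$
• $E(s_i) > 0$ under $H_1$ follows from $E(y_i - \mu_0) = v$, so
  $$E\left[ \frac{v}{\sigma^2}(y_i - \mu_0) - \frac{v^2}{2\sigma^2} \right] = \frac{v^2}{\sigma^2} - \frac{v^2}{2\sigma^2} = \frac{v^2}{2\sigma^2}$$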

Most Powerful Test

• Given $\theta_0$ and $\theta_1$ corresponding to $H_0$ and $H_1$
• Definition: $\delta^*$ is a most powerful test iff
  - For all $\delta$ such that $\alpha_0(\delta) \le \alpha_0(\delta^*)$
  - Then the power under $H_1$ satisfies $\beta(\delta^*) \ge \beta(\delta)$
• Intuition for constructing $\delta^*$: Place first into the critical region those $y$ that have
  - the lowest probability under $H_0$
  - the highest probability under $H_1$
• Neyman-Pearson Lemma
  - $\delta^*$ is a most powerful test if it is constructed as follows
    $$y \in C \iff \frac{L_1(y)}{L_0(y)} > h$$
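For the Gaussian mean-shift case, the likelihood-ratio rule reduces to a simple cutoff on $y$. The sketch below makes that concrete; it assumes $\mu_1 > \mu_0$ and a common known $\sigma$, and the function names are hypothetical.

```python
import math
from statistics import NormalDist

def np_cutoff(h, mu0, mu1, sigma):
    """Cutoff y_c such that L_1(y)/L_0(y) > h exactly when y > y_c.
    Assumes mu1 > mu0; derived from ln(ratio) = (v/sigma^2)(y - mu0 - v/2)."""
    v = mu1 - mu0
    return mu0 + v / 2 + sigma ** 2 * math.log(h) / v

def likelihood_ratio(y, mu0, mu1, sigma):
    """L_1(y) / L_0(y) for two normal densities with a common sigma."""
    return NormalDist(mu1, sigma).pdf(y) / NormalDist(mu0, sigma).pdf(y)

y_c = np_cutoff(h=2.0, mu0=0.0, mu1=1.0, sigma=1.0)
print(y_c, likelihood_ratio(y_c, 0.0, 1.0, 1.0))
```

Raising `h` pushes the cutoff toward larger `y`, shrinking the critical region and with it the false-positive probability.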

Notes

• Illustrate intuition from the critical region figure

Off-Line Tests

• View as constrained clustering
• Want homogeneous clusters
• Choose change points such that
  - Variance within a cluster is smaller than variance between
• Assumes that only the mean changes

Example of Partitioning

[Figure: "A 3-Partitioning" of a 30-point series (x-axis: index 1 to 30; y-axis: -1 to 6), annotated with each segment's mean and adjusted sum of squares.]

Segment      Mean    ASQ
[1..5]       0.48    0.95
[6..14]      2.05    1.72
[15..19]     1.09    1.31
[20..30]     3.83    5.20


An Approach to Off-Line Change-Point Detection

• Perspective: Locating change-points is equivalent to finding the optimal way to partition time-serial data
  - Homogeneous within a partition
  - Heterogeneous between partitions
• A range of indices is indicated by $[m..n]$ for $1 \le m \le n \le N$
• Detecting $k$ change-points results in a $k$-partitioning
  - $P = (P_1, \ldots, P_k)$, $1 < P_1 < P_2 < \cdots < P_k \le N$
  - Each $P_j$ is the first index of a new segment, so $k$ change-points give $k + 1$ segments
• Approach is due to W.D. Fisher

Definitions

• Mean of a range of observations
  $$\bar{y}[m..n] = \frac{y_m + \cdots + y_n}{n - m + 1}$$
• Adjusted sum of squares (degree of homogeneity) within a partition
  $$ASQ[m..n] = \sum_{j=m}^{n} \left( y_j - \bar{y}[m..n] \right)^2$$
• Figure of merit for the change points identified is
  $$D_P = ASQ[P_k..N] + \sum_{j=0}^{k-1} ASQ[P_j..P_{j+1} - 1]$$
  with the convention $P_0 = 1$
• $P$ is an optimal $k$-partitioning if there is no $k$-partitioning $P'$ such that $D_{P'} < D_P$
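These definitions translate directly into code. The sketch below keeps the slides' 1-based inclusive index ranges and treats each change point as the first index of a new segment; the function names and the toy series are assumptions for illustration.

```python
def mean_range(y, m, n):
    """Mean of y[m..n], using 1-based inclusive indices."""
    return sum(y[m - 1:n]) / (n - m + 1)

def asq(y, m, n):
    """Adjusted sum of squares of y[m..n] (1-based inclusive)."""
    ybar = mean_range(y, m, n)
    return sum((v - ybar) ** 2 for v in y[m - 1:n])

def figure_of_merit(y, change_points):
    """D_P: sum of ASQ over the segments induced by sorted change points."""
    bounds = [1] + list(change_points) + [len(y) + 1]
    return sum(asq(y, a, b - 1) for a, b in zip(bounds, bounds[1:]))

y = [0, 0, 0, 5, 5, 5]             # toy series with one jump at index 4
print(figure_of_merit(y, []))      # no change point: one heterogeneous segment
print(figure_of_merit(y, [4]))     # split at the jump: two homogeneous segments
```

A smaller figure of merit means more homogeneous segments, which is why it can be driven to zero by adding change points and why the number of change points has to be limited.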

Observations

• The computational complexity of finding an optimal $k$-partitioning is
  $$\binom{N}{k}$$
• $D_P \ge 0$
• If $P$ is an $N$-partitioning, then $D_P = 0$
• Want a $k$-partitioning with
  - $k$ large enough to find the change points
  - $k$ small enough so that non-change points are avoided

Fisher Algorithm for Change-Point Detection

ChangePoints(first, last, CPList)

• Compute $Q^* = (Q_1)$, the optimal 1-partitioning
• Compute $T$ where
  $$T = \frac{ASQ[first..last]}{ASQ[first..Q_1 - 1] + ASQ[Q_1..last]}$$
• If $T$ exceeds a critical value
  - Add $Q_1$ to CPList
  - ChangePoints(first, $Q_1 - 1$, CPList)
  - ChangePoints($Q_1$, last, CPList)
• Return

Results of Applying Fisher Algorithm to Mainframe Data

On-Line Change-Point Detection

• Introduction
• Shewhart Test
• Average run length
• Geometric moving average test
• CUSUM test

Introduction to On-Line Change-Point Detection

• Data are presented serially
• Raise an alarm if a change is detected
  - $t_a$ is the time of the alarm
• Identify when the change occurred
  - $t_0$ is the time of the change-point, with $t_0 \le t_a$

Illustration of Concepts in On-Line Change-Point Detection

[Figure: "Illustration of On-Line Detection". A 30-point series (x-axis: index 1 to 30; y-axis: -1 to 6) marking the change point $t_0$, the alarm time $t_a$, and the alarm delay between them.]

Formalization for On-Line Tests

• Let
  - $p$ be the actual distribution (density) function
  - $p_0$ be the distribution (density) under $H_0$
  - $p_1$ be the distribution (density) under $H_1$
• Consider $y_k$
  - $H_0$: $p(y_k \mid y_{k-1}, \ldots, y_1) = p_0(y_k \mid y_{k-1}, \ldots, y_1)$
  - $H_1$: There is a time $t_0$ such that
    - for $1 \le i \le t_0 - 1$: $p(y_i \mid y_{i-1}, \ldots, y_1) = p_0(y_i \mid y_{i-1}, \ldots, y_1)$
    - for $t_0 \le i \le k$: $p(y_i \mid y_{i-1}, \ldots, y_{t_0}) = p_1(y_i \mid y_{i-1}, \ldots, y_{t_0})$
• The alarm time $t_a$ is the smallest $k$ such that $H_1$ is chosen over $H_0$

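Test Statistic for On-Line Change-Points

• Sum of log likelihood ratios
  $$S_i^k = \sum_{j=i}^{k} s_j$$
• Consider
  - $y_i$ is $N(\mu, \sigma^2)$, with $\mu = \mu_0$ under $H_0$
  - $\mu = \mu_1$ under $H_1$
  - $m = k - i + 1$, the number of terms in the sum
• Then
  - $S_i^k = \frac{b}{\sigma} \sum_{j=i}^{k} \left( y_j - \mu_0 - \frac{v}{2} \right)$
  - $H_0$: $S_i^k$ is $N\left( -\frac{m v^2}{2\sigma^2},\; m b^2 \right)$
  - $H_1$: $S_i^k$ is $N\left( \frac{m v^2}{2\sigma^2},\; m b^2 \right)$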
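The sum and its moments can be sketched as below. Because $S_i^k$ is linear in the observations, evaluating it on data that sit exactly at $\mu_0$ (or at $\mu_1$) reproduces the stated mean under $H_0$ (or $H_1$), which gives a simple sanity check. Function names are hypothetical and indices follow the slides' 1-based inclusive convention.

```python
def llr_sum(y, i, k, mu0, mu1, sigma):
    """S_i^k = (v / sigma^2) * sum_{j=i..k} (y_j - mu0 - v/2), 1-based inclusive."""
    v = mu1 - mu0
    return (v / sigma ** 2) * sum(y_j - mu0 - v / 2 for y_j in y[i - 1:k])

def llr_sum_moments(m, mu0, mu1, sigma):
    """Return (mean under H0, mean under H1, variance) for m summed terms."""
    v = mu1 - mu0
    mean = m * v ** 2 / (2 * sigma ** 2)
    return -mean, mean, m * (v / sigma) ** 2

print(llr_sum([0.0] * 4, 1, 4, mu0=0.0, mu1=2.0, sigma=1.0))
print(llr_sum_moments(4, mu0=0.0, mu1=2.0, sigma=1.0))
```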


Shewhart Algorithm for On-Line Change Point Detection

• Operation
  - Take samples of fixed batch size $N$
  - Make a decision independently for each batch
• Do for $k = 1$ to $\infty$
  - Obtain the next $N$ samples
  - If $S_{(k-1)N+1}^{kN} > h$
    - Raise an alarm
    - exit
• Note: Granularity of detection is determined by $N$
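A minimal sketch of the batch loop, assuming the Gaussian mean-shift log likelihood ratio with known $\mu_0$, $\mu_1$, and $\sigma$. The function name and its return convention (the 1-based index of the observation that completes the alarming batch, or `None`) are choices made for this example.

```python
def shewhart_alarm(samples, n, h, mu0, mu1, sigma):
    """Return the 1-based index at which the alarm is raised, or None."""
    v = mu1 - mu0
    batch_sum, count = 0.0, 0
    for t, y in enumerate(samples, start=1):
        batch_sum += (v / sigma ** 2) * (y - mu0 - v / 2)   # add s_t
        count += 1
        if count == n:                  # one full batch of N samples collected
            if batch_sum > h:
                return t                # raise an alarm and exit
            batch_sum, count = 0.0, 0   # each batch is decided independently
    return None

data = [0.0] * 10 + [2.0] * 10          # mean jumps from 0 to 2 at index 11
print(shewhart_alarm(data, n=5, h=2.0, mu0=0.0, mu1=2.0, sigma=1.0))
```

Because a decision is made only at batch boundaries, a change that occurs mid-batch cannot be flagged before that batch ends, which is the granularity effect noted above.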

How is h Determined?

• Ideally want
  - Short alarm delay if there is a change-point
  - Long time until an alarm if there is no change-point
• Criteria: Average Run Length (ARL)
  - Average number of observations until there is an alarm
• ARL is related to but different from the power of an off-line test

Computing Average Run Length for Shewhart Test

• $\alpha_0 = P(S_1^N > h \mid H_0)$
• $P(t_a = kN \mid H_0) = \alpha_0 (1 - \alpha_0)^{k-1}$
• $ARL_0$: Time until an alarm if there is no change point
  $$E(ARL_0) = \frac{N}{\alpha_0}$$
• $ARL_1$: Alarm delay (time from change-point until alarm is raised)
  $$E(ARL_1) = \frac{N}{P(S_1^N > h \mid H_1)}$$
• Choose $h$ based on a desired $ARL_0$
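For the Gaussian mean-shift case, both run lengths can be evaluated in closed form, since the batch statistic is normal with mean $\mp N v^2 / (2\sigma^2)$ and variance $N b^2$. The sketch below is an illustration under those assumptions; the function name is hypothetical.

```python
from statistics import NormalDist

def shewhart_arl(n, h, mu0, mu1, sigma):
    """Return (E(ARL_0), E(ARL_1)) for batch size n and threshold h."""
    v = mu1 - mu0
    mean = n * v ** 2 / (2 * sigma ** 2)      # mean of S is -mean under H0, +mean under H1
    sd = n ** 0.5 * abs(v) / sigma            # sqrt(N * b^2)
    p_false_alarm = 1 - NormalDist(-mean, sd).cdf(h)   # P(S > h | H0)
    p_detect = 1 - NormalDist(mean, sd).cdf(h)         # P(S > h | H1)
    return n / p_false_alarm, n / p_detect

arl0, arl1 = shewhart_arl(n=4, h=0.0, mu0=0.0, mu1=1.0, sigma=1.0)
print(arl0, arl1)
```

Raising `h` lengthens both run lengths, so choosing `h` for a desired $ARL_0$ also fixes the detection delay that has to be accepted.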

Geometric Moving Average Control Charts

• Motivation
  - Give recent observations more weight
• Consider a weight $0 < \alpha \le 1$
  $$g_k = (1 - \alpha) g_{k-1} + \alpha s_k = \sum_{n=0}^{\infty} \alpha (1 - \alpha)^n s_{k-n}$$
• Decision for an alarm
  $$t_a = \min\{k \mid g_k \ge h\}$$
• Obtaining $h$: Observe that under $H_0$
  - $g_k$ is $N\left( -\frac{v^2}{2\sigma^2},\; \frac{\alpha b^2}{2 - \alpha} \right)$
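The recursion is a one-line update per observation. In the sketch below the inputs are the log likelihood ratios $s_k$ themselves; the starting value `g0 = 0.0` is an assumption of this example (the infinite-sum form describes the statistic once the start-up transient has died out).

```python
def gma_alarm(s, alpha, h, g0=0.0):
    """Return the first k (1-based) with g_k >= h, or None if no alarm."""
    g = g0
    for k, s_k in enumerate(s, start=1):
        g = (1 - alpha) * g + alpha * s_k   # geometric weighting of past s values
        if g >= h:
            return k
    return None

# Hypothetical log likelihood ratios: negative before the change, positive after.
s = [-1.0] * 5 + [1.0] * 10
print(gma_alarm(s, alpha=0.5, h=0.5))
```

A larger `alpha` forgets the past faster and reacts sooner, at the cost of a noisier statistic.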

Notes

• Relate to Shewhart
• Geometric weighting ensures that the mean of $g_k$ is the same as that for $s_k$.
• However, the variance is different.

CUSUM (Cumulative Sum) Control Charts

• Motivation
  - $S_i^k$ has a negative drift under $H_0$
  - As a result, $ARL_1$ may be longer than necessary
• Strategy: Adjust $S_i^k$ so that it does not become too small
  $$g_k = S_1^k - m_k$$
  $$m_k = \min\{S_1^j \mid 1 \le j \le k\}$$
• Approach: $t_a = \min\{k \mid S_1^k \ge m_k + h\}$
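A minimal sketch of the CUSUM rule, tracking the running sum $S_1^k$ and its running minimum $m_k$. As in the other on-line sketches, the inputs are log likelihood ratios and the function name is hypothetical.

```python
def cusum_alarm(s, h):
    """Return the first k (1-based) with S_1^k - m_k >= h, or None."""
    cum = 0.0                 # S_1^k
    lowest = float("inf")     # m_k = min over 1 <= j <= k of S_1^j
    for k, s_k in enumerate(s, start=1):
        cum += s_k
        lowest = min(lowest, cum)
        if cum - lowest >= h:
            return k
    return None

# Hypothetical log likelihood ratios: negative before the change, positive after.
s = [-1.0] * 5 + [1.0] * 10
print(cusum_alarm(s, h=3.0))
```

Subtracting the running minimum discards the negative drift accumulated before the change, so the delay depends only on how fast the sum climbs afterwards.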

Illustration of the Three On-Line Algorithms

Unknown Probability Distributions

• Situation
  - Can estimate $\theta_0$ from historical data
  - $\theta_1$ is unknown
• Generalized likelihood ratio
  - $U_i^k = \sup_{\theta_1} \{ S_i^k(\theta_1) \}$
• For Gaussian distributions and changes in the mean
  - $U_i^k = \sup_{v} \left[ \frac{v}{\sigma^2} \sum_{j=i}^{k} \left( y_j - \mu_0 - \frac{v}{2} \right) \right]$
  - This can be solved explicitly
• Use $U_i^k$ instead of $S_i^k$
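One way to see the explicit solution: the bracketed expression is a concave quadratic in $v$, maximized at the sample mean of $y_j - \mu_0$ over the window, which gives $U = A^2 / (2 m \sigma^2)$ with $A = \sum_j (y_j - \mu_0)$ and $m$ the number of terms. The sketch below assumes an unconstrained supremum over $v$; the function name is hypothetical.

```python
def glr_mean_shift(window, mu0, sigma):
    """Return (U, v_hat) for the observations in `window`.
    v_hat maximizes (v / sigma^2) * sum(y_j - mu0 - v / 2) over all v."""
    m = len(window)
    a = sum(y_j - mu0 for y_j in window)
    v_hat = a / m                           # maximizing change magnitude
    return a ** 2 / (2 * m * sigma ** 2), v_hat

print(glr_mean_shift([2.0, 2.0, 2.0, 2.0], mu0=0.0, sigma=1.0))
```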

Handling Non-Stationary Data

• Suppose that the data vary with time of day or day of month
• Question: How do we separate normal variability from abnormal variability?
• Answer: Model the normal variability
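The slide does not say which model to use. As one simple possibility, and purely as an assumption of this sketch, the normal variability can be represented by the historical mean for each hour of the day, with change-point detection then applied to the residuals. All names and data below are hypothetical.

```python
from collections import defaultdict
from statistics import fmean

def hourly_baseline(history):
    """history: iterable of (hour, value) pairs taken from normal operation."""
    by_hour = defaultdict(list)
    for hour, value in history:
        by_hour[hour].append(value)
    return {hour: fmean(values) for hour, values in by_hour.items()}

def residuals(observations, baseline):
    """Subtract the modeled normal level; the result is what the tests see."""
    return [value - baseline[hour] for hour, value in observations]

history = [(0, 10.0), (0, 12.0), (1, 20.0)]
baseline = hourly_baseline(history)
print(residuals([(0, 15.0), (1, 20.0)], baseline))
```

A day-of-week or day-of-month effect could be handled the same way by keying the baseline on (day, hour) instead of hour alone.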

Normal Variability in Web Server Data

[Figure: seven panels of httpop/s plotted against hour (0 to 20), one per day: Mon day 9, Tue day 10, Wed day 11, Thu day 12, Fri day 13, Sat day 14, Sun day 15. The weekday panels use a y-axis up to 20; the Saturday and Sunday panels use a y-axis up to 4.]

Summary

• Basics
  - Off-line vs. on-line change-point detection
  - Likelihood functions, ratios, log likelihood functions
  - Neyman-Pearson lemma
• Off-line change-point detection
  - Fisher algorithm
• On-line change-point detection
  - Shewhart
  - Geometric moving average
  - CUSUM

References

• Michele Basseville and Igor V. Nikiforov. Detection of Abrupt Changes: Theory and Application. Prentice Hall, 1993.
• W.D. Fisher. "On Grouping for Maximum Homogeneity," Journal of the American Statistical Association, 53:789-798, December 1958.
• Richard A. Johnson and Dean W. Wichern. Applied Multivariate Statistical Analysis. Prentice Hall.