From Grid Data to Patterns and Structures Padhraic Smyth Information and Computer Science University of California, Irvine July 2000


From Grid Data to Patterns and Structures

Padhraic Smyth

Information and Computer Science

University of California, Irvine

July 2000


Monday’s talk:

An introduction to data mining

General concepts

Focus on current practice of data mining: main message is to be aware of the “hype factor”

Today’s talk:

Modeling structure and patterns


Further Reading on Data Mining

• Review Paper:

– www.ics.uci.edu/~datalab

– P. Smyth, “Data mining: data analysis on a grand scale?”, preprint of review paper to appear in Statistical Methods in Medical Research

• Text (forthcoming)

– Principles of Data Mining

• D. J. Hand, H. Mannila, P. Smyth

• MIT Press, late 2000


“Hot Topics”

• Nonlinear Regression

• Pattern Finding

• Computer Vision, Signal Recognition

• Flexible Classification Models

• Scalable Algorithms

• Graphical Models

• Hidden Variable Models

• Hidden Markov Models

• Belief Networks

• Support Vector Machines

• Mixture/Factor Models

• Classification Trees

• Association Rules

• Deformable Templates

• Model Combining

• Wavelets


Theme: From Grid Points to Patterns

• A Mismatch

– earth science is concerned with structures and objects and their dynamic behavior

• global: EOF patterns, trends, anomalies

• local: storms, eddies, currents, etc

– but much of earth science modeling is at the grid level

• models are typically defined at the lowest level of the “object hierarchy”


Theme: From Grid Points to Patterns

Models are often down here

Structure of scientific interest, e.g., local: storm, eddy, etc; global: EOF, trend, etc


Examples of Grid Models

• Analysis: Markov Random Fields (e.g., Besag, Geman and Geman)

– p(x1|neighbors of x1,all other pixels) = p(x1|neighbors of x1)

– p(x1, …, xN) = product of clique functions

– Problem

• only models “low-level” pixel constraints

• no systematic way to include information about shape

• Simulation: GCM models

– grid model for 4d, first-principles equations

– produces vast amounts of data

– no systematic way to extract structure from GCM output


The Impact of Massive Data Sets

• Traditional Spatio-Temporal Data Analysis

– visualization, EDA: look at the maps, spot synoptic patterns

• But with Massive Data Sets…

– e.g., GCM: multivariate fields, high resolution, many years

– impossible to manually visualize

• Proposal

– pattern analysis and modeling can play an important role in data abstraction

– many new ideas and techniques for pattern modeling are now available


Data Abstraction Methods

• Simple Aggregation

• Basis Functions/Dimension Reduction

– EOFs/PCA, Wavelets, Kernels

• Latent Variable Models

– mixture models

– hidden variable models

• Local Spatial and/or Temporal Patterns

– e.g., trajectories, eddies, El Nino, etc

Less widely-used

Relatively widely-used


A Modeling Language: Graphical Models

$$p(A, B, C, D) = \prod_{x \in \{A, B, C, D\}} p(x \mid \mathrm{parents}(x)) = p(D \mid C)\, p(C \mid A)\, p(B \mid A)\, p(A)$$

joint distribution = product of local factors

[Figure: directed graph with nodes A, B, C, D and edges A → B, A → C, C → D]
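The factorization on this slide can be checked numerically. The sketch below is only an illustration: the variables are assumed to be binary and every probability table is a made-up value, not something from the talk. It builds the joint distribution as the product of local factors and computes one marginal by brute-force summation.

```python
from itertools import product

# Hypothetical conditional probability tables for binary variables (0/1).
p_A = {0: 0.6, 1: 0.4}
p_B_given_A = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.3, 1: 0.7}}  # indexed [a][b]
p_C_given_A = {0: {0: 0.8, 1: 0.2}, 1: {0: 0.5, 1: 0.5}}  # indexed [a][c]
p_D_given_C = {0: {0: 0.7, 1: 0.3}, 1: {0: 0.1, 1: 0.9}}  # indexed [c][d]


def joint(a, b, c, d):
    """p(A,B,C,D) = p(D|C) p(C|A) p(B|A) p(A)."""
    return p_D_given_C[c][d] * p_C_given_A[a][c] * p_B_given_A[a][b] * p_A[a]


# The product of local factors is a valid joint distribution: it sums to 1.
total = sum(joint(a, b, c, d) for a, b, c, d in product((0, 1), repeat=4))

# Marginal p(D = 1), obtained by summing out A, B and C.
p_D1 = sum(joint(a, b, c, 1) for a, b, c in product((0, 1), repeat=3))
```

Brute-force summation is exponential in the number of variables; the inference algorithms mentioned on the next slide exploit the graph structure to avoid that cost.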


Two Advantages of Graphical Models

• Communication

– clarifies independence relations in the multivariate model

• Computation (Inference)

– posterior probabilities can be calculated efficiently

• tree structure: linear in number of variables

• graph with loops: depends on clique structure

– completely general algorithms for inference exist

• e.g., see Lauritzen and Spiegelhalter, JRSS, 1988

• for more recent work see Learning in Graphical Models, M. I. Jordan (ed), MIT Press, 1999


The Hidden Markov Model

[Figure: chain of hidden states X1, X2, X3, …, XT over time, each Xt linked to an observed Yt]


The Hidden Markov Model

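[Figure: chain of hidden states X1, X2, X3, …, XT over time, each Xt linked to an observed Yt]

$$P(X, Y) = \prod_{t=1}^{T} p(x_t \mid x_{t-1})\, p(y_t \mid x_t)$$

where p(x_t | x_{t-1}) is the Markov chain term and p(y_t | x_t) is the conditional density; for t = 1, p(x_1 | x_0) is read as the initial distribution p(x_1).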


The Hidden Markov Model

• Standard Model

– Discrete X with m values; multivariate Y

• Inference and Estimation

– Estimation: Baum-Welch algorithm (uses EM)

– Inference: scales as O(m^2 T), linear in length of chain

• same as graphical models (Smyth, Heckerman, Jordan, 1997)

• What it is useful for:

– “compresses” high dimensional Y dependence into lower-dimensional X

– model dependence at X level rather than at Y level

– learned states can be viewed as dynamic clusters

– widely used in speech recognition
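The O(m^2 T) inference cost can be seen in the forward recursion, sketched below for the simplest case. This is a minimal illustration, assuming discrete observations rather than the multivariate Y of the slide; the function name and all parameter values are hypothetical.

```python
import math


def forward_log_likelihood(pi, A, B, obs):
    """Scaled forward recursion for a discrete-observation HMM.

    pi[i]   : initial probability of hidden state i
    A[i][j] : transition probability p(x_t = j | x_{t-1} = i)
    B[i][k] : emission probability p(y_t = k | x_t = i)
    obs     : observed symbol indices y_1..y_T
    Returns log p(y_1..y_T). Each time step costs O(m^2) for m states.
    """
    m = len(pi)
    alpha = [pi[i] * B[i][obs[0]] for i in range(m)]
    log_lik = 0.0
    for t in range(len(obs)):
        if t > 0:
            # Propagate through the Markov chain, then weight by the emission.
            alpha = [
                sum(alpha[i] * A[i][j] for i in range(m)) * B[j][obs[t]]
                for j in range(m)
            ]
        # Rescale to avoid underflow; the scale factors multiply to p(y_1..y_T).
        scale = sum(alpha)
        log_lik += math.log(scale)
        alpha = [a / scale for a in alpha]
    return log_lik


# Made-up two-state example.
pi = [0.6, 0.4]
A = [[0.7, 0.3], [0.2, 0.8]]
B = [[0.9, 0.1], [0.3, 0.7]]
log_lik = forward_log_likelihood(pi, A, B, [0, 1, 1])
```

Baum-Welch reuses this recursion (together with a matching backward pass) inside each EM iteration.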


Kalman Filter Models, etc

[Figure: chain of hidden states X1, X2, X3, …, XT over time, each Xt linked to an observed Yt]

If the X’s are real-valued, Gaussian => Kalman filter model

If p(Y|X) is tree-structured => spatio-temporal tree structure


Application: Coupling Precipitation and Atmospheric Models

• Problem

– separate models for precipitation and atmosphere over time

– how to couple both together into a single model?

• “downscaling”

• Hidden Markov approach

– Hughes, Guttorp, Charles (Applied Statistics, 1999)

– coupled data recorded on different time and space scales

– dependence is “compressed” into hidden Markov state dependence

– Nonhomogeneous in time:

• atmospheric measurements modulate Markov transitions


[Figure: graphical model over time with hidden weather states X1, X2, X3, …, XT, atmospheric measurements A1, A2, A3, …, AT, and precipitation P1, P2, P3, …, PT]


Precipitation Measurements

• Spatially irregular

• Daily totals (binarized)


Atmospheric Measurements

• Interpolated to regular grid

• SLP, temp, GH

• Twice/day


Joint Data


“Weather-state” model

• “Weather states”

– small discrete set of distinct weather states

– assumed to be Markov over time

– unobserved => hidden Markov model

– represent atmosphere by locally derived variables

• Spatial precipitation

– relatively simple autologistic model

– only dependent on weather state

• Algorithm “discovered” 6 physically plausible weather states

– validated out of sample

• Example of automated structure discovery?


Finite Mixture Models

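[Figure: hidden class variable C with an arrow to the observed variable Y]

$$P(Y) = \sum_{c} p(Y \mid c)\, p(c)$$

where p(Y | c) are the component densities and p(c) are the class probabilities.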


Finite Mixture Models

• Estimation

– Direct application of EM

• Uses

– density estimation,

• approximate p(Y) as linear combination of simple components

– model-based clustering

• interpret component models as clusters

• probabilistic membership of data points, overlap

• can use Bayesian methods, cross-validation to find K
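The “direct application of EM” can be sketched for a one-dimensional Gaussian mixture. This is a minimal illustration on synthetic data, not the procedure used in the applications that follow; the function names, the initial means, and the data-generating values are all assumptions.

```python
import math
import random


def normal_pdf(y, mean, var):
    return math.exp(-0.5 * (y - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def em_gmm_1d(data, means, n_iter=50):
    """EM for a 1-d Gaussian mixture; `means` are the initial component means."""
    k = len(means)
    means = list(means)
    variances = [1.0] * k
    weights = [1.0 / k] * k
    for _ in range(n_iter):
        # E-step: membership probabilities p(c | y) for every data point.
        resp = []
        for y in data:
            joint = [weights[c] * normal_pdf(y, means[c], variances[c]) for c in range(k)]
            total = sum(joint)
            resp.append([j / total for j in joint])
        # M-step: re-estimate weights, means and variances from the soft counts.
        for c in range(k):
            n_c = sum(r[c] for r in resp)
            weights[c] = n_c / len(data)
            means[c] = sum(r[c] * y for r, y in zip(resp, data)) / n_c
            sq_dev = sum(r[c] * (y - means[c]) ** 2 for r, y in zip(resp, data))
            variances[c] = max(sq_dev / n_c, 1e-6)  # floor avoids collapse to zero
    return weights, means, variances


# Synthetic data from two well-separated Gaussians (hypothetical values).
rng = random.Random(0)
data = [rng.gauss(0.0, 1.0) for _ in range(200)] + [rng.gauss(6.0, 1.0) for _ in range(200)]
weights, means, variances = em_gmm_1d(data, means=[1.0, 5.0])
```

The `resp` values are the probabilistic memberships mentioned above: points in the overlap region between components receive fractional membership in both clusters.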


Application: Clustering of Geopotential Height

[Figure: axes EOF1 and EOF2]


Application: Clustering of Geopotential Height

[Figure: axes EOF1 and EOF2]

3 Gaussian solution consistently chosen by cross-validation

Clusters agree with analysis of Cheng and Wallace (1995)

Smyth, Ide, Ghil, JAS 1999


Conditional Independence Mixture Models

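[Figure: hidden class variable C with arrows to observed variables Y1, Y2, …, Yd]

$$P(Y) = \sum_{c} p(Y \mid c)\, p(c) = \sum_{c} \Big( \prod_{i=1}^{d} p(Y_i \mid c) \Big)\, p(c)$$

where p(Y_i | c) are the component densities and p(c) are the class probabilities.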

Note: Y’s are marginally dependent: model dependence via C
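The note can be verified with a small numerical example. The sketch assumes two binary variables Y1 and Y2 and two classes, with made-up probabilities: given c the Y's are independent, yet after summing out c their joint probability differs from the product of the marginals.

```python
# Hypothetical parameters for a two-class conditional independence mixture.
p_c = [0.5, 0.5]            # class probabilities p(c)
p_y1_given_c = [0.9, 0.2]   # p(Y1 = 1 | c)
p_y2_given_c = [0.8, 0.1]   # p(Y2 = 1 | c)


def bernoulli(p, y):
    """Probability of outcome y (0 or 1) when p is the probability of 1."""
    return p if y == 1 else 1.0 - p


def p_joint(y1, y2):
    """p(Y1, Y2) = sum_c p(Y1 | c) p(Y2 | c) p(c)."""
    return sum(
        p_c[c] * bernoulli(p_y1_given_c[c], y1) * bernoulli(p_y2_given_c[c], y2)
        for c in range(len(p_c))
    )


p_y1 = p_joint(1, 0) + p_joint(1, 1)   # marginal p(Y1 = 1)
p_y2 = p_joint(0, 1) + p_joint(1, 1)   # marginal p(Y2 = 1)
p_both = p_joint(1, 1)                 # joint p(Y1 = 1, Y2 = 1)

# Marginal dependence: the joint differs from the product of the marginals.
dependent = abs(p_both - p_y1 * p_y2) > 1e-9
```

With these values p(Y1 = 1, Y2 = 1) is 0.37 while p(Y1 = 1) p(Y2 = 1) is 0.2475, so all of the dependence between the Y's is carried by the hidden class.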


Mixtures of PCA bases


Mixtures of PCA bases

Formulate a probabilistic model for PCA

Learn mixture of PCAs using EM (Tipping and Bishop, 1999)


Multiple Cause/Factor Models

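[Figure: two hidden variables C and D, each with arrows to observed variables Y1, Y2, …, Yd]

$$p(Y) = \sum_{c, d} p(Y \mid c, d)\, p(c)\, p(d) = \sum_{c, d} \Big( \prod_{i} p(Y_i \mid c, d) \Big)\, p(c)\, p(d)$$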

Intuition: Y’s are a result of multiple (hidden) factors

See Dunmur and Titterington (1999)


Summary so far on Mixture Models

• Mixture Model/Latent Variable models

– key idea is that hidden state is an abstraction/categorization

– probabilistic modeling allows a systematic approach

• many models can be expressed in graphical form

• parameters can be learned via EM

• model structure can be automatically chosen

– many exotic variations of these models being proposed in machine learning/neural network learning

– learning of hidden variables <=> discovery of structure


Clustering Objects from Sequential Observations

• Say we want to cluster eddies as a function of time-evolution of

– shape

– intensity

– position

– velocity, etc

• Two problems here:

– 1. “extract” eddy features (shape, etc) from raw grid data

– 2. How can we cluster these “objects”?

• different durations: how do we define distance?


Probabilistic Model-Based Approach

$$p(Y) = \sum_{c} p(Y \mid c)\, p(c)$$

Y could be a time-series, curve, sequence, etc

=> p(Y|c) is a density function on time-series, curves, etc

=> mixtures of density models for time-series, curves, etc

EM generalizes nicely => general framework for clustering objects (Cadez, Gaffney, Smyth, KDD 2000)


Clusters of Markov Behavior

[Figure: three Markov chain diagrams over states A, B, C, D, labeled Cluster 1, Cluster 2, Cluster 3]
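One way to realize clusters of Markov behavior is a mixture in which each component p(Y|c) is a first-order Markov chain, fitted with EM. The sketch below is an illustrative assumption and not the published algorithm: the function names, the smoothing constant, the toy sequences, and the initial memberships are all hypothetical. Sequences of different lengths pose no difficulty because each sequence is scored by its own likelihood.

```python
import math


def seq_log_lik(seq, init, trans):
    """Log-probability of a state sequence under one Markov chain."""
    ll = math.log(init[seq[0]])
    for a, b in zip(seq, seq[1:]):
        ll += math.log(trans[a][b])
    return ll


def em_markov_mixture(seqs, n_states, resp, n_iter=20, smooth=0.1):
    """resp[n][k] is the initial guess for p(cluster k | sequence n)."""
    n_clusters = len(resp[0])
    for _ in range(n_iter):
        # M-step: membership-weighted (and smoothed) transition counts.
        weights, inits, transs = [], [], []
        for k in range(n_clusters):
            init_counts = [smooth] * n_states
            trans_counts = [[smooth] * n_states for _ in range(n_states)]
            for seq, r in zip(seqs, resp):
                init_counts[seq[0]] += r[k]
                for a, b in zip(seq, seq[1:]):
                    trans_counts[a][b] += r[k]
            weights.append(sum(r[k] for r in resp) / len(seqs))
            inits.append([c / sum(init_counts) for c in init_counts])
            transs.append([[c / sum(row) for c in row] for row in trans_counts])
        # E-step: posterior cluster membership of each whole sequence.
        new_resp = []
        for seq in seqs:
            logs = [
                math.log(max(weights[k], 1e-300)) + seq_log_lik(seq, inits[k], transs[k])
                for k in range(n_clusters)
            ]
            top = max(logs)
            probs = [math.exp(v - top) for v in logs]
            new_resp.append([p / sum(probs) for p in probs])
        resp = new_resp
    return weights, transs, resp


# Toy data: three "sticky" sequences and three alternating ones (states 0/1).
seqs = [
    [0, 0, 0, 0, 1, 1, 1, 1], [1, 1, 1, 0, 0, 0], [0, 0, 0, 0, 0, 1, 1],
    [0, 1, 0, 1, 0, 1], [1, 0, 1, 0, 1, 0, 1, 0], [0, 1, 0, 1, 0],
]
resp0 = [[0.9, 0.1], [0.5, 0.5], [0.5, 0.5], [0.1, 0.9], [0.5, 0.5], [0.5, 0.5]]
weights, transs, resp = em_markov_mixture(seqs, n_states=2, resp=resp0)
```

In this toy run the first cluster learns high self-transition probabilities and the second learns high switching probabilities, so each cluster is summarized by its own transition diagram.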


Mixtures of Curves

• Regression Clustering

– model each curve as a regression function f(y|x)

– hypothesize a generative model

• probability pk of being chosen for cluster k

• given cluster k, a noisy version of fk(y|x) is generated

– mixture model, can learn the K noisy functions using EM algorithm

– (Gaffney and Smyth, KDD99)

– significant improvement on k-means

• variable length trajectories, multi-dimensional trajectories

– can use non-parametric kernel regression for component models
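The generative model above can be sketched with straight lines as the cluster functions. This is a simplified illustration, assuming one-dimensional linear regression components with Gaussian noise; the function names, the toy curves, and the initial memberships are hypothetical, and the cited paper should be consulted for the actual method.

```python
import math


def em_regression_mixture(curves, resp, n_iter=20):
    """curves: list of curves, each a list of (x, y) points (lengths may differ).
    resp[n][k]: initial guess for p(cluster k | curve n).
    Cluster k generates y = b0 + b1 * x + Gaussian noise with variance var.
    """
    n_clusters = len(resp[0])
    for _ in range(n_iter):
        # M-step: weighted least squares, each point weighted by its curve's membership.
        params = []
        for k in range(n_clusters):
            sw = sx = sy = sxx = sxy = 0.0
            for curve, r in zip(curves, resp):
                for x, y in curve:
                    sw += r[k]
                    sx += r[k] * x
                    sy += r[k] * y
                    sxx += r[k] * x * x
                    sxy += r[k] * x * y
            b1 = (sw * sxy - sx * sy) / (sw * sxx - sx * sx)
            b0 = (sy - b1 * sx) / sw
            sse = sum(
                r[k] * (y - b0 - b1 * x) ** 2
                for curve, r in zip(curves, resp) for x, y in curve
            )
            weight = sum(r[k] for r in resp) / len(curves)
            params.append((weight, b0, b1, max(sse / sw, 1e-6)))
        # E-step: membership of each whole curve (product over its points).
        new_resp = []
        for curve in curves:
            logs = []
            for weight, b0, b1, var in params:
                ll = math.log(max(weight, 1e-300))
                for x, y in curve:
                    ll += -0.5 * math.log(2.0 * math.pi * var) - 0.5 * (y - b0 - b1 * x) ** 2 / var
                logs.append(ll)
            top = max(logs)
            probs = [math.exp(v - top) for v in logs]
            new_resp.append([p / sum(probs) for p in probs])
        resp = new_resp
    return params, resp


def make_curve(b0, b1, length):
    """Toy curve: a line plus a small deterministic wiggle standing in for noise."""
    return [(float(x), b0 + b1 * x + 0.1 * (-1) ** x) for x in range(length)]


curves = [make_curve(1.0, 2.0, 5), make_curve(1.0, 2.0, 8),
          make_curve(10.0, -1.0, 6), make_curve(10.0, -1.0, 4)]
resp0 = [[0.9, 0.1], [0.5, 0.5], [0.1, 0.9], [0.5, 0.5]]
params, resp = em_regression_mixture(curves, resp0)
```

Because membership is assigned per curve rather than per point, curves of different lengths are handled without resampling them to a fixed-length vector.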


[Figure: three panels plotted against TIME (0 to 30). Panel 1: “Trajectories of centroids of moving hand in video streams”, X-position vs. time. Panel 2: “Estimated cluster trajectory”, Y-position vs. time. Panel 3: “Estimated cluster trajectory”, X-position vs. time.]


Detecting and Clustering Cyclone Trajectories

• Background

– extra-tropical cyclone-center trajectories detected as (x,y) functions of time

– North Atlantic data clustered into 3 distinct groups by Blender et al (QJRMS, 1997)

– clusters have distinct physical interpretation, allow for “higher-level” analysis of data, e.g., state transitions

• Limitations

– (x,y) trajectories treated as fixed-length vectors so that vector-based clustering can be used (k-means)

– forces all trajectories to be of same length, ignores smoothness


[Figure: simulations from a 2-component mixture of 2d AR(2) models]


Modeling an Object’s Shape

• Parametric Template Model for Shape of Interest

– e.g., boundary template modeled as smooth parametric function


Deformable Templates

• Probabilistic Interpretation

– mean shape

– spatial variability about mean shape

– defines a density in shape space (Dryden and Mardia, 1998)


Deformable Templates

• A probabilistic model enables many applications

– object recognition: what is the probability under the model?

– spatial segmentation based on both shape and intensity

– matching/registration

– principal component directions

– estimation of shape parameters from data

– evolution of shape parameters over time

– clusters, mixtures of shapes

– compositional hierarchies of shapes

– provides a sound statistical foundation for shape analysis

• applications: automated analysis of medical images

– probabilistic approach means it can be coupled to other spatial and temporal models


Example: Pattern-Matching in Time Series

Problem: “Find similar patterns to this one in a time-series archive”


Example: Pattern-Matching in Time Series

Problem: “Find similar patterns to this one in a time-series archive”

Is this similar?


Model-Based Approach: 1d Deformable Templates

Segmental hidden semi-Markov model (Ge and Smyth, KDD 2000)

Detection via “maximum likelihood parsing”

[Figure: chain of states S1, S2, …, ST shown above the segments of the time series]


Pattern-Based End-Point Detection

[Figure: two panels over TIME (SECONDS), 0 to 400: “Original Pattern” and “Detected Pattern”]

End-Point Detection in Semiconductor Manufacturing


Heterogeneity among Objects


Form of Population Density


Mixture Models, No Variability

This is in effect the model we used for clustering sequences, curves, etc., earlier


Potential Application: Storm Tracking

Observed Data (past storms)

Parameters for Individual Storms

Population Density in Parameter Space

New Data


Software Tools for Model Building

• There is a bewildering number of possible models

– concept of “data analysis strategy”

– branching factor is very high

– all the modeling comes to naught unless scientists can use it!

• Desirable to have “toolkits” that scientists can use on their own

– graphical models are a start, although perhaps not ideal

• use graphical models as a language for model representation

• details of estimation (EM, Bayes, etc) are hidden from user

• probabilistic representation language allows “plug and play”

• see BUGS for a Bayesian version of this idea

• see Buntine et al, KDD99, for “algorithm compilers”


Conclusions

• Motivation: grid-level -> structures, patterns

• Patterns can be described, modeled, and analyzed statistically

– latent variable models

– hidden Markov models

– deformable templates

– hierarchical models

• Significant recent work in pattern recognition, neural networks, machine learning on these topics

– recent emphasis on probabilistic formalisms

• Need more effort in transferring to science applications

– systematic model-building framework/tools

– education