Object Orie’d Data Analysis, Last Time •Clustering –Quantify with Cluster Index –Simple 1-d...

77
Object Orie’d Data Analysis, Last Time • Clustering – Quantify with Cluster Index – Simple 1-d examples – Local mininizers – Impact of outliers • SigClust – When are clusters really there? – Gaussian Null Distribution – Which Gaussian for HDLSS settings?

Transcript of Object Orie’d Data Analysis, Last Time •Clustering –Quantify with Cluster Index –Simple 1-d...

Page 1: Object Orie’d Data Analysis, Last Time •Clustering –Quantify with Cluster Index –Simple 1-d examples –Local mininizers –Impact of outliers •SigClust –When.

Object Orie’d Data Analysis, Last Time

• Clustering– Quantify with Cluster Index

– Simple 1-d examples

– Local mininizers

– Impact of outliers

• SigClust– When are clusters really there?

– Gaussian Null Distribution

– Which Gaussian for HDLSS settings?

Page 2: Object Orie’d Data Analysis, Last Time •Clustering –Quantify with Cluster Index –Simple 1-d examples –Local mininizers –Impact of outliers •SigClust –When.

2-means Clustering

Page 3: Object Orie’d Data Analysis, Last Time •Clustering –Quantify with Cluster Index –Simple 1-d examples –Local mininizers –Impact of outliers •SigClust –When.

2-means Clustering

Page 4: Object Orie’d Data Analysis, Last Time •Clustering –Quantify with Cluster Index –Simple 1-d examples –Local mininizers –Impact of outliers •SigClust –When.

2-means Clustering

Page 5: Object Orie’d Data Analysis, Last Time •Clustering –Quantify with Cluster Index –Simple 1-d examples –Local mininizers –Impact of outliers •SigClust –When.

2-means Clustering

Page 6: Object Orie’d Data Analysis, Last Time •Clustering –Quantify with Cluster Index –Simple 1-d examples –Local mininizers –Impact of outliers •SigClust –When.

SigClust

• Statistical Significance of Clusters• in HDLSS Data• When is a cluster “really there”?

From Liu, et al. (2007)

Page 7: Object Orie’d Data Analysis, Last Time •Clustering –Quantify with Cluster Index –Simple 1-d examples –Local mininizers –Impact of outliers •SigClust –When.

Interesting Statistical Problem

For HDLSS data:When clusters seem to appear

E.g. found by clustering method

How do we know they are really there?Question asked by Neil Hayes

Define appropriate statistical significance?

Can we calculate it?

Page 8: Object Orie’d Data Analysis, Last Time •Clustering –Quantify with Cluster Index –Simple 1-d examples –Local mininizers –Impact of outliers •SigClust –When.

Simple Gaussian Example

Clearly only 1 Cluster in this ExampleBut Extreme Relabelling looks differentExtreme T-stat strongly significantIndicates 2 clusters in data

Page 9: Object Orie’d Data Analysis, Last Time •Clustering –Quantify with Cluster Index –Simple 1-d examples –Local mininizers –Impact of outliers •SigClust –When.

Statistical Significance of Clusters

Basis of SigClust Approach:

What defines: A Cluster?A Gaussian distribution (Sarle & Kou 1993)

So define SigClust test based on:2-means cluster index (measure) as statisticGaussian null distributionCurrently compute by simulationPossible to do this analytically???

Page 10: Object Orie’d Data Analysis, Last Time •Clustering –Quantify with Cluster Index –Simple 1-d examples –Local mininizers –Impact of outliers •SigClust –When.

SigClust Statistic – 2-Means Cluster Index

Measure of non-Gaussianity:

2-means Cluster Index:

Class Index Sets Class Means

“Within Class Var’n” / “Total Var’n”

22

1

2

1

|| ||

,|| ||

k

k

cj

k j C

n

jj

x x

CIx x

Page 11: Object Orie’d Data Analysis, Last Time •Clustering –Quantify with Cluster Index –Simple 1-d examples –Local mininizers –Impact of outliers •SigClust –When.

SigClust Gaussian null distribut’n

Estimated Mean (of Gaussian dist’n)?1st Key Idea: Can ignore thisBy appealing to shift invariance of CI

When data are (rigidly) shiftedCI remains the same

So enough to simulate with mean 0Other uses of invariance ideas?

Page 12: Object Orie’d Data Analysis, Last Time •Clustering –Quantify with Cluster Index –Simple 1-d examples –Local mininizers –Impact of outliers •SigClust –When.

SigClust Gaussian null distribut’n

Challenge: how to estimate cov. Matrix?Number of parameters:E.g. Perou 500 data:

Dimension

so

But Sample Size Impossible in HDLSS settings????

Way around this problem?

2

)1( dd

533n

46,797,9752

)1(

dd9674d

Page 13: Object Orie’d Data Analysis, Last Time •Clustering –Quantify with Cluster Index –Simple 1-d examples –Local mininizers –Impact of outliers •SigClust –When.

SigClust Gaussian null distribut’n

2nd Key Idea: Mod Out RotationsReplace full Cov. by diagonal matrixAs done in PCA eigen-analysis

But then “not like data”???OK, since k-means clustering (i.e. CI) is

rotation invariant

(assuming e.g. Euclidean Distance)

tMDM

Page 14: Object Orie’d Data Analysis, Last Time •Clustering –Quantify with Cluster Index –Simple 1-d examples –Local mininizers –Impact of outliers •SigClust –When.

SigClust Gaussian null distribut’n

2nd Key Idea: Mod Out Rotations

Only need to estimate diagonal matrix

But still have HDLSS problems?

E.g. Perou 500 data:

Dimension

Sample Size

Still need to estimate param’s9674d

533n9674d

Page 15: Object Orie’d Data Analysis, Last Time •Clustering –Quantify with Cluster Index –Simple 1-d examples –Local mininizers –Impact of outliers •SigClust –When.

SigClust Gaussian null distribut’n

3rd Key Idea: Factor Analysis Model

Model Covariance as: Biology + Noise

Where

is “fairly low dimensional”

is estimated from background noise2NB

INB 2

Page 16: Object Orie’d Data Analysis, Last Time •Clustering –Quantify with Cluster Index –Simple 1-d examples –Local mininizers –Impact of outliers •SigClust –When.

SigClust Gaussian null distribut’n

Estimation of Background Noise :

Reasonable model (for each gene):

Expression = Signal + Noise

“noise” is roughly Gaussian

“noise” terms essentially independent

(across genes)

2N

Page 17: Object Orie’d Data Analysis, Last Time •Clustering –Quantify with Cluster Index –Simple 1-d examples –Local mininizers –Impact of outliers •SigClust –When.

SigClust Gaussian null distribut’n

Estimation of Background Noise :

Model OK, since

data come from

light intensities

at colored spots

2N

Page 18: Object Orie’d Data Analysis, Last Time •Clustering –Quantify with Cluster Index –Simple 1-d examples –Local mininizers –Impact of outliers •SigClust –When.

SigClust Gaussian null distribut’n

Estimation of Background Noise :

For all expression values (as numbers)

Use robust estimate of scale

Median Absolute Deviation (MAD)

(from the median)

Rescale to put on same scale as s. d.:

2N

)1,0(

ˆN

data

MADMAD

Page 19: Object Orie’d Data Analysis, Last Time •Clustering –Quantify with Cluster Index –Simple 1-d examples –Local mininizers –Impact of outliers •SigClust –When.

SigClust Estimation of Background Noise

Page 20: Object Orie’d Data Analysis, Last Time •Clustering –Quantify with Cluster Index –Simple 1-d examples –Local mininizers –Impact of outliers •SigClust –When.

Q-Q plots

An aside:

Fitting probability distributions to data

• Does Gaussian distribution “fit”???

• If not, why not?

• Fit in some part of the distribution?

(e.g. in the middle only?)

Page 21: Object Orie’d Data Analysis, Last Time •Clustering –Quantify with Cluster Index –Simple 1-d examples –Local mininizers –Impact of outliers •SigClust –When.

Q-Q plotsApproaches to:

Fitting probability distributions to data

• Histograms

• Kernel Density Estimates

Drawbacks: often not best view

(for determining goodness of fit)

Page 22: Object Orie’d Data Analysis, Last Time •Clustering –Quantify with Cluster Index –Simple 1-d examples –Local mininizers –Impact of outliers •SigClust –When.

Q-Q plotsSimple Toy Example, non-Gaussian!

Page 23: Object Orie’d Data Analysis, Last Time •Clustering –Quantify with Cluster Index –Simple 1-d examples –Local mininizers –Impact of outliers •SigClust –When.

Q-Q plotsSimple Toy Example, non-Gaussian(?)

Page 24: Object Orie’d Data Analysis, Last Time •Clustering –Quantify with Cluster Index –Simple 1-d examples –Local mininizers –Impact of outliers •SigClust –When.

Q-Q plotsSimple Toy Example, Gaussian

Page 25: Object Orie’d Data Analysis, Last Time •Clustering –Quantify with Cluster Index –Simple 1-d examples –Local mininizers –Impact of outliers •SigClust –When.

Q-Q plotsSimple Toy Example, Gaussian?

Page 26: Object Orie’d Data Analysis, Last Time •Clustering –Quantify with Cluster Index –Simple 1-d examples –Local mininizers –Impact of outliers •SigClust –When.

Q-Q plotsNotes:

• Bimodal see non-Gaussian with histo

• Other cases: hard to see

• Conclude:

Histogram poor at assessing Gauss’ity

Kernel density estimate any better?

Page 27: Object Orie’d Data Analysis, Last Time •Clustering –Quantify with Cluster Index –Simple 1-d examples –Local mininizers –Impact of outliers •SigClust –When.

Q-Q plotsKernel Density Estimate, non-Gaussian!

Page 28: Object Orie’d Data Analysis, Last Time •Clustering –Quantify with Cluster Index –Simple 1-d examples –Local mininizers –Impact of outliers •SigClust –When.

Q-Q plotsKernel Density Estimate, Gaussian

Page 29: Object Orie’d Data Analysis, Last Time •Clustering –Quantify with Cluster Index –Simple 1-d examples –Local mininizers –Impact of outliers •SigClust –When.

Q-Q plotsKDE (undersmoothed), Gaussian

Page 30: Object Orie’d Data Analysis, Last Time •Clustering –Quantify with Cluster Index –Simple 1-d examples –Local mininizers –Impact of outliers •SigClust –When.

Q-Q plotsKDE (oversmoothed), Gaussian

Page 31: Object Orie’d Data Analysis, Last Time •Clustering –Quantify with Cluster Index –Simple 1-d examples –Local mininizers –Impact of outliers •SigClust –When.

Q-Q plotsKernel Density Estimate, Gaussian

Page 32: Object Orie’d Data Analysis, Last Time •Clustering –Quantify with Cluster Index –Simple 1-d examples –Local mininizers –Impact of outliers •SigClust –When.

Q-Q plotsKernel Density Estimate, Gaussian?

Page 33: Object Orie’d Data Analysis, Last Time •Clustering –Quantify with Cluster Index –Simple 1-d examples –Local mininizers –Impact of outliers •SigClust –When.

Q-Q plotsHistogram poor at assessing Gauss’ity

Kernel density estimate any better?

• Unfortunately doesn’t seem to be

• Really need a better viewpoint

Interesting to compare to:

• Gaussian Distribution

• Fit by Maximum Likelihood (avg. & s.d.)

Page 34: Object Orie’d Data Analysis, Last Time •Clustering –Quantify with Cluster Index –Simple 1-d examples –Local mininizers –Impact of outliers •SigClust –When.

Q-Q plotsKDE vs. Max. Lik. Gaussian Fit, Gaussian?

Page 35: Object Orie’d Data Analysis, Last Time •Clustering –Quantify with Cluster Index –Simple 1-d examples –Local mininizers –Impact of outliers •SigClust –When.

Q-Q plotsKDE vs. Max. Lik. Gaussian Fit, Gaussian?

• Looks OK?

• Many might think so…

• Strange feature:

– Peak higher than Gaussian fit

– Usually lower, due to smoothing bias

– Suggests non-Gaussian?

• Dare to conclude non-Gaussian?

Page 36: Object Orie’d Data Analysis, Last Time •Clustering –Quantify with Cluster Index –Simple 1-d examples –Local mininizers –Impact of outliers •SigClust –When.

Q-Q plotsKDE vs. Max. Lik. Gaussian Fit, Gaussian

Page 37: Object Orie’d Data Analysis, Last Time •Clustering –Quantify with Cluster Index –Simple 1-d examples –Local mininizers –Impact of outliers •SigClust –When.

Q-Q plotsKDE vs. Max. Lik. Gaussian Fit, Gaussian

• Substantially more noise

– Because of smaller sample size

– n is only 1000 …

• Peak is lower than Gaussian fit

– Consistent with Gaussianity

• Weak view for assessing Gaussianity

Page 38: Object Orie’d Data Analysis, Last Time •Clustering –Quantify with Cluster Index –Simple 1-d examples –Local mininizers –Impact of outliers •SigClust –When.

Q-Q plotsKDE vs. Max. Lik. Gauss., non-Gaussian(?)

Page 39: Object Orie’d Data Analysis, Last Time •Clustering –Quantify with Cluster Index –Simple 1-d examples –Local mininizers –Impact of outliers •SigClust –When.

Q-Q plotsKDE vs. Max. Lik. Gau’n Fit, non-

Gaussian(?)

• Still have large noise

• But peak clearly way too high

• Seems can conclude non-Gaussian???

Page 40: Object Orie’d Data Analysis, Last Time •Clustering –Quantify with Cluster Index –Simple 1-d examples –Local mininizers –Impact of outliers •SigClust –When.

Q-Q plots• Conclusion:

KDE poor for assessing Gaussianity

How about a SiZer approach?

Page 41: Object Orie’d Data Analysis, Last Time •Clustering –Quantify with Cluster Index –Simple 1-d examples –Local mininizers –Impact of outliers •SigClust –When.

Q-Q plotsSiZer Analysis, non-Gaussian(?)

Page 42: Object Orie’d Data Analysis, Last Time •Clustering –Quantify with Cluster Index –Simple 1-d examples –Local mininizers –Impact of outliers •SigClust –When.

Q-Q plotsSiZer Analysis, non-Gaussian(?)

• Can only find one mode

• Consistent with Gaussianity

• But no guarantee

• Multi-modal non-Gaussianity

• But converse is not true

• SiZer is good at finding multi-modality

• SiZer is poor at checking Gaussianity

Page 43: Object Orie’d Data Analysis, Last Time •Clustering –Quantify with Cluster Index –Simple 1-d examples –Local mininizers –Impact of outliers •SigClust –When.

Q-Q plotsStandard approach to checking Gaussianity

• QQ – plots

Background:

Graphical Goodness of Fit

Fisher (1983)

Page 44: Object Orie’d Data Analysis, Last Time •Clustering –Quantify with Cluster Index –Simple 1-d examples –Local mininizers –Impact of outliers •SigClust –When.

Q-Q plots

Background: Graphical Goodness of Fit

Basis:   

Cumulative Distribution Function  (CDF)

Probability quantile notation:

for "probability” and "quantile" 

xXPxF

p q

qFp pFq 1

Page 45: Object Orie’d Data Analysis, Last Time •Clustering –Quantify with Cluster Index –Simple 1-d examples –Local mininizers –Impact of outliers •SigClust –When.

Q-Q plots

Probability quantile notation:

for "probability” and "quantile“

Thus is called the quantile function  1F

p q

qFp pFq 1

Page 46: Object Orie’d Data Analysis, Last Time •Clustering –Quantify with Cluster Index –Simple 1-d examples –Local mininizers –Impact of outliers •SigClust –When.

Q-Q plots

Two types of CDF:

1.Theoretical

2.Empirical, based on data nXX ,,1

qXPqFp

n

qXiqFp i

:#ˆˆ

Page 47: Object Orie’d Data Analysis, Last Time •Clustering –Quantify with Cluster Index –Simple 1-d examples –Local mininizers –Impact of outliers •SigClust –When.

Q-Q plots

Direct Visualizations:

1. Empirical CDF plot:

plot

vs. grid of (sorted data) values

2. Quantile plot (inverse):

plot

vs.

nin

ip ,,1,ˆ

Page 48: Object Orie’d Data Analysis, Last Time •Clustering –Quantify with Cluster Index –Simple 1-d examples –Local mininizers –Impact of outliers •SigClust –When.

Q-Q plots

Comparison Visualizations:

(compare a theoretical with an empirical)

3.P-P plot:

plot vs.

for a grid of values

4.Q-Q plot:

plot vs.

for a grid of values

q

p̂ p

p

q

Page 49: Object Orie’d Data Analysis, Last Time •Clustering –Quantify with Cluster Index –Simple 1-d examples –Local mininizers –Impact of outliers •SigClust –When.

Q-Q plotsIllustrative graphic (toy data set):

Page 50: Object Orie’d Data Analysis, Last Time •Clustering –Quantify with Cluster Index –Simple 1-d examples –Local mininizers –Impact of outliers •SigClust –When.

Q-Q plotsEmpirical Quantiles (sorted data points)

1q̂2q̂

3q̂

4q̂5q̂

Page 51: Object Orie’d Data Analysis, Last Time •Clustering –Quantify with Cluster Index –Simple 1-d examples –Local mininizers –Impact of outliers •SigClust –When.

Q-Q plotsCorresponding ( matched) Theoretical Quantiles

1q̂2q̂

3q̂

4q̂5q̂ 1q

2q

3q

4q

5q

p

Page 52: Object Orie’d Data Analysis, Last Time •Clustering –Quantify with Cluster Index –Simple 1-d examples –Local mininizers –Impact of outliers •SigClust –When.

Q-Q plotsIllustrative graphic (toy data set):

Main goal of Q-Q Plot:

Display how well quantiles compare

vs.iq̂ iq ni ,,1

Page 53: Object Orie’d Data Analysis, Last Time •Clustering –Quantify with Cluster Index –Simple 1-d examples –Local mininizers –Impact of outliers •SigClust –When.

Q-Q plotsIllustrative graphic (toy data set):

1q̂2q̂

3q̂

4q̂5q̂ 1q

2q

3q

4q

5q

Page 54: Object Orie’d Data Analysis, Last Time •Clustering –Quantify with Cluster Index –Simple 1-d examples –Local mininizers –Impact of outliers •SigClust –When.

Q-Q plotsIllustrative graphic (toy data set):

1q̂2q̂

3q̂

4q̂5q̂ 1q

2q

3q

4q

5q

Page 55: Object Orie’d Data Analysis, Last Time •Clustering –Quantify with Cluster Index –Simple 1-d examples –Local mininizers –Impact of outliers •SigClust –When.

Q-Q plotsIllustrative graphic (toy data set):

1q̂2q̂

3q̂

4q̂5q̂ 1q

2q

3q

4q

5q

Page 56: Object Orie’d Data Analysis, Last Time •Clustering –Quantify with Cluster Index –Simple 1-d examples –Local mininizers –Impact of outliers •SigClust –When.

Q-Q plotsIllustrative graphic (toy data set):

1q̂2q̂

3q̂

4q̂5q̂ 1q

2q

3q

4q

5q

Page 57: Object Orie’d Data Analysis, Last Time •Clustering –Quantify with Cluster Index –Simple 1-d examples –Local mininizers –Impact of outliers •SigClust –When.

Q-Q plotsIllustrative graphic (toy data set):

1q̂2q̂

3q̂

4q̂5q̂ 1q

2q

3q

4q

5q

Page 58: Object Orie’d Data Analysis, Last Time •Clustering –Quantify with Cluster Index –Simple 1-d examples –Local mininizers –Impact of outliers •SigClust –When.

Q-Q plotsIllustrative graphic (toy data set):

Page 59: Object Orie’d Data Analysis, Last Time •Clustering –Quantify with Cluster Index –Simple 1-d examples –Local mininizers –Impact of outliers •SigClust –When.

Q-Q plotsIllustrative graphic (toy data set):

Empirical Qs near Theoretical Qs

when

Q-Q curve is near 450 line

(general use of Q-Q plots)

Page 60: Object Orie’d Data Analysis, Last Time •Clustering –Quantify with Cluster Index –Simple 1-d examples –Local mininizers –Impact of outliers •SigClust –When.

Q-Q plotsnon-Gaussian! departures from line?

Page 61: Object Orie’d Data Analysis, Last Time •Clustering –Quantify with Cluster Index –Simple 1-d examples –Local mininizers –Impact of outliers •SigClust –When.

Q-Q plotsnon-Gaussian! departures from line?

• Seems different from line?

• 2 modes turn into wiggles?

• Less strong feature

• Been proposed to study modality

• But density view + SiZer is

much better for finding modality

Page 62: Object Orie’d Data Analysis, Last Time •Clustering –Quantify with Cluster Index –Simple 1-d examples –Local mininizers –Impact of outliers •SigClust –When.

Q-Q plotsnon-Gaussian (?) departures from line?

Page 63: Object Orie’d Data Analysis, Last Time •Clustering –Quantify with Cluster Index –Simple 1-d examples –Local mininizers –Impact of outliers •SigClust –When.

Q-Q plotsnon-Gaussian (?) departures from line?

• Seems different from line?

• Harder to say this time?

• What is signal & what is noise?

• Need to understand sampling variation

Page 64: Object Orie’d Data Analysis, Last Time •Clustering –Quantify with Cluster Index –Simple 1-d examples –Local mininizers –Impact of outliers •SigClust –When.

Q-Q plotsGaussian? departures from line?

Page 65: Object Orie’d Data Analysis, Last Time •Clustering –Quantify with Cluster Index –Simple 1-d examples –Local mininizers –Impact of outliers •SigClust –When.

Q-Q plotsGaussian? departures from line?

• Looks much like?

• Wiggles all random variation?

• But there are n = 10,000 data points…

• How to assess signal & noise?

• Need to understand sampling variation

Page 66: Object Orie’d Data Analysis, Last Time •Clustering –Quantify with Cluster Index –Simple 1-d examples –Local mininizers –Impact of outliers •SigClust –When.

Q-Q plotsNeed to understand sampling variation

• Approach: Q-Q envelope plot

– Simulate from Theoretical Dist’n

– Samples of same size

– About 100 samples gives

“good visual impression”

– Overlay resulting 100 QQ-curves

– To visually convey natural sampling variation

Page 67: Object Orie’d Data Analysis, Last Time •Clustering –Quantify with Cluster Index –Simple 1-d examples –Local mininizers –Impact of outliers •SigClust –When.

Q-Q plotsnon-Gaussian! departures from line?

Page 68: Object Orie’d Data Analysis, Last Time •Clustering –Quantify with Cluster Index –Simple 1-d examples –Local mininizers –Impact of outliers •SigClust –When.

Q-Q plotsnon-Gaussian! departures from line?

• Envelope Plot shows:

• Departures are significant

• Clear these data are not Gaussian

• Q-Q plot gives clear indication

Page 69: Object Orie’d Data Analysis, Last Time •Clustering –Quantify with Cluster Index –Simple 1-d examples –Local mininizers –Impact of outliers •SigClust –When.

Q-Q plotsnon-Gaussian (?) departures from line?

Page 70: Object Orie’d Data Analysis, Last Time •Clustering –Quantify with Cluster Index –Simple 1-d examples –Local mininizers –Impact of outliers •SigClust –When.

Q-Q plotsnon-Gaussian (?) departures from line?

• Envelope Plot shows:

• Departures are significant

• Clear these data are not Gaussian

• Recall not so clear from e.g. histogram

• Q-Q plot gives clear indication

• Envelope plot reflects sampling variation

Page 71: Object Orie’d Data Analysis, Last Time •Clustering –Quantify with Cluster Index –Simple 1-d examples –Local mininizers –Impact of outliers •SigClust –When.

Q-Q plotsGaussian? departures from line?

Page 72: Object Orie’d Data Analysis, Last Time •Clustering –Quantify with Cluster Index –Simple 1-d examples –Local mininizers –Impact of outliers •SigClust –When.

Q-Q plotsGaussian? departures from line?

• Harder to see

• But clearly there

• Conclude non-Gaussian

• Really needed n = 10,000 data points…

(why bigger sample size was used)

• Envelope plot reflects sampling variation

Page 73: Object Orie’d Data Analysis, Last Time •Clustering –Quantify with Cluster Index –Simple 1-d examples –Local mininizers –Impact of outliers •SigClust –When.

Q-Q plotsWhat were these distributions?

• Non-Gaussian!

– 0.5 N(-1.5,0.752) + 0.5 N(1.5,0.752)

• Non-Gaussian (?)

– 0.4 N(0,1) + 0.3 N(0,0.52) + 0.3 N(0,0.252)

• Gaussian

• Gaussian?

– 0.7 N(0,1) + 0.3 N(0,0.52)

Page 74: Object Orie’d Data Analysis, Last Time •Clustering –Quantify with Cluster Index –Simple 1-d examples –Local mininizers –Impact of outliers •SigClust –When.

Q-Q plotsNon-Gaussian! .5 N(-1.5,0.752) + 0.5 N(1.5,0.752)

Page 75: Object Orie’d Data Analysis, Last Time •Clustering –Quantify with Cluster Index –Simple 1-d examples –Local mininizers –Impact of outliers •SigClust –When.

Q-Q plotsNon-Gaussian (?) 0.4 N(0,1) + 0.3 N(0,0.52) + 0.3 N(0,0.252)

Page 76: Object Orie’d Data Analysis, Last Time •Clustering –Quantify with Cluster Index –Simple 1-d examples –Local mininizers –Impact of outliers •SigClust –When.

Q-Q plotsGaussian

Page 77: Object Orie’d Data Analysis, Last Time •Clustering –Quantify with Cluster Index –Simple 1-d examples –Local mininizers –Impact of outliers •SigClust –When.

Q-Q plotsGaussian? 0.7 N(0,1) + 0.3 N(0,0.52)