02 Murphy Multi Variate Distanc


  • Multivariate Distance and Similarity. Robert F. Murphy, Cytometry Development Workshop 2000

  • General Multivariate Dataset: We are given values of p variables for n independent observations. Construct an n x p matrix M consisting of vectors $X_1$ through $X_n$, each of length p.

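  • Multivariate Sample Mean: Define the mean vector $\bar{X}$ of length p.

    Matrix notation: $\bar{X}_j = \frac{1}{n} \sum_{i=1}^{n} M_{ij}$, or vector notation: $\bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i$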
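The mean vector can be sketched in a few lines of plain Python. The data values and the function name `mean_vector` are hypothetical, chosen only for illustration; each row of `M` is one observation and each column is one variable.

```python
# Hypothetical data: n = 4 observations (rows) of p = 2 variables (columns).
M = [
    [1.0, 2.0],
    [2.0, 4.0],
    [3.0, 6.0],
    [4.0, 8.0],
]


def mean_vector(M):
    """Return the length-p vector of column means of an n x p matrix."""
    n = len(M)     # number of observations
    p = len(M[0])  # number of variables
    return [sum(row[j] for row in M) / n for j in range(p)]


x_bar = mean_vector(M)
print(x_bar)  # [2.5, 5.0]
```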

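  • Multivariate Variance: Define the variance vector $s^2$ of length p, whose j-th element $s_j^2$ is the variance of variable j over the n observations. Like the mean, it can be written in matrix notation or in vector notation.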
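As a minimal sketch, the variance vector can be computed one column at a time. The transcript does not preserve the slide's formula, so the code below assumes the usual sample convention with an n - 1 denominator, $s_j^2 = \frac{1}{n-1} \sum_{i=1}^{n} (M_{ij} - \bar{X}_j)^2$. The data and the name `variance_vector` are hypothetical.

```python
# Hypothetical data: n = 4 observations (rows) of p = 2 variables (columns).
M = [
    [1.0, 2.0],
    [2.0, 1.0],
    [3.0, 4.0],
    [4.0, 3.0],
]


def variance_vector(M):
    """Return the length-p vector of per-variable sample variances.

    Assumption: the n - 1 (sample) denominator is used.
    """
    n = len(M)
    p = len(M[0])
    means = [sum(row[j] for row in M) / n for j in range(p)]
    return [
        sum((row[j] - means[j]) ** 2 for row in M) / (n - 1)
        for j in range(p)
    ]


s2 = variance_vector(M)
print([round(v, 3) for v in s2])  # [1.667, 1.667]
```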

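  • Covariance Matrix: Define a p x p matrix cov (called the covariance matrix), analogous to $s^2$, whose element $\mathrm{cov}_{jk}$ is the covariance between variables j and k.

  • Covariance Matrix: Note that the covariance of a variable with itself is simply the variance of that variable, so the diagonal elements are $\mathrm{cov}_{jj} = s_j^2$.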
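The covariance matrix can be sketched the same way. As with the variance, the n - 1 denominator is an assumption, giving $\mathrm{cov}_{jk} = \frac{1}{n-1} \sum_{i=1}^{n} (M_{ij} - \bar{X}_j)(M_{ik} - \bar{X}_k)$. The data and the name `covariance_matrix` are hypothetical. The example also checks that the diagonal holds the per-variable variances.

```python
# Hypothetical data: n = 4 observations (rows) of p = 2 variables (columns).
M = [
    [1.0, 2.0],
    [2.0, 1.0],
    [3.0, 4.0],
    [4.0, 3.0],
]


def covariance_matrix(M):
    """Return the p x p sample covariance matrix of an n x p data matrix.

    Assumption: the n - 1 (sample) denominator is used.
    """
    n = len(M)
    p = len(M[0])
    means = [sum(row[j] for row in M) / n for j in range(p)]
    cov = [[0.0] * p for _ in range(p)]
    for j in range(p):
        for k in range(p):
            cov[j][k] = sum(
                (row[j] - means[j]) * (row[k] - means[k]) for row in M
            ) / (n - 1)
    return cov


cov = covariance_matrix(M)
print(cov[0][1])            # 1.0 (covariance between the two variables)
print(round(cov[0][0], 3))  # 1.667 (diagonal: variance of the first variable)
```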

  • Univariate Distance: The simple distance between the values of a single variable j for two observations i and l is the absolute difference $|M_{ij} - M_{lj}|$.

  • Univariate z-score Distance: To measure distance in units of standard deviation between the values of a single variable j for two observations i and l, we define the z-score distance $|M_{ij} - M_{lj}| / s_j$, where $s_j$ is the standard deviation of variable j.
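A short sketch of both univariate distances follows. The function names and data are hypothetical, the simple distance is taken as an absolute difference, and `statistics.stdev` supplies the sample standard deviation (n - 1 denominator), which is an assumption about the convention intended here.

```python
import statistics

# Hypothetical data: n = 4 observations (rows) of p = 2 variables (columns).
M = [
    [1.0, 2.0],
    [2.0, 1.0],
    [3.0, 4.0],
    [4.0, 3.0],
]


def univariate_distance(M, i, l, j):
    """Absolute difference between observations i and l on variable j."""
    return abs(M[i][j] - M[l][j])


def z_score_distance(M, i, l, j):
    """Univariate distance expressed in standard deviations of variable j."""
    s_j = statistics.stdev([row[j] for row in M])  # sample standard deviation
    return univariate_distance(M, i, l, j) / s_j


print(univariate_distance(M, 0, 3, 0))         # 3.0
print(round(z_score_distance(M, 0, 3, 0), 3))  # 2.324
```

Dividing by $s_j$ makes the distance unit-free, so distances on variables measured on different scales become comparable.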

  • Bivariate Euclidean Distance: The most commonly used measure of distance between two observations i and l on two variables j and k is the Euclidean distance $d_{il} = \sqrt{(M_{ij} - M_{lj})^2 + (M_{ik} - M_{lk})^2}$

  • Multivariate Euclidean Distance: This can be extended to more than two variables: $d_{il} = \sqrt{\sum_{j=1}^{p} (M_{ij} - M_{lj})^2}$
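The Euclidean distance for any number of variables can be sketched as below. The function name `euclidean_distance` and the sample points are hypothetical.

```python
import math


def euclidean_distance(x, y):
    """Euclidean distance between two observations of equal length p."""
    if len(x) != len(y):
        raise ValueError("observations must have the same number of variables")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


# Bivariate case (p = 2): a 3-4-5 right triangle.
print(euclidean_distance([0.0, 0.0], [3.0, 4.0]))  # 5.0

# The same function handles p = 3 without modification.
print(euclidean_distance([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]))  # 2.0
```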

  • Effects of variance and covariance on Euclidean distance: Points A and B have similar Euclidean distances from the mean, but point B is clearly more different from the population than point A. (Figure: points A and B; the ellipse shows the 50% contour of a hypothetical population.)

  • Mahalanobis Distance: To account for differences in variance between the variables and for correlations between them, we use the Mahalanobis distance $D_{il} = \sqrt{(X_i - X_l)^{T} \, \mathrm{cov}^{-1} \, (X_i - X_l)}$
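The contrast between points A and B can be illustrated with a minimal two-variable sketch. The covariance matrix, the mean, and the coordinates of A and B are hypothetical values chosen so that A lies along the direction of correlation and B lies across it. The helper names are also hypothetical, and the explicit 2 x 2 inverse restricts this sketch to p = 2.

```python
import math


def invert_2x2(m):
    """Invert a 2 x 2 matrix given as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    if det == 0:
        raise ValueError("covariance matrix is singular")
    return [[d / det, -b / det], [-c / det, a / det]]


def mahalanobis_distance(x, y, cov):
    """Mahalanobis distance between two bivariate points x and y."""
    inv = invert_2x2(cov)
    diff = [x[0] - y[0], x[1] - y[1]]
    # Quadratic form diff^T * inv * diff.
    q = sum(diff[r] * inv[r][s] * diff[s] for r in range(2) for s in range(2))
    return math.sqrt(q)


def euclidean_distance(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


# Hypothetical population: unit variances, correlation 0.8, mean at the origin.
cov = [[1.0, 0.8], [0.8, 1.0]]
mean = [0.0, 0.0]
A = [1.0, 1.0]   # along the direction of correlation
B = [1.0, -1.0]  # across the direction of correlation

print(round(euclidean_distance(A, mean), 3))        # 1.414
print(round(euclidean_distance(B, mean), 3))        # 1.414
print(round(mahalanobis_distance(A, mean, cov), 3))  # 1.054
print(round(mahalanobis_distance(B, mean, cov), 3))  # 3.162
```

Both points are equally far from the mean in Euclidean terms, but B is about three times farther than A in Mahalanobis terms. With an identity covariance matrix the two distances coincide.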