Continuous Latent Variables --Bishop Xue Tian

Page 1

Continuous Latent Variables--Bishop

Xue Tian

Page 2

Continuous Latent Variables

• Explore models in which some, or all, of the latent variables are continuous
• Motivation: in many data sets
  – the dimensionality of the original data space is very high
  – the data points all lie close to a manifold of much lower dimensionality

Page 3

Example

• data set: 100x100 pixel grey-level images
• dimensionality of the original data space: 100x100 = 10,000
• a digit 3 is embedded in each image; its location and orientation are varied at random
• 3 degrees of freedom of variability
  – vertical translation
  – horizontal translation
  – rotation

Page 4

Outline

• PCA - principal component analysis
  – maximum variance formulation
  – minimum-error formulation
  – application of PCA
  – PCA for high-dimensional data
• Kernel PCA
• Probabilistic PCA

the two commonly used definitions of PCA give rise to the same algorithm

Page 5

PCA-maximum variance formulation

• PCA can be defined as
  – the orthogonal projection of the data onto a lower-dimensional linear space (the principal subspace)
  – such that the variance of the projected data is maximized (the goal)

Page 6

red dots: data points
purple line: principal subspace
green dots: projected points

PCA-maximum variance formulation

Page 7

• data set: {xn}, n = 1, 2, …, N
• xn: D dimensions
• goal:
  – project the data onto a space having dimensionality M < D
  – maximize the variance of the projected data

PCA-maximum variance formulation

Page 8

• M=1
• a D-dimensional unit vector u1 defines the direction, with u1T u1 = 1
• each data point xn is projected onto the scalar value u1T xn
• mean of the projected data: u1T x̄, where x̄ = (1/N) Σn xn
• variance of the projected data: u1T S u1, where S = (1/N) Σn (xn − x̄)(xn − x̄)T is the covariance matrix

PCA-maximum variance formulation

Page 9

• goal: maximize the variance of the projected data
• maximize the variance u1T S u1 with respect to u1
• introduce a Lagrange multiplier λ1
  – a constrained maximization, to prevent ||u1|| → ∞
  – the constraint comes from u1T u1 = 1
  – maximize u1T S u1 + λ1 (1 − u1T u1)
• set the derivative with respect to u1 equal to zero: S u1 = λ1 u1
  – u1 is an eigenvector of S, and the variance is u1T S u1 = λ1
  – max variance: choose u1 as the eigenvector with the largest eigenvalue λ1

PCA-maximum variance formulation

Page 10

• define additional PCs in an incremental fashion
• choose each new direction to
  – maximize the projected variance
  – be orthogonal to those already considered
• general case: M-dimensional projection
• the optimal linear projection is defined by
  – the M eigenvectors u1, …, uM of S
  – corresponding to the M largest eigenvalues λ1, …, λM

PCA-maximum variance formulation
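A concrete illustration of this recipe (not part of the original slides): a minimal NumPy sketch that centres the data, forms the covariance matrix S, and keeps the eigenvectors with the M largest eigenvalues.

    import numpy as np

    def pca_max_variance(X, M):
        """Top-M principal components of a data matrix X (N x D); a minimal sketch."""
        x_bar = X.mean(axis=0)                  # sample mean
        Xc = X - x_bar                          # centred data
        S = Xc.T @ Xc / X.shape[0]              # D x D covariance matrix
        eigvals, eigvecs = np.linalg.eigh(S)    # eigh: S is symmetric
        order = np.argsort(eigvals)[::-1][:M]   # indices of the M largest eigenvalues
        U = eigvecs[:, order]                   # D x M matrix of principal directions
        Z = Xc @ U                              # N x M projected data
        return U, eigvals[order], Z

eigh is used because S is symmetric, so the eigenvalues are real and the eigenvectors are orthogonal.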

Page 11

Outline

• PCA - principal component analysis
  – maximum variance formulation
  – minimum-error formulation
  – application of PCA
  – PCA for high-dimensional data
• Kernel PCA
• Probabilistic PCA

the two commonly used definitions of PCA give rise to the same algorithm

Page 12

PCA-minimum error formulation

• PCA can be defined as
  – the linear projection that minimizes the average projection cost (the goal)
  – average projection cost: the mean squared distance between the data points and their projections

Page 13

red dots: data points
purple line: principal subspace
green dots: projected points
blue lines: projection error

PCA-minimum error formulation

Page 14

• complete orthonormal set of D-dimensional basis vectors {ui}, i = 1, …, D, satisfying uiT uj = δij
• each data point can be represented by a linear combination of the basis vectors: xn = Σi αni ui
• take the inner product with uj to get αnj = xnT uj, so xn = Σi (xnT ui) ui

PCA-minimum error formulation

Page 15

• approximate each data point using an M-dimensional subspace:
  x̃n = Σ_{i=1}^{M} zni ui + Σ_{i=M+1}^{D} bi ui
  – the {zni} depend on the particular data point
  – the {bi} are constants, the same for all data points
• goal: minimize the mean squared distance J = (1/N) Σn ||xn − x̃n||^2
• set the derivative with respect to znj to zero: znj = xnT uj, j = 1, …, M

PCA-minimum error formulation

Page 16

• set the derivative with respect to bj to zero: bj = x̄T uj, j = M+1, …, D
• substituting back, the displacement xn − x̃n lies in the space orthogonal to the principal subspace, and
  J = Σ_{i=M+1}^{D} uiT S ui
• remaining task: minimize J with respect to the ui

PCA-minimum error formulation

Page 17

• case M=1, D=2: J = u2T S u2
• introduce a Lagrange multiplier λ2
  – a constrained minimization, to prevent ||u2|| → 0
  – the constraint comes from u2T u2 = 1
  – minimize u2T S u2 + λ2 (1 − u2T u2)
• set the derivative with respect to u2 equal to zero: S u2 = λ2 u2
  – u2 is an eigenvector of S, and J = λ2
  – min error: choose u2 as the eigenvector with the smallest eigenvalue λ2

PCA-minimum error formulation

Page 18

• general case: S ui = λi ui and J = Σ_{i=M+1}^{D} λi
  – J is the sum of the eigenvalues of those eigenvectors that are orthogonal to the principal subspace
• obtain the minimum value of J:
  – select the eigenvectors corresponding to the D − M smallest eigenvalues
  – the eigenvectors defining the principal subspace are those corresponding to the M largest eigenvalues

PCA-minimum error formulation
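A small numerical check of this result (an illustration, not from the slides): for any data set, the mean squared reconstruction error of an M-dimensional PCA should equal the sum of the D − M smallest eigenvalues of S.

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(500, 5)) @ rng.normal(size=(5, 5))   # N=500, D=5 toy data
    Xc = X - X.mean(axis=0)
    S = Xc.T @ Xc / X.shape[0]
    eigvals, U = np.linalg.eigh(S)           # eigenvalues in ascending order
    M = 2
    U_top = U[:, -M:]                        # principal subspace (M largest eigenvalues)
    X_rec = Xc @ U_top @ U_top.T             # project onto the subspace and reconstruct
    J = np.mean(np.sum((Xc - X_rec) ** 2, axis=1))
    print(J, eigvals[:-M].sum())             # the two numbers agree (up to rounding)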

Page 19

Outline

• PCA - principal component analysis
  – maximum variance formulation
  – minimum-error formulation
  – application of PCA
  – PCA for high-dimensional data
• Kernel PCA
• Probabilistic PCA

the two commonly used definitions of PCA give rise to the same algorithm

Page 20

PCA-application

• dimensionality reduction

• lossy data compression

• feature extraction

• data visualization

• example

note: PCA is unsupervised and depends only on the values xn

Page 21

• go through the steps to perform PCA on a set of data

• Principal Components Analysis by Lindsay Smith

• http://csnet.otago.ac.nz/cosc453/student_tutorials/principal_components.pdf

PCA-example

Page 22

Step 1: get data set

D=2 N=10

PCA-example

Page 23

Step 2: subtract the mean

PCA-example

[data table: columns x, y]

Page 24

PCA-example

Step 3: calculate the covariance matrix S

S: 2x2


Page 25

• Step 4: Calculate the eigenvectors and eigenvalues of the covariance matrix S

• the eigenvector with the highest eigenvalue is the first principal component of the data set

PCA-example

Page 26

• two eigenvectors
• they go through the middle of the points, like drawing a line of best fit
• they extract lines that characterize the data

PCA-example

Page 27

• in general, once the eigenvectors are found
• the next step is to order them by eigenvalue, highest to lowest
• this gives the PCs in order of significance
• we can then decide to ignore the less significant components
• this is where the notion of data compression and reduced dimensionality comes in

PCA-example

Page 28

PCA-example

• Step 5: derive the new data set

newDataT = eigenvectorsT x originalDataAdjustT

originalDataAdjustT: the mean-subtracted data, transposed (one data point per column)

newData: 10x1 (when only the first eigenvector is kept)
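The five steps can be reproduced in a few lines of NumPy. The data array below is an arbitrary stand-in, since the tutorial's 10-point table is not reproduced in this transcript, so only the procedure (not the numbers) matches the example.

    import numpy as np

    rng = np.random.default_rng(1)
    data = rng.normal(size=(10, 2))              # Step 1: stand-in for the 10x2 data table (N=10, D=2)
    mean = data.mean(axis=0)
    data_adjust = data - mean                    # Step 2: subtract the mean
    S = np.cov(data_adjust, rowvar=False)        # Step 3: 2x2 covariance matrix
    eigvals, eigvecs = np.linalg.eigh(S)         # Step 4: eigenvalues and eigenvectors
    order = np.argsort(eigvals)[::-1]            # order the PCs by significance
    eigvecs = eigvecs[:, order]
    new_data = (eigvecs.T @ data_adjust.T).T     # Step 5: newDataT = eigenvectorsT x originalDataAdjustT
    print(new_data[:, :1])                       # keeping only the first PC gives a 10x1 newData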

Page 29

• newData

PCA-example

Page 30

PCA-example

• newData: 10x2

Page 31

• Step 6: get back the old data (data compression)
• if we took all the eigenvectors in the transformation, we get exactly the original data back
• otherwise, we lose some information

PCA-example

Page 32

• newDataT = eigenvectorsT x originalDataAdjustT
• newDataT = eigenvectors^-1 x originalDataAdjustT
• originalDataAdjustT = eigenvectors x newDataT
• originalDataT = eigenvectors x newDataT + mean

PCA-example

• when we take all the eigenvectors
• the inverse of the eigenvector matrix equals its transpose
• because the eigenvectors are orthonormal unit vectors
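A self-contained sketch of Step 6 (an illustration using the same stand-in data as above), keeping only the first eigenvector to show the lossy case:

    import numpy as np

    rng = np.random.default_rng(1)
    data = rng.normal(size=(10, 2))              # same stand-in data as in the Step 2-5 sketch
    mean = data.mean(axis=0)
    data_adjust = data - mean
    eigvals, eigvecs = np.linalg.eigh(np.cov(data_adjust, rowvar=False))
    top = eigvecs[:, -1:]                        # eigenvector with the largest eigenvalue
    new_data_t = top.T @ data_adjust.T           # newDataT = eigenvectorsT x originalDataAdjustT
    recovered = (top @ new_data_t).T + mean      # originalDataT = eigenvectors x newDataT + mean
    print(np.round(recovered - data, 3))         # nonzero residuals: information lost by dropping one PC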

Page 33

PCA-example

• newData: 10x1

Page 34

Outline

• PCA - principal component analysis
  – maximum variance formulation
  – minimum-error formulation
  – application of PCA
  – PCA for high-dimensional data
• Kernel PCA
• Probabilistic PCA

the two commonly used definitions of PCA give rise to the same algorithm

Page 35

PCA-high dimensional data

• the number of data points is smaller than the dimensionality of the data space: N < D
• example:
  – data set: a few hundred images
  – dimensionality: several million, corresponding to three color values for each pixel

Page 36

• the standard algorithm for finding the eigenvectors of a DxD matrix is O(D^3), or O(MD^2) if only the first M eigenvectors are needed
• if D is very high, a direct PCA is computationally infeasible

PCA-high dimensional data

Page 37

• N < D: a set of N points defines a linear subspace whose dimensionality is at most N − 1
• there is little point in applying PCA for M > N − 1
• if M > N − 1
  – at least D − N + 1 of the eigenvalues are 0
  – the corresponding eigenvectors point in directions along which the data set has zero variance

PCA-high dimensional data

Page 38

solution:

• define X: the NxD centred data matrix
• nth row: (xn − x̄)T
• then S = (1/N) XT X, and the eigenvector equation S ui = λi ui involves a DxD matrix

PCA-high dimensional data

Page 39

PCA-high dimensional data

• define vi = X ui and premultiply the eigenvector equation by X:
  (1/N) X XT vi = λi vi
• this is an eigenvector equation for the NxN matrix (1/N) X XT
• it has the same N − 1 eigenvalues as the original DxD problem, which has an additional D − N + 1 zero eigenvalues
• computational cost: O(D^3) → O(N^3)
• the eigenvectors of S are recovered as ui = (1 / sqrt(N λi)) XT vi
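A sketch of this trick in NumPy (an illustration under the same assumptions, i.e. N < D and the kept eigenvalues are positive):

    import numpy as np

    def pca_small_n(X, M):
        """PCA when N < D: eigendecompose the N x N matrix (1/N) X X^T instead of the D x D covariance."""
        N = X.shape[0]
        Xc = X - X.mean(axis=0)                      # centred data matrix
        K = Xc @ Xc.T / N                            # N x N, cheap when N << D
        lam, V = np.linalg.eigh(K)                   # same nonzero eigenvalues as S
        lam, V = lam[::-1][:M], V[:, ::-1][:, :M]    # keep the M largest
        U = Xc.T @ V / np.sqrt(N * lam)              # u_i = X^T v_i / sqrt(N lam_i), unit length
        return U, lam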

Page 40

Outline

• PCA - principal component analysis
  – maximum variance formulation
  – minimum-error formulation
  – application of PCA
  – PCA for high-dimensional data
• Kernel PCA
• Probabilistic PCA

the two commonly used definitions of PCA give rise to the same algorithm

Page 41

Kernel

• kernel function: k(x, x') = φ(x)T φ(x')
• an inner product in the feature space
• dimensionality of the feature space (M) ≥ dimensionality of the input space
• the feature space mapping φ(x), which maps x into the feature space, is implicit
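A tiny illustration of the "implicit mapping" point (not from the slides): for the polynomial kernel k(x, y) = (xT y)^2 on 2-D inputs, an explicit feature map exists, and the kernel returns the same inner product without ever constructing it.

    import numpy as np

    def k(x, y):
        # kernel evaluated directly in the input space
        return np.dot(x, y) ** 2

    def phi(x):
        # the corresponding explicit feature map into a 3-D feature space
        return np.array([x[0] ** 2, np.sqrt(2) * x[0] * x[1], x[1] ** 2])

    x, y = np.array([1.0, 2.0]), np.array([3.0, 0.5])
    print(k(x, y), phi(x) @ phi(y))   # both print 16.0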

Page 42

PCA-linear

• maximum variance formulation
  – the orthogonal projection of the data onto a lower-dimensional linear space
  – such that the variance of the projected data is maximized
• minimum error formulation
  – the linear projection
  – that minimizes the average projection distance

both formulations are linear

Page 43

Kernel PCA

• data set: {xn}, n = 1, 2, …, N
• xn: D dimensions
• assume: the mean has been subtracted from xn (zero mean)
• the PCs are defined by the eigenvectors ui of S = (1/N) Σn xn xnT, i.e. S ui = λi ui, i = 1, …, D

Page 44

• a nonlinear transformation φ(x) into an M-dimensional feature space
• each data point xn is projected onto φ(xn)
• perform standard PCA in the feature space
• this implicitly defines nonlinear principal components in the original data space

Kernel PCA

Page 45

left: original data space; right: feature space
green lines: the linear projection onto the first PC in feature space, which corresponds to a nonlinear projection in the original data space

Page 46

• assume: the projected data has zero mean, Σn φ(xn) = 0
• covariance matrix in feature space: C = (1/N) Σn φ(xn) φ(xn)T (MxM)
• eigenvector equation: C vi = λi vi, i = 1, …, M
• given λi > 0, vi is a linear combination of the φ(xn): vi = Σn ain φ(xn)

Kernel PCA

Page 47

Kernel PCA

• substitute this expansion and express the result in terms of the kernel function k(xn, xm) = φ(xn)T φ(xm): K^2 ai = λi N K ai
• in matrix notation: K ai = λi N ai, i = 1, …, N, where ai is the column vector with elements ani
• the solutions of these two eigenvector equations differ only by eigenvectors of K having zero eigenvalues

Page 48

Kernel PCA

• normalization condition for ai: 1 = viT vi = aiT K ai = λi N aiT ai

Page 49

• in feature space, the projection of a point x onto eigenvector vi after PCA is yi(x) = φ(x)T vi = Σn ani k(x, xn)

Kernel PCA

Page 50

Kernel PCA

• original data space
  – dimensionality: D
  – D eigenvectors
  – at most D linear PCs
• feature space
  – dimensionality: M, with M >> D (possibly even infinite)
  – M eigenvectors
  – the number of nonlinear PCs can then exceed D
• however, the number of nonzero eigenvalues cannot exceed N

Page 51

Kernel PCA

• so far we assumed the projected data φ(xn) has zero mean
• in general the mean is nonzero
  – we cannot simply compute the mean and then subtract it off
  – because we avoid working directly in the feature space
• instead, formulate the algorithm purely in terms of the kernel function

Page 52

Kernel PCA

• the Gram matrix of the centred features, in matrix notation:
  K~ = K − 1N K − K 1N + 1N K 1N
• 1N: the NxN matrix in which every element equals 1/N
• use K~ in place of K (the Gram matrix) in the eigenvector equation
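Putting the pieces together, a minimal kernel-PCA sketch with a Gaussian kernel and the centring formula above (an illustration; the scaling of ai follows the normalization condition 1 = λi N aiT ai):

    import numpy as np

    def kernel_pca(X, M, sigma=1.0):
        """Kernel PCA with a Gaussian kernel; returns the training points projected onto M nonlinear PCs."""
        N = X.shape[0]
        sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
        K = np.exp(-sq_dists / (2 * sigma ** 2))                 # N x N Gram matrix
        one_n = np.full((N, N), 1.0 / N)                         # the matrix 1N from the slide
        K_c = K - one_n @ K - K @ one_n + one_n @ K @ one_n      # centred Gram matrix K~
        mu, A = np.linalg.eigh(K_c)                              # K~ a_i = (N lambda_i) a_i
        mu, A = mu[::-1][:M], A[:, ::-1][:, :M]                  # keep the M largest eigenvalues
        A = A / np.sqrt(np.maximum(mu, 1e-12))                   # rescale a_i so that v_i has unit length
        return K_c @ A                                           # y_ni = sum_m a_im k~(x_n, x_m)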

Page 53

Kernel PCA

• linear kernel k(x, x') = xT x': recovers standard PCA
• Gaussian kernel: k(x, x') = exp(−||x − x'||^2 / (2σ^2))
• example: kernel PCA

Page 54

Page 55

Kernel PCA
• contours: lines along which the projection onto the corresponding PC is constant

Page 56

Kernel PCA

disadvantage:

• we must determine the eigenvectors of the NxN matrix K~, rather than of the DxD matrix S
• for large data sets, approximations are used

Page 57

Outline

• PCA - principal component analysis
  – maximum variance formulation
  – minimum-error formulation
  – application of PCA
  – PCA for high-dimensional data
• Kernel PCA
• Probabilistic PCA

the two commonly used definitions of PCA give rise to the same algorithm

Page 58

Probabilistic PCA

• standard PCA: a linear projection of the data onto a lower dimensional subspace

• probabilistic PCA: the maximum likelihood solution of a probabilistic latent variable model

Page 59

Probabilistic PCA

• the combination of a probabilistic model and EM allows us to deal with missing values in the data set
  – EM: the expectation-maximization algorithm
  – a method for finding maximum likelihood solutions for models with latent variables

Page 60

Probabilistic PCA

• probabilistic PCA forms the basis for a Bayesian treatment of PCA

• in Bayesian PCA, the dimensionality of the principal subspace can be found automatically

Page 61

Probabilistic PCA

• the probabilistic PCA model can be run generatively to provide samples from the distribution

• the simplest continuous latent variable model
  – assumes a Gaussian distribution for both the latent and observed variables
  – makes use of a linear-Gaussian dependence of the observed variables on the state of the latent variables

Page 62

Probabilistic PCA
• an explicit latent variable z (Mx1)
  – corresponding to the principal-component subspace
• a Gaussian prior distribution over the latent variable: p(z) = N(z | 0, I)
• a Gaussian conditional distribution for the observed variable x (Dx1):
  p(x | z) = N(x | Wz + μ, σ^2 I)
  – W: DxM matrix; the columns of W span the principal subspace
  – μ: D-dimensional vector

Page 63

Probabilistic PCA
• get a sample value of the observed variable by
  – choosing a value for the latent variable
  – sampling the observed variable given that latent value
• x is defined by a linear transformation of z plus additive Gaussian noise:
  x = Wz + μ + ε
  – ε: D-dimensional zero-mean Gaussian noise with covariance σ^2 I
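Run generatively, the model is straightforward to sample from. The sketch below uses arbitrary illustrative values for W, μ and σ (they are not from the slides):

    import numpy as np

    def sample_ppca(W, mu, sigma, n_samples, seed=0):
        """Generative sampling from probabilistic PCA: z ~ N(0, I), x = W z + mu + eps, eps ~ N(0, sigma^2 I)."""
        rng = np.random.default_rng(seed)
        D, M = W.shape
        Z = rng.normal(size=(n_samples, M))              # latent values from the prior p(z)
        eps = sigma * rng.normal(size=(n_samples, D))    # additive isotropic Gaussian noise
        return Z @ W.T + mu + eps                        # observed samples x

    # illustration: D=2 data space, M=1 latent space, as in the figure on the next slide
    W = np.array([[2.0], [1.0]])
    X = sample_ppca(W, mu=np.array([0.5, -0.5]), sigma=0.3, n_samples=500)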

Page 64

Probabilistic PCA

figure: data space is 2-dimensional, latent space is 1-dimensional
• draw a value for the latent variable z from its prior
• draw a value for x from an isotropic Gaussian distribution centred on Wz + μ

• green ellipses: density contours for the marginal distribution p(x)

Page 65

Probabilistic PCA

• probabilistic PCA defines a mapping from latent space to data space
• this is in contrast to standard PCA, which maps points in the data space onto a lower-dimensional subspace

Page 66

Probabilistic PCA

• Gaussian conditional distribution: p(x | z) = N(x | Wz + μ, σ^2 I)
• maximum likelihood PCA: determine the 3 parameters W, μ and σ^2
• we need an expression for the marginal distribution p(x) = ∫ p(x | z) p(z) dz = N(x | μ, C), where C = W WT + σ^2 I

Page 67

Probabilistic PCA
• so far, we assumed the value of M is given
• in practice, we must choose a suitable value for M
  – for visualization: M=2 or M=3
  – plot the eigenvalue spectrum for the data set and seek a significant gap indicating a choice for M; in practice, such a gap is often not seen
  – Bayesian PCA
  – employ cross-validation to determine the value of M, by selecting the largest log likelihood on a validation data set

Page 68

Probabilistic PCA

• in the example eigenvalue spectrum, the only clear break is between the 1st and 2nd PCs
• the 1st PC explains less than 40% of the variance
  – more components are probably needed
• the first 3 PCs explain two thirds of the total variability
  – 3 might be a reasonable value for M
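The explained-variance bookkeeping used in this argument comes directly from the eigenvalue spectrum. A small sketch (the data array X is assumed given):

    import numpy as np

    def explained_variance(X):
        """Fraction of the total variance explained by each PC; a sketch for choosing M."""
        Xc = X - X.mean(axis=0)
        eigvals = np.linalg.eigvalsh(Xc.T @ Xc / X.shape[0])[::-1]   # descending eigenvalues
        ratio = eigvals / eigvals.sum()
        return ratio, np.cumsum(ratio)        # per-PC and cumulative explained variance

    # e.g. choose the smallest M whose cumulative ratio exceeds a target such as 2/3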

Page 69