Transcript of "Reducing the Dimensionality of Data with Neural Networks" (dasgupta/254-deep-ul/andrea.pdf, 5/15/19)
Reducing the Dimensionality of Data with Neural Networks
ANDREA CASTRO
MAY 14, 2019
The curse of dimensionality
• High-dimensional data often has more features than observations
• As more variables are added, it becomes more difficult to make accurate predictions
• Example: finding a cell in a 2D petri dish (25 cm²) is far easier than in a 3D beaker (125 cm³); the search volume grows with every added dimension
https://www.statisticshowto.datasciencecentral.com/dimensionality/
Reducing dimensionality
• Principal Components Analysis (PCA)
• Finds the directions of greatest variance in the data
• Represents each data point by its coordinates along these directions
http://www.nlpca.org/pca_principal_component_analysis.html
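The PCA step above can be sketched with a singular value decomposition; the function name and toy data below are illustrative, not from the talk:

```python
import numpy as np

def pca(X, k):
    """Project X (n_samples x n_features) onto its top-k principal components."""
    mean = X.mean(axis=0)
    X_centered = X - mean                    # center each feature
    # Rows of Vt are the directions of greatest variance, in decreasing order
    U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    components = Vt[:k]                      # top-k directions
    codes = X_centered @ components.T        # low-dimensional coordinates
    reconstruction = codes @ components + mean
    return codes, reconstruction

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))
codes, X_hat = pca(X, 3)                     # 3-D codes and their reconstruction
```

With k equal to the full feature count, the reconstruction is exact; smaller k trades reconstruction error for a more compact code.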
Autoencoders
• Composed of encoder and decoder networks
• Encoder: maps high-dimensional data to a low-dimensional code
• Decoder: recovers the original data from the low-dimensional code
• Trained to minimize the discrepancy between input and output
• Gradient descent is difficult to perform without well-initialized weights
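A minimal sketch of such an encoder/decoder pair, trained by plain gradient descent on toy data; the layer sizes, learning rate, and data generator are illustrative assumptions, not values from the paper:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
n_vis, n_code = 8, 2

# Toy data that genuinely lies near a 2-D manifold, so a 2-D code can capture it
Z = rng.normal(size=(200, n_code))
A = rng.normal(size=(n_code, n_vis))
X = sigmoid(Z @ A)

W1 = rng.normal(scale=0.1, size=(n_vis, n_code)); b1 = np.zeros(n_code)  # encoder
W2 = rng.normal(scale=0.1, size=(n_code, n_vis)); b2 = np.zeros(n_vis)   # decoder

def forward(X):
    code = sigmoid(X @ W1 + b1)              # high-dim data -> low-dim code
    return code, sigmoid(code @ W2 + b2)     # code -> reconstruction

_, out0 = forward(X)
mse_before = np.mean((out0 - X) ** 2)

lr = 0.5
for _ in range(2000):
    code, out = forward(X)
    d_out = (out - X) * out * (1 - out)          # backprop through output sigmoid
    d_code = (d_out @ W2.T) * code * (1 - code)  # backprop through code sigmoid
    W2 -= lr * code.T @ d_out / len(X); b2 -= lr * d_out.mean(axis=0)
    W1 -= lr * X.T @ d_code / len(X);   b1 -= lr * d_code.mean(axis=0)

_, out = forward(X)
mse_after = np.mean((out - X) ** 2)
```

Even this tiny network illustrates the objective: minimize the discrepancy (here, mean squared error) between the input and the decoder's output.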
Pretraining to optimize weights
• Each layer is trained in turn as a restricted Boltzmann machine (RBM)
• The learned feature activations of one layer serve as the input data for training the next layer
RBMs are energy-based models

The hidden units model the distribution over visible vectors:

p(v) = (1/Z) Σ_h e^{-E(v, h)}

where the energy of a joint configuration is

E(v, h) = -Σ_i b_i v_i - Σ_j b_j h_j - Σ_ij v_i h_j w_ij

The energy can be raised or lowered by adjusting the biases b_i, b_j and the weight matrix W.

[Figure: visible units v_1, v_2, v_3, …, v_i (biases b_i) in the visible layer, fully connected to hidden units h_1, h_2, …, h_j (biases b_j) in the hidden layer.]
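For concreteness, the energy and the marginal p(v) can be computed by brute-force enumeration on a tiny binary RBM (the sizes and random parameters below are illustrative; real RBMs cannot enumerate Z like this):

```python
import numpy as np

def energy(v, h, b_vis, b_hid, W):
    """E(v, h) = -Σ_i b_i v_i - Σ_j b_j h_j - Σ_ij v_i h_j w_ij."""
    return -(b_vis @ v) - (b_hid @ h) - v @ W @ h

def all_states(n):
    """Every binary vector of length n (feasible only for tiny n)."""
    return [np.array(list(np.binary_repr(i, n)), dtype=float) for i in range(2 ** n)]

def p_v(v, b_vis, b_hid, W):
    """Marginal p(v) = (1/Z) Σ_h e^{-E(v, h)}, with Z summed over all states."""
    n_vis, n_hid = W.shape
    unnorm = sum(np.exp(-energy(v, h, b_vis, b_hid, W)) for h in all_states(n_hid))
    Z = sum(np.exp(-energy(vv, h, b_vis, b_hid, W))
            for vv in all_states(n_vis) for h in all_states(n_hid))
    return unnorm / Z

rng = np.random.default_rng(1)
W = rng.normal(size=(3, 2))                       # 3 visible, 2 hidden units
b_vis, b_hid = rng.normal(size=3), rng.normal(size=2)
```

Lowering E(v, h) for configurations resembling the data raises their probability, which is exactly what adjusting the biases and weights during training does.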
RBMs are energy-based models

The network assigns a probability to every possible image:

p(v) = (1/Z) Σ_h e^{-E(v, h)}

The conditional distribution is easier to calculate:

p(h_j = 1 | v) = σ(b_j + Σ_i v_i w_ij)    (and vice versa: p(v_i = 1 | h) = σ(b_i + Σ_j h_j w_ij))

where σ(x) = 1 / (1 + e^{-x}).
Derivation (1/2)

p(h | v) = p(v, h) / p(v)    (joint over marginal)

= e^{-E(v, h)} / Σ_{h'} e^{-E(v, h')}

= exp(Σ_j b_j h_j + Σ_ij v_i w_ij h_j) / Σ_{h'} exp(Σ_j b_j h'_j + Σ_ij v_i w_ij h'_j)    (expand E; the b_i v_i terms do not depend on h and cancel)

= ∏_j exp((b_j + Σ_i v_i w_ij) h_j) / Σ_{h'} ∏_j exp((b_j + Σ_i v_i w_ij) h'_j)    (exponential of a sum is a product of exponentials)

= ∏_j exp((b_j + W_j v) h_j) / ∏_j (1 + exp(b_j + W_j v))    (the h_j are independent, so the sum over h' factorizes; expand the h'_j = 0 and h'_j = 1 cases)

= ∏_j [ exp((b_j + W_j v) h_j) / (1 + exp(b_j + W_j v)) ]    (combine both ∏_j)

Note that each factor is a Bernoulli distribution over h_j, with p(h_j = 1 | v) = exp(b_j + W_j v) / (1 + exp(b_j + W_j v)), writing W_j v = Σ_i v_i w_ij.
Derivation (2/2)

Multiplying the numerator and denominator by exp(-(b_j + W_j v)):

p(h_j = 1 | v) = exp(b_j + W_j v) / (1 + exp(b_j + W_j v))
             = 1 / (1 + exp(-(b_j + W_j v)))
             = σ(b_j + W_j v)
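The factorized conditional derived above can be checked numerically on a tiny RBM by comparing it against the brute-force p(h | v); all sizes and parameters here are illustrative:

```python
import numpy as np
from itertools import product

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def energy(v, h, b_vis, b_hid, W):
    return -(b_vis @ v) - (b_hid @ h) - v @ W @ h

rng = np.random.default_rng(2)
n_vis, n_hid = 3, 2
W = rng.normal(size=(n_vis, n_hid))
b_vis, b_hid = rng.normal(size=n_vis), rng.normal(size=n_hid)
v = np.array([1.0, 0.0, 1.0])

# Brute force: p(h | v) = e^{-E(v,h)} / Σ_{h'} e^{-E(v,h')}
hs = [np.array(h, dtype=float) for h in product([0, 1], repeat=n_hid)]
unnorm = np.array([np.exp(-energy(v, h, b_vis, b_hid, W)) for h in hs])
p_h_given_v = unnorm / unnorm.sum()

# Factorized form: ∏_j σ(b_j + W_j v)^{h_j} (1 - σ(b_j + W_j v))^{1 - h_j}
q = sigmoid(b_hid + v @ W)            # p(h_j = 1 | v) for each j
factorized = np.array([np.prod(q ** h * (1 - q) ** (1 - h)) for h in hs])
```

The two arrays agree exactly, confirming that the conditional factorizes into independent per-unit Bernoulli terms.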
RBM training

Given an input v, each hidden unit state h_j is set to 1 with probability

p(h_j = 1 | v) = σ(b_j + Σ_i v_i w_ij)

Next, a "confabulation" image is produced by setting each visible unit v_i to 1 with probability

p(v_i = 1 | h) = σ(b_i + Σ_j h_j w_ij)

Finally, the hidden unit states are updated once more to represent the confabulated image's features, and the weights are nudged toward the data statistics and away from the confabulation statistics: Δw_ij = ε(⟨v_i h_j⟩_data − ⟨v_i h_j⟩_confab).
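This up-down-up cycle is one step of contrastive divergence (CD-1); a sketch on toy binary data, with illustrative sizes and learning rate:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def recon_error(V, W, b_vis, b_hid):
    """Mean squared error of one deterministic up-down pass."""
    ph = sigmoid(V @ W + b_hid)
    pv = sigmoid(ph @ W.T + b_vis)
    return np.mean((V - pv) ** 2)

def cd1_step(V, W, b_vis, b_hid, lr=0.1, rng=None):
    """One contrastive-divergence (CD-1) update on a batch of binary rows V."""
    rng = rng or np.random.default_rng()
    ph = sigmoid(V @ W + b_hid)                     # p(h_j = 1 | v)
    h = (rng.random(ph.shape) < ph).astype(float)   # sample hidden states
    pv = sigmoid(h @ W.T + b_vis)                   # "confabulation" image
    ph2 = sigmoid(pv @ W + b_hid)                   # hidden features of confabulation
    n = len(V)
    W += lr * (V.T @ ph - pv.T @ ph2) / n           # toward data, away from confabulation
    b_vis += lr * (V - pv).mean(axis=0)
    b_hid += lr * (ph - ph2).mean(axis=0)

rng = np.random.default_rng(0)
V = (rng.random((100, 6)) < 0.3).astype(float)      # toy binary "images"
V[:, 1] = V[:, 0]                                   # give the data some structure
W = 0.01 * rng.normal(size=(6, 4))
b_vis, b_hid = np.zeros(6), np.zeros(4)

err_before = recon_error(V, W, b_vis, b_hid)
for _ in range(500):
    cd1_step(V, W, b_vis, b_hid, rng=rng)
err_after = recon_error(V, W, b_vis, b_hid)
```

CD-1 does not minimize reconstruction error directly, but on structured data the error typically falls as the RBM captures the data statistics.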
Unfolding and finetuning

Each successive RBM is trained on the previous RBM's hidden layer of feature detectors.

The deep autoencoder is created by unfolding/mirroring the stacked RBMs: the decoder reuses the transposed encoder weights.

The unfolded network is then finetuned using standard backpropagation.
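A sketch of the unfolding step, assuming two weight matrices that came out of greedy RBM pretraining (random placeholders below; the 784-100-30 shapes are illustrative):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Stand-ins for pretrained RBM parameters (random here, for illustration)
rng = np.random.default_rng(0)
W1, b1 = rng.normal(scale=0.1, size=(784, 100)), np.zeros(100)   # first RBM
W2, b2 = rng.normal(scale=0.1, size=(100, 30)), np.zeros(30)     # second RBM
c1, c2 = np.zeros(784), np.zeros(100)                            # visible biases

def encode(x):
    h1 = sigmoid(x @ W1 + b1)
    return sigmoid(h1 @ W2 + b2)          # 30-D code

def decode(code):
    # Unfolded (mirrored) decoder: transposed weights, visible biases
    h1 = sigmoid(code @ W2.T + c2)
    return sigmoid(h1 @ W1.T + c1)

x = rng.random(784)
x_hat = decode(encode(x))                 # reconstruction before finetuning
```

After unfolding, encoder and decoder weights are untied and the whole 784-100-30-100-784 network is trained end-to-end with backpropagation.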
Examples on images

Reconstruction error (MSE) on test data, autoencoder vs. PCA:

• 6-D codes: autoencoder 1.44 MSE vs. logistic PCA 7.64 MSE
• 30-D codes: autoencoder 3.00 MSE vs. logistic PCA 8.01 MSE
• 30-D codes: autoencoder 126 MSE vs. PCA 135 MSE
Example: 2D MNIST code visualization

[Figure: 2-D codes for MNIST digits, produced by LDA and by an autoencoder.]

Example: 2D document class visualization

[Figure: 2-D codes for document classes, produced by Latent Semantic Analysis and by an autoencoder.]