UNSUPERVISED REPRESENTATION LEARNING WITH DEEP CONVOLUTIONAL GENERATIVE ADVERSARIAL NETWORKS
Alec Radford, Luke Metz, and Soumith Chintala
(indico Research, Facebook AI Research)
Accepted at ICLR 2016
HY587 Paper Presentation: Shyam Krishna Khadka, George Simantiris
Generative Adversarial Networks (GANs) were introduced by Ian Goodfellow and colleagues in 2014: Goodfellow I., Pouget-Abadie J., Mirza M., Xu B., Warde-Farley D., Ozair S., Courville A., Bengio Y. (2014). Generative Adversarial Nets. Advances in Neural Information Processing Systems, 2672–2680.
GANs are trained by optimizing two competing criteria:
“We simultaneously train two models: a generative model G and a discriminative model D.”
Example:
G: a forger that produces counterfeit money.
D: the police, who try to tell real money from fake.
End goal: G produces money that D finds hard to distinguish from real money.
Unsupervised learning that actually works well to generate and discriminate!
Generated results are hard to believe, but qualitative experiments are convincing.
Main contribution: extensive model exploration to identify a family of architectures that resulted in stable training across a range of datasets and allowed for training higher-resolution and deeper generative models.
Other contributions:
• Use the trained discriminator for image classification.
• The generators have vector arithmetic properties.
Images generated with this method (figures: GENERATED - IMAGENET, GENERATED - FACES).
References:
https://github.com/Newmu/dcgan_code
http://bamos.github.io/2016/08/09/deep-completion/
Overview of the Deep Convolutional Generative Adversarial Network (DCGAN)
Can be thought of as two separate networks: a generator G(.) and a discriminator D(.).
Generator G(.): input = random numbers; output = generated image G(z).
z = a uniform noise vector (100 random numbers drawn from a uniform distribution). Sampling a new z from this distribution produces a new image.
Discriminator D(.): input = real or generated image; output = predicted probability that the input is a real image.
Generator goal: fool D, i.e., generate an image G(z) that D wrongly classifies as real, D(G(z)) = 1.
Discriminator goal: discriminate between real and generated images, i.e., D(x) = 1 where x is a real image, and D(G(z)) = 0 where G(z) is a generated image.
Conflicting goals. Neither goal needs labeled data: the only training signal is whether an image is real or generated. Training is optimal when D(.) = 0.5 everywhere (D cannot tell the difference between real and generated images) and G has learned the distribution of the training images.
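For reference, the two conflicting goals correspond to the minimax objective of the original GAN paper (Goodfellow et al., 2014). The notation p_data, p_z and p_g for the data, noise and generator distributions comes from that paper, not from these slides. The second line is that paper's optimal discriminator for a fixed generator, which shows where the value 0.5 comes from.

```latex
\min_G \max_D V(D, G) =
  \mathbb{E}_{x \sim p_{\text{data}}(x)}\left[\log D(x)\right]
  + \mathbb{E}_{z \sim p_z(z)}\left[\log\left(1 - D(G(z))\right)\right]

D^*(x) = \frac{p_{\text{data}}(x)}{p_{\text{data}}(x) + p_g(x)}
\quad\Longrightarrow\quad
D^*(x) = \tfrac{1}{2} \ \text{when } p_g = p_{\text{data}}
```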
Example Architecture:
DCGAN Generator:
Fully-connected layer (composed of weights) whose output is reshaped to have width, height and feature maps.
Uses ReLU activation functions.
Fractionally-strided convolutions: 8x8 input, 5x5 conv window = 16x16 output.
Batch Normalization: normalize responses to have zero mean and unit variance over the entire mini-batch, but not in the last layer (to prevent sample oscillation and model instability).
Uses Tanh to scale the generated image output between -1 and 1.
No max pooling! Increases spatial dimensionality through fractionally-strided convolutions.
Regular convolution (filter size = 3x3, stride = 2):
Input = 5x5, with zero-padding at the border = 7x7
Output = 3x3
Fractionally-strided convolution (filter size = 3x3, stride = 1):
Input = 3x3, with zeros interlaced between the inputs and padded at the border = 7x7
Output = 5x5
(In the figures, clear dashed squares are zero-padded inputs.)
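The sizes quoted for the two kinds of convolution can be checked with the usual output-size formulas. This is a minimal sketch: the function names are made up for illustration, and the padding, output_padding and 64x64 discriminator input are assumptions chosen to reproduce the sizes on the slides, not values the slides state.

```python
def conv_output_size(n, kernel, stride, padding):
    """Spatial size after a regular (strided) convolution."""
    return (n + 2 * padding - kernel) // stride + 1


def transposed_conv_output_size(n, kernel, stride, padding, output_padding=0):
    """Spatial size after a fractionally-strided (transposed) convolution."""
    return (n - 1) * stride - 2 * padding + kernel + output_padding


# Regular convolution from the figure: 5x5 input, 3x3 filter, stride 2.
down = conv_output_size(5, kernel=3, stride=2, padding=1)

# Fractionally-strided convolution from the figure: 3x3 input, 3x3 filter.
# An upsampling factor of 2 is equivalent to interlacing zeros and
# convolving with stride 1.
up = transposed_conv_output_size(3, kernel=3, stride=2, padding=1)

# Generator layer from the slides: 8x8 -> 16x16 with a 5x5 window
# (padding=2 and output_padding=1 are assumptions).
gen = transposed_conv_output_size(8, kernel=5, stride=2, padding=2,
                                  output_padding=1)

# Discriminator layer: 5x5 filter, stride 2, padding 2
# (the 64x64 input size is an assumption).
disc = conv_output_size(64, kernel=5, stride=2, padding=2)

print(down, up, gen, disc)  # 3 5 16 32
```

A stride of 2 halves the spatial size in the discriminator and doubles it in the generator, which is why no pooling layers are needed.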
DCGAN Discriminator:
Input: real image or generated image.
Uses LeakyReLU activation functions.
Batch Normalization.
No max pooling! Reduces spatial dimensionality through strided convolutions (stride 2, padding 2).
Sigmoid output (between 0 and 1).
ARCHITECTURE GUIDELINES FOR STABLE DEEP CONVOLUTIONAL GANS
Replace any pooling layers with strided convolutions (discriminator) and fractional-strided convolutions (generator).
Use batchnorm in both the generator and the discriminator.
Remove fully connected hidden layers for deeper architectures.
Use ReLU activation in the generator for all layers except the output, which uses Tanh.
Use LeakyReLU activation in the discriminator for all layers.
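The building blocks named in these guidelines are simple to state in code. The following is a minimal sketch for a single feature across one mini-batch. It leaves out the learned scale and shift that batch normalization layers normally include, and the eps constant is an assumed value for numerical stability.

```python
import math


def batch_norm(batch, eps=1e-5):
    """Normalize one feature's responses across a mini-batch to
    zero mean and (approximately) unit variance."""
    mean = sum(batch) / len(batch)
    var = sum((x - mean) ** 2 for x in batch) / len(batch)
    return [(x - mean) / math.sqrt(var + eps) for x in batch]


def relu(x):
    """Generator hidden-layer activation."""
    return max(0.0, x)


def leaky_relu(x, slope=0.2):
    """Discriminator activation: small negative slope instead of zero."""
    return x if x > 0 else slope * x


# Toy mini-batch of four responses of the same feature.
responses = [1.0, 2.0, 3.0, 6.0]
normalized = batch_norm(responses)
activated = [leaky_relu(x) for x in normalized]
```

Unlike ReLU, LeakyReLU keeps a non-zero gradient for negative inputs, so the discriminator still passes a learning signal back for those responses.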
DETAILS OF ADVERSARIAL TRAINING
Pre-processing: scale images between -1 and 1 (tanh range).
Minibatch SGD (m = 128).
Weight init.: zero-centered normal distribution (std. dev. = 0.02).
Leaky ReLU slope = 0.2.
Adam optimizer with tuned hyperparameters to accelerate training.
Learning rate = 0.0002.
Momentum term β1 = 0.5 to stabilize training.
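The training details above can be put together in a small sketch: pixel scaling, weight initialization and a single Adam update for one scalar parameter. The learning rate, β1 and the standard deviation come from the slides. BETA2 and EPS are assumptions (the usual Adam defaults), the 8-bit pixel range is assumed, and the function names are illustrative.

```python
import math
import random

LEARNING_RATE = 0.0002  # from the slides
BETA1 = 0.5             # from the slides
BETA2 = 0.999           # assumption: usual Adam default
EPS = 1e-8              # assumption: usual Adam default


def scale_pixel(p):
    """Map an 8-bit pixel value in [0, 255] to the tanh range [-1, 1]."""
    return p / 127.5 - 1.0


def init_weights(n, rng):
    """Zero-centered normal initialization with std. dev. 0.02."""
    return [rng.gauss(0.0, 0.02) for _ in range(n)]


def adam_step(theta, grad, m, v, t):
    """One Adam update of a scalar parameter (t is the 1-based step)."""
    m = BETA1 * m + (1 - BETA1) * grad
    v = BETA2 * v + (1 - BETA2) * grad ** 2
    m_hat = m / (1 - BETA1 ** t)  # bias-corrected first moment
    v_hat = v / (1 - BETA2 ** t)  # bias-corrected second moment
    theta = theta - LEARNING_RATE * m_hat / (math.sqrt(v_hat) + EPS)
    return theta, m, v


rng = random.Random(0)
weights = init_weights(1000, rng)
theta, m, v = adam_step(theta=1.0, grad=3.0, m=0.0, v=0.0, t=1)
```

On the first step the bias-corrected update has magnitude close to the learning rate regardless of the gradient's size, so theta moves from 1.0 to roughly 0.9998.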
DCGANs were trained on three datasets: Large-scale Scene Understanding (LSUN), ImageNet-1k, and a newly assembled Faces dataset.
GENERATED IMAGES AND SANITY CHECKS THAT IT'S NOT JUST MEMORIZING EXAMPLES…
Generated LSUN bedrooms after one (left) and five (right) epochs of training.
SMOOTH TRANSITION OF SCENES PRODUCED BY INTERPOLATION BETWEEN A SERIES OF RANDOM POINTS IN Z
Average 4 Z vectors from exemplar faces looking left and 4 from faces looking right.
Interpolating between the averaged left and right vectors creates a "turn vector".
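The averaging and interpolation steps can be sketched as follows. In practice the exemplar vectors would be the Z vectors of generated faces picked by eye. Here they are random stand-ins, and the [-1, 1] range of the uniform distribution and the number of interpolation steps are assumptions.

```python
import random

Z_DIM = 100  # dimensionality of the noise vector z


def sample_z(rng):
    """Draw one noise vector (the [-1, 1] range is an assumption)."""
    return [rng.uniform(-1.0, 1.0) for _ in range(Z_DIM)]


def average(vectors):
    """Component-wise mean of several vectors."""
    n = len(vectors)
    return [sum(component) / n for component in zip(*vectors)]


def interpolate(a, b, steps):
    """Return `steps` vectors evenly spaced on the line from a to b."""
    path = []
    for i in range(steps):
        t = i / (steps - 1)
        path.append([(1 - t) * x + t * y for x, y in zip(a, b)])
    return path


rng = random.Random(0)
left = average([sample_z(rng) for _ in range(4)])   # stand-in: faces looking left
right = average([sample_z(rng) for _ in range(4)])  # stand-in: faces looking right
path = interpolate(left, right, steps=9)
```

Feeding each vector in `path` to the generator would produce the sequence of images; a smooth change across that sequence is what suggests the model has not simply memorized training examples.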
MANIPULATING THE GENERATOR REPRESENTATION (FORGETTING TO DRAW CERTAIN OBJECTS)
(Top) Unmodified generated samples.
(Bottom) Samples generated after dropping out the "window" concept. Some windows are removed or transformed.
The overall scene stays the same, indicating that the generator has separated objects (windows) from the scene.
VECTOR ARITHMETIC ON FACE SAMPLES
Find 3 exemplar images for each concept (e.g., 3 smiling women).
Average their Z vectors.
Do simple vector arithmetic operations on the averaged vectors.
Generate an image based on the resulting vector.
Other images are produced by adding small uniform noise to the new vector.
(Figure panel for comparison: arithmetic in pixel space.)
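The vector arithmetic on face samples can be sketched as follows, using the "smiling woman - neutral woman + neutral man" example from the DCGAN paper. The 2-dimensional vectors are toy stand-ins for 100-dimensional Z vectors, and the noise scale of 0.25 and the count of 8 variants are illustrative assumptions.

```python
import random


def average(vectors):
    """Component-wise mean of the exemplar Z vectors."""
    n = len(vectors)
    return [sum(component) / n for component in zip(*vectors)]


def combine(a, b, c):
    """Return a - b + c, component by component."""
    return [x - y + w for x, y, w in zip(a, b, c)]


def jitter(vec, scale, count, rng):
    """Produce `count` nearby vectors by adding small uniform noise."""
    return [[x + rng.uniform(-scale, scale) for x in vec]
            for _ in range(count)]


# Three toy exemplar vectors per concept.
smiling_woman = average([[1.0, 0.0], [1.0, 2.0], [1.0, 1.0]])
neutral_woman = average([[0.0, 0.0], [0.0, 0.0], [0.0, 3.0]])
neutral_man = average([[2.0, 2.0], [2.0, 2.0], [2.0, 2.0]])

smiling_man = combine(smiling_woman, neutral_woman, neutral_man)
variants = jitter(smiling_man, scale=0.25, count=8, rng=random.Random(0))

print(smiling_man)  # [3.0, 2.0]
```

Averaging several exemplars before doing the arithmetic matters because a single Z vector tends to give unstable results, whereas the mean of a few is more consistent.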
GANS AS FEATURE EXTRACTOR
Evaluated on CIFAR-10:
1) Train on ImageNet.
2) Get all the responses from the Discriminator's layers.
3) Max-pool each layer to get a 4x4 spatial grid.
4) Flatten to form a feature vector.
5) Train a regularized linear L2-SVM classifier for CIFAR-10.
Note: while other approaches achieve higher performance, this network was not trained on CIFAR-10!
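Steps 3 and 4 of the feature-extraction recipe can be sketched in plain Python. The layer responses below are toy data, the function names are illustrative, and the sketch assumes square feature maps whose side is a multiple of 4. The SVM training in step 5 is not shown.

```python
def max_pool_to_grid(fmap, grid=4):
    """Max-pool a square 2-D feature map (list of rows) to grid x grid."""
    step = len(fmap) // grid  # assumes the side is a multiple of grid
    return [
        [
            max(fmap[r][c]
                for r in range(i * step, (i + 1) * step)
                for c in range(j * step, (j + 1) * step))
            for j in range(grid)
        ]
        for i in range(grid)
    ]


def extract_features(layers):
    """Pool every feature map of every layer, then flatten and concatenate.

    `layers` is a list of layers; each layer is a list of 2-D feature maps.
    """
    features = []
    for layer in layers:
        for fmap in layer:
            pooled = max_pool_to_grid(fmap)
            features.extend(value for row in pooled for value in row)
    return features


# Toy responses: one 8x8 map in the first layer, two 4x4 maps in the second.
layer1 = [[[r * 8 + c for c in range(8)] for r in range(8)]]
layer2 = [[[1.0] * 4 for _ in range(4)], [[2.0] * 4 for _ in range(4)]]
features = extract_features([layer1, layer2])

print(len(features))  # 48
```

Every feature map contributes 16 values whatever its original resolution, so the length of the final vector depends only on the total number of feature maps.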
SUMMARY
Unsupervised learning that really seems to work.
Visualizations indicate that the Generator is learning something close to the true distribution of real images.
Classification performance using the Discriminator features indicates that the features learned are discriminative of the underlying classes.
APPENDIX:
OPTIMIZING A GENERATIVE ADVERSARIAL NETWORK (GAN)
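Discriminator: ascend the gradient w.r.t. its parameters \theta_d to maximize its loss function over a mini-batch of m samples:
$$\nabla_{\theta_d} \frac{1}{m} \sum_{i=1}^{m} \left[ \log D(x^{(i)}) + \log\left(1 - D(G(z^{(i)}))\right) \right]$$
Generator: descend the gradient w.r.t. its parameters \theta_g to minimize its loss function:
$$\nabla_{\theta_g} \frac{1}{m} \sum_{i=1}^{m} \log\left(1 - D(G(z^{(i)}))\right)$$
Interpretation: compute the gradient of the loss function, then update the parameters to maximize it (gradient ascent, Discriminator) or minimize it (gradient descent, Generator).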
EXAMPLE 1:
Inputs: a uniform noise vector z (random numbers) and real images x.
Discriminator: for a real image, D(x) = 0.8, i.e., D judges it to be real (correct): log(0.8) ≈ -0.2.
For a generated image, D(G(z)) = 0.2, i.e., D judges it to be generated (correct): log(1 - 0.2) = log(0.8) ≈ -0.2.
Adding the two terms gives a loss of about -0.4, which is fairly high: both terms are negative, so 0 is the upper bound. The discriminator ascends the gradient because it wants to maximize this value.
EXAMPLE 1 (continued):
Generator: D(G(z)) = 0.2, i.e., D classifies the generated image as generated. This is bad for G: D(.) wasn't fooled.
log(1 - 0.2) = log(0.8) ≈ -0.2, so the assigned loss is -0.2. Note that the generator wants to minimize this loss function.
EXAMPLE 2:
Discriminator: for a real image, D(x) = 0.2, i.e., D judges it to be generated (wrong): log(0.2) ≈ -1.6.
For a generated image, D(G(z)) = 0.8, i.e., D judges it to be real (wrong): log(1 - 0.8) = log(0.2) ≈ -1.6.
Combined, these bad predictions give a loss of about -3.2, lower than the -0.4 obtained with good predictions (Ex. 1). Remember that the discriminator's goal is to maximize!
EXAMPLE 2 (continued):
Generator: D(G(z)) = 0.8, i.e., D classifies the generated image as real. This is good for G: D(.) was fooled.
log(1 - 0.8) = log(0.2) ≈ -1.6, so the assigned loss is -1.6. Compare this with the previous loss of -0.2 and remember that the generator wants to minimize this loss function!
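The arithmetic in both examples can be reproduced with a few lines of Python. The sketch assumes natural logarithms, which is what the rounded values on the slides correspond to. The function names are illustrative.

```python
import math


def discriminator_loss(d_real, d_fake):
    """log D(x) + log(1 - D(G(z))); the discriminator maximizes this."""
    return math.log(d_real) + math.log(1.0 - d_fake)


def generator_loss(d_fake):
    """log(1 - D(G(z))); the generator minimizes this."""
    return math.log(1.0 - d_fake)


# Example 1: the discriminator is right about both images.
ex1_d = discriminator_loss(d_real=0.8, d_fake=0.2)
ex1_g = generator_loss(d_fake=0.2)

# Example 2: the discriminator is wrong about both images.
ex2_d = discriminator_loss(d_real=0.2, d_fake=0.8)
ex2_g = generator_loss(d_fake=0.8)

print(round(ex1_d, 2), round(ex1_g, 2))  # -0.45 -0.22
print(round(ex2_d, 2), round(ex2_g, 2))  # -3.22 -1.61
```

The unrounded values differ slightly from the slides, which round each logarithm to one decimal place before adding, but the ordering is the same: the discriminator prefers Example 1 (higher value) and the generator prefers Example 2 (lower value).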