Saypraseuth Mounsaveng Jérôme Rony… · Sound Generation C. Donahue, J. McAuley, and M....

Data Augmentation and Semi-Supervised Learning with Generative Adversarial Networks

Jérôme Rony, Saypraseuth Mounsaveng



The GAN Framework

Principle and Applications



How it all began...

Sources: www.les3brasseurs.ca, scholar.google.fr

Source: central photo: www.freeimageslive.co.uk; cat: cc0.photo

Source: bottom cat: www.pinterest.ca/pin/45669383692759692/

Razvan Pascanu

Ian Goodfellow



Generative Adversarial Networks


Source: https://medium.com/@devnag/generative-adversarial-networks-gans-in-50-lines-of-code-pytorch-e81b79659e3f


Generative Adversarial Networks

Source: towardsdatascience.com



A 2-player minimax game

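Training means solving:

$$\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{\text{data}}}[\log D(x)] + \mathbb{E}_{z \sim p_z}[\log(1 - D(G(z)))]$$

Where: p_data is the distribution of real images, p_z is the prior distribution of the random vectors z, D(x) is the probability that D assigns to x being a real image, and G(z) is the image generated from z.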

In practice, each training iteration alternates two steps:

● Train D:

○ Sample a minibatch of random vectors z and generate a minibatch of images with G

○ Sample a minibatch of real images

○ Compute the loss of D as a binary classifier on real and fake images, backpropagate and optimize D

● Train G:

○ Sample a minibatch of random vectors z and generate a minibatch of images with G

○ Compute the loss of G by feeding the fake images to D, backpropagate and optimize G
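The two losses used in these steps can be sketched without any framework, assuming D outputs a probability in (0, 1) for each image. The function names are hypothetical. A real implementation would compute the same quantities on tensors with an autodiff library and backpropagate them. The `non_saturating` option is the practical generator loss suggested in the original GAN paper (maximize log D(G(z)) instead of minimizing log(1 - D(G(z)))). Reading "compute loss of G" this way is an assumption.

```python
import math

def discriminator_loss(d_real, d_fake):
    """Binary cross-entropy for D: real images have label 1, generated images label 0.

    d_real, d_fake: lists of D's output probabilities in (0, 1).
    """
    real_term = -sum(math.log(p) for p in d_real) / len(d_real)
    fake_term = -sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)
    return real_term + fake_term

def generator_loss(d_fake, non_saturating=True):
    """Loss of G, obtained by feeding generated images to D (d_fake = D(G(z)))."""
    if non_saturating:
        # Practical variant: minimize -log D(G(z)), i.e. maximize log D(G(z))
        return -sum(math.log(p) for p in d_fake) / len(d_fake)
    # Literal minimax form: minimize E[log(1 - D(G(z)))]
    return sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)
```

When D is maximally confused and outputs 0.5 for every image, the discriminator loss equals 2 log 2 ≈ 1.386.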


Advantages

● Flexibility on the type of networks used for the generator and discriminator

○ MLP, CNN or VAE

● Subjectively better visual quality than other generative models

○ VAE images are blurry

● Faster generation: no sequential process involved, unlike autoregressive models

○ Easier exploration of the latent space

● Adaptation to other tasks like classification


Pitfalls

● Unstable training: a Nash equilibrium is difficult to reach with SGD optimisation due to saddle points

● Mode collapse

● Difficulty handling discrete data (e.g. text)


Source: Unrolled generative adversarial networks (Metz et al., 2017)

Source: Wikipedia


Conditional Generation

Monarch butterfly, goldfinch, daisy, redshank, grey whale

128×128 images from ImageNet

A. Odena, C. Olah, and J. Shlens. Conditional image synthesis with auxiliary classifier GANs. arXiv preprint arXiv:1610.09585, 2016


Domain and Style Transfer

P. Isola, J.-Y. Zhu, T. Zhou, and A. A. Efros. Image-to-Image translation with conditional adversarial networks. In CVPR, 2017

Live demo at https://affinelayer.com/pixsrv/


Domain Transfer at High-Resolution

T.-C. Wang, M.-Y. Liu, J.-Y. Zhu, A. Tao, J. Kautz, and B. Catanzaro. High-resolution image synthesis and semantic manipulation with conditional gans. arXiv preprint arXiv:1711.11585, 2017

2048×1024 images from Cityscapes Dataset


Domain Transfer at High-Resolution


Super-Resolution

C. Ledig, L. Theis, F. Huszár, J. Caballero, A. Cunningham, A. Acosta, A. Aitken, A. Tejani, J. Totz, Z. Wang, et al. Photo-realistic single image super-resolution using a generative adversarial network. arXiv preprint arXiv:1609.04802, 2016

Input / SR-GAN / Original


Sound Generation

● C. Donahue, J. McAuley, and M. Puckette, “Synthesizing audio with generative adversarial networks,” CoRR, vol. abs/1802.04208, 2018. http://wavegan-v1.s3-website-us-east-1.amazonaws.com/

● E. Hosseini-Asl, Y. Zhou, C. Xiong, and R. Socher, “A Multi-Discriminator CycleGAN for Unsupervised Non-Parallel Speech Domain Adaptation,” arXiv preprint arXiv:1804.00522. https://einstein.ai/research/a-multi-discriminator-cyclegan-for-unsupervised-non-parallel-speech-domain-adaptation


And Much More!


https://github.com/hindupuravinash/the-gan-zoo


Semi-supervised image classification with GANs

Source: Oliver et al., 2018


Multi-agent architecture

Source: towardsdatascience.com

Generation task

Classification task


Architecture with two agents, each learning a different task, that help each other in an adversarial setup


● For image generation, D helps G approximate the true data distribution and generate better images

Source: https://blog.openai.com/generative-models/

Image generation


Image classification with GANs

● For classification, D is extended to a (K+1)-class classifier, and G helps D by generating additional samples (Salimans et al., 2016 and Odena, 2016)

○ Real samples are classified into one of the K real classes

○ Generated samples are classified into the extra (K+1)-th class


Source: https://github.com/buriburisuri/ac-gan
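A (K+1)-class discriminator can be read in two ways at once. The last output gives the probability that the input was generated. The first K outputs, renormalized, give the class prediction. The plain-Python sketch below assumes D returns K+1 raw scores (logits) with the generated class stored last. The function names are hypothetical.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of scores."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def interpret_k_plus_1(logits):
    """Read a (K+1)-class discriminator output; the last index is the 'generated' class.

    Returns (predicted real class, probability that the input is real,
    class probabilities renormalized over the K real classes).
    """
    probs = softmax(logits)
    k = len(logits) - 1
    p_generated = probs[k]
    p_real = 1.0 - p_generated
    # p(y | x, y <= K): drop the generated class and renormalize
    real_class_probs = [p / p_real for p in probs[:k]]
    predicted = max(range(k), key=lambda i: real_class_probs[i])
    return predicted, p_real, real_class_probs
```

With K = 3 and logits [2, 0, 0, 0], the input is judged real (p_real ≈ 0.90) and assigned to class 0.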


Image classification with GANs

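New loss function of D:

$$L = L_{\text{supervised}} + L_{\text{unsupervised}}$$

where

$$L_{\text{supervised}} = -\mathbb{E}_{x, y \sim p_{\text{data}}} \log p_{\text{model}}(y \mid x, y < K+1)$$

pushes the predicted class of real data to one of the K real classes, and

$$L_{\text{unsupervised}} = -\left\{ \mathbb{E}_{x \sim p_{\text{data}}} \log\left[1 - p_{\text{model}}(y = K+1 \mid x)\right] + \mathbb{E}_{x \sim G} \log p_{\text{model}}(y = K+1 \mid x) \right\}$$

whose first term pushes the predicted class of real data away from the K+1 class, and whose second term pushes the predicted class of generated data to the K+1 class.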
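Both parts of this loss can be computed directly from D's K+1 logits. The sketch below assumes the generated class is the last logit, so a softmax over only the first K logits gives p(y | x, y < K+1). The names and the list-based data layout are hypothetical. In practice these would be minibatch tensor operations.

```python
import math

def log_softmax(logits):
    """Numerically stable log-softmax."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(v - m) for v in logits))
    return [v - log_z for v in logits]

def d_loss_semi_supervised(labeled, unlabeled, generated):
    """L = L_supervised + L_unsupervised for a (K+1)-class discriminator.

    labeled:   list of (logits, y) pairs with y in 0..K-1
    unlabeled: list of logits for real, unlabeled images
    generated: list of logits for images produced by G
    Every logits list has K+1 entries; the last one is the generated class.
    """
    # L_supervised: -log p(y | x, y < K+1), i.e. softmax over the K real logits only
    l_sup = -sum(log_softmax(logits[:-1])[y] for logits, y in labeled) / len(labeled)
    # First unsupervised term: -log(1 - p(K+1 | x)) on real data
    l_real = -sum(math.log(1.0 - math.exp(log_softmax(logits)[-1]))
                  for logits in unlabeled) / len(unlabeled)
    # Second unsupervised term: -log p(K+1 | x) on generated data
    l_fake = -sum(log_softmax(logits)[-1] for logits in generated) / len(generated)
    return l_sup + l_real + l_fake
```

Take K = 2 and a D that outputs identical logits for every input. The three terms are then log 2, log 1.5 and log 3, so the total is log 9.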


Semi-supervised image classification with GANs

Hypothesis: a limited amount of labeled data and a large amount of unlabeled data

Problem A: Increase the usefulness of generated samples for D

Problem B: Leverage information contained in the unlabeled samples


Semi-supervised image classification with GANs

Good Semi-supervised Learning That Requires a Bad GAN (Dai et al., 2017)

Problem A: Increase the usefulness of generated samples for D

A perfect generator generates samples around the labeled data

This brings no improvement compared to fully supervised learning

Idea: Learn a “complementary distribution”

Complementary distribution is defined as

Generation of low-density samples leveraged by


Semi-supervised image classification with GANs

Good Semi-supervised Learning That Requires a Bad GAN (Dai et al., 2017)

Problem B: Leverage information contained in the unlabeled samples

Idea: Feature matching, i.e. reduce the distance between the features of generated samples and those of unlabeled samples

Idea: Reinforce true/fake discrimination for unlabeled data by minimizing the entropy of the predicted class distribution over the K real classes
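Once D's intermediate features and real-class logits are available, both ideas reduce to simple computations. This minimal sketch assumes feature vectors are plain lists taken from some intermediate layer of D; the choice of layer is an assumption. It also assumes the entropy is computed over the K real-class logits only. The function names are hypothetical. In Dai et al. the discriminator objective contains the sum of p log p (the negative of this entropy), which D maximizes, so the entropy itself is minimized.

```python
import math

def feature_matching_loss(feats_unlabeled, feats_generated):
    """Squared L2 distance between the mean feature vectors of two batches."""
    dim = len(feats_unlabeled[0])
    mean_u = [sum(f[i] for f in feats_unlabeled) / len(feats_unlabeled) for i in range(dim)]
    mean_g = [sum(f[i] for f in feats_generated) / len(feats_generated) for i in range(dim)]
    return sum((a - b) ** 2 for a, b in zip(mean_u, mean_g))

def conditional_entropy(real_class_logits):
    """Entropy of D's predicted distribution over the K real classes for one sample."""
    m = max(real_class_logits)
    exps = [math.exp(v - m) for v in real_class_logits]
    z = sum(exps)
    probs = [e / z for e in exps]
    return -sum(p * math.log(p) for p in probs if p > 0)
```

Matching mean features gives zero loss. For example, unlabeled features [[1, 2], [3, 4]] have the same mean as generated features [[2, 3]]. A uniform prediction over two classes has entropy log 2, and a confident one has entropy close to 0.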


Semi-supervised image classification with GANs

Good Semi-supervised Learning That Requires a Bad GAN (Dai et al., 2017)

Other issue addressed: Generator mode collapse

Idea: Maximize entropy of generated samples


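Semi-supervised image classification with GANs

New objective function for D:

$$\max_D \; \mathbb{E}_{x, y \sim \mathcal{L}} \log P_D(y \mid x, y \le K) + \mathbb{E}_{x \sim \mathcal{U}} \log P_D(y \le K \mid x) + \mathbb{E}_{x \sim p_G} \log P_D(K+1 \mid x) + \mathbb{E}_{x \sim \mathcal{U}} \sum_{k=1}^{K} P_D(k \mid x) \log P_D(k \mid x)$$

where $\mathcal{L}$ is the labeled set, $\mathcal{U}$ the unlabeled set and $p_G$ the generator distribution:

● First term: pushes the predicted class of real data to one of the K real classes

● Second term: pushes the predicted class of unlabeled data to one of the K real classes

● Third term: pushes the predicted class of generated data to the K+1 class

● Fourth term: reinforces the true/fake belief on unlabeled data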


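Semi-supervised image classification with GANs

New objective function for G:

$$\min_G \; -\mathcal{H}(p_G) + \mathbb{E}_{x \sim p_G} \log p(x) \, \mathbb{I}[p(x) > \epsilon] + \left\lVert \mathbb{E}_{x \sim \mathcal{U}} f(x) - \mathbb{E}_{x \sim p_G} f(x) \right\rVert^2$$

● First term: maximizes the entropy of generated samples, which minimizes mode collapse

● Second term: generates low-density samples (p is a pre-trained density model of the real data and ε a density threshold)

● Third term: generates samples closer to unlabeled data (feature matching, f being intermediate features of D)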


Semi-supervised image classification with GANs

Results:


# of labeled samples: 100 for MNIST, 1000 for SVHN, 4000 for CIFAR-10


Data Augmentation with GANs

PottedPlant, Horse, Bus, ChurchOutdoor, Bicycle, TVMonitor, Sofa

T. Karras, T. Aila, S. Laine, and J. Lehtinen, "Progressive growing of gans for improved quality, stability, and variation," arXiv preprint arXiv:1710.10196, 2017


Why Data Augmentation with GANs?

Learning the distribution of real data while maintaining high image quality

[Figure: data distribution and learnt distribution, with a real sample, a synthetic sample and an interpolation]


What do you mean “not stable”?


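Let’s start with another formulation: Wasserstein GAN with Gradient Penalty

$$L = \mathbb{E}_{\tilde{x} \sim \mathbb{P}_g}[D(\tilde{x})] - \mathbb{E}_{x \sim \mathbb{P}_r}[D(x)] + \lambda \, \mathbb{E}_{\hat{x} \sim \mathbb{P}_{\hat{x}}}\left[(\lVert \nabla_{\hat{x}} D(\hat{x}) \rVert_2 - 1)^2\right]$$

● First term: pushes the samples toward the distribution of the real samples

● Second term: defines the distribution of the real samples

● Third term (gradient penalty): keeps the gradient norm of D close to 1, which prevents gradient explosion

where $\hat{x} = \epsilon x + (1 - \epsilon)\tilde{x}$ with $\epsilon \sim U[0, 1]$, i.e. $\hat{x}$ is sampled on straight lines between real and generated samples.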

I. Gulrajani, F. Ahmed, M. Arjovsky, V. Dumoulin, and A. Courville. Improved Training of Wasserstein GANs. arXiv preprint arXiv:1704.00028, 2017
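The penalty term can be illustrated without a deep learning framework by estimating the critic's gradient with central finite differences. This sketch relies on several assumptions. The critic is any Python function mapping a list of floats to a score, and the real and generated batches have the same size. λ = 10 follows the value used by Gulrajani et al. Finite differences stand in for the automatic differentiation a real implementation would use. The function names are hypothetical.

```python
import math
import random

def numerical_grad(f, x, h=1e-5):
    """Central finite-difference gradient of a scalar function f at point x (list of floats)."""
    grad = []
    for i in range(len(x)):
        xp, xm = list(x), list(x)
        xp[i] += h
        xm[i] -= h
        grad.append((f(xp) - f(xm)) / (2 * h))
    return grad

def wgan_gp_critic_loss(critic, real, fake, lam=10.0, rng=random):
    """E[D(fake)] - E[D(real)] + lam * E[(||grad D(x_hat)||_2 - 1)^2]."""
    n = len(real)
    wasserstein = sum(critic(x) for x in fake) / n - sum(critic(x) for x in real) / n
    penalty = 0.0
    for x_r, x_f in zip(real, fake):
        eps = rng.random()  # eps ~ U[0, 1]
        # Random point on the straight line between a real and a generated sample
        x_hat = [eps * a + (1 - eps) * b for a, b in zip(x_r, x_f)]
        g = numerical_grad(critic, x_hat)
        penalty += (math.sqrt(sum(gi * gi for gi in g)) - 1.0) ** 2
    return wasserstein + lam * penalty / n
```

A linear critic D(x) = 3*x[0] + 4*x[1] has gradient norm 5 everywhere. With one real sample (1, 0) and one generated sample (0, 0), the loss is therefore 0 - 3 + 10 * (5 - 1)^2 = 157.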


Problem at High Resolutions?

At low resolutions, G and D are simple functions

WGAN-GP is based on the Lipschitz-continuity of D


Problem at High Resolutions?

WGAN-GP is based on the Lipschitz-continuity of D

At high resolutions, G and D are no longer simple functions


Solution: Progressive Growing (and other details)


T. Karras, T. Aila, S. Laine, and J. Lehtinen. Progressive growing of GANs for improved quality, stability, and variation. arXiv preprint arXiv:1710.10196, 2017.


Solution: Progressive Growing (and other details)

[Figure: network grown progressively by factors of ×2 (4×4, 8×8, 16×16, 32×32); legend: equalized learning rate, convolution 3×3 / 1, pixel normalization, upsampling (nearest neighbor), toRGB = convolution 1×1 / 1]
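Three of the details in this figure are easy to state in code:

● Pixel normalization: each pixel's feature vector is divided by the root mean square of its channels, with ε = 1e-8 as in Karras et al.

● Equalized learning rate: weights are scaled at runtime.

● Fade-in: when a new resolution block is added, the upsampled output of the previous stage is blended with the new block while α grows from 0 to 1.

The plain-Python sketch below works on single-channel 2-D lists. The function names and data layout are assumptions, and the gain √2 is the He-initialization constant.

```python
import math

def pixel_norm(channels, eps=1e-8):
    """Pixelwise feature normalization: divide the C channel values of one pixel by their RMS."""
    mean_sq = sum(a * a for a in channels) / len(channels)
    return [a / math.sqrt(mean_sq + eps) for a in channels]

def equalized_lr_scale(fan_in, gain=math.sqrt(2)):
    """Multiplier applied to N(0, 1)-initialized weights at every forward pass (He constant)."""
    return gain / math.sqrt(fan_in)

def upsample_nearest(img):
    """2x nearest-neighbor upsampling of a 2-D list (one channel)."""
    out = []
    for row in img:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out

def fade_in(old_rgb, new_rgb, alpha):
    """Blend the upsampled toRGB output of the previous stage with the new, higher-resolution one."""
    up = upsample_nearest(old_rgb)
    return [[(1 - alpha) * u + alpha * n for u, n in zip(up_row, new_row)]
            for up_row, new_row in zip(up, new_rgb)]
```

With α = 0 the network behaves exactly like the previous, lower-resolution stage. With α = 1 only the new block contributes.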


And in practice?


High-resolution images: https://www.youtube.com/watch?v=G06dEcZ-QTg


“This looks too good to be true”, Y. Bengio


Improved results on smaller images as well

Fake / Real

Improved variability

Method Inception Score

ALI (Dumoulin et al., 2016) 5.34 ± 0.05

GMAN (Durugkar et al., 2016) 6.00 ± 0.19

Improved GAN (Salimans et al., 2016) 6.86 ± 0.06

CEGAN-Ent-VI (Dai et al., 2017) 7.07 ± 0.07

LR-GAN (Yang et al., 2017) 7.17 ± 0.17

DFM (Warde-Farley & Bengio, 2017) 7.72 ± 0.13

WGAN-GP (Gulrajani et al., 2017) 7.86 ± 0.07

Splitting GAN (Grinblat et al., 2017) 7.90 ± 0.09

PG-GAN (best run) 8.80 ± 0.05

PG-GAN (from 10 runs) 8.56 ± 0.06

Results on CIFAR-10 in unsupervised mode. The Inception Score is the only “standardized” way of measuring image quality and diversity.
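The Inception Score is IS = exp(E_x[KL(p(y|x) || p(y))]). It is high when each generated image gets a confident class prediction and the predictions are spread over many classes. The minimal sketch below assumes the class-probability vectors p(y|x) have already been produced by a pre-trained Inception network for each generated image. The function name is hypothetical, and the sketch computes a single score for one set of images.

```python
import math

def inception_score(class_probs):
    """class_probs: one probability vector p(y|x) per generated image."""
    n = len(class_probs)
    k = len(class_probs[0])
    # Marginal class distribution p(y), averaged over all generated images
    marginal = [sum(p[j] for p in class_probs) / n for j in range(k)]
    mean_kl = 0.0
    for p in class_probs:
        # KL(p(y|x) || p(y)); zero-probability terms contribute nothing
        mean_kl += sum(pj * math.log(pj / mj) for pj, mj in zip(p, marginal) if pj > 0)
    mean_kl /= n
    return math.exp(mean_kl)
```

Consider three images, each predicted with certainty as a different one of three classes. They give the maximum score of 3. Three identical confident predictions (total mode collapse) give the minimum score of 1.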


Fake Conditional Generation - Pre-Training

Subjects / Fake / Real


Fake Conditional Generation - Fine Tuning

Glasses

Illumination

Hairstyle


And When Training is Successful, Interpolation is Fun!
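Interpolation happens in the latent space. Two random vectors z0 and z1 are blended, and each intermediate vector is fed to G to produce one frame. The minimal sketch below covers only the latent-side computation; G is not shown. Linear interpolation is the simplest choice, and spherical interpolation (slerp) is a commonly used alternative for Gaussian latent vectors. The results shown here do not state which one was used, so both are assumptions, and the function names are hypothetical.

```python
import math

def lerp(z0, z1, t):
    """Linear interpolation between two latent vectors."""
    return [(1 - t) * a + t * b for a, b in zip(z0, z1)]

def slerp(z0, z1, t):
    """Spherical interpolation: follows the arc between z0 and z1."""
    n0 = math.sqrt(sum(a * a for a in z0))
    n1 = math.sqrt(sum(b * b for b in z1))
    cos_omega = sum(a * b for a, b in zip(z0, z1)) / (n0 * n1)
    omega = math.acos(max(-1.0, min(1.0, cos_omega)))
    if abs(math.sin(omega)) < 1e-8:
        return lerp(z0, z1, t)  # (anti)parallel vectors: fall back to linear
    s0 = math.sin((1 - t) * omega) / math.sin(omega)
    s1 = math.sin(t * omega) / math.sin(omega)
    return [s0 * a + s1 * b for a, b in zip(z0, z1)]

def interpolation_path(z0, z1, steps, method=slerp):
    """Latent vectors for `steps` frames (steps >= 2), endpoints included."""
    return [method(z0, z1, i / (steps - 1)) for i in range(steps)]
```

Between the unit vectors (1, 0) and (0, 1), slerp keeps the midpoint on the unit circle at (√0.5, √0.5), whereas lerp gives (0.5, 0.5).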


Can a GAN really generate new data?

Nearest neighbors found from the training data, based on feature-space distance. We used activations from five VGG layers. Only the crop highlighted in bottom right image was used for comparison in order to exclude image background and focus the search on matching facial features.
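The nearest-neighbor check works in two steps. First, compute one descriptor per image by concatenating activations from several layers of a feature network. Then rank training images by their distance to the generated image. The sketch below assumes the activations (from the VGG layers, restricted to the crop) have already been extracted as lists of floats, and it uses plain Euclidean distance. The exact distance and any per-layer normalization used for the figure are not specified, so they are assumptions here, as are the function names.

```python
import math

def descriptor(layer_activations):
    """Flatten and concatenate the activations of several layers into one feature vector."""
    return [v for layer in layer_activations for v in layer]

def nearest_neighbors(query, training_set, k=3):
    """Indices of the k training descriptors closest (Euclidean) to the query descriptor."""
    ranked = sorted(range(len(training_set)), key=lambda i: math.dist(query, training_set[i]))
    return ranked[:k]
```

If the nearest training images are clearly different people, this suggests the generator is not simply copying its training data.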


Thank You For Your Attention!

Any Questions?


Supplementary Material / Recommended Lectures

● I. Goodfellow. NIPS 2016 tutorial: Generative adversarial networks. arXiv preprint arXiv:1701.00160, 2016.

● A. Radford, L. Metz, and S. Chintala. Unsupervised representation learning with deep convolutional generative adversarial networks. arXiv preprint arXiv:1511.06434, 2015.

● M. Mirza and S. Osindero. Conditional generative adversarial nets. arXiv preprint arXiv:1411.1784, 2014.

● X. Chen, Y. Duan, R. Houthooft, J. Schulman, I. Sutskever, and P. Abbeel. InfoGAN: Interpretable representation learning by information maximizing generative adversarial nets. In NIPS, 2016.

● P. Isola, J.-Y. Zhu, T. Zhou, and A. A. Efros. Image-to-image translation with conditional adversarial networks. In CVPR, 2017.

● J.-Y. Zhu, T. Park, P. Isola, and A. A. Efros. Unpaired image-to-image translation using cycle-consistent adversarial networks. In ICCV, 2017.

● X. Mao, Q. Li, H. Xie, R. Y. K. Lau, and Z. Wang. Least squares generative adversarial networks. arXiv preprint arXiv:1611.04076, 2016.

● M. Arjovsky, S. Chintala, and L. Bottou. Wasserstein GAN. arXiv preprint arXiv:1701.07875, 2017.

● I. Gulrajani, F. Ahmed, M. Arjovsky, V. Dumoulin, and A. Courville. Improved training of Wasserstein GANs. arXiv preprint arXiv:1704.00028, 2017.

● A. Odena, C. Olah, and J. Shlens. Conditional image synthesis with auxiliary classifier GANs. In ICML, 2017.

● D. Warde-Farley and Y. Bengio. Improving generative adversarial networks with denoising feature matching. In ICLR, 2017.

● T. Karras, T. Aila, S. Laine, and J. Lehtinen. Progressive growing of GANs for improved quality, stability, and variation. arXiv preprint arXiv:1710.10196, 2017.

● R. D. Hjelm, A. P. Jacob, T. Che, K. Cho, and Y. Bengio. Boundary-seeking generative adversarial networks. arXiv preprint arXiv:1702.08431, 2017.