Lecture 11: Advanced Generative Modelsuvadlc.github.io/lectures/nov2019/lecture11.pdfLecture 11:...

90
UVA DEEP LEARNING COURSE – EFSTRATIOS GAVVES ADVANCED GENERATIVE MODELS - 1 Lecture 11: Advanced Generative Models Efstratios Gavves

Transcript of Lecture 11: Advanced Generative Modelsuvadlc.github.io/lectures/nov2019/lecture11.pdfLecture 11:...

Page 1: Lecture 11: Advanced Generative Modelsuvadlc.github.io/lectures/nov2019/lecture11.pdfLecture 11: Advanced Generative Models Efstratios Gavves UVA DEEP LEARNING COURSE –EFSTRATIOS

UVA DEEP LEARNING COURSE – EFSTRATIOS GAVVES ADVANCED GENERATIVE MODELS - 1

Lecture 11: Advanced Generative ModelsEfstratios Gavves

Page 2: Lecture 11: Advanced Generative Modelsuvadlc.github.io/lectures/nov2019/lecture11.pdfLecture 11: Advanced Generative Models Efstratios Gavves UVA DEEP LEARNING COURSE –EFSTRATIOS

UVA DEEP LEARNING COURSE – EFSTRATIOS GAVVES ADVANCED GENERATIVE MODELS - 2

oExact likelihood models◦Autoregressive Models

◦Non-autoregressive flow-based models

oAutoregressive Models◦NADE, MADE, PixelCNN, PixelCNN++, PixelRNN

oNormalizing Flows

oNon-autoregressive flow-based models◦RealNVP

◦Glow

◦Flow++

Lecture overview

Page 3: Lecture 11: Advanced Generative Modelsuvadlc.github.io/lectures/nov2019/lecture11.pdfLecture 11: Advanced Generative Models Efstratios Gavves UVA DEEP LEARNING COURSE –EFSTRATIOS

UVA DEEP LEARNING COURSE – EFSTRATIOS GAVVES ADVANCED GENERATIVE MODELS - 3

oLet’s assume we have signal modelled by an input random variable 𝑥◦Can be an image, video, text, music, temperature measurements

o Is there an order in all these signals?

Autoregressive Models

Page 4: Lecture 11: Advanced Generative Modelsuvadlc.github.io/lectures/nov2019/lecture11.pdfLecture 11: Advanced Generative Models Efstratios Gavves UVA DEEP LEARNING COURSE –EFSTRATIOS

UVA DEEP LEARNING COURSE – EFSTRATIOS GAVVES ADVANCED GENERATIVE MODELS - 4

oLet’s assume we have signal modelled by an input random variable 𝑥◦Can be an image, video, text, music, temperature measurements

o Is there an order in all these signals?

Autoregressive Models

Page 5: Lecture 11: Advanced Generative Modelsuvadlc.github.io/lectures/nov2019/lecture11.pdfLecture 11: Advanced Generative Models Efstratios Gavves UVA DEEP LEARNING COURSE –EFSTRATIOS

UVA DEEP LEARNING COURSE – EFSTRATIOS GAVVES ADVANCED GENERATIVE MODELS - 5

oLet’s assume we have signal modelled by an input random variable 𝑥◦Can be an image, video, text, music, temperature measurements

o Is there an order in all these signals? Other signals and orders?

Autoregressive Models

Page 6: Lecture 11: Advanced Generative Modelsuvadlc.github.io/lectures/nov2019/lecture11.pdfLecture 11: Advanced Generative Models Efstratios Gavves UVA DEEP LEARNING COURSE –EFSTRATIOS

UVA DEEP LEARNING COURSE – EFSTRATIOS GAVVES ADVANCED GENERATIVE MODELS - 6

oLet’s assume we have signal modelled by an input random variable 𝑥◦Can be an image, video, text, music, temperature measurements

o Is there an order in all these signals?

Autoregressive Models

Page 7: Lecture 11: Advanced Generative Modelsuvadlc.github.io/lectures/nov2019/lecture11.pdfLecture 11: Advanced Generative Models Efstratios Gavves UVA DEEP LEARNING COURSE –EFSTRATIOS

UVA DEEP LEARNING COURSE – EFSTRATIOS GAVVES ADVANCED GENERATIVE MODELS - 7

o If 𝑥 is sequential, there is an order: 𝑥 = [𝑥1, … , 𝑥𝑘]◦E.g., the order of words in a sentence

o If 𝑥 is not sequential, we can create an artificial order 𝑥 = [𝑥𝑟(1), … , 𝑥𝑟(𝑘)]◦E.g., the order with which pixels make (generate) an image

oThen, the marginal likelihood is a product of conditionals

𝑝 𝑥 =ෑ

𝑘=1

𝐷

𝑝(𝑥𝑘|𝑥𝑗<𝑘)

oDifferent from Recurrent Neural Networks(a) no parameter sharing(b) chains are not infinite in length

Autoregressive Models

Page 8: Lecture 11: Advanced Generative Modelsuvadlc.github.io/lectures/nov2019/lecture11.pdfLecture 11: Advanced Generative Models Efstratios Gavves UVA DEEP LEARNING COURSE –EFSTRATIOS

UVA DEEP LEARNING COURSE – EFSTRATIOS GAVVES ADVANCED GENERATIVE MODELS - 8

oPros: because of the product decomposition, 𝑝 𝑥 is tractable

oCons: because the 𝑝 𝑥 is sequential, training is slower◦To generate every new word/frame/pixel, the previous words/frames/pixels in the order must be generated first no parallelism

Autoregressive Models

Page 9: Lecture 11: Advanced Generative Modelsuvadlc.github.io/lectures/nov2019/lecture11.pdfLecture 11: Advanced Generative Models Efstratios Gavves UVA DEEP LEARNING COURSE –EFSTRATIOS

UVA DEEP LEARNING COURSE – EFSTRATIOS GAVVES ADVANCED GENERATIVE MODELS - 10

NADE

Neural Autoregressive Distribution Estimation, Larochelle and Murray, AISTATS 2011

Page 10: Lecture 11: Advanced Generative Modelsuvadlc.github.io/lectures/nov2019/lecture11.pdfLecture 11: Advanced Generative Models Efstratios Gavves UVA DEEP LEARNING COURSE –EFSTRATIOS

UVA DEEP LEARNING COURSE – EFSTRATIOS GAVVES ADVANCED GENERATIVE MODELS - 11

oMinimizing negative log-likelihood as usual

ℒ = − log 𝑝 𝑥 = −

𝑘=1

𝐷

𝑝(𝑥𝑘|𝑥<𝑘)

oThen, we model the conditional as𝑝 𝑥𝑑 𝑥<𝑑 = 𝜎(Vd,∶ ⋅ ℎ𝑑 + bd)

where the latent variable ℎ𝑑 is defined asℎ𝑑 = 𝜎(𝑊:,<𝑑 ⋅ 𝑥<𝑑 + 𝑐)

Where 𝑊 is shared between conditionals

NADE

Page 11: Lecture 11: Advanced Generative Modelsuvadlc.github.io/lectures/nov2019/lecture11.pdfLecture 11: Advanced Generative Models Efstratios Gavves UVA DEEP LEARNING COURSE –EFSTRATIOS

UVA DEEP LEARNING COURSE – EFSTRATIOS GAVVES ADVANCED GENERATIVE MODELS - 12

o “Teacher forcing” training

NADE: Training & Testing

Training: Use ground truth values (e.g. of pixels) Testing: Use predicted values in previous order

Page 12: Lecture 11: Advanced Generative Modelsuvadlc.github.io/lectures/nov2019/lecture11.pdfLecture 11: Advanced Generative Models Efstratios Gavves UVA DEEP LEARNING COURSE –EFSTRATIOS

UVA DEEP LEARNING COURSE – EFSTRATIOS GAVVES ADVANCED GENERATIVE MODELS - 13

NADE Visualizations

Page 13: Lecture 11: Advanced Generative Modelsuvadlc.github.io/lectures/nov2019/lecture11.pdfLecture 11: Advanced Generative Models Efstratios Gavves UVA DEEP LEARNING COURSE –EFSTRATIOS

UVA DEEP LEARNING COURSE – EFSTRATIOS GAVVES ADVANCED GENERATIVE MODELS - 14

oQuestion: How could we construct an autoregressive autoencoder?

oTo rephrase: How to modify an autoencoder such that each output 𝑥𝑘depends only on the previous outputs 𝑥<𝑘 (autoregressive property)?◦Namely, the present 𝑘-th output 𝑥𝑘 must not depend on a computational path from future inputs 𝑥𝑘+1, … , 𝑥𝐷◦Autoregressive: 𝑝 𝑥|𝜃 = ς𝑘=1

𝐷 𝑝(𝑥𝑘|𝑥𝑗<𝑘 , 𝜃)

◦Autoencoder: 𝑝 𝑥|𝑥, 𝜃 = ς𝑘=1𝐷 𝑝(𝑥𝑘|𝑥𝑘 , 𝜃)

MADE

Masked Autoencoder for Distribution Estimation, Germain, Mathieu et al., ICML 2015

Page 14: Lecture 11: Advanced Generative Modelsuvadlc.github.io/lectures/nov2019/lecture11.pdfLecture 11: Advanced Generative Models Efstratios Gavves UVA DEEP LEARNING COURSE –EFSTRATIOS

UVA DEEP LEARNING COURSE – EFSTRATIOS GAVVES ADVANCED GENERATIVE MODELS - 15

oQuestion: How could we construct an autoregressive autoencoder?

oTo rephrase: How to modify an autoencoder such that each output 𝑥𝑘depends only on the previous outputs 𝑥<𝑘 (autoregressive property)?◦Namely, the present 𝑘-th output 𝑥𝑘 must not depend on a computational path from future inputs 𝑥𝑘+1, … , 𝑥𝐷◦Autoregressive: 𝑝 𝑥|𝜃 = ς𝑘=1

𝐷 𝑝(𝑥𝑘|𝑥𝑗<𝑘 , 𝜃)

◦Autoencoder: 𝑝 𝑥|𝑥, 𝜃 = ς𝑘=1𝐷 𝑝(𝑥𝑘|𝑥𝑘 , 𝜃)

oAnswer: Masked convolutions!ℎ 𝑥 = 𝑔 𝑏 + 𝑊⨀𝑀𝑊 ⋅ 𝑥𝑥 = 𝜎 𝑐 + 𝑉⨀𝑀𝑉 ⋅ ℎ(𝑥)

MADE

Masked Autoencoder for Distribution Estimation, Germain, Mathieu et al., ICML 2015

Page 15: Lecture 11: Advanced Generative Modelsuvadlc.github.io/lectures/nov2019/lecture11.pdfLecture 11: Advanced Generative Models Efstratios Gavves UVA DEEP LEARNING COURSE –EFSTRATIOS

UVA DEEP LEARNING COURSE – EFSTRATIOS GAVVES ADVANCED GENERATIVE MODELS - 16

MADE

Masked Autoencoder for Distribution Estimation, Germain, Mathieu et al., ICML 2015

Page 16: Lecture 11: Advanced Generative Modelsuvadlc.github.io/lectures/nov2019/lecture11.pdfLecture 11: Advanced Generative Models Efstratios Gavves UVA DEEP LEARNING COURSE –EFSTRATIOS

UVA DEEP LEARNING COURSE – EFSTRATIOS GAVVES ADVANCED GENERATIVE MODELS - 17

oUnsupervised learning: learn how to model 𝑝(𝑥)

oDecompose the marginal

𝑝 𝑥 =ෑ

𝑖=1

𝑛2

𝑝(𝑥𝑖|𝑥1, … , 𝑥𝑖−1)

oAssume row-wise pixel by pixel generation and sequential colors RGB◦Each color conditioned on all colors from previous pixels and specific colors in the same pixel

𝑝 𝑥𝑖,𝑅|𝑥<𝑖 ⋅ 𝑝 𝑥𝑖,𝐺|𝑥<𝑖 , 𝑥𝑖,𝑅 ⋅ 𝑝 𝑥𝑖,𝐵|𝑥<𝑖 , 𝑥𝑖,𝑅 , 𝑥𝑖,𝐺

oFinal output is 256-way softmax

PixelRNN

Pixel Recurrent Neural Networks, van den Oord, Kalchbrenner and Kavukcuoglu, arXiv 2016

Page 17: Lecture 11: Advanced Generative Modelsuvadlc.github.io/lectures/nov2019/lecture11.pdfLecture 11: Advanced Generative Models Efstratios Gavves UVA DEEP LEARNING COURSE –EFSTRATIOS

UVA DEEP LEARNING COURSE – EFSTRATIOS GAVVES ADVANCED GENERATIVE MODELS - 18

oHow to model the conditionals?

𝑝 𝑥𝑖,𝑅|𝑥<𝑖 , 𝑝 𝑥𝑖,𝐺|𝑥<𝑖 , 𝑥𝑖,𝑅 , 𝑝 𝑥𝑖,𝐵|𝑥<𝑖 , 𝑥𝑖,𝑅 , 𝑥𝑖,𝐺

oLSTM variants◦12 layers

oRow LSTM

oDiagonal Bi-LSTM

PixelRNN

Row LSTM Diagonal Bi-LSTM

Page 18: Lecture 11: Advanced Generative Modelsuvadlc.github.io/lectures/nov2019/lecture11.pdfLecture 11: Advanced Generative Models Efstratios Gavves UVA DEEP LEARNING COURSE –EFSTRATIOS

UVA DEEP LEARNING COURSE – EFSTRATIOS GAVVES ADVANCED GENERATIVE MODELS - 19

oHidden state (i, j) = Hidden state (i-1, j-1) + Hidden state (i-1, j) +Hidden state (i-1, j+1) +p(i, j)

oBy recursion the hidden statecaptures a fairly triangular region

PixelRNN - RowLSTM

Row LSTM

Page 19: Lecture 11: Advanced Generative Modelsuvadlc.github.io/lectures/nov2019/lecture11.pdfLecture 11: Advanced Generative Models Efstratios Gavves UVA DEEP LEARNING COURSE –EFSTRATIOS

UVA DEEP LEARNING COURSE – EFSTRATIOS GAVVES ADVANCED GENERATIVE MODELS - 20

oHow to capture the whole previous context

oPixel (i, j) = Pixel (i, j-1) + Pixel (i-1, j)

oProcessing goes on diagonally

oReceptive layer encompasses entire region

PixelRNN – Diagonal BiLSTM

Diagonal Bi-LSTM

Page 20: Lecture 11: Advanced Generative Modelsuvadlc.github.io/lectures/nov2019/lecture11.pdfLecture 11: Advanced Generative Models Efstratios Gavves UVA DEEP LEARNING COURSE –EFSTRATIOS

UVA DEEP LEARNING COURSE – EFSTRATIOS GAVVES ADVANCED GENERATIVE MODELS - 21

oPropagate signal faster

oSpeed up convergence

PixelRNN – Residual connections

Page 21: Lecture 11: Advanced Generative Modelsuvadlc.github.io/lectures/nov2019/lecture11.pdfLecture 11: Advanced Generative Models Efstratios Gavves UVA DEEP LEARNING COURSE –EFSTRATIOS

UVA DEEP LEARNING COURSE – EFSTRATIOS GAVVES ADVANCED GENERATIVE MODELS - 22

oPros: good modelling of 𝑝(𝑥) nice image generation

oHalf pro: Residual connections speeds up convergence

oCons: still slow training, slow generation

PixelRNN – Pros & Cons

Row LSTM Diagonal Bi-LSTM

Page 22: Lecture 11: Advanced Generative Modelsuvadlc.github.io/lectures/nov2019/lecture11.pdfLecture 11: Advanced Generative Models Efstratios Gavves UVA DEEP LEARNING COURSE –EFSTRATIOS

UVA DEEP LEARNING COURSE – EFSTRATIOS GAVVES ADVANCED GENERATIVE MODELS - 23

PixelRNN - Generations

Page 23: Lecture 11: Advanced Generative Modelsuvadlc.github.io/lectures/nov2019/lecture11.pdfLecture 11: Advanced Generative Models Efstratios Gavves UVA DEEP LEARNING COURSE –EFSTRATIOS

UVA DEEP LEARNING COURSE – EFSTRATIOS GAVVES ADVANCED GENERATIVE MODELS - 24

oUnfortunately, PixelRNN is too slow

oSolution: replace recurrent connections with convolutions◦Multiple convolutional layers to preserve spatial resolution

oTraining is much faster because all true pixels are known in advance, so we can parallelize◦Generation still sequential (pixels must be generated) still slow

PixelCNN

Stack of masked convolutions

Pixel Recurrent Neural Networks, van den Oord, Kalchbrenner and Kavukcuoglu, arXiv 2016

Page 24: Lecture 11: Advanced Generative Modelsuvadlc.github.io/lectures/nov2019/lecture11.pdfLecture 11: Advanced Generative Models Efstratios Gavves UVA DEEP LEARNING COURSE –EFSTRATIOS

UVA DEEP LEARNING COURSE – EFSTRATIOS GAVVES ADVANCED GENERATIVE MODELS - 25

oUse masked convolutions again to enforce autoregressive relationships

PixelCNN

Page 25: Lecture 11: Advanced Generative Modelsuvadlc.github.io/lectures/nov2019/lecture11.pdfLecture 11: Advanced Generative Models Efstratios Gavves UVA DEEP LEARNING COURSE –EFSTRATIOS

UVA DEEP LEARNING COURSE – EFSTRATIOS GAVVES ADVANCED GENERATIVE MODELS - 26

oCons: Performance is worse than PixelRNN

oWhy?

PixelCNN – Pros and Cons

Page 26: Lecture 11: Advanced Generative Modelsuvadlc.github.io/lectures/nov2019/lecture11.pdfLecture 11: Advanced Generative Models Efstratios Gavves UVA DEEP LEARNING COURSE –EFSTRATIOS

UVA DEEP LEARNING COURSE – EFSTRATIOS GAVVES ADVANCED GENERATIVE MODELS - 27

oCons: Performance is worse than PixelRNN

oWhy?

oNot all past context is taken into account

oNew problem: the cascaded convolutions create a “blind spot”

PixelCNN – Pros and Cons

Page 27: Lecture 11: Advanced Generative Modelsuvadlc.github.io/lectures/nov2019/lecture11.pdfLecture 11: Advanced Generative Models Efstratios Gavves UVA DEEP LEARNING COURSE –EFSTRATIOS

UVA DEEP LEARNING COURSE – EFSTRATIOS GAVVES ADVANCED GENERATIVE MODELS - 29

oBecause of(a) the limited receptive field of convolutions and(b) computing all features at once (not sequentially) cascading convolutions makes current pixel not depend on all previous blind spot

Blind spot

Page 28: Lecture 11: Advanced Generative Modelsuvadlc.github.io/lectures/nov2019/lecture11.pdfLecture 11: Advanced Generative Models Efstratios Gavves UVA DEEP LEARNING COURSE –EFSTRATIOS

UVA DEEP LEARNING COURSE – EFSTRATIOS GAVVES ADVANCED GENERATIVE MODELS - 30

oUse two layers of convolutions stacks◦Horizontal stack: conditions on current row and takes as input the previous layer output and the vertical stack

◦Vertical stack: conditions on all rows above current pixels

oAlso replace ReLU with a tanh(𝑊 ∗ 𝑥)⋅ 𝜎(𝑈 ∗ 𝑥)

Fixing the blind spot: Gated PixelCNN

Page 29: Lecture 11: Advanced Generative Modelsuvadlc.github.io/lectures/nov2019/lecture11.pdfLecture 11: Advanced Generative Models Efstratios Gavves UVA DEEP LEARNING COURSE –EFSTRATIOS

UVA DEEP LEARNING COURSE – EFSTRATIOS GAVVES ADVANCED GENERATIVE MODELS - 31

oCoral reef

PixelCNN - Generations

Page 30: Lecture 11: Advanced Generative Modelsuvadlc.github.io/lectures/nov2019/lecture11.pdfLecture 11: Advanced Generative Models Efstratios Gavves UVA DEEP LEARNING COURSE –EFSTRATIOS

UVA DEEP LEARNING COURSE – EFSTRATIOS GAVVES ADVANCED GENERATIVE MODELS - 32

oSorrel horse

PixelCNN - Generation

Page 31: Lecture 11: Advanced Generative Modelsuvadlc.github.io/lectures/nov2019/lecture11.pdfLecture 11: Advanced Generative Models Efstratios Gavves UVA DEEP LEARNING COURSE –EFSTRATIOS

UVA DEEP LEARNING COURSE – EFSTRATIOS GAVVES ADVANCED GENERATIVE MODELS - 33

oSandbar

PixelCNN - Generation

Page 32: Lecture 11: Advanced Generative Modelsuvadlc.github.io/lectures/nov2019/lecture11.pdfLecture 11: Advanced Generative Models Efstratios Gavves UVA DEEP LEARNING COURSE –EFSTRATIOS

UVA DEEP LEARNING COURSE – EFSTRATIOS GAVVES ADVANCED GENERATIVE MODELS - 34

oLhasa Apso

PixelCNN - Generation

Page 33: Lecture 11: Advanced Generative Modelsuvadlc.github.io/lectures/nov2019/lecture11.pdfLecture 11: Advanced Generative Models Efstratios Gavves UVA DEEP LEARNING COURSE –EFSTRATIOS

UVA DEEP LEARNING COURSE – EFSTRATIOS GAVVES ADVANCED GENERATIVE MODELS - 35

o Improving the PixelCNN model

oReplace the softmax output with a discretized logistic mixture lihelihood◦Softmax is too memory consuming and gives sparse gradients

◦ Instead, assume logistic distribution of intensity and round off to 8-bits

oCondition on whole pixels, not pixel colors

oDownsample with stride-2 convs to compute long-range dependencies

oUse shortcut connections

oDropout◦PixelCNN is too powerful a framework can onverfit easily

PixelCNN++

PixelCNN++: Improving the PixelCNN with Discretized Logistic, Salimans, Karpathy, Chen, Kingma, ICLR 2017

Page 34: Lecture 11: Advanced Generative Modelsuvadlc.github.io/lectures/nov2019/lecture11.pdfLecture 11: Advanced Generative Models Efstratios Gavves UVA DEEP LEARNING COURSE –EFSTRATIOS

UVA DEEP LEARNING COURSE – EFSTRATIOS GAVVES ADVANCED GENERATIVE MODELS - 36

PixelCNN++

Page 35: Lecture 11: Advanced Generative Modelsuvadlc.github.io/lectures/nov2019/lecture11.pdfLecture 11: Advanced Generative Models Efstratios Gavves UVA DEEP LEARNING COURSE –EFSTRATIOS

UVA DEEP LEARNING COURSE – EFSTRATIOS GAVVES ADVANCED GENERATIVE MODELS - 37

PixelCNN++ - Generations

Page 36: Lecture 11: Advanced Generative Modelsuvadlc.github.io/lectures/nov2019/lecture11.pdfLecture 11: Advanced Generative Models Efstratios Gavves UVA DEEP LEARNING COURSE –EFSTRATIOS

UVA DEEP LEARNING COURSE – EFSTRATIOS GAVVES ADVANCED GENERATIVE MODELS - 38

oSoTA density estimation

oQuite slow because of autoregressive nature◦They must sample sequentially

oThey do not have a latest space

Advantages/Disadvantages

Page 37: Lecture 11: Advanced Generative Modelsuvadlc.github.io/lectures/nov2019/lecture11.pdfLecture 11: Advanced Generative Models Efstratios Gavves UVA DEEP LEARNING COURSE –EFSTRATIOS

UVA DEEP LEARNING COURSE – EFSTRATIOS GAVVES ADVANCED GENERATIVE MODELS - 39

oA standard VAE with a PixelCNN generator/decoder

oBe careful. Often the generator is so powerful, that the encoder/inference network is ignored Whatever the latent code 𝑧 there will be a nice image generated

PixelVAE

PixelVAE: A Latent Variable Model for Natural Images, Gulrajani et al., ICLR 2017

Page 38: Lecture 11: Advanced Generative Modelsuvadlc.github.io/lectures/nov2019/lecture11.pdfLecture 11: Advanced Generative Models Efstratios Gavves UVA DEEP LEARNING COURSE –EFSTRATIOS

UVA DEEP LEARNING COURSE – EFSTRATIOS GAVVES ADVANCED GENERATIVE MODELS - 40

PixelVAE

Page 39: Lecture 11: Advanced Generative Modelsuvadlc.github.io/lectures/nov2019/lecture11.pdfLecture 11: Advanced Generative Models Efstratios Gavves UVA DEEP LEARNING COURSE –EFSTRATIOS

UVA DEEP LEARNING COURSE – EFSTRATIOS GAVVES ADVANCED GENERATIVE MODELS - 41

PixelVAE - Generations

64x64 LSUN Bedrooms 64x64 ImageNet

Page 40: Lecture 11: Advanced Generative Modelsuvadlc.github.io/lectures/nov2019/lecture11.pdfLecture 11: Advanced Generative Models Efstratios Gavves UVA DEEP LEARNING COURSE –EFSTRATIOS

UVA DEEP LEARNING COURSE – EFSTRATIOS GAVVES ADVANCED GENERATIVE MODELS - 42

PixelVAE - Generations

Varying pixel-level noise

Varying bottom latents

Varying top latents

Page 41: Lecture 11: Advanced Generative Modelsuvadlc.github.io/lectures/nov2019/lecture11.pdfLecture 11: Advanced Generative Models Efstratios Gavves UVA DEEP LEARNING COURSE –EFSTRATIOS

UVA DEEP LEARNING COURSEEFSTRATIOS GAVVES

ADVANCED GENERATIVE MODELS - 43

All about Flows

Page 42: Lecture 11: Advanced Generative Modelsuvadlc.github.io/lectures/nov2019/lecture11.pdfLecture 11: Advanced Generative Models Efstratios Gavves UVA DEEP LEARNING COURSE –EFSTRATIOS

UVA DEEP LEARNING COURSE – EFSTRATIOS GAVVES ADVANCED GENERATIVE MODELS - 44

o Ideally, we want to minimize the difference between the approximate posterior and the true posterior

ELBOθ,φ x = log 𝑝𝜃(𝑥) − KL(𝑞𝜑 𝑧|𝑥 ||p(z|x))

oVAEs assume simple diagonal Gaussian priors and posteriors

o Is this a good assumption?

oCould we have better posteriors without blowing up the complexity?

Towards better posteriors

Page 43: Lecture 11: Advanced Generative Modelsuvadlc.github.io/lectures/nov2019/lecture11.pdfLecture 11: Advanced Generative Models Efstratios Gavves UVA DEEP LEARNING COURSE –EFSTRATIOS

UVA DEEP LEARNING COURSE – EFSTRATIOS GAVVES ADVANCED GENERATIVE MODELS - 45

oSampling from a Gaussian prior

𝑧~𝒩(𝑧|𝜇 𝑥 , diag(𝜎2(𝑥)))

Sampling from a VAE

Page 44: Lecture 11: Advanced Generative Modelsuvadlc.github.io/lectures/nov2019/lecture11.pdfLecture 11: Advanced Generative Models Efstratios Gavves UVA DEEP LEARNING COURSE –EFSTRATIOS

UVA DEEP LEARNING COURSE – EFSTRATIOS GAVVES ADVANCED GENERATIVE MODELS - 46

o Let’s assume that the transformation 𝑓 is invertible, that is we can compute 𝑧𝑖 = 𝑓(𝑧𝑖−1)

as well as𝑧𝑖−1 = 𝑓−1(𝑧𝑖)

o If 𝑧𝑖−1 is an RV with a pdf 𝑝𝑖−1(𝑧𝑖−1), then 𝑧𝑖 also an RV

o Easy to compute the pdf of 𝑧𝑖 because of the change of variable formula

𝑝𝑖(𝑧𝑖) = 𝑝𝑖−1(𝑓−1(𝑧𝑖)) det 𝑑𝑓𝑖

−1

𝑑𝑧𝑖

And can be shown that det𝑑𝑓𝑖

−1

𝑑𝑧𝑖= det 𝑑𝑓𝑖

𝑑𝑧𝑖−1

−1

Flows: Or, what if we were to transform a simple prior?

Page 45: Lecture 11: Advanced Generative Modelsuvadlc.github.io/lectures/nov2019/lecture11.pdfLecture 11: Advanced Generative Models Efstratios Gavves UVA DEEP LEARNING COURSE –EFSTRATIOS

UVA DEEP LEARNING COURSE – EFSTRATIOS GAVVES ADVANCED GENERATIVE MODELS - 47

𝑝𝑖(𝑧𝑖) = 𝑝𝑖−1(𝑓−1(𝑧𝑖)) det

𝑑𝑓𝑖𝑑𝑧𝑖−1

−1

Change of variable formula

https://blog.evjang.com/2018/01/nf1.html

Page 46: Lecture 11: Advanced Generative Modelsuvadlc.github.io/lectures/nov2019/lecture11.pdfLecture 11: Advanced Generative Models Efstratios Gavves UVA DEEP LEARNING COURSE –EFSTRATIOS

UVA DEEP LEARNING COURSE – EFSTRATIOS GAVVES ADVANCED GENERATIVE MODELS - 48

oWe have that

𝑝𝑖 𝑧𝑖 = 𝑝𝑖−1(𝑧𝑖−1) det 𝑑𝑧𝑖−1𝑑𝑧𝑖

o If we start from a simple pdf, like a Gaussian for 𝑧𝑖−1

oAnd we use a transformation 𝑧𝑖 = 𝑓(𝑧𝑖−1) for which◦ It is easy to compute the inverse 𝑧𝑖−1 = 𝑓−1(𝑧𝑖)

◦And is easy to compute the Jacobian det𝑑𝑓𝑖

−1

𝑑𝑧𝑖and its determinant

oThen by the change of variable we can compute the pdf of 𝑧𝑖

oEven if we do not know the analytic form of 𝑝𝑖 𝑧𝑖 !!

What have we gained?

Page 47: Lecture 11: Advanced Generative Modelsuvadlc.github.io/lectures/nov2019/lecture11.pdfLecture 11: Advanced Generative Models Efstratios Gavves UVA DEEP LEARNING COURSE –EFSTRATIOS

UVA DEEP LEARNING COURSE – EFSTRATIOS GAVVES ADVANCED GENERATIVE MODELS - 49

o If we have a series of K transformations𝑥 = 𝑧𝐾 = 𝑓𝐾 ∘ 𝑓𝐾−1 ∘ ⋯ ∘ 𝑓1 𝑧0

Then by decomposing the log-likelihood we have

log 𝑝 𝑥 = log 𝜋0 𝑧0 −

𝑖

𝐾

log det 𝑑𝑓𝑖𝑑𝑧𝑖−1

−1

We can even apply this recursively

https://lilianweng.github.io/lil-log/2018/10/13/flow-based-deep-generative-models.html

Page 48: Lecture 11: Advanced Generative Modelsuvadlc.github.io/lectures/nov2019/lecture11.pdfLecture 11: Advanced Generative Models Efstratios Gavves UVA DEEP LEARNING COURSE –EFSTRATIOS

UVA DEEP LEARNING COURSE – EFSTRATIOS GAVVES ADVANCED GENERATIVE MODELS - 50

How does this look like?

Variational Inference with Normalizing Flows, Rezende and Mohammed, 2015

Page 49: Lecture 11: Advanced Generative Modelsuvadlc.github.io/lectures/nov2019/lecture11.pdfLecture 11: Advanced Generative Models Efstratios Gavves UVA DEEP LEARNING COURSE –EFSTRATIOS

UVA DEEP LEARNING COURSE – EFSTRATIOS GAVVES ADVANCED GENERATIVE MODELS - 51

oWe can have complex posteriors without really needing to know complex mathematical formulations of the pdf

o Instead, we learn the posteriors from the data

oHopefully, our posteriors then would be closer to the true posterior

So, what?

Page 50: Lecture 11: Advanced Generative Modelsuvadlc.github.io/lectures/nov2019/lecture11.pdfLecture 11: Advanced Generative Models Efstratios Gavves UVA DEEP LEARNING COURSE –EFSTRATIOS

UVA DEEP LEARNING COURSE – EFSTRATIOS GAVVES ADVANCED GENERATIVE MODELS - 52

o ELBOθ,φ x = 𝔼𝑧0~𝑞0(𝑧0|𝑥) log 𝑝𝜃 𝑥 𝑧𝐾

−𝐾𝐿(𝑞0(𝑧0|𝑥)||𝑝𝜆(𝑧)) + 𝔼𝑧0~𝑞0 𝑧0 𝑥 [

𝑖

𝐾

log det 𝑑𝑓𝑖𝑑𝑧𝑖−1

−1]

o The first term is simply the reconstruction term◦ How likely our generations are, after having sampled from our simple 𝑧0~𝑞0 𝑧0 𝑥

o The second term is the KL divergence between the flow 𝑞0(𝑧0|𝑥) and our simple prior 𝑝𝜆(𝑧)

o The third term is the accumulation of log determinants from the recursive change of variables

o Although the final posterior 𝑞𝐾 will have a complex form, its closed form is not needed for computing any expectations, e.g., in the ELBO◦Check the Law of Unconscious Statistician (LOTUS)

Inference with Flows

Page 51: Lecture 11: Advanced Generative Modelsuvadlc.github.io/lectures/nov2019/lecture11.pdfLecture 11: Advanced Generative Models Efstratios Gavves UVA DEEP LEARNING COURSE –EFSTRATIOS

UVA DEEP LEARNING COURSE – EFSTRATIOS GAVVES ADVANCED GENERATIVE MODELS - 53

Pipeline

Variational Inference with Normalizing Flows, Rezende and Mohammed, 2015

Page 52: Lecture 11: Advanced Generative Modelsuvadlc.github.io/lectures/nov2019/lecture11.pdfLecture 11: Advanced Generative Models Efstratios Gavves UVA DEEP LEARNING COURSE –EFSTRATIOS

UVA DEEP LEARNING COURSE – EFSTRATIOS GAVVES ADVANCED GENERATIVE MODELS - 54

oThe transformation is𝑓 𝑧 = 𝑧 + 𝑢ℎ(𝑤𝑇𝑧 + 𝑏)

◦𝑢,𝑤, 𝑏 are free parameters

◦ℎ is an element-wise non-linearity (element-wise so that it is easy to invert)

◦The log-determinant of the Jacobian is𝜓 𝑧 = ℎ′ 𝑤𝑇𝑧 + 𝑏 𝑤

det𝜕𝑓

𝜕𝑧= |1 + 𝑢𝑇𝜓(𝑧)|

Planar flow

Page 53: Lecture 11: Advanced Generative Modelsuvadlc.github.io/lectures/nov2019/lecture11.pdfLecture 11: Advanced Generative Models Efstratios Gavves UVA DEEP LEARNING COURSE –EFSTRATIOS

UVA DEEP LEARNING COURSE – EFSTRATIOS GAVVES ADVANCED GENERATIVE MODELS - 55

oThe transformation is𝑓 𝑧 = 𝑧 + 𝛽ℎ(𝛼, 𝑟)(𝑧 − 𝑧0)

Where ℎ 𝛼, 𝑟 = 1/(𝛼 + 𝑟)

oThe log-determinant of the Jacobian is

odet𝜕𝑓

𝜕𝑧= 1 + 𝛽ℎ 𝛼, 𝑟 𝑑−1 1 + 𝛽ℎ 𝛼, 𝑟 + ℎ′ 𝛼, 𝑟 𝑟

Radial flow

Page 54: Lecture 11: Advanced Generative Modelsuvadlc.github.io/lectures/nov2019/lecture11.pdfLecture 11: Advanced Generative Models Efstratios Gavves UVA DEEP LEARNING COURSE –EFSTRATIOS

UVA DEEP LEARNING COURSE – EFSTRATIOS GAVVES ADVANCED GENERATIVE MODELS - 56

How does this look like?

Variational Inference with Normalizing Flows, Rezend and Mohammed, 2015

Page 55: Lecture 11: Advanced Generative Modelsuvadlc.github.io/lectures/nov2019/lecture11.pdfLecture 11: Advanced Generative Models Efstratios Gavves UVA DEEP LEARNING COURSE –EFSTRATIOS

UVA DEEP LEARNING COURSE – EFSTRATIOS GAVVES ADVANCED GENERATIVE MODELS - 57

Some results

Page 56: Lecture 11: Advanced Generative Modelsuvadlc.github.io/lectures/nov2019/lecture11.pdfLecture 11: Advanced Generative Models Efstratios Gavves UVA DEEP LEARNING COURSE –EFSTRATIOS

UVA DEEP LEARNING COURSE – EFSTRATIOS GAVVES ADVANCED GENERATIVE MODELS - 58

oUsing flows and change of variable formula to compute the exact likelihood 𝑝(𝑥) tractably

oReal NVP/NICE

oGLOW

oFLOW++

Flow-based models

Page 57: Lecture 11: Advanced Generative Modelsuvadlc.github.io/lectures/nov2019/lecture11.pdfLecture 11: Advanced Generative Models Efstratios Gavves UVA DEEP LEARNING COURSE –EFSTRATIOS

UVA DEEP LEARNING COURSE – EFSTRATIOS GAVVES ADVANCED GENERATIVE MODELS - 59

oTakes the idea of changing variables formula to define an invertible generative model◦The function 𝑓 is the encoder/inference network

◦The inverse function 𝑓−1 is the decoder/generator network

oRealNVP defines computationally efficient operations to scale

Real NVP

Page 58: Lecture 11: Advanced Generative Modelsuvadlc.github.io/lectures/nov2019/lecture11.pdfLecture 11: Advanced Generative Models Efstratios Gavves UVA DEEP LEARNING COURSE –EFSTRATIOS

UVA DEEP LEARNING COURSE – EFSTRATIOS GAVVES ADVANCED GENERATIVE MODELS - 60

A visualization

Page 59: Lecture 11: Advanced Generative Modelsuvadlc.github.io/lectures/nov2019/lecture11.pdfLecture 11: Advanced Generative Models Efstratios Gavves UVA DEEP LEARNING COURSE –EFSTRATIOS

UVA DEEP LEARNING COURSE – EFSTRATIOS GAVVES ADVANCED GENERATIVE MODELS - 61

oCoupling layers

oMasked convolutions

oMulti-scale architecture

RealNVP contributions

Page 60: Lecture 11: Advanced Generative Modelsuvadlc.github.io/lectures/nov2019/lecture11.pdfLecture 11: Advanced Generative Models Efstratios Gavves UVA DEEP LEARNING COURSE –EFSTRATIOS

UVA DEEP LEARNING COURSE – EFSTRATIOS GAVVES ADVANCED GENERATIVE MODELS - 62

𝑝𝑖(𝑧𝑖) = 𝑝𝑖−1(𝑓−1(𝑧𝑖)) det

𝑑𝑓𝑖𝑑𝑧𝑖−1

−1

o It must be easy to compute the inverse 𝑓−1

o It must be easy to compute thedeterminant of the Jacobian det 𝑑𝑓𝑖

𝑑𝑧𝑖−1

Again: Change of variable formula

https://blog.evjang.com/2018/01/nf1.html

Page 61: Lecture 11: Advanced Generative Modelsuvadlc.github.io/lectures/nov2019/lecture11.pdfLecture 11: Advanced Generative Models Efstratios Gavves UVA DEEP LEARNING COURSE –EFSTRATIOS

UVA DEEP LEARNING COURSE – EFSTRATIOS GAVVES ADVANCED GENERATIVE MODELS - 63

oHow to make computing the determinant easy?

oHave triangular matrices

𝐴 =𝑎11 0 0𝑎21 𝑎22 0𝑎31 𝑎32 𝑎33

⇒ det 𝐴 =ෑ

𝑖

𝑎𝑖𝑖

oSo, let’s design a transformation function that yields triangular Jacobians

RealNVP: Coupling layers

Page 62: Lecture 11: Advanced Generative Modelsuvadlc.github.io/lectures/nov2019/lecture11.pdfLecture 11: Advanced Generative Models Efstratios Gavves UVA DEEP LEARNING COURSE –EFSTRATIOS

UVA DEEP LEARNING COURSE – EFSTRATIOS GAVVES ADVANCED GENERATIVE MODELS - 64

oAssume a D-dimensional input 𝑥 = 𝑥1:𝐷

oThen, we split 𝑥 = [𝑥1:𝑑 , 𝑥𝑑+1:𝐷] to have the bijective transformation 𝑦1:𝑑 = 𝑥1:𝑑

𝑦𝑑+1:𝐷 = 𝑥𝑑+1:𝐷 ⊙ exp 𝑠 𝑥1:𝑑 + 𝑡(𝑥1:𝑑)

oBasically, the first 𝑑 dimensions remain unchanged, while the rest are modulated by the first 𝑑 dimensions

oLet’s check the Jacobian

𝜕𝑦

𝜕𝑥𝑇=

𝕀𝑑 0𝜕𝑦𝑑+1:𝐷

𝜕𝑥1:𝑑𝑇 diag(exp[𝑠(𝑥1:𝑑)])

RealNVP: Coupling layers

Page 63: Lecture 11: Advanced Generative Modelsuvadlc.github.io/lectures/nov2019/lecture11.pdfLecture 11: Advanced Generative Models Efstratios Gavves UVA DEEP LEARNING COURSE –EFSTRATIOS

UVA DEEP LEARNING COURSE – EFSTRATIOS GAVVES ADVANCED GENERATIVE MODELS - 65

𝜕𝑦

𝜕𝑥𝑇=

𝕀𝑑 0𝜕𝑦𝑑+1:𝐷

𝜕𝑥1:𝑑𝑇 diag(exp[𝑠(𝑥1:𝑑)])

oWhat is the log-determinant?

RealNVP: Coupling layers

Page 64: Lecture 11: Advanced Generative Modelsuvadlc.github.io/lectures/nov2019/lecture11.pdfLecture 11: Advanced Generative Models Efstratios Gavves UVA DEEP LEARNING COURSE –EFSTRATIOS

UVA DEEP LEARNING COURSE – EFSTRATIOS GAVVES ADVANCED GENERATIVE MODELS - 66

𝜕𝑦

𝜕𝑥𝑇=

𝕀𝑑 0𝜕𝑦𝑑+1:𝐷

𝜕𝑥1:𝑑𝑇 diag(exp[𝑠(𝑥1:𝑑)])

oWhat is the log-determinant?

log det𝜕𝑦

𝜕𝑥𝑇=

𝑖

𝑠 𝑥1:𝑑 𝑗

oWhat convenient observation do we make?

RealNVP: Coupling layers

Page 65: Lecture 11: Advanced Generative Modelsuvadlc.github.io/lectures/nov2019/lecture11.pdfLecture 11: Advanced Generative Models Efstratios Gavves UVA DEEP LEARNING COURSE –EFSTRATIOS

UVA DEEP LEARNING COURSE – EFSTRATIOS GAVVES ADVANCED GENERATIVE MODELS - 67

𝜕𝑦

𝜕𝑥𝑇=

𝕀𝑑 0𝜕𝑦𝑑+1:𝐷

𝜕𝑥1:𝑑𝑇 diag(exp[𝑠(𝑥1:𝑑)])

oWhat is the log-determinant?

log det𝜕𝑦

𝜕𝑥𝑇=

𝑖

𝑠 𝑥1:𝑑 𝑗

oWhat convenient observation do we make?

o Computing the log-determinant does not require compute the inverse or the determinant (or any other complex operation) of 𝑠 𝑥1:𝑑

o So, 𝑠 𝑥1:𝑑 can be as complex as we want

o For example?

RealNVP: Coupling layers

Page 66: Lecture 11: Advanced Generative Modelsuvadlc.github.io/lectures/nov2019/lecture11.pdfLecture 11: Advanced Generative Models Efstratios Gavves UVA DEEP LEARNING COURSE –EFSTRATIOS

UVA DEEP LEARNING COURSE – EFSTRATIOS GAVVES ADVANCED GENERATIVE MODELS - 68

𝜕𝑦

𝜕𝑥𝑇=

𝕀𝑑 0𝜕𝑦𝑑+1:𝐷

𝜕𝑥1:𝑑𝑇 diag(exp[𝑠(𝑥1:𝑑)])

oWhat is the log-determinant?

log det𝜕𝑦

𝜕𝑥𝑇=

𝑖

𝑠 𝑥1:𝑑 𝑗

oWhat convenient observation do we make?

o Computing the log-determinant does not require compute the inverse or the determinant (or any other complex operation) of 𝑠 𝑥1:𝑑

o So, 𝑠 𝑥1:𝑑 can be as complex as we want

o For example? Neural Networks

RealNVP: Coupling layers

Page 67: Lecture 11: Advanced Generative Modelsuvadlc.github.io/lectures/nov2019/lecture11.pdfLecture 11: Advanced Generative Models Efstratios Gavves UVA DEEP LEARNING COURSE –EFSTRATIOS

UVA DEEP LEARNING COURSE – EFSTRATIOS GAVVES ADVANCED GENERATIVE MODELS - 69

oThe inverse function is sort of trivial𝑦1:𝑑 = 𝑥1:𝑑

𝑦𝑑+1:𝐷 = 𝑥𝑑+1:𝐷 ⊙ exp 𝑠 𝑑1:𝑑 + 𝑡(𝑥1:𝑑)⇔

𝑥1:𝑑 = 𝑦1:𝑑𝑥𝑑+1:𝐷 = (𝑦𝑑+1:𝐷−𝑡(𝑥1:𝑑)) ⊙ exp −𝑠 𝑑1:𝑑

oAgain, no complex operations required neural networks ok

oDevil in the details:

RealNVP: Coupling layers invertibility

Page 68: Lecture 11: Advanced Generative Modelsuvadlc.github.io/lectures/nov2019/lecture11.pdfLecture 11: Advanced Generative Models Efstratios Gavves UVA DEEP LEARNING COURSE –EFSTRATIOS

UVA DEEP LEARNING COURSE – EFSTRATIOS GAVVES ADVANCED GENERATIVE MODELS - 70

oThe inverse function is sort of trivial𝑦1:𝑑 = 𝑥1:𝑑

𝑦𝑑+1:𝐷 = 𝑥𝑑+1:𝐷 ⊙ exp 𝑠 𝑑1:𝑑 + 𝑡(𝑥1:𝑑)⇔

𝑥1:𝑑 = 𝑦1:𝑑𝑥𝑑+1:𝐷 = (𝑦𝑑+1:𝐷−𝑡(𝑥1:𝑑)) ⊙ exp −𝑠 𝑑1:𝑑

oAgain, no complex operations required neural networks ok

oDevil in the details: The transformation must retain dimensionality

oSo, no bottleneck very large feature maps 𝑡(𝑥1:𝑑)

RealNVP: Coupling layers invertibility

Page 69: Lecture 11: Advanced Generative Modelsuvadlc.github.io/lectures/nov2019/lecture11.pdfLecture 11: Advanced Generative Models Efstratios Gavves UVA DEEP LEARNING COURSE –EFSTRATIOS

UVA DEEP LEARNING COURSE – EFSTRATIOS GAVVES ADVANCED GENERATIVE MODELS - 71

oSpatially: checkers pattern

oChannel-wise: half and half masking

How to split? Masked convolutions

Page 70: Lecture 11: Advanced Generative Modelsuvadlc.github.io/lectures/nov2019/lecture11.pdfLecture 11: Advanced Generative Models Efstratios Gavves UVA DEEP LEARNING COURSE –EFSTRATIOS

UVA DEEP LEARNING COURSE – EFSTRATIOS GAVVES ADVANCED GENERATIVE MODELS - 72

oPotential problem?

How to split? Masked convolutions

Page 71: Lecture 11: Advanced Generative Modelsuvadlc.github.io/lectures/nov2019/lecture11.pdfLecture 11: Advanced Generative Models Efstratios Gavves UVA DEEP LEARNING COURSE –EFSTRATIOS

UVA DEEP LEARNING COURSE – EFSTRATIOS GAVVES ADVANCED GENERATIVE MODELS - 73

oPotential problem?

oAlways the same dimensions in 𝑦1:𝑑 = 𝑥1:𝑑. An easy fix?

How to split? Masked convolutions

Page 72: Lecture 11: Advanced Generative Modelsuvadlc.github.io/lectures/nov2019/lecture11.pdfLecture 11: Advanced Generative Models Efstratios Gavves UVA DEEP LEARNING COURSE –EFSTRATIOS

UVA DEEP LEARNING COURSE – EFSTRATIOS GAVVES ADVANCED GENERATIVE MODELS - 74

oDo not use the same dimensions for the transformations when moving from one layer to the other

o Instead, alternate

oThe Jacobian still remains tractable

oAnd the inverse also

RealNVP: reversing dimensions between coupling layers

Page 73: Lecture 11: Advanced Generative Modelsuvadlc.github.io/lectures/nov2019/lecture11.pdfLecture 11: Advanced Generative Models Efstratios Gavves UVA DEEP LEARNING COURSE –EFSTRATIOS

UVA DEEP LEARNING COURSE – EFSTRATIOS GAVVES ADVANCED GENERATIVE MODELS - 75

Some results

Page 74: Lecture 11: Advanced Generative Modelsuvadlc.github.io/lectures/nov2019/lecture11.pdfLecture 11: Advanced Generative Models Efstratios Gavves UVA DEEP LEARNING COURSE –EFSTRATIOS

UVA DEEP LEARNING COURSE – EFSTRATIOS GAVVES ADVANCED GENERATIVE MODELS - 76

Qualitative results

Real samples

Generated samples

Page 75: Lecture 11: Advanced Generative Modelsuvadlc.github.io/lectures/nov2019/lecture11.pdfLecture 11: Advanced Generative Models Efstratios Gavves UVA DEEP LEARNING COURSE –EFSTRATIOS

UVA DEEP LEARNING COURSE – EFSTRATIOS GAVVES ADVANCED GENERATIVE MODELS - 77

GLOW

https://arxiv.org/abs/1807.03039

Page 76: Lecture 11: Advanced Generative Modelsuvadlc.github.io/lectures/nov2019/lecture11.pdfLecture 11: Advanced Generative Models Efstratios Gavves UVA DEEP LEARNING COURSE –EFSTRATIOS

UVA DEEP LEARNING COURSE – EFSTRATIOS GAVVES ADVANCED GENERATIVE MODELS - 78

GLOW

https://arxiv.org/abs/1807.03039

Generalization of permutation operation in RealNVP

Replaces BatchNorm that needs large minibatches

Page 77: Lecture 11: Advanced Generative Modelsuvadlc.github.io/lectures/nov2019/lecture11.pdfLecture 11: Advanced Generative Models Efstratios Gavves UVA DEEP LEARNING COURSE –EFSTRATIOS

UVA DEEP LEARNING COURSE – EFSTRATIOS GAVVES ADVANCED GENERATIVE MODELS - 79

GLOW

https://arxiv.org/abs/1807.03039

Page 78: Lecture 11: Advanced Generative Modelsuvadlc.github.io/lectures/nov2019/lecture11.pdfLecture 11: Advanced Generative Models Efstratios Gavves UVA DEEP LEARNING COURSE –EFSTRATIOS

UVA DEEP LEARNING COURSE – EFSTRATIOS GAVVES ADVANCED GENERATIVE MODELS - 80

GLOW pipeline

Page 79: Lecture 11: Advanced Generative Modelsuvadlc.github.io/lectures/nov2019/lecture11.pdfLecture 11: Advanced Generative Models Efstratios Gavves UVA DEEP LEARNING COURSE –EFSTRATIOS

UVA DEEP LEARNING COURSE – EFSTRATIOS GAVVES ADVANCED GENERATIVE MODELS - 81

Some results

Page 80: Lecture 11: Advanced Generative Modelsuvadlc.github.io/lectures/nov2019/lecture11.pdfLecture 11: Advanced Generative Models Efstratios Gavves UVA DEEP LEARNING COURSE –EFSTRATIOS

UVA DEEP LEARNING COURSE – EFSTRATIOS GAVVES ADVANCED GENERATIVE MODELS - 82

Some visual examples

Page 81: Lecture 11: Advanced Generative Modelsuvadlc.github.io/lectures/nov2019/lecture11.pdfLecture 11: Advanced Generative Models Efstratios Gavves UVA DEEP LEARNING COURSE –EFSTRATIOS

UVA DEEP LEARNING COURSE – EFSTRATIOS GAVVES ADVANCED GENERATIVE MODELS - 83

o In reality, images are continuous signals but recorded digitally (integers)

oFitting a continuous density function will therefore encourage a degenerate solution that puts all probability mass on discrete data points

oAn easy solution is “dequantization” (Uria et al. 2013, Dinh et al., 2016, Salimans et al., 2017)

o If your image is 𝑥 you simply add a bit of noise, so that 𝑥 does not gather around specific (discrete) values

𝑦 = 𝑥 + 𝑢, 𝑢~Uniform(0, 1)

FLOW++

https://arxiv.org/abs/1902.00275

Page 82: Lecture 11: Advanced Generative Modelsuvadlc.github.io/lectures/nov2019/lecture11.pdfLecture 11: Advanced Generative Models Efstratios Gavves UVA DEEP LEARNING COURSE –EFSTRATIOS

UVA DEEP LEARNING COURSE – EFSTRATIOS GAVVES ADVANCED GENERATIVE MODELS - 84

o It can be shown that dequantization optimizes a lower bound on the original discrete data

𝑃𝑚𝑜𝑑𝑒𝑙 𝑥 = න0,1 𝐷

𝑝𝑚𝑜𝑑𝑒𝑙 𝑥 + 𝑢 𝑑𝑢

oSo, since we are anyways using a (uniform) distribution 𝑢, why not … ?

FLOW++ dequantization

Page 83: Lecture 11: Advanced Generative Modelsuvadlc.github.io/lectures/nov2019/lecture11.pdfLecture 11: Advanced Generative Models Efstratios Gavves UVA DEEP LEARNING COURSE –EFSTRATIOS

UVA DEEP LEARNING COURSE – EFSTRATIOS GAVVES ADVANCED GENERATIVE MODELS - 85

o It can be shown that dequantization optimizes a lower bound on the original discrete data

𝑃𝑚𝑜𝑑𝑒𝑙 𝑥 = න0,1 𝐷

𝑝𝑚𝑜𝑑𝑒𝑙 𝑥 + 𝑢 𝑑𝑢

oSo, since we are anyways using a (uniform) distribution 𝑢, why not … ?

oLearn the optimal noise distribution. How?

FLOW++ dequantization

Page 84: Lecture 11: Advanced Generative Modelsuvadlc.github.io/lectures/nov2019/lecture11.pdfLecture 11: Advanced Generative Models Efstratios Gavves UVA DEEP LEARNING COURSE –EFSTRATIOS

UVA DEEP LEARNING COURSE – EFSTRATIOS GAVVES ADVANCED GENERATIVE MODELS - 86

o It can be shown that dequantization optimizes a lower bound on the original discrete data

𝑃𝑚𝑜𝑑𝑒𝑙 𝑥 = න0,1 𝐷

𝑝𝑚𝑜𝑑𝑒𝑙 𝑥 + 𝑢 𝑑𝑢

oSo, since we are anyways using a (uniform) distribution 𝑢, why not … ?

oLearn the optimal noise distribution. How?

oVariational Inference to the rescue Variational Dequantization

FLOW++ dequantization

Page 85: Lecture 11: Advanced Generative Modelsuvadlc.github.io/lectures/nov2019/lecture11.pdfLecture 11: Advanced Generative Models Efstratios Gavves UVA DEEP LEARNING COURSE –EFSTRATIOS

UVA DEEP LEARNING COURSE – EFSTRATIOS GAVVES ADVANCED GENERATIVE MODELS - 87

𝔼𝑥~𝑃𝑑𝑎𝑡𝑎 log 𝑃𝑚𝑜𝑑𝑒𝑙 𝑥 ≥ 𝔼𝑥~𝑃𝑑𝑎𝑡𝑎,𝜀~𝑝 𝜀 log𝑝𝑚𝑜𝑑𝑒𝑙 𝑥 + 𝑞𝑥 𝜀

𝜋 𝜀 ൗ𝜕𝑞𝑥𝜕𝜀

−1

oThe noise model 𝑞𝑥 𝜀 is implemented as a flow-based generative model

oAs 𝑝𝑚𝑜𝑑𝑒𝑙 is also flow-based, computing the Jacobians is possible

Variational Dequantization

Page 86: Lecture 11: Advanced Generative Modelsuvadlc.github.io/lectures/nov2019/lecture11.pdfLecture 11: Advanced Generative Models Efstratios Gavves UVA DEEP LEARNING COURSE –EFSTRATIOS

UVA DEEP LEARNING COURSE – EFSTRATIOS GAVVES ADVANCED GENERATIVE MODELS - 88

Some results

Page 87: Lecture 11: Advanced Generative Modelsuvadlc.github.io/lectures/nov2019/lecture11.pdfLecture 11: Advanced Generative Models Efstratios Gavves UVA DEEP LEARNING COURSE –EFSTRATIOS

UVA DEEP LEARNING COURSE – EFSTRATIOS GAVVES ADVANCED GENERATIVE MODELS - 89

Some visual examples

Page 88: Lecture 11: Advanced Generative Modelsuvadlc.github.io/lectures/nov2019/lecture11.pdfLecture 11: Advanced Generative Models Efstratios Gavves UVA DEEP LEARNING COURSE –EFSTRATIOS

UVA DEEP LEARNING COURSE – EFSTRATIOS GAVVES ADVANCED GENERATIVE MODELS - 90

oThree families of likelihood-based generative models◦Variational autoencoders

◦Autoregressive models

◦Flow-based models

oAutoregressive and flow-based models are exact-likelihood models◦They compute 𝑝(𝑥) directly, not an approximation or an estimation or a bound

A recap

Page 89: Lecture 11: Advanced Generative Modelsuvadlc.github.io/lectures/nov2019/lecture11.pdfLecture 11: Advanced Generative Models Efstratios Gavves UVA DEEP LEARNING COURSE –EFSTRATIOS

UVA DEEP LEARNING COURSE – EFSTRATIOS GAVVES ADVANCED GENERATIVE MODELS - 91

A summary of properties

J. Tomczak’s lecture from April, 2019

Page 90: Lecture 11: Advanced Generative Modelsuvadlc.github.io/lectures/nov2019/lecture11.pdfLecture 11: Advanced Generative Models Efstratios Gavves UVA DEEP LEARNING COURSE –EFSTRATIOS

UVA DEEP LEARNING COURSEEFSTRATIOS GAVVES

ADVANCED GENERATIVE MODELS - 92

Summary

o Exact likelihood models◦ Autoregressive Models

◦ Non-autoregressive flow-based models

o Autoregressive Models◦ NADE, MADE, PixelCNN, PixelCNN++, PixelRNN

o Normalizing Flows

o Non-autoregressive flow-based models◦ RealNVP

◦ Glow

◦ Flow++