Lecture 20: Deep Generative Models
CS109B, STAT121B, AC209B, CSE109B
Mark Glickman and Pavlos Protopapas
This lecture is based on the following tutorials: I. Goodfellow, Generative Adversarial Networks (GANs), NIPS 2016 Tutorial; Shenlong Wang, Deep Generative Models, http://www.cs.toronto.edu/~slwang/generative_model.pdf
Generative Model
• Density estimation
• Sample generation
Why generative modeling?
• Modeling high-dimensional probability distributions
• Denoising damaged samples
• Simulating future events for planning and RL
• Imputing missing data
  – Semi-supervised learning
• Multi-modal data
  – Multiple correct answers
Next Video Frame Prediction
Lotter et al., 2016
Single Image Super-resolution
• Synthesize a high-resolution equivalent from a low-resolution image
Ledig et al., 2016
Interactive Image Generation
Zhu et al., 2016
Image-to-Image Translation
Isola et al., 2016
Taxonomy of Deep Generative Models
Deep Generative Models
• Explicit density
  – Markov chain: Boltzmann Machines, Deep Belief Networks
  – Variational: Variational Auto-encoders (VAE)
• Implicit density
  – Markov chain
  – Direct: Generative Adversarial Networks (GAN), Generative Moment Matching Networks (GMMN)
Outline
• Explicit density models
  – Boltzmann Machines & Deep Belief Networks (DBN)
  – Variational Auto-encoders (VAE)
• Implicit density models
  – Generative Adversarial Networks (GAN)
Boltzmann Machines
x: d-dimensional binary vector. Energy function given by:
$E(x) = -x^\top U x - b^\top x$
$P(x) = \frac{\exp(-E(x))}{Z}, \qquad Z = \sum_x \exp(-E(x))$
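Since $Z$ sums over all $2^d$ binary configurations, the density is only computable exactly for tiny $d$. A minimal brute-force NumPy sketch (all names here are illustrative, not from the lecture):

```python
import itertools
import numpy as np

def energy(x, U, b):
    """E(x) = -x^T U x - b^T x for a binary vector x."""
    return -x @ U @ x - b @ x

def boltzmann_pmf(U, b):
    """Exact P(x) over all 2^d binary states (feasible only for small d)."""
    d = len(b)
    states = np.array(list(itertools.product([0, 1], repeat=d)))
    unnorm = np.exp([-energy(x, U, b) for x in states])
    Z = unnorm.sum()              # partition function: sum over all states
    return states, unnorm / Z

# Example with d = 3 units and random symmetric weights.
rng = np.random.default_rng(0)
U = rng.normal(size=(3, 3)); U = (U + U.T) / 2
b = rng.normal(size=3)
states, probs = boltzmann_pmf(U, b)
print(probs.sum())                # -> 1.0
```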
Visible and Hidden Units
v: visible units, h: hidden units (binary vectors). Energy function given by:
$E(v, h) = -v^\top R v - v^\top W h - h^\top S h - b^\top v - c^\top h$
$P(v, h) = \frac{\exp(-E(v, h))}{Z}, \qquad Z = \sum_{v,h} \exp(-E(v, h))$
Training Boltzmann Machines
• Maximum-likelihood estimation
• Partition function is intractable
• May be estimated with MCMC methods
Restricted Boltzmann Machines
• No connections among hidden units or among visible units (bipartite graph)
Restricted Boltzmann Machines
v, h: binary vectors. Energy function given by:
$E(v, h) = -b^\top v - c^\top h - v^\top W h$
$P(v, h) = \frac{\exp(-E(v, h))}{Z}, \qquad Z = \sum_{v,h} \exp(-E(v, h))$
• Conditional distributions have a simple form:
$P(h_j = 1 \mid v) = \sigma\left(c_j + (W^\top v)_j\right), \qquad P(v_j = 1 \mid h) = \sigma\left(b_j + (W h)_j\right)$
where σ is the sigmoid function.
• Efficient MCMC: block Gibbs sampling – alternately sample from P(h|v) and P(v|h), as sketched below
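A minimal NumPy sketch of these conditionals and the block Gibbs loop (all function and variable names are illustrative):

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def sample_h_given_v(v, W, c, rng):
    """P(h_j = 1 | v) = sigmoid(c_j + (W^T v)_j); all h_j sampled in one block."""
    p = sigmoid(c + v @ W)
    return (rng.random(p.shape) < p).astype(float), p

def sample_v_given_h(h, W, b, rng):
    """P(v_j = 1 | h) = sigmoid(b_j + (W h)_j); all v_j sampled in one block."""
    p = sigmoid(b + h @ W.T)
    return (rng.random(p.shape) < p).astype(float), p

def block_gibbs(v0, W, b, c, n_steps, rng):
    """Alternately sample h ~ P(h|v) and v ~ P(v|h)."""
    v = v0
    for _ in range(n_steps):
        h, _ = sample_h_given_v(v, W, c, rng)
        v, _ = sample_v_given_h(h, W, b, rng)
    return v
```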
Training RBMs
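The standard approximate maximum-likelihood trainer for RBMs is contrastive divergence; the lecture does not spell it out here, so the following CD-1 step is a hedged sketch that reuses the samplers above:

```python
import numpy as np

def cd1_update(v_data, W, b, c, lr, rng):
    """One contrastive-divergence (CD-1) update (an assumption; the slides
    only say the likelihood gradient can be estimated with MCMC)."""
    # Positive phase: hidden probabilities under the data.
    _, ph_data = sample_h_given_v(v_data, W, c, rng)
    # Negative phase: one block-Gibbs step away from the data.
    h_sample, _ = sample_h_given_v(v_data, W, c, rng)
    v_model, _ = sample_v_given_h(h_sample, W, b, rng)
    _, ph_model = sample_h_given_v(v_model, W, c, rng)
    # Gradient estimate: <v h^T>_data - <v h^T>_model.
    W += lr * (np.outer(v_data, ph_data) - np.outer(v_model, ph_model))
    b += lr * (v_data - v_model)
    c += lr * (ph_data - ph_model)
    return W, b, c
```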
Deep Belief Networks
• Multiple hidden layers
• Undirected connections between the last two layers
• The introduction of DBNs in 2006 began the current deep learning renaissance
Deep Belief Networks
(Diagram: a stack of layers v, h(1), h(2), …, h(K−1), h(K).)
$P(v_i = 1 \mid h^{(1)}) = \sigma\left(b_i^{(1)} + w_i^{(1)} \cdot h^{(1)}\right)$
$P(h_i^{(1)} = 1 \mid h^{(2)}) = \sigma\left(b_i^{(2)} + w_i^{(2)} \cdot h^{(2)}\right)$
$\vdots$
$P(h^{(K-1)}, h^{(K)}) = \frac{1}{Z} \exp\left(-E(h^{(K-1)}, h^{(K)})\right)$
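Given these conditionals, sampling from a DBN amounts to running Gibbs sampling in the top RBM and then sampling downward layer by layer. A minimal sketch of the downward pass, reusing `sigmoid` and `block_gibbs` from the RBM sketch (the top-to-bottom weight layout is an assumption):

```python
def sample_dbn_down(h_top, Ws, bs, rng):
    """Ancestral sampling down a DBN. h_top is a sample from the top RBM
    (e.g. obtained via block_gibbs above); Ws/bs hold the directed weights
    and biases ordered top-to-bottom (a layout assumption for this sketch)."""
    h = h_top
    for W, b in zip(Ws, bs):
        p = sigmoid(b + W @ h)                        # P(unit = 1 | layer above)
        h = (rng.random(p.shape) < p).astype(float)   # sample the layer below
    return h                                          # bottom layer: visible v
```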
Layer-wise DBN Training
(Diagram: the same layer stack v, h(1), h(2), …, h(K), trained greedily from the bottom up.)
Train an RBM on the data to maximize
$\mathbb{E}_{v \sim p_{\text{data}}} \log p_\theta(v)$
Then train the next RBM to maximize
$\mathbb{E}_{v \sim p_{\text{data}}}\, \mathbb{E}_{h^{(1)} \sim p(h^{(1)} \mid v)} \log p_{\theta_1}(h^{(1)})$
and so on, up to the top RBM, trained to maximize
$\mathbb{E}_{v \sim p_{\text{data}}} \cdots \mathbb{E}_{h^{(K)} \sim p(h^{(K)} \mid h^{(K-1)})} \log p_{\theta_K}(h^{(K)})$
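Collecting the K steps, a schematic of the greedy layer-wise loop; `rbm_fit` is a hypothetical trainer (for example, the CD-1 sketch above applied over a dataset), and `sample_h_given_v` is reused from the RBM sketch:

```python
import numpy as np

def train_dbn_layerwise(data, layer_sizes, rng):
    """Greedy layer-wise DBN training: fit an RBM to the current
    representation, then push every example up through P(h|v) and repeat."""
    reps = data                                  # current training representation
    rbms = []
    for n_hidden in layer_sizes:
        W, b, c = rbm_fit(reps, n_hidden, rng)   # hypothetical: maximize E[log p(reps)]
        rbms.append((W, b, c))
        # Replace each example by a sample h ~ P(h|v) for the next layer.
        reps = np.stack([sample_h_given_v(v, W, c, rng)[0] for v in reps])
    return rbms
```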
Outline
• Explicit density models
  – Boltzmann Machines & Deep Belief Networks (DBN)
  – Variational Auto-encoders (VAE)
• Implicit density models
  – Generative Adversarial Networks (GAN)
Recap: Autoencoder
(Diagram: input x → Encoder → code z → Decoder → reconstruction.)
Variational Autoencoder
• Easier to train using gradient-based methods
• VAE sampling: draw z from P(z), a standard Gaussian, and pass it through the decoder P(x|z)
(Diagram: sample z from P(z) → Decoder P(x|z) → x.)
VAE Likelihood
$p_\theta(x) = \int_z p_\theta(x \mid z)\, p_\theta(z)\, dz$
• Difficult to approximate in high dimensions through sampling: for most z values, p(x|z) is close to 0
(Diagram: z ~ N(0, I) is fed through a neural network with parameters θ to produce x.)
VAE Likelihood
$p_\theta(x) = \mathbb{E}_{z \sim q_\phi(z \mid x)}\left[ p_\theta(x \mid z)\, \frac{p_\theta(z)}{q_\phi(z \mid x)} \right]$
• $q_\phi(z \mid x)$ is a proposal distribution: likely to produce values of z for which p(x|z) is non-zero
(Diagram: x is fed through a second neural network, with parameters φ, to produce z.)
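Read as an importance-sampling identity, this suggests a simple Monte Carlo estimator of the likelihood. A sketch; the five callables are illustrative stand-ins for the two networks and the prior:

```python
import numpy as np

def log_px_estimate(x, n, sample_q, log_p_x_given_z, log_p_z, log_q_z_given_x):
    """Estimate log p(x) with z ~ q(z|x) and importance weights
    p(x|z) p(z) / q(z|x)."""
    log_w = []
    for _ in range(n):
        z = sample_q(x)
        log_w.append(log_p_x_given_z(x, z) + log_p_z(z) - log_q_z_given_x(z, x))
    log_w = np.array(log_w)
    m = log_w.max()
    return m + np.log(np.exp(log_w - m).mean())   # log-mean-exp for stability
```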
VAE Architecture
• Encoder $q_\phi(z \mid x)$: maps x to a mean μ and SD σ
• Sample z from N(μ, σ)
• Decoder $p_\theta(x \mid z)$: maps z back to x (see the sketch below)
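A minimal PyTorch sketch of this encoder/sample/decoder pipeline (the framework, layer sizes, and names are all illustrative, e.g. 784-dimensional inputs for MNIST-like data):

```python
import torch
import torch.nn as nn

class VAE(nn.Module):
    def __init__(self, x_dim=784, h_dim=256, z_dim=20):
        super().__init__()
        # Encoder q_phi(z|x): maps x to the mean and (log) SD of z.
        self.enc = nn.Sequential(nn.Linear(x_dim, h_dim), nn.ReLU())
        self.mu = nn.Linear(h_dim, z_dim)
        self.log_sigma = nn.Linear(h_dim, z_dim)
        # Decoder p_theta(x|z): maps z back to pixel probabilities.
        self.dec = nn.Sequential(
            nn.Linear(z_dim, h_dim), nn.ReLU(),
            nn.Linear(h_dim, x_dim), nn.Sigmoid())

    def forward(self, x):
        h = self.enc(x)
        mu, log_sigma = self.mu(h), self.log_sigma(h)
        # Sample z ~ N(mu, sigma) via the re-parameterization trick (below).
        z = mu + log_sigma.exp() * torch.randn_like(mu)
        return self.dec(z), mu, log_sigma
```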
VAE Loss
$\mathcal{L} = -\mathbb{E}_{z \sim q_\phi(z \mid x)}\left[\log p_\theta(x \mid z)\right] + \mathrm{KL}\left(q_\phi(z \mid x) \,\|\, p_\theta(z)\right)$
• First term: reconstruction loss
• Second term: the proposal distribution should resemble a Gaussian (the prior $p_\theta(z)$)
$\mathcal{L} \ge -\log p_\theta(x)$
• Variational upper bound on the loss we care about!
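For a diagonal-Gaussian proposal and a standard-Gaussian prior, the KL term has a closed form, so the loss for the sketch above can be written as follows (Bernoulli reconstruction is an assumption matching the sigmoid decoder):

```python
import torch.nn.functional as F

def vae_loss(x, x_recon, mu, log_sigma):
    # Reconstruction loss: -E_q[log p(x|z)] with a Bernoulli decoder.
    recon = F.binary_cross_entropy(x_recon, x, reduction="sum")
    # KL(q(z|x) || N(0, I)), closed form for diagonal Gaussians:
    # 0.5 * sum(mu^2 + sigma^2 - log sigma^2 - 1)
    kl = 0.5 * (mu.pow(2) + (2 * log_sigma).exp() - 2 * log_sigma - 1).sum()
    return recon + kl
```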
Training VAE
• Apply stochastic gradient descent
• The sampling step is not differentiable
• Use the re-parameterization trick: move sampling to an input layer, so that the sampling step is independent of the model (see the snippet below)
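A minimal illustration of the trick, with stand-in values for the encoder outputs:

```python
import torch

# Illustrative stand-ins for the encoder outputs of q_phi(z|x).
mu = torch.zeros(20, requires_grad=True)
sigma = torch.ones(20, requires_grad=True)

# Sampling z ~ N(mu, sigma) directly is not differentiable w.r.t. mu, sigma.
# Re-parameterized: the randomness enters as an input eps, and z becomes a
# deterministic, differentiable function of mu and sigma.
eps = torch.randn_like(mu)
z = mu + sigma * eps

z.sum().backward()
print(mu.grad is not None, sigma.grad is not None)  # -> True True
```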
Boltzmann Machines vs. VAE
Pros:
• VAE is easier to train
• Does not require MCMC sampling
• Has theoretical backing
• Applicable to a wider range of models
Cons:
• Samples from VAE tend to be blurry
Kingma and Welling, 2014
Outline
• Explicit density models
  – Boltzmann Machines & Deep Belief Networks (DBN)
  – Variational Auto-encoders (VAE)
• Implicit density models
  – Generative Adversarial Networks (GAN)
Generative Adversarial Networks
• No Markov chains needed
• No variational approximations needed
• Often regarded as producing better examples
Two-Player Game
Generator G
• Seeks to create samples from p_data
• Seeks to trick the discriminator into believing that its samples are genuine
Discriminator D
• Classifies samples as real or fake
• Seeks to not get tricked by the generator
GAN Overview
(Diagram: a code z drawn from a prior distribution feeds the generator G; the discriminator D receives both generated samples and training samples and outputs "real" or "fake".)
Min-max Cost
$V(\theta_D, \theta_G) = \mathbb{E}_{x \sim p_{\text{data}}} \log D(x) + \mathbb{E}_z \log\left(1 - D(G(z))\right)$
• The discriminator seeks to predict 1 on training samples and 0 on samples generated by G
• The generator wants D to not distinguish between original and generated samples!
$J^{(G)} = V(\theta^{(D)}, \theta^{(G)}), \qquad J^{(D)} = -V(\theta^{(D)}, \theta^{(G)})$
Min-max Cost
$\min_{\theta_G} \max_{\theta_D} J(\theta_G, \theta_D)$
• Equilibrium is a saddle point
• Training is difficult in practice
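A sketch of both players' costs in PyTorch, reading them directly off V. Here `D` and `G` are assumed to be `nn.Module`s, with D outputting the probability that its input is real (all names are illustrative):

```python
import torch

def d_loss(D, G, x_real, z, eps=1e-8):
    """Discriminator cost J^(D) = -V: push D(x) toward 1 on training
    samples and D(G(z)) toward 0 on generated samples."""
    d_real = D(x_real)
    d_fake = D(G(z).detach())   # detach: this step updates D only, not G
    return -(torch.log(d_real + eps).mean()
             + torch.log(1.0 - d_fake + eps).mean())

def g_loss(D, G, z, eps=1e-8):
    """Generator cost J^(G) = V: minimize log(1 - D(G(z))), i.e. make the
    discriminator call generated samples real. (In practice the
    'non-saturating' variant -log D(G(z)) is often used instead.)"""
    return torch.log(1.0 - D(G(z)) + eps).mean()
```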
Training GANs
• Sample a mini-batch of training images x, and generator codes z
• Update G using back-prop
• Update D using back-prop
• Optional: run k steps of one player for every step of the other player (see the loop sketched below)
Radford et al., 2015
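A schematic of the alternating loop these bullets describe, reusing the `d_loss`/`g_loss` sketches above (`D`, `G`, `data_loader`, `z_dim`, and the Adam settings are illustrative assumptions):

```python
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)

for x_real in data_loader:                  # mini-batch of training images x
    z = torch.randn(x_real.size(0), z_dim)  # generator codes z from the prior
    # Update D using back-prop (optionally k steps per G step).
    opt_d.zero_grad()
    d_loss(D, G, x_real, z).backward()
    opt_d.step()
    # Update G using back-prop.
    opt_g.zero_grad()
    g_loss(D, G, z).backward()
    opt_g.step()
```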