10417/10617 Intermediate Deep Learning: Fall 2019
Russ Salakhutdinov
Machine Learning Department, [email protected]
https://deeplearning-cmu-10417.github.io/

Variational Autoencoders

Motivating Example
• Can we generate images from natural language descriptions?
- A stop sign is flying in blue skies.
- A pale yellow school bus is flying in blue skies.
- A herd of elephants is flying in blue skies.
- A large commercial airplane is flying in blue skies.
(Mansimov, Parisotto, Ba, Salakhutdinov, 2015)

Overall Model
• Variational Autoencoder
[Figure: overall model with a stochastic layer]

Motivation
• Hinton, G.E., Dayan, P., Frey, B.J. and Neal, R., Science 1995
[Figure: input data v with hidden layers h1, h2, h3 and weights W1, W2, W3; the generative process runs top-down, approximate inference bottom-up]
• Kingma & Welling, 2014
• Rezende, Mohamed, Wierstra, 2014
• Mnih & Gregor, 2014
• Bornschein & Bengio, 2015
• Tang & Salakhutdinov, 2013

Variational Autoencoders (VAEs)
• The VAE defines a generative process in terms of ancestral sampling through a cascade of hidden stochastic layers:
[Figure: generative process from h3 through h2 and h1 down to the input data v, with weights W3, W2, W1]
• Each term may denote a complicated nonlinear relationship.
• Sampling and probability evaluation is tractable for each term.
• θ denotes the parameters of the VAE.
• L is the number of stochastic layers.

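The factorization shown on this slide is not legible in the transcript; in the standard notation of Kingma & Welling (2014), the ancestral-sampling process described above corresponds to:

```latex
p_\theta(x, h^1, \dots, h^L)
  = p_\theta(h^L)\, p_\theta(h^{L-1} \mid h^L) \cdots p_\theta(x \mid h^1),
\qquad
p_\theta(x) = \sum_{h^1, \dots, h^L} p_\theta(x, h^1, \dots, h^L).
```

Each conditional is simple to sample from and evaluate, even though the marginal p_θ(x) is intractable.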
VAE: Example
• The VAE defines a generative process in terms of ancestral sampling through a cascade of hidden stochastic layers: here, two stochastic layers with a deterministic layer in between.
• Each conditional term denotes a one-layer neural net.
• θ denotes the parameters of the VAE.
• Sampling and probability evaluation is tractable for each term.
• L is the number of stochastic layers.

Recognition Network
• The recognition model is defined in terms of an analogous factorization:
[Figure: approximate inference runs from the input data v up through h1, h2, h3, mirroring the generative process with weights W1, W2, W3]
• Each term may denote a complicated nonlinear relationship.
• We assume that the approximate posterior factorizes across layers, with each conditional predicted from the layer below.
• The conditionals of the recognition model are Gaussians with diagonal covariances.

Variational Bound
• The VAE is trained to maximize the variational lower bound:
log p(x) ≥ E_{q(h|x)}[ log p(x, h) − log q(h | x) ]
• This trades off the data log-likelihood against the KL divergence from the true posterior.
• The bound is hard to optimize with respect to the recognition network: naive gradient estimators have high variance.
• The key idea of Kingma and Welling is to use the reparameterization trick.

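The trade-off mentioned above follows from the standard decomposition of the marginal log-likelihood (a textbook identity; the slide's own rendering of it is not legible in the transcript):

```latex
\log p_\theta(x)
= \underbrace{\mathbb{E}_{q_\phi(h \mid x)}\!\left[\log p_\theta(x, h) - \log q_\phi(h \mid x)\right]}_{\text{variational lower bound}}
\;+\; \mathrm{KL}\!\left(q_\phi(h \mid x) \,\big\|\, p_\theta(h \mid x)\right).
```

Since the KL term is nonnegative, maximizing the bound pushes up log p_θ(x) while pulling q_φ(h|x) toward the true posterior.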
Reparameterization Trick
• Assume that the recognition distribution is Gaussian, with mean and covariance computed from the state of the hidden units at the previous layer.
• Alternatively, we can express this in terms of an auxiliary variable ε ~ N(0, I).

Reparameterization Trick
• Assume that the recognition distribution is Gaussian, or, equivalently, write the sample as a transformation of auxiliary noise ε.
• The recognition distribution can then be expressed in terms of a deterministic mapping (a deterministic encoder).
• The distribution of ε does not depend on the parameters of the recognition network.

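A minimal NumPy sketch of the trick for a diagonal Gaussian (the function and variable names are illustrative, not from the lecture): all randomness lives in ε, and the sample is a deterministic, differentiable function of the parameters.

```python
import numpy as np

rng = np.random.default_rng(0)

def reparameterize(mu, log_var, rng):
    # eps ~ N(0, I) carries all the randomness; the sample is then a
    # deterministic, differentiable function of (mu, log_var, eps).
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps

mu = np.zeros(4)
log_var = np.log(np.full(4, 0.25))   # i.e. sigma = 0.5 in every dimension
samples = np.stack([reparameterize(mu, log_var, rng) for _ in range(20_000)])
print(samples.mean(axis=0).round(2), samples.std(axis=0).round(2))
```

The empirical mean and standard deviation of the samples recover the requested μ = 0 and σ = 0.5, confirming the mapping has the right distribution.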
Computing the Gradients
• The gradient w.r.t. the parameters, both recognition and generative, can be moved inside the expectation over the auxiliary noise ε.
• Gradients can then be computed by backprop.
• The mapping h is a deterministic neural net for fixed ε.

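As a toy illustration of why backprop applies (my example, not from the slides): for h = μ + σε, the pathwise estimator of ∇_μ E[h²] is just the sample average of the gradient 2h, which converges to the exact value 2μ.

```python
import numpy as np

rng = np.random.default_rng(1)
mu, sigma = 1.5, 0.5
eps = rng.standard_normal(1_000_000)  # auxiliary noise, independent of parameters
h = mu + sigma * eps                  # reparameterized sample
grad_mu = np.mean(2 * h)              # pathwise estimate of d/dmu E[h^2] = 2*mu
print(round(float(grad_mu), 2))       # ≈ 3.0
```

Because the estimator differentiates through the sample itself rather than through log-probabilities, its variance is typically far lower than that of score-function (REINFORCE-style) estimators.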
Computing the Gradients
• The gradient w.r.t. the parameters, recognition and generative:
• Approximate the expectation by generating k samples from the recognition network, where we define unnormalized importance weights w_i = p(x, h_i) / q(h_i | x).
• VAE update: low variance, as it uses the log-likelihood gradients with respect to the latent variables.

VAE: Assumptions
• Remember the variational bound:
• The variational assumptions must be approximately satisfied: the posterior distribution must be approximately factorial (common practice) and predictable with a feed-forward net.
• We show that we can relax these assumptions using a tighter lower bound on the marginal log-likelihood.

Importance Weighted Autoencoders
• Consider the following k-sample importance weighting of the log-likelihood:
L_k(x) = E_{h_1,…,h_k ~ q(h|x)}[ log (1/k) Σ_{i=1}^k w_i ],
where h_1, …, h_k are sampled from the recognition network and w_i = p(x, h_i) / q(h_i | x) are the unnormalized importance weights.

Importance Weighted Autoencoders
• Consider the following k-sample importance weighting of the log-likelihood:
• This is a lower bound on the marginal log-likelihood: log p(x) ≥ L_k(x).
• Special case of k = 1: same as the standard VAE objective.
• Using more samples improves the tightness of the bound.

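A small Monte Carlo check of the tightening claim, on a toy model I chose for illustration (not from the lecture): prior p(h) = N(0, 1), likelihood p(x|h) = N(x; h, 1), and a deliberately crude proposal q(h|x) = p(h), so that log w = log p(x|h).

```python
import numpy as np

rng = np.random.default_rng(2)

x = 2.0  # observed data point; true log p(x) = log N(2; 0, 2) ≈ -2.27

def iwae_bound(k, n=20_000):
    # n independent k-sample estimates of L_k = E[ log (1/k) sum_i w_i ]
    h = rng.standard_normal((n, k))
    log_w = -0.5 * (x - h) ** 2 - 0.5 * np.log(2 * np.pi)  # log p(x|h)
    m = log_w.max(axis=1, keepdims=True)                   # stable log-sum-exp
    return float(np.mean(m[:, 0] + np.log(np.exp(log_w - m).mean(axis=1))))

l1, l5, l50 = iwae_bound(1), iwae_bound(5), iwae_bound(50)
print(round(l1, 2), round(l5, 2), round(l50, 2))  # increasing toward ≈ -2.27
```

Even with this poor proposal, the k = 50 bound comes within a few hundredths of a nat of the true log-likelihood, while the k = 1 (VAE) bound is off by more than a nat.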
Tighter Lower Bound
• For all k, the lower bounds satisfy: log p(x) ≥ L_{k+1}(x) ≥ L_k(x).
• Using more samples can only improve the tightness of the bound.
• Moreover, if p(x, h) / q(h | x) is bounded, then L_k(x) → log p(x) as k → ∞.

Computing the Gradients
• We can obtain an unbiased estimate of the gradient using the reparameterization trick:
where we define normalized importance weights: w̃_i = w_i / Σ_{j=1}^k w_j.

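In practice the normalized weights are computed from log-weights, since the unnormalized w_i can under- or overflow; a minimal sketch (the function name is mine):

```python
import numpy as np

def normalized_weights(log_w):
    # w_i / sum_j w_j computed stably: subtracting the max log-weight
    # cancels in the ratio but keeps exp() in a safe range.
    lw = log_w - log_w.max()
    w = np.exp(lw)
    return w / w.sum()

# Huge log-weights that would overflow a naive exp():
w = normalized_weights(np.array([1000.0, 1001.0, 1002.0]))
print(w.round(3))  # softmax of the log-weights
```

The result is simply the softmax of the log-weights; naive `np.exp([1000, 1001, 1002])` would return infinities.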
IWAEs vs. VAEs
• Draw k samples from the recognition network, or k sets of auxiliary variables.
• Obtain the following Monte Carlo estimate of the gradient:
• Compare this to the VAE's estimate of the gradient:

IWAE: Intuition
• The gradient of the log weights decomposes. First term:
- Decoder: encourages the generative model to assign high probability to each h.
- Encoder: encourages the recognition net to adjust its latent states h so that the generative network makes better predictions.
[Figure: deterministic encoder and deterministic decoder over input data v and layers h1, h2, h3 with weights W1, W2, W3]

IWAE: Intuition
• The gradient of the log weights decomposes. Second term:
- Encoder: encourages the recognition network to have a spread-out distribution over predictions.

Two Architectures
• For the MNIST experiments, we considered two architectures:
- 1 stochastic layer: 784 → 200 → 200 (deterministic layers) → 50 (stochastic layer).
- 2 stochastic layers: 784 → 200 → 200 (deterministic) → 100 (stochastic) → 100 → 100 (deterministic) → 50 (stochastic).

MNIST Results

Latent Space Representation
• Both VAEs and IWAEs tend to learn latent representations with effective dimensions far below their capacity.
• Measure the activity of a latent dimension u using the statistic A_u = Cov_x( E_{u ~ q(u|x)}[u] ).
• The distribution of A_u consists of two separated modes.
• Inactive dimensions correspond to units dying out.
• Is this an optimization issue?

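A sketch of how A_u separates active from dead units, on synthetic posterior means of my own construction (the 10⁻² threshold is the one I recall from Burda et al., 2015; treat it as an assumption):

```python
import numpy as np

rng = np.random.default_rng(3)

# Synthetic stand-in for posterior means E_{u~q(u|x)}[u] over 1000 inputs and
# 8 latent dimensions; the last 4 dimensions are made nearly constant ("dead").
post_means = rng.standard_normal((1000, 8))
post_means[:, 4:] *= 0.005

activity = post_means.var(axis=0)   # A_u = Cov_x( E_{u~q(u|x)}[u] ), per dimension
active = activity > 1e-2            # threshold separating the two modes of A_u
print(int(active.sum()))            # 4 active dimensions
```

On real models the histogram of log A_u shows the same bimodal structure: units either track the input or collapse to the prior.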
IWAEs vs. VAEs

OMNIGLOT Experiments

Modeling Image Patches: BSDS Dataset
• Model 8x8 patches with one stochastic layer: 64 → 500 (deterministic layer) → 40 (stochastic layer).
• Report test log-likelihoods on 10^6 8x8 patches extracted from the BSDS test dataset.
• Evaluation protocol established by Uria, Murray and Larochelle:
- add uniform noise between 0 and 1 and divide by 256,
- subtract the mean and discard the last pixel.

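The two preprocessing steps above can be sketched as follows (function name and array shapes are mine; the per-patch mean subtraction is my reading of the protocol, after which the last pixel is redundant because each patch sums to zero):

```python
import numpy as np

rng = np.random.default_rng(4)

def preprocess(patches, rng):
    """patches: (n, 64) array of raw 8x8 pixel intensities in [0, 255]."""
    x = (patches + rng.uniform(0.0, 1.0, patches.shape)) / 256.0  # dequantize
    x = x - x.mean(axis=1, keepdims=True)   # subtract each patch's mean
    return x[:, :-1]                        # discard the (now redundant) last pixel

raw = rng.integers(0, 256, size=(10, 64)).astype(float)
out = preprocess(raw, rng)
print(out.shape)   # (10, 63)
```

Adding uniform noise before dividing makes the discrete pixel values continuous, so that continuous-density log-likelihoods are comparable across models.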
Test Log-probabilities

| Model | nats | bits/pixel |
|---|---|---|
| RNADE, 6 hidden layers (Uria et al. 2013) | 155.2 | 3.55 |
| MoG, 200 full-covariance mixtures (Zoran and Weiss, 2012) | 152.8 | 3.50 |
| IWAE (k=500) | 151.4 | 3.47 |
| VAE (k=500) | 148.0 | 3.39 |
| GSM (Gaussian Scale Mixture) | 142 | 3.25 |
| ICA | 111 | 2.54 |
| PCA | 96 | 2.21 |

(Burda et al., 2015)

Learned Filters
(Burda et al., 2015)

Motivating Example
• Can we generate images from natural language descriptions?
- A stop sign is flying in blue skies.
- A pale yellow school bus is flying in blue skies.
- A herd of elephants is flying in blue skies.
- A large commercial airplane is flying in blue skies.
(Mansimov, Parisotto, Ba, Salakhutdinov, 2015)

Overall Model
• Variational Autoencoder
[Figure: overall model with a stochastic layer]

Sequence-to-Sequence
• Sequence-to-sequence framework (Sutskever et al. 2014; Cho et al. 2014; Srivastava et al. 2015).
• Caption (y) is represented as a sequence of consecutive words.
• Image (x) is represented as a sequence of patches drawn on a canvas.
• Attention mechanism over:
- Words: which words to focus on when generating a patch.
- Image locations: where to place the generated patches on the canvas.

Representing Captions: Bidirectional RNN
• Forward RNN reads the sentence y from left to right:
• Backward RNN reads the sentence y from right to left:
• The hidden states of the two directions are then concatenated.

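A minimal sketch of the three steps (plain tanh RNNs instead of the LSTMs used in the paper, and illustrative names): run one pass left to right, one right to left, re-align the backward states, and concatenate per word.

```python
import numpy as np

rng = np.random.default_rng(5)

def rnn(xs, W, U, h0):
    # Simple tanh RNN; returns the hidden state after each input.
    hs, h = [], h0
    for x in xs:
        h = np.tanh(W @ x + U @ h)
        hs.append(h)
    return hs

T, d, k = 6, 10, 8                      # sentence length, word dim, hidden dim
xs = [rng.standard_normal(d) for _ in range(T)]
Wf, Uf = rng.standard_normal((k, d)), rng.standard_normal((k, k))
Wb, Ub = rng.standard_normal((k, d)), rng.standard_normal((k, k))

h_fwd = rnn(xs, Wf, Uf, np.zeros(k))              # reads y left to right
h_bwd = rnn(xs[::-1], Wb, Ub, np.zeros(k))[::-1]  # reads right to left, re-aligned
h = [np.concatenate([f, b]) for f, b in zip(h_fwd, h_bwd)]
print(len(h), h[0].shape)   # one 2k-dimensional state per word
```

Each concatenated state summarizes the sentence both before and after its word, which is what the alignment model later attends over.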
DRAW Model
• At each step, the model generates a p×p patch via the write operator:
• The patch gets transformed onto the w×h canvas using two arrays of Gaussian filter banks, whose filter locations and scales are computed from the state of the generative network.
(Gregor et al. 2015)

Overall Model
• Generative Model: a stochastic recurrent network, a chained sequence of Variational Autoencoders with a single stochastic layer.
• The caption is encoded by a bidirectional LSTM.
(Gregor et al. 2015; Mansimov, Parisotto, Ba, Salakhutdinov, 2015)

Overall Model
• Generative Model: a stochastic recurrent network, a chained sequence of Variational Autoencoders with a single stochastic layer.
• Recognition Model: a deterministic recurrent network.
• The caption is encoded by a bidirectional LSTM.
(Gregor et al. 2015; Mansimov, Parisotto, Ba, Salakhutdinov, 2015)

Overall Model
• Attention (alignment): focus on different words at different time steps when generating patches and placing them on the canvas.
• Sentence representation: a dynamically weighted average of the hidden states representing the words.
(Bahdanau et al. 2015)

Generating Images
• The image is represented as a sequence of patches (t = 1, …, T) drawn on the canvas:

Generating Images
• The image is represented as a sequence of patches (t = 1, …, T) drawn on the canvas:

Generating Images
• The image is represented as a sequence of patches (t = 1, …, T) drawn on the canvas:
• In practice, we use the conditional mean.

Alignment Model
• Dynamic sentence representation at time t: a weighted average of the bidirectional hidden states,
where the alignment probabilities are computed by a softmax over alignment scores.

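A Bahdanau-style sketch of the alignment step (names and dimensions are illustrative; the exact score parameterization in the lecture is not legible in the transcript): score each word state against the generator's previous state, softmax the scores, and average the word states.

```python
import numpy as np

rng = np.random.default_rng(6)

def align(h_words, prev_state, v, W, U):
    # Scores e_{t,j} = v^T tanh(W h_j + U s_{t-1}); alphas via softmax.
    e = np.array([v @ np.tanh(W @ h + U @ prev_state) for h in h_words])
    alphas = np.exp(e - e.max())
    alphas /= alphas.sum()
    # Dynamic sentence representation: weighted average of word states.
    s = (alphas[:, None] * np.stack(h_words)).sum(axis=0)
    return alphas, s

n_words, d, k = 5, 16, 12
h_words = [rng.standard_normal(d) for _ in range(n_words)]
alphas, sent = align(h_words, rng.standard_normal(k),
                     rng.standard_normal(k), rng.standard_normal((k, d)),
                     rng.standard_normal((k, k)))
print(round(float(alphas.sum()), 6), sent.shape)
```

The alignment probabilities change with the generator state, so the model can attend to different words at each drawing step.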
Learning
• Maximize the variational lower bound on the marginal log-likelihood of the correct image x given the caption y:

Sharpening
• Additional post-processing step: use an adversarial network trained on residuals of a Laplacian pyramid to sharpen the generated images (Denton et al. 2015).

MS COCO Dataset
• Contains 83K images.
• Each image comes with 5 captions.
• Standard benchmark dataset for many of the recent image captioning systems.
(Lin et al. 2014)

Flipping Colors
• A yellow school bus parked in the parking lot.
• A red school bus parked in the parking lot.
• A green school bus parked in the parking lot.
• A blue school bus parked in the parking lot.

Flipping Backgrounds
• A very large commercial plane flying in clear skies.
• A very large commercial plane flying in rainy skies.
• A herd of elephants walking across a dry grass field.
• A herd of elephants walking across a green grass field.

Flipping Objects
• The decadent chocolate dessert is on the table.
• A bowl of bananas is on the table.
• A vintage photo of a cat.
• A vintage photo of a dog.

Qualitative Comparison
• Caption: "A group of people walk on a beach with surfboards."
• Models compared: Our Model, LAPGAN (Denton et al. 2015), Fully Connected VAE, Conv-Deconv VAE.

Variational Lower-Bound
• We can estimate the variational lower-bound on the average test log-probabilities:

| Model | Training | Test |
|---|---|---|
| Our Model | -1792.15 | -1791.53 |
| Skipthought-Draw | -1794.29 | -1791.37 |
| noAlignDraw | -1792.14 | -1791.15 |

• At least we can see that we do not overfit to the training data, unlike many other approaches.

Novel Scene Compositions
• A toilet seat sits open in the bathroom.
• A toilet seat sits open in the grass field.
• Ask Google?