Poster PDF: nsd.csail.mit.edu/talks/nsd_poster_cvpr.pdf
Neural Scene De-rendering
Jiajun Wu (1), Joshua B. Tenenbaum (1), Pushmeet Kohli (2,*)
(1) Massachusetts Institute of Technology, (2) DeepMind
* Work done when the author was with Microsoft Research
Scene De-rendering
Goal: a compact, interpretable scene representation
Applications
Motivation
• An object-based, disentangled representation has wide applications
• Representations learned by current deep nets are hard to interpret

Solution: looping in a forward graphics engine in recognition

Advantages
• Graphics engines bring in symbolic representation naturally
• Graphics engines generalize well to a variable number of objects
• The learned representation is rich, and has wide applications
Model
<objects>
<balloon: right>
<bench: yellow>
<tree: right>
<boy: stand happy>
<girl: sit sad>
</objects>
<objects>
<pig: left close>
<villager: left>
<tree: tall right>
</objects>
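[Figure: De-render maps an image to its scene description; Render maps the description back to an image. Shown for two example scenes.]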
[Figure: model overview. Panels: (a) Input image, (b) Segment proposals, (c) Inference, (d) Rendered image. Stages (I) to (III): proposing segments, interpreting proposals, rendering images.]

Applications
• Image editing
• Captioning: The boy is …
• Inpainting, analogy-making, …
(prediction loss) (reconstruction loss)
Scene XML
<object>
  <category>triangle</category>
  <size>1.5</size>
  <color>blue</color>
  <position>1.5,2,1</position>
  <yaw>0</yaw>
  …
</object>
<object>…
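As a rough illustration of how a scene XML like the one above could be consumed by downstream code, the sketch below parses it into a list of Python dictionaries. The XML string mirrors the poster's example with the elided fields left out. The `parse_scene` helper and the wrapping `<scene>` root element are assumptions made for this example; the poster does not describe a parser.

```python
import xml.etree.ElementTree as ET

# Hypothetical scene XML modeled on the poster's example. The <scene> root
# is an assumption: ElementTree requires a single root element.
SCENE_XML = """
<scene>
  <object>
    <category>triangle</category>
    <size>1.5</size>
    <color>blue</color>
    <position>1.5,2,1</position>
    <yaw>0</yaw>
  </object>
</scene>
"""

def parse_scene(xml_text):
    """Return one dict per <object>, converting numeric fields to floats."""
    root = ET.fromstring(xml_text.strip())
    objects = []
    for node in root.findall("object"):
        position = tuple(float(v) for v in node.findtext("position").split(","))
        objects.append({
            "category": node.findtext("category"),
            "size": float(node.findtext("size")),
            "color": node.findtext("color"),
            "position": position,
            "yaw": float(node.findtext("yaw")),
        })
    return objects

scene = parse_scene(SCENE_XML)
print(scene)
```

A structured form like this is what makes the representation easy to edit: changing `color` or `position` on one object and re-rendering is a dictionary update rather than a pixel operation.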
Inpainting: (a) Corrupted input, (b) Reconstruction, (c) Original image
Analogy-Making
[Figure: analogies of the form A : A′ :: B : B′, with columns Reference A, Reference A′, Query B, Prediction B′]
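One way to read the analogy-making task (references A and A′, query B, prediction B′) is as arithmetic on scene attributes: find what changed between A and A′, apply the same change to B, then render the result. The sketch below shows that idea on hand-written attribute dictionaries. The attribute names and the `make_analogy` helper are illustrative assumptions; the poster does not specify how the prediction B′ is computed.

```python
def make_analogy(a, a_prime, b):
    """Apply the attribute changes from a -> a_prime to a copy of b."""
    b_prime = dict(b)
    for key, new_value in a_prime.items():
        old_value = a.get(key)
        if new_value == old_value:
            continue  # attribute did not change between the references
        numeric = (isinstance(new_value, (int, float))
                   and isinstance(old_value, (int, float)))
        if numeric:
            # Numeric attribute: transfer the offset.
            b_prime[key] = b[key] + (new_value - old_value)
        else:
            # Categorical attribute: copy the new value.
            b_prime[key] = new_value
    return b_prime

# Hypothetical scene attributes for the three given images.
ref_a = {"category": "boy", "pose": "stand", "x": 1.0}
ref_a_prime = {"category": "boy", "pose": "sit", "x": 3.0}
query_b = {"category": "girl", "pose": "stand", "x": 0.5}

prediction = make_analogy(ref_a, ref_a_prime, query_b)
print(prediction)
```

Because the change is expressed on named attributes, the query keeps its own category while inheriting the pose change and the position offset.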
Graphics Engine
Inference & Reconstruction
If = 0 If = 1
[Figure panels: (a) Input image, (b) CNN+LSTM, (c) NSD, (d) NSD (full)]
Results
Training Steps
• Supervised pre-training with the prediction loss
• End-to-end fine-tuning with the reconstruction loss (with REINFORCE)
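The two training steps can be sketched on a toy problem. Everything below is a stand-in chosen for brevity, not the poster's model: the "graphics engine" is a function that lights one pixel in a 1-D image, the encoder is a single linear layer, and the scene has one discrete attribute (a position). The point is only to show why REINFORCE is needed: `render` takes a sampled integer, so no gradient flows through it, and the reconstruction error is used as a reward instead.

```python
import numpy as np

rng = np.random.default_rng(0)
K = 5  # number of discrete positions (toy assumption)

def render(position):
    """Toy non-differentiable 'graphics engine': one lit pixel in a 1-D image."""
    image = np.zeros(K)
    image[position] = 1.0
    return image

def softmax(logits):
    e = np.exp(logits - logits.max())
    return e / e.sum()

def pretrain_step(W, image, label, lr=0.5):
    """Supervised step on the prediction loss (cross-entropy on the label)."""
    grad_logits = softmax(W @ image)
    grad_logits[label] -= 1.0  # d(cross-entropy) / d(logits)
    return W - lr * np.outer(grad_logits, image)

def reinforce_step(W, image, baseline, lr=0.5):
    """Label-free step on the reconstruction loss, using REINFORCE."""
    probs = softmax(W @ image)
    sample = rng.choice(K, p=probs)
    # Reward is the negative reconstruction error of the rendered sample.
    reward = -float(np.sum((render(sample) - image) ** 2))
    score = -probs
    score[sample] += 1.0  # d(log p(sample)) / d(logits)
    W = W + lr * (reward - baseline) * np.outer(score, image)
    return W, reward

W = np.zeros((K, K))  # linear encoder: logits = W @ image

# Step 1: supervised pre-training, with labels for positions 0..2 only.
for _ in range(20):
    for label in range(3):
        W = pretrain_step(W, render(label), label)

# Step 2: fine-tuning on all images without labels; a running-mean
# baseline reduces the variance of the gradient estimate.
baseline = 0.0
for _ in range(300):
    for true_position in range(K):
        W, reward = reinforce_step(W, render(true_position), baseline)
        baseline = 0.9 * baseline + 0.1 * reward

accuracy = np.mean([np.argmax(W @ render(k)) == k for k in range(K)])
print(accuracy)
```

In this toy, positions 3 and 4 never receive a label; the encoder learns them only from the reward that comes back from rendering its own guesses.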
• Graphics engines as generalized decoders
• Visually distinctive images may have similar representations
• Solution: Optimizing in both spaces
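A small numeric sketch of what optimizing in both spaces can mean: two candidate scene descriptions that are equally far from the target in representation space may still differ in how well their rendered images match. The 1-D bar renderer, the `(start, length)` scene encoding, and the `weight` parameter are all assumptions made for this illustration, not details from the poster.

```python
import numpy as np

WIDTH = 16  # toy image width (assumption)

def render(scene):
    """Toy stand-in for a graphics engine: draw a 1-D bar."""
    start, length = scene
    image = np.zeros(WIDTH)
    image[start:start + length] = 1.0
    return image

def prediction_loss(pred, target):
    """Squared error in representation space."""
    return float(np.sum((np.array(pred) - np.array(target)) ** 2))

def reconstruction_loss(pred, target_image):
    """Squared error in image space, after rendering the prediction."""
    return float(np.sum((render(pred) - target_image) ** 2))

def combined_loss(pred, target, weight=1.0):
    """Sum of both losses; `weight` is a hypothetical trade-off factor."""
    return (prediction_loss(pred, target)
            + weight * reconstruction_loss(pred, render(target)))

target = (2, 8)   # bar starting at pixel 2, 8 pixels long
shifted = (3, 8)  # same length, shifted by one pixel
longer = (2, 9)   # same start, one pixel longer

for name, candidate in [("shifted", shifted), ("longer", longer)]:
    print(name,
          prediction_loss(candidate, target),
          reconstruction_loss(candidate, render(target)),
          combined_loss(candidate, target))
```

Both candidates have the same prediction loss, but the shifted bar disagrees with the target image at two pixels and the longer bar at one, so only the image-space term separates them.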
Caption Retrieval
Example caption: "jenny and mike are having fun in the sandbox unaware of the storm that's coming their way"