
Deep Generative Models with Stick-Breaking Priors

Eric Nalisnick, University of California, Irvine

Outline

1. Overview of Deep Generative Models
   - Definition
   - Applications
   - Training

2. The Dirichlet Process
   - Definition
   - Stochastic Gradient Variational Bayes for the DP

3. Deep Generative Models with Stick-Breaking Priors
   - Stick-Breaking Variational Autoencoders
   - Dirichlet Process Variational Autoencoders
   - Stick-Breaking for Adaptive Computation

Overview of Deep Generative Models

“Generative modeling…denotes modeling tasks in which a density over all the observable quantities is constructed.”

David MacKay, Information Theory, Inference, and Learning Algorithms

Discriminative modeling: p(y | x).  Generative modeling: p(x, y) or p(x).

Deep Generative Models

Density Network (MacKay & Gibbs, 1997)

[Graphical model with plate notation: a stochastic latent variable Z (dimension d) is passed through one or more neural network layers to generate the observed data X, repeated over N observations; θ and λ denote the model's parameters.]
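As a concrete toy illustration of this structure, here is a minimal NumPy sketch of a density-network-style generative model: a latent draw z from a standard normal prior is pushed through a small random-weight network to parametrize a Bernoulli likelihood over x. The dimensions, the two-layer architecture, and the Bernoulli likelihood are my own illustrative assumptions, not details from the talk.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions (not from the talk): d-dimensional latent, D-dimensional data.
d, D, H = 2, 784, 128

# Random decoder weights stand in for the "one or more neural network layers".
W1, b1 = rng.normal(scale=0.1, size=(H, d)), np.zeros(H)
W2, b2 = rng.normal(scale=0.1, size=(D, H)), np.zeros(D)

def decoder(z):
    """Map a latent z to Bernoulli means for the observation x."""
    h = np.tanh(W1 @ z + b1)
    return 1.0 / (1.0 + np.exp(-(W2 @ h + b2)))   # sigmoid -> pixel probabilities

# Ancestral sampling: z ~ p(z), then x ~ p(x | z; theta).
z = rng.standard_normal(d)                 # prior p(z) = N(0, I)
x_mean = decoder(z)                        # parameters of the likelihood
x = rng.binomial(1, x_mean)                # a sampled (binarized) observation
```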

Sampling from a Deep Generative Model

[Draw ẑ from the prior and pass it through the network to produce x̂. Shown: samples from a DGM trained on ImageNet, and text samples from a DGM (Bowman et al., 2016).]

Overview of Deep Generative Models

1.1 Why Generative Modeling?

Why Generative Modeling?

1. Cognitive Motivations.
2. Semi-Supervised Learning: regularizing discriminative models p(y | x) with unlabeled data by modeling the joint p(x, y).
3. Simulating Environments for Reinforcement Learning.

Overview of Deep Generative Models

1.2 How to Train a Deep Generative Model

Learning a Deep Generative Model

Three approaches, in increasing order of sophistication:
1. Importance Sampling
2. Variational Inference
3. Adversarial Training

Learning a Deep Generative Model

1. Importance Sampling (MacKay & Gibbs, 1997)

Works only in low dimensions. A sketch of the estimator follows.
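To make the idea concrete, here is a minimal sketch of the importance-sampled estimate of the marginal likelihood, using the prior as the proposal distribution. The `log_likelihood` function is a toy stand-in for the decoder network, and the dimensions and sample counts are illustrative assumptions, not values from the talk.

```python
import numpy as np

rng = np.random.default_rng(1)

def log_likelihood(x, z):
    """Hypothetical Bernoulli likelihood log p(x | z); a toy stand-in for the network."""
    logits = z.sum() * np.ones_like(x, dtype=float)   # toy decoder, not the talk's model
    p = 1.0 / (1.0 + np.exp(-logits))
    return np.sum(x * np.log(p) + (1 - x) * np.log(1 - p))

def importance_sampled_log_marginal(x, d=2, num_samples=1000):
    """log p(x) ~= log (1/S) sum_s p(x | z_s), with z_s drawn from the prior p(z)."""
    zs = rng.standard_normal((num_samples, d))        # proposal = prior
    log_w = np.array([log_likelihood(x, z) for z in zs])
    m = log_w.max()                                   # log-sum-exp for numerical stability
    return m + np.log(np.mean(np.exp(log_w - m)))

x = rng.binomial(1, 0.5, size=10)
print(importance_sampled_log_marginal(x))
```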

Learning a Deep Generative Model

2. Variational Inference

Optimize a lower bound on the marginal likelihood using a posterior approximation q(z | x); the bound decomposes into a data-reconstruction term and a regularization term.

Problem: for neural network likelihoods, these integrals are analytically intractable.
Step #1: Use Monte Carlo approximations.
Problem: still can't compute gradients w.r.t. the variational parameters.
Step #2: Sample via differentiable non-centered parametrizations (Kingma & Welling, 2014; Rezende et al., 2014).

The Variational Autoencoder (VAE)

[An inference (encoder) network outputs the variational parameters of q(z | x); the generative (decoder) network maps z to the likelihood p(x | z).]
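Below is a minimal NumPy sketch of the two steps for a Gaussian posterior: a reparametrized (non-centered) draw z = μ + σ·ε and the closed-form KL regularizer. In a real implementation the encoder would produce μ and log σ and an autodiff framework would propagate gradients through the sample; the zero-valued parameters here are placeholders of my own.

```python
import numpy as np

rng = np.random.default_rng(2)

def sample_posterior(mu, log_sigma, num_samples=1):
    """Differentiable non-centered parametrization of q(z | x) = N(mu, sigma^2):
    z = mu + sigma * eps, eps ~ N(0, I).  The randomness lives in eps, so gradients
    w.r.t. the variational parameters (mu, log_sigma) pass straight through."""
    eps = rng.standard_normal((num_samples,) + mu.shape)
    return mu + np.exp(log_sigma) * eps

def gaussian_kl(mu, log_sigma):
    """Closed-form KL( N(mu, sigma^2) || N(0, I) ), the ELBO's regularization term."""
    return 0.5 * np.sum(np.exp(2 * log_sigma) + mu**2 - 1.0 - 2 * log_sigma)

# Monte Carlo ELBO with a single sample (Step #1), using the reparametrized draw (Step #2):
#   ELBO(x) ~= log p(x | z) - KL(q(z | x) || p(z)),   z = mu + sigma * eps
mu, log_sigma = np.zeros(2), np.zeros(2)       # would be produced by the encoder network
z = sample_posterior(mu, log_sigma)[0]
print("KL term:", gaussian_kl(mu, log_sigma))
```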

Learning a Deep Generative Model

3. Adversarial Training (Goodfellow et al., 2014)

An auxiliary classifier predicts whether x is generated or observed data.

Two-step training:
1. Minimize the objective with respect to the generator's parameters.
2. Maximize the objective with respect to the classifier's (discriminator's) parameters.

Generative Adversarial Networks (GANs) (Goodfellow et al., 2014)
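As an illustration of the two-step scheme, here is a self-contained toy example with a one-dimensional generator and a logistic-regression discriminator, trained with hand-derived gradients. The data distribution, model forms, learning rate, and step counts are my own choices for the sketch, not the setup from Goodfellow et al.

```python
import numpy as np

rng = np.random.default_rng(3)
sigmoid = lambda t: 1.0 / (1.0 + np.exp(-t))

# Toy 1-D setup (illustrative only): real data ~ N(3, 1),
# generator g(eps) = theta + eps, discriminator D(x) = sigmoid(a * x + b).
theta = 0.0          # generator parameter; should drift toward the real mean of 3
a, b = 1.0, 0.0      # discriminator parameters
lr = 0.05

for step in range(2000):
    x_real = rng.normal(3.0, 1.0, size=64)
    x_fake = theta + rng.standard_normal(64)

    # Maximize the value function w.r.t. the discriminator (gradient ascent).
    d_real, d_fake = sigmoid(a * x_real + b), sigmoid(a * x_fake + b)
    grad_a = np.mean((1 - d_real) * x_real) + np.mean(-d_fake * x_fake)
    grad_b = np.mean(1 - d_real) + np.mean(-d_fake)
    a, b = a + lr * grad_a, b + lr * grad_b

    # Minimize the value function w.r.t. the generator (gradient descent).
    d_fake = sigmoid(a * x_fake + b)
    grad_theta = np.mean(-d_fake * a)       # d/dtheta of E[log(1 - D(g(eps)))]
    theta = theta - lr * grad_theta

print("generator mean:", theta)            # should drift toward the real mean of 3
```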

Variational Bayes vs. Adversarial Training

[Side-by-side comparison: VAE vs. GAN.]

Deep Generative Models Summary

(Original) Density Network
  Training method: importance-sampled estimate of the marginal likelihood
  Auxiliary model: none

Variational Autoencoder
  Training method: stochastic gradient variational Bayes
  Auxiliary model: backend auxiliary model (the inference/encoder network)

Generative Adversarial Network
  Training method: adversarial training
  Auxiliary model: frontend auxiliary model (the discriminator)

The Dirichlet Process

[Stick-breaking construction: draw the mixture weights π from a GEM distribution. Fractions v_k ~ Beta(1, α) break off pieces of a unit-length stick, giving π_1 = v_1 and π_k = v_k ∏_{j<k} (1 − v_j).]
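A minimal sketch of a truncated stick-breaking (GEM) draw; the truncation level and concentration parameter are chosen only for illustration:

```python
import numpy as np

rng = np.random.default_rng(4)

def sample_gem(alpha=1.0, truncation=50):
    """Truncated stick-breaking draw of mixture weights pi ~ GEM(alpha):
    v_k ~ Beta(1, alpha), pi_k = v_k * prod_{j<k} (1 - v_j)."""
    v = rng.beta(1.0, alpha, size=truncation)
    v[-1] = 1.0                                   # close the stick at the truncation level
    remaining = np.concatenate([[1.0], np.cumprod(1.0 - v)[:-1]])
    return v * remaining

pi = sample_gem(alpha=5.0)
print(pi[:5], pi.sum())                           # weights decay stochastically and sum to 1
```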

SGVB for the Dirichlet Process

Two obstacles:
1. Gradients through the stick-breaking weights π_k.
2. Gradients through samples from the mixture.

Differentiating Through π_k

Obstacle: the Beta distribution does not have a non-centered parametrization (except in special cases).

Solution: the Kumaraswamy distribution, a Beta-like distribution with a closed-form inverse CDF; use it as the variational posterior. (Poondi Kumaraswamy, 1930-1988.)
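A minimal sketch of the resulting reparametrized sampler, using the Kumaraswamy inverse CDF (the parameter values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(5)

def sample_kumaraswamy(a, b, size=None):
    """Reparametrized draw from Kumaraswamy(a, b) via its closed-form inverse CDF:
    u ~ Uniform(0, 1),  v = (1 - (1 - u)^(1/b))^(1/a).
    The transform is a deterministic, differentiable function of (a, b), so gradients
    w.r.t. the variational parameters pass through the sample (shown in NumPy for
    illustration; an autodiff framework would be used in practice)."""
    u = rng.uniform(size=size)
    return (1.0 - (1.0 - u) ** (1.0 / b)) ** (1.0 / a)

# Beta-like fractions v_k for the stick-breaking posterior:
v = sample_kumaraswamy(a=1.0, b=5.0, size=10)
print(v)
```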

Stochastic Backpropagation Through Mixtures

Obstacle: it is not obvious how to use the reparametrization trick for samples from a mixture distribution.

Brute-force solution: summing over all components yields a tractable ELBO but requires O(T) decoder propagations, one per component of the truncated mixture.
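The brute-force idea can be sketched as follows: the reconstruction term is computed once per component and combined with the mixture weights, so the decoder is invoked T times per data point. The toy decoder and uniform weights are placeholders of my own, not the talk's model.

```python
import numpy as np

rng = np.random.default_rng(6)

def decoder_log_lik(x, z):
    """Toy stand-in for log p(x | z); one call = one decoder propagation."""
    p = 1.0 / (1.0 + np.exp(-z.mean()))
    return np.sum(x * np.log(p) + (1 - x) * np.log(1 - p))

def mixture_reconstruction_term(x, pi, component_samples):
    """Brute-force marginalization over the T mixture components:
    E_q[log p(x | z)] ~= sum_k pi_k * log p(x | z_k), i.e. O(T) decoder propagations."""
    return sum(pi_k * decoder_log_lik(x, z_k)
               for pi_k, z_k in zip(pi, component_samples))

T, d = 10, 2
pi = np.full(T, 1.0 / T)                          # toy, uniform weights
component_samples = [rng.standard_normal(d) for _ in range(T)]
x = rng.binomial(1, 0.5, size=5)
print(mixture_reconstruction_term(x, pi, component_samples))
```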

Deep Generative Models with Stick-Breaking Priors

Why Combine the DP with DGMs?

[Latent variable Z drawn from a stochastic process, generating the observed data X.]

1. Dynamic Latent Representations (Width)
2. Neural Networks with Adaptive Computation (Depth)
3. Automatic and Principled Model Selection (Based on Marginal Likelihood)

3.1 The Stick-Breaking Variational Autoencoder

In collaboration with Padhraic Smyth

Deep Generative Models with Stick-Breaking Priors

MODEL #1: Stick-Breaking Variational Autoencoder (Nalisnick & Smyth, 2017)

[The inference network produces Kumaraswamy samples, which are composed by stick-breaking into the latent weights; the posterior is truncated (not necessary, but it learns faster). A sketch of this construction follows.]
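In this sketch, Kumaraswamy draws (one per stick break) are composed into a latent vector on the simplex via stick-breaking, which would then be fed to the decoder. The encoder outputs a and b are hard-coded placeholders of my own; a truncation level of 50 is used to match the experiments reported in the talk.

```python
import numpy as np

rng = np.random.default_rng(7)

def stick_breaking(v):
    """Compose Beta-like fractions v_1..v_{K-1} into weights pi on the simplex:
    pi_k = v_k * prod_{j<k}(1 - v_j), with the last weight taking the leftover stick."""
    v = np.append(v, 1.0)
    remaining = np.concatenate([[1.0], np.cumprod(1.0 - v)[:-1]])
    return v * remaining

# Hypothetical encoder outputs for one data point x: Kumaraswamy parameters (a_k, b_k)
# for each of the K - 1 breaks.
K = 50
a = np.ones(K - 1)
b = 5.0 * np.ones(K - 1)

u = rng.uniform(size=K - 1)
v = (1.0 - (1.0 - u) ** (1.0 / b)) ** (1.0 / a)   # reparametrized Kumaraswamy draws
z = stick_breaking(v)                              # latent code on the simplex, fed to the decoder
print(z.shape, z.sum())                            # (50,), sums to 1
```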

Samples from the Generative Model (Nalisnick & Smyth, 2017)

[Stick-Breaking VAE: truncation level of 50 dimensions, Beta(1, 5) prior. Gaussian VAE: 50 dimensions, N(0, 1) prior.]

Quantitative Results for SB-VAE (Nalisnick & Smyth, 2017)

[Unsupervised and semi-supervised results: t-SNE visualizations of the MNIST latent spaces (Dirichlet vs. Gaussian), kNN classifier accuracy on the latent space, and (estimated) marginal likelihoods. The semi-supervised model is a nonparametric version of the M2 model of (Kingma et al., NIPS 2014).]

3.2 The Dirichlet Process Variational Autoencoder

In collaboration with Lars Hertel and Padhraic Smyth

Deep Generative Models with Stick-Breaking Priors

MODEL #2: Dirichlet Process Variational Autoencoder (Nalisnick et al., 2016)
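As a rough sketch of the generative story, assuming the latent prior is a truncated stick-breaking mixture of Gaussians; the component parameters and the decoder here are toy placeholders, not the trained model from the paper.

```python
import numpy as np

rng = np.random.default_rng(8)

# Ancestral sampling when the latent prior is a (truncated) stick-breaking mixture of Gaussians.
T, d = 5, 2
v = rng.beta(1.0, 1.0, size=T); v[-1] = 1.0
pi = v * np.concatenate([[1.0], np.cumprod(1.0 - v)[:-1]])    # mixture weights ~ GEM
pi = pi / pi.sum()                                            # guard against floating-point drift
means = rng.normal(scale=2.0, size=(T, d))                    # toy component parameters

k = rng.choice(T, p=pi)                                       # pick a mixture component
z = rng.normal(means[k], 1.0)                                 # component-specific latent draw
x_mean = 1.0 / (1.0 + np.exp(-z.sum()))                       # toy decoder output
print("component", k, "latent", z, "decoder mean", x_mean)
```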

Samples from the Generative Model (Nalisnick et al., 2016)

[MNIST samples from mixture component #1 and from component #5.]

Quantitative Results for DP-VAE (Nalisnick et al., 2016)

[t-SNE visualizations of the MNIST latent spaces (Dirichlet process vs. Gaussian), kNN classifier accuracy on the latent space, and (estimated) marginal likelihoods.]

3.3 Adaptive Computation Time via Stick-Breaking

In collaboration with Marc-Alexandre Cote and Hugo Larochelle

Deep Generative Models with Stick-Breaking Priors

MODEL #3: Stick-Breaking Adaptive Computation

[A recurrent network reads input X and produces per-step outputs y_1, y_2, ..., y_k, weighted by stick-breaking weights π_1, π_2, ..., π_k; computation stops once the remaining stick is negligibly small. A sketch of the halting mechanism follows.]
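A toy sketch of the halting mechanism: per-step outputs are weighted by stick-breaking weights, and computation stops once the remaining stick falls below a threshold. The per-step outputs and fractions are random placeholders standing in for what a trained network would produce.

```python
import numpy as np

rng = np.random.default_rng(9)

def adaptive_steps(step_outputs, step_fractions, eps=1e-2):
    """Weight per-step outputs y_t by stick-breaking weights pi_t and halt once the
    remaining stick is negligibly small (toy version of the adaptive-computation idea)."""
    remaining, combined, used = 1.0, 0.0, 0
    for y_t, v_t in zip(step_outputs, step_fractions):
        pi_t = v_t * remaining          # pi_t = v_t * prod_{j<t} (1 - v_j)
        combined += pi_t * y_t
        remaining *= (1.0 - v_t)
        used += 1
        if remaining < eps:             # stop computing further steps
            break
    return combined, used

# Toy per-step outputs and halting fractions (a trained network would produce these).
ys = rng.standard_normal(20)
vs = rng.uniform(0.2, 0.6, size=20)
out, steps_used = adaptive_steps(ys, vs)
print(out, "computed in", steps_used, "steps")
```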

Quantitative Results for SB-RNN: Image Modeling

Quantitative Results for SB-RNN: Text Modeling

Conclusions

1. Superior Latent Spaces with NP Priors: they seem to preserve class structure well, resulting in better discriminative properties, and their dynamic capacity naturally encodes factors of variation.

2. Adaptive Computation: fusing NNs with nonparametric priors endows NNs with natural model selection properties.

Thank you. Questions?

Appendix