f-GANs in an Information Geometric Nutshell


Transcript of f-GANs in an Information Geometric Nutshell

Page 1: f-GANs in an Information Geometric Nutshell (users.cecs.anu.edu.au/~rnock/docs/nips17-ncmqw-poster.pdf)

f-GANs in an Information Geometric Nutshell

Richard Nock, Zac Cranko, Aditya Krishna Menon, Lizhen Qu, Robert C. Williamson

Tagline: (i) complete the information-theoretic layer of f-GANs (Nowozin et al.'16), (ii) provide an equivalent information-geometric layer showing the fitting power of f-GANs, (iii) show the concinnity of deep architectures to the information-geometric layer, (iv) use it to devise improvements to the generator / discriminator in the GAN game.

Longer arXiv version (#1707.04385): more extensive treatment of the vig-f-GAN identity, analysis of the penalty $J(Q_\theta)$, the vig-f-GAN identity in expected utility theory, relationships with feature matching, etc.

Information theory: f-divergence

$$I_f(P\|Q) := E_{X\sim Q}\left[f\left(\frac{P(X)}{Q(X)}\right)\right],$$

with $f : \mathbb{R}_+ \to \mathbb{R}$ convex and $f(1) = 0$.

Information geometry: Bregman divergence

$$D_\varphi(\theta\|\rho) := \varphi(\theta) - \varphi(\rho) - (\theta - \rho)^\top \nabla\varphi(\rho),$$

with $\varphi$ convex and differentiable.
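To make the two divergences concrete, here is a minimal numeric sketch (an illustration, not code from the paper): with $f(t) = t\log t$ the f-divergence recovers $\mathrm{KL}(P\|Q)$, and with $\varphi(\theta) = \|\theta\|^2$ the Bregman divergence recovers the squared Euclidean distance.

```python
import numpy as np

# f-divergence I_f(P||Q) = E_{X~Q}[f(P(X)/Q(X))] for discrete p, q.
def f_divergence(p, q, f):
    return np.sum(q * f(p / q))

# Bregman divergence D_phi(theta||rho) = phi(theta) - phi(rho) - <theta - rho, grad phi(rho)>.
def bregman(theta, rho, phi, grad_phi):
    return phi(theta) - phi(rho) - (theta - rho) @ grad_phi(rho)

p = np.array([0.2, 0.5, 0.3])
q = np.array([0.3, 0.4, 0.3])

f_kl = lambda t: t * np.log(t)           # convex with f(1) = 0
print(f_divergence(p, q, f_kl))          # I_f = KL(P||Q)
print(np.sum(p * np.log(p / q)))         # direct KL: same value

phi = lambda th: th @ th                 # convex, differentiable
grad_phi = lambda th: 2.0 * th
print(bregman(p, q, phi, grad_phi))      # equals ||p - q||^2
```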

Definitions, for a non-decreasing signature $\chi : \mathbb{R}_+ \to \mathbb{R}_+$:

$\chi$-logarithm: $\log_\chi(z) := \int_1^z \frac{1}{\chi(t)}\,\mathrm{d}t$.

$\chi$-exponential: $\exp_\chi(z) := 1 + \int_0^z \lambda(t)\,\mathrm{d}t$, with $\lambda(\log_\chi(z)) := \chi(z)$.

$\chi$-exponential family density, with cumulant $C : \Theta \to \mathbb{R}$, sufficient statistics $\phi : \mathcal{X} \to \mathbb{R}^d$ and coordinate $\theta$:
$$P_{\chi,C}(x|\theta,\phi) := \exp_\chi\left(\phi(x)^\top\theta - C(\theta)\right).$$

$\chi$-escort density, with normalisation $Z := \int_{\mathcal{X}} \chi(P_{\chi,C}(x|\theta,\phi))\,\mathrm{d}\mu(x)$:
$$\tilde{P}_{\chi,C} := \frac{1}{Z}\cdot\chi(P_{\chi,C}).$$

Example: $\chi = \mathrm{Id}$ gives $\log_\chi = \log$ and $\exp_\chi = \exp$, hence the exponential family $P_C(x|\theta,\phi) = \exp(\phi(x)^\top\theta - C(\theta))$, for which $I_{\mathrm{kl}}(P_\rho\|Q_\theta) = D_C(\theta\|\rho)$ and the escort is the density itself ($\tilde{P}_C = P_C$).
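A small numerical sketch of these objects (an illustration under simple assumptions, not the paper's code): $\log_\chi$ is computed by quadrature, $\exp_\chi$ as its functional inverse (the two definitions above are inverse to each other), and the escort of a discrete density is the normalised image under $\chi$.

```python
import numpy as np
from scipy.integrate import quad
from scipy.optimize import brentq

# chi-logarithm by quadrature: log_chi(z) = int_1^z dt / chi(t).
def log_chi(z, chi):
    return quad(lambda t: 1.0 / chi(t), 1.0, z)[0]

# chi-exponential as the inverse of log_chi (equivalent to the integral
# definition, since d/dz log_chi = 1/chi and lambda(log_chi(z)) = chi(z)).
def exp_chi(y, chi, lo=1e-6, hi=1e6):
    return brentq(lambda z: log_chi(z, chi) - y, lo, hi)

chi_id = lambda t: t                       # chi = Id recovers log / exp
print(log_chi(2.0, chi_id), np.log(2.0))   # ~0.6931 twice
print(exp_chi(1.0, chi_id), np.e)          # ~2.7183 twice

# chi-escort of a discrete density p: normalise chi(p).
def escort(p, chi):
    w = chi(p)
    return w / w.sum()

p = np.array([0.2, 0.5, 0.3])
print(escort(p, chi_id))                   # chi = Id: the escort is p itself
print(escort(p, lambda t: t ** 2))         # a Tsallis-style escort (q = 2)
```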

Panel 1: deep architectures in the vig-f-GAN

Standard deep generator architecture $g : \mathcal{X} \to \mathbb{R}^d$, with $L$ (inner) deep layers:
$$\mathbb{R}^{d_l} \ni \phi_l(x) := v(w_l\phi_{l-1}(x) + b_l),\ \forall l \in \{1, 2, ..., L\},\qquad \phi_0(x) := x \in \mathcal{X},$$
$$g(x) := v_{\mathrm{out}}(\Gamma\phi_L(x) + \beta).$$
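A minimal numpy sketch of this generator architecture (layer sizes, activation and initialisation are illustrative assumptions, not the paper's experimental settings):

```python
import numpy as np

rng = np.random.default_rng(0)

def softplus(z):
    """A smooth inner activation v (numerically stable Softplus)."""
    return np.log1p(np.exp(-np.abs(z))) + np.maximum(z, 0.0)

def generator(x, weights, biases, gamma, beta, v=softplus, v_out=np.tanh):
    """phi_0 = x; phi_l = v(w_l @ phi_{l-1} + b_l); g(x) = v_out(Gamma @ phi_L + beta)."""
    phi = x
    for w_l, b_l in zip(weights, biases):
        phi = v(w_l @ phi + b_l)
    return v_out(gamma @ phi + beta)

dims = [8, 16, 16, 8]                                        # d_0, d_1, ..., d_L
weights = [0.1 * rng.standard_normal((dims[l + 1], dims[l])) for l in range(3)]
biases = [np.zeros(dims[l + 1]) for l in range(3)]
gamma = 0.1 * rng.standard_normal((4, dims[-1]))             # output map Gamma
beta = np.zeros(4)                                           # output offset beta

z = generator(rng.standard_normal(8), weights, biases, gamma, beta)
print(z.shape)                                               # (4,): one sample
```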

Theorem (A): Suppose $v$ invertible, and let $x := g^{-1}(z)$. Then for any continuous signature $\chi_{\mathrm{net}}$, there exist an activation $v_{\mathrm{out}}$, weights $\Gamma, w_l$ and offsets $b_l \in \mathbb{R}^d$ ($\forall l \in \{1, 2, ..., L\}$) such that for any output $z$, the generator's density $Q_g$ satisfies
$$Q_g(z) = f(\tilde{Q}_{\mathrm{deep}}(x)),\qquad \tilde{Q}_{\mathrm{deep}}(x) := \prod_{l=1}^{L}\prod_{i=1}^{d} \tilde{P}_{\chi_{\mathrm{net}},b_{l,i}}(x\,|\,w_{l,i},\phi_{l-1}),$$
a product of escorts of $\chi_{\mathrm{net}}$-families with "deep" sufficient statistics $\phi_{l-1}$, coordinates $w_{l,i}$ and cumulants $b_{l,i}$.

Hence, the deep generator architecture is able to fit complex escorts for particular choices of inner activation $v$. But does this hold for popular $v$s? Define $v$ to be strongly admissible iff $\mathrm{dom}(v) \cap \mathbb{R}_+ \neq \emptyset$ and $v$ is $C^1$, lowerbounded, strictly increasing and convex. It is weakly admissible iff $\forall \epsilon > 0$ there exists a strongly admissible $v_\epsilon$ such that $\|v - v_\epsilon\|_{L^1} < \epsilon$ (see paper for details).
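To illustrate weak admissibility numerically (an illustration, not the paper's code; the integration interval is an arbitrary choice), the sketch below estimates the $L^1$ gap between ReLU and the strongly admissible $\mu$-ReLU defined in the experiments panel further down; the gap shrinks as $\mu \to 1$:

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def mu_relu(z, mu):
    """mu-ReLU(z) = (z + sqrt((1-mu)^2 + z^2) + mu - 1) / 2; mu = 1 gives ReLU."""
    return (z + np.sqrt((1.0 - mu) ** 2 + z ** 2) + mu - 1.0) / 2.0

# Riemann-sum estimate of || relu - mu_relu ||_{L1} on [-5, 5].
z = np.linspace(-5.0, 5.0, 200_001)
dz = z[1] - z[0]
for mu in (0.0, 0.5, 0.9, 0.99):
    gap = np.sum(np.abs(relu(z) - mu_relu(z, mu))) * dz
    print(f"mu = {mu}: L1 gap ~ {gap:.4f}")    # decreases towards 0 as mu -> 1
```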

Panel 2: complete proper loss layer for the (vig-)f-GAN game

Theorem: $I_f(P\|Q) \propto \mathcal{L}(Q)$, with
$$\mathcal{L}(Q) := \sup_{T : \mathcal{X} \to \mathbb{R}}\ \left\{E_{X\sim P}[-\ell(+1, T(X))] + E_{X\sim Q}[-\ell(-1, T(X))]\right\},$$
where $-1$ = fake, $+1$ = real, and the loss function (Reid & Williamson'11) is built from an invertible link function $\psi : (0,1) \to \mathbb{R}$:
$$\ell(-1, z) := f^\star\!\left(f'\!\left(\frac{\psi^{-1}(z)}{1 - \psi^{-1}(z)}\right)\right),\qquad \ell(+1, z) := -f'\!\left(\frac{\psi^{-1}(z)}{1 - \psi^{-1}(z)}\right).$$
For the vig-f-GAN game the loss reads
$$\ell(+1, z) := -z,\qquad \ell_x(-1, z) := -\log_{(\chi^\bullet)_{1/\tilde{Q}(x)}}(-z),\qquad \chi^\bullet(t) := 1/\chi^{-1}(1/t).$$
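A minimal sketch of the proper-loss construction above, instantiated for concreteness (these are illustrative assumptions, not the paper's choices) with the KL generator $f(t) = t\log t$, for which $f'(t) = 1 + \log t$ and $f^\star(u) = \exp(u-1)$, and the sigmoid link $\psi^{-1}(z) = \sigma(z)$, so that $\psi^{-1}(z)/(1-\psi^{-1}(z)) = e^z$:

```python
import numpy as np

sigma = lambda z: 1.0 / (1.0 + np.exp(-z))   # psi^{-1} for the sigmoid link
fprime = lambda t: 1.0 + np.log(t)           # f'(t) for f(t) = t log t
fstar = lambda u: np.exp(u - 1.0)            # convex conjugate f*(u)

def loss(y, z):
    """ell(y, z) for labels y in {+1 (real), -1 (fake)}."""
    ratio = sigma(z) / (1.0 - sigma(z))      # equals exp(z) for this link
    if y == +1:
        return -fprime(ratio)
    return fstar(fprime(ratio))

z = 0.7
print(loss(+1, z), -(1.0 + z))               # closed form for this link: -(1 + z)
print(loss(-1, z), np.exp(z))                # closed form for this link: exp(z)
```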


Panel 3: experiments with new generator/discriminator components

Exp. A: replacing the sigmoid link by Matsushita's link in the discriminator:
$$\mathrm{mat}(z) := \frac{1}{2}\cdot\left(1 + \frac{z}{\sqrt{1 + z^2}}\right).$$

Exp. B: replacing the ReLU activation in the generator by its strongly admissible generalization, the $\mu$-ReLU:
$$\mu\text{-ReLU}(z) := \frac{z + \sqrt{(1-\mu)^2 + z^2} + \mu - 1}{2}.$$

Datasets: MNIST; LSUN "tower" ($\mu = 0.4$).

Code: https://github.com/qulizhen/fgan_info_geometric
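Drop-in numpy sketches of these two components (for illustration only; the repository linked above contains the actual experimental code):

```python
import numpy as np

def matsushita_link(z):
    """Matsushita link: smooth, strictly increasing, maps R onto (0, 1)."""
    return 0.5 * (1.0 + z / np.sqrt(1.0 + z * z))

def mu_relu(z, mu=0.4):
    """mu-ReLU: strongly admissible generalization of ReLU (mu = 0.4 on LSUN)."""
    return (z + np.sqrt((1.0 - mu) ** 2 + z * z) + mu - 1.0) / 2.0

z = np.linspace(-3.0, 3.0, 7)
print(matsushita_link(z))   # values in (0, 1), with matsushita_link(0) = 0.5
print(mu_relu(z))           # smooth, convex, lower-bounded, strictly increasing
```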

Layers in the GAN game (* = conditions apply), across nature, discriminator and generator:

Information theory (Nowozin et al.'16):
$$I_f(P\|Q) \stackrel{*}{=} \sup_\omega\left\{E_{X\sim P}[T_\omega(X)] - E_{X\sim Q_\theta}[f^\star(T_\omega(X))]\right\}.$$

Information geometry (vig-f-GAN):
$$\mathrm{KL}_{\chi,\tilde{Q}_\theta}(\tilde{Q}_\theta\|P_\rho) \stackrel{*}{=} \sup_\omega\left\{E_{X\sim P_\rho}[T_\omega(X)] - E_{X\sim \tilde{Q}_\theta}[(-\log_\chi \tilde{Q}_\theta)^\star(T_\omega(X))]\right\} \stackrel{*}{=} D_C(\theta\|\rho) + J(Q_\theta).$$

In short: Nowozin et al.'16 show $f\text{-GAN}(P, Q) = I_f(P\|Q)$; we show (the variational information-geometric f-GAN identity) $f\text{-GAN}(P, \mathrm{escort}(Q)) = D(\theta\|\vartheta) + \mathrm{Penalty}(Q)$.

Example (GAN):
$$f_{\mathrm{gan}}(z) := z\log z - (z+1)\log(z+1) + 2\log 2,\qquad \chi_{\mathrm{gan}}(z) := \frac{1}{\log\left(1 + \frac{1}{z}\right)}.$$
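A quick numeric sanity check of this GAN instantiation (an illustration, not from the paper): $f_{\mathrm{gan}}(1) = 0$ and $f_{\mathrm{gan}}$ is convex, as an f-divergence generator must be, while $\chi_{\mathrm{gan}}$ is positive and non-decreasing on $\mathbb{R}_+$, as a signature must be:

```python
import numpy as np

def f_gan(z):
    return z * np.log(z) - (z + 1.0) * np.log(z + 1.0) + 2.0 * np.log(2.0)

def chi_gan(z):
    return 1.0 / np.log(1.0 + 1.0 / z)

print(f_gan(1.0))                              # 0.0 up to rounding

z = np.linspace(0.01, 10.0, 1000)
print(np.all(np.diff(f_gan(z), 2) >= -1e-12))  # second differences >= 0: convex
c = chi_gan(z)
print(np.all(c > 0), np.all(np.diff(c) >= 0))  # positive and non-decreasing
```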

Theorems: (A) holds for any strongly admissible $v$. The following activations are (weakly or strongly) admissible: ELU, ReLU, leaky ReLU, Softplus (indeed $\mathrm{ReLU} = \lim_{\mu\to 1} \mu\text{-ReLU}$); see paper for more examples and details.
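A grid-based check of the admissibility conditions for two of these activations (an illustration; the interval and tolerances are arbitrary choices): Softplus passes the strong-admissibility tests, while ReLU fails strict monotonicity on the negative half-line and is only weakly admissible (as the limit of $\mu$-ReLUs):

```python
import numpy as np

z = np.linspace(-5.0, 5.0, 10_001)

def check(v):
    """(strictly increasing?, convex?, lower bound) estimated on a grid."""
    y = v(z)
    increasing = bool(np.all(np.diff(y) > 0))
    convex = bool(np.all(np.diff(y, 2) >= -1e-12))
    return increasing, convex, float(y.min())

softplus = lambda t: np.log1p(np.exp(-np.abs(t))) + np.maximum(t, 0.0)
relu = lambda t: np.maximum(t, 0.0)

print(check(softplus))   # (True, True, ~0.007): strongly admissible behaviour
print(check(relu))       # (False, True, 0.0): flat on z < 0, weakly admissible
```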