f-GANs in an Information Geometric Nutshell
Transcript of the poster "f-GANs in an Information Geometric Nutshell"
Richard Nock, Zac Cranko, Aditya Krishna Menon, Lizhen Qu, Robert C. Williamson
Contributions: (i) complete the information-theoretic layer of f-GANs (Nowozin et al.'16); (ii) provide an equivalent information-geometric layer showing the fitting power of f-GANs; (iii) show the concinnity of deep architectures to the information-geometric layer; (iv) use it to devise improvements to the generator/discriminator in the GAN game.
Longer arXiv version (arXiv:1707.04385): more extensive treatment of the vig-f-GAN identity, analysis of the penalty $J(Q_\theta)$, the vig-f-GAN identity in expected utility theory, relationships with feature matching, etc.
Information theory: f-divergence
$$I_f(P\|Q) := E_{X\sim Q}\!\left[f\!\left(\frac{P(X)}{Q(X)}\right)\right],$$
with $f : \mathbb{R}_+ \to \mathbb{R}$ convex, $f(1) = 0$.
Information geometry: Bregman divergence
$$D_\varphi(\theta\|\rho) := \varphi(\theta) - \varphi(\rho) - (\theta - \rho)^\top \nabla\varphi(\rho),$$
with $\varphi$ convex differentiable.
Information geometry: $\chi$-exponential family
For a signature $\chi : \mathbb{R}_+ \to \mathbb{R}_+$ non-decreasing:
- $\chi$-logarithm: $\log_\chi(z) := \int_1^z \frac{1}{\chi(t)}\,dt$
- $\chi$-exponential: $\exp_\chi(z) := 1 + \int_0^z \lambda(t)\,dt$, with $\lambda(\log_\chi(z)) := \chi(z)$
- $\chi$-exponential family density: $P_{\chi,C}(x|\theta,\varphi) := \exp_\chi(\varphi(x)^\top\theta - C(\theta))$, with sufficient statistic $\varphi : \mathcal{X} \to \mathbb{R}^d$, coordinate $\theta$, cumulant $C : \Theta \to \mathbb{R}$
- $\chi$-escort density: $\tilde{P}_{\chi,C} := \frac{1}{Z}\cdot\chi(P_{\chi,C})$, with normalisation $Z := \int_{\mathcal{X}} \chi(P_{\chi,C}(x|\theta,\varphi))\,d\mu(x)$
Example: $\chi = \mathrm{Id}$ gives $\log_\chi = \log$ and $\exp_\chi = \exp$, i.e. the exponential family $P_C(x|\theta,\varphi) = \exp(\varphi(x)^\top\theta - C(\theta))$ (with $\tilde{P}_C = P_C$), and $I_{\mathrm{kl}}(P_\rho\|Q_\theta) = D_C(\theta\|\rho)$.
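A numerical sketch of the $\chi$-logarithm for the Tsallis-style signature $\chi(t) = t^q$, for which closed-form q-logarithm/q-exponential expressions are known and can serve as a cross-check. The helper `log_chi` and the choice $q = 1.5$ are illustrative assumptions, not from the paper.

```python
import numpy as np

def log_chi(z, chi, n=200001):
    """chi-logarithm log_chi(z) = integral_1^z dt / chi(t), by the trapezoid rule."""
    t = np.linspace(1.0, z, n)
    y = 1.0 / chi(t)
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(t)))

# Tsallis-style signature chi(t) = t^q; closed forms for comparison:
q = 1.5
log_q = lambda z: (z ** (1 - q) - 1) / (1 - q)        # q-logarithm = log_chi for chi(t) = t^q
exp_q = lambda z: (1 + (1 - q) * z) ** (1 / (1 - q))  # q-exponential, the inverse of log_q

lz = log_chi(3.0, lambda t: t ** q)   # numeric integral; should match log_q(3.0)
```

The same numeric integral with $\chi = \mathrm{Id}$ recovers the ordinary logarithm, matching the example above.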
experiments: new generator/discriminator components
deep architectures in the vig-f-GAN
Standard deep generator architecture $g : \mathcal{X} \to \mathbb{R}^d$ with $L$ (inner) deep layers:
$$\mathbb{R}^{d_l} \ni \phi_l(x) := v(W_l\,\phi_{l-1}(x) + b_l),\quad \forall l \in \{1, 2, ..., L\},\qquad \phi_0(x) := x \in \mathcal{X},$$
$$g(x) := v_{\mathrm{out}}(\Gamma\,\phi_L(x) + \beta).$$
Theorem (A): Suppose $v_{\mathrm{out}}$ is invertible, and let $x := g^{-1}(z)$. Then for any continuous signature $\chi$, there exist an activation $v$ and offsets $b_l \in \mathbb{R}^{d_v}$ ($\forall l \in \{1, 2, ..., L\}$) such that for any output $z$, the generator's density satisfies $Q_g(z) = f(\tilde{Q}_{\mathrm{deep}}(x))$, with $\tilde{Q}_{\mathrm{deep}}$ the escort of a "deep" $\chi_{\mathrm{net}}$-exponential family (defined in the loss-layer panel).
Hence, the deep generator architecture is able to fit complex escorts for particular choices of the inner activation $v$. But does this hold for popular $v$s? Define $v$ to be strongly admissible iff $\mathrm{dom}(v) \cap \mathbb{R}_+ \neq \emptyset$ and $v$ is $C^1$, lower-bounded, strictly increasing, and convex. It is weakly admissible iff $\forall \epsilon > 0$ there exists $v_\epsilon$ strongly admissible such that $\|v - v_\epsilon\|_{L^1} < \epsilon$.
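The generator architecture above fits in a few lines. A minimal sketch: the tiny dimensions, the Softplus inner activation (strongly admissible, per the admissibility list in this transcript), and the invertible $\exp$ output activation are our illustrative choices; Theorem (A) only assumes $v_{\mathrm{out}}$ invertible.

```python
import numpy as np

rng = np.random.default_rng(0)

def generator(x, Ws, bs, v, Gamma, beta, v_out):
    """g(x) = v_out(Gamma @ phi_L(x) + beta), with phi_l = v(W_l phi_{l-1} + b_l)."""
    phi = x                          # phi_0(x) := x
    for W, b in zip(Ws, bs):         # inner layers l = 1, ..., L
        phi = v(W @ phi + b)
    return v_out(Gamma @ phi + beta)

# Tiny instance: L = 2 inner layers
d_in, d_h, d_out = 4, 8, 3
Ws = [rng.normal(size=(d_h, d_in)), rng.normal(size=(d_h, d_h))]
bs = [rng.normal(size=d_h), rng.normal(size=d_h)]
Gamma, beta = rng.normal(size=(d_out, d_h)), rng.normal(size=d_out)
softplus = lambda u: np.log1p(np.exp(u))   # strongly admissible inner activation

z = generator(rng.normal(size=d_in), Ws, bs, softplus, Gamma, beta, np.exp)
```

With $v_{\mathrm{out}} = \exp$ the output is coordinate-wise positive and the map is invertible given $\Gamma$, matching the theorem's setting.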
complete proper loss layer for the (vig-)f-GAN game
Theorem: $I_f(P\|Q) \propto \mathcal{L}(Q)$, where
$$\mathcal{L}(Q) := \sup_{T : \mathcal{X}\to\mathbb{R}}\left\{ E_{X\sim P}[-\ell(+1, T(X))] + E_{X\sim Q}[-\ell(-1, T(X))] \right\},$$
with $-1$ = fake, $+1$ = real, and
$$\ell(-1, z) := f^\star\!\left(f'\!\left(\frac{\gamma^{-1}(z)}{1-\gamma^{-1}(z)}\right)\right),\qquad \ell(+1, z) := -f'\!\left(\frac{\gamma^{-1}(z)}{1-\gamma^{-1}(z)}\right),$$
where $\gamma : (0,1) \to \mathbb{R}$ is an invertible link function and $\ell$ a loss function (Reid & Williamson '11). In the vig-f-GAN game the losses take the form
$$\ell(+1, z) := -z,\qquad \ell_x(-1, z) := -\log_{(\chi^\bullet)_{1/\tilde{Q}(x)}}(-z),\qquad \chi^\bullet(t) := 1/\chi^{-1}(1/t).$$
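As a sanity check of the loss layer, the sketch below instantiates the theorem with $f = f_{\mathrm{gan}}$ and the sigmoid link $\gamma^{-1} = \sigma$ (both appear elsewhere in this transcript); for this pair the proper losses reduce to the familiar logistic GAN losses, up to the constant $2\log 2$. The grid-search conjugate `f_star` and all helper names are ours.

```python
import numpy as np

def f_gan(t):
    """f for the GAN game: f(t) = t log t - (t+1) log(t+1) + 2 log 2, so f(1) = 0."""
    return t * np.log(t) - (t + 1) * np.log(t + 1) + 2 * np.log(2)

def f_star(u, ts=np.linspace(1e-6, 50.0, 2_000_000)):
    """Convex conjugate f*(u) = sup_t {u t - f(t)}, by (rough) grid search."""
    return float(np.max(u * ts - f_gan(ts)))

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
f_prime = lambda t: np.log(t / (t + 1))       # f'(t) for f_gan

def loss(y, z):
    """The theorem's losses with link gamma^{-1} = sigmoid; y in {-1 fake, +1 real}."""
    t = sigmoid(z) / (1.0 - sigmoid(z))       # = e^z, the likelihood-ratio argument
    return f_star(f_prime(t)) if y == -1 else -f_prime(t)

# For this (f, link) pair the theorem's losses reduce to logistic losses:
# loss(+1, z) = -log(sigmoid(z)),  loss(-1, z) = -log(1 - sigmoid(z)) - 2 log 2.
real = loss(+1, 0.7)
fake = loss(-1, 0.7)
```

Here $f'(\gamma^{-1}(z)/(1-\gamma^{-1}(z))) = \log\sigma(z)$, which is why the logistic losses drop out analytically.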
Theorem (A) expresses the generator's density through the escort of a "deep" $\chi$-family, with "deep" sufficient statistics $\phi_{l-1}$, coordinates $w_{l,i}$ and cumulants $b_{l,i}$:
$$\tilde{Q}_{\mathrm{deep}}(x) := \prod_{l=1}^{L}\prod_{i=1}^{d} \tilde{P}_{\chi_{\mathrm{net}},\,b_{l,i}}(x\,|\,w_{l,i}, \phi_{l-1}).$$
see paper for details
Exp. A: replacing the sigmoid link by Matsushita's link in the discriminator:
$$\mathrm{mat}(z) := \frac{1}{2}\cdot\left(1 + \frac{z}{\sqrt{1+z^2}}\right)$$
Exp. B: replacing the ReLU activation in the generator by its strongly admissible generalization, the $\mu$-ReLU:
$$\mu\text{-ReLU}(z) := \frac{z + \sqrt{(1-\mu)^2 + z^2} + \mu - 1}{2}$$
Datasets: MNIST; LSUN "tower" ($\mu = 0.4$).
Code: https://github.com/qulizhen/fgan_info_geometric
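Both experimental ingredients are one-liners. A minimal sketch with names (`mu_relu`, `matsushita`) of our choosing; note that the ordinary ReLU is recovered in the $\mu \to 1$ limit.

```python
import numpy as np

def mu_relu(z, mu):
    """mu-ReLU(z) = (z + sqrt((1-mu)^2 + z^2) + mu - 1)/2; ReLU is the mu -> 1 limit."""
    return (z + np.sqrt((1.0 - mu) ** 2 + z ** 2) + mu - 1.0) / 2.0

def matsushita(z):
    """Matsushita link mat(z) = (1/2)(1 + z/sqrt(1 + z^2)): invertible, R -> (0,1)."""
    return 0.5 * (1.0 + z / np.sqrt(1.0 + z ** 2))

z = np.linspace(-5.0, 5.0, 11)
act = mu_relu(z, 0.4)       # mu = 0.4, the value used for LSUN "tower"
link = matsushita(z)        # strictly increasing, bounded in (0, 1)
```

Unlike ReLU, the $\mu$-ReLU with $\mu < 1$ is $C^1$ and strictly increasing everywhere, hence strongly admissible.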
vig-f-GAN
With the GAN instantiation
$$f_{\mathrm{gan}}(z) := z\log z - (z+1)\log(z+1) + 2\log 2,\qquad \chi_{\mathrm{gan}}(z) := \frac{1}{\log(1 + 1/z)},$$
the GAN game (nature / discriminator / generator) satisfies:
Information theory (Nowozin et al.'16):
$$I_f(P\|Q) = \sup_\omega\left\{E_{X\sim P}[T_\omega(X)] - E_{X\sim Q_\theta}[f^\star(T_\omega(X))]\right\}$$
Information geometry:
$$\stackrel{*}{=} \sup_\omega\left\{E_{X\sim P_\rho}[T_\omega(X)] - E_{X\sim\tilde{Q}_\theta}\left[(-\log_\chi \tilde{Q}_\theta)^\star(T_\omega(X))\right]\right\} \stackrel{*}{=} \mathrm{KL}_{\chi,\tilde{Q}_\theta}(\tilde{Q}_\theta\|P_\rho) = D_C(\theta\|\rho) + J(Q_\theta)$$
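A quick numerical check of the GAN instantiation above: $f_{\mathrm{gan}}$ is convex with $f_{\mathrm{gan}}(1) = 0$ (a valid f-divergence generator), and $\chi_{\mathrm{gan}}$ is positive and non-decreasing on $\mathbb{R}_+$ (a valid signature). A sketch with variable names of our choosing.

```python
import numpy as np

def f_gan(t):
    """GAN generator of the f-divergence: convex, f(1) = 0."""
    return t * np.log(t) - (t + 1.0) * np.log(t + 1.0) + 2.0 * np.log(2.0)

def chi_gan(t):
    """GAN signature chi(z) = 1 / log(1 + 1/z): positive, non-decreasing on R+."""
    return 1.0 / np.log1p(1.0 / t)

t = np.linspace(0.05, 10.0, 2000)
fv, cv = f_gan(t), chi_gan(t)   # sample both on a grid of R+
```

Convexity of `f_gan` can be probed via non-negative second differences on the uniform grid; monotonicity of `chi_gan` via first differences.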
In short: Nowozin et al.'16 show $f\text{-GAN}(P, Q) = I_f(P\|Q)$; we show the variational information-geometric f-GAN identity $f\text{-GAN}(P, \mathrm{escort}(Q)) = D(\theta\|\vartheta) + \mathrm{Penalty}(Q)$.
$\mathrm{ReLU} = \lim_{\mu\to 1} \mu\text{-ReLU}$
Layers in the GAN
(* = conditions apply)
Theorem (A) holds for any strongly admissible $v$. The following activations are (weakly or strongly) admissible: ELU, ReLU, leaky ReLU, Softplus. See paper for more examples and details.