IIBMP2016: Representation Learning with Deep Generative Models

Representation Learning with Deep Generative Models. Okanohara (Preferred Networks, [email protected]), 2016/9/29, IIBMP2016 (5th Joint Conference on Informatics in Biology, Medicine and Pharmacology)

Transcript of IIBMP2016: Representation Learning with Deep Generative Models

  • Preferred Networks

    [email protected]

    2016/9/29 IIBMP2016

  • Deep learning research has grown rapidly:

    - Publication counts rose steeply from 2012 through 2014-2015 (on the order of 1,500 papers*).
    - Networks have also grown deeper, e.g. the 22-layer GoogLeNet [Google 2014].

    *http://memkite.com/deep-learning-bibliography/

  • A single unit (artificial neuron)

    - Inputs x1, x2, x3 and a constant +1 are combined with weights w1, w2, w3, w4:

      h = a(x1*w1 + x2*w2 + x3*w3 + w4)

    - a is a nonlinear activation function; a common choice is ReLU: a(x) = max(0, x).
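As a concrete illustration, here is a minimal NumPy sketch of the unit above; the input values x and weights w are arbitrary toy numbers, not anything taken from the slides.

```python
import numpy as np

def relu(s):
    """ReLU activation: a(s) = max(0, s)."""
    return np.maximum(0.0, s)

# Inputs x1..x3 with weights w1..w3; w4 multiplies the constant +1 (the bias).
x = np.array([0.5, -1.2, 3.0])
w = np.array([0.8, 0.1, -0.4])
w4 = 0.2

# h = a(x1*w1 + x2*w2 + x3*w3 + w4)
h = relu(x @ w + w4)
print(h)
```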

  • A neural network

    - Stacking such units layer by layer (inputs x1, x2, x3 plus a bias unit +1 in each layer) gives a multi-layer network whose final unit produces the output y.

  • Neural Networks, Manifolds, and Topology

    https://colah.github.io/posts/2014-03-NN-Manifolds-Topology/

  • 2012-2014

    - AlexNet (Krizhevsky+, 2012): ImageNet winner, 8 layers
    - GoogLeNet (Szegedy+, 2014)

  • 2015 onwards

    - Recurrent neural networks trained with BPTT (backpropagation through time), e.g. truncated to length = 3. [Figure: an RNN unrolled over inputs x_1..x_4, recurrent state h, outputs y_1..y_4; input word / output / recurrent state]
    - Ever deeper (and differently wired) networks: Stochastic Residual Net (Huang+, 2016), FractalNet (Larsson+, 2016), RoR (Zhang+, 2016), Dense CNN (Huang+, 2016).

  • Training (1/4)

    - Define a loss between the network output y and the target y*, e.g. l(y, y*) = (y - y*)^2.
    - Training means finding the parameters {wi} that minimize the total loss over the training data.

    [Figure: network with inputs x1, x2, x3 and bias units +1, whose output y is compared against the target y*]

  • Training (2/4): the chain rule

    - Question: if A is moved by 1, how much does D move?
    - Moving C by 1 moves D by 16/12, so dD/dC = 16/12.
    - Moving B by 1 moves C by 8/16, so dC/dB = 8/16 and dD/dB = (16/12)(8/16) = 8/12.
    - With dB/dA = 10/8, dD/dA = (dD/dC)*(dC/dB)*(dB/dA) = 10/12: moving A by 1 moves D by 10/12.

    [Figure: chain A -> B -> C -> D]
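A quick numerical check of the chain-rule arithmetic above, using the local derivatives stated on the slide (dB/dA = 10/8 is the value implied by the final result 10/12):

```python
# Local derivatives along the chain A -> B -> C -> D, as given on the slide.
dD_dC = 16 / 12
dC_dB = 8 / 16
dB_dA = 10 / 8   # implied by the stated result dD/dA = 10/12

dD_dB = dD_dC * dC_dB            # = 8/12
dD_dA = dD_dC * dC_dB * dB_dA    # = 10/12
print(dD_dB, dD_dA)              # 0.666..., 0.833...
```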

  • Training (3/4): backpropagation

    - The gradient of the loss with respect to every weight is obtained by applying the chain rule backwards through the network.
    - For a weight w on a connection whose input is r and whose pre-activation is s (so s = w*r), with unit output y and loss l:

      ∂l/∂w = (∂l/∂y) * (∂y/∂s) * (∂s/∂w) = (∂l/∂y) * (∂y/∂s) * r

    [Figure: network with inputs x1, x2, x3, bias +1, intermediate values r and s, output y, and loss l(y, y*)]
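A small plain-Python sketch of this single-weight gradient, assuming a squared loss and a ReLU unit; the values of r, w and y* are made up for illustration.

```python
# Forward pass for one unit: s = w*r, y = ReLU(s), loss l = (y - y_star)^2.
r, w, y_star = 2.0, 0.3, 1.5
s = w * r
y = max(0.0, s)
loss = (y - y_star) ** 2

# Backward pass, following dl/dw = (dl/dy) * (dy/ds) * (ds/dw).
dl_dy = 2 * (y - y_star)
dy_ds = 1.0 if s > 0 else 0.0    # derivative of ReLU
ds_dw = r                        # since s = w*r
dl_dw = dl_dy * dy_ds * ds_dw
print(loss, dl_dw)
```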

  • Training (4/4): gradient descent

    - For the objective L(θ), the gradient v = ∂L(θ)/∂θ points in the direction of steepest increase, so moving θ slightly along -v decreases L(θ).
    - Update rule: θ(t+1) := θ(t) - α v(t) with step size α > 0; adaptive variants such as Adam and RMSProp are widely used.
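A minimal gradient-descent sketch on a toy quadratic objective; the objective L, the step size alpha and the iteration count are arbitrary choices for illustration.

```python
import numpy as np

def L(theta):
    """Toy objective: a quadratic with its minimum at (1, -2)."""
    return (theta[0] - 1.0) ** 2 + (theta[1] + 2.0) ** 2

def grad_L(theta):
    """Gradient v = dL/dtheta."""
    return np.array([2 * (theta[0] - 1.0), 2 * (theta[1] + 2.0)])

theta = np.zeros(2)
alpha = 0.1                                  # step size > 0
for t in range(100):
    theta = theta - alpha * grad_L(theta)    # theta_{t+1} := theta_t - alpha * v_t
print(theta, L(theta))                       # converges toward (1, -2)
```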


  • Lin et al. [Lin+ 16]

    - "Why does deep and cheap learning work so well?": four arguments for why networks with comparatively few parameters can model real-world data, and why depth helps.

  • Representation learning (1/2)

    - Assume the observation x is generated from a latent representation z; for example z = (object category, position [10, 2, -4], color "white") is rendered into an image x.
    - Learning a representation means inferring the latent z behind an observation x, i.e. estimating P(z|x).

  • Representation learning (2/2)

    - [Figure: a hierarchical generative model in which a variable c and latent variables z1, z2 pass through hidden layers h to produce the observation x]
    - Generating x from such a latent description is analogous to rendering a scene in CG.

  • A generative model of observations

    - z: latent variables, c: class/attribute, x: observation.
    - x is generated from z (and c) through hidden layers h; the generative model specifies how observations arise from their latent causes.

    [Figure: z, c -> h -> h -> x]


  • In many biomedical problems (e.g. SNP data) the number of samples N is much smaller than the number of parameters/features P.


  • 1. PCA as a generative model

    - Probabilistic PCA:
      z ~ N(0, I)
      x | z ~ N(Wz + μ, σ² I)    (the mean of x is the linear map m(z) = Wz + μ)
      p(x) = ∫ p(x|z) p(z) dz
    - In the limit σ² -> 0 this reduces to ordinary PCA, so PCA can be read as a linear-Gaussian generative model.
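A short NumPy sketch of sampling from this linear-Gaussian generative model; W, mu and sigma are arbitrary toy values, not fitted parameters.

```python
import numpy as np

rng = np.random.default_rng(0)

d_z, d_x = 2, 5                       # latent and observed dimensions
W = rng.normal(size=(d_x, d_z))       # loading matrix
mu = np.zeros(d_x)                    # mean offset
sigma = 0.1                           # observation noise scale

# z ~ N(0, I);  x | z ~ N(Wz + mu, sigma^2 I)
z = rng.normal(size=d_z)
x = W @ z + mu + sigma * rng.normal(size=d_x)
print(z, x)
```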

  • 2. ICA as a generative model

    - z ~ Laplace(λ) (independent, sparse components), x | z ~ N(Wz + μ, σ² I)
      p(x) = ∫ p(x|z) p(z) dz
    - When the independent components are sparse, even k-means applied to x recovers the ICA filters W [Vinnikov+ 14].

  • Generative models for representation learning

    - The prior factorizes over independent components: p(z) = Π_i p(z_i).
    - The model defines the marginal p(x) = ∫ p(x|z) p(z) dz.
    - The goal is a disentangled z, whose components correspond to independent factors of variation (disentanglement).
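When sampling from p(z) is easy, the marginal p(x) = ∫ p(x|z) p(z) dz can be approximated by simple Monte Carlo; here is a sketch for the toy linear-Gaussian model above (W, mu, sigma and the observation x are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)

d_z, d_x = 2, 5
W = rng.normal(size=(d_x, d_z))       # toy linear-Gaussian model, as above
mu = np.zeros(d_x)
sigma = 0.5

x = rng.normal(size=d_x)              # some observation to evaluate

# p(x) = E_{z~p(z)}[p(x|z)]  ~  (1/S) * sum_s p(x | z_s),  with z_s ~ N(0, I)
S = 100_000
zs = rng.normal(size=(S, d_z))
means = zs @ W.T + mu
diff = x - means
log_lik = (-0.5 * np.sum(diff**2, axis=1) / sigma**2
           - 0.5 * d_x * np.log(2 * np.pi * sigma**2))
p_x = np.exp(log_lik).mean()
print(p_x)
```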

  • Example: a VAE mapping 784-dimensional inputs into a 2-dimensional latent space [figure].

  • The mapping from x to z can be trained end-to-end, jointly with the task, instead of hand-engineering features and feeding them to a separate classifier such as an SVM.


  • Deep generative models: the two most prominent families are the VAE and the GAN.

  • Why deep generative models?

    - In classical generative models, drawing samples x from P(x) typically requires MCMC.
    - P(x) itself is usually intractable, so one works with a surrogate such as a lower bound L(x).
    - Deep generative models make generating x cheap and direct.

  • Families of deep generative models, by how they treat P(x) and generate x:

    - VAE: optimizes a lower bound on P(x); x is generated by decoding a sampled z.
    - GAN: never evaluates P(x); it learns through the density ratio Q(x)/P(x) estimated by a discriminator.
    - Autoregressive models (PixelCNN, WaveNet): model P(x) directly, one dimension at a time.

  • VAE [Kingma+ 13]: generation

    (1) z ~ N(0, I)
    (2) (μ, σ) = Dec(z; θ)
    (3) x ~ N(μ, σ² I)

    - Dec is a neural network (the decoder) with parameters θ; the model defines p(x) = ∫ p(x|z) p(z) dz.

    [Figure: z -> Dec(z; θ) -> (μ, σ) -> x]
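A minimal sketch of this generation process with a stand-in decoder; the two-layer decoder and its random weights are purely illustrative, not a trained model.

```python
import numpy as np

rng = np.random.default_rng(0)
d_z, d_h, d_x = 2, 16, 4

# A toy decoder Dec(z; theta): one hidden layer, outputting (mu, log_sigma).
theta = {
    "W1": rng.normal(scale=0.5, size=(d_h, d_z)), "b1": np.zeros(d_h),
    "W_mu": rng.normal(scale=0.5, size=(d_x, d_h)), "b_mu": np.zeros(d_x),
    "W_ls": rng.normal(scale=0.5, size=(d_x, d_h)), "b_ls": np.zeros(d_x),
}

def decode(z, theta):
    h = np.maximum(0.0, theta["W1"] @ z + theta["b1"])   # ReLU hidden layer
    mu = theta["W_mu"] @ h + theta["b_mu"]
    sigma = np.exp(theta["W_ls"] @ h + theta["b_ls"])
    return mu, sigma

# (1) z ~ N(0, I)   (2) (mu, sigma) = Dec(z; theta)   (3) x ~ N(mu, sigma^2 I)
z = rng.normal(size=d_z)
mu, sigma = decode(z, theta)
x = mu + sigma * rng.normal(size=d_x)
print(x)
```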

  • VAE: learning

    - The likelihood p(x|z) and prior p(z) are simple, but the marginal likelihood p(x) = ∫ p(x|z) p(z) dz is intractable to compute and maximize directly.

  • VAE: ELBO (evidence lower bound)

    - Introduce an approximate posterior q(z|x) in place of the intractable true posterior p(z|x).
    - log p(x) = E_{q(z|x)}[log p(x|z)] - KL(q(z|x) || p(z)) + KL(q(z|x) || p(z|x))
               >= E_{q(z|x)}[log p(x|z)] - KL(q(z|x) || p(z)) = ELBO
    - The gap equals KL(q(z|x) || p(z|x)), so maximizing the ELBO also drives q(z|x) toward p(z|x).

  • VAE: learning (1/3)

    - Decoder (generation): (μ, σ) = Dec(z; θ), x ~ N(μ, σ² I)
    - Encoder (recognition): (μ, σ) = Enc(x; φ), z ~ N(μ, σ² I)
    - The encoder implements q(z|x), mapping an observation x to a distribution over z; both encoder and decoder are neural networks.

  • VAE: learning (2/3)

    - Encode x into z, decode z back into x', and train so that x' reconstructs x (reconstruction term) while q(z|x) stays close to the prior, measured by KL(q(z|x) || p(z)) (regularization term).

    [Figure: x -> z -> x']

  • VAE: learning (3/3)

    - With a Gaussian decoder, the reconstruction term is the negative log-likelihood, per dimension proportional to ((x - μ)/σ)².

    [Figure: x -> z -> x']
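Putting the two terms together, here is a single-data-point sketch of the (negative) ELBO for a Gaussian encoder, Gaussian decoder and standard-normal prior; enc_mu/enc_sigma and dec_mu/dec_sigma stand in for the encoder/decoder outputs and are toy values, not a trained model.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=4)                        # one observation (toy)

# Pretend outputs of Enc(x; phi) and, after sampling z, of Dec(z; theta).
enc_mu, enc_sigma = rng.normal(size=2), np.array([0.5, 0.8])
z = enc_mu + enc_sigma * rng.normal(size=2)   # reparameterized sample z ~ q(z|x)
dec_mu, dec_sigma = rng.normal(size=4), np.full(4, 0.7)

# Reconstruction term: -log N(x; dec_mu, dec_sigma^2 I),
# i.e. per-dimension ((x - mu)/sigma)^2 plus constants.
recon = 0.5 * np.sum(((x - dec_mu) / dec_sigma) ** 2
                     + np.log(2 * np.pi * dec_sigma ** 2))

# Regularization term: KL(N(enc_mu, enc_sigma^2) || N(0, I)), in closed form.
kl = 0.5 * np.sum(enc_sigma ** 2 + enc_mu ** 2 - 1.0 - np.log(enc_sigma ** 2))

neg_elbo = recon + kl      # the VAE minimizes this (single-sample estimate)
print(neg_elbo)
```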


  • VAE: samples generated while varying the latent variable z [figure].

  • VAE demo: http://vdumoulin.github.io/morphing_faces/

    - x: face images; z: 29-dimensional latent variables that can be varied interactively.

  • Conditional VAE [Kingma+ 14]

    - The label y (a digit 0-9) is given to both the encoder and the decoder together with z, so y carries the class while z is left to capture the remaining factors such as writing style.

    [Figure: x -> (z, y) -> x']

  • VAE extensions

    - Tighter bounds and more expressive posteriors, e.g. importance weighted autoencoders [Burda+ 15] and auxiliary deep generative models [Maaloe+ 16].

  • Semi-supervised learning (1/2)

    - [Kingma+ 14], MNIST digit classification (0-9).
    - Only 100-3000 of the training examples are labeled (100 labels = 10 per class); the remainder are used without labels.
    - Error rate improves from 8.10% to 3.33% (M1+M2).

  • Semi-supervised learning (2/2)

    - ADGM [Maaloe+ 16] reaches 0.96% error with only 100 labels, versus 1.4% for an SVM (RBF) trained with 50,000 labels.

  • VAE (summary)

  • GAN (Generative Adversarial Net) [Goodfellow+ 14]

    - Two networks are trained adversarially: a Generator that produces samples and a Discriminator that tries to tell real data from generated samples.
    - The Discriminator is trained to distinguish the two; the Generator is trained to fool the Discriminator.
    - At equilibrium the Discriminator can do no better than guessing ("real or generated?"), outputting 1/2.

  • GAN: generation

    (1) z ~ U(0, I)    (the prior p(z) is typically uniform or Gaussian)
    (2) x = G(z)

    [Figure: z -> G -> x]

  • GAN: learning

    - The Discriminator D(x) is trained to output 1 for real data and 0 for generated data.
    - D and G are updated alternately; G is trained so that D classifies its samples as real.
    - At the optimum the distribution of G(z) with z ~ p(z) matches the data distribution P(x), and D(x) = 1/2 everywhere.

    [Figure: z -> x' = G(z); real x and generated x' are fed to y = D(x) with targets {1 (real), 0 (generated)}]
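A sketch of the two objectives on one minibatch, with stand-in generator and discriminator functions; G, D, their weights and all shapes are toy choices, not the architecture from the slides. The generator loss below uses the common non-saturating form, -log D(G(z)); in practice the two losses are minimized alternately by gradient descent on G's and D's parameters.

```python
import numpy as np

rng = np.random.default_rng(0)
W_g = rng.normal(size=(2, 4))     # toy generator weights
w_d = rng.normal(size=4)          # toy discriminator weights

def G(z):
    """Stand-in generator G(z): maps latent z to a fake sample x'."""
    return np.tanh(z @ W_g)

def D(x):
    """Stand-in discriminator D(x): probability that x is real."""
    return 1.0 / (1.0 + np.exp(-(x @ w_d)))

# One minibatch of real data and one of generated data.
x_real = rng.normal(loc=1.0, size=(8, 4))
z = rng.uniform(-1.0, 1.0, size=(8, 2))
x_fake = G(z)

eps = 1e-8
# Discriminator objective: push D(x_real) -> 1 and D(G(z)) -> 0.
loss_D = -np.mean(np.log(D(x_real) + eps)) - np.mean(np.log(1.0 - D(x_fake) + eps))
# Generator objective (non-saturating form): push D(G(z)) -> 1.
loss_G = -np.mean(np.log(D(x_fake) + eps))
print(loss_D, loss_G)
```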

  • GAN [figure from http://www.inference.vc/an-alternative-update-rule-for-generative-adversarial-networks/]

  • GAN example: https://github.com/mattya/chainer-DCGAN


  • GAN (summary)

  • GAN: improved training techniques [Salimans+ 16]

    - Techniques such as feature matching and minibatch discrimination stabilize GAN training and also yield strong semi-supervised classification results.

  • Autoregressive models

    - Factorize the distribution directly: p(x) = Π_i p(x_i | x_1, x_2, ..., x_{i-1}).
    - Pixel RNN/CNN [Oord+ 16a][Oord+ 16b], WaveNet [Oord+ 16c].
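A toy illustration of the autoregressive factorization over binary variables, where each conditional p(x_i | x_1..x_{i-1}) is a logistic function of the previous values; the random weights w stand in for a PixelCNN/WaveNet-style network.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 8
w = rng.normal(scale=0.5, size=(n, n))   # toy conditional weights (only the lower triangle is used)

def cond_prob(i, x_prev):
    """p(x_i = 1 | x_1..x_{i-1}) as a logistic function of the previous values."""
    s = w[i, :i] @ x_prev[:i]
    return 1.0 / (1.0 + np.exp(-s))

# Sampling: draw x_1, then x_2 given x_1, and so on.
x = np.zeros(n)
for i in range(n):
    x[i] = rng.random() < cond_prob(i, x)

# Log-likelihood: log p(x) = sum_i log p(x_i | x_1..x_{i-1}).
log_px = sum(np.log(cond_prob(i, x) if x[i] == 1 else 1.0 - cond_prob(i, x))
             for i in range(n))
print(x, log_px)
```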

  • Energy-based models [Kim+ 16]

    - Each x is assigned an energy E(x), defining p_E(x) = exp(-E(x)) / N, where N is the normalization constant.
    - Sampling from p(x) normally requires MCMC; here a GAN-like generator G is trained so that its samples follow p_E(x), so generation does not need MCMC.
    - The model can both score x (via the energy) and generate samples.
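To make the definition concrete, here is a tiny example that normalizes p_E(x) = exp(-E(x)) / N exactly on a 1-D grid with a made-up energy function; exact normalization is only possible in such toy cases, which is precisely why MCMC or a learned generator is needed in general.

```python
import numpy as np

def E(x):
    """A toy double-well energy function."""
    return (x**2 - 1.0) ** 2

xs = np.linspace(-2.0, 2.0, 401)          # discretized 1-D domain
dx = xs[1] - xs[0]
unnorm = np.exp(-E(xs))
N = np.sum(unnorm) * dx                   # normalization constant (Riemann sum)
p = unnorm / N                            # p_E(x) = exp(-E(x)) / N
print(N, np.sum(p) * dx)                  # the second value is ~1.0
```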

  • Generative moment matching [Li+ 15]

    - If p(x) = q(x), then E_{p(x)}[φ(x)] = E_{q(x)}[φ(x)] for any feature map φ.
    - Given samples x_i ~ p(x) and x'_j ~ q(x), the generator is trained to minimize ((1/n) Σ_i φ(x_i) - (1/m) Σ_j φ(x'_j))² for a suitable (kernel-induced) φ.
    - Unlike a GAN, there is no discriminator and hence no min-max game.
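A small sketch of the squared moment-matching objective with an explicit feature map φ (here simply first and second moments per dimension, a toy choice; generative moment matching networks use a kernel instead of an explicit φ). The two sample sets stand in for data and model samples.

```python
import numpy as np

rng = np.random.default_rng(0)

def phi(x):
    """Toy feature map: first and second moments per dimension."""
    return np.concatenate([x, x**2], axis=-1)

# Samples from the data distribution p(x) and from the model q(x).
x_p = rng.normal(loc=0.0, scale=1.0, size=(1000, 3))
x_q = rng.normal(loc=0.5, scale=1.2, size=(800, 3))

# ((1/n) sum_i phi(x_i) - (1/m) sum_j phi(x'_j))^2, summed over feature dimensions.
diff = phi(x_p).mean(axis=0) - phi(x_q).mean(axis=0)
loss = np.sum(diff ** 2)
print(loss)
```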

  • Application: multi-task neural networks for QSAR prediction [Dahl+ 14]

  • Result:

    - [Table: AUC values for the compared settings, including a "Community Learning" condition: 0.9387, 0.9413, 0.9274, 0.8913, 0.9214]

  • microRNA binding (target) prediction: DeepTarget [Lee+ 2016]

    - Recurrent neural networks learn representations of mRNA and miRNA sequences end-to-end for target prediction.


  • References

    - [Lin+ 16] Why does deep and cheap learning work so well?, H. W. Lin, M. Tegmark
    - [Vinnikov+ 14] K-means Recovers ICA Filters when Independent Components are Sparse, ICML 2014, A. Vinnikov, S. Shalev-Shwartz
    - [Kingma+ 13] Auto-Encoding Variational Bayes, D. P. Kingma, M. Welling
    - [Kingma+ 14] Semi-supervised Learning with Deep Generative Models, D. P. Kingma, D. J. Rezende, S. Mohamed, M. Welling
    - [Burda+ 15] Importance Weighted Autoencoders, Y. Burda, R. Grosse, R. Salakhutdinov
    - [Maaloe+ 16] Auxiliary Deep Generative Models, L. Maaloe, C. K. Sonderby, S. K. Sonderby, O. Winther
    - [Goodfellow+ 14] Generative Adversarial Networks, I. J. Goodfellow et al.

  • References (continued)

    - [Salimans+ 16] Improved Techniques for Training GANs, T. Salimans, I. Goodfellow, W. Zaremba, V. Cheung, A. Radford, X. Chen
    - [Oord+ 16a] Pixel Recurrent Neural Networks, A. Oord et al.
    - [Oord+ 16b] Conditional Image Generation with PixelCNN Decoders, A. Oord et al.
    - [Oord+ 16c] WaveNet: A Generative Model for Raw Audio, A. Oord et al.
    - [Kim+ 16] Deep Directed Generative Models with Energy-Based Probability Estimation, T. Kim, Y. Bengio
    - [Li+ 15] Generative Moment Matching Networks, Y. Li, K. Swersky, R. Zemel
    - [Dahl+ 14] Multi-task Neural Networks for QSAR Predictions, G. E. Dahl, N. Jaitly, R. Salakhutdinov
    - [Lee+ 16] DeepTarget: End-to-end Learning Framework for microRNA Target Prediction using Deep Recurrent Neural Networks, B. Lee, J. Baek, S. Park, S. Yoon

  • Q&A


  • Appendix: notation

    - p(z): prior, p(x|z): likelihood, p(x, z): joint distribution, p(x): marginal likelihood (evidence), p(z|x): posterior.
    - Generation uses p(x|z); recognition/inference uses p(z|x).