Recent Developments in Deep Learning and Challenges for Business Applications

Recent Progress in Deep Learning and Challenges for Business Applications. Kenta Oono, Preferred Networks, Inc. ([email protected]). 2015/5/26, Deep Learning Forum 2015 @ Bellesalle Akihabara.

Transcript of "Recent Developments in Deep Learning and Challenges for Business Applications"

1. Title slide: Recent Progress in Deep Learning and Challenges for Business Applications. Preferred Networks, Inc. ([email protected]). 2015/5/26.
2. Dates noted: 2015/5/26 (this talk), 2015/6/4, 2015/6/10; Chainer (URL).
3. Self-introduction (@delta2323_): joined PFI in March 2012, moved to PFN in October 2014. http://delta2323.github.io
4. Notation: NN = Neural Network, DNN = Deep Neural Network, CNN = Convolutional Neural Network, RNN = Recurrent Neural Network.
5. Company overview: Preferred Infrastructure (PFI), founded 2006; Preferred Networks (PFN), founded 2014, focused on IoT.
6. IoT data volumes on the order of 1000 PB/year.
8. Sense, Organize, Analyze, Act.
9. Partners: NTT, Cisco Entrepreneurs in Residence (EIR), CiRA (Center for iPS Cell Research and Application).
10. (1/2) Deep Learning.
11. (2/2) Intel; ITpro EXPO AWARD 2014; UI.
12. AI: DNN + ...; CNN + Monte Carlo tree search (MCTS).
14. QSAR (Quantitative Structure-Activity Relationship): predicting compound activity (e.g. against CYP3A4 or hERG*) from structure, as in high-throughput screening (HTS); the Merck QSAR challenge (2012)*** was won with deep neural networks**.
    * Sanguinetti, Michael C. & Martin Tristani-Firouzi. "hERG potassium channels and cardiac arrhythmia." Nature 440, 463-469 (23 March 2006). doi:10.1038/nature04710. Fig. 5.
    ** Dahl, George E., Navdeep Jaitly, and Ruslan Salakhutdinov. "Multi-task Neural Networks for QSAR Predictions." arXiv preprint arXiv:1406.1231 (2014).
    *** http://blog.kaggle.com/2012/10/31/merck-competition-results-deep-nn-and-gpus-come-out-to-play/
15. Network setup: 2-3 hidden layers, 500-2500 units per layer, Dropout, minibatch SGD. Input is a binary compound fingerprint (e.g. 10010011010100010); output is Active/Inactive per assay (e.g. Active for assays A and B, Inactive for assay C). A single multi-task DNN is trained jointly on 19 PubChem assays, sharing the hidden layers and giving one Active/Inactive output per assay. (A NumPy sketch of this setup appears after slide 28 below.)
16. AUC comparison (Dahl et al.**, Table 2): multi-task training improves AUC on 15 of 19 assays (78.9%).
    Assay ID      AUC (baseline)   AUC (multi-task DNN)   Difference
    1851 (1a2)    0.926            0.938                  +0.012
    1851 (2c19)   0.897            0.903                  +0.006
    1851 (2c9)    0.889            0.907                  +0.018
    1851 (2d6)    0.863            0.861                  -0.002
    1851 (3a4)    0.895            0.897                  +0.002
    1915          0.756            0.752                  -0.004
    2358          0.738            0.751                  +0.013
    463213        0.651            0.676                  +0.025
    463215        0.613            0.654                  +0.041
    488912        0.664            0.816                  +0.152
    488915        0.723            0.873                  +0.150
    488917        0.835            0.894                  +0.059
    488918        0.784            0.842                  +0.058
    492992        0.803            0.829                  +0.026
    504607        0.684            0.670                  -0.014
    624504        0.871            0.889                  +0.018
    651739        0.790            0.825                  +0.035
    651744        0.886            0.900                  +0.014
    652065        0.793            0.792                  -0.001
18. Distributed deep learning with GPUs: CPU-based systems include DistBelief, Project Adam, and H2O; GPU-based systems include COTS HPC and Minerva + Parameter Server. (Diagram: nodes each containing several GPUs (G) and a CPU (C).)
19. Model parallelism vs. data parallelism: model parallelism splits a single NN across machines; data parallelism gives each machine a copy of the NN and a different part of the data; the two can be combined.
20. Combining model and data parallelism (Dean, Jeffrey, et al. "Large Scale Distributed Deep Networks." Advances in Neural Information Processing Systems. 2012; Figures 2 and 3).
21. Distillation: instead of the hard target (the label, 1/0/0), a network is trained on soft targets, the class probabilities produced by other networks, e.g. NN1: 0.80/0.10/0.10, NN2: 0.65/0.25/0.10, NN3: 0.50/0.30/0.20 for the same example.
22. Community Learning: each NN is trained on the hard target (the label) together with the soft targets produced by the other NNs; only soft targets are exchanged, which is far less data than exchanging network parameters. (Example hard target 1/0/0 with soft targets 0.80/0.65/0.50, 0.10/0.25/0.30, 0.10/0.10/0.20 and 0.90/0.89/0.80, 0.05/0.10/0.15, 0.05/0.01/0.05 from NN1-NN3.) A sketch of the mixed hard/soft-target loss appears after slide 28 below.
23. Community Learning on GPUs: PubChem data, 5 assays, up to 8 NNs exchanging soft targets; run on TSUBAME with 3 GPUs (K40) and 54 GB per node, implemented with MPI and mshadow.
24. Scaling of Community Learning (5 assays):
    NNs   Time      Speedup
    1     10.5719   -
    2      5.2267   x2.022
    3      3.9455   x2.679
    4      2.5978   x4.070
    8      1.5417   x6.857
    A further column on the slide lists 0.0318, 0.1377, 0.1284, 0.1367, 0.1281 for 1, 2, 3, 4, 8 NNs respectively.
25. AUC with Community Learning (5 assays):
    Assay ID      AUC (baseline)   AUC (multi-task DNN)   Community Learning
    1851 (1a2)    0.926            0.938                  0.9387
    1851 (2c19)   0.897            0.903                  0.9413
    1851 (2c9)    0.889            0.907                  0.9274
    1851 (2d6)    0.863            0.861                  0.8913
    1851 (3a4)    0.895            0.897                  0.9214
26. Target families: Kinase, GPCR.
27. Community Learning: summary.
28. Planned follow-up on TSUBAME (FY2015/H27): PubChem data, scaling the soft-target exchange to more than 100 NNs.
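The multi-task setup on slides 15-16 can be sketched in a few lines of NumPy: a feed-forward network takes a binary compound fingerprint, shares its hidden representation across all assays, and has one sigmoid output per assay, trained with Dropout and minibatch SGD. The fingerprint length, hidden size, dropout rate and learning rate below are illustrative assumptions; only the overall structure (shared hidden layers, per-assay Active/Inactive outputs, Dropout, minibatch SGD, 19 PubChem assays) comes from the slides, and a single hidden layer is used here for brevity where the talk used 2-3.

import numpy as np

# Illustrative sizes; the slides only specify 2-3 hidden layers of 500-2500 units.
N_BITS, N_HIDDEN, N_TASKS = 2048, 512, 19   # fingerprint bits, hidden units, PubChem assays
rng = np.random.default_rng(0)

# One shared hidden layer plus one sigmoid output per assay (the multi-task head).
W1 = rng.normal(0.0, 0.01, (N_BITS, N_HIDDEN)); b1 = np.zeros(N_HIDDEN)
W2 = rng.normal(0.0, 0.01, (N_HIDDEN, N_TASKS)); b2 = np.zeros(N_TASKS)

def forward(X, train=True, p_drop=0.5):
    h = np.maximum(X @ W1 + b1, 0.0)                        # ReLU hidden layer
    mask = (rng.random(h.shape) > p_drop) / (1.0 - p_drop) if train else 1.0
    h = h * mask                                            # inverted dropout
    y = 1.0 / (1.0 + np.exp(-(h @ W2 + b2)))                # per-assay "Active" probability
    return h, mask, y

def sgd_step(X, T, lr=0.01):
    # One minibatch SGD update; T holds 0/1 activity labels, one column per assay.
    h, mask, y = forward(X)
    g_y = (y - T) / len(X)                                  # grad of mean sigmoid cross-entropy
    gW2, gb2 = h.T @ g_y, g_y.sum(axis=0)
    g_h = (g_y @ W2.T) * mask * (h > 0)                     # back through dropout and ReLU
    gW1, gb1 = X.T @ g_h, g_h.sum(axis=0)
    for p, g in ((W1, gW1), (b1, gb1), (W2, gW2), (b2, gb2)):
        p -= lr * g                                         # in-place parameter update

# A tiny synthetic minibatch: 32 sparse random fingerprints with random labels.
X = (rng.random((32, N_BITS)) < 0.05).astype(np.float64)
T = (rng.random((32, N_TASKS)) < 0.5).astype(np.float64)
sgd_step(X, T)

Training on real data would simply repeat sgd_step over minibatches for several epochs, exactly the Epoch/Iteration loop described in the appendix (slide 47).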
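Slides 21-22 contrast hard targets (the 1/0 labels) with soft targets (the class probabilities produced by peer networks). A minimal sketch of that idea: average the peers' probability vectors and blend them with the hard target inside a cross-entropy loss. The blending weight alpha is an illustrative assumption; the slides do not give the exact loss used in Community Learning.

import numpy as np

def mixed_loss(p_student, hard_label, peer_probs, alpha=0.5):
    # Cross-entropy against a blend of the hard target and the peers' soft targets.
    # alpha weights the soft targets and is an illustrative choice, not from the slides.
    soft = np.mean(peer_probs, axis=0)                  # average the peers' class probabilities
    target = (1.0 - alpha) * hard_label + alpha * soft  # blended target distribution
    return -np.sum(target * np.log(p_student + 1e-12))  # cross-entropy

# Values from slide 21: three peer networks, three classes, class 1 is the label.
peers = [np.array([0.80, 0.10, 0.10]),
         np.array([0.65, 0.25, 0.10]),
         np.array([0.50, 0.30, 0.20])]
hard = np.array([1.0, 0.0, 0.0])
print(mixed_loss(np.array([0.70, 0.20, 0.10]), hard, peers))

Because each network only ships these small probability vectors to its peers, the communication volume stays far below what exchanging gradients or parameters would require, which is the point made on slides 22-24.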
30. The Sense / Organize / Analyze / Action stack as AWS services: Storage: S3 (06/3), EBS (08/8), Glacier (12/8); Database: SimpleDB (07/12), RDS (09/10), DynamoDB (12/1), Aurora (14/11); Analytics: EMR (09/4), Redshift (12/11), Kinesis (13/11); and Amazon ML (15/4).
31. Sense, Organize, Analyze, Action.
32. (1) ImageNet* contains roughly 14 million images; Sports-1M** roughly 1 million videos.
    * Deng, Jia, et al. "ImageNet: A large-scale hierarchical image database." CVPR 2009.
    ** Karpathy, Andrej, et al. "Large-scale video classification with convolutional neural networks." CVPR 2014.
33. (2) 1980s: LeCun's CNN (convolution and max pooling); 2006: Bengio et al. on deep NNs.
34. (3) E-commerce (EC); Jubatus*.
    * PFI's Jubatus: http://itpro.nikkeibp.co.jp/article/NEWS/20140212/536349/
35. (4)
36. GPUs.
37. Community Learning.
38. Chainer: http://chainer.org
39. Network architectures: CNN, RNN, LSTM, NTM. Examples: CNN (LeNet)*; RNN/LSTM**. LSTM = Long Short-term Memory, NTM = Neural Turing Machine. (Diagram: a feed-forward net with inputs x_1 ... x_N, hidden units h_1 ... h_H, outputs k_1 ... k_M.)
    * Deep Learning Tutorials, http://deeplearning.net/tutorial/lenet.html
    ** Graves, Alex, et al. "A Novel Connectionist System for Unconstrained Handwriting Recognition." IEEE Transactions on Pattern Analysis and Machine Intelligence 31.5 (2009): 855-868. Figure 11.
40. Networks keep getting deeper and more complex: AlexNet* (2012), 7 layers; GoogLeNet** (2014), 22 layers; plus NTM, Recursive Net, LSTM. Defining GoogLeNet takes 167 lines in Chainer versus 2058 lines in Caffe.
    * ImageNet Classification with Deep Convolutional Neural Networks, http://www.image-net.org/challenges/LSVRC/2012/supervision.pdf
    ** Szegedy, Christian, et al. "Going deeper with convolutions." arXiv preprint arXiv:1409.4842 (2014).
41. Chainer: implemented in Python + CUDA (cuDNN); supports multi-GPU training (model parallel / data parallel); developed at PFI/PFN; to be released as open-source software in June. (A short Chainer sketch appears after slide 47 below.)
42. Framework comparison:
    Chainer: Python; Preferred Networks Inc.; networks defined as Python code.
    Caffe: C++; BVLC; networks defined in a DSL (prototxt).
    Torch7: Lua (LuaJIT); Idiap Research Institute, DeepMind; networks defined in Lua scripts.
    Theano (Pylearn2): Python; Univ. of Montreal; networks defined in a DSL (YAML).
    The slide also compares RNN/LSTM support.
43. Copyright 2014- Preferred Networks. All Rights Reserved.
45. Appendix (1/3): terminology. Net = a neural net (NN); Node = a neuron/unit; Layer = a group of Nodes; computation runs Forward and Backward through the Net. (Diagram: inputs x_1 ... x_N, hidden units h_1 ... h_H and k_1 ... k_M, outputs y_1 ... y_M, targets t_1 ... t_M.)
46. Appendix (2/3): a Layer maps its input Nodes X to output Nodes Y as Y = f(WX), where W is the Layer's weight matrix and f its activation function.
47. Appendix (3/3): training runs over Epochs 1 ... N; within Epoch i the training set is split into minibatches 1 ... M of B samples each; processing one minibatch j is one Iteration, in which the Solver updates the Net once.
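One point behind the Chainer/Caffe comparison on slides 40-42 is that Chainer describes a network as ordinary Python code instead of a configuration DSL such as prototxt. The sketch below shows what that looks like, written against the Chain/Link API that Chainer standardized on after its initial release (the interface of the June 2015 version differed in detail); the layer sizes, SGD settings and random minibatch are illustrative only.

import numpy as np
import chainer
import chainer.functions as F
import chainer.links as L
from chainer import optimizers

class MLP(chainer.Chain):
    # A small feed-forward network defined directly in Python.
    def __init__(self, n_in, n_hidden, n_out):
        super(MLP, self).__init__()
        with self.init_scope():
            self.l1 = L.Linear(n_in, n_hidden)
            self.l2 = L.Linear(n_hidden, n_out)

    def __call__(self, x):
        h = F.relu(self.l1(x))
        return self.l2(h)

model = MLP(784, 100, 10)
optimizer = optimizers.SGD(lr=0.01)
optimizer.setup(model)

# One training step on a random minibatch (32 samples, 10 classes).
x = np.random.rand(32, 784).astype(np.float32)
t = np.random.randint(0, 10, size=32).astype(np.int32)

loss = F.softmax_cross_entropy(model(x), t)
model.cleargrads()
loss.backward()
optimizer.update()

Because the forward pass is plain Python, control flow such as loops and conditionals can build the computation graph on the fly, which is what keeps definitions of large networks like GoogLeNet compact compared with a static prototxt description.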