A Powerful, Flexible, and Intui5ve Deep Learning Framework€¦ · Toyota FANUC Panasonic NTT ......
Transcript of A Powerful, Flexible, and Intui5ve Deep Learning Framework€¦ · Toyota FANUC Panasonic NTT ......
![Page 1: A Powerful, Flexible, and Intui5ve Deep Learning Framework€¦ · Toyota FANUC Panasonic NTT ... Computational graph construction is the key 1. Construct a computaonal graph ...](https://reader033.fdocuments.net/reader033/viewer/2022042921/5f6bca6a724bfb6f2b093d58/html5/thumbnails/1.jpg)
APowerful,Flexible,andIntui5veDeepLearningFramework�
@NVIDIAGTC,April6th,2016 �
ShoheiHido
ChiefResearchOfficer
PreferredNetworks,Inc.
![Page 2: A Powerful, Flexible, and Intui5ve Deep Learning Framework€¦ · Toyota FANUC Panasonic NTT ... Computational graph construction is the key 1. Construct a computaonal graph ...](https://reader033.fdocuments.net/reader033/viewer/2022042921/5f6bca6a724bfb6f2b093d58/html5/thumbnails/2.jpg)
Overview
l ChainerisaPython-baseddeeplearningframework
l Chainerv1.0wasreleasedasanopensourceonJune2015
l ItDOESN’TrelyonTheano,unlikeotherPythonframeworks
l ChainerusesauniqueschemenamedDefine-by-Run
http://chainer.org/
l WhydouserssOllneedanotherframework?
l HowdifferentandeffecOveChaineris?
2
![Page 3: A Powerful, Flexible, and Intui5ve Deep Learning Framework€¦ · Toyota FANUC Panasonic NTT ... Computational graph construction is the key 1. Construct a computaonal graph ...](https://reader033.fdocuments.net/reader033/viewer/2022042921/5f6bca6a724bfb6f2b093d58/html5/thumbnails/3.jpg)
Preferred Networks (PFN) A startup that applies deep learning to industrial IoT
l Founded: March 2014
l Headquarter: Tokyo, Japan
l U.S. Subsidiary: San Mateo, California
l Company size: 35 engineers & researchers
l Investors: Toyota, FANUC, NTT
Deep learning Industrial IoT
3
Manufacturing
Automotive
Healthcare
![Page 4: A Powerful, Flexible, and Intui5ve Deep Learning Framework€¦ · Toyota FANUC Panasonic NTT ... Computational graph construction is the key 1. Construct a computaonal graph ...](https://reader033.fdocuments.net/reader033/viewer/2022042921/5f6bca6a724bfb6f2b093d58/html5/thumbnails/4.jpg)
Partnering with world-leading companies using Chainer
l R&DcollaboraOononindustrialproblemswithreal-worlddata Specificrequirements,modifiedalgorithms,manytrialsanderrors,etc
Differentfrommakinggeneral-purposerecogniOonsystem
4
Toyota FANUC
Panasonic
NTT
Cisco NVIDIA
![Page 5: A Powerful, Flexible, and Intui5ve Deep Learning Framework€¦ · Toyota FANUC Panasonic NTT ... Computational graph construction is the key 1. Construct a computaonal graph ...](https://reader033.fdocuments.net/reader033/viewer/2022042921/5f6bca6a724bfb6f2b093d58/html5/thumbnails/5.jpg)
Two types of background behind DL frameworks
1.Scalability-oriented
l Use-casesinmind
Image/speechrecogniOonsystem
FastDLasaserviceincloud
l Problemtype
AfewgeneralapplicaOons
10+milliontrainingsamples
10+nodesclusterw/fastnetwork
l PossibleboZleneck
Tuningofwell-knownalgorithms
DistributedcomputaOonformodel/data-paralleltraining
2.Flexibility-oriented
l Use-casesinmind
Algorithmresearch
R&Dprojectsfornewproducts
l Problemtype
VariousspecificapplicaOons
10+ktrainingsamples
1nodewithmulOpleGPUs
l PossibleboZleneck
Trial-and-errorinprototyping
Debugging,profiling&refactoring
(waitOmeduringcompilaOon)
![Page 6: A Powerful, Flexible, and Intui5ve Deep Learning Framework€¦ · Toyota FANUC Panasonic NTT ... Computational graph construction is the key 1. Construct a computaonal graph ...](https://reader033.fdocuments.net/reader033/viewer/2022042921/5f6bca6a724bfb6f2b093d58/html5/thumbnails/6.jpg)
Designed for efficient research & development
l Flexible:newkindsofcomplexmodelsforvariousapplicaOons
l IntuiOve:rapidprototypingandefficienttrial-and-error
l Powerful:comparableperformancefor1node&mulO-GPUs
6
Scalability-oriented Flexibility-oriented
![Page 7: A Powerful, Flexible, and Intui5ve Deep Learning Framework€¦ · Toyota FANUC Panasonic NTT ... Computational graph construction is the key 1. Construct a computaonal graph ...](https://reader033.fdocuments.net/reader033/viewer/2022042921/5f6bca6a724bfb6f2b093d58/html5/thumbnails/7.jpg)
Agenda
l Deeplearningframeworkbasics
l IntroducOontoChainer
l CuPy:NumPy-compaObleGPUlibrary
l PerformanceandapplicaOons
7
![Page 8: A Powerful, Flexible, and Intui5ve Deep Learning Framework€¦ · Toyota FANUC Panasonic NTT ... Computational graph construction is the key 1. Construct a computaonal graph ...](https://reader033.fdocuments.net/reader033/viewer/2022042921/5f6bca6a724bfb6f2b093d58/html5/thumbnails/8.jpg)
Neural network and computation
x1
xN
・・
h1
hH
・・・・
kM
k1
yM
y1
Forward computation
Backward computation (backpropagation)
・・
・・
Input Hidden units OutputText
Image
Sensor
Object: Tulip
Anomaly score: 0.35
Category: Sports
・・
・・
・・
8
![Page 9: A Powerful, Flexible, and Intui5ve Deep Learning Framework€¦ · Toyota FANUC Panasonic NTT ... Computational graph construction is the key 1. Construct a computaonal graph ...](https://reader033.fdocuments.net/reader033/viewer/2022042921/5f6bca6a724bfb6f2b093d58/html5/thumbnails/9.jpg)
Chainer focuses on network representation/training
l Designchoicesfordeeplearningframeworks
Howtobuildneuralnetworks?
Howtotrainneuralnetworks?
Whichtextformat/languageformodeling?
WhichlanguageforcompuOng?
RunwithGPU?
RunonmulOpleGPUs?
RunonmulOplecomputenodes?
9
![Page 10: A Powerful, Flexible, and Intui5ve Deep Learning Framework€¦ · Toyota FANUC Panasonic NTT ... Computational graph construction is the key 1. Construct a computaonal graph ...](https://reader033.fdocuments.net/reader033/viewer/2022042921/5f6bca6a724bfb6f2b093d58/html5/thumbnails/10.jpg)
Building and training neural networks: Computational graph construction is the key
1. ConstructacomputaOonalgraph
BasedonnetworkdefiniOongivenbyusers
ChainsoffuncOonsandoperaOonsoninputvariables
2. Computelossandgradients
ForwardcomputaOontocalculatelossforaminibatch
BackpropagaOongivesgradientstoallofparameters
3. OpOmizemodel
Updateeachparameterwiththegradient
RepeatunOlconvergence
Step 1. is the most important and there are many approaches
10
![Page 11: A Powerful, Flexible, and Intui5ve Deep Learning Framework€¦ · Toyota FANUC Panasonic NTT ... Computational graph construction is the key 1. Construct a computaonal graph ...](https://reader033.fdocuments.net/reader033/viewer/2022042921/5f6bca6a724bfb6f2b093d58/html5/thumbnails/11.jpg)
Building blocks
l ThesefuncOonaliOesareverysimilarbetweenframeworks
l Butthestructure,abstracOonlevel,andinterfacearedifferent
l Itcomestothedesignofdomain-specificlanguageforNN
Array data structure (vector/matrix/tensor)
Operations & functions
Network (computational graph)
Optimizer (SGD/AdaGrad/Adam)
11
![Page 12: A Powerful, Flexible, and Intui5ve Deep Learning Framework€¦ · Toyota FANUC Panasonic NTT ... Computational graph construction is the key 1. Construct a computaonal graph ...](https://reader033.fdocuments.net/reader033/viewer/2022042921/5f6bca6a724bfb6f2b093d58/html5/thumbnails/12.jpg)
Types of domain-specific language for neural networks
l TextDSL Ex.Caffe(prototxt)
Ex.CNTK(NDL)
l Symbolicprogram OperaOons
onsymbols
Ex.Theano
Ex.TensorFlow
l ImperaOveprogram DirectcomputaOons
onrawdataarrays
Ex.Torch.nn
Ex.Chainer
#SymbolicdefiniOonA=Variable(‘A’)B=Variable(‘B’)C=B*AD=C+Constant(1)#Compilef=compile(D)d=f(A=np.ones(10),B=np.ones(10)*2)
#ImperaOvedeclaraOona=np.ones(10)b=np.ones(10)*2c=b*ad=c+1
%%DefiniOonintextf:{“A”:“Variable”,“B”:“Variable”,“C”:[“B”,“*”,“A”],“ret”:[“C”,“+”,1]}
#Compilef=compile(“f.txt”)d=f(A=np.ones(10),B=np.ones(10)*2)
12
Ex. MXNet
![Page 13: A Powerful, Flexible, and Intui5ve Deep Learning Framework€¦ · Toyota FANUC Panasonic NTT ... Computational graph construction is the key 1. Construct a computaonal graph ...](https://reader033.fdocuments.net/reader033/viewer/2022042921/5f6bca6a724bfb6f2b093d58/html5/thumbnails/13.jpg)
Comparison of DSL type
DSLtype Pros. Cons.
TextDSL
• Human-readabledefiniOon• Non-programmercaneasily
editthenetwork
• Usersmuststudytheformat• Formatmighthavetobe
extendedfornewalgorithms
InternalDSL Symbolic
• StaOcanalysisatcompile• OpOmizaOonbeforetraining• Easytoparallelize
• Usersmuststudyspecialsyntax• Mayneedmoreeffortsto
implementnewalgorithms
ImperaOve
• Lesseffortstolearnsyntax• Easydebuggingandprofiling• Suitablefornewalgorithms
withcomplexlogic
• HardtoopOmizeinadvance• Lessefficientinmemory
allocaOonandparallelizaOon
ChainerisattheextremeendofimperaOveprogramforhighflexibility
13
![Page 14: A Powerful, Flexible, and Intui5ve Deep Learning Framework€¦ · Toyota FANUC Panasonic NTT ... Computational graph construction is the key 1. Construct a computaonal graph ...](https://reader033.fdocuments.net/reader033/viewer/2022042921/5f6bca6a724bfb6f2b093d58/html5/thumbnails/14.jpg)
Agenda
l Deeplearningframeworkbasics
l IntroducOontoChainer
l CuPy:NumPy-compaObleGPUlibrary
l PerformanceandapplicaOons
14
![Page 15: A Powerful, Flexible, and Intui5ve Deep Learning Framework€¦ · Toyota FANUC Panasonic NTT ... Computational graph construction is the key 1. Construct a computaonal graph ...](https://reader033.fdocuments.net/reader033/viewer/2022042921/5f6bca6a724bfb6f2b093d58/html5/thumbnails/15.jpg)
Chainer as an open-source project
l hZps://github.com/pfnet/chainerl 50contributors
l 1,277stars&255fork
l 3,708commits
l AcOvedevelopment&releaseforlast10months v1.0.0(June2015)tov1.7.2(March2016)
15
Original developerSeiya Tokui
![Page 16: A Powerful, Flexible, and Intui5ve Deep Learning Framework€¦ · Toyota FANUC Panasonic NTT ... Computational graph construction is the key 1. Construct a computaonal graph ...](https://reader033.fdocuments.net/reader033/viewer/2022042921/5f6bca6a724bfb6f2b093d58/html5/thumbnails/16.jpg)
CuPy
Chainer software stack
CPU NVIDIAGPU
CUDA
cuDNN
BLAS
NumPy
Chainer
l ChainerisbuiltontopofNumPyandCUDA
l CuPyisalsointroducedasanequivalentofNumPyonGPU
16
![Page 17: A Powerful, Flexible, and Intui5ve Deep Learning Framework€¦ · Toyota FANUC Panasonic NTT ... Computational graph construction is the key 1. Construct a computaonal graph ...](https://reader033.fdocuments.net/reader033/viewer/2022042921/5f6bca6a724bfb6f2b093d58/html5/thumbnails/17.jpg)
Run
Define
Graph build scheme (1/2) - Define-and-Run: Most of frameworks use this scheme (Chainer does not)
l Define:buildacomputaOonalgraphbasedondefiniOon
l Run:updatethemodel(parameters)usingtrainingdataset
NetworkdefiniOon
ComputaOonalgraph
GradientfuncOon
Parameters
ComputaOonalgraph
GradientfuncOon
Parameters
Trainingdata
Update
Loss&gradient
AutodifferenOaOon
17
![Page 18: A Powerful, Flexible, and Intui5ve Deep Learning Framework€¦ · Toyota FANUC Panasonic NTT ... Computational graph construction is the key 1. Construct a computaonal graph ...](https://reader033.fdocuments.net/reader033/viewer/2022042921/5f6bca6a724bfb6f2b093d58/html5/thumbnails/18.jpg)
Define-by-Run
Graph build scheme (2/2) - Define-by-Run: Computational graph construction on the fly
l Nographisconstructedbeforetraining
l Instead,thegraphisbuiltateachforwardcomputaOon
l ComputaOonalgraphcanbemodifieddynamicallyforeachiteraOon/sampleordependingonsomecondiOons
ModeldefiniOon
ComputaOonalgraph
GradientfuncOon
Parameters
Trainingdata
Update
DynamicchangeCondiOons
18
![Page 19: A Powerful, Flexible, and Intui5ve Deep Learning Framework€¦ · Toyota FANUC Panasonic NTT ... Computational graph construction is the key 1. Construct a computaonal graph ...](https://reader033.fdocuments.net/reader033/viewer/2022042921/5f6bca6a724bfb6f2b093d58/html5/thumbnails/19.jpg)
Define-by-Run example: MLP for MNIST
l OnlytransformaOonsbetweenunitsaresetbeforetraining
l ConnecOonisgivenasforwardcomputaOon
l1 = Linear(784, n_units) l2 = Linear(n_units, 10))
Linear l2
Linear l1 x yh1
W bias
059
W bias
ReLU
def forward(x): h1 = ReLU(l1(x)) return l2(h1)
19
![Page 20: A Powerful, Flexible, and Intui5ve Deep Learning Framework€¦ · Toyota FANUC Panasonic NTT ... Computational graph construction is the key 1. Construct a computaonal graph ...](https://reader033.fdocuments.net/reader033/viewer/2022042921/5f6bca6a724bfb6f2b093d58/html5/thumbnails/20.jpg)
Define-by-Run: An interpreted language for neural network
l Idea ForwardcomputaOonactuallygoesthroughcomputaOonalgraph
Byrememberingthehistory,theactualgraphcanbeobtained
l Advantage Flexibilityfornewalgorithmswithcomplexcomponents
u Ex.recurrent,recursive,aZenOon,memory,adversarial,etc
IntuiOvecodingwithhighlyimperaOvenetworkdefiniOon
u Ex.stochasOcnetworkofwhichgraphchangesforeachiteraOon
l Currentdrawbacks GraphisgeneratedeveryOmealsoforfixednetworks
NoopOmizaOonevenforstaOcpartofgraphs
u JIT-likeanalysisandsubgraphcachemightbeuseful20
![Page 21: A Powerful, Flexible, and Intui5ve Deep Learning Framework€¦ · Toyota FANUC Panasonic NTT ... Computational graph construction is the key 1. Construct a computaonal graph ...](https://reader033.fdocuments.net/reader033/viewer/2022042921/5f6bca6a724bfb6f2b093d58/html5/thumbnails/21.jpg)
Basic components (1/2): Variable and Function
l Variable Variablewrapsarrays(.data)
ItremembersparentfuncOon(.creator)
Itwillbeassignedgradient(.grad)
ItkeepstrackofnotonlydatabutalsocomputaOons
l FuncOon TransformaOonbetweenVariable
Stateless
e.g.sigmoid,tanh,ReLU,maxpooling,dropout
Function
x y
Variable
x yh1059
21
![Page 22: A Powerful, Flexible, and Intui5ve Deep Learning Framework€¦ · Toyota FANUC Panasonic NTT ... Computational graph construction is the key 1. Construct a computaonal graph ...](https://reader033.fdocuments.net/reader033/viewer/2022042921/5f6bca6a724bfb6f2b093d58/html5/thumbnails/22.jpg)
Chain (MLP2)
Basic components (2/2): Link and Chain
l Link=funcOonwithstate ParametersarealsoVariable
andgradientswillbeassigned
e.g.Linear(fully-connected),LSTMConvoluOon2d,word-embedding
l Chain=network ChainhasasetofchildLink
ForwardcomputaOonisdefinedin.__call__()
e.g.MLP2,AlexNet,GoogLeNet,RNNLM,seq2seq,
Link (Linear)
y=f(W*x+b)
x y
W b
Linear l2
Linear l1 �� y�h1 �
W� bias�
W� bias�
ReLU
22
![Page 23: A Powerful, Flexible, and Intui5ve Deep Learning Framework€¦ · Toyota FANUC Panasonic NTT ... Computational graph construction is the key 1. Construct a computaonal graph ...](https://reader033.fdocuments.net/reader033/viewer/2022042921/5f6bca6a724bfb6f2b093d58/html5/thumbnails/23.jpg)
Backpropagation through computational graph
l ConsideranobjecOve(Link.Linear):L = f(x * w + b)
l ThiscomputesthevalueofLinforwardcomputaOon,andsimultaneouslybuildsthefollowingcomputaOonalgraph
l ThegradientofLcanbecomputedwithrespecttoanyvariablesbybackpropagaOon
l ThentheopOmizerupdatesthevalueofparameters
* x
W
+
b
f L
isVariableisFuncOon
23
![Page 24: A Powerful, Flexible, and Intui5ve Deep Learning Framework€¦ · Toyota FANUC Panasonic NTT ... Computational graph construction is the key 1. Construct a computaonal graph ...](https://reader033.fdocuments.net/reader033/viewer/2022042921/5f6bca6a724bfb6f2b093d58/html5/thumbnails/24.jpg)
Code sample (1/4): Multi-layer perceptron
class MLP2(Chain): def __init__(self): super(MLP2, self).__init__( l1=L.Linear(784, 100), l2=L.Linear(100, 10), ) def __call__(self, x): h1 = F.relu(self.l1(x)) y = self.l2(h1) return y class Classifier(Chain): def __init__(self, predictor): super(Classifier, self). __init__(predictor=predictor) def __call__(self, x, t): y = self.predictor(x) self.accuracy = F.accuracy(y, t) self.loss = F.softmax_cross_entropy(y, t) return self.loss, self.accuracy
# Model and optimizer setup model = Classifier(MLP2()) optimizer = optimizers.SGD() optimizer.setup(model) # training loop with minibatch for i in range(0, datasize, batchsize): x = Variable(x_tr[i:i+batchsize]) t = Variable(y_tr[i:i+batchsize]) model.zerograds() loss, acc = model(x, t) loss.backward() optimizer.update()
Chain (MLP2)
Linear l2
Linear l1 �� y�h1 �
W� bias�
W� bias�
ReLU
24
![Page 25: A Powerful, Flexible, and Intui5ve Deep Learning Framework€¦ · Toyota FANUC Panasonic NTT ... Computational graph construction is the key 1. Construct a computaonal graph ...](https://reader033.fdocuments.net/reader033/viewer/2022042921/5f6bca6a724bfb6f2b093d58/html5/thumbnails/25.jpg)
Code sample (2/4): Convolutional neural networkclass AlexNet(Chain): def __init__(self): super(AlexNet, self).__init__( conv1=L.Convolution2D(3, 96, 11, stride=4), conv2=L.Convolution2D(96, 256, 5, pad=2), conv3=L.Convolution2D(256, 384, 3, pad=1), conv4=L.Convolution2D(384, 384, 3, pad=1), conv5=L.Convolution2D(384, 256, 3, pad=1), fc6=L.Linear(9216, 4096), fc7=L.Linear(4096, 4096), fc8=L.Linear(4096, 1000), ) def __call__(self, x, t): h = F.max_pooling_2d(F.relu( F.local_response_normalization(self.conv1(x))), 3, stride=2) h = F.max_pooling_2d(F.relu( F.local_response_normalization(self.conv2(h))), 3, stride=2) h = F.relu(self.conv3(h)) h = F.relu(self.conv4(h)) h = F.max_pooling_2d(F.relu(self.conv5(h)), 3, stride=2) h = F.dropout(F.relu(self.fc6(h)), train=self.train) h = F.dropout(F.relu(self.fc7(h)), train=self.train) y = self.fc8(h) return y
* ImageNet Classification with Deep Convolutional Neural Networks http://www.image-net.org/challenges/LSVRC/2012/supervision.pdf
conv2d
conv2d
conv2d
conv2d
conv2d
linear
linear
25
linear
![Page 26: A Powerful, Flexible, and Intui5ve Deep Learning Framework€¦ · Toyota FANUC Panasonic NTT ... Computational graph construction is the key 1. Construct a computaonal graph ...](https://reader033.fdocuments.net/reader033/viewer/2022042921/5f6bca6a724bfb6f2b093d58/html5/thumbnails/26.jpg)
Code sample (3/4): Recurrent neural network
class SimpleRNN(Chain): def __init__(self, n_vocab, n_units): super(SimpleRNN, self).__init__( embed=L.EmbedID(n_vocab, n_units) x2h=L.Linear(n_units, n_units), h2h=L.Linear(n_units, n_units), h2y=L.Linear(n_units, n_vocab),) self.h = None def __call__(self, x): y, h_new = self.fwd_one_step(x, self.h) self.h = h_new return y def fwd_one_step(self, x, h): x = F.tanh(self.embed(x)) if h is None: h = F.tanh(self.x2h(x)) else: h = F.tanh(self.x2h(x) + self.h2h(h)) y = F.softmax(self.h2y(h)) return y, h
x_1 h y_1
x_2 h y_2
x_3 h y_3
x_4 h y_4
BPTTlength=3
Inputword OutputRecurrentstate
# Truncated BPTT (length=3) for i in range(0, datasize, batchsize): ... accum_loss += model(x, t) if i % bptt_length == 0: model.zerograds() accum_loss.backward() accum_loss.unchain_backward() optimizer.update()
26
![Page 27: A Powerful, Flexible, and Intui5ve Deep Learning Framework€¦ · Toyota FANUC Panasonic NTT ... Computational graph construction is the key 1. Construct a computaonal graph ...](https://reader033.fdocuments.net/reader033/viewer/2022042921/5f6bca6a724bfb6f2b093d58/html5/thumbnails/27.jpg)
Code sample (4/4): Deep Networks with Stochastic Depth A paper published on arXiv, March 30, 2016
l AvariantofResidualNetthatskipsconnecOonsstochasOcally OutperformedtheoriginalResidualNet(ImageNet2015winner,MSR)
StochasOcskip:
Taken from http://arxiv.org/abs/1603.09382v2G. Huang et al.
# Mock code in Chainer
class StochasticResNet(Chain):
def __init__(self, prob, size, …):
super(StochasticResNet, size, …).__init__(
## Define f[i] as same for Residual Net )
self.p = prob # Survival probabilities
def __call__(self, h):
for i in range(self.size): b = numpy.random.binomial(1, self.p[i])
c = self.f[i](h) + h if b == 1 else h
h = F.relu(c)
return h
w/ survival probability:
27
![Page 28: A Powerful, Flexible, and Intui5ve Deep Learning Framework€¦ · Toyota FANUC Panasonic NTT ... Computational graph construction is the key 1. Construct a computaonal graph ...](https://reader033.fdocuments.net/reader033/viewer/2022042921/5f6bca6a724bfb6f2b093d58/html5/thumbnails/28.jpg)
Miscellaneous
l Otherfeatures Installwithpipinoneline:
MulO-GPUsupportbyexplicitlyselecOngtheIDtouse
Pre-trainedCaffemodelimportfromModelZoo
ModelserializaOon&save&load:HDF5orNumPynpz
l FuturedirecOon(notonlyforChainer) JIT-likeopOmizaOonduringDefine-by-Run
MemoryconsumpOonreducOon(GPUmemoryissOllsmall)
Handlingvariable-lengthinputswithoutminibatch
MaximizingperformanceonmulO-node&mulO-GPUenvironment
$ pip install chainer
28
![Page 29: A Powerful, Flexible, and Intui5ve Deep Learning Framework€¦ · Toyota FANUC Panasonic NTT ... Computational graph construction is the key 1. Construct a computaonal graph ...](https://reader033.fdocuments.net/reader033/viewer/2022042921/5f6bca6a724bfb6f2b093d58/html5/thumbnails/29.jpg)
Agenda
l Deeplearningframeworkbasics
l IntroducOontoChainer
l CuPy:NumPy-compaObleGPUlibrary
l PerformanceandapplicaOons
29
![Page 30: A Powerful, Flexible, and Intui5ve Deep Learning Framework€¦ · Toyota FANUC Panasonic NTT ... Computational graph construction is the key 1. Construct a computaonal graph ...](https://reader033.fdocuments.net/reader033/viewer/2022042921/5f6bca6a724bfb6f2b093d58/html5/thumbnails/30.jpg)
CuPy: (partially-)NumPy-compatible GPU library
l MoOvaOon:NumPy+CUDA=CuPy NumPyisthestandardlibraryinPythonfornumericalcomputaOon
CUDAisthestandardAPIsforusingGPUforhigh-performance
Unfortunately,NumPydoesNOTworkwithCUDA
l CuPysupports: FastcomputaOonusingNVIDIA’scuBLASandcuDNN
Arrayindexing,slicing,transpose,andreshape
MostofoperaOons/funcOonsinNumPy
u Chainerv1.7.2alreadysupportsmorethan170funcOons
User-definedfuncOonsandkernels
alldtypes,broadcasOng,memorypool,etc30
![Page 31: A Powerful, Flexible, and Intui5ve Deep Learning Framework€¦ · Toyota FANUC Panasonic NTT ... Computational graph construction is the key 1. Construct a computaonal graph ...](https://reader033.fdocuments.net/reader033/viewer/2022042921/5f6bca6a724bfb6f2b093d58/html5/thumbnails/31.jpg)
How to use CuPy
l UsageofCuPy:justreplaceNumPy withCuPy
l Conversionbetweennumpy.ndarrayandcupy.ndarray
l Ex.CPU/GPU-agnosOclogsumexpfuncOon def logsumexp(x, axis=None): xp = cuda.get_array_module(x) #Get CuPy or NumPy x_max = x.max(axis) exp_sum = xp.exp(x - x_max).sum(axis) return x_max + xp.log(exp_sum)
import numpy, cupy enable_cupy = True xp = cupy if enable_cupy else numpy
w_c = cupy.asarray(numpy.ones(10)) # cupy.ndarray w_n = cupy.asnumpy(cupy.ones(10)) # numpy.ndarray
31
![Page 32: A Powerful, Flexible, and Intui5ve Deep Learning Framework€¦ · Toyota FANUC Panasonic NTT ... Computational graph construction is the key 1. Construct a computaonal graph ...](https://reader033.fdocuments.net/reader033/viewer/2022042921/5f6bca6a724bfb6f2b093d58/html5/thumbnails/32.jpg)
CuPy implementation: Optimized for performance & NumPy-compatibility
l UseCythonforcupy.core&cupy.cuda
l DynamiccodegeneraOon&compile CUDAcodeisgeneratedforspecifictensordimension&datatype
On-the-flycompilebynvccandbinarycache(fasterawer1stuse)
CUDAlibraries(cuBLAS,cuRAND,cuDNN)�
ndarray�
ufunc,elementwise,reduc5on
CUDAPythonwrapper � cupy.cuda
cupy.core
Tensoropera5ons&func5ons � cupy
32
![Page 33: A Powerful, Flexible, and Intui5ve Deep Learning Framework€¦ · Toyota FANUC Panasonic NTT ... Computational graph construction is the key 1. Construct a computaonal graph ...](https://reader033.fdocuments.net/reader033/viewer/2022042921/5f6bca6a724bfb6f2b093d58/html5/thumbnails/33.jpg)
CuPy performance on linear algebra: 5 to 25 times faster than NumPy
def test(xp): a = xp.arange(1000000).reshape(1000, -1) return a.T * 2 test(numpy) t1 = datetime.datetime.now() for i in range(1000): test(numpy) t2 = datetime.datetime.now() print(t2 -t1) test(cupy) t1 = datetime.datetime.now() for i in range(1000): test(cupy) t2 = datetime.datetime.now() print(t2 -t1)
msec � speedup�
NumPy 2,929 1.0CuPy 585 5.0CuPy+MemoryPool
123 23.8
[email protected],32GB,GeForceGTX970
33
![Page 34: A Powerful, Flexible, and Intui5ve Deep Learning Framework€¦ · Toyota FANUC Panasonic NTT ... Computational graph construction is the key 1. Construct a computaonal graph ...](https://reader033.fdocuments.net/reader033/viewer/2022042921/5f6bca6a724bfb6f2b093d58/html5/thumbnails/34.jpg)
Use CuPy for GPU-based computation
l SupportthreepaZernsaswrappers ElementwiseKernel:forelement-wisecomputaOon
ReducOonKernel:forreduceoperaOonalongaxis
ufunc:universalfuncOonasinNumpy
l Ex.definiOonofanelement-wisefuncOon
l Usage(automaOcbroadcastandtypecheckaresupported)
squared_diff = cupy.ElementwiseKernel( ‘float32 x, float32 y’, # Input
‘float32 z’, # Output
‘z = (x - y) * (x - y)’, # Operation
‘squared_diff’) # Name
squared_diff(cupy.arange(10), 10)
34
![Page 35: A Powerful, Flexible, and Intui5ve Deep Learning Framework€¦ · Toyota FANUC Panasonic NTT ... Computational graph construction is the key 1. Construct a computaonal graph ...](https://reader033.fdocuments.net/reader033/viewer/2022042921/5f6bca6a724bfb6f2b093d58/html5/thumbnails/35.jpg)
Agenda
l Deeplearningframeworkbasics
l IntroducOontoChainer
l CuPy:NumPy-compaObleGPUlibrary
l PerformanceandapplicaOons
35
![Page 36: A Powerful, Flexible, and Intui5ve Deep Learning Framework€¦ · Toyota FANUC Panasonic NTT ... Computational graph construction is the key 1. Construct a computaonal graph ...](https://reader033.fdocuments.net/reader033/viewer/2022042921/5f6bca6a724bfb6f2b093d58/html5/thumbnails/36.jpg)
Public benchmark results (CNN): Chainer shows comparable performance
l ForwardcomputaOonisalmostthesamewithTensorFlow
l TrainingwithbackwardcomputaOonisslower,butitcanbeoffsetbynocompilaOonOmewhiledebugging/tuning
0
200
400
600
800
1000
1200
AlexNet GoogLeNet VGG-A OverFeat
TorchTensorFlowChainerCaffe(naCve)
0
200
400
600
800
1000
1200
AlexNet GoogLeNet VGG-A OverFeat
TorchTensorFlowChainerCaffe(naCve)
Forward computation (msec) Backward computation (msec)
Taken from https://github.com/soumith/convnet-benchmarks, using cuDNN except Caffe 36
![Page 37: A Powerful, Flexible, and Intui5ve Deep Learning Framework€¦ · Toyota FANUC Panasonic NTT ... Computational graph construction is the key 1. Construct a computaonal graph ...](https://reader033.fdocuments.net/reader033/viewer/2022042921/5f6bca6a724bfb6f2b093d58/html5/thumbnails/37.jpg)
Chainer can benefit from latest CUDA libraries: Ex. Winograd algorithm in cuDNN v5
l Conv3x3iscommoninCNNs&nowcomputedwithWinograd
l State-of-the-artCNNmodels(e.g.,GoogLeNet,VGG-A)canbeacceleratedupto2.0xattestOme(forwardonly)
0
100
200
300
400
500
600
AlexNet GoogLeNet VGG-A OverFeat
cuDNNv4cuDNNv5
0
100
200
300
400
500
600
AlexNet GoogLeNet VGG-A OverFeat
cuDNNv4cuDNNv5
Forward computation (msec) Backward computation (msec)
Independently measured by a modified version of soumith/convnet-benchmarkscuDNN v5 can be used in Chainer v1.8.0 37
![Page 38: A Powerful, Flexible, and Intui5ve Deep Learning Framework€¦ · Toyota FANUC Panasonic NTT ... Computational graph construction is the key 1. Construct a computaonal graph ...](https://reader033.fdocuments.net/reader033/viewer/2022042921/5f6bca6a724bfb6f2b093d58/html5/thumbnails/38.jpg)
Algorithm implementation in Chainer: A Neural Algorithm of Artistic Style (Gatys et al., 2015)
l hZps://github.com/maZya/chainer-gogh
Contentimage (cat)
Style image
New artistic image
+ =
Main code (45 lines) 38
![Page 39: A Powerful, Flexible, and Intui5ve Deep Learning Framework€¦ · Toyota FANUC Panasonic NTT ... Computational graph construction is the key 1. Construct a computaonal graph ...](https://reader033.fdocuments.net/reader033/viewer/2022042921/5f6bca6a724bfb6f2b093d58/html5/thumbnails/39.jpg)
l ManycollaboraOonsareon-goingw/Chainer-basedcomputervision,deepreinforcementlearning,etc…
l Ex.1Chainer-controlledtoycarsinToyotaboothatCES2016
l Ex.2HighlyaccurateFANUC’sbin-pickingrobotatIREX2015 8hourstrainingtoreachexpert-level,commercializaOonby2016end
Chainer in industry: Used in demonstrations & being commercialized
http://tinyurl.com/pfn-irex15http://tinyurl.com/pfn-ces16
39
![Page 40: A Powerful, Flexible, and Intui5ve Deep Learning Framework€¦ · Toyota FANUC Panasonic NTT ... Computational graph construction is the key 1. Construct a computaonal graph ...](https://reader033.fdocuments.net/reader033/viewer/2022042921/5f6bca6a724bfb6f2b093d58/html5/thumbnails/40.jpg)
Summary
l ChainerisaPython-baseddeeplearningframeworkwithdynamicnetworkconstrucOonschemeandCuPy
l ItisdesignedforefficientresearchandprototypingwhilekeepingcomparableperformancethankstoNVIDIAGPU
l Officialweb:hZp://chainer.org/
l Github:hZps://github.com/pfnet/chainer
YourcontribuOonswillbeappreciated&wearehiring!
40