AWS 機器學習 II ─ 深度學習 Deep Learning & MXNet

Deep Learning & MXNet

Name: Jhen-Wei Huang (黃振維)Department: Solutions ArchitectCompany: AWS

Image understanding

Speech recognition

Natural language processing

Autonomy

Deep Learning Appl icat ions

https://trends.google.com/

Deep Learning Applications

Autonomous Dr iv ing SystemsAutonomous Driving Systems

Personal ass is tantsPersonal assistants

Line-free shoppingLine-free shopping

Why It’s Different This Time

Everything is digital: large data sets are availableImagenet: 14M+ labeled images - http://www.image-net.org/

YouTube-8M: 7M+ labeled videos - https://research.google.com/youtube8m/ AWS public data sets - https://aws.amazon.com/public-datasets/

The parallel computing power of GPUs make training possibleSimard et al (2005), Ciresan et al (2011)State of the art networks have hundreds of layers Baidu’s Chinese speech recognition: 4TB of training data, +/- 10 Exaflops

Cloud scalability and elasticity make training affordableGrab a lot of resources for fast training, then release themUsing a DL model is lightweight: you can do it on a Raspberry Pi

Deep Learning ModelsDeep Learning Models

Input Output

1 1 11 0 10 0 0

mx.sym.Convolution(data, kernel=(5,5), num_filter=20)

mx.sym.Pooling(data, pool_type="max", kernel=(2,2),

stride=(2,2)

lstm.lstm_unroll(num_lstm_layer, seq_len, len, num_hidden, num_embed)

4 2 2 0 4=Max

13...4

0.2-0.1...0.7

mx.sym.FullyConnected(data, num_hidden=128)

mx.symbol.Embedding(data, input_dim, output_dim = k)

0.2-0.1...0.7

4 2 2 0 2=Avg

Input Weights

cos(w, queen) = cos(w, king) - cos(w, man) + cos(w, woman)

mx.sym.Activation(data, act_type="xxxx")

"relu"

"tanh"

"sigmoid"

"softrelu"

Neural Art

Face Search

Image Segmentation

Image Caption

“People Riding Bikes”

Bicycle, People, Road, SportImage Labels

Speech

“People Riding Bikes”

Machine Translation

“Οι άνθρωποι ιππασίας ποδήλατα”

Events

mx.model.FeedForward model.fit

mx.sym.SoftmaxOutput

Anatomy of a Deep Learning Model

Neural Art

Deep Learning ModelsDeep Learning Models

Apache MXNetApache MXNet2015 Project created2016 Nov. Be Amazon DL framework of choice2017 Jan. Project enters incubationLatest Version : 0.11Contributors : 436Star : 11,578License : Apache 2.0

Apache MXNet 0.11

- Julien Simon, https://medium.com/@julsimon/keras-shoot-out-tensorflow-vs-mxnet-51ae2b30a9c0

https://aws.amazon.com/cn/blogs/ai/bring-machine-learning-to-ios-apps-using-apache-mxnet-and-apple-core-ml/

Apache MXNet 0.11 Released

Apache MXNet for Deep Learning

https://github.com/apache/incubator-mxnet

Apache MXNet for Deep Learning

Apache MXNet for IoT & the EdgeApache MXNet for IoT & the Edge

BlindTool by Joseph Paul Cohen, demo on Nexus 4

• Fit the core library with all dependencies into a single C++ source file

• Easy to compile on any platform

AmalgamationRuns in browser with Javascript

Apache MXNet – Per formance

https://github.com/ilkarman/DeepLearningFrameworks

Apache MXNet - Performance

Mult i -GPU Scal ing With MXNetMulti-GPU Scaling With MXNet

Mult i -Machine Sca l ing With MXNetMulti-Machine Scaling With MXnet

• Cloud Formation with Deep Learning AMI• 16x P2.16xlarge • Mounted on EFS• ImageNet, 1.2M images, 1K classes• 152-layer ResNet , 5.4d on 4xK80s (1.2h per epoch)• 0.22 top-1 validation error• 6h on 128x K80s , achieves the same validation error

Apache MXNet | The Bas icsApache MXNet | The Basics

• NDArray: Manipulate multi-dimensional arrays in a command line paradigm (imperative).

• Symbol: Symbolic expression for neural networks (declarative).• Module: Intermediate-level and high-level interface for neural network

training and inference.• Loading Data: Feeding data into training/inference programs.• Mixed Programming: Training algorithms developed using NDArrays in

concert with Symbols.https://medium.com/@julsimon/an-introduction-to-the-mxnet-api-part-1-848febdcf8ab

Apache MXNet – NDArr yApache MXNet - NDArray

NumPy• Only CPU• No automatic differentiation

NDArray• Support CPUs/GPUs• Scale to distributed system in

the cloud• Automatic differentiation• Lazy evaluation

Apache MXNet – NDArr y

context = mx.cpu()

context = mx.gpu(0)context = mx.gpu(1)

g = copyto(c)g = c.as_in_context(mx.gpu(0))

Apache MXNet - NDArray

Imperative Programming

import numpy as npa = np.ones(10)b = np.ones(10) * 2c = b * ad = c + 1

• Straightforward and flexible.• Take advantage of language

native features (loop, condition, debugger).

• E.g. Numpy, Matlab, Torch, …

• Hard to optimize

CONSEasy to tweak in Python

Declarative Programming

• More chances for optimization

• Cross different languages• E.g. TensorFlow, Theano,

• Less flexible

CONSC can share memory with D because C is deleted later

A = Variable('A')B = Variable('B')C = B * AD = C + 1f = compile(D)d = f(A=np.ones(10),

B=np.ones(10)*2)

Mixed Programming Paradigm

IMPERATIVE NDARRAY API

DECLARATIVE SYMBOLIC EXECUTOR

>>> import mxnet as mx>>> a = mx.nd.zeros((100, 50))>>> b = mx.nd.ones((100, 50))>>> c = a + b>>> c += 1>>> print(c)

>>> import mxnet as mx>>> net = mx.symbol.Variable('data')>>> net = mx.symbol.FullyConnected(data=net, num_hidden=128)>>> net = mx.symbol.SoftmaxOutput(data=net)>>> texec = mx.module.Module(net)>>> texec.forward(data=c)>>> texec.backward()

NDArray can be set as input to the graph

Demo – Training MXNet on MNISThttps://medium.com/@julsimon/training-mxnet-part-1-mnist-6f0dc4210c62 https://github.com/juliensimon/aws/tree/master/mxnet/mnist

Apache MXNet New In ter face – GluonNew Interface - Gluon

Apache MXNet New In ter face – Gluon

net.hybridize()develop and debug models

with imperative programming

and switch to efficient symbolic

execution

New Interface - Gluon

Apache MXNet New In ter face – Gluon

Symbolic Gluon Imperative

• Efficient & Portable

• But hard to use

• Imperative for developing

• Symbolic for deploying• Flexible

• May be slow

New Interface - Gluon

Apache MXNet Gluon 演示 – Fashion-MNISTMXNet Gluon – Fashion-MNIST

https://github.com/zalandoresearch/fashion-mnist

Apache MXNetDeveloper Tools and Resources

Apache MXNet 框架 – AWS 的基础设施

CPU Instance

• C4.8xlarge (36 threads, 60GB RAM, 4Gbit)

• M4.16xlarge (64 threads, 256GB RAM, 10Gbit)

GPU Instance

• P2.16xlarge ( 16X NVIDIA Kepler K80, 64 threads, 732GB RAM, 20Gbit)

• G3.16xlarge (4XNVIDIA Maxwell M60, 64 threads, 488GB RAM, 20Gbit)

• NVIDIA Volta – coming to an instance near you

AWS EC2 Instance

One-Click GPU or CPUDeep Learning

AWS Deep Learning AMI

Up to~40k CUDA coresApache MXNet

TensorFlowTheanoCaffeTorchKeras

Pre-configured CUDA drivers, MKLAnaconda, Python3

Ubuntu and Amazon Linux

+ AWS CloudFormation template

+ Container image

Apache MXNet 框架 – AWS 的基础设施AWS Machine Image for Deep Learning

http://docs.aws.amazon.com/mxnet/latest/dg/appendix-ami-release-notes.html

Apache MXNet 框架 – AWS 的基础设施AWS CloudFormation Template for Deep Learning

https://github.com/awslabs/deeplearning-cfn

• The AWS CloudFormation Deep Learning template uses the Amazon Deep Learning AMI to launch a cluster of EC2 instances and other AWS resources needed to perform distributed deep learning.

Apache MXNet 框架演示 – AWS上的集群图像分类训练

$ssh –A –i xxx.pem ubuntu@xxx.xxx.xxx.xxx$mkdir $EFS_MOUNT/cifar_model/

$cd $EFS_MOUNT/deeplearning-cfn/examples/mxnet/example/image classification/

$ ../../tools/launch.py -n $DEEPLEARNING_WORKERS_COUNT \-H $DEEPLEARNING_WORKERS_PATH python train_cifar10.py \

--network resnet --num-layers 110 --kv-store dist_device_sync \--model-prefix $EFS_MOUNT/cifar_model/cifar --num-epochs 300 --batch-size 128

The CIFAR-10 dataset consists of 60000 32x32 colour

images in 10 classes, with 6000 images per class. There

are 50000 training images and 10000 test images

Running Distributed Training

Apache MXNet 框架演示 – AWS上的集群图像分类训练

Environment 1 x P2.16xLarge 5 x P2.16xlarge

Execution Time 36,122s = 10.3h 7,632s = 2.12h

Performance Improved 473%

Running Distributed Training

Apache MXNet 框架 – 从源代码编译安装

USE_CUDA=1

USE_CUDNN=1

USE_OPENMP=1

USE_CUDA_PATH=/usr/local/cuda

USE_BLAS=mkl

USE_MKL2017=1

USE_MKL2017_EXPERIMENTAL=1

MKLML_ROOT=/usr/local

USE_INTEL_PATH=/opt/intel

USE_OPENCV=1

USE_S3=1

USE_NVRTC=1

USE_DIST_KVSTORE=1

##K80 3.7,M60 5.2, GTX 1070 6.1

MSHADOW_NVCCFLAGS += -gencode

arch=compute_61,code=sm_61

mxnet/mshadow/make/mshadow.mk

mxnet/make/config.mk

Build and Installation

在Amazon ECS容器服务上部署Apache MXNetDeploy a Deep Learning Framework on Amazon ECS

Deploy MXNet on AWS using Docker containerhttps://github.com/awslabs/ecs-deep-learning-workshop

使用AWS Lambda和MXNet 进行预测Seamlessly Scale Predictions with AWS Lambda and MXNet

Leverage AWS Lambda and MXNet to build a scalable prediction pipeline• https: / /g i thub.com/awslabs/mxnet- lambda• https: / /aws.amazon.com/cn/blogs/compute/seamlessl

y-scale-predict ions-with-aws- lambda-and-mxnet /

https://github.com/dmlc/mxnet-notebooks• Basic concepts

• NDArray - multi-dimensional array computation• Symbol - symbolic expression for neural networks• Module - neural network training and inference

• Applications• MNIST: recognize handwritten digits• Check out the distributed training results• Predict with pre-trained models• LSTMs for sequence learning• Recommender systems• Train a state of the art Computer Vision model (CNN)• Lots more..

Application Examples | Jupyter Notebooks

MXNet Resources:• MXNet Blog Post | AWS Endorsement • Read up on MXNet and Learn More: mxnet.io• MXNet Github Repo • MXNet Recommender Systems Talk | Leo DiracDeveloper Resources:• Deep Learning AMI |Amazon Linux• Deep Learning AMI | Ubuntu • CloudFormation Template Instructions• Deep Learning Benchmark • MXNet on Lambda • MXNet on ECS/Docker• MXNet on Raspberry Pi | Image Detector using Inception Network

Developer Resources

Neural art

https://github.com/apache/incubator-mxnet/tree/master/example/neural-style

Thank you!

AWS 機器學習 II ─ 深度學習 Deep Learning & MXNet

Documents

Transcript of AWS 機器學習 II ─ 深度學習 Deep Learning & MXNet

應用學習（高中課程） 2020-22 學年€¦ · 學習範疇／課程組別 創意學習／表演藝術 4. 教學語言 中文 5. 學習成果 完成本科目後，學生應能：

TensorFlow 深度學習快速上手班--機器學習

行動學習手冊教師篇 - ctld.ntnu.edu.tw¡Œ動學習手冊.pdf · 行動學習手冊教師篇 . 何謂行動學習？ 行動學習（Mobile learning) 縮寫為M-Learning、MLearning或mLearning），是數位

Live數學學習網 ─

如何引發學生學習動機 -- 分享與學習

304 學習學研究室

行動 學習與創新教學

新高中「其他學習經歷」及 「學生學習概覽」 簡介

全方位學習活動示例 中學 - Education Bureau...全方位學習活動示例（中學）（2019年10月） 3 學習領域或科目／ 跨學習領域或科目／ 基要學習經歷

TensorFlow 深度學習快速上手班--深度學習

靜宜大學 學生學習檔案 - cge.pu.edu.t · 1 靜宜大學 學生學習檔案 修習學群別：宗教與哲學思維 修習課名：困境與人生 授課教師：賴淑蘭

學習範疇：代數 學習單位： 6A-E1 數型

服務學習不只是「服務 」 更 是一種「學習 」

一個較受忽略的學習障礙 — 數學學習障礙

餐桌禮儀學習 教學篇

教育雲學習拍 2 - 163.28.84.149163.28.84.149/file/學習拍2.0操作手冊-學生.pdf · 教育雲學習拍2.0 學習管理系統操作手冊(學生篇) 2017.12

學習樹 心智圖的學習與應用

學校組織學習、組織創新與 學校效能的關係¸校組織學習、組織創新與學校... · 學校組織學習、組織創新與學校效能的關係 45 學校組織學習、組織創新與

學習障礙生 學習特質與補救教學策略

學習 Android

應用學習（高中課程） 2020-22 學年€¦ · 學習範疇／課程組別創意學習／表演藝術 4. 教學語言中文 5. 學習成果完成本科目後，學生應能：

行動學習手冊教師篇 - ctld.ntnu.edu.tw¡Œ動學習手冊.pdf · 行動學習手冊教師篇 . 何謂行動學習？行動學習（Mobile learning) 縮寫為M-Learning、MLearning或mLearning），是數位

行動學習與創新教學

新高中「其他學習經歷」及「學生學習概覽」簡介

全方位學習活動示例中學 - Education Bureau...全方位學習活動示例（中學）（2019年10月） 3 學習領域或科目／跨學習領域或科目／基要學習經歷

靜宜大學學生學習檔案 - cge.pu.edu.t · 1 靜宜大學學生學習檔案修習學群別：宗教與哲學思維修習課名：困境與人生授課教師：賴淑蘭

學習範疇：代數學習單位： 6A-E1 數型

服務學習不只是「服務」更是一種「學習」

餐桌禮儀學習教學篇

學習樹心智圖的學習與應用

學校組織學習、組織創新與學校效能的關係¸校組織學習、組織創新與學校... · 學校組織學習、組織創新與學校效能的關係 45 學校組織學習、組織創新與

學習障礙生學習特質與補救教學策略