Distributed Deep Learning on Spark


Transcript of Distributed Deep Learning on Spark

Page 1: Distributed Deep Learning on Spark

© 2014 MapR Technologies

Distributed Deep Learning on Spark

Mathieu Dumoulin - Data Engineer

MapR Professional Services APAC

Page 2: Distributed Deep Learning on Spark


Tonight’s Presentation FAQ-Style

• Short intro on machine learning
• What's deep learning?
• Why distributed? Why do we need a computer cluster?
• Why run it on Spark?
• How does it work?
  – Case study of SparkNet: Training Deep Networks in Spark
  – Case study of CaffeOnSpark
• Can I see a demo?
  – Installation process
  – Caffe demo
  – CaffeOnSpark demo

Page 3: Distributed Deep Learning on Spark


Machine Learning is all around us!

• Internet search with Google and Bing
• Contextual ads (AdSense)
• Apple iOS 9 & 10 (interesting link with details!)
• Google Gmail/Inbox (Priority Inbox, spam filtering)
• Fraud detection
• Recommendations (Amazon)
• Image recognition (I can see… cats!)
• Language modeling & speech recognition (Siri, Google Now, Google Translate)

Page 4: Distributed Deep Learning on Spark


Classification of images

Page 5: Distributed Deep Learning on Spark


Why Deep Learning?

• Because it works really, really well!
• Deep learning is the state of the art in applied machine learning
  – Wins in every major machine learning competition
    • Kaggle
    • ImageNet
• Especially well suited for:
  – Images (classification, object detection, etc.)
  – Sounds (speech, music)
  – Text (translation)
• Deep learning is very CPU intensive
  – More processing for better models
  – More processing for faster training
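To make "CPU intensive" concrete, here is a back-of-the-envelope count of multiply-accumulate operations for a single convolutional layer. The dimensions below are illustrative (roughly the first layer of an AlexNet-style network), not taken from the slides:

```python
def conv_layer_macs(h_out, w_out, c_in, c_out, k):
    """Multiply-accumulate ops for one convolutional layer: every output
    pixel of every output channel needs a k*k*c_in dot product."""
    return h_out * w_out * c_out * (k * k * c_in)

# An AlexNet-like first layer: 55x55 output, 3 input channels,
# 96 filters of size 11x11 (illustrative numbers).
macs = conv_layer_macs(55, 55, 3, 96, 11)
print(macs)  # ~105 million multiply-adds for ONE layer on ONE image
```

Training multiplies this by dozens of layers, a forward and a backward pass, millions of images, and many epochs, which is why single-machine training takes days.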

Page 7: Distributed Deep Learning on Spark


Results are now competitive with humans!

Page 8: Distributed Deep Learning on Spark


Why Distributed

“training can be time consuming, often requiring multiple days on a single GPU using [SGD]” - Moritz et al., SparkNet

• The most GPUs one physical node can hold is 3-4
• A cluster can spread the CPU/GPU load at the cost of increased complexity
• Google coded such software from scratch in early 2010

Page 9: Distributed Deep Learning on Spark


How to Distribute: Parameter Server

• Li et al. proposed the “Parameter Server” approach in 2014
  – https://www.cs.cmu.edu/~dga/papers/osdi14-paper-li_mu.pdf

From Arimo’s Distributed TensorFlow blog post (link)
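The parameter-server pattern is easiest to see in code: a server process owns the global weights, and each worker repeatedly pulls the current weights, computes a gradient on its own data shard, and pushes the gradient back. A minimal single-process sketch in Python; the quadratic toy loss, learning rate, and data below are illustrative, not from Li et al.'s system:

```python
import numpy as np

def compute_gradient(weights, batch):
    # Hypothetical worker-side gradient: toy loss ||w - mean(batch)||^2,
    # so the gradient is 2 * (w - batch mean).
    return 2 * (weights - batch.mean(axis=0))

class ParameterServer:
    """Owns the global weights; workers push gradients and pull weights."""
    def __init__(self, dim, lr=0.1):
        self.weights = np.zeros(dim)
        self.lr = lr

    def push(self, grad):
        self.weights -= self.lr * grad  # apply one SGD update

    def pull(self):
        return self.weights.copy()

rng = np.random.default_rng(0)
server = ParameterServer(dim=3)
# One data shard per worker, all centered near 1.0.
batches = [rng.normal(loc=1.0, size=(8, 3)) for _ in range(4)]

for step in range(50):
    for batch in batches:  # in a real cluster these run in parallel
        w = server.pull()
        server.push(compute_gradient(w, batch))
```

The weights converge toward the overall data mean. The real system's contribution is making push/pull scale: sharded servers, asynchronous updates, and flexible consistency models rather than this strict loop.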

Page 10: Distributed Deep Learning on Spark


Why Spark?

• Integrates well with existing “big data” batch processing frameworks (Hadoop/MapReduce)
• Allows data to be kept in memory from start to finish
• Work with a single computational framework
• Relatively easy to implement a parameter server

Page 11: Distributed Deep Learning on Spark


New frameworks for Spark-based distributed DL

• CaffeOnSpark (Yahoo America)
• SparkNet (UC Berkeley's AMPLab)
• DeepLearning4J (Skymind)
• Elephas (Keras team)
• Distributed TensorFlow (Arimo)

Page 12: Distributed Deep Learning on Spark


SparkNet implementation

From: https://arxiv.org/pdf/1511.06051v4.pdf

Page 13: Distributed Deep Learning on Spark


SparkNet implementation 2

From: https://arxiv.org/pdf/1511.06051v4.pdf

Page 14: Distributed Deep Learning on Spark


SparkNet implementation 3

From: https://arxiv.org/pdf/1511.06051v4.pdf

Page 15: Distributed Deep Learning on Spark


We need a Solver: Caffe

● (+) Good for feedforward networks and image processing

● (+) Good for finetuning existing networks

● (+) Train models without writing any code

● (+) Python interface is pretty useful

● (-) Need to write C++ / CUDA for new GPU layers

● (-) Not good for recurrent networks

● (-) Cumbersome for big networks (GoogLeNet, ResNet)

● (-) Not extensible, bit of a hairball

● (-) No commercial support

taken from: http://deeplearning4j.org/compare-dl4j-torch7-pylearn.html#caffe

Page 16: Distributed Deep Learning on Spark


Distributed SGD and Parameter Server

From: https://arxiv.org/pdf/1511.06051v4.pdf

Page 17: Distributed Deep Learning on Spark


SparkNet’s implementation of DSGD

From: https://arxiv.org/pdf/1511.06051v4.pdf
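The loop the SparkNet paper describes is: broadcast the current weights, let each worker run a fixed number of local SGD steps on its own partition, then average the resulting models on the driver and repeat. A plain-Python sketch of that pattern; the toy least-squares loss and all constants are illustrative, and SparkNet itself runs the worker step with Caffe inside a Spark map:

```python
import numpy as np

def local_sgd(weights, partition, steps=5, lr=0.05):
    """Worker side: a few SGD steps on this partition's data.
    Toy loss: sum of ||w - x||^2, so the optimum is the data mean."""
    w = weights.copy()
    for _ in range(steps):
        for x in partition:
            w -= lr * 2 * (w - x)
    return w

rng = np.random.default_rng(1)
# Four "executors", each holding a partition of samples centered at 2.0.
partitions = [rng.normal(2.0, 1.0, size=(16, 2)) for _ in range(4)]
weights = np.zeros(2)

for _ in range(10):
    # In Spark: broadcast weights, mapPartitions(local_sgd), collect.
    local_models = [local_sgd(weights, p) for p in partitions]
    weights = np.mean(local_models, axis=0)  # driver averages the models
```

Averaging only after several local steps is what cuts communication cost; SparkNet's analysis is about how many local steps you can take between averages before convergence suffers.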

Page 18: Distributed Deep Learning on Spark


Benefits of the approach

From: https://arxiv.org/pdf/1511.06051v4.pdf

Page 19: Distributed Deep Learning on Spark


Scaling performance of SparkNet

From: https://arxiv.org/pdf/1511.06051v4.pdf

Page 20: Distributed Deep Learning on Spark


CaffeOnSpark

• Mixed Java and Scala implementation
• Developed and used in production at Yahoo America
• Much easier to install than SparkNet, and less buggy
• Can take advantage of an InfiniBand network
• Enhanced Caffe to use multiple GPUs
• CaffeOnSpark executors communicate with each other via an MPI allreduce-style interface
• The Spark+MPI architecture achieves performance similar to that of dedicated deep learning clusters
  – Peer-to-peer parameter server
• Faster than SparkNet
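The allreduce mentioned above can be stated in a few lines: every executor contributes its local gradient, and every executor receives the same elementwise sum. The sketch below computes that result directly; a real MPI-style implementation (e.g. a ring allreduce) reaches the same answer through peer-to-peer chunk exchanges, which is what lets CaffeOnSpark avoid a central parameter server. The gradient values are made up for illustration:

```python
import numpy as np

def allreduce(gradients):
    """All-reduce, result-wise: every node ends up with the elementwise
    sum of all nodes' gradients. Real MPI implementations produce the
    same result via peer-to-peer exchanges instead of a central gather."""
    total = np.sum(gradients, axis=0)
    return [total.copy() for _ in gradients]

# Each executor computes a gradient on its mini-batch shard...
grads = [np.array([1.0, 2.0]), np.array([3.0, 4.0]), np.array([5.0, 6.0])]
reduced = allreduce(grads)

# ...then every executor applies the identical averaged update locally,
# so all model replicas stay in sync without a central server.
update = reduced[0] / len(grads)
```

Because every node holds the full result, the weight update is computed redundantly but identically everywhere, and no single machine becomes a communication bottleneck.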

Page 21: Distributed Deep Learning on Spark


CaffeOnSpark System Architecture

From: http://yahoohadoop.tumblr.com/post/129872361846/large-scale-distributed-deep-learning-on-hadoop

Page 22: Distributed Deep Learning on Spark


CaffeOnSpark vs. SparkNet

• Much faster communication between nodes (InfiniBand capability)

• Peer-to-peer parameter exchange model is a much faster implementation

• Enhanced multi-GPU Caffe also faster

Page 23: Distributed Deep Learning on Spark


Comparison of Frameworks (Spark Summit 2016)

By Yu Cao (EMC) and Zhe Dong (EMC) (Slideshare)

Page 24: Distributed Deep Learning on Spark


Benchmark 2

By Yu Cao (EMC) and Zhe Dong (EMC) (Slideshare)

Page 25: Distributed Deep Learning on Spark


Installing CaffeOnSpark

• I recommend CentOS 7 or Ubuntu 14.04+
• The process is very “touchy”: it's easy to mess up
• Go step by step!

Process:
1. Update the OS and kernel, install dev tools (gcc, etc.), then reboot
   a. Disable the “nouveau” driver!!!
2. Install the latest NVIDIA drivers, CUDA 7.5, and cuDNN 4
3. Install Caffe
   a. Install all Caffe dependencies; make sure it compiles and the examples run
4. Install CaffeOnSpark

Page 26: Distributed Deep Learning on Spark


Installing Caffe

Good tutorials are quite few!
• Ubuntu works more “out of the box”: the default paths are all correct
• CentOS 7: a few changes are needed, but it's still OK

The Caffe web site instructions for CentOS are a bit outdated.

Page 27: Distributed Deep Learning on Spark


Demos

• Running an example on Caffe
  – Caffe deep network description files
  – MNIST example
• Running an example with CaffeOnSpark
  – MNIST example
  – Running on YARN / Spark Standalone

Page 28: Distributed Deep Learning on Spark


Q&A time