Deep learning on a mixed cluster with deeplearning4j and spark

Post on 16-Apr-2017

Deep learning on a mixed cluster with Deeplearning4j and Spark

Barcelona Spark meetup, Dec 9, 2016 (right after NIPS)

francois@garillot.net @huitseeker

Agenda

Intro

Why Deep Learning on a Cluster

Big Data Architecture

Deeplearning4j

Spark challenges

Introduction : Deep Learning in the trenches today

The bad thing about doing a talk right after NIPS

you guys are scary.

The good thing about doing a talk right after NIPS

You guys don't need to be told SkyNet is a fantasy (for now).

Paying algorithms

Anomaly detection in many forms (bad guys / predictive maintenance / market rally)

Fraud detection

Network intrusion

Fintech securities churn prediction

Video object detection (security)

Models that are being neglected in benchmarks and implementation efforts

LSTMs

Autoencoders

How to deal with this in the Spark world ?

experiment with trained model application: Tensorframes,

what are the deep learning frameworks that let you train?

Why Deep Learning on a cluster ?

Practically ... let's look at benchmarks

Training, but how ?

New Amazon GPU instances

Cluster training in the enterprise

it's really about multi-tenancy & economies of scale

a big bunch of machines shared among everybody shares better

if only because you can reuse it for other workloads

Minor reasons

enterprises may not have GPUs

Distributing training

basically distributing SGD (R)

challenge is AllReduce Communication

Sparse updates, async communications
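
As a rough illustration of what "distributing SGD" with an AllReduce step can look like, here is a minimal single-process sketch in plain Java. It assumes a toy linear least-squares model, treats each data shard as one "worker", and replaces real network communication with an in-memory average. All class and method names are hypothetical and not part of any library.

```java
/** Toy synchronous data-parallel SGD; every name here is hypothetical. */
final class ParameterAveragingSketch {

    /** One full-batch gradient step on a worker's shard for the linear model y = w . x. */
    static double[] localStep(double[] w, double[][] xs, double[] ys, double lr) {
        double[] grad = new double[w.length];
        for (int i = 0; i < xs.length; i++) {
            double pred = 0.0;
            for (int j = 0; j < w.length; j++) pred += w[j] * xs[i][j];
            double err = pred - ys[i];
            for (int j = 0; j < w.length; j++) grad[j] += err * xs[i][j] / xs.length;
        }
        double[] next = w.clone();
        for (int j = 0; j < w.length; j++) next[j] -= lr * grad[j];
        return next;
    }

    /** Stand-in for AllReduce: element-wise mean of all workers' parameters. */
    static double[] average(double[][] workerParams) {
        double[] avg = new double[workerParams[0].length];
        for (double[] p : workerParams)
            for (int j = 0; j < avg.length; j++) avg[j] += p[j] / workerParams.length;
        return avg;
    }

    /** Each round: every worker steps from the shared parameters, then all results are averaged. */
    static double[] train(double[][][] shardX, double[][] shardY, int dim, int rounds, double lr) {
        double[] w = new double[dim];
        for (int r = 0; r < rounds; r++) {
            double[][] updated = new double[shardX.length][];
            for (int k = 0; k < shardX.length; k++)
                updated[k] = localStep(w, shardX[k], shardY[k], lr);
            w = average(updated);
        }
        return w;
    }

    public static void main(String[] args) {
        // Two shards of data generated from y = 2x.
        double[][][] shardX = {{{1.0}, {2.0}}, {{3.0}}};
        double[][] shardY = {{2.0, 4.0}, {6.0}};
        double[] w = train(shardX, shardY, 1, 200, 0.05);
        System.out.println("learned weight: " + w[0]);
    }
}
```

In a real cluster the `average` call is where the communication cost sits: every worker has to exchange a full parameter vector each round, which is why sparse updates and asynchronous communication matter.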

Distributing training : good engineering matters

Cluster training in your (experimenter) case ?

it's a fun problem : AllReduce

Ultimately solved for people with a large amount of images

that solution is not open-source (but at Facebook, Google, Amazon, Microsoft¹, Baidu)

¹: 1-bit SGD is under non-commercial license in CNTK 2.0

Big Data architecture

With a parameter server

With Spark

Spark does the initial ETL

Spark ingests the final result

In the middle : parameter server.
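
To make the "parameter server in the middle" idea concrete, here is a minimal in-process sketch, assuming threads stand in for workers and a toy quadratic loss stands in for a network. Workers pull a snapshot of the parameters, compute a gradient on it, and push the gradient back without waiting for each other, so some updates are computed on slightly stale parameters. The class and method names are hypothetical; this is not how Deeplearning4j or any specific parameter server is implemented.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/** Toy asynchronous parameter server; every name here is hypothetical. */
final class ParameterServerSketch {

    /** Holds the shared parameters; workers pull snapshots and push gradients. */
    static final class ParameterServer {
        private final double[] params;
        private final double learningRate;

        ParameterServer(int dim, double learningRate) {
            this.params = new double[dim];
            this.learningRate = learningRate;
        }

        /** Returns a copy so that workers never see a half-applied update. */
        synchronized double[] pull() {
            return params.clone();
        }

        /** Applies one worker's gradient as soon as it arrives. */
        synchronized void push(double[] gradient) {
            for (int j = 0; j < params.length; j++) params[j] -= learningRate * gradient[j];
        }
    }

    /** Workers minimise 0.5 * ||w - target||^2, whose gradient is (w - target). */
    static double[] run(double[] target, int workers, int stepsPerWorker, double lr) {
        ParameterServer server = new ParameterServer(target.length, lr);
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        for (int k = 0; k < workers; k++) {
            pool.submit(() -> {
                for (int s = 0; s < stepsPerWorker; s++) {
                    double[] w = server.pull();            // possibly stale by the time we push
                    double[] g = new double[w.length];
                    for (int j = 0; j < w.length; j++) g[j] = w[j] - target[j];
                    server.push(g);
                }
            });
        }
        pool.shutdown();
        try {
            pool.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        }
        return server.pull();
    }

    public static void main(String[] args) {
        double[] w = run(new double[] {1.0, -2.0}, 4, 1000, 0.1);
        System.out.println("parameters after training: " + w[0] + ", " + w[1]);
    }
}
```

In the architecture on the slide, Spark would sit on either side of this loop: preparing the training data beforehand and consuming the trained parameters afterwards.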

Spark cluster modes

Mesos GPU support merged

devices cgroups !

YARN GPU support through tags

Spark Standalone : ?

Deeplearning4j

Deeplearning4j

the first commercial-grade, open-source, distributed deep-learning library written for Java and Scala

Skymind its commercial support arm

Scientific computing on the JVM

libnd4j : Vectorization, 32-bit addressing, linalg (BLAS!)

JavaCPP: generates JNI bindings to your CPP libs

ND4J : numpy for the JVM, native superfast arrays

Datavec : one-stop interface to an NDArray

DeepLearning4J: orchestration, backprop, layer definition

ScalNet: gateway drug, inspired from (and closely following) Keras

RL4J : Reinforcement learning for the JVM

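With Spark

    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.api.java.JavaSparkContext;
    import org.deeplearning4j.nn.conf.MultiLayerConfiguration;
    import org.deeplearning4j.spark.api.TrainingMaster;
    import org.deeplearning4j.spark.impl.multilayer.SparkDl4jMultiLayer;
    import org.deeplearning4j.spark.impl.paramavg.ParameterAveragingTrainingMaster;
    import org.nd4j.linalg.dataset.DataSet;

    JavaSparkContext sc = ...;
    JavaRDD<DataSet> trainingData = ...;
    MultiLayerConfiguration networkConfig = ...;

    // Create the TrainingMaster instance
    int examplesPerDataSetObject = 1;
    TrainingMaster trainingMaster = new ParameterAveragingTrainingMaster.Builder(examplesPerDataSetObject)
            // (other configuration options)
            .build();

    // Create the SparkDl4jMultiLayer instance
    SparkDl4jMultiLayer sparkNetwork = new SparkDl4jMultiLayer(sc, networkConfig, trainingMaster);

    // Fit the network using the training data
    sparkNetwork.fit(trainingData);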

Spark Challenges

Even if you don't care about Deeplearning

(from Kazuaki Ishizaki @ IBM Japan)

SPARK-6442 : better linear algebra than Breeze

ND4J will have sparse representations soon

Even if you don't care about Deeplearning II

Meta-RDDs

Killing the bottlenecks

Spark has already changed its networking backend once.

better support for parameter servers and their fault tolerance.

A Last Word (from Andrew Y. Ng)

get involved !

don't just read papers, reproduce research results

Also

We're happy to mentor contributions, and there's a book !

Questions ?