Neural Networks, Spark MLlib, Deep Learning

Posted on 21-Apr-2017


NEURAL NETWORKS AND DEEP LEARNING

BY ASIM JALIS
GALVANIZE

WHO AM I?

ASIM JALIS
Galvanize/Zipfian, Data Engineering
Cloudera, Microsoft, Salesforce
MS in Computer Science from University of Virginia
https://www.linkedin.com/in/asimjalis

WHAT IS GALVANIZE’S DATA ENGINEERING PROGRAM?

DO YOU WANT TO . . .

Play with terabytes of data
Build data applications using Spark, Hadoop, Hive, Kafka, Storm, HBase
Use Data Science algorithms at scale

WHAT IS INVOLVED?

Learn concepts in interactive lectures
Develop skills in hands-on labs
Design and build your Capstone Project
Show your project to SF tech companies at Hiring Day

FOR MORE INFORMATION

Check out http://galvanize.com
Talk to me: asim.jalis@galvanize.com

INTRO

WHAT IS THIS TALK ABOUT?

What are Neural Networks and how do they work?
What is Deep Learning?
What is the difference?
How can we build neural networks in Apache Spark?

HOW MANY PEOPLE HERE ARE FAMILIAR WITH NEURAL NETWORKS?

HOW MANY PEOPLE HERE ARE FAMILIAR WITH CONVOLUTIONAL NEURAL NETWORKS?

HOW MANY PEOPLE HERE ARE FAMILIAR WITH DEEP LEARNING?

HOW MANY PEOPLE HERE ARE FAMILIAR WITH APACHE SPARK AND MLLIB?

NEURAL NETWORKS

WHAT IS A NEURON?

Receives signals on its synapses
When triggered, sends a signal along its axon

Mathematical abstraction
Inspired by the biological neuron
Either on or off based on the sum of its inputs

A neuron is a mathematical function
Adds up its (weighted) inputs
Applies the sigmoid function
This determines whether it fires or not
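A minimal sketch of that function in plain Python. The bias term and the 0.5 "fires" cutoff are illustrative assumptions (the bias plays the role of the firing threshold); the function names and weights are hypothetical.

```python
import math

def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def neuron(inputs, weights, bias):
    """Add up the weighted inputs plus a bias, then apply the sigmoid."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return sigmoid(z)

# Hypothetical neuron: two inputs, each weighted 2.0, with bias -1.0.
activation = neuron([1.0, 0.0], [2.0, 2.0], -1.0)
fires = activation > 0.5  # assumed cutoff: "on" when the output exceeds 0.5
print(activation, fires)
```

Here the weighted sum is 1.0, so the output is sigmoid(1.0), about 0.73, and the neuron counts as firing.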

WHAT ARE NEURAL NETWORKS?

Biologically inspired machine learning algorithm
Mathematical neurons arranged in layers
Accumulate signals from the previous layer
Fire when the signal reaches a threshold

HOW MANY NEURONS SHOULD I HAVE IN MY NETWORK?

HOW MANY INPUT LAYER NEURONS SHOULD WE HAVE?

The number of inputs or features

HOW MANY OUTPUT LAYER NEURONS SHOULD WE HAVE?

The number of classes we are classifying the input into.

HOW MANY HIDDEN LAYER NEURONS SHOULD WE HAVE?

SIMPLEST OPTION IS TO USE 0.

SINGLE LAYER PERCEPTRON

WHAT ARE THE DOWNSIDES OF NO HIDDEN LAYERS?

Only works if the data is linearly separable.
With a sigmoid output, it is equivalent to logistic regression.
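To see the linear-separability limit concretely, here is a sketch that trains a single neuron with no hidden layer on AND (linearly separable) and XOR (not separable). Note the swap: it uses the classic perceptron learning rule with a step activation instead of the sigmoid, because that rule is short to write; either way the decision boundary is a straight line. Names and hyper-parameters are hypothetical.

```python
def step_predict(w, b, x):
    """Fire (1) if the weighted sum plus bias is positive, else 0."""
    return 1 if w[0] * x[0] + w[1] * x[1] + b > 0 else 0

def train_perceptron(samples, epochs=20, lr=1.0):
    """Perceptron learning rule: nudge weights by the error on each record."""
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for x, target in samples:
            error = target - step_predict(w, b, x)
            w[0] += lr * error * x[0]
            w[1] += lr * error * x[1]
            b += lr * error
    return w, b

def accuracy(samples, w, b):
    return sum(step_predict(w, b, x) == t for x, t in samples) / len(samples)

AND = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
XOR = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

for name, data in [("AND", AND), ("XOR", XOR)]:
    w, b = train_perceptron(data)
    print(name, accuracy(data, w, b))
```

AND reaches 100% accuracy; XOR cannot, because no single line separates its two classes.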

MULTILAYER PERCEPTRON

For most realistic classification tasks you will need a hidden layer.
Rule of thumb:

Number of hidden layers equals one
Number of neurons in the hidden layer is the mean of the sizes of the input and output layers.
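The rule of thumb as a small helper. Rounding down is an assumption (the rule does not say how to round), and the example sizes are hypothetical.

```python
def rule_of_thumb_layers(n_inputs, n_outputs):
    """One hidden layer sized at the mean of the input and output sizes,
    rounded down (an assumption; the rule does not specify rounding)."""
    n_hidden = (n_inputs + n_outputs) // 2
    return [n_inputs, n_hidden, n_outputs]

# Hypothetical cases: 28x28-pixel images in 10 classes; 4 features in 3 classes.
print(rule_of_thumb_layers(784, 10))  # [784, 397, 10]
print(rule_of_thumb_layers(4, 3))     # [4, 3, 3]
```

Spark ML's `MultilayerPerceptronClassifier` describes its topology in this same form, as a list of layer sizes from input to output passed through its `layers` parameter.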

HOW DO WE USE THIS THING?

NEURAL NETWORK WORKFLOW

Split labeled data into train and test sets
Train with labeled data
Test and compare predictions with actual labels
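The first step, splitting labeled data, as a minimal sketch. The 80/20 ratio, the fixed seed, and the toy rows are assumptions chosen for illustration; `train_test_split` here is a hypothetical helper, not a library function.

```python
import random

def train_test_split(rows, test_fraction=0.2, seed=42):
    """Shuffle labeled rows reproducibly, then hold out a fraction for testing."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

# Hypothetical labeled data: (feature, label) pairs.
rows = [(x, x % 2) for x in range(100)]
train, test = train_test_split(rows)
print(len(train), len(test))  # 80 20
```

On Spark, both RDDs and DataFrames offer `randomSplit(weights, seed)` for the same job.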

HOW DO WE TRAIN IT?

FEED FORWARD

Also called forward propagation or forward prop
Initialize inputs
Weight inputs into hidden layer, sum, apply sigmoid
Calculate activation of hidden layer
Weight hidden-layer activations into output layer, sum, apply sigmoid
Calculate activation of output layer
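A minimal sketch of these steps for a network with one hidden layer. The weights are made-up numbers, not trained values; `weights[j][i]` is the weight from input `i` to neuron `j`, and each neuron is assumed to have a bias.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def layer(inputs, weights, biases):
    """Weight the inputs into each neuron, sum, apply sigmoid.
    weights[j][i] is the weight from input i to neuron j."""
    return [
        sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]

def forward(inputs, w_hidden, b_hidden, w_out, b_out):
    hidden = layer(inputs, w_hidden, b_hidden)  # activation of hidden layer
    output = layer(hidden, w_out, b_out)        # activation of output layer
    return hidden, output

# Made-up parameters for a 2-input, 2-hidden, 1-output network.
hidden, output = forward(
    [1.0, 1.0],
    w_hidden=[[0.5, -0.5], [1.0, 1.0]], b_hidden=[0.0, -1.0],
    w_out=[[1.0, -1.0]], b_out=[0.0],
)
print(hidden, output)
```

The output layer takes the hidden activations, not the original inputs, as its inputs.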

BACK PROPAGATION

Use forward prop to calculate the error
Error is a function of all network weights
Adjust weights using gradient descent
Repeat with next record
Keep going over training set until convergence
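A compact sketch of this loop in plain Python, trained on XOR (a small dataset that is not linearly separable, so it needs a hidden layer). Assumptions for illustration: squared error E = 0.5 * (y - t)^2, three hidden neurons with biases, learning rate 0.5, and a fixed number of passes over the training set in place of a real convergence test. All names are hypothetical.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

XOR = [([0.0, 0.0], 0.0), ([0.0, 1.0], 1.0), ([1.0, 0.0], 1.0), ([1.0, 1.0], 0.0)]

def forward(x, w_h, b_h, w_o, b_o):
    """Forward prop: hidden activations, then the single output activation."""
    h = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b) for row, b in zip(w_h, b_h)]
    y = sigmoid(sum(w * hj for w, hj in zip(w_o, h)) + b_o)
    return h, y

def total_error(params):
    return sum(0.5 * (forward(x, *params)[1] - t) ** 2 for x, t in XOR)

def train(hidden=3, lr=0.5, epochs=5000, seed=0):
    rng = random.Random(seed)
    w_h = [[rng.uniform(-1, 1) for _ in range(2)] for _ in range(hidden)]
    b_h = [rng.uniform(-1, 1) for _ in range(hidden)]
    w_o = [rng.uniform(-1, 1) for _ in range(hidden)]
    b_o = rng.uniform(-1, 1)
    initial = total_error((w_h, b_h, w_o, b_o))
    for _ in range(epochs):          # passes over the training set
        for x, t in XOR:             # one record at a time
            h, y = forward(x, w_h, b_h, w_o, b_o)
            # delta_o is dE/d(output pre-activation) for E = 0.5 * (y - t)^2.
            delta_o = (y - t) * y * (1 - y)
            # Push the error back through the (not yet updated) output weights.
            delta_h = [delta_o * w_o[j] * h[j] * (1 - h[j]) for j in range(hidden)]
            # Gradient descent step on every weight and bias.
            for j in range(hidden):
                w_o[j] -= lr * delta_o * h[j]
                w_h[j][0] -= lr * delta_h[j] * x[0]
                w_h[j][1] -= lr * delta_h[j] * x[1]
                b_h[j] -= lr * delta_h[j]
            b_o -= lr * delta_o
    params = (w_h, b_h, w_o, b_o)
    return initial, total_error(params), params

initial, final, params = train()
print(initial, final)
for x, t in XOR:
    print(x, t, round(forward(x, *params)[1], 3))
```

Depending on the random starting weights, a network this small can occasionally settle in a poor local minimum; another seed or a few more hidden neurons usually helps.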

WHAT IS GRADIENT DESCENT?

HOW DO YOU FIND THE MINIMUM IN AN N-DIMENSIONAL SPACE?

Take a step in the steepest downhill direction.
The steepest direction is given by the gradient, the vector of all partial derivatives; step against it to go downhill.
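A minimal sketch of gradient descent, assuming the gradient can be computed analytically. The bowl-shaped function f(x, y) = (x - 3)^2 + (y + 1)^2, the learning rate, and the step count are illustrative choices.

```python
def gradient_descent(grad, start, lr=0.1, steps=200):
    """Repeatedly step against the gradient (the steepest downhill direction)."""
    point = list(start)
    for _ in range(steps):
        g = grad(point)
        point = [p - lr * gi for p, gi in zip(point, g)]
    return point

def grad_f(point):
    """Gradient of f(x, y) = (x - 3)^2 + (y + 1)^2, whose minimum is at (3, -1)."""
    x, y = point
    return [2 * (x - 3), 2 * (y + 1)]

result = gradient_descent(grad_f, [0.0, 0.0])
print(result)  # very close to [3.0, -1.0]
```

The step size matters: each step here multiplies the distance to the minimum by (1 - 2 * lr), so with lr = 1.0 or larger this particular example no longer converges.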

PUTTING ALL THIS TOGETHER

Use forward prop to activate
Use back prop to train
Then use forward prop to test

WHY NOT HAVE MULTIPLE LAYERS?

DOWNSIDE OF MULTIPLE LAYERS

The number of weights between two layers is the product of their sizes
The mathematics quickly becomes intractable
Particularly when your input is an image with tens of thousands of pixels
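A quick way to see the growth: sum the products of adjacent layer sizes in a fully connected network. Biases are not counted, and the topologies are hypothetical (the second one assumes a 200x200 grayscale image, i.e. 40,000 input pixels, and two hidden layers of 1,000 neurons).

```python
def count_weights(layer_sizes):
    """Weights between adjacent fully connected layers (biases not counted)."""
    return sum(a * b for a, b in zip(layer_sizes, layer_sizes[1:]))

# Hypothetical topologies.
print(count_weights([784, 397, 10]))           # 315218
print(count_weights([40000, 1000, 1000, 10]))  # 41010000
```

Nearly all of the 41 million weights in the second network sit between the input and the first hidden layer, which is why large image inputs are the painful case.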

APACHE SPARK MLLIB

WHAT IS SPARK?

Framework for processing data across a cluster
By sending the code to the data
And executing the code where the data lives

WHAT IS MLLIB?

Library for Machine Learning.
Builds on top of Spark RDDs.
Provides Machine Learning data types (such as vectors and labeled points) on top of RDDs.
Implements common Machine Learning algorithms.

DEMO USING APACHE TOREE

WHAT IS APACHE TOREE?

Like IPython Notebook but for Spark/Scala.
Jupyter kernel for Spark/Scala.

HOW CAN I INSTALL TOREE?

Use pip to install IPython or Jupyter.
Install Apache Spark by downloading the .tgz file and expanding it.

SPARK_HOME=$HOME/spark-1.6.0
pip install toree
jupyter toree install \
  --spark_home=$SPARK_HOME

HOW CAN I RUN A TOREE NOTEBOOK?

jupyter notebook
Visit http://localhost:8888
Create a new notebook.
Set the kernel to Toree.
Evaluating sc in the notebook should print the Spark Context.

NEURAL NETWORK CONSTRUCTION

HOW CAN I FIGURE OUT HOW MANY LAYERS?

To figure out how many layers and what topology to use, you have to rely on standard machine learning techniques.
Use cross-validation.
In general, k-fold cross-validation.
10-fold cross-validation is popular.

WHAT IS 10-FOLD CROSS-VALIDATION OR K-FOLD CROSS-VALIDATION?

Split your data into 10 (or in general k) equal-sized subsets.
Train the model on 9 of them, and set one aside for validation.
Validate the model on the 10th and record its error rate.
Repeat, setting aside each of the 10 subsets in turn.
Average the 10 error rates.
Then repeat for the next model.
Choose the model with the lowest average error rate.
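A sketch of the procedure in plain Python. To keep it short, the two candidate "models" are toy stand-ins (hypothetical `train_majority` and `train_threshold` functions, each returning a predict function) rather than neural networks with different topologies; any training function with the same shape could be plugged in. Folds are taken by striding through the rows, which gives equal sizes when the row count is divisible by k and assumes the row order does not bias the split (shuffle first otherwise).

```python
from collections import Counter

def k_fold_error(data, k, train_fn):
    """Average held-out error rate over k folds; train_fn(rows) -> predict(x)."""
    folds = [data[i::k] for i in range(k)]
    rates = []
    for i in range(k):
        held_out = folds[i]
        training = [row for j, fold in enumerate(folds) if j != i for row in fold]
        predict = train_fn(training)
        wrong = sum(1 for x, y in held_out if predict(x) != y)
        rates.append(wrong / len(held_out))
    return sum(rates) / k

def train_majority(rows):
    """Toy model: always predict the most common training label."""
    label = Counter(y for _, y in rows).most_common(1)[0][0]
    return lambda x: label

def train_threshold(rows):
    """Toy model for 0/1 labels: cut halfway between the two class means."""
    zeros = [x for x, y in rows if y == 0]
    ones = [x for x, y in rows if y == 1]
    cut = (sum(zeros) / len(zeros) + sum(ones) / len(ones)) / 2
    return lambda x: 1 if x > cut else 0

# Hypothetical 1-D data: class 0 at 0..9, class 1 at 20..29.
data = [(x, 0) for x in range(10)] + [(x, 1) for x in range(20, 30)]
candidates = {"majority": train_majority, "threshold": train_threshold}
scores = {name: k_fold_error(data, 10, fn) for name, fn in candidates.items()}
best = min(scores, key=scores.get)
print(scores, best)
```

Selecting a network topology works the same way: each candidate topology becomes one training function, and the one with the lowest average error wins.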

HOW DO I DEPLOY MY NEURAL NETWORK INTO PRODUCTION?

There are two phases.
The training phase can be run on the back-end servers.
Cross-validate your model and its hyper-parameters on the back-end.
Then deploy the model to the front-end servers, browsers, and devices.
The front-end only uses forward prop, which is fast.
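One way to picture the two phases, as a sketch: the back-end serializes trained weights to JSON (a hypothetical format chosen for illustration, not a standard), and the front-end loads them and runs forward prop only. The parameter values are made up.

```python
import json
import math

def export_model(weights, biases):
    """Back-end: serialize trained parameters (hypothetical JSON layout)."""
    return json.dumps({"weights": weights, "biases": biases})

def predict(model_json, inputs):
    """Front-end: forward prop only; no training code is needed here."""
    model = json.loads(model_json)
    activations = inputs
    for layer_w, layer_b in zip(model["weights"], model["biases"]):
        activations = [
            1.0 / (1.0 + math.exp(-(sum(w * a for w, a in zip(row, activations)) + b)))
            for row, b in zip(layer_w, layer_b)
        ]
    return activations

# Hypothetical "trained" parameters for a 2-2-1 network.
blob = export_model(
    weights=[[[0.5, -0.5], [1.0, 1.0]], [[1.0, -1.0]]],
    biases=[[0.0, -1.0], [0.0]],
)
print(predict(blob, [1.0, 1.0]))
```

The front-end needs only the weights and a few lines of arithmetic per layer, which is why it can run in browsers and on devices.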

DEEP LEARNING

WHAT IS DEEP LEARNING?

Deep Learning is a learning method that can train the system with more than 2 or 3 non-linear hidden layers.

WHAT IS DEEP LEARNING?

Machine learning techniques which enable unsupervised feature learning and pattern analysis/classification.
The essence of deep learning is to compute representations of the data.
Higher-level features are defined from lower-level ones.

HOW IS DEEP LEARNING DIFFERENT FROM REGULAR NEURAL NETWORKS?

Training neural networks requires applying gradient descent on millions of dimensions.
This is intractable for large networks.
Deep learning places constraints on neural networks.
This allows them to be solvable iteratively.
The constraints are generic.

WHAT IS THE BIG DEAL ABOUT IT?

AlexNet, submitted to the ImageNet ILSVRC challenge in 2012, is partly responsible for the renaissance.
Alex Krizhevsky, Ilya Sutskever, and Geoffrey Hinton used Deep Learning techniques.
They combined this with GPU training and other techniques.
The result was a neural network that could classify images into 1,000 categories.
It had a top-5 error rate of about 16% (15.3%), compared to about 26% (26.2%) for the runner-up.

ILYA SUTSKEVER, ALEX KRIZHEVSKY, GEOFFREY HINTON

WHAT ARE THE DIFFERENT KINDS OF DEEP ARCHITECTURES?

Generative
Discriminative
Hybrid

WHAT ARE GENERATIVE ARCHITECTURES?

Extract features from data
Find common features in unlabeled data
Like Principal Component Analysis
Unsupervised: no labels required

WHAT ARE DISCRIMINATIVE ARCHITECTURES?

Classify inputs into classes
Require labels
Require supervised training

WHAT ARE HYBRID ARCHITECTURES?

Combination of generative and discriminative

STEP 1

Extract features using a generative network
Use unsupervised learning

STEP 2

Train a discriminative network on the extracted features
Use supervised learning

WHAT ARE AUTO-ENCODERS?

An auto-encoder is a learning algorithm.
It applies backpropagation and sets the target values to be equal to its inputs.
In other words, it trains itself to do the identity transformation.

WHY DOES IT DO THIS?

By placing constraints on it, like restricting the number of hidden neurons, it can find a good representation of the data.

IS THE AUTO-ENCODER SUPERVISED OR UNSUPERVISED?

It is unsupervised.
The data is unlabeled.
Auto-encoders are similar to PCA (Principal Component Analysis).
PCA is a technique for reducing the dimensions of data.
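A minimal sketch of an auto-encoder with a single hidden neuron, trained by gradient descent to reproduce its own input (no labels anywhere). To keep it small and make the PCA connection visible, it is linear (no sigmoid), which is a simplification; the data, starting weights, learning rate, and function names are hypothetical. With a one-neuron bottleneck the network must summarize each point as one number, and for points lying on a line the decoder weights should end up pointing along that line, the same direction PCA's first principal component would pick.

```python
def reconstruction_error(points, w_enc, w_dec):
    """Mean squared distance between each input and its reconstruction."""
    total = 0.0
    for x in points:
        h = sum(we * xi for we, xi in zip(w_enc, x))  # encode to one number
        total += sum((wd * h - xi) ** 2 for wd, xi in zip(w_dec, x))
    return total / len(points)

def train_autoencoder(points, w_enc, w_dec, lr=0.05, steps=2000):
    """Full-batch gradient descent; the training target is the input itself."""
    n = len(points)
    for _ in range(steps):
        g_enc = [0.0] * len(w_enc)
        g_dec = [0.0] * len(w_dec)
        for x in points:
            h = sum(we * xi for we, xi in zip(w_enc, x))
            r = [wd * h - xi for wd, xi in zip(w_dec, x)]    # reconstruction minus input
            back = sum(wd * ri for wd, ri in zip(w_dec, r))  # error sent back to the hidden unit
            for k in range(len(w_dec)):
                g_dec[k] += 2 * r[k] * h / n
            for k in range(len(w_enc)):
                g_enc[k] += 2 * back * x[k] / n
        w_enc = [w - lr * g for w, g in zip(w_enc, g_enc)]
        w_dec = [w - lr * g for w, g in zip(w_dec, g_dec)]
    return w_enc, w_dec

# Hypothetical 2-D points that all lie on the line y = 2x.
points = [(t, 2 * t) for t in (-1.0, -0.5, 0.5, 1.0)]
start_enc, start_dec = [0.3, 0.1], [0.2, 0.4]
w_enc, w_dec = train_autoencoder(points, start_enc, start_dec)
before = reconstruction_error(points, start_enc, start_dec)
after = reconstruction_error(points, w_enc, w_dec)
print(before, after, w_dec)
```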

WHAT ARE CONVOLUTIONAL NEURAL NETWORKS?

Feedforward neural networks.
Connection pattern inspired by the visual cortex.

CONVOLUTIONAL NEURAL NETWORKS

The convolution layer’s parameters are a set of learnable filters.
Every filter is small along width and height.
During the forward pass, each filter slides across the width and height of the input, producing a 2-dimensional activation map.
As we slide across the input, we compute the dot product between the filter and the patch of input under it.
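A minimal sketch of the sliding dot product for a single-channel input, assuming stride 1 and no padding. Like most deep learning libraries, it does not flip the filter (strictly, this is cross-correlation, which CNNs conventionally call convolution). The filter is hand-picked here purely for illustration; a real convolution layer learns its filter values.

```python
def conv2d(image, kernel):
    """Slide the kernel over the image (stride 1, no padding) and take a dot
    product at each position, producing a 2-D activation map."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(kernel[a][b] * image[i + a][j + b] for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]

# A hypothetical 1x2 "edge detector" filter that responds to a dark-to-bright step.
edge = [[-1, 1]]
image = [[0, 0, 1, 1]] * 3
shifted = [[0, 1, 1, 1]] * 3  # the same edge, one pixel to the left
print(conv2d(image, edge))    # [[0, 1, 0], [0, 1, 0], [0, 1, 0]]
print(conv2d(shifted, edge))  # [[1, 0, 0], [1, 0, 0], [1, 0, 0]]
```

The same filter fires wherever the edge is; moving the edge simply moves the peak in the activation map.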

CONVOLUTIONAL NEURAL NETWORKS

Intuitively, the network learns filters that activate when they see a specific type of feature anywhere in the input.
In this way it gains a degree of translation invariance.

WHAT IS A POOLING LAYER?

The pooling layer further reduces the resolution of the image.
It tiles the activation map with non-overlapping 2x2 windows and keeps the maximum activation value in each window.
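A minimal sketch of 2x2 max pooling with stride 2 on a single activation map. Dropping a leftover odd row or column is an assumption here; libraries let you choose how edges are handled.

```python
def max_pool_2x2(feature_map):
    """Keep the maximum of each non-overlapping 2x2 window (stride 2).
    A trailing odd row or column is dropped."""
    return [
        [
            max(feature_map[i][j], feature_map[i][j + 1],
                feature_map[i + 1][j], feature_map[i + 1][j + 1])
            for j in range(0, len(feature_map[0]) - 1, 2)
        ]
        for i in range(0, len(feature_map) - 1, 2)
    ]

fm = [
    [1, 3, 2, 0],
    [4, 2, 1, 1],
    [0, 1, 5, 6],
    [2, 2, 7, 8],
]
print(max_pool_2x2(fm))  # [[4, 2], [2, 8]]
```

A 4x4 map becomes 2x2, a quarter of the values, while the strongest response in each region survives.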

DOES SPARK SUPPORT DEEP LEARNING?

Not directly yet.
https://issues.apache.org/jira/browse/SPARK-2352

WHAT ARE SOME MAJOR DEEP LEARNING PLATFORMS?

Theano: Low-level GPU-enabled tensor library.
Lasagne, Blocks: NN libraries that make Theano easier to use.
Torch7: NN library. Uses Lua for binding. Used by Facebook and Google DeepMind.
Caffe: NN library by the Berkeley Vision and Learning Center (BVLC).
Pylearn2: ML library based on Theano, from the LISA lab at the Université de Montréal.
cuDNN: NN library by Nvidia based on CUDA. Can be used with Torch7 and Caffe.
Chainer: NN library that uses CUDA.
TensorFlow: NN library from Google.

WHAT LANGUAGE ARE THESE IN?

All the frameworks support Python, except Torch7, which uses Lua as its binding language.

WHAT CAN I DO ON SPARK?

SparkNet: Integrates running Caffe with Spark.
Sparkling Water: Integrates H2O with Spark.
DeepLearning4J: Deep learning library for the JVM that can run on Spark.
TensorFlow on Spark (experimental)

QUESTIONS

GALVANIZE DATA ENGINEERING