Neural Networks, Spark MLlib, Deep Learning

Author
asimjalis 
Category
Data & Analytics


NEURAL NETWORKS AND DEEP LEARNING
BY ASIM JALIS
GALVANIZE

WHO AM I?

ASIM JALIS
Galvanize/Zipfian, Data Engineering
Cloudera, Microsoft, Salesforce
MS in Computer Science from University of Virginia
https://www.linkedin.com/in/asimjalis

WHAT IS GALVANIZE'S DATA ENGINEERING PROGRAM?

DO YOU WANT TO . . .
Play with terabytes of data
Build data applications using Spark, Hadoop, Hive, Kafka, Storm, HBase
Use Data Science algorithms at scale

WHAT IS INVOLVED?
Learn concepts in interactive lectures
Develop skills in hands-on labs
Design and build your Capstone Project
Show project to SF tech companies at Hiring Day

FOR MORE INFORMATION
Check out http://galvanize.com
Talk to me at [email protected]

INTRO

WHAT IS THIS TALK ABOUT?
What are Neural Networks and how do they work?
What is Deep Learning?
What is the difference?
How can we build neural networks in Apache Spark?

HOW MANY PEOPLE HERE ARE FAMILIAR WITH NEURAL NETWORKS?

HOW MANY PEOPLE HERE ARE FAMILIAR WITH CONVOLUTIONAL NEURAL NETWORKS?

HOW MANY PEOPLE HERE ARE FAMILIAR WITH DEEP LEARNING?

HOW MANY PEOPLE HERE ARE FAMILIAR WITH APACHE SPARK AND MLLIB?

NEURAL NETWORKS

WHAT IS A NEURON?

Receives signal on synapse
When triggered, sends signal on axon

Mathematical abstraction
Inspired by biological neuron
Either on or off based on sum of input

Neuron is a mathematical function
Adds up (weighted) inputs
Applies the sigmoid function
This determines if it fires or not
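A minimal sketch of the neuron described above, in plain Python. The names `sigmoid` and `neuron`, the example weights, and the 0.5 firing cutoff are illustrative assumptions, not taken from the talk.

```python
import math

def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def neuron(inputs, weights):
    """Add up the weighted inputs, then apply the sigmoid."""
    total = sum(x * w for x, w in zip(inputs, weights))
    return sigmoid(total)

# Hypothetical inputs and weights, chosen only for illustration.
activation = neuron([1.0, 0.0, 1.0], [0.5, -0.3, 0.8])
fires = activation > 0.5  # assumed cutoff: treat activation above 0.5 as "firing"
```

With these made-up values the weighted sum is 1.3, so the activation lands well above 0.5 and the neuron counts as firing.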

WHAT ARE NEURAL NETWORKS?
Biologically inspired machine learning algorithm
Mathematical neurons arranged in layers
Accumulate signals from the previous layer
Fire when signal reaches threshold

HOW MANY NEURONS SHOULD I HAVE IN MY NETWORK?

HOW MANY INPUT LAYER NEURONS SHOULD WE HAVE?

The number of inputs or features

HOW MANY OUTPUT LAYER NEURONS SHOULD WE HAVE?

The number of classes we are classifying the input into.

HOW MANY HIDDEN LAYER NEURONS SHOULD WE HAVE?

SIMPLEST OPTION IS TO USE 0.

SINGLE LAYER PERCEPTRON

WHAT ARE THE DOWNSIDES OF NO HIDDEN LAYERS?
Only works if data is linearly separable.
Identical to logistic regression.

MULTILAYER PERCEPTRON
For most realistic classification tasks you will need a hidden layer.
Rule of thumb:
Number of hidden layers equals one
Number of neurons in hidden layer is mean of size of input and output layers.
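The rule of thumb above can be written down as a small helper. This is only a sketch: the function name `rule_of_thumb_layers` is hypothetical, and rounding the mean down with integer division is an assumption, since the rule does not say how to round.

```python
def rule_of_thumb_layers(num_features, num_classes):
    """Return [input, hidden, output] layer sizes using one hidden layer."""
    hidden = (num_features + num_classes) // 2  # mean of input and output sizes
    return [num_features, hidden, num_classes]

# Hypothetical problem: 10 features classified into 2 classes.
layers = rule_of_thumb_layers(10, 2)
```

Here `layers` is `[10, 6, 2]`. A list of layer sizes in this shape is also what Spark ML's `MultilayerPerceptronClassifier` expects for its `layers` parameter.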

HOW DO WE USE THIS THING?

NEURAL NETWORK WORKFLOW
Split labeled data into train and test sets
Train with labeled data
Test and compare prediction with actual labels
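A rough sketch of this workflow in plain Python. The helper names, the 80/20 split, the fixed seed, and the toy data are all assumptions made for illustration. The "model" is a stand-in lambda, not a trained network.

```python
import random

def train_test_split(records, test_fraction=0.2, seed=42):
    """Shuffle labeled records and split them into train and test lists."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

def accuracy(predict, test_set):
    """Fraction of test records whose prediction matches the actual label."""
    correct = sum(1 for features, label in test_set if predict(features) == label)
    return correct / len(test_set)

# Hypothetical labeled data: (features, label) pairs.
data = [([x], int(x >= 5)) for x in range(10)]
train, test = train_test_split(data)
# Stand-in for a trained model, used only to show the comparison step.
score = accuracy(lambda features: int(features[0] >= 5), test)
```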

HOW DO WE TRAIN IT?

FEED FORWARD
Also called forward propagation or forward prop
Initialize inputs
Weight inputs into hidden layer, sum, apply sigmoid
Calculate activation of hidden layer
Weight inputs into output layer, sum, apply sigmoid
Calculate activation of output layer
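The feed-forward steps above can be sketched for a network with one hidden layer. The weights below are made up, and the names `layer_activation` and `feed_forward` are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def layer_activation(inputs, weights):
    """weights[j][i] is the weight from input i to neuron j of this layer."""
    return [sigmoid(sum(w * x for w, x in zip(row, inputs))) for row in weights]

def feed_forward(inputs, hidden_weights, output_weights):
    hidden = layer_activation(inputs, hidden_weights)   # weight, sum, sigmoid
    output = layer_activation(hidden, output_weights)   # same again for the output layer
    return hidden, output

# Hypothetical 2-input, 2-hidden, 1-output network with made-up weights.
hidden_weights = [[0.5, -0.5], [0.25, 0.75]]
output_weights = [[1.0, -1.0]]
hidden, output = feed_forward([1.0, 0.0], hidden_weights, output_weights)
```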

BACK PROPAGATION
Use forward prop to calculate the error
Error is function of all network weights
Adjust weights using gradient descent
Repeat with next record
Keep going over training set until convergence
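A compact sketch of this training loop for a network with one hidden layer and one output neuron. Several choices here are assumptions rather than anything stated in the talk: squared error as the error function, a constant 1.0 input appended to each layer as a bias, a fixed number of passes instead of a convergence check, and the logical OR function as toy training data.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(x, w_hidden, w_out):
    """Forward prop; a constant 1.0 is appended to each layer's input as a bias."""
    xb = x + [1.0]
    hidden = [sigmoid(sum(w * v for w, v in zip(row, xb))) for row in w_hidden]
    hb = hidden + [1.0]
    output = sigmoid(sum(w * v for w, v in zip(w_out, hb)))
    return xb, hb, output

def total_error(data, w_hidden, w_out):
    return sum(0.5 * (forward(x, w_hidden, w_out)[2] - t) ** 2 for x, t in data)

def train(data, n_hidden=2, rate=0.5, epochs=2000, seed=0):
    rng = random.Random(seed)
    n_in = len(data[0][0])
    w_hidden = [[rng.uniform(-1, 1) for _ in range(n_in + 1)] for _ in range(n_hidden)]
    w_out = [rng.uniform(-1, 1) for _ in range(n_hidden + 1)]
    for _ in range(epochs):                 # keep going over the training set
        for x, target in data:              # one record at a time
            xb, hb, out = forward(x, w_hidden, w_out)
            # Derivative of the squared error w.r.t. the output neuron's weighted sum.
            delta_out = (out - target) * out * (1 - out)
            # Push the error back to each hidden neuron (before w_out changes).
            delta_hidden = [delta_out * w_out[j] * hb[j] * (1 - hb[j])
                            for j in range(n_hidden)]
            # Gradient descent step on every weight.
            for j in range(n_hidden + 1):
                w_out[j] -= rate * delta_out * hb[j]
            for j in range(n_hidden):
                for i in range(n_in + 1):
                    w_hidden[j][i] -= rate * delta_hidden[j] * xb[i]
    return w_hidden, w_out

# Hypothetical training set: the logical OR function.
or_data = [([0.0, 0.0], 0.0), ([0.0, 1.0], 1.0), ([1.0, 0.0], 1.0), ([1.0, 1.0], 1.0)]
w_hidden, w_out = train(or_data)
```

Calling `train(or_data, epochs=0)` returns the untrained starting weights for the same seed, which makes it easy to compare `total_error` before and after training.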

WHAT IS GRADIENT DESCENT?

HOW DO YOU FIND THE MINIMUM IN AN N-DIMENSIONAL SPACE?
Take a step in the steepest downhill direction.
The steepest direction is given by the gradient, the vector of all partial derivatives.
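To make this concrete, here is a small sketch of gradient descent on a two-dimensional bowl. The function being minimized, the step size of 0.1, and the 200 steps are illustrative assumptions. A network's error surface has one dimension per weight, but the loop is the same.

```python
def gradient_descent(gradient, start, rate=0.1, steps=200):
    """Repeatedly step against the gradient, the direction of steepest descent."""
    point = list(start)
    for _ in range(steps):
        grad = gradient(point)
        point = [p - rate * g for p, g in zip(point, grad)]
    return point

# Hypothetical error surface f(x, y) = (x - 3)^2 + (y + 1)^2, minimum at (3, -1).
def gradient(point):
    x, y = point
    return [2 * (x - 3), 2 * (y + 1)]  # partial derivatives of f

minimum = gradient_descent(gradient, start=[0.0, 0.0])
```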

PUTTING ALL THIS TOGETHER

Use forward prop to activate
Use back prop to train
Then use forward prop to test

WHY NOT HAVE MULTIPLE LAYERS?

DOWNSIDE OF MULTIPLE LAYERS
Number of weights between two layers is the product of the layer sizes
The mathematics quickly becomes intractable
Particularly when your input is an image with tens of thousands of pixels
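A quick way to see the growth is to count the weights in a fully connected network. The helper `count_weights` and both example topologies are hypothetical. Bias terms are ignored.

```python
def count_weights(layer_sizes):
    """Total weights in a fully connected network, ignoring bias terms."""
    return sum(a * b for a, b in zip(layer_sizes, layer_sizes[1:]))

small = count_weights([4, 3, 2])                 # 4*3 + 3*2
# Hypothetical 100x100-pixel image input with two hidden layers of 5,000 neurons.
image = count_weights([10000, 5000, 5000, 10])   # 50,000,000 + 25,000,000 + 50,000
```

The small network has 18 weights. The image network has 75,050,000, and each of them is one dimension for gradient descent to search.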

APACHE SPARK MLLIB

WHAT IS SPARK?

Framework for processing data across a cluster
By sending the code to the data
And executing the code where the data lives

WHAT IS MLLIB?
Library for Machine Learning.
Builds on top of Spark RDDs.
Provides RDDs for Machine Learning.
Implements common Machine Learning algorithms.

DEMO USING APACHE TOREE

WHAT IS APACHE TOREE?
Like IPython Notebook but for Spark/Scala.
Jupyter kernel for Spark/Scala.

HOW CAN I INSTALL TOREE?
Use pip to install IPython or Jupyter.
Install Apache Spark by downloading tgz file and expanding.
SPARK_HOME=$HOME/spark-1.6.0
pip install toree
jupyter toree install \
  --spark_home=$SPARK_HOME

HOW CAN I RUN A TOREE NOTEBOOK?
Run jupyter notebook
Visit http://localhost:8888
Create new notebook.
Set kernel to Toree.
sc in notebook should print Spark Context.

NEURAL NETWORK CONSTRUCTION

HOW CAN I FIGURE OUT HOW MANY LAYERS?
To figure out how many layers to use and what topology to use you have to rely on standard machine learning techniques.
Use cross-validation.
In general k-fold cross-validation.
10-fold cross-validation is popular.

WHAT IS 10-FOLD CROSS-VALIDATION OR K-FOLD CROSS-VALIDATION?

Split your data into 10 (or in general k) equal-sized subsets.
Train model on 9 of them, set one aside for cross-validation.
Validate model on 10th and remember your error rate.
Repeat by setting aside each one of the 10.
Average the 10 error rates.
Then repeat for the next model.
Choose the model with the lowest error rate.
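The procedure above can be sketched in a few lines of plain Python. Everything named here is hypothetical: `k_fold_error`, the toy data, and the two constant-prediction "models" that stand in for candidate network topologies. Real data would normally be shuffled before being split into folds.

```python
def k_fold_error(records, k, train_fn, error_fn):
    """Average validation error of one model over k folds."""
    folds = [records[i::k] for i in range(k)]  # k roughly equal-sized subsets
    errors = []
    for i in range(k):
        validation = folds[i]                  # the fold set aside this round
        training = [r for j, fold in enumerate(folds) if j != i for r in fold]
        model = train_fn(training)
        errors.append(error_fn(model, validation))
    return sum(errors) / k

# Hypothetical data: the label is 1 for three out of every four records.
data = [(x, int(x % 4 != 0)) for x in range(40)]

def make_constant_trainer(label):
    """Stand-in 'training' that ignores the data and always predicts one label."""
    return lambda training: (lambda x: label)

def error_rate(model, validation):
    return sum(1 for x, y in validation if model(x) != y) / len(validation)

candidates = {"always_0": make_constant_trainer(0), "always_1": make_constant_trainer(1)}
scores = {name: k_fold_error(data, 10, fn, error_rate) for name, fn in candidates.items()}
best = min(scores, key=scores.get)  # the model with the lowest average error rate
```

In practice each candidate would be a different network topology or hyperparameter setting, and `train_fn` would run back prop on the nine training folds.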

HOW DO I DEPLOY MY NEURAL NETWORK INTO PRODUCTION?
There are two phases.
The training phase can be run on the backend servers.
Cross-validate your model and its hyperparameters on the backend.
Then deploy the model to the frontend servers, browsers, devices.
The frontend only uses forward prop and is always fast.

DEEP LEARNING

WHAT IS DEEP LEARNING?
Deep Learning is a learning method that can train the system with more than 2 or 3 nonlinear hidden layers.

WHAT IS DEEP LEARNING?
Machine learning techniques which enable unsupervised feature learning and pattern analysis/classification.
The essence of deep learning is to compute representations of the data.
Higher-level features are defined from lower-level ones.

HOW IS DEEP LEARNING DIFFERENT FROM REGULAR NEURAL NETWORKS?
Training neural networks requires applying gradient descent on millions of dimensions.
This is intractable for large networks.
Deep learning places constraints on neural networks.
This allows them to be solvable iteratively.
The constraints are generic.

WHAT IS THE BIG DEAL ABOUT IT?
AlexNet, submitted to the ImageNet ILSVRC challenge in 2012, is partly responsible for the renaissance.
Alex Krizhevsky, Ilya Sutskever, and Geoffrey Hinton used Deep Learning techniques.
They combined this with GPUs and some other techniques.
The result was a neural network that could classify images of cats and dogs.
It had a top-5 error rate of about 16%, compared to 26% for the runner-up.

ILYA SUTSKEVER, ALEX KRIZHEVSKY, GEOFFREY HINTON

WHAT ARE THE DIFFERENT KINDS OF DEEP ARCHITECTURES?
Generative
Discriminative
Hybrid

WHAT ARE GENERATIVE ARCHITECTURES?
Extract features from data
Find common features in unlabelled data
Like Principal Component Analysis
Unsupervised: no labels required

WHAT ARE DISCRIMINATIVE ARCHITECTURES?
Classify inputs into classes
Require labels
Require supervised training

WHAT ARE HYBRID ARCHITECTURES?
Combination of generative and discriminative
STEP 1
Extract features using generative network
Use unsupervised learning
STEP 2
Train discriminative network on extracted features
Use supervised learning

WHAT ARE AUTOENCODERS?
An autoencoder is a learning algorithm.
It applies backpropagation and sets the target values to be equal to its inputs.
In other words it trains itself to do the identity transformation.

WHY DOES IT DO THIS?
By placing constraints on it, like restricting the number of hidden neurons, it can find a good representation of the data.
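A toy sketch of this idea: four-dimensional inputs are squeezed through two hidden neurons, and backpropagation uses the input itself as the target. The one-hot data, the bias inputs, the squared-error measure, the learning rate, and the fixed number of passes are all assumptions made for illustration.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def layer(inputs, weights):
    """Each row holds one neuron's weights; the last entry is a bias (an assumption)."""
    xb = inputs + [1.0]
    return [sigmoid(sum(w * v for w, v in zip(row, xb))) for row in weights]

def reconstruction_error(data, enc, dec):
    return sum(0.5 * (o - x) ** 2
               for rec in data
               for o, x in zip(layer(layer(rec, enc), dec), rec))

def train_autoencoder(data, n_hidden, rate=0.5, epochs=2000, seed=0):
    rng = random.Random(seed)
    n = len(data[0])
    enc = [[rng.uniform(-1, 1) for _ in range(n + 1)] for _ in range(n_hidden)]
    dec = [[rng.uniform(-1, 1) for _ in range(n_hidden + 1)] for _ in range(n)]
    for _ in range(epochs):
        for x in data:
            code = layer(x, enc)
            out = layer(code, dec)
            # The target is the input itself, so no labels are needed.
            d_out = [(o - t) * o * (1 - o) for o, t in zip(out, x)]
            d_code = [sum(d_out[k] * dec[k][j] for k in range(n)) * code[j] * (1 - code[j])
                      for j in range(n_hidden)]
            for k in range(n):
                for j, v in enumerate(code + [1.0]):
                    dec[k][j] -= rate * d_out[k] * v
            for j in range(n_hidden):
                for i, v in enumerate(x + [1.0]):
                    enc[j][i] -= rate * d_code[j] * v
    return enc, dec

# Hypothetical data: four one-hot records squeezed through two hidden neurons.
data = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0],
        [0.0, 0.0, 1.0, 0.0], [0.0, 0.0, 0.0, 1.0]]
enc, dec = train_autoencoder(data, n_hidden=2)
codes = [layer(x, enc) for x in data]  # the learned 2-number representation
```

The hidden activations in `codes` are the compressed representation. The decoder weights exist only to force that representation to keep enough information to rebuild the input.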

IS THE AUTOENCODER SUPERVISED OR UNSUPERVISED?

It is unsupervised.
The data is unlabeled.
Autoencoders are similar to PCA (Principal Component Analysis).
PCA is a technique for reducing the dimensions of data.

WHAT ARE CONVOLUTIONAL NEURAL NETWORKS?
Feedforward neural networks.
Connection pattern inspired by visual cortex.

CONVOLUTIONAL NEURAL NETWORKS
The convolution layer's parameters are a set of learnable filters.
Every filter is small along width and height.
During the forward pass, each filter slides across the width and height of the input, producing a 2-dimensional activation map.
As we slide across the input we compute the dot product between the filter and the input.
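The sliding dot product can be sketched as follows. This assumes a single-channel input, a stride of 1, and no padding. The filter values are hand-written here purely for illustration, whereas in a real network they would be learned.

```python
def convolve2d(image, kernel):
    """Slide the filter over the image; each output cell is a dot product."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [[sum(kernel[a][b] * image[r + a][c + b]
                 for a in range(kh) for b in range(kw))
             for c in range(out_w)]
            for r in range(out_h)]

# Hypothetical 4x4 image with a vertical edge, and a hand-written 2x2 edge filter.
image = [[0, 0, 1, 1],
         [0, 0, 1, 1],
         [0, 0, 1, 1],
         [0, 0, 1, 1]]
kernel = [[-1, 1],
          [-1, 1]]
activation_map = convolve2d(image, kernel)
```

The resulting 3x3 activation map is `[[0, 2, 0], [0, 2, 0], [0, 2, 0]]`: the filter responds only in the column where the edge sits, wherever it appears vertically.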

CONVOLUTIONAL NEURAL NETWORKS
Intuitively, the network learns filters that activate when they see a specific type of feature anywhere.
In this way it creates translation invariance.

WHAT IS A POOLING LAYER?
The pooling layer reduces the resolution of the image further.
It tiles the output area with a 2x2 mask and takes the maximum activation value of the area.
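A minimal sketch of 2x2 max pooling on a made-up activation map. Non-overlapping tiles are assumed, and a trailing odd row or column would simply be dropped in this version.

```python
def max_pool_2x2(activation_map):
    """Tile the map with non-overlapping 2x2 windows and keep each window's maximum."""
    rows, cols = len(activation_map), len(activation_map[0])
    return [[max(activation_map[r][c], activation_map[r][c + 1],
                 activation_map[r + 1][c], activation_map[r + 1][c + 1])
             for c in range(0, cols - 1, 2)]
            for r in range(0, rows - 1, 2)]

# Hypothetical 4x4 activation map.
pooled = max_pool_2x2([[1, 3, 2, 0],
                       [4, 2, 1, 1],
                       [0, 0, 5, 6],
                       [1, 2, 7, 8]])
```

The 4x4 map shrinks to the 2x2 map `[[4, 2], [2, 8]]`, halving the resolution in each direction.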

DOES SPARK SUPPORT DEEP LEARNING?
Not directly yet
https://issues.apache.org/jira/browse/SPARK-2352

WHAT ARE SOME MAJOR DEEP LEARNING PLATFORMS?

Theano: Low-level GPU-enabled tensor library.
Lasagne, Blocks: NN libraries that make Theano easier to use.
Torch7: NN library. Uses Lua for binding. Used by Facebook and Google.
Caffe: NN library by the Berkeley Vision and Learning Center (BVLC).
Pylearn2: ML library based on Theano by the LISA lab at Université de Montréal. Google DeepMind.
cuDNN: NN library by Nvidia based on CUDA. Can be used with Torch7, Caffe.
Chainer: NN library that uses CUDA.
TensorFlow: NN library from Google.

WHAT LANGUAGE ARE THESE IN?
All the frameworks support Python.
Except Torch7 which uses Lua for its binding language.

WHAT CAN I DO ON SPARK?
SparkNet: Integrates running Caffe with Spark.
Sparkling Water: Integrates H2O with Spark.
DeepLearning4J: Built on top of Spark.
TensorFlow on Spark (experimental)

QUESTIONS

GALVANIZE DATA ENGINEERING