Let’s learn deep

LET’S LEARN DEEP SHUBHANSHU MISHRA @THESHUBHANSHU

Transcript of Let’s learn deep

Page 1: Let’s learn deep

LET’S LEARN DEEP
SHUBHANSHU MISHRA
@THESHUBHANSHU

Page 2: Let’s learn deep

Some Interesting Results

Image Source: http://colah.github.io/posts/2014-07-NLP-RNNs-Representations/
Mikolov T, Sutskever I, Chen K, Corrado GS, Dean J. Distributed representations of words and phrases and their compositionality. Advances in Neural Information Processing Systems, 2013.

Page 3: Let’s learn deep

Mikolov, T., Sutskever, I., Chen, K., Corrado, G. & Dean, J. Distributed representations of words and phrases and their compositionality. In Proc. Advances in Neural Information Processing Systems 26 3111–3119 (2013).

LeCun Y, Bengio Y, Hinton G. Deep learning. Nature, 2015.

Page 4: Let’s learn deep

http://www.socher.org/uploads/Main/MultipleVectorWordEmbedding.png

Page 5: Let’s learn deep

Zou, Will Y., et al. "Bilingual Word Embeddings for Phrase-Based Machine Translation." EMNLP. 2013.

Page 6: Let’s learn deep

Paraphrase Detection

Socher, Richard, et al. "Dynamic pooling and unfolding recursive autoencoders for paraphrase detection." Advances in Neural Information Processing Systems. 2011.

Page 7: Let’s learn deep

Socher, Richard; Perelygin, Alex; Wu, Jean Y.; Chuang, Jason; Manning, Christopher D.; Ng, Andrew Y.; Potts, Christopher. "Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank." EMNLP 2013.

http://cs.stanford.edu/people/karpathy/deepimagesent/

Page 8: Let’s learn deep

Why Neural Networks?

- The perceptron algorithm can learn to classify linearly separable samples. ALWAYS.

- BUT, how to tackle non-linearity?

Enter NEURAL NETWORKS

- Add a non-linear transform to the data

- ANNs with a single hidden layer can approximate any continuous function [1,2]

- Can be trained through BACKPROPAGATION

http://cs231n.github.io/neural-networks-1/
[1] Cybenko, George. "Approximation by superpositions of a sigmoidal function." Mathematics of Control, Signals and Systems 2.4 (1989): 303-314.
[2] http://neuralnetworksanddeeplearning.com/chap4.html

Page 9: Let’s learn deep

A simple Neural Network

http://ufldl.stanford.edu/wiki/images/thumb/9/99/Network331.png/400px-Network331.png

$\text{loss} = H(f(W, X),\, Y)$

$\text{log loss} = -\sum y \cdot \log(f(W, X))$

$\text{hinge loss} = \sum \max(0,\, 1 - y \cdot f(W, X))$

Train it through back propagation:

$W_t = W_{t-1} - l \cdot \dfrac{\partial\, \text{loss}(W)}{\partial W}$
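A minimal sketch of the loss and update rule above, assuming a one-layer model f(W, X) = sigmoid(XW) trained with log loss in plain NumPy (illustrative, not taken from the slides):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy data: 4 samples, 3 features (last column acts as a bias term), OR-like labels.
X = np.array([[0., 0., 1.],
              [0., 1., 1.],
              [1., 0., 1.],
              [1., 1., 1.]])
y = np.array([0., 1., 1., 1.])

rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=3)
l = 0.5  # learning rate

for step in range(2000):
    p = sigmoid(X @ W)                                         # f(W, X)
    loss = -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))   # log loss
    grad = X.T @ (p - y) / len(y)                              # d loss / d W
    W = W - l * grad                                           # W_t = W_{t-1} - l * grad

print(loss, np.round(sigmoid(X @ W), 2))
```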

Page 10: Let’s learn deep

Types of ANN: Vanilla Feed Forward NN

https://class.coursera.org/neuralnets-2012-001/lecture

Hinton, Geoffrey E. "Learning distributed representations of concepts."Proceedings of the eighth annual conference of the cognitive science society. Vol. 1. 1986.

Page 11: Let’s learn deep

https://class.coursera.org/neuralnets-2012-001/lecture

Page 12: Let’s learn deep

https://class.coursera.org/neuralnets-2012-001/lecture

Page 13: Let’s learn deep

Collobert, Ronan, et al. "Natural language processing (almost) from scratch."The Journal of Machine Learning Research 12 (2011): 2493-2537.

Example of multitasking with NN. Task 1 and Task 2 are two tasks trained with the window approach architecture presented in Figure 1. Lookup tables as well as the first hidden layer are shared. The last layer is task specific. The principle is the same with more than two tasks.

Page 14: Let’s learn deep

AI Question Answering
Counting, Compound Coreference

Factoid Q/A with supporting facts

Weston J, Bordes A, Chopra S, Mikolov T. Towards AI-Complete Question Answering: A Set of Prerequisite Toy Tasks. arXiv, 2015.

Reasoning about agents' motivations

Bordes A, Usunier N, Chopra S, Weston J. Large-scale Simple Question Answering with Memory Networks. arXiv. 2015.

Weston J, Chopra S, Bordes A. Memory Networks. In: International Conference on Learning Representations.; 2015:1-14. http://arxiv.org/abs/1410.3916.

20 tasks in total. A system should solve all tasks, with no task-specific engineering. Memory Networks are used to solve these tasks; an accuracy of ~42% beats the older benchmarks.

http://www.thespermwhale.com/jaseweston/babi/abordes-ICLR.pdf

Page 15: Let’s learn deep

Types of ANN: Recurrent NN

http://colah.github.io/posts/2015-08-Understanding-LSTMs/img/RNN-shorttermdepdencies.png

Learn sequential structures such as sequences of characters, words, audio signals, etc.
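A minimal sketch (assumed, not from the slides) of the recurrent idea: the same weights are reused at every time step, and the hidden state carries information across the sequence:

```python
import numpy as np

def rnn_forward(xs, W_xh, W_hh, b_h):
    """xs: a list of input vectors, one per time step."""
    h = np.zeros(W_hh.shape[0])
    hs = []
    for x in xs:
        h = np.tanh(W_xh @ x + W_hh @ h + b_h)  # h_t = tanh(W_xh x_t + W_hh h_{t-1} + b)
        hs.append(h)
    return hs  # hidden states; e.g. feed the last one to a classifier

# Toy usage: a sequence of 5 random 8-dimensional inputs, 16 hidden units.
rng = np.random.default_rng(0)
xs = [rng.normal(size=8) for _ in range(5)]
W_xh = rng.normal(scale=0.1, size=(16, 8))
W_hh = rng.normal(scale=0.1, size=(16, 16))
b_h = np.zeros(16)
print(rnn_forward(xs, W_xh, W_hh, b_h)[-1].shape)  # (16,)
```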

Page 16: Let’s learn deep

Types of ANN: Recursive NN

http://colah.github.io/posts/2014-07-NLP-RNNs-Representations/img/Bottou-Atree.png

From Machine Learning to Machine Reasoning. Léon Bottou.

Learn arbitrary structures like parse trees.

Page 17: Let’s learn deep

Types of ANN: Convolutional Neural Nets

http://colah.github.io/posts/2014-07-Conv-Nets-Modular/img/Conv-9-Conv2Max2Conv2.png

Learn similar features in different parts of the input.

Used heavily on image data because the same pattern can appear in different parts of the image.
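A minimal sketch (assumed) of the weight-sharing idea behind convolutional nets: a single small filter is slid over every position of the input, so the same feature can be detected anywhere:

```python
import numpy as np

def conv1d(x, w, b=0.0):
    """Valid 1-D convolution/cross-correlation: one filter w applied at every position of x."""
    k = len(w)
    return np.array([np.dot(x[i:i + k], w) + b for i in range(len(x) - k + 1)])

x = np.array([0., 0., 1., 1., 0., 0., 1., 1., 0.])
edge_filter = np.array([-1., 1.])      # responds wherever the signal steps up or down
print(conv1d(x, edge_filter))          # same filter shared across all positions
```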

Page 18: Let’s learn deep

Types of ANN: Auto Encoders

From Machine Learning to Machine Reasoning. Léon Bottou.

http://colah.github.io/posts/2014-07-NLP-RNNs-Representations/img/Bottou-unfold.png

Learn to reconstruct the input
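A minimal sketch (assumed) of an autoencoder: encode the input into a smaller code, decode it back, and train both parts to reduce the reconstruction error:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))                 # toy data: 100 samples, 10 features
W_enc = rng.normal(scale=0.1, size=(10, 4))    # 10 -> 4 bottleneck
W_dec = rng.normal(scale=0.1, size=(4, 10))    # 4 -> 10 reconstruction
lr = 0.1

for _ in range(500):
    code = np.tanh(X @ W_enc)                  # encoder
    X_hat = code @ W_dec                       # decoder (linear)
    err = X_hat - X
    loss = np.mean(err ** 2)                   # reconstruction loss
    # Propagate the reconstruction error back to both layers.
    grad_dec = code.T @ err / len(X)
    grad_code = (err @ W_dec.T) * (1 - code ** 2)   # tanh derivative
    grad_enc = X.T @ grad_code / len(X)
    W_dec -= lr * grad_dec
    W_enc -= lr * grad_enc

print(loss)
```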

Page 19: Let’s learn deep

Types of ANN: RBMs and DBNs

RBM: Restricted Boltzmann Machine
DBN: Deep Belief Network
Generative graphical models

Salakhutdinov, Ruslan, Andriy Mnih, and Geoffrey Hinton. "Restricted Boltzmann machines for collaborative filtering." Proceedings of the 24th international conference on Machine learning. ACM, 2007.

Page 20: Let’s learn deep

What is Deep About Deep Learning?

1. Deep Belief networks

2. RBMs, Auto encoders

3. Convolutional Neural Networks

4. Stacked Auto Encoders

Deeper NNs are helpful because the number of parameters to learn stays of polynomial order, whereas with fewer layers the number of parameters needed to represent the same functions can grow exponentially.

Taigman, Yaniv; Yang, Ming; Ranzato, Marc'Aurelio; Wolf, Lior. "DeepFace: Closing the Gap to Human-Level Performance in Face Verification." Computer Vision and Pattern Recognition (CVPR), 2014 IEEE Conference on. IEEE, 2014.

Page 21: Let’s learn deep

What is Deep Learning?
Like a Lego building exercise.

Stacking various models and propagating the error from the output of the architecture back to each layer.

Solves the issue of feature selection.

Non-linear relationships between features.

Much easier to train a model on large data than to hand-craft features.
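A minimal sketch (assumed, not the author's code) of the Lego idea: each layer is a block with a forward and a backward pass, and blocks are stacked so the output error flows back through every layer:

```python
import numpy as np

class Dense:
    """One Lego block: a fully connected layer with tanh activation."""
    def __init__(self, n_in, n_out, rng):
        self.W = rng.normal(scale=0.1, size=(n_in, n_out))
    def forward(self, x):
        self.x = x
        self.h = np.tanh(x @ self.W)
        return self.h
    def backward(self, grad_out, lr):
        grad_pre = grad_out * (1 - self.h ** 2)   # through tanh
        grad_in = grad_pre @ self.W.T             # error for the block below
        self.W -= lr * self.x.T @ grad_pre        # update this block's weights
        return grad_in

rng = np.random.default_rng(0)
layers = [Dense(5, 8, rng), Dense(8, 3, rng)]     # stack blocks like Lego bricks

x = rng.normal(size=(1, 5))
target = np.array([[0.5, -0.3, 0.1]])
for _ in range(200):
    out = x
    for layer in layers:
        out = layer.forward(out)
    grad = out - target                           # error at the output
    for layer in reversed(layers):
        grad = layer.backward(grad, lr=0.1)       # propagate the error to each layer

print(np.round(out, 2))
```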

Page 22: Let’s learn deep

When Deep Learning?

LARGE DATA

LARGE COMPUTATIONAL RESOURCES

USEFUL QUESTIONS

Page 23: Let’s learn deep

Why were Deep ANN’s in shadows?

There were major challenges in training ANNs:
◦ Need large amounts of data to train (for better function approximation)
◦ More weights to train (standard image classification models have weights in the millions or billions)
◦ Vanishing and exploding gradient problem (for deeper neural networks)

Page 24: Let’s learn deep

What changed?

Algorithms for training ANNs:
◦ Stochastic Gradient Descent (with momentum)
◦ RMSProp
◦ Adam, AdaDelta

Fixes for the vanishing and exploding gradient problems:
◦ LSTM, GRU units (for vanishing gradients)
◦ Gradient clipping (for exploding gradients; see the sketch after this list)

Methods to prevent overfitting:
◦ Regularization
◦ Dropout
◦ Adversarial networks

Computation resources:
◦ GPU computing
◦ HPC, MPI

Larger datasets:
◦ ImageNet (for image classification)
◦ Google Billion Words Corpus (for automatically generated word vectors)

Methods to gain sparsity:
◦ Dropout
◦ ReLU, MaxOut activations
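A minimal sketch (assumed) of two of the fixes above: classical SGD with momentum, and gradient clipping to keep exploding gradients in check:

```python
import numpy as np

def sgd_momentum_step(w, grad, velocity, lr=0.01, mu=0.9):
    """Classical momentum: keep a running velocity of past gradients."""
    velocity = mu * velocity - lr * grad
    return w + velocity, velocity

def clip_gradient(grad, max_norm=5.0):
    """Rescale the gradient if its norm exceeds max_norm (exploding gradients)."""
    norm = np.linalg.norm(grad)
    return grad * (max_norm / norm) if norm > max_norm else grad

w = np.zeros(3)
velocity = np.zeros(3)
grad = np.array([30.0, -40.0, 0.0])          # an unusually large gradient
w, velocity = sgd_momentum_step(w, clip_gradient(grad), velocity)
print(w)                                      # the clipped, momentum-damped step
```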

Page 25: Let’s learn deep

Machine Learning to Neural Networks

MACHINE LEARNING METHODS

Deterministic Models
◦ Linear Regression
◦ Logistic Regression
◦ SVM
◦ CRF

Generative Models
◦ HMM
◦ LDA
◦ Collaborative Filtering

Unsupervised
◦ K-means
◦ Hierarchical Clustering

NEURAL NETWORK METHODS

Deterministic Models
◦ ANN with squared error loss
◦ ANN with softmax layer and log loss
◦ ANN with hinge loss
◦ RNN with prediction at the end

Generative Models
◦ RNN generating sequences
◦ RBMs

Unsupervised
◦ Auto Encoders
◦ RBMs
◦ Deep Belief Networks

Page 26: Let’s learn deep

LITTLE MATH (OPTIONAL)

Page 27: Let’s learn deep

Loss Functions & Optimization

The idea: for some f(W, X), minimize the loss between y and f(W, X). This is done using a loss function; a major one is log-loss.

RMSProp, AdaGrad, and AdaDelta are used in high-performance networks.
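A minimal sketch (assumed) of the RMSProp update mentioned above: each gradient is divided by a running average of its recent magnitudes, giving every parameter its own effective step size:

```python
import numpy as np

def rmsprop_step(w, grad, cache, lr=0.001, decay=0.9, eps=1e-8):
    cache = decay * cache + (1 - decay) * grad ** 2   # running average of squared gradients
    w = w - lr * grad / (np.sqrt(cache) + eps)        # per-parameter step size
    return w, cache

w, cache = np.zeros(2), np.zeros(2)
for grad in [np.array([1.0, 100.0]), np.array([1.5, 80.0])]:
    w, cache = rmsprop_step(w, grad, cache)
print(w)   # both parameters move at comparable speeds despite very different gradients
```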

Page 28: Let’s learn deep

Open Questions

Autoencoders for text data

AI Question Answering

Sentiment analysis of sarcasm

Page 29: Let’s learn deep

Collaborate

SemEval 2016 is coming up and there are tasks like:
◦ Sentiment analysis
◦ Question Answering
◦ http://alt.qcri.org/semeval2016/task4/

Page 30: Let’s learn deep

the didbend first water.bond warmerial in roid.the lagents to duttersprantessi harkian, arow ... with enkyber fanter-indoug tood cool... the summer small winding skates the moutledday markedgly searl.doupy of it your sold all ic house bat she - etther of thouder fol my old starsgream trains ond cat out the song"saurand shide of gres dewill a now centher mother of at, the creaking passs cool sunsing sapcingatale dowthing aland suncaking in.do a back-end stliagh in in ithicn like into whereso to the touther pate patin on' gal on the aloopmesaterfleoss the sound i lean

I andhe had begetter by His husband, brought unto a hundred cruelings,shrouded me, pierced Arjuna, on thy foe, proud directions and urged bySatyaki in the heart as the filled hill with his flying poison. Untothy host, called Earth, recognise him, by means of her abode, 'Thou shaltconquer thy car is in all kinds of righteousness. Whatever I is filledwith respect. In thee enjoyment will iniunto that Kshatriya enjoys verilyto that as to him that I have now take for me of Kuru's race.'"

SECTION LXXXVIII

"Drona said, 'Renounced still, thou art my great science and foreholder,thou wilt, O best of men, go now, may be said to be Pandu. Persons offooly acts also may injury With regions of entirety? Thou art the deterioryfrom this point of desire. There should be and enjoyeth rites defeatedby the world meet with without injury.

Page 31: Let’s learn deep

THANK YOU =)
MANY OF THE RESOURCES USED CAN BE FOUND AT: HTTP://SHUBHANSHU.COM/DEEPLEARNING.HTML