PhD in Computer Science and Engineering, Bologna, April 2016

Machine Learning

Marco Lippi

marco.lippi3@unibo.it

Marco Lippi Machine Learning 1 / 80

Recurrent Neural Networks


Material and references

Many figures are taken from

Christopher Olah’s tutorial (colah.github.io)

and

Andrej Karpathy’s post “The Unreasonable Effectiveness of Recurrent Neural Networks”


Processing sequences

Many tasks involve variable-size input (text, audio, video), while most algorithms can only handle fixed-size input


Processing sequences

1. Fixed-size input and output: e.g., standard tasks such as image classification


Processing sequences

2. Fixed input, sequence output: e.g., image captioning


Processing sequences

3. Sequence classification: e.g., sentiment classification


Processing sequences

4. Sequence input/output: e.g., machine translation


Processing sequences

5. Synced sequence input/output: e.g., video frame classification


Processing sequences

One possibility: use convolutional neural networks with max pooling over all feature maps

[Figure by Kim, 2014]

Problem: translation invariance...


Processing sequences

In a recurrent neural network, the hidden state at time t depends

on the input at time t

on the hidden state at time t − 1 (memory)

h_t = f_W(h_{t−1}, x_t)

y_t = f_V(h_t)


Processing sequences

We can parametrize:

input-hidden connections

hidden-hidden connections

hidden-output connections

h_t = tanh(W_hh · h_{t−1} + W_xh · x_t)

y_t = W_hy · h_t
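A minimal sketch of this parametrized RNN in NumPy (the dimensions, random initialization, and variable names below are illustrative assumptions, not from the slides):

```python
import numpy as np

# Vanilla RNN forward pass mirroring the equations above:
#   h_t = tanh(W_hh h_{t-1} + W_xh x_t),  y_t = W_hy h_t
# All sizes are illustrative.
rng = np.random.default_rng(0)
input_dim, hidden_dim, output_dim = 4, 8, 3

W_xh = rng.standard_normal((hidden_dim, input_dim)) * 0.1   # input-hidden
W_hh = rng.standard_normal((hidden_dim, hidden_dim)) * 0.1  # hidden-hidden
W_hy = rng.standard_normal((output_dim, hidden_dim)) * 0.1  # hidden-output

def rnn_forward(xs):
    """Run the RNN over a sequence of input vectors; return outputs and final state."""
    h = np.zeros(hidden_dim)
    ys = []
    for x in xs:
        h = np.tanh(W_hh @ h + W_xh @ x)  # the hidden state carries the memory
        ys.append(W_hy @ h)               # per-step output
    return ys, h

xs = [rng.standard_normal(input_dim) for _ in range(5)]
ys, h_final = rnn_forward(xs)
```

Note that the same three weight matrices are reused at every time step, which is what lets the model process sequences of any length.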


Recurrent Neural Network Language Model

[Figure by Mikolov et al., 2010]

x(t) = [w(t); s(t − 1)]  (current word vector concatenated with the previous state)

s_j(t) = f(Σ_i x_i(t) · u_ji)

y_k(t) = g(Σ_j s_j(t) · v_kj)

Compared to the feed-forward NNLM, there is no need to specify the context size in advance!
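A sketch of a single step of this language model, assuming NumPy, a toy vocabulary, a sigmoid for f, and a softmax for g (all names and sizes are illustrative):

```python
import numpy as np

# One step of a Mikolov-style RNN language model: the input is the
# concatenation of the current word (1-of-N encoded) and the previous
# state; f is a sigmoid, g a softmax. Sizes are illustrative.
rng = np.random.default_rng(4)
vocab_size, state_dim = 10, 5

U = rng.standard_normal((state_dim, vocab_size + state_dim)) * 0.1  # weights u_ji
V = rng.standard_normal((vocab_size, state_dim)) * 0.1              # weights v_kj

def rnnlm_step(word_id, s_prev):
    w = np.zeros(vocab_size)
    w[word_id] = 1.0                       # 1-of-N encoding of w(t)
    x = np.concatenate([w, s_prev])        # x(t) = [w(t); s(t-1)]
    s = 1.0 / (1.0 + np.exp(-(U @ x)))     # sigmoid hidden state s(t)
    y = np.exp(V @ s)
    return s, y / y.sum()                  # softmax distribution over the next word

s = np.zeros(state_dim)
s, p = rnnlm_step(3, s)
```

At training time the target is the next word in the corpus; at prediction time the softmax output p gives the probability of every candidate next word.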


Recurrent Neural Networks (RNNs)

A classic RNN can be unrolled through time, so that the looping connections are made explicit


Recurrent Neural Networks (RNNs)

Backpropagation Through Time (BPTT)

[Figure from Wikipedia]


Recurrent Neural Networks (RNNs)

BPTT drawbacks:

the truncation length k for unfolding must be chosen

exploding or vanishing gradients

exploding gradients can be controlled with gradient clipping

vanishing gradients must be addressed with different models (LSTM)
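Clipping by global norm, the usual implementation of the gradient-clipping remedy above, can be sketched as follows (the threshold 5.0 is a common heuristic, not a value from the slides):

```python
import numpy as np

# Gradient clipping by global norm: if the norm of all gradients taken
# together exceeds a threshold, rescale them so the norm equals it.
# `grads` stands in for the gradients of all model parameters.

def clip_by_global_norm(grads, max_norm=5.0):
    total_norm = np.sqrt(sum(np.sum(g ** 2) for g in grads))
    if total_norm > max_norm:
        scale = max_norm / total_norm
        grads = [g * scale for g in grads]   # rescale every gradient uniformly
    return grads, total_norm

# toy "exploded" gradients
grads = [np.full((3, 3), 10.0), np.full((3,), 10.0)]
clipped, norm = clip_by_global_norm(grads)
```

After clipping, the global norm of `clipped` equals the threshold, while the direction of the update is preserved.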



Recurrent Neural Networks (RNNs)

Short- or mid-term dependencies can be handled...


Recurrent Neural Networks (RNNs)

But the model fails to learn long-term dependencies!

The main problem is vanishing gradients in BPTT...


Long Short-Term Memory Networks (LSTMs)

An LSTM is basically an RNN with a different computational block

LSTMs were designed by Hochreiter & Schmidhuber in 1997!


Long Short-Term Memory Networks (LSTMs)

An LSTM block features “a memory cell which can maintain itsstate over time, and non-linear gating units which regulate theinformation flow into and out of the cell” [Greff et al., 2015]


Long Short-Term Memory Networks (LSTMs)

Cell state

Just lets information flow through the network

The gates can optionally let new information through


Long Short-Term Memory Networks (LSTMs)

Forget gate

Sigmoid layer that produces weights for the cell state C_{t−1}

Decides what to keep (1) or forget (0) of the past cell state

f_t = σ(W_f · [h_{t−1}, x_t] + b_f)


Long Short-Term Memory Networks (LSTMs)

Input gate

Allows novel information to be used to update the cell state C_{t−1}

i_t = σ(W_i · [h_{t−1}, x_t] + b_i)

C̃_t = tanh(W_C · [h_{t−1}, x_t] + b_C)


Long Short-Term Memory Networks (LSTMs)

Cell state update

Combine the old state (after forgetting) with the novel input

C_t = f_t ⊙ C_{t−1} + i_t ⊙ C̃_t


Long Short-Term Memory Networks (LSTMs)

Output gate

Build the output to be sent to the upper layer and to the next time step

o_t = σ(W_o · [h_{t−1}, x_t] + b_o)

h_t = o_t ⊙ tanh(C_t)


Long Short-Term Memory Networks (LSTMs)

Putting it all together

f_t = σ(W_f · [h_{t−1}, x_t] + b_f)

i_t = σ(W_i · [h_{t−1}, x_t] + b_i)

C̃_t = tanh(W_C · [h_{t−1}, x_t] + b_C)

C_t = f_t ⊙ C_{t−1} + i_t ⊙ C̃_t

o_t = σ(W_o · [h_{t−1}, x_t] + b_o)

h_t = o_t ⊙ tanh(C_t)
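The equations above translate almost line by line into a NumPy step function. A sketch under illustrative assumptions: each gate gets its own weight matrix acting on the concatenation [h_{t−1}, x_t], and sizes and initialization are arbitrary choices.

```python
import numpy as np

# One LSTM step implementing the equations above. sigma is the logistic
# sigmoid; * is the elementwise (Hadamard) product.
rng = np.random.default_rng(1)
input_dim, hidden_dim = 4, 6

def sigma(z):
    return 1.0 / (1.0 + np.exp(-z))

# one weight matrix per gate / candidate, acting on [h_{t-1}, x_t]
W_f, W_i, W_o, W_C = (rng.standard_normal((hidden_dim, hidden_dim + input_dim)) * 0.1
                      for _ in range(4))
b_f, b_i, b_o, b_C = (np.zeros(hidden_dim) for _ in range(4))

def lstm_step(h_prev, C_prev, x):
    z = np.concatenate([h_prev, x])
    f = sigma(W_f @ z + b_f)           # forget gate
    i = sigma(W_i @ z + b_i)           # input gate
    C_tilde = np.tanh(W_C @ z + b_C)   # candidate cell state
    C = f * C_prev + i * C_tilde       # cell state update
    o = sigma(W_o @ z + b_o)           # output gate
    h = o * np.tanh(C)                 # new hidden state
    return h, C

h = np.zeros(hidden_dim)
C = np.zeros(hidden_dim)
for _ in range(3):
    h, C = lstm_step(h, C, rng.standard_normal(input_dim))
```

The key difference from the vanilla RNN is the additive update of C_t, which gives gradients a path through time that is not repeatedly squashed.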


Application: text generation

Example: character-level language model

[Figure by A. Karpathy]


Application: text generation

Example: character-level language model

Pick a (large) plain-text file

Feed the LSTM with the text

Predict the next character given the past history

At prediction time, sample from the output distribution

Get LSTM-generated text
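The last two steps can be sketched as follows. The trained network is replaced by a hypothetical stub `step` so the example is self-contained, and the temperature knob is a common extension, not something from the slides.

```python
import numpy as np

# Generation loop for a character-level language model: feed the last
# sampled character back in, get a distribution over the next one, sample.
rng = np.random.default_rng(2)
vocab = list("abcdefgh ")

def step(h, char_id):
    """Hypothetical trained RNN/LSTM step: returns (new_state, logits)."""
    h = np.tanh(h + 0.1 * char_id)
    return h, rng.standard_normal(len(vocab))

def sample_text(n_chars, temperature=1.0):
    h, char_id, out = np.zeros(8), 0, []
    for _ in range(n_chars):
        h, logits = step(h, char_id)
        p = np.exp(logits / temperature)
        p /= p.sum()                          # softmax over the vocabulary
        char_id = rng.choice(len(vocab), p=p) # sample, don't take the argmax
        out.append(vocab[char_id])
    return "".join(out)

text = sample_text(20)
```

Sampling (rather than always taking the most likely character) is what makes the generated text varied; a temperature below 1.0 makes it more conservative, above 1.0 more surprising.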


[Figure by A. Karpathy]


Application: text generation

Interpretation of cells (blue=off, red=on)

[Figure by A. Karpathy]


Application: text generation

Example: training with Shakespeare poems

[Figure by A. Karpathy]


Application: text generation

Example: training with LaTeX math papers

[Figure by A. Karpathy]


Application: text generation

Example: training with Linux Kernel (C code)

[Figure by A. Karpathy]


Application: image captioning

[Figure by A. Karpathy]


Application: handwriting text generation

[Figure by Graves, 2014]


Variants of the LSTM model

Many different LSTM variants have been proposed:

Peephole connections (almost standard now): all gates can also look at the cell state

Coupled forget and input gates: basically, we input only when we forget

Gated Recurrent Units (GRUs): a single gate controls forgetting and update

See “LSTM: A Search Space Odyssey” by Greff et al., 2015
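As a sketch of the GRU variant mentioned above: a reset gate r controls how much past state enters the candidate, and a single update gate z interpolates between the old state and that candidate (NumPy, with illustrative sizes and initialization):

```python
import numpy as np

# One GRU step: z is the update gate, r the reset gate, h_tilde the
# candidate state. No separate cell state, unlike the LSTM.
rng = np.random.default_rng(3)
input_dim, hidden_dim = 4, 6

def sigma(v):
    return 1.0 / (1.0 + np.exp(-v))

W_z, W_r, W_h = (rng.standard_normal((hidden_dim, hidden_dim + input_dim)) * 0.1
                 for _ in range(3))

def gru_step(h_prev, x):
    z = sigma(W_z @ np.concatenate([h_prev, x]))              # update gate
    r = sigma(W_r @ np.concatenate([h_prev, x]))              # reset gate
    h_tilde = np.tanh(W_h @ np.concatenate([r * h_prev, x]))  # candidate state
    return (1 - z) * h_prev + z * h_tilde                     # interpolate old/new

h = np.zeros(hidden_dim)
h = gru_step(h, rng.standard_normal(input_dim))
```

With fewer gates and no separate cell state, the GRU has fewer parameters than the LSTM while often performing comparably.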


Recursive Neural Networks (RecNNs)

A generalization of RNNs to handle structured data in the form ofa dependency graph such as a tree

[Figure by Socher et al., 2014]

RNNs can be seen as RecNNs with a linear-chain structure.


Recursive Neural Networks (RecNNs)

Exploit compositionality (e.g., parse trees in text)

[Figure by R. Socher]


Recursive Neural Networks (RecNNs)

Exploit compositionality (e.g., object parts in images)

[Figure by Socher et al., 2011]


Recursive Neural Tensor Networks (RNTNs)

Proposed in 2013 at Stanford for sentiment analysis

composition function over the sentence parse tree

exploits parameters (tensors) that are shared across all nodes

(tensor) backpropagation through structure

Later (2014-2015), tree-structured versions of LSTMs were also proposed


Recursive Neural Tensor Networks (RNTNs)

Sentiment Treebank built by crowdsourcing

11,855 sentences from movie reviews

215,154 labeled phrases (sub-sentences)

[Figure by Socher et al., 2014]


Recursive Neural Tensor Networks (RNTNs)

[Figure by Socher et al., 2014]


Recursive Neural Tensor Networks (RNTNs)

Very nice model, but:

not easy to adapt to other domains (Reddit, Twitter, etc.)

the parse tree must be computed in advance

requires labels at the phrase level (expensive)


Other applications


Modeling speech

Several tasks. . .

Speech recognition

Speaker identification

Speaker gender classification

Phone classification

Music genre classification

Artist classification

Music retrieval


Modeling speech

Feature extraction from acoustic signals... tons of unsupervised data!

[Figure by H. Lee]


Multimodal Deep Learning

[Figure by Ngiam et al.]

Can we gain something from shared representations? Most probably yes!


Other applications: Bioinformatics, Neuroscience, etc...

Protein structure prediction (contact maps)

Deep Spatio-Temporal Architectures [Di Lena et al., 2012]

Sequence: ASCDEVVGSACH...CPPGAERMMAYGV

[Figure by Rafferty et al.]


Other applications: Bioinformatics, Neuroscience, etc...

Predicting the aqueous solubility of drug-like molecules

Recursive neural networks [Lusci et al., 2013]

[Figure by Lusci et al.]


Ongoing Research and Concluding Remarks


Summary

Deep learning has obtained breakthrough results in many tasks

exploiting unsupervised data

learning feature hierarchies

some optimization tricks (dropout, rectification, ...)

Is this the solution to all AI problems? Probably not, but...

for certain types of tasks it is hard to compete

huge datasets and many computational resources are needed

big companies will likely play the major role

huge space for applications built upon deep learning systems

...But what is missing?


A nice overview of methods

[Figure by M.A. Ranzato]

We are not yet there...

Still far from human-level performance in unrestricted domains...


Again: symbolic vs. sub-symbolic

Still basically a sub-symbolic approach

Representation learning: a step towards symbols...

What are the connections with symbolic approaches ?

What about logic and reasoning ?

Some steps forward:

bAbI tasks @ Facebook

Neural Conversational Models @ Google

Neural Turing Machines (NTMs) @ Google DeepMind


bAbI tasks and related datasets (Facebook)

Text understanding and reasoning with deep networks

The (20) bAbI tasks

The Children’s Book Test

The Movie Dialog dataset

The SimpleQuestions dataset


bAbI tasks (Facebook)

[Table by Weston et al.]


Children’s Book Test

[Table by Hill et al., 2016]


Movie Dialog dataset

[Table by Dodge et al., 2016]


SimpleQuestions dataset

[Table by Bordes et al., 2015]


Neural Conversational Model (Google)

[Table by Vinyals & Le, 2015]


Neural Turing Machines (Google Deepmind)

[Figure by Graves et al., 2014]

Inspired by Turing Machines

A memory (tape) read from and written to through deep networks

Capable of learning “simple” algorithms (e.g., sorting)


Learning concepts

Examples from the ImageNet category “Restaurant”

Is it enough to learn the concept of a restaurant?


Software packages and a few references


TensorFlow

Developed by Google

Python

Computational graph abstraction

Parallelize over both data and model

The multi-machine part is not open source

Nice tool to visualize stuff (TensorBoard)

A little slower than other systems

Provides only a few (one?) pre-trained models


Theano

University of Montreal (Yoshua Bengio’s group)

Python

Computational graph abstraction

The code is somewhat difficult (low-level)

Long compile times

Easy GPU and multi-GPU support


Keras and Lasagne

Keras is a wrapper for Theano or Tensorflow

Python

Plug-and-play layers, losses, optimizers, etc.

Sometimes error messages can be cryptic...

Lasagne is a wrapper for Theano

Python

Plug-and-play layers, losses, optimizers, etc.

Model zoo with plenty of pre-trained architectures

Still employs some symbolic computation, as in plain Theano


Torch

Facebook, Google, Twitter, IDIAP, . . .

mostly written in Lua and C

shares similarities with Python (e.g., tensors vs. NumPy arrays)

module nn to train neural networks

the same code runs on both CPU and GPU

One notable product built with Torch: OverFeat, a CNN for image classification, object detection, etc.


Caffe

Berkeley Vision and Learning Center (BVLC)

written in C++

Python and MATLAB bindings (although not much documented)

very popular for CNNs, not much for RNNs. . .

quite a large model zoo (AlexNet, GoogLeNet, ResNet, . . . )

scripts for training without writing code

Just to give an idea of Caffe’s performance...

During training ∼60M images per day with a single GPU

At test time ∼1 ms/image


MATLAB

Plenty of resources

Code by G. Hinton on RBMs and DBNs (easy to try!)

Autoencoders (many different implementations)

Convolutional neural networks

. . .


Websites and references

http://deeplearning.net

http://www.cs.toronto.edu/~hinton/

http://www.iro.umontreal.ca/~bengioy/

http://yann.lecun.com/

Introductory paper by Yoshua Bengio:“Learning Deep Architectures for AI”


Videos

Geoffrey Hinton on DBNs:https://www.youtube.com/watch?v=AyzOUbkUf3M

Yoshua Bengio (Deep Learning lectures):https://www.youtube.com/watch?v=JuimBuvEWBg

Yann LeCun (CNNs, energy-based models):https://www.youtube.com/watch?v=oOB4evKlEmQ
