Machine Learning and Computer Vision Group...Computer graphics motivation - lters for...

29
Machine Learning and Computer Vision Group Deep Learning with TensorFlow http://cvml.ist.ac.at/courses/DLWT_W18 Lecture 5: Convolutional Networks

Transcript of Machine Learning and Computer Vision Group...Computer graphics motivation - lters for...

Page 1: Machine Learning and Computer Vision Group...Computer graphics motivation - lters for capturingfeaturesof an image like edge detection etc. multiply each pixel and its neighbours by

Machine Learning and Computer Vision Group

Deep Learning with TensorFlowhttp://cvml.ist.ac.at/courses/DLWT_W18

Lecture 5:Convolutional Networks

Page 2: Machine Learning and Computer Vision Group...Computer graphics motivation - lters for capturingfeaturesof an image like edge detection etc. multiply each pixel and its neighbours by

Convolutional Neural Networks

Martin Topfer

IST Austria

December 17, 2018

Martin Topfer (IST Austria) Convolutional Neural Networks December 17, 2018 1 / 14

Page 3: Machine Learning and Computer Vision Group...Computer graphics motivation - lters for capturingfeaturesof an image like edge detection etc. multiply each pixel and its neighbours by

Outline

1 Motivation

2 Convolution Layer

3 Pooling Layer

4 CNN Architecture

Martin Topfer (IST Austria) Convolutional Neural Networks December 17, 2018 2 / 14

Page 4: Machine Learning and Computer Vision Group...Computer graphics motivation - lters for capturingfeaturesof an image like edge detection etc. multiply each pixel and its neighbours by

Biological motivation

1958: David Hubel, Torsten Weisel: neurons in visual cortex haveoften small local receptive field

some reacts only to certain shapes

Some neurons have larger receptive field

they recognize more complex patternscombines outputs of lower-level neurons

hskld

Martin Topfer (IST Austria) Convolutional Neural Networks December 17, 2018 3 / 14

Page 5: Machine Learning and Computer Vision Group...Computer graphics motivation - lters for capturingfeaturesof an image like edge detection etc. multiply each pixel and its neighbours by

Biological motivation

1958: David Hubel, Torsten Weisel: neurons in visual cortex haveoften small local receptive field

some reacts only to certain shapes

Some neurons have larger receptive field

they recognize more complex patternscombines outputs of lower-level neurons

hskld

Martin Topfer (IST Austria) Convolutional Neural Networks December 17, 2018 3 / 14

Page 6: Machine Learning and Computer Vision Group...Computer graphics motivation - lters for capturingfeaturesof an image like edge detection etc. multiply each pixel and its neighbours by

Computer graphics motivation - filters

for capturing features of an image like edge detection etc.

multiply each pixel and its neighbours by some matrix

Martin Topfer (IST Austria) Convolutional Neural Networks December 17, 2018 4 / 14

Page 7: Machine Learning and Computer Vision Group...Computer graphics motivation - lters for capturingfeaturesof an image like edge detection etc. multiply each pixel and its neighbours by

ML motivation - fully-connected networks disadvantages

not using structure of the input data

2D picture treated as 1D vector

a lot of parameters to learn

especcially for larger input

locality

pattern recognized on a particular place is not recognized elsewhere

Martin Topfer (IST Austria) Convolutional Neural Networks December 17, 2018 5 / 14

Page 8: Machine Learning and Computer Vision Group...Computer graphics motivation - lters for capturingfeaturesof an image like edge detection etc. multiply each pixel and its neighbours by

Convolutional Layer

Each neuron is connect only toits receptive field = smallrectangle in previous layer

1st layer: low-level features

2nd layer: higher-level features

...

http://www.cvc.uab.es/people/joans/slides_tensorflow/tensorflow_html/layers.html

http://index-of.es/Varios2/HandsonMachineLearningwithScikitLearnandTensorflow.pdf

Martin Topfer (IST Austria) Convolutional Neural Networks December 17, 2018 6 / 14

Page 9: Machine Learning and Computer Vision Group...Computer graphics motivation - lters for capturingfeaturesof an image like edge detection etc. multiply each pixel and its neighbours by

Convolutional Layer

Each neuron is connect only toits receptive field = smallrectangle in previous layer

1st layer: low-level features

2nd layer: higher-level features

...

http://www.cvc.uab.es/people/joans/slides_tensorflow/tensorflow_html/layers.html

http://index-of.es/Varios2/HandsonMachineLearningwithScikitLearnandTensorflow.pdf

Martin Topfer (IST Austria) Convolutional Neural Networks December 17, 2018 6 / 14

Page 10: Machine Learning and Computer Vision Group...Computer graphics motivation - lters for capturingfeaturesof an image like edge detection etc. multiply each pixel and its neighbours by

Convolutional Layer

Each neuron is connect only toits receptive field = smallrectangle in previous layer

1st layer: low-level features

2nd layer: higher-level features

...

http://www.cvc.uab.es/people/joans/slides_tensorflow/tensorflow_html/layers.html

http://index-of.es/Varios2/HandsonMachineLearningwithScikitLearnandTensorflow.pdf

Martin Topfer (IST Austria) Convolutional Neural Networks December 17, 2018 6 / 14

Page 11: Machine Learning and Computer Vision Group...Computer graphics motivation - lters for capturingfeaturesof an image like edge detection etc. multiply each pixel and its neighbours by

Convolutional Layer

Each neuron is connect only toits receptive field = smallrectangle in previous layer

1st layer: low-level features

2nd layer: higher-level features

...

http://www.cvc.uab.es/people/joans/slides_tensorflow/tensorflow_html/layers.html

http://index-of.es/Varios2/HandsonMachineLearningwithScikitLearnandTensorflow.pdf

Martin Topfer (IST Austria) Convolutional Neural Networks December 17, 2018 6 / 14

Page 12: Machine Learning and Computer Vision Group...Computer graphics motivation - lters for capturingfeaturesof an image like edge detection etc. multiply each pixel and its neighbours by

Convolutional Layer

Each neuron is connect only toits receptive field = smallrectangle in previous layer

1st layer: low-level features

2nd layer: higher-level features

...

http://www.cvc.uab.es/people/joans/slides_tensorflow/tensorflow_html/layers.html

http://index-of.es/Varios2/HandsonMachineLearningwithScikitLearnandTensorflow.pdf

Martin Topfer (IST Austria) Convolutional Neural Networks December 17, 2018 6 / 14

Page 13: Machine Learning and Computer Vision Group...Computer graphics motivation - lters for capturingfeaturesof an image like edge detection etc. multiply each pixel and its neighbours by

Convolutional Layer

Each neuron is connect only toits receptive field = smallrectangle in previous layer

1st layer: low-level features

2nd layer: higher-level features

...

http://www.cvc.uab.es/people/joans/slides_tensorflow/tensorflow_html/layers.html

http://index-of.es/Varios2/HandsonMachineLearningwithScikitLearnandTensorflow.pdf

Martin Topfer (IST Austria) Convolutional Neural Networks December 17, 2018 6 / 14

Page 14: Machine Learning and Computer Vision Group...Computer graphics motivation - lters for capturingfeaturesof an image like edge detection etc. multiply each pixel and its neighbours by

Convolutional Layer

Each neuron is connect only toits receptive field = smallrectangle in previous layer

1st layer: low-level features

2nd layer: higher-level features

...

http://www.cvc.uab.es/people/joans/slides_tensorflow/tensorflow_html/layers.html

http://index-of.es/Varios2/HandsonMachineLearningwithScikitLearnandTensorflow.pdf

Martin Topfer (IST Austria) Convolutional Neural Networks December 17, 2018 6 / 14

Page 15: Machine Learning and Computer Vision Group...Computer graphics motivation - lters for capturingfeaturesof an image like edge detection etc. multiply each pixel and its neighbours by

Convolutional Layer

Each neuron is connect only toits receptive field = smallrectangle in previous layer

1st layer: low-level features

2nd layer: higher-level features

...

http://www.cvc.uab.es/people/joans/slides_tensorflow/tensorflow_html/layers.html

http://index-of.es/Varios2/HandsonMachineLearningwithScikitLearnandTensorflow.pdf

Martin Topfer (IST Austria) Convolutional Neural Networks December 17, 2018 6 / 14

Page 16: Machine Learning and Computer Vision Group...Computer graphics motivation - lters for capturingfeaturesof an image like edge detection etc. multiply each pixel and its neighbours by

Convolutional Layer

Each neuron is connect only toits receptive field = smallrectangle in previous layer

1st layer: low-level features

2nd layer: higher-level features

...

http://www.cvc.uab.es/people/joans/slides_tensorflow/tensorflow_html/layers.html

http://index-of.es/Varios2/HandsonMachineLearningwithScikitLearnandTensorflow.pdf

Martin Topfer (IST Austria) Convolutional Neural Networks December 17, 2018 6 / 14

Page 17: Machine Learning and Computer Vision Group...Computer graphics motivation - lters for capturingfeaturesof an image like edge detection etc. multiply each pixel and its neighbours by

Convolutional Layer

Each neuron is connect only toits receptive field = smallrectangle in previous layer

1st layer: low-level features

2nd layer: higher-level features

...

http://www.cvc.uab.es/people/joans/slides_tensorflow/tensorflow_html/layers.html

http://index-of.es/Varios2/HandsonMachineLearningwithScikitLearnandTensorflow.pdf

Martin Topfer (IST Austria) Convolutional Neural Networks December 17, 2018 6 / 14

Page 18: Machine Learning and Computer Vision Group...Computer graphics motivation - lters for capturingfeaturesof an image like edge detection etc. multiply each pixel and its neighbours by

What does neuron do?

each neuron outputsmultiplication of its receptionfield by a matrix = filter

followed by activationfunction (oftenly ReLU)

the same filter for all neurons

filters are learned

feature map

zi ,j = a

(bk +

wk∑u=0

hk∑v=0

xi+u,j+v · wu,v

)zi,j = output of filter in position i , j ; wk , hk = width and height of the filter;wu,v = weights of filter; xi,j = input from previous layer in position i , j ;bk = bias term; a = activation function

towardsdatascience.com/applied-deep-learning-part-4-convolutional-neural-networks-584bc134c1e2

Martin Topfer (IST Austria) Convolutional Neural Networks December 17, 2018 7 / 14

Page 19: Machine Learning and Computer Vision Group...Computer graphics motivation - lters for capturingfeaturesof an image like edge detection etc. multiply each pixel and its neighbours by

Stacking feature maps

more filters in each conv. layer

filter uses all feature maps fromprevious layer

zi,j,k = output of a filter k in position i , jwk , hk = width and height of the filternl−1 = num. of f. maps in previous layerwu,v,k′,k = weights of filter kxi,j,k′ = output of a filter k ′ from previous layer inposition i , jbk = bias of filter ka = activation function

zi ,j ,k = a

(bk +

wk∑u=0

hk∑v=0

nl−1∑k ′=0

xi+u,j+v ,k ′ · wu,v ,k ′,k

)

http://index-of.es/Varios2/HandsonMachineLearningwithScikitLearnandTensorflow.pdf

Martin Topfer (IST Austria) Convolutional Neural Networks December 17, 2018 8 / 14

Page 20: Machine Learning and Computer Vision Group...Computer graphics motivation - lters for capturingfeaturesof an image like edge detection etc. multiply each pixel and its neighbours by

Padding

How to behave on the border?

VALID

new layer slightly smaller

less common

SAME

new layer of the same size

add zeros around

http://www.cvc.uab.es/people/joans/slides_tensorflow/tensorflow_html/layers.html

Martin Topfer (IST Austria) Convolutional Neural Networks December 17, 2018 9 / 14

Page 21: Machine Learning and Computer Vision Group...Computer graphics motivation - lters for capturingfeaturesof an image like edge detection etc. multiply each pixel and its neighbours by

Strides

strides = [1, 1, 1, 1] strides = [1, 2, 2, 1]

http://www.cvc.uab.es/people/joans/slides_tensorflow/tensorflow_html/layers.html

Martin Topfer (IST Austria) Convolutional Neural Networks December 17, 2018 10 / 14

Page 22: Machine Learning and Computer Vision Group...Computer graphics motivation - lters for capturingfeaturesof an image like edge detection etc. multiply each pixel and its neighbours by

Strides

strides = [1, 1, 1, 1] strides = [1, 2, 2, 1]

http://www.cvc.uab.es/people/joans/slides_tensorflow/tensorflow_html/layers.html

Martin Topfer (IST Austria) Convolutional Neural Networks December 17, 2018 10 / 14

Page 23: Machine Learning and Computer Vision Group...Computer graphics motivation - lters for capturingfeaturesof an image like edge detection etc. multiply each pixel and its neighbours by

Strides

strides = [1, 1, 1, 1] strides = [1, 2, 2, 1]

http://www.cvc.uab.es/people/joans/slides_tensorflow/tensorflow_html/layers.html

Martin Topfer (IST Austria) Convolutional Neural Networks December 17, 2018 10 / 14

Page 24: Machine Learning and Computer Vision Group...Computer graphics motivation - lters for capturingfeaturesof an image like edge detection etc. multiply each pixel and its neighbours by

Strides

strides = [1, 1, 1, 1] strides = [1, 2, 2, 1]

http://www.cvc.uab.es/people/joans/slides_tensorflow/tensorflow_html/layers.html

Martin Topfer (IST Austria) Convolutional Neural Networks December 17, 2018 10 / 14

Page 25: Machine Learning and Computer Vision Group...Computer graphics motivation - lters for capturingfeaturesof an image like edge detection etc. multiply each pixel and its neighbours by

Strides

strides = [1, 1, 1, 1] strides = [1, 2, 2, 1]

http://www.cvc.uab.es/people/joans/slides_tensorflow/tensorflow_html/layers.html

Martin Topfer (IST Austria) Convolutional Neural Networks December 17, 2018 10 / 14

Page 26: Machine Learning and Computer Vision Group...Computer graphics motivation - lters for capturingfeaturesof an image like edge detection etc. multiply each pixel and its neighbours by

Convolutional Layer - Tensorflow overview

c f i l t e r s = 32 # number o f f i l t e r sc k e r n e l s i z e = 3 # f o r s q u a r e k e r n e l 3 x3c s t r i d e s = [ 1 , 2 , 2 , 1 ] # [ mini−batch s i z e , h e i g h t ,

width , depth i n f e a t u r e maps ]c pad = ”SAME” # SAME o r VALIDc a c t = t f . nn . r e l u # a c t i v a t i o n f u n c t i o n

c o n v l a y e r = t f . l a y e r s . conv2d ( p r e v i o u s l a y e r ,k e r n e l s i z e = c k e r n e l s i z e , s t r i d e s = c s t r i d e s ,padd ing = c pad , a c t i v a t i o n = c a c t )

Martin Topfer (IST Austria) Convolutional Neural Networks December 17, 2018 11 / 14

Page 27: Machine Learning and Computer Vision Group...Computer graphics motivation - lters for capturingfeaturesof an image like edge detection etc. multiply each pixel and its neighbours by

Pooling layer

shrinking image - either taking maximum or average

reduction of computational load (memory)

reduction of parameters (overfitting)

p o o l l a y e r = t f . nn . max pool ( i n p u t l a y e r ,k s i z e =[1 , 2 , 2 , 1 ] ,s t r i d e s =[1 , 2 , 2 , 1 ] ,padd ing=”VALID ”)

https://skymind.ai/wiki/convolutional-network

Martin Topfer (IST Austria) Convolutional Neural Networks December 17, 2018 12 / 14

Page 28: Machine Learning and Computer Vision Group...Computer graphics motivation - lters for capturingfeaturesof an image like edge detection etc. multiply each pixel and its neighbours by

CNN Architecture

Le-Net5 (1998):

https://www.researchgate.net/publication/312170477_On_Classification_of_Distorted_Images_with_Deep_

Convolutional_Neural_Networks

Martin Topfer (IST Austria) Convolutional Neural Networks December 17, 2018 13 / 14

Page 29: Machine Learning and Computer Vision Group...Computer graphics motivation - lters for capturingfeaturesof an image like edge detection etc. multiply each pixel and its neighbours by

CNN compared to fully-connected networks

smaller number of parameters (filters need a lot of memory but theyshare parameters)

better results than fully-connected networks

much more hyperparameters (different architectures +hyperparameters for each convolutional layer)

Martin Topfer (IST Austria) Convolutional Neural Networks December 17, 2018 14 / 14