Discriminative & Deep Learning for Big Dataaz/lectures/aims-big_data/discrim_learning1.pdf ·...

Discriminative & Deep Learning for Big Data Andrew Zisserman Visual Geometry Group University of Oxford http://www.robots.ox.ac.uk/~vgg AIMS-CDT Michaelmas 2019

Transcript of Discriminative & Deep Learning for Big Dataaz/lectures/aims-big_data/discrim_learning1.pdf ·...

Page 1: Discriminative & Deep Learning for Big Dataaz/lectures/aims-big_data/discrim_learning1.pdf · Colorful Image Colorization, Zhang et al., ECCV 2016. Example 7: Machine translation

Discriminative & Deep Learning for

Big Data

Andrew Zisserman

Visual Geometry Group

University of Oxford

http://www.robots.ox.ac.uk/~vgg

AIMS-CDT Michaelmas 2019


Overview

Two aspects:

1. Learning from Big Data:

• Supervised and discriminative learning: provides training data for large capacity models

• e.g. for data hungry deep neural networks

2. Searching and modelling Big Data:

• Need efficient (fast and low memory) methods to address this


Course Outline

Mon: Discriminative Learning 1 [AZ]

• Introduction, NN, linear classifiers; regression, SVMs, loss functions

Tue: Discriminative Learning 2 and searching Big Data [AZ,RA]

• Multiple classes, large scale retrieval; ANN, LSH, PQ

• Practical 1: Image classification

Wed: Deep Learning 1 [AV]

• Neural networks, CNNs, back-prop, applications

Thu: Deep learning 2 and Large Scale Learning [AV]

• Architectures, visualization, applications; large scale learning

• Practical 2: CNNs

Mon 9th: Seminar by Alyosha Efros

• Applications of large scale matching in Computer Vision


Recommended book: general introduction

• Pattern Recognition and Machine Learning, Christopher Bishop, Springer, 2006.

• Excellent on classification and regression


Textbooks 2: also general

• The Elements of Statistical Learning, Hastie, Tibshirani, Friedman, Springer, 2009, second edition

• Good explanation of algorithms

• pdf available online


Textbooks 3: specialized for SVMs and kernels

• Learning with Kernels, Bernhard Schölkopf and Alexander J. Smola, MIT Press, 2002.


Textbooks 4: specialized for deep learning

• Deep Learning

Goodfellow, Bengio, Courville

MIT Press, 2016

• html version available online

• http://www.deeplearningbook.org


Introduction: What is Machine Learning?

Algorithms that can improve their performance using

training data

• Typically the algorithm has a (large) number of parameters whose values are learnt from the data

• Can be applied in situations where it is very challenging (= impossible) to define rules by hand, e.g.:

• Face recognition

• Speech recognition

• Stock prediction


Example 1: hand-written digit recognition

Images are 28 x 28 pixels


How to proceed …

As a supervised classification problem

Start with training data, e.g. 6000 examples of each digit

• Can achieve testing error of 0.4%

• One of first commercial and widely used ML systems (for zip codes & checks)


Example 2: Face detection

• Again, a supervised classification problem

• Need to classify an image window into three classes:

• non-face

• frontal-face

• profile-face


Classifier is learnt from labelled data

Training data for frontal faces

• 5000 faces: all near frontal, varying in age, race, gender and lighting

• 10^8 non-faces

• Faces are normalized for scale and translation


Example 3: Spam detection

• This is a classification problem

• Task is to classify email into spam/non-spam

• Data x_i is word count, e.g. of “viagra”, “outperform”, “you may be surprized to be contacted” …

• Requires a learning system as “enemy” keeps innovating


Example 4: Stock price prediction

• Task is to predict stock price at future date

• This is a regression task, as the output is continuous


Example 5: Computational biology

Protein Structure and Disulfide Bridges

Protein: 1IMT

AVITGACERDLQCG
KGTCCAVSLWIKSV
RVCTPVGTSGEDCH
PASHKIPFSGQRMH
HTCPCAPNLACVQT
SPKKFKCLSK

Regression task: given sequence predict 3D structure

Example 6: Colourization

Regression task: predict pixel colour from a monochrome input

[Figure: a colour image is split into a greyscale image (L) and colour channels (ab); the colour image provides a “free” supervisory signal]


Colourization examples

Train network to predict pixel colour from a monochrome input

Colorful Image Colorization, Zhang et al., ECCV 2016


Example 7: Machine translation

x: What is the anticipated cost of collecting fees under the new proposal?

y: En vertu des nouvelles propositions, quel est le coût prévu de perception des droits?

[Figure: word alignment between x (What is | the anticipated | cost of | collecting fees | under the | new proposal | ?) and y (En vertu de les nouvelles propositions, quel est le coût prévu de perception de les droits ?)]

e.g. Google translate

Use of aligned text


What is the anticipated cost of collecting fees under the new proposal?

Google Translate (2019)


Example 8: Machine transcription – sequences

Use of aligned speech and text

• Automated speech recognition

Use of aligned images and text

• Text spotting and recognition

Use of aligned face video and text

• Automated lip reading


Lip reading sentences

Chung, Senior, Vinyals, Zisserman, 2017


Speech and natural language

https://www.skype.com/en/features/skype-translator/

Google Translate App

• Translate between 103 languages by typing

• Offline: Translate 52 languages when you have no Internet

• Instant camera translation: Use your camera to translate text instantly in 30 languages

• Camera Mode: Take pictures of text for higher-quality translations in 37 languages

• Conversation Mode: Two-way instant speech translation in 32 languages

https://play.google.com/store/apps/details?id=com.google.android.apps.translate&hl=en

See also: The Great AI Awakening (New York Times Magazine)

Slide credit: Lana Lazebnik


Lecture outline

• Classification

• Regression

• Overfitting

• Regularization and Loss functions

• Part II

• Linear classifiers

• Support Vector Machine (SVM)

• Logistic regression (LR)


Supervised Learning: Overview

Learning machine


Classification

• Suppose we are given a training set of N observations (x_1, y_1), …, (x_N, y_N)

• Classification problem is to estimate f(x) from this data such that f(x_i) = y_i


K Nearest Neighbour (K-NN) Classifier

Algorithm

• For each test point, x, to be classified, find the K nearest samples in the training data

• Classify the point, x, according to the majority vote of their class labels

• Applicable to the multi-class case

e.g. K = 3
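The two steps of the algorithm can be sketched in a few lines of Python. This is a minimal brute-force illustration, not the implementation behind the figures in these slides: the function name knn_classify, the Euclidean distance and the toy data are all assumptions made for the example.

```python
import math
from collections import Counter

def knn_classify(train_points, train_labels, x, k=3):
    """Label x by majority vote among its k nearest training points."""
    # Euclidean distance from x to every training point (brute-force search).
    distances = [math.dist(p, x) for p in train_points]
    # Indices of the k training points closest to x.
    nearest = sorted(range(len(train_points)), key=lambda i: distances[i])[:k]
    # Majority vote; on a tie, Counter keeps the label encountered first.
    votes = Counter(train_labels[i] for i in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical toy data: two well separated clusters.
train_points = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (1.0, 1.0), (0.9, 1.1), (1.1, 0.9)]
train_labels = ["a", "a", "a", "b", "b", "b"]
print(knn_classify(train_points, train_labels, (0.15, 0.15), k=3))
print(knn_classify(train_points, train_labels, (0.95, 1.0), k=3))
```

Because the vote is taken over arbitrary labels, the same code handles more than two classes without change.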


K = 1

Voronoi diagram:

• partitions the space into regions

• boundaries are equidistant from the nearest training points

Classification boundary:

• non-linear

A sampling assumption: training and test data

[Figure: scatter plots of the training data and the testing data]

• Assume that the training examples are drawn independently from the set of all possible examples.

• This makes it very unlikely that a strong regularity in the training data will be absent in the test data.

• Measure classification error as the “risk”: the average of a loss function over the examples
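As a concrete reading of the error measure, the sketch below assumes the loss is the 0-1 loss (1 for a wrong prediction, 0 for a correct one) and averages it over the examples, which is consistent with the fractional errors reported for the K-NN figures. The function names and the toy labels are hypothetical.

```python
def zero_one_loss(prediction, label):
    """Loss for one example: 1 if the prediction is wrong, 0 otherwise."""
    return 0 if prediction == label else 1

def empirical_risk(predictions, labels, loss=zero_one_loss):
    """Average loss over the examples (the fraction misclassified for 0-1 loss)."""
    return sum(loss(p, y) for p, y in zip(predictions, labels)) / len(labels)

# Hypothetical predictions: one mistake out of five examples.
labels = [1, 1, -1, -1, 1]
predictions = [1, -1, -1, -1, 1]
print(empirical_risk(predictions, labels))
```

Passing a different loss function to empirical_risk gives the corresponding risk for that loss.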

K = 1

[Figure: K-NN classification of the training data and the testing data]

Training data: error = 0.0

Testing data: error = 0.15

K = 3

[Figure: K-NN classification of the training data and the testing data]

Training data: error = 0.0760

Testing data: error = 0.1340


Generalization

• The real aim of supervised learning is to do well on test data that is not known during learning

• Choosing the values for the parameters that minimize the loss function on the training data is not necessarily the best policy

• We want the learning machine to model the true regularities in the data and to ignore the noise in the data.

K = 1

[Figure: K-NN classification of the training data and the testing data]

Training data: error = 0.0

Testing data: error = 0.15

K = 3

[Figure: K-NN classification of the training data and the testing data]

Training data: error = 0.0760

Testing data: error = 0.1340

K = 7

[Figure: K-NN classification of the training data and the testing data]

Training data: error = 0.1320

Testing data: error = 0.1110

K = 21

[Figure: K-NN classification of the training data and the testing data]

Training data: error = 0.1120

Testing data: error = 0.0920


Properties and training

As K increases:

• Classification boundary becomes smoother

• Training error can increase

Choose (learn) K on a validation set

• Split training data into training and validation

• Hold out validation data and measure error on this

• Validation set acts as a proxy for the test set
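Choosing K on a validation set can be sketched as follows. This is an illustrative example only: the helper names (knn_predict, error_rate, choose_k), the one-dimensional toy data and the candidate values of K are assumptions, and one training label is deliberately noisy so that K = 1 does worse on the held-out points.

```python
import math
from collections import Counter

def knn_predict(train, x, k):
    """train is a list of (point, label) pairs; majority vote over the k nearest."""
    nearest = sorted(train, key=lambda pl: math.dist(pl[0], x))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def error_rate(train, data, k):
    """Fraction of the examples in data that K-NN (fitted on train) gets wrong."""
    wrong = sum(knn_predict(train, x, k) != y for x, y in data)
    return wrong / len(data)

def choose_k(train, validation, candidates):
    """Return the candidate K with the lowest validation error (first one on ties)."""
    return min(candidates, key=lambda k: error_rate(train, validation, k))

# Hypothetical 1-D data; the point at 0.3 carries a noisy label.
train = [((0.0,), "a"), ((0.1,), "a"), ((0.2,), "a"), ((0.3,), "b"),
         ((0.4,), "a"), ((1.0,), "b"), ((1.1,), "b"), ((1.2,), "b")]
validation = [((0.28,), "a"), ((0.32,), "a"), ((1.05,), "b")]

for k in (1, 3, 5):
    print(k, error_rate(train, validation, k))
print("chosen K:", choose_k(train, validation, [1, 3, 5]))
```

The test set plays no part in this selection; it is kept aside for the final error estimate.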


Example: hand written digit recognition

• MNIST data set

• Distance = raw pixel distance between images

• 60K training examples

• 10K testing examples

• K-NN gives 5% classification error


Summary

Advantages:

• K-NN is a simple but effective classification procedure

• Applies to multi-class classification

• Decision surfaces are non-linear

• Quality of predictions automatically improves with more training data

• Only a single parameter, K; easily tuned by cross-validation

• Use as a baseline/debugging classifier


Summary

Disadvantages:

• What does nearest mean? Need to specify or learn a distance metric.

• Computational cost: must store and search through the entire training set at test time. Can alleviate this problem by thinning the training set, and by using efficient data structures or approximate NN search


Lecture outline

• Classification

• Regression

• Overfitting

• Regularization and Loss functions

• Part II

• Linear classifiers

• Support Vector Machine (SVM)

• Logistic regression (LR)


Regression

• Suppose we are given a training set of N observations (x_1, y_1), …, (x_N, y_N)

• Regression problem is to estimate y(x) from this data

[Figure: plot of y against x]


K-NN Regression

Algorithm

• For each test point, x, find the K nearest samples x_i in the training data and their values y_i

• Output is mean of their values

• Again, need to choose (learn) K
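A minimal sketch of K-NN regression, assuming scalar inputs and absolute difference as the distance (both are simplifications chosen for the example, as are the function name and the toy data):

```python
def knn_regress(train_x, train_y, x, k=3):
    """Predict y at x as the mean of the values of the k nearest training inputs."""
    # Indices of the k training inputs closest to x.
    nearest = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - x))[:k]
    # Output is the mean of their target values.
    return sum(train_y[i] for i in nearest) / k

# Hypothetical training data sampled from y = x**2.
train_x = [0.0, 1.0, 2.0, 3.0, 4.0]
train_y = [0.0, 1.0, 4.0, 9.0, 16.0]
print(knn_regress(train_x, train_y, 2.1, k=1))
print(knn_regress(train_x, train_y, 2.1, k=3))
```

With k=1 the prediction copies the nearest target; larger k averages over more neighbours and gives a smoother but more biased estimate.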


Example:

Real time head pose regression

Fanelli et al., DAGM 2011


Regression example: polynomial curve fitting

• The green curve is the true function (which is not a polynomial)

• The data points are uniform in x but have noise in y.

• We will use a loss function that measures the squared error in the prediction of y(x) from x. The loss for the red polynomial is the sum of the squared vertical errors.
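Written out, and assuming Bishop's notation (polynomial order M, weights w_j, inputs x_n and target values t_n, none of which are defined in the text above), the polynomial model and the sum-of-squares loss would read:

```latex
y(x, \mathbf{w}) = \sum_{j=0}^{M} w_j x^j,
\qquad
E(\mathbf{w}) = \frac{1}{2} \sum_{n=1}^{N} \bigl( y(x_n, \mathbf{w}) - t_n \bigr)^2
```

Each term of the sum is the squared vertical distance between the polynomial and one data point; the factor 1/2 is a convention that simplifies the gradient and does not change the minimizer.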

[Figure from Bishop, with labels: target value, polynomial regression; vertical axis y]


Some fits to the data: which is best?

[Figure from Bishop: four fits to the data (vertical axis y), with one labelled “over fitting”]


Over-fitting

Root-Mean-Square (RMS) Error:

• test data: a different sample from the same true function

• training error goes to zero, but test error increases with M (the order of the polynomial)
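The RMS error can be computed as below. The sketch assumes the usual definition (square root of the mean squared difference between predictions and targets); the function name and the numbers are made up for illustration.

```python
import math

def rms_error(predictions, targets):
    """Root-mean-square error: sqrt of the mean squared prediction error."""
    n = len(targets)
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(predictions, targets)) / n)

# Hypothetical predictions against training and test targets.
print(rms_error([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))
print(rms_error([1.0, 2.0, 3.0], [1.5, 1.0, 4.0]))
```

Dividing by the number of points makes training and test errors comparable even when the two sets have different sizes.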


Trading off goodness of fit against model complexity

• If the model has as many degrees of freedom as the data, it can fit the training data perfectly

• But the objective in ML is generalization

• Can expect a model to generalize well if it explains the training data surprisingly well given the complexity of the model.


How to prevent over fitting? I

• Add more data than the model “complexity”

• For 9th order polynomial:

[Figure: two plots of y for 9th order polynomial fits]

Polynomial Coefficients


How to prevent over fitting? II

• Regularization: penalize large coefficient values

[Equation: cost function = loss function + regularization, i.e. “ridge” regression]

• In practice use validation data to choose lambda (not test data)

• Convex cost function, closed form solution

• This is “weight decay” in deep learning
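For reference, one common way to write the ridge cost and its closed form solution is sketched below. The notation is an assumption borrowed from Bishop: targets t_n, a design matrix \Phi whose n-th row is (1, x_n, \dots, x_n^M), and the same 1/2 scaling on both terms.

```latex
\tilde{E}(\mathbf{w}) = \frac{1}{2} \sum_{n=1}^{N} \bigl( y(x_n, \mathbf{w}) - t_n \bigr)^2
  + \frac{\lambda}{2} \lVert \mathbf{w} \rVert^2,
\qquad
\mathbf{w}^{*} = \bigl( \Phi^{\top} \Phi + \lambda I \bigr)^{-1} \Phi^{\top} \mathbf{t}
```

The solution follows from setting the gradient \Phi^{\top}(\Phi \mathbf{w} - \mathbf{t}) + \lambda \mathbf{w} to zero. For \lambda > 0 the matrix \Phi^{\top}\Phi + \lambda I is always invertible, which is why a closed form exists even when there are fewer data points than coefficients.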


Polynomial Coefficients


Summary Point: How to set parameters?

Use a validation set

Divide the total dataset into three subsets:

• Training data is used for learning the parameters of the model.

• Validation data is not used for learning, but is used to determine the hyper-parameters, e.g. for deciding the type of model and the amount of regularization.

• Test data is used to get a final, unbiased estimate of how well the learning machine works. Expect this estimate to be worse than on the validation data.

Can re-divide the total dataset to get another unbiased estimate of the true error rate.
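A three-way split can be sketched as follows. The function name, the 60/20/20 proportions and the seeded shuffle are assumptions chosen for the example; the slide does not prescribe particular fractions.

```python
import random

def split_dataset(examples, val_fraction=0.2, test_fraction=0.2, seed=0):
    """Shuffle the examples and cut them into training, validation and test subsets."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible
    n_test = int(len(shuffled) * test_fraction)
    n_val = int(len(shuffled) * val_fraction)
    test = shuffled[:n_test]
    validation = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]
    return train, validation, test

# Hypothetical dataset of 100 example indices.
train, validation, test = split_dataset(range(100))
print(len(train), len(validation), len(test))
```

Calling split_dataset again with a different seed re-divides the data, which gives another estimate of the error rate as described above.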


• Again, need to control the complexity of the (discriminant) function


Lecture outline

• Classification

• Regression

• Overfitting

• Regularization and Loss functions

• Part II

• Linear classifiers

• Support Vector Machine (SVM)

• Logistic regression (LR)


What comes next?

• Learning by optimizing a cost function:

[Equation: cost function = loss function + regularization]

• In general

• choose loss function for: classification, regression, ranking, clustering …

• choose regularization function


The “Lasso” or L1 norm regularization

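[Equation: cost function = loss function + L1 norm regularization]

• This is a quadratic optimization problem

• The cost function is convex, so every local minimum is a global minimum; the minimizer is unique under mild conditions on the data

• p-Norm definition: $\lVert w \rVert_p = \left( \sum_{j} |w_j|^p \right)^{1/p}$

• LASSO = Least Absolute Shrinkage and Selection Operator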


Sparsity property of the Lasso

• contour plots for d = 2

[Figure: contour plots for ridge regression and for the lasso]

• Minimum where contours of loss and regularizer are tangent

• For the lasso case, minima occur at “corners”

• Consequently one of the weights is zero

• In high dimensions many weights can be zero
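The sparsity effect can be seen in a one-dimensional simplification, which is an assumption made here for illustration and not part of the slides: a single weight w, a quadratic loss (w - a)**2 centred on the unregularized solution a, and either an L2 or an L1 penalty. Both minimizers have simple closed forms.

```python
def ridge_1d(a, lam):
    """Minimizer of (w - a)**2 + lam * w**2: shrinks a but never reaches zero."""
    return a / (1.0 + lam)

def lasso_1d(a, lam):
    """Minimizer of (w - a)**2 + lam * abs(w): soft thresholding at lam / 2."""
    magnitude = max(abs(a) - lam / 2.0, 0.0)
    if magnitude == 0.0:
        return 0.0  # small coefficients are set exactly to zero
    return magnitude if a > 0 else -magnitude

# Hypothetical unregularized coefficients, with lam = 1.
for a in (2.0, 0.3, -0.1):
    print(a, ridge_1d(a, 1.0), lasso_1d(a, 1.0))
```

The L1 penalty sets the two small coefficients exactly to zero, while the L2 penalty only scales them down, which mirrors the “corner” argument above.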

Lasso in action

[Figure: coefficient values plotted against the regularization parameter lambda (percent of lambdaMax), in two panels: Ridge Regression and L1-Regularized Least Squares]


Sparse weight vectors

• Weights being zero is a method of “feature selection”: zeroing out the unimportant features

• The SVM classifier also has this property (sparse alpha in the dual representation)

• Ridge regression does not


More loss functions for regression


Part II Outline

• Linear classifiers

• Linear separability

• Support Vector Machine (SVM) classifier

• Wide margin

• Cost function

• Hard and soft margins

• Optimization

• Logistic Regression classifier

• Cost function


Linear Classifiers and the SVM


Binary Classification


Linear separability

[Figure: an example of linearly separable data and an example of data that is not linearly separable]


Linear classifiers

[Figure: a linear classifier in 2D, axes X1 and X2]

A linear classifier has the form f(x) = wTx + b

• in 2D the discriminant is a line

• w is the normal to the line, and b the bias

• w is known as the weight vector


Linear classifiers

A linear classifier has the form f(x) = wTx + b

• in 3D the discriminant is a plane, and in nD it is a hyperplane

For a K-NN classifier it was necessary to “carry” the training data

For a linear classifier, the training data is used to learn w and then discarded

Only w is needed for classifying new data
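The contrast with K-NN is easy to see in code: at test time a linear classifier needs only the learnt parameters. The sketch below assumes labels of +1 and -1, decides by the sign of wTx + b, and uses made-up values for w and b rather than learnt ones.

```python
def linear_discriminant(w, b, x):
    """Evaluate f(x) = w . x + b."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def classify(w, b, x):
    """Predict +1 on the non-negative side of the hyperplane, -1 on the other."""
    return 1 if linear_discriminant(w, b, x) >= 0 else -1

# Hypothetical parameters; in practice w and b come from training.
w, b = (1.0, -1.0), 0.5
print(classify(w, b, (2.0, 0.0)))
print(classify(w, b, (0.0, 2.0)))
```

No training points appear anywhere in the prediction step, so the cost of classifying a new point depends only on the dimension of x.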


What is the best w?

• maximum margin solution: most stable under perturbations of the inputs


Interlude: Why should we care about linear classifiers?

Example: bicycle image classification


Pre-deep learning – non-linear classifiers

[Figure labels: input, Feature Extractor, representation, interpretation]


Deep learning – train network for linear classification

[Figure labels: input, CNN Feature Extractor, representation, interpretation]

• Optimize network with loss function for a linear classifier

• Learns to produce feature vectors x that are linearly separable


How to find the maximum margin?

linearly separable data


Support Vector Machine

[Figure: linearly separable data with the separating line wTx + b = 0, its normal w, and the support vectors]


How to find the maximum margin?

linearly separable data


SVM – sketch derivation


Support Vector Machine

[Figure: linearly separable data with the lines wTx + b = 0, wTx + b = 1 and wTx + b = -1, the normal w, and the support vectors]

Margin = 2 / ||w||
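The width of the margin follows from the two lines through the support vectors. As a short sketch, take a point x_+ on the line wTx + b = 1 and a point x_- on the line wTx + b = -1 (these symbols are introduced here for the derivation only), subtract the two equations, and project the difference onto the unit normal:

```latex
\mathbf{w}^{\top}\mathbf{x}_{+} + b = 1, \quad
\mathbf{w}^{\top}\mathbf{x}_{-} + b = -1
\;\Longrightarrow\;
\mathbf{w}^{\top}(\mathbf{x}_{+} - \mathbf{x}_{-}) = 2,
\qquad
\text{Margin} = \frac{\mathbf{w}^{\top}}{\lVert \mathbf{w} \rVert}(\mathbf{x}_{+} - \mathbf{x}_{-})
= \frac{2}{\lVert \mathbf{w} \rVert}
```

A wider margin therefore corresponds to a smaller norm of w.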


SVM – Optimization
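One standard way to state the resulting optimization problem is given below. It assumes labels y_i in {-1, +1} and linearly separable data, and it uses the fact that maximizing 2 / ||w|| is the same as minimizing ||w||^2.

```latex
\max_{\mathbf{w}, b} \; \frac{2}{\lVert \mathbf{w} \rVert}
\;\; \text{subject to} \;\; y_i(\mathbf{w}^{\top}\mathbf{x}_i + b) \ge 1, \; i = 1, \dots, N
\quad \Longleftrightarrow \quad
\min_{\mathbf{w}, b} \; \lVert \mathbf{w} \rVert^2
\;\; \text{subject to} \;\; y_i(\mathbf{w}^{\top}\mathbf{x}_i + b) \ge 1, \; i = 1, \dots, N
```

The second form has a quadratic objective and linear constraints, so it is a quadratic programme.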


Linear separability again: What is the best w?

• the points can be linearly separated but there is a very narrow margin

• but possibly the large margin solution is better, even though one constraint is violated

In general there is a trade off between the margin and the number of mistakes on the training data


Margin violations and misclassifications

[Figure: the lines wTx + b = 0, wTx + b = 1 and wTx + b = -1 with normal w and the support vectors, showing a misclassified point and a margin violation]

Penalize points according to their distance from the margin


Loss functions

• SVM uses “hinge” loss

• in contrast to the 0-1 loss
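The two losses can be compared directly. The sketch assumes the standard definitions with labels y in {-1, +1} and score f(x) = wTx + b: the hinge loss is max(0, 1 - y f(x)), while the 0-1 loss only records whether the sign is wrong. Function names and sample scores are made up for the example.

```python
def hinge_loss(y, score):
    """max(0, 1 - y * score): zero only when the point is beyond the margin."""
    return max(0.0, 1.0 - y * score)

def zero_one_loss(y, score):
    """1 if the point is on the wrong side of (or on) the boundary, else 0."""
    return 0.0 if y * score > 0 else 1.0

# Scores for a positive example: outside the margin, inside it, misclassified.
for score in (2.0, 0.5, -1.0):
    print(score, hinge_loss(+1, score), zero_one_loss(+1, score))
```

A correctly classified point inside the margin (score 0.5) has zero 0-1 loss but a positive hinge loss, which is how the hinge loss penalizes margin violations.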


“Soft” margin problem

[Equation: cost function = regularization + loss function]
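The trade-off in the soft margin cost can be made concrete. The sketch assumes the common form ||w||^2 + C * (sum of hinge losses), with C weighting the loss term; the data and the two candidate weight vectors are invented so that one separates the points with a narrow margin and the other has a wide margin with two violations.

```python
def soft_margin_cost(w, b, data, C):
    """||w||^2 (regularization) plus C times the summed hinge loss over the data."""
    regularization = sum(wi * wi for wi in w)
    loss = 0.0
    for x, y in data:
        score = sum(wi * xi for wi, xi in zip(w, x)) + b
        loss += max(0.0, 1.0 - y * score)
    return regularization + C * loss

# Hypothetical 1-D data: separable, but two points lie close to the boundary.
data = [((2.0,), 1), ((0.1,), 1), ((-2.0,), -1), ((-0.1,), -1)]
narrow = (10.0,)  # satisfies every margin constraint, large ||w||
wide = (0.5,)     # small ||w||, but the two close points violate the margin

for C in (10.0, 1000.0):
    print(C, soft_margin_cost(narrow, 0.0, data, C), soft_margin_cost(wide, 0.0, data, C))
```

For the smaller C the wide-margin candidate has the lower cost despite its violations; for the larger C the narrow-margin candidate wins, approaching hard margin behaviour as C grows.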


Loss function

[Figure: decision boundary $w^\top x + b = 0$ with normal $w$ and two support vectors, annotated with the loss function.]

Page 87

[Figure: scatter plot of the data; axes: feature x, feature y. File: az-margin.mat, # of points K = 27]

• data is linearly separable

• but only with a narrow margin

Page 88

C = Infinity: hard margin

STPRTool Franc & Hlavac

Page 89

C = 10: soft margin

Page 90

Optimization

• Does this cost function have a unique solution?

• Does the solution depend on the starting point of an iterative optimization algorithm (such as gradient descent)?

[Figure: a cost curve with a local minimum and a global minimum]

If the cost function is convex, then a locally optimal point is globally optimal (provided the optimization is over a convex set, which it is in our case)
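As a concrete instance of an iterative optimization algorithm, the sketch below runs plain subgradient descent on a soft-margin SVM cost of the form ||w||^2 + C * sum_i max(0, 1 - y_i (w.x_i + b)). The toy data, step size, iteration count and function names are all assumptions chosen for illustration; this is not the solver used for the figures in the lecture.

```python
def train_linear_svm(points, labels, C=1.0, lr=0.01, steps=1000):
    """Subgradient descent on ||w||^2 + C * sum_i max(0, 1 - y_i (w.x_i + b))."""
    dim = len(points[0])
    w = [0.0] * dim
    b = 0.0
    for _ in range(steps):
        # Gradient of the regularization term ||w||^2.
        grad_w = [2.0 * wj for wj in w]
        grad_b = 0.0
        for x, y in zip(points, labels):
            margin = y * (sum(wj * xj for wj, xj in zip(w, x)) + b)
            # Only points inside the margin (or misclassified) contribute
            # a subgradient of the hinge loss.
            if margin < 1.0:
                for j in range(dim):
                    grad_w[j] -= C * y * x[j]
                grad_b -= C * y
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict(w, b, x):
    """Classify x by the sign of w.x + b."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1


# Hypothetical, clearly separable toy data (two points per class).
points = [(2.0, 2.0), (3.0, 3.0), (-2.0, -2.0), (-3.0, -3.0)]
labels = [1, 1, -1, -1]

w, b = train_linear_svm(points, labels)
predictions = [predict(w, b, x) for x in points]
print("w =", w, "b =", b, "predictions =", predictions)
```

With a fixed step size the iterate does not settle exactly; it oscillates in a small band around the minimizer, so in practice a decaying step size or a dedicated solver would be preferable.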

Page 91

Convex functions
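The definition on this slide was not recovered. For reference, the usual definition is sketched here, with $D$ denoting a convex domain and $x_0$, $x_1$, $\alpha$ as illustrative symbols: a function $f$ is convex on $D$ if

```latex
\[
f\bigl(\alpha x_0 + (1-\alpha)\, x_1\bigr) \;\le\; \alpha f(x_0) + (1-\alpha) f(x_1)
\qquad \text{for all } x_0, x_1 \in D,\ 0 \le \alpha \le 1 .
\]
```

Geometrically, the chord joining any two points on the graph of $f$ lies on or above the graph.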

Page 92

Convex function examples

[Figure: example curves, grouped as "convex" and "not convex"]

A non-negative sum of convex functions is convex

Page 93

[Figure: a sum (+) of functions, with the result labelled "convex"]
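The claim that a non-negative sum of convex functions is convex can be spot-checked numerically. The sketch below tests the convexity inequality on a grid for a one-dimensional cost made of a squared regularizer plus a hinge term, and for the 0-1 loss as a non-convex contrast. The functions and grid are illustrative assumptions, and passing a grid check is evidence rather than a proof.

```python
def cost(w):
    """Squared regularizer plus hinge term in one dimension: w^2 + max(0, 1 - w)."""
    return w * w + max(0.0, 1.0 - w)


def zero_one(w):
    """0-1 loss as a function of the margin w: a step function, not convex."""
    return 1.0 if w < 0 else 0.0


def count_violations(f, grid, alphas, tol=1e-9):
    """Count triples (x, y, a) where f(a*x + (1-a)*y) > a*f(x) + (1-a)*f(y)."""
    count = 0
    for x in grid:
        for y in grid:
            for a in alphas:
                lhs = f(a * x + (1.0 - a) * y)
                rhs = a * f(x) + (1.0 - a) * f(y)
                if lhs > rhs + tol:
                    count += 1
    return count


grid = [i * 0.5 for i in range(-6, 7)]   # -3.0, -2.5, ..., 3.0
alphas = [i / 10 for i in range(11)]     # 0.0, 0.1, ..., 1.0

convex_violations = count_violations(cost, grid, alphas)
step_violations = count_violations(zero_one, grid, alphas)
print("violations for w^2 + hinge:", convex_violations)
print("violations for 0-1 loss:", step_violations)
```

The regularized hinge cost produces no violations on the grid, while the step-shaped 0-1 loss does, which is one reason it is replaced by a convex surrogate.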

Page 94

Summary: Learning by optimizing cost functions

Each cost function combines a loss function with a regularization term.

• We have seen “ridge” regression: squared loss, squared regularizer

• Lasso regression: squared loss, lasso regularizer

• SVM: hinge loss, squared regularizer
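The formulas for these three cost functions were lost in extraction. A hedged sketch in one consistent notation follows; $\lambda$ is an assumed symbol for the regularization weight, $f(x)$ is the linear model, and the SVM is often written with $C$ multiplying the loss instead, which is equivalent up to scaling.

```latex
\begin{align*}
\text{ridge regression:}\quad & \min_{w}\ \sum_{i=1}^{N} \bigl(y_i - f(x_i)\bigr)^2 + \lambda \lVert w \rVert_2^2 \\
\text{lasso regression:}\quad & \min_{w}\ \sum_{i=1}^{N} \bigl(y_i - f(x_i)\bigr)^2 + \lambda \lVert w \rVert_1 \\
\text{SVM:}\quad & \min_{w,b}\ \sum_{i=1}^{N} \max\bigl(0,\ 1 - y_i f(x_i)\bigr) + \lambda \lVert w \rVert_2^2
\end{align*}
```

In each line the first term is the loss function and the second is the regularization.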

Page 95

Logistic Regression Classifier

Page 96

Overview

Page 97

The logistic function or sigmoid function

[Figure: plot of the logistic (sigmoid) function against z, for z from -20 to 20; vertical axis from 0 to 1]
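The logistic function is sigma(z) = 1 / (1 + exp(-z)). A minimal sketch of evaluating it is given below; the branch on the sign of z is a common numerical-stability choice, not something stated on the slide.

```python
import math


def sigmoid(z):
    """Logistic function 1 / (1 + exp(-z)), written to avoid overflow in exp."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    # For negative z, exp(z) is small, so this form cannot overflow.
    e = math.exp(z)
    return e / (1.0 + e)


# Values across the range shown in the plot.
values = {z: sigmoid(z) for z in (-20, -5, 0, 5, 20)}
for z, s in values.items():
    print(f"sigmoid({z:+d}) = {s:.6f}")
```

The function maps any real z into the interval between 0 and 1, equals 0.5 at z = 0, and satisfies sigmoid(-z) = 1 - sigmoid(z).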

Page 98

Margin property

[Figure: two plots, each with horizontal axis from -20 to 20 and vertical axis from 0 to 1]

A sigmoid favours a larger margin cf. a step classifier

Though, need to control the gradient. How?

Page 99

Learning

[Figure: plot with horizontal axis from -20 to 20 and vertical axis from 0 to 1; the values 0, 0.5 and 1 are marked]

Page 100

Maximum Likelihood Estimation
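The derivation on this slide was not recovered. A hedged sketch of the usual argument is given below, assuming the model $P(y = 1 \mid x) = \sigma(f(x))$ with $f(x) = w^\top x + b$, labels $y_i \in \{-1, +1\}$, and independent training points; it uses the identity $1 - \sigma(z) = \sigma(-z)$.

```latex
\begin{align*}
P(y_i \mid x_i) &= \sigma\bigl(y_i f(x_i)\bigr) = \frac{1}{1 + e^{-y_i f(x_i)}} \\
\text{likelihood:}\quad L(w, b) &= \prod_{i=1}^{N} \frac{1}{1 + e^{-y_i f(x_i)}} \\
\text{negative log-likelihood:}\quad -\log L(w, b) &= \sum_{i=1}^{N} \log\bigl(1 + e^{-y_i f(x_i)}\bigr)
\end{align*}
```

Maximizing the likelihood is therefore the same as minimizing a sum of per-point losses $\log\bigl(1 + e^{-y_i f(x_i)}\bigr)$.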

Page 101

Logistic Regression Loss function

Page 102

Comparison of SVM and LR cost functions

[Figure: the SVM and LR loss functions plotted against $y_i f(x_i)$]

Note:

• both approximate 0-1 loss

• very similar asymptotic behaviour

• main difference is smoothness of LR, and non-zero outside SVM margin

• SVM gives sparse solution for $\alpha_i$
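The points in this comparison can be checked numerically. The sketch below evaluates the hinge loss max(0, 1 - m) and the logistic loss log(1 + exp(-m)) as functions of the margin m = y_i f(x_i); the function names and the chosen margin values are illustrative assumptions.

```python
import math


def hinge_loss(m):
    """SVM hinge loss as a function of the margin m = y * f(x)."""
    return max(0.0, 1.0 - m)


def logistic_loss(m):
    """Logistic regression loss log(1 + exp(-m)), evaluated without overflow."""
    if m >= 0:
        return math.log1p(math.exp(-m))
    return -m + math.log1p(math.exp(m))


margins = [-10.0, -1.0, 0.0, 1.0, 2.0, 10.0]
table = [(m, hinge_loss(m), logistic_loss(m)) for m in margins]
for m, h, l in table:
    print(f"m={m:+6.1f}  hinge={h:8.4f}  logistic={l:8.4f}")
```

For large negative margins both losses grow linearly with slope close to -1. For margins above 1 the hinge loss is exactly zero while the logistic loss is small but positive, so every training point keeps some influence on the LR solution.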

Page 103

Next lecture

• Beyond Binary Classification

• Multi-class and Multi-label

• Using binary classifiers

• Big Data

• retrieval and ranking

• precision-recall curves

• Nearest Neighbours (NN)

• Approximate NN, Locality Sensitive Hashing, Product Quantization

• Intro to practical