Applied Machine Learning in Biomedicineenrigri/Public/AMLB_2016/Slides/AMLB_Lecture_1… · Course...

Page 1: Applied Machine Learning in Biomedicineenrigri/Public/AMLB_2016/Slides/AMLB_Lecture_1… · Course details Tue-Thu 14.30-16.00 Room 318 May 24th through June 7th Contact enrico.grisan@dei.unipd.it

Applied Machine Learning in Biomedicine

Enrico Grisan

enrico.grisan@dei.unipd.it

Page 2:

Course details

Tue-Thu 14.30-16.00 Room 318

May 24th through June 7th

Contact: enrico.grisan@dei.unipd.it

Exam: project assignment

Page 3:

Cancer detection

Page 4:

Face detection

How would you detect a face?

How does album software tag your friends?

Page 5:

What do we do?

Page 6:

What do we do?

Page 7:

Speech recognition

Page 8:

Brain-computer interface

Page 9:

Recommender systems

Amazon, Netflix, Spotify tell you what you might like

The Netflix Prize was an open competition to predict user ratings for films from previous ratings alone, without any other information about the users or films.

The grand prize of US$1,000,000 was awarded to the BellKor's Pragmatic Chaos team, which bested Netflix's own rating-prediction algorithm by 10.06%.

Page 10:

Prediction systems

Just ahead of the kickoff for Season 6 of the television series “Game of Thrones,” computer science students at the Technical University of Munich have implemented a project that answers questions preoccupying fans of the series: Has Jon Snow survived Season 5? Who is going to die next?

The students used an array of machine learning algorithms to answer these questions.

The algorithm, which accurately predicted 74 percent of character deaths in the show and books, has many surprises in store, placing a number of characters thought to be relatively safe in grave danger.

Page 11:

The age of big data

“Every day, people create the equivalent of 2.5 quintillion bytes of data from sensors, mobile devices, online transactions, and social networks; so much that 90 percent of the world's data has been generated in the past two years.”

The Huffington Post: Arnal Dayaratna: IBM Releases Big Data

CERN Collider: 320 × 10^12 bytes/s

Personal connectome: 10^18 bytes/person

10^9 messages/day

30 × 10^6 messages/day

Page 12:

Mining brain networks

Page 13:

The role of machine learning

Design and analyze algorithms that

- improve their performance
- at some task
- with experience

Data (experience) → Learning algorithm → Knowledge (performance on task)

Page 14:

Imagenet challenge

Page 15:

Kaggle challenge

$100,000 prize

35,000 retinal images

4 DR classes

Page 16:

Challenges

• Quantitative modelling

• Noise or information

• Simple or complex models

• Test learning before deploying to real world

• Acquire reliable domain knowledge

Page 17:

Machine learning in biomedicine

Usually extreme conditions:

Very few samples (with respect to the problem)

Very large number of descriptors per sample

Very large amount of noise/uncertainty

Usually critical consequences:

Results might guide decision making

Results might guide the understanding of phenomena

Page 18:

Categories

– Supervised learning

classification, regression

– Unsupervised learning

Density estimation, clustering, dimensionality reduction

– Semisupervised learning

– Active learning

– Reinforcement learning

– …

Page 19:

Supervised learning

Feature space $\mathcal{X}$, target space $\mathcal{Y}$

Given $X \in \mathcal{X}$ and $Y \in \mathcal{Y}$, predict $Y = f(X)$

Page 20:

Supervised learning

Classification: feature space $\mathcal{X}$ = gene expression; target space $\mathcal{Y}$ = discrete labels (normal, metaplastic, benign neoplastic, malignant neoplastic); $y = f(x)$

Regression: feature space $\mathcal{X}$ = demographic and clinical data; target space $\mathcal{Y}$ = continuous labels (CHD risk score); $y = f(x)$

Page 21:

Typical machine learning workflow

[Workflow diagram. Labels: Training data (past); Domain knowledge; Expert knowledge; Training; Learned knowledge; Model; Unknown data (future); Output]

Page 22:

Bayesian framework: MLE vs MAP

• Maximum likelihood:

choose the value that maximizes the probability of the observed data

$\theta_{MLE} = \arg\max_{\theta} P(D \mid \theta)$

• Maximum a posteriori:

choose the value that is most probable given the observed data and the prior belief

$\theta_{MAP} = \arg\max_{\theta} P(\theta \mid D) = \arg\max_{\theta} P(D \mid \theta)\, P(\theta)$
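To make the difference between the two estimators concrete, here is a minimal sketch in Python. It assumes a Bernoulli model for the data and a Beta(alpha, beta) prior, neither of which comes from the slides; they are chosen only because both estimators then have simple closed forms. The counts are hypothetical.

```python
def bernoulli_mle(successes, trials):
    """theta_MLE = argmax_theta P(D | theta) for i.i.d. Bernoulli data."""
    return successes / trials


def bernoulli_map(successes, trials, alpha, beta):
    """theta_MAP = argmax_theta P(D | theta) P(theta) with a Beta(alpha, beta) prior.

    This closed form is the mode of the Beta posterior; it assumes alpha, beta > 1.
    """
    return (successes + alpha - 1) / (trials + alpha + beta - 2)


# Hypothetical data: 3 positive findings among 4 patients
theta_mle = bernoulli_mle(3, 4)
# Assumed prior Beta(2, 2), which mildly favours theta = 0.5
theta_map = bernoulli_map(3, 4, alpha=2, beta=2)
print(theta_mle, theta_map)
```

With so few samples the prior pulls the MAP estimate (about 0.67) towards 0.5, while the MLE stays at 0.75; as the number of observations grows the two estimates approach each other.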

Page 23:

Decision theory

You have a chest X-ray of a patient, and you must decide whether the distribution of intensities is compatible with the presence of malignant lung nodules or not.

Suppose that you are able to summarize the intensity distribution with a small number of measures (possibly just one).

Page 24:

Decision theory

Formalization:

𝑋: intensity measures from the image

𝐶1: normal X-ray class

𝐶2: presence of lung nodules class

𝑝(𝑋, 𝐶)

Page 25:

Decision theory

Is it really necessary to determine $p(X, C)$?

For a given $x$, determine the optimal class $C_k$ (cancer or no cancer) from the posterior:

$p(C_k \mid x) = \frac{p(x \mid C_k)\, p(C_k)}{p(x)}$
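As an illustration of Bayes' formula above, the following sketch computes the class posteriors from class-conditional likelihoods and priors. The numeric values are hypothetical and serve only to show the arithmetic; the function name `posteriors` is likewise an assumption.

```python
def posteriors(likelihoods, priors):
    """Return p(C_k | x) for every class, given p(x | C_k) and p(C_k)."""
    joint = [lik * prior for lik, prior in zip(likelihoods, priors)]
    evidence = sum(joint)  # p(x) = sum_k p(x | C_k) p(C_k)
    return [j / evidence for j in joint]


# Hypothetical values for one observed x:
# p(x | C_1) = 0.2 (normal), p(x | C_2) = 0.6 (nodules); priors 0.9 and 0.1
post = posteriors([0.2, 0.6], [0.9, 0.1])
print(post)
```

Even though the observation is three times more likely under the nodule class, the strong prior for the normal class keeps its posterior at about 0.75.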

Page 26:

Minimum misclassification rate

Page 27:

Minimum misclassification rate

$p(x, C_k) = p(C_k \mid x)\, p(x)$

Optimal decision: assign $x$ to the class for which $p(C_k \mid x)$ is largest

Page 28:

Penalizing prediction errors

Expected prediction error

$p(\text{mistake}) \sim EPE(f) = E[(Y - f(X))^2] = \iint (y - f(x))^2\, p(x, y)\, dx\, dy$

$= E_X E_{Y|X}\left[(Y - f(X))^2 \mid X\right]$

using $p(x, y) = p(y \mid x)\, p(x)$
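A short worked step, added here as a sketch rather than taken from the slides: because the outer expectation is over $X$ only, the squared-error EPE can be minimized pointwise in $x$. The constant $c$ below stands for the candidate value of $f(x)$.

```latex
\begin{align*}
f(x) &= \arg\min_{c}\; E_{Y|X}\left[(Y - c)^2 \mid X = x\right] \\
\frac{\partial}{\partial c}\, E_{Y|X}\left[(Y - c)^2 \mid X = x\right]
     &= -2\left(E[Y \mid X = x] - c\right) = 0 \\
\Rightarrow\quad f(x) &= E[Y \mid X = x]
\end{align*}
```

Under squared loss, the best prediction at each $x$ is therefore the conditional mean of $Y$.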

Page 29:

Penalizing prediction errors

Expected prediction error

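$p(\text{mistake}) \sim EPE(f) = E[H(C \neq f(X))] = \sum_k \int H(\mathcal{C}_k \neq f(x))\, p(x, \mathcal{C}_k)\, dx$

$= E_X E_{C|X}\left[H(C \neq f(X)) \mid X\right]$

using $p(x, \mathcal{C}_k) = p(\mathcal{C}_k \mid x)\, p(x)$, where $H(\cdot)$ is 1 when its argument is true and 0 otherwise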

Page 30:

Weighting the loss

Confusion matrix (rows: estimated diagnosis; columns: true diagnosis)

                   True Cancer   True Normal
Estimated Cancer        60            40
Estimated Normal         5            95

                   True Cancer   True Normal
Estimated Cancer        TP            FP
Estimated Normal        FN            TN

Page 31:

Weighting the loss

                     True Positive   True Negative
Estimated Positive        TP              FP
Estimated Negative        FN              TN

$\text{Accuracy} = \frac{TP + TN}{TP + FN + FP + TN}$

$\text{Sensitivity} = \frac{TP}{TP + FN}$

$\text{Specificity} = \frac{TN}{TN + FP}$
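The three measures can be checked numerically with a small sketch. The counts are the ones from the cancer/normal table shown earlier in the deck (TP = 60, FP = 40, FN = 5, TN = 95); the function name `diagnostic_metrics` is an assumption made for this example.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Accuracy, sensitivity and specificity from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fn + fp + tn)
    sensitivity = tp / (tp + fn)  # fraction of true positives that are detected
    specificity = tn / (tn + fp)  # fraction of true negatives that are detected
    return accuracy, sensitivity, specificity


acc, sens, spec = diagnostic_metrics(tp=60, fp=40, fn=5, tn=95)
print(acc, sens, spec)
```

For these counts the accuracy is 0.775, the sensitivity is about 0.92 and the specificity about 0.70, which shows how a single accuracy figure can hide very different error rates on the two classes.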

Page 32:

Weighting the loss

Confusion matrix (rows: estimated diagnosis; columns: true diagnosis)

             Cancer   Normal
Cancer         60       40
Normal          5       95

Loss matrix

             Cancer   Normal
Cancer          0      1000
Normal          1        0

General form:

$\begin{pmatrix} L_{11} & L_{21} \\ L_{12} & L_{22} \end{pmatrix}$

Page 33:

Expected Loss

Expected prediction error

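$p(\text{mistake}) \sim EPE(\hat{C}) = E[L(C, \hat{C}(X))]$

$= E_X \sum_{k=1}^{K} L(\mathcal{C}_k, \hat{C}(X))\, P(\mathcal{C}_k \mid X)$

$\hat{C}(x) = \arg\min_{c \in \mathcal{C}} \sum_{k=1}^{K} L(\mathcal{C}_k, c)\, P(\mathcal{C}_k \mid X = x)$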
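The decision rule above can be sketched in a few lines of Python. The posterior values are hypothetical, and the loss values are an assumption for illustration: here a missed cancer is taken to cost 1000 and a false alarm 1, with `loss[k][c]` meaning the loss for true class k and decision c.

```python
def min_expected_loss_class(posterior, loss):
    """Pick the decision c minimizing sum_k L(C_k, c) P(C_k | x).

    posterior[k] = P(C_k | x); loss[k][c] = L(C_k, c).
    Returns the chosen class index and the expected loss of every decision.
    """
    n = len(posterior)
    expected = [sum(loss[k][c] * posterior[k] for k in range(n)) for c in range(n)]
    best = min(range(n), key=lambda c: expected[c])
    return best, expected


# Class 0 = cancer, class 1 = normal (hypothetical posterior for one patient)
posterior = [0.01, 0.99]
weighted_loss = [[0, 1000], [1, 0]]  # assumed costs, see lead-in
zero_one_loss = [[0, 1], [1, 0]]

decision_weighted, expected_weighted = min_expected_loss_class(posterior, weighted_loss)
decision_zero_one, expected_zero_one = min_expected_loss_class(posterior, zero_one_loss)
print(decision_weighted, decision_zero_one)
```

With the 0-1 loss the patient is assigned to the normal class, but under the assumed asymmetric loss even a 1% posterior probability of cancer is enough to choose the cancer class.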

Page 34:

Bayes classifier (0-1 loss)

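$\hat{C}(x) = \arg\min_{c \in \mathcal{C}} \sum_{k=1}^{K} L(\mathcal{C}_k, c)\, P(\mathcal{C}_k \mid X = x)$

With the 0-1 loss this reduces to:

$\hat{C}(x) = \mathcal{C}_k$ if $P(\mathcal{C}_k \mid X = x) = \max_{c \in \mathcal{C}} P(c \mid X = x)$

[Plot labels: $P(\mathcal{C}_2 \mid X = x)$, $P(\mathcal{C}_1 \mid X = x)$]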

Page 35:

Bayes classifier (0-1 loss)

$\hat{C}(x) = \mathcal{C}_2$ if $x > 1$

$EPE(\hat{C}) = 0.893$

Page 36:

Bayes classifier (0-1 loss)

$\hat{C}(x) = \mathcal{C}_2$ if $x > 3$

$EPE(\hat{C}) = 0.437$

Page 37:

Bayes classifier (0-1 loss)

$\hat{C}(x) = \mathcal{C}_2$ if $x > 4$

$EPE(\hat{C}) = 0.892$

Page 38:

Bayes classifier (0-1 loss)

$\hat{C}(x) = \mathcal{C}_2$ if $x > 2.5$

$EPE(\hat{C}) = 0.363$

[Plot axis label: $EPE(\hat{C})$]

Page 39:

Bayes classifier (generic loss)

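$\min_{\mathcal{R}} E[L] = \min_{\mathcal{R}} \sum_k \sum_j \int_{\mathcal{R}_j} L_{kj}\, p(x, C_k)\, dx$

$\Rightarrow \min_{\mathcal{R}} \sum_k L_{kj}\, p(x, C_k)$

$\min_{\mathcal{R}} \sum_k L_{kj}\, p(C_k \mid x)\, p(x) \propto \min_{\mathcal{R}} \sum_k L_{kj}\, p(C_k \mid x)$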

Page 40:

Reject option

Page 41:

Classification approaches (1)

Generative models

1) Inference step:

use training data to learn a model for $p(X \mid C_k)$

use data to infer priors $p(C_k)$

use Bayes’ formula to find posteriors $p(C_k \mid X)$

1b) Inference step:

model $p(X, C_k)$ directly and obtain posteriors

2) Decision step:

use the posterior to make optimal assignments
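The inference and decision steps of a generative model can be sketched as follows. This example assumes a single scalar feature and Gaussian class-conditional densities, which the slides do not specify; the data, class names and function names are hypothetical.

```python
import math


def fit_gaussian_generative(xs, labels):
    """Inference step: estimate the prior p(C_k) and a Gaussian p(x | C_k) per class."""
    model = {}
    for k in set(labels):
        pts = [x for x, c in zip(xs, labels) if c == k]
        mean = sum(pts) / len(pts)
        var = sum((x - mean) ** 2 for x in pts) / len(pts)
        model[k] = (len(pts) / len(xs), mean, var)  # (prior, mean, variance)
    return model


def posterior(model, x):
    """Bayes' formula: p(C_k | x) = p(x | C_k) p(C_k) / p(x)."""
    joint = {}
    for k, (prior, mean, var) in model.items():
        lik = math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)
        joint[k] = lik * prior
    evidence = sum(joint.values())
    return {k: v / evidence for k, v in joint.items()}


def decide(model, x):
    """Decision step: assign x to the class with the largest posterior."""
    post = posterior(model, x)
    return max(post, key=post.get)


# Hypothetical one-dimensional intensity measure for six patients
xs = [1.0, 1.5, 2.0, 4.0, 4.5, 5.0]
labels = ["normal", "normal", "normal", "nodule", "nodule", "nodule"]
model = fit_gaussian_generative(xs, labels)
print(decide(model, 1.2), decide(model, 4.8))
```

A discriminative model would skip the density estimation and fit $p(C_k \mid X)$ directly; the decision step would stay the same.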

Page 42:

Classification approaches (2)

Discriminative models

1) Inference step:

use training data to learn a model for 𝑝(𝐶𝑘|𝑋)

2) Decision step:

use the posterior to make optimal assignments

Page 43:

Classification approaches (3)

Discriminative function

Use training data to learn a discriminative function 𝑓(𝑥) directly mapping the input onto a class label

Page 44:

Oranges and Lemons

Page 45:

A two dimensional space

Page 46:

Stars and galaxies

Minor elliptical axis (y) against Major elliptical axis (x) for stars (red) and galaxies (blue)

Page 47:

Coronary Heart Disease

Patients with (red) and without (blue) coronary heart disease in South Africa (Rousseauw et al., 1983)

Page 48:

Parametric model

A mapping from a set of inputs $x$ to a set of outputs $y$ (labels or values) is parametric if it depends on a set of (fixed) parameters $\mathbf{w}$:

1) The number of parameters is finite

2) The number of parameters is independent of the number of data points

Page 49:

Linear classifier

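Straight cut (hyperplane) dividing the input space $\mathbf{X}$ in two

A linear classifier assumes that

$y = f(\mathbf{X}) = \mathbf{w}^T \mathbf{X} + b$

is a linear function of $\mathbf{X}$

$\mathbf{X} = \begin{pmatrix} \text{height} \\ \text{width} \end{pmatrix}, \quad \mathbf{w} = \begin{pmatrix} w_1 \\ w_2 \end{pmatrix}$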

Page 50:

The weight vector

Define the positive class region: $\mathbf{w}^T \mathbf{x}_i + b > 0$

$w_1 x_{i1} + w_2 x_{i2} + \dots + b > 0$

$\sum_{d=1}^{D} w_d x_{id} + b > 0$

Setting $b = 0$: $\mathbf{w}^T \mathbf{X} = 0$

Page 51:

Geometric meaning

[Figure labels: regions $\mathcal{R}_1$ and $\mathcal{R}_2$; $y > 0$, $y < 0$, and the boundary $y = 0$]

Page 52:

Geometric meaning

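$L = \{\mathbf{x} : f(\mathbf{x}) = \mathbf{w}^T \mathbf{x} + b = 0\}$

1) For any $\mathbf{x}_1$ and $\mathbf{x}_2$ in $L$, $\mathbf{w}^T(\mathbf{x}_1 - \mathbf{x}_2) = 0 \Rightarrow \mathbf{w}^* = \frac{\mathbf{w}}{\|\mathbf{w}\|}$ is the normal to $L$

2) For any $\mathbf{x}_0$ in $L$, $\mathbf{w}^T \mathbf{x}_0 = -b$

3) The signed distance of a point $\mathbf{x}$ to $L$ is:

$\mathbf{w}^{*T}(\mathbf{x} - \mathbf{x}_0) = \frac{1}{\|\mathbf{w}\|}(\mathbf{w}^T \mathbf{x} + b) = \frac{f(\mathbf{x})}{\|f'(\mathbf{x})\|}$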
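The signed-distance formula in point 3 can be checked on a small example. The sketch below uses a hypothetical hyperplane with weights (3, 4) and bias -5, chosen so that the norm of the weight vector is exactly 5; the function name is an assumption.

```python
import math


def signed_distance(w, b, x):
    """Signed distance of x to the hyperplane {x : w^T x + b = 0}."""
    f = sum(wi * xi for wi, xi in zip(w, x)) + b
    norm_w = math.sqrt(sum(wi * wi for wi in w))
    return f / norm_w


w, b = [3.0, 4.0], -5.0
d_pos = signed_distance(w, b, [3.0, 4.0])   # point on the positive side
d_on = signed_distance(w, b, [3.0, -1.0])   # point lying on the hyperplane
d_neg = signed_distance(w, b, [0.0, 0.0])   # point on the negative side
print(d_pos, d_on, d_neg)
```

The sign of the result tells on which side of the hyperplane the point lies, which is exactly the quantity a linear classifier thresholds.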

Page 53:

The weight vector

% ww = Dx1 weights

% Xstar = NxD test cases

y_pred = sign(Xstar*ww); % Nx1

Page 54:
Page 55:

Learning the weights

Rosenblatt’s Perceptron Learning

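Perceptron criterion ($\mathcal{M}$ is the set of misclassified points):

$D(\mathbf{w}, b) = -\sum_{i \in \mathcal{M}} y_i(\mathbf{w}^T \mathbf{x}_i + b) = \sum_{i \in \mathcal{M}} D_i(\mathbf{w}, b)$, with $D_i(\mathbf{w}, b) = -y_i(\mathbf{w}^T \mathbf{x}_i + b)$

$\frac{\partial D_i(\mathbf{w}, b)}{\partial \mathbf{w}} = -y_i \mathbf{x}_i, \qquad \frac{\partial D_i(\mathbf{w}, b)}{\partial b} = -y_i$

Stochastic gradient descent:

$\begin{pmatrix} \mathbf{w} \\ b \end{pmatrix}^{\tau+1} = \begin{pmatrix} \mathbf{w} \\ b \end{pmatrix}^{\tau} + \eta \begin{pmatrix} y_i \mathbf{x}_i \\ y_i \end{pmatrix}$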

Page 56:

Learning the weights

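% xx = NxD training cases
% yy = Nx1 targets (-1,+1)
% ww = Dx1 weights (output)
% Note: the loop may never terminate if the data are not linearly separable
[N, D] = size(xx);
old_ww = [];
ww = zeros(D,1);
while (~isequal(ww, old_ww))
    old_ww = ww;
    for ct = 1:N
        pred = sign(xx(ct,:)*ww);
        ww = ww + (yy(ct)-pred)*xx(ct,:)';
    end
end

$\hat{y} = \mathrm{sgn}(\mathbf{w}^T \mathbf{x})$

$\mathbf{w}^{\tau+1} = \mathbf{w}^{\tau} + (y - \hat{y})\,\mathbf{x}$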
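For readers without MATLAB, the same update rule can be sketched in plain Python. This is an illustrative translation, not the course code: the `max_epochs` cap is an addition so that the loop also stops on data that are not linearly separable, and the four training points are hypothetical.

```python
def sign(v):
    """Return -1, 0 or +1, like MATLAB's sign()."""
    return (v > 0) - (v < 0)


def train_perceptron(xs, ys, max_epochs=100):
    """Perceptron learning with the update w <- w + (y - y_hat) x (no bias term)."""
    w = [0.0] * len(xs[0])
    for _ in range(max_epochs):
        changed = False
        for x, y in zip(xs, ys):
            pred = sign(sum(wi * xi for wi, xi in zip(w, x)))
            if pred != y:
                w = [wi + (y - pred) * xi for wi, xi in zip(w, x)]
                changed = True
        if not changed:  # a full pass without updates: every point is classified correctly
            break
    return w


# Hypothetical linearly separable data with targets in {-1, +1}
xs = [[2, 1], [1, 3], [-1, -2], [-2, -1]]
ys = [1, 1, -1, -1]
w = train_perceptron(xs, ys)
predictions = [sign(sum(wi * xi for wi, xi in zip(w, x))) for x in xs]
print(w, predictions)
```

On this toy set a single update is enough: the first point moves the weights to (2, 1), after which all four points are on the correct side.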

Page 57:

Learning the weights

Page 58:

Implementing the bias

What about 𝑏?

Page 59:

Output of the perceptron

Page 60:

Linear classifier revisited

If not linearly separable, must
- extend model
- add features

Page 61:

Nonlinear basis function

Page 62:

From model to no model

Faith in previous knowledge
Strong assumptions on
- data structure
- separating boundary shape

Faith in the data
No assumption on the underlying structure
Data tell me everything I need

Page 63:

K-nearest neighbours classifier

Fix and Hodges, 1951

Page 64:

Decision boundaries

[Figure panels: Linear classification; 1-nearest neighbour classifier; 15-nearest neighbour classifier]

Page 65:

Brain MRI application

MICCAI MS lesion challenge 2008

http://www.ia.unc.edu/MSseg/index.html

Page 66:

LANDSAT application

Page 67:

Identification via gait analysis

Nowlan 2009; Choi 2014

Characterize each person by the way they move: gait signature

Page 68:

Parametric vs non-parametric

• We started by assuming that the decision boundary is a plane

• Non-parametric KNN has no fixed assumption: boundaries get more complicated with more data

• Non-parametric methods may need more data and can be computationally intensive
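A minimal k-nearest-neighbours sketch makes the non-parametric point concrete: there is no training step and no weight vector, the classifier simply keeps every training point and searches them at prediction time. The Euclidean distance, the majority vote and the toy fruit data are assumptions made for this illustration.

```python
import math
from collections import Counter


def knn_predict(train_x, train_y, query, k):
    """Classify query by majority vote among its k nearest training points."""
    # Every prediction scans the whole training set, hence the computational cost
    dists = sorted((math.dist(x, query), label) for x, label in zip(train_x, train_y))
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]


# Hypothetical two-dimensional features (for example width and height of a fruit)
train_x = [(1, 1), (1, 2), (2, 1), (5, 5), (6, 5), (5, 6)]
train_y = ["orange", "orange", "orange", "lemon", "lemon", "lemon"]

near_oranges = knn_predict(train_x, train_y, (1.5, 1.5), k=3)
near_lemons = knn_predict(train_x, train_y, (5.5, 5.5), k=3)
single_neighbour = knn_predict(train_x, train_y, (4, 4), k=1)
print(near_oranges, near_lemons, single_neighbour)
```

Small values of k follow the training data closely and give irregular boundaries, while larger values average over more neighbours and give smoother ones.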

Page 69:

Batch supervised learning

Given: example inputs and targets (training set)

Task: predicting targets for new inputs (test set)

Examples:
- classification (binary or multi-class)
- regression
- ordinal regression
- Poisson regression
- ranking
- …

Page 70:

Batch supervised learning

• Many ways of mapping inputs to outputs

• How do we choose what to do?

• How do we know if we are doing well?

Page 71:

Algorithm’s objective cost

Formal objective for algorithms:
- minimize a cost function
- maximize an objective function

Proving convergence:
- does the objective monotonically improve?

Considering alternatives:
- does another algorithm score better?

Page 72:

Loss function

We want to specify the objective of an algorithm

One idea: consider a loss function $L(\hat{y}(\mathbf{x}_*); y_*)$

Would like to minimize the loss at test time

Minimizing the empirical loss might be a reasonable proxy:

$\sum_i L(\hat{y}(\mathbf{x}_i); y_i)$
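The empirical loss is just a sum of per-example losses over the training set, as the sketch below shows for two common choices. The function names and the toy predictions are hypothetical.

```python
def empirical_loss(loss, predictions, targets):
    """Sum of L(y_hat(x_i); y_i) over the training examples."""
    return sum(loss(p, t) for p, t in zip(predictions, targets))


def zero_one(y_hat, y):
    """0-1 loss: 1 for a wrong label, 0 for a correct one."""
    return int(y_hat != y)


def squared(y_hat, y):
    """Squared-error loss for real-valued targets."""
    return (y_hat - y) ** 2


n_errors = empirical_loss(zero_one, [1, -1, 1, 1], [1, 1, 1, -1])
sq_error = empirical_loss(squared, [1.0, 2.5], [1.5, 2.0])
print(n_errors, sq_error)
```

Swapping the `loss` argument changes the objective without touching the rest of the code, which is the sense in which the loss function specifies what the algorithm is trying to achieve.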

Page 73:

Choosing a loss function

• Motivated by the application
– 0-1 error, achieving a tolerance, business cost

• Computational convenience:
– Differentiability, convexity

• Beware of loss dominated by artifacts:
– Outliers
– Unbalanced classes

Page 74:

A step into linear regression

Find a linear function

$\hat{y} = \mathbf{w}^T \mathbf{x}$

that approximates the mapping:

$\mathbf{x} \to y$

Page 75:

A step into linear regression

Find a linear function

$\hat{y} = \mathbf{w}^T \mathbf{x}$

that minimizes the sum of squared residuals from $y$:

$RSS(\mathbf{w}) = \sum_{i=1}^{N} (y_i - \hat{y}_i)^2 = \sum_{i=1}^{N} \left( y_i - b - \sum_{j=1}^{p} x_{ij} w_j \right)^2$

Page 76:

Vector form for RSS

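$\mathbf{x}_i = \begin{pmatrix} 1 \\ x_{i1} \\ x_{i2} \\ \vdots \\ x_{iD} \end{pmatrix}$, for example $\mathbf{x}_i = \begin{pmatrix} 1 \\ \text{age} \\ \text{BMI} \\ \vdots \\ \text{glycemia} \end{pmatrix}$

$\mathbf{X} = \begin{pmatrix} 1 & x_{11} & \cdots & x_{1D} \\ 1 & x_{21} & \cdots & x_{2D} \\ 1 & x_{31} & \cdots & x_{3D} \\ \vdots & & & \vdots \\ 1 & x_{N1} & \cdots & x_{ND} \end{pmatrix} = \begin{pmatrix} \mathbf{x}_1^T \\ \mathbf{x}_2^T \\ \mathbf{x}_3^T \\ \vdots \\ \mathbf{x}_N^T \end{pmatrix}$

$\mathbf{w} = \begin{pmatrix} w_0 \\ w_1 \\ w_2 \\ \vdots \\ w_D \end{pmatrix}$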

Page 77:

Least squares estimation

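$RSS(\mathbf{w}) = (y - \mathbf{X}\mathbf{w})^T (y - \mathbf{X}\mathbf{w})$

$\frac{\partial RSS}{\partial \mathbf{w}} = -2\mathbf{X}^T (y - \mathbf{X}\mathbf{w})$

$\frac{\partial^2 RSS}{\partial \mathbf{w}\, \partial \mathbf{w}^T} = 2\mathbf{X}^T \mathbf{X}$

$-2\mathbf{X}^T (y - \mathbf{X}\mathbf{w}) = 0 \;\Rightarrow\; \hat{\mathbf{w}} = (\mathbf{X}^T \mathbf{X})^{-1} \mathbf{X}^T y$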
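The closed-form solution can be sketched without any numerical library by solving the normal equations $(\mathbf{X}^T\mathbf{X})\mathbf{w} = \mathbf{X}^T y$ directly. The small Gauss-Jordan solver below is an illustrative helper, not a robust implementation, and the data are hypothetical points lying exactly on the line y = 1 + 2x.

```python
def solve(a, b):
    """Solve the square system a z = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def least_squares(X, y):
    """Return w solving (X^T X) w = X^T y, i.e. w = (X^T X)^{-1} X^T y."""
    d = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(d)] for i in range(d)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(d)]
    return solve(xtx, xty)


# Each row is x_i^T = [1, x_i1]; the leading 1 carries the intercept w_0
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
y = [1.0, 3.0, 5.0, 7.0]
w = least_squares(X, y)
print(w)
```

The recovered weights are the intercept 1 and the slope 2, up to rounding. In practice a library routine (such as the backslash operator used in the course's MATLAB snippets) is preferable, since forming $\mathbf{X}^T\mathbf{X}$ explicitly can be numerically ill-conditioned.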

Page 78:

Geometry of least squares

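$\hat{y} = \hat{\mathbf{w}}^T \mathbf{x} = \mathbf{x}^T (\mathbf{X}^T \mathbf{X})^{-1} \mathbf{X}^T y$

The columns of $\mathbf{X}$ span a subspace of $\mathbb{R}^N$ of dimension at most $D+1$

The closest point to $y$ in this subspace is its orthogonal projection

The orthogonal projection is given by

$y \approx \hat{y} = \mathbf{X} (\mathbf{X}^T \mathbf{X})^{-1} \mathbf{X}^T y$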

Page 79:

Least square estimation

% X = NxD training inputs
% Y = Nx1 targets
% ww = Dx1 weights (least-squares solution)
ww = X\Y;

$\hat{Y} = \mathbf{X}\mathbf{w}$

Page 80:

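Least square estimation (2)

If we want to minimize the RSS

$L(\mathbf{w}^T \mathbf{x}; y) = (\mathbf{X}\mathbf{w} - Y)^T (\mathbf{X}\mathbf{w} - Y)$

we can use an iterative scheme with a gradient-descent update:

$\mathbf{w}^{\tau+1} = \mathbf{w}^{\tau} - \eta \nabla L(\mathbf{w}^T \mathbf{x}; y)$

$\nabla L(\mathbf{w}^T \mathbf{x}; y) = \sum_{i=1}^{N} (\mathbf{w}^T \mathbf{x}_i - y_i)\, \mathbf{x}_i = \mathbf{X}^T \mathbf{X}\mathbf{w} - \mathbf{X}^T Y$ (up to a constant factor of 2, which can be absorbed into $\eta$)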

Page 81:

Least square estimation (2)

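$\mathbf{w}^{\tau+1} = \mathbf{w}^{\tau} - \eta \mathbf{X}^T (\mathbf{X}\mathbf{w}^{\tau} - Y) = \mathbf{w}^{\tau} - \eta \mathbf{X}^T (\hat{Y} - Y)$

1. Initialize $\mathbf{w}^0$

2. Update

3. Check termination condition
a) $\mathbf{w}^{\tau+1} = \mathbf{w}^{\tau}$
b) $|\mathbf{w}^{\tau+1} - \mathbf{w}^{\tau}| < \varepsilon$
c) $\tau > T$
d) $\max |\nabla L| < \varepsilon$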
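The iterative scheme can be sketched as follows. The step size `eta`, the tolerance `eps` and the iteration cap `max_iter` are assumed values chosen for this toy problem, not recommendations; the sketch uses termination conditions c) and d) from the list above.

```python
def gradient_descent_ls(X, y, eta=0.05, eps=1e-10, max_iter=10000):
    """Minimize the RSS with the update w <- w - eta * X^T (X w - y)."""
    d = len(X[0])
    w = [0.0] * d  # 1. initialize w^0
    for _ in range(max_iter):  # termination c): stop after at most max_iter updates
        # Residuals Y_hat - Y for the current weights
        residual = [sum(wj * xj for wj, xj in zip(w, row)) - yi for row, yi in zip(X, y)]
        # Gradient X^T (Y_hat - Y)
        grad = [sum(r * row[j] for r, row in zip(residual, X)) for j in range(d)]
        if max(abs(g) for g in grad) < eps:  # termination d): gradient is small enough
            break
        w = [wj - eta * gj for wj, gj in zip(w, grad)]  # 2. update
    return w


# Same hypothetical data as a line y = 1 + 2x, rows are [1, x_i1]
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
y = [1.0, 3.0, 5.0, 7.0]
w = gradient_descent_ls(X, y)
print(w)
```

The result approaches the closed-form least-squares solution (intercept 1, slope 2). The step size matters: for this data a value of `eta` above roughly 0.11 makes the iteration diverge, while a much smaller one converges but needs many more updates.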

Page 82:

The importance of the step

[Figure labels: $L(\mathbf{w}^0)$, $L(\hat{\mathbf{w}})$]

Page 83:

Least squares classifier

Why not use linear least squares to fit regressors on binary targets?

% fit yy = xx*ww
% xx = NxD training cases
% yy = Nx1 binary targets
% ww = Dx1 weights
ww = xx\yy;

[Figure labels: $y < 0$, $y > 0$]
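A small numerical experiment hints at one answer to the question. The sketch below fits a least-squares line to targets in {-1, +1} on a single hypothetical feature and classifies by the sign of the fitted value; it then adds one extra positive example far from the boundary. The data and function names are assumptions, and the one-feature closed form is used only to keep the example short.

```python
def fit_line(xs, ys):
    """Least-squares fit of y ~ w0 + w1 * x (closed form for a single feature)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    w1 = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    return mean_y - w1 * mean_x, w1


def classify(w0, w1, x):
    """Assign +1 where the fitted value is positive, -1 otherwise."""
    return 1 if w0 + w1 * x > 0 else -1


xs = [0.0, 1.0, 2.0, 4.0, 5.0, 6.0]
ys = [-1, -1, -1, 1, 1, 1]
w0, w1 = fit_line(xs, ys)
boundary = -w0 / w1  # point where the fitted line crosses zero

# Add one positive example that is easy to classify but lies far away
w0_out, w1_out = fit_line(xs + [30.0], ys + [1])
boundary_out = -w0_out / w1_out
print(boundary, boundary_out, classify(w0_out, w1_out, 4.0))
```

On the balanced data the boundary sits at x = 3 and every point is classified correctly. After adding the distant point the boundary moves past x = 4, so a previously correct positive example is now misclassified: the squared loss penalizes predictions that are "too correct", which makes the fit sensitive to such points.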

Page 84:

Least squares classifier

[Figure labels: $y > 0$, $y < 0$]

Page 85:

Least squares classifier

Page 86:

Least squares classifier

Why not use linear least squares to fit regressors on binary targets?

% fit yy = xx*ww
% xx = NxD training cases
% yy = Nx1 binary targets
% ww = Dx1 weights
ww = xx\yy;

[Figure labels: $y > 0$, $y < 0$]