Introduction to Machine Learning
Eric Medvet
16/3/2017
Outline
- Machine Learning: what and why?
  - Motivating example
- Tree-based methods
  - Regression trees
  - Trees aggregation
Teachers
- Eric Medvet
  - Dipartimento di Ingegneria e Architettura (DIA)
  - http://medvet.inginf.units.it/
Section 1
Machine Learning: what and why?
What is Machine Learning?
Definition: Machine Learning is the science of getting computers to learn without being explicitly programmed.
Definition: Data Mining is the science of discovering patterns in data.
In practice
A set of mathematical and statistical tools for:
- building a model that predicts an output, given an input (supervised learning)
- learning relationships and structures in data (unsupervised learning)
Machine Learning everyday
Example problem: spam
Discriminate between spam and non-spam emails.
Figure: Spam filtering in Gmail.
Machine Learning everyday
Example problem: image understanding
Recognize objects in images.
Figure: Object recognition in Google Photos.
Why ML/DM “today”?
- we collect more and more data (big data)
- we have more and more computational power
Figure: From http://www.mkomo.com/cost-per-gigabyte-update.
ML/DM is popular!
Figure: Popular areas of interest, from the Skill Up 2016: Developer Skills Report [2].
[1] https://techcus.com/p/r1zSmbXut/top-5-highest-paying-programming-languages-of-2016/
[2] https://techcus.com/p/r1zSmbXut/top-5-highest-paying-programming-languages-of-2016/
What does the Machine Learning practitioner do?
Be able to:
1. design
2. implement
3. assess experimentally
an end-to-end Machine Learning or Data Mining system.
- What is the problem to be solved? What are the input and output? Which are the most suitable algorithms? How should data be prepared? Does computation time matter?
- Write some code!
- How to measure solution quality? How to compare solutions? Is my solution general?
Subsection 1
Motivating example
The amateur botanist friend
He likes to collect Iris plants. He “realized” that there are 3 species, in particular, that he likes: Iris setosa, Iris virginica, and Iris versicolor. He’d like a tool that automatically classifies collected samples into one of the 3 species.
Figure: Iris versicolor.
How to help him?
Let’s help him
- What is the problem to be solved?
  - Assign exactly one species to a sample.
- What are the input and output?
  - Output: one species among I. setosa, I. virginica, I. versicolor.
  - Input: the plant sample...
    - a description in natural language?
    - a digital photo?
    - DNA sequences?
    - some measurements of the sample!
Iris: input and output
Figure: Sepal and petal.
Input: sepal length and width, petal length and width (in cm)
Output: the class
Example: (5.1, 3.5, 1.4, 0.2) → I. setosa
Other information
The botanist friend asked a senior botanist to inspect several samples and label them with the corresponding species.

Sepal length  Sepal width  Petal length  Petal width  Species
5.1           3.5          1.4           0.2          I. setosa
4.9           3.0          1.4           0.2          I. setosa
7.0           3.2          4.7           1.4          I. versicolor
6.0           2.2          5.0           1.5          I. virginica
Notation and terminology
- Sepal length, sepal width, petal length, and petal width are input variables (or independent variables, or features, or attributes).
- Species is the output variable (or dependent variable, or response).
Notation and terminology
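$$
\mathbf{X} =
\begin{pmatrix}
x_{1,1} & x_{1,2} & \cdots & x_{1,p} \\
x_{2,1} & x_{2,2} & \cdots & x_{2,p} \\
\vdots & \vdots & \ddots & \vdots \\
x_{n,1} & x_{n,2} & \cdots & x_{n,p}
\end{pmatrix}
\qquad
\mathbf{y} =
\begin{pmatrix}
y_1 \\ y_2 \\ \vdots \\ y_n
\end{pmatrix}
$$

- $x_1^T = (x_{1,1}, x_{1,2}, \dots, x_{1,p})$ is an observation (or instance, or data point), composed of $p$ variable values; $y_1$ is the corresponding output variable value
- $x_2^T = (x_{1,2}, x_{2,2}, \dots, x_{n,2})$ is the vector of all the $n$ values for the 2nd variable ($X_2$).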
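As a minimal sketch of this notation, the four labeled Iris samples shown earlier can be stored in Python as a list of rows `X` and a list of labels `y`. The variable names and the helper `column` are illustrative assumptions, not part of the slides.

```python
# The four labeled Iris samples from the slides: n = 4 observations, p = 4 variables.
X = [
    [5.1, 3.5, 1.4, 0.2],
    [4.9, 3.0, 1.4, 0.2],
    [7.0, 3.2, 4.7, 1.4],
    [6.0, 2.2, 5.0, 1.5],
]
y = ["I. setosa", "I. setosa", "I. versicolor", "I. virginica"]

n = len(X)     # number of observations (rows of X)
p = len(X[0])  # number of input variables (columns of X)


def column(X, j):
    """Return all n values of the j-th variable (1-based, as in the slides)."""
    return [row[j - 1] for row in X]


x_1 = X[0]          # first observation: p variable values
y_1 = y[0]          # its output variable value
X_2 = column(X, 2)  # all n values of the 2nd variable (sepal width)

print(n, p)
print(x_1, y_1)
print(X_2)
```

An observation is a row of `X`, while the values of one variable form a column; keeping both views at hand helps when reading the formulas that follow.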
Notation and terminology
Different communities (e.g., statistical learning vs. machine learning vs. artificial intelligence) use different terms and notation:
- $x^{(i)}_j$ instead of $x_{i,j}$ (hence $x^{(i)}$ instead of $x_i$)
- $m$ instead of $n$ and $n$ instead of $p$
- ...
Focus on the meaning!
Iris: visual interpretation
Simplification: forget petal and I. virginica → 2 variables, 2 species (binary classification problem).
- Problem: given any new observation, we want to automatically assign the species.
- Sketch of a possible solution:
  1. learn a model (classifier)
  2. “use” the model on new observations
Figure: scatter plot of sepal width vs. sepal length for I. setosa and I. versicolor.
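The two steps (learn a model, then use it) can be sketched in a few lines of Python. The nearest-centroid rule below is only one assumed choice of model, not one prescribed by the slides, and the function names `learn` and `predict` are illustrative. The measurements are a handful of (sepal length, sepal width) pairs; the first ones match the table shown earlier, and the rest should be treated as illustrative.

```python
def learn(X, y):
    """Learn a nearest-centroid model: one mean point per class."""
    sums, counts = {}, {}
    for (length, width), label in zip(X, y):
        s = sums.setdefault(label, [0.0, 0.0])
        s[0] += length
        s[1] += width
        counts[label] = counts.get(label, 0) + 1
    return {
        label: (s[0] / counts[label], s[1] / counts[label])
        for label, s in sums.items()
    }


def predict(model, x):
    """Use the model: assign the class whose centroid is closest to x."""
    def squared_distance(centroid):
        return (x[0] - centroid[0]) ** 2 + (x[1] - centroid[1]) ** 2

    return min(model, key=lambda label: squared_distance(model[label]))


# (sepal length, sepal width) in cm, with the label given by the senior botanist.
X = [
    (5.1, 3.5), (4.9, 3.0), (4.7, 3.2), (4.6, 3.1), (5.0, 3.6),
    (7.0, 3.2), (6.4, 3.2), (6.9, 3.1), (5.5, 2.3), (6.5, 2.8),
]
y = ["I. setosa"] * 5 + ["I. versicolor"] * 5

model = learn(X, y)               # step 1: learn a model
print(predict(model, (5.0, 3.4)))  # step 2: use it on new observations
print(predict(model, (6.7, 3.0)))
```

Separating `learn` from `predict` mirrors the sketch of the solution: learning happens once on labeled data, while prediction is applied to any new observation.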
“A” model?
There could be many possible models:
- how to choose?
- how to compare?
Choosing the model
The choice of the model/tool/algorithm to be used is determined by many factors:
- Problem size ($n$ and $p$)
- Availability of an output variable ($y$)
- Computational effort (when learning or “using”)
- Explainability of the model
- ...
We will see many options.
Comparing many models
Experimentally: does the model work well on (new) data?
Define “works well”:
- a single performance index?
- how to measure?
- repeatability/reproducibility...
We will see/discuss many options.
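One common way to make “works well on new data” concrete, assumed here since the slides defer the details, is to hold out part of the labeled data and measure accuracy on it. The sketch below uses hypothetical helper names (`accuracy`, `train_test_split`, `rule`), a fixed random seed for repeatability, and a deliberately simple hand-written rule standing in for a learned model.

```python
import random


def accuracy(y_true, y_pred):
    """Single performance index: fraction of correctly predicted classes."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def train_test_split(X, y, test_fraction=0.3, seed=0):
    """Shuffle indices with a fixed seed (repeatable) and hold out a test part."""
    indices = list(range(len(X)))
    random.Random(seed).shuffle(indices)
    n_test = max(1, int(len(X) * test_fraction))
    test, train = indices[:n_test], indices[n_test:]
    return (
        [X[i] for i in train], [y[i] for i in train],
        [X[i] for i in test], [y[i] for i in test],
    )


def rule(x):
    """Hypothetical stand-in for a learned model: threshold on sepal length."""
    return "I. setosa" if x[0] < 5.0 else "I. versicolor"


# (sepal length, sepal width) in cm; illustrative values.
X = [
    (5.1, 3.5), (4.9, 3.0), (4.7, 3.2), (4.6, 3.1), (5.0, 3.6),
    (7.0, 3.2), (6.4, 3.2), (6.9, 3.1), (5.5, 2.3), (6.5, 2.8),
]
y = ["I. setosa"] * 5 + ["I. versicolor"] * 5

X_train, y_train, X_test, y_test = train_test_split(X, y)
test_accuracy = accuracy(y_test, [rule(x) for x in X_test])
print(len(X_train), len(X_test), test_accuracy)
```

In a real experiment the model would be learned on `X_train`, `y_train` only, so that the test part plays the role of new data.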
It does not work well...
Why?
- the data is not informative
- the data is not representative
- the data has changed
- the data is too noisy
We will see/discuss these issues.
ML is not magic
Problem: find birth town from height/weight.
Figure: scatter plot of height [cm] vs. weight [kg] for people born in Trieste and in Udine.
Q: what is the data issue here?
Implementation
When “solving” a problem, we usually need to:
- explore/visualize data
- apply one or more learning algorithms
- assess learned models
“By hand?” No, with software!
ML/DM software
Many options:
- libraries for general purpose languages:
  - Java: e.g., http://haifengl.github.io/smile/
  - Python: e.g., http://scikit-learn.org/stable/
  - ...
- specialized software environments:
  - Octave: https://en.wikipedia.org/wiki/GNU_Octave
  - R: https://en.wikipedia.org/wiki/R_(programming_language)
- from scratch
ML/DM software: which one?
- production/prototype
- platform constraints
- degree of (data) customization
- documentation availability/community size
- ...
- previous knowledge/skills
Section 2
Tree-based methods
The carousel robot attendant
Problem: replace the carousel attendant with a robot which automatically decides who can ride the carousel.
Carousel: data
Observed human attendant’s decisions.
Figure: scatter plot of height h [cm] vs. age a [year], with points marked “Cannot ride” or “Can ride”.
How can the robot take the decision?
- if younger than 10 → can’t!
- otherwise:
  - if shorter than 120 → can’t!
  - otherwise → can!
Decision tree!
a < 10
  T: cannot ride
  F: h < 120
    T: cannot ride
    F: can ride
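The decision tree above translates directly into nested conditions. The following Python sketch assumes age in years and height in cm, as in the figure axes; the function name `can_ride` is an illustrative choice.

```python
def can_ride(age, height):
    """Apply the carousel decision tree: test age first, then height."""
    if age < 10:      # root node a < 10, true branch: cannot ride
        return False
    if height < 120:  # second node h < 120, true branch: cannot ride
        return False
    return True       # both tests false: can ride


print(can_ride(8, 130))   # False
print(can_ride(12, 110))  # False
print(can_ride(12, 150))  # True
```

Each `if` corresponds to one branch node and each `return` to one terminal node.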
How to build a decision tree
Divide et impera (divide and conquer), recursively:
- find a cut variable and a cut value
- for the left branch, divide et impera
- for the right branch, divide et impera
How to build a decision tree: detail
Recursive binary splitting
function BuildDecisionTree(X, y)
    if ShouldStop(y) then
        ŷ ← most common class in y
        return new terminal node with ŷ
    else
        (i, t) ← BestBranch(X, y)
        n ← new branch node with (i, t)
        append child BuildDecisionTree(X|x_i<t, y|x_i<t) to n
        append child BuildDecisionTree(X|x_i≥t, y|x_i≥t) to n
        return n
    end if
end function

- Recursive binary splitting
- Top down (start from the “big” problem)
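The pseudocode can be turned into a short, runnable Python sketch. Several details here are assumptions rather than part of the slides: the tree is represented with plain dictionaries, `best_branch` searches exhaustively over observed values, `should_stop` uses a `k_min` parameter with an assumed default, a guard handles the case where no split exists, and the toy carousel data is hypothetical.

```python
from collections import Counter


def most_common_class(y):
    return Counter(y).most_common(1)[0][0]


def error(y):
    """Fraction of labels in y that differ from the most common class."""
    return 1 - Counter(y).most_common(1)[0][1] / len(y)


def should_stop(y, k_min):
    return len(set(y)) <= 1 or len(y) < k_min


def best_branch(X, y):
    """Exhaustive greedy search for the (variable, threshold) with lowest summed error."""
    best = None
    for i in range(len(X[0])):
        # Skip the smallest value: it would leave the "x_i < t" side empty.
        for t in sorted({row[i] for row in X})[1:]:
            left = [lab for row, lab in zip(X, y) if row[i] < t]
            right = [lab for row, lab in zip(X, y) if row[i] >= t]
            score = error(left) + error(right)
            if best is None or score < best[0]:
                best = (score, i, t)
    return None if best is None else (best[1], best[2])


def build_decision_tree(X, y, k_min=2):
    if should_stop(y, k_min):
        return most_common_class(y)  # terminal node
    branch = best_branch(X, y)
    if branch is None:               # added guard: identical rows cannot be split
        return most_common_class(y)
    i, t = branch
    lt = [(row, lab) for row, lab in zip(X, y) if row[i] < t]
    ge = [(row, lab) for row, lab in zip(X, y) if row[i] >= t]
    return {
        "variable": i,
        "threshold": t,
        "lt": build_decision_tree([r for r, _ in lt], [c for _, c in lt], k_min),
        "ge": build_decision_tree([r for r, _ in ge], [c for _, c in ge], k_min),
    }


def predict(tree, x):
    while isinstance(tree, dict):
        side = "lt" if x[tree["variable"]] < tree["threshold"] else "ge"
        tree = tree[side]
    return tree


# Hypothetical observations: (age [year], height [cm]) and the attendant's decision.
X = [(6, 110), (8, 130), (9, 125), (11, 115), (12, 140), (14, 150), (13, 110), (15, 160)]
y = ["cannot", "cannot", "cannot", "cannot", "can", "can", "cannot", "can"]

tree = build_decision_tree(X, y)
print(tree)
print(predict(tree, (12, 145)))
```

With `k_min=2` the recursion stops only when a node is pure or cannot be split, so the resulting tree reproduces every decision in this toy training set.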
Best branch
function BestBranch(X, y)
    (i*, t*) ← arg min_{i,t} E(y|x_i≥t) + E(y|x_i<t)
    return (i*, t*)
end function
Classification error on subset:
$E(\mathbf{y}) = \frac{|\{y \in \mathbf{y} : y \neq \hat{y}\}|}{|\mathbf{y}|}$, where $\hat{y}$ is the most common class in $\mathbf{y}$
I Greedy (choose split to minimize error now, not in later steps)
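The error E(y) and the greedy search for the best cut can be sketched in Python as follows. The slide does not say which cut values t are tried: using the midpoints between consecutive distinct values of each variable is an assumption made here, as are the function names and the convention that an empty subset has error 0.

```python
from collections import Counter

def classification_error(y):
    """E(y): fraction of labels in y that differ from the most common class."""
    if not y:
        return 0.0  # assumption: an empty subset contributes no error
    most_common_count = Counter(y).most_common(1)[0][1]
    return (len(y) - most_common_count) / len(y)

def best_branch(X, y):
    """Return the (i, t) minimizing E(y | x_i >= t) + E(y | x_i < t), or None."""
    best = None
    for i in range(len(X[0])):
        values = sorted({row[i] for row in X})
        # assumption: candidate cut values are midpoints between distinct values
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [label for row, label in zip(X, y) if row[i] < t]
            right = [label for row, label in zip(X, y) if row[i] >= t]
            score = classification_error(left) + classification_error(right)
            if best is None or score < best[0]:
                best = (score, i, t)
    return None if best is None else (best[1], best[2])
```

The search is exhaustive over variables and candidate cut values, but greedy in the sense of the slide: it only looks at the error right after this single split.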
![Page 57: Introduction to Machine Learning - Wirelesswireless.ictp.it/.../week2/2017_ICTP_IAEA_WS_MachineLearning_slides.pdf · What does the Machine Learning practitioner? Be able to: 1.design](https://reader036.fdocuments.net/reader036/viewer/2022062602/5edad9f509ac2c67fa6868d8/html5/thumbnails/57.jpg)
35/77
Best branch
$(i^*, t^*) \leftarrow \arg\min_{i,t} E(\mathbf{y}|_{x_i \geq t}) + E(\mathbf{y}|_{x_i < t})$
The formula says what is done, not how it is done!
Q: can different “hows” differ? How?
![Page 58: Introduction to Machine Learning - Wirelesswireless.ictp.it/.../week2/2017_ICTP_IAEA_WS_MachineLearning_slides.pdf · What does the Machine Learning practitioner? Be able to: 1.design](https://reader036.fdocuments.net/reader036/viewer/2022062602/5edad9f509ac2c67fa6868d8/html5/thumbnails/58.jpg)
36/77
Stopping criterion
function ShouldStop(y)
    if y contains only one class then
        return true
    else if |y| < k_min then
        return true
    else
        return false
    end if
end function
Another possible criterion:
I tree depth larger than d_max
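Putting the pieces together, the recursive construction with both stopping criteria (k_min and d_max) might look like the sketch below. It is not the lecture's implementation: the tree is represented as nested tuples `(i, t, left, right)` with bare class labels as terminal nodes, cut values are midpoints between distinct values, and all names are illustrative assumptions.

```python
from collections import Counter

def error(y):
    """Classification error E(y); an empty subset counts as 0 (assumption)."""
    return 1 - Counter(y).most_common(1)[0][1] / len(y) if y else 0.0

def should_stop(y, k_min, depth, d_max):
    """Stop on a pure node, on fewer than k_min observations, or at depth d_max."""
    return len(set(y)) <= 1 or len(y) < k_min or depth >= d_max

def best_branch(X, y):
    """Greedy choice of (i, t); midpoint cut values are an assumption."""
    candidates = []
    for i in range(len(X[0])):
        values = sorted({row[i] for row in X})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [c for row, c in zip(X, y) if row[i] < t]
            right = [c for row, c in zip(X, y) if row[i] >= t]
            candidates.append((error(left) + error(right), i, t))
    return min(candidates)[1:] if candidates else None

def build_tree(X, y, k_min=2, d_max=10, depth=0):
    """Recursive binary splitting, top down."""
    branch = None if should_stop(y, k_min, depth, d_max) else best_branch(X, y)
    if branch is None:  # stopping criterion met, or no valid cut exists
        return Counter(y).most_common(1)[0][0]  # terminal node: most common class
    i, t = branch
    left = [(row, c) for row, c in zip(X, y) if row[i] < t]
    right = [(row, c) for row, c in zip(X, y) if row[i] >= t]
    return (i, t,
            build_tree([r for r, _ in left], [c for _, c in left], k_min, d_max, depth + 1),
            build_tree([r for r, _ in right], [c for _, c in right], k_min, d_max, depth + 1))

def predict(tree, x):
    """Walk down the tree; assumes class labels are never tuples."""
    while isinstance(tree, tuple):
        i, t, left, right = tree
        tree = left if x[i] < t else right
    return tree
```

With a small k_min and a large d_max the tree keeps splitting until it reproduces the learning data exactly; lowering d_max or raising k_min yields a smaller tree.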
![Page 59: Introduction to Machine Learning - Wirelesswireless.ictp.it/.../week2/2017_ICTP_IAEA_WS_MachineLearning_slides.pdf · What does the Machine Learning practitioner? Be able to: 1.design](https://reader036.fdocuments.net/reader036/viewer/2022062602/5edad9f509ac2c67fa6868d8/html5/thumbnails/59.jpg)
37/77
Categorical independent variables
I Trees can work with categorical variables
I Branch node is x_i = c or x_i ∈ C′ ⊂ C (c is a class)
I Can mix categorical and numeric variables
![Page 60: Introduction to Machine Learning - Wirelesswireless.ictp.it/.../week2/2017_ICTP_IAEA_WS_MachineLearning_slides.pdf · What does the Machine Learning practitioner? Be able to: 1.design](https://reader036.fdocuments.net/reader036/viewer/2022062602/5edad9f509ac2c67fa6868d8/html5/thumbnails/60.jpg)
38/77
Stopping criterion: role of k_min
Suppose k_min = 1 (never stop for y size)
[Figure: scatter plot of height h [cm] vs. age a [year] (“Cannot ride” / “Can ride”) next to the learned tree, whose branch nodes are h < 120, a < 9.0, a < 9.6, a < 9.1, a < 9.4, a < 10]
Q: what’s wrong?
![Page 61: Introduction to Machine Learning - Wirelesswireless.ictp.it/.../week2/2017_ICTP_IAEA_WS_MachineLearning_slides.pdf · What does the Machine Learning practitioner? Be able to: 1.design](https://reader036.fdocuments.net/reader036/viewer/2022062602/5edad9f509ac2c67fa6868d8/html5/thumbnails/61.jpg)
39/77
Tree complexity
When the tree is “too complex”
I less readable/understandable/explicable
I maybe there was noise in the data
Q: what’s noise in carousel data?
The tree complexity issue is not related (only) to k_min
![Page 62: Introduction to Machine Learning - Wirelesswireless.ictp.it/.../week2/2017_ICTP_IAEA_WS_MachineLearning_slides.pdf · What does the Machine Learning practitioner? Be able to: 1.design](https://reader036.fdocuments.net/reader036/viewer/2022062602/5edad9f509ac2c67fa6868d8/html5/thumbnails/62.jpg)
40/77
Tree complexity: other interpretation
I maybe there was noise in the data
The tree fits the learning data too much:
I it overfits (overfitting)
I does not generalize (high variance: model varies if learning data varies)
![Page 63: Introduction to Machine Learning - Wirelesswireless.ictp.it/.../week2/2017_ICTP_IAEA_WS_MachineLearning_slides.pdf · What does the Machine Learning practitioner? Be able to: 1.design](https://reader036.fdocuments.net/reader036/viewer/2022062602/5edad9f509ac2c67fa6868d8/html5/thumbnails/63.jpg)
41/77
High variance
“model varies if learning data varies”: what? why does the data vary?
I learning data is about the system/phenomenon/nature S
  I a collection of observations of S
  I a point of view on S
I learning is about understanding/knowing/explaining S
I if I change the point of view on S, my knowledge about S should remain the same!
![Page 66: Introduction to Machine Learning - Wirelesswireless.ictp.it/.../week2/2017_ICTP_IAEA_WS_MachineLearning_slides.pdf · What does the Machine Learning practitioner? Be able to: 1.design](https://reader036.fdocuments.net/reader036/viewer/2022062602/5edad9f509ac2c67fa6868d8/html5/thumbnails/66.jpg)
42/77
Fighting overfitting
I large k_min (large w.r.t. what?)
I when building, limit depth
I when building, don’t split if low overall impurity decrease
I after building, prune
(bias, variance will be detailed later)
![Page 67: Introduction to Machine Learning - Wirelesswireless.ictp.it/.../week2/2017_ICTP_IAEA_WS_MachineLearning_slides.pdf · What does the Machine Learning practitioner? Be able to: 1.design](https://reader036.fdocuments.net/reader036/viewer/2022062602/5edad9f509ac2c67fa6868d8/html5/thumbnails/67.jpg)
43/77
Evaluation: k-fold cross-validation
How to estimate the predictor performance on new (unavailable) data?
1. split learning data (X and y) in k equal slices (each of n/k observations/elements)
2. for each split (i.e., each i ∈ {1, . . . , k})
2.1 learn on all but the i-th slice
2.2 compute classification error on the unseen i-th slice
3. average the k classification errors
In essence:
I can the learner generalize on available data?
I how will the learned artifact behave on unseen data?
![Page 68: Introduction to Machine Learning - Wirelesswireless.ictp.it/.../week2/2017_ICTP_IAEA_WS_MachineLearning_slides.pdf · What does the Machine Learning practitioner? Be able to: 1.design](https://reader036.fdocuments.net/reader036/viewer/2022062602/5edad9f509ac2c67fa6868d8/html5/thumbnails/68.jpg)
44/77
Evaluation: k-fold cross-validation
[Figure: five foldings of the learning data; folding i yields accuracy_i, for i = 1, . . . , 5]
$\text{accuracy} = \frac{1}{k} \sum_{i=1}^{k} \text{accuracy}_i$
Or with classification error rate or any other meaningful (effectiveness) measure
Q: how should data be split?
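A minimal sketch of k-fold cross-validation in Python. The names are illustrative assumptions, and so is the answer to the question above adopted here: observations are shuffled (with a fixed seed) before being sliced. `learn_majority` is only a stand-in learner that always predicts the most common class; any function that takes (X, y) and returns a predictor could be passed instead.

```python
import random
from collections import Counter

def k_fold_accuracy(learn, X, y, k=5, seed=0):
    """Average accuracy over k folds; learn(X, y) must return a predictor function."""
    indices = list(range(len(X)))
    random.Random(seed).shuffle(indices)       # assumption: random split
    folds = [indices[j::k] for j in range(k)]  # k slices of (almost) equal size
    accuracies = []
    for i in range(k):
        test = folds[i]
        # learn on all but the i-th slice
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        predictor = learn([X[idx] for idx in train], [y[idx] for idx in train])
        # evaluate on the unseen i-th slice
        correct = sum(predictor(X[idx]) == y[idx] for idx in test)
        accuracies.append(correct / len(test))
    return sum(accuracies) / k

def learn_majority(X, y):
    """Stand-in learner: always predicts the most common class of the learning data."""
    most_common = Counter(y).most_common(1)[0][0]
    return lambda x: most_common
```

The sketch assumes at least k observations, so that no slice is empty.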
![Page 69: Introduction to Machine Learning - Wirelesswireless.ictp.it/.../week2/2017_ICTP_IAEA_WS_MachineLearning_slides.pdf · What does the Machine Learning practitioner? Be able to: 1.design](https://reader036.fdocuments.net/reader036/viewer/2022062602/5edad9f509ac2c67fa6868d8/html5/thumbnails/69.jpg)
45/77
Subsection 1
Regression trees
![Page 70: Introduction to Machine Learning - Wirelesswireless.ictp.it/.../week2/2017_ICTP_IAEA_WS_MachineLearning_slides.pdf · What does the Machine Learning practitioner? Be able to: 1.design](https://reader036.fdocuments.net/reader036/viewer/2022062602/5edad9f509ac2c67fa6868d8/html5/thumbnails/70.jpg)
46/77
Regression with trees
Trees can be used for regression, instead of classification.
decision tree vs. regression tree
![Page 71: Introduction to Machine Learning - Wirelesswireless.ictp.it/.../week2/2017_ICTP_IAEA_WS_MachineLearning_slides.pdf · What does the Machine Learning practitioner? Be able to: 1.design](https://reader036.fdocuments.net/reader036/viewer/2022062602/5edad9f509ac2c67fa6868d8/html5/thumbnails/71.jpg)
47/77
Tree building: decision → regression
function BuildDecisionTree(X, y)
    if ShouldStop(y) then
        ŷ ← most common class in y
        return new terminal node with ŷ
    else
        (i, t) ← BestBranch(X, y)
        n ← new branch node with (i, t)
        append child BuildDecisionTree(X|x_i<t, y|x_i<t) to n
        append child BuildDecisionTree(X|x_i≥t, y|x_i≥t) to n
        return n
    end if
end function
Q: what should we change?
![Page 72: Introduction to Machine Learning - Wirelesswireless.ictp.it/.../week2/2017_ICTP_IAEA_WS_MachineLearning_slides.pdf · What does the Machine Learning practitioner? Be able to: 1.design](https://reader036.fdocuments.net/reader036/viewer/2022062602/5edad9f509ac2c67fa6868d8/html5/thumbnails/72.jpg)
47/77
Tree building: decision → regression
function BuildDecisionTree(X, y)
    if ShouldStop(y) then
        ŷ ← ȳ    ▷ mean of y
        return new terminal node with ŷ
    else
        (i, t) ← BestBranch(X, y)
        n ← new branch node with (i, t)
        append child BuildDecisionTree(X|x_i<t, y|x_i<t) to n
        append child BuildDecisionTree(X|x_i≥t, y|x_i≥t) to n
        return n
    end if
end function
Q: what should we change?
![Page 73: Introduction to Machine Learning - Wirelesswireless.ictp.it/.../week2/2017_ICTP_IAEA_WS_MachineLearning_slides.pdf · What does the Machine Learning practitioner? Be able to: 1.design](https://reader036.fdocuments.net/reader036/viewer/2022062602/5edad9f509ac2c67fa6868d8/html5/thumbnails/73.jpg)
48/77
Interpretation
[Figure: plot illustrating the interpretation of a regression tree (x axis 0–30, y axis 0–4)]
![Page 74: Introduction to Machine Learning - Wirelesswireless.ictp.it/.../week2/2017_ICTP_IAEA_WS_MachineLearning_slides.pdf · What does the Machine Learning practitioner? Be able to: 1.design](https://reader036.fdocuments.net/reader036/viewer/2022062602/5edad9f509ac2c67fa6868d8/html5/thumbnails/74.jpg)
49/77
Regression and overfitting
Image from F. Daolio
![Page 75: Introduction to Machine Learning - Wirelesswireless.ictp.it/.../week2/2017_ICTP_IAEA_WS_MachineLearning_slides.pdf · What does the Machine Learning practitioner? Be able to: 1.design](https://reader036.fdocuments.net/reader036/viewer/2022062602/5edad9f509ac2c67fa6868d8/html5/thumbnails/75.jpg)
50/77
Trees in summary
Pros:
N easily interpretable/explicable
N learning and regression/classification easily understandable
N can handle both numeric and categorical values
Cons:
H not so accurate (Q: always?)
![Page 76: Introduction to Machine Learning - Wirelesswireless.ictp.it/.../week2/2017_ICTP_IAEA_WS_MachineLearning_slides.pdf · What does the Machine Learning practitioner? Be able to: 1.design](https://reader036.fdocuments.net/reader036/viewer/2022062602/5edad9f509ac2c67fa6868d8/html5/thumbnails/76.jpg)
51/77
Tree accuracy?
Image from An Introduction to Statistical Learning
![Page 77: Introduction to Machine Learning - Wirelesswireless.ictp.it/.../week2/2017_ICTP_IAEA_WS_MachineLearning_slides.pdf · What does the Machine Learning practitioner? Be able to: 1.design](https://reader036.fdocuments.net/reader036/viewer/2022062602/5edad9f509ac2c67fa6868d8/html5/thumbnails/77.jpg)
52/77
Subsection 2
Trees aggregation
![Page 78: Introduction to Machine Learning - Wirelesswireless.ictp.it/.../week2/2017_ICTP_IAEA_WS_MachineLearning_slides.pdf · What does the Machine Learning practitioner? Be able to: 1.design](https://reader036.fdocuments.net/reader036/viewer/2022062602/5edad9f509ac2c67fa6868d8/html5/thumbnails/78.jpg)
53/77
Weakness of the tree
[Figure: data plot (x axis 0–100, y axis 15–30)]
Small tree:
I low complexity
I will hardly fit the “curve” part
I high bias, low variance
Big tree:
I high complexity
I may overfit the noise on the right part
I low bias, high variance
![Page 79: Introduction to Machine Learning - Wirelesswireless.ictp.it/.../week2/2017_ICTP_IAEA_WS_MachineLearning_slides.pdf · What does the Machine Learning practitioner? Be able to: 1.design](https://reader036.fdocuments.net/reader036/viewer/2022062602/5edad9f509ac2c67fa6868d8/html5/thumbnails/79.jpg)
54/77
The trees view
Small tree:
I “a car is something that moves”
Big tree:
I “a car is a made-in-Germany blue object with 4 wheels, 2 doors, chromed fenders, curved rear enclosing engine”
![Page 80: Introduction to Machine Learning - Wirelesswireless.ictp.it/.../week2/2017_ICTP_IAEA_WS_MachineLearning_slides.pdf · What does the Machine Learning practitioner? Be able to: 1.design](https://reader036.fdocuments.net/reader036/viewer/2022062602/5edad9f509ac2c67fa6868d8/html5/thumbnails/80.jpg)
55/77
Big tree view
A big tree:
I has a detailed view of the learning data (high complexity)
I “trusts too much” the learning data (high variance)
What if we “combine” different big tree views and ignore details on which they disagree?
![Page 81: Introduction to Machine Learning - Wirelesswireless.ictp.it/.../week2/2017_ICTP_IAEA_WS_MachineLearning_slides.pdf · What does the Machine Learning practitioner? Be able to: 1.design](https://reader036.fdocuments.net/reader036/viewer/2022062602/5edad9f509ac2c67fa6868d8/html5/thumbnails/81.jpg)
56/77
Wisdom of the crowds
What if we “combine” different big tree views and ignore details on which they disagree?
I many views
I independent views
I aggregation of views
≈ the wisdom of the crowds: a collective opinion may be better than a single expert’s opinion
![Page 82: Introduction to Machine Learning - Wirelesswireless.ictp.it/.../week2/2017_ICTP_IAEA_WS_MachineLearning_slides.pdf · What does the Machine Learning practitioner? Be able to: 1.design](https://reader036.fdocuments.net/reader036/viewer/2022062602/5edad9f509ac2c67fa6868d8/html5/thumbnails/82.jpg)
57/77
Wisdom of the trees
I many views
I just use many trees
I independent views
I ??? learning is deterministic: same data ⇒ same tree ⇒ same view
I aggregation of views
I just average prediction (regression) or take most common prediction (classification)
![Page 86: Introduction to Machine Learning - Wirelesswireless.ictp.it/.../week2/2017_ICTP_IAEA_WS_MachineLearning_slides.pdf · What does the Machine Learning practitioner? Be able to: 1.design](https://reader036.fdocuments.net/reader036/viewer/2022062602/5edad9f509ac2c67fa6868d8/html5/thumbnails/86.jpg)
58/77
Independent views
Independent views ≡ different points of view ≡ different learning data
But we have only one set of learning data!
![Page 87: Introduction to Machine Learning - Wirelesswireless.ictp.it/.../week2/2017_ICTP_IAEA_WS_MachineLearning_slides.pdf · What does the Machine Learning practitioner? Be able to: 1.design](https://reader036.fdocuments.net/reader036/viewer/2022062602/5edad9f509ac2c67fa6868d8/html5/thumbnails/87.jpg)
59/77
Independent views: idea!
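As in cross-validation, consider only a part of the data, but:
I instead of a subset
I a sample with repetitions
$X = (\mathbf{x}_1^T \; \mathbf{x}_2^T \; \mathbf{x}_3^T \; \mathbf{x}_4^T \; \mathbf{x}_5^T)$ original learning data
$X_1 = (\mathbf{x}_1^T \; \mathbf{x}_5^T \; \mathbf{x}_3^T \; \mathbf{x}_2^T \; \mathbf{x}_5^T)$ sample 1
$X_2 = (\mathbf{x}_4^T \; \mathbf{x}_2^T \; \mathbf{x}_3^T \; \mathbf{x}_1^T \; \mathbf{x}_1^T)$ sample 2
$X_i = \dots$ sample i
I (y omitted for brevity)
I learning data size is not a limitation (differently than with subset)
Bagging of trees (bootstrap, more generally)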
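Sampling with repetitions can be sketched in a few lines of Python. The function name and the choice of drawing exactly as many observations as in the original learning data (as in the five-observation example above) are assumptions of this sketch.

```python
import random

def bootstrap_sample(X, y, rng):
    """Draw len(X) observations uniformly at random, with repetitions."""
    n = len(X)
    picked = [rng.randrange(n) for _ in range(n)]  # indices may repeat
    # X and y are sampled with the same indices, so pairs stay aligned
    return [X[i] for i in picked], [y[i] for i in picked]
```

Passing a seeded `random.Random` instance as `rng` makes the samples reproducible; different seeds give different "points of view" on the same learning data.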
![Page 90: Introduction to Machine Learning - Wirelesswireless.ictp.it/.../week2/2017_ICTP_IAEA_WS_MachineLearning_slides.pdf · What does the Machine Learning practitioner? Be able to: 1.design](https://reader036.fdocuments.net/reader036/viewer/2022062602/5edad9f509ac2c67fa6868d8/html5/thumbnails/90.jpg)
60/77
Tree bagging
When learning:
1. Repeat B times
1.1 take a sample of the learning data
1.2 learn a tree (unpruned)
When predicting:
1. Repeat B times
1.1 get a prediction from ith learned tree
2. predict the average (or most common) prediction
For classification, other aggregations can be done: majority voting (most common) is the simplest
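The two phases of bagging can be sketched generically in Python. Here `learn` is any function that takes (X, y) and returns a predictor; `learn_nearest` is only a stand-in used to keep the example short (it predicts the label of the closest learning observation) and is not a tree learner. All names are illustrative assumptions.

```python
import random
from collections import Counter

def learn_bagging(learn, X, y, B=100, seed=0):
    """Learn B models, each on its own sample (with repetitions) of the learning data."""
    rng = random.Random(seed)
    n = len(X)
    models = []
    for _ in range(B):
        picked = [rng.randrange(n) for _ in range(n)]
        models.append(learn([X[i] for i in picked], [y[i] for i in picked]))
    return models

def predict_bagging(models, x, task="classification"):
    """Aggregate the B predictions: mean (regression) or majority vote (classification)."""
    predictions = [model(x) for model in models]
    if task == "regression":
        return sum(predictions) / len(predictions)
    return Counter(predictions).most_common(1)[0][0]

def learn_nearest(X, y):
    """Stand-in base learner (NOT a tree): label of the closest learning observation."""
    data = list(zip(X, y))
    def predictor(x):
        closest = min(data, key=lambda d: sum((a - b) ** 2 for a, b in zip(d[0], x)))
        return closest[1]
    return predictor
```

Because each model is learned independently of the others, the B learning steps could run in parallel; prediction costs B times the cost of a single model.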
![Page 91: Introduction to Machine Learning - Wirelesswireless.ictp.it/.../week2/2017_ICTP_IAEA_WS_MachineLearning_slides.pdf · What does the Machine Learning practitioner? Be able to: 1.design](https://reader036.fdocuments.net/reader036/viewer/2022062602/5edad9f509ac2c67fa6868d8/html5/thumbnails/91.jpg)
61/77
How many trees?
B is a parameter:
I when there is a parameter, there is the problem of finding a good value
I remember k_min, depth (Q: impact on?)
I it has been shown (experimentally) that
  I for “large” B, bagging is better than single tree
  I increasing B does not cause overfitting
  I (for us: default B is ok! “large” ≈ hundreds)
Q: how much better? at what cost?
![Page 93: Introduction to Machine Learning - Wirelesswireless.ictp.it/.../week2/2017_ICTP_IAEA_WS_MachineLearning_slides.pdf · What does the Machine Learning practitioner? Be able to: 1.design](https://reader036.fdocuments.net/reader036/viewer/2022062602/5edad9f509ac2c67fa6868d8/html5/thumbnails/93.jpg)
62/77
Bagging
[Figure: test error (×10⁻², ticks 5–8) vs. number B of trees (0–500)]
![Page 94: Introduction to Machine Learning - Wirelesswireless.ictp.it/.../week2/2017_ICTP_IAEA_WS_MachineLearning_slides.pdf · What does the Machine Learning practitioner? Be able to: 1.design](https://reader036.fdocuments.net/reader036/viewer/2022062602/5edad9f509ac2c67fa6868d8/html5/thumbnails/94.jpg)
63/77
Independent view: improvement
Despite being learned on different samples, bagging trees may be correlated, hence views are not very independent
I e.g., one variable is much more important than others for predicting (strong predictor)
Idea: force point of view differentiation by “hiding” variables
![Page 95: Introduction to Machine Learning - Wirelesswireless.ictp.it/.../week2/2017_ICTP_IAEA_WS_MachineLearning_slides.pdf · What does the Machine Learning practitioner? Be able to: 1.design](https://reader036.fdocuments.net/reader036/viewer/2022062602/5edad9f509ac2c67fa6868d8/html5/thumbnails/95.jpg)
64/77
Random forest
When learning:
1. Repeat B times
1.1 take a sample of the learning data
1.2 consider only m out of p independent variables
1.3 learn a tree (unpruned)
When predicting:
1. Repeat B times
1.1 get a prediction from ith learned tree
2. predict the average (or most common) prediction
I (observations and) variables are randomly chosen. . .
I . . . to learn a forest of trees
Q: are missing variables a problem?
![Page 96: Introduction to Machine Learning - Wirelesswireless.ictp.it/.../week2/2017_ICTP_IAEA_WS_MachineLearning_slides.pdf · What does the Machine Learning practitioner? Be able to: 1.design](https://reader036.fdocuments.net/reader036/viewer/2022062602/5edad9f509ac2c67fa6868d8/html5/thumbnails/96.jpg)
65/77
Random forest: parameter m
How to choose the value for m?
I m = p → bagging
I it has been shown (experimentally) that
  I m is not related to overfitting
  I m = √p is good for classification
  I m = p/3 is good for regression
I (for us, default m is ok!)
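The rule of thumb for m and the random choice of variables can be sketched as follows. Rounding √p and p/3 down to an integer (with a minimum of 1) is an assumption of this sketch, as are the function names.

```python
import math
import random

def default_m(p, task):
    """Rule-of-thumb number of variables: sqrt(p) for classification, p/3 for regression."""
    if task == "classification":
        return max(1, math.isqrt(p))  # assumption: round sqrt(p) down
    return max(1, p // 3)             # assumption: round p/3 down

def sample_variables(p, m, rng):
    """Indices of the m (out of p) independent variables a tree is allowed to see."""
    return sorted(rng.sample(range(p), m))
```

With m = p every variable is always available, which reduces the procedure to plain bagging.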
![Page 97: Introduction to Machine Learning - Wirelesswireless.ictp.it/.../week2/2017_ICTP_IAEA_WS_MachineLearning_slides.pdf · What does the Machine Learning practitioner? Be able to: 1.design](https://reader036.fdocuments.net/reader036/viewer/2022062602/5edad9f509ac2c67fa6868d8/html5/thumbnails/97.jpg)
66/77
Random forest
Experimentally shown: one of the “best” multi-purpose supervised classification methods
I Manuel Fernandez-Delgado et al. “Do we need hundreds of classifiers to solve real world classification problems”. In: J. Mach. Learn. Res 15.1 (2014), pp. 3133–3181
but. . .
![Page 98: Introduction to Machine Learning - Wirelesswireless.ictp.it/.../week2/2017_ICTP_IAEA_WS_MachineLearning_slides.pdf · What does the Machine Learning practitioner? Be able to: 1.design](https://reader036.fdocuments.net/reader036/viewer/2022062602/5edad9f509ac2c67fa6868d8/html5/thumbnails/98.jpg)
67/77
No free lunch!
“Any two optimization algorithms are equivalent when their performance is averaged across all possible problems”
I David H Wolpert. “The lack of a priori distinctions between learning algorithms”. In: Neural computation 8.7 (1996), pp. 1341–1390
Why free lunch?
I many restaurants, many items on menus, many possible prices for each item: where to go to eat?
I no general answer
I but, if you are a vegan, or like pizza, then a best choice could exist
Q: problem? algorithm?
![Page 99: Introduction to Machine Learning - Wirelesswireless.ictp.it/.../week2/2017_ICTP_IAEA_WS_MachineLearning_slides.pdf · What does the Machine Learning practitioner? Be able to: 1.design](https://reader036.fdocuments.net/reader036/viewer/2022062602/5edad9f509ac2c67fa6868d8/html5/thumbnails/99.jpg)
68/77
Nature of the prediction
Consider classification:
I tree → the class
  I “virginica” is just “virginica”
I forest → the class, as resulting from a voting
  I “241 virginica, 170 versicolor, 89 setosa” is different from “478 virginica, 10 versicolor, 2 setosa”
Is this information useful/exploitable?
![Page 101: Introduction to Machine Learning - Wirelesswireless.ictp.it/.../week2/2017_ICTP_IAEA_WS_MachineLearning_slides.pdf · What does the Machine Learning practitioner? Be able to: 1.design](https://reader036.fdocuments.net/reader036/viewer/2022062602/5edad9f509ac2c67fa6868d8/html5/thumbnails/101.jpg)
69/77
Confidence/tunability
Voting outcome:
I in classification, a measure of confidence of the decision
I in binary classification, voting threshold can be tuned to adjust bias towards one class (sensitivity)
Q: in regression?
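Both uses of the voting outcome can be sketched in Python. The input is assumed to be the list of per-tree predictions for one observation; the function names and the convention that the positive class is predicted when its share of votes is at least t are assumptions of this sketch.

```python
from collections import Counter

def vote_shares(votes):
    """Fraction of trees voting for each class: a rough confidence measure."""
    counts = Counter(votes)
    return {label: count / len(votes) for label, count in counts.items()}

def classify_binary(votes, positive, t=0.5):
    """Predict the positive class if its share of votes reaches the threshold t."""
    share = sum(v == positive for v in votes) / len(votes)
    return share >= t  # assumption: ">=" rather than ">"
```

Lowering t makes the forest predict the positive class more readily, trading false negatives for false positives.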
![Page 102: Introduction to Machine Learning - Wirelesswireless.ictp.it/.../week2/2017_ICTP_IAEA_WS_MachineLearning_slides.pdf · What does the Machine Learning practitioner? Be able to: 1.design](https://reader036.fdocuments.net/reader036/viewer/2022062602/5edad9f509ac2c67fa6868d8/html5/thumbnails/102.jpg)
70/77
Binary classification
Consider the problem of classifying a person (’s data) as suffering or not suffering from a disease X.
I positive: an observation of “suffering” class
I negative: an observation of “not suffering” class
In other problems, positive may mean a different thing: define it!
![Page 103: Introduction to Machine Learning - Wirelesswireless.ictp.it/.../week2/2017_ICTP_IAEA_WS_MachineLearning_slides.pdf · What does the Machine Learning practitioner? Be able to: 1.design](https://reader036.fdocuments.net/reader036/viewer/2022062602/5edad9f509ac2c67fa6868d8/html5/thumbnails/103.jpg)
71/77
FPR, FNR
Given some labeled data and a classifier for the disease X problem, we can measure:
I the number of negative observations wrongly classified as positives: False Positives (FP)
I the number of positive observations wrongly classified as negatives: False Negatives (FN)
To decouple FP, FN from data size:
$\text{FPR} = \frac{\text{FP}}{N} = \frac{\text{FP}}{\text{FP} + \text{TN}}$
$\text{FNR} = \frac{\text{FN}}{P} = \frac{\text{FN}}{\text{FN} + \text{TP}}$
![Page 104: Introduction to Machine Learning - Wirelesswireless.ictp.it/.../week2/2017_ICTP_IAEA_WS_MachineLearning_slides.pdf · What does the Machine Learning practitioner? Be able to: 1.design](https://reader036.fdocuments.net/reader036/viewer/2022062602/5edad9f509ac2c67fa6868d8/html5/thumbnails/104.jpg)
72/77
Accuracy and error rate
$\text{Accuracy} = 1 - \text{Error Rate}$
$\text{Error Rate} = \frac{\text{FN} + \text{FP}}{P + N}$
Q: $\text{Error Rate} \stackrel{?}{=} \frac{\text{FPR} + \text{FNR}}{2}$
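A small worked example helps with the question above. Since FP = FPR · N and FN = FNR · P, the error rate is (FPR · N + FNR · P) / (P + N), which coincides with (FPR + FNR) / 2 when P = N but not in general. The sketch below computes the three measures from boolean labels (True = positive); the function name and the data in the usage note are made up for illustration.

```python
def rates(actual, predicted):
    """Return (FPR, FNR, error rate) from two sequences of booleans (True = positive)."""
    pairs = list(zip(actual, predicted))
    fp = sum(p and not a for a, p in pairs)      # negatives classified as positives
    fn = sum(a and not p for a, p in pairs)      # positives classified as negatives
    tp = sum(a and p for a, p in pairs)
    tn = sum(not a and not p for a, p in pairs)
    fpr = fp / (fp + tn)
    fnr = fn / (fn + tp)
    error_rate = (fn + fp) / len(pairs)
    return fpr, fnr, error_rate
```

For instance, with P = 2 positives and N = 8 negatives, one missed positive and two false alarms give FPR = 0.25, FNR = 0.5 and an error rate of 0.3, whereas (FPR + FNR) / 2 = 0.375.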
![Page 105: Introduction to Machine Learning - Wirelesswireless.ictp.it/.../week2/2017_ICTP_IAEA_WS_MachineLearning_slides.pdf · What does the Machine Learning practitioner? Be able to: 1.design](https://reader036.fdocuments.net/reader036/viewer/2022062602/5edad9f509ac2c67fa6868d8/html5/thumbnails/105.jpg)
73/77
FPR, FNR and sensitivity
I Suppose FPR = 0.06, FNR = 0.04 with threshold set to 0.5 (default for RF)
I One could be interested in “limiting” the FNR. . .
Experimentally:
[Figure: FPR and FNR (error rate, 0–0.4) vs. threshold t (0–1)]
![Page 106: Introduction to Machine Learning - Wirelesswireless.ictp.it/.../week2/2017_ICTP_IAEA_WS_MachineLearning_slides.pdf · What does the Machine Learning practitioner? Be able to: 1.design](https://reader036.fdocuments.net/reader036/viewer/2022062602/5edad9f509ac2c67fa6868d8/html5/thumbnails/106.jpg)
74/77
Receiver operating characteristic (ROC)
[Figure, left panel: “FPR, FNR vs. t” (error rate vs. threshold t, with the EER point marked)]
[Figure, right panel: “TPR vs. FPR” (ROC curve, with the EER point marked)]
I Equal error rate (EER)
![Page 107: Introduction to Machine Learning - Wirelesswireless.ictp.it/.../week2/2017_ICTP_IAEA_WS_MachineLearning_slides.pdf · What does the Machine Learning practitioner? Be able to: 1.design](https://reader036.fdocuments.net/reader036/viewer/2022062602/5edad9f509ac2c67fa6868d8/html5/thumbnails/107.jpg)
75/77
. . . is better than
[Figure: FPR, FNR vs. t (both axes 0–1)]
I which is the best?
I robustness w.r.t. t?
![Page 108: Introduction to Machine Learning - Wirelesswireless.ictp.it/.../week2/2017_ICTP_IAEA_WS_MachineLearning_slides.pdf · What does the Machine Learning practitioner? Be able to: 1.design](https://reader036.fdocuments.net/reader036/viewer/2022062602/5edad9f509ac2c67fa6868d8/html5/thumbnails/108.jpg)
76/77
ROC and comparison
[Figure: ROC curves (TPR vs. FPR, both axes 0–1) for Classifier C1, Classifier C2, and a random classifier]
C1 is better than C2: how much?
I EER
I Area under the curve (AUC)
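A ROC curve and its AUC can be computed by sweeping the threshold over the observed scores. In this sketch a score is assumed to be the share of positive votes for an observation, an observation is classified as positive when its score is at least t, and the area is approximated with the trapezoidal rule; the function names are illustrative assumptions.

```python
def roc_points(scores, actual):
    """(FPR, TPR) pairs for thresholds from above the highest score down to the lowest."""
    P = sum(actual)          # number of positive observations
    N = len(actual) - P      # number of negative observations
    points = [(0.0, 0.0)]    # threshold above every score: nothing classified positive
    for t in sorted(set(scores), reverse=True):
        tp = sum(s >= t and a for s, a in zip(scores, actual))
        fp = sum(s >= t and not a for s, a in zip(scores, actual))
        points.append((fp / N, tp / P))
    return points

def auc(points):
    """Area under the curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))
```

A classifier that ranks every positive above every negative gets AUC = 1, while a random classifier is expected to lie along the diagonal, with AUC around 0.5.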
![Page 109: Introduction to Machine Learning - Wirelesswireless.ictp.it/.../week2/2017_ICTP_IAEA_WS_MachineLearning_slides.pdf · What does the Machine Learning practitioner? Be able to: 1.design](https://reader036.fdocuments.net/reader036/viewer/2022062602/5edad9f509ac2c67fa6868d8/html5/thumbnails/109.jpg)
77/77
Bagging/RF/boosting in summary
Tree Bagging RF Boosting
interpretability: N
numeric/categorical: N N N N
accuracy: H N N
test error estimate: N N
variable importance: N N N
confidence/tunability: N N
fast to learn: N∗ H
(almost) non-parametric: N N
∗: Q: how much faster? when? does it matter?