Machine Learning
Saarland University, SS 2007

Holger Bast, Marjan Celikik, Kevin Chang, Stefan Funke, Joachim Giesen

Max-Planck-Institut für Informatik, Saarbrücken, Germany

Lecture 1, Friday, April 19th, 2007 (basics and example applications)
Overview of this Lecture
Machine Learning Basics
– Classification
– Objects as feature vectors
– Regression
– Clustering
Example applications
– Surface reconstruction
– Preference learning
– Netflix challenge (how to earn $1,000,000)
– Text search
Classification
Given a set of points, each labeled + or –
– learn something from them …
– … in order to predict the label of new points
[Figure: points labeled + and – in the plane, plus a new point marked "?" whose label is to be predicted]
this is an instance of supervised learning
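The slides leave open which technique does the learning; as one minimal concrete instance (made-up points, not from the slide), a 1-nearest-neighbor classifier predicts the label of the closest training point:

```python
import numpy as np

def predict_1nn(X_train, y_train, x_new):
    # Label a new point with the label of its nearest training point.
    distances = np.linalg.norm(X_train - x_new, axis=1)
    return y_train[np.argmin(distances)]

X_train = np.array([[1.0, 2.0], [2.0, 1.5], [6.0, 5.0], [7.0, 6.0]])
y_train = np.array(["+", "+", "-", "-"])
print(predict_1nn(X_train, y_train, np.array([6.5, 5.5])))  # "-"
```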
Classification — Quality
Which classifier is better?
– the answer requires a model of where the data comes from
– and a measure of quality/accuracy
[Figure: the same labeled points with two candidate decision boundaries and the new point marked "?"]
Classification — Outliers and Overfitting
We have to find a balance between two extremes
– oversimplification (→ large classification error)
– overfitting (→ lack of regularity)
– again: requires a model of the data
[Figure: the labeled points with a single stray + deep inside the – region; a boundary that bends around this outlier overfits]
Classification — Point Transformation
If a classifier does not work for the original data
– try it on a transformation of the data
– typically: make points linearly separable by a suitable mapping to a higher-dimensional space
[Figure: 1-D data with – points near 0 and + points further out, not separable by a single threshold; after mapping x to (x, |x|), the two classes become linearly separable in the plane]
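A minimal numpy sketch of this idea on made-up 1-D data (the particular points and the threshold 1.25 are illustrative, not from the slide):

```python
import numpy as np

# Made-up 1-D data: "-" labels near 0, "+" labels far from 0.
# No single threshold on x separates the two classes.
x = np.array([-3.0, -2.5, -2.0, -0.5, 0.0, 0.5, 2.0, 2.5, 3.0])
y = np.array([1, 1, 1, -1, -1, -1, 1, 1, 1])  # +1 = "+", -1 = "-"

# Map x to (x, |x|): in the plane, the horizontal line |x| = 1.25
# now separates the classes linearly.
phi = np.column_stack([x, np.abs(x)])
print(np.all((phi[:, 1] > 1.25) == (y == 1)))  # True
```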
Classification — More Labels
[Figure: points from three classes, labeled +, –, and o]
Typically:
– first, a basic technique for binary classification
– then, an extension to more labels
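One standard such extension, not spelled out on the slide, is one-vs-rest: train one binary classifier per label and predict the label whose classifier is most confident. A sketch, where `fit_binary` is a hypothetical stand-in for any binary learning routine:

```python
import numpy as np

def one_vs_rest_fit(X, y, fit_binary):
    # Train one "this label vs. everything else" classifier per distinct label.
    # fit_binary(X, b) is a hypothetical binary learner that returns a scoring
    # function x -> real (higher = more confident the label applies).
    return {label: fit_binary(X, (y == label).astype(int))
            for label in np.unique(y)}

def one_vs_rest_predict(classifiers, x):
    # Predict the label whose binary classifier scores x highest.
    return max(classifiers, key=lambda label: classifiers[label](x))
```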
Objects as Feature Vectors
But why learn something about points?
General Idea:
– represent objects as points in a space of fixed dimension
– each dimension corresponds to a so-called feature of the object
Crucial:
– selection of features
– normalization of vectors
Objects as Feature Vectors
Example: Objects with attributes
– features = attribute values
– normalize by a reference value for each feature

|        | Person 1 | Person 2 | Person 3 | Person 4 |
|--------|----------|----------|----------|----------|
| height | 188 cm   | 181 cm   | 190 cm   | 176 cm   |
| weight | 75 kg    | 90 kg    | 77 kg    | 55 kg    |
| age    | 36       | 32       | 34       | 24       |

Feature vectors (height, weight, age):
(188, 75, 36), (181, 90, 32), (190, 77, 34), (176, 55, 24)

Normalized by reference values (height/180, weight/80, age/40):
(1.04, 0.94, 0.90), (1.01, 1.13, 0.80), (1.06, 0.96, 0.85), (0.98, 0.69, 0.60)
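A minimal numpy sketch of the normalization above (reference values 180, 80, and 40, as in the normalized row):

```python
import numpy as np

# Feature vectors for the four persons: (height in cm, weight in kg, age in years)
X = np.array([[188.0, 75.0, 36.0],
              [181.0, 90.0, 32.0],
              [190.0, 77.0, 34.0],
              [176.0, 55.0, 24.0]])

# Divide each feature by its reference value so that all
# dimensions live on a comparable scale.
reference = np.array([180.0, 80.0, 40.0])
print((X / reference).round(2))  # first row: [1.04 0.94 0.9 ]
```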
Objects as Feature Vectors
Example: Images
– features = pixels (with grey values)
– often fine without further normalization

Image 1:        Image 2:
2 8 2           1 6 1
8 5 8           6 6 6
2 7 2           1 6 1

Reading the pixels (1,1), (1,2), (1,3), (2,1), …, (3,3) row by row gives the feature vectors:
Image 1 → (2, 8, 2, 8, 5, 8, 2, 7, 2)
Image 2 → (1, 6, 1, 6, 6, 6, 1, 6, 1)
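A minimal sketch of this flattening, using the grey values of Image 1:

```python
import numpy as np

# A 3x3 grey-value image becomes a 9-dimensional feature vector
# by reading its pixels row by row: (1,1), (1,2), ..., (3,3).
image1 = np.array([[2, 8, 2],
                   [8, 5, 8],
                   [2, 7, 2]])
print(image1.flatten())  # [2 8 2 8 5 8 2 7 2]
```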
Objects as Feature Vectors
Example: Text documents
– features = words
– normalize to unit norm

Doc. 1: Machine Learning SS 2007
Doc. 2: Statistical Learning Theory SS 2007
Doc. 3: Statistical Learning Theory SS 2006

With the features ordered as (Learning, Machine, SS, Statistical, Theory, 2006, 2007):

Doc. 1 → (1, 1, 1, 0, 0, 0, 1)
Doc. 2 → (1, 0, 1, 1, 1, 0, 1)
Doc. 3 → (1, 0, 1, 1, 1, 1, 0)
Objects as Feature Vectors
Example: Text documents (continued)
– the same vectors, now normalized to unit (Euclidean) norm

Doc. 1 → (0.5, 0.5, 0.5, 0, 0, 0, 0.5)   (4 words, each nonzero entry 1/√4 = 0.5)
Doc. 2 → (0.4, 0, 0.4, 0.4, 0.4, 0, 0.4)   (5 words, each nonzero entry 1/√5 ≈ 0.4)
Doc. 3 → (0.4, 0, 0.4, 0.4, 0.4, 0.4, 0)
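A minimal numpy sketch of both steps, 0/1 word vectors over the fixed word list followed by normalization to unit norm:

```python
import numpy as np

vocabulary = ["Learning", "Machine", "SS", "Statistical", "Theory", "2006", "2007"]
docs = ["Machine Learning SS 2007",
        "Statistical Learning Theory SS 2007",
        "Statistical Learning Theory SS 2006"]

# 0/1 vector per document: which vocabulary words occur in it
X = np.array([[1.0 if word in doc.split() else 0.0 for word in vocabulary]
              for doc in docs])

# Divide each row by its Euclidean length so every document has unit norm
X_unit = X / np.linalg.norm(X, axis=1, keepdims=True)
print(X_unit.round(2))  # first row: [0.5 0.5 0.5 0.  0.  0.  0.5]
```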
Regression
Learn a function that maps objects to values
Similar trade-off as for classification:
– risk of oversimplification vs. risk of overfitting
[Figure: points (x) in the plane with a fitted curve and a "?" marking the prediction for a new input; horizontal axis: given value (typically multi-dimensional), vertical axis: value to learn (typically a real number)]
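A minimal numpy sketch of the trade-off on made-up, roughly linear data: a straight-line fit versus a degree-5 polynomial that interpolates all six points:

```python
import numpy as np

# Made-up, roughly linear data with a little noise
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([0.9, 2.1, 2.9, 4.2, 4.8, 6.1])

# Degree 1 may oversimplify; degree 5 interpolates all six points
# exactly (zero training error) but can oscillate between them.
line = np.polyfit(x, y, deg=1)
wiggly = np.polyfit(x, y, deg=5)
print(np.polyval(line, 6.0))    # sensible extrapolation near 7
print(np.polyval(wiggly, 6.0))  # can be far off the trend
```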
Clustering
Partition a given set of points into clusters
Similar problems as for classification:
– follow data distribution, but not too closely
– transformation often helps (next slide)
[Figure: unlabeled points (x) forming two natural groups]
this is an instance of unsupervised learning
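The slides do not commit to a particular clustering algorithm; as one concrete illustration, here is a bare-bones k-means sketch with k = 2 on made-up 2-D points:

```python
import numpy as np

def two_means(X, iters=20):
    # Bare-bones k-means with k = 2: alternate between assigning each
    # point to its nearest center and moving centers to the cluster mean.
    centers = X[[0, len(X) - 1]].copy()  # crude but deterministic init
    for _ in range(iters):
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = np.argmin(dists, axis=1)
        centers = np.array([X[labels == j].mean(axis=0) for j in range(2)])
    return labels, centers

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.5, (10, 2)),   # one blob around (0, 0)
               rng.normal(5.0, 0.5, (10, 2))])  # one blob around (5, 5)
labels, _ = two_means(X)
print(labels)  # first 10 points in one cluster, last 10 in the other
```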
Clustering — Transformation
For clustering, dimension reduction typically helps
– whereas in classification, embedding in a higher-dimensional space typically helps
Term-document matrix (columns: doc1 … doc5):

internet   1 0 1 0 0
web        1 1 0 0 0
surfing    1 1 1 1 0
beach      0 0 0 1 1

– the vectors for documents 2, 3, and 4 are pairwise equally dissimilar

After projecting to 2 dimensions:

 0.9  0.8  0.8  0.0  0.0
-0.1  0.0  0.0  1.1  0.9

– a 2-clustering would work fine now
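One standard way to compute such a projection is a rank-2 truncated SVD, as used in latent semantic indexing; a sketch (the slide does not name the method, so the exact coordinates need not match the slide's):

```python
import numpy as np

# Term-document matrix from above (rows: internet, web, surfing, beach)
A = np.array([[1, 0, 1, 0, 0],
              [1, 1, 0, 0, 0],
              [1, 1, 1, 1, 0],
              [0, 0, 0, 1, 1]], dtype=float)

# Keep only the top 2 singular directions and represent each document
# by its 2 coordinates in that basis (rank-2 truncated SVD).
U, s, Vt = np.linalg.svd(A, full_matrices=False)
docs_2d = (np.diag(s[:2]) @ Vt[:2, :]).T  # one 2-d row per document
print(docs_2d.round(2))  # documents 1-3 and 4-5 form two separate groups
```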
Application Example: Text Search
676 abstracts from the Max-Planck-Institute
– for example:
We present two theoretically interesting and empirically successful techniques for improving the linear programming approaches, namely graph transformation and local cuts, in the context of the Steiner problem. We show the impact of these techniques on the solution of the largest benchmark instances ever solved.
– 3283 words (stop words like and, or, this, … removed)
– abstracts come from 5 working groups: Algorithms, Logic, Graphics, CompBio, Databases
– reduce to 10 concepts
No dictionary, no training, only the plain text itself!