REUMass Amherst 2015 Data Science Bootcampmarlin/teaching/REU/bootcamp-day2.pdf · 2015-06-02 ·...

14
REUMass Amherst 2015 Data Science Bootcamp Day 2: Machine Learning and Research Methods Prof. Ben Marlin Prof. David Jensen [email protected] [email protected]

Transcript of REUMass Amherst 2015 Data Science Bootcampmarlin/teaching/REU/bootcamp-day2.pdf · 2015-06-02 ·...

Page 1: REUMass Amherst 2015 Data Science Bootcampmarlin/teaching/REU/bootcamp-day2.pdf · 2015-06-02 · REUMass Amherst 2015 Data Science Bootcamp Day 2: Machine Learning and Research Methods

 

REUMass Amherst 2015 Data Science Bootcamp

Day 2: Machine Learning and

Research Methods  

Prof. Ben Marlin Prof. David Jensen [email protected] [email protected]

 

Page 2: REUMass Amherst 2015 Data Science Bootcampmarlin/teaching/REU/bootcamp-day2.pdf · 2015-06-02 · REUMass Amherst 2015 Data Science Bootcamp Day 2: Machine Learning and Research Methods

 

Plan  for  Day  2:    •  Introduc2on  to  Machine  Learning  •  Introduc2on  to  Research  Methods  

Page 3: REUMass Amherst 2015 Data Science Bootcampmarlin/teaching/REU/bootcamp-day2.pdf · 2015-06-02 · REUMass Amherst 2015 Data Science Bootcamp Day 2: Machine Learning and Research Methods

 

Introduc2on  to    Machine  Learning  

Page 4: REUMass Amherst 2015 Data Science Bootcampmarlin/teaching/REU/bootcamp-day2.pdf · 2015-06-02 · REUMass Amherst 2015 Data Science Bootcamp Day 2: Machine Learning and Research Methods

 

Mitchell  (1997):  “A  computer  program  is  said  to  learn  from  experience  E  with  respect  to  some  class  of  tasks  T  and  performance  measure  P,  if  its  performance  at  tasks  in  T,  as  measured  by  P,  improves  with  experience  E.”      SubsBtute  “data  D”  for  “experience  E.”      

What  is  Machine  Learning?  

Page 5: REUMass Amherst 2015 Data Science Bootcampmarlin/teaching/REU/bootcamp-day2.pdf · 2015-06-02 · REUMass Amherst 2015 Data Science Bootcamp Day 2: Machine Learning and Research Methods

 

ClassificaBon   Regression  

Clustering   Dimensionality    ReducBon  

Supe

rvised

 Unsup

ervised  

Learning  to  predict.  

Learning  to  organize  and  represent.  

Machine  Learning  Tasks  

Page 6: REUMass Amherst 2015 Data Science Bootcampmarlin/teaching/REU/bootcamp-day2.pdf · 2015-06-02 · REUMass Amherst 2015 Data Science Bootcamp Day 2: Machine Learning and Research Methods

 

Classifica2on  

Page 7: REUMass Amherst 2015 Data Science Bootcampmarlin/teaching/REU/bootcamp-day2.pdf · 2015-06-02 · REUMass Amherst 2015 Data Science Bootcamp Day 2: Machine Learning and Research Methods

 

Example:  Digits  

Page 8: REUMass Amherst 2015 Data Science Bootcampmarlin/teaching/REU/bootcamp-day2.pdf · 2015-06-02 · REUMass Amherst 2015 Data Science Bootcamp Day 2: Machine Learning and Research Methods

 

Example:  Natural  Images  

Page 9: REUMass Amherst 2015 Data Science Bootcampmarlin/teaching/REU/bootcamp-day2.pdf · 2015-06-02 · REUMass Amherst 2015 Data Science Bootcamp Day 2: Machine Learning and Research Methods

 

sepal  length  

sepal  width  

petal  length    

petal    width    

Type  

5.1   3.5   1.4   0.2   Setosa  

4.9   3.0   1.4   0.2   Setosa  

4.7   3.2   1.3   0.2   Setosa  

4.6   3.1   1.5   0.2   Setosa  

!" !" !" !" !"

Example:  Fisher’s  Iris  Data    

Page 10: REUMass Amherst 2015 Data Science Bootcampmarlin/teaching/REU/bootcamp-day2.pdf · 2015-06-02 · REUMass Amherst 2015 Data Science Bootcamp Day 2: Machine Learning and Research Methods

 

Classifier  Learning  

sepal  length  

sepal  width  

petal  length    

petal    width    

Type  

5.1   3.5   1.4   0.2   Setosa  

4.9   3.0   1.4   0.2   Setosa  

4.7   3.2   1.3   0.2   Setosa  

4.6   3.1   1.5   0.2   Setosa  

!" !" !" !" !"

x1   y1  x2   y2  

Page 11: REUMass Amherst 2015 Data Science Bootcampmarlin/teaching/REU/bootcamp-day2.pdf · 2015-06-02 · REUMass Amherst 2015 Data Science Bootcamp Day 2: Machine Learning and Research Methods

 

Classifier  Learning  

5.1                          3.5                          1.4                          0.2   Setosa  f( )=

Page 12: REUMass Amherst 2015 Data Science Bootcampmarlin/teaching/REU/bootcamp-day2.pdf · 2015-06-02 · REUMass Amherst 2015 Data Science Bootcamp Day 2: Machine Learning and Research Methods

 

Classifier  Performance  

Page 13: REUMass Amherst 2015 Data Science Bootcampmarlin/teaching/REU/bootcamp-day2.pdf · 2015-06-02 · REUMass Amherst 2015 Data Science Bootcamp Day 2: Machine Learning and Research Methods

 

K  Nearest  Neighbors  

•  Need  to  pick  number  of  neighbors  K.  

•  Need  to  define  a  distance  funcBon.  

Feature  1  

Feature  2  

Page 14: REUMass Amherst 2015 Data Science Bootcampmarlin/teaching/REU/bootcamp-day2.pdf · 2015-06-02 · REUMass Amherst 2015 Data Science Bootcamp Day 2: Machine Learning and Research Methods

 

Exercises:  KNN  and  Fisher’s  Iris  Data      

•  Q1:  What  pair  of  features  gives  the  lowest  training  error  when  using  K=5?  •  Q2:  What  is  the  training  error  rate  for  K=5  when  using  all  of  the  features?  •  Q3:  What  value  of  K  gives  the  lowest  error  on  the  training  data  when  using  all  of  the  features?