Machine Learning and Data Mining: Support Vector Machines

Patricia Hoffman, PhD
Slides adapted from Tan, Steinbach, and Kumar, and from Ethem Alpaydin, Introduction to Machine Learning (The MIT Press, 2004).

Main Sources

  • Patricia Hoffman, Mathematics behind SVM: http://patriciahoffmanphd.com/resources/papers/SVM/SupportVectorJune3.pdf
  • Andrew Ng, lecture notes: http://cs229.stanford.edu/notes/cs229-notes3.pdf
  • John Platt, Fast Training of SVM: http://research.microsoft.com/apps/pubs/default.aspx?id=68391
  • Kernel methods: http://www.kernel-machines.org/
  • C. J. C. Burges. A tutorial on support vector machines for pattern recognition. Data Mining and Knowledge Discovery, 2(2):121-167, 1998. http://citeseer.nj.nec.com/burges98tutorial.html

Example Code: R Package e1071

  • Package manual: http://cran.r-project.org/web/packages/e1071/e1071.pdf
  • svm1.r: examples of SVM classification plots, tuning plots, and confusion tables
  • Classification using SVM on the Iris and Glass data sets
  • Regression using SVM on a toy data set

SVM Main Advantage

  • No distributional assumptions: understanding the data's distribution is not necessary, and no distribution parameters have to be estimated.

VC Dimension and Linear Separability

  • A line shatters three points: any three points in general position in the plane can be split into every possible +/- labeling by some line.

VC Dimension Example

Three points on the real line with these labels cannot be shattered by a linear threshold:

  observation   target
      -1          +
       0          -
      +1          +

Mapping each observation x to (x, x^2) lifts the data into a higher dimension, where a line does shatter them:

  observation   squared   target
      -1           1        +
       0           0        -
      +1           1        +

VC (Vapnik-Chervonenkis) Dimension

  • N points can be labeled in 2^N ways as +/-.
  • H shatters N points if there exists a set of N points such that, for every possible labeling, some h in H is consistent with it; this is denoted VC(H) = N.
  • The VC dimension measures the capacity of H: any learning problem definable by N examples can be learned with no error by a hypothesis drawn from H.

Formal Definition

  [Slide figure: formal definition of the VC dimension; not recoverable from the transcript.]

VC Dimension of Linear Classifiers in m Dimensions

  • If the input space is m-dimensional and f is sign(w . x - b), what is the VC dimension? It is m + 1.
  • Lines in 2D can shatter 3 points, planes in 3D can shatter 4 points, and so on.

Support Vector Machines (Tan, Steinbach, Kumar)

  • Find a linear hyperplane (decision boundary) that separates the data.
  • Many hyperplanes do: one possible solution B1, another possible solution B2, and many others besides.
  • Which one is better, B1 or B2? How do you define "better"?
  • Find the hyperplane that maximizes the margin: by that criterion B1 is better than B2.

The Maximal-Margin Hyperplane

The decision boundary is

  w \cdot x + b = 0

and the two margin hyperplanes through the closest training points are

  w \cdot x + b = 1   and   w \cdot x + b = -1.

The classifier is

  f(x) = \begin{cases} 1 & \text{if } w \cdot x + b \ge 1 \\ -1 & \text{if } w \cdot x + b \le -1 \end{cases}

and the distance between the two margin hyperplanes is

  \text{Margin} = \frac{2}{\|w\|}.

The Optimization Problem

We want to maximize the margin 2/\|w\|, which is equivalent to minimizing

  L(w) = \frac{\|w\|^2}{2}

subject to the constraints on f(x_i) above. This is a constrained optimization problem, solved numerically, e.g. by quadratic programming.

What If the Problem Is Not Linearly Separable?

Introduce slack variables \xi_i \ge 0 and minimize

  L(w) = \frac{\|w\|^2}{2} + C \left( \sum_{i=1}^{N} \xi_i \right)^k

subject to

  f(x_i) = \begin{cases} 1 & \text{if } w \cdot x_i + b \ge 1 - \xi_i \\ -1 & \text{if } w \cdot x_i + b \le -1 + \xi_i \end{cases}

where C is the cost of a constraint violation.
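
In e1071 (the R package referenced above), the constant C of this soft-margin objective is the cost argument of svm(). The following is a minimal sketch, not taken from the original slides: the two-class data are invented for illustration, and the snippet fits a linear soft-margin SVM and then recovers w, b, and the margin width 2/\|w\|.

    library(e1071)

    ## Invented two-class data for illustration (not from the slides)
    set.seed(1)
    x <- rbind(matrix(rnorm(40, mean = 0), ncol = 2),
               matrix(rnorm(40, mean = 3), ncol = 2))
    y <- factor(c(rep(-1, 20), rep(+1, 20)))

    ## Linear soft-margin SVM; 'cost' is the C in L(w) above
    fit <- svm(x, y, kernel = "linear", cost = 10, scale = FALSE)

    ## Recover w = sum_t alpha^t r^t x^t and b, then the margin 2/||w||
    w <- t(fit$coefs) %*% fit$SV    # fit$coefs already holds alpha^t * r^t
    b <- -fit$rho
    margin <- 2 / sqrt(sum(w^2))

A larger cost penalizes slack more heavily, shrinking the margin toward the hard-margin solution; a smaller cost tolerates more violations and widens the margin.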

Nonlinear Support Vector Machines (Tan, Steinbach, Kumar)

  • What if the decision boundary is not linear?
  • Transform the data into a higher-dimensional space where a linear boundary works.

Kernel Machines (Alpaydin)

Preprocess the input x with basis functions z = \varphi(x), so the linear discriminant g(z) = w^T z becomes g(x) = w^T \varphi(x). The SVM solution expands w over the training examples:

  w = \sum_t \alpha^t r^t z^t = \sum_t \alpha^t r^t \varphi(x^t)

  g(x) = w^T \varphi(x) = \sum_t \alpha^t r^t \varphi(x^t)^T \varphi(x) = \sum_t \alpha^t r^t K(x^t, x)

Kernel Functions

  • Polynomials of degree q:

      K(x^t, x) = (x^T x^t + 1)^q

    For example, with q = 2 and two-dimensional inputs,

      K(x, y) = (x^T y + 1)^2
              = 1 + 2 x_1 y_1 + 2 x_2 y_2 + 2 x_1 x_2 y_1 y_2 + x_1^2 y_1^2 + x_2^2 y_2^2

    which corresponds to the basis functions
    \varphi(x) = [1, \sqrt{2} x_1, \sqrt{2} x_2, \sqrt{2} x_1 x_2, x_1^2, x_2^2]^T.

  • Radial-basis functions (Cherkassky and Mulier, 1998):

      K(x^t, x) = \exp\left( -\frac{\|x - x^t\|^2}{2 s^2} \right)

  • Sigmoidal functions:

      K(x^t, x) = \tanh(2 x^T x^t + 1)
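
These three kernel families map onto the kernel, degree, gamma, and coef0 arguments of e1071's svm(): in its parameterization the polynomial kernel is (gamma x^T x' + coef0)^degree, the radial kernel is exp(-gamma ||x - x'||^2), and the sigmoid kernel is tanh(gamma x^T x' + coef0). A sketch on the Iris data set used in svm1.r; the hyperparameter values here are illustrative assumptions, not values from the slides.

    library(e1071)
    data(iris)

    ## Polynomial kernel of degree q = 2: (x^T x' + 1)^2
    poly_fit <- svm(Species ~ ., data = iris, kernel = "polynomial",
                    degree = 2, gamma = 1, coef0 = 1)

    ## Radial-basis kernel: exp(-gamma ||x - x'||^2), i.e. gamma = 1/(2 s^2)
    rbf_fit <- svm(Species ~ ., data = iris, kernel = "radial", gamma = 0.5)

    ## Sigmoidal kernel: tanh(2 x^T x' + 1)
    sig_fit <- svm(Species ~ ., data = iris, kernel = "sigmoid",
                   gamma = 2, coef0 = 1)

    ## Confusion table for one of the fits, as in svm1.r
    table(predicted = predict(rbf_fit, iris), actual = iris$Species)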

Sigmoid (Logistic) Function (Alpaydin)

Two equivalent decision rules for a linear discriminant:

  1. Calculate g(x) = w^T x + w_0 and choose C_1 if g(x) > 0; or
  2. Calculate y = \text{sigmoid}(w^T x + w_0) and choose C_1 if y > 0.5.

Kernel Demos

  • Applet demos of the linear, polynomial, and radial-basis kernels: http://www.eee.metu.edu.tr/~alatan/Courses/Demo/AppletSVM.html

Insight into Kernels

  • http://patriciahoffmanphd.com/resources/papers/SVM/SupportVectorJune3.pdf
  • Use cross validation to determine gamma, r, d, and C (the cost of a constraint violation for the soft margin).

Complexity: An Alternative to Cross Validation

  • Complexity is a measure of a set of classifiers, not of any specific (fixed) classifier.
  • Many possible measures: degrees of freedom, description length, the Vapnik-Chervonenkis (VC) dimension, etc.

Expected and Empirical Error

  [Slide figures from Alpaydin (2004), Introduction to Machine Learning, The MIT Press; plots not recoverable from the transcript.]

Example: svm1.r

  • svm1.r: examples of SVM classification plots, tuning plots, and confusion tables
  • Classification using SVM on the Iris and Glass data sets
  • Regression using SVM on a toy data set

svm1.r Iris Tuning Plot (Heat Map)

Best parameter combinations from tuning (error is the cross-validated misclassification rate; dispersion is its standard deviation across the folds):

  gamma   cost   error        dispersion
  0.5     7.0    0.04666667   0.04499657
  1.0     8.0    0.04666667   0.03220306

Tune SVM via Cross Validation
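
The transcript breaks off at an object named obj, which matches the tuning step that produces the heat map above. A hedged reconstruction of that step with e1071's tune.svm follows; the grid values are assumptions, chosen only to bracket the reported optima (gamma = 0.5, cost = 7.0).

    library(e1071)
    data(iris)

    ## Grid search over gamma and cost with cross validation
    ## (tune.svm's default is 10-fold)
    obj <- tune.svm(Species ~ ., data = iris,
                    gamma = 2^(-2:2), cost = seq(1, 10))

    summary(obj)   # best gamma/cost plus the error and dispersion table
    plot(obj)      # tuning heat map of error over the gamma-cost grid

    ## Reuse the best model for a confusion table
    best <- obj$best.model
    table(predicted = predict(best, iris), actual = iris$Species)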