CS 59000 Statistical Machine Learning, Lecture 18. Yuan (Alan) Qi, Purdue CS, Oct. 30, 2008.

Outline

• Review of Support Vector Machines for Linearly Separable Case

• Support Vector Machines for Overlapping Class Distributions

• Support Vector Machines for Regression

Support Vector Machines

Support Vector Machines: motivated by statistical learning theory.

Maximum margin classifiers

Margin: the smallest distance between the decision boundary and any of the samples
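As a small illustration of this definition, the sketch below computes the margin of a linear decision boundary over a toy data set. The function name `margin`, the identity feature map, and the sample points are assumptions made for the example; they do not come from the lecture.

```python
import numpy as np

def margin(w, b, X, t):
    """Smallest signed distance from the boundary w.x + b = 0 to any sample.

    X: (N, D) inputs, t: (N,) labels in {-1, +1}.
    A negative result means at least one sample is misclassified.
    """
    w = np.asarray(w, dtype=float)
    distances = t * (X @ w + b) / np.linalg.norm(w)
    return distances.min()

# Hypothetical toy data: two points per class, separable along the first axis.
X = np.array([[2.0, 0.0], [3.0, 1.0], [-1.0, 0.0], [-2.0, -1.0]])
t = np.array([1, 1, -1, -1])

m = margin([1.0, 0.0], -0.5, X, t)
# Rescaling w and b together leaves the margin unchanged.
m_scaled = margin([2.0, 0.0], -1.0, X, t)
```

The second call shows that the margin depends only on the boundary itself, not on the scale of `w` and `b`.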

Maximizing Margin

Since scaling w and b together will not change the above ratio, we set

In the case of data points for which the equality holds, the constraints are said to be active, whereas for the remainder they are said to be inactive.
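The equations on this slide are missing from the transcript. A reconstruction in the standard maximum-margin notation, assuming targets $t_n \in \{-1, +1\}$, a feature map $\phi$, and $y(\mathbf{x}) = \mathbf{w}^T\phi(\mathbf{x}) + b$ (these symbols are assumptions, not taken from the slide):

```latex
% Distance of a correctly classified point x_n to the decision boundary
\frac{t_n\, y(\mathbf{x}_n)}{\|\mathbf{w}\|}
  = \frac{t_n\,(\mathbf{w}^T \phi(\mathbf{x}_n) + b)}{\|\mathbf{w}\|}

% Canonical scaling: set t_n y(x_n) = 1 for the closest point, so that
t_n\,(\mathbf{w}^T \phi(\mathbf{x}_n) + b) \ge 1, \qquad n = 1, \dots, N
```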

Optimization Problem

Quadratic programming:

Subject to
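The objective and constraints did not survive in the transcript. In the standard hard-margin formulation, with the assumed notation $t_n \in \{-1, +1\}$ for targets and $\phi$ for the feature map, the quadratic program is usually written as:

```latex
\min_{\mathbf{w},\, b} \; \frac{1}{2}\|\mathbf{w}\|^2
\quad \text{subject to} \quad
t_n\,(\mathbf{w}^T \phi(\mathbf{x}_n) + b) \ge 1, \qquad n = 1, \dots, N
```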

Lagrange Multiplier

Maximize

Subject to

Gradient of constraint:

Geometrical Illustration of Lagrange Multiplier

Lagrange Multiplier with Inequality Constraints

Karush-Kuhn-Tucker (KKT) condition
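The conditions themselves are missing from the transcript. For the problem of maximizing $f(\mathbf{x})$ subject to $g(\mathbf{x}) \ge 0$, with Lagrangian $L(\mathbf{x}, \lambda) = f(\mathbf{x}) + \lambda g(\mathbf{x})$ (symbols assumed here for illustration), the standard KKT conditions read:

```latex
g(\mathbf{x}) \ge 0, \qquad
\lambda \ge 0, \qquad
\lambda\, g(\mathbf{x}) = 0
```

The last condition says that either the constraint is active ($g(\mathbf{x}) = 0$) or its multiplier vanishes ($\lambda = 0$).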

Lagrange Function for SVM

Quadratic programming:

Subject to

Lagrange function:
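The Lagrange function is missing from the transcript. A sketch of its usual form for the hard-margin problem, assuming one multiplier $a_n \ge 0$ per constraint $t_n(\mathbf{w}^T\phi(\mathbf{x}_n) + b) \ge 1$ (the symbol $a_n$ is an assumption):

```latex
L(\mathbf{w}, b, \mathbf{a})
  = \frac{1}{2}\|\mathbf{w}\|^2
  - \sum_{n=1}^{N} a_n \left\{ t_n\,(\mathbf{w}^T \phi(\mathbf{x}_n) + b) - 1 \right\},
\qquad a_n \ge 0
```

The sign in front of the sum is negative because we minimize with respect to $\mathbf{w}$ and $b$ and maximize with respect to $\mathbf{a}$.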

Dual Variables

Setting the derivatives of L with respect to w and b to zero:
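The resulting conditions are missing from the transcript. Assuming the hard-margin Lagrange function with multipliers $a_n$, targets $t_n$, and feature map $\phi$ (assumed notation), the two stationarity conditions are:

```latex
\frac{\partial L}{\partial \mathbf{w}} = 0
  \;\Rightarrow\; \mathbf{w} = \sum_{n=1}^{N} a_n t_n \phi(\mathbf{x}_n),
\qquad
\frac{\partial L}{\partial b} = 0
  \;\Rightarrow\; \sum_{n=1}^{N} a_n t_n = 0
```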

Dual Problem
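The dual problem is missing from the transcript. Eliminating $\mathbf{w}$ and $b$ from the hard-margin Lagrange function gives the following standard form, where the multipliers $a_n$ and the kernel $k(\mathbf{x}, \mathbf{x}') = \phi(\mathbf{x})^T\phi(\mathbf{x}')$ are assumed notation:

```latex
\max_{\mathbf{a}} \; \tilde{L}(\mathbf{a})
  = \sum_{n=1}^{N} a_n
  - \frac{1}{2} \sum_{n=1}^{N} \sum_{m=1}^{N}
      a_n a_m t_n t_m\, k(\mathbf{x}_n, \mathbf{x}_m)
\quad \text{subject to} \quad
a_n \ge 0, \qquad \sum_{n=1}^{N} a_n t_n = 0
```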

Prediction
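The prediction formula is missing from the transcript. In the dual representation a new point is usually classified by the sign of $y(\mathbf{x}) = \sum_n a_n t_n k(\mathbf{x}, \mathbf{x}_n) + b$. The sketch below evaluates this sum. The names `predict` and `linear_kernel`, and the two-point solution, are assumptions chosen for the example.

```python
import numpy as np

def linear_kernel(x, z):
    """k(x, z) = x . z (identity feature map, chosen only for the example)."""
    return float(np.dot(x, z))

def predict(x, a, t, X, b, kernel=linear_kernel):
    """Evaluate y(x) = sum_n a_n t_n k(x, x_n) + b; its sign gives the class."""
    # Only points with a_n > 0 (the support vectors) contribute to the sum.
    support = np.flatnonzero(a > 0)
    return sum(a[n] * t[n] * kernel(x, X[n]) for n in support) + b

# Hypothetical hard-margin solution for two training points at (1, 0) and (-1, 0):
# a_1 = a_2 = 0.5 and b = 0 satisfy sum_n a_n t_n = 0 and t_n y(x_n) = 1.
X = np.array([[1.0, 0.0], [-1.0, 0.0]])
t = np.array([1.0, -1.0])
a = np.array([0.5, 0.5])
b = 0.0

y_new = predict(np.array([3.0, 2.0]), a, t, X, b)
y_train = predict(X[0], a, t, X, b)
```

Because only support vectors enter the sum, the cost of a prediction scales with the number of support vectors rather than with the full training set.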

KKT Condition, Support Vectors, and Bias

By the KKT conditions, each data point satisfies either $a_n = 0$ or $t_n y(\mathbf{x}_n) = 1$. The data points in the latter case are known as support vectors. We can then solve for the bias term as follows:
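The formula for the bias is missing from the transcript. A standard derivation, assuming $S$ denotes the set of support vectors, $N_S$ its size, and $k$ the kernel (assumed notation): every support vector satisfies $t_n y(\mathbf{x}_n) = 1$, and multiplying by $t_n$ (with $t_n^2 = 1$) and averaging over $S$ for numerical stability gives:

```latex
b = \frac{1}{N_S} \sum_{n \in S}
    \left( t_n - \sum_{m \in S} a_m t_m\, k(\mathbf{x}_n, \mathbf{x}_m) \right)
```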

Computational Complexity

Quadratic programming:

When the feature dimension M is smaller than the number of data points N, solving the dual problem (N variables) is more costly than solving the primal problem (M variables).

Dual representation allows the use of kernels

Example: SVM Classification

Classification for Overlapping Classes

Soft Margin:

New Cost Function

To maximize the margin while softly penalizing points that lie on the wrong side of the margin boundary (not the decision boundary), we minimize
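The cost function is missing from the transcript. In the standard soft-margin formulation it is $C \sum_n \xi_n + \frac{1}{2}\|\mathbf{w}\|^2$, where the slack is $\xi_n = \max(0,\, 1 - t_n y(\mathbf{x}_n))$ at the optimum. The sketch below evaluates it for a fixed linear model. The function name, the identity feature map, and the toy data are assumptions for the example.

```python
import numpy as np

def soft_margin_objective(w, b, X, t, C):
    """Return (C * sum_n xi_n + 0.5 * ||w||^2, xi) for a linear model.

    xi_n = max(0, 1 - t_n * y(x_n)) is the slack of point n:
      xi_n = 0      on or outside the correct margin boundary
      0 < xi_n <= 1 inside the margin but on the correct side of the boundary
      xi_n > 1      misclassified
    """
    y = X @ w + b
    xi = np.maximum(0.0, 1.0 - t * y)
    return C * xi.sum() + 0.5 * (w @ w), xi

# Hypothetical overlapping data along the first axis.
X = np.array([[2.0, 0.0], [0.5, 0.0], [-2.0, 0.0], [1.5, 0.0]])
t = np.array([1.0, 1.0, -1.0, -1.0])

value, xi = soft_margin_objective(np.array([1.0, 0.0]), 0.0, X, t, C=1.0)
```

Here the second point lies inside the margin and the fourth is misclassified, so only those two carry non-zero slack. A larger `C` penalizes such points more heavily relative to the margin term.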

Lagrange Function

Where we have Lagrange multipliers:

KKT Condition

Gradients

Dual Lagrangian

Since $a_n = C - \mu_n$ and $\mu_n \ge 0$, we have $a_n \le C$.

Dual Lagrangian with Constraints

Maximize

Subject to
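The objective and constraints are missing from the transcript. In the standard soft-margin dual the objective has the same form as in the separable case, and only the constraints change into box constraints. The multipliers $a_n$, kernel $k$, and trade-off parameter $C$ are assumed notation:

```latex
\max_{\mathbf{a}} \; \tilde{L}(\mathbf{a})
  = \sum_{n=1}^{N} a_n
  - \frac{1}{2} \sum_{n=1}^{N} \sum_{m=1}^{N}
      a_n a_m t_n t_m\, k(\mathbf{x}_n, \mathbf{x}_m)
\quad \text{subject to} \quad
0 \le a_n \le C, \qquad \sum_{n=1}^{N} a_n t_n = 0
```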

Support Vectors

Discussions on two cases of support vectors.

Solve Bias Term

Discussion on solving SVMs...

Interpretation from Regularization Framework

Regularized Logistic Regression

For logistic regression, we have

Visualization of Hinge Error Function
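The figure is missing from the transcript. As a rough stand-in, the sketch below evaluates the two error functions that such a plot usually compares, both as functions of $z = t\,y(\mathbf{x})$: the hinge error $\max(0, 1 - z)$ and the logistic error $\ln(1 + e^{-z})$, rescaled by $1/\ln 2$ so that it passes through the point $(0, 1)$. The function names and the rescaling are assumptions made for the example.

```python
import numpy as np

def hinge_error(z):
    """Hinge error max(0, 1 - z), with z = t * y(x)."""
    return np.maximum(0.0, 1.0 - z)

def logistic_error(z):
    """Logistic error ln(1 + exp(-z)) / ln(2), computed stably via logaddexp."""
    return np.logaddexp(0.0, -z) / np.log(2.0)

z = np.array([-2.0, 0.0, 1.0, 3.0])
hinge = hinge_error(z)
logistic = logistic_error(z)
```

The hinge error is exactly zero for $z \ge 1$, which is what leads to sparse solutions, whereas the logistic error stays strictly positive for every finite $z$.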

SVM for Regression

Using sum of square errors, we have

However, the solution for ridge regression is not sparse.
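The regularized sum-of-squares error referred to above is missing from the transcript. Its usual form, assuming predictions $y_n = y(\mathbf{x}_n)$, targets $t_n$, and regularization coefficient $\lambda$ (assumed notation), is:

```latex
\frac{1}{2} \sum_{n=1}^{N} \left( y_n - t_n \right)^2
  + \frac{\lambda}{2} \|\mathbf{w}\|^2
```

Every data point with a non-zero residual contributes to the quadratic error, which is why the resulting solution is not sparse.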

ε-insensitive Error Function

Minimize
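The objective is missing from the transcript. In the standard formulation the quantity to minimize is $C \sum_n E_\epsilon(y(\mathbf{x}_n) - t_n) + \frac{1}{2}\|\mathbf{w}\|^2$, where $E_\epsilon(r)$ is zero for $|r| < \epsilon$ and $|r| - \epsilon$ otherwise. A minimal sketch of the error function follows. The function name and the sample values are assumptions for the example.

```python
import numpy as np

def eps_insensitive(y, t, eps):
    """E_eps(y - t): zero inside the tube |y - t| < eps, linear outside it."""
    return np.maximum(0.0, np.abs(y - t) - eps)

# Hypothetical predictions against a constant target of 1.0.
y = np.array([1.0, 1.05, 1.5, 0.2])
t = np.ones(4)
errors = eps_insensitive(y, t, eps=0.1)
```

The first two predictions fall inside the tube of half-width `eps` and incur no error, while the last two are penalized only by the amount they lie outside it.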

Slack Variables

How many slack variables do we need?

Minimize
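The objective and constraints are missing from the transcript. In the standard formulation each data point gets two slack variables, one for each side of the $\epsilon$-tube. The symbols $\xi_n$ (target above the tube) and $\hat{\xi}_n$ (target below the tube) are assumed notation:

```latex
\min \; C \sum_{n=1}^{N} \left( \xi_n + \hat{\xi}_n \right)
  + \frac{1}{2} \|\mathbf{w}\|^2
\quad \text{subject to} \quad
\begin{aligned}
  t_n &\le y(\mathbf{x}_n) + \epsilon + \xi_n, \\
  t_n &\ge y(\mathbf{x}_n) - \epsilon - \hat{\xi}_n, \\
  \xi_n &\ge 0, \quad \hat{\xi}_n \ge 0
\end{aligned}
```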

Visualization of SVM Regression

Support Vectors for Regression

Which points will be support vectors for regression?

Why?

Sparsity Revisited

Discussion: Error function or regularizer (Lasso)