Fuzzy Support Vector Machines (FSVMs)

Transcript of Fuzzy Support Vector Machines (FSVMs)
Fuzzy Support Vector Machines (FSVMs)
Weijia Wang, Huanren Zhang, Vijendra Purohit, Aditi Gupta
Outline
• Review of SVMs
• Formalization of FSVMs
• Training algorithm for FSVMs
• Noisy distribution model
• Determination of heuristic function
• Experiment results
SVM – brief review
• Classification technique
• Method:
• Maps points into high-dimensional feature space
• Finds a separating hyperplane that maximizes the margin
Set S of labeled training points:

$S = \{(x_1, y_1), \dots, (x_N, y_N)\}, \quad x_i \in \mathbb{R}^n$

Each point belongs to one of the two classes, $y_i \in \{-1, +1\}$.

Let $z = \varphi(x)$ be the feature-space vector, with mapping $\varphi$ from $\mathbb{R}^n$ to feature space $Z$.

Then the equation of the hyperplane: $w \cdot z + b = 0$

For linearly separable data, the optimization problem:

$\min_{w,b} \ \tfrac{1}{2}\|w\|^2$

subject to $y_i (w \cdot z_i + b) \ge 1, \quad i = 1, \dots, N$
For non-linearly separable data (soft margin), introduce slack variables $\xi_i \ge 0$.

Optimization problem:

$\min_{w,b,\xi} \ \tfrac{1}{2}\|w\|^2 + C \sum_{i=1}^{N} \xi_i$

subject to $y_i (w \cdot z_i + b) \ge 1 - \xi_i, \quad \xi_i \ge 0$

$\sum_i \xi_i$ -> a measure of the amount of misclassification

Limitation: all training points are treated equally.
FSVM – Fuzzy SVM

• Each training point no longer belongs exactly to one of the two classes.
• Some training points are more important than others: these meaningful data points must be classified correctly, even if some noisy, less important points are misclassified.

Fuzzy membership $s_i$ (with $\sigma \le s_i \le 1$):

• $s_i$: how much point $x_i$ belongs to one class (the amount of meaningful information in the data point)
• $1 - s_i$: the amount of noise in the data point
Set S of labeled training points:

$S = \{(x_1, y_1, s_1), \dots, (x_N, y_N, s_N)\}$

Optimization problem:

$\min_{w,b,\xi} \ \tfrac{1}{2}\|w\|^2 + C \sum_{i=1}^{N} s_i \xi_i$

subject to $y_i (w \cdot z_i + b) \ge 1 - \xi_i, \quad \xi_i \ge 0$

Large C -> narrower margin, fewer misclassifications.

C – regularization constant
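To make the role of the memberships concrete, here is a small sketch of the weighted primal objective (assuming NumPy and a linear kernel, so $z_i = x_i$; the function name and data are illustrative, not from the slides):

```python
import numpy as np

def fsvm_primal_objective(w, b, X, y, s, C):
    # (1/2)||w||^2 + C * sum_i s_i * xi_i, where the slack is the
    # hinge violation xi_i = max(0, 1 - y_i (w . x_i + b)).
    # A small membership s_i makes point i's violation cheap, so a
    # noisy point pulls the hyperplane less than in a standard SVM
    # (which corresponds to all s_i = 1).
    xi = np.maximum(0.0, 1.0 - y * (X @ w + b))
    return 0.5 * float(w @ w) + C * float(np.sum(s * xi))
```

Down-weighting a violated point lowers its contribution to the total cost, which is exactly why the optimizer can afford to leave a low-membership point misclassified.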
Lagrange function:

$L(w, b, \xi, \alpha, \beta) = \tfrac{1}{2}\|w\|^2 + C \sum_i s_i \xi_i - \sum_i \alpha_i \left[ y_i (w \cdot z_i + b) - 1 + \xi_i \right] - \sum_i \beta_i \xi_i$

Taking derivatives:

$\frac{\partial L}{\partial w} = w - \sum_i \alpha_i y_i z_i = 0$

$\frac{\partial L}{\partial b} = -\sum_i \alpha_i y_i = 0$

$\frac{\partial L}{\partial \xi_i} = s_i C - \alpha_i - \beta_i = 0$
Optimization problem (dual):

$\max_\alpha \ W(\alpha) = \sum_i \alpha_i - \tfrac{1}{2} \sum_i \sum_j \alpha_i \alpha_j y_i y_j K(x_i, x_j)$

subject to $\sum_i y_i \alpha_i = 0, \quad 0 \le \alpha_i \le s_i C$

Kuhn–Tucker conditions:

$\lambda \, g(x) = 0$

λ – Lagrange multiplier
g(x) – inequality constraint
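A toy projected-gradient sketch of this dual (NumPy, linear kernel; the solver, step size, and data are illustrative — real implementations use SMO-style QP solvers). Note that the only change from the standard SVM dual is the per-point upper bound $s_i C$ on each $\alpha_i$:

```python
import numpy as np

def fsvm_dual(X, y, s, C=10.0, lr=0.001, steps=20000):
    # Projected gradient ascent on the FSVM dual:
    #   max  sum(a) - 0.5 * sum_ij a_i a_j y_i y_j K(x_i, x_j)
    #   s.t. 0 <= a_i <= s_i * C   (fuzzy membership shrinks the box)
    # The equality constraint sum(a_i y_i) = 0 is handled by projecting
    # the gradient onto y's orthogonal complement (a crude approximation).
    n = len(y)
    K = X @ X.T                           # linear kernel
    Q = (y[:, None] * y[None, :]) * K
    a = np.zeros(n)
    for _ in range(steps):
        g = 1.0 - Q @ a                   # gradient of the dual objective
        g -= y * (g @ y) / (y @ y)        # keep sum(a_i y_i) ~ 0
        a = np.clip(a + lr * g, 0.0, s * C)
    w = (a * y) @ X
    sv = (a > 1e-6) & (a < s * C - 1e-6)  # margin support vectors
    b = float(np.mean(y[sv] - X[sv] @ w)) if sv.any() else 0.0
    return w, b, a
```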
Points with $\alpha_i > 0$ are support vectors (lie on red boundary).

Two types of support vectors:

• $0 < \alpha_i < s_i C$: lies on the margin of the hyperplane
• $\alpha_i = s_i C$: $\xi_i > 0$; misclassified if $\xi_i > 1$

=> Points with the same $\alpha_i$ could be different types of support vectors in FSVM due to $s_i$.

=> SVM – one free parameter (C)
FSVM – number of free parameters = C plus the $s_i$ (~ number of training points)
Training algorithm for FSVMs
Objective function for optimization:

$\min \ \tfrac{1}{2}\|w\|^2 + C \sum_i s_i \xi_i$

• Minimization of the error function
• Maximization of the margin
• The balance is controlled by tuning C
Selection of error function
• Least absolute value $\sum_i \xi_i$ in SVMs
• Least square value $\sum_i \xi_i^2$ in LS-SVMs (Suykens and Vandewalle, 1999):
  • the QP is transformed into solving a linear system
  • the support values are mostly nonzero
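As a concrete illustration of the LS-SVM point, here is a minimal NumPy sketch (linear kernel; the function name and regularization value are our own) that replaces the QP with a single linear solve:

```python
import numpy as np

def ls_svm_train(X, y, gamma=10.0):
    # LS-SVM (Suykens & Vandewalle, 1999): squared-error slacks turn
    # the QP into one linear system:
    #   [ 0    y^T          ] [ b ]   [ 0 ]
    #   [ y    Om + I/gamma ] [ a ] = [ 1 ]
    # with Om_ij = y_i y_j K(x_i, x_j).  Almost every a_i comes out
    # nonzero, so the sparseness of the support values is lost.
    n = len(y)
    Om = np.outer(y, y) * (X @ X.T)       # linear kernel
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = y
    A[1:, 0] = y
    A[1:, 1:] = Om + np.eye(n) / gamma
    rhs = np.concatenate(([0.0], np.ones(n)))
    sol = np.linalg.solve(A, rhs)
    return sol[1:], float(sol[0])          # support values a, bias b
```

The decision function is $\mathrm{sign}\big(\sum_i a_i y_i K(x, x_i) + b\big)$, just as in the SVM dual expansion.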
Selection of error function
Maximum likelihood method: when the underlying error probability can be estimated, the optimization problem becomes a maximum likelihood fit of the residuals.
Maximum likelihood error
Limitation (a circular dependency):

• the precision of the estimated hyperplane depends on the estimate of the error function
• the estimate of the error is reliable only when the underlying hyperplane is well estimated
Selection of error function
Weighted least absolute value: each data point is associated with a cost or importance factor.

When a noise distribution model of the data is given, $p_x(x)$ is the probability that point x is not noise, and the optimization becomes

$\min \ \tfrac{1}{2}\|w\|^2 + C \sum_i p_x(x_i) \, \xi_i$
Weighted least absolute value
Relation with FSVMs: take $p_x(x)$ as a fuzzy membership, i.e., $p_x(x_i) = s_i$.
Selection of max margin term
Generalized optimal plane (GOP)
Robust linear programming (RLP)
Implementation of NDM
Goal: build a probability distribution model for the data.

Ingredients:

• a heuristic function h(x): highly relevant to $p_x(x)$
• confident factor: $h_C$
• trashy factor: $h_T$
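One plausible way to combine these ingredients into a membership value — a sketch only: the parameters σ (lower bound) and d (mapping degree) appear later in the slides, but the truncated polynomial formula itself is our assumption:

```python
def fuzzy_membership(h, h_C, h_T, sigma=0.1, d=1.0):
    # Map a heuristic score h(x) to a fuzzy membership s in [sigma, 1].
    # h >= h_C (confident factor): full membership, s = 1.
    # h <= h_T (trashy factor):    minimum membership, s = sigma.
    # In between: polynomial interpolation with mapping degree d.
    # (Assumed form, consistent with but not quoted from the slides.)
    if h >= h_C:
        return 1.0
    if h <= h_T:
        return sigma
    return sigma + (1.0 - sigma) * ((h - h_T) / (h_C - h_T)) ** d
```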
Density function for data
Heuristic function
Kernel-target alignment
K-nearest neighbors
Basic idea: outliers have a higher probability of being noise.
Kernel-target alignment
Measures how likely the point $x_i$ is to be noise.
Kernel-target alignment: example
Gaussian kernel:

$K(x_i, x_j) = \exp\!\left( -\frac{\|x_i - x_j\|^2}{2\sigma^2} \right)$

It can be written as the cosine of the angle between the two vectors in the feature space, since $K(x, x) = 1$.
An outlier data point $x_i$ will have a smaller value of $f_K(x_i, y_i)$.

Use $f_K(x, y)$ as a heuristic function h(x).
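A small sketch of such a heuristic (pure Python; the $1/N$ normalization and the Gaussian width parameter are our assumptions about the exact form of $f_K$):

```python
import math

def kta_score(X, y, i, gamma=0.5):
    # f_K(x_i, y_i) ~ (1/N) * sum_j y_i * y_j * K(x_i, x_j), with a
    # Gaussian kernel.  Points deep inside their own class score near
    # the top of the range; outliers surrounded by the other class get
    # low (even negative) scores, flagging them as likely noise.
    n = len(y)
    total = 0.0
    for j in range(n):
        d2 = sum((a - b) ** 2 for a, b in zip(X[i], X[j]))
        total += y[i] * y[j] * math.exp(-gamma * d2)
    return total / n
```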
K-nearest neighbors (k-NN)

For each $x_i$, the set $S_i^k$ consists of the k nearest neighbors of $x_i$.

$n_i$ is the number of data points in the set $S_i^k$ whose class label is the same as the class label of data point $x_i$.

Heuristic function: $h(x_i) = n_i$
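The k-NN heuristic can be sketched in a few lines (NumPy; Euclidean distance and excluding the point itself are our choices):

```python
import numpy as np

def knn_membership_heuristic(X, y, k=3):
    # For each point x_i, h(x_i) = n_i: the number of its k nearest
    # neighbors (excluding x_i itself) that share x_i's class label.
    # Outliers surrounded by the other class get low scores and can
    # be treated as likely noise.
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)          # exclude self
    nn = np.argsort(d, axis=1)[:, :k]    # indices of k nearest neighbors
    return np.array([(y[nn[i]] == y[i]).sum() for i in range(len(y))])
```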
Comparison of the two heuristic functions

• Kernel-target alignment: operates in the feature space; uses the information of all data points to determine the heuristic for one point.
• k-NN: operates in the original space; uses the information of k data points to determine the heuristic for one point.

How about combining the two?!
Overall Procedure for FSVMs
1. Use SVM algorithm to get the optimal kernel parameters and the regularization parameter C
2. Fix the kernel parameters and the regularization parameter C; determine the heuristic function h(x), and use exhaustive search to choose the confident factor $h_C$, the trashy factor $h_T$, the mapping degree d, and the fuzzy membership lower bound σ.
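Step 2's exhaustive search might look like this sketch (pure Python; `train_eval`, the grids, and the heuristic-to-membership mapping are hypothetical placeholders for whatever trains an FSVM and returns a validation error):

```python
def tune_fuzzy_params(h_vals, train_eval, h_grid, sigma_grid, d_grid):
    # Exhaustive search over (h_C, h_T, sigma, d), keeping the setting
    # with the lowest error reported by the caller-supplied train_eval.
    def membership(h, h_C, h_T, sigma, d):
        # A simple truncated polynomial mapping from heuristic score
        # to membership (assumed form, not taken from the slides).
        if h >= h_C:
            return 1.0
        if h <= h_T:
            return sigma
        return sigma + (1.0 - sigma) * ((h - h_T) / (h_C - h_T)) ** d

    best_err, best_cfg = float("inf"), None
    for h_C in h_grid:
        for h_T in (t for t in h_grid if t < h_C):
            for sigma in sigma_grid:
                for d in d_grid:
                    s = [membership(h, h_C, h_T, sigma, d) for h in h_vals]
                    err = train_eval(s)
                    if err < best_err:
                        best_err, best_cfg = err, (h_C, h_T, sigma, d)
    return best_err, best_cfg
```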
Experiments
Data with time property
SVM results for data with time property
FSVM results for data with time property
Experiments
Two classes with different weighting
Results from SVM
Results from FSVM
Experiments
Using the class center to reduce the effect of outliers.
Results from SVM
Results from FSVM
Experiments (setting fuzzy membership)
Kernel-target alignment: two-step strategy

1. Fix $f_k^{UB} = \max_i f_k(x_i, y_i)$ and $f_k^{LB} = \min_i f_k(x_i, y_i)$; find σ and d using a two-dimensional search.
2. With σ and d fixed, find $f_k^{UB}$ and $f_k^{LB}$.
Experiments (setting fuzzy membership)
k-Nearest Neighbor
Perform a two-dimensional search for the parameters σ and k.

$k^{UB} = k/2$ and $d = 1$ are fixed.
Experiments
Comparison of results from KTA and k-NN with other classifiers (Test Errors)
Conclusion
FSVMs work well when the average training error is high, which means they can improve the performance of SVMs on noisy data.

The number of free parameters for FSVMs is very high: C, plus an $s_i$ for each data point.

Results using KTA and k-NN are similar, but KTA is more complicated and takes more time to find optimal parameter values.

This paper studies FSVMs only for two classes; multi-class scenarios are not explored.