Fuzzy Support Vector Machines (FSVMs)

Transcript of Fuzzy Support Vector Machines (FSVMs)
Fuzzy Support Vector Machines (FSVMs)
Weijia Wang, Huanren Zhang, Vijendra Purohit, Aditi Gupta
Outline
• Review of SVMs
• Formalization of FSVMs
• Training algorithm for FSVMs
• Noisy distribution model
• Determination of heuristic function
• Experiment results
SVM – brief review
• Classification technique
• Method:
• Maps points into high-dimensional feature space
• Finds a separating hyperplane that maximizes the margin
Set S of labeled training points:

$S = \{(x_1, y_1), \dots, (x_N, y_N)\}, \quad x_i \in \mathbb{R}^n$

Each point belongs to one of the two classes, $y_i \in \{-1, +1\}$.

Let $z = \varphi(x)$ be the feature-space vector, with mapping $\varphi$ from $\mathbb{R}^n$ to feature space $Z$.

Then the equation of the hyperplane: $w \cdot z + b = 0$

For linearly separable data, the optimization problem:

$\min_{w,b} \ \tfrac{1}{2}\|w\|^2$

subject to $y_i (w \cdot z_i + b) \ge 1, \quad i = 1, \dots, N$
For non-linearly separable data (soft margin), introduce slack variables $\xi_i \ge 0$.

Optimization problem:

$\min_{w,b,\xi} \ \tfrac{1}{2}\|w\|^2 + C \sum_{i=1}^{N} \xi_i$

subject to $y_i (w \cdot z_i + b) \ge 1 - \xi_i, \quad \xi_i \ge 0$

$\sum_i \xi_i$ -> a measure of the amount of misclassification

Limitation: all training points are treated equally.
FSVM – Fuzzy SVM

• Each training point no longer belongs exactly to one of the two classes.
• Some training points are more important than others: these meaningful data points must be classified correctly, even if some noisy, less important points are misclassified.

Fuzzy membership $s_i$ (with $\sigma \le s_i \le 1$):

• $s_i$: how much point $x_i$ belongs to one class (the amount of meaningful information in the data point)
• $1 - s_i$: the amount of noise in the data point
Set S of labeled training points:

$S = \{(x_1, y_1, s_1), \dots, (x_N, y_N, s_N)\}$

Optimization problem:

$\min_{w,b,\xi} \ \tfrac{1}{2}\|w\|^2 + C \sum_{i=1}^{N} s_i \xi_i$

subject to $y_i (w \cdot z_i + b) \ge 1 - \xi_i, \quad \xi_i \ge 0$

Large C -> narrower margin, fewer misclassifications.

C – regularization constant
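To make the role of the memberships concrete, here is a small sketch of the weighted primal objective (assuming NumPy and a linear kernel, so $z_i = x_i$; the function name and data are illustrative, not from the slides):

```python
import numpy as np

def fsvm_primal_objective(w, b, X, y, s, C):
    # (1/2)||w||^2 + C * sum_i s_i * xi_i, where the slack is the
    # hinge violation xi_i = max(0, 1 - y_i (w . x_i + b)).
    # A small membership s_i makes point i's violation cheap, so a
    # noisy point pulls the hyperplane less than in a standard SVM
    # (which corresponds to all s_i = 1).
    xi = np.maximum(0.0, 1.0 - y * (X @ w + b))
    return 0.5 * float(w @ w) + C * float(np.sum(s * xi))
```

Down-weighting a violated point lowers its contribution to the total cost, which is exactly why the optimizer can afford to leave a low-membership point misclassified.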
Lagrange function:

$L(w, b, \xi, \alpha, \beta) = \tfrac{1}{2}\|w\|^2 + C \sum_i s_i \xi_i - \sum_i \alpha_i \left[ y_i (w \cdot z_i + b) - 1 + \xi_i \right] - \sum_i \beta_i \xi_i$

Taking derivatives:

$\frac{\partial L}{\partial w} = w - \sum_i \alpha_i y_i z_i = 0$

$\frac{\partial L}{\partial b} = -\sum_i \alpha_i y_i = 0$

$\frac{\partial L}{\partial \xi_i} = s_i C - \alpha_i - \beta_i = 0$
Optimization problem (dual):

$\max_\alpha \ W(\alpha) = \sum_i \alpha_i - \tfrac{1}{2} \sum_i \sum_j \alpha_i \alpha_j y_i y_j K(x_i, x_j)$

subject to $\sum_i y_i \alpha_i = 0, \quad 0 \le \alpha_i \le s_i C$

Kuhn–Tucker conditions:

$\lambda \, g(x) = 0$

λ – Lagrange multiplier
g(x) – inequality constraint
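A toy projected-gradient sketch of this dual (NumPy, linear kernel; the solver, step size, and data are illustrative — real implementations use SMO-style QP solvers). Note that the only change from the standard SVM dual is the per-point upper bound $s_i C$ on each $\alpha_i$:

```python
import numpy as np

def fsvm_dual(X, y, s, C=10.0, lr=0.001, steps=20000):
    # Projected gradient ascent on the FSVM dual:
    #   max  sum(a) - 0.5 * sum_ij a_i a_j y_i y_j K(x_i, x_j)
    #   s.t. 0 <= a_i <= s_i * C   (fuzzy membership shrinks the box)
    # The equality constraint sum(a_i y_i) = 0 is handled by projecting
    # the gradient onto y's orthogonal complement (a crude approximation).
    n = len(y)
    K = X @ X.T                           # linear kernel
    Q = (y[:, None] * y[None, :]) * K
    a = np.zeros(n)
    for _ in range(steps):
        g = 1.0 - Q @ a                   # gradient of the dual objective
        g -= y * (g @ y) / (y @ y)        # keep sum(a_i y_i) ~ 0
        a = np.clip(a + lr * g, 0.0, s * C)
    w = (a * y) @ X
    sv = (a > 1e-6) & (a < s * C - 1e-6)  # margin support vectors
    b = float(np.mean(y[sv] - X[sv] @ w)) if sv.any() else 0.0
    return w, b, a
```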
Points with $\alpha_i > 0$ are support vectors (lie on red boundary).

Two types of support vectors:

• $0 < \alpha_i < s_i C$: lies on the margin of the hyperplane
• $\alpha_i = s_i C$: $\xi_i > 0$; misclassified if $\xi_i > 1$

=> Points with the same $\alpha_i$ could be different types of support vectors in FSVM due to $s_i$.

=> SVM – one free parameter (C)
FSVM – number of free parameters = C plus the $s_i$ (~ number of training points)
Training algorithm for FSVMs
Objective function for optimization:

$\min \ \tfrac{1}{2}\|w\|^2 + C \sum_i s_i \xi_i$

• Minimization of the error function
• Maximization of the margin
• The balance is controlled by tuning C
Selection of error function
• Least absolute value $\sum_i \xi_i$ in SVMs
• Least square value $\sum_i \xi_i^2$ in LS-SVMs (Suykens and Vandewalle, 1999):
  • the QP is transformed into solving a linear system
  • the support values are mostly nonzero
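As a concrete illustration of the LS-SVM point, here is a minimal NumPy sketch (linear kernel; the function name and regularization value are our own) that replaces the QP with a single linear solve:

```python
import numpy as np

def ls_svm_train(X, y, gamma=10.0):
    # LS-SVM (Suykens & Vandewalle, 1999): squared-error slacks turn
    # the QP into one linear system:
    #   [ 0    y^T          ] [ b ]   [ 0 ]
    #   [ y    Om + I/gamma ] [ a ] = [ 1 ]
    # with Om_ij = y_i y_j K(x_i, x_j).  Almost every a_i comes out
    # nonzero, so the sparseness of the support values is lost.
    n = len(y)
    Om = np.outer(y, y) * (X @ X.T)       # linear kernel
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = y
    A[1:, 0] = y
    A[1:, 1:] = Om + np.eye(n) / gamma
    rhs = np.concatenate(([0.0], np.ones(n)))
    sol = np.linalg.solve(A, rhs)
    return sol[1:], float(sol[0])          # support values a, bias b
```

The decision function is $\mathrm{sign}\big(\sum_i a_i y_i K(x, x_i) + b\big)$, just as in the SVM dual expansion.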
Selection of error function
Maximum likelihood method: when the underlying error probability can be estimated, the optimization problem becomes a maximum likelihood fit of the residuals.
Maximum likelihood error
Limitation (a circular dependency):

• the precision of the estimated hyperplane depends on the estimate of the error function
• the estimate of the error is reliable only when the underlying hyperplane is well estimated
Selection of error function
Weighted least absolute value: each data point is associated with a cost or importance factor.

When a noise distribution model of the data is given, $p_x(x)$ is the probability that point x is not noise, and the optimization becomes

$\min \ \tfrac{1}{2}\|w\|^2 + C \sum_i p_x(x_i) \, \xi_i$
Weighted least absolute value
Relation with FSVMs: take $p_x(x)$ as a fuzzy membership, i.e., $p_x(x_i) = s_i$.
Selection of max margin term
Generalized optimal plane (GOP)
Robust linear programming (RLP)
Implementation of NDM
Goal: build a probability distribution model for the data.

Ingredients:

• a heuristic function h(x): highly relevant to $p_x(x)$
• confident factor: $h_C$
• trashy factor: $h_T$
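One plausible way to combine these ingredients into a membership value — a sketch only: the parameters σ (lower bound) and d (mapping degree) appear later in the slides, but the truncated polynomial formula itself is our assumption:

```python
def fuzzy_membership(h, h_C, h_T, sigma=0.1, d=1.0):
    # Map a heuristic score h(x) to a fuzzy membership s in [sigma, 1].
    # h >= h_C (confident factor): full membership, s = 1.
    # h <= h_T (trashy factor):    minimum membership, s = sigma.
    # In between: polynomial interpolation with mapping degree d.
    # (Assumed form, consistent with but not quoted from the slides.)
    if h >= h_C:
        return 1.0
    if h <= h_T:
        return sigma
    return sigma + (1.0 - sigma) * ((h - h_T) / (h_C - h_T)) ** d
```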
Density function for data
Heuristic function
Kernel-target alignment
K-nearest neighbors
Basic idea: outliers have a higher probability of being noise.
Kernel-target alignment
Measures how likely the point $x_i$ is to be noise.
Kernel-target alignment: example
Gaussian kernel:

$K(x_i, x_j) = \exp\!\left( -\frac{\|x_i - x_j\|^2}{2\sigma^2} \right)$

It can be written as the cosine of the angle between the two vectors in the feature space, since $K(x, x) = 1$.
An outlier data point $x_i$ will have a smaller value of $f_K(x_i, y_i)$.

Use $f_K(x, y)$ as a heuristic function h(x).
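A small sketch of such a heuristic (pure Python; the $1/N$ normalization and the Gaussian width parameter are our assumptions about the exact form of $f_K$):

```python
import math

def kta_score(X, y, i, gamma=0.5):
    # f_K(x_i, y_i) ~ (1/N) * sum_j y_i * y_j * K(x_i, x_j), with a
    # Gaussian kernel.  Points deep inside their own class score near
    # the top of the range; outliers surrounded by the other class get
    # low (even negative) scores, flagging them as likely noise.
    n = len(y)
    total = 0.0
    for j in range(n):
        d2 = sum((a - b) ** 2 for a, b in zip(X[i], X[j]))
        total += y[i] * y[j] * math.exp(-gamma * d2)
    return total / n
```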
K-nearest neighbors (k-NN)

For each $x_i$, the set $S_i^k$ consists of the k nearest neighbors of $x_i$.

$n_i$ is the number of data points in the set $S_i^k$ whose class label is the same as the class label of data point $x_i$.

Heuristic function: $h(x_i) = n_i$
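The k-NN heuristic can be sketched in a few lines (NumPy; Euclidean distance and excluding the point itself are our choices):

```python
import numpy as np

def knn_membership_heuristic(X, y, k=3):
    # For each point x_i, h(x_i) = n_i: the number of its k nearest
    # neighbors (excluding x_i itself) that share x_i's class label.
    # Outliers surrounded by the other class get low scores and can
    # be treated as likely noise.
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)          # exclude self
    nn = np.argsort(d, axis=1)[:, :k]    # indices of k nearest neighbors
    return np.array([(y[nn[i]] == y[i]).sum() for i in range(len(y))])
```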
Comparison of the two heuristic functions

• Kernel-target alignment: operates in the feature space; uses the information of all data points to determine the heuristic for one point.
• k-NN: operates in the original space; uses the information of k data points to determine the heuristic for one point.

How about combining the two?!
Overall Procedure for FSVMs
1. Use SVM algorithm to get the optimal kernel parameters and the regularization parameter C
2. Fix the kernel parameters and the regularization parameter C; determine the heuristic function h(x), and use exhaustive search to choose the confident factor $h_C$, the trashy factor $h_T$, the mapping degree d, and the fuzzy membership lower bound σ.
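Step 2's exhaustive search might look like this sketch (pure Python; `train_eval`, the grids, and the heuristic-to-membership mapping are hypothetical placeholders for whatever trains an FSVM and returns a validation error):

```python
def tune_fuzzy_params(h_vals, train_eval, h_grid, sigma_grid, d_grid):
    # Exhaustive search over (h_C, h_T, sigma, d), keeping the setting
    # with the lowest error reported by the caller-supplied train_eval.
    def membership(h, h_C, h_T, sigma, d):
        # A simple truncated polynomial mapping from heuristic score
        # to membership (assumed form, not taken from the slides).
        if h >= h_C:
            return 1.0
        if h <= h_T:
            return sigma
        return sigma + (1.0 - sigma) * ((h - h_T) / (h_C - h_T)) ** d

    best_err, best_cfg = float("inf"), None
    for h_C in h_grid:
        for h_T in (t for t in h_grid if t < h_C):
            for sigma in sigma_grid:
                for d in d_grid:
                    s = [membership(h, h_C, h_T, sigma, d) for h in h_vals]
                    err = train_eval(s)
                    if err < best_err:
                        best_err, best_cfg = err, (h_C, h_T, sigma, d)
    return best_err, best_cfg
```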
Experiments
Data with time property
SVM results for data with time property
FSVM results for data with time property
Experiments
Two classes with different weighting
Results from SVM
Results from FSVM
Experiments
Using the class center to reduce the effect of outliers.
Results from SVM
Results from FSVM
Experiments (setting fuzzy membership)
Kernel-target alignment: two-step strategy

1. Fix $f_k^{UB} = \max_i f_k(x_i, y_i)$ and $f_k^{LB} = \min_i f_k(x_i, y_i)$; find σ and d using a two-dimensional search.
2. With σ and d fixed, find $f_k^{UB}$ and $f_k^{LB}$.
Experiments (setting fuzzy membership)
k-Nearest Neighbor
Perform a two-dimensional search for the parameters σ and k.

$k^{UB} = k/2$ and $d = 1$ are fixed.
Experiments
Comparison of results from KTA and k-NN with other classifiers (Test Errors)
Conclusion
FSVMs work well when the average training error is high, which means they can improve the performance of SVMs on noisy data.

The number of free parameters for FSVMs is very high: C, plus an $s_i$ for each data point.

Results using KTA and k-NN are similar, but KTA is more complicated and takes more time to find optimal parameter values.

This paper studies FSVMs only for two classes; multi-class scenarios are not explored.