Support Vector Machines for Classification of Flow Data

Funded by SBIR Grant # R43 RR024094-01A1
FlowCap 2010
John Quinn Ph.D.

Treestar
john@treestar.com

Our Objective
• Demonstrate that supervised training algorithms can effectively replicate user-created gates
– Very useful in high-throughput settings
– Can increase robustness
• We believe this will be the first application in which algorithmic gate placement becomes the norm.

Selected Algorithm
• Support Vector Machine (SVM)
– Radial kernel
• Supervised linear classifier that solves an optimization problem to find the hyperplane(s) separating the classes with the maximum margin between them
– With a non-linear mapping, data that are not linearly separable can also be classified

SVM Operation
Optimization:
• Determine which elements of the training data mark the boundary of maximum distance between two classes; these elements are the support vectors
[Figure: Class 1 and Class 2, their support vectors, and the maximum separation D between them]

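SVM Operation
• Optimization problem
For training data
$$\mathcal{D} = \{(\mathbf{x}_i, c_i) \mid \mathbf{x}_i \in \mathbb{R}^p,\ c_i \in \{-1, 1\}\},\quad i = 1, \dots, n$$
a hyperplane that separates any two classes can be defined as
$$\mathbf{w}\cdot\mathbf{x} - b = 0$$
and the points of each class must satisfy
For $c_i = 1$: $\mathbf{w}\cdot\mathbf{x}_i - b \ge 1$
For $c_i = -1$: $\mathbf{w}\cdot\mathbf{x}_i - b \le -1$
Knowing that the data points should lie outside the margin, we can combine these into the single constraint
$$c_i(\mathbf{w}\cdot\mathbf{x}_i - b) \ge 1, \quad i = 1, \dots, n$$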
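A minimal Python sketch of the combined constraint $c_i(\mathbf{w}\cdot\mathbf{x}_i - b) \ge 1$: it reports which training points a candidate hyperplane fails to keep outside the margin. The toy points and the candidate values of `w` and `b` are hypothetical, picked by hand, and the helper names are illustrative; they are not taken from the Matlab SVM used in this work.

```python
from typing import List, Sequence


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    """Inner product w . x."""
    return sum(a * b for a, b in zip(u, v))


def violated_constraints(X: Sequence[Sequence[float]], c: Sequence[int],
                         w: Sequence[float], b: float) -> List[int]:
    """Indices i for which c_i * (w . x_i - b) >= 1 fails, i.e. points that
    lie inside the margin or on the wrong side of the hyperplane."""
    return [i for i, (x, ci) in enumerate(zip(X, c)) if ci * (dot(w, x) - b) < 1]


# Hypothetical toy data: two points per class in 2-D.
X = [[2.0, 0.0], [3.0, 1.0], [-2.0, 0.0], [-3.0, -1.0]]
c = [1, 1, -1, -1]

print(violated_constraints(X, c, w=[0.5, 0.0], b=0.0))   # []
print(violated_constraints(X, c, w=[0.25, 0.0], b=0.0))  # [0, 1, 2, 3]
```

Shrinking `w` from 0.5 to 0.25 widens the margin, but then every point falls inside it, so the constraint is what stops the margin from growing without bound.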

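SVM Operation
Because the support vectors satisfy $\mathbf{w}\cdot\mathbf{x}_i - b = \pm 1$, we know that they will have a perpendicular distance from the hyperplane of
$$\frac{1}{\lVert\mathbf{w}\rVert}$$
The distance between the support vectors on either side can then be expressed as
$$\frac{2}{\lVert\mathbf{w}\rVert}$$
So maximizing the separation is the minimization of $\lVert\mathbf{w}\rVert$, usually written in the equivalent form
$$\min_{\mathbf{w},\, b}\ \tfrac{1}{2}\lVert\mathbf{w}\rVert^2$$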
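To make the margin arithmetic concrete, the sketch below (hypothetical values throughout) evaluates the distance $|\mathbf{w}\cdot\mathbf{x} - b| / \lVert\mathbf{w}\rVert$ for a point lying on the margin, the margin width $2/\lVert\mathbf{w}\rVert$, and the objective $\tfrac{1}{2}\lVert\mathbf{w}\rVert^2$.

```python
import math
from typing import Sequence


def norm(w: Sequence[float]) -> float:
    """Euclidean length ||w||."""
    return math.sqrt(sum(wi * wi for wi in w))


def distance_to_hyperplane(x: Sequence[float], w: Sequence[float], b: float) -> float:
    """Perpendicular distance from x to the hyperplane w . x - b = 0."""
    return abs(sum(wi * xi for wi, xi in zip(w, x)) - b) / norm(w)


def margin_width(w: Sequence[float]) -> float:
    """Gap between the margin hyperplanes w . x - b = +1 and w . x - b = -1."""
    return 2.0 / norm(w)


def objective(w: Sequence[float]) -> float:
    """(1/2) ||w||^2: minimizing it maximizes the margin."""
    return 0.5 * sum(wi * wi for wi in w)


# Hypothetical hyperplane and a point exactly on the +1 margin (w . x - b = 1).
w, b = [3.0, 4.0], -1.0
sv = [0.0, 0.0]
print(distance_to_hyperplane(sv, w, b))  # 0.2, i.e. 1 / ||w||
print(margin_width(w))                   # 0.4, i.e. 2 / ||w||
print(objective(w))                      # 12.5
```

Here $\lVert\mathbf{w}\rVert = 5$; any $\mathbf{w}$ with half that length that still satisfied the constraints would double the margin.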

SVM Operation
We then use the inequality
$$c_i(\mathbf{w}\cdot\mathbf{x}_i - b) \ge 1$$
as a constraint to fix a critical point, and use Lagrange multipliers $\alpha_i \ge 0$ to express $\mathbf{w}$ as a linear combination of the training vectors:
$$\mathbf{w} = \sum_{i=1}^{n} \alpha_i c_i \mathbf{x}_i$$
The support vectors, $N_{SV}$ of them, are then the $\mathbf{x}_i$ associated with positive Lagrange multipliers ($\alpha_i > 0$); every other training point has $\alpha_i = 0$.
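The expansion $\mathbf{w} = \sum_i \alpha_i c_i \mathbf{x}_i$ and the support-vector rule can be sketched as follows. This assumes the multipliers are already available from some solver; here they were worked out by hand for a hypothetical three-point problem, whose hard-margin solution is $\alpha = (0.5, 0.5, 0)$ (it also satisfies $\sum_i \alpha_i c_i = 0$). The tolerance `tol` is an assumption, added because numerical solvers rarely return exact zeros.

```python
from typing import List, Sequence


def weight_vector(alphas: Sequence[float], c: Sequence[int],
                  X: Sequence[Sequence[float]]) -> List[float]:
    """w = sum_i alpha_i * c_i * x_i."""
    w = [0.0] * len(X[0])
    for a, ci, x in zip(alphas, c, X):
        for j, xj in enumerate(x):
            w[j] += a * ci * xj
    return w


def support_vector_indices(alphas: Sequence[float], tol: float = 1e-8) -> List[int]:
    """Support vectors: training points whose multiplier is positive
    (a small tolerance absorbs solver round-off)."""
    return [i for i, a in enumerate(alphas) if a > tol]


# Hypothetical 2-D toy problem; the multipliers were derived by hand for it.
X = [[1.0, 0.0], [-1.0, 0.0], [3.0, 0.0]]
c = [1, -1, 1]
alphas = [0.5, 0.5, 0.0]

print(weight_vector(alphas, c, X))     # [1.0, 0.0]
print(support_vector_indices(alphas))  # [0, 1]
```

The third point lies well outside the margin, gets $\alpha = 0$, and contributes nothing to $\mathbf{w}$.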

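SVM Operation
Once $\mathbf{w}$ is known and the support vectors have been identified, $b$ can be solved from any support vector as $b = \mathbf{w}\cdot\mathbf{x}_i - c_i$, or averaged over all of them:
$$b = \frac{1}{N_{SV}} \sum_{i \in SV} \left(\mathbf{w}\cdot\mathbf{x}_i - c_i\right)$$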
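A sketch of the bias step under the same assumptions: hypothetical toy data, with `w` and the support-vector indices taken as given. It also shows how the resulting $(\mathbf{w}, b)$ classifies a new point by the sign of $\mathbf{w}\cdot\mathbf{x} - b$; assigning ties to $+1$ is an arbitrary choice made here.

```python
from typing import Sequence


def bias(w: Sequence[float], X: Sequence[Sequence[float]], c: Sequence[int],
         sv_indices: Sequence[int]) -> float:
    """b averaged over the support vectors.

    For a support vector c_i * (w . x_i - b) = 1, and since c_i is +1 or -1
    this gives b = w . x_i - c_i. Averaging smooths round-off from the solver."""
    vals = [sum(wj * xj for wj, xj in zip(w, X[i])) - c[i] for i in sv_indices]
    return sum(vals) / len(vals)


def classify(x: Sequence[float], w: Sequence[float], b: float) -> int:
    """Sign of w . x - b (points exactly on the hyperplane are assigned +1)."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) - b >= 0 else -1


# Hypothetical toy problem: w and the support vectors (indices 0 and 1) are known.
X = [[2.0, 0.0], [0.0, 0.0], [4.0, 0.0]]
c = [1, -1, 1]
w = [1.0, 0.0]
b = bias(w, X, c, sv_indices=[0, 1])
print(b)                            # 1.0
print(classify([0.5, 3.0], w, b))   # -1
print(classify([1.5, -2.0], w, b))  # 1
```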

If there are more than two classes, the operation remains the same, but the hyperplanes are determined either as one versus all or pairwise
• We chose a one-versus-all format
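One common way to combine one-versus-all classifiers, sketched below, is to train one binary SVM per class (that class as $+1$, every other class as $-1$) and assign a new event to the class with the largest decision value. The slides do not say how the one-versus-all outputs were combined, so the argmax rule is an assumption, as are the class names and the hand-picked, untrained linear weights. With the radial kernel, `decision_value` would be replaced by the kernelized sum.

```python
from typing import Dict, List, Sequence, Tuple

Model = Tuple[List[float], float]  # (w, b) of one binary SVM


def one_vs_all_labels(labels: Sequence[str], positive: str) -> List[int]:
    """Binary targets for one class's classifier: +1 for that class, -1 for all others."""
    return [1 if lab == positive else -1 for lab in labels]


def decision_value(x: Sequence[float], model: Model) -> float:
    """w . x - b for one binary classifier."""
    w, b = model
    return sum(wj * xj for wj, xj in zip(w, x)) - b


def predict_one_vs_all(x: Sequence[float], models: Dict[str, Model]) -> str:
    """Assign x to the class whose binary classifier gives the largest w . x - b."""
    return max(models, key=lambda name: decision_value(x, models[name]))


# Hypothetical class names and hand-picked (untrained) weights.
models: Dict[str, Model] = {
    "A": ([1.0, 0.0], 0.0),
    "B": ([-1.0, 0.0], 0.0),
    "C": ([0.0, 1.0], 0.5),
}
print(one_vs_all_labels(["A", "B", "A", "C"], "A"))  # [1, -1, 1, -1]
print(predict_one_vs_all([2.0, 0.0], models))        # A
print(predict_one_vs_all([0.0, 3.0], models))        # C
```

One-versus-all needs one classifier per class, while pairwise needs one per pair of classes, so one-versus-all trains fewer models when there are many populations.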

SVM Operation
• Data not linearly separable? Map it to a space where it is!
– We assume that flow data will have a Gaussian distribution, and selected a Gaussian mapping
[Figure: Input Space and Mapped Space]
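A sketch of the Gaussian (radial) kernel and the kernelized decision value $\sum_i \alpha_i c_i K(\mathbf{x}_i, \mathbf{x}) - b$. The width `sigma` is an assumed parameter (the slides do not give one), and the multipliers are hand-picked for illustration rather than solved from the SVM dual. The 1-D toy data show the point of the mapping: no single threshold on $x$ separates the middle class from the outer one, but the kernel decision does.

```python
import math
from typing import Sequence


def gaussian_kernel(x: Sequence[float], y: Sequence[float], sigma: float = 1.0) -> float:
    """K(x, y) = exp(-||x - y||^2 / (2 sigma^2)).

    Equals the inner product of x and y after an implicit mapping into a
    higher-dimensional space, so the SVM never computes the mapping itself."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def kernel_decision(x: Sequence[float], X: Sequence[Sequence[float]], c: Sequence[int],
                    alphas: Sequence[float], b: float, sigma: float = 1.0) -> float:
    """Kernelized decision value: sum_i alpha_i c_i K(x_i, x) - b."""
    return sum(a * ci * gaussian_kernel(xi, x, sigma)
               for a, ci, xi in zip(alphas, c, X)) - b


# 1-D toy data that no single threshold can separate: class -1 sits between two +1 points.
X = [[-2.0], [0.0], [2.0]]
c = [1, -1, 1]
alphas = [1.0, 1.0, 1.0]  # hand-picked for illustration, not solved from the SVM dual
for x in ([-2.0], [0.0], [2.0]):
    print(x, round(kernel_decision(x, X, c, alphas, b=0.0), 3))
```

With these values the decision is about 0.865 at $x = \pm 2$ and about $-0.729$ at $x = 0$, so all three points land on the correct side.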

Why use an SVM?
• SVMs are deterministic
• They find the global maximum, not a local one (the optimization problem is convex)
– If the training data are representative of the real data, you cannot do better.
• SVMs are fast
– They solve a single maximization problem, as opposed to doing an iterative fitting

Preprocessing
• To prepare the training data, we:
– Normalized the data to a range of -1 to 1
– Identified the training data set with the largest number of clusters
• Used this data set as the reference set
– Calculated the centroid of each cluster in the reference set
– In all other training data, calculated the Euclidean distance from each cluster to the clusters in the reference set, and assigned each cluster the ID of the reference cluster with the smallest distance
– Took a sample of each training data set and combined them into one training vector to present to the SVM
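The preprocessing steps can be sketched in plain Python as below. Assumptions to note: each parameter is scaled with its own minimum and maximum (the slide does not say whether scaling was per file or across files), cluster labels are integers, and matching compares cluster centroids by Euclidean distance. The sampling and combining step is omitted, and all function names are hypothetical.

```python
import math
from typing import Dict, List, Sequence

Point = List[float]


def normalize_columns(data: Sequence[Sequence[float]]) -> List[Point]:
    """Min-max scale every parameter (column) to the range [-1, 1]."""
    n_cols = len(data[0])
    lows = [min(row[j] for row in data) for j in range(n_cols)]
    highs = [max(row[j] for row in data) for j in range(n_cols)]
    scaled = []
    for row in data:
        new_row = []
        for j, v in enumerate(row):
            span = highs[j] - lows[j]
            # A constant column carries no information; map it to 0.
            new_row.append(0.0 if span == 0 else 2.0 * (v - lows[j]) / span - 1.0)
        scaled.append(new_row)
    return scaled


def pick_reference(label_sets: Sequence[Sequence[int]]) -> int:
    """Index of the training data set with the most distinct clusters."""
    return max(range(len(label_sets)), key=lambda k: len(set(label_sets[k])))


def centroids(data: Sequence[Sequence[float]], labels: Sequence[int]) -> Dict[int, Point]:
    """Mean position of each cluster."""
    sums: Dict[int, Point] = {}
    counts: Dict[int, int] = {}
    for row, lab in zip(data, labels):
        acc = sums.setdefault(lab, [0.0] * len(row))
        for j, v in enumerate(row):
            acc[j] += v
        counts[lab] = counts.get(lab, 0) + 1
    return {lab: [s / counts[lab] for s in acc] for lab, acc in sums.items()}


def match_to_reference(data: Sequence[Sequence[float]], labels: Sequence[int],
                       ref_centroids: Dict[int, Point]) -> Dict[int, int]:
    """Map each local cluster label to the ID of the nearest reference centroid."""
    return {
        lab: min(ref_centroids, key=lambda rid: math.dist(cen, ref_centroids[rid]))
        for lab, cen in centroids(data, labels).items()
    }


# Hypothetical example: reference clusters 0 and 1, another file with clusters 7 and 3.
ref = centroids([[-0.5, -0.5], [0.5, 0.5]], [0, 1])
print(match_to_reference([[0.4, 0.6], [0.6, 0.4], [-0.6, -0.4]], [7, 7, 3], ref))
# {7: 1, 3: 0}
```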

Algorithm choice
Matlab has a free file-sharing repository
Someone has already put almost any algorithm you can think of into code
I used the SVM coded by Junshui Ma and Yi Zhao of Ohio State University
It received 5 stars

Training Data
• Example training data
– Showing parameters 1 & 2, and 3 & 4 of the stem cell data set

Results

Results
Speed:
• CFSE: training 4 sec, classification 2 min 48 sec (13 files)
• DLBCL: training 5 sec, classification 67 sec (30 files)
• GvHD: training 5 sec, classification 38 sec (12 files)
• NDD: training 11 sec, classification 27 min 28 sec (30 files)
• Stem cell: training 4 sec, classification 19 sec (30 files)
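As a quick derived comparison, classification time per file can be computed from the speed figures above. File sizes and event counts are not reported, so this is not a per-event rate.

```python
from typing import Dict, Tuple

# (training s, classification s, number of files), transcribed from the speed results.
RESULTS: Dict[str, Tuple[int, int, int]] = {
    "CFSE": (4, 2 * 60 + 48, 13),
    "DLBCL": (5, 67, 30),
    "GvHD": (5, 38, 12),
    "NDD": (11, 27 * 60 + 28, 30),
    "Stem cell": (4, 19, 30),
}


def seconds_per_file(results: Dict[str, Tuple[int, int, int]]) -> Dict[str, float]:
    """Average classification time per file for each data set."""
    return {name: clf / n_files for name, (_train, clf, n_files) in results.items()}


for name, secs in seconds_per_file(RESULTS).items():
    print(f"{name}: {secs:.1f} s per file")
```

This gives roughly 0.6 s per file for the stem cell data up to about 55 s per file for NDD, while training stays between 4 and 11 seconds for every data set.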

Room for improvement…
• SVMs are highly dependent on identifying a transform that maps the data to a linearly separable space.
• We could experiment with a number of different transforms
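Some standard kernels that could be tried in place of the Gaussian one are sketched below. The parameter names and default values (`degree`, `coef0`, `sigma`, `kappa`, `theta`) are illustrative assumptions, not tuned for flow data, and the sigmoid form is only a valid (positive semidefinite) kernel for some parameter choices.

```python
import math
from typing import Callable, Dict, Sequence

Vector = Sequence[float]


def dot(x: Vector, y: Vector) -> float:
    return sum(a * b for a, b in zip(x, y))


def linear(x: Vector, y: Vector) -> float:
    """No mapping: only linearly separable data can be split."""
    return dot(x, y)


def polynomial(x: Vector, y: Vector, degree: int = 3, coef0: float = 1.0) -> float:
    """(x . y + coef0)^degree."""
    return (dot(x, y) + coef0) ** degree


def gaussian(x: Vector, y: Vector, sigma: float = 1.0) -> float:
    """exp(-||x - y||^2 / (2 sigma^2)), the radial kernel."""
    return math.exp(-sum((a - b) ** 2 for a, b in zip(x, y)) / (2.0 * sigma ** 2))


def sigmoid(x: Vector, y: Vector, kappa: float = 0.5, theta: float = -1.0) -> float:
    """tanh(kappa * x . y + theta); not a valid kernel for every (kappa, theta)."""
    return math.tanh(kappa * dot(x, y) + theta)


KERNELS: Dict[str, Callable[[Vector, Vector], float]] = {
    "linear": linear,
    "polynomial": polynomial,
    "gaussian": gaussian,
    "sigmoid": sigmoid,
}

x, y = [1.0, 2.0], [3.0, 4.0]
for name, k in KERNELS.items():
    print(name, round(k(x, y), 4))
```

Because every kernel shares the same two-argument signature, a kernelized decision function can take any entry of `KERNELS` without other changes, which keeps such an experiment to a one-line swap.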

FlowCap Feedback
• What went well
– Data easily available
– Submission process easy
– Questions answered immediately!
• What could be improved
– Wider publicity, particularly outside our domain

Questions?
