Support Vector Machines for Classification of Flow ...

Support Vector Machines for Classification of Flow Data

Funded by SBIR Grant # R43 RR024094-01A1
FlowCap 2010
John Quinn Ph.D., Treestar
[email protected]

Our Objective
• Demonstrate that supervised training algorithms can effectively replicate user-created gates
– Very useful in high-throughput settings
– Can increase robustness
• We believe this will be the first application in which algorithmic gate placement becomes the norm.

Selected Algorithm
• Support Vector Machine (SVM)
– Radial kernel
• Supervised linear classifier that solves an optimization problem to find the hyperplane(s) that separate the classes with the maximum distance between them
– With a non-linear mapping, data that is not linearly separable can also be classified

SVM Operation
Optimization:
• Determine which elements of the training data mark the boundary of maximum distance D between two classes; these elements are the support vectors
[Figure: Class 1 and Class 2 separated by the maximum separation D, with the support vectors marked]

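SVM Operation
• Optimization problem
For data:
$$\mathcal{D} = \{(\mathbf{x}_i, c_i) \mid \mathbf{x}_i \in \mathbb{R}^p,\ c_i \in \{-1, 1\}\}_{i=1}^{n}$$
A hyperplane that separates any two classes can be defined as $\mathbf{w}\cdot\mathbf{x} - b = 0$, with the margin boundaries:
For $c_i = 1$: $\mathbf{w}\cdot\mathbf{x} - b = 1$
For $c_i = -1$: $\mathbf{w}\cdot\mathbf{x} - b = -1$
Knowing that the data points should lie outside the margin, we can impose the constraint:
$$c_i(\mathbf{w}\cdot\mathbf{x}_i - b) \ge 1, \quad 1 \le i \le n$$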
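A minimal Python sketch of this constraint: it evaluates c_i * (w . x_i - b) for every training point, confirms that no point falls inside the margin, and flags the points where the value is exactly 1, which are the support vectors. The toy points and the values of w and b are hypothetical, picked by hand so the numbers are easy to follow; they are not output from the author's SVM.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def margin_values(X, c, w, b):
    """Return c_i * (w . x_i - b) for every training point."""
    return [ci * (dot(w, xi) - b) for xi, ci in zip(X, c)]

def check_constraints(X, c, w, b, tol=1e-9):
    """Return (are all constraints satisfied?, indices of points lying on the margin)."""
    vals = margin_values(X, c, w, b)
    feasible = all(v >= 1 - tol for v in vals)
    on_margin = [i for i, v in enumerate(vals) if abs(v - 1) <= tol]
    return feasible, on_margin

# Hypothetical toy data: two points per class in two dimensions.
X = [(2.0, 2.0), (0.0, 0.0), (3.0, 4.0), (-1.0, -2.0)]
c = [1, -1, 1, -1]
w, b = (0.5, 0.5), 1.0

feasible, support = check_constraints(X, c, w, b)
print(feasible, support)  # True [0, 1]
```

Points 0 and 1 sit exactly on the margin; points 2 and 3 satisfy the constraint with room to spare (value 2.5), so they do not influence the hyperplane.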

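SVM Operation
We know that the support vectors will have a perpendicular distance from the separating hyperplane of:
$$\frac{\mathbf{w}\cdot\mathbf{x}_i - b}{\lVert\mathbf{w}\rVert} = \frac{1}{\lVert\mathbf{w}\rVert} \quad (c_i = 1)$$
and
$$\frac{\mathbf{w}\cdot\mathbf{x}_i - b}{\lVert\mathbf{w}\rVert} = -\frac{1}{\lVert\mathbf{w}\rVert} \quad (c_i = -1)$$
The distance between the support vectors of the two classes (the margin width) can then be expressed as:
$$D = \frac{2}{\lVert\mathbf{w}\rVert}$$
So maximizing D amounts to the minimization of $\lVert\mathbf{w}\rVert$ (in practice $\tfrac{1}{2}\lVert\mathbf{w}\rVert^2$, which has the same minimizer).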
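The distance argument can be checked numerically. This sketch, assuming a hand-picked toy hyperplane (w = (0.5, 0.5), b = 1, hypothetical values), computes the signed perpendicular distance of one support vector from each class and the margin width D = 2/||w||.

```python
import math

def norm(v):
    """Euclidean length of a vector."""
    return math.sqrt(sum(a * a for a in v))

def signed_distance(x, w, b):
    """Signed perpendicular distance of x from the hyperplane w . x - b = 0."""
    return (sum(wi * xi for wi, xi in zip(w, x)) - b) / norm(w)

def margin_width(w):
    """Distance between the two supporting hyperplanes: D = 2 / ||w||."""
    return 2.0 / norm(w)

# Hypothetical hyperplane and one support vector per class, chosen for illustration.
w, b = (0.5, 0.5), 1.0
sv_pos, sv_neg = (2.0, 2.0), (0.0, 0.0)

print(signed_distance(sv_pos, w, b))  # about +1.4142, i.e. +1/||w||
print(signed_distance(sv_neg, w, b))  # about -1.4142, i.e. -1/||w||
print(margin_width(w))                # about 2.8284
print(math.dist(sv_pos, sv_neg))      # also about 2.8284 for this toy case
```

In this toy case D also equals the straight-line distance between the two support vectors, because the segment joining them happens to be parallel to w; in general D is the distance between the two supporting hyperplanes.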

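SVM Operation
We then use the inequality
$$c_i(\mathbf{w}\cdot\mathbf{x}_i - b) \ge 1$$
as a constraint on the minimization of $\tfrac{1}{2}\lVert\mathbf{w}\rVert^2$ and introduce Lagrange multipliers $\alpha_i \ge 0$. At the critical (saddle) point of the Lagrangian
$$L(\mathbf{w}, b, \boldsymbol{\alpha}) = \tfrac{1}{2}\lVert\mathbf{w}\rVert^2 - \sum_{i=1}^{n} \alpha_i \left[c_i(\mathbf{w}\cdot\mathbf{x}_i - b) - 1\right]$$
w can be expressed as a linear combination of the training vectors:
$$\mathbf{w} = \sum_{i=1}^{n} \alpha_i c_i \mathbf{x}_i$$
The support vectors are then the $\mathbf{x}_i$ associated with strictly positive (non-zero) Lagrange multipliers; $N_{SV}$ denotes their number.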
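A short sketch of the expansion of w in terms of the multipliers, and of how the support vectors are read off. The multipliers below are hypothetical values for a four-point toy set (two of them non-zero); in practice they come from solving the dual optimization problem, which this sketch does not do.

```python
def weight_vector(X, c, alpha):
    """w = sum_i alpha_i * c_i * x_i (stationarity condition of the Lagrangian)."""
    dim = len(X[0])
    w = [0.0] * dim
    for xi, ci, ai in zip(X, c, alpha):
        for j in range(dim):
            w[j] += ai * ci * xi[j]
    return w

def support_vectors(alpha, tol=1e-12):
    """Indices of the training points whose multiplier is strictly positive."""
    return [i for i, a in enumerate(alpha) if a > tol]

X = [(2.0, 2.0), (0.0, 0.0), (3.0, 4.0), (-1.0, -2.0)]
c = [1, -1, 1, -1]
alpha = [0.25, 0.25, 0.0, 0.0]  # hypothetical multipliers for this toy set

print(weight_vector(X, c, alpha))              # [0.5, 0.5]
print(support_vectors(alpha))                  # [0, 1]
print(sum(a * ci for a, ci in zip(alpha, c)))  # 0.0, the condition sum_i alpha_i c_i = 0
```

Points with a zero multiplier drop out of the sum entirely, which is why only the support vectors determine w.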

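SVM Operation
Once w is known and the support vectors have been identified, b can be solved as:
$$b = \frac{1}{N_{SV}} \sum_{i=1}^{N_{SV}} \left(\mathbf{w}\cdot\mathbf{x}_i - c_i\right)$$
Each support vector satisfies $c_i(\mathbf{w}\cdot\mathbf{x}_i - b) = 1$, so $b = \mathbf{w}\cdot\mathbf{x}_i - c_i$ for any one of them; averaging over all support vectors gives a more stable estimate.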
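Continuing with a hypothetical four-point toy set, this sketch averages w . x_i - c_i over the support vectors to obtain b, assuming w and the support-vector indices are already known.

```python
def bias(X, c, w, sv):
    """b = (1 / N_SV) * sum over the support vectors of (w . x_i - c_i)."""
    total = 0.0
    for i in sv:
        total += sum(wj * xj for wj, xj in zip(w, X[i])) - c[i]
    return total / len(sv)

X = [(2.0, 2.0), (0.0, 0.0), (3.0, 4.0), (-1.0, -2.0)]
c = [1, -1, 1, -1]
w = (0.5, 0.5)
sv = [0, 1]  # indices of the support vectors (hypothetical toy data)

b = bias(X, c, w, sv)
print(b)  # 1.0
```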

If there are more than two classes, the operation remains the same, but the hyperplanes are determined either as one versus all or pairwise
• We chose a one-versus-all format
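A minimal sketch of one-versus-all prediction, assuming one binary "class k versus the rest" hyperplane per class and assigning each event to the class with the largest decision value. The class labels and hyperplanes are hypothetical and set by hand rather than trained.

```python
def one_vs_all_predict(x, models):
    """Return the label whose 'this class vs the rest' decision value w . x - b is largest.

    models maps class label -> (w, b) for a binary SVM trained with that class as +1
    and every other class as -1.
    """
    def score(model):
        w, b = model
        return sum(wi * xi for wi, xi in zip(w, x)) - b
    return max(models, key=lambda label: score(models[label]))

# Hypothetical per-class hyperplanes, chosen by hand for illustration (not trained).
models = {
    "A": ((1.0, 0.0), 0.0),
    "B": ((0.0, 1.0), 0.0),
    "C": ((-1.0, -1.0), 0.0),
}
print(one_vs_all_predict((3.0, 1.0), models))    # A
print(one_vs_all_predict((0.0, 2.0), models))    # B
print(one_vs_all_predict((-2.0, -2.0), models))  # C
```

Comparing raw decision values across separately trained binary classifiers is a common heuristic rather than a guarantee, since each classifier's outputs are on its own scale. With a non-linear kernel, w . x - b is replaced by the kernel expansion.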

SVM Operation
• Data not linearly separable? Map it to a space where it is!
– We assumed that flow data would have a Gaussian distribution and selected a Gaussian mapping
[Figure: the data shown in the input space and in the mapped space]
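With a Gaussian mapping the data are never transformed explicitly: the inner product w . x is replaced by a kernel, here assumed to be the Gaussian (radial basis function) kernel K(x, z) = exp(-gamma * ||x - z||^2). The sketch evaluates the resulting decision function on a 1-D toy set that no single threshold can separate. The width gamma = 1 and the multipliers are hypothetical, hand-picked values rather than the solution of an SVM optimization; the slides do not state the kernel width that was used.

```python
import math

def gaussian_kernel(x, z, gamma=1.0):
    """K(x, z) = exp(-gamma * ||x - z||^2), the Gaussian (RBF) kernel."""
    return math.exp(-gamma * sum((xi - zi) ** 2 for xi, zi in zip(x, z)))

def decision(x, X, c, alpha, b, gamma=1.0):
    """Kernelized decision value: sum_i alpha_i * c_i * K(x_i, x) - b."""
    return sum(a * ci * gaussian_kernel(xi, x, gamma)
               for xi, ci, a in zip(X, c, alpha)) - b

# 1-D toy data: class +1 sits between two class -1 points, so no threshold separates them.
X = [(0.0,), (-2.0,), (2.0,)]
c = [1, -1, -1]
alpha = [1.0, 0.5, 0.5]  # hypothetical multipliers (they satisfy sum alpha_i c_i = 0)
b = 0.0

for x in [(-3.0,), (-2.0,), (0.0,), (1.0,), (2.0,), (3.0,)]:
    print(x[0], 1 if decision(x, X, c, alpha, b) > 0 else -1)
```

The printed labels are +1 only for the points near the origin, an interval that a linear rule in the original 1-D space cannot produce.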

Why use an SVM?
• SVMs are deterministic
• The optimization problem is convex, so they find the global maximum rather than a local maximum
– If the training data are representative of the real data, you cannot do better.
• SVMs are fast
– They solve a single maximization problem, as opposed to doing an iterative fitting

Preprocessing
• To prepare the training data, we:
– Normalized the data to a range of -1 to 1
– Identified the training data set with the largest number of clusters
• Used this data set as the reference set
– Calculated the centroid of each cluster in the reference set
– In all other training data, calculated the Euclidean distance of each cluster to the clusters in the reference set and assigned each cluster the ID of the reference cluster with the smallest distance
– Took a sample of each training data set and combined the samples into one training vector to present to the SVM
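The preprocessing steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the author's MATLAB code: each data set is assumed to be a list of events (lists of parameter values) paired with a list of cluster IDs from manual gating, normalization is assumed to be per-parameter min-max scaling, the distance between two clusters is assumed to be the Euclidean distance between their centroids, and sampling is assumed to be uniform random. All function names are hypothetical.

```python
import math
import random

def normalize(events):
    """Min-max scale each parameter (column) to the range [-1, 1]."""
    cols = list(zip(*events))
    lo = [min(col) for col in cols]
    hi = [max(col) for col in cols]
    return [[2 * (v - l) / (h - l) - 1 if h > l else 0.0
             for v, l, h in zip(event, lo, hi)]
            for event in events]

def pick_reference(datasets):
    """Return the (events, labels) data set with the most distinct cluster IDs."""
    return max(datasets, key=lambda ds: len(set(ds[1])))

def centroids(events, labels):
    """Mean position of each cluster, keyed by cluster ID."""
    sums, counts = {}, {}
    for event, label in zip(events, labels):
        acc = sums.setdefault(label, [0.0] * len(event))
        for j, v in enumerate(event):
            acc[j] += v
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}

def match_to_reference(events, labels, ref_centroids):
    """Relabel each cluster with the ID of the reference centroid nearest its own centroid."""
    mapping = {
        label: min(ref_centroids, key=lambda r: math.dist(cen, ref_centroids[r]))
        for label, cen in centroids(events, labels).items()
    }
    return [mapping[label] for label in labels]

def pooled_training_set(datasets, per_set, seed=0):
    """Randomly sample up to per_set (event, label) pairs from each data set and pool them."""
    rng = random.Random(seed)
    pooled = []
    for events, labels in datasets:
        pairs = list(zip(events, labels))
        pooled.extend(rng.sample(pairs, min(per_set, len(pairs))))
    return pooled
```

In use, one would normalize every file, pick the reference with `pick_reference`, compute its `centroids`, relabel the other files with `match_to_reference`, and pass the output of `pooled_training_set` to the SVM.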

Algorithm choice
MATLAB has a free file-sharing repository
Someone has already put almost any algorithm you can think of into code
I used the SVM coded by Junshui Ma and Yi Zhao of Ohio State University
It received 5 stars

Training Data
• Example training data
– Showing parameters 1 & 2, and 3 & 4 of the stem cell data set

Results

Results
Speed:
Data set     Training time   Classification time
CFSE         4 sec           2 min 48 sec (13 files)
DLBCL        5 sec           67 sec (30 files)
GvHD         5 sec           38 sec (12 files)
NDD          11 sec          27 min 28 sec (30 files)
Stem cell    4 sec           19 sec (30 files)
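Dividing the classification times in the speed results above by the number of files gives a rough per-file cost. This quick calculation uses only the figures reported on the slide.

```python
# Timings transcribed from the slide: (training s, classification s, number of files).
timings = {
    "CFSE": (4, 2 * 60 + 48, 13),
    "DLBCL": (5, 67, 30),
    "GvHD": (5, 38, 12),
    "NDD": (11, 27 * 60 + 28, 30),
    "Stem cell": (4, 19, 30),
}

def mean_seconds_per_file(timings):
    """Average classification time per file for each data set."""
    return {name: cls / files for name, (_train, cls, files) in timings.items()}

for name, secs in mean_seconds_per_file(timings).items():
    print(f"{name:<10} {secs:6.2f} s/file")
```

This gives roughly 12.92 s per file for CFSE, 2.23 for DLBCL, 3.17 for GvHD, 54.93 for NDD and 0.63 for Stem cell. The slide does not report the number of events per file, so these averages should not be read as a per-event speed comparison.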

Room for improvement…
• SVMs are highly dependent on identifying a transform that maps the data to a linearly separable space.
• We could experiment with a number of different transforms
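One way to experiment with different transforms is to train the same classifier with several kernels and compare the results. To keep the sketch short and self-contained, it swaps in a kernel perceptron (a simpler kernelized linear classifier) instead of an SVM, and uses the four-point XOR pattern as hypothetical toy data. The kernels (linear, degree-2 polynomial, Gaussian with gamma = 1) are assumptions chosen for illustration.

```python
import math

def linear(x, z):
    return sum(a * b for a, b in zip(x, z))

def poly2(x, z):
    return (linear(x, z) + 1) ** 2

def rbf(x, z, gamma=1.0):
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))

def train_kernel_perceptron(X, y, kernel, epochs=20):
    """Dual perceptron: bump alpha_i whenever point i is misclassified."""
    alpha = [0] * len(X)
    for _ in range(epochs):
        mistakes = 0
        for i, (xi, yi) in enumerate(zip(X, y)):
            f = sum(a * yj * kernel(xj, xi) for a, xj, yj in zip(alpha, X, y))
            if yi * f <= 0:
                alpha[i] += 1
                mistakes += 1
        if mistakes == 0:
            break
    return alpha

def accuracy(X, y, alpha, kernel):
    """Fraction of points whose predicted sign matches the label."""
    def predict(x):
        f = sum(a * yj * kernel(xj, x) for a, xj, yj in zip(alpha, X, y))
        return 1 if f > 0 else -1
    return sum(predict(x) == t for x, t in zip(X, y)) / len(X)

# Hypothetical XOR toy data: not linearly separable in the input space.
X = [(1, 1), (-1, -1), (1, -1), (-1, 1)]
y = [1, 1, -1, -1]
for name, k in [("linear", linear), ("poly2", poly2), ("rbf", rbf)]:
    alpha = train_kernel_perceptron(X, y, k)
    print(name, accuracy(X, y, alpha, k))
```

On this toy set the linear kernel cannot reach full training accuracy, while the polynomial and Gaussian kernels separate all four points. With real flow data the comparison would be made on held-out files rather than on the training points.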

FlowCap Feedback
• What went well
– Data easily available
– Submission process easy
– Questions answered immediately!
• What could be improved
– Wider publicity, particularly outside our domain

Questions?