A Hybrid Evolutionary Feature Selection Method for ... cs.uno.edu/~tamjid/Papers/2016_LA_O3.pdf
A Hybrid Evolutionary Feature Selection Method for Microarray Data
Denson Smith, Sumaiya Iqbal, Md Tamjidul Hoque
{dsmith8, siqbal1, thoque}@uno.edu
University of New Orleans
Abstract
DNA microarray data allow the analysis of the expression levels of thousands of genes simultaneously. This process can capture the current state of gene regulation within a cell by capturing mRNA expression, instead of the tedious quantitative and qualitative measurement of protein expression, which would have been a more accurate measure of cellular activity. Because mRNA expression measures the interaction only indirectly, we need robust approaches to infer the true statistics. Such an approach makes clinically and/or scientifically useful predictions possible, such as diagnosing diseases, identifying tumor types and selecting treatments. Many statistical classification methods are available for this type of task. A central difficulty in such statistical classification is that some of the features (variables) in the data may be irrelevant or redundant to the prediction task. Irrelevant and redundant data complicate and confound the classification process; it is therefore desirable to identify and eliminate variables that are not useful for the classification task. The aim of this research is to propose a robust methodology for classifying DNA microarray data using feature selection, which is the process of identifying and eliminating features that are irrelevant or redundant. The proposed method performs effective feature selection to identify a subset of genes that best describe a disease. Two well-known DNA microarray datasets were used to validate the method.
Feature Selection
• The process of selecting a subset of relevant features (variables) for use in classification model construction is known as feature selection (a.k.a. variable selection, attribute selection or variable subset selection).
• Classification models constructed with an optimal subset of features have been demonstrated, both in theory and in practice, to be faster to train and faster to run, to provide a better understanding of the underlying processes, and to have improved predictive accuracy, better generalization and reduced model complexity.
Microarray Data Challenges for Classification
• Many datasets are high dimensional, i.e. thousands or tens of thousands of features.
• Many of the features are redundant, irrelevant or weakly relevant.
• Datasets often contain missing and/or incorrect values.
• There are possibly mislabeled samples.
• Usually, there are relatively few samples available for training and validation of the model.
Example Dataset: Breast Cancer
• The goal is to classify test samples as relapse or non-relapse (binary classification).
• "Well-known" dataset from the Kent Ridge Bio-medical Dataset Repository.
• 24481 gene expression ratios
• 78 training samples
• 19 test samples
• Missing data
Genetic Forest Feature Selection Algorithm
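The transcript preserves only the title of this slide, so the details of the algorithm are not recoverable here. As a rough illustration of the general idea, the sketch below shows a generic genetic algorithm over feature masks with elitism. It is not the authors' GFFS: the function `evolve_feature_subsets`, its parameters, the selection and crossover scheme, and the toy fitness function are all assumptions made for illustration. In a real setting, the fitness would presumably be a classifier score such as one of the fitness metrics listed under Results.

```python
import random

def evolve_feature_subsets(n_features, fitness, pop_size=20, generations=30,
                           n_elite=4, mutation_rate=0.05, seed=0):
    """Generic GA over boolean feature masks (illustrative, not GFFS)."""
    rng = random.Random(seed)
    # Each individual is a tuple of booleans: True means the feature is kept.
    population = [tuple(rng.random() < 0.5 for _ in range(n_features))
                  for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        next_gen = ranked[:n_elite]          # elitism: best masks survive unchanged
        parents = ranked[:pop_size // 2]     # truncation selection (an assumption)
        while len(next_gen) < pop_size:
            mother, father = rng.sample(parents, 2)
            cut = rng.randrange(1, n_features)        # one-point crossover
            child = mother[:cut] + father[cut:]
            # Bit-flip mutation: each bit flips with probability mutation_rate.
            child = tuple(bit != (rng.random() < mutation_rate) for bit in child)
            next_gen.append(child)
        population = next_gen
    return max(population, key=fitness)

# Hypothetical toy fitness: reward three "relevant" features, penalize the rest.
INFORMATIVE = {0, 3, 7}

def toy_fitness(mask):
    selected = {i for i, keep in enumerate(mask) if keep}
    return len(selected & INFORMATIVE) - 0.1 * len(selected - INFORMATIVE)

best = evolve_feature_subsets(n_features=12, fitness=toy_fitness)
print(sorted(i for i, keep in enumerate(best) if keep))
```

Because the elite individuals are copied unchanged into the next generation, the best fitness in the population can never decrease from one generation to the next.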
Extra Tree Classifier
Feature Importance Estimates
Feature Importance Estimates
Workflow
Results
Darker colors indicate features that appear in more candidate feature sets. Lighter colors indicate features that appear in fewer candidate feature sets. Features that do not appear in any candidate feature set are likely to be irrelevant. Rows with equal or near-equal performance but different features likely contain features that are mutually redundant.
A set of 10 candidate features is generated for each fitness metric:
1. MCC
2. AUC
3. accuracy
4. F1
5. (MCC + AUC)/2
6. (F1 + AUC)/2
7. (accuracy + AUC)/2
8. (precision + recall)/2
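The color coding described above amounts to counting how many candidate feature sets contain each feature. A minimal sketch of that count follows; the function name `feature_frequencies` and the example sets are hypothetical, not data from the study.

```python
from collections import Counter

def feature_frequencies(candidate_sets, n_features):
    """Return, for each feature index, how many candidate sets contain it."""
    counts = Counter(f for s in candidate_sets for f in set(s))
    return [counts.get(i, 0) for i in range(n_features)]

# Hypothetical candidate feature sets over 8 features.
candidate_sets = [{0, 2, 5}, {0, 2, 6}, {0, 5}, {0, 2, 5}]
freq = feature_frequencies(candidate_sets, n_features=8)

# Features with a count of zero appear in no candidate set.
never_selected = [i for i, c in enumerate(freq) if c == 0]

print(freq)            # [4, 0, 3, 0, 0, 3, 1, 0]
print(never_selected)  # [1, 3, 4, 7]
```

A higher count corresponds to a darker cell; the indices in `never_selected` are the features that would be flagged as likely irrelevant.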
Results

| Metric | Best MCC found (metric: accuracy + AUC, elite: 4) | All features (metric: None) |
|---|---|---|
| # features | 32 | 24187 |
| AUC | 0.8571 | 0.8393 |
| accuracy | 0.9474 | 0.8421 |
| precision | 1.0000 | 0.8333 |
| recall | 0.8571 | 0.7143 |
| F1 | 0.9231 | 0.7692 |
| MCC | 0.8895 | 0.6548 |

Matthews Correlation Coefficient

$$
\mathrm{MCC} = \frac{(TP \times TN) - (FP \times FN)}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}}
$$

where
TP = the number of true positives
TN = the number of true negatives
FP = the number of false positives
FN = the number of false negatives
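As a check on the definition, the sketch below computes accuracy, precision, recall, F1 and MCC from confusion-matrix counts. The counts themselves are not given in the slides; the values used here are inferred as the counts consistent with the reported accuracy, precision and recall, assuming those figures refer to the 19 test samples. The function name `classification_metrics` is hypothetical.

```python
import math

def classification_metrics(tp, tn, fp, fn):
    """Compute common binary-classification metrics from confusion counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is conventionally set to 0 when the denominator is 0.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1, "mcc": mcc}

# Inferred counts (an assumption, see above), 19 test samples in each case.
selected = classification_metrics(tp=6, tn=12, fp=0, fn=1)      # 32 features
all_features = classification_metrics(tp=5, tn=11, fp=1, fn=2)  # all features

for name, m in (("selected", selected), ("all features", all_features)):
    print(name, {k: round(v, 4) for k, v in m.items()})
```

With these counts the function gives an MCC of about 0.8895 for the selected subset and about 0.6548 for all features, matching the table.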
Method Comparison

| Classification technique | Selection technique | # of genes | Accuracy | Reference |
|---|---|---|---|---|
| SVM | PSO | 20 | 1.0000 | [2] |
| SVM | ABC | 5 | 0.9470 | [3] |
| ET | GFFS | 32 | 0.9470 | Proposed method |
| J48 | GA | 41 | 0.9381 | [4] |
| SVM | DRF0-1G | 44 | 0.8421 | [1] |

• PSO – particle swarm optimization
• ABC – artificial bee colony
• GFFS – genetic forest feature selector
• GA – genetic algorithm
• J48 – decision tree
• LDAGA – linear discriminant analysis genetic algorithm
• Filter – correlation of individual gene expression with target class
Overfitting?
• Some candidate feature sets that performed well with the training data performed very poorly with the validation data.
• This is likely due to spurious relationships between irrelevant features and the target class.
• If this is the cause, then feature selection may be viewed as a form of overfitting the training data.
• This illustrates why a validation set kept separate during feature selection is crucial.
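The effect described above can be reproduced with pure noise. In the sketch below, every feature is random and unrelated to the label, yet picking the feature that looks best on a small training set yields an apparently good training accuracy that has no reason to carry over to validation data. The sample sizes, the threshold rule and all names are assumptions chosen for illustration, not the study's setup.

```python
import random

def accuracy(feature, labels, flip):
    """Accuracy of a threshold rule: predict class 1 when the value is
    positive, or the opposite when flip is True."""
    preds = [(x > 0) != flip for x in feature]
    return sum(p == bool(y) for p, y in zip(preds, labels)) / len(labels)

rng = random.Random(0)
n_train, n_valid, n_features = 20, 20, 500

labels_train = [rng.randint(0, 1) for _ in range(n_train)]
labels_valid = [rng.randint(0, 1) for _ in range(n_valid)]

# Pure-noise features: none carries any information about the label.
train = [[rng.gauss(0, 1) for _ in range(n_train)] for _ in range(n_features)]
valid = [[rng.gauss(0, 1) for _ in range(n_valid)] for _ in range(n_features)]

# "Select" the feature and orientation that look best on the training data.
best_acc, best_idx, best_flip = max(
    (accuracy(train[j], labels_train, flip), j, flip)
    for j in range(n_features) for flip in (False, True))

# Evaluate the same rule on data that played no part in the selection.
valid_acc = accuracy(valid[best_idx], labels_valid, best_flip)
print(f"train accuracy: {best_acc:.2f}, validation accuracy: {valid_acc:.2f}")
```

With many more features than samples, some noise feature will usually fit the training labels well by chance, which is why the selected subset has to be scored on samples that were held out of the selection step.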
Conclusions
• The usual goal of feature selection is to identify and remove all irrelevant and redundant features.
• Redundant features provide an opportunity to mitigate, or at least predict, performance loss due to missing data.
• Selected features may provide insight into genes correlated with the disease.
• Feature selection may be a form of overfitting the training data.
• A validation dataset is crucial to the feature selection process.
Future Work
• Reapply feature selection using only the candidate feature sets to determine if results improve
• Attempt to reduce overfitting of the training data during feature selection
• Formalize the method of choosing an alternative feature set in the case of missing data
• Complete the process on additional microarray datasets
• Complete the process on datasets from different problem domains
References
• [1] Huerta, E. B., Duval, B. and Hao, J.-K. Gene selection for microarray data by a LDA-based genetic algorithm. Springer, City, 2008.
• [2] Sahu, B. and Mishra, D. A novel feature selection algorithm using particle swarm optimization for cancer microarray data. Procedia Engineering, 38 (2012), 27-31.
• [3] Garro, B. A., Rodríguez, K. and Vázquez, R. A. Classification of DNA microarrays using artificial neural networks and ABC algorithm. Applied Soft Computing, 38 (2016), 548-560.
• [4] Sasikala, S., alias Balamurugan, S. A. and Geetha, S. A Novel Feature Selection Technique for Improved Survivability Diagnosis of Breast Cancer. Procedia Computer Science, 50 (2015), 16-23.