Aim 1: Aim 2cs229.stanford.edu/proj2017/final-posters/5143905.pdf · This offers us unprecedented...

Acknowledgements

This work would not have been possible without the help of Dennis Wall, Kelley Paskov, and the other members of the Wall Lab, as well as the funding and computing resources of the Wall Lab and the Stanford University School of Medicine.

Thanks also to the Machine Learning instructors and TAs.

1. Phillips, R. D. et al. Enrichment Procedures for Soft Clusters: A Statistical Test and its Applications. (2010).

2. Unpublished work from the Wall Lab, Stanford University.

Clustering and Classifying Autism

The iHart Consortium has helped to collect one of the largest Autism Spectrum Disorder (ASD) datasets ever, including genetic and behavioral data for several thousand ASD cases and controls.

This offers us an unprecedented opportunity to take machine learning approaches to two major autism research problems:

Aim 2: An Autism Genetic Risk Score

Goal: Build a genetic risk predictor for ASD

The Problem: Autism is a complex disease; it is determined about 50% by genetics and 50% by a person's environment.

As a result, it is impossible to perfectly predict autism from genetics.

However, an imperfect classifier can:
• give us a measure of a person's genetic risk of autism
• provide intuition about which genetic features are most predictive of disease.

Genotype + Environment = Phenotype

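The Feature Set:

[Figure: example binary feature vector, e.g. 0 0 1]

Each genome is shown as a 1109 × 1 binary vector recording, for each gene, whether the person has a loss-of-function variant in that gene.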
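To make the encoding concrete, here is a minimal sketch assuming each person's loss-of-function calls arrive as a set of gene symbols and that a fixed, ordered gene list defines the columns (1109 genes on the poster). The function name and the three gene names are hypothetical placeholders; the poster does not describe how variants were called or which genes were included.

```python
from typing import Iterable, List, Sequence


def lof_feature_vector(gene_order: Sequence[str], lof_genes: Iterable[str]) -> List[int]:
    """Binary vector with a 1 wherever the person has a loss-of-function variant.

    gene_order fixes the column order (1109 genes on the poster; any length works).
    Genes in lof_genes that are not in gene_order are ignored.
    """
    hits = set(lof_genes)
    return [1 if gene in hits else 0 for gene in gene_order]


# Hypothetical three-gene panel, for illustration only.
GENES = ["GENE_A", "GENE_B", "GENE_C"]
print(lof_feature_vector(GENES, {"GENE_C"}))  # [0, 0, 1]
```

Stacking one such vector per person gives the people × genes matrix that both classifiers below take as input.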

A Logistic Regression Classifier:

We first trained a logistic regression classifier because these models are often simple to interpret.

F1 score: 0.634

Area under ROC: 0.565

A Gradient Boosted Classifier:

We also trained a gradient boosted tree classifier to capture non-linear gene-gene relationships.

F1 score: 0.647

Area under ROC: 0.580

Conclusions and Future Work: Our best performance is achieved by averaging the predictions from the two classifiers above (ensemble scores below). This averaged classifier outperforms previous methods (best previous AU-ROC = 0.54 [2]), showing promise as a genetic risk score predictor for ASD.
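The averaging step and the two reported metrics can be sketched in plain Python. This is a minimal sketch under stated assumptions: each trained classifier outputs a probability of ASD per person, the ensemble is an unweighted mean, and F1 is computed for the ASD class at a 0.5 threshold. The poster does not state the threshold or any weighting, the trained models themselves are not reproduced, and the toy numbers are made up.

```python
from typing import List, Sequence


def average_predictions(p1: Sequence[float], p2: Sequence[float]) -> List[float]:
    """Unweighted mean of two classifiers' predicted probabilities of ASD."""
    if len(p1) != len(p2):
        raise ValueError("prediction lists must have equal length")
    return [(a + b) / 2 for a, b in zip(p1, p2)]


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve, computed as the probability that a randomly
    chosen case (label 1) scores higher than a randomly chosen control
    (label 0); ties count as one half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one control")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def f1_score(labels: Sequence[int], scores: Sequence[float], threshold: float = 0.5) -> float:
    """F1 score for the ASD class (label 1) after thresholding the scores."""
    preds = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)
    return 0.0 if tp == 0 else 2 * tp / (2 * tp + fp + fn)


# Made-up toy data: 1 = ASD case, 0 = control.
labels = [1, 0, 1, 0]
p_logreg = [0.75, 0.25, 0.25, 0.5]
p_boosted = [0.75, 0.25, 0.75, 0.75]
p_ensemble = average_predictions(p_logreg, p_boosted)  # [0.75, 0.25, 0.5, 0.625]
print(roc_auc(labels, p_ensemble))   # 0.75
print(f1_score(labels, p_ensemble))  # 0.8
```

Averaging probabilities keeps the ensemble's output on the same 0 to 1 scale as each model, so the same threshold and AUROC computation apply unchanged.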

Rachael Aikens ([email protected]) and Brianna Kozemzak (kozemzak@stanford.edu), Stanford University Department of Biomedical Informatics, Wall Lab

Averaged ensemble (mean of the two classifiers' predictions):

F1 score: 0.642

Area under ROC: 0.602

Future work will:
• Continue to optimize ensemble and non-linear classification models
• Analyze feature importance to infer which genetic variants are most predictive

Aim 1: Clustering Autism Subtypes

Goal: Develop a cluster validation toolkit and use it to analyze clustering results

Feature Heat Maps:

Features on the x-axis and centroids on the y-axis. Lighter feature values usually indicate more neurotypical behavior. We see separation of neurotypical individuals from atypical individuals, and then a mixed cluster.

Label Pie Charts:

[Figure: pie charts for Cluster 1 (n = 3,980), Cluster 2 (n = 2,683), and Cluster 3 (n = 6,830); rows shown: ADOS Diagnosis, ADI-R Diagnosis]

Pie charts were generated for 29 different labels, including diagnostic, demographic, and computed ADOS/ADI-R labels. The control group appears to separate from the ASD individuals.
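The heat-map rows and pie-chart slices above reduce to two per-cluster summaries: the mean feature vector (centroid) and the share of each label value. A minimal sketch, assuming features are already imputed and numeric and that each individual has a crisp cluster label; the function names are hypothetical and the imputation step is not shown.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Sequence


def centroids(X: Sequence[Sequence[float]], assign: Sequence[int]) -> Dict[int, List[float]]:
    """Mean feature vector of each cluster (one heat-map row per cluster)."""
    counts = Counter(assign)
    sums: Dict[int, List[float]] = {}
    for row, c in zip(X, assign):
        acc = sums.setdefault(c, [0.0] * len(row))
        for j, v in enumerate(row):
            acc[j] += v
    return {c: [s / counts[c] for s in sums[c]] for c in sorted(sums)}


def label_proportions(labels: Sequence[str], assign: Sequence[int]) -> Dict[int, Dict[str, float]]:
    """Share of each label value within each cluster (the data behind one pie chart)."""
    by_cluster: Dict[int, Counter] = defaultdict(Counter)
    for lab, c in zip(labels, assign):
        by_cluster[c][lab] += 1
    return {
        c: {lab: n / sum(cnt.values()) for lab, n in cnt.items()}
        for c, cnt in sorted(by_cluster.items())
    }


# Toy data: three individuals, two features, two clusters.
X = [[0.0, 1.0], [0.0, 3.0], [2.0, 2.0]]
assign = [1, 1, 2]
diagnosis = ["ASD", "Control", "ASD"]
print(centroids(X, assign))                  # {1: [0.0, 2.0], 2: [2.0, 2.0]}
print(label_proportions(diagnosis, assign))  # {1: {'ASD': 0.5, 'Control': 0.5}, 2: {'ASD': 1.0}}
```

Calling `label_proportions` once per label column would give one row of pie charts per label.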

Data:
• 13,493 individuals
• 123 features from the ADOS and ADI-R instruments
• Diagnostic, medical, demographic, and other labels

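Individual Movement:

[Figure: movement of individuals between clusterings; panels Cluster 1 (3,980), Cluster 2 (2,683), Cluster 3 (6,830); axis: Cluster Moved To]

Movement between clusters was not random. This indicates that some common underlying features drive cluster formation for all k values.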
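One way to tabulate this movement is to cross-tabulate each individual's cluster under one k against the cluster they move to under the next k, and to compare it with the counts expected if destination were independent of origin. This is a sketch under our own assumptions (crisp labels per k, plain lists as input, hypothetical function names); the poster does not say which statistic, if any, was used to judge that the movement was not random.

```python
from collections import Counter
from typing import Dict, Sequence, Tuple


def movement_table(before: Sequence[int], after: Sequence[int]) -> Dict[Tuple[int, int], int]:
    """Observed counts of (cluster at one k, cluster moved to at the next k)."""
    if len(before) != len(after):
        raise ValueError("both assignments must cover the same individuals")
    return dict(Counter(zip(before, after)))


def expected_if_random(before: Sequence[int], after: Sequence[int]) -> Dict[Tuple[int, int], float]:
    """Counts expected under independence: row total * column total / n."""
    n = len(before)
    rows, cols = Counter(before), Counter(after)
    return {(r, c): rows[r] * cols[c] / n for r in rows for c in cols}


# Toy example: four individuals moving from a k=2 to a k=3 clustering.
k2 = [1, 1, 2, 2]
k3 = [1, 1, 3, 3]
print(movement_table(k2, k3))      # {(1, 1): 2, (2, 3): 2}
print(expected_if_random(k2, k3))  # every cell 1.0
```

Here the observed counts pile into two cells while independence predicts an even spread; a chi-square test on the two tables would be one way to formalize that comparison.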

The Problem: ASD can manifest over a broad spectrum of symptoms, from severe intellectual and communication disability to near-normal "high-functioning" forms. As a result, it is often asked whether ASD is in fact composed of some number of autism "sub-types" that are best diagnosed, studied, and treated in different ways.

Features on the x-axis and examples on the y-axis, sorted by cluster. This was too complex to be useful, so we looked only at the cluster centroids (a low-rank representation of the examples) instead.

Prior Work in the Wall Lab:
• Imputed missing values and clustered the data using a generalized low-rank model with logistic loss
• Crisp and soft k-means clusterings were created for k = 1, 2, ..., 6.
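The poster does not say which soft k-means variant was used. As a minimal sketch, assuming the common formulation in which memberships are a softmax over negative squared distances scaled by a stiffness parameter `beta` (fuzzy c-means is another widely used choice), one fit for a given k looks like this. It runs on already-imputed numeric features; the generalized low-rank imputation is not reproduced, and `beta`, `n_iter`, and the starting means are illustrative.

```python
import math
from typing import List, Sequence, Tuple


def _sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def _memberships(X: Sequence[Sequence[float]], means: Sequence[Sequence[float]],
                 beta: float) -> List[List[float]]:
    """Softmax of -beta * squared distance, so each individual's row sums to 1."""
    out = []
    for x in X:
        z = [-beta * _sq_dist(x, m) for m in means]
        top = max(z)  # subtract the max before exp() for numerical stability
        w = [math.exp(v - top) for v in z]
        total = sum(w)
        out.append([v / total for v in w])
    return out


def soft_kmeans(X: Sequence[Sequence[float]], init_means: Sequence[Sequence[float]],
                beta: float = 1.0, n_iter: int = 50) -> Tuple[List[List[float]], List[List[float]]]:
    """Alternate membership and weighted-mean updates; returns (means, memberships)."""
    means = [list(m) for m in init_means]
    dim = len(X[0])
    for _ in range(n_iter):
        r = _memberships(X, means, beta)
        # Each mean becomes the membership-weighted average of all points.
        for k in range(len(means)):
            weight = sum(row[k] for row in r)
            if weight > 0:
                means[k] = [sum(row[k] * x[j] for row, x in zip(r, X)) / weight
                            for j in range(dim)]
    # Final memberships are computed against the final means.
    return means, _memberships(X, means, beta)


X = [[0.0], [0.2], [10.0], [10.2]]
means, r = soft_kmeans(X, init_means=[[1.0], [9.0]])
print([round(m[0], 3) for m in means])  # [0.1, 10.1]
```

Running it once per k from different starting means gives a family of soft clusterings; as `beta` grows, memberships approach 0/1 and the procedure approaches ordinary crisp k-means.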

Conclusions and Future Work:

Conclusions:
• The "best" clustering result was soft k-means with k = 3, where each individual is assigned to a single cluster based on maximum partial membership
• Why? Clusters are separated by diagnosis, medical history, and computed ADOS/ADI-R labels without creating indistinguishable extra clusters
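Assigning each individual to a single cluster "based on maximum partial membership" is an argmax over that individual's row of soft memberships. A minimal sketch, assuming memberships are stored as one list of per-cluster weights per individual; `hard_assign` is a hypothetical name, and indices are 0-based, so add 1 to match the poster's Cluster 1 to 3.

```python
from collections import Counter
from typing import List, Sequence


def hard_assign(memberships: Sequence[Sequence[float]]) -> List[int]:
    """Index of each row's largest partial membership; ties go to the lowest index."""
    return [max(range(len(row)), key=lambda k: row[k]) for row in memberships]


soft = [[0.2, 0.5, 0.3],
        [0.6, 0.3, 0.1],
        [0.4, 0.4, 0.2]]
crisp = hard_assign(soft)
print(crisp)           # [1, 0, 0]
print(Counter(crisp))  # Counter({0: 2, 1: 1})
```

The tie rule is an arbitrary choice made here; the poster does not say how exact ties were handled.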

Future work will:
• Employ methods that work directly with the soft clustering results, by using enrichment tests developed for soft clustering [1] and implementing weighted membership for the pie charts
• Apply other clustering methods to the dataset and compare them with the k-means and soft k-means results
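A sketch of what "weighted membership for pie charts" could look like: each individual contributes its partial membership in cluster k as a weight toward that cluster's pie, rather than counting fully toward one cluster. This is our illustration of the planned future work under that assumption, not the authors' implementation, and it does not implement the enrichment test of [1]; the function name is hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def weighted_label_proportions(labels: Sequence[str],
                               memberships: Sequence[Sequence[float]]) -> List[Dict[str, float]]:
    """Per-cluster label shares, each individual weighted by its membership in that cluster."""
    n_clusters = len(memberships[0])
    mass = [defaultdict(float) for _ in range(n_clusters)]
    for lab, row in zip(labels, memberships):
        for k, w in enumerate(row):
            mass[k][lab] += w
    shares = []
    for cluster in mass:
        total = sum(cluster.values())
        shares.append({lab: w / total for lab, w in cluster.items()} if total > 0 else {})
    return shares


labels = ["ASD", "Control"]
soft = [[0.75, 0.25], [0.25, 0.75]]
print(weighted_label_proportions(labels, soft))
# [{'ASD': 0.75, 'Control': 0.25}, {'ASD': 0.25, 'Control': 0.75}]
```

With 0/1 memberships this reduces to ordinary per-cluster label proportions, so weighted and crisp pie charts stay directly comparable.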
