Machine Learning for Integrating Social Determinants in ......2020/09/11 · 6 1New York...
Transcript of Machine Learning for Integrating Social Determinants in ......2020/09/11 · 6 1New York...
![Page 1: Machine Learning for Integrating Social Determinants in ......2020/09/11 · 6 1New York University, School of Global Public Health, Department of Epidemiology 7 2New York University,](https://reader034.fdocuments.net/reader034/viewer/2022052004/601762c4817b0f2c4600df7f/html5/thumbnails/1.jpg)
1
MachineLearningforIntegratingSocialDeterminantsin1
CardiovascularDiseasePredictionModels:ASystematicReview2YuanZhao1,EricaP.Wood2,NicholasMirin,2RajeshVedanthan,3,4StephanieH.Cook,2,5Rumi3Chunara5,64
51NewYorkUniversity,SchoolofGlobalPublicHealth,DepartmentofEpidemiology62NewYorkUniversity,SchoolofGlobalPublicHealth,DepartmentofSocialandBehavioralSciences73NewYorkUniversityGrossmanSchoolofMedicine,DepartmentofPopulationHealth84NewYorkUniversityGrossmanSchoolofMedicine,DepartmentofMedicine95NewYorkUniversity,SchoolofGlobalPublicHealth,DepartmentofBiostatistics106NewYorkUniversityTandonSchoolofEngineering,DepartmentofComputerScience&Engineering1112
Summary13
Background 14Cardiovasculardisease(CVD)isthenumberonecauseofdeathworldwide,andCVDburdenisincreasinginlow-15resourcesettingsandforlowersocioeconomicgroupsworldwide.Machinelearning(ML)algorithmsarerapidly16beingdevelopedandincorporatedintoclinicalpracticeforCVDpredictionandtreatmentdecisions.Significant17opportunitiesforreducingdeathanddisabilityfromcardiovasculardiseaseworldwideliewithaddressingthe18socialdeterminantsofcardiovascularoutcomes.Wesoughttoreviewhowsocialdeterminantsofhealth(SDoH)19andvariablesalongtheircausalpathwayarebeingincludedinMLalgorithmsinordertodevelopbestpracticesfor20developmentoffuturemachinelearningalgorithmsthatincludesocialdeterminants.21 22Methods 23Weconductedasystematicreviewusingfivedatabases(PubMed,Embase,WebofScience,IEEEXploreandACM24DigitalLibrary).WeidentifiedEnglishlanguagearticlespublishedfrominceptiontoApril10,2020,whichreported25ontheuseofmachinelearningforcardiovasculardiseaseprediction,thatincorporatedSDoHandrelatedvariables.26Weincludedstudiesthatuseddatafromanysourceorstudytype.Studieswereexcludediftheydidnotincludethe27useofanymachinelearningalgorithm,weredevelopedfornon-humans,theoutcomeswerebio-markers,28mediators,surgeryormedicationofCVD,rehabilitationormentalhealthoutcomesafterCVDorcost-effective29analysisofCVD,themanuscriptwasnon-English,orwasareviewormeta-analysis.Wealsoexcludedarticles30presentedatconferencesasabstractsandthefulltextswerenotobtainable.Thestudywasregisteredwith31PROSPERO(CRD42020175466).32 33Findings 34Of2870articlesidentified,96wereeligibleforinclusion.MoststudiesthatcomparedMLandregressionshowed35increasedperformanceofML,andmoststudiesthatcomparedperformancewithorwithoutSDoH/related36variablesshowedincreasedperformancewiththem.ThemostfrequentlyincludedSDoHvariableswere37race/ethnicity,income,educationandmaritalstatus.StudieswerelargelyfromNorthAmerica,EuropeandChina,38limitingthediversityofincludedpopulationsandvarianceinsocialdeterminants.39 40Interpretation 41Findingsshowthatmachinelearningmodels,aswellasSDoHandrelatedvariables,improveCVDpredictionmodel42performance.Thelimitedvarietyofsourcesanddatainstudiesemphasizethatthereisopportunitytoincludemore43SDoHvariables,especiallyenvironmentalones,thatareknownCVDriskfactorsinmachinelearningCVDprediction44models.Giventheirflexibility,MLmayprovideopportunitytoincorporateandmodelthecomplexnatureofsocial45determinants.Suchdatashouldberecordedinelectronicdatabasestoenabletheiruse.46 47Funding 48WeacknowledgefundingfromBlueCrossBlueShieldofLouisiana.Thefunderhadnoroleinthedecisionto49publish.5051
Introduction52
Anestimated17.9millionpeopledieeachyearfromcardiovasculardiseases(CVD),whichrepresent31%ofall53
deathsworldwideandthenumberonecauseofdeath.1Low-incomeandmiddle-incomecountriescarry75%of54
theburdenofCVDdeathsworldwideandinhigh-incomecountries,lowersocioeconomicgroupshaveahigher55
. CC-BY-NC 4.0 International licenseIt is made available under a perpetuity.
is the author/funder, who has granted medRxiv a license to display the preprint in(which was not certified by peer review)preprint The copyright holder for thisthis version posted September 13, 2020. ; https://doi.org/10.1101/2020.09.11.20192989doi: medRxiv preprint
NOTE: This preprint reports new research that has not been certified by peer review and should not be used to guide clinical practice.
![Page 2: Machine Learning for Integrating Social Determinants in ......2020/09/11 · 6 1New York University, School of Global Public Health, Department of Epidemiology 7 2New York University,](https://reader034.fdocuments.net/reader034/viewer/2022052004/601762c4817b0f2c4600df7f/html5/thumbnails/2.jpg)
2
incidenceofCVDandhighermortalityduetoCVD.1,2Inhigh-incomecountriessuchastheUnitedStates,the56
prevalenceofCVDisexpectedtorise10%between2010and2030,3notonlyintheagingpopulationbutalso57
notablyviastarkdisparitiesamongsocioeconomicandracialgroups.4,5DirectcausesfortheseshiftsinCVD58
burdenhavebeenwell-studied,attributedtochangesindiet(increasedconsumptionofprocessedfoods)6and59
physicalactivity(moresedentarylifestyles),7resultinginadramaticriseinconditionssuchasobesity,60
hypertension,anddiabetesmellitus.Thesechangesareshapedbythe“conditionsinwhichpeopleareborn,61
grow,live,workandage”,referredtobytheWorldHealthOrganizationassocialdeterminantsofhealth(SDoH).862
Multinational,prospectivecohortstudiesaswellasecologicanalyseshaveshownthatSDoHcontributeto63
over35%ofthepopulationattributableriskofvariouscardiovasculardiseases,9,10amongwhicheducation,64
incomeandoccupationareparticularlyinfluential.11Researchhasalsoilluminatedmechanismsofaction;social65
factorsusuallyinteractwitheachotherthroughthemediationoforeffectmodificationbypsychologicaland66
biologicalpathways,exertingalong-termeffectoncardiovascularoutcomes.5,12Socialdeterminantsalsoresult67
inunequalsharingofthebenefitofadvancesinCVDpreventionandtreatment.13Giventhecriticalimportanceof68
socialdeterminantswithrespecttodiseaserisk,itisclearthatbettercapturingtheinteractionandrelative69
influenceofsuchfactorsinrelationtotraditionalCVDriskfactorsofhypertension,diabetesandhyperlipidemia70
providesthemostsignificantopportunitytoreduceCVDburden.5,11,12,1471
Meanwhile,artificialintelligence(AI)andmachinelearning(anapplicationofAIfordetectingpatternsfrom72
data)15toolshavestartedtobeadoptedinclinicalresearch,promptedbyrecentprogressinadvancedcomputing73
strategiesaswellastheproliferationofelectronicmedicalrecorddatabases.16Machinelearningmethodshave74
demonstratedimprovementacrossmultiplemetricsforpredictionofCVDrisk,incidenceandoutcomes17-19over75
traditionalriskscoressuchasthosefromtheAmericanCollegeofCardiologyorAmericanHeartAssociation.20As76
adata-drivenapproach,machinelearningprovidesmoreflexibilityinmodelingcomplexrelationshipsbetween77
predictors,whichcanbeparticularlyadvantageousinaddressingthemulti-levelinteractionsbetweendifferent78
socialdeterminantsandCVDoutcomes,aswellasuncoveringnovelriskfactors.Thoughtheincreasedflexibility79
ofmachinelearningmodelsisappealing,giventherapidriseofmachinelearningapproachesincludingstudies80
whichincorporatesocialdeterminants,weneedtobetterunderstandbestpracticesforsuchmodelling81
approachesforCVDriskpredictionparticularlyinthecontextofthoseincludingSDoH.82
Thus,weperformedasystematicreviewtounderstandthecurrentlandscapeofhowsocialdeterminantsare83
beingusedinmachinelearningmodelsforCVDprediction.Specifically,wesoughttoexaminewhichtypesof84
machinelearningalgorithmsandtypesofsocialdeterminantvariablesarebeingused,andforwhich85
populations.Indeed,understandingthemannerinwhichSDoHareincorporatedintosuchmodelsiscriticalin86
ordertoteaseapartthedistinctthebiologicalandsocialinfluences,alongwiththeirinteractions,thatmake87
populationsdifferentandinneedofadifferentstandardofcare.Findingsfromthisreviewservetoinformthe88
designoffuturemachinelearningapproachesandidentifyareasformethodologicalinnovationinorderto89
improveearlypredictionofCVDandreduceitssignificantdiseaseburden.21,2290
91
Method92
Searchstrategyandselectioncriteria93
First,YZwiththehelpofanexpertlibrarian,didacomprehensivesearchoffivedatabases:PubMed,Embase,94
WebofScience,IEEEXploreandACMDigitalLibraryonApril10th,2020,toidentifyallrelevantarticleson95
machinelearningintegratingsocialdeterminantsincardiovasculardiseasepredictionmodelspublishedin96
English.IEEEXploreandACMDigitalLibrarywereincludedspecificallytocomprehensivelycapturecomputer97
. CC-BY-NC 4.0 International licenseIt is made available under a perpetuity.
is the author/funder, who has granted medRxiv a license to display the preprint in(which was not certified by peer review)preprint The copyright holder for thisthis version posted September 13, 2020. ; https://doi.org/10.1101/2020.09.11.20192989doi: medRxiv preprint
![Page 3: Machine Learning for Integrating Social Determinants in ......2020/09/11 · 6 1New York University, School of Global Public Health, Department of Epidemiology 7 2New York University,](https://reader034.fdocuments.net/reader034/viewer/2022052004/601762c4817b0f2c4600df7f/html5/thumbnails/3.jpg)
3
sciencearticlesrelatedtoourreview.Papersfrominceptionuntilthesearchdatewereincluded.Toensurethe98
qualityofincludedpapers,weonlyincludedpeer-reviewedarticlespublishedonjournalsoracceptedin99
conferencesandexcludednon-peerreviewedgreyliteratureorarXiv/medRxivpapers.100
Weidentifiedthetermsofsocialdeterminantsofhealth(SDoH)usingthebroaderdefinitionfromtheWorld101
HealthOrganizationandCenterforDiseaseControlandPreventionHealthyPeople2020initiativewhichdelineates102
SDoHinfivekeyareas:economicstability,education,socialandcommunitycontext(e.g.“race/ethnicity”,“income”103
and“education”),healthandhealthcare,neighborhoodandbuiltenvironment(e.g.“livingenvironment”,104
“pollution”and“residencecharacteristics”).23Figure1isasocio-ecologicalconceptualmodeladaptedfromHealthy105
People2020,theUnitedStatesfederalgovernment’snationalhealthagenda,24whichillustratesthe multifactorial106
natureofsocial-ecologicalinfluencesonhealth.Theframeworkemphasizestheexistenceofproximate,or107
“downstream,”healthinfluences(e.g.,smoking)thatareshapedbydistal,or“upstream,”factors(e.g.,socialnorms108
regardingsmoking,tobaccoregulations).Therefore,forarobustreview,wealsoincludedprominentfactors109
includinghealth-relatedbehaviorsalongthecausalpathway(e.g.“diet”,“smoking”and“physicalactivity”),as110
althoughtheseareenactedattheindividuallevel,theyareshapedatsocialandeconomiclevels.25Thisenablesusto111
understandcomprehensivelyhowsocialdeterminantsandfactorstheydirectlyshapeareassessedinrelationto112
CVD.Age,genderandracearealsointhecausalpathway.26BMIwasalsoincludedasitisinfluencedbysocial113
factorsandcausesdiabeteswhichdirectlyaffectsCVD.27114
Forsearchtermsrelatedtomachinelearning,weincludedallcommonlyusedsupervisedmachinelearning115
methods.Supervisedmachinelearningalgorithmsarethosethatperformreasoning(i.e.prediction)from116
observationsofthefeatures(e.g.clinicaldata,socialdeterminants)basedonexternallysuppliedexampleswhich117
includethefeatureslinkedtooutcome“labels”(e.g.CVDoutcomes).Thussupervisedmachinelearningwasa118
focusasthetypesoftasksconsideredintheliteratureusuallyutilizedlabeledoutcomesofCVD.28Commonly119
usedunsupervisedmachinelearningalgorithmscapturedbythesearchwerealsoincludedintheabstractand120
fulltextscreeningtoensurealltypesofpossiblestudieswereconsidered.Wealsoaddedsearchtermstocapture121
deeplearningandensemblemethodsastheyarewidelyusedincurrentclinicalresearch.29122
ThesearchtermsforCVDoutcomesincludedcardiovascularischemicoutcomes,coronaryheartdiseaseand123
cerebrovasculardiseasewhicharecausedbyatheroscleroticcardiovasculardisease(ASCVD).These124
cardiovasculardiseasescausethehighestmortality,andestimatedyearsofliveslostattributedtothese125
conditionshaveincreasedinrecentyears.1,2,30Foreachofthekeyareasofsocialdeterminantsandincluded126
variables,machinelearningandCVD,weidentifiedkeywordsbyreferencingpreviousreviewpapersonsocial127
determinantsandcardiovasculardiseases,11,31relatedstudiesofdifferentsocialdeterminants31-35orconsulting128
expertstoincluderelevantconcepts.Fullsearchstrategiesareprovidedintheappendix.129
Oncepaperswereidentifiedviathesearchterms,allstudydesignsandallpopulationswereincludedifthe130
articleutilizedanySDoHorhealthbehaviorsasfeaturesinthemachinelearningmodels(inadditiontoageand131
gender,aswefoundthatthesewerecommonlyincludedasstandardpracticeandnotspecificallytorepresenttheir132
contributionassocialdeterminants)weredeemedeligible.Eligibilitywasalsoconsiderediftheoutcomeswere133
CVD-related,includingincidence,survival,mortality,hospitaladmissionandreadmissionetc.Wedidnotrestrict134
timeofpublicationtoenablecapturingthetrendofthesetypesofpapersovertime.Studieswereexcludedifthey135
didnotincludetheuseofanymachinelearningalgorithm,weredevelopedfornon-humans,theoutcomeswere136
bio-markers,mediators,surgeryormedicationforCVD,rehabilitationormentalhealthoutcomesafterCVD137
diagnosisorcost-effectivenessanalysisofCVDtreatment,themanuscriptwasnon-English,orwasareviewor138
meta-analysis.Wealsoexcludedarticlespresentedatconferencesasabstractsandthefulltextswerenot139
obtainable.ThisreviewwasregisteredwithPROSPERO(CRD42020175466)andconductedinaccordancewiththe140
. CC-BY-NC 4.0 International licenseIt is made available under a perpetuity.
is the author/funder, who has granted medRxiv a license to display the preprint in(which was not certified by peer review)preprint The copyright holder for thisthis version posted September 13, 2020. ; https://doi.org/10.1101/2020.09.11.20192989doi: medRxiv preprint
![Page 4: Machine Learning for Integrating Social Determinants in ......2020/09/11 · 6 1New York University, School of Global Public Health, Department of Epidemiology 7 2New York University,](https://reader034.fdocuments.net/reader034/viewer/2022052004/601762c4817b0f2c4600df7f/html5/thumbnails/4.jpg)
4
PreferredReportingItemsforSystematicReviewsandMeta-Analyses(PRISMA)method.Tosupplementthe141
bibliographicdatabasesearches,wealsousedGoogleScholartoscrutinizeallkeywordsregardingtheirrelevance142
inarticlesaswellasexaminepotentialarticlestoidentifyiftheywereeligible.Duplicateswereremovedinthe143
process.144
Threeinvestigators(YZ,EPW,andNM)screenedthetitleandabstract:eacharticleretrievedwas145independentlyassessedbytworeviewerstodetermineitseligibilityforfull-textreview.Conflictswereresolved146bydiscussionandvalidationfromathirdreviewer.Afterinitialappraisal,weretrievedfulltextsofeligible147articles.148
Dataanalysis149
Datawereextractedfromindividualarticlesindependentlybytworeviewers(ofYZ,EPW,andNM)andchecked150
bythethirdrevieweraccordingtocriteriainastandardizedextractionform.Alldataextractionwascross-151
checked,anddisagreementswereresolvedbydiscussionorreferraltothethirdreviewer.Informationextracted152
includedyearofpublication,country,population,socialdeterminantsincludedinthemachinelearning153
algorithm,machinelearningalgorithms,cardiovasculardiseaseoutcomes,datasourceandperformanceofthe154
algorithms.Foreacharticle,wedefinedseveralcriteriatoassessthequalityofthestudybasedonbestpractices155
inmachinelearning36including(1)whethermachinelearningmodelperformancewasevaluated;(2)whethera156
hyperparameter(aparameterwhosevalueisusedtocontrolthelearningprocess)tuningprocesswasdescribed;157
(3)whetherdata-drivenvariableselectionwasperformed;(4)whethermethodswereusedtospecifically158
interpretthecontributionofincludedvariablesintheprediction.Eachitemwasscoredasno(notpresent),159
unclear,oryes(present),andthensummarizedalongsideallitemstogetastudyqualityscore.160
161
RoleoftheFundingSource162
Thefunderofthestudyhadnoroleinstudydesign,datacollection,dataanalysis,datainterpretation,orwriting163
ofthereport.Thecorrespondingauthorhadfullaccesstoallthedatainthestudyandhadfinalresponsibilityfor164
thedecisiontosubmitforpublication.165
166
Results167
Ourdatabasesearchidentified2728distinctarticles;afterafull-textreviewof298papers,96wereincludedin168
thesystematicreview(Figure2).Amongtheincludedstudies,oneofthestudiesuseddatafromaclinicaltrial,169
whiletheothersutilizedobservationaldata.Oftheobservationalstudies,datafromcohortstudieswasthemost170
frequent(34studies),followedbydatafromelectronicmedicalrecords(32studies),surveys(14studies)and171
datafromopen-accessrepositoriesofregistryornationalsurveydata(7studies)(e.g.ScientificRegistryof172
TransplantRecipientsRegistry37).Mostoftheobservationaldatawerestructureddata(clearlydefineddata173
features),while9studiesincludedunstructureddata(e.g.electrocardiogram,imageandheartsound).The174
earliestyearofpublicationwas1992(artificialneuralnetworkalgorithm)38,andpublicationsfulfillingour175
inclusioncriteriahavebeenincreasingovertime(Figure3).Figure4summarizesvariables(4A),outcomes(4A),176
authorlocations(4B)andtypesofvenueswerestudieswerepublished(4C).Moredetailsonthedatasources177
andpopulationsincluded,alongwithallstudydetailsareinTableS1.178
. CC-BY-NC 4.0 International licenseIt is made available under a perpetuity.
is the author/funder, who has granted medRxiv a license to display the preprint in(which was not certified by peer review)preprint The copyright holder for thisthis version posted September 13, 2020. ; https://doi.org/10.1101/2020.09.11.20192989doi: medRxiv preprint
![Page 5: Machine Learning for Integrating Social Determinants in ......2020/09/11 · 6 1New York University, School of Global Public Health, Department of Epidemiology 7 2New York University,](https://reader034.fdocuments.net/reader034/viewer/2022052004/601762c4817b0f2c4600df7f/html5/thumbnails/5.jpg)
5
179
Socialdeterminantandvariablesinthecausalpathway180
Includedstudiesreporteddiversevariablesacrosssocialdeterminantsandvariablesconsidered,including181
race/ethnicity,education,maritalstatus,occupation/employment,individualorhouseholdincome,medical182
insurance,areaofresidence(e.g.urbanversusruraloreasternvs.westernUSA)andothercommunity-levelfactors183
ofdeprivation,incomeandeducationandenvironmentalpollutantsaswellassmoking,alcoholconsumption,184
physicalactivities,substanceabuseanddiet.Inmoststudies,genderandagewereincludedasstandardvariables185
collectedinthesurveyorEHR.Afewstudiesassessedphysicalactivitiesanddietasmodifiableriskfactorsforearly186
preventionofCVD.14,39Halfofthestudiesreportedfeatureimportanceofvariables,inwhichage,gender,smoking187
(e.g.currentsmoking/pastsmoking/non-smoking)andBMIweremostfrequentlyreportedtocontribute188
significantlytotheCVDoutcomeprediction.Otherfrequentlyreporteddeterminantsincludingrace/ethnicity,189
alcoholconsumption(e.g.dailyintakeoralcoholism),andphysicalactivity/exercise(e.g.weeklyexercisetime).190
Besidesage,gender,BMIandsmokingwhichwerefrequentlyreportedinallCVDoutcomes,alcoholconsumption191
andphysicalactivitieswerefrequentlyassociatedwithstrokewhileBMIwasfrequentlyassociatedwithcoronary192
arterydisease.Thetoptenvariablesconsideredinextractedpapers,andtheirfrequency,areillustratedinFigure193
4A,whichincludemaritalstatus,education,incomeandrace/ethnicityasthemostcommonsocialdeterminants.194
Fourofthestudiescomparedmodelperformancewithsocialdeterminantsandwithoutsocialdeterminants;three195
showedsocialdeterminantssignificantlyimprovedprediction,othersshowedimprovedpredictionbyadditionof196
age,genderandrace.40,41Thestudythatshoweddecreasedperformanceaimedtoforecastthepatternofthe197
demandforhemorrhagicstrokehealthcareservicesbasedonairquality;itispossiblethattherelationshipbetween198
specificvariabletestedandoutcomehavelittledirectrelationship.42199
200
Algorithmsandmodeldevelopment201
Themostcommonmachinelearningmethodswereneuralnetwork(NN,36studies),randomforest(RF,28202
studies),anddecisiontrees(DT.21studies).Threestudiesusedunsupervisedmachinelearningalgorithms,such203
asclusteringtogroupCVDrisklevelsorprincipalcomponentanalysis(PCA)toextractfeaturespriorto204
supervisedmachinelearningclassification.14,43,44ThemostfrequentlyusedalgorithmsaredescribedinTable1.205
Ofthe35studiesusingneuralnetworks,12usedonehiddenlayer,23usedmultiplehiddenlayers,including206
mostcommonlythree-layerperceptron,convolutionalneuralnetworkandrecurrentneuralnetwork.Herewe207
refertothesestudiescollectivelyas“neuralnetworks”(NN)asdeeplearningtypicallyreferstoanneural208
networkwithmultiplelayers.45Ofthe42studiesincludingmultiplemachinelearningalgorithms,randomforest209
(9studies)andneuralnetwork(9studies)weremostfrequentlyreportedasthebestperformingmachine210
learningalgorithms.FormostcommonlystudiedCVDoutcomes,randomforestwasfrequentlyreportedtohave211
thebestpredictionforstrokewhilesupportvectormachine(SVM)performedbestforcoronaryarterydisease.212
Therewere24studiesthatcomparedmachinelearningalgorithmswithstandardlinearregression,logistic213
regressionorsurvivalanalysis;amongthose21showedimprovedperformancewithmachinelearning.One214
studyofriskpredictionforin-hospitalmortalityinwomenwithST-elevationmyocardialinfarctionusingdata215
fromtheNationalInpatientSampleintheUnitedStates,foundcomparableperformanceusingrandomforestand216
logisticregression.46Inanotherstudy,neuralnetworkmodelsforpredictionofacutecoronarysyndromesusing217
clinicaldataandNNshowedsimilarperformancetologisticregressioninpredictingacutecoronarysyndrome;218
however,only13variableswereconsidered.47Athirdstudyonpredictingadversecardiovasculareventsby219
modelsintegratingstress-relatedventricularfunctionalandangiographicdatashowedthatwhilealogistic220
. CC-BY-NC 4.0 International licenseIt is made available under a perpetuity.
is the author/funder, who has granted medRxiv a license to display the preprint in(which was not certified by peer review)preprint The copyright holder for thisthis version posted September 13, 2020. ; https://doi.org/10.1101/2020.09.11.20192989doi: medRxiv preprint
![Page 6: Machine Learning for Integrating Social Determinants in ......2020/09/11 · 6 1New York University, School of Global Public Health, Department of Epidemiology 7 2New York University,](https://reader034.fdocuments.net/reader034/viewer/2022052004/601762c4817b0f2c4600df7f/html5/thumbnails/6.jpg)
6
modeldemonstratedbetterperformanceinthistaskandimplementation,aBayesiannetworkmodelshowed221
goodperformanceandalsowashighlightedasbeingbetteratdefiningcausalrelationships,andthususefulfor222
designingfuturemodelsinwhichnewvariablescanbeincorporatedinthepredictiontask.48223
224
Model,validationandperformanceandstudyquality225
Moststudiesevaluatedtheperformanceofmachinelearningalgorithm(s)developed.Areaunderthereceiver226
operatingcharacteristiccurve(AUC)wasthemostcommonevaluationmetricused(45)studies,followedby227
sensitivity(43studies),specificity(32studies)andaccuracy(32studies).Atleastthreeofthefourmetricswere228
usedin31studies.Otherevaluationmetricsusedincludedaccuracy,positivepredictivevalue,negative229
predictivevalueandF1-score,whichistheharmonicmeanoftheprecisionandrecall,commonlyusedto230
evaluatemachinelearningmethodsviatheirbalanceofthesemetrics.Externalevaluationwasperformedin11231
studies,whereintheauthorstestedthemachinelearningmodelsdevelopedinonehospitalonanotherhospital232
orpopulation.Forexample,onestudyspecificallytestedthegeneralizabilityofarecurrentneuralnetworkmodel233
forpredictingheartfailureriskinalargedatasetfrom10hospitals;evaluatingtheperformanceofamodel234
trainedoneachhospital’straining(andvalidation)setsoverthe10hospitals’testsets.Theyalsoevaluatedthe235
modelthattrainedonallhospitals’trainingsetsoverthe10hospitals’testsets.41andanotheruseddatafromone236
hospitaltotrainneuralnetworkmodelsfordiagnosisofacutecoronarysyndromeandtestedthemodelondata237
fromtwootherhospitals.47238
Amongthosereported,mostAUCwerehigherthan0.70(Figure5).Asmoststudieswerepublishedin239
biomedicalandclinicaljournals,moststudiesexplicitlyinterpretedthefindingsandtheirrelevancytoclinical240
applications.Almosthalf(40/96)ofthestudiescomparedmorethanonemachinelearningalgorithm,ofwhich241
RandomForestwasmostcommonlythebestperformingmodel.Themeanscoreofincludedstudiesinthe4-item242
qualityassessmentscale(basedonevaluationofML,data-drivenselectionoffeatures,hyperparametertuning243
description,interpretationofthemodel)was3.34.Halfofthestudies(49)hadfullscoresand30studiesmissed244
oneofthefouritems.Commonlymisseditemsweredata-drivenfeatureselectionanddetailsofhyperparameter245
tuning(cross-validationorgridsearchstrategieswereutilizedin68studiestotunehyperparameters;other246
studiesdidn’tgivedetailsabouthyper-parametertuningprocess).Halfofallthestudiesutilizedadata-driven247
selectionmethodtoidentifyfeaturesbeforefittingmachinelearningmodels,whichisdefinedasextractinga248
subsetofusefulvariablesamongtheoriginalvariablesandtransformingdatafromahigh-toalow-dimensional249
space.49Asdeeplearningmodelstoextractfeatureswhiletraining,thosestudiesdidnotalwaysincludeafeature250
selectionprocess.251
252
Discussion253
Toourknowledge,thisisthefirstsystematicreviewtoillustratehowmachinelearningisbeingusedto254
integrateSDoHincardiovasculardiseasepredictionmodels.Thisreviewdistillswhichtypesofalgorithmsand255
SDoHandrelatedvariableshavebeenconsideredandresultingperformance.Wefoundthattheflexibilityof256
machinelearningmodelshasprovedusefulinCVDpredictionmodels,withthemcommonlyperformingbetter257
thanregressionapproaches.WefindthatmodelsthatconsiderSDoHandrelatedvariablesalsobenefitfrom258
flexiblemodelingapproaches,withneuralnetworksconsistentlyoutperformingregressionacrossallCVD259
outcomes.260
. CC-BY-NC 4.0 International licenseIt is made available under a perpetuity.
is the author/funder, who has granted medRxiv a license to display the preprint in(which was not certified by peer review)preprint The copyright holder for thisthis version posted September 13, 2020. ; https://doi.org/10.1101/2020.09.11.20192989doi: medRxiv preprint
![Page 7: Machine Learning for Integrating Social Determinants in ......2020/09/11 · 6 1New York University, School of Global Public Health, Department of Epidemiology 7 2New York University,](https://reader034.fdocuments.net/reader034/viewer/2022052004/601762c4817b0f2c4600df7f/html5/thumbnails/7.jpg)
7
Broadly,wefoundseverallimitationsinthecontentcoveredbyincludedpapers.First,thestudieswerehighly261
skewedtooriginatefromUSA,EuropeandChina,withlower-incomelocationsnotbeingwellrepresented.262
Moreover,wefoundthattheraceandethnicitydistributioninsomestudieswasalsonotveryrepresentativeof263
underlyingpopulations.ThisisparticularlystrikinggiventhehighandincreasingCVDburdenandchanging264
socio-environmentalcircumstancesinlower-incomecountriesandregionsanddisparitiesinCVDburden.The265
varianceofsocialdeterminantsincorporatedintomodels,andthustheperformanceandapplicabilityofthose266
modelsacrosscontexts,willbedecreasedwithlessdiversityinthestudysample.50SDoHvariablesthemselves267
werealsonotveryfrequentlyincluded,withonlymaritalstatus,education,incomeandrace/ethnicityinthetop268
ten.EnvironmentalattributesthathavebeenshownasimportantmodifiablecomponentsofCVDrisksuchas269
greenspacesandstress32,51,52wereveryfew,andeventhenwereverybroad(e.g.regionofcountry).42Ifsuch270
area-basedvariablesareincluded,machinelearningmayalsoproveusefultounweavethestrandsof271
environmentalinfluencesbutalsointegratetheeffectsofthevariouscomponentsoftheenvironmentintoa272
comprehensivemodel.32Wealsofindthatmodelsdidnottakeintoaccountsocialprocessesassociatedwith273
socioeconomicconditionsacrossthelifecourse.Socioeconomicposition,psychosocialfactorsandbehaviors274
duringadolescenceandyouthareimportantlikelytobeimportantinthedevelopmentofCVDandprecursors275
(dyslipidemia,hypertension,andsmoking).53Finally,studiesgenerallyincludedgenderinterchangeablywithsex,276
whichprecludesconsiderationofthesocially-determinedaspectofgender.54277
Despitetheselimitations,ourresultslargelyfoundsocialdeterminantsandvariablesconsideredtoimprove278
modelperformance.Intermsofalgorithms,severaltypesofmachinelearningalgorithmswereevaluated,with279
resultsshowingthatwhencomparedwithinstudies,themostflexiblemodelssuchasneuralnetworksand280
randomforestmodelswerebestperforming.Neuralnetworksalsomostcommonlyoutperformedregression281
models.Thisisunderstoodtobethecasebecauseneuralnetworksincludehiddenlayerswhichcantakeinto282
accountmorecomplexrelationsinthedata,andthereforethismaybeanotherpossibleexplanationforthe283
improvedperformance.55,56Moreover,recentstudiesuncoveringnetworkandspillovereffects(social284
environment)andshareddecision-making57involvedinphysicalactivity,58,59diet60andsmoking61indicatethat285
thepathwaysthatinformthesebehaviorsareintricate.However,thismayilluminateanopportunityformachine286
learning,whichbasedonflexibility,canhelpcapturesuchcomplexinteractions.287
TheconstraintsonincludeddataarelikelyduetodifficultiesincapturingcertainSDoHvariablesandlinking288
themwithindividualrecordsindatabasesusedinmanyoftheincludedstudies.Studieshavelargelyusedsocial289
variablesfromavailabledatasources;commonlythoseintheelectronichealthrecord.Theuseofflexible,290
machinelearningmodelsalsobringconcernsregardinginterpretabilityandpotentialover-fittingtodata,62291
thoughthiswasnotacommondiscussiontopicacrossallpapers.Thisislikelybecausemostmodelsselected292
variablesbasedonpriorclinicalsignificance,thuspredictionperformancewouldbebasedonsuchfactorswhich293
areknowntoberelevanttoCVDevenifthespecificimportanceofeachvariablewasnotmeasured.Furthermore,294
mostpapers(66)papersusedmethodssuchasautomaticrelevancedetermination55orfeatureselection63to295
examineand/orranktheimportanceofvariablesinmachinelearningmodels.Thiswasthecaseevenasarticles296
werepublishedinavarietyofvenues(Figure4D).297
Whilethisisthefirstreviewthatgivesfindingsrelatedtotheuseofmachinelearningandsocialdeterminants298
forCVDprediction,thereareindividualstudiesthatsupportcomponentsofthefindingsofthisstudy.First,299
machinelearningingeneralhasshownpromisewithrespecttocardiovasculardiseaseprediction.64-66Compared300
totheestablishedAmericanCollegeofCardiology/AmericanHeartAssociationriskcalculatortopredict301
incidenceandprognosisofASCVD,20previousworkhasshownthatmachine-learningalgorithms(especially302
randomforest,gradientboostingmachinesandneuralnetworks)werebetteratidentifyingindividualswhowill303
. CC-BY-NC 4.0 International licenseIt is made available under a perpetuity.
is the author/funder, who has granted medRxiv a license to display the preprint in(which was not certified by peer review)preprint The copyright holder for thisthis version posted September 13, 2020. ; https://doi.org/10.1101/2020.09.11.20192989doi: medRxiv preprint
![Page 8: Machine Learning for Integrating Social Determinants in ......2020/09/11 · 6 1New York University, School of Global Public Health, Department of Epidemiology 7 2New York University,](https://reader034.fdocuments.net/reader034/viewer/2022052004/601762c4817b0f2c4600df7f/html5/thumbnails/8.jpg)
8
developCVDandthosewhowillnot.17-19ThesestudieshaveattributedthistothefactthatstandardCVDrisk304
assessmentmodelsmakeanimplicitassumptionthateachriskfactorisrelatedinalinearfashiontoCVD305
outcomesandsuchmodelsmaythusoversimplifycomplexrelationshipswhichincludelargenumbersofrisk306
factorswithnon-linearinteractions.Theroleofsocialdeterminantsincardiovasculardisease(notspecifically307
machinelearning-related)hasbeenstudiedthroughseveralpapersandsystematicreviews.Whilefull308
summariesofthisworkhavebeenperformedelsewhere,11wenotethattherehavebeenseveralstudiesof309
variousproximalanddistalsocialdeterminantsandcardiovasculardisease.Ingeneral,studiesindicatethatthe310
changingburdenofdiseaseduetosocietalandenvironmentalconditions,aswellasincreasingadvancesin311
treatmentandpreventionhavenotbeensharedequallyacrosseconomic,racialandethnicgroups,compelling312
theneedforbroadrangeconsiderationofsocialdeterminantsinCVDprediction.11,31Finally,themodelsthathave313
incorporatedsocialdeterminantsandmachinelearningforCVDpredictionalsoreflectlimitationsofmany314
machinelearningalgorithmsthathavebeenhighlightedrecently,whicharebasedonhomogenouspopulations,315
particularwithrespecttorace(capturedthroughthelimitedgeographicdiversityinFigure4C).17316
Ourreviewwaslimitedinseveralaspects.First,theincludedstudiesevaluateddifferenttypesof317
cardiovascularoutcomes,andheterogeneityofoutcomemetricsmakesitdifficulttocomparemachinelearning318
performanceacrossstudies.Thepopulationconsideredalsoincludessamplesfromdifferentdatasources,319
hospitalsandcountrieswhichtakentogethermakethecomparisonacrossstudiesnotstandardized.Third,most320
studiesdidnotevaluateexternalvalidity,leavingtheapplicabilityofthealgorithmstootherpopulationsor321
healthcaresettingsinconclusive.Fourth,thereviewwasalsolimitedtostudiespublishedinEnglish,whichmight322
havecreatedsomebiasinthearticlesthatwereultimatelyretainedfortheanalysis.323
Findingsemphasizetheneedtocomprehensivelycapturebothproximalanddistalsocialdeterminant324
variablesinmodels.Wheremechanismsarenotwellunderstood,machinelearningcanalsobeusedto325
understandrelationshipsbetweensocialandbiologicalvariablescomprehensively.Forexample,raceisoften326
conceptualizedasaproxyforvariablesforsocioeconomicpositionorculturalfactorsandbetterwaystocapture327
aswellasunderstandrelationshipsbetweenthesefactorsandtheirimpactonCVDriskshouldbeinvestigated.328
Indeed,identificationofpotentialmediatingandmoderatingfactorsinthesepathwaysofsocialdeterminants329
willinformpublichealthinterventions.Improvedconstructswillalsohelpinincorporationofenvironmentaland330
behavioralvariablessuchasdietandphysicalactivitywhichwerenotwellrepresentedincurrentstudies.Our331
findingssupportbodiesofworkthatpromoteinclusionofsuchinformationintheelectronichealthrecord67,68332
andweaddreasoningthatthiswouldalsoenablestudyofsocialdeterminantsinmachinelearninginlarge333
enoughsamplesizestoreduceoverfittingofmodels.Finally,resultsemphasizetheneedforstudiesthatinclude334
morediversepopulationswithvariedenvironmentalandsocialinfluences,whichwouldrepresentandensure335
validityofpredictionmodelsacrossthesediverseinteractions50toimprovecardiovasculardiseasepredictionin336
diversesettings,inparticularthosewherediseaseriskisincreasing.337
338
Researchincontext339
Evidencebeforethisstudy340
Whiletherearenoreviewsthatspecificallyaddresssocialdeterminants,machinelearningandcardiovascular341
disease(CVD),thelatestresearchoncardiovasculardiseaseindicatestheimperativerelevanceofsocial342
determinants.Societalandenvironmentalconditionsdistributedunequallyamonggroupsaredrivinga343
significantandincreasingglobalburdenofcardiovasculardiseasesparticularlytolowandmiddle-income344
countriesaswellaslower-socioeconomicgroupsinhigh-incomecountries.Atthesametime,researchshows345
. CC-BY-NC 4.0 International licenseIt is made available under a perpetuity.
is the author/funder, who has granted medRxiv a license to display the preprint in(which was not certified by peer review)preprint The copyright holder for thisthis version posted September 13, 2020. ; https://doi.org/10.1101/2020.09.11.20192989doi: medRxiv preprint
![Page 9: Machine Learning for Integrating Social Determinants in ......2020/09/11 · 6 1New York University, School of Global Public Health, Department of Epidemiology 7 2New York University,](https://reader034.fdocuments.net/reader034/viewer/2022052004/601762c4817b0f2c4600df7f/html5/thumbnails/9.jpg)
9
thatmachinelearningoffersthepotentialforcapturingflexiblerelationshipscomparedtothelinear346
relationshipsassumedintypicalCVDriskscores,whichisofparticularrelevancefortheconsiderationofsocial347
variables.SelectmodelshaveexaminedtheperformanceofcertainsocialvariablesinCVDprediction,including348
thosethatusemachinelearning,butgiventheserapidlyadvancingresearchareas,asystematicexaminationinto349
whichsocialdeterminantshavebeenmodelledandhowdifferentmethodshaveperformed,isneeded.350
Addedvalueofthisstudy351
Througharigorousandcomprehensivesystematicreview,weassessedthestate-of-theartmethodsprediction352
ofthetwotypesofCVDwiththehighestrecentmortality(ischemicheartdiseaseandstroke),thatusemachine353
learningandincorporatesocialdeterminantsandrelatedvariablesintheircausalpathway.Weshowthat354
environmentalandarea-baseddeterminantsarelackingfrommostmodels.Machinelearning,especiallyflexible355
modelssuchasneuralnetworks,showgoodperformanceinrelationtoregressionmodels.Weaccountedfor356
modelandinformationgaindifferencesacrossbyexaminingwithinstudyperformanceofbestalgorithm357
comparedtoregression.Weassessedthequalityoftheirimplementationsviabest-practicesfromthemachine358
learningliterature,findingthatqualitywasgenerallyrigorous.Finally,weshowthattheoriginofstudiesis359
highlyskewedtoUSAandmiddle/high-incomecountriesinEuropeandAsia,whichindicatesthatknowledge360
regardingthediversityofsocialdeterminantsandtheirimpactislimited.361
Implicationsofalltheavailableevidence362
WiththesignificantburdenofCVDandlargeburdeninlow-andmiddle-incomecountries,thisworkdirectly363
informshowwecanaugmentpredictionmodels,usingstateoftheartmachinelearningmethods,whilealso364
takingintoaccountgrowingsocial,environmentalriskfactorsthatshapeCVDrisk.Accordingtothefindingsof365
thisreview,strategiestocapturesocialvariables,especiallyenvironmentaldeterminantsareneededinthe366
electronichealthrecorddatabasesfromwhichmachinelearningmethodsarecommonlydeveloped.Finally,367
studiesto-daterepresentanarrowsetoflocations;weneedtosupportstudiesinlow-andmiddle-income368
countriestoidentifyandtailorourunderstandingtothespecificsocialdeterminantsinthesepopulations.369
370
Acknowledgements371
WethankDoriceVieiraforvaluablehelpwiththesearchprocess.372
373
374
. CC-BY-NC 4.0 International licenseIt is made available under a perpetuity.
is the author/funder, who has granted medRxiv a license to display the preprint in(which was not certified by peer review)preprint The copyright holder for thisthis version posted September 13, 2020. ; https://doi.org/10.1101/2020.09.11.20192989doi: medRxiv preprint
![Page 10: Machine Learning for Integrating Social Determinants in ......2020/09/11 · 6 1New York University, School of Global Public Health, Department of Epidemiology 7 2New York University,](https://reader034.fdocuments.net/reader034/viewer/2022052004/601762c4817b0f2c4600df7f/html5/thumbnails/10.jpg)
10
FiguresandTables375
376
377
Figure1:Socio-ecologicalframeworkofhealth;conceptualmodelusedinthestudy,adapted378fromHealthyPeople2020,theUnitedStatesfederalgovernment’snationalhealthagenda379
380
381
Figure2:PRISMAflowchartofstudyreviewprocessandexclusionofpapers382
Innate individual traits and biological
factors
. CC-BY-NC 4.0 International licenseIt is made available under a perpetuity.
is the author/funder, who has granted medRxiv a license to display the preprint in(which was not certified by peer review)preprint The copyright holder for thisthis version posted September 13, 2020. ; https://doi.org/10.1101/2020.09.11.20192989doi: medRxiv preprint
![Page 11: Machine Learning for Integrating Social Determinants in ......2020/09/11 · 6 1New York University, School of Global Public Health, Department of Epidemiology 7 2New York University,](https://reader034.fdocuments.net/reader034/viewer/2022052004/601762c4817b0f2c4600df7f/html5/thumbnails/11.jpg)
11
383
384
Figure3:NumberofMLalgorithmsusedinpublicationsbyyearandtype(NN:neuralnetwork,RF:randomforest,385EN:ensemblemethods(e.g.Adaboost,gradientboosting,baggeddecisiontree),SVM:supportvectormachine,DT:386decisiontree,NB:naïveBayes,BN:Bayesiannetwork,Reg:regularizationmethods(ridge/lassoregression),Other:387multilayerperceptron,maximumentropy,adversarialnetwork,lineardiscriminantanalysis,k-nearestneighbors,388recursivepartitioning,clustering,quadraticdiscriminant,radialbasisfunctionkernel)389 390
391
Figure4:(A)Toptensocialdeterminantandrelatedvariablesincludedbasedonstudyinclusioncriteria(social392determinantsinblue,othervariablesinred),(B)mostfrequentlyreportedCVDoutcomes(AAA:abdominalaortic393aneurysm,LOS:lengthofstay,ACS:acutecoronarysyndrome,HF:heartfailure,MI:myocardialinfarction,CAD:394coronaryarterydisease)(C)countriesofcorrespondingauthorsand(D)journaltypesofpublicationreportedin395systematicreviewpapers(EVS:environmentalsciences)allwithrespecttothepercentageofincludedpapersthey396appearin 397
. CC-BY-NC 4.0 International licenseIt is made available under a perpetuity.
is the author/funder, who has granted medRxiv a license to display the preprint in(which was not certified by peer review)preprint The copyright holder for thisthis version posted September 13, 2020. ; https://doi.org/10.1101/2020.09.11.20192989doi: medRxiv preprint
![Page 12: Machine Learning for Integrating Social Determinants in ......2020/09/11 · 6 1New York University, School of Global Public Health, Department of Epidemiology 7 2New York University,](https://reader034.fdocuments.net/reader034/viewer/2022052004/601762c4817b0f2c4600df7f/html5/thumbnails/12.jpg)
12
398
399
Figure5:DifferenceofAreaundertheROCcurvebetweenMLandLRbyMLalgorithmtype400
Table1:Summaryofmachinelearningalgorithms,bestperformingandsamplesizesusedinthestudies401
Algorithm Number(%)ofpapers**
Numasbestalgorithmwhenmultiplealgorithms
SampleSize
<100 100-1000 1000-10,000 >10,000
NeuralNet 35(36.5%) 9 5 8 13 9
RandomForest 32(33.3%) 9 0 3 14 15
DecisionTree 21(21.9%) 2 2 6 8 5
SupportVectorMachine 20(20.8%) 7 1 3 11 5
Ensemble 17(17.7%) 5 0 2 7 8
BayesianNetwork 13(13.5%) 1 1 4 2 6
NaïveBayes 12(12.5%) 1 2 2 5 3
Regularization* 11(11.5%) 0 0 2 6 3
Other 28(29.2%) 1 4 6 12 6
*regularization included Lasso, Ridge and Elastic net 402**Note:eachpapercouldincludemultipleversionsormultiplealgorithms403404
405
406
. CC-BY-NC 4.0 International licenseIt is made available under a perpetuity.
is the author/funder, who has granted medRxiv a license to display the preprint in(which was not certified by peer review)preprint The copyright holder for thisthis version posted September 13, 2020. ; https://doi.org/10.1101/2020.09.11.20192989doi: medRxiv preprint
![Page 13: Machine Learning for Integrating Social Determinants in ......2020/09/11 · 6 1New York University, School of Global Public Health, Department of Epidemiology 7 2New York University,](https://reader034.fdocuments.net/reader034/viewer/2022052004/601762c4817b0f2c4600df7f/html5/thumbnails/13.jpg)
13
407
Appendix408
TableS1:Summaryofallincludedpapers(attachedatend).409
SearchtermsandsearchstrategiesPubMed:410(Socialdeterminantsofhealth[Mesh]ORdemography[Mesh]ORdemographic*[tw]ORrace[tw]ORracism[Mesh]411OR“ethnicity”[tw]ORgenderidentity[Mesh]ORgender[tw]ORsocial[tw]ORsocialsupport[Mesh]OR412income[Mesh]OReducation[Mesh]ORemployment[Mesh]ORmaritalstatus[Mesh]ORoccupation[tw]OR413“healthinsurance”[tw]ORhealthliteracy[Mesh]ORmarriage[tw]ORinsurance[tw]ORhousing[tw]OR414home[tw]ORreligion[tw]ORsocioeconomicfactors[Mesh]ORsocialclass[Mesh]OR“socialstatus”[tw]OR415“accesshealthcare”[tw]ORhealthcaredisparities[Mesh]OR“financialdifficulties”[tw]ORpoverty[Mesh]OR416“socialdisparity”[tw]ORunemployment[Mesh]ORsocialcondition[Mesh]OR“socialinequality”[tw]OR417vulnerablepopulation[Mesh]OR“socialenvironment”[tw]ORsociodemographic*[tw]ORsociological418factors[Mesh]ORbodymassindex[Mesh]ORphysicalactivity[Mesh]ORdiet[Mesh]ORsmoking[Mesh]OR419“alcoholconsumption”[tw]ORtobacco[Mesh]OR“substanceuse”[tw]OR“physicalinactivity”[tw]OR“substance420abuse”[tw]ORhealth*behavi*r*[tw]ORhealth*service[tw]ORenvironment[Mesh]OR“livingenvironment”OR421“birthplace”[tw]OR“pollution”[tw]ORresidencecharacteristics[Mesh]OR“geographiclocations”[tw]OR422“rural”[tw]OR“urbanhealth”[tw]ORneighborhood[tw]ORcultur*[tw])AND(machinelearning[Mesh]OR423supervisedmachinelearning[Mesh]ORdecisiontrees[Mesh]ORneuralnetworks[Mesh]OR“NaiveBayes”[tw]424OR“kNN”[tw]ORsupportvectormachine[Mesh]ORperceptron[tw]OR“radialbasisfunction”[tw]OR“Bayesian425Network”[tw]OR“randomforest”[tw]OR“classificationtree”[tw]OR“elasticnet”[tw]OR“multilayer426perceptron”[tw]ORlasso[tw]ORridge[tw]OR“nearestneighbor”[tw]ORdeeplearning[Mesh]ORboosting[tw]427ORbagging[tw]ORensemble[tw])AND(“atheroscleroticcardiovasculardisease”[tw]ORcardiovascular428abnormalities[Mesh]ORheartdisease*[tw]ORheartarrest[Mesh]ORmyocardialischemia[Mesh]ORarterial429occlusivediseases[Mesh]ORcerebrovasculardisorders[Mesh]ORperipheralvasculardiseases[Mesh])Embase:430(exp”socialdeterminantsofhealth”/orexp”demography”/ordemographic*or*race”/or”racism”or431*ethnicity”/orexp”genderidentity”/or”gender”or”social”orexp”socialsupport”/orexp”education”/orexp432”employment”/or”income”or”maritalstatus”orexp”occupation”/orexp”healthinsurance”/or”health433literacy”/orexp”marriage”/or”insurance”orexp”housing”/or”home”or”religion”or”socioeconomicfactors”434orexp”socioeconomics”/or”socialclass”orexp”healthcareaccess”/orexp”healthcaredisparities”/or435”financialdifficulties”orexp”poverty”/or”socialdisparity”orexp”unemployment”/orexp”socialstatus”/or436”socialinequality”orexp”vulnerablepopulation”/or*socialenvironment/orsociodemographic*or*body437mass/or*physicalactivity/or*diet/orexp”smoking”/orexp”alcoholconsumption”/or”tobaccouse”orexp438”substanceuse”/orexp”physicalinactivity”/orexp”substanceabuse”/or*environment/or*birthplace/orexp439”pollution”/or”residencecharacteristics”or*geography/or”neighborhood”orcultur*orexp”ruralhealth”/or440exp”urbanhealth”/)and(exp”machinelearning”/or”supervisedmachinelearning”orexp”decisiontrees”/or441”neuralnetworks”orexp”artificialneuralnetwork”/or”NaiveBayes”orexp”Bayesianlearning”/orexp”k442nearestneighbor”/or”knn”orexp”supportvectormachine”/or”SVM”orexp”perceptron”/orexp”radial443basedfunction”/or”BayesianNetwork”orexp”randomforest”/or”classificationtree”or”elasticnet”or444”multilayerperceptron”or”lasso”or”ridge”orexp”deeplearning”/or”boosting”or”ensemble”)and445(”atheroscleroticcardiovasculardisease”or*coronaryarteryatherosclerosis/or”cardiovascularabnormalities”446orexp”cardiovascularmalformation”/or*heartdisease/orexp”heartarrest”/orexp”myocardialischemia”/or447”arterialocclusivediseases”orexp”cerebrovasculardisorders”/orexp”peripheralvasculardiseases”/)448WebofScience:449TS=((“Socialdeterminantsofhealth”ORdemographyORdemographic*ORraceORethnicityOR“gender450identity”ORgenderORsocialOR“socialsupport”ORincomeOReducationORemploymentOR“maritalstatus”451ORoccupationOR“healthinsurance”ORmarriageORinsuranceORhousingORreligionOR“socioeconomic452factors”OR“socialclass”OR“accesshealthcare”OR“healthcaredisparities”OR“financialdifficult”ORpoverty453
. CC-BY-NC 4.0 International licenseIt is made available under a perpetuity.
is the author/funder, who has granted medRxiv a license to display the preprint in(which was not certified by peer review)preprint The copyright holder for thisthis version posted September 13, 2020. ; https://doi.org/10.1101/2020.09.11.20192989doi: medRxiv preprint
![Page 14: Machine Learning for Integrating Social Determinants in ......2020/09/11 · 6 1New York University, School of Global Public Health, Department of Epidemiology 7 2New York University,](https://reader034.fdocuments.net/reader034/viewer/2022052004/601762c4817b0f2c4600df7f/html5/thumbnails/14.jpg)
14
OR“socialdisparity”ORunemploymentOR“socialcondition”OR“socialinequality”OR“vulnerablepopulation”454OR“socialenvironment”ORsociodemographic*OR“bodymassindex”OR“physicalactivity”ORdietORsmoking455OR“alcoholconsumption”ORtobaccoOR“substanceuse”OR“physicalinactivity”OR“substanceabuse”OR456environmentORbirthplaceORpollutionOR“residencecharacteristics”OR“geographiclocations”OR“rural”OR457“urbanhealth”)AND(“machinelearning”OR“supervisedmachinelearning”OR“decisiontrees”OR“neural458networks”OR“NaiveBayes”ORkNNOR“supportvectormachine”OR“perceptron”OR“radialbasisfunction”OR459“BayesianNetwork”OR“randomforest”OR“classificationtree”OR“elasticnet”OR“multilayerperceptron”OR460“lasso”OR“ridge”OR“nearestneighbor”OR“deeplearning”OR“boosting”OR“ensemble”)AND461(“atheroscleroticcardiovasculardisease”OR“cardiovascularabnormalities”ORheartdisease*OR“heartarrest”462OR“myocardialischemia”OR“arterialocclusivediseases”OR“cerebrovasculardisorders”OR“peripheral463vasculardiseases”))464IEEE:465(“Socialdeterminantsofhealth”ORdemographyORdemographic*ORraceORethnicityOR“genderidentity”OR466genderORsocialOR“socialsupport”ORincomeOReducationORemploymentOR“maritalstatus”OR467occupationOR“healthinsurance”ORmarriageORinsuranceORhousingORreligionOR“socioeconomicfactors”468OR“socialclass”OR“accesshealthcare”OR“healthcaredisparities”OR“financialdifficult”ORpovertyOR“social469disparity”ORunemploymentOR“socialcondition”OR“socialinequality”OR“vulnerablepopulation”OR“social470environment”ORsociodemographic*OR“bodymassindex”OR“physicalactivity”ORdietORsmokingOR471“alcoholconsumption”ORtobaccoOR“substanceuse”OR“physicalinactivity”OR“substanceabuse”OR472environmentORbirthplaceORpollutionOR“residencecharacteristics”OR“geographiclocations”OR“rural”OR473“urbanhealth”)AND(“machinelearning”OR“supervisedmachinelearning”OR“decisiontrees”OR“neural474networks”OR“NaiveBayes”ORkNNOR“supportvectormachine”OR“perceptron”OR“radialbasisfunction”OR475“BayesianNetwork”OR“randomforest”OR“classificationtree”OR“elasticnet”OR“multilayerperceptron”OR476“lasso”OR“ridge”OR“nearestneighbor”OR“deeplearning”OR“boosting”OR“ensemble”)AND477(“atheroscleroticcardiovasculardisease”OR“cardiovascularabnormalities”ORheartdisease*OR“heartarrest”478OR“myocardialischemia”OR“arterialocclusivediseases”OR“cerebrovasculardisorders”OR“peripheral479vasculardiseases”)480
References481
1 WorldHealthOrganization.Cardiovasculardiseases(CVDs)factsheet.WorldHealthOrganization482(2017).483
2 Deaton,C.etal.Theglobalburdenofcardiovasculardisease.EuropeanJournalofCardiovascularNursing48410,S5-S13(2011).485
3 Heidenreich,P.A.etal.ForecastingtheimpactofheartfailureintheUnitedStates:apolicystatement486fromtheAmericanHeartAssociation.Circulation:HeartFailure6,606-619(2013).487
4 Kuzawa,C.W.&Sweet,E.Epigeneticsandtheembodimentofrace:developmentaloriginsofUSracial488disparitiesincardiovascularhealth.AmericanJournalofHumanBiology:TheOfficialJournalofthe489HumanBiologyAssociation21,2-15(2009).490
5 Carnethon,M.R.etal.CardiovascularhealthinAfricanAmericans:ascientificstatementfromthe491AmericanHeartAssociation.Circulation136,e393-e423(2017).492
6 Stuckler,D.,McKee,M.,Ebrahim,S.&Basu,S.Manufacturingepidemics:theroleofglobalproducersin493increasedconsumptionofunhealthycommoditiesincludingprocessedfoods,alcohol,andtobacco.PLoS494Med9,e1001235(2012).495
7 Lakka,T.A.etal.Sedentarylifestyle,poorcardiorespiratoryfitness,andthemetabolicsyndrome.496Medicine&ScienceinSports&Exercise(2003).497
8 Health,W.C.o.S.D.o.&Organization,W.H.Closingthegapinageneration:healthequitythroughaction498onthesocialdeterminantsofhealth:CommissiononSocialDeterminantsofHealthfinalreport.(World499HealthOrganization,2008).500
9 Joseph,P.etal.Reducingtheglobalburdenofcardiovasculardisease,part1:theepidemiologyandrisk501factors.Circulationresearch121,677-694(2017).502
10 Tillmann,T.etal.PsychosocialandsocioeconomicdeterminantsofcardiovascularmortalityinEastern503Europe:Amulticentreprospectivecohortstudy.PLoSmedicine14,e1002459(2017).504
. CC-BY-NC 4.0 International licenseIt is made available under a perpetuity.
is the author/funder, who has granted medRxiv a license to display the preprint in(which was not certified by peer review)preprint The copyright holder for thisthis version posted September 13, 2020. ; https://doi.org/10.1101/2020.09.11.20192989doi: medRxiv preprint
![Page 15: Machine Learning for Integrating Social Determinants in ......2020/09/11 · 6 1New York University, School of Global Public Health, Department of Epidemiology 7 2New York University,](https://reader034.fdocuments.net/reader034/viewer/2022052004/601762c4817b0f2c4600df7f/html5/thumbnails/15.jpg)
15
11 Havranek,E.P.etal.Socialdeterminantsofriskandoutcomesforcardiovasculardisease:ascientific505statementfromtheAmericanHeartAssociation.Circulation132,873-898(2015).506
12 Theodore,R.F.etal.Childhoodtoearly-midlifesystolicbloodpressuretrajectories:early-lifepredictors,507effectmodifiers,andadultcardiovascularoutcomes.Hypertension66,1108-1115(2015).508
13 Cooper,R.etal.Trendsanddisparitiesincoronaryheartdisease,stroke,andothercardiovascular509diseasesintheUnitedStates:findingsofthenationalconferenceoncardiovasculardiseaseprevention.510Circulation102,3137-3147(2000).511
14 He,X.,Matam,B.R.,Bellary,S.,Ghosh,G.&Chattopadhyay,A.K.cHDRiskMinimizationthroughLifestyle512control:MachineLearningGateway.Scientificreports10,1-10(2020).513
15 Watson,D.S.etal.Clinicalapplicationsofmachinelearningalgorithms:beyondtheblackbox.Bmj364514(2019).515
16 Rajkomar,A.,Dean,J.&Kohane,I.Machinelearninginmedicine.NewEnglandJournalofMedicine380,5161347-1358(2019).517
17 Alaa,A.M.,Bolton,T.,DiAngelantonio,E.,Rudd,J.H.&vanderSchaar,M.Cardiovasculardiseaserisk518predictionusingautomatedmachinelearning:Aprospectivestudyof423,604UKBiobankparticipants.519PloSone14,e0213653(2019).520
18 Dimopoulos,A.C.etal.Machinelearningmethodologiesversuscardiovascularriskscores,inpredicting521diseaserisk.BMCMedicalResearchMethodology18,179(2018).522
19 Kakadiaris,I.A.etal.MachinelearningoutperformsACC/AHACVDriskcalculatorinMESA.Journalof523theAmericanHeartAssociation7,e009476(2018).524
20 Cook,N.R.&Ridker,P.M.Furtherinsightintothecardiovascularriskcalculator:therolesofstatins,525revascularizations,andunderascertainmentintheWomen’sHealthStudy.JAMAinternalmedicine174,5261964-1971(2014).527
21 Caballero,F.F.etal.Advancedanalyticalmethodologiesformeasuringhealthyageingandits528determinants,usingfactoranalysisandmachinelearningtechniques:theATHLOSproject.Scientific529Reports7,43955(2017).530
22 Seligman,B.,Tuljapurkar,S.&Rehkopf,D.Machinelearningapproachestothesocialdeterminantsof531healthinthehealthandretirementstudy.SSM-populationhealth4,95-99(2018).532
23 People,C.o.L.H.I.f.H.,Health,B.o.P.,Practice,P.H.&Medicine,I.o.Leadinghealthindicatorsfor533healthypeople2020:letterreport.(NationalAcademiesPress,2011).534
24 Council,N.R.&Population,C.o.UShealthininternationalperspective:Shorterlives,poorerhealth.535(NationalAcademiesPress,2013).536
25 Short,S.E.&Mollborn,S.Socialdeterminantsandhealthbehaviors:conceptualframesandempirical537advances.Currentopinioninpsychology5,78-84(2015).538
26 Shiroma,E.J.&Lee,I.-M.Physicalactivityandcardiovascularhealth:lessonslearnedfrom539epidemiologicalstudiesacrossage,gender,andrace/ethnicity.Circulation122,743-752(2010).540
27 Nuttall,F.Q.Bodymassindex:obesity,BMI,andhealth:acriticalreview.Nutritiontoday50,117(2015).54128 Kotsiantis,S.B.,Zaharakis,I.&Pintelas,P.Supervisedmachinelearning:Areviewofclassification542
techniques.Emergingartificialintelligenceapplicationsincomputerengineering160,3-24(2007).54329 LeCun,Y.,Bengio,Y.&Hinton,G.Deeplearning.nature521,436-444(2015).54430 Roth,G.A.etal.Global,regional,andnationalage-sex-specificmortalityfor282causesofdeathin195545
countriesandterritories,1980–2017:asystematicanalysisfortheGlobalBurdenofDiseaseStudy2017.546TheLancet392,1736-1788(2018).547
31 Kreatsoulas,C.&Anand,S.S.Theimpactofsocialdeterminantsoncardiovasculardisease.Canadian548JournalofCardiology26,8C-13C(2010).549
32 Bhatnagar,A.Environmentaldeterminantsofcardiovasculardisease.Circulationresearch121,162-180550(2017).551
33 Cheng,I.,Ho,W.E.,Woo,B.K.&Tsiang,J.T.Correlationsbetweenhealthinsurancestatusandrisk552factorsforcardiovasculardiseaseintheelderlyAsianAmericanpopulation.Cureus10(2018).553
34 Fang,J.etal.AssociationofbirthplaceandcoronaryheartdiseaseandstrokeamongUSadults:National554HealthInterviewSurvey,2006to2014.JournaloftheAmericanHeartAssociation7,e008153(2018).555
35 Lapane,K.L.,Lasater,T.M.,Allan,C.&Carleton,R.A.Religionandcardiovasculardiseaserisk.Journalof556ReligionandHealth36,155-164(1997).557
36 Chen,P.-H.C.,Liu,Y.&Peng,L.Howtodevelopmachinelearningmodelsforhealthcare.Nature558materials18,410(2019).559
37 Hsich,E.M.etal.VariablesofimportanceintheScientificRegistryofTransplantRecipientsdatabase560predictiveofhearttransplantwaitlistmortality.AmericanJournalofTransplantation19,2067-2076561(2019).562
38 Akay,M.Noninvasivediagnosisofcoronaryarterydiseaseusinganeuralnetworkalgorithm.Biological563cybernetics67,361-367(1992).564
39 Shao,Z.,Chen,C.,Li,W.,Ren,H.&Chen,W.Assessmentoftheriskfactorsinthedailylifeofstroke565patientsbasedonanoptimizeddecisiontree.TechnologyandHealthCare27,317-329(2019).566
40 McGeachie,M.etal.Anintegrativepredictivemodelofcoronaryarterycalcificationinarteriosclerosis.567Circulation120,2448(2009).568
. CC-BY-NC 4.0 International licenseIt is made available under a perpetuity.
is the author/funder, who has granted medRxiv a license to display the preprint in(which was not certified by peer review)preprint The copyright holder for thisthis version posted September 13, 2020. ; https://doi.org/10.1101/2020.09.11.20192989doi: medRxiv preprint
![Page 16: Machine Learning for Integrating Social Determinants in ......2020/09/11 · 6 1New York University, School of Global Public Health, Department of Epidemiology 7 2New York University,](https://reader034.fdocuments.net/reader034/viewer/2022052004/601762c4817b0f2c4600df7f/html5/thumbnails/16.jpg)
16
41 Rasmy,L.etal.Astudyofgeneralizabilityofrecurrentneuralnetwork-basedpredictivemodelsforheart569failureonsetriskusingalargeandheterogeneousEHRdataset.Journalofbiomedicalinformatics84,11-57016(2018).571
42 Chen,J.etal.MachineLearning-BasedForecastofHemorrhagicStrokeHealthcareServiceDemand572consideringAirPollution.Journalofhealthcareengineering2019(2019).573
43 Cheon,S.,Kim,J.&Lim,J.Theuseofdeeplearningtopredictstrokepatientmortality.International574journalofenvironmentalresearchandpublichealth16,1876(2019).575
44 Jabbar,M.,Deekshatulu,B.&Chndra,P.inInternationalConferenceonCircuits,Communication,Control576andComputing.322-328(IEEE).577
45 Illing,B.,Gerstner,W.&Brea,J.Biologicallyplausibledeeplearning—Buthowfarcanwegowith578shallownetworks?NeuralNetworks118,90-101(2019).579
46 Mansoor,H.,Elgendy,I.Y.,Segal,R.,Bavry,A.A.&Bian,J.Riskpredictionmodelforin-hospitalmortality580inwomenwithST-elevationmyocardialinfarction:Amachinelearningapproach.Heart&Lung46,405-581411(2017).582
47 Harrison,R.F.&Kennedy,R.L.Artificialneuralnetworkmodelsforpredictionofacutecoronary583syndromesusingclinicaldatafromthetimeofpresentation.Annalsofemergencymedicine46,431-439584(2005).585
48 Berchialla,P.,Foltran,F.,Bigi,R.&Gregori,D.Integratingstress-relatedventricularfunctionaland586angiographicdatainpreventivecardiology:aunifiedapproachimplementingaBayesiannetwork.587JournalofEvaluationinClinicalPractice18,637-643(2012).588
49 Chu,C.etal.Doesfeatureselectionimproveclassificationaccuracy?Impactofsamplesizeandfeature589selectiononclassificationusinganatomicalmagneticresonanceimages.Neuroimage60,59-70(2012).590
50 Harper,S.,Lynch,J.&Smith,G.D.Socialdeterminantsandthedeclineofcardiovasculardiseases:591understandingthelinks.Annualreviewofpublichealth32,39-69(2011).592
51 Richardson,E.A.&Mitchell,R.Genderdifferencesinrelationshipsbetweenurbangreenspaceand593healthintheUnitedKingdom.Socialscience&medicine71,568-575(2010).594
52 Steptoe,A.&Kivimäki,M.Stressandcardiovasculardisease.NatureReviewsCardiology9,360-370595(2012).596
53 Pollitt,R.A.,Rose,K.M.&Kaufman,J.S.Evaluatingtheevidenceformodelsoflifecoursesocioeconomic597factorsandcardiovascularoutcomes:asystematicreview.BMCpublichealth5,7(2005).598
54 Phillips,S.P.Definingandmeasuringgender:asocialdeterminantofhealthwhosetimehascome.599InternationalJournalforEquityinHealth4,1-4(2005).600
55 Bishop,C.M.Bayesianmethodsforneuralnetworks.(1995).60156 Dreiseitl,S.&Ohno-Machado,L.Logisticregressionandartificialneuralnetworkclassificationmodels:a602
methodologyreview.Journalofbiomedicalinformatics35,352-359(2002).60357 Smith,K.P.&Christakis,N.A.Socialnetworksandhealth.Annu.Rev.Sociol34,405-429(2008).60458 Suglia,S.F.etal.Whytheneighborhoodsocialenvironmentiscriticalinobesityprevention.Journalof605
UrbanHealth93,206-212(2016).60659 Bahr,D.B.,Browning,R.C.,Wyatt,H.R.&Hill,J.O.Exploitingsocialnetworkstomitigatetheobesity607
epidemic.Obesity17,723-728(2009).60860 Pachucki,M.A.,Jacques,P.F.&Christakis,N.A.Socialnetworkconcordanceinfoodchoiceamong609
spouses,friends,andsiblings.Americanjournalofpublichealth101,2170-2177(2011).61061 Årnes,A.P.&Krokstrand,T.T.TheincidenceandprevalenceofChronicFatigueSyndrome,BackPainof611
unknownorigin,Fibromyalgia,andMyalgiainNorwegianwomen,andtheirassociationtophysical612activity.AprospectivecohortstudyofmaterialfromtheNorwegianWomenandCancer(NOWAC)study,613UiTNorgesarktiskeuniversitet,(2014).614
62 Ahmad,M.A.,Eckert,C.&Teredesai,A.inProceedingsofthe2018ACMinternationalconferenceon615bioinformatics,computationalbiology,andhealthinformatics.559-560.616
63 Ni,Y.etal.Towardsphenotypingstroke:Leveragingdatafromalarge-scaleepidemiologicalstudyto617detectstrokediagnosis.PloSone13,e0192586(2018).618
64 Ambale-Venkatesh,B.etal.Cardiovasculareventpredictionbymachinelearning:themulti-ethnicstudy619ofatherosclerosis.Circulationresearch121,1092-1101(2017).620
65 Sitar-tăut,A.,Zdrenghea,D.,Pop,D.&Sitar-tăut,D.Usingmachinelearningalgorithmsincardiovascular621diseaseriskevaluation.Age1,4(2009).622
66 Weng,S.F.,Reps,J.,Kai,J.,Garibaldi,J.M.&Qureshi,N.Canmachine-learningimprovecardiovascular623riskpredictionusingroutineclinicaldata?PloSone12,e0174944(2017).624
67 Bazemore,A.W.etal.“Communityvitalsigns”:incorporatinggeocodedsocialdeterminantsinto625electronicrecordstopromotepatientandpopulationhealth.JournaloftheAmericanMedicalInformatics626Association23,407-412(2016).627
68 Cantor,M.N.&Thorpe,L.Integratingdataonsocialdeterminantsofhealthintoelectronichealth628records.HealthAffairs37,585-590(2018).629
. CC-BY-NC 4.0 International licenseIt is made available under a perpetuity.
is the author/funder, who has granted medRxiv a license to display the preprint in(which was not certified by peer review)preprint The copyright holder for thisthis version posted September 13, 2020. ; https://doi.org/10.1101/2020.09.11.20192989doi: medRxiv preprint
![Page 17: Machine Learning for Integrating Social Determinants in ......2020/09/11 · 6 1New York University, School of Global Public Health, Department of Epidemiology 7 2New York University,](https://reader034.fdocuments.net/reader034/viewer/2022052004/601762c4817b0f2c4600df7f/html5/thumbnails/17.jpg)
17
630
. CC-BY-NC 4.0 International licenseIt is made available under a perpetuity.
is the author/funder, who has granted medRxiv a license to display the preprint in(which was not certified by peer review)preprint The copyright holder for thisthis version posted September 13, 2020. ; https://doi.org/10.1101/2020.09.11.20192989doi: medRxiv preprint
![Page 18: Machine Learning for Integrating Social Determinants in ......2020/09/11 · 6 1New York University, School of Global Public Health, Department of Epidemiology 7 2New York University,](https://reader034.fdocuments.net/reader034/viewer/2022052004/601762c4817b0f2c4600df7f/html5/thumbnails/18.jpg)
18
Study Country Study Design SDoH and Related Variables Included
CVD Outcomes Algorithms
Alizadehsani,
Roohallah, et al.
(2018)
Australia Unclear Age, BMI, obesity, sex, smoking Coronary artery
disease
DT, SVM, NB
Tamal, Maruf
Ahmed, et al.
(2019)
Bangladesh Observational
(survey)
Age, physical exercise, smoking Heart disease DT, SVM, NB, QDA,
RF, LR
Al’Aref, Subhi J., et al. (2020)
Canada, Germany,
Italy, Korea,
Switzerland,
US
Prospective observational
Age, BMI, ethnicity, sex, smoking
Coronary artery disease
Gradient boosting
Chan, Ka Lung,
et al. (2018)
China Cohort Age, sex, smoking Stroke NN, SVM, NB
Chen, Jian, et al.
(2019)
China Observational
(EHR)
Environmental pollutants Stroke RF, DT, XGB, SVM,
LR, KNN
Gan, Xiu-min, et
al. (2011)
China Case control Age, alcohol intake, BMI,
education, other body metrics,
physical activities, diet, smoking
Stroke DT
Hsiao, Han CW,
et al. (2016)
China Observational
(EHR)
Age, gender, area level social
determinants
All types of CVD DL
Hu, Danqing, et
al. (2016)
China Observational
(EHR))
Age, smoking Other CVD RF, SVM, NB, lasso,
other*
Huang,
Zhengxing, et al. (2015)
China Observational
(EHR)
Age, gender, smoking Coronary artery
disease
SVM, other
Huang,
Zhengxing, et al.
(2019)
China Observational
(EHR)
Age, gender, smoking Other CVD NN, DL, other
Shao, Zeguo, et
al. (2019)
China Observational
(survey)
Age, alcohol intake, BMI, diet,
physical activities, smoking
Stroke RF, DT
Wan, Eric Yuk
Fai, et al. (2017)
China Retrospective
cohort
Age, BMI, gender, smoking All types of CVD DT
Xu, Yuan, et al.
(2019)
China Retrospective
cohort
Age, alcohol intake, gender,
smoking
Rehospitalization Gradient boosting
Karaolis, Minas
A., et al. (2010)
Cyprus Observational
(EHR)
Age, gender, smoking Myocardial
infarction
DT
. CC-BY-NC 4.0 International licenseIt is made available under a perpetuity.
is the author/funder, who has granted medRxiv a license to display the preprint in(which was not certified by peer review)preprint The copyright holder for thisthis version posted September 13, 2020. ; https://doi.org/10.1101/2020.09.11.20192989doi: medRxiv preprint
![Page 19: Machine Learning for Integrating Social Determinants in ......2020/09/11 · 6 1New York University, School of Global Public Health, Department of Epidemiology 7 2New York University,](https://reader034.fdocuments.net/reader034/viewer/2022052004/601762c4817b0f2c4600df7f/html5/thumbnails/19.jpg)
19
Anselmino,
Matteo, et al.
(2009)
Europe Cross-sectional Age, BMI, gender, smoking,
other body metrics
Stroke, myocardial
infarction
NN
Deguen,
Séverine, et
al. (2010)
France Cross-sectional Age, education, gender, income,
occupation, residence, area-level
social determinants
Coronary artery
disease
Other
Baars, Theodor,
et al. (2020)
Germany Cohort Age, BMI, gender, smoking Mortality Other ensemble**
Exarchos, Konstantinos P.,
et al. (2015)
Greece Observational (EHR)
Age, gender, smoking Other NB
Tsipouras,
Markos G., et al.
(2008)
Greece Observational
(EHR)
Age, BMI, gender, smoking,
other body metrics
Coronary artery
disease
NN, DT
Jabbar, M. A, et
al. (2014)
India Observational
(EHR)
Age, gender, residence All types of CVD DT, PCA
Naushad, Shaik
Mohammad, et
al. (2018)
India Case control Age, alcohol intake, BMI, diet
gender, smoking
Coronary artery
disease
Other ensemble, other
Amin, Syed
Umar, et al.
(2013)
India Observational
(survey)
Age, alcohol intake, diet, gender,
physical activities, smoking
All types of CVD NN, other
Afarideh,
Mohsen, et al.
(2016)
Iran Open cohort Age, BMI, gender, smoking,
other body metrics,
All types of CVD NN
Amini, Leila, et al. (2013)
Iran Observational (survey)
Age, alcohol intake, BMI, gender, physical activities,
smoking
Stroke DT, other
Ayatollahi,
Haleh, et al.
(2019)
Iran Observational
(EHR)
Age, gender, marital status,
occupation, residence, smoking
Coronary artery
disease
NN, SVM
Parizadeh,
Donna, et al.
(2017)
Iran Cohort Age, gender, smoking, other
body metrics,
Stroke DT
Shakerkhatibi,
M., et al. (2015)
Iran Case–crossover
design
Age, gender, area-level social
determinants
Hospital admission NN
Berchialla, Paola,
et al. (2012)
Italy Cohort Age, smoking Myocardial
infarction,
mortality
RF, NN, SVM, BN
. CC-BY-NC 4.0 International licenseIt is made available under a perpetuity.
is the author/funder, who has granted medRxiv a license to display the preprint in(which was not certified by peer review)preprint The copyright holder for thisthis version posted September 13, 2020. ; https://doi.org/10.1101/2020.09.11.20192989doi: medRxiv preprint
![Page 20: Machine Learning for Integrating Social Determinants in ......2020/09/11 · 6 1New York University, School of Global Public Health, Department of Epidemiology 7 2New York University,](https://reader034.fdocuments.net/reader034/viewer/2022052004/601762c4817b0f2c4600df7f/html5/thumbnails/20.jpg)
20
Bigi, Riccardo, et
al. (2005)
Italy Cohort Age, gender, smoking Myocardial
infarction,
mortality
NN, other
Foltran,
Francesca, et al.
(2011)
Italy Observational
(EHR)
Age, BMI, gender, smoking,
other body metrics
Coronary artery
disease
BN
Pasanisi,
Stefania, et al. (2018)
Italy Cohort Age, gender, smoking All types of CVD NN
Cho, In Jeong et
al. (2020)
Korea Cohort BMI, physical activities, smoking All types of CVD DL
Hae, Hyeonyong,
et al. (2018)
Korea Retrospective
Cohort
Age, BMI, gender, smoking Coronary artery
disease
RF, DT, GB, SVM, NB,
Ridge, other
Kwon, Joon-
myoung, et al.
(2019)
Korea Retrospective
Cohort
Age, BMI, gender, smoking Mortality RF, DL
Juarez-Orozco,
Luis Eduardo, et
al. (2020)
Netherlands Observational
(EHR)
Gender, BMI, smoking Coronary artery
disease, other
ensemble
Tay, Darwin, et
al. (2015)
Singapore Cohort Age, diet, gender, physical
activities, built environment
All types of CVD NN, SVM
Fuster-Parra,
Pilar, et al.
(2016)
Spain Observational
(survey)
BMI, gender, physical activities,
smoking, other body metrics
All types of CVD DT, BN, NB, other
Green, Michael,
et al. (2006)
Sweden Observational
(EHR)
Age, gender, smoking Acute coronary
syndrome
NN
Marshall, Adele H., et al. (2010)
UK Cohort BMI, Smoking Coronary artery disease, mortality
BN
Alaa, Ahmed M.,
et al. (2019)
UK Cohort BMI, diet, physical activities,
residence, smoking
All types of CVD RF, NN, ensemble, GB,
Adaboost
Harrison, Robert
F., et al. (2005)
UK Observational
(EHR)
Gender, smoking Acute coronary
syndrome
NN
He, Xi, et al.
(2020)
UK Observational
(survey)
Gender, smoking Coronary artery
disease
PCA, other
Ayala Solares,
Jose Roberto, et
al. (2019)
UK Observational
(EHR)
Gender, income, smoking, area-
level social determinants
All types of CVD BN
Yang, Hui, et al.
(2015)
UK Observational
(EHR)
BMI, smoking Coronary artery
disease
NB, other
. CC-BY-NC 4.0 International licenseIt is made available under a perpetuity.
is the author/funder, who has granted medRxiv a license to display the preprint in(which was not certified by peer review)preprint The copyright holder for thisthis version posted September 13, 2020. ; https://doi.org/10.1101/2020.09.11.20192989doi: medRxiv preprint
![Page 21: Machine Learning for Integrating Social Determinants in ......2020/09/11 · 6 1New York University, School of Global Public Health, Department of Epidemiology 7 2New York University,](https://reader034.fdocuments.net/reader034/viewer/2022052004/601762c4817b0f2c4600df7f/html5/thumbnails/21.jpg)
21
Ahmad, Tariq, et
al. (2018)
USA Registry Age, alcohol intake, BMI,
education, gender, income,
marital status
Heart failure RF, other
Akay, Metin., et
al. (1992)
USA Cross-sectional Age, BMI, gender, smoking Coronary artery
disease
NN, other
Ambale-
Venkatesh,
Bharath, et al. (2017)
USA Cohort Age, alcohol intake, BMI,
education, gender, income, race,
smoking, other body metrics
Stroke, all types of
CVD, heart failure,
coronary artery disease, mortality
RF, lasso
Basu, Sanjay, et
al. (2017)
USA Cohort Age, gender, race, smoking Stroke, heart
failure, myocardial
infarction,
mortality
Lasso
Dinh, An, et al.
(2019)
USA Cross-sectional Age, alcohol intake, BMI,
gender, income, physical
activities, race
All types of CVD RF, GB, ensemble,
SVM
Dogan,
Meeshanthini V.,
et al. (2018)
USA Cohort Age, alcohol intake, BMI,
gender, physical activities,
smoking
Stroke RF, ensemble
Edwards,
Dorothy F., et al.
(1999)
USA Observational
(EHR)
Age, gender, race Mortality NN
Golas, Sara
Bersche, et al. (2018)
USA Observational
(EHR)
Education, gender, marital status,
occupation, race
Rehospitalization Deep unified networks,
GB
Gonzales, Tina
K., et al. (2017)
USA Cohort Alcohol intake, BMI, gender,
income, physical activities,
smoking
Myocardial
infarction
RF, other
Hsich, Eileen M.,
et al. (2019)
USA Observational
(survey)
Age, BMI, medical insurance,
race, smoking
Mortality RF
Hu, Danqing, et
al. (2016)
USA Clinical trial Age, BMI, gender, income, race,
residence,
Carotid
atherosclerosis
RF, NB, other
Imran, Tasnim F.,
et al. (2018)
USA Observational
(EHR)
Age, BMI, gender, race, smoking Stroke Lasso
Kerut, Edmund
Kenneth, et al.
(2019)
USA Observational
(survey)
Gender, race, smoking Abdominal aortic
aneurysm
NN
Kogan, Emily, et
al. (2020)
USA Observational
(EHR)
Gender, residence, area-level
social determinants
Stroke RF, NN, GB
. CC-BY-NC 4.0 International licenseIt is made available under a perpetuity.
is the author/funder, who has granted medRxiv a license to display the preprint in(which was not certified by peer review)preprint The copyright holder for thisthis version posted September 13, 2020. ; https://doi.org/10.1101/2020.09.11.20192989doi: medRxiv preprint
![Page 22: Machine Learning for Integrating Social Determinants in ......2020/09/11 · 6 1New York University, School of Global Public Health, Department of Epidemiology 7 2New York University,](https://reader034.fdocuments.net/reader034/viewer/2022052004/601762c4817b0f2c4600df7f/html5/thumbnails/22.jpg)
22
Leach, Heather
J., et al. (2016)
USA Cohort BMI, diet, income, physical
activities, smoking, built
environment
All types of CVD DT
Akay, Metin, et
al. (1994)
USA Observational
(EHR)
Gender, smoking Coronary artery
disease
NN
Mansoor, Hend,
et al. (2017)
USA Cohort Alcohol intake, income, medical
insurance, race, smoking,
substance abuse, area-level social determinants
Mortality RF
McGeachie,
Michael, et al.
(2009)
USA Cohort Age, BMI, education, gender,
smoking
Coronary artery
disease
BN
Mobley, Bert A,
et al. (1995)
USA Observational
(EHR)
Gender, medical insurance, race Length of stay in
hospital
NN
Motwani,
Manish, et
al. (2017)
USA Cohort Age, BMI, race, smoking Mortality Other ensemble
Ni, Yizhao, et
al. (2018)
USA Cohort Alcohol intake, gender, marital
status, occupation, race, smoking,
substance abuse
Stroke RF, NN, SVM
Ottenbacher,
Kenneth J., et al.
(2001)
USA Retrospective
Cohort
Age, gender, marital status,
medical insurance, occupation,
residence
Rehospitalization NN
Rasmy, Laila, et
al. (2018)
USA Observational
(EHR)
Age, gender, race Heart failure DL, ridge, lasso
Baldassarre, Damiano, et al.
(2004)
Italy Cross-sectional Age, BMI, gender, smoking All types of CVD NN, other
Bandyopadhyay,
Sunayan, et al.
(2015)
USA Observational
(EHR)
Age, BMI, gender, smoking All types of CVD BN
Beunza, Juan-
Jose, et al. (2019)
Spain Cohort Age, BMI, education, gender,
smoking
Coronary artery
disease
RF, NN, DT, AdaBoost,
SVM
Biesbroek,
Sander, et al.
(2015)
Netherlands Cohort Age, alcohol intake, diet,
education, gender, physical
activities, , smoking
Stroke, Coronary
artery disease
RF, DT, PCA, other
Brisimi,
Theodora S., et
al. (2018)
USA Observational
(EHR)
Age, gender, race, smoking, area-
level social determinants
Hospitalization RF, SVM
. CC-BY-NC 4.0 International licenseIt is made available under a perpetuity.
is the author/funder, who has granted medRxiv a license to display the preprint in(which was not certified by peer review)preprint The copyright holder for thisthis version posted September 13, 2020. ; https://doi.org/10.1101/2020.09.11.20192989doi: medRxiv preprint
![Page 23: Machine Learning for Integrating Social Determinants in ......2020/09/11 · 6 1New York University, School of Global Public Health, Department of Epidemiology 7 2New York University,](https://reader034.fdocuments.net/reader034/viewer/2022052004/601762c4817b0f2c4600df7f/html5/thumbnails/23.jpg)
23
Çelik, Güner, et
al. (2014)
Turkey Observational
(EHR)
Age, gender, smoking Stroke NN, other
Cheon, Songhee,
et al. (2019)
Korea Observational
(survey)
Age, gender, medical insurance,
area-level social determinants
Stroke RF, Adaboost, SVM,
DL, PCA, NB, other
Corsetti, James
P., et al. (2016)
USA Observational
(EHR)
BMI, race All types of CVD BN
Cox Jr, Louis
Anthony Tony.
(2017)
USA Observational
(survey)
Age, education, gender, income,
marital status, smoking, area-
level social determinants
Stroke RF, DT, BN
Cox Jr, Louis Anthony Tony.
(2018)
USA Observational (survey)
Age, education, income, gender, marital status, smoking, area-
level social determinants
All CVD BN
Daghistani,
Tahani A., et al.
(2019)
Saudi Arabia Observational
(EHR)
Age, gender, medical insurance,
smoking
Length of stay RF, NN, SVM, BN
Dai, Wuyang, et
al.
(2015)
USA Observational
(EHR)
Age, gender, race, smoking, area-
level social determinants
Hospitalization AdaBoost, SVM, NB,
other
Dogan,
Meeshanthini V.,
et al. 2018
USA Cohort Age, gender, smoking Coronary artery
disease
RF
Li, Yan, et al.
(2019)
USA Observational
(survey)
Age, alcohol intake, education,
gender, income, medical
insurance, physical activities,
race, smoking, area-level social determinants
Stroke, coronary
artery disease
RF
Li, Xuemeng, et
al. (2019)
China Observational
(survey)
Age, alcohol intake, BMI,
gender, physical activities,
smoking
Stroke RF, NN, DT, other
ensemble, BN, NB
Karaolis, M., et
al. (2008)
Cyprus Observational
(EHR)
Age, gender, smoking Coronary artery
disease
DT
Raihan, M., et al.
(2019)
Bangladesh Observational
(EHR)
Age, gender, physical activities,
smoking, substance abuse,
Coronary artery
disease
NN
Martínez-García,
M., et al. (2018)
Mexico Retrospective
Cohort
Age, alcohol intake, BMI,
education, gender, income,
marital status, medical insurance,
smoking
Myocardial
infarction,
rehospitalization
SVM
Miller, C. S., et
al. (2014)
USA Cross-sectional
case-controlled
Age, BMI, gender, race, smoking Myocardial
infarction
DT
. CC-BY-NC 4.0 International licenseIt is made available under a perpetuity.
is the author/funder, who has granted medRxiv a license to display the preprint in(which was not certified by peer review)preprint The copyright holder for thisthis version posted September 13, 2020. ; https://doi.org/10.1101/2020.09.11.20192989doi: medRxiv preprint
![Page 24: Machine Learning for Integrating Social Determinants in ......2020/09/11 · 6 1New York University, School of Global Public Health, Department of Epidemiology 7 2New York University,](https://reader034.fdocuments.net/reader034/viewer/2022052004/601762c4817b0f2c4600df7f/html5/thumbnails/24.jpg)
24
Eswaran,
Chikkannan, et
al. (2012)
Malaysia Cross-sectional Age, BMI, gender, smoking Myocardial
infarction
NN, BN, other
Ross, Elsie
Gyang, et al.
(2016)
USA Prospective
observational
Age, alcohol intake, BMI,
education, income, marital status,
occupation, physical activities,
race, residence, smoking
Mortality, other RF, ridge
Saito, Hiroshi, et al. (2019)
Japan Observational (survey)
Age, gender, occupation, residence, smoking
Rehospitalization Lasso
Van Loo, Hanna
M., et al. (2014)
Netherlands Observational
study
Age, BMI, gender, smoking Mortality Lasso
Vedomske,
Michael A. et al.
(2013)
USA Observational
(EHR)
Age, gender, medical insurance,
race
Rehospitalization RF
Ayyagari,
Rajeev, et al.
(2014)
USA Retrospective
Cohort
Age, gender, race, smoking Stroke Lasso
Vistisen, Dorte,
et al. (2016)
Denmark Cohort Age, alcohol intake, BMI,
gender, physical activities,
smoking
Stroke RF, DT
Wu, Yafei, et al.
(2020)
China Prospective
observational
Age, alcohol intake, gender,
smoking
Stroke RF, SVM, lasso
Yu, Shipeng, et
al. (2015)
USA Retrospective
Cohort
Age, gender, marital status, race Rehospitalization SVM, lasso
Zhuang,
Xiaodong, et al. (2018)
China Observational
(survey)
Age, BMI, gender, income, race,
smoking, area-level social determinants
All types of CVD RF
Abbreviations used: NN: neural network, RF: random forest, SVM: support vector machine, DT: decision tree, NB: naïve Bayes, BN: Bayesian network, Lasso: lasso regression , Ridge: ridge regression, QDA: quadratic discriminant analysis
*“Other” algorithms used include: multilayer perceptron, maximum entropy, adversarial network, k-nearest neighbors, recursive partitioning, clustering,
quadratic discriminant, RBF
**Other ensemble: methods other than: Adaboost, Gradient boosting
. CC-BY-NC 4.0 International licenseIt is made available under a perpetuity.
is the author/funder, who has granted medRxiv a license to display the preprint in(which was not certified by peer review)preprint The copyright holder for thisthis version posted September 13, 2020. ; https://doi.org/10.1101/2020.09.11.20192989doi: medRxiv preprint