Reproducibility: failures & futures
David A. C. Beck, Chemical Engineering & eScience Institute
Advancing data-intensive discovery in all fields
Knowledge and solutions for a changing world
Be boundless
Reproducibility
• Can an experimental result be reproduced?
• Reproducibility comes in different flavors
– Same data, same analyses (Reproducibility)
– Similar data, same analyses (Replicability)
– Same data, similar analyses (Robustness)
– Others?
– Today I'll use "Reproducibility" to cover all of these
Reproducibility
• Can an experimental result be reproduced?
– Medical science
• Drug trial: Does a drug provide a benefit? Is it harmful?
• Is there a genetic association with a cancer?
– Economics
• Is austerity the best way to get a national economy out of recession?
• Is a $2 billion industrial plant a financially sensible investment?
Reproducibility
• Can an experimental result be reproduced?
– Social science
• Does an in-person conversation change views on marriage equality?
– Engineering
• Does a wastewater treatment strategy remove micro-pollutants down to a safe level?
Reproducibility
• Can an experimental result be reproduced?
– The above examples all have data science components
It isn't just academic science & engineering!
Reproducibility
• Can an experimental result be reproduced?
– Marketing
• Do loyalty programs alter buyer behavior?
• Does removing fields from a registration form increase user completion?
• Does a web page layout increase purchasing?
• Sidebar:
– To see how some of this works, check out this how-to:
» https://webdesign.tutsplus.com/articles/split-testing-with-google-analytics-experiments--webdesign-7879
• Other examples?
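A split test like the registration-form question comes down to comparing two proportions. The sketch below is a minimal example. It assumes a simple two-sided two-proportion z-test and uses made-up counts: 200 of 1,000 visitors complete the original form (A), and 240 of 1,000 complete the shorter form (B). Analytics tools such as the one in the how-to may use different statistical methods. The name `two_proportion_z_test` is a hypothetical helper.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(successes_a, n_a, successes_b, n_b):
    """Two-sided z-test for a difference between two completion rates.

    Uses the pooled proportion under the null hypothesis that both
    variants share the same true completion rate.
    """
    p_a = successes_a / n_a
    p_b = successes_b / n_b
    p_pool = (successes_a + successes_b) / (n_a + n_b)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical counts: original form (A) vs. form with fields removed (B)
z, p = two_proportion_z_test(successes_a=200, n_a=1000, successes_b=240, n_b=1000)
print(f"z = {z:.2f}, p = {p:.3f}")
```

A single test like this can be valid. The trouble described later in this talk starts when many such comparisons are run and only the "significant" ones are reported.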
Epic fail
Schadenfreude* parade
*a feeling of joy that comes from seeing or hearing about another person's troubles or failures. - Wikipedia
Epicfail
• In 2011, Bayer (pharmaceuticals) tried to replicate 67 important papers
– Oncology
– Women's health
– Cardiovascular medicine
Only about 21% were reproducible
Begley, C. G.; Ellis, L. M. (2012). "Drug development: Raise standards for preclinical cancer research". Nature 483 (7391): 531–533.
Epic fail, part 2
• In 2012, Amgen published a report in Nature
– Examined 53 landmark studies in cancer
6 of 53 (11%) were reproducible
Begley, C. G.; Ellis, L. M. (2012). "Drug development: Raise standards for preclinical cancer research". Nature 483 (7391): 531–533.
Epic fail, part 3
Primer: microarrays
Miller, M. B. and Y. W. Tang (2009). "Basic concepts of microarrays and potential applications in clinical microbiology." Clin Microbiol Rev 22(4): 611-633.
Epic fail, part 3
Ioannidis, J. P. A. et al. Repeatability of published microarray gene expression analyses. Nat Genet 41(2), Feb 2009.
Attempted to reproduce tables and figures from 18 papers published in Nature Genetics that used microarrays
Epic fails in medicine
• What are the repercussions of irreproducible results in medicine?
– Biotech companies
– Government
– People?
Epic fail, global impact
• Grab your way-back hat and put it on!
Epic fail, global impact
• 2010 paper by Reinhart & Rogoff, "Growth in a Time of Debt"
– "…high debt/GDP levels (90 percent and above) are associated with notably lower growth outcomes."
– Debt-to-GDP ratios over 90% have real GDP growth of -0.1%
– Seldom do countries "grow" their way out of debts.
Reinhart, Carmen M., and Kenneth S. Rogoff. 2010. "Growth in a Time of Debt." American Economic Review, 100 (2): 573-78.
Epic fail, global impact
• The paper was widely cited by
– Political parties
– Governments
– International lending agencies
• As evidence that austerity was the solution to the global recession
• It was even part of the 2012 US presidential election!
Epic fail, global impact
• UMass Amherst graduate student Thomas Herndon
– Tried to reproduce the results of the paper for a class: couldn't
– Requested the 'code' for the computations from R&R: got an Excel spreadsheet
– Found multiple errors
Reinhart, Carmen M., and Kenneth S. Rogoff. 2010. "Growth in a Time of Debt." American Economic Review, 100 (2): 573-78.
Thomas Herndon, Michael Ash & Robert Pollin, "Does High Public Debt Consistently Stifle Economic Growth? A Critique of Reinhart and Rogoff"
Epic fail, global impact
• UMass Amherst graduate student Thomas Herndon
– Found multiple errors
"Coding errors, selective exclusion of available data, and unconventional weighting of summary statistics lead to serious errors that inaccurately represent the relationship between public debt and GDP growth."
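The toy sketch below shows how two of the problems in this quote can move a headline number. It uses invented growth figures for four hypothetical countries; these are not Reinhart and Rogoff's data, and this is not their spreadsheet. The sketch compares two ways of averaging. One weights every country-year equally. The other averages per-country means first, so one bad year in a single country counts as much as four years elsewhere. The sketch also shows what happens when rows are left out of the calculation.

```python
from statistics import fmean

# HYPOTHETICAL real GDP growth (%) in high-debt years, keyed by country.
# These numbers are invented for illustration; they are NOT R&R's data.
growth = {
    "A": [-7.6],                  # a single bad year
    "B": [2.0, 2.5, 3.0, 2.5],
    "C": [1.0, 1.5, 2.0],
    "D": [3.0, 2.0],
}


def pooled_mean(data):
    """Weight every country-year equally."""
    return fmean([g for years in data.values() for g in years])


def country_weighted_mean(data):
    """Average each country first, then average the country means.

    Country A's one year counts as much as country B's four years.
    """
    return fmean([fmean(years) for years in data.values()])


# Exclusion: drop the last two countries, as a formula range
# that stops a few rows early would.
truncated = dict(list(growth.items())[:2])

print(round(pooled_mean(growth), 3))               # 1.19
print(round(country_weighted_mean(growth), 3))     # -0.275
print(round(country_weighted_mean(truncated), 3))  # -2.55
```

The same four made-up countries give a positive average, a slightly negative one, or a strongly negative one, depending only on weighting and on which rows are included. This is why sharing the actual computation, not just the result, matters.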
Epic fail, global impact
• Herndon fixed the errors and re-examined the claims
• Original claims
– Debt-to-GDP ratios over 90% have real GDP growth of -0.1%
– In a recession: austerity good, spending bad
• Modified claims
– Debt-to-GDP ratios over 90% have real GDP growth of 2.2%
– In a recession: spending good
Epic fail, global impact
• Grab your way-back hat and put it on!
Epic fail, global impact
• What effect did the incorrect R&R paper have?
Epic fail, part 4
http://www.nature.com/news/over-half-of-psychology-studies-fail-reproducibility-test-1.18248
Reproducibility
• Why do we care?
"Non-reproducible single occurrences are of no significance to science."
– Karl Popper
Popper, K. R. 1959. The logic of scientific discovery. Hutchinson, London, United Kingdom.
Science in crisis?
Baker, M. 1,500 scientists lift the lid on reproducibility. Nature 533, 452-454 (2016).
Reproducibility: Things are bad
Why is this happening?
• Social factors, e.g.
– Fraud, misconduct
– Pressure to publish
• p-hacking
• Poor experimental design
– Small effect size
– Small sample size
• Data not disclosed
• Methods not disclosed or properly described
– Software not available
Important but not Data Science related. WE ARE WORKING ON THESE!
p-hacking
• Do a study to test some hypothesis
– E.g. an apple a day keeps the Dr. away
• Use a p-value threshold of 0.05
– i.e. a 5% chance of seeing a difference at least as big as the one we observed by chance alone, if there is no real effect
• Perform 1,000s of statistical tests
• What happens?
~50 significant results per 1,000 tests by chance alone
1. Simmons, J.P., N.D. Nelson, and U. Simonsohn. 2011. False-positive psychology: Undisclosed flexibility in data collection and analysis allows presenting anything as significant. Psychological Science 22(11):1359-1366.
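The arithmetic on this slide (1,000 tests × 0.05 ≈ 50 false positives) can be checked with a small simulation. This sketch makes several assumptions. The outcome is made up and is measured on two groups of 30 people. Both groups are drawn from the same normal distribution, so the null hypothesis is true in every test. For simplicity, each test is a z-test with a known standard deviation. The exact count depends on the random seed, but it hovers around 50. The name `null_experiment` is a hypothetical helper.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean


def null_experiment(rng, n=30):
    """One 'apple a day' study in which apples truly do nothing.

    Both groups are drawn from the same distribution (mean 0, sd 1), so any
    difference between them is chance alone. Returns a two-sided p-value
    from a z-test that assumes the known sd of 1.
    """
    apples = [rng.gauss(0, 1) for _ in range(n)]
    no_apples = [rng.gauss(0, 1) for _ in range(n)]
    z = (fmean(apples) - fmean(no_apples)) / sqrt(2 / n)
    return 2 * (1 - NormalDist().cdf(abs(z)))


rng = random.Random(42)  # fixed seed so the run is repeatable
p_values = [null_experiment(rng) for _ in range(1000)]
false_positives = sum(p < 0.05 for p in p_values)
print(false_positives, "of 1000 tests 'significant' with no real effect")
```

Reporting only the tests that came out "significant" would present pure noise as a set of findings.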
p-hacking
• Test a very large number of hypotheses on a data set, searching for any statistically significant effect
• Goes by many names in different disciplines
– Multiple comparisons (1950s, most statisticians)
– File drawer problem (Rosenthal, 1979)
– Significance questing (Rothman and Boice, 1979)
– Data mining, dredging, torturing (Mills, 1993)
– Data snooping (White, 2000)
– Selective outcome reporting (Chan et al., 2004)
– Bias (Ioannidis, 2005)
– Hidden multiplicity (Berry, 2007)
– Specification searching (Leamer, 1978)
– p-hacking (Simmons et al., 2011)
https://www.nap.edu/read/21915/chapter/4#43
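Under the "multiple comparisons" name, one standard correction is Bonferroni, which divides the significance threshold by the number of tests. The slide does not mention it; it is added here as an illustration. The sketch below uses made-up p-values, and `bonferroni_significant` is a hypothetical helper name.

```python
def bonferroni_significant(p_values, alpha=0.05):
    """Flag results that survive a Bonferroni correction.

    Each test is held to alpha / m, where m is the number of tests. This keeps
    the chance of even one false positive across all m tests at or below alpha.
    """
    m = len(p_values)
    threshold = alpha / m
    return [p < threshold for p in p_values]


# Hypothetical p-values from five tests run on the same data set
p_values = [0.001, 0.004, 0.012, 0.030, 0.200]
naive = [p < 0.05 for p in p_values]
corrected = bonferroni_significant(p_values)
print(sum(naive), "significant before correction")     # 4
print(sum(corrected), "significant after correction")  # 2
```

Bonferroni is conservative: with thousands of tests it can hide real effects. It also only helps when every test that was run is counted, which is the point of preregistration.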
p-hacking
• Is this intentionally evil?
• Why isn't it misconduct?
• My opinion:
– Most times, probably not
– Reflects lack of understanding about hypothesis testing
p-hacking
• What is being done about it?
– Register the study beforehand ("preregistration")
– Let everyone know precisely what hypothesis is being tested before the data are collected
– Break free from the tyranny of the p-value
– Better statistics education
Poor experimental design
• I want to test the toxicity of my new fluorescent brown dye
– Feed some to 10 people
– Watch how long they live
10 subjects, day 0
Poor experimental design
• What are some problems with this experimental design?
– Control group?
WHAT DO YOU MEAN YOU FORGOT THE CONTROL?
10 subjects, no dye
Similar demographics
Poor experimental design
• Is it toxic?
10 subjects, day 0
10 subjects, day 1
*Average lifespan in the US is 78 years with a standard deviation of 15 years
Poor experimental design
• Is it toxic?
10 subjects, day 0
10 subjects, 50 years
*Average lifespan in the US is 78 years with a standard deviation of 15 years
Poor experimental design
• What are some problems with this experimental design?
– What is the effect size you want to be able to measure? E.g. how many years' difference?
– What sample size is required to see that effect?
• A small sample can show an effect due to chance alone
– Won't be reproducible!
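The effect-size and sample-size questions above can be answered before running the experiment. The sketch below is a minimal example with several assumptions. It compares mean lifespan between a dye group and a no-dye control with a two-sided test. It uses the normal approximation, a 0.05 significance level, and 80% power, which is a common but arbitrary choice. It also uses the slide's standard deviation of 15 years. The function names are hypothetical helpers.

```python
from math import ceil, sqrt
from statistics import NormalDist

SIGMA = 15.0  # sd of lifespan in years, from the slide's footnote


def z_sum(alpha=0.05, power=0.80):
    """z_(1 - alpha/2) + z_(power) for a two-sided test."""
    std_normal = NormalDist()
    return std_normal.inv_cdf(1 - alpha / 2) + std_normal.inv_cdf(power)


def sample_size_per_group(effect_years, sigma=SIGMA, alpha=0.05, power=0.80):
    """Subjects needed in EACH group (dye vs. no dye) to detect a difference
    in mean lifespan of `effect_years`, using the normal approximation
    n = 2 * (z_(1-alpha/2) + z_(power))^2 * sigma^2 / delta^2."""
    n = 2 * (z_sum(alpha, power) * sigma / effect_years) ** 2
    return ceil(n)


def min_detectable_effect(n_per_group, sigma=SIGMA, alpha=0.05, power=0.80):
    """Smallest difference in mean lifespan (years) detectable with n per group."""
    return z_sum(alpha, power) * sigma * sqrt(2 / n_per_group)


print(round(min_detectable_effect(10), 1))  # 18.8 years with 10 per group
print(sample_size_per_group(5))             # 142 per group to detect 5 years
```

Under these assumptions, 10 people per group can only reliably detect a difference of about 19 years in mean lifespan. A more plausible 5-year effect needs about 142 people per group.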
Poor experimental design
• What is being done about it?
– Better statistics education
– Replicate significant results that have small effect sizes with way more samples
SAMPLES
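The value of "way more samples" can be seen by simulating the dye experiment when the dye has no effect at all. The sketch below is a minimal example. It assumes normally distributed lifespans with the footnote's mean of 78 years and standard deviation of 15 years. The 10-year threshold is an arbitrary choice for illustration, and `chance_difference_rate` is a hypothetical helper.

```python
import random
from statistics import fmean


def chance_difference_rate(n_per_group, threshold_years=10.0, trials=1000, seed=0):
    """Fraction of no-effect experiments whose group means differ by at least
    `threshold_years`.

    Both groups' lifespans come from the same normal distribution
    (mean 78, sd 15 years), so the dye does nothing and every
    'effect' is chance alone.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        dye = [rng.gauss(78, 15) for _ in range(n_per_group)]
        control = [rng.gauss(78, 15) for _ in range(n_per_group)]
        if abs(fmean(dye) - fmean(control)) >= threshold_years:
            hits += 1
    return hits / trials


for n in (10, 200):
    print(n, "per group:", chance_difference_rate(n))
```

With 10 people per group, roughly one harmless-dye experiment in seven shows a 10-year gap by chance alone. With 200 per group, essentially none do.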
Data disclosure
• Data unavailable
– Lost or destroyed
– Streaming data too big to store
• Raw data not kept, only processed
• Data intentionally not shared
– By law (FERPA, HIPAA)
– Corporate data (e.g. Twitter, JSTOR)
– Some jerk just won't share
Data disclosure
• What is being done about it?
– Federal funding agencies now require data sharing
– Science journals require open data
– Deposit raw data as soon as they are collected
• Similar to preregistration
– Open data badges for researchers
– Data sharing repositories
• National Center for Biotechnology Information
• Dryad (20 GB limit, $100/10 GB beyond)
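Depositing raw data is more useful when others can check that what they download matches what was deposited. One common practice is to publish a checksum manifest alongside the data; the slide does not prescribe this, so treat it as an assumption. The sketch below is a minimal example using SHA-256 from Python's standard library. `sha256_manifest` is a hypothetical helper name, and the file and folder names are made up.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def sha256_manifest(data_dir):
    """Map each file under `data_dir` (relative path) to its SHA-256 hex digest.

    Publishing this alongside a data deposit lets anyone check later that the
    files they downloaded are byte-for-byte the raw data that were deposited.
    """
    root = Path(data_dir)
    manifest = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            digest = hashlib.sha256()
            with path.open("rb") as f:
                # Read in 1 MiB chunks so large raw files need not fit in memory
                for chunk in iter(lambda: f.read(1 << 20), b""):
                    digest.update(chunk)
            manifest[path.relative_to(root).as_posix()] = digest.hexdigest()
    return manifest


# Example: build a tiny (made-up) raw-data folder and print its manifest
with tempfile.TemporaryDirectory() as tmp:
    raw = Path(tmp) / "raw"
    raw.mkdir()
    (raw / "day0.csv").write_bytes(b"subject,alive\n1,yes\n")
    print(json.dumps(sha256_manifest(tmp), indent=2))
```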
Methods
• Poorly written methods
– Steps missing
• Intentional methods omissions
– To protect a monopoly on an experimental procedure
• The fix:
– Better peer review in science
– Better communication skills education in business
Software
• Software unavailable
– Why?
• What are some other software issues?
– Un-runnable, i.e. broken
– Not documented
– Dependencies not known or given
– Hardware constraints
Software
• What is being done about it?
– Use open source software
– Virtual environments
• Use something that can FREEZE the state of the software and hardware
• Docker images
• Amazon Machine Images (AMI)
• Virtual machines generally
– Educating scientists in software engineering
• Version control, documentation, testing, …
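Freezing an environment starts with knowing exactly what is in it. The sketch below is a minimal example and assumes the analysis is written in Python. `environment_snapshot` is a hypothetical helper built only on the standard library (`platform`, `importlib.metadata`). Its output complements a Docker image or AMI rather than replacing one.

```python
import json
import platform
import sys
from importlib import metadata


def environment_snapshot():
    """Record the interpreter, OS, and installed package versions.

    This does not freeze anything by itself; it writes down what a Docker
    image or VM would need to recreate, so the analysis can be re-run later.
    """
    packages = {}
    for dist in metadata.distributions():
        name = dist.metadata["Name"]
        if name:  # skip broken installs that report no name
            packages[name] = dist.version
    return {
        "python": platform.python_version(),
        "implementation": platform.python_implementation(),
        "os": platform.platform(),
        "machine": platform.machine(),
        "executable": sys.executable,
        "packages": dict(sorted(packages.items(), key=lambda kv: kv[0].lower())),
    }


if __name__ == "__main__":
    print(json.dumps(environment_snapshot(), indent=2))
```

Committing the resulting JSON to version control next to the analysis code records the dependency list that, as the previous slide notes, is often missing.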
Resources
• eScience Institute Reproducibility Group
– http://uwescience.github.io/reproducible/
• Berkeley Institute for Data Science Repro Stuff
– https://bids.berkeley.edu/working-groups/reproducibility-and-open-science
• Center for Open Science
– https://cos.io
• Coursera from JHU
– https://www.coursera.org/learn/reproducible-research
• Other links in this presentation
Thank you!
• See you next week for the last seminar!
• CSE491 folks:
– Don't forget to take the quiz!