ORBITA and coronary stents: A case study in the analysis and reporting of clinical trials

Andrew Gelman, John Carlin and Brahmajee K Nallamothu

19 Mar 2019

Department of Statistics and Political Science, Columbia University, New York City, NY, United States (Andrew Gelman, professor); Clinical Epidemiology & Biostatistics, Murdoch Children’s Research Institute, Melbourne School of Population and Global Health and Department of Paediatrics, University of Melbourne, Melbourne, Australia (John Carlin, professor); Department of Internal Medicine, University of Michigan Medical School, Ann Arbor, MI, United States (Brahmajee K Nallamothu, professor)

Correspondence to: Brahmajee K Nallamothu [email protected]

Acknowledgements: We thank Doug Helmreich for bringing this example to our attention, Shira Mitchell for helpful comments, and the Office of Naval Research, Defense Advanced Research Project Agency, and the National Institutes of Health for partial support of this work.

Competing interests: Dr. Gelman and Dr. Carlin report no competing interests. Dr. Nallamothu is an interventional cardiologist and Editor-in-Chief of a journal of the American Heart Association but otherwise has no competing interests.

Word Count: 3078


1. Introduction

Al-Lamee et al. (2017) report results from a randomized controlled trial of percutaneous coronary intervention using coronary stents for stable angina. The study, called ORBITA (Objective Randomised Blinded Investigation With Optimal Medical Therapy of Angioplasty in Stable Angina), included approximately 200 patients and was notable for being a blinded experiment in which half the patients received stents and half received a placebo procedure in which a sham operation was performed. In follow-up, patients were asked to guess their treatment and of those who were willing to guess only 56% guessed correctly, indicating that the blinding was largely successful.

The summary finding from the study was that stenting did not “increase exercise time by more than the effect of a placebo procedure” with the mean difference in this primary outcome between treatment and control groups reported as 16.6 seconds with a standard error of 9.8 (95% confidence interval, −8.9 to +42.0 s) and a p-value of 0.20. In the New York Times, Kolata (2017) reported the finding as “unbelievable,” remarking that it “stunned leading cardiologists by countering decades of clinical experience.” Indeed, one of us (BKN) was quoted as being humbled by the finding, as many cardiologists had expected a positive result. On the other hand, Kolata noted, “there have long been questions about [stents’] effectiveness.” At the very least, the willingness of doctors and patients to participate in a controlled trial with a placebo procedure suggests some degree of existing skepticism and clinical equipoise.

ORBITA was a landmark trial due to its innovative use of blinding for a surgical procedure. However, substantial questions remain regarding the role of stenting in stable angina. It is a well-known statistical fallacy to take a result that is not statistically significant and report it as zero, as was essentially done here based on the p-value of 0.20 for the primary outcome. Had this comparison happened to produce a p-value of 0.04, would the headline have been, “‘Believable’: Heart Stents Indeed Ease Chest Pain”? A lot of certainty seems to be hanging on a small bit of data.

The purpose of this paper is to take a closer look at the lack of statistical significance in ORBITA and the larger questions it raises about statistical analyses, statistically based decision making and the reporting of clinical trials. This review of ORBITA is particularly timely in the context of the widely publicized statement released by the American Statistical Association that cautioned against the use of sharp thresholds for the interpretation of p-values (Wasserstein and Lazar, 2016). We end by offering potential recommendations to improve reporting.


2. Statistical analysis of the ORBITA trial

Adjusting for baseline differences. In ORBITA, exercise time in a standardized treadmill test (the primary outcome in the preregistered design) increased on average by 28.4 s in the treatment group compared to an increase of only 11.8 s in the control group. As noted above, this difference was not statistically significant at a significance threshold of 0.05. Hence, following conventional rules of scientific reporting it was treated as zero, an instance of the regrettably common statistical fallacy of presenting non-statistically-significant results as confirmation of the null hypothesis of no difference.

However, the estimate using gain in exercise time does not make full use of the data that were available on differences between the comparison groups at baseline (Vickers and Altman, 2001, Harrell, 2017a). As can be seen in the Table, the treatment and placebo groups differ in their pre-treatment levels of exercise time, with mean values of 528.0 and 490.0 s, respectively. This sort of difference is fine (randomization assures balance only in expectation) but it is important to adjust for this discrepancy in estimating the treatment effect. In the published paper, the adjustment was performed by simple subtraction of the pre-treatment values:

Gain score estimated effect: $(y_{\text{post}} - y_{\text{pre}})_T - (y_{\text{post}} - y_{\text{pre}})_C$, (1)

But this over-corrects for differences in pre-test scores, because of the familiar phenomenon of “regression to the mean”: just from natural variation, we would expect patients with lower scores at baseline to improve, relative to the average, and patients with higher scores to regress downward. The optimal linear estimate of the treatment effect is actually:

Adjusted estimate: $(y_{\text{post}} - \beta y_{\text{pre}})_T - (y_{\text{post}} - \beta y_{\text{pre}})_C$, (2)

where $\beta$ is the coefficient of $y_{\text{pre}}$ in a least-squares regression of $y_{\text{post}}$ on $y_{\text{pre}}$, also controlling for the treatment indicator. The estimate in (1) is a special case of the regression estimate (2) corresponding to $\beta = 1$. Given that the pre-test and post-test measurements have nearly identical variances (as can be seen in the Table), we can anticipate that the optimal $\beta$ will be less than 1, which will reduce the correction for difference in pre-test and thus increase the estimated treatment effect while also decreasing the standard error. As a result, an adjusted analysis of these data would be expected to produce a lower p-value.

The adjusted regression analysis can be done using the information available in the Table, as explained in detail in Box 1. The p-value from this adjusted analysis is 0.09: as expected, lower than the p = 0.20 from the unadjusted analysis.
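As a rough check on the direction and size of this adjustment, the following Python sketch plugs the summary values quoted above (baseline means of 528.0 and 490.0 s, mean gains of 28.4 and 11.8 s) into estimates (1) and (2). It assumes the rounded pooled slope of 0.87 reported in Box 1. The function name `treatment_effect` is ours, not from the paper, and because the inputs are rounded the result only approximates the adjusted estimate of 21.3 s reported in Box 1.

```python
def treatment_effect(pre_t, gain_t, pre_c, gain_c, beta):
    """Difference in (post - beta * pre) between treatment and control means.

    beta = 1 reproduces the gain-score estimate (1); beta < 1 gives the
    regression-adjusted estimate (2).
    """
    post_t = pre_t + gain_t
    post_c = pre_c + gain_c
    return (post_t - beta * pre_t) - (post_c - beta * pre_c)


# Group means quoted in the text, in seconds (T = stent, C = placebo).
PRE_T, GAIN_T = 528.0, 28.4
PRE_C, GAIN_C = 490.0, 11.8
BETA = 0.87  # assumption: rounded pooled slope taken from Box 1

gain_score = treatment_effect(PRE_T, GAIN_T, PRE_C, GAIN_C, beta=1.0)
adjusted = treatment_effect(PRE_T, GAIN_T, PRE_C, GAIN_C, beta=BETA)

print(f"gain-score estimate: {gain_score:.1f} s")
print(f"adjusted estimate:   {adjusted:.1f} s")
```

With these rounded inputs the sketch gives 16.6 s for a slope of 1 and roughly 21.5 s for a slope of 0.87, the same direction and approximate size as the shift described in Box 1. The standard error cannot be recovered from the means alone, so it is not computed here.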

Alternative reporting. Despite moving closer to the conventional 0.05 threshold, the p-value of 0.09 remains on the side of the conventional level of significance where one would not reject the null hypothesis. A potential blockbuster reversal with an adjusted analysis (“Statistical Sleuths Turn Reported Null Effect into a Statistically Significant Effect”) does not quite materialize.

Yet within different conventions for scientific reporting, this experiment could have been presented as positive evidence in favor of stents. In some settings, a p-value of 0.09 is considered to be statistically significant. For example, in a recent social science experiment published in the Proceedings of the National Academy of Sciences, Sands (2017) presented a causal effect based on a p-value of less than 0.10, and this was enough for publication in a top journal and coverage in the popular press; the media outlet Vox, for example, mentioned that work uncritically, without any concern regarding significance levels (Resnick, 2017). By contrast, Vox reported that ORBITA showed stents to be “dubious treatments,” a prime example of the “epidemic of unnecessary medical treatments” (Belluz, 2017). Had Al-Lamee et al. performed the adjusted analysis with their data and published in PNAS rather than the Lancet, could they have confidently reported a causal effect of stents on exercise time?

One also could consider other options. For example, one could just take the summaries from the Table and report them as follows: With stents, there is a statistically significant improvement of 28.4 seconds (with a 95% confidence interval that clearly excludes zero); with placebo, there is no statistically significant change. Thus, the data show that stents work and placebo has no effect. Such a conclusion would be inappropriate, as it would be making the error of comparing significance to non-significance (Gelman and Stern, 2006). But this error appears in published papers all the time, including in top journals, so it represents another way the data could have been reported (Bland and Altman, 2015, Allison et al., 2016).

Our point here is not at all to suggest that Al-Lamee et al. engaged in reverse “p-hacking” (Simmons, Nelson, and Simonsohn, 2011), choosing an analysis that produced an explosive null result. In fact, the authors should be congratulated for pre-registering their study and publishing their protocol prior to performing their analyses. Rather, we wish to emphasize the flexibility inherent both in data analysis and reporting, even in the case of a clean randomized experiment. We are pointing out the potential fragility of the stents-didn’t-work story in this case. Existing data could easily have been presented as a success for stents compared to placebo by authors who were aiming for that narrative and performing more or less reasonable analyses.

Fragility of the findings. How sensitive were the results to slight changes in the data? To better understand this critical point, one can perform a simple bootstrap analysis, computing the results that would have been obtained from reanalyzing the data 1000 times, each time resampling patients from the existing experiment with replacement (Efron, 1979). As raw data were not available to us, we approximated using the normal distribution based on the observed z-score of 1.7. The result was that, in 40% of the simulations, stents outperformed placebo at a traditional level of statistical significance. This is not to say that stents really are better than placebo; the data also appear consistent with a null effect. The take-home point of this experiment is that the results could easily have gone “the other way,” when reporting is forced into a binary classification of statistical significance, for many different reasons.
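The normal approximation described above can be sketched in a few lines of Python. This is our own illustration, not the authors' code. It assumes each replicated z-score is drawn from a normal distribution centered at the observed 1.7 with unit standard deviation, counts a replication as significant when z exceeds 1.96 (the two-sided 5% cutoff, in the direction favoring stents), and uses more draws than the 1000 mentioned in the text to reduce simulation noise. The seed and variable names are arbitrary.

```python
import random
from statistics import NormalDist

OBSERVED_Z = 1.7   # z-score of the adjusted estimate, from the text
CUTOFF = 1.96      # conventional two-sided 5% threshold
N_DRAWS = 100_000  # assumption: more draws than the paper's 1000

# Closed form: P(z* > 1.96) when z* ~ Normal(1.7, 1).
analytic = 1 - NormalDist(mu=OBSERVED_Z, sigma=1.0).cdf(CUTOFF)

# Simulation version of the same calculation.
rng = random.Random(2019)
hits = sum(rng.gauss(OBSERVED_Z, 1.0) > CUTOFF for _ in range(N_DRAWS))
simulated = hits / N_DRAWS

print(f"analytic share significant:  {analytic:.2f}")
print(f"simulated share significant: {simulated:.2f}")
```

Both numbers come out close to 0.40, matching the share quoted in the text. A bootstrap on the raw patient data would not need the normality assumption, but those data are not available here.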

3. Design of the trial and clinical significance

In a justification for their study design and sample size, Al-Lamee et al. (2017) wrote: “Evidence from placebo-controlled randomised controlled trials shows that single antianginal therapies provide improvements in exercise time of 48–55 s... Given the previous evidence, ORBITA was conservatively designed to be able to detect an effect size of 30 s.” The estimated effect of 21 s with standard error 12 s is consistent with the “conservative” effect size estimate of 30 s given in the published article. So although the experimental results are consistent with a null effect, they are even more consistent with a small positive effect.

One might ask, however, about the clinical significance of such a treatment effect, which we can discuss without reference to p-values or statistical significance. For simplicity, suppose we take the point estimate from the data at face value. How should we think about an increase in average exercise time of 21 s? One way to conceptualize this is in terms of percentiles. The data show a pre-randomization distribution (averaging the treatment and control groups) with a mean of 509 s and a standard deviation of 188 s. Assuming a normal approximation, an increase in exercise time of 21 s from 509 to 530 would take a patient from the 50th percentile to the 54th percentile of the distribution. Looked at that way, it would be hard to get excited about this effect size, even if it were a real population shift.
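The percentile arithmetic can be checked directly. The sketch below assumes, as the text does, that baseline exercise time is approximately normal with mean 509 s and standard deviation 188 s; the variable names are ours.

```python
from statistics import NormalDist

# Pre-randomization exercise time in seconds, normal approximation from the text.
baseline = NormalDist(mu=509, sigma=188)
shift = 21  # point estimate of the treatment effect, in seconds

# Percentile reached by a patient who starts at the mean and gains `shift` seconds.
percentile = 100 * baseline.cdf(509 + shift)

print(f"new percentile: {percentile:.0f}")
```

The shift of 21 s is about 0.11 standard deviations, which is why the move is only from the 50th to roughly the 54th percentile.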

Beyond exercise time, there were other signals from ORBITA that seemed to suggest consistent improvements in the physiological parameter of ischemia through endpoints such as fractional flow reserve, instantaneous wave-free ratio, and stress echo. Actually, findings from the stress echo highlight a potentially important avenue into an alternative presentation of these results. There is no question that some physiological changes are being made by stents, with very large and highly statistically significant (p < 0.001) effects seen on echo measures. As is often the case, the null hypothesis that these physical changes should make absolutely zero difference to any downstream clinical outcomes seems far-fetched. Thus, the sensible question to ask is “How large are the clinical differences observed?”, not “How surprising is the observed mean difference under a [spurious] null hypothesis?” The simple textbook way to tackle this question is to report confidence intervals (CIs) around the mean differences and not to focus on whether the intervals happen to include zero. The fact that the standard 95% CI for the primary outcome comfortably includes the target effect size of 30 s suggests this value should be no more “rejected” than the null value. Furthermore, without the longitudinal data to observe the outcomes that matter most to patients (health and length of life), much remains uncertain.

The larger question has to be about balancing the long-term benefits of stents with risks of the operation. It does not seem reasonable for a person to risk life and health by submitting to a surgical procedure just for a potential benefit of 21 seconds of exercise time on a standardized treadmill test, or even a hypothesized larger benefit of 50 seconds, which would still only represent a 10% improvement for an average patient in this study. Yet maybe a 5–10% increase is consequential in this case as it could improve quality of life for a patient outside of this artificial setting. Perhaps this small gain in exercise time is associated with the need for fewer medications, fewer functional limitations or greater mobility. If so, however, one might postulate this gain would have been apparent in assessments of angina burden, and it was not.

Part of the bigger concern here is that these patients were already doing pretty well on medications; that is, they already had a low symptom frequency before stenting. For example, angina frequency as measured by the Seattle Angina Questionnaire was 63.2 after optimizing medications and before stenting in the treatment group. This roughly translates as “monthly” angina (John Spertus, personal communication). How does a study with a follow-up of just 6 weeks expect to improve an outcome that happens this infrequently? In fact, one of the great debates surrounding ORBITA is that those who discount the trial suggest it enrolled patients who typically do not receive stents in routine practice. Those who believe ORBITA is a game-changer argue that these less symptomatic patients actually make up a large proportion of those receiving stents, and that is why we have such a large problem with their overuse.


Finally, are stents really being given to patients with stable angina just to improve fitness or to reduce symptoms? Or is there a continued expectation that stents have long-term benefits for patients, despite earlier data from studies like the Clinical Outcomes Utilizing Revascularization and Aggressive Drug Evaluation (COURAGE) study (Boden, 2007)? This would seem to be the key question, in which case the short-term effects, or lack thereof, found in the ORBITA study are largely irrelevant. Other larger trials, such as International Study of Comparative Health Effectiveness With Medical and Invasive Approaches (ISCHEMIA, see: https://clinicaltrials.gov/ct2/show/NCT01471522) are considering this more fundamental question but will not have a placebo procedure.

4. Recommendations for statistical reporting of trials

The search for better medical care is an incremental process, with incomplete evidence accumulating over time. There is unfortunately a fundamental incompatibility between that core idea and the common practice, both in medical journals and the news media, of up-or-down reporting of individual studies based on statistical significance. We offer some recommendations summarized in Box 2 that we believe will be helpful to authors and editors moving forward.

At this point it’s not clear how best to incorporate this recent experiment into routine practice despite its novel and provocative study design, so the forced reporting of the primary outcome as “positive” or “negative” is unhelpful. A reanalysis of the summary data from Al-Lamee et al. (2017) reveals a stronger estimated effect that is closer to the conventional boundary of statistical significance, indicating that the study could rather easily have generated and reported evidence in favor of, rather than against, the effectiveness of stents for patients with stable angina. And from our brief flurry of excitement over the possibility that a simple reanalysis could change the significance level, we are again reminded of the sensitivity of headline conclusions to decisions in statistical analysis. In any case, though, the observed increases in exercise time, even if statistically significant, do not appear at first glance to be of much clinical importance, compared to the much more relevant long-term health outcomes that remain uncertain.

In the design, evaluation, and reporting of experimental studies, there is a norm of focusing on the statistical significance of a primary outcome; in this case, change in average exercise time on a standardized treadmill test. In general, the resulting conclusions will be fragile because p-values are extremely noisy unless the underlying effect is huge. An experiment may be designed to have 80% power, but this doesn’t eliminate the fragility, as illustrated by our bootstrap re-analysis. Such a power calculation is also often conditional on an overestimated effect size (Schulz and Grimes, 2005, Gelman, 2018) and does not address the important question of variation in treatment effects. Examination of the Lancet paper and its reception in the news media suggests that it exhibits a classic case of “significantitis” or “dichotomania” (Greenland, 2017), with frequent repetition of phrases such as “there was no significant difference.” We suggest that the phrase used by these authors, “We deemed a p value less than 0.05 to be significant,” should be strongly discouraged, rather than actively demanded as is currently the case by many journal editors. To their credit, the ORBITA authors themselves have recognized these critical issues (see https://twitter.com/ProfDFrancis/status/952008644018753536).

In the case of stents, an important disconnect appears between the findings emphasized in the recent study, however presented, and the larger context of treatments for heart disease. From a statistical perspective, this appears to reflect a problem with the framing of clinical trials as attempts to discover whether a treatment has a statistically significant effect, commonly misinterpreted to be equivalent to a real (non-zero) population mean difference. Power calculations are used in an attempt to assure stable estimates and a good chance of the experiment being “successful”, although within these constraints there can be a push toward convenience rather than relevance of outcome measures, which is perhaps an inevitable compromise. ORBITA shows us the confusion that arises when a treatment is reported as a success or failure in statistical terms that are divorced from clinical context.

ORBITA was never meant to be definitive in a broad sense; it was designed to find a statistically significant physiological effect of stenting on mean exercise time, without clarity on the clinical relevance of anticipated effects on this outcome measure. Indeed, a likely reason the study was limited in its size and in its design around these surrogate outcomes is that this is all that could have passed an ethical board given the novelty of the placebo procedure in this setting. Further background on these topics from Darrel Francis, the senior author on the study, appears in Harrell (2017b). Beyond immediate news reports, one positive impact of ORBITA is that bigger trials of stenting with placebo procedures are now much more likely, with a more definitive set of measured outcomes that are meaningful for patients.

We don’t see any easy answers here (long-term outcomes would require a long-term study, after all, and clinical decisions need to be made right away, every day) but perhaps we can use our examination of this particular study and its reporting to suggest practical directions for improvement in heart treatment studies and in the design and reporting of clinical trials more generally.


References

Al-Lamee, R., Thompson, D., Dehbi, H. M., Sen, S., Tang, K., Davies, J., Keeble, T., Mielewczik, M., Kaprielian, R., Malik, I. S., Nijjer, S. S., Petraco, R., Cook, C., Ahmad, Y., Howard, J., Baker, C., Sharp, A., Gerber, R., Talwar, S., Assomull, R., Mayet, J., Wensel, R., Collier, D., Shun-Shin, M., Thom, S. A., Davies, J. E., and Francis, D. P. (2017). Percutaneous coronary intervention in stable angina (ORBITA): a double-blind, randomised controlled trial. Lancet. http://dx.doi.org/10.1016/S0140-6736(17)32714-9

Allison, D. B., Brown, A. W., George, B. J., and Kaiser, K. A. (2016). Reproducibility: A tragedy of errors. Nature 530, 27–29. doi:10.1038/530027a. PubMed PMID: 26842041; PubMed Central PMCID: PMC4831566.

American College of Cardiology (2017). ORBITA: First placebo-controlled randomized trial of PCI in CAD patients. ACC News, 2 Nov. http://www.acc.org/latest-in-cardiology/articles/2017/10/27/13/34/thurs-1150am-orbita-tct-2017

Belluz, J. (2017). Thousands of heart patients get stents that may do more harm than good. Vox.com, 6 Nov. https://www.vox.com/science-and-health/2017/11/3/16599072/stent-chest-pain-treatment-angina-not-effective

Bland, J. M., and Altman, D. G. (2015). Best (but oft forgotten) practices: Testing for treatment effects in randomized trials by separate analyses of changes from baseline in each group is a misleading approach. American Journal of Clinical Nutrition 102, 991–994. doi:10.3945/ajcn.115.119768. Epub 2015 Sep 9. PubMed PMID: 26354536.

Boden, W. E., O'Rourke, R. A., Teo, K. K., Hartigan, P. M., Maron, D. J., Kostuk, W. J., Knudtson, M., Dada, M., Casperson, P., Harris, C. L., Chaitman, B. R., Shaw, L., Gosselin, G., Nawaz, S., Title, L. M., Gau, G., Blaustein, A. S., Booth, D. C., Bates, E. R., Spertus, J. A., Berman, D. S., Mancini, G. B., and Weintraub, W. S.; COURAGE Trial Research Group. (2007). Optimal medical therapy with or without PCI for stable coronary disease. New England Journal of Medicine 356, 1503–16. Epub 2007 Mar 26.

Efron, B. (1979). Bootstrap methods: Another look at the jackknife. Annals of Statistics 7, 1–26.

Gelman, A. (2004). Treatment effects in before-after data. In Applied Bayesian Modeling and Causal Inference from Incomplete-data Perspectives, ed. A. Gelman and X. L. Meng, chapter 18. New York: Wiley.

Gelman, A. (2018). The failure of null hypothesis significance testing when studying incremental changes, and what to do about it. Personality and Social Psychology Bulletin 44, 16–23.

Gelman, A., and Carlin, J. B. (2014). Beyond power calculations: Assessing Type S (sign) and Type M (magnitude) errors. Perspectives on Psychological Science 9, 641–651.

Gelman, A., and Stern, H. S. (2006). The difference between “significant” and “not significant” is not itself statistically significant. American Statistician 60, 328–331.

Greenland, S. (2017). The need for cognitive science in methodology. American Journal of Epidemiology 186, 639–645.

Harrell, F. (2017a). Statistical errors in the medical literature. Statistical Thinking blog, 8 Apr. http://www.fharrell.com/2017/04/statistical-errors-in-medical-literature.html

Harrell, F. (2017b). Statistical criticism is easy; I need to remember that real people are involved. Statistical Thinking blog, 5 Nov. http://www.fharrell.com/2017/11/statistical-criticism-is-easy-i-need-to.html

Kolata, G. (2017). ‘Unbelievable’: Heart stents fail to ease chest pain. New York Times, 2 Nov. https://www.nytimes.com/2017/11/02/health/heart-disease-stents.html

Resnick, B. (2017). White fear of demographic change is a powerful psychological force. Vox.com, 28 Jan. https://www.vox.com/science-and-health/2017/1/26/14340542/white-fear-trump-psychology-minority-majority

Sands, M. L. (2017). Exposure to inequality affects support for redistribution. Proceedings of the National Academy of Sciences 114, 663–668.

Schulz, K. F., and Grimes, D. A. (2005). Sample size calculations in randomised trials: Mandatory and mystical. Lancet 365, 1348–1353.

Simmons, J., Nelson, L., and Simonsohn, U. (2011). False-positive psychology: Undisclosed flexibility in data collection and analysis allows presenting anything as significant. Psychological Science 22, 1359–1366.

Vickers, A. J., and Altman, D. G. (2001). Analysing controlled trials with baseline and follow up measurements. British Medical Journal 323, 1123–1124.

Wasserstein, R. L., and Lazar, N. A. (2016). The ASA's statement on p-values: Context, process, and purpose. American Statistician 70, 129–133.


Table. Summary data comparing stents to placebo, from Table 3 of Al-Lamee et al. (2017).


Box 1. Using the reported data summaries to obtain the analysis controlling for the pre-treatment measure

For each of the treatment and control groups, we are given the standard deviation of the pre-test measurements, the standard deviation of the post-test measurements, and the standard deviation of their difference, which can be obtained by taking the width of the confidence interval for the difference, dividing by 4 to get the standard error of the difference, and then multiplying by $\sqrt{n}$ to get back to the standard deviation.

Then we use the rule, $\text{sd}(y_{\text{post}} - y_{\text{pre}}) = \sqrt{\text{sd}(y_{\text{pre}})^2 + \text{sd}(y_{\text{post}})^2 - 2\rho\,\text{sd}(y_{\text{pre}})\,\text{sd}(y_{\text{post}})}$, and solve for $\rho$, the correlation between before and after measurements within each group. The result in this case is $\rho = 0.88$ within each group. We then convert the correlation to a regression coefficient of $y_{\text{post}}$ on $y_{\text{pre}}$ using the well-known formula, $\beta = \rho\,\text{sd}(y_{\text{post}})/\text{sd}(y_{\text{pre}})$, which yields $\beta = 0.88$ for the treated and $\beta = 0.86$ for the control group. If these two coefficients were much different from each other, we might want to consider an interaction model (Gelman, 2004), but here they are close enough that we simply take their average.

We use the average, $\beta = 0.87$, in (2) and get an estimate for the adjusted mean difference of 21.3 (indeed, quite a bit higher than the reported difference in gain scores of 16.6) with a standard error of 12.5 (very slightly lower than 12.7, the standard error of the difference in gain scores) and 95% CI −3.2 to 45.8 s. The estimate is not quite two standard errors away from zero: the z-score is 1.7, and the p-value is 0.09.
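The algebra in Box 1 can be written as a small set of helper functions. This is a sketch under stated assumptions: the function names are ours, the standard deviations in the round-trip check are hypothetical placeholders (the Table values are not reproduced here), and only the final z-score and p-value use numbers quoted in the box.

```python
from math import sqrt
from statistics import NormalDist


def sd_from_ci(lower, upper, n):
    """CI width / 4 approximates the standard error; times sqrt(n) gives the sd."""
    return (upper - lower) / 4 * sqrt(n)


def correlation_from_sds(sd_pre, sd_post, sd_diff):
    """Solve sd_diff^2 = sd_pre^2 + sd_post^2 - 2*rho*sd_pre*sd_post for rho."""
    return (sd_pre**2 + sd_post**2 - sd_diff**2) / (2 * sd_pre * sd_post)


def slope_from_correlation(rho, sd_pre, sd_post):
    """Regression coefficient of the post-test on the pre-test measurement."""
    return rho * sd_post / sd_pre


def z_and_p(estimate, std_error):
    """z-score and two-sided p-value under a normal approximation."""
    z = estimate / std_error
    return z, 2 * (1 - NormalDist().cdf(abs(z)))


# Round-trip check with HYPOTHETICAL standard deviations (not the Table values).
sd_pre, sd_post, rho_true = 180.0, 190.0, 0.88
sd_diff = sqrt(sd_pre**2 + sd_post**2 - 2 * rho_true * sd_pre * sd_post)
rho = correlation_from_sds(sd_pre, sd_post, sd_diff)
beta = slope_from_correlation(rho, sd_pre, sd_post)

# Adjusted estimate and standard error quoted in Box 1.
z, p = z_and_p(21.3, 12.5)
print(f"rho = {rho:.2f}, z = {z:.1f}, p = {p:.2f}")
```

Feeding the actual Table 3 summaries into `sd_from_ci`, `correlation_from_sds` and `slope_from_correlation` for each arm should reproduce the correlations and slopes quoted in the box, up to rounding.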


Box 2. Recommendations for Analyses and Reporting

Analyses
1. Baseline adjustment for differences: should be prespecified for the primary analysis where strong confounders such as a baseline measure of the outcome are available.
2. Be aware of fragility of inferences. Fragility can be demonstrated using the sampling or posterior distribution as estimated using mathematical modeling, bootstrap simulation, or Bayesian analysis.

Reporting
1. Avoid use of sharp thresholds for p-values and thus eliminate the term “statistical significance” from the reporting of results.
2. Consider the full range (upper and lower ends) of interval estimates for important outcomes and their potential inclusion of clinically important differences.
3. Consider the potential for individual variability in responses (heterogeneity of treatment effects) and not just mean differences.