Download - ORBITA and coronary stents: A case study in the analysis ...gelman/research/published/Stents...2019/03/25  · ORBITA and coronary stents: A case study in the analysis and reporting


ORBITA and coronary stents:

A case study in the analysis and reporting of clinical trials

Andrew Gelman, John B. Carlin and Brahmajee K Nallamothu

25 Mar 2019

Department of Statistics and Political Science, Columbia University, New York City, NY, United States (Andrew Gelman, professor); Clinical Epidemiology & Biostatistics, Murdoch Children’s Research Institute, Melbourne School of Population and Global Health and Department of Paediatrics, University of Melbourne, Melbourne, Australia (John Carlin, professor); Department of Internal Medicine, University of Michigan Medical School, Ann Arbor, MI, United States (Brahmajee K Nallamothu, professor); Correspondence to: [email protected]

Acknowledgements: We thank Doug Helmreich for bringing this example to our attention, Shira Mitchell for helpful comments, and the Office of Naval Research, Defense Advanced Research Projects Agency, and the National Institutes of Health for partial support of this work.

Competing interests: Dr. Gelman and Dr. Carlin report no competing interests. Dr. Nallamothu is an interventional cardiologist and Editor-in-Chief of a journal of the American Heart Association but otherwise has no competing interests.

Word Count: 3078


1. Introduction

Al-Lamee et al. (2017) report results from a randomized controlled trial of percutaneous coronary intervention using coronary stents for stable angina. The study, called ORBITA (Objective Randomised Blinded Investigation With Optimal Medical Therapy of Angioplasty in Stable Angina), included approximately 200 patients and was notable for being a blinded experiment in which half the patients received stents and half received a placebo procedure in which a sham operation was performed. In follow-up, patients were asked to guess their treatment, and of those who were willing to guess only 56% guessed correctly, indicating that the blinding was largely successful.

The summary finding from the study was that stenting did not “increase exercise time by more than the effect of a placebo procedure,” with the mean difference in this primary outcome between treatment and control groups reported as 16.6 seconds with a standard error of 9.8 (95% confidence interval, −8.9 to +42.0 s) and a p-value of 0.20. In the New York Times, Kolata (2017) reported the finding as “unbelievable,” remarking that it “stunned leading cardiologists by countering decades of clinical experience.” Indeed, one of us (BKN) was quoted as being humbled by the finding, as many cardiologists had expected a positive result. On the other hand, Kolata noted, “there have long been questions about [stents’] effectiveness.” At the very least, the willingness of doctors and patients to participate in a controlled trial with a placebo procedure suggests some degree of existing skepticism and clinical equipoise.

ORBITA was a landmark trial due to its innovative use of blinding for a surgical procedure. However, substantial questions remain regarding the role of stenting in stable angina. It is a well-known statistical fallacy to take a result that is not statistically significant and report it as zero, as was essentially done here based on the p-value of 0.20 for the primary outcome. Had this comparison happened to produce a p-value of 0.04, would the headline have been, “Confirmed: Heart Stents Indeed Ease Chest Pain”? A lot of certainty seems to be hanging on a small bit of data.

The purpose of this paper is to take a closer look at the lack of statistical significance in ORBITA and the larger questions this trial raises about statistical analyses, statistically based decision making, and the reporting of clinical trials. This review of ORBITA is particularly timely in the context of the widely publicized statement released by the American Statistical Association that cautioned against the use of sharp thresholds for the interpretation of p-values (Wasserstein and Lazar, 2016) and more recent extensions of this advice by ourselves and others (Amrhein, Greenland, and McShane, 2019; McShane et al., 2019). We end by offering potential recommendations to improve reporting.

2. Statistical analysis of the ORBITA trial

Adjusting for baseline differences. In ORBITA, exercise time in a standardized treadmill test (the primary outcome in the preregistered design) increased on average by 28.4 s in the treatment group compared to an increase of only 11.8 s in the control group. As noted above, this difference was not statistically significant at a significance threshold of 0.05. Following conventional rules of scientific reporting, the true effect was treated as zero, an instance of the regrettably common statistical fallacy of presenting non-statistically-significant results as confirmation of the null hypothesis of no difference.

However, the estimate using gain in exercise time does not make full use of the data that were available on differences between the comparison groups at baseline (Vickers and Altman, 2001; Harrell, 2017a). As can be seen in the Table, the treatment and placebo groups differ in their pre-treatment levels of exercise time, with mean values of 528.0 and 490.0 s, respectively. This sort of difference is no surprise (randomization assures balance only in expectation), but it is important to adjust for this discrepancy in estimating the treatment effect. In the published paper, the adjustment was performed by simple subtraction of the pre-treatment values:

Gain score estimated effect: $(y_{\text{post}} - y_{\text{pre}})_T - (y_{\text{post}} - y_{\text{pre}})_C$, (1)

But this over-corrects for differences in pre-test scores, because of the familiar phenomenon of “regression to the mean”: just from natural variation, we would expect patients with lower scores at baseline to improve, relative to the average, and patients with higher scores to regress downward. The optimal linear estimate of the treatment effect is actually:

Adjusted estimate: $(y_{\text{post}} - \beta y_{\text{pre}})_T - (y_{\text{post}} - \beta y_{\text{pre}})_C$, (2)

where β is the coefficient of $y_{\text{pre}}$ in a least-squares regression of $y_{\text{post}}$ on $y_{\text{pre}}$, also controlling for the treatment indicator. The estimate in (1) is a special case of the regression estimate (2) corresponding to β = 1. Given that the pre-test and post-test measurements are positively correlated and have nearly identical variances (as can be seen in the Table), we can anticipate that the optimal β will be less than 1, which will reduce the correction for difference in pre-test and thus increase the estimated treatment effect while also decreasing the standard error. As a result, an adjusted analysis of these data would be expected to produce a lower p-value.


The adjusted regression analysis can be done using the information available in the Table, as explained in detail in Box 1. The p-value from this adjusted analysis is 0.09: as anticipated, lower than the p = 0.20 from the unadjusted analysis.
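To make the relationship between estimates (1) and (2) concrete, the following is a minimal Python sketch, not the authors' code. It relies on the algebraic identity that the adjusted estimate equals the gain-score estimate plus (1 − β) times the baseline difference between groups. The inputs are the rounded summary values quoted in the text, with β = 0.87 taken from Box 1; the function and variable names are illustrative. With these rounded inputs the result is about 21.5 s, close to the 21.3 s reported in Box 1; the small gap presumably reflects rounding in the published summaries.

```python
# Rounded summary values quoted in the text (seconds of exercise time).
GAIN_TREATMENT = 28.4   # mean gain, stent group
GAIN_CONTROL = 11.8     # mean gain, placebo group
PRE_TREATMENT = 528.0   # mean baseline, stent group
PRE_CONTROL = 490.0     # mean baseline, placebo group
BETA = 0.87             # averaged regression coefficient reported in Box 1


def adjusted_difference(gain_t, gain_c, pre_t, pre_c, beta):
    """Estimate (2), rewritten as estimate (1) plus a baseline correction.

    (post - beta*pre)_T - (post - beta*pre)_C
        = [(post - pre)_T - (post - pre)_C] + (1 - beta) * (pre_T - pre_C)
    """
    gain_score = gain_t - gain_c
    baseline_gap = pre_t - pre_c
    return gain_score + (1.0 - beta) * baseline_gap


# beta = 1 recovers the gain-score estimate (1).
gain_score_estimate = adjusted_difference(
    GAIN_TREATMENT, GAIN_CONTROL, PRE_TREATMENT, PRE_CONTROL, beta=1.0
)
adjusted_estimate = adjusted_difference(
    GAIN_TREATMENT, GAIN_CONTROL, PRE_TREATMENT, PRE_CONTROL, beta=BETA
)

print(f"gain-score estimate: {gain_score_estimate:.1f} s")
print(f"adjusted estimate:   {adjusted_estimate:.1f} s")
```

Because the treatment group started with the higher baseline, any β below 1 shrinks the baseline correction and raises the estimated effect, which is the direction anticipated in the text.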

Alternative reporting. Despite moving closer to the conventional 0.05 threshold, the p-value of 0.09 remains above the traditional level of significance at which one is taught to reject the null hypothesis. A potential blockbuster reversal with an adjusted analysis (“Statistical Sleuths Turn Reported Null Effect into a Statistically Significant Effect”) does not quite materialize.

Yet within different conventions for scientific reporting, this experiment could have been presented as positive evidence in favor of stents. In some settings, a p-value of 0.09 is considered to be statistically significant. For example, in a recent social science experiment published in the Proceedings of the National Academy of Sciences, Sands (2017) presented a causal effect based on a p-value of less than 0.10, and this was enough for publication in a top journal and for coverage in the popular press: the work was mentioned uncritically in the media outlet Vox without any concern regarding significance levels (Resnick, 2017). By contrast, Vox reported that ORBITA showed stents to be “dubious treatments,” a prime example of the “epidemic of unnecessary medical treatments” (Belluz, 2017). Had Al-Lamee et al. performed the adjusted analysis with their data and published in PNAS rather than the Lancet, could they have confidently reported a causal effect of stents on exercise time?

Our point here is not at all to suggest that Al-Lamee et al. engaged in reverse “p-hacking” (Simmons, Nelson, and Simonsohn, 2011), choosing an analysis that produced a newsworthy null result. In fact, the authors should be congratulated for pre-registering their study, publishing their protocol prior to performing their analyses, and reporting a pre-specified primary analysis. Rather, we wish to emphasize the flexibility inherent both in data analysis and reporting, even in the case of a clean randomized experiment. We are pointing out the potential fragility of the stents-didn’t-work story in this case. Existing data could easily have been presented as a success for stents compared to placebo by authors who were aiming for that narrative and performing reasonable analyses.

Fragility of the findings. How sensitive were the results to slight changes in the data? To better understand this critical point, one can perform a simple bootstrap analysis, computing the results that would have been obtained from reanalyzing the data 1000 times, each time resampling patients from the existing experiment with replacement (Efron, 1979). As raw data were not available to us, we approximated using the normal distribution based on the observed z-score of 1.7. The result was that, in 40% of the simulations, stents outperformed placebo at a traditional level of statistical significance. This is not to say that stents really are better than placebo; the data also appear consistent with a null effect. The take-home point of this experiment is that the results could easily have gone “the other way,” when reporting is forced into a binary classification of statistical significance, for many different reasons.
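The normal approximation to the bootstrap can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the authors' code: each replicated z-score is drawn from a normal distribution centered at the observed z = 1.7 with unit standard deviation, and "significant" is taken to mean exceeding the two-sided 0.05 critical value of 1.96. The variable names and the random seed are arbitrary choices.

```python
from statistics import NormalDist

OBSERVED_Z = 1.7    # z-score of the adjusted estimate (from the text)
CRITICAL_Z = 1.96   # assumed cutoff: two-sided test at the 0.05 level

# Approximate sampling distribution of a replicated z-score.
replicate = NormalDist(mu=OBSERVED_Z, sigma=1.0)

# Closed form: share of replications in which stents beat placebo
# at the conventional significance threshold.
share_exact = 1.0 - replicate.cdf(CRITICAL_Z)

# Simulation with 1000 draws, mirroring the 1000 resamples in the text.
draws = replicate.samples(1000, seed=1)
share_simulated = sum(z > CRITICAL_Z for z in draws) / len(draws)

print(f"closed form: {share_exact:.3f}")
print(f"simulated:   {share_simulated:.3f}")
```

The closed-form share is roughly 0.40, in line with the 40% quoted above; the simulated share fluctuates around that value by a percentage point or two depending on the seed.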

3. Design of the trial and clinical significance

In a justification for their study design and sample size, Al-Lamee et al. (2017) wrote: “Evidence from placebo-controlled randomised controlled trials shows that single antianginal therapies provide improvements in exercise time of 48–55 s... Given the previous evidence, ORBITA was conservatively designed to be able to detect an effect size of 30 s.” The estimated effect of 21 s with standard error 12 s is consistent with the “conservative” effect size estimate of 30 s given in the published article. So although the experimental results are consistent with a null effect, they are even more consistent with a small positive effect.
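One simple way to see why the data sit closer to the design effect than to zero is to measure, in standard errors, how far the estimate lies from each hypothesized value. The Python sketch below does this using the adjusted estimate and standard error from Box 1 (21.3 s and 12.5 s, which the text rounds to 21 s and 12 s). The function name is illustrative, and this is only a rough summary, not a formal test.

```python
# Adjusted estimate and standard error as reported in Box 1 (seconds).
ESTIMATE = 21.3
STD_ERROR = 12.5


def standardized_distance(hypothesized_effect, estimate=ESTIMATE,
                          std_error=STD_ERROR):
    """Standard errors between the estimate and a hypothesized effect."""
    return (estimate - hypothesized_effect) / std_error


z_null = standardized_distance(0.0)     # distance from a zero effect
z_design = standardized_distance(30.0)  # distance from the 30 s design effect

print(f"distance from 0 s:  {z_null:+.2f} standard errors")
print(f"distance from 30 s: {z_design:+.2f} standard errors")
```

The estimate is about 1.7 standard errors above zero but only about 0.7 standard errors below 30 s.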

One might ask, however, about the clinical significance of such a treatment effect, which we can discuss without reference to p-values or statistical significance. For simplicity, suppose we take the point estimate from the data at face value. How should we think about an increase in average exercise time of 21 s? One way to conceptualize this is in terms of percentiles. The data show a pre-randomization distribution (averaging the treatment and control groups) with a mean of 509 and a standard deviation of 188. Assuming a normal approximation, an increase in exercise time of 21 s from 509 to 530 would take a patient from the 50th percentile to the 54th percentile of the distribution. Looked at that way, it would be hard to get excited about this effect size, even if it were a real population shift.
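The percentile calculation can be reproduced directly. The short Python sketch below assumes, as the text does, a normal distribution for baseline exercise time with mean 509 s and standard deviation 188 s; the variable names are illustrative.

```python
from statistics import NormalDist

BASELINE_MEAN = 509.0  # pooled pre-randomization mean (s), from the text
BASELINE_SD = 188.0    # pooled pre-randomization SD (s), from the text
EFFECT = 21.0          # point estimate of the treatment effect (s)

baseline = NormalDist(mu=BASELINE_MEAN, sigma=BASELINE_SD)

# Percentile of a patient who starts at the mean and gains EFFECT seconds.
percentile_before = 100.0 * baseline.cdf(BASELINE_MEAN)
percentile_after = 100.0 * baseline.cdf(BASELINE_MEAN + EFFECT)

print(f"before: {percentile_before:.0f}th percentile")
print(f"after:  {percentile_after:.0f}th percentile")
```

A shift of 21 s is about 0.11 baseline standard deviations, which is why the move is only from the 50th to roughly the 54th percentile.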

Beyond exercise time, there were other signals from ORBITA that seemed to suggest consistent improvements in the physiological parameter of ischemia through endpoints such as fractional flow reserve, instantaneous wave-free ratio, and stress echo. Actually, findings from the stress echo highlight a potentially important avenue into an alternative presentation of these results. There is no question that some physiological changes are being made by stents, with very large and highly statistically significant (p < 0.001) effects seen on echo measures. As is often the case, the null hypothesis that these physical changes should make absolutely zero difference to any downstream clinical outcomes seems farfetched. Thus, the sensible question to ask is “How large are the clinical differences observed?”, not “How surprising is the observed mean difference under a [spurious] null hypothesis?” The simple textbook way to tackle this question is to report confidence intervals (CIs) around the mean differences and not to focus on whether the intervals happen to include zero. The fact that the standard 95% CI for the primary outcome comfortably includes the target effect size of 30 s suggests this value should be no more “rejected” than the null value. Furthermore, without the longitudinal data to observe the outcomes that matter most to patients (health and length of life), much remains uncertain.

The larger question has to be about balancing the long-term benefits of stents with risks of the operation. It does not seem reasonable for a person to risk life and health by submitting to a surgical procedure just for a potential benefit of 21 seconds of exercise time on a standardized treadmill test, or even a hypothesized larger benefit of 50 seconds, which would still only represent a 10% improvement for an average patient in this study. Yet maybe a 5–10% increase is consequential in this case, as it could improve quality of life for a patient outside of this artificial setting. Perhaps this small gain in exercise time is associated with the need for fewer medications, fewer functional limitations, or greater mobility. If so, however, one might postulate this gain would have been apparent in assessments of angina burden, and it was not.

Part of the bigger concern here is that these patients were already doing pretty well on medications; that is, they already had a low symptom frequency before stenting. For example, angina frequency as measured by the Seattle Angina Questionnaire was 63.2 after optimizing medications and before stenting in the treatment group. This roughly translates as “monthly” angina (John Spertus, personal communication). How does a study with a follow-up of just 6 weeks expect to improve an outcome that happens this infrequently? In fact, one of the great debates surrounding ORBITA is that those who discount the trial suggest it enrolled patients who typically do not receive stents in routine practice. Those who believe ORBITA is a game-changer argue that these less symptomatic patients actually make up a large proportion of those receiving stents, and that is why we have such a large problem with their overuse.

Finally, are stents really being given to patients with stable angina just to improve fitness or to reduce symptoms? Or is there a continued expectation that stents have long-term benefits for patients, despite earlier data from studies like the Clinical Outcomes Utilizing Revascularization and Aggressive Drug Evaluation (COURAGE) study (Boden et al., 2007)? This would seem to be the key question, in which case the short-term effects, or lack thereof, found in the ORBITA study are largely irrelevant. Other larger trials, such as the International Study of Comparative Health Effectiveness With Medical and Invasive Approaches (ISCHEMIA, see: https://clinicaltrials.gov/ct2/show/NCT01471522), are considering this more fundamental question but will not have a placebo procedure.

4. Recommendations for statistical reporting of trials

The search for better medical care is an incremental process, with incomplete evidence accumulating over time. There is unfortunately a fundamental incompatibility between that core idea and the common practice, both in medical journals and the news media, of up-or-down reporting of individual studies based on statistical significance. We offer some recommendations summarized in Box 2 that we believe will be helpful to authors and editors moving forward.

At this point it’s not clear how best to incorporate this recent experiment into routine practice despite its novel and provocative study design, so the forced reporting of the primary outcome as “positive” or “negative” is unhelpful. A reanalysis of the summary data from Al-Lamee et al. (2017) reveals a stronger estimated effect that is closer to the conventional boundary of statistical significance, indicating that the study could rather easily have generated and reported evidence in favor of, rather than against, the effectiveness of stents for patients with stable angina. And from our brief flurry of excitement over the possibility that a simple reanalysis could change the significance level, we are again reminded of the sensitivity of headline conclusions to decisions in statistical analysis. In any case, though, the observed increases in exercise time, even if statistically significant, do not appear at first glance to be of much clinical importance, compared to the much more relevant long-term health outcomes that remain uncertain.

In the design, evaluation, and reporting of experimental studies, there is a norm of focusing on the statistical significance of a primary outcome y (in this case, change in average exercise time on a standardized treadmill test). In general, the conclusions that follow will be fragile because p-values are extremely noisy unless the underlying effect is huge. An experiment may be designed to have 80% power, but this doesn’t eliminate the fragility, as illustrated by our bootstrap re-analysis. Power calculations are often conditional on an overestimated effect size (Schulz and Grimes, 2005; Gelman, 2018) and do not address the important question of variation in treatment effects. Examination of the Lancet paper and its reception in the news media suggests that it exhibits a classic case of “significantitis” or “dichotomania” (Greenland, 2017), with frequent repetition of phrases such as “there was no significant difference.” In line with current thinking (Amrhein, Greenland, and McShane, 2019), we suggest that the phrase used by these authors, “We deemed a p value less than 0.05 to be significant,” should be strongly discouraged, rather than actively demanded as is currently the case by many journal editors. To their credit, the ORBITA authors themselves have recognized these critical issues (see https://twitter.com/ProfDFrancis/status/952008644018753536).

In the case of stents, an important disconnect appears between the findings emphasized in the recent study, however presented, and the larger context of treatments for heart disease. From a statistical perspective, this appears to reflect a problem with the framing of clinical trials as attempts to discover whether a treatment has a statistically significant effect, which is commonly misinterpreted to be equivalent to a real (non-zero) population mean difference. Power calculations are used in an attempt to assure stable estimates and a good chance of the experiment being “successful”, although within these constraints there can be a push toward convenience rather than relevance of outcome measures, which is perhaps an inevitable compromise. ORBITA shows us the confusion that arises when a treatment is reported as a success or failure in statistical terms that are divorced from clinical context.

ORBITA was never meant to be definitive in a broad sense; it was designed to find a statistically significant physiological effect of stenting on mean exercise time, without clarity on the clinical relevance of anticipated effects on this outcome measure. Indeed, a likely reason why the study was limited in its size and in its design around these surrogate outcomes is that this is all that could have passed an ethical board given the novelty of the placebo procedure in this setting. Further background on these topics from Darrel Francis, the senior author on the study, appears at Harrell (2017b). Beyond immediate news reports, one positive impact of ORBITA is that bigger trials of stenting with placebo procedures are now much more likely, with a more definitive set of measured outcomes that are meaningful for patients.

We don’t see any easy answers here (long-term outcomes would require a long-term study, after all, and clinical decisions need to be made right away, every day), but perhaps we can use our examination of this particular study and its reporting to suggest practical directions for improvement in heart treatment studies and in the design and reporting of clinical trials more generally.

References

Al-Lamee, R., Thompson, D., Dehbi, H. M., Sen, S., Tang, K., Davies, J., Keeble, T., Mielewczik, M., Kaprielian, R., Malik, I. S., Nijjer, S. S., Petraco, R., Cook, C., Ahmad, Y., Howard, J., Baker, C., Sharp, A., Gerber, R., Talwar, S., Assomull, R., Mayet, J., Wensel, R., Collier, D., Shun-Shin, M., Thom, S. A., Davies, J. E., and Francis, D. P. (2017). Percutaneous coronary intervention in stable angina (ORBITA): a double-blind, randomised controlled trial. Lancet. http://dx.doi.org/10.1016/S0140-6736(17)32714-9


Amrhein, V., Greenland, S., and McShane, B. (2019). Scientists rise up against statistical significance. Nature 567, 305–307.

American College of Cardiology (2017). ORBITA: First placebo-controlled randomized trial of PCI in CAD patients. ACC News, 2 Nov. http://www.acc.org/latest-in-cardiology/articles/2017/10/27/13/34/thurs-1150am-orbita-tct-2017

Belluz, J. (2017). Thousands of heart patients get stents that may do more harm than good. Vox.com, 6 Nov. https://www.vox.com/science-and-health/2017/11/3/16599072/stent-chest-pain-treatment-angina-not-effective

Boden, W. E., O'Rourke, R. A., Teo, K. K., Hartigan, P. M., Maron, D. J., Kostuk, W. J., Knudtson, M., Dada, M., Casperson, P., Harris, C. L., Chaitman, B. R., Shaw, L., Gosselin, G., Nawaz, S., Title, L. M., Gau, G., Blaustein, A. S., Booth, D. C., Bates, E. R., Spertus, J. A., Berman, D. S., Mancini, G. B., and Weintraub, W. S.; COURAGE Trial Research Group. (2007). Optimal medical therapy with or without PCI for stable coronary disease. New England Journal of Medicine 356, 1503–16. Epub 2007 Mar 26.

Efron, B. (1979). Bootstrap methods: Another look at the jackknife. Annals of Statistics 7, 1–26.

Gelman, A. (2004). Treatment effects in before-after data. In Applied Bayesian Modeling and Causal Inference from Incomplete-data Perspectives, ed. A. Gelman and X. L. Meng, chapter 18. New York: Wiley.

Gelman, A. (2018). The failure of null hypothesis significance testing when studying incremental changes, and what to do about it. Personality and Social Psychology Bulletin 44, 16–23.

Gelman, A., and Carlin, J. B. (2014). Beyond power calculations: Assessing Type S (sign) and Type M (magnitude) errors. Perspectives on Psychological Science 9, 641–651.

Greenland, S. (2017). The need for cognitive science in methodology. American Journal of Epidemiology 186, 639–645.

Harrell, F. (2017a). Statistical errors in the medical literature. Statistical Thinking blog, 8 Apr. http://www.fharrell.com/2017/04/statistical-errors-in-medical-literature.html

Harrell, F. (2017b). Statistical criticism is easy; I need to remember that real people are involved. Statistical Thinking blog, 5 Nov. http://www.fharrell.com/2017/11/statistical-criticism-is-easy-i-need-to.html

Kolata, G. (2017). ‘Unbelievable’: Heart stents fail to ease chest pain. New York Times, 2 Nov. https://www.nytimes.com/2017/11/02/health/heart-disease-stents.html

McShane, B. B., Gal, D., Gelman, A., Robert, C., and Tackett, J. L. (2019). Abandon statistical significance. American Statistician 73(S1), 235–245.

Resnick, B. (2017). White fear of demographic change is a powerful psychological force. Vox.com, 28 Jan. https://www.vox.com/science-and-health/2017/1/26/14340542/white-fear-trump-psychology-minority-majority

Sands, M. L. (2017). Exposure to inequality affects support for redistribution. Proceedings of the National Academy of Sciences 114, 663–668.

Schulz, K. F., and Grimes, D. A. (2005). Sample size calculations in randomised trials: Mandatory and mystical. Lancet 365, 1348–1353.

Simmons, J., Nelson, L., and Simonsohn, U. (2011). False-positive psychology: Undisclosed flexibility in data collection and analysis allows presenting anything as significant. Psychological Science 22, 1359–1366.

Vickers, A. J., and Altman, D. G. (2001). Analysing controlled trials with baseline and follow up measurements. British Medical Journal 323, 1123–1124.

Wasserstein, R. L., and Lazar, N. A. (2016). The ASA's statement on p-values: Context, process, and purpose. American Statistician 70, 129–133.


Table. Summary data comparing stents to placebo, from Table 3 of Al-Lamee et al. (2017).


Box 1. Using the reported data summaries to obtain the analysis controlling for the pre-treatment measure

For each of the treatment and control groups, we are given the standard deviation of the pre-test measurements, the standard deviation of the post-test measurements, and the standard deviation of their difference, which can be obtained by taking the width of the confidence interval for the difference, dividing by 4 to get the standard error of the difference, and then multiplying by $\sqrt{n}$ to get back to the standard deviation.

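Then we use the rule,

$\mathrm{sd}(y_{\text{post}} - y_{\text{pre}}) = \sqrt{\mathrm{sd}(y_{\text{pre}})^2 + \mathrm{sd}(y_{\text{post}})^2 - 2\rho\,\mathrm{sd}(y_{\text{pre}})\,\mathrm{sd}(y_{\text{post}})}$,

and solve for ρ, the correlation between before and after measurements within each group. The result in this case is ρ = 0.88 within each group. We then convert the correlation to a regression coefficient of $y_{\text{post}}$ on $y_{\text{pre}}$ using the well-known formula, $\beta = \rho\,\mathrm{sd}(y_{\text{post}})/\mathrm{sd}(y_{\text{pre}})$, which yields β = 0.88 for the treated and β = 0.86 for the control group. If these two coefficients were much different from each other, we might want to consider an interaction model (Gelman, 2004), but here they are close enough that we simply take their average.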

We use the average, β = 0.87, in (2) and get an estimate for the adjusted mean difference of 21.3 (indeed, quite a bit higher than the reported difference in gain scores of 16.6) with a standard error of 12.5 (very slightly lower than 12.7, the standard error of the difference in gain scores) and 95% CI −3.2 to 45.8 s. The estimate is not quite two standard errors away from zero: the z-score is 1.7, and the p-value is 0.09.
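The steps in this box can be sketched in Python as follows. This is a minimal illustration, not the authors' code: the function names are ours, and because the Table values are not reproduced here, the standard deviations in the demonstration are hypothetical placeholders used only to check that the correlation formula inverts correctly. The final lines use the reported estimate (21.3) and standard error (12.5) to recover the z-score and the two-sided p-value under a normal approximation.

```python
from math import sqrt
from statistics import NormalDist


def sd_from_ci(ci_low, ci_high, n):
    """SD of differences from a 95% CI for their mean (width / 4 rule)."""
    std_error = (ci_high - ci_low) / 4.0
    return std_error * sqrt(n)


def correlation_from_sds(sd_pre, sd_post, sd_diff):
    """Solve sd_diff^2 = sd_pre^2 + sd_post^2 - 2*rho*sd_pre*sd_post for rho."""
    return (sd_pre**2 + sd_post**2 - sd_diff**2) / (2.0 * sd_pre * sd_post)


def slope_from_correlation(rho, sd_pre, sd_post):
    """Regression coefficient of post on pre: beta = rho * sd_post / sd_pre."""
    return rho * sd_post / sd_pre


def z_and_two_sided_p(estimate, std_error):
    """z-score and two-sided p-value under a normal approximation."""
    z = estimate / std_error
    return z, 2.0 * (1.0 - NormalDist().cdf(abs(z)))


# HYPOTHETICAL inputs for illustration only (not the ORBITA Table values).
sd_pre_demo, sd_post_demo, rho_demo = 180.0, 190.0, 0.88
sd_diff_demo = sqrt(
    sd_pre_demo**2 + sd_post_demo**2
    - 2.0 * rho_demo * sd_pre_demo * sd_post_demo
)
rho_back = correlation_from_sds(sd_pre_demo, sd_post_demo, sd_diff_demo)
beta_demo = slope_from_correlation(rho_back, sd_pre_demo, sd_post_demo)

# Reported adjusted estimate and standard error from this box.
z_adj, p_adj = z_and_two_sided_p(21.3, 12.5)

print(f"recovered rho (demo): {rho_back:.2f}, beta (demo): {beta_demo:.2f}")
print(f"z = {z_adj:.1f}, two-sided p = {p_adj:.2f}")
```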


Box 2. Recommendations for Analyses and Reporting

Analyses
1. Baseline adjustment for differences: should be prespecified for the primary analysis where strong confounders such as a baseline measure of the outcome are available.
2. Be aware of fragility of inferences. Fragility can be demonstrated using the sampling or posterior distribution as estimated using mathematical modeling, bootstrap simulation, or Bayesian analysis.

Reporting
1. Avoid use of sharp thresholds for p-values and thus eliminate the term “statistical significance” from the reporting of results.
2. Consider the full range (upper and lower ends) of interval estimates for important outcomes and their potential inclusion of clinically important differences.
3. Consider the potential for individual variability in responses (heterogeneity of treatment effects) and not just mean differences.