ORBITA: A case study in the analysis and reporting of ...gelman/research/unpublished/... · ORBITA...

ORBITA: A case study in the analysis and reporting of clinical trials

Andrew Gelman, John Carlin and Brahmajee K Nallamothu

14 Mar 2018

Department of Statistics and Political Science, Columbia University, New York City, NY, United States (Andrew Gelman, professor); Clinical Epidemiology & Biostatistics, Murdoch Children’s Research Institute, Melbourne School of Population and Global Health and Department of Paediatrics, University of Melbourne, Melbourne, Australia (John Carlin, professor); Department of Internal Medicine, University of Michigan Medical School, Ann Arbor, MI, United States (Brahmajee K Nallamothu, professor)

Correspondence to: Brahmajee K Nallamothu [email protected]

Acknowledgements: We thank Doug Helmreich for bringing this example to our attention, Shira Mitchell for helpful comments, and the Office of Naval Research, Defense Advanced Research Project Agency, and the National Institutes of Health for partial support of this work.

Competing interests: Dr. Gelman and Dr. Carlin report no competing interests. Dr. Nallamothu is an interventional cardiologist and Editor-in-Chief of a journal of the American Heart Association but otherwise has no competing interests.

Word Count: 2085



Introduction

ORBITA (Objective Randomised Blinded Investigation With Optimal Medical Therapy of Angioplasty in Stable Angina) was a randomized clinical trial of approximately 200 patients in which half the patients received stents and half received a placebo procedure. Its summary finding was that stenting did not “increase exercise time by more than the effect of a placebo procedure” with the mean difference in this primary outcome between treatment and control groups reported as 16.6 sec (95% confidence interval, −8.9 to +42.0 sec) and a p-value of 0.20.

In the New York Times, Kolata (2017) reported the finding as “unbelievable,” remarking that it “stunned leading cardiologists by countering decades of clinical experience.” Indeed, one of us (BKN) was quoted as being humbled by the finding as many had expected a positive result. On the other hand, Kolata noted, “there have long been questions about [stents’] effectiveness.” At the very least, the willingness of doctors and patients to participate in a controlled trial with a placebo procedure suggests some degree of existing skepticism and clinical equipoise.

ORBITA was a landmark trial due to its innovative use of a placebo procedure. However, substantial questions remain even after ORBITA regarding the role of stenting in stable angina. It is a well-known statistical fallacy to take a result that is not statistically significant and report it as zero, as was essentially done here based on the p-value of 0.20 for the primary outcome. Had this comparison happened to produce a p-value of 0.04, would the headline have been, “‘Believable’: Heart Stents Indeed Ease Chest Pain”?

The purpose of this paper is to take a closer look at the lack of statistical significance in ORBITA and the larger questions it raises about statistical analyses, statistically based versus clinical decision-making, and the reporting of clinical trials. This is important because a lot of certainty seems to be hanging on a small bit of data.

Dichotomized thresholds are a big problem; hence, in this paper we will avoid discussing “statistical significance” except when discussing issues of how results are or could be reported.

Statistical analysis of the ORBITA trial

Adjusting for baseline differences. In ORBITA, exercise time in a standardized treadmill test (the primary outcome in the preregistered design) increased on average by 28.4 sec in the treatment group compared to an increase of only 11.8 sec in the control group. As noted above, this difference was associated with a p-value greater than 0.05. Hence, following conventional rules of scientific reporting it was treated as zero, an instance of the regrettably common statistical fallacy of presenting non-statistically-significant results as confirmation of the null hypothesis of no difference.

However, the estimate using gain in exercise time does not make full use of the data that were available on differences between the groups at baseline (Vickers and Altman, 2001; Harrell, 2017a). The treatment and placebo groups differed in their pre-treatment levels of exercise time, with mean values of 528.0 and 490.0 s, respectively (Supplementary Table). This sort of difference is fine (randomization assures balance only in expectation), but it is important to adjust for this discrepancy in estimating the treatment effect. In the published paper, the adjustment was performed by simple subtraction of the pre-treatment values:

Gain in exercise time: $(y_{\text{post}} - y_{\text{pre}})^{T} - (y_{\text{post}} - y_{\text{pre}})^{C}$, (1)

But this over-corrects for differences in pre-test scores, because of the familiar phenomenon of “regression to the mean”: just from natural variation, we would expect patients with lower scores at baseline to improve, relative to the average, and patients with higher scores to regress downward.

The optimal linear estimate of the treatment effect is actually:

Gain in exercise time: $(y_{\text{post}} - \beta\, y_{\text{pre}})^{T} - (y_{\text{post}} - \beta\, y_{\text{pre}})^{C}$, (2)

where β is the coefficient of $y_{\text{pre}}$ in a least-squares regression of $y_{\text{post}}$ on $y_{\text{pre}}$, also controlling for the treatment indicator.

The estimate in (1) is a special case of the regression estimate (2) corresponding to β=1. Given that the pre-test and post-test measurements have nearly identical variances, we can anticipate that the optimal β will be less than 1, which will reduce the correction for difference in pre-test and thus increase the estimated treatment effect while decreasing the standard error.
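As a rough illustration of how the choice of β moves the estimate, the following Python sketch plugs the group means reported above into (1) and (2). It uses β = 0.87, the averaged coefficient derived in Box 1; the function and variable names are illustrative only. Because the inputs are rounded summaries, the result (about 21.5 sec) is close to, but not exactly, the 21.3 sec reported in Box 1.

```python
# Reported group summaries from the text, in seconds.
GAIN_T, GAIN_C = 28.4, 11.8   # mean change in exercise time: stent, placebo
PRE_T, PRE_C = 528.0, 490.0   # mean baseline exercise time: stent, placebo


def adjusted_difference(beta: float) -> float:
    """Between-group difference in (post - beta * pre), as in estimate (2).

    Since post = pre + gain, post - beta * pre = gain + (1 - beta) * pre,
    so only the reported gains and baseline means are needed.
    """
    adjusted_t = GAIN_T + (1.0 - beta) * PRE_T
    adjusted_c = GAIN_C + (1.0 - beta) * PRE_C
    return adjusted_t - adjusted_c


gain_score = adjusted_difference(1.0)    # beta = 1 recovers estimate (1)
regression = adjusted_difference(0.87)   # beta taken from Box 1

print(f"beta = 1.00: {gain_score:.1f} sec")
print(f"beta = 0.87: {regression:.1f} sec")
```

With β below 1, only part of the 38 sec baseline gap is subtracted, which is why the adjusted estimate is larger than the simple difference in gains.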

An adjusted analysis using the information available is explained in detail in Box 1. The p-value from this adjusted analysis is 0.09: as expected, lower than the p=0.20 from the unadjusted analysis. What is relevant is not whether or not this new p-value has become “statistically significant” but rather that the reported p-value is subject to change based on alternative analyses.

Within different conventions for scientific reporting and for different fields, a p-value of 0.09 is considered to be statistically significant; for example, in a recent social science experiment published in the Proceedings of the National Academy of Sciences, Sands (2017) presented a causal effect based on a p-value of less than 0.10, and this was enough for publication in a top journal and in the popular press. Vox mentioned that work uncritically without any concern regarding significance levels (Resnick, 2017). By contrast, Vox reported stents as a prime example of the “epidemic of unnecessary medical treatments” after ORBITA (Belluz, 2017).

These concerns are deepened further when one considers how sensitive results from ORBITA were from a statistical standpoint. To better understand this one can perform a simple bootstrap analysis, computing the results that would have been obtained from reanalyzing the data 1000 times, each time resampling patients from the existing experiment with replacement (Efron, 1979). As raw data were not available to us, we approximated using the normal distribution based on the observed z-score of 1.7. The result was that, in 40% of the simulations, stents outperformed placebo with p-values less than 0.05. This is not to say that stents really are better on average than placebo in improving exercise time; the data also appear consistent with a null effect. The take-home point of this experiment is that the results could easily have gone “the other way”, when reporting is forced into a binary classification of statistical significance.
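The 40% figure can be checked with a short Python sketch of the normal approximation. This is not a patient-level bootstrap: it assumes, for illustration, that a replicated z-score is drawn from a normal distribution centered on the observed 1.7 with unit standard deviation, and it counts a replication as favoring stents at p < 0.05 when z exceeds 1.96. The seed and variable names are arbitrary choices.

```python
import math
import random


def normal_cdf(x: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


OBSERVED_Z = 1.7    # z-score of the adjusted estimate reported in the text
CRITICAL_Z = 1.96   # two-sided 0.05 threshold, in the direction favoring stents

# Closed form: share of replications whose z-score clears the threshold.
analytic_share = 1.0 - normal_cdf(CRITICAL_Z - OBSERVED_Z)

# Monte Carlo version with 1000 replications, as in the text.
rng = random.Random(2018)  # arbitrary seed, for reproducibility only
N_SIMS = 1000
replicated_z = [rng.gauss(OBSERVED_Z, 1.0) for _ in range(N_SIMS)]
simulated_share = sum(z > CRITICAL_Z for z in replicated_z) / N_SIMS

print(f"analytic share:  {analytic_share:.3f}")
print(f"simulated share: {simulated_share:.3f}")
```

The closed-form share is just under 0.40; the simulated share varies by a few percentage points from seed to seed, which is itself a reminder of how noisy a single replication is.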

Statistically Based versus Clinical Decision-Making

In justifying their study design and sample size, Al-Lamee et al. (2017) wrote: “Evidence from placebo-controlled randomised controlled trials shows that single antianginal therapies provide improvements in exercise time of 48–55 sec… Given the previous evidence, ORBITA was conservatively designed to be able to detect an effect size of 30 sec.” The estimated effect of 21 sec with standard error 12 sec is consistent with the “conservative” effect size estimate of 30 sec given in the published article. So although the experimental results are consistent with a null effect, they are even more consistent with a small positive effect.

One might ask, however, about the clinical significance of such a treatment effect, which we can discuss without reference to p-values or statistical significance. For simplicity, suppose we take the point estimate from the data at face value. How should we think about an increase in average exercise time of 21 sec? One way to conceptualize this is in terms of percentiles. The data show a pre-randomization distribution (averaging the treatment and control groups) with a mean of 509 sec and a standard deviation of 188 sec. Assuming a normal approximation, an increase in exercise time of 21 sec from 509 to 530 sec would take a patient from the 50th percentile to the 54th percentile of the distribution. Looked at that way, it would be hard to get excited about this effect size, even if it were a real population shift. Indeed, a recent study after ORBITA suggested ironically that such gains are possible during a treadmill test by simply playing music.
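The percentile calculation can be reproduced in a few lines of Python. The sketch below assumes the same normal approximation as the text, with the reported baseline mean of 509 sec and standard deviation of 188 sec; the function name is illustrative. It also shows the percentile reached by the 30 sec design effect size, for comparison.

```python
import math


def normal_cdf(x: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


BASELINE_MEAN = 509.0  # sec, pre-randomization mean across both groups
BASELINE_SD = 188.0    # sec, pre-randomization standard deviation


def percentile_after_gain(gain_sec: float) -> float:
    """Percentile of the baseline distribution reached by a patient who
    starts at the mean (50th percentile) and improves by gain_sec."""
    new_time = BASELINE_MEAN + gain_sec
    z = (new_time - BASELINE_MEAN) / BASELINE_SD
    return 100.0 * normal_cdf(z)


for gain in (21.0, 30.0):
    print(f"+{gain:.0f} sec -> percentile {percentile_after_gain(gain):.1f}")
```

A 21 sec gain is about 0.11 standard deviations, which is why it moves a median patient only about four percentile points.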

Thus, the larger clinical question is how to balance the long-term benefits of stents with risks of the procedure. It does not seem reasonable for a person to receive stents just for a potential benefit of 21 sec of exercise time on a standardized treadmill test, or even a hypothesized larger benefit of 50 sec, which would still only represent a 10% improvement for an average patient in this study. Yet maybe a 5% to 10% increase is consequential in this case as it could improve quality of life for a patient. Perhaps this small gain in exercise time is associated with the need for fewer medications, fewer functional limitations or greater mobility. If so, however, one might postulate this gain would have been apparent in assessments of angina burden, and it was not.

A big concern here is that these patients were already doing pretty well on medications; that is, they already had a low symptom frequency before stenting. For example, angina frequency as measured by the Seattle Angina Questionnaire was 63.2 after optimizing medications and before stenting in the treatment group. This roughly translates as “monthly” angina (John Spertus, personal communication). How does a study with a follow-up of just 6 weeks expect to improve an outcome that happens this infrequently? In fact, one of the great debates surrounding ORBITA is that those who discount the trial suggest it enrolled patients who typically do not receive stents in routine practice. Those who believe ORBITA is a game-changer argue that these less symptomatic patients actually make up a large proportion of those receiving stents.

Finally, are stents really being given to patients with stable angina just to improve fitness or to reduce symptoms? Or is there a continued expectation that stents have long-term benefits for patients, despite earlier data from studies like the Clinical Outcomes Utilizing Revascularization and Aggressive Drug Evaluation (COURAGE) study (Boden, 2007)? This would seem to be the key question, in which case the short-term effects, or lack thereof, found in the ORBITA study are largely irrelevant. Other larger trials, such as International Study of Comparative Health Effectiveness With Medical and Invasive Approaches (ISCHEMIA, see: https://clinicaltrials.gov/ct2/show/NCT01471522) are considering this more fundamental question but will not have a placebo procedure.


Evidence from ORBITA that pointed toward consistent improvements in the physiological parameter of ischemia through endpoints such as fractional flow reserve and stress echo suggests there is little question that some physiological changes are being made by stents, with very large and highly statistically significant differences. As is often the case, the null hypothesis that these physical changes should make absolutely zero difference to any downstream clinical outcomes seems farfetched. Thus, the sensible question to ask is “How large are the clinical differences observed and are they worth it?” and not “How surprising is the observed mean difference under a [spurious] null hypothesis?”

Recommendations for statistical reporting of trials

The search for better medical care is an incremental process, with incomplete evidence accumulating over time. There is unfortunately a fundamental incompatibility between that core idea and the common practice, both in medical journals and the news media, of up-or-down reporting of individual studies based on statistical significance. We offer some recommendations to tackle this issue in Box 2.

In the design, evaluation, and reporting of experimental studies, there is a norm of focusing on the statistical significance of a primary outcome, described at times as “significantitis” or “dichotomania” (Greenland, 2017). It leads to an overreliance on phrases like, “We deemed a p value less than 0.05 to be significant,” that are common throughout the published literature. The resulting conclusions from such a process frequently will be fragile because p-values are extremely noisy unless the underlying effect is huge. To their credit, the ORBITA authors themselves have recognized these critical issues (see online: https://twitter.com/ProfDFrancis/status/952008644018753536).

ORBITA was never meant to be definitive in a broad sense; it was designed to find a physiological effect of stenting on mean exercise time, without clarity on the clinical relevance of this outcome. Indeed, a likely reason the study was limited to this endpoint is that this was all that could have passed an ethical board given the novelty of the placebo procedure in this setting. Further background on these topics from Darrel Francis, the senior author on the study, appears at Harrell (2017b). One certain impact of ORBITA is that bigger trials of stenting with placebo procedures, measuring a more meaningful set of outcomes, are now much more likely.

We don’t see any easy answers here: long-term outcomes would require a long-term study, after all, and clinical decisions need to be made right away, every day. But perhaps we can use our examination of this particular study and its reporting to suggest practical directions for improvement in heart treatment studies and in the design and reporting of clinical trials more generally.


References

Al-Lamee, R., Thompson, D., Dehbi, H. M., Sen, S., Tang, K., Davies, J., Keeble, T., Mielewczik, M., Kaprielian, R., Malik, I. S., Nijjer, S. S., Petraco, R., Cook, C., Ahmad, Y., Howard, J., Baker, C., Sharp, A., Gerber, R., Talwar, S., Assomull, R., Mayet, J., Wensel, R., Collier, D., Shun-Shin, M., Thom, S. A., Davies, J. E., and Francis, D. P. (2017). Percutaneous coronary intervention in stable angina (ORBITA): a double-blind, randomised controlled trial. Lancet. http://dx.doi.org/10.1016/S0140-6736(17)32714-9

Allison, D. B., Brown, A. W., George, B. J., Kaiser, K. A. (2016). Reproducibility: A tragedy of errors. Nature 530, 27–29. doi:10.1038/530027a. PubMed PMID: 26842041; PubMed Central PMCID: PMC4831566.

American College of Cardiology (2017). ORBITA: First placebo-controlled randomized trial of PCI in CAD patients. ACC News, 2 Nov. http://www.acc.org/latest-in-cardiology/articles/2017/10/27/13/34/thurs-1150am-orbita-tct-2017

Belluz, J. (2017). Thousands of heart patients get stents that may do more harm than good. Vox.com, 6 Nov. https://www.vox.com/science-and-health/2017/11/3/16599072/stent-chest-pain-treatment-angina-not-effective

Bland, J. M., and Altman, D. G. (2015). Best (but oft forgotten) practices: Testing for treatment effects in randomized trials by separate analyses of changes from baseline in each group is a misleading approach. American Journal of Clinical Nutrition 102, 991–994. doi:10.3945/ajcn.115.119768. Epub 2015 Sep 9. PubMed PMID: 26354536.

Boden, W. E., O'Rourke, R. A., Teo, K. K., Hartigan, P. M., Maron, D. J., Kostuk, W. J., Knudtson, M., Dada, M., Casperson, P., Harris, C. L., Chaitman, B. R., Shaw, L., Gosselin, G., Nawaz, S., Title, L. M., Gau, G., Blaustein, A. S., Booth, D. C., Bates, E. R., Spertus, J. A., Berman, D. S., Mancini, G. B., and Weintraub, W. S.; COURAGE Trial Research Group. (2007). Optimal medical therapy with or without PCI for stable coronary disease. New England Journal of Medicine 356, 1503–16. Epub 2007 Mar 26.

Efron, B. (1979). Bootstrap methods: Another look at the jackknife. Annals of Statistics 7, 1–26.

Gelman, A. (2004). Treatment effects in before-after data. In Applied Bayesian Modeling and Causal Inference from Incomplete-data Perspectives, ed. A. Gelman and X. L. Meng, chapter 18. New York: Wiley.

Gelman, A. (2018). The failure of null hypothesis significance testing when studying incremental changes, and what to do about it. Personality and Social Psychology Bulletin 44, 16–23.


Gelman, A., and Carlin, J. B. (2014). Beyond power calculations: Assessing Type S (sign) and Type M (magnitude) errors. Perspectives on Psychological Science 9, 641–651.

Gelman, A., and Stern, H. S. (2006). The difference between “significant” and “not significant” is not itself statistically significant. American Statistician 60, 328–331.

Greenland, S. (2017). The need for cognitive science in methodology. American Journal of Epidemiology 186, 639–645.

Harrell, F. (2017a). Statistical errors in the medical literature. Statistical Thinking blog, 8 Apr. http://www.fharrell.com/2017/04/statistical-errors-in-medical-literature.html

Harrell, F. (2017b). Statistical criticism is easy; I need to remember that real people are involved. Statistical Thinking blog, 5 Nov. http://www.fharrell.com/2017/11/statistical-criticism-is-easy-i-need-to.html

Kolata, G. (2017). ‘Unbelievable’: Heart stents fail to ease chest pain. New York Times, 2 Nov. https://www.nytimes.com/2017/11/02/health/heart-disease-stents.html

Resnick, B. (2017). White fear of demographic change is a powerful psychological force. Vox.com, 28 Jan. https://www.vox.com/science-and-health/2017/1/26/14340542/white-fear-trump-psychology-minority-majority

Sands, M. L. (2017). Exposure to inequality affects support for redistribution. Proceedings of the National Academy of Sciences 114, 663–668.

Schulz, K. F., and Grimes, D. A. (2005). Sample size calculations in randomised trials: Mandatory and mystical. Lancet 365, 1348–1353.

Simmons, J., Nelson, L., and Simonsohn, U. (2011). False-positive psychology: Undisclosed flexibility in data collection and analysis allows presenting anything as significant. Psychological Science 22, 1359–1366.

Vickers, A. J., and Altman, D. G. (2001). Analysing controlled trials with baseline and follow up measurements. British Medical Journal 323, 1123–1124.

Wasserstein, R. L., and Lazar, N. A. (2016). The ASA's statement on p-values: Context, process, and purpose. American Statistician 70, 129–133.


Supplementary Table. Summary data comparing stents to placebo, from Table 3 of Al-Lamee et al. (2017).


Box 1. Using the reported data summaries to obtain the analysis controlling for the pre-treatment measure

For each of the treatment and control groups, we are given the standard deviation of the pre-test measurements, the standard deviation of the post-test measurements, and the standard deviation of their difference, which can be obtained by taking the width of the confidence interval for the difference, dividing by 4 to get the standard error of the difference, and then multiplying by $\sqrt{n}$ to get back to the standard deviation.

Then we use the rule,

$$\mathrm{sd}(y_{\text{post}} - y_{\text{pre}}) = \sqrt{\mathrm{sd}(y_{\text{pre}})^2 + \mathrm{sd}(y_{\text{post}})^2 - 2\rho\,\mathrm{sd}(y_{\text{pre}})\,\mathrm{sd}(y_{\text{post}})},$$

and solve for ρ, the correlation between before and after measurements within each group. The result in this case is ρ = 0.88 within each group. We then convert the correlation to a regression coefficient of $y_{\text{post}}$ on $y_{\text{pre}}$ using the well-known formula, $\beta = \rho\,\mathrm{sd}(y_{\text{post}})/\mathrm{sd}(y_{\text{pre}})$, which yields β = 0.88 for the treated and β = 0.86 for the control group. If these two coefficients were much different from each other, we might want to consider an interaction model (Gelman, 2004), but here they are close enough that we simply take their average.

We use the average, β = 0.87, in (2) and get an estimate for the adjusted mean difference of 21.3 (indeed, quite a bit higher than the reported difference in gain scores of 16.6) with a standard error of 12.5 (very slightly lower than 12.7, the standard error of the difference in gain scores) and 95% CI −3.2 to 45.8 s. The estimate is not quite two standard errors away from zero: the z-score is 1.7, and the p-value is 0.09.
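The steps in Box 1 can be sketched in Python as follows. The per-group standard deviations from the Supplementary Table are not reproduced in this text, so the first part uses hypothetical values (180 and 190 sec, with a true correlation of 0.88) purely to show that the formula recovers ρ; those numbers and all function names are assumptions for illustration. The second part uses only the reported estimate of 21.3 and standard error of 12.5.

```python
import math


def sd_from_ci(lower: float, upper: float, n: int) -> float:
    """SD of within-patient differences from a 95% CI for their mean:
    width / 4 gives the standard error, times sqrt(n) gives the SD."""
    return (upper - lower) / 4.0 * math.sqrt(n)


def correlation_from_sds(sd_pre: float, sd_post: float, sd_diff: float) -> float:
    """Solve sd_diff^2 = sd_pre^2 + sd_post^2 - 2*rho*sd_pre*sd_post for rho."""
    return (sd_pre**2 + sd_post**2 - sd_diff**2) / (2.0 * sd_pre * sd_post)


def regression_slope(rho: float, sd_pre: float, sd_post: float) -> float:
    """Least-squares slope of post on pre: beta = rho * sd(post) / sd(pre)."""
    return rho * sd_post / sd_pre


def normal_cdf(x: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


# Part 1: round trip with HYPOTHETICAL standard deviations (not trial data).
sd_pre, sd_post, rho_true = 180.0, 190.0, 0.88
sd_diff = math.sqrt(sd_pre**2 + sd_post**2 - 2.0 * rho_true * sd_pre * sd_post)
rho = correlation_from_sds(sd_pre, sd_post, sd_diff)
beta = regression_slope(rho, sd_pre, sd_post)

# Part 2: inference from the adjusted estimate and standard error in Box 1.
estimate, std_error = 21.3, 12.5
z_score = estimate / std_error
p_value = 2.0 * (1.0 - normal_cdf(abs(z_score)))
ci_95 = (estimate - 1.96 * std_error, estimate + 1.96 * std_error)

print(f"recovered rho = {rho:.2f}, beta = {beta:.2f}")
print(f"z = {z_score:.1f}, p = {p_value:.2f}, "
      f"95% CI = ({ci_95[0]:.1f}, {ci_95[1]:.1f})")
```

The second part reproduces the z-score, p-value, and interval quoted in Box 1 from the estimate and standard error alone, which makes it easy to see how a small change in either input would shift the p-value.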


Box 2. Recommendations for Analyses and Reporting

Analyses
1. Baseline adjustment for differences: should be prespecified for the primary analysis where strong confounders such as a baseline measure of the outcome are available.
2. Be aware of fragility of inferences. Fragility can be demonstrated using the sampling or posterior distribution as estimated using mathematical modeling, bootstrap simulation, or Bayesian analysis.

Reporting
1. Avoid use of sharp thresholds for p-values and thus eliminate the term “statistical significance” from the reporting of results.
2. Consider the full range (upper and lower ends) of interval estimates for important outcomes and their potential inclusion of clinically important differences.
3. Consider the potential for individual variability in responses (heterogeneity of treatment effects) and not just mean differences.