Power (and Precision): Effective Research Design Planning for Grant Proposals & More
Walt Stroup, Ph.D., Professor & Chair, Department of Statistics
University of Nebraska, Lincoln
SSP Core Facility, 9 February 2007
Outline for Talk
I. What is “Power Analysis”? Why should I do it?
II. Essential Background
III. A Word about Software
IV. Decisions that Affect Power – several examples
V. Latest Thinking
VI. Final Thoughts
Power and Precision Defined
▪ Precision, a.k.a. “margin of error”
  − in most cases, the standard error of the relevant estimate
▪ Power
  − Prob{ reject H0 | H0 false }
  − i.e., Prob{ research hypothesis is statistically significant, when it is true }
▪ Power analysis
  − essentially, “If I do the study this way, power = ?”
▪ Sample size estimation
  − how many observations are required to achieve a given power?
What’s involved in Power Analysis
▪ WHAT IT’S NOT:
  − “painting by numbers...”
▪ IF IT’S DONE RIGHT, power analysis should be
  − a comprehensive conversation to plan the study
  − a “dress rehearsal” for the statistical analysis once the data are collected
Why do a Power Analysis?
▪ For an NIH grant proposal
  − because it’s required
▪ For many other grant proposals
  − because it gives you a competitive edge
▪ Other reasons
  − practical: increases the chance of success; reduces the “we don’t have time to do it right, but lots of time to do it over” syndrome
  − ethical
Ethical???
▪ Last Ph.D. in U.S. Senate
▪ Irritant to doctrinaire left and right
▪ Keynote address to 1997 American Stat. Assoc.: “... we can continue to make policy based on ‘data-free ideology’ or we can inform policy where possible by competent inquiry...”
  − late U.S. Senator Daniel Patrick Moynihan
Ethical
▪ Results of your study may affect policy
▪ Well-conceived research means
  − better information
  − greater chance of sound decisions
▪ Poorly-conceived research
  − lost opportunity
  − deprives policy-makers of information that might have been useful
  − or worse: bad information misinforms or misleads the public
What affects Power & Precision?
▪ A short statistics lesson
  1. What goes into computing test statistics
  2. What test statistics are supposed to tell us
  3. A bit about the distribution of test statistics
  4. Central and non-central t, F, and chi-square (mostly F)
What goes into a test statistic?
Research hypothesis: the motivation for the study; assumed not true unless the data show compelling evidence otherwise
Research hypothesis: HA; its opposite: H0

                     H0 true         HA true
Fail to reject H0    ☺               Type II error
Reject H0            Type I error    Power
What goes into a test statistic?
▪ Visualize using F
▪ But the same basic principles apply to t, chi-square, etc.
▪ F is the ratio of variation attributable to the factor under study vs. variation attributable to noise
  (N of obs) × (effect size) / (variance of noise, i.e. among obs)
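For the one-way CRD used later in the talk, this schematic ratio has an exact counterpart: when HA is true, F follows a non-central F distribution whose non-centrality parameter φ combines the same three ingredients. A math sketch, assuming t treatments, n observations per treatment, treatment means μ_i and noise variance σ² (notation ours, not from the slides):

```latex
\begin{aligned}
\phi &= \frac{n\sum_{i=1}^{t}(\mu_i-\bar{\mu})^2}{\sigma^2},
  \qquad \bar{\mu} = \frac{1}{t}\sum_{i=1}^{t}\mu_i \\
\text{CRD example: } \mu &= (100,\ 94,\ 90),\quad n = 5,\quad \sigma^2 = 100,\quad \bar{\mu} = 284/3 \\
\sum_{i=1}^{3}(\mu_i-\bar{\mu})^2 &= (16/3)^2 + (2/3)^2 + (14/3)^2 = 456/9 \approx 50.67 \\
\phi &= 5 \times 50.67 / 100 \approx 2.53
\end{aligned}
```

This is the nc = 2.53333 that the GLIMMIX "exemplary data" run reports for the trt test: more observations or a bigger effect raise φ (and power), more noise lowers it.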
When H0 True – i.e. no trt effect
When H0 false (i.e. Research HA true)
What affects Power?
(N of obs) × (effect size) / (variance of noise, i.e. among obs)
What should be in a conversation about Power?
(N of obs) × (effect size) / (variance of noise, i.e. among obs)
▪ Effect size: what is the minimum that matters?
▪ Variance: how much “noise” is in the response variable (range? distribution? count? pct?)
▪ Practical constraints
▪ Design: the same N can produce varying power
About Software (part I)
▪ Canned software
  − lots of it
  − Xiang and Zhou working on report
  − “painting by numbers”
▪ Simulation
  − most accurate; not constrained by canned scenarios
  − you can see what will happen if you actually do this...
▪ “Exemplary data set” + modeling software
  − nearly as accurate as simulation
  − “dress rehearsal” for actual analysis
  − MIXED, GLIMMIX, NLMIXED: if you can model it, you can do power analysis
Design Decisions – Some Examples
▪ Main idea: for the same amount of effort, $$$, or # of observations, power and precision can be quite different
▪ Power analysis objective: work smarter, not harder
▪ Simple example: design of a regression study
  − from a STAT 412 exercise
Treatment Design Exercise
▪ Class was asked to predict the bounce height of a basketball from its drop height, and to see whether the relationship changes depending on floor surface
▪ Decision: what drop heights to use???
Objectives and Operating Definitions
▪ Recall objective: does the drop height : bounce height relationship change with floor surface?
[Figure: operating definition]
Consequences of Drop Height Decisions
▪ Should we use fewer drop heights & more obs per drop height, or vice versa?
[Table from Stat 412 Avery archive]
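The archived table itself is not reproduced here. As a hypothetical sketch of the trade-off it addresses, the precision of a fitted slope depends on the spread of the drop heights through S_xx = Σ(x − x̄)², since Var(b) = σ²/S_xx; comparing the slopes of two floor surfaces that use the same design gives Var(b_A − b_B) = 2σ²/S_xx. The heights 1 to 6 (units unspecified), the 12 drops per surface, and the function name are our assumptions, not values from the exercise.

```python
def slope_variance_factor(heights):
    """Return 1 / Sxx, so that Var(slope) = sigma^2 / Sxx in simple linear regression."""
    mean_x = sum(heights) / len(heights)
    sxx = sum((x - mean_x) ** 2 for x in heights)
    return 1.0 / sxx


# Hypothetical designs: 12 drops per floor surface, drop heights between 1 and 6
designs = {
    "2 heights x 6 drops": [1, 6] * 6,
    "3 heights x 4 drops": [1, 3.5, 6] * 4,
    "6 heights x 2 drops": [1, 2, 3, 4, 5, 6] * 2,
}

for name, xs in designs.items():
    # Same design on both surfaces: Var(b_A - b_B) / sigma^2 = 2 / Sxx
    print(f"{name}: Var(b_A - b_B) / sigma^2 = {2 * slope_variance_factor(xs):.4f}")
```

Putting every drop at the two extreme heights gives the most precise slope comparison, but it leaves no way to check whether the drop:bounce relationship is actually linear; spreading drops over more heights buys that check at the cost of precision.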
Simulation
▪ CRD example: 3 treatments, 5 reps / treatment
▪ Suspected effect size: 6–10% relative to control, whose mean is known to be ~100
▪ Standard deviation: 10 considered “reasonable”
▪ Simulate 1000 experiments
▪ Reject H0: equal trt means 228 times
  − power = 0.228 at alpha = 0.05
▪ Ctl mean ranked correctly 820 times
▪ (intermediate mean ranked correctly 589 times)
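A minimal stdlib Python sketch of this kind of simulation, assuming normal errors and true means of 100 (control), 94 and 90, the values used in the CRD exemplary data set. "Ranked correctly" is our interpretation: the control has the highest observed mean, and the intermediate treatment lands in the middle. The critical value 3.88529 is F(0.95; 2, 12), as reported by SAS FINV in the CRD output; the function names and seed are hypothetical.

```python
import random


def one_way_f(groups):
    """One-way ANOVA F statistic and the observed group means."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    means = [sum(g) / len(g) for g in groups]
    grand = sum(sum(g) for g in groups) / n_total
    ss_trt = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    ss_err = sum((y - m) ** 2 for g, m in zip(groups, means) for y in g)
    f = (ss_trt / (k - 1)) / (ss_err / (n_total - k))
    return f, means


def simulate_crd(mu=(100.0, 94.0, 90.0), sd=10.0, reps=5, n_sim=1000,
                 f_crit=3.88529, seed=2007):
    """Simulate n_sim CRD experiments; return (power, P{ctl first}, P{intermediate second})."""
    rng = random.Random(seed)
    rejections = ctl_first = mid_second = 0
    for _ in range(n_sim):
        groups = [[rng.gauss(m, sd) for _ in range(reps)] for m in mu]
        f, means = one_way_f(groups)
        if f > f_crit:                      # reject H0: equal trt means at alpha = 0.05
            rejections += 1
        if max(means) == means[0]:          # control (true mean 100) observed highest
            ctl_first += 1
        if sorted(means)[1] == means[1]:    # intermediate trt (true mean 94) observed in the middle
            mid_second += 1
    return rejections / n_sim, ctl_first / n_sim, mid_second / n_sim


power, ctl, mid = simulate_crd()
print(f"power = {power:.3f}, ctl first = {ctl:.3f}, intermediate second = {mid:.3f}")
```

With a different seed the counts move by a few percentage points from run to run, which is the Monte Carlo error inherent in 1000 simulated experiments.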
“Exemplary Data”
▪ Many software packages for power & sample size
  − e.g., SAS PROC POWER
  − for FIXED-effects models only
▪ “Exemplary data” is more general
▪ Especially (but not only) when there are “mixed model issues”
  − random effects
  − split-plot structure
  − potentially correlated errors: longitudinal or spatial data
  − any other non-standard model structure
▪ Methods use PROC MIXED or GLIMMIX
  − adapted from Stroup (2002, JABES)
▪ Chapter 12, SAS for Mixed Models (Littell et al., 2006)
“Exemplary Data” - Computing Power using SAS
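➢ Create a data set like the proposed design, with expected means in place of observations
➢ Run PROC GLIMMIX (or MIXED) with the variance components held fixed at their assumed values
➢ Use GLIMMIX to compute the non-centrality parameter $\phi = F_{\mathrm{GLIMMIX}} \times \mathrm{rank}(K)$ [or the chi-square statistic, with a GLM], where $K$ is the hypothesis (contrast) matrix
➢ The critical F, $F_{crit}$, is the value such that $P\{F(\mathrm{rank}(K), \nu, 0) > F_{crit}\} = \alpha$ [or chi-square], where $\nu$ is the denominator degrees of freedom
➢ Power $= P\{F(\mathrm{rank}(K), \nu, \phi) > F_{crit}\}$
➢ SAS functions (FINV, PROBF) can compute $F_{crit}$ and power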
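The last three steps only need the central and non-central F distributions, so they can be mirrored outside SAS. A minimal stdlib-only Python sketch, assuming the standard non-central F (a Poisson(φ/2) mixture of incomplete-beta terms), which is the definition behind SAS's non-centrality argument to our understanding. The incomplete beta uses a Numerical Recipes style continued fraction; all function names are hypothetical.

```python
from math import exp, lgamma, log


def _betacf(a, b, x, max_iter=300, eps=3e-14, tiny=1e-300):
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # even and odd coefficients of the continued fraction for this m
        for aa in (m * (b - m) * x / ((qam + m2) * (a + m2)),
                   -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            h *= d * c
        if abs(d * c - 1.0) < eps:
            break
    return h


def beta_reg(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    bt = exp(lgamma(a + b) - lgamma(a) - lgamma(b) + a * log(x) + b * log(1.0 - x))
    if x < (a + 1.0) / (a + b + 2.0):
        return bt * _betacf(a, b, x) / a
    return 1.0 - bt * _betacf(b, a, 1.0 - x) / b


def f_cdf(f, df1, df2, nc=0.0, terms=400):
    """P{F(df1, df2, nc) <= f} as a Poisson(nc/2) mixture of incomplete-beta terms."""
    if f <= 0.0:
        return 0.0
    x = df1 * f / (df1 * f + df2)
    if nc == 0.0:
        return beta_reg(df1 / 2.0, df2 / 2.0, x)
    half = nc / 2.0
    total = 0.0
    for j in range(terms):
        weight = exp(-half + j * log(half) - lgamma(j + 1.0))  # Poisson(nc/2) prob. of j
        total += weight * beta_reg(df1 / 2.0 + j, df2 / 2.0, x)
        if j > half and weight < 1e-16:
            break
    return total


def f_crit(alpha, df1, df2):
    """Central-F critical value, analogous to SAS FINV(1-alpha, df1, df2, 0), by bisection."""
    target = 1.0 - alpha
    lo, hi = 0.0, 1.0
    while f_cdf(hi, df1, df2) < target:
        hi *= 2.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if f_cdf(mid, df1, df2) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def f_power(nc, df1, df2, alpha=0.05):
    """Return (Fcrit, power); power is analogous to 1 - PROBF(Fcrit, df1, df2, nc)."""
    fc = f_crit(alpha, df1, df2)
    return fc, 1.0 - f_cdf(fc, df1, df2, nc)


# Usage: the trt test from the CRD example (nc = 2.53333 with 2 and 12 df)
print(f_power(2.53333, 2, 12))
```

For the CRD trt test this should land on the Fcrit and power printed in the SAS output (about 3.885 and 0.224). If SciPy is available, `scipy.stats.f.ppf` and `scipy.stats.ncf.sf` do the same job; the stdlib version just avoids the dependency.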
Compute Power with GLIMMIX – CRD example
/* step 1 - create data set with same structure as proposed design;
   use MU (expected mean) instead of observed Y_ij values */
/* this example shows power for 5, 10, and 15 e.u. per trt */
data crdpwrx1;
  input trt mu;
  do n=5 to 15 by 5;
    do eu=1 to n;
      output;
    end;
  end;
  cards;
1 100
2 94
3 90
;
Compute Power with GLIMMIX – CRD example
/* step 2 - use PROC GLIMMIX to compute non-centrality parameters for ANOVA tests & contrasts;
   ODS statements output them to new data sets */
proc sort data=crdpwrx1;
  by n;
run;
proc glimmix data=crdpwrx1;
  by n;
  class trt;
  model mu=trt;
  parms (100) / hold=1;   /* residual variance held at sigma^2 = 100 (SD = 10) */
  contrast 'et1 v et2' trt 0 1 -1;
  contrast 'c vs et' trt 2 -1 -1;
  ods output tests3=b;
  ods output contrasts=c;
run;
/* step 3 - combine ANOVA & contrast n-c parameter data sets;
   use SAS functions PROBF and FINV to compute power */
data power;
  set b c;
  alpha=0.05;
  ncparm=numdf*fvalue;
  fcrit=finv(1-alpha,numdf,dendf,0);
  power=1-probf(fcrit,numdf,dendf,ncparm);
run;
proc print data=power;
run;
Output for n = 5 e.u. per trt (DenDF = 12):
Obs  Effect  Label      DF  DenDF  alpha  nc       fcrit    power
1    trt                2   12     0.05   2.53333  3.88529  0.22361
2            et1 v et2  1   12     0.05   0.40000  4.74723  0.08980
3            c vs et    1   12     0.05   2.13333  4.74723  0.26978
Type III Tests of Fixed Effects
Effect  Num DF  Den DF  F Value  Pr > F
trt     2       12      1.27     0.3169

Contrasts
Label      Num DF  Den DF  F Value  Pr > F
et1 v et2  1       12      0.40     0.5390
c vs et    1       12      2.13     0.1698
Note the close agreement of simulated power (0.228) and “exemplary data” power (0.224)
More Advanced Example
▪ Plots in 8 x 3 grid
▪ Main variation along 8 “rows”
▪ 3 x 2 treatment design
▪ Alternative designs
  − randomized complete block (4 blocks, size 6)
  − incomplete block (8 blocks, size 3)
  − split plot
▪ RCBD “easy” but ignores natural variation
Picture the 8 x 3 Grid
[Figure: 8 x 3 grid of plots with a gradient along the 8 rows]
e.g., 8 schools, the gradient is “SES”, 3 classrooms each
SAS Programs to Compare 8 x 3 Designs
Split-Plot
data a;
  input bloc trtmnt @@;
  do s_plot=1 to 3;
    input dose @@;
    mu=trtmnt*(0*(dose=1)+4*(dose=2)+8*(dose=3));
    output;
  end;
  cards;
1 1 1 2 3
1 2 1 2 3
2 1 1 2 3
2 2 1 2 3
3 1 1 2 3
3 2 1 2 3
4 1 1 2 3
4 2 1 2 3
;
proc glimmix data=a noprofile;
  class bloc trtmnt dose;
  model mu=bloc trtmnt|dose;
  random trtmnt / subject=bloc;
  parms (4) (6) / hold=1,2;   /* whole-plot variance 4, split-plot (residual) variance 6 */
  lsmeans trtmnt*dose / diff;
  contrast 'trt x lin' trtmnt*dose 1 0 -1 -1 0 1;
  ods output diffs=b;
  ods output contrasts=c;
run;
8 x 3 – Incomplete Block
data a;
  input bloc @@;
  do eu=1 to 3;
    input trtmnt dose @@;
    mu=trtmnt*(0*(dose=1)+4*(dose=2)+8*(dose=3));
    output;
  end;
  cards;
1 1 1 1 2 1 3
2 1 1 1 2 2 2
3 1 1 1 3 2 3
4 1 1 2 1 2 2
5 1 2 1 3 2 2
6 1 2 2 1 2 3
7 1 3 2 1 2 3
8 2 1 2 2 2 3
;
proc glimmix data=a noprofile;
  class bloc trtmnt dose;
  model mu=trtmnt|dose;
  random intercept / subject=bloc;
  parms (4) (6) / hold=1,2;   /* block variance 4, residual variance 6 */
  lsmeans trtmnt*dose / diff;
  contrast 'trt x lin' trtmnt*dose 1 0 -1 -1 0 1;
  ods output diffs=b;
  ods output contrasts=c;
run;
8 x 3 Example - RCBD
data a;
  input trtmnt dose @@;
  do bloc=1 to 4;
    mu=trtmnt*(0*(dose=1)+4*(dose=2)+8*(dose=3));
    output;
  end;
  cards;
1 1 1 2 1 3 2 1 2 2 2 3
;
proc glimmix data=a noprofile;
  class bloc trtmnt dose;
  model mu=bloc trtmnt|dose;
  parms (10) / hold=1;
  lsmeans trtmnt*dose / diff;
  contrast 'trt x lin' trtmnt*dose 1 0 -1 -1 0 1;
  ods output diffs=b;
  ods output contrasts=c;
run;
How did designs compare?
▪ Suppose the main objective is to compare the regression over the 3 dose levels: does it differ by treatment? (similar to the basketball experiment)
▪ The operating definition is thus H0: dose regression coefficients are equal
▪ Power for randomized complete block: 0.66
▪ Power for incomplete block: 0.85
▪ Power for split-plot: 0.85
▪ Same # of observations: you can work smarter
But what if I don’t know Trt Effect Size or Variance?
▪ “How can I do a power analysis? If I knew the effect size and the variance, I wouldn’t have to do the study.”
▪ What the trt effect size is NOT: it is NOT the effect size you are going to observe
▪ It is somewhere between
  − what current knowledge suggests is a reasonable expectation
  − the minimum difference that would be considered “important” or “meaningful”
And Variance??
▪ Know thy relevant background / Do thy homework
▪ Literature search: what have others working with similar subjects reported as variance?
▪ Pilot study
▪ Educated guess (these rules give the standard deviation; square it for the variance)
  − range you’d expect to contain 95% of likely obs? Divide it by 4
  − most extreme values you can plausibly imagine? Divide that range by 6
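Both educated-guess rules assume roughly normal data: about 95% of observations fall within ±2 SD (a span of 4 SD), and nearly all within ±3 SD (a span of 6 SD). A tiny sketch; the function name and the example ranges are hypothetical, chosen so the result matches the SD of 10 used in the CRD simulation.

```python
def sd_from_range(low, high, extreme=False):
    """Rough standard-deviation guess from a plausible range of observations.

    extreme=False: range expected to hold ~95% of obs (about +/- 2 SD), so divide by 4.
    extreme=True:  most extreme values plausibly imaginable (about +/- 3 SD), so divide by 6.
    """
    divisor = 6.0 if extreme else 4.0
    return (high - low) / divisor


sd = sd_from_range(80, 120)                    # 95% of obs expected between 80 and 120
print("SD guess:", sd, "variance guess:", sd ** 2)
print("SD guess from extremes:", sd_from_range(70, 130, extreme=True))
```

The squared value is what a variance argument such as PARMS in the GLIMMIX programs expects.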
Hierarchical Linear Models
▪ From Bovaird (10-27-2006) seminar
▪ 2 treatments
▪ 20 classrooms / trt
▪ 25 students / classroom
▪ 4 years
▪ Reasonable ideas of classroom(trt), student(classroom*trt), and within-student variances, as well as effect size
▪ Implement via exemplary data + GLIMMIX
Categorical Data?
▪ Example: binary data
▪ “Standard” has success probability of 0.25
▪ “New & Improved”: hope to increase it to 0.30
▪ Have N subjects at each of L locations
▪ For the sake of argument, suppose we have
  − 900 subjects / location
  − 10 locations
Power for GLMs
▪ 2 treatments
▪ P{favorable outcome}
  − for trt 1, p = 0.30; for trt 2, p = 0.25
▪ Power if n1 = 300, n2 = 600
/* exemplary data */
data a;
  input trt y n;
  datalines;
1 90 300
2 150 600
;
proc glimmix data=a;
  class trt;
  model y/n=trt / chisq;
  ods output tests3=pwr;
run;
data power;
  set pwr;
  alpha=0.05;
  ncparm=chisq;   /* the Wald chi-square (= numdf x F) is the non-centrality */
  crit=cinv(1-alpha,numdf,0);
  power=1-probchi(crit,numdf,ncparm);
run;
proc print data=power;
run;
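For this two-group binary case the GLIMMIX run can be checked by hand. A sketch, assuming the Wald chi-square from the saturated two-group logit model fit to the exemplary counts equals the closed form below (the fitted proportions reproduce p1 and p2 exactly, so Var(logit) = 1/(n·p·(1 − p))); the function name is hypothetical. A 1-df non-central chi-square is (Z + √φ)², so power has a normal-distribution form.

```python
from math import log, sqrt
from statistics import NormalDist


def wald_chisq_power(p1, n1, p2, n2, alpha=0.05):
    """Return (odds ratio, non-centrality, power) for the 1-df Wald test of equal proportions."""
    log_or = log(p1 / (1 - p1)) - log(p2 / (1 - p2))
    var_log_or = 1 / (n1 * p1 * (1 - p1)) + 1 / (n2 * p2 * (1 - p2))
    ncparm = log_or ** 2 / var_log_or          # Wald chi-square at the exemplary data
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)          # sqrt of the 1-df chi-square critical value
    shift = sqrt(ncparm)
    # P{(Z + shift)^2 > z_crit^2} = P{Z > z_crit - shift} + P{Z < -z_crit - shift}
    power = z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)
    return exp_or(log_or), ncparm, power


def exp_or(log_or):
    """Back-transform a log odds ratio."""
    from math import exp
    return exp(log_or)


odds_ratio, nc, pw = wald_chisq_power(0.30, 300, 0.25, 600)
print(f"odds ratio = {odds_ratio:.3f}, ncparm = {nc:.3f}, power = {pw:.3f}")
```

With n1 = 300 and n2 = 600 this gives an odds ratio of about 1.286, a non-centrality of about 2.55 and power of roughly 0.36, before any location-to-location variation is taken into account.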
Power for GLMM
▪ Same trt and sample size per location as before
▪ 10 locations
▪ Var(Location) = 0.25; Var(Trt*Loc) = 0.125
▪ Variance components: variation in log(odds ratio)
▪ Power?
data a;
  input trt y n;
  do loc=1 to 10;
    output;
  end;
  datalines;
1 90 300
2 150 600
;
proc glimmix data=a initglm;
  class trt loc;
  model y/n = trt / oddsratio;
  random intercept trt / subject=loc;
  random _residual_;
  parms (0.25) (0.125) (1) / hold=1,2,3;
  ods output tests3=pwr;
run;
GLMM Power Analysis Results
Obs  Effect  NumDF  DenDF  alpha  ncparm   fcrit    power
1    trt     1      9      0.05   2.29868  5.11736  0.27370

Odds Ratio Estimates
trt  _trt  Estimate  DF  95% Confidence Limits
1    2     1.286     9   0.884   1.871

Gives you the expected confidence limits for the # of locations and N / location contemplated
Gives you the power of the test of the TRT effect on prob(favorable)
GLMM Power: Impact of Sample Size?
▪ N of subjects per trt per location?
▪ N of locations?
Three cases
1. n = 300/600, 10 loc
2. n = 600/1200, 10 loc
3. n = 300/600, 20 loc
/* case 1 */
data a; input trt y n; do loc=1 to 10; output; end;
  datalines;
1 90 300
2 150 600
;
/* case 2 */
data a; input trt y n; do loc=1 to 10; output; end;
  datalines;
1 180 600
2 300 1200
;
/* case 3 */
data a; input trt y n; do loc=1 to 20; output; end;
  datalines;
1 90 300
2 150 600
;
GLMM Power: Impact of Sample Size?
Recall: for 10 locations, N = 300/600, the CI for the odds ratio was (0.884, 1.871) and power was 0.274

For 10 locations, N = 600/1200
Odds Ratio Estimates
trt  _trt  Estimate  DF  95% Confidence Limits
1    2     1.286     9   0.891   1.855
Obs  Effect  NumDF  DenDF  alpha  ncparm   fcrit    power
1    trt     1      9      0.05   2.40715  5.11736  0.28421

For 20 locations, N = 300/600
Odds Ratio Estimates
trt  _trt  Estimate  DF  95% Confidence Limits
1    2     1.286     19  1.006   1.643
Obs  Effect  NumDF  DenDF  alpha  ncparm   fcrit    power
1    trt     1      19     0.05   4.59736  4.38075  0.53003

N alone has almost no impact: doubling N per location raises power only from 0.274 to 0.284, while doubling the number of locations raises it to 0.530
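Why does N barely matter? A sketch, assuming the Wald variance of the pooled treatment log odds ratio at the exemplary data is the binomial part plus twice Var(Trt*Loc) within each location, divided by the number of locations (the location intercept variance cancels in a within-location difference). Under that assumption this closed form reproduces the three ncparm values above (2.29868, 2.40715, 4.59736); the function names are hypothetical. Power then follows from the non-central F with (1, L − 1) df, as in step 3 of the CRD example.

```python
from math import log


def logit_var(p, n):
    """Approximate variance of an estimated logit from n binomial trials at success prob p."""
    return 1.0 / (n * p * (1.0 - p))


def glmm_ncparm(p1, n1, p2, n2, locations, var_trt_loc):
    """Non-centrality for the trt log odds ratio pooled over locations (sketch)."""
    log_or = log(p1 / (1 - p1)) - log(p2 / (1 - p2))
    # binomial part + 2 * Var(Trt*Loc): each location contributes two independent trt effects
    per_location = logit_var(p1, n1) + logit_var(p2, n2) + 2.0 * var_trt_loc
    return log_or ** 2 / (per_location / locations)


cases = [("n=300/600, 10 loc", 300, 600, 10),
         ("n=600/1200, 10 loc", 600, 1200, 10),
         ("n=300/600, 20 loc", 300, 600, 20)]
for label, n1, n2, n_loc in cases:
    print(label, round(glmm_ncparm(0.30, n1, 0.25, n2, n_loc, 0.125), 5))
```

The binomial part (about 0.025 at n = 300/600) is small next to 2 × 0.125 = 0.25, so doubling N trims the per-location variance by only about 5%, while doubling the number of locations halves the variance of the pooled log odds ratio and doubles the non-centrality.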
Recent developments
▪ Continue binary example
▪ Power analysis shows:
α-level      0.10  0.05  0.05  0.01  0.05  0.01
Power        0.80  0.80  0.90  0.80  0.95  0.90
L locations  27    38    46    53    57    68
What do you do?
More Information
▪ Consider studies directed toward improving a success rate similar to that proposed in this study
▪ Lit search yields 95 such studies
▪ 29 have reported statistically significant gains of p1 − p2 > 0.05 (or, alternatively, significant odds ratios of (30/70)/(25/75) ≈ 1.286 or greater)
▪ If this holds, the “prior” probability of the desired effect size, Pr{DES}, is approx 0.3 (29/95 ≈ 0.31)
An Intro Stat Result
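Suppose α = 0.10 and power = 0.80. By Bayes’ theorem, Pr{DES | reject H0} = (0.80 × 0.3) / (0.80 × 0.3 + 0.10 × 0.7) ≈ 0.77, so the real Pr{type I error} among rejections, Pr{H0 true | reject H0}, is more like 0.23 than 0.10!!!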
Returning to All Scenarios
α-level              0.10  0.05  0.05  0.01  0.05  0.01
Power                0.80  0.80  0.90  0.80  0.95  0.90
L locations          27    38    46    53    57    68
Pr{DES | reject H0}  0.77  0.87  0.89  0.97  0.89  0.97
NOTE the dramatic impact of the α-level when the “prior” Pr{DES} is relatively low
The role of POWER increases as Pr{DES} increases
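The Pr{DES | reject H0} row follows from Bayes’ theorem with the “prior” Pr{DES} = 0.3 from the literature search. A minimal sketch, assuming only two states (the desired effect is real, or H0 holds) and that power is evaluated at the DES; the function name is hypothetical.

```python
def prob_des_given_reject(prior, power, alpha):
    """Pr{DES | reject H0} = power*prior / (power*prior + alpha*(1 - prior))."""
    true_pos = power * prior
    false_pos = alpha * (1.0 - prior)
    return true_pos / (true_pos + false_pos)


# (alpha, power) pairs from the scenario table
scenarios = [(0.10, 0.80), (0.05, 0.80), (0.05, 0.90), (0.01, 0.80), (0.05, 0.95), (0.01, 0.90)]
prior = 0.3  # "prior" Pr{DES} from the literature search (29 of 95 studies)

for alpha, power in scenarios:
    print(f"alpha={alpha:.2f} power={power:.2f} Pr(DES | reject)="
          f"{prob_des_given_reject(prior, power, alpha):.2f}")
```

Raising `prior` shrinks the false-positive term alpha × (1 − prior), which is why the α-level’s impact fades as Pr{DES} grows.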
Closing Comments
▪ In case it’s not obvious
  − I’m not a fan of “painting by numbers”
  − the role of power analysis is misunderstood & underappreciated
▪ MOST of ALL it is an opportunity to explore and rehearse study design & planned analysis
▪ Engage a statistician as a participating member of the research team
▪ Give it the TIME it REQUIRES
Thanks
... for coming