Sample Size Consideration in Clinical Research

John Kwagyan, PhD

[email protected]

Howard University College of Medicine

GHUCCTS

What Is Statistics?

The science of collecting, organizing, analyzing and interpreting data to assist in making effective decisions.

• Summarization of large quantities of data (Descriptive/Summary Statistics)

• Making decisions about a population from a sample (Inferential Statistics)

Types of Statistics

• Descriptive/Summary Statistics

Methods for organizing, summarizing, and presenting data in an informative way.

• Inferential Statistics

Methods for estimating and testing population parameters based on sample information.

Population

• Well defined

• Large

• Unique characteristics: prevalence of a disease, variability of a measure, response rate of therapy, etc.

We are interested in estimating the population characteristics!!!

Population → Sample → sample data

We make inferences about population characteristics based on sample data.

Population Parameters

• Mean cholesterol level of obese individuals

• Prevalence of hypertension in Blacks

• Incidence of lung cancer among smokers

• Risk of liver disease (hepatitis) associated with drinking

• Mortality rate of heart attack among men

• Variability of heart rate in PTSD

CENTRAL IDEA: Estimate and Test for differences in parameters

Case Example

• Suppose that we plan to conduct a study comparing a treatment with a control.

• The response variable is systolic blood pressure (SBP), measured using a standard sphygmomanometer.

• The treatment is supposed to reduce blood pressure

• We set up a one-sided test

H0: μT = μC versus H1: μT < μC

where μT = mean SBP for the treatment group and μC = mean SBP for the control group.

• The parameter Δ = μT −μC is the effect being tested

Case Example

• Suppose the goals of the study specify that we want to be able to detect a situation where the treatment mean is 15 mmHg lower than the control mean.

• The required effect size is Δ = −15 mmHg.

• We specify that such an effect be detected with 80% power (1-β = .80) when the significance level is α = .05.

• Past experience with similar studies, using similar sphygmomanometers and similar subjects, suggests that the data will be approximately normally distributed with a standard deviation of SD = 20 mmHg.

• We plan to use a two-sample pooled t test with equal numbers n of subjects in each group.

Case Example

• Now we have all of the specifications needed for determining sample size using the power approach, and their values may be entered in suitable formulas, charts, or power-analysis software.

• We find that a sample size of n = 23 per group is needed to achieve the stated goals.
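The case example can be roughly checked with a normal (z) approximation. This is a minimal sketch, not the calculation used on the slide: the function name `n_per_group_normal` is hypothetical, and the formula n = 2(z₁₋α + z₁₋β)²(SD/Δ)² assumes a known SD. It comes out slightly below the t-test-based answer of 23 per group, because the t test also has to account for estimating the SD from the data.

```python
from math import ceil
from statistics import NormalDist


def n_per_group_normal(delta, sd, alpha=0.05, power=0.80, one_sided=True):
    """Normal-approximation sample size per group for comparing two means.

    Assumes equal group sizes and a known, common standard deviation.
    """
    z = NormalDist()
    # Critical value for the chosen significance level (one- or two-sided).
    z_alpha = z.inv_cdf(1 - alpha) if one_sided else z.inv_cdf(1 - alpha / 2)
    # Quantile corresponding to the desired power, 1 - beta.
    z_beta = z.inv_cdf(power)
    return 2 * (z_alpha + z_beta) ** 2 * (sd / delta) ** 2


# Case example inputs: delta = -15 mmHg, SD = 20 mmHg, one-sided alpha = .05, power = .80
n_raw = n_per_group_normal(delta=-15, sd=20)
print(f"normal approximation: {n_raw:.2f}, rounded up to {ceil(n_raw)} per group")
```

The approximation rounds up to 22 per group; power-analysis software based on the t distribution adds about one more subject per group for these inputs.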

Basic Parameters and Concepts

• Study (Research) Hypotheses

• Type I Error Rate, α (Significance level)

• P-value

• Type II Error Rate, β

• Power, 1-β

• Effect Size, Δ

~ size of clinically meaningful change.

HYPOTHESIS, HYPOTHESIS TESTING

Hypothesis

• HYPOTHESIS: a statement about a population characteristic/parameter

• HYPOTHESIS: a prediction/idea about what the examination of appropriate data will show about a characteristic

Hypothesis

• Null (Test) Hypothesis, H0

~ Hypothesis to be questioned (disproved).

~ Hypothesis of no real (true) difference

• Alternative (Research) Hypothesis, HA

~ Hypothesis investigator wishes to establish.

~ Hypothesis of a real (true) difference

Example

• Research Hypothesis: Combination therapy is “effective” in the treatment of hypertension.

• Effective

~ considerable reduction in BP (1)

~ controls BP increases (2)

• Parameter

~ Mean percent reduction in BP (1)

~ Proportion controlled (2)

• Test Hypothesis: The combination therapy is not effective.

Goal

• Goal is to TEST the Null Hypothesis and decide whether to REJECT IT in favor of the Alternative, or FAIL TO REJECT it.

Test of Hypothesis

One-Tailed Tests

• A test is one-tailed when the research hypothesis, HA , specifies a direction:

HA: The incidence of lung cancer among smokers is higher than among nonsmokers.

Two-Tailed Tests

• A test is two-tailed when no direction is specified in the research hypothesis HA.

HA: The stress level in DC is different from NY.

Test & Decision

Test H0: no difference in effectiveness

Possible Outcomes

Null Hypothesis could be true (i.e., no difference)

Null Hypothesis could be false (i.e., difference)

Decision Making

Investigator rejects the null hypothesis

Investigator fails to reject the null hypothesis

Test & Decision

Test H0: therapy is not effective

Decision    H0 True (not effective)    H0 False (effective)
________    _______________________    ____________________
Accept      No Error                   Type II Error
Reject      Type I Error               No Error

Drug Trial

H0: “Miracle” drug is not effective

Decision    H0 True (not effective)    H0 False (effective)
________    _______________________    ____________________
Accept      No Error                   Type II Error
Reject      Type I Error               No Error

Type I Error: Deny a patient a “known therapy” in favor of an ineffective “miracle drug”

Type II Error: Deny a patient a better drug in favor of a less effective “known therapy”

Test & Decision

Test H0

Decision    H0 True         H0 False
________    ____________    _____________
Accept      No Error        Type II Error
Reject      Type I Error    No Error

α = P(Type I Error)

β = P(Type II Error)

Is This Familiar?

• All tests were performed two-sided at the 5% level of significance.

• Significance was defined as a value of p < 0.05.

• A value of p < 0.05 was considered statistically significant.

• ALL YOU ARE DOING IS CONTROLLING THE TYPE I ERROR RATE

Definitions

α = P{Type I Error}

  = P{rejecting H0 | H0 is true}

  = P{rejecting the truth}

α ~ is called the Type I Error Rate

α ~ is called the Significance Level
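What "controlling the Type I error rate" means can be illustrated by simulation. The sketch below is an assumption-laden toy, not part of the original slides: it repeatedly draws samples from a population in which H0 is true (mean 0, known SD 1), applies a two-sided z test at α = 0.05, and counts how often H0 is wrongly rejected. The function name and all settings (n, number of simulations, seed) are hypothetical.

```python
import random
from statistics import NormalDist, fmean


def type_one_error_rate(n=30, alpha=0.05, n_sims=4000, seed=1):
    """Estimate how often a two-sided z test rejects H0 when H0 is true."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    rejections = 0
    for _ in range(n_sims):
        # Data generated under H0: true mean 0, known standard deviation 1.
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        # z statistic = (observed mean - 0) / (1 / sqrt(n))
        z_stat = fmean(sample) * n ** 0.5
        if abs(z_stat) > z_crit:
            rejections += 1
    return rejections / n_sims


rate = type_one_error_rate()
print(f"proportion of false rejections: {rate:.3f} (nominal alpha = 0.05)")
```

The simulated rejection proportion should land close to the nominal 5%, which is all that the significance level promises.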

Definitions

β = P{Type II Error}

  = P{fail to reject H0 | H0 is false}

  = P{accepting a fallacy}

β ~ called the Type II Error Rate

1-β ~ called the Power of the study

Definitions

β = P{fail to reject H0 | H0 is false}

1-β = P{reject H0 | H0 is false}

    = P{accept HA | HA is true}

1-β ~ is called the Power of the study

Power ~ quantifies the ability of the study to detect a difference, if one exists

Definitions: P-value

~ probability of observing a difference at least as large as the one in our data when the null hypothesis is true.

~ probability that a difference this large arises by chance alone when the null hypothesis is true.

Definitions: P-value

~ the smaller the p-value, the weaker the support for the null hypothesis

~ the smaller the p-value, the stronger the evidence for the alternative hypothesis

How do we evaluate this probability?

By calculating a test statistic

Test Statistic

Most test statistics have the form:

• Test Statistic = (observed value − expected value) / (standard error of observed value)

- a value which we can compare with a known distribution of what we expect when the null hypothesis is true
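As a concrete instance of "(observed − expected) / standard error", the sketch below computes a two-sample pooled t statistic. The function name and the SBP readings are made up purely for illustration; under H0 the expected difference in means is 0.

```python
from math import sqrt
from statistics import fmean, variance


def pooled_t_statistic(x, y, expected_diff=0.0):
    """(observed difference - expected difference) / standard error, pooled variance."""
    n_x, n_y = len(x), len(y)
    observed_diff = fmean(x) - fmean(y)
    # Pooled sample variance, assuming both groups share a common variance.
    pooled_var = ((n_x - 1) * variance(x) + (n_y - 1) * variance(y)) / (n_x + n_y - 2)
    standard_error = sqrt(pooled_var * (1 / n_x + 1 / n_y))
    return (observed_diff - expected_diff) / standard_error


# Hypothetical SBP readings (mmHg), invented for this example only.
treatment = [128, 135, 121, 140, 132, 126]
control = [142, 150, 138, 155, 147, 141]

t_stat = pooled_t_statistic(treatment, control)
print(f"t = {t_stat:.2f} on {len(treatment) + len(control) - 2} degrees of freedom")
```

The resulting value is then compared with the t distribution on n₁ + n₂ − 2 degrees of freedom, which is the "known distribution" expected when the null hypothesis is true.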

Common Test Statistics

• T-test

• F-test

• Chi-square (χ²) test

How do you choose the appropriate statistic?

Statistical Significance

• Accepted values in clinical research

p < 0.05 ~ significant

p < 0.01 ~ highly significant

In Genetic (Linkage) Analysis:

• LOD Score = 3.0 ~ significant

• LOD Score = 3.0 ~ α = 0.0001

SAMPLE SIZE CONSIDERATION

Population And Sample

Target Population → Study Population → Study Sample

Define Eligibility Criteria (ineligible individuals are excluded)

Eligibility Criteria!!!!

~ consist of inclusion criteria and exclusion criteria

• Inclusion criteria are used to outline the intended study population

• Exclusion criteria are used to fine-tune the intended population by removing expected sources of variation

Eligibility Criteria!!!!

• Inclusion Criteria

Female

Age ≥ 21 years

BMI ≥ 25 kg/m²

• Exclusion Criteria (REDUNDANT!!!! These only negate the inclusion criteria)

Male

Age < 21 years

BMI < 25 kg/m²

Eligibility Criteria!!!!

Inclusion Criteria        Exclusion Criteria
i. Female                 i. Male
ii. Age ≥ 21 yrs          ii. Age < 21
iii. BMI ≥ 25 kg/m²       iii. BMI < 25

• Exclusion Criteria

i. Pregnant or breast feeding

ii. History of …….

iii. Any other condition in the opinion of the investigator(s) that would make the subject unsuitable for the study

Why Sample Size ?

• Required in many grant applications (Clinical Research Protocol, Funding Agencies, etc.)

• Budgetary Constraints

• Provide Statistical Justification

• Inference (decision) is based on it

How Much Data Do I Need?

• How big a difference are you trying to detect? (Effect Size)

- Absolute difference ~ say, a 5 mmHg drop in BP

- Relative difference ~ say, a 5% drop in BP

• How much variation is there in the outcome?

• How certain do you want to be that you will detect the difference of interest ?

Eliciting effect size

• How big a difference would be of clinical importance for you?

Some responses I get:

• Huh??

• What do you mean?

• What do you recommend?

• Any difference at all would be important

Finding the right variance

• Based on experience

- Range of values

- Stories behind extreme values

- Sources of variation

• Use of historical data

• Conduct a pilot study

What If You Have an Imposed Sample Size?

• Sometimes, a proposal comes with an imposed sample size.

• Sample size is but one of several quality characteristics of a study

• If n is held fixed, we simply need to focus on other characteristics, such as effect size.

Determination of Sample Size

Depends on:

1. Outcome measure (Data Endpoint)

2. Study Design

Types of Data Endpoints

• Continuous Data - BP, BMI, TC, LDL, Blood Sugar

• Categorical Data - Hypertension, Obese, Dyslipidemia, Diabetes

• Count Data

- Number of risk factors (0, 1, 2, 3, ...)

• Survival (Time-to-Event) Data

- time-to-cardiac event, time-to-death

Putting It All Together (Power Analysis)

1-β = P{accept HA | HA is true}

    = Func(α, σ²(n), Δ)

Power = Func(Certainty, Variability, Effect Size)

• Certainty: α

• Variability: σ², which enters through the sample size n

• Effect Size: Δ
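One way to make "power is a function of α, variability, sample size, and effect size" concrete is a normal-approximation power function for a two-sided, two-sample comparison of means. This is a sketch under stated assumptions (equal group sizes, known common SD); the function name `power_two_sample` and the example inputs are hypothetical and not taken from the slides.

```python
from math import sqrt
from statistics import NormalDist


def power_two_sample(delta, sd, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-sample z test for a difference in means."""
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    # Effect size measured in units of the standard error of the difference.
    shift = abs(delta) / (sd * sqrt(2 / n_per_group))
    # Probability that the test statistic falls in either rejection region.
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)


# Power rises with sample size for a fixed effect size and variability.
for n in (8, 16, 32, 64):
    print(n, round(power_two_sample(delta=10, sd=10, n_per_group=n), 3))
```

Holding α, SD, and Δ fixed, the loop shows power increasing with n; setting `delta=0` returns α itself, since rejecting a true H0 is exactly the Type I error.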

Crude SS Estimate for Means: 2-Sample Test for Means (2-sided)

n = 16s² / Δ² per group, α = 0.05, β = 0.2

sd    Δ     n
10    5     64
10    10    16
15    5     144
15    10    36

Power = 80%

Sample Size Formula: 2-Sample Test for Means (2-sided)

n = 16σ² / Δ² per group, α = 0.05, β = 0.2

sd    Δ     n
10    5     64
10    10    16
15    5     144
15    10    36

Power = 80%
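The factor 16 in the crude rule is a rounding of 2(z₀.₉₇₅ + z₀.₈₀)² ≈ 15.7. The sketch below evaluates both the crude rule and the unrounded normal-approximation formula for the four (sd, Δ) pairs in the table; the function names are hypothetical and the results are per group.

```python
from math import ceil
from statistics import NormalDist


def crude_n(sd, delta):
    """Crude rule: n per group = 16 * sd^2 / delta^2 (two-sided alpha = 0.05, power = 80%)."""
    return 16 * sd ** 2 / delta ** 2


def normal_n(sd, delta, alpha=0.05, power=0.80):
    """Unrounded normal-approximation version of the same formula."""
    z = NormalDist()
    factor = 2 * (z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)) ** 2  # about 15.7
    return factor * sd ** 2 / delta ** 2


print("sd  delta  crude  normal approx.")
for sd, delta in [(10, 5), (10, 10), (15, 5), (15, 10)]:
    print(sd, delta, crude_n(sd, delta), ceil(normal_n(sd, delta)))
```

Halving Δ quadruples n, and increasing the SD from 10 to 15 multiplies n by 2.25, which is why both inputs deserve more attention than the constant 16.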

Sample Size

• A larger sample size is needed to detect a smaller meaningful difference.

• A larger sample size is needed when there is much variability in the population.

• A larger sample size is required to increase the power of a study.

Other Approaches

There are several approaches to sample size.

• One can specify the desired width of a confidence interval and determine the sample size that achieves that goal.

• A Bayesian approach can be used where we optimize some utility function, perhaps one that involves both precision of estimation and cost.
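The confidence-interval approach can be sketched for the simplest case, estimating a single mean with a known SD, where the full interval width is W = 2·z·σ/√n and so n = (2·z·σ/W)². The function name and the example inputs (SD = 20 mmHg, total width 10 mmHg) are assumptions chosen for illustration, not values from the slides.

```python
from math import ceil
from statistics import NormalDist


def n_for_ci_width(sd, width, confidence=0.95):
    """Smallest n for which a z-based confidence interval for one mean
    has total width no larger than `width` (SD assumed known)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return ceil((2 * z * sd / width) ** 2)


# Example: SD = 20 mmHg, desired 95% CI of total width 10 mmHg (i.e. +/- 5 mmHg).
print(n_for_ci_width(sd=20, width=10))
```

Unlike the power approach, no effect size or Type II error rate is needed here; the only inputs are the variability and the precision that is wanted.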

Avoid “canned” effect sizes: the T-shirt effect sizes

• This is an elaborate way to arrive at the same sample size that has been used in past social science studies of large, medium, and small size. 

• The method uses a standardized effect size as the goal. 

• Think about it: for a "medium" effect size, you'll choose the same n regardless of the accuracy or reliability of your instrument, or the narrowness or diversity of your population. 

• Important considerations are being ignored here.  "Medium" is definitely not the message!

Cohen Effect Sizes????

What are small, medium, or large effect sizes for:

• Odds Ratio

• Hazard Ratios

• Repeated Measures ANOVAs

• Regression Models

• Multivariate Models

• Sensitivity Analysis

• Adaptive Designs

Post Hoc Power Analyses

• In contrast to a priori power analyses, post hoc power analyses often make sense after a study has already been conducted.

Take Away Points

• Use power prospectively for planning future studies.

• Put science before statistics. The appropriate inputs to power/sample-size calculations should be based on careful consideration of the underlying scientific (not statistical!!) goals of the study.

• T-shirt Effect Sizes: if at all possible, avoid using “canned” effect sizes.

References

1. Lenth, R. V. (2001), "Some Practical Guidelines for Effective Sample Size Determination," The American Statistician, 55, 187-193.

2. Hoenig, J. M. and Heisey, D. M. (2001), "The Abuse of Power: The Pervasive Fallacy of Power Calculations for Data Analysis," The American Statistician, 55, 19-24.