Basic Statistical Concepts

Chapter 2 Reading instructions

• 2.1 Introduction: Not very important• 2.2 Uncertainty and probability: Read• 2.3 Bias and variability: Read• 2.4 Confounding and interaction: Read• 2.5 Descriptive and inferential statistics:

Repetition• 2.6 Hypothesis testing and p-values: Read• 2.7 Clinical significance and clinical

equivalence: Read• 2.8 Reproducibility and generalizability: Read

Bias and variability

Bias: Systemtic deviation from the true value

ˆE

Design, Conduct, Analysis, Evaluation

Lots of examples on page 49-51

Bias and variabilityLarger study does not decrease bias

nˆ ; n

Drog X - Placebo

-7 -4-10 mm Hg -7 -4-10

Drog X - Placebo

mm Hg mm Hg

Drog X - Placebo

-7 -4-10

n=40 n=200 N=2000

Distribution of sample means: = population mean

Population mean

bias

Bias and variabilityThere is a multitude of sources for bias

Publication bias

Selection bias

Exposure bias

Detection bias

Analysis bias

Interpretation bias

Positive results tend to be published while negative of inconclusive results tend to not to be published

The outcome is correlated with the exposure. As an example, treatments tends to be prescribed to those thought to benefit from them. Can be controlled by

randomizationDifferences in exposure e.g. compliance to treatment could be associated with the outcome, e.g. patents

with side effects stops taking their treatment

The outcome is observed with different intensity depending no the exposure. Can be controlled by

blinding investigators and patients

Essentially the I error, but also bias caused by model miss specifications and choice of estimation

technique

Strong preconceived views can influence how analysis results are interpreted.


Amount of difference between observations

True biological:

Temporal:

Measurement error:

Variation between subject due to biological factors (covariates) including the treatment.

Variation over time (and space)Often within subjects.

Related to instruments or observers

Design, Conduct, Analysis, Evaluation

Raw Blood pressure data

Baseline 8 weeks

Placebo

Drug XDBP(mmHg)

Subset of plotted data


XY

Unexplained variation

Variation in observations

= Explained variation

+


Drug A

Drug B

Outcome

Is there any difference between drug A and drug B?


Y=μA+βx

Y=μB+βx

μA

μB

x=age

Model: ijijiij xY

Confounding

Predictors of

treatment

Predictors of outcome

Confounders

Treatmentallocation

Treatmentallocation

A

B

Outcome

Example

Variable Non smokers Cigarette smokers

Cigar or pipe smokers

Mortality rate* 20.2 20.5 35.5

Cochran, Biometrics 1968*) per 1000 person-years %

Smoking Cigaretts is not so bad but watch out for Cigars or Pipes (at least in Canada)

ExampleSmoking Cigaretts is not so bad but watch out for Cigars or Pipes (at least in Canada)




Average age 54.9 50.5 65.9


ExampleSmoking Cigaretts is not so bad but watch out for Cigars or Pipes (at least in Canada)




Average age 54.9 50.5 65.9

Adjusted mortality rate*

20.2 26.4 24.0


Confounding

The effect of two or more factors can not be separated

Example: Compare survival for surgery and drug

R

Life long treatment with drug

Surgery at time 0

•Surgery only if healty enough•Patients in the surgery arm may take drug•Complience in the drug arm May be poor

Looks ok but:

Survival

Time

Confounding

Can be sometimes be handled in the design

Example: Different effects in males and females

Imbalance between genders affects result

Stratify by gender

R

A

BGender

M

F

R

R

A

A

B

B

Balance on average Always balance

InteractionThe outcome on one variable depends on the value of another variable.

ExampleInteraction between two drugs

R

A

A

B

B

Wash out

A=AZD1234

B=AZD1234 + Clarithromycin

Interaction

Mean

0

1

2

3

4

5

0 4 8 12 16 20 24

Time after dose

Plas

ma

conc

entr

atio

n (µ

mol

/L)

line

ar s

cale

AZD0865 alone

Combination of clarithromycinand AZD0865

19.75 (µmol*h/L)

36.62 (µmol*h/L)

AUC AZD0865:

AUC AZD0865 + Clarithromycin:

Ratio: 0.55 [0.51, 0.61]

Example: Drug interaction

InteractionExample:Treatment by center interaction

Treatment difference in diastolic blood pressure

-25

-20

-15

-10

-5

0

5

10

15

0 5 10 15 20 25 30

Ordered center number

mm

Hg

Average treatment effect: -4.39 [-6.3, -2.4] mmHg

Treatment by center: p=0.01

What can be said about the treatment effect?

Descriptive and inferential statistics

The presentation of the results from a clinical trial can be split in three categories:

•Descriptive statistics•Inferential statistics•Explorative statistics


Descriptive statistics aims to describe various aspects of the data obtained in the study.

•Listings.•Summary statistics (Mean, Standard Deviation…).•Graphics.


Inferential statistics forms a basis for a conclusion regarding a prespecified objective addressing the underlying population.

Hypothesis Results

Confirmatory analysis:

Conclusion


Explorative statistics aims to find interesting results thatCan be used to formulate new objectives/hypothesis for further investigation in future studies.

Results Hypothesis

Explorative analysis:

Conclusion?

Hypothesis testing, p-values and confidence intervals

ObjectivesVariableDesign

Statistical ModelNull hypothesis

Estimatep-value

Confidence interval

ResultsInterpretation

Hypothesis testing, p-values

Statistical model: Observations nn RXX ,1X

from a class of distribution functions

:P

Hypothesis test: Set up a null hypothesis: H0: 0and an alternative H1: 1

Reject H0 if 0|cSP Xnc RS X

p-value:

Rejection region

The smallest significance level for which the null hypothesis can be rejected.

Significance level

What is statistical hypothesis testing?

A statistical hypothesis test is a method of making decisions using data from a scientific study.

Decisions: Conclude that substance is likely to be

efficacious in a target population.

Scientific study:Collect data measuring efficacy of a

substance on a random sample of patients representative of the target population

Formalizing hypothesis testing

We try to prove that collected data with little likelihood can correspond to a statement or a hypothesis which we do not want to be true!

Hypothesis:•The effect of our substance has no effect on blood pressure compared to placebo •An active comparator is superior to our substance

What do we want to be true then?•The effect of our substance has a different effect on blood pressure compared to placebo •Our substance is non-inferior to active comparator

Formalizing hypothesis testing cont. Formalizing it even further…

As null hypothesis, H0, we shall choose the hypothesis we want to reject

H0: μX = μplacebo

As alternative hypothesis, H1, we shall choose the hypothesis we want to prove.

H1: μX ≠ μplacebo

(two sided alternative hypothesis)

μ is a parameter that characterize the efficacy in the target population, for example the Mean.

H0: Treatment with substance X has no effect on blood pressure compared to placebo

Example, two sided H1:

H1: Treatment with substance X influence blood pressure compared to placebo

0 Difference in BP


Example: one sided H1

H1: Treatment with substance X decrease the blood pressure more than placebo

0Difference in BP

Structure of a hypothesis test Given our H0 and H1

Assume that H0 is true H0: Treatment with substance X has no effect on

blood pressure compared to placebo (μX = μplacebo)

1) We collect data from a scientific study. 2) Calculate the likelihood/probability/chance that data would actually turn out as it did (or even more extreme) if H0 was true;

P-value:Probability to get at least as extreme as what

we observed given H0

How to interpret the p-value?P-value:Probability to get at least as extreme as what we observed given H0

P-value small: Correct: our data is unlikely given the H0. We don’t believe in unlikely things so something is wrong. Reject H0.Common language: It is unlikely that H0 is true; we should reject H0

P-value large: We can´t rule out that H0 is true; we can´t reject H0

A statistical test can either reject (prove false) or fail to reject (fail to prove false) a null hypothesis, but never prove it true (i.e., failing to reject a null hypothesis does not prove it true).

How to interpret the p-value? Cont.

Expected outcome of the study if H0 is true (μX = μplacebo )

Likely outcome if the treatments are equal:The p-value is big

Expected outcome of the study for some H1 (μX < μplacebo )

Unlikely outcome if thetreatments are equal:The p-value is small

From the assumption that if the treatment does not have effect we calculate: How likely is it to get the data we collected in the study?

μX = μplaceboμX μplacebo

Risks of making incorrect decisionsWe can make two types of incorrect decisions:

1)Incorrect rejection of a true H0 “Type I error”; False Positive

2) Fail to reject a false null hypothesis H0 “Type II error”; False Negative

These two errors can´t be completely avoided.

We can try to control and minimize the risks of making error is several ways:

1) Optimal study design 2) Deciding the burden of evidence in favor of

H1

required to reject H0

Misconceptions and warnings

Easy to misinterpret and mix things up(p-value, size of test, significance level, etc).

P-value is not the probability that the null hypothesis is true

P-value says nothing about the size of the effect.A statistical significant difference does not need to be clinically relevant!

A high p-value is not evidence that H0 is true


Example:

0Difference in BP

Distribution of average effect if Ho is true

Xobs=10.4mmHg

p=0.0156

•If the null hypothesis is true it is unlikely to observe 10.4•We don’t believe in unlikely things so something is wrong•The data is ”never” wrong•The null hypothesis must be wrong!


Example:

0Difference in BP

Distribution of average effect if Ho is true

Xobs=3.2mmHg

p=0.35

•If the null hypothesis is true it is not that unlikely to observe 3.2•Nothing unlikely has happened•There is no reason to reject the null hypothesis•This does not prove that the null hypothesis is true!

Confidence intervals

A confidence set is a random subset covering the true parameter value with probability at least .

XC

1

Let rejectednot : if 0

rejected : if 1,

*0

*0*

H

HX (critical function)

Confidence set: 0,: XXC

The set of parameter values correponding to hypotheses that can not be rejected.

Example

• We study the effect of treatment X on change in DBP from baseline to 8 weeks

• The true treatment effect is: d = μX – μplacebo

• The estimated treatment effect is:

– How reliable is this estimation?

mmHgxx placeboX 5.7

Hypothesis testing

• Test the hypothesis that there is no difference between treatment X and placebo

• Use data to calculate the value of a test function: High value -> reject hypothesis, since it is unlikely given not true difference

• For a long time it was enough to present the point estimate and the result of the test– Previous example: The estimated treatment

effect is 7.5 mmHg, which is significantly different from zero at the 5% level.

Famous (?) quotes

“We do not perform an experiment to find out if two varieties of wheat or two drugs are equal. We know in advance, without spending a dollar on an experiment, that they are not equal” (Deming 1975)

“Overemphasis on tests of significance at the expense especially of interval estimation has long been condemned” (Cox 1977)

“The continued very extensive use of significance tests is alarming” (Cox 1986)

41

Estimations + Interval

Better than just hypothesis test• The estimation gives the best guess of the

true effect• The confidence interval gives the accuracy

of the estimation, i.e. limits within which the true effect is located with large certainty

• One can decide if the result is statistically significant, but also if it has biological/clinical relevance

• Now required by Regulatory Authorities, journals, etcNeyman, Jerzy (1937). "Outline of a Theory of Statistical Estimation Based on the Classical Theory of Probability". Philosophical Transactions of the Royal Society of London. Series A.

Example

• The 95% confidence interval is [-9.5, -4.9] mmHg (based on a model assuming normally distributed data)

• With great likelihood (95%), the interval [-9.5, -4.9] contains the true treatment effect μX – μplacebo

• Provided that our model is correct…

Drug X Placebo

Model:

μX μplacebo

σ σ

0-9.5 -4.5

General principle

• There is a True fixed value that we estimate using data from a clinical trial or other experiment

• The estimate deviates from the true value (by an unknown amount)

• If we were to repeat the study a number of times, the different estimates would be centred around the true value

• A 95% confidence interval is an interval estimated so that if the study is repeated a number of times, the interval will cover the true value in 95% of these

General principle• Estimate a parameter, for

example – Mean difference,– Relative risk reduction, – Proportion of treatment

responders, etc

• Calculate an interval that contains the true parameter value with large probability

General principle

Estimate ± Quantile * Variability

Model

Depends on confidence level

Of the estimate

-/ n+/ n

~67%

General principle – Normal model

Mean ± 1 *

Normal model, σ known

67% confidence level

SD of the mean

n

-1.96/ n +1.96/ n

95%


Mean ± 1.96 *

Normal model, σ known

95 % confidence level

SD of the mean

n


Mean ± t(1-α/2, n-1) *

Normal model, σ not

known

(1-α)% confidence level

SD of the mean

n

sd

Width• The width of the confidence

interval depends on:– Degree of Confidence (90%,

95% or 99%)– Size of the sample– Variability– Statistical model– Design of the study

95% 90% 17% narrower

95% 80% 36% narrower

95% 99% 35% wider

Effect of confidence level

51

(normal model, n=30)

“Excuse me, professor. Why a 95% confidence interval and not a 94% or 96% interval?”, asked the student.

“Shut up,” he explained.

Sample size

Double SD double width, etc

Double 30% narrower

+50% 20% narrower

+30% 12% narrower

Effect of Sample size and SD

52

Model Dependence• Can we assume normally distributed data?• Is the variability the same in all treatment

groups?• Are there important covariates, e.g. baseline

level or confounders?• Can we assume a covariance structure e.g.

repeated measurements per individual?• For time to event data – can we assume

proportional hazards?• Multiple events data – can we assume that

there is no overdispersion?

Model Dependence• Statistical packages provides confidence

intervals for most models• But assumptions are not always easily

checked and you usually have to pre-specify the model

• Given a sufficient sample size you may still ‘get away with it’, since mean values are approximately normally distributed (or a log transform might do the trick)

• Otherwise you may want to use a non-parametric method which does not depend on a (possibly approximate) model

median

Non-parametric CIs - ExampleCalculated using the available values.

Example - say we have calculated the following the following treatment differences (in sorted order):

-7.8 -6.9 -4.7 3.7 6.5 8.7 9.1 10.1

10.8 13.6 14.4 15.6

20.2

22.4

23.5

Confidence level (interval for the median)

99.2% 96.5% 88.2%

Analogy with tests• For most statistical tests there is a

corresponding confidence interval• If a 95% confidence interval does not

cover ‘the zero’, the corresponding test is to give a p-value < 0.05.

• If for example your p-value is based on the median and the confidence interval is based on the mean you can get conflicting results = trouble

Comparing two CIs • Can we judge whether groups are significantly

different depending on whether or not their confidence intervals overlap? - Not always.

• Parallel group:– Non-overlapping confidence intervals -

significantly different.– Overlapping confidence intervals - not

certain. – Depends on the confidence level of the

individual CIs: Comparing two 84% confidence intervals corresponds to a test at the 5% level (approximately)

Sample size• Based on the precision of the interval instead

of the power of a test

• Examples: – Interval of treatment difference not wider

than x, with some probability– Lower limit/mean > 0.6 and upper

limit/mean < 1.4 with at least 80% probability* - suitable when estimating geometric means

* Based on ‘FDA recommendation’ for pediatric PK studies

Example

yij = μ + τi + β (xij - x··) + εij

Variable: The change from baseline to end of study in sitting DBP (sitting SBP) will be described with an ANCOVA model, with treatment as a factor and baseline blood pressure as a covariate

Null hypoteses (subsets of ):H01: τ1 = τ2 (DBP)H02: τ1 = τ2 (SBP)H03: τ2 = τ3 (DBP)H04: τ2 = τ3 (SBP)

Objective: To compare sitting diastolic blood pressure (DBP) lowering effect of hypersartan 16 mg with that of hypersartan 8 mg

Model:treatment effecti = 1,2,3 {16 mg, 8 mg, 4 mg}

Parameter space: 4R

4R

0321

Example continedHypothesis Variable LS Mean CI (95%) p-value

1: 16 mg vs 8 mg Sitting DBP

-3.7 mmHg [-4.6, -2.8] <0.001

2: 16 mg vs 8 mg Sitting SBP

-7.6 mmHg [-9.2, -6.1] <0.001

3: 8 mg vs 4 mg Sitting DBP

-0.9 mmHg [-1.8, 0.0] 0.055

4 : 8 mg vs 4 mg Sitting SBP

-2.1 mmHg [-3.6, -0.6] 0.005This is a t-test where the test statistic follows a t-distribution

Rejection region: cT XX :

001.0P-value: The null hypothesis can pre rejected at

XT0-c c

21 0-4.6 -2.8

P-value says nothing about the size of the effect!

No. of patients per group Estimation of effect p-value

10 1.94 mmHg 0.376

100 -0.65 mmHg 0.378

1000 0.33 mmHg 0.129

10000 0.28 mmHg <0.0001

100000 0.30 mmHg <0.0001

A statistical significant difference does NOT need to be clinically relevant!

Example: Simulated data. The difference between treatment and placebo is 0.3 mmHg

Statistical and clinical significance

Statistical significance:

Clinical significance:

Health ecominical relevance:

Is there any difference between the evaluated treatments? Does this difference have any meaning for the patients?Is there any economical benefit for the society in using the new treatment?

Statistical and clinical significance

A study comparing gastroprazole 40 mg and mygloprazole 30 mg with respect to healing of erosived eosophagitis after 8 weeks treatment.

Drug Healing rategastroprazole 40 mg

87.6%

mygloprazole 30 mg

84.2%

Cochran Mantel Haenszel p-value = 0.0007

Statistically significant!

Health economically relevant??

Clinically significant?

Basic Statistical Concepts

Documents

Transcript of Basic Statistical Concepts