Posted on 29-Dec-2015
Health and Disease in Populations 2001
Sources of variation (2)
Jane Hutton
(Paul Burton)
Informal lecture objectives
Objective 1: To enable the student to distinguish between observed data and the underlying tendencies which give rise to those data
Objective 2: To understand the concept of random variation
...
Objective 3: Describe how ‘observed’ values provide knowledge of the ‘true’ values by:
a) using tests of hypotheses about the true value;
b) using confidence intervals, which give a range that includes the ‘true’ value with a specified probability.
Neural tube defects in Western Australia (1975-2000) – hypothetical data
Hypothesis testing
1. Calculate the probability of getting an observation as extreme as, or more extreme than, the one observed if the stated hypothesis were true.
2. If this probability is very small, then either
a) something very unlikely has occurred; or
b) the hypothesis is wrong.
3. It is then reasonable to conclude that the data are incompatible with the hypothesis.
The probability is called a ‘p-value’.
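Steps 1 to 3 can be sketched in code for one simple case. This is a minimal sketch, assuming the observation is a count of cases that follows a Poisson distribution with a known expected count under the hypothesis, and treating only larger counts as "more extreme" (a one-sided test). The function name and the numbers (10 expected, 20 observed) are hypothetical, not lecture data.

```python
import math

def poisson_upper_tail_p(observed: int, expected: float) -> float:
    """One-sided p-value: P(X >= observed) when X ~ Poisson(expected).

    This is the probability of a count as large as, or larger than, the
    one observed if the hypothesis (mean = expected) were true.
    """
    # P(X >= k) = 1 - P(X <= k - 1): sum the Poisson probabilities for 0..k-1
    prob_below = sum(
        math.exp(-expected) * expected**k / math.factorial(k)
        for k in range(observed)
    )
    return 1.0 - prob_below

# Hypothetical: the hypothesis says we expect 10 cases; we observe 20.
p = poisson_upper_tail_p(20, 10.0)
print(f"p = {p:.4f}")
if p < 0.05:
    # Step 2: either something very unlikely happened, or the hypothesis is wrong
    print("Data incompatible with the hypothesis (at the conventional 5% level)")
```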
Remember!
IMPORTANT: think of the implications. Rejecting H0 is of little use without a conclusion.
p<0.05 is arbitrary; nothing special happens between p=0.049 and p=0.051.
p=0.0001 and p=0.6 are easy to interpret.
False positive and false negative results can both occur.
Statistical significance depends on sample size. Flip a coin 3 times: the minimum possible (two-sided) p-value is 0.25 (i.e. 2 × 1/8).
Statistically significant ≠ clinically important.
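The coin example can be checked by brute force: list all 2^3 equally likely sequences and count those at least as far from "half heads" as the observed result. A sketch assuming a fair coin and a two-sided test (which is what the factor 2 in 2 × 1/8 suggests); the function name is hypothetical.

```python
from itertools import product

def two_sided_p_heads(n_flips: int, observed_heads: int) -> float:
    """Exact two-sided p-value for a fair coin: the probability of a result
    at least as far from n_flips/2 heads as the one observed."""
    observed_distance = abs(observed_heads - n_flips / 2)
    # Every sequence of H/T is equally likely under the fair-coin hypothesis
    outcomes = list(product("HT", repeat=n_flips))
    extreme = sum(
        1 for seq in outcomes
        if abs(seq.count("H") - n_flips / 2) >= observed_distance
    )
    return extreme / len(outcomes)

# The most extreme result possible with 3 flips: all heads
print(two_sided_p_heads(3, 3))
```

With only 3 flips no result can reach p<0.05, however biased the coin: the data simply cannot be that surprising. This is the sense in which significance depends on sample size.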
P values widely used
Conclusions - range of values
Objective 3: Describe how ‘observed’ values help us towards a knowledge of the ‘true’ values by:
a) allowing us to test hypotheses about the true value;
b) giving confidence intervals: a range which includes the ‘true’ value with a specific probability.
Any questions?
Estimation
In a study, we observe a 30% higher risk of TB in Warwick than in the rest of the UK: an incidence rate ratio (IRR) of 1.3.
H0 ‘rejected’ (p=0.01)
But, what is our ‘best guess’ at the true excess risk?
Hypothesis    p-value   Rejected?
-20% risk     0.0001    Rejected
-10% risk     0.002     Rejected
Same risk     0.01      Rejected
+10% risk     0.1       Not rejected
+20% risk     0.4       Not rejected
+30% risk     0.5       Not rejected
+40% risk     0.2       Not rejected
+50% risk     0.1       Not rejected
+60% risk     0.01      Rejected
Informally: values outside the range [10% excess risk to 50% excess risk] are in some sense ‘inconsistent’ with the data. The range [10% excess risk to 50% excess risk] probably includes the true value.
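The informal argument above "inverts" a series of hypothesis tests: keep every hypothesised value whose p-value is not below 0.05. A minimal sketch using the p-values from the table (the variable names are hypothetical). Because the table only covers a coarse grid of hypotheses in 10% steps, the resulting limits are approximate.

```python
# (hypothesised excess risk in %, p-value) from the Warwick TB table
table = [
    (-20, 0.0001), (-10, 0.002), (0, 0.01), (10, 0.1), (20, 0.4),
    (30, 0.5), (40, 0.2), (50, 0.1), (60, 0.01),
]
ALPHA = 0.05  # conventional, and arbitrary, threshold for "rejected"

# Hypotheses NOT rejected are the values consistent with the data
not_rejected = [excess for excess, p in table if p >= ALPHA]
print(f"Not rejected: {min(not_rejected)}% to {max(not_rejected)}% excess risk")
```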
The 95% confidence interval
A range which we can be 95% certain includes the true value of the underlying tendency.
We can be 95% certain that the true IRR for Warwick lies in (1.1, 1.5).
The interval is built around the observed value (our best guess at the real underlying value), so the observed value always falls inside the 95% confidence interval.
The 95% confidence interval
Fortunately, the link between hypothesis tests and confidence intervals means that we don’t have to calculate lots of p-values and check whether to reject each hypothesis. For this course, just use the ‘error factor’.
Instead, simply calculate the ‘observed value’ and a second quantity called the ‘error factor’ (e.f.). Then:
(observed value ÷ e.f.) is called the lower 95% confidence limit (CL)
(observed value × e.f.) is called the upper 95% confidence limit (CL)
The full range between the lower and upper 95% CLs is called the 95% confidence interval.
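The rule "divide and multiply by the e.f." is short enough to write as a helper. A sketch only; the function name and the check that e.f. ≥ 1 are additions, not part of the lecture. The example numbers (rate ratio 4, e.f. 2) are the ones used on the summary slides.

```python
import math

def ci_from_error_factor(observed: float, ef: float) -> tuple[float, float]:
    """95% confidence limits from an error factor: (observed ÷ e.f., observed × e.f.)."""
    if ef < 1:
        # An e.f. is exp(of something non-negative), so it is never below 1
        raise ValueError("error factor must be >= 1")
    return observed / ef, observed * ef

lower, upper = ci_from_error_factor(4.0, 2.0)  # rate ratio 4 with e.f. 2
print(lower, upper)
# The observed value is the geometric mean of the two limits
print(math.sqrt(lower * upper))
```

Because the limits are observed ÷ e.f. and observed × e.f., the interval is symmetric on a ratio (log) scale rather than on the ordinary scale.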
An example
Observe 50 new cases of diabetes in a population of 2,000 people over 5 years.
Exposure = 2,000 × 5 = 10,000 person years
New cases = 50
Incidence = 50/10,000 = 0.005 per person year = 5 per 1,000 person years
$\text{e.f.} = \exp\left(2\sqrt{1/50}\right) = 1.33$
Diabetes example continued
Incidence = 50/10,000 = 0.005 per person year = 5 per 1,000 person years
Lower 95% CL = 0.005 ÷ 1.33 = 0.00376
Upper 95% CL = 0.005 × 1.33 = 0.00665
So, our best estimate of the true incidence is 5 cases per 1,000 person years and we are 95% certain that the range 3.8 to 6.7 cases per 1,000 person years includes the true rate.
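The whole diabetes calculation, from counts to confidence limits, fits in a few lines. A sketch using the slide's formula e.f. = exp(2√(1/d)), where d is the number of cases; the function name is hypothetical. The code does not round the e.f. to 1.33 before dividing and multiplying, so the limits come out as about 3.77 and 6.63 per 1,000 person years rather than the slide's 3.76 and 6.65; the difference is only rounding.

```python
import math

def rate_with_ci(cases: int, person_years: float) -> tuple[float, float, float, float]:
    """Incidence rate, its error factor exp(2*sqrt(1/cases)), and 95% limits."""
    rate = cases / person_years
    ef = math.exp(2 * math.sqrt(1 / cases))
    return rate, ef, rate / ef, rate * ef

# 50 cases in 2,000 people followed for 5 years
rate, ef, lower, upper = rate_with_ci(50, 2_000 * 5)
print(f"rate = {rate * 1000:.1f} per 1,000 p-y, e.f. = {ef:.2f}, "
      f"95% CI {lower * 1000:.2f} to {upper * 1000:.2f} per 1,000 p-y")
```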
As we get more data
We get more and more sure about the underlying value: the e.f. gets smaller and the 95% CI narrower.
Observe 200 new cases of diabetes in a population of 40,000 people over 1 year (40,000 person years).
Estimated rate = 200/40,000 = 0.005 (same as before)
$\text{e.f.} = \exp\left(2\sqrt{1/200}\right) = 1.15$
Lower 95% CL = 0.005 ÷ 1.15 = 0.0043
Upper 95% CL = 0.005 × 1.15 = 0.0058
Best estimate still 5 cases per 1,000 person years, but now 95% certain that the true rate lies between 4.3 and 5.8 cases per 1,000 person years.
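How fast the e.f. shrinks depends only on the number of cases, not on the rate. A small sketch comparing the two diabetes examples (50 and 200 cases); the 800-case row is a hypothetical extra value added for illustration. Quadrupling the number of cases halves log(e.f.), so the interval narrows more slowly than the data grow.

```python
import math

def rate_error_factor(cases: int) -> float:
    """95% error factor for a rate based on `cases` events.

    2 / sqrt(cases) is the same as 2 * sqrt(1 / cases) from the slides.
    """
    return math.exp(2 / math.sqrt(cases))

for d in (50, 200, 800):  # 800 is hypothetical
    print(f"{d:>4} cases: e.f. = {rate_error_factor(d):.2f}")
```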
Any questions?
Confidence intervals
Reflect uncertainty about the true value of something, e.g. an incidence, a population prevalence, a population average height.
NOT a range within which 95% of individual observations lie
CASES   P-YRS   RATE
13      2,000   0.0065
10      2,000   0.005
6       2,000   0.003
14      2,000   0.007
7       2,000   0.0035
Total: 50 cases, 10,000 p-yrs.
Estimate = 0.005, 95% CI (see above) = 3.8 to 6.7 cases per 1,000 person years.
But the rates in 3 individual years (3.0, 7.0 and 3.5 per 1,000 person years) fall outside this range!
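The point of this slide can be checked directly from the table: pool the five years, build the 95% CI for the underlying rate with the course's error factor, then compare each year's rate with it. A sketch; the variable names are hypothetical. The CI describes uncertainty about the underlying rate, so there is no reason for individual yearly rates to fall inside it.

```python
import math

cases_per_year = [13, 10, 6, 14, 7]  # from the table above
person_years_per_year = 2_000

total_cases = sum(cases_per_year)                       # 50
total_py = person_years_per_year * len(cases_per_year)  # 10,000
rate = total_cases / total_py
ef = math.exp(2 * math.sqrt(1 / total_cases))
lower, upper = rate / ef, rate * ef

# Yearly rates lying outside the 95% CI for the pooled (underlying) rate
outside = [
    d / person_years_per_year
    for d in cases_per_year
    if not lower <= d / person_years_per_year <= upper
]
print(f"{len(outside)} of {len(cases_per_year)} yearly rates fall outside the 95% CI")
```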
Another example
A sample of 50 students
Observed mean height = 1.675m
The 95% confidence interval for mean height is 1.65m to 1.70m
But 95% of the 50 students fall between 1.55m and 1.85m in height. This is called a reference range (or normal range) not a confidence interval.
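Why is the confidence interval so much narrower than the reference range? A rough sketch, under ASSUMPTIONS not stated on the slide: heights are roughly normally distributed, the reference range spans about mean ± 2 standard deviations (SD), and the CI for a mean is about mean ± 2 standard errors, where SE = SD/√n. The variable names are hypothetical. Backing out the SD from the width of the reference range and dividing by √50 recovers a CI close to the slide's 1.65m to 1.70m.

```python
import math

n = 50
mean_height = 1.675             # metres, from the slide
ref_low, ref_high = 1.55, 1.85  # reference range from the slide

# ASSUMPTION: reference range is about mean ± 2 SD, so its width is about 4 SD
sd = (ref_high - ref_low) / 4
# Standard error of the mean: uncertainty about the average, not spread of people
se = sd / math.sqrt(n)

# ASSUMPTION: approximate 95% CI for a mean is mean ± 2 SE
ci_low, ci_high = mean_height - 2 * se, mean_height + 2 * se
print(f"SD ~ {sd:.3f} m, SE ~ {se:.4f} m")
print(f"approx. 95% CI for the mean: {ci_low:.2f} m to {ci_high:.2f} m")
```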
This is an important distinction
Inference on a rate ratio
Population 1: d1 cases in P1 person years
Population 2: d2 cases in P2 person years
Rate ratio = (d1/P1) ÷ (d2/P2)
Confidence interval and test
$\text{e.f.} = \exp\left(2\sqrt{1/d_1 + 1/d_2}\right)$
Estimation versus hypothesis testing
Estimation is more informative. Estimation can incorporate a hypothesis test:
Hypothesis: the incidence of diabetes in population A is the same as that in B.
Data: Population A: 12 cases in 2,000 patient years
Population B: 16 cases in 4,000 patient years
Rates: A: 12/2,000 = 0.006
B: 16/4,000 = 0.004
Ratio of rates: A/B = 1.5
Estimation vs hypothesis testing (continued)
Estimation can incorporate a hypothesis test: the ratio of rates = 1 if the rates are the same. Ratio of rates: A/B = 1.5
$\text{e.f.} = \exp\left(2\sqrt{1/12 + 1/16}\right) = 2.15$
95% CI for rate ratio = 1.5 ÷ 2.15 = 0.70 to 1.5 × 2.15 = 3.23. The range [0.70 to 3.23] includes 1.00: the data are consistent with the original hypothesis, so we cannot reject it (p>0.05). This does not prove it’s true!!
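The rate ratio calculation for populations A and B can be sketched the same way, using e.f. = exp(2√(1/d1 + 1/d2)) from the slides. The function name is hypothetical. Since the e.f. is not rounded to 2.15 first, the upper limit comes out at about 3.22 rather than 3.23.

```python
import math

def rate_ratio_ci(d1: int, p1: float, d2: int, p2: float) -> tuple[float, float, float, float]:
    """Rate ratio (d1/P1) ÷ (d2/P2), its error factor exp(2*sqrt(1/d1 + 1/d2)),
    and the 95% limits ratio ÷ e.f. and ratio × e.f."""
    ratio = (d1 / p1) / (d2 / p2)
    ef = math.exp(2 * math.sqrt(1 / d1 + 1 / d2))
    return ratio, ef, ratio / ef, ratio * ef

# Population A: 12 cases in 2,000 patient years; B: 16 cases in 4,000
ratio, ef, lower, upper = rate_ratio_ci(12, 2_000, 16, 4_000)
print(f"RR = {ratio:.2f}, e.f. = {ef:.2f}, 95% CI {lower:.2f} to {upper:.2f}")
```

The e.f. depends only on the numbers of cases, so small case counts in either population give a wide interval even when the person-time is large.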
Another example
80 deaths in 8,000 person-yrs (male); 50 deaths in 10,000 person-yrs (female)
Rate (M) = 10 per 1,000 p-y; Rate (F) = 5 per 1,000 p-y
Observed rate ratio (M/F) = 2.0
$\text{e.f.} = \exp\left(2\sqrt{1/80 + 1/50}\right) = 1.43$
95% CI: [2 ÷ 1.43 to 2 × 1.43] = [1.40 to 2.86]
Best estimate of true rate ratio = 2.0, and 95% certain that the true rate ratio lies between 1.40 and 2.86. This range does not include 1.00, so we are able to reject the hypothesis of equality (p<0.05).
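The decision rule used in the last two examples (reject equality at the 5% level exactly when the 95% CI excludes 1.00) can be written as a tiny helper. A sketch; both function names are hypothetical, and the CI uses the course's error factor. Limits may differ from the slides in the second decimal place because the e.f. is not rounded first.

```python
import math

def rate_ratio_ci(d1: int, p1: float, d2: int, p2: float) -> tuple[float, float, float]:
    """Rate ratio and its 95% limits using e.f. = exp(2*sqrt(1/d1 + 1/d2))."""
    ratio = (d1 / p1) / (d2 / p2)
    ef = math.exp(2 * math.sqrt(1 / d1 + 1 / d2))
    return ratio, ratio / ef, ratio * ef

def consistent_with_equality(lower: float, upper: float, null: float = 1.0) -> bool:
    """True if the 95% CI includes the null value (p > 0.05: cannot reject)."""
    return lower <= null <= upper

# Males: 80 deaths in 8,000 p-y; females: 50 deaths in 10,000 p-y
ratio, lower, upper = rate_ratio_ci(80, 8_000, 50, 10_000)
print(f"M/F: RR = {ratio:.2f}, 95% CI {lower:.2f} to {upper:.2f}, "
      f"consistent with equality: {consistent_with_equality(lower, upper)}")

# Diabetes A vs B, for comparison
_, lo_ab, hi_ab = rate_ratio_ci(12, 2_000, 16, 4_000)
print(f"A/B: consistent with equality: {consistent_with_equality(lo_ab, hi_ab)}")
```

"Consistent with equality" is not the same as "equal": a wide interval that includes 1.00 also includes large differences.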
Inference on an SMR
Observe O deaths; expect E deaths (based on age-specific rates in the standard population and age-specific population sizes in the test population)
SMR = (O/E) × 100
$\text{e.f.} = \exp\left(2\sqrt{1/O}\right)$
Example for SMR
On the basis of age-specific rates in the standard population, we expect 50 deaths in the test population. We observe 60. (O=60, E=50)
SMR = (60/50) × 100 = 120
$\text{e.f.} = \exp\left(2\sqrt{1/60}\right) = 1.29$
95% CI for SMR = 120 ÷ 1.29 to 120 × 1.29 = 93 to 155. The CI includes 100, so the data are consistent with equal death rates in the test and standard populations (p>0.05). But they are also consistent with, e.g., a 50% excess, so this certainly doesn’t prove equality.
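The SMR interval follows the same pattern, with the e.f. based only on the observed deaths O. A sketch using the slide's formulas; the function name is hypothetical.

```python
import math

def smr_ci(observed: int, expected: float) -> tuple[float, float, float]:
    """SMR = (O/E) × 100 with 95% limits SMR ÷ e.f. and SMR × e.f.,
    where e.f. = exp(2*sqrt(1/O))."""
    smr = 100 * observed / expected
    ef = math.exp(2 * math.sqrt(1 / observed))
    return smr, smr / ef, smr * ef

smr, lower, upper = smr_ci(60, 50)
print(f"SMR = {smr:.0f}, 95% CI {lower:.0f} to {upper:.0f}")
# The null value for an SMR is 100 (same death rates as the standard population)
print("CI includes 100" if lower <= 100 <= upper else "CI excludes 100")
```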
Any questions?
Summary
All observations (disease rates, levels of occupational risk, effectiveness of new drugs etc.) are subject to random variation
We always want to know about the underlying tendency = the true value of rates or risks
We use observed data to test hypotheses about the underlying value
We use observed data to estimate the underlying tendency
Summary
In this course the best estimate of the true value of the underlying tendency is the observed value.
We express uncertainty by calculating error factors and deriving confidence intervals.
A 95% confidence interval is the range which includes the true value of the statistic of interest with probability 95%.
It can also be viewed as the range of true values that are consistent with the observed data. If different values consistent with the observed data would lead to different conclusions, you can only be uncertain about what to conclude.
Summary
Population A: rate=0.008; B: rate=0.002 Rate ratio = 4, e.f.=2, 95% CI [2 to 8]
All values in the 95% CI suggest A higher than B. Can safely conclude A higher than B. This is equivalent to saying the 95% CI does not include 1.00 (null hypothesis) so the rate ratio is significantly different from 1.00 (p<0.05)
Summary
Population A: rate=0.01; B: rate=0.005
Rate ratio = 2, 95% CI [0.5 to 8]
Values in the 95% CI are consistent with: A much higher than B; A somewhat lower than B; or both the same. We cannot really conclude anything too firmly.
In this case the 95% CI does include 1.00 (the null hypothesis), so the rate ratio is not significantly different from 1.00 (p>0.05) and we cannot reject the hypothesis of equality.
But this does not prove that the rates are equal
Any questions?