Introduction to Biostatistics for Clinical Researchers
![Page 1: Introduction to Biostatistics for Clinical Researchers](https://reader036.fdocuments.net/reader036/viewer/2022062309/56815b1f550346895dc8d934/html5/thumbnails/1.jpg)
Introduction to Biostatistics for Clinical Researchers
University of Kansas Department of Biostatistics
& University of Kansas Medical Center
Department of Internal Medicine
![Page 2: Introduction to Biostatistics for Clinical Researchers](https://reader036.fdocuments.net/reader036/viewer/2022062309/56815b1f550346895dc8d934/html5/thumbnails/2.jpg)
Schedule
Friday, December 3 in 1023 Orr-Major
Friday, December 10 in 1023 Orr-Major
Friday, December 17 in B018 School of Nursing
Possibility of a 5th lecture, TBD
All lectures will be held from 8:30a - 10:30a
![Page 3: Introduction to Biostatistics for Clinical Researchers](https://reader036.fdocuments.net/reader036/viewer/2022062309/56815b1f550346895dc8d934/html5/thumbnails/3.jpg)
Materials
PowerPoint files can be downloaded from the Department of Biostatistics website at http://biostatistics.kumc.edu
A link to the recorded lectures will be posted in the same location
![Page 4: Introduction to Biostatistics for Clinical Researchers](https://reader036.fdocuments.net/reader036/viewer/2022062309/56815b1f550346895dc8d934/html5/thumbnails/4.jpg)
Sampling Variability and Confidence Intervals
![Page 5: Introduction to Biostatistics for Clinical Researchers](https://reader036.fdocuments.net/reader036/viewer/2022062309/56815b1f550346895dc8d934/html5/thumbnails/5.jpg)
Topics
Sampling distribution of a sample mean
Variability in the sampling distribution
Standard error of the mean
Standard error versus standard deviation
Confidence intervals for the population mean μ
Sampling distribution of a sample proportion
Standard error and confidence intervals for a proportion
![Page 6: Introduction to Biostatistics for Clinical Researchers](https://reader036.fdocuments.net/reader036/viewer/2022062309/56815b1f550346895dc8d934/html5/thumbnails/6.jpg)
The Random Sampling Behavior of a Sample Mean Across Multiple Random Samples
![Page 7: Introduction to Biostatistics for Clinical Researchers](https://reader036.fdocuments.net/reader036/viewer/2022062309/56815b1f550346895dc8d934/html5/thumbnails/7.jpg)
Random Sample
When a sample is randomly selected from a population, it is called a random sample
Technically speaking, values in a random sample are representative of the distribution of the values in the population, regardless of size
In a simple random sample, each individual in the population has an equal chance of being chosen for the sample
Random sampling helps control systematic bias
Even with random sampling, there is still sampling variability or error
![Page 8: Introduction to Biostatistics for Clinical Researchers](https://reader036.fdocuments.net/reader036/viewer/2022062309/56815b1f550346895dc8d934/html5/thumbnails/8.jpg)
Sampling Variability of a Sample Statistic
If we repeatedly choose samples from the same population, a statistic will take different values in different samples
If the statistic does not change much from sample to sample, then it is fairly reliable (does not have a lot of variability)
![Page 9: Introduction to Biostatistics for Clinical Researchers](https://reader036.fdocuments.net/reader036/viewer/2022062309/56815b1f550346895dc8d934/html5/thumbnails/9.jpg)
Example: Blood Pressure of Males
Recall, we had worked with data on blood pressures using a random sample of 113 men taken from the population of all men
Assume the population distribution is given by the following:
![Page 10: Introduction to Biostatistics for Clinical Researchers](https://reader036.fdocuments.net/reader036/viewer/2022062309/56815b1f550346895dc8d934/html5/thumbnails/10.jpg)
Example: Blood Pressure of Males Suppose we had all the time in the world
We decide to do an experiment
We are going to take 500 separate random samples from this population of men, each with 20 subjects
For each of the 500 samples, we will plot a histogram of the sample BP values and record the sample mean and sample standard deviation
![Page 11: Introduction to Biostatistics for Clinical Researchers](https://reader036.fdocuments.net/reader036/viewer/2022062309/56815b1f550346895dc8d934/html5/thumbnails/11.jpg)
Random Samples Sample 1: n = 20 Sample 2: n = 20
![Page 12: Introduction to Biostatistics for Clinical Researchers](https://reader036.fdocuments.net/reader036/viewer/2022062309/56815b1f550346895dc8d934/html5/thumbnails/12.jpg)
Example: Blood Pressure of Males We did this 500 times—let’s look at a histogram of the 500
sample means
![Page 13: Introduction to Biostatistics for Clinical Researchers](https://reader036.fdocuments.net/reader036/viewer/2022062309/56815b1f550346895dc8d934/html5/thumbnails/13.jpg)
Example: Blood Pressure of Males We decide to do another experiment
We are going to take 500 separate random samples from this population of men, each with 50 subjects
For each of the 500 samples, we will plot a histogram of the sample BP values and record the sample mean and sample standard deviation
![Page 14: Introduction to Biostatistics for Clinical Researchers](https://reader036.fdocuments.net/reader036/viewer/2022062309/56815b1f550346895dc8d934/html5/thumbnails/14.jpg)
Random Samples Sample 1: n = 50 Sample 2: n = 50
![Page 15: Introduction to Biostatistics for Clinical Researchers](https://reader036.fdocuments.net/reader036/viewer/2022062309/56815b1f550346895dc8d934/html5/thumbnails/15.jpg)
Example: Blood Pressure of Males We did this 500 times—now let’s look at a histogram of the
500 sample means
![Page 16: Introduction to Biostatistics for Clinical Researchers](https://reader036.fdocuments.net/reader036/viewer/2022062309/56815b1f550346895dc8d934/html5/thumbnails/16.jpg)
Example: Blood Pressure of Males We decide to do one more experiment
We are going to take 500 separate random samples from this population of men, each with 100 subjects
For each of the 500 samples, we will plot a histogram of the sample BP values and record the sample mean and sample standard deviation
![Page 17: Introduction to Biostatistics for Clinical Researchers](https://reader036.fdocuments.net/reader036/viewer/2022062309/56815b1f550346895dc8d934/html5/thumbnails/17.jpg)
Random Samples Sample 1: n = 100 Sample 2: n = 100
![Page 18: Introduction to Biostatistics for Clinical Researchers](https://reader036.fdocuments.net/reader036/viewer/2022062309/56815b1f550346895dc8d934/html5/thumbnails/18.jpg)
Example: Blood Pressure of Males
We did this 500 times; let’s look at a histogram of the 500 sample means
![Page 19: Introduction to Biostatistics for Clinical Researchers](https://reader036.fdocuments.net/reader036/viewer/2022062309/56815b1f550346895dc8d934/html5/thumbnails/19.jpg)
Example: Blood Pressure of Males
Let’s review the results
Population distribution of individual BP measurements for males is normal: μ = 125 mmHg; σ = 14 mmHg
Results from 500 random samples:

| Sample size | Mean of 500 sample means | SD of 500 sample means | Shape of distribution of 500 sample means |
|---|---|---|---|
| n = 20 | 125 mmHg | 3.3 mmHg | Approx. normal |
| n = 50 | 125 mmHg | 1.9 mmHg | Approx. normal |
| n = 100 | 125 mmHg | 1.4 mmHg | Approx. normal |
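The experiment reviewed above can be repeated on a computer. The following is a minimal sketch, assuming the population is exactly normal with μ = 125 mmHg and σ = 14 mmHg as stated on the slide; the function name `simulate_sample_means` and the random seed are illustrative choices, not part of the lecture, so the printed values will be close to, but not identical to, those in the table.

```python
import random
import statistics

def simulate_sample_means(mu, sigma, n, n_samples=500, seed=0):
    """Draw n_samples random samples of size n from Normal(mu, sigma)
    and return the list of their sample means."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    means = []
    for _ in range(n_samples):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        means.append(statistics.fmean(sample))
    return means

for n in (20, 50, 100):
    means = simulate_sample_means(mu=125, sigma=14, n=n)
    print(f"n = {n:3d}: mean of sample means = {statistics.fmean(means):.2f} mmHg, "
          f"SD of sample means = {statistics.stdev(means):.2f} mmHg")
```

The SD of the sample means should shrink as n grows, while the mean of the sample means stays near 125 mmHg.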
![Page 20: Introduction to Biostatistics for Clinical Researchers](https://reader036.fdocuments.net/reader036/viewer/2022062309/56815b1f550346895dc8d934/html5/thumbnails/20.jpg)
Example: Blood Pressure of Males Let’s review the results
![Page 21: Introduction to Biostatistics for Clinical Researchers](https://reader036.fdocuments.net/reader036/viewer/2022062309/56815b1f550346895dc8d934/html5/thumbnails/21.jpg)
Example: Hospital Length of Stay Recall, we had worked with the data on length of stay (LOS)
using a random sample of 500 patients taken from all patients discharged in 2005
Assume the population distribution is given by the following:
![Page 22: Introduction to Biostatistics for Clinical Researchers](https://reader036.fdocuments.net/reader036/viewer/2022062309/56815b1f550346895dc8d934/html5/thumbnails/22.jpg)
Example: Hospital Length of Stay Boxplot
![Page 23: Introduction to Biostatistics for Clinical Researchers](https://reader036.fdocuments.net/reader036/viewer/2022062309/56815b1f550346895dc8d934/html5/thumbnails/23.jpg)
Example: Hospital Length of Stay Suppose we had all the time in the world, again
We decide to do another set of experiments
We are going to take 500 separate random samples from this population of patients, each with 20 subjects
For each of the 500 samples we will plot a histogram of the sample LOS values and record the sample mean and standard deviation
![Page 24: Introduction to Biostatistics for Clinical Researchers](https://reader036.fdocuments.net/reader036/viewer/2022062309/56815b1f550346895dc8d934/html5/thumbnails/24.jpg)
Random Samples Sample 1: n = 20 Sample 2: n = 20
![Page 25: Introduction to Biostatistics for Clinical Researchers](https://reader036.fdocuments.net/reader036/viewer/2022062309/56815b1f550346895dc8d934/html5/thumbnails/25.jpg)
Example: Hospital Length of Stay We did this 500 times—let’s look at a histogram of the 500
sample means
![Page 26: Introduction to Biostatistics for Clinical Researchers](https://reader036.fdocuments.net/reader036/viewer/2022062309/56815b1f550346895dc8d934/html5/thumbnails/26.jpg)
Example: Hospital Length of Stay Suppose we had all the time in the world, again
We decide to do another experiment
We are going to take 500 separate random samples from this population of patients, each with 50 subjects
For each of the 500 samples we will plot a histogram of the sample LOS values and record the sample mean and standard deviation
![Page 27: Introduction to Biostatistics for Clinical Researchers](https://reader036.fdocuments.net/reader036/viewer/2022062309/56815b1f550346895dc8d934/html5/thumbnails/27.jpg)
Random Samples Sample 1: n = 50 Sample 2: n = 50
![Page 28: Introduction to Biostatistics for Clinical Researchers](https://reader036.fdocuments.net/reader036/viewer/2022062309/56815b1f550346895dc8d934/html5/thumbnails/28.jpg)
Example: Hospital Length of Stay
We did this 500 times; let’s look at a histogram of the 500 sample means
![Page 29: Introduction to Biostatistics for Clinical Researchers](https://reader036.fdocuments.net/reader036/viewer/2022062309/56815b1f550346895dc8d934/html5/thumbnails/29.jpg)
Example: Hospital Length of Stay Suppose we had all the time in the world, again
We decide to do one more experiment
We are going to take 500 separate random samples from this population of patients, each with 100 subjects
For each of the 500 samples we will plot a histogram of the sample LOS values and record the sample mean and standard deviation
![Page 30: Introduction to Biostatistics for Clinical Researchers](https://reader036.fdocuments.net/reader036/viewer/2022062309/56815b1f550346895dc8d934/html5/thumbnails/30.jpg)
Random Samples Sample 1: n = 100 Sample 2: n = 100
![Page 31: Introduction to Biostatistics for Clinical Researchers](https://reader036.fdocuments.net/reader036/viewer/2022062309/56815b1f550346895dc8d934/html5/thumbnails/31.jpg)
Example: Hospital Length of Stay
We did this 500 times; let’s look at a histogram of the 500 sample means
![Page 32: Introduction to Biostatistics for Clinical Researchers](https://reader036.fdocuments.net/reader036/viewer/2022062309/56815b1f550346895dc8d934/html5/thumbnails/32.jpg)
Example: Hospital Length of Stay
Let’s review the results
Population distribution of individual LOS values for population of patients is right skewed: μ = 5.05 days; σ = 6.90 days
Results from 500 random samples:

| Sample size | Mean of 500 sample means | SD of 500 sample means | Shape of distribution of 500 sample means |
|---|---|---|---|
| n = 20 | 5.05 days | 1.49 days | Approx. normal |
| n = 50 | 5.04 days | 1.00 days | Approx. normal |
| n = 100 | 5.08 days | 0.70 days | Approx. normal |
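A similar sketch works for a right-skewed population. The actual LOS population is not available here, so this example substitutes a gamma distribution whose shape and scale are chosen to match μ = 5.05 days and σ = 6.90 days; that choice, the function name `los_sample_means`, and the seed are all assumptions made only for illustration.

```python
import math
import random
import statistics

MU, SIGMA = 5.05, 6.90        # population mean and SD quoted on the slide
SHAPE = (MU / SIGMA) ** 2     # gamma shape giving mean MU and SD SIGMA (assumed model)
SCALE = SIGMA ** 2 / MU       # gamma scale giving mean MU and SD SIGMA (assumed model)

def los_sample_means(n, n_samples=500, seed=1):
    """Means of n_samples random samples of size n from the skewed stand-in population."""
    rng = random.Random(seed)
    means = []
    for _ in range(n_samples):
        sample = [rng.gammavariate(SHAPE, SCALE) for _ in range(n)]
        means.append(statistics.fmean(sample))
    return means

for n in (20, 50, 100):
    means = los_sample_means(n)
    print(f"n = {n:3d}: mean = {statistics.fmean(means):.2f} days, "
          f"SD = {statistics.stdev(means):.2f} days, "
          f"sigma/sqrt(n) = {SIGMA / math.sqrt(n):.2f} days")
```

Even though individual values are strongly skewed, the SD of the sample means should track σ/√n and the means should centre near 5.05 days.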
![Page 33: Introduction to Biostatistics for Clinical Researchers](https://reader036.fdocuments.net/reader036/viewer/2022062309/56815b1f550346895dc8d934/html5/thumbnails/33.jpg)
Example: Hospital Length of Stay Let’s review the results
![Page 34: Introduction to Biostatistics for Clinical Researchers](https://reader036.fdocuments.net/reader036/viewer/2022062309/56815b1f550346895dc8d934/html5/thumbnails/34.jpg)
Summary
What did we see across the two examples?
A few trends:
Distribution of sample means tended to be approximately normal, even when the original individual-level data was not (LOS)
Variability in the sample mean values decreased as the size of the sample each mean was based upon increased
Distribution of sample means was centered at true population mean
![Page 35: Introduction to Biostatistics for Clinical Researchers](https://reader036.fdocuments.net/reader036/viewer/2022062309/56815b1f550346895dc8d934/html5/thumbnails/35.jpg)
Clarification
Variation in the sample mean values is tied to the size of each sample selected in our exercise (i.e., 20, 50, or 100), not to the number of samples (i.e., 500)
![Page 36: Introduction to Biostatistics for Clinical Researchers](https://reader036.fdocuments.net/reader036/viewer/2022062309/56815b1f550346895dc8d934/html5/thumbnails/36.jpg)
The Theoretical Sampling Distribution of the Sample Mean and Its Estimate Based on a Single Sample
![Page 37: Introduction to Biostatistics for Clinical Researchers](https://reader036.fdocuments.net/reader036/viewer/2022062309/56815b1f550346895dc8d934/html5/thumbnails/37.jpg)
Sampling Distribution of the Sample Mean
In the previous section we reviewed the results of simulations that resulted in estimates of what’s formally called the sampling distribution of the sample mean
The sampling distribution of the sample mean is a theoretical probability distribution
It describes the distribution of sample means from all possible random samples of the same size taken from a population
![Page 38: Introduction to Biostatistics for Clinical Researchers](https://reader036.fdocuments.net/reader036/viewer/2022062309/56815b1f550346895dc8d934/html5/thumbnails/38.jpg)
Sampling Distribution of the Sample Mean For example, the histogram below is an estimate of the
sampling distribution of sample BP means based on random samples of n = 50 from the population of all men
![Page 39: Introduction to Biostatistics for Clinical Researchers](https://reader036.fdocuments.net/reader036/viewer/2022062309/56815b1f550346895dc8d934/html5/thumbnails/39.jpg)
Sampling Distribution of the Sample Mean
In research, it is impractical to estimate the sampling distribution of a sample mean by actually taking many random samples from the same population
No research would ever happen if a study needed to be repeated multiple times to understand this sampling behavior
Simulations are useful to illustrate a concept, but not to highlight a practical approach
Luckily, there is some mathematical machinery that generalizes some of the patterns we saw in the previous simulation results
![Page 40: Introduction to Biostatistics for Clinical Researchers](https://reader036.fdocuments.net/reader036/viewer/2022062309/56815b1f550346895dc8d934/html5/thumbnails/40.jpg)
The Central Limit Theorem (CLT)
The Central Limit Theorem is a powerful mathematical tool that gives several useful results
The sampling distribution of sample means based on all samples of size n is approximately normal, regardless of the distribution of the original, individual-level data in the population/sample
The mean of all sample means in the sampling distribution is the true mean of the population from which the samples were taken (μ)
The standard deviation of the sample means taken from samples of size n is equal to $\sigma/\sqrt{n}$: this is often called the standard error of the mean, $SE(\bar{x})$
![Page 41: Introduction to Biostatistics for Clinical Researchers](https://reader036.fdocuments.net/reader036/viewer/2022062309/56815b1f550346895dc8d934/html5/thumbnails/41.jpg)
Example: Blood Pressure of Males
The population distribution of individual BP measurements for males is normal with μ = 125 mmHg and σ = 14 mmHg

| Sample size | Mean of 500 sample means | Mean of 5000 sample means | SD of 500 sample means | SD of 5000 sample means | SD of sample means by CLT (SE) |
|---|---|---|---|---|---|
| n = 20 | 124.98 mmHg | 125.05 mmHg | 3.31 mmHg | 3.11 mmHg | 3.13 mmHg |
| n = 50 | 125.03 mmHg | 125.01 mmHg | 1.89 mmHg | 1.96 mmHg | 1.98 mmHg |
| n = 100 | 124.99 mmHg | 125.01 mmHg | 1.43 mmHg | 1.39 mmHg | 1.40 mmHg |
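The last column of the table comes directly from the CLT formula σ/√n. A short sketch of that arithmetic, using the σ = 14 mmHg stated above (the helper name `standard_error` is illustrative):

```python
import math

def standard_error(sigma, n):
    """Standard error of the mean for samples of size n: sigma / sqrt(n)."""
    return sigma / math.sqrt(n)

for n in (20, 50, 100):
    print(f"n = {n:3d}: SE = {standard_error(14, n):.2f} mmHg")
```

These values match the CLT column (3.13, 1.98, and 1.40 mmHg).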
![Page 42: Introduction to Biostatistics for Clinical Researchers](https://reader036.fdocuments.net/reader036/viewer/2022062309/56815b1f550346895dc8d934/html5/thumbnails/42.jpg)
![Page 43: Introduction to Biostatistics for Clinical Researchers](https://reader036.fdocuments.net/reader036/viewer/2022062309/56815b1f550346895dc8d934/html5/thumbnails/43.jpg)
![Page 44: Introduction to Biostatistics for Clinical Researchers](https://reader036.fdocuments.net/reader036/viewer/2022062309/56815b1f550346895dc8d934/html5/thumbnails/44.jpg)
Recap: CLT
The CLT tells us:
When taking a random sample of continuous measures of size n from a population with true mean μ and true standard deviation σ, the theoretical sampling distribution of sample means from all possible random samples of size n is approximately normal, with mean and standard deviation as follows:
$$\mu_{\bar{x}} = \mu, \qquad \sigma_{\bar{x}} = SE(\bar{x}) = \frac{\sigma}{\sqrt{n}}$$
![Page 45: Introduction to Biostatistics for Clinical Researchers](https://reader036.fdocuments.net/reader036/viewer/2022062309/56815b1f550346895dc8d934/html5/thumbnails/45.jpg)
CLT: So What?
So what good is this information?
Using the properties of the normal curve, this shows that for most random samples we can take (i.e., 95% of them), the sample mean will fall within 2 SE of the true mean (actually, 1.96 SE), that is, between $\mu - 1.96\,\sigma/\sqrt{n}$ and $\mu + 1.96\,\sigma/\sqrt{n}$
![Page 46: Introduction to Biostatistics for Clinical Researchers](https://reader036.fdocuments.net/reader036/viewer/2022062309/56815b1f550346895dc8d934/html5/thumbnails/46.jpg)
CLT: So What?
AGAIN, what good is this information?
We are going to take a single sample of size n and get one $\bar{x}$
We won’t know μ, and if we did know μ why would we care about the distribution of estimates of μ from imperfect subsets of the population?
[Figure: sampling distribution of $\bar{x}$ with cutoffs marked at $\mu - 1.96\,\sigma/\sqrt{n}$ and $\mu + 1.96\,\sigma/\sqrt{n}$]
![Page 47: Introduction to Biostatistics for Clinical Researchers](https://reader036.fdocuments.net/reader036/viewer/2022062309/56815b1f550346895dc8d934/html5/thumbnails/47.jpg)
CLT: So What?
We are going to take a single sample of size n and get one $\bar{x}$
But for most (i.e., 95%) of the random samples we can get, our $\bar{x}$ will fall within ± 1.96 SE of μ
![Page 48: Introduction to Biostatistics for Clinical Researchers](https://reader036.fdocuments.net/reader036/viewer/2022062309/56815b1f550346895dc8d934/html5/thumbnails/48.jpg)
CLT: So What?
We are going to take a single sample of size n and get one $\bar{x}$
So if we start at $\bar{x}$ and go 1.96 SE in either direction, the interval created will contain μ most (i.e., 95%) of the time
![Page 49: Introduction to Biostatistics for Clinical Researchers](https://reader036.fdocuments.net/reader036/viewer/2022062309/56815b1f550346895dc8d934/html5/thumbnails/49.jpg)
Estimating a Confidence Interval
Such an interval is called a 95% confidence interval for the population mean μ
The interval is given by the formula:
$$\bar{x} \pm 1.96\,SE(\bar{x}) = \bar{x} \pm 1.96\,\frac{\sigma}{\sqrt{n}}$$
Problem: we don’t know σ, either!
We can estimate it with s, and will detail this in the next section
What is the interpretation of a confidence interval?
![Page 50: Introduction to Biostatistics for Clinical Researchers](https://reader036.fdocuments.net/reader036/viewer/2022062309/56815b1f550346895dc8d934/html5/thumbnails/50.jpg)
Interpretation of a 95% Confidence Interval (CI)
Laypersons’ range of plausible values for the true mean
Researchers never can observe the true mean μ
$\bar{x}$ is the best estimate based on a single sample
The 95% CI starts with this best estimate and additionally recognizes uncertainty in this quantity
Technical interpretation:
Were 100 random samples of size n taken from the same population and 95% confidence interval limits computed from each of these 100 samples, about 95 of the 100 intervals would be expected to contain the value of the true mean μ
![Page 51: Introduction to Biostatistics for Clinical Researchers](https://reader036.fdocuments.net/reader036/viewer/2022062309/56815b1f550346895dc8d934/html5/thumbnails/51.jpg)
Technical Interpretation
One hundred 95% confidence intervals from 100 random samples of size n = 50 BPs
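The technical interpretation can be checked by simulation. This is a rough sketch, assuming the normal BP population used earlier (μ = 125 mmHg, σ = 14 mmHg) and intervals of the form x̄ ± 1.96 σ/√n with σ treated as known; the function name `coverage` and the seed are hypothetical.

```python
import math
import random
import statistics

def coverage(mu=125.0, sigma=14.0, n=50, n_intervals=100, seed=2):
    """Fraction of 95% CIs (xbar +/- 1.96 * sigma / sqrt(n)) that contain mu."""
    rng = random.Random(seed)
    se = sigma / math.sqrt(n)
    hits = 0
    for _ in range(n_intervals):
        xbar = statistics.fmean([rng.gauss(mu, sigma) for _ in range(n)])
        if xbar - 1.96 * se <= mu <= xbar + 1.96 * se:
            hits += 1
    return hits / n_intervals

print("Coverage in 100 intervals: ", coverage(n_intervals=100))
print("Coverage in 5000 intervals:", coverage(n_intervals=5000))
```

With only 100 intervals the count containing μ will vary around 95 from run to run; with many more intervals the proportion settles close to 0.95.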
![Page 52: Introduction to Biostatistics for Clinical Researchers](https://reader036.fdocuments.net/reader036/viewer/2022062309/56815b1f550346895dc8d934/html5/thumbnails/52.jpg)
Notes on Confidence Intervals
Random sampling error
A confidence interval only accounts for random sampling error, not any other systematic sources of error (or bias)
![Page 53: Introduction to Biostatistics for Clinical Researchers](https://reader036.fdocuments.net/reader036/viewer/2022062309/56815b1f550346895dc8d934/html5/thumbnails/53.jpg)
Examples of Systematic Bias
BP measurement is always 5 mmHg too high (broken instrument)
Only those with high BP agree to participate (non-response bias)
![Page 54: Introduction to Biostatistics for Clinical Researchers](https://reader036.fdocuments.net/reader036/viewer/2022062309/56815b1f550346895dc8d934/html5/thumbnails/54.jpg)
Notes on Confidence Intervals
Are all CIs 95%?
No
It is the most commonly used level of confidence
A 99% CI is wider
A 90% CI is narrower
To change the level of confidence, adjust the number of SE added to and subtracted from the sample mean:
For a 99% CI, you need ± 2.58 SE
For a 98% CI, you need ± 2.33 SE
For a 95% CI, you need ± 1.96 SE
For a 90% CI, you need ± 1.645 SE
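The multipliers listed above are quantiles of the standard normal distribution. A small sketch of how they can be recovered with Python's standard library (the helper name `z_multiplier` is illustrative):

```python
from statistics import NormalDist

def z_multiplier(confidence):
    """Number of SEs on each side of the mean for a two-sided normal-based CI."""
    alpha = 1 - confidence
    return NormalDist().inv_cdf(1 - alpha / 2)

for level in (0.99, 0.98, 0.95, 0.90):
    print(f"{level:.0%} CI: +/- {z_multiplier(level):.3f} SE")
```

Higher confidence requires going out more standard errors, which is why a 99% CI is wider than a 90% CI for the same data.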
![Page 55: Introduction to Biostatistics for Clinical Researchers](https://reader036.fdocuments.net/reader036/viewer/2022062309/56815b1f550346895dc8d934/html5/thumbnails/55.jpg)
Standard Deviation versus Standard Error
The term standard deviation refers to the variability in individual observations in a single sample (s) or population (σ)
The standard error of the mean is also a measure of standard deviation, but not of individual values—rather, it is a measure of the variation in sample means computed from multiple random samples of the same size taken from the same population
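To make the distinction concrete, the sketch below draws single samples of increasing size from an assumed Normal(125, 14) population and prints both quantities. The population, the helper name `sd_and_se`, and the seed are assumptions for illustration only.

```python
import math
import random
import statistics

def sd_and_se(n, mu=125.0, sigma=14.0, seed=3):
    """Return (sample SD, estimated SE of the mean) for one random sample of size n."""
    rng = random.Random(seed)
    sample = [rng.gauss(mu, sigma) for _ in range(n)]
    s = statistics.stdev(sample)   # variability of individual observations
    return s, s / math.sqrt(n)     # variability of the sample mean

for n in (20, 100, 1000):
    s, se = sd_and_se(n)
    print(f"n = {n:4d}: s = {s:5.2f} mmHg, estimated SE = {se:4.2f} mmHg")
```

The sample SD stays in the neighbourhood of σ no matter how large the sample is, while the SE keeps shrinking as n grows.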
![Page 56: Introduction to Biostatistics for Clinical Researchers](https://reader036.fdocuments.net/reader036/viewer/2022062309/56815b1f550346895dc8d934/html5/thumbnails/56.jpg)
Estimating Confidence Intervals for the Mean of a Population Based on a Single Sample of Size n
![Page 57: Introduction to Biostatistics for Clinical Researchers](https://reader036.fdocuments.net/reader036/viewer/2022062309/56815b1f550346895dc8d934/html5/thumbnails/57.jpg)
Estimating a 95% Confidence Interval
In the previous section, we defined a 95% confidence interval for the population mean μ
Interval is given by:
$$\bar{x} \pm 1.96\,SE(\bar{x}) = \bar{x} \pm 1.96\,\frac{\sigma}{\sqrt{n}}$$
Problem: we don’t know σ
We can estimate it with s, such that our estimated SE is given by
$$\widehat{SE}(\bar{x}) = \frac{s}{\sqrt{n}}$$
Estimated 95% confidence interval for μ based on a single sample of size n is
$$\bar{x} \pm 1.96\,\frac{s}{\sqrt{n}}$$
![Page 58: Introduction to Biostatistics for Clinical Researchers](https://reader036.fdocuments.net/reader036/viewer/2022062309/56815b1f550346895dc8d934/html5/thumbnails/58.jpg)
Example 1
Suppose we had blood pressure measurements collected from a random sample of 100 KUMC students in September 2010
We wish to use the results of the sample to estimate a 95% CI for the mean blood pressure of all KUMC students
Results:
$\bar{x} = 123.4$ mmHg; $s = 13.7$ mmHg
$\widehat{SE}(\bar{x}) = 13.7/\sqrt{100} = 1.37$ mmHg
A 95% CI for the true mean BP of all KUMC students:
$$123.4 \pm 1.96 \times 1.37 = 123.4 \pm 2.685 \;\Rightarrow\; (120.7,\ 126.1)\ \text{mmHg}$$
![Page 59: Introduction to Biostatistics for Clinical Researchers](https://reader036.fdocuments.net/reader036/viewer/2022062309/56815b1f550346895dc8d934/html5/thumbnails/59.jpg)
Example 2
Data from the National Medical Expenditures Survey (1987): U.S.-based survey administered by the Centers for Disease Control (CDC)
Some results:

| | Smoking history | No smoking history |
|---|---|---|
| Mean 1987 expenditures (US$) | 2260 | 2080 |
| SD (US$) | 4850 | 4600 |
| N | 6564 | 5016 |
![Page 60: Introduction to Biostatistics for Clinical Researchers](https://reader036.fdocuments.net/reader036/viewer/2022062309/56815b1f550346895dc8d934/html5/thumbnails/60.jpg)
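Example 2
95% CIs for 1987 medical expenditures by smoking history
Smoking history:
$$2260 \pm 1.96 \times \frac{4850}{\sqrt{6564}} = 2260 \pm 117 \;\Rightarrow\; (\$2143,\ \$2377)$$
No smoking history:
$$2080 \pm 1.96 \times \frac{4600}{\sqrt{5016}} = 2080 \pm 127 \;\Rightarrow\; (\$1953,\ \$2207)$$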
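The two intervals above can be reproduced with a small helper. This is only a sketch of the x̄ ± 1.96 s/√n calculation using the summary statistics from the slide; the function name `ci_mean` and the dictionary layout are illustrative.

```python
import math

def ci_mean(xbar, s, n, z=1.96):
    """Large-sample CI for a mean: xbar +/- z * s / sqrt(n)."""
    margin = z * s / math.sqrt(n)
    return xbar - margin, xbar + margin

# (mean, SD, N) in 1987 US dollars, taken from the slide
groups = {
    "Smoking history": (2260, 4850, 6564),
    "No smoking history": (2080, 4600, 5016),
}

for label, (xbar, s, n) in groups.items():
    lower, upper = ci_mean(xbar, s, n)
    print(f"{label}: (${lower:.0f}, ${upper:.0f})")
```

Because both samples are large, the standard errors are small relative to the SDs, and the intervals are narrow even though individual expenditures vary widely.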
![Page 61: Introduction to Biostatistics for Clinical Researchers](https://reader036.fdocuments.net/reader036/viewer/2022062309/56815b1f550346895dc8d934/html5/thumbnails/61.jpg)
Example 3
Effect of lower targets for blood pressure and LDL cholesterol on atherosclerosis in diabetes: the SANDS Randomized Trial¹
“Objective: To compare progression of subclinical atherosclerosis in adults with type 2 diabetes treated to reach aggressive targets of low-density lipoprotein cholesterol (LDL-C) of 70 mg/dL or lower and systolic blood pressure (SBP) of 115 mmHg or lower versus standard targets of LDL-C of 100 mg/dL or lower and SBP of 130 mmHg or lower.”
¹ Howard, B., et al. (2008). Effect of lower targets for blood pressure and LDL cholesterol on atherosclerosis in diabetes: The SANDS Randomized Trial. JAMA 299, no. 14.
![Page 62: Introduction to Biostatistics for Clinical Researchers](https://reader036.fdocuments.net/reader036/viewer/2022062309/56815b1f550346895dc8d934/html5/thumbnails/62.jpg)
Example 3
“Design, setting, and participants: A randomized, open-label, blinded-to-end point, three-year trial from April 2003 – July 2007 at four clinical centers in Oklahoma, Arizona, and South Dakota. Participants were 499 American Indian men and women aged 40 years or older with type 2 diabetes and no prior CVD events.”
“Interventions: Participants were randomized to aggressive (n = 252) versus standard (n = 247) treatment groups with stepped treatment algorithms defined for both.”
![Page 63: Introduction to Biostatistics for Clinical Researchers](https://reader036.fdocuments.net/reader036/viewer/2022062309/56815b1f550346895dc8d934/html5/thumbnails/63.jpg)
Example 3
Results: target LDL-C and SBP levels for both groups were reached and maintained
Mean (95% confidence interval) levels for LDL-C in the last 12 months were 72 (69-75) and 104 (101-106) mg/dL and SBP levels were 117 (115-118) and 129 (128-130) mmHg in the aggressive versus standard groups, respectively
![Page 64: Introduction to Biostatistics for Clinical Researchers](https://reader036.fdocuments.net/reader036/viewer/2022062309/56815b1f550346895dc8d934/html5/thumbnails/64.jpg)
Example 3 Lots of 95% CIs!
![Page 65: Introduction to Biostatistics for Clinical Researchers](https://reader036.fdocuments.net/reader036/viewer/2022062309/56815b1f550346895dc8d934/html5/thumbnails/65.jpg)
Example 3 Lots of 95% CIs!
![Page 66: Introduction to Biostatistics for Clinical Researchers](https://reader036.fdocuments.net/reader036/viewer/2022062309/56815b1f550346895dc8d934/html5/thumbnails/66.jpg)
Using Excel to Create 95% CI for a Mean
Use the “CONFIDENCE” function in Excel to obtain the margin of error (half-width) of the interval, which is then subtracted from and added to the sample mean
For Example 1: $\bar{x}$ = 123.4 mmHg; s = 13.7 mmHg; n = 100
![Page 67: Introduction to Biostatistics for Clinical Researchers](https://reader036.fdocuments.net/reader036/viewer/2022062309/56815b1f550346895dc8d934/html5/thumbnails/67.jpg)
![Page 68: Introduction to Biostatistics for Clinical Researchers](https://reader036.fdocuments.net/reader036/viewer/2022062309/56815b1f550346895dc8d934/html5/thumbnails/68.jpg)
With alpha = .05, CONFIDENCE(.05, 13.7, 100) returns 2.685151.
![Page 69: Introduction to Biostatistics for Clinical Researchers](https://reader036.fdocuments.net/reader036/viewer/2022062309/56815b1f550346895dc8d934/html5/thumbnails/69.jpg)
The corresponding confidence interval is then 123.4 ± 2.685151 = approximately [120.7, 126.1].
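For readers without Excel, the same margin can be computed in Python. The sketch below assumes CONFIDENCE(alpha, standard_dev, size) is the normal-based margin z × sd/√n, which is consistent with the 2.685151 value quoted above; the function name `confidence_margin` is hypothetical.

```python
import math
from statistics import NormalDist

def confidence_margin(alpha, standard_dev, size):
    """Normal-based margin of error: z_(1 - alpha/2) * standard_dev / sqrt(size)."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return z * standard_dev / math.sqrt(size)

xbar = 123.4  # sample mean from Example 1 (mmHg)
margin = confidence_margin(0.05, 13.7, 100)
print(f"margin = {margin:.6f}")
print(f"95% CI = [{xbar - margin:.1f}, {xbar + margin:.1f}]")
```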
![Page 70: Introduction to Biostatistics for Clinical Researchers](https://reader036.fdocuments.net/reader036/viewer/2022062309/56815b1f550346895dc8d934/html5/thumbnails/70.jpg)
What We Mean by Approximately Normal and What Happens to the Sampling Distribution of the Sample Mean with Small n
![Page 71: Introduction to Biostatistics for Clinical Researchers](https://reader036.fdocuments.net/reader036/viewer/2022062309/56815b1f550346895dc8d934/html5/thumbnails/71.jpg)
Recap: CLT
The CLT tells us:
When taking a random sample of continuous measures of size n from a population with true mean μ and true standard deviation σ, the theoretical sampling distribution of sample means from all possible random samples of size n is approximately normal, with mean and standard deviation as follows:
$$\mu_{\bar{x}} = \mu, \qquad \sigma_{\bar{x}} = SE(\bar{x}) = \frac{\sigma}{\sqrt{n}}$$
![Page 72: Introduction to Biostatistics for Clinical Researchers](https://reader036.fdocuments.net/reader036/viewer/2022062309/56815b1f550346895dc8d934/html5/thumbnails/72.jpg)
Recap: CLT
Technically, this is true for “large n”
When n is “small,” the sampling distribution is not quite normal; it follows a Student’s t distribution
[Figure: Student’s t densities for df = 1, 2, 5, and 10 compared with the Gaussian density, plotted for x from -3 to 3]
![Page 73: Introduction to Biostatistics for Clinical Researchers](https://reader036.fdocuments.net/reader036/viewer/2022062309/56815b1f550346895dc8d934/html5/thumbnails/73.jpg)
Student’s t
The distribution of t is “flatter and fatter” than its cousin, the normal distribution
The t-distribution is uniquely defined by its degrees of freedom
[Figure: Student’s t densities for df = 1, 2, 5, and 10 compared with the Gaussian density, plotted for x from -3 to 3]
![Page 74: Introduction to Biostatistics for Clinical Researchers](https://reader036.fdocuments.net/reader036/viewer/2022062309/56815b1f550346895dc8d934/html5/thumbnails/74.jpg)
Why t?
Basic idea: remember, the true $SE(\bar{x})$ is given by the formula
$$SE(\bar{x}) = \frac{\sigma}{\sqrt{n}}$$
But of course we don’t know σ, and replace it with s to estimate the SE
$$\widehat{SE}(\bar{x}) = \frac{s}{\sqrt{n}}$$
In small samples, there is a lot of sampling variability in s as well, so this estimate is less precise
To account for this additional uncertainty, we have to go slightly more than ±1.96 $\widehat{SE}$ to get 95% coverage under the sampling distribution
How much bigger than 1.96 depends on the sample size
![Page 75: Introduction to Biostatistics for Clinical Researchers](https://reader036.fdocuments.net/reader036/viewer/2022062309/56815b1f550346895dc8d934/html5/thumbnails/75.jpg)
The t distribution
If we have a smaller sample size, we will have to go out more than 1.96 SEs to achieve 95% confidence
How many standard errors we need to go depends on the degrees of freedom; this is linked to sample size
The appropriate degrees of freedom are n – 1
$$\bar{x} \pm t_{0.95,\,n-1} \times \widehat{SE}(\bar{x}) = \bar{x} \pm t_{0.95,\,n-1} \times \frac{s}{\sqrt{n}}$$
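To see how the multiplier depends on sample size, the sketch below computes the two-sided 95% t cutoff for several values of n using only the standard library. It integrates the t density numerically (Simpson's rule) and inverts it by bisection; this is one simple way to obtain the cutoffs, not the method used in the lecture, and the function names are illustrative.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_coef = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    coef = math.exp(log_coef) / math.sqrt(df * math.pi)
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(x, df, steps=2000):
    """P(T <= x) for x >= 0: 0.5 plus Simpson's-rule integral of the density on [0, x]."""
    h = x / steps
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3

def t_multiplier(df, confidence=0.95):
    """Cutoff t with P(-t < T < t) = confidence, found by bisection."""
    target = 1 - (1 - confidence) / 2
    lo, hi = 0.0, 100.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

for n in (5, 12, 30, 100):
    print(f"n = {n:3d} (df = {n - 1:2d}): multiplier = {t_multiplier(n - 1):.3f}")
```

The multiplier is noticeably larger than 1.96 for small n and approaches 1.96 as n increases.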
![Page 76: Introduction to Biostatistics for Clinical Researchers](https://reader036.fdocuments.net/reader036/viewer/2022062309/56815b1f550346895dc8d934/html5/thumbnails/76.jpg)
Notes on the t-Correction
The particular t-table gives the number of SEs needed to cut off 95% under the sampling distribution
![Page 77: Introduction to Biostatistics for Clinical Researchers](https://reader036.fdocuments.net/reader036/viewer/2022062309/56815b1f550346895dc8d934/html5/thumbnails/77.jpg)
Notes on the t-Correction You can easily find a t-table for other cutoffs (90%, 99%) in
any stats text or by searching the internet
Also, the TINV function in Excel will return these cutoffs; TINV is two-tailed, so enter α itself (e.g., 0.05 for a 95% CI) as the “probability”
The point is not to spend a lot of time looking up t-values: more important is a basic understanding of why slightly more needs to be added to the sample mean in smaller samples to get a valid 95% CI
The interpretation of the 95% CI (or any other level) is the same as discussed before
![Page 78: Introduction to Biostatistics for Clinical Researchers](https://reader036.fdocuments.net/reader036/viewer/2022062309/56815b1f550346895dc8d934/html5/thumbnails/78.jpg)
Example Small study on response to treatment among 12 patients
with hyperlipidemia (high LDL cholesterol) given a treatment
Change in cholesterol post–pre treatment computed for each of the 12 patients
Results: $\bar{x} = -1.4$ mmol/L, $s = 0.55$ mmol/L
![Page 79: Introduction to Biostatistics for Clinical Researchers](https://reader036.fdocuments.net/reader036/viewer/2022062309/56815b1f550346895dc8d934/html5/thumbnails/79.jpg)
Example 95% confidence interval for true mean change
$\bar{x} \pm t_{0.95,11} \times \widehat{SE}(\bar{x}) = -1.4 \pm 2.2 \times \frac{0.55}{\sqrt{12}} = (-1.75, -1.05)$ mmol/L
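The arithmetic in this example can be checked with a short sketch. The function name `mean_ci` is hypothetical, and the code assumes the mean change is negative (a decrease of 1.4 mmol/L), which the ordering of the interval limits suggests; the cutoff 2.2 is the rounded value used on the slide for 11 degrees of freedom.

```python
import math


def mean_ci(xbar, s, n, t_cutoff):
    """Return (lower, upper) for xbar +/- t_cutoff * s / sqrt(n)."""
    se_hat = s / math.sqrt(n)          # estimated standard error of the mean
    margin = t_cutoff * se_hat
    return xbar - margin, xbar + margin


# Example values: mean change -1.4 mmol/L (assumed sign), s = 0.55 mmol/L, n = 12.
# 2.2 is the rounded t cutoff for 11 degrees of freedom quoted in the example.
lower, upper = mean_ci(-1.4, 0.55, 12, 2.2)
print(f"95% CI: ({lower:.2f}, {upper:.2f}) mmol/L")
```

Swapping in 1.96 for 2.2 gives a slightly narrower interval, which would understate the uncertainty for a sample of only 12 patients.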
![Page 80: Introduction to Biostatistics for Clinical Researchers](https://reader036.fdocuments.net/reader036/viewer/2022062309/56815b1f550346895dc8d934/html5/thumbnails/80.jpg)
Using Excel to Create Other CIs for a Mean The TINV function
![Page 81: Introduction to Biostatistics for Clinical Researchers](https://reader036.fdocuments.net/reader036/viewer/2022062309/56815b1f550346895dc8d934/html5/thumbnails/81.jpg)
The Sample Proportion as a Summary Measure for Binary Outcomes and the CLT
![Page 82: Introduction to Biostatistics for Clinical Researchers](https://reader036.fdocuments.net/reader036/viewer/2022062309/56815b1f550346895dc8d934/html5/thumbnails/82.jpg)
Proportion (p) Proportion of individuals with health insurance
Proportion of patients who became infected
Proportion of patients who are cured
Proportion of individuals who are hypertensive
Proportion of individuals positive on a blood test
Proportion of adverse drug reactions
![Page 83: Introduction to Biostatistics for Clinical Researchers](https://reader036.fdocuments.net/reader036/viewer/2022062309/56815b1f550346895dc8d934/html5/thumbnails/83.jpg)
Proportion (p) For each individual in the study, we record a binary
outcome (Yes/No; Success/Failure) rather than a continuous measurement
![Page 84: Introduction to Biostatistics for Clinical Researchers](https://reader036.fdocuments.net/reader036/viewer/2022062309/56815b1f550346895dc8d934/html5/thumbnails/84.jpg)
Proportion (p) Compute a sample proportion, $\hat{p}$ (pronounced “p-hat”), by taking the observed number of “yes” responses divided by the total sample size
This is the key summary measure for binary data, analogous to a mean for continuous data
There is a formula for the standard deviation of a proportion, but the quantity lacks the “physical interpretability” that it has for continuous data
![Page 85: Introduction to Biostatistics for Clinical Researchers](https://reader036.fdocuments.net/reader036/viewer/2022062309/56815b1f550346895dc8d934/html5/thumbnails/85.jpg)
Example Proportion of dialysis patients with national insurance in 12 countries (only six shown)¹
Example: Canada, $\hat{p} = 400/503 \approx 0.795$
¹Hirth, R., et al. (2008). Out-of-pocket spending and medication adherence among dialysis patients in twelve countries, Health Affairs, 27 (1).
![Page 86: Introduction to Biostatistics for Clinical Researchers](https://reader036.fdocuments.net/reader036/viewer/2022062309/56815b1f550346895dc8d934/html5/thumbnails/86.jpg)
Example Maternal/infant transmission of HIV1
HIV-infection status was known for 363 births (180 in the zidovudine [AZT] group and 183 in the placebo group)
13 infants in the AZT group and 40 in the placebo group were infected
$\hat{p}_{AZT} = 13/180 \approx 0.07$ (7%)
$\hat{p}_{PLA} = 40/183 \approx 0.22$ (22%)
1Spector, S., et al. (1994). A controlled trial of intravenous immune globulin for the prevention of serious bacterial infections in children receiving zidovudine for advanced human immunodeficiency virus infection, NEJM 331 (18).
![Page 87: Introduction to Biostatistics for Clinical Researchers](https://reader036.fdocuments.net/reader036/viewer/2022062309/56815b1f550346895dc8d934/html5/thumbnails/87.jpg)
Proportion (p) What is the sampling behavior of a sample proportion?
In other words, how do sample proportions estimated from random samples of the same size from the same population behave?
![Page 88: Introduction to Biostatistics for Clinical Researchers](https://reader036.fdocuments.net/reader036/viewer/2022062309/56815b1f550346895dc8d934/html5/thumbnails/88.jpg)
Example: Health Insurance Coverage Suppose we have a population in which 80% of persons
have some form of health insurance:
![Page 89: Introduction to Biostatistics for Clinical Researchers](https://reader036.fdocuments.net/reader036/viewer/2022062309/56815b1f550346895dc8d934/html5/thumbnails/89.jpg)
Example: Health Insurance Coverage Suppose we had all the time in the world . . . Again
We decide to do another set of experiments
We are going to take 500 separate random samples from this population, each with 20 subjects
For each of the 500 samples, we will plot a histogram of the insured and uninsured numbers and record the sample proportion of insured subjects
![Page 90: Introduction to Biostatistics for Clinical Researchers](https://reader036.fdocuments.net/reader036/viewer/2022062309/56815b1f550346895dc8d934/html5/thumbnails/90.jpg)
Random Samples Sample 1: n = 20 Sample 2: n = 20
![Page 91: Introduction to Biostatistics for Clinical Researchers](https://reader036.fdocuments.net/reader036/viewer/2022062309/56815b1f550346895dc8d934/html5/thumbnails/91.jpg)
Estimating the Sampling Distribution What does the histogram of the 500 sample proportions
look like?
![Page 92: Introduction to Biostatistics for Clinical Researchers](https://reader036.fdocuments.net/reader036/viewer/2022062309/56815b1f550346895dc8d934/html5/thumbnails/92.jpg)
Example: Health Insurance Coverage We decide to do another experiment
We are going to take 500 random samples from this population, each with 100 subjects
For each of the 500 samples, we will plot a histogram of the insured and uninsured numbers and record the sample proportion of insured subjects
![Page 93: Introduction to Biostatistics for Clinical Researchers](https://reader036.fdocuments.net/reader036/viewer/2022062309/56815b1f550346895dc8d934/html5/thumbnails/93.jpg)
Random Samples Sample 1: n = 100 Sample 2: n = 100
![Page 94: Introduction to Biostatistics for Clinical Researchers](https://reader036.fdocuments.net/reader036/viewer/2022062309/56815b1f550346895dc8d934/html5/thumbnails/94.jpg)
Estimating the Sampling Distribution What does the histogram of the 500 sample proportions
look like?
![Page 95: Introduction to Biostatistics for Clinical Researchers](https://reader036.fdocuments.net/reader036/viewer/2022062309/56815b1f550346895dc8d934/html5/thumbnails/95.jpg)
Example: Health Insurance Coverage We decide to do another experiment
We are going to take 500 random samples from this population, each with 1000 subjects
For each of the 500 samples, we will plot a histogram of the insured and uninsured numbers and record the sample proportion of insured subjects
![Page 96: Introduction to Biostatistics for Clinical Researchers](https://reader036.fdocuments.net/reader036/viewer/2022062309/56815b1f550346895dc8d934/html5/thumbnails/96.jpg)
Random Samples Sample 1: n = 1000 Sample 2: n = 1000
![Page 97: Introduction to Biostatistics for Clinical Researchers](https://reader036.fdocuments.net/reader036/viewer/2022062309/56815b1f550346895dc8d934/html5/thumbnails/97.jpg)
Estimating the Sampling Distribution What does the histogram of the 500 sample proportions
look like?
![Page 98: Introduction to Biostatistics for Clinical Researchers](https://reader036.fdocuments.net/reader036/viewer/2022062309/56815b1f550346895dc8d934/html5/thumbnails/98.jpg)
Example: Health Insurance Coverage Results review:
True proportion of insured: p = 0.80
Results from 500 random samples:

| Sample Size | Mean of 500 Sample Proportions | SD of 500 Sample Proportions | Shape of Distribution of 500 Sample Proportions |
|---|---|---|---|
| n = 20 | 0.805 | 0.094 | Approaching normal? |
| n = 100 | 0.801 | 0.041 | Approximately normal |
| n = 1000 | 0.799 | 0.012 | Approximately normal |
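
The experiment behind these results can be imitated with a small simulation. This is a sketch under stated assumptions, not the code used for the slides: the function name `simulate_proportions` and the seed are arbitrary choices, and the exact numbers will differ from those reported because the random samples differ.

```python
import random
import statistics


def simulate_proportions(p, n, n_samples, rng):
    """Draw n_samples random samples of size n; return their sample proportions."""
    proportions = []
    for _ in range(n_samples):
        insured = sum(1 for _ in range(n) if rng.random() < p)
        proportions.append(insured / n)
    return proportions


rng = random.Random(2010)  # arbitrary seed so the run is repeatable
results = {}
for n in (20, 100, 1000):
    props = simulate_proportions(p=0.80, n=n, n_samples=500, rng=rng)
    results[n] = (statistics.mean(props), statistics.stdev(props))
    print(f"n = {n:4d}: mean = {results[n][0]:.3f}, SD = {results[n][1]:.3f}")
```

The means should sit close to 0.80 for every sample size, while the SD of the 500 proportions drops as n grows, mirroring the pattern in the results review.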
![Page 99: Introduction to Biostatistics for Clinical Researchers](https://reader036.fdocuments.net/reader036/viewer/2022062309/56815b1f550346895dc8d934/html5/thumbnails/99.jpg)
Example: Health Insurance Coverage Results:
![Page 100: Introduction to Biostatistics for Clinical Researchers](https://reader036.fdocuments.net/reader036/viewer/2022062309/56815b1f550346895dc8d934/html5/thumbnails/100.jpg)
The Theoretical Sampling Distribution of the Sample Proportion and Its Estimate Based on a Single Sample
![Page 101: Introduction to Biostatistics for Clinical Researchers](https://reader036.fdocuments.net/reader036/viewer/2022062309/56815b1f550346895dc8d934/html5/thumbnails/101.jpg)
Sampling Distribution of the Sample Proportion In the previous section, we reviewed the results of simulations that produced estimates of what is formally called the sampling distribution of the sample proportion
The sampling distribution of the sample proportion is a theoretical probability distribution
It describes the distribution of sample proportions calculated from all possible random samples of the same size taken from a population
![Page 102: Introduction to Biostatistics for Clinical Researchers](https://reader036.fdocuments.net/reader036/viewer/2022062309/56815b1f550346895dc8d934/html5/thumbnails/102.jpg)
Sampling Distribution of the Sample Proportion In research, it is impossible to estimate the sampling
distribution of a sample proportion by actually taking many random samples from the same population to understand this sampling behavior
Luckily, there exists some mathematical machinery that generalizes some of the patterns we saw in the simulation results
![Page 103: Introduction to Biostatistics for Clinical Researchers](https://reader036.fdocuments.net/reader036/viewer/2022062309/56815b1f550346895dc8d934/html5/thumbnails/103.jpg)
The Central Limit Theorem (CLT) The Central Limit Theorem (CLT) is a powerful mathematical tool that gives several useful results:
The sampling distribution of the sample proportion calculated from all samples of size n is approximately normal
The mean of all sample proportions is the true proportion (p) of the population from which the samples were taken
The standard deviation of the sample proportions is equal to $\sqrt{p(1-p)/n}$
This quantity is often called the standard error of the sample proportion, $SE(\hat{p})$
![Page 104: Introduction to Biostatistics for Clinical Researchers](https://reader036.fdocuments.net/reader036/viewer/2022062309/56815b1f550346895dc8d934/html5/thumbnails/104.jpg)
Example: Health Insurance Coverage Population distribution of individual insurance status
True proportion: p = 0.8

| Sample Size | Mean of 500 Sample Proportions | Mean of 5000 Sample Proportions | SD of 500 Sample Proportions | SD of 5000 Sample Proportions | SD of Sample Proportions (SE) by CLT |
|---|---|---|---|---|---|
| n = 20 | 0.805 | 0.799 | 0.094 | 0.090 | 0.089 |
| n = 100 | 0.801 | 0.799 | 0.041 | 0.040 | 0.040 |
| n = 1000 | 0.799 | 0.80 | 0.012 | 0.012 | 0.012 |
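
The CLT column can be reproduced directly from the formula for the standard error of a sample proportion. A minimal sketch, with `se_proportion` as a hypothetical helper name:

```python
import math


def se_proportion(p, n):
    """Standard error of a sample proportion: sqrt(p * (1 - p) / n)."""
    return math.sqrt(p * (1 - p) / n)


for n in (20, 100, 1000):
    print(f"n = {n:4d}: SE = {se_proportion(0.80, n):.4f}")
```

With p = 0.80 this gives about 0.0894, 0.0400, and 0.0126, in line with the simulated SDs; note that the n = 1000 value is shown as 0.012 in the table.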
![Page 105: Introduction to Biostatistics for Clinical Researchers](https://reader036.fdocuments.net/reader036/viewer/2022062309/56815b1f550346895dc8d934/html5/thumbnails/105.jpg)
Recap: CLT The CLT tells us the following:
When taking a random sample of binary measures of size n from a population with true proportion p, the theoretical sampling distribution of sample proportions from all possible random samples of size n is approximately normal, with mean $\mu_{\hat{p}} = p$ and standard deviation $\sigma_{\hat{p}} = SE(\hat{p}) = \sqrt{\frac{p(1-p)}{n}}$
![Page 106: Introduction to Biostatistics for Clinical Researchers](https://reader036.fdocuments.net/reader036/viewer/2022062309/56815b1f550346895dc8d934/html5/thumbnails/106.jpg)
CLT: So What? What good is this information?
Using the properties of the normal curve, this shows that for most random samples (i.e., 95% of them), the sample proportion $\hat{p}$ will fall within 1.96 SEs of the true proportion, p:
$p - 1.96\sqrt{\frac{p(1-p)}{n}} \le \hat{p} \le p + 1.96\sqrt{\frac{p(1-p)}{n}}$
![Page 107: Introduction to Biostatistics for Clinical Researchers](https://reader036.fdocuments.net/reader036/viewer/2022062309/56815b1f550346895dc8d934/html5/thumbnails/107.jpg)
CLT: So What? AGAIN, what good is this information?
We are going to take a single sample of size n and get one $\hat{p}$
We won’t know p, and if we did know p why would we care about the distribution of estimates of p from imperfect subsets of the population?
[Figure: sampling distribution of $\hat{p}$, centered at p, with limits $p - 1.96\sqrt{p(1-p)/n}$ and $p + 1.96\sqrt{p(1-p)/n}$]
![Page 108: Introduction to Biostatistics for Clinical Researchers](https://reader036.fdocuments.net/reader036/viewer/2022062309/56815b1f550346895dc8d934/html5/thumbnails/108.jpg)
CLT: So What? We are going to take a single sample of size n and get one $\hat{p}$
But for most (i.e., 95%) of the random samples we can get, our $\hat{p}$ will fall within ± 1.96 SE of p
![Page 109: Introduction to Biostatistics for Clinical Researchers](https://reader036.fdocuments.net/reader036/viewer/2022062309/56815b1f550346895dc8d934/html5/thumbnails/109.jpg)
CLT: So What? We are going to take a single sample of size n and get one $\hat{p}$
So if we start at $\hat{p}$ and go 1.96 SE in either direction, the interval created will contain p most (i.e., 95%) of the time
![Page 110: Introduction to Biostatistics for Clinical Researchers](https://reader036.fdocuments.net/reader036/viewer/2022062309/56815b1f550346895dc8d934/html5/thumbnails/110.jpg)
Estimating a Confidence Interval Such an interval is called a 95% confidence interval for the population proportion p
Interval is given by: $\hat{p} \pm 1.96\,SE(\hat{p}) = \hat{p} \pm 1.96\sqrt{\frac{p(1-p)}{n}}$
Problem: we don’t know p
Can estimate with $\hat{p}$, but we will detail this in the next section
What is the interpretation?
![Page 111: Introduction to Biostatistics for Clinical Researchers](https://reader036.fdocuments.net/reader036/viewer/2022062309/56815b1f550346895dc8d934/html5/thumbnails/111.jpg)
Interpretation of a 95% Confidence Interval Laypersons’ range of “plausible” values for the true proportion p
Researchers can never observe p
$\hat{p}$ is the best estimate based on a single sample
The 95% CI starts with this best estimate and additionally recognizes uncertainty in this quantity
Technical interpretation
Were 100 random samples of size n taken from the same population and 95% confidence limits computed using each of these 100 samples, about 95 of the intervals would be expected to contain p
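The technical interpretation can be illustrated by simulation. The sketch below is only an illustration under stated assumptions: it repeatedly samples from a population with a known p = 0.80 (something a real study never has), builds the interval $\hat{p} \pm 1.96\,SE(\hat{p})$ with the SE computed from the true p as in the formula above, and counts how often the interval contains p. The names `interval_known_se` and `coverage`, the sample size, and the seed are all hypothetical choices.

```python
import math
import random


def interval_known_se(p_hat, p, n, z=1.96):
    """Interval p_hat +/- z * sqrt(p * (1 - p) / n), with SE based on the true p."""
    margin = z * math.sqrt(p * (1 - p) / n)
    return p_hat - margin, p_hat + margin


def coverage(p, n, n_samples, rng):
    """Fraction of simulated samples whose interval contains the true p."""
    hits = 0
    for _ in range(n_samples):
        x = sum(1 for _ in range(n) if rng.random() < p)
        lower, upper = interval_known_se(x / n, p, n)
        if lower <= p <= upper:
            hits += 1
    return hits / n_samples


rng = random.Random(42)  # arbitrary seed so the run is repeatable
rate = coverage(p=0.80, n=100, n_samples=2000, rng=rng)
print(f"Share of intervals containing p: {rate:.3f}")
```

The share comes out near 0.95 but not exactly 0.95, both because only a finite number of samples is simulated and because the normal curve is an approximation to the discrete distribution of $\hat{p}$.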
![Page 112: Introduction to Biostatistics for Clinical Researchers](https://reader036.fdocuments.net/reader036/viewer/2022062309/56815b1f550346895dc8d934/html5/thumbnails/112.jpg)
Summary Trends
Distribution of sample proportions tended to be approximately normal--even when original, individual-level data was not (e.g., a binary outcome)
Variability in sample proportion values decreased as the size of the sample each proportion was based upon increased
As with the sample mean, variation in proportions is tied to the size of each sample selected, NOT the number of samples
![Page 113: Introduction to Biostatistics for Clinical Researchers](https://reader036.fdocuments.net/reader036/viewer/2022062309/56815b1f550346895dc8d934/html5/thumbnails/113.jpg)
Estimating Confidence Intervals for the Proportion of a Population Based on a Single Sample of Size n
![Page 114: Introduction to Biostatistics for Clinical Researchers](https://reader036.fdocuments.net/reader036/viewer/2022062309/56815b1f550346895dc8d934/html5/thumbnails/114.jpg)
Estimating a 95% Confidence Interval for p In the last section, we defined a 95% confidence interval for the population proportion p
Interval given by: $\hat{p} \pm 1.96\,SE(\hat{p}) = \hat{p} \pm 1.96\sqrt{\frac{p(1-p)}{n}}$
Problem: we don’t know p
Can estimate with $\hat{p}$, such that our estimated SE is $\widehat{SE}(\hat{p}) = \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$
Estimated 95% CI for p based on a single sample of size n: $\hat{p} \pm 1.96\,\widehat{SE}(\hat{p}) = \hat{p} \pm 1.96\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$
![Page 115: Introduction to Biostatistics for Clinical Researchers](https://reader036.fdocuments.net/reader036/viewer/2022062309/56815b1f550346895dc8d934/html5/thumbnails/115.jpg)
Example 1 Proportion of dialysis patients with national insurance in 12 countries (only six are shown):
Example: France, $\hat{p} = 219/481 \approx 0.46$
![Page 116: Introduction to Biostatistics for Clinical Researchers](https://reader036.fdocuments.net/reader036/viewer/2022062309/56815b1f550346895dc8d934/html5/thumbnails/116.jpg)
Example 1 Estimated 95% confidence interval
$\hat{p} \pm 1.96\sqrt{\frac{\hat{p}(1-\hat{p})}{n}} = 0.46 \pm 1.96\sqrt{\frac{0.46\,(1-0.46)}{481}} = 0.46 \pm 1.96 \times 0.023 = (0.41, 0.51)$
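The same calculation can be sketched in code, starting from the counts for France (219 of 481). The function name `proportion_ci` is hypothetical. Because the code keeps the unrounded $\hat{p}$ (about 0.455) instead of 0.46, its limits can differ from the hand calculation in the second decimal place.

```python
import math


def proportion_ci(x, n, z=1.96):
    """Estimated CI: p_hat +/- z * sqrt(p_hat * (1 - p_hat) / n)."""
    p_hat = x / n
    se_hat = math.sqrt(p_hat * (1 - p_hat) / n)   # estimated standard error
    return p_hat, se_hat, (p_hat - z * se_hat, p_hat + z * se_hat)


p_hat, se_hat, (lower, upper) = proportion_ci(219, 481)   # France
print(f"p_hat = {p_hat:.3f}, SE = {se_hat:.3f}, 95% CI = ({lower:.3f}, {upper:.3f})")
```

Calling `proportion_ci(40, 183)` applies the same steps to the placebo group of the HIV transmission study.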
![Page 117: Introduction to Biostatistics for Clinical Researchers](https://reader036.fdocuments.net/reader036/viewer/2022062309/56815b1f550346895dc8d934/html5/thumbnails/117.jpg)
Example 2 Maternal/infant transmission of HIV
HIV-infection status was known for 363 births (180 in AZT group and 183 in placebo group)
Thirteen infants in AZT and 40 in placebo were infected
$\hat{p}_{AZT} = 13/180 \approx 0.07$
$\hat{p}_{PLA} = 40/183 \approx 0.22$
![Page 118: Introduction to Biostatistics for Clinical Researchers](https://reader036.fdocuments.net/reader036/viewer/2022062309/56815b1f550346895dc8d934/html5/thumbnails/118.jpg)
Example 2 Estimated 95% confidence interval for transmission
percentage in the placebo group:
$\hat{p} \pm 1.96\sqrt{\frac{\hat{p}(1-\hat{p})}{n}} = 0.22 \pm 1.96\sqrt{\frac{0.22\,(1-0.22)}{183}} = 0.22 \pm 1.96 \times 0.031 = (0.16, 0.28)$
![Page 119: Introduction to Biostatistics for Clinical Researchers](https://reader036.fdocuments.net/reader036/viewer/2022062309/56815b1f550346895dc8d934/html5/thumbnails/119.jpg)
Small Sample Considerations for Confidence Intervals for Population Proportions
![Page 120: Introduction to Biostatistics for Clinical Researchers](https://reader036.fdocuments.net/reader036/viewer/2022062309/56815b1f550346895dc8d934/html5/thumbnails/120.jpg)
The Central Limit Theorem (CLT) The Central Limit Theorem is a powerful mathematical tool that gives us:
The shape of the sampling distribution of $\hat{p}$: approximately normal
Mother/infant transmission example, placebo group:
CLT 95% CI: (0.16, 0.28). Can be done by hand
Exact 95% CI: (0.160984, 0.2855248). Requires computer, always correct
![Page 121: Introduction to Biostatistics for Clinical Researchers](https://reader036.fdocuments.net/reader036/viewer/2022062309/56815b1f550346895dc8d934/html5/thumbnails/121.jpg)
Notes on 95% CI for p The CLT-based formula for a 95% CI is only approximate; it works very well if you have enough data in your sample
The approximation works better for bigger values of $n\hat{p}(1-\hat{p})$
“Large sample” is indicative not only of total sample size, but of the balance between ‘yes’ and ‘no’ outcomes in the population
![Page 122: Introduction to Biostatistics for Clinical Researchers](https://reader036.fdocuments.net/reader036/viewer/2022062309/56815b1f550346895dc8d934/html5/thumbnails/122.jpg)
Notes on 95% CI for p HIV example, AZT group:
n = 180, $\hat{p} = 13/180 \approx 0.07$
CLT 95% CI: (0.03, 0.11)
Exact 95% CI: (0.0390137, 0.1203358)
![Page 123: Introduction to Biostatistics for Clinical Researchers](https://reader036.fdocuments.net/reader036/viewer/2022062309/56815b1f550346895dc8d934/html5/thumbnails/123.jpg)
Notes on 95% CI for p In the placebo sample: $n\hat{p}_{PLA}(1-\hat{p}_{PLA}) = 183 \times 0.22 \times 0.78 \approx 31$
In the AZT sample: $n\hat{p}_{AZT}(1-\hat{p}_{AZT}) = 180 \times 0.07 \times 0.93 \approx 12$
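A quick way to compare the two groups on this quantity is to compute it from the raw counts. The helper name `balance_index` is hypothetical, chosen only for this sketch; the slides do not name the quantity.

```python
def balance_index(x, n):
    """n * p_hat * (1 - p_hat): larger values favor the CLT approximation."""
    p_hat = x / n
    return n * p_hat * (1 - p_hat)


for label, x, n in (("placebo", 40, 183), ("AZT", 13, 180)):
    print(f"{label}: n * p_hat * (1 - p_hat) = {balance_index(x, n):.1f}")
```

Although the two groups have nearly the same n, the AZT group's value is much smaller because its outcomes are far less balanced between “yes” and “no”.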
![Page 124: Introduction to Biostatistics for Clinical Researchers](https://reader036.fdocuments.net/reader036/viewer/2022062309/56815b1f550346895dc8d934/html5/thumbnails/124.jpg)
Notes on 95% CI for p When we had sample size issues for population mean estimation, we used the Student’s t to calculate 95% confidence intervals
For population proportion estimation, we use exact binomial confidence intervals
The interpretation of the confidence interval is exactly the same with either the large sample method or the exact method
In real life, using the computer will always give a valid result
CLT only breaks down with “small” sample sizes
![Page 125: Introduction to Biostatistics for Clinical Researchers](https://reader036.fdocuments.net/reader036/viewer/2022062309/56815b1f550346895dc8d934/html5/thumbnails/125.jpg)
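Example Random sample of 16 patients on drug A: two of sixteen patients experience drug failure in first month
CLT 95% CI: $\hat{p} \pm 1.96\sqrt{\frac{\hat{p}(1-\hat{p})}{n}} = \frac{2}{16} \pm 1.96\sqrt{\frac{\frac{2}{16}\left(1-\frac{2}{16}\right)}{16}} = (-0.04, 0.29)$
Exact 95% CI: (0.02, 0.38)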
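For readers curious how a computer arrives at an exact interval, here is a minimal sketch of one common exact method, the Clopper-Pearson interval. The slides do not say which exact method their software used, so treating it as Clopper-Pearson is an assumption; the function names `binom_cdf`, `solve`, and `exact_ci` are hypothetical, and the limits are found by simple bisection rather than a statistics library.

```python
import math


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))


def solve(f, target, increasing):
    """Bisection on [0, 1] for f(p) = target, where f is monotone in p."""
    lo, hi = 0.0, 1.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if (f(mid) < target) == increasing:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def exact_ci(x, n, level=0.95):
    """Clopper-Pearson interval for x 'yes' outcomes out of n."""
    alpha = 1 - level
    # Lower limit: the p at which P(X >= x) equals alpha / 2 (0 when x == 0).
    lower = 0.0 if x == 0 else solve(
        lambda p: 1 - binom_cdf(x - 1, n, p), alpha / 2, increasing=True)
    # Upper limit: the p at which P(X <= x) equals alpha / 2 (1 when x == n).
    upper = 1.0 if x == n else solve(
        lambda p: binom_cdf(x, n, p), alpha / 2, increasing=False)
    return lower, upper


lower, upper = exact_ci(2, 16)   # two drug failures among 16 patients
print(f"Exact 95% CI: ({lower:.2f}, {upper:.2f})")
```

For 2 failures out of 16 this gives roughly (0.02, 0.38), the exact interval quoted above. Unlike the CLT formula, its limits always stay between 0 and 1.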
![Page 126: Introduction to Biostatistics for Clinical Researchers](https://reader036.fdocuments.net/reader036/viewer/2022062309/56815b1f550346895dc8d934/html5/thumbnails/126.jpg)
Next Lecture Friday, December 10 in 1023 Orr-Major from 8:30a - 10:30a
Topics include: P-values, one- and two-sample t-tests, ANOVA, linear regression, chi-square test, survival analysis
![Page 127: Introduction to Biostatistics for Clinical Researchers](https://reader036.fdocuments.net/reader036/viewer/2022062309/56815b1f550346895dc8d934/html5/thumbnails/127.jpg)
References and Citations
Lectures modified from notes provided by John McGready and Johns Hopkins Bloomberg School of Public Health, accessible at http://ocw.jhsph.edu/courses/introbiostats/schedule.cfm