Post on 04-Jan-2016
Introduction to Biostatistics for Clinical and Translational Researchers
KUMC Departments of Biostatistics & Internal Medicine
University of Kansas Cancer Center
Course Information
Jo A. Wick, PhD
Office Location: 5028 Robinson
Email: jwick@kumc.edu
Lectures are recorded and posted at http://biostatistics.kumc.edu under ‘Events and Lectures’
Inferences: Hypothesis Testing
# Groups | Distribution | Independent Samples | Dependent Samples
2 | Normal or large n | 2-sample t | Paired t
2 | Non-normal or small n | Wilcoxon Rank-Sum | Wilcoxon Signed-Rank
> 2 | Normal or large n | ANOVA | 2-way ANOVA
> 2 | Non-normal or small n | Kruskal-Wallis | Friedman’s
Last Week
Continuous outcome, compared between groups
Today
Yes/No or categorical outcome compared between groups? Chi-square tests
Time-to-event compared between groups? Survival Analysis
Association between two continuous outcomes? Correlation
What if we want to ‘adjust’ any of these for additional factors? Regression Methods
Chi-Square Tests
Inferences on Proportions
What do we do when we have nominal (categorical) data on more than one factor? For example:
• Gender and hair color
• Menopausal status and disease stage at diagnosis
• ‘Handedness’ and gender
• Tumor response and treatment
• Presence/absence of disease and exposure
These types of tests are looking at whether two categorical variables are independent of one another (versus associated)—thus, tests of this type are often referred to as chi-square tests of independence.
Inferences on Proportions
Remember, this is essentially looking at the association between two outcomes, where both are categorical (nominal or ordinal).
Assumptions? Rule of thumb (ROT): no expected cell frequency should be less than 5 (i.e., every expected count nπ should be at least 5). If this is not met, Fisher’s Exact test is appropriate.
Inferences on Proportions
Example: Hair color and gender
Gender: x1 = {M, F}
Hair Color: x2 = {Black, Brown, Blonde, Red}
Gender | Black | Brown | Blonde | Red | Total
Male | 32 (32%) | 43 (43%) | 16 (16%) | 9 (9%) | 100
Female | 55 (27.5%) | 65 (32.5%) | 64 (32%) | 16 (8%) | 200
Total | 87 | 108 | 80 | 25 | N = 300
What the data should look like in the actual dataset:
Gender | Hair Color
Male | Black
Female | Red
Female | Blonde
Hair Color and Gender
The researcher hypothesizes that hair color is not independent of sex.
H0: Hair color is independent of gender (i.e., the phenotypic ratio is the same within each gender).
H1: Hair color is not independent of gender (i.e., the phenotypic ratio is different between genders).
Hair Color and Gender
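Chi-square statistics compute deviations between what is expected (under H0) and what is actually observed in the data:
$$\chi^2 = \sum_{\text{all cells}} \frac{(O - E)^2}{E}, \qquad E = \frac{(\text{row total})(\text{column total})}{N}$$
where O is the observed count and E is the expected count in each cell. The statistic is compared with a chi-square distribution with
DF = (r – 1)(c – 1)
where r is the number of rows and c is the number of columns.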
Hair Color and Gender
Does it appear that this type of sample could have come from a population where the different hair colors occur with the same frequency within each gender?
OR does it appear that the distribution of hair color is different between men and women?
Hair Color and Gender
Conclusion: Reject H0 (gender and hair color are independent). The data support the researcher’s hypothesis that the population phenotypic ratio is different between genders (p = 0.029).
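Test statistic: $\chi^2 = 8.99$ with DF = (2 – 1)(4 – 1) = 3, which exceeds the critical value $\chi^2_{3,\,0.05} = 7.815$.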
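As a check on the arithmetic, the sketch below recomputes the chi-square test of independence for the hair color table using only the Python standard library. The helper names are illustrative, not from any package; in practice a statistics package would be used. The p-value uses a power-series expansion of the chi-square upper tail, which is adequate for moderate statistics like this one.

```python
import math


def expected_counts(table):
    """Expected cell counts under H0 (independence): row total x column total / N."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for X ~ chi-square(df).

    Uses the power series for the regularized lower incomplete gamma
    function P(df/2, x/2); adequate for moderate values of x.
    """
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    term = total = 1.0 / a
    n = 1
    while term > 1e-15 * total:
        term *= z / (a + n)
        total += term
        n += 1
    lower = total * math.exp(a * math.log(z) - z - math.lgamma(a))
    return 1.0 - lower


def chi_square_independence(table):
    """Return (statistic, degrees of freedom, p-value) for an r x c table."""
    expected = expected_counts(table)
    stat = sum((o - e) ** 2 / e
               for obs_row, exp_row in zip(table, expected)
               for o, e in zip(obs_row, exp_row))
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df, chi2_sf(stat, df)


# Hair color (Black, Brown, Blonde, Red) by gender, from the table above
hair = [[32, 43, 16, 9],   # Male
        [55, 65, 64, 16]]  # Female
stat, df, p = chi_square_independence(hair)
smallest_expected = min(min(row) for row in expected_counts(hair))
print(f"chi2 = {stat:.2f}, df = {df}, p = {p:.3f}")  # chi2 = 8.99, df = 3, p = 0.029
print(f"smallest expected count = {smallest_expected:.2f}")  # 8.33, so the ROT is met
```

The smallest expected count (male, red hair) is above 5, so the chi-square approximation is reasonable here.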
Inferences on Proportions
Special case: when you have a 2X2 contingency table, you are actually testing a hypothesis concerning two population proportions: H0: π1 = π2
(i.e., the proportion of males who are blonde is the same as the proportion of females who are blonde).
Gender | Blonde | Non-blonde | Total
Male | 16 (16%) | 84 (84%) | 100
Female | 64 (32%) | 136 (68%) | 200
Total | 80 (26.7%) | 220 (73.3%) | N = 300
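For the 2 × 2 case, a minimal sketch of the pooled two-sample z-test of H0: π1 = π2 on the blonde/non-blonde table. The function name is hypothetical. The sketch also illustrates that, for a 2 × 2 table, the square of this z statistic equals the (uncorrected) chi-square statistic, so the two tests are equivalent.

```python
import math


def two_proportion_z(x1, n1, x2, n2):
    """Pooled two-sample z-test of H0: pi1 = pi2 (two-sided)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)              # common proportion estimated under H0
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * P(Z > |z|)
    return z, p_value


# Blonde: 16 of 100 males versus 64 of 200 females
z, p = two_proportion_z(16, 100, 64, 200)
print(f"z = {z:.2f}, z^2 = {z * z:.2f}, p = {p:.4f}")  # z = -2.95, z^2 = 8.73, p = 0.0031
```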
Inferences on Proportions
When you have a single proportion and have a small sample, substitute the Binomial test which provides exact results.
The nonparametric Fisher Exact test can always be used in place of the chi-square test when you have contingency-table-like data (i.e., two categorical factors whose association is of interest); it should be substituted for the chi-square test of independence when expected cell counts are small.
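A minimal sketch of both exact tests using only the Python standard library, assuming the common two-sided convention of summing the probabilities of all outcomes that are no more likely than the one observed (other two-sided conventions exist). The function names and the small tables in the usage lines are hypothetical.

```python
from math import comb


def fisher_exact_2x2(a, b, c, d):
    """Two-sided Fisher exact test for the 2 x 2 table [[a, b], [c, d]].

    With all margins fixed, the top-left count follows a hypergeometric
    distribution; the p-value adds up the probabilities of every table
    that is no more likely than the observed one.
    """
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def prob(x):  # P(top-left cell = x | margins)
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    return sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))


def binomial_exact(k, n, pi0=0.5):
    """Two-sided exact binomial test of H0: pi = pi0 for k successes in n trials."""
    def prob(x):
        return comb(n, x) * pi0 ** x * (1 - pi0) ** (n - x)

    p_obs = prob(k)
    return sum(prob(x) for x in range(n + 1) if prob(x) <= p_obs * (1 + 1e-7))


print(fisher_exact_2x2(3, 1, 1, 3))  # hypothetical table: 34/70, about 0.486
print(binomial_exact(9, 10))         # hypothetical: 9 of 10, 22/1024, about 0.021
```

The small tolerance factor keeps outcomes whose probability equals the observed one (up to rounding) in the sum.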
Survival Analysis
Inferences on Time-to-Event
Survival Analysis is the class of statistical methods for studying the occurrence (categorical) and timing (continuous) of events.
The event could be development of a disease, response to treatment, relapse, or death.
Survival analysis methods are most often applied to the study of deaths.
Inferences on Time-to-Event
Survival Time: the time from a well-defined point in time (time origin) to the occurrence of a given event.
Survival data include: a time, an event ‘status’, and any other relevant subject characteristics.
Inferences on Time-to-Event
In most clinical studies the length of the study period is fixed and the patients enter the study at different times.
• Lost-to-follow-up patients’ survival times are measured from study entry until last contact (censored observations).
• Patients still alive at the termination date have survival times equal to the time from study entry until study termination (censored observations).
When there are no censored survival times, the set is said to be complete.
Functions of Survival Time
Let T = the length of time until a subject experiences the event.
The distribution of T can be described by several functions:
Survival Function: the probability that an individual survives longer than some time, t:
S(t) = P(an individual survives longer than t)
= P(T > t)
Functions of Survival Time
If there are no censored observations, the survival function is estimated as the proportion of patients surviving longer than time t:
$$\hat{S}(t) = \frac{\#\text{ of patients surviving longer than } t}{\text{total }\#\text{ of patients}}$$
Functions of Survival Time
Density Function: The survival time T has a probability density function defined as the limit of the probability that an individual experiences the event in the short interval (t, t + Δt) per unit width Δt:
$$f(t) = \lim_{\Delta t \to 0} \frac{P(\text{an individual dying in the interval } (t, t + \Delta t))}{\Delta t}$$
Functions of Survival Time
Hazard Function: The hazard function h(t) of survival time T gives the conditional failure rate. It is defined as the limit of the probability of failure during a very small time interval (t, t + Δt), assuming the individual has survived to the beginning of the interval, per unit width Δt:
$$h(t) = \lim_{\Delta t \to 0} \frac{P(\text{an individual of age } t \text{ fails in the time interval } (t, t + \Delta t))}{\Delta t}$$
Functions of Survival Time
The hazard is also known as the instantaneous failure rate, force of mortality, conditional mortality rate, or age-specific failure rate.
The hazard at any time t corresponds to the risk of event occurrence at time t. For example, a patient’s hazard for contracting influenza is 0.015 with time measured in months. What does this mean? This patient would expect to contract influenza 0.015 times over the course of a month, assuming the hazard stays constant.
Functions of Survival Time
If there are no censored observations, the hazard function is estimated as the proportion of patients dying in an interval per unit time, given that they have survived to the beginning of the interval:
$$\hat{h}(t) = \frac{\#\text{ of patients dying in the interval beginning at time } t}{(\#\text{ of patients surviving at } t)(\text{interval width})} = \frac{\#\text{ of patients dying per unit time in the interval}}{\#\text{ of patients surviving at } t}$$
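The two estimators for complete (uncensored) data can be sketched as follows. The survival times are hypothetical, and the interval is taken as (start, start + width] so that "surviving at t" means surviving longer than t, matching the definition of Ŝ(t); the function names are illustrative.

```python
def survival_estimate(times, t):
    """S-hat(t): proportion of patients surviving longer than t (no censoring)."""
    return sum(1 for x in times if x > t) / len(times)


def interval_hazard(times, start, width):
    """h-hat over the interval (start, start + width]: deaths per unit time
    divided by the number of patients surviving at `start` (no censoring)."""
    at_risk = sum(1 for x in times if x > start)
    deaths = sum(1 for x in times if start < x <= start + width)
    return deaths / (at_risk * width)


# Hypothetical complete (uncensored) survival times in months
times = [2, 3, 3, 5, 6, 8, 9, 12, 14, 20]
print(survival_estimate(times, 5))   # 0.6: 6 of 10 survive past 5 months
print(interval_hazard(times, 5, 5))  # 3 deaths / (6 at risk * 5 months) = 0.1 per month
```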
Estimation of S(t)
Product-Limit Estimates (Kaplan-Meier): most widely used in biological and medical applications
Life Table Analysis (actuarial method): appropriate for large number of observations or if there are many unique event times
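A minimal sketch of the product-limit (Kaplan-Meier) estimator, assuming event indicators coded 1 = event and 0 = censored and the usual convention that subjects censored at an event time are still at risk at that time. The data and function names are hypothetical. With no censored observations, the estimate reduces to the proportion surviving longer than t, Ŝ(t).

```python
def kaplan_meier(times, events):
    """Product-limit (Kaplan-Meier) estimate of S(t).

    times  : follow-up time for each patient
    events : 1 if the event was observed, 0 if the time is censored
    Returns [(t, S(t))] at each distinct event time.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    s = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        while i < len(data) and data[i][0] == t:  # group tied times
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            s *= 1 - deaths / n_at_risk           # conditional survival past t
            curve.append((t, s))
        n_at_risk -= removed                      # events and censorings leave the risk set
    return curve


def km_median(curve):
    """Median survival: first event time at which S(t) drops to 0.5 or below."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None  # median not reached


# Hypothetical data: 6 patients, two censored (event = 0)
curve = kaplan_meier([1, 2, 2, 3, 4, 5], [1, 1, 0, 1, 0, 1])
print(curve)
print(km_median(curve))  # 3
```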
Methods for Comparing S(t)
If your question looks like: “Is the time-to-event different in group A than in group B (or C . . . )?” then you have several options, including:
• Log-rank test: weights effects over the entire observation period equally; best when the difference (hazard ratio) is constant over time
• Weighted log-rank tests:
  • Wilcoxon test: gives higher weights to earlier effects; better for detecting short-term differences in survival
  • Tarone-Ware: a compromise between log-rank and Wilcoxon
  • Peto-Prentice: gives higher weights to earlier events
  • Fleming-Harrington: flexible weighting method
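The log-rank family can be sketched as a weighted sum, over distinct event times, of observed minus expected events in group 1, divided by its variance. This sketch covers only weights that depend on the number at risk: w = 1 (log-rank), w = number at risk (Gehan's generalized Wilcoxon) and w = √(number at risk) (Tarone-Ware). Peto-Prentice and Fleming-Harrington weights, which depend on an estimated survival curve, are omitted. The function name and data are hypothetical.

```python
import math


def weighted_logrank(times, events, groups, weight="logrank"):
    """Two-group (weighted) log-rank test.

    times  : follow-up times;  events : 1 = event, 0 = censored
    groups : 1 for group 1, 0 for group 2
    weight : "logrank" (w = 1), "gehan" (w = number at risk), or
             "tarone-ware" (w = sqrt(number at risk))
    Returns (chi-square statistic on 1 df, two-sided p-value).
    """
    subjects = list(zip(times, events, groups))
    event_times = sorted({t for t, e, _ in subjects if e})
    u = v = 0.0
    for t in event_times:
        n = sum(1 for x, _, _ in subjects if x >= t)              # at risk, both groups
        n1 = sum(1 for x, _, g in subjects if x >= t and g == 1)  # at risk, group 1
        d = sum(1 for x, e, _ in subjects if x == t and e)        # events at t
        d1 = sum(1 for x, e, g in subjects if x == t and e and g == 1)
        w = {"logrank": 1.0, "gehan": float(n), "tarone-ware": math.sqrt(n)}[weight]
        u += w * (d1 - d * n1 / n)          # observed minus expected in group 1
        if n > 1:                           # hypergeometric variance of d1
            v += w * w * d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    chi2 = u * u / v
    return chi2, math.erfc(math.sqrt(chi2 / 2))  # P(chi-square with 1 df > chi2)


# Hypothetical data: group 1 fails at months 1-3, group 2 at months 4-6, no censoring
times, events, groups = [1, 2, 3, 4, 5, 6], [1] * 6, [1, 1, 1, 0, 0, 0]
for w in ("logrank", "gehan", "tarone-ware"):
    chi2, p = weighted_logrank(times, events, groups, w)
    print(f"{w}: chi2 = {chi2:.2f}, p = {p:.3f}")
```

As the slides note, the weight function should be chosen (and pre-specified) from clinical knowledge rather than by running all three and picking the smallest p-value.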
[Figure: Early? Late? Proportional? Panels: early difference that fades; difference appears late; difference is early and maintained.]
Inferences for Time-to-Event
Example: survival in squamous cell carcinoma
A pilot study was conducted to compare Accelerated Fractionation Radiation Therapy (AFRT) versus Standard Fractionation Radiation Therapy (SFRT) for patients with advanced unresectable squamous cell carcinoma of the head and neck.
The researchers are interested in exploring any differences in survival between the patients treated with Accelerated FRT and the patients treated with Standard FRT.
Squamous Cell Carcinoma
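Characteristic | AFRT (n = 29) | SFRT (n = 16)
Gender: Male | 28 (97%) | 16 (100%)
Gender: Female | 1 (3%) | 0
Age: Median | 61 | 65
Age: Range | 30-71 | 43-78
Primary Site: Larynx | 3 (10%) | 4 (25%)
Primary Site: Oral Cavity | 6 (21%) | 1 (6%)
Primary Site: Pharynx | 20 (69%) | 10 (63%)
Primary Site: Salivary Glands | 0 | 1 (6%)
Stage: III | 4 (14%) | 8 (50%)
Stage: IV | 25 (86%) | 8 (50%)
Tumor Stage: T2 | 3 (10%) | 2 (12%)
Tumor Stage: T3 | 8 (28%) | 7 (44%)
Tumor Stage: T4 | 18 (62%) | 7 (44%)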
[Figure: Overall Survival by Treatment. Survival probability (0.00 to 1.00) versus survival time (months, 0 to 120) for AFRT and SFRT.]
Inferences for Time-to-Event
H0: S1(t) = S2(t) for all t
H1: S1(t) ≠ S2(t) for at least one t
Squamous Cell Carcinoma
Median Survival Time:
AFRT: 18.38 months (2 censored)
SFRT: 13.19 months (5 censored)
Log-rank test p-value = 0.5421
Squamous Cell Carcinoma
Staging of disease is also prognostic for survival. Shouldn’t we consider the analysis of the survival of these patients by stage as well as by treatment?
[Figure: Overall Survival by Treatment and Stage. Survival probability (0.00 to 1.00) versus survival time (months, 0 to 120) for AFRT/Stage 3, AFRT/Stage 4, SFRT/Stage 3, and SFRT/Stage 4.]
Squamous Cell Carcinoma
Median Survival Time:
AFRT Stage 3: 77.98 mo.
AFRT Stage 4: 16.21 mo.
SFRT Stage 3: 19.34 mo.
SFRT Stage 4: 8.82 mo.
Log-rank test p-value = 0.0792
Inferences on Time-to-Event
Concerns a response that is both categorical (event?) and continuous (time)
There are several nonparametric methods that can be used; the choice should be based on whether you anticipate a short-term or long-term benefit.
The log-rank test is optimal when the hazards are proportional, i.e., when the hazard ratio between the groups is approximately constant over time.
Weight functions should be chosen based on clinical knowledge and should be pre-specified.
Publication Bias
From: Publication bias: evidence of delayed publication in a cohort study of clinical research projects BMJ 1997;315:640-645 (13 September)
Table 4 Risk factors for time to publication using univariate Cox regression analysis
Characteristic | # not published | # published | Hazard ratio (95% CI)
Null | 29 | 23 | 1.00
Non-significant trend | 16 | 4 | 0.39 (0.13 to 1.12)
Significant | 47 | 99 | 2.32 (1.47 to 3.66)
Interpretation: Studies with significant results were published at about 2.3 times the rate of studies with null results (hazard ratio 2.32, 95% CI 1.47 to 3.66).
Publication Bias
Correlation
Linear Correlation
Linear regression assumes the linear dependence of one variable y (dependent) on a second variable x (independent).
Linear correlation also considers the linear relationship between two continuous outcomes, but neither is assumed to be functionally dependent upon the other. Interest is primarily in the strength of association, not in describing the actual relationship.
Scatterplot
Correlation
Pearson’s Correlation Coefficient is used to quantify the strength:
$$r = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\sum (x - \bar{x})^2 \sum (y - \bar{y})^2}}$$
Note: If the sample size is small or the data are non-normal, use the nonparametric Spearman’s coefficient.
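A minimal sketch of Pearson’s r from the formula above, and of Spearman’s coefficient computed as Pearson’s r on ranks (tied values receive average ranks). The function names and example data are hypothetical: a curved but perfectly monotone relationship, for which Spearman’s coefficient is 1 while Pearson’s r is slightly below 1.

```python
import math


def pearson_r(x, y):
    """Pearson correlation: sum of cross-products over the root of the sums of squares."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ranks(values):
    """Ranks 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of positions i..j, 1-based
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result


def spearman_rho(x, y):
    """Spearman's coefficient: Pearson's r computed on the ranks."""
    return pearson_r(ranks(x), ranks(y))


x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]  # hypothetical: monotone but curved
print(round(pearson_r(x, y), 3), spearman_rho(x, y))  # 0.981 1.0
```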
Correlation
[Figure: example scatterplots illustrating r < 0, r > 0, and r = 0. Source: Business Statistics, 4e, by Ken Black. © 2003 John Wiley & Sons.]
Inferences on Correlation
H0: ρ = 0 (no linear association) versus
H1: ρ > 0 (positive linear relationship), or H1: ρ < 0 (negative linear relationship), or H1: ρ ≠ 0 (linear relationship)
Test statistic: $t = \dfrac{r\sqrt{n - 2}}{\sqrt{1 - r^2}}$ (df = n – 2)
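The test statistic can be computed directly from r and n; the inputs and function name here are hypothetical. The resulting t is then compared with a t distribution on n – 2 degrees of freedom.

```python
import math


def correlation_t(r, n):
    """t statistic for H0: rho = 0, returned with its n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2), n - 2


t, df = correlation_t(0.8, 5)  # hypothetical: r = 0.8 from n = 5 pairs
print(round(t, 3), df)         # 2.309 3
```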
Correlation
Correlation
* Excluding France
Regression Methods
What about adjustments?
There may be other predictors or explanatory variables that you believe are related to the response other than the actual factor (treatment) of interest.
Regression methods will allow you to incorporate these factors into the test of a treatment effect:
• Logistic regression: when y is categorical, nominal, and binary
• Multinomial logistic regression: when y is categorical with more than 2 nominal categories
• Ordinal logistic regression: when y is categorical and ordinal
What about adjustments?
Regression methods will allow you to incorporate these factors into the test of a treatment effect:
• Linear regression: when y is continuous and the factors are a combination of categorical and continuous (or just continuous)
• Two- and three-way ANOVA: when y is continuous and the factors are all categorical
What about adjustments?
Regression methods will allow you to incorporate these factors into the test of a treatment effect:
• Cox regression: when y is a time-to-event outcome
Linear Regression
The relationship between two variables may be one of functional dependence; that is, the magnitude of one of the variables (dependent) is assumed to be determined by (dependent on) the magnitude of the second (independent), whereas the reverse is not true.
Example: blood pressure and age
Note: ‘dependent’ does not equate to ‘caused by’.
Linear Regression
In its most basic form, linear regression is a probabilistic model that accounts for unexplained variation in the relationship between two variables:
$$y = \text{Deterministic Component} + \text{Random Error} = (mx + b) + \varepsilon = \beta_0 + \beta_1 x + \varepsilon$$
This model is referred to as simple linear regression.
[Figure: two plots of y versus x, both axes from 0 to 10.]
Simple Linear Regression
$$y = \beta_0 + \beta_1 x + \varepsilon$$
where
y = response variable
x = explanatory variable
β0 = intercept
β1 = slope
ε = ‘error’
Examples: $y = 0 + 1x + 0$ and $y = 0.78 + 0.89x + \varepsilon$
Arm Circumference and Height
Data on anthropometric measures from a random sample of 150 Nepali children up to 12 months old
What is the relationship between average arm circumference and height?
Data:
Arm circumference: $\bar{x}$ = 12.4 cm, s = 1.5 cm, range R = (7.3 cm, 15.6 cm)
Height: $\bar{x}$ = 61.6 cm, s = 6.3 cm, range R = (40.9 cm, 73.3 cm)
Arm Circumference and Height
Treat height as continuous when estimating the relationship
Linear regression is a potential option: it allows us to associate a continuous outcome with a continuous predictor via a linear relationship.
• The line estimates the mean value of the outcome for each continuous value of height in the sample used.
• This makes a lot of sense, but only if a line reasonably describes the relationship.
Visualizing the Relationship
Scatterplot
Visualizing the Relationship
Does a line reasonably describe the general shape of the relationship?
We can estimate a line using a statistical software package
The line we estimate will be of the form:
$$\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x$$
Here, $\hat{y}$ is the average arm circumference for a group of children all of the same height, x.
Arm Circumference and Height
How do we interpret the estimated slope?
• The average change in arm circumference for a one-unit (1 cm) increase in height
• The mean difference in arm circumference for two groups of children who differ by one unit (1 cm) in height
These results estimate that the mean difference in arm circumference for a one-centimeter difference in height is 0.16 cm, with taller children having greater average arm circumference.
Arm Circumference and Height
What is the estimated mean difference in arm circumference for children 60 cm versus 50 cm tall?
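A worked answer, using the estimated slope of 0.16 cm per cm of height; the intercept cancels in the difference, and both heights lie within the observed range:
$$\hat{y}(60) - \hat{y}(50) = (\hat{\beta}_0 + 60\,\hat{\beta}_1) - (\hat{\beta}_0 + 50\,\hat{\beta}_1) = 10 \times 0.16 = 1.6 \text{ cm}$$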
Arm Circumference and Height
Our regression results only apply to the range of observed data (heights from 40.9 cm to 73.3 cm in this sample).
Arm Circumference and Height
How do we interpret the estimated intercept?
• The estimated y when x = 0: the estimated mean arm circumference for children 0 cm tall.
• Does this make sense given our sample?
Frequently, the scientific interpretation of the intercept is meaningless. It is still necessary for fully specifying the equation of a line.
Arm Circumference and Height
X = 0 isn’t even on the graph
Inferences using Linear Regression
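H0: β1 = 0 (no relationship) versus
H1: β1 > 0 (positive linear relationship), or H1: β1 < 0 (negative linear relationship), or H1: β1 ≠ 0 (linear relationship)
Test statistic: t (df = n – 2)
$$t = \frac{\hat{\beta}_1}{s_{\hat{\beta}_1}}, \qquad \hat{\beta}_1 = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sum_i (x_i - \bar{x})^2}, \qquad s_{\hat{\beta}_1} = \frac{s}{\sqrt{\sum_i (x_i - \bar{x})^2}}$$
where $s = \sqrt{\sum_i (y_i - \hat{y}_i)^2 / (n - 2)}$ is the residual standard deviation.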
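A minimal sketch of the least-squares slope, intercept and slope t statistic defined above, on hypothetical data; the function name is illustrative. The p-value would then come from a t distribution on n – 2 df (statistical software or a t table).

```python
import math


def simple_linear_regression(x, y):
    """Least-squares fit of y = b0 + b1*x with the t statistic for H0: beta1 = 0."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    b1 = sxy / sxx
    b0 = my - b1 * mx
    sse = sum((b - (b0 + b1 * a)) ** 2 for a, b in zip(x, y))  # residual sum of squares
    s = math.sqrt(sse / (n - 2))                                 # residual standard deviation
    se_b1 = s / math.sqrt(sxx)
    return b0, b1, b1 / se_b1, n - 2


b0, b1, t, df = simple_linear_regression([1, 2, 3, 4, 5], [1, 3, 2, 5, 4])  # hypothetical
print(round(b0, 2), round(b1, 2), round(t, 3), df)  # 0.6 0.8 2.309 3
```

For these same five points Pearson’s r is 0.8, and the correlation test statistic r√(n – 2)/√(1 – r²) is also about 2.309: with a single predictor, testing β1 = 0 and testing ρ = 0 are equivalent.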
Notes
• Linear regression performed with a single predictor (one x) is called simple linear regression.
• Correlation is a measure of the strength of the linear relationship between two continuous outcomes.
• Linear regression with more than one predictor is called multiple linear regression:
$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k + \varepsilon$$
Logistic Regression
When you are interested in describing the relationship between a dichotomous (categorical, nominal) outcome and a predictor x, logistic regression is appropriate.
Conceptually, the method is the same as linear regression MINUS the assumption of y being continuous.
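$$\ln\left[\frac{\Pr(y = 1)}{1 - \Pr(y = 1)}\right] = \beta_0 + \beta_1 x$$
Unlike linear regression, there is no additive error term: the randomness enters through the binary outcome y itself.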
Logistic Regression
Interpretation of regression coefficients is not straightforward, since they describe the relationship between x and the log-odds of y = 1.
We often use odds ratios to determine the relationship between x and y.
Odds of Death
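A logistic regression model was used to describe the relationship between treatment and death:
Y = {died, alive}
X = {intervention, standard of care}
$$\ln\left[\frac{\Pr(y = \text{death})}{1 - \Pr(y = \text{death})}\right] = \beta_0 + \beta_1 x, \qquad x = \begin{cases} 1 & \text{if intervention} \\ 0 & \text{if standard of care} \end{cases}$$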
Odds of Death
β1 was estimated to be -0.69. What does this mean?
• If you exponentiate the estimate, you get the odds ratio comparing the odds of death under the intervention with the odds under standard of care: exp(-0.69) = 0.5. When treatment involves the intervention, the odds of dying decrease by 50% (relative to standard of care).
• Notice the negative sign: it also indicates a decrease in the chances of death, but it is difficult to interpret without transformation.
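To see why exponentiating β1 gives an odds ratio, the sketch below fits the model by Newton-Raphson on hypothetical counts, assuming x = 1 for intervention and x = 0 for standard of care; the function name and data are illustrative. With a single binary predictor, exp(β̂1) equals the odds ratio computed directly from the 2 × 2 table: (10/40)/(20/40) = 0.5.

```python
import math


def fit_logistic(xs, ys, iterations=25):
    """Maximum-likelihood fit of ln[p/(1-p)] = b0 + b1*x by Newton-Raphson."""
    b0 = b1 = 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))  # fitted Pr(y = 1)
            w = p * (1.0 - p)
            g0 += y - p                                  # score (gradient) terms
            g1 += (y - p) * x
            h00 += w                                     # information matrix terms
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det                # Newton step: information^-1 * score
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


# Hypothetical trial: intervention (x = 1) 10 deaths of 50; standard of care (x = 0) 20 of 60
xs = [1] * 50 + [0] * 60
ys = [1] * 10 + [0] * 40 + [1] * 20 + [0] * 40
b0, b1 = fit_logistic(xs, ys)
print(round(b1, 2), round(math.exp(b1), 2))  # -0.69 0.5
```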
Death
β1 was estimated to be 0.41. What does this mean?
• If you exponentiate the estimate, you get the odds ratio comparing the odds of death under the intervention with the odds under standard of care: exp(0.41) = 1.5. When treatment involves the intervention, the odds of dying increase by 50% (relative to standard of care).
• Notice the positive sign: it also indicates an increase in the chances of death, but it is difficult to interpret without transformation.
Logistic Regression
What about when x is continuous? Suppose x is age and y still represents death during the study period.
$$\ln\left[\frac{\Pr(y = \text{death})}{1 - \Pr(y = \text{death})}\right] = \beta_0 + \beta_1 x, \qquad x = \text{baseline age in years}$$
Death
β1 was estimated to be 0.095. What does this mean?
• If you exponentiate the estimate, you get the odds ratio relating age to the odds of death: exp(0.095) = 1.1. For every one-year increase in age, the odds of dying increase by 10%.
• Notice the positive sign: it also indicates an increase in the chances of death, but it is difficult to interpret without transformation.
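Because the model is linear on the log-odds scale, odds ratios for larger age differences multiply rather than add. For example, for a 10-year difference in age (chosen only for illustration):
$$\text{OR}_{10\text{ yr}} = e^{10 \times 0.095} = e^{0.95} \approx 2.59 \qquad (\text{equivalently, } 1.1^{10} \approx 2.59)$$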
Multiple Logistic Regression
In the same way that linear regression can incorporate multiple x’s, logistic regression can relate a categorical y response to several independent variables.
Interpretation of partial regression coefficients is the same, except that each coefficient (and its odds ratio) is adjusted for the other x’s in the model.
Cox Regression
Cox regression and logistic regression are very similar:
• Both are trying to describe a yes/no outcome.
• Cox regression also attempts to incorporate the timing of the outcome in the modeling.
Cox vs Logistic Regression
Distinction between rate and proportion:
• Incidence (hazard) rate: number of “events” per population at risk per unit time (or mortality rate, if the outcome is death)
• Cumulative incidence: proportion of “events” that occur in a given time period
Cox vs Logistic Regression
Distinction between hazard ratio and odds ratio:
• Hazard ratio: ratio of incidence (hazard) rates
• Odds ratio: ratio of odds, where the odds for an outcome with proportion p are p/(1 – p)
Logistic regression aims to estimate the odds ratio.
Cox regression aims to estimate the hazard ratio. By taking into account the timing of events, it uses more information than just the binary yes/no.
Proportional Hazards Assumption
[Figure: Early? Late? Proportional? Panels: early difference that fades; difference appears late; difference is early and maintained.]
Treatment interacts with time!
Cox Regression
Cox Regression is what we call semiparametric. By comparison:
• Kaplan-Meier is nonparametric.
• There are also parametric methods, which assume the distribution of survival times follows some type of probability model (e.g., exponential).
Can accommodate both discrete and continuous measures of event times.
Can accommodate multiple x’s.
Easy to incorporate time-dependent covariates: covariates that may change in value over the course of the observation period.
For example, consider evaluating the effect of taking oral contraceptives (OCs) on stress fracture risk in women athletes over two years; many women switch on or off OCs.
If you examine risk only by a woman’s OC status at baseline, you can’t see much effect of OCs. However, you can incorporate the times of starting and stopping OCs.
Time Dependent Covariates
Incidence and Prevalence
An incidence rate of a disease is a rate that is measured over a period of time; e.g., 1/100 person-years.
For a given time period, incidence is defined as:
$$\text{Incidence} = \frac{\#\text{ of newly diagnosed cases of disease}}{\#\text{ of individuals at risk}}$$
Only those free of the disease at time t = 0 can be included in the numerator or denominator.
Incidence and Prevalence
Prevalence is a proportion measured at a snapshot in time (cross-sectional).
At any given point, the prevalence is defined as:
$$\text{Prevalence} = \frac{\#\text{ with the illness}}{\#\text{ of individuals in the population}}$$
The prevalence of a disease includes both new incident cases and survivors with the illness.
Incidence and Prevalence
Prevalence is approximately equal to incidence multiplied by the average duration of the disease (when incidence and duration are stable over time and the disease is relatively rare).
Hence, prevalence is greater than incidence if the disease is long-lasting.
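A worked example using the incidence rate quoted above (1/100 person-years) and an assumed, hypothetical average disease duration of 5 years:
$$P \approx I \times D = \frac{1}{100 \text{ person-years}} \times 5 \text{ years} = 0.05$$
so about 5% of the population would have the illness at any given time, compared with 1% newly diagnosed per year.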
Measurement Error
To this point, we have assumed that the outcome of interest, x, can be measured perfectly.
However, mismeasurement of outcomes is common in the medical field due to fallible tests and imprecise measurement tools.
Diagnostic Testing
Diagnostic Test Result | True Disease State: Present (D+) | True Disease State: Absent (D-)
Positive (T+) | True Positive (TP) | False Positive (FP)
Negative (T-) | False Negative (FN) | True Negative (TN)
Sensitivity and Specificity
Sensitivity of a diagnostic test is the probability that the test will be positive among people who have the disease:
Pr(T+|D+) = TP/(TP + FN)
Sensitivity provides no information about people who do not have the disease.
Specificity is the probability that the test will be negative among people who are free of the disease:
Pr(T-|D-) = TN/(TN + FP)
Specificity provides no information about people who have the disease.
[Figure: 100 people (30 diseased, 70 healthy) classified as diagnosed positive or negative.]
SN = 24/30 = 0.80, SP = 56/70 = 0.80, Prevalence = 30/100 = 0.30
A perfect diagnostic test has SN = SP = 1.
A 100% inaccurate diagnostic test has SN = SP = 0.
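A minimal sketch of the two definitions, using the counts implied by SN = 24/30 and SP = 56/70 above (so FN = 6 and FP = 14); the function names are illustrative.

```python
def sensitivity(tp, fn):
    """Pr(T+ | D+) = TP / (TP + FN)."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """Pr(T- | D-) = TN / (TN + FP)."""
    return tn / (tn + fp)


# Counts implied above: 24 of 30 diseased test positive, 56 of 70 healthy test negative
tp, fn = 24, 30 - 24
tn, fp = 56, 70 - 56
print(sensitivity(tp, fn), specificity(tn, fp))  # 0.8 0.8
```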
Sensitivity and Specificity
Example: 100 HIV+ patients are given a new diagnostic test for rapid diagnosis of HIV, and 80 of these patients are correctly identified as HIV+.
What is the sensitivity of this new diagnostic test?
Example: 500 HIV- patients are given a new diagnostic test for rapid diagnosis of HIV, and 50 of these patients are incorrectly identified as HIV+.
What is the specificity of this new diagnostic test? (Hint: How many of these 500 patients are correctly identified as HIV-?)
Positive and Negative Predictive Value
Positive predictive value is the probability that a person with a positive diagnosis actually has the disease:
Pr(D+|T+) = TP/(TP + FP)
This is often what physicians want: the patient tests positive for the disease; does this patient actually have the disease?
Negative predictive value is the probability that a person with a negative test does not have the disease:
Pr(D-|T-) = TN/(TN + FN)
This is often what physicians want: the patient tests negative for the disease; is this patient truly disease free?
[Figure: 100 people (30 diseased, 70 healthy) classified as diagnosed positive or negative.]
PPV = 24/38 = 0.63, NPV = 56/62 = 0.90
A perfect diagnostic test has PPV = NPV = 1.
A 100% inaccurate diagnostic test has PPV = NPV = 0.
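Predictive values also depend on prevalence. By Bayes’ theorem (a standard result, not stated on these slides), PPV and NPV can be computed from sensitivity, specificity and prevalence. The sketch reproduces the values above and then applies the same test to a hypothetical rarer disease (prevalence 0.01); the function names are illustrative.

```python
def ppv(sens, spec, prev):
    """Pr(D+ | T+) by Bayes' theorem."""
    true_pos = sens * prev
    false_pos = (1 - spec) * (1 - prev)
    return true_pos / (true_pos + false_pos)


def npv(sens, spec, prev):
    """Pr(D- | T-) by Bayes' theorem."""
    true_neg = spec * (1 - prev)
    false_neg = (1 - sens) * prev
    return true_neg / (true_neg + false_neg)


# Reproduces the values above: SN = SP = 0.80, prevalence = 0.30
print(round(ppv(0.8, 0.8, 0.3), 2), round(npv(0.8, 0.8, 0.3), 2))  # 0.63 0.9
# Same test, hypothetical rarer disease (prevalence 0.01)
print(round(ppv(0.8, 0.8, 0.01), 3))  # 0.039
```

With the rarer disease, most positive results are false positives even though sensitivity and specificity are unchanged.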
PPV and NPV
Example: 50 patients given a new diagnostic test for rapid diagnosis of HIV test positive, and 25 of these patients are actually HIV+.
What is the PPV of this new diagnostic test?
Example: 200 patients given a new diagnostic test for rapid diagnosis of HIV test negative, but 2 of these patients are actually HIV+.
What is the NPV of this new diagnostic test? (Hint: How many of these 200 patients testing negative for HIV are truly HIV-?)