Nemours Biomedical Research
Statistics
April 16, 2009
Tim Bunnell, Ph.D. & Jobayer Hossain, Ph.D.
Nemours Bioinformatics Core Facility
Experimental Design Terminology
• An Experimental Unit is the entity on which a measurement or an
observation is made. For example, subjects are the experimental units
in most clinical studies.
• Homogeneous Experimental Units: Units that are as uniform as
possible on all characteristics that could affect the response.
• Randomization is the process of assigning experimental units
randomly to different experimental groups.
• Replication is the repetition of an entire experiment or portion of an experiment under two or more sets of conditions.
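Randomization as defined above can be sketched in a few lines of R. This is only an illustration: the 12 subject labels, the three group names, and the seed are hypothetical and are not taken from the class data.

```r
# Hypothetical example: 12 subjects randomly assigned to 3 treatment groups.
set.seed(2009)                                       # fixed seed so the allocation can be reproduced
subjects <- paste0("S", 1:12)                        # experimental units
labels   <- rep(c("A", "B", "placebo"), each = 4)    # 4 slots per group
allocation <- data.frame(subject = subjects,
                         group   = sample(labels))   # random permutation of the group labels
table(allocation$group)                              # every group receives exactly 4 subjects
```

Permuting a fixed set of labels, rather than drawing a group for each subject independently, keeps the group sizes equal.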
Experimental Design Terminology
• A Factor is a controllable independent variable that is being
investigated to determine its effect on a response. E.g.
treatment group is a factor.
• Factors can be fixed or random
– Fixed -- the factor can take on a discrete number of values
and these are the only values of interest.
– Random -- the factor can take on a wide range of values
and one wants to generalize from specific values to all
possible values.
• Each specific value of a factor is called a level. E.g., if the treatment
groups are A, B, and placebo, then the factor has three levels.
Experimental Design Terminology
• Effect is the change in the average response
between two factor levels.
• That is, factor effect = average response at one level
– average response at a second level.
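As a small worked example of this definition, the sketch below computes a factor effect as a difference of level means. The eight response values and the level names are made up for illustration.

```r
# Hypothetical responses at two levels of a treatment factor
response <- c(10, 12, 11, 13,    # placebo
              15, 17, 16, 18)    # treatment A
level <- factor(rep(c("placebo", "A"), each = 4), levels = c("placebo", "A"))

level_means <- tapply(response, level, mean)          # average response per level: 11.5 and 16.5
effect <- level_means["A"] - level_means["placebo"]   # factor effect
unname(effect)                                        # 5
```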
Experimental Design Terminology
• Interaction is a joint factor effect in which the effect of
one factor depends on the levels of the other factors.
[Figure: two panels, "No interaction effect of factors A and B" and "Interaction effect of factors A and B"]
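The idea can be made concrete with two small tables of cell means. The numbers below are hypothetical and chosen only so that one table shows no interaction and the other does.

```r
# Hypothetical cell means: rows are levels of A, columns are levels of B
dn <- list(A = c("a1", "a2"), B = c("b1", "b2"))
no_inter <- matrix(c(10, 14,
                     12, 16), nrow = 2, byrow = TRUE, dimnames = dn)
inter    <- matrix(c(10, 14,
                     12, 10), nrow = 2, byrow = TRUE, dimnames = dn)

# Effect of B (b2 minus b1) computed separately at each level of A
effect_of_B <- function(m) m[, "b2"] - m[, "b1"]

effect_of_B(no_inter)          # 4 at a1 and 4 at a2: the effect of B does not depend on A
effect_of_B(inter)             # 4 at a1 and -2 at a2: the effect of B depends on A

# Interaction contrast: difference between the two simple effects
diff(effect_of_B(no_inter))    # 0
diff(effect_of_B(inter))       # -6
```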
Analysis of Variance (ANOVA)
• The analysis of variance (ANOVA) is a technique for decomposing the total
variability of a response variable into:
– Variability due to the experimental factor(s), and
– Variability due to error (i.e., factors that are not accounted for in the
experimental design).
• The basic purpose of ANOVA is to test the equality of several means.
• A fixed effect model includes only fixed factors in the model.
• A random effect model includes only random factors in the model.
• A mixed effect model includes both fixed and random factors in the model.
The basic ANOVA situation
Type of variables: quantitative response and categorical (factor) predictors (independent variables).
Main question: Are the mean responses of the different groups equal?
One categorical variable with only 2 levels (groups): 2-sample t-test
One categorical variable with more than two levels (groups): One-way ANOVA
Two or more categorical variables, each with at least two levels (groups): Factorial ANOVA
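The link between the first two cases can be checked numerically: with only two groups, the one-way ANOVA F statistic equals the square of the pooled-variance t statistic. The sketch below uses made-up data and base R only.

```r
# Hypothetical two-group data
y   <- c(5.1, 6.3, 5.8, 6.0,  7.2, 7.9, 6.8, 8.1)
grp <- factor(rep(c("g1", "g2"), each = 4))

# Pooled-variance two-sample t statistic
t_stat <- unname(t.test(y ~ grp, var.equal = TRUE)$statistic)

# One-way ANOVA F statistic for the same data
f_stat <- summary(aov(y ~ grp))[[1]][["F value"]][1]

all.equal(t_stat^2, f_stat)   # TRUE: F equals t squared when there are two groups
```

The equality relies on `var.equal = TRUE`; the default Welch t-test does not pool the variances and will generally give a different value.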
Graphical Investigation
• side-by-side box plots
• multiple histograms
[Figure: side-by-side box plots]
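Both kinds of plot can be produced with base R graphics. The sketch below uses simulated data as a stand-in for the class data set; the data frame `sim` and its column names are assumptions made for the example.

```r
set.seed(1)
# Simulated stand-in data: 3 groups of 20 observations (hypothetical values)
sim <- data.frame(grp = factor(rep(1:3, each = 20)),
                  y   = rnorm(60, mean = rep(c(10, 12, 14), each = 20), sd = 2))

# Side-by-side box plots, one box per group
bp <- boxplot(y ~ grp, data = sim, xlab = "Treatment group", ylab = "Response")

# One histogram per group, stacked in a single column
op <- par(mfrow = c(3, 1))
for (g in levels(sim$grp)) {
  hist(sim$y[sim$grp == g], main = paste("Group", g), xlab = "Response")
}
par(op)   # restore the previous plot layout
```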
One-way analysis of Variance
• One factor of k levels or groups. E.g., 3 treatment groups in our
default data
• Total variation of observations (SST) can be split in two components:
variation between groups (SSG) and variation within groups (SSE).
• Variation between groups is due to the difference in different groups.
E.g. different treatment groups or different doses of the same
treatment.
• Variation within groups is the inherent variation among the
observations within each group.
• Data from a completely randomized design (CRD) with a single factor
are analyzed with one-way analysis of variance.
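The split of the total variation into between-group and within-group parts can be verified by hand. The nine observations below are hypothetical and chosen so that the arithmetic is easy to follow.

```r
# Hypothetical responses for 3 groups of 3 observations
y   <- c(4, 5, 6,   7, 8, 9,   10, 11, 12)
grp <- factor(rep(c("g1", "g2", "g3"), each = 3))

grand_mean  <- mean(y)        # 8
group_means <- ave(y, grp)    # each observation replaced by its group mean (5, 8, or 11)

SST <- sum((y - grand_mean)^2)             # total variation
SSG <- sum((group_means - grand_mean)^2)   # variation between groups
SSE <- sum((y - group_means)^2)            # variation within groups

c(SST = SST, SSG = SSG, SSE = SSE)         # 60, 54, 6
SST == SSG + SSE                           # TRUE
```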
One-way analysis of Variance
• Model:
– $y_{ij} = \mu + a_i + e_{ij}$
– where $y_{ij}$ is the jth observation of the ith group, $a_i$ is the effect of the ith group, $\mu$ is the grand mean, and $e_{ij}$ is the error.
• Assumptions:
– Observations $y_{ij}$ are independent.
– $e_{ij}$ are normally distributed with mean zero and constant standard deviation.
– The second assumption implies that the response variable for each group is normal (check using a q-q plot, histogram, or test for normality) and that the standard deviations of all groups are equal (rule of thumb: the ratio of the largest to the smallest standard deviation should be less than about 2:1).
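These checks might look as follows in base R. The data are simulated, and the choice of the Shapiro-Wilk and Bartlett tests is one option among several, not something prescribed by the slides.

```r
set.seed(42)
# Simulated stand-in data: 3 groups of 20 observations (hypothetical values)
sim <- data.frame(grp = factor(rep(1:3, each = 20)),
                  y   = rnorm(60, mean = rep(c(10, 12, 14), each = 20), sd = 2))
fit <- aov(y ~ grp, data = sim)

# Normality of the errors: q-q plot and a formal test on the residuals
qqnorm(residuals(fit)); qqline(residuals(fit))
shapiro.test(residuals(fit))

# Equal spread: rule of thumb on the ratio of largest to smallest group SD
sds <- tapply(sim$y, sim$grp, sd)
sd_ratio <- max(sds) / min(sds)
sd_ratio < 2                         # TRUE means the rule of thumb is satisfied

# A formal test of equal variances across groups
bartlett.test(y ~ grp, data = sim)
```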
One-way analysis of Variance
• Hypotheses:
– Ho: Means of all groups are equal.
– Ha: At least one mean differs from the others.
– The test doesn’t say how or which ones differ.
– Can follow up with “multiple comparisons”.
• ANOVA table for one-way classified data

Sources of Variation   Sum of Squares   df    Mean Sum of Squares   F-Ratio
Group                  SSG              k-1   MSG = SSG/(k-1)       F = MSG/MSE
Error                  SSE              n-k   MSE = SSE/(n-k)
Total                  SST              n-1

Note: A large F means that MSG is large compared to MSE.
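The table can be worked through with the sums of squares that appear in the one-way ANOVA output of this deck (SSG = 157.282, SSE = 237.880, k = 3 groups, n = 60 observations). The variable names are chosen here for illustration.

```r
# Sums of squares and sizes taken from the summary(Model1) output in this deck
SSG <- 157.282
SSE <- 237.880
k <- 3     # number of groups
n <- 60    # total number of observations

MSG <- SSG / (k - 1)     # mean square between groups
MSE <- SSE / (n - k)     # mean square error
F_ratio <- MSG / MSE

# P-value: upper tail of the F distribution with (k-1, n-k) degrees of freedom
p_value <- pf(F_ratio, df1 = k - 1, df2 = n - k, lower.tail = FALSE)

round(c(MSG = MSG, MSE = MSE, F = F_ratio), 3)   # 78.641, 4.173, 18.844
p_value                                          # very small, in line with the reported 5.225e-07
```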
Summary One-way ANOVA
Significance of the differences between the groups depends on
• the difference in the means
• the standard deviations of each group
• the sample sizes
• A useful website (thanks to Bette for sending it to the class):
www.psych.utah.edu/stat/introstats/anovaflash.html
ANOVA determines the P-value from the F statistic.
Multiple comparisons
• If the F test in the ANOVA table is significant, we then want to find which pairs of groups are significantly different. The following procedures are commonly used:
– Fisher’s Least Significant Difference (LSD)
– Tukey’s HSD method
– Bonferroni’s method
– Scheffe’s method
– Dunn’s multiple-comparison procedure
– Dunnett’s Procedure
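Two of the listed procedures are available in base R without extra packages. The sketch below runs Tukey's HSD and Bonferroni-adjusted pairwise t-tests on simulated data; the data frame `sim` is hypothetical.

```r
set.seed(7)
# Simulated stand-in data: 3 groups of 20 observations (hypothetical values)
sim <- data.frame(grp = factor(rep(1:3, each = 20)),
                  y   = rnorm(60, mean = rep(c(10, 12, 14), each = 20), sd = 2))
fit <- aov(y ~ grp, data = sim)

# Tukey's HSD: simultaneous confidence intervals for all pairwise differences
tukey <- TukeyHSD(fit, "grp", conf.level = 0.95)
tukey

# Bonferroni's method: pairwise t-tests with adjusted p-values
bonf <- pairwise.t.test(sim$y, sim$grp, p.adjust.method = "bonferroni")
bonf
```

With three groups there are three pairwise comparisons, so the Tukey table has three rows and the Bonferroni output is a 2 x 2 lower-triangular matrix of p-values.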
One-way ANOVA – Rcmdr Demo
• Make sure that group variables are in factor form.
• Data -> Manage variables in active data set -> Convert numeric
variables to factors -> pick variables, select "Supply level names" or "Use
numbers" for the factor levels, and give a new name or keep the same
name.
• To run one-way ANOVA:
• Statistics -> Means -> One-way ANOVA -> Model Name, pick
response variable (e.g., PLUC.post), pick group variable (e.g., grp),
select pairwise comparisons of means
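The reason for the factor conversion is visible at the command line: if the group variable is left numeric, `aov()` fits a single slope with 1 degree of freedom instead of comparing k group means. The data frame `dat` and the level names below are hypothetical.

```r
# Hypothetical data: group coded as the numbers 1, 2, 3
dat <- data.frame(grp = rep(1:3, each = 4),
                  y   = c(9, 10, 11, 10,  12, 13, 11, 12,  14, 15, 13, 14))

# Group left numeric: treated as a continuous predictor (1 df)
fit_numeric <- aov(y ~ grp, data = dat)

# Group converted to a factor with named levels: k - 1 = 2 df
dat$grp <- factor(dat$grp, labels = c("A", "B", "placebo"))
fit_factor <- aov(y ~ grp, data = dat)

summary(fit_numeric)[[1]]$Df   # 1 10
summary(fit_factor)[[1]]$Df    # 2  9
```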
One-way ANOVA output: PLUC.post on treatment groups
> Model1 <- aov(PLUC.post ~ grp, data=data)
> summary(Model1)
            Df  Sum Sq Mean Sq F value    Pr(>F)
grp          2 157.282  78.641  18.844 5.225e-07 ***
Residuals   57 237.880   4.173
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
> numSummary(data$PLUC.post , groups=data$grp, statistics=c("mean", "sd"))
      mean       sd  n
1 10.15657 2.227334 20
2 12.15339 1.874268 20
3 14.12242 2.011497 20
One-way ANOVA output: PLUC.post on treatment groups
> .Pairs <- glht(Model1, linfct = mcp(grp = "Tukey"))
> confint(.Pairs)
Simultaneous Confidence Intervals
Multiple Comparisons of Means: Tukey Contrasts
Fit: aov(formula = PLUC.post ~ grp, data = data)
Estimated Quantile = 2.4064
95% family-wise confidence level
Linear Hypotheses:
           Estimate lwr    upr
2 - 1 == 0 1.9968   0.4422 3.5514
3 - 1 == 0 3.9658   2.4113 5.5204
3 - 2 == 0 1.9690   0.4144 3.5236
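The interval limits in this output can be reproduced by hand as estimate ± quantile × standard error, where the standard error of a difference between two group means is sqrt(MSE × (1/n1 + 1/n2)). The sketch below uses only numbers printed in this deck (residual sum of squares 237.880 on 57 df, 20 subjects per group, quantile 2.4064); the variable names are illustrative.

```r
# Values taken from the ANOVA and glht output in this deck
MSE         <- 237.880 / 57    # residual mean square
n_per_group <- 20              # subjects per treatment group
q           <- 2.4064          # "Estimated Quantile" reported by confint()

# Standard error of a difference between two group means
se_diff <- sqrt(MSE * (1 / n_per_group + 1 / n_per_group))

estimate <- c("2 - 1" = 1.9968, "3 - 1" = 3.9658, "3 - 2" = 1.9690)
ci <- cbind(estimate,
            lwr = estimate - q * se_diff,
            upr = estimate + q * se_diff)
round(ci, 4)   # agrees with the reported limits up to rounding in the last digit
```

None of the three intervals contains zero, which is why all three pairs of groups are declared different at the 95% family-wise level.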
One-way ANOVA output: PLUC.post on treatment groups
[Figure: 95% family-wise confidence level. Tukey confidence intervals for the contrasts 2 - 1, 3 - 1, and 3 - 2; x-axis: Linear Function, ticks 1 to 5]
Analysis of variance of factorial experiment (Two or more factors)
• Factorial experiment:
– The effects of the two or more factors including their
interactions are investigated simultaneously.
– For example, consider two factors A and B. Then total
variation of the response will be split into variation for A,
variation for B, variation for their interaction AB, and variation
due to error.
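Analysis of variance of factorial experiment (Two or more factors)
• Model with two factors (A, B) and their interaction:
– $y_{ijk} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + e_{ijk}$
– where $\alpha_i$ is the effect of the ith level of A, $\beta_j$ is the effect of the jth level of B, $(\alpha\beta)_{ij}$ is their interaction effect, and $e_{ijk}$ is the error.
• Assumptions: The same as in one-way ANOVA.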
Analysis of variance of factorial experiment (Two or more factors)
• Null Hypotheses:
– Hoa: Means of all groups of factor A are equal.
– Hob: Means of all groups of factor B are equal.
– Hoab: $(\alpha\beta)_{ij} = 0$ for all i and j, i.e., factors A and B do not interact.
Two Factor ANOVA
• ANOVA for two factors A and B with their interaction AB.

Sources of Variation    Sum of Squares   df           Mean Sum of Squares        F-Ratio
Main Effect A           SSA              k-1          MSA = SSA/(k-1)            MSA/MSE
Main Effect B           SSB              p-1          MSB = SSB/(p-1)            MSB/MSE
Interaction Effect AB   SSAB             (k-1)(p-1)   MSAB = SSAB/((k-1)(p-1))   MSAB/MSE
Error                   SSE              kp(r-1)      MSE = SSE/(kp(r-1))
Total                   SST              kpr-1

Here k and p are the numbers of levels of A and B, and r is the number of observations per combination of levels.
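The degrees of freedom in this table can be checked on a balanced design with k = 3, p = 2, and r = 10, the same layout as the class example. The response values below are simulated and carry no meaning; only the degrees of freedom are of interest.

```r
set.seed(3)
k <- 3; p <- 2; r <- 10    # levels of A, levels of B, observations per cell

# Balanced layout: every combination of A and B appears r times
sim <- expand.grid(rep = 1:r, B = factor(1:p), A = factor(1:k))
sim$y <- rnorm(nrow(sim), mean = 10, sd = 2)    # hypothetical responses

fit2 <- aov(y ~ A * B, data = sim)              # main effects plus interaction
tab  <- summary(fit2)[[1]]

tab$Df                        # k-1, p-1, (k-1)(p-1), kp(r-1)  ->  2 1 2 54
sum(tab$Df) == k * p * r - 1  # TRUE: the pieces add up to the total df
```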
Two-factor ANOVA - Rcmdr Demo
Statistics -> Means -> Multi-way ANOVA -> Model Name, pick response variable (e.g., PLUC.post), pick the factors (e.g., grp and Ped)
Two-way ANOVA output: PLUC.post on treatment groups and Ped
> Model2 <- (lm(PLUC.post ~ grp*Ped, data=data))
> Anova(Model2)
Anova Table (Type II tests)
Response: PLUC.post
           Sum Sq Df F value    Pr(>F)
grp       157.282  2 19.2016 5.024e-07 ***
Ped         0.842  1  0.2055    0.6521
grp:Ped    15.880  2  1.9387    0.1538
Residuals 221.159 54
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
> tapply(data$PLUC.post, list(grp=data$grp, Ped=data$Ped), mean, na.rm=TRUE)
+ # means
   Ped
grp         1        2
  1  9.591563 10.72158
  2 12.397521 11.90926
  3 14.798619 13.44621
> tapply(data$PLUC.post, list(grp=data$grp, Ped=data$Ped), sd, na.rm=TRUE)
+ # std. deviations
   Ped
grp        1        2
  1 2.656100 1.645896
  2 1.850990 1.964046
  3 2.034403 1.840354
> tapply(data$PLUC.post, list(grp=data$grp, Ped=data$Ped), function(x)
+ sum(!is.na(x))) # counts
   Ped
grp  1  2
  1 10 10
  2 10 10
  3 10 10
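Because the design is balanced (10 observations in every cell), the sums of squares in the Anova table can be recovered from the cell means printed above. The sketch below does this with the reported cell means and the reported residual sum of squares; the variable names are illustrative, and small differences in the last digit come from rounding in the printed values.

```r
# Cell means reported by tapply() in this deck (rows: grp 1-3, columns: Ped 1-2)
cell_means <- matrix(c( 9.591563, 10.72158,
                       12.397521, 11.90926,
                       14.798619, 13.44621),
                     nrow = 3, byrow = TRUE,
                     dimnames = list(grp = c("1", "2", "3"), Ped = c("1", "2")))
r <- 10                                        # observations per cell

grand   <- mean(cell_means)
row_eff <- rowMeans(cell_means) - grand        # main effects of grp
col_eff <- colMeans(cell_means) - grand        # main effects of Ped
inter   <- cell_means - grand - outer(row_eff, col_eff, "+")   # interaction effects

SS <- c(grp       = r * ncol(cell_means) * sum(row_eff^2),
        Ped       = r * nrow(cell_means) * sum(col_eff^2),
        "grp:Ped" = r * sum(inter^2))
round(SS, 3)                                   # compare with 157.282, 0.842, 15.880

# F ratios, using the reported residual sum of squares on 54 df
MSE     <- 221.159 / 54
F_ratio <- (SS / c(2, 1, 2)) / MSE
round(F_ratio, 3)                              # compare with 19.2016, 0.2055, 1.9387
```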
Repeated Measures
• The term repeated measures refers to data sets with
multiple measurements of a response variable on the same
experimental unit or subject.
• In repeated measurements designs, we are often concerned
with two types of variability:
– Between-subjects - Variability associated with different groups of subjects
who are treated differently (equivalent to between-groups effects in
one-way ANOVA)
– Within-subjects - Variability associated with measurements made on an
individual subject.
Repeated Measures
• Examples of Repeated Measures designs:
A. Two groups of subjects treated with different drugs for
whom responses are measured at six-hour increments for
24 hours. Here, DRUG treatment is the between-subjects
factor and TIME is the within-subjects factor.
B. Students in three different statistics classes (taught by
different instructors) are given a test with five problems and
scores on each problem are recorded separately. Here
CLASS is a between-subjects factor and PROBLEM is a
within-subjects factor.
Repeated Measures
• When measures are made over time, as in example A, we want to assess:
– how the dependent measure changes over time independent of treatment (i.e., the main effect of time)
– how treatments differ independent of time (i.e., the main effect of treatment)
– how treatment effects differ at different times (i.e., the treatment by time interaction).
• Repeated measures require special treatment because:
– Observations made on the same subject are not independent of each other.
– Adjacent observations in time are likely to be more correlated than non-adjacent observations.
Repeated Measures
• Methods of repeated measures ANOVA
– Univariate - Uses a single outcome measure.
– Multivariate - Uses multiple outcome measures.
– Mixed Model Analysis - One or more factors (other than subject)
are random effects.
• We will discuss only the univariate approach.
Repeated Measures
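• Assumptions:
– Subjects are independent.
– The repeated observations for each subject follow a
multivariate normal distribution.
– The variances of the differences between all pairs of
within-subject levels are equal. This assumption is known as
sphericity. Equal variances and equal correlations across all
levels (compound symmetry) are sufficient for it.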
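Sphericity is commonly examined through the variances of the pairwise differences between within-subject levels. The sketch below computes them for simulated wide-format data; the matrix `wide` and its column names are hypothetical.

```r
set.seed(11)
# Hypothetical wide-format data: 15 subjects (rows) x 3 repeated measurements (columns)
wide <- matrix(rnorm(45, mean = 10, sd = 2), nrow = 15,
               dimnames = list(NULL, c("t1", "t2", "t3")))

# All pairs of within-subject levels
pairs <- combn(colnames(wide), 2)

# Variance of the within-subject difference for each pair
diff_var <- apply(pairs, 2, function(p) var(wide[, p[1]] - wide[, p[2]]))
names(diff_var) <- apply(pairs, 2, paste, collapse = " - ")

diff_var   # roughly equal values are consistent with sphericity
```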
Repeated Measures
• Test for Sphericity:
– Mauchly’s test
• Violation of the sphericity assumption leads to inflated F statistics and hence an inflated
type I error rate.
• Three common corrections for violation of sphericity:
– Greenhouse-Geisser correction
– Huynh-Feldt correction
– Lower Bound correction
• All three methods adjust the degrees of freedom using a correction factor
called Epsilon.
• Epsilon lies between 1/(k-1) and 1, where k is the number of levels of the
within-subject factor.
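A minimal sketch of these ideas in base R follows. The helper `gg_epsilon()` is written here for illustration (it is not a built-in function) and implements the usual Greenhouse-Geisser formula on a double-centered covariance matrix; the covariance matrices, the sample size, and the single-group design used for the degrees of freedom are all hypothetical.

```r
# Greenhouse-Geisser epsilon from the k x k covariance matrix S of the repeated measures
gg_epsilon <- function(S) {
  k  <- nrow(S)
  C  <- diag(k) - matrix(1 / k, k, k)    # centering matrix
  Sc <- C %*% S %*% C                    # double-centered covariance matrix
  sum(diag(Sc))^2 / ((k - 1) * sum(Sc^2))
}

# Compound symmetry (equal variances, equal covariances): sphericity holds
S_cs <- matrix(2, 3, 3); diag(S_cs) <- 5
gg_epsilon(S_cs)                         # 1 (up to floating point)

# Unequal variances and covariances: epsilon drops below 1
S_uneq <- matrix(c(4, 3, 1,
                   3, 5, 2,
                   1, 2, 9), 3, 3)
eps <- gg_epsilon(S_uneq)

k <- 3; n <- 20                          # hypothetical: 3 levels, 20 subjects, one group
c(lower_bound = 1 / (k - 1), epsilon = eps)
c(df1 = eps * (k - 1), df2 = eps * (k - 1) * (n - 1))   # adjusted degrees of freedom

# Mauchly's test on simulated wide-format data (subjects in rows, levels in columns)
set.seed(5)
wide <- matrix(rnorm(n * k), nrow = n)
mt <- mauchly.test(lm(wide ~ 1), X = ~ 1)
mt$p.value
```

Multiplying both degrees of freedom by epsilon makes the F test more conservative; at the lower bound 1/(k-1) the numerator degrees of freedom shrink to 1.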
Repeated Measures - R Demo
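• The following R commands arrange the data in default.csv for the repeated measures ANOVA (you can also use the R function gl() for this manipulation):
# x is assumed to hold the data read from default.csv
attach(x)
sid1 <- c(sid, sid)                   # subject id, once per time point
grp1 <- c(grp, grp)                   # treatment group, once per time point
plc  <- c(PLUC.pre, PLUC.post)        # pre and post responses stacked
time <- c(rep(1, 60), rep(2, 60))     # 1 = pre, 2 = post
• The following code runs the repeated measures ANOVA:
summary(aov(plc ~ factor(grp1)*factor(time) +
Error(factor(sid1))))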
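The same arrangement can be tried without the class file. The sketch below simulates a data set with the same layout (60 subjects, 3 groups of 20, a pre and a post measurement) and avoids `attach()` by building a long-format data frame. The simulated values are hypothetical, so only the structure of the output, not the numbers, should match the class results.

```r
set.seed(16)
n <- 60
# Simulated stand-in for default.csv (hypothetical values, same column names)
wide <- data.frame(sid       = 1:n,
                   grp       = rep(1:3, each = 20),
                   PLUC.pre  = rnorm(n, mean = 10, sd = 2),
                   PLUC.post = rnorm(n, mean = 12, sd = 2))

# Stack the pre and post measurements: one row per subject per time point
long <- data.frame(sid  = factor(rep(wide$sid, 2)),
                   grp  = factor(rep(wide$grp, 2)),
                   time = factor(rep(1:2, each = n)),     # 1 = pre, 2 = post
                   plc  = c(wide$PLUC.pre, wide$PLUC.post))

# grp is the between-subjects factor, time the within-subjects factor
fit <- aov(plc ~ grp * time + Error(sid), data = long)
s <- summary(fit)
s
```

The between-subjects stratum has 2 and 57 degrees of freedom, and the within-subjects stratum has 1, 2, and 57, the same layout as the class output.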
Repeated Measures R output:
Error: factor(sid1)
             Df  Sum Sq Mean Sq F value    Pr(>F)
factor(grp1)  2 121.640  60.820  16.328 2.476e-06 ***
Residuals    57 212.313   3.725
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Error: Within
                          Df  Sum Sq Mean Sq F value    Pr(>F)
factor(time)               1 172.952 172.952 33.8676 2.835e-07 ***
factor(grp1):factor(time)  2  46.818  23.409  4.5839   0.01426 *
Residuals                 57 291.083   5.107
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1