CHAPTER 01:
INTRODUCTION TO SCIENTIFIC PSYCHOLOGY
AND THE ROLE OF ANALYSIS OF VARIANCE (ANOVA)
A General Conceptualization of Scientific Research
    Concretizing (Operationalizing) Theoretical Constructs
    Predictor and Criterion Variables
    Experimental and Non-experimental Predictor Variables
    Numerical and Categorical Variables
    Types of Relationship
An Overview of Data Analysis (Statistics)
    Analysis of Variance (ANOVA)
    Correlation and Regression Methods
Introduction to ANOVA
    Properties of ANOVA Predictors
    Overview of ANOVA Designs
    Introduction to the F-Statistic
Conclusion
Psychologists want to understand human behavior and experience scientifically. Why do
some people do better in school than others? What is the effect of maternal alcohol consumption
during pregnancy on the later cognitive abilities of children? What are the perceptual
mechanisms that allow people at a crosswalk to determine whether oncoming cars are going to
hit them or stop in time? What factors determine whether people like or fall in love with one
another? What are the effects of severe head injury on different language processes, and how can
understanding the fundamental mechanisms of language be used to develop training methods that
will improve the rehabilitation of linguistic deficits? What biological and psychological
determinants lead some people to become so sad that it interferes seriously with the quality of
their lives, and what treatments are effective at easing their suffering? Curiosity about these and
innumerable similar questions is the starting point for psychological research.
The focus of this book is on the statistical analysis of data relevant to such questions, in
particular the technique of analysis of variance (ANOVA). However, it will help to place
ANOVA in the more general context of scientific research and various methods of statistical
analysis. A general characterization of scientific research will provide an important terminology
for describing and understanding various approaches to statistical analysis. Hence, we begin with
a conceptualization of how scientific psychology (and indeed any scientific discipline) works,
and then proceed to some ways in which scientific studies differ from one another. These
differences have implications for data analysis and for other important aspects of research.
A GENERAL CONCEPTUALIZATION OF SCIENTIFIC RESEARCH
Empirical research into questions about our psychological world is a complicated process
that requires many skills involved in the design, conduct, data analysis, and explanation of
scientific studies. Scientific studies can therefore differ from one another in many ways. At an
abstract level, however, any scientific study can be conceptualized in relatively simple terms.
The sample research questions listed above share a common general form. Specifically,
they involve questions about causes and effects. Consider the question of why people are
attracted to one another. The effect that we are interested in is being attracted versus not being
attracted to another person (or perhaps more finely graduated degrees of attraction from low to
high). The many factors that contribute to whether or not (or how much) people like one another
are the potential causes. For example, the old saying “Birds of a feather flock together”
hypothesizes that in general people who are similar to one another will like one another better
than people who are dissimilar. Similarity of attitudes, appearance, or whatever is therefore a
potential cause of attraction. “Opposites attract” suggests that dissimilarity breeds attraction.
Consider a second example, the effect of maternal alcohol consumption on children’s
later intelligence. Here the effect of interest concerns variation in children’s intelligence, and the
cause is whether or not their mothers consumed alcohol during pregnancy (or the amount of
alcohol that they consumed). The many other causes of intelligence (e.g., heredity, early
childhood environment, nutrition, brain damage) would not be the primary focus of such a study,
although researchers would need to consider other potential causes in their design. A research
design specifies how observations related to some specific hypothesis are to be collected.
At its most basic level, scientific research is concerned with testing the validity of these
kinds of claims about cause-effect relationships. Such claims often arise as deductions from
more general theories, or they may simply be specific relationships that researchers are interested
in for diverse reasons, including practical ones (e.g., school psychologists interested in the
determinants of academic success). Whatever the origins of the hypotheses, scientific
researchers wish to evaluate the validity of the hypothesized cause-effect relationship(s)
empirically (i.e., through observation that produces relevant data). Validity here means whether
the hypothesized cause-effect relationship represents accurately the true state of affairs in the real
world. Evaluation of validity involves many difficult issues, including the measurement of the
hypothetical constructs of interest.
Concretizing (Operationalizing) Theoretical Constructs
Any attempt to validate cause-effect relationships leads quickly to the realization that
psychological concepts tend to be highly abstract and difficult to observe. Psychological theories
posit relationships between abstract concepts (or constructs, to use a more technical term), such
as intelligence, attention, depression, problem solving, similarity, and love. These constructs
cannot be observed directly and must be inferred from behaviour. For example, we might
conclude that two people are in love if they often gaze into each other's eyes, spend time together,
write long letters to one another when apart, make sacrifices for one another, and avow their love
in some verbal manner. But our conclusion about the abstract construct of love always remains
at least somewhat inferential; for example, there may be alternative internal states responsible for
these behaviours, such as a plan to exploit the other person’s affection.
The abstractness of many psychological terms is a serious challenge for scientists because
abstract terms cannot be observed directly and observation is the basic method by which science
tests theories. It is difficult to identify factors that cause people to fall in love if we have no
reliable and valid way to determine whether people were actually in love. Similar difficulties
arise without acceptable measures for managerial competence, intelligence, motivation, imagery,
and numerous other abstract psychological constructs that scientific psychologists wish to
understand. Concretizing an abstract concept generally involves specifying some procedures or
operations by which the concept can be measured, and in some cases actually manipulated
directly by researchers. Hence, the process of concretizing has been referred to as operational
definition. The concrete results of this process are called variables (rather than constructs), and it
is these variables that are subjected to statistical analysis.
The problem of concretizing or operationalizing abstract concepts is not unique to
Psychology. At one time, weight, heat, pressure, and other physical constructs also lacked
operational definitions (i.e., concrete procedures for accurate, reliable, and valid measurement of
the constructs). Early generations of natural scientists slowly and methodically developed
operational and increasingly precise measures for these constructs, which then allowed the
research underlying our current sophisticated theories of the physical world. Indeed, major
scientific advances often depended on the identification of the critical constructs and the
development of precise measurement operations. At the present time, other disciplines in the
social sciences and humanities generally lag even further behind than psychology does in the
development of specific measures for their
theoretical constructs, and some appear to
have even rejected empiricism as a necessary
foundation for their disciplines.
Figure 1.1. Model of scientific thinking.

Figure 1.1 provides a very simplified illustration of how operationalizing constructs
allows researchers to test a theoretical hypothesis such as "Birds of a feather flock together"
(versus "Opposites attract"). In the example, the hypothesized causal relationship (symbolized by
the top arrow) is between two abstract constructs -- Similarity ("Birds of a feather") and
Attraction ("flock together"). Verbally, the hypothesis could be stated as “Similarity causes more
Attraction, whereas Dissimilarity leads to Less Attraction”.
To test this hypothesis, the two abstract constructs of Similarity and Attraction must be
translated into concrete, observable variables. That is, the constructs are operationally defined in
terms of the procedures or operations used to measure or experimentally manipulate them. The
criterion variable of Attraction might be operationally defined by ratings of how much each
person likes the other person, by observing interactions between people, or by other observable
measures relevant to liking. The predictor Similarity might be operationally defined by a
measure of the similarity of people’s independent responses to an attitude questionnaire, or by
experimenter instructions that focussed on similarities or differences between pairs of
participants in order to manipulate the perceived similarity between participants in the study.
Translating similarity and attraction into observable variables allows relevant observations to be
collected, perhaps from roommates in a university residence, couples, or even strangers meeting
for the first time.
Only occasionally is the translation from theoretical constructs to observational variables
simple (e.g., gender, age). Many psychological constructs are extremely difficult to translate into
observational terms, and an entire field of specialization, psychometrics, is devoted to the process
of operationalization. The considerable controversy around purported measures of intelligence
demonstrates the complexities. Some people deny that such abstract traits can even be measured,
others claim that any such measure is biassed against disadvantaged people, whereas
psychometrists themselves generally view intelligence tests in a positive way. One use of
statistical techniques, such as those described in this book, is to evaluate relationships relevant to
the reliability and validity of psychological measures (e.g., whether two measures of attraction
are correlated to one another, whether pre-existing groups differ as expected on some measure).
Such research would generally precede tests of a causal predictor, and would involve methods
discussed more fully in methodology texts and books for various content areas of psychology.
The results from our hypothetical study of similarity and liking would be pairs of
numbers for participants in the study. One number of each pair would represent the degree of
similarity between two people (e.g., a numerical measure of the similarity of their attitudes as
determined by an attitude questionnaire, or a categorical number representing the experimental
condition to which each person was assigned). The second number of each pair would
represent the degree of attraction between the two people (e.g., rated liking for a partner, perhaps
on a 7-point scale from 1=Not at All to 7=A Lot). The number of pairs of scores would depend
on the actual number of participants in the study.
Such data would permit researchers to determine statistically whether measures of liking
tend to be demonstrably higher for similar people than for dissimilar people (the ? mark in Figure
1.1). A positive relationship would occur if people having greater similarity to one another
tended to report higher liking. This outcome would support the prediction generated by the
theory that "birds of a feather flock together" and be contrary to predictions of the "opposites
attract" theory. The lack of a relationship or a negative relationship (i.e., people with greater
similarity having less liking) would be inconsistent with the hypothesis.
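To make this concrete, the following minimal sketch (in Python with SciPy, using entirely
hypothetical similarity scores and liking ratings, neither of which comes from the text) computes
the correlation between the two measures; a positive, significant value would favour the "birds of
a feather" prediction.

    import numpy as np
    from scipy import stats

    # Hypothetical data: one similarity score and one liking rating (1-7) per pair.
    similarity = np.array([0.9, 0.7, 0.8, 0.2, 0.4, 0.1, 0.6, 0.3])
    liking     = np.array([6,   5,   7,   2,   3,   2,   5,   3])

    # A positive, significant correlation would support "birds of a feather";
    # a negative correlation would favour "opposites attract".
    r, p = stats.pearsonr(similarity, liking)
    print(f"r = {r:.2f}, p = {p:.4f}")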
Predictor and Criterion Variables
In any causal model, at least one variable (and often more than one) is identified as a
predictor of some other variable, which is referred to as the dependent or criterion variable.
Criterion variables correspond to the outcomes that researchers ultimately want to explain (e.g.,
liking in the similarity study described above, anxiety in a study of the effectiveness of
psychotherapy, grades in a study of school success). Criterion variables are also called dependent
variables, a more specific term that is especially appropriate for experimental studies. Although
a researcher may try to influence a criterion or dependent variable by experimentally
manipulating some other variable, the criterion variable itself is simply measured by the
researcher and is not itself directly manipulated. Researchers do not, for example, directly
control the degree of liking between participants in research studies, although they might
manipulate variables that they think will influence liking (such as perceived similarity).
The other variable in any causal relationship is the predictor variable. In Figure 1.1,
similarity is a predictor of liking. An alternative term for predictor, especially appropriate for
experimental variables manipulated by the researcher, is independent variable. The fundamental
question in evaluating a theoretical hypothesis is whether the criterion variable is influenced
causally by the predictor variable(s) being studied (e.g., similarity in the liking study, type of
therapy in a study of the effectiveness of therapy, achievement motivation and intelligence as
predictors of school success). Properly, the term independent variable refers to causal variables
that the experimenter controls directly (i.e., manipulates), but independent variable is often used
for non-experimental causal variables, such as measures of achievement motivation.
The identification of a variable as a predictor or criterion variable or as a manipulated or
measured predictor can seldom be made without considering the specific study. A variable such
as anxiety, for example, might be used as a criterion variable in a study of therapeutic treatment
or as a predictor variable in a study of the negative impact of anxiety on interpersonal relations,
learning, or some other behavior. And anxiety as a predictor could be measured by some
appropriate scale or manipulated by the experimenter (e.g., by including or excluding in the
instructions statements that would increase or decrease participants’ anxiety levels). Only by
examining the specific causal hypothesis of interest is it possible to know the cause-effect role of
a variable (i.e., its status as predictor or criterion, as independent or dependent variable), and
whether the predictor is experimental or non-experimental, the next distinction that we consider.
Experimental and Non-experimental Predictor Variables
One distinction that can be important for statistical and other purposes is whether the
predictor is an experimental or non-experimental predictor. Experimental predictors are
manipulated by the researcher, whereas non-experimental variables are measured in their natural
state rather than manipulated. In the case of experimental predictors, the researcher is free to
determine the level of the predictor variable assigned to particular subjects (or to the same
subjects at different points in time). Examples of experimental predictors include: different
instructions given randomly to participants in a memory task, various amounts of reinforcement
in a learning study as determined by the experimenter, and experimenter-prepared instructions
that prime different degrees of perceived similarity in a social psychology study of liking. In the
last study, for example, researchers could randomly assign half of the participants to receive
instructions that emphasize similarity with their randomly chosen partner, and the other half to
receive instructions that emphasize differences. Subsequent liking for the partner would be
measured as the criterion variable.
In the case of non-experimental predictors, the researcher measures pre-existing
differences among participants rather than directly creating the differences. Examples of non-
experimental variables are gender of subjects, age of participants in a learning study, self-selected
participation in some therapeutic treatment, naturally adopted strategies in a study of learning
from textbooks, and attitude similarity for pre-existing couples. In the last study, people’s
similarity to a pre-existing partner would be measured, along with their liking. Similarity could
vary along some numerical scale, or else be roughly determined (e.g., similar, neutral,
dissimilar), but in either case it would measure similarities that existed prior to the study, rather
than the experimentally-determined “similarities” manipulated in an experimental study.
As just illustrated, one cannot always tell from the construct alone whether the variable is
experimental or non-experimental. The specific procedures used in a particular study must be
examined to determine whether predictor variables are simply measured by the researchers in a
non-experimental study (e.g., administer attitude scales to different people and determine the
degree of similarity in attitudes), or manipulated by the researchers in some experimental manner
(e.g., randomly assign participants to different treatment conditions in a psychotherapy study).
Although the predictor of similarity is most easily studied through measurement of similarity,
even attitude similarity could be manipulated by a researcher (e.g., by experimentally inducing or
emphasizing similarity of attitudes in participants and then assessing their liking of one another).
The distinction between measured and manipulated predictors has several important
implications for researchers. First, the distinction is an important factor to consider in evaluating
how much confidence one should have in any causal inferences made from a study. Because the
experimenter directly controls an experimental predictor, assignment of subjects to experimental
conditions can ideally be arranged so that differences on the predictor are independent of (i.e.,
not confounded with) other factors that might influence the
dependent variable. Random assignment to conditions is one way to achieve independence from
other potential determinants of behaviour. If participants in our attraction study were randomly
assigned to Low and High levels of similarity, for example, then we would be more confident
that any differences in liking between the two groups were actually due to similarity rather than
to other characteristics of participants that could influence liking (e.g., how sociable a person is,
whether response biases influenced ratings of liking). Randomly assigning participants to
different memory instructions similarly means that groups with different instructions will on
average not differ on a host of other variables that affect memory performance (e.g., intelligence,
fatigue, mood, motivation to do well).
The story is quite different for non-experimental predictors that are simply measured by
the researcher in their naturally occurring state. Because researchers do not randomly assign
subjects to the different levels or values of non-experimental variables, the participants are likely
to differ in a host of ways correlated with the predictor of interest. Participants who are classified
as low or high in similarity based on measured attitudes, for example, are likely to vary in
numerous other ways as well (e.g., similarity of social class, religious background, ethnicity,
susceptibility to response biases, and so on). Any seeming effect of similarity could instead be
due to these alternative factors, referred to as confounding variables.
As a second illustration, participants who voluntarily undergo therapy differ from non-
participants in many ways besides exposure to therapy (e.g., motivation, degree of distress,
income, education, duration of disorder, accessibility to treatment, attitudes toward psychological
disorders). Any of these confounded variables could account for the apparent effect of
therapeutic treatment. More subtly, these same confounding variables could mask or hide the
effect of treatment by counteracting its positive effects. That is, even a null relationship is
seriously compromised in a non-experimental study.
Thus, non-experimental designs lead to more ambiguous causal conclusions than properly
executed experiments, which allow researchers to be more confident that the intended predictor
variable was the only difference between the experimental conditions. Of course, few studies are
perfectly designed. Poorly executed experiments may provide little more assurance about
causality than a non-experimental study, so one should not accept blindly causal conclusions
simply because a study had researchers control assignment to conditions. The critical question is
whether the assignment to conditions was done in a way that eliminated the differential effects of
other variables on the dependent variable. This can be extremely difficult to achieve, and even
more difficult to verify that it has been achieved.
Although the distinction between experimental and non-experimental predictors is
absolutely critical for what causal inferences one draws from the statistical results, it is also one
factor that is considered when researchers decide on appropriate statistical analyses for different
studies. Another important distinction for statistical purposes is the difference between
numerical and categorical variables.
Numerical and Categorical Variables
Predictor and criterion variables can be either categorical or numerical in nature.
Numerical variables involve the assignment of numbers (i.e., scores) to reflect the amount of
some variable. Examples of numerical variables include number of words recalled on a memory
task, IQ (i.e., a numerical score derived from an intelligence test), scores on some measure of
shyness, number of bar-presses by rats in a Skinner Box, and amount of practice at relaxation in a
therapy study. In each of these examples, the number represents the amount (less to more, or
vice versa) of some psychological quality. That is, the levels of the variable represented by the
numbers are graded in amount and the actual numbers reflect those gradations. Numerical
variables are also called quantitative variables and are sometimes further sub-classified as
ordinal, interval, or ratio. The latter distinctions are not important for statistical purposes and
will be ignored here.
Psychologists also study non-numerical or categorical variables, such as religious
affiliation, gender, psychiatric diagnoses, reasons for breaking off romantic relationships, style of
parent attachment in infants, and different kinds of therapeutic treatments. Such categorical
variables involve qualitative distinctions between the levels of the variable, rather than amount.
A person diagnosed as Schizophrenic, for example, differs in a host of qualitative ways from
someone diagnosed as having an Antisocial Personality Disorder. They do not differ simply on a
single, quantifiable dimension. Similarly, qualitative differences distinguish Roman Catholics,
Anglicans, Baptists, atheists, and people from other religious groups. Such categorical variables
are also called qualitative or nominal variables. Numbers may be used to represent the different
levels of categorical variables, but in this case the numbers are used simply as distinct labels for
the groups, rather than as representing the amount of some quality. That is, the group numbers
are similar to social insurance numbers, telephone numbers, and other cases in which numbers
identify individuals, rather than representing the amount of some quality.
Types of Relationship
A causal relation involves at least one predictor and one criterion variable. Since the
predictor and the criterion variables can each be either categorical or numerical, it is possible to
classify individual relationships as one of four types. Box 1.1 lists the four types of relationship
depending on whether the predictor is numerical or categorical and whether the criterion is
numerical or categorical. Also shown for each of the four types are examples of such
relationships. To illustrate, a study of the relationship between size (or amount) of reward and
rate of bar-pressing would involve a numerical predictor (amount of reward) and a numerical
criterion (rate of bar-pressing).
The distinction between
numerical and categorical variables
is important when deciding how to
analyze data or how to test specific
hypotheses about relations between
independent and dependent
variables. This book focuses on
statistical relations that involve
numerical dependent variables and
categorical predictor variables (i.e.,
the first type in Box 1.1), although
the methods can sometimes be
applied to numerical predictors
(i.e., the second type in Box 1.1).
Examples of relationships between categorical predictors and numerical criterion variables
include: the relationship of gender to performance on spatial ability tests, and the relationship of
treatment conditions to improvement on some measure of psychopathology. Examples of
relationships between numerical predictors and numerical criterion variables include: the
relationship of quantified similarity of attitudes to degree of liking between people, and the
relationship of age to amount remembered from a list of words. The other two types of
relationships (i.e., those involving categorical criterion variables) generally involve other
statistical procedures that are not considered in this book on ANOVA, or its companion on
Multiple Regression and Correlation techniques.

Box 1.1. A Taxonomy of Empirical Relationships.

    Predictor      Criterion      Examples
    Categorical    Numerical      therapy condition -> anxiety score; gender -> imagery performance
    Numerical      Numerical      IQ -> GPA; size of reward -> bar-pressing
    Numerical      Categorical    age -> religious affiliation; years experience -> position in company
    Categorical    Categorical    gender -> psychiatric diagnosis; ethnicity -> type of crime
AN OVERVIEW OF DATA ANALYSIS (STATISTICS)
The fundamental purpose of statistics is to determine the significance and strength of an
observed relationship between the predictor and criterion variables. Significance is concerned
with whether an observed relationship could have occurred by chance or not; for example, would
randomly paired numbers be unlikely to demonstrate the observed degree of relationship?
Strength is concerned with how much variation in the criterion can be accounted for by the
predictor; for example, does similarity account for 10%, 20%, ... of the variation in liking among
people?
The specific analysis would depend on several features of the relationship discussed
previously. Are the predictors categorical or numerical? Are they experimental or non-
experimental? Do the predictors involve a large number of levels or relatively few? If
categorical, do the predictors contain the same number of participants in each level? Answers to
such questions would result in the researchers choosing either correlation and regression methods
of statistical analysis, or analysis of variance methods. These are two closely related “families”
of data analysis, and account for the majority of statistical techniques used in psychology and
probably in the social sciences generally. Since ANOVA involves more restrictive conditions,
let us consider it first.
Analysis of Variance (ANOVA)
ANOVA methods of statistical analysis were designed originally to determine whether
distinct experimental conditions result in differences on some numerical dependent variable.
That is, ANOVA methods were designed specifically for the case of categorical predictors and
numerical criterion variables. In essence, the techniques analyze differences among means on the
numerical criterion variables as a function of varying conditions on the categorical predictor(s).
ANOVA determines whether the differences between condition means are significant and also
how much of the variation on the criterion variable can be attributed to the predictor.
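As a sketch of these two outputs (significance of the mean differences and proportion of criterion
variation accounted for), the following Python example uses SciPy and hypothetical recall scores
for three instruction conditions; the data and the choice of library are illustrative assumptions,
not part of the text.

    import numpy as np
    from scipy import stats

    # Hypothetical numbers of words recalled under three instruction conditions.
    imagery  = np.array([14, 16, 15, 17, 13, 16])
    sentence = np.array([12, 13, 11, 14, 12, 13])
    control  = np.array([ 9, 10,  8, 11, 10,  9])

    # Significance: could differences this large among the means arise by chance?
    f_value, p_value = stats.f_oneway(imagery, sentence, control)

    # Strength: proportion of criterion variation attributable to the predictor (eta squared).
    groups = [imagery, sentence, control]
    all_scores = np.concatenate(groups)
    ss_between = sum(len(g) * (g.mean() - all_scores.mean()) ** 2 for g in groups)
    ss_total = ((all_scores - all_scores.mean()) ** 2).sum()
    eta_squared = ss_between / ss_total

    print(f"F = {f_value:.2f}, p = {p_value:.4f}, eta^2 = {eta_squared:.2f}")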
True experiments of this sort demonstrate a number of characteristics that permit the use
of ANOVA. One condition that is generally true is that the categorical predictor has relatively
few levels to it. For example, amount of reinforcement might have three levels (Low, Medium,
High), treatment condition could have four levels (Control, Placebo Control, Treatment 1,
Treatment 2), and word concreteness in a memory study might have only two levels (Abstract,
Concrete). There are certainly cases in which a single experimental variable has many levels, but
this is not at all the norm for such studies.
A second condition that is often true of experiments is that there are the same number of
observations for each level of the predictor. Because the researcher has control of the assignment
of participants to experimental conditions (or of conditions to participants), experimental researchers will generally
try to assign the same number of participants to different conditions. In a study comparing
Treatment and Control groups, for example, the researcher might randomly assign 20 participants
to each group. Equal numbers can greatly simplify the analysis of the results and allow for the
use of standard ANOVA methods. More sophisticated methods than standard ANOVA may be
necessary to deal with unequal numbers of observations per condition.
Another characteristic of well-designed experiments is that multiple predictors can be
made independent of one another and of many other potential confounds (i.e., variables
correlated with the target predictor in the natural world). Under these conditions, ANOVA can be
used to measure the contribution of the single independent variable of interest, or the effects of
multiple, independent predictors. If intelligence in pigs is studied as a function of the
experimental manipulation of maternal alcohol consumption, for example, then researchers can
examine the effect of alcohol consumption alone with some confidence that alcohol consumption
is independent of other potential confounding variables, or of other predictor variables being
studied (assuming sound procedures were used in assigning consumption levels to pigs).
Furthermore, multiple predictors in an experiment can be made independent of one
another by assigning equal numbers of pigs to each combination of conditions. To illustrate, a
study of both amount of maternal alcohol consumption (Low versus High), and nutritional status
(Poor versus Good) could ensure independence of the two predictors by assigning the same
number of animals to each condition (10 Low+Poor, 10 Low+Good, 10 High+Poor, 10
High+Good). The correlation between alcohol consumption and nutritional status would be 0.00.
If unequal numbers of animals were in each condition (e.g., 15 Low+Poor, 5 Low+Good, 5
High+Poor, 15 High+Good), then the two predictors would be correlated (r = 0.50 given the
numbers just presented) and their effects would be confounded. Standard ANOVA techniques
would have trouble teasing apart the unique effects of such correlated predictors.
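Readers who wish to verify the 0.00 and 0.50 values can do so with a few lines of code; the sketch
below (Python with NumPy, an implementation choice of ours rather than anything prescribed by the
text) scores each predictor 0/1 for every animal and correlates the two predictors under the two
sets of cell counts.

    import numpy as np

    def predictor_correlation(n_low_poor, n_low_good, n_high_poor, n_high_good):
        """Correlation between alcohol (0=Low, 1=High) and nutrition (0=Poor, 1=Good)
        when the four cells contain the given numbers of animals."""
        alcohol, nutrition = [], []
        for alc, nut, n in [(0, 0, n_low_poor), (0, 1, n_low_good),
                            (1, 0, n_high_poor), (1, 1, n_high_good)]:
            alcohol += [alc] * n
            nutrition += [nut] * n
        return np.corrcoef(alcohol, nutrition)[0, 1]

    print(predictor_correlation(10, 10, 10, 10))  # balanced design: 0.00
    print(predictor_correlation(15, 5, 5, 15))    # unbalanced design: 0.50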
Although the preceding qualities (i.e., few levels, equal numbers of observations) are
more characteristic of experimental variables, they are also sometimes true of non-experimental
variables. Many non-experimental studies may involve categorical predictors that have relatively
few levels to them. Consider the following non-experimental predictors: gender has two levels
(Male, Female), religious affiliation could be studied with four levels (Roman Catholic,
Protestant, Jewish, Other), and different psychological diagnoses in a particular study might have
three levels (e.g., Obsessive-Compulsive Disorder, Depression, Anxiety). It may also be possible
to have the same number of observations per level of a single predictor and, albeit with greater
difficulty in many cases, to have the same number of observations in combinations of levels of
different predictors (e.g., equal numbers of males and females in each of four religious groups).
Achieving such equality of numbers and independence of predictors, however, may be
challenging and wasteful of data. The hypothetical study above involving unequal numbers in
four groups (specifically, 15, 5, 5, 15), for example, could be analyzed with equal numbers in the
groups (5, 5, 5, 5), but only by discarding half the data. To avoid such waste, the more general
techniques of correlation and regression may be preferred.
Correlation and Regression Methods
Standard ANOVA techniques become complicated by deviations from the above
conditions. The presence of many levels to a predictor variable (e.g., a numerical, non-
experimental variable, such as intelligence) makes it difficult and less desirable to simply
compare means. The observed values of the predictor may be scattered over a wide range, with
unequal numbers of observations at each level. In many cases, there may only be a single
observation for a particular predictor score. Correlation and regression techniques are very
general and can readily accommodate such varied sets of observations, as well as the more
orderly data characteristic of ANOVA.
Rather than compare differences between means, correlational methods allow researchers
to determine whether increases or decreases on the criterion variable numbers are associated in
any systematic way with the values of the predictor variable numbers. Do high numbers on the
predictor tend to be associated with high numbers on the criterion, and low with low? Such
analyses do not require few levels of the predictor, equal numbers of participants at each level, or
even that the predictor values be independent of other predictors in the study. The statistical
procedures are powerful enough to accommodate such traits.
Researchers planning non-experimental studies may choose to use such correlation and
regression methods to measure the statistical contribution of multiple predictors to the criterion
variable, and to perhaps adjust statistically for correlations among the predictors. In a non-
experimental study of maternal alcohol use on fetal development, for example, researchers might
measure a host of predictor variables possibly confounded with alcohol use (e.g., education level,
smoking, drug use, maternal age). Multiple regression would allow the researchers to determine
the unique contribution of alcohol use over and above these other predictors, thus strengthening
the validity of any causal inferences. Doubts would remain, however, about confounding with
other variables not included in the study.
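A rough sketch of that logic, using made-up data and ordinary least squares in NumPy (any
resemblance to a real fetal-development study is purely illustrative), compares the variance
explained with and without alcohol use in the model; the increase in R-squared estimates its
unique contribution over and above the confounds.

    import numpy as np

    rng = np.random.default_rng(0)
    n = 200

    # Hypothetical predictors: confounds plus maternal alcohol use (standardized scores).
    education = rng.normal(size=n)
    smoking = rng.normal(size=n)
    alcohol = 0.4 * smoking + rng.normal(size=n)   # alcohol correlated with a confound
    outcome = -0.5 * alcohol - 0.3 * education + 0.2 * smoking + rng.normal(size=n)

    def r_squared(predictors, y):
        """R-squared for an ordinary least-squares fit with an intercept."""
        X = np.column_stack([np.ones(len(y))] + list(predictors))
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        resid = y - X @ beta
        return 1 - resid.var() / y.var()

    r2_confounds = r_squared([education, smoking], outcome)
    r2_full = r_squared([education, smoking, alcohol], outcome)
    print(f"Unique contribution of alcohol use: {r2_full - r2_confounds:.3f}")

A dedicated multiple regression routine would report the same increment, along with a
significance test for it.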
Despite this general correspondence between the qualities of predictors and the method of
statistical analysis, there is nothing rigid about the use of correlation and regression for non-
experimental, numerical variables and analysis of variance for categorical, experimental
variables. Although ideally suited to and perhaps even required for non-experimental predictor
variables that are numerical, irregularly distributed, and confounded with other predictors,
correlation and regression methods are general ones that also accommodate the more regular
situation of categorical experimental predictors. Correlation and regression can in fact be used
for experimental variables and may in some cases be required (e.g., if loss of participants results
in unequal numbers of participants and correlated predictors). Analysis of variance may similarly
be appropriate for certain types of non-experimental predictors (e.g., relatively few levels and
equal numbers of participants in various conditions defined by combinations of predictors).
The approach in this book will be to view analysis of variance as a special case of the
more general correlation and regression methods (sometimes referred to as the general linear
model and abbreviated as GLM). That is, ANOVA can be thought of as a version of correlation
and regression analysis that has been simplified because of the highly structured nature of the
predictors.
Some books contrast correlational studies with experimental studies, but in such cases,
correlational is used as a synonym for non-experimental (i.e., no manipulated independent
variable). We will use correlation simply to describe statistical methods ideally suited to
analyzing relations between numerical, non-experimental variables (versus analysis of variance
methods for experimental, categorical predictors), and will use the more precise term "non-
experimental" to describe the underlying research method (versus experimental).
INTRODUCTION TO ANOVA
Analysis of Variance (ANOVA) refers to a family of statistical procedures that can be
used for a wide variety of experimental and non-experimental studies. ANOVA is in fact a
somewhat limited (yet still powerful) variant of multiple regression, as will be demonstrated
later. There are several properties of research designs that permit use of ANOVA rather than
regression.
Properties of ANOVA Predictors
Studies appropriate for ANOVA demonstrate some or all of the following characteristics:
predictors that are categorical (or can be treated as categorical), that involve a relatively small
number of levels (e.g., Male/Female or Low/Medium/High), and that are independent of one
another (or can be made independent in the data being analyzed).
Categorical predictors. In both experimental and non-experimental studies, researchers
are often interested in relationships between a numerical criterion variable and such categorical
predictors as gender, ethnicity, and numerous experimental variables relevant to psychological
theories or applied concerns. The predictor variables are categorical in the sense that different
levels of the variable (e.g., Male, Female) can be thought of as belonging to qualitatively
different categories, rather than differing in the amount of some underlying trait, as in the case of
such numerical or quantitative predictors as intelligence scores or self-ratings of subjective well-
being. Another characteristic of many categorical predictors is that they have a relatively small
number of discrete levels (e.g., 2 to 6 or so), although that is not necessarily the case. The
ANOVA analyses developed for such categorical variables can in some cases also be applied to
numerical predictors, especially ones that involve equal numbers of participants at each of
relatively few and well-spaced levels (e.g., Young, Middle-Age, Older). Treating the numerical
predictor as categorical might facilitate, for example, the examination of an expected non-linear
relationship with a criterion variable(s). Although the ordering of such predictors would be
ignored in the initial, omnibus analysis of differences among the groups, order could be
considered to good effect in more specific follow-up analyses.
Independent predictors. Another characteristic of ANOVA designs, especially those
involving experimentally manipulated variables, is that multiple predictors can be made
independent of one another (i.e., multiple predictors are uncorrelated), thereby avoiding the
causal ambiguity that plagues non-experimental predictors. Randomly assigning participants in
equal numbers to various combinations of conditions ensures that the predictors are independent.
Independence can even be achieved in non-experimental studies if the number of participants of
various types can be controlled by the researcher. Age and Gender, for example, would be
independent if the researcher obtained equal numbers of Males and Females at each of three
levels of Age (Young, Middle-Age, Older). In non-experimental studies that do not control the
frequency of various combinations of attributes, predictors will generally be correlated (e.g.,
more older females than older males, parent income and parent education level), and multiple
regression will be the preferred method of analysis.
Differences among means and ANOVA. The basic question in studies with one or more
categorical predictors is whether mean scores on the numerical criterion or dependent variable
differ as a function of the group to which the observation belongs. ANOVA, for example, could
be used to determine whether mean depression scores vary between treatment and control
subjects, whether children with different attachment styles in infancy grow up to differ on the
personality trait of independence, and whether the mean number of words recalled varies among
subjects who learned the material using imagery mnemonics, sentence mnemonics, or no
mnemonics. Or, to illustrate with two non-experimental variables, it could be used to determine
whether memory varies as a function of Age (Young, Mid, Older) and Gender (Male, Female).
Although correlation and regression procedures can be used to analyze relationships
between numerical and categorical variables, special procedures collectively referred to as
Analysis of Variance (or ANOVA) have been developed to analyze studies in which a single
numerical criterion variable is related to one or more uncorrelated and categorical predictor
variables. We will demonstrate in subsequent chapters that ANOVA is a restricted variant of
regression analysis. More limited in some respects than regression, ANOVA is still a very
powerful statistical technique that can detect both simple and complex relationships in many
research designs. And ANOVA is more readily suited than regression for some purposes (e.g.,
analyses of factors that involve interactions or related observations at different levels of one or
more predictors).
The term “ANOVA” actually refers to a family of statistical procedures that involve some
common and some distinct elements. The research designs to which ANOVA can be applied
differ in a variety of ways that determine specific features of the analysis. Briefly stated, there
can be one or more predictors in the study, each predictor can have anywhere from two levels
(e.g., gender, treatment versus control groups) to multiple levels (e.g., different religions, effects
of several different drugs), and each predictor can involve related or unrelated observations at
each of its levels (e.g., independent subjects or the same subjects in different conditions). In all
these cases, the researchers are interested in possible differences among the means of the various
groups. We will see that the computations on these means will be identical across different
ANOVA designs. The estimate of the error variability used for hypothesis testing, however, will
depend somewhat on the specific design of the study.
Overview of ANOVA Designs
ANOVA terminology generally refers to predictor variables as Factors, the term that we
will adopt. Occasionally we will refer to them as predictors or independent variables (when
appropriate, as in a true experimental design). The ANOVA analysis appropriate for a specific
experimental design depends on the number of factors included in the study and the type of
factors, specifically whether between-subjects or within-subjects.
Between-Subject and Within-Subject factors. One important consideration in ANOVA is
whether the observations at the different levels of the factor(s) are related or not; that is, whether
researchers have reason to expect scores in one condition to correlate with scores in other
conditions.
Observations in different conditions are unrelated when there is no meaningful
connection between individual observations in the different conditions. If subjects were
randomly assigned to two or more groups, there would be no expectation of a correlation or
relationship between the specific scores in any of the groups. Because position within a group is
arbitrary, the first and ninth people in Group 1 would not be expected to stand in the same relation
to one another as the first and ninth people in Group 2 (i.e., if person 1 in group 1 had a higher score than person
9 in group 1, there would be no reason to infer anything from this about the relative standing of
persons 1 and 9 in group 2). To illustrate, if a randomly selected half of participants studied a list
of abstract words and the other half studied a list of concrete words, the recall scores in the
different groups would be uncorrelated except by chance. Scores at different levels of a non-
experimental variable will be similarly uncorrelated if the various groups are selected
independently of one another (e.g., 50 men and 50 women chosen completely independently).
Individual scores would be uncorrelated because there is no systematic relationship between
specific participants in the different conditions. Such factors are called Between-Subjects (or
Between-S) factors. Between-S factors are also referred to as Independent Groups or Completely
Randomized factors.
Correlated or related observations can occur for several reasons. One common way is if
the same participants are observed in different conditions. For example, the number of concrete
and abstract words recalled by participants who studied a list containing both kinds of words
would be expected to produce correlated scores (i.e., the relative rank of people’s concrete word
recall would be similar to their rank for abstract word recall). A correlation is expected because
many individual factors (e.g., fatigue, intelligence, motivation) will influence memory for both
kinds of words, even though one type of word might be recalled better (i.e., lead to higher mean
recall), which is the question addressed by ANOVA.
Somewhat more difficult to appreciate is that related observations can also occur with
different subjects in different conditions. If specific individuals in one group match specific
individuals in another group on variables relevant to the dependent variable, then a correlation
would be expected. If one member from each of several twin pairs, for example, was assigned to
a control group and the other member of each pair to a treatment group, then we would expect
their scores on the criterion variable to be similar, at least to the extent that genetic factors and
common upbringing were associated with the criterion variable. A correlation would also be
expected if researchers matched participants prior to the experiment on some relevant variable
and then assigned matched subjects to different conditions. For example, clinical researchers
might pre-test participants on anxiety and then assign people to treatment or control conditions in
pairs; one treatment subject and one control subject would come from the highest scoring pair,
the next highest scoring pair, and so on down to the lowest scoring pair. The researchers would
expect post-test scores to be correlated, despite a lower average anxiety score for the treatment
group.
We will refer to predictor variables that involve related observations as Within-Subjects
(or Within-S) factors. Other terms for Within-S factors include Repeated Measures or
Randomized Blocks factors. Although the term Within-S is not completely appropriate because
matching can produce Within-S factors for variables that involve different subjects, this
terminology fits well with how statistical packages refer to such factors. You may be familiar
with this distinction between related and independent observations, although perhaps not the
terminology, from your previous study of the t-test. The Independent t-test is used for unrelated
observations (what we are calling Between-S factors) and the Paired Difference t-test is used for
related observations (what we are calling Within-S factors).
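In software terms, the distinction might look like the following Python/SciPy sketch, with
hypothetical recall scores; the data and the library are our assumptions, not the text's.

    import numpy as np
    from scipy import stats

    # Between-S: different participants studied concrete versus abstract word lists.
    concrete_group = np.array([14, 16, 15, 17, 13, 16])
    abstract_group = np.array([11, 13, 12, 14, 12, 11])
    print(stats.ttest_ind(concrete_group, abstract_group))   # independent-groups t-test

    # Within-S: the same participants studied a list containing both kinds of words,
    # so each position in the two arrays comes from the same person.
    concrete_recall = np.array([8, 9, 7, 10, 8, 9])
    abstract_recall = np.array([6, 8, 6,  9, 7, 7])
    print(stats.ttest_rel(concrete_recall, abstract_recall))  # paired-difference t-test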
Single-factor and factorial designs. ANOVA designs also vary in the number of factors
in a study. Single-Factor studies involve just one predictor variable (e.g., Gender, or Age, or
Treatment Condition, ...). Because the one factor can be either a Within-S or Between-S factor,
there are two types of single-factor design and two corresponding types of ANOVA, with slightly
different calculations for Within-S and Between-S designs. Single-factor designs also vary in
terms of the number of levels of the single factor: such variables as Gender have two levels
(Male, Female), whereas other variables could have three or more levels. This has implications
for follow-up analyses, although not for the omnibus ANOVA.
Factorial studies involve two
or more predictors, each with a
certain number of levels. Consider
first the case of two predictors. A
factorial study with two predictors
would include all combinations of the two factors, say two levels of Gender and three levels of
Age. Box 1.2 shows the possible number of participants in each cell of such a factorial study. A
cell or treatment condition is defined by the combination of levels of two or more factors (e.g.,
Young Males is a cell or condition). Box 1.2 shows 10 participants in each of the 6 cells defined
by the combination of each level of Age and each level of Gender (i.e., 10 Young Males, 10
Young Females, 10 Mid Males, and so on).

Box 1.2. N per Cell of a 2 × 3 Factorial Design.

                        Age
    Gender     Young    Mid    Older
    Male         10      10      10
    Female       10      10      10
Factorial designs are often described in terms of the number of levels of each factor. The
design in Box 1.2, for example, would be a 2 × 3 Factorial. The total number of cells in a two-
factor design is the number of levels of one factor times the number of levels of the other factor
(e.g., 2 × 3 = 6), and the total number of observations in the entire study will be the number per
cell times the number of cells (10 × 6 = 60, in the preceding example). Although the number of
observations per cell can vary, there is much benefit to keeping the number equal. Most
importantly, this ensures that the two factors are uncorrelated and allows standard ANOVA
procedures to be used.
Details of the ANOVA analysis for a two-factor design would depend on whether both
factors were Between-S, one factor was Within-S and the other Between-S, or both factors were
Within-S. One possible design would involve both Gender and Age in Box 1.2 being Between-
S; that is, the required number of participants of a specific age and gender would be solicited
without consideration of the other conditions. It would also be possible, however, for one of the
factors, either Age or Gender, to be Within-S. Age would be Within-S if the same people were
assessed longitudinally at different ages; Gender would be Within-S if both members of Male-
Female twin pairs participated in the study. Or both Age and Gender would be Within-S if both
members of Male-Female twin pairs were observed longitudinally at three points in their lives.
These four different designs imply three somewhat distinct Factorial ANOVAs: (1) two
Between-S factors, (2) one Within-S factor plus one Between-S factor, or (3) two Within-S
factors. As we shall see, the analyses performed in these three distinct cases are different in
important ways, although they also share some commonalities. The differences among means
would be calculated in the same manner, but the calculation of the error terms to test significance
would vary.
Factorial designs can involve more than two factors, resulting in three, four, or even
higher order factorial designs. A 2 × 3 × 2 design, for example, is a three-factor design that
would have 12 distinct conditions defined by all possible combinations of the three factors. In
these more complex designs, each of the factors can be either Between-S or Within-S, which
produces an increasing number of distinct designs and ANOVAs.
Fortunately, the principles that apply to two factor designs generalize readily to higher
order designs, and the calculations for the differences among the means are identical for the
different designs. We will therefore focus primarily on understanding the single factor designs
and the various two factor designs, and then illustrate in the final chapters how the principles can
be extended to higher-order designs. We start by considering designs in which observations in
the different groups are independent of one another; that is, the factors are Between-S variables.
Later chapters extend the ideas presented here to Within-S designs and designs that involve a
mix of Between-S and Within-S factors. But first, we need to introduce the F-statistic itself.
Introduction to the F-Statistic
The F-statistic can be used to test many different kinds of research hypotheses. Although
nominally a test of the significance of the difference between two variances, the F-statistic can
also test the significance of the difference between two or more means (classic Analysis of
Variance or ANOVA), and the significance of regression analyses involving one or more
predictors.
The probability distribution for a test statistic (e.g., t, F) is called a sampling distribution.
The sampling distribution of F presents the probability of observing an F-value greater than or
equal to some value, making certain assumptions, such as that variances are equal, that
differences among means are due only to chance (ANOVA), or that there is no correlation
between a criterion variable and its predictor(s). Selected probabilities for the F distribution are
presented in Appendix A.3; these are the F values corresponding to the probabilities most
commonly needed for hypothesis testing (e.g., .10, .05, .01).
Testing differences between variances. At its most basic, the F-statistic is the ratio of
two variances (i.e., F = s₁² / s₂²), and is used to determine whether or not the two variances are
sufficiently different to reject the null hypothesis that their population variances are equal (i.e.,
H0: σ₁² = σ₂²). If F is sufficiently large, then reject the null hypothesis. Although in theory one
could also determine whether F is sufficiently small, the common practice is to put the larger
variance (or the variance expected to be larger) in the numerator, which ensures that F is always
greater than or equal to 1.
Given FObserved, we can then use the sampling distribution of F in Appendix A.3 to
determine a critical value for F, which is how large FObserved must be to reject the null hypothesis
of no difference. The critical value of F depends on the degrees of freedom for the numerator of
F and the degrees of freedom for the denominator. For the ratio of two sample variances,
dfNumerator = n1 - 1 and dfDenominator = n2 - 1, where n1 and n2 represent the number of observations in
the numerator and denominator samples, respectively. Table A.3 shows the probability of F ≥ Fα
(the tabled number) when the H0 is true (i.e., σ1² = σ2²). Fα indicates the specific value of F such
that p(F ≥ Fα) = α, where α is the area (or probability) to the right of Fα. For example, when two
samples of six subjects are selected from the same population (i.e., df = 5, 5) and α = .05,
Fα = 5.05. That is, the table indicates that p(F ≥ 5.05) = .05 for df = 5, 5. Assuming that samples
are selected from the same population is equivalent to assuming that H0: σ1² = σ2² is true, because
each variance would equal the common variance in the single population from which they were
selected.
The sampling distribution of F is
different from the sampling distribution for t
or other statistical distributions that you might
know. Figure 1.2 shows an approximation of the F distribution. The horizontal axis represents
values of F from its minimum to its maximum, and the area defined by the vertical axis indicates
the probability of F taking on different values. Because variances cannot be negative, F must be
positive. Therefore, F ranges from a minimum of 0 (when the numerator is 0) to a maximum of
infinity (when the denominator is 0). Unlike t, the distribution of F is also asymmetrical. The
shape varies with the degrees of freedom, but generally there is a larger area (a hump) to the left
and a longer tail section to the right, as illustrated in Figure 1.2.
As shown in Figure 1.2, the probabilities in Appendix A.3 indicate an area (labelled α in
the Figure) above F values of a specified size. The area represents the probability that F is greater
than or equal to the specified value, Fα. In hypothesis testing, researchers decide how small the
probability must be before the null hypothesis would be rejected (e.g., .10, .05, .01, ...). This is
the alpha level (α), or probability of a Type I error (rejecting a true null hypothesis). Researchers
then find the critical value corresponding to the specified alpha and df. We will see in Chapter 2
that SPSS and other statistical programs also allow researchers to work with observed
probabilities, in addition to critical values.
We illustrate the use of the F-statistic to test the equality of two variances using data from
a study analyzed more fully in Chapter 2. The study involved scores for 6 participants in each of
two groups, s1 = 2.000 and s2 = 1.789. One way to test whether the group variances differ is to
perform an F test of the variances; that is, FObserved = 2.000² / 1.789² = 1.25. If we use α = .05,
then the critical value of F is F(.05; 5, 5) = 5.05. Because the observed F of 1.25 is not greater
than or equal to the critical value for F of 5.05, we would not reject H0: σ1² = σ2². By using a
critical value of 5.05, researchers ensure that, if the H0 is true, then FObserved will lead to rejection
of the (true) H0 only 5% of the time. It is critical to note that this F test of the ratio of two sample
variances is different than the F for testing differences between means.
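As a cross-check on the hand calculation, here is a minimal Python sketch of the same variance-ratio
test (added for illustration and assuming scipy; the standard deviations 2.000 and 1.789 and the
group sizes of 6 are taken from the text).

    # F test of H0: the two population variances are equal.
    from scipy import stats

    s1, s2, n1, n2 = 2.000, 1.789, 6, 6
    f_obs = s1**2 / s2**2                          # larger variance in the numerator: 1.25
    f_crit = stats.f.ppf(1 - .05, n1 - 1, n2 - 1)  # 5.05 for df = 5, 5
    print(f"F observed = {f_obs:.2f}, F critical = {f_crit:.2f}")
    print("Reject H0" if f_obs >= f_crit else "Do not reject H0")   # Do not reject H0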
The F-test and hypotheses about a single mean. The F statistic can be adapted to test
various hypotheses about means. Our discussion of differences between means begins in Chapter
2. Here we demonstrate the use of F to test hypotheses about a single mean. Consider the
situation in which a sample of 16 university students is administered an intelligence test (see Box
1.4 for the data), and obtains an average score of 108.75, with a standard deviation of 17.9536.
Does this sample provide evidence that the IQ of university students differs from the average for
the general population, which is known to be 100.0?
This test is typically done using a t test. From the Central Limit Theorem, the Standard
Error of a sample mean is known to be approximately the standard deviation of the sample divided
by the square root of the number of observations in the sample. That is, SEȳ ≈ s/√n =
17.9536/√16 = 4.4884 in our particular case. The distance of the observed mean from the
hypothesized value divided by SEȳ is distributed as a t statistic and can be compared to critical
values for df = n - 1 = 16 - 1 = 15 for our problem. That is, tObserved = (108.75 - 100.0) / 4.4884 =
1.94947. The critical value of t for a two-tailed test with df = 15 and α = .05 is 2.131 (from
Appendix A.2). Therefore, the researchers would not reject the null hypothesis that µ = 100.0 for
the population of university students.
Doing this as an F test is straightforward. For the denominator, the variance of the
sample, s² = 17.9536² = 322.3318, represents random variation or noise. The variation is random
because we do not know why individuals in the group have different IQs. For the numerator, the
squared deviation of the observed mean (108.75) from the hypothesized mean (100.0), weighted
by the number of observations (more on this later), provides a variance that reflects how much
the two means differ. That is, the Numerator = 16 × (108.75 - 100.0)² = 1225.00. This quantity
reflects how far the sample mean is from the hypothesized value. If the sample mean was
closer to 100.00, then the Numerator would be smaller than 1225.00, and if the sample mean was
further away from 100.00, then the Numerator would be larger than 1225.00. The Numerator is
divided by the Denominator to give FObserved = 1225.0 / 322.3318 = 3.80043, with df = 1, 15. The
critical value of F with α = .05 is 4.54, and we fail to reject H0: µ = 100.0.
Observe that the t and F tests are equivalent: FObserved = 3.80043 = 1.94947² = tObserved², and
FCritical = 4.54 = 2.131² = tCritical². Therefore, whenever t is significant, so is F, and vice versa, and
IQ:   85  117  124  115   82  119  134  106
     111  111  145  116   98   96   99   82

Mean  108.75
SS    4835.00
SD    17.9536

Box 1.4. Sample of 16 IQs.
whenever one test is not significant, neither is the other. This equality holds true for hypotheses about
a single mean, as here, and for hypotheses about the differences between two means, as covered
in the following chapter.
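The single-mean t test and its F equivalent can be verified with a few lines of Python (again an
added sketch, assuming scipy; the 16 IQ scores are those listed in Box 1.4).

    # One-sample t test of H0: mu = 100 for the Box 1.4 IQ data, and F = t squared.
    from scipy import stats

    iq = [85, 117, 124, 115, 82, 119, 134, 106, 111, 111, 145, 116, 98, 96, 99, 82]
    t_obs, p_two_tailed = stats.ttest_1samp(iq, popmean=100.0)
    print(f"t = {t_obs:.3f}, two-tailed p = {p_two_tailed:.3f}")   # t = 1.949, p = .070
    print(f"t squared = {t_obs**2:.3f}")                           # 3.800, the F computed above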
SPSS and Hypotheses about Single Means
There are a number of ways to perform the preceding analysis in SPSS. Box 1.5 shows
the SPSS commands to enter the data and then conduct a single sample t-test. The results agree
with our earlier calculations, and we learn the additional information that the p for the observed t
is .07, which means that p(t ≥ 1.949 or t ≤ -1.949 | H0 true) = .07.
ANOVA is not often used to test a single mean, so the various SPSS ANOVA commands
do not provide a mechanism for specifying 100 as the null hypothesis. We can, however, test
DATA LIST FREE / subj iq.
BEGIN DATA
 1  85    2 117    3 124    4 115    5  82    6 119    7 134    8 106
 9 111   10 111   11 145   12 116   13  98   14  96   15  99   16  82
END DATA.

TTEST /TESTVAL = 100 /VARIABLE = iq.

        N       Mean         Std. Deviation    Std. Error Mean
 IQ    16    108.750000       17.9536440          4.4884110

 Test Value = 100
        t     df   Sig. (2-tailed)   Mean Difference   95% Conf Int of Diff
                                                         Lower       Upper
 IQ   1.949   15        .070            8.750000       -.816822   18.316822

Box 1.5. SPSS Data Entry and T-test Commands.
COMPUTE iq2 = iq - 100.
MANOVA iq2 /PRINT = CELL.

 Variable .. IQ2
                         Mean    Std. Dev.    N
 For entire sample      8.750      17.954    16

 Tests of Significance for IQ2 using UNIQUE sums of squares
 Source of Variation        SS    DF        MS      F   Sig of F
 WITHIN CELLS          4835.00    15    322.33
 CONSTANT              1225.00     1   1225.00   3.80       .070

Box 1.6. F-test for Single Sample Hypothesis.
whether the mean differs from 0. If we first subtract 100 from the scores, testing whether the
mean of the scores minus 100 differs from 0 is equivalent to testing whether the mean of 108.75
differs from 100. Box 1.6 shows the relevant commands. The COMPUTE statement subtracts
100 from each score, and the MANOVA command with a single score and no factors tests
whether the mean of the scores differs from 0. Note that the p value of .07 is identical to that for
t, that F = t², and that the SSs for the numerator (CONSTANT in Box 1.6) and denominator
(WITHIN CELLS in Box 1.6) agree with our earlier calculations. We again fail to reject H0: µ =
100.
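The same shift-by-100 logic used in Box 1.6 can be mimicked in Python (an added sketch, assuming
scipy): subtracting 100 from every score and testing the result against 0 yields exactly the same t
and p as testing the raw scores against 100.

    from scipy import stats

    iq = [85, 117, 124, 115, 82, 119, 134, 106, 111, 111, 145, 116, 98, 96, 99, 82]
    t_direct, p_direct = stats.ttest_1samp(iq, popmean=100.0)
    t_shift, p_shift = stats.ttest_1samp([y - 100 for y in iq], popmean=0.0)
    print(f"direct:  t = {t_direct:.3f}, p = {p_direct:.3f}")   # t = 1.949, p = .070
    print(f"shifted: t = {t_shift:.3f}, p = {p_shift:.3f}")     # identical results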
CONCLUSION
To summarize, psychologists obtain empirical data relevant to theoretical constructs and
their causes by translating abstract constructs into concrete, observable variables (e.g., a score on
a standardized IQ test, amount of time looking into another person's eyes, imagery ratings for
words, number of sad words checked off a list of emotion terms), and determining whether the
observed relationships between the variables are consistent with theoretical expectations. If the
results are not consistent with the theoretical predictions, then the empirical study needs to be
reviewed for flaws, or perhaps the theory needs to be rejected or revised. In testing hypotheses,
researchers measure an outcome or criterion variable (i.e., the dependent variable), and either
measure or manipulate directly a predictor variable (or independent variable). Predictor and
criterion variables can also be classified as categorical or numerical. The levels of numerical
variables are ordered in some quantitative manner, whereas the levels of categorical variables
simply represent differences rather than amounts of some quality.
ANOVA can be used to analyze the results of studies that involve numerical criterion
variables and predictors that demonstrate most of the following properties: usually categorical
with relatively few and regular levels, the same or similar number of observations per category,
and independence of one predictor from other predictors (and from extraneous variables if
experimental in nature). Such properties are most easily achieved with experimental factors. In
the absence of such predictors, the more general correlation and regression methods are
preferred. Correlation and regression can accommodate not only ANOVA-type predictors, but
also predictors that involve scattered and irregular numerical values (i.e., numerous levels and
unequal numbers of participants) and are correlated with other extraneous variables because they
are non-experimental. Some extraneous variables may be measured, which allows statistical
testing of the unique contribution of each measured predictor independent of the others.
ANOVA refers to a family of statistical techniques that can be used for single-factor or factorial
studies, with one or more of the factors being either between-S or within-S factors. We begin in
the next chapter with a single categorical variable involving uncorrelated observations.
CHAPTER 02:
SINGLE-FACTOR BETWEEN-S DESIGN FOR K = 2
Single-Factor Between-S Design (k = 2) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
T-test Approach to Two-group Study . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
ANOVA Approach to Two-group Study . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
Equivalence of F and t for Two Groups . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
ETA² (η²) and Strength of Relationship . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
SPSS Analyses for the Two-Group Between-S ANOVA . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
SPSS TTEST Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
SPSS ONEWAY . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
Other SPSS ANOVA Commands . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
                 Agitation Scores
        Control (1)        Treatment (2)
            11                   7
            14                  12
            12                  11
             9                   9
            13                  12
            13                   9

        ȳ1 = 12.0            ȳ2 = 10.0
        SS1 = 16.0           SS2 = 20.0

  s²p = (16 + 20) / (6 + 6 - 2) = 36 / 10 = 3.6   [= MSError]
  sp = √3.6 = 1.897

  H0: µ1 - µ2 = 0     Ha: µ1 - µ2 ≠ 0
  df = 10     t.05,10 = 2.228   [= √F.05;1,10]

  t = (12 - 10) / (1.897 √(1/6 + 1/6)) = 2 / 1.095 = 1.83   [= √F]

Box 2.1. T-test for Independent Groups.
This chapter introduces the Single-Factor Between-S design for two groups (i.e., k = 2,
where k stands for the number of groups). We will see that both the t-test and ANOVA can
handle the case of two independent groups. Some of this material will be review from earlier
statistics courses. We will then examine various ways in which SPSS can perform this analysis:
t-test, ANOVA, and regression. A general notation for ANOVA will be introduced in Chapter 3,
and the methods extended to more than two groups (i.e., k ≥ 3).
SINGLE-FACTOR BETWEEN-S DESIGN (K = 2)
The simplest Between-S study is one in which two groups are compared on some
criterion variable. Either t-tests or ANOVA can be used to test the difference between the means
of two groups, with ANOVA leading to exactly the same conclusion as a t-test. Specifically, we
will see that t²Observed = FObserved and t²α = Fα. This means that if t is significant (i.e., tObserved ≥ tα),
then F will be as well (i.e., FObserved ≥ Fα). The ANOVA, however, is more general than the t-test
and can be used even when there are more than two levels of a factor, or when there are multiple
factors (e.g., the separate and joint effects of both Gender and Age). Thus, in more complex
designs, ANOVA will provide
information that the t-test
cannot.
The analyses will be
illustrated first with the small
dataset shown in Box 2.1.
Agitation scores were obtained
on two groups (n1 = 6, n2 = 6).
Group 2, the treatment group,
had completed a course in
relaxation methods, whereas
the Group 1, the control group,
had not received such training.
Participants were randomly
assigned to the two groups;
$$H_0\!:\ \mu_1 - \mu_2 = 0 \qquad H_a\!:\ \mu_1 - \mu_2 \neq 0,\ \ \mu_1 - \mu_2 > 0,\ \ \text{or}\ \ \mu_1 - \mu_2 < 0$$

$$t = \frac{(\bar{y}_1 - \bar{y}_2) - 0}{s_p\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}} \qquad
s_p^2 = \frac{SS_1 + SS_2}{n_1 + n_2 - 2} \qquad df = n_1 + n_2 - 2$$

Equation 2.1. Difference Between Independent Means.
hence, the observations in the groups are independent and we expect no correlation between the
two sets of scores. The data will be analyzed in several different ways, all of which lead to the
same statistical conclusions. Equalities shown in square brackets ([]) in Box 2.1 and later boxes
demonstrate some of the many parallels between the different analyses and will be explained at
appropriate times. We begin with the independent groups t-test.
T-test Approach to Two-group Study
Equation 2.1 shows the formula for one approach to the independent groups t-test. The H0 of
no difference between the means is contrasted against one of three alternative hypotheses shown
in Equation 2.1 (a two-tailed “not equal” alternative, or one of two one-tailed alternatives).
Equation 2.1 shows a common variant of the independent groups t-test in which the SSs and dfs
for groups 1 and 2 are pooled to create a pooled variance, sp2. The pooled variance appears in the
denominator of the t-test (i.e., in the computation of the SE for the difference between
independent means). The tObserved is compared to tCritical for df = n1 + n2 - 2.
Box 2.1 shows the calculation of this independent groups t-test for our hypothetical data.
The test will determine whether the mean agitation scores observed for the control group (ȳ1 =
12.0) and the treatment group (ȳ2 = 10.0) are far enough apart that it is unlikely (but not
impossible) that this difference occurred by chance. If the difference is large enough to be
deemed significant, then researchers will conclude that a real difference exists between the two
means for the populations from which the two groups were selected. That is, researchers will
reject the null hypothesis µ1 = µ2, or equivalently, µ1 - µ2 = 0.
The calculations shown in Box 2.1 are those represented by Equation 2.1. The SSs within
the two groups and their corresponding dfs are pooled together to create a pooled estimate of the
population variance: sp² = (SS1 + SS2) / (df1 + df2) = 3.6. This pooled estimate is used to
calculate the standard deviation of the difference between the means (i.e., the Standard Error):
SEȳ1-ȳ2 = sp × √(1/n1 + 1/n2) = 1.095. The SE of the difference indicates how much
variability in the means would be expected just by chance given the amount of random variation
observed within each of the two groups and the sample sizes.
The difference between the means is divided by its SE to produce the observed value of t.
This observed value is then compared to the critical value of t needed to reject the null hypothesis
of equal population means for groups one and two. The df for the critical value of t is (n1 - 1) +
(n2 - 1) = n1 + n2 - 2 = 10, the sum of the dfs for the two groups and the denominator for s2p. In
the present example, the observed t of 1.83 is not greater than the two-tailed critical value of
2.228 for α = .05. Therefore, the researchers cannot reject the H0, and cannot conclude that the
population means differ from one another.
The preceding analysis assumes a two-tailed alternative hypothesis (i.e., µ1 ≠ µ2). For a
one-tailed test in which the researchers have a priori grounds to ignore the possibility that group
2 might have a higher score than group 1, tα = 1.812, and the null hypothesis would be rejected in
favor of Ha: µ1 - µ2 > 0 (equivalently, µ1 > µ2).
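A pooled-variance t test equivalent to the hand calculation in Box 2.1 can be run in Python as well
(added for illustration, assuming scipy; the agitation scores are those listed in the box).

    # Independent-groups (pooled-variance) t test for the agitation data.
    from scipy import stats

    control = [11, 14, 12, 9, 13, 13]
    treatment = [7, 12, 11, 9, 12, 9]
    t_obs, p_two_tailed = stats.ttest_ind(control, treatment, equal_var=True)
    print(f"t = {t_obs:.2f}, two-tailed p = {p_two_tailed:.3f}")   # t = 1.83, p = .098
    print(f"one-tailed p = {p_two_tailed / 2:.3f}")                # .049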
ANOVA Approach to Two-group Study
The t-test is designed for comparisons of two groups, but many studies involve
comparisons among more than two groups. Educational researchers, for example, might
compare two or more experimental reading programs relative to a standard or control technique.
Social psychologists might want to compare the effectiveness of multiple conditions for changing
people’s attitudes (e.g., Control, Layperson, Expert). Neuroscientists might compare the effects
of several different drugs versus placebo conditions. Clinical psychologists may similarly
compare three different treatments for a particular disorder. Cognitive psychologists may contrast
theoretical predictions about reading differences for low, medium, and high frequency words.
Such designs require a statistical technique that can assess the significance of differences among
more than two means. ANOVA is just such a technique.
The trick to using F to test the differences between the means is to calculate a numerator
variance that reflects the variability among the means and a denominator variance that reflects
random variability within the groups (i.e., variation due to chance or noise). The denominator for
      Control                 Treatment
   y1      y - ȳ1          y2      y - ȳ2
   11        -1              7       -3
   14        +2             12       +2
   12         0             11       +1
    9        -3              9       -1
   13        +1             12       +2
   13        +1              9       -1

  ȳ1 = 12.0      ȳ2 = 10.0      ȳG = 11.0
  SS1 = 16.0     SS2 = 20.0
  ȳ1 - ȳG = +1.0      ȳ2 - ȳG = -1.0

  SSTotal = 48.0 = ΣΣ(y - ȳG)² = SSTreatment + SSError
          = (11 - 11.0)² + (14 - 11.0)² + ... + (9 - 11.0)²
  SSError = 36.0 = SS1 + SS2 = Σ(y - ȳ1)² + Σ(y - ȳ2)² = 16.0 + 20.0
  SSTreatment = 12.0 = SSTotal - SSError = 48.0 - 36.0
             = n1(ȳ1 - ȳG)² + n2(ȳ2 - ȳG)² = 6 × 1.0² + 6 × (-1.0)²

Box 2.2. ANOVA for Two-group Comparison.
two groups is simply the pooled variance used in the t-test, sp2. With more than two groups, the
SSs and dfs would be pooled for all of the groups to produce a pooled variance: that is, sp2 = (SS1
+ SS2 + SS3 ... ) / (df1 + df2 + df3 ...). Another name for variance is Mean Square (MS). In
ANOVA this denominator is often called MSError (or MSWithin), and the corresponding SS is termed
SSError.
SSTreatment by subtraction. There are several ways to determine the numerator variance,
which will represent the treatment effect. Perhaps the simplest is to consider the total variability
in all of the scores (i.e., SSTotal) as consisting of the random or error variation just described (i.e.,
SSError) plus the treatment variation; that is, SSTotal = SSError + SSTreatment. Thus, SSTreatment can be
calculated by subtraction; SSTreatment = SSTotal - SSError. Dividing SSTreatment by its appropriate df
would give a variance that reflects differences among the means. Box 2.2 presents the data for
our example problem and the calculations relevant to ANOVA.
First, consider the total amount of variability in all 12 scores. The total variability is
measured by calculating the squared deviations of the scores from the mean of all the scores,
which is called the Grand Mean (y&G = 11.0) to distinguish it from the means for the separate
groups. This measure of total variability is called SSTotal (or simply SSy), just as in regression.
SSTotal equals 48.0 units in our example. Note that y&G and SSTotal measure the average and
variability that would result if we completely ignored the Group variable (or factor) and treated
the 12 scores as coming from a single sample.
SSError is calculated by summing the SSs within each group across all of the groups, the
identical operation used to calculate the pooled variance. Each group's SS represents the sum of
the squared deviations of each observation about its group mean; these deviations from the group
means are shown in Box 2.2. The SS for group 1 (16.0 units) and the SS for group 2 (20.0 units)
are summed to obtain an SSError equal to 36.0. This SSError or SSWithin reflects the variability within
each of the groups being compared. Summing SS within groups to calculate the SS for the
pooled estimate of the variance for the independent groups t-test is extended in ANOVA to
multiple groups. That is, the SSs could be summed across many groups, not just two; for
example, if we had 4 groups, then SSError would be the four SSs within each group summed over
all four groups.
SSError is less than SSTotal, which indicates that variability within groups does not account
for all of the variation in the scores. If the remaining SS is not due to variability within groups
(as measured by SSError), then it must reflect variability between groups; that is, differences
among the treatment means. In our example, SSTotal - SSError = 12.0 units, suggesting that 12.0
units of variability are accounted for by our treatment variable; that is, by differences between the
groups. This is SSTreatment (or SSBetween). If the two means were identical, then there would only be
variability within the groups, and SSError would equal SSTotal and SSTreatment would equal 0.
Direct calculation of SSTreatment. Although SSTreatment can be obtained by subtracting SSError
from SSTotal, as we have just done, it can also be calculated independently. Specifically, SSTreatment
can be calculated from the deviations of the group means (y&1 and y&2 ) from the Grand Mean
(y&G). This variability of treatment means about the grand mean represents the difference
between SSTotal and SSError. Calculating SSError, SSTreatment, and SSTotal independently demonstrates
directly that independent groups ANOVA divides or partitions the total variability in the
observations (SSTotal) into two distinct components, one of which represents variability within
groups (SSError) and the other variability between groups (SSTreatment). It also confirms that the
computations were done correctly.
The direct calculation of SSTreatment involves first computing the deviation of each
treatment mean from the grand mean; that is, ȳ1 - ȳG = 12.0 - 11.0 = +1.0, ȳ2 - ȳG = 10.0 - 11.0
= -1.0 (and so on, if there were more than two groups). Each deviation is referred to as the
treatment effect for that group. If all group means were identical (i.e., there were absolutely no
differences among group means), then the group means would all equal the grand mean and all
treatment effects would be zero. The only source of variation in the scores would be variability
within groups or error. The larger the treatment effects are (i.e., the larger the deviations of
group means from the grand mean), then the greater the variability among the group means (i.e.,
the larger their differences from one another and from the grand mean).
To obtain SSTreatment, square the deviations of the treatment means from the grand mean
(i.e., square the effects), multiply by the number of subjects in each group (n1 and n2), and sum
the resulting values. As shown in Box 2.2, these calculations lead to SSTreatment = 6 × (+1.0)² + 6 ×
(-1.0)² = 12.0, which agrees with the value obtained by subtraction. If there were more variability
among the group means (i.e., a bigger difference between the two means), then SSTreatment would
be larger than 12.0, and if there were less variability in the means then it would be smaller.
In summary, the ANOVA calculations divide the total variability for all 12 scores (SSTotal
= 48.0 units) into error variability within groups (SSError = 36.0 units) and treatment variability
between groups (SSTreatment = 12.0 units).
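The partitioning just described can be verified directly in a few lines of Python (an added sketch
using only the standard library; the scores are those in Box 2.2).

    # Partition SS_Total into SS_Error and SS_Treatment for the Box 2.2 data.
    control = [11, 14, 12, 9, 13, 13]
    treatment = [7, 12, 11, 9, 12, 9]
    groups = [control, treatment]

    scores = [y for g in groups for y in g]
    grand_mean = sum(scores) / len(scores)                  # 11.0
    group_means = [sum(g) / len(g) for g in groups]         # 12.0 and 10.0

    ss_total = sum((y - grand_mean) ** 2 for y in scores)   # 48.0
    ss_error = sum((y - m) ** 2
                   for g, m in zip(groups, group_means) for y in g)             # 36.0
    ss_treatment = sum(len(g) * (m - grand_mean) ** 2
                       for g, m in zip(groups, group_means))                    # 12.0

    print(ss_total, ss_error, ss_treatment)   # 48.0 36.0 12.0 (SS_Total = SS_Error + SS_Treatment)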
Calculating MSs, F, and significance. The next step in ANOVA is to determine whether
there is more variability between groups than we would expect by chance given the variability
within groups and the sample sizes. As we have seen before, the F statistic determines whether
two measures of variability (i.e., variances or Mean Squares) differ significantly from one
another. To perform the F test, SSTreatment and SSError are divided by their respective dfs to produce
Mean Squares (MSTreatment and MSError). These various quantities are presented in Box 2.3.
The df for SSError is equal to the sum of the dfs for the individual groups, (n1 - 1) + (n2 - 1)
= 5 + 5 = 12 - 2 = 10 = N - 2, where N represents the total number of observations. The rationale
for this df is that SSError is the deviation of N scores from two group means; hence N - 2. If there
were more than two groups (say k), the df would be N - k.
The df for SSTreatment equals the number of groups (two in our example) minus one, 2 - 1 =
1, because SSTreatment is the deviation of two group means about a single grand mean. For the
Single-Factor Between-S design in general, dfTreatment = k - 1, the deviation of k sample means
from a single grand mean. The sum of dfError and dfTreatment is 10 + 1 = 11 = N - 1 = dfTotal. These
dfs and the corresponding MSs (SS/df) are summarized in a standard ANOVA summary table in
Box 2.3. Note that MSError = 3.6 is identical to sp² from the t-test.
The final step in ANOVA is to determine whether MSTreatment is significantly greater than
MSError. If the null hypothesis of no difference among the group means is true, then MSTreatment and
MSError will be approximately equal to one another. The greater the variability between groups
relative to the variability within groups, the greater the likelihood that the differences are not due
to chance. The F statistic is the ratio of MSTreatment over MSError. If H0 is true, then F should be
approximately equal to 1. However, large values of F can occur by chance even when H0 is true.
But the probability of these large values is known; for example, given the present design, p(F ≥
4.96) = .05 if the H0 is true.
The observed value of F is 3.33. To determine whether 3.33 indicates significant
variability between groups, the observed F is compared to Fα, the value of F expected 100×α% or
less of the time if the H0 were true. As noted previously, tables of F present F values as a
function of dfs for the numerator and denominator of the F ratio. In our two-group, Between-S
ANOVA with 6 participants per group, dfNumerator = 2 - 1 = 1 and dfDenominator = 6 + 6 - 2 = 10.
Appendix A.3 gives the critical values of F at the .05 level for various degrees of freedom. For df
 Source        SS      df           MS      F
 Treatment    12.0     k - 1 = 1    12.0    3.33   [= t²]
 Error        36.0     N - k = 10    3.6           [= s²p]
 Total        48.0     N - 1 = 11

 H0: µ1 = µ2     Ha: (one of) the equalities is false, i.e., µ1 ≠ µ2
 F.05; 1, 10 = 4.96     Do not Reject H0     [√4.96 = 2.23 = t.05, 10]
 η² = 12.0/48.0 = .25     η = .5

Box 2.3. Between-S ANOVA Summary Table.
= 1 and 10, Fcritical = 4.96. Because FObserved is not greater than or equal to FCritical, the H0 of no
difference between the group population means cannot be rejected. The probability of an F of
3.33 or larger by chance alone is too high to reject the null hypothesis. If we were to reject the
null for F less than 4.96, then the probability that we would be rejecting a true H0 and making a
Type I error would be greater than .05, a risk that we are generally willing to take only under
exceptional circumstances (e.g., a pilot study, only minor consequences of Type I error).
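Continuing the sketch from the previous section, the Mean Squares, observed F, critical F, and p
value in Box 2.3 can be reproduced as follows (again an added illustration assuming scipy).

    # From SS and df to MS, F, the critical value, and the observed p.
    from scipy import stats

    ss_treatment, ss_error = 12.0, 36.0
    df_treatment, df_error = 2 - 1, 12 - 2                  # k - 1 and N - k
    ms_treatment = ss_treatment / df_treatment              # 12.0
    ms_error = ss_error / df_error                          # 3.6
    f_obs = ms_treatment / ms_error                         # 3.33

    f_crit = stats.f.ppf(1 - .05, df_treatment, df_error)   # 4.96
    p_obs = stats.f.sf(f_obs, df_treatment, df_error)       # .098
    print(f"F = {f_obs:.2f}, critical F = {f_crit:.2f}, p = {p_obs:.3f}")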
Equivalence of F and t for Two Groups
The equivalence of the t-test and ANOVA approaches is easy to demonstrate. Note first
that both tests led to the same conclusion (i.e., do not reject H0), and that the numerators and
denominators of t and F are sensitive to the same factors. The numerators both depend on
variation in the group means (either differences between two group means for t or deviations of
two or more group means from the grand mean for F). The variances (s2Pooled and MSError) are
identical in both cases and depend on variability within groups. The sample sizes also contribute
similarly to the magnitude of the two statistics, albeit by making the denominator smaller in the
case of t and the numerator larger in the case of F. Ultimately the equivalence of the two tests
rests on the fact that F = t² (equivalently, t = √F) and F.05;1,10 = t².05,10 (equivalently, t.05,10 =
√F.05;1,10). Thus, F and t always lead to the same conclusions about the null hypothesis. That is,
if F is greater than F.05, then t must also be greater than t.05. These various equivalencies appear
in square brackets ([ ]) in the preceding boxes.
The equality between Fα and t²α depends on the use of a two-tailed t-test. As typically
used, the F test is a two-tailed test of the difference between means because SSTreatment is
insensitive to the direction of the differences among the groups. SSTreatment in the present case
would have been exactly the same if the control group scored 10.0 and the treatment group
scored 12.0, the reverse of the present outcome. To do a one-tailed F test, if required, use a
critical value of F corresponding to the .10 level of probability. Half of this .10 (.05) would be
for the predicted direction of difference. To illustrate, F.10; 1, 10 = 3.29 and we would reject H0 that
the means are equal and accept Ha: µ1 > µ2. Note that √3.29 = 1.81, which is the critical value of
t for a one-tailed test with α = .05.
ETA² (η²) and Strength of Relationship
The standard F test can produce significant effects even for small differences between
groups. This can occur because larger samples produce larger values for t and F, other things
being equal. Because of this possible limitation of significance testing, researchers often report
measures of the strength of the relationship between categorical and numerical variables
(analogous to r² and R²). One such measure is called η² (eta squared). This statistic indicates the
proportion of total variability that can be attributed to treatment variability, that is to between
groups variability in the independent groups design. η² = SSTreatment/SSTotal = 12.0/48.0 = .25 in
this study. Stated verbally, the difference between groups accounted for 25% of the total
variability in the 12 scores.
SPSS ANALYSES FOR THE TWO-GROUP BETWEEN-S ANOVA
Because SPSS is a general-purpose statistical package that runs on various different
computer systems (platforms), we will demonstrate several approaches to the SPSS analyses in
this and subsequent chapters. For a general introduction to SPSS, see Appendix C. The
convention that we will follow here is to present text that users enter in bold font, with output in
non-bold. Also we will present SPSS command terms in uppercase letters and other text in
lowercase, although SPSS does not require that commands be entered as uppercase characters.
Finally, supplementary notes and calculations will be italicized when presented in the output.
For any Single-Factor Between-S design, users typically enter two numbers for each case,
one number to indicate which group the observation belongs to (i.e., the factor or independent
variable) and the other number to indicate the value observed for the criterion or dependent
variable. Such data can be entered in SPSS as shown in lines 1 to 5 of Box 2.4 (i.e., from DATA
LIST to END DATA). These commands would be typed into the syntax window (Windows
version) or job file (Unix version). Alternatively, the values can be typed directly into the SPSS
Data Editor in Windows (shown in Figure 2.1). The independent and dependent variables are
listed separately, here with six cases per row. The order of the agitation and group variables makes
no difference, as long as the order of the variable names in the DATA LIST statement
corresponds to the order of data. However entered, the data would ultimately be represented in
SPSS as shown in Figure 2.1.
DATA LIST FREE /group agit.
BEGIN DATA
1 11 1 14 1 12 1 9 1 13 1 13
2 7 2 12 2 11 2 9 2 12 2 9
END DATA.
TTEST GROUPS = group /VARIABLE = agit.

 Variable    Number of Cases     Mean    Standard Deviation   Standard Error
 -----------------------------------------------------------------------------
 AGIT
   GROUP 1          6          12.0000         1.789                .730
   GROUP 2          6          10.0000         2.000                .816

               |   Pooled Variance Estimate   |  Separate Variance Estimate
    F   2-tail |    t    Degrees of   2-tail  |    t    Degrees of   2-tail
  Value  Prob. |  Value   Freedom      Prob.  |  Value   Freedom      Prob.
 --------------+------------------------------+------------------------------
  1.25   .813  |  1.83      10         .098   |  1.83     9.88        .098

Box 2.4. SPSS Data Entry and t-test Analysis.
Figure 2.1. SPSS Data Editor with Values.
Figure 2.1 shows the SPSS Data Editor after the data has been entered manually or
entered using the DATA LIST commands in Box 2.4. We can
now enter commands in syntax to perform the desired tests, or
select analyses from the pull-down menus. Let us first perform
the t-test described earlier.
SPSS TTEST Analysis
Using syntax, the general format of the independent
groups TTEST command is: TTEST GROUPS =
independent[(low,high)] /VARIABLES = dependent, where
independent refers to one or more categorical predictor variables,
dependent refers to one or more dependent variables, and the
optional low, high value(s) are the levels of the variable to be
compared (if not equal to the default values of 1 and 2).
Line 6 of Box 2.4 shows the SPSS syntax to perform an independent groups t-test for our
data. The GROUPS = independent specification indicates the grouping variable (called group here, although it
could have any name selected by the researchers). The /VARIABLE = dependent command
specifies the dependent variable, agit in our case. The subsequent text in Box 2.4 is output
produced by SPSS. Compare this output to the calculations performed earlier. Note that the
analysis produces the independent groups t calculated above, tObserved = 1.83, for the pooled
variance estimate.
The two-tailed significance of t is .098. This means that if the H0 is true, then p(t ≥ +1.83
or t ≤ -1.83) = .098. Only 9.8 times out of 100 would we expect t to be greater than or equal to
1.83 or less than or equal to -1.83, if the H0 is true. This is a relatively low probability, but it does
not meet the standard value for alpha of .05. For a two-tailed (or non-directional) test, we would
not reject the H0 of no difference between the population means for the two groups. This is the
same conclusion that we arrived at earlier using the critical value approach. Note the equivalence
of the tCritical and p value approaches. If |tObserved| ≥ |tCritical| (i.e., absolute value of tObserved greater
than or equal to absolute value of tCritical), then pObserved ≤ α.
If we had predicted for theoretical reasons that the treatment group (Group 2) would have
lower agitation scores than the control group, then a one-tailed (or directional) test would be
appropriate. The observed probability would be divided by 2 (.098/2 = .049). We would now
reject H0, which is again the same conclusion that we arrived at earlier using the tCritical approach
and a one-tailed critical value.
TTEST also provides the means and standard deviations for the two groups, which is in
fact all that is required to perform the calculations for the t-test. Detailed calculations were
shown earlier, but briefly, the pooled variance used in the denominator of t, s2p = 36.0/10 = 3.6,
involves pooled SSs from the two groups {i.e., SSPool = (6-1)1.789² + (6-1)2.000² = 16.0 + 20.0 =
36.0} and pooled dfs from the two groups {i.e., dfPool = (n1 - 1) + (n2 - 1) = n1 + n2 - 2 = 10}. The
numerator is simply the difference between the means minus 0, the hypothesized value for the
difference.
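Because the group means, standard deviations, and ns are indeed all that the pooled t test needs,
the TTEST result can be reproduced from the summary statistics alone; the sketch below (added
here, assuming scipy) uses scipy's ttest_ind_from_stats for that purpose.

    # Pooled-variance t test computed from summary statistics only.
    from scipy import stats

    t_obs, p_two_tailed = stats.ttest_ind_from_stats(
        mean1=12.0, std1=1.789, nobs1=6,
        mean2=10.0, std2=2.000, nobs2=6,
        equal_var=True)
    print(f"t = {t_obs:.2f}, two-tailed p = {p_two_tailed:.3f}")   # t = 1.83, p = .098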
Box 2.4 also reports an F value to the left of the row containing the t. This F is a test of
the equivalence of the variances of the two groups, which is relevant to whether or not it is
justified to pool the two variances. This test was performed manually at the end of Chapter 1, and
we concluded using the FCritical approach not to reject the H0 of equal variances. Here we learn
that the probability of FObserved is not even close to significant. The way to interpret the p of .813
ONEWAY dependent BY factor
[/STATISTICS = DESCRIPTIVES]
[/RANGES = LSD[(α)] SNK[(α)] TUKEY[(α)] [/RANGES ...]]
[/POSTHOC = ...]
[/POLYNOMIAL = n] [/CONTRAST = coefficients [/CONTRAST ...]]
Box 2.5. Format for SPSS ONEWAY command.
ONEWAY agit BY group /STATISTICS = DESCRIPTIVE.

           N      Mean       Std. Deviation   Std. Error    95% Conf. Interval for Mean    Minimum   Maximum
                                                             Lower Bound    Upper Bound
 1.0000    6   12.000000       1.7888544       .7302967      10.122712      13.877288       9.0000   14.0000
 2.0000    6   10.000000       2.0000000       .8164966       7.901129      12.098871       7.0000   12.0000
 Total    12   11.000000       2.0889319       .6030227       9.672756      12.327244       7.0000   14.0000

                  Sum of Squares   df   Mean Square     F     Sig.
 Between Groups       12.000        1      12.000     3.333   .098
 Within Groups        36.000       10       3.600
 Total                48.000       11

Box 2.6. Independent Groups ANOVA with SPSS ONEWAY Procedure.
is: p(s1²/s2² ≥ 1.25 or s2²/s1² ≥ 1.25) = .813 if the H0 of equal variances is true. The results in Box 2.4
are from an early version of SPSS still used on some Unix systems, and later versions report a
different test of the equality of variances.
SPSS ONEWAY
ANOVA with a single predictor is sometimes called one-way ANOVA, hence the name of one of
several SPSS procedures that perform ANOVA for Single-Factor Between-S designs. The syntax for
ONEWAY is shown in Box 2.5, including some common options that are discussed in later
chapters. Dependent is the list of names for the dependent variables, factor is the name of the
independent variable or factor. The /STATISTICS = DESCRIPTIVE subcommand produces
descriptive statistics (i.e., means, standard deviations, and confidence intervals for the variables).
POLYNOMIAL, CONTRAST, and RANGES subcommands are explained later.
Box 2.6 demonstrates the use of SPSS ONEWAY to perform a single-factor Between-S
ANOVA for the agitation study. Agit is our dependent variable, and group the independent
factor. ONEWAY provides standard ANOVA results in a typical summary table: SSTreatment
(called Between Groups), SSError (called Within Groups), dfs, MSs, F, and p. The values agree
Figure 2.2. Menu Approach to Oneway ANOVA.
with earlier calculations and with the equivalent t-test results just reported. Note in particular that
the p values are equivalent, except for rounding, that √F = √3.33 = 1.83 = t, and that MSError =
s²Pool from the t-test analysis.
Although η² is not provided by ONEWAY, this could be calculated by dividing SSTreatment
(Between) by SSTotal. That is, η² = 12.0/48.0 = .25. This statistic informs both researchers and
readers that 25% of the total variability in the agitation scores (agit) is associated with
differences between the two groups. The remaining 75% is variability within the two groups.
Accounting for 25% of the variability is a relatively robust effect for psychology and indeed for
many other disciplines.
The F probability in the last column of the summary table is the probability of an F as
large as that observed given the null hypothesis is true, that is, p(F ≥ 3.333 | H0: µ1 = µ2) = .0979.
Recall that this is a two-tailed (non-directional) probability. Because .0979 is not less than or
equal to .05, we do not reject the null hypothesis. If a one-tailed (directional) hypothesis was
appropriate, then the p would be divided by two and it would be less than the alpha level of
.05; hence we would reject the null hypothesis and accept the alternative. That is, .0979/2 = .049,
which is less than .05 and falls in the rejection region.
To perform this analysis in SPSS for Windows, one could either enter the commands in
Box 2.6 into a syntax window or select the ANOVA analysis from the pull-down menus. Figure
2.2 shows the initial steps: Analyze | Compare Means | One-Way ANOVA.
Figure 2.3. Specification of One-Way ANOVA by Menu Commands.
Selecting One-Way ANOVA brings up the dialog box shown in Figure 2.3, at which point
users would select in turn the independent and dependent variables and direct them into the
appropriate field. In Figure 2.3, the “group” variable has already been selected and moved into
the Factor field. Agit is highlighted and clicking on the black arrow will move the name into the
Dependent List. Descriptive statistics could be selected via the Options button. Once all the
commands are complete, clicking on OK will produce output very similar or identical to that in
Box 2.6.
Other SPSS ANOVA Commands
SPSS has a number of other commands that perform ANOVA, and we will eventually
need to use these commands to analyze more complex designs. Note that ONEWAY, as its name
implies, can only analyze single factor designs, and only Between-S designs at that. Although
the specifics of the commands and their output may vary somewhat, all SPSS procedures include
in some manner the various components seen in ONEWAY; that is, some way to specify the
independent and dependent variables, some way to request descriptive statistics, and output that
includes the various statistics reported by ONEWAY (e.g., SSs, dfs, MSs, F, p) and even some
that were not (e.g., eta2).
MANOVA. Box 2.7 shows syntax commands and output for the MANOVA command, one
MANOVA agit BY group(1 2) /PRINT = CELLINFO.

 FACTOR        CODE          Mean   Std. Dev.    N
 GROUP         1           12.000       1.789    6
 GROUP         2           10.000       2.000    6
 For entire sample         11.000       2.089   12

 Source of Variation      SS    DF       MS      F   Sig of F
 WITHIN CELLS          36.00    10     3.60
 GROUP                 12.00     1    12.00   3.33       .098
 (Model)               12.00     1    12.00   3.33       .098
 (Total)               48.00    11     4.36

 R-Squared = .250     Adjusted R-Squared = .175

Box 2.7. SPSS MANOVA Analysis.
Figure 2.4. Running GLM from the Windows Menu.
of the most powerful ANOVA commands in SPSS. The first line shows the commands typed into
the SPSS job file, and the remainder is the output. The /PRINT=CELLINFO subcommand is
analogous to ONEWAY’s /STATISTICS=DESCRIPTIVES, and produces the cell means and
standard deviations shown before the ANOVA summary table.
The ANOVA summary table itself has similar columns to those seen in ONEWAY, and
includes information for Error (WITHIN CELLS), Treatment (GROUP), and Total. The values
on these lines are identical to those reported earlier. MANOVA includes an extra line, called
Model, that represents the SS for all of the factors and predictors in the design. This line is
redundant in the present case with Group because we only have a single factor. In more complex
designs, Model will aggregate (add) the effects of multiple factors.
MANOVA also reports a measure of the strength of the relationship, here labelled R² but
equal to the η² calculated earlier. Adjusted R² is a more conservative measure of the strength of
the relationship and adjusts for the fact that with small ns or many predictors, it is possible to
account for substantial variability just by chance. The MANOVA command is not accessible by
Windows menus, but can be typed into a syntax window. Its power makes MANOVA well
worth learning.
GLM. A command that is available in Windows and other more recent versions of SPSS
is GLM, which stands for General Linear Model. Like MANOVA, GLM is a very powerful
analytical tool that, nonetheless, can readily accommodate simple designs. We will illustrate the
use of GLM via menus. Figure 2.4 shows the initial steps.
Figure 2.5. GLM Dialogue Box.
We select Analyze | General Linear Model | Univariate. Note just below the main Analyze box
the syntax commands to run GLM and print out descriptive statistics and eta2.
Selecting Univariate initiates the GLM dialogue box shown in Figure 2.5. The independent
variable (group) has been selected and moved into the Fixed Factor(s) box, and the dependent
variable (agit) has been selected and moved into the Dependent Variable box. Descriptive and
other statistics could be selected from the Options button. A number of the choices in GLM will
be examined later when we talk about more complex designs. Clicking OK will initiate the
analysis.
The results of the GLM procedure are shown in Box 2.8. Following the descriptive
statistics, we see some now familiar results, namely the SS and df for Group, Error, and Corrected
Total, as well as the F, significance, and eta² for Group. The values and conclusions are the same
 GROUP       Mean    Std. Deviation    N
 1.00     12.0000         1.78885      6
 2.00     10.0000         2.00000      6
 Total    11.0000         2.08893     12

 Source              Type III Sum      df   Mean Square        F     Sig.
                      of Squares
 Corrected Model       12.000(a)        1        12.000      3.333   .098
 Intercept           1452.000           1      1452.000    403.333   .000
 GROUP                 12.000           1        12.000      3.333   .098
 Error                 36.000          10         3.600
 Total               1500.000          12
 Corrected Total       48.000          11

 a  R Squared = .250 (Adjusted R Squared = .175)

Box 2.8. GLM Results.
as the preceding analyses. Although it predicts 25% of the variability in agitation scores, as
shown by the R squared reported at the bottom, Group does not account for a significant amount
of the variability in the scores, p = .098, at least not by a non-directional (or two-tailed) test.
There are several additional results reported by GLM, including one, Corrected Model,
that was seen earlier in the MANOVA results. This represents the relationship of the dependent
variable to all factors and predictors in the model. Since Group is the only factor in our model,
the overall Model is redundant and the values are the same as those on the Group line.
Two other lines are completely new, and quite surprising at first because of the magnitude
of the numbers reported (e.g., SS Total = 1500.00). These additional lines are reported because
GLM attempts to account for all of the variability in the scores, including not just deviations
from the grand mean but also deviations from an absolute value of 0. Not all of this variability is
of interest to researchers or even meaningful in all situations. The Intercept in this context refers
to the deviation of the grand mean (11.0) from 0 and can be calculated as: N × (MG - 0)² = 12 ×
(11.0 - 0)² = 1452.0, the value shown as the intercept. This SS has df = 1 because we are
examining the deviation of a single mean about zero (we lose no degrees of freedom for 0
because it is not estimated from the data). Recall that our SSTotal had df = 12 - 1 = 11. The df for
the deviation of the grand mean from zero is the “missing” df. Note also the similarity between
this computation and the use of F to test hypotheses about a single mean, as shown in Chapter 1.
The SSTotal reported by GLM includes the variability due to the deviation of the grand
mean from 0 (just calculated as 1452.00) plus the variability in the scores about the grand mean
(our SSTotal = 48.0). The sum of these two quantities is 1500.00 and it has df = 12. In most cases,
although not all as shown in later chapters, the variability due to the deviation of the grand mean
from zero is ignored. Correcting for the grand mean gives the SSCorrected Total = 48.0 reported by
GLM in Box 2.8. This is what is normally referred to as SSTotal.
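The relationship among GLM's Intercept, Corrected Total, and Total sums of squares is easy to
confirm numerically (an added Python sketch using only the standard library; the 12 agitation
scores are those from Box 2.4).

    # GLM's Total SS = Intercept SS + Corrected Total SS.
    agit = [11, 14, 12, 9, 13, 13, 7, 12, 11, 9, 12, 9]
    n = len(agit)
    grand_mean = sum(agit) / n                                     # 11.0

    ss_intercept = n * (grand_mean - 0) ** 2                       # 1452.0
    ss_corrected_total = sum((y - grand_mean) ** 2 for y in agit)  # 48.0
    ss_total = sum(y ** 2 for y in agit)                           # 1500.0 (deviations from 0)
    print(ss_intercept, ss_corrected_total, ss_total)              # 1452.0 48.0 1500.0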
CONCLUSIONS
This chapter has laid the foundations for the Between-S ANOVA. We have seen that for
two groups, the Single-Factor Between-S ANOVA is equivalent to the independent groups t-test.
Although the two tests will always produce exactly the same conclusion for two groups, the F
test is more general than the t test because F can test the hypothesis that multiple means (i.e., two
or more) are equivalent, whereas t can only be used for two groups. In the next chapter, ANOVA
approaches are generalized to Between-S designs that include more than two groups.
CHAPTER 03:
SINGLE-FACTOR BETWEEN-S ANOVA (k ≥ 3)
Notation and Formula for Between-S ANOVA . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
Notation for Single-Factor Design . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
Between-S ANOVA for Revised Agitation Study (k = 3) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
ANOVA Calculations for k = 3 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
SPSS Analyses for k = 3 Agitation Study . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
An Interview Study (k = 4) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
Calculation of ANOVA for the k = 4 Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
SPSS Analyses for the Interview Study . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
A General Discussion of ANOVA . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
Rationale for F-test of Differences Among Means . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
Supplementary Discussion of ANOVA . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
Appendix 3.1: An Alternative Notation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
Chapter 3 extends ideas from the last chapter to the Single-Factor Between-S design
involving more than two groups. We begin with a notation for describing succinctly the
calculations that need to be performed for this design, and then demonstrate the use of SPSS to
analyze such designs, including again several ANOVA options, as well as Regression
approaches.
NOTATION AND FORMULA FOR BETWEEN-S ANOVA
Although the calculations for the single-factor Between-S ANOVA are fairly clear for the
two-group design, as illustrated in the preceding chapter, this is less the case for more than two
groups. It is therefore helpful to have a general notation and corresponding formula to represent
the statistical calculations.
Notation for Single-Factor Design
Box 3.1 presents a general schema for the Single-Factor Between-S design. The letter y
stands for the dependent variable, and several subscripts are used to identify particular
observations and statistics. There are k levels to the factor, with j serving as a general subscript
indicating the level; that is, j = 1, 2, ..., k. Thus, SS1, n1, and y&1 indicate statistics for the first
group (Sum of Squares, sample size, and mean, respectively), and SS2, n2, and y&2 indicate
statistics for the second group. In general SSj, nj, and y&j represent statistics for each group or
level of the factor, where the subscript j = 1, 2, ..., k.
A subject number (columns S# in Box 3.1) indicates the subject within each group, with i
serving as a general subscript indicating the subject; that is, i = 1, 2, ..., nj. The subscripts i and j
                               Levels of Factor
        1             2          ...         j          ...         k
   S#     y      S#     y               S#     y               S#     y
   1      y11    1      y12             1      y1j             1      y1k
   2      y21    2      y22             2      y2j             2      y2k
   .      .      .      .               .      .               .      .
   i      yi1    i      yi2             i      yij             i      yik
   .      .      .      .               .      .               .      .
   n1     ȳ1     n2     ȳ2              nj     ȳj              nk     ȳk          N    ȳG

Box 3.1. ANOVA Notation for Single Factor Design.
can be used to identify each observation by which subject and which group the observation
belongs to (see columns headed by y in Box 3.1). For example, y11 would be the first subject in
the first group, y32 would be the third subject in the second group, and so on. The general notation
for an observation would be yij, the ith observation in the jth group. Note that for the Between-S
design, observations for individual subjects in the various groups are independent or unrelated to
one another (i.e., subject 1 in group 1 has no pre-existing relationship with subject 1 in the other
groups).
In addition to group statistics, the ANOVA requires calculation of some statistics for all
observations across the groups. To indicate these operations, we use N to represent the total
number of observations (i.e., N = n1 + n2 + ... + nk), and y&G to represent the grand mean of all the
observations. Although not always needed, the subscript G (for Grand) can also be used for other
overall statistics, such as sG and sG2 for the overall standard deviation and variance, respectively.
It is now possible to write general formulas that describe the calculations for the Single-Factor
Between-S design. First, consider SSTotal. We need to calculate the deviation of each score from
the grand mean, square the deviations, and sum the squared deviations across all N subjects. This
would be written as: $SS_{Total} = \sum_{j=1}^{k}\sum_{i=1}^{n_j}(y_{ij} - \bar{y}_G)^2$. Verbally, this formula states to sum the squared
deviations from the grand mean within each of the groups, and then sum the squared deviations
across all groups. In practice, this is equivalent to calculating the mean for all of the scores
(ignoring group) and taking the squared deviations from that mean. Hence, SSTotal is the total
variability in all of the scores, with variability within groups and variability between groups
aggregated together.
The error variability is calculated as the sum of the variability of the scores about the
individual group means, which equals the sum of the SSs within each of the groups. Using our
notation, this becomes: $SS_{Error} = \sum_{j=1}^{k}\sum_{i=1}^{n_j}(y_{ij} - \bar{y}_j)^2 = \sum_{j=1}^{k} SS_j$. This formula says to sum the squared
deviations from the group mean within each group and then across all of the groups. This is
equivalent to calculating a SS for each group and summing the SSs across all k groups. Because
the deviations are taken from the group means rather than the grand mean, SSError represents
variability within groups but does not include variability between groups. Generally (i.e., unless
all group means are identical), SSError will be less than SSTotal. The difference is SSTreatment.
$$SS_{Total} \;=\; SS_{Error} \;+\; SS_{Treatment}$$

$$\sum_{j=1}^{k}\sum_{i=1}^{n_j}(y_{ij} - \bar{y}_G)^2 \;=\; \sum_{j=1}^{k}\sum_{i=1}^{n_j}(y_{ij} - \bar{y}_j)^2 \;+\; \sum_{j=1}^{k} n_j(\bar{y}_j - \bar{y}_G)^2$$

Equation 3.1. ANOVA Sums of Squares.
In addition to being calculated as a difference between SSTotal and SSError, SSTreatment can be
calculated directly using the formula: $SS_{Treatment} = \sum_{j=1}^{k}\sum_{i=1}^{n_j}(\bar{y}_j - \bar{y}_G)^2 = \sum_{j=1}^{k} n_j(\bar{y}_j - \bar{y}_G)^2$. The first
version of this formula makes clear that we are summing the squared deviations of the group
mean from the grand mean, first within each group and then across all of the groups. This is the
equivalent summation as for SSTotal and SSError. A simplification is possible for Treatment,
however, because the group mean is exactly the same for all nj observations in each group. That
is, we are adding up the identical deviation nj times, which is equivalent to multiplying the
deviation by nj (e.g., 1² + 1² + 1² + 1² + 1² + 1² = 6 × 1²).
The three SSs for the Single-Factor Between-S ANOVA are summarized in Equation 3.1,
which also demonstrates the partitioning of SSTotal into its two components, SSError and
SSTreatment. Look closely at the bottom formula in Equation 3.1 and ignore the summation and
squaring operators. The resulting equality, (yij - ȳG) = (yij - ȳj) + (ȳj - ȳG), makes clear that the
left-hand side of the equation is equal to the sum of the two components on the right-hand side
(-ȳj and +ȳj cancel each other).
Verbally, the equation demonstrates that the deviation of the scores from the Grand Mean (which
produces SSTotal) consists of two components: the deviation of individual scores from the group
means (which produces SSError), and the deviation of group means from the Grand Mean (which
produces SSTreatment).
Equation 3.1 also demonstrates clearly the rationale for the dfs for each of the three
components. SSTotal represents the deviation of N observations from a single grand mean;
therefore, dfTotal = N - 1. SSError represents the deviation of N observations from k group means;
therefore, dfError = N - k. Finally, SSTreatment represents the deviation of k treatment means from a
single grand mean; therefore, dfTreatment = k - 1.
The final steps involve the calculation of the MSs and F ratio. These operations are
summarized in Equation 3.2.
$$MS_{Treatment} = \frac{SS_{Treatment}}{k - 1} \qquad MS_{Error} = \frac{SS_{Error}}{N - k} \qquad F = \frac{MS_{Treatment}}{MS_{Error}} \qquad df = k - 1,\ N - k$$

Equation 3.2. ANOVA Mean Squares.
The notation and formulas presented here are based on a standard notation that is often used
with Single-Factor, Between-S designs. There are limitations to this notation, however, for the
Within-S and Factorial designs discussed later. We will need to modify the notation at that time.
To facilitate the transition to more complex designs, an alternative notation is presented in a
supplementary section at the end of this chapter. In that notation, a letter (A) is used to represent
both the name of the factor and its number of levels (instead of using k), the lower-case a is used
to represent the generic level (rather than j), and s is used to indicate subjects (rather than i). We
will continue to use n and N to indicate the number of subjects.
BETWEEN-S ANOVA FOR REVISED AGITATION STUDY (K = 3)
The generalization of the independent groups ANOVA to a single predictor with more
than two groups is straightforward given the formulas just presented. Note that these formulas
accommodate different values for k, the number of groups.
ANOVA Calculations for k =3
In the standard ANOVA approach for k independent groups, SSTotal is calculated as the
squared deviations of all observations from the Grand Mean (i.e., the mean based on all N
observations). SSError is the sum over the k groups of the squared deviations of observations from
their respective Group Means (i.e., the sum of the SSs within each of the k groups). SSTreatment is
the sum of the squared deviations of the k Group Means from the Grand Mean multiplied by the
number of observations in each group. This was all summarized in Equation 3.1.
ANOVA Summary Table
Source     SS     df         MS      F
Between    3.5    k-1 = 2    1.750   .354
Within     44.5   N-k = 9    4.944
Total      48.0   N-1 = 11

H0: µ1 = µ2 = µ3     Ha: one or more equalities false
F.05;2,9 = 4.26      Do Not Reject H0
η² = 3.5/48 = .073   η = .27   Strength (R², R)

Box 3.3. ANOVA for 3-group (k=3) Design.
Box 3.2 shows the 12 agitation scores used in the previous example, but now divided into
3 groups, rather than 2 (i.e., k = 3). The ANOVA computations and results are also presented in
Box 3.2. SSTotal = 48.0, as before. Note that this must be the case because the 12 scores are
exactly the same; no variation has been removed or added by dividing the 12 scores into 3 groups
of 4. What has changed is the partitioning of SSTotal.
SSError = 44.5 is the sum of the SSs within each of the three groups. Note that the sum of
the variability within groups is almost equal to the SSTotal, suggesting that there is not much
variability among the three treatment means. The SSTreatment is only 3.5 = SSTotal - SSError, which
can also be calculated directly from the squared deviations of the group means from the grand
mean (see Box 3.2).

         Group 1   Group 2   Group 3
           11        13        11
           14        13         9
           12         7        12
            9        12         9

ȳj        11.50     11.25     10.25        ȳG = 11.00
ȳj - ȳG    +.50      +.25      -.75        Treatment Effects
SSj       13.00     24.75      6.75

SSTotal = 48.0 = ΣΣ(y - ȳG)² = (11 - 11)² + (14 - 11)² + ... + (9 - 11)²
SSError = 44.5 = ΣΣ(y - ȳj)² = ΣSSj = Σ(nj - 1)s²j
        = (11 - 11.5)² + (14 - 11.5)² + ... + (9 - 10.25)² = 13 + 24.75 + 6.75
SSTreatment = 3.5 = Σnj(ȳj - ȳG)² = 4(11.5 - 11)² + 4(11.25 - 11)² + 4(10.25 - 11)² = 4(.5² + .25² + (-.75)²)

Box 3.2. Results and SSs for k = 3 Example.

Box 3.3 shows the ANOVA summary table for this study. Given SSTreatment and SSError, the next step is to
determine dfs for these sources of variability, compute MSs, and then an F ratio. The dfTreatment is k
- 1 = 3 - 1 = 2 because SSTreatment represents the deviations of 3 Group Means from a single Grand
Mean. The dfError is N - k = 12 - 3 = 9 because SSError represents the deviations of N = 12
observations from 3 Group Means (or equivalently, the sum of the within-groups dfs for the three
groups, each of which is 4 - 1 = 3). Note that dfTreatment + dfError = 11 = 12 - 1 = N - 1 = dfTotal. The
calculations for MSs (SS/df) and F (MSTreatment/MSError) follow the general principles summarized
above. If the null hypothesis that the group population means are equal (i.e., µ1 = µ 2 = µ 3) is
true, then the expected value of the F ratio is approximately 1. If the group population means
differ (i.e., H0 false), then variability among the treatment means will be larger than expected
relative to variability within groups and F should be correspondingly larger than 1. Our observed
value of F is .354, which is clearly not significant. In fact, even if H0 were true, we would expect
F to be 4.26 or larger 5% of the time with df = 2, 9. Our F does not even come close to this
critical value. The measure of the strength of the relationship, η², is also modest; only 7.3% of the
variability in the scores is due to differences between groups.
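For readers who want to verify Box 3.3 outside of SPSS, a brief Python sketch (assuming the scipy library is available; this is a supplementary check, not part of the book's procedures) reproduces the omnibus F, its p value, and η² from the 12 agitation scores in Box 3.2.

from scipy import stats

g1, g2, g3 = [11, 14, 12, 9], [13, 13, 7, 12], [11, 9, 12, 9]   # Box 3.2 scores
F, p = stats.f_oneway(g1, g2, g3)
print(round(F, 3), round(p, 3))      # 0.354 0.711, matching Box 3.3

eta_squared = 3.5 / 48.0             # SS_Treatment / SS_Total from Box 3.2
print(round(eta_squared, 3))         # 0.073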
It is instructive to note why there is little evidence of a difference between the groups in
this study, whereas the two-group analysis approached significance and was even significant by a
one-tailed test. The 3-group effect does not even approach significance because the new division
of scores placed two relatively high scores (two 13s) from the former Group 1 into the new Group
2 along with the lowest score (7) from the former Group 2. The redistribution of these high and
low scores resulted in the three group means being very similar (i.e., SSTreatment decreased) and the
variability within groups being quite high (i.e., SSError increased; note that SS2 = 24.75 is
particularly high). Hence, less of the total variability ended up in the numerator and more in the
denominator, resulting in the much smaller value for FObserved. Even if SSTreatment had been as large
as for k = 2, note that dfTreatment = 3 - 1 = 2 for k = 3, which would result in a smaller MSTreatment and
F.
SPSS Analyses for k = 3 Agitation Study
All of the SPSS commands covered earlier, with the possible exception of the regression
approach, can be easily generalized to k > 2 studies. Indeed, several of the methods require
absolutely no changes whatsoever. We illustrate with the ONEWAY Anova.
DATA LIST FREE / group agit.
BEGIN DATA
1 11 1 14 1 12 1 9
2 13 2 13 2 7 2 12
3 11 3 9 3 12 3 9
END DATA.
ONEWAY agit BY group /STATISTICS = DESCRIPTIVE .
          N    Mean        Std. Deviation   Std. Error   95% Conf. Int. for Mean
                                                         Lower        Upper
1.0000    4    11.500000   2.0816660        1.0408330    8.187605     14.812395
2.0000    4    11.250000   2.8722813        1.4361407    6.679559     15.820441
3.0000    4    10.250000   1.5000000        .7500000     7.863165     12.636835
Total     12   11.000000   2.0889319        .6030227     9.672756     12.327244

                  Sum of Squares   df   Mean Square   F      Sig.
Between Groups    3.500            2    1.750         .354   .711
Within Groups     44.500           9    4.944
Total             48.000           11
Box 3.4. SPSS ANOVA for k=3 groups.
ONEWAY for k = 3 Independent Groups. The generalization of ONEWAY to k > 2 levels
of a Between-S factor is straight-forward. Box 3.4 shows the data entry and ANOVA commands
to analyze the three-group agitation problem. The results of the ANOVA agree with those
calculated earlier and indicate no significant variability among the group means; an F of .3539 has
a very high probability of occurring if H0: µ1 = µ2 = µ3 were true. Specifically, p(F ≥ .3539 given
H0: µ1 = µ2 = µ3) = .7113. Group also accounts for a modest 7.3% of the variability in agitation
scores: η² = 3.5/48.0 = .073.
The /STATISTICS = DESCRIPTIVE subcommand produced the additional statistics shown
above the ANOVA summary table, including group means, standard deviations, standard errors of
the means, and confidence intervals for the means. Minimum and maximum scores were also
reported by the DESCRIPTIVE option, but these have been deleted from the printout. The 95%
confidence intervals in Box 3.4 show the range within which the population means would be
expected to fall 95% of the time. Note that the ranges are quite wide and overlap one another to a
considerable degree. Moreover, the overall confidence interval for the grand mean encompasses
the means for all three groups. This overlap among the group means is consistent with the
ANOVA conclusion that the group means do not differ significantly from one another.
These data could also be analyzed by MANOVA or GLM, with equivalent results. To
perform the equivalent analysis using regression, we would need k - 1 predictors in order to
ensure that the predicted scores would be the cell means and that the regression analysis would
parallel completely the ANOVA results.
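The regression route is easy to sketch in code. The example below is our own illustration (assuming numpy; it is not an SPSS command): it codes the k = 3 agitation groups with k - 1 = 2 dummy predictors, so the fitted values are the cell means and R² equals the ANOVA η² of .073.

import numpy as np

y = np.array([11, 14, 12, 9, 13, 13, 7, 12, 11, 9, 12, 9], dtype=float)
group = np.repeat([1, 2, 3], 4)

# k - 1 = 2 dummy predictors (group 3 serves as the reference category)
X = np.column_stack([np.ones_like(y),
                     (group == 1).astype(float),
                     (group == 2).astype(float)])
b, *_ = np.linalg.lstsq(X, y, rcond=None)    # least-squares regression weights
pred = X @ b                                  # predicted scores are the cell means
r_squared = 1 - np.sum((y - pred) ** 2) / np.sum((y - y.mean()) ** 2)
print(np.round(pred[::4], 2), round(r_squared, 3))   # [11.5  11.25 10.25] 0.073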
AN INTERVIEW STUDY (K = 4)
Consider now the analysis of a more realistic social psychology study, although still
hypothetical. A total of 40 undergraduate students were randomly assigned to one of four groups
(10 per group). Each student rated the attractiveness (from 1 to 7) of five participants in a
videotaped interview. Although participants viewed and rated the same videos, they received
different instructions about the alleged purpose: Group 1 was given No Specific Instructions,
Group 2 was told the videos were of Job Interviews, Group 3 that they were Psychiatric
Interviews, and Group 4 that they were Parole Interviews for community release of prison
inmates. Subjects received a single score that was the sum of their ratings of the five interviews,
with scores ranging from 5 (all ones) to 35 (all sevens).
This study is a single-factor, Between-S design with k = 4. The four levels of the factor
are defined by the four experimental conditions, and the factor is Between-S because there is no
basis for matching specific scores in one group with specific scores in the other groups. Stated
differently, there is no reason to expect a correlation between scores obtained in the four different
groups because there are no pre-existing relationships between correspondingly numbered
subjects in the different conditions (i.e., subject 1 in each group is neither the same subject as
subject 1 in other groups, nor related to those subjects in any way).
Calculation of ANOVA for the k = 4 Example
Box 3.5 shows descriptive statistics for the four groups and for the entire set of 40
observations. The means and standard deviations are in fact all that is needed to perform a
Between-S ANOVA for this study. SSTotal can be calculated from the standard deviation of the 40
observations: since, sG2 = SSTotal / (N - 1), then SSTotal = (N - 1)s2 = 823.985. The standard
deviation for each group can be similarly manipulated to produce the SSjs that must be summed to
produce SSError = 498.199, as shown in Box 3.5. Finally, the deviation of each cell mean from the
grand mean is squared, multiplied by nj, and summed to produce SSTreatment = 325.800, which is
very close to the value obtained from SSTotal - SSError.

Group (nj = 10)
                1         2         3         4
ȳj            27.30     26.00     22.70     20.00       ȳG = 24.00
ȳj - ȳG       +3.3      +2.0      -1.3      -4.0
sj            3.3015    3.6209    4.2177    3.6818      sG = 4.5965

SSj =    (10-1)3.3015²  (10-1)3.6209²  (10-1)4.2177²  (10-1)3.6818²
    =       98.099         117.998        160.101        122.001

SSTotal = 823.985 = (40-1)4.5965²
SSError = 498.199 = 98.099 + 117.998 + 160.101 + 122.001
SSGroup = 325.800 = 10(3.3² + 2.0² + (-1.3)² + (-4.0)²) ≈ 823.985 - 498.199

ANOVA Summary Table
Source      SS       df    MS        F        FCritical (.05;3,30)
Treatment   325.80   3     108.600   7.847    2.92
Error       498.20   36    13.839
Total       824.00   39

Box 3.5. Independent Groups ANOVA for Interview Study.
These values are reproduced in the ANOVA summary table in Box 3.5, along with
dfTreatment = k - 1 = 4 - 1 = 3, dfError = N - k = 40 - 4 = 36, and dfTotal = N - 1 = 40 - 1 = 39. Dividing
SSs by their respective dfs produces MSTreatment and MSError, and F = MSTreatment / MSError = 108.60 /
13.839 = 7.847, with dfNumerator = 3, dfDenominator = 36. The table of critical F values (Appendix A.3)
does not include dfDenominator = 36, and the next smallest df = 30 is used instead. This gives FCritical =
2.92. We reject H0: µ1 = µ2 = µ3 = µ4, and accept HA that one or more of these equalities is false.
FObserved is also greater than FCritical for alpha = .01.
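The Box 3.5 computations can also be verified in a few lines of code. The sketch below (assuming scipy; a supplementary check rather than part of the book's SPSS workflow) rebuilds the SSs from the group means and standard deviations and computes the exact critical and p values that a printed table can only approximate.

from scipy import stats

n, N, k = 10, 40, 4
means = [27.3, 26.0, 22.7, 20.0]
sds = [3.3015148, 3.6209268, 4.2176876, 3.6817870]
grand_mean = sum(means) / k                                    # 24.0 with equal ns

ss_error = sum((n - 1) * s ** 2 for s in sds)                  # 498.2
ss_treatment = sum(n * (m - grand_mean) ** 2 for m in means)   # 325.8
F = (ss_treatment / (k - 1)) / (ss_error / (N - k))            # 7.847

print(round(ss_error, 1), round(ss_treatment, 1), round(F, 3))
print(round(stats.f.ppf(.95, k - 1, N - k), 2))   # exact F critical for df = 3, 36 (about 2.87)
print(round(stats.f.sf(F, k - 1, N - k), 4))      # p value, about .0004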
Although the researchers can reject H0 of no differences, the alternative hypothesis is quite
vague about which groups differ from one another. Indeed, it takes no stand on this important
question. The group means in Box 3.7 indicate that the No Instruction and Job Interview
Instructions (i.e., Groups 1 and 2) received higher attractiveness ratings than the Psychiatric
Interview Instructions (Group 3), and the Parole Interview Instructions (Group 4), with the latter
being only slightly lower than Group 3. Later chapters will discuss additional analyses that are
DATA LIST FREE / subj inst rate.
BEGIN DATA
 1 1 26   2 1 26   3 1 28   4 1 23   5 1 29
 6 1 26   7 1 31   8 1 34   9 1 24  10 1 26
11 2 21  12 2 30  13 2 26  14 2 25  15 2 28
16 2 30  17 2 20  18 2 30  19 2 24  20 2 26
21 3 28  22 3 17  23 3 22  24 3 26  25 3 27
26 3 21  27 3 18  28 3 19  29 3 21  30 3 28
31 4 20  32 4 24  33 4 23  34 4 16  35 4 19
36 4 18  37 4 19  38 4 19  39 4 15  40 4 27
END DATA.
Box 3.6. Interview Results and SPSS Commands to Enter Data.
ONEWAY rate BY inst /STATISTICS = DESCRIPTIVE.
          N    Mean        Std. Deviation   Std. Error   95% Conf. Int. for Mean      Minimum   Maximum
                                                         Lower Bound   Upper Bound
1.0000    10   27.300000   3.3015148        1.0440307    24.938239     29.661761      23.000    34.000
2.0000    10   26.000000   3.6209268        1.1450376    23.409745     28.590255      20.000    30.000
3.0000    10   22.700000   4.2176876        1.3337499    19.682848     25.717152      17.000    28.000
4.0000    10   20.000000   3.6817870        1.1642833    17.366208     22.633792      15.000    27.000
Total     40   24.000000   4.5965427        .7267772     22.529954     25.470046      15.000    34.000

                  Sum of Squares   df   Mean Square   F       Sig.
Between Groups    325.800          3    108.600       7.847   .000
Within Groups     498.200          36   13.839
Total             824.000          39
Box 3.7. Independent Groups ANOVA for Interview Study.
often done following the omnibus ANOVA, as the overall test of significance is called.
SPSS Analyses for the Interview Study
The interview study is a single-factor, Between-S design with k = 4, and can be analyzed using a
number of SPSS procedures. The raw
data are presented in Box 3.6, along with the syntax commands to enter this data into SPSS. Each
case has three variables: an optional subject code from 1 to 40 (subj), an instructional condition
code from 1 to 4 (inst), and a total rating score from 5 to 35 (rate). Alternatively, the data could be
entered directly into the SPSS Data Editor, or stored in a separate file to be read by SPSS (as
described in a later chapter).
ONEWAY Analysis of Interview Study. Box 3.7 shows the syntax commands to perform a
ONEWAY ANOVA for this data, along with the resulting output. The only changes from
previous commands are the new variable names. The ANOVA summary table shows clearly that
the means for the four groups differ significantly, p(F ≥ 7.8475 | H0 true) = .0004, which agrees
with our earlier calculations and conclusions. The researchers would reject the null hypothesis
Figure 3.1. Mean Attractiveness Ratings by Instructions.
that the rated attractiveness of the interviews was not affected by the instructions; that is, reject
H0: µNone = µJob = µPsyc = µParole. An examination of the means suggests that groups 1 and 2 (No
Instructions or Job Interview) were rated higher than groups 3 and 4 (Psychiatric or Parole
Interview). Note that the confidence intervals for groups 1 and 2 do not include the means for
groups 3 and 4 (and vice versa). Follow-up analyses will allow us to isolate more specifically the
significant differences among the various means, as well as any overall pattern across the four
means.
We earlier demonstrated the basic calculations for the components of the Between-S
ANOVA in Box 3.7. SSTotal can be obtained from the standard deviation of the entire set of 40
scores, and includes between-groups variability due to instructions plus random variation among
subjects within the groups. The variability within groups is computed by summing the four SSs
for the groups (i.e., SSError = ΣSSj). The group SSs can in turn be calculated from the group
standard deviations using the general formula, SSj = (nj - 1)sj². The variability that is not within-
group variability must be due to variability between groups. SSTreatment represents the deviations
of the treatment means from the grand mean of 24.0; the greater these deviations, the larger the
effect of the treatment. SSTreatment (or SSBetween) can also be calculated by subtraction, although this
provides no independent assurance that the three quantities have been calculated correctly.
Although not computed by earlier versions of SPSS ONEWAY, η² can be calculated from
SSBetween and SSTotal; η² = 325.8/824.0 = .395. Instructions account for 39.5% of the variability
in the attractiveness scores. This is a robust effect, as can be seen in Figure 3.1. This figure can be
produced in various ways, including the /PLOT = MEANS option in ONEWAY. The pattern of
means permits some inferences about the effect of instructions, but does not reveal which means
or combinations of means differ significantly from one another. The omnibus ANOVA that we
just performed only demonstrates that some combination of the groups differs significantly from
some other combination of the groups. Supplementary analyses described in later chapters permit
more precise conclusions when k > 2.
GLM Analysis of Interview Study. Box 3.8 analyzes the interview study using GLM,
another SPSS ANOVA command, and one that we will use more later with factorial designs. The
basic format of GLM is quite similar to ONEWAY: GLM dependent BY factor, although the MIN,
MAX values for the factor are not required, and to request the descriptive statistics we use /PRINT
= DESCRIPTIVES. The output is also formatted somewhat differently, and presents some
different values. The descriptive statistics section is straightforward, and presents means and
standard deviations for the entire data set and for each of the four groups.
As noted previously, the ANOVA summary table is quite different and can be somewhat
confusing. First, let us find the values noted previously. Our SSTotal appears in the Corrected
Total row under the Type III SS column. The SS of 824.0 and df = 39 agree with earlier
calculations and printouts. The SSError appears on the Error row, and the SSTreatment appears on a
row labeled INST, the name of our independent factor for this study. The MSs for INST and
Error, as well as the F and significance (Sig.) for INST, also agree with earlier results.

GLM rate BY inst /PRINT = DESCRIPTIVE.

INST     Mean        Std. Deviation   N
1.0000   27.300000   3.3015148        10
2.0000   26.000000   3.6209268        10
3.0000   22.700000   4.2176876        10
4.0000   20.000000   3.6817870        10
Total    24.000000   4.5965427        40

Source            Type III SS   df   Mean Square   F          Sig.
Corrected Model   325.800       3    108.600       7.847      .000
Intercept         23040.000     1    23040.000     1664.874   .000
INST              325.800       3    108.600       7.847      .000
Error             498.200       36   13.839
Total             23864.000     40
Corrected Total   824.000       39

a  R Squared = .395 (Adjusted R Squared = .345)

Box 3.8. Single Factor Between-S ANOVA Using GLM.

Let us now
consider some of the new quantities presented.
The rows labelled Intercept and Total are perhaps the most anomalous, especially the
extremely large SSs. The Intercept SS represents the deviation of the overall mean from 0; it is
calculated as N × (MG - 0)² = 40 × (24.0 - 0)² = 23,040.00. The further the grand mean is from 0,
the greater this value. Much of the time this quantity is not of interest, although there are
occasions when researchers do want to know whether the overall mean for a sample of scores
deviates significantly from some hypothesized value, such as 0. The SS Intercept would be the
appropriate numerator for such a test. In the present case, we obtain an F of 1,664.874, a huge
value but of no interest because a hypothesized value of 0 is meaningless (note that the lowest
possible score is 5, making a mean of 0 impossible in the present study). What GLM calls Total
is actually the sum of our SSTotal = 824.0 (i.e., deviations of scores about the grand mean) plus the
SSIntercept = 23040.0 (i.e., the variability in the grand mean about 0). Adding these values together
gives 23864.000, as shown in Box 3.8.
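As a quick arithmetic check of the preceding paragraph (a sketch of our own, not SPSS output), the Intercept and Total rows in Box 3.8 can be reproduced directly:

N, grand_mean = 40, 24.0
ss_corrected_total = 824.0                        # our SS_Total: deviations about the grand mean
ss_intercept = N * (grand_mean - 0) ** 2          # 23040.0: deviation of the grand mean from 0
ss_glm_total = ss_corrected_total + ss_intercept  # 23864.0: the row GLM labels "Total"
print(ss_intercept, ss_glm_total)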
MANOVA command. Box 3.9 shows the equivalent output using MANOVA to perform
the analysis of variance. Although labelled somewhat differently than in ONEWAY or GLM, the
same values are reported.
MANOVA rate BY inst(1 4) /PRINT = CELLINFO.
FACTOR   CODE   Mean     Std. Dev.   N    95 percent Conf. Int.
inst     1      27.300   3.302       10   24.938   29.662
inst     2      26.000   3.621       10   23.410   28.590
inst     3      22.700   4.218       10   19.683   25.717
inst     4      20.000   3.682       10   17.366   22.634
For entire sample       24.000   4.597   40   22.530   25.470

Source of Variation   SS       DF   MS       F      Sig of F
WITHIN CELLS          498.20   36   13.84
inst                  325.80   3    108.60   7.85   .000
(Model)               325.80   3    108.60   7.85   .000
(Total)               824.00   39   21.13

R-Squared = .395   Adjusted R-Squared = .345
Box 3.9. MANOVA Analysis of Instruction Study.
A GENERAL DISCUSSION OF ANOVA
Having now completed a general introduction to ANOVA, we will briefly review a rationale for the
F-test of differences among the means, and also examine some other general issues.
Rationale for F-test of Differences Among Means
The trick in using the F-statistic to test hypotheses about the variability in population
means is to construct one variance that represents random variability in the data (the denominator)
and another variance that represents random variability PLUS systematic differences between the
means (the numerator). The value of F is expected to increase as the variability in the population
means increases. The use of F to test differences between means was just demonstrated; the
resulting F was 3.333.
The denominator for the F test is the pooled variance, sp² = MSError = 3.6, which represents
the random variation within the two groups. The quantity estimated by sp² is σ², the variance of
the population from which the samples were selected. Another way of saying this is that the
Expected Value of sp² is σ².
Box 3.10 shows an alternative way to calculate the numerator of F to test the difference between
means. The two sample means can be used to calculate a second variance, the variance of the
means. This variance is obviously sensitive to differences among means. This variance will be zero if the
sample means are identical and becomes larger as the difference between the sample means
increases. The calculation of the variance of the means uses the standard formula, except that the
scores are means instead of individual observations. It might appear unusual to calculate a
variance based on just two scores (means), as here, but absolutely nothing precludes it.
In Box 3.10, the numerator of the F test is the variance of the means multiplied by the
number of subjects in each group (n1 = n2 = 6). We multiply by nj because the null hypothesis for
the F-statistic is equality between the population variances from which the sample variances are
calculated, and multiplying nj times the variance of the means produces a numerator that has the
same expected value as the denominator (i.e., σ²) when the null hypothesis of equal means is true.

sp² = 3.6
Mȳ = (12.0 + 10.0) / 2 = 11.0
SSȳ = (+1.0)² + (-1.0)² = 2.0
s²ȳ = 2.0 / (2 - 1) = 2.0
F = n × s²ȳ / sp² = 6 × 2.0 / 3.6 = 3.333

Box 3.10. Alternative Calculation of F Ratio.
To understand the reasoning, consider the Central Limit Theorem (CLT), which you may
have learned in previous statistics courses. The CLT states that if an infinite number of samples
are randomly selected from a single population, then σ²ȳ = σ²y / n; that is, the variance of the sample
means will equal the variance of the original population divided by n, the sample size. Basic
algebra leads to n × σ²ȳ = σ²y. Therefore, the Expected Value of nj × s²ȳ is σ², the same as the
Expected Value for the denominator, sp². If H0: µ1 = µ2 is true, then the numerator and
denominator of F should be approximately equal (both are estimates of σ²) and F should not take
on large values. If the null hypothesis is false, however, there will be more variability in the
sample means than expected based on the CLT, and F will be larger. How large F must be to
reject the null hypothesis is determined by the critical value of F. The basic logic of ANOVA,
then, is that if there is no variation among the group means in the population, we would expect the
numerator and denominator of the F ratio to be approximately equal. If there is variation among
the group means in the population, we expect the numerator to be larger than the denominator,
and F should be correspondingly larger.
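A small simulation can make the CLT argument concrete. The sketch below is our own (assuming numpy; the population mean, σ, and sample size are arbitrary illustrative values): it draws many samples from a single population and shows that n times the variance of the sample means comes out close to σ², the same quantity that MSError estimates.

import numpy as np

rng = np.random.default_rng(1)
sigma, n, n_samples = 4.0, 10, 100_000

# each row is one random sample of n observations from a single population
means = rng.normal(loc=50.0, scale=sigma, size=(n_samples, n)).mean(axis=1)
print(round(n * np.var(means, ddof=1), 2))   # close to sigma**2 = 16 when H0 (one population) is true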
Box 3.11 shows a similar calculation for the interview study. We first calculate a variance for the
means, and then compute an F ratio with this variance times the number of subjects in the
numerator and MSError in the denominator. This produces the same F as in earlier boxes.

Group    ȳj       ȳj - ȳG
1        27.30    +3.3
2        26.00    +2.0
3        22.70    -1.3
4        20.00    -4.0
                            ȳG = 24.00

s²ȳ = (3.3² + 2.0² + (-1.3)² + (-4.0)²) / (4 - 1) = 32.58 / 3 = 10.860
F = (10 × 10.860) / 13.839 = 108.60 / 13.839 = 7.847

Box 3.11. Alternative Calculation of F for Interview Study.
Supplementary Discussion of ANOVA
Here we consider several
supplementary points that might help in
understanding ANOVA and the use of the F test to test hypotheses about differences between
means. The first point has to do with a consideration of factors that influence the magnitude of F,
and the second with the directionality of the hypotheses tested by the standard F test.
Factors Promoting a Significant F. Examination of earlier calculations for t reveals that
the value of t will increase as its numerator (ȳ1 - ȳ2) increases and as its denominator decreases
(i.e., as the Standard Error, SEȳ1-ȳ2, gets smaller). The Standard Error will be smaller when there
is little variability within the two groups (i.e., SS1 and SS2 are small), and when the ns for the two
samples are large (i.e., 1/n1 + 1/n2 is small). Stated verbally, t is more likely to be significant
when there is a large difference between the means, when there is little variability within the
groups, and when samples are large.
These same characteristics determine the magnitude of F in ANOVA. MSTreatment is larger
as the differences between the means are larger (i.e., as their distance from a common central
value, ȳG, becomes larger). Unless the purpose of a study is to detect modest or minute
differences, researchers generally should aim to manipulate or measure robust differences among
the groups being compared. Comparing 80-year old people to 30-year olds, for example, is a more
powerful manipulation than comparing 60-year olds to 50-year olds.
The denominator of F is also sensitive to the same factors as the denominator of t.
Researchers should try to minimize the SSjs that determine SSError. This means such things as
trying to ensure that all testing is done under as standard conditions as possible (e.g., instructions
that are tape recorded or read verbatim, minimizing distractions), and trying to avoid excessive
variation in participants within groups (e.g., including only participants with similar
backgrounds). Researchers also need to ensure an adequate number of participants since SSError is
divided by N - k to produce MSError, which should be kept as small as possible.
The F Distribution and One- versus Two-Tailed Tests. As noted earlier, the F
distribution is asymmetrical and researchers generally arrange hypothesis tests so that the null
hypothesis is rejected if F is greater than some value (i.e., if F falls sufficiently far in the upper tail
of the distribution). Despite the fact that only one tail of the distribution is used, the ANOVA is
actually a two-tailed test, or to use a less ambiguous terminology, a non-directional test. The
critical factor in deciding whether a test is one- or two-tailed (i.e., directional or non-directional)
is whether the test statistic and its probabilities are sensitive to the direction of the alternative
hypothesis. In testing differences between means, the F-statistic is not sensitive to the direction of
the difference between the means. In the interview study, for example, the same value of F would
have been obtained if the various means were associated with different groups.
Single-Factor Between-S Design SPSS - 3.18
A second way to think of why the default ANOVA is generally two-tailed is by relating the
F and t distributions. Squaring t can be conceptualized as folding over the t distribution vertically
at 0 to produce the F distribution. In squaring t to produce F, both negative and positive values of
t become positive. The upper tail of the resulting F distribution will contain both positive and
negative values of t (i.e., values of t corresponding to both directions of outcome). For a two-
tailed t test, each tail will contain α/2 (often .025) that will both be folded into the upper tail for F.
Hence, the corresponding F distribution will contain a two-tailed α (e.g., .05 = 2 x .025) in its
upper tail. For a one-tailed test, however, t has α (.05) in each tail of the distribution so the
corresponding F distribution (i.e., the folded-over t-distribution) will contain 2×α in the upper tail
(e.g., .10 = 2 x .05).
Being two-tailed, the probabilities in Table A.3 must be divided in half to obtain one-
tailed probabilities. If the desired α is .05 for a one-tailed test, then 2 × α = .10 is the area that we
must use in Table A.3 to determine the critical value of F for a one-tailed test. For our example, df
= 1 and 10, FCritical = 3.28 = 1.812² = tCritical². Given Fobserved = 3.33, we reject H0: µ1 = µ2 and
accept Ha: µ1 > µ2. This is the same conclusion as produced by the one-tailed independent t-test.
If we had a two-tailed Ha: µ1 ≠ µ2, then F.05 = 4.96 (.05 = .025 + .025) would be the critical value
and we would not reject H0.
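These relationships between t and F critical values are easy to confirm numerically. The following is a sketch assuming scipy is available; the Table A.3 values quoted above are rounded versions of the same quantities.

from scipy import stats

# one-tailed t at alpha = .05 (df = 10), squared, equals F with .10 in its upper tail
print(round(stats.t.ppf(.95, 10) ** 2, 3), round(stats.f.ppf(.90, 1, 10), 3))   # both about 3.285

# two-tailed t at alpha = .05, squared, equals the usual F critical value
print(round(stats.t.ppf(.975, 10) ** 2, 3), round(stats.f.ppf(.95, 1, 10), 3))  # both about 4.965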
CONCLUSION
In this and the preceding chapter, we have examined a number of important concepts, and
laid the foundations for the omnibus analysis of studies that involve a single Between-S
categorical predictor (or Factor). We have acquired a general notation for the calculations to
perform ANOVA for the single-factor Between-S design, and seen how a number of SPSS
procedures can perform such analyses. In particular, SPSS provides diverse ways to conduct
single-factor Between-S ANOVA, including the procedures ONEWAY, GLM, and MANOVA. All three can be
specified using syntax and all but MANOVA can also be requested via the Windows menu system.
The format of the output varies somewhat for the different Anovas and versions of SPSS, but all
include the basic quantities calculated previously.
Although the overall (or omnibus) F allows researchers to determine whether they can
reject the null hypothesis of no differences among the groups, it does not provide specific
Single-Factor Between-S Design SPSS - 3.19
information about which groups differ; that is, about the pattern of differences among the group
means. The next few chapters address this question and, in doing so, shed light on the nature of
the predictors in regression approaches to ANOVA.
APPENDIX 3.1: AN ALTERNATIVE NOTATION
Although standard for single-factor Between-S designs, the notation presented in this
chapter does not lend itself readily to generalizing to multiple factor studies. There we will use
sequential uppercase letters to represent both the treatment variables and the number of levels
(i.e., A, B, ...), and lowercase letters, rather than j, as an indicator for the different levels (i.e., a =
1 to A, b = 1 to B, ...). To facilitate the transition to this alternative notation, the single-factor
Between-S notation and corresponding formulas are shown below.

Level of A
        1             2             ...    a             ...    A
        S#    y       S#    y              S#    y              S#    y
        1     y11     1     y12            1     y1a            1     y1A
        2     y21     2     y22            2     y2a            2     y2A
        .     .       .     .              .     .              .     .
        s     ys1     s     ys2            s     ysa            s     ysA
        .     .       .     .              .     .              .     .
        n1    ȳ1      n2    ȳ2             na    ȳa             nA    ȳA      N    ȳG

$$SS_{Total} = \sum_{a=1}^{A}\sum_{s=1}^{n_a}(y_{as} - \bar{y}_G)^2$$

$$SS_{Error} = \sum_{a=1}^{A}\sum_{s=1}^{n_a}(y_{as} - \bar{y}_a)^2 = \sum_{a=1}^{A}SS_a = \sum_{a=1}^{A}(n_a - 1)s_a^2$$

$$SS_{Treatment} = \sum_{a=1}^{A}\sum_{s=1}^{n_a}(\bar{y}_a - \bar{y}_G)^2 = \sum_{a=1}^{A}n_a(\bar{y}_a - \bar{y}_G)^2$$

$$MS_{Treatment} = \frac{SS_{Treatment}}{A - 1} \qquad MS_{Error} = \frac{SS_{Error}}{N - A} \qquad F = \frac{MS_{Treatment}}{MS_{Error}} \qquad df = A - 1,\ N - A$$
CHAPTER 04:
PAIRWISE COMPARISONS FOR THE
SINGLE-FACTOR BETWEEN-S DESIGN
Introduction to Pairwise Comparisons . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
Overview of Post-Hoc Procedures . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . . . . 4
Selecting a Pairwise Procedure . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . . . . . 4
Summarizing Results of Pairwise Comparisons . . . . . . . .. . . . . . . . . . . . . . . . . . . . . . . . 6
Least Significant Difference Method . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
Description of LSD Test . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . . . 7
Calculating the LSD Test for the Brain Stimulation Study . . . . . . . . . . . . . . . . . . . . . . . . 8
SPSS and LSD tests for the Brain Stimulation Study .. . . . . . . . . . . . . . . . . . . . . . . . . . 10
The LSD Procedure and Type I Errors . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . . . . 11
The Bonferroni Correction . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
Bonferroni using Syntax . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . . 13
Tests Using the Studentized Range Statistic . . . . . . . .. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
Studentized Range Statistic . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . . . . 14
LSD and the q Statistic . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . . . 16
TUKEY Honestly Significant Difference . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . . . 16
The Student-Newman-Keuls Method . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . 17
Tabular Arrangement of Comparisons . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . . . 18
More SPSS Post Hoc Analyses for the Brain Stimulation Study . . . . . . . . . . . . . . . . . . . 21
Post Hoc Comparisons Using Windows . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 22
Comparison of Post Hoc Procedures . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . . . 24
Pairwise Comparisons for the Interview Study . . . . . .. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
The LSD t Test . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 26
The q Statistic . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . . 27
SPSS Analyses for the Interview Study . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . . . . . 29
Calculating Critical and P Values for Post Hoc Tests . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . . . . . . 34
The alternative hypothesis in ANOVA is quite general; it states only that one or more of
the H0 equalities is false. With more than a few means, there are many possible ways in which
the H0 can be false. Rejection of the H0 does not therefore permit a precise conclusion about
which means or combinations of means are different from one another. In the interview study,
for example, the omnibus F-test (i.e., the F for the general comparison among the four groups)
does not specify which instructional conditions or which combinations of conditions (e.g., 1+2
versus 3+4) differed significantly from one another. Less obviously, failure to reject an omnibus
H0 does not necessarily mean that more specific differences or comparisons would not be
significant. The overall F-test can be insensitive to more specific comparisons, especially when k
is large. This can occur, for example, when SSTreatment is divided by df = k - 1 even though most of
SSTreatment is in fact due to differences between one group and the rest, a df = 1 effect.
INTRODUCTION TO PAIRWISE COMPARISONS
The purpose of multiple comparison techniques is to test specific hypotheses about
differences between means, following either a significant or nonsignificant omnibus F. This
two-step approach is analogous to testing the significance of the overall regression in Multiple
Regression (i.e., F for R²), accompanied by tests of the significance for specific components
represented by the regression coefficients or srs (i.e., ts for bs or Fs for SSChange).
This and the next chapter describe pairwise procedures to compare individual means with
one another (e.g., ȳ1 versus ȳ2). Pairwise comparison procedures are typically used when
researchers have no prior expectation about which groups should differ, and are performing what
are called post hoc or a posteriori comparisons (i.e., after the fact). Chapter 6 describes
techniques that can be used for either pairwise comparisons or for more complex comparisons
that involve multiple means, such as (ȳ1 + ȳ2)/2 versus ȳ3. Those methods are most appropriate
when researchers have planned or predicted before examining the data which groups should
differ from one another and the direction of difference. Such comparisons are called a priori or
planned comparisons, or contrasts. Although researchers would ideally plan focussed contrasts
sensitive to specific predictions about what means are expected to differ from one another, many
Anova situations (perhaps too many?) call for post hoc procedures.
Either all k × (k - 1) / 2 possible pairwise comparisons are made (OR fewer specific comparisons
are made based on examination of the means), and the omnibus F-test is significant.

Box 4.1. Conditions for Post Hoc Pairwise Comparisons.
Overview of Post-Hoc Procedures
A wide variety of procedures have been
developed for performing pairwise
comparisons. The post hoc pairwise procedures
considered in this chapter are typically used
under the conditions stated in Box 4.1, although
exceptions to these principles do exist for some
post hoc procedures. The most common use of post-hoc procedures is when the researchers want
to compare all possible pairs of means. With 4 groups, for example, there are 4 × 3 / 2 = 6
pairwise comparisons involving the following combinations of groups: 1-2, 1-3, 1-4, 2-3, 2-4,
and 3-4. With 5 groups, there would be 5 × 4 / 2 = 10 pairwise comparisons, specifically, 1-2, 1-
3, 1-4, 1-5, 2-3, 2-4, 2-5, 3-4, 3-5, and 4-5.
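The counting rule k × (k - 1)/2 is just the number of ways to choose two groups from k, as a brief Python sketch (our own illustration, not part of the book) shows:

from itertools import combinations

for k in (4, 5):
    pairs = list(combinations(range(1, k + 1), 2))
    print(k, len(pairs), pairs)   # 6 pairs for k = 4, 10 pairs for k = 5, as listed above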
A second important requirement for most post hoc procedures is that the omnibus F be
significant. According to this criterion, a non-significant F means that researchers are not
permitted to “poke around” looking for significant differences among all the possible pairwise
comparisons. The reason for this, as explained later, is that looking at all possible pairwise
comparisons results in a highly inflated Type I error rate. Requiring a significant omnibus F
provides some modest protection against the Type I error.
We discuss four pairwise comparison methods (of many in the literature): the Least
Significant Difference (LSD) method, the Bonferroni procedure, one of several methods
developed by Tukey (TUKEY), and the Student-Newman-Keuls (SNK) method. Because the
BONFERRONI test benefits from statistical programs to compute the required probabilities, this
test will be discussed primarily in the context of SPSS analyses for pairwise comparisons.
Selecting a Pairwise Procedure
The above pairwise comparison methods vary in how conservative or liberal they are with
respect to the probability of a Type I Error. The LSD method is excessively liberal for most
research purposes and has a high probability that one of the comparisons found significant is in
fact a Type I Error. The TUKEY method is quite conservative (i.e., not liberal) and keeps the
probability of one or more Type I Errors under stricter control. The SNK method falls between
the LSD and TUKEY methods on the liberal-conservative scale. The Bonferroni is even more
conservative than TUKEY.
In a complementary way, the methods also vary in their power; that is, in their ability to
detect differences between group means when they do exist in the population. Power is the
complement of a Type II Error, which is a failure to reject a false null hypothesis, that is, power =
1 - ß = 1 - p(Type II Error). The LSD method is the most powerful and has the lowest probability
of a Type II Error, the SNK method is next most powerful, followed by TUKEY and then
BONFERRONI, which is the least powerful and has the highest probability of a Type II Error.
Note how protection against Type I and Type II Errors are inversely related. The
procedure that protects most against the Type I Error (BONFERRONI) protects least against the
Type II Error, and the procedure that protects least against the Type I Error (LSD) protects most
against the Type II Error. Researchers often must juggle to balance the competing demands for
protection from Type I Errors and for power to detect differences when they do exist. Some of
the post hoc procedures considered here are NOT generally recommended because they are either
too liberal or too conservative with respect to Type I errors.
Although we only consider four of many post hoc procedures, a natural question is which
of these procedures to use. This question is even more urgent, of course, if one considers the full
range of options that are available in performing pairwise and other post hoc comparisons. The
choice ultimately boils down to a balancing of Type I and Type II errors. Let us consider two
extremes that represent prototypical studies for the use of liberal (e.g., LSD) and conservative
(e.g., TUKEY, BONFERRONI) procedures.
Consider a pilot study in which a relatively small number of participants have been tested
in order to determine whether a larger-scale study is worthwhile, and the larger-scale study does
not entail inordinate costs for the experimenter or risks for subjects. Here the researcher might
be more concerned about missing a possible real effect (i.e., making a Type II Error) than about
wrongly concluding that there was a real difference (i.e., making a Type I Error). The researcher
might choose to use a liberal procedure (e.g., LSD) in order to perform pairwise comparisons,
and might even be willing to use a more liberal alpha value for the omnibus F (e.g., α = .10).
Consider next a study in which a reasonable number of participants have been tested, and
very real and perhaps expensive consequences are going to happen as a result of the statistical
conclusions. For example, perhaps some expensive albeit risky medical treatment would be
recommended for large numbers of patients if the H0 is rejected. Now we can appreciate that
Type I Errors are very important. That is, the researcher does not want to wrongly conclude that
a particular treatment is effective when it is not because this conclusion would entail much
expense and some (unnecessary) risk for clients. Instead the researcher now wants to be quite
certain that the treatment is worth the expense and risk. The prudent researcher would decide to
use a conservative procedure (e.g., TUKEY, Bonferroni) to perform the pairwise comparisons.
Although, unhappily, real-life research seldom involves such clear-cut distinctions, these
prototypes symbolize the factors that researchers must balance in deciding about the most
appropriate post-hoc procedure. The primary question is whether the costs and dangers of Type I
Errors are more or less serious than the costs and dangers of Type II Errors. We begin with the
LSD procedure, which would be most appropriate when an elevated probability of a type I error
held little risk or cost.
Summarizing Results of Pairwise Comparisons
Because there are so many comparisons being made, the results of pairwise comparison
tests can become confusing. Several methods have been developed for summarizing the results
of multiple tests of significance. One convention is to order the means from smallest to largest,
and draw lines under means that are NOT significantly different from one another (i.e., group
means together as sets whose members are not significantly different from one another).
The procedure is illustrated in Box 4.2 for several outcomes involving three groups and
different patterns of results. Outcome 1 in Box 4.2 shows groups 2 and 1 and groups 1 and 3
joined by distinct lines, indicating that the 2-1 and 1-3 t-tests were not significant. There is no
shared line joining groups 2 and 3, indicating that this difference was significant. Outcome 2
shows a single line joining all three groups (i.e., putting all in a single set). This indicates that all
three comparisons were nonsignificant. The final outcome, outcome 3, shows that the difference
between groups 2 and 1 was not significant (i.e., they belong to the same set), leading to the
inference that 2-3 and 1-3 comparisons were significant.
Group        2      1      3
Mj =         3.0    5.0    8.0

Outcome 1    ---------             2-1 NS
                    ---------      1-3 NS
                                   2-3 Sig

Outcome 2    ----------------      2-1, 2-3, 1-3 NS

Outcome 3    ---------             2-1 NS
                                   2-3, 1-3 Sig

Box 4.2. Alternative (hypothetical) outcomes for pairwise comparisons.
Note that the conclusion in outcome 1 is somewhat anomalous, because groups 2 and 3
do differ significantly, although neither differs significantly from group 1. One common difficulty
with pairwise comparison procedures is that the overall pattern of differences may not be very
easy to describe and may be even more difficult to explain. This is less likely to happen using a
priori procedures that test for particular patterns in the means.
LEAST SIGNIFICANT DIFFERENCE METHOD
One simple method of performing pairwise comparisons is to compute t-statistics for each
pair of means and compare the observed ts to the critical value for a desired α. The LSD method
is in fact a t-test, sometimes called a protected t-test because the omnibus F should be significant
before these post hoc ts are done. If the omnibus F is significant, then some means or
combination of means differ from one another, thus justifying (weakly) pairwise comparisons
using the t-test. This method is called the Least Significant Difference method because the
difference between means required for significance is less than in other post hoc procedures.
Description of LSD Test
Equation 4.1 shows one formula for the LSD t-test, which is a slight modification of the
standard t-test formula. In the formula, j and j’ represent any two of the k groups being
compared to one another (e.g., j = 1, j’ = 2). The only difference from the standard formula for t
is that MSError appears in the denominator instead of the pooled variance, sp2. Because the
denominator uses MSError rather than sp2, all of the LSD t tests will involve the same denominator
as long as nj and nj’ are the same (e.g., as in studies that involve equal njs). This greatly simplifies
computations. The use of MSError also means that the df for the t are the same as for dfError.
$$H_0: \mu_j = \mu_{j'} \qquad H_a: \mu_j \neq \mu_{j'}$$

$$t = \frac{(\bar{y}_j - \bar{y}_{j'}) - 0}{\sqrt{MS_{Error}\left(\frac{1}{n_j} + \frac{1}{n_{j'}}\right)}} = \frac{q}{\sqrt{2}} \qquad df = df_{Error}$$

Equation 4.1. Formula for LSD t Test.
Area of Stimulation (nj = 3)
          NS(1)    A(2)    B(3)
ȳj        5.0      3.0     8.0       ȳG = 5.333

H0: µ1 = µ2 = µ3
Ha: at least one pair or combination of µs differ

Source      df         SS     MS      F
Treatment   k-1 = 2    38.0   19.00   9.5
Error       N-k = 6    12.0   2.00
Total       N-1 = 8    50.0

F.05;2,6 = 5.14     FObs ≥ Fα  ∴ Reject H0

Box 4.3. Single-Factor Between-S ANOVA.
Equation 4.1 shows as well the relationship between the t statistic and the q statistic used
for the SNK and Tukey procedures and described in Chapter 5. Dividing q by the square root of
2 gives the corresponding t, or equivalently, multiplying t by the square root of 2 gives the
corresponding q. These equivalencies can be used to relate either observed values of the
statistics, that is, tObserved and qObserved, or the critical values of the statistics, that is, tCritical and
qCritical. These equivalencies help us on occasion with calculations of the different statistics, and
as well in conceptualizing the relationship among the different post hoc procedures.
Calculating the LSD Test for the Brain Stimulation Study
The multiple comparison procedures will be illustrated first using the brain stimulation study in
Box 4.3. Nine animals were randomly assigned to one of three groups: a No Stimulation control
group (A), and a critical area B treatment group (B). Researchers measured the number of bar
presses per minute after 10 sessions in the conditioning phase of the study (pressing rewarded by
brain stimulation). Because observations in the different groups are uncorrelated, this is a
between-S ANOVA design. Calculations for the omnibus ANOVA are shown in Appendix 4-1.
The significant F in Box 4.3 indicates that there is more variability among the treatment means
than would be expected given the null hypothesis of no treatment effect (i.e., no differences
among the three population µs or, equivalently, that the three samples come from the same
population and therefore share a common µ).
H0: µi = µj     Ha: µi ≠ µj (two-tailed)

t = (ȳi - ȳj) / √(MSErr(1/ni + 1/nj))     compare to tα/2, dfErr = N - k

LSD Comparisons for Sample Study
df = N - k = 9 - 3 = 6     t.05,6 = 2.447
denominator of t = √(2(1/3 + 1/3)) = 1.1547

tA-NS = (3 - 5) / 1.1547 = -1.732     No Rej H0: µA = µNS
tB-NS = (8 - 5) / 1.1547 =  2.598     Reject H0: µB = µNS
tB-A  = (8 - 3) / 1.1547 =  4.330     Reject H0: µA = µB

Summary       A      NS     B        Groups ordered from lowest
ȳj =          3.0    5.0    8.0      to highest mean
              -----------            A and NS NOT significantly different

Box 4.4. Least Significant Differences (LSD) Analysis.
Although the H0: µ1 = µ2 = µ3 is rejected, the researchers cannot state which means or
combinations of means differ from one another. The ANOVA only shows that some pairs or
combination of means vary, but not which one(s). Multiple comparison procedures provide more
specific conclusions about the differences between the means. One option is to perform all
pairwise comparisons, a situation that normally surfaces when researchers do not have specific
expectations about which groups will differ.
Box 4.4 summarizes the LSD calculations and shows the LSD results for the brain stimulation
study. The shared denominator for the t-test is calculated using MSError from the omnibus
ANOVA in Box 4.3, and the
denominator is then used to calculate three ts, one each for A - NS, NS - B, and A - B. The
differences between B and the other two groups are significant, whereas the control groups A and
NS do not differ significantly from one another. The quantities shown in square brackets [] with
the t-values will be used later to compare the LSD and other multiple comparison procedures.
One convention to describe pairwise comparison results, as described previously, is to
order the means from smallest to largest, and draw lines under means that are NOT significantly
different from one another. The summary is presented at the bottom of Box 4.4. The LSD test
shows that A and NS are not significantly different, but both differ from B. A single line is
therefore drawn under A and NS. The line does not extend under B, which differs significantly
from both A and NS. The line indicates that groups underlined together belong to a set of groups
whose means do not differ significantly from one another. Note this corresponds to outcome 3 in
Box 4.2. We later illustrate a second way to summarize pairwise comparison results in table
format. In the next few sections we illustrate various ways to perform the LSD test using
ONEWAY and GLM.
SPSS and LSD tests for the Brain Stimulation Study
Using syntax, the /POSTHOC = method option on ONEWAY provides a number of
standard methods for performing pairwise comparisons, including the Least Significant Difference
t-test (LSD). The term in brackets is the SPSS keyword for this method. To illustrate, the
command ONEWAY dep BY group /POSTHOC = LSD requests a between-S ANOVA followed
by unadjusted comparisons among all pairs of means (i.e., a t-test or the equivalent, a q-statistic
using a stretch of 2; the q statistic is discussed in the next section). The different pairwise
procedures discussed here and in following sections can be requested in one analysis by including
several different tests (e.g., /POSTHOC = LSD SNK). Researchers can also specify the desired
alpha level for some of the tests, for example, /POSTHOC = LSD (.10). ONEWAY also still
recognizes an earlier terminology for these post hoc tests; specifically, the /RANGES =
subcommand is equivalent to /POSTHOC =.
Box 4.5 shows the syntax commands to enter the data and perform pairwise comparisons
for our brain stimulation example, along with the omnibus ANOVA and LSD results. As shown
previously, the omnibus F is significant, suggesting that one or more of the equalities implied by
the H0: µ1 = µ2 = µ3 are false. Pairwise comparisons may help to determine which groups differ.
The /POSTHOC = LSD subcommand in Box 4.5 requests the LSD test also be reported. The
omnibus F is followed by the results for the post hoc tests. The results here have been edited to
remove some lines in the output as SPSS performs redundant comparisons (e.g., 1 vs. 2 and 2 vs.
1).

DATA LIST FREE / group press.
BEGIN DATA
1 4 1 5 1 6
2 3 2 2 2 4
3 10 3 8 3 6
END DATA.
ONEWAY press BY group /POSTHOC = LSD.

                  Sum of Squares   df   Mean Square   F       Sig.
Between Groups    38.000           2    19.000        9.500   .014
Within Groups     12.000           6    2.000
Total             50.000           8

Post Hoc Tests
Multiple Comparisons
LSD
(I) group   (J) group   Mean Difference (I-J)   Std. Error   Sig.
1.0000      2.0000       2.0000000               1.1547005    .134
            3.0000      -3.0000000(*)            1.1547005    .041
2.0000      3.0000      -5.0000000(*)            1.1547005    .005

Box 4.5. SPSS commands for ANOVA and pairwise comparisons, with the LSD results.

The I and J columns denote the levels of the factor that are being compared; I = 1 and J = 2
indicates that group 1 is being compared to group 2. The mean difference column provides the
difference between the two respective means, and the standard error column provides what would
be the denominator for the t-test. SPSS then provides significance levels for each comparison, but
not actual observed values for the test statistics, either t or the q statistics discussed later.
Dividing the mean difference by the standard error would produce the t-value.
The p values confirm our previous manual analyses. Groups 1 and 2 do not differ
significantly, whereas groups 1 and 3 and groups 2 and 3 do differ significantly. The summary of
these results would be identical to that shown in Box 4.4.
The identical results to Box 4.5 can be obtained using the following GLM commands:
GLM press BY group /POSTHOC = group(LSD), or by requesting the LSD procedure using
menus and either ONEWAY or GLM.
The LSD Procedure and Type I Errors
A serious problem with multiple t-tests for post hoc comparisons is that the probability of
a Type I Error is controlled at α for each individual comparison, but the probability of one or more
Type I Errors for the entire experiment or family of comparisons can become very high across the
k × (k - 1) / 2 comparisons. A good analogy would be tossing a coin 6 times and thinking that the
probability of obtaining a head somewhere in those 6 tosses is only .5. The probability of a head on each
toss of the coin is .5, but the probability of at least one head (i.e., one or more heads) on 6 tosses is much higher than .5. In fact,
 k    c = # pairwise comparisons = k × (k-1)/2              αExp ≈ 1 - (1 - α)^c
 2     1   (12)                                                     .050
 3     3   (12 13 23)                                               .143
 4     6   (12 13 14 23 24 34)                                      .265
 5    10   (12 13 14 15 23 24 25 34 35 45)                          .401
...
10    45   (12 13 14 15 16 17 18 ...)                               .940
Box 4.6. Experiment-wise Type I Error Rate.
the probability of one or more heads is one minus the probability that no heads are tossed on the 6
trials, which equals 1 - .5⁶ = 1 - .016 = .984. There is a very high probability that one or more
heads will occur on 6 tosses of a fair coin, and it would be incorrect to use p = .5 rather than .984
as the probability.
Most of the time, multiple t-tests are similarly inappropriate for post hoc comparisons
because α for the experiment would be excessively high. The probability of one or more Type I
Errors across all the comparisons
of the experiment is sometimes
referred to as the experimentwise
(or familywise) α, symbolized
here as αExp; α by itself refers to
the probability of a Type I Error
for each comparison. Box 4.6
shows how αExp quickly
increases as the number of comparisons increases. Consider the case of all pairwise comparisons
between k = 4 groups. There are 4 × (4 - 1) / 2 = 6 possible t-tests (1-2, 1-3, 1-4, 2-3, 2-4, 3-4),
with the probability of a Type I Error for each individual comparison being at the specified α level
(say .05). The calculation for the probability of at least one Type I Error is: αExp = 1 - p(no Type I
error) ≈ 1 - (1 - α)⁶ = 1 - .95⁶ = 1 - .735 = .265. This is much higher than the .05 level that
researchers normally strive for. As shown in Box 4.6, the 45 comparisons among 10 groups
would virtually guarantee one or more Type I errors (p = .94).
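For readers who want to reproduce these experimentwise rates, the following SPSS syntax sketch uses only commands already illustrated in this chapter (DATA LIST, COMPUTE, FORMAT, LIST); the variable names k, c, and aexp are ours, and ** is the SPSS exponentiation operator:

DATA LIST FREE / k.
BEGIN DATA
2 3 4 5 10
END DATA.
* Number of pairwise comparisons and experimentwise alpha for a per-test alpha of .05.
COMPUTE c = k * (k - 1) / 2.
COMPUTE aexp = 1 - (1 - .05) ** c.
FORMAT k c (F2.0) aexp (F5.3).
LIST.

For k = 4 this reproduces the .265 shown in Box 4.6.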
THE BONFERRONI CORRECTION
One way to correct for the inflated probability of a Type I
error when conducting pairwise comparisons is to divide the p-value required for significance (i.e., alpha) by the total
number of comparisons being made. If three comparisons are made with an experiment-wise
alpha of .05, for example, a difference must produce a p-value of .017 (i.e., .05 / 3 = .017) in order
to be significant. This procedure is known as the Bonferroni correction and is available in several
of SPSS’s ANOVA programs. Bonferroni procedures generally need computer assistance in order
to determine the observed ps required to conduct the tests (or equivalently the critical values for t
or q given non-standard p values). Given an alpha of .017, we can see from the p values in Box
4.5 that only the difference between groups 2 and 3 will be significant using this procedure.
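To make the arithmetic of the correction explicit, here is a minimal sketch in SPSS syntax (variable names ours) that takes the three LSD p values from Box 4.5 and applies both versions of the rule: dividing alpha by 3 and, equivalently, multiplying each p by 3:

DATA LIST FREE / comp plsd.
BEGIN DATA
12 .134
13 .041
23 .005
END DATA.
* Per-comparison alpha under Bonferroni, and the Bonferroni-adjusted p values.
COMPUTE alphaper = .05 / 3.
COMPUTE pbonf = plsd * 3.
COMPUTE sig = (pbonf <= .05).
FORMAT comp (F2.0) alphaper plsd pbonf (F5.3) sig (F1.0).
LIST.

Only the 2 - 3 comparison should come out with sig = 1, matching the conclusion above.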
Bonferroni using Syntax
Box 4.7 shows the SPSS commands and output for the Bonferroni test of our three-group
study. The format of the command is similar to the LSD test, but with a different posthoc
procedure specified. Box 4.7 uses the keyword BONFERRONI (or rather its abbreviation). The
significance levels in Box 4.7 can be compared directly to the standard alpha of .05. In essence,
SPSS conducts the Bonferroni adjustment by multiplying the p values for the LSD procedure by
the number of comparisons. The reasoning is that any LSD p value that is significant when
compared to .05 divided by the number of comparisons, will be significant if multiplied by the
number of comparisons and compared to .05. Note that the p values in Box 4.7 are 3 times the p
values in Box 4.5. The results confirm that only the difference between groups 2 and 3
(stimulation areas A and B) is significant.
The important lesson from the increased p values, which can also apply to other post hoc
procedures, is that the Bonferroni procedure is very conservative and perhaps should be avoided
unless there are exceptionally harmful consequences to making a Type I Error. Using such a
conservative test necessarily increases greatly the probability of a Type II Error (i.e., failing to
reject a false null hypothesis).
One anomaly that can arise when p values are multiplied by the number of comparisons is
that the resulting product may be greater than 1.0, something that cannot occur with a probability.
ONEWAY press BY group /POSTHOC = BONF.
...
Post Hoc Tests
Multiple Comparisons
Bonferroni
(I) group   (J) group   Mean Difference (I-J)   Std. Error    Sig.
 1.0000      2.0000          2.0000000           1.1547005    .402
             3.0000         -3.0000000           1.1547005    .122
 2.0000      3.0000         -5.0000000(*)        1.1547005    .015
Box 4.7. Bonferroni test using ONEWAY.
In that case, SPSS prints 1.0 as the p value for that comparison.
The Bonferroni test can be specified in the GLM procedure. The syntax commands for our
data would be: GLM press BY group /POSTHOC = group(BONF). The Bonferroni test is also
one of the options on the Post Hoc menu for ONEWAY and GLM, as shown shortly.
TESTS USING THE STUDENTIZED RANGE STATISTIC
The fact that multiple t-tests (i.e., the LSD method) provide very weak protection against
experimentwise Type I Errors challenges their use under most post hoc conditions. Researchers
need to use a higher significance level for each comparison so that the experimentwise error rate
does not become excessive. Although it is possible to perform such adjustments with the t-test
(e.g., set α for each comparison to αEXP divided by the # comparisons, as described for the
Bonferroni procedure), the two procedures that we consider next (TUKEY and SNK) make use of
the Studentized Range Statistic (q), a statistic that is different from t, although related (as we shall
see). Both the TUKEY and SNK methods adjust for the number of comparisons being made, but
differ in how extreme the adjustment is. We will also show that the LSD method can be viewed
as a q-statistic with no adjustment for the number of groups. Casting all tests in terms of the
q-statistic allows a clearer comparison of the various posthoc methods.
Studentized Range Statistic
Critical values for the q statistic were determined originally by selecting varying numbers
(k) of samples of a given size from a common population of scores. The two most extreme
groups (i.e., the highest and lowest means of the k groups) were contrasted using the Studentized
Range q statistic shown in Equation 4.2 with ȳj = ȳMax and ȳj′ = ȳMin. Critical values of q are
shown in Appendix A.4 as a function of dfError and a parameter called stretch, which for now can
be thought of as varying with k, the number of groups. As the stretch increases, the value of q
needed to reject the null hypothesis also increases. The reasoning is that the more groups sampled
from the population, the greater the likelihood of getting larger differences between the most
extreme groups, just by chance alone. Correcting for the number of groups maintains the
experimentwise Type I Error rate closer to α while making all possible comparisons between pairs
of means. In other words, raising the critical value of q as k increases corrects for the increased
$$H_0: \mu_j = \mu_{j'} \qquad\qquad H_a: \mu_j \neq \mu_{j'}$$

$$q = \frac{\bar{y}_j - \bar{y}_{j'}}{\sqrt{MS_{Error}/n_j}}
    = \frac{\bar{y}_j - \bar{y}_{j'}}{\sqrt{\frac{MS_{Error}}{2}\left(\frac{1}{n_j}+\frac{1}{n_{j'}}\right)}}
    = t\sqrt{2} \qquad df = df_{Error}$$

Equation 4.2. Formula for q Statistic.
n1 = n2 = n3 = nj = 3     MSError = 2.0

Denominator q = √(MSError/nj) = √(2.0/3) = .8165

Ordered Means     ȳA      ȳNS     ȳB
                 3.00    5.00    8.00

qB-A  = (8.0 - 3.0)/.8165 = 6.124   [= √2 × 4.330 = √2 × tB-A]
qNS-A = (5.0 - 3.0)/.8165 = 2.449   [= √2 × 1.732 = √2 × tNS-A]
qB-NS = (8.0 - 5.0)/.8165 = 3.674   [= √2 × 2.598 = √2 × tB-NS]
Box 4.8. Calculation of Studentized Range Statistics (q).
experimentwise Type I Error rate as the number of possible comparisons increases.
Equation 4.2 shows two formulas for the q statistic. Either version can be used when the
samples have equal njs, but the second must be used when the samples have unequal njs. If you compare the
second version with the formula for t, you will see that the only difference is the presence of 2 as a
divisor for MSError under the square root sign. In essence, the denominator for q will be √2 smaller
than the denominator for t, which means that q will be √2 larger than t. As for the LSD t, the qs for
pairwise comparisons will involve a common denominator unless ns for the various groups differ
from one another.
Box 4.8 shows calculations of the q-statistic for the brain stimulation study. The
denominator used for the three tests (.8165) depends on the number of observations per treatment
group (nj = 3 in this study) and the MSError from the omnibus ANOVA (2.0). The differences
between the three pairs of means are divided by this common quantity to determine the qs.
Comparison of the qs in Box 4.8 and the ts in Box 4.4 shows that the qs are larger than the ts by a
factor of √2, because the denominator for q is √2 smaller than the denominator for t, which makes
the final values for q larger. This is most easily seen in the unequal n version of Equation 4.2.
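The same arithmetic can be scripted; the sketch below (variable names ours) reads the three pairs of ordered means from Box 4.8 and divides each difference by the common denominator, which should reproduce the qs, and the corresponding ts, of Box 4.8:

DATA LIST FREE / m1 m2.
BEGIN DATA
3.0 5.0
3.0 8.0
5.0 8.0
END DATA.
* Common denominator for q: SQRT(MSError / nj), with MSError = 2.0 and nj = 3.
COMPUTE denom = SQRT(2.0 / 3).
COMPUTE q = (m2 - m1) / denom.
COMPUTE t = q / SQRT(2).
FORMAT m1 m2 (F4.1) denom q t (F6.3).
LIST.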
The TUKEY and SNK procedures both use the q statistic, and use identical calculations to
obtain the observed values of q for the various comparisons. The two methods differ in what
value is used for the stretch parameter in Appendix A.4 and hence in the critical value against
which the observed qs are compared. But first, let’s redo the LSD test using the q statistic.
LSD and the q Statistic
We have stated that the LSD procedure makes no adjustment for the number of
comparisons (i.e., for the number of groups when all pairwise comparisons are made). Each test
is done as though we were only comparing the two groups involved in our test. In terms of the q-
statistic, this means that we use a stretch of two (i.e., k = 2) for all of our comparisons. The value
of v in Appendix A.4 is dfError, which equals 9 - 3 = 6 in the present example. The critical value for
the three q statistics would therefore be 3.46. Our observed q-values were qNS-A = 2.449 (not
significant), qNS-B = 3.674 (significant), and qA-B = 6.124 (significant).
This is (necessarily) the same conclusion that we came to earlier using the t statistic. Note
the values in square brackets ([ ]) in Box 4.8. These show that the observed t and q statistics are
related to one another by the factor of √2, that is, q = t × √2 and t = q / √2. Note also that the
critical values of t and q are similarly related; that is, tCritical = qCritical / √2, or √2 × tCritical = qCritical.
Specifically for our example, tCritical × √2 = 2.447 × √2 = 3.46 = qCritical. Because the critical values
of t and q are related to one another in exactly the same way that observed values of t and q are
related, the t and q versions of the LSD procedure must always come to the identical conclusions.
Note, however, that the equivalence occurs because we made no adjustment for the actual number
of groups in the study. Other procedures do make an adjustment.
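This relationship can be confirmed numerically with the inverse distribution functions IDF.T and IDF.SRANGE, which are demonstrated more fully at the end of this chapter; a minimal sketch (variable names ours) for dfError = 6:

DATA LIST FREE / df.
BEGIN DATA
6
END DATA.
* Two-tailed t critical value at alpha = .05, and the q critical value for a stretch of 2.
COMPUTE tcrit = IDF.T(1 - .05/2, df).
COMPUTE qcrit = IDF.SRANGE(1 - .05, 2, df).
COMPUTE ratio = qcrit / tcrit.
FORMAT df (F2.0) tcrit qcrit ratio (F6.3).
LIST.

The ratio should come out at approximately 1.414, that is, √2.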
TUKEY Honestly Significant Difference
The TUKEY Honestly Significant Difference test uses the number of groups (k) as the
stretch parameter to determine the critical value of q for all comparisons. Using k = 3 and v = 6
(i.e., dfError), q.05 = 4.34 for the TUKEY procedure. Note that 4.34 is considerably greater than the
critical value of 3.46 for the LSD procedure. Any pairwise q falling between these values will be
significant using the LSD procedure, but not significant using the TUKEY procedure. Given the
TUKEY critical value, only the difference between A and B is significant. Note that there is a
single critical value of q for the TUKEY method, just as there was a single critical value of q for
the LSD method (and for the equivalent t).
The results of the TUKEY procedure correspond to the outcome presented earlier as one
hypothetical possibility for the pairwise comparison procedures; that is, our conclusions could be
summarized as in Box 4.3. Note that the difference between B and NS is significant by the LSD
test but is not significant by the TUKEY test. The difference between means must be larger to be
significant by TUKEY than by LSD. That is what makes the TUKEY method more
conservative than the LSD method. Because we are less likely to reject the null hypothesis, we
are less likely to make a Type I error (i.e., reject a true null hypothesis). Of course, TUKEY is
also less powerful for the same reason; that is, we are more likely to make a Type II error (i.e., fail
to reject a false null hypothesis). More real differences between means will be missed (Type II
Error) because they are too small to be detected by the increased critical value.
The Student-Newman-Keuls Method
Although the TUKEY test corrects for the inflated probability of a Type I Error that occurs
with the LSD method (and although TUKEY is less conservative than BONFERRONI, as shown
later), there is some concern that the adjustment is still too extreme. That is, the TUKEY test may
be too conservative. One way to think of this intuitively is that the TUKEY method assumes that
the two most extreme of the k groups are being compared, but in fact only one of the comparisons
is between the highest and lowest of the k groups. The other comparisons involve means that are
not the most extreme, and hence may be handicapped by using a single critical value based on k.
This is a concern because excessive Type II Errors can occur for overly conservative methods.
That is, researchers will too often fail to reject a false null hypothesis.
To compensate for TUKEY’s conservativeness, it is possible to vary the stretch value used
to obtain the critical value for q. One such procedure is the Student-Newman-Keuls (or SNK)
method, named after various individuals involved in its development. The essential idea behind
the SNK method is that the stretch used to find the critical value of q is equal to the number of
steps between ȳj and ȳj′, inclusive, when the means are ranked from the smallest to largest. This
number of steps is known as the stretch or range of the groups.
When comparing the two most extreme groups, the critical value for SNK will equal the
critical value for TUKEY, because the stretch between the highest and lowest means is always k,
the number of treatment conditions. When comparing adjacent groups, the stretch will be 2,
          A      NS      B        SNK Critical Values (dfError = 6)
ȳj =     3.0    5.0     8.0          Stretch     qα      qObserved
B-A      1---2---3                      3       4.34      6.124*    Reject H0: µA = µB
NS-A     1---2                          2       3.46      2.449     No Rej H0: µA = µNS
B-NS          1---2                     2       3.46      3.674*    Reject H0: µNS = µB

          A      NS      B
ȳj =     3.0    5.0     8.0
         ----------               A vs. NS not significantly different
Box 4.9. SNK Pairwise Comparisons.
as though there were only two groups being compared and equivalent to the LSD procedure.
Between these extremes, the stretch will be the span for the specific groups being compared (and
including other groups that fall between these when means are ordered from lowest to highest). In
essence, the value of q required to reject H0 becomes larger as the stretch increases.
The SNK method is illustrated in Box 4.9 for our example problem. Note that the group
means are ordered from lowest (A) to highest (B). The stretch used to obtain the critical value of q
is 3 for the A-B comparison (A to NS to B spans all 3 groups), whereas the stretch is 2 for the
A-NS and NS-B comparisons (A to NS spans two groups, as does NS to B). The appropriate
critical values are recorded in the column labelled qα in Box 4.9. In this particular example, the
results of the SNK procedure are the same as those for the LSD procedure; that is, group B differs
from both A and NS, which do not differ from one another.
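A short SPSS sketch (variable names ours) makes the stretch-dependent logic concrete: each observed q from Box 4.8 is paired with its stretch from Box 4.9 and compared to the corresponding critical value from IDF.SRANGE:

DATA LIST FREE / str df qobs.
BEGIN DATA
3 6 6.124
2 6 2.449
2 6 3.674
END DATA.
* SNK: the critical value depends on the stretch for that particular comparison.
COMPUTE qcrit = IDF.SRANGE(1 - .05, str, df).
COMPUTE reject = (qobs > qcrit).
FORMAT str df (F2.0) qobs qcrit (F6.3) reject (F1.0).
LIST.

The reject column (1 = reject H0) should match Box 4.9: the B-A and B-NS differences are significant, the NS-A difference is not.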
Tabular Arrangement of Comparisons
One way in which SNK and other pairwise comparison tests are often done is to first rank
order the means from the smallest mean to the largest. A table is then prepared in which the rows
and columns are the ranked means, and the cells contain the values of q for comparisons between
corresponding row and column means. It is useful to arrange the means systematically in a table
of this sort, especially when there are many comparisons to be performed (i.e., k is large).
Denominator q = √(2.0/3) = .816
LSD q.05 = 3.46        TUKEY q.05 = 4.34

Ordered Means       3.0       5.0         8.0
SNK                  A        NS           B            Stretch    q.05
  A                  -       2.449       6.124 LST    --->  3       4.34
  NS                           -         3.674 LS     --->  2       3.46
  B                                        -

LST = significant by LSD, SNK, and TUKEY
LS  = significant by LSD and SNK
Box 4.10. Tabular Arrangement for Q-Tests.
The steps in this procedure (see Box 4.10) are: (a) order the means from smallest to
largest, (b) label the rows and columns of the table with the group labels, (c) calculate the observed
qs for the differences between specified row and column means (remember the denominators of
all comparisons are the same) and
enter the values in the
corresponding cells, and (d)
compare the observed values to the
critical value (or values) of q for
whichever post hoc procedure is
being used. The arrangement is
shown in Box 4.10.
There are several benefits to
this systematic approach. The
ordered means and arrangement in a
table make it easier to identify the k × (k - 1) / 2 comparisons to be performed. Start in the upper
right corner and work to the left until you reach the cell in which the column is the same group as
the row. Move down a row and start again at the right. Continue until you again reach the cell in
which the column is the same group as the row. Following this method ensures that all pairwise
comparisons are made. For the LSD and TUKEY procedures, the k(k-1)/2 observed qs are
compared to a single critical value (using a stretch of 2 for LSD or a stretch of k for TUKEY).
A second benefit of the tabular arrangement occurs with the SNK procedure. For the SNK
procedure, a different q is used depending on the stretch between means for the particular
comparison; the stretch varies from k to 2. The tabular arrangement facilitates comparison of
observed values to the appropriate critical values because stretch is related systematically to
specific locations in the table. The comparison in the upper right corner always involves the
maximum stretch of k because the upper right comparison is between the largest and the smallest
means. Moving from the upper right corner to the diagonal (indicated by dashes [-] in Box 4.10),
the stretch decreases from k to 2. The comparisons just above the diagonal always involve
adjacent means, for which the SNK procedure specifies k = 2 for the critical value. Diagonal rows
Group                                        SNK
Means             2        3        4       Stretch     qα
 2.00    1      3.34*    3.75*    3.98          4       4.05
 5.34    2        -       .41      .64          3       3.65
 5.75    3                 -       .23          2       3.00
 5.98    4                          -
Box 4.11. Anomalous Outcome for SNK Procedure.
between the main diagonal and the upper right corner will involve intermediate stretches.
A third benefit
of a table occurs
because SNK uses a
different critical value
for different stretches,
which can lead to anomalous outcomes. For example, a relatively large difference between means
could be nonsignificant and a relatively small difference significant because the former occurred
at a wider stretch than the other. A wider stretch means a larger critical value for q. The
hypothetical example in Box 4.11 illustrates the problem.
The difficulty arises in row one. The 1 - 4 comparison is not significant because 3.98 is
less than 4.05, the critical value of q for a stretch of 4. But the other two comparisons on row one
are significant even though the differences between the means for 1 - 3 and 1 - 2 are smaller than
for 1 - 4. This occurs because qα is decreasing as the stretch lowers and produces an anomalous
outcome; specifically, the difference between 5.98 and 2.00 is not significant, whereas the smaller
differences between 5.75 and 2.00 and between 5.34 and 2.00 are significant.
To avoid such awkward outcomes, the usual practice is to start at the right of each row and
stop making comparisons for that row when the first nonsignificant effect occurs. In the present
example, the nonsignificant comparison for 1 - 4 means that comparisons 1 - 3 and 1 - 2 on row
one would not be performed, or at least would not be viewed as significant. Similarly for row
two, comparison 2 - 4 is not significant and would nullify comparison 2 - 3. Performing SNK
comparisons in this way ensures that the conclusions of the study are at least coherent. As we see
next, the possibility of anomalous conclusions leads SPSS to report the results of SNK
comparisons differently than the pairwise comparisons produced for the LSD and BONFERRONI
procedures. Instead, SPSS will report results as sets of means that do not differ significantly from
one another, analogous to the underlining technique that we have been using. If pairwise
probabilities were reported, uninformed users might interpret a smaller difference as significant
even though a larger difference was not significant.
GLM press BY group /POSTHOC = group(LSD SNK TUKEY BONF).
...
Post Hoc Tests
Multiple Comparisons
              (I) group   (J) group   Mean Difference (I-J)   Std. Error    Sig.
LSD            1.0000      2.0000          2.000000            1.1547005    .134
                           3.0000         -3.000000(*)         1.1547005    .041
               2.0000      3.0000         -5.000000(*)         1.1547005    .005
Tukey HSD      1.0000      2.0000          2.000000            1.1547005    .269
                           3.0000         -3.000000            1.1547005    .090
               2.0000      3.0000         -5.000000(*)         1.1547005    .012
Bonferroni     1.0000      2.0000          2.000000            1.1547005    .402
                           3.0000         -3.000000            1.1547005    .122
               2.0000      3.0000         -5.000000(*)         1.1547005    .015

Homogeneous Subsets
                                  group    N    Subset 1    Subset 2
Student-Newman-Keuls(a,b,c)      2.0000    3    3.000000
                                 1.0000    3    5.000000
                                 3.0000    3                8.000000
                                 Sig.           .134        1.000
Tukey HSD(a,b,c)                 2.0000    3    3.000000
                                 1.0000    3    5.000000    5.000000
                                 3.0000    3                8.000000
                                 Sig.           .269        .090
Box 4.12. GLM commands and post hoc results for the LSD, TUKEY, BONFERRONI, and SNK procedures.
More SPSS Post Hoc Analyses for the Brain Stimulation Study
Box 4.12 shows the results for the SNK and TUKEY procedures. The LSD and
BONFERRONI procedures have also been requested in order to better appreciate the relationship
among the four tests. Looking first at the p values for the TUKEY procedure, notice that the
values fall between those for the LSD and BONFERRONI procedures. TUKEY is more
conservative than the LSD procedure but less conservative than the BONFERRONI procedure. In
addition to reporting pairwise probabilities for the TUKEY test, SPSS also reports the results as
homogeneous subsets. This is the method that is similar to our underlining technique. SPSS
reports two subsets of means that are not significantly different from one another. Subset 1
contains groups 2 and 1 (note that groups are ordered from lowest to highest and that column
values for the subsets are the group means), and subset 2 contains groups 1 and 3. The p values
Figure 4.1. ONEWAY Post-Hoc Comparison Screen.
printed at the bottom of the subset columns indicate the smallest p value for a comparison within
a subset. In the present case there are only two groups in each subset, so the p values correspond
to the p values reported in the Sig. column for the TUKEY test.
The SNK results are reported only as subsets, with subset 1 again containing groups 2 and
1, but subset 2 now only containing group 3. This indicates that the differences between 2 vs. 3
and 1 vs. 3 are both significant but the difference between 2 and 1 is not significant. Because
groups 2 and 1 are adjacent (i.e., stretch = 2), the p value for subset 1 is the p value for the LSD
comparison between groups 2 and 1 (.134). By similar reasoning, the p value for groups 1 vs. 3
will be .041, a significant difference. Means for groups 2 and 3 are more distant (i.e., stretch = 3
= k), and we can therefore infer that the p value for that comparison is equivalent to the TUKEY
value for this comparison (i.e., .012).
Post Hoc Comparisons Using Windows
Figure 4.1 shows the Post Hoc screen for ONEWAY ANOVA in Windows. To reach this
screen, we would follow Analyze | Compare Means | Oneway, move “press” into the Dependent
List and “group” into the
Factor list, and then click on
Post Hoc. A series of boxes
appears and the user selects
which pairwise comparison
procedures to use. In the box,
the procedures that we have
discussed (i.e., LSD, SNK,
TUKEY, Bonferroni) are
already selected. Click
Continue on this screen, and
then Ok when returned to the
One-Way ANOVA screen.
The omnibus Anova results
would be reported, followed by the post hoc results.
Box 4.13 shows the equivalent syntax. The results for the LSD procedure are also shown,
first in the complete form produced by SPSS for Windows, and then edited to eliminate
redundancies.
Note in the original version that 6 rows were printed, although only three comparisons are
of interest. The original six rows involve three pairs of redundant comparisons because it does not
matter which group is first and which second (i.e., 1-2 and 2-1 involve the same comparison). The
redundant pairs are 1-2 and 2-1, 1-3 and 3-1, and 2-3 and 3-2. This represents the above-diagonal
and below-diagonal cells of the matrix, which we noted previously were redundant. These are
redundant because the direction of comparison does not affect significance; it only affects the sign
of the difference. Note in Box 4.13, for example, that the p-values for 1-3 (.041) and 3-1 (.041)
are identical and that their confidence intervals are mirror images of one another. The rows 2-1, 3-
1, and 3-2 can be eliminated without losing any information. These three redundant rows have
been eliminated from the edited printout shown at the bottom, which makes the results much
clearer. This editing has been done for most of our printouts.
ONEWAY press BY group /POSTHOC = LSD SNK TUKEY BONFERRONI.
...
Post Hoc Tests
                        Mean Difference                          95% Confidence Interval
(I) GROUP  (J) GROUP         (I-J)       Std. Error    Sig.     Lower Bound   Upper Bound
LSD  1.00    2.00           2.0000        1.15470      .134        -.8255        4.8255
             3.00          -3.0000(*)     1.15470      .041       -5.8255        -.1745
     2.00    1.00          -2.0000        1.15470      .134       -4.8255         .8255
             3.00          -5.0000(*)     1.15470      .005       -7.8255       -2.1745
     3.00    1.00           3.0000(*)     1.15470      .041         .1745        5.8255
             2.00           5.0000(*)     1.15470      .005        2.1745        7.8255

Edited Results for Post Hoc Tests
                        Mean Difference                          95% Confidence Interval
(I) GROUP  (J) GROUP         (I-J)       Std. Error    Sig.     Lower Bound   Upper Bound
LSD  1.00    2.00           2.0000        1.15470      .134        -.8255        4.8255
             3.00          -3.0000(*)     1.15470      .041       -5.8255        -.1745
     2.00    3.00          -5.0000(*)     1.15470      .005       -7.8255       -2.1745
Box 4.13. Original and Edited Post Hoc Results for Windows.
Figure 4.2. Post Hoc Tests using GLM.
Next we briefly examine how we would perform post hoc tests using GLM in Windows
SPSS. Figure 4.2 shows the screen after selecting Analyze | GLM | Univariate, entering group as
the Fixed Factor(s) and press as the Dependent Variable, and then clicking on Post Hoc to bring
up the Univariate:
Post Hoc window.
We have moved the
group Factor into
the Post Hoc Tests
for: window, and
selected the four
tests of interest.
Clicking on
Continue would
return to the
Univariate window,
and clicking OK
would initiate the
analysis. The format of the post hoc results is identical to that just presented for ONEWAY,
although the omnibus F output is somewhat different in format.
Comparison of Post Hoc Procedures
To facilitate comparison across the four procedures, the p values for the various post hoc
procedures are summarized in
Box 4.14 (several of the SNK ps
had to be inferred from SNK’s
equivalence to LSD for a stretch
of 2 and to TUKEY for a stretch
of 3). As we proceed from the top
row (i.e., LSD) to the bottom row
(i.e., Bonferroni), the p values
                       Comparison
Test              1-2       1-3       2-3
LSD              .134      .041      .005
SNK              .134      .041      .012
TUKEY            .269      .090      .012
BONFERRONI       .402      .122      .015
Box 4.14. Comparison of post hoc p values.
either remain fixed or increase (i.e., differences become less significant). This reflects the
increasing conservativeness of the procedures.
The Bonferroni was introduced as a test in which the nominal alpha level was divided by
the number of comparisons. With three comparisons, as here, the critical alpha becomes .05 / 3 =
.017. We could then compare the LSD ps to .017 to see what differences are significant. Only the
2 - 3 comparison, p = .005, has a p less than .017 and would be significant by the Bonferroni test.
But this operation suggests a way to determine the p values for the Bonferroni, as shown in Boxes
4.7 and 4.14, assuming that we already know the LSD p values. What we need to do is multiply
the LSD p values by the number of comparisons (3 in this case) and observe which ps remain ≤
.05. This is the reverse of the divide-p approach. Multiplying the LSD ps by three in fact produces
the BONFERRONI p values shown in Boxes 4.7 and 4.14 (e.g., 3 × .134 = .402).
Whether in terms of p values, as here, or in terms of critical values of t or q, as illustrated
later, it is clear that the Bonferroni test is very conservative, too much so, according to many
statisticians. There have been several adjustments to the basic Bonferroni test that we used, some
of which are implemented in SPSS. That the Bonferroni is overly conservative can be appreciated
by considering what would have happened if one of our LSD comparisons had produced a p as
large as .40. The Bonferroni approach would say to calculate the new probability as 3 × .40 =
1.20, which is impossible as ps must fall between 0 and 1. In fact, 1.0 would appear as the
probability in this case, but the example serves to illustrate that simply multiplying p times the
number of comparisons is a very aggressive approach to the problem of Type I errors with
pairwise comparisons.
PAIRWISE COMPARISONS FOR THE INTERVIEW STUDY
Previous chapters analyzed data from a study of attractiveness ratings for interviewees given by
four groups receiving different instructions about the purpose of the interview: Group 1 received
no additional information, Group 2 were told the interviews were for jobs, Group 3 were told they
were psychiatric interviews, and Group 4 were told they were parole interviews. The results for
the omnibus Anova
and descriptive
statistics are
presented in Box
4.15. The
differences among
the four means were
clearly significant.
Although the order of the means suggests something about the nature of this significant
difference, more precise conclusions require follow-up analyses to determine which groups differ
from one another (or some other tests of the pattern among the means, as in chapter 6).
The LSD t Test
The calculations
for the LSD t-test are
shown in Box 4.16. A
common denominator
(1.664) has been
calculated using MSError
= 13.839 from Box
4.15, and the common
group n of 10. The
means have been
ordered from lowest to
highest in the matrix
Denominator t = SQRT(13.839 (1/10 + 1/10)) = 1.664
df = 36     tCritical = 2.028
t1-4 = (27.30 - 20.00)/1.664 = 7.30/1.664 = 4.387

Ordered Group Means
            4          3          2          1
         20.00      22.70      26.00      27.30
  4        -        1.623      3.606 L    4.387 L
  3                   -        1.983      2.764 L
  2                              -         .781
  1                                          -

Summary of Differences
     4        3        2        1
     ----------
              ----------
                       ----------
Box 4.16. LSD t Calculations and Results for Interview Study.
Source      df       SS        MS         F        p
Between      3     325.80    108.600    7.848    .0004
Within      36     498.20     13.839
Total       39     824.00

Group     Mean      SD       nj
  1       27.30    3.302     10
  2       26.00    3.621     10
  3       22.70    4.218     10
  4       20.00    3.682     10
 All      24.00    4.597     40
Box 4.15. Anova and Descriptive Statistics for Interview Study.
and differences between the means divided by 1.664 to produce the reported ts. The df for each t
is dfError = 36, which gives tCritical = 2.028 for α = .05. The significant ts are indicated by the
superscript “L” in Box 4.16.
The overall pattern summarized in Box 4.16 using the underline (subset) notation is less
than ideal, as reflected in the overlapping lines. The overlapping lines indicate that a number of
the nonsignificant differences are not transitive. For example, the 1 - 2 comparison is not
significant, the 2 - 3 difference is not significant, but the 1 - 3 difference is significant. The
pattern would have been much better if the 2 - 3 comparison had also been significant (note how
close 1.983 is to the critical value of 2.028). Unfortunately, the SNK and TUKEY procedures are
not going to help with the 2 - 3 comparison, since the LSD procedure is the most liberal of the
three; that is, if a difference is not significant by LSD, then it cannot be significant by SNK or
TUKEY.
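For readers following along by computer, here is a sketch of the same LSD t calculations in SPSS syntax (variable names ours; the group means, MSError = 13.839, and n = 10 per group are taken from Box 4.15):

DATA LIST FREE / g1 g2 m1 m2.
BEGIN DATA
4 3 20.00 22.70
4 2 20.00 26.00
4 1 20.00 27.30
3 2 22.70 26.00
3 1 22.70 27.30
2 1 26.00 27.30
END DATA.
* Common denominator for every LSD t: SQRT(MSError(1/n + 1/n)).
COMPUTE denom = SQRT(13.839 * (1/10 + 1/10)).
COMPUTE t = (m2 - m1) / denom.
COMPUTE tcrit = IDF.T(1 - .05/2, 36).
COMPUTE sig = (t > tcrit).
FORMAT g1 g2 (F1.0) m1 m2 (F5.2) denom t tcrit (F6.3) sig (F1.0).
LIST.

The resulting ts and significance decisions should match those in Box 4.16.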
The q Statistic
Box 4.17 presents the q calculations and results. Note that qDenominator is smaller than
qDenominator = SQRT(13.839 (1/10)) = 1.176   [= 1.664 / √2]
df = 36
q1-4 = (27.30 - 20.00)/1.176 = 7.30/1.176 = 6.207   [= 4.387 × √2]

Ordered Group Means
            4           3           2           1
         20.00       22.70       26.00       27.30      Stretch   qCritical
  4        -         2.296       5.102 LST   6.207 LST     4         3.81
  3                    -         2.806       3.912 LST     3         3.45
  2                                -         1.105         2         2.87
  1                                            -

Summary of Differences
              4        3        2        1
LSD           ----------
                       ----------
                                ----------
SNK           ----------
                       ----------
                                ----------
TUKEY         ----------
                       ----------
                                ----------
Box 4.17. LSD q Calculations and Results for Interview Study.
tDenominator by a factor of √2, and that qObserved is larger than tObserved by the same factor of √2. With df
= 36, the critical value of q for the LSD test would be that for a stretch of 2, q.05, LSD = 2.87 (which
is larger than tCritical by a factor of √2). The superscripts in the matrix show that the same three
differences are significant as reported for t, which must always be the case barring any errors in
the calculations. The summary for LSD is identical in Boxes 4.17 and 4.16. For the Tukey HSD
procedure, the stretch is always k = 4, and q.05, Tukey = 3.81. For Tukey, the 1 - 3 difference remains
significant, although just barely (3.912 versus a critical value of 3.81). The pattern then is identical to that observed with the LSD
and SNK procedures. All share the same weakness that Groups 2 and 3 do not differ from one
another.
The SNK procedure involves three different critical values. Starting in the upper right
corner with the 1 - 4 comparison, the stretch is 4 and the critical value is 3.81. The 1 - 4 difference
is significant by SNK (as it must be since it was significant by the more stringent Tukey
procedure). The 2 - 4 cell is considered next and is compared to the critical value for a stretch of
3, that is, to 3.45. This difference too is significant. The left-most cell in the top row (i.e., the cell
just above the diagonal) involves a stretch of 2 and a critical value of 2.87 (i.e., the same critical
value as the LSD procedure). This difference is not significant. The same procedure is followed
for the second row, with the rightmost cell involving a stretch of 3, a critical value of 3.45, and a
significant difference. The next cell to the left involves a stretch of 2 and is not significant. The
final comparison is for the 1 - 2 cell, which involves a stretch of 2. The difference is not
significant.
Note again that the 2 - 3 comparison in Box 4.17 just misses being significant by the LSD
and SNK procedures. If this difference had been significant, then the pattern of results would be
much tidier. Specifically, the two “control” conditions (1 and 2) would not differ from each other,
the two “stigma” conditions (3 and 4) would not differ from each other, but all four comparisons
between “control” and “stigma” groups (i.e., 1 - 3, 1 - 4, 2 - 3, 2 - 4) would be significantly
different. Such a pattern lends itself to a tidy conclusion.
Note also the potential for an awkward conclusion if only individual comparisons are
considered. In the second row, the critical value for 1 - 3 is 3.45 and the critical value for 2 - 3 is
2.87. If the observed qs for both cells fell between these values, then the 1 - 3 comparison would
be not significant and the 2 - 3 comparison would be significant. This is an awkward outcome at
best, and perhaps even an illogical one. Starting at the right of each row and stopping at the first
non-significant difference avoids this problem.
Box 4.18
summarizes the
distinct critical values
of q used for these
three post hoc
procedures. Irrespective of stretch, the LSD procedure uses the same (low) critical value of 2.87
for all six comparisons, and the Tukey HSD test uses the same (high) critical value of 3.81. The
SNK falls between these extremes, using the maximum stretch for the biggest span between 1 - 4
(qAlpha = 3.81); the minimum stretch for the adjacent spans of 1 - 2, 2 - 3, and 3 - 4 (qAlpha = 2.87);
and the in-between stretch of three for the remaining two comparisons of 1 - 3 and 2 - 4 (qAlpha =
3.45).
To perform the Bonferroni test on these data, we would divide alpha by 6, the number of
comparisons to get the alpha to use for each comparison; that is, pair-wise alpha = .05 / 6 =
.008333. This test is best done using a statistical package like SPSS. But we can also obtain a
critical value for q based on p = .008333. Using procedures described shortly, the critical value is
3.95, even larger than the critical value for Tukey. Expressed differently, the Bonferroni
adjustment is even more conservative than Tukey.
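A one-line check of this Bonferroni critical value, using the IDF.SRANGE function described in the next section (variable names ours):

DATA LIST FREE / df.
BEGIN DATA
36
END DATA.
* Bonferroni: alpha is divided across the 6 comparisons; the stretch stays at 2.
COMPUTE qbonf = IDF.SRANGE(1 - .05/6, 2, df).
FORMAT df (F2.0) qbonf (F6.3).
LIST.

This should return approximately 3.95, the value also reported in Box 4.23.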
SPSS Analyses for the Interview Study
Box 4.19 shows the GLM syntax commands to conduct the omnibus ANOVA for the
interview study, and to request the post hoc analyses. Note that the format of the post hoc request
is slightly different for GLM than for ONEWAY. Because GLM can have multiple factors, it is
necessary to specify which factor should be tested (inst here), and then the various post hoc
procedures are listed. This analysis could also have been conducted using menus, as described
previously.
Comparisons Stretch LSD SNK TUKEY
1-4 4 2.87 3.81 3.81
1-3, 2-4 3 2.87 3.45 3.81
1-2, 2-3, 3-4 2 2.87 2.87 3.81
Box 4.18. Comparison of Critical Values for Interview Study.
Edited LSD results are also shown in Box 4.19. Differences 1 - 3, 1 - 4, and 2 - 4 are
significant, and 2 - 3 approaches significance, p = .055. One attractive feature of p values is that
researchers can see how close to significance various differences are. Although not presented as
such, researchers can determine from the analysis in Box 4.19 that there are three subsets of
means that are not significantly different: 1 - 2, 2 - 3, and 3 - 4.
GLM rate BY inst /POSTHOC = inst(LSD SNK TUKEY BONFERRONI).

Source               Type III Sum of Squares    df    Mean Square        F        Sig.
Corrected Model             325.800(a)           3       108.600        7.847     .000
Intercept                 23040.000              1     23040.000     1664.874     .000
INST                        325.800              3       108.600        7.847     .000
Error                       498.200             36        13.839
Total                     23864.000             40
Corrected Total             824.000             39
a  R Squared = .395 (Adjusted R Squared = .345)

                        Mean Difference
(I) INST   (J) INST          (I-J)        Std. Error    Sig.
LSD  1.00    2.00           1.3000         1.66366      .440
             3.00           4.6000(*)      1.66366      .009
             4.00           7.3000(*)      1.66366      .000
     2.00    3.00           3.3000         1.66366      .055
             4.00           6.0000(*)      1.66366      .001
     3.00    4.00           2.7000         1.66366      .113
Box 4.19. GLM Analysis of Interview Study.
Edited results for the SNK, TUKEY, and BONFERRONI tests are presented in Box 4.20,
the TUKEY in both table and subset formats. The SNK procedure is reported in terms of
homogeneous subsets. The same subsets were identified by the LSD procedure. From the
summary, we can infer that group 1 differs from groups 3 and 4, and group 2 differs from group 4.
Note that the p values reported for the SNK comparisons within subsets, all of which involve
adjacent groups, are identical to the LSD ps reported for the same three comparisons (i.e., 4 - 3,
3 - 2, 2 - 1).
TUKEY results are presented in both table and subset format, and lead to the same
conclusions as the LSD and SNK procedures. But note that the p values for TUKEY are
considerably higher than for LSD and SNK. The 2 - 3 difference, for example, had a p value of
.055 in both LSD and SNK, but now has a much higher TUKEY p of .213.
The BONFERRONI results are even less significant; p values are larger again and now
                         Mean Difference
(I) INST   (J) INST           (I-J)        Std. Error    Sig.
Tukey HSD   1.00   2.00      1.3000         1.66366      .862
                   3.00      4.6000(*)      1.66366      .042
                   4.00      7.3000(*)      1.66366      .001
            2.00   3.00      3.3000         1.66366      .213
                   4.00      6.0000(*)      1.66366      .005
            3.00   4.00      2.7000         1.66366      .379
Bonferroni  1.00   2.00      1.3000         1.66366     1.000
                   3.00      4.6000         1.66366      .054
                   4.00      7.3000(*)      1.66366      .001
            2.00   3.00      3.3000         1.66366      .330
                   4.00      6.0000(*)      1.66366      .006
            3.00   4.00      2.7000         1.66366      .680

Homogeneous Subsets
                                 INST    N     Subset 1    Subset 2    Subset 3
Student-Newman-Keuls(a,b,c)      4.00   10     20.0000
                                 3.00   10     22.7000     22.7000
                                 2.00   10                 26.0000     26.0000
                                 1.00   10                              27.3000
                                 Sig.            .113        .055        .440
Tukey HSD(a,b,c)                 4.00   10     20.0000
                                 3.00   10     22.7000     22.7000
                                 2.00   10                 26.0000     26.0000
                                 1.00   10                              27.3000
                                 Sig.            .379        .213        .862
Box 4.20. Other Post Hoc Analyses for Interview Study.
only two of the differences are significant (1 - 4 and 2 - 4). The pattern of differences defines two
subsets, 1 & 2 & 3 as one subset and 3 & 4 as another subset. Although different, this result is
untidy because group 3 (the psychiatric interview) belongs to both sets.
Box 4.21 summarizes the available
p values for the six comparisons and the
four tests. The SNK values for the
intermediate stretch of 3 are not available,
although we can infer that they are less
than .05 and even less than the
corresponding TUKEY values. The p
values remain constant or increase as the
tests become more conservative.
Moreover, the p values for the Bonferroni test are the LSD values times the number of
comparisons, six (e.g., 6 × .055 = .330 for the 2-3 comparison), although one of the Bonferroni p
values was set arbitrarily to the maximum of 1.00 (i.e., 6 × .440 > 1.00). The equal signs indicate
tests that are equivalent; specifically, the SNK ps for stretch 2 are equal to the LSD ps, and the
SNK p for stretch 4 (i.e., k) is equal to the TUKEY value.
In this case, the LSD, SNK, and TUKEY procedures all led to the same conclusions; we
cannot reject the following H0s: µ1=µ2, µ2=µ3, µ3=µ4. The overall pattern of results is somewhat
unclear because of the combined equivalence and nonequivalence of various groups. Focussing
on the parole group, for example, subjects in this condition rated the attractiveness of the
interviewees as lower than groups 1 and 2 (the no instruction and job interview groups) but did
not rate interviewees significantly lower than group 3 (the psychiatric group). However, the
psychiatric group, which was not different from the parole group was also not significantly
different from the job interview group. Overall the results are rather untidy. Ideally, researchers
should have theoretical reasons to predict differences between certain groups, and nondifferences
between other groups. Such planned or focussed comparisons are often called contrasts in the
ANOVA context. These are discussed in the next chapter.
Stretch   Groups     LSD      SNK      TUKEY    BONFERRONI
   2       1-2      .440=    .440       .862       1.00
           2-3      .055=    .055       .213        .330
           3-4      .113=    .113       .379        .680
   3       1-3      .009       ?        .042        .054
           2-4      .001       ?        .005        .006
   4       1-4      .000     .001=      .001        .001

Box 4.21. Comparison of Post Hoc p values for Interview Study.
CALCULATING CRITICAL AND P VALUES FOR POST HOC TESTS
This section describes how to use SPSS to obtain critical values of t and p for the various
Post Hoc tests described in this chapter. Calculating these values, rather than using more limited
tables, allows us to compare more clearly how conservative each of the tests is. In addition, we
can use SPSS to calculate p-values for the observed test statistic. This is helpful when the exact
degrees of freedom required are not present in tabled critical values for q.
Determining Critical Values for Post Hoc Tests
One of the nice features of the Unix
version of SPSS, as illustrated in this chapter,
is that it provided critical values of q for the
post hoc statistics. We could then arrange
these critical values in order to demonstrate
clearly which procedure is most conservative
and which most liberal. Box 4.22 summarizes critical values of q that would actually be reported
in Unix versions of SPSS for the interview study. We can clearly see that the LSD procedure is
most liberal, qAlpha = 2.87 for all stretches; SNK is next with qAlpha = 2.87, 3.45, and 3.81 for
stretches 2, 3, and 4, respectively; TUKEY is next with qAlpha = 3.81 for all stretches; and
BONFERRONI is most conservative with qAlpha = 3.95 for all stretches.
Later versions of SPSS do not
provide these critical values.
Moreover, our table for q would not
give us values for the Bonferroni test
and may not have entries for certain df.
It is possible to overcome these
limitations using SPSS, because it
allows users to determine critical
values, as illustrated in Box 4.23. Box
4.23 shows the computation of the
critical value of t for the various post
Stretch    LSD     SNK     TUK     BON
   2       2.87    2.87    3.81    3.95
   3       2.87    3.45    3.81    3.95
   4       2.87    3.81    3.81    3.95
Box 4.22. Critical Values for Interview Study.
DATA LIST FREE / str df.
BEGIN DATA
2 36
3 36
4 36
END DATA.
COMP tlsd = IDF.T(1 - .05/2, df).
COMP qlsd = IDF.SRANGE(1 - .05, 2, df).
COMP qsnk = IDF.SRANGE(1 - .05, str, df).
COMP qtuk = IDF.SRANGE(1 - .05, 4, df).
COMPUTE qbonf = IDF.SRANGE(1 - (.05/6), 2, df).
FORMAT str df (F2.0).
LIST.

str  df     tlsd        qlsd        qsnk        qtuk        qbonf
 2   36   2.028094    2.868158    2.868158    3.808798    3.948445
 3   36   2.028094    2.868158    3.456758    3.808798    3.948445
 4   36   2.028094    2.868158    3.808798    3.808798    3.948445

Box 4.23. Computing Critical Values for Post Hoc Tests.
hoc tests. The first six lines read the stretches and df for which we want to calculate critical
values. We then use the SPSS functions IDF.T and IDF.SRANGE to compute critical values for
these tests, using alpha = .05 in this example. The printed results are then listed. Note that the
columns headed qlsd, qsnk, qtuk, and qbonf reproduce the values in Box 4.22, although with more
precision. Substituting different values for stretch and df in Box 4.23 allows us to compute
critical values for post hoc tests in any study.
Determining P Values for Post Hoc Tests
We can also use SPSS to
calculate p values for the various post
hoc tests, which allows us to compare
the conservativeness of the different
procedures, to overcome the
limitations of our q test values, and to
fill in some “missing” values that are
not always reported by SPSS. The complete table of observed p values for the interview study is
presented in Box 4.24. Except for the fact that pSNK = pLSD for stretch = 2 and pTUK = pSNK for stretch = 4,
the table clearly shows that the tests become less significant as we go from LSD to SNK to
TUKEY to BONFERRONI. Note also that one computed value of pBON is greater than 1.00: .439703 × 6
= 2.638220. SPSS reports this value as 1.00 in the post hoc printouts because a p value cannot be
greater than 1.
CONCLUSION
We have reviewed various pairwise post hoc comparisons to draw more specific
conclusions about an omnibus ANOVA. These procedures are typically used when the overall F
is significant. We have also seen how SPSS’s ONEWAY and GLM procedures report in diverse
ways the results for such pairwise comparisons, and that post hoc procedures can lead to
DATA LIST FREE / comp str df qobs.
BEGIN DATA
12 2 36 1.105
23 2 36 2.806
34 2 36 2.296
13 3 36 3.912
24 3 36 5.102
14 4 36 6.207
END DATA.
COMP plsd = 1 - CDF.SRANGE(qobs, 2, df).
COMP psnk = 1 - CDF.SRANGE(qobs, str, df).
COMP ptuk = 1 - CDF.SRANGE(qobs, 4, df).
COMP pbon = 6*(1 - CDF.SRANGE(qobs, 2, df)).
FORMAT comp str df (F2.0) qobs (F6.3).
LIST.

comp  str  df   qobs      plsd       psnk       ptuk       pbon
 12    2   36   1.105    .439703    .439703    .862393    2.638220
 23    2   36   2.806    .054904    .054904    .212796     .329425
 34    2   36   2.296    .113204    .113204    .378626     .679226
 13    3   36   3.912    .008894    .023592    .042229     .053362
 24    3   36   5.102    .000931    .002616    .004931     .005584
 14    4   36   6.207    .000095    .000532    .000532     .000573
Box 4.24. P Values for Observed Post Hoc Statistics.
anomalous outcomes that do not permit a tidy summary of results or a tidy explanation. Choosing
a post hoc test can also be complex. There are many such tests, including some that will
accommodate comparisons between combinations of groups (e.g., groups 1 and 2 versus groups 3
and 4). In essence researchers need to consider thoughtfully the consequences of making Type I
and Type II errors, and then select a procedure with the desired degree of conservativeness /
liberalness. The next chapter examines a second approach to multiple comparisons, one that is
more likely to lead to “elegant” conclusions, especially if a priori expectations are correct.
CHAPTER 5:
PLANNED CONTRASTS FOR SINGLE FACTOR BETWEEN-S DESIGN
The Basics of Planned Contrasts . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
Defining Contrasts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . . . 3
Formula for Contrast Analysis . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . . . . . 4
Contrasts for the Two-group Agitation Study . . . . . . . .. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
Manova and Contrasts for Agitation Study . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . . . . . . 8
GLM and Contrasts for the Agitation Study . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . . . . . . 9
Contrasts for k > 2 Designs . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
Orthogonal Contrasts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . . 11
Polynomial Contrasts (Trend Analysis) . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . . . . . 15
Planned Contrasts for the Brain Stimulation Study . .. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
Equivalence of Different Values for Contrast Coefficients . . . . . . . . . . . . . . . . . . . . . . . 19
Polynomial Contrasts for the Brain Stimulation Study . . . . . . . . . . . . . . . . . . . . . . . . . . 20
SPSS and Contrasts for the Brain Stimulation Study .. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
ONEWAY and Planned Contrasts for the Stimulation Study . . . . . . . . . . . . . . . . . . . . . 21
MANOVA and Contrasts for the Brain Stimulation Study . . . . . . . . . . . . . . . . . . . . . . . 22
GLM and Contrasts for the Brain Stimulation Study . .. . . . . . . . . . . . . . . . . . . . . . . . . . 24
Polynomial Contrasts in GLM and MANOVA . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . 27
Planned Contrasts for the Interview Study . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
Control versus Stigmatized Groups . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . . . . 30
Control versus Stigmatized Groups: SPSS Analyses . . .. . . . . . . . . . . . . . . . . . . . . . . . . 32
Polynomial Contrasts for the Interview Study: SPSS Analyses . . . . . . . . . . . . . . . . . . . 36
Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . . . . . . 39
Appendix 5.1: Contrasts and Differences Between Means . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
Appendix 5.2: GLM and SSs for Planned Comparisons . .. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
A "few" specific comparisons are plannedprior to examining the means (researcherscannot "plan" all possible comparisons), Fneed not be significant (but can be), and littleor no adjustment is made for the number ofcomparisons (the strong version of this claimis generally doubtful).
Box 5.1. Conditions for Planned Contrasts.
Chapter 4 described four (of many) statistical procedures for performing all-possible
pairwise comparisons in a single-factor, between-S design. But researchers may be interested in
differences that cannot readily be tested by pairwise comparisons. For example, the difference
between two control groups and two treatment groups might be critical for the research, as might
a linear effect of treatment (i.e., a systematic increase or decrease) across the ordered levels of a
factor. Such analyses are generically referred to as contrasts, and are usually Planned (or A
Priori ) Contrasts. This chapter describes the logic of such planned contrasts, once again using
the studies analyzed previously by other methods. We also discuss using SPSS ANOVA
methods to conduct these planned comparisons; a later chapter demonstrates that these tests can
readily be done using multiple regression.
THE BASICS OF PLANNED CONTRASTS
Something more than pairwise comparison procedures is required when a specific
pattern of differences among groups is predicted. Given a treatment group and two control
groups, for example, researchers might expect that the treatment group mean would differ from
the average of the two control group means, and that the two control group means would not
differ. Or if the Between-S factor involved ordered levels of some variable (e.g., amount of
reinforcement, age), then the researchers might be interested in examining various linear and
non-linear effects. Such designs call for Planned Contrasts.
Box 5.1 shows the general conditions
under which "a priori" or planned comparisons
are appropriate. Planned comparisons
generally involve a relatively small number of
specific tests rather than all possible
comparisons. In the single-factor design, for
example, the number of comparisons is often
limited to the df for the treatment effect (i.e., k - 1 comparisons). Moreover, the specific tests are
planned prior to observing the data, usually on the basis of theoretical predictions and/or previous
findings. That is why such comparisons are often called a priori comparisons.
There are several advantages to a priori rather than post hoc comparisons. One advantage
is that the omnibus F need not be significant in order to perform planned comparisons. Even a
nonsignificant F can be followed by planned contrasts, which are generally more sensitive to
predicted differences than the omnibus F, as we will see in several of our examples. By more
sensitive, we mean that a planned comparison can show a particular contrast to be significant
even when the omnibus F is not. A second advantage is that less of an adjustment to alpha is
needed for the number of comparisons being performed; indeed, many researchers do not make
any adjustment, at least for certain kinds of planned comparisons.
Defining Contrasts
A contrast is defined as the sum of each group mean multiplied by a coefficient for each
group (i.e., a signed number, cj), with the restriction that the sum of the coefficients alone across
the k groups equals 0. An uppercase L (for linear contrast) will be used to represent a specific
contrast and lowercase c's used for the individual coefficients. Our definition for contrast would
be written as: L = Σcj ȳj, where Σcj = 0. The subscript j in this notation indicates the level of the
factor being examined, identical to its use for group means and other group statistics.
To illustrate with four groups, one possible set of contrast coefficients would be c1 = −3,
c2 = −1, c3 = +1, and c4 = +3; the sum of these coefficients is 0. Another possible set of contrast
coefficients would be c1 = −3, c2 = +1, c3 = +1, and c4 = +1; the sum of these coefficients is also
0. Numerous other contrasts exist for four groups.
Each set of contrast coefficients in essence defines and tests for a particular pattern of
differences among the means. The second set, for example, tests whether the mean for group 1
differs from the average of the other three means. Although slightly more complex, the first set
of coefficients tests whether there is a linear increase or decrease across the four means; note that
the coefficients -3, -1, +1, and +3 increase by the same amount (2 units) for each successive
group. That defines a linear pattern. We will see, especially in Chapter 6, that contrasts are
closely related to the indicator variables used in regression analysis of ANOVA designs.
L = Σcj ȳj (summed over the j = 1 to k groups), where Σcj = 0
SEL = √(MSError Σ(cj²/nj))
tL = (L - 0) / SEL,  df = dfError

Equation 5.1. Linear Contrasts.

SSL = nj L² / Σcj²
FL = (SSL/1) / MSError = tL²,  df = 1, dfError
η² = SSL / SSTotal

Equation 5.2. Anova for Contrasts.
Formula for Contrast Analysis
Equation 5.1 shows several formulae relevant to contrasts, including the definition of L. We can
test the significance of our contrast against an hypothesized value of 0 by calculating the SEL and
using this SE as the denominator for a t-test, as shown in Equation 5.1. The t will have df = dfError
because MSError appears in the denominator for t just as it did for the LSD t test.
It is also possible to analyze contrasts using the F statistic, which has a number of
advantages, as we shall see. Equation 5.2 shows the formula to calculate a SS for any contrast, which
can then be used to calculate an F statistic. One benefit of the F-test approach is that the SS
associated with the contrast can be used to obtain an η² for the contrast, as shown in Equation
5.2. A second benefit of the F-test approach is that the sum of the SSs for the k - 1 contrasts will
equal SSTreatment, assuming that the k - 1 contrasts were selected to be independent, as described
below. Equation 5.2 also shows that the t-test of Equation 5.1 is equivalent to the F-test of 5.2;
hence, only one of these tests is normally used.
These formulae may look novel, but we will see in the following section that they do
reproduce the results from more familiar versions of the t-test and the F-test. And some
substitution (e.g., sp² for MSError) and a little algebra would show why they are equivalent.
Because contrasts do use a novel method to perform the comparisons, we begin with a two group
example.
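Readers who like to check formulae numerically may find a short computational sketch helpful. The Python function below is our own illustration (the name contrast_analysis and its assumption of equal group sizes are ours, not from the text or from SPSS); it simply implements Equations 5.1 and 5.2.

from math import sqrt

# Illustrative sketch of Equations 5.1 and 5.2 (assumes equal group sizes, n per group).
def contrast_analysis(means, c, n, ms_error, ss_total):
    assert abs(sum(c)) < 1e-9                              # coefficients must sum to 0
    L = sum(cj * m for cj, m in zip(c, means))             # L = sum of cj * mean_j
    se_L = sqrt(ms_error * sum(cj ** 2 / n for cj in c))   # SE of the contrast
    t_L = L / se_L                                         # t-test, df = df_error
    ss_L = n * L ** 2 / sum(cj ** 2 for cj in c)           # SS for the contrast
    F_L = ss_L / ms_error                                  # F-test, df = 1 and df_error
    eta_sq = ss_L / ss_total                               # proportion of total variability
    return L, se_L, t_L, ss_L, F_L, eta_sq

With a two-group problem this returns the same values as the ordinary independent-groups t-test and ANOVA, as the two-group example that follows illustrates.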
                 Group 1   Group 2
ȳj                  12.0      10.0
cj                    -1        +1        Σcj = -1 + 1 = 0

L = Σcj ȳj = (-1 × 12.0) + (1 × 10.0) = -2.0 = ȳ2 - ȳ1

t-test of L
SEL = √(MSError Σ(cj²/nj)) = √(3.6 × (1/6 + 1/6)) = 1.095
tL  = (L - 0) / SEL = -2.0 / 1.095 = -1.83 = t(ȳ2 - ȳ1);  |tL| = √3.33 = √F

F-test of L
SSL = nj L² / Σcj² = (6 × -2.0²) / (-1² + 1²) = 12.0 = SSTreatment (when k = 2)
FL  = (SSL/1) / MSError = 12.0 / 3.6 = 3.33 = FANOVA = 1.83² = tL²
η²L = SSL / SSTotal = 12.0 / 48.0 = .25

Box 5.2. Contrast Analysis for k = 2 Independent Groups ANOVA.
Contrast analysis involves comparisons (or contrasts) between specific groups or specific
combinations of groups. When there are two groups, the only contrast is between those two
groups, so contrast analysis essentially duplicates the results of the ANOVA and the
corresponding t-test between the two group means. This redundancy means that the two-group
design provides an excellent introduction to contrasts, because the one contrast will be equivalent
to tests with which we are already familiar. This is helpful because planned comparisons can
appear somewhat abstract and a little mysterious at first. This mystery also lessens somewhat if
we keep in mind that contrast analysis can be viewed as something of a hybrid between Anova
and regression approaches. Essentially, researchers define contrasts (sets of numbers that define
some expected pattern) and then examine the correlation between these defined numbers (i.e., the
pattern) and the variability in the group means.
CONTRASTS FOR THE TWO-GROUP AGITATION STUDY
The two-group agitation study was described in previous chapters. It involved a
comparison of agitation scores for 6
Control and 6 Treatment subjects.
Our calculations produced t = 1.83,
and F = 3.33, neither of which was
significant by the standard two-tailed
test (see Boxes 2.4 and 2.6) for the
analyses. MSError (equivalently, sp²)
was 3.6.
Box 5.2 illustrates the
contrast analysis for this two-group
Agitation study, using coefficients of
-1 and +1 for groups 1 and 2,
respectively. Note that -1 + 1 = 0, a
prerequisite for a contrast. The
computed value of the contrast is -1
× 12.0 + 1 × 10.0 = -2.0, which in
this case equals the difference
between the means.
As shown in Equations 5.1 and 5.2, this difference or contrast can be tested for
significance in two ways. A t-test can be computed using, as the standard error of the contrast, a
quantity based on MSError from the ANOVA, the contrast coefficients, and the sample sizes. For
the present example involving only two groups, the computations are shown in Box 5.2. SEL =
1.095, which equals the denominator for the t-test in our earlier analyses. The t for the contrast
also equals the t for the difference between means (and √F), not surprising since the numerator
(i.e., the contrast L) is the difference between means and the denominator is identical to that used
for the independent groups t-test. The df for the t-test will be the df associated with the MSError,
that is, N - k = 12 - 2 = 10, in our example.
The significance of contrasts can also be tested using an F-test, and this approach is also
shown in Box 5.2. The SS for a contrast is obtained by squaring the contrast, multiplying by the
number of observations in each group, and dividing by the sum of the coefficients squared.
Including the coefficients in the formula “adjusts” for the magnitude of the coefficients, as shown
shortly. Although somewhat impenetrable, this formula does provide a meaningful SS. In the
two-group case, for example, SSL for the contrast equals SSTreatment from the standard Anova
previously conducted on this data. The F test for the contrast is the ratio of the SSL (conceptually
divided by df = 1 to produce a MSL) to the MSError. In the two-group case, this F equals both the
ANOVA F and the independent t².
When there are only two groups, the contrast analysis is completely redundant with the
analyses that we have already performed (i.e., t, F). But this is not the case when there are more
than two groups; then, it is possible to make specific contrasts that provide additional
information about the differences among the groups.
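As a numerical check on Box 5.2, the lines below (our own sketch, not SPSS output) recompute the contrast statistics from the summary values of the agitation study: means of 12.0 and 10.0, n = 6 per group, MSError = 3.6, and SSTotal = 48.0.

from math import sqrt

means, c, n = [12.0, 10.0], [-1, +1], 6
ms_error, ss_total = 3.6, 48.0

L = sum(cj * m for cj, m in zip(c, means))              # -2.0
se_L = sqrt(ms_error * sum(cj ** 2 / n for cj in c))    # 1.095
t_L = L / se_L                                          # -1.826, df = 10
ss_L = n * L ** 2 / sum(cj ** 2 for cj in c)            # 12.0 = SS_Treatment
F_L = ss_L / ms_error                                   # 3.333 = t_L squared
print(L, se_L, t_L, ss_L, F_L, ss_L / ss_total)         # last value: eta squared = .25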
Contrasts and SPSS ONEWAY
Whereas the /POSTHOC option in ONEWAY permits all-possible pairwise comparisons
of means, selective pairwise comparisons or comparisons involving more than two groups are
done using the ONEWAY /CONTRAST command. Several contrast subcommands can be
included as options on the same ONEWAY command. Each CONTRAST command is followed
by k coefficients, one for each of the k groups in the ANOVA. To illustrate with a k = 4 design,
/CONTRAST = -1 +1 0 0 would compare group 1 to group 2, and /CONTRAST = -1 -1 +1 +1
would compare groups 1 and 2 combined to groups 3 and 4 combined. Here we will use the
contrast command for a k = 2 design, which permits only one contrast, namely /CONTRAST = -1
+1, although the actual numerical coefficients could be different (e.g., -.5 +.5).

ONEWAY agit BY group /CONTRAST = -1 +1.

                  Sum of Squares   df   Mean Square       F    Sig.
Between Groups            12.000    1        12.000   3.333    .098
Within Groups             36.000   10         3.600
Total                     48.000   11

Contrast Coefficients
               GROUP
Contrast   1.00    2.00
   1         -1       1

Contrast Tests
                                Contrast   Value of Contrast   Std. Error        t   df   Sig. (2-tailed)
AGIT   Assume equal variances      1                -2.0000      1.09545   -1.826    10             .098

Box 5.3. SPSS Contrasts for Two-group Design.

Figure 5.1. One-Way Contrasts via Menus.
The contrast command is illustrated in Box 5.3 for the two-group agitation study. The
contrast is redundant when there are only two groups because the original t and F test the
significance of the difference between the two means, and the contrast coefficients in Box 5.3 (-1
+1) also correspond to a contrast between the two means. The value of the contrast (L = -2.0), its
standard error (SEL = 1.095), and the t-value (tL = -1.826) all agree with earlier calculations and
analyses, including in this case the initial t-test because k = 2. The SS for the contrast can be
calculated by (6 × -2²)/(-1² + 1²) = 12.0,
which equals SSTreatment for the k = 2
design. Note also that the p values for the
contrast and the omnibus F are identical,
.098, and that tContrast2 = -1.8262 = 3.334 .
FOmnibus. Contrasts are not redundant with
the omnibus analysis when k > 2.
To request this analysis by menus,
select Analyze | Compare Means | One-way ANOVA. Select and move the independent factor
and dependent variable into the appropriate boxes. Then, to specify the contrast, click on the
Contrasts button shown at the bottom of the main One-way ANOVA screen in Figure 5.1. This
brings up the One-Way ANOVA: Contrasts screen also shown in Figure 5.1. To specify a
particular set of contrast coefficients, enter each individual coefficient in the Coefficients box and
click Add to add the coefficient to the list. In Figure 5.1, +1 has just been entered and can now be
added to the set of coefficients. The first value of -1 had already been added to the list.
Following completion of one set of contrasts, it would be possible to select Next to specify values
for another contrast. When all contrasts have been specified, Continue will return to the main
One-Way ANOVA box and OK will run the analysis.
Manova and Contrasts for Agitation Study
Box 5.4 illustrates the syntax for conducting specific contrasts using MANOVA, another
of SPSS’s general purpose Anova programs. Here the CONTRAST sub-command requires the
name of the variable in parentheses because MANOVA can handle factorial designs. This is
followed by an equals sign, the keyword SPECIAL, and then k sets of k coefficients within
parentheses. Although MANOVA requires k sets of coefficients (rather than k - 1), the first set
of coefficients will always be k 1s. These k 1s represent variability associated with the deviation
of the grand mean from zero and, although required, they are not generally associated with any
contrast of interest. It is the remaining k - 1 sets of k coefficients that are interesting. Each set
will contain one coefficient for each of the k groups. In Box 5.4, the coefficients of +1 +1 -1 +1
represent variability due to the grand mean (+1 +1) followed by the k - 1 = 2 - 1 = 1 set of k (i.e.,
2) coefficients for the contrast(s) of interest (-1 +1). By default, MANOVA reports the test of
each contrast as a t-test, shown in Box 5.4. Note again that the ps for t and F are identical (.098),
that t² = FOmnibus, and that SSTreatment = SSL = (6 × 2.0²) / (-1² + 1²) = 12.0. Although the statistical
tests for the contrasts are reported here as t tests, we will see later that MANOVA has alternative
ways to report the results of contrasts.

MANOVA agit BY group(1 2) /CONTRAST(group) = SPECIAL(1 1  -1 1).

Tests of Significance for AGIT using UNIQUE sums of squares
Source of Variation        SS   DF      MS      F   Sig of F
WITHIN CELLS            36.00   10    3.60
GROUP                   12.00    1   12.00   3.33       .098
(Model)                 12.00    1   12.00   3.33       .098
(Total)                 48.00   11    4.36

R-Squared = .250     Adjusted R-Squared = .175

GROUP
Parameter       Coeff.   Std. Err.    t-Value    Sig. t
    2       -2.0000000     1.09545   -1.82574    .09785

Box 5.4. Two-Group Contrasts using MANOVA.
GLM and Contrasts for the Agitation Study
Box 5.5 shows the equivalent analyses using GLM. The format of the command is
similar to MANOVA, except that GLM does not require the k 1s representing the grand mean. If
included, those coefficients would represent variation of the grand mean about 0.
Although the contrast output from GLM is more extensive than the other programs, the
essential elements remain the same. We see a contrast value for L1 of -2.00, a SE of 1.095, and a
significance of p = .098, which in the k = 2 situation is identical to the p for the F test. Although
not reported, t = -2.0 / 1.095 = -1.826, agreeing with previous reports and equalling √F in
absolute value. Similarly, we could calculate SSL = (6 × 2.0²) / (-1² + 1²) = 12.0 = SSTreatment. This GLM analysis
could also be requested using menus, which we demonstrate for a later example in this chapter.
GLM agit BY group /CONTRAST(group) = SPECIAL(-1 1).

Tests of Between-Subjects Effects   Dependent Variable: AGIT
Source              Type III Sum of Squares   df   Mean Square         F    Sig.
Corrected Model                  12.000(a)     1        12.000     3.333   .098
Intercept                      1452.000        1      1452.000   403.333   .000
GROUP                            12.000        1        12.000     3.333   .098
Error                            36.000       10         3.600
Total                          1500.000       12
Corrected Total                  48.000       11
a  R Squared = .250 (Adjusted R Squared = .175)

Custom Hypothesis Tests
Contrast Results (K Matrix)                        Dependent Variable
GROUP Special Contrast                                   AGIT
L1   Contrast Estimate                                 -2.000
     Hypothesized Value                                     0
     Difference (Estimate - Hypothesized)              -2.000
     Std. Error                                         1.095
     Sig.                                                .098
     95% Confidence Interval       Lower Bound         -4.441
     for Difference                Upper Bound           .441

Test Results   Dependent Variable: AGIT
Source       Sum of Squares   df   Mean Square       F    Sig.
Contrast             12.000    1        12.000   3.333   .098
Error                36.000   10         3.600

Box 5.5. Two-Group Contrasts and GLM.

CONTRASTS FOR K > 2 DESIGNS

The contrasts in the preceding sections were done only for pedagogical reasons,
specifically to introduce the basic idea of planned contrasts and the operations for their
calculation. Contrasts provide no additional information when there are only two groups, and
indeed there is only one possible contrast that could be conducted. With more than two groups,
however, contrasts can be critical for interpretation of the results and there are various ways to
define the contrasts.
In a three-group study, for example, it would be possible to do a specific contrast that
compared groups 1 and 2; the coefficients would be -1 1 0 for groups 1, 2, and 3 respectively
(Note: Σcj = 0). Contrasts can also be done on combinations of groups. For example, we could
compare the average of groups 1 and 2 to group 3 using -1, -1, and +2 as coefficients (or
equivalently, -.5, -.5, and +1, since the absolute size of the coefficients is irrelevant for most purposes).
In general, there are as many independent or orthogonal contrasts among a set of means as there
are degrees of freedom in the between groups effect (i.e., k - 1 in the single factor design). Two
groups only allow 2 - 1 = 1 independent contrast, which we just saw is redundant with the
Anova. Three groups allow 3 - 1 = 2 independent contrasts, four groups allow 4 - 1 = 3
independent contrasts, and so on. The specific contrasts are determined by the nature of the
factor and expectations about its relationship to the dependent variable.
There are several guidelines to follow in performing planned contrasts. Although the
number of possible contrasts can in principle be quite large under special circumstances (an
adjustment for the number of comparisons should then be used), it is generally desirable to keep
the number of contrasts less than or equal to the degrees of freedom for the effect being analyzed.
In the single-factor ANOVA, this means that k - 1 planned contrasts is a desirable maximum
(although not an absolute one). Fewer contrasts than this are even better, and more contrasts than
this would need to be justified and should be adjusted for (i.e., post hoc type adjustments used).
Orthogonal Contrasts
A second desirable (but again not essential) characteristic of multiple contrasts is that
they be independent of one another. Contrasts that are uncorrelated are called orthogonal
contrasts. If contrasts are orthogonal (i.e., independent, uncorrelated), then researchers know that
each contrast is testing the significance of a unique component of the overall differences among
the means (i.e., a unique SS). If contrasts are correlated, then the same variation in means could
contribute to the SSs for different contrasts, and some of the variation across all the means may
not be captured by any of the specific contrasts. The number and orthogonality of contrasts are
related issues because at most k - 1 mutually orthogonal contrasts are possible; a set of more than k
- 1 contrasts will necessarily involve correlated contrasts.
Because even just two contrasts can be correlated, it is necessary to plan contrasts to
ensure that they are independent. Although it can be a challenge to determine in advance what
contrasts are orthogonal to existing contrasts, it is always possible to determine whether known
contrasts are orthogonal. When two contrasts are independent, the sum of the products of their
respective coefficients is zero; that is, Σcjcj' = c1 × c1' + c2 × c2' + ... + ck × ck' = 0, where c and c'
denote the two sets of coefficients. If Σcjcj' = 0, then contrasts c and c' are uncorrelated.
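Because the orthogonality test is just a sum of cross-products, it is easy to script; the sketch below (the function name is ours, purely illustrative) checks two of the pairs examined in Box 5.6, which follows.

def orthogonal(c1, c2):
    # Two contrasts are orthogonal when the cross-products of their coefficients sum to 0.
    return sum(a * b for a, b in zip(c1, c2)) == 0

print(orthogonal([-1, +1, 0, 0], [0, 0, -1, +1]))   # True:  L1 and L2 of Box 5.6
print(orthogonal([-1, +1, 0, 0], [-1, 0, +1, 0]))   # False: L1 and L4 of Box 5.6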
This test for independence is illustrated in Box 5.6 for a hypothetical study involving four
groups. Six contrasts are actually shown. Because 6 is greater than k - 1 = 3, all of these
contrasts cannot be mutually independent. Contrast 1 might be independent of two other
contrasts (giving k - 1 independent contrasts), but one or more of those three contrasts must be
correlated with some of the remaining contrasts.
Groups:  1 - Anxiety Control Group
         2 - Depression Control Group
         3 - Anxiety Treatment Group
         4 - Depression Treatment Group

      Coefficients
       c1    c2    c3    c4    Comparison
L1     -1    +1     0     0    Anx Control vs. Dep Control
L2      0     0    -1    +1    Anx vs. Dep Treatment
L3     -1    -1    +1    +1    Controls vs. Treatments
L4     -1     0    +1     0    Anx Control vs. Anx Treatment
L5      0    -1     0    +1    Dep Control vs. Dep Treatment
L6     -1    +1    -1    +1    Anx Groups vs. Dep Groups

L1, L2, and L3 constitute one set of mutually orthogonal contrasts:
  Σc1j c2j = 0 = (-1 × 0) + (1 × 0) + (0 × -1) + (0 × 1)
  Σc1j c3j = 0 = (-1 × -1) + (1 × -1) + (0 × 1) + (0 × 1)
  Σc2j c3j = 0 = (0 × -1) + (0 × -1) + (-1 × 1) + (1 × 1)
L4, L5, and L6 constitute a second set of mutually orthogonal contrasts.
L1 and L4 are not independent: Σc1j c4j = +1 = (-1 × -1) + (1 × 0) + (0 × 1) + (0 × 0)
L3 and L6 are independent: Σc3j c6j = 0 = (-1 × -1) + (-1 × 1) + (1 × -1) + (1 × 1)

Box 5.6. Testing Contrasts for Independence (Orthogonality).
As shown in Box 5.6, contrast 1 is independent of contrasts 2 and 3, which are
independent of one another. Therefore, contrasts 1, 2, and 3 are k - 1 mutually independent (or
orthogonal) contrasts. Mutually orthogonal means that each contrast is independent of (uncorrelated
with) all other contrasts in that set; independence is determined by summing cross-products of
the paired coefficients (shown in Box 5.6 for contrasts 1, 2, and 3). Contrasts 4, 5, and 6 make
up a second set of k - 1 mutually orthogonal contrasts (to test this, cross-multiply all possible
pairs of coefficients).
The two sets of mutually orthogonal contrasts (i.e., one set containing 1, 2, and 3, and a
second set containing 4, 5, and 6) cannot be completely independent of one another. For
example, contrasts 1 and 4 are not orthogonal. Intuitively, contrasts 1 and 4 are not independent
because group 1 is compared to group 2 in contrast 1 and to group 3 in contrast 4 (i.e., the group
1 mean is one pole of two different contrasts). Although many contrasts across sets are not
orthogonal to one another, there are also numerous ways to select orthogonal contrasts for any
particular design. In Box 5.6, for example, contrasts 3 and 6 are orthogonal.
The preceding appears quite abstract to most students. The trick is to appreciate that
virtually any expected pattern for a set of means can be translated into contrast coefficients. We
will see many more examples of this here and in future chapters. And patterns will be
uncorrelated as long as the cross-products of their respective coefficients sum to zero. The
rationale for this last rule is actually quite basic. Remember that contrast coefficients must sum
to 0, which means that their means are 0. If we let x and y stand for two sets of contrast
coefficients, then the definitional formula for the Sum of Cross Products (SCP) can be
simplified as follows: SCP = Σ(x - x̄)(y - ȳ) = Σ(x - 0)(y - 0) = Σxy. Since SCP is the
numerator for the correlation and regression coefficients, SCP = Σxy = 0 ensures that r = 0 for
the correlation between the two contrasts (i.e., they are independent or orthogonal). We
demonstrate this later using correlation and regression.
The benefit of mutually orthogonal contrasts is that they permit researchers to divide the
SSTreatment into separate, independent components (i.e., SSL1, SSL2, and so on) that will sum to
SSTreatment. Each component has an associated F or t-test that will be meaningful, assuming the
contrasts are chosen carefully. One way to conceptualize this process is that the omnibus F
divides SSTotal into SSTreatment and SSError, and mutually orthogonal contrasts further divide
SSTreatment into specific components (SSL1, SSL2, ...). If contrasts are not orthogonal, then the sum
of the SSs for the separate contrasts might be greater or less than SSTreatment because some source
of variation is included in more than one contrast or some source of substantive variation is
omitted from all of the contrasts.
It is worth noting here that only SSTreatment in the Between-S design is subdivided into
contrasts; SSError remains the same. In Within-S designs, both SSTreatment and SSError are partitioned
into contrast-specific components. This introduces an additional complexity into the Within-S
design and can lead to anomalous outcomes, as shown in later chapters.
In generating mutually orthogonal contrasts, each successive contrast allows less freedom
of choice about subsequent contrasts. Indeed, by the last contrast, only one possible set of
coefficients will be orthogonal to the preceding k - 2 sets of coefficients. With four groups, for
example, there are many possibilities for the first contrast. But once we have chosen the first
contrast, this limits possibilities for the next contrast. And once we have the first two contrasts,
then the final orthogonal contrast will be fully determined. Say, for example, that our first
contrast is -1 -1 +1 +1. Then our second contrast could be one of a number of possibilities,
such as: -1 +1 0 0, 0 0 -1 +1, -1 +1 +1 -1, or -1 +1 -1 +1. Once we had settled on our first
and second contrasts, however, the third contrast is also set. Given the first contrast of -1 -1 +1
+1, and the second contrast of -1 +1 0 0, then the third contrast must be 0 0 -1 +1. Given -1 -1
+1 +1 and -1 +1 -1 +1, then the third contrast must be +1 -1 -1 +1. What this means from a
practical point of view is that researchers generally define their principal contrasts first to ensure
that they are present in the final set of orthogonal contrasts.
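One way to see that the final contrast is fully determined is to ask for a set of coefficients that sums to zero and has zero cross-products with the contrasts already chosen. The numpy sketch below is our own illustration of that idea, using a null-space (SVD) computation.

import numpy as np

chosen = np.array([[ 1,  1,  1,  1],   # all-1s row: forces the new coefficients to sum to 0
                   [-1, -1, +1, +1],   # first chosen contrast
                   [-1, +1,  0,  0]])  # second chosen contrast

# The right-singular vector for the (near-)zero singular value spans the null space of
# `chosen`, i.e., the only remaining direction orthogonal to every row.
_, _, vt = np.linalg.svd(chosen)
print(vt[-1])                          # proportional to (0, 0, -1, +1)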
Polynomial Contrasts (Trend Analysis)
Although Anova and contrasts were designed especially for the treatment of categorical
variables (e.g., qualitatively different drugs or therapies, strategies in a learning study, parenting
styles, and so on), there are special orthogonal contrasts that are appropriate when the
independent variable or factor involves groups ordered along some numerical dimension (e.g.,
amount of reinforcer, study time, levels of intelligence, and so on). Such quantitative variables
as low, medium, or high levels of some factor can be analyzed by traditional ANOVA. As in
other designs, however, researchers usually want to draw specific conclusions about the nature of
the differences between the groups. When the groups are ordered, a meaningful question is
whether there is a systematic change in the means as the numerical variable increases (e.g., do the
means increase or decrease in a linear manner). Polynomial coefficients permit the division of
SSTreatment into k - 1 components (linear and nonlinear), each having one degree of freedom.
The coefficients for polynomial contrasts are shown in Appendix A.5 for up to 6 groups
(coefficients exist for more groups if needed). In principle there are k - 1 orthogonal polynomials
possible, but anything beyond the first few components is meaningful only in very special circumstances.
Figure 5.2. Polynomial Coefficients for k = 3.
Figure 5.3. Polynomial Contrasts for k = 4.
For three groups, there are two possible
polynomial contrasts, linear and quadratic. The
linear coefficients are -1, 0, and +1, and the
quadratic coefficients are +1, -2, and +1. Note
that the linear and quadratic contrasts are
orthogonal; that is, ΣcLin cQua = (-1 × +1) + (0 × -2)
+ (+1 × +1) = -1 + 0 + 1 = 0. Figure 5.2 plots the values of the
linear and quadratic coefficients for k = 3. The
linear coefficients form a linear pattern; that is,
there is a constant increase from coefficient 1
(-1) to coefficient 2 (0) to coefficient 3 (+1). If the means are ordered in a similar way, either
increasing or decreasing, then there will be a strong correlation between the linear coefficients
and the means. The quadratic coefficients form a perfect U-shaped pattern, first decreasing and
then increasing. If the means follow this pattern, or the inverse U-shaped pattern of an increase
followed by a decrease, then there will be a strong correlation between the coefficients and the
data. The contrasts test the significance of the correlation between the data and the linear and
quadratic coefficients.
A factor with 4 levels will have k - 1
polynomial contrasts, specifically linear,
quadratic, and cubic. The linear coefficients are
-3, -1, 1, 3; the quadratic coefficients are 1,
-1, -1, 1; and the cubic coefficients are -1, 3, -3,
1. These three contrasts are mutually
orthogonal and partition the SSTreatment into three
unique components; that is, SSLinear + SSQuadratic
+ SSCubic = SSTreatment. Figure 5.3 plots the
linear, quadratic, and cubic coefficients for k =
4. The linear coefficients in Figure 5.3 show a linear increase and will capture systematic
increases or decreases in the means. The quadratic coefficients show a decrease followed by an
increase (i.e., the U-shaped pattern), and will capture systematic changes involving one change in
direction. The cubic coefficients are sensitive to several reversals in direction (up-down-up or
down-up-down).
In essence polynomial contrasts partition any pattern in the k means into k - 1
components or trends. Each pattern has a SS associated with it, and ΣSSContrast is equal to the
SSTreatment. The linear component is equivalent to a simple correlation between the ordered
treatment numbers and the dependent variable. The nonlinear components are analogous to the
power predictors (i.e., squared, cubed, and higher-order terms of the predictor) used for nonlinear or polynomial regression.
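As a quick check on the tabled coefficients, the sketch below (our own illustration) verifies that the k = 4 polynomial contrasts sum to zero and are mutually orthogonal, and that the linear set rises by a constant step.

linear    = [-3, -1, +1, +3]
quadratic = [+1, -1, -1, +1]
cubic     = [-1, +3, -3, +1]

for c in (linear, quadratic, cubic):
    assert sum(c) == 0                                 # each set is a legitimate contrast
for a, b in ((linear, quadratic), (linear, cubic), (quadratic, cubic)):
    assert sum(x * y for x, y in zip(a, b)) == 0       # all pairs are orthogonal
print([linear[i + 1] - linear[i] for i in range(3)])   # [2, 2, 2]: a constant (linear) step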
PLANNED CONTRASTS FOR THE BRAIN STIMULATION STUDY
Contrast analysis for a multi-group study will be demonstrated first with the 3-group
brain stimulation study used to illustrate pairwise comparisons. Recall that rate of barpressing
was measured for three groups of animals: a No Stimulation group (NS), a control stimulation
group (area A), and an experimental stimulation group (area B). The overall ANOVA was
significant, suggesting that the null hypothesis of equal population means was false. In contrast
to pairwise comparisons, however, the omnibus F need not be significant in order to do the
planned contrasts presented here.
The brain stimulation study involved two control groups (NS and A) and one treatment
group. This design suggests two specific contrasts. One contrast would test the difference
between area B and the average of NS and area A. The coefficients would be -1 -1 +2. A second
contrast would test the difference between NS and area A (i.e., between the two control groups).
The coefficients for this contrast would be -1 +1 0. These two contrasts exhaust the df associated
with the treatment effect in this study (i.e., k - 1 = 2). Moreover they are independent (i.e.,
orthogonal). Note that the sum of their cross-products equals zero; that is, (-1 × -1) + (-1 × +1)
+ (+2 × 0) = 0. Calculations and results for this contrast analysis are shown in Box 5.7. It would
also be possible to compare each treatment group to the NS control group, but these contrasts
would not be orthogonal; that is, the cross-product of -1 +1 0 (A versus NS) and -1 0 +1 (B
versus NS) sums to +1 rather than zero.
          NS     A      B
           4     3     10
           5     2      8
           6     4      6
ȳj =     5.0   3.0    8.0            ȳG = 48/9 = 5.333

SOURCE      df         SS     MS = SS/df   F = MST/MSE     F.05;2,6
Treatment   k-1 = 2   38.0       19.00         9.5            5.14
Error       n-k = 6   12.0        2.00                  F ≥ Fα, reject H0
Total       n-1 = 8   50.0

Contrast Analysis                        t.05,6 = 2.447     F.05;1,6 = 5.99 = tα²
        NS    A    B        L      tL      SSL      FL
ȳj     5.0  3.0  8.0
c1j     -1   -1    2      8.0    4.00     32.0     16.0 = tL1²
c2j     -1    1    0     -2.0   -1.73      6.0      3.0 = tL2²
                                     ΣSSL = 38.0          Mean F = (16 + 3)/2
                                     = SSTreatment          = 9.5 = FOmnibus

Do Not Reject H0: µNS = µA          Reject H0: µNS+A = µB

Calculations for L2
L2   = Σc2j ȳj = (-1 × 5.0) + (1 × 3.0) + (0 × 8.0) = -2.0
tL2  = L2 / √(MSError Σ(c2j²/nj)) = -2.0 / √(2.0 × (1/3 + 1/3)) = -2.0/1.155 = -1.732 = -√FL2
SSL2 = nj L2² / Σc2j² = (3 × -2.0²) / (-1² + 1² + 0²) = 6.0
FL2  = SSL2 / MSError = 6.0/2.0 = 3.0 = 1.732² = tL2²

Box 5.7. Contrast Analysis of Barpressing (k = 3) Study.
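The arithmetic in Box 5.7 can be confirmed with a few lines of Python (our own sketch; the labels are illustrative): the two contrast SSs sum to SSTreatment = 38.0, and the two contrast Fs average to the omnibus F of 9.5.

means = [5.0, 3.0, 8.0]                 # NS, Area A, Area B
n, ms_error = 3, 2.0
contrasts = {"B vs average of NS and A": [-1, -1, +2],
             "NS vs A":                  [-1, +1,  0]}

ss_sum, f_values = 0.0, []
for label, c in contrasts.items():
    L = sum(cj * m for cj, m in zip(c, means))
    ss = n * L ** 2 / sum(cj ** 2 for cj in c)
    F = ss / ms_error                   # df = 1, 6
    ss_sum += ss
    f_values.append(F)
    print(label, L, ss, F)              # 8.0, 32.0, 16.0  and  -2.0, 6.0, 3.0
print(ss_sum)                           # 38.0 = SS_Treatment
print(sum(f_values) / len(f_values))    # 9.5  = omnibus F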
The analysis in Box 5.7 demonstrates several useful features of contrasts. First, the
planned contrasts permit a very "tidy" or "elegant" description of the results. The results indicate
that the difference between the two control groups (i.e., contrast 2) is not significant, whereas the
difference between the treatment group and the average of the two control groups (i.e., contrast
1) is significant. Second, the sum of the SSs for the contrasts is equal to SSTreatment, which must
occur whenever we use k - 1 = 2 orthogonal contrasts. Orthogonal contrasts partition the
SSTreatment into independent components that sum to SSTreatment. For this same reason, the average
of the Fs for the two contrasts is equal to the FOmnibus, as shown in Box 5.7.
Third, note that the F for contrast one is larger and more significant than the omnibus F
(16.0 versus 9.5 for the omnibus F). Although both these Fs are significant in this particular
case, the fact that the omnibus and contrast Fs do differ means that under certain circumstances
the omnibus F might not be significant even when the specific F for the contrast is significant.
This would occur because the significant contrast would be "watered down" when averaged
together with a non-significant contrast, resulting in a non-significant omnibus F. Finally, note
that the t-tests and F-tests for the contrasts are equivalent to one another (i.e., t² = F or t = √F), so
only one of the tests is normally done.

Equivalence of Different Values for Contrast Coefficients

Except for their relative values (i.e., the pattern), the actual numbers used for the contrast
coefficients are largely irrelevant to the analyses: the Σcj² term in the denominators of the
formulae for t and SSContrast "adjusts" the final statistics for whatever values were used to produce
the contrast. This point might be clearer with an example.

                        NS      A      B                  nj = 3
ȳj                     5.0    3.0    8.0        L       SS = nj L²/Σcj²

Original Coefficients
c1                      -1      1      0      -2.0         6.0
c2                      -1     -1      2       8.0        32.0

Original Coefficients ÷ Σ|cj|  (so that Σ|cj| = 1)
c'1                    -.5     .5      0      -1.0         6.0
c'2                   -.25   -.25     .5       2.0        32.0

Normalized Coefficients: Original ÷ √Σcj²  (so that Σcj² = 1; SS = nL²)
c''1                 -.707   .707      0    -1.414         6.0
c''2                 -.408  -.408   .816     3.264        32.0

Box 5.8. Alternative Values for Contrast Coefficients.
Box 5.8 reproduces the contrasts from Box 5.7 and shows two alternative sets of
coefficients that could be used to perform the identical contrasts. The second set of contrasts
uses fractions, the absolute values of which sum to 1. For these coefficients, the contrast L
represents the deviation of the cell means from the "grand" mean of the conditions being
compared. Specifically, L = -1.0 for the first contrast represents the deviations of the means for
NS (5.0) and A (3.0) from their average value of 4.0. The magnitude of the second contrast is
2.0, which represents the deviation of ȳNS+A = 4.0 and ȳB = 8.0 from their overall average of 6.0.
The third set of contrasts uses normalized coefficients (orthonormal coefficients if also
orthogonal). A property of normalized coefficients is that Σcj² = 1, which means that SSContrast =
nL²/1 = nL². We will generally use whole-number contrasts, mainly for ease of calculation, but
will also have occasion in later chapters to use normalized contrasts to demonstrate some features
of Within-S analyses. The main point at present, however, is that it is the pattern of differences
among the contrast coefficients that matters, not the actual numbers.
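A quick way to convince yourself of this scale invariance is to rescale the coefficients and recompute the SS; the sketch below (illustrative only) does so for the treatment-versus-controls contrast of Box 5.8.

means, n = [5.0, 3.0, 8.0], 3

def ss_contrast(c):
    L = sum(cj * m for cj, m in zip(c, means))
    return n * L ** 2 / sum(cj ** 2 for cj in c)

# Integer, fractional, and normalized versions of the same pattern all give SS = 32.0
# (the last differs only by rounding in the normalized coefficients).
print(ss_contrast([-1, -1, 2]))
print(ss_contrast([-.25, -.25, .5]))
print(ss_contrast([-.408, -.408, .816]))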
Polynomial Contrasts for the Brain Stimulation Study
Although perhaps not the most relevant contrasts for this study, polynomial contrasts for
the brain stimulation study are shown in Box 5.9, along with the calculations of the SSs. Such
contrasts would be appropriate if, for example, researchers expected the means to be ordered
from No Stimulation (Group 1) to Stimulation in area A (Group 2) to Stimulation in area B
(Group 3), or if they expected some kind of curvilinear pattern (e.g., a decrease from Group 1 to
Group 2, followed by an increase from Group 2 to Group 3). Or they might have expected the
opposite patterns. Previous findings or theory would be necessary to justify such expectations in
this case, because the three groups do not constitute a natural ordering.
Note in Box 5.9 that SSLinear +
SSQuadratic = 38.0 = SSTreatment; that is, SSTreatment
has been partitioned into two orthogonal
components. Also note that SSQuadratic, the U-
shaped pattern, is stronger than SSLinear; its SS
is almost twice as large. Indeed, the dominant
pattern when we examine the three means is
the decrease from 5.0 to 3.0, followed by the
increase from 3.0 to 8.0. It is this pattern that
is being identified by the quadratic contrast. The linear component accounts for some variability
because the mean for group 3 (i.e., 8.0) is higher than the mean for group 1 (i.e., 5.0). Although
we could conduct F tests for these two effects manually, we will leave that for the comparable
SPSS analyses.

            Group
          1      2      3
ȳj      5.0    3.0    8.0        L      SSL
Lin      -1      0     +1     +3.0    13.50
Qua      +1     -2     +1     +7.0    24.50
                               ΣSSL = 38.00

Box 5.9. Polynomial Contrasts for Brain Stimulation Study.

SPSS AND CONTRASTS FOR THE BRAIN STIMULATION STUDY

Contrasts for more than two groups are simple to do using the various SPSS Anova
commands. ONEWAY, for example, allows many contrast subcommands in a single ONEWAY.
In general, researchers would not exceed k - 1 contrasts (i.e., the maximum number of contrasts
equals the df for the treatment), and ideally these would be orthogonal contrasts. Indeed, some
SPSS procedures "insist" on orthogonal contrasts (e.g., MANOVA), although other procedures
permit non-orthogonal contrasts (e.g., ONEWAY).

ONEWAY and Planned Contrasts for the Stimulation Study

ONEWAY press BY group (1,3) /CONTRAST = -1 1 0 /CONTRAST = -1 -1 2.

SOURCE            D.F.   SUM OF SQUARES   MEAN SQUARES   F RATIO   F PROB.
BETWEEN GROUPS      2           38.0000        19.0000    9.5000     .0138
WITHIN GROUPS       6           12.0000         2.0000
TOTAL               8           50.0000

CONTRAST COEFFICIENT MATRIX
             Grp 1   Grp 2   Grp 3
CONTRAST 1    -1.0    -1.0     2.0
CONTRAST 2    -1.0     1.0     0.0

              VALUE   S. ERROR   T VALUE   D.F.   T PROB.
CONTRAST 1   8.0000     2.0000     4.000    6.0     0.007
CONTRAST 2  -2.0000     1.1547    -1.732    6.0     0.134

Box 5.10. SPSS Contrasts for Three-group Design.
Box 5.10 shows an SPSS analysis of our two contrasts for the hypothetical brain
stimulation study, in which there were significant differences among the bar-pressing means for
No Stimulation (M1 = 5.0), stimulation in Control Area A (M2 = 3.0), and stimulation in
experimental Area B (M3 = 8.0). The first contrast compares the treatment to the two controls,
and the second compares the two controls to each other. The results of the contrasts are shown as
t-tests at the end of the output and agree with calculations reported earlier. Moreover, it would
be possible to compute SSs and Fs for the contrasts, since SSL1 = (3 × 8.0²) / (-1² + -1² + 2²) =
32.0, FL1 = (32.0/1) / 2.0 = 16.0 = tL1², SSL2 = (3 × -2.0²) / (-1² + 1² + 0²) = 6.0, and FL2 = (6.0/1) /
2.0 = 3.0 = tL2². Given these calculations, we see that SSL1 + SSL2 = 32.0 + 6.0 = 38.0 = SSTreatment;
that is, SSTreatment has been partitioned into two independent sources of variability, one associated
with contrast one and the other associated with contrast two.
Of particular note in Box 5.10 is that the p of .007 for contrast 1 (treatment versus
controls) is highly significant, and much more significant than the p of .0138 for the omnibus F,
demonstrating the potential benefits of a focussed contrast that matches the pattern of differences
among the means.
That the omnibus F might be less significant than the specific contrast Fs is clearly shown
by the fact that the omnibus F is the average of the k - 1 orthogonal contrast Fs. In the present
example, square the ts used to test the contrasts and average them: (1.732² + 4.000²) / 2 = (3.0 +
16.0) / 2 = 19.0/2 = 9.5 = FANOVA. In practice, an omnibus F will be weaker than one or more of
the contrast Fs whenever weak contrasts (e.g., FL2 = 3.0) are averaged together with strong
contrasts (e.g., FL1 = 16.0). Planning strong contrasts ahead of time ensures powerful tests of
predicted differences. Not planning specific contrasts risks diluting strong effects by lumping
them together with weak effects.
MANOVA and Contrasts for the Brain Stimulation Study
Box 5.11 shows the same analysis using the SPSS MANOVA command.

MANOVA press BY group(1 3) /CONTRAST = SPECIAL(1 1 1  -1 -1 2  -1 +1 0) /DESIGN group(1) group(2).

Source of Variation        SS   DF      MS       F   Sig of F
WITHIN+RESIDUAL         12.00    6    2.00
GROUP(1)                32.00    1   32.00   16.00       .007
GROUP(2)                 6.00    1    6.00    3.00       .134
(Model)                 38.00    2   19.00    9.50       .014
(Total)                 50.00    8    6.25

R-Squared = .760     Adjusted R-Squared = .680

GROUP(1)
Parameter       Coeff.   Std. Err.    t-Value    Sig. t
    2        8.00000000    2.00000    4.00000    .00712

GROUP(2)
Parameter       Coeff.   Std. Err.    t-Value    Sig. t
    3       -2.0000000     1.15470   -1.73205    .13397

Box 5.11. Planned Contrasts Using MANOVA.

The CONTRAST = SPECIAL subcommand includes k (i.e., 3) sets of k (i.e., 3) coefficients each. The first set of k
coefficients is simply the three 1s, which represents the grand mean. The next set of 3
coefficients (-1 -1 +2) is our first contrast between group 3 and the average of groups 1 and 2,
and the final set of 3 coefficients (-1 +1 0) is our second contrast between groups 1 and 2. The
spacing between the three sets of three coefficients is for clarity and is not required by SPSS.
If we had only included the CONTRAST line and nothing more, then SPSS would have
printed out the t-tests shown at the bottom of Box 5.11. And these t-tests would have been
labelled simply as Parameter 2 and Parameter 3. However, MANOVA allows researchers to use
the /DESIGN command to specify the particular effects in which they are interested. By default,
SPSS would analyze the overall main effect for the design (and interactions in a factorial study).
Using the default is identical to including a /DESIGN = group for our present single factor study.
But we now want the contrast effects, rather than (or in addition to) the overall main effect. The
DESIGN = group(1) group(2) statement in Box 5.11 requests that MANOVA print Anova results
for the separate contrasts (i.e., SSs, MSs, F). In this notation, the 1 and 2 after group refer to the
first and second contrasts. This DESIGN statement requests the specific Anova results (i.e., SSs,
dfs, and Fs) that appear in Box 5.11 for Group(1) and Group(2). The t-test results are also
labelled as GROUP(1) and GROUP(2).
Although there is much redundancy in the printout, the results in Box 5.11 are ideal for
demonstrating the equivalence of the t and F analyses of contrasts, and as well for practicing
computations associated with these two approaches. Note in particular the following
correspondences: the p values for the corresponding ts and Fs are identical except for rounding
(e.g., .00712 for tL1 and .007 for FL1); the Fs are equal to the corresponding t² (e.g., tL1² = 4.00² =
16.00 = FL1); and calculating SSs for the contrasts based on the coefficients in Box 5.11 produces
the SSs in the Anova portion of the output; for example, SSL1 = (3 × 8.0²) / 6 = 32.0 = SSGroup(1).
Note also that the results for GROUP(1) (i.e., the first contrast) and GROUP(2) (i.e., the second
contrast) agree with earlier calculations and results.
Our omnibus F from earlier appears as the F for the entire model. We can also see that
SSL1 + SSL2 = 32.0 + 6.0 = 38.0 = SSTreatment, and that (FL1 + FL2) / 2 = (16.0 + 3.0) / 2 = 19.0 / 2 =
9.5 = FOmnibus.
Figure 5.4. GLM Contrast Screen.
GLM and Contrasts for the Brain Stimulation Study
Figure 5.4 shows the screen for using GLM to perform contrasts. The Dependent
Variable and Fixed Factor(s) were placed in the appropriate boxes, and the Contrasts box was
opened. This box does
not allow for SPECIAL
contrasts (although GLM
syntax does), but instead
presents a range of default
contrasts. Available
contrasts are: Deviation,
Simple, Difference,
Helmert, Polynomial, and
Repeated. We use
Polynomial contrasts
shortly, and information about the other types of contrast is available from SPSS Help.
The DIFFERENCE contrast involves the comparison of each group (except the first) with
the average of all preceding groups. That is, group 2 will be compared with group 1, and group 3
with the average of groups 1 and 2. If there were more levels to the factor, group 4 would be
compared to the average of groups 1 to 3, group 5 to the average of groups 1 to 4, and so on.
This type of contrast corresponds to our two contrasts (i.e., -1 +1 0, -1 -1 +2), although in a
different order than we have been using. The order of contrasts does not affect the tests.
To perform the DIFFERENCE contrast, we select Difference from the list of contrasts,
press Change to have the selected contrast entered into the Factors box, and press Continue. We
can now return to the Univariate window and run the analysis. The omnibus Anova output for
this analysis is shown in Box 5.12. As reported previously, the omnibus F of 9.50 is significant,
although it need not be to warrant the planned comparisons discussed in Chapters 6 and 7.

Source              Type III Sum of Squares   df   Mean Square         F    Sig.
Corrected Model                  38.000(a)     2        19.000     9.500   .014
Intercept                       256.000        1       256.000   128.000   .000
GROUP                            38.000        2        19.000     9.500   .014
Error                            12.000        6         2.000
Total                           306.000        9
Corrected Total                  50.000        8
a  R Squared = .760 (Adjusted R Squared = .680)

Box 5.12. Omnibus Anova for Brain Stimulation Study.
Box 5.13 shows the GLM results for the contrasts, with some statistics and text deleted.
Although the actual values for ts, Fs, and SSs are not presented, the contrast, its SE, and its
significance are reported. These values agree with earlier calculations and printouts, and could be
used to compute some of the missing quantities: for example, tL2 = 4.00 / 1.00 = 4.00, SSL2 =
(3 × 4.0²) / 1.5 = 32.0 (using GLM's difference-contrast coefficients of -.5, -.5, and 1, for which
Σcj² = 1.5), and FL2 = 4.00² = (32.00/1)/2.00 = 16.0. The two control groups do not differ, p = .134,
whereas they differ significantly from the treatment group, p = .007.
Dependent Variable: PRESS     GROUP Difference Contrast
Level 2 vs. Level 1      Contrast Estimate     -2.000
                         Hypothesized Value         0
                         Std. Error             1.155
                         Sig.                    .134
Level 3 vs. Previous     Contrast Estimate      4.000
                         Hypothesized Value         0
                         Std. Error             1.000
                         Sig.                    .007

Box 5.13. GLM Planned Comparisons Output for Brain Stimulation Study.

SPSS and Polynomial Contrasts (Trend Analysis) for the Brain Stimulation Study

The use of orthogonal polynomials to analyze trends in ordered means is simple in SPSS.
One approach uses the polynomial coefficients in Appendix A.5 as the contrast coefficients for
whatever procedure is being used: manual calculations, ONEWAY, MANOVA, GLM,
REGRESSION, or some other procedure that permits contrasts. The tests of significance for the
contrasts represent the significance of the specific components.
ONEWAY press BY group (1,3) /CONTRAST = -1 0 1 /CONTRAST = -1 2 -1.

SOURCE            D.F.   SUM OF SQUARES   MEAN SQUARES   F RATIO   F PROB.
BETWEEN GROUPS      2           38.0000        19.0000    9.5000     .0138
WITHIN GROUPS       6           12.0000         2.0000
TOTAL               8           50.0000

CONTRAST COEFFICIENT MATRIX
             Grp 1   Grp 2   Grp 3
CONTRAST 1    -1.0     0.0     1.0
CONTRAST 2    -1.0     2.0    -1.0

              VALUE   S. ERROR   T VALUE   D.F.   T PROB.
CONTRAST 1    3.000     1.1547     2.598    6.0     0.041
CONTRAST 2   -7.000     2.0000    -3.500    6.0     0.013

Box 5.14. Polynomial Contrasts with SPSS ONEWAY.
Box 5.14 shows the ONEWAY commands and results for a polynomial analysis of the
brain stimulation study. Contrast one (-1 0 1) and contrast two (-1 2 -1) specify the linear and
quadratic components, respectively, and the significance levels are reported as ts below the
contrast coefficient matrix. The results in this and following analyses correspond to the
computations performed earlier. Note in particular that the contrast coefficients could be used to
calculate SSLinear = (3 × 3.0²) / 2 = 13.50 and SSQuadratic = (3 × -7.0²) / 6 = 24.50, that SSLinear +
SSQuadratic = SSTreatment, that tLinear² = 2.598² = 6.75 = 13.50 / 2.0 = FLinear and tQuadratic² = -3.500² =
12.25 = 24.50 / 2.0 = FQuadratic, and that the average of FLinear and FQuadratic equals FTreatment. These
relationships parallel those discussed in the previous analyses for this study, although the
partitioning of the treatment effect results in somewhat different results for the specific contrasts.
For example, both the linear and quadratic effects are significant, with ps = .041 and .013,
respectively.
SPSS ONEWAY and other Anova commands also provide a special subcommand that
avoids the necessity of entering the contrast coefficients (this is especially helpful with factors
that have many levels). In ONEWAY, the subcommand is /POLYNOMIAL = n, where the value
of n indicates the highest order of polynomial contrasts to extract (e.g., n = 2 would specify linear
and quadratic contrasts). Box 5.15 illustrates the POLYNOMIAL statement, which results in
SPSS breaking the SSBetween with df = 2 into two single df polynomial contrasts, a linear and a
quadratic component. These single df components appear in the ANOVA summary table.

ONEWAY press BY group (1,3) /POLYNOMIAL = 2.

SOURCE              D.F.   SUM OF SQUARES   MEAN SQUARES   F RATIO   F PROB.
BETWEEN GROUPS        2          38.0000        19.0000     9.5000     .0138
  LINEAR TERM         1          13.5000        13.5000     6.7500     .0408
  DEV'N. FROM LIN     1          24.5000        24.5000    12.2500     .0128
  QUAD. TERM          1          24.5000        24.5000    12.2500     .0128
WITHIN GROUPS         6          12.0000         2.0000
TOTAL                 8          50.0000

Box 5.15. Polynomial contrasts with SPSS ONEWAY.

The
LINEAR TERM represents the linear contrast, now reported as an F test. The SSLinear and FLinear
reported here agree with calculations just performed for Box 5.14, as well as with our earlier
manual calculations.
The DEV'N FROM LIN and QUAD. TERM statistics are identical because the quadratic is
the only nonlinear term when k = 3; that is, with df = 3 - 1 = 2, there are only linear and quadratic
components. For k > 3, DEV'N FROM LIN would include all nonlinear components (quadratic,
cubic, and so on). The quadratic statistics agree with earlier calculations. As well, SSLinear +
SSQuadratic = 13.5 + 24.5 = 38.0 = SSTreatment, demonstrating that the orthogonal polynomial contrasts
exhaust the variability among the three group means. Likewise, the average of the contrast Fs =
(6.75 + 12.25) / 2 = 9.50 = FTreatment. The
polynomial analysis results in a more balanced distribution of the SSs than the preceding
contrasts. Here both contrasts are significant, as was the treatment effect, with the quadratic
effect being only slightly more significant than the omnibus effect.
That the CONTRAST and POLYNOMIAL approaches are equivalent is demonstrated by
a comparison of ps (e.g., .041 for the linear component in Box 5.14 and .0408 in Box 5.15), the
test statistics (e.g., FLinear = 6.75 = 2.598² = tLinear²), and the SSs (SSLinear = 13.5 = (3 × 3²)/2 =
n × LLinear²/Σc²).
Polynomial Contrasts in GLM and MANOVA
Both GLM and MANOVA allow the keyword POLYNOMIAL to be used for the
CONTRAST subcommand (or selected via menus in the case of GLM). Box 5.16 shows the
GLM syntax commands and output for this analysis. The /CONTRAST(group) = POLYNOMIAL
subcommand instructs SPSS to perform the polynomial contrasts and report the significance level. The /PRINT
= TEST(LMATRIX) command requests that a matrix of values corresponding to the contrasts be
printed out. We will look at this output shortly.
GLM press BY group /CONTRAST(group) = POLYNOMIAL /PRINT = TEST(LMATRIX).

Source              Type III Sum of Squares   df   Mean Square         F    Sig.
Corrected Model                  38.000(a)     2        19.000     9.500   .014
Intercept                       256.000        1       256.000   128.000   .000
GROUP                            38.000        2        19.000     9.500   .014
Error                            12.000        6         2.000
Total                           306.000        9
Corrected Total                  50.000        8
a  R Squared = .760 (Adjusted R Squared = .680)

Custom Hypothesis Tests
Contrast Coefficients (L' Matrix)   GROUP Polynomial Contrast(a)
Parameter           Linear   Quadratic
Intercept             .000        .000
[GROUP=1.00]         -.707        .408
[GROUP=2.00]          .000       -.816
[GROUP=3.00]          .707        .408

Contrast Results (K Matrix)   Dependent Variable: PRESS
GROUP Polynomial Contrast(a)
Linear       Contrast Estimate    2.121
             Std. Error            .816
             Sig.                  .041
Quadratic    Contrast Estimate    2.858
             Std. Error            .816
             Sig.                  .013

Box 5.16. GLM Polynomial Output with PRINT=TEST(LMATRIX) Command.

First, note that the last few lines of output present the results for the requested contrasts.
Some superfluous lines have been edited out, leaving the contrast value, the SE, and the
significance for the Linear and Quadratic contrasts. The p values and conclusions are identical to
those presented earlier. Recall that SPSS does not actually print out the t value for these tests in
GLM, although these or the corresponding SSs and Fs could be computed: tLinear = 2.121/.816 =
2.599, FLinear = 2.599² = 6.756. We will compute SSLinear shortly.

But first, consider the lines produced by the /PRINT = TEST(LMATRIX) command. The
relevant values for our contrasts are printed in the section headed "Custom Hypothesis Tests."
The coefficients for the Linear contrast are -.707, .000, and +.707, which are the normalized
coefficients for -1, 0, +1. Recall that normalized coefficients are calculated by dividing the initial
coefficients by √(Σcj²): -1/√2 = -.707, 0/√2 = .000, +1/√2 = +.707. Similarly, the normalized
quadratic coefficients can be calculated as +1/√6 = +.408, -2/√6 = -.816, and +1/√6 = +.408. The
sum of the normalized coefficients squared is equal to 1. For our normalized linear coefficients,
-.707² + .000² + .707² = .500 + .000 + .500 = 1. For our normalized quadratic coefficients,
.408² + (-.816)² + .408² = .166 + .667 + .166 ≈ 1. These normalized coefficients will be especially
important later when we discuss contrasts for Within-S factors. For now, it is important to know
that sometimes GLM uses normalized coefficients, rather than the integer contrasts that we may be
working with.
One reason that it is important to know when normalized coefficients are used is to
interpret correctly the contrast coefficients and to calculate various statistics, such as SS for the
contrasts. Note that LLinear = 2.121 in Box 5.16, whereas it equalled 3.00 in Box 5.14, and that
LQuadratic = 2.858 in Box 5.16, versus -7.00 in Box 5.14. The normalized linear coefficients were
calculated by dividing our original linear coefficients by √2; hence, the normalized contrast value
is that much smaller than the non-normalized contrast; that is, 2.121 × √2 = 3.00.
Similarly, 2.858 × √6 = 7.001, equal in absolute value to the non-normalized quadratic contrast.
The normalized coefficients simplify calculation of SSs for the contrasts because the sum
of the squared coefficients is 1.00 and this value appears in the denominator of our formula to
calculate SS for a contrast; specifically, SSContrast = nj × L² / Σcj². For normalized coefficients,
SSLinear = 3 × 2.121² / 1 = 13.50, and SSQuadratic = 3 × 2.858² / 1 = 24.50. These values agree with
earlier calculations and printouts. The values would have been incorrect if we had paired these
normalized contrast values with the Σcj² of the non-normalized coefficients in the denominator.
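The back-calculation from GLM's normalized output can also be scripted; the sketch below (our own illustration, not SPSS) normalizes the integer polynomial coefficients and recovers the contrast estimates and the SSs of 13.5 and 24.5.

from math import sqrt

means, n = [5.0, 3.0, 8.0], 3

def normalize(c):
    length = sqrt(sum(cj ** 2 for cj in c))
    return [cj / length for cj in c]

for label, c in (("linear", [-1, 0, +1]), ("quadratic", [+1, -2, +1])):
    cn = normalize(c)                                # e.g., -.707, .000, +.707
    L = sum(cj * m for cj, m in zip(cn, means))      # GLM's "Contrast Estimate"
    print(label, round(L, 3), round(n * L ** 2, 2))  # SS = n * L**2 because sum(c**2) = 1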
PLANNED CONTRASTS FOR THE INTERVIEW STUDY
Analyses in previous chapters have examined a study of attractiveness ratings as a function
of instructions that subjects received about the purpose of the interviews they were rating: no
special instructions (group 1), job interview (group 2), psychiatric interview (group 3), and parole
interview (group 4). The omnibus ANOVA, which is reproduced in Box 5.17, was highly
significant. The group means are also shown, and these can be used to calculate SSs for various
contrasts that might be of interest.
Source        df       SS         MS         F        p
Treatment      3     325.80    108.6000    7.8475    .0004
Error         36     498.20     13.8389
Total         39     824.00

Group    nj      ȳj       sj
  1      10    27.30    3.3015
  2      10    26.00    3.6209
  3      10    22.70    4.2177
  4      10    20.00    3.6818

Box 5.17. Omnibus Anova for Interview Study.

Planned contrasts begin with expectations about the pattern of differences among the
groups, based on either empirical or theoretical grounds. Very speculatively, let us consider
planned comparisons associated with several different ways in which researchers might partition
SSTreatment = 325.80 into df = 3 orthogonal contrasts. For the first, we will examine the possibility
that the inmate (group 4) and psychiatric patient (group 3) conditions represent two stigmatized
groups relative to the no instruction (group 1) and job interview (group 2) conditions. This
prediction would be based on the assumption that society would tend to react negatively to
people being a psychiatric patient or an inmate, and that this bias would colour their ratings of
attractiveness of the people being interviewed.
Control versus Stigmatized Groups
The possibility that there would be differences between the two control groups (groups 1
and 2) and the two potentially stigmatized groups (3 and 4) would correspond to the following
contrast: -1, -1, +1, +1. The comparison would be between the average for the first two groups
and the average for the third and fourth groups. Given this contrast, we need two more contrasts
that are orthogonal to it. One hypothesis might be that there would be a difference between the
two stigmatized groups (i.e., groups 3 and 4); people's judgments might be affected more by one
of these stigmas than by the other. The contrast corresponding to this question is: 0, 0, -1, +1. This
contrast is orthogonal to the first: (-1 × 0) + (-1 × 0) + (1 × -1) + (1 × 1) = 0. The third contrast
orthogonal to the first two would test whether the two control groups (1 and 2) had differential
effects on the ratings. The coefficients would be: -1, +1, 0, 0. One clue for this final contrast is
to observe that groups 1 and 2 receive exactly the same coefficients in contrast one (-1 -1) and
in contrast two (0 0); therefore any difference between these two groups has not yet been
captured by the first two contrasts and must be represented in contrast three.
                        Group
              1        2        3        4
ȳj        27.30    26.00    22.70    20.00        L        SS         F     √F = t

L1           -1       -1       +1       +1    -10.60    280.90    20.298    4.505
L2           -1       +1        0        0     -1.30      8.45      .611     .782
L3            0        0       -1       +1     -2.70     36.45     2.634    1.623

                           ΣSSL = 325.80 = SSTreatment     Mean F = 7.848 = FOmnibus

Sample Calculations for Contrast 1
L1   = (-1 × 27.30) + (-1 × 26.00) + (1 × 22.70) + (1 × 20.00) = -10.6
SSL1 = (10 × 10.6²) / (-1² + -1² + 1² + 1²) = 280.9
FL1  = 280.9 / 13.8389 = 20.298          FCritical ≈ 4.17 (df = 30, not 36)
SEL1 = √(13.8389 × (1²/10 + 1²/10 + 1²/10 + 1²/10)) = 2.3528
tL1  = -10.6 / 2.3528 = -4.505           tCritical ≈ 2.042 (df = 30, not 36)

Box 5.18. Contrasts for Interview Study.
Box 5.18 presents the calculations for these contrasts. The first contrast comparing groups
1 and 2 (the "normal" instruction conditions) with groups 3 and 4 (the "stigmatized" instruction
conditions) accounts for most of the explained variation (280.90 of 325.80 units), and would be
highly significant, F = 20.298 or t = 4.505, versus approximate critical values of 4.17 and 2.042,
respectively. The other two tests show that groups 1 and 2 do not differ significantly, and that
groups 3 and 4 do not differ significantly. Approximate critical values with df = 30 were used for
these tests because the t and F tables did not include df = 36, which is dfError. Because critical
values get smaller as df increases, an effect that is significant with fewer degrees of freedom will
also be significant with more degrees of freedom. Note also that just as FObserved = t²Observed, FCritical
= t²Critical; that is, 2.042² ≈ 4.17. Therefore the tests are redundant.
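The calculations in Box 5.18 require only the group means, n per group, and MSError, so they can
be checked with a few lines of code. A minimal Python sketch, assuming the values from Box
5.17, reproduces L, SSL, F, SE, and t for the first contrast and confirms that F = t²:

from math import sqrt

means = [27.30, 26.00, 22.70, 20.00]   # group means from Box 5.17
n = 10                                 # observations per group
ms_error = 13.8389                     # MS Error from the omnibus ANOVA
c = [-1, -1, +1, +1]                   # contrast 1: controls vs stigmatized

L = sum(cj * mj for cj, mj in zip(c, means))          # -10.6
ss_L = n * L**2 / sum(cj**2 for cj in c)              # 280.9
F = ss_L / ms_error                                   # 20.298
se = sqrt(ms_error * sum(cj**2 / n for cj in c))      # 2.353
t = L / se                                            # -4.505
print(round(L, 2), round(ss_L, 2), round(F, 3), round(se, 3), round(t, 3))
print(round(t**2, 3))   # equals F, so the t and F tests are redundant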
Because the contrasts in Box 5.18 are orthogonal, SSL1 + SSL2 + SSL3 = SSTreatment, and the
average of the corresponding F tests equals FOmnibus, that is, (20.298 + .611 + 2.634) / 3 = 7.848.
The SSs for the contrasts could also be used to calculate η² for each contrast: for example, η²L1 =
280.9/824 = .341, indicating that contrast 1 accounts for 34.1% of the total variability in the
scores. Similarly, the sum of the separate η² would equal η²Treatment.

ONEWAY rate BY inst(1 4) /CONT = -1 -1 1 1 /CONT = 0 0 -1 +1 /CONT = -1 +1 0 0.

                          SUM OF       MEAN          F        F
SOURCE            D.F.    SQUARES      SQUARES       RATIO    PROB.
BETWEEN GROUPS      3     325.8000     108.6000      7.8475   .0004
WITHIN GROUPS      36     498.2000      13.8389
TOTAL              39     824.0000

CONTRAST COEFFICIENT MATRIX
              Grp 1     2      3      4
CONTRAST 1     -1.0   -1.0    1.0    1.0
CONTRAST 2     -1.0    1.0    0.0    0.0
CONTRAST 3      0.0    0.0   -1.0    1.0

              VALUE      S. ERROR    T VALUE    D.F.    T PROB.
CONTRAST 1   -10.6000     2.3528     -4.505     36.0     0.000
CONTRAST 2    -2.7000     1.6637     -1.623     36.0     0.113
CONTRAST 3    -1.3000     1.6637     -0.781     36.0     0.440

Box 5.19. Contrasts for Interview Study.
Control versus Stigmatized Groups: SPSS Analyses
The coefficients for this initial contrast were -1 -1 +1 +1, which compares groups 1 and 2
with groups 3 and 4. We then determined two additional contrasts orthogonal to the first contrast.
We chose -1 +1 0 0, which compares groups 1 and 2, and 0 0 -1 +1, which compares groups 3
and 4 (i.e., the psychiatric and parole groups).
ONEWAY Contrasts. Box 5.19 presents a ONEWAY analysis to test these contrasts.
The first contrast comparing groups 1 and 2 (the "normal" instruction conditions) with groups 3
and 4 (the "stigmatized" instruction conditions) is highly significant, t = 4.505, p = .000. The
other contrasts show that groups 1 and 2 (contrast 3, p = .440) and groups 3 and 4 (contrast 2, p =
.113) do not differ significantly. Although the difference between no instruction and job
interview groups (1 versus 2) is clearly not significant, the difference between psychiatric and
parole groups (3 versus 4) approaches significance by a one-tailed test (.113/2 = .057).
The average of the t² (i.e., the Fs) for the contrasts equals the F from the omnibus Anova.
That is, (4.505² + 1.623² + 0.781²) / 3 = 7.846 ≈ 7.8475 in Box 5.19. If we calculated SSs for the
contrasts, their sum would equal SSTreatment = 325.80, as shown in the following MANOVA.
MANOVA contrast analysis for Interview study. Box 5.20 shows the MANOVA
commands and results for the interview study. The coefficients are specified as SPECIAL
contrasts with the 3 sets of contrast coefficients preceded by k = 4 1s. There are also two
DESIGN commands, which means that MANOVA will perform two Anovas. The ability to
perform several analyses in a single procedure can be useful in analyzing data.
The first design requests the default Anova followed by t-tests for the three contrasts of
interest, which are labelled simply as Parameters 2, 3, and 4. The ts and ps correspond to
previous results.
The second MANOVA is more interesting. The /DESIGN inst(1) inst(2) inst(3) asks for
separate Anova results for each of our three contrasts. These appear as lines labelled INST(1),
MANOVA rate BY inst(1 4) /CONTRAST(inst) = SPECIAL(1 1 1 1 -1 -1 +1 +1 -1 +1 0 0 0 0 -1 +1) /DESIGN /DESIGN inst(1) inst(2) inst(3).
* * * * A n a l y s i s   o f   V a r i a n c e -- design 1 * * * *
Tests of Significance for RATE using UNIQUE sums of squares
Source of Variation        SS      DF       MS        F     Sig of F
WITHIN CELLS             498.20    36     13.84
INST                     325.80     3    108.60      7.85     .000
(Model)                  325.80     3    108.60      7.85     .000
(Total)                  824.00    39     21.13

R-Squared = .395      Adjusted R-Squared = .345

Parameter       Coeff.       Std. Err.     t-Value     Sig. t
    2         -10.600000      2.35278      -4.50532    .00007
    3          -1.300000      1.66366       -.78141    .43967
    4          -2.700000      1.66366      -1.62292    .11333
* * * * A n a l y s i s o f V a r i a n c e -- design 2 * * * * *
Tests of Significance for RATE using UNIQUE sums of squares
Source of Variation        SS      DF       MS        F     Sig of F
WITHIN+RESIDUAL          498.20    36     13.84
INST(1)                  280.90     1    280.90     20.30     .000
INST(2)                    8.45     1      8.45       .61     .440
INST(3)                   36.45     1     36.45      2.63     .113

           Parameter       Coeff.       Std. Err.     t-Value     Sig. t
INST(1)        2         -10.600000      2.35278      -4.50532    .00007
INST(2)        3          -1.300000      1.66366       -.78141    .43967
INST(3)        4          -2.700000      1.66366      -1.62292    .11333
Box 5.20. MANOVA Contrasts for Interview Study.
INST(2), and INST(3) in the second Anova summary table. Parameters 2, 3, and 4 are also now
labelled in the t-test section. The reported Fs equal the squared ts; for example, t²INST(1) =
4.50532² = 20.298 = FINST(1). The average of the three Fs = FOmnibus; that is, (20.30 + .61 + 2.63) / 3 = 7.85.
The second Anova also reports the SSs for the three contrasts, which is useful for
showing that SSTreatment has been partitioned into the SSs for the contrasts. Note that SSL1 + SSL2
+ SSL3 = 280.90 + 8.45 + 36.45 = 325.80 = SSTreatment. The SSs for the contrasts can also be used
to calculate contrast-specific η²: η²L1 = 280.90 / 824.0 = .341, η²L2 = 8.45 / 824.0 = .010, and
η²L3 = 36.45 / 824.0 = .044. The sum of these η² for the contrasts will equal the omnibus η², which is
reported as R-Squared in Box 5.20. That is, .341 + .010 + .044 = .395. Given just the t-test
results from ONEWAY or MANOVA, such statistics would have first required calculation of SS
from the contrast values; for example, SSL1 = 10 × (-10.60)² / 4 = 280.90.
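Because only the contrast estimates and MSError are needed, the partitioning of SSTreatment and of
η² can be confirmed with a short sketch (Python, using the estimates reported in Box 5.20; the
variable names are ours):

n, ss_total, ms_error = 10, 824.0, 13.8389
contrasts = {                      # coefficient sets and contrast estimates (L) from Box 5.20
    "L1": ([-1, -1, +1, +1], -10.60),
    "L2": ([-1, +1,  0,  0],  -1.30),
    "L3": ([ 0,  0, -1, +1],  -2.70),
}
ss = {k: n * L**2 / sum(c**2 for c in coefs) for k, (coefs, L) in contrasts.items()}
eta2 = {k: v / ss_total for k, v in ss.items()}
print({k: round(v, 2) for k, v in ss.items()})        # {'L1': 280.9, 'L2': 8.45, 'L3': 36.45}
print(round(sum(ss.values()), 2))                     # 325.8 = SS Treatment
print(round(sum(eta2.values()), 3))                   # .395 = omnibus R-Squared
print(round(sum(v / ms_error for v in ss.values()) / 3, 3))   # about 7.85 = omnibus F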
GLM analyses. It would be possible to perform GLM analyses corresponding to those
reported above, but only using syntax because our contrasts do not correspond to any of the
standard GLM contrasts and the menu version of GLM does not support special contrasts. We
will instead show the GLM menu commands and results shortly, but for the polynomial contrasts
that are standard in GLM.
Polynomial Contrasts for the Interview Study
Although somewhat strained, one might also make a case that the attractiveness ratings
would decrease from no instructions, to job interview, to psychiatric interview, to parole
interview. This prediction would be captured by the linear component of polynomial contrasts,
assuming that the decrease was equivalent at each step. The remaining df would be allocated to
the quadratic and cubic components respectively. Box 5.21 shows calculations for the polynomial
contrasts.
The coefficients in Box 5.21 come from Appendix A.5 for k = 4, the number of levels of
the factor. Note the pattern defined by each of the sets of coefficients. The linear coefficients
represent a constant increase or decrease from Group 1 to Group 4. The quadratic coefficients
represent a curvilinear pattern, either decrease from 1 to 2 followed by increase from 3 to 4, or
the reverse pattern. The cubic coefficients define a more irregular pattern, much like a tilted N or
its inverse. This contrast will account for substantial variability in the means if they follow one of
two patterns. A positive correlation with these coefficients will occur when the means increase
from 1 to 2, decrease from 2 to 3, and increase again from 3 to 4. A negative correlation will
occur if the means decrease from 1 to 2, increase from 2 to 3, and decrease again from 3 to 4.
Either pattern could result in significance for this contrast.
As shown in Box 5.21, virtually all of the systematic variability in the means is due to the
linear pattern; means decrease consistently from group 1 (no instructions) to group 2 (job
interview) to group 3 (psychiatric interview) to group 4 (parole interview). This effect is clearly
significant and accounts for a substantial 38.5% of the total variation in the scores; that is,
η²Linear = 317.52 / 824.0 = .385.

Group                       1        2        3        4
ȳj                        27.30    26.00    22.70    20.00

            Coefficients                L         SS        F       √F = t
Linear      -3   -1   +1   +3        -25.20    317.52    22.940     4.79
Quadratic   +1   -1   -1   +1         -1.40      4.90      .354      .595
Cubic       -1   +3   -3   +1          2.60      3.38      .244      .494

ΣSSL = 325.80 = SSTreatment          Mean F = 7.846

Sample Calculations for Linear Contrast
LLin = -3×27.30 - 1×26.00 + 1×22.70 + 3×20.00 = -25.20
SSLin = (10 × 25.2²) / (-3² + -1² + 1² + 3²) = 317.52
FLinear = 317.52 / 13.8389 = 22.940          FCritical ≈ 4.17 (df = 30, not 36)
SELin = √[13.8389 × (-3²/10 + -1²/10 + 1²/10 + 3²/10)] = 5.261
tLinear = -25.20 / 5.261 = -4.79             tCritical ≈ 2.042 (df = 30, not 36)

Box 5.21. Polynomial Contrasts for the Interview Study.
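The same arithmetic applied to the polynomial coefficients makes the dominance of the linear
component obvious. A small Python sketch, assuming the group means and MSError given above
and the Appendix A.5 coefficients for k = 4:

means, n, ms_error, ss_total = [27.30, 26.00, 22.70, 20.00], 10, 13.8389, 824.0
poly = {
    "Linear":    [-3, -1, +1, +3],
    "Quadratic": [+1, -1, -1, +1],
    "Cubic":     [-1, +3, -3, +1],
}
for name, c in poly.items():
    L = sum(cj * mj for cj, mj in zip(c, means))
    ss = n * L**2 / sum(cj**2 for cj in c)
    print(name, round(L, 2), round(ss, 2), round(ss / ms_error, 2), round(ss / ss_total, 3))
# Linear    -25.2  317.52  22.94  0.385
# Quadratic  -1.4    4.9     .35  0.006
# Cubic       2.6    3.38    .24  0.004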
Polynomial Contrasts for the Interview Study: SPSS Analyses
Researchers might also make a case that the attractiveness ratings would decrease from
no instructions, to job interview, to psychiatric interview, to parole interview. This prediction
would be captured by the linear component of polynomial contrasts, assuming that the decrease
was equivalent at each step. The remaining two contrasts would correspond to the quadratic and
cubic components. There are several ways to request these contrasts in SPSS using either the
coefficients from Appendix A.5, or the keyword POLYNOMIAL. The outputs will vary
somewhat, but these variations will all produce some tests of significance for the Linear,
Quadratic, and Cubic components.
Polynomial contrasts and the MANOVA SINGLE DF command. A representative
analysis using MANOVA is illustrated in Box 5.22. This analysis shows a second method in
MANOVA to obtain SSs and F tests for the contrasts, namely, /PRINT = SIGNIF(SINGLEDF).
This command requests that MANOVA partition every effect with df > 1 into separate effects
with df = 1. What the partitioning is will depend on what contrasts, if any, have been requested.
Since we requested POLYNOMIAL contrasts in Box 5.22, the single df effects will correspond
to the Linear, Quadratic, and Cubic effects.

MANOVA rate BY inst(1 4) /PRINT = SIGN(SINGLEDF) /CONTRAST = POLYNOMIAL.

Source of Variation        SS      DF       MS        F     Sig of F
WITHIN CELLS             498.20    36     13.84
INST                     325.80     3    108.60      7.85     .000
  1ST Parameter          317.52     1    317.52     22.94     .000
  2ND Parameter            4.90     1      4.90       .35     .556
  3RD Parameter            3.38     1      3.38       .24     .624
(Model)                  325.80     3    108.60      7.85     .000
(Total)                  824.00    39     21.13

R-Squared = .395      Adjusted R-Squared = .345

INST
Parameter       Coeff.         Std. Err.     t-Value     Sig. t
    2         -5.6348913       1.17639      -4.78999     .00003
    3          -.70000000      1.17639       -.59504     .55554
    4          .581377674      1.17639        .49421     .62416

Box 5.22. Polynomial Contrasts with MANOVA Using SINGLEDF Option.

Figure 5.5. GLM Menu Commands for Polynomial Contrasts.

If neither the SINGLEDF command nor the /DESIGN inst(1) inst(2) inst(3) command were
included, only the t-tests for Parameters 2 (Linear),
3 (Quadratic), and 4 (Cubic) would be reported. The t and F analyses are equivalent. The
significance values are equivalent, except for rounding. The three Fs are all equal to the
corresponding t², for example, t²Linear = 4.78999² = 22.94 = FLinear.
It is striking how much larger FLinear = 22.94 is than FOmnibus = 7.85; indeed, FLinear is almost
3x larger than the value of FOmnibus. The latter is diluted by averaging the robust Linear effect with
the very modest Quadratic and Cubic components. This watering down is clearly shown by
averaging the three Fs (equivalently, the three t²) to obtain FOmnibus: (22.94 + .35 + .24) / 3 = 7.843
≈ 7.85. It is the single df Linear effect that is entirely responsible for the overall significant effect.
Other observations of note include the fact that SSLinear + SSQuadratic + SSCubic = SSTreatment;
that is, 317.52 + 4.90 + 3.38 = 325.80. These SSs could be used to calculate η² for each
contrast: η²Linear = 317.52 / 824.00 = .385, η²Quadratic = 4.90 / 824.00 = .006, and η²Cubic = 3.38 / 824.00
= .004. The η² for the polynomial contrasts would in turn sum to the η² for the overall treatment
effect: .385 + .006 + .004 = .395 = η²Treatment. These various statistics again show that the linear
effect accounts for virtually all of the differences among the four groups.
Finally, we should note that the coefficients for the various effects are those obtained with
normalized coefficients, rather than the integer values that we used earlier in this chapter. This means,
among other things, that SSContrast = nj × L² / 1. For example, SSLinear = 10 × (-5.6348913)² =
317.52, which is the value reported for the second (i.e., Linear) parameter in the Anova summary
table.
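The relation between the two sets of coefficients can be made concrete: dividing each integer
coefficient by the square root of the sum of the squared coefficients yields normalized coefficients
whose squares sum to 1, so the divisor in the SS formula becomes 1. A brief Python illustration,
assuming the group means and integer linear coefficients used above:

from math import sqrt

means = [27.30, 26.00, 22.70, 20.00]
n = 10
c_int = [-3, -1, +1, +3]                   # integer linear coefficients

norm = sqrt(sum(c**2 for c in c_int))      # sqrt(20)
c_norm = [c / norm for c in c_int]         # normalized coefficients (squares sum to 1)

L_norm = sum(c * m for c, m in zip(c_norm, means))
print(round(L_norm, 5))                    # -5.63489, the MANOVA "Coeff." for the linear term
print(round(n * L_norm**2, 2))             # 317.52 = SS Linear, with no further divisor needed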
GLM and polynomial contrasts for the interview study. Figure 5.5 shows several GLM
screens following the initial menu selection sequence Analyze | General Linear Model |
Univariate, identification of rate as the Dependent Variable and inst as the Fixed Factor(s), and
clicking on Contrasts to activate the Univariate: Contrasts screen shown on top in Figure 5.5.
The Polynomial option has been highlighted and clicking on Change will modify inst(None) to
inst(Polynomial), indicating that polynomial contrasts are requested for this factor. Continue will
return to the Univariate main screen and OK will run the analysis.
Box 5.23 shows the resulting analysis, edited somewhat to eliminate material that is
redundant (e.g., the difference between Contrast Estimate and Hypothesized Value) or material
we have not been discussing (e.g., confidence intervals). The ps for the Linear, Quadratic, and
Cubic effects are equivalent to those reported previously for the MANOVA analysis. Dividing
the Contrast Estimate by its SE would produce the ts reported in Box 5.22 by MANOVA: for
example, tLinear = 5.635 / 1.176 = 4.792. Squaring these values would give the corresponding Fs,
the average of which would be FOmnibus = 7.847.

SSs for the contrasts could be calculated using our formula, but we would need to
remember again that normalized coefficients were used by SPSS; that is, SSLinear =
(10 × (-5.635)²) / 1, rather than (10 × (-5.635)²) / (-3² + -1² + 1² + 3²).
Source               Type III Sum       df    Mean Square        F        Sig.
                      of Squares
Corrected Model        325.800(a)        3      108.600         7.847     .000
Intercept            23040.000           1    23040.000      1664.874     .000
INST                   325.800           3      108.600         7.847     .000
Error                  498.200          36       13.839
Total                23864.000          40
Corrected Total        824.000          39

a  R Squared = .395 (Adjusted R Squared = .345)

Custom Hypothesis Tests
                                               Dependent Variable
INST Polynomial Contrast(a)                           RATE
Linear       Contrast Estimate                       -5.635
             Hypothesized Value                           0
             Std. Error                               1.176
             Sig.                                      .000
Quadratic    Contrast Estimate                        -.700
             Hypothesized Value                           0
             Std. Error                               1.176
             Sig.                                      .556
Cubic        Contrast Estimate                         .581
             Hypothesized Value                           0
             Std. Error                               1.176
             Sig.                                      .624
Box 5.23. GLM Polynomial Contrast Results for Interview Study.
CONCLUSION
This chapter has demonstrated how researchers can perform focussed analyses of single
factor Between-S designs. Specific predictions about differences among the means allow
researchers to partition SSTreatment into (ideally) orthogonal components and perform sensitive tests
of the well-specified hypotheses. Although the topic is a complex one, the methods described
here, specifically the use of contrast coefficients, provide powerful tools by which researchers
can root out patterns that correspond to theoretically or practically important outcomes.
Contrasts also provide a way to think about more complex designs involving multiple variables,
as we will see in future chapters.
Although numerous SPSS analyses have been presented in this chapter, there is much
redundancy among the results. The different procedures (i.e., ONEWAY, MANOVA, GLM),
and different variations of procedures (e.g., default, DESIGN, and SINGLEDF approaches for
MANOVA) result in what can be a confusing array of analyses. This diversity can best be
accommodated by focusing on the basic quantities of interest, which are: L for the contrast,
SSContrast, SE or MSE, the t or F, the p values, and eta2. The ultimate criteria are the p values,
which provide information about the likelihood of the observed outcomes given no effects in the
population, and the eta2s, which indicate the strength of the observed effects. These statistics
allow researchers to draw inferences about the presence or absence of hypothesized patterns in
the data.
Conceptualizing planned contrasts can be a challenge, although the basic principle is that
of a correlation between hypothesized and observed patterns in the data. This can be seen most
clearly in regression analyses that provide results equivalent to the standard ANOVA approaches
to contrasts. This is the topic addressed in the next chapter. Essentially, we will see that SSContrast
= r² × SSTotal, where r is the correlation between the indicator variable (i.e., the contrast
coefficients) and the criterion variable.
APPENDIX 5.1: CONTRASTS AND DIFFERENCES BETWEEN MEANS
In addition to the standard approach using contrast coefficients, some SSs for contrasts
(but not all) can be viewed as differences between two means averaged across several of the
groups in the study. Here we illustrate this feature for several contrasts from the interview study.
One of our contrasts (-1 -1 +1 +1) compared groups 1 and 2 (the control groups) to
groups 3 and 4 (the stigmatized groups). Box 5.24 shows that SS for this contrast can be
calculated using the deviation of two means from the grand mean, in a manner equivalent to how
we calculate SS for the omnibus effect. One mean is the average of the means for groups 1 and
2, and the other mean is the average of the means for groups 3 and 4.
A contrast comparing groups 1, 2, and 3 to group 4 (-1 -1 -1 3) could similarly be
conceptualized as the difference between the mean averaged across the first three groups versus
the mean for the fourth group. Using the standard approach, SSContrast = 10 × (-1×27.3 + -1×26.0
+ -1×22.7 + 3×20.0)² / 12 = 10 × (-16.0)² / 12 = 213.333. Using the two-means approach, SSContrast
= 30 × (25.3333 - 24.00)² + 10 × (20.00 - 24.00)² = 30 × 1.3333² + 10 × (-4.00)² = 213.331.
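Both routes to the SS for this contrast can be checked numerically. A short Python sketch, using
the interview-study means and n = 10 per group:

means, n = [27.30, 26.00, 22.70, 20.00], 10
c = [-1, -1, -1, 3]

# Standard coefficient formula
L = sum(cj * mj for cj, mj in zip(c, means))                  # -16.0
ss_coef = n * L**2 / sum(cj**2 for cj in c)                   # 213.333

# Two-means formula: weighted squared deviations of the sub-means from the grand mean
grand = sum(means) / 4                                        # 24.0
m123 = sum(means[:3]) / 3                                     # 25.3333
m4 = means[3]                                                 # 20.0
ss_means = 3 * n * (m123 - grand)**2 + n * (m4 - grand)**2    # 213.333

print(round(ss_coef, 3), round(ss_means, 3))                  # both about 213.33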
Not all contrasts can be computed in this way; specifically, some coefficients describe
patterns or relationships that cannot easily be conceptualized in terms of differences between
means. The linear effect for the interview study (-3 -1 1 3), for example, does not equal any
comparison between two means.
Group                       1        2        3        4
ȳj                        27.30    26.00    22.70    20.00

         Coefficients                  L        SS        F       √F = t
L1      -1   -1   +1   +1           -10.60    280.90    20.298    4.505

L1 = -1×27.30 - 1×26.00 + 1×22.70 + 1×20.00 = -10.6
SSL1 = (10 × 10.6²) / (-1² + -1² + 1² + 1²) = 280.9

                 M1&2      M3&4      MGrand
                 26.65     21.35     24.00
Mj - MGrand      +2.65     -2.65

SSL1 = 20 × (2.65² + (-2.65)²) = 280.9

Box 5.24. Alternative calculation of SS for some contrasts.
APPENDIX 5.2: GLM AND SSs FOR PLANNED COMPARISONS
One of the positive features of Manova is that it can provide SSs for each of the contrasts
using either the /DESIGN contr(1) contr(2) ... option or the /PRINT = SIGNIF(SINGLEDF)
option. Box 5.25 demonstrates the latter method using contrasts examined previously for the
interview study. The SSs reported by Manova can be used to demonstrate the partitioning of
SSTreatment and to compute R2s for each contrast (i.e., eta2).
GLM can also provide SS for each contrast, although it is necessary to modify the way in
which contrasts are requested. Box 5.26 shows one of the modifications that can be used; in
essence, k - 1 CONTRAST options are specified, one for each contrast. Following the standard
statistics, GLM produces an anova summary table for each contrast, including SS for the
contrast. The same output shown in Box 5.26 could be produced by: GLM rate BY inst
/LMATRIX = inst -1 -1 1 1 /LMATRIX = inst -1 1 0 0 /LMATRIX = inst 0 0 -1 1. Output for
the first contrast is shown in Box 5.27.
MANOVA rate BY inst(1 4) /PRINT = SIGNIF(SINGLE) /CONTR(inst) = SPEC(1 1 1 1 -1 -1 1 1 -1 1 0 0 0 0 -1 1).
Source of Variation        SS      DF       MS        F     Sig of F
WITHIN CELLS             498.20    36     13.84
inst                     325.80     3    108.60      7.85     .000
  1ST Parameter          280.90     1    280.90     20.30     .000
  2ND Parameter            8.45     1      8.45       .61     .440
  3RD Parameter           36.45     1     36.45      2.63     .113
(Model)                  325.80     3    108.60      7.85     .000
(Total)                  824.00    39     21.13

R-Squared = .395      Adjusted R-Squared = .345

Estimates for rate --- Individual univariate .9500 confidence intervals
Parameter       Coeff.       Std. Err.    t-Value     Sig. t     Lower -95%   CL- Upper
    2         -10.600000      2.35278     -4.50532    .00007     -15.37165     -5.82835
    3          -1.3000000     1.66366      -.78141    .43967      -4.67407      2.07407
    4          -2.7000000     1.66366     -1.62292    .11333      -6.07407       .67407
Box 5.25. Using MANOVA to obtain SSs for contrasts.
GLM rate BY inst /CONTR(inst) = SPEC(-1 -1 1 1) /CONTR(inst) = SPEC(-1 1 0 0) /CONTR(inst) = SPEC(0 0 -1 1).
Custom Hypothesis Tests #1
inst Special Contrast               Dependent Variable
L1     Contrast Estimate                 -10.600
       Std. Error                          2.353
       Sig.                                 .000

Source       Sum of Squares    df    Mean Square       F       Sig.
Contrast         280.900        1      280.900       20.298    .000
Error            498.200       36       13.839

Custom Hypothesis Tests #2
inst Special Contrast               Dependent Variable
L1     Contrast Estimate                  -1.300
       Std. Error                          1.664
       Sig.                                 .440

Source       Sum of Squares    df    Mean Square       F       Sig.
Contrast           8.450        1        8.450         .611    .440
Error            498.200       36       13.839

Custom Hypothesis Tests #3
inst Special Contrast               Dependent Variable
L1     Contrast Estimate                  -2.700
       Std. Error                          1.664
       Sig.                                 .113

Source       Sum of Squares    df    Mean Square       F       Sig.
Contrast          36.450        1       36.450        2.634    .113
Error            498.200       36       13.839
Box 5.26. SSs in GLM using repeated /CONTRAST commands.
GLM rate BY inst /LMATRIX = inst -1 -1 1 1 /LMATRIX = inst -1 1 0 0 /LMATRIX = inst 0 0 -1 1.
...
Custom Hypothesis Tests #1
Contrast                            Dependent Variable
L1     Contrast Estimate                 -10.600
       Std. Error                          2.353
       Sig.                                 .000

Source       Sum of Squares    df    Mean Square       F       Sig.
Contrast         280.900        1      280.900       20.298    .000
Error            498.200       36       13.839
...
Box 5.27. SSs for Contrasts using GLM and /LMATRIX (partial output).
CHAPTER 06:
CORRELATION AND REGRESSION ANALYSIS
FOR SINGLE FACTOR BETWEEN-S ANOVA
Correlation and Regression for Two Groups . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
Contrast Analyses by Multiple Regression . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
Recode and Regression using Menus . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . . . 10
Regression and Polynomial Contrasts for the Stimulation Study . . . . . . . . . . . . . . . . . . 12
Multiple Regression Analyses of Interview Data . . . . .. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
Omnibus ANOVA Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . 14
Indicator Variable Results . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . . . . 15
Indicator Variables and Regression Analysis using Menus . . . . . . . . . . . . . . . . . . . . . . . 17
Polynomial Regression for the Interview Study . . . . . .. . . . . . . . . . . . . . . . . . . . . . . . . 19
Nonorthogonal Predictors . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . . 21
Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . . . . . . 24
Appendix 6.1: Graphing Contrasts and Means . . . . . . . . .. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
Figure 6.1. Menu Sequence to Initiate Regression Analysis.
ANOVA for the comparison of group means is a special case of regression analysis
(which is sometimes known as the General Linear Model or GLM, hence the name for SPSS’s
GLM program). This also means that an alternative way to perform ANOVA is using regression
procedures in SPSS or some other statistical package. In this chapter we demonstrate how to
conduct a number of the preceding analyses, the omnibus F and follow-up tests, using multiple
regression and correlation. The approach is especially straightforward for the two-group design
(i.e., k = 2).
CORRELATION AND REGRESSION FOR TWO GROUPS
In the case of two groups, ANOVA (and the equivalent t-test) can be performed with a
single predictor that has two values. The two values for X indicate to which of the two groups
the score (Y) belongs. The analysis of two groups is a simple regression analysis with a single
predictor. Consider the two-group agitation design that we have analyzed in previous chapters.
Figure 6.1 shows the Windows menu sequence to access regression via Analyze | Regression |
Linear. This brings up the dialogue box shown in Figure 6.2. As in other analyses, we select the
independent and dependent variable, and move them into their respective boxes. We then select
from the various options that are available. The Statistics button in Figure 6.2 allows users to
select various statistics, including descriptive statistics (i.e., mean, standard deviation,
correlation).
Figure 6.2. Specifying the Regression Analysis.
Box 6.1 shows edited output of the regression analysis requested in Figures 6.1 and 6.2.
Considering first the descriptive statistics, the mean agitation score (11.0) is equal to the grand
mean calculated earlier. The SD of the 12 agitation scores about this mean can be used to
calculate SSTotal: (12 - 1) × 2.08893² = 48.00. The mean for the group variable (X) is simply the
average of 6 ones and 6 twos (i.e., 1.50).
After the correlation matrix (discussed shortly), Box 6.1 reports the results of the
regression analysis, including an ANOVA summary table and statistics related to the slope of the
best-fit regression line. The best-fit regression line is y’ = 14.0 - 2.0 × Group. The slope or
regression coefficient (i.e., -2.0) is the difference between the two means, and the test of the
significance of the slope produces the same results as the independent groups t-test of the
Descriptive Statistics
              Mean          Std. Deviation      N
AGIT        11.000000         2.0889319        12
GROUP        1.500000          .5222330        12

                                 AGIT     GROUP
Pearson Correlation    AGIT     1.000     -.500
                       GROUP    -.500     1.000
Sig. (1-tailed)        AGIT       .        .049
                       GROUP     .049       .

Model              Sum of Squares    df    Mean Square      F      Sig.
1   Regression         12.000         1      12.000        3.333   .098
    Residual           36.000        10       3.600
    Total              48.000        11

Coefficients
                   Unstand. Coeff.         Stand. Coeff.
Model                B       Std. Error        Beta           t        Sig.
1   (Constant)     14.000      1.732                         8.083     .000
    GROUP          -2.000      1.095          -.500         -1.826     .098
Box 6.1. Results for Windows Regression Analysis.
t = (r - 0) / √[(1 - r²) / (n - 2)] = (-.5 - 0) / √[(1 - (-.5)²) / (12 - 2)] = -.5 / .27386 = -1.826
difference between means, that is, tSlope = -1.826, p = .098. Furthermore, in the ANOVA
summary table, SSRegression = 12.0 = SSTreatment and SSResidual = 36.0 = SSError. The dfs, MSs, F, and p
value are also identical. We will shortly see somewhat more clearly why these many
equivalencies occur.
Returning to the correlation matrix, note that the simple correlation between Group and
Agit is -.50, hence, r² = (-.5)² = .25, which was the value that we had obtained earlier for η². Group
accounts for 25% of the 48.00 units of variability in Agitation scores, which amounts to .25 ×
48.00 = 12.0 units, SSTreatment and SSRegression. We could also calculate a t or F statistic for the
correlation coefficient. For the t-test,
This t is again equivalent to the value calculated earlier for the difference between means
(or equivalently, for the slope). In essence, the significance of the difference between the group
means is equivalent to the significance of the correlation between a variable that represents the
groups and the dependent variable. Note as well, for example, that the one-tailed p value reported
for r in the correlation matrix is equal to .049, which is exactly half of the two-tailed value
reported for F and for the regression coefficient.
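The t for the correlation follows directly from r and n; a two-line Python check of the value just
computed:

from math import sqrt

r, n = -.50, 12
t = (r - 0) / sqrt((1 - r**2) / (n - 2))
print(round(t, 3))    # -1.826, the same t as for the slope and the difference between means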
RECODE group (1=-1) (2=1) INTO effect.
REGRESSION /VARIABLES = agit effect /DEPENDENT = agit /ENTER /SAVE PRED(prd) RESID(res).

Multiple R   .50000      R Square   .25000

                 DF    Sum of Squares    Mean Square
Regression        1       12.00000        12.00000
Residual         10       36.00000         3.60000

F = 3.33333        Signif F = .0979

Variable           B          SE B         Beta          T       Sig T
EFFECT         -1.000000     .547723     -.500000      -1.826    .0979
(Constant)     11.000000     .547723                   20.083    .0000

Residuals Statistics:
            Min        Max        Mean      Std Dev     N
*PRED     10.0000    12.0000    11.0000     1.0445     12
*RESID    -3.0000     2.0000      .0000     1.8091     12

LIST
GROUP    AGIT    EFFECT      PRD         RES
 1.00   11.00     -1.00   12.00000   -1.00000
 1.00   14.00     -1.00   12.00000    2.00000
 1.00   12.00     -1.00   12.00000     .00000
 1.00    9.00     -1.00   12.00000   -3.00000
 1.00   13.00     -1.00   12.00000    1.00000
 1.00   13.00     -1.00   12.00000    1.00000
 2.00    7.00      1.00   10.00000   -3.00000
 2.00   12.00      1.00   10.00000    2.00000
 2.00   11.00      1.00   10.00000    1.00000
 2.00    9.00      1.00   10.00000   -1.00000
 2.00   12.00      1.00   10.00000    2.00000
 2.00    9.00      1.00   10.00000   -1.00000
Box 6.2. SPSS Regression Analysis for Two-Group ANOVA
Box 6.2 shows the syntax commands to perform the regression analysis, as well as edited
Unix output for the analysis. Before the REGRESSION command, however, Box 6.2 includes a
command to modify somewhat the predictor used in the regression command. Specifically, the
RECODE command creates a new predictor called effect, which has values of -1 for group one
and +1 for group two, rather than our original values of 1 and 2. The general format of the recode
command is: RECODE old-name (old-value = new-value) ... INTO new-name. The new values
can be seen in the column for the new variable Effect at the bottom of Box 6.2. A standard
REGRESSION statement is then used, and predicted and residual scores are saved. These
predicted and residual scores are informative about the parallels between the ANOVA and
REGRESSION analyses.
Output for this second regression is identical in many respects to the earlier regression in
Box 6.1, despite the different values for the predictor variable. Note in particular that FRegression =
FANOVA, and that tSlope = tr = tȳ1-ȳ2. Moreover, SSRegression = SSTreatment, SSResidual = SSError, and R² = η².
The dfs also correspond. The only change between Box 6.1 and Box 6.2 (other than format) is
with respect to the actual prediction equation. The best-fit equation in Box 6.2 is: y' = 11.0 - 1.0
× Effect. The intercept, b0, is equal to ȳG, and the slope, b1, is equal to the deviation of the group
mean from the grand mean (i.e., ȳj - ȳG). But this change in the equation has no consequences
for the inferential statistical analyses, although it does make clearer perhaps the intimate
relationship between the regression analysis and ANOVA.
The reason for the equivalence of the regression and ANOVA analyses becomes even
clearer yet when we examine the predicted (PRD) and residual (RES) scores saved from the
regression analysis and listed at the bottom of Box 6.2. Note that all subjects in Group 1 have ȳ1
= 12.0 for their predicted value, ŷ1 = 11.0 + (-1.0 × -1) = 12.0, and all subjects in Group 2 have ȳ2
= 10.0 for their predicted value, ŷ2 = 11.0 + (-1.0 × +1) = 10.0. The descriptive statistics for the
predicted and residual scores show that SSPredicted = (12 - 1) × 1.0445² = 12.0 = SSTreatment, and that
SSResidual = (12 - 1) × 1.8091² = 36.0 = SSError. The secret to performing ANOVA via regression is to
generate one or more predictors in such a way as to ensure that the predicted scores equal the
group means, and the dfs for SSRegression and SSResidual = k - 1 and N - k (i.e., the dfs for SSTreatment
and SSResidual in ANOVA). If these correspondences occur, then necessarily the residual scores
will equal the deviations of observations from the group means, and MSRegression = MSTreatment,
MSResidual = MSError, and FRegression = FANOVA. As we have seen, this is easily accomplished when
there are only two groups. Although not shown here, it would be easy to demonstrate that the
predicted values in Box 6.1 would also be the two group means.
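This equivalence can also be reproduced outside SPSS. A minimal Python sketch, assuming the
agitation scores listed in Box 6.2 and effect codes of -1 and +1, fits the simple regression by hand
and recovers the intercept (11.0), slope (-1.0), SSRegression = 12, SSResidual = 36, and the group means
as predicted values:

agit  = [11, 14, 12, 9, 13, 13,  7, 12, 11, 9, 12, 9]   # scores from Box 6.2
group = [1] * 6 + [2] * 6
x = [-1 if g == 1 else +1 for g in group]                # effect coding

n = len(agit)
mx, my = sum(x) / n, sum(agit) / n
slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, agit)) / sum((xi - mx)**2 for xi in x)
intercept = my - slope * mx
pred = [intercept + slope * xi for xi in x]

ss_reg = sum((p - my)**2 for p in pred)                  # 12.0 = SS Treatment
ss_res = sum((yi - p)**2 for yi, p in zip(agit, pred))   # 36.0 = SS Error
print(intercept, slope)                                  # 11.0, -1.0
print(ss_reg, ss_res)
print(sorted(set(pred)))                                 # [10.0, 12.0], the two group means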
Categorical predictors such as Group in Box 6.1 and Effect in Box 6.2 are generically
referred to as indicator variables. They essentially code or indicate levels of the categorical
factor, rather than amounts of some variable. There are various types of indicator variables.
Indicator variables that sum to zero (such as Effect in Box 6.2) are called Effect codes, hence the
label for that variable. Another kind of indicator variable is called Dummy coding, with one
group coded 0 and another group coded 1. If a regression analysis of the present data were
conducted with Dummy coding, virtually all aspects of the analysis would be identical to those
already shown in Boxes 6.1 and 6.2, except that the regression equation would be again changed.
Specifically, the intercept, b0, would equal ȳ for whichever group was coded 0, and the slope, b1,
would equal the difference between ȳ1 and ȳ2.
We also saw that both the t and F tests are equivalent to correlation and regression tests
of the significance of group as a predictor of agitation scores. The continuity among these
seemingly different analyses occurs because the various tests are in fact variations of the General
Linear Model. In the remainder of this chapter, the ANOVA and regression approaches are
generalized to Between-S designs that include more than two groups.
CONTRAST ANALYSES BY MULTIPLE REGRESSION
We have just seen that for k = 2, the single predictor regression analysis was equivalent to
the omnibus F test, which you may recall we also showed was equivalent to a contrast analysis
for two groups. That is, the single possible contrast for two groups (1 2, -1 +1, 0 1, or other
numerical equivalents) reproduces the omnibus F (or its t-test equivalent). Regression can be
used to perform all of the contrasts that we described earlier. Recall also that we need dfRegression =
k - 1 to reproduce ANOVA using regression. That is, we will need p = k - 1 predictors, and these
k - 1 predictors can (but need not) correspond to planned contrasts for the particular design. The
k - 1 orthogonal indicator variables (i.e., contrasts) would partition SSRegression (i.e., SSTreatment) into
mutually exclusive and exhaustive components (i.e., components where ΣSSL = SSTreatment). Let
us now examine more closely contrasts for k > 2 designs.
In essence, contrast analyses can be performed using regression by creating indicator
variables that correspond to the desired contrast coefficients. In Box 6.3, for example, to compare
the first two of three groups, one indicator variable (I1) would be -1 for all observations in group
1, 1 for all observations in group 2, and 0 for all observations in group 3. To compare the first
two groups with the third group, one indicator variable should be -1 for group 1, -1 for group 2,
and +2 for group 3. The multiple regression tests of the significance of the regression coefficients
will correspond to the contrast analyses covered earlier. Note that these contrasts are orthogonal.
Note the general parallel between regression and contrast analyses. We previously noted
that k - 1 indicator variables were necessary to perform Anova, with the dependent variable being
regressed on the k - 1 indicator variables. This agrees with our general guideline of k - 1
orthogonal contrasts.

Group   Subject    I1    I2
1          1       -1    -1
1          2       -1    -1
...
2          1       +1    -1
2          2       +1    -1
...
3          1        0    +2
3          2        0    +2
...

Box 6.3. Indicator Variables for k = 3.

RECODE group (1= -1) (2= +1) (3= 0) INTO con1 /group (1= -1) (2= -1) (3= +2) INTO con2.
REGRESS /VAR= press con1 con2 /DESC /STAT= DEFAU ZPP /DEP= press /ENTER /SAVE PRED(pred) RESID(res).

          Mean     Std Dev        Correlation:    PRESS     CON1     CON2
PRESS     5.333     2.500         PRESS           1.000     -.346     .800
CON1       .000      .866         CON1            -.346     1.000     .000
CON2       .000     1.500         CON2             .800      .000    1.000

Multiple R   .87178      R Square   .76000

                 DF    Sum of Squares    Mean Square
Regression        2       38.00000        19.00000
Residual          6       12.00000         2.00000

F = 9.50000        Signif F = .0138

Variable        B       SE B      Beta     Corr     PartCor    Partial       T       Sig T
CON2          1.333    .3333      .800     .800     .80000     .852803     4.000     .0071
CON1         -1.000    .5773     -.346    -.346    -.34641    -.577350    -1.732     .1340
(Const)       5.333    .4714                                              11.314     .0000

LIST
GROUP   PRESS    CON1    CON2      PRED         RES
 1.00    4.00   -1.00   -1.00    5.00000    -1.00000
 1.00    5.00   -1.00   -1.00    5.00000      .00000
 1.00    6.00   -1.00   -1.00    5.00000     1.00000
 2.00    3.00    1.00   -1.00    3.00000      .00000
 2.00    2.00    1.00   -1.00    3.00000    -1.00000
 2.00    4.00    1.00   -1.00    3.00000     1.00000
 3.00   10.00     .00    2.00    8.00000     2.00000
 3.00    8.00     .00    2.00    8.00000      .00000
 3.00    6.00     .00    2.00    8.00000    -2.00000

Box 6.4. Contrast analyses by regression.
SPSS Regression and Contrasts for Brain Stimulation Study
The regression analysis in Box 6.4 reproduces the omnibus ANOVA and contrast
analyses for the bar-press study. The RECODE command creates the indicator variables con1
and con2. Con1 compares the two control groups (NS and A), and con2 compares the treatment
group to the average of the two control groups. These new variables can be seen at the bottom of
Box 6.4. Con1 contains -1 for the three cases in group 1, +1 for the three cases in group 2, and 0
for the three cases in group 3. This contrasts group 1 to group 2. The second contrast contains -1
for the six cases in groups 1 and 2, and +2 for the three cases in group 3. This predictor contrasts
group 3 to groups 1 and 2. We want to know whether these new variables correlate significantly
with the data (i.e., with the dependent variable PRESS).
The ANOVA summary table provides omnibus results that agree exactly with earlier
calculations and SPSS analyses. The reason for this close agreement is that the SSRegression reflects
the variation in predicted scores and, given the indicator variables, the predicted scores turn out
to be the group means, as shown in the PRED column at the bottom of Box 6.4; hence, SSRegression
= SSTreatment. SSResidual likewise represents variation about predicted scores and hence corresponds
to SSError or SSWithin in the between-S ANOVA. Because p = k - 1 and N - p - 1 = N - k, the MSs
and F are also equivalent.
In addition, Box 6.4 reports the significance of the regression coefficients for con1 and
con2. Observe that the ts and associated significance for the coefficients agree with the earlier
contrast analyses performed by hand and using the various SPSS Anova commands. The
conclusion is again that the treatment group differs significantly from the two control groups
(con2), which do not differ significantly from one another (con1). The slopes and ts could be
used to calculate SSs and Fs reported previously in several analyses, and also to demonstrate that
the average of the Fs for the contrasts is equal to FOmnibus and the sum of the SSs for the contrasts
is equal to SSTreatment.
In addition to these now-familiar relationships, the correlation analyses provide several
novel and instructive observations. The simple correlations between the dependent variable and
the orthogonal contrasts are shown in Box 6.4, -.346 for contrast one and .800 for contrast two.
Squaring these correlations gives η² for the contrasts; for example, .8² = .64 = η² = 32.0/50.0.
This also means that the correlations can be used to obtain SSs for each contrast: .34641² × 50.0
= 6.0 = SSL1 and .8² × 50.0 = 32.0 = SSL2. The sum of the r² for the contrasts equals R², which in
turn equals η² for the omnibus analysis: that is, .34641² + .80000² = .76 = 38.0 / 50.0. And as
noted previously, the sum of the SSs for the contrasts gives SSTreatment = 38.0 = 6.0 + 32.0. These
various relationships demonstrate that a contrast indeed represents the correlation between a
pattern of coefficients (i.e., CON1 and CON2 in Box 6.4) and the dependent variable (i.e.,
PRESS in Box 6.4).

Figure 6.3. Recoding Variables in SPSS for Windows.
It is also interesting to note why the simple correlations can be used in the above way,
rather than requiring part correlations that control for other predictors. We noted previously that
our two contrasts are orthogonal. The correlation matrix in Box 6.4 shows explicitly that CON1
and CON2 are indeed orthogonal or uncorrelated; that is, their correlation with one another is 0.
This occurs because the sum of the cross products of the contrast coefficients is 0 (i.e., -1 × -1 +
1 × -1 + 0 × 2 = 0), a requirement for orthogonal contrasts. Because of this orthogonality, the
simple correlations and part correlations are identical for the two predictors (see the Corr and
PartCor columns of Box 6.4). Predictors that are already independent of one another are not
changed by multiple regression methods for statistically creating uncorrelated predictors.
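These relations can be verified directly from the raw scores. A small Python sketch, assuming the
press scores and indicator codes listed in Box 6.4, computes the two simple correlations and
recovers the contrast SSs, R², and the zero correlation between the contrasts:

from math import sqrt

press = [4, 5, 6, 3, 2, 4, 10, 8, 6]            # bar-press scores from Box 6.4
con1  = [-1, -1, -1,  1,  1,  1, 0, 0, 0]       # group 1 vs group 2
con2  = [-1, -1, -1, -1, -1, -1, 2, 2, 2]       # groups 1 & 2 vs group 3

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx)**2 for a in x)
    syy = sum((b - my)**2 for b in y)
    return sxy / sqrt(sxx * syy)

grand = sum(press) / len(press)
ss_total = sum((p - grand)**2 for p in press)                     # 50.0
r1, r2 = pearson(con1, press), pearson(con2, press)               # -.346 and .800
print(round(r1, 3), round(r2, 3))
print(round(r1**2 * ss_total, 1), round(r2**2 * ss_total, 1))     # 6.0 and 32.0, the contrast SSs
print(round(r1**2 + r2**2, 2))                                    # .76 = R-squared
print(pearson(con1, con2))                                        # 0.0: the contrasts are orthogonal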
Recode and Regression using Menus
To create the indicator variables using Menus, activate the appropriate dialog box by
selecting Transform | Recode | Into Different Variables. In the Recode into Different Variables
window, select the Numeric Variable to transform, move it into the appropriate box, enter a new
name in the Name box, and assign it to the variable by clicking on Change (see Recode into ...
screen in Figure 6.3).
Then select the Old and New Values button to activate its window, which appears on top
in Figure 6.3. Enter the Old Value and the New Value into the appropriate boxes and Add them
to the transformation list. In Figure 6.3, the first two values for CON2 have already been added
and the entries are made for the final recode (i.e., group 3 to +2 in CON2). Click on Continue
and then OK to invoke the transformations. New variables and values will appear in the data
window (shown in Figure 6.4).
Figure 6.4. Recoded Variables in Data Window and Regression Specifications for Contrasts.
Figure 6.4 also shows the entries for the Linear Regression window, which would be
accessed by Analyze | Regression | Linear. Press has been moved into the Dependent box and the
two indicator variables, CON1 and CON2, have been moved into the Independent(s) box. Other
options could be selected as desired; for example, the Statistics button would access various
optional statistics, such as descriptive statistics, and simple and part correlations. Click OK to run
the analysis.
Box 6.5 shows the output. Note in particular that the ts and significance levels for CON1
and CON2 duplicate previous analyses, and that the Zero-order and Part correlations also
replicate previous results.
Model Summary
Model       R       R Square    Adjusted R Square    Std. Error of the Estimate
1         .872(a)     .760            .680                    1.41421

Model              Sum of Squares    df    Mean Square      F      Sig.
1   Regression         38.000         2      19.000        9.500   .014
    Residual           12.000         6       2.000
    Total              50.000         8

Coefficients(a)
                 Unstandardized     Standardized                            Correlations
                  Coefficients      Coefficients
Model               B     Std. Error     Beta         t      Sig.    Zero-order   Partial    Part
1   (Constant)    5.333     .471                    11.314   .000
    CON1         -1.000     .577         -.346      -1.732   .134       -.346      -.577    -.346
    CON2          1.333     .333          .800       4.000   .007        .800       .853     .800
Box 6.5. Results of Windows Regression Analysis.
RECODE group (1= -1) (2= 0) (3= 1) INTO linr /group (1= -1) (2= 2) (3= -1) INTO quad.
REGRESS /VAR = press linr quad /DEP = press /ENTER /SAVE PRED(prd) RESID(res).
                 DF    Sum of Squares    Mean Square
Regression        2       38.00000        19.00000
Residual          6       12.00000         2.00000

F = 9.50000        Signif F = .0138

Variable           B           SE B          Beta           T        Sig T
QUAD           -1.166667     .333333      -.700000       -3.500      .0128
LINR            1.500000     .577350       .519615        2.598      .0408
(Constant)      5.333333     .471405                     11.314      .0000

LIST GROUP PRESS LINR QUAD PRD RES
GROUP   PRESS    LINR    QUAD      PRD          RES
 1.00    4.00   -1.00   -1.00    5.00000    -1.00000
 1.00    5.00   -1.00   -1.00    5.00000      .00000
 1.00    6.00   -1.00   -1.00    5.00000     1.00000
 2.00    3.00     .00    2.00    3.00000      .00000
 2.00    2.00     .00    2.00    3.00000    -1.00000
 2.00    4.00     .00    2.00    3.00000     1.00000
 3.00   10.00    1.00   -1.00    8.00000     2.00000
 3.00    8.00    1.00   -1.00    8.00000      .00000
 3.00    6.00    1.00   -1.00    8.00000    -2.00000
Box 6.6. Polynomial Contrasts Using SPSS RECODE and REGRESSION.
Regression and Polynomial Contrasts for the Stimulation Study
A second type of contrast tests linear and nonlinear effects across treatment conditions.
Box 6.6 shows polynomial contrasts (or trend analysis) using the REGRESSION procedure.
Linear and quadratic indicator variables are created using RECODE, and the regression
coefficients for these predictors test the significance of the linear and quadratic sources of
variability. Note that the ts for the regression coefficients are equal to the ts for the contrasts in
earlier analyses, and that both ts squared reproduce the Fs for the linear and quadratic terms in
earlier ANOVA summary tables.
The analyses indicate that the linear and quadratic components are both significant.
That is, there is evidence of significant trends in the data for scores to increase with group level
(i.e., from 1 to 2 to 3) and for scores to decrease and then increase. The group means (5.0, 3.0,
and 8.0) are a composite of these two components.
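That composite can be displayed explicitly: with equal ns and orthogonal contrasts, each group
mean equals the grand mean plus a weighted linear piece plus a weighted quadratic piece. A brief
Python sketch using the bar-press means and the polynomial coefficients from Box 6.6:

means = [5.0, 3.0, 8.0]                       # bar-press group means
grand = sum(means) / len(means)               # 5.333
c_lin, c_quad = [-1, 0, 1], [-1, 2, -1]       # polynomial coefficients for k = 3

L_lin  = sum(c * m for c, m in zip(c_lin, means))    #  3.0
L_quad = sum(c * m for c, m in zip(c_quad, means))   # -7.0

rebuilt = [grand
           + cl * L_lin  / sum(c**2 for c in c_lin)
           + cq * L_quad / sum(c**2 for c in c_quad)
           for cl, cq in zip(c_lin, c_quad)]
print([round(m, 3) for m in rebuilt])         # [5.0, 3.0, 8.0]: the original means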
The linear component is the contrast that is most common and often the most meaningful
in psychology. Examination of linear trends can lead to dramatic differences in the conclusions
relative to an omnibus F test. For example, if behavior in a classroom consistently improves
across teachers using 5 different amounts of praise, the omnibus F might be nonsignificant
because SSTreatment is divided by k - 1 = 4, whereas the linear contrast could be highly significant
because SSLinear is divided by p = 1, for the single contrast. Analogous differences between the
omnibus F and other specific contrasts have been mentioned, but these differences can be
especially large using polynomial contrasts. In general, traditional ANOVA is a very insensitive
way to analyze the results of studies involving numerical predictors. Use either polynomial
contrasts to supplement the ANOVA or the equivalent regression methods, which are always
appropriate for numerical predictors.
MULTIPLE REGRESSION ANALYSES OF INTERVIEW DATA
To provide another example of contrast analysis using regression, consider the Control
groups versus Stigmatized groups contrasts for the interview study. We had three sets of contrast
coefficients: -1 -1 +1 +1, -1 +1 0 0, 0 0 -1 +1. For the corresponding regression analysis, we will
need three predictors, one corresponding to each set of contrast coefficients. These indicator
variables are shown in Box 6.7 for the first two subjects in each group. The predictor I1
represents the contrast between groups 1 and 2 versus groups 3 and 4. The t or F for I1 will
correspond to the t or F for this contrast in the Anova analyses. Moreover, r² between I1 and
the dependent variable will correspond to eta² for this contrast. Similar effects will be
observed for I2, corresponding to the contrast between Group 1 and Group 2, and for I3,
corresponding to the contrast between Group 3 and Group 4. The regression analysis will also
demonstrate the orthogonal nature of these contrasts. That is, all three correlations between pairs
of predictors (i.e., I1 with I2, I1 with I3, and I2 with I3) will be exactly 0.
Group   Subject    I1    I2    I3
1          1       -1    -1     0
1          2       -1    -1     0
...
2          1       -1    +1     0
2          2       -1    +1     0
...
3          1       +1     0    -1
3          2       +1     0    -1
...
4          1       +1     0    +1
4          2       +1     0    +1
...
Box 6.7. Indicator Variables for Interview Study.
RECODE inst (1 = -1) (2 = -1) (3 = +1) (4 = +1) INTO o1.
RECODE inst (1 = -1) (2 = +1) (3 = 0) (4 = 0) INTO o2.
RECODE inst (1 = 0) (2 = 0) (3 = -1) (4 = +1) INTO o3.
REGRESSION /DESC = CORR /DEP = rate /ENTER o1 o2 o3 /SAVE PRED(prd) RESID(res).

Correlation:      RATE       O1        O2        O3
O1               -.584
O2               -.101      .000
O3               -.210      .000      .000

Multiple R   .62880      R Square   .39539

                 DF    Sum of Squares    Mean Square
Regression        3      325.80000       108.60000
Residual         36      498.20000        13.83889

F = 7.84745        Signif F = .0004

Variable           B           SE B          Beta           T        Sig T
O3             -1.350000     .831832      -.210322       -1.623      .1133
O2              -.650000     .831832      -.101266        -.781      .4397
O1             -2.650000     .588194      -.583865       -4.505      .0001
(Constant)     24.000000     .588194                     40.803      .0000

Residuals Statistics:
            Min         Max         Mean      Std Dev     N
*PRED     20.0000     27.3000     24.0000      2.8903    40
*RESID    -6.0000      7.0000       .0000      3.5741    40
Box 6.8. MR Analysis of Interview Data.
Box 6.8 shows SPSS commands that can be used to perform the multiple regression
equivalent of the independent groups ANOVA. The RECODE statements create three indicator
variables (k - 1 = 3) that uniquely code each of the groups. Note how the orthogonal coding used
here as indicator variables corresponds to one set of contrasts that we previously performed in the
chapter on planned comparisons. Each group has a unique code on the three indicator variables:
Group 1 is coded as -1 -1 0, Group 2 as -1 +1 0, Group 3 as +1 0 -1, and Group 4 as +1 0 +1.
That these codes are orthogonal means that they are independent or uncorrelated with one another.
Omnibus ANOVA Results
The correspondences between the REGRESSION and ANOVA outputs are numerous.
The SSs, dfs, Fs, and ps are all equal, and R2 = η2. In addition to being available in the ANOVA
summary table, the ANOVA SSs could also be calculated in various ways from the regression
output; for example, SSTreatment = R²SSTotal = (n - 1)s²Predicted, and SSError = (1 - R²)SSTotal = (n - 1)s²Residual.
The reason for these equalities is that the indicator variables result in a best-fit equation
that produces the group means as the predicted values and the deviations from the group means
as the residual values.

FORMAT subj TO o3 (F2.0). LIST

SUBJ   INST   RATE    O1    O2    O3     PRD     RES
  1      1     26     -1    -1     0    27.3    -1.3
  2      1     26     -1    -1     0    27.3    -1.3
...
 11      2     21     -1     1     0    26.0    -5.0
 12      2     30     -1     1     0    26.0     4.0
...
 21      3     28      1     0    -1    22.7     5.3
 22      3     17      1     0    -1    22.7    -5.7
...
 31      4     20      1     0     1    20.0      .0
 32      4     24      1     0     1    20.0     4.0
...

Box 6.9. Original and Derived Scores from Interview Study.

To illustrate, the predicted score (PRD) for all observations in Group 1 is:
24 + (-2.65 × -1) + (-.65 × -1) + (-1.35 × 0) = 27.3 = ȳ1. Recall that Group 1 was coded -1, -1, and 0 on
the three indicator variables. The residual scores (i.e., RES = y - ŷ) in turn are the deviations of
observed scores from the group means.
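A short Python sketch makes this concrete: substituting each group's codes on O1, O2, and O3
into the best-fit equation from Box 6.8 returns exactly the four group means.

b0, b1, b2, b3 = 24.0, -2.65, -0.65, -1.35     # intercept and slopes from Box 6.8
codes = {                                      # (O1, O2, O3) for each group
    1: (-1, -1,  0),
    2: (-1, +1,  0),
    3: (+1,  0, -1),
    4: (+1,  0, +1),
}
for group, (o1, o2, o3) in codes.items():
    pred = b0 + b1 * o1 + b2 * o2 + b3 * o3
    print(group, round(pred, 2))               # 27.3, 26.0, 22.7, 20.0 = the group means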
Predicted and residual scores are presented in Box 6.9 for the first two subjects in each
group. Each group received its group mean as the predicted value (PRD). The variation in these
PRD scores is SSRegression = SSTreatment. Recall from our formula for calculating SSTreatment from the
group means that we multiplied nj times the squared deviation of each treatment mean from the
grand mean. In Box 6.9, we can see more readily the rationale for multiplying by nj; each of the
treatment means occurs nj times in the PRD column. The variation in the residual scores shown
in Box 6.9 (RES) equals SSError because these are the deviations of the observations from the
predicted values, which are the group means.
Box 6.9 also shows more clearly the nature of the indicator variables used to code the
three groups. O1 compares groups 1 and 2 versus groups 3 and 4 (i.e., subjects 1 to 20 are coded
-1 and subjects 21 to 40 are coded +1). O2 compares group 1 versus group 2 and O3 compares
group 3 versus group 4. If O1 were multiplied by O2 and their products summed, the result
would be 0, indicating that O1 and O2 are uncorrelated. Similarly, it is shown in the next section
that O1 is uncorrelated with O3, and that O2 is uncorrelated with O3. This complete lack of
correlation among the predictors defines orthogonal indicator variables.
Indicator Variable Results
In addition to the overall ANOVA, regression provides a test of the significance for each
of the indicator variables. The predictor O1 contrasts groups 1 and 2 versus groups 3 and 4. The
t in Box 6.8 for the predictor O1 agrees with the t for this contrast in the ONEWAY and other
ANOVA analyses. Moreover, the correlation between O1 and RATE, the attractiveness measure,
is equal to ηL1; that is, .584² = .341 = η²L1. This quantity measures the SSs accounted for uniquely
by contrast 1 and could be used to calculate a SS for contrast one; specifically, r² × SSTotal = .584²
× 824.0 = 281.03 ≈ SSL1. The simple r²s are appropriate here because the predictors O1, O2, O3
are orthogonal; note in Box 6.8 that the three rs for the correlations between pairs of predictors
are all zero. Therefore, the simple correlations are identical to the part correlations and also
equal the ηs for the contrasts. The ts, ps, and rs for the three indicator variables reproduce the
corresponding output from the earlier contrast analysis, again showing the close correspondence
between multiple regression and ANOVA approaches to data analysis. Just as we found earlier,
only indicator O1 (i.e., the first contrast between 1&2 vs 3&4) is significant; that indicator was -1
-1 +1 +1 for the four groups. In essence, O1 compares groups 1 and 2 combined (No Instructions
and Job Interview) with groups 3 and 4 combined (Psychiatric and Parole Interviews). The
comparisons between No Instructions vs. Job Interview (O2) and between Psychiatric vs. Parole
(O3) are not significant.
Although we could have requested ZPP statistics, little would be added by obtaining that
information. Because the indicator variables (i.e., contrasts) are orthogonal, the zero-order, part,
and partial correlations are all identical. Requesting CHANGE statistics and entering the
predictors sequentially (e.g., O1 last) would give F tests that again correspond to our earlier
contrast analyses, as shown in the following analysis.
One way to obtain the alternative F tests for the contrasts would be to enter predictors
successively and request SPSS to report CHANGE statistics. The change statistics for whichever
predictor is added last will correspond to the tests of interest. Box 6.10 shows the Windows
output from such an analysis. The /STAT = DEFAULT CHANGE requests the default and
change statistics. First o2 and o3 are entered (/ENTER o2 o3), and then o1 is added to the
equation (/ENTER o1).

REGRESS /STAT = DEFA CHANGE /DEP = rate /ENTER o2 o3 /ENTER o1.

                                                                  Change Statistics
                        Adjusted    Std. Error of      R Square       F
Model      R    R Square  R Square   the Estimate       Change      Change    df1    df2    Sig.
1       .233(a)   .054      .003       4.58876           .054        1.066     2     37     .355
2       .629(b)   .395      .345       3.72007           .341       20.298     1     36     .000

Model              Sum of Squares    df    Mean Square      F       Sig.
1   Regression         44.900         2      22.450        1.066   .355(a)
    Residual          779.100        37      21.057
    Total             824.000        39
2   Regression        325.800         3     108.600        7.847   .000(b)
    Residual          498.200        36      13.839
    Total             824.000        39

Box 6.10. Regression Analysis with Change Statistics.

Figure 6.5. Initiating Recodes via Menu.

Figure 6.6. Select Variables to Convert Dialogue Box.
We are primarily interested in the Change Statistics for Model 2, that is, the model in
which o1 has been added to o2 and o3. FChange = 20.298 ≈ 4.505² = t²o1 and r²Change = .341, which
agrees with earlier values. From this regression analysis, we can also calculate SSChange = SSy.123 -
SSy.23 = 325.80 - 44.90 = 280.90, which also agrees with earlier calculations and printouts.
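The F Change and R Square Change values can be reproduced directly from the SSs for the two
models. A brief Python check, using the values reported in Box 6.10:

ss_total = 824.0
ss_model1, ss_model2 = 44.9, 325.8       # SS Regression for models 1 and 2 in Box 6.10
df_change, df_resid = 1, 36

f_change = ((ss_model2 - ss_model1) / df_change) / ((ss_total - ss_model2) / df_resid)
r2_change = (ss_model2 - ss_model1) / ss_total
print(round(f_change, 3), round(r2_change, 3))   # 20.298 and .341, matching the printout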
Indicator Variables and Regression Analysis using Menus
The use of RECODE and other SPSS data modification commands is generally the
easiest way to create indicator variables. It is possible, however, to perform these operations via
Menus as well. Figure 6.5 shows the initial steps to constructing a new indicator variable:
Transform | Recode | Into Different Variables. These commands provide access to the dialogue
box shown in Figure 6.6.

Figure 6.7. Recoding via Menus.

Figure 6.6 shows a dialogue box in which all variables (initially) are listed in the
left-hand box. The INST variable was selected and moved into the conversion box in the
center via the arrow button (which points left or right depending on which box has variable
names selected). Once INST was in the center box, the name of the output variable (o1 here) was
typed into the Output Variable Name slot and the Change button pressed. This created the
transformation equivalency shown in the center box (i.e., inst -> o1).
The next step is to specify the translation from old (i.e., original) values to new. Clicking
on the Old and New Values button in Figure 6.6 will bring up the dialogue box shown in Figure
6.7. The screen shot shows the final step in the creation of indicator variable o1. Previously, old
and new values have been specified in the left and right Value slots, respectively, and then Added
to the Old->New box. All that remains is to click on Add to complete o1. The process would be
repeated for o2 and o3.
One useful practice when using the Menus to perform Recodes is to save the statements
actually generated by this process. The output screen will contain the syntax generated by the
Recode screens, and these statements can be copied to a Syntax window and saved in case the
transformations need to be repeated at some later time. Except for very simple translations,
RECODEs are generally done more easily by syntax than by dialogue boxes.
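To make this concrete, the statements pasted for o1, o2, and o3 should be essentially
equivalent to RECODE commands along the following lines (a sketch; the exact statements that
SPSS pastes may differ in formatting, and the o2 and o3 coefficients shown here are inferred from
the earlier contrast analysis):

* O1: groups 1 and 2 versus groups 3 and 4.
RECODE inst (1 = -1) (2 = -1) (3 = +1) (4 = +1) INTO o1.
* O2: No Instructions versus Job Interview.
RECODE inst (1 = -1) (2 = +1) (3 = 0) (4 = 0) INTO o2.
* O3: Psychiatric versus Parole Interview.
RECODE inst (1 = 0) (2 = 0) (3 = -1) (4 = +1) INTO o3.
EXECUTE.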
The actual analysis could be performed by syntax or via the menu commands illustrated
earlier in Figure 6.2. The only difference would be that three predictors (o1, o2, o3) would be
entered into the Independent(s) box. Box 6.11 shows the results of the analysis. The various
equivalencies reported previously for Box 6.9 are again present in the output. Notably, F = 7.847
reproduces the ANOVA analysis. There are a few additional features in Box 6.11 that are worth
mentioning. First, note that the means for o1, o2, and o3 are all 0.0. Second, note that the
correlations between o1 and o2, between o1 and o3, and between o2 and o3 are all 0. The three
indicator variables are completely independent of one another (i.e., they are orthogonal). One
positive consequence of this is that the overall difference among the means, represented by R² =
.395, can be partitioned into three orthogonal components. Indicator o1 accounts for .584² ×
824.0 = .341 × 824.0 = 281.03 units of variability, o2 accounts for .101² × 824.0 = .010 × 824.0 =
8.41 units, and o3 accounts for .210² × 824.0 = .044 × 824.0 = 36.34 units. Note that 281.03 +
8.41 + 36.34 ≈ 325.8, the SSTreatment/Regression, and that .341 + .010 + .044 = .395 = R² for the
overall analysis. Later chapters discuss more fully this partitioning of SSTreatment into
independent components.
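Because the three indicators are orthogonal, this partition can also be verified by entering
them one at a time and requesting CHANGE statistics; with orthogonal predictors, the R Square
Change at each step (.341, .010, and .044 here) is the same regardless of the order of entry. A
sketch of such a run:

REGRESSION /STAT = DEFA CHANGE /DEP = rate /ENTER o1 /ENTER o2 /ENTER o3.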
REGRESSION /VARIABLES = rate o1 o2 o3 /DESC = CORR /DEP = rate /ENTER o1 o2 o3 /SAVE PRED(prd) RESID(res).

Pearson Correlations
        RATE     O1       O2       O3
RATE    1.000    -.584    -.101    -.210
O1      -.584    1.000    .000     .000
O2      -.101    .000     1.000    .000
O3      -.210    .000     .000     1.000

Model   R        R Square   Adjusted R Square   Std. Error of the Estimate
1       .629(a)  .395       .345                3.7200657

Model                Sum of Squares   df   Mean Square   F       Sig.
1   Regression       325.800          3    108.600       7.847   .000(a)
    Residual         498.200          36   13.839
    Total            824.000          39

Coefficients
Model 1        B        Std. Error   Beta    t        Sig.
(Constant)     24.000   .588                 40.803   .000
O1             -2.650   .588         -.584   -4.505   .000
O2             -.650    .832         -.101   -.781    .440
O3             -1.350   .832         -.210   -1.623   .113

Residuals Statistics(a)
                  Minimum      Maximum      Mean        Std. Deviation   N
Predicted Value   20.000000    27.299999    24.000000   2.8903021        40
Residual          -6.000000    7.000000     .000000     3.5741235        40

Box 6.11. Windows Regression Output for Interview Study.

Polynomial Regression for the Interview Study

We also conducted planned contrasts for the interview study assuming an order factor for
which polynomial contrasts would be appropriate. The k - 1 contrasts corresponded to the linear,
quadratic, and cubic trends in the data. The actual coefficients were obtained from Appendix A-5,
which lists polynomial contrasts for varying values of k. The specific values were: -3 -1 +1 +3
for the linear, +1 -1 -1 +1 for the quadratic, and -1 +3 -3 +1 for the cubic. These contrasts define
linear, quadratic, and cubic patterns, and are orthogonal to one another.
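As an aside, trend tests for an ordered factor can also be requested directly from the
ONEWAY procedure using its POLYNOMIAL subcommand; a sketch of such a request follows
(the output layout differs from MANOVA, but with equal group sizes the linear, quadratic, and
cubic terms should agree):

ONEWAY rate BY inst /POLYNOMIAL = 3.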
Box 6.12 shows the default MANOVA output for this analysis. The results show a highly
significant linear effect (Parameter = 2), but neither the quadratic nor the cubic effect approaches
significance.
Box 6.13 shows the regression equivalent for this set of contrasts. We first create the
three indicator variables, using now the coefficients for the linear, quadratic, and cubic contrasts.
The omnibus ANOVA again agrees with earlier regression and ANOVA analyses. The equation
has changed somewhat, because the coefficients now represent the polynomial effects. The t-tests
for these coefficients correspond to the polynomial contrasts done in Box 6.12. Note the
equivalence of the ts and the ps. The strength of the relationships can be determined from the rs
between the predictors and the dependent variable (rate) in the correlation matrix. These simple
rs suffice because the predictors are orthogonal.

MANOVA RATE BY INST(1 4) /CONTR(INST)=POLY.

Tests of Significance for RATE using UNIQUE sums of squares
Source of Variation   SS       DF   MS       F      Sig of F
WITHIN CELLS          498.20   36   13.84
INST                  325.80   3    108.60   7.85   .000
(Model)               325.80   3    108.60   7.85   .000
(Total)               824.00   39   21.13
R-Squared = .395     Adjusted R-Squared = .345

INST
Parameter   Coeff.        Std. Err.   t-Value    Sig. t    Lower -95% CL- Upper
2           -5.6348913    1.17639     -4.78999   .00003    -8.02072     -3.24907
3           -.70000000    1.17639     -.59504    .55554    -3.08583      1.68583
4           .581377674    1.17639     .49421     .62416    -1.80445      2.96720

Box 6.12. Polynomial Contrasts for Interview Study.

RECODE inst (1 = -3) (2 = -1) (3 = +1) (4 = +3) INTO lin.
RECODE inst (1 = +1) (2 = -1) (3 = -1) (4 = +1) INTO qua.
RECODE inst (1 = -1) (2 = +3) (3 = -3) (4 = +1) INTO cub.
REGRESSION /VARIABLES = rate lin qua cub /DESC = CORR /DEP = rate /ENTER.

Pearson Correlations
       RATE     LIN      QUA
LIN    -.621
QUA    -.077    .000
CUB    .064     .000     .000

Model   R        R Square   Adjusted R Square   Std. Error of the Estimate
1       .629(a)  .395       .345                3.7200657

Model                Sum of Squares   df   Mean Square   F       Sig.
1   Regression       325.800          3    108.600       7.847   .000(a)
    Residual         498.200          36   13.839
    Total            824.000          39

Coefficients
Model 1        B        Std. Error   Beta    t        Sig.
(Constant)     24.000   .588                 40.803   .000
LIN            -1.260   .263         -.621   -4.790   .000
QUA            -.350    .588         -.077   -.595    .556
CUB            .130     .263         .064    .494     .624

Box 6.13. Polynomial Contrasts by Regression.

Number of Groups (k)       2          3               4
Dummy Coding               D1         D1   D2         D1   D2   D3
   Group 1                 0          0    0          0    0    0
   Group 2                 1          1    0          1    0    0
   Group 3                            0    1          0    1    0
   Group 4                                            0    0    1
Effect Coding              E1         E1   E2         E1   E2   E3
   Group 1                 1          1    0          1    0    0
   Group 2                 -1         0    1          0    1    0
   Group 3                            -1   -1         0    0    1
   Group 4                                            -1   -1   -1
Orthogonal Coding          O1         O1   O2         O1   O2   O3
   Group 1                 -1         -1   -1         -1   -1   -1
   Group 2                 1          1    -1         1    -1   -1
   Group 3                            0    2          0    2    -1
   Group 4                                            0    0    3

Box 6.14. Indicator variables for MR ANOVA.
Nonorthogonal Predictors
Although orthogonal predictors have many benefits (i.e., the same benefits as orthogonal
contrasts), multiple regression is powerful enough to conduct ANOVA with indicator variables
that are not orthogonal. Various ways to construct indicator variables are shown in Box 6.14. All
of these codes produce exactly the same output for the overall regression and ANOVA parts of
the analysis; they differ somewhat with respect to intercepts and slopes. The various codes
produce identical statistical omnibus results because, irrespective of the indicator variable, the
predicted scores are the group means for all nj individuals in each of the groups (i.e., MSRegression =
MSTreatment) and the residual scores are deviations from the group means (i.e., MSResidual = MSError).
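One way to see this equivalence concretely is to save the predicted values from any of
these codings and compare them with the group means; within each group, every predicted score
should equal that group's mean on rate. A sketch using the dummy codes created in Box 6.15
below:

REGRESSION /DEP = rate /ENTER d12 d13 d14 /SAVE PRED(prd).
* Predicted values within each group should match the group means on rate.
MEANS TABLES = rate prd BY inst.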
Three types of indicator variable (among many) are: dummy, effect, and orthogonal
coding. In dummy coding, a comparison group (often the first group) receives 0 on all k - 1
predictors (e.g., D1 = 0 and D2 = 0 for the first of three groups). Other groups receive 1 on
exactly one predictor and 0 on others (e.g., D1 = 1 and D2 = 0 for the second of three groups).
As in dummy coding for one variable, the intercept will equal the mean of the group that receives
all 0s on the dummy indicator variables and the slopes will represent deviations from that
comparison group's mean. Dummy indicator variables are not orthogonal.
In effect coding, one group (sometimes the last) receives -1 on all k - 1 indicator
variables. Other groups receive 1 on exactly one indicator and 0 on the others. To illustrate, in a
three-group design, E1 = 1 and E2 = 0 for group 1, E1 = 0 and E2 = 1 for group 2, and E1 = -1
and E2 = -1 for group 3. The intercept for effect coding is the grand mean, and the slopes
represent deviations from the grand mean. Effect codes are not orthogonal.
There are numerous types of orthogonal indicator variables, including the contrast
coefficients that we have discussed already. The defining characteristic is that orthogonal
predictors are all independent of one another (i.e., r = 0 for all pairs of predictors). In the
orthogonal codes shown in Box 6.14, groups are compared to one another and then their average
is compared to some other group. The O1 codes for k = 3, for instance, compare groups 1 and 2
and the O2 codes compare the average of groups 1 and 2 to the mean for group 3.
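To make the k = 3 columns of Box 6.14 concrete, the three coding schemes could be
created with RECODE statements along the following lines (a sketch, assuming a hypothetical
three-level grouping variable named group; the output names simply match the column labels in
Box 6.14):

* Dummy codes.
RECODE group (1 = 0) (2 = 1) (3 = 0) INTO d1.
RECODE group (1 = 0) (2 = 0) (3 = 1) INTO d2.
* Effect codes.
RECODE group (1 = 1) (2 = 0) (3 = -1) INTO e1.
RECODE group (1 = 0) (2 = 1) (3 = -1) INTO e2.
* Orthogonal codes.
RECODE group (1 = -1) (2 = 1) (3 = 0) INTO o1.
RECODE group (1 = -1) (2 = -1) (3 = 2) INTO o2.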
Box 6.15 illustrates the use of non-orthogonal indicator variables. Dummy coding has
been used, with the reference group being group 1 (the No Instruction condition). The omnibus
ANOVA again agrees with earlier results. As expected, the intercept is the mean for Group 1
(27.30) and the regression coefficients represent the deviation of each of the other groups from
that mean (e.g., D14 = -7.30 = 20.00 - 27.30). The t-tests inform us that groups 3 and 4 differ
significantly from the control condition, but not group 2 (i.e., the other “control” group).

RECODE inst (1 = 0) (2 = +1) (3 = 0) (4 = 0) INTO d12.
RECODE inst (1 = 0) (2 = 0) (3 = +1) (4 = 0) INTO d13.
RECODE inst (1 = 0) (2 = 0) (3 = 0) (4 = +1) INTO d14.
REGRESSION /VARIABLES = rate d12 d13 d14 /DESC = CORR /STAT = DEFAU ZPP /DEP = rate /ENTER.

Pearson Correlations
        RATE     D12      D13      D14
RATE    1.000    .254     -.165    -.509
D12     .254     1.000    -.333    -.333
D13     -.165    -.333    1.000    -.333
D14     -.509    -.333    -.333    1.000

Model   R        R Square   Adjusted R Square   Std. Error of the Estimate
1       .629(a)  .395       .345                3.7200657

Model                Sum of Squares   df   Mean Square   F       Sig.
1   Regression       325.800          3    108.600       7.847   .000(a)
    Residual         498.200          36   13.839
    Total            824.000          39

Coefficients
Model 1        B        Std. Error   Beta    t        Sig.   Zero-order   Partial   Part
(Constant)     27.300   1.176                23.207   .000
D12            -1.300   1.664        -.124   -.781    .440   .254         -.129     -.101
D13            -4.600   1.664        -.439   -2.765   .009   -.165        -.419     -.358
D14            -7.300   1.664        -.696   -4.388   .000   -.509        -.590     -.569

Box 6.15. Nonorthogonal Dummy Indicator Variables for Interview Study.
The comparisons are not orthogonal. Note in Box 6.15 that the three indicator variables
all correlate -.333 with one another, and that the part correlations are no longer equal to the
simple correlations. Because the predictors are not orthogonal, the sum of the squared part
correlations does not equal R². Specifically, .101² + .358² + .569² = .462, which does not equal
R² = .395. We have not partitioned SSRegression (i.e., SSTreatment) into k - 1 orthogonal components.
The comparisons are not orthogonal in essence because they all involve a comparison against
group 1. Hence, that group's variation from the grand mean contributes to all three comparisons.
Other possible non-orthogonal comparisons might have accounted for less than 39.5% of the
variability if the comparisons “missed” important variation in the means.
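The unique contribution of any one dummy indicator can still be isolated with CHANGE
statistics, just as was done for the orthogonal indicators, by entering it last. A sketch for d14
follows; the R Square Change from such a run equals the squared part correlation for d14
(.569² ≈ .32):

REGRESSION /STAT = DEFA CHANGE /DEP = rate /ENTER d12 d13 /ENTER d14.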
The comparisons in Box 6.15 are all pairwise; therefore, we might expect some
correspondences between these results and the LSD post hoc tests. Box 6.16 shows that this is
indeed the case. The results in
Box 6.15 agree exactly with the first three LSD comparisons (i.e., t-tests). The p values are
identical, as are the standard errors. Moreover, the differences between the means in Box 6.16
correspond to the unstandardized regression coefficients in Box 6.15. In fact, any indicator
variable that involves a pair-wise comparison will correspond to the LSD results for that
particular comparison. O2 in Box 6.8, for example, also produces p = .440 because it involves
the comparison between groups 1 and 2 (i.e., O2 = -1 +1 0 0).
ONEWAY rate BY inst /POSTHOC = LSD.

                 Sum of Squares   df   Mean Square   F       Sig.
Between Groups   325.800          3    108.600       7.847   .000
Within Groups    498.200          36   13.839
Total            824.000          39

LSD
(I) INST   (J) INST   Mean Difference (I-J)   Std. Error   Sig.   95% CI Lower Bound   Upper Bound
1.0000     2.0000     1.300000                1.6636640    .440   -2.074067            4.674067
           3.0000     4.600000(*)             1.6636640    .009   1.225933             7.974067
           4.0000     7.300000(*)             1.6636640    .000   3.925933             10.674067
2.0000     3.0000     3.300000                1.6636640    .055   -.074067             6.674067
           4.0000     6.000000(*)             1.6636640    .001   2.625933             9.374067
3.0000     1.0000     -4.600000(*)            1.6636640    .009   -7.974067            -1.225933

Box 6.16. Post Hoc LSD Comparisons for Interview Study.
CONCLUSIONS
In this chapter we have demonstrated how multiple regression (i.e., the General Linear
Model) can be used to compute ANOVA for comparisons between means, including not only the
omnibus F test, but also multiple comparisons of interest. We especially noted the
correspondence between the use of indicator variables and planned comparisons (or contrasts).
Using the coefficients from planned contrasts as indicator variables in multiple regression is
equivalent to the calculations conducted in Chapter 5. Or more properly, the short-hand
calculations introduced in Chapter 5 are equivalent to multiple regression results with predictors
corresponding to the contrasts. We also briefly noted that even the post hoc procedures of
Chapter 4 were equivalent to indicator variables that test the significance of differences between
pairs of means.
APPENDIX 6.1: GRAPHING CONTRASTS AND MEANS

Figure 6.8. Error Bar Plot of Means.
Figure 6.9. Plot of Ratings Against Linear Contrast.
The idea that contrasts attempt to capture variability in scores by matching patterns
present in the data can be illustrated graphically. Figure 6.8 shows an Error Bar plot of the four
means from the interview study. The pattern of variability in the means, which determines
SSTreatment, can be characterized in various ways, but Figure 6.8 shows that it is essentially
linear. That is, means decrease from group 1 to group 4 in approximately equal steps. Although
this pattern is being identified here in a post hoc manner (i.e., by looking at the data), it is
important to remember that planned contrasts are determined a priori (i.e., before the data are
examined, based on theory or past findings).
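A plot like Figure 6.8 can be requested with the GRAPH command; the following is a
sketch (the corresponding Graphs menu item would paste comparable syntax):

GRAPH /ERRORBAR(CI 95) = rate BY inst.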
The coefficients for the linear contrast are -3, -1, 1, and 3, a linear pattern that will
correlate quite highly with the pattern in the means (r = -.621). Figure 6.9 shows the scattergram
relating ratings to the linear contrast. The linear regression (solid line in Figure 6.9) corresponds
quite well to the trend in the means (dashed line in Figure 6.9).
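A scattergram like Figure 6.9 could be requested with syntax along these lines (a sketch;
the fit lines are typically added afterwards in the chart editor):

GRAPH /SCATTERPLOT(BIVAR) = lin WITH rate.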
Figure 6.10. Plot of Ratings Against Coefficients for Groups 1 and 2 versus Groups 3 and 4.
Figure 6.11. Plot of Ratings Against Cubic Coefficients.
Another set of coefficients that worked well (and might be more likely to be generated a
priori) contrasts groups 1 and 2 with groups 3 and 4 (-1 -1 +1 +1). Figure 6.10 shows the ratings
plotted against that contrast. These contrast coefficients account for 34.1% of the variability in
the scores, versus 38.5% for the linear contrasts (which, as noted, are not likely to be predicted a
priori for this study). In Figure 6.10, the groups have been given different size symbols (smallest
= Group 1 and largest = Group 4) just to illustrate that the coefficients (-1 -1 +1 +1) combine
groups 1 and 2, and combine groups 3 and 4, and that it is the pairs of conditions that are
contrasted to one another.
Finally, to illustrate a set of coefficients that does not capture much of the variability in
means, Figure 6.11 plots ratings as a function of the cubic coefficients (-1 +3 -3 +1) used in the
polynomial regression. The groups are again represented by symbols of different sizes to
emphasize that the groups are now ordered 3, 1, 4, and 2 from left to right, an ordering that
produces a weak linear relationship between the means and the contrast coefficients, r² = .004 =
.064² (.064 is the correlation between ratings and cubic coefficients, see Box 6.13).