CHAPTER 01:
INTRODUCTION TO SCIENTIFIC PSYCHOLOGY
AND THE ROLE OF ANALYSIS OF VARIANCE (ANOVA)
A General Conceptualization of Scientific Research
    Concretizing (Operationalizing) Theoretical Constructs
    Predictor and Criterion Variables
    Experimental and Non-experimental Predictor Variables
    Numerical and Categorical Variables
    Types of Relationship
An Overview of Data Analysis (Statistics)
    Analysis of Variance (ANOVA)
    Correlation and Regression Methods
Introduction to ANOVA
    Properties of ANOVA Predictors
    Overview of ANOVA Designs
    Introduction to the F-Statistic
Conclusion
Psychologists want to understand human behavior and experience scientifically. Why do
some people do better in school than others? What is the effect of maternal alcohol consumption
during pregnancy on the later cognitive abilities of children? What are the perceptual
mechanisms that allow people at a crosswalk to determine whether oncoming cars are going to
hit them or stop in time? What factors determine whether people like or fall in love with one
another? What are the effects of severe head injury on different language processes, and how can
understanding the fundamental mechanisms of language be used to develop training methods that
will improve the rehabilitation of linguistic deficits? What biological and psychological
determinants lead some people to become so sad that it interferes seriously with the quality of
their lives, and what treatments are effective at easing their suffering? Curiosity about these and
innumerable similar questions is the starting point for psychological research.
The focus of this book is on the statistical analysis of data relevant to such questions, in
particular the technique of analysis of variance (ANOVA). However, it will help to place
ANOVA in the more general context of scientific research and various methods of statistical
analysis. A general characterization of scientific research will provide an important terminology
for describing and understanding various approaches to statistical analysis. Hence, we begin with
a conceptualization of how scientific psychology (and indeed any scientific discipline) works,
and then proceed to some ways in which scientific studies differ from one another. These
differences have implications for data analysis and for other important aspects of research.
A GENERAL CONCEPTUALIZATION OF SCIENTIFIC RESEARCH
Empirical research into questions about our psychological world is a complicated process
that requires many skills involved in the design, conduct, data analysis, and explanation of
scientific studies. Scientific studies can therefore differ from one another in many ways. At an
abstract level, however, any scientific study can be conceptualized in relatively simple terms.
The sample research questions listed above share a common general form. Specifically,
they involve questions about causes and effects. Consider the question of why people are
attracted to one another. The effect that we are interested in is being attracted versus not being
attracted to another person (or perhaps more finely graduated degrees of attraction from low to
high). The many factors that contribute to whether or not (or how much) people like one another
are the potential causes. For example, the old saying “Birds of a feather flock together”
hypothesizes that in general people who are similar to one another will like one another better
than people who are dissimilar. Similarity of attitudes, appearance, or whatever is therefore a
potential cause of attraction. “Opposites attract” suggests that dissimilarity breeds attraction.
Consider a second example, the effect of maternal alcohol consumption on children’s
later intelligence. Here the effect of interest concerns variation in children’s intelligence, and the
cause is whether or not their mothers consumed alcohol during pregnancy (or the amount of
alcohol that they consumed). The many other causes of intelligence (e.g., heredity, early
childhood environment, nutrition, brain damage) would not be the primary focus of such a study,
although researchers would need to consider other potential causes in their design. A research
design specifies how observations related to some specific hypothesis are to be collected.
At its most basic level, scientific research is concerned with testing the validity of these
kinds of claims about cause-effect relationships. Such claims often arise as deductions from
more general theories, or they may simply be specific relationships that researchers are interested
in for diverse reasons, including practical ones (e.g., school psychologists interested in the
determinants of academic success). Whatever the origins of the hypotheses, scientific
researchers wish to evaluate the validity of the hypothesized cause-effect relationship(s)
empirically (i.e., through observation that produces relevant data). Validity here means whether
the hypothesized cause-effect relationship represents accurately the true state of affairs in the real
world. Evaluation of validity involves many difficult issues, including the measurement of the
hypothetical constructs of interest.
Concretizing (Operationalizing) Theoretical Constructs
Any attempt to validate cause-effect relationships leads quickly to the realization that
psychological concepts tend to be highly abstract and difficult to observe. Psychological theories
posit relationships between abstract concepts (or constructs, to use a more technical term), such
as intelligence, attention, depression, problem solving, similarity, and love. These constructs
cannot be observed directly and must be inferred from behaviour. For example, we might
conclude that two people are in love if they often gaze into each other's eyes, spend time together,
write long letters to one another when apart, make sacrifices for one another, and avow their love
in some verbal manner. But our conclusion about the abstract construct of love always remains
at least somewhat inferential; for example, there may be alternative internal states responsible for
these behaviours, such as a plan to exploit the other person’s affection.
The abstractness of many psychological terms is a serious challenge for scientists because
abstract terms cannot be observed directly and observation is the basic method by which science
tests theories. It is difficult to identify factors that cause people to fall in love if we have no
reliable and valid way to determine whether people were actually in love. Similar difficulties
arise without acceptable measures for managerial competence, intelligence, motivation, imagery,
and numerous other abstract psychological constructs that scientific psychologists wish to
understand. Concretizing an abstract concept generally involves specifying some procedures or
operations by which the concept can be measured, and in some cases actually manipulated
directly by researchers. Hence, the process of concretizing has been referred to as operational
definition. The concrete results of this process are called variables (rather than constructs), and it
is these variables that are subjected to statistical analysis.
The problem of concretizing or operationalizing abstract concepts is not unique to
Psychology. At one time, weight, heat, pressure, and other physical constructs also lacked
operational definitions (i.e., concrete procedures for accurate, reliable, and valid measurement of
the constructs). Early generations of natural scientists slowly and methodically developed
operational and increasingly precise measures for these constructs, which then allowed the
research underlying our current sophisticated theories of the physical world. Indeed, major
scientific advances often depended on the identification of the critical constructs and the
development of precise measurement operations. At the present time, other disciplines in the
social sciences and humanities generally lag even further behind than psychology does in the
development of specific measures for their
theoretical constructs, and some appear to
have even rejected empiricism as a necessary
foundation for their disciplines.
Figure 1.1. Model of scientific thinking.

Figure 1.1 provides a very simplified illustration of how operationalizing constructs
allows researchers to test a theoretical hypothesis such as "Birds of a feather flock together"
(versus "Opposites attract"). In the example, the hypothesized causal relationship (symbolized by
the top arrow) is between two abstract constructs -- Similarity ("Birds of a feather") and
Attraction ("flock together"). Verbally, the hypothesis could be stated as “Similarity causes more
Attraction, whereas Dissimilarity leads to Less Attraction”.
To test this hypothesis, the two abstract constructs of Similarity and Attraction must be
translated into concrete, observable variables. That is, the constructs are operationally defined in
terms of the procedures or operations used to measure or experimentally manipulate them. The
criterion variable of Attraction might be operationally defined by ratings of how much each
person likes the other person, by observing interactions between people, or by other observable
measures relevant to liking. The predictor Similarity might be operationally defined by a
measure of the similarity of people’s independent responses to an attitude questionnaire, or by
experimenter instructions that focussed on similarities or differences between pairs of
participants in order to manipulate the perceived similarity between participants in the study.
Translating similarity and attraction into observable variables allows relevant observations to be
collected, perhaps from roommates in a university residence, couples, or even strangers meeting
for the first time.
Only occasionally is the translation from theoretical constructs to observational variables
simple (e.g., gender, age). Many psychological constructs are extremely difficult to translate into
observational terms, and an entire field of specialization, psychometrics, is devoted to the process
of operationalization. The considerable controversy around purported measures of intelligence
demonstrates the complexities. Some people deny that such abstract traits can even be measured,
others claim that any such measure is biassed against disadvantaged people, whereas
psychometrists themselves generally view intelligence tests in a positive way. One use of
statistical techniques, such as those described in this book, is to evaluate relationships relevant to
the reliability and validity of psychological measures (e.g., whether two measures of attraction
are correlated to one another, whether pre-existing groups differ as expected on some measure).
Such research would generally precede tests of a causal predictor, and would involve methods
discussed more fully in methodology texts and books for various content areas of psychology.
The results from our hypothetical study of similarity and liking would be pairs of
numbers for participants in the study. One number of each pair would represent the degree of
similarity between two people (e.g., a numerical measure of the similarity of their attitudes as
determined by an attitude questionnaire, or a categorical number representing the experimental
condition to which each person was assigned). The second number of each pair would
represent the degree of attraction between the two people (e.g., rated liking for a partner, perhaps
on a 7-point scale from 1=Not at All to 7=A Lot). The number of pairs of scores would depend
on the actual number of participants in the study.
Such data would permit researchers to determine statistically whether measures of liking
tend to be demonstrably higher for similar people than for dissimilar people (the ? mark in Figure
1.1). A positive relationship would occur if people having greater similarity to one another
tended to report higher liking. This outcome would support the prediction generated by the
theory that "birds of a feather flock together" and be contrary to predictions of the "opposites
attract" theory. The lack of a relationship or a negative relationship (i.e., people with greater
similarity having less liking) would be inconsistent with the hypothesis.
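To make this concrete, the following minimal sketch (in Python with SciPy, using entirely
hypothetical similarity scores and liking ratings, neither of which comes from the text) computes
the correlation between the two measures; a positive, significant value would favour the "birds of
a feather" prediction.

    import numpy as np
    from scipy import stats

    # Hypothetical data: one similarity score and one liking rating (1-7) per pair.
    similarity = np.array([0.9, 0.7, 0.8, 0.2, 0.4, 0.1, 0.6, 0.3])
    liking     = np.array([6,   5,   7,   2,   3,   2,   5,   3])

    # A positive, significant correlation would support "birds of a feather";
    # a negative correlation would favour "opposites attract".
    r, p = stats.pearsonr(similarity, liking)
    print(f"r = {r:.2f}, p = {p:.4f}")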
Predictor and Criterion Variables
In any causal model, at least one variable (and often more than one) is identified as a
predictor of some other variable, which is referred to as the dependent or criterion variable.
Criterion variables correspond to the outcomes that researchers ultimately want to explain (e.g.,
liking in the similarity study described above, anxiety in a study of the effectiveness of
psychotherapy, grades in a study of school success). Criterion variables are also called dependent
variables, a more specific term that is especially appropriate for experimental studies. Although
a researcher may try to influence a criterion or dependent variable by experimentally
manipulating some other variable, the criterion variable itself is simply measured by the
researcher and is not itself directly manipulated. Researchers do not, for example, directly
control the degree of liking between participants in research studies, although they might
manipulate variables that they think will influence liking (such as perceived similarity).
The other variable in any causal relationship is the predictor variable. In Figure 1.1,
similarity is a predictor of liking. An alternative term for predictor, especially appropriate for
experimental variables manipulated by the researcher, is independent variable. The fundamental
question in evaluating a theoretical hypothesis is whether the criterion variable is influenced
causally by the predictor variable(s) being studied (e.g., similarity in the liking study, type of
therapy in a study of the effectiveness of therapy, achievement motivation and intelligence as
predictors of school success). Properly, the term independent variable refers to causal variables
that the experimenter controls directly (i.e., manipulates), but independent variable is often used
for non-experimental causal variables, such as measures of achievement motivation.
The identification of a variable as a predictor or criterion variable or as a manipulated or
measured predictor can seldom be made without considering the specific study. A variable such
as anxiety, for example, might be used as a criterion variable in a study of therapeutic treatment
or as a predictor variable in a study of the negative impact of anxiety on interpersonal relations,
learning, or some other behavior. And anxiety as a predictor could be measured by some
appropriate scale or manipulated by the experimenter (e.g., by including or excluding in the
instructions statements that would increase or decrease participants’ anxiety levels). Only by
examining the specific causal hypothesis of interest is it possible to know the cause-effect role of
a variable (i.e., its status as predictor or criterion, as independent or dependent variable), and
whether the predictor is experimental or non-experimental, the next distinction that we consider.
Experimental and Non-experimental Predictor Variables
One distinction that can be important for statistical and other purposes is whether the
predictor is an experimental or non-experimental predictor. Experimental predictors are
manipulated by the researcher, whereas non-experimental variables are measured in their natural
state rather than manipulated. In the case of experimental predictors, the researcher is free to
determine the level of the predictor variable assigned to particular subjects (or to the same
subjects at different points in time). Examples of experimental predictors include: different
instructions given randomly to participants in a memory task, various amounts of reinforcement
in a learning study as determined by the experimenter, and experimenter-prepared instructions
that prime different degrees of perceived similarity in a social psychology study of liking. In the
last study, for example, researchers could randomly assign half of the participants to receive
instructions that emphasize similarity with their randomly chosen partner, and the other half to
receive instructions that emphasize differences. Subsequent liking for the partner would be
measured as the criterion variable.
In the case of non-experimental predictors, the researcher measures pre-existing
differences among participants rather than directly creating the differences. Examples of non-
experimental variables are gender of subjects, age of participants in a learning study, self-selected
participation in some therapeutic treatment, naturally adopted strategies in a study of learning
from textbooks, and attitude similarity for pre-existing couples. In the last study, people’s
similarity to a pre-existing partner would be measured, along with their liking. Similarity could
vary along some numerical scale, or else be roughly determined (e.g., similar, neutral,
dissimilar), but in either case it would measure similarities that existed prior to the study, rather
than the experimentally-determined “similarities” manipulated in an experimental study.
As just illustrated, one cannot always tell from the construct alone whether the variable is
experimental or non-experimental. The specific procedures used in a particular study must be
examined to determine whether predictor variables are simply measured by the researchers in a
non-experimental study (e.g., administer attitude scales to different people and determine the
degree of similarity in attitudes), or manipulated by the researchers in some experimental manner
(e.g., randomly assign participants to different treatment conditions in a psychotherapy study).
Although the predictor of similarity is most easily studied through measurement of similarity,
even attitude similarity could be manipulated by a researcher (e.g., by experimentally inducing or
emphasizing similarity of attitudes in participants and then assessing their liking of one another).
The distinction between measured and manipulated predictors has several important
implications for researchers. First, the distinction is an important factor to consider in evaluating
how much confidence one should have in any causal inferences made from a study. Because the
experimenter directly controls an experimental predictor, assignment of subjects to experimental
conditions can ideally be arranged so that differences on the predictor are independent of (i.e.,
not confounded with) other factors that might influence the
dependent variable. Random assignment to conditions is one way to achieve independence from
other potential determinants of behaviour. If participants in our attraction study were randomly
assigned to Low and High levels of similarity, for example, then we would be more confident
that any differences in liking between the two groups were actually due to similarity rather than
to other characteristics of participants that could influence liking (e.g., how sociable a person is,
whether response biases influenced ratings of liking). Randomly assigning participants to
different memory instructions similarly means that groups with different instructions will on
average not differ on a host of other variables that affect memory performance (e.g., intelligence,
fatigue, mood, motivation to do well).
The story is quite different for non-experimental predictors that are simply measured by
the researcher in their naturally occurring state. Because researchers do not randomly assign
subjects to the different levels or values of non-experimental variables, the participants are likely
to differ in a host of ways correlated with the predictor of interest. Participants who are classified
as low or high in similarity based on measured attitudes, for example, are likely to vary in
numerous other ways as well (e.g., similarity of social class, religious background, ethnicity,
susceptibility to response biases, and so on). Any seeming effect of similarity could instead be
due to these alternative factors, referred to as confounding variables.
As a second illustration, participants who voluntarily undergo therapy differ from non-
participants in many ways besides exposure to therapy (e.g., motivation, degree of distress,
income, education, duration of disorder, accessibility to treatment, attitudes toward psychological
disorders). Any of these confounded variables could account for the apparent effect of
therapeutic treatment. More subtly, these same confounding variables could mask or hide the
effect of treatment by counteracting its positive effects. That is, even a null relationship is
seriously compromised in a non-experimental study.
Thus, non-experimental designs lead to more ambiguous causal conclusions than properly
executed experiments, which allow researchers to be more confident that the intended predictor
variable was the only difference between the experimental conditions. Of course, few studies are
perfectly designed. Poorly executed experiments may provide little more assurance about
causality than a non-experimental study, so one should not accept blindly causal conclusions
simply because a study had researchers control assignment to conditions. The critical question is
whether the assignment to conditions was done in a way that eliminated the differential effects of
other variables on the dependent variable. This can be extremely difficult to achieve, and even
more difficult to verify that it has been achieved.
Although the distinction between experimental and non-experimental predictors is
absolutely critical for what causal inferences one draws from the statistical results, it is also one
factor that is considered when researchers decide on appropriate statistical analyses for different
studies. Another important distinction for statistical purposes is the difference between
numerical and categorical variables.
Numerical and Categorical Variables
Predictor and criterion variables can be either categorical or numerical in nature.
Numerical variables involve the assignment of numbers (i.e., scores) to reflect the amount of
some variable. Examples of numerical variables include number of words recalled on a memory
task, IQ (i.e., a numerical score derived from an intelligence test), scores on some measure of
shyness, number of bar-presses by rats in a Skinner Box, and amount of practice at relaxation in a
therapy study. In each of these examples, the number represents the amount (less to more, or
vice versa) of some psychological quality. That is, the levels of the variable represented by the
numbers are graded in amount and the actual numbers reflect those gradations. Numerical
variables are also called quantitative variables and are sometimes further sub-classified as
ordinal, interval, or ratio. The latter distinctions are not important for statistical purposes and
will be ignored here.
Psychologists also study non-numerical or categorical variables, such as religious
affiliation, gender, psychiatric diagnoses, reasons for breaking off romantic relationships, style of
parent attachment in infants, and different kinds of therapeutic treatments. Such categorical
variables involve qualitative distinctions between the levels of the variable, rather than amount.
A person diagnosed as Schizophrenic, for example, differs in a host of qualitative ways from
someone diagnosed as having an Antisocial Personality Disorder. They do not differ simply on a
single, quantifiable dimension. Similarly, qualitative differences distinguish Roman Catholics,
Anglicans, Baptists, atheists, and people from other religious groups. Such categorical variables
are also called qualitative or nominal variables. Numbers may be used to represent the different
levels of categorical variables, but in this case the numbers are used simply as distinct labels for
the groups, rather than as representing the amount of some quality. That is, the group numbers
are similar to social insurance numbers, telephone numbers, and other cases in which numbers
identify individuals, rather than representing the amount of some quality.
Types of Relationship
A causal relation involves at least one predictor and one criterion variable. Since the
predictor and the criterion variables can each be either categorical or numerical, it is possible to
classify individual relationships as one of four types. Box 1.1 lists the four types of relationship
depending on whether the predictor is numerical or categorical and whether the criterion is
numerical or categorical. Also shown for each of the four types are examples of such
relationships. To illustrate, a study of the relationship between size (or amount) of reward and
rate of bar-pressing would involve a numerical predictor (amount of reward) and a numerical
criterion (rate of bar-pressing).
The distinction between
numerical and categorical variables
is important when deciding how to
analyze data or how to test specific
hypotheses about relations between
independent and dependent
variables. This book focuses on
statistical relations that involve
numerical dependent variables and
categorical predictor variables (i.e.,
the first type in Box 1.1), although
the methods can sometimes be
applied to numerical predictors
(i.e., the second type in Box 1.1).
Examples of relationships between categorical predictors and numerical criterion variables
include: the relationship of gender to performance on spatial ability tests, and the relationship of
treatment conditions to improvement on some measure of psychopathology. Examples of
relationships between numerical predictors and numerical criterion variables include: the
relationship of quantified similarity of attitudes to degree of liking between people, and the
relationship of age to amount remembered from a list of words. The other two types of
relationships (i.e., those involving categorical criterion variables) generally involve other
statistical procedures that are not considered in this book on ANOVA, or its companion on
Multiple Regression and Correlation techniques.

Box 1.1. A Taxonomy of Empirical Relationships.

    Predictor      Criterion      Examples
    Categorical    Numerical      therapy condition -> anxiety score; gender -> imagery performance
    Numerical      Numerical      IQ -> GPA; size of reward -> bar-pressing
    Numerical      Categorical    age -> religious affiliation; years experience -> position in company
    Categorical    Categorical    gender -> psychiatric diagnosis; ethnicity -> type of crime
AN OVERVIEW OF DATA ANALYSIS (STATISTICS)
The fundamental purpose of statistics is to determine the significance and strength of an
observed relationship between the predictor and criterion variables. Significance is concerned
with whether an observed relationship could have occurred by chance or not; for example, would
randomly paired numbers be unlikely to demonstrate the observed degree of relationship?
Strength is concerned with how much variation in the criterion can be accounted for by the
predictor; for example, does similarity account for 10%, 20%, ... of the variation in liking among
people?
The specific analysis would depend on several features of the relationship discussed
previously. Are the predictors categorical or numerical? Are they experimental or non-
experimental? Do the predictors involve a large number of levels or relatively few? If
categorical, do the predictors contain the same number of participants in each level? Answers to
such questions would result in the researchers choosing either correlation and regression methods
of statistical analysis, or analysis of variance methods. These are two closely related “families”
of data analysis, and account for the majority of statistical techniques used in psychology and
probably in the social sciences generally. Since ANOVA involves more restrictive conditions,
let us consider it first.
Analysis of Variance (ANOVA)
ANOVA methods of statistical analysis were designed originally to determine whether
distinct experimental conditions result in differences on some numerical dependent variable.
That is, ANOVA methods were designed specifically for the case of categorical predictors and
numerical criterion variables. In essence, the techniques analyze differences among means on the
numerical criterion variables as a function of varying conditions on the categorical predictor(s).
ANOVA determines whether the differences between condition means are significant and also
how much of the variation on the criterion variable can be attributed to the predictor.
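As a sketch of these two outputs (significance of the mean differences and proportion of criterion
variation accounted for), the following Python example uses SciPy and hypothetical recall scores
for three instruction conditions; the data and the choice of library are illustrative assumptions,
not part of the text.

    import numpy as np
    from scipy import stats

    # Hypothetical numbers of words recalled under three instruction conditions.
    imagery  = np.array([14, 16, 15, 17, 13, 16])
    sentence = np.array([12, 13, 11, 14, 12, 13])
    control  = np.array([ 9, 10,  8, 11, 10,  9])

    # Significance: could differences this large among the means arise by chance?
    f_value, p_value = stats.f_oneway(imagery, sentence, control)

    # Strength: proportion of criterion variation attributable to the predictor (eta squared).
    groups = [imagery, sentence, control]
    all_scores = np.concatenate(groups)
    ss_between = sum(len(g) * (g.mean() - all_scores.mean()) ** 2 for g in groups)
    ss_total = ((all_scores - all_scores.mean()) ** 2).sum()
    eta_squared = ss_between / ss_total

    print(f"F = {f_value:.2f}, p = {p_value:.4f}, eta^2 = {eta_squared:.2f}")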
True experiments of this sort demonstrate a number of characteristics that permit the use
of ANOVA. One condition that is generally true is that the categorical predictor has relatively
few levels to it. For example, amount of reinforcement might have three levels (Low, Medium,
High), treatment condition could have four levels (Control, Placebo Control, Treatment 1,
Treatment 2), and word concreteness in a memory study might have only two levels (Abstract,
Concrete). There are certainly cases in which a single experimental variable has many levels, but
this is not at all the norm for such studies.
A second condition that is often true of experiments is that there are the same number of
observations for each level of the predictor. Because the researcher has control of the assignment
of participants to experimental conditions (or of conditions to participants), experimental researchers will generally
try to assign the same number of participants to different conditions. In a study comparing
Treatment and Control groups, for example, the researcher might randomly assign 20 participants
to each group. Equal numbers can greatly simplify the analysis of the results and allow for the
use of standard ANOVA methods. More sophisticated methods than standard ANOVA may be
necessary to deal with unequal numbers of observations per condition.
Another characteristic of well-designed experiments is that multiple predictors can be
made independent of one another and of many other potential confounds (i.e., variables
correlated with the target predictor in the natural world). Under these conditions, ANOVA can be
used to measure the contribution of the single independent variable of interest, or the effects of
multiple, independent predictors. If intelligence in pigs is studied as a function of the
experimental manipulation of maternal alcohol consumption, for example, then researchers can
examine the effect of alcohol consumption alone with some confidence that alcohol consumption
is independent of other potential confounding variables, or of other predictor variables being
studied (assuming sound procedures were used in assigning consumption levels to pigs).
Furthermore, multiple predictors in an experiment can be made independent of one
another by assigning equal numbers of pigs to each combination of conditions. To illustrate, a
study of both amount of maternal alcohol consumption (Low versus High), and nutritional status
(Poor versus Good) could ensure independence of the two predictors by assigning the same
number of animals to each condition (10 Low+Poor, 10 Low+Good, 10 High+Poor, 10
High+Good). The correlation between alcohol consumption and nutritional status would be 0.00.
If unequal numbers of animals were in each condition (e.g., 15 Low+Poor, 5 Low+Good, 5
High+Poor, 15 High+Good), then the two predictors would be correlated (r = 0.50 given the
numbers just presented) and their effects would be confounded. Standard ANOVA techniques
would have trouble teasing apart the unique effects of such correlated predictors.
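Readers who wish to verify the 0.00 and 0.50 values can do so with a few lines of code; the sketch
below (Python with NumPy, an implementation choice of ours rather than anything prescribed by the
text) scores each predictor 0/1 for every animal and correlates the two predictors under the two
sets of cell counts.

    import numpy as np

    def predictor_correlation(n_low_poor, n_low_good, n_high_poor, n_high_good):
        """Correlation between alcohol (0=Low, 1=High) and nutrition (0=Poor, 1=Good)
        when the four cells contain the given numbers of animals."""
        alcohol, nutrition = [], []
        for alc, nut, n in [(0, 0, n_low_poor), (0, 1, n_low_good),
                            (1, 0, n_high_poor), (1, 1, n_high_good)]:
            alcohol += [alc] * n
            nutrition += [nut] * n
        return np.corrcoef(alcohol, nutrition)[0, 1]

    print(predictor_correlation(10, 10, 10, 10))  # balanced design: 0.00
    print(predictor_correlation(15, 5, 5, 15))    # unbalanced design: 0.50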
Although the preceding qualities (i.e., few levels, equal numbers of observations) are
more characteristic of experimental variables, they are also sometimes true of non-experimental
variables. Many non-experimental studies may involve categorical predictors that have relatively
few levels to them. Consider the following non-experimental predictors: gender has two levels
(Male, Female), religious affiliation could be studied with four levels (Roman Catholic,
Protestant, Jewish, Other), and different psychological diagnoses in a particular study might have
three levels (e.g., Obsessive-Compulsive Disorder, Depression, Anxiety). It may also be possible
to have the same number of observations per level of a single predictor and, albeit with greater
difficulty in many cases, to have the same number of observations in combinations of levels of
different predictors (e.g., equal numbers of males and females in each of four religious groups).
Achieving such equality of numbers and independence of predictors, however, may be
challenging and wasteful of data. The hypothetical study above involving unequal numbers in
four groups (specifically, 15, 5, 5, 15), for example, could be analyzed with equal numbers in the
groups (5, 5, 5, 5), but only by discarding half the data. To avoid such waste, the more general
techniques of correlation and regression may be preferred.
Correlation and Regression Methods
Standard ANOVA techniques become complicated by deviations from the above
conditions. The presence of many levels to a predictor variable (e.g., a numerical, non-
experimental variable, such as intelligence) makes it difficult and less desirable to simply
compare means. The observed values of the predictor may be scattered over a wide range, with
unequal numbers of observations at each level. In many cases, there may only be a single
observation for a particular predictor score. Correlation and regression techniques are very
general and can readily accommodate such varied sets of observations, as well as the more
orderly data characteristic of ANOVA.
Rather than compare differences between means, correlational methods allow researchers
to determine whether increases or decreases on the criterion variable numbers are associated in
any systematic way with the values of the predictor variable numbers. Do high numbers on the
predictor tend to be associated with high numbers on the criterion, and low with low? Such
analyses do not require few levels of the predictor, equal numbers of participants at each level, or
even that the predictor values be independent of other predictors in the study. The statistical
procedures are powerful enough to accommodate such traits.
Researchers planning non-experimental studies may choose to use such correlation and
regression methods to measure the statistical contribution of multiple predictors to the criterion
variable, and to perhaps adjust statistically for correlations among the predictors. In a non-
experimental study of maternal alcohol use on fetal development, for example, researchers might
measure a host of predictor variables possibly confounded with alcohol use (e.g., education level,
smoking, drug use, maternal age). Multiple regression would allow the researchers to determine
the unique contribution of alcohol use over and above these other predictors, thus strengthening
the validity of any causal inferences. Doubts would remain, however, about confounding with
other variables not included in the study.
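A rough sketch of that logic, using made-up data and ordinary least squares in NumPy (any
resemblance to a real fetal-development study is purely illustrative), compares the variance
explained with and without alcohol use in the model; the increase in R-squared estimates its
unique contribution over and above the confounds.

    import numpy as np

    rng = np.random.default_rng(0)
    n = 200

    # Hypothetical predictors: confounds plus maternal alcohol use (standardized scores).
    education = rng.normal(size=n)
    smoking = rng.normal(size=n)
    alcohol = 0.4 * smoking + rng.normal(size=n)   # alcohol correlated with a confound
    outcome = -0.5 * alcohol - 0.3 * education + 0.2 * smoking + rng.normal(size=n)

    def r_squared(predictors, y):
        """R-squared for an ordinary least-squares fit with an intercept."""
        X = np.column_stack([np.ones(len(y))] + list(predictors))
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        resid = y - X @ beta
        return 1 - resid.var() / y.var()

    r2_confounds = r_squared([education, smoking], outcome)
    r2_full = r_squared([education, smoking, alcohol], outcome)
    print(f"Unique contribution of alcohol use: {r2_full - r2_confounds:.3f}")

A dedicated multiple regression routine would report the same increment, along with a
significance test for it.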
Despite this general correspondence between the qualities of predictors and the method of
statistical analysis, there is nothing rigid about the use of correlation and regression for non-
experimental, numerical variables and analysis of variance for categorical, experimental
variables. Although ideally suited to and perhaps even required for non-experimental predictor
variables that are numerical, irregularly distributed, and confounded with other predictors,
correlation and regression methods are general ones that also accommodate the more regular
situation of categorical experimental predictors. Correlation and regression can in fact be used
for experimental variables and may in some cases be required (e.g., if loss of participants results
in unequal numbers of participants and correlated predictors). Analysis of variance may similarly
be appropriate for certain types of non-experimental predictors (e.g., relatively few levels and
equal numbers of participants in various conditions defined by combinations of predictors).
The approach in this book will be to view analysis of variance as a special case of the
more general correlation and regression methods (sometimes referred to as the general linear
model and abbreviated as GLM). That is, ANOVA can be thought of as a version of correlation
and regression analysis that has been simplified because of the highly structured nature of the
predictors.
Some books contrast correlational studies with experimental studies, but in such cases,
correlational is used as a synonym for non-experimental (i.e., no manipulated independent
variable). We will use correlation simply to describe statistical methods ideally suited to
analyzing relations between numerical, non-experimental variables (versus analysis of variance
methods for experimental, categorical predictors), and will use the more precise term "non-
experimental" to describe the underlying research method (versus experimental).
INTRODUCTION TO ANOVA
Analysis of Variance (ANOVA) refers to a family of statistical procedures that can be
used for a wide variety of experimental and non-experimental studies. ANOVA is in fact a
somewhat limited (yet still powerful) variant of multiple regression, as will be demonstrated
later. There are several properties of research designs that permit use of ANOVA rather than
regression.
Properties of ANOVA Predictors
Studies appropriate for ANOVA demonstrate some or all of the following characteristics:
predictors that are categorical (or can be treated as categorical), that involve a relatively small
number of levels (e.g., Male/Female or Low/Medium/High), and that are independent of one
another (or can be made independent in the data being analyzed).
Categorical predictors. In both experimental and non-experimental studies, researchers
are often interested in relationships between a numerical criterion variable and such categorical
predictors as gender, ethnicity, and numerous experimental variables relevant to psychological
theories or applied concerns. The predictor variables are categorical in the sense that different
levels of the variable (e.g., Male, Female) can be thought of as belonging to qualitatively
different categories, rather than differing in the amount of some underlying trait, as in the case of
such numerical or quantitative predictors as intelligence scores or self-ratings of subjective well-
being. Another characteristic of many categorical predictors is that they have a relatively small
number of discrete levels (e.g., 2 to 6 or so), although that is not necessarily the case. The
ANOVA analyses developed for such categorical variables can in some cases also be applied to
numerical predictors, especially ones that involve equal numbers of participants at each of
relatively few and well-spaced levels (e.g., Young, Middle-Age, Older). Treating the numerical
predictor as categorical might facilitate, for example, the examination of an expected non-linear
relationship with a criterion variable(s). Although the ordering of such predictors would be
ignored in the initial, omnibus analysis of differences among the groups, order could be
considered to good effect in more specific follow-up analyses.
Independent predictors. Another characteristic of ANOVA designs, especially those
involving experimentally manipulated variables, is that multiple predictors can be made
independent of one another (i.e., multiple predictors are uncorrelated), thereby avoiding the
causal ambiguity that plagues non-experimental predictors. Randomly assigning participants in
equal numbers to various combinations of conditions ensures that the predictors are independent.
Independence can even be achieved in non-experimental studies if the number of participants of
various types can be controlled by the researcher. Age and Gender, for example, would be
independent if the researcher obtained equal numbers of Males and Females at each of three
levels of Age (Young, Middle-Age, Older). In non-experimental studies that do not control the
frequency of various combinations of attributes, predictors will generally be correlated (e.g.,
more older females than older males, parent income and parent education level), and multiple
regression will be the preferred method of analysis.
Differences among means and ANOVA. The basic question in studies with one or more
categorical predictors is whether mean scores on the numerical criterion or dependent variable
differ as a function of the group to which the observation belongs. ANOVA, for example, could
be used to determine whether mean depression scores vary between treatment and control
subjects, whether children with different attachment styles in infancy grow up to differ on the
personality trait of independence, and whether the mean number of words recalled varies among
subjects who learned the material using imagery mnemonics, sentence mnemonics, or no
mnemonics. Or, to illustrate with two non-experimental variables, it could be used to determine
whether memory varies as a function of Age (Young, Mid, Older) and Gender (Male, Female).
Although correlation and regression procedures can be used to analyze relationships
between numerical and categorical variables, special procedures collectively referred to as
Analysis of Variance (or ANOVA) have been developed to analyze studies in which a single
numerical criterion variable is related to one or more uncorrelated and categorical predictor
variables. We will demonstrate in subsequent chapters that ANOVA is a restricted variant of
regression analysis. More limited in some respects than regression, ANOVA is still a very
powerful statistical technique that can detect both simple and complex relationships in many
research designs. And ANOVA is more readily suited than regression for some purposes (e.g.,
analyses of factors that involve interactions or related observations at different levels of one or
more predictors).
The term “ANOVA” actually refers to a family of statistical procedures that involve some
common and some distinct elements. The research designs to which ANOVA can be applied
differ in a variety of ways that determine specific features of the analysis. Briefly stated, there
can be one or more predictors in the study, each predictor can have anywhere from two levels
(e.g., gender, treatment versus control groups) to multiple levels (e.g., different religions, effects
of several different drugs), and each predictor can involve related or unrelated observations at
each of its levels (e.g., independent subjects or the same subjects in different conditions). In all
these cases, the researchers are interested in possible differences among the means of the various
groups. We will see that the computations on these means will be identical across different
ANOVA designs. The estimate of the error variability used for hypothesis testing, however, will
depend somewhat on the specific design of the study.
Overview of ANOVA Designs
ANOVA terminology generally refers to predictor variables as Factors, the term that we
will adopt. Occasionally we will refer to them as predictors or independent variables (when
appropriate, as in a true experimental design). The ANOVA analysis appropriate for a specific
experimental design depends on the number of factors included in the study and the type of
factors, specifically whether between-subjects or within-subjects.
Between-Subject and Within-Subject factors. One important consideration in ANOVA is
whether the observations at the different levels of the factor(s) are related or not; that is, whether
researchers have reason to expect scores in one condition to correlate with scores in other
conditions.
Observations in different conditions are unrelated when there is no meaningful
connection between individual observations in the different conditions. If subjects were
randomly assigned to two or more groups, there would be no expectation of a correlation or
relationship between the specific scores in any of the groups. Because position within a group is
arbitrary, the first and ninth people in Group 1 would not be expected to stand in the same relation
to one another as the first and ninth people in Group 2 (i.e., if person 1 in group 1 had a higher score than person
9 in group 1, there would be no reason to infer anything from this about the relative standing of
persons 1 and 9 in group 2). To illustrate, if a randomly selected half of participants studied a list
of abstract words and the other half studied a list of concrete words, the recall scores in the
different groups would be uncorrelated except by chance. Scores at different levels of a non-
experimental variable will be similarly uncorrelated if the various groups are selected
independently of one another (e.g., 50 men and 50 women chosen completely independently).
Individual scores would be uncorrelated because there is no systematic relationship between
specific participants in the different conditions. Such factors are called Between-Subjects (or
Between-S) factors. Between-S factors are also referred to as Independent Groups or Completely
Randomized factors.
Correlated or related observations can occur for several reasons. One common way is if
the same participants are observed in different conditions. For example, the number of concrete
and abstract words recalled by participants who studied a list containing both kinds of words
would be expected to produce correlated scores (i.e., the relative rank of people’s concrete word
recall would be similar to their rank for abstract word recall). A correlation is expected because
many individual factors (e.g., fatigue, intelligence, motivation) will influence memory for both
kinds of words, even though one type of word might be recalled better (i.e., lead to higher mean
recall), which is the question addressed by ANOVA.
Somewhat more difficult to appreciate is that related observations can also occur with
different subjects in different conditions. If specific individuals in one group match specific
individuals in another group on variables relevant to the dependent variable, then a correlation
would be expected. If one member from each of several twin pairs, for example, was assigned to
a control group and the other member of each pair to a treatment group, then we would expect
their scores on the criterion variable to be similar, at least to the extent that genetic factors and
common upbringing were associated with the criterion variable. A correlation would also be
expected if researchers matched participants prior to the experiment on some relevant variable
and then assigned matched subjects to different conditions. For example, clinical researchers
might pre-test participants on anxiety and then assign people to treatment or control conditions in
pairs; one treatment subject and one control subject would come from the highest scoring pair,
the next highest scoring pair, and so on down to the lowest scoring pair. The researchers would
expect post-test scores to be correlated, despite a lower average anxiety score for the treatment
group.
We will refer to predictor variables that involve related observations as Within-Subjects
(or Within-S) factors. Other terms for Within-S factors include Repeated Measures or
Randomized Blocks factors. Although the term Within-S is not completely appropriate because
matching can produce Within-S factors for variables that involve different subjects, this
terminology fits well with how statistical packages refer to such factors. You may be familiar
with this distinction between related and independent observations, although perhaps not the
terminology, from your previous study of the t-test. The Independent t-test is used for unrelated
observations (what we are calling Between-S factors) and the Paired Difference t-test is used for
related observations (what we are calling Within-S factors).
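In software terms, the distinction might look like the following Python/SciPy sketch, with
hypothetical recall scores; the data and the library are our assumptions, not the text's.

    import numpy as np
    from scipy import stats

    # Between-S: different participants studied concrete versus abstract word lists.
    concrete_group = np.array([14, 16, 15, 17, 13, 16])
    abstract_group = np.array([11, 13, 12, 14, 12, 11])
    print(stats.ttest_ind(concrete_group, abstract_group))   # independent-groups t-test

    # Within-S: the same participants studied a list containing both kinds of words,
    # so each position in the two arrays comes from the same person.
    concrete_recall = np.array([8, 9, 7, 10, 8, 9])
    abstract_recall = np.array([6, 8, 6,  9, 7, 7])
    print(stats.ttest_rel(concrete_recall, abstract_recall))  # paired-difference t-test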
Single-factor and factorial designs. ANOVA designs also vary in the number of factors
in a study. Single-Factor studies involve just one predictor variable (e.g., Gender, or Age, or
Treatment Condition, ...). Because the one factor can be either a Within-S or Between-S factor,
there are two types of single-factor design and two corresponding types of ANOVA, with slightly
different calculations for Within-S and Between-S designs. Single-factor designs also vary in
terms of the number of levels of the single factor: such variables as Gender have two levels
(Male, Female), whereas other variables could have three or more levels. This has implications
for follow-up analyses, although not for the omnibus ANOVA.
Factorial studies involve two
or more predictors, each with a
certain number of levels. Consider
first the case of two predictors. A
factorial study with two predictors
would include all combinations of the two factors, say two levels of Gender and three levels of
Age. Box 1.2 shows the possible number of participants in each cell of such a factorial study. A
cell or treatment condition is defined by the combination of levels of two or more factors (e.g.,
Young Males is a cell or condition). Box 1.2 shows 10 participants in each of the 6 cells defined
by the combination of each level of Age and each level of Gender (i.e., 10 Young Males, 10
Young Females, 10 Mid Males, and so on).

Box 1.2. N per Cell of a 2 × 3 Factorial Design.

                        Age
    Gender     Young    Mid    Older
    Male         10      10      10
    Female       10      10      10
Factorial designs are often described in terms of the number of levels of each factor. The
design in Box 1.2, for example, would be a 2 × 3 Factorial. The total number of cells in a two-
factor design is the number of levels of one factor times the number of levels of the other factor
(e.g., 2 × 3 = 6), and the total number of observations in the entire study will be the number per
cell times the number of cells (10 × 6 = 60, in the preceding example). Although the number of
observations per cell can vary, there is much benefit to keeping the number equal. Most
importantly, this ensures that the two factors are uncorrelated and allows standard ANOVA
procedures to be used.
Details of the ANOVA analysis for a two-factor design would depend on whether both
factors were Between-S, one factor was Within-S and the other Between-S, or both factors were
Within-S. One possible design would involve both Gender and Age in Box 1.2 being Between-
S; that is, the required number of participants of a specific age and gender would be solicited
without consideration of the other conditions. It would also be possible, however, for one of the
factors, either Age or Gender, to be Within-S. Age would be Within-S if the same people were
assessed longitudinally at different ages; Gender would be Within-S if both members of Male-
Female twin pairs participated in the study. Or both Age and Gender would be Within-S if both
members of Male-Female twin pairs were observed longitudinally at three points in their lives.
These four different designs imply three somewhat distinct Factorial ANOVAs: (1) two
Between-S factors, (2) one Within-S factor plus one Between-S factor, or (3) two Within-S
factors. As we shall see, the analyses performed in these three distinct cases are different in
important ways, although they also share some commonalities. The differences among means
would be calculated in the same manner, but the calculation of the error terms to test significance
would vary.
Factorial designs can involve more than two factors, resulting in three, four, or even
higher order factorial designs. A 2 × 3 × 2 design, for example, is a three-factor design that
would have 12 distinct conditions defined by all possible combinations of the three factors. In
these more complex designs, each of the factors can be either Between-S or Within-S, which
produces an increasing number of distinct designs and ANOVAs.
Fortunately, the principles that apply to two factor designs generalize readily to higher
order designs, and the calculations for the differences among the means are identical for the
different designs. We will therefore focus primarily on understanding the single factor designs
and the various two factor designs, and then illustrate in the final chapters how the principles can
be extended to higher-order designs. We start by considering designs in which observations in
the different groups are independent of one another; that is, the factors are Between-S variables.
Later chapters extend the ideas presented here to Within-S designs and designs that involve a
mix of Between-S and Within-S factors. But first, we need to introduce the F-statistic itself.
Introduction to the F-Statistic
The F-statistic can be used to test many different kinds of research hypotheses. Although
nominally a test of the significance of the difference between two variances, the F-statistic can
also test the significance of the difference between two or more means (classic Analysis of
Variance or ANOVA), and the significance of regression analyses involving one or more
predictors.
The probability distribution for a test statistic (e.g., t, F) is called a sampling distribution.
The sampling distribution of F presents the probability of observing an F-value greater than or
equal to some value, making certain assumptions, such as that variances are equal, that
differences among means are due only to chance (ANOVA), or that there is no correlation
between a criterion variable and its predictor(s). Selected probabilities for the F distribution are
presented in Appendix A.3; these are the F values corresponding to the probabilities most
commonly needed for hypothesis testing (e.g., .10, .05, .01).
Testing differences between variances. At its most basic, the F-statistic is the ratio of
two variances (i.e., F = s₁² / s₂²), and is used to determine whether or not the two variances are
sufficiently different to reject the null hypothesis that their population variances are equal (i.e.,
H0: σ₁² = σ₂²). If F is sufficiently large, then reject the null hypothesis. Although in theory one
could also determine whether F is sufficiently small, the common practice is to put the larger
variance (or the variance expected to be larger) in the numerator, which ensures that F is always
greater than or equal to 1.
Given FObserved, we can then use the sampling distribution of F in Appendix A.3 to
determine a critical value for F, which is how large FObserved must be to reject the null hypothesis
of no difference. The critical value of F depends on the degrees of freedom for the numerator of
F and the degrees of freedom for the denominator. For the ratio of two sample variances,
dfNumerator = n1 - 1 and dfDenominator = n2 - 1, where n1 and n2 represent the number of observations in
the numerator and denominator samples, respectively. Table A.3 shows the probability of F ≥ Fα
(the tabled number) when the H0 is true (i.e., σ1² = σ2²). Fα indicates the specific value of F such
that p(F ≥ Fα) = α, where α is the area (or probability) to the right of Fα. For example, when two
samples of six subjects are selected from the same population (i.e., df = 5, 5) and α = .05,
Fα = 5.05. That is, the table indicates that p(F ≥ 5.05) = .05 for df = 5, 5. Assuming that samples
are selected from the same population is equivalent to assuming that H0: σ1² = σ2² is true, because
each variance would equal the common variance in the single population from which they were
selected.
The sampling distribution of F is
different from the sampling distribution for t
or other statistical distributions that you might
know. Figure 1.2 shows an approximation of the F distribution. The horizontal axis represents
values of F from its minimum to its maximum, and the area defined by the vertical axis indicates
the probability of F taking on different values. Because variances cannot be negative, F must be
positive. Therefore, F ranges from a minimum of 0 (when the numerator is 0) to a maximum of
infinity (when the denominator is 0). Unlike t, the distribution of F is also asymmetrical. The
shape varies with the degrees of freedom, but generally there is a larger area (a hump) to the left
and a longer tail section to the right, as illustrated in Figure 1.2.
As shown in Figure 1.2, the probabilities in Appendix A.3 indicate an area (labelled α in
the Figure) above F values of a specified size. The area represents the probability that F is greater
than or equal to the specified value, Fα. In hypothesis testing, researchers decide how small the
probability must be before the null hypothesis would be rejected (e.g., .10, .05, .01, ...). This is
the alpha level (α), or probability of a Type I error (rejecting a true null hypothesis). Researchers
then find the critical value corresponding to the specified alpha and df. We will see in Chapter 2
that SPSS and other statistical programs also allow researchers to work with observed
probabilities, in addition to critical values.
We illustrate the use of the F-statistic to test the equality of two variances using data from
a study analyzed more fully in Chapter 2. The study involved scores for 6 participants in each of
two groups, s1 = 2.000 and s2 = 1.789. One way to test whether the group variances differ is to
perform an F test of the variances; that is, FObserved = 2.000² / 1.789² = 1.25. If we use α = .05,
then the critical value of F is F(.05; 5, 5) = 5.05. Because the observed F of 1.25 is not greater
than or equal to the critical value for F of 5.05, we would not reject H0: σ1² = σ2². By using a
critical value of 5.05, researchers ensure that, if the H0 is true, then FObserved will lead to rejection
of the (true) H0 only 5% of the time. It is critical to note that this F test of the ratio of two sample
variances is different than the F for testing differences between means.
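As a cross-check on the hand calculation, here is a minimal Python sketch of the same variance-ratio
test (added for illustration and assuming scipy; the standard deviations 2.000 and 1.789 and the
group sizes of 6 are taken from the text).

    # F test of H0: the two population variances are equal.
    from scipy import stats

    s1, s2, n1, n2 = 2.000, 1.789, 6, 6
    f_obs = s1**2 / s2**2                          # larger variance in the numerator: 1.25
    f_crit = stats.f.ppf(1 - .05, n1 - 1, n2 - 1)  # 5.05 for df = 5, 5
    print(f"F observed = {f_obs:.2f}, F critical = {f_crit:.2f}")
    print("Reject H0" if f_obs >= f_crit else "Do not reject H0")   # Do not reject H0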
The F-test and hypotheses about a single mean. The F statistic can be adapted to test
various hypotheses about means. Our discussion of differences between means begins in Chapter
2. Here we demonstrate the use of F to test hypotheses about a single mean. Consider the
situation in which a sample of 16 university students is administered an intelligence test (see Box
1.4 for the data), and obtains an average score of 108.75, with a standard deviation of 17.9536.
Does this sample provide evidence that the IQ of university students differs from the average for
the general population, which is known to be 100.0?
This test is typically done using a t test. From the Central Limit Theorem, the Standard
Error of a sample mean is known to be approximately the standard deviation of the sample divided
by the square root of the number of observations in the sample. That is, SEȳ ≈ s/√n =
17.9536/√16 = 4.4884 in our particular case. The distance of the observed mean from the
hypothesized value divided by SEȳ is distributed as a t statistic and can be compared to critical
values for df = n - 1 = 16 - 1 = 15 for our problem. That is, tObserved = (108.75 - 100.0) / 4.4884 =
1.94947. The critical value of t for a two-tailed test with df = 15 and α = .05 is 2.131 (from
Appendix A.2). Therefore, the researchers would not reject the null hypothesis that µ = 100.0 for
the population of university students.
Doing this as an F test is straightforward. For the denominator, the variance of the
sample, s² = 17.9536² = 322.3318, represents random variation or noise. The variation is random
because we do not know why individuals in the group have different IQs. For the numerator, the
squared deviation of the observed mean (108.75) from the hypothesized mean (100.0), weighted
by the number of observations (more on this later), provides a variance that reflects how much
the two means differ. That is, the Numerator = 16 × (108.75 - 100.0)² = 1225.00. This quantity
reflects how far the sample mean is from the hypothesized value. If the sample mean was
closer to 100.00, then the Numerator would be smaller than 1225.00, and if the sample mean was
further away from 100.00, then the Numerator would be larger than 1225.00. The Numerator is
divided by the Denominator to give FObserved = 1225.0 / 322.3318 = 3.80043, with df = 1, 15. The
critical value of F with α = .05 is 4.54, and we fail to reject H0: µ = 100.0.
Observe that the t and F tests are equivalent: FObserved = 3.80043 = 1.94947² = tObserved², and
FCritical = 4.54 = 2.131² = tCritical². Therefore, whenever t is significant, so is F, and vice versa, and
IQ:   85  117  124  115   82  119  134  106
     111  111  145  116   98   96   99   82

Mean  108.75
SS    4835.00
SD    17.9536

Box 1.4. Sample of 16 IQs.
whenever one test is not significant, neither is the other. This equality holds true for hypotheses about
a single mean, as here, and for hypotheses about the differences between two means, as covered
in the following chapter.
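The single-mean t test and its F equivalent can be verified with a few lines of Python (again an
added sketch, assuming scipy; the 16 IQ scores are those listed in Box 1.4).

    # One-sample t test of H0: mu = 100 for the Box 1.4 IQ data, and F = t squared.
    from scipy import stats

    iq = [85, 117, 124, 115, 82, 119, 134, 106, 111, 111, 145, 116, 98, 96, 99, 82]
    t_obs, p_two_tailed = stats.ttest_1samp(iq, popmean=100.0)
    print(f"t = {t_obs:.3f}, two-tailed p = {p_two_tailed:.3f}")   # t = 1.949, p = .070
    print(f"t squared = {t_obs**2:.3f}")                           # 3.800, the F computed above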
SPSS and Hypotheses about Single Means
There are a number of ways to perform the preceding analysis in SPSS. Box 1.5 shows
the SPSS commands to enter the data and then conduct a single sample t-test. The results agree
with our earlier calculations, and we learn the additional information that the p for the observed t
is .07, which means that p(t ≥ 1.949 or t ≤ -1.949 | H0 true) = .07.
ANOVA is not often used to test a single mean, so the various SPSS ANOVA commands
do not provide a mechanism for specifying 100 as the null hypothesis. We can, however, test
DATA LIST FREE / subj iq.
BEGIN DATA
 1  85    2 117    3 124    4 115    5  82    6 119    7 134    8 106
 9 111   10 111   11 145   12 116   13  98   14  96   15  99   16  82
END DATA.

TTEST /TESTVAL = 100 /VARIABLE = iq.

        N       Mean         Std. Deviation    Std. Error Mean
 IQ    16    108.750000       17.9536440          4.4884110

 Test Value = 100
        t     df   Sig. (2-tailed)   Mean Difference   95% Conf Int of Diff
                                                         Lower       Upper
 IQ   1.949   15        .070            8.750000       -.816822   18.316822

Box 1.5. SPSS Data Entry and T-test Commands.
COMPUTE iq2 = iq - 100.
MANOVA iq2 /PRINT = CELL.

 Variable .. IQ2
                         Mean    Std. Dev.    N
 For entire sample      8.750      17.954    16

 Tests of Significance for IQ2 using UNIQUE sums of squares
 Source of Variation        SS    DF        MS      F   Sig of F
 WITHIN CELLS          4835.00    15    322.33
 CONSTANT              1225.00     1   1225.00   3.80       .070

Box 1.6. F-test for Single Sample Hypothesis.
whether the mean differs from 0. If we first subtract 100 from the scores, testing whether the
mean of the scores minus 100 differs from 0 is equivalent to testing whether the mean of 108.75
differs from 100. Box 1.6 shows the relevant commands. The COMPUTE statement subtracts
100 from each score, and the MANOVA command with a single score and no factors tests
whether the mean of the scores differs from 0. Note that the p value of .07 is identical to that for
t, that F = t², and that the SSs for the numerator (CONSTANT in Box 1.6) and denominator
(WITHIN CELLS in Box 1.6) agree with our earlier calculations. We again fail to reject H0: µ =
100.
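The same shift-by-100 logic used in Box 1.6 can be mimicked in Python (an added sketch, assuming
scipy): subtracting 100 from every score and testing the result against 0 yields exactly the same t
and p as testing the raw scores against 100.

    from scipy import stats

    iq = [85, 117, 124, 115, 82, 119, 134, 106, 111, 111, 145, 116, 98, 96, 99, 82]
    t_direct, p_direct = stats.ttest_1samp(iq, popmean=100.0)
    t_shift, p_shift = stats.ttest_1samp([y - 100 for y in iq], popmean=0.0)
    print(f"direct:  t = {t_direct:.3f}, p = {p_direct:.3f}")   # t = 1.949, p = .070
    print(f"shifted: t = {t_shift:.3f}, p = {p_shift:.3f}")     # identical results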
CONCLUSION
To summarize, psychologists obtain empirical data relevant to theoretical constructs and
their causes by translating abstract constructs into concrete, observable variables (e.g., a score on
a standardized IQ test, amount of time looking into another person's eyes, imagery ratings for
words, number of sad words checked off a list of emotion terms), and determining whether the
observed relationships between the variables are consistent with theoretical expectations. If the
results are not consistent with the theoretical predictions, then the empirical study needs to be
reviewed for flaws, or perhaps the theory needs to be rejected or revised. In testing hypotheses,
researchers measure an outcome or criterion variable (i.e., the dependent variable), and either
measure or manipulate directly a predictor variable (or independent variable). Predictor and
criterion variables can also be classified as categorical or numerical. The levels of numerical
variables are ordered in some quantitative manner, whereas the levels of categorical variables
simply represent differences rather than amounts of some quality.
ANOVA can be used to analyze the results of studies that involve numerical criterion
variables and predictors that demonstrate most of the following properties: usually categorical
with relatively few and regular levels, the same or similar number of observations per category,
and independence of one predictor from other predictors (and from extraneous variables if
experimental in nature). Such properties are most easily achieved with experimental factors. In
the absence of such predictors, the more general correlation and regression methods are
preferred. Correlation and regression can accommodate not only ANOVA-type predictors, but
also predictors that involve scattered and irregular numerical values (i.e., numerous levels and
unequal numbers of participants) and are correlated with other extraneous variables because they
are non-experimental. Some extraneous variables may be measured, which allows statistical
testing of the unique contribution of each measured predictor independent of the others.
ANOVA refers to a family of statistical techniques that can be used for single-factor or factorial
studies, with one or more of the factors being either between-S or within-S factors. We begin in
the next chapter with a single categorical variable involving uncorrelated observations.
CHAPTER 02:
SINGLE-FACTOR BETWEEN-S DESIGN FOR K = 2
Single-Factor Between-S Design (k = 2) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
T-test Approach to Two-group Study . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
ANOVA Approach to Two-group Study . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
Equivalence of F and t for Two Groups . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
ETA² (η²) and Strength of Relationship . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
SPSS Analyses for the Two-Group Between-S ANOVA . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
SPSS TTEST Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
SPSS ONEWAY . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
Other SPSS ANOVA Commands . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
                 Agitation Scores
        Control (1)        Treatment (2)
            11                   7
            14                  12
            12                  11
             9                   9
            13                  12
            13                   9

        ȳ1 = 12.0            ȳ2 = 10.0
        SS1 = 16.0           SS2 = 20.0

  s²p = (16 + 20) / (6 + 6 - 2) = 36 / 10 = 3.6   [= MSError]
  sp = √3.6 = 1.897

  H0: µ1 - µ2 = 0     Ha: µ1 - µ2 ≠ 0
  df = 10     t.05,10 = 2.228   [= √F.05;1,10]

  t = (12 - 10) / (1.897 √(1/6 + 1/6)) = 2 / 1.095 = 1.83   [= √F]

Box 2.1. T-test for Independent Groups.
This chapter introduces the Single-Factor Between-S design for two groups (i.e., k = 2,
where k stands for the number of groups). We will see that both the t-test and ANOVA can
handle the case of two independent groups. Some of this material will be review from earlier
statistics courses. We will then examine various ways in which SPSS can perform this analysis:
t-test, ANOVA, and regression. A general notation for ANOVA will be introduced in Chapter 3,
and the methods extended to more than two groups (i.e., k ≥ 3).
SINGLE-FACTOR BETWEEN-S DESIGN (K = 2)
The simplest Between-S study is one in which two groups are compared on some
criterion variable. Either t-tests or ANOVA can be used to test the difference between the means
of two groups, with ANOVA leading to exactly the same conclusion as a t-test. Specifically, we
will see that t²Observed = FObserved and t²α = Fα. This means that if t is significant (i.e., tObserved ≥ tα),
then F will be as well (i.e., FObserved ≥ Fα). The ANOVA, however, is more general than the t-test
and can be used even when there are more than two levels of a factor, or when there are multiple
factors (e.g., the separate and joint effects of both Gender and Age). Thus, in more complex
designs, ANOVA will provide
information that the t-test
cannot.
The analyses will be
illustrated first with the small
dataset shown in Box 2.1.
Agitation scores were obtained
on two groups (n1 = 6, n2 = 6).
Group 2, the treatment group,
had completed a course in
relaxation methods, whereas
the Group 1, the control group,
had not received such training.
Participants were randomly
assigned to the two groups;
$$H_0\!:\ \mu_1 - \mu_2 = 0 \qquad H_a\!:\ \mu_1 - \mu_2 \neq 0,\ \ \mu_1 - \mu_2 > 0,\ \ \text{or}\ \ \mu_1 - \mu_2 < 0$$

$$t = \frac{(\bar{y}_1 - \bar{y}_2) - 0}{s_p\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}} \qquad
s_p^2 = \frac{SS_1 + SS_2}{n_1 + n_2 - 2} \qquad df = n_1 + n_2 - 2$$

Equation 2.1. Difference Between Independent Means.
hence, the observations in the groups are independent and we expect no correlation between the
two sets of scores. The data will be analyzed in several different ways, all of which lead to the
same statistical conclusions. Equalities shown in square brackets ([]) in Box 2.1 and later boxes
demonstrate some of the many parallels between the different analyses and will be explained at
appropriate times. We begin with the independent groups t-test.
T-test Approach to Two-group Study
Equation 2.1 shows the formula for one approach to the independent groups t-test. The H0 of
no difference between the means is contrasted against one of three alternative hypotheses shown
in Equation 2.1 (a two-tailed “not equal” alternative, or one of two one-tailed alternatives).
Equation 2.1 shows a common variant of the independent groups t-test in which the SSs and dfs
for groups 1 and 2 are pooled to create a pooled variance, sp2. The pooled variance appears in the
denominator of the t-test (i.e., in the computation of the SE for the difference between
independent means). The tObserved is compared to tCritical for df = n1 + n2 - 2.
Box 2.1 shows the calculation of this independent groups t-test for our hypothetical data.
The test will determine whether the mean agitation scores observed for the control group (ȳ1 =
12.0) and the treatment group (ȳ2 = 10.0) are far enough apart that it is unlikely (but not
impossible) that this difference occurred by chance. If the difference is large enough to be
deemed significant, then researchers will conclude that a real difference exists between the two
means for the populations from which the two groups were selected. That is, researchers will
reject the null hypothesis µ1 = µ2, or equivalently, µ1 - µ2 = 0.
The calculations shown in Box 2.1 are those represented by Equation 2.1. The SSs within
the two groups and their corresponding dfs are pooled together to create a pooled estimate of the
population variance: sp² = (SS1 + SS2) / (df1 + df2) = 3.6. This pooled estimate is used to
calculate the standard deviation of the difference between the means (i.e., the Standard Error):
SEȳ1-ȳ2 = sp × √(1/n1 + 1/n2) = 1.095. The SE of the difference indicates how much
variability in the means would be expected just by chance given the amount of random variation
observed within each of the two groups and the sample sizes.
The difference between the means is divided by its SE to produce the observed value of t.
This observed value is then compared to the critical value of t needed to reject the null hypothesis
of equal population means for groups one and two. The df for the critical value of t is (n1 - 1) +
(n2 - 1) = n1 + n2 - 2 = 10, the sum of the dfs for the two groups and the denominator for s2p. In
the present example, the observed t of 1.83 is not greater than the two-tailed critical value of
2.228 for α = .05. Therefore, the researchers cannot reject the H0, and cannot conclude that the
population means differ from one another.
The preceding analysis assumes a two-tailed alternative hypothesis (i.e., µ1 ≠ µ2). For a
one-tailed test in which the researchers have a priori grounds to ignore the possibility that group
2 might have a higher score than group 1, tα = 1.812, and the null hypothesis would be rejected in
favor of Ha: µ1 - µ2 > 0 (equivalently, µ1 > µ2).
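A pooled-variance t test equivalent to the hand calculation in Box 2.1 can be run in Python as well
(added for illustration, assuming scipy; the agitation scores are those listed in the box).

    # Independent-groups (pooled-variance) t test for the agitation data.
    from scipy import stats

    control = [11, 14, 12, 9, 13, 13]
    treatment = [7, 12, 11, 9, 12, 9]
    t_obs, p_two_tailed = stats.ttest_ind(control, treatment, equal_var=True)
    print(f"t = {t_obs:.2f}, two-tailed p = {p_two_tailed:.3f}")   # t = 1.83, p = .098
    print(f"one-tailed p = {p_two_tailed / 2:.3f}")                # .049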
ANOVA Approach to Two-group Study
The t-test is designed for comparisons of two groups, but many studies involve
comparisons among more than two groups. Educational researchers, for example, might
compare two or more experimental reading programs relative to a standard or control technique.
Social psychologists might want to compare the effectiveness of multiple conditions for changing
people’s attitudes (e.g., Control, Layperson, Expert). Neuroscientists might compare the effects
of several different drugs versus placebo conditions. Clinical psychologists may similarly
compare three different treatments for a particular disorder. Cognitive psychologists may contrast
theoretical predictions about reading differences for low, medium, and high frequency words.
Such designs require a statistical technique that can assess the significance of differences among
more than two means. ANOVA is just such a technique.
The trick to using F to test the differences between the means is to calculate a numerator
variance that reflects the variability among the means and a denominator variance that reflects
random variability within the groups (i.e., variation due to chance or noise). The denominator for
      Control                 Treatment
   y1      y - ȳ1          y2      y - ȳ2
   11        -1              7       -3
   14        +2             12       +2
   12         0             11       +1
    9        -3              9       -1
   13        +1             12       +2
   13        +1              9       -1

  ȳ1 = 12.0      ȳ2 = 10.0      ȳG = 11.0
  SS1 = 16.0     SS2 = 20.0
  ȳ1 - ȳG = +1.0      ȳ2 - ȳG = -1.0

  SSTotal = 48.0 = ΣΣ(y - ȳG)² = SSTreatment + SSError
          = (11 - 11.0)² + (14 - 11.0)² + ... + (9 - 11.0)²
  SSError = 36.0 = SS1 + SS2 = Σ(y - ȳ1)² + Σ(y - ȳ2)² = 16.0 + 20.0
  SSTreatment = 12.0 = SSTotal - SSError = 48.0 - 36.0
             = n1(ȳ1 - ȳG)² + n2(ȳ2 - ȳG)² = 6 × 1.0² + 6 × (-1.0)²

Box 2.2. ANOVA for Two-group Comparison.
two groups is simply the pooled variance used in the t-test, sp2. With more than two groups, the
SSs and dfs would be pooled for all of the groups to produce a pooled variance: that is, sp2 = (SS1
+ SS2 + SS3 ... ) / (df1 + df2 + df3 ...). Another name for variance is Mean Square (MS). In
ANOVA this denominator is often called MSError (or MSWithin), and the corresponding SS is termed
SSError.
SSTreatment by subtraction. There are several ways to determine the numerator variance,
which will represent the treatment effect. Perhaps the simplest is to consider the total variability
in all of the scores (i.e., SSTotal) as consisting of the random or error variation just described (i.e.,
SSError) plus the treatment variation; that is, SSTotal = SSError + SSTreatment. Thus, SSTreatment can be
calculated by subtraction; SSTreatment = SSTotal - SSError. Dividing SSTreatment by its appropriate df
would give a variance that reflects differences among the means. Box 2.2 presents the data for
our example problem and the calculations relevant to ANOVA.
First, consider the total amount of variability in all 12 scores. The total variability is
measured by calculating the squared deviations of the scores from the mean of all the scores,
which is called the Grand Mean (y&G = 11.0) to distinguish it from the means for the separate
groups. This measure of total variability is called SSTotal (or simply SSy), just as in regression.
SSTotal equals 48.0 units in our example. Note that y&G and SSTotal measure the average and
variability that would result if we completely ignored the Group variable (or factor) and treated
the 12 scores as coming from a single sample.
SSError is calculated by summing the SSs within each group across all of the groups, the
identical operation used to calculate the pooled variance. Each group's SS represents the sum of
the squared deviations of each observation about its group mean; these deviations from the group
means are shown in Box 2.2. The SS for group 1 (16.0 units) and the SS for group 2 (20.0 units)
are summed to obtain an SSError equal to 36.0. This SSError or SSWithin reflects the variability within
each of the groups being compared. Summing SS within groups to calculate the SS for the
pooled estimate of the variance for the independent groups t-test is extended in ANOVA to
multiple groups. That is, the SSs could be summed across many groups, not just two; for
example, if we had 4 groups, then SSError would be the four SSs within each group summed over
all four groups.
SSError is less than SSTotal, which indicates that variability within groups does not account
for all of the variation in the scores. If the remaining SS is not due to variability within groups
(as measured by SSError), then it must reflect variability between groups; that is, differences
among the treatment means. In our example, SSTotal - SSError = 12.0 units, suggesting that 12.0
units of variability are accounted for by our treatment variable; that is, by differences between the
groups. This is SSTreatment (or SSBetween). If the two means were identical, then there would only be
variability within the groups, and SSError would equal SSTotal and SSTreatment would equal 0.
Direct calculation of SSTreatment. Although SSTreatment can be obtained by subtracting SSError
from SSTotal, as we have just done, it can also be calculated independently. Specifically, SSTreatment
can be calculated from the deviations of the group means (y&1 and y&2 ) from the Grand Mean
(y&G). This variability of treatment means about the grand mean represents the difference
between SSTotal and SSError. Calculating SSError, SSTreatment, and SSTotal independently demonstrates
directly that independent groups ANOVA divides or partitions the total variability in the
observations (SSTotal) into two distinct components, one of which represents variability within
groups (SSError) and the other variability between groups (SSTreatment). It also confirms that the
computations were done correctly.
The direct calculation of SSTreatment involves first computing the deviation of each
treatment mean from the grand mean; that is, ȳ1 - ȳG = 12.0 - 11.0 = +1.0, ȳ2 - ȳG = 10.0 - 11.0
= -1.0 (and so on, if there were more than two groups). Each deviation is referred to as the
treatment effect for that group. If all group means were identical (i.e., there were absolutely no
differences among group means), then the group means would all equal the grand mean and all
treatment effects would be zero. The only source of variation in the scores would be variability
within groups or error. The larger the treatment effects are (i.e., the larger the deviations of
group means from the grand mean), then the greater the variability among the group means (i.e.,
the larger their differences from one another and from the grand mean).
To obtain SSTreatment, square the deviations of the treatment means from the grand mean
(i.e., square the effects), multiply by the number of subjects in each group (n1 and n2), and sum
the resulting values. As shown in Box 2.2, these calculations lead to SSTreatment = 6 × (+1.0)² + 6 ×
(-1.0)² = 12.0, which agrees with the value obtained by subtraction. If there were more variability
among the group means (i.e., a bigger difference between the two means), then SSTreatment would
be larger than 12.0, and if there were less variability in the means then it would be smaller.
In summary, the ANOVA calculations divide the total variability for all 12 scores (SSTotal
= 48.0 units) into error variability within groups (SSError = 36.0 units) and treatment variability
between groups (SSTreatment = 12.0 units).
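The partitioning just described can be verified directly in a few lines of Python (an added sketch
using only the standard library; the scores are those in Box 2.2).

    # Partition SS_Total into SS_Error and SS_Treatment for the Box 2.2 data.
    control = [11, 14, 12, 9, 13, 13]
    treatment = [7, 12, 11, 9, 12, 9]
    groups = [control, treatment]

    scores = [y for g in groups for y in g]
    grand_mean = sum(scores) / len(scores)                  # 11.0
    group_means = [sum(g) / len(g) for g in groups]         # 12.0 and 10.0

    ss_total = sum((y - grand_mean) ** 2 for y in scores)   # 48.0
    ss_error = sum((y - m) ** 2
                   for g, m in zip(groups, group_means) for y in g)             # 36.0
    ss_treatment = sum(len(g) * (m - grand_mean) ** 2
                       for g, m in zip(groups, group_means))                    # 12.0

    print(ss_total, ss_error, ss_treatment)   # 48.0 36.0 12.0 (SS_Total = SS_Error + SS_Treatment)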
Calculating MSs, F, and significance. The next step in ANOVA is to determine whether
there is more variability between groups than we would expect by chance given the variability
within groups and the sample sizes. As we have seen before, the F statistic determines whether
two measures of variability (i.e., variances or Mean Squares) differ significantly from one
another. To perform the F test, SSTreatment and SSError are divided by their respective dfs to produce
Mean Squares (MSTreatment and MSError). These various quantities are presented in Box 2.3.
The df for SSError is equal to the sum of the dfs for the individual groups, (n1 - 1) + (n2 - 1)
= 5 + 5 = 12 - 2 = 10 = N - 2, where N represents the total number of observations. The rationale
for this df is that SSError is the deviation of N scores from two group means; hence N - 2. If there
were more than two groups (say k), the df would be N - k.
The df for SSTreatment equals the number of groups (two in our example) minus one, 2 - 1 =
1, because SSTreatment is the deviation of two group means about a single grand mean. For the
Single-Factor Between-S design in general, dfTreatment = k - 1, the deviation of k sample means
from a single grand mean. The sum of dfError and dfTreatment is 10 + 1 = 11 = N - 1 = dfTotal. These
dfs and the corresponding MSs (SS/df) are summarized in a standard ANOVA summary table in
Box 2.3. Note that MSError = 3.6 is identical to sp² from the t-test.
The final step in ANOVA is to determine whether MSTreatment is significantly greater than
MSError. If the null hypothesis of no difference among the group means is true, then MSTreatment and
MSError will be approximately equal to one another. The greater the variability between groups
relative to the variability within groups, the greater the likelihood that the differences are not due
to chance. The F statistic is the ratio of MSTreatment over MSError. If H0 is true, then F should be
approximately equal to 1. However, large values of F can occur by chance even when H0 is true.
But the probability of these large values is known; for example, given the present design, p(F ≥
4.96) = .05 if the H0 is true.
The observed value of F is 3.33. To determine whether 3.33 indicates significant
variability between groups, the observed F is compared to Fα, the value of F expected 100×α% or
less of the time if the H0 were true. As noted previously, tables of F present F values as a
function of dfs for the numerator and denominator of the F ratio. In our two-group, Between-S
ANOVA with 6 participants per group, dfNumerator = 2 - 1 = 1 and dfDenominator = 6 + 6 - 2 = 10.
Appendix A.3 gives the critical values of F at the .05 level for various degrees of freedom. For df
 Source        SS      df           MS      F
 Treatment    12.0     k - 1 = 1    12.0    3.33   [= t²]
 Error        36.0     N - k = 10    3.6           [= s²p]
 Total        48.0     N - 1 = 11

 H0: µ1 = µ2     Ha: (one of) the equalities is false, i.e., µ1 ≠ µ2
 F.05; 1, 10 = 4.96     Do not Reject H0     [√4.96 = 2.23 = t.05, 10]
 η² = 12.0/48.0 = .25     η = .5

Box 2.3. Between-S ANOVA Summary Table.
= 1 and 10, Fcritical = 4.96. Because FObserved is not greater than or equal to FCritical, the H0 of no
difference between the group population means cannot be rejected. The probability of an F of
3.33 or larger by chance alone is too high to reject the null hypothesis. If we were to reject the
null for F less than 4.96, then the probability that we would be rejecting a true H0 and making a
Type I error would be greater than .05, a risk that we are generally willing to take only under
exceptional circumstances (e.g., a pilot study, only minor consequences of Type I error).
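Continuing the sketch from the previous section, the Mean Squares, observed F, critical F, and p
value in Box 2.3 can be reproduced as follows (again an added illustration assuming scipy).

    # From SS and df to MS, F, the critical value, and the observed p.
    from scipy import stats

    ss_treatment, ss_error = 12.0, 36.0
    df_treatment, df_error = 2 - 1, 12 - 2                  # k - 1 and N - k
    ms_treatment = ss_treatment / df_treatment              # 12.0
    ms_error = ss_error / df_error                          # 3.6
    f_obs = ms_treatment / ms_error                         # 3.33

    f_crit = stats.f.ppf(1 - .05, df_treatment, df_error)   # 4.96
    p_obs = stats.f.sf(f_obs, df_treatment, df_error)       # .098
    print(f"F = {f_obs:.2f}, critical F = {f_crit:.2f}, p = {p_obs:.3f}")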
Equivalence of F and t for Two Groups
The equivalence of the t-test and ANOVA approaches is easy to demonstrate. Note first
that both tests led to the same conclusion (i.e., do not reject H0), and that the numerators and
denominators of t and F are sensitive to the same factors. The numerators both depend on
variation in the group means (either differences between two group means for t or deviations of
two or more group means from the grand mean for F). The variances (s2Pooled and MSError) are
identical in both cases and depend on variability within groups. The sample sizes also contribute
similarly to the magnitude of the two statistics, albeit by making the denominator smaller in the
case of t and the numerator larger in the case of F. Ultimately the equivalence of the two tests
rests on the fact that F = t² (equivalently, t = √F) and F.05;1,10 = t².05,10 (equivalently, t.05,10 =
√F.05;1,10). Thus, F and t always lead to the same conclusions about the null hypothesis. That is,
if F is greater than F.05, then t must also be greater than t.05. These various equivalencies appear
in square brackets ([ ]) in the preceding boxes.
The equality between Fα and t²α depends on the use of a two-tailed t-test. As typically
used, the F test is a two-tailed test of the difference between means because SSTreatment is
insensitive to the direction of the differences among the groups. SSTreatment in the present case
would have been exactly the same if the control group scored 10.0 and the treatment group
scored 12.0, the reverse of the present outcome. To do a one-tailed F test, if required, use a
critical value of F corresponding to the .10 level of probability. Half of this .10 (.05) would be
for the predicted direction of difference. To illustrate, F.10; 1, 10 = 3.29 and we would reject H0 that
the means are equal and accept Ha: µ1 > µ2. Note that √3.29 = 1.81, which is the critical value of
t for a one-tailed test with α = .05.
ETA² (η²) and Strength of Relationship
The standard F test can produce significant effects even for small differences between
groups. This can occur because larger samples produce larger values for t and F, other things
being equal. Because of this possible limitation of significance testing, researchers often report
measures of the strength of the relationship between categorical and numerical variables
(analogous to r² and R²). One such measure is called η² (eta squared). This statistic indicates the
proportion of total variability that can be attributed to treatment variability, that is to between
groups variability in the independent groups design. η² = SSTreatment/SSTotal = 12.0/48.0 = .25 in
this study. Stated verbally, the difference between groups accounted for 25% of the total
variability in the 12 scores.
SPSS ANALYSES FOR THE TWO-GROUP BETWEEN-S ANOVA
Because SPSS is a general-purpose statistical package that runs on various different
computer systems (platforms), we will demonstrate several approaches to the SPSS analyses in
this and subsequent chapters. For a general introduction to SPSS, see Appendix C. The
convention that we will follow here is to present text that users enter in bold font, with output in
non-bold. Also we will present SPSS command terms in uppercase letters and other text in
lowercase, although SPSS does not require that commands be entered as uppercase characters.
Finally, supplementary notes and calculations will be italicized when presented in the output.
For any Single-Factor Between-S design, users typically enter two numbers for each case,
one number to indicate which group the observation belongs to (i.e., the factor or independent
variable) and the other number to indicate the value observed for the criterion or dependent
variable. Such data can be entered in SPSS as shown in lines 1 to 5 of Box 2.4 (i.e., from DATA
LIST to END DATA). These commands would be typed into the syntax window (Windows
version) or job file (Unix version). Alternatively, the values can be typed directly into the SPSS
Data Editor in Windows (shown in Figure 2.1). The independent and dependent variables are
listed separately, here with six cases per row. The order of the agitation and group variables makes
no difference, as long as the order of the variable names in the DATA LIST statement
corresponds to the order of data. However entered, the data would ultimately be represented in
SPSS as shown in Figure 2.1.
DATA LIST FREE /group agit.
BEGIN DATA
1 11 1 14 1 12 1 9 1 13 1 13
2 7 2 12 2 11 2 9 2 12 2 9
END DATA.
TTEST GROUPS = group /VARIABLE = agit.

 Variable    Number of Cases     Mean    Standard Deviation   Standard Error
 -----------------------------------------------------------------------------
 AGIT
   GROUP 1          6          12.0000         1.789                .730
   GROUP 2          6          10.0000         2.000                .816

               |   Pooled Variance Estimate   |  Separate Variance Estimate
    F   2-tail |    t    Degrees of   2-tail  |    t    Degrees of   2-tail
  Value  Prob. |  Value   Freedom      Prob.  |  Value   Freedom      Prob.
 --------------+------------------------------+------------------------------
  1.25   .813  |  1.83      10         .098   |  1.83     9.88        .098

Box 2.4. SPSS Data Entry and t-test Analysis.
Figure 2.1. SPSS Data Editor with Values.
Figure 2.1 shows the SPSS Data Editor after the data has been entered manually or
entered using the DATA LIST commands in Box 2.4. We can
now enter commands in syntax to perform the desired tests, or
select analyses from the pull-down menus. Let us first perform
the t-test described earlier.
SPSS TTEST Analysis
Using syntax, the general format of the independent
groups TTEST command is: TTEST GROUPS =
independent[(low,high)] /VARIABLES = dependent, where
independent refers to one or more categorical predictor variables,
dependent refers to one or more dependent variables, and the
optional low, high value(s) are the levels of the variable to be
compared (if not equal to the default values of 1 and 2).
Line 6 of Box 2.4 shows the SPSS syntax to perform an independent groups t-test for our
data. The GROUPS = independent specification indicates the grouping variable (called group here, although it
could have any name selected by the researchers). The /VARIABLE = dependent command
specifies the dependent variable, agit in our case. The subsequent text in Box 2.4 is output
produced by SPSS. Compare this output to the calculations performed earlier. Note that the
analysis produces the independent groups t calculated above, tObserved = 1.83, for the pooled
variance estimate.
The two-tailed significance of t is .098. This means that if the H0 is true, then p(t ≥ +1.83
or t ≤ -1.83) = .098. Only 9.8 times out of 100 would we expect t to be greater than or equal to
1.83 or less than or equal to -1.83, if the H0 is true. This is a relatively low probability, but it does
not meet the standard value for alpha of .05. For a two-tailed (or non-directional) test, we would
not reject the H0 of no difference between the population means for the two groups. This is the
same conclusion that we arrived at earlier using the critical value approach. Note the equivalence
of the tCritical and p value approaches. If |tObserved| ≥ |tCritical| (i.e., absolute value of tObserved greater
than or equal to absolute value of tCritical), then pObserved ≤ α.
If we had predicted for theoretical reasons that the treatment group (Group 2) would have
lower agitation scores than the control group, then a one-tailed (or directional) test would be
appropriate. The observed probability would be divided by 2 (.098/2 = .049). We would now
reject H0, which is again the same conclusion that we arrived at earlier using the tCritical approach
and a one-tailed critical value.
TTEST also provides the means and standard deviations for the two groups, which is in
fact all that is required to perform the calculations for the t-test. Detailed calculations were
shown earlier, but briefly, the pooled variance used in the denominator of t, s2p = 36.0/10 = 3.6,
involves pooled SSs from the two groups {i.e., SSPool = (6-1)1.789² + (6-1)2.000² = 16.0 + 20.0 =
36.0} and pooled dfs from the two groups {i.e., dfPool = (n1 - 1) + (n2 - 1) = n1 + n2 - 2 = 10}. The
numerator is simply the difference between the means minus 0, the hypothesized value for the
difference.
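Because the group means, standard deviations, and ns are indeed all that the pooled t test needs,
the TTEST result can be reproduced from the summary statistics alone; the sketch below (added
here, assuming scipy) uses scipy's ttest_ind_from_stats for that purpose.

    # Pooled-variance t test computed from summary statistics only.
    from scipy import stats

    t_obs, p_two_tailed = stats.ttest_ind_from_stats(
        mean1=12.0, std1=1.789, nobs1=6,
        mean2=10.0, std2=2.000, nobs2=6,
        equal_var=True)
    print(f"t = {t_obs:.2f}, two-tailed p = {p_two_tailed:.3f}")   # t = 1.83, p = .098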
Box 2.4 also reports an F value to the left of the row containing the t. This F is a test of
the equivalence of the variances of the two groups, which is relevant to whether or not it is
justified to pool the two variances. This test was performed manually at the end of Chapter 1, and
we concluded using the FCritical approach not to reject the H0 of equal variances. Here we learn
that the probability of FObserved is not even close to significant. The way to interpret the p of .813
ONEWAY dependent BY factor
[/STATISTICS = DESCRIPTIVES]
[/RANGES = LSD[(α)] SNK[(α)] TUKEY[(α)] [/RANGES ...]]
[/POSTHOC = ...]
[/POLYNOMIAL = n] [/CONTRAST = coefficients [/CONTRAST ...]]
Box 2.5. Format for SPSS ONEWAY command.
ONEWAY agit BY group /STATISTICS = DESCRIPTIVE.

           N      Mean       Std. Deviation   Std. Error    95% Conf. Interval for Mean    Minimum   Maximum
                                                             Lower Bound    Upper Bound
 1.0000    6   12.000000       1.7888544       .7302967      10.122712      13.877288       9.0000   14.0000
 2.0000    6   10.000000       2.0000000       .8164966       7.901129      12.098871       7.0000   12.0000
 Total    12   11.000000       2.0889319       .6030227       9.672756      12.327244       7.0000   14.0000

                  Sum of Squares   df   Mean Square     F     Sig.
 Between Groups       12.000        1      12.000     3.333   .098
 Within Groups        36.000       10       3.600
 Total                48.000       11

Box 2.6. Independent Groups ANOVA with SPSS ONEWAY Procedure.
is: p(s1²/s2² ≥ 1.25 or s2²/s1² ≥ 1.25) = .813 if the H0 of equal variances is true. The results in Box 2.4
are from an early version of SPSS still used on some Unix systems, and later versions report a
different test of the equality of variances.
SPSS ONEWAY
ANOVA with a single predictor is sometimes called one-way ANOVA, hence the name of one of
several SPSS procedures that perform ANOVA for Single-Factor Between-S designs. The syntax for
ONEWAY is shown in Box 2.5, including some common options that are discussed in later
chapters. Dependent is the list of names for the dependent variables, factor is the name of the
independent variable or factor. The /STATISTICS = DESCRIPTIVE subcommand produces
descriptive statistics (i.e., means, standard deviations, and confidence intervals for the variables).
POLYNOMIAL, CONTRAST, and RANGES subcommands are explained later.
Box 2.6 demonstrates the use of SPSS ONEWAY to perform a single-factor Between-S
ANOVA for the agitation study. Agit is our dependent variable, and group the independent
factor. ONEWAY provides standard ANOVA results in a typical summary table: SSTreatment
(called Between Groups), SSError (called Within Groups), dfs, MSs, F, and p. The values agree
Figure 2.2. Menu Approach to Oneway ANOVA.
with earlier calculations and with the equivalent t-test results just reported. Note in particular that
the p values are equivalent, except for rounding, that √F = √3.33 = 1.83 = t, and that MSError =
s²Pool from the t-test analysis.
Although η² is not provided by ONEWAY, this could be calculated by dividing SSTreatment
(Between) by SSTotal. That is, η² = 12.0/48.0 = .25. This statistic informs both researchers and
readers that 25% of the total variability in the agitation scores (agit) is associated with
differences between the two groups. The remaining 75% is variability within the two groups.
Accounting for 25% of the variability is a relatively robust effect for psychology and indeed for
many other disciplines.
The F probability in the last column of the summary table is the probability of an F as
large as that observed given the null hypothesis is true, that is, p(F ≥ 3.333 | H0: µ1 = µ2) = .0979.
Recall that this is a two-tailed (non-directional) probability. Because .0979 is not less than or
equal to .05, we do not reject the null hypothesis. If a one-tailed (directional) hypothesis was
appropriate, then the p would be divided by two and it would be less than the alpha level of
.05; hence we would reject the null hypothesis and accept the alternative. That is, .0979/2 = .049,
which is less than .05 and falls in the rejection region.
To perform this analysis in SPSS for Windows, one could either enter the commands in
Box 2.6 into a syntax window or select the ANOVA analysis from the pull-down menus. Figure
2.2 shows the initial steps: Analyze | Compare Means | One-Way ANOVA.
Figure 2.3. Specification of One-Way ANOVA by Menu Commands.
Selecting One-Way ANOVA brings up the dialog box shown in Figure 2.3, at which point
users would select in turn the independent and dependent variables and direct them into the
appropriate field. In Figure 2.3, the “group” variable has already been selected and moved into
the Factor field. Agit is highlighted and clicking on the black arrow will move the name into the
Dependent List. Descriptive statistics could be selected via the Options button. Once all the
commands are complete, clicking on OK will produce output very similar or identical to that in
Box 2.6.
Other SPSS ANOVA Commands
SPSS has a number of other commands that perform ANOVA, and we will eventually
need to use these commands to analyze more complex designs. Note that ONEWAY, as its name
implies, can only analyze single factor designs, and only Between-S designs at that. Although
the specifics of the commands and their output may vary somewhat, all SPSS procedures include
in some manner the various components seen in ONEWAY; that is, some way to specify the
independent and dependent variables, some way to request descriptive statistics, and output that
includes the various statistics reported by ONEWAY (e.g., SSs, dfs, MSs, F, p) and even some
that were not (e.g., eta2).
MANOVA. Box 2.7 shows syntax commands and output for the MANOVA command, one
MANOVA agit BY group(1 2) /PRINT = CELLINFO.

 FACTOR        CODE          Mean   Std. Dev.    N
 GROUP         1           12.000       1.789    6
 GROUP         2           10.000       2.000    6
 For entire sample         11.000       2.089   12

 Source of Variation      SS    DF       MS      F   Sig of F
 WITHIN CELLS          36.00    10     3.60
 GROUP                 12.00     1    12.00   3.33       .098
 (Model)               12.00     1    12.00   3.33       .098
 (Total)               48.00    11     4.36

 R-Squared = .250     Adjusted R-Squared = .175

Box 2.7. SPSS MANOVA Analysis.
Figure 2.4. Running GLM from the Windows Menu.
of the most powerful ANOVA commands in SPSS. The first line shows the commands typed into
the SPSS job file, and the remainder is the output. The /PRINT=CELLINFO subcommand is
analogous to ONEWAY’s /STATISTICS=DESCRIPTIVES, and produces the cell means and
standard deviations shown before the ANOVA summary table.
The ANOVA summary table itself has similar columns to those seen in ONEWAY, and
includes information for Error (WITHIN CELLS), Treatment (GROUP), and Total. The values
on these lines are identical to those reported earlier. MANOVA includes an extra line, called
Model, that represents the SS for all of the factors and predictors in the design. This line is
redundant in the present case with Group because we only have a single factor. In more complex
designs, Model will aggregate (add) the effects of multiple factors.
MANOVA also reports a measure of the strength of the relationship, here labelled R² but
equal to the η² calculated earlier. Adjusted R² is a more conservative measure of the strength of
the relationship and adjusts for the fact that with small ns or many predictors, it is possible to
account for substantial variability just by chance. The MANOVA command is not accessible by
Windows menus, but can be typed into a syntax window. Its power makes MANOVA well
worth learning.
GLM. A command that is available in Windows and other more recent versions of SPSS
is GLM, which stands for General Linear Model. Like MANOVA, GLM is a very powerful
analytical tool that, nonetheless, can readily accommodate simple designs. We will illustrate the
use of GLM via menus. Figure 2.4 shows the initial steps.
Figure 2.5. GLM Dialogue Box.
We select Analyze | General Linear Model | Univariate. Note just below the main Analyze box
the syntax commands to run GLM and print out descriptive statistics and eta2.
Selecting Univariate initiates the GLM dialogue box shown in Figure 2.5. The independent
variable (group) has been selected and moved into the Fixed Factor(s) box, and the dependent
variable (agit) has been selected and moved into the Dependent Variable box. Descriptive and
other statistics could be selected from the Options button. A number of the choices in GLM will
be examined later when we talk about more complex designs. Clicking OK will initiate the
analysis.
The results of the GLM procedure are shown in Box 2.8. Following the descriptive
statistics, we see some now familiar results, namely the SS and df for Group, Error, and Corrected
Total, as well as the F, significance, and eta² for Group. The values and conclusions are the same
 GROUP       Mean    Std. Deviation    N
 1.00     12.0000         1.78885      6
 2.00     10.0000         2.00000      6
 Total    11.0000         2.08893     12

 Source              Type III Sum      df   Mean Square        F     Sig.
                      of Squares
 Corrected Model       12.000(a)        1        12.000      3.333   .098
 Intercept           1452.000           1      1452.000    403.333   .000
 GROUP                 12.000           1        12.000      3.333   .098
 Error                 36.000          10         3.600
 Total               1500.000          12
 Corrected Total       48.000          11

 a  R Squared = .250 (Adjusted R Squared = .175)

Box 2.8. GLM Results.
as the preceding analyses. Although it predicts 25% of the variability in agitation scores, as
shown by the R squared reported at the bottom, Group does not account for a significant amount
of the variability in the scores, p = .098, at least not by a non-directional (or two-tailed) test.
There are several additional results reported by GLM, including one, Corrected Model,
that was seen earlier in the MANOVA results. This represents the relationship of the dependent
variable to all factors and predictors in the model. Since Group is the only factor in our model,
the overall Model is redundant and the values are the same as those on the Group line.
Two other lines are completely new, and quite surprising at first because of the magnitude
of the numbers reported (e.g., SS Total = 1500.00). These additional lines are reported because
GLM attempts to account for all of the variability in the scores, including not just deviations
from the grand mean but also deviations from an absolute value of 0. Not all of this variability is
of interest to researchers or even meaningful in all situations. The Intercept in this context refers
to the deviation of the grand mean (11.0) from 0 and can be calculated as: N × (MG - 0)² = 12 ×
(11.0 - 0)² = 1452.0, the value shown as the intercept. This SS has df = 1 because we are
examining the deviation of a single mean about zero (we lose no degrees of freedom for 0
because it is not estimated from the data). Recall that our SSTotal had df = 12 - 1 = 11. The df for
the deviation of the grand mean from zero is the “missing” df. Note also the similarity between
this computation and the use of F to test hypotheses about a single mean, as shown in Chapter 1.
The SSTotal reported by GLM includes the variability due to the deviation of the grand
mean from 0 (just calculated as 1452.00) plus the variability in the scores about the grand mean
(our SSTotal = 48.0). The sum of these two quantities is 1500.00 and it has df = 12. In most cases,
although not all as shown in later chapters, the variability due to the deviation of the grand mean
from zero is ignored. Correcting for the grand mean gives the SSCorrected Total = 48.0 reported by
GLM in Box 2.8. This is what is normally referred to as SSTotal.
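The relationship among GLM's Intercept, Corrected Total, and Total sums of squares is easy to
confirm numerically (an added Python sketch using only the standard library; the 12 agitation
scores are those from Box 2.4).

    # GLM's Total SS = Intercept SS + Corrected Total SS.
    agit = [11, 14, 12, 9, 13, 13, 7, 12, 11, 9, 12, 9]
    n = len(agit)
    grand_mean = sum(agit) / n                                     # 11.0

    ss_intercept = n * (grand_mean - 0) ** 2                       # 1452.0
    ss_corrected_total = sum((y - grand_mean) ** 2 for y in agit)  # 48.0
    ss_total = sum(y ** 2 for y in agit)                           # 1500.0 (deviations from 0)
    print(ss_intercept, ss_corrected_total, ss_total)              # 1452.0 48.0 1500.0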
CONCLUSIONS
This chapter has laid the foundations for the Between-S ANOVA. We have seen that for
two groups, the Single-Factor Between-S ANOVA is equivalent to the independent groups t-test.
Although the two tests will always produce exactly the same conclusion for two groups, the F
test is more general than the t test because F can test the hypothesis that multiple means (i.e., two
or more) are equivalent, whereas t can only be used for two groups. In the next chapter, ANOVA
approaches are generalized to Between-S designs that include more than two groups.
CHAPTER 03:
SINGLE-FACTOR BETWEEN-S ANOVA (k ≥ 3)
Notation and Formula for Between-S ANOVA . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
Notation for Single-Factor Design . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
Between-S ANOVA for Revised Agitation Study (k = 3) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
ANOVA Calculations for k = 3 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
SPSS Analyses for k = 3 Agitation Study . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
An Interview Study (k = 4) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
Calculation of ANOVA for the k = 4 Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
SPSS Analyses for the Interview Study . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
A General Discussion of ANOVA . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
Rationale for F-test of Differences Among Means . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
Supplementary Discussion of ANOVA . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
Appendix 3.1: An Alternative Notation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
Chapter 3 extends ideas from the last chapter to the Single-Factor Between-S design
involving more than two groups. We begin with a notation for describing succinctly the
calculations that need to be performed for this design, and then demonstrate the use of SPSS to
analyze such designs, including again several ANOVA options, as well as Regression
approaches.
NOTATION AND FORMULA FOR BETWEEN-S ANOVA
Although the calculations for the single-factor Between-S ANOVA are fairly clear for the
two-group design, as illustrated in the preceding chapter, this is less the case for more than two
groups. It is therefore helpful to have a general notation and corresponding formula to represent
the statistical calculations.
Notation for Single-Factor Design
Box 3.1 presents a general schema for the Single-Factor Between-S design. The letter y
stands for the dependent variable, and several subscripts are used to identify particular
observations and statistics. There are k levels to the factor, with j serving as a general subscript
indicating the level; that is, j = 1, 2, ..., k. Thus, SS1, n1, and y&1 indicate statistics for the first
group (Sum of Squares, sample size, and mean, respectively), and SS2, n2, and y&2 indicate
statistics for the second group. In general SSj, nj, and y&j represent statistics for each group or
level of the factor, where the subscript j = 1, 2, ..., k.
A subject number (columns S# in Box 3.1) indicates the subject within each group, with i
serving as a general subscript indicating the subject; that is, i = 1, 2, ..., nj. The subscripts i and j
                               Levels of Factor
        1             2          ...         j          ...         k
   S#     y      S#     y               S#     y               S#     y
   1      y11    1      y12             1      y1j             1      y1k
   2      y21    2      y22             2      y2j             2      y2k
   .      .      .      .               .      .               .      .
   i      yi1    i      yi2             i      yij             i      yik
   .      .      .      .               .      .               .      .
   n1     ȳ1     n2     ȳ2              nj     ȳj              nk     ȳk          N    ȳG

Box 3.1. ANOVA Notation for Single Factor Design.
can be used to identify each observation by which subject and which group the observation
belongs to (see columns headed by y in Box 3.1). For example, y11 would be the first subject in
the first group, y32 would be the third subject in the second group, and so on. The general notation
for an observation would be yij, the ith observation in the jth group. Note that for the Between-S
design, observations for individual subjects in the various groups are independent or unrelated to
one another (i.e., subject 1 in group 1 has no pre-existing relationship with subject 1 in the other
groups).
In addition to group statistics, the ANOVA requires calculation of some statistics for all
observations across the groups. To indicate these operations, we use N to represent the total
number of observations (i.e., N = n1 + n2 + ... + nk), and y&G to represent the grand mean of all the
observations. Although not always needed, the subscript G (for Grand) can also be used for other
overall statistics, such as sG and sG2 for the overall standard deviation and variance, respectively.
It is now possible to write general formulas that describe the calculations for the Single-Factor
Between-S design. First, consider SSTotal. We need to calculate the deviation of each score from
the grand mean, square the deviations, and sum the squared deviations across all N subjects. This
would be written as: $SS_{Total} = \sum_{j=1}^{k}\sum_{i=1}^{n_j}(y_{ij} - \bar{y}_G)^2$. Verbally, this formula states to sum the squared
deviations from the grand mean within each of the groups, and then sum the squared deviations
across all groups. In practice, this is equivalent to calculating the mean for all of the scores
(ignoring group) and taking the squared deviations from that mean. Hence, SSTotal is the total
variability in all of the scores, with variability within groups and variability between groups
aggregated together.
The error variability is calculated as the sum of the variability of the scores about the
individual group means, which equals the sum of the SSs within each of the groups. Using our
notation, this becomes: $SS_{Error} = \sum_{j=1}^{k}\sum_{i=1}^{n_j}(y_{ij} - \bar{y}_j)^2 = \sum_{j=1}^{k} SS_j$. This formula says to sum the squared
deviations from the group mean within each group and then across all of the groups. This is
equivalent to calculating a SS for each group and summing the SSs across all k groups. Because
the deviations are taken from the group means rather than the grand mean, SSError represents
variability within groups but does not include variability between groups. Generally (i.e., unless
all group means are identical), SSError will be less than SSTotal. The difference is SSTreatment.
$$SS_{Total} \;=\; SS_{Error} \;+\; SS_{Treatment}$$

$$\sum_{j=1}^{k}\sum_{i=1}^{n_j}(y_{ij} - \bar{y}_G)^2 \;=\; \sum_{j=1}^{k}\sum_{i=1}^{n_j}(y_{ij} - \bar{y}_j)^2 \;+\; \sum_{j=1}^{k} n_j(\bar{y}_j - \bar{y}_G)^2$$

Equation 3.1. ANOVA Sums of Squares.
In addition to being calculated as a difference between SSTotal and SSError, SSTreatment can be
calculated directly using the formula: $SS_{Treatment} = \sum_{j=1}^{k}\sum_{i=1}^{n_j}(\bar{y}_j - \bar{y}_G)^2 = \sum_{j=1}^{k} n_j(\bar{y}_j - \bar{y}_G)^2$. The first
version of this formula makes clear that we are summing the squared deviations of the group
mean from the grand mean, first within each group and then across all of the groups. This is the
equivalent summation as for SSTotal and SSError. A simplification is possible for Treatment,
however, because the group mean is exactly the same for all nj observations in each group. That
is, we are adding up the identical deviation nj times, which is equivalent to multiplying the
deviation by nj (e.g., 1² + 1² + 1² + 1² + 1² + 1² = 6 × 1²).
The three SSs for the Single-Factor Between-S ANOVA are summarized in Equation 3.1,
which also demonstrates the partitioning of SSTotal into its two components, SSError and
SSTreatment. Look closely at the bottom formula in Equation 3.1 and ignore the summation and
squaring operators. The resulting equality, (yij - ȳG) = (yij - ȳj) + (ȳj - ȳG), makes clear that the
left-hand side of the equation is equal to the sum of the two components on the right-hand side
(-ȳj and +ȳj cancel each other).
Verbally, the equation demonstrates that the deviation of the scores from the Grand Mean (which
produces SSTotal) consists of two components: the deviation of individual scores from the group
means (which produces SSError), and the deviation of group means from the Grand Mean (which
produces SSTreatment).
Equation 3.1 also demonstrates clearly the rationale for the dfs for each of the three
components. SSTotal represents the deviation of N observations from a single grand mean;
therefore, dfTotal = N - 1. SSError represents the deviation of N observations from k group means;
therefore, dfError = N - k. Finally, SSTreatment represents the deviation of k treatment means from a
single grand mean; therefore, dfTreatment = k - 1.
The final steps involve the calculation of the MSs and F ratio. These operations are
summarized in Equation 3.2.
$$MS_{Treatment} = \frac{SS_{Treatment}}{k - 1} \qquad MS_{Error} = \frac{SS_{Error}}{N - k} \qquad F = \frac{MS_{Treatment}}{MS_{Error}} \qquad df = k - 1,\ N - k$$

Equation 3.2. ANOVA Mean Squares.
The notation and formulas presented here are based on a standard notation that is often used
with Single-Factor, Between-S designs. There are limitations to this notation, however, for the
Within-S and Factorial designs discussed later. We will need to modify the notation at that time.
To facilitate the transition to more complex designs, an alternative notation is presented in a
supplementary section at the end of this chapter. In that notation, a letter (A) is used to represent
both the name of the factor and its number of levels (instead of using k), the lower-case a is used
to represent the generic level (rather than j), and s is used to indicate subjects (rather than i). We
will continue to use n and N to indicate the number of subjects.
BETWEEN-S ANOVA FOR REVISED AGITATION STUDY (K = 3)
The generalization of the independent groups ANOVA to a single predictor with more
than two groups is straightforward given the formulas just presented. Note that these formulas
accommodate different values for k, the number of groups.
ANOVA Calculations for k =3
In the standard ANOVA approach for k independent groups, SSTotal is calculated as the
squared deviations of all observations from the Grand Mean (i.e., the mean based on all N
observations). SSError is the sum over the k groups of the squared deviations of observations from
their respective Group Means (i.e., the sum of the SSs within each of the k groups). SSTreatment is
the sum of the squared deviations of the k Group Means from the Grand Mean multiplied by the
number of observations in each group. This was all summarized in Equation 3.1.
ANOVA Summary Table
Source     SS     df         MS      F
Between    3.5    k-1 = 2    1.750   .354
Within     44.5   N-k = 9    4.944
Total      48.0   N-1 = 11

H0: µ1 = µ2 = µ3     Ha: one or more equalities false
F.05;2,9 = 4.26      Do Not Reject H0
η² = 3.5/48 = .073   η = .27   Strength (R², R)

Box 3.3. ANOVA for 3-group (k=3) Design.
Box 3.2 shows the 12 agitation scores used in the previous example, but now divided into
3 groups, rather than 2 (i.e., k = 3). The ANOVA computations and results are also presented in
Box 3.2. SSTotal = 48.0, as before. Note that this must be the case because the 12 scores are
exactly the same; no variation has been removed or added by dividing the 12 scores into 3 groups
of 4. What has changed is the partitioning of SSTotal.
SSError = 44.5 is the sum of the SSs within each of the three groups. Note that the sum of
the variability within groups is almost equal to the SSTotal, suggesting that there is not much
variability among the three treatment means. The SSTreatment is only 3.5 = SSTotal - SSError, which
can also be calculated directly from the squared deviations of the group means from the grand
mean (see Box 3.2).

         Group 1   Group 2   Group 3
           11        13        11
           14        13         9
           12         7        12
            9        12         9

ȳj        11.50     11.25     10.25        ȳG = 11.00
ȳj - ȳG    +.50      +.25      -.75        Treatment Effects
SSj       13.00     24.75      6.75

SSTotal = 48.0 = ΣΣ(y - ȳG)² = (11 - 11)² + (14 - 11)² + ... + (9 - 11)²
SSError = 44.5 = ΣΣ(y - ȳj)² = ΣSSj = Σ(nj - 1)s²j
        = (11 - 11.5)² + (14 - 11.5)² + ... + (9 - 10.25)² = 13 + 24.75 + 6.75
SSTreatment = 3.5 = Σnj(ȳj - ȳG)² = 4(11.5 - 11)² + 4(11.25 - 11)² + 4(10.25 - 11)² = 4(.5² + .25² + (-.75)²)

Box 3.2. Results and SSs for k = 3 Example.

Box 3.3 shows the ANOVA summary table for this study. Given SSTreatment and SSError, the next step is to
determine dfs for these sources of variability, compute MSs, and then an F ratio. The dfTreatment is k
- 1 = 3 - 1 = 2 because SSTreatment represents the deviations of 3 Group Means from a single Grand
Mean. The dfError is N - k = 12 - 3 = 9 because SSError represents the deviations of N = 12
observations from 3 Group Means (or equivalently, the sum of the within-groups dfs for the three
groups, each of which is 4 - 1 = 3). Note that dfTreatment + dfError = 11 = 12 - 1 = N - 1 = dfTotal. The
calculations for MSs (SS/df) and F (MSTreatment/MSError) follow the general principles summarized
above. If the null hypothesis that the group population means are equal (i.e., µ1 = µ 2 = µ 3) is
true, then the expected value of the F ratio is approximately 1. If the group population means
differ (i.e., H0 false), then variability among the treatment means will be larger than expected
relative to variability within groups and F should be correspondingly larger than 1. Our observed
value of F is .354, which is clearly not significant. In fact, even if H0 were true, we would expect
F to be 4.26 or larger 5% of the time with df = 2, 9. Our F does not even come close to this
critical value. The measure of the strength of the relationship, η², is also modest; only 7.3% of the
variability in the scores is due to differences between groups.
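For readers who want to verify Box 3.3 outside of SPSS, a brief Python sketch (assuming the scipy library is available; this is a supplementary check, not part of the book's procedures) reproduces the omnibus F, its p value, and η² from the 12 agitation scores in Box 3.2.

from scipy import stats

g1, g2, g3 = [11, 14, 12, 9], [13, 13, 7, 12], [11, 9, 12, 9]   # Box 3.2 scores
F, p = stats.f_oneway(g1, g2, g3)
print(round(F, 3), round(p, 3))      # 0.354 0.711, matching Box 3.3

eta_squared = 3.5 / 48.0             # SS_Treatment / SS_Total from Box 3.2
print(round(eta_squared, 3))         # 0.073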
It is instructive to note why there is little evidence of a difference between the groups in
this study, whereas the two-group analysis approached significance and was even significant by a
one-tailed test. The 3-group effect does not even approach significance because the new division
of scores placed two relatively high scores (two 13s) from the former Group 1 into the new Group
2 along with the lowest score (7) from the former Group 2. The redistribution of these high and
low scores resulted in the three group means being very similar (i.e., SSTreatment decreased) and the
variability within groups being quite high (i.e., SSError increased; note that SS2 = 24.75 is
particularly high). Hence, less of the total variability ended up in the numerator and more in the
denominator, resulting in the much smaller value for FObserved. Even if SSTreatment had been as large
as for k = 2, note that dfTreatment = 3 - 1 = 2 for k = 3, which would result in a smaller MSTreatment and
F.
SPSS Analyses for k = 3 Agitation Study
All of the SPSS commands covered earlier, with the possible exception of the regression
approach, can be easily generalized to k > 2 studies. Indeed, several of the methods require
absolutely no changes whatsoever. We illustrate with the ONEWAY Anova.
DATA LIST FREE / group agit.
BEGIN DATA
1 11 1 14 1 12 1 9
2 13 2 13 2 7 2 12
3 11 3 9 3 12 3 9
END DATA.
ONEWAY agit BY group /STATISTICS = DESCRIPTIVE .
          N    Mean        Std. Deviation   Std. Error   95% Conf. Int. for Mean
                                                         Lower        Upper
1.0000    4    11.500000   2.0816660        1.0408330    8.187605     14.812395
2.0000    4    11.250000   2.8722813        1.4361407    6.679559     15.820441
3.0000    4    10.250000   1.5000000        .7500000     7.863165     12.636835
Total     12   11.000000   2.0889319        .6030227     9.672756     12.327244

                  Sum of Squares   df   Mean Square   F      Sig.
Between Groups    3.500            2    1.750         .354   .711
Within Groups     44.500           9    4.944
Total             48.000           11
Box 3.4. SPSS ANOVA for k=3 groups.
ONEWAY for k = 3 Independent Groups. The generalization of ONEWAY to k > 2 levels
of a Between-S factor is straight-forward. Box 3.4 shows the data entry and ANOVA commands
to analyze the three-group agitation problem. The results of the ANOVA agree with those
calculated earlier and indicate no significant variability among the group means; an F of .3539 has
a very high probability of occurring if H0: µ1 = µ2 = µ3 were true. Specifically, p(F ≥ .3539 given
H0: µ1 = µ2 = µ3) = .7113. Group also accounts for a modest 7.3% of the variability in agitation
scores: η² = 3.5/48.0 = .073.
The /STATISTICS = DESCRIPTIVE subcommand produced the additional statistics shown
above the ANOVA summary table, including group means, standard deviations, standard errors of
the means, and confidence intervals for the means. Minimum and maximum scores were also
reported by the DESCRIPTIVE option, but these have been deleted from the printout. The 95%
confidence intervals in Box 3.4 show the range within which the population means would be
expected to fall 95% of the time. Note that the ranges are quite wide and overlap one another to a
considerable degree. Moreover, the overall confidence interval for the grand mean encompasses
the means for all three groups. This overlap among the group means is consistent with the
ANOVA conclusion that the group means do not differ significantly from one another.
These data could also be analyzed by MANOVA or GLM, with equivalent results. To
perform the equivalent analysis using regression, we would need k - 1 predictors in order to
ensure that the predicted scores would be the cell means and that the regression analysis would
parallel completely the ANOVA results.
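The regression route is easy to sketch in code. The example below is our own illustration (assuming numpy; it is not an SPSS command): it codes the k = 3 agitation groups with k - 1 = 2 dummy predictors, so the fitted values are the cell means and R² equals the ANOVA η² of .073.

import numpy as np

y = np.array([11, 14, 12, 9, 13, 13, 7, 12, 11, 9, 12, 9], dtype=float)
group = np.repeat([1, 2, 3], 4)

# k - 1 = 2 dummy predictors (group 3 serves as the reference category)
X = np.column_stack([np.ones_like(y),
                     (group == 1).astype(float),
                     (group == 2).astype(float)])
b, *_ = np.linalg.lstsq(X, y, rcond=None)    # least-squares regression weights
pred = X @ b                                  # predicted scores are the cell means
r_squared = 1 - np.sum((y - pred) ** 2) / np.sum((y - y.mean()) ** 2)
print(np.round(pred[::4], 2), round(r_squared, 3))   # [11.5  11.25 10.25] 0.073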
AN INTERVIEW STUDY (K = 4)
Consider now the analysis of a more realistic social psychology study, although still
hypothetical. A total of 40 undergraduate students were randomly assigned to one of four groups
(10 per group). Each student rated the attractiveness (from 1 to 7) of five participants in a
videotaped interview. Although participants viewed and rated the same videos, they received
different instructions about the alleged purpose: Group 1 was given No Specific Instructions,
Group 2 was told the videos were of Job Interviews, Group 3 that they were Psychiatric
Interviews, and Group 4 that they were Parole Interviews for community release of prison
inmates. Subjects received a single score that was the sum of their ratings of the five interviews,
with scores ranging from 5 (all ones) to 35 (all sevens).
This study is a single-factor, Between-S design with k = 4. The four levels of the factor
are defined by the four experimental conditions, and the factor is Between-S because there is no
basis for matching specific scores in one group with specific scores in the other groups. Stated
differently, there is no reason to expect a correlation between scores obtained in the four different
groups because there are no pre-existing relationships between correspondingly numbered
subjects in the different conditions (i.e., subject 1 in each group is neither the same subject as
subject 1 in other groups, nor related to those subjects in any way).
Calculation of ANOVA for the k = 4 Example
Box 3.5 shows descriptive statistics for the four groups and for the entire set of 40
observations. The means and standard deviations are in fact all that is needed to perform a
Between-S ANOVA for this study. SSTotal can be calculated from the standard deviation of the 40
observations: since, sG2 = SSTotal / (N - 1), then SSTotal = (N - 1)s2 = 823.985. The standard
deviation for each group can be similarly manipulated to produce the SSjs that must be summed to
produce SSError = 498.199, as shown in Box 3.5. Finally, the deviation of each cell mean from the
grand mean is squared, multiplied by nj, and summed to produce SSTreatment = 325.800, which is
very close to the value obtained from SSTotal - SSError.

Group (nj = 10)
                1         2         3         4
ȳj            27.30     26.00     22.70     20.00       ȳG = 24.00
ȳj - ȳG       +3.3      +2.0      -1.3      -4.0
sj            3.3015    3.6209    4.2177    3.6818      sG = 4.5965

SSj =    (10-1)3.3015²  (10-1)3.6209²  (10-1)4.2177²  (10-1)3.6818²
    =       98.099         117.998        160.101        122.001

SSTotal = 823.985 = (40-1)4.5965²
SSError = 498.199 = 98.099 + 117.998 + 160.101 + 122.001
SSGroup = 325.800 = 10(3.3² + 2.0² + (-1.3)² + (-4.0)²) ≈ 823.985 - 498.199

ANOVA Summary Table
Source      SS       df    MS        F        FCritical (.05;3,30)
Treatment   325.80   3     108.600   7.847    2.92
Error       498.20   36    13.839
Total       824.00   39

Box 3.5. Independent Groups ANOVA for Interview Study.
These values are reproduced in the ANOVA summary table in Box 3.5, along with
dfTreatment = k - 1 = 4 - 1 = 3, dfError = N - k = 40 - 4 = 36, and dfTotal = N - 1 = 40 - 1 = 39. Dividing
SSs by their respective dfs produces MSTreatment and MSError, and F = MSTreatment / MSError = 108.60 /
13.839 = 7.847, with dfNumerator = 3, dfDenominator = 36. The table of critical F values (Appendix A.3)
does not include dfDenominator = 36, and the next smallest df = 30 is used instead. This gives FCritical =
2.92. We reject H0: µ1 = µ2 = µ3 = µ4, and accept HA that one or more of these equalities is false.
FObserved is also greater than FCritical for alpha = .01.
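The Box 3.5 computations can also be verified in a few lines of code. The sketch below (assuming scipy; a supplementary check rather than part of the book's SPSS workflow) rebuilds the SSs from the group means and standard deviations and computes the exact critical and p values that a printed table can only approximate.

from scipy import stats

n, N, k = 10, 40, 4
means = [27.3, 26.0, 22.7, 20.0]
sds = [3.3015148, 3.6209268, 4.2176876, 3.6817870]
grand_mean = sum(means) / k                                    # 24.0 with equal ns

ss_error = sum((n - 1) * s ** 2 for s in sds)                  # 498.2
ss_treatment = sum(n * (m - grand_mean) ** 2 for m in means)   # 325.8
F = (ss_treatment / (k - 1)) / (ss_error / (N - k))            # 7.847

print(round(ss_error, 1), round(ss_treatment, 1), round(F, 3))
print(round(stats.f.ppf(.95, k - 1, N - k), 2))   # exact F critical for df = 3, 36 (about 2.87)
print(round(stats.f.sf(F, k - 1, N - k), 4))      # p value, about .0004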
Although the researchers can reject H0 of no differences, the alternative hypothesis is quite
vague about which groups differ from one another. Indeed, it takes no stand on this important
question. The group means in Box 3.7 indicate that the No Instruction and Job Interview
Instructions (i.e., Groups 1 and 2) received higher attractiveness ratings than the Psychiatric
Interview Instructions (Group 3), and the Parole Interview Instructions (Group 4), with the latter
being only slightly lower than Group 3. Later chapters will discuss additional analyses that are
DATA LIST FREE / subj inst rate.
BEGIN DATA
 1 1 26   2 1 26   3 1 28   4 1 23   5 1 29
 6 1 26   7 1 31   8 1 34   9 1 24  10 1 26
11 2 21  12 2 30  13 2 26  14 2 25  15 2 28
16 2 30  17 2 20  18 2 30  19 2 24  20 2 26
21 3 28  22 3 17  23 3 22  24 3 26  25 3 27
26 3 21  27 3 18  28 3 19  29 3 21  30 3 28
31 4 20  32 4 24  33 4 23  34 4 16  35 4 19
36 4 18  37 4 19  38 4 19  39 4 15  40 4 27
END DATA.
Box 3.6. Interview Results and SPSS Commands to Enter Data.
ONEWAY rate BY inst /STATISTICS = DESCRIPTIVE.
          N    Mean        Std. Deviation   Std. Error   95% Conf. Int. for Mean      Minimum   Maximum
                                                         Lower Bound   Upper Bound
1.0000    10   27.300000   3.3015148        1.0440307    24.938239     29.661761      23.000    34.000
2.0000    10   26.000000   3.6209268        1.1450376    23.409745     28.590255      20.000    30.000
3.0000    10   22.700000   4.2176876        1.3337499    19.682848     25.717152      17.000    28.000
4.0000    10   20.000000   3.6817870        1.1642833    17.366208     22.633792      15.000    27.000
Total     40   24.000000   4.5965427        .7267772     22.529954     25.470046      15.000    34.000

                  Sum of Squares   df   Mean Square   F       Sig.
Between Groups    325.800          3    108.600       7.847   .000
Within Groups     498.200          36   13.839
Total             824.000          39
Box 3.7. Independent Groups ANOVA for Interview Study.
often done following the omnibus ANOVA, as the overall test of significance is called.
SPSS Analyses for the Interview Study
The interview study is a single-factor, Between-S design with k = 4, and can be analyzed using a
number of SPSS procedures. The raw
data are presented in Box 3.6, along with the syntax commands to enter this data into SPSS. Each
case has three variables: an optional subject code from 1 to 40 (subj), an instructional condition
code from 1 to 4 (inst), and a total rating score from 5 to 35 (rate). Alternatively, the data could be
entered directly into the SPSS Data Editor, or stored in a separate file to be read by SPSS (as
described in a later chapter).
ONEWAY Analysis of Interview Study. Box 3.7 shows the syntax commands to perform a
ONEWAY ANOVA for this data, along with the resulting output. The only changes from
previous commands are the new variable names. The ANOVA summary table shows clearly that
the means for the four groups differ significantly, p(F ≥ 7.8475 | H0 true) = .0004, which agrees
with our earlier calculations and conclusions. The researchers would reject the null hypothesis
Figure 3.1. Mean Attractiveness Ratings by Instructions.
that the rated attractiveness of the interviews was not affected by the instructions; that is, reject
H0: µNone = µJob = µPsyc = µParole. An examination of the means suggests that groups 1 and 2 (No
Instructions or Job Interview) were rated higher than groups 3 and 4 (Psychiatric or Parole
Interview). Note that the confidence intervals for groups 1 and 2 do not include the means for
groups 3 and 4 (and vice versa). Follow-up analyses will allow us to isolate more specifically the
significant differences among the various means, as well as any overall pattern across the four
means.
We earlier demonstrated the basic calculations for the components of the Between-S
ANOVA in Box 3.7. SSTotal can be obtained from the standard deviation of the entire set of 40
scores, and includes between-groups variability due to instructions plus random variation among
subjects within the groups. The variability within groups is computed by summing the four SSs
for the groups (i.e., SSError = ΣSSj). The group SSs can in turn be calculated from the group
standard deviations using the general formula, SSj = (nj - 1)sj². The variability that is not within-
group variability must be due to variability between groups. SSTreatment represents the deviations
of the treatment means from the grand mean of 24.0; the greater these deviations, the larger the
effect of the treatment. SSTreatment (or SSBetween) can also be calculated by subtraction, although this
provides no independent assurance that the three quantities have been calculated correctly.
Although not computed by earlier versions of SPSS ONEWAY, η² can be calculated from
SSBetween and SSTotal; η² = 325.8/824.0 = .395. Instructions account for 39.5% of the variability
in the attractiveness scores. This is a robust effect, as can be seen in Figure 3.1. This figure can be
produced in various ways, including the /PLOT = MEANS option in ONEWAY. The pattern of
means permits some inferences about the effect of instructions, but does not reveal which means
or combinations of means differ significantly from one another. The omnibus ANOVA that we
just performed only demonstrates that some combination of the groups differs significantly from
some other combination of the groups. Supplementary analyses described in later chapters permit
more precise conclusions when k > 2.
GLM Analysis of Interview Study. Box 3.8 analyzes the interview study using GLM,
another SPSS ANOVA command, and one that we will use more later with factorial designs. The
basic format of GLM is quite similar to ONEWAY: GLM dependent BY factor, although the MIN,
MAX values for the factor are not required, and to request the descriptive statistics we use /PRINT
= DESCRIPTIVES. The output is also formatted somewhat differently, and presents some
different values. The descriptive statistics section is straightforward, and presents means and
standard deviations for the entire data set and for each of the four groups.
As noted previously, the ANOVA summary table is quite different and can be somewhat
confusing. First, let us find the values noted previously. Our SSTotal appears in the Corrected
Total row under the Type III SS column. The SS of 824.0 and df = 39 agree with earlier
calculations and printouts. The SSError appears on the Error row, and the SSTreatment appears on a
row labeled INST, the name of our independent factor for this study. The MSs for INST and
Error, as well as the F and significance (Sig.) for INST, also agree with earlier results.

GLM rate BY inst /PRINT = DESCRIPTIVE.

INST     Mean        Std. Deviation   N
1.0000   27.300000   3.3015148        10
2.0000   26.000000   3.6209268        10
3.0000   22.700000   4.2176876        10
4.0000   20.000000   3.6817870        10
Total    24.000000   4.5965427        40

Source            Type III SS   df   Mean Square   F          Sig.
Corrected Model   325.800       3    108.600       7.847      .000
Intercept         23040.000     1    23040.000     1664.874   .000
INST              325.800       3    108.600       7.847      .000
Error             498.200       36   13.839
Total             23864.000     40
Corrected Total   824.000       39

a  R Squared = .395 (Adjusted R Squared = .345)

Box 3.8. Single Factor Between-S ANOVA Using GLM.

Let us now
consider some of the new quantities presented.
The rows labelled Intercept and Total are perhaps the most anomalous, especially the
extremely large SSs. The Intercept SS represents the deviation of the overall mean from 0; it is
calculated as N × (MG - 0)² = 40 × (24.0 - 0)² = 23,040.00. The further the grand mean is from 0,
the greater this value. Much of the time this quantity is not of interest, although there are
occasions when researchers do want to know whether the overall mean for a sample of scores
deviates significantly from some hypothesized value, such as 0. The SS Intercept would be the
appropriate numerator for such a test. In the present case, we obtain an F of 1,664.874, a huge
value but of no interest because a hypothesized value of 0 is meaningless (note that the lowest
possible score is 5, making a mean of 0 impossible in the present study). What GLM calls Total
is actually the sum of our SSTotal = 824.0 (i.e., deviations of scores about the grand mean) plus the
SSIntercept = 23040.0 (i.e., the variability in the grand mean about 0). Adding these values together
gives 23864.000, as shown in Box 3.8.
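As a quick arithmetic check of the preceding paragraph (a sketch of our own, not SPSS output), the Intercept and Total rows in Box 3.8 can be reproduced directly:

N, grand_mean = 40, 24.0
ss_corrected_total = 824.0                        # our SS_Total: deviations about the grand mean
ss_intercept = N * (grand_mean - 0) ** 2          # 23040.0: deviation of the grand mean from 0
ss_glm_total = ss_corrected_total + ss_intercept  # 23864.0: the row GLM labels "Total"
print(ss_intercept, ss_glm_total)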
MANOVA command. Box 3.9 shows the equivalent output using MANOVA to perform
the analysis of variance. Although labelled somewhat differently than in ONEWAY or GLM, the
same values are reported.
MANOVA rate BY inst(1 4) /PRINT = CELLINFO.
FACTOR   CODE   Mean     Std. Dev.   N    95 percent Conf. Int.
inst     1      27.300   3.302       10   24.938   29.662
inst     2      26.000   3.621       10   23.410   28.590
inst     3      22.700   4.218       10   19.683   25.717
inst     4      20.000   3.682       10   17.366   22.634
For entire sample       24.000   4.597   40   22.530   25.470

Source of Variation   SS       DF   MS       F      Sig of F
WITHIN CELLS          498.20   36   13.84
inst                  325.80   3    108.60   7.85   .000
(Model)               325.80   3    108.60   7.85   .000
(Total)               824.00   39   21.13

R-Squared = .395   Adjusted R-Squared = .345
Box 3.9. MANOVA Analysis of Instruction Study.
A GENERAL DISCUSSION OF ANOVA
Having now completed a general introduction to ANOVA, we will briefly review a rationale for the
F-test of differences among the means, and also examine some other general issues.
Rationale for F-test of Differences Among Means
The trick in using the F-statistic to test hypotheses about the variability in population
means is to construct one variance that represents random variability in the data (the denominator)
and another variance that represents random variability PLUS systematic differences between the
means (the numerator). The value of F is expected to increase as the variability in the population
means increases. The use of F to test differences between means was just demonstrated; the
resulting F was 3.333.
The denominator for the F test is the pooled variance, sp² = MSError = 3.6, which represents
the random variation within the two groups. The quantity estimated by sp² is σ², the variance of
the population from which the samples were selected. Another way of saying this is that the
Expected Value of sp² is σ².
Box 3.10 shows an alternative way to calculate the numerator of F to test the difference between
means. The two sample means can be used to calculate a second variance, the variance of the
means. This variance is obviously sensitive to differences among means. This variance will be zero if the
sample means are identical and becomes larger as the difference between the sample means
increases. The calculation of the variance of the means uses the standard formula, except that the
scores are means instead of individual observations. It might appear unusual to calculate a
variance based on just two scores (means), as here, but absolutely nothing precludes it.
In Box 3.10, the numerator of the F test is the variance of the means multiplied by the
number of subjects in each group (n1 = n2 = 6). We multiply by nj because the null hypothesis for
the F-statistic is equality between the population variances from which the sample variances are
calculated, and multiplying nj times the variance of the means produces a numerator that has the
same expected value as the denominator (i.e., σ²) when the null hypothesis of equal means is true.

sp² = 3.6
Mȳ = (12.0 + 10.0) / 2 = 11.0
SSȳ = (+1.0)² + (-1.0)² = 2.0
s²ȳ = 2.0 / (2 - 1) = 2.0
F = n × s²ȳ / sp² = 6 × 2.0 / 3.6 = 3.333

Box 3.10. Alternative Calculation of F Ratio.
To understand the reasoning, consider the Central Limit Theorem (CLT), which you may
have learned in previous statistics courses. The CLT states that if an infinite number of samples
are randomly selected from a single population, then σ²ȳ = σ²y / n; that is, the variance of the sample
means will equal the variance of the original population divided by n, the sample size. Basic
algebra leads to n × σ²ȳ = σ²y. Therefore, the Expected Value of nj × s²ȳ is σ², the same as the
Expected Value for the denominator, sp². If H0: µ1 = µ2 is true, then the numerator and
denominator of F should be approximately equal (both are estimates of σ²) and F should not take
on large values. If the null hypothesis is false, however, there will be more variability in the
sample means than expected based on the CLT, and F will be larger. How large F must be to
reject the null hypothesis is determined by the critical value of F. The basic logic of ANOVA,
then, is that if there is no variation among the group means in the population, we would expect the
numerator and denominator of the F ratio to be approximately equal. If there is variation among
the group means in the population, we expect the numerator to be larger than the denominator,
and F should be correspondingly larger.
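A small simulation can make the CLT argument concrete. The sketch below is our own (assuming numpy; the population mean, σ, and sample size are arbitrary illustrative values): it draws many samples from a single population and shows that n times the variance of the sample means comes out close to σ², the same quantity that MSError estimates.

import numpy as np

rng = np.random.default_rng(1)
sigma, n, n_samples = 4.0, 10, 100_000

# each row is one random sample of n observations from a single population
means = rng.normal(loc=50.0, scale=sigma, size=(n_samples, n)).mean(axis=1)
print(round(n * np.var(means, ddof=1), 2))   # close to sigma**2 = 16 when H0 (one population) is true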
Box 3.11 shows a similar calculation for the interview study. We first calculate a variance for the
means, and then compute an F ratio with this variance times the number of subjects in the
numerator and MSError in the denominator. This produces the same F as in earlier boxes.

Group    ȳj       ȳj - ȳG
1        27.30    +3.3
2        26.00    +2.0
3        22.70    -1.3
4        20.00    -4.0
                            ȳG = 24.00

s²ȳ = (3.3² + 2.0² + (-1.3)² + (-4.0)²) / (4 - 1) = 32.58 / 3 = 10.860
F = (10 × 10.860) / 13.839 = 108.60 / 13.839 = 7.847

Box 3.11. Alternative Calculation of F for Interview Study.
Supplementary Discussion of ANOVA
Here we consider several
supplementary points that might help in
understanding ANOVA and the use of the F test to test hypotheses about differences between
means. The first point has to do with a consideration of factors that influence the magnitude of F,
and the second with the directionality of the hypotheses tested by the standard F test.
Factors Promoting a Significant F. Examination of earlier calculations for t reveals that
the value of t will increase as its numerator (ȳ1 - ȳ2) increases and as its denominator decreases
(i.e., as the Standard Error, SEȳ1-ȳ2, gets smaller). The Standard Error will be smaller when there
is little variability within the two groups (i.e., SS1 and SS2 are small), and when the ns for the two
samples are large (i.e., 1/n1 + 1/n2 is small). Stated verbally, t is more likely to be significant
when there is a large difference between the means, when there is little variability within the
groups, and when samples are large.
These same characteristics determine the magnitude of F in ANOVA. MSTreatment is larger
as the differences between the means are larger (i.e., as their distance from a common central
value, ȳG, becomes larger). Unless the purpose of a study is to detect modest or minute
differences, researchers generally should aim to manipulate or measure robust differences among
the groups being compared. Comparing 80-year old people to 30-year olds, for example, is a more
powerful manipulation than comparing 60-year olds to 50-year olds.
The denominator of F is also sensitive to the same factors as the denominator of t.
Researchers should try to minimize the SSjs that determine SSError. This means such things as
trying to ensure that all testing is done under as standard conditions as possible (e.g., instructions
that are tape recorded or read verbatim, minimizing distractions), and trying to avoid excessive
variation in participants within groups (e.g., including only participants with similar
backgrounds). Researchers also need to ensure an adequate number of participants since SSError is
divided by N - k to produce MSError, which should be kept as small as possible.
The F Distribution and One- versus Two-Tailed Tests. As noted earlier, the F
distribution is asymmetrical and researchers generally arrange hypothesis tests so that the null
hypothesis is rejected if F is greater than some value (i.e., if F falls sufficiently far in the upper tail
of the distribution). Despite the fact that only one tail of the distribution is used, the ANOVA is
actually a two-tailed test, or to use a less ambiguous terminology, a non-directional test. The
critical factor in deciding whether a test is one- or two-tailed (i.e., directional or non-directional)
is whether the test statistic and its probabilities are sensitive to the direction of the alternative
hypothesis. In testing differences between means, the F-statistic is not sensitive to the direction of
the difference between the means. In the interview study, for example, the same value of F would
have been obtained if the various means were associated with different groups.
Single-Factor Between-S Design SPSS - 3.18
A second way to think of why the default ANOVA is generally two-tailed is by relating the
F and t distributions. Squaring t can be conceptualized as folding over the t distribution vertically
at 0 to produce the F distribution. In squaring t to produce F, both negative and positive values of
t become positive. The upper tail of the resulting F distribution will contain both positive and
negative values of t (i.e., values of t corresponding to both directions of outcome). For a two-
tailed t test, each tail will contain α/2 (often .025) that will both be folded into the upper tail for F.
Hence, the corresponding F distribution will contain a two-tailed α (e.g., .05 = 2 x .025) in its
upper tail. For a one-tailed test, however, t has α (.05) in each tail of the distribution so the
corresponding F distribution (i.e., the folded-over t-distribution) will contain 2×α in the upper tail
(e.g., .10 = 2 x .05).
Being two-tailed, the probabilities in Table A.3 must be divided in half to obtain one-
tailed probabilities. If the desired α is .05 for a one-tailed test, then 2 × α = .10 is the area that we
must use in Table A.3 to determine the critical value of F for a one-tailed test. For our example, df
= 1 and 10, FCritical = 3.28 = 1.812² = tCritical². Given Fobserved = 3.33, we reject H0: µ1 = µ2 and
accept Ha: µ1 > µ2. This is the same conclusion as produced by the one-tailed independent t-test.
If we had a two-tailed Ha: µ1 ≠ µ2, then F.05 = 4.96 (.05 = .025 + .025) would be the critical value
and we would not reject H0.
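These relationships between t and F critical values are easy to confirm numerically. The following is a sketch assuming scipy is available; the Table A.3 values quoted above are rounded versions of the same quantities.

from scipy import stats

# one-tailed t at alpha = .05 (df = 10), squared, equals F with .10 in its upper tail
print(round(stats.t.ppf(.95, 10) ** 2, 3), round(stats.f.ppf(.90, 1, 10), 3))   # both about 3.285

# two-tailed t at alpha = .05, squared, equals the usual F critical value
print(round(stats.t.ppf(.975, 10) ** 2, 3), round(stats.f.ppf(.95, 1, 10), 3))  # both about 4.965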
CONCLUSION
In this and the preceding chapter, we have examined a number of important concepts, and
laid the foundations for the omnibus analysis of studies that involve a single Between-S
categorical predictor (or Factor). We have acquired a general notation for the calculations to
perform ANOVA for the single-factor Between-S design, and seen how a number of SPSS
procedures can perform such analyses. In particular, SPSS provides diverse ways to conduct
single-factor Between-S ANOVA, including the procedures ONEWAY, GLM, and MANOVA. All three can be
specified using syntax and all but MANOVA can also be requested via the Windows menu system.
The format of the output varies somewhat for the different Anovas and versions of SPSS, but all
include the basic quantities calculated previously.
Although the overall (or omnibus) F allows researchers to determine whether they can
reject the null hypothesis of no differences among the groups, it does not provide specific
Single-Factor Between-S Design SPSS - 3.19
information about which groups differ; that is, about the pattern of differences among the group
means. The next few chapters address this question and, in doing so, shed light on the nature of
the predictors in regression approaches to ANOVA.
APPENDIX 3.1: AN ALTERNATIVE NOTATION
Although standard for single-factor Between-S designs, the notation presented in this
chapter does not lend itself readily to generalizing to multiple factor studies. There we will use
sequential uppercase letters to represent both the treatment variables and the number of levels
(i.e., A, B, ...), and lowercase letters, rather than j, as an indicator for the different levels (i.e., a =
1 to A, b = 1 to B, ...). To facilitate the transition to this alternative notation, the single-factor
Between-S notation and corresponding formulas are shown below.

Level of A
        1             2             ...    a             ...    A
        S#    y       S#    y              S#    y              S#    y
        1     y11     1     y12            1     y1a            1     y1A
        2     y21     2     y22            2     y2a            2     y2A
        .     .       .     .              .     .              .     .
        s     ys1     s     ys2            s     ysa            s     ysA
        .     .       .     .              .     .              .     .
        n1    ȳ1      n2    ȳ2             na    ȳa             nA    ȳA      N    ȳG

$$SS_{Total} = \sum_{a=1}^{A}\sum_{s=1}^{n_a}(y_{as} - \bar{y}_G)^2$$

$$SS_{Error} = \sum_{a=1}^{A}\sum_{s=1}^{n_a}(y_{as} - \bar{y}_a)^2 = \sum_{a=1}^{A}SS_a = \sum_{a=1}^{A}(n_a - 1)s_a^2$$

$$SS_{Treatment} = \sum_{a=1}^{A}\sum_{s=1}^{n_a}(\bar{y}_a - \bar{y}_G)^2 = \sum_{a=1}^{A}n_a(\bar{y}_a - \bar{y}_G)^2$$

$$MS_{Treatment} = \frac{SS_{Treatment}}{A - 1} \qquad MS_{Error} = \frac{SS_{Error}}{N - A} \qquad F = \frac{MS_{Treatment}}{MS_{Error}} \qquad df = A - 1,\ N - A$$
CHAPTER 04:
PAIRWISE COMPARISONS FOR THE
SINGLE-FACTOR BETWEEN-S DESIGN
Introduction to Pairwise Comparisons . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
Overview of Post-Hoc Procedures . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . . . . 4
Selecting a Pairwise Procedure . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . . . . . 4
Summarizing Results of Pairwise Comparisons . . . . . . . .. . . . . . . . . . . . . . . . . . . . . . . . 6
Least Significant Difference Method . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
Description of LSD Test . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . . . 7
Calculating the LSD Test for the Brain Stimulation Study . . . . . . . . . . . . . . . . . . . . . . . . 8
SPSS and LSD tests for the Brain Stimulation Study .. . . . . . . . . . . . . . . . . . . . . . . . . . 10
The LSD Procedure and Type I Errors . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . . . . 11
The Bonferroni Correction . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
Bonferroni using Syntax . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . . 13
Tests Using the Studentized Range Statistic . . . . . . . .. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
Studentized Range Statistic . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . . . . 14
LSD and the q Statistic . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . . . 16
TUKEY Honestly Significant Difference . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . . . 16
The Student-Newman-Keuls Method . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . 17
Tabular Arrangement of Comparisons . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . . . 18
More SPSS Post Hoc Analyses for the Brain Stimulation Study . . . . . . . . . . . . . . . . . . . 21
Post Hoc Comparisons Using Windows . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 22
Comparison of Post Hoc Procedures . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . . . 24
Pairwise Comparisons for the Interview Study . . . . . .. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
The LSD t Test . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 26
The q Statistic . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . . 27
SPSS Analyses for the Interview Study . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . . . . . 29
Calculating Critical and P Values for Post Hoc Tests . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . . . . . . 34
The alternative hypothesis in ANOVA is quite general; it states only that one or more of
the H0 equalities is false. With more than a few means, there are many possible ways in which
the H0 can be false. Rejection of the H0 does not therefore permit a precise conclusion about
which means or combinations of means are different from one another. In the interview study,
for example, the omnibus F-test (i.e., the F for the general comparison among the four groups)
does not specify which instructional conditions or which combinations of conditions (e.g., 1+2
versus 3+4) differed significantly from one another. Less obviously, failure to reject an omnibus
H0 does not necessarily mean that more specific differences or comparisons would not be
significant. The overall F-test can be insensitive to more specific comparisons, especially when k
is large. This can occur, for example, when SSTreatment is divided by df = k - 1 even though most of
SSTreatment is in fact due to differences between one group and the rest, a df = 1 effect.
INTRODUCTION TO PAIRWISE COMPARISONS
The purpose of multiple comparison techniques is to test specific hypotheses about
differences between means, following either a significant or nonsignificant omnibus F. This
two-step approach is analogous to testing the significance of the overall regression in Multiple
Regression (i.e., F for R²), accompanied by tests of the significance for specific components
represented by the regression coefficients or srs (i.e., ts for bs or Fs for SSChange).
This and the next chapter describe pairwise procedures to compare individual means with
one another (e.g., ȳ1 versus ȳ2). Pairwise comparison procedures are typically used when
researchers have no prior expectation about which groups should differ, and are performing what
are called post hoc or a posteriori comparisons (i.e., after the fact). Chapter 6 describes
techniques that can be used for either pairwise comparisons or for more complex comparisons
that involve multiple means, such as (ȳ1 + ȳ2)/2 versus ȳ3. Those methods are most appropriate
when researchers have planned or predicted before examining the data which groups should
differ from one another and the direction of difference. Such comparisons are called a priori or
planned comparisons, or contrasts. Although researchers would ideally plan focussed contrasts
sensitive to specific predictions about what means are expected to differ from one another, many
Anova situations (perhaps too many?) call for post hoc procedures.
Either all k × (k - 1) / 2 possible pairwise comparisons are made (OR fewer specific comparisons
are made based on examination of the means), and the omnibus F-test is significant.

Box 4.1. Conditions for Post Hoc Pairwise Comparisons.
Overview of Post-Hoc Procedures
A wide variety of procedures have been
developed for performing pairwise
comparisons. The post hoc pairwise procedures
considered in this chapter are typically used
under the conditions stated in Box 4.1, although
exceptions to these principles do exist for some
post hoc procedures. The most common use of post-hoc procedures is when the researchers want
to compare all possible pairs of means. With 4 groups, for example, there are 4 × 3 / 2 = 6
pairwise comparisons involving the following combinations of groups: 1-2, 1-3, 1-4, 2-3, 2-4,
and 3-4. With 5 groups, there would be 5 × 4 / 2 = 10 pairwise comparisons, specifically, 1-2, 1-
3, 1-4, 1-5, 2-3, 2-4, 2-5, 3-4, 3-5, and 4-5.
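The counting rule k × (k - 1)/2 is just the number of ways to choose two groups from k, as a brief Python sketch (our own illustration, not part of the book) shows:

from itertools import combinations

for k in (4, 5):
    pairs = list(combinations(range(1, k + 1), 2))
    print(k, len(pairs), pairs)   # 6 pairs for k = 4, 10 pairs for k = 5, as listed above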
A second important requirement for most post hoc procedures is that the omnibus F be
significant. According to this criterion, a non-significant F means that researchers are not
permitted to “poke around” looking for significant differences among all the possible pairwise
comparisons. The reason for this, as explained later, is that looking at all possible pairwise
comparisons results in a highly inflated Type I error rate. Requiring a significant omnibus F
provides some modest protection against the Type I error.
We discuss four pairwise comparison methods (of many in the literature): the Least
Significant Difference (LSD) method, the Bonferroni procedure, one of several methods
developed by Tukey (TUKEY), and the Student-Newman-Keuls (SNK) method. Because the
BONFERRONI test benefits from statistical programs to compute the required probabilities, this
test will be discussed primarily in the context of SPSS analyses for pairwise comparisons.
Selecting a Pairwise Procedure
The above pairwise comparison methods vary in how conservative or liberal they are with
respect to the probability of a Type I Error. The LSD method is excessively liberal for most
research purposes and has a high probability that one of the comparisons found significant is in
fact a Type I Error. The TUKEY method is quite conservative (i.e., not liberal) and keeps the
probability of one or more Type I Errors under stricter control. The SNK method falls between
the LSD and TUKEY methods on the liberal-conservative scale. The Bonferroni is even more
conservative than TUKEY.
In a complementary way, the methods also vary in their power; that is, in their ability to
detect differences between group means when they do exist in the population. Power is the
complement of a Type II Error, which is a failure to reject a false null hypothesis, that is, power =
1 - ß = 1 - p(Type II Error). The LSD method is the most powerful and has the lowest probability
of a Type II Error, the SNK method is next most powerful, followed by TUKEY and then
BONFERRONI, which is the least powerful and has the highest probability of a Type II Error.
Note how protection against Type I and Type II Errors are inversely related. The
procedure that protects most against the Type I Error (BONFERRONI) protects least against the
Type II Error, and the procedure that protects least against the Type I Error (LSD) protects most
against the Type II Error. Researchers often must juggle to balance the competing demands for
protection from Type I Errors and for power to detect differences when they do exist. Some of
the post hoc procedures considered here are NOT generally recommended because they are either
too liberal or too conservative with respect to Type I errors.
Although we only consider four of many post hoc procedures, a natural question is which
of these procedures to use. This question is even more urgent, of course, if one considers the full
range of options that are available in performing pairwise and other post hoc comparisons. The
choice ultimately boils down to a balancing of Type I and Type II errors. Let us consider two
extremes that represent prototypical studies for the use of liberal (e.g., LSD) and conservative
(e.g., TUKEY, BONFERRONI) procedures.
Consider a pilot study in which a relatively small number of participants have been tested
in order to determine whether a larger-scale study is worthwhile, and the larger-scale study does
not entail inordinate costs for the experimenter or risks for subjects. Here the researcher might
be more concerned about missing a possible real effect (i.e., making a Type II Error) than about
wrongly concluding that there was a real difference (i.e., making a Type I Error). The researcher
might choose to use a liberal procedure (e.g., LSD) in order to perform pairwise comparisons,
and might even be willing to use a more liberal alpha value for the omnibus F (e.g., α = .10).
Consider next a study in which a reasonable number of participants have been tested, and
very real and perhaps expensive consequences are going to happen as a result of the statistical
conclusions. For example, perhaps some expensive albeit risky medical treatment would be
recommended for large numbers of patients if the H0 is rejected. Now we can appreciate that
Type I Errors are very important. That is, the researcher does not want to wrongly conclude that
a particular treatment is effective when it is not because this conclusion would entail much
expense and some (unnecessary) risk for clients. Instead the researcher now wants to be quite
certain that the treatment is worth the expense and risk. The prudent researcher would decide to
use a conservative procedure (e.g., TUKEY, Bonferroni) to perform the pairwise comparisons.
Although, unhappily, real-life research seldom involves such clear-cut distinctions, these
prototypes symbolize the factors that researchers must balance in deciding about the most
appropriate post-hoc procedure. The primary question is whether the costs and dangers of Type I
Errors are more or less serious than the costs and dangers of Type II Errors. We begin with the
LSD procedure, which would be most appropriate when an elevated probability of a type I error
held little risk or cost.
Summarizing Results of Pairwise Comparisons
Because there are so many comparisons being made, the results of pairwise comparison
tests can become confusing. Several methods have been developed for summarizing the results
of multiple tests of significance. One convention is to order the means from smallest to largest,
and draw lines under means that are NOT significantly different from one another (i.e., group
means together as sets whose members are not significantly different from one another).
The procedure is illustrated in Box 4.2 for several outcomes involving three groups and
different patterns of results. Outcome 1 in Box 4.2 shows groups 2 and 1 and groups 1 and 3
joined by distinct lines, indicating that the 2-1 and 1-3 t-tests were not significant. There is no
shared line joining groups 2 and 3, indicating that this difference was significant. Outcome 2
shows a single line joining all three groups (i.e., putting all in a single set). This indicates that all
three comparisons were nonsignificant. The final outcome, outcome 3, shows that the difference
between groups 2 and 1 was not significant (i.e., they belong to the same set), leading to the
inference that 2-3 and 1-3 comparisons were significant.
Group        2      1      3
Mj =         3.0    5.0    8.0

Outcome 1    ---------             2-1 NS
                    ---------      1-3 NS
                                   2-3 Sig

Outcome 2    ----------------      2-1, 2-3, 1-3 NS

Outcome 3    ---------             2-1 NS
                                   2-3, 1-3 Sig

Box 4.2. Alternative (hypothetical) outcomes for pairwise comparisons.
Note that the conclusion in outcome 1 is somewhat anomalous, because groups 2 and 3
do differ significantly, although neither differs significantly from group 1. One common difficulty
with pairwise comparison procedures is that the overall pattern of differences may not be very
easy to describe and may be even more difficult to explain. This is less likely to happen using a
priori procedures that test for particular patterns in the means.
LEAST SIGNIFICANT DIFFERENCE METHOD
One simple method of performing pairwise comparisons is to compute t-statistics for each
pair of means and compare the observed ts to the critical value for a desired α. The LSD method
is in fact a t-test, sometimes called a protected t-test because the omnibus F should be significant
before these post hoc ts are done. If the omnibus F is significant, then some means or
combination of means differ from one another, thus justifying (weakly) pairwise comparisons
using the t-test. This method is called the Least Significant Difference method because the
difference between means required for significance is less than in other post hoc procedures.
Description of LSD Test
Equation 4.1 shows one formula for the LSD t-test, which is a slight modification of the
standard t-test formula. In the formula, j and j’ represent any two of the k groups being
compared to one another (e.g., j = 1, j’ = 2). The only difference from the standard formula for t
is that MSError appears in the denominator instead of the pooled variance, sp2. Because the
denominator uses MSError rather than sp2, all of the LSD t tests will involve the same denominator
as long as nj and nj’ are the same (e.g., as in studies that involve equal njs). This greatly simplifies
computations. The use of MSError also means that the df for the t are the same as for dfError.
$$H_0: \mu_j = \mu_{j'} \qquad H_a: \mu_j \neq \mu_{j'}$$

$$t = \frac{(\bar{y}_j - \bar{y}_{j'}) - 0}{\sqrt{MS_{Error}\left(\frac{1}{n_j} + \frac{1}{n_{j'}}\right)}} = \frac{q}{\sqrt{2}} \qquad df = df_{Error}$$

Equation 4.1. Formula for LSD t Test.
Area of Stimulation (nj = 3)
          NS(1)    A(2)    B(3)
ȳj        5.0      3.0     8.0       ȳG = 5.333

H0: µ1 = µ2 = µ3
Ha: at least one pair or combination of µs differ

Source      df         SS     MS      F
Treatment   k-1 = 2    38.0   19.00   9.5
Error       N-k = 6    12.0   2.00
Total       N-1 = 8    50.0

F.05;2,6 = 5.14     FObs ≥ Fα  ∴ Reject H0

Box 4.3. Single-Factor Between-S ANOVA.
Equation 4.1 shows as well the relationship between the t statistic and the q statistic used
for the SNK and Tukey procedures and described in Chapter 5. Dividing q by the square root of
2 gives the corresponding t, or equivalently, multiplying t by the square root of 2 gives the
corresponding q. These equivalencies can be used to relate either observed values of the
statistics, that is, tObserved and qObserved, or the critical values of the statistics, that is, tCritical and
qCritical. These equivalencies help us on occasion with calculations of the different statistics, and
as well in conceptualizing the relationship among the different post hoc procedures.
Calculating the LSD Test for the Brain Stimulation Study
The multiple comparison procedures will be illustrated first using the brain stimulation study in
Box 4.3. Nine animals were randomly assigned to one of three groups: a No Stimulation control
group (A), and a critical area B treatment group (B). Researchers measured the number of bar
presses per minute after 10 sessions in the conditioning phase of the study (pressing rewarded by
brain stimulation). Because observations in the different groups are uncorrelated, this is a
between-S ANOVA design. Calculations for the omnibus ANOVA are shown in Appendix 4-1.
The significant F in Box 4.3 indicates that there is more variability among the treatment means
than would be expected given the null hypothesis of no treatment effect (i.e., no differences
among the three population µs or, equivalently, that the three samples come from the same
population and therefore share a common µ).
H0: µi = µj     Ha: µi ≠ µj (two-tailed)

t = (ȳi - ȳj) / √(MSErr(1/ni + 1/nj))     compare to tα/2, dfErr = N - k

LSD Comparisons for Sample Study
df = N - k = 9 - 3 = 6     t.05,6 = 2.447
denominator of t = √(2(1/3 + 1/3)) = 1.1547

tA-NS = (3 - 5) / 1.1547 = -1.732     No Rej H0: µA = µNS
tB-NS = (8 - 5) / 1.1547 =  2.598     Reject H0: µB = µNS
tB-A  = (8 - 3) / 1.1547 =  4.330     Reject H0: µA = µB

Summary       A      NS     B        Groups ordered from lowest
ȳj =          3.0    5.0    8.0      to highest mean
              -----------            A and NS NOT significantly different

Box 4.4. Least Significant Differences (LSD) Analysis.
Although the H0: µ1 = µ2 = µ3 is rejected, the researchers cannot state which means or
combinations of means differ from one another. The ANOVA only shows that some pairs or
combination of means vary, but not which one(s). Multiple comparison procedures provide more
specific conclusions about the differences between the means. One option is to perform all
pairwise comparisons, a situation that normally surfaces when researchers do not have specific
expectations about which groups will differ.
Box 4.4 summarizes the LSD calculations and shows the LSD results for the brain stimulation
study. The shared denominator for the t-test is calculated using MSError from the omnibus
ANOVA in Box 4.3, and the
denominator is then used to calculate three ts, one each for A - NS, NS - B, and A - B. The
differences between B and the other two groups are significant, whereas the control groups A and
NS do not differ significantly from one another. The quantities shown in square brackets [] with
the t-values will be used later to compare the LSD and other multiple comparison procedures.
One convention to describe pairwise comparison results, as described previously, is to
order the means from smallest to largest, and draw lines under means that are NOT significantly
different from one another. The summary is presented at the bottom of Box 4.4. The LSD test
shows that A and NS are not significantly different, but both differ from B. A single line is
therefore drawn under A and NS. The line does not extend under B, which differs significantly
from both A and NS. The line indicates that groups underlined together belong to a set of groups
whose means do not differ significantly from one another. Note this corresponds to outcome 3 in
Box 4.2. We later illustrate a second way to summarize pairwise comparison results in table
format. In the next few sections we illustrate various ways to perform the LSD test using
ONEWAY and GLM.
SPSS and LSD tests for the Brain Stimulation Study
Using syntax, the /POSTHOC = method option on ONEWAY provides a number of
standard methods for performing pairwise comparisons, including the Least Significant Difference
t-test (LSD). The term in brackets is the SPSS keyword for this method. To illustrate, the
command ONEWAY dep BY group /POSTHOC = LSD requests a between-S ANOVA followed
by unadjusted comparisons among all pairs of means (i.e., a t-test or the equivalent, a q-statistic
using a stretch of 2; the q statistic is discussed in the next section). The different pairwise
procedures discussed here and in following sections can be requested in one analysis by including
several different tests (e.g., /POSTHOC = LSD SNK). Researchers can also specify the desired
alpha level for some of the tests, for example, /POSTHOC = LSD (.10). ONEWAY also still
recognizes an earlier terminology for these post hoc tests; specifically, the /RANGES =
subcommand is equivalent to /POSTHOC =.
Box 4.5 shows the syntax commands to enter the data and perform pairwise comparisons
for our brain stimulation example, along with the omnibus ANOVA and LSD results. As shown
previously, the omnibus F is significant, suggesting that one or more of the equalities implied by
the H0: µ1 = µ2 = µ3 are false. Pairwise comparisons may help to determine which groups differ.
The /POSTHOC = LSD subcommand in Box 4.5 requests the LSD test also be reported. The
omnibus F is followed by the results for the post hoc tests. The results here have been edited to
remove some lines in the output as SPSS performs redundant comparisons (e.g., 1 vs. 2 and 2 vs.
1).

DATA LIST FREE / group press.
BEGIN DATA
1 4 1 5 1 6
2 3 2 2 2 4
3 10 3 8 3 6
END DATA.
ONEWAY press BY group /POSTHOC = LSD.

                  Sum of Squares   df   Mean Square   F       Sig.
Between Groups    38.000           2    19.000        9.500   .014
Within Groups     12.000           6    2.000
Total             50.000           8

Post Hoc Tests
Multiple Comparisons
LSD
(I) group   (J) group   Mean Difference (I-J)   Std. Error   Sig.
1.0000      2.0000       2.0000000               1.1547005    .134
            3.0000      -3.0000000(*)            1.1547005    .041
2.0000      3.0000      -5.0000000(*)            1.1547005    .005

Box 4.5. SPSS commands for ANOVA and pairwise comparisons, with the LSD results.

The I and J columns denote the levels of the factor that are being compared; I = 1 and J = 2
indicates that group 1 is being compared to group 2. The mean difference column provides the
difference between the two respective means, and the standard error column provides what would
be the denominator for the t-test. SPSS then provides significance levels for each comparison, but
not actual observed values for the test statistics, either t or the q statistics discussed later.
Dividing the mean difference by the standard error would produce the t-value.
The p values confirm our previous manual analyses. Groups 1 and 2 do not differ
significantly, whereas groups 1 and 3 and groups 2 and 3 do differ significantly. The summary of
these results would be identical to that shown in Box 4.4.
The identical results to Box 4.5 can be obtained using the following GLM commands:
GLM press BY group /POSTHOC = group(LSD), or by requesting the LSD procedure using
menus and either ONEWAY or GLM.
The LSD Procedure and Type I Errors
A serious problem with multiple t-tests for post hoc comparisons is that the probability of
a Type I Error is controlled at α for each individual comparison, but the probability of one or more
Type I Errors for the entire experiment or family of comparisons can become very high across the
k × (k - 1) / 2 comparisons. A good analogy would be tossing a coin 6 times and thinking that the
probability of obtaining a head somewhere in those 6 tosses is only .5. The probability of a head on each
toss of the coin is .5, but the probability of at least one head (i.e., one or more heads) on 6 tosses is much higher than .5. In fact,
 k    c = # pairwise comparisons = k × (k-1)/2              αExp ≈ 1 - (1 - α)^c
 2     1   (12)                                                     .050
 3     3   (12 13 23)                                               .143
 4     6   (12 13 14 23 24 34)                                      .265
 5    10   (12 13 14 15 23 24 25 34 35 45)                          .401
...
10    45   (12 13 14 15 16 17 18 ...)                               .940
Box 4.6. Experiment-wise Type I Error Rate.
the probability of one or more heads is one minus the probability that no heads are tossed on the 6
trials, which equals 1 - .5⁶ = 1 - .016 = .984. There is a very high probability that one or more
heads will occur on 6 tosses of a fair coin, and it would be incorrect to use p = .5 rather than .984
as the probability.
Most of the time, multiple t-tests are similarly inappropriate for post hoc comparisons
because α for the experiment would be excessively high. The probability of one or more Type I
Errors across all the comparisons
of the experiment is sometimes
referred to as the experimentwise
(or familywise) α, symbolized
here as αExp; α by itself refers to
the probability of a Type I Error
for each comparison. Box 4.6
shows how αExp quickly
increases as the number of comparisons increases. Consider the case of all pairwise comparisons
between k = 4 groups. There are 4 × (4 - 1) / 2 = 6 possible t-tests (1-2, 1-3, 1-4, 2-3, 2-4, 3-4),
with the probability of a Type I Error for each individual comparison being at the specified α level
(say .05). The calculation for the probability of at least one Type I Error is: αExp = 1 - p(no Type I
error) ≈ 1 - (1 - α)⁶ = 1 - .95⁶ = 1 - .735 = .265. This is much higher than the .05 level that
researchers normally strive for. As shown in Box 4.6, the 45 comparisons among 10 groups
would virtually guarantee one or more Type I errors (p = .94).
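For readers who want to reproduce these experimentwise rates, the following SPSS syntax sketch uses only commands already illustrated in this chapter (DATA LIST, COMPUTE, FORMAT, LIST); the variable names k, c, and aexp are ours, and ** is the SPSS exponentiation operator:

DATA LIST FREE / k.
BEGIN DATA
2 3 4 5 10
END DATA.
* Number of pairwise comparisons and experimentwise alpha for a per-test alpha of .05.
COMPUTE c = k * (k - 1) / 2.
COMPUTE aexp = 1 - (1 - .05) ** c.
FORMAT k c (F2.0) aexp (F5.3).
LIST.

For k = 4 this reproduces the .265 shown in Box 4.6.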
THE BONFERRONI CORRECTION
One way to correct for the inflated probability of a Type I
error when conducting pairwise comparisons is to divide the p-value required for significance (i.e., alpha) by the total
number of comparisons being made. If three comparisons are made with an experiment-wise
alpha of .05, for example, a difference must produce a p-value of .017 (i.e., .05 / 3 = .017) in order
to be significant. This procedure is known as the Bonferroni correction and is available in several
of SPSS’s ANOVA programs. Bonferroni procedures generally need computer assistance in order
to determine the observed ps required to conduct the tests (or equivalently the critical values for t
or q given non-standard p values). Given an alpha of .017, we can see from the p values in Box
4.5 that only the difference between groups 2 and 3 will be significant using this procedure.
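To make the arithmetic of the correction explicit, here is a minimal sketch in SPSS syntax (variable names ours) that takes the three LSD p values from Box 4.5 and applies both versions of the rule: dividing alpha by 3 and, equivalently, multiplying each p by 3:

DATA LIST FREE / comp plsd.
BEGIN DATA
12 .134
13 .041
23 .005
END DATA.
* Per-comparison alpha under Bonferroni, and the Bonferroni-adjusted p values.
COMPUTE alphaper = .05 / 3.
COMPUTE pbonf = plsd * 3.
COMPUTE sig = (pbonf <= .05).
FORMAT comp (F2.0) alphaper plsd pbonf (F5.3) sig (F1.0).
LIST.

Only the 2 - 3 comparison should come out with sig = 1, matching the conclusion above.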
Bonferroni using Syntax
Box 4.7 shows the SPSS commands and output for the Bonferroni test of our three-group
study. The format of the command is similar to the LSD test, but with a different posthoc
procedure specified. Box 4.7 uses the keyword BONFERRONI (or rather its abbreviation). The
significance levels in Box 4.7 can be compared directly to the standard alpha of .05. In essence,
SPSS conducts the Bonferroni adjustment by multiplying the p values for the LSD procedure by
the number of comparisons. The reasoning is that any LSD p value that is significant when
compared to .05 divided by the number of comparisons, will be significant if multiplied by the
number of comparisons and compared to .05. Note that the p values in Box 4.7 are 3 times the p
values in Box 4.5. The results confirm that only the difference between groups 2 and 3
(stimulation areas A and B) is significant.
The important lesson from the increased p values, which can also apply to other post hoc
procedures, is that the Bonferroni procedure is very conservative and perhaps should be avoided
unless there are exceptionally harmful consequences to making a Type I Error. Using such a
conservative test necessarily increases greatly the probability of a Type II Error (i.e., failing to
reject a false null hypothesis).
One anomaly that can arise when p values are multiplied by the number of comparisons is
that the resulting product may be greater than 1.0, something that cannot occur with a probability.
ONEWAY press BY group /POSTHOC = BONF.
...
Post Hoc Tests
Multiple Comparisons
Bonferroni
(I) group   (J) group   Mean Difference (I-J)   Std. Error    Sig.
 1.0000      2.0000          2.0000000           1.1547005    .402
             3.0000         -3.0000000           1.1547005    .122
 2.0000      3.0000         -5.0000000(*)        1.1547005    .015
Box 4.7. Bonferroni test using ONEWAY.
In that case, SPSS prints 1.0 as the p value for that comparison.
The Bonferroni test can be specified in the GLM procedure. The syntax commands for our
data would be: GLM press BY group /POSTHOC = group(BONF). The Bonferroni test is also
one of the options on the Post Hoc menu for ONEWAY and GLM, as shown shortly.
TESTS USING THE STUDENTIZED RANGE STATISTIC
The fact that multiple t-tests (i.e., the LSD method) provide very weak protection against
experimentwise Type I Errors challenges their use under most post hoc conditions. Researchers
need to use a higher significance level for each comparison so that the experimentwise error rate
does not become excessive. Although it is possible to perform such adjustments with the t-test
(e.g., set α for each comparison to αEXP divided by the # comparisons, as described for the
Bonferroni procedure), the two procedures that we consider next (TUKEY and SNK) make use of
the Studentized Range Statistic (q), a statistic that is different from t, although related (as we shall
see). Both the TUKEY and SNK methods adjust for the number of comparisons being made, but
differ in how extreme the adjustment is. We will also show that the LSD method can be viewed
as a q-statistic with no adjustment for the number of groups. Casting all tests in terms of the
q-statistic allows a clearer comparison of the various posthoc methods.
Studentized Range Statistic
Critical values for the q statistic were determined originally by selecting varying numbers
(k) of samples of a given size from a common population of scores. The two most extreme
groups (i.e., the highest and lowest means of the k groups) were contrasted using the Studentized
Range q statistic shown in Equation 4.2 with ȳj = ȳMax and ȳj′ = ȳMin. Critical values of q are
shown in Appendix A.4 as a function of dfError and a parameter called stretch, which for now can
be thought of as varying with k, the number of groups. As the stretch increases, the value of q
needed to reject the null hypothesis also increases. The reasoning is that the more groups sampled
from the population, the greater the likelihood of getting larger differences between the most
extreme groups, just by chance alone. Correcting for the number of groups maintains the
experimentwise Type I Error rate closer to α while making all possible comparisons between pairs
of means. In other words, raising the critical value of q as k increases corrects for the increased
$$H_0: \mu_j = \mu_{j'} \qquad\qquad H_a: \mu_j \neq \mu_{j'}$$

$$q = \frac{\bar{y}_j - \bar{y}_{j'}}{\sqrt{MS_{Error}/n_j}}
    = \frac{\bar{y}_j - \bar{y}_{j'}}{\sqrt{\frac{MS_{Error}}{2}\left(\frac{1}{n_j}+\frac{1}{n_{j'}}\right)}}
    = t\sqrt{2} \qquad df = df_{Error}$$

Equation 4.2. Formula for q Statistic.
n1 = n2 = n3 = nj = 3     MSError = 2.0

Denominator q = √(MSError/nj) = √(2.0/3) = .8165

Ordered Means     ȳA      ȳNS     ȳB
                 3.00    5.00    8.00

qB-A  = (8.0 - 3.0)/.8165 = 6.124   [= √2 × 4.330 = √2 × tB-A]
qNS-A = (5.0 - 3.0)/.8165 = 2.449   [= √2 × 1.732 = √2 × tNS-A]
qB-NS = (8.0 - 5.0)/.8165 = 3.674   [= √2 × 2.598 = √2 × tB-NS]
Box 4.8. Calculation of Studentized Range Statistics (q).
experimentwise Type I Error rate as the number of possible comparisons increases.
Equation 4.2 shows two formulas for the q statistic. Either version can be used when the
samples have equal njs, but the second must be used when the samples have unequal njs. If you compare the
second version with the formula for t, you will see that the only difference is the presence of 2 as a
divisor for MSError under the square root sign. In essence, the denominator for q will be √2 smaller
than the denominator for t, which means that q will be √2 larger than t. As for the LSD t, the qs for
pairwise comparisons will involve a common denominator unless ns for the various groups differ
from one another.
Box 4.8 shows calculations of the q-statistic for the brain stimulation study. The
denominator used for the three tests (.8165) depends on the number of observations per treatment
group (nj = 3 in this study) and the MSError from the omnibus ANOVA (2.0). The differences
between the three pairs of means are divided by this common quantity to determine the qs.
Comparison of the qs in Box 4.8 and the ts in Box 4.4 shows that the qs are larger than the ts by a
factor of √2, because the denominator for q is √2 smaller than the denominator for t, which makes
the final values for q larger. This is most easily seen in the unequal n version of Equation 4.2.
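The same arithmetic can be scripted; the sketch below (variable names ours) reads the three pairs of ordered means from Box 4.8 and divides each difference by the common denominator, which should reproduce the qs, and the corresponding ts, of Box 4.8:

DATA LIST FREE / m1 m2.
BEGIN DATA
3.0 5.0
3.0 8.0
5.0 8.0
END DATA.
* Common denominator for q: SQRT(MSError / nj), with MSError = 2.0 and nj = 3.
COMPUTE denom = SQRT(2.0 / 3).
COMPUTE q = (m2 - m1) / denom.
COMPUTE t = q / SQRT(2).
FORMAT m1 m2 (F4.1) denom q t (F6.3).
LIST.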
The TUKEY and SNK procedures both use the q statistic, and use identical calculations to
obtain the observed values of q for the various comparisons. The two methods differ in what
value is used for the stretch parameter in Appendix A.4 and hence in the critical value against
which the observed qs are compared. But first, let’s redo the LSD test using the q statistic.
LSD and the q Statistic
We have stated that the LSD procedure makes no adjustment for the number of
comparisons (i.e., for the number of groups when all pairwise comparisons are made). Each test
is done as though we were only comparing the two groups involved in our test. In terms of the q-
statistic, this means that we use a stretch of two (i.e., k = 2) for all of our comparisons. The value
of v in Appendix A.4 is dfError, which equals 9 - 3 = 6 in the present example. The critical value for
the three q statistics would therefore be 3.46. Our observed q-values were qNS-A = 2.449 (not
significant), qNS-B = 3.674 (significant), and qA-B = 6.124 (significant).
This is (necessarily) the same conclusion that we came to earlier using the t statistic. Note
the values in square brackets ([ ]) in Box 4.8. These show that the observed t and q statistics are
related to one another by the factor of √2, that is, q = t × √2 and t = q / √2. Note also that the
critical values of t and q are similarly related; that is, tCritical = qCritical / √2, or √2 × tCritical = qCritical.
Specifically for our example, tCritical × √2 = 2.447 × √2 = 3.46 = qCritical. Because the critical values
of t and q are related to one another in exactly the same way that observed values of t and q are
related, the t and q versions of the LSD procedure must always come to the identical conclusions.
Note, however, that the equivalence occurs because we made no adjustment for the actual number
of groups in the study. Other procedures do make an adjustment.
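This relationship can be confirmed numerically with the inverse distribution functions IDF.T and IDF.SRANGE, which are demonstrated more fully at the end of this chapter; a minimal sketch (variable names ours) for dfError = 6:

DATA LIST FREE / df.
BEGIN DATA
6
END DATA.
* Two-tailed t critical value at alpha = .05, and the q critical value for a stretch of 2.
COMPUTE tcrit = IDF.T(1 - .05/2, df).
COMPUTE qcrit = IDF.SRANGE(1 - .05, 2, df).
COMPUTE ratio = qcrit / tcrit.
FORMAT df (F2.0) tcrit qcrit ratio (F6.3).
LIST.

The ratio should come out at approximately 1.414, that is, √2.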
TUKEY Honestly Significant Difference
The TUKEY Honestly Significant Difference test uses the number of groups (k) as the
stretch parameter to determine the critical value of q for all comparisons. Using k = 3 and v = 6
(i.e., dfError), q.05 = 4.34 for the TUKEY procedure. Note that 4.34 is considerably greater than the
critical value of 3.46 for the LSD procedure. Any pairwise q falling between these values will be
significant using the LSD procedure, but not significant using the TUKEY procedure. Given the
TUKEY critical value, only the difference between A and B is significant. Note that there is a
single critical value of q for the TUKEY method, just as there was a single critical value of q for
the LSD method (and for the equivalent t).
The results of the TUKEY procedure correspond to the outcome presented earlier as one
hypothetical possibility for the pairwise comparison procedures; that is, our conclusions could be
summarized as in Box 4.3. Note that the difference between B and NS is significant by the LSD
test but is not significant by the TUKEY test. The difference between means must be larger to be
significant by TUKEY than by LSD. That is what makes the TUKEY method more
conservative than the LSD method. Because we are less likely to reject the null hypothesis, we
are less likely to make a Type I error (i.e., reject a true null hypothesis). Of course, TUKEY is
also less powerful for the same reason; that is, we are more likely to make a Type II error (i.e., fail
to reject a false null hypothesis). More real differences between means will be missed (Type II
Error) because they are too small to be detected by the increased critical value.
The Student-Newman-Keuls Method
Although the TUKEY test corrects for the inflated probability of a Type I Error that occurs
with the LSD method (and although TUKEY is less conservative than BONFERRONI, as shown
later), there is some concern that the adjustment is still too extreme. That is, the TUKEY test may
be too conservative. One way to think of this intuitively is that the TUKEY method assumes that
the two most extreme of the k groups are being compared, but in fact only one of the comparisons
is between the highest and lowest of the k groups. The other comparisons involve means that are
not the most extreme, and hence may be handicapped by using a single critical value based on k.
This is a concern because excessive Type II Errors can occur for overly conservative methods.
That is, researchers will too often fail to reject a false null hypothesis.
To compensate for TUKEY’s conservativeness, it is possible to vary the stretch value used
to obtain the critical value for q. One such procedure is the Student-Newman-Keuls (or SNK)
method, named after various individuals involved in its development. The essential idea behind
the SNK method is that the stretch used to find the critical value of q is equal to the number of
steps between ȳj and ȳj′, inclusive, when the means are ranked from the smallest to largest. This
number of steps is known as the stretch or range of the groups.
When comparing the two most extreme groups, the critical value for SNK will equal the
critical value for TUKEY, because the stretch between the highest and lowest means is always k,
the number of treatment conditions. When comparing adjacent groups, the stretch will be 2,
          A      NS      B        SNK Critical Values (dfError = 6)
ȳj =     3.0    5.0     8.0          Stretch     qα      qObserved
B-A      1---2---3                      3       4.34      6.124*    Reject H0: µA = µB
NS-A     1---2                          2       3.46      2.449     No Rej H0: µA = µNS
B-NS          1---2                     2       3.46      3.674*    Reject H0: µNS = µB

          A      NS      B
ȳj =     3.0    5.0     8.0
         ----------               A vs. NS not significantly different
Box 4.9. SNK Pairwise Comparisons.
as though there were only two groups being compared and equivalent to the LSD procedure.
Between these extremes, the stretch will be the span for the specific groups being compared (and
including other groups that fall between these when means are ordered from lowest to highest). In
essence, the value of q required to reject H0 becomes larger as the stretch increases.
The SNK method is illustrated in Box 4.9 for our example problem. Note that the group
means are ordered from lowest (A) to highest (B). The stretch used to obtain the critical value of q
is 3 for the A-B comparison (A to NS to B spans all 3 groups), whereas the stretch is 2 for the
A-NS and NS-B comparisons (A to NS spans two groups, as does NS to B). The appropriate
critical values are recorded in the column labelled qα in Box 4.9. In this particular example, the
results of the SNK procedure are the same as those for the LSD procedure; that is, group B differs
from both A and NS, which do not differ from one another.
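A short SPSS sketch (variable names ours) makes the stretch-dependent logic concrete: each observed q from Box 4.8 is paired with its stretch from Box 4.9 and compared to the corresponding critical value from IDF.SRANGE:

DATA LIST FREE / str df qobs.
BEGIN DATA
3 6 6.124
2 6 2.449
2 6 3.674
END DATA.
* SNK: the critical value depends on the stretch for that particular comparison.
COMPUTE qcrit = IDF.SRANGE(1 - .05, str, df).
COMPUTE reject = (qobs > qcrit).
FORMAT str df (F2.0) qobs qcrit (F6.3) reject (F1.0).
LIST.

The reject column (1 = reject H0) should match Box 4.9: the B-A and B-NS differences are significant, the NS-A difference is not.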
Tabular Arrangement of Comparisons
One way in which SNK and other pairwise comparison tests are often done is to first rank
order the means from the smallest mean to the largest. A table is then prepared in which the rows
and columns are the ranked means, and the cells contain the values of q for comparisons between
corresponding row and column means. It is useful to arrange the means systematically in a table
of this sort, especially when there are many comparisons to be performed (i.e., k is large).
Denominator q = √(2.0/3) = .816
LSD q.05 = 3.46        TUKEY q.05 = 4.34

Ordered Means       3.0       5.0         8.0
SNK                  A        NS           B            Stretch    q.05
  A                  -       2.449       6.124 LST    --->  3       4.34
  NS                           -         3.674 LS     --->  2       3.46
  B                                        -

LST = significant by LSD, SNK, and TUKEY
LS  = significant by LSD and SNK
Box 4.10. Tabular Arrangement for Q-Tests.
The steps in this procedure (see Box 4.10) are: (a) order the means from smallest to
largest, (b) label the rows and columns of the table with the group labels, (c) calculate the observed
qs for the differences between specified row and column means (remember the denominators of
all comparisons are the same) and
enter the values in the
corresponding cells, and (d)
compare the observed values to the
critical value (or values) of q for
whichever post hoc procedure is
being used. The arrangement is
shown in Box 4.10.
There are several benefits to
this systematic approach. The
ordered means and arrangement in a
table make it easier to identify the k × (k - 1) / 2 comparisons to be performed. Start in the upper
right corner and work to the left until you reach the cell in which the column is the same group as
the row. Move down a row and start again at the right. Continue until you again reach the cell in
which the column is the same group as the row. Following this method ensures that all pairwise
comparisons are made. For the LSD and TUKEY procedures, the k(k-1)/2 observed qs are
compared to a single critical value (using a stretch of 2 for LSD or a stretch of k for TUKEY).
A second benefit of the tabular arrangement occurs with the SNK procedure. For the SNK
procedure, a different q is used depending on the stretch between means for the particular
comparison; the stretch varies from k to 2. The tabular arrangement facilitates comparison of
observed values to the appropriate critical values because stretch is related systematically to
specific locations in the table. The comparison in the upper right corner always involves the
maximum stretch of k because the upper right comparison is between the largest and the smallest
means. Moving from the upper right corner to the diagonal (indicated by dashes [-] in Box 4.10),
the stretch decreases from k to 2. The comparisons just above the diagonal always involve
adjacent means, for which the SNK procedure specifies k = 2 for the critical value. Diagonal rows
Group                                        SNK
Means             2        3        4       Stretch     qα
 2.00    1      3.34*    3.75*    3.98          4       4.05
 5.34    2        -       .41      .64          3       3.65
 5.75    3                 -       .23          2       3.00
 5.98    4                          -
Box 4.11. Anomalous Outcome for SNK Procedure.
between the main diagonal and the upper right corner will involve intermediate stretches.
A third benefit
of a table occurs
because SNK uses a
different critical value
for different stretches,
which can lead to anomalous outcomes. For example, a relatively large difference between means
could be nonsignificant and a relatively small difference significant because the former occurred
at a wider stretch than the other. A wider stretch means a larger critical value for q. The
hypothetical example in Box 4.11 illustrates the problem.
The difficulty arises in row one. The 1 - 4 comparison is not significant because 3.98 is
less than 4.05, the critical value of q for a stretch of 4. But the other two comparisons on row one
are significant even though the differences between the means for 1 - 3 and 1 - 2 are smaller than
for 1 - 4. This occurs because qα is decreasing as the stretch lowers and produces an anomalous
outcome; specifically, the difference between 5.98 and 2.00 is not significant, whereas the smaller
differences between 5.75 and 2.00 and between 5.34 and 2.00 are significant.
To avoid such awkward outcomes, the usual practice is to start at the right of each row and
stop making comparisons for that row when the first nonsignificant effect occurs. In the present
example, the nonsignificant comparison for 1 - 4 means that comparisons 1 - 3 and 1 - 2 on row
one would not be performed, or at least would not be viewed as significant. Similarly for row
two, comparison 2 - 4 is not significant and would nullify comparison 2 - 3. Performing SNK
comparisons in this way ensures that the conclusions of the study are at least coherent. As we see
next, the possibility of anomalous conclusions leads SPSS to report the results of SNK
comparisons differently than the pairwise comparisons produced for the LSD and BONFERRONI
procedures. Instead, SPSS will report results as sets of means that do not differ significantly from
one another, analogous to the underlining technique that we have been using. If pairwise
probabilities were reported, uninformed users might interpret a smaller difference as significant
even though a larger difference was not significant.
GLM press BY group /POSTHOC = group(LSD SNK TUKEY BONF).
...
Post Hoc Tests
Multiple Comparisons
              (I) group   (J) group   Mean Difference (I-J)   Std. Error    Sig.
LSD            1.0000      2.0000          2.000000            1.1547005    .134
                           3.0000         -3.000000(*)         1.1547005    .041
               2.0000      3.0000         -5.000000(*)         1.1547005    .005
Tukey HSD      1.0000      2.0000          2.000000            1.1547005    .269
                           3.0000         -3.000000            1.1547005    .090
               2.0000      3.0000         -5.000000(*)         1.1547005    .012
Bonferroni     1.0000      2.0000          2.000000            1.1547005    .402
                           3.0000         -3.000000            1.1547005    .122
               2.0000      3.0000         -5.000000(*)         1.1547005    .015

Homogeneous Subsets
                                  group    N    Subset 1    Subset 2
Student-Newman-Keuls(a,b,c)      2.0000    3    3.000000
                                 1.0000    3    5.000000
                                 3.0000    3                8.000000
                                 Sig.           .134        1.000
Tukey HSD(a,b,c)                 2.0000    3    3.000000
                                 1.0000    3    5.000000    5.000000
                                 3.0000    3                8.000000
                                 Sig.           .269        .090
Box 4.12. GLM commands and post hoc results for the LSD, TUKEY, BONFERRONI, and SNK procedures.
More SPSS Post Hoc Analyses for the Brain Stimulation Study
Box 4.12 shows the results for the SNK and TUKEY procedures. The LSD and
BONFERRONI procedures have also been requested in order to better appreciate the relationship
among the four tests. Looking first at the p values for the TUKEY procedure, notice that the
values fall between those for the LSD and BONFERRONI procedures. TUKEY is more
conservative than the LSD procedure but less conservative than the BONFERRONI procedure. In
addition to reporting pairwise probabilities for the TUKEY test, SPSS also reports the results as
homogeneous subsets. This is the method that is similar to our underlining technique. SPSS
reports two subsets of means that are not significantly different from one another. Subset 1
contains groups 2 and 1 (note that groups are ordered from lowest to highest and that column
values for the subsets are the group means), and subset 2 contains groups 1 and 3. The p values
Figure 4.1. ONEWAY Post-Hoc Comparison Screen.
printed at the bottom of the subset columns indicate the smallest p value for a comparison within
a subset. In the present case there are only two groups in each subset, so the p values correspond
to the p values reported in the Sig. column for the TUKEY test.
The SNK results are reported only as subsets, with subset 1 again containing groups 2 and
1, but subset 2 now only containing group 3. This indicates that the differences between 2 vs. 3
and 1 vs. 3 are both significant but the difference between 2 and 1 is not significant. Because
groups 2 and 1 are adjacent (i.e., stretch = 2), the p value for subset 1 is the p value for the LSD
comparison between groups 2 and 1 (.134). By similar reasoning, the p value for groups 1 vs. 3
will be .041, a significant difference. Means for groups 2 and 3 are more distant (i.e., stretch = 3
= k), and we can therefore infer that the p value for that comparison is equivalent to the TUKEY
value for this comparison (i.e., .012).
Post Hoc Comparisons Using Windows
Figure 4.1 shows the Post Hoc screen for ONEWAY ANOVA in Windows. To reach this
screen, we would follow Analyze | Compare Means | Oneway, move “press” into the Dependent
List and “group” into the
Factor list, and then click on
Post Hoc. A series of boxes
appears and the user selects
which pairwise comparison
procedures to use. In the box,
the procedures that we have
discussed (i.e., LSD, SNK,
TUKEY, Bonferroni) are
already selected. Click
Continue on this screen, and
then Ok when returned to the
One-Way ANOVA screen.
The omnibus Anova results
would be reported, followed by the post hoc results.
Box 4.13 shows the equivalent syntax. The results for the LSD procedure are also shown,
first in the complete form produced by SPSS for Windows, and then edited to eliminate
redundancies.
Note in the original version that 6 rows were printed, although only three comparisons are
of interest. The original six rows involve three pairs of redundant comparisons because it does not
matter which group is first and which second (i.e., 1-2 and 2-1 involve the same comparison). The
redundant pairs are 1-2 and 2-1, 1-3 and 3-1, and 2-3 and 3-2. This represents the above-diagonal
and below-diagonal cells of the matrix, which we noted previously were redundant. These are
redundant because the direction of comparison does not affect significance; it only affects the sign
of the difference. Note in Box 4.13, for example, that the p-values for 1-3 (.041) and 3-1 (.041)
are identical and that their confidence intervals are mirror images of one another. The rows 2-1, 3-
1, and 3-2 can be eliminated without losing any information. These three redundant rows have
been eliminated from the edited printout shown at the bottom, which makes the results much
clearer. This editing has been done for most of our printouts.
ONEWAY press BY group /POSTHOC = LSD SNK TUKEY BONFERRONI.
...
Post Hoc Tests
                        Mean Difference                          95% Confidence Interval
(I) GROUP  (J) GROUP         (I-J)       Std. Error    Sig.     Lower Bound   Upper Bound
LSD  1.00    2.00           2.0000        1.15470      .134        -.8255        4.8255
             3.00          -3.0000(*)     1.15470      .041       -5.8255        -.1745
     2.00    1.00          -2.0000        1.15470      .134       -4.8255         .8255
             3.00          -5.0000(*)     1.15470      .005       -7.8255       -2.1745
     3.00    1.00           3.0000(*)     1.15470      .041         .1745        5.8255
             2.00           5.0000(*)     1.15470      .005        2.1745        7.8255

Edited Results for Post Hoc Tests
                        Mean Difference                          95% Confidence Interval
(I) GROUP  (J) GROUP         (I-J)       Std. Error    Sig.     Lower Bound   Upper Bound
LSD  1.00    2.00           2.0000        1.15470      .134        -.8255        4.8255
             3.00          -3.0000(*)     1.15470      .041       -5.8255        -.1745
     2.00    3.00          -5.0000(*)     1.15470      .005       -7.8255       -2.1745
Box 4.13. Original and Edited Post Hoc Results for Windows.
Figure 4.2. Post Hoc Tests using GLM.
Next we briefly examine how we would perform post hoc tests using GLM in Windows
SPSS. Figure 4.2 shows the screen after selecting Analyze | GLM | Univariate, entering group as
the Fixed Factor(s) and press as the Dependent Variable, and then clicking on Post Hoc to bring
up the Univariate:
Post Hoc window.
We have moved the
group Factor into
the Post Hoc Tests
for: window, and
selected the four
tests of interest.
Clicking on
Continue would
return to the
Univariate window,
and clicking OK
would initiate the
analysis. The format of the post hoc results is identical to that just presented for ONEWAY,
although the omnibus F output is somewhat different in format.
Comparison of Post Hoc Procedures
To facilitate comparison across the four procedures, the p values for the various post hoc
procedures are summarized in
Box 4.14 (several of the SNK ps
had to be inferred from SNK’s
equivalence to LSD for a stretch
of 2 and to TUKEY for a stretch
of 3). As we proceed from the top
row (i.e., LSD) to the bottom row
(i.e., Bonferroni), the p values
                       Comparison
Test              1-2       1-3       2-3
LSD              .134      .041      .005
SNK              .134      .041      .012
TUKEY            .269      .090      .012
BONFERRONI       .402      .122      .015
Box 4.14. Comparison of post hoc p values.
either remain fixed or increase (i.e., differences become less significant). This reflects the
increasing conservativeness of the procedures.
The Bonferroni was introduced as a test in which the nominal alpha level was divided by
the number of comparisons. With three comparisons, as here, the critical alpha becomes .05 / 3 =
.017. We could then compare the LSD ps to .017 to see what differences are significant. Only the
2 - 3 comparison, p = .005, has a p less than .017 and would be significant by the Bonferroni test.
But this operation suggests a way to determine the p values for the Bonferroni, as shown in Boxes
4.7 and 4.14, assuming that we already know the LSD p values. What we need to do is multiply
the LSD p values by the number of comparisons (3 in this case) and observe which ps remain ≤
.05. This is the reverse of the divide-p approach. Multiplying the LSD ps by three in fact produces
the BONFERRONI p values shown in Boxes 4.7 and 4.14 (e.g., 3 × .134 = .402).
Whether in terms of p values, as here, or in terms of critical values of t or q, as illustrated
later, it is clear that the Bonferroni test is very conservative, too much so, according to many
statisticians. There have been several adjustments to the basic Bonferroni test that we used, some
of which are implemented in SPSS. That the Bonferroni is overly conservative can be appreciated
by considering what would have happened if one of our LSD comparisons had produced a p as
large as .40. The Bonferroni approach would say to calculate the new probability as 3 × .40 =
1.20, which is impossible as ps must fall between 0 and 1. In fact, 1.0 would appear as the
probability in this case, but the example serves to illustrate that simply multiplying p times the
number of comparisons is a very aggressive approach to the problem of Type I errors with
pairwise comparisons.
PAIRWISE COMPARISONS FOR THE INTERVIEW STUDY
Previous chapters analyzed data from a study of attractiveness ratings for interviewees given by
four groups receiving different instructions about the purpose of the interview: Group 1 received
no additional information, Group 2 were told the interviews were for jobs, Group 3 were told they
were psychiatric interviews, and Group 4 were told they were parole interviews. The results for
the omnibus Anova
and descriptive
statistics are
presented in Box
4.15. The
differences among
the four means were
clearly significant.
Although the order of the means suggests something about the nature of this significant
difference, more precise conclusions require follow-up analyses to determine which groups differ
from one another (or some other tests of the pattern among the means, as in chapter 6).
The LSD t Test
The calculations
for the LSD t-test are
shown in Box 4.16. A
common denominator
(1.664) has been
calculated using MSError
= 13.839 from Box
4.15, and the common
group n of 10. The
means have been
ordered from lowest to
highest in the matrix
Denominator t = SQRT(13.839 (1/10 + 1/10)) = 1.664
df = 36     tCritical = 2.028
t1-4 = (27.30 - 20.00)/1.664 = 7.30/1.664 = 4.387

Ordered Group Means
            4          3          2          1
         20.00      22.70      26.00      27.30
  4        -        1.623      3.606 L    4.387 L
  3                   -        1.983      2.764 L
  2                              -         .781
  1                                          -

Summary of Differences
     4        3        2        1
     ----------
              ----------
                       ----------
Box 4.16. LSD t Calculations and Results for Interview Study.
Source      df       SS        MS         F        p
Between      3     325.80    108.600    7.848    .0004
Within      36     498.20     13.839
Total       39     824.00

Group     Mean      SD       nj
  1       27.30    3.302     10
  2       26.00    3.621     10
  3       22.70    4.218     10
  4       20.00    3.682     10
 All      24.00    4.597     40
Box 4.15. Anova and Descriptive Statistics for Interview Study.
and differences between the means divided by 1.664 to produce the reported ts. The df for each t
is dfError = 36, which gives tCritical = 2.028 for α = .05. The significant ts are indicated by the
superscript “L” in Box 4.16.
The overall pattern summarized in Box 4.16 using the underline (subset) notation is less
than ideal, as reflected in the overlapping lines. The overlapping lines indicate that a number of
the nonsignificant differences are not transitive. For example, the 1 - 2 comparison is not
significant, the 2 - 3 difference is not significant, but the 1 - 3 difference is significant. The
pattern would have been much better if the 2 - 3 comparison had also been significant (note how
close 1.983 is to the critical value of 2.028). Unfortunately, the SNK and TUKEY procedures are
not going to help with the 2 - 3 comparison, since the LSD procedure is the most liberal of the
three; that is, if a difference is not significant by LSD, then it cannot be significant by SNK or
TUKEY.
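For readers following along by computer, here is a sketch of the same LSD t calculations in SPSS syntax (variable names ours; the group means, MSError = 13.839, and n = 10 per group are taken from Box 4.15):

DATA LIST FREE / g1 g2 m1 m2.
BEGIN DATA
4 3 20.00 22.70
4 2 20.00 26.00
4 1 20.00 27.30
3 2 22.70 26.00
3 1 22.70 27.30
2 1 26.00 27.30
END DATA.
* Common denominator for every LSD t: SQRT(MSError(1/n + 1/n)).
COMPUTE denom = SQRT(13.839 * (1/10 + 1/10)).
COMPUTE t = (m2 - m1) / denom.
COMPUTE tcrit = IDF.T(1 - .05/2, 36).
COMPUTE sig = (t > tcrit).
FORMAT g1 g2 (F1.0) m1 m2 (F5.2) denom t tcrit (F6.3) sig (F1.0).
LIST.

The resulting ts and significance decisions should match those in Box 4.16.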
The q Statistic
Box 4.17 presents the q calculations and results. Note that qDenominator is smaller than
qDenominator = SQRT(13.839 (1/10)) = 1.176   [= 1.664 / √2]
df = 36
q1-4 = (27.30 - 20.00)/1.176 = 7.30/1.176 = 6.207   [= 4.387 × √2]

Ordered Group Means
            4           3           2           1
         20.00       22.70       26.00       27.30      Stretch   qCritical
  4        -         2.296       5.102 LST   6.207 LST     4         3.81
  3                    -         2.806       3.912 LST     3         3.45
  2                                -         1.105         2         2.87
  1                                            -

Summary of Differences
              4        3        2        1
LSD           ----------
                       ----------
                                ----------
SNK           ----------
                       ----------
                                ----------
TUKEY         ----------
                       ----------
                                ----------
Box 4.17. LSD q Calculations and Results for Interview Study.
tDenominator by a factor of √2, and that qObserved is larger than tObserved by the same factor of √2. With df
= 36, the critical value of q for the LSD test would be that for a stretch of 2, q.05, LSD = 2.87 (which
is larger than tCritical by a factor of √2). The superscripts in the matrix show that the same three
differences are significant as reported for t, which must always be the case barring any errors in
the calculations. The summary for LSD is identical in Boxes 4.17 and 4.16. For the Tukey HSD
procedure, the stretch is always k = 4, and q.05, Tukey = 3.81. For Tukey, the 1 - 3 difference remains
significant, although just barely (3.912 versus a critical value of 3.81). The pattern then is identical to that observed with the LSD
and SNK procedures. All share the same weakness that Groups 2 and 3 do not differ from one
another.
The SNK procedure involves three different critical values. Starting in the upper right
corner with the 1 - 4 comparison, the stretch is 4 and the critical value is 3.81. The 1 - 4 difference
is significant by SNK (as it must be since it was significant by the more stringent Tukey
procedure). The 2 - 4 cell is considered next and is compared to the critical value for a stretch of
3, that is, to 3.45. This difference too is significant. The left-most cell in the top row (i.e., the cell
just above the diagonal) involves a stretch of 2 and a critical value of 2.87 (i.e., the same critical
value as the LSD procedure). This difference is not significant. The same procedure is followed
for the second row, with the rightmost cell involving a stretch of 3, a critical value of 3.45, and a
significant difference. The next cell to the left involves a stretch of 2 and is not significant. The
final comparison is for the 1 - 2 cell, which involves a stretch of 2. The difference is not
significant.
Note again that the 2 - 3 comparison in Box 4.17 just misses being significant by the LSD
and SNK procedures. If this difference had been significant, then the pattern of results would be
much tidier. Specifically, the two “control” conditions (1 and 2) would not differ from each other,
the two “stigma” conditions (3 and 4) would not differ from each other, but all four comparisons
between “control” and “stigma” groups (i.e., 1 - 3, 1 - 4, 2 - 3, 2 - 4) would be significantly
different. Such a pattern lends itself to a tidy conclusion.
Note also the potential for an awkward conclusion if only individual comparisons are
considered. In the second row, the critical value for 1 - 3 is 3.45 and the critical value for 2 - 3 is
2.87. If the observed qs for both cells fell between these values, then the 1 - 3 comparison would
be not significant and the 2 - 3 comparison would be significant. This is an awkward outcome at
best, and perhaps even an illogical one. Starting at the right of each row and stopping at the first
non-significant difference avoids this problem.
Box 4.18
summarizes the
distinct critical values
of q used for these
three post hoc
procedures. Irrespective of stretch, the LSD procedure uses the same (low) critical value of 2.87
for all six comparisons, and the Tukey HSD test uses the same (high) critical value of 3.81. The
SNK falls between these extremes, using the maximum stretch for the biggest span between 1 - 4
(qAlpha = 3.81); the minimum stretch for the adjacent spans of 1 - 2, 2 - 3, and 3 - 4 (qAlpha = 2.87);
and the in-between stretch of three for the remaining two comparisons of 1 - 3 and 2 - 4 (qAlpha =
3.45).
To perform the Bonferroni test on these data, we would divide alpha by 6, the number of
comparisons to get the alpha to use for each comparison; that is, pair-wise alpha = .05 / 6 =
.008333. This test is best done using a statistical package like SPSS. But we can also obtain a
critical value for q based on p = .008333. Using procedures described shortly, the critical value is
3.95, even larger than the critical value for Tukey. Expressed differently, the Bonferroni
adjustment is even more conservative than Tukey.
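A one-line check of this Bonferroni critical value, using the IDF.SRANGE function described in the next section (variable names ours):

DATA LIST FREE / df.
BEGIN DATA
36
END DATA.
* Bonferroni: alpha is divided across the 6 comparisons; the stretch stays at 2.
COMPUTE qbonf = IDF.SRANGE(1 - .05/6, 2, df).
FORMAT df (F2.0) qbonf (F6.3).
LIST.

This should return approximately 3.95, the value also reported in Box 4.23.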
SPSS Analyses for the Interview Study
Box 4.19 shows the GLM syntax commands to conduct the omnibus ANOVA for the
interview study, and to request the post hoc analyses. Note that the format of the post hoc request
is slightly different for GLM than for ONEWAY. Because GLM can have multiple factors, it is
necessary to specify which factor should be tested (inst here), and then the various post hoc
procedures are listed. This analysis could also have been conducted using menus, as described
previously.
Comparisons Stretch LSD SNK TUKEY
1-4 4 2.87 3.81 3.81
1-3, 2-4 3 2.87 3.45 3.81
1-2, 2-3, 3-4 2 2.87 2.87 3.81
Box 4.18. Comparison of Critical Values for Interview Study.
Edited LSD results are also shown in Box 4.19. Differences 1 - 3, 1 - 4, and 2 - 4 are
significant, and 2 - 3 approaches significance, p = .055. One attractive feature of p values is that
researchers can see how close to significance various differences are. Although not presented as
such, researchers can determine from the analysis in Box 4.19 that there are three subsets of
means that are not significantly different: 1 - 2, 2 - 3, and 3 - 4.
GLM rate BY inst /POSTHOC = inst(LSD SNK TUKEY BONFERRONI).

Source               Type III Sum of Squares    df    Mean Square        F        Sig.
Corrected Model             325.800(a)           3       108.600        7.847     .000
Intercept                 23040.000              1     23040.000     1664.874     .000
INST                        325.800              3       108.600        7.847     .000
Error                       498.200             36        13.839
Total                     23864.000             40
Corrected Total             824.000             39
a  R Squared = .395 (Adjusted R Squared = .345)

                        Mean Difference
(I) INST   (J) INST          (I-J)        Std. Error    Sig.
LSD  1.00    2.00           1.3000         1.66366      .440
             3.00           4.6000(*)      1.66366      .009
             4.00           7.3000(*)      1.66366      .000
     2.00    3.00           3.3000         1.66366      .055
             4.00           6.0000(*)      1.66366      .001
     3.00    4.00           2.7000         1.66366      .113
Box 4.19. GLM Analysis of Interview Study.
Edited results for the SNK, TUKEY, and BONFERRONI tests are presented in Box 4.20,
the TUKEY in both table and subset formats. The SNK procedure is reported in terms of
homogeneous subsets. The same subsets were identified by the LSD procedure. From the
summary, we can infer that group 1 differs from groups 3 and 4, and group 2 differs from group 4.
Note that the p values reported for the SNK comparisons within subsets, all of which involve
adjacent groups, are identical to the LSD ps reported for the same three comparisons (i.e., 4 - 3,
3 - 2, 2 - 1).
TUKEY results are presented in both table and subset format, and lead to the same
conclusions as the LSD and SNK procedures. But note that the p values for TUKEY are
considerably higher than for LSD and SNK. The 2 - 3 difference, for example, had a p value of
.055 in both LSD and SNK, but now has a much higher TUKEY p of .213.
The BONFERRONI results are even less significant; p values are larger again and now
                         Mean Difference
(I) INST   (J) INST           (I-J)        Std. Error    Sig.
Tukey HSD   1.00   2.00      1.3000         1.66366      .862
                   3.00      4.6000(*)      1.66366      .042
                   4.00      7.3000(*)      1.66366      .001
            2.00   3.00      3.3000         1.66366      .213
                   4.00      6.0000(*)      1.66366      .005
            3.00   4.00      2.7000         1.66366      .379
Bonferroni  1.00   2.00      1.3000         1.66366     1.000
                   3.00      4.6000         1.66366      .054
                   4.00      7.3000(*)      1.66366      .001
            2.00   3.00      3.3000         1.66366      .330
                   4.00      6.0000(*)      1.66366      .006
            3.00   4.00      2.7000         1.66366      .680

Homogeneous Subsets
                                 INST    N     Subset 1    Subset 2    Subset 3
Student-Newman-Keuls(a,b,c)      4.00   10     20.0000
                                 3.00   10     22.7000     22.7000
                                 2.00   10                 26.0000     26.0000
                                 1.00   10                              27.3000
                                 Sig.            .113        .055        .440
Tukey HSD(a,b,c)                 4.00   10     20.0000
                                 3.00   10     22.7000     22.7000
                                 2.00   10                 26.0000     26.0000
                                 1.00   10                              27.3000
                                 Sig.            .379        .213        .862
Box 4.20. Other Post Hoc Analyses for Interview Study.
only two of the differences are significant (1 - 4 and 2 - 4). The pattern of differences defines two
subsets, 1 & 2 & 3 as one subset and 3 & 4 as another subset. Although different, this result is
untidy because group 3 (the psychiatric interview) belongs to both sets.
Box 4.21 summarizes the available
p values for the six comparisons and the
four tests. The SNK values for the
intermediate stretch of 3 are not available,
although we can infer that they are less
than .05 and even less than the
corresponding TUKEY values. The p
values remain constant or increase as the
tests become more conservative.
Moreover, the p values for the Bonferroni test are the LSD values times the number of
comparisons, six (e.g., 6 × .055 = .330 for the 2-3 comparison), although one of the Bonferroni p
values was set arbitrarily to the maximum of 1.00 (i.e., 6 × .440 > 1.00). The equal signs indicate
tests that are equivalent; specifically, the SNK ps for stretch 2 are equal to the LSD ps, and the
SNK p for stretch 4 (i.e., k) is equal to the TUKEY value.
In this case, the LSD, SNK, and TUKEY procedures all led to the same conclusions; we
cannot reject the following H0s: µ1=µ2, µ2=µ3, µ3=µ4. The overall pattern of results is somewhat
unclear because of the combined equivalence and nonequivalence of various groups. Focussing
on the parole group, for example, subjects in this condition rated the attractiveness of the
interviewees as lower than groups 1 and 2 (the no instruction and job interview groups) but did
not rate interviewees significantly lower than group 3 (the psychiatric group). However, the
psychiatric group, which was not different from the parole group was also not significantly
different from the job interview group. Overall the results are rather untidy. Ideally, researchers
should have theoretical reasons to predict differences between certain groups, and nondifferences
between other groups. Such planned or focussed comparisons are often called contrasts in the
ANOVA context. These are discussed in the next chapter.
Stretch   Groups     LSD      SNK      TUKEY    BONFERRONI
   2       1-2      .440=    .440       .862       1.00
           2-3      .055=    .055       .213        .330
           3-4      .113=    .113       .379        .680
   3       1-3      .009       ?        .042        .054
           2-4      .001       ?        .005        .006
   4       1-4      .000     .001=      .001        .001

Box 4.21. Comparison of Post Hoc p values for Interview Study.
CALCULATING CRITICAL AND P VALUES FOR POST HOC TESTS
This section describes how to use SPSS to obtain critical values of t and p for the various
Post Hoc tests described in this chapter. Calculating these values, rather than using more limited
tables, allows us to compare more clearly how conservative each of the tests is. In addition, we
can use SPSS to calculate p-values for the observed test statistic. This is helpful when the exact
degrees of freedom required are not present in tabled critical values for q.
Determining Critical Values for Post Hoc Tests
One of the nice features of the Unix
version of SPSS, as illustrated in this chapter,
is that it provided critical values of q for the
post hoc statistics. We could then arrange
these critical values in order to demonstrate
clearly which procedure is most conservative
and which most liberal. Box 4.22 summarizes critical values of q that would actually be reported
in Unix versions of SPSS for the interview study. We can clearly see that the LSD procedure is
most liberal, qAlpha = 2.87 for all stretches; SNK is next with qAlpha = 2.87, 3.45, and 3.81 for
stretches 2, 3, and 4, respectively; TUKEY is next with qAlpha = 3.81 for all stretches; and
BONFERRONI is most conservative with qAlpha = 3.95 for all stretches.
Later versions of SPSS do not
provide these critical values.
Moreover, our table for q would not
give us values for the Bonferroni test
and may not have entries for certain df.
It is possible to overcome these
limitations using SPSS, because it
allows users to determine critical
values, as illustrated in Box 4.23. Box
4.23 shows the computation of the
critical value of t for the various post
Stretch    LSD     SNK     TUK     BON
   2       2.87    2.87    3.81    3.95
   3       2.87    3.45    3.81    3.95
   4       2.87    3.81    3.81    3.95
Box 4.22. Critical Values for Interview Study.
DATA LIST FREE / str df.
BEGIN DATA
2 36
3 36
4 36
END DATA.
COMP tlsd = IDF.T(1 - .05/2, df).
COMP qlsd = IDF.SRANGE(1 - .05, 2, df).
COMP qsnk = IDF.SRANGE(1 - .05, str, df).
COMP qtuk = IDF.SRANGE(1 - .05, 4, df).
COMPUTE qbonf = IDF.SRANGE(1 - (.05/6), 2, df).
FORMAT str df (F2.0).
LIST.

str  df     tlsd        qlsd        qsnk        qtuk        qbonf
 2   36   2.028094    2.868158    2.868158    3.808798    3.948445
 3   36   2.028094    2.868158    3.456758    3.808798    3.948445
 4   36   2.028094    2.868158    3.808798    3.808798    3.948445

Box 4.23. Computing Critical Values for Post Hoc Tests.
hoc tests. The first six lines read the stretches and df for which we want to calculate critical
values. We then use the SPSS functions IDF.T and IDF.SRANGE to compute critical values for
these tests, using alpha = .05 in this example. The printed results are then listed. Note that the
columns headed qlsd, qsnk, qtuk, and qbonf reproduce the values in Box 4.22, although with more
precision. Substituting different values for stretch and df in Box 4.23 allows us to compute
critical values for post hoc tests in any study.
Determining P Values for Post Hoc Tests
We can also use SPSS to
calculate p values for the various post
hoc tests, which allows us to compare
the conservativeness of the different
procedures, to overcome the
limitations of our q test values, and to
fill in some “missing” values that are
not always reported by SPSS. The complete table of observed p values for the interview study is
presented in Box 4.24. Except for the fact that pSNK = pLSD for stretch = 2 and pTUK = pSNK for stretch = 4,
the table clearly shows that the tests become less significant as we go from LSD to SNK to
TUKEY to BONFERRONI. Note also that one computed value of pBON is greater than 1.00: .439703 × 6
= 2.638220. SPSS reports this value as 1.00 in the post hoc printouts because a p value cannot be
greater than 1.
CONCLUSION
We have reviewed various pairwise post hoc comparisons to draw more specific
conclusions about an omnibus ANOVA. These procedures are typically used when the overall F
is significant. We have also seen how SPSS’s ONEWAY and GLM procedures report in diverse
ways the results for such pairwise comparisons, and that post hoc procedures can lead to
DATA LIST FREE / comp str df qobs.
BEGIN DATA
12 2 36 1.105
23 2 36 2.806
34 2 36 2.296
13 3 36 3.912
24 3 36 5.102
14 4 36 6.207
END DATA.
COMP plsd = 1 - CDF.SRANGE(qobs, 2, df).
COMP psnk = 1 - CDF.SRANGE(qobs, str, df).
COMP ptuk = 1 - CDF.SRANGE(qobs, 4, df).
COMP pbon = 6*(1 - CDF.SRANGE(qobs, 2, df)).
FORMAT comp str df (F2.0) qobs (F6.3).
LIST.

comp  str  df   qobs      plsd       psnk       ptuk       pbon
 12    2   36   1.105    .439703    .439703    .862393    2.638220
 23    2   36   2.806    .054904    .054904    .212796     .329425
 34    2   36   2.296    .113204    .113204    .378626     .679226
 13    3   36   3.912    .008894    .023592    .042229     .053362
 24    3   36   5.102    .000931    .002616    .004931     .005584
 14    4   36   6.207    .000095    .000532    .000532     .000573
Box 4.24. P Values for Observed Post Hoc Statistics.
anomalous outcomes that do not permit a tidy summary of results or a tidy explanation. Choosing
a post hoc test can also be complex. There are many such tests, including some that will
accommodate comparisons between combinations of groups (e.g., groups 1 and 2 versus groups 3
and 4). In essence researchers need to consider thoughtfully the consequences of making Type I
and Type II errors, and then select a procedure with the desired degree of conservativeness /
liberalness. The next chapter examines a second approach to multiple comparisons, one that is
more likely to lead to “elegant” conclusions, especially if a priori expectations are correct.
CHAPTER 5:
PLANNED CONTRASTS FOR SINGLE FACTOR BETWEEN-S DESIGN
The Basics of Planned Contrasts . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
Defining Contrasts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . . . 3
Formula for Contrast Analysis . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . . . . . 4
Contrasts for the Two-group Agitation Study . . . . . . . .. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
Manova and Contrasts for Agitation Study . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . . . . . . 8
GLM and Contrasts for the Agitation Study . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . . . . . . 9
Contrasts for k > 2 Designs . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
Orthogonal Contrasts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . . 11
Polynomial Contrasts (Trend Analysis) . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . . . . . 15
Planned Contrasts for the Brain Stimulation Study . .. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
Equivalence of Different Values for Contrast Coefficients . . . . . . . . . . . . . . . . . . . . . . . 19
Polynomial Contrasts for the Brain Stimulation Study . . . . . . . . . . . . . . . . . . . . . . . . . . 20
SPSS and Contrasts for the Brain Stimulation Study .. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
ONEWAY and Planned Contrasts for the Stimulation Study . . . . . . . . . . . . . . . . . . . . . 21
MANOVA and Contrasts for the Brain Stimulation Study . . . . . . . . . . . . . . . . . . . . . . . 22
GLM and Contrasts for the Brain Stimulation Study . .. . . . . . . . . . . . . . . . . . . . . . . . . . 24
Polynomial Contrasts in GLM and MANOVA . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . 27
Planned Contrasts for the Interview Study . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
Control versus Stigmatized Groups . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . . . . 30
Control versus Stigmatized Groups: SPSS Analyses . . .. . . . . . . . . . . . . . . . . . . . . . . . . 32
Polynomial Contrasts for the Interview Study: SPSS Analyses . . . . . . . . . . . . . . . . . . . 36
Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . . . . . . 39
Appendix 5.1: Contrasts and Differences Between Means . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
Appendix 5.2: GLM and SSs for Planned Comparisons . .. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
A "few" specific comparisons are plannedprior to examining the means (researcherscannot "plan" all possible comparisons), Fneed not be significant (but can be), and littleor no adjustment is made for the number ofcomparisons (the strong version of this claimis generally doubtful).
Box 5.1. Conditions for Planned Contrasts.
Chapter 4 described four (of many) statistical procedures for performing all-possible
pairwise comparisons in a single-factor, between-S design. But researchers may be interested in
differences that cannot readily be tested by pairwise comparisons. For example, the difference
between two control groups and two treatment groups might be critical for the research, as might
a linear effect of treatment (i.e., a systematic increase or decrease) across the ordered levels of a
factor. Such analyses are generically referred to as contrasts, and are usually Planned (or A
Priori ) Contrasts. This chapter describes the logic of such planned contrasts, once again using
the studies analyzed previously by other methods. We also discuss using SPSS ANOVA
methods to conduct these planned comparisons; a later chapter demonstrates that these tests can
readily be done using multiple regression.
THE BASICS OF PLANNED CONTRASTS
Something more than pairwise comparison procedures is required when a specific
pattern of differences among groups is predicted. Given a treatment group and two control
groups, for example, researchers might expect that the treatment group mean would differ from
the average of the two control group means, and that the two control group means would not
differ. Or if the Between-S factor involved ordered levels of some variable (e.g., amount of
reinforcement, age), then the researchers might be interested in examining various linear and
non-linear effects. Such designs call for Planned Contrasts.
Box 5.1 shows the general conditions
under which "a priori" or planned comparisons
are appropriate. Planned comparisons
generally involve a relatively small number of
specific tests rather than all possible
comparisons. In the single-factor design, for
example, the number of comparisons is often
limited to the df for the treatment effect (i.e., k - 1 comparisons). Moreover, the specific tests are
planned prior to observing the data, usually on the basis of theoretical predictions and/or previous
findings. That is why such comparisons are often called a priori comparisons.
There are several advantages to a priori rather than post hoc comparisons. One advantage
is that the omnibus F need not be significant in order to perform planned comparisons. Even a
nonsignificant F can be followed by planned contrasts, which are generally more sensitive to
predicted differences than the omnibus F, as we will see in several of our examples. By more
sensitive, we mean that a planned comparison can show a particular contrast to be significant
even when the omnibus F is not. A second advantage is that less of an adjustment to alpha is
needed for the number of comparisons being performed; indeed, many researchers do not make
any adjustment, at least for certain kinds of planned comparisons.
Defining Contrasts
A contrast is defined as the sum of each group mean multiplied by a coefficient for each
group (i.e., a signed number, cj), with the restriction that the sum of the coefficients alone across
the k groups equals 0. An uppercase L (for linear contrast) will be used to represent a specific
contrast and lowercase c's used for the individual coefficients. Our definition for contrast would
be written as: L = Σcj ȳj, where Σcj = 0. The subscript j in this notation indicates the level of the
factor being examined, identical to its use for group means and other group statistics.
To illustrate with four groups, one possible set of contrast coefficients would be c1 = −3,
c2 = −1, c3 = +1, and c4 = +3; the sum of these coefficients is 0. Another possible set of contrast
coefficients would be c1 = −3, c2 = +1, c3 = +1, and c4 = +1; the sum of these coefficients is also
0. Numerous other contrasts exist for four groups.
Each set of contrast coefficients in essence defines and tests for a particular pattern of
differences among the means. The second set, for example, tests whether the mean for group 1
differs from the average of the other three means. Although slightly more complex, the first set
of coefficients tests whether there is a linear increase or decrease across the four means; note that
the coefficients -3, -1, +1, and +3 increase by the same amount (2 units) for each successive
group. That defines a linear pattern. We will see, especially in Chapter 6, that contrasts are
closely related to the indicator variables used in regression analysis of ANOVA designs.
L = Σcj ȳj (summed over the j = 1 to k groups), where Σcj = 0
SEL = √(MSError Σ(cj²/nj))
tL = (L - 0) / SEL,  df = dfError

Equation 5.1. Linear Contrasts.

SSL = nj L² / Σcj²
FL = (SSL/1) / MSError = tL²,  df = 1, dfError
η² = SSL / SSTotal

Equation 5.2. Anova for Contrasts.
Formula for Contrast Analysis
Equation 5.1 shows several formulae relevant to contrasts, including the definition of L. We can
test the significance of our contrast against an hypothesized value of 0 by calculating the SEL and
using this SE as the denominator for a t-test, as shown in Equation 5.1. The t will have df = dfError
because MSError appears in the denominator for t just as it did for the LSD t test.
It is also possible to analyze contrasts using the F statistic, which has a number of
advantages, as we shall see. Equation 5.2 shows the formula to calculate a SS for any contrast, which
can then be used to calculate an F statistic. One benefit of the F-test approach is that the SS
associated with the contrast can be used to obtain an η² for the contrast, as shown in Equation
5.2. A second benefit of the F-test approach is that the sum of the SSs for the k - 1 contrasts will
equal SSTreatment, assuming that the k - 1 contrasts were selected to be independent, as described
below. Equation 5.2 also shows that the t-test of Equation 5.1 is equivalent to the F-test of 5.2;
hence, only one of these tests is normally used.
These formulae may look novel, but we will see in the following section that they do
reproduce the results from more familiar versions of the t-test and the F-test. And some
substitution (e.g., sp² for MSError) and a little algebra would show why they are equivalent.
Because contrasts do use a novel method to perform the comparisons, we begin with a two group
example.
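Readers who like to check formulae numerically may find a short computational sketch helpful. The Python function below is our own illustration (the name contrast_analysis and its assumption of equal group sizes are ours, not from the text or from SPSS); it simply implements Equations 5.1 and 5.2.

from math import sqrt

# Illustrative sketch of Equations 5.1 and 5.2 (assumes equal group sizes, n per group).
def contrast_analysis(means, c, n, ms_error, ss_total):
    assert abs(sum(c)) < 1e-9                              # coefficients must sum to 0
    L = sum(cj * m for cj, m in zip(c, means))             # L = sum of cj * mean_j
    se_L = sqrt(ms_error * sum(cj ** 2 / n for cj in c))   # SE of the contrast
    t_L = L / se_L                                         # t-test, df = df_error
    ss_L = n * L ** 2 / sum(cj ** 2 for cj in c)           # SS for the contrast
    F_L = ss_L / ms_error                                  # F-test, df = 1 and df_error
    eta_sq = ss_L / ss_total                               # proportion of total variability
    return L, se_L, t_L, ss_L, F_L, eta_sq

With a two-group problem this returns the same values as the ordinary independent-groups t-test and ANOVA, as the two-group example that follows illustrates.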
                 Group 1   Group 2
ȳj                  12.0      10.0
cj                    -1        +1        Σcj = -1 + 1 = 0

L = Σcj ȳj = (-1 × 12.0) + (1 × 10.0) = -2.0 = ȳ2 - ȳ1

t-test of L
SEL = √(MSError Σ(cj²/nj)) = √(3.6 × (1/6 + 1/6)) = 1.095
tL  = (L - 0) / SEL = -2.0 / 1.095 = -1.83 = t(ȳ2 - ȳ1);  |tL| = √3.33 = √F

F-test of L
SSL = nj L² / Σcj² = (6 × -2.0²) / (-1² + 1²) = 12.0 = SSTreatment (when k = 2)
FL  = (SSL/1) / MSError = 12.0 / 3.6 = 3.33 = FANOVA = 1.83² = tL²
η²L = SSL / SSTotal = 12.0 / 48.0 = .25

Box 5.2. Contrast Analysis for k = 2 Independent Groups ANOVA.
Contrast analysis involves comparisons (or contrasts) between specific groups or specific
combinations of groups. When there are two groups, the only contrast is between those two
groups, so contrast analysis essentially duplicates the results of the ANOVA and the
corresponding t-test between the two group means. This redundancy means that the two-group
design provides an excellent introduction to contrasts, because the one contrast will be equivalent
to tests with which we are already familiar. This is helpful because planned comparisons can
appear somewhat abstract and a little mysterious at first. This mystery also lessens somewhat if
we keep in mind that contrast analysis can be viewed as something of a hybrid between Anova
and regression approaches. Essentially, researchers define contrasts (sets of numbers that define
some expected pattern) and then examine the correlation between these defined numbers (i.e., the
pattern) and the variability in the group means.
CONTRASTS FOR THE TWO-GROUP AGITATION STUDY
The two-group agitation study was described in previous chapters. It involved a
comparison of agitation scores for 6
Control and 6 Treatment subjects.
Our calculations produced t = 1.83,
and F = 3.33, neither of which was
significant by the standard two-tailed
test (see Boxes 2.4 and 2.6) for the
analyses. MSError (equivalently, sp²)
was 3.6.
Box 5.2 illustrates the
contrast analysis for this two-group
Agitation study, using coefficients of
-1 and +1 for groups 1 and 2,
respectively. Note that -1 + 1 = 0, a
prerequisite for a contrast. The
computed value of the contrast is -1
× 12.0 + 1 × 10.0 = -2.0, which in
this case equals the difference
between the means.
As shown in Equations 5.1 and 5.2, this difference or contrast can be tested for
significance in two ways. A t-test can be computed using, as the standard error of the contrast, a
quantity based on MSError from the ANOVA, the contrast coefficients, and the sample sizes. For
the present example involving only two groups, the computations are shown in Box 5.2. SEL =
1.095, which equals the denominator for the t-test in our earlier analyses. The t for the contrast
also equals the t for the difference between means (and √F), not surprising since the numerator
(i.e., the contrast L) is the difference between means and the denominator is identical to that used
for the independent groups t-test. The df for the t-test will be the df associated with the MSError,
that is, N - k = 12 - 2 = 10, in our example.
The significance of contrasts can also be tested using an F-test, and this approach is also
shown in Box 5.2. The SS for a contrast is obtained by squaring the contrast, multiplying by the
number of observations in each group, and dividing by the sum of the coefficients squared.
Including the coefficients in the formula “adjusts” for the magnitude of the coefficients, as shown
shortly. Although somewhat impenetrable, this formula does provide a meaningful SS. In the
two-group case, for example, SSL for the contrast equals SSTreatment from the standard Anova
previously conducted on this data. The F test for the contrast is the ratio of the SSL (conceptually
divided by df = 1 to produce a MSL) to the MSError. In the two-group case, this F equals both the
ANOVA F and the independent t².
When there are only two groups, the contrast analysis is completely redundant with the
analyses that we have already performed (i.e., t, F). But this is not the case when there are more
than two groups; then, it is possible to make specific contrasts that provide additional
information about the differences among the groups.
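As a numerical check on Box 5.2, the lines below (our own sketch, not SPSS output) recompute the contrast statistics from the summary values of the agitation study: means of 12.0 and 10.0, n = 6 per group, MSError = 3.6, and SSTotal = 48.0.

from math import sqrt

means, c, n = [12.0, 10.0], [-1, +1], 6
ms_error, ss_total = 3.6, 48.0

L = sum(cj * m for cj, m in zip(c, means))              # -2.0
se_L = sqrt(ms_error * sum(cj ** 2 / n for cj in c))    # 1.095
t_L = L / se_L                                          # -1.826, df = 10
ss_L = n * L ** 2 / sum(cj ** 2 for cj in c)            # 12.0 = SS_Treatment
F_L = ss_L / ms_error                                   # 3.333 = t_L squared
print(L, se_L, t_L, ss_L, F_L, ss_L / ss_total)         # last value: eta squared = .25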
Contrasts and SPSS ONEWAY
Whereas the /POSTHOC option in ONEWAY permits all-possible pairwise comparisons
of means, selective pairwise comparisons or comparisons involving more than two groups are
done using the ONEWAY /CONTRAST command. Several contrast subcommands can be
included as options on the same ONEWAY command. Each CONTRAST command is followed
by k coefficients, one for each of the k groups in the ANOVA. To illustrate with a k = 4 design,
/CONTRAST = -1 +1 0 0 would compare group 1 to group 2, and /CONTRAST = -1 -1 +1 +1
would compare groups 1 and 2 combined to groups 3 and 4 combined. Here we will use the
contrast command for a k = 2 design, which permits only one contrast, namely /CONTRAST = -1
+1, although the actual numerical coefficients could be different (e.g., -.5 +.5).

ONEWAY agit BY group /CONTRAST = -1 +1.

                  Sum of Squares   df   Mean Square       F    Sig.
Between Groups            12.000    1        12.000   3.333    .098
Within Groups             36.000   10         3.600
Total                     48.000   11

Contrast Coefficients
               GROUP
Contrast   1.00    2.00
   1         -1       1

Contrast Tests
                                Contrast   Value of Contrast   Std. Error        t   df   Sig. (2-tailed)
AGIT   Assume equal variances      1                -2.0000      1.09545   -1.826    10             .098

Box 5.3. SPSS Contrasts for Two-group Design.

Figure 5.1. One-Way Contrasts via Menus.
The contrast command is illustrated in Box 5.3 for the two-group agitation study. The
contrast is redundant when there are only two groups because the original t and F test the
significance of the difference between the two means, and the contrast coefficients in Box 5.3 (-1
+1) also correspond to a contrast between the two means. The value of the contrast (L = -2.0), its
standard error (SEL = 1.095), and the t-value (tL = -1.826) all agree with earlier calculations and
analyses, including in this case the initial t-test because k = 2. The SS for the contrast can be
calculated by (6 × -2²)/(-1² + 1²) = 12.0,
which equals SSTreatment for the k = 2
design. Note also that the p values for the
contrast and the omnibus F are identical,
.098, and that tContrast2 = -1.8262 = 3.334 .
FOmnibus. Contrasts are not redundant with
the omnibus analysis when k > 2.
To request this analysis by menus,
select Analyze | Compare Means | One-way ANOVA. Select and move the independent factor
and dependent variable into the appropriate boxes. Then, to specify the contrast, click on the
Contrasts button shown at the bottom of the main One-way ANOVA screen in Figure 5.1. This
brings up the One-Way ANOVA: Contrasts screen also shown in Figure 5.1. To specify a
particular set of contrast coefficients, enter each individual coefficient in the Coefficients box and
click Add to add the coefficient to the list. In Figure 5.1, +1 has just been entered and can now be
added to the set of coefficients. The first value of -1 had already been added to the list.
Following completion of one set of contrasts, it would be possible to select Next to specify values
for another contrast. When all contrasts have been specified, Continue will return to the main
One-Way ANOVA box and OK will run the analysis.
Manova and Contrasts for Agitation Study
Box 5.4 illustrates the syntax for conducting specific contrasts using MANOVA, another
of SPSS’s general purpose Anova programs. Here the CONTRAST sub-command requires the
name of the variable in parentheses because MANOVA can handle factorial designs. This is
followed by an equals sign, the keyword SPECIAL, and then k sets of k coefficients within
parentheses. Although MANOVA requires k sets of coefficients (rather than k - 1), the first set
of coefficients will always be k 1s. These k 1s represent variability associated with the deviation
of the grand mean from zero and, although required, they are not generally associated with any
contrast of interest. It is the remaining k - 1 sets of k coefficients that are interesting. Each set
will contain one coefficient for each of the k groups. In Box 5.4, the coefficients of +1 +1 -1 +1
represent variability due to the grand mean (+1 +1) followed by the k - 1 = 2 - 1 = 1 set of k (i.e.,
2) coefficients for the contrast(s) of interest (-1 +1). By default, MANOVA reports the test of
each contrast as a t-test, shown in Box 5.4. Note again that the ps for t and F are identical (.098),
that t² = FOmnibus, and that SSTreatment = SSL = (6 × 2.0²) / (-1² + 1²) = 12.0. Although the statistical
tests for the contrasts are reported here as t tests, we will see later that MANOVA has alternative
ways to report the results of contrasts.

MANOVA agit BY group(1 2) /CONTRAST(group) = SPECIAL(1 1  -1 1).

Tests of Significance for AGIT using UNIQUE sums of squares
Source of Variation        SS   DF      MS      F   Sig of F
WITHIN CELLS            36.00   10    3.60
GROUP                   12.00    1   12.00   3.33       .098
(Model)                 12.00    1   12.00   3.33       .098
(Total)                 48.00   11    4.36

R-Squared = .250     Adjusted R-Squared = .175

GROUP
Parameter       Coeff.   Std. Err.    t-Value    Sig. t
    2       -2.0000000     1.09545   -1.82574    .09785

Box 5.4. Two-Group Contrasts using MANOVA.
GLM and Contrasts for the Agitation Study
Box 5.5 shows the equivalent analyses using GLM. The format of the command is
similar to MANOVA, except that GLM does not require the k 1s representing the grand mean. If
included, those coefficients would represent variation of the grand mean about 0.
Although the contrast output from GLM is more extensive than the other programs, the
essential elements remain the same. We see a contrast value for L1 of -2.00, a SE of 1.095, and a
significance of p = .098, which in the k = 2 situation is identical to the p for the F test. Although
not reported, t = -2.0 / 1.095 = -1.826, agreeing with previous reports and equalling √F in
absolute value. Similarly, we could calculate SSL = (6 × 2.0²) / (-1² + 1²) = 12.0 = SSTreatment. This GLM analysis
could also be requested using menus, which we demonstrate for a later example in this chapter.
GLM agit BY group /CONTRAST(group) = SPECIAL(-1 1).

Tests of Between-Subjects Effects   Dependent Variable: AGIT
Source              Type III Sum of Squares   df   Mean Square         F    Sig.
Corrected Model                  12.000(a)     1        12.000     3.333   .098
Intercept                      1452.000        1      1452.000   403.333   .000
GROUP                            12.000        1        12.000     3.333   .098
Error                            36.000       10         3.600
Total                          1500.000       12
Corrected Total                  48.000       11
a  R Squared = .250 (Adjusted R Squared = .175)

Custom Hypothesis Tests
Contrast Results (K Matrix)                        Dependent Variable
GROUP Special Contrast                                   AGIT
L1   Contrast Estimate                                 -2.000
     Hypothesized Value                                     0
     Difference (Estimate - Hypothesized)              -2.000
     Std. Error                                         1.095
     Sig.                                                .098
     95% Confidence Interval       Lower Bound         -4.441
     for Difference                Upper Bound           .441

Test Results   Dependent Variable: AGIT
Source       Sum of Squares   df   Mean Square       F    Sig.
Contrast             12.000    1        12.000   3.333   .098
Error                36.000   10         3.600

Box 5.5. Two-Group Contrasts and GLM.

CONTRASTS FOR K > 2 DESIGNS

The contrasts in the preceding sections were done only for pedagogical reasons,
specifically to introduce the basic idea of planned contrasts and the operations for their
calculation. Contrasts provide no additional information when there are only two groups, and
indeed there is only one possible contrast that could be conducted. With more than two groups,
however, contrasts can be critical for interpretation of the results and there are various ways to
define the contrasts.
In a three-group study, for example, it would be possible to do a specific contrast that
compared groups 1 and 2; the coefficients would be -1 1 0 for groups 1, 2, and 3 respectively
(Note: Σcj = 0). Contrasts can also be done on combinations of groups. For example, we could
compare the average of groups 1 and 2 to group 3 using -1, -1, and +2 as coefficients (or
equivalently, -.5, -.5, and +1, since the absolute size of the coefficients is irrelevant for most purposes).
In general, there are as many independent or orthogonal contrasts among a set of means as there
are degrees of freedom in the between groups effect (i.e., k - 1 in the single factor design). Two
groups only allow 2 - 1 = 1 independent contrast, which we just saw is redundant with the
Anova. Three groups allow 3 - 1 = 2 independent contrasts, four groups allow 4 - 1 = 3
independent contrasts, and so on. The specific contrasts are determined by the nature of the
factor and expectations about its relationship to the dependent variable.
There are several guidelines to follow in performing planned contrasts. Although the
number of possible contrasts can in principle be quite large under special circumstances (an
adjustment for the number of comparisons should then be used), it is generally desirable to keep
the number of contrasts less than or equal to the degrees of freedom for the effect being analyzed.
In the single-factor ANOVA, this means that k - 1 planned contrasts is a desirable maximum
(although not an absolute one). Fewer contrasts than this are even better, and more contrasts than
this would need to be justified and should be adjusted for (i.e., post hoc type adjustments used).
Orthogonal Contrasts
A second desirable (but again not essential) characteristic of multiple contrasts is that
they be independent of one another. Contrasts that are uncorrelated are called orthogonal
contrasts. If contrasts are orthogonal (i.e., independent, uncorrelated), then researchers know that
each contrast is testing the significance of a unique component of the overall differences among
the means (i.e., a unique SS). If contrasts are correlated, then the same variation in means could
contribute to the SSs for different contrasts, and some of the variation across all the means may
not be captured by any of the specific contrasts. The number and orthogonality of contrasts are
related issues because at most k - 1 mutually orthogonal contrasts are possible; a set of more than k
- 1 contrasts will necessarily involve correlated contrasts.
Because even just two contrasts can be correlated, it is necessary to plan contrasts to
ensure that they are independent. Although it can be a challenge to determine in advance what
contrasts are orthogonal to existing contrasts, it is always possible to determine whether known
contrasts are orthogonal. When two contrasts are independent, the sum of the products of their
respective coefficients is zero; that is, Σcjcj' = c1 × c1' + c2 × c2' + ... + ck × ck' = 0, where c and c'
denote the two sets of coefficients. If Σcjcj' = 0, then contrasts c and c' are uncorrelated.
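Because the orthogonality test is just a sum of cross-products, it is easy to script; the sketch below (the function name is ours, purely illustrative) checks two of the pairs examined in Box 5.6, which follows.

def orthogonal(c1, c2):
    # Two contrasts are orthogonal when the cross-products of their coefficients sum to 0.
    return sum(a * b for a, b in zip(c1, c2)) == 0

print(orthogonal([-1, +1, 0, 0], [0, 0, -1, +1]))   # True:  L1 and L2 of Box 5.6
print(orthogonal([-1, +1, 0, 0], [-1, 0, +1, 0]))   # False: L1 and L4 of Box 5.6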
This test for independence is illustrated in Box 5.6 for a hypothetical study involving four
groups. Six contrasts are actually shown. Because 6 is greater than k - 1 = 3, all of these
contrasts cannot be mutually independent. Contrast 1 might be independent of two other
contrasts (giving k - 1 independent contrasts), but one or more of those three contrasts must be
correlated with some of the remaining contrasts.
Groups:  1 - Anxiety Control Group
         2 - Depression Control Group
         3 - Anxiety Treatment Group
         4 - Depression Treatment Group

      Coefficients
       c1    c2    c3    c4    Comparison
L1     -1    +1     0     0    Anx Control vs. Dep Control
L2      0     0    -1    +1    Anx vs. Dep Treatment
L3     -1    -1    +1    +1    Controls vs. Treatments
L4     -1     0    +1     0    Anx Control vs. Anx Treatment
L5      0    -1     0    +1    Dep Control vs. Dep Treatment
L6     -1    +1    -1    +1    Anx Groups vs. Dep Groups

L1, L2, and L3 constitute one set of mutually orthogonal contrasts:
  Σc1j c2j = 0 = (-1 × 0) + (1 × 0) + (0 × -1) + (0 × 1)
  Σc1j c3j = 0 = (-1 × -1) + (1 × -1) + (0 × 1) + (0 × 1)
  Σc2j c3j = 0 = (0 × -1) + (0 × -1) + (-1 × 1) + (1 × 1)
L4, L5, and L6 constitute a second set of mutually orthogonal contrasts.
L1 and L4 are not independent: Σc1j c4j = +1 = (-1 × -1) + (1 × 0) + (0 × 1) + (0 × 0)
L3 and L6 are independent: Σc3j c6j = 0 = (-1 × -1) + (-1 × 1) + (1 × -1) + (1 × 1)

Box 5.6. Testing Contrasts for Independence (Orthogonality).
As shown in Box 5.6, contrast 1 is independent of contrasts 2 and 3, which are
independent of one another. Therefore, contrasts 1, 2, and 3 are k - 1 mutually independent (or
orthogonal) contrasts. Mutually orthogonal means that each contrast is independent of (uncorrelated
with) all other contrasts in that set; independence is determined by summing cross-products of
the paired coefficients (shown in Box 5.6 for contrasts 1, 2, and 3). Contrasts 4, 5, and 6 make
up a second set of k - 1 mutually orthogonal contrasts (to test this, cross-multiply all possible
pairs of coefficients).
The two sets of mutually orthogonal contrasts (i.e., one set containing 1, 2, and 3, and a
second set containing 4, 5, and 6) cannot be completely independent of one another. For
example, contrasts 1 and 4 are not orthogonal. Intuitively, contrasts 1 and 4 are not independent
because group 1 is compared to group 2 in contrast 1 and to group 3 in contrast 4 (i.e., the group
1 mean is one pole of two different contrasts). Although many contrasts across sets are not
orthogonal to one another, there are also numerous ways to select orthogonal contrasts for any
particular design. In Box 5.6, for example, contrasts 3 and 6 are orthogonal.
The preceding appears quite abstract to most students. The trick is to appreciate that
virtually any expected pattern for a set of means can be translated into contrast coefficients. We
will see many more examples of this here and in future chapters. And patterns will be
uncorrelated as long as the cross-products of their respective coefficients sum to zero. The
rationale for this last rule is actually quite basic. Remember that contrast coefficients must sum
to 0, which means that their means are 0. If we let x and y stand for two sets of contrast
coefficients, then the definitional formula for the Sum of Cross Products (SCP) can be
simplified as follows: SCP = Σ(x - x̄)(y - ȳ) = Σ(x - 0)(y - 0) = Σxy. Since SCP is the
numerator for the correlation and regression coefficients, SCP = Σxy = 0 ensures that r = 0 for
the correlation between the two contrasts (i.e., they are independent or orthogonal). We
demonstrate this later using correlation and regression.
The benefit of mutually orthogonal contrasts is that they permit researchers to divide the
SSTreatment into separate, independent components (i.e., SSL1, SSL2, and so on) that will sum to
SSTreatment. Each component has an associated F or t-test that will be meaningful, assuming the
contrasts are chosen carefully. One way to conceptualize this process is that the omnibus F
divides SSTotal into SSTreatment and SSError, and mutually orthogonal contrasts further divide
SSTreatment into specific components (SSL1, SSL2, ...). If contrasts are not orthogonal, then the sum
of the SSs for the separate contrasts might be greater or less than SSTreatment because some source
of variation is included in more than one contrast or some source of substantive variation is
omitted from all of the contrasts.
It is worth noting here that only SSTreatment in the Between-S design is subdivided into
contrasts; SSError remains the same. In Within-S designs, both SSTreatment and SSError are partitioned
into contrast-specific components. This introduces an additional complexity into the Within-S
design and can lead to anomalous outcomes, as shown in later chapters.
In generating mutually orthogonal contrasts, each successive contrast allows less freedom
of choice about subsequent contrasts. Indeed, by the last contrast, only one possible set of
coefficients will be orthogonal to the preceding k - 2 sets of coefficients. With four groups, for
example, there are many possibilities for the first contrast. But once we have chosen the first
contrast, this limits possibilities for the next contrast. And once we have the first two contrasts,
then the final orthogonal contrast will be fully determined. Say, for example, that our first
contrast is -1 -1 +1 +1. Then our second contrast could be one of a number of possibilities,
such as: -1 +1 0 0, 0 0 -1 +1, -1 +1 +1 -1, or -1 +1 -1 +1. Once we had settled on our first
and second contrasts, however, the third contrast is also set. Given the first contrast of -1 -1 +1
+1, and the second contrast of -1 +1 0 0, then the third contrast must be 0 0 -1 +1. Given -1 -1
+1 +1 and -1 +1 -1 +1, then the third contrast must be +1 -1 -1 +1. What this means from a
practical point of view is that researchers generally define their principal contrasts first to ensure
that they are present in the final set of orthogonal contrasts.
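One way to see that the final contrast is fully determined is to ask for a set of coefficients that sums to zero and has zero cross-products with the contrasts already chosen. The numpy sketch below is our own illustration of that idea, using a null-space (SVD) computation.

import numpy as np

chosen = np.array([[ 1,  1,  1,  1],   # all-1s row: forces the new coefficients to sum to 0
                   [-1, -1, +1, +1],   # first chosen contrast
                   [-1, +1,  0,  0]])  # second chosen contrast

# The right-singular vector for the (near-)zero singular value spans the null space of
# `chosen`, i.e., the only remaining direction orthogonal to every row.
_, _, vt = np.linalg.svd(chosen)
print(vt[-1])                          # proportional to (0, 0, -1, +1)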
Polynomial Contrasts (Trend Analysis)
Although Anova and contrasts were designed especially for the treatment of categorical
variables (e.g., qualitatively different drugs or therapies, strategies in a learning study, parenting
styles, and so on), there are special orthogonal contrasts that are appropriate when the
independent variable or factor involves groups ordered along some numerical dimension (e.g.,
amount of reinforcer, study time, levels of intelligence, and so on). Such quantitative variables
as low, medium, or high levels of some factor can be analyzed by traditional ANOVA. As in
other designs, however, researchers usually want to draw specific conclusions about the nature of
the differences between the groups. When the groups are ordered, a meaningful question is
whether there is a systematic change in the means as the numerical variable increases (e.g., do the
means increase or decrease in a linear manner). Polynomial coefficients permit the division of
SSTreatment into k - 1 components (linear and nonlinear), each having one degree of freedom.
The coefficients for polynomial contrasts are shown in Appendix A.5 for up to 6 groups
(coefficients exist for more groups if needed). In principle there are k - 1 orthogonal polynomials
possible, but anything beyond the first few components is meaningful only in very special circumstances.
Figure 5.2. Polynomial Coefficients for k = 3.
Figure 5.3. Polynomial Contrasts for k = 4.
For three groups, there are two possible
polynomial contrasts, linear and quadratic. The
linear coefficients are -1, 0, and +1, and the
quadratic coefficients are +1, -2, and +1. Note
that the linear and quadratic contrasts are
orthogonal; that is, ΣcLin cQua = (-1 × +1) + (0 × -2)
+ (+1 × +1) = -1 + 0 + 1 = 0. Figure 5.2 plots the values of the
linear and quadratic coefficients for k = 3. The
linear coefficients form a linear pattern; that is,
there is a constant increase from coefficient 1
(-1) to coefficient 2 (0) to coefficient 3 (+1). If the means are ordered in a similar way, either
increasing or decreasing, then there will be a strong correlation between the linear coefficients
and the means. The quadratic coefficients form a perfect U-shaped pattern, first decreasing and
then increasing. If the means follow this pattern, or the inverse U-shaped pattern of an increase
followed by a decrease, then there will be a strong correlation between the coefficients and the
data. The contrasts test the significance of the correlation between the data and the linear and
quadratic coefficients.
A factor with 4 levels will have k - 1
polynomial contrasts, specifically linear,
quadratic, and cubic. The linear coefficients are
-3, -1, 1, 3; the quadratic coefficients are 1,
-1, -1, 1; and the cubic coefficients are -1, 3, -3,
1. These three contrasts are mutually
orthogonal and partition the SSTreatment into three
unique components; that is, SSLinear + SSQuadratic
+ SSCubic = SSTreatment. Figure 5.3 plots the
linear, quadratic, and cubic coefficients for k =
4. The linear coefficients in Figure 5.3 show a linear increase and will capture systematic
increases or decreases in the means. The quadratic coefficients show a decrease followed by an
increase (i.e., the U-shaped pattern), and will capture systematic changes involving one change in
direction. The cubic coefficients are sensitive to several reversals in direction (up-down-up or
down-up-down).
In essence polynomial contrasts partition any pattern in the k means into k - 1
components or trends. Each pattern has a SS associated with it, and ΣSSContrast is equal to the
SSTreatment. The linear component is equivalent to a simple correlation between the ordered
treatment numbers and the dependent variable. The nonlinear components are analogous to the
power predictors (i.e., squared, cubed, and higher-order terms of the predictor) used for nonlinear or polynomial regression.
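As a quick check on the tabled coefficients, the sketch below (our own illustration) verifies that the k = 4 polynomial contrasts sum to zero and are mutually orthogonal, and that the linear set rises by a constant step.

linear    = [-3, -1, +1, +3]
quadratic = [+1, -1, -1, +1]
cubic     = [-1, +3, -3, +1]

for c in (linear, quadratic, cubic):
    assert sum(c) == 0                                 # each set is a legitimate contrast
for a, b in ((linear, quadratic), (linear, cubic), (quadratic, cubic)):
    assert sum(x * y for x, y in zip(a, b)) == 0       # all pairs are orthogonal
print([linear[i + 1] - linear[i] for i in range(3)])   # [2, 2, 2]: a constant (linear) step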
PLANNED CONTRASTS FOR THE BRAIN STIMULATION STUDY
Contrast analysis for a multi-group study will be demonstrated first with the 3-group
brain stimulation study used to illustrate pairwise comparisons. Recall that rate of barpressing
was measured for three groups of animals: a No Stimulation group (NS), a control stimulation
group (area A), and an experimental stimulation group (area B). The overall ANOVA was
significant, suggesting that the null hypothesis of equal population means was false. In contrast
to pairwise comparisons, however, the omnibus F need not be significant in order to do the
planned contrasts presented here.
The brain stimulation study involved two control groups (NS and A) and one treatment
group. This design suggests two specific contrasts. One contrast would test the difference
between area B and the average of NS and area A. The coefficients would be -1 -1 +2. A second
contrast would test the difference between NS and area A (i.e., between the two control groups).
The coefficients for this contrast would be -1 +1 0. These two contrasts exhaust the df associated
with the treatment effect in this study (i.e., k - 1 = 2). Moreover they are independent (i.e.,
orthogonal). Note that the sum of their cross-products equals zero; that is, (-1 × -1) + (-1 × +1)
+ (+2 × 0) = 0. Calculations and results for this contrast analysis are shown in Box 5.7. It would
also be possible to compare each treatment group to the NS control group, but these contrasts
would not be orthogonal; that is, the cross-product of -1 +1 0 (A versus NS) and -1 0 +1 (B
versus NS) sums to +1 rather than zero.
          NS     A      B
           4     3     10
           5     2      8
           6     4      6
ȳj =     5.0   3.0    8.0            ȳG = 48/9 = 5.333

SOURCE      df         SS     MS = SS/df   F = MST/MSE     F.05;2,6
Treatment   k-1 = 2   38.0       19.00         9.5            5.14
Error       n-k = 6   12.0        2.00                  F ≥ Fα, reject H0
Total       n-1 = 8   50.0

Contrast Analysis                        t.05,6 = 2.447     F.05;1,6 = 5.99 = tα²
        NS    A    B        L      tL      SSL      FL
ȳj     5.0  3.0  8.0
c1j     -1   -1    2      8.0    4.00     32.0     16.0 = tL1²
c2j     -1    1    0     -2.0   -1.73      6.0      3.0 = tL2²
                                     ΣSSL = 38.0          Mean F = (16 + 3)/2
                                     = SSTreatment          = 9.5 = FOmnibus

Do Not Reject H0: µNS = µA          Reject H0: µNS+A = µB

Calculations for L2
L2   = Σc2j ȳj = (-1 × 5.0) + (1 × 3.0) + (0 × 8.0) = -2.0
tL2  = L2 / √(MSError Σ(c2j²/nj)) = -2.0 / √(2.0 × (1/3 + 1/3)) = -2.0/1.155 = -1.732 = -√FL2
SSL2 = nj L2² / Σc2j² = (3 × -2.0²) / (-1² + 1² + 0²) = 6.0
FL2  = SSL2 / MSError = 6.0/2.0 = 3.0 = 1.732² = tL2²

Box 5.7. Contrast Analysis of Barpressing (k = 3) Study.
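The arithmetic in Box 5.7 can be confirmed with a few lines of Python (our own sketch; the labels are illustrative): the two contrast SSs sum to SSTreatment = 38.0, and the two contrast Fs average to the omnibus F of 9.5.

means = [5.0, 3.0, 8.0]                 # NS, Area A, Area B
n, ms_error = 3, 2.0
contrasts = {"B vs average of NS and A": [-1, -1, +2],
             "NS vs A":                  [-1, +1,  0]}

ss_sum, f_values = 0.0, []
for label, c in contrasts.items():
    L = sum(cj * m for cj, m in zip(c, means))
    ss = n * L ** 2 / sum(cj ** 2 for cj in c)
    F = ss / ms_error                   # df = 1, 6
    ss_sum += ss
    f_values.append(F)
    print(label, L, ss, F)              # 8.0, 32.0, 16.0  and  -2.0, 6.0, 3.0
print(ss_sum)                           # 38.0 = SS_Treatment
print(sum(f_values) / len(f_values))    # 9.5  = omnibus F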
The analysis in Box 5.7 demonstrates several useful features of contrasts. First, the
planned contrasts permit a very "tidy" or "elegant" description of the results. The results indicate
that the difference between the two control groups (i.e., contrast 2) is not significant, whereas the
difference between the treatment group and the average of the two control groups (i.e., contrast
1) is significant. Second, the sum of the SSs for the contrasts is equal to SSTreatment, which must
occur whenever we use k - 1 = 2 orthogonal contrasts. Orthogonal contrasts partition the
SSTreatment into independent components that sum to SSTreatment. For this same reason, the average
of the Fs for the two contrasts is equal to the FOmnibus, as shown in Box 5.7.
Third, note that the F for contrast one is larger and more significant than the omnibus F
(16.0 versus 9.5 for the omnibus F). Although both these Fs are significant in this particular
case, the fact that the omnibus and contrast Fs do differ means that under certain circumstances
the omnibus F might not be significant even when the specific F for the contrast is significant.
This would occur because the significant contrast would be "watered down" when averaged
together with a non-significant contrast, resulting in a non-significant omnibus F. Finally, note
that the t-tests and F-tests for the contrasts are equivalent to one another (i.e., t² = F or t = √F), so
only one of the tests is normally done.

Equivalence of Different Values for Contrast Coefficients

Except for their relative values (i.e., the pattern), the actual numbers used for the contrast
coefficients are largely irrelevant to the analyses: the Σcj² term in the denominators of the
formulae for t and SSContrast "adjusts" the final statistics for whatever values were used to produce
the contrast. This point might be clearer with an example.

                        NS      A      B                  nj = 3
ȳj                     5.0    3.0    8.0        L       SS = nj L²/Σcj²

Original Coefficients
c1                      -1      1      0      -2.0         6.0
c2                      -1     -1      2       8.0        32.0

Original Coefficients ÷ Σ|cj|  (so that Σ|cj| = 1)
c'1                    -.5     .5      0      -1.0         6.0
c'2                   -.25   -.25     .5       2.0        32.0

Normalized Coefficients: Original ÷ √Σcj²  (so that Σcj² = 1; SS = nL²)
c''1                 -.707   .707      0    -1.414         6.0
c''2                 -.408  -.408   .816     3.264        32.0

Box 5.8. Alternative Values for Contrast Coefficients.
Box 5.8 reproduces the contrasts from Box 5.7 and shows two alternative sets of
coefficients that could be used to perform the identical contrasts. The second set of contrasts
uses fractions, the absolute values of which sum to 1. For these coefficients, the contrast L
represents the deviation of the cell means from the "grand" mean of the conditions being
compared. Specifically, L = -1.0 for the first contrast represents the deviations of the means for
NS (5.0) and A (3.0) from their average value of 4.0. The magnitude of the second contrast is
2.0, which represents the deviation of ȳNS+A = 4.0 and ȳB = 8.0 from their overall average of 6.0.
The third set of contrasts uses normalized coefficients (orthonormal coefficients if also
orthogonal). A property of normalized coefficients is that Σcj² = 1, which means that SSContrast =
nL²/1 = nL². We will generally use whole-number contrasts, mainly for ease of calculation, but
will also have occasion in later chapters to use normalized contrasts to demonstrate some features
of Within-S analyses. The main point at present, however, is that it is the pattern of differences
among the contrast coefficients that matters, not the actual numbers.
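A quick way to convince yourself of this scale invariance is to rescale the coefficients and recompute the SS; the sketch below (illustrative only) does so for the treatment-versus-controls contrast of Box 5.8.

means, n = [5.0, 3.0, 8.0], 3

def ss_contrast(c):
    L = sum(cj * m for cj, m in zip(c, means))
    return n * L ** 2 / sum(cj ** 2 for cj in c)

# Integer, fractional, and normalized versions of the same pattern all give SS = 32.0
# (the last differs only by rounding in the normalized coefficients).
print(ss_contrast([-1, -1, 2]))
print(ss_contrast([-.25, -.25, .5]))
print(ss_contrast([-.408, -.408, .816]))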
Polynomial Contrasts for the Brain Stimulation Study
Although perhaps not the most relevant contrasts for this study, polynomial contrasts for
the brain stimulation study are shown in Box 5.9, along with the calculations of the SSs. Such
contrasts would be appropriate if, for example, researchers expected the means to be ordered
from No Stimulation (Group 1) to Stimulation in area A (Group 2) to Stimulation in area B
(Group 3), or if they expected some kind of curvilinear pattern (e.g., a decrease from Group 1 to
Group 2, followed by an increase from Group 2 to Group 3). Or they might have expected the
opposite patterns. Previous findings or theory would be necessary to justify such expectations in
this case, because the three groups do not constitute a natural ordering.
Note in Box 5.9 that SSLinear +
SSQuadratic = 38.0 = SSTreatment; that is, SSTreatment
has been partitioned into two orthogonal
components. Also note that SSQuadratic, the U-
shaped pattern, is stronger than SSLinear; its SS
is almost twice as large. Indeed, the dominant
pattern when we examine the three means is
the decrease from 5.0 to 3.0, followed by the
increase from 3.0 to 8.0. It is this pattern that
is being identified by the quadratic contrast. The linear component accounts for some variability
because the mean for group 3 (i.e., 8.0) is higher than the mean for group 1 (i.e., 5.0). Although
we could conduct F tests for these two effects manually, we will leave that for the comparable
SPSS analyses.

            Group
          1      2      3
ȳj      5.0    3.0    8.0        L      SSL
Lin      -1      0     +1     +3.0    13.50
Qua      +1     -2     +1     +7.0    24.50
                               ΣSSL = 38.00

Box 5.9. Polynomial Contrasts for Brain Stimulation Study.

SPSS AND CONTRASTS FOR THE BRAIN STIMULATION STUDY

Contrasts for more than two groups are simple to do using the various SPSS Anova
commands. ONEWAY, for example, allows many contrast subcommands in a single ONEWAY.
In general, researchers would not exceed k - 1 contrasts (i.e., the maximum number of contrasts
equals the df for the treatment), and ideally these would be orthogonal contrasts. Indeed, some
SPSS procedures "insist" on orthogonal contrasts (e.g., MANOVA), although other procedures
permit non-orthogonal contrasts (e.g., ONEWAY).

ONEWAY and Planned Contrasts for the Stimulation Study

ONEWAY press BY group (1,3) /CONTRAST = -1 1 0 /CONTRAST = -1 -1 2.

SOURCE            D.F.   SUM OF SQUARES   MEAN SQUARES   F RATIO   F PROB.
BETWEEN GROUPS      2           38.0000        19.0000    9.5000     .0138
WITHIN GROUPS       6           12.0000         2.0000
TOTAL               8           50.0000

CONTRAST COEFFICIENT MATRIX
             Grp 1   Grp 2   Grp 3
CONTRAST 1    -1.0    -1.0     2.0
CONTRAST 2    -1.0     1.0     0.0

              VALUE   S. ERROR   T VALUE   D.F.   T PROB.
CONTRAST 1   8.0000     2.0000     4.000    6.0     0.007
CONTRAST 2  -2.0000     1.1547    -1.732    6.0     0.134

Box 5.10. SPSS Contrasts for Three-group Design.
Box 5.10 shows an SPSS analysis of our two contrasts for the hypothetical brain
stimulation study, in which there were significant differences among the bar-pressing means for
No Stimulation (M1 = 5.0), stimulation in Control Area A (M2 = 3.0), and stimulation in
experimental Area B (M3 = 8.0). The first contrast compares the treatment to the two controls,
and the second compares the two controls to each other. The results of the contrasts are shown as
t-tests at the end of the output and agree with calculations reported earlier. Moreover, it would
be possible to compute SSs and Fs for the contrasts, since SSL1 = (3 × 8.0²) / (-1² + -1² + 2²) =
32.0, FL1 = (32.0/1) / 2.0 = 16.0 = tL1², SSL2 = (3 × -2.0²) / (-1² + 1² + 0²) = 6.0, and FL2 = (6.0/1) /
2.0 = 3.0 = tL2². Given these calculations, we see that SSL1 + SSL2 = 32.0 + 6.0 = 38.0 = SSTreatment;
that is, SSTreatment has been partitioned into two independent sources of variability, one associated
with contrast one and the other associated with contrast two.
Of particular note in Box 5.10 is that the p of .007 for contrast 1 (treatment versus
controls) is highly significant, and much more significant than the p of .0138 for the omnibus F,
demonstrating the potential benefits of a focussed contrast that matches the pattern of differences
among the means.
That the omnibus F might be less significant than the specific contrast Fs is clearly shown
by the fact that the omnibus F is the average of the k - 1 orthogonal contrast Fs. In the present
example, square the ts used to test the contrasts and average them: (1.732² + 4.000²) / 2 = (3.0 +
16.0) / 2 = 19.0/2 = 9.5 = FANOVA. In practice, an omnibus F will be weaker than one or more of
the contrast Fs whenever weak contrasts (e.g., FL2 = 3.0) are averaged together with strong
contrasts (e.g., FL1 = 16.0). Planning strong contrasts ahead of time ensures powerful tests of
predicted differences. Not planning specific contrasts risks diluting strong effects by lumping
them together with weak effects.
MANOVA and Contrasts for the Brain Stimulation Study
Box 5.11 shows the same analysis using the SPSS MANOVA command.

MANOVA press BY group(1 3) /CONTRAST = SPECIAL(1 1 1  -1 -1 2  -1 +1 0) /DESIGN group(1) group(2).

Source of Variation        SS   DF      MS       F   Sig of F
WITHIN+RESIDUAL         12.00    6    2.00
GROUP(1)                32.00    1   32.00   16.00       .007
GROUP(2)                 6.00    1    6.00    3.00       .134
(Model)                 38.00    2   19.00    9.50       .014
(Total)                 50.00    8    6.25

R-Squared = .760     Adjusted R-Squared = .680

GROUP(1)
Parameter       Coeff.   Std. Err.    t-Value    Sig. t
    2        8.00000000    2.00000    4.00000    .00712

GROUP(2)
Parameter       Coeff.   Std. Err.    t-Value    Sig. t
    3       -2.0000000     1.15470   -1.73205    .13397

Box 5.11. Planned Contrasts Using MANOVA.

The CONTRAST = SPECIAL subcommand includes k (i.e., 3) sets of k (i.e., 3) coefficients each. The first set of k
coefficients is simply the three 1s, which represents the grand mean. The next set of 3
coefficients (-1 -1 +2) is our first contrast between group 3 and the average of groups 1 and 2,
and the final set of 3 coefficients (-1 +1 0) is our second contrast between groups 1 and 2. The
spacing between the three sets of three coefficients is for clarity and is not required by SPSS.
If we had only included the CONTRAST line and nothing more, then SPSS would have
printed out the t-tests shown at the bottom of Box 5.11. And these t-tests would have been
labelled simply as Parameter 2 and Parameter 3. However, MANOVA allows researchers to use
the /DESIGN command to specify the particular effects in which they are interested. By default,
SPSS would analyze the overall main effect for the design (and interactions in a factorial study).
Using the default is identical to including a /DESIGN = group for our present single factor study.
But we now want the contrast effects, rather than (or in addition to) the overall main effect. The
DESIGN = group(1) group(2) statement in Box 5.11 requests that MANOVA print Anova results
for the separate contrasts (i.e., SSs, MSs, F). In this notation, the 1 and 2 after group refer to the
first and second contrasts. This DESIGN statement requests the specific Anova results (i.e., SSs,
dfs, and Fs) that appear in Box 5.11 for Group(1) and Group(2). The t-test results are also
labelled as GROUP(1) and GROUP(2).
Although there is much redundancy in the printout, the results in Box 5.11 are ideal for
demonstrating the equivalence of the t and F analyses of contrasts, and as well for practicing
computations associated with these two approaches. Note in particular the following
correspondences: the p values for the corresponding ts and Fs are identical except for rounding
(e.g., .00712 for tL1 and .007 for FL1); the Fs are equal to the corresponding t² (e.g., tL1² = 4.00² =
16.00 = FL1); and calculating SSs for the contrasts based on the coefficients in Box 5.11 produces
the SSs in the Anova portion of the output; for example, SSL1 = (3 × 8.0²) / 6 = 32.0 = SSGroup(1).
Note also that the results for GROUP(1) (i.e., the first contrast) and GROUP(2) (i.e., the second
contrast) agree with earlier calculations and results.
Our omnibus F from earlier appears as the F for the entire model. We can also see that
SSL1 + SSL2 = 32.0 + 6.0 = 38.0 = SSTreatment, and that (FL1 + FL2) / 2 = (16.0 + 3.0) / 2 = 19.0 / 2 =
9.5 = FOmnibus.
Figure 5.4. GLM Contrast Screen.
GLM and Contrasts for the Brain Stimulation Study
Figure 5.4 shows the screen for using GLM to perform contrasts. The Dependent
Variable and Fixed Factor(s) were placed in the appropriate boxes, and the Contrasts box was
opened. This box does
not allow for SPECIAL
contrasts (although GLM
syntax does), but instead
presents a range of default
contrasts. Available
contrasts are: Deviation,
Simple, Difference,
Helmert, Polynomial, and
Repeated. We use
Polynomial contrasts
shortly, and information about the other types of contrast is available from SPSS Help.
The DIFFERENCE contrast involves the comparison of each group (except the first) with
the average of all preceding groups. That is, group 2 will be compared with group 1, and group 3
with the average of groups 1 and 2. If there were more levels to the factor, group 4 would be
compared to the average of groups 1 to 3, group 5 to the average of groups 1 to 4, and so on.
This type of contrast corresponds to our two contrasts (i.e., -1 +1 0, -1 -1 +2), although in a
different order than we have been using. The order of contrasts does not affect the tests.
To perform the DIFFERENCE contrast, we select Difference from the list of contrasts,
press Change to have the selected contrast entered into the Factors box, and press Continue. We
can now return to the Univariate window and run the analysis. The omnibus Anova output for
this analysis is shown in Box 5.12. As reported previously, the omnibus F of 9.50 is significant,
although it need not be to warrant the planned comparisons discussed in Chapters 6 and 7.

Source              Type III Sum of Squares   df   Mean Square         F    Sig.
Corrected Model                  38.000(a)     2        19.000     9.500   .014
Intercept                       256.000        1       256.000   128.000   .000
GROUP                            38.000        2        19.000     9.500   .014
Error                            12.000        6         2.000
Total                           306.000        9
Corrected Total                  50.000        8
a  R Squared = .760 (Adjusted R Squared = .680)

Box 5.12. Omnibus Anova for Brain Stimulation Study.
Box 5.13 shows the GLM results for the contrasts, with some statistics and text deleted.
Although the actual values for ts, Fs, and SSs are not presented, the contrast, its SE, and its
significance are reported. These values agree with earlier calculations and printouts, and could be
used to compute some of the missing quantities: for example, tL2 = 4.00 / 1.00 = 4.00, SSL2 =
(3 × 4.0²) / 1.5 = 32.0 (using GLM's difference-contrast coefficients of -.5, -.5, and 1, for which
Σcj² = 1.5), and FL2 = 4.00² = (32.00/1)/2.00 = 16.0. The two control groups do not differ, p = .134,
whereas they differ significantly from the treatment group, p = .007.
Dependent Variable: PRESS     GROUP Difference Contrast
Level 2 vs. Level 1      Contrast Estimate     -2.000
                         Hypothesized Value         0
                         Std. Error             1.155
                         Sig.                    .134
Level 3 vs. Previous     Contrast Estimate      4.000
                         Hypothesized Value         0
                         Std. Error             1.000
                         Sig.                    .007

Box 5.13. GLM Planned Comparisons Output for Brain Stimulation Study.

SPSS and Polynomial Contrasts (Trend Analysis) for the Brain Stimulation Study

The use of orthogonal polynomials to analyze trends in ordered means is simple in SPSS.
One approach uses the polynomial coefficients in Appendix A.5 as the contrast coefficients for
whatever procedure is being used: manual calculations, ONEWAY, MANOVA, GLM,
REGRESSION, or some other procedure that permits contrasts. The tests of significance for the
contrasts represent the significance of the specific components.
ONEWAY press BY group (1,3) /CONTRAST = -1 0 1 /CONTRAST = -1 2 -1.

SOURCE            D.F.   SUM OF SQUARES   MEAN SQUARES   F RATIO   F PROB.
BETWEEN GROUPS      2           38.0000        19.0000    9.5000     .0138
WITHIN GROUPS       6           12.0000         2.0000
TOTAL               8           50.0000

CONTRAST COEFFICIENT MATRIX
             Grp 1   Grp 2   Grp 3
CONTRAST 1    -1.0     0.0     1.0
CONTRAST 2    -1.0     2.0    -1.0

              VALUE   S. ERROR   T VALUE   D.F.   T PROB.
CONTRAST 1    3.000     1.1547     2.598    6.0     0.041
CONTRAST 2   -7.000     2.0000    -3.500    6.0     0.013

Box 5.14. Polynomial Contrasts with SPSS ONEWAY.
Box 5.14 shows the ONEWAY commands and results for a polynomial analysis of the
brain stimulation study. Contrast one (-1 0 1) and contrast two (-1 2 -1) specify the linear and
quadratic components, respectively, and the significance levels are reported as ts below the
contrast coefficient matrix. The results in this and following analyses correspond to the
computations performed earlier. Note in particular that the contrast coefficients could be used to
calculate SSLinear = (3 × 3.0²) / 2 = 13.50 and SSQuadratic = (3 × -7.0²) / 6 = 24.50, that SSLinear +
SSQuadratic = SSTreatment, that tLinear² = 2.598² = 6.75 = 13.50 / 2.0 = FLinear and tQuadratic² = -3.500² =
12.25 = 24.50 / 2.0 = FQuadratic, and that the average of FLinear and FQuadratic equals FTreatment. These
relationships parallel those discussed in the previous analyses for this study, although the
partitioning of the treatment effect results in somewhat different results for the specific contrasts.
For example, both the linear and quadratic effects are significant, with ps = .041 and .013,
respectively.
SPSS ONEWAY and other Anova commands also provide a special subcommand that
avoids the necessity of entering the contrast coefficients (this is especially helpful with factors
that have many levels). In ONEWAY, the subcommand is /POLYNOMIAL = n, where the value
of n indicates the highest order of polynomial contrasts to extract (e.g., n = 2 would specify linear
and quadratic contrasts). Box 5.15 illustrates the POLYNOMIAL statement, which results in
SPSS breaking the SSBetween with df = 2 into two single df polynomial contrasts, a linear and a
quadratic component. These single df components appear in the ANOVA summary table.

ONEWAY press BY group (1,3) /POLYNOMIAL = 2.

SOURCE              D.F.   SUM OF SQUARES   MEAN SQUARES   F RATIO   F PROB.
BETWEEN GROUPS        2          38.0000        19.0000     9.5000     .0138
  LINEAR TERM         1          13.5000        13.5000     6.7500     .0408
  DEV'N. FROM LIN     1          24.5000        24.5000    12.2500     .0128
  QUAD. TERM          1          24.5000        24.5000    12.2500     .0128
WITHIN GROUPS         6          12.0000         2.0000
TOTAL                 8          50.0000

Box 5.15. Polynomial contrasts with SPSS ONEWAY.

The
LINEAR TERM represents the linear contrast, now reported as an F test. The SSLinear and FLinear
reported here agree with calculations just performed for Box 5.14, as well as with our earlier
manual calculations.
The DEV'N FROM LIN and QUAD. TERM statistics are identical because the quadratic is
the only nonlinear term when k = 3; that is, with df = 3 - 1 = 2, there are only linear and quadratic
components. For k > 3, DEV'N FROM LIN would include all nonlinear components (quadratic,
cubic, and so on). The quadratic statistics agree with earlier calculations. As well, SSLinear +
SSQuadratic = 13.5 + 24.5 = 38.0 = SSTreatment, demonstrating that the orthogonal polynomial contrasts
exhaust the variability among the three group means. Likewise, the average of the contrast Fs =
(6.75 + 12.25) / 2 = 9.50 = FTreatment. The
polynomial analysis results in a more balanced distribution of the SSs than the preceding
contrasts. Here both contrasts are significant, as was the treatment effect, with the quadratic
effect being only slightly more significant than the omnibus effect.
That the CONTRAST and POLYNOMIAL approaches are equivalent is demonstrated by
a comparison of ps (e.g., .041 for the linear component in Box 5.14 and .0408 in Box 5.15), the
test statistics (e.g., FLinear = 6.75 = 2.598² = tLinear²), and the SSs (SSLinear = 13.5 = (3 × 3²)/2 =
n × LLinear²/Σc²).
Polynomial Contrasts in GLM and MANOVA
Both GLM and MANOVA allow the keyword POLYNOMIAL to be used for the
CONTRAST subcommand (or selected via menus in the case of GLM). Box 5.16 shows the
GLM syntax commands and output for this analysis. The /CONTRAST(group) = POLYNOMIAL
subcommand instructs SPSS to perform the polynomial contrasts and report the significance level. The /PRINT
= TEST(LMATRIX) command requests that a matrix of values corresponding to the contrasts be
printed out. We will look at this output shortly.
GLM press BY group /CONTRAST(group) = POLYNOMIAL /PRINT = TEST(LMATRIX).

Source              Type III Sum of Squares   df   Mean Square         F    Sig.
Corrected Model                  38.000(a)     2        19.000     9.500   .014
Intercept                       256.000        1       256.000   128.000   .000
GROUP                            38.000        2        19.000     9.500   .014
Error                            12.000        6         2.000
Total                           306.000        9
Corrected Total                  50.000        8
a  R Squared = .760 (Adjusted R Squared = .680)

Custom Hypothesis Tests
Contrast Coefficients (L' Matrix)   GROUP Polynomial Contrast(a)
Parameter           Linear   Quadratic
Intercept             .000        .000
[GROUP=1.00]         -.707        .408
[GROUP=2.00]          .000       -.816
[GROUP=3.00]          .707        .408

Contrast Results (K Matrix)   Dependent Variable: PRESS
GROUP Polynomial Contrast(a)
Linear       Contrast Estimate    2.121
             Std. Error            .816
             Sig.                  .041
Quadratic    Contrast Estimate    2.858
             Std. Error            .816
             Sig.                  .013

Box 5.16. GLM Polynomial Output with PRINT=TEST(LMATRIX) Command.

First, note that the last few lines of output present the results for the requested contrasts.
Some superfluous lines have been edited out, leaving the contrast value, the SE, and the
significance for the Linear and Quadratic contrasts. The p values and conclusions are identical to
those presented earlier. Recall that SPSS does not actually print out the t value for these tests in
GLM, although these or the corresponding SSs and Fs could be computed: tLinear = 2.121/.816 =
2.599, FLinear = 2.599² = 6.756. We will compute SSLinear shortly.

But first, consider the lines produced by the /PRINT = TEST(LMATRIX) command. The
relevant values for our contrasts are printed in the section headed "Custom Hypothesis Tests."
The coefficients for the Linear contrast are -.707, .000, and +.707, which are the normalized
coefficients for -1, 0, +1. Recall that normalized coefficients are calculated by dividing the initial
coefficients by √(Σcj²): -1/√2 = -.707, 0/√2 = .000, +1/√2 = +.707. Similarly, the normalized
quadratic coefficients can be calculated as +1/√6 = +.408, -2/√6 = -.816, and +1/√6 = +.408. The
sum of the normalized coefficients squared is equal to 1. For our normalized linear coefficients,
-.707² + .000² + .707² = .500 + .000 + .500 = 1. For our normalized quadratic coefficients,
.408² + (-.816)² + .408² = .166 + .667 + .166 ≈ 1. These normalized coefficients will be especially
important later when we discuss contrasts for Within-S factors. For now, it is important to know
that sometimes GLM uses normalized coefficients, rather than the integer contrasts that we may be
working with.
One reason that it is important to know when normalized coefficients are used is to
interpret correctly the contrast coefficients and to calculate various statistics, such as SS for the
contrasts. Note that LLinear = 2.121 in Box 5.16, whereas it equalled 3.00 in Box 5.14, and that
LQuadratic = 2.858 in Box 5.16, versus -7.00 in Box 5.14. The normalized linear coefficients were
calculated by dividing our original linear coefficients by √2; hence, the normalized contrast value
is that much smaller than the non-normalized contrast; that is, 2.121 × √2 = 3.00.
Similarly, 2.858 × √6 = 7.001, equal in absolute value to the non-normalized quadratic contrast.
The normalized coefficients simplify calculation of SSs for the contrasts because the sum
of the squared coefficients is 1.00 and this value appears in the denominator of our formula to
calculate SS for a contrast; specifically, SSContrast = nj × L² / Σcj². For normalized coefficients,
SSLinear = 3 × 2.121² / 1 = 13.50, and SSQuadratic = 3 × 2.858² / 1 = 24.50. These values agree with
earlier calculations and printouts. The values would have been incorrect if we had paired these
normalized contrast values with the Σcj² of the non-normalized coefficients in the denominator.
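The back-calculation from GLM's normalized output can also be scripted; the sketch below (our own illustration, not SPSS) normalizes the integer polynomial coefficients and recovers the contrast estimates and the SSs of 13.5 and 24.5.

from math import sqrt

means, n = [5.0, 3.0, 8.0], 3

def normalize(c):
    length = sqrt(sum(cj ** 2 for cj in c))
    return [cj / length for cj in c]

for label, c in (("linear", [-1, 0, +1]), ("quadratic", [+1, -2, +1])):
    cn = normalize(c)                                # e.g., -.707, .000, +.707
    L = sum(cj * m for cj, m in zip(cn, means))      # GLM's "Contrast Estimate"
    print(label, round(L, 3), round(n * L ** 2, 2))  # SS = n * L**2 because sum(c**2) = 1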
PLANNED CONTRASTS FOR THE INTERVIEW STUDY
Analyses in previous chapters have examined a study of attractiveness ratings as a function
of instructions that subjects received about the purpose of the interviews they were rating: no
special instructions (group 1), job interview (group 2), psychiatric interview (group 3), and parole
interview (group 4). The omnibus ANOVA, which is reproduced in Box 5.17, was highly
significant. The group means are also shown, and these can be used to calculate SSs for various
contrasts that might be of interest.
Source        df       SS         MS         F        p
Treatment      3     325.80    108.6000    7.8475    .0004
Error         36     498.20     13.8389
Total         39     824.00

Group    nj      ȳj       sj
  1      10    27.30    3.3015
  2      10    26.00    3.6209
  3      10    22.70    4.2177
  4      10    20.00    3.6818

Box 5.17. Omnibus Anova for Interview Study.

Planned contrasts begin with expectations about the pattern of differences among the
groups, based on either empirical or theoretical grounds. Very speculatively, let us consider
planned comparisons associated with several different ways in which researchers might partition
SSTreatment = 325.80 into df = 3 orthogonal contrasts. For the first, we will examine the possibility
that the inmate (group 4) and psychiatric patient (group 3) conditions represent two stigmatized
groups relative to the no instruction (group 1) and job interview (group 2) conditions. This
prediction would be based on the assumption that society would tend to react negatively to
people being a psychiatric patient or an inmate, and that this bias would colour their ratings of
attractiveness of the people being interviewed.
Control versus Stigmatized Groups
The possibility that there would be differences between the two control groups (groups 1
and 2) and the two potentially stigmatized groups (3 and 4) would correspond to the following
contrast: -1, -1, +1, +1. The comparison would be between the average for the first two groups
and the average for the third and fourth groups. Given this contrast, we need two more contrasts
that are orthogonal to it. One hypothesis might be that there would be a difference between the
two stigmatized groups (i.e., groups 3 and 4); people's judgments might be affected more by one
of these stigmas than by the other. The contrast corresponding to this question is: 0, 0, -1, +1. This
contrast is orthogonal to the first: (-1 × 0) + (-1 × 0) + (1 × -1) + (1 × 1) = 0. The third contrast
orthogonal to the first two would test whether the two control groups (1 and 2) had differential
effects on the ratings. The coefficients would be: -1, +1, 0, 0. One clue for this final contrast is
to observe that groups 1 and 2 receive exactly the same coefficients in contrast one (-1 -1) and
in contrast two (0 0); therefore any difference between these two groups has not yet been
captured by the first two contrasts and must be represented in contrast three.
                        Group
              1        2        3        4
ȳj        27.30    26.00    22.70    20.00        L        SS         F     √F = t

L1           -1       -1       +1       +1    -10.60    280.90    20.298    4.505
L2           -1       +1        0        0     -1.30      8.45      .611     .782
L3            0        0       -1       +1     -2.70     36.45     2.634    1.623

                           ΣSSL = 325.80 = SSTreatment     Mean F = 7.848 = FOmnibus

Sample Calculations for Contrast 1
L1   = (-1 × 27.30) + (-1 × 26.00) + (1 × 22.70) + (1 × 20.00) = -10.6
SSL1 = (10 × 10.6²) / (-1² + -1² + 1² + 1²) = 280.9
FL1  = 280.9 / 13.8389 = 20.298          FCritical ≈ 4.17 (df = 30, not 36)
SEL1 = √(13.8389 × (1²/10 + 1²/10 + 1²/10 + 1²/10)) = 2.3528
tL1  = -10.6 / 2.3528 = -4.505           tCritical ≈ 2.042 (df = 30, not 36)

Box 5.18. Contrasts for Interview Study.
Box 5.18 presents the calculations for these contrasts. The first contrast comparing groups
1 and 2 (the "normal" instruction conditions) with groups 3 and 4 (the "stigmatized" instruction
conditions) accounts for most of the explained variation (280.90 of 325.80 units), and would be
highly significant, F = 20.298 or t = 4.505, versus approximate critical values of 4.17 and 2.042,
respectively. The other two tests show that groups 1 and 2 do not differ significantly, and that
groups 3 and 4 do not differ significantly. Approximate critical values with df = 30 were used for
these tests because the t and F tables did not include df = 36, which is dfError. Because critical
values get smaller as df increases, an effect that is significant with fewer degrees of freedom will
also be significant with more degrees of freedom. Note also that just as FObserved = t²Observed, FCritical
= t²Critical; that is, 2.042² ≈ 4.17. Therefore the tests are redundant.
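The calculations in Box 5.18 require only the group means, n per group, and MSError, so they can
be checked with a few lines of code. A minimal Python sketch, assuming the values from Box
5.17, reproduces L, SSL, F, SE, and t for the first contrast and confirms that F = t²:

from math import sqrt

means = [27.30, 26.00, 22.70, 20.00]   # group means from Box 5.17
n = 10                                 # observations per group
ms_error = 13.8389                     # MS Error from the omnibus ANOVA
c = [-1, -1, +1, +1]                   # contrast 1: controls vs stigmatized

L = sum(cj * mj for cj, mj in zip(c, means))          # -10.6
ss_L = n * L**2 / sum(cj**2 for cj in c)              # 280.9
F = ss_L / ms_error                                   # 20.298
se = sqrt(ms_error * sum(cj**2 / n for cj in c))      # 2.353
t = L / se                                            # -4.505
print(round(L, 2), round(ss_L, 2), round(F, 3), round(se, 3), round(t, 3))
print(round(t**2, 3))   # equals F, so the t and F tests are redundant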
Because the contrasts in Box 5.18 are orthogonal, SSL1 + SSL2 + SSL3 = SSTreatment, and the
average of the corresponding F tests equals FOmnibus, that is, (20.298 + .611 + 2.634) / 3 = 7.848.
The SSs for the contrasts could also be used to calculate η² for each contrast: for example, η²L1 =
280.9/824 = .341, indicating that contrast 1 accounts for 34.1% of the total variability in the
scores. Similarly, the sum of the separate η² would equal η²Treatment.

ONEWAY rate BY inst(1 4) /CONT = -1 -1 1 1 /CONT = 0 0 -1 +1 /CONT = -1 +1 0 0.

                          SUM OF       MEAN          F        F
SOURCE            D.F.    SQUARES      SQUARES       RATIO    PROB.
BETWEEN GROUPS      3     325.8000     108.6000      7.8475   .0004
WITHIN GROUPS      36     498.2000      13.8389
TOTAL              39     824.0000

CONTRAST COEFFICIENT MATRIX
              Grp 1     2      3      4
CONTRAST 1     -1.0   -1.0    1.0    1.0
CONTRAST 2     -1.0    1.0    0.0    0.0
CONTRAST 3      0.0    0.0   -1.0    1.0

              VALUE      S. ERROR    T VALUE    D.F.    T PROB.
CONTRAST 1   -10.6000     2.3528     -4.505     36.0     0.000
CONTRAST 2    -2.7000     1.6637     -1.623     36.0     0.113
CONTRAST 3    -1.3000     1.6637     -0.781     36.0     0.440

Box 5.19. Contrasts for Interview Study.
Control versus Stigmatized Groups: SPSS Analyses
The coefficients for this initial contrast were -1 -1 +1 +1, which compares groups 1 and 2
with groups 3 and 4. We then determined two additional contrasts orthogonal to the first contrast.
We chose -1 +1 0 0, which compares groups 1 and 2, and 0 0 -1 +1, which compares groups 3
and 4 (i.e., the psychiatric and parole groups).
ONEWAY Contrasts. Box 5.19 presents a ONEWAY analysis to test these contrasts.
The first contrast comparing groups 1 and 2 (the "normal" instruction conditions) with groups 3
and 4 (the "stigmatized" instruction conditions) is highly significant, t = 4.505, p = .000. The
other contrasts show that groups 1 and 2 (contrast 3, p = .440) and groups 3 and 4 (contrast 2, p =
.113) do not differ significantly. Although the difference between no instruction and job
interview groups (1 versus 2) is clearly not significant, the difference between psychiatric and
parole groups (3 versus 4) approaches significance by a one-tailed test (.113/2 = .057).
The average of the t² (i.e., the Fs) for the contrasts equals the F from the omnibus Anova.
That is, (4.505² + 1.623² + 0.781²) / 3 = 7.846 ≈ 7.8475 in Box 5.19. If we calculated SSs for the
contrasts, their sum would equal SSTreatment = 325.80, as shown in the following MANOVA.
MANOVA contrast analysis for Interview study. Box 5.20 shows the MANOVA
commands and results for the interview study. The coefficients are specified as SPECIAL
contrasts with the 3 sets of contrast coefficients preceded by k = 4 1s. There are also two
DESIGN commands, which means that MANOVA will perform two Anovas. The ability to
perform several analyses in a single procedure can be useful in analyzing data.
The first design requests the default Anova followed by t-tests for the three contrasts of
interest, which are labelled simply as Parameters 2, 3, and 4. The ts and ps correspond to
previous results.
The second MANOVA is more interesting. The /DESIGN inst(1) inst(2) inst(3) asks for
separate Anova results for each of our three contrasts. These appear as lines labelled INST(1),
MANOVA rate BY inst(1 4) /CONTRAST(inst) = SPECIAL(1 1 1 1 -1 -1 +1 +1 -1 +1 0 0 0 0 -1 +1) /DESIGN /DESIGN inst(1) inst(2) inst(3).
* * * * A n a l y s i s   o f   V a r i a n c e -- design 1 * * * *
Tests of Significance for RATE using UNIQUE sums of squares
Source of Variation        SS      DF       MS        F     Sig of F
WITHIN CELLS             498.20    36     13.84
INST                     325.80     3    108.60      7.85     .000
(Model)                  325.80     3    108.60      7.85     .000
(Total)                  824.00    39     21.13

R-Squared = .395      Adjusted R-Squared = .345

Parameter       Coeff.       Std. Err.     t-Value     Sig. t
    2         -10.600000      2.35278      -4.50532    .00007
    3          -1.300000      1.66366       -.78141    .43967
    4          -2.700000      1.66366      -1.62292    .11333
* * * * A n a l y s i s o f V a r i a n c e -- design 2 * * * * *
Tests of Significance for RATE using UNIQUE sums of squares
Source of Variation        SS      DF       MS        F     Sig of F
WITHIN+RESIDUAL          498.20    36     13.84
INST(1)                  280.90     1    280.90     20.30     .000
INST(2)                    8.45     1      8.45       .61     .440
INST(3)                   36.45     1     36.45      2.63     .113

           Parameter       Coeff.       Std. Err.     t-Value     Sig. t
INST(1)        2         -10.600000      2.35278      -4.50532    .00007
INST(2)        3          -1.300000      1.66366       -.78141    .43967
INST(3)        4          -2.700000      1.66366      -1.62292    .11333
Box 5.20. MANOVA Contrasts for Interview Study.
INST(2), and INST(3) in the second Anova summary table. Parameters 2, 3, and 4 are also now
labelled in the t-test section. The reported Fs equal the squared ts; for example, t²INST(1) =
4.50532² = 20.298 = FINST(1). The average of the three Fs = FOmnibus; that is, (20.30 + .61 + 2.63) / 3 = 7.85.
The second Anova also reports the SSs for the three contrasts, which is useful for
showing that SSTreatment has been partitioned into the SSs for the contrasts. Note that SSL1 + SSL2
+ SSL3 = 280.90 + 8.45 + 36.45 = 325.80 = SSTreatment. The SSs for the contrasts can also be used
to calculate contrast-specific η²: η²L1 = 280.90 / 824.0 = .341, η²L2 = 8.45 / 824.0 = .010, and
η²L3 = 36.45 / 824.0 = .044. The sum of these η² for the contrasts will equal the omnibus η², which is
reported as R-Squared in Box 5.20. That is, .341 + .010 + .044 = .395. Given just the t-test
results from ONEWAY or MANOVA, such statistics would have first required calculation of SS
from the contrast values; for example, SSL1 = 10 × (-10.60)² / 4 = 280.90.
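Because only the contrast estimates and MSError are needed, the partitioning of SSTreatment and of
η² can be confirmed with a short sketch (Python, using the estimates reported in Box 5.20; the
variable names are ours):

n, ss_total, ms_error = 10, 824.0, 13.8389
contrasts = {                      # coefficient sets and contrast estimates (L) from Box 5.20
    "L1": ([-1, -1, +1, +1], -10.60),
    "L2": ([-1, +1,  0,  0],  -1.30),
    "L3": ([ 0,  0, -1, +1],  -2.70),
}
ss = {k: n * L**2 / sum(c**2 for c in coefs) for k, (coefs, L) in contrasts.items()}
eta2 = {k: v / ss_total for k, v in ss.items()}
print({k: round(v, 2) for k, v in ss.items()})        # {'L1': 280.9, 'L2': 8.45, 'L3': 36.45}
print(round(sum(ss.values()), 2))                     # 325.8 = SS Treatment
print(round(sum(eta2.values()), 3))                   # .395 = omnibus R-Squared
print(round(sum(v / ms_error for v in ss.values()) / 3, 3))   # about 7.85 = omnibus F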
GLM analyses. It would be possible to perform GLM analyses corresponding to those
reported above, but only using syntax because our contrasts do not correspond to any of the
standard GLM contrasts and the menu version of GLM does not support special contrasts. We
will instead show the GLM menu commands and results shortly, but for the polynomial contrasts
that are standard in GLM.
Polynomial Contrasts for the Interview Study
Although somewhat strained, one might also make a case that the attractiveness ratings
would decrease from no instructions, to job interview, to psychiatric interview, to parole
interview. This prediction would be captured by the linear component of polynomial contrasts,
assuming that the decrease was equivalent at each step. The remaining df would be allocated to
the quadratic and cubic components respectively. Box 5.21 shows calculations for the polynomial
contrasts.
The coefficients in Box 5.21 come from Appendix A.5 for k = 4, the number of levels of
the factor. Note the pattern defined by each of the sets of coefficients. The linear coefficients
represent a constant increase or decrease from Group 1 to Group 4. The quadratic coefficients
represent a curvilinear pattern, either decrease from 1 to 2 followed by increase from 3 to 4, or
the reverse pattern. The cubic coefficients define a more irregular pattern, much like a tilted N or
its inverse. This contrast will account for substantial variability in the means if they follow one of
two patterns. A positive correlation with these coefficients will occur when the means increase
from 1 to 2, decrease from 2 to 3, and increase again from 3 to 4. A negative correlation will
occur if the means decrease from 1 to 2, increase from 2 to 3, and decrease again from 3 to 4.
Either pattern could result in significance for this contrast.
As shown in Box 5.21, virtually all of the systematic variability in the means is due to the
linear pattern; means decrease consistently from group 1 (no instructions) to group 2 (job
interview) to group 3 (psychiatric interview) to group 4 (parole interview). This effect is clearly
significant and accounts for a substantial 38.5% of the total variation in the scores; that is,
η²Linear = 317.52 / 824.0 = .385.

Group                       1        2        3        4
ȳj                        27.30    26.00    22.70    20.00

            Coefficients                L         SS        F       √F = t
Linear      -3   -1   +1   +3        -25.20    317.52    22.940     4.79
Quadratic   +1   -1   -1   +1         -1.40      4.90      .354      .595
Cubic       -1   +3   -3   +1          2.60      3.38      .244      .494

ΣSSL = 325.80 = SSTreatment          Mean F = 7.846

Sample Calculations for Linear Contrast
LLin = -3×27.30 - 1×26.00 + 1×22.70 + 3×20.00 = -25.20
SSLin = (10 × 25.2²) / (-3² + -1² + 1² + 3²) = 317.52
FLinear = 317.52 / 13.8389 = 22.940          FCritical ≈ 4.17 (df = 30, not 36)
SELin = √[13.8389 × (-3²/10 + -1²/10 + 1²/10 + 3²/10)] = 5.261
tLinear = -25.20 / 5.261 = -4.79             tCritical ≈ 2.042 (df = 30, not 36)

Box 5.21. Polynomial Contrasts for the Interview Study.
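The same arithmetic applied to the polynomial coefficients makes the dominance of the linear
component obvious. A small Python sketch, assuming the group means and MSError given above
and the Appendix A.5 coefficients for k = 4:

means, n, ms_error, ss_total = [27.30, 26.00, 22.70, 20.00], 10, 13.8389, 824.0
poly = {
    "Linear":    [-3, -1, +1, +3],
    "Quadratic": [+1, -1, -1, +1],
    "Cubic":     [-1, +3, -3, +1],
}
for name, c in poly.items():
    L = sum(cj * mj for cj, mj in zip(c, means))
    ss = n * L**2 / sum(cj**2 for cj in c)
    print(name, round(L, 2), round(ss, 2), round(ss / ms_error, 2), round(ss / ss_total, 3))
# Linear    -25.2  317.52  22.94  0.385
# Quadratic  -1.4    4.9     .35  0.006
# Cubic       2.6    3.38    .24  0.004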
Polynomial Contrasts for the Interview Study: SPSS Analyses
Researchers might also make a case that the attractiveness ratings would decrease from
no instructions, to job interview, to psychiatric interview, to parole interview. This prediction
would be captured by the linear component of polynomial contrasts, assuming that the decrease
was equivalent at each step. The remaining two contrasts would correspond to the quadratic and
cubic components. There are several ways to request these contrasts in SPSS using either the
coefficients from Appendix A.5, or the keyword POLYNOMIAL. The outputs will vary
somewhat, but these variations will all produce some tests of significance for the Linear,
Quadratic, and Cubic components.
Polynomial contrasts and the MANOVA SINGLE DF command. A representative
analysis using MANOVA is illustrated in Box 5.22. This analysis shows a second method in
MANOVA to obtain SSs and F tests for the contrasts, namely, /PRINT = SIGNIF(SINGLEDF).
This command requests that MANOVA partition every effect with df > 1 into separate effects
with df = 1. What the partitioning is will depend on what contrasts, if any, have been requested.
Since we requested POLYNOMIAL contrasts in Box 5.22, the single df effects will correspond
to the Linear, Quadratic, and Cubic effects.

MANOVA rate BY inst(1 4) /PRINT = SIGN(SINGLEDF) /CONTRAST = POLYNOMIAL.

Source of Variation        SS      DF       MS        F     Sig of F
WITHIN CELLS             498.20    36     13.84
INST                     325.80     3    108.60      7.85     .000
  1ST Parameter          317.52     1    317.52     22.94     .000
  2ND Parameter            4.90     1      4.90       .35     .556
  3RD Parameter            3.38     1      3.38       .24     .624
(Model)                  325.80     3    108.60      7.85     .000
(Total)                  824.00    39     21.13

R-Squared = .395      Adjusted R-Squared = .345

INST
Parameter       Coeff.         Std. Err.     t-Value     Sig. t
    2         -5.6348913       1.17639      -4.78999     .00003
    3          -.70000000      1.17639       -.59504     .55554
    4          .581377674      1.17639        .49421     .62416

Box 5.22. Polynomial Contrasts with MANOVA Using SINGLEDF Option.

Figure 5.5. GLM Menu Commands for Polynomial Contrasts.

If neither the SINGLEDF command nor the /DESIGN inst(1) inst(2) inst(3) command were
included, only the t-tests for Parameters 2 (Linear),
3 (Quadratic), and 4 (Cubic) would be reported. The t and F analyses are equivalent. The
significance values are equivalent, except for rounding. The three Fs are all equal to the
corresponding t², for example, t²Linear = 4.78999² = 22.94 = FLinear.
It is striking how much larger FLinear = 22.94 is than FOmnibus = 7.85; indeed, FLinear is almost
3x larger than the value of FOmnibus. The latter is diluted by averaging the robust Linear effect with
the very modest Quadratic and Cubic components. This watering down is clearly shown by
averaging the three Fs (equivalently, the three t²) to obtain FOmnibus: (22.94 + .35 + .24) / 3 = 7.843
≈ 7.85. It is the single df Linear effect that is entirely responsible for the overall significant effect.
Other observations of note include the fact that SSLinear + SSQuadratic + SSCubic = SSTreatment;
that is, 317.52 + 4.90 + 3.38 = 325.80. These SSs could be used to calculate η² for each
contrast: η²Linear = 317.52 / 824.00 = .385, η²Quadratic = 4.90 / 824.00 = .006, and η²Cubic = 3.38 / 824.00
= .004. The η² for the polynomial contrasts would in turn sum to the η² for the overall treatment
effect: .385 + .006 + .004 = .395 = η²Treatment. These various statistics again show that the linear
effect accounts for virtually all of the differences among the four groups.
Finally, we should note that the coefficients for the various effects are those obtained with
normalized coefficients, rather than the integer values that we used earlier in this chapter. This means,
among other things, that SSContrast = nj × L² / 1. For example, SSLinear = 10 × (-5.6348913)² =
317.52, which is the value reported for the second (i.e., Linear) parameter in the Anova summary
table.
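The relation between the two sets of coefficients can be made concrete: dividing each integer
coefficient by the square root of the sum of the squared coefficients yields normalized coefficients
whose squares sum to 1, so the divisor in the SS formula becomes 1. A brief Python illustration,
assuming the group means and integer linear coefficients used above:

from math import sqrt

means = [27.30, 26.00, 22.70, 20.00]
n = 10
c_int = [-3, -1, +1, +3]                   # integer linear coefficients

norm = sqrt(sum(c**2 for c in c_int))      # sqrt(20)
c_norm = [c / norm for c in c_int]         # normalized coefficients (squares sum to 1)

L_norm = sum(c * m for c, m in zip(c_norm, means))
print(round(L_norm, 5))                    # -5.63489, the MANOVA "Coeff." for the linear term
print(round(n * L_norm**2, 2))             # 317.52 = SS Linear, with no further divisor needed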
GLM and polynomial contrasts for the interview study. Figure 5.5 shows several GLM
screens following the initial menu selection sequence Analyze | General Linear Model |
Univariate, identification of rate as the Dependent Variable and inst as the Fixed Factor(s), and
clicking on Contrasts to activate the Univariate: Contrasts screen shown on top in Figure 5.5.
The Polynomial option has been highlighted and clicking on Change will modify inst(None) to
inst(Polynomial), indicating that polynomial contrasts are requested for this factor. Continue will
return to the Univariate main screen and OK will run the analysis.
Box 5.23 shows the resulting analysis, edited somewhat to eliminate material that is
redundant (e.g., the difference between Contrast Estimate and Hypothesized Value) or material
we have not been discussing (e.g., confidence intervals). The ps for the Linear, Quadratic, and
Cubic effects are equivalent to those reported previously for the MANOVA analysis. Dividing
the Contrast Estimate by its SE would produce the ts reported in Box 5.22 by MANOVA: for
example, tLinear = 5.635 / 1.176 = 4.792. Squaring these values would give the corresponding Fs,
the average of which would be FOmnibus = 7.847.

SSs for the contrasts could be calculated using our formula, but we would need to
remember again that normalized coefficients were used by SPSS; that is, SSLinear =
(10 × (-5.635)²) / 1, rather than (10 × (-5.635)²) / (-3² + -1² + 1² + 3²).
Source               Type III Sum       df    Mean Square        F        Sig.
                      of Squares
Corrected Model        325.800(a)        3      108.600         7.847     .000
Intercept            23040.000           1    23040.000      1664.874     .000
INST                   325.800           3      108.600         7.847     .000
Error                  498.200          36       13.839
Total                23864.000          40
Corrected Total        824.000          39

a  R Squared = .395 (Adjusted R Squared = .345)

Custom Hypothesis Tests
                                               Dependent Variable
INST Polynomial Contrast(a)                           RATE
Linear       Contrast Estimate                       -5.635
             Hypothesized Value                           0
             Std. Error                               1.176
             Sig.                                      .000
Quadratic    Contrast Estimate                        -.700
             Hypothesized Value                           0
             Std. Error                               1.176
             Sig.                                      .556
Cubic        Contrast Estimate                         .581
             Hypothesized Value                           0
             Std. Error                               1.176
             Sig.                                      .624
Box 5.23. GLM Polynomial Contrast Results for Interview Study.
CONCLUSION
This chapter has demonstrated how researchers can perform focussed analyses of single
factor Between-S designs. Specific predictions about differences among the means allow
researchers to partition SSTreatment into (ideally) orthogonal components and perform sensitive tests
of the well-specified hypotheses. Although the topic is a complex one, the methods described
here, specifically the use of contrast coefficients, provide powerful tools by which researchers
can root out patterns that correspond to theoretically or practically important outcomes.
Contrasts also provide a way to think about more complex designs involving multiple variables,
as we will see in future chapters.
Although numerous SPSS analyses have been presented in this chapter, there is much
redundancy among the results. The different procedures (i.e., ONEWAY, MANOVA, GLM),
and different variations of procedures (e.g., default, DESIGN, and SINGLEDF approaches for
MANOVA) result in what can be a confusing array of analyses. This diversity can best be
accommodated by focusing on the basic quantities of interest, which are: L for the contrast,
SSContrast, SE or MSE, the t or F, the p values, and eta2. The ultimate criteria are the p values,
which provide information about the likelihood of the observed outcomes given no effects in the
population, and the eta2s, which indicate the strength of the observed effects. These statistics
allow researchers to draw inferences about the presence or absence of hypothesized patterns in
the data.
Conceptualizing planned contrasts can be a challenge, although the basic principle is that
of a correlation between hypothesized and observed patterns in the data. This can be seen most
clearly in regression analyses that provide results equivalent to the standard ANOVA approaches
to contrasts. This is the topic addressed in the next chapter. Essentially, we will see that SSContrast
= r² × SSTotal, where r is the correlation between the indicator variable (i.e., the contrast
coefficients) and the criterion variable.
APPENDIX 5.1: CONTRASTS AND DIFFERENCES BETWEEN MEANS
In addition to the standard approach using contrast coefficients, some SSs for contrasts
(but not all) can be viewed as differences between two means averaged across several of the
groups in the study. Here we illustrate this feature for several contrasts from the interview study.
One of our contrasts (-1 -1 +1 +1) compared groups 1 and 2 (the control groups) to
groups 3 and 4 (the stigmatized groups). Box 5.24 shows that SS for this contrast can be
calculated using the deviation of two means from the grand mean, in a manner equivalent to how
we calculate SS for the omnibus effect. One mean is the average of the means for groups 1 and
2, and the other mean is the average of the means for groups 3 and 4.
A contrast comparing groups 1, 2, and 3 to group 4 (-1 -1 -1 3) could similarly be
conceptualized as the difference between the mean averaged across the first three groups versus
the mean for the fourth group. Using the standard approach, SSContrast = 10 × (-1×27.3 + -1×26.0
+ -1×22.7 + 3×20.0)² / 12 = 10 × (-16.0)² / 12 = 213.333. Using the two-means approach, SSContrast
= 30 × (25.3333 - 24.00)² + 10 × (20.00 - 24.00)² = 30 × 1.3333² + 10 × (-4.00)² = 213.331.
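Both routes to the SS for this contrast can be checked numerically. A short Python sketch, using
the interview-study means and n = 10 per group:

means, n = [27.30, 26.00, 22.70, 20.00], 10
c = [-1, -1, -1, 3]

# Standard coefficient formula
L = sum(cj * mj for cj, mj in zip(c, means))                  # -16.0
ss_coef = n * L**2 / sum(cj**2 for cj in c)                   # 213.333

# Two-means formula: weighted squared deviations of the sub-means from the grand mean
grand = sum(means) / 4                                        # 24.0
m123 = sum(means[:3]) / 3                                     # 25.3333
m4 = means[3]                                                 # 20.0
ss_means = 3 * n * (m123 - grand)**2 + n * (m4 - grand)**2    # 213.333

print(round(ss_coef, 3), round(ss_means, 3))                  # both about 213.33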
Not all contrasts can be computed in this way; specifically, some coefficients describe
patterns or relationships that cannot easily be conceptualized in terms of differences between
means. The linear effect for the interview study (-3 -1 1 3), for example, does not equal any
comparison between two means.
Group                       1        2        3        4
ȳj                        27.30    26.00    22.70    20.00

         Coefficients                  L        SS        F       √F = t
L1      -1   -1   +1   +1           -10.60    280.90    20.298    4.505

L1 = -1×27.30 - 1×26.00 + 1×22.70 + 1×20.00 = -10.6
SSL1 = (10 × 10.6²) / (-1² + -1² + 1² + 1²) = 280.9

                 M1&2      M3&4      MGrand
                 26.65     21.35     24.00
Mj - MGrand      +2.65     -2.65

SSL1 = 20 × (2.65² + (-2.65)²) = 280.9

Box 5.24. Alternative calculation of SS for some contrasts.
APPENDIX 5.2: GLM AND SSs FOR PLANNED COMPARISONS
One of the positive features of Manova is that it can provide SSs for each of the contrasts
using either the /DESIGN contr(1) contr(2) ... option or the /PRINT = SIGNIF(SINGLEDF)
option. Box 5.25 demonstrates the latter method using contrasts examined previously for the
interview study. The SSs reported by Manova can be used to demonstrate the partitioning of
SSTreatment and to compute R2s for each contrast (i.e., eta2).
GLM can also provide SS for each contrast, although it is necessary to modify the way in
which contrasts are requested. Box 5.26 shows one of the modifications that can be used; in
essence, k - 1 CONTRAST options are specified, one for each contrast. Following the standard
statistics, GLM produces an anova summary table for each contrast, including SS for the
contrast. The same output shown in Box 5.26 could be produced by: GLM rate BY inst
/LMATRIX = inst -1 -1 1 1 /LMATRIX = inst -1 1 0 0 /LMATRIX = inst 0 0 -1 1. Output for
the first contrast is shown in Box 5.27.
MANOVA rate BY inst(1 4) /PRINT = SIGNIF(SINGLE) /CONTR(inst) = SPEC(1 1 1 1 -1 -1 1 1 -1 1 0 0 0 0 -1 1).
Source of Variation        SS      DF       MS        F     Sig of F
WITHIN CELLS             498.20    36     13.84
inst                     325.80     3    108.60      7.85     .000
  1ST Parameter          280.90     1    280.90     20.30     .000
  2ND Parameter            8.45     1      8.45       .61     .440
  3RD Parameter           36.45     1     36.45      2.63     .113
(Model)                  325.80     3    108.60      7.85     .000
(Total)                  824.00    39     21.13

R-Squared = .395      Adjusted R-Squared = .345

Estimates for rate --- Individual univariate .9500 confidence intervals
Parameter       Coeff.       Std. Err.    t-Value     Sig. t     Lower -95%   CL- Upper
    2         -10.600000      2.35278     -4.50532    .00007     -15.37165     -5.82835
    3          -1.3000000     1.66366      -.78141    .43967      -4.67407      2.07407
    4          -2.7000000     1.66366     -1.62292    .11333      -6.07407       .67407
Box 5.25. Using MANOVA to obtain SSs for contrasts.
GLM rate BY inst /CONTR(inst) = SPEC(-1 -1 1 1) /CONTR(inst) = SPEC(-1 1 0 0) /CONTR(inst) = SPEC(0 0 -1 1).
Custom Hypothesis Tests #1
inst Special Contrast               Dependent Variable
L1     Contrast Estimate                 -10.600
       Std. Error                          2.353
       Sig.                                 .000

Source       Sum of Squares    df    Mean Square       F       Sig.
Contrast         280.900        1      280.900       20.298    .000
Error            498.200       36       13.839

Custom Hypothesis Tests #2
inst Special Contrast               Dependent Variable
L1     Contrast Estimate                  -1.300
       Std. Error                          1.664
       Sig.                                 .440

Source       Sum of Squares    df    Mean Square       F       Sig.
Contrast           8.450        1        8.450         .611    .440
Error            498.200       36       13.839

Custom Hypothesis Tests #3
inst Special Contrast               Dependent Variable
L1     Contrast Estimate                  -2.700
       Std. Error                          1.664
       Sig.                                 .113

Source       Sum of Squares    df    Mean Square       F       Sig.
Contrast          36.450        1       36.450        2.634    .113
Error            498.200       36       13.839
Box 5.26. SSs in GLM using repeated /CONTRAST commands.
GLM rate BY inst /LMATRIX = inst -1 -1 1 1 /LMATRIX = inst -1 1 0 0 /LMATRIX = inst 0 0 -1 1.
...
Custom Hypothesis Tests #1
Contrast                            Dependent Variable
L1     Contrast Estimate                 -10.600
       Std. Error                          2.353
       Sig.                                 .000

Source       Sum of Squares    df    Mean Square       F       Sig.
Contrast         280.900        1      280.900       20.298    .000
Error            498.200       36       13.839
...
Box 5.27. SSs for Contrasts using GLM and /LMATRIX (partial output).
CHAPTER 06:
CORRELATION AND REGRESSION ANALYSIS
FOR SINGLE FACTOR BETWEEN-S ANOVA
Correlation and Regression for Two Groups . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
Contrast Analyses by Multiple Regression . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
Recode and Regression using Menus . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . . . 10
Regression and Polynomial Contrasts for the Stimulation Study . . . . . . . . . . . . . . . . . . 12
Multiple Regression Analyses of Interview Data . . . . .. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
Omnibus ANOVA Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . 14
Indicator Variable Results . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . . . . 15
Indicator Variables and Regression Analysis using Menus . . . . . . . . . . . . . . . . . . . . . . . 17
Polynomial Regression for the Interview Study . . . . . .. . . . . . . . . . . . . . . . . . . . . . . . . 19
Nonorthogonal Predictors . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . . 21
Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . . . . . . 24
Appendix 6.1: Graphing Contrasts and Means . . . . . . . . .. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
Figure 6.1. Menu Sequence to Initiate Regression Analysis.
ANOVA for the comparison of group means is a special case of regression analysis
(which is sometimes known as the General Linear Model or GLM, hence the name for SPSS’s
GLM program). This also means that an alternative way to perform ANOVA is using regression
procedures in SPSS or some other statistical package. In this chapter we demonstrate how to
conduct a number of the preceding analyses, the omnibus F and follow-up tests, using multiple
regression and correlation. The approach is especially straightforward for the two-group design
(i.e., k = 2).
CORRELATION AND REGRESSION FOR TWO GROUPS
In the case of two groups, ANOVA (and the equivalent t-test) can be performed with a
single predictor that has two values. The two values for X indicate to which of the two groups
the score (Y) belongs. The analysis of two groups is a simple regression analysis with a single
predictor. Consider the two-group agitation design that we have analyzed in previous chapters.
Figure 6.1 shows the Windows menu sequence to access regression via Analyze | Regression |
Linear. This brings up the dialogue box shown in Figure 6.2. As in other analyses, we select the
independent and dependent variable, and move them into their respective boxes. We then select
from the various options that are available. The Statistics button in Figure 6.2 allows users to
select various statistics, including descriptive statistics (i.e., mean, standard deviation,
correlation).
Figure 6.2. Specifying the Regression Analysis.
Box 6.1 shows edited output of the regression analysis requested in Figures 6.1 and 6.2.
Considering first the descriptive statistics, the mean agitation score (11.0) is equal to the grand
mean calculated earlier. The SD of the 12 agitation scores about this mean can be used to
calculate SSTotal: (12 - 1) × 2.08893² = 48.00. The mean for the group variable (X) is simply the
average of 6 ones and 6 twos (i.e., 1.50).
After the correlation matrix (discussed shortly), Box 6.1 reports the results of the
regression analysis, including an ANOVA summary table and statistics related to the slope of the
best-fit regression line. The best-fit regression line is y’ = 14.0 - 2.0 × Group. The slope or
regression coefficient (i.e., -2.0) is the difference between the two means, and the test of the
significance of the slope produces the same results as the independent groups t-test of the
Descriptive Statistics
              Mean          Std. Deviation      N
AGIT        11.000000         2.0889319        12
GROUP        1.500000          .5222330        12

                                 AGIT     GROUP
Pearson Correlation    AGIT     1.000     -.500
                       GROUP    -.500     1.000
Sig. (1-tailed)        AGIT       .        .049
                       GROUP     .049       .

Model              Sum of Squares    df    Mean Square      F      Sig.
1   Regression         12.000         1      12.000        3.333   .098
    Residual           36.000        10       3.600
    Total              48.000        11

Coefficients
                   Unstand. Coeff.         Stand. Coeff.
Model                B       Std. Error        Beta           t        Sig.
1   (Constant)     14.000      1.732                         8.083     .000
    GROUP          -2.000      1.095          -.500         -1.826     .098
Box 6.1. Results for Windows Regression Analysis.
t = (r - 0) / √[(1 - r²) / (n - 2)] = (-.5 - 0) / √[(1 - (-.5)²) / (12 - 2)] = -.5 / .27386 = -1.826
difference between means, that is, tSlope = -1.826, p = .098. Furthermore, in the ANOVA
summary table, SSRegression = 12.0 = SSTreatment and SSResidual = 36.0 = SSError. The dfs, MSs, F, and p
value are also identical. We will shortly see somewhat more clearly why these many
equivalencies occur.
Returning to the correlation matrix, note that the simple correlation between Group and
Agit is -.50, hence, r² = (-.5)² = .25, which was the value that we had obtained earlier for η². Group
accounts for 25% of the 48.00 units of variability in Agitation scores, which amounts to .25 ×
48.00 = 12.0 units, SSTreatment and SSRegression. We could also calculate a t or F statistic for the
correlation coefficient. For the t-test,
This t is again equivalent to the value calculated earlier for the difference between means
(or equivalently, for the slope). In essence, the significance of the difference between the group
means is equivalent to the significance of the correlation between a variable that represents the
groups and the dependent variable. Note as well, for example, that the one-tailed p value reported
for r in the correlation matrix is equal to .049, which is exactly half of the two-tailed value
reported for F and for the regression coefficient.
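The t for the correlation follows directly from r and n; a two-line Python check of the value just
computed:

from math import sqrt

r, n = -.50, 12
t = (r - 0) / sqrt((1 - r**2) / (n - 2))
print(round(t, 3))    # -1.826, the same t as for the slope and the difference between means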
RECODE group (1=-1) (2=1) INTO effect.
REGRESSION /VARIABLES = agit effect /DEPENDENT = agit /ENTER /SAVE PRED(prd) RESID(res).

Multiple R   .50000      R Square   .25000

                 DF    Sum of Squares    Mean Square
Regression        1       12.00000        12.00000
Residual         10       36.00000         3.60000

F = 3.33333        Signif F = .0979

Variable           B          SE B         Beta          T       Sig T
EFFECT         -1.000000     .547723     -.500000      -1.826    .0979
(Constant)     11.000000     .547723                   20.083    .0000

Residuals Statistics:
            Min        Max        Mean      Std Dev     N
*PRED     10.0000    12.0000    11.0000     1.0445     12
*RESID    -3.0000     2.0000      .0000     1.8091     12

LIST
GROUP    AGIT    EFFECT      PRD         RES
 1.00   11.00     -1.00   12.00000   -1.00000
 1.00   14.00     -1.00   12.00000    2.00000
 1.00   12.00     -1.00   12.00000     .00000
 1.00    9.00     -1.00   12.00000   -3.00000
 1.00   13.00     -1.00   12.00000    1.00000
 1.00   13.00     -1.00   12.00000    1.00000
 2.00    7.00      1.00   10.00000   -3.00000
 2.00   12.00      1.00   10.00000    2.00000
 2.00   11.00      1.00   10.00000    1.00000
 2.00    9.00      1.00   10.00000   -1.00000
 2.00   12.00      1.00   10.00000    2.00000
 2.00    9.00      1.00   10.00000   -1.00000
Box 6.2. SPSS Regression Analysis for Two-Group ANOVA
Box 6.2 shows the syntax commands to perform the regression analysis, as well as edited
Unix output for the analysis. Before the REGRESSION command, however, Box 6.2 includes a
command to modify somewhat the predictor used in the regression command. Specifically, the
RECODE command creates a new predictor called effect, which has values of -1 for group one
and +1 for group two, rather than our original values of 1 and 2. The general format of the recode
command is: RECODE old-name (old-value = new-value) ... INTO new-name. The new values
can be seen in the column for the new variable Effect at the bottom of Box 6.2. A standard
REGRESSION statement is then used, and predicted and residual scores are saved. These
predicted and residual scores are informative about the parallels between the ANOVA and
REGRESSION analyses.
Output for this second regression is identical in many respects to the earlier regression in
Box 6.1, despite the different values for the predictor variable. Note in particular that FRegression =
FANOVA, and that tSlope = tr = tȳ1-ȳ2. Moreover, SSRegression = SSTreatment, SSResidual = SSError, and R² = η².
The dfs also correspond. The only change between Box 6.1 and Box 6.2 (other than format) is
with respect to the actual prediction equation. The best-fit equation in Box 6.2 is: y' = 11.0 - 1.0
× Effect. The intercept, b0, is equal to ȳG, and the slope, b1, is equal to the deviation of the group
mean from the grand mean (i.e., ȳj - ȳG). But this change in the equation has no consequences
for the inferential statistical analyses, although it does make clearer perhaps the intimate
relationship between the regression analysis and ANOVA.
The reason for the equivalence of the regression and ANOVA analyses becomes even
clearer yet when we examine the predicted (PRD) and residual (RES) scores saved from the
regression analysis and listed at the bottom of Box 6.2. Note that all subjects in Group 1 have ȳ1
= 12.0 for their predicted value, ŷ1 = 11.0 + (-1.0 × -1) = 12.0, and all subjects in Group 2 have ȳ2
= 10.0 for their predicted value, ŷ2 = 11.0 + (-1.0 × +1) = 10.0. The descriptive statistics for the
predicted and residual scores show that SSPredicted = (12 - 1) × 1.0445² = 12.0 = SSTreatment, and that
SSResidual = (12 - 1) × 1.8091² = 36.0 = SSError. The secret to performing ANOVA via regression is to
generate one or more predictors in such a way as to ensure that the predicted scores equal the
group means, and the dfs for SSRegression and SSResidual = k - 1 and N - k (i.e., the dfs for SSTreatment
and SSResidual in ANOVA). If these correspondences occur, then necessarily the residual scores
will equal the deviations of observations from the group means, and MSRegression = MSTreatment,
MSResidual = MSError, and FRegression = FANOVA. As we have seen, this is easily accomplished when
there are only two groups. Although not shown here, it would be easy to demonstrate that the
predicted values in Box 6.1 would also be the two group means.
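This equivalence can also be reproduced outside SPSS. A minimal Python sketch, assuming the
agitation scores listed in Box 6.2 and effect codes of -1 and +1, fits the simple regression by hand
and recovers the intercept (11.0), slope (-1.0), SSRegression = 12, SSResidual = 36, and the group means
as predicted values:

agit  = [11, 14, 12, 9, 13, 13,  7, 12, 11, 9, 12, 9]   # scores from Box 6.2
group = [1] * 6 + [2] * 6
x = [-1 if g == 1 else +1 for g in group]                # effect coding

n = len(agit)
mx, my = sum(x) / n, sum(agit) / n
slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, agit)) / sum((xi - mx)**2 for xi in x)
intercept = my - slope * mx
pred = [intercept + slope * xi for xi in x]

ss_reg = sum((p - my)**2 for p in pred)                  # 12.0 = SS Treatment
ss_res = sum((yi - p)**2 for yi, p in zip(agit, pred))   # 36.0 = SS Error
print(intercept, slope)                                  # 11.0, -1.0
print(ss_reg, ss_res)
print(sorted(set(pred)))                                 # [10.0, 12.0], the two group means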
Categorical predictors such as Group in Box 6.1 and Effect in Box 6.2 are generically
referred to as indicator variables. They essentially code or indicate levels of the categorical
factor, rather than amounts of some variable. There are various types of indicator variables.
Indicator variables that sum to zero (such as Effect in Box 6.2) are called Effect codes, hence the
label for that variable. Another kind of indicator variable is called Dummy coding, with one
group coded 0 and another group coded 1. If a regression analysis of the present data were
conducted with Dummy coding, virtually all aspects of the analysis would be identical to those
already shown in Boxes 6.1 and 6.2, except that the regression equation would be again changed.
Specifically, the intercept, b0, would equal ȳ for whichever group was coded 0, and the slope, b1,
would equal the difference between ȳ1 and ȳ2.
We also saw that both the t and F tests are equivalent to correlation and regression tests
of the significance of group as a predictor of agitation scores. The continuity among these
seemingly different analyses occurs because the various tests are in fact variations of the General
Linear Model. In the remainder of this chapter, the ANOVA and regression approaches are
generalized to Between-S designs that include more than two groups.
CONTRAST ANALYSES BY MULTIPLE REGRESSION
We have just seen that for k = 2, the single predictor regression analysis was equivalent to
the omnibus F test, which you may recall we also showed was equivalent to a contrast analysis
for two groups. That is, the single possible contrast for two groups (1 2, -1 +1, 0 1, or other
numerical equivalents) reproduces the omnibus F (or its t-test equivalent). Regression can be
used to perform all of the contrasts that we described earlier. Recall also that we need dfRegression =
k - 1 to reproduce ANOVA using regression. That is, we will need p = k - 1 predictors, and these
k - 1 predictors can (but need not) correspond to planned contrasts for the particular design. The
k - 1 orthogonal indicator variables (i.e., contrasts) would partition SSRegression (i.e., SSTreatment) into
mutually exclusive and exhaustive components (i.e., components where ΣSSL = SSTreatment). Let
us now examine more closely contrasts for k > 2 designs.
In essence, contrast analyses can be performed using regression by creating indicator
variables that correspond to the desired contrast coefficients. In Box 6.3, for example, to compare
the first two of three groups, one indicator variable (I1) would be -1 for all observations in group
1, 1 for all observations in group 2, and 0 for all observations in group 3. To compare the first
two groups with the third group, one indicator variable should be -1 for group 1, -1 for group 2,
and +2 for group 3. The multiple regression tests of the significance of the regression coefficients
will correspond to the contrast analyses covered earlier. Note that these contrasts are orthogonal.
Note the general parallel between regression and contrast analyses. We previously noted
that k - 1 indicator variables were necessary to perform Anova, with the dependent variable being
regressed on the k - 1 indicator variables. This agrees with our general guideline of k - 1
orthogonal contrasts.

Group   Subject    I1    I2
1          1       -1    -1
1          2       -1    -1
...
2          1       +1    -1
2          2       +1    -1
...
3          1        0    +2
3          2        0    +2
...

Box 6.3. Indicator Variables for k = 3.

RECODE group (1= -1) (2= +1) (3= 0) INTO con1 /group (1= -1) (2= -1) (3= +2) INTO con2.
REGRESS /VAR= press con1 con2 /DESC /STAT= DEFAU ZPP /DEP= press /ENTER /SAVE PRED(pred) RESID(res).

          Mean     Std Dev        Correlation:    PRESS     CON1     CON2
PRESS     5.333     2.500         PRESS           1.000     -.346     .800
CON1       .000      .866         CON1            -.346     1.000     .000
CON2       .000     1.500         CON2             .800      .000    1.000

Multiple R   .87178      R Square   .76000

                 DF    Sum of Squares    Mean Square
Regression        2       38.00000        19.00000
Residual          6       12.00000         2.00000

F = 9.50000        Signif F = .0138

Variable        B       SE B      Beta     Corr     PartCor    Partial       T       Sig T
CON2          1.333    .3333      .800     .800     .80000     .852803     4.000     .0071
CON1         -1.000    .5773     -.346    -.346    -.34641    -.577350    -1.732     .1340
(Const)       5.333    .4714                                              11.314     .0000

LIST
GROUP   PRESS    CON1    CON2      PRED         RES
 1.00    4.00   -1.00   -1.00    5.00000    -1.00000
 1.00    5.00   -1.00   -1.00    5.00000      .00000
 1.00    6.00   -1.00   -1.00    5.00000     1.00000
 2.00    3.00    1.00   -1.00    3.00000      .00000
 2.00    2.00    1.00   -1.00    3.00000    -1.00000
 2.00    4.00    1.00   -1.00    3.00000     1.00000
 3.00   10.00     .00    2.00    8.00000     2.00000
 3.00    8.00     .00    2.00    8.00000      .00000
 3.00    6.00     .00    2.00    8.00000    -2.00000

Box 6.4. Contrast analyses by regression.
SPSS Regression and Contrasts for Brain Stimulation Study
The regression analysis in Box 6.4 reproduces the omnibus ANOVA and contrast
analyses for the bar-press study. The RECODE command creates the indicator variables con1
and con2. Con1 compares the two control groups (NS and A), and con2 compares the treatment
group to the average of the two control groups. These new variables can be seen at the bottom of
Box 6.4. Con1 contains -1 for the three cases in group 1, +1 for the three cases in group 2, and 0
for the three cases in group 3. This contrasts group 1 to group 2. The second contrast contains -1
for the six cases in groups 1 and 2, and +2 for the three cases in group 3. This predictor contrasts
group 3 to groups 1 and 2. We want to know whether these new variables correlate significantly
with the data (i.e., with the dependent variable PRESS).
The ANOVA summary table provides omnibus results that agree exactly with earlier
calculations and SPSS analyses. The reason for this close agreement is that the SSRegression reflects
the variation in predicted scores and, given the indicator variables, the predicted scores turn out
to be the group means, as shown in the PRED column at the bottom of Box 6.4; hence, SSRegression
= SSTreatment. SSResidual likewise represents variation about predicted scores and hence corresponds
to SSError or SSWithin in the between-S ANOVA. Because p = k - 1 and N - p - 1 = N - k, the MSs
and F are also equivalent.
In addition, Box 6.4 reports the significance of the regression coefficients for con1 and
con2. Observe that the ts and associated significance for the coefficients agree with the earlier
contrast analyses performed by hand and using the various SPSS Anova commands. The
conclusion is again that the treatment group differs significantly from the two control groups
(con2), which do not differ significantly from one another (con1). The slopes and ts could be
used to calculate SSs and Fs reported previously in several analyses, and also to demonstrate that
the average of the Fs for the contrasts is equal to FOmnibus and the sum of the SSs for the contrasts
is equal to SSTreatment.
In addition to these now-familiar relationships, the correlation analyses provide several
novel and instructive observations. The simple correlations between the dependent variable and
the orthogonal contrasts are shown in Box 6.4, -.346 for contrast one and .800 for contrast two.
Squaring these correlations gives η² for the contrasts; for example, .8² = .64 = η² = 32.0/50.0.
This also means that the correlations can be used to obtain SSs for each contrast: .34641² × 50.0
= 6.0 = SSL1 and .8² × 50.0 = 32.0 = SSL2. The sum of the r² for the contrasts equals R², which in
turn equals η² for the omnibus analysis: that is, .34641² + .80000² = .76 = 38.0 / 50.0. And as
noted previously, the sum of the SSs for the contrasts gives SSTreatment = 38.0 = 6.0 + 32.0. These
various relationships demonstrate that a contrast indeed represents the correlation between a
pattern of coefficients (i.e., CON1 and CON2 in Box 6.4) and the dependent variable (i.e.,
PRESS in Box 6.4).

Figure 6.3. Recoding Variables in SPSS for Windows.
It is also interesting to note why the simple correlations can be used in the above way,
rather than requiring part correlations that control for other predictors. We noted previously that
our two contrasts are orthogonal. The correlation matrix in Box 6.4 shows explicitly that CON1
and CON2 are indeed orthogonal or uncorrelated; that is, their correlation with one another is 0.
This occurs because the sum of the cross products of the contrast coefficients is 0 (i.e., -1 × -1 +
1 × -1 + 0 × 2 = 0), a requirement for orthogonal contrasts. Because of this orthogonality, the
simple correlations and part correlations are identical for the two predictors (see the Corr and
PartCor columns of Box 6.4). Predictors that are already independent of one another are not
changed by multiple regression methods for statistically creating uncorrelated predictors.
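These relations can be verified directly from the raw scores. A small Python sketch, assuming the
press scores and indicator codes listed in Box 6.4, computes the two simple correlations and
recovers the contrast SSs, R², and the zero correlation between the contrasts:

from math import sqrt

press = [4, 5, 6, 3, 2, 4, 10, 8, 6]            # bar-press scores from Box 6.4
con1  = [-1, -1, -1,  1,  1,  1, 0, 0, 0]       # group 1 vs group 2
con2  = [-1, -1, -1, -1, -1, -1, 2, 2, 2]       # groups 1 & 2 vs group 3

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx)**2 for a in x)
    syy = sum((b - my)**2 for b in y)
    return sxy / sqrt(sxx * syy)

grand = sum(press) / len(press)
ss_total = sum((p - grand)**2 for p in press)                     # 50.0
r1, r2 = pearson(con1, press), pearson(con2, press)               # -.346 and .800
print(round(r1, 3), round(r2, 3))
print(round(r1**2 * ss_total, 1), round(r2**2 * ss_total, 1))     # 6.0 and 32.0, the contrast SSs
print(round(r1**2 + r2**2, 2))                                    # .76 = R-squared
print(pearson(con1, con2))                                        # 0.0: the contrasts are orthogonal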
Recode and Regression using Menus
To create the indicator variables using Menus, activate the appropriate dialog box by
selecting Transform | Recode | Into Different Variables. In the Recode into Different Variables
window, select the Numeric Variable to transform, move it into the appropriate box, enter a new
name in the Name box, and assign it to the variable by clicking on Change (see Recode into ...
screen in Figure 6.3).
Then select the Old and New Values button to activate its window, which appears on top
in Figure 6.3. Enter the Old Value and the New Value into the appropriate boxes and Add them
to the transformation list. In Figure 6.3, the first two values for CON2 have already been added
and the entries are made for the final recode (i.e., group 3 to +2 in CON2). Click on Continue
and then OK to invoke the transformations. New variables and values will appear in the data
window (shown in Figure 6.4).
Figure 6.4. Recoded Variables in Data Window and Regression Specifications for Contrasts.
Figure 6.4 also shows the entries for the Linear Regression window, which would be
accessed by Analyze | Regression | Linear. Press has been moved into the Dependent box and the
two indicator variables, CON1 and CON2, have been moved into the Independent(s) box. Other
options could be selected as desired; for example, the Statistics button would access various
optional statistics, such as descriptive statistics, and simple and part correlations. Click OK to run
the analysis.
Box 6.5 shows the output. Note in particular that the ts and significance levels for CON1
and CON2 duplicate previous analyses, and that the Zero-order and Part correlations also
replicate previous results.
Model Summary
Model       R       R Square    Adjusted R Square    Std. Error of the Estimate
1         .872(a)     .760            .680                    1.41421

Model              Sum of Squares    df    Mean Square      F      Sig.
1   Regression         38.000         2      19.000        9.500   .014
    Residual           12.000         6       2.000
    Total              50.000         8

Coefficients(a)
                 Unstandardized     Standardized                            Correlations
                  Coefficients      Coefficients
Model               B     Std. Error     Beta         t      Sig.    Zero-order   Partial    Part
1   (Constant)    5.333     .471                    11.314   .000
    CON1         -1.000     .577         -.346      -1.732   .134       -.346      -.577    -.346
    CON2          1.333     .333          .800       4.000   .007        .800       .853     .800
Box 6.5. Results of Windows Regression Analysis.
RECODE group (1= -1) (2= 0) (3= 1) INTO linr /group (1= -1) (2= 2) (3= -1) INTO quad.
REGRESS /VAR = press linr quad /DEP = press /ENTER /SAVE PRED(prd) RESID(res).
                 DF    Sum of Squares    Mean Square
Regression        2       38.00000        19.00000
Residual          6       12.00000         2.00000

F = 9.50000        Signif F = .0138

Variable           B           SE B          Beta           T        Sig T
QUAD           -1.166667     .333333      -.700000       -3.500      .0128
LINR            1.500000     .577350       .519615        2.598      .0408
(Constant)      5.333333     .471405                     11.314      .0000

LIST GROUP PRESS LINR QUAD PRD RES
GROUP   PRESS    LINR    QUAD      PRD          RES
 1.00    4.00   -1.00   -1.00    5.00000    -1.00000
 1.00    5.00   -1.00   -1.00    5.00000      .00000
 1.00    6.00   -1.00   -1.00    5.00000     1.00000
 2.00    3.00     .00    2.00    3.00000      .00000
 2.00    2.00     .00    2.00    3.00000    -1.00000
 2.00    4.00     .00    2.00    3.00000     1.00000
 3.00   10.00    1.00   -1.00    8.00000     2.00000
 3.00    8.00    1.00   -1.00    8.00000      .00000
 3.00    6.00    1.00   -1.00    8.00000    -2.00000
Box 6.6. Polynomial Contrasts Using SPSS RECODE and REGRESSION.
Regression and Polynomial Contrasts for the Stimulation Study
A second type of contrast tests linear and nonlinear effects across treatment conditions.
Box 6.6 shows polynomial contrasts (or trend analysis) using the REGRESSION procedure.
Linear and quadratic indicator variables are created using RECODE, and the regression
coefficients for these predictors test the significance of the linear and quadratic sources of
variability. Note that the ts for the regression coefficients are equal to the ts for the contrasts in
earlier analyses, and that both ts squared reproduce the Fs for the linear and quadratic terms in
earlier ANOVA summary tables.
The analyses indicate that the linear and quadratic components are both significant.
That is, there is evidence of significant trends in the data for scores to increase with group level
(i.e., from 1 to 2 to 3) and for scores to decrease and then increase. The group means (5.0, 3.0,
and 8.0) are a composite of these two components.
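That composite can be displayed explicitly: with equal ns and orthogonal contrasts, each group
mean equals the grand mean plus a weighted linear piece plus a weighted quadratic piece. A brief
Python sketch using the bar-press means and the polynomial coefficients from Box 6.6:

means = [5.0, 3.0, 8.0]                       # bar-press group means
grand = sum(means) / len(means)               # 5.333
c_lin, c_quad = [-1, 0, 1], [-1, 2, -1]       # polynomial coefficients for k = 3

L_lin  = sum(c * m for c, m in zip(c_lin, means))    #  3.0
L_quad = sum(c * m for c, m in zip(c_quad, means))   # -7.0

rebuilt = [grand
           + cl * L_lin  / sum(c**2 for c in c_lin)
           + cq * L_quad / sum(c**2 for c in c_quad)
           for cl, cq in zip(c_lin, c_quad)]
print([round(m, 3) for m in rebuilt])         # [5.0, 3.0, 8.0]: the original means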
The linear component is the contrast that is most common and often the most meaningful
in psychology. Examination of linear trends can lead to dramatic differences in the conclusions
relative to an omnibus F test. For example, if behavior in a classroom consistently improves
across teachers using 5 different amounts of praise, the omnibus F might be nonsignificant
because SSTreatment is divided by k - 1 = 4, whereas the linear contrast could be highly significant
because SSLinear is divided by p = 1, for the single contrast. Analogous differences between the
omnibus F and other specific contrasts have been mentioned, but these differences can be
especially large using polynomial contrasts. In general, traditional ANOVA is a very insensitive
way to analyze the results of studies involving numerical predictors. Use either polynomial
contrasts to supplement the ANOVA or the equivalent regression methods, which are always
appropriate for numerical predictors.
MULTIPLE REGRESSION ANALYSES OF INTERVIEW DATA
To provide another example of contrast analysis using regression, consider the Control
groups versus Stigmatized groups contrasts for the interview study. We had three sets of contrast
coefficients: -1 -1 +1 +1, -1 +1 0 0, 0 0 -1 +1. For the corresponding regression analysis, we will
need three predictors, one corresponding to each set of contrast coefficients. These indicator
variables are shown in Box 6.7 for the first two subjects in each group. The predictor I1
represents the contrast between groups 1 and 2 versus groups 3 and 4. The t or F for I1 will
correspond to the t or F for this contrast in the Anova analyses. Moreover, r² between I1 and
the dependent variable will correspond to eta² for this contrast. Similar effects will be
observed for I2, corresponding to the contrast between Group 1 and Group 2, and for I3,
corresponding to the contrast between Group 3 and Group 4. The regression analysis will also
demonstrate the orthogonal nature of these contrasts. That is, all three correlations between pairs
of predictors (i.e., I1 with I2, I1 with I3, and I2 with I3) will be exactly 0.
Group   Subject    I1    I2    I3
1          1       -1    -1     0
1          2       -1    -1     0
...
2          1       -1    +1     0
2          2       -1    +1     0
...
3          1       +1     0    -1
3          2       +1     0    -1
...
4          1       +1     0    +1
4          2       +1     0    +1
...
Box 6.7. Indicator Variables for Interview Study.
RECODE inst (1 = -1) (2 = -1) (3 = +1) (4 = +1) INTO o1.
RECODE inst (1 = -1) (2 = +1) (3 = 0) (4 = 0) INTO o2.
RECODE inst (1 = 0) (2 = 0) (3 = -1) (4 = +1) INTO o3.
REGRESSION /DESC = CORR /DEP = rate /ENTER o1 o2 o3 /SAVE PRED(prd) RESID(res).

Correlation:      RATE       O1        O2        O3
O1               -.584
O2               -.101      .000
O3               -.210      .000      .000

Multiple R   .62880      R Square   .39539

                 DF    Sum of Squares    Mean Square
Regression        3      325.80000       108.60000
Residual         36      498.20000        13.83889

F = 7.84745        Signif F = .0004

Variable           B           SE B          Beta           T        Sig T
O3             -1.350000     .831832      -.210322       -1.623      .1133
O2              -.650000     .831832      -.101266        -.781      .4397
O1             -2.650000     .588194      -.583865       -4.505      .0001
(Constant)     24.000000     .588194                     40.803      .0000

Residuals Statistics:
            Min         Max         Mean      Std Dev     N
*PRED     20.0000     27.3000     24.0000      2.8903    40
*RESID    -6.0000      7.0000       .0000      3.5741    40
Box 6.8. MR Analysis of Interview Data.
Box 6.8 shows SPSS commands that can be used to perform the multiple regression
equivalent of the independent groups ANOVA. The RECODE statements create three indicator
variables (k - 1 = 3) that uniquely code each of the groups. Note how the orthogonal coding used
here as indicator variables corresponds to one set of contrasts that we previously performed in the
chapter on planned comparisons. Each group has a unique code on the three indicator variables:
Group 1 is coded as -1 -1 0, Group 2 as -1 +1 0, Group 3 as +1 0 -1, and Group 4 as +1 0 +1.
That these codes are orthogonal means that they are independent or uncorrelated with one another.
Omnibus ANOVA Results
The correspondences between the REGRESSION and ANOVA outputs are numerous.
The SSs, dfs, Fs, and ps are all equal, and R2 = η2. In addition to being available in the ANOVA
summary table, the ANOVA SSs could also be calculated in various ways from the regression
output; for example, SSTreatment = R²SSTotal = (n - 1)s²Predicted, and SSError = (1 - R²)SSTotal = (n - 1)s²Residual.
The reason for these equalities is that the indicator variables result in a best-fit equation
that produces the group means as the predicted values and the deviations from the group means
as the residual values.

FORMAT subj TO o3 (F2.0). LIST

SUBJ   INST   RATE    O1    O2    O3     PRD     RES
  1      1     26     -1    -1     0    27.3    -1.3
  2      1     26     -1    -1     0    27.3    -1.3
...
 11      2     21     -1     1     0    26.0    -5.0
 12      2     30     -1     1     0    26.0     4.0
...
 21      3     28      1     0    -1    22.7     5.3
 22      3     17      1     0    -1    22.7    -5.7
...
 31      4     20      1     0     1    20.0      .0
 32      4     24      1     0     1    20.0     4.0
...

Box 6.9. Original and Derived Scores from Interview Study.

To illustrate, the predicted score (PRD) for all observations in Group 1 is:
24 + (-2.65 × -1) + (-.65 × -1) + (-1.35 × 0) = 27.3 = ȳ1. Recall that Group 1 was coded -1, -1, and 0 on
the three indicator variables. The residual scores (i.e., RES = y - ŷ) in turn are the deviations of
observed scores from the group means.
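A short Python sketch makes this concrete: substituting each group's codes on O1, O2, and O3
into the best-fit equation from Box 6.8 returns exactly the four group means.

b0, b1, b2, b3 = 24.0, -2.65, -0.65, -1.35     # intercept and slopes from Box 6.8
codes = {                                      # (O1, O2, O3) for each group
    1: (-1, -1,  0),
    2: (-1, +1,  0),
    3: (+1,  0, -1),
    4: (+1,  0, +1),
}
for group, (o1, o2, o3) in codes.items():
    pred = b0 + b1 * o1 + b2 * o2 + b3 * o3
    print(group, round(pred, 2))               # 27.3, 26.0, 22.7, 20.0 = the group means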
Predicted and residual scores are presented in Box 6.9 for the first two subjects in each
group. Each group received its group mean as the predicted value (PRD). The variation in these
PRD scores is SSRegression = SSTreatment. Recall from our formula for calculating SSTreatment from the
group means that we multiplied nj times the squared deviation of each treatment mean from the
grand mean. In Box 6.9, we can see more readily the rationale for multiplying by nj; each of the
treatment means occurs nj times in the PRD column. The variation in the residual scores shown
in Box 6.9 (RES) equals SSError because these are the deviations of the observations from the
predicted values, which are the group means.
Box 6.9 also shows more clearly the nature of the indicator variables used to code the
three groups. O1 compares groups 1 and 2 versus groups 3 and 4 (i.e., subjects 1 to 20 are coded
-1 and subjects 21 to 40 are coded +1). O2 compares group 1 versus group 2 and O3 compares
group 3 versus group 4. If O1 were multiplied by O2 and their products summed, the result
would be 0, indicating that O1 and O2 are uncorrelated. Similarly, it is shown in the next section
that O1 is uncorrelated with O3, and that O2 is uncorrelated with O3. This complete lack of
correlation among the predictors defines orthogonal indicator variables.
Indicator Variable Results
In addition to the overall ANOVA, regression provides a test of the significance for each
of the indicator variables. The predictor O1 contrasts groups 1 and 2 versus groups 3 and 4. The
t in Box 6.8 for the predictor O1 agrees with the t for this contrast in the ONEWAY and other
ANOVA analyses. Moreover, the correlation between O1 and RATE, the attractiveness measure,
is equal to ηL1; that is, .584² = .341 = η²L1. This quantity measures the SSs accounted for uniquely
by contrast 1 and could be used to calculate a SS for contrast one; specifically, r² × SSTotal = .584²
× 824.0 = 281.03 ≈ SSL1. The simple r²s are appropriate here because the predictors O1, O2, O3
are orthogonal; note in Box 6.8 that the three rs for the correlations between pairs of predictors
are all zero. Therefore, the simple correlations are identical to the part correlations and also
equal the ηs for the contrasts. The ts, ps, and rs for the three indicator variables reproduce the
corresponding output from the earlier contrast analysis, again showing the close correspondence
between multiple regression and ANOVA approaches to data analysis. Just as we found earlier,
only indicator O1 (i.e., the first contrast between 1&2 vs 3&4) is significant; that indicator was -1
-1 +1 +1 for the four groups. In essence, O1 compares groups 1 and 2 combined (No Instructions
and Job Interview) with groups 3 and 4 combined (Psychiatric and Parole Interviews). The
comparisons between No Instructions vs. Job Interview (O2) and between Psychiatric vs. Parole
(O3) are not significant.
Although we could have requested ZPP statistics, little would be added by obtaining that
information. Because the indicator variables (i.e., contrasts) are orthogonal, the zero-order, part,
and partial correlations are all identical. Requesting CHANGE statistics and entering the
predictors sequentially (e.g., O1 last) would give F tests that again correspond to our earlier
contrast analyses, as shown in the following analysis.
One way to obtain the alternative F tests for the contrasts would be to enter predictors
successively and request SPSS to report CHANGE statistics. The change statistics for whichever
predictor is added last will correspond to the tests of interest. Box 6.10 shows the Windows
output from such an analysis. The /STAT = DEFAULT CHANGE requests the default and
change statistics. First o2 and o3 are entered (/ENTER o2 o3), and then o1 is added to the
equation (/ENTER o1).

REGRESS /STAT = DEFA CHANGE /DEP = rate /ENTER o2 o3 /ENTER o1.

                                                                  Change Statistics
                        Adjusted    Std. Error of      R Square       F
Model      R    R Square  R Square   the Estimate       Change      Change    df1    df2    Sig.
1       .233(a)   .054      .003       4.58876           .054        1.066     2     37     .355
2       .629(b)   .395      .345       3.72007           .341       20.298     1     36     .000

Model              Sum of Squares    df    Mean Square      F       Sig.
1   Regression         44.900         2      22.450        1.066   .355(a)
    Residual          779.100        37      21.057
    Total             824.000        39
2   Regression        325.800         3     108.600        7.847   .000(b)
    Residual          498.200        36      13.839
    Total             824.000        39

Box 6.10. Regression Analysis with Change Statistics.

Figure 6.5. Initiating Recodes via Menu.

Figure 6.6. Select Variables to Convert Dialogue Box.
We are primarily interested in the Change Statistics for Model 2, that is, the model in
which o1 has been added to o2 and o3. FChange = 20.298 ≈ 4.505² = t²o1 and r²Change = .341, which
agrees with earlier values. From this regression analysis, we can also calculate SSChange = SSy.123 -
SSy.23 = 325.80 - 44.90 = 280.90, which also agrees with earlier calculations and printouts.
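The F Change and R Square Change values can be reproduced directly from the SSs for the two
models. A brief Python check, using the values reported in Box 6.10:

ss_total = 824.0
ss_model1, ss_model2 = 44.9, 325.8       # SS Regression for models 1 and 2 in Box 6.10
df_change, df_resid = 1, 36

f_change = ((ss_model2 - ss_model1) / df_change) / ((ss_total - ss_model2) / df_resid)
r2_change = (ss_model2 - ss_model1) / ss_total
print(round(f_change, 3), round(r2_change, 3))   # 20.298 and .341, matching the printout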
Indicator Variables and Regression Analysis using Menus
The use of RECODE and other SPSS data modification commands is generally the
easiest way to create indicator variables. It is possible, however, to perform these operations via
Menus as well. Figure 6.5 shows the initial steps to constructing a new indicator variable:
Transform | Recode | Into Different Variables. These commands provide access to the dialogue
box shown in Figure 6.6.

Figure 6.7. Recoding via Menus.

Figure 6.6 shows a dialogue box in which all variables (initially) are listed in the
left-hand box. The INST variable was selected and moved into the conversion box in the
center via the arrow button (which points left or right depending on which box has variable
names selected). Once INST was in the center box, the name of the output variable (o1 here) was
typed into the Output Variable Name slot and the Change button pressed. This created the
transformation equivalency shown in the center box (i.e., inst -> o1).
The next step is to specify the translation from old (i.e., original) values to new. Clicking
on the Old and New Values button in Figure 6.6 will bring up the dialogue box shown in Figure
6.7. The screen shot shows the final step in the creation of indicator variable o1. Previously, old
and new values have been specified in the left and right Value slots, respectively, and then Added
to the Old->New box. All that remains is to click on Add to complete o1. The process would be
repeated for o2 and o3.
One useful practice when using the Menus to perform Recodes is to save the statements
actually generated by this process. The output screen will contain the syntax generated by the
Recode screens, and these statements can be copied to a Syntax window and saved in case the
transformations need to be repeated at some later time. Except for very simple translations,
RECODEs are generally done more easily by syntax than by dialogue boxes.
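To make this concrete, the statements pasted for o1, o2, and o3 should be essentially
equivalent to RECODE commands along the following lines (a sketch; the exact statements that
SPSS pastes may differ in formatting, and the o2 and o3 coefficients shown here are inferred from
the earlier contrast analysis):

* O1: groups 1 and 2 versus groups 3 and 4.
RECODE inst (1 = -1) (2 = -1) (3 = +1) (4 = +1) INTO o1.
* O2: No Instructions versus Job Interview.
RECODE inst (1 = -1) (2 = +1) (3 = 0) (4 = 0) INTO o2.
* O3: Psychiatric versus Parole Interview.
RECODE inst (1 = 0) (2 = 0) (3 = -1) (4 = +1) INTO o3.
EXECUTE.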
The actual analysis could be performed by syntax or via the menu commands illustrated
earlier in Figure 6.2. The only difference would be that three predictors (o1, o2, o3) would be
entered into the Independent(s) box. Box 6.11 shows the results of the analysis. The various
equivalencies reported previously for Box 6.9 are again present in the output. Notably, F = 7.847
reproduces the ANOVA analysis. There are a few additional features in Box 6.11 that are worth
mentioning. First, note that the means for o1, o2, and o3 are all 0.0. Second, note that the
correlations between o1 and o2, between o1 and o3, and between o2 and o3 are all 0. The three
indicator variables are completely independent of one another (i.e., they are orthogonal). One
positive consequence of this is that the overall difference among the means, represented by R² =
.395, can be partitioned into three orthogonal components. Indicator o1 accounts for .584² ×
824.0 = .341 × 824.0 = 281.03 units of variability, o2 accounts for .101² × 824.0 = .010 × 824.0 =
8.41 units, and o3 accounts for .210² × 824.0 = .044 × 824.0 = 36.34 units. Note that 281.03 +
8.41 + 36.34 ≈ 325.8, the SSTreatment/Regression, and that .341 + .010 + .044 = .395 = R² for the
overall analysis. Later chapters discuss more fully this partitioning of SSTreatment into
independent components.
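Because the three indicators are orthogonal, this partition can also be verified by entering
them one at a time and requesting CHANGE statistics; with orthogonal predictors, the R Square
Change at each step (.341, .010, and .044 here) is the same regardless of the order of entry. A
sketch of such a run:

REGRESSION /STAT = DEFA CHANGE /DEP = rate /ENTER o1 /ENTER o2 /ENTER o3.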
REGRESSION /VARIABLES = rate o1 o2 o3 /DESC = CORR /DEP = rate /ENTER o1 o2 o3 /SAVE PRED(prd) RESID(res).

Pearson Correlations
        RATE     O1       O2       O3
RATE    1.000    -.584    -.101    -.210
O1      -.584    1.000    .000     .000
O2      -.101    .000     1.000    .000
O3      -.210    .000     .000     1.000

Model   R        R Square   Adjusted R Square   Std. Error of the Estimate
1       .629(a)  .395       .345                3.7200657

Model                Sum of Squares   df   Mean Square   F       Sig.
1   Regression       325.800          3    108.600       7.847   .000(a)
    Residual         498.200          36   13.839
    Total            824.000          39

Coefficients
Model 1        B        Std. Error   Beta    t        Sig.
(Constant)     24.000   .588                 40.803   .000
O1             -2.650   .588         -.584   -4.505   .000
O2             -.650    .832         -.101   -.781    .440
O3             -1.350   .832         -.210   -1.623   .113

Residuals Statistics(a)
                  Minimum      Maximum      Mean        Std. Deviation   N
Predicted Value   20.000000    27.299999    24.000000   2.8903021        40
Residual          -6.000000    7.000000     .000000     3.5741235        40

Box 6.11. Windows Regression Output for Interview Study.

Polynomial Regression for the Interview Study

We also conducted planned contrasts for the interview study assuming an order factor for
which polynomial contrasts would be appropriate. The k - 1 contrasts corresponded to the linear,
quadratic, and cubic trends in the data. The actual coefficients were obtained from Appendix A-5,
which lists polynomial contrasts for varying values of k. The specific values were: -3 -1 +1 +3
for the linear, +1 -1 -1 +1 for the quadratic, and -1 +3 -3 +1 for the cubic. These contrasts define
linear, quadratic, and cubic patterns, and are orthogonal to one another.
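As an aside, trend tests for an ordered factor can also be requested directly from the
ONEWAY procedure using its POLYNOMIAL subcommand; a sketch of such a request follows
(the output layout differs from MANOVA, but with equal group sizes the linear, quadratic, and
cubic terms should agree):

ONEWAY rate BY inst /POLYNOMIAL = 3.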
Box 6.12 shows the default MANOVA output for this analysis. The results show a highly
significant linear effect (Parameter = 2), but neither the quadratic nor the cubic effect approaches
significance.
Box 6.13 shows the regression equivalent for this set of contrasts. We first create the
three indicator variables, using now the coefficients for the linear, quadratic, and cubic contrasts.
The omnibus ANOVA again agrees with earlier regression and ANOVA analyses. The equation
has changed somewhat, because the coefficients now represent the polynomial effects. The t-tests
for these coefficients correspond to the polynomial contrasts done in Box 6.12. Note the
equivalence of the ts and the ps. The strength of the relationships can be determined from the rs
between the predictors and the dependent variable (rate) in the correlation matrix. These simple
rs suffice because the predictors are orthogonal.

MANOVA RATE BY INST(1 4) /CONTR(INST)=POLY.

Tests of Significance for RATE using UNIQUE sums of squares
Source of Variation   SS       DF   MS       F      Sig of F
WITHIN CELLS          498.20   36   13.84
INST                  325.80   3    108.60   7.85   .000
(Model)               325.80   3    108.60   7.85   .000
(Total)               824.00   39   21.13
R-Squared = .395     Adjusted R-Squared = .345

INST
Parameter   Coeff.        Std. Err.   t-Value    Sig. t    Lower -95% CL- Upper
2           -5.6348913    1.17639     -4.78999   .00003    -8.02072     -3.24907
3           -.70000000    1.17639     -.59504    .55554    -3.08583      1.68583
4           .581377674    1.17639     .49421     .62416    -1.80445      2.96720

Box 6.12. Polynomial Contrasts for Interview Study.

RECODE inst (1 = -3) (2 = -1) (3 = +1) (4 = +3) INTO lin.
RECODE inst (1 = +1) (2 = -1) (3 = -1) (4 = +1) INTO qua.
RECODE inst (1 = -1) (2 = +3) (3 = -3) (4 = +1) INTO cub.
REGRESSION /VARIABLES = rate lin qua cub /DESC = CORR /DEP = rate /ENTER.

Pearson Correlations
       RATE     LIN      QUA
LIN    -.621
QUA    -.077    .000
CUB    .064     .000     .000

Model   R        R Square   Adjusted R Square   Std. Error of the Estimate
1       .629(a)  .395       .345                3.7200657

Model                Sum of Squares   df   Mean Square   F       Sig.
1   Regression       325.800          3    108.600       7.847   .000(a)
    Residual         498.200          36   13.839
    Total            824.000          39

Coefficients
Model 1        B        Std. Error   Beta    t        Sig.
(Constant)     24.000   .588                 40.803   .000
LIN            -1.260   .263         -.621   -4.790   .000
QUA            -.350    .588         -.077   -.595    .556
CUB            .130     .263         .064    .494     .624

Box 6.13. Polynomial Contrasts by Regression.

Number of Groups (k)       2          3               4
Dummy Coding               D1         D1   D2         D1   D2   D3
   Group 1                 0          0    0          0    0    0
   Group 2                 1          1    0          1    0    0
   Group 3                            0    1          0    1    0
   Group 4                                            0    0    1
Effect Coding              E1         E1   E2         E1   E2   E3
   Group 1                 1          1    0          1    0    0
   Group 2                 -1         0    1          0    1    0
   Group 3                            -1   -1         0    0    1
   Group 4                                            -1   -1   -1
Orthogonal Coding          O1         O1   O2         O1   O2   O3
   Group 1                 -1         -1   -1         -1   -1   -1
   Group 2                 1          1    -1         1    -1   -1
   Group 3                            0    2          0    2    -1
   Group 4                                            0    0    3

Box 6.14. Indicator variables for MR ANOVA.
Nonorthogonal Predictors
Although orthogonal predictors have many benefits (i.e., the same benefits as orthogonal
contrasts), multiple regression is powerful enough to conduct ANOVA with indicator variables
that are not orthogonal. Various ways to construct indicator variables are shown in Box 6.14. All
of these codes produce exactly the same output for the overall regression and ANOVA parts of
the analysis; they differ somewhat with respect to intercepts and slopes. The various codes
produce identical statistical omnibus results because, irrespective of the indicator variable, the
predicted scores are the group means for all nj individuals in each of the groups (i.e., MSRegression =
MSTreatment) and the residual scores are deviations from the group means (i.e., MSResidual = MSError).
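One way to see this equivalence concretely is to save the predicted values from any of
these codings and compare them with the group means; within each group, every predicted score
should equal that group's mean on rate. A sketch using the dummy codes created in Box 6.15
below:

REGRESSION /DEP = rate /ENTER d12 d13 d14 /SAVE PRED(prd).
* Predicted values within each group should match the group means on rate.
MEANS TABLES = rate prd BY inst.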
Three types of indicator variable (among many) are: dummy, effect, and orthogonal
coding. In dummy coding, a comparison group (often the first group) receives 0 on all k - 1
predictors (e.g., D1 = 0 and D2 = 0 for the first of three groups). Other groups receive 1 on
exactly one predictor and 0 on others (e.g., D1 = 1 and D2 = 0 for the second of three groups).
As in dummy coding for one variable, the intercept will equal the mean of the group that receives
all 0s on the dummy indicator variables and the slopes will represent deviations from that
comparison group's mean. Dummy indicator variables are not orthogonal.
In effect coding, one group (sometimes the last) receives -1 on all k - 1 indicator
variables. Other groups receive 1 on exactly one indicator and 0 on the others. To illustrate, in a
three-group design, E1 = 1 and E2 = 0 for group 1, E1 = 0 and E2 = 1 for group 2, and E1 = -1
and E2 = -1 for group 3. The intercept for effect coding is the grand mean, and the slopes
represent deviations from the grand mean. Effect codes are not orthogonal.
There are numerous types of orthogonal indicator variables, including the contrast
coefficients that we have discussed already. The defining characteristic is that orthogonal
predictors are all independent of one another (i.e., r = 0 for all pairs of predictors). In the
orthogonal codes shown in Box 6.14, groups are compared to one another and then their average
is compared to some other group. The O1 codes for k = 3, for instance, compare groups 1 and 2
and the O2 codes compare the average of groups 1 and 2 to the mean for group 3.
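To make the k = 3 columns of Box 6.14 concrete, the three coding schemes could be
created with RECODE statements along the following lines (a sketch, assuming a hypothetical
three-level grouping variable named group; the output names simply match the column labels in
Box 6.14):

* Dummy codes.
RECODE group (1 = 0) (2 = 1) (3 = 0) INTO d1.
RECODE group (1 = 0) (2 = 0) (3 = 1) INTO d2.
* Effect codes.
RECODE group (1 = 1) (2 = 0) (3 = -1) INTO e1.
RECODE group (1 = 0) (2 = 1) (3 = -1) INTO e2.
* Orthogonal codes.
RECODE group (1 = -1) (2 = 1) (3 = 0) INTO o1.
RECODE group (1 = -1) (2 = -1) (3 = 2) INTO o2.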
Box 6.15 illustrates the use of non-orthogonal indicator variables. Dummy coding has
been used, with the reference group being group 1 (the No Instruction condition). The omnibus
ANOVA again agrees with earlier results. As expected, the intercept is the mean for Group 1
(27.30) and the regression coefficients represent the deviation of each of the other groups from
that mean (e.g., D14 = -7.30 = 20.00 - 27.30). The t-tests inform us that groups 3 and 4 differ
significantly from the control condition, but not group 2 (i.e., the other “control” group).

RECODE inst (1 = 0) (2 = +1) (3 = 0) (4 = 0) INTO d12.
RECODE inst (1 = 0) (2 = 0) (3 = +1) (4 = 0) INTO d13.
RECODE inst (1 = 0) (2 = 0) (3 = 0) (4 = +1) INTO d14.
REGRESSION /VARIABLES = rate d12 d13 d14 /DESC = CORR /STAT = DEFAU ZPP /DEP = rate /ENTER.

Pearson Correlations
        RATE     D12      D13      D14
RATE    1.000    .254     -.165    -.509
D12     .254     1.000    -.333    -.333
D13     -.165    -.333    1.000    -.333
D14     -.509    -.333    -.333    1.000

Model   R        R Square   Adjusted R Square   Std. Error of the Estimate
1       .629(a)  .395       .345                3.7200657

Model                Sum of Squares   df   Mean Square   F       Sig.
1   Regression       325.800          3    108.600       7.847   .000(a)
    Residual         498.200          36   13.839
    Total            824.000          39

Coefficients
Model 1        B        Std. Error   Beta    t        Sig.   Zero-order   Partial   Part
(Constant)     27.300   1.176                23.207   .000
D12            -1.300   1.664        -.124   -.781    .440   .254         -.129     -.101
D13            -4.600   1.664        -.439   -2.765   .009   -.165        -.419     -.358
D14            -7.300   1.664        -.696   -4.388   .000   -.509        -.590     -.569

Box 6.15. Nonorthogonal Dummy Indicator Variables for Interview Study.
The comparisons are not orthogonal. Note in Box 6.15 that the three indicator variables
all correlate -.333 with one another, and that the part correlations are no longer equal to the
simple correlations. Because the predictors are not orthogonal, the sum of the squared part
correlations does not equal R². Specifically, .101² + .358² + .569² = .462, which does not equal
R² = .395. We have not partitioned SSRegression (i.e., SSTreatment) into k - 1 orthogonal components.
The comparisons are not orthogonal in essence because they all involve a comparison against
group 1. Hence, that group's variation from the grand mean contributes to all three comparisons.
Other possible non-orthogonal comparisons might have accounted for less than 39.5% of the
variability if the comparisons “missed” important variation in the means.
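The unique contribution of any one dummy indicator can still be isolated with CHANGE
statistics, just as was done for the orthogonal indicators, by entering it last. A sketch for d14
follows; the R Square Change from such a run equals the squared part correlation for d14
(.569² ≈ .32):

REGRESSION /STAT = DEFA CHANGE /DEP = rate /ENTER d12 d13 /ENTER d14.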
The comparisons in Box 6.15 are all pairwise; therefore, we might expect some
correspondences between these results and the LSD post hoc tests. Box 6.16 shows that this is
indeed the case. The results in
Box 6.15 agree exactly with the first three LSD comparisons (i.e., t-tests). The p values are
identical, as are the standard errors. Moreover, the differences between the means in Box 6.16
correspond to the unstandardized regression coefficients in Box 6.15. In fact, any indicator
variable that involves a pair-wise comparison will correspond to the LSD results for that
particular comparison. O2 in Box 6.8, for example, also produces p = .440 because it involves
the comparison between groups 1 and 2 (i.e., O2 = -1 +1 0 0).
ONEWAY rate BY inst /POSTHOC = LSD.

                 Sum of Squares   df   Mean Square   F       Sig.
Between Groups   325.800          3    108.600       7.847   .000
Within Groups    498.200          36   13.839
Total            824.000          39

LSD
(I) INST   (J) INST   Mean Difference (I-J)   Std. Error   Sig.   95% CI Lower Bound   Upper Bound
1.0000     2.0000     1.300000                1.6636640    .440   -2.074067            4.674067
           3.0000     4.600000(*)             1.6636640    .009   1.225933             7.974067
           4.0000     7.300000(*)             1.6636640    .000   3.925933             10.674067
2.0000     3.0000     3.300000                1.6636640    .055   -.074067             6.674067
           4.0000     6.000000(*)             1.6636640    .001   2.625933             9.374067
3.0000     1.0000     -4.600000(*)            1.6636640    .009   -7.974067            -1.225933

Box 6.16. Post Hoc LSD Comparisons for Interview Study.
CONCLUSIONS
In this chapter we have demonstrated how multiple regression (i.e., the General Linear
Model) can be used to compute ANOVA for comparisons between means, including not only the
omnibus F test, but also multiple comparisons of interest. We especially noted the
correspondence between the use of indicator variables and planned comparisons (or contrasts).
Using the coefficients from planned contrasts as indicator variables in multiple regression is
equivalent to the calculations conducted in Chapter 5. Or more properly, the short-hand
calculations introduced in Chapter 5 are equivalent to multiple regression results with predictors
corresponding to the contrasts. We also briefly noted that even the post hoc procedures of
Chapter 4 were equivalent to indicator variables that test the significance of differences between
pairs of means.
APPENDIX 6.1: GRAPHING CONTRASTS AND MEANS

Figure 6.8. Error Bar Plot of Means.
Figure 6.9. Plot of Ratings Against Linear Contrast.
The idea that contrasts attempt to capture variability in scores by matching patterns
present in the data can be illustrated graphically. Figure 6.8 shows an Error Bar plot of the four
means from the interview study. The pattern of variability in the means, which determines
SSTreatment, can be characterized in various ways, but Figure 6.8 shows that it is essentially
linear. That is, means decrease from group 1 to group 4 in approximately equal steps. Although
this pattern is being identified here in a post hoc manner (i.e., by looking at the data), it is
important to remember that planned contrasts are determined a priori (i.e., before the data are
examined, based on theory or past findings).
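A plot like Figure 6.8 can be requested with the GRAPH command; the following is a
sketch (the corresponding Graphs menu item would paste comparable syntax):

GRAPH /ERRORBAR(CI 95) = rate BY inst.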
The coefficients for the linear contrast are -3, -1, 1, and 3, a linear pattern that will
correlate quite highly with the pattern in the means (r = -.621). Figure 6.9 shows the scattergram
relating ratings to the linear contrast. The linear regression (solid line in Figure 6.9) corresponds
quite well to the trend in the means (dashed line in Figure 6.9).
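A scattergram like Figure 6.9 could be requested with syntax along these lines (a sketch;
the fit lines are typically added afterwards in the chart editor):

GRAPH /SCATTERPLOT(BIVAR) = lin WITH rate.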
Figure 6.10. Plot of Ratings Against Coefficients for Groups 1 and 2 versus Groups 3 and 4.
Figure 6.11. Plot of Ratings Against Cubic Coefficients.
Another set of coefficients that worked well (and might be more likely to be generated a
priori) contrasts groups 1 and 2 with groups 3 and 4 (-1 -1 +1 +1). Figure 6.10 shows the ratings
plotted against that contrast. These contrast coefficients account for 34.1% of the variability in
the scores, versus 38.5% for the linear contrasts (which, as noted, are not likely to be predicted a
priori for this study). In Figure 6.10, the groups have been given different size symbols (smallest
= Group 1 and largest = Group 4) just to illustrate that the coefficients (-1 -1 +1 +1) combine
groups 1 and 2, and combine groups 3 and 4, and that it is the pairs of conditions that are
contrasted to one another.
Finally, to illustrate a set of coefficients that does not capture much of the variability in
means, Figure 6.11 plots ratings as a function of the cubic coefficients (-1 +3 -3 +1) used in the
polynomial regression. The groups are again represented by symbols of different sizes to
emphasize that the groups are now ordered 3, 1, 4, and 2 from left to right, an ordering that
produces a weak linear relationship between the means and the contrast coefficients, r² = .004 =
.064² (.064 is the correlation between ratings and cubic coefficients, see Box 6.13).