The Fisher, Neyman-Pearson Theories of Testing Hypotheses: One Theory or Two?
Author(s): E. L. Lehmann
Source: Journal of the American Statistical Association, Vol. 88, No. 424 (Dec., 1993), pp. 1242-1249
Published by: American Statistical Association
Stable URL: http://www.jstor.org/stable/2291263
Fisher with a series of papers culminating in his book Statistical Methods for Research Workers (1925), in which he created a new paradigm for hypothesis testing. He greatly extended the applicability of the t test (to the two-sample problem and the testing of regression coefficients) and generalized it to the testing of hypotheses in the analysis of variance. He advocated 5% as the standard level (with 1% as a more stringent alternative); through applying this new methodology to a variety of practical examples, he established it as a highly popular statistical approach for many fields of science.
A question that Fisher did not raise was the origin of his test statistics: Why these rather than some others? This is the question that Neyman and Pearson considered and which (after some preliminary work in Neyman and Pearson 1928) they later answered (Neyman and Pearson 1933a). Their solution involved not only the hypothesis but also a class of possible alternatives and the probabilities of two kinds of error: false rejection (Error I) and false acceptance (Error II). The "best" test was one that minimized $P_A$(Error II) subject to a bound on $P_H$(Error I), the latter being the significance level of the test. They completely solved this problem for the case of testing a simple (i.e., single distribution) hypothesis against a simple alternative by means of the Neyman-Pearson lemma. For more complex situations, the theory required additional concepts, and working out the details of this program was an important concern of mathematical statistics in the following decades.
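As a hedged illustration of the simple-versus-simple case, the following sketch works out the most powerful test for a normal mean. The setting (n independent N(mu, 1) observations, H: mu = 0 against the alternative mu = 1) and the function name are assumptions chosen for illustration; they are not taken from Neyman and Pearson. Because the likelihood ratio increases with the sample mean, the lemma reduces the test to a cutoff on that mean.

```python
from math import sqrt
from statistics import NormalDist


def np_test_normal_mean(n, alpha, mu1=1.0):
    """Most powerful level-alpha test of mu = 0 against mu = mu1 > 0.

    Assumes n independent N(mu, 1) observations (an illustrative setting).
    Returns the cutoff for the sample mean and both error probabilities.
    """
    z = NormalDist()  # standard normal distribution
    # The likelihood ratio is increasing in the sample mean, so the
    # region {ratio >= K} is the same as {sample mean >= cutoff}.
    cutoff = z.inv_cdf(1 - alpha) / sqrt(n)
    error_one = 1 - z.cdf(cutoff * sqrt(n))      # false rejection, mu = 0
    error_two = z.cdf((cutoff - mu1) * sqrt(n))  # false acceptance, mu = mu1
    return cutoff, error_one, error_two


cutoff, error_one, error_two = np_test_normal_mean(n=25, alpha=0.05)
print(f"reject H when the sample mean is at least {cutoff:.3f}")
print(f"P_H(Error I) = {error_one:.3f}, P_A(Error II) = {error_two:.4f}")
```

The bound on Error I fixes the cutoff; the lemma then guarantees that no other test with the same bound has a smaller Error II at the chosen alternative.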
The Neyman-Pearson introduction to the two kinds of error contained a brief statement that was to become the focus of much later debate. "Without hoping to know whether each separate hypothesis is true or false", the authors wrote, "we may search for rules to govern our behavior with regard to them, in following which we insure that, in the long run of experience, we shall not be too often wrong." And in this and the following paragraph they refer to a test (i.e., a rule to reject or accept the hypothesis) as "a rule of behavior".
3. INDUCTIVE INFERENCE VERSUS INDUCTIVE BEHAVIOR
Fisher considered statistics, the science of uncertain inference, able to provide a key to the long-debated problem of induction. He started one paper (Fisher 1932, p. 257) with the statement "Logicians have long distinguished two modes of human reasoning, under the respective names of deductive and inductive reasoning. . . . In inductive reasoning we attempt to argue from the particular, which is typically a body of observational material, to the general, which is typically a theory applicable to future experience." He developed his ideas in more detail in a later paper (Fisher 1935a, p. 39)

    . . . everyone who does habitually attempt the difficult task of making sense of figures is, in fact, essaying a logical process of the kind we call inductive, in that he is attempting to draw inferences from the particular to the general. Such inferences we recognize to be uncertain inferences. . . .
He continued in the next paragraph:

    Although some uncertain inferences can be rigorously expressed in terms of mathematical probability, it does not follow that mathematical probability is an adequate concept for the rigorous expression of uncertain inferences of every kind. . . . The inferences of the classical theory of probability are all deductive in character. They are statements about the behaviour of individuals, or samples, or sequences of samples, drawn from populations which are fully known. . . . More generally, however, a mathematical quantity of a different kind, which I have termed mathematical likelihood, appears to take its place [i.e., the place of probability] as a measure of rational belief when we are reasoning from the sample to the population.
Neyman did not believe in the need for a special inductive logic but felt that the usual processes of deductive thinking should suffice. More specifically, he had no use for Fisher's idea of likelihood. In his discussion of Fisher's 1935 paper (Neyman, 1935, p. 74, 75) he expressed the thought that it should be possible "to construct a theory of mathematical statistics . . . based solely upon the theory of probability," and went on to suggest that the basis for such a theory can be provided by "the conception of frequency of errors in judgment." This was the approach that he and Pearson had earlier described as "inductive behavior"; in the case of hypothesis testing, the behavior consisted of either rejecting the hypothesis or (provisionally) accepting it.
Both Neyman and Fisher considered the distinction between "inductive behavior" and "inductive inference" to lie at the center of their disagreement. In fact, in writing retrospectively about the dispute, Neyman (1961, p. 142) said that "the subject of the dispute may be symbolized by the opposing terms 'inductive reasoning' and 'inductive behavior.'" How strongly Fisher felt about this distinction is indicated by his statement in Fisher (1973, p. 7) that "there is something horrifying in the ideological movement represented by the doctrine that reasoning, properly speaking, cannot be applied to empirical data to lead to inferences valid in the real world."
4. FIXED LEVELS VERSUS p VALUES
A distinction frequently made between the approaches of Fisher and Neyman-Pearson is that in the latter the test is carried out at a fixed level, whereas the principal outcome of the former is the statement of a p value that may or may not be followed by a pronouncement concerning significance of the result.
The history of this distinction is curious. Throughout the 19th century, testing was carried out rather informally. It was roughly equivalent to calculating an (approximate) p value and rejecting the hypothesis if this value appeared to be sufficiently small. These early approximate methods required only a table of the normal distribution. With the advent of exact small-sample tests, tables of $\chi^2$, $t$, $F$, . . . were also required. Fisher, in his 1925 book and later, greatly reduced the needed tabulations by providing tables not of the distributions themselves but of selected quantiles. (For an explanation of this very influential decision by Fisher see Kendall [1963]. On the other hand Cowles and Davis [1982] argue that conventional levels of three probable errors or two standard deviations, both roughly equivalent [in the normal case] to 5%, were already in place before Fisher.) These tables allow the calculation only of ranges for the p values; however, they are exactly suited for determining the critical values at which the statistic under consideration becomes significant at a given level. As Fisher wrote in explaining the use of his $\chi^2$ table (1946, p. 80):
    In preparing this table we have borne in mind that in practice we do not want to know the exact value of P for any observed $\chi^2$, but, in the first place, whether or not the observed value is open to suspicion. If P is between .1 and .9, there is certainly no reason to suspect the hypothesis tested. If it is below .02, it is strongly indicated that the hypothesis fails to account for the whole of the facts. We shall not often be astray if we draw a conventional line at .05 and consider that higher values of $\chi^2$ indicate a real discrepancy.
Similarly, he also wrote (1935, p. 13) that "it is usual and convenient for experimenters to take 5 percent as a standard level of significance, in the sense that they are prepared to ignore all results which fail to reach this standard . . ."

Fisher's views and those of some of his contemporaries are discussed in more detail by Hall and Selinger (1986).
Neyman and Pearson followed Fisher's adoption of a fixed level. In fact, Pearson (1962, p. 395) acknowledged that they were influenced by "[Fisher's] tables of 5 and 1% significance levels which lent themselves to the idea of choice, in advance of experiment, of the risk of the 'first kind of error' which the experimenter was prepared to take." He was even more outspoken in a letter to Neyman of April 28, 1978 (unpublished; in the Neyman collection of the Bancroft Library, University of California, Berkeley): "If there had not been these % tables available when you and I started work on testing statistical hypotheses in 1926, or when you were starting to talk on confidence intervals, say in 1928, how much more difficult it would have been for us. The concept of the control of 1st kind of error would not have come so readily nor your idea of following a rule of behaviour. Anyway, you and I must be grateful for those two tables in the 1925 Statistical Methods for Research Workers." (For an idea of what the Neyman-Pearson theory might have looked like had it been based on p values instead of fixed levels, see Schweder 1988.)
It is interesting to note that unlike Fisher, Neyman and Pearson (1933a, p. 296) did not recommend a standard level but suggested that "how the balance [between the two kinds of error] should be struck must be left to the investigator," and (1933b, p. 497) "we attempt to adjust the balance between the risks $P_I$ and $P_{II}$ to meet the type of problem before us."
It is thus surprising that in SMSI Fisher (1973, p. 44-45) criticized the NP use of a fixed conventional level. He objected that

    the attempts that have been made to explain the cogency of tests of significance in scientific research, by reference to supposed frequencies of possible statements, based on them, being right or wrong, thus seem to miss the essential nature of such tests. A man who 'rejects' a hypothesis provisionally, as a matter of habitual practice, when the significance is 1% or higher, will certainly be mistaken in not more than 1% of such decisions. . . . However, the calculation is absurdly academic, for in fact no scientific worker has a fixed level of significance at which from year to year, and in all circumstances, he rejects hypotheses; he rather gives his mind to each particular case in the light of his evidence and his ideas.
The difference between the reporting of a p value or that of a statement of acceptance or rejection of the hypothesis was linked by Fisher in Fisher (1973, pp. 79-80), to the distinction between drawing conclusions or making decisions.

    The conclusions drawn from such tests constitute the steps by which the research worker gains a better understanding of his experimental material, and of the problems which it presents. . . . More recently, indeed, a considerable body of doctrine has attempted to explain, or rather to reinterpret, these tests on the basis of quite a different model, namely as means to making decisions in an acceptance procedure.
Responding to earlier versions of these and related objections by Fisher to the Neyman-Pearson formulation, Pearson (1955, p. 206) admitted that the terms "acceptance" and "rejection" were perhaps unfortunately chosen, but of his joint work with Neyman he said that "from the start we shared Professor Fisher's view that in scientific inquiry, a statistical test is 'a means of learning'" and "I would agree that some of our wording may have been chosen inadequately, but I do not think that our position in some respects was or is so very different from that which Professor Fisher himself has now reached."
The distinctions under discussion are of course related to the argument about "inductive inference" vs. "inductive behavior," but in this debate Pearson refused to participate. He concludes his response to Fisher's 1955 attack with: "Professor Fisher's final criticism concerns the use of the term 'inductive behavior'; this is Professor Neyman's field rather than mine."
5. POWER
As was mentioned in Section 2, a central consideration of the Neyman-Pearson theory is that one must specify not only the hypothesis H but also the alternatives against which it is to be tested. In terms of the alternatives, one can then define the type II error (false acceptance) and the power of the test (the rejection probability as a function of the alternative). This idea is now fairly generally accepted for its importance in assessing the chance of detecting an effect (i.e., a departure from H) when it exists, determining the sample size required to raise this chance to an acceptable level, and providing a criterion on which to base the choice of an appropriate test.
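The first two uses of power named above can be sketched numerically. The example below is a minimal illustration under assumptions of its own: a one-sided z test of mu = 0 with known unit variance, a hypothetical effect size `delta`, and function names invented here. It computes the power as a function of the alternative and the smallest sample size that reaches a target power.

```python
from math import ceil, sqrt
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution


def power_one_sided_z(n, delta, alpha=0.05):
    """Rejection probability of the one-sided z test when the true mean is delta.

    Assumes n independent N(mu, 1) observations and H: mu = 0.
    """
    return 1 - Z.cdf(Z.inv_cdf(1 - alpha) - delta * sqrt(n))


def sample_size_for_power(delta, alpha=0.05, target=0.80):
    """Smallest n whose power against the alternative delta reaches the target."""
    return ceil(((Z.inv_cdf(1 - alpha) + Z.inv_cdf(target)) / delta) ** 2)


n_needed = sample_size_for_power(delta=0.5)
print(n_needed, round(power_one_sided_z(n_needed, 0.5), 3))
```

Both functions require the alternative `delta` to be specified in advance.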
Fisher never wavered in his strong opposition to these ideas. Following are some of his objections:
1. A type II error consists in falsely accepting H, and Fisher (1935b, p. ) emphasized that there is no reason for "believing that a hypothesis has been proved to be true merely because it is not contradicted by the available facts." This is of course correct, but it does not diminish the usefulness of power calculations.
2. A second point Fisher raised is, in modern terminology, that the power cannot be calculated because it depends on the unknown alternative. For example (Fisher 1955, p. 73), he wrote:

    The frequency of the 1st class [type I error] . . . is calculable and therefore controllable simply from the specification of the null hypothesis. The frequency of the 2nd kind must depend . . . greatly on how closely they [rival hypotheses] resemble the null hypothesis. Such errors are therefore incalculable . . . merely from the specification of the null hypothesis, and would never have come into consideration in the theory only of tests of significance, had the logic of such tests not been confused with that of acceptance procedures.

(He discussed the same point in Fisher 1947, p. 16-17.)
Fisher was of course aware of the importance of power, as is clear from the following remarks (1947, p. 24): "With respect to the refinements of technique, we have seen above that these contribute nothing to the validity of the experiment and of the test of significance by which we determine its result. They may, however, be important, and even essential, in permitting the phenomenon under test to manifest itself."

The section in which this statement appears is tellingly entitled "Qualitative Methods of Increasing Sensitiveness." Fisher accepted the importance of the concept but denied the possibility of assessing it quantitatively.
Later in the same book Fisher made a very similar distinction regarding the choice of test. Under the heading "Multiplicity of Tests of the Same Hypothesis," he devoted a section (sec. 61) to this topic. Here again, without using the term, he referred to alternatives when he wrote (Fisher 1947, p. 182) that "we may now observe that the same data may contradict the hypothesis in any of a number of different ways." After illustrating how different tests would be appropriate for different alternatives, he continued (p. 185):

    The notion that different tests of significance are appropriate to test different features of the same null hypothesis presents no difficulty to workers engaged in practical experimentation but has been the occasion of much theoretical discussion among statisticians. The reason for this diversity of view-point is perhaps that the experimenter is thinking in terms of observational values, and is aware of what observational discrepancy it is which interests him, and which he thinks may be statistically significant, before he inquires what test of significance, if any, is available appropriate to his needs. He is, therefore, not usually concerned with the question: To what observational feature should a test of significance be applied?
The idea that there is no need for a theory of test choice, because an experienced experimenter knows what is the appropriate test, is expressed more strongly in a letter to W. E. Hick of October 1951 (Bennett 1990, p. 144), who, in asking about "one-tail" vs. "two-tail" in $\chi^2$, had referred to his lack of knowledge concerning "the theory of critical regions, power, etc.":
    I am a little sorry that you have been worrying yourself at all with that unnecessarily portentous approach to tests of significance represented by the Neyman and Pearson critical regions, etc. In fact, I and my pupils throughout the world would never think of using them. If I am asked to give an explicit reason for this I should say that they approach the problem entirely from the wrong end, i.e., not from the point of view of a research worker, with a basis of well grounded knowledge on which a very fluctuating population of conjectures and incoherent observations is continually under examination. In these circumstances the experimenter does know what observation it is that attracts his attention. What he needs is a confident answer to the question "ought I to take any notice of that?" This question can, of course, and for refinement of thought should, be framed as "Is this particular hypothesis overthrown, and if so at what level of significance, by this particular body of observations?" It can be put in this form unequivocally only because the genuine experimenter already has the answers to all the questions that the followers of Neyman and Pearson attempt, I think vainly, to answer by merely mathematical consideration.
6. CONDITIONAL INFERENCE
While Fisher's approach to testing included no detailed consideration of power, the Neyman-Pearson approach failed to pay attention to an important concern raised by Fisher. To discuss this issue, we must begin by considering briefly the different meanings that Fisher and Neyman attach to probability.
For Neyman, the idea of probability is fairly straightforward: It represents an idealization of long-run frequency in a long sequence of repetitions under constant conditions (see, for example, Neyman 1952, p. 27; 1957, p. 9). Later (Neyman 1977), he pointed out that by the law of large numbers, this idea permits an extension: If a sequence of independent events is observed, each with probability $p$ of success, then the long-run success frequency will be approximately $p$ even if the events are not identical. This property adds greatly to the appeal and applicability of a frequentist probability. In particular, it is the way in which Neyman came to interpret the value of a significance level.
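The extension described in this paragraph can be checked with a small simulation. The sketch below is an illustration only: the mix of problems (random sample sizes and standard deviations), the ranges used, and the function name are all assumptions made here. Each simulated problem is a different one-sided z test with a true null hypothesis, carried out at the same level, and the long-run frequency of false rejections is compared with that level.

```python
import random
from math import sqrt
from statistics import NormalDist


def long_run_false_rejection_rate(trials=20000, alpha=0.05, seed=0):
    """Frequency of false rejections over many non-identical testing problems."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha)
    rejections = 0
    for _ in range(trials):
        # Every problem has its own sample size and standard deviation
        # (arbitrary illustrative ranges).
        n = rng.randint(5, 50)
        sigma = rng.uniform(0.5, 3.0)
        # Under the null hypothesis the sample mean is N(0, sigma^2 / n).
        sample_mean = rng.gauss(0.0, sigma / sqrt(n))
        z_stat = sample_mean * sqrt(n) / sigma
        if z_stat > z_crit:
            rejections += 1
    return rejections / trials


print(long_run_false_rejection_rate())
```

The problems differ from one another, yet each has false-rejection probability alpha, so the overall frequency should settle near alpha.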
On the other hand, the meaning of probability is a problem with which Fisher grappled throughout his life. Not surprisingly, his views too underwent some changes. The concept at which he eventually arrived is much broader than Neyman's: "In a statement of probability, the predicand, which may be conceived as an object, as an event, or as a proposition, is asserted to be one of a set of a number, however large, of like entities of which a known proportion, P, have some relevant characteristic, not possessed by the remainder. It is further asserted that no subset of the entire set, having a different proportion, can be recognized" (Fisher 1973, p. 113). It is this last requirement, Fisher's version of the "requirement of total evidence" (Carnap 1962, sec. 45), which is particularly important to the present discussion.
Example 1 (Cox 1958). Suppose that we are concerned with the probability $P(X \le x)$, where $X$ is normally distributed as $N(\mu, 1)$ or $N(\mu, 4)$, depending on whether the spin of a fair coin results in heads (H) or tails (T). Here the set of cases in which the coin falls heads is a recognizable subset; therefore, Fisher would not admit the statement

$$P(X \le x) = \tfrac{1}{2}\,\Phi(x - \mu) + \tfrac{1}{2}\,\Phi\!\left(\frac{x - \mu}{2}\right) \tag{1}$$

as legitimate. Instead, he would have required $P(X \le x)$ to be evaluated conditionally as

$$P(X \le x \mid H) = \Phi(x - \mu) \quad\text{or}\quad P(X \le x \mid T) = \Phi\!\left(\frac{x - \mu}{2}\right) \tag{2}$$

depending on the outcome of the spin.
On the other hand, Neyman would have taken (1) to provide the natural assessment of $P(X \le x)$. Despite this preference, there is nothing in the Neyman-Pearson (frequentist) approach to prevent consideration of the conditional probabilities (2). The critical issue from a frequentist viewpoint is what to consider as the relevant replications of the experiment: a sequence of observations from the same normal distribution or a sequence of coin tosses, each followed by an observation from the appropriate normal distribution.
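The two assessments in Example 1 can be compared numerically. The following sketch evaluates the unconditional probability and the two conditional probabilities for one pair of values; the function names and the chosen values of x and mu are illustrative assumptions.

```python
from statistics import NormalDist

PHI = NormalDist().cdf  # standard normal distribution function


def conditional_prob(x, mu, coin):
    """P(X <= x | coin): standard deviation 1 after heads, 2 after tails."""
    sd = 1.0 if coin == "H" else 2.0
    return PHI((x - mu) / sd)


def unconditional_prob(x, mu):
    """P(X <= x) averaged over the two outcomes of a fair coin."""
    return 0.5 * conditional_prob(x, mu, "H") + 0.5 * conditional_prob(x, mu, "T")


x, mu = 2.0, 0.0
print(round(conditional_prob(x, mu, "H"), 4))  # given heads
print(round(conditional_prob(x, mu, "T"), 4))  # given tails
print(round(unconditional_prob(x, mu), 4))     # averaged over the coin
```

The unconditional value always lies between the two conditional values, so it describes neither of the two situations an observer who has seen the coin is actually in.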
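Consider now the problem of testing $H: \mu = 0$ against the simple alternative $\mu = 1$ on the basis of a sample $X_1, \ldots, X_n$ from the distribution (1). The Neyman-Pearson lemma would tell us to reject H when

$$\frac{\dfrac{1}{2}\,\dfrac{1}{\sqrt{2\pi}}\,e^{-\sum (x_i - 1)^2/2} + \dfrac{1}{2}\,\dfrac{1}{2\sqrt{2\pi}}\,e^{-\sum (x_i - 1)^2/8}}{\dfrac{1}{2}\,\dfrac{1}{\sqrt{2\pi}}\,e^{-\sum x_i^2/2} + \dfrac{1}{2}\,\dfrac{1}{2\sqrt{2\pi}}\,e^{-\sum x_i^2/8}} \ge K, \tag{3}$$

where $K$ is determined so that the probability of (3) when $\mu = 0$ is equal to the specified level $\alpha$.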
On the other hand, a Fisherian approach would adjust the test to whether the coin falls H or T and would use the rejection region

$$\frac{1}{\sqrt{2\pi}}\,e^{-\sum (x_i - 1)^2/2} \ge K_1\,\frac{1}{\sqrt{2\pi}}\,e^{-\sum x_i^2/2} \quad\text{when the coin falls H} \tag{4}$$

and

$$\frac{1}{2\sqrt{2\pi}}\,e^{-\sum (x_i - 1)^2/8} \ge K_2\,\frac{1}{2\sqrt{2\pi}}\,e^{-\sum x_i^2/8} \quad\text{when the coin falls T,} \tag{5}$$

where $K_1$ and $K_2$ are determined so that the null probability of both (4) and (5) is equal to $\alpha$. It is easily seen that these two tests are not equivalent. Which one should we prefer?
Test (3) has the advantage of being more powerful in the sense that when the full experiment of spinning a coin and then taking $n$ observations on $X$ is repeated many times, and when $\mu = 1$, this test will reject the hypothesis more frequently.

The second test has the advantage that its conditional level given the outcome of the spin is $\alpha$ both when the outcome is H and when it is T. [The conditional level of the first test will be $<\alpha$ for one of the two outcomes and $>\alpha$ for the other.]
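The trade-off between unconditional power and conditional level can be made concrete in a simplified setting. The sketch below is not a computation of test (3) itself: it assumes a single observation (n = 1), assumes that the coin outcome is recorded, and forms an unconditional test by applying one common likelihood-ratio threshold K to both outcomes. All function names are hypothetical. It then compares that test with the conditional test that uses level alpha separately after heads and after tails.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution


def upper_tail(x, sd):
    """P(N(0, sd^2) >= x)."""
    return 1 - Z.cdf(x / sd)


def common_threshold_cutoffs(alpha):
    """Cutoffs (after H, after T) of the test with one threshold K for both outcomes.

    With one observation, the likelihood ratio of mu = 1 to mu = 0 is
    exp(x - 1/2) after heads and exp((x - 1/2) / 4) after tails, so
    ratio >= K means x >= 1/2 + k after heads and x >= 1/2 + 4k after
    tails, where k = log K. Bisection finds k for overall level alpha.
    """
    def cutoffs(k):
        return 0.5 + k, 0.5 + 4 * k

    def level(k):
        c_h, c_t = cutoffs(k)
        return 0.5 * upper_tail(c_h, 1.0) + 0.5 * upper_tail(c_t, 2.0)

    lo, hi = -10.0, 10.0  # level(lo) is near 1, level(hi) is near 0
    for _ in range(200):
        mid = (lo + hi) / 2
        if level(mid) > alpha:
            lo = mid
        else:
            hi = mid
    return cutoffs(hi)


def compare_tests(alpha=0.05):
    c_h, c_t = common_threshold_cutoffs(alpha)
    z = Z.inv_cdf(1 - alpha)  # conditional test: cutoffs z after H, 2z after T
    return {
        "uncond_levels": (upper_tail(c_h, 1.0), upper_tail(c_t, 2.0)),
        "uncond_power": 0.5 * upper_tail(c_h - 1, 1.0) + 0.5 * upper_tail(c_t - 1, 2.0),
        "cond_power": 0.5 * upper_tail(z - 1, 1.0) + 0.5 * upper_tail(2 * z - 1, 2.0),
    }


for name, value in compare_tests().items():
    print(name, value)
```

Under these assumptions the common-threshold test has a conditional level above alpha after heads and below alpha after tails, while its power averaged over the coin exceeds that of the conditional test.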
Which of these considerations is more important depends on the circumstances. Echoing Fisher, we might say that we prefer (1) in an acceptance sampling situation where interest focuses not on the individual cases but on the long-run frequency of errors, but that we would prefer the second test in a scientific situation where long-run considerations are irrelevant and only the circumstances at hand (i.e., H or T) matter. As Fisher put it (1973, p. 101-102), referring to a different but similar situation: "It is then obvious at the time that the judgment of significance has been decided not by the evidence of the sample, but by the throw of a coin. It is not obvious how the research worker is to be made to forget this circumstance, and it is certain that he ought not to forget it, if he is concerned to assess the weight only of objective observational facts against the hypothesis in question."
The present example is of course artificial, but the same issue arises whenever there exists an ancillary statistic (see, for example, Cox and Hinkley 1974; Lehmann 1986), and it seems to lie at the heart of the cases in which the two theories disagree on specific tests. The two most prominent of these cases are discussed in the next section.
7. TWO EXAMPLES
For many problems, a pure Fisherian or Neyman-Pearsonian approach will lead to the same test. Suppose in particular that the observations $X$ follow a distribution from an exponential family with density

$$p_{\theta,\vartheta}(x) = C(\theta, \vartheta)\,e^{\theta U(x) + \sum_{i=1}^{k} \vartheta_i T_i(x)} \tag{6}$$

and consider testing the hypothesis

$$H: \theta = \theta_0 \tag{7}$$

against the one-sided alternatives $\theta > \theta_0$. Then Fisher would condition on $T = (T_1, \ldots, T_k)$ and would in the conditional model consider it natural to calculate the p value as the conditional probability of $U \ge u$, where $u$ is the observed value of $U$. At a given level $\alpha$, the result would be declared significant if $U \ge C(t)$, where $C(t)$ is determined by

$$P[U \ge C(t) \mid T = t] = \alpha. \tag{8}$$

A Neyman-Pearson viewpoint would lead to the same test as being uniformly most powerful among all similar tests.
But as we have seen in Example 1, the two approaches do not always lead to the same result. We next consider the two examples that have engendered the most controversy.
Example 2: The 2 × 2 table with one fixed margin. Let $X$, $Y$ be two independent binomial variables with success probabilities $p_1$ and $p_2$ and corresponding to $m$ and $n$ trials. The problem of testing $H: p_2 = p_1$ against the alternatives $p_2 > p_1$ is of the form given by (6) and (7) with $\theta = \log[(p_2/q_2)/(p_1/q_1)]$, $T = X + Y$ and $U = Y$. Basically, there is therefore no conflict between the two approaches. However, because of the discreteness of the conditional distribution of $U$ given $t$, condition (8) typically cannot be satisfied. Fisher's exact test then chooses $C(t)$ to be the largest constant for which

$$P[U \ge C(t) \mid T = t] \le \alpha. \tag{9}$$

For small values of $t$, this may lead to conditional levels substantially less than $\alpha$; for small $m$ and $n$, the same may be true for the unconditional level. For this reason, Fisher's exact test has been criticized as being too conservative. Many alternatives have been proposed for which the unconditional level (which is a function of $p_1 = p_2$) is much closer to $\alpha$. Upton (1982) lists 22; for other surveys, see Yates (1984) and Agresti (1992).
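The conservativeness described in Example 2 can be seen in a small computation. The sketch below is an illustration under stated assumptions: it reads (9) as choosing the cutoff that gives the largest rejection region whose conditional probability stays at or below alpha, it uses the fact that under H the conditional distribution of U = Y given T = t is hypergeometric, and it takes m = n = 5 and a common success probability of 0.5 as arbitrary example values. The function names are invented here.

```python
from math import comb


def conditional_pmf(y, t, m, n):
    """P(Y = y | X + Y = t) under p1 = p2: a hypergeometric probability."""
    if y < max(0, t - m) or y > min(n, t):
        return 0.0
    return comb(n, y) * comb(m, t - y) / comb(m + n, t)


def exact_cutoff(t, m, n, alpha):
    """Cutoff c and conditional level P(U >= c | T = t) of the exact test.

    Starts from the empty rejection region and adds the largest values
    of U one at a time while the conditional level stays <= alpha.
    """
    cutoff, level = min(n, t) + 1, 0.0
    for y in range(min(n, t), max(0, t - m) - 1, -1):
        new_level = level + conditional_pmf(y, t, m, n)
        if new_level > alpha:
            break
        cutoff, level = y, new_level
    return cutoff, level


def unconditional_level(m, n, p, alpha):
    """Expected conditional level when p1 = p2 = p, so T is Binomial(m + n, p)."""
    total = 0.0
    for t in range(m + n + 1):
        prob_t = comb(m + n, t) * p**t * (1 - p) ** (m + n - t)
        total += prob_t * exact_cutoff(t, m, n, alpha)[1]
    return total


print(exact_cutoff(t=5, m=5, n=5, alpha=0.05))
print(unconditional_level(m=5, n=5, p=0.5, alpha=0.05))
```

With these small sample sizes, both the conditional level at t = 5 and the unconditional level fall well below the nominal 5%.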
The issues are similar to those encountered in Example 1. If conditioning is considered appropriate (and in the present case it typically is), and if control of type I error at level $\alpha$ is considered essential, then the only sensible test available is of the form $U \ge C(t)$, where $C(t)$ is the largest value satisfying (9). If, on the other hand, only the unconditional performance is considered relevant, then we may allow the conditional level of the region $U \ge C(t)$ to exceed $\alpha$ for some values of $t$ in such a way that the unconditional level (which is the expected value of the conditional level) gets closer to
stated "I regard the frequence requirement of 'repeated sampling' as including conditional inferences." A common basis for the discussion of various conditioning concepts, such as ancillaries and relevant subsets, thus exists. The proper choice of framework is a problem needing further study.

We conclude by considering some more detailed issues and by reviewing Examples 2 and 3 from the present point of view.
1. Both Neyman-Pearson and Fisher would give at most lukewarm support to standard significance levels such as 5% or 1%. Fisher, although originally recommending the use of such levels, later strongly attacked any standard choice. Neyman-Pearson, in their original formulation of 1933, recommended a balance between the two kinds of error (i.e., between level and power). For a discussion of how to achieve such a balance, see, for example, Sanathanan (1974). Both level and power should of course be considered conditionally whenever conditioning is deemed appropriate. Unfortunately, this is not possible at the planning stage.
2. A second point on which there appears to be no conflict between the two approaches is "truth in advertising." Even if a particular nominal level $\alpha$, say 5%, is the target, when it cannot be achieved because of discreteness the test should not just be described as conservative or liberal relative to the nominal level; instead, the actual (conditional or unconditional) level should be stated. If this level is not known because it depends on unknown parameters, at least its range should be given and, if possible, also an estimated value.
3. In both the 2 × 2 example and the Behrens-Fisher problems, the conflict between the solutions proposed by the two schools is often discussed as that of a desire for a similar test (i.e., one for which the unconditional level is $= \alpha$) versus a suitable conditional test. The issue becomes clearer if one asks for the reason that Neyman-Pearson proposed the condition of similarity. The explanation begins with the case of a simple hypothesis where these authors take it for granted that in order to maximize the power, one would want the attained level to be equal to rather than less than $\alpha$. For a composite hypothesis H, they therefore stated that the level should equal $\alpha$ for each of the simple hypotheses making up H. The requirement for similarity thus has its origin in the desire to maximize power, the issue discussed in Section 5.
In the light of (1) and (2), a unified theory less concerned with standard nominal levels might jettison not only the demand for similarity but also that of conservatism relative to a nominal level.
When similarity cannot be achieved and conservatism is not required, various compromise solutions may be available. Thus in the 2 × 2 case of Example 2, one could, for example, select for each $t$ the conditional level closest to $\alpha$. If this seems too permissive, then the rule could be modified by adding a cap on the conditional level beyond which one would not go. Tests with a variable conditional level that will sometimes be $<\alpha$ and sometimes $>\alpha$ have been discussed by Barnard (1989) under the name "flexible Fisher." Alternatively, one might give up on a nominal level altogether and instead for each $t$ adjust the level to the attainable (conditional) power.
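The first compromise mentioned above, choosing for each t the attainable conditional level closest to alpha and optionally capping it, can be sketched as follows. This is one possible reading of the rule, not Barnard's procedure; the function names, the optional `cap` argument, and the example values (m = n = 5, t = 5, alpha = 0.06) are assumptions made for illustration.

```python
from math import comb


def conditional_pmf(y, t, m, n):
    """P(Y = y | X + Y = t) under p1 = p2: a hypergeometric probability."""
    if y < max(0, t - m) or y > min(n, t):
        return 0.0
    return comb(n, y) * comb(m, t - y) / comb(m + n, t)


def attainable_levels(t, m, n):
    """Pairs (c, P[U >= c | T = t]) for every possible cutoff c."""
    levels = [(min(n, t) + 1, 0.0)]  # empty rejection region
    tail = 0.0
    for y in range(min(n, t), max(0, t - m) - 1, -1):
        tail += conditional_pmf(y, t, m, n)
        levels.append((y, tail))
    return levels


def closest_level_cutoff(t, m, n, alpha, cap=None):
    """Cutoff whose conditional level is closest to alpha, never exceeding cap."""
    candidates = [
        (c, level)
        for c, level in attainable_levels(t, m, n)
        if cap is None or level <= cap
    ]
    return min(candidates, key=lambda pair: abs(pair[1] - alpha))


print(closest_level_cutoff(t=5, m=5, n=5, alpha=0.06))
print(closest_level_cutoff(t=5, m=5, n=5, alpha=0.06, cap=0.10))
```

In this example the uncapped rule picks a conditional level above the nominal value, while the capped rule falls back to the conservative cutoff.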
The situation is much more complicated for the Behrens-Fisher problem. On the one hand, the argument for conditioning on the ratio of the sample variances seems less compelling; on the other hand, even if this conditioning requirement is accepted, the conditional distribution depends on unknown parameters, and thus it is less clear how to control the conditional level. Robinson's formulation, mentioned in Section 7, provides an interesting possibility but requires much further investigation. But such work can be carried out from the present point of view by combining considerations of both conditioning and power.
To summarize, p values, fixed-level significance statements, conditioning, and power considerations can be combined into a unified approach. When long-term power and conditioning are in conflict, specification of the appropriate frame of reference takes priority, because it determines the meaning of the probability statements. A fundamental gap in the theory is the lack of clear principles for selecting the appropriate framework. Additional work in this area will have to come to terms with the fact that the decision in any particular situation must be based not only on abstract principles but also on contextual aspects.
[Received January 1992. Revised February 1993.]
REFERENCES
Agresti, A. (1992), "A Survey of Exact Inference for Contingency Tables" (with discussion), Statistical Science, 7, 131-177.
Barnard, G. A. (1989), "On Alleged Gains in Power From Lower p Values," Statistics in Medicine, 8, 1469-1477.
Barnett, V. (1982), Comparative Statistical Inference (2nd ed.), New York: John Wiley.
Bartlett, M. S. (1984), "Discussion of 'Tests of Significance for 2 × 2 Contingency Tables,' by F. Yates," Journal of the Royal Statistical Society, Ser. A, 147, 453.
Bennett, J. H. (1990), Statistical Inference and Analysis (Selected Correspondence of R. A. Fisher), Oxford, U.K.: Clarendon Press.
Braithwaite, R. B. (1953), Scientific Explanation, Cambridge, U.K.: Cambridge University Press.
Brown, L. (1967), "The Conditional Level of Student's t Test," Annals of Mathematical Statistics, 38, 1068-1071.
Carlson, R. (1976), "The Logic of Tests of Significance" (with discussion), Philosophy of Science, 43, 116-128.
Carnap, R. (1962), Logical Foundations of Probability (2nd ed.), Chicago: the University of Chicago Press.
Cowles, M., and Davis, C. (1982), "On the Origins of the .05 Level of Statistical Significance," American Psychologist, 37, 553-558.
Cox, D. R. (1958), "Some Problems Connected With Statistical Inference," Annals of Mathematical Statistics, 29, 357-372.
Cox, D. R., and Hinkley, D. V. (1974), Theoretical Statistics, London: Chapman and Hall.
Fisher, R. A. (1925) (10th ed., 1946), Statistical Methods for Research Workers, Edinburgh: Oliver & Boyd.
Fisher, R. A. (1932), "Inverse Probability and the Use of Likelihood," Proceedings of the Cambridge Philosophical Society, 28, 257-261.
Fisher, R. A. (1935a), "The Logic of Inductive Inference," Journal of the Royal Statistical Society, 98, 39-54.
Fisher, R. A. (1935b), "Statistical Tests," Nature, 136, 474.
Fisher, R. A. (1935c) (4th ed., 1947), The Design of Experiments, Edinburgh: Oliver & Boyd.
Fisher, R. A. (1939), "Student," Annals of Eugenics, 9, 1-9.
Fisher, R. A. (1947), The Design of Experiments (4th ed.), New York: Hafner Press.
Fisher, R. A. (1955), "Statistical Methods and Scientific Induction," Journal of the Royal Statistical Society, Ser. B, 17, 69-78.
Fisher, R. A. (1956), "On a Test of Significance in Pearson's Biometrika Tables (No. 11)," Journal of the Royal Statistical Society, Ser. B, 18, 56-60.
Fisher, R. A. (1958), "The Nature of Probability," Centennial Review, 2, 261-274.
This content downloaded on Tue, 8 Jan 2013 10:38:47 AMAll use subject to JSTOR Terms and Conditions
8/10/2019 The Fisher, Neyman-Pearson Theories of Testing Hypotheses
http://slidepdf.com/reader/full/the-fisher-neyman-pearson-theories-of-testing-hypotheses 9/9
Lehmann:
Theories
of
TestingiHypotheses
1249
(1959), "Mathematical
robability
n theNatural ciences,"
Tech-
nometrics, ,21-29.
(1960), "Scientific
houghtnd the
RefinementfHuman Reason,"
Journal f he
Operations esearch ociety
fJapan,3, 1 10.
(1973), Statistical ethods
nd
Scientific
nference,3rd d.) London:
Collins Macmillan.
Gigerenzer,
.,et l. 1989), The
Empire fChance,New York:
Cambridge
University ress.
Hacking, . (1965), Logic of Statistical
nference, ew
York: Cambridge
Universityress.
Hall,P.,and Selinger,
. 1986), "Statistical
ignificance:alancing
vidence
Against oubt,"
Australian ournal f
tatistics, 8,
354-370.
Hedges,
L., and Olkin,
.
(1985),
Statistical
Methods or Meta-Analysis,
Orlando,
FL:
Academic
Press.
Hockberg,., and
Tamhane,A. C. (1987),
Multiple omparisonrocedures,
NewYork:
JohnWiley.
Kendall,M. G. (1963),
"Ronald Aylmer
isher, 890-1962,"Biometrika,
50,
1-15.
Kyburg,
. E., Jr. 1974),
The Logical
Foundations
f
tatistical
nference,
Boston:
D.
Reidel.
Linhart,
H., and Zucchini,
W.
(1986),
Model Selection,
New York: John
Wiley.
Linssen,
H. N. (1991),
"A Table for olving
heBehrens-Fisherroblem,"
Statistics
nd
Probability etters, 1,
359-363.
Miller,
R. G.
(1981),
Simultaneous tatistical nference,2nd
ed.),
New
York: Springer-Verlag.
Morrison,
.
E.,and Henkel,
. E.
(1970),
The
ignificance
estControversy,
Chicago:
Aldine.
Neyman,
J. 1935), "Discussion
of Fisher 1935a)."
Journal
f
theRoyal
Statistical
ociety, 8,
74-75.
(1938), "L'Estimation
Statistique raitee
Comme un Probleme
Classique
de
Probabilite," ctualites
cientifiques
t
Industrielles,39,
25-57.
(1952),
Lectures
nd
ConJerences
n Mathematical tatistics
nd
Probability2nd
ed.),Graduate chool,
Washington,
.C.: U.S. Dept.
of
Agriculture.
(1955),
"The Problem
f nductive
nference,"
ommunications
n
Pure
and Applied
Mathematics, , 13-46.
(1956),
"Note on an
Article
y
Sir RonaldFisher,"Journal
f
he
Royal Statistical ociety,
er.B, 18,288-294.
(1957),
"
'Inductive ehavior'
s
a Basic
Concept
of
Philosophy
f
Science,"
Review f he nternational
tatisticalnstitute,5,
7-22.
(1961), "SilverJubilee fMy DisputeWithFisher,"Journal f he
Operations
esearch
ociety
fJapan,3,
145-154.
(1966), "Behavioristic
oints f
View on Mathematical tatistics,"
in
On PoliticalEconomy
nd
Econometrics:
ssays
in
Honour
of Oscar
Lange,
Warsaw:PolishScientific ublishers, p.
445-462.
(1976),
"Tests of
Statistical
ypotheses
nd Their Use
in
Studies
of Natural
Phenomena,"
Communications
n
Statistics,
art
A-Theory
and
Methods, ,
737-75
1.
(1977), "Frequentistrobability
nd
Frequentisttatistics,"
ynthese,
36,
97-131.
Neyman,J.,
nd
Pearson,
. S.
(1928),
"On the Use and Interpretation
f
CertainTest Criteria or
Purposes f
Statistical
nference," iometrika,
20A, 175-240, 263-294.
(1933a), "On the
Problem f the MostEfficientestsof Statistical
Hypotheses,"hilosophicalransactionsf he oyal ocietyf ondon,
Ser.
A, 231,
289-337.
(1933b), "TheTesting f
Statistical ypotheses
n Relation o
Prob-
abilitiesA
Priori," roceedingsf
he
Cambridge
hilosophicalociety,
29, 492-510.
Oakes,
M.
(1986),
Statistical
nference:
Comment
or
he ocial
nd
Be-
havioralciences,
ewYork: JohnWiley.
Pearson,
.
S. (1955), "Statistical
oncepts
n
Their
Relation o
Reality,"
Journalf
he
Royal tatisticalociety,
er.
B, 17,204-207.
(1962), "SomeThoughts n Statistical
nference,"
nnals
f
Math-
ematical
tatistics,3,
394-403.
(1974), "Memories f
the
mpact
of Fisher'sWork
n
the
1920's,"
Internationaltatistical
eview,2,
5-8.
Pearson, .
S.,
and
Hartley,
.
0.
(1954),
Biometrikaables
or
tatisticians
(TableNo. 11),New York:
Cambridge niversity
ress.
Pedersen, .G. (1978), "Fiducial
nference,"
nternationaltatistical
eview,
46,
147-170.
Robinson,G.
K.
(1976),
"Properties
f Student's and of the Behrens-
Fisher
olution o theTwo-MeansProblem,"
TheAnnals
f tatistics,,
963-971.
(1982), "Behrens-Fisherroblem,"
n
Encyclopedia
f Statistical
Sciences
Vol. 1,
ds. S. Kotz and
N. L.
Johnson),
ew
York:John
Wiley,
pp. 205-209.
Savage,L. J. 1976), "On Rereading . A. Fisher" with iscussion),
nnals
of tatistics,
, 441-500.
Schweder,
.
(1988), "A
Significance ersion f heBasic Neyman-Pearson
Theory or cientific
ypothesis esting," candanavianournalf ta-
tistics,5, 225-242.
Seidenfeld,
.
(1979),Philosophicalroblemsf tatistical
nference,oston:
D.
Reidel.
Spielman, . (1974),
"The
Logic of Tests of
Significance,"hilosophyf
Science,1,
211-226.
(1978),
"Statistical
ogma
and the
Logic of
Significance esting,"
Philosophyf cience, 5,120-135.
Steger, .A. (ed.) (1971),
Readings
n
tatistics
or
he
ehavioralcientist,
New
York: Holt, Rinehart nd Winston.
Stuart,
.,
and
Ord,
J.
K.
(1991),
Kendall's dvanced
heoryf tatistics,
Vol. I
(5th ed.), New York:OxfordUniversity
ress.
Tukey,J.
W.
(1960), "Conclusions s.
Decisions,"Technometrics,, 424-
432.
Upton,G. J.G.
(1982),
"A
Comparison
f Alternativeests for he2
X
2
Comparative rial,"
Journal
f
he
oyal
tatistical
ociety,
er.
A, 145,
86- 105.
Wallace,
D.
L.
(1980), "The Behrens-Fishernd
Fieller-Creasy roblems,"
in
R.
A.
Fisher: n
Application,
ds. S. E.
Fienberg nd
D. V.
Hinkley,
New
York:Springer-Verlag,
p.
119-147.
Yates,
F.
(1984), "TestsofSignificanceor
X
2
Contingency
ables"
with
discussion),
Journal
f
he
Royal
tatistical
ociety,
er.
A, 147,
426-
463.
Zabell, S.
L.
(1992), "R. A. Fisher nd the
FiducialArgument,"tatistical
Science,,
369-387.
This content downloaded on Tue 8 Jan 2013 10:38:47 AM