Transcript of ECE531 Lecture 4a: Neyman-Pearson Hypothesis Testing

  • ECE531 Lecture 4a: Neyman-Pearson Hypothesis Testing

    D. Richard Brown III

    Worcester Polytechnic Institute

    12-February-2009

  • ECE531 Lecture 4a: Neyman-Pearson Hypothesis Testing

    Hypothesis Testing: What We Know

    Bayesian decision rules: minimize the expected/weighted risk for a particular prior distribution.

    Minimax decision rules: minimize the worst-case risk exposure over all possible prior distributions.

    Example: To approve a new flu test, the FDA requires the test to have a false positive rate of no worse than 10%. Should we use the Bayes criterion? Should we use the minimax criterion? How do we assign a risk structure to this sort of problem?

    In many hypothesis testing problems, there is a fundamental asymmetry between the consequences of a false positive (decide H1 when the true state is x0) and a miss / false negative (decide H0 when the true state is x1).

  • ECE531 Lecture 4a: Neyman-Pearson Hypothesis Testing

    The Neyman-Pearson Criterion and Terminology

    For now, we will focus on simple binary hypothesis testing under the UCA.

    $R_0(\delta) = \mathrm{Prob}(\text{decide } H_1 \mid \text{state is } x_0) = P_{fp}$: the probability of false positive, or probability of false alarm,

    and

    $R_1(\delta) = \mathrm{Prob}(\text{decide } H_0 \mid \text{state is } x_1) = P_{fn}$: the probability of false negative, or missed detection.

    Definition

    The Neyman-Pearson criterion decision rule is given as

    $\delta_{NP} = \arg\min_\delta P_{fn}(\delta) \quad \text{subject to} \quad P_{fp}(\delta) \le \alpha$

    where $\alpha \in [0, 1]$ is called the significance level of the test.

  • ECE531 Lecture 4a: Neyman-Pearson Hypothesis Testing

    The N-P Criterion: 3 Coin Flips (q0 = 0.5, q1 = 0.8, α = 0.1)

    [Figure: conditional risk vectors $(R_0, R_1)$ for the 3-coin-flip problem, marking the deterministic rules D0, D8, D12, D14, and D15, the minimax CRV, and the N-P CRV at $R_0 = \alpha$.]

  • ECE531 Lecture 4a: Neyman-Pearson Hypothesis Testing

    Neyman-Pearson Hypothesis Testing Example

    Coin flipping problem with a probability of heads of either q0 = 0.5 or q1 = 0.8. We observe three flips of the coin and count the number of heads. We can form our conditional probability matrix

    $P = \begin{bmatrix} 0.125 & 0.008 \\ 0.375 & 0.096 \\ 0.375 & 0.384 \\ 0.125 & 0.512 \end{bmatrix}$

    where $P_{\ell j} = \mathrm{Prob}(\text{observe } \ell \text{ heads} \mid \text{state is } x_j)$.

    Suppose we need a test with a significance level of α = 0.125.

    What is the N-P decision rule in this case?

    What is the probability of correct detection if we use this N-P decision rule?

    What happens if we relax the significance level to α = 0.5?
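    As a sanity check, P can be reproduced numerically. Below is a minimal sketch (assuming numpy and scipy are available; it is not part of the original slides) that builds P from the two binomial pmfs:

    ```python
    import numpy as np
    from scipy.stats import binom

    n = 3                     # number of coin flips
    q0, q1 = 0.5, 0.8         # probability of heads under each hypothesis
    heads = np.arange(n + 1)  # possible observations: 0, 1, 2, or 3 heads

    # Column j holds Prob(observe ell heads | state is x_j)
    P = np.column_stack([binom.pmf(heads, n, q0), binom.pmf(heads, n, q1)])
    print(P)
    # [[0.125 0.008]
    #  [0.375 0.096]
    #  [0.375 0.384]
    #  [0.125 0.512]]
    ```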

  • ECE531 Lecture 4a: Neyman-Pearson Hypothesis Testing

    Intuition: The Hiker

    You are going on a hike and you have a budget of $5 to buy food for the hike. The general store has the following food items for sale:

    One box of crackers: $1 and 60 calories

    One candy bar: $2 and 200 calories

    One bag of potato chips: $2 and 160 calories

    One bag of nuts: $3 and 270 calories

    You would like to purchase the maximum calories subject to your $5 budget. What should you buy?

    What if there were two candy bars available?

    The idea here is to rank the items by decreasing value (calories per dollar) and then purchase items with the most value until all the money is spent.

    The final purchase may only need to be a fraction of an item.

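    The greedy ranking described above is the classic fractional-knapsack strategy. A minimal sketch (not from the original slides; item data taken from the slide) is:

    ```python
    def greedy_calories(items, budget):
        """items: list of (name, cost, calories). Returns the max calories purchasable."""
        total = 0.0
        # Rank by value density (calories per dollar), best first, then buy greedily.
        for name, cost, cal in sorted(items, key=lambda t: t[2] / t[1], reverse=True):
            if budget <= 0:
                break
            frac = min(1.0, budget / cost)  # the final purchase may be fractional
            total += frac * cal
            budget -= frac * cost
        return total

    store = [("crackers", 1, 60), ("candy bar", 2, 200),
             ("potato chips", 2, 160), ("nuts", 3, 270)]
    print(greedy_calories(store, 5.0))  # candy bar ($2) + nuts ($3) = 470 calories
    ```

    The fractional final purchase plays the same role that randomization will play in the N-P decision rule below.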

  • ECE531 Lecture 4a: Neyman-Pearson Hypothesis Testing

    N-P Hypothesis Testing With Discrete Observations

    Basic idea: Sort the likelihood ratios $L_\ell = \frac{P_{\ell 1}}{P_{\ell 0}}$ by observation index $\ell$ in descending order. The order of $L_\ell$s with the same value doesn't matter. Now pick $v$ to be the smallest value such that

    $P_{fp} = \sum_{\ell\,:\,L_\ell > v} P_{\ell 0} \le \alpha$

    This defines a deterministic decision rule

    $\delta_v(y) = \begin{cases} 1 & L_y > v \\ 0 & \text{otherwise} \end{cases}$

    If we can find a value of $v$ such that $P_{fp} = \sum_{\ell\,:\,L_\ell > v} P_{\ell 0} = \alpha$ then we are done. The probability of detection is then

    $P_D = \sum_{\ell\,:\,L_\ell > v} P_{\ell 1}$.

  • ECE531 Lecture 4a: Neyman-Pearson Hypothesis Testing

    N-P Hypothesis Testing With Discrete Observations

    If we cannot find a value of $v$ such that $P_{fp} = \sum_{\ell\,:\,L_\ell > v} P_{\ell 0} = \alpha$, then it must be the case that, for any $\epsilon > 0$,

    $P_{fp}(v) = \sum_{\ell\,:\,L_\ell > v} P_{\ell 0} < \alpha \quad \text{and} \quad P_{fp}(v-\epsilon) = \sum_{\ell\,:\,L_\ell > v-\epsilon} P_{\ell 0} > \alpha$

    [Figure: staircase plot of $P_{fp}$ versus the threshold, showing $P_{fp}(v) < \alpha < P_{fp}(v-\epsilon)$.]

    In this case, we must randomize between $\delta_v$ and $\delta_{v-\epsilon}$.

  • ECE531 Lecture 4a: Neyman-Pearson Hypothesis Testing

    N-P Randomization

    We form the usual convex combination between $\delta_v$ and $\delta_{v-\epsilon}$ as

    $\tilde{\delta} = (1-\gamma)\,\delta_v + \gamma\,\delta_{v-\epsilon}$

    for $\gamma \in [0, 1]$. The false positive probability is then

    $P_{fp} = (1-\gamma)\,P_{fp}(v) + \gamma\,P_{fp}(v-\epsilon)$

    Setting this equal to $\alpha$ and solving for $\gamma$ yields

    $\gamma = \frac{\alpha - P_{fp}(v)}{P_{fp}(v-\epsilon) - P_{fp}(v)} = \frac{\alpha - \sum_{\ell\,:\,L_\ell > v} P_{\ell 0}}{\sum_{\ell\,:\,L_\ell = v} P_{\ell 0}}$

  • ECE531 Lecture 4a: Neyman-Pearson Hypothesis Testing

    N-P Decision Rule With Discrete Observations

    The Neyman-Pearson decision rule for simple binary hypothesis testing with discrete observations is then:

    $\delta_{NP}(y) = \begin{cases} 1 & \text{if } L(y) > v \\ \gamma & \text{if } L(y) = v \\ 0 & \text{if } L(y) < v \end{cases}$

    where

    $L(y) := \frac{\mathrm{Prob}(\text{observe } y \mid \text{state is } x_1)}{\mathrm{Prob}(\text{observe } y \mid \text{state is } x_0)} = \frac{P_{y1}}{P_{y0}}$

    and $v \ge 0$ is the minimum value such that

    $P_{fp} = \sum_{\ell\,:\,L_\ell > v} P_{\ell 0} \le \alpha$.
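    This construction translates directly into code. Below is a minimal sketch (assuming numpy; it is not part of the original slides) that finds the threshold $v$, the randomization constant $\gamma$, and the resulting $P_D$ for any two-column conditional probability matrix:

    ```python
    import numpy as np

    def np_rule(P, alpha):
        """Discrete-observation Neyman-Pearson rule.

        P: (num observations x 2) matrix, column j = Prob(y | state x_j).
        Returns (v, gamma, PD). Assumes P[:, 0] > 0 everywhere.
        """
        L = P[:, 1] / P[:, 0]  # likelihood ratio for each observation
        # Smallest threshold with P_fp(v) <= alpha; it suffices to check the
        # distinct likelihood-ratio values.
        v = min(u for u in np.unique(L) if P[L > u, 0].sum() <= alpha)
        pfp = P[L > v, 0].sum()
        # Randomize on the tie set {L == v} to use up the remaining significance.
        gamma = (alpha - pfp) / P[L == v, 0].sum() if pfp < alpha else 0.0
        PD = P[L > v, 1].sum() + gamma * P[L == v, 1].sum()
        return v, gamma, PD
    ```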

  • ECE531 Lecture 4a: Neyman-Pearson Hypothesis Testing

    Example: 10 Coin Flips

    Coin flipping problem with a probability of heads of either q0 = 0.5 or q1 = 0.8. We observe ten flips of the coin and count the number of heads.

    $P = \begin{bmatrix} 0.0010 & 0.0000 \\ 0.0098 & 0.0000 \\ 0.0439 & 0.0001 \\ 0.1172 & 0.0008 \\ 0.2051 & 0.0055 \\ 0.2461 & 0.0264 \\ 0.2051 & 0.0881 \\ 0.1172 & 0.2013 \\ 0.0439 & 0.3020 \\ 0.0098 & 0.2684 \\ 0.0010 & 0.1074 \end{bmatrix} \quad \text{and} \quad L = \begin{bmatrix} 0.0001 \\ 0.0004 \\ 0.0017 \\ 0.0067 \\ 0.0268 \\ 0.1074 \\ 0.4295 \\ 1.7180 \\ 6.8719 \\ 27.4878 \\ 109.9512 \end{bmatrix}$

    What are $v$, $\gamma$, and $\delta_{NP}(y)$ when $\alpha = 0.001$, $\alpha = 0.01$, and $\alpha = 0.1$?
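    As a worked illustration (assuming the np_rule sketch from earlier is in scope), the questions above can be answered numerically:

    ```python
    import numpy as np
    from scipy.stats import binom

    k = np.arange(11)  # number of heads in ten flips
    P = np.column_stack([binom.pmf(k, 10, 0.5), binom.pmf(k, 10, 0.8)])
    for alpha in (0.001, 0.01, 0.1):
        v, gamma, PD = np_rule(P, alpha)
        print(f"alpha={alpha}: v={v:.4f}, gamma={gamma:.4f}, PD={PD:.4f}")
    ```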

  • ECE531 Lecture 4a: Neyman-Pearson Hypothesis Testing

    Example: Randomized vs. Deterministic Decision Rules

    [Figure: two panels plotted against the significance level $\alpha$ on a log scale ($10^{-4}$ to $10^0$). Top panel: threshold $v$ versus $\alpha$. Bottom panel: probability of detection versus $\alpha$ for randomized decision rules and for deterministic decision rules.]

  • ECE531 Lecture 4a: Neyman-Pearson Hypothesis Testing

    Example: Same Results Except Linear Scale

    [Figure: the same randomized vs. deterministic comparison, with both axes on a linear scale from 0 to 1.]

  • ECE531 Lecture 4a: Neyman-Pearson Hypothesis Testing

    Remarks: 1 of 3

    The blue line on the previous slide is called the Receiver Operating Characteristic (ROC). An ROC plot shows the probability of detection $P_D = 1 - R_1$ as a function of $\alpha = R_0$. The ROC plot is directly related to our conditional risk vector plot.

    [Figure: left panel, conditional risk vectors with $R_0 = \Pr(\text{false positive})$ on the horizontal axis and $R_1 = \Pr(\text{false negative})$ on the vertical axis; right panel, the corresponding ROC curve with $1 - R_1 = \Pr(\text{true positive})$ on the vertical axis.]
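    Under the same assumptions as the earlier sketches (np_rule and the ten-flip matrix P in scope), the ROC can be traced by sweeping the significance level:

    ```python
    import numpy as np

    alphas = np.linspace(1e-4, 1.0, 200)
    # Each row is a (P_fp, P_D) pair for the N-P rule at that significance level.
    roc = np.array([(a, np_rule(P, a)[2]) for a in alphas])
    ```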

  • ECE531 Lecture 4a: Neyman-Pearson Hypothesis Testing

    Remarks: 2 of 3

    [Figure: ROC curve with $R_0 = \Pr(\text{false positive})$ on the horizontal axis and $1 - R_1 = \Pr(\text{detection})$ on the vertical axis.]

    The N-P criterion seeks a decision rule that maximizes the probability of detection subject to the constraint that the probability of false alarm must be no greater than $\alpha$:

    $\delta_{NP} = \arg\max_\delta P_D(\delta) \quad \text{s.t.} \quad P_{fp}(\delta) \le \alpha$

    The term power is often used instead of probability of detection. The N-P decision rule is sometimes called the most powerful test of significance level $\alpha$.

    Intuitively, we can expect that the power of a test will increase with the significance level of the test.

  • ECE531 Lecture 4a: Neyman-Pearson Hypothesis Testing

    Remarks: 3 of 3

    Like Bayes and minimax, the N-P decision rule for simple binary hypothesis testing problems is just a likelihood ratio comparison (possibly with randomization).

    Can the same intuition that you developed for the discrete observation case be applied in the continuous observation case?

    Form $L(y) = \frac{p_1(y)}{p_0(y)}$. Find the smallest $v$ such that the decision rule

    $\delta_{NP}(y) = \begin{cases} 1 & \text{if } L(y) > v \\ \gamma & \text{if } L(y) = v \\ 0 & \text{if } L(y) < v \end{cases}$

    has $P_{fp} \le \alpha$. The answer is yes, but we need to formalize this claim by understanding the fundamental Neyman-Pearson lemma...

  • ECE531 Lecture 4a: Neyman-Pearson Hypothesis Testing

    The Neyman-Pearson Lemma: Part 1 of 3: Optimality

    Recall $p_j(y)$ for $j \in \{0, 1\}$ and $y \in \mathcal{Y}$ is the conditional pmf or pdf of the observation $y$ given that the state is $x_j$.

    Lemma

    Let $\delta$ be any decision rule satisfying $P_{fp}(\delta) \le \alpha$ and let $\delta'$ be any decision rule of the form

    $\delta'(y) = \begin{cases} 1 & \text{if } p_1(y) > v\,p_0(y) \\ \gamma(y) & \text{if } p_1(y) = v\,p_0(y) \\ 0 & \text{if } p_1(y) < v\,p_0(y) \end{cases}$

    where $v \ge 0$ and $0 \le \gamma(y) \le 1$ are such that $P_{fp}(\delta') = \alpha$. Then $P_D(\delta') \ge P_D(\delta)$.

  • ECE531 Lecture 4a: Neyman-Pearson Hypothesis Testing

    The Neyman-Pearson Lemma: Part 1 of 3: Optimality

    Proof.

    By the definitions of $\delta$ and $\delta'$, we always have $[\delta'(y) - \delta(y)][p_1(y) - v\,p_0(y)] \ge 0$. Hence

    $\int_{\mathcal{Y}} [\delta'(y) - \delta(y)][p_1(y) - v\,p_0(y)]\, dy \ge 0$

    Rearranging terms, we can write

    $\int_{\mathcal{Y}} \delta'(y)\,p_1(y)\, dy - \int_{\mathcal{Y}} \delta(y)\,p_1(y)\, dy \ge v\left[ \int_{\mathcal{Y}} \delta'(y)\,p_0(y)\, dy - \int_{\mathcal{Y}} \delta(y)\,p_0(y)\, dy \right]$

    $P_D(\delta') - P_D(\delta) \ge v\left[ P_{fp}(\delta') - P_{fp}(\delta) \right]$

    $P_D(\delta') - P_D(\delta) \ge v\left[ \alpha - P_{fp}(\delta) \right]$

    But $v \ge 0$ and $P_{fp}(\delta) \le \alpha$ implies that the RHS is non-negative. Hence $P_D(\delta') \ge P_D(\delta)$.

  • ECE531 Lecture 4a: Neyman-Pearson Hypothesis Testing

    The Neyman-Pearson Lemma: Part 2 of 3: Existence

    Lemma

    For every $\alpha \in [0, 1]$ there exists a decision rule $\delta_{NP}$ of the form

    $\delta_{NP}(y) = \begin{cases} 1 & \text{if } p_1(y) > v\,p_0(y) \\ \gamma(y) & \text{if } p_1(y) = v\,p_0(y) \\ 0 & \text{if } p_1(y) < v\,p_0(y) \end{cases}$

    where $v \ge 0$ and $\gamma(y) = \gamma \in [0, 1]$ (a constant) such that $P_{fp}(\delta_{NP}) = \alpha$.

  • ECE531 Lecture 4a: Neyman-Pearson Hypothesis Testing

    The Neyman-Pearson Lemma: Part 2 of 3: Existence

    Proof by construction.

    Let $\eta \ge 0$, $\mathcal{Y}_\eta = \{y \in \mathcal{Y} : p_1(y) > \eta\,p_0(y)\}$ and $\mathcal{Z}_\eta = \{y \in \mathcal{Y} : p_1(y) = \eta\,p_0(y)\}$. For $\eta_2 \ge \eta_1$, $\mathcal{Y}_{\eta_2} \subseteq \mathcal{Y}_{\eta_1}$ and

    $\int_{\mathcal{Y}_{\eta_2}} p_0(y)\, dy \le \int_{\mathcal{Y}_{\eta_1}} p_0(y)\, dy.$

    Let $v$ be the smallest value of $\eta$ such that

    $\int_{\mathcal{Y}_v} p_0(y)\, dy \le \alpha$

    Choose, mirroring the discrete randomization formula,

    $\gamma = \frac{\alpha - \int_{\mathcal{Y}_v} p_0(y)\, dy}{\int_{\mathcal{Z}_v} p_0(y)\, dy}$

    so that $P_{fp}(\delta_{NP}) = \int_{\mathcal{Y}_v} p_0(y)\, dy + \gamma \int_{\mathcal{Z}_v} p_0(y)\, dy = \alpha$.

    (Part 3 of 3, Uniqueness: any decision rule $\delta$ with $P_{fp}(\delta) \le \alpha$ and $P_D(\delta) = P_D(\delta_{NP})$ must share the same value as $\delta_{NP}(y)$ for (almost) every $y$ with $p_1(y) \ne v\,p_0(y)$.)

  • ECE531 Lecture 4a: Neyman-Pearson Hypothesis Testing

    Example: Coherent Detection of BPSK

    Suppose a transmitter sends one of two scalar signals $a_0$ or $a_1$ and the signals arrive at a receiver corrupted by zero-mean additive white Gaussian noise (AWGN) with variance $\sigma^2$. We want to use N-P hypothesis testing to maximize

    $P_D = \mathrm{Prob}(\text{decide } H_1 \mid a_1 \text{ was sent})$

    subject to the constraint

    $P_{fp} = \mathrm{Prob}(\text{decide } H_1 \mid a_0 \text{ was sent}) \le \alpha.$

    Signal model conditioned on state $x_j$:

    $Y = a_j + w$

    where $a_j$ is the scalar signal and $w \sim \mathcal{N}(0, \sigma^2)$. Hence

    $p_j(y) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left( -\frac{(y - a_j)^2}{2\sigma^2} \right)$

  • ECE531 Lecture 4a: Neyman-Pearson Hypothesis Testing

    Example: Coherent Detection of BPSK

    How should we approach this problem? We know from the N-P Lemma that the optimum decision rule will be of the form

    $\delta_{NP}(y) = \begin{cases} 1 & \text{if } p_1(y) > v\,p_0(y) \\ \gamma(y) & \text{if } p_1(y) = v\,p_0(y) \\ 0 & \text{if } p_1(y) < v\,p_0(y) \end{cases}$

    where $v \ge 0$ and $0 \le \gamma(y) \le 1$ are such that $P_{fp}(\delta_{NP}) = \alpha$. How should we choose our threshold $v$? We need to find the smallest $v$ such that

    $\int_{\mathcal{Y}_v} p_0(y)\, dy \le \alpha$

    where $\mathcal{Y}_v = \{y \in \mathcal{Y} : p_1(y) > v\,p_0(y)\}$.

  • ECE531 Lecture 4a: Neyman-Pearson Hypothesis Testing

    Example: Likelihood Ratio for $a_0 = 0$, $a_1 = 1$, $\sigma^2 = 1$

    [Figure: likelihood ratio $L(y) = p_1(y)/p_0(y)$ versus the observation $y$; $L(y)$ increases monotonically over the plotted range.]

  • ECE531 Lecture 4a: Neyman-Pearson Hypothesis Testing

    Example: Coherent Detection of BPSK

    Note that, since $a_1 > a_0$, the likelihood ratio $L(y) = \frac{p_1(y)}{p_0(y)}$ is monotonically increasing. This means that finding $v$ is equivalent to finding a threshold $\tau$ so that

    $\int_\tau^\infty p_0(y)\, dy = Q\!\left( \frac{\tau - a_0}{\sigma} \right) \le \alpha$

    [Figure: the two conditional densities $p_0(y)$ and $p_1(y)$, centered at $a_0$ and $a_1$, with the threshold $\tau$ between them.]

    How are $\tau$ and $v$ related? Once we find $\tau$, we can determine $v$ by computing

    $v = L(\tau) = \frac{p_1(\tau)}{p_0(\tau)}.$

  • ECE531 Lecture 4a: Neyman-Pearson Hypothesis Testing

    Example: Coherent Detection of BPSK: Finding $\tau$

    Unfortunately, no closed-form solution exists for the inverse of a Q-function. We can use the fact that $Q(x) = \frac{1}{2}\mathrm{erfc}\!\left(\frac{x}{\sqrt{2}}\right)$ to write

    $Q\!\left( \frac{\tau - a_0}{\sigma} \right) = \frac{1}{2}\mathrm{erfc}\!\left( \frac{\tau - a_0}{\sqrt{2}\,\sigma} \right) \le \alpha.$

    This can be rewritten as

    $\tau \ge \sqrt{2}\,\sigma\,\mathrm{erfc}^{-1}(2\alpha) + a_0$

    and we can use Matlab's handy erfcinv function to compute the lower bound on $\tau$. It turns out that we are always going to be able to find a value of $\tau$ such that

    $Q\!\left( \frac{\tau - a_0}{\sigma} \right) = \alpha$

    so we won't have to worry about randomization here.
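    As a numerical companion (a sketch in Python rather than Matlab, assuming scipy is available; not part of the original slides), the threshold $\tau$, the equivalent likelihood-ratio threshold $v = L(\tau)$, and the resulting $P_D$ can be computed as:

    ```python
    import numpy as np
    from scipy.special import erfcinv
    from scipy.stats import norm

    def bpsk_np(a0, a1, sigma, alpha):
        """N-P detector for a scalar signal in AWGN (assumes a1 > a0)."""
        tau = np.sqrt(2.0) * sigma * erfcinv(2.0 * alpha) + a0    # y-domain threshold
        v = norm.pdf(tau, a1, sigma) / norm.pdf(tau, a0, sigma)   # v = L(tau)
        PD = norm.sf((tau - a1) / sigma)                          # Q((tau - a1)/sigma)
        return tau, v, PD

    print(bpsk_np(a0=0.0, a1=1.0, sigma=1.0, alpha=0.1))
    ```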

  • ECE531 Lecture 4a: Neyman-Pearson Hypothesis Testing

    Example: Coherent Detection of BPSK

    [Figure: $P_{fp}$ and $P_D$ versus the threshold $\tau$ (roughly $-1$ to $3$), with probability on a log scale ($10^{-3}$ to $10^0$).]

  • ECE531 Lecture 4a: Neyman-Pearson Hypothesis Testing

    Example: Coherent Detection of BPSK

    [Figure: the threshold $v$ plotted against the significance level $\alpha$ on a log scale from $10^{-3}$ to $10^0$.]

  • ECE531 Lecture 4a: Neyman-Pearson Hypothesis Testing

    Final Comments on Neyman-Pearson Hypothesis Testing

    1. N-P decision rules are useful in asymmetric risk scenarios or in scenarios where one has to guarantee a certain probability of false detection.

    2. N-P decision rules are simply likelihood ratio comparisons, just like Bayes and minimax. The comparison threshold in this case is chosen to satisfy the significance level constraint.

    3. Like minimax, randomization is often necessary for N-P decision rules. Without randomization, the power of the test may not be maximized for the significance level constraint.

    4. The original N-P paper: J. Neyman and E. S. Pearson, "On the Problem of the Most Efficient Tests of Statistical Hypotheses," Philosophical Transactions of the Royal Society of London, Series A, Containing Papers of a Mathematical or Physical Character, Vol. 231 (1933), pp. 289-337. Available on jstor.org.