Harvard Government 2000 Lecture 3


  • 8/14/2019 Harvard Government 2000 Lecture 3


    Gov2000: Quantitative Methodology for Political Science I

    Lecture 3: Univariate Statistical Inference

    October 1, 2007


    Outline

    1 Point Estimation
        Sampling Distributions for Point Estimators
        Small Sample Properties
        Large Sample Properties

    2 Interval Estimation
        Sampling Distributions for Interval Estimators
        Small Sample Properties
        Large Sample Properties

    3 Testing
        Some Statistical Decision Theory
        Sampling Distributions for Test Statistics
        p-Values, Rejection Regions, and CIs


    Point Estimation

    Suppose we are primarily interested in specific characteristics of the population distribution.

    A parameter is a characteristic of the population distribution (e.g. the mean), and is often denoted with a Greek letter (e.g. $\mu$).

    A statistic is a function of the sample.

    Often we use a statistic to estimate (or guess) the value of a parameter, and we will denote this with a hat (e.g. $\hat{\mu}$). Such estimation is known as point estimation.

    Point estimators, written as $\hat{\mu}$ or maybe $\bar{X}$, are random quantities.

    Point estimates are realized values of an estimator, and hence they are not random (e.g. $\bar{x}$).


    Consider income data from the 1996 ANES

    [Figure: Histogram of income. x-axis: income (0 to 20); y-axis: density (0.00 to 0.08).]

    [Figure: Histogram of income with the population density overlaid. x-axis: income (0 to 20); y-axis: density (0.00 to 0.08).]

    The Balance Point for the Density

    We may not have enough data to get a good estimate of the density (infinite data histogram), but we may have enough data to estimate one characteristic (parameter) of the density. Often we choose the balance point as our parameter of interest.

    Also Known As:

    expected value

    population mean

    true mean

    true average

    infinite data average

    [Figure: Histogram of income with the density balance point marked. x-axis: income (0 to 20); y-axis: density (0.00 to 0.08).]

    Why the balance point?

    It is a reasonable measure for the center of the density.

    We have some intuition about balance points.

    The balance point tells us a lot about the normal density.

    Many intuitive estimators for the density balance point have properties that are easy to describe.


    Estimators for the Density Balance Point

    Some possibilities for $\hat{\mu}$:

    1 $Y_1$, the first data observation

    2 $\frac{1}{2}(Y_1 + Y_n)$, the average of the first and the last observations

    3 the number 7

    4 $\bar{Y}_n = \frac{1}{n}(Y_1 + \cdots + Y_n)$, the sample average

    Clearly, some of these estimators are better than others (which ones?), but how can we define "better"?


    Sampling Distributions of Point Estimators

    In order to assess the properties of an estimator, we assume it has a distribution under repeated sampling, and we call this distribution a sampling distribution.

    Illustrative Example:

    X = the number of times a respondent voted in the last two presidential elections.

    We will assume three possible values {0,1,2}

    Assume P(x) =


    ANES Example

    If we think of the data as randomly sampled from a density, then $Y_1, \ldots, Y_n$ are independent and identically distributed (i.i.d.) random variables with

    $$E[Y_i] = \mu$$

    $$V[Y_i] = \sigma^2$$

    Then $\hat{\mu}$, which is a function of $Y_1, \ldots, Y_n$, will be a random variable with its own expectation and variance.


    How to draw a sampling distribution for $\hat{\mu}$:

    1 sample an infinite number of data sets of size n

    2 calculate $\hat{\mu}$ for each data set

    3 form an infinite data histogram for $\hat{\mu}$, where the data are the $\hat{\mu}$s from each data set

    The next slide shows an approximation of this procedure for the four proposed estimators. I simulated 10,000 data sets of size n from the density shown at the beginning of the lecture notes.
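    The simulation procedure can be sketched in a few lines of Python. This is only an illustration, not the code behind the slides: the population is a hypothetical Gamma distribution (mean 16, variance 64) and the sample size n = 30 is an assumption, since the lecture specifies neither. The estimator names follow the figure panel labels.

```python
import random
import statistics

def simulate_estimators(n=30, reps=10_000, seed=0):
    """Approximate the sampling distributions of the four estimators by simulation."""
    rng = random.Random(seed)
    draws = {"muHat1": [], "muHat2": [], "muHat3": [], "muHat4": []}
    for _ in range(reps):
        # Hypothetical population: Gamma(shape=4, scale=4), so mu = 16, sigma^2 = 64.
        y = [rng.gammavariate(4.0, 4.0) for _ in range(n)]
        draws["muHat1"].append(y[0])                  # first observation
        draws["muHat2"].append(0.5 * (y[0] + y[-1]))  # average of first and last
        draws["muHat3"].append(7.0)                   # the number 7
        draws["muHat4"].append(sum(y) / n)            # sample average
    return draws

draws = simulate_estimators()
for name, values in draws.items():
    mean = statistics.fmean(values)
    var = statistics.pvariance(values)
    print(f"{name}: mean of estimates = {mean:.2f}, variance of estimates = {var:.2f}")
```

    A histogram of each list in `draws` approximates the corresponding sampling distribution; the printed means and variances summarize its center and spread.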


    [Figure: simulated sampling distributions of the four estimators, in panels muHat1 (density), muHat2 (density), muHat3 (point mass), and muHat4 (density).]

    Bias

    Bias is the expected difference between the estimator and the parameter. Bias is not the difference between an estimate and the parameter.

    $$\mathrm{Bias}(\hat{\theta}) = E[\hat{\theta} - \theta] = E[\hat{\theta}] - \theta$$

    For example, the sample mean is an unbiased estimator for $\mu$.

    $$\mathrm{Bias}(\bar{X}_n) = E[\bar{X}_n - E[X]] = E[\bar{X}_n] - \mu = 0$$


    Example

    1 $E[Y_1] = \mu$

    2 $E[\frac{1}{2}(Y_1 + Y_n)] = \frac{1}{2}(\mu + \mu) = \mu$

    3 $E[7] = 7$

    4 $E[\bar{Y}_n] = \frac{1}{n} n \mu = \mu$

    Estimators 1, 2, and 4 all get the right answer on average. Which is better?

    [Figure: the simulated sampling distributions of the four estimators again, in panels muHat1, muHat2, muHat3, and muHat4.]

    Election Example

    Let $\theta$ be the proportion of voters who will vote for the Republican candidate in the 2008 general election. Let's examine two estimators.

    1 $\hat{\theta}_1 = Y_1$, where $Y_1 = 1$ if the first respondent votes Republican and $0$ otherwise

    2 $\hat{\theta}_2$ = class guess

    Which is unbiased?

    Which do you prefer?


    Variance

    All else equal, we prefer estimators with small variance. In particular, if two estimators are unbiased, we prefer the estimator with the smaller variance.

    Low variance means that under repeated sampling, the estimates are likely to be similar.

    Note that this doesn't necessarily mean that a particular estimate is close to the true parameter value.

    Note also that the standard deviation from a sampling distribution is often called the standard error.


    Variance

    1 $V[Y_1] = \sigma^2$

    2 $V[\frac{1}{2}(Y_1 + Y_n)] = \frac{1}{4} V[Y_1 + Y_n] = \frac{1}{4}(\sigma^2 + \sigma^2) = \frac{1}{2}\sigma^2$

    3 $V[7] = 0$

    4 $V[\bar{Y}_n] = \frac{1}{n^2} n \sigma^2 = \frac{1}{n}\sigma^2$

    Among the unbiased estimators, the sample average has the smallest variance. This means that Estimator 4 (the sample average) is likely to be closer to the true value $\mu$ than Estimators 1 and 2.

    In order to fully understand this, it is helpful to again look at the sampling distributions.

    [Figure: the simulated sampling distributions of the four estimators again, in panels muHat1, muHat2, muHat3, and muHat4.]

    Properties and comparisons of the estimators

    Recall the definitions of the estimators:

    1 $Y_1$, the first data observation

    2 $\frac{1}{2}(Y_1 + Y_n)$, the average of the first and the last observations

    3 the number 7

    4 $\bar{Y}_n = \frac{1}{n}(Y_1 + \cdots + Y_n)$, the sample average

    From the pictures on the previous slide:

    Estimators 1, 2, and 4 are unbiased

    Estimator 3 has no variance

    Estimator 4 has the lowest variance among the unbiased estimators


    Least Squares Estimation

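    Choose $a$ to minimize the sum of the squared errors.

    $$\sum_{i=1}^{n} (x_i - a)^2 = \sum_{i=1}^{n} \{(x_i - \bar{x}) + (\bar{x} - a)\}^2$$

    $$= \sum_{i=1}^{n} \left\{ (x_i - \bar{x})^2 + 2(\bar{x} - a)(x_i - \bar{x}) + (\bar{x} - a)^2 \right\}$$

    $$= \sum_{i=1}^{n} (x_i - \bar{x})^2 + 2(\bar{x} - a) \sum_{i=1}^{n} (x_i - \bar{x}) + \sum_{i=1}^{n} (\bar{x} - a)^2$$

    $$= \sum_{i=1}^{n} (x_i - \bar{x})^2 + n(\bar{x} - a)^2$$

    The cross term vanishes because $\sum_{i=1}^{n} (x_i - \bar{x}) = 0$. The first remaining term does not depend on $a$, so the sum is minimized at $a = \bar{x}$.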
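    The decomposition can be checked numerically. The sketch below uses a small hypothetical data set and a coarse grid of candidate values for $a$; both are illustrative choices, not part of the lecture.

```python
def sum_squared_errors(xs, a):
    """Sum of squared deviations of the data from the candidate value a."""
    return sum((x - a) ** 2 for x in xs)

xs = [2.0, 3.0, 5.0, 10.0]   # hypothetical data
x_bar = sum(xs) / len(xs)    # sample mean

# Check the identity for one arbitrary candidate value.
a = 3.5
lhs = sum_squared_errors(xs, a)
rhs = sum_squared_errors(xs, x_bar) + len(xs) * (x_bar - a) ** 2

# Search a grid of candidates (0.00, 0.01, ..., 10.00) for the minimizer.
candidates = [i / 100 for i in range(0, 1001)]
best = min(candidates, key=lambda c: sum_squared_errors(xs, c))

print(lhs, rhs)       # the two sides of the identity
print(best, x_bar)    # the grid minimizer and the sample mean
```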


    Best Linear Unbiased Estimator for $\mu$

    Let $X_1, \ldots, X_n$ be i.i.d. $?(\mu, \sigma^2)$; then $\sum_{i=1}^{n} w_i X_i$ is a linear estimator for $\mu$.

    Show that $\bar{X}$ is the best linear unbiased estimator for $\mu$ (i.e. the linear unbiased estimator with the smallest variance).

    1 Use $E[\sum_{i=1}^{n} w_i X_i] = \mu$ to derive something about $\sum_{i=1}^{n} w_i$.

    2 Simplify $V[\sum_{i=1}^{n} w_i X_i]$.

    3 Write each $w_i$ in this simplified expression as $\frac{1}{n} + c_i$.

    4 ...
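    A numerical illustration may help build intuition before attempting the proof. The sketch below relies on the standard fact that, for i.i.d. $X_i$, $V[\sum_i w_i X_i] = \sigma^2 \sum_i w_i^2$; the sample size, the perturbations `c`, and the value of `sigma2` are hypothetical choices.

```python
import random

def linear_estimator_variance(weights, sigma2=1.0):
    """V[sum_i w_i X_i] = sigma^2 * sum_i w_i^2 when the X_i are i.i.d."""
    return sigma2 * sum(w * w for w in weights)

n = 5
equal = [1.0 / n] * n                    # weights of the sample average

rng = random.Random(1)
raw = [rng.uniform(-0.1, 0.1) for _ in range(n)]
shift = sum(raw) / n
c = [r - shift for r in raw]             # perturbations that sum to zero
perturbed = [1.0 / n + ci for ci in c]   # still sums to one

print(sum(perturbed))                        # weights still sum to one
print(linear_estimator_variance(equal))      # variance with equal weights
print(linear_estimator_variance(perturbed))  # variance with perturbed weights
```

    Any zero-sum perturbation of the equal weights leaves the weights summing to one but increases $\sum_i w_i^2$ by $\sum_i c_i^2$.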


    Mean Square Error

    MSE is the expected squared difference between the estimator and the parameter.

    MSE is not the squared difference between an estimate and the parameter.

    Furthermore, MSE can be written as the Bias squared plus the Variance.

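    $$\mathrm{MSE}(\hat{\theta}) = E[(\hat{\theta} - \theta)^2] = \mathrm{Bias}(\hat{\theta})^2 + V(\hat{\theta})$$

    For example, consider the sample mean.

    $$\mathrm{MSE}(\bar{X}_n) = E[(\bar{X}_n - \mu)^2] = \mathrm{Bias}(\bar{X}_n)^2 + V(\bar{X}_n) = 0 + V(\bar{X}_n)$$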


    Example

    Assume an i.i.d. sample and recall the two possible definitions of sample variance:

    $$S_{0n}^2 = \frac{1}{n} \sum_{i=1}^{n} (X_i - \bar{X}_n)^2$$

    $$S_{1n}^2 = \frac{1}{n - 1} \sum_{i=1}^{n} (X_i - \bar{X}_n)^2$$

    Which has less bias?

    Which has smaller variance?

    Which has smaller MSE?
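    One way to explore these questions is by simulation. The sketch below assumes a normal population with $\sigma = 2$ and samples of size $n = 10$; these are hypothetical choices, and the MSE comparison in particular can depend on the population distribution.

```python
import random
import statistics

def compare_variance_estimators(n=10, reps=20_000, sigma=2.0, seed=0):
    """Simulate bias, variance, and MSE of the 1/n and 1/(n-1) variance estimators."""
    rng = random.Random(seed)
    s0, s1 = [], []
    for _ in range(reps):
        x = [rng.gauss(0.0, sigma) for _ in range(n)]
        x_bar = sum(x) / n
        ss = sum((xi - x_bar) ** 2 for xi in x)
        s0.append(ss / n)          # S0: divide by n
        s1.append(ss / (n - 1))    # S1: divide by n - 1
    true_var = sigma ** 2
    summary = {}
    for name, est in (("S0", s0), ("S1", s1)):
        summary[name] = {
            "bias": statistics.fmean(est) - true_var,
            "variance": statistics.pvariance(est),
            "mse": statistics.fmean([(e - true_var) ** 2 for e in est]),
        }
    return summary

summary = compare_variance_estimators()
for name, stats in summary.items():
    print(name, {k: round(v, 3) for k, v in stats.items()})
```

    In each row of the output, the simulated MSE equals the squared bias plus the variance, which mirrors the decomposition of the MSE.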


    Asymptotic Unbiasedness

    $$E[\hat{\theta}_n] \to \theta \text{ as } n \to \infty$$

    [Figure: sampling distributions of $\hat{\theta}_n$ for n = 1, n = 10, and n = 100. x-axis: $\hat{\theta}$ (0 to 4); y-axis: density.]

    Consistency

    An estimator $\hat{\theta}_n$ is consistent if it converges in probability to the estimand (parameter of interest).

    $$\hat{\theta}_n \xrightarrow{p} \theta$$


    The Weak Law of Large Numbers Revisited

    If $X_1, X_2, \ldots, X_n, \ldots$ are i.i.d. with $-\infty < E[X_1] = \mu < \infty$, then $\bar{X}_n \xrightarrow{p} \mu$.

    [Figure: sampling distributions of $\bar{X}_n$ for n = 1, n = 10, and n = 100. x-axis: $\bar{X}_n$ (0 to 4); y-axis: density.]
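    Convergence in probability can be illustrated by estimating $P(|\bar{X}_n - \mu| < \varepsilon)$ for increasing n. The sketch below assumes a hypothetical Exponential population with $\mu = 2$ and a tolerance of $\varepsilon = 0.5$; neither comes from the lecture.

```python
import random

def share_within(n, eps=0.5, reps=2_000, seed=0):
    """Estimate P(|Xbar_n - mu| < eps) by simulation."""
    rng = random.Random(seed)
    mu = 2.0  # hypothetical population mean (Exponential with rate 1/mu)
    hits = 0
    for _ in range(reps):
        x_bar = sum(rng.expovariate(1.0 / mu) for _ in range(n)) / n
        hits += abs(x_bar - mu) < eps
    return hits / reps

shares = {n: share_within(n) for n in (1, 10, 100)}
for n, share in shares.items():
    print(f"n={n:>3}  share of sample means within 0.5 of mu: {share:.3f}")
```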


    Asymptotic Sampling Distribution

    An estimator $\hat{\theta}_n$, with possibly unknown sampling distribution, has asymptotic sampling distribution $F$ if

    1 $\hat{\theta}_n$ has a sampling distribution described by cdf $F_n$, and

    2 $F_n \xrightarrow{d} F$ as $n \to \infty$


    The Classical Central Limit Theorem

    If $X_1, X_2, \ldots, X_n, \ldots$ are i.i.d. with $E[X_1] = \mu$ and $V[X_1] = \sigma^2$ and $E|X|^2 < \infty$, then

    $$\sqrt{n}(\bar{X}_n - \mu) \xrightarrow{d} N(0, \sigma^2)$$

    [Figure: simulated sampling distributions of muHat4 (the sample average) for n = 1, n = 2, n = 10, and n = 30. y-axis: density.]
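    The same progression can be sketched in code by tracking how the skewness of the sample average shrinks toward zero (the skewness of a normal distribution) as n grows. The right-skewed Gamma population used here is a hypothetical stand-in for the income density, and skewness is only one rough summary of how close a distribution is to normal.

```python
import random
import statistics

def skewness(values):
    """Sample skewness: the average cubed z-score."""
    m = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return statistics.fmean([((v - m) / sd) ** 3 for v in values])

def sample_means(n, reps=5_000, seed=0):
    """Draw `reps` sample averages of size n from a hypothetical skewed population."""
    rng = random.Random(seed)
    # Hypothetical population: Gamma(shape=4, scale=4), which is right-skewed.
    return [
        sum(rng.gammavariate(4.0, 4.0) for _ in range(n)) / n
        for _ in range(reps)
    ]

skew_by_n = {n: skewness(sample_means(n)) for n in (1, 2, 10, 30)}
for n, sk in skew_by_n.items():
    print(f"n={n:>2}  skewness of the sample average: {sk:.2f}")
```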


    What is Interval Estimation?

    Point estimates attempt to predict a scalar parameter with a single number.

    We might want more information about the uncertainty in our estimate.

    We may want a bound for an estimate instead of trying to predict the parameter with a single number.

    Interval estimation accomplishes both of these goals. For a scalar parameter $\theta$, an interval estimator takes the following form:

    $$[\hat{\theta}_{\text{lower}}, \hat{\theta}_{\text{upper}}]$$

    where the lower and upper bounds are random quantities.

    An interval estimate is a realized value from an interval estimator. For example:

    $$\left[\bar{x} - 1.96 \frac{s}{\sqrt{n}},\ \bar{x} + 1.96 \frac{s}{\sqrt{n}}\right]$$

    where the lower and upper bounds are fixed quantities.


    Example: Party ID

    QUESTION:

    ---------

    Generally speaking, do you usually think of yourself as a

    REPUBLICAN, a DEMOCRAT, an INDEPENDENT, or what?

    Would you call yourself a STRONG [Democrat/Republican] or

    a NOT VERY STRONG [Democrat/Republican]?

    Do you think of yourself as CLOSER to the Republican

    Party or to the Democratic party?

    VALID CODES:

    ------------

    0. Strong Democrat (2/1/.)

    1. Weak Democrat (2/5-8-9/.)

    2. Independent-Democrat (3-4-5/./5)

    3. Independent-Independent

    (3/./3-8-9 ; 5/./3-8-9 if not apolitical)

    4. Independent-Republican (3-4-5/./1)

    5. Weak Republican (1/5-8-9/.)

    6. Strong Republican (1/1/.)


    Sampling Distribution for PID Interval Estimator

    Let X be a discrete random variable describing PID with the following distribution.

    x      0    1    2    3    4    5    6
    f(x)  .16  .15  .17  .10  .12  .14  .16

    Consider the following procedure.

    1 Take a random sample of size n.

    2 Construct an interval estimate for $\mu$ ($E[X]$) with the form $[\bar{x} - s, \bar{x} + s]$

    3 Repeat
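    This procedure can be sketched in Python using the PID distribution given above. The sample size n = 20 and the 10 repetitions are assumptions, since the slide does not state them, and `s` is taken to be the sample standard deviation.

```python
import random
import statistics

values = [0, 1, 2, 3, 4, 5, 6]
probs = [0.16, 0.15, 0.17, 0.10, 0.12, 0.14, 0.16]   # f(x) from the table
mu = sum(x * p for x, p in zip(values, probs))        # E[X]

def interval_estimates(n=20, reps=10, seed=0):
    """Repeat: draw a sample of size n and form [x_bar - s, x_bar + s]."""
    rng = random.Random(seed)
    intervals = []
    for _ in range(reps):
        sample = rng.choices(values, weights=probs, k=n)
        x_bar = statistics.fmean(sample)
        s = statistics.stdev(sample)   # sample standard deviation
        intervals.append((x_bar - s, x_bar + s))
    return intervals

intervals = interval_estimates()
for lower, upper in intervals:
    covers = lower <= mu <= upper
    print(f"[{lower:.2f}, {upper:.2f}]  contains E[X] = {mu:.2f}: {covers}")
```

    Each printed interval is one interval estimate; the collection of intervals across repetitions approximates the sampling distribution of the interval estimator.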


    Sampling Distribution for PID Interval Estimator

    [Figure: Interval Estimates from 10 repeated samples. x-axis: 0 to 6; y-axis: sample (1 to 10).]

    Example: Feeling Thermometer Scores

    ===========================================================================

    B1. INTRO THERMOMETERS PRE

    ===========================================================================

    Please look at page 2 of the booklet.

    I'd like to get your feelings toward some of our political

    leaders and other people who are in the news these days. I'll

    read the name of a person and I'd like you to rate that

    person using something we call the feeling thermometer.

    Ratings between 50 degrees and 100 degrees mean

    that you feel favorable and warm toward the person.

    Ratings between 0 degrees and 50 degrees mean that you

    don't feel favorable toward the person and that you

    don't care too much for that person. You would

    rate the person at the 50 degree mark if you don't feel

    particularly warm or cold toward the person.

    If we come to a person whose name you don't recognize, you

    don't need to rate that person. Just tell me and we'll move on

    to the next one.


    Clinton and Edwards FTS

    [Figure: Histogram of hcFTS and Histogram of jeFTS. x-axis: score (0 to 100); y-axis: frequency.]

    Sampling Distribution for FTS Score Interval Estimator

    [Figure: two panels, "Clinton FTS Mean Interval Estimates" and "Edwards FTS Mean Interval Estimates", each showing interval estimates from repeated samples. x-axis: 0 to 100; y-axis: sample.]

    Coverage Probability

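    Coverage probability is the probability that an interval estimator contains the true value of the parameter.

    $$P(\hat{\theta}_{\text{lower}} \le \theta \le \hat{\theta}_{\text{upper}}) = 1 - \alpha$$

    The coverage probability is usually written as $1 - \alpha$. (To be explained later.)

    Question: What is the probability that an interval estimate contains the true value of the parameter? For example,

    $$\left[\bar{x} - 1.96 \frac{s}{\sqrt{n}},\ \bar{x} + 1.96 \frac{s}{\sqrt{n}}\right]$$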


    FTS Example: Mean from Normal Distribution

    (Variance Known)

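    Suppose we assume that JE FTS scores are normally distributed, and we know (somehow) that $\sigma = 25.5$. Recall that if $X_1, \ldots, X_n \overset{\text{i.i.d.}}{\sim} N(\mu, \sigma^2)$, then

    $$\frac{\hat{\mu} - \mu}{\sigma / \sqrt{n}} \sim N(0, 1)$$

    $$P\left(-1.96 \le \frac{\hat{\mu} - \mu}{\sigma / \sqrt{n}} \le 1.96\right) = 95\%$$

    $$P\left(\hat{\mu} - 1.96 \frac{\sigma}{\sqrt{n}} \le \mu \le \hat{\mu} + 1.96 \frac{\sigma}{\sqrt{n}}\right) = 95\%$$

    $$\hat{\mu} \pm 1.96 \frac{\sigma}{\sqrt{n}}$$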
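    The known-variance interval and its coverage probability can be sketched as follows. The value $\sigma = 25.5$ comes from the text; the true mean of 55, the sample size of 50, and the number of repetitions are hypothetical values chosen only to run the simulation.

```python
import math
import random
from statistics import NormalDist, fmean

SIGMA = 25.5  # standard deviation assumed known, as in the text

def z_interval(sample, sigma=SIGMA, level=0.95):
    """Known-variance interval: mean +/- z * sigma / sqrt(n)."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)   # about 1.96 for 95%
    center = fmean(sample)
    half_width = z * sigma / math.sqrt(len(sample))
    return center - half_width, center + half_width

def coverage(mu=55.0, n=50, reps=4_000, seed=0):
    """Share of simulated intervals that contain the (hypothetical) true mean."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        sample = [rng.gauss(mu, SIGMA) for _ in range(n)]
        lower, upper = z_interval(sample)
        hits += lower <= mu <= upper
    return hits / reps

estimated_coverage = coverage()
print(f"estimated coverage: {estimated_coverage:.3f}")
```

    The coverage is a property of the interval estimator under repeated sampling; any single realized interval either contains the true mean or does not.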


    Is 95% all there is?

    Our 95% CI had the following form:

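    $$\hat{\mu} \pm 1.96 \frac{\sigma}{\sqrt{n}}$$

    Where did the 1.96 come from?

    $$P\left(-1.96 \le \frac{\hat{\mu} - \mu}{\sigma / \sqrt{n}} \le 1.96\right) = 95\%$$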


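    $100(1 - \alpha)\%$ Confidence Intervals

    $$P\left(-z_{\alpha/2} \le \frac{\hat{\mu} - \mu}{\sigma / \sqrt{n}} \le z_{\alpha/2}\right) = 1 - \alpha$$

    $$P\left(\hat{\mu} - z_{\alpha/2} \frac{\sigma}{\sqrt{n}} \le \mu \le \hat{\mu} + z_{\alpha/2} \frac{\sigma}{\sqrt{n}}\right) = 1 - \alpha$$

    We usually construct the $100(1 - \alpha)\%$ confidence interval with the following formula.

    $$\hat{\mu} \pm z_{\alpha/2} \frac{\sigma}{\sqrt{n}}$$

    Question: Why not 100% confidence?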


    FTS Example: Mean from Normal Distribution

    (Variance Unknown)

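    Suppose we model JE FTS scores as normally distributed with $\sigma$ unknown. Recall that if $X_1, \ldots, X_n \overset{\text{i.i.d.}}{\sim} N(\mu, \sigma^2)$, then

    $$\frac{\hat{\mu} - \mu}{\sigma / \sqrt{n}} \sim N(0, 1)$$

    Question: Why can't our previous interval be used?

    $$\hat{\mu} \pm z_{\alpha/2} \frac{\sigma}{\sqrt{n}}$$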


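    $100(1 - \alpha)\%$ t-Intervals

    $$\frac{\hat{\mu} - \mu}{\hat{\sigma} / \sqrt{n}} \sim t_{n-1}$$

    $$P\left(-t_{n-1,\alpha/2} \le \frac{\hat{\mu} - \mu}{\hat{\sigma} / \sqrt{n}} \le t_{n-1,\alpha/2}\right) = 1 - \alpha$$

    $$P\left(\hat{\mu} - t_{n-1,\alpha/2} \frac{\hat{\sigma}}{\sqrt{n}} \le \mu \le \hat{\mu} + t_{n-1,\alpha/2} \frac{\hat{\sigma}}{\sqrt{n}}\right) = 1 - \alpha$$

    We usually construct the $100(1 - \alpha)\%$ confidence interval with the following formula.

    $$\hat{\mu} \pm t_{n-1,\alpha/2} \frac{\hat{\sigma}}{\sqrt{n}}$$

    For a 95% confidence interval, $t_{n-1,\alpha/2}$ is often close to 2.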


    Asymptotic Coverage Probability

    Without making an assumption about the population distribution, we will often not know the sampling distribution of the interval estimator, and therefore, we will not know the coverage probability.

    We may be able to derive the asymptotic coverage probability instead.

    $$P(\hat{\theta}_{\text{lower},n} \le \theta \le \hat{\theta}_{\text{upper},n}) \to 1 - \alpha \text{ as } n \to \infty$$


    FTS Example: Mean from Unknown Distribution

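    Suppose we do not assume a distribution for HC FTS. Recall that if $X_1, \ldots, X_n$ are i.i.d. $?(\mu, \sigma^2)$, then

    $$\frac{\hat{\mu}_n - \mu}{1 / \sqrt{n}} \xrightarrow{d} N(0, \sigma^2)$$

    and, since $\hat{\sigma}_n \xrightarrow{p} \sigma$, it can be shown that

    $$\frac{\hat{\mu}_n - \mu}{\hat{\sigma}_n / \sqrt{n}} \xrightarrow{d} N(0, 1)$$

    Therefore, our normal quantile confidence intervals will have valid asymptotic coverage. (The same holds for t-quantile intervals.)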

    [Figure: densities of the t distribution with 1, 4, and 15 degrees of freedom ($t_1$, $t_4$, $t_{15}$). x-axis: x (-4 to 4); y-axis: density.]

    The Trial Analogy

    Suppose we can somehow model the probabilities for the various outcomes conditional on the true state of the world.

    Table: Probabilities given the true state of the world

                           Truth
                           Guilty         Innocent
    Decision   Convict     $1 - \beta$    $\alpha$
               Acquit      $\beta$        $1 - \alpha$

    We would like $\alpha$ and $\beta$ to be small, but it may be difficult to achieve both goals.

    The standard statistical approach is to pick a small level for $\alpha$ (e.g. 5%), and then try to minimize $\beta$ given this constraint.


    The Statistical Version

    Suppose we must decide whether to reject or fail to reject a prior hypothesis about the world (null hypothesis) in favor of an alternative hypothesis.

    Table: Decisions and Outcomes

                                  Truth
                                  Alternative Hypothesis   Null Hypothesis
    Decision   Reject             Correct                  Type I Error
               Fail to Reject     Type II Error            Correct

    Table: Probabilities given the true state of the world

                                  Truth
                                  Alternative Hypothesis   Null Hypothesis
    Decision   Reject             $1 - \beta$              $\alpha$
               Fail to Reject     $\beta$                  $1 - \alpha$


    Edwards FTS Example

    As in our previous example, let $\mu$ be the expected value of JE FTS for the population. Let's assume the population mean for HC FTS is 55 (i.e. equal to the sample mean). Here are two possible hypothesis tests:

    $H_0: \mu = 55$ versus $H_1: \mu \ne 55$

    $H_0: \mu \le 55$ versus $H_1: \mu > 55$


    Test Statistics

    A test statistic is a function of the sample and the null hypothesis (and may provide evidence against the null hypothesis).

    Examples:

    1 If $H_0: \mu = 55$, then $\bar{X} - 55$ would be a test statistic.

    2 If $H_0: \mu \le 55$, then $\bar{X} - 55$ would be a test statistic.

    Why does the second test statistic make sense given the inequality in the null hypothesis?


    The One Sample t-Statistic

    Let $\mu_0$ be the null value of the parameter (e.g. 55). Then the one sample t-statistic can be written as the following:

    $$\frac{\bar{X} - \mu_0}{S / \sqrt{n}}$$

    Notice that, being a function of the sample, this t-statistic will have a sampling distribution.


    Null Distributions for Test Statistics

    A null distribution is the sampling distribution for the test statistic when the null hypothesis is true. More exactly, the null distribution is the sampling distribution for the test statistic when $\mu = \mu_0$.

    For our example, the null distribution is the sampling distribution of the t-statistic

    $$\frac{\bar{X} - 55}{S / \sqrt{n}}$$

    when $\mu = 55$.


    The Null Distribution for the t-Statistic

    Suppose we model JE FTS scores as normally distributed with $\sigma^2$ unknown. Recall that if $X_1, \ldots, X_n \overset{i.i.d.}{\sim} N(\mu, \sigma^2)$, then

    $$\frac{\bar{X} - 55}{S / \sqrt{n}} \sim t_{n-1}$$

    when $\mu = 55$.
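
    The null distribution can also be approximated by simulation: draw many samples from a normal population in which the null hypothesis holds, and compute the t-statistic for each. The sketch below does this with the standard library. The values `sigma=25`, `n=30` and `draws=2000` are illustrative assumptions, not values from the lecture, and `simulate_null_t` is a hypothetical helper name.

```python
import math
import random
import statistics


def simulate_null_t(mu_0, sigma, n, draws, seed=0):
    """Simulate t-statistics from N(mu_0, sigma^2) samples, so that H0 is true."""
    rng = random.Random(seed)
    t_stats = []
    for _ in range(draws):
        sample = [rng.gauss(mu_0, sigma) for _ in range(n)]
        xbar = statistics.fmean(sample)
        s = statistics.stdev(sample)
        t_stats.append((xbar - mu_0) / (s / math.sqrt(n)))
    return t_stats


# sigma, n and draws are assumptions chosen only to keep the example small.
null_draws = simulate_null_t(mu_0=55, sigma=25, n=30, draws=2000)
print(statistics.fmean(null_draws))  # should be close to 0
print(statistics.stdev(null_draws))  # should be close to 1 (slightly above for t with n - 1 df)
```

    A histogram of `null_draws` should resemble the $t_{n-1}$ density, centered at zero.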

    Null Distribution ($\mu = 55$ and $n = 520$)

    [Figure: "Null Distribution". Density f(test statistic) plotted against the test statistic over the range -3 to 3.]

    p-Value

    The p-value is the probability under the null distribution of getting a sample at least as extreme as the one we got.

    Extreme is defined by the alternative hypothesis.

    Examples:

    $H_1: \mu \neq 55$: p-value $= P(t_{stat} \leq -|t_{obs}| \text{ or } t_{stat} \geq |t_{obs}| \mid \mu = 55)$

    $H_1: \mu > 55$: p-value $= P(t_{stat} \geq t_{obs} \mid \mu = 55)$
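
    The two definitions above can be sketched numerically. This example swaps in the standard normal for the $t_{n-1}$ null distribution, which is only an approximation but a close one for large samples such as $n = 520$. The observed value `t_obs=1.8` is hypothetical and `p_values_large_sample` is an illustrative helper name.

```python
from statistics import NormalDist


def p_values_large_sample(t_obs):
    """Approximate one- and two-sided p-values, using N(0, 1) in place of t_{n-1}."""
    z = NormalDist()  # standard normal distribution
    upper_tail = 1.0 - z.cdf(t_obs)               # H1: mu > mu_0
    two_sided = 2.0 * (1.0 - z.cdf(abs(t_obs)))   # H1: mu != mu_0
    return upper_tail, two_sided


# t_obs = 1.8 is a made-up observed statistic.
one_sided, two_sided = p_values_large_sample(t_obs=1.8)
print(one_sided, two_sided)
```

    For a positive `t_obs` the two-sided p-value is twice the one-sided one, because the null distribution is symmetric about zero.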

    One and Two Sided p-values

    [Figure: two panels, each plotting f(test statistic) against the test statistic over the range -3 to 3. First panel: "Two Sided p-value", with $-t_{obs}$ and $t_{obs}$ marked. Second panel: "One Sided p-value", with $t_{obs}$ marked.]

    Rejection Regions

    Recall that $\alpha$ is the probability of Type I Error. Often we want to limit $\alpha$ to 5% while minimizing the probability of Type II Error. This can be accomplished in the following manner.

    [Figure: two panels, each plotting f(test statistic) against the test statistic over the range -3 to 3. First panel: "Two Sided Rejection Region ($\alpha = 5\%$)", with the fences and $t_{obs}$ marked. Second panel: "One Sided Rejection Region ($\alpha = 5\%$)", with the fence and $t_{obs}$ marked.]
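
    The "fences" that bound a rejection region are quantiles of the null distribution. As a rough sketch, the code below computes them under the large-sample normal approximation (an assumption; the exact fences come from $t_{n-1}$) and checks whether a hypothetical observed statistic falls in the rejection region. The function names `fences` and `reject` are illustrative.

```python
from statistics import NormalDist


def fences(alpha, two_sided=True):
    """Return critical values for a level-alpha test under the N(0, 1) approximation."""
    z = NormalDist()
    if two_sided:
        upper = z.inv_cdf(1.0 - alpha / 2.0)  # alpha / 2 in each tail
        return (-upper, upper)
    return (z.inv_cdf(1.0 - alpha),)          # all of alpha in the upper tail


def reject(t_obs, alpha, two_sided=True):
    """Reject H0 when t_obs lies at or beyond a fence."""
    if two_sided:
        lower, upper = fences(alpha, two_sided=True)
        return t_obs <= lower or t_obs >= upper
    (upper,) = fences(alpha, two_sided=False)
    return t_obs >= upper


print(fences(0.05))                   # roughly (-1.96, 1.96)
print(fences(0.05, two_sided=False))  # roughly (1.645,)
# t_obs = 1.8 is a made-up observed statistic.
print(reject(1.8, 0.05), reject(1.8, 0.05, two_sided=False))
```

    With the same $\alpha$, the one-sided fence sits closer to zero than the upper two-sided fence, so a statistic such as 1.8 can be rejected by the one-sided test but not by the two-sided one.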

    Rejection Regions and p-values

    Notice the relationship between $\alpha$ and the p-value.

    [Figure: two panels, each plotting f(test statistic) against the test statistic over the range -3 to 3. First panel: "Two Sided Rejection Region ($\alpha = 5\%$)", with the fences, $-t_{obs}$ and $t_{obs}$ marked. Second panel: "One Sided Rejection Region ($\alpha = 5\%$)", with the fence and $t_{obs}$ marked.]
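
    One way to see the relationship: the observed statistic falls in the level-$\alpha$ rejection region exactly when the p-value is at most $\alpha$. The sketch below checks this for the two-sided test on a grid of hypothetical observed values, again using the standard normal as a large-sample stand-in for $t_{n-1}$.

```python
from statistics import NormalDist

z = NormalDist()
alpha = 0.05
fence = z.inv_cdf(1.0 - alpha / 2.0)  # two-sided fence, about 1.96

agreements = []
for i in range(-12, 13):
    t_obs = i / 4.0                                # grid of made-up values from -3 to 3
    p_value = 2.0 * (1.0 - z.cdf(abs(t_obs)))      # two-sided p-value
    in_rejection_region = abs(t_obs) >= fence
    agreements.append((p_value <= alpha) == in_rejection_region)

print(all(agreements))
```

    The two decision rules agree at every grid point, so reporting the p-value lets a reader apply whatever $\alpha$ they prefer.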

    Rejection Regions and $1 - \alpha$ CIs

    [Figure: "Rejection Regions and CIs ($\alpha = 5\%$)". Density $f(\bar{X} \mid H_0)$ plotted against $\bar{X}$ over the range 50 to 60, with the fences and the CI marked.]
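
    The link between the two-sided rejection region and the $1 - \alpha$ confidence interval can be written out in one line. The notation $t_{n-1,\,1-\alpha/2}$ for the upper $\alpha/2$ critical value of the $t_{n-1}$ distribution is an assumption introduced here and may differ from the notation used earlier in the lecture.

```latex
\text{fail to reject } H_0 : \mu = \mu_0
\iff
\left| \frac{\bar{X} - \mu_0}{S / \sqrt{n}} \right| < t_{n-1,\,1-\alpha/2}
\iff
\bar{X} - t_{n-1,\,1-\alpha/2} \frac{S}{\sqrt{n}}
< \mu_0 <
\bar{X} + t_{n-1,\,1-\alpha/2} \frac{S}{\sqrt{n}}
```

    In words, the $1 - \alpha$ confidence interval collects exactly those null values $\mu_0$ that a two-sided level-$\alpha$ test would not reject.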
