Inferential statistics by example
Maarten Buis
Monday 2 January 2005
Two statistics courses
• Descriptive Statistics (McCall, part 1)
• Inferential Statistics (McCall, part 2 and 3)
Course Material
• McCall: Fundamental Statistics for Behavioral Sciences.
• SPSS (available from Surfspot.nl)
• Lectures: 2 x a week
• computer labs: 1 x a week.
• course website
Setup of lectures
• Recap of material assumed to be known
• New Material
• Student Recap
How to pass this course
• Read assigned portions of McCall before each lecture
• Do the exercises
• Do the computer lab assignments, and hand them in before Tuesday 17:00!
• Come to the computer lab
• Come to the lectures
• Ask questions: during class or to the course mailing list
What is inference?
• Drawing general conclusions from partial information
• Based on your observations some conclusions are more plausible than others.
• Compare with logic
Sources of uncertainty in inference
• Sample
• Measurement
• Model
• Typos when typing the data into SPSS
• Inference, as discussed here, assumes that random sampling error is by far the most dominant source of uncertainty.
How is inference done?
• If the null hypothesis is true, then the probability of observing the data is so small that either we have drawn a very weird sample or the null hypothesis is false. (Ronald Fisher)
• We use a “good” procedure to choose between two hypotheses, whereby “good” means that you draw the right conclusion in 95% of the times you use that procedure. (Jerzy Neyman and Egon Pearson)
PrdV
• New populist party, wanted to participate in the next election if 41% of the Dutch population thought that “the PrdV would be an asset to Dutch politics”.
• This was asked to a sample of 2,598 people between, and on 16 December only 31% agreed.
• Peter R. de Vries decided not to participate in the next election.
The Inference Problem
• The 31% people approving is 31% of the people in the sample.
• Peter R. de Vries doesn’t care about what the people in the sample think; he cares about what all the people in the Netherlands think.
• Could it be that he has drawn a “weird” sample, and that in the Netherlands 41% or more really think he would be an asset to Dutch politics?
Two hypotheses
• H0: 41% or more support PrdV
• HA: less than 41% support PrdV
A thought experiment (1)
• If support for PrdV in the Netherlands is 41% and we draw 100 random samples of 2598 persons, then we get 100 estimates of the support for PrdV, some of them a bit too high, some of them a bit too low.
• We would expect that 5 samples would show a support for PrdV of 39% or less.
• If we find a support for PrdV of 39% or less and reject H0, then we have followed a procedure that results in the right decision 95% of the times we use it.
What does that 39% mean?
• We propose the following procedure: if we find a support for PrdV of less than x%, then reject H0.
• We choose x in such a way that the probability of rejecting H0 when we shouldn’t is only 5%
• The reason for mistakenly rejecting H0 is drawing a ‘weird’ sample.
Where does that 39% come from?
• If H0 is true, then we draw a sample from a population in which the support for PrdV is 41%
• We can let the computer draw many (100,000) samples and calculate the mean in each sample.
• 5,000 or 5% of these samples have a mean of 39% or less.
• So if we reject H0 when we find a support of 39% or less, then the probability of making a mistake is 5%
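The simulation described above can be sketched in a few lines of Python. This is a minimal illustration, not the code used for the lecture: the seed, the names, and the choice of 2,000 samples (instead of 100,000, to keep the run short) are assumptions made here.

```python
import random

random.seed(12345)   # arbitrary seed, only to make the run repeatable

P_H0 = 0.41          # support for PrdV if H0 is (just) true
N = 2598             # sample size from the slides
N_SAMPLES = 2000     # assumption: the slides use 100,000; fewer here for speed


def draw_sample_proportion(p, n):
    """Draw one random sample of n people and return the share that agrees."""
    agree = sum(1 for _ in range(n) if random.random() < p)
    return agree / n


# One estimate of the support per simulated sample, sorted from low to high.
proportions = sorted(draw_sample_proportion(P_H0, N) for _ in range(N_SAMPLES))

# The value below which about 5% of the simulated sample proportions fall.
cutoff = proportions[int(0.05 * N_SAMPLES)]

print(f"5% cutoff: {cutoff:.3f}")
```

With more samples the cutoff should settle near 39.4%, which the slides round to 39%.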
[Figure: sampling distribution of support for PrdV. Histogram of frequency (0 to 10,000) against % support for PrdV (.36 to .46).]
Where did that 39% come from?
• If we draw many random samples, and compute the mean in each sample, then the distribution of these means will be approximately normal with a mean of .41 and a standard deviation of $SD / \sqrt{N}$
• Remember that the sample size is 2598, and the SD of a proportion is $\sqrt{p(1-p)}$, so the standard deviation of the distribution of means is $\sqrt{.41(1-.41)} / \sqrt{2598} = .0096$
• 5% of the samples have a support for PrdV of less than 39%
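The same numbers can be obtained without simulation from the normal approximation. The sketch below assumes Python's standard `statistics.NormalDist`; the variable names are illustrative and not part of the course material.

```python
from math import sqrt
from statistics import NormalDist

p0 = 0.41   # support for PrdV under H0
n = 2598    # sample size

sd = sqrt(p0 * (1 - p0))   # SD of a proportion
se = sd / sqrt(n)          # SD of the sampling distribution of the mean

# Normal approximation to the sampling distribution under H0.
sampling_dist = NormalDist(mu=p0, sigma=se)
cutoff = sampling_dist.inv_cdf(0.05)   # 5% of samples fall below this value

print(f"standard error: {se:.4f}")
print(f"5% cutoff: {cutoff:.4f}")
```

The cutoff comes out at roughly 39.4%, which appears on the slides rounded to 39%.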
Neyman Pearson hypothesis testing
• This procedure is the Neyman Pearson hypothesis testing approach
• Note that it tells us something about the quality of the procedure we use to make a decision, not about the strength of the evidence against H0
Thought experiment (2)
• If H0 is true, then the probability of drawing a sample of size 2598 with a support for PrdV of 31% or less is 1.041 × 10^-25.
• This is so small that we think it is safe to reject H0.
Where did that 1.041 × 10^-25 come from?
• Of the 100,000 samples drawn from a population in which H0 is true, none showed a support of less than .31
• So the probability of drawing this or a more extreme sample when H0 is true is less than 1/100,000.
• Remember that if H0 is true, the distribution of means obtained from many samples is normal with a mean of .41 and a standard deviation of .0096
• The proportion of samples with a mean less than .31 is 1.041 × 10^-25
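The tail probability can be checked with the standard library. This is a sketch under the normal approximation, using the rounded standard error of .0096 from the slides; the names are illustrative. It uses `math.erfc` because, this far into the tail, a formula of the form `1 + erf(...)` rounds to zero in floating point.

```python
from math import erfc, sqrt

p0 = 0.41         # support under H0
se = 0.0096       # standard error, rounded as on the slides
observed = 0.31   # support found in the sample

z = (observed - p0) / se   # distance from the H0 value in standard errors

# Lower-tail probability of a standard normal, P(Z <= z).
# erfc keeps its precision for very small tail probabilities.
p_value = 0.5 * erfc(-z / sqrt(2))

print(f"z = {z:.2f}")
print(f"p-value = {p_value:.3e}")
```

The result is sensitive to rounding: with the unrounded standard error the probability is somewhat larger, but still of the order of 10^-25.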
Fisher hypothesis testing
• This procedure is Fisher hypothesis testing.
• Note that it gives us a measure of evidence against H0, but it does not give us an indication of how likely we are to make the wrong decision.
Fisher vs. Neyman Pearson
• You will draw the same conclusion whichever method you use.
• However, it really helps to choose one approach when writing your results down.
Limits to inference
• More importantly, both assume random sampling, and we almost never have that.
• Testing is more helpful to determine whether the data are ‘screaming’ or ‘whispering’ at us.
• Knowing the reasoning behind statistical inference will help you determine the weight you should assign to conclusions derived from statistical tests.
Terminology (1)
• The distribution of means obtained from different samples is the sampling distribution of the mean.
• The standard deviation of the sampling distribution is the standard error.
• Proportion of samples that wrongly reject H0 is the significance level, α, or Type I error rate.
• Proportion of samples that wrongly fail to reject H0 is the Type II error rate, or β.
• Proportion of samples that will rightly reject H0 is the power.
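These three quantities can be illustrated for the PrdV example. The sketch below uses the normal approximation and needs a true level of support to compute β and the power; the value of 38% is purely hypothetical and chosen only for illustration, as are the function and variable names.

```python
from math import sqrt
from statistics import NormalDist

n = 2598
p0 = 0.41       # support under H0
p_true = 0.38   # HYPOTHETICAL true support, not taken from the slides


def sampling_distribution(p, n):
    """Normal approximation to the distribution of sample proportions."""
    return NormalDist(mu=p, sigma=sqrt(p * (1 - p) / n))


# Decision rule: reject H0 when the sample proportion is below the cutoff.
cutoff = sampling_distribution(p0, n).inv_cdf(0.05)

alpha = sampling_distribution(p0, n).cdf(cutoff)       # wrongly reject H0
power = sampling_distribution(p_true, n).cdf(cutoff)   # rightly reject H0
beta = 1 - power                                       # wrongly fail to reject H0

print(f"alpha = {alpha:.3f}, beta = {beta:.3f}, power = {power:.3f}")
```

Moving the hypothetical true support closer to 41% lowers the power and raises β, while α stays fixed by the choice of cutoff.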
Terminology (2)
• The probability of observing the data, or more extreme data, given that H0 is true is the p-value.
• Maximum p-value that will cause you to reject H0 is also the level of significance.
What to do before Wednesday?
• Read Chapter 8
• Do exercises of chapter 8