Sample Size

Review: Base Rate Neglect

On HW 4, I asked you to find a fallacy on the internet.

More than one of you found studies where scientists took data from thousands of people and found correlations.

Misunderstanding

You said, “this is the base rate neglect fallacy. There are billions of people in the world, and these studies only looked at thousands of people. They are neglecting the base rate.”

But that is not the base rate neglect fallacy. And this is important to know: you shouldn’t ignore good science just because you’re confused about what the base rate fallacy is.

Base Rate Neglect

First of all, the base rate neglect fallacy has nothing at all to do with the number of people there are in the world. Nothing.

It has to do with the probability of a variable taking on a certain value, for instance, the probability that someone’s height = 1.5m, the probability that terrorist = true (someone is a terrorist)…

Base Rates

This is the “base rate” of people who are 1.5m tall, and the “base rate” of terrorists.

If 1 in 100 people are terrorists, then the rate of terrorists is 1 in 100 and the probability that a randomly selected person is a terrorist is 1 in 100.

Base Rates

We call this the base rate, because it is the probability that someone is a terrorist when we don’t know anything else about them.

It might be that the base rate of terrorists is 1 in 100, but the rate of terrorists among people who are holding rocket launchers is 1 in 2, and the rate of terrorists among retirees is 1 in 500.

Tests

The base rate neglect fallacy happens when we have a test that is meant to detect the value of a variable.

For example we might have a test that tells us whether someone has AIDS or not, or whether someone is driving over the speed limit, or whether they are drunk.

Reliability of Tests

Here is the important, and crucial fact. Please learn this:

As the base rate of X = x decreases, the # of false positives on tests for X = x increases.

Tests are less reliable when the condition we are testing for becomes rare (low base rate).

Base Rate Neglect Fallacy

The base rate neglect fallacy happens when:

1. There is a low base rate of some condition.2. We have a test for that condition.3. Someone tests positive.4. We assume that means they have the

condition, ignoring the unreliability of tests for conditions with low base rates.

Prosecutor’s Fallacy

The base rate neglect fallacy is often called the prosecutor’s fallacy, as I shall explain.

Murder!

Let’s suppose that there has been a murder.

There is almost no evidence to go on except that the police find one hair at the crime scene.

You are the Suspect

If someone is the killer, there is a 100% chance that their DNA will match the hair’s DNA.

The police have a database that contains the DNA of everyone in Hong Kong.

They run the DNA in the hair through their database and discover that you are a match!

Comprehension Question

If you have been following along you should be able to answer this question:

What is the probability that you are the murderer, given that you are a DNA match for the hair?

Answer

If you said 100%, then you have just committed the base rate neglect fallacy.

The correct answer is “Much lower, because the base rate of people who committed this murder out of the Hong Kong population as a whole is 1 in 7 million.”

Perfect Conditions for Fallacy

Here’s what we have:

1. A low base rate (only 1 person who committed this murder in the world).

2. A test for whether someone is the murderer.3. You, who’ve tested positive on this test.4. And the police who think you did it!

Let’s Look at the Numbers

We know that if you are the murderer, then there is a 100% chance of a DNA match.

But what is the false positive rate? How likely is a randomly selected person will match the DNA?

False Results

Here’s a quote from “False result fear over DNA tests,” Nick Paton Walsh, The Guardian:

“Researchers had asked the labs to match a series of DNA samples. They knew which ones were from the same person, but found that in over 1 per cent of cases the labs falsely matched samples, or failed to notice a match.”

Let’s assume that half of the cases where “labs falsely matched samples, or failed to notice a match.” were cases where they falsely matched samples.

So the probability of a false positive is ½ x 1% = 0.5%, or 5 in 1,000.

Since there are 7 million people in Hong Kong, we expect about 0.5% x 7 million = 35,000 of them to match the hair’s DNA.

Actually, it’s 35,000 + 1, because the true killer is a match, and not by accident.

So we expect that there are 35,001 DNA matches in all of Hong Kong.

And only one of them is the murderer. So what is the probability that you are the murderer?

1 in 35,001. That’s way less than 100%.

Important Things to Remember

There are three important things to remember:

1. If the test is more accurate (fewer false positives), then it’s more reliable

2. If the base rate is higher, the test is more reliable.

3. If the police have other reasons to suspect you, the test is more reliable.

1. If the test is more reliable…

Theoretically, DNA tests only return a false positive about 1 in 3 billion times.

In that case, we’d expect only .002 false positives in all of Hong Kong.

So your chances of being guilty would be 1 in 1.002, or 99.8%. Still, that’s lower than 100%.

2. If there base rate is higher…

Maybe the person who died was stabbed 5,000 times, once each by 5,000 different people. So there are 5,000 murderers.

Then with the previous false positive number at 35,000, you have a 5,000 in 40,000 chance of being one of the killers, or 12.5%.

3. If the police have some other reason to suspect you…

To figure out your chances of being guilty, we looked at the probability that a randomly selected person from HK would be a DNA match. We were assuming you were randomly selected.

But what if you weren’t randomly selected? What if the police tested you because you had a reason to kill the victim?

Reason to Suspect You

Then we would have to look at not the probability that a randomly selected person would match, but the probability that a person who had reason to kill the victim would match.

Suppose there are 5 people who had reasons to kill the victim, and the killer is one of them.

Much Higher Chance

Then your chances are:Let K = you’re the killer and M = you’re a matchP(K/ M) = [P(K) x P(M/ K)] ÷ P(M)= [(1/5) x 100%] ÷ P(M)= 0.2 ÷ [(1 + 0.025) ÷ 5]= 97.6%

SAMPLING

Now we know what the base rate neglect bias is (hopefully).

But this still doesn’t answer our question: how many people do we need in our scientific study to reliably generalize the results to everyone?

For example, if I want to know whether increased economic dependence in men is correlated with increased infidelity, how many people do I need to study?

Surely one is too few. Is 10 fine? Do I need 100? A million?

Sample

In statistics, the people who we are studying are called the sample. (Or if I’m studying the outcomes of coin flips, my sample is the coin flips that I’ve looked at. Or if I’m studying penguins, it’s the penguins I’ve studied.)

Our question is then: what sample size is needed for a result that applies to the population?

Evaluating Evidence

Well, remember what we learned last class. There are two measures of success for a study:

Statistical significance: how likely would my results be if they were just due to random chance? Does the study rule out the null hypothesis?

Evaluating Evidence

Well, remember what we learned last class. There are two measures of success for a study:

Effect size: If I find that A and B are positively correlated, how much does the value of A affect B? What’s the percentage difference in the odds/probability of B as we vary the odds/probability of A?

Two Questions

So there are really two questions we’re asking:

How many people do I need to study to obtain statistically significant results?

How big should my sample be to accurately estimate effect sizes in the population at large?

Law of Large Numbers

Luckily, we do know that more is always better.

The “Law of Large Numbers” says that if you make a large number of observations, the results should be close to the expected value.

(There is no “Law of Small Numbers”)

Average of Dice Rolls

Example

Let’s think about a particular problem.

Suppose we are having an election between Mitt and Barack and we want to know how many people in the population plan to vote for Mitt.

How many people do we need to ask?

Non-Random Samples

The first thing we should realize is that it’s not going to do us any good to ask a non-random group of people.

Suppose everyone who goes to ILoveMitt.com is voting for Mitt. If I ask them, it will seem like 100% of the population will vote for Mitt, even if only 3% will really vote for him.

Internet Polls

(Important Critical Thinking Lesson:

Internet polls are not trustworthy. They are biased toward people who have the internet, people who visit the site that the poll is on, and people who care enough to vote on a useless internet poll.)

Representative Samples

The opposite of a biased sample is a representative sample.

A perfectly representative sample is one where if n% of the population is X, then n% of the sample is X, for every X.

For example, if 10% of the population smokes, 10% of the sample smokes.

Random Sampling

One way to get a representative sample is to randomly select people from the population, so that each has a fair and equal chance of ending up in the sample.

For example, when we randomize our experiments, we randomly sample the participants to obtain our experimental group. (Ideally our participants are randomly sampled from the population at large.)

Problems with Random Sampling

Random sampling isn’t a cure-all, however.

For example, if I randomly select 10 people from a (Western) country, on average I’ll get 5 men and 5 women. On average.

But, on any particular occasion, I might select (randomly) 7 men and 3 women, or 4 men and 6 women.

Stratified Sampling

One way to fix these problems would be to randomly sample 5 women and randomly sample 5 men. Then I would always have an even split between men and women, and my men would be randomly drawn from the group of men, while my women were randomly drawn from the group of women.

Example

Let’s continue with our example.

We’re convinced that we should randomly sample n individuals from the population of women and n from the population of men.

Still, what is that number n?

We know that, of the people in our sample, X% will vote for Mitt.

We want to know, of the people in the population, what percent will vote for Mitt?

We can never know that it is exactly X, unless we ask everyone. But we can increase our confidence.

Confidence Interval

What we can do is find out, based on our sample, that we are Z% sure (confident) that the number of people who will vote for Mitt is between X% and Y%.

For example, we can be 90% confident that the percentage of people who vote for Mitt is between 44% and 48%.

Confidence Interval

This would mean we think there’s a 10% chance that either less than 44% or more than 48% of people vote for Mitt.

The very same data might warrant us in saying that we are 95% confident that the percentage of people who vote for Mitt is between 40% and 52%.

Sample Size Determination

So if we want to know how many people to look at, we should determine:

1. What level of confidence we want2. How big we want our confidence interval to

be.

Common Choices

Common choices for these numbers are:

1. We want to be 95% confident of our estimation.

2. We want our confidence interval to be 6% wide (e.g. between 42% and 48%).

Expected Value, Deviation

Each variable has an expected value (for example, 3.5 is the expected value of a dice roll, the average of all the sides of a die).

Each variable has an expected deviation from its expected value: how far are all the dice values (1, 2, 3, 4, 5, 6) from the expected value (3.5)– the answer is 1.5 on average.

Variance

The variance is the expected squared deviation–

[(6 – 3.5)^2 + (5 – 3.5)^2 + (4 – 3.5)^2 + (3.5 – 3)^2 + (3.5 – 2)^2 + (3.5 – 1)^2] ÷ 6

Or about 2.9 for a die. Don’t worry you don’t need to know this.

Standard Deviation

The standard deviation is the square root of the variance. So √2.9 for a die.

The important point is that we can use this number, the standard deviation, to figure out how many people we need in our sample.

Solving for Sample Size

If we want our confidence interval to be 6% wide, then a 95% confidence interval of this width will be:

4 x standard deviation = 6%

The standard deviation of any estimate of a proportion will be √(0.25/n)

A Little Bit of Math

So,

4 x standard deviation = 6%4 x √(0.25/n) = 0.06√(0.25/n) = 0.06/4 = 0.015(0.25/n) = 0.015^2 = 0.000225n = 0.25/0.000225 = 1,111

The Important Point

What’s the point?

The point is that you need about 1,000 people to be 95% sure that the vote counts you estimate from the sample are within 6% of the actual voting behavior of the population.

Things to Note

This doesn’t mean that studies with less than a thousand people can’t tell us anything—

What they tell us will just be either less confident than 95% or have greater error bars than 6%.

If a confidence interval of 20% is fine, you only need 100 people.

The Base Rate Still Matters

It also doesn’t mean that 100 or 1,000 people is sufficient for any study.

When the value of the variable being studied is rare in the population, you need more people. For example, if it’s 1 in 1 million, then most samples of 1,000 won’t contain it, but that doesn’t mean it’s at 0 prevalence.

Sample Size

Documents

Transcript of Sample Size