LT 4.1—Sampling and Surveys Day 3 Notes--Bias


Page 1: LT 4.1—Sampling and Surveys Day 3 Notes--Bias

LT 4.1—Sampling and Surveys

Day 3 Notes--Bias

Page 2: LT 4.1—Sampling and Surveys Day 3 Notes--Bias

Population and Sample

• Population: The collection of all individuals or items under consideration in a statistical study.

– The population is determined by what we want to know.

• Sample: That part of the population from which information is obtained.

– The sample is determined by what is practical and should be representative of the population.

Page 3: LT 4.1—Sampling and Surveys Day 3 Notes--Bias

Sampling Designs

• The sampling design is the method used to choose the sample.

• All statistical sampling designs incorporate the idea that chance (randomness), rather than choice, is used to select the sample.

• The value of deliberately introducing randomness is one of the great insights of Statistics.

• Randomizing protects us from the influences of all the features of our population, even ones that we may not have thought about. It does that by making sure that on the average the sample looks like the rest of the population.
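The claim that random samples look like the population "on the average" can be checked with a small simulation. The sketch below is only an illustration: the population size, the 30% trait rate, and the sample size are made-up assumptions, not figures from these notes.

```python
import random

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Hypothetical population: 10,000 people, 30% of whom have some trait
# we never thought to record (1 = has the trait, 0 = does not).
population = [1] * 3000 + [0] * 7000

def srs_proportion(pop, n):
    """Proportion with the trait in one simple random sample of size n."""
    sample = rng.sample(pop, n)
    return sum(sample) / n

# Draw many random samples and average their proportions.
proportions = [srs_proportion(population, 100) for _ in range(2000)]
average_proportion = sum(proportions) / len(proportions)

print(f"population proportion: {sum(population) / len(population):.3f}")
print(f"average over samples:  {average_proportion:.3f}")
```

Individual samples land above or below 30%, but their average sits close to the population value even though the trait played no part in how the samples were chosen.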

Page 4: LT 4.1—Sampling and Surveys Day 3 Notes--Bias

The Valid Survey

• It isn’t sufficient to just draw a sample and start asking questions. A valid survey yields the information we are seeking about the population we are interested in. Before you set out to survey, ask yourself:

– What do I want to know?

– Am I asking the right respondents?

– Am I asking the right questions?

– What would I do with the answers if I had them; would they address the things I want to know?

Page 5: LT 4.1—Sampling and Surveys Day 3 Notes--Bias

Sampling Designs

1. Simple Random Sample (SRS)

2. Stratified Sample

3. Cluster Sample

4. Systematic Sample

5. Multistage Sample

6. Convenience Sample

7. Voluntary Response Sample
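The first four designs can be sketched in a few lines of code. This is a minimal illustration using the standard textbook definitions, not a procedure taken from these notes; the 12-student frame, the homeroom labels, and the function names are all hypothetical.

```python
import random

rng = random.Random(1)  # fixed seed so the sketch is repeatable

# Hypothetical sampling frame: 12 students in 3 homerooms (A, B, C).
frame = [(f"{room}{i}", room) for room in "ABC" for i in range(1, 5)]

def simple_random_sample(frame, n):
    # Every set of n individuals is equally likely to be chosen.
    return rng.sample(frame, n)

def stratified_sample(frame, per_stratum):
    # Take a separate SRS inside each stratum (here, each homeroom).
    strata = {}
    for person in frame:
        strata.setdefault(person[1], []).append(person)
    return [p for group in strata.values() for p in rng.sample(group, per_stratum)]

def cluster_sample(frame, n_clusters):
    # Randomly pick whole clusters and take everyone in them.
    clusters = sorted({room for _, room in frame})
    chosen = set(rng.sample(clusters, n_clusters))
    return [p for p in frame if p[1] in chosen]

def systematic_sample(frame, step):
    # Random starting point, then every step-th individual on the list.
    start = rng.randrange(step)
    return frame[start::step]

print("SRS:       ", simple_random_sample(frame, 4))
print("Stratified:", stratified_sample(frame, 2))
print("Cluster:   ", cluster_sample(frame, 1))
print("Systematic:", systematic_sample(frame, 3))
```

A multistage sample would combine these steps, for example by choosing homerooms at random and then taking an SRS within each chosen homeroom. Convenience and voluntary response samples have no random step at all, which is why they are not sketched here.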

Page 6: LT 4.1—Sampling and Surveys Day 3 Notes--Bias

Bias

Page 7: LT 4.1—Sampling and Surveys Day 3 Notes--Bias

Bias

• Definition: Any systematic failure of a sample to represent its population.

• Sampling methods that, by their nature, tend to over- or under-emphasize some characteristics of the population are said to be biased.

– Bias is the bane of sampling, the one thing above all to avoid.

– There is usually no way to fix a biased sample and no way to salvage useful information from it.

• The best way to avoid bias is to select individuals for the sample at random.

– The value of deliberately introducing randomness is one of the great insights of Statistics.

Page 8: LT 4.1—Sampling and Surveys Day 3 Notes--Bias

Types of Bias

1. Undercoverage

2. Voluntary Response Bias

3. Convenience Sample Bias

4. Nonresponse Bias

5. Response Bias

Page 9: LT 4.1—Sampling and Surveys Day 3 Notes--Bias

Undercoverage

• A sampling scheme that fails to sample part of the population, or that gives a part of the population less representation than it has in the population, suffers from undercoverage.

– A classic example of undercoverage is the Literary Digest voter survey, which predicted that Alfred Landon would beat Franklin Roosevelt in the 1936 presidential election. The survey sample suffered from undercoverage of low-income voters, who tended to be Democrats. Undercoverage is often a problem with convenience samples.

Page 10: LT 4.1—Sampling and Surveys Day 3 Notes--Bias

Example: Literary Digest Poll

• 1936 presidential election Literary Digest magazine poll.

• The survey team asked a sample of the voting population whether they would vote for Franklin D. Roosevelt, the Democratic candidate, or Alfred Landon, the Republican candidate.

• Based on the results, the magazine predicted an easy win for Landon.

Page 11: LT 4.1—Sampling and Surveys Day 3 Notes--Bias

Results

• When the actual results were in, Roosevelt won by a landslide.

• What happened?

– The sample was obtained from among people who owned a car or had a telephone. In 1936, that group consisted mostly of wealthy people, who historically voted Republican.

– The response rate was low: fewer than 25% of those polled responded. A disproportionate number of those responding were Landon supporters.

• Whatever the reason for the poll’s failure, the sample was not representative of the population.
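How an incomplete sampling frame can flip a prediction can be shown with a toy simulation. The numbers below are invented for illustration and are not the historical 1936 figures: the sketch assumes 30% of voters appear in the frame and that they lean toward candidate L, while everyone else leans toward candidate R.

```python
import random

rng = random.Random(1936)  # fixed seed so the sketch is repeatable

# Made-up electorate, NOT historical data: 30% of voters are in the frame
# (car or telephone owners) and lean toward candidate L; the other 70%
# are missing from the frame and lean toward candidate R.
def make_voter():
    in_frame = rng.random() < 0.30
    p_votes_L = 0.65 if in_frame else 0.30
    return {"in_frame": in_frame, "votes_L": rng.random() < p_votes_L}

electorate = [make_voter() for _ in range(100_000)]

def share_for_L(voters):
    """Fraction of the given voters who support candidate L."""
    return sum(v["votes_L"] for v in voters) / len(voters)

frame = [v for v in electorate if v["in_frame"]]

true_share = share_for_L(electorate)
frame_poll = share_for_L(rng.sample(frame, 2000))      # undercovers non-owners
full_poll = share_for_L(rng.sample(electorate, 2000))  # SRS from everyone

print(f"true share for L:          {true_share:.1%}")
print(f"poll from frame only:      {frame_poll:.1%}")
print(f"poll from whole electorate: {full_poll:.1%}")
```

Under these assumptions the frame-only poll shows a majority for L even though L trails in the full electorate. Taking a larger sample from the same frame would not help, because the error comes from who can be selected, not from how many are selected.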

Page 12: LT 4.1—Sampling and Surveys Day 3 Notes--Bias

Voluntary Response Bias

• When choice rather than randomization is used to obtain a sample, the sample suffers from voluntary response bias.

• Voluntary response bias occurs when sample members are self-selected volunteers.

– An example would be call-in radio shows that solicit audience participation in surveys on controversial topics (abortion, affirmative action, gun control, etc.). The resulting sample tends to overrepresent individuals who have strong opinions.
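The call-in example can be imitated with a short simulation. It is a sketch under stated assumptions: the opinion scale, the mix of opinions in the audience, and the call-in rates are all hypothetical values chosen only to show the mechanism.

```python
import random

rng = random.Random(7)  # fixed seed so the sketch is repeatable

# Hypothetical audience: opinions on a scale from -2 (strongly against)
# to +2 (strongly for). Most listeners hold mild opinions.
opinions = rng.choices([-2, -1, 0, 1, 2], weights=[5, 25, 40, 25, 5], k=50_000)

# Assumed call-in rates: the stronger the opinion, the likelier the call.
call_in_rate = {0: 0.001, 1: 0.005, 2: 0.05}

# Each listener decides independently whether to call in.
callers = [o for o in opinions if rng.random() < call_in_rate[abs(o)]]

def strong_share(group):
    """Fraction of the group holding a strong opinion (-2 or +2)."""
    return sum(abs(o) == 2 for o in group) / len(group)

print(f"strong opinions in audience: {strong_share(opinions):.1%}")
print(f"strong opinions in callers:  {strong_share(callers):.1%}")
print(f"number of callers:           {len(callers)}")
```

Only about one listener in ten holds a strong opinion in this made-up audience, yet strong opinions make up most of the callers, because the respondents chose themselves.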

Page 13: LT 4.1—Sampling and Surveys Day 3 Notes--Bias

Convenience Sample

• A convenience sample is obtained exactly as its name suggests, by sampling individuals who are conveniently available. Convenience samples are often not representative of the population of interest because not every individual in the population is equally convenient to sample.

– The classic example of a convenience sample is standing at a shopping mall and selecting shoppers as they walk by to fill out a survey.

Page 14: LT 4.1—Sampling and Surveys Day 3 Notes--Bias

Nonresponse Bias

• Nonresponse bias occurs when individuals selected for the sample fail to respond, cannot be contacted, or decline to participate.

– This is a common problem with mail surveys. The response rate is often low (5% to 30%), making mail surveys vulnerable to nonresponse bias.
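A little arithmetic shows why a low response rate matters when the two groups respond at different rates. The mailing size, the split between satisfied and dissatisfied customers, and the two response rates below are hypothetical; only the 5% to 30% range comes from the notes.

```python
# Hypothetical mailing: 1,000 surveys sent, 400 to satisfied customers
# and 600 to dissatisfied ones (true satisfied share = 40%).
mailed = {"satisfied": 400, "dissatisfied": 600}

# Assumed response rates, both inside the 5% to 30% range quoted above.
response_rate = {"satisfied": 0.10, "dissatisfied": 0.30}

# Number of surveys each group sends back.
returned = {group: round(mailed[group] * response_rate[group]) for group in mailed}

true_share = mailed["satisfied"] / sum(mailed.values())
observed_share = returned["satisfied"] / sum(returned.values())
overall_response_rate = sum(returned.values()) / sum(mailed.values())

print(f"true satisfied share:     {true_share:.1%}")
print(f"observed satisfied share: {observed_share:.1%}")
print(f"overall response rate:    {overall_response_rate:.1%}")
```

With these assumed rates, 40 satisfied and 180 dissatisfied customers reply, so the returned surveys put satisfaction near 18% when the true figure is 40%. The sample was selected fairly; the bias enters through who answers.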

Page 15: LT 4.1—Sampling and Surveys Day 3 Notes--Bias

Response Bias

• Anything in a survey that influences responses falls under the heading of response bias.

• Examples are biased wording of survey questions, lack of privacy while being surveyed, and appearance of the interviewer.

• Both Question Bias and Interviewer Bias are examples of response bias.

Page 16: LT 4.1—Sampling and Surveys Day 3 Notes--Bias

Response Bias: Question Bias

• The wording of the questions, or the questions themselves, can lead to bias.

• People often don’t want to be perceived as having unpopular or unsavory views and so may not respond truthfully.

– Example: Given that the threat of nuclear war is higher now than it has ever been in human history, and the fact that a nuclear war poses a threat to the very existence of the human race, would you favor an all-out nuclear test ban?

– The question is biased in favor of a nuclear test ban.

Page 17: LT 4.1—Sampling and Surveys Day 3 Notes--Bias

Response Bias: Question Bias

Page 18: LT 4.1—Sampling and Surveys Day 3 Notes--Bias

Response Bias: Interviewer Bias

• The sex, age, race, dress, attitude, or actions of the interviewer, and how the interviewer asks the questions, influence the way a subject responds.

– Example: A male interviewer asking sex-related questions to women.

• To prevent this, interviewers must be trained to remain neutral throughout the interview. They must also pay close attention to the way they ask each question. If an interviewer changes the way a question is worded, it may affect the respondent’s answer.

Page 19: LT 4.1—Sampling and Surveys Day 3 Notes--Bias

Sampling Variability

• Sampling Variability

– Is the natural tendency of randomly drawn samples to differ, one from another.

– Sampling variability is not an error, just the natural result of random sampling.

– Statistics attempts to minimize, control, and understand variability so that informed decisions can be drawn from the data despite their variation.

– Although samples vary, when we use chance to select them, they do not vary haphazardly but rather according to the laws of probability.

Page 20: LT 4.1—Sampling and Surveys Day 3 Notes--Bias

Example: Sample Variability

• Each of four major news organizations surveys likely voters and separately reports that the percentage favoring the incumbent candidate is 53.5%, 54.1%, 52%, and 54.2%, respectively.

• What is the correct percentage?

• Did three or more of the news organizations make a mistake?

Page 21: LT 4.1—Sampling and Surveys Day 3 Notes--Bias

Solution

• There is no way of knowing the correct population percentage from the information given.

• The four surveys led to four statistics, each an estimate of the population parameter.

• No one made a mistake unless there was a bad survey.

• Sampling variation is natural.
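The four-poll situation can be reproduced by simulating four honest polls of the same population. This is a sketch under assumptions: the true support of 53.5% and the poll size of 1,000 are chosen for illustration (the notes give neither), and each respondent is modeled as an independent draw, which approximates an SRS from a very large population.

```python
import math
import random

rng = random.Random(4)  # fixed seed so the sketch is repeatable

TRUE_SUPPORT = 0.535   # assumed for illustration; unknown in a real election
POLL_SIZE = 1000       # assumed number of likely voters per poll

def run_poll():
    """Share of one random sample that favors the incumbent."""
    votes = sum(rng.random() < TRUE_SUPPORT for _ in range(POLL_SIZE))
    return votes / POLL_SIZE

polls = [run_poll() for _ in range(4)]

# Standard error of a sample proportion: sqrt(p * (1 - p) / n).
standard_error = math.sqrt(TRUE_SUPPORT * (1 - TRUE_SUPPORT) / POLL_SIZE)

for i, result in enumerate(polls, start=1):
    print(f"poll {i}: {result:.1%}")
print(f"typical poll-to-poll spread (standard error): {standard_error:.1%}")
```

All four simulated polls are done correctly, yet they report different percentages. With these assumed values the standard error is about 1.6 percentage points, so differences of a point or two between polls are what the laws of probability predict.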

Page 22: LT 4.1—Sampling and Surveys Day 3 Notes--Bias

Stratified Sampling

Page 23: LT 4.1—Sampling and Surveys Day 3 Notes--Bias

Cluster Sampling

Page 24: LT 4.1—Sampling and Surveys Day 3 Notes--Bias

Systematic Samples

Page 25: LT 4.1—Sampling and Surveys Day 3 Notes--Bias

What Can Go Wrong? Or, How to Sample Badly

• Sample Badly with Volunteers:

– In a voluntary response sample, a large group of individuals is invited to respond, and all who do respond are counted.

• Voluntary response samples are almost always biased, and so conclusions drawn from them are almost always wrong.

– Voluntary response samples are often biased toward those with strong opinions or those who are strongly motivated.

– Since the sample is not representative, the resulting voluntary response bias invalidates the survey.

Page 26: LT 4.1—Sampling and Surveys Day 3 Notes--Bias

What Can Go Wrong? Or, How to Sample Badly

• Sample Badly, but Conveniently:

– In convenience sampling, we simply include the individuals who are convenient.

• Unfortunately, this group may not be representative of the population.

– Convenience sampling is not only a problem for students or other beginning samplers.

• In fact, it is a widespread problem in the business world: the easiest people for a company to sample are its own customers.

Page 27: LT 4.1—Sampling and Surveys Day 3 Notes--Bias

What Can Go Wrong? Or, How to Sample Badly

• Sample from a Bad Sampling Frame:

– An SRS from an incomplete sampling frame introduces bias because the individuals included may differ from the ones not in the frame.

• Undercoverage:

– Many of these bad survey designs suffer from undercoverage, in which some portion of the population is not sampled at all or has a smaller representation in the sample than it has in the population.

– Undercoverage can arise for a number of reasons, but it’s always a potential source of bias.

Page 28: LT 4.1—Sampling and Surveys Day 3 Notes--Bias

What Else Can Go Wrong?

• Watch out for nonrespondents.

– A common and serious potential source of bias for most surveys is nonresponse bias.

– No survey succeeds in getting responses from everyone.

• The problem is that those who don’t respond may differ from those who do.

• And they may differ on just the variables we care about.

Page 29: LT 4.1—Sampling and Surveys Day 3 Notes--Bias

What Else Can Go Wrong?

• Work hard to avoid influencing responses.

– Response bias refers to anything in the survey design that influences the responses.

– For example, the wording of a question can influence the responses:

• Given the fact that those who understand Statistics are smarter and better looking than those who don’t, don’t you think it is important to take a course in Statistics?

Page 30: LT 4.1—Sampling and Surveys Day 3 Notes--Bias

How to Think About Biases

• Look for biases in any survey you encounter, and look for them before you collect the data: there’s no way to recover from a biased sample or a survey that asks biased questions.

• Spend your time and resources reducing biases.

• If you possibly can, pilot-test your survey.

• Always report your sampling methods in detail.

Page 31: LT 4.1—Sampling and Surveys Day 3 Notes--Bias

What have we learned?

• A representative sample can offer us important insights about populations.

– It’s the size of the sample, not its fraction of the larger population, that determines the precision of the statistics it yields.

• There are several ways to draw samples, all based on the power of randomness to make them representative of the population of interest:

– Simple Random Sample, Stratified Sample, Cluster Sample, Systematic Sample, Multistage Sample
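The point about sample size versus sampling fraction is easy to test by simulation. The sketch below is illustrative only: the population sizes, the 50% trait rate, and the helper name `spread_of_sample_proportions` are assumptions made for the example.

```python
import random
import statistics

rng = random.Random(11)  # fixed seed so the sketch is repeatable

def spread_of_sample_proportions(pop_size, n, repeats=1000):
    """Standard deviation of the sample proportion across many SRSs.

    Individuals are numbered 0..pop_size-1 and, by assumption, the first
    half have the trait, so the true proportion is 50%.
    """
    cutoff = pop_size // 2
    proportions = []
    for _ in range(repeats):
        sample = rng.sample(range(pop_size), n)
        proportions.append(sum(i < cutoff for i in sample) / n)
    return statistics.pstdev(proportions)

# Same sample size, very different sampling fractions (1% vs 0.01%).
spread_small_pop = spread_of_sample_proportions(10_000, 100)
spread_large_pop = spread_of_sample_proportions(1_000_000, 100)

# Larger sample from the large population (still only 0.04% of it).
spread_bigger_sample = spread_of_sample_proportions(1_000_000, 400)

print(f"n=100 from 10,000:    spread {spread_small_pop:.3f}")
print(f"n=100 from 1,000,000: spread {spread_large_pop:.3f}")
print(f"n=400 from 1,000,000: spread {spread_bigger_sample:.3f}")
```

The two samples of 100 vary by about the same amount even though one covers a hundred times larger a fraction of its population. Quadrupling the sample size roughly halves the spread, whatever the population size.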

Page 32: LT 4.1—Sampling and Surveys Day 3 Notes--Bias

What have we learned?

• Bias can destroy our ability to gain insights from our sample:

– Nonresponse bias can arise when sampled individuals will not or cannot respond.

– Response bias arises when respondents’ answers might be affected by external influences, such as question wording or interviewer behavior.

Page 33: LT 4.1—Sampling and Surveys Day 3 Notes--Bias

What have we learned?

• Bias can also arise from poor sampling methods:

– Voluntary response samples are almost always biased and should be avoided and distrusted.

– Convenience samples are likely to be flawed for similar reasons.

– Even with a reasonable design, sample frames may not be representative.

• Undercoverage occurs when individuals from a subgroup of the population are selected less often than they should be.

Page 34: LT 4.1—Sampling and Surveys Day 3 Notes--Bias

What have we learned?

• Finally, we must look for biases in any survey we find and be sure to report our methods whenever we perform a survey so that others can evaluate the fairness and accuracy of our results.