Vocabulary of Statistics Part 2. Data Collection First problem a statistician faces: how to obtain...

Vocabulary of Statistics

Part 2

Data Collection

First problem a statistician faces: how to obtain the data.

It is important to obtain good, or representative, data.

Inferences are made based on statistics obtained from the data.

Inferences can only be as good as the data.

Data Collection - Surveys

Telephone
pros: less costly; respondents may be more candid
cons: some people have no phone; some are on a "do not call" list

Mailed questionnaire
pros: covers a wider area; lower cost
cons: low response rate; inappropriate responses

Personal interview
pros: in-depth responses
cons: interviewers must be trained; costly; possible interviewer bias

Biased Sampling Method: A sampling method that produces data that systematically differ from the sampled population. An unbiased sampling method is one that does not produce such systematic differences.

Sampling methods that often result in biased samples:

Convenience sample (individuals who are easiest to reach)

Volunteer sample (individuals who choose to respond)
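
To see how a convenience sample can differ systematically from the population, here is a small simulation sketch. Everything in it is an assumption made for illustration: a hypothetical population of 10,000 adults in which younger people happen to live closer to the surveyor, so the "easiest to reach" people are mostly young. The convenience sample is compared with a simple random sample of the same size.

```python
import random
import statistics

rng = random.Random(1)

# Hypothetical population of 10,000 adults. Each person has an age and a
# distance (in km) from the spot where a surveyor stands. In this made-up
# population, younger people tend to live closer to that spot.
population = []
for _ in range(10_000):
    age = rng.randint(18, 90)
    distance = rng.uniform(0, 10) + (age - 18) / 10
    population.append((age, distance))

true_mean_age = statistics.mean(age for age, _ in population)

# Convenience sample: the 200 people easiest to reach (closest to the surveyor).
convenience = sorted(population, key=lambda person: person[1])[:200]
convenience_mean_age = statistics.mean(age for age, _ in convenience)

# Simple random sample of the same size, drawn from the whole population.
srs = rng.sample(population, 200)
srs_mean_age = statistics.mean(age for age, _ in srs)

print(f"population mean age:   {true_mean_age:.1f}")
print(f"convenience sample:    {convenience_mean_age:.1f}")
print(f"simple random sample:  {srs_mean_age:.1f}")
```

The convenience sample's mean age lands far below the population mean, while the simple random sample lands close to it. Taking a larger convenience sample would not fix this, because the error comes from how people are selected, not from how many.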

Process of data collection:

1. Define the objectives of the survey or experiment.

2. Define the variable and population of interest.

3. Define the data-collection and data-measuring schemes. This includes sampling procedures, sample size, and the data-measuring device (questionnaire, scale, ruler, etc.).

4. Determine the appropriate descriptive or inferential data-analysis techniques.

What Causes a Biased Sample?

A biased sample is any sample that is not representative of the target population (the sample is different from the population).

This is commonly caused by a difference between the sample and the population in the following factors, although differences in other factors can also cause it:

race
gender
household income
religion
geographic location
age

Let’s assume that you were interested in profiling the religious beliefs of the people who live in Oyster Bay. Would you poll the people walking down Anstice St. on Sunday morning?

Why not?

Let’s assume that you were interested in the percentage of Americans who listen to country music. Could you find this information by polling people on the street in New York City?

Why not?

Types of Bias

Selection Bias: When the method of data collection makes some individuals more likely (or less likely) than others to be selected for the study, so the sample does not reflect the overall population.

Nonresponse Bias: When those who respond to a survey differ significantly from those who do not respond.
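
Nonresponse bias can appear even when the people contacted were chosen fairly. The sketch below assumes a hypothetical population in which about half support a proposal, and assumed response rates of 60% for supporters and 20% for opponents. With those rates the expected share of supporters among respondents is 0.5 × 0.6 / (0.5 × 0.6 + 0.5 × 0.2) = 0.75, even though the true share is about 0.5.

```python
import random

rng = random.Random(2)

# Hypothetical population of 10,000 people; about half support a proposal.
population = [rng.random() < 0.5 for _ in range(10_000)]

# Assumed response rates: supporters return the survey far more often.
RESPONSE_RATE = {True: 0.60, False: 0.20}

# Step 1: a fair simple random sample of 1,000 people is selected.
sample = rng.sample(population, 1_000)

# Step 2: only some of the selected people actually respond.
respondents = [person for person in sample if rng.random() < RESPONSE_RATE[person]]

true_share = sum(population) / len(population)
estimated_share = sum(respondents) / len(respondents)

print(f"true support:      {true_share:.1%}")
print(f"survey estimate:   {estimated_share:.1%}")
```

The selection step here is unbiased; the distortion comes entirely from who chooses to answer.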

If you’re trying to find the percentage of high school students who use illegal drugs, can you poll them in front of their parents?

What kind of bias is this?

How can you change your method to minimize this bias?

How will your results be affected?

Sampling Frame: A list of the elements (people) belonging to the population from which the sample will be drawn.

Note: It is important that the sampling frame be representative of the population.

Examples: phone directory, electoral register

Sample Design: The process of selecting sample elements from the sampling frame in order to create the sample.

Note: There are many different types of sample designs. They usually fall into two categories: judgment samples and probability samples.

Examples: simple random, stratified random, cluster, systematic
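
The four example designs can be sketched as short selection rules applied to a sampling frame. This is a minimal illustration, not a standard library API: the function names, the frame of 120 students, and the use of grade as the stratum and homeroom as the cluster are all hypothetical choices.

```python
import random

def simple_random(frame, n, rng):
    """Simple random sample: every subset of size n is equally likely."""
    return rng.sample(frame, n)

def systematic(frame, n, rng):
    """Systematic sample: random start, then every k-th element of the frame."""
    k = len(frame) // n
    start = rng.randrange(k)
    return frame[start::k][:n]

def stratified(frame, n_per_stratum, stratum_of, rng):
    """Stratified random sample: a simple random sample within each stratum."""
    strata = {}
    for unit in frame:
        strata.setdefault(stratum_of(unit), []).append(unit)
    sample = []
    for members in strata.values():
        sample.extend(rng.sample(members, n_per_stratum))
    return sample

def cluster(frame, n_clusters, cluster_of, rng):
    """Cluster sample: randomly choose whole clusters and keep all their units."""
    clusters = {}
    for unit in frame:
        clusters.setdefault(cluster_of(unit), []).append(unit)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [unit for c in chosen for unit in clusters[c]]

# Hypothetical sampling frame: 120 students as (id, grade, homeroom).
# Grades 9-12 have 30 students each; each homeroom holds 10 students.
frame = [(i, 9 + i // 30, i // 10) for i in range(120)]
rng = random.Random(3)

srs = simple_random(frame, 12, rng)
sys_sample = systematic(frame, 12, rng)            # every 10th student
strat = stratified(frame, 3, lambda s: s[1], rng)  # 3 students per grade
clus = cluster(frame, 2, lambda s: s[2], rng)      # 2 whole homerooms
```

All four rules select elements by chance, so all four are probability samples. They differ in what is randomized: individuals (simple random), a starting point (systematic), individuals within groups (stratified), or entire groups (cluster).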

Judgment Samples: Samples that are selected on the basis of being “typical.”

Items are selected that the collector believes are representative of the population. The validity of the results from a judgment sample reflects the soundness of the collector’s judgment.

Probability Samples: Samples in which the elements (people) to be selected are drawn on the basis of probability. Each element in the population has a known probability of being selected as part of the sample.
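
For a simple random sample of n elements from a frame of N, each element's probability of selection is n/N. The sketch below checks this by repeated simulation; the sizes N = 20 and n = 5 are arbitrary assumptions. A judgment sample has no such known probability, which is why its validity rests on the collector's judgment.

```python
import random
from fractions import Fraction

N, n = 20, 5                    # hypothetical frame size and sample size
exact = Fraction(n, N)          # inclusion probability under simple random sampling
print(f"exact inclusion probability: {exact}")

# Monte Carlo check: draw many simple random samples and count how often
# each element of the frame is selected.
rng = random.Random(4)
TRIALS = 20_000
counts = [0] * N
for _ in range(TRIALS):
    for unit in rng.sample(range(N), n):
        counts[unit] += 1

estimates = [count / TRIALS for count in counts]
print(f"smallest / largest estimate: {min(estimates):.3f} / {max(estimates):.3f}")
```

Every element's estimated selection rate clusters around 1/4, matching n/N.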

Misuse of Statistics

“The pure and simple truth is rarely pure and never simple.” (Oscar Wilde)

“First get your facts; then you can distort them at your leisure.” (Mark Twain)

“There are three kinds of lies: lies, damned lies, and statistics.” (Benjamin Disraeli)

Three out of four doctors recommend new Zimento

Suspect Samples

How many doctors were actually surveyed?

4? 40? 100? 10,000?

How were they chosen?
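
Sample size changes how much a "three out of four" result can be trusted. The sketch below assumes, hypothetically, that the doctors were a simple random sample and uses the common normal-approximation 95% margin of error, 1.96 × √(p̂(1 − p̂)/n). That approximation is unreliable for very small n, so the n = 4 row should be read only as "far too uncertain to mean anything." The function name margin_of_error is made up for illustration.

```python
import math

def margin_of_error(p_hat, n, z=1.96):
    """Approximate 95% margin of error for a sample proportion (normal approximation)."""
    return z * math.sqrt(p_hat * (1 - p_hat) / n)

p_hat = 3 / 4   # "three out of four doctors"
for n in (4, 40, 100, 10_000):
    moe = margin_of_error(p_hat, n)
    print(f"n = {n:>6,}: 75% +/- {100 * moe:.1f} percentage points")
```

The margin is about ±42 percentage points for 4 doctors, ±13 for 40, ±8.5 for 100, and under ±1 for 10,000. Even a large sample does not answer the second question: if the doctors were not chosen at random, the result can still be biased.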

Misleading Graphs

[Figure: graph of survey results, 51% agree and 49% disagree]
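
The original graphs are not reproduced here. One common way a graph misleads is by starting the value axis above zero, which makes a small difference look large. The sketch assumes, hypothetically, that the 51% agree and 49% disagree bars are drawn on an axis that starts at 48%; the function name bar_height_ratio is made up for illustration.

```python
def bar_height_ratio(a, b, axis_start=0):
    """Ratio of drawn bar heights when the value axis begins at axis_start."""
    return (a - axis_start) / (b - axis_start)

agree, disagree = 51, 49
print(f"axis starting at 0%:                 {bar_height_ratio(agree, disagree):.2f}")
print(f"axis starting at 48% (hypothetical): {bar_height_ratio(agree, disagree, 48):.2f}")
```

With the axis at zero, the "agree" bar is only about 1.04 times as tall as the "disagree" bar. Starting the axis at 48% draws it 3 times as tall, even though the data are unchanged.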

Misleading wording in surveys

Do you support bringing freedom and democracy to the people of Iraq?

Do you support the unprovoked military action of the U.S. taking place in Iraq?

Changing values to represent the same data

The incumbent states, “During my tenure, expenditures have risen only 1%.”

The challenger states, “During my opponent’s tenure, expenditures have risen $10,000,000.”

Both statements are true, but one uses a percentage and the other a dollar amount.
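
For both statements to be true at once, the percentage and the dollar figure must describe the same increase. Assuming the 1% is measured against expenditures at the start of the tenure, the arithmetic below recovers the implied starting budget. The function names are hypothetical, and the $50,000,000 budget in the second line is an invented comparison.

```python
def implied_base(increase_amount, increase_pct):
    """Starting value for which an absolute increase equals increase_pct percent."""
    return increase_amount * 100 / increase_pct

def pct_increase(increase_amount, base):
    """Express an absolute increase as a percentage of the starting value."""
    return increase_amount * 100 / base

base = implied_base(10_000_000, 1)
print(f"implied starting budget: ${base:,.0f}")

# Hypothetical: the same $10,000,000 rise on a $50,000,000 budget.
print(f"{pct_increase(10_000_000, 50_000_000):.0f}% increase")
```

A $10,000,000 rise is 1% of a $1,000,000,000 budget but 20% of a $50,000,000 one. Each candidate picks the unit that best supports his argument.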

Detached Statistics

Our brand of crackers has one-third fewer calories.

Compared to what?

Zimento works four times faster.

Faster than what?
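
A relative claim like "one-third fewer calories" describes very different products depending on the unstated baseline. The sketch assumes two hypothetical baselines, since the ad names none; the function name after_reduction is also made up. Fractions keep the arithmetic exact.

```python
from fractions import Fraction

def after_reduction(baseline, reduction=Fraction(1, 3)):
    """Calories left after cutting the baseline by the given fraction."""
    return baseline * (1 - reduction)

# Hypothetical baselines (calories per serving); the ad names none.
baselines = {"a competitor's cracker": 150, "our original recipe": 90}
for label, calories in baselines.items():
    print(f"one-third fewer than {label} ({calories}): {after_reduction(calories)}")
```

The same slogan fits a 100-calorie cracker or a 60-calorie cracker. Without the baseline, the statistic is detached from anything a reader can check.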

Implied connections

Eating fish may help to reduce your cholesterol.

Studies suggest that using Zimento will help you reduce your weight.

Taking calcium will lower blood pressure in some people.