Chapter 1 Introduction to Statistics 1 Larson/Farber 4th ed. 1.1 An Overview of Statistics 1.2 Data Classification 1.3 Experimental Design


Chapter 1: Introduction to Statistics

• 1.1 An Overview of Statistics

• 1.2 Data Classification

• 1.3 Experimental Design


Sections 1.1 and 1.2

An Overview of Statistics

Classifying Data

Critical Thinking


What is Statistics?

Statistics: The science of collecting, organizing, analyzing, and interpreting data in order to make decisions.


Definitions

Population: The collection of all outcomes, responses, measurements, or counts that are of interest.

Sample: The collection of data from a subset of the population.

Census: The collection of data from every member of the population.


Example: Identify the population, and whether a census or sample would be done.

1. HCC is doing a study on how many credit hours an HCC student is taking.

2. HCC is doing a study on how many hours a week an HCC student is working.

3. A fashion magazine gathers data on the price of women’s jeans.


What is Data?

Data: The responses, counts, measurements, or observations that have been collected.

Data can be classified as one of 2 types:

1. Qualitative Data

2. Quantitative Data


Qualitative Data

Qualitative Data: Consists of non-numeric, categorical attributes or labels.

Examples: major, place of birth, eye color

Common statistic calculated: percentages


Quantitative Data

Quantitative Data: Numerical measurements or counts.

Examples: age, weight of a letter, temperature


Common statistic calculated: averages
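The two "common statistic" notes can be illustrated with a short sketch: percentages for a qualitative variable and an average for a quantitative one. The eye colors and ages below are invented for illustration and are not from the slides.

```python
from collections import Counter
from statistics import mean

# Hypothetical responses, invented for illustration only.
eye_colors = ["brown", "blue", "brown", "green", "brown", "blue", "brown", "brown"]
ages = [19, 22, 20, 35, 18, 27, 21, 30]


def category_percentages(values):
    """Return the percentage of observations that fall in each category."""
    counts = Counter(values)
    total = len(values)
    return {category: 100 * count / total for category, count in counts.items()}


# Qualitative data: summarize with percentages per category.
eye_color_pct = category_percentages(eye_colors)

# Quantitative data: summarize with an average.
average_age = mean(ages)

print(eye_color_pct)
print(average_age)
```

An average of eye colors would be meaningless, which is why the type of data decides which summary makes sense.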


Quantitative Data: Discrete vs. Continuous

Discrete data: a finite or countable number of possible data values: 0, 1, 2, 3, 4, ...

ex: Number of classes a student is taking


Continuous data: infinite number of possible data values on a continuous scale

ex: Weight of a baby


Parameters and Statistics

Parameter: A number that describes some characteristic of an entire population.

ex: Average age of all people in the United States

Statistic: A number that describes some characteristic of a sample.

ex: Average age of people from a sample of three states
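A minimal sketch of the distinction, assuming a made-up population of 10,000 ages (the data, the sample size of 100, and the seed are all invented for illustration): the parameter uses every member, the statistic uses only a sample.

```python
import random
from statistics import mean

random.seed(1)  # fixed seed so the sketch is repeatable

# Hypothetical population: ages of 10,000 people (invented data).
population_ages = [random.randint(0, 90) for _ in range(10_000)]

# Parameter: computed from every member of the population.
parameter = mean(population_ages)

# Statistic: computed from a sample (here, 100 members chosen at random).
sample_ages = random.sample(population_ages, 100)
statistic = mean(sample_ages)

print(f"parameter (population mean): {parameter:.2f}")
print(f"statistic (sample mean):     {statistic:.2f}")
```

The two numbers are usually close but not equal. In practice the parameter is rarely known, so the statistic is used to estimate it.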


Ex: Parameters vs. Statistics

Decide whether the numerical value describes a population parameter or a sample statistic.

1. The average credit load of all HCC full-time students is 14.2 credit hours.

2. A sample of 300 HCC full-time students showed that the average work week is 18.3 hours.

3. A Gallup poll of 1,012 adults nationwide showed 34% owned a handgun.


White House 2008: Republican Nomination

Pew Research Center for the People & the Press survey conducted by Princeton Survey Research Associates International. Dec. 19-30, 2007. N=471 registered voters nationwide who are Republicans or lean Republican. MoE ± 5.

"I'm going to read you the names of some Republican presidential candidates. Which one of the following Republican candidates would be your first choice for president: [see below]?" If unsure: "Just as of today, would you say you lean toward [see below]?" (Names were rotated)

Candidate        Percent
John McCain      22%
Rudy Giuliani    20%
Mike Huckabee    17%
Mitt Romney      12%
Fred Thompson     9%
Ron Paul          4%
Duncan Hunter     1%
Other (vol.)      1%
None (vol.)       2%
Unsure           12%


Branches of Statistics

Descriptive Statistics: Involves organizing, summarizing, and displaying data. Describes the important characteristics of the data (e.g. tables, charts, averages, percentages).

Inferential Statistics: Involves using sample data to draw conclusions or make inferences about an entire population.


Example: Descriptive and Inferential Statistics

Decide which part of the study represents the descriptive branch of statistics. What conclusions might be drawn from the study using inferential statistics?

A sample of Illinois adults showed that 22.7% of those with a high school diploma were obese, and 16.7% of college graduates were obese. (Source: Illinois BRFSS, 2004)


Example: Descriptive and Inferential Statistics

Decide which part of the study represents the descriptive branch of statistics. What conclusions might be drawn from the study using inferential statistics?

A sample of 471 registered Republicans showed that 22% would pick John McCain as the Republican nominee for president. (Margin of error: 5%). (Source: USA Today/CNN poll)
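One way to see the two branches in this example: the 22% is the descriptive part, and the margin of error turns it into an inferential statement about the population. The sketch below uses the usual reading of "estimate plus or minus margin". The helper name is made up for illustration, and the slide does not state a confidence level.

```python
def interval_from_margin(estimate_pct, margin_pct):
    """Return (low, high) for a sample percentage and its margin of error."""
    return estimate_pct - margin_pct, estimate_pct + margin_pct


# Values taken from the example: 22% for McCain, margin of error 5 points.
low, high = interval_from_margin(22, 5)

print("Descriptive: 22% of the sample chose McCain.")
print(f"Inferential: the population value is likely between {low}% and {high}%.")
```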


Uses of Statistics

• Almost all fields of study benefit from the application of statistical methods

• Statistics often lead to change


Misuses of Statistics

• Bad Samples

• Small Samples

• Misleading Graphs

• Pictographs

• Loaded Questions

• Correlation & Causality

• Self Interest Study


Misuse: Bad Samples

• Voluntary response sample: Respondents themselves decide whether to be included in the sample

• Ex. Online surveys

• Ex. Ratemyprofessor.com

• Samples must be unbiased and fairly represent the entire population.

• If the data is not collected appropriately, the data may be completely useless. “Garbage in, garbage out”


Misuse: Misleading Graphs


CNN/USA Today Gallup poll on Terri Schiavo (March 2005)


CNN/USA Today Gallup poll on Terri Schiavo (March 2005), reprinted


Misuse: Pictographs


Misuse: Loaded Questions

“Should the President have the line item veto to eliminate waste?” (97% said yes.)

“Should the President have the line item veto?” (57% said yes.)


Misuse: Correlation does not imply Cause and Effect


Misuses: Self Interest and Deliberate Distortions


Section 1.3

Experimental Design


Designing a Statistical Study

1. What is it you want to study?

2. What is the population to gather data from?

3. *Collect data. If you use a sample, it must be representative of the population.

4. Descriptive Statistics – organize, present, summarize data

5. Inferential Statistics – draw conclusions about the population based on sample data


Things to Consider with Samples

1. The sample must be unbiased and fairly represent the entire population.

2. If the data is not collected appropriately, the data may be completely useless. “Garbage in, garbage out”

3. Want the maximum information at the minimum cost. What sample size is needed?


Methods of Collecting Data

• Observational study

• Survey

• Experiment

• Simulation


Methods of Collecting Data

Observational study

• A researcher observes or measures characteristics of interest of part of a population but does not change any existing conditions.

Experiment

• A treatment is applied to part of a population and responses are observed.


Methods of Collecting Data

Survey

• An investigation of one or more characteristics of a population, usually by asking people questions.

• Commonly done by interview, mail, or telephone.

Simulation

• Uses a mathematical or physical model to reproduce the conditions of a situation or process. Often involves the use of computers.
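As a small illustration of a computer simulation, the sketch below models rolling two fair dice many times and estimates how often the total is 7. The dice scenario, the function name, and the number of trials are assumptions chosen for the example, not something from the slides.

```python
import random


def estimate_probability_of_seven(trials, seed=0):
    """Simulate rolling two fair dice and return the fraction of rolls totaling 7."""
    rng = random.Random(seed)  # seeded so the sketch is repeatable
    hits = 0
    for _ in range(trials):
        if rng.randint(1, 6) + rng.randint(1, 6) == 7:
            hits += 1
    return hits / trials


estimate = estimate_probability_of_seven(100_000)
print(f"simulated: {estimate:.4f}   exact: {1 / 6:.4f}")
```

The model stands in for a process that would be slow or impractical to observe directly, which is the situation where simulation is typically chosen.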


Example: Methods of Data Collection

Consider the following studies. Which method of data collection would you use to collect data for each study?

1. A study of salaries of NFL players.

2. A study of the emergency response times during a terrorist attack.

3. A study of whether changing teaching techniques improves FCAT scores.

4. A study of whether Tampa residents support a mass transit system.


Sampling Techniques

• Random versus Non-Random Samples

• Convenience Samples

• Simple Random Samples

• Systematic Samples

• Stratified Samples

• Cluster Samples


Random and Non-Random Sampling

Random Sampling

• Every member of the population has an equal chance of being selected.

Non-Random Sampling

• Some members of the population have no chance of being picked. Often leads to biased samples.


Convenience Samples

• Data that is readily available and easy to get is collected.

• Self-selected surveys or voluntary response surveys (online surveys, magazine surveys, 1-800-Verdict, Ratemyprofessor.com)


Simple Random Sample

• A random sample where every member of the population and every group of the same size has an equal chance of being selected.

• Usually involves using a random number generator.


Simple Random Sampling

• Number each element of the population from 1 to N.

• Use a random number generator (table, calculator, computer) to randomly select a sample of size “n”.

• TI-83/84: randInt(1,N,n), or:

• Table 1 in text. Pick a random start.
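The numbering-and-drawing steps above can be sketched on a computer as follows. The population size of 500, the sample size of 10, and the function name are assumptions chosen for the example.

```python
import random


def simple_random_sample(population_size, sample_size, seed=None):
    """Pick sample_size distinct labels from 1..population_size."""
    rng = random.Random(seed)
    # rng.sample draws without replacement, so no label is chosen twice.
    labels = range(1, population_size + 1)
    return sorted(rng.sample(labels, sample_size))


# Hypothetical population numbered 1 to 500; draw a sample of size 10.
chosen = simple_random_sample(population_size=500, sample_size=10, seed=42)
print(chosen)
```

A generator that draws each integer independently can repeat a number. If that happens, discard the repeat and draw again so the sample still has n distinct members.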


Systematic Sampling

Choose a starting value at random. Then choose every kth member of the population.

• Example: Select every 3rd patient who enters the ER.
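A minimal sketch of the ER example, assuming a made-up arrival list of 30 patients: a random start among the first k arrivals, then every kth arrival after that. The patient labels and the function name are hypothetical.

```python
import random


def systematic_sample(members, k, seed=None):
    """Pick a random start among the first k members, then take every kth member."""
    rng = random.Random(seed)
    start = rng.randrange(k)  # index 0..k-1
    return members[start::k]


# Hypothetical arrival order of 30 ER patients.
patients = [f"patient_{i}" for i in range(1, 31)]
every_third = systematic_sample(patients, k=3, seed=7)
print(every_third)
```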


Stratified Sampling

• Divide a population into at least 2 different subgroups (strata) that share the same characteristics (age, gender, ethnicity, income, etc.) and select a random sample from each group.

• Advantages: More information
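A sketch of the idea, assuming a made-up list of 40 students split into two class years: group members by stratum, then draw a random sample from every stratum. Taking the same number from each stratum is a simplification for the example. Allocating in proportion to stratum size is another common choice.

```python
import random
from collections import defaultdict


def stratified_sample(members, stratum_of, per_stratum, seed=None):
    """Group members into strata, then randomly sample per_stratum from each."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for member in members:
        strata[stratum_of(member)].append(member)
    sample = []
    for group in strata.values():
        # Raises ValueError if a stratum has fewer than per_stratum members.
        sample.extend(rng.sample(group, per_stratum))
    return sample


# Hypothetical students as (name, class_year): 20 freshmen and 20 sophomores.
years = ["freshman", "sophomore"] * 20
students = [(f"student_{i}", year) for i, year in enumerate(years)]

picked = stratified_sample(students, stratum_of=lambda s: s[1], per_stratum=5, seed=3)
print(picked)
```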


Cluster Sampling

• Divide the population into many like subgroups (clusters); randomly select some of those clusters, and then select all of the members of those clusters to be in the sample.

• Advantage: useful for geographically separated populations
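A sketch of cluster sampling, assuming hypothetical households grouped by zip code (the cluster names and members are invented): randomly choose whole clusters, then include every member of each chosen cluster.

```python
import random


def cluster_sample(clusters, clusters_to_pick, seed=None):
    """Randomly choose whole clusters and return every member of each one."""
    rng = random.Random(seed)
    chosen_names = rng.sample(sorted(clusters), clusters_to_pick)
    sample = [member for name in chosen_names for member in clusters[name]]
    return chosen_names, sample


# Hypothetical clusters: households grouped by zip code.
households_by_zip = {
    "zip_A": ["A1", "A2", "A3"],
    "zip_B": ["B1", "B2"],
    "zip_C": ["C1", "C2", "C3", "C4"],
    "zip_D": ["D1", "D2"],
}

chosen_zips, surveyed = cluster_sample(households_by_zip, clusters_to_pick=2, seed=11)
print(chosen_zips, surveyed)
```

Stratified sampling takes some members from every group. Cluster sampling takes every member from some groups.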


Sources of Error in Sampling

Sampling Error

• The expected difference between a sample result and the true population result (e.g. “margin of error”).

Non-Sampling Error *

• Sample data is incorrectly gathered, collected, or recorded.

• Selection Bias: bad sample

• Response Bias: bad data (incorrect responses, inaccurate measurements, etc.)
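To give a feel for the size of sampling error, the sketch below uses the standard 95% approximation for the margin of error of a sample proportion from a simple random sample, z * sqrt(p(1 - p)/n) with z = 1.96 and p = 0.5 as the worst case. This formula is an assumption brought in for illustration and is not stated in the slides. The sample size 471 is the N from the Pew poll quoted earlier.

```python
import math


def margin_of_error(sample_size, proportion=0.5, z=1.96):
    """Approximate margin of error for a sample proportion (simple random sample)."""
    return z * math.sqrt(proportion * (1 - proportion) / sample_size)


# N = 471 is the sample size of the Pew poll quoted earlier in these slides.
moe = margin_of_error(471)
print(f"{100 * moe:.1f} percentage points")

# Quadrupling the sample size halves the margin of error.
print(f"{100 * margin_of_error(4 * 471):.1f} percentage points")
```

For N = 471 this gives roughly 4.5 percentage points, in line with the ±5 reported for the poll. The pollster's exact method is not given in the source. Non-sampling errors such as selection bias or response bias are not captured by this formula and do not shrink as the sample grows.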