Data Collection Jan 28,2014 Math 119 - Fall 2011 1.

Page 1:

Sampling and Experiments
Data Collection
Jan 28, 2014

Bruce Smith
50 minutes.
Page 2:

Overview

Identify the population in a sampling situation
Recognize bias due to sampling methods
Recognize sources of errors in a sample survey

Bruce Smith
Prior to this, at the end of the 1st hour, I showed 15 minutes of TED talks. One needs to be shown every week to remind us of the power and beauty of statistics.
Page 3:

Good Statistics = Good Data: 1936 Presidential Election (see wiki)

Presidential election between Franklin D. Roosevelt (D) and Alfred Landon (R).

Before the election, Literary Digest magazine conducted an opinion poll of the voting population. Its survey predicted that Landon would win the 1936 election, and this was widely reported.
◦ ballots were mailed to names drawn largely from telephone directories, automobile registrations, and the magazine's own subscriber lists
◦ in 1936, households with telephones were wealthier than average and leaned Republican
Roosevelt won convincingly.
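The effect of a sampling frame tied to telephone ownership can be sketched with a toy electorate. The counts below are invented purely for illustration (they are not historical figures), and `landon_share` is a hypothetical helper; the point is only that a frame restricted to phone owners can report a very different share than the whole population.

```python
# Hypothetical electorate: the counts below are invented for illustration,
# not historical data.
population = (
    [{"has_phone": True, "prefers": "Landon"}] * 1500
    + [{"has_phone": True, "prefers": "Roosevelt"}] * 1000
    + [{"has_phone": False, "prefers": "Landon"}] * 2250
    + [{"has_phone": False, "prefers": "Roosevelt"}] * 5250
)


def landon_share(voters):
    """Fraction of the given voters who prefer Landon."""
    return sum(v["prefers"] == "Landon" for v in voters) / len(voters)


# Sampling frame built only from people who own a telephone.
phone_frame = [v for v in population if v["has_phone"]]

true_share = landon_share(population)    # parameter for the whole electorate
frame_share = landon_share(phone_frame)  # what a phone-based poll can see

print(f"Landon share in the whole population: {true_share:.1%}")
print(f"Landon share in the telephone frame:  {frame_share:.1%}")
```

No amount of extra sampling from the telephone frame fixes this gap, because the people missing from the frame differ systematically from those in it.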

Page 4:

Observation vs Experiment

Observational Study
◦ researchers simply observe characteristics and take measurements
◦ can reveal association, not causation

Designed Experiment
◦ researchers impose treatments and controls and THEN observe characteristics and take measurements
◦ can help establish causation

Page 5:

An Observational Study

Vasectomies and Prostate Cancer
◦ 450,000 performed each year in US
◦ tube carrying sperm from testicles cut and tied
Study by E. Giovannucci
◦ 113 cases of prostate cancer per 22,000 men with vasectomies
◦ 70 per 22,000 is the expected rate
The study shows ~60% elevated risk, revealing an association, but it does not establish cause.
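The "~60% elevated risk" figure follows directly from the two rates on the slide. A minimal sketch of the arithmetic, using only the slide's numbers (the variable names are my own):

```python
cases_vasectomy = 113  # observed cases per 22,000 men with vasectomies
cases_expected = 70    # expected cases per 22,000 men
group_size = 22_000

rate_observed = cases_vasectomy / group_size
rate_expected = cases_expected / group_size

# Relative risk: how many times higher the observed rate is than expected.
relative_risk = rate_observed / rate_expected
excess_risk = relative_risk - 1

print(f"relative risk: {relative_risk:.2f}")
print(f"excess risk:   {excess_risk:.0%}")
```

Because this is an observational study, the ratio describes an association only; it says nothing about why the rates differ.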

Page 6:

A Designed Experiment

Folic Acid and Birth Defects (study by Andrew Czeizel and István Dudás)

4,753 women divided into two groups
◦ one group took daily multivitamins containing 0.8 mg of folic acid
◦ other group received only trace elements

Drastic reduction in the rate of major birth defects
◦ 13 per 1,000 vs 23 per 1,000
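The size of the reduction can be worked out from the two reported rates. This is a short sketch using only the slide's figures; the variable names are illustrative.

```python
rate_folic = 13 / 1000    # major birth defects, folic acid group
rate_control = 23 / 1000  # major birth defects, trace-element group

# Absolute reduction: how many fewer cases per woman treated.
risk_difference = rate_control - rate_folic

# Relative comparison: ratio of the two rates, and the implied reduction.
relative_risk = rate_folic / rate_control
relative_reduction = 1 - relative_risk

print(f"absolute reduction: {risk_difference * 1000:.0f} per 1,000")
print(f"relative risk:      {relative_risk:.2f}")
print(f"relative reduction: {relative_reduction:.1%}")
```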

Page 7:

Confounding (folic acid & birth defects): survey vs experiment for controlling confounding factors

If we had simply done a survey and asked women whether they took supplements, the effect of the explanatory variable (folic acid consumption) might be confounded with other factors.
◦ women who voluntarily choose to take vitamins might generally make healthier decisions and exercise more often
Healthier decisions CONFOUND the impact of folic acid on birth defects.
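A small simulation can illustrate why an experiment helps here. Everything in it is an assumption made for illustration: the "healthy habits" flag, the 50% prevalence, the 0.8 and 0.2 self-selection probabilities, and the use of random assignment in the experiment scenario. It compares how unevenly the lurking variable is spread across groups in a survey versus an experiment.

```python
import random

rng = random.Random(119)  # fixed seed so the sketch is repeatable
N_WOMEN = 4753            # study size from the slide

# Hypothetical lurking variable: does a woman have generally healthy habits?
healthy = [rng.random() < 0.5 for _ in range(N_WOMEN)]


def share_healthy(indices):
    """Fraction of the listed women who have healthy habits."""
    return sum(healthy[i] for i in indices) / len(indices)


# Survey scenario: women choose for themselves, and (by assumption)
# women with healthy habits are far more likely to take vitamins.
takes_vitamins = [rng.random() < (0.8 if h else 0.2) for h in healthy]
survey_takers = [i for i in range(N_WOMEN) if takes_vitamins[i]]
survey_others = [i for i in range(N_WOMEN) if not takes_vitamins[i]]

# Experiment scenario: the researcher assigns the groups at random.
order = list(range(N_WOMEN))
rng.shuffle(order)
treatment, control = order[: N_WOMEN // 2], order[N_WOMEN // 2:]

survey_gap = share_healthy(survey_takers) - share_healthy(survey_others)
experiment_gap = share_healthy(treatment) - share_healthy(control)

print(f"healthy-habit gap between groups, survey:     {survey_gap:+.2f}")
print(f"healthy-habit gap between groups, experiment: {experiment_gap:+.2f}")
```

In the survey scenario the vitamin group is much healthier to begin with, so any difference in outcomes cannot be attributed to folic acid alone. Under random assignment the two groups are nearly balanced on the lurking variable.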

Page 8:

Sample vs Population

Population
◦ group of individuals about whom we wish to get more information; typically we are not able to assess it directly
Sample
◦ a subset of the population
Sampling Design
◦ the method by which we choose the subset

[Figure: Population, Sample]

A parameter is a number describing a characteristic of the population.

A statistic is a number describing a characteristic of a sample.
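The distinction can be made concrete with a made-up population. In this sketch the commute-time data, the sample size of 25, and the names `mu` and `x_bar` are all assumptions chosen for illustration.

```python
import random
from statistics import mean

rng = random.Random(119)  # fixed seed so the sketch is repeatable

# Hypothetical population: commute times (minutes) for 1,000 students.
population = [rng.randint(5, 60) for _ in range(1000)]

# Parameter: computed from every member of the population.
mu = mean(population)

# Statistic: computed from a sample only.
sample = rng.sample(population, 25)
x_bar = mean(sample)

print(f"parameter (population mean): {mu:.2f}")
print(f"statistic (sample mean):     {x_bar:.2f}")
```

The parameter is a single fixed number, while the statistic changes each time a different sample is drawn.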

Page 9:

Collecting Sample Data

Whether an observational study or an experiment is used to collect data, the data must be representative of the population.

Let’s look at methods by which data are collected.

Page 10:

Definitions

Random Sample
members of the population are selected in such a way that each individual member has an equal chance of being selected. (Contrast this with voluntary & convenience samples.)

Simple Random Sample (of size n)
subjects are selected in such a way that every possible sample of the same size n has the same chance of being chosen

* E.g., sample 10 people to determine voter preference. Select 10 from the front of the room? Put names in a hat? Whichever 10 are chosen should be equally representative. (Not convenient or voluntary.)
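The "names in a hat" idea can be sketched as follows. The roster is a set of placeholder names and the sample size of 2 is kept small so that every possible sample can be listed; both are assumptions for illustration.

```python
import random
from itertools import combinations

rng = random.Random(119)  # fixed seed so the sketch is repeatable

# Hypothetical class roster; the names are placeholders.
roster = ["Ana", "Ben", "Cam", "Dee", "Eli"]
n = 2

# Every possible sample of size n. A simple random sample gives
# each of these the same chance of being the one chosen.
all_samples = list(combinations(roster, n))
chance_each = 1 / len(all_samples)

# random.sample draws n distinct members without replacement,
# which matches drawing names from a hat.
srs = rng.sample(roster, n)

print(f"{len(all_samples)} possible samples, each with chance {chance_each}")
print("one simple random sample:", srs)
```

Picking the people at the front of the room fails this definition, because samples containing anyone from the back have no chance of being chosen.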

Page 11:

Copyright © 2007 Pearson Education, Inc. Publishing as Pearson Addison-Wesley.

Random Sampling
selection so that each individual member has an equal chance of being selected

Page 12:

Systematic Sampling
Select some starting point and then select every k-th element in the population
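A minimal sketch of systematic sampling, assuming the population is held in an ordered list. Choosing the starting point at random among the first k items is my assumption (the slide only says "some starting point"), and `systematic_sample` is a hypothetical helper name.

```python
import random


def systematic_sample(population, k, rng):
    """Pick a random start among the first k items, then take every k-th item."""
    start = rng.randrange(k)
    return population[start::k]


rng = random.Random(119)          # fixed seed so the sketch is repeatable
population = list(range(1, 101))  # hypothetical ID numbers 1..100
sample = systematic_sample(population, k=10, rng=rng)

print(sample)
```

If the list has a repeating pattern whose period matches k, this method can produce a badly unrepresentative sample, so the ordering of the list matters.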

Page 13:

Convenience Sampling
use results that are easy to get

Page 14:

Stratified Sampling
subdivide the population into at least two different subgroups that share the same characteristics, then draw a sample from each subgroup (or stratum)
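One way to sketch stratified sampling is to group the units by stratum and then draw a simple random sample inside each group. The student list, the class-year strata, the equal sample size per stratum, and the helper name `stratified_sample` are all assumptions for illustration.

```python
import random
from collections import defaultdict


def stratified_sample(units, stratum_of, per_stratum, rng):
    """Draw a simple random sample of size per_stratum from each stratum."""
    strata = defaultdict(list)
    for unit in units:
        strata[stratum_of(unit)].append(unit)
    return {name: rng.sample(members, per_stratum)
            for name, members in strata.items()}


rng = random.Random(119)  # fixed seed so the sketch is repeatable

# Hypothetical students tagged with a class year (20 seniors, 40 freshmen).
students = [(f"student{i}", "freshman" if i % 3 else "senior")
            for i in range(60)]

sample = stratified_sample(students, stratum_of=lambda s: s[1],
                           per_stratum=5, rng=rng)

for stratum, members in sample.items():
    print(stratum, [name for name, _ in members])
```

Every stratum is guaranteed to appear in the result, which is the main reason to stratify when a subgroup is small.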

Page 15:

Cluster Sampling
divide the population into sections (or clusters); randomly select some of those clusters; choose all members from selected clusters. Each cluster should be a small-scale representation of the total population.
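Cluster sampling can be sketched in the same style: the random step picks whole clusters, and then every member of a chosen cluster is kept. The class sections, their sizes, and the helper name `cluster_sample` are hypothetical.

```python
import random


def cluster_sample(clusters, n_clusters, rng):
    """Randomly choose n_clusters clusters and keep every member of each."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    members = [m for name in chosen for m in clusters[name]]
    return chosen, members


rng = random.Random(119)  # fixed seed so the sketch is repeatable

# Hypothetical clusters: six class sections of 20 students each.
sections = {f"section{s}": [f"s{s}-{i}" for i in range(20)]
            for s in range(1, 7)}

chosen, members = cluster_sample(sections, n_clusters=2, rng=rng)

print("chosen clusters:", chosen)
print("number of students sampled:", len(members))
```

Contrast this with stratified sampling, which takes some members from every subgroup rather than all members from a few subgroups.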

Page 16:

Cluster Sampling

Page 17:

Sampling in the Wild

The Current Population Survey (CPS)
◦ http://www.census.gov/cps
◦ monthly survey of about 60,000 households
◦ the sample is scientifically selected to represent the civilian noninstitutional population
◦ employment status of each member of household
◦ data is used to make model-based estimates for individual states and other geographic areas
◦ estimates obtained from the CPS include employment, unemployment, earnings, hours of work, and other indicators
◦ available by age, sex, race, marital status, educational attainment, and school enrollment
◦ used by policymakers and legislators as important indicators of our nation's economic situation

Bruce Smith
Stopped here on 1/20 and moved on to chapter 2, frequency tables and histograms.

Page 21:

Regarding earlier slide…

Among the factors that contributed to the decrease in the percentage of family households with children under 18:
◦ Increases in longevity: The average number of years of life remaining at age 30 increased about three years, comparing those age 30 in 1960 with baby boomers who turned 30 in 1980 (Table 11, U.S. Life Tables, National Center for Health Statistics). As adults live longer, a larger proportion of married-couple households will be those who are older and either childless or whose adult children live elsewhere. In 1968, 29 percent of married men were age 55 and over, as were 22 percent of married women. In 2008, 38 percent of married men were 55 and over, as were 33 percent of married women.
◦ Increases in childlessness: The percentage of women age 40 to 44 who were childless increased from 10 percent in 1976 to 20 percent in 2006 (Supplemental Table 1, U.S. Census Bureau).

Other highlights from America’s Families and Living Arrangements: 2008 include:
◦ The median age for men at first marriage was 27.4 years. For women, the median age at first marriage was 25.6.
◦ The percentage of family households with children under 18 that had three or more of their own children present was 21 percent in both 1998 and 2008.
◦ The percentage of adults ages 45 to 49 who were married varied by race and ethnicity. For example, among women 45 to 49, 79 percent of Asians, 69 percent of white non-Hispanics, 62 percent of Hispanics and 43 percent of blacks were married.
◦ In 2008, 66.9 million opposite-sex couples lived together: 60.1 million were married, and 6.8 million were not.
◦ The United States had an estimated 5.5 million “stay-at-home” parents: 5.3 million mothers and 140,000 fathers.
◦ The percentage of children living with two parents varied by race and origin. Eighty-five percent of Asian children lived with two parents, as did 78 percent of white non-Hispanic children, 70 percent of Hispanic children and 38 percent of black children.
◦ About 9 percent of all children (6.6 million) lived in a household that included a grandparent. Twenty-three percent of children living with a grandparent had no parent present.
◦ In 2008, 6 percent of white non-Hispanic children lived in a household with a grandparent present, compared with 10 percent of Hispanic children, and 14 percent of both Asian and black children.

Page 22:

Bureau of Economic Analysis

Bureau of Economic Analysis (bea.gov) http://www.bea.gov/regional/gsp/action.cfm

GDP: the market value of all final goods and services made within the borders of a nation in a year