Data Collection, Jan 28, 2014 (Math 119 - Fall 2011)
Sampling and Experiments: Data Collection
Jan 28, 2014
Math 119 - Fall 2011 1
Overview
Identify the population in a sampling situation
Recognize bias due to sampling methods
Recognize sources of error in a sample survey
Good Statistics = Good Data: 1936 Presidential Election (see wiki)
Presidential election between Franklin D. Roosevelt (D) and Alfred Landon (R).
Before the election, Literary Digest magazine conducted an opinion poll of the voting population.
Its survey predicted that Landon would win the 1936 election, and this was widely reported.
◦ ballots were mailed to names drawn largely from telephone directories and similar lists
◦ most homeowners with telephones were Republicans
Roosevelt won convincingly.
Observation vs Experiment
Observational Study
◦ researchers simply observe characteristics and take measurements
◦ can reveal association, not causation
Designed Experiment
◦ researchers impose treatments and controls and THEN observe characteristics and take measurements
◦ can help establish causation
An Observational Study
Vasectomies and Prostate Cancer
◦ 450,000 performed each year in US
◦ tube carrying sperm from testicles is cut and tied
Study by E. Giovannucci
◦ 113 cases of prostate cancer per 22,000 men with vasectomies
◦ 70 per 22,000 is the expected rate
◦ study shows ~60% elevated risk, revealing an association, but it does not establish cause
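The "~60% elevated risk" on the vasectomy slide follows directly from the two rates quoted there. A minimal Python sketch of that arithmetic (the function name `relative_risk` is ours, not the study's):

```python
def relative_risk(observed_cases, expected_cases, group_size):
    """Ratio of the observed rate to the expected rate for equal-sized groups."""
    observed_rate = observed_cases / group_size
    expected_rate = expected_cases / group_size
    return observed_rate / expected_rate

# Figures from the slide: 113 observed vs 70 expected cases per 22,000 men.
rr = relative_risk(113, 70, 22_000)
excess_percent = (rr - 1) * 100

print(f"relative risk = {rr:.2f}, elevated risk = {excess_percent:.0f}%")
# prints: relative risk = 1.61, elevated risk = 61%
```

A ratio above 1 only describes an association in the observed data; by itself it says nothing about cause.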
A Designed Experiment
Folic Acid and Birth Defects (study by Czeizel and Istvan Dudas)
4,753 women divided into two groups
◦ one group took daily multivitamins containing 0.8 mg of folic acid
◦ other group received only trace elements
Drastic reduction in the rate of major birth defects
◦ 13 per 1,000 vs 23 per 1,000
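To put a number on the "drastic reduction" in the folic acid experiment, the two rates can be compared in absolute and relative terms. A small Python sketch using only the rates quoted on the slide (variable names are ours):

```python
# Rates of major birth defects quoted on the slide.
treatment_rate = 13 / 1000   # multivitamin with folic acid
control_rate = 23 / 1000     # trace elements only

absolute_reduction = control_rate - treatment_rate
relative_reduction = absolute_reduction / control_rate

print(f"absolute reduction: {absolute_reduction * 1000:.0f} per 1,000")
print(f"relative reduction: {relative_reduction:.1%}")
```

Because the groups were formed by the researchers rather than by the women's own choices, a difference of this size can reasonably be attributed to the treatment.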
Confounding (folic acid & birth defects): survey vs experiment for controlling confounding factors
If we had simply done a survey and asked women whether they took supplements, the effect of the explanatory variable (folic acid consumption) might be confounded with other factors.
◦ women who would voluntarily choose to take vitamins might generally make healthier decisions and exercise more often
◦ healthier decisions CONFOUND the impact of folic acid on birth defects
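The confounding argument can be illustrated with a toy simulation. In the sketch below the supplement has no effect at all; only a hidden "healthy lifestyle" factor changes the risk. All probabilities are made-up values chosen for illustration, not figures from the study.

```python
import random

def defect_rate_gap(randomized, n=200_000, seed=0):
    """Defect rate of supplement takers minus that of non-takers."""
    rng = random.Random(seed)
    defects = {True: 0, False: 0}   # keyed by "takes supplement"
    totals = {True: 0, False: 0}
    for _ in range(n):
        healthy = rng.random() < 0.5          # hidden lifestyle factor
        if randomized:
            takes = rng.random() < 0.5        # experiment: coin-flip assignment
        else:
            # survey: healthier women are more likely to choose supplements
            takes = rng.random() < (0.8 if healthy else 0.2)
        risk = 0.01 if healthy else 0.03      # supplement itself does nothing
        defects[takes] += int(rng.random() < risk)
        totals[takes] += 1
    return defects[True] / totals[True] - defects[False] / totals[False]

observational_gap = defect_rate_gap(randomized=False)
randomized_gap = defect_rate_gap(randomized=True)
print(f"survey gap:     {observational_gap:+.4f}")
print(f"experiment gap: {randomized_gap:+.4f}")
```

Under these assumptions the survey shows supplement takers with a clearly lower defect rate even though the supplement is inert, while random assignment spreads the healthy and less healthy women evenly across both groups and the gap stays near zero.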
Sample vs Population
Population
◦ group of individuals about whom we wish to get more information; typically we are not able to assess it directly
Sample
◦ a subset of the population
Sampling Design
◦ the method by which we choose the subset
[Figure labels: Population, Sample]
Collecting Sample Data
A parameter is a number describing a characteristic of the population.
A statistic is a number describing a characteristic of a sample.
Whether an observational study or an experiment is used to collect data, the data has to be representative of the population.
Let’s look at methods by which data is collected.
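The parameter/statistic distinction is easy to see in code. The sketch below builds a hypothetical population (made-up sleep hours for 5,000 students), computes its mean as the parameter, and then computes the mean of a random sample of 50 as the statistic.

```python
import random
from statistics import mean

rng = random.Random(2011)  # fixed seed only so the sketch is repeatable

# Hypothetical population: nightly hours of sleep for 5,000 students (made up).
population = [rng.gauss(7.0, 1.2) for _ in range(5000)]
parameter = mean(population)       # describes the whole population

sample = rng.sample(population, 50)
statistic = mean(sample)           # describes only the sample

print(f"parameter (population mean): {parameter:.2f}")
print(f"statistic (sample mean):     {statistic:.2f}")
```

In practice the parameter is usually unknown, and the statistic from a representative sample is used to estimate it.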
Definitions
Random Sample: members of the population are selected in such a way that each individual member has an equal chance of being selected. (Contrast this with voluntary response and convenience samples.)
Simple Random Sample (of size n)
subjects selected in such a way that every possible sample of the same size n has the same chance of being chosen
* E.g., sample 10 people to determine voter preference. Select the 10 at the front of the room? Put names in a hat? Whichever 10 are chosen, every group of 10 should have had the same chance of being picked. (Not a convenience or voluntary sample.)
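The "front of the room" versus "names in a hat" contrast can be sketched in Python. The roster below is hypothetical; `random.sample` from the standard library draws without replacement, so every group of 10 is equally likely.

```python
import random

# Hypothetical roster of 40 people in a room, front row first (made-up names).
roster = [f"person_{i:02d}" for i in range(1, 41)]

# Convenience sample: the 10 people nearest the front of the room.
convenience_sample = roster[:10]

# Simple random sample: the "names in a hat" approach.
rng = random.Random(119)  # fixed seed only so the sketch is repeatable
simple_random_sample = rng.sample(roster, 10)

print(convenience_sample)
print(simple_random_sample)
```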
Copyright © 2007 Pearson Education, Inc Publishing as
Pearson Addison-Wesley.
Random Sampling
selection so that each individual member has an equal chance of being selected
Systematic Sampling
select some starting point and then select every kth element in the population
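A minimal sketch of systematic sampling in Python, assuming the population is available as an ordered list and that the starting point is chosen at random among the first k elements (the slide only says "some starting point"; the helper name is ours):

```python
import random

def systematic_sample(population, k, rng):
    """Pick a random start among the first k items, then take every kth item."""
    start = rng.randrange(k)
    return population[start::k]

rng = random.Random(119)  # fixed seed only so the sketch is repeatable
population = list(range(1, 101))  # hypothetical numbered list of 100 individuals
sample = systematic_sample(population, k=10, rng=rng)
print(sample)
```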
Convenience Sampling
use results that are easy to get
Stratified Sampling
subdivide the population into at least two different subgroups whose members share the same characteristics, then draw a sample from each subgroup (or stratum)
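Stratified sampling can be sketched as "group by stratum, then sample within each group". The population below (students tagged by class year) and the helper name are hypothetical, and drawing the same number from each stratum is just one possible allocation.

```python
import random
from collections import defaultdict

def stratified_sample(units, stratum_of, per_stratum, rng):
    """Group units by stratum, then draw a simple random sample from each."""
    strata = defaultdict(list)
    for unit in units:
        strata[stratum_of(unit)].append(unit)
    return {name: rng.sample(members, per_stratum)
            for name, members in strata.items()}

# Hypothetical population: (id, class year) pairs, 30 students per year.
years = ["freshman", "sophomore", "junior", "senior"]
students = [(i, years[i % 4]) for i in range(120)]

picked = stratified_sample(students, stratum_of=lambda s: s[1],
                           per_stratum=5, rng=random.Random(119))
for year, group in picked.items():
    print(year, group)
```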
Cluster Sampling
divide the population into sections (or clusters); randomly select some of those clusters; choose all members from the selected clusters. Each cluster should be a small-scale representation of the total population.
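Cluster sampling differs from stratified sampling in that whole clusters are chosen at random and everyone in a chosen cluster is included. A short Python sketch with a hypothetical population of city blocks (names and sizes are made up):

```python
import random

def cluster_sample(clusters, n_clusters, rng):
    """Randomly choose whole clusters, then keep every member of each one."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    members = [m for name in chosen for m in clusters[name]]
    return chosen, members

# Hypothetical population: 6 city blocks (clusters) with 4 households each.
blocks = {f"block_{b}": [f"block_{b}_house_{h}" for h in range(1, 5)]
          for b in "ABCDEF"}

chosen_blocks, households = cluster_sample(blocks, n_clusters=2,
                                           rng=random.Random(119))
print(chosen_blocks)
print(households)
```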
Sampling in the Wild
The Current Population Survey (CPS)
◦ http://www.census.gov/cps
◦ monthly survey of about 60,000 households
◦ the sample is scientifically selected to represent the civilian population
◦ employment status of each member of household
◦ data is used to make model-based estimates for individual states and other geographic areas
◦ estimates obtained from the CPS include employment, unemployment, earnings, hours of work, and other indicators
◦ available by age, sex, race, marital status, educational attainment, school enrollment
◦ used by policymakers and legislators as important indicators of our nation's economic situation
Regarding earlier slide…
Among the factors that contributed to the decrease in the percentage of family households with children under 18:
◦ Increases in longevity: The average number of years of life remaining at age 30 increased about three years, comparing those age 30 in 1960 with baby boomers who turned 30 in 1980 (Table 11, U.S. Life Tables, National Center for Health Statistics). As adults live longer, a larger proportion of married couple households will be those who are older and either childless, or whose adult children live elsewhere. In 1968, 29 percent of married men were age 55 and over, as were 22 percent of married women. In 2008, 38 percent of married men were 55 and over, as were 33 percent of married women.
◦ Increases in childlessness: The percentage of women age 40 to 44 who were childless increased from 10 percent in 1976 to 20 percent in 2006 (Supplemental Table 1, U.S. Census Bureau).
Other highlights from America’s Families and Living Arrangements: 2008 include:
◦ The median age for men at first marriage was 27.4 years. For women, the median age at first marriage was 25.6.
◦ The percentage of family households with children under 18 that had three or more of their own children present was 21 percent in both 1998 and 2008.
◦ The percentage of adults ages 45 to 49 who were married varied by race and ethnicity. For example, among women 45 to 49, 79 percent of Asians, 69 percent of white non-Hispanics, 62 percent of Hispanics and 43 percent of blacks were married.
◦ In 2008, 66.9 million opposite-sex couples lived together: 60.1 million were married, and 6.8 million were not.
◦ The United States had an estimated 5.5 million “stay-at-home” parents: 5.3 million mothers and 140,000 fathers.
◦ The percentage of children living with two parents varied by race and origin. Eighty-five percent of Asian children lived with two parents, as did 78 percent of white non-Hispanic children, 70 percent of Hispanic children and 38 percent of black children.
◦ About 9 percent of all children (6.6 million) lived in a household that included a grandparent. Twenty-three percent of children living with a grandparent had no parent present.
◦ In 2008, 6 percent of white non-Hispanic children lived in a household with a grandparent present, compared with 10 percent of Hispanic children, and 14 percent of both Asian and black children.
Bureau of Economic Analysis
Bureau of Economic Analysis (bea.gov): http://www.bea.gov/regional/gsp/action.cfm
GDP: the market value of all final goods and services produced within the borders of a nation in a year