Week04: Learning from Data
What are/is Statistics?
1. Familiar (plural) meaning = facts and figures
Source: Silver, Nate. Post-Midterm Ratings Don't Predict Re-election Chances, New York Times (1/5/11)
Source: 1972-2008 General Social Survey (GSS), http://sda.berkeley.edu/GSS
2. Academic (singular) meaning = collecting, organizing, interpreting and reporting data
Source: OECD Health Data (2005)
1657: Christiaan Huygens publishes first printed work on games of chance (probability = first main historical line of modern statistics)
1662: John Graunt publishes Natural and Political Observations Made upon the Bills of Mortality (analysis of social data = second main historical line of modern statistics)
1665: Sir William Petty (father of econometrics) publishes first known national income estimates.
1693: Edmund Halley produces first correct life table (showing link between age & death)
1790: First decennial U.S. Census (start of oldest periodic continuous census)
1828: Adolphe Quetelet publishes first general statistics handbook in Belgium
1854: John Snow shows link between contaminated water and London cholera deaths
1837-1880: William Farr develops the field of vital statistics
1880s-1940s: Work by Galton, Pearson, Wright, Spearman, Hotelling, Wilks & Neyman on the mathematics of evolution, heredity and psychology enriches statistics
Collecting sample data to answer questions of interest (Part I of course)
Describing or summarizing sample data (Part II of course)
Inferring (making decisions or predictions for a population) from sample data (Part III of course)
Why Study Statistics?
1. Objectivity, thought to be captured best by random samples of statistical (quantitative) data, is highly and widely valued (Porter 1995, p. 3).
When do you find it difficult to be objective?
Why do lawyers ask if jury members can be impartial?
Are scientists always objective?
2. Real-world relevance: Results generated by statistical analysis that reflect a population average (rather than one or a few personal stories) are embraced by businesses for industrial quality control, employed by quantitatively oriented researchers, and enshrined by public policy experts as most representative of population behavior and health (Porter 1986, p. 3).
Sociologists Using Econometrics
Source: Cohen (01/18/10 New York Times, C1, C8)
Learning about a Population from a Sample
1. We typically collect data from individuals in a household or telephone sample survey to obtain information about a population (which we cannot observe in its entirety due to cost and other constraints).
2. Examples: 2001 Los Angeles County Mexican Immigrant Legal Status Survey (LAC-MILSS), 2007 Boston Metropolitan Immigrant Health & Legal Status Survey (BM-IHLSS), and 2012 L.A. County Mexican Immigrant Health & Legal Status Survey (LAC-MIHLSS)
30 Years of Legal Status Sample Surveys
1980-1981 Los Angeles County Parents Survey (LACPS)
1988 National Agricultural Workers Survey (NAWS)
1994/2001 L.A. County Mexican Immigrant Legal Status Survey (LAC-MILSS)
1996 Survey of Income and Program Participation (SIPP)
1996-1997 Hispanic Immigrant Health Care Access Survey (HIHCAS)
1999-2000 L.A.-NYC Immigrant Survey (LANYCIS)
1999 California Health Interview Survey (CHIS)
2000-2001 L.A. Family and Neighborhood Survey (LAFANS)
2004 Mexican Immigrant Migration and Mobility Status (MIMMS)
2005 Chicago Metro Mexican-origin Population Study
2007 Boston Metro Immigrant Health & Legal Status Survey (BM-IHLSS)
2012 L.A. County Mexican Immigrant Health & Legal Status Survey (LAC-MIHLSS)
2007 Boston Metropolitan Immigrant Health & Legal Status Survey (BM-IHLSS)
Harvard University & UMASS Boston
Enrico Marcelli, Ph.D., Principal Investigator
Gary Bennett, Ph.D., Co-Principal Investigator
Howard Koh, Ph.D., Co-Principal Investigator
Phillip Granberry, Ph.D., Project Manager (BM-IHLSS)
Louisa Holmes, Project Manager (BM-IHLSS)
Orfeu Buxton, Ph.D., Consultant
Anthony Roman, MA, Consultant
Jonathan Winickoff, Ph.D., Consultant
Community Partners
Fausto de Rocha, Executive Director, Brazilian Immigrant Center
Magalis Troncoso, Executive Director, Dominican Development Center
Robert Wood Johnson Foundation, NCI, UMASS Boston, & Blue Cross Blue Shield Foundation of Massachusetts
2007 BM-IHLSS Data
Two systematic block-level probability household samples of 307 foreign-born Brazilian adults (and 120 of their children) and 299 Dominican adults (and 74 of their children) residing in the Boston-Cambridge-Quincy Metropolitan Statistical Area (BCQ-MSA)
Data collected between June and September 2007 by 50 student and other foreign-born interviewers trained at UMASS Boston
Instrument included household roster, adult questionnaire, child questionnaire, and biological data collection checklist
Five sections of adult questionnaire: (1) Migration experience, (2) SES, (3) Social Capital, (4) Health, and (5) Socio-political identity
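A systematic sample picks every k-th unit from a list (the sampling frame) after a random start. Below is a minimal, single-stage sketch in Python, assuming a hypothetical frame of 1,000 numbered city blocks; the function name `systematic_sample` is made up for this handout, and it does not reproduce the BM-IHLSS procedure, which also selected households within blocks.

```python
import random

def systematic_sample(frame, n, seed=None):
    """Systematic sample of size n: a random start, then every k-th unit,
    using the integer sampling interval k = len(frame) // n."""
    if not 1 <= n <= len(frame):
        raise ValueError("n must be between 1 and the frame size")
    k = len(frame) // n
    rng = random.Random(seed)
    start = rng.randrange(k)        # random start in 0 .. k-1
    return frame[start::k][:n]      # units start, start + k, start + 2k, ...

# Hypothetical sampling frame of city blocks (not BM-IHLSS data)
blocks = [f"block-{i:04d}" for i in range(1, 1001)]
chosen = systematic_sample(blocks, 50, seed=2007)
print(len(chosen), chosen[:3])
```

Because the start is random and the frame size here is a multiple of n (k = 20), every block has the same 1/k chance of selection, which is what makes this a probability sample.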
2007 BM-IHLSS Sampling Frame
427 Brazilian Subjects from 73 Neighborhoods in Middlesex County
373 Dominican Subjects from 84 Neighborhoods Located in Essex County
[Figure: Sociogeographic Model of Insufficient Sleep. Sociogeographic factors within the metropolitan area: (1) Home: income, tenure, sleep partner, children, meals, noise; (2) Work/School: travel and work time, exposure to smoke, etc., co-worker trust; (3) Neighborhood: population density, homeownership, disorder, noise; (4) Civic Groups: church, PTA, CBO, sports, music, etc., Internet-based. Individual-level factors: (5) Socioeconomic Status: age, sex, skin pigmentation, time in U.S.A., migration experience, migrant legal status, education, earnings; (6) Health: biomarkers, BMI, diabetes, etc., diet, physical activity, sleep meds, cigarette smoking, alcohol. Individual-sociogeographic interaction leads to the outcome: healthy diet, sleep behavior.]
Our two BM-IHLSS samples included 299 Dominican and 307 Brazilian adults (606 subjects in total), of whom 599 answered the questions (variables) used in Marcelli & Buxton's (2011) study of how several sociogeographic factors influenced whether migrants slept 7-9 hours on workdays.
Descriptive vs. Inferential Statistics
Foreign-born (FB) Brazilian and Dominican adults in our sample were 36 years old on average; 48% were male, 9% had a college degree, 39% were unauthorized to reside in the USA, and about two-thirds slept a healthy number of hours each workday (sample descriptive statistics).
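Sample descriptive statistics like these are averages and percentages computed over respondents. A minimal Python sketch, using five hypothetical respondents (not BM-IHLSS data) and a made-up `Respondent` record type, and treating 7-9 hours of workday sleep as "healthy" as in the Marcelli & Buxton outcome:

```python
from statistics import mean
from typing import NamedTuple

class Respondent(NamedTuple):
    """One row of a hypothetical survey data set."""
    age: int
    male: bool
    unauthorized: bool
    workday_sleep_hours: float

sample = [
    Respondent(29, True, False, 7.5),
    Respondent(41, False, True, 6.0),
    Respondent(35, True, True, 8.0),
    Respondent(52, False, False, 7.0),
    Respondent(23, False, False, 9.5),
]

def percent(flags):
    """Share of True values, expressed as a percentage."""
    flags = list(flags)
    return 100 * sum(flags) / len(flags)

mean_age = mean(r.age for r in sample)                        # 36
pct_male = percent(r.male for r in sample)                    # 40.0
pct_unauthorized = percent(r.unauthorized for r in sample)    # 40.0
pct_healthy_sleep = percent(7 <= r.workday_sleep_hours <= 9 for r in sample)  # 60.0
```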
We are 68% confident that the mean age of all foreign-born Brazilian and Dominican adults residing in the Boston metropolitan area fell between 24 and 48 years, and that mean skin color on a scale of 1-10 fell between XX and YY (interval population parameter estimates) . . .
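An interval estimate for a population mean is built from the standard error of the mean, s/√n, rather than from the standard deviation s alone. A sketch under the normal approximation, with hypothetical ages and a made-up helper `mean_confidence_interval`: the interval mean ± z·s/√n (z ≈ 1 for 68% confidence) describes uncertainty about the population mean, while mean ± s describes where roughly two-thirds of individual ages fall in a roughly normal distribution. With very small samples such as this one, a t multiplier would be more appropriate than z.

```python
import math
from statistics import NormalDist, mean, stdev

def mean_confidence_interval(values, level=0.68):
    """Normal-approximation confidence interval for a population mean:
    sample mean +/- z * (s / sqrt(n))."""
    n = len(values)
    xbar = mean(values)
    se = stdev(values) / math.sqrt(n)           # standard error of the mean
    z = NormalDist().inv_cdf(0.5 + level / 2)   # about 0.99 when level = 0.68
    return xbar - z * se, xbar + z * se

ages = [29, 41, 35, 52, 23]                     # hypothetical ages
low, high = mean_confidence_interval(ages)
spread = (mean(ages) - stdev(ages), mean(ages) + stdev(ages))  # mean +/- 1 SD
print(f"68% CI for the mean: {low:.1f} to {high:.1f}")
print(f"Mean +/- 1 SD:       {spread[0]:.1f} to {spread[1]:.1f}")
```

With a sample of several hundred respondents, s/√n is small, so a 68% interval for the mean is much narrower than the range mean ± s.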
Sleep Duration among Foreign-born Brazilian and Dominican Migrant Adults in the Boston Metropolitan Area, 2007 BM-IHLSS
Use of Computer Technology and Data Files (Databases) to Perform Statistical Analysis
Various competing statistical software packages are available for calculators and computers (e.g., STATA, SPSS, SAS), but YOU, not the software, must decide which statistical tools to use to answer a specific question. This is why it is important to understand how to compute means, standard deviations, etc., as well as under what conditions different software commands should be used.
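One example of why the computation matters: a standard deviation can be computed with an n - 1 denominator (sample SD) or an n denominator (population SD), and software commands differ in which one they report. A hand-computed sketch in Python with hypothetical ages, checked against the standard library's `statistics.stdev` (n - 1) and `statistics.pstdev` (n); the `ddof` parameter name is an assumption borrowed for illustration.

```python
import math
import statistics

def sample_mean(xs):
    """Arithmetic mean: sum of the values divided by how many there are."""
    return sum(xs) / len(xs)

def sd(xs, ddof=1):
    """Standard deviation; ddof=1 gives the sample SD (divide by n - 1),
    ddof=0 gives the population SD (divide by n)."""
    xbar = sample_mean(xs)
    ss = sum((x - xbar) ** 2 for x in xs)   # sum of squared deviations
    return math.sqrt(ss / (len(xs) - ddof))

ages = [29, 41, 35, 52, 23]                 # hypothetical ages
print(sample_mean(ages))                    # 36.0
print(round(sd(ages), 3))                   # 11.18
print(round(sd(ages, ddof=0), 3))           # 10.0
```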
Statistical analysis requires that data be organized (structured) electronically and sequentially in a data file (e.g., an Excel spreadsheet) before they can be analyzed . . .
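A data file is typically laid out with one row per respondent and one column per variable. The Python sketch below reads a tiny hypothetical CSV file (a plain-text stand-in for a spreadsheet) and keeps only respondents who answered every variable used in an analysis (listwise deletion). This is one common way an analytic sample ends up smaller than the full sample, as with 599 of 606 respondents above; whether Marcelli & Buxton (2011) handled missing answers this way is an assumption, not something stated here. The helper name `complete_cases` is made up.

```python
import csv
import io

# Hypothetical data file: one row per respondent, one column per variable.
# An empty cell means the respondent did not answer that question.
DATA = """id,origin,age,workday_sleep_hours
1,Brazil,29,7.5
2,Dominican Republic,41,
3,Brazil,35,8.0
4,Dominican Republic,,7.0
5,Brazil,23,9.5
"""

def complete_cases(text, variables):
    """Return rows (as dicts) with a non-blank value for every listed variable."""
    reader = csv.DictReader(io.StringIO(text))
    return [row for row in reader if all(row[v].strip() for v in variables)]

analytic = complete_cases(DATA, ["age", "workday_sleep_hours"])
print(len(analytic), "of 5 respondents kept:", [row["id"] for row in analytic])
```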
STATA Computer Code Used for Analysis
Sample Randomness and Size (Percent Unauthorized Migrant)
Sample Variability: Unauthorized Migrants, 2007 BM-IHLSS, Percent
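Sample variability can be explored by simulation: draw many random samples from a population with a known share of unauthorized migrants and see how much the sample percent moves around at different sample sizes. A minimal Python sketch, assuming a hypothetical population share of 39% (borrowed from the sample figure above purely for illustration) and draws with replacement, which approximates sampling from a large population; `sample_percents` is a made-up helper name.

```python
import random
from statistics import stdev

def sample_percents(p, n, reps, seed=0):
    """Percent 'unauthorized' in each of `reps` random samples of size n,
    drawn with replacement from a population whose true share is p."""
    rng = random.Random(seed)
    return [100 * sum(rng.random() < p for _ in range(n)) / n
            for _ in range(reps)]

TRUE_P = 0.39   # hypothetical population share, for illustration only
for n in (25, 100, 600):
    pcts = sample_percents(TRUE_P, n, reps=1000)
    print(f"n={n:4d}: range {min(pcts):.0f}-{max(pcts):.0f}%, SD {stdev(pcts):.1f} points")
```

The spread shrinks roughly in proportion to 1/√n: the standard error of a sample proportion is √(p(1 - p)/n), about 4.9 percentage points for n = 100 when p = 0.39.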