Centre for Market and Public Organisation
Measuring socio-economic position in ALSPAC
Liz Washbrook, [email protected]
ESRC/ALSPAC Large Grant Meeting, 5th November 2008
But first! US cohort studies
Early Childhood Longitudinal Study – Birth Cohort (ECLS-B)
10,000 children born in 2001; nationally representative when weighted
Over-samples of low birth weight babies, twins and some ethnic groups (e.g. Native Americans, Chinese)
Sampled from birth certificates; follow-ups at 9 months, 2 years, fall prior to kindergarten (~4y) and fall of kindergarten year (~5y). But no more!
Data from parent CAPI (computer-assisted personal interviews), direct child assessments, child care providers and teachers. Some resident and non-resident father questionnaires.
Early Childhood Longitudinal Study – Kindergarten Cohort (ECLS-K)
20,000 children starting kindergarten in 1998 (born 1992/3)
Children sampled from 1,277 schools in 100 counties, with a target of 24 children per school. Nationally representative when weighted.
Follow-ups in fall and spring of kindergarten year (~5-6y), fall and spring of 1st grade (~6-7y), spring of 3rd grade (~9y), 5th grade (~11y) and 8th grade (~14y)
Data from direct child assessments, parental phone interviews, and teacher and school administrator questionnaires.
Data is publicly available (on CD). See http://nces.ed.gov/ECLS/index.asp
US cohort studies
Fragile Families
5,000 children born 1998-2000 in large US cities
Designed to follow children born to unmarried parents, but includes a control sample of married-parent families (~25%)
Focus on deprived families: at baseline 44% of mothers were black, 35% Hispanic and 27% teenagers, and 79% had a high school education or less
Detailed information on fathers’ roles and involvement
Parent interviews in hospital at birth; follow-ups at ages 1, 3, 5 and 9. Includes direct in-home child assessments.
Data publicly available: www.fragilefamilies.princeton.edu/index.asp
Aims
Aim to stimulate discussion about the construction of an index of parental socio-economic position (SEP) from the ALSPAC data
Talk will cover:
The range of indicators available and their features
Sample selection/missingness issues (multiple imputation)
Combining the indicators into a single index (principal components analysis)
Illustrated using a case study: Measures of social inequality in Key Stage 2 exam results (age 11)
Would a standard SEP variable available to all ALSPAC researchers be useful?
If so, how should it be constructed?
Input, feedback, discussion would be appreciated!
What is SEP?
Extensive literature on theories of social stratification (Galobardes, Lynch and Davey Smith, 2007; Bradley and Corwyn, 2002).
“Socially derived economic factors that influence what positions individuals or groups hold within the multiple-stratified structure of society” (Galobardes et al)
In practice researchers have used a multitude of individual indicators to measure SEP, each of which captures a different aspect of stratification
Composite SEP is a relative measure, whereas some indicators (income, education) measure absolute levels of resources. This may have implications when thinking about policy.
Why measure parental SEP?
SEP as a summary measure of ‘family background’ that defines sub-groups of the population
Social mobility/life chances
Nature vs. nurture
Example: Joint CMPO project on the role of attitudes and aspirations in explaining the educational deficits of children in poverty
SEP as a way of capturing long-term access to resources over the life course, e.g. ‘permanent income’ in economics
To classify deprived or vulnerable groups in a way that captures the idea of multiple risks
As a control for confounding influences (e.g. studying the effects of smoking)? Disaggregated sets of control variables may be more appropriate
SEP indicators in ALSPAC
Included in the index:
Income
Education (mother and father)
Social class (mother and father)
Housing tenure
Local deprivation/affluence
Subjective financial hardship
Excluded:
Wealth
Employment status
Race/ethnicity
Family structure
How is the indicator constructed from multiple pieces of information? (High frequency of measurement in ALSPAC)
How is the indicator distributed? (E.g. discrete or continuous)
For whom is it available? (Differential missingness)
How well does it distinguish between high- and low-performing children? (KS2 is an example; relationships will differ for other outcomes)
The sample
11 071 children with:
A valid Key Stage 2 score
A minimum of 2 (out of 10) non-missing SEP indicators (30% are complete cases)
The sample is 69% of the eligible birth cohort (15 994 in the National Pupil Database, NPD)
Key Stage 2 score derived from exam marks in English, maths and science in Year 6 (age 11). National tests compulsory in all state schools.
Test scores are averaged and normalised to mean zero, standard deviation 1 on the full eligible population of 15 994
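A minimal sketch of the normalisation described above. It assumes the raw marks in the three subjects are averaged with equal weight and then standardised using the population standard deviation over the full eligible cohort; the slides do not say whether subject marks were standardised before averaging. The function name `standardise_ks2` is hypothetical.

```python
from statistics import fmean, pstdev


def standardise_ks2(marks):
    """Average each child's English, maths and science marks, then convert
    the averages to z-scores (mean 0, SD 1) over the whole population given.

    marks: list of (english, maths, science) tuples, one per child in the
    full eligible population (not just the working sample).
    """
    averages = [fmean(child) for child in marks]
    mu = fmean(averages)
    sd = pstdev(averages)  # population SD: an assumption, not stated in the slides
    if sd == 0:
        raise ValueError("all averages are identical; cannot standardise")
    return [(a - mu) / sd for a in averages]


# Usage with three made-up children
scores = standardise_ks2([(10, 20, 30), (20, 30, 40), (30, 40, 50)])
print([round(s, 4) for s in scores])  # [-1.2247, 0.0, 1.2247]
```

Because the mean and SD come from the full population, a selected subsample (such as the working sample) need not have mean zero.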
The working sample is not randomly selected:
Group Mean KS2 (SD)
Working sample (N=11 071) 0.11 (0.95)
Fewer than 2 SEP indicators (N=4 923) -0.26 (1.05)
Household income
Measures: Take home weekly family income at 33, 47, 85, 97 months; 11 years
Proportion of valid responses in each band (%):
£ per week 33 mths 47 mths 85 mths 97 mths
<100 8.7 7.8 4.0 2.1
100-199 17.7 15.8 11.3 9.2
200-299 28.4 26.2 18.4 16.6
300-399 21.2 22.1 22.3 21.1
>400 24.0 28.2 44.0 50.9
N 8832 8655 7525 7037
Failure to update the bands between waves means that 44% (85 months) and 51% (97 months) of valid responses fall in the open-ended top band (>£400), which limits the usefulness of the 85 and 97 month income measures.
Household income
The age 11 income measure is better, with ten bands and a top band of >£800:
£ per week Valid %
< £120 2.3
£120-189 5.2
£190-239 5.5
£240-289 7.0
£290-359 11.7
£360-429 11.0
£430-479 7.1
£480-559 15.3
£560-799 20.6
>£800 14.2
N 6552
Household income
The SEP index uses:
Log average real equivalised weekly take home income at 33 & 47 mths
Median income for each band imputed using Family Expenditure Survey (FES) data for households containing a child of the cohort member’s age, in the relevant year and income interval
Adjustment made for housing benefit (HB) income if the respondent reports zero housing expenditure and lives in rented accommodation (predicted value from FES for HB recipients in the South West, varying with year, lone-parent status and the number of under-16s in the household)
Expressed in 1995 prices using All Items RPI
Equivalised using modified OECD scale
Averaged and logged
Nominal banded income at 85 months
Nominal continuous income at 11 years, using band midpoints
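The preschool income steps (deflate, equivalise, average, log) can be sketched as below. Assumptions: each wave's input is already the FES band median in £ per week, the RPI figures are placeholders supplied by the caller, the housing benefit adjustment is omitted, and a child with one missing wave gets the average of the waves that are present (the slides do not say how this case was handled). All function names are hypothetical.

```python
import math
from typing import Optional, Sequence, Tuple


def modified_oecd_scale(n_adults: int, n_children_under_14: int) -> float:
    """Modified OECD equivalence scale: 1.0 for the first adult, 0.5 for each
    further person aged 14 or over, 0.3 for each child under 14."""
    if n_adults < 1:
        raise ValueError("a household needs at least one adult")
    return 1.0 + 0.5 * (n_adults - 1) + 0.3 * n_children_under_14


def real_equivalised(nominal_weekly: float, rpi_year: float, rpi_1995: float,
                     n_adults: int, n_children_under_14: int) -> float:
    """Deflate to 1995 prices with the RPI, then divide by the equivalence scale."""
    real = nominal_weekly * rpi_1995 / rpi_year
    return real / modified_oecd_scale(n_adults, n_children_under_14)


# One wave: (band median £/week, RPI in survey year, RPI in 1995, adults, children under 14)
Wave = Tuple[float, float, float, int, int]


def log_average_income(waves: Sequence[Optional[Wave]]) -> Optional[float]:
    """Log of the mean real equivalised income over non-missing waves
    (e.g. 33 and 47 months); None if every wave is missing."""
    values = [real_equivalised(*w) for w in waves if w is not None]
    if not values:
        return None
    return math.log(sum(values) / len(values))


# Usage with placeholder figures: both waves equal £100 in real equivalised terms
v = log_average_income([(210.0, 100.0, 100.0, 2, 2), (420.0, 200.0, 100.0, 2, 2)])
print(round(v, 3))  # 4.605
```

Averaging before taking logs (rather than averaging the logs) follows the order of the steps listed above.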
Average KS2, by preschool income quintiles
Group (% of non-missing) Average KS2 (std)
Q1 (18.3%) -0.25
Q2 (18.4%) -0.01
Q3 (23.1%) 0.19
Q4 (20.1%) 0.39
Q5 (20.2%) 0.59
Missing (24.3% of total) -0.14
Gap between top and bottom groups: 0.84
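Each chart in this section compares average standardised KS2 scores across groups defined by one indicator; the difference between the highest and lowest groups (for preschool income, 0.59 minus -0.25, i.e. 0.84) summarises how sharply that indicator separates children. A minimal sketch of that calculation, with hypothetical function names and made-up data:

```python
from collections import defaultdict
from statistics import fmean


def mean_by_group(groups, scores):
    """Average standardised KS2 score within each group label
    (e.g. 'Q1'..'Q5' or 'Missing')."""
    buckets = defaultdict(list)
    for group, score in zip(groups, scores):
        buckets[group].append(score)
    return {group: fmean(vals) for group, vals in buckets.items()}


def top_bottom_gap(means, bottom="Q1", top="Q5"):
    """Difference in mean score between the top and bottom groups."""
    return means[top] - means[bottom]


# Usage with made-up data (not ALSPAC values)
means = mean_by_group(["Q1", "Q1", "Q5", "Q5", "Missing"],
                      [-0.3, -0.2, 0.5, 0.7, 0.0])
print(round(top_bottom_gap(means), 2))  # 0.85
```

Keeping "Missing" as its own group, as the charts do, shows whether non-responders look more like the bottom or the top of the distribution.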
Average KS2, by nominal income at age 7
Group (% of non-missing) Average KS2 (std)
< £100 (4.1%) -0.24
£100-199 (11.8%) -0.07
£200-299 (19.1%) 0.08
£300-399 (23.2%) 0.25
> £400 (41.7%) 0.52
Missing (41.7% of total) -0.10
Gap between top and bottom groups: 0.76
Average KS2, by nominal income quintiles at age 11
Group (% of non-missing) Average KS2 (std)
Q1 (20.5%) -0.04
Q2 (23.9%) 0.18
Q3 (23.6%) 0.32
Q4 (20.5%) 0.52
Q5 (11.4%) 0.70
Missing (49.2% of total) -0.07
Gap between top and bottom groups: 0.73
Parental education
Measures: Mother and partner reports for both spouses’ qualifications: antenatal, 61 and 97 months.
The SEP index uses maternal reports of own and partner’s highest qualification at 32 weeks gestation.
Issues
Non-response to the question is coded as no qualifications (don’t know, no quals and no partner were all possible responses)
Possible discrepancies between own and partner report
Possible changes in the identity of the partner over time
Possible changes in qualifications over time
Average KS2, by mother’s highest qualification
Group (% of non-missing) Average KS2 (std)
CSE/none (20.9%) -0.39
Voc/O-level (46.4%) 0.07
A-level (21.8%) 0.41
Degree (11.0%) 0.83
Missing (6.9% of total) -0.08
Gap between top and bottom groups: 1.23
Average KS2, by partner’s highest qualification
Group (% of non-missing) Average KS2 (std)
CSE/none (26.9%) -0.31
Voc/O-level (31.3%) 0.10
A-level (26.2%) 0.30
Degree (15.6%) 0.78
Missing (10.5% of total) -0.17
Gap between top and bottom groups: 1.09
Parental social class
Measures: Mother reports of own and partner’s occupation: antenatal, 8 and 97 months. Partner reports are more frequent but have not been coded.
The SEP index uses maternal reports of own and partner’s social class at 32 weeks gestation.
Question related to occupation in current or last job
Occupations coded according to 1991 SOC classification
Used to derive Registrar General’s Social Class – this is what is available in the datafiles. Hierarchical measure.
No other data on occupation is currently coded
Average KS2, by mother’s social class
Group (% of non-missing) Average KS2 (std)
Unskilled (2.3%) -0.47
Semi-skilled (10.2%) -0.20
Skilled manual (8.2%) -0.08
Skilled non-man (44.6%) 0.17
Manag/tech (30.0%) 0.42
Professional (4.8%) 0.88
Missing (24.5% of total) -0.18
Gap between top and bottom groups: 1.35
Average KS2, by partner’s social class
Group (% of non-missing) Average KS2 (std)
Unskilled (3.0%) -0.32
Semi-skilled (10.3%) -0.16
Skilled manual (33.5%) -0.04
Skilled non-man (11.0%) 0.32
Manag/tech (32.8%) 0.37
Professional (9.5%) 0.72
Missing (17.4% of total) -0.22
Gap between top and bottom groups: 1.04
Housing tenure
Measures: Mother reports of tenure: 8, 21, 33 and 61 months.
The SEP index uses a derived variable
‘Always owner-occupier’ – mortgaged/owned outright/buying from council at all 4 dates
‘Ever in social housing’ – council rented/Housing Association rented at any of 4 dates
‘Other’ – not otherwise classified, with at least one valid response (other responses: private rented furnished/unfurnished, other). This group includes private renters and everyone with a missing value who was never observed in social housing.
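The three-way tenure classification can be written as a short rule. This sketch assumes the four reports arrive as text labels (the label strings and the function name are hypothetical, not ALSPAC codes) and that a child with no valid report at any date is left missing.

```python
from typing import Optional, Sequence

# Hypothetical response labels standing in for the questionnaire codes
OWNER = {"mortgaged", "owned outright", "buying from council"}
SOCIAL = {"council rented", "housing association rented"}


def derive_tenure(responses: Sequence[Optional[str]]) -> Optional[str]:
    """Classify tenure from the reports at 8, 21, 33 and 61 months.
    None marks a missing report."""
    observed = [r for r in responses if r is not None]
    if not observed:
        return None                   # no valid response at any date
    if any(r in SOCIAL for r in observed):
        return "ever_social"          # social housing at any date
    if len(observed) == len(responses) and all(r in OWNER for r in observed):
        return "always_owner"         # owner-occupier at every date
    return "other"                    # private renters, 'other', or owners with gaps


print(derive_tenure(["mortgaged", None, "mortgaged", "mortgaged"]))  # other
```

Checking social housing first means a single social-housing report overrides everything else, which matches the "at any of 4 dates" definition.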
Average KS2, by housing tenure 8-61 months
Group (% of non-missing) Average KS2 (std)
Social housing ever (19.3%) -0.42
Other (31.7%) 0.15
Always owner-occ (49.0%) 0.37
Missing (8.7% of total) -0.24
Gap between top and bottom groups: 0.79
Local deprivation/affluence
Measures: Ward-level Index of Multiple Deprivation (IMD) currently matched at birth, age 5 and age 8, but postcodes are available on an annual basis
The SEP index uses the (continuous) rank of the IMD for ward at birth
IMD provided by government statistics. Derived from data in 6 domains: income, education, employment, housing, health, access to services
Wards in England (approx. 5,500 individuals each) are ranked on the basis of deprivation from 1 to 8414. This allows ‘national’ quantiles to be defined.
Can be matched to ALSPAC via postcode data
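National quintiles follow directly from the ward rank. This sketch assumes rank 1 is the most deprived ward (the usual IMD convention) and that quintiles split the 8414 ranks into five near-equal groups; the function name is hypothetical.

```python
N_WARDS = 8414  # wards in England ranked by the IMD


def national_quintile(rank: int, n_wards: int = N_WARDS) -> int:
    """Map an IMD rank (1 = most deprived, by assumption) to a national
    quintile 1-5. Integer arithmetic avoids rounding errors at the cut-offs."""
    if not 1 <= rank <= n_wards:
        raise ValueError(f"rank must be between 1 and {n_wards}")
    return (5 * rank + n_wards - 1) // n_wards  # ceil(5 * rank / n_wards)


print([national_quintile(r) for r in (1, 1682, 1683, 8414)])  # [1, 1, 2, 5]
```

Because the cut-offs are national, the ALSPAC sample (drawn from one region) need not fall 20% into each quintile.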
Average KS2, by national quintiles of IMD
Group (% of non-missing) Average KS2 (std)
Q1 (22.2%) -0.19
Q2 (17.5%) 0.06
Q3 (18.9%) 0.09
Q4 (16.9%) 0.21
Q5 (24.5%) 0.43
Missing (8.9% of total) 0.02
Gap between top and bottom groups: 0.62
Subjective financial hardship
Measures: Mother-completed financial difficulties questionnaires at 8, 21, 33, 61 and 85 months
Format: ‘How difficult at the moment do you find to afford these items: food; clothing; heating; rent/mortgage; things for child?’
Very (3); Fairly (2); Slightly (1); Not difficult (0)
Responses to the 5 items at each date are summed to give a score between 0 and 15
The SEP index uses the mean score across the 5 dates
The 61 and 85 month measures include questions on educational courses, medical care, child care and other things
‘Do not pay for this/DSS pays’ options for rent and heating are coded as 0
The distribution of the resulting variable is highly skewed
[Figure: histogram of the mean financial difficulties score (x-axis, 0 to 15) against % of sample (y-axis)]
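The scoring rule for the financial difficulties measure can be sketched as follows. Assumptions: responses arrive as text labels (hypothetical), a date with any of the five items missing is treated as missing for that date, and the mean is taken over the dates that remain; the slides do not say how partial non-response was handled.

```python
from statistics import fmean
from typing import Optional, Sequence

# Item scores from the questionnaire; 'do not pay/DSS pays' (offered for
# rent and heating) is scored 0
SCORES = {"very": 3, "fairly": 2, "slightly": 1, "not difficult": 0,
          "do not pay/DSS pays": 0}
ITEMS = ("food", "clothing", "heating", "rent/mortgage", "things for child")


def wave_score(responses: dict) -> Optional[int]:
    """Sum of the five item scores at one date (0-15); None if any item
    is missing (an assumption)."""
    if any(responses.get(item) is None for item in ITEMS):
        return None
    return sum(SCORES[responses[item]] for item in ITEMS)


def mean_difficulty(waves: Sequence[dict]) -> Optional[float]:
    """Mean of the non-missing date scores; None if every date is missing."""
    scores = [s for s in (wave_score(w) for w in waves) if s is not None]
    return fmean(scores) if scores else None


wave_a = {"food": "very", "clothing": "fairly", "heating": "do not pay/DSS pays",
          "rent/mortgage": "slightly", "things for child": "not difficult"}
wave_b = {item: "fairly" for item in ITEMS}
print(mean_difficulty([wave_a, wave_b, {}]))  # 8.0
```

Most families report no difficulty with any item, which is why the mean score piles up near zero.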
Average KS2, by quintiles of financial difficulties score
Group (% of non-missing) Average KS2 (std)
Q1 (19.4%) -0.10
Q2 (19.4%) -0.01
Q3 (20.4%) 0.16
Q4 (21.0%) 0.27
Q5 (19.9%) 0.39
Missing (8.3% of total) -0.25
Gap between top and bottom groups: 0.49
Multiple Imputation by Chained Regression
# SEP indicators missing (out of 10)
# missing Obs % Cum. %
0 3,887 29.0 29.0
1 2,746 20.5 49.5
2 2,277 17.0 66.5
3 1,679 12.5 79.0
4 755 5.6 84.7
5 827 6.2 90.8
6 488 3.6 94.5
7 403 3.0 97.5
8 337 2.5 100.0
Total 13,399 100.0
9 1,110
Iterative multivariable regression technique (switching regression), using Stata’s ice command:
1. Specify a prediction equation for each variable
2. Randomly allocate values to missing cases
3. Predict values for missing cases
4. Update right-hand-side (RHS) variables and repeat the cycle (10 times)
Options allow choice of estimation method, passive imputation and substitution of RHS dummies, and constrained intervals for predicted values
Current method:
Imputation carried out using the 10 SEP variables only; it does not use other information
Only a single imputed dataset is created
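A simplified sketch of imputation steps 1-4, assuming every variable is continuous and is imputed by linear regression on all the others plus a normal residual draw. This is not ice: ice also handles ordinal and multinomial variables (ologit/mlogit), supports passive imputation, and by default draws the regression parameters from their estimated distribution, which this sketch omits. The function name is hypothetical; the defaults (10 cycles, seed 100) mirror the ice call.

```python
import numpy as np


def chained_imputation(data, n_cycles=10, seed=100):
    """Single imputation by chained linear regressions (simplified MICE).

    data: 2-D array-like, one column per variable, np.nan for missing.
    Every column is treated as continuous.
    """
    rng = np.random.default_rng(seed)
    x = np.array(data, dtype=float)
    missing = np.isnan(x)
    n, k = x.shape

    # Step 2: fill missing cells with random draws from each column's observed values
    for j in range(k):
        observed = x[~missing[:, j], j]
        if observed.size == 0:
            raise ValueError(f"column {j} has no observed values")
        x[missing[:, j], j] = rng.choice(observed, size=int(missing[:, j].sum()))

    # Steps 3-4: cycle through the variables, re-predicting each from the others
    for _ in range(n_cycles):
        for j in range(k):
            miss_j = missing[:, j]
            if not miss_j.any():
                continue
            # Step 1: prediction equation = intercept + all other variables
            design = np.column_stack([np.ones(n), np.delete(x, j, axis=1)])
            obs_rows = ~miss_j
            beta, *_ = np.linalg.lstsq(design[obs_rows], x[obs_rows, j], rcond=None)
            resid = x[obs_rows, j] - design[obs_rows] @ beta
            dof = max(int(obs_rows.sum()) - design.shape[1], 1)
            sigma = np.sqrt(resid @ resid / dof)
            # Prediction plus a random residual keeps a realistic spread
            x[miss_j, j] = design[miss_j] @ beta + rng.normal(0.0, sigma, int(miss_j.sum()))
    return x
```

Observed values are never overwritten; only the originally missing cells are refreshed on each cycle, using the latest imputed values of the other variables.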
The ice command
#delimit ;
ice $sesvars using sesimp.dta, replace
    cmd(mumed daded mumclass dadclass rawinc85:ologit, ownhouse: mlogit)
    passive(own_2:ownhouse==1 \ own_3:ownhouse==2 \
        mumed_2:mumed==2 \ mumed_3:mumed==3 \ mumed_4:mumed==4 \
        daded_2:daded==2 \ daded_3:daded==3 \ daded_4:daded==4 \
        mclass_2: mumclass==2 \ mclass_3: mumclass==3 \ mclass_4: mumclass==4 \
        mclass_5: mumclass==5 \ mclass_6: mumclass==6 \
        dclass_2: dadclass==2 \ dclass_3: dadclass==3 \ dclass_4: dadclass==4 \
        dclass_5: dadclass==5 \ dclass_6: dadclass==6 \
        inc85_2: rawinc85==2 \ inc85_3: rawinc85==3 \ inc85_4: rawinc85==4 \
        inc85_5: rawinc85==5)
    substitute(ownhouse:own_2 own_3, mumed:mumed_2 mumed_3 mumed_4,
        daded:daded_2 daded_3 daded_4,
        mumclass: mclass_2 mclass_3 mclass_4 mclass_5 mclass_6,
        dadclass:dclass_2 dclass_3 dclass_4 dclass_5 dclass_6,
        rawinc85:inc85_2 inc85_3 inc85_4 inc85_5)
    genmiss(miss_) seed(100);
Prediction equations
Variable   | Command | Prediction equation
-----------+---------+----------------------------------------------------
logavinceq | regress | findiff rawinc11 own_2 own_3 imd mumed_2-mumed_4
           |         | daded_2-daded_4 mclass_2-mclass_6 dclass_2-dclass_6
           |         | inc85_2-inc85_5
mumed      | ologit  | logavinceq findiff rawinc11 own_2 own_3 imd
           |         | daded_2-daded_4 mclass_2-mclass_6 dclass_2-dclass_6
           |         | inc85_2-inc85_5
daded      | ologit  | logavinceq findiff rawinc11 own_2 own_3 imd
           |         | mumed_2-mumed_4 mclass_2-mclass_6 dclass_2-dclass_6
           |         | inc85_2-inc85_5
mumclass   | ologit  | logavinceq findiff rawinc11 own_2 own_3 imd
           |         | mumed_2-mumed_4 daded_2-daded_4 dclass_2-dclass_6
           |         | inc85_2-inc85_5
dadclass   | ologit  | logavinceq findiff rawinc11 own_2 own_3 imd
           |         | mumed_2-mumed_4 daded_2-daded_4 mclass_2-mclass_6
           |         | inc85_2-inc85_5
findiff    | regress | logavinceq rawinc11 own_2 own_3 imd mumed_2-mumed_4
           |         | daded_2-daded_4 mclass_2-mclass_6 dclass_2-dclass_6
           |         | inc85_2-inc85_5
rawinc85   | ologit  | logavinceq findiff rawinc11 own_2 own_3 imd
           |         | mumed_2-mumed_4 daded_2-daded_4 mclass_2-mclass_6
           |         | dclass_2-dclass_6
rawinc11   | regress | logavinceq findiff own_2 own_3 imd mumed_2-mumed_4
           |         | daded_2-daded_4 mclass_2-mclass_6 dclass_2-dclass_6
           |         | inc85_2-inc85_5
ownhouse   | mlogit  | logavinceq findiff rawinc11 imd mumed_2-mumed_4
           |         | daded_2-daded_4 mclass_2-mclass_6 dclass_2-dclass_6
           |         | inc85_2-inc85_5
own_2      |         | [Passively imputed from ownhouse==1]
own_3      |         | [Passively imputed from ownhouse==2]
imd        | regress | logavinceq findiff rawinc11 own_2 own_3
           |         | mumed_2-mumed_4 daded_2-daded_4 mclass_2-mclass_6
           |         | dclass_2-dclass_6 inc85_2-inc85_5
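In the ice call above, passive() recomputes 0/1 dummies from each freshly imputed categorical variable, and substitute() makes those dummies, rather than the categorical code, enter the other prediction equations. That is why, for example, mumed appears as mumed_2-mumed_4 in the equations above. A minimal Python sketch of the dummy recomputation, with a hypothetical function name:

```python
from typing import Dict, List, Optional, Sequence


def passive_dummies(values: Sequence[Optional[int]],
                    spec: Dict[str, int]) -> Dict[str, List[Optional[int]]]:
    """Recompute 0/1 dummies from a categorical variable.

    spec maps each dummy name to the category it flags, e.g.
    {"own_2": 1, "own_3": 2} mirrors own_2:ownhouse==1 and own_3:ownhouse==2.
    Missing values (None) stay missing.
    """
    return {name: [None if v is None else int(v == level) for v in values]
            for name, level in spec.items()}


print(passive_dummies([0, 1, 2, None], {"own_2": 1, "own_3": 2}))
# {'own_2': [0, 1, 0, None], 'own_3': [0, 0, 1, None]}
```

Recomputing the dummies after every update keeps them consistent with the categorical variable instead of imputing each dummy separately, which could produce impossible combinations.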
Principal components analysis
PCA provides a way of combining (weighting) the individual components into a single index
PCA conducted on the 10x10 polychoric correlation matrix. Standard PCA techniques assume continuous, normally distributed variables; polychoric correlations can be used when some components are binary or categorical (e.g. education). They assume that each ordinal variable was obtained by categorising an underlying normally distributed variable.
PCA extracts a single component: the weighted combination of the (standardised) components that explains the largest possible proportion of their total variation
Each component is assigned a scoring coefficient that is used as a weight in the construction of the SEP index
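A sketch of the extraction and weighting steps, assuming the correlation matrix (polychoric or otherwise) has already been estimated elsewhere (in Stata, pcamat accepts a correlation matrix directly). The scoring coefficients are taken as the unit-length eigenvector of the largest eigenvalue, and the explained share is that eigenvalue over the trace. Function names are hypothetical.

```python
import numpy as np


def first_principal_component(corr):
    """Scoring coefficients (unit-length eigenvector) and share of total
    variance for the first principal component of a correlation matrix."""
    corr = np.asarray(corr, dtype=float)
    eigvals, eigvecs = np.linalg.eigh(corr)  # eigenvalues in ascending order
    weights = eigvecs[:, -1]                 # eigenvector of the largest eigenvalue
    if weights.sum() < 0:                    # an eigenvector's sign is arbitrary
        weights = -weights
    share = eigvals[-1] / eigvals.sum()      # eigenvalues sum to the trace
    return weights, share


def sep_index(indicators, weights):
    """Weighted sum of standardised indicators (rows = children)."""
    x = np.asarray(indicators, dtype=float)
    z = (x - x.mean(axis=0)) / x.std(axis=0, ddof=1)
    return z @ weights


# Usage: three equally correlated indicators get equal weights
w, share = first_principal_component([[1, .5, .5], [.5, 1, .5], [.5, .5, 1]])
print(np.round(w, 4), round(float(share), 3))
```

As a consistency check, the squared scoring coefficients in the table below sum to approximately 1, as expected for a unit-length eigenvector.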
Principal components analysis
Scoring coefficients:
Preschool income 0.3556
Age 7 income 0.3593
Age 11 income 0.3307
Mother's education 0.3272
Partner's education 0.3365
Mother's social class 0.2811
Partner's social class 0.3033
Ever social housing -0.3641
Other housing 0.0350
IMD rank at birth 0.2300
Financial difficulties -0.2387
SEP index explains 46% of total variation in components
Average KS2, by quintiles of SEP index
Group (% of sample) Average KS2 (std)
Q1 (20.2%) -0.46
Q2 (21.1%) -0.10
Q3 (21.2%) 0.12
Q4 (20.2%) 0.38
Q5 (17.4%) 0.73
Gap between top and bottom quintiles: 1.19
Summary
ALSPAC contains numerous indicators that can be used to construct an SEP index
Indicators vary in:
The type of resources they measure
The sections of the population they distinguish (e.g. tenure appears good at picking out the very disadvantaged, but does not discriminate at the top of the distribution)
The likelihood of non-response by different groups
Issues that need to be considered when constructing an index:
Which components should be included? (Should education be separate?)
How should observations at multiple dates or by multiple respondents be treated?
How should missing values be dealt with?
How should the components be combined?