Shortcomings of Census Interaction Data

School of Geography FACULTY OF EARTH & ENVIRONMENT Shortcomings of Census Interaction Data Oliver Duke-Williams [email protected]


Page 1: Shortcomings of Census Interaction Data

School of Geography, FACULTY OF EARTH & ENVIRONMENT

Shortcomings of Census Interaction Data

Oliver Duke-Williams

[email protected]

Page 2: Shortcomings of Census Interaction Data

Shortcomings

• Overall data quality

• Statistical Disclosure Control

• Variant geographies

• Lack of comparability over time

Page 3: Shortcomings of Census Interaction Data

Overall data quality

Generic issues

• Unit non-response

• Item non-response

Interaction data issues

• Problems of address recall for migration data

• Problems of address accuracy for workplace data

• Changing concept of usual residence

Page 4: Shortcomings of Census Interaction Data

Non-response

Unit non-response – under-enumeration – is a problem for all Census data

• It particularly affects migration data

• Migrants are 2-10 times more likely to be missed from a Census than residents who have not moved – Simpson & Middleton (1997)

Item non-response refers to those people who have completed a Census form, but not answered a specific question

Page 5: Shortcomings of Census Interaction Data

Patterns of non-response: 2001

• Address one year ago, non-response quantiles

Legend: 2%; 2% <= 3%; 3% <= 4%; > 4%

Page 6: Shortcomings of Census Interaction Data

Patterns of non-response: 2001

• Workplace postcode, non-response quantiles

Legend: 4% <= 6%; 6% <= 7%; 7% <= 9%; > 9%

Page 7: Shortcomings of Census Interaction Data

Patterns of non-response: 2001

• Method of travel, non-response quantiles

Legend: 3% <= 4%; 4% <= 5%; 5% <= 7%; > 7%

Page 8: Shortcomings of Census Interaction Data

Item non-response

• Various possibilities for former residence and workplace addresses

• Address correct but no postcode

• Part postcode given (e.g. ‘LS1’)

• No information given

• The 1991 interaction data included the categories ‘address not stated’ and ‘workplace not stated’

Page 9: Shortcomings of Census Interaction Data

Migrant origin not stated

Migrants with origin unstated as % of total inflow, 1990-91

• Limited spatial patterns

• Significant numbers for most districts

Legend: <= 5%; 5% <= 10%; 10% <= 15%; > 15%

Page 10: Shortcomings of Census Interaction Data

Item non-response

In 2001, unknown or incomplete addresses were imputed using donor records

• First, select possible donors on the basis of predictive variables

• SWS: Industry, occupation, establishment size, mode of transport

• SMS: Other migrants in household, country of birth, marital status

• Use partial information if available

• Then, select geographically nearest donor
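The two-step donor selection described on this slide can be sketched roughly as follows. This is a minimal illustration, not the procedure actually used in 2001: the `Record` fields, the exact-match rule on predictive variables, the straight-line distance and the sample postcodes are all assumptions made for the example, and only two of the SWS predictive variables are used.

```python
from dataclasses import dataclass
from math import hypot
from typing import List, Optional


@dataclass
class Record:
    industry: str                  # predictive variable (SWS)
    occupation: str                # predictive variable (SWS)
    x: float                       # residence coordinates (hypothetical units)
    y: float
    workplace: Optional[str]       # None when the workplace address is missing
    partial: Optional[str] = None  # partial postcode, e.g. 'LS1', if one was given


def impute_workplace(recipient: Record, donors: List[Record]) -> Optional[str]:
    """Return a workplace copied from the nearest suitable donor, or None."""
    # Step 1: keep donors with a known workplace and matching predictive variables.
    candidates = [
        d for d in donors
        if d.workplace is not None
        and d.industry == recipient.industry
        and d.occupation == recipient.occupation
    ]
    # Use partial information if available: the donor must agree with it.
    if recipient.partial:
        candidates = [d for d in candidates if d.workplace.startswith(recipient.partial)]
    if not candidates:
        return None
    # Step 2: choose the geographically nearest remaining donor.
    nearest = min(candidates, key=lambda d: hypot(d.x - recipient.x, d.y - recipient.y))
    return nearest.workplace


# Hypothetical records, for illustration only.
donors = [
    Record("retail", "sales", 0.0, 0.0, "LS1 1AA"),
    Record("retail", "sales", 5.0, 5.0, "LS2 1AA"),
    Record("mining", "manual", 0.1, 0.1, "BD1 1AA"),
]
no_info = Record("retail", "sales", 1.0, 1.0, None)
part_info = Record("retail", "sales", 1.0, 1.0, None, partial="LS2")
```

Here `impute_workplace(no_info, donors)` takes the nearest matching donor, while `impute_workplace(part_info, donors)` is restricted to donors consistent with the partial postcode.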

Page 11: Shortcomings of Census Interaction Data

Shortcomings

• Overall data quality

• Statistical Disclosure Control

• Variant geographies

• Lack of comparability over time

Page 12: Shortcomings of Census Interaction Data

Statistical Disclosure Control

Methods applied to interaction data

• 1981

• 1991

• 2001

Page 13: Shortcomings of Census Interaction Data

SDC: 1981

• Workplace data – based on 10% sample, therefore no further modification required

• Migration data

• Set 1

• Within ward

• Ward to rest of district (for flows > 25 persons) or ward to rest of county etc.

• Set 2

• Ward level, total males and females only

Page 14: Shortcomings of Census Interaction Data

SDC: 1991

• Workplace data – based on 10% sample, therefore no further modification required

• Migration data

• Suppression applied to some tables

Page 15: Shortcomings of Census Interaction Data

SDC: 1991 – SMS

• Set 1: Flows within and between wards

• Set 2: Flows within and between districts

Page 16: Shortcomings of Census Interaction Data

SDC: 1991 – SMS Set 2

1 m Age (broad) x Sex

2 77m Wholly moving households & residents

3 m Age (5 yr) x Sex

4 m Marital Status x Sex

5 m Ethnic group

6 m Household residency status x LLTI status

7 m Economic position (16+)

8 7 Tenure

8S 7 Tenure

9 7 Households: by sex & economic position of head

10 7m Residents: by sex & economic position of head

11S m Gaelic speakers

11W m Welsh speakers

Page 17: Shortcomings of Census Interaction Data

Extent of suppression

• Districts are grouped by county

• Shading:

• Red: Total migrants >= 10

• Blue: Total migrants 0 < n < 10

• White: Total migrants = 0

Greater London

Metro counties

Other counties (sorted alphabetically)

per district totals

Page 18: Shortcomings of Census Interaction Data

Effect of suppression: White migrants, 1990-91

Published value as % of estimated correct value

Origins

Destinations

Columns (left to right): North, Yorkshire and Humberside, East Midlands, East Anglia, South East, South West, West Midlands, North West, Wales, Scotland

North 99% 73% 22% 20% 25% 17% 18% 50% 11% 41%

Yorkshire and Humberside 69% 100% 77% 51% 50% 46% 51% 73% 27% 37%

East Midlands 20% 72% 99% 65% 40% 28% 59% 44% 21% 22%

East Anglia 15% 46% 64% 100% 55% 33% 29% 26% 23% 39%

South East 23% 48% 42% 73% 97% 67% 41% 36% 25% 34%

South West 16% 43% 25% 34% 56% 99% 52% 31% 45% 29%

West Midlands 24% 55% 65% 34% 42% 56% 99% 59% 50% 21%

North West 54% 75% 43% 29% 38% 38% 55% 100% 65% 27%

Wales 6% 23% 18% 24% 28% 47% 48% 53% 99% 17%

Scotland 41% 33% 24% 33% 35% 30% 21% 31% 14% 99%

Page 19: Shortcomings of Census Interaction Data

Effect of suppression: Black migrants, 1990-91

Published value as % of estimated correct value

Origins

Destinations

Columns (left to right): North, Yorkshire and Humberside, East Midlands, East Anglia, South East, South West, West Midlands, North West, Wales, Scotland

North 100% 83% 50% 0% 47% 45% 33% 71% 29% 25%

Yorkshire and Humberside 86% 100% 88% 62% 72% 70% 70% 83% 22% 63%

East Midlands 60% 86% 99% 84% 66% 81% 83% 72% 17% 38%

East Anglia 0% 89% 64% 100% 59% 35% 40% 50% 33% 25%

South East 39% 74% 67% 87% 99% 71% 70% 62% 43% 47%

South West 54% 63% 31% 56% 66% 99% 69% 62% 73% 67%

West Midlands 60% 80% 85% 56% 69% 71% 100% 76% 67% 75%

North West 54% 72% 58% 17% 70% 39% 69% 100% 67% 20%

Wales 33% 36% 45% 50% 53% 69% 50% 73% 99% 17%

Scotland 38% 65% 54% 33% 61% 67% 29% 63% 17% 98%

Page 20: Shortcomings of Census Interaction Data

Effect of suppression: Indian, Pakistani, Bangladeshi migrants, 1990-91

Published value as % of estimated correct value

Origins

Destinations

Columns (left to right): North, Yorkshire and Humberside, East Midlands, East Anglia, South East, South West, West Midlands, North West, Wales, Scotland

North 99% 79% 38% 68% 50% 44% 45% 46% 67% 67%

Yorkshire and Humberside 87% 100% 87% 72% 74% 56% 85% 85% 38% 61%

East Midlands 30% 89% 99% 70% 64% 72% 85% 56% 27% 26%

East Anglia 18% 88% 79% 100% 69% 9% 58% 15% 33% 22%

South East 36% 66% 65% 73% 98% 67% 66% 55% 51% 51%

South West 60% 48% 72% 27% 65% 100% 72% 63% 50% 41%

West Midlands 58% 83% 88% 34% 70% 72% 100% 69% 69% 49%

North West 40% 90% 76% 57% 49% 34% 65% 100% 55% 30%

Wales 5% 11% 26% 20% 49% 48% 62% 38% 99% 38%

Scotland 71% 84% 76% 58% 50% 8% 70% 65% 0% 99%

Page 21: Shortcomings of Census Interaction Data

Effect of suppression: Chinese and other migrants, 1990-91

Published value as % of estimated correct value

Origins

Destinations

Columns (left to right): North, Yorkshire and Humberside, East Midlands, East Anglia, South East, South West, West Midlands, North West, Wales, Scotland

North 100% 82% 45% 60% 52% 31% 32% 50% 0% 38%

Yorkshire and Humberside 85% 100% 84% 59% 69% 64% 81% 88% 55% 51%

East Midlands 42% 81% 99% 63% 57% 32% 73% 66% 33% 46%

East Anglia 5% 78% 72% 100% 63% 55% 70% 32% 27% 58%

South East 35% 61% 54% 74% 98% 71% 55% 51% 40% 47%

South West 26% 72% 34% 71% 59% 99% 54% 40% 47% 33%

West Midlands 48% 82% 65% 30% 65% 75% 100% 67% 50% 35%

North West 67% 86% 56% 49% 53% 38% 71% 99% 55% 54%

Wales 56% 26% 42% 33% 54% 65% 72% 66% 99% 19%

Scotland 51% 65% 30% 38% 54% 52% 74% 61% 0% 99%

Page 22: Shortcomings of Census Interaction Data

Effect of suppression: mis-reporting of largest non-white migrant group

Origins

Destinations

Columns (left to right): North, Yorkshire and Humberside, East Midlands, East Anglia, South East, South West, West Midlands, North West, Wales, Scotland

Rows (X marks as extracted; their column positions were not preserved):

North X X

Yorkshire and Humberside X X

East Midlands X

East Anglia X X

South East X

South West X X X

West Midlands

North West X

Wales X X

Scotland X X

Page 23: Shortcomings of Census Interaction Data

Coping with problems - 1991

Under-enumeration

Suppression

Page 24: Shortcomings of Census Interaction Data

The MIGPOP data set

[Chart: number of migrants (thousands, axis 0 to 1,200) by five-year age group, from 1-4 to 85+; series: MIGPOP, SMS2]

Page 25: Shortcomings of Census Interaction Data

The MIGPOP data set

[Chart: number of migrants (thousands, axis 0 to 1,200) by age group, from 1-4 to 80-84]

MIGPOP data set

• Produced by Simpson and Middleton (1999)

• Available from CIDER through WICID

• Allows for

• ‘Missing million’

• Under-reporting of migrants

• Migrants with unknown origin

• Contains one age by sex table

Page 26: Shortcomings of Census Interaction Data

Suppression

Migration from Mid-Bedfordshire to Avon, 1990-91

Columns (left to right): Bath, Bristol, Kingswood, Northavon, Wansdyke, Woodspring, TOTAL, followed by one column with no header

White 0 11 0 16 0 69 99 3

Black 0 1 0 0 0 0 1 0

Indian, Pakistani, Bangladeshi 0 0 0 0 0 0 0 0

Chinese and other 0 0 0 0 0 1 1 0

TOTAL 3 12 0 16 0 70 101
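The table above suggests how some suppressed values can be recovered by simple arithmetic. The sketch below assumes that the TOTAL row and TOTAL column are unsuppressed and reliable, and that the final unheaded column is the gap between each row total and its published cells. It handles only the easy case where a single destination is suppressed; the function name and data layout are hypothetical, and this is not a description of how any published estimates were produced.

```python
destinations = ["Bath", "Bristol", "Kingswood", "Northavon", "Wansdyke", "Woodspring"]

# Published interior cells (suppressed flows appear as zeros).
published = {
    "White": [0, 11, 0, 16, 0, 69],
    "Black": [0, 1, 0, 0, 0, 0],
    "Indian, Pakistani, Bangladeshi": [0, 0, 0, 0, 0, 0],
    "Chinese and other": [0, 0, 0, 0, 0, 1],
}
row_totals = {
    "White": 99,
    "Black": 1,
    "Indian, Pakistani, Bangladeshi": 0,
    "Chinese and other": 1,
}
column_totals = [3, 12, 0, 16, 0, 70]


def recover_single_column(published, row_totals, column_totals):
    """Fill in one suppressed column from the margins; return None if ambiguous."""
    # Gap between each known total and the sum of the published cells.
    column_gaps = [
        total - sum(cells[j] for cells in published.values())
        for j, total in enumerate(column_totals)
    ]
    row_gaps = {group: row_totals[group] - sum(cells) for group, cells in published.items()}

    suppressed = [j for j, gap in enumerate(column_gaps) if gap > 0]
    # Only unambiguous when exactly one column is short and the gaps agree.
    if len(suppressed) != 1 or sum(row_gaps.values()) != column_gaps[suppressed[0]]:
        return None

    j = suppressed[0]
    recovered = {group: list(cells) for group, cells in published.items()}
    for group, gap in row_gaps.items():
        recovered[group][j] += gap
    return recovered


recovered = recover_single_column(published, row_totals, column_totals)
```

With these figures the only shortfall is in the Bath column, and the row gaps assign all three missing migrants to the White group. When several columns are suppressed at once the margins no longer pin down the cells, and some form of estimation is needed instead.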

Page 27: Shortcomings of Census Interaction Data

SMSGAPS

SMSGAPS dataset incorporates recovered and estimated data for most suppressed tables

• Produced by Rees and Duke-Williams (1997)

• Contains versions of all SMS Set 2 tables except 11S and 11W

• Available from CIDER through WICID

Page 28: Shortcomings of Census Interaction Data

SDC: 2001

• Outputs of the 2001 Census were subject to Small Cell Adjustment Methodology

• Initial version of cross-tabulation produced from raw data

• ‘Small values’ were then modified

• Sub-totals and totals for each table were then recalculated from the modified values
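The three steps above can be sketched as follows. The slides do not say which values count as small or how replacements are chosen. This sketch assumes that counts of 1 and 2 are replaced by 0 or 3, with probabilities chosen so that the expected value is unchanged. Both points are assumptions for illustration, and the function names are hypothetical.

```python
import random


def adjust_small_cells(table, rng):
    """Replace interior counts of 1 or 2 by 0 or 3 (assumed unbiased rounding)."""
    adjusted = {}
    for label, cells in table.items():
        new_cells = []
        for value in cells:
            if value in (1, 2):
                # Assumption: P(round up to 3) = value / 3, so the mean is preserved.
                value = 3 if rng.random() < value / 3 else 0
            new_cells.append(value)
        adjusted[label] = new_cells
    return adjusted


def with_totals(table):
    """Recalculate row and column totals from the interior cells."""
    n_cols = len(next(iter(table.values())))
    column_totals = [sum(cells[j] for cells in table.values()) for j in range(n_cols)]
    result = {"Total": [sum(column_totals)] + column_totals}
    for label, cells in table.items():
        result[label] = [sum(cells)] + list(cells)
    return result


# Interior cells (Male, Female) before adjustment.
raw = {"0-15": [1, 2], "15-Pensionable": [17, 15], "Pensionable+": [1, 4]}

adjusted = adjust_small_cells(raw, random.Random(2001))
published = with_totals(adjusted)  # each row: [Persons, Male, Female]
```

Because totals are rebuilt from the adjusted cells, the published table is always internally consistent, but its totals can differ from the true ones.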

Page 29: Shortcomings of Census Interaction Data

SCAM example

Persons Male Female

Total

0-15 1 2

15-Pensionable 17 15

Pensionable+ 1 4

Page 30: Shortcomings of Census Interaction Data

SCAM example

Persons Male Female

Total 40 19 21

0-15 3 1 2

15-Pensionable 32 17 15

Pensionable+ 5 1 4

Page 31: Shortcomings of Census Interaction Data

SCAM example

Persons Male Female

Total 40 19 21

0-15 3 1 2

15-Pensionable 32 17 15

Pensionable+ 5 1 4

? ?

?

Page 32: Shortcomings of Census Interaction Data

SCAM example

Persons Male Female

Total 40 19 21

0-15 3 0 3

15-Pensionable 32 17 15

Pensionable+ 5 3 4

Page 33: Shortcomings of Census Interaction Data

SCAM example

Persons Male Female

Total 42 20 22

0-15 3 0 3

15-Pensionable 32 17 15

Pensionable+ 7 3 4

Page 34: Shortcomings of Census Interaction Data

SCAM

• SCAM was applied differentially across the UK

• This is particularly confusing for the interaction data, as they are explicitly presented as a UK-level data set

• SCAM was applied on the basis of where the data were collected

• Migration data were collected at the destination

• Flows with destinations in England, Wales and Northern Ireland were subject to SCAM

• Workplace data were collected at the residence (origin)

• Flows with origins in England, Wales and Northern Ireland were subject to SCAM

• In addition, OA level workplace data with origins in Scotland were subject to SCAM

• OA level workplace data were not published for Northern Ireland

Page 35: Shortcomings of Census Interaction Data

Effects of SCAM

Interaction data are characterised by:

• Sparse matrices

• Dominance of small values

• 2001 data characterised by over-reporting of multiples of 3

[Chart: Frequency of flow totals, 2001 SMS Table MG301. X axis: flow size (number of migrants), 0 to 400; Y axis: number of flows, logarithmic scale, 1 to 10,000,000]

[Chart: Frequency of flow totals, 2001 SMS Table MG301: detail. X axis: flow size, 0 to 50; Y axis: frequency, 0 to 1,400,000]

[Chart: Frequency of flow totals, 2001 SWS Table W301: detail. X axis: flow size, 0 to 50; Y axis: frequency, 0 to 6,000,000]

Page 36: Shortcomings of Census Interaction Data

2001 data and multiples of 3

• It is the interior cells that are modified

• Flow totals are re-calculated from these modified values

Page 37: Shortcomings of Census Interaction Data

Contribution of interior cells to SCAM adjustment of MG301

Columns (left to right): number of potentially modified interior cells per table; number of flows; number of migrants

6 1,761,815 5,852,646

5 31,973 218,929

4 4,626 78,013

3 457 8,428

2 182 6,435

1 6 388

0 2 157
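To put the table above in proportion, the short calculation below works out what share of flows and of migrants fall in tables where all six interior cells were potentially modified. The figures are copied from the table; the variable names are illustrative only.

```python
# (potentially modified interior cells, number of flows, number of migrants)
rows = [
    (6, 1_761_815, 5_852_646),
    (5, 31_973, 218_929),
    (4, 4_626, 78_013),
    (3, 457, 8_428),
    (2, 182, 6_435),
    (1, 6, 388),
    (0, 2, 157),
]

total_flows = sum(flows for _, flows, _ in rows)
total_migrants = sum(migrants for _, _, migrants in rows)

# Share of flows and of migrants in tables where all six cells could be modified.
all_six_cells, all_six_flows, all_six_migrants = rows[0]
share_of_flows = all_six_flows / total_flows
share_of_migrants = all_six_migrants / total_migrants

print(f"{share_of_flows:.1%} of flows, {share_of_migrants:.1%} of migrants")
```

On these figures, roughly 98% of flows and about 95% of migrants are in tables where every interior cell could have been adjusted. If adjusted cells take only the values 0 or 3 (an assumption, not stated on the slides), recalculated totals for such flows will be multiples of 3, consistent with the pattern noted earlier.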

Page 38: Shortcomings of Census Interaction Data

Coping with problems: 2001

Tactics for using SCAM affected data

• Use average values?

• Useful in some situations, but could lead to errors if rates are calculated

• Use minimum number of cells to calculate required value
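One way to see why fewer cells is better: each adjusted cell carries its own error, and errors accumulate when cells are added. The sketch below assumes that counts of 1 and 2 are rounded to 0 or 3 without bias and independently of one another. Neither assumption comes from the slides, so the numbers are indicative only.

```python
from fractions import Fraction


def adjustment_variance(true_value):
    """Error variance of one cell under the assumed unbiased rounding to 0 or 3."""
    if true_value not in (1, 2):
        return Fraction(0)  # other values are assumed to be left unchanged
    p_up = Fraction(true_value, 3)  # assumed probability of rounding up to 3
    # The mean error is zero, so the variance is the expected squared error.
    return p_up * (3 - true_value) ** 2 + (1 - p_up) * (0 - true_value) ** 2


def variance_of_sum(true_values):
    """Error variance of a total built from independently adjusted cells."""
    return sum((adjustment_variance(v) for v in true_values), Fraction(0))


one_cell = variance_of_sum([2])
six_cells = variance_of_sum([1, 2, 1, 2, 1, 2])
```

Under these assumptions each small cell contributes a variance of 2, whichever of the two values it holds, so a total built from six small cells has six times the error variance of a single adjusted cell. A figure taken from one published cell is therefore preferable to the same figure rebuilt by adding many small cells.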

Page 39: Shortcomings of Census Interaction Data

Shortcomings

• Overall data quality

• Statistical Disclosure Control

• Variant geographies

• Lack of comparability over time

Page 40: Shortcomings of Census Interaction Data

Variant geographies

Changes between Censuses

• A problem that is common across all Census outputs

Differences compared to other Census products

• Problems specific to the interaction data, in particular the 2001 data

Page 41: Shortcomings of Census Interaction Data

Differences between Census products

The 2001 interaction data have geographies that do not always match those in the other aggregate data

• Level 1: Output Areas

• Interaction data are the same as other outputs

• Level 2: ‘Wards’

• Interaction data are an amalgam of

• CAS wards in England and Wales

• ST wards in Scotland

• Standard wards in Northern Ireland

• Level 3: ‘Districts’

• Interaction data are an amalgam of

• London boroughs, metro and other districts, Unitary authorities, Scottish Council Areas

• Parliamentary constituencies in Northern Ireland

Page 42: Shortcomings of Census Interaction Data

Problems of different geographies

• When mapping data, correct boundary sets are time consuming to assemble

• When constructing rates, correct denominators are time consuming to gather

• Not all area data are easily available for all of these geographies

Page 43: Shortcomings of Census Interaction Data

Shortcomings

• Overall data quality

• Statistical Disclosure Control

• Variant geographies

• Lack of comparability over time

Page 44: Shortcomings of Census Interaction Data

Lack of comparability over time

As well as changes in geography, there are significant changes in data structure over time

General issues

• Changes in population base, inclusion of students etc.

• Handling of unknown migrant origins or workplace locations

Migration data

• Handling of overseas origins

• Use of ‘no usual residence’

Workplace data

• Handling of off-shore workers

• Handling of home-workers

Page 45: Shortcomings of Census Interaction Data

No usual residence in 2001 migration data

0%

2%

4%

6%

8%

10%

12%

14%

16%

18%

20%

Percentage of all migrants 2000-1, by district, who had ‘no usual residence’ one year prior to the Census

• Mean: 6.9%

• Minimum: 3.7% - Ribble Valley

• Maximum: 19% - Newham

• 19/20 districts with highest levels are in London

Page 46: Shortcomings of Census Interaction Data

Home-workers

1981 – Workplace at home is part of general ‘within ward’ flow

• Home-workers can only be distinguished from others in the ‘mode of transport’ table

1991 – Workplace at home is a distinct workplace location

• All tables can be extracted separately for home-workers

2001 – Workplace at home is part of general ‘within ward’ flow

• Home-workers can only be distinguished from others in the ‘mode of transport’ table

Page 47: Shortcomings of Census Interaction Data

Coping with compatibility issues

Various data sets exist that attempt to bridge some of these gaps

• Re-estimate for newer geographies

• e.g. 1981 data on 1991 and 2001 boundaries (Boyle and Feng, 2002)

• Create hybrid sets

• e.g. merge home-workers into main flow for 1991

• Create best-fit geographies that span time periods

• e.g. CIDS common geographies
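The hybrid-set idea for 1991 can be sketched as below: flows whose workplace is ‘at home’ are folded into the within-ward flow, so that the 1991 structure resembles 1981 and 2001. The `HOME` code, the ward identifiers and the counts are hypothetical. Real extracts will use their own coding.

```python
from collections import Counter

HOME = "HOME"  # hypothetical code for 'workplace at home' in a 1991-style extract


def merge_home_workers(flows):
    """Fold home-worker flows into the within-ward flow of the residence ward."""
    merged = Counter()
    for (origin, destination), count in flows.items():
        if destination == HOME:
            destination = origin  # working at home means working in the home ward
        merged[(origin, destination)] += count
    return dict(merged)


# Hypothetical flows: (residence ward, workplace) -> number of workers.
flows_1991 = {
    ("W1", "W1"): 120,
    ("W1", HOME): 30,
    ("W1", "W2"): 45,
    ("W2", HOME): 10,
}
hybrid = merge_home_workers(flows_1991)
```

The merge goes in one direction only: once home-workers are folded into the within-ward flow they cannot be separated again, which is why the hybrid is built from the more detailed 1991 data rather than the reverse.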

Page 48: Shortcomings of Census Interaction Data

Summary

• The interaction data suffer from problems related to

• Disclosure control modifications

• Changes over time

• Awkward geographies in 2001

• These have been addressed by

• Estimated and re-worked data sets

• Data estimated for different boundary sets

Page 49: Shortcomings of Census Interaction Data

References

Boyle, P.J. and Feng, Z. (2002) A method for integrating the 1981 and 1991 GB Census interaction data, Computers, Environment and Urban Systems, 26: 241-256

Rees, P.H. and Duke-Williams, O. (1997) Methods for estimating missing data on migrants in the 1991 British Census, International Journal of Population Geography, 3: 323-368

Simpson, S. and Middleton, E. (1997) Who is missed by a national Census? A review of empirical results from Australia, Britain, Canada and the USA, CCSR Working Paper No 2, Centre for Census and Survey Research, University of Manchester

Simpson, S. and Middleton, E. (1999) Undercount of migration in the UK 1991 Census and its impact on counterurbanisation and population projections, International Journal of Population Geography, 5: 387-405