Modeling Coverage Error in Address Lists Due to Geocoding Error: The Impact on Survey Operations and Sampling
Jizhou Fu and Lee Fiorio
AAPOR 2012, Orlando

Outline
• ABS Background
• Analysis Goals
• Data and Methodology
• Results
• Discussion
• Limitations

Address-Based Sampling (ABS) Background
• Address-based frames first need geographical boundaries
• Types of address-based frames
  • US Postal Service Delivery Sequence File (DSF)
    – Purchased through market research vendors
    – Updated frequently
    – Adequate replacement for field listing in urban and suburban areas
  • Dependent or Enhanced Listing
    – Provide DSF to listers for enhancement in the field
    – Reduces cost and increases accuracy of traditional listing
• Because of costs, DSF should be used where possible
• Enhanced listing should be used where DSF is inadequate
• Evaluating DSF coverage: DSF-to-Census Ratio
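
The DSF-to-Census ratio named above can be sketched as a simple quotient. This is a minimal illustration, assuming the ratio is the number of DSF addresses in an area divided by the census housing unit count for the same area; the function name and the example counts are hypothetical and do not come from the presentation.

```python
def dsf_to_census_ratio(dsf_address_count, census_housing_units):
    """Return DSF addresses per census housing unit, or None if undefined."""
    if census_housing_units <= 0:
        # No census housing units: the ratio cannot be computed.
        return None
    return dsf_address_count / census_housing_units


# Hypothetical areas: (label, DSF address count, census housing unit count)
areas = [
    ("urban block group", 980, 1000),
    ("rural block group", 310, 1000),
    ("no census housing units", 5, 0),
]

for label, dsf_count, housing_units in areas:
    ratio = dsf_to_census_ratio(dsf_count, housing_units)
    print(label, "n/a" if ratio is None else round(ratio, 2))
```

A ratio well below 1 suggests the DSF is missing housing units in that area, which is the situation where enhanced listing is recommended.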

DSF Geography
• Geographic information on the DSF:
  • Address, city, county, state, zip, zip4, carrier route, walk sequence
• Geographic information not on the DSF:
  • Census block, census block group, census tract, latitude or longitude
• Geocoding
  • Appends latitude and longitude as well as census geography
  • Requires commercial software
  • PO Boxes and Rural Route addresses not easily geocoded
  • Potential for error
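
To illustrate how a small positional error can place an address in the wrong census geography, here is a minimal sketch of assigning a coordinate to a block with a ray-casting point-in-polygon test. It is not the commercial geocoding software mentioned above; the block polygons, block names, and coordinates are hypothetical.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test: True if (x, y) lies inside the polygon."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge cross the horizontal line through y?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def assign_block(x, y, blocks):
    """Return the id of the first block containing the point, else None."""
    for block_id, polygon in blocks.items():
        if point_in_polygon(x, y, polygon):
            return block_id
    return None


# Two hypothetical adjacent blocks sharing the boundary x = 1
blocks = {
    "block_A": [(0, 0), (1, 0), (1, 1), (0, 1)],
    "block_B": [(1, 0), (2, 0), (2, 1), (1, 1)],
}

true_location = (0.98, 0.5)      # where the housing unit actually is
geocoded_location = (1.02, 0.5)  # where the geocoder placed it

print(assign_block(*true_location, blocks))      # block_A
print(assign_block(*geocoded_location, blocks))  # block_B
```

If block_A lies inside a sampled segment and block_B does not, the address is dropped from the frame even though it belongs there, and the reverse shift would add an address that does not belong.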

Geocoding Error
[Slides 6 to 11: figures only]

Research Questions
• What are the correlates of geocoding error?
  • Logistic Model
    – Urbanicity
    – Housing unit density
    – Vacancy rates
    – Drop delivery
    – Housing unit type (single family home, apartment)
    – Home ownership
    – Adjacent to water blocks
• Does geocoding error exhibit spatial clustering?
  • Moran’s I
  • Logistic Model
    – Autocovariate

Data and Methodology
• NORC National Frame Listing effort
  • Fall 2011
  • Out of 1,516 segments (census tracts or block groups), 126 segments needed enhancement
  • Device-based listing
    – Latitude and longitude collected
    – Segment-level address list
    – Real-time QC in central office
• Selected 21 enhanced segments for analysis
  • Geocapture worked for at least 90% of addresses
  • Mix of urban and rural
  • Range of DSF-to-Census ratios: 0.31 to 0.81
  • 8,560 DSF lines

Geocoding error: over-coverage vs under-coverage
[Venn diagram: DSF and final enhanced list]
Addresses added in the field (under-coverage): 4,859
Confirmed DSF addresses (coverage): 7,504
Unconfirmed DSF addresses (over-coverage): 1,056
• 12.3% of DSF lines unconfirmed in field
• Difficult to separate causes of under-coverage
• Focus on over-coverage
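
The counts on this slide can be checked with a little arithmetic. The sketch below uses the three counts from the diagram and assumes that the DSF is the sum of confirmed and unconfirmed addresses and that the final enhanced list is the sum of confirmed addresses and addresses added in the field; the variable names are illustrative.

```python
confirmed_dsf = 7504     # DSF addresses confirmed in the field (coverage)
unconfirmed_dsf = 1056   # DSF addresses not confirmed (over-coverage)
added_in_field = 4859    # addresses found only by listers (under-coverage)

# Total DSF lines, which should match the 8,560 reported for the 21 segments
dsf_lines = confirmed_dsf + unconfirmed_dsf

# Share of DSF lines that listers could not confirm
over_coverage_rate = unconfirmed_dsf / dsf_lines

# Assumed composition of the final enhanced list
final_enhanced_list = confirmed_dsf + added_in_field
added_share = added_in_field / final_enhanced_list

print(dsf_lines)                           # 8560
print(round(over_coverage_rate * 100, 1))  # 12.3
print(final_enhanced_list)                 # 12363
print(round(added_share * 100, 1))
```

The 12.3% figure therefore corresponds to 1,056 unconfirmed lines out of 8,560 DSF lines.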

Data and Methodology (cont’d)
• Sample of 4,000 drawn from the DSF lines provided for enhancement
• Dependent variable: flag for whether the address was correctly geocoded into the segment
• Independent variables:
  • Address-level (DSF)
    – Drop point flag
    – Vacant flag
    – Record type indicator (high rise, rural, single family home)
  • Block-level (census)
    – DSF-to-Census ratio, four categories (<0.9, 0.9 to 1.25, 1.25 to 2, >2)
    – TEA Code Flag (Type of Enumeration Area)
    – Principal city flag
    – Water adjacency flag
    – Housing unit density
    – Area
    – Percent multi-unit
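
The four DSF-to-Census ratio categories can be assigned with a small helper. This is a sketch only: the slide does not say which category a boundary value belongs to, so the choices below (0.9 and 1.25 go to the higher category, exactly 2 stays in category 3) are assumptions, as is the function name.

```python
def ratio_category(ratio):
    """Map a block-level DSF-to-Census ratio to categories 1 to 4.

    Boundary handling is an assumption:
      1: ratio < 0.9
      2: 0.9 <= ratio < 1.25
      3: 1.25 <= ratio <= 2
      4: ratio > 2
    """
    if ratio < 0.9:
        return 1
    if ratio < 1.25:
        return 2
    if ratio <= 2.0:
        return 3
    return 4


# Hypothetical block-level ratios
for r in (0.31, 0.9, 1.1, 1.25, 2.0, 3.4):
    print(r, ratio_category(r))
```

Category 2 brackets a ratio of 1, where the DSF and census counts roughly agree.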

Table 1: Logistic Model Results

| Parameter | Estimate (sign) | Significance |
|---|---|---|
| Intercept | - | *** |
| DSF-to-Census <0.9 | + | *** |
| DSF-to-Census 1.25 to 2.0 | + | ** |
| DSF-to-Census >2.0 | + | *** |
| TEA1 | - | *** |
| In Principal City Flag | - | *** |
| HU Density (mean centered) | - | *** |
| Drop delivery | + | *** |
| Vacant Flag | + | * |
| Record Type High-rise | - | |
| Record Type Rural | + | *** |
| Pct Multi-Unit (mean centered) | - | * |
| Area (mean centered) | + | *** |

Parameter groups: Ratio Categories, Urbanicity, Postal Characteristics, Geographical Considerations
Significance: * p<0.05, ** p<0.01, *** p<0.001
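
For readers unfamiliar with how signs like those in Table 1 arise, here is a minimal sketch of fitting a logistic model by gradient ascent in plain Python. It is not the authors' software or data: the sixteen observations are synthetic, the single predictor stands in for a flag such as drop delivery, and the function names are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(X, y, lr=0.5, n_iter=5000):
    """Maximize the logistic log-likelihood by plain gradient ascent."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(n_iter):
        grad = [0.0] * p
        for row, yi in zip(X, y):
            pi = sigmoid(sum(b * v for b, v in zip(beta, row)))
            for j in range(p):
                grad[j] += (yi - pi) * row[j]
        # Step along the mean gradient
        beta = [b + lr * g / n for b, g in zip(beta, grad)]
    return beta


# Synthetic toy data (not from the study):
# 8 addresses without the flag, 2 with geocoding error;
# 8 addresses with the flag, 6 with geocoding error.
flag = [0] * 8 + [1] * 8
error = [1, 1, 0, 0, 0, 0, 0, 0] + [1, 1, 1, 1, 1, 1, 0, 0]

X = [[1.0, float(f)] for f in flag]  # intercept column plus the flag
beta = fit_logistic(X, error)
odds_ratio = math.exp(beta[1])

print("intercept:", round(beta[0], 3))
print("flag coefficient:", round(beta[1], 3))
print("odds ratio:", round(odds_ratio, 2))
```

In this toy data the odds of error are 2:6 without the flag and 6:2 with it, so the fitted coefficient is positive and the odds ratio is 9. A "+" in the table has the same reading: the parameter raises the modeled odds.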

Table 2: A closer look at impact of DSF-to-Census Ratio

| Category | Parameter | Odds Ratio | Significance |
|---|---|---|---|
| 1 | DSF-to-Census <0.9 | 2.25 | *** |
| 3 | DSF-to-Census 1.25 to 2.0 | 2.37 | ** |
| 4 | DSF-to-Census >2.0 | 4.29 | *** |

• Addresses in category 1 census blocks have about the same odds of being geocoded incorrectly as addresses in category 3 blocks (odds ratios 2.25 and 2.37)
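
An odds ratio is the exponential of a logistic coefficient, so the values in Table 2 can be converted back to coefficients and applied to a baseline probability. The sketch below assumes the omitted category 2 (0.9 to 1.25) is the reference group; the 10% baseline probability is hypothetical and chosen only for illustration.

```python
import math

# Odds ratios as reported in Table 2
odds_ratios = {
    "DSF-to-Census <0.9": 2.25,
    "DSF-to-Census 1.25 to 2.0": 2.37,
    "DSF-to-Census >2.0": 4.29,
}


def apply_odds_ratio(baseline_prob, odds_ratio):
    """Probability after multiplying the baseline odds by an odds ratio."""
    baseline_odds = baseline_prob / (1.0 - baseline_prob)
    new_odds = baseline_odds * odds_ratio
    return new_odds / (1.0 + new_odds)


baseline = 0.10  # hypothetical error probability in the reference category

for label, value in odds_ratios.items():
    coefficient = math.log(value)  # logistic coefficient implied by the odds ratio
    prob = apply_odds_ratio(baseline, value)
    print(label, round(coefficient, 3), round(prob, 3))
```

Because odds ratios act on odds rather than probabilities, an odds ratio of 4.29 does not mean the error probability is 4.29 times higher. With a 10% baseline it rises to roughly 32%.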

Spatial Autocorrelation
• Does geocoding error exhibit spatial clustering?
  • Do blocks with geocoding error neighbor blocks with geocoding error?

y = β1x1 + β2x2 + … + βpWy + ε

• Where Wy is weighted average of neighboring values or ‘spatial lag’

Spatial Autocorrelation

Example Segment: blocks 1 and 2 in the top row, blocks 3, 4 and 5 in the bottom row

Example Weight Matrix W

|   | 1 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|---|
| 1 | 0 | 1 | 1 | 0 | 0 |
| 2 | 1 | 0 | 0 | 1 | 1 |
| 3 | 1 | 0 | 0 | 1 | 0 |
| 4 | 0 | 1 | 1 | 0 | 1 |
| 5 | 0 | 1 | 0 | 1 | 0 |

Example variable of interest y (geocoding error) and weighted average of neighbors Wy = W * y

| Block | y | Wy |
|---|---|---|
| 1 | 1 | 1 |
| 2 | 1 | 2 |
| 3 | 0 | 1 |
| 4 | 0 | 2 |
| 5 | 1 | 1 |
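
The spatial lag in the five-block example can be reproduced with a short matrix-vector product. This sketch uses the weight matrix W and error vector y from the slides; the `row_standardize` option is an added assumption, included because the slide values are neighbor sums under binary weights, while a true average requires dividing by each block's number of neighbors.

```python
# Binary contiguity weights for the five-block example segment
W = [
    [0, 1, 1, 0, 0],
    [1, 0, 0, 1, 1],
    [1, 0, 0, 1, 0],
    [0, 1, 1, 0, 1],
    [0, 1, 0, 1, 0],
]

# Geocoding error flag per block (1 = error present)
y = [1, 1, 0, 0, 1]


def spatial_lag(W, y, row_standardize=False):
    """Return Wy; with row_standardize, divide by the number of neighbors."""
    lag = []
    for row in W:
        total = sum(w * v for w, v in zip(row, y))
        if row_standardize:
            n_neighbors = sum(row)
            total = total / n_neighbors if n_neighbors else 0.0
        lag.append(total)
    return lag


Wy = spatial_lag(W, y)
Wy_avg = spatial_lag(W, y, row_standardize=True)

print(Wy)  # [1, 2, 1, 2, 1], matching the slide
print([round(v, 2) for v in Wy_avg])
```

Block 2, for example, borders blocks 1, 4 and 5, two of which have errors, so its lag is 2 as a sum or two thirds as an average.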

Moran’s I and Spatial Autocorrelation Model
• Degree of linear association between observed values y and a weighted average of neighboring values Wy
  • Observed: 0.0281
  • Highly significant (p < 0.0001)
  • Positive, indicating possible spatial clustering
• Add Wy to final logistic model

y = β1x1 + β2x2 + … + βpWy + ε
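
Moran’s I can be computed directly from a weight matrix and a variable of interest. The sketch below uses the standard formula I = (n / S0) * (sum over i, j of w_ij * z_i * z_j) / (sum over i of z_i^2), where z are deviations from the mean and S0 is the sum of all weights, and applies it to the five-block example segment. It is an illustration only and does not reproduce the study's observed value of 0.0281, which comes from the full data.

```python
def morans_i(W, y):
    """Moran's I for values y under spatial weight matrix W."""
    n = len(y)
    mean_y = sum(y) / n
    z = [v - mean_y for v in y]  # deviations from the mean
    s0 = sum(sum(row) for row in W)  # sum of all weights
    cross = sum(
        W[i][j] * z[i] * z[j]
        for i in range(n)
        for j in range(n)
    )
    denom = sum(v * v for v in z)
    return (n / s0) * (cross / denom)


# Five-block example segment from the slides
W = [
    [0, 1, 1, 0, 0],
    [1, 0, 0, 1, 1],
    [1, 0, 0, 1, 0],
    [0, 1, 1, 0, 1],
    [0, 1, 0, 1, 0],
]
y = [1, 1, 0, 0, 1]

print(round(morans_i(W, y), 4))
```

For this toy segment the statistic is slightly negative (about -0.028), which reflects only the five illustrative blocks. Values above the expectation under no autocorrelation, -1/(n-1), point toward clustering of similar values among neighbors.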

Table 3: Logistic Regression with Spatial Autocovariate

| Parameter | Estimate (sign) | Significance |
|---|---|---|
| Intercept | - | *** |
| DSF-to-Census <0.9 | + | ** |
| DSF-to-Census 1.25 to 2.0 | + | ** |
| DSF-to-Census >2.0 | + | *** |
| TEA1 | - | *** |
| In Principal City Flag | - | *** |
| HU Density (mean centered) | - | *** |
| Drop delivery | + | *** |
| Vacant Flag | + | * |
| Record Type High-rise | - | |
| Record Type Rural | + | *** |
| Pct Multi-Unit (mean centered) | - | * |
| Area (mean centered) | + | *** |
| Autocovariate (W.y) | + | * |

Parameter groups: Ratio Categories, Urbanicity, Postal Characteristics, Geographical Considerations

Map 1: Example of Clustering
Map 2: Example of Clustering

Discussion
• Urbanicity, postal characteristics, and block-level DSF-to-Census ratio are highly correlated with geocoding error
  • Addresses in low DSF-to-Census ratio blocks have similar odds of geocoding error as addresses in high DSF-to-Census ratio blocks
• Geocoding error exhibits spatial clustering
  • Problematic blocks within a segment can be used as a potential flag for larger geocoding error
• Help with address frame decisions

Limitations
• Analysis was limited to segments that already have less than acceptable DSF coverage
  • Possible that census characteristics and DSF flags behave differently above threshold
• Sample of 21 segments used in analysis not random
  • Limits the ability to generalize findings
• Definition of geocoding error limited to over-coverage error