An Application of Selective Editing to the US Census Bureau Trade Data
Maria Garcia, US Census Bureau
UNECE/SDE, Oslo, Norway, 24-26 September 2012
Foreign Trade Statistics Programs
• Official source of US international merchandise trade statistics
• Electronic data collection
• Complete enumeration
• Pre-editing: check for fatal errors
• Micro editing
– Range and ratio edits
– Automatic imputation
– “Rejects”: imputation not successful
Foreign Trade Data Processing
• Rejects
– Distributed among analysts for manual correction
– Analysts review a large number of records under tight time constraints
• Goal: Use selective editing to identify highly suspicious errors having a high potential effect on the estimates
– Value (V)
– Quantity (Q)
– Shipping weight (SW)
Hidiroglou-Berthelot Method (HB)
• Latouche and Berthelot (1992) used HB when developing their score functions.
• HB (Hidiroglou and Berthelot, 1986) uses historical ratios to detect outlying observations in periodic data.
• Our data: cannot test historical ratios.
• For record i, instead of using HB to identify errors in V or Q, identify errors in unit prices V/Q.
• Apply a series of transformations:
– Identify outliers at both ends of the distribution of unit prices
– Size transformation
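The two transformations can be illustrated with a minimal sketch that follows the standard Hidiroglou-Berthelot formulation, applied here to unit prices. The function name, the choice of size measure, and the exponent `u` are assumptions for illustration only; the formulas used on the original slide are not preserved in this transcript.

```python
import statistics

def hb_effects(ratios, sizes, u=0.5):
    """Standard HB transformations (sketch).

    ratios: positive unit prices r_i = V_i / Q_i
    sizes:  a positive size measure per record (assumed; e.g. the value V_i)
    u:      exponent in [0, 1] controlling how much size matters (assumed default)
    """
    r_med = statistics.median(ratios)
    effects = []
    for r, size in zip(ratios, sizes):
        # Centering step: puts ratios below and above the median on the same
        # scale so outliers at both ends of the distribution can be detected.
        s = 1 - r_med / r if r < r_med else r / r_med - 1
        # Size transformation: larger records get more weight.
        effects.append(s * size ** u)
    return effects

# Ratios 1, 2, 4 around a median of 2 give centered values -1, 0, 1.
example = hb_effects([1, 2, 4], [4, 9, 16])
```

With `u=0` the size measure is ignored and only the centered ratio remains; with `u=1` the full size of the record enters the effect.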
HB Method for Our Trade Data
• Measure distance of quartiles of the transformed unit prices from the median
• Measure displacement from the median, weighted by the appropriate distance
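As a rough sketch of these two steps, the code below computes the quartile distances as in the standard HB method and then expresses each record's displacement from the median in units of the distance on its own side. The function names and the default constants `a` and `c` are illustrative assumptions, not values taken from the presentation.

```python
import statistics

def hb_distances(effects, a=0.05):
    """Return (median, lower distance, upper distance) of the transformed values."""
    q1, med, q3 = statistics.quantiles(effects, n=4, method="inclusive")
    # The |a * median| floor keeps a distance from collapsing when the
    # values are tightly clustered around the median.
    d_q1 = max(med - q1, abs(a * med))
    d_q3 = max(q3 - med, abs(a * med))
    return med, d_q1, d_q3

def hb_displacement(effects, a=0.05):
    """Displacement from the median, scaled by the distance on the same side.

    Assumes both distances are nonzero.
    """
    med, d_q1, d_q3 = hb_distances(effects, a)
    return [(e - med) / d_q3 if e >= med else (med - e) / d_q1 for e in effects]

def hb_outliers(effects, a=0.05, c=4.0):
    """Indices of records whose scaled displacement exceeds the cutoff c."""
    return [i for i, d in enumerate(hb_displacement(effects, a)) if d > c]
```

For example, `hb_outliers([-2, -1, 0, 1, 50])` flags only the last record, while the scaled displacement itself can serve as a measure of suspicion for ranking.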
HB for Our Trade Data (Cont’d)
Effect on Publication Totals
• Examine effect of changes on final publication totals (adapted from Latouche and Berthelot, 1992)
• If no anticipated value is available, use median of ratios and reported data, e.g., for Value
• For every record (similar to Jäder and Norberg, 2005)
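The formulas for this step are not preserved in the transcript. One plausible reading, sketched below, takes the anticipated Value of a record as the median unit price times its reported Quantity and measures the potential effect as the absolute difference relative to the cell total. Both the function name and this exact form are assumptions.

```python
import statistics

def value_effects(values, quantities):
    """Potential effect of each record on the cell total for Value (sketch)."""
    unit_prices = [v / q for v, q in zip(values, quantities)]
    med_price = statistics.median(unit_prices)
    total = sum(values)
    effects = []
    for v, q in zip(values, quantities):
        anticipated = med_price * q  # assumed stand-in for a missing anticipated value
        effects.append(abs(v - anticipated) / total)
    return effects

# The third record has a unit price of 100 against a median of 10,
# so it accounts for nearly all of the potential effect.
example = value_effects([10, 20, 300], [1, 2, 3])
```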
Simulation and Evaluation
• Simulation
– Extract from a one-year exports data file
– Archived raw and edited (final) data
• Evaluation
– Absolute pseudo-bias (Latouche and Berthelot, 1992)
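The pseudo-bias formula itself is missing from the transcript. A common form, assumed here, is the absolute difference between the total obtained after correcting only the highest-ranked records and the fully edited total, relative to the fully edited total. The sketch below traces that quantity as records are corrected in score order; the function name and inputs are hypothetical.

```python
def pseudo_bias_curve(raw, final, scores):
    """Absolute pseudo-bias after correcting the k highest-scored records, k = 0..n."""
    order = sorted(range(len(raw)), key=lambda i: scores[i], reverse=True)
    final_total = sum(final)
    current = sum(raw)
    curve = [abs(current - final_total) / final_total]
    for i in order:
        # Replace the raw value of record i with its edited (final) value.
        current += final[i] - raw[i]
        curve.append(abs(current - final_total) / final_total)
    return curve

curve = pseudo_bias_curve(raw=[100, 10, 5], final=[10, 10, 6], scores=[0.9, 0.1, 0.5])
```

A good score function makes the curve drop quickly: most of the bias disappears after correcting a small share of the records. This sketch steps through every ranked record, which simplifies an evaluation that counts only erroneous records.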
Evaluation Results
• Examining results at lowest level of aggregation:
– Data users may closely scrutinize the statistics for particular types of products
– Ex: import/export of rough diamonds
– Kimberley Process: joint governments, industry and civil society initiative to stem the flow of conflict diamonds
[Figure: Absolute Pseudobias for Exported Non-industrial Diamonds (>0.5 carats), Using Ratio to Measure Suspicion. x-axis: Percent of Erroneous Records Corrected (0% to 100%); y-axis: Absolute Pseudobias (0 to 1.6).]
Customer’s Feedback
• Subject matter experts questioned:
– High ranking given to records that, in their experience, are insignificant to final cell estimates
– Low ranking given to records that would have been flagged for manual correction
Commodity XXXXXXXXXX
Suspicious record is correctly identified by selective editing as having a large effect on the total quantity:

                            Total Value (V)   Total Quantity (Q)   Unit Price (V/Q)   Ratio Bounds (Lower, Upper)
Reported cell total         $102,190          7,217                $14.15             90, 3000
Reported suspicious record  $3,024            7,144                $0.42              90, 3000
Final suspicious record     $3,024            10                   $302.40            90, 3000
Final cell total            $102,190          83                   $1,231.20          90, 3000
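The figures for this commodity can be checked with a few lines of arithmetic. The sketch below applies a simple ratio edit, assuming the two ratio bounds are a lower and an upper limit on the unit price V/Q; the function name is hypothetical.

```python
def passes_ratio_edit(value, quantity, lower, upper):
    """True when the unit price value / quantity lies within the ratio bounds."""
    return lower <= value / quantity <= upper

# Suspicious record as reported: unit price of about $0.42, far below the lower bound.
reported_ok = passes_ratio_edit(3024, 7144, 90, 3000)

# Same record after correction: unit price of $302.40, inside the bounds.
final_ok = passes_ratio_edit(3024, 10, 90, 3000)

# Correcting this one record moves the cell's total quantity from 7,217 to 83.
final_cell_quantity = 7217 - 7144 + 10
```

The single record carries almost the entire reported quantity of the cell, which is why a score based on the effect on the total ranks it highly.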
Customer’s Feedback (Cont’d)
Commodity YYYYYYYYYY

                                                Total Value (V)   Total Quantity (Q)   Unit Price (V/Q)   Lower Bound   Upper Bound
128 records, 87 records imputed, three rejects  $3,142,622        129,973,502          $0.02              0.25          50
Final cell total
  (all three rejects corrected)                 $3,142,622        1,230,629            $2.55              0.25          50
Selective editing cell total
  (two highest ranked records corrected)        $3,142,622        1,804,699            $1.74              0.25          50
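A short calculation on the quantities reported for this commodity shows how much of the discrepancy the two highest-ranked corrections remove and how much remains. The relative-difference measure used for the remainder is an assumption, chosen to mirror the pseudo-bias idea; the variable names are illustrative.

```python
reported_q = 129_973_502   # total quantity before rejects are corrected
final_q = 1_230_629        # total quantity with all three rejects corrected
selective_q = 1_804_699    # total quantity with the two highest-ranked records corrected

# Share of the total quantity discrepancy removed by the two corrections.
removed_share = (reported_q - selective_q) / (reported_q - final_q)

# Relative difference that remains against the final cell total.
remaining_relative = abs(selective_q - final_q) / final_q

print(f"removed: {removed_share:.1%}, remaining: {remaining_relative:.1%}")
```

The two corrections remove well over 99% of the discrepancy in total quantity, yet the selective editing total still exceeds the final total by roughly 47%, so the third reject remains material for this cell.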
Concluding Remarks
• Combination of Hidiroglou-Berthelot and Latouche-Berthelot methods.
• Tried alternative ways to calculate […]: quartile method and resistant fences method.
• Looking at alternative evaluation methods, determining optimal levels of aggregation, and including seasonality in calculation of simple statistics.
Thank you!