Use of Administrative Data in Statistics Canada’s Annual Survey of Manufactures
Steve Matthews and Wesley Yung
May 16, 2004
The United Nations Statistical Commission and Economic Commission for Europe
Conference of European Statisticians
Outline
Introduction
Tax data programs at Statistics Canada
The Annual Survey of Manufactures (ASM)
  Overview
  Strategy for use of tax data
  Analytical studies
Conclusions and Future Work
Introduction
Desire to increase use of tax data
  Reduce respondent burden
  Reduce survey costs
Can be used at many stages of survey process
  Stratification
  Survey data validation
  Edit and imputation
  Estimation
Tax Data programs at Statistics Canada
Tax data available to Statistics Canada
  Collected by Canada Revenue Agency (CRA)
  Access via a data-sharing agreement
  To be used only for statistical purposes
Two extensive tax data programs
  Unincorporated businesses (T1)
  Incorporated businesses (T2)
Tax Data programs at Statistics Canada (cont’d)
T1 - Population
  Unincorporated businesses
  Account for small share of revenues
Administrative Data
  Sample-based
  Limited set of variables
  Edit and imputation is applied
  Weighted benchmarked estimates
Tax Data programs at Statistics Canada (cont’d)
T2 - Population
  Incorporated businesses
  Account for large share of revenues
Administrative Data
  Census-based
  Extensive set of variables
  Edit and imputation is applied
  Micro-data is produced
The Annual Survey of Manufactures
Manufacturing is an important sector of Canadian economy
~17% of GDP
Annual Survey of Manufactures
  Take-none Portion and Survey Portion
  Extensive questionnaire (financial and commodity)
  Data requirements (pseudo-census)
The Annual Survey of Manufactures (cont’d)
Target population
  Drawn from Statistics Canada’s Business Register (BR)
  All businesses classified to manufacturing
Sample design
  Non-survey portion
    Administrative data
  Survey portion
    Stratified SRS (Stratum = NAICS * Province * Size)
    Small take-some / Large take-some / Take-all
    Collected via mail-out / mail-back, follow-up via telephone
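The stratified design on this slide can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the field names (`naics`, `province`, `size`), the sampling fractions, and the rounding rule are hypothetical and are not the ASM's actual specifications.

```python
import random
from collections import defaultdict


def stratified_srs(frame, sampling_fractions, seed=0):
    """Draw a simple random sample without replacement in each stratum.

    frame: list of dicts with hypothetical keys 'naics', 'province', 'size'.
    sampling_fractions: maps a size class to the fraction sampled
    (1.0 means take-all). Returns a list of (unit, design_weight) pairs.
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for unit in frame:
        # Stratum = NAICS * Province * Size, as on the slide.
        key = (unit["naics"], unit["province"], unit["size"])
        strata[key].append(unit)

    sample = []
    for key in sorted(strata):
        units = strata[key]
        fraction = sampling_fractions[key[2]]
        # At least one unit per stratum, never more than the stratum size.
        n = min(len(units), max(1, round(fraction * len(units))))
        weight = len(units) / n  # design weight N_h / n_h
        for unit in rng.sample(units, n):
            sample.append((unit, weight))
    return sample


# Illustrative frame: 10 small units and 2 take-all units in one NAICS x province cell.
frame = (
    [{"id": i, "naics": "311", "province": "ON", "size": "small"} for i in range(10)]
    + [{"id": 100 + i, "naics": "311", "province": "ON", "size": "take_all"} for i in range(2)]
)
fractions = {"small": 0.5, "take_all": 1.0}
sample = stratified_srs(frame, fractions)
```

In this sketch the design weights within each stratum sum back to the stratum population count, which is the property a weighted estimator relies on.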
The Annual Survey of Manufactures (cont’d)
Edit and Imputation
  Edits applied to ensure accuracy and coherence
  Extensive imputation to produce ‘pseudo-census’ dataset
    Historical imputation
    Ratio imputation
    Nearest-neighbour donor imputation
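The three imputation methods named above can be sketched in their simplest textbook forms. The function names, the single auxiliary variable, and the optional trend factor are assumptions made for illustration; the slides do not describe the ASM's actual imputation classes or models.

```python
def historical_impute(previous_value, trend=1.0):
    """Carry forward last year's value, optionally scaled by a trend factor."""
    return previous_value * trend


def ratio_impute(aux_value, respondents):
    """Scale the unit's auxiliary value by the respondents' ratio of totals.

    respondents: list of (aux, y) pairs from the same imputation class.
    """
    ratio = sum(y for _, y in respondents) / sum(x for x, _ in respondents)
    return ratio * aux_value


def nearest_neighbour_donor(aux_value, respondents):
    """Copy y from the respondent whose auxiliary value is closest."""
    donor = min(respondents, key=lambda pair: abs(pair[0] - aux_value))
    return donor[1]


# Illustrative respondents: (frame revenue, reported total expenses).
respondents = [(100.0, 80.0), (200.0, 160.0), (400.0, 360.0)]
by_history = historical_impute(250.0, trend=1.02)
by_ratio = ratio_impute(350.0, respondents)
by_donor = nearest_neighbour_donor(350.0, respondents)
```

Ratio imputation smooths across the whole class, while donor imputation copies a value that some business actually reported, which matters when several related details must stay coherent.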
The Annual Survey of Manufactures (cont’d)
Estimation
  Non-survey portion (tax data)
    Total Expenses only
    T1: weighted domain estimates
    T2: aggregates from administrative census dataset
  Survey portion (survey data and imputed data)
    Aggregates from pseudo-census dataset
    Domains of interest: NAICS and Province
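The difference between a weighted domain estimate and a pseudo-census aggregate can be shown with a small sketch. The tuple layouts and domain labels are hypothetical; the only idea taken from the slide is that one estimator weights sampled units while the other sums a value (reported or imputed) for every unit.

```python
from collections import defaultdict


def weighted_domain_totals(sample):
    """Horvitz-Thompson style totals: sum of design_weight * value by domain.

    sample: iterable of (domain, design_weight, value).
    """
    totals = defaultdict(float)
    for domain, weight, value in sample:
        totals[domain] += weight * value
    return dict(totals)


def pseudo_census_totals(population):
    """Unweighted totals over every unit in the population.

    population: iterable of (domain, value), where value is either
    reported or imputed, so no weights are needed.
    """
    totals = defaultdict(float)
    for domain, value in population:
        totals[domain] += value
    return dict(totals)


# Two sampled units stand in for four units in the first domain.
sample = [("311 x ON", 2.0, 100.0), ("311 x ON", 2.0, 150.0), ("312 x QC", 1.0, 400.0)]
# The same population with the two non-sampled units imputed.
population = [
    ("311 x ON", 100.0), ("311 x ON", 150.0),
    ("311 x ON", 120.0), ("311 x ON", 140.0),  # imputed values
    ("312 x QC", 400.0),
]
ht = weighted_domain_totals(sample)
pc = pseudo_census_totals(population)
```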
Analytical Studies
Motivation for two studies:
Which variables should be ‘replaced’?
What are the effects of the strategy on final estimates for all variables?
Study 1 – Data comparison
Study 2 – Impact Analysis
Analytical Study 1
Study to select appropriate variables
  Comparison of reported data collected via survey and tax
  Simple businesses only
  Assess suitability for substitution of survey data
Based on ~6,000 businesses
Analytical Study 1 (cont’d)
Correlation Analysis
  Wide range of correlations
    Total Expenses: 0.9
    Total Energy Expenses: -0.10
Reporting Patterns
  Same pattern (zero or positive) for individual businesses
    Total Expenses: 99%
    Total Energy Expenses: 50%
Analytical Study 1 (cont’d)
Distribution of Ratios
  Examined histograms, fraction between 0.9 and 1.1
    Total Expenses: 60%
    Total Energy Expenses: 16%
Population Estimates
  Relative difference between tax and survey-based estimates
    Total Expenses: 3%
    Total Energy Expenses: 28%
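The four comparison measures used in Study 1 (correlation, reporting pattern, ratio distribution, relative difference of estimates) can be sketched for paired survey and tax values. The ratio direction (tax divided by survey), the use of unweighted totals, and the sample figures are assumptions made for illustration, not details given in the slides.

```python
import math


def pearson(xs, ys):
    """Pearson correlation between paired survey and tax values."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def same_pattern_rate(survey, tax):
    """Share of businesses reporting the same pattern (zero or positive)."""
    matches = sum((s > 0) == (t > 0) for s, t in zip(survey, tax))
    return matches / len(survey)


def share_of_ratios_within(survey, tax, low=0.9, high=1.1):
    """Share of tax/survey ratios in [low, high], among positive survey values."""
    ratios = [t / s for s, t in zip(survey, tax) if s > 0]
    return sum(low <= r <= high for r in ratios) / len(ratios)


def relative_difference_of_totals(survey, tax):
    """Relative difference between the tax-based and survey-based totals."""
    return abs(sum(tax) - sum(survey)) / sum(survey)


# Made-up values for four businesses, for illustration only.
survey = [100.0, 200.0, 0.0, 400.0]
tax = [105.0, 150.0, 0.0, 410.0]
r = pearson(survey, tax)
pattern = same_pattern_rate(survey, tax)
within = share_of_ratios_within(survey, tax)
rel_diff = relative_difference_of_totals(survey, tax)
```

Looking at several measures together is useful because they can disagree: a variable can correlate well overall yet have few unit-level ratios near 1.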
Analytical Study 1 (cont’d)
Selected several variables for direct substitution
  Section totals and sub-totals: expenses, revenues, inventories, etc.
Remaining variables are imputed
  Imputation => assign distribution of details within each total
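One simple reading of "assign distribution of details within each total" is to prorate a substituted total across its detail items using a donor's shares. This is a hedged sketch of that idea only: the function name, the donor shares, and the item names are hypothetical, and the slides do not say how the ASM actually derives the distribution.

```python
def allocate_details(substituted_total, donor_details):
    """Split a total across detail items in proportion to a donor's details.

    substituted_total: total taken directly from tax data.
    donor_details: dict of detail item -> donor's reported value (assumed source).
    """
    donor_total = sum(donor_details.values())
    if donor_total == 0:
        raise ValueError("donor total is zero; no distribution to borrow")
    return {
        item: substituted_total * value / donor_total
        for item, value in donor_details.items()
    }


# Hypothetical donor whose expense details sum to 100.
donor = {"energy": 10.0, "materials": 70.0, "wages": 20.0}
details = allocate_details(1000.0, donor)
```

By construction the imputed details add back to the substituted total, so the total from tax data is preserved exactly.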
Analytical Study 1 - Conclusions
Distinctly different results for different variables
  Direct substitution seems feasible for totals
  Direct substitution not recommended for details
Use standard methods to impute other variables
Analytical Study 2
Analysis to evaluate impact of tax data strategy
Bias
  Comparison of estimates from different scenarios
Variance
  Shao-Steel approach for variance estimation
  Reflects variance from sampling and imputation
  Assume equal probability of response within imputation class
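For orientation, variance under imputation is often written as a two-part decomposition obtained by conditioning on the response pattern. The notation below is an assumption, not taken from the slides: $\hat{Y}_I$ is the estimator computed from reported and imputed values, $Y$ the true total, $p$ the sampling design, $q$ the response mechanism, and $a$ the vector of response indicators. It is a sketch of the general form usually associated with the Shao-Steel approach, not necessarily the exact expression used in this study.

```latex
\operatorname{Var}\bigl(\hat{Y}_I - Y\bigr)
  = \underbrace{E_q\!\left[\operatorname{Var}_p\bigl(\hat{Y}_I - Y \mid a\bigr)\right]}_{V_1}
  + \underbrace{\operatorname{Var}_q\!\left[E_p\bigl(\hat{Y}_I - Y \mid a\bigr)\right]}_{V_2}
```

Read loosely, $V_1$ captures variability from sample selection given who responds, and $V_2$ captures variability from the response mechanism itself. The assumption of equal response probability within an imputation class is what makes the response mechanism tractable enough to estimate.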
Analytical Study 2 (cont’d)
Scenarios

HT – No Tax
  Tax data used in imputation: None (ratio imputation based on frame revenues)
  Estimator: Horvitz-Thompson
  Variance: Sampling, Imputation

PC – No Tax
  Tax data used in imputation: None (ratio imputation based on frame revenues)
  Estimator: Pseudo-census
  Variance: Imputation

PC – Tax
  Tax data used in imputation: Non-response (in or out of sample); Direct substitution; Ratio imputation
  Estimator: Pseudo-census
  Variance: Imputation
Analytical Study 2 (cont’d)
Comparison of resulting estimates for Total Expenses
Relative Difference from “HT – No Tax” – Total Expenses

Scenario       All Manufacturing   NAICS3 x Province*
PC – No Tax    1.8%                0.0%
PC – Tax       0.5%                1.3%

* Median value for all such domains
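The two columns of this table summarize different things: one relative difference for the overall total, and the median of the relative differences across domains. A minimal sketch with made-up domain totals (the figures and domain labels are hypothetical, not from the study) shows how the two can diverge.

```python
from statistics import median


def relative_difference(estimate, baseline):
    """Absolute difference from the baseline, relative to the baseline."""
    return abs(estimate - baseline) / abs(baseline)


def summarize(scenario, baseline):
    """Return (overall relative difference, median domain relative difference).

    scenario, baseline: dicts mapping a domain label to an estimated total.
    """
    overall = relative_difference(sum(scenario.values()), sum(baseline.values()))
    by_domain = [relative_difference(scenario[d], baseline[d]) for d in baseline]
    return overall, median(by_domain)


# Hypothetical totals: only one domain differs from the baseline.
baseline = {"311 x ON": 100.0, "311 x QC": 200.0, "332 x ON": 400.0}
scenario = {"311 x ON": 100.0, "311 x QC": 200.0, "332 x ON": 407.0}
overall, median_domain = summarize(scenario, baseline)
```

Here the overall total moves by 1% while the median domain difference is zero, because the change is concentrated in a single domain. A pattern like this is one possible reading of a non-zero aggregate difference alongside a 0.0% median.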
Analytical Study 2 (cont’d)
Comparison of estimated CVs for Total Expenses
Coefficient of Variation – Total Expenses

Scenario       All Manufacturing   NAICS3 x Province*
HT – No Tax    0.3%                1.5%
PC – No Tax    0.3%                1.5%
PC – Tax       0.1%                0.7%

* Median value for all such domains
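A coefficient of variation is the estimated standard error divided by the estimate. The sketch below assumes the sampling and imputation variance components are simply added, which is an illustrative simplification; the function name and the figures are hypothetical.

```python
import math


def coefficient_of_variation(estimate, sampling_variance=0.0, imputation_variance=0.0):
    """CV = sqrt(total variance) / |estimate|, with components assumed additive."""
    total_variance = sampling_variance + imputation_variance
    return math.sqrt(total_variance) / abs(estimate)


# A weighted estimator carries both components in this sketch.
cv_weighted = coefficient_of_variation(1000.0, sampling_variance=5.0, imputation_variance=4.0)
# A pseudo-census estimator carries only the imputation component.
cv_pseudo_census = coefficient_of_variation(1000.0, imputation_variance=4.0)
```

With these made-up inputs the weighted estimator has a CV of 0.3% and the pseudo-census estimator 0.2%.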
Analytical Study 2 (cont’d)
Comparison of resulting estimates for Total Energy Expenses
Relative Difference from “HT – No Tax” – Total Energy Expenses

Scenario       All Manufacturing   NAICS3 x Province*
PC – No Tax    1.2%                0.0%
PC – Tax       0.8%                1.2%

* Median value for all such domains
Analytical Study 2 (cont’d)
Comparison of estimated CVs for Total Energy Expenses
Coefficient of Variation – Total Energy Expenses

Scenario       All Manufacturing   NAICS3 x Province*
HT – No Tax    0.3%                1.8%
PC – No Tax    0.4%                1.8%
PC – Tax       0.4%                1.8%

* Median value for all such domains
Analytical Study 2 - Conclusions
Bias
  Small relative difference between estimated totals from scenarios
Variance
  Relatively low CV for all options
  Tax substitution variables: Scenario 3 (PC – Tax) most efficient
  Non-tax substitution variables: Scenario 1 (HT – No Tax) most efficient
Analytical capabilities
  Scenarios 2 and 3 (PC – No Tax and PC – Tax) provide most detail
Conclusions
Results used to select 2004 strategy – “PC – Tax”
  Meets needs of data users
  Reduced cost and response burden
  Maintain (improve) quality
Striving to further increase use of tax data
  Increased portion of population
  Increased number of variables
Future Work
Editing of tax data
  Similar approach to survey data approach
  Potential to expand list of direct substitution variables
Indirect use of tax data
  More adaptive models
Quality indicators
  Account for increased variance and potential for bias due to imputation