Editing And Imputation For Manufacturing Statistics At Statistics Canada Marie Brodeur Director...

Editing And Imputation For Manufacturing Statistics At Statistics Canada
Marie Brodeur, Director General, Industry Statistics Branch
Santiago, Chile, March 15 to 17, 2011


Page 2: Outline Of The Presentation

Overview of the Manufacturing Program
Centralized Process Surveys
Overview of the UES Survey Process
Post Collection Processing Inputs & Tools
Use of Tax Data
The many phases of UES Post Collection Process
Managing the UES Post Collection Process

Page 3: Statistics Canada

[Organization chart: Chief Statistician of Canada, with the following areas shown]
• Corporate Services
• National Accounts and Analytical Studies
• Business and Trade Statistics
• Informatics and Methodology
• Census and Operations
• Social, Health and Labour Statistics

Page 4: Statistics Canada, Business and Trade Statistics

[Organization chart, boxes in the order shown]
• Industry Statistics
• Economy-wide Statistics
• Agriculture, Technology and Transportation Statistics
• Manufacturing and Energy
• Distributive Trades
• Service Industries
• Enterprise Statistics
• Consumer Prices
• International Trade
• Producer Prices
• Investment and Capital Stock
• Enterprise Statistics
• Agriculture
• Small Business and Special Surveys
• Science, Innovation and Electronic Information
• Transportation

Page 5: Manufacturing Distribution Of Sales

[Bar chart: Share of manufacturing sales by industry, 2010. Horizontal axis from 0.0% to 18.0%. Industries shown: Leather and allied product manufacturing; Textile mills; Textile product mills; Clothing manufacturing; Printing and related support activities; Electrical equipment, appliance and component; Furniture and related product manufacturing; Beverage and tobacco product manufacturing; Miscellaneous manufacturing; Non-metallic mineral product manufacturing; Computer and electronic product manufacturing; Wood product manufacturing; Plastics and rubber products manufacturing; Paper manufacturing; Machinery manufacturing; Fabricated metal product manufacturing; Primary metal manufacturing; Chemical manufacturing; Petroleum and coal product manufacturing; Food manufacturing; Transportation equipment manufacturing.]

Page 6: Who Are Manufacturers?

Establishments primarily engaged in the physical or chemical transformation of materials and substances into new products
Includes assembly of the component parts of manufactured goods, blending of materials, finishing of manufactured products by dyeing, heat treating, plating and similar operations
Transformation of own materials or those owned by others
Service outputs: custom work, repair and maintenance
Product outputs: finished goods, intermediate goods

Page 7: Manufacturing Program At Statistics Canada (STC)

Monthly Survey of Manufacturing (MSM)
Annual Survey of Manufactures and Logging (ASML)
Series of sub-annual commodity surveys

Page 8: General Characteristics Of The MSM

Monthly indicator of manufacturing activity
Last redesign in 1999
Designed to be a reliable indicator for both trends and levels
Establishment survey (n = 10,500)
Stratified by province, NAICS and size

Page 9: MSM Concepts

Sales
• Goods of own manufacture
Inventories
• Raw materials
• Goods-in-process
• Finished products
Orders
• New orders
• Unfilled orders
Goods purchased for resale (revenue and inventory)
• These data are collected but not released
Sales is the main concept; exceptionally, production is used for some industries (aerospace and shipbuilding)

Page 10: Frame And Coverage

Total number of establishments on the Business Register
• Simple: 2,278,730
• Complex: 110,557
Value of sales of all establishments on the Business Register
• Simple: $2,214.9 billion
• Complex: $1,859.1 billion
Total number of manufacturing establishments on the Business Register
• Simple: 84,215
• Complex: 6,648
Value of sales of manufacturing establishments on the Business Register
• Simple: $340.8 billion
• Complex: $234.5 billion
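The concentration implied by these counts can be checked with a little arithmetic. The following is a minimal sketch: the figures are copied from this slide, while the dictionary layout and the function name `complex_shares` are illustrative only.

```python
# Figures copied from the Frame And Coverage slide: (establishments, sales in $ billions).
frame = {
    "all": {"simple": (2_278_730, 2214.9), "complex": (110_557, 1859.1)},
    "manufacturing": {"simple": (84_215, 340.8), "complex": (6_648, 234.5)},
}


def complex_shares(rows):
    """Return (share of establishments, share of sales) held by complex units."""
    n_simple, s_simple = rows["simple"]
    n_complex, s_complex = rows["complex"]
    return (n_complex / (n_simple + n_complex),
            s_complex / (s_simple + s_complex))


for name, rows in frame.items():
    unit_share, sales_share = complex_shares(rows)
    print(f"{name}: {unit_share:.1%} of establishments, {sales_share:.1%} of sales")
```

On these figures, complex establishments are roughly 5% of all units on the register but about 46% of sales, which is consistent with treating large and complex units differently from small ones.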

Page 11: MSM Sampling Plan

[Diagram: the frame is divided into Take-All, Take-Some and Take-None portions, with the labels "Survey" and "Tax replaced".]

Page 12: MSM Sampling Plan: Use Of Tax

Background
• The Goods and Services Tax (GST) is the federal Value Added Tax
• GST is collected by the Canada Revenue Agency (CRA)
• The CRA provides tax data to Statistics Canada
Information received includes the Business Number, revenue, tax remitted and input tax credit

Page 13: Use Of Tax Data

Who is replaced?
• Single establishment enterprises
– Replace 50% of sampled data with GST data
• Chronic refusals
Who are not replaced?
• Very large single enterprise establishments
• Complex units (i.e. multiple establishments), as found in the GST database

Page 14: What Is The Annual Survey Of Manufactures And Logging (ASML)?

Measures the contribution of manufacturing industries to economic activity in Canada
In 2010, manufacturing accounted for 15% of GDP and 12% of total employment (SEPH)
Key input to SNA Input-Output tables
Survey collects data on
• what commodities are produced (Make matrix)
• where commodities are destined (provincial I/O tables)
• what commodities and primary inputs are used in production (Use matrix)

Page 15: Survey Coverage

ASML is conducted under the umbrella of Statistics Canada’s Unified Enterprise Survey Program (UES)
Same as MSM
Establishments primarily engaged in manufacturing and logging activities and classified to NAICS 31, 32 and 33 as well as NAICS 113
Estimates produced for 261 NAICS6 level industries
Estimates produced for the 10 provinces and 3 territories

Page 16: Content: Commodity Variables

Revenue variables (16), expense variables (43), detailed opening and closing inventories (12), other financial (5)
Sales or output variables are valued at producer or FOB factory gate prices, as required by SNA
Commodities consumed (inputs) and produced (outputs), both goods and services
Collect commodity values and quantities (for selected goods)
Services produced and consumed are collected as expense items and classified based on COA

Page 17: Types Of Administrative (Tax) Data

From the Canada Revenue Agency (CRA)
• Agreement between two agencies
• T1 (unincorporated businesses)
• T2 (incorporated businesses)
• T4 (pay slips)
• GST (Goods and Services Tax)
• PD7 (payroll deduction accounts)

Page 18: Editing And Imputation For Manufacturing Surveys

Page 19: Why A Centralized Process?

Best Practices
Standardization of Processes
• Cross Survey Comparisons
• Enterprise Centric Processing/Coherence Analysis
Efficient use of Resources
Transportable Knowledge Across Survey Programs

Page 20: Challenges Of A Centralized Process

Remain Centralized
Distribute processing
Priority Setting
Communication and Coordination

Page 21: UES Post-Collection Processing

[Flow diagram with the following boxes: Pre-Grooming; Allocation / Estimation; Edit & Imputation; “Clean” Records; Central Data Store; Subject Matter Review & Correction Tool; Tax Data; USTART.]

Page 22: Collection

Collection Period: February to early October
Collection Processing System: Blaise
• Blaise can be seen as being a Collection Control Center
• Blaise has many functions:
– Call Scheduler
– Transaction history files
– Audit Trail Files
– And more

Page 23: Blaise: Variables

Questionnaire number
Mail-out date
Number of calls
Length of the call
Number of contact attempts
Response code
And more

Page 24: Blaise: Bonuses Over The Years

Blaise Transaction History (BTH) Files
• Collection data analysis:
– Produced a paper on best time to call
– Produced a paper on maximum # of attempts
Audit Trail Files
• Find outliers
• Difficult-to-answer questions

Page 25: Collection

Precontact (Dec-Jan)
– Mostly for Business Register (BR) births; verification of contact information (name, address, …)
– By phone (in a few cases, a letter or a fact sheet is sent)
Mail-out of questionnaires (Jan-March)
– 2 or 3 mail-out dates
Follow-up in case of non-response for some units (begins about a month after mail-out)
– Phone call, remail or fax
Mail-back of questionnaires
Verifications of received questionnaires / Edits
– Is the questionnaire complete or are some key variables missing? (Edit follow-up by phone in some cases)

Page 26: Collection

Coding of questionnaires (about 20 response codes)
• Response, non-response, out-of-scope, …
Imaging / Data capture (CADI - Computer Assisted Data Input)

Page 27: Centralized Collection

[Flow diagram with the following boxes: Pre-Contact (17K Businesses); Mailout (38K CEs); Receipt (75% target); Delinquent Follow-Up; Score Function; Capture / Imaging; Edit / Verification (BLAISE); “Clean” Records.]

Page 28: UES: Data Collection / Score Function

Introduced in 2002, the UES score function is the main tool used at the collection stage to determine which priority to give for the follow-up of about 23,000 Collection Entities (CE) each year.
Reduces collection costs yet retains data quality
Similar to the collection goal of obtaining a high weighted coverage response rate.
PRIORITY 1: Extensive follow-up for the larger revenue CEs in cases of non-response.
PRIORITY 0: Minimum follow-up for the smaller CEs in cases of non-response.
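The slide does not give the formula behind the score function, so the following is only a sketch of the general idea under stated assumptions: non-responding CEs are ranked by revenue and flagged priority 1 until they cover a target share of total revenue. The function name `assign_priorities`, the `coverage_target` parameter and its default value are hypothetical, not the UES definition.

```python
def assign_priorities(revenue_by_ce, coverage_target=0.75):
    """Flag the largest CEs as priority 1 until they cover the target revenue share.

    revenue_by_ce maps a CE identifier to its revenue (e.g. from the frame).
    coverage_target is an illustrative threshold, not a documented UES value.
    """
    total = sum(revenue_by_ce.values())
    priorities = {}
    covered = 0.0
    # Walk from the largest CE down; ties are broken by identifier for determinism.
    for ce, revenue in sorted(revenue_by_ce.items(), key=lambda kv: (-kv[1], kv[0])):
        if total > 0 and covered / total < coverage_target:
            priorities[ce] = 1  # extensive follow-up
            covered += revenue
        else:
            priorities[ce] = 0  # minimum follow-up
    return priorities


example = {"CE-A": 500.0, "CE-B": 300.0, "CE-C": 150.0, "CE-D": 50.0}
print(assign_priorities(example))
```

A rule of this shape concentrates follow-up effort on the few units that drive weighted coverage, which is the trade-off between collection cost and data quality that the slide describes.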

Page 29: Chart Of Accounts

[Diagram linking COLLECTION to DISSEMINATION through the Chart Of Accounts (LINK, BRIDGE, CONCORDANCE). Labels shown: Sales; Operating revenue; Cost of sales; Gross profit; Expenses; EBIT; Outputs; Inputs; Value added; Shipments; Operating Surplus; GDP.]

Page 30: Expected Benefits Of A Chart Of Accounts

Standardization in business data collection
Higher survey response
Increase in quality of data
Comparison of data from various sources
Increase efficiency in using administrative data

Page 31: Links To Chart Of Accounts

[Diagram: CHART OF ACCOUNT linked to Establishment, Legal entity and Enterprise.]

Page 32: UES: Use Of Tax Data

Validation (comparison)
• Verify dubious collected data against the equivalent tax data record
Imputation
• One of the methods used for non-response
Estimation
• Below take-none
Direct Data Replacement
Update Business Register
Allocation of survey data (use tax revenues, salaries and expenses)

Page 33: Centralized Processing Systems And Databases

Develop centralized systems
• Move away from stand-alone
• Single point of access for security
Integrated Questionnaire Metadata System
Edit and imputation
Allocation and Estimation
Data Warehouse

Page 34: Enterprise Portfolio Managers

Top 350 enterprises in Canada
Status
• Platinum, Gold, Silver, Bronze
Personal visits
Enterprise Profiling
Coordination of mail-out and collection
Enterprise / Establishment coherence
Holistic Response Management
• Strategic Response Unit
• Escalation Process / Statistics Act

Page 35: Review and Correction (Post-Capture)

Done via an application which is a micro-editing tool
Opportunity to perform edits and to manually correct data before the automated edit and imputation process
Opportunity to gain an understanding of the quality of data coming in from the field

Page 36: What Is Generally Done By SMOs During This Process?

Ensure that industry codes are valid and response codes are correct
Ensure that equivalent survey cells have consistent data
Enter data for records that came in after the collection cut-off date
Review high impact outliers in terms of profit, average salary, etc.
Check comments made by respondents and collection staff

Page 37: Why Is This Process Necessary?

Reviewing and correcting records will increase the number and quality of donors for the automated edit and imputation (E&I) stage. This will improve the quality of data coming out of E&I.
Need to assess the quality of collected data
• Determine whether there are problems with the questionnaire
• Identify cases where the respondent is unable to provide a given data point
• Determine whether there are enough data for E&I

Page 38: What Should Not Be Done During This Process?

Do not plug data for non-response records. They will be imputed during the automated E&I.

Page 39: What Is E & I?

Editing
• Verify that parts add up to total
• Ensure that there are no missing values where parts add up to total
• There must be consistency between related variables
Imputation
• Changing values in fields which fail edit rules with a view to ensuring that the resulting data satisfy all edit rules. In practice, reported data will rarely be changed
• Impute for missing data or partially responded data
• Impute entire records in the case of total non-response
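A sum-of-parts edit of the kind listed under Editing can be sketched as a small check on one record. This is a minimal illustration: the function name `failed_edits`, the field names and the tolerance `tol` are hypothetical and are not taken from the UES edit specifications.

```python
import math


def failed_edits(record, total_field, part_fields, tol=0.5):
    """Return the edit failures for one sum-of-parts rule on a record (a dict)."""
    failures = []
    values = [record.get(f) for f in [total_field, *part_fields]]
    if any(v is None for v in values):
        # A missing total or part means the rule cannot be verified.
        failures.append("missing value")
    elif not math.isclose(sum(values[1:]), values[0], abs_tol=tol):
        failures.append("parts do not add up to total")
    return failures


record = {"total_revenue": 100.0, "sales": 70.0, "other_revenue": 20.0}
print(failed_edits(record, "total_revenue", ["sales", "other_revenue"]))
```

Records that return an empty list pass the rule; the others are candidates for imputation of the failing fields.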

Page 40: Why Is E&I Necessary?

To produce a complete and consistent data file that accounts for all sampled units
Both units that did not respond to the survey and units that did not provide a complete response must be imputed
Correct erroneous responses

Page 41: E&I Terminology

Data Group
• Groupings (defined by SM) of records that will be kept together for imputation purposes
• These groupings are based on multiple dimensions:
– industry (NAICS)
– geography (province)
Data groups that will be used for a specific survey will depend on:
• initial sample design (number of units sampled and the level of stratification used)
• number of records that respond to the survey (a minimum of 5 or 10 records are required in a data group)
May be changed during production if not enough donors
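The minimum-size requirement for a data group can be sketched as a simple collapsing rule. This is an assumption-laden illustration: the slide only says groups may be changed when there are not enough donors, so the specific fallback used here (merging small industry-by-province cells of the same industry into one national group) and the names `assign_data_groups` and `min_size` are hypothetical.

```python
from collections import Counter


def assign_data_groups(records, min_size=5):
    """Group responding records by (NAICS, province); merge cells that are too small.

    Cells with fewer than min_size respondents are pooled into a single
    (NAICS, "ALL") group for that industry. The pooled group may itself be
    small, in which case a further fallback would be needed.
    """
    fine_counts = Counter((r["naics"], r["province"]) for r in records)
    groups = {}
    for r in records:
        fine = (r["naics"], r["province"])
        # Keep the detailed group only if it has enough respondents.
        groups[r["id"]] = fine if fine_counts[fine] >= min_size else (r["naics"], "ALL")
    return groups


records = (
    [{"id": i, "naics": "311", "province": "ON"} for i in range(5)]
    + [{"id": 5 + i, "naics": "311", "province": "QC"} for i in range(2)]
)
groups = assign_data_groups(records)
print(groups)
```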

Page 42: E&I Terminology (continued)

Edit Group
• Grouping of variables within a record that will be processed together in an imputation method
• Generally edit groups may be defined as follows for most surveys:
– revenue and expense sections
– employment section
– provincial distribution of goods/services sold
• Allows for a record to be a donor if it has clean data in one section even when other sections are blank; this increases the donor pool

Page 43: E&I Terminology (continued)

Key variables
• Total operating revenue
• Total operating expenses
• Salaries
• Cost of goods sold

Page 44: The Stages Of The E&I System

Pre-processing
BANFF E & I System
Post-Processing
Allocation

Page 45: Preprocessing

Deterministic Edits
Conditional edits - If A then B
Sum of Parts (SOP)
Assign 100% to percentage totals
Impute reporting period
Donor Outlier Detection
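A deterministic sum-of-parts edit can be illustrated as follows: when exactly one field of a total/parts relationship is missing, its value is fully determined by the others. This is a minimal sketch, not the UES pre-processor; the function and field names are hypothetical.

```python
def deterministic_sum_of_parts(record, total_field, part_fields):
    """Fill the single missing field of a total/parts relationship, if exactly one is missing."""
    fixed = dict(record)  # work on a copy so the reported record is preserved
    missing = [f for f in [total_field, *part_fields] if fixed.get(f) is None]
    if len(missing) != 1:
        return fixed  # nothing can be derived with certainty
    if missing[0] == total_field:
        # The total is the sum of the reported parts.
        fixed[total_field] = sum(fixed[f] for f in part_fields)
    else:
        # The missing part is the total minus the reported parts.
        known = sum(fixed[f] for f in part_fields if f != missing[0])
        fixed[missing[0]] = fixed[total_field] - known
    return fixed


reported = {"total_expenses": 80.0, "salaries": 30.0, "materials": None}
print(deterministic_sum_of_parts(reported, "total_expenses", ["salaries", "materials"]))
```

Because the derived value follows from the edit rule alone, no donor or trend is needed for cases like this, which is why such edits can run before the main imputation step.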

Page 46: BANFF E & I System

Impute for missing key variables as specified by subject matter (i.e. total revenue, total expenses)
Impute for other missing variables:
• Apply Historical Trend
• Apply Current Year Trend
• Use donor (for partial imputation)
Select a donor for massive imputation for total non-response
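Donor imputation for partially reported records can be sketched as a nearest-neighbour search within the data group. This is an illustration under assumptions, not a description of how Banff selects donors: the matching variable (`total_revenue`), the distance measure and the function names are hypothetical, and the recipient is assumed to already have its key variable imputed.

```python
def pick_donor(recipient, candidates, match_field="total_revenue"):
    """Return the clean record in the same data group closest on the matching field."""
    pool = [
        c for c in candidates
        if c["data_group"] == recipient["data_group"] and c.get(match_field) is not None
    ]
    if not pool:
        return None  # no donor available; the data group would need to be widened
    return min(pool, key=lambda c: abs(c[match_field] - recipient[match_field]))


def impute_from_donor(recipient, donor, fields):
    """Copy the donor's values into the fields the recipient is missing."""
    filled = dict(recipient)
    for f in fields:
        if filled.get(f) is None:
            filled[f] = donor[f]
    return filled


candidates = [
    {"data_group": "A", "total_revenue": 100.0, "salaries": 40.0},
    {"data_group": "A", "total_revenue": 900.0, "salaries": 300.0},
    {"data_group": "B", "total_revenue": 210.0, "salaries": 90.0},
]
recipient = {"data_group": "A", "total_revenue": 200.0, "salaries": None}
donor = pick_donor(recipient, candidates)
print(impute_from_donor(recipient, donor, ["salaries"]))
```

In practice donated values are often rescaled to the recipient's size rather than copied as-is; the sketch omits that step for brevity.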

Page 47: BANFF Algorithms

DIFTREND - Historical trend imputation
CURRATIO - Current ratio imputation
PREVALUE - Value from the previous period for the same unit is imputed
PREAUX - Historical value of a proxy variable for the same unit
CURAUX - Current value of a proxy variable for the same unit
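The one-line descriptions above can be made concrete with small functions. The formulas below are an illustrative reading of those descriptions, not the Banff specification: in particular, the exact ratios used for the trend and ratio estimators are assumptions, and the lowercase function names simply mirror the algorithm labels.

```python
def prevalue(prev_value):
    """PREVALUE: carry forward the unit's own value from the previous period."""
    return prev_value


def preaux(prev_aux):
    """PREAUX: use the unit's previous-period value of a proxy (auxiliary) variable."""
    return prev_aux


def curaux(cur_aux):
    """CURAUX: use the unit's current-period value of a proxy (auxiliary) variable."""
    return cur_aux


def curratio(cur_aux, respondents_cur_values, respondents_cur_aux):
    """Current ratio: scale the unit's auxiliary value by the ratio seen among respondents."""
    return cur_aux * sum(respondents_cur_values) / sum(respondents_cur_aux)


def diftrend(prev_value, respondents_cur_values, respondents_prev_values):
    """Historical trend: move the unit's previous value by the respondents' growth."""
    return prev_value * sum(respondents_cur_values) / sum(respondents_prev_values)


# A unit with tax revenue 200, in a group where survey sales run at half of tax revenue.
print(curratio(200.0, [50.0, 150.0], [100.0, 300.0]))
# A unit that reported 80 last year, in a group that grew by 10%.
print(diftrend(80.0, [110.0, 220.0], [100.0, 200.0]))
```

The first three estimators need only data for the unit itself, while the ratio and trend estimators also depend on the respondents in the same data group, which is one reason the minimum group size matters.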

Page 48: Post-Processing

Prorate components to ensure that they sum exactly to totals
Perform a number of consistency checks to ensure that micro-data are valid
Assign customer location (percentage cells)
Massive Imputation (donor selected during processor but applied in the post-processor)
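Prorating can be sketched as rescaling the components and then absorbing the rounding residue. This is a minimal sketch assuming whole-number values (for example, thousands of dollars); the function name `prorate` and the choice to put the residue on the largest component are illustrative, not the UES rule.

```python
def prorate(parts, total):
    """Scale components so they sum exactly to an integer total, keeping proportions."""
    current = sum(parts.values())
    if current == 0:
        return dict(parts)  # proportions undefined; leave the record for review
    scaled = {k: round(v * total / current) for k, v in parts.items()}
    # Rounding can leave the sum off by a unit or two; adjust the largest component.
    largest = max(scaled, key=scaled.get)
    scaled[largest] += total - sum(scaled.values())
    return scaled


parts = {"salaries": 30, "materials": 50, "other": 15}
print(prorate(parts, 100))
```

Here the components sum to 95 against a total of 100, so each is scaled up by 100/95 and the one-unit rounding excess is removed from the largest component.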

Page 49: Allocation - Definition & Purpose

Definition:
Allocation is the distribution of survey and administrative data from their acquisition level (Collection Entity) to the targeted statistical units (Establishments or Locations) as defined on the survey frame.
Purpose:
To provide fully-processed micro data on a fiscal year basis, for establishments or locations in-sample for the UES
Determine the distribution of value added by province
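Allocation from a Collection Entity to its establishments can be sketched as a proportional split on an auxiliary weight. This is a minimal sketch under assumptions: the weight could be, for example, tax revenue or salaries per establishment, and the equal-split fallback and the function name `allocate` are hypothetical rather than the UES method.

```python
def allocate(ce_value, establishment_weights):
    """Split a collection-entity value across establishments in proportion to a weight."""
    total_weight = sum(establishment_weights.values())
    if total_weight == 0:
        # No auxiliary information: fall back to an equal split.
        share = ce_value / len(establishment_weights)
        return {est: share for est in establishment_weights}
    return {est: ce_value * w / total_weight for est, w in establishment_weights.items()}


# One questionnaire reporting 1,000 for an entity with three establishments.
weights = {"E1": 600.0, "E2": 300.0, "E3": 100.0}
print(allocate(1000.0, weights))
```

Summing the allocated establishment values by province then gives the provincial distribution that the Purpose section refers to.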

Page 50: Sample Survey Allocation

[Diagram: Establishments 1 to 4 in the SAMPLE are covered by Questionnaire 1 and Questionnaire 2 at Collection/Processing; Allocation then returns data to Establishments 1 to 4 and to Establishment U.]

Page 51: Managing The UES Post Collection Process

Post Collection Operations Committee
• Discuss production issues of common interest
• Provide status reports on production and production readiness
Divisional Production meetings
• Working group level dealing with production issues relating to a specific subject matter division, including planning and ad hoc requests
Post Collection Processing Teams
• Structured by Subject Matter Division to provide the best support and to maximise subject matter expertise
Change Management Requests
• Improvements
Service Request Management Portal (SRM)
• Corrections

Page 52: Future Directions

IBSP (Integrated Business Statistics Project)
• New and Improved UES, to consolidate and standardise processing for more annual and sub-annual business surveys
• Start RY2013. To be completed for RY2015
• Number of surveys to increase from 60 annual surveys to 120 annual and sub-annual surveys.