Treatment Of Unit Non-response In Establishment Surveys ICES –III: June 18 -21, 2007 M.A....

Treatment Of Unit Non-response In Establishment Surveys

ICES-III: June 18-21, 2007

M.A. Hidiroglou

Wesley Yung

Statistics Canada

Outline

1. Why is it a problem?
2. Causes
3. Measurement
4. Follow-up
5. Score Function
6. Adjusting for nonresponse
7. Weight adjustment
8. Imputation
9. Summary

Why is it a Problem?

Bias: non-respondents differ from respondents in the characteristics measured

Sampling variance: increased, because the effective sample size is reduced

Causes

Frame quality

Contact information: name, address, telephone number and fax number

Classification (industry/geography)

Over-coverage: sampled unit is not in scope for the survey and does not respond

Under-coverage: units declared out of scope are not contacted

Causes, cont.

Questionnaire Design and layout

Coverage: complex businesses

Language

Length / time to fill out

Causes, cont.

Data collection method

Did not adjust to respondent's preferred contact mode

Mail, personal interview, telephone interview, computer-assisted interviewing, etc.

Causes, cont.

Contact: agency and respondent

Lack of communication and follow-up

Too much contact: editing checks

Timing

Best day and time

Fiscal year end

Causes, cont.

Contact: agency and respondent

Data availability

Response load

Who else is asking?

Legal obligations for respondents and statistical agency

Confidentiality protection

Measurement

Compile non-response rates

Refusals

Non-contact

Out-of-scope

Seasonality / death status (unknown)

Mail returns

Other reasons

Follow Up

Follow-up non-respondents

All and/or targeted sub-group

Effective way to increase the response rate

Follow Up, cont.

Prioritise follow-up

Who?

Target large or significant units first

Non-responding births

Delinquent businesses

How?

Score function

Follow Up, cont.

Annual business census-type surveys

Split non-respondents into take-all and take-some strata

Select with certainty the $t_a$ units with $x_k > b$, where the boundary is $b = \bar{x} + (c^2 N \bar{x}^2 + S^2)^{0.5}$

Select $n - t_a$ remaining units from the take-some stratum

[Figure: units ordered from largest to smallest and split at the boundary $b$, with response and follow-up status marked]

Follow Up, cont.

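Hansen-Hurwitz (1946)

Initial sample: $n = n_r + n_{\bar{r}}$ ($n_r$ respondents, $n_{\bar{r}}$ non-respondents)

Follow-up sample of non-respondents: $n_{\bar{r}1}$ units drawn from the $n_{\bar{r}}$ non-respondents

Estimator: $\hat{Y} = \frac{N}{n}\left(\sum_{i \in s_r} y_i + \frac{n_{\bar{r}}}{n_{\bar{r}1}} \sum_{i \in s_{\bar{r}1}} y_i\right)$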
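The Hansen-Hurwitz idea can be sketched in a few lines of Python. This is a minimal illustration that assumes simple random sampling at both stages and full response in the follow-up subsample; the function and argument names are hypothetical and are not taken from the slides.

```python
def hansen_hurwitz_total(N, y_resp, n_nonresp, y_followup):
    """Estimate a population total under simple random sampling (assumed).

    N          : population size
    y_resp     : values reported by the initial respondents (n_r units)
    n_nonresp  : number of initial non-respondents
    y_followup : values obtained from the follow-up subsample of non-respondents
    """
    n = len(y_resp) + n_nonresp
    if n_nonresp > 0 and not y_followup:
        raise ValueError("a follow-up subsample is needed when there are non-respondents")
    # Each followed-up unit stands in for (non-respondents / follow-up size) units.
    expansion = n_nonresp / len(y_followup) if n_nonresp > 0 else 0.0
    return N / n * (sum(y_resp) + expansion * sum(y_followup))


total = hansen_hurwitz_total(
    N=1000, y_resp=[10, 12, 8, 14, 11, 9], n_nonresp=4, y_followup=[20, 16]
)
```

In this made-up example the two followed-up units each represent two of the four non-respondents, so their values are doubled before the sample total is expanded by N/n.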

Score Function

Basic idea: follow up the non-responding units that have the most impact on estimates

Adaptation of Latouche and Berthelot (1992), McKenzie (2001), and Hedlin (2003).

Score Function, cont.

Key steps

1. Define and compute score function from past values

2. Determine score cut-off: minimize absolute standard bias

3. Follow-up units above score cut-off

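1. Define and compute score function

Use past data at time (say) $t_{past}$

Sample: $s(t_{past})$

Units that respond, $s_{RESP}(t_{past})$: $y_j(t_{past})$

Units that do not respond, $s_{\overline{RESP}}(t_{past})$: $y_j^{imp}(t_{past})$

Follow-up everybody: $\hat{Y}_{RESPF}(t_{past}) = \sum_{j \in s(t_{past})} w_j(t_{past})\, y_j(t_{past})$

Compute score function using non-responding units:

$score_j(t_{past}) = \frac{w_j(t_{past})\, |y_j(t_{past}) - y_j^{imp}(t_{past})|}{\hat{Y}_{RESPF}(t_{past})} \times 100, \quad j \in s_{\overline{RESP}}(t_{past})$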
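As a rough illustration of step 1, the score of each past non-respondent can be computed as its weighted absolute imputation error, expressed as a percentage of the estimate obtained when everybody is followed up. The sketch below assumes that reading of the score; the function name, argument names and figures are hypothetical.

```python
def nonresponse_scores(weights, followed_up, imputed, full_followup_total):
    """Score per past non-respondent: 100 * w * |y - y_imp| / Y_hat.

    followed_up holds the values obtained once the unit was followed up,
    imputed holds the values that had been imputed for the same units.
    """
    return [
        100.0 * w * abs(y - y_imp) / full_followup_total
        for w, y, y_imp in zip(weights, followed_up, imputed)
    ]


scores = nonresponse_scores(
    weights=[1.0, 5.0, 20.0],
    followed_up=[900.0, 120.0, 10.0],
    imputed=[700.0, 100.0, 9.0],
    full_followup_total=10000.0,
)
```

A large unit with a small weight can still dominate the ranking, because the score depends on the weighted error and not on the weight alone.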

2. Determine score cut-off

Rank scores $score_j(t_{past})$ from highest to lowest

Follow-up the $B$ highest scores

- response set: $s_{RESPB}(t_{past})$

- non-response set: $s_{\overline{RESPB}}(t_{past})$

New estimate: $\hat{Y}_{RESPB}(t_{past}) = \sum_{j \in s_{RESPB}(t_{past})} w_j(t_{past})\, y_j(t_{past}) + \sum_{j \in s_{\overline{RESPB}}(t_{past})} w_j(t_{past})\, y_j^{imp}(t_{past})$

Absolute standard bias: $ASB(B) = |\hat{Y}_{RESPF}(t_{past}) - \hat{Y}_{RESPB}(t_{past})| / se(\hat{Y}_{RESPF}(t_{past}))$

Score cut-off: $score_{CUT}(t_{past})$ where $ASB(CUT) \le A$

Reasonable value for A = 0.10

If cv = 2%, then the relative bias at the cut-off is A × cv = 0.2%
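The cut-off search can be sketched as follows. This is only an illustration under stated assumptions: the absolute standard bias is taken to be the gap between the full follow-up estimate and the partial follow-up estimate divided by a standard error supplied by the user, and the smallest number of top-ranked units that brings it to A or below is kept. All names and figures are hypothetical.

```python
def choose_follow_up(units, std_error, A=0.10):
    """Return the smallest set of top-ranked units whose follow-up brings ASB to A or below.

    units: list of (weight, followed_up_value, imputed_value) for past non-respondents.
    std_error: standard error of the full follow-up estimate (an assumed input).
    """
    # Rank by weighted absolute error; dividing by a common total would not change the order.
    ranked = sorted(units, key=lambda u: u[0] * abs(u[1] - u[2]), reverse=True)
    for B in range(len(ranked) + 1):
        # Units after the first B keep their imputed values, so the gap to the
        # full follow-up estimate is the sum of their weighted errors.
        gap = sum(w * (y - y_imp) for w, y, y_imp in ranked[B:])
        if abs(gap) / std_error <= A:
            return ranked[:B]
    return ranked  # not reached: following up every unit leaves no gap


followed = choose_follow_up(
    [(1.0, 900.0, 700.0), (5.0, 120.0, 100.0), (20.0, 10.0, 9.0)],
    std_error=250.0,
)
```

With these made-up figures the ratio is 1.28 with no follow-up, 0.48 after one unit and 0.08 after two, so two units are followed up when A = 0.10.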

2. Determine score cut-off

[Figure: absolute standard bias against number of recontacts, with the cut-off marked]

3. Follow-up units above score cut-off

Score function (Latouche and Berthelot 1992)

Establish threshold based on ASB

Follow-up k-th unit if $score_k(t) > \text{threshold}$, where

$score_k(t) = \sum_{q=1}^{Q} \frac{w_k(t)\, I_q\, |x_{k,q}(t) - x_{k,q}^{imp}(t)|}{\sum_{k \in s(t)} w_k(t)\, x_{k,q}(t)}$

$w_k(t)$: survey weight at time $t$

$I_q$: importance of variable $q$
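A multi-variable score in the spirit of Latouche and Berthelot (1992) can be sketched as below. The sketch assumes the score of unit k sums, over the Q variables, the importance-weighted absolute difference between the current and imputed values, each divided by the weighted sample total of that variable. The function name, threshold and data are hypothetical.

```python
def lb_scores(weights, current, imputed, importance):
    """Score per unit k from weights[k], current[k][q], imputed[k][q], importance[q]."""
    n_vars = len(importance)
    # Weighted sample total of each variable q, used to scale its contribution.
    totals = [sum(w * row[q] for w, row in zip(weights, current)) for q in range(n_vars)]
    return [
        sum(w * importance[q] * abs(row[q] - imp[q]) / totals[q] for q in range(n_vars))
        for w, row, imp in zip(weights, current, imputed)
    ]


unit_scores = lb_scores(
    weights=[2.0, 10.0],
    current=[[100.0, 10.0], [20.0, 5.0]],
    imputed=[[80.0, 10.0], [20.0, 4.0]],
    importance=[1.0, 0.5],
)
threshold = 0.08  # hypothetical value, in practice derived from the ASB analysis
to_follow_up = [k for k, s in enumerate(unit_scores) if s > threshold]
```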

Weight Adjustment, cont.

Select sample $s$ ($n$ units): design weights $w_i$

Portion of sampled units that respond: $s_r$ ($n_r$ units)

Portion of sampled units that does not respond: $s_{\bar{r}}$ ($n_{\bar{r}}$ units)

Adjusting for nonresponse

Two options

1. Weight adjustment

Inverse of response probability

Use of auxiliary data

2. Imputation

Impute for missing values to get a full data matrix

Weight Adjustment

Used to reduce bias due to non-response

Depends on the probability to respond, $\theta_i$

Assumes $\theta_i$ is independent of the variable of interest, $y$

Ignorable non-response: respondents behave the same as non-respondents

If $\theta_i$ is known, then the adjustment is $1/\theta_i$

Unbiased estimator is $\hat{Y} = \sum_{i \in s_r} w_i \theta_i^{-1} y_i$

However, $\theta_i$ is not known

Use estimates $\hat{\theta}_i$ of $\theta_i$: may be biased

If the $\hat{\theta}_i$ are 'good', then estimates are approximately unbiased
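A short derivation shows why weighting by the inverse of a known response probability removes the bias. It assumes that, given the sample $s$, unit $i$ responds with probability $\theta_i$; the response indicator $\delta_i$ and the expectation $E_r$ over the response mechanism are notation introduced here for the illustration.

```latex
E_r\Big(\sum_{i \in s_r} \frac{w_i y_i}{\theta_i} \,\Big|\, s\Big)
  = E_r\Big(\sum_{i \in s} \delta_i \frac{w_i y_i}{\theta_i} \,\Big|\, s\Big)
  = \sum_{i \in s} \theta_i \frac{w_i y_i}{\theta_i}
  = \sum_{i \in s} w_i y_i
```

The right-hand side is the full-sample estimator, which is design-unbiased for the total. When $\theta_i$ is replaced by an estimate, the cancellation is only approximate.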

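Let the true response mechanism be $\theta_k = \Pr(k \in s_r \mid s)$

If missing at random is assumed, the response probability is taken to depend only on the sample and on observed information

Bias for the estimated total: arises when the assumed response probabilities differ from the true $\theta_k$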

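How to estimate (approximate) $\theta_i$?

Auxiliary variables: logistic regression

Auxiliary data (discrete, continuous)

Logistic regression

Define indicator response variable: $\delta_i = 1$ if unit $i$ responds, $0$ otherwise

Probability that unit $i$ responds: $\theta_i = \Pr(\delta_i = 1 \mid \mathbf{z}_i) = [1 + \exp(-\mathbf{z}_i'\boldsymbol{\beta})]^{-1}$

Equivalent to: $\log[\theta_i / (1 - \theta_i)] = \mathbf{z}_i'\boldsymbol{\beta}$

$\mathbf{z}_i = (1, z_{i1}, \dots, z_{ip})'$: auxiliary data

$\boldsymbol{\beta}$: a vector of logistic regression coefficients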

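Logistic regression

Estimate $\boldsymbol{\beta}$ by $\hat{\boldsymbol{\beta}}$, which solves $\sum_{i \in s} w_i \delta_i \mathbf{z}_i = \sum_{i \in s} w_i (1 + e^{-\mathbf{z}_i'\hat{\boldsymbol{\beta}}})^{-1} \mathbf{z}_i$, where $\delta_i$ is the response indicator

Response probability adjusted weight: $\hat{w}_i = w_i / \hat{\theta}_i$ where $\hat{\theta}_i^{-1} = 1 + e^{-\mathbf{z}_i'\hat{\boldsymbol{\beta}}}$

Reweighted estimator: $\hat{Y}_{LR} = \sum_{i \in s_r} \hat{w}_i y_i$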
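The logistic-regression reweighting can be sketched in plain Python. The example below is an illustration only: it assumes a single auxiliary variable x plus an intercept, fits the weighted logistic model by Newton-Raphson, and divides each respondent's design weight by its estimated response probability. The data and the function names are hypothetical.

```python
import math


def fit_logistic(x, responded, weights, n_iter=25):
    """Weighted logistic regression of the response indicator on (1, x) by Newton-Raphson."""
    b0, b1 = 0.0, 0.0
    for _ in range(n_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, di, wi in zip(x, responded, weights):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            g0 += wi * (di - p)          # score for the intercept
            g1 += wi * (di - p) * xi     # score for the slope
            v = wi * p * (1.0 - p)
            h00 += v
            h01 += v * xi
            h11 += v * xi * xi
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system (information matrix) * step = score.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


def reweighted_total(x, responded, weights, y, coef):
    """Sum of w_i / theta_hat_i * y_i over the respondents."""
    b0, b1 = coef
    total = 0.0
    for xi, di, wi, yi in zip(x, responded, weights, y):
        if di:
            theta_hat = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            total += wi / theta_hat * yi
    return total


x = [40, 45, 50, 55, 60, 65, 70, 75, 80, 50, 60, 70]
responded = [0, 0, 1, 0, 1, 1, 0, 1, 1, 0, 1, 1]
weights = [3.0] * 12
y = [None, None, 31.0, None, 40.0, 44.0, None, 52.0, 58.0, None, 39.0, 47.0]

coef = fit_logistic(x, responded, weights)
total_lr = reweighted_total(x, responded, weights, y, coef)
```

Because every estimated probability is below one, each respondent's adjusted weight exceeds its design weight, and units that look like non-respondents on x receive the largest increase.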

Example: Logistic regression

127 sampled businesses

71 businesses respond

Same $\theta$: 0.56 (= 71/127)

[Figure: probability of response against x-values (40 to 80); series: response status, theta hat, mean theta hat]

Example: Logistic regression

[Figure: response against x-values for the two cases, 71 respond and 55 respond]

Example: Logistic regression

127 sampled businesses

55 businesses respond

Same $\theta$: 0.43 (= 55/127)

[Figure: probability of response against x-values (40 to 80); series: response status, theta hat, mean theta hat]

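Discrete (Count Adjustment)

Assume that $\theta_i = \theta$ and $\theta_{ij} = \theta_i \theta_j$ for all $i$ and $j$

That is, everyone has the same probability of response and the probability of response is independent between individuals (Uniform Response Mechanism)

Estimate of $\theta$ is $\hat{\theta} = \sum_{i \in s_r} w_i / \sum_{i \in s} w_i$

Discrete (Count Adjustment)

Non-response adjustment is $1/\hat{\theta} = \sum_{i \in s} w_i / \sum_{i \in s_r} w_i$

Non-response adjusted estimator is $\hat{Y} = \sum_{i \in s_r} w_i y_i \cdot \sum_{i \in s} w_i / \sum_{i \in s_r} w_i$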
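Under a uniform response mechanism the adjustment reduces to a single ratio of weight sums. The sketch below assumes the response probability is estimated by the weighted response rate; the function name and data are hypothetical.

```python
def count_adjusted_total(weights, responded, y):
    """Return (theta_hat, adjusted total) under a uniform response mechanism."""
    # Weighted response rate: respondent weights over all sampled weights.
    theta_hat = sum(w for w, d in zip(weights, responded) if d) / sum(weights)
    adjustment = 1.0 / theta_hat
    total = sum(adjustment * w * yi for w, d, yi in zip(weights, responded, y) if d)
    return theta_hat, total


theta_hat, adjusted_total = count_adjusted_total(
    weights=[10.0, 10.0, 10.0, 10.0],
    responded=[1, 1, 0, 1],
    y=[5.0, 7.0, None, 6.0],
)
```

Here three of four equally weighted units respond, so every respondent weight is inflated by 4/3.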

Continuous (Auxiliary Data)

Suppose we have auxiliary data $x_i$ and the known population total $X$

Estimate $\theta$ by either $\hat{\theta}_1 = \sum_{i \in s_r} w_i x_i / \sum_{i \in s} w_i x_i$ or $\hat{\theta}_2 = \sum_{i \in s_r} w_i x_i / X$

Under a Uniform Response Mechanism (URM), $\hat{\theta}_1$ and $\hat{\theta}_2$ provide approximately unbiased estimates

Continuous (Auxiliary Data)

Note that $\hat{\theta}_1$ leads to a two-phase estimator and $\hat{\theta}_2$ to the well-known ratio estimator

$\hat{\theta}_2$ calibrates to the known total $X$
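The two adjustments based on a continuous auxiliary variable can be compared side by side. The sketch assumes the first estimate of the response probability divides the respondents' weighted x total by the full-sample weighted x total, and the second divides it by the known population total X; names and data are hypothetical.

```python
def adjusted_totals(weights, x, y, responded, X_total):
    """Return (two-phase style estimate, ratio estimate) of the y total."""
    wx_resp = sum(w * xi for w, xi, d in zip(weights, x, responded) if d)
    wx_all = sum(w * xi for w, xi in zip(weights, x))
    wy_resp = sum(w * yi for w, yi, d in zip(weights, y, responded) if d)
    theta1 = wx_resp / wx_all    # needs x for every sampled unit
    theta2 = wx_resp / X_total   # needs only the known population total X
    return wy_resp / theta1, wy_resp / theta2


two_phase, ratio = adjusted_totals(
    weights=[10.0, 10.0, 10.0, 10.0],
    x=[2.0, 4.0, 6.0, 8.0],
    y=[20.0, None, 60.0, 80.0],
    responded=[1, 0, 1, 1],
    X_total=210.0,
)
```

The second estimate equals X times the ratio of the respondents' weighted y and x totals, which is why applying the same adjustment to x reproduces X exactly.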

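Continuous (Auxiliary Data)

If we have marginal totals for 2 auxiliary variables, X and Z, one can use raking

15-30    ?    ?    Z1
30-65    ?    ?    Z2
65+      ?    ?    Z3

Continuous (Auxiliary Data)

Raking assumes that the response probability is the same for all units in cell $(j, k)$, $\theta_{ijk} = \theta_{jk}$, and that it is the product of a row effect and a column effect, $\theta_{jk} = \alpha_j \gamma_k$

Raking is an iterative procedure: rake to one margin, then the other

At convergence, get adjustment so that marginal totals are met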
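Raking can be sketched as iterative proportional fitting on the table of weighted respondent totals. This is a minimal illustration with made-up cell values and margins; the function name and the convergence settings are assumptions.

```python
def rake(table, row_targets, col_targets, n_iter=100, tol=1e-10):
    """Iterative proportional fitting of a 2-D table to known row and column totals."""
    cells = [row[:] for row in table]
    for _ in range(n_iter):
        # Rake to the row margin.
        for j, row in enumerate(cells):
            factor = row_targets[j] / sum(row)
            cells[j] = [c * factor for c in row]
        # Rake to the column margin.
        for k in range(len(col_targets)):
            factor = col_targets[k] / sum(row[k] for row in cells)
            for row in cells:
                row[k] *= factor
        # Stop once the row margin still holds after the column step.
        if all(abs(sum(row) - t) <= tol * t for row, t in zip(cells, row_targets)):
            break
    return cells


respondent_table = [[30.0, 20.0], [25.0, 15.0], [10.0, 20.0]]
raked = rake(respondent_table, row_targets=[60.0, 50.0, 40.0], col_targets=[80.0, 70.0])
# Cell-level adjustment factor applied to the weights of respondents in each cell.
factors = [[r / o for r, o in zip(rr, oo)] for rr, oo in zip(raked, respondent_table)]
```

The two sets of margins must add to the same grand total, otherwise the procedure cannot satisfy both at once.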

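Continuous (Auxiliary Data)

Generalized Regression (GREG) estimator

Weight adjustment is not really an estimate of the response probability

Can show that bias is a function of the response probability and the predictive power of X

Unbiased under URM

Continuous (Auxiliary Data)

Weight adjustment: $a_i = 1 + (\mathbf{X} - \hat{\mathbf{X}}_r)' \left(\sum_{i \in s_r} w_i \mathbf{x}_i \mathbf{x}_i'\right)^{-1} \mathbf{x}_i$, where $\hat{\mathbf{X}}_r = \sum_{i \in s_r} w_i \mathbf{x}_i$

Adjusted estimator: $\hat{Y} = \sum_{i \in s_r} w_i a_i y_i$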
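For a single auxiliary variable the GREG weight adjustment has a closed form that is easy to check. The sketch below assumes a scalar x with no intercept, so the matrix inverse becomes a division; the function name and data are hypothetical.

```python
def greg_adjustments(weights_r, x_r, X_total):
    """Adjustment factor per respondent: 1 + (X - X_hat_r) * x / sum(w * x**2)."""
    X_hat_r = sum(w * xi for w, xi in zip(weights_r, x_r))
    t = sum(w * xi * xi for w, xi in zip(weights_r, x_r))
    return [1.0 + (X_total - X_hat_r) * xi / t for xi in x_r]


w_r = [10.0, 10.0, 10.0]
x_r = [2.0, 6.0, 8.0]
y_r = [20.0, 60.0, 80.0]
a = greg_adjustments(w_r, x_r, X_total=210.0)
y_hat = sum(w * ai * yi for w, ai, yi in zip(w_r, a, y_r))
x_check = sum(w * ai * xi for w, ai, xi in zip(w_r, a, x_r))
```

Applying the adjusted weights to x returns the known total X, which is the calibration property of the estimator. In this example y is exactly proportional to x, so the adjusted estimate equals 10 times X.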

Weighting Classes

Assumption of URM is very strong and somewhat unrealistic

Usually define weighting classes: mutually exclusive and exhaustive groups $C_1, C_2, \dots, C_R$

Assume URM within each class

How to define weighting classes?

Using auxiliary data to group units so that $\theta_i = \theta_r$ within the weighting class $C_r$

Using auxiliary data and logistic regression models: obtain $\hat{\theta}_i$ for all $i$, then form groups so that the $\hat{\theta}_i$ are similar within each group

Weighting Classes

If weighting class variable is good at predicting y and non-response, bias and variance will be reduced

If weighting class variable unrelated to non-response but is good predictor of y, no bias reduction but variance reduced

If weighting class variable unrelated to y, no bias reduction. Variance could increase if weighting class variable good predictor of non-response!
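A weighting-class adjustment applies the uniform-response adjustment separately within each class. The sketch below assumes each record carries a class label, a design weight, a response flag and a value; the names and data are hypothetical.

```python
from collections import defaultdict


def weighting_class_total(records):
    """records: list of (class_label, weight, responded, y)."""
    w_all = defaultdict(float)
    w_resp = defaultdict(float)
    for label, w, responded, _ in records:
        w_all[label] += w
        if responded:
            w_resp[label] += w
    total = 0.0
    for label, w, responded, y in records:
        if responded:
            # Inflate by the inverse of the weighted response rate in the unit's class.
            total += w * (w_all[label] / w_resp[label]) * y
    return total


records = [
    ("small", 10.0, True, 5.0),
    ("small", 10.0, False, None),
    ("large", 2.0, True, 100.0),
    ("large", 2.0, True, 120.0),
    ("large", 2.0, False, None),
]
class_total = weighting_class_total(records)
```

A class with no respondents would need to be collapsed with another class before this adjustment can be applied.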

Imputation

Usually used for item non-response

Can be used for unit non-response also

Several methods available

Deductive imputation

Class mean imputation

Cold-deck imputation (earlier survey/ historical)

Imputation

Hot-deck imputation (current survey)

Random overall imputation

Random imputation classes

Sequential hot deck

Distance function matching

Regression imputation

Simplest example is ratio

Imputation, cont.

For business surveys, most commonly used methods involve auxiliary data

Historical data

If data available from previous time period, use it with a trend (last month / last year)

If none available, use a mean imputation

Administrative data (i.e. tax)

Use tax data with or without an adjustment

At Statistics Canada, annual tax data used to directly replace and monthly tax data adjusted before use
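Historical imputation with a trend, falling back to the mean, can be sketched as follows. The sketch assumes the trend is the ratio of current to previous totals among units that reported in both periods, which is one common choice and not necessarily the one used in production; names and data are hypothetical.

```python
def impute_with_trend(current, previous):
    """current[i] is None for non-respondents; previous[i] is last period's value or None."""
    pairs = [(c, p) for c, p in zip(current, previous) if c is not None and p is not None]
    # Trend estimated from units that reported in both periods.
    trend = sum(c for c, _ in pairs) / sum(p for _, p in pairs)
    reported = [c for c in current if c is not None]
    mean = sum(reported) / len(reported)
    return [
        c if c is not None else (trend * p if p is not None else mean)
        for c, p in zip(current, previous)
    ]


imputed_values = impute_with_trend(
    current=[110.0, None, 220.0, None],
    previous=[100.0, 50.0, 200.0, None],
)
```

The second unit receives its previous value moved by the 10% trend, while the fourth unit, which has no history, receives the respondent mean.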

Summary

Reduce non-response at front-end

Frame

Contact vehicle

Editing

Measure non-response

Follow-up selectively and representatively

Adjust for non-response

Model (Weighting / imputing / Logistic Regression)

Homogeneous classes

References

Bethlehem, J.G. (1988). Reduction of Nonresponse Bias through Regression Estimation. Journal of Official Statistics, Vol. 4, No. 3, 251-260.

Cochran, W.G. (1977). Sampling Techniques. Third Edition, Wiley, New York.

Cornish, J. (2004). Response Problems in Surveys: Improving Response and Minimising the Load for UNSD. Regional Seminar on 'Good Practices in the Organization and Management of Statistical Systems' for ASEAN countries, Yangon, Myanmar, 11-13 December 2002.

De Leeuw, Edith D. (ed.) (1999). Special Issue on Survey Nonresponse. Journal of Official Statistics, 15, 2.

Dillman, D.A. Procedures for Conducting Government-Sponsored Establishment Surveys: Comparisons of the Total Design Method (TDM), a Traditional Cost-Compensation Model, and Tailored Design. Washington State University.

Ekholm, A. and Laaksonen, S. (1991). Weighting via Response Modeling in the Finnish Household Budget Survey. Journal of Official Statistics, 7, 325–337.

Elliott, M.R., Little, R.J.A., and Lewitzky, S. (2000). Subsampling Callbacks to Improve Survey Efficiency. Journal of the American Statistical Association, 95, 730–738.

Groves, R.M., Dillman, D.A., Eltinge, J.L., and Little, R.J.A. (eds.) (2002). Survey Nonresponse. Chichester: Wiley.

Hansen, M.H., and Hurwitz, W.N. (1946). The Problem of Nonresponse in Sample Surveys. Journal of the American Statistical Association, 41, 517–529.

Hedlin, D. (2003). Score Functions to Reduce Business Survey Editing at the U.K. Office for National Statistics. Journal of Official Statistics, Vol. 19, No. 2, 177-199.

Hidiroglou, M.A., Drew, D.J., and Gray, G.B. (June 1993). A Framework for Measuring and Reducing Nonresponse in Surveys. Survey Methodology, 19, 81-94.

International Conference on Survey Nonresponse (1999). http://jpsm.umd.edu/icsn/papers/Index.htm.

Kalton, G. and Flores-Cervantes, I. (2003). Weighting Methods. Journal of Official Statistics, Vol. 19, No. 2, pp. 81-

Laaksonen, S. and Chambers, R. (2006). Survey Estimation under Informative Nonresponse with Follow-up. Journal of Official Statistics, Vol. 22, No. 1, 81–95.

Latouche, M. and Berthelot, J.-M. (1992). Use of a Score Function to Prioritize and Limit Recontacts in Editing Business Surveys. Journal of Official Statistics, Vol. 8, No. 3, 389-400.

Lawrence, D. and McKenzie, R. (2000). The General Application of Significance Editing. Journal of Official Statistics, Vol. 16, No. 3, 243-253.

Little, R. (1986). Survey Nonresponse Adjustments for Estimates of Means. International Statistical Review, 54, 139–157.

Lundström, S. and Särndal, C.-E. (1999). Calibration as a Standard Method for Treatment of Nonresponse. Journal of Official Statistics, Vol. 15, No. 2, 305-327.

Lynn, P. and Clarke, P. (2002). Separating Refusal Bias and Non-contact Bias: Evidence from UK National Surveys. The Statistician, 51, Part 3, 319-333.

Madow, W.G., Nisselson, H., and Olkin, I. (eds.) (1983). Incomplete Data in Sample Surveys. Vol. 1: Report and Case Studies. Academic Press, New York.

McKenzie, R. (2000). A Framework for Priority Contact of Non Respondents. In the Proceedings of The Second International Conference on Establishment Surveys, Buffalo, New York, 473-482.

Rao, J.N.K. (1973). On Double Sampling for Stratification and Analytical Surveys. Biometrika, Vol. 60, No. 1, 125-133.

Särndal, C.-E. and Swensson, B. (1987). A General View of Estimation for Two Phases of Selection with Applications to Two-Phase Sampling and Nonresponse. International Statistical Review, 55, 279–294.

Strauss, E.E., and Hidiroglou, M.A. (1984). A Follow-up Procedure for Business Census Type Surveys. In Topics in Applied Statistics, Y.P. Chaubey and T.D. Dwivedi (eds.), 447-453. Published by Concordia University, Montréal.

Valliant, R. (2004). The Effect of Multiple Weighting Steps on Variance Estimation. Journal of Official Statistics, Vol. 20, No. 1, 1-18.

Wang, J.E. (2004). Non-response in the Norwegian Business Tendency Survey. Statistics Norway, Department of Economic Statistics.

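Score Function, cont.

No follow-up on occasion $t-a$:

$\hat{Y}_{NO\_FU}(t-a) = \sum_{j \in RESP} w_j(t-a)\, y_j(t-a) + \sum_{j \in \overline{RESP}} w_j(t-a)\, y_j^{IMP}(t-a)$

Partial follow-up on occasion $t-a$:

$\hat{Y}_{PART\_FU}(t-a) = \sum_{j \in RESP} w_j(t-a)\, y_j(t-a) + \sum_{j \,\text{followed up}} w_j(t-a)\, y_j(t-a) + \sum_{j \,\text{not followed up}} w_j(t-a)\, y_j^{IMP}(t-a)$

Full follow-up on occasion $t-a$:

$\hat{Y}_{FULL\_FU}(t-a) = \sum_{j \in RESP_{FULL}} w_j(t-a)\, y_j(t-a)$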
