A new version of the CALMAR calibration adjustment program
1
A new version of the CALMAR calibration adjustment program
2
The CALMAR2 macros
3
I.1. Background
CALMAR = CALibration on MARgins
CALMAR 1 = SAS macro program, written in 1992-1993 at France’s INSEE by Sautory
Scope: implementing the calibration methods developed by Deville & Särndal (JASA, 1992)
CALMAR 2 = SAS macro, written in 2000 at France’s INSEE
Scope: implementing the generalized calibration method for handling total non-response (Deville, 1998)
4
I.2. What’s new in CALMAR 2
• Simultaneous calibration at 2 or 3 levels
• Total non-response adjustment using generalized calibration
• Handling of collinearities between auxiliary variables
• A 5th distance function: generalized hyperbolic sine
• Interactive screens for entering parameters, thanks to CALMAR2_GUIDE
5
Simultaneous calibration
6
II.1. The method
Information is collected at several levels of observation:
• households + every member of each household
  or: firms + every establishment of each firm
  i.e. a cluster sampling survey, including questions about the clusters
• households + some of their members (Kish individuals)
  i.e. two-stage sampling, including questions about the primary units (P.U.)
• households + every member of each household + Kish units
+ auxiliary information available at every level
7
How should calibration be performed?
• Independent calibration at every level of observation
• Simultaneous (or "integrated") calibration:
- same weights for all members of a household
- consistency between statistics obtained from the various data files
Simultaneous calibration method
A single calibration is performed at the P.U. level, after computing, for each P.U., the totals of the calibration variables defined at the secondary levels (Sautory, 1996)
8
II.2. An example
• households (sample $s_M$)
• all the members of the selected households (sample $s_I$)
• one member (Kish individual $k_m$) in each selected household m, chosen by simple random sampling among the $e_m$ eligible members of the household (sample $s_K$)
Weight of the household m: $d_m = 1/\pi_m$
Weight of the member i of the household m: $d_{m,i} = d_m$ for every member i of household m
Weight of the Kish individual of the household m: $d_{k_m} = d_m e_m$
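The three design weights of this example can be sketched in a few lines of Python. This is only an illustration: the `Household` class, its field names and the numeric values are hypothetical, and `n_eligible` plays the role of $e_m$ (one Kish individual drawn by simple random sampling among the eligible members).

```python
from dataclasses import dataclass


@dataclass
class Household:
    pi: float         # inclusion probability of the household (pi_m)
    n_members: int    # number of members, all surveyed (sample s_I)
    n_eligible: int   # e_m: members eligible for the Kish selection


def design_weights(h: Household):
    """Return (d_m, list of member weights d_{m,i}, Kish weight d_{k_m})."""
    d_m = 1.0 / h.pi                  # household weight
    d_members = [d_m] * h.n_members   # every member inherits the household weight
    d_kish = d_m * h.n_eligible       # Kish weight = household weight times e_m
    return d_m, d_members, d_kish


# Hypothetical household: selected with probability 1/4, 4 members, 3 eligible.
d_m, d_members, d_kish = design_weights(Household(pi=0.25, n_members=4, n_eligible=3))
print(d_m, d_members, d_kish)
```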
9
Auxiliary information
$x_m$ = auxiliary variables vector for each household m in $s_M$
$X = \sum_{m \in U_M} x_m$ = vector of the known auxiliary variables totals in the households population $U_M$
$z_{m,i}$ = auxiliary variables vector for each individual (m,i) in $s_I$
$Z = \sum_{i \in U_I} z_i$ = vector of the known totals in the individuals population $U_I$
$v_{k_m}$ = auxiliary variables vector for the Kish individual $k_m$ in $s_K$
$V = \sum_{i \in U_I^e} v_i$ = vector of the known totals in the Kish-units population $U_I^e$
10
For each household m we compute:
• the totals of the individual variables: $z_m = \sum_{i \in \text{household } m} z_{m,i}$
• the estimated totals of the Kish-individual variables: $v_m = e_m v_{k_m}$
Vector of the calibration variables for the household m: $(x_m, z_m, v_m)$
Vector of the totals: (X, Z, V)
Calibration equations:
$\sum_{m \in s_M} d_m F(x_m'\lambda + z_m'\mu + v_m'\gamma)\,(x_m, z_m, v_m) = (X, Z, V)$
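The reduction to a single household-level calibration can be sketched as follows. This is a minimal pure-Python illustration, not the CALMAR 2 code: it assumes the linear method, i.e. $F(u) = 1 + u$, and the helper names (`solve`, `linear_calibration`) and all data values are hypothetical. Each household record carries its weight $d_m$, its own variables $x_m$, one $z_{m,i}$ vector per member, the number of eligible members $e_m$ and the Kish vector $v_{k_m}$.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [u - f * v for u, v in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def linear_calibration(d, c, totals):
    """Weights w_m = d_m (1 + c_m' lam) such that sum_m w_m c_m = totals."""
    p = len(totals)
    a = [[sum(dm * cm[i] * cm[j] for dm, cm in zip(d, c)) for j in range(p)]
         for i in range(p)]
    gap = [totals[i] - sum(dm * cm[i] for dm, cm in zip(d, c)) for i in range(p)]
    lam = solve(a, gap)
    return [dm * (1.0 + sum(l * ci for l, ci in zip(lam, cm)))
            for dm, cm in zip(d, c)]


# Hypothetical data: (d_m, x_m, [z_{m,i} for each member], e_m, v_{k_m})
households = [
    (10.0, [1.0], [[1.0], [1.0]], 2, [1.0]),
    (10.0, [1.0], [[1.0], [1.0], [1.0]], 2, [0.0]),
    (10.0, [1.0], [[1.0]], 1, [1.0]),
    (10.0, [1.0], [[1.0], [1.0], [1.0], [1.0]], 3, [1.0]),
    (10.0, [1.0], [[1.0], [1.0]], 1, [0.0]),
]
totals = [52.0, 125.0, 65.0]  # hypothetical (X, Z, V)

d, c = [], []
for d_m, x_m, z_members, e_m, v_kish in households:
    z_m = [sum(col) for col in zip(*z_members)]  # household total of z_{m,i}
    v_m = [e_m * v for v in v_kish]              # estimated household total of v
    d.append(d_m)
    c.append(x_m + z_m + v_m)                    # stacked vector (x_m, z_m, v_m)

w = linear_calibration(d, c, totals)

# Check the three levels with the derived weights w_{m,i} = w_m and w_{k_m} = w_m e_m.
household_total = sum(w)
individual_total = sum(w_m * len(h[2]) for w_m, h in zip(w, households))
kish_total = sum(w_m * h[3] * h[4][0] for w_m, h in zip(w, households))
print(household_total, individual_total, kish_total)
```

Only one system is solved, at the household level; the individual and Kish files are calibrated as a by-product because their weights are derived from $w_m$.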
11
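Weights
$w_m$ = weight of the household m in $s_M$
$w_{m,i} = w_m$ = weight of the individual (m,i) of the household m in $s_I$
$w_{k_m} = w_m e_m$ = weight of the Kish individual of the household m in $s_K$
The 3 samples are correctly calibrated on the totals X, Z and V:
$\sum_{m \in s_M} w_m x_m = X$
$\sum_{(m,i) \in s_I} w_{m,i} z_{m,i} = \sum_{m \in s_M} w_m \sum_{i \in \text{household } m} z_{m,i} = \sum_{m \in s_M} w_m z_m = Z$
$\sum_{k_m \in s_K} w_{k_m} v_{k_m} = \sum_{m \in s_M} w_m e_m v_{k_m} = \sum_{m \in s_M} w_m v_m = V$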
12
The user must provide the input tables for the various levels (sample data files and calibration totals files): the program performs all the operations required to reduce the process to a single calibration, and creates the various calibrated-weight files.
Calmar 2 performs such simultaneous calibrations.
13
An example of simultaneous calibration
14
The survey
• Sampling design: two-stage sampling
  – primary units = households, selected by stratified sampling with S.R.S. within each stratum
  – secondary units (Kish units) = one member per selected household, drawn by S.R.S. among the members over 14 years old
• Questionnaire
  – variables of interest are measured on Kish units
  – questions about the dwelling and the whole family
  – questions about each member of the household (age, sex, profession)
• Calibration variables (xk)
  – Households: household size + professional group of the head of household + strata (~ agglomeration size)
  – All individuals: sex + age group
  – Kish individuals: sex + age group
• Population totals (X) come from the sampling frame
15
The program
16
%CALMAR2 (datamen=base.echant_menages,
          marmen=base.marge_men,
          poids=poids1,
          ident=ident,
          dataind=base.echant_indiv,
          marind=base.marge_ind,
          ident2=id,
          datakish=base.echant_kish,
          markish=base.marge_kish,
          poidkish=nbelig,
          m=1,
          datapoi=poidsmen,
          datapoi2=poidsind,
          datapoi3=poidskish,
          poidsfin=w3,
          labelpoi=calage 3 niveaux,
          poidskishfin=w3k,
          labelpoikish=poids kish total,
          edition=3)
17
The output
18
**********************************
***     MACRO PARAMETERS       ***
**********************************
INPUT TABLE(S):
  LEVEL 1 DATA TABLE            DATAMEN  = BASE.ECHANT_MENAGES
  LEVEL 1 IDENTIFIER            IDENT    = IDENT
  LEVEL 2 DATA TABLE            DATAIND  = BASE.ECHANT_INDIV
  LEVEL 2 IDENTIFIER            IDENT2   = ID
  KISH INDIVIDUALS TABLE        DATAKISH = BASE.ECHANT_KISH
  INITIAL WEIGHT                POIDS    = POIDS1
  SCALE FACTOR                  ECHELLE  = 1
  QK WEIGHT                     PONDQK   = __UN
  KISH WEIGHT                   POIDKISH = NBELIG
MARGINS TABLE(S):
  LEVEL 1                       MARMEN   = BASE.MARGE_MEN
  LEVEL 2                       MARIND   = BASE.MARGE_IND
  KISH LEVEL                    MARKISH  = BASE.MARGE_KISH
  MARGINS AS PERCENTAGES        PCT      = NON
  POPULATION SIZE:
    LEVEL 1 UNITS               POPMEN   =
    LEVEL 2 UNITS               POPIND   =
    KISH UNITS                  POPKISH  =
19
METHOD USED                                 M            = 1
LOWER BOUND                                 LO           =
UPPER BOUND                                 UP           =
HYPERBOLIC SINE COEFFICIENT                 ALPHA        = 1
STOPPING THRESHOLD                          SEUIL        = 0.0001
MAXIMUM NUMBER OF ITERATIONS                MAXITER      = 15
HANDLING OF COLLINEARITIES                  COLIN        = NON
TABLE(S) CONTAINING THE FINAL WEIGHT:
  LEVEL 1                                   DATAPOI      = POIDSMEN
  LEVEL 2                                   DATAPOI2     = POIDSIND
  KISH LEVEL                                DATAPOI3     = POIDSKISH
UPDATE OF TABLE(S) DATAPOI(2)(3)            MISAJOUR     = OUI
FINAL WEIGHT                                POIDSFIN     = W3
LABEL OF THE FINAL WEIGHT                   LABELPOI     = CALAGE 3 NIVEAUX
FINAL WEIGHT OF THE KISH UNITS              POIDSKISHFIN = W3K
LABEL OF THE KISH WEIGHT                    LABELPOIKISH = POIDS KISH TOTAL
CONTENTS OF TABLE(S) DATAPOI(2)(3)          CONTPOI      = OUI
PRINTING OF RESULTS                         EDITION      = 3
PRINTING OF WEIGHTS                         EDITPOI      = NON
STATISTICS ON THE WEIGHTS                   STAT         = OUI
CHECKS                                      CONT         = OUI
TABLE CONTAINING THE DROPPED OBSERVATIONS   OBSELI       = NON
SAS NOTES                                   NOTES        = NON
20
COMPARISON BETWEEN THE MARGINS DERIVED FROM THE SAMPLE (INITIAL WEIGHTS)
AND THE MARGINS IN THE POPULATION (CALIBRATION MARGINS)
VARIABLE  CATEGORY   SAMPLE MARGIN  POPULATION MARGIN  SAMPLE %  POPULATION %
NBIND     01               1525.60               1539     26.30         26.53
          02               1914.37               1860     33.00         32.06
          03                797.71               1000     13.75         17.24
          04                930.78                885     16.05         15.26
          05                365.18                361      6.30          6.22
          06                267.36                156      4.61          2.69
PCSPR     1                  80.70                124      1.39          2.14
          2                 191.78                290      3.31          5.00
          3                 822.81                624     14.18         10.76
          4                 832.34                870     14.35         15.00
          5                 569.41                682      9.82         11.76
          6                1279.53               1237     22.06         21.32
          7                1839.32               1831     31.71         31.56
          8                 185.11                143      3.19          2.47
STRATE    0                1453.00               1453     25.05         25.05
          1                 966.00                966     16.65         16.65
          2                 805.00                805     13.88         13.88
          3                1689.00               1689     29.12         29.12
          4                 888.00                888     15.31         15.31
21
VARIABLE  CATEGORY   SAMPLE MARGIN  POPULATION MARGIN  SAMPLE %  POPULATION %
AGE       00-14 ans        3245.32               2857     21.46         19.52
          15-24 ans        2217.86               2044     14.67         13.96
          25-59 ans        6699.70               6800     44.31         46.45
          60- ? ans        2957.50               2939     19.56         20.08
SEXE      1                7546.69               7108     49.91         48.55
          2                7573.69               7532     50.09         51.45
AGEK      A15              2155.94               2044     18.28         17.35
          A25              6752.61               6800     57.25         57.71
          A60              2885.84               2939     24.47         24.94
SEXEK     1                5596.30               5673     47.45         48.15
          2                6198.09               6110     52.55         51.85
22
METHOD: LINEAR
FIRST SUMMARY TABLE OF THE ALGORITHM
VALUE OF THE STOPPING CRITERION AND NUMBER OF NEGATIVE WEIGHTS AFTER EACH ITERATION
ITERATION   STOPPING CRITERION   NEGATIVE WEIGHTS
1                      1.31960                  1
2                      0.00000                  1
23
METHOD: LINEAR
SECOND SUMMARY TABLE OF THE ALGORITHM
COEFFICIENTS OF THE LAMBDA VECTOR OF LAGRANGE MULTIPLIERS AFTER EACH ITERATION
VARIABLE  CATEGORY    LAMBDA1    LAMBDA2
NBIND     01         -0.15325   -0.15325
NBIND     02         -0.24295   -0.24295
NBIND     03          0.00562    0.00562
NBIND     04         -0.17355   -0.17355
NBIND     05         -0.00502   -0.00502
NBIND     06         -0.44773   -0.44773
PCSPR     1           0.92036    0.92036
PCSPR     2           0.50376    0.50376
PCSPR     3          -0.18514   -0.18514
PCSPR     4           0.15354    0.15354
PCSPR     5           0.36019    0.36019
PCSPR     6           0.08424    0.08424
PCSPR     7           0.16042    0.16042
PCSPR     8                 .          .
24
VARIABLE  CATEGORY    LAMBDA1    LAMBDA2
STRATE    0          -0.14172   -0.14172
STRATE    1          -0.07338   -0.07338
STRATE    2          -0.12634   -0.12634
STRATE    3          -0.03106   -0.03106
STRATE    4                 .          .
AGE       00-14 ans  -0.03549   -0.03549
AGE       15-24 ans  -0.65576   -0.65576
AGE       25-59 ans  -0.52872   -0.52872
AGE       60- ? ans  -0.64430   -0.64430
SEXE      1          -0.08395   -0.08395
SEXE      2                 .          .
AGEK      A15         0.67198    0.67198
AGEK      A25         0.68366    0.68366
AGEK      A60         0.74262    0.74262
SEXEK     1           0.01727    0.01727
SEXEK     2                 .          .
25
COMPARISON BETWEEN THE FINAL MARGINS IN THE SAMPLE (WITH THE FINAL WEIGHTS)
AND THE MARGINS IN THE POPULATION (CALIBRATION MARGINS)
VARIABLE  CATEGORY   SAMPLE MARGIN  POPULATION MARGIN  SAMPLE %  POPULATION %
NBIND     01                  1539               1539     26.53         26.53
          02                  1860               1860     32.06         32.06
          03                  1000               1000     17.24         17.24
          04                   885                885     15.26         15.26
          05                   361                361      6.22          6.22
          06                   156                156      2.69          2.69
PCSPR     1                    124                124      2.14          2.14
          2                    290                290      5.00          5.00
          3                    624                624     10.76         10.76
          4                    870                870     15.00         15.00
          5                    682                682     11.76         11.76
          6                   1237               1237     21.32         21.32
          7                   1831               1831     31.56         31.56
          8                    143                143      2.47          2.47
STRATE    0                   1453               1453     25.05         25.05
          1                    966                966     16.65         16.65
          2                    805                805     13.88         13.88
          3                   1689               1689     29.12         29.12
          4                    888                888     15.31         15.31
26
VARIABLE  CATEGORY   SAMPLE MARGIN  POPULATION MARGIN  SAMPLE %  POPULATION %
AGE       00-14 ans           2857               2857     19.52         19.52
          15-24 ans           2044               2044     13.96         13.96
          25-59 ans           6800               6800     46.45         46.45
          60- ? ans           2939               2939     20.08         20.08
SEXE      1                   7108               7108     48.55         48.55
          2                   7532               7532     51.45         51.45
AGEK      A15                 2044               2044     17.35         17.35
          A25                 6800               6800     57.71         57.71
          A60                 2939               2939     24.94         24.94
SEXEK     1                   5673               5673     48.15         48.15
          2                   6110               6110     51.85         51.85
27
STATISTICS ON THE WEIGHT RATIOS (= FINAL WEIGHTS / INITIAL WEIGHTS)
AND ON THE FINAL WEIGHTS
The UNIVARIATE Procedure
Variable: _F_ (WEIGHT RATIO)
Basic Statistical Measures
Location              Variability
Mean     1.000000     Std Deviation          0.24564
Median   0.996533     Variance               0.06034
Mode     0.991339     Range                  2.32886
                      Interquartile Range    0.21258
Quantiles (Definition 5)
Quantile      Estimate
100% Max      2.009262
99%           1.745002
95%           1.377982
90%           1.278637
75% Q3        1.105492
50% Median    0.996533
25% Q1        0.892917
10%           0.749877
5%            0.613091
1%            0.251528
0% Min       -0.319601
Extreme Observations
-------------Lowest-------------    ------------Highest-----------
Value        IDENT        Obs       Value      IDENT        Obs
-0.3196012   1163032100    27       1.76397    5363019600   293
 0.0374385   7363016270   365       1.79618    7463000450   381
 0.1498661   1169040310    73       1.85813    2369004180   129
 0.1872096   7269001420   348       1.97094    5463007950   326
 0.2314417   7363017990   366       2.00926    5263016110   268
28
STATISTICS ON THE WEIGHT RATIOS (= FINAL WEIGHTS / INITIAL WEIGHTS)
AND ON THE FINAL WEIGHTS
The UNIVARIATE Procedure
Variable: _F_ (WEIGHT RATIO)
      Histogram                                      #   Boxplot
 2.05+*                                              1      *
     .*                                              1      *
     .*                                              1      *
     .*                                              3      0
     .*                                              3      0
     .**                                             5      0
     .***                                            7      0
     .*********                                     26      |
     .*********                                     27      |
     .********************                          59   +-----+
     .*************************************        110   |  +  |
     .*******************************************  128   *-----*
 0.85+*******************                           57   +-----+
     .***********                                   33      |
     .******                                        17      |
     .***                                            8      0
     .**                                             5      0
     .*                                              3      0
     .*                                              2      0
     .*                                              2      *
     .*                                              1      *
     .
     .
     .
-0.35+*                                              1      *
      ----+----+----+----+----+----+----+----+---
      * may represent up to 3 counts
29
STATISTICS ON THE WEIGHT RATIOS (= FINAL WEIGHTS / INITIAL WEIGHTS)
AND ON THE FINAL WEIGHTS
The UNIVARIATE Procedure
Variable: __WFIN (FINAL WEIGHT)
Basic Statistical Measures
Location              Variability
Mean     11.60200     Std Deviation           4.62597
Median   10.11949     Variance               21.39957
Mode      9.57633     Range                  32.03263
                      Interquartile Range     5.70090
Quantiles (Definition 5)
Quantile      Estimate
100% Max      29.19457
99%           25.69548
95%           20.11085
90%           18.04434
75% Q3        13.98763
50% Median    10.11949
25% Q1         8.28672
10%            7.15056
5%             6.41373
1%             2.50660
0% Min        -2.83806
Extreme Observations
-------------Lowest------------     ------------Highest-----------
Value        IDENT        Obs       Value      IDENT        Obs
-2.838058    1163032100    27       25.7604    5369016540   317
 0.543982    7363016270   365       26.0985    7463000450   381
 1.330811    1169040310    73       28.6378    5463007950   326
 1.808444    7269001420   348       28.6643    8269018030   421
 2.235727    7363017990   366       29.1946    5263016110   268
30
STATISTICS ON THE WEIGHT RATIOS (= FINAL WEIGHTS / INITIAL WEIGHTS)
AND ON THE FINAL WEIGHTS
The UNIVARIATE Procedure
Variable: __WFIN (FINAL WEIGHT)
    Histogram                                        #   Boxplot
 29+*                                                3      0
   .*                                                1      0
   .**                                               4      0
   .***                                              8      0
   .****                                            11      |
   .*********                                       25      |
   .**************                                  41      |
   .***********                                     32      |
 13+***********************                         67   +-----+
   .**********************                          64   *--+--*
   .*********************************************  134   +-----+
   .******************************                  88      |
   .*****                                           14      |
   .**                                               4      |
   .*                                                3      |
   .
 -3+*                                                1      0
    ----+----+----+----+----+----+----+----+----+
    * may represent up to 3 counts
31
METHOD: LINEAR
MEAN WEIGHT RATIOS (FINAL WEIGHTS / INITIAL WEIGHTS)
FOR EACH VALUE OF THE VARIABLES
VARIABLE  CATEGORY  NUMBER OF LEVEL 1 OBSERVATIONS  WEIGHT RATIO
NBIND     01                                   133       1.00152
NBIND     02                                   167       0.97304
NBIND     03                                    69       1.24647
NBIND     04                                    79       0.95151
NBIND     05                                    31       0.99271
NBIND     06                                    21       0.58818
PCSPR     1                                      6       1.55064
PCSPR     2                                     15       1.52001
PCSPR     3                                     73       0.76645
PCSPR     4                                     73       1.04281
PCSPR     5                                     51       1.20566
PCSPR     6                                    111       0.96902
PCSPR     7                                    157       0.99429
PCSPR     8                                     14       0.76191
STRATE    0                                    100       1.00000
STRATE    1                                    100       1.00000
STRATE    2                                    100       1.00000
STRATE    3                                    100       1.00000
STRATE    4                                    100       1.00000
ALL                                            500       1.00000
32
METHOD: LINEAR
MEAN WEIGHT RATIOS (FINAL WEIGHTS / INITIAL WEIGHTS)
FOR EACH VALUE OF THE VARIABLES
VARIABLE  CATEGORY  NUMBER OF LEVEL 2 OBSERVATIONS  WEIGHT RATIO
AGE       00-14 an                             274       0.88664
AGE       15-24 an                             184       0.93210
AGE       25-59 an                             581       1.01758
AGE       60- ? an                             249       0.99088
SEXE      1                                    640       0.94443
SEXE      2                                    648       0.99993
ALL                                           1288       0.97235
VARIABLE  CATEGORY  NUMBER OF KISH INDIVIDUALS      WEIGHT RATIO
AGEK      A15                                   66       0.95043
AGEK      A25                                  283       1.01108
AGEK      A60                                  151       1.00090
SEXEK     1                                    232       0.98540
SEXEK     2                                    268       1.01264
ALL                                            500       1.00000
33
METHOD: LINEAR
CONTENTS OF THE TABLE poidsmen CONTAINING THE NEW WEIGHT w3
The CONTENTS Procedure
#  Variable  Type  Len  Pos  Label
1  IDENT     Char   10    8
2  w3        Num     8    0  calage 3 niveaux
CONTENTS OF THE TABLE poidsind CONTAINING THE NEW WEIGHT w3
#  Variable  Type  Len  Pos  Label
2  IDENT     Char   10   20
1  id        Char   12    8
3  w3        Num     8    0  calage 3 niveaux
CONTENTS OF THE TABLE poidskish CONTAINING THE NEW WEIGHT w3
#  Variable  Type  Len  Pos  Label
2  ID        Char   12   26
1  IDENT     Char   10   16
3  w3        Num     8    0  calage 3 niveaux
4  w3k       Num     8    8  poids kish total
34
*********************
***    SUMMARY    ***
*********************
*
* DATE: 24 AUGUST 2005    TIME: 11:12
*
* *************************************
* INPUT TABLE: BASE.ECHANT_MENAGES
* *************************************
*
* NUMBER OF OBSERVATIONS IN THE INPUT TABLE: 500
* NUMBER OF OBSERVATIONS DROPPED: 0
* NUMBER OF OBSERVATIONS KEPT: 500
*
* WEIGHT VARIABLE: POIDS1
*
* NUMBER OF CATEGORICAL VARIABLES: 3
* LIST OF THE CATEGORICAL VARIABLES AND THEIR NUMBERS OF CATEGORIES:
*   nbind (6) pcspr (8) strate (5)
*
* SUM OF THE INITIAL WEIGHTS: 5801
* POPULATION SIZE: 5801
*
* ***********************************
* INPUT TABLE: BASE.ECHANT_INDIV
* ***********************************
*
* NUMBER OF OBSERVATIONS IN THE INPUT TABLE: 1288
* NUMBER OF OBSERVATIONS DROPPED: 0
* NUMBER OF OBSERVATIONS KEPT: 1288
*
* NUMBER OF CATEGORICAL VARIABLES: 2
* LIST OF THE CATEGORICAL VARIABLES AND THEIR NUMBERS OF CATEGORIES:
*   age (4) sexe (2)
*
* SUM OF THE INITIAL WEIGHTS: 15120
* POPULATION SIZE: 14640
*
35
* ***********************************
* INPUT TABLE: BASE.ECHANT_KISH
* ***********************************
*
* NUMBER OF OBSERVATIONS IN THE INPUT TABLE: 500
* NUMBER OF OBSERVATIONS DROPPED: 0
* NUMBER OF OBSERVATIONS KEPT: 500
*
* CONDITIONAL WEIGHT VARIABLE: NBELIG
* MAXIMUM NUMBER OF SECONDARY UNITS PER P.U.: 1
*
* NUMBER OF CATEGORICAL VARIABLES: 2
* LIST OF THE CATEGORICAL VARIABLES AND THEIR NUMBERS OF CATEGORIES:
*   agek (3) sexek (2)
*
* SUM OF THE INITIAL WEIGHTS: 11794
* POPULATION SIZE: 11783
*
* METHOD USED: LINEAR
* THE CALIBRATION WAS COMPLETED IN 2 ITERATIONS
* THERE IS 1 NEGATIVE WEIGHT
* THE WEIGHTS HAVE BEEN STORED IN THE VARIABLE W3 OF THE TABLE POIDSMEN
*   AND OF THE TABLE POIDSIND
*   AND OF THE TABLE POIDSKISH
* THE WEIGHTS OF THE KISH UNITS HAVE BEEN STORED IN THE VARIABLE W3K
*   OF THE TABLE POIDSKISH
36
Handling total non-response with generalized calibration
37
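III.1. Generalized calibration
Calibration functions: $F(z_k'\lambda)$, such as $F(0) = 1$
where $z_k$: vector of p variables known in s
      $\lambda$: vector of p adjustment parameters
Calibration equations: $\sum_{k \in s} d_k F(z_k'\lambda)\, x_k = X$
Solving for $\lambda$ gives the weights $w_k = d_k F(z_k'\lambda)$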
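A minimal pure-Python sketch of these equations, assuming the linear function $F(u) = 1 + u$ so that $\lambda$ has a closed form: $\lambda = (\sum_s d_k x_k z_k')^{-1}(X - \sum_s d_k x_k)$. The function names and the data are hypothetical and are not taken from CALMAR 2. The instruments $z_k$ differ from the calibration variables $x_k$, which is what distinguishes generalized from conventional calibration.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [u - f * v for u, v in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def generalized_linear_calibration(d, x, z, totals):
    """Weights w_k = d_k (1 + z_k' lam) such that sum_k w_k x_k = totals."""
    p = len(totals)
    # a[i][j] = sum_k d_k x_k[i] z_k[j]
    a = [[sum(dk * xk[i] * zk[j] for dk, xk, zk in zip(d, x, z)) for j in range(p)]
         for i in range(p)]
    gap = [totals[i] - sum(dk * xk[i] for dk, xk in zip(d, x)) for i in range(p)]
    lam = solve(a, gap)
    return [dk * (1.0 + sum(l * zj for l, zj in zip(lam, zk)))
            for dk, zk in zip(d, z)]


# Hypothetical sample of 4 units; x_k and z_k have the same dimension (2).
d = [5.0, 5.0, 5.0, 5.0]
x = [[1.0, 2.0], [1.0, 3.0], [1.0, 5.0], [1.0, 4.0]]   # calibration variables
z = [[1.0, 1.0], [1.0, 4.0], [1.0, 6.0], [1.0, 3.0]]   # instruments
totals = [22.0, 80.0]                                   # known totals X

w = generalized_linear_calibration(d, x, z, totals)
calibrated = [sum(wk * xk[i] for wk, xk in zip(w, x)) for i in range(2)]
print(w, calibrated)
```

For the other distance functions no closed form exists and $\lambda$ has to be found iteratively, which this sketch does not cover.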
38
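Basic result
$\hat{Y}_w = \sum_{k \in s} w_k y_k$ is asymptotically equivalent to $\hat{Y}_{ireg} = \hat{Y}_{HT} + (X - \hat{X}_{HT})' \hat{B}_{szx}$
where $\hat{B}_{szx} = \left(\sum_{k \in s} d_k z_k x_k'\right)^{-1} \sum_{k \in s} d_k z_k y_k$
= parameter estimates of the instrumental regression of $y_k$ on $x_k$ with $z_k$ as instrumental variables, weighted by $d_k$
If $F(z_k'\lambda) = 1 + z_k'\lambda$, then $\hat{Y}_w = \hat{Y}_{ireg}$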
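The exact equality in the linear case can be checked in two lines. The derivation below is a worked sketch using only the definitions of this slide, with $\hat{X}_{HT} = \sum_{k \in s} d_k x_k$ and $\hat{Y}_{HT} = \sum_{k \in s} d_k y_k$, and assuming the matrix $\sum_{k \in s} d_k x_k z_k'$ is invertible.

```latex
\begin{align*}
\sum_{k \in s} d_k (1 + z_k'\lambda)\, x_k = X
  &\;\Longrightarrow\;
  \lambda = \Big(\sum_{k \in s} d_k x_k z_k'\Big)^{-1} (X - \hat{X}_{HT}) \\
\hat{Y}_w = \sum_{k \in s} d_k (1 + z_k'\lambda)\, y_k
  &= \hat{Y}_{HT} + \lambda' \sum_{k \in s} d_k z_k y_k \\
  &= \hat{Y}_{HT} + (X - \hat{X}_{HT})'
     \Big(\sum_{k \in s} d_k z_k x_k'\Big)^{-1} \sum_{k \in s} d_k z_k y_k \\
  &= \hat{Y}_{HT} + (X - \hat{X}_{HT})' \hat{B}_{szx} = \hat{Y}_{ireg}
\end{align*}
```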
39
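Precision
$AV(\hat{Y}_w) = \sum_{k \in U} \sum_{l \in U} \Delta_{kl}\, (d_k E_k)(d_l E_l)$
where $E_k = y_k - x_k' B_{zx}$, with $B_{zx}$ such that $\sum_{k \in U} z_k x_k' B_{zx} = \sum_{k \in U} z_k y_k$
$E_k$ = residual of the regression of Y on X in U with the instrumental variables Z
$\hat{V}(\hat{Y}_w) = \sum_{k \in s} \sum_{l \in s} \frac{\Delta_{kl}}{\pi_{kl}}\, (d_k e_k)(d_l e_l)$
where $e_k = y_k - x_k' \hat{B}_{szx}$
Note: the instruments are equal to $\mathrm{grad}_\lambda F(z_k'\lambda)\big|_{\lambda=0} = F'(0)\, z_k = z_k$ if $F'(0) = 1$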
40
III.2. Calibration in case of total non-response
Calibration after adjustment for non-response
1.a. Adjustment for non-response
Response probabilities (conditionally on s): $p_k = P(k \in r \mid s)$
$p_k$ is estimated by $\hat{p}_k$, referring to a response model and an estimation method
Expansion estimator: $\hat{Y}_{exp} = \sum_{k \in r} d_k \frac{1}{\hat{p}_k} y_k$
41
Examples
• Uniform response model: $\hat{p}_k = r/n$
• Homogeneous response groups: $\hat{p}_k = r_h/n_h$ if $k \in h$
• Generalized linear model: $p_k = 1 / H(z_k'\beta)$
  $z_k$ = vector of explanatory non-response variables
Note: for estimating $p_k$, $z_k$ must be known both for respondents AND NON-RESPONDENTS
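As an illustration of the homogeneous response groups (HRG) model, the sketch below estimates $\hat{p}_k = r_h/n_h$ in each group and plugs it into the expansion estimator $\sum_{k \in r} d_k y_k / \hat{p}_k$. The function names and the data are hypothetical; the group membership has to be known for the whole sample, respondents and non-respondents alike, while y is only observed for respondents.

```python
from collections import Counter


def hrg_response_probabilities(groups, responded):
    """Estimate p_h = r_h / n_h in each homogeneous response group h."""
    n = Counter(groups)                                        # n_h: sample size
    r = Counter(g for g, ok in zip(groups, responded) if ok)   # r_h: respondents
    return {h: r[h] / n[h] for h in n}


def expansion_estimator(d, y, groups, responded):
    """Sum over respondents of d_k * y_k / p_hat_k."""
    p = hrg_response_probabilities(groups, responded)
    return sum(dk * yk / p[g]
               for dk, yk, g, ok in zip(d, y, groups, responded) if ok)


# Hypothetical sample of 6 units in two groups; y is missing for non-respondents.
groups = ["a", "a", "a", "a", "b", "b"]
responded = [True, True, False, False, True, True]
d = [10.0] * 6
y = [3.0, 5.0, None, None, 2.0, 4.0]

p_hat = hrg_response_probabilities(groups, responded)
y_exp = expansion_estimator(d, y, groups, responded)
print(p_hat, y_exp)
```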
42
1.b. Calibration
We start from the corrected weights $d_k^* = d_k / \hat{p}_k$
Conventional calibration: $\sum_{k \in r} d_k^* F^*(x_k'\mu)\, x_k = X$
43
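Direct conventional calibration
$\sum_{k \in r} d_k F(x_k'\lambda)\, x_k = X$
is equivalent to calibration after adjustment for non-response (1.a + 1.b) with a uniform non-response model.
Comparison between direct calibration and calibration after adjustment for non-response (Dupont, 1993)
Let’s suppose:
- N.R. is corrected by a GLM, in which H is one of the usual calibration functions F: $1/p_k = F(z_k'\beta)$
- the non-response variables $z_k$ are included in the set of calibration variables $x_k$.
Then: the two approaches are "similar"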
44
Direct calibration and calibration after adjustment for non-response are identical when:
(a) $x_k = z_k$ and $F = F^* = H = \exp$
(b) . N.R. is corrected by an HRG model based on a categorical variable X
    . The sample is calibrated on the number of units in U for each level of X
    = formal post-stratification on U
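Case (b) can be verified by hand. The following worked sketch assumes the cells $h$ are the levels of the categorical variable, with known counts $N_h$, $n_h$ sampled units and $r_h$ respondents in cell $h$; the calibration variables are the cell indicators, so the calibration function takes a single value $F^*(\mu_h)$ per cell.

```latex
\begin{align*}
\text{HRG correction:}\quad
  & d_k^* = d_k\,\frac{n_h}{r_h}, \qquad k \in r_h \\
\text{calibration on } N_h:\quad
  & \sum_{k \in r_h} d_k^*\, F^*(\mu_h) = N_h
  \;\Longrightarrow\;
  F^*(\mu_h) = \frac{N_h}{\sum_{j \in r_h} d_j^*} \\
\text{final weight:}\quad
  & w_k = d_k^*\, F^*(\mu_h)
  = d_k\,\frac{n_h}{r_h}\cdot\frac{N_h}{\frac{n_h}{r_h}\sum_{j \in r_h} d_j}
  = d_k\,\frac{N_h}{\sum_{j \in r_h} d_j}
\end{align*}
```

The factor $n_h/r_h$ cancels, so calibrating the uncorrected weights $d_k$ directly on the same counts yields the same post-stratified weights.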
45
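Direct generalized calibration
$\sum_{k \in r} d_k H(z_k'\beta)\, x_k = X$   (E)
Interpretation
Response model: $p_k = 1 / H(z_k'\beta_0)$
(E) can be written, with $\beta = \beta_0 + \lambda$:
$X = \sum_{k \in r} d_k H(z_k'\beta_0)\, \frac{H(z_k'\beta_0 + z_k'\lambda)}{H(z_k'\beta_0)}\, x_k = \sum_{k \in r} d_k \frac{1}{p_k} F(z_k'\lambda)\, x_k$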
46
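So, if the $p_k$ were known:
(E) = generalized calibration equation, with:
- initial weights $d_k / p_k$
- calibration function F
F is defined as $F(z_k'\lambda) = \frac{H(z_k'\beta_0 + z_k'\lambda)}{H(z_k'\beta_0)}$ and such as $F(0) = 1$
instruments: $\mathrm{grad}_\lambda F(z_k'\lambda)\big|_{\lambda=0} = \frac{H'(z_k'\beta_0)}{H(z_k'\beta_0)}\, z_k = z_k^*$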
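Two worked instances of this definition, given as an illustration: the choices $H = \exp$ and $H(u) = 1 + u$ are examples picked here, not cases singled out on the slide.

```latex
\begin{align*}
H(u) = e^{u}: \quad
  & F(z_k'\lambda) = \frac{e^{z_k'\beta_0 + z_k'\lambda}}{e^{z_k'\beta_0}}
    = e^{z_k'\lambda},
  & z_k^* &= \frac{e^{z_k'\beta_0}}{e^{z_k'\beta_0}}\, z_k = z_k \\
H(u) = 1 + u: \quad
  & F(z_k'\lambda) = \frac{1 + z_k'\beta_0 + z_k'\lambda}{1 + z_k'\beta_0}
    = 1 + \frac{z_k'\lambda}{1 + z_k'\beta_0},
  & z_k^* &= \frac{z_k}{1 + z_k'\beta_0}
\end{align*}
```

In both cases $F(0) = 1$. With the exponential function the instruments are the $z_k$ themselves and F does not depend on $\beta_0$; with the linear function both do.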
47
Precision
• $AV(\hat{Y}_w)$ uses the residuals in the population:
  $E_k = y_k - x_k' B_{z^*x}$, where $\sum_{k \in U} z_k^* (y_k - x_k' B_{z^*x}) = 0$
• $\hat{V}(\hat{Y}_w)$, the estimator for $AV(\hat{Y}_w)$ if response probabilities were known, uses the residuals of the instrumental regression in r, weighted by the $d_k H(z_k'\beta_0)$:
  $e_k = y_k - x_k' \hat{B}^0_{z^*xr}$, where $\sum_{k \in r} d_k H(z_k'\beta_0)\, z_k^* (y_k - x_k' \hat{B}^0_{z^*xr}) = 0$
Response probabilities are unknown
"estimate" $\hat{B}^0_{z^*xr}$ and the residuals:
  $e_k = y_k - x_k' \hat{B}_{z^*xr}$, where $\sum_{k \in r} d_k H(z_k'\hat{\beta})\, z_k^* (y_k - x_k' \hat{B}_{z^*xr}) = 0$
i.e. instrumental regression weighted by the final weights $w_k = d_k H(z_k'\hat{\beta})$
Note: $\hat{V}(\hat{Y}_w)$ looks like $Q_1(e_k) + Q_2(e_k)$
$Q_1$ = estimated variance 1st phase (sample s selection)
$Q_2$ = estimated variance 2nd phase (respondents r "selection")
49
Properties of the method
• Allows non-response correction even when the explanatory variables are only known for respondents
• Handles the particular situation in which the non-response explanatory variables are variables of interest (non-ignorable response mechanism)
• Reduces the bias produced by non-response thanks to the variables $z_k$, and reduces the variance thanks to the variables $x_k$
This method is implemented in Calmar 2.
50
An example of generalized calibration
51
The survey
• Sampling frame: population census (1990)
• Sampling design: cluster sampling
  – clusters = households
  – secondary units = all members of selected households
• Response model
  – H.R.G.
  – response variables = household size (alone or not) + head of household profession (6 levels) + strata (~ agglomeration size)
• Calibration variables (xk)
  – Households: the same as before (in the sampling frame)
  – Individuals: sex + age group (in the sampling frame)
  – Simultaneous calibration with two levels
• Instrumental variables (zk)
  – Response variables as they are measured in the survey, that is in 1996
52
The population totals data
Constraint: the xk and zk vectors must have the same dimension
• Primary units (households)
var       n  R  mar1  mar2  mar3  mar4  mar5  mar6
strate90  5  0  1314   833   704  1477   777     .
seul90    2  0  3933  1172     .     .     .     .
cs90      6  0   457   470   537   435  1254  1952
strate96  5  1     .     .     .     .     .     .
seul96    2  1     .     .     .     .     .     .
cs96      6  1     .     .     .     .     .     .
• Secondary units (individuals)
var       n  R  mar1  mar2  mar3  mar4
sexe      2  0  6255  6628     .     .
age       4  0  2514  1799  5984  2586
sexe_bis  2  1     .     .     .     .
age_bis   4  1     .     .     .     .
53
%calmar2_guide
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
Thank you for your attention!