Post on 20-Jul-2020
Bosna i HercegovinaAgencija za statistiku Bosne i Hercegovine
Bosna i HercegovinaAgencija za statistiku Bosne i Hercegovine
Some quality indicators of editing and imputations in
Household Budget Survey in Bosnia and Herzegovina
CONFERENCE OF EUROPEAN STATISTICIANS
Workshop on Statistical Data EditingNeuchâtel, Switzerland, 18-20 September 2018
Edin Šabanović, Assistant Director
Sector for Statistical Methodology, Standards, Planning, Quality and Coordination
Agency for Statistics of Bosnia and Herzegovina
Content
• Editing and imputation in GSBPM
• Indicators of raw data for measuring the quality of editing
- Indicators for missingness of data
- Indicators for impact of observation
• Indicators of raw data for measuring the quality of editing from 2015 Household Budget Survey
• Conclusions and future steps
2
Editing and imputation in GSBPM
Quality Management/Metadata Management
1. Specify needs
2. Design
3. Build
4. Collect
5. Process
6. Analyse
7. Disseminate
8. Evaluate
1.1. Identify needs
2.1. Design outputs
3.1. Build collection
instruments
4.1.Create frame and select
sample
5.1. Integrate data
6.1. Prepare draft output
7.1. Update output system
8.1. Gather evaluation
inputs
1.2. Consult and confirm
need
2.2. Design variable
description
3.2. Build or enhance
process components
4.2. Set up collection
5.2. Classify and code
6.2. Validate outputs
7.2. Produce dissemination
products
8.2. Conduct evaluation
1.3. Establish output
objectives
2.3.
Design collection
3.3. Build or enhance
dissemination components
4.3.
Run collection
5.3.
Review and validate
6.3. Interpret and explain
outputs
7.3. Manage release of
dissemination products
8.3.
Agree an action plan
1.4. Identify concepts
2.4.
Design frame and sample
3.4. Configure workflows
4.4. Finalise collection
5.4. Edit and impute
6.4. Apply disclosure control
7.4.
Promote dissemination products
1.5. Check data availability
2.5. Design processing and
analysis
3.5. Test production system
5.5. Derive new variables
and units
6.5. Finalise outputs
7.5. Manage user support
1.6. Prepare business case
2.6. Design production
system and workflow
3.6. Test statistical business
process
5.6. Claculate weights
3.7. Finalise production
system
5.7. Calculate aggregates
5.8. Finalise data files
3
Indicators for missingness of data (1)
• Unweighted item rasponse rate:
• Weighted item rasponse rate:
4
gmisisif
observedisif
y
yR
ij
ij
ijsin,0
,1
n
iijR
I
1
iij
iij
w
RwI
ij
2
Indicators for missingness of data (2)• Weighted rasponse rate for variable yj proportioned with
auxiliary variable x:
• Number of observations with at least one missing value:
5
iiij
iij
xw
RxwI
iij
3
i jijRI 1
4
Indicators for missingness of data (3)
• Rate of observations with at least one missing value:
• Missingness proportion:
6
n
i jijR
I
1
5
k
jijR
I
1
6
Indicators for missingness of data (4)
• Average proportion of missing values:
• Ratio of item non-response estimated and survey weightestimated totals of x:
• Proportion of variation of item-non response estimated andsurvey weight estimated totals of x:
7
nk
i jijR
I
1
7
iii
i ij
xw
RxwI
iij*
8
iii
i iii
ij
xw
xwRxwI
iij*
9
Indicators for impact of observation (1)
• Significance of each individual observation yij in sum ofvariable yj:
• Significance of each individual observation yij subgroup q:
8
iij
ij
y
yI 10
iij
qiij
y
yI 11
Indicators for impact of observation (2)
• Significance of each weight wi in sum of weights:
• Significance of each weighted observation wiyij in estimate oftotal variable yj:
9
ii
ij
ww
I 12
iiji
ijij
yw
ywI 13
Indicators of raw data from 2015 HSB BiH (1)Descriptive statistics of weighted item response rates:
• The average item response rate is about 25.5% showing low response forconsumption variables in the diary of purchase.
• The consumption of cigars had the lowest response rate, while theconsumption of coffee shown the highest response rate.
• Low response rates are caused, inter alia, by impossibility to definestructurally missing values.
10
NValid 135
Missing 0
Mean 25.5507
Median 19.4900
Mode 2.40
Minimum 0.46
Maximum 81.48
Percentiles
1 0.5176
5 1.7820
25 8.2100
50 19.4900
75 37.0800
95 69.9680
Indicators of raw data from 2015 HSB BiH (2)
• Descriptive statistics of weighted response rates forconsumption variables proportioned with household size:
• The extreme values of this indicators are again present by theconsumption of cigars (lowest value) and coffee (highestvalue).
11
N
Valid 135
Missing 0
Mean 27.3696
Median 20.7900
Mode 0.50a
Minimum 0.50
Maximum 82.80
Percentiles
1 0.5504
5 2.0220
25 9.5600
50 20.7900
75 42.6000
95 69.9220
Indicators of raw data from 2015 HSB BiH (3)• Descriptive statistics on missingness proportion by households:
• Average proportion of missing values (indicator I7) is 74.55% and itis considered as very high.
12
NValid 7702
Missing 0
Mean 74.55
Median 75.56
Mode 76.30
Minimum 11.85
Maximum 100.00
Percentiles
1 40.74
5 51.11
25 66.67
50 75.56
75 83.70
95 93.33
Indicators of raw data from 2015 HSB BiH (4)• Descriptive statistics of weighted significance of individual
observation in total of variable:
• The lowest weighted significance of individual observation intotal of variable is shown by the consumption of coffee (0.0157),while the highest value of this indicator is present by theconsumption of cigars (20.9144), which is consistent with valuesof previous quality indicators for these two variables 13
DA_A_01
535013 498439 0.0250 0.0187 0.02 0.01 0.32 0.0085 0.0102 0.0149 0.0187 0.0289 0.0564
DA_A_02
502088 531363 0.0262 0.0298 0.03 0.00 0.24 0.0017 0.0026 0.0091 0.0298 0.0326 0.0630
DA_A_03
192237 841215 0.0688 0.0316 0.03 0.01 0.76 0.0111 0.0158 0.0221 0.0316 0.0671 0.3000
. . . . . . . . . . . . . .
. . . . . . . . . . . . . .
. . . . . . . . . . . . . .
DA_U_02
52693 980758 0.2588 0.1183 0.23 0.00 4.55 0.0159 0.0262 0.0637 0.1183 0.2629 0.9104
DA_U_02
96728 936724 0.1380 0.0908 0.04 0.01 2.09 0.0182 0.0182 0.0454 0.0908 0.1635 0.4540
DA_U_02
47424 986028 0.2737 0.1862 0.04 0.02 2.50 0.0208 0.0374 0.0822 0.1862 0.3369 0.8320
Indicators of raw data from 2015 HSB BiH (5)• Descriptive statistics of significance of each weight in the sum of
weights, 2015 HBS BiH:
• The lowest significance of weight is shown by households enumeratedin rural area of Brcko district BiH in third quarter 2015 (0.0037%), whilethe highest significance of weight is present by households enumeratedin urban areas of Federation BiH during the second quarter 2015(0.022%). The average significance of weight is 0.013%.
14
N
Valid7702
Missing0
Mean0.000130
Median0.000127
Mode0.000105
Minimum0.000037
Maximum0.000222
Percentiles
10.000038
50.000068
250.000115
500.000127
750.000138
950.000184
Conslusions and future steps• Editing quality indicators can be measured in different phases
of the GSBPM
• Indicators of raw data were presented within 2015 HBS BiH
• Selection of indicators calculated was limited by the absence ofediting process documentation, which caused lack of specificinformation on edit failures, detection of erroneous units orstructurally missing values
• Results must be considered as overestimation of poor qualityof data
• This is only first step in measuring editing quality in HBS
Further steps:
• Questionnaire design, which must allow definition ofstructurally missing values
• Better documentation of all processes and sub-processeswithin editing and imputation stages
15
Thank you for your attention!
Edin Šabanovićedin.sabanovic@bhas.gov. ba