12
PISA 2015 TECHNICAL REPORT © OECD 2017 225
Scaling outcomes
Results of the IRT scaling and population modeling
Transforming the plausible values to PISA scales
Linking error
International characteristics of the item pool
The statistical data for Israel are supplied by and under the responsibility of the relevant Israeli authorities. The use of such data by the OECD is without prejudice to the status of the Golan Heights, East Jerusalem and Israeli settlements in the West Bank under the terms of international law.
This chapter illustrates the outcomes of applying the item response theory (IRT) scaling and population model for the generation of plausible values to the PISA 2015 main survey assessment data. In the item response theory (IRT) scaling stage, all available items and data from prior PISA cycles (2006, 2009, 2012) were scaled together with the 2015 data via a concurrent calibration using country-by-language-by-cycle groups. However, only results based on the item parameters for the 2015 items are presented here.
RESULTS OF THE IRT SCALING AND POPULATION MODELING
The linking design for the PISA main survey was aimed at establishing comparability across countries, languages, assessment modes (paper-based and computer-based assessments), and between the 2015 PISA cycle and previous PISA cycles (as far back as 2006, which had been the last time that science was the major domain). By imposing constraints on the item parameters in the item response scaling, the estimated parameters for trend and new items were placed on the same scale, along with items that were used in previous PISA cycles (but not selected for 2015). An additional outcome of the item response theory scaling is that paper-based (PBA) and computer-based (CBA) assessment items can be placed on the same scale. The items generally fit well across countries, allowing for the use of common international item parameters. These international (or common) parameters are what allow for comparability of results across countries and years. However, there are cases where the international item parameters for a given item do not fit well for a particular country or language group, or subset of countries or language groups. In these instances (i.e. when there is item misfit), which imply interactions in certain groups (e.g. item-by-country/language interactions, item-by-mode interactions, item-by-cycle interactions), item constraints were released to allow the estimation of unique item parameters. This was done for a relatively small number of cases across items and groups.
Unique item parameter estimation and national item deletion
The item response theory calibration for the PISA 2015 main survey data was carried out separately for each of the PISA 2015 domains (reading, mathematics, science, financial literacy, and collaborative problem solving). Both science (as the main domain in PISA 2015) and collaborative problem solving (CPS) (as a new domain in PISA 2015) included new items; science also included trend items. All of the other domains included trend items only. Item fit was evaluated using the mean deviation and the root mean squared deviation. Both deviations were calculated for all items in each country-language group for each mode and PISA cycle.
The final item parameters were estimated based on a concurrent calibration using the data from PISA 2015 as well as from previous PISA cycles going back to 2006. There were only a few items in mathematics and collaborative problem solving that had to be excluded from the item response theory analyses (in all country-by-language-by-cycle groups) due to either almost no response variance, scoring or technical issues (either problems with the delivery platform or with the coding on the platform), or very low or even negative item total correlations; Table 12.1 gives an overview of these items.
Table 12.1 Items that were excluded from the IRT analyses
Domain          Item      Mode  Reason
Maths (1 item)  CM192Q01  CBA   Technical issue
CPS (4 items)   CC104104  CBA   Very few responses in category 0
CPS (4 items)   CC104303  CBA   Technical issue
CPS (4 items)   CC102208  CBA   Very few responses in category 0
CPS (4 items)   CC105405  CBA   Low and negative item-total correlation (correlation close to zero)
Note: The problems observed for the items in the table were shown over all countries.
The international/common item parameters and unique national item parameters were estimated for each domain using unidimensional multigroup item response theory models. For analysis purposes, the international/common item parameters are divided into two groups: scalar invariant and metric invariant parameters. Scalar invariant items correspond to items where the slope and threshold parameters are constrained to be the same in both paper-based and computer-based modes. Metric invariant items correspond to items where the slope is constrained to be the same, but the threshold differs across modes. For new items from science and collaborative problem solving, there are no metric invariant item parameters because these were administered only as part of the computer-based assessment; for financial literacy, all items were constrained to be scalar invariant. As such, only scalar invariant percentages are reported in these domains. For each domain, the scalar and metric invariant item parameters represent the stable linked items between the previous and PISA 2015 scales; the unique parameters are included to reduce measurement error. Table 12.2 shows
the percentage of common and unique item parameters by domain, computed by dividing the number of item-by-country cells in each category by the total number of item-by-country cells. Note that the percentage of scalar/metric invariant international/common item parameters was above 90% in all cognitive domains except reading and trend science, where it was just under 90%. Further, only a small number of items received unique item parameters (either group-specific or the same parameters across a subset of groups). In reading, the proportion of scalar/metric invariant international/common item parameters was 89.01%, the proportion of group-specific item parameters was 3.01%, and 7.98% received the same unique item parameters across a subset of countries. For trend items in science, 89.70% received scalar/metric invariant international/common item parameters, while 2.62% received group-specific item parameters, and 7.68% received the same parameters across a subset of countries.
Table 12.2 Percentage of common and unique item parameters in each domain for PISA 2015
                                                                          Maths   Reading  Science trend  Science new  CPS     Financial literacy
% of unique item parameters (group-specific)                              2.16%   3.01%    2.62%          2.05%        1.85%   4.40%
% of unique item parameters (same parameters across a subset of groups)   3.36%   7.98%    7.68%          4.60%        3.19%   2.69%
% of metric invariant common/international item parameters                33.22%  30.33%   20.96%         N/A          N/A     N/A
% of scalar invariant common/international item parameters                61.25%  58.68%   68.74%         93.35%       94.96%  92.91%
Number of items in the PISA 2015 main survey, PBA                         83      103      85             –            –       –
Number of items in the PISA 2015 main survey, CBA                         81      103      85             99           117     43
Note: Interactions go across modes and cycles; Kazakhstan is not included due to adjudication issues.
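The percentages in Table 12.2 are simple shares of item-by-group pairs. As a minimal sketch, the maths 2015 figures can be reproduced from the pair counts reported with Figure 12.1; the function and dictionary names below are illustrative, not part of the PISA tooling.

```python
def share(count, total):
    """Percentage of item-by-group pairs in one category, rounded to two decimals."""
    return round(100.0 * count / total, 2)

# Pair counts for maths 2015 as reported with Figure 12.1
maths_2015 = {
    "total": 7022,
    "scalar_or_metric_invariant": 6634,
    "same_unique_across_subset": 236,
    "group_specific": 152,
}

total = maths_2015["total"]
pct_invariant = share(maths_2015["scalar_or_metric_invariant"], total)
pct_subset = share(maths_2015["same_unique_across_subset"], total)
pct_group_specific = share(maths_2015["group_specific"], total)

print(pct_invariant, pct_subset, pct_group_specific)
```

The invariant share equals the sum of the metric invariant (33.22%) and scalar invariant (61.25%) rows for maths in Table 12.2.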
An overview of the proportions of international/common (invariant) item parameters and group-specific item parameters in each domain for each relevant assessment cycle is given in Figures 12.1 to 12.6. The figures also provide an overview of the proportion of scalar invariant item parameters (items sharing common difficulty and slope parameters across modes) and partially or metric invariant item parameters (items sharing common slope parameters across modes) with regard to the mode effect modeling described in Chapter 9: dark blue indicates scalar invariant item parameters, light grey (the lighter grey above the horizontal line) indicates metric invariant item parameters, medium blue indicates scalar invariant item parameters for a subset of groups (unique parameters different from the common parameter, but for several groups sharing the same unique parameter), and dark grey indicates group-specific item parameters. In addition, Annex H provides information about which trend items are scalar invariant and which are partially or metric invariant for each cognitive domain. Recall that both scalar and metric invariant item parameters (dark blue and light grey) contribute to improving comparability across groups, while unique item parameters (medium blue and dark grey) contribute to the reduction of measurement error. Across every cycle and every domain, international/common (invariant) item parameters dominate and only a small proportion of the item parameters are group-specific (i.e. dark grey). Results show that the overall item fit in each domain for each group is very good, resulting in a small number of unique item parameters and high comparability of the data. There was no consistent pattern of deviations for any one particular country-by-language group.
The results also illustrate that the trend items show good fit, ensuring the quality of the trend measure across different assessment cycles (2015 data versus 2006-2012), different assessment modes (PBA versus CBA), and even across different countries and languages. An overview of the number of deviations per item across all country-by-language-by-cycle groups for items in each domain is given in Annex G.
After the IRT scaling was finalised, item parameter estimates were delivered to each country, including an indication of which items received international/common item parameters and which received unique item parameters. Table 12.3 gives an example of the information provided to countries: the first column shows the domain; the second column shows the flag that indicates whether an item received a unique parameter or was excluded from the IRT scaling; and the remaining columns show the final item parameter estimates (for each item, the slope, difficulty and threshold parameters for polytomous items were listed). A slope parameter of 1 indicates that a Rasch model was fitted for these items; slope estimates different from 1 indicate that the two-parameter logistic model (2PLM) was fitted.
• Figure 12.1 • Frequencies of international (invariant) and unique item parameters in maths (note that frequencies were counted using item-by-group pairs)
[Figure panels: Maths 2006, 2009, 2012 and 2015; vertical axis: frequency; legend: scalar invariant, metric invariant, scalar invariant subset, country-specific]
Maths 2006: total item-by-group pairs = 2531; scalar invariant pairs = 2406 (95.06%); scalar invariant subset pairs = 109 (4.31%); country-specific pairs = 16 (0.63%)
Maths 2009: total item-by-group pairs = 3059; scalar invariant pairs = 2914 (95.26%); scalar invariant subset pairs = 129 (4.22%); country-specific pairs = 16 (0.52%)
Maths 2012: total item-by-group pairs = 5998; scalar invariant pairs = 5649 (94.18%); scalar invariant subset pairs = 292 (4.87%); country-specific pairs = 57 (0.95%)
Maths 2015: total item-by-group pairs = 7022; scalar/metric invariant pairs = 6634 (94.47%); scalar invariant subset pairs = 236 (3.36%); country-specific pairs = 152 (2.16%)
• Figure 12.2 • Frequencies of international (invariant) and unique item parameters in reading (note that frequencies were counted using item-by-group pairs)
[Figure panels: Reading 2006, 2009, 2012 and 2015; vertical axis: frequency; legend: scalar invariant, metric invariant, scalar invariant subset, country-specific]
Reading 2006: total item-by-group pairs = 2113; scalar invariant pairs = 1860 (88.03%); scalar invariant subset pairs = 229 (10.84%); country-specific pairs = 24 (1.14%)
Reading 2009: total item-by-group pairs = 8153; scalar invariant pairs = 7351 (90.16%); scalar invariant subset pairs = 696 (8.54%); country-specific pairs = 106 (1.3%)
Reading 2012: total item-by-group pairs = 3801; scalar invariant pairs = 3492 (91.87%); scalar invariant subset pairs = 290 (7.63%); country-specific pairs = 19 (0.5%)
Reading 2015: total item-by-group pairs = 8912; scalar/metric invariant pairs = 7933 (89.01%); scalar invariant subset pairs = 711 (7.98%); country-specific pairs = 268 (3.01%)
• Figure 12.3 • Frequencies of international (invariant) and unique item parameters in trend science (note that frequencies were counted using item-by-group pairs)
[Figure panels: Science 2006, 2009, 2012 and 2015; vertical axis: frequency; legend: scalar invariant, metric invariant, scalar invariant subset, country-specific]
Science 2006: total item-by-group pairs = 6148; scalar invariant pairs = 5503 (89.51%); scalar invariant subset pairs = 585 (9.52%); country-specific pairs = 60 (0.98%)
Science 2009: total item-by-group pairs = 5079; scalar invariant pairs = 4581 (90.19%); scalar invariant subset pairs = 473 (9.31%); country-specific pairs = 25 (0.49%)
Science 2012: total item-by-group pairs = 4602; scalar invariant pairs = 4130 (89.74%); scalar invariant subset pairs = 438 (9.52%); country-specific pairs = 34 (0.74%)
Science 2015: total item-by-group pairs = 8650; scalar/metric invariant pairs = 7759 (89.7%); scalar invariant subset pairs = 664 (7.68%); country-specific pairs = 227 (2.62%)
• Figure 12.4 • Frequencies of international (invariant) and unique item parameters in new science (note that frequencies were counted using item-by-group pairs)
[Figure panel: New Science 2015; vertical axis: frequency; legend: scalar invariant, scalar invariant subset, country-specific]
New Science 2015: total item-by-group pairs = 8213; scalar/metric invariant pairs = 7667 (93.35%); scalar invariant subset pairs = 378 (4.6%); country-specific pairs = 168 (2.05%)
• Figure 12.5 • Frequencies of international (invariant) and unique item parameters in CPS (note that frequencies were counted using item-by-group pairs)
[Figure panel: Collaborative problem solving 2015; vertical axis: frequency; legend: scalar invariant, scalar invariant subset, country-specific]
Collaborative problem solving 2015: total item-by-group pairs = 8742; scalar/metric invariant pairs = 8301 (94.96%); scalar invariant subset pairs = 279 (3.19%); country-specific pairs = 162 (1.85%)
• Figure 12.6 • Frequencies of international (invariant) and unique item parameters in financial literacy (note that frequencies were counted using item-by-group pairs)
[Figure panel: Financial literacy 2015; vertical axis: frequency; legend: scalar invariant, scalar invariant subset, country-specific]
Financial literacy 2015: total item-by-group pairs = 818; scalar/metric invariant pairs = 760 (92.91%); scalar invariant subset pairs = 22 (2.69%); country-specific pairs = 36 (4.4%)
Table 12.3 Example of table for item parameter estimates provided to the countries
Domain  Flag                    Item       Slope    Difficulty  IRT_Step1  IRT_Step2
Maths   –                       PM00GQ01   1        1.62226     –          –
Maths   –                       PM00KQ02   1        1.11572     –          –
Maths   –                       PM033Q01S  1        -0.95604    –          –
Maths   –                       PM034Q01S  1        0.15781     –          –
Maths   Unique item parameters  PM155Q01   1.42972  -0.35538    –          –
Maths   –                       PM155Q02   1        -0.35727    -0.42436   0.42436
Maths   –                       PM155Q03   1.08678  0.73497     -0.20119   0.20119
Maths   –                       PM155Q04S  1        -0.27556    –          –
Maths   –                       PM192Q01S  1        0.20948     –          –
Maths   Excluded from scaling   PM936Q01   –        –           –          –
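The slope and difficulty columns of Table 12.3 can be read through the two-parameter logistic item response function, which reduces to the Rasch form when the slope is 1. The sketch below is illustrative only: the function name is hypothetical, and the scaling constant `d` is an assumption (set to 1 here); Chapter 9 gives the exact parameterisation used in PISA 2015. The step parameters for polytomous items are not modelled in this sketch.

```python
import math

def prob_correct(theta, slope, difficulty, d=1.0):
    """2PL probability of a correct response for proficiency theta.

    d is an assumed scaling constant; slope = 1 gives the Rasch form.
    """
    return 1.0 / (1.0 + math.exp(-d * slope * (theta - difficulty)))

# PM155Q01 in Table 12.3 (unique parameters): slope 1.42972, difficulty -0.35538
p_2pl = prob_correct(0.0, 1.42972, -0.35538)

# PM033Q01S in Table 12.3: slope fixed at 1 (Rasch), difficulty -0.95604
p_rasch = prob_correct(0.0, 1.0, -0.95604)

print(round(p_2pl, 3), round(p_rasch, 3))
```

A student whose proficiency equals the item difficulty has a 0.5 probability of success under either model; the slope only controls how quickly that probability changes around the difficulty.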
Generating student scale scores and reliability of the PISA scales
Given the rotated and incomplete assessment design, it is not possible to calculate marginal reliabilities for each cognitive domain. In order to get an indication of test reliability, the explained variance (i.e. variance explained by the model) for each cognitive domain was computed based on the weighted posterior variance. The explained variance is computed using all 10 plausible values as follows: 1 – (expected error variance/total variance). The weighted posterior variance is an expression of the posterior measurement error and is obtained through the population modeling. The expected error variance is the weighted average of the posterior variance; it was estimated using the weighted average of the variance of the plausible values (the posterior variance is the variance across the 10 plausible values). The total variance was estimated using a resampling approach (Efron, 1982). It was estimated for each country depending on the country-specific proficiency distributions for each cognitive domain.
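The explained-variance computation can be sketched as follows. This is an illustration under stated assumptions, not the operational procedure: the function name and the simulated data are hypothetical, and the total variance is approximated here by the weighted variance of the stacked plausible values, whereas the report estimated it with a resampling approach.

```python
import numpy as np

def explained_variance(pvs, weights):
    """1 - (expected error variance / total variance) from plausible values.

    pvs: array of shape (n_students, n_pvs); weights: array of shape (n_students,).
    """
    pvs = np.asarray(pvs, dtype=float)
    w = np.asarray(weights, dtype=float)

    # Posterior variance per student: variance across that student's PVs
    posterior_var = pvs.var(axis=1, ddof=1)
    # Expected error variance: weighted average of the posterior variances
    error_var = np.average(posterior_var, weights=w)

    # Assumption: total variance taken as the weighted variance of all stacked PVs
    stacked = pvs.ravel()
    w_stacked = np.repeat(w, pvs.shape[1])
    grand_mean = np.average(stacked, weights=w_stacked)
    total_var = np.average((stacked - grand_mean) ** 2, weights=w_stacked)

    return 1.0 - error_var / total_var

# Simulated data: true proficiency variance 1, posterior (error) variance 0.16
rng = np.random.default_rng(0)
theta = rng.normal(0.0, 1.0, size=5000)
pvs = theta[:, None] + rng.normal(0.0, 0.4, size=(5000, 10))
weights = np.ones(5000)
reliability = explained_variance(pvs, weights)
print(round(reliability, 3))
```

With these simulated inputs the result should be near 1 - 0.16/1.16, since the stacked variance is the sum of the proficiency variance and the error variance.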
Applying the conditioning approach described in Chapter 9 and anchoring all of the item parameters at the values obtained from the final IRT scaling, plausible values were generated for all sampled students. Table 12.4 gives the median of national reliabilities for the generated scale scores based on all 10 plausible values. National reliabilities of the main cognitive domains based on all 10 plausible values are presented in Table 12.5.
Table 12.4 Reliabilities of the PISA cognitive domains and science subscales over all countries¹
Mode Domains Median S.D. Max Min
CBA
Maths 0.85 0.03 0.90 0.75
Reading 0.87 0.02 0.90 0.80
Science 0.91 0.02 0.93 0.82
CPS 0.78 0.03 0.83 0.70
Financial literacy 0.83 0.06 0.93 0.72
Science subscales
Explain phenomena scientifically 0.89 0.03 0.91 0.80
Evaluate and design scientific inquiry 0.87 0.04 0.90 0.71
Interpret data and evidence scientifically 0.89 0.03 0.92 0.78
Content 0.89 0.02 0.91 0.81
Procedural & epistemic 0.90 0.03 0.92 0.78
Earth & space 0.88 0.03 0.90 0.77
Living 0.89 0.03 0.91 0.79
Physical 0.88 0.03 0.91 0.76
PBA
Maths 0.80 0.05 0.87 0.67
Reading 0.82 0.04 0.88 0.72
Science 0.86 0.04 0.92 0.77
1. Please note that Argentina, Malaysia, and Kazakhstan were not included in this analysis due to adjudication issues (inadequate coverage of either population or construct).
Table 12.5 [Part 1/2] National reliabilities for main cognitive domains
Mode Country/economy Maths Reading Science CPS Financial literacy
CBA Australia 0.84 0.86 0.92 0.76 0.93
CBA Austria 0.87 0.88 0.93 0.80 –
CBA Belgium 0.89 0.89 0.93 0.80 0.87
CBA Brazil 0.78 0.82 0.87 0.71 0.72
CBA Bulgaria 0.85 0.88 0.92 0.82 –
CBA Canada 0.83 0.83 0.91 0.74 0.76
CBA Chile 0.86 0.86 0.91 0.78 0.83
CBA B-S-J-G (China)1 0.90 0.90 0.93 0.83 0.88
CBA Colombia 0.82 0.87 0.89 0.76 –
CBA Costa Rica 0.78 0.83 0.85 0.70 –
CBA Croatia 0.86 0.87 0.91 0.76 –
CBA Cyprus2 0.84 0.84 0.90 0.74 –
CBA Czech Republic 0.88 0.88 0.92 0.77 –
CBA Denmark 0.85 0.86 0.91 0.78 –
CBA Dominican Republic 0.81 0.86 0.84 – –
CBA Estonia 0.85 0.86 0.91 0.79 –
CBA Finland 0.85 0.87 0.91 0.77 –
CBA France 0.88 0.89 0.93 0.77 –
CBA Germany 0.86 0.86 0.92 0.76 –
CBA Greece 0.86 0.87 0.91 0.79 –
CBA Hong Kong (China) 0.84 0.85 0.90 0.77 –
CBA Hungary 0.88 0.89 0.92 0.81 –
CBA Iceland 0.83 0.86 0.91 0.76 –
CBA Ireland 0.85 0.87 0.91 – –
CBA Israel 0.87 0.88 0.92 0.83 –
CBA Italy 0.87 0.87 0.91 0.80 0.81
CBA Japan 0.85 0.85 0.91 0.75 –
CBA Korea 0.85 0.85 0.91 0.78 –
CBA Latvia 0.85 0.86 0.90 0.75 –
CBA Lithuania 0.85 0.87 0.91 0.80 0.83
CBA Luxembourg 0.87 0.89 0.93 0.77 –
CBA Macao (China) 0.82 0.86 0.90 0.78 –
CBA Malaysia 0.86 0.87 0.90 0.79 –
CBA Mexico 0.79 0.84 0.86 0.75 –
CBA Montenegro 0.80 0.84 0.88 0.74 –
CBA Netherlands 0.89 0.89 0.93 0.79 0.88
CBA New Zealand 0.85 0.86 0.92 0.79 –
CBA Norway 0.84 0.85 0.91 0.75 –
CBA Peru 0.82 0.88 0.87 0.78 0.87
CBA Poland 0.86 0.87 0.92 – 0.83
CBA Portugal 0.87 0.86 0.92 0.78 –
CBA Qatar 0.85 0.89 0.91 – –
CBA Russia 0.78 0.80 0.88 0.75 0.73
CBA Singapore 0.87 0.88 0.93 0.79 –
CBA Slovak Republic 0.86 0.89 0.92 0.77 0.76
CBA Slovenia 0.88 0.89 0.93 0.79 –
CBA Spain 0.86 0.86 0.91 0.75 0.81
CBA Sweden 0.85 0.86 0.92 0.78 –
CBA Switzerland 0.86 0.88 0.92 – –
CBA Chinese Taipei 0.87 0.88 0.93 0.78 –
CBA Thailand 0.81 0.86 0.88 0.83 –
CBA Tunisia 0.75 0.80 0.82 0.70 –
CBA Turkey 0.82 0.85 0.89 0.74 –
CBA United Arab Emirates 0.83 0.87 0.91 0.80 –
CBA United Kingdom 0.87 0.88 0.92 0.83 –
CBA United States 0.87 0.88 0.92 0.81 0.87
CBA Uruguay 0.85 0.87 0.90 0.78 –
Table 12.5 [Part 2/2] National reliabilities for main cognitive domains
Mode Country/economy Maths Reading Science CPS Financial literacy
PBA Albania 0.75 0.79 0.84 – –
PBA Algeria 0.67 0.72 0.77 – –
PBA Argentina 0.79 0.82 0.85 – –
PBA FYROM 0.79 0.79 0.84 – –
PBA Georgia 0.83 0.83 0.86 – –
PBA Indonesia 0.78 0.77 0.82 – –
PBA Jordan 0.78 0.82 0.86 – –
PBA Kazakhstan 0.73 0.71 0.78 – –
PBA Kosovo 0.80 0.81 0.82 – –
PBA Lebanon 0.82 0.85 0.86 – –
PBA Malta 0.87 0.88 0.92 – –
PBA Moldova 0.78 0.83 0.86 – –
PBA Romania 0.80 0.82 0.86 – –
PBA Trinidad and Tobago 0.86 0.84 0.88 – –
PBA Viet Nam 0.83 0.84 0.87 – –
1. B-S-J-G (China) data represent the regions of Beijing, Shanghai, Jiangsu, and Guangdong.
2. Note by Turkey: The information in this document with reference to “Cyprus” relates to the southern part of the Island. There is no single authority representing both Turkish and Greek Cypriot people on the Island. Turkey recognizes the Turkish Republic of Northern Cyprus (TRNC). Until a lasting and equitable solution is found within the context of the United Nations, Turkey shall preserve its position concerning the “Cyprus issue.”
Note by all the European Union Member States of the OECD and the European Union: The Republic of Cyprus is recognised by all members of the United Nations with the exception of Turkey. The information in this document relates to the area under the effective control of the Government of the Republic of Cyprus.
The table above shows that the explained variance by the combined IRT and latent regression model (population or conditioning model) is at a comparable level across countries. While the population model reaches levels of above 0.80 for reading, mathematics and science, it is important to keep in mind that this is not to be confused with a classical reliability coefficient, as it is based on more than the item responses. Comparisons among individual students are not appropriate because the apparent accuracy of the measures is obtained by statistically adjusting the estimates based on background data. This approach does provide improved behavior of subgroup estimates, even if the plausible values obtained using this methodology are not suitable for comparisons of individuals (e.g. Mislevy & Sheehan, 1987; von Davier et al., 2006).
TRANSFORMING THE PLAUSIBLE VALUES TO PISA SCALES
The plausible values were transformed using a linear transformation to form a scale that is linked to the historic PISA scale. This scale can be used to compare the overall performance of countries or subgroups within a country.
For science, reading and mathematics, country results from the 2006, 2009 and 2012 PISA cycles for OECD countries were used to compute the transformation coefficients for each content domain separately. The country means and variances used to compute the transformation coefficients included only those values from the cycle in which a given content domain was the major domain. Hence, the transformation coefficients for science are based on the 2006 reported and model-based results, reading coefficients are based on the 2009 results, and mathematics coefficients are based on the 2012 results. Only the results for countries designated as OECD countries in the respective PISA reporting cycle were used to compute the transformation coefficients. Let $m_{Yij}$ be the reported mean for country i in cycle j, $m_{Xij}$ the model-based mean obtained from the concurrent calibration using the software mdltm, and $s^2_{Yij}$ and $s^2_{Xij}$ the reported and model-based score variances, respectively. The same transformation was used for all plausible values (within a given domain). The transformation coefficients for a given content domain were computed as:
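$$A = \frac{\tau_{Yj}}{\tau_{Xj}} \tag{12.1}$$

$$B = m_{Yj} - A\,m_{Xj} \tag{12.2}$$

$$\tau_{Yj} = \sqrt{\tau^2_{Yj}} = \sqrt{\frac{1}{n_j - 1}\sum_{i=1}^{n_j}\left(m_{Yij} - m_{Yj}\right)^2 + \frac{1}{n_j}\sum_{i=1}^{n_j} s^2_{Yij}} \tag{12.3}$$

$$\tau_{Xj} = \sqrt{\tau^2_{Xj}} = \sqrt{\frac{1}{n_j - 1}\sum_{i=1}^{n_j}\left(m_{Xij} - m_{Xj}\right)^2 + \frac{1}{n_j}\sum_{i=1}^{n_j} s^2_{Xij}} \tag{12.4}$$

where $j$ = 2006 for science, 2009 for reading, and 2012 for maths.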
The values $m_{Yj}$ and $m_{Xj}$ are the grand means of the reported and model-based country means in cycle j, respectively. The terms $\tau^2_{Yj}$ and $\tau^2_{Xj}$ correspond to the total variance, defined as the variance of the country means plus the mean of the country variances. The square roots of these terms give the standard deviations $\tau_{Yj}$ and $\tau_{Xj}$. The 2015 plausible values (PVs) for examinee k in country i were transformed to the PISA scale via the following transformation:
$$PV_{Tik} = A \times PV_{Uik} + B \tag{12.5}$$
The subscripts T and U correspond to the transformed and untransformed values respectively.
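The computation of A and B from country-level means and variances can be sketched as below. The function names and the three-country data are hypothetical and chosen only so that the expected coefficients are easy to verify; they are not PISA results.

```python
import numpy as np

def total_sd(country_means, country_variances):
    """Square root of (variance of country means + mean of country variances)."""
    means = np.asarray(country_means, dtype=float)
    variances = np.asarray(country_variances, dtype=float)
    between = means.var(ddof=1)   # variance of the country means, divisor n_j - 1
    within = variances.mean()     # mean of the country variances, divisor n_j
    return np.sqrt(between + within)

def transformation_coefficients(m_reported, s2_reported, m_model, s2_model):
    """Slope A and intercept B linking the model-based metric to the reported scale."""
    a = total_sd(m_reported, s2_reported) / total_sd(m_model, s2_model)
    b = np.mean(m_reported) - a * np.mean(m_model)
    return a, b

# Hypothetical model-based results for three countries
m_model = np.array([-0.1, 0.0, 0.2])
s2_model = np.array([0.9, 1.0, 1.1])
# Hypothetical reported results constructed as an exact 500 + 100 * x rescaling
m_reported = 500.0 + 100.0 * m_model
s2_reported = 100.0 ** 2 * s2_model

A, B = transformation_coefficients(m_reported, s2_reported, m_model, s2_model)
pv_transformed = A * np.array([-1.0, 0.0, 1.5]) + B   # apply the linear transformation
print(A, B, pv_transformed)
```

Because the hypothetical reported results are an exact linear rescaling of the model-based ones, the recovered coefficients match that rescaling, which is a convenient sanity check for any implementation.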
For financial literacy, country results from the 2012 PISA cycle were used to compute the transformation coefficients. The method used to compute the coefficients is the same as that used for reading, mathematics and science. The key distinction is that in reading, mathematics and science, only results for OECD countries were used to compute the coefficients, whereas, for financial literacy, all available country data were used to compute the coefficients. This decision was made because there were too few OECD countries to provide a defensible transformation of the results. The plausible values for financial literacy were transformed using the same linear transformation as for reading, mathematics and science.
A new scale for CPS was established in PISA 2015. Consistent with the introduction of content domains in previous PISA cycles, transformation coefficients for CPS were computed such that the plausible values for OECD countries have a mean of 500 and a standard deviation of 100. The 10 sets of plausible values were stacked together and the weighted mean and variance (and by extension SD) were computed. Stated differently, the full set of transformed plausible values for CPS have a weighted mean of 500 and a weighted SD of 100 (based on senate weights).
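If $X_{kv}$ is the $v$th PV ($v = 1, 2, \ldots, 10$) for examinee $k$, the transformation coefficients for CPS are computed as

$$A = \frac{100}{\tau_{PV}} \tag{12.6}$$

$$B = 500 - A\,\bar{X}_{kv} = 500 - A\,\frac{\sum_{v=1}^{10}\sum_{k=1}^{n} X_{kv} W_{kv}}{\sum_{v=1}^{10}\sum_{k=1}^{n} W_{kv}} \tag{12.7}$$

$$\tau_{PV} = \sqrt{\tau^2_{PV}} = \sqrt{\frac{\sum_{v=1}^{10}\sum_{k=1}^{n} W_{kv}\left(X_{kv} - \bar{X}_{kv}\right)^2}{\frac{n-1}{n}\sum_{v=1}^{10}\sum_{k=1}^{n} W_{kv}}} \tag{12.8}$$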
The grand mean of the PVs, $\bar{X}_{kv}$, was computed by compiling all 10 sets of PVs into a single vector (the corresponding senate weights were compiled in a separate vector) and then finding the weighted mean of these values. The weighted variance, $\tau^2_{PV}$, was computed using the vector of PVs as well. The square root is taken to compute the standard deviation, $\tau_{PV}$.
The plausible values for CPS were transformed using the same approach as that for science, reading, mathematics and financial literacy. The transformations for reading, mathematics, science and financial literacy used the model-based results from the concurrent calibration (IRT scaling) in order to align the results with previously established scales. The transformation for CPS is based on the PVs because this is the first time the results for this domain have been scaled.
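The stacking approach for CPS can be sketched as follows. This is a simplified illustration: the function name and the simulated inputs are hypothetical, the input is assumed to contain only students from OECD countries, and the plain weighted variance is used without any degrees-of-freedom correction.

```python
import numpy as np

def cps_coefficients(pvs, senate_weights):
    """A and B such that the stacked, weighted PVs have mean 500 and SD 100.

    pvs: array of shape (n_students, 10); senate_weights: array of shape (n_students,).
    """
    pvs = np.asarray(pvs, dtype=float)
    # Stack all sets of PVs into one vector, repeating each student's weight per PV
    x = pvs.ravel()
    w = np.repeat(np.asarray(senate_weights, dtype=float), pvs.shape[1])

    grand_mean = np.average(x, weights=w)
    # Assumption: plain weighted variance, no degrees-of-freedom correction
    variance = np.average((x - grand_mean) ** 2, weights=w)

    a = 100.0 / np.sqrt(variance)
    b = 500.0 - a * grand_mean
    return a, b

# Simulated untransformed PVs and weights (illustrative values only)
rng = np.random.default_rng(1)
raw_pvs = rng.normal(0.2, 0.5, size=(1000, 10))
sw = rng.uniform(0.5, 1.5, size=1000)

A_cps, B_cps = cps_coefficients(raw_pvs, sw)
scaled = A_cps * raw_pvs + B_cps

w_all = np.repeat(sw, 10)
scaled_mean = np.average(scaled.ravel(), weights=w_all)
scaled_sd = np.sqrt(np.average((scaled.ravel() - scaled_mean) ** 2, weights=w_all))
print(round(scaled_mean, 6), round(scaled_sd, 6))
```

Computing the coefficients on the stacked vector, rather than per plausible value, means a single (A, B) pair is applied to all 10 sets, which keeps the between-PV variability intact.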
The transformation coefficients for all content domains are presented in Table 12.6. The A coefficient adjusts the variability (standard deviation) of the resulting scale while the B coefficient adjusts the scale location (mean).
Table 12.6 PISA 2015 transformation coefficients
Domain A B
Science 168.3189 494.5360
Reading 131.5806 437.9583
Mathematics 135.9030 514.1848
Financial literacy 140.0807 490.7259
Collaborative problem solving 196.7695 462.8102
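As a worked illustration of how the coefficients in Table 12.6 are applied, consider a hypothetical untransformed science plausible value of 0.25 (this input is an assumption chosen for the example, not a value from the report):

```latex
PV_{T} = A \times PV_{U} + B
       = 168.3189 \times 0.25 + 494.5360
       = 42.0797 + 494.5360
       \approx 536.62
```

An untransformed value of 0 maps directly to B, so B can be read as the scale score corresponding to the origin of the model-based metric in each domain.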
Table 12.7 shows the average transformed plausible values for each cognitive domain by country as well as the resampling-based standard errors.
Table 12.7 [Part 1/2] Average plausible values (PVs) and resampling-based standard errors (SE) by country/economy for the PISA domains of science, reading, mathematics, financial literacy, and collaborative problem solving (CPS)
Country/economy  Maths (Average PV, SE)  Reading (Average PV, SE)  Science (Average PV, SE)  CPS (Average PV, SE)  Financial literacy (Average PV, SE)
International average 462 0.32 461 0.34 466 0.31 486 0.36 481 0.95
Albania 413 3.45 405 4.13 427 3.28
Algeria 360 2.95 350 3.00 376 2.64
Argentina 409 3.05 425 3.22 432 2.87
Australia 494 1.61 503 1.69 510 1.54 531 1.91 504 1.91
Austria 497 2.86 485 2.84 495 2.44 509 2.56
Belgium 507 2.35 499 2.42 502 2.29 501 2.39 541 2.95
Brazil 377 2.86 407 2.75 401 2.30 412 2.30 393 3.84
B-S-J-G (China) 531 4.89 494 5.13 518 4.64 496 3.97 566 6.04
Bulgaria 441 3.95 432 5.00 446 4.35 444 3.85
Canada 516 2.31 527 2.30 528 2.08 535 2.27 533 4.62
Chile 423 2.54 459 2.58 447 2.38 457 2.69 432 3.74
Colombia 390 2.29 425 2.94 416 2.36 429 2.30
Costa Rica 400 2.47 427 2.63 420 2.07 441 2.42
Croatia 464 2.77 487 2.68 475 2.45 473 2.52
Cyprus1 437 1.72 443 1.66 433 1.38 444 1.71
Czech Republic 492 2.40 487 2.60 493 2.27 499 2.20
Denmark 511 2.17 500 2.54 502 2.38 520 2.53
Dominican Republic 328 2.69 358 3.05 332 2.58
Estonia 520 2.04 519 2.22 534 2.09 535 2.47
Finland 511 2.31 526 2.55 531 2.39 534 2.55
France 493 2.10 499 2.51 495 2.06 494 2.42
FYROM 371 1.28 352 1.41 384 1.25
Georgia 404 2.78 401 2.96 411 2.42
Germany 506 2.89 509 3.02 509 2.70 525 2.85
Greece 454 3.75 467 4.34 455 3.92 459 3.60
Hong Kong (China) 548 2.98 527 2.69 523 2.55 541 2.95
Hungary 477 2.53 470 2.66 477 2.42 472 2.35
Iceland 488 1.99 482 1.98 473 1.68 499 2.26
Table 12.7 [Part 2/2] Average plausible values (PVs) and resampling-based standard errors (SE) by country/economy for the PISA domains of science, reading, mathematics, financial literacy, and collaborative problem solving (CPS)
Country/economy  Maths (Average PV, SE)  Reading (Average PV, SE)  Science (Average PV, SE)  CPS (Average PV, SE)  Financial literacy (Average PV, SE)
Indonesia 386 3.08 397 2.87 403 2.57
Ireland 504 2.05 521 2.47 503 2.39
Israel 470 3.63 479 3.78 467 3.44 469 3.62
Italy 490 2.85 485 2.68 481 2.52 478 2.53 483 2.80
Japan 532 3.00 516 3.20 538 2.97 552 2.68
Jordan 380 2.65 408 2.93 409 2.67
Kazakhstan 460 4.28 427 3.42 456 3.67
Korea 524 3.71 517 3.50 516 3.13 538 2.53
Kosovo 362 1.63 347 1.57 378 1.70
Latvia 482 1.87 488 1.80 490 1.56 485 2.26
Lebanon 396 3.69 347 4.41 386 3.40
Lithuania 478 2.33 472 2.74 475 2.65 467 2.46 449 3.15
Luxembourg 486 1.27 481 1.44 483 1.12 491 1.50
Macao (China) 544 1.11 509 1.25 529 1.06 534 1.24
Malaysia 446 3.25 431 3.48 443 3.00 440 3.29
Malta 479 1.72 447 1.78 465 1.64
Mexico 408 2.24 423 2.58 416 2.13 433 2.46
Moldova 420 2.47 416 2.52 428 1.97
Montenegro 418 1.46 427 1.58 411 1.03 416 1.27
Netherlands 512 2.21 503 2.41 509 2.26 518 2.39 509 3.32
New Zealand 495 2.27 509 2.40 513 2.38 533 2.45
Norway 502 2.23 513 2.51 498 2.26 502 2.52
Peru 387 2.71 398 2.89 397 2.36 418 2.50 403 3.40
Poland 504 2.39 506 2.48 501 2.51 – – 485 2.97
Portugal 492 2.49 498 2.69 501 2.43 498 2.64
Qatar 402 1.27 402 1.02 418 1.00
Romania 444 3.79 434 4.07 435 3.23
Russia 494 3.11 495 3.08 487 2.91 473 3.42 512 3.33
Singapore 564 1.47 535 1.63 556 1.20 561 1.21
Slovak Republic 475 2.66 453 2.83 461 2.59 463 2.38 445 4.53
Slovenia 510 1.26 505 1.47 513 1.32 502 1.75
Spain 486 2.15 496 2.36 493 2.07 496 2.15 469 3.19
Sweden 494 3.17 500 3.48 493 3.60 510 3.44
Switzerland 521 2.92 492 3.03 506 2.90
Chinese Taipei 542 3.03 497 2.50 532 2.69 527 2.47
Thailand 415 3.03 409 3.35 421 2.83 436 3.50
Trinidad and Tobago 417 1.41 427 1.49 425 1.41
Tunisia 367 2.95 361 3.06 386 2.10 382 1.94
Turkey 420 4.13 428 3.96 425 3.93 422 3.45
United Arab Emirates 427 2.41 434 2.87 437 2.42 435 2.43
United Kingdom 492 2.50 498 2.77 509 2.56 519 2.68
United States 470 3.17 497 3.41 496 3.18 520 3.64 487 3.80
Uruguay 418 2.50 437 2.55 435 2.20 443 2.29
Viet Nam 495 4.46 487 3.73 525 3.91
See note 2 under Table 12.5.
LINKING ERROR
The magnitude of the linking error can be evaluated by comparing the country results reported in previous PISA cycles with the transformed results obtained from the rescaling. For the PISA 2015 trend comparisons, the linking error was estimated with a robust measure of standard deviation, the Sn statistic (Rousseeuw & Croux, 1993); see Chapter 9 for more information on the linking error approach taken in PISA 2015. The robust estimates of linking error between cycles, by domain, are presented in Table 12.8.
The Sn statistic is available in SAS as well as the R package robustbase. See also https://cran.r-project.org/web/packages/robustbase/robustbase.pdf. The Sn statistic was proposed by Rousseeuw and Croux (1993) as a more efficient alternative to the scaled median absolute deviation from the median (1.4826*MAD) that is commonly used as a robust estimator of standard deviation.
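As an illustration only, the two robust scale estimators mentioned above can be sketched in a few lines of Python. The sketch assumes the usual definition Sn = c · med_i { med_j |x_i − x_j| } with c = 1.1926, using a high median for the inner step and a low median for the outer step; that convention is our reading of Rousseeuw and Croux (1993) and should be checked against the paper. Library implementations such as the one in robustbase may also apply finite-sample corrections, which are omitted here. The function names and the toy data are hypothetical and are not PISA results.

```python
from statistics import median


def _order_stat(values, rank):
    """Return the rank-th smallest value (rank is 1-based)."""
    return sorted(values)[rank - 1]


def sn_scale(x, c=1.1926):
    """Naive O(n^2) Sn estimate, without finite-sample correction (assumption)."""
    n = len(x)
    if n < 2:
        raise ValueError("need at least two observations")
    inner_rank = n // 2 + 1      # high median of |x_i - x_j| over j
    outer_rank = (n + 1) // 2    # low median over i
    inner = [
        _order_stat([abs(xi - xj) for xj in x], inner_rank)
        for xi in x
    ]
    return c * _order_stat(inner, outer_rank)


def scaled_mad(x):
    """Scaled median absolute deviation from the median (1.4826 * MAD)."""
    m = median(x)
    return 1.4826 * median([abs(v - m) for v in x])


# Toy differences (made-up numbers); the last one is an outlier.
toy = [1, 2, 3, 4, 1000]
print(sn_scale(toy), scaled_mad(toy))
```

Both estimators stay small for the toy data despite the outlier, which is the property that motivates using a robust statistic for country-level differences.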
Table 12.8 Robust link error (based on absolute pairwise differences statistic Sn) for comparisons of performance between PISA 2015 and previous assessments
Comparison Maths Reading Science Financial literacy
PISA 2000 to 2015 – 6.8044 – –
PISA 2003 to 2015 5.6080 5.3907 – –
PISA 2006 to 2015 3.5111 6.6064 4.4821 –
PISA 2009 to 2015 3.7853 3.4301 4.5016 –
PISA 2012 to 2015 3.5462 5.2535 3.9228 5.3309
Note: Comparisons between PISA 2015 scores and previous assessments can be made only as far back as the cycle in which the subject first became a major domain. As a result, comparisons of mathematics performance between PISA 2015 and PISA 2000 are not possible, nor are comparisons of science performance between PISA 2015 and PISA 2000 or PISA 2003.
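To show how a link error of this kind is typically used, the sketch below adds it in quadrature to the sampling standard errors of two cycle means when testing a change over time. This section does not spell out the formula, so the combination rule should be read as an assumption about common PISA trend reporting practice; Chapter 9 is the authoritative description. The 2015 figures are Italy's maths mean and standard error from Table 12.7 and the link error is taken from Table 12.8. The 2012 mean and standard error are hypothetical placeholders, not reported results.

```python
import math

LINK_ERROR_MATHS_2012_2015 = 3.5462  # Table 12.8


def trend_standard_error(se_2015, se_previous, link_error):
    """Standard error of a difference between two cycle means,
    assuming independent samples plus a link error term."""
    return math.sqrt(se_2015 ** 2 + se_previous ** 2 + link_error ** 2)


# Italy, maths, PISA 2015 (Table 12.7).
mean_2015, se_2015 = 490.0, 2.85
# Hypothetical PISA 2012 values, for illustration only.
mean_prev, se_prev = 485.0, 2.0

diff = mean_2015 - mean_prev
se_diff = trend_standard_error(se_2015, se_prev, LINK_ERROR_MATHS_2012_2015)
z = diff / se_diff
print(f"difference = {diff:.1f}, SE = {se_diff:.2f}, z = {z:.2f}")
```

Because the link error enters the sum of squares, the standard error of a trend can never be smaller than the link error itself, however precise the two national estimates are.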
INTERNATIONAL CHARACTERISTICS OF THE ITEM POOL
This section provides an overview of the test targeting, the domain inter-correlations and the correlations among the science subscales.
Test targeting
In addition to identifying the relative discrimination and difficulty of items, IRT can be used to summarise the results for various subpopulations of students. A specific value, the response probability (RP), can be assigned to each item on a scale according to its discrimination and difficulty, in the same way that students receive a specific score along a scale according to their performance on the assessment items (OECD, 2002). Chapter 15 describes how items can be placed along a scale based on RP values and how these values can be used to describe different proficiency levels.
After the item parameters were estimated in the item calibration stage, RP values were calculated for each item, and the items were then classified into proficiency levels within each cognitive domain. Likewise, after the plausible values were generated, respondents can be classified into proficiency levels for each cognitive domain. The purpose of classifying items and respondents into levels is to provide more descriptive information about group proficiencies. The item levels provide information about the underlying characteristics of an item as it relates to the domain (such as item difficulty); the higher the difficulty, the higher the level. In PISA, an RP62 value is used to classify items into levels. Respondents with a proficiency below this point have a probability lower than 0.62 of solving the item, and respondents with a proficiency above this point have a probability higher than 0.62. The RP62 values for all items are presented in Annex A together with the final item parameters obtained from the IRT scaling. Respondents are classified into levels using the PISA scale scores transformed from the plausible values. Each level is defined by score boundaries for each cognitive domain. Tables 12.9 to 12.13 show the score boundaries used across all countries for each cognitive domain, along with the percentage of items and respondents classified at each level of proficiency. The choice of score boundaries for science is explained in Chapter 15; for reading and mathematics, the levels defined in previous PISA cycles were used.
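The idea of an RP62 location can be made concrete with a small sketch. It assumes a dichotomous item following a two-parameter logistic model with slope a, difficulty b and a scaling constant D = 1.7; the model form, the constant and the example parameter values are assumptions made for illustration and are not taken from this section. Under that model the RP62 point is the proficiency at which the probability of a correct response equals 0.62. The linear transformation to the reporting scale uses hypothetical coefficients.

```python
import math


def p_correct(theta, a, b, d=1.7):
    """Probability of a correct response under an assumed 2PL model."""
    return 1.0 / (1.0 + math.exp(-d * a * (theta - b)))


def rp_location(a, b, rp=0.62, d=1.7):
    """Proficiency at which p_correct equals rp (0.62 for RP62)."""
    return b + math.log(rp / (1.0 - rp)) / (d * a)


def to_reporting_scale(theta, slope, intercept):
    """Linear transformation to a reporting scale (coefficients hypothetical)."""
    return slope * theta + intercept


# Hypothetical item parameters.
a, b = 1.0, 0.5
theta_62 = rp_location(a, b)
print(theta_62, p_correct(theta_62, a, b))
# Hypothetical transformation coefficients, for illustration only.
print(to_reporting_scale(theta_62, slope=100.0, intercept=500.0))
```

With a response probability above 0.5, the RP62 location always lies above the item difficulty b, and it moves closer to b as the slope increases.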
Table 12.9 Item and respondent classification for each score boundary in mathematics
Level Score points on the PISA scale Number of items Percentage of items Percentage of respondents
6 Higher than 669.30 27 13.30 1.91
5 Higher than 606.99 and less than or equal to 669.30 23 11.33 6.37
4 Higher than 544.68 and less than or equal to 606.99 50 24.63 13.93
3 Higher than 482.38 and less than or equal to 544.68 41 20.20 20.16
2 Higher than 420.07 and less than or equal to 482.38 39 19.21 21.81
1 Higher than 357.77 and less than or equal to 420.07 12 5.91 18.78
Below 1 Less than 357.77 11 5.42 17.05
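The classification of respondents in Table 12.9 amounts to a lookup against the score boundaries. A minimal sketch, using the mathematics boundaries from the table and hypothetical scores, is given below. The function names are illustrative. A score exactly equal to 357.77 is not covered by the table's wording; the sketch assigns it to "Below 1", which is an assumption.

```python
from bisect import bisect_left
from collections import Counter

# Upper bounds of the mathematics levels, from Table 12.9.
MATHS_CUTS = [357.77, 420.07, 482.38, 544.68, 606.99, 669.30]
MATHS_LEVELS = ["Below 1", "1", "2", "3", "4", "5", "6"]


def maths_level(score):
    """Level whose interval (lower, upper] contains the score."""
    # bisect_left counts the boundaries strictly below the score, so a
    # score equal to a boundary stays in the lower level.
    return MATHS_LEVELS[bisect_left(MATHS_CUTS, score)]


def level_shares(scores):
    """Percentage of scores falling in each level that occurs."""
    counts = Counter(maths_level(s) for s in scores)
    return {
        level: 100.0 * counts[level] / len(scores)
        for level in MATHS_LEVELS
        if counts[level]
    }


# Hypothetical scale scores, for illustration only.
print(level_shares([300.0, 490.0, 490.0, 700.0]))
```

The same lookup applies to the other domains once the boundaries from Tables 12.10 to 12.13 are substituted.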
Table 12.10 Item and respondent classification for each score boundary in reading
Level Score points on the PISA scale Number of items Percentage of items Percentage of respondents
6 Higher than 698.32 18 7.63 0.70
5 Higher than 625.61 and less than or equal to 698.32 28 11.86 4.96
4 Higher than 552.89 and less than or equal to 625.61 50 21.19 15.45
3 Higher than 480.18 and less than or equal to 552.89 62 26.27 24.14
2 Higher than 407.47 and less than or equal to 480.18 58 24.58 24.36
1a Higher than 334.75 and less than or equal to 407.47 15 6.36 17.92
1b 262.04 to less than or equal to 334.75 5 2.12 9.12
Below 1b Less than 262.04 0 0.00 3.34
Table 12.11 Item and respondent classification for each score boundary in science
Level Score points on the PISA scale Number of items Percentage of items Percentage of respondents
6 Higher than 707.93 13 4.45 0.76
5 Higher than 633.33 and less than or equal to 707.93 29 9.93 4.79
4 Higher than 558.73 and less than or equal to 633.33 75 25.68 14.51
3 Higher than 484.14 and less than or equal to 558.73 94 32.19 23.20
2 Higher than 409.54 and less than or equal to 484.14 63 21.58 25.71
1a Higher than 334.94 and less than or equal to 409.54 15 5.14 20.88
1b 260.54 to less than or equal to 334.94 3 1.03 8.68
Below 1b Less than 260.54 0 0.00 1.48
Table 12.12 Item and respondent classification for each score boundary in financial literacy
Level Score points on the PISA scale Number of items Percentage of items Percentage of respondents
5 Higher than 624.63 22 26.51 9.36
4 Higher than 549.86 and less than or equal to 624.63 14 16.87 17.38
3 Higher than 475.10 and less than or equal to 549.86 24 28.92 24.31
2 Higher than 400.33 and less than or equal to 475.10 12 14.46 22.63
1 Higher than 325.57 and less than or equal to 400.33 6 7.23 15.73
Below 1 Less than 325.57 5 6.02 10.59
Table 12.13 Item and respondent classification for each score boundary in CPS
Level Score points on the PISA scale Number of items Percentage of items Percentage of respondents
4 Higher than 640.00 25 21.37 6.28
3 Higher than 540.00 and less than or equal to 640.00 28 23.93 23.66
2 Higher than 440.00 and less than or equal to 540.00 38 32.48 35.30
1 Higher than 340.00 and less than or equal to 440.00 20 17.09 26.78
Below 1 Less than 340.00 6 5.13 7.99
Because RP62 values and the transformed plausible values are on the same PISA scales, the distribution of respondents’ latent ability and the item RP62 values can be located on the same scale. Figures 12.7 to 12.11 show the distribution of the first plausible value (PV1) together with the item RP62 values on the PISA scale, separately for each cognitive domain, for the PISA 2015 main survey data. International RP62 values and international plausible values (PV1) were used for these figures (see note 1 at the end of this chapter). RP62 values for CBA items are shown on the right side. In each domain, solid circles indicate PBA items and hollow circles indicate additional PBA items from previous PISA cycles that were not administered in the PISA 2015 main survey. For polytomous items where partial scoring was available, only the highest RP62 values are shown. On the left side, the distribution of plausible values is plotted. In each figure, the blue line indicates the empirical density of the plausible values across countries, and the grey line indicates the theoretical normal distribution with the mean and variance of the plausible values in each domain across countries. Specifically, N(461, 104.17²) for mathematics, N(463, 106.83²) for reading, N(467, 103.02²) for science, N(474, 123²) for financial literacy and N(483, 101.65²) for CPS are displayed as grey lines. (There are RP62 values higher than 1 000 for the CPS domain; these lie outside the region occupied by the vast majority of respondents’ proficiency estimates and are therefore not shown in Figure 12.11.)
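As a rough check on what the normal reference curve implies, the sketch below computes the share of a normal distribution with mean 461 and standard deviation 104.17 that falls into each mathematics level of Table 12.9. It reads 104.17 as the standard deviation, since its square (about 10851) matches the variance shown in the legend of Figure 12.7. The comparison is illustrative only; the reported percentages in Table 12.9 come from the empirical distribution, not from this curve.

```python
from statistics import NormalDist

# Reference curve for mathematics: mean 461, standard deviation 104.17.
reference = NormalDist(mu=461, sigma=104.17)

# Upper bounds of the mathematics levels, from Table 12.9.
cuts = [357.77, 420.07, 482.38, 544.68, 606.99, 669.30]
levels = ["Below 1", "1", "2", "3", "4", "5", "6"]

# Cumulative probabilities at the boundaries, padded with 0 and 1.
cdf_values = [0.0] + [reference.cdf(c) for c in cuts] + [1.0]
shares = [
    100.0 * (upper - lower)
    for lower, upper in zip(cdf_values[:-1], cdf_values[1:])
]

for level, share in zip(levels, shares):
    print(f"Level {level}: {share:.1f}% under the normal reference curve")
```

Gaps between these shares and the respondent percentages in Table 12.9 indicate where the empirical density departs from the normal reference curve.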
• Figure 12.7 • Item RP62 values and distribution of PV1 in maths
[Figure not reproduced. Left panel: distribution of PV1, with the empirical density and the N(461, 10851) reference curve, across proficiency bands from Below Level 1 to Level 6. Right panel: item RP62 values. Legend: PISA 2015 MS (CBA), PISA 2015 MS (PBA), previous PISA cycles (PBA).]
• Figure 12.8 • Item RP62 values and distribution of PV1 in reading
[Figure not reproduced. Left panel: distribution of PV1, with the empirical density and the N(463, 11413) reference curve, across proficiency bands from Below Level 1b to Level 6. Right panel: item RP62 values. Legend: PISA 2015 MS (CBA), PISA 2015 MS (PBA), previous PISA cycles (PBA).]
• Figure 12.9 • Item RP62 values and distribution of PV1 in science
[Figure not reproduced. Left panel: distribution of PV1, with the empirical density and the N(467, 10613) reference curve, across proficiency bands from Below Level 1b to Level 6. Right panel: item RP62 values. Legend: PISA 2015 MS (CBA), PISA 2015 MS (PBA), previous PISA cycles (PBA).]
Figures 12.12 to 12.16 show the percentage of respondents per country at each level of proficiency for each cognitive domain.
• Figure 12.10 • Item RP62 values and distribution of PV1 in financial literacy
[Figure not reproduced. Left panel: distribution of PV1, with the empirical density and the N(474, 15099) reference curve, across proficiency bands from Below Level 1 to Level 5. Right panel: item RP62 values. Legend: PISA 2015 MS (CBA), PISA 2015 MS (PBA), previous PISA cycles (PBA).]
• Figure 12.11 • Item RP62 values and distribution of PV1 in collaborative problem solving
[Figure not reproduced. Left panel: distribution of PV1, with the empirical density and the N(483, 10333) reference curve, across proficiency bands from Below Level 1 to Level 4. Right panel: item RP62 values. Legend: PISA 2015 MS (CBA).]
• Figure 12.12 • Percentage of respondents per country/economy at each level of proficiency for maths
[Figure not reproduced. Panel title: 2015 PISA main study: maths, average scores (PV) and proficiency-level percentages. The chart shows, for each country/economy, the percentage of respondents at each level. Countries/economies are ordered by average score, from Singapore (564) to the Dominican Republic (328), with an international average of 462. Legend: Below Level 1, Level 1, Level 2, Level 3, Level 4, Level 5, Level 6. Note 1 below refers to Cyprus.]
1. Note by Turkey: The information in this document with reference to “Cyprus” relates to the southern part of the Island. There is no single authority representing both Turkish and Greek Cypriot people on the Island. Turkey recognizes the Turkish Republic of Northern Cyprus (TRNC). Until a lasting and equitable solution is found within the context of the United Nations, Turkey shall preserve its position concerning the “Cyprus issue.”Note by all the European Union Member States of the OECD and the European Union: The Republic of Cyprus is recognised by all members of the United Nations with the exception of Turkey. The information in this document relates to the area under the effective control of the Government of the Republic of Cyprus.
• Figure 12.13 • Percentage of respondents per country/economy at each level of proficiency for reading
[Figure not reproduced. Panel title: 2015 PISA main study: reading, average scores (PV) and proficiency-level percentages. The chart shows, for each country/economy, the percentage of respondents at each level. Countries/economies are ordered by average score, from Singapore (535) to Kosovo and Lebanon (347 each), with an international average of 461. Legend: Below Level 1b, Level 1b, Level 1a, Level 2, Level 3, Level 4, Level 5, Level 6. Note 1 below refers to Cyprus.]
1. See note 2 under Table 12.5.
• Figure 12.14 • Percentage of respondents per country/economy at each level of proficiency for science
[Figure not reproduced. Panel title: 2015 PISA main study: science, average scores (PV) and proficiency-level percentages. The chart shows, for each country/economy, the percentage of respondents at each level. Countries/economies are ordered by average score, from Singapore (556) to the Dominican Republic (332), with an international average of 466. Legend: Below Level 1b, Level 1b, Level 1a, Level 2, Level 3, Level 4, Level 5, Level 6. Note 1 below refers to Cyprus.]
1. See note 2 under Table 12.5.
• Figure 12.15 • Percentage of respondents per country/economy at each level of proficiency for financial literacy
[Figure not reproduced. Panel title: 2015 PISA main study: financial literacy, average scores (PV) and proficiency-level percentages. The chart shows, for each country/economy, the percentage of respondents at each level. Average scores, in chart order: B-S-J-G (China) 566, Belgium 541, Canada 533, Russia 512, Netherlands 509, Australia 504, United States 487, Poland 485, Italy 483, international average 481, Spain 469, Lithuania 449, Slovak Republic 445, Chile 432, Peru 403, Brazil 393. Legend: Below Level 1, Level 1, Level 2, Level 3, Level 4, Level 5.]
Note: The financial literacy data from Belgium come from the Flanders part of Belgium only and thus are not nationally representative; the same is the case with regard to the financial literacy data from Canada since some provinces of Canada did not participate in the financial literacy assessment.
• Figure 12.16 • Percentage of respondents per country/economy at each level of proficiency for CPS
[Figure not reproduced. Panel title: 2015 PISA main study: CPS, average scores (PV) and proficiency-level percentages. The chart shows, for each country/economy, the percentage of respondents at each level. Countries/economies are ordered by average score, from Singapore (561) to Tunisia (382), with an international average of 486. Legend: Below Level 1, Level 1, Level 2, Level 3, Level 4. Note 1 below refers to Cyprus.]
Note: The CPS sample from Israel does not include ultra-Orthodox students and thus is not nationally representative. 1. See note 2 under Table 12.5.
Domain inter-correlations
Estimated correlations between the PISA domains, based on the 10 plausible values and averaged across all countries and assessment modes, are presented in Table 12.14. Overall, the correlations are quite high, as expected, yet there is still some separation between each of the domains. The estimated correlations at the national level are presented in Table 12.15.
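One plausible way to obtain a correlation "based on the 10 plausible values" is to compute the correlation separately for each plausible value index and then average the results. The sketch below illustrates that reading with unweighted Pearson correlations and made-up data; the operational analysis may differ, for example in its use of student weights, so this is an assumption rather than a description of the actual procedure. The function names are hypothetical.

```python
import math


def pearson(x, y):
    """Unweighted Pearson correlation between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def pv_correlation(pvs_a, pvs_b):
    """Average, over plausible value indices, of the correlation between
    the k-th plausible values of two domains.

    pvs_a[k] and pvs_b[k] hold the k-th plausible value of every
    respondent, in the same respondent order.
    """
    per_pv = [pearson(a, b) for a, b in zip(pvs_a, pvs_b)]
    return sum(per_pv) / len(per_pv)


# Made-up data: 2 plausible values for 4 respondents in two domains.
maths = [[480.0, 510.0, 530.0, 455.0], [470.0, 515.0, 540.0, 450.0]]
reading = [[495.0, 500.0, 545.0, 460.0], [480.0, 520.0, 525.0, 470.0]]
print(pv_correlation(maths, reading))
```

Averaging the per-index correlations, rather than correlating respondent-level averages of the plausible values, avoids overstating the association between domains.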
Table 12.14 Domain inter-correlations (see note 1)
Domain Statistic Reading Science CPS Financial literacy
Maths Average 0.79 0.88 0.70 0.74
Maths Average (CBA) 0.79 0.88 0.70 0.74
Maths Average (PBA) 0.75 0.80 – –
Maths Range 0.57~0.87 0.70~0.91 0.55~0.76 0.60~0.81
Reading Average – 0.87 0.74 0.75
Reading Average (CBA) – 0.87 0.74 0.75
Reading Average (PBA) – 0.77 – –
Reading Range – 0.71~0.90 0.58~0.80 0.61~0.81
Science Average – – 0.77 0.77
Science Average (CBA) – – 0.77 0.77
Science Average (PBA) – – – –
Science Range – – 0.65~0.83 0.68~0.85
CPS Average – – – 0.64
CPS Average (CBA) – – – 0.64
CPS Average (PBA) – – – –
CPS Range – – – 0.50~0.71
1. Please note that Argentina, Malaysia and Kazakhstan were not included in this analysis due to adjudication issues (inadequate coverage of either population or construct).
Table 12.15 [Part 1/2] National-level domain inter-correlations based on 10 PVs
Country/economy Maths & reading Maths & science Maths & CPS Maths & fin. lit. Reading & science Reading & CPS Reading & fin. lit. Science & CPS Science & fin. lit. CPS & fin. lit.
Albania 0.68 0.80 – – 0.77 – – – – –
Algeria 0.57 0.70 – – 0.71 – – – – –
Argentina 0.75 0.83 – – 0.81 – – – – –
Australia 0.79 0.88 0.68 0.79 0.87 0.75 0.80 0.76 0.85 0.70
Austria 0.80 0.89 0.71 – 0.88 0.77 – 0.78 – –
B-S-J-G (China) 0.84 0.91 0.76 0.80 0.90 0.76 0.80 0.80 0.83 0.70
Belgium 0.84 0.90 0.73 0.80 0.90 0.76 0.80 0.78 0.83 0.67
Brazil 0.75 0.84 0.65 0.62 0.86 0.73 0.65 0.75 0.68 0.54
Bulgaria 0.80 0.89 0.74 – 0.89 0.80 – 0.83 – –
Canada 0.77 0.87 0.67 0.68 0.87 0.74 0.70 0.75 0.74 0.59
Chile 0.80 0.88 0.70 0.75 0.87 0.74 0.75 0.77 0.78 0.64
Colombia 0.83 0.90 0.74 – 0.90 0.74 – 0.80 – –
Costa Rica 0.75 0.83 0.59 – 0.85 0.67 – 0.68 – –
Croatia 0.80 0.89 0.69 – 0.87 0.75 – 0.76 – –
Cyprus1 0.74 0.85 0.65 – 0.83 0.71 – 0.74 – –
Czech Republic 0.84 0.90 0.69 – 0.89 0.72 – 0.75 – –
Denmark 0.77 0.87 0.69 – 0.86 0.72 – 0.77 – –
Dominican Republic 0.78 0.83 – – 0.85 – – – – –
Estonia 0.78 0.88 0.71 – 0.87 0.74 – 0.79 – –
Finland 0.79 0.87 0.72 – 0.87 0.75 – 0.78 – –
France 0.84 0.91 0.70 – 0.90 0.75 – 0.78 – –
FYROM 0.75 0.78 – – 0.74 – – – – –
Georgia 0.79 0.79 – – 0.73 – – – – –
Germany 0.81 0.90 0.70 – 0.88 0.72 – 0.77 – –
Greece 0.79 0.88 0.73 – 0.88 0.75 – 0.79 – –
Hong Kong 0.77 0.88 0.64 – 0.86 0.73 – 0.74 – –
Hungary 0.83 0.90 0.74 – 0.90 0.78 – 0.81 – –
Iceland 0.78 0.86 0.70 – 0.84 0.74 – 0.76 – –
Indonesia 0.70 0.82 – – 0.75 – – – – –
Ireland 0.81 0.89 – – 0.88 – – – – –
Israel 0.83 0.89 0.75 – 0.89 0.78 – 0.80 – –
Table 12.15 [Part 2/2] National-level domain inter-correlations based on 10 PVs
Country/economy Maths & reading Maths & science Maths & CPS Maths & fin. lit. Reading & science Reading & CPS Reading & fin. lit. Science & CPS Science & fin. lit. CPS & fin. lit.
Italy 0.75 0.85 0.65 0.68 0.84 0.68 0.67 0.73 0.73 0.56
Japan 0.79 0.87 0.66 – 0.86 0.73 – 0.72 – –
Jordan 0.70 0.79 – – 0.78 – – – – –
Kazakhstan 0.61 0.73 – – 0.70 – – – – –
Korea 0.78 0.87 0.72 – 0.85 0.76 – 0.77 – –
Kosovo 0.74 0.81 – – 0.78 – – – – –
Latvia 0.77 0.87 0.66 – 0.87 0.73 – 0.75 – –
Lebanon 0.80 0.82 – – 0.81 – – – – –
Lithuania 0.79 0.90 0.72 0.70 0.87 0.74 0.73 0.79 0.75 0.63
Luxembourg 0.83 0.91 0.73 – 0.90 0.78 – 0.78 – –
Macao 0.75 0.84 0.65 – 0.89 0.78 – 0.78 – –
Malaysia 0.78 0.87 0.72 – 0.88 0.74 – 0.79 – –
Malta 0.83 0.87 – – 0.87 – – – – –
Mexico 0.77 0.84 0.67 – 0.86 0.73 – 0.76 – –
Moldova 0.73 0.79 – – 0.77 – – – – –
Montenegro 0.76 0.83 0.66 – 0.84 0.70 – 0.74 – –
Netherlands 0.87 0.91 0.75 0.81 0.89 0.78 0.81 0.77 0.84 0.70
New Zealand 0.79 0.89 0.70 – 0.87 0.75 – 0.78 – –
Norway 0.78 0.89 0.68 – 0.84 0.72 – 0.74 – –
Peru 0.81 0.86 0.73 0.76 0.88 0.78 0.81 0.79 0.79 0.70
Poland 0.80 0.90 – 0.74 0.86 – 0.75 – 0.77 –
Portugal 0.79 0.89 0.70 – 0.86 0.74 – 0.76 – –
Qatar 0.84 0.88 – – 0.90 – – – – –
Romania 0.79 0.78 – – 0.77 – – – – –
Russian Federation 0.66 0.82 0.55 0.60 0.81 0.68 0.61 0.70 0.68 0.50
Singapore 0.82 0.89 0.73 – 0.90 0.78 – 0.80 – –
Slovak Republic 0.83 0.88 0.69 0.66 0.87 0.74 0.66 0.74 0.68 0.58
Slovenia 0.79 0.89 0.68 – 0.87 0.73 – 0.74 – –
Spain 0.76 0.88 0.66 0.71 0.86 0.71 0.72 0.74 0.75 0.61
Sweden 0.78 0.89 0.71 – 0.85 0.78 – 0.77 – –
Switzerland 0.81 0.88 – – 0.88 – – – – –
Chinese Taipei 0.83 0.90 0.71 – 0.90 0.77 – 0.77 – –
Thailand 0.75 0.83 0.65 – 0.87 0.76 – 0.78 – –
Trinidad and Tobago 0.81 0.87 – – 0.80 – – – – –
Tunisia 0.72 0.81 0.59 – 0.83 0.58 – 0.65 – –
Turkey 0.76 0.86 0.68 – 0.85 0.71 – 0.76 – –
United Arab Emirates 0.81 0.88 0.74 – 0.89 0.80 – 0.81 – –
United Kingdom 0.77 0.87 0.68 – 0.86 0.74 – 0.76 – –
United States 0.83 0.90 0.76 0.80 0.90 0.79 0.80 0.82 0.83 0.71
Uruguay 0.79 0.88 0.71 – 0.87 0.73 – 0.77 – –
Viet Nam 0.81 0.87 – – 0.85 – – – – –
1. See note 2 under Table 12.5.
Science scale and subscales
The estimated correlations between the PISA 2015 science subscales and the reading, mathematics, science and financial literacy scales are presented in Tables 12.16 to 12.18. The science subscales fall into three groups: Knowledge (SKCO, SKPE), Competency (SCEP, SCED, SCID) and System (SSPH, SSLI, SSES).
Because of the way in which the proficiency data were generated, correlations between subscales from different groups (knowledge, competency and system) should not be calculated. The three groups are therefore presented in separate tables.
Table 12.16 Estimated correlations among domains and science knowledge subscales (see note 1)
Domain Reading Science CPS Financial literacy SKCO SKPE
Maths 0.783 0.863 0.692 0.726 0.798 0.808
Reading – 0.853 0.741 0.738 0.786 0.817
Science – – 0.765 0.770 – –
CPS – – – 0.630 0.688 0.722
Financial literacy – – – – 0.743 0.763
SKCO – – – – – 0.921
Note: SKCO: Content, SKPE: Procedural & Epistemic.
1. Please note that Argentina, Malaysia and Kazakhstan were not included in this analysis due to adjudication issues (inadequate coverage of either population or construct).
Table 12.17 Estimated correlations among domains and science competency subscales (see note 1)
Domain Reading Science CPS Financial literacy SCED SCEP SCID
Maths 0.783 0.863 0.692 0.726 0.778 0.797 0.802
Reading – 0.853 0.741 0.738 0.790 0.786 0.805
Science – – 0.765 0.770 – – –
CPS – – – 0.630 0.700 0.687 0.712
Financial literacy – – – – 0.733 0.743 0.756
SCED – – – – – 0.894 0.903
SCEP – – – – – – 0.919
Note: SCED: Evaluate and Design Scientific Inquiry, SCEP: Explain Phenomena Scientifically, SCID: Interpret Data and Evidence Scientifically.
1. Please note that Argentina, Malaysia and Kazakhstan were not included in this analysis due to adjudication issues (inadequate coverage of either population or construct).
Table 12.18 Estimated correlations among domains and science system subscales (see note 1)
Domain Reading Science CPS Financial literacy SSES SSLI SSPH
Maths 0.783 0.863 0.692 0.726 0.791 0.798 0.791
Reading – 0.853 0.741 0.738 0.791 0.804 0.781
Science – – 0.765 0.770 – – –
CPS – – – 0.630 0.693 0.711 0.688
Financial literacy – – – – 0.743 0.754 0.736
SSES – – – – – 0.910 0.900
SSLI – – – – – – 0.908
Note: SSPH: Physical, SSLI: Living, SSES: Earth & Space.
1. Please note that Argentina, Malaysia and Kazakhstan were not included in this analysis due to adjudication issues (inadequate coverage of either population or construct).
Note
1. Please note that Argentina, Malaysia and Kazakhstan were not included in this analysis due to adjudication issues (inadequate coverage of either population or construct).
References
Efron, B. (1982), “The Jackknife, the Bootstrap, and Other Resampling Plans”, Society for Industrial and Applied Mathematics CBMS-NSF Monographs, Vol. 38.
Hoaglin, D.C., F. Mosteller and J.W. Tukey (1983), Understanding Robust and Exploratory Data Analysis, John Wiley & Sons, New York, NY.
Mislevy, R.J. and K.M. Sheehan (1987), “Marginal estimation procedures”, in A.E. Beaton (Ed.), Implementing the new design: The NAEP 1983-84 technical report (Report No. 15-TR-20), Educational Testing Service, Princeton, NJ.
OECD (2002), Reading for Change: Performance and Engagement across Countries: Results from PISA 2000, OECD Publishing, Paris, http://dx.doi.org/10.1787/9789264099289-en.
von Davier et al. (2006), “The statistical procedures used in National Assessment of Educational Progress: Recent developments and future directions”, in C.R. Rao and S. Sinharay (Eds.), Handbook of Statistics, Vol. 26, pp. 1039-1055, Elsevier.