Rankings of Students Based on Experts’ Assessment of ...ceur-ws.org/Vol-1844/10000289.pdf ·...

Rankings of Students Based on Experts’ Assessment of

Levels of Verification of Learning Outcomes by Test

Items

Aleksandra Mreła1 and Oleksandr Sokołov2

1 Kujawy and Pomorze University in Bydgoszcz, Technical Department, Toruńska str. 55-57,

Bydgoszcz 85-023, Poland

[email protected]

2 Nicolaus Copernicus University in Toruń, Faculty of Physics, Astronomy and Informatics,

Grudziądzka str. 5, 87-100 Toruń, Poland

[email protected]

Abstract. The paper presents methods of preparing students’ rankings based on

the results of final secondary school examination test in mathematics in Poland.

The currently used method is based on the percentage of earned points and does

not take into account levels of acquirement of learning outcomes by students. The

data used in this article contains results of students who earned the same number

of points. Presented methods of preparing rankings are based on the experts’ as-

sessment of levels of verification of learning outcomes by items and methods of

fuzzification of this assessment. According to the applied method the rankings

show some difference.

Keywords: Ranking of students, Fuzzification, Quality assurance.

1 Introduction

Development of information technologies has contributed significantly to the teaching

methods and students’ assessment. Tutorial programs, interactive tests, different mon-

itoring studies and state programs for automated evaluation of knowledge and skills

have been introduced recently.

Testing has been applied widely in distance education and during implementation of

the Bologna Process for student’s self-education. Automated testing application has

been expanded to the manufacturing, where personnel management is transformed into

a continuous process of training (of course, with the subsequent testing and assessment

of trainees). The distinct feature of such systems is that the role of a teacher in the

process of learning and assessment is much narrower, and the results have been evalu-

ated automatically which has been caused by the requirement for simultaneous estima-

tion of a large number of trainees, and by the ideology of automated learning itself –

mailto:[email protected]

self-consistent learning and independent evaluation. One of the major tasks is the com-

parability of the results of different tests, ranging of students level of knowledge, for-

mation of the final scoring for the test sets. Use of so-called "raw" scores, i.e., totals for

the successful implementation of items resulting from the test might be applied to the

very limited extent (if testing is limited to the identifying of the level of knowledge on

particular topic and cannot be integrated with other results). Effectiveness of the test

score depends not only on the quality of the test, but also on the methods of comparison

and interpretation of primary (raw) score of test group.

Therefore one can assume as important to analyze the existing methods of compari-

son and integration of scores of various tests, study the quality of the students’ group

assessment, understanding the diversity of evaluation points as a quality criterion for

estimating methods.

In 1999 the Polish Government decided to take part in the Bologna Process [1] and

because of that all Polish educational institution at the beginning of the process of de-

signing curricula define learning outcomes which students have to acquire during their

studies and their teachers have to verify their acquirement by the students.

The mathematics curriculum in Polish secondary schools was approved by the Polish

Ministry of Education [2]. This curriculum considers 5 learning outcomes which stu-

dents are taught during three-year studies. At the end of the studies students take the

written final secondary school examination.

The examination is dived into two parts, the first one is consisted of 25 items of

multiple choice for which students can earn 1 point for answering correctly and the

second part is consisted of 9 items for solving which students can earn more than 1

point. In this paper the data, we use to discuss the methods of preparing rankings, refers

only to results of the first part of this examination.

The data comprised with results of the final secondary school examination in math-

ematics in one of Polish secondary schools in 2015. This examination was written by

149 students. For solving the first part of this examination students can earn up to 25

points. This paper presents discussion on the group of 18 students who earned the num-

ber of points equal to 18, so they all achieved the same position in the ranking of stu-

dents based on the number of earned points or the average mean.

Sometimes we might encounter situations when there is a need to distinguish be-

tween these students, for example we would like to choose 5 out of these 18 students.

So this ranking does not help us choose 5 better students unless we decide to apply

more criteria.

In this paper we will discuss methods of preparing rankings [6,7] of students taking

into consideration the results of the test in mathematics and experts’ assessment of lev-

els of verification of learning outcomes by test items and we discuss the results of 18

students who answering the first 25 items of the final secondary school examination in

mathematics earned total score 18 points in one of Polish high schools.

In order to calculate levels of acquiring learning outcomes by students we will use the

theory of fuzzy sets which was introduced by L.A. Zadeh [12] in 1965. In 1975 he gen-

eralized the concept of type 1 fuzzy sets and introduced type 2 fuzzy sets [13].

2 Crisp Relations

To build the relations between learning outcomes and items we will use the description

of four leaning outcomes LO1 – LO4 written in [2]:

˗ LO1 – Student interprets mathematical texts. After solving the tasks, student inter-

prets the achieved result.

˗ LO2 – Student uses simple, well known mathematical objects.

˗ LO3 – Student chooses a mathematical object to the simple situation and estimates

the pertinence of model critically.

˗ LO4 – Student applies strategy which results clearly from the content of the task.

On the basis of principles of assessment published in [3], where for each learning

outcome the experts indicate items prepared for verifying its acquirement by students,

the crisp relation 1R between learning outcomes and items was prepared, so the value

of this relation ),(1 kjR for learning outcome ,j where ,4,...,1j and item ,k

where 25,...,1k , is equal to 1 if the experts decided that item k can verify the

acquirement of learning outcome ,j or it is equal to 0, otherwise. Note that

0 LOjLOi for each ji and each item.

Figure 1 presents the membership functions of the relation between learning out-

comes and items (bars show the value equal to 1).

Fig. 1. The membership functions of the relation between learning

outcomes LO1-LO4 and items 1-25.

The value ),(2 ikR of relation 2R between item ,k where ,25,...,1k and student

,i where 149,...,1i , is equal to 1 if the student answered this item correctly or it is

equal to 0, otherwise.

To calculate values of relation 3R between learning outcomes and students S-T-

composition is used.

Let us recall the definition of S-T composition [11]. Let ),(),,( yxyxR R

and ),(),,( zyzyP P be two relations with the membership functions R and

P . The S-T composition of these relations YXR and ZYP is a relation

ZXPR with the membership function defined as follows:

))),(),,(((),( zyyxTSzx PRYyPR .

In [9] there is shown that for educational purposes the algebraic T-norm and S-norm

are better than the most popular T-norm minimum and S-norm maximum, so we will

use S-T composition with algebraic T-norm and S-norm.

Table 1. Levels of acquirement of learning outcomes by students.

Students Learning outcomes

LO1 LO2 LO3 LO4

S1 – S9; S11 – S18 1 1 1 1

S10 1 1 1 0

In this case all these students acquired learning outcomes LO1 – LO4 on the level 1

except student S10 whose level of acquirement of leaning outcome LO4 was 0. Thus

we cannot distinguish between these students with the exception of S10 and we cannot

prepare the ranking.

3 Item Response Theory

According to the Item Response Theory (IRT) mathematical ability (understanding

mathematics and skills in solving tasks) is a latent trait which cannot be measured. This

theory describes the method of measuring this ability on the basis of results of the test

which was solved by the group of students [4, 8, 10].

According to the algorithm [10], the mathematical abilities described in learning

outcomes LO1-LO4 (all together) and then in these learning outcomes taken into ac-

count separately were calculated and put to Table 2.

Table 2. Levels of mathematical abilities according to IRT.


LO1-LO4 LO1 LO2 LO3 LO4

S1 3.15 2.64 1.01 0.78 0

S2 -1.15 2.64 0.66 0.78 1.83

S3 3.15 -0.89 1.41 2.35 0

S4 0.96 0.87 1.41 -0.75 1.83

S5 3.15 0.87 1.01 0.78 1.83

S6 0.96 2.64 1.41 -0.75 0

S7 3.15 -0.89 1.9 0.78 0

S8 0.96 0.87 0.66 2.35 1.83

S9 -1.15 2.64 1.01 -0.75 1.83

S10 0.96 2.64 1.41 0.78 -1.82

S11 3.15 0.87 1.9 -0.75 0

S12 3.15 -0.89 1.41 2.35 0

S13 0.96 -0.89 2.56 -0.75 0

S14 0.96 0.87 1.41 -0.75 1.83

S15 0.96 2.64 1.01 -0.75 1.83

S16 0.96 0.87 1.41 0.78 0

S17 3.15 2.64 0.66 0.78 1.83

S18 0.96 2.64 1.41 -0.75 0

On the basis of these values, the rankings of students were prepared and they are pre-

sented in Table 3.

Table 3. The rankings of students.

Position The basis for the ranking

LO1-LO4 LO1 LO2 LO3 LO4

1

S1, S3, S5,

S7, S11,

S12, S17

S1, S2,

S6, S9,

S10, S15,

S17, S18

S13 S3, S8,

S12

S2, S4, S5, S8,

S9, S14, S15,

S17

2

S4, S6, S8,

S10, S13,

S14, S15,

S16, S18

S4, S5,

S8, S11,

S14, S16

S7, S11

S1, S2,

S5, S7,

S10, S16,

S17

S1, S3, S6, S7,

S11, S12, S13,

S16, S18

3 S2, S9 S3, S7,

S12, S13

S3, S4, S6,

S10, S12,

S14, S16,

S18

S4, S6,

S9, S11,

S13, 14,

S15, S18

S10

4

S1, S5, S9,

S15

5

S2, S8, S17

.

The main problem with levels of mathematical abilities is too few values (too many

students achieved the same level of this ability) in order to prepare rankings and differ-

entiate students. For example, when we take into account all learning outcomes (LO1-

LO4) there are only 3 positions in the ranking, on the first position there are 7 students

(S1, S3, S5, S7, S11, S12, S17), on the second one there are 9 students (S4, S6, S8,

S10, S13, S14, S15, S16, S18) and on the third one there are 2 students (S2, S9).

The similar situation we encounter if we take as the basis of the ranking learning

outcomes LO1, LO3 and LO4 (there are only 3 positions) and only for learning outcome

LO2 there are 6 positions.

If we have more requirements according to the ranking, e.g. assume that learning

outcome LO1 is the most important, then LO2, LO3 and the least important learning

outcome is LO4, then we can distinguish students and using the lexicographical order

we can prepare the ranking of the students presented in Table 4.

Table 4. Rankings of students.

Position Student Position Student

1 S10 8 S4, S14

2 S6, S18 9 S5

3 S1 10 S8

4 S9, S15 11 S13

5 S2, S17 12 S7

6 S11 13 S3, S12

7 S16 14 –

.

.

Now we can differentiate students and prepare the ranking. Remembering that the most

important learning outcome was LO1, the students who were on the first position in the

ranking based on this learning outcome are on the first 5 positions, so we differentiated

them. It is interesting that student S10 who did not acquire learning outcome LO4 takes

the first position in the ranking. Students who took the third position in the ranking

based on learning outcome LO1 take positions 11-13.

4 First Type Fuzzification

Now, we will fuzzify the relation between learning outcomes and items by letting the

experts who define levels of verifying learning outcomes by items to use values from

the interval [0,1]. The membership functions of the relation between the given learning

outcome and items are presented in Fig. 2.

Fig. 2. The membership functions of the relation between learning

outcomes LO1-LO4 and items.

Now using the crisp relation between students and items and type 1 fuzzy relation be-

tween learning outcomes and items we can calculate, using the S-T composition, type

1 fuzzy relation between learning outcomes and students which values denote levels of

acquirement learning outcomes by students. The values of this relationship are pre-

sented in Table 5.

Table 5. Levels of acquirement of learning outcomes by students.


LO1 LO2 LO3 LO4

S1 0.79 0.68 0.67 0.66

S2 0.83 0.73 0.74 0.79

S3 0.63 0.71 0.73 0.62

S4 0.71 0.73 0.59 0.77

S5 0.72 0.74 0.68 0.79

S6 0.77 0.68 0.55 0.63

S7 0.56 0.69 0.58 0.60

S8 0.75 0.72 0.78 0.77

S9 0.80 0.72 0.61 0.78

S10 0.79 0.63 0.64 0.41

S11 0.67 0.69 0.52 0.63

S12 0.60 0.71 0.72 0.62

S13 0.55 0.66 0.52 0.57

S14 0.72 0.73 0.59 0.78

S15 0.80 0.72 0.61 0.78

S16 0.72 0.66 0.69 0.60

S17 0.81 0.74 0.70 0.79

S18 0.78 0.67 0.57 0.64

Now we can prepare rankings on the basis of levels of acquirement of each learning

outcome LO1 – LO4 (separately) by the students. However, since they should acquire

all learning outcomes, we prepare the ranking based on all of them using the lexico-

graphical order. Assume that learning outcome LO1 is the most important, then LO2,

LO3 and finally LO4. After using this information we can prepare ranking of students

in Table 6.

Table 6. Ranking of students.


1 S2 10 S14

2 S17 11 S16

3 S9, S15 12 S4

4 S1 13 S11

5 S10 14 S3

6 S18 15 S12

7 S6 16 S7

8 S8 17 S13

9 S5 18 –

At first we can notice that the students take different positions in the ranking, only two

students S9 and S15 have got the same position in the ranking. Now the best student is

S2 and the poorest student is S13.

This ranking shows that student S2 is the best one when learning outcome LO1 is

the most important one. Of course, if we choose the different order of importance of

learning outcomes, the ranking will be different. The possibility of preparing different

rankings according to specific criteria is really important for recruitment officers be-

cause they need candidates with specific abilities and skills.

Comparing the fuzzification and the IRT method we can see that the IRT takes into

consideration only the difficulties of items and the examinee’s abilities. Our method

enables to calculate the levels of learning outcomes’ acquirement taking into consider-

ation one, a few or all learning outcomes.

5 Second Type Fuzzification

Now, we fuzzify further the relation between learning outcomes and items by letting

the experts who defined levels of verifying learning outcomes define their own value

belonging to the interval [0,1]. Hence the sample membership functions of the relation

between the given learning outcome and items are presented in Fig. 3.

Fig. 3. The membership functions of the relation between learning out-

comes LO1-LO4 and items.

In order to prepare another ranking of students, we will use type 2 fuzzy relations

[11,13].

Let }6.1,...,0,,...,49.0,5.0{ A be the basic membership for all secondary

membership functions for 4,3,2,1j , 149,...,2,1i and 25,...,2,1k . Let kjm ,

and kjs , denote the average mean and standard deviation of values set by experts for

learning outcome j and item .k Let each secondary membership function of the rela-

tion between learning outcome and item be defined the Gauss function

)/)(exp(),,( ,

2

,1 kjkj smxkjx for each j , k and .Ax

Since students can earn 0 or 1 points, so the secondary membership functions of the

type 2 fuzzy relation between items and students can be the Gauss functions defined as

follows: )/)(exp(),,( ,

2

,2 kiki smxkix for each i , k and Ax , where

1,0, kim and 1.0, kis .

Now let the S-T composition between relations 1R and 2R be defined as follows:

...)),1,()1,,(1(1),,( 213 ixRjxRijxR )),25,()25,,(1( 21 ixRjxR

for each ,j i and x . After the S-T composition, the sample secondary membership

functions of the type 2 fuzzy relation between learning outcomes and students are pre-

sented in Fig. 4.

Student S1 Student S2

Fig. 4. The membership functions of the relation type 2 fuzzy relation

between learning outcomes LO1-LO4 and students.

The next step is to find the method of comparing the results of students on the basis of

the calculated secondary membership functions of the relation between learning out-

comes and students. Let the level of acquirement of learning outcome j by the student

i be equal to first coordinate ijx , of the maximal point of the secondary membership

function. Moreover, as we can see (Figure 3) some of the functions are “slim” and some

are “wide”. Hence we can assume that if the function is “slim”, so the likelihood of the

result is higher and when the function is “wide”, so the likelihood is smaller.

Thus the likelihood of the level of acquiring learning outcome j by student i , called

the range, is defined as follows:

)()(),( 12 aAaAijrange

where

5.0),,(min 131 ijaa Ax and .5.0),,(max 132 ijaa Ax

Thus for student i and learning outcome j we get the pair ( )).,(,, ijrangex ij Hence

we got the set of values of acquiring learning outcomes with their ranges. The pairs of

these values for learning outcomes LO1 and LO2 and all students are put to Table 8.

Table 8. Levels of acquirement of learning outcomes LO1 and LO2 with their ranges.

Students

Learning outcomes

LO1-

value

LO1-

range

LO2-

value

LO2-

range

S1 0.98 0.49 0.96 0.8

S2 0.98 0.49 0.97 0.78

S3 0.98 1.32 0.95 0.82

S4 0.98 1.26 0.97 0.79

S5 0.98 1.26 0.96 0.79

S6 0.98 0.49 0.97 0.81

S7 0.07 0.38 0.96 0.81

S8 0.86 0.32 0.96 0.79

S9 0.98 0.49 0.96 0.8

S10 0.98 0.49 0.95 0.81

S11 0.98 1.26 0.95 0.82

S12 0.07 0.38 0.95 0.79

S13 0.07 0.38 0.94 0.82

S14 0.98 1.26 0.96 0.79

S15 0.98 0.49 0.95 0.81

S16 0.86 0.32 0.97 0.81

S17 0.98 0.49 0.95 0.8

S18 0.98 0.49 0.97 0.81

To prepare the ranking we have to defuzzify the achieved results [11,4]. We assume

that the first value should be greater and for students who achieved the same first values,

the second one should be smaller. Thus on this basis we can prepare the ranking of

students according to each learning outcome but as in the previous sections, we will

present the ranking based on the lexicographic order assuming that learning outcome

LO1 is most important, then LO2, LO3 and LO4. Hence we achieve the following rank-

ing of students presented in Table 9.

Table 9. Ranking of students.

Position of students


1 S2 10 S5

2 S6, S18 11 S11

3 S9 12 S3

4 S1 13 S16

5 S17 14 S8

6 S15 15 S7

7 S10 16 S12

8 S4 17 S13

9 S14 – –

Comparing the rankings presented in Tables 6 and 9 we can notice that the first and last

positions are the same and position of other students are similar. However, using the

type 2 fuzzy relations we have more information because we can also describe the like-

lihood of acquirement of the learning outcomes by given students.

Even if this ranking did not differentiate students S6 and S18, it is not worse than

the previous rankings. Moreover, we have got more information about likelihood of

acquirement of learning outcomes.

6 Results

Nowadays rankings of students are prepared very often. For example, in Poland all

universities to admit students for the first year course prepare the ranking of candidates

based on the results of the final secondary school examination and the grades of specific

subjects. Universities could choose students based on the levels of acquirement of

learning outcomes, not only on the average mean. All students discussed in the paper

are on the same position after the first part of the examinations.

The paper presents different methods of preparing rankings based on the IRT algo-

rithm and three different level of fuzzification of the relation between learning out-

comes and items: crisp (two different position in the ranking), type 1 (we can prepare

the ranking using additionally the lexicographical order of levels of acquirement of

learning outcomes by students) and type 2 (the ranking based on the lexicographical

order of levels of acquirement of learning outcomes by students gives additionally the

information about likelihood of this acquirement).

The IRT method takes into account only difficulties of items and examinee’s abilities

described as learning outcomes. Moreover, when we prepared the ranking based on

examinee’s abilities of all discussed learning outcomes we have got only 3 positions.

The ranking prepared on the basis of the crisp relation between learning outcomes

and items had only 2 positions. After the first and second fuzzifications, we have got

the rankings with 17 positions and in the case of type 2 fuzzy relations we have got

more information (the likelihood of levels of learning outcomes acquirement).

The next step is to find more precise measure than range to calculate the likelihood

of acquirement learning outcomes by students, for example the area between x-axis and

Gauss functions.

References

1. European Higher Educational Area, Homepage, www.ehea.info, last accessed 2016/07/12.

2. Rozporządzenie Ministra Edukacji Narodowej z dnia 23 grudnia 2008 r. w sprawie

podstawy programowej wychowania przedszkolnego oraz kształcenia ogólnego w

poszczególnych typach szkół.

3. Zasady oceniania rozwiązań zadań, Egzamin maturalny w roku 2014/2015, Centralna

Komisja Egzaminacyjna, www.cke.edu.pl, last accessed 2016/07/12.

4. F.B. Baker, F.B.: The basics of Item Response Theory, ERIC Clearinghouse on Assessment

and Evaluation USA (2001).

5. Dobrosielski W.T., Szczepański J., Zarzycki H.: A Proposal for a Method of Defuzzification

based on the Golden Ratio – GR, CONFERENCE IWIFSGN, Cracow 2015, Novel Devel-

opments in Uncertainty Representation and Processing, pp. 75-84, Springer International

Publishing (2016).

6. Duch W., Wieczorek T., Biesiada J., Blachnik M.: Comparison of feature ranking methods

based on information entropy, Proc. of Int. Joint Conf. on Neural Networks (IJCNN), Buda-

pest 2004, IEEE Press, pp. 1415-142 0 (2004).

7. Duch W., Winiarski T., Biesiada J., Kachel, A.: Feature Ranking, Selection and Discretiza-

tion, Proc. Joint Int. Conf. on Artificial Neural Networks (ICANN) and Int. Conf. on Neural

Information Processing (ICONIP), Istanbul, pp. 251-254 (2003).

8. Hambleton R.K., Swaminathan H.: Item Response Theory, Principles and applications,

Springer Science+Business Media, LLC (1991).

http://www.ehea.info/

http://www.cke.edu.pl/

9. Mreła A., Sokołov O., Katafiasz T.: Types of fuzzy relations’ composition applied to vali-

dation of learning outcomes at mathematics during final high school examination, in: Mreła

A., Wilkoszewski P. (ed.): Nauka i technika u progu III tysiąclecia, Wydawnictwo

Kujawsko-Pomorskiej Szkoły Wyższej w Bydgoszczy, Bydgoszcz, pp.119-132 (2015).

10. Нейман Ю.М, Хлебников В.А.: Введение в теорию моделирования и параметризации

педагогических тестов, М., Прометей (2000).

11. Rutkowski R.: Metody i techniki sztucznej inteligencji, PWN, Warsaw (2009).

12. Zadeh L.A.: Fuzzy sets, Information and Control 8, pp.338-353 (1965).

13. Zadeh L.A.: The Concept of a Linguistic Variable and Its Application to Approximate Rea-

soning–1, Information Sciences, vol. 8, pp. 199–249 (1975).

Rankings of Students Based on Experts’ Assessment of ...ceur-ws.org/Vol-1844/10000289.pdf ·...

Documents

Transcript of Rankings of Students Based on Experts’ Assessment of ...ceur-ws.org/Vol-1844/10000289.pdf ·...