Lyle F. Bachman: Measurement (Chapter 2)

Chapter 2 Measurement

◊ Introduction

◊ Definition of terms: measurement, test, evaluation

♦ Measurement

◊ Quantification

◊ Characteristics

◊ Rules and procedures

♦ Test

♦ Evaluation

◊ Essential measurement qualities

◊ Properties of measurement scales

◊ Characteristics that limit measurement

◊ Steps in measurement

◊ Summary

Instructor: Professor Khoshsima

Presenter: Omidi, A

Sat 19/04/2014

Introduction

◊ In developing language tests, we must take into account considerations and follow procedures that are characteristic of tests and measurement in the social sciences in general. Likewise, our interpretation and use of the results of language tests are subject to the same general limitations that characterize measurement in the social sciences.

◊ The purpose of this chapter is to introduce the fundamental concepts of measurement, an understanding of which is essential to the development and use of language tests.

◊ Fundamental Concepts of Measurement

► The terms ‘measurement’, ‘test’, and ‘evaluation’ and how these are distinct from each other

► Different types of measurement scales and their properties

► The essential qualities of measures – reliability and validity

► The characteristics of measures limiting our interpretations of test results

Definition of terms: measurement, test, evaluation

◊ The terms ‘measurement’, ‘test’, and ‘evaluation’ are often used synonymously; indeed, they may, in practice, refer to the same activity. When we ask for an evaluation of an individual’s language proficiency, for example, we are frequently given a test score. This attention to the superficial similarities among these terms, however, tends to obscure the distinctive characteristics of each. An understanding of the distinctions among the terms is therefore vital to the proper development and use of language tests.

♦ Measurement

► Measurement in the social sciences is ‘the process of quantifying the characteristics of persons according to explicit procedures and rules’.

► This definition includes three distinguishing features: quantification, characteristics, and explicit procedures and rules.

(1) Quantification

Quantification involves assigning numbers, and this distinguishes measures from qualitative descriptions such as verbal accounts or nonverbal, visual representations. Non-numerical categories or rankings such as letter grades (‘A, B, C, …’) or labels (for example, ‘excellent, good, average, …’) may have the characteristics of measurement. When we actually use categories or rankings, we frequently assign numbers to them in order to analyze and interpret them.

(2) Characteristics

We can assign numbers to both physical and mental characteristics of persons. Physical attributes such as height and weight can be observed directly. In testing, however, we are almost always interested in quantifying mental attributes and abilities, sometimes called traits or constructs, which can only be observed indirectly. These mental attributes include characteristics such as aptitude, intelligence, motivation, field dependence/independence, attitude, native language, fluency in speaking, and achievement in reading comprehension. Whatever attributes or abilities we measure, it is important to understand that it is these attributes or abilities and not the persons themselves that we are measuring.

(3) Rules and procedures

The third distinguishing characteristic of measurement is that quantification must be done according to explicit rules and procedures. That is, the blind or haphazard assignment of numbers to characteristics of individuals cannot be regarded as measurement.

► In order to be considered a measure, an observation of an attribute must be replicable:

for other observers,

in other contexts,

and with other individuals.

♦ Test

Carroll’s (1968) definition of a test:

► A psychological or educational test is ‘a procedure designed to elicit certain behavior from which one can make inferences about certain characteristics of an individual’.

► As one type of measurement, a test necessarily quantifies characteristics of individuals according to explicit procedures.

► What distinguishes a test from other types of measurement is that it is designed to obtain a specific sample of behavior.

♦ Evaluation

Evaluation can be defined as the systematic gathering of information for the purpose of making decisions (Weiss, 1972).

► Evaluation is involved only when the results of tests are used as a basis for making a decision.

► One aspect of evaluation is the collection of reliable and relevant information.

► Evaluation does not necessarily entail testing.

► Tests in and of themselves are not evaluative.

► Tests are often used for pedagogical purposes, either as a means of motivating students to study or as a means of reviewing material taught.

► Tests may also be used for purely descriptive purposes.

Information-providing function of measurement versus decision-making function of evaluation

Essential measurement qualities

Reliability is a quality of test scores themselves. Validity is a quality of test interpretation and use.

Unlike physical attributes such as height, weight, voice pitch, and temperature, intrinsic attributes or abilities cannot be observed directly, and we therefore must establish our measurement scales by definition, rather than by direct comparison. The scales we define can be distinguished in terms of four properties:

1. A measure has the property of distinctiveness if different numbers are assigned to persons with different values on the attribute.

2. It is ordered in magnitude if larger numbers indicate larger amounts of the attribute.

3. The measure has equal intervals if equal differences between ability levels are indicated by equal differences in numbers.

4. The measure has an absolute zero point if a value of zero indicates the absence of the attribute.

Measurement specialists have defined four types of measurement scales – nominal, ordinal, interval, and ratio – according to how many of these four properties they possess.
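
The relationship between the four properties and the four scale types can be sketched in a few lines of Python. This is only an illustration: the names `PROPERTIES`, `Scale`, and `properties_of` are invented here, and the sketch assumes, as the scale descriptions suggest, that the properties accumulate in the order listed above.

```python
from enum import Enum

# The four properties named in the text, in cumulative order.
PROPERTIES = (
    "distinctiveness",
    "ordering",
    "equal intervals",
    "absolute zero point",
)


class Scale(Enum):
    """Scale types; each value is the number of properties the scale possesses."""

    NOMINAL = 1
    ORDINAL = 2
    INTERVAL = 3
    RATIO = 4


def properties_of(scale):
    """Return the properties of a scale, assuming they accumulate in order."""
    return PROPERTIES[: scale.value]


for scale in Scale:
    print(f"{scale.name.lower():8} -> {', '.join(properties_of(scale))}")
```

Encoding each scale as a count of properties keeps the hierarchy explicit: every scale type has all the properties of the types before it, plus one more.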

Properties of measurement scales

Nominal/categorical scale

A nominal scale comprises numbers that are used to ‘name’ the classes or categories of a given attribute. That is, we can use numbers as a shorthand code for identifying different categories.

Nominal scales possess the property of distinctiveness.

► A special case of a nominal scale is a dichotomous scale, in which the attribute has only two categories, such as ‘sex’ (male and female), or ‘status of answer’ (true and false) on some types of tests.

The attribute ‘native language’

native language background    number

Chinese    1

Bengali    2

French     3

Arabic     4

…          …
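
A short Python sketch may help show what numbers on a nominal scale can and cannot do. The codes follow the ‘native language’ example; the list of test takers is invented for illustration, and the variable names are hypothetical.

```python
from collections import Counter

# Arbitrary codes from the 'native language' example; any other set of
# distinct numbers would serve equally well.
NATIVE_LANGUAGE_CODES = {"Chinese": 1, "Bengali": 2, "French": 3, "Arabic": 4}

# Hypothetical group of test takers (invented data).
test_takers = ["Arabic", "Chinese", "Arabic", "French", "Bengali", "Arabic"]

coded = [NATIVE_LANGUAGE_CODES[language] for language in test_takers]
print(coded)  # [4, 1, 4, 3, 2, 4]

# Counting category membership is meaningful on a nominal scale.
most_common_code, frequency = Counter(coded).most_common(1)[0]
print(most_common_code, frequency)  # 4 3

# Averaging the codes runs without error but tells us nothing about the group:
# the numbers only name categories, they do not express amounts.
meaningless_mean = sum(coded) / len(coded)
```

The point of the last line is that the computation is possible but uninterpretable, because a nominal scale has distinctiveness and nothing more.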

Ordinal scale

An ordinal scale comprises the numbering of different levels of an attribute that are ordered with respect to each other.

The most common example of an ordinal scale is a ranking, in which individuals are ranked ‘first’, ‘second’, ‘third’, and so on, according to some attribute or ability.

The use of subjective ratings in language tests is another example of ordinal scales. The points or levels on an ordinal scale can be characterized as ‘greater than’ or ‘less than’ each other.

Ordinal scales possess the property of ordering in addition to the property of distinctiveness.

The attribute ‘speaking ability’

student rank

Ali 1st

Reza 2nd

Mehdi 3rd

… …

Interval scale

An interval scale is a numbering of different levels in which the distances, or intervals, between the levels are equal. That is, in addition to the ordering that characterizes ordinal scales, interval scales consist of equal distances or intervals between ordered levels. Thus they possess the properties of distinctiveness, ordering, and equal intervals.

Ratio scale

The distinguishing feature of a ratio scale is an absolute zero point.

The reason we call a scale with an absolute zero point a ratio scale is that we can make comparisons in terms of ratios with such scales.
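
Why ratios need an absolute zero can be seen in a small worked example. The temperature scales used here are an outside illustration, not an example from the chapter: Celsius has equal intervals but an arbitrary zero, while Kelvin has an absolute zero. The helper name `celsius_to_kelvin` is hypothetical.

```python
def celsius_to_kelvin(celsius):
    """Convert a Celsius reading to Kelvin (same interval size, absolute zero)."""
    return celsius + 273.15


cool_c, warm_c = 10.0, 20.0
cool_k, warm_k = celsius_to_kelvin(cool_c), celsius_to_kelvin(warm_c)

# Differences (intervals) agree on both scales.
print(warm_c - cool_c)            # 10.0
print(round(warm_k - cool_k, 2))  # 10.0

# Ratios do not: the Celsius ratio depends on an arbitrary zero point.
print(warm_c / cool_c)            # 2.0
print(round(warm_k / cool_k, 3))  # 1.035
```

Only the Kelvin ratio describes how much of the attribute is present, which is why a statement such as ‘twice as much’ is defensible on a ratio scale but not on an interval scale.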

Characteristics that limit measurement

As test developers and test users, we must know that:

► our tests are not perfect indicators of the abilities we want to measure, and

► test results must always be interpreted with caution.

The most valuable basis for keeping this clearly in mind can be found in:

► an understanding of the characteristics of measures of mental abilities and

► the limitations these characteristics place on our interpretation of test scores.

These limitations are of two kinds:

1. limitations in specification, and

2. limitations in observation and quantification

Limitations in specification

In any language testing situation, as with any non-test situation in which language use is involved, the performance of an individual will be affected by a large number of factors. The most important factor affecting test performance, with respect to language testing, is the individual’s language ability.

► In order to measure a given language ability, we must be able to specify what it is.

♦ This specification generally is at two levels:

(1) At the theoretical level, we need to specify the ability in relation to, or in contrast to, other language abilities and other factors that may affect test performance.

(2) At the operational level, we need to specify the instances of language performance that we are willing to interpret as indicators of the ability we wish to measure. This level of specification defines the relationship between the ability and the test score.

Limitations in observation and quantification

In addition to the limitations related to the underspecification of factors that affect test performance, there are characteristics of the processes of observation and quantification that limit our interpretations of test results. These derive from the fact that all measures of mental ability are necessarily indirect, incomplete, imprecise, subjective, and relative.

► Indirectness

► Incompleteness

► Imprecision

► Relativeness

► Subjectivity

Steps in measurement

Interpreting a language test score as an indication of a given level of language ability involves being able to infer, on the basis of an observation of that individual’s language performance, the degree to which the ability is present in the individual. The limitations discussed above restrict our ability to make such inferences.

A major concern of language test development is to minimize the effects of these limitations.

To accomplish this, the development of language tests needs to be based on a logical sequence of procedures linking the putative ability to the observed performance.

◊ This sequence includes three steps:

(1) Identifying and defining the construct theoretically;

(2) Defining the construct operationally;

(3) Establishing procedures for quantifying observations.

◊ The first step – defining constructs theoretically

The first step in the measurement of a given language ability is to distinguish the construct we wish to measure from other similar constructs by defining it clearly, precisely, and unambiguously.

This can be accomplished by determining what specific characteristics are relevant to the given construct.

♦ Two distinct approaches to defining language proficiency:

(1) The ‘real life’ approach: In this approach, language proficiency itself is not defined, but rather, a domain of actual, or ‘real life’ language use is identified that is considered to be characteristic of the performance of competent language users. Example: ILR oral proficiency interview, ACTFL oral proficiency interview

(2) The ‘interactional/ability’ approach: In this approach, language proficiency is defined in terms of its component abilities. Example: the functional framework of Halliday, or the communicative framework of Munby

◊ The second step – defining constructs operationally

This step enables us to relate the constructs we have defined theoretically to our observations of behavior. It involves determining how to isolate the construct and make it observable.

We must decide what specific procedures, or operations, we will follow to elicit the kind of performance that will indicate the degree to which the given construct is present in the individual.

The theoretical definition itself will suggest relevant operations.

The context in which the language testing takes place also influences the operations we would follow.

For an operational definition to provide a suitable basis for measurement, it must elicit language performance in a standard way, under uniform conditions.

◊ The third step – quantifying our observations

This step involves establishing procedures for quantifying or scaling our observations of performance.

◊ The primary concern in establishing scales for measuring mental abilities:

defining the units of measurement

◊ The units of measurement of language tests are typically defined in two ways:

(1) One way is to define points or levels of language performance or language ability on a scale.

(2) Another common way of defining units of measurement is to count the number of tasks successfully completed.
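
The second way of defining units, counting tasks successfully completed, can be sketched as a simple number-correct score. The answer key, the responses, and the function name `number_correct` are all invented for illustration; real scoring procedures may weight or combine tasks differently.

```python
# Hypothetical answer key and one test taker's responses (invented data).
answer_key = ["b", "d", "a", "a", "c"]
responses = ["b", "d", "c", "a", "c"]


def number_correct(responses, key):
    """Count the tasks answered correctly; each task counts as one unit."""
    if len(responses) != len(key):
        raise ValueError("responses and key must cover the same tasks")
    return sum(response == correct for response, correct in zip(responses, key))


score = number_correct(responses, answer_key)
print(score)  # 4
```

Applying the same explicit rule to every test taker is what makes the resulting number a measure rather than a haphazard assignment of numbers.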

Relevance of steps to the interpretation of test results

◊ The first step – defining constructs theoretically

Provides the basis for evaluating the validity of the uses of test scores.

◊ The second step – defining constructs operationally

The observed relationships among different measures of the same theoretical construct provide the basis for investigating concurrent relatedness.

◊ The third step – quantifying our observations

Is directly related to reliability.

Summary

► Fundamental measurement terms and concepts – measurement, test, and evaluation

► Four properties of measurement scales – distinctiveness, ordering, equal intervals, and an absolute zero point

► Four types of scales or levels of measurement – nominal, ordinal, interval, and ratio

► Essential qualities of measurement scales – reliability and validity

► Characteristics of measures of ability limiting our interpretation and use of test scores

► Fundamental steps in the development of tests in order to minimize the effects of the limitations and to maximize the reliability of test scores
