Transcript of Language Testing (Leeds)1

Page 1: Language Testing (Leeds)1

Language Testing

Liu Jianda

Page 2: Language Testing (Leeds)1

Syllabus

It is expected that, by the end of this module, participants should be able to do the following:

Understand the general considerations that must be addressed in the development of new tests or the selection of existing language tests;

Make their own judgements and decisions about either selecting an existing language test or developing a new language test;

Familiarise themselves with the fundamental issues, approaches, and methods used in measurement and evaluation;

Design, develop, evaluate and use language tests in ways that are appropriate for a given purpose, context, and group of test takers;

Understand the future development of language testing and the application of IT to computerized language testing.

Page 3: Language Testing (Leeds)1

Syllabus

In order to achieve these objectives, the module gives participants the opportunity to develop the following skills:

writing test items
collecting test data and conducting item analysis
evaluating language tests with regard to validity and reliability

This is done by considering a wide range of issues and topics related to language testing. These include the following:

General concepts in language testing and evaluation
Evaluation of a language test: reliability and validity
Communicative approach to language testing
Design of a language test
Item writing and item analysis
Interpreting test results
Item response theory and its applications
Computerized language testing and its future development

Page 4: Language Testing (Leeds)1

Class Schedule

1 Basic concepts in language testing
2 Test validation: reliability and validity (1)
3 Test validation: reliability and validity (2)
4 Test construction (1)
5 Test construction (2)
6 Test construction (3)
7 Test construction (4)
8 Test construction (5)
9 Test construction (6)
10 Rasch analysis (1)
11 Rasch analysis (2)
12 Language testing and modern technology

Page 5: Language Testing (Leeds)1

Assessment

One 5,000–6,000 word paper on language testing

Collaborative work: You’ll be divided into groups of four to complete the development of a test paper. Each of you will be responsible for one part of the test paper, but each part should contribute equally to the whole test paper. Therefore, besides developing your own part, you need to come together to discuss the whole test paper in terms of reliability and validity.

Page 6: Language Testing (Leeds)1

Course books

Bachman, L. F. & Palmer, A. (1996). Language Testing in Practice. Oxford: Oxford University Press.

Brown, J. D. (1996). Testing in Language Programs. Upper Saddle River, NJ: Prentice Hall Regents.

Li, X. (1997). The Science and Art of Language Testing. Changsha: Hunan Educational Press.

McNamara, T. (1996). Measuring Second Language Performance. London; New York: Longman.

Website: http://www.clal.org.cn/personal/testing/Leeds

Page 7: Language Testing (Leeds)1

Session 1

Basic concepts in language testing

Page 8: Language Testing (Leeds)1

1. A short history of language testing

Spolsky (1978) classified the development of language testing into three periods, or trends:

the prescientific period
the psychometric/structuralist period
the integrative/sociolinguistic period

Page 9: Language Testing (Leeds)1

The prescientific period

grammar-translation approaches to language teaching
translation and free composition tests
difficult to score objectively
no statistical techniques applied to validate the tests
simple, but unfair to students

Page 10: Language Testing (Leeds)1

The psychometric-structuralist period

audio-lingual and related teaching methods
objectivity, reliability, and validity of tests considered
measure discrete structure points
multiple-choice format (standardized tests)
follow scientific principles, have trained linguists and language testers

Page 11: Language Testing (Leeds)1

The integrative-sociolinguistic period

Communicative competence

Chomsky’s (1965) distinction between competence and performance. Competence: an ideal speaker-listener’s knowledge of the rules of the language; performance: the actual use of language in concrete situations.

Hymes’s (1972) proposal of communicative competence: the ability of native speakers to use their language in ways that are not only linguistically accurate but also socially appropriate.

Canale & Swain’s (1980) framework of communicative competence:

Grammatical competence, mastery of the language code such as morphology, lexis, syntax, semantics, phonology;

Sociolinguistic competence, mastery of appropriate language use in different sociolinguistic contexts;

Discourse competence, mastery of how to achieve coherence and cohesion in spoken and written communication

Strategic competence, mastery of communication strategies used to compensate for breakdowns in communication and to enhance the effectiveness of communication.

Page 12: Language Testing (Leeds)1

The integrative-sociolinguistic period

Bachman’s (1990) framework of communicative language ability:

Language competence (Canale & Swain’s grammatical, sociolinguistic, and discourse competence):
organizational competence: grammatical competence, textual competence
pragmatic competence: illocutionary competence, sociolinguistic competence

Strategic competence: performs assessment, planning, and execution functions in determining the most effective means of achieving a communicative goal

Psychophysiological mechanisms: characterize the channel (auditory, visual) and mode (receptive, productive)

Page 13: Language Testing (Leeds)1

The integrative-sociolinguistic period

Oller’s (1979) pragmatic proficiency test:
temporally and sequentially consistent with the real-world occurrences of language forms
linking to a meaningful extralinguistic context familiar to the testees

Clark’s (1978) direct assessment: approximating the testing context to the real world to the greatest extent

Cloze test and dictation (Yang, 2002b)

Communicative testing, or testing communicatively

Page 14: Language Testing (Leeds)1

The integrative-sociolinguistic period

Performance tests (Brown, Hudson, Norris, & Bonk, 2002; Norris, 1998)

Not discrete-point in nature

Integrating two or more of the language skills of listening, speaking, reading, and writing, and other aspects like cohesion and coherence, suprasegmentals, paralinguistics, kinesics, pragmatics, and culture

Task-based: essays, interviews, extensive reading tasks

Page 15: Language Testing (Leeds)1

Performance Tests

Three characteristics. The task should:

be based on needs analysis (What criteria should be used? What content and context? How should experts be used?)
be as authentic as possible, with the goal of measuring real-world activities
sometimes have collaborative elements that stimulate communicative interactions
be contextualized and complex
integrate skills with content
be appropriate in terms of number, timing, and frequency of assessment
be generally non-intrusive, that is, be aligned with the daily actions in the language classroom

Page 16: Language Testing (Leeds)1

Performance Tests

Raters should be appropriate in terms of:
number of raters
overall expertise
familiarity and training in use of the scale

The rating scale should be based on appropriate:
categories of language learning and development
breadth of information regarding learner performance abilities
standards that are both authentic and clear to students

To enhance the reliability and validity of decisions as well as accountability, performance assessments should be combined with other methods for gathering information (e.g. self-assessments, portfolios, conferences, classroom behaviors, and so forth).

Page 17: Language Testing (Leeds)1

Development graph (Li, 1997: 5)

Page 18: Language Testing (Leeds)1

2. Theoretical issues

Language testing is concerned with both content and methodology.

Page 19: Language Testing (Leeds)1
Page 20: Language Testing (Leeds)1

Development since 1990

Communicative language testing (Weir, 1990)

Reliability and validity

Social functions of language testing

Page 21: Language Testing (Leeds)1

Ethical language testing

Washback (impact) (Qi, 2002; Wall, 1997)
impact: effects of tests on individuals, policies or practices within the classroom, the school, the educational system or society as a whole
washback: effects of tests on language teaching and learning

Ways of investigating washback:
analyses of test results
teachers’ and students’ accounts of what takes place in the classroom (questionnaires and interviews)
classroom observation

Ethics of test use
use with care (Spolsky, 1981: 20)
codes of practice

Professionalization of the field
training of professionals
development of standards of practice and mechanisms for their implementation and enforcement

Critical language testing
placing language testing within its social context

Page 22: Language Testing (Leeds)1

Factors affecting the performance of examinees (diagram): the test score is influenced by the examinee’s communicative language ability, test method facets, personal attributes, and random factors.

Page 23: Language Testing (Leeds)1

Development since 1990

Testing interlanguage pragmatic knowledge
currently at the research level
focus on method validation
web-based test by Roever

Computerized language testing
item banking
computer-assisted language testing
computerized adaptive language testing: test items are adapted to the individual; the test ends when the examinee’s ability has been determined; testing time is much shorter (see the sketch after this list)
web-based testing
PhonePass testing
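The adaptive logic listed above can be sketched in a few lines. The following Python fragment is a hypothetical illustration, not the algorithm of any particular testing system: each new item is chosen to match the current ability estimate, the estimate is updated after every response, and the test stops once the estimate has stabilised.

import random
from math import exp

item_bank = sorted(random.uniform(-3, 3) for _ in range(50))   # hypothetical item difficulties

def run_cat(true_ability, max_items=20):
    theta, step = 0.0, 1.0                       # starting ability estimate and step size
    remaining = list(item_bank)
    for _ in range(max_items):
        item = min(remaining, key=lambda b: abs(b - theta))    # item best matched to theta
        remaining.remove(item)
        p_correct = exp(true_ability - item) / (1 + exp(true_ability - item))
        correct = random.random() < p_correct    # simulate the examinee's response
        theta += step if correct else -step      # move the estimate toward the evidence
        step *= 0.7                              # shrink the step as evidence accumulates
        if step < 0.1:                           # estimate has stabilised: stop early
            break
    return theta

print(round(run_cat(true_ability=1.2), 2))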

Page 24: Language Testing (Leeds)1

Development since 1990

Language testing and second language acquisition (Bachman & Cohen, 1998)

Help to define the construct of language ability

Use findings of language testing to test hypotheses in SLA

Provide SLA researchers with tests and standards of testing

Page 25: Language Testing (Leeds)1

Development of research methodology

Factor analysis

The main applications of factor analytic techniques are: (1) to reduce the number of variables and (2) to detect structure in the relationships between variables, that is, to classify variables. Therefore, factor analysis is applied as a data reduction or structure detection method.
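As a concrete illustration of these two applications, the Python sketch below (hypothetical data; scikit-learn assumed to be available) simulates six subtest scores driven by two latent abilities and recovers the loading structure with factor analysis.

import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(0)
n_learners = 200

# Simulate two latent abilities and six observed subtest scores loading on them.
ability = rng.normal(size=(n_learners, 2))
loadings = np.array([[0.9, 0.1], [0.8, 0.2], [0.7, 0.1],    # mainly factor 1
                     [0.1, 0.9], [0.2, 0.8], [0.1, 0.7]])   # mainly factor 2
scores = ability @ loadings.T + rng.normal(scale=0.3, size=(n_learners, 6))

# Two factors are enough to reproduce the correlations among the six subtests.
fa = FactorAnalysis(n_components=2).fit(scores)
print(np.round(fa.components_.T, 2))    # estimated loadings: one row per subtest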

Page 26: Language Testing (Leeds)1

Generalizability theory (Bachman, 1997; Bachman, Lynch, & Mason, 1995)

Estimating the relative effects of different factors on test scores (facets)

The most generalizable indicator of an individual’s language ability is the universe score. In the real world, however, we can only obtain scores from a limited sample of measures, so we need to estimate the dependability of a given observed score as an estimate of the universe score.

Page 27: Language Testing (Leeds)1

Two stages are involved in applying G-theory to test development

G-study

The purpose is to estimate the effects of the various facets in the measurement procedure (usually conducted in pretesting).

e.g. persons (differences in individuals’ speaking ability), raters (differences in severity among raters), tasks (differences in difficulty of tasks)

Two-way interactions:
task x rater: different raters are rating the different tasks differently
person x task: some tasks are differentially difficult for different groups of test takers (a source of bias)
person x rater: some raters score the performance of different groups of test takers differently (an indication of rater bias)

Page 28: Language Testing (Leeds)1

Two stages are involved in applying G-theory to test development

D-study

The purpose is to design an optimal measure for the interpretations or decisions that are to be made on the basis of the test scores (estimation of dependability).

Generalizability coefficient (G coefficient) provides an estimate of the proportion of an individual’s observed score that can be attributed to his or her universe score, taking into consideration the effects of the different conditions of measurement specified in the universe of generalization. But it is appropriate for norm-referenced tests.

For criterion-referenced tests, use phi coefficient. (GENOVA)
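As an illustration of the two stages, the numpy sketch below (hypothetical ratings; a simple persons x raters crossed design) estimates the G-study variance components from mean squares and then computes, for a chosen number of raters in the D-study, the G coefficient for norm-referenced decisions and the phi coefficient for criterion-referenced decisions.

import numpy as np

# Hypothetical ratings: rows = persons (speaking performances), columns = raters.
X = np.array([[4, 5, 4],
              [2, 3, 2],
              [5, 5, 4],
              [3, 3, 3],
              [4, 4, 5],
              [1, 2, 2]], dtype=float)
n_p, n_r = X.shape
grand = X.mean()
p_mean = X.mean(axis=1)                  # person means
r_mean = X.mean(axis=0)                  # rater means

# Mean squares for the two-facet crossed design (one observation per cell).
ms_p  = n_r * ((p_mean - grand) ** 2).sum() / (n_p - 1)
ms_r  = n_p * ((r_mean - grand) ** 2).sum() / (n_r - 1)
resid = X - p_mean[:, None] - r_mean[None, :] + grand
ms_pr = (resid ** 2).sum() / ((n_p - 1) * (n_r - 1))

# G-study: variance components for persons, raters, and interaction/error.
var_pr = ms_pr
var_p  = max((ms_p - ms_pr) / n_r, 0.0)  # negative estimates are set to zero
var_r  = max((ms_r - ms_pr) / n_p, 0.0)

# D-study: dependability when generalizing over n_raters raters.
n_raters = 2
g_coef   = var_p / (var_p + var_pr / n_raters)            # relative (norm-referenced)
phi_coef = var_p / (var_p + (var_r + var_pr) / n_raters)  # absolute (criterion-referenced)
print(round(g_coef, 3), round(phi_coef, 3))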

Page 29: Language Testing (Leeds)1

Item response theory (Rasch model)

It enables us to estimate the statistical properties of items and the abilities of test takers so that these are not dependent upon a particular group of test takers or a particular form of a test. It is widely used in large-scale standardized tests.
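The Rasch model’s item response function can be written in a few lines of Python (hypothetical ability and difficulty values): the probability of a correct response depends only on the difference between person ability and item difficulty, which is the basis of the sample-independence claimed above.

from math import exp

def rasch_p(theta, b):
    """Probability that a person of ability theta answers an item of difficulty b correctly."""
    return exp(theta - b) / (1.0 + exp(theta - b))

for theta in (-1.0, 0.0, 1.0):                               # three test takers
    probs = [rasch_p(theta, b) for b in (-1.0, 0.0, 1.0)]    # three items
    print(theta, [round(p, 2) for p in probs])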

Page 30: Language Testing (Leeds)1

Structural equation modeling (Kunnan, 1998)

A combination of multiple regression, path analysis and factor analysis

Attempts to explain a correlation or a covariance data matrix derived from a set of observed variables; latent variables are responsible for the covariance among the measured variables.

Page 31: Language Testing (Leeds)1

Basic procedures in SEM (Example from Purpura, 1998)

Examine the relationships between strategy use and second language test performance.

Design two questionnaires for cognitive strategies and metacognitive strategies (40 items)

Ask respondents to answer the questionnaires
Respondents take a foreign language test
Cluster the 40 items to measure several variables
Compute the reliability of the variables
Conduct factor analysis to identify factors
Conduct SEM analysis (AMOS, EQS, LISREL)
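The step “compute the reliability of the variables” is often done with an internal-consistency estimate such as Cronbach’s alpha; the Python sketch below (hypothetical questionnaire data, not Purpura’s actual procedure) shows the calculation for one cluster of items.

import numpy as np

def cronbach_alpha(items):
    """items: respondents x items matrix of questionnaire responses."""
    k = items.shape[1]
    item_var = items.var(axis=0, ddof=1).sum()     # sum of item variances
    total_var = items.sum(axis=1).var(ddof=1)      # variance of the total score
    return (k / (k - 1)) * (1 - item_var / total_var)

rng = np.random.default_rng(1)
trait = rng.normal(size=(150, 1))                          # latent strategy use
responses = trait + rng.normal(scale=0.7, size=(150, 5))   # five related questionnaire items
print(round(cronbach_alpha(responses), 2))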

Page 32: Language Testing (Leeds)1
Page 33: Language Testing (Leeds)1

Qualitative method

Verbal report (think-aloud, introspective)

Observation
Questionnaires and interviews
Discourse analysis

Page 34: Language Testing (Leeds)1

3. Classification of language tests

According to families:
Norm-referenced tests
Criterion-referenced tests

Page 35: Language Testing (Leeds)1

Norm-referenced tests

Measure global language abilities (e.g. listening, reading, speaking, writing)

Score on a test is interpreted relative to the scores of all other students who took the test

Normal distribution

Page 36: Language Testing (Leeds)1

Normal Distribution

http://stat-www.berkeley.edu/~stark/Java/NormHiLite.htm
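To make the relative (norm-referenced) interpretation concrete, the Python sketch below uses hypothetical class scores and converts each raw score into a z-score and an approximate percentile rank under the normality assumption illustrated at the link above.

import statistics
from math import erf, sqrt

raw_scores = [52, 61, 47, 70, 58, 65, 55, 49, 63, 60]   # hypothetical class scores

mean = statistics.mean(raw_scores)
sd = statistics.stdev(raw_scores)                        # sample standard deviation

def percentile_from_z(z):
    """Cumulative probability of the standard normal at z, as a percentage."""
    return 50.0 * (1.0 + erf(z / sqrt(2.0)))

for score in raw_scores:
    z = (score - mean) / sd
    print(f"raw={score}  z={z:+.2f}  percentile={percentile_from_z(z):.1f}")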

Page 37: Language Testing (Leeds)1

Norm-referenced tests

Students know the format of the test but do not know what specific content or skill will be tested

A few relatively long subtests with a variety of question contents

Page 38: Language Testing (Leeds)1

Criterion-referenced tests

Measure well-defined and fairly specific objectives
Interpretation of scores is considered absolute, without referring to other students’ scores
Distribution of scores need not be normal
Students know in advance what types of questions, tasks, and content to expect on the test
A series of short, well-defined subtests with similar question contents

Page 39: Language Testing (Leeds)1

According to decision purposes

Proficiency tests
Placement tests
Achievement tests
Diagnostic tests

Page 40: Language Testing (Leeds)1

Proficiency tests

Test students’ general levels of language proficiency

The test must provide scores that form a wide distribution so that interpretations of the differences among students will be as fair as possible

Can dramatically affect students’ lives, so slipshod decision making in this area would be particularly unprofessional

Page 41: Language Testing (Leeds)1

Placement tests

Group students of similar ability levels (homogeneous ability levels)

Help decide what each student’s appropriate level will be within a specific program

Right tests for right purposes

Page 42: Language Testing (Leeds)1

Achievement tests

About the amount of learning that students have done

The decision may involve who will be advanced to the next level of study or which students should graduate

Must be designed with specific reference to a particular course

Criterion-referenced, conducted at the end of the program

Used to make decisions about students’ levels of learning; meanwhile, can be used to effect curriculum changes and to test those changes continually against program realities

Page 43: Language Testing (Leeds)1

Diagnostic tests

Aimed at fostering achievement by promoting strengths and eliminating the weaknesses of individual students

Require more detailed information about the very specific areas in which students have strengths and weaknesses

Criterion-referenced, conducted at the beginning or in the middle of a language course

A test can be diagnostic at the beginning or in the middle of a course but serve as an achievement test at the end

Perhaps the most effective use of a diagnostic test is to report the performance level on each objective (in a percentage) to each student so that he or she can decide how and where to invest time and energy most profitably

Page 44: Language Testing (Leeds)1

Formative assessment vs. summative assessment

Formative: a judgment of an ongoing program used to provide information for program review, identification of the effectiveness of the instructional process, and the assessment of the teaching process

Summative: a terminal evaluation employed in the general assessment of the degree to which the larger outcomes have been obtained over a substantial part of or all of a course. It is used in determining whether or not the learner has achieved the ultimate objectives for instruction which were set up in advance of the instruction.

Page 45: Language Testing (Leeds)1

Public examinations vs. classroom tests

Purpose: proficiency vs. achievement (placement, diagnostic)

Format: standardized vs. open (objective vs. subjective)

Scale: large-scale vs. small-scale (self-assessment)

Scores: normality, backwash