Language Testing - Universität Bielefeld · Language Testing Types of Tests. Why do we want to...

Language Testing

Types of Tests

Why do we want to test?

- Assess
- Evaluate
- Test

Assessment of

- people
- processes
- products
- environments

Types of tests (test techniques): What comes to mind first?

- Multiple choice
- True-false
- Fill-in-the-blank
- Short answer
- Essay questions
- Dictation
- Cloze and C-Test
- Summary

What other types of tests can you think of?

- Listening comprehension test
- Composition test
- Oral interview test
- Reading comprehension test
- Translation
- Guided composition, summary

Tests are one form of assessment, but other forms include:

- Observations of language performance
- Portfolio assessment
- Peer assessment
- Self assessment

Qualitative Assessment

Traditional:
- Oral interview
- Essay/composition
- Translation

Portfolio assessment:
- Personal context
- Learner empowerment
- Life-long learning
- Individual record of achievement, progress
- Criticism (creative and repetitive)

Flexible assessor:
- self-assessment
- peer-assessment
- other-assessment

Testing and assessment fundamentally involve:

- Collecting learner data
- Analyzing learner data
- Using the results of the data analysis to make interpretations about learners’ language abilities

Thus, types of tests are really types of learner data

Most common types of learner data

- Observations of spontaneous language behavior, e.g. in the classroom
- Elicited experimental (or highly structured and controlled) data
- Elicited clinical (or unstructured) data
- Elicited metalingual judgments
- Self-report data, e.g. learner diaries

What’s the best kind of test? = What’s the best kind of data?

- All types of data are useful, but none tell the whole story
- The more types of data you use, the more confident you can be in your interpretations

Test types can also be distinguished according to their purposes

- Research
- Diagnosis
- Placement, including entrance tests
- Achievement
- Proficiency
- Aptitude

Classic goal-based test categories

Proficiency test:
- relative to a given standard, often a qualification test
- frequently very large scale

Classic goal-based test categories

The test itself should be tested in terms of
- reliability: does it achieve consistent results on various occasions?
- validity: does it test the level and abilities which it purports to test?
- objectivity: do different testers come to the same conclusions?
- face validity is desirable (but not necessary)
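One common way to put numbers on two of these criteria is to correlate scores from two sittings of the same test (reliability) and to count how often two raters give the same mark (objectivity). The following is a minimal Python sketch of that idea, not a procedure taken from the slides; all scores and names are hypothetical.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation between two equally long score lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical scores of five learners on two sittings of the same test.
first_sitting = [12, 15, 9, 18, 14]
second_sitting = [13, 14, 10, 17, 15]

# Hypothetical marks given by two raters to the same five essays.
rater_a = [3, 4, 2, 5, 4]
rater_b = [3, 4, 3, 5, 4]

# Reliability estimate: how similar are the two sittings?
test_retest = pearson_r(first_sitting, second_sitting)

# Objectivity estimate: share of essays on which both raters agree exactly.
exact_agreement = sum(a == b for a, b in zip(rater_a, rater_b)) / len(rater_a)

print(f"test-retest correlation: {test_retest:.2f}")
print(f"exact rater agreement:   {exact_agreement:.2f}")
```

A correlation close to 1 suggests consistent results across occasions. Validity cannot be computed this way, since it depends on what the test is supposed to measure.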

Achievement test:
- relative to teaching goals
- sometimes very large scale, sometimes very small scale
- also tested in terms of reliability and validity

Diagnostic test:
- aims to give feedback on performance
- based on error detection and error analysis, e.g. grammar vs. vocabulary; or: the verb phrase vs. the noun phrase

Aptitude Tests

- Aptitude tests are structured, systematic ways of evaluating how people perform on tasks or react to different situations
- They attempt to predict possible success, e.g. the American SAT (Scholastic Aptitude Test), which is used as an entrance-level qualification for college/university studies

Aptitude Tests

They are characterized by standardized methods of administration and scoring, with the results quantified and compared with how others have done on the same tests.

Classic form-based test categories

Written:
- objective test (results automatically gradable)
  - multiple choice test
  - cloze (gap-filling) test
- guided test
  - translation
  - guided essay
- free test
  - open answer
  - essay

Oral:
- interview
- recording-based
  - transcription
  - repetition
  - response (Q+A, gap-filling, ...)

Multiple choice

Parameters: questions vs. incomplete statements
Parameter values:
- a subset of possible parameter values (answers) is provided
- one is correct; the others are "distractors"

What causes night and day?
A. The earth spins on its axis.
B. The earth moves around the sun.
C. Clouds block out the sun's light.
D. The earth moves into and out of the sun's shadow.
E. The sun goes around the earth.

(Source: P. M. Sadler, "Psychometric Models of Student Conceptions in Science," Journal of Research in Science Teaching, 1998, vol. 35, no. 3, pp. 265-296.)

The correct answer is "A"; the other answers are so-called distractors.

If the distractors are (in some sense) equally plausible and randomly ordered, then
- the probability of obtaining a correct answer by guessing is 1/n, where n is the number of values provided (i.e. the correct answer plus distractors)
- thus: if there are 4 equally plausible answers, the probability of guessing correctly is 1/4 = 0.25.
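The 1/n rule can be extended from a single item to a whole test. The Python sketch below assumes, beyond what the slide states, that items are answered independently, so the number of lucky guesses follows a binomial distribution. The test length and pass mark are hypothetical.

```python
from math import comb

def p_guess(n_options):
    """Probability of guessing one item correctly: 1/n."""
    return 1 / n_options

def p_at_least(k_correct, n_items, n_options):
    """Probability of at least k_correct right answers by pure guessing
    on n_items independent items (binomial distribution)."""
    p = p_guess(n_options)
    return sum(
        comb(n_items, k) * p**k * (1 - p) ** (n_items - k)
        for k in range(k_correct, n_items + 1)
    )

# Hypothetical test: 20 items, 4 options each, pass mark of 10 correct.
n_items, n_options, pass_mark = 20, 4, 10

expected_by_guessing = n_items * p_guess(n_options)
p_pass_by_guessing = p_at_least(pass_mark, n_items, n_options)

print(p_guess(n_options))       # 0.25
print(expected_by_guessing)     # 5.0
print(f"{p_pass_by_guessing:.4f}")
```

With these assumed values a pure guesser averages 5 correct answers, and the chance of reaching the pass mark by guessing alone is small. Fewer options or a lower pass mark raise that chance quickly.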

Multiple choice: example

● Example: The phonemic transcription of sourdough in Southern Educated British English is

1. /sordu/
2. /sado/
3. /sad/
4. /sardf/

The probability of guessing the correct transcription of sourdough is 1/4 = 0.25

Multiple choice: pros and cons....

Pro:
- reasonably easy for the teacher to construct
- easy for the teacher to correct
- easy to evaluate statistically
- easy for the student to understand the procedure

....Multiple choice: pros and cons

Con:
- tests written knowledge only (unless acoustic media are used)
- tests receptive knowledge only
- tests metalinguistic knowledge rather than performance
- if the distractors are not carefully selected, the probability of guessing correctly may be high
- tends to test knowledge rather than understanding

Cloze Tests....

- A gap-completion task
- Written
- Many different kinds of gap possible:
  - letters
  - parts of words (morphs)
  - words
  - word sequences

....Cloze Tests

Gap selection procedure may be
- random or systematic
  - e.g. gaps every 4-10 words
  - hard words may be filtered out
  - or specific word types may be filtered
- manual or computerized
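A computerized, systematic gap selection can be sketched in a few lines of Python. This is only an illustration under assumed parameters: the function name `make_cloze`, the gap interval `every`, and the starting index `start` are hypothetical, and no filtering of hard words or word types is attempted.

```python
import re

def make_cloze(text, every=7, start=3):
    """Replace every `every`-th word, beginning at word index `start`,
    with a numbered gap. Returns the gapped text and the answer key."""
    tokens = text.split()
    answers = []
    for i in range(start, len(tokens), every):
        # Split the word from trailing punctuation so commas and
        # full stops stay visible after the gap.
        match = re.match(r"^(\w[\w'-]*)(\W*)$", tokens[i])
        if match is None:
            continue  # skip tokens that do not start with a letter or digit
        word, trailing = match.groups()
        answers.append(word)
        tokens[i] = f"({len(answers)}) ______{trailing}"
    return " ".join(tokens), answers

passage = ("Ten years ago representatives from 178 nations met in Rio "
           "to plan how to protect the world's resources.")

gapped, answers = make_cloze(passage, every=5, start=4)
print(gapped)
print(answers)  # ['from', 'Rio', 'protect']
```

A random procedure would draw the gap positions with a random number generator instead of using a fixed interval.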

Modified cloze (UBI Entrance Test)

Ten years ago representatives from 178 nations met in Rio to plan how to protect the world's resources. Pledges we____ given t__ safeguard ecosy______, reduce glo____-warming ga______, and pro________ humanity thr______ sustainable devel________. Last ye____ world lea_______, scientists a___ activists m___ again. The agenda: to check whether Rio had changed the world.
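The gaps in the passage above appear to follow a regular pattern: starting with the second word of a sentence, every second word loses its second half. A minimal Python sketch of that pattern follows, assuming the rule "keep the first half of the letters, rounded down". The function names are illustrative, and each gap here is as long as the deleted letters, which need not match the gap lengths printed above.

```python
import re

def mutilate(word):
    """Keep the first half of a word (rounded down) and blank out the rest."""
    if len(word) < 2:
        return word  # one-letter words are left intact
    keep = len(word) // 2
    return word[:keep] + "_" * (len(word) - keep)

def make_c_test(sentence):
    """Blank the second half of every second word, starting with the second word."""
    words_seen = 0

    def replace(match):
        nonlocal words_seen
        words_seen += 1
        word = match.group(0)
        return mutilate(word) if words_seen % 2 == 0 else word

    # Only runs of letters count as words; punctuation is left untouched.
    return re.sub(r"[A-Za-z]+", replace, sentence)

print(make_c_test("The agenda: to check whether Rio had changed the world."))
# The age___: to ch___ whether R__ had cha____ the wo___.
```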

What is in a test? Some criteria

Technical:
- comparison of two performances

Personal:
- interaction between examiner and examinee

Institutional:
- qualification: license to participate in certain activities

Psychological:
- performance under highly constrained conditions

Linguistic:
- Language understanding and language production
- Metalinguistic activity

Finally, test types can be distinguished by scoring approach

- Norm-referenced
- Criterion-referenced

Norm-referenced versus Criterion-referenced

- Norm-referenced tests are scored according to how well each person does in relation to the mean (or average) score on the test, e.g. national standard reading tests in the U.S.
- Criterion-referenced tests are scored according to predetermined criteria, so that a person’s score is not affected by how well everyone else does on the test.
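The difference between the two scoring approaches can be made concrete with a small Python sketch. It assumes z-scores as the norm-referenced measure and a fixed cut score as the criterion. Both choices, as well as the names and scores, are hypothetical illustrations rather than anything specified on the slides.

```python
from statistics import mean, pstdev

# Hypothetical raw scores (out of 50) for a small group of test takers.
scores = {"Ana": 41, "Ben": 35, "Chen": 29, "Dana": 35, "Emre": 20}

def norm_referenced(scores):
    """Express each score as a z-score: its distance from the group
    mean, measured in standard deviations of the group."""
    values = list(scores.values())
    mu = mean(values)
    sigma = pstdev(values)
    return {name: (raw - mu) / sigma for name, raw in scores.items()}

def criterion_referenced(scores, cut_score):
    """Judge each score against a fixed cut score, ignoring the group."""
    return {name: raw >= cut_score for name, raw in scores.items()}

z_scores = norm_referenced(scores)
passed = criterion_referenced(scores, cut_score=30)

# Adding one very strong candidate changes everyone's z-score,
# but leaves every pass/fail decision unchanged.
bigger_group = {**scores, "Farah": 50}
z_scores_bigger = norm_referenced(bigger_group)
passed_bigger = criterion_referenced(bigger_group, cut_score=30)

print(round(z_scores["Ana"], 2), round(z_scores_bigger["Ana"], 2))
print(passed["Ana"], passed_bigger["Ana"])
```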

Norm-referenced versus Criterion-referenced

- Proficiency tests tend to be norm-referenced.
- Most other types of tests tend to be criterion-referenced.

Testers do not assume that a test score is a person’s true score

- There is always a margin of error in a test, so the true score (the person’s true ability) may be somewhat higher or lower than the test score.
- The amount by which the true score may differ from the test score is calculated from the test’s overall reliability and the standard deviation of its scores.
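In classical test theory this margin is usually expressed as the standard error of measurement, SEM = SD * sqrt(1 - reliability). The slides do not give a formula, so the Python sketch below is an illustration of that standard definition with hypothetical numbers, and the function names are assumptions.

```python
from math import sqrt

def standard_error_of_measurement(sd, reliability):
    """SEM = SD * sqrt(1 - reliability), with reliability between 0 and 1."""
    return sd * sqrt(1 - reliability)

def score_band(observed, sd, reliability, z=1.96):
    """Range in which the true score is expected to lie; z=1.96
    corresponds to roughly 95% confidence under a normal error model."""
    sem = standard_error_of_measurement(sd, reliability)
    return observed - z * sem, observed + z * sem

# Hypothetical test: score SD of 10 points, reliability of 0.91.
sem = standard_error_of_measurement(sd=10, reliability=0.91)
low, high = score_band(observed=70, sd=10, reliability=0.91)

print(f"SEM: {sem:.1f}")
print(f"95% band for an observed score of 70: {low:.1f} to {high:.1f}")
```

The higher the reliability, the narrower the band: a perfectly reliable test (reliability 1) would have an SEM of 0.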

Mastery of Reliability

- Expert testers and testing companies have mastered the art of test reliability.
- However, the more important question of test validity remains a challenge for everyone, both novice and expert.

Analogies

- How would you test whether a person knows the capital cities of all 50 states?
- How would you test a person’s ability to play tennis?
- How would you test whether a person can tie his/her shoe?
- How would you test whether a person can build a cabinet?
