Transcript of Thom Kiddle & Eaquals members, Assessing Oral Proficiency

Speaking test formats and task types
Anthea Wilson, Head of Test Production, Trinity College London
Belinda Steinhuber, Head of Language Education Department, CEBS, Austria

Eaquals members' meeting, Florence 2016

©Eaquals 06/08/2014
Agenda
I. Speaking test formats and task types
II. Construction and validation of criteria
III. Standardisation and monitoring practices
1. Speaking test formats and task types
Beyond the examiner-led interview:
• What formats can we use to assess speaking?
• What demands do different task types place on candidates?
• What are the implications for reliable assessment?
Why use other formats?
• focus on communicative competence
• make use of more authentic tasks and situations
• include a greater variety of communicative functions
• widen the scope of task types
• action-oriented approach
Activity
Watch the video and complete the table for the three task types:
• Trinity ISE II Collaborative Task (B2)
• CEBS Plurilingual Task (English B2 / French B1)
• Group discussion task (B1)
ISE II Collaborative task

For the next part, I'll tell you something. Then, you have to ask me questions to find out more information and make comments. You need to keep the conversation going. After four minutes, I'll end the conversation. Are you ready?
My nephew’s school has just announced that all the students might have to learn three foreign languages. I’m not sure this is a good idea.
Plurilingual Task English and French

PARTICIPANTS
• 1 Examiner for English
• 1 Examiner for the second foreign language (e.g. French)
• 1 Candidate, in interaction with both examiners

TIME FRAME
• Preparation: minimum 30 min.
• Exam: 12–15 min.
  • Interaction: 8–10 min.
  • Individual Long Turn: 4–5 min.
Rubric in German; input mostly in German:
• Situation
• Task: Long Turn
• Task: Interaction
Plurilingual Task English and French
TOPIC: Health and Nutrition
Situation
Your school is particularly involved in various activities encouraging a healthy lifestyle. Your class has organized a meeting with students and teachers from other countries who are also interested in implementing projects in this field.
Plurilingual Task English and French
Interaction
Following the presentation you carry on a conversation with the visiting teachers in which you discuss the possibility of working together on interscholastic projects.
• Present examples of activities or projects at your school which promote a healthy and active lifestyle (input 2).
• Inquire about similar activities at the schools of your foreign visitors.
• Discuss the possibilities of a joint project.
Development of marking criteria
Tim Goodier, Head of Academic Development, Eurocentres
www.eaquals.org
• Introduction to Eurocentres' 'RADIO' task-oriented assessment
• Interpreting CEFR Table 3 and other relevant sources to form profile categories and maximise pragmatic validity
• Practical considerations for scaled criteria and issues informing the update for EAP
• A sample from Eurocentres standardisation materials & criteria for spoken assessment
‘RADIO’ Task orientation
How RADIO fits

A continuum, not categories with fixed boundaries:
• Teacher-centred: focus on forms; Present, Practice, Produce (PPP)
• Fluency-centred: planned focus on form; 'free practice', role plays, communicative drills, grammar games
• Meaning-centred: focus on task, incidental focus on form; case studies, decision tasks, consensus tasks, simulations
• Natural approach

RADIO = a task-oriented approach for fluency & assessment.

R.A.D.I.O. = R: Range, A: Accuracy, D: Delivery, I: Interaction, O: Organisation
R.A.D.I.O. – group task rationale

R.A.D.I.O. group tasks follow three distinct stages:
Phase 1: Collaboration. Students work in small groups (2–4) to organise the task, reach a consensus/conclusion and prepare their report. (planning)
Phase 2: Exchange. Groups are remixed in order to report their findings/conclusions. (report)
Phase 3: Discussion. Groups discuss either (a) the best solution or (b) discussion questions related to the task topic. (discussion)
Distilling a workable profiling scheme (R, A, D, I + O):
• Impression (holistic/global)
• Analysis (R, A, D & I)
• Considered judgement
Table 3 of the CEFR (plus the Phonology scale):
Range – Accuracy – Fluency – Interaction – Coherence – Pronunciation

RADIO categories:
Range – Accuracy – Delivery – Interaction – Overall

R + A + D: Overall Spoken Production
R + I: Overall Spoken Interaction
Certificate Profile (SP & SI)
Assessor descriptors at 10 levels (including CEFR plus levels)
[Illustration: assessor descriptors spanning A2+, B1 (CEFR Table 3) and B1+]
Key considerations for ongoing update and revision
1. Draw on validated sources, and colour-code a 'master' for future reference
2. Use bulleted clusters rather than boxed paragraphs

e.g. blue = CEFR, purple = IELTS public descriptors, black = original RADIO, green = EAQUALS, bold = paraphrased from the source.
(Accuracy)
Maintains a high degree of grammatical accuracy. Error-free sentences are frequent. Some inappropriate word choice and occasional minor slips but few significant errors. Uses paraphrase effectively.

(Delivery)
Speaks confidently and spontaneously in clear, smoothly-flowing speech. Descriptions and arguments are easy to follow. Can vary intonation and place sentence stress appropriately. Speech is clear and intelligible throughout.
Adaptation to include presentation task types for EAP
Range – Accuracy – Delivery – Interaction – Organisation

R + A + D: Overall Spoken Production
R + I or R + O: Overall Spoken Interaction
Certificate Profile (SP & SI)
The ‘Interaction’ and ‘Organisation’ columns both contain the SAME descriptors for argumentation (B1+ to C2).
Structuring planned speaking to achieve a communicative objective with an audience
RADIO Grades
• Based on CEFR Table 2, distinguishing between spoken interaction and spoken production
• In R.A.D.I.O.:
  • Spoken Interaction = an average of range and interaction
  • Spoken Production = an average of range, accuracy and delivery
• Half grades possible, but only full grades on the certificate profile
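The averaging rule just described can be sketched in a few lines of Python. The slides do not specify which direction half grades round when producing the full certificate grades, so rounding halves upward is an assumption here, as are the illustrative input grades.

```python
import math

def to_half(x: float) -> float:
    # Round to the nearest half grade; exact quarters round up (assumption).
    return math.floor(x * 2 + 0.5) / 2

def radio_profile(range_, accuracy, delivery, interaction):
    """Aggregate R.A.D.I.O. criterion grades as described in the slides."""
    spoken_interaction = to_half((range_ + interaction) / 2)
    spoken_production = to_half((range_ + accuracy + delivery) / 3)
    return {
        "spoken_interaction": spoken_interaction,
        "spoken_production": spoken_production,
        # Only full grades appear on the certificate profile;
        # rounding half grades up is an assumption, not stated in the slides.
        "certificate": {
            "spoken_interaction": math.floor(spoken_interaction + 0.5),
            "spoken_production": math.floor(spoken_production + 0.5),
        },
    }

profile = radio_profile(range_=7, accuracy=6, delivery=6.5, interaction=8)
```

With these invented grades the assessor record would show Spoken Interaction 7.5 and Spoken Production 6.5, while the certificate profile carries only the rounded full grades.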
R.A.D.I.O. – Grading a spoken sample

We will now listen to a speaking sample. Then, look at the mid-to-high-level descriptors (5–9) and think about what score you might give each speaker.

Rainer (left), Marco (centre) and Andreas (right) will talk about whether sport is bad for relationships and marriage.

First think about who you think is lower/higher in level.
Rainer
A relaxed communicator. Can initiate discourse and take his turn when appropriate. Can link his utterances into a coherent contribution. He has a sufficient range of language to express viewpoints without much searching for words, even though many of his utterances show a strong influence from German in both formulation and pronunciation. He cannot be said to show a relatively high degree of grammatical or lexical control. Communicates with reasonable accuracy in familiar contexts; generally good control, though with noticeable mother-tongue influence. Errors occur, but it is clear what he is trying to express.
Eaquals International Conference, 21 – 23 April 2016
Marco
Good interaction skills, and able to produce stretches of language with a fairly even tempo, although he can be hesitant. A generally coherent speaker with some impressive turns of phrase given the narrowness of his linguistic base. Weak on accuracy, with many past-tense and word-order mistakes, and he tends not to elaborate his contributions. Appeared to improve in the course of the activity.
Andreas
Clearly meets all the B2 criteria on Range, Accuracy, Fluency, Interaction and Coherence. A very controlled, conscious performance showing considerable language awareness for this level. He always gets his point across effectively, though the performance is very self-conscious and a little laboured at times.
Meets the level of accuracy described for B2+ but does not consistently maintain the high degree of accuracy seen at C1, and the hesitancy he showed launching himself into both description and discussion indicates he does not meet the C1 criterion in the area of Delivery.
Alternatives to theory-driven oral assessment criteria grids
Thom Kiddle, Director, NILE (Norwich Institute for Language Education)

Eaquals Members Meeting, Florence, November 2016
“[Theory-driven] approaches generate impoverished descriptions of communication, while performance data-driven approaches have the potential to provide richer descriptions that offer sounder inferences from score meaning to performance in specified domains.”
Fulcher et al (2011)
Potential problems with theory-driven assessment criteria
• "Reification of ordered scale descriptors" (Fulcher et al, 2011)
• Standardisation with abstract concepts
• May not relate to specific task demands
• Encourages the 'halo effect'
Halo effect
Try this experiment from Nobel prize winner Daniel Kahneman:
On the next page, you will see descriptions of two people. Read the descriptions and decide which person you view more favourably…
Halo effect
Alan is: intelligent – industrious – impulsive – critical – stubborn – envious
Ben is: envious – stubborn – critical – impulsive – industrious – intelligent
Implications
What implications might this have for traditional criteria grid models?
Fulcher et al (2011) propose Performance Decision Trees to incorporate specific reference to data obtained from successful performance on a task (and as a way to include 'indigenous' criteria).
You bought the product and had the problems shown in the video. Record a voicemail message for the manager of the shop, stating:
- What you bought
- What the problems were
- What you would like them to do about it
You should speak for at least one minute.
Lexical resource (theory-driven)
Manages to talk about familiar and unfamiliar topics but uses vocabulary with limited flexibility; attempts to use paraphrase but with mixed success.

Has enough language to get by, with sufficient vocabulary to express him/herself with some hesitation and circumlocution on topics such as family, hobbies and interests, work, travel, and current events.
Lexical resource (data-driven)
Is able to describe the sequence of events using time/sequence markers. Has sufficient resource to describe two specific problems, either with individual accurate lexis or 'placeholder names' ('thing', 'stuff', 'kind of'). Has specific lexis to refer to future action and desired outcome/response.

Can sequence events using, for example, earlier today / this morning / when I got home / after washing. Can identify concrete nouns and problems using, for example, jeans / washing machine / shrunk / ripped / a hole. Can make demands using, for example, money back, refund, replacement, return.
Challenges with data-driven approach
• Need for different descriptors for different tasks?
• Need for piloting with 'known masters' to obtain data?
• Need for detailed task familiarity among raters?
• Need to establish parallels between task demands?
• Need to relate to external frameworks?
TestDaF: The development of standardisation and monitoring practice for raters
Claudia Pop, TestDaF-Institut, g.a.s.t. e.V., Germany
Content

1. Why standardise?
2. The TestDaF – Test of German as a Foreign Language
3. Rater trainings
4. Conclusion
1. Why standardise?
2. The Test of German as a Foreign Language (TestDaF)
• Designed for international students applying for entry to an institution of higher education in Germany
• Measures German language proficiency at an intermediate to high level (B2.1 to C1.2)
• Developed, scored and evaluated at the TestDaF Institute in Germany
• Can be taken in the applicant's home country
• Administered worldwide since 2001
• High-stakes setting
2. The TestDaF

37,881 participants in 2015 – an increase of 18.8 per cent from 2014 to 2015; more than 257,000 participants since 2001.

Participants per year:
2001: 1,190 | 2002: 3,582 | 2003: 7,498 | 2004: 8,982 | 2005: 11,052 | 2006: 13,554 | 2007: 15,389 | 2008: 16,882 | 2009: 18,059 | 2010: 18,528 | 2011: 21,374 | 2012: 24,261 | 2013: 27,166 | 2014: 31,898 | 2015: 37,881
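The growth figures quoted on the slide can be checked with a few lines of Python, using the yearly participant numbers from the chart above:

```python
# Yearly TestDaF participant numbers, as shown on the slide.
participants = {
    2001: 1_190, 2002: 3_582, 2003: 7_498, 2004: 8_982, 2005: 11_052,
    2006: 13_554, 2007: 15_389, 2008: 16_882, 2009: 18_059, 2010: 18_528,
    2011: 21_374, 2012: 24_261, 2013: 27_166, 2014: 31_898, 2015: 37_881,
}

# Year-on-year growth 2014 -> 2015, quoted as 18.8 per cent.
growth = (participants[2015] - participants[2014]) / participants[2014]

# Cumulative total since 2001, quoted as "more than 257,000".
total = sum(participants.values())

print(f"growth: {growth:.1%}, total: {total:,}")  # growth: 18.8%, total: 257,296
```

Both quoted figures check out against the chart data.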
Our Test Centres worldwide (TestDaF / TestAS / onDaF-onSET test centres):
• Germany: 218 / 149 / 169
• Europe, Russian Federation, Turkey: 356 / 230 / 460
• Asia: 59 / 45 / 57
• Australia, New Zealand, Oceania: 5 / 2 / 4
• Africa: 19 / 14 / 21
• America: 41 / 24 / 66
2. The TestDaF
Quality cycle: Development → Administration → Scoring → Statistical analysis → Customer service / transparency of information
2. The TestDaF: Development
• Standardized format
• Training and guidelines for item writers
• Extensive trialling procedures for each test version
Item and task development → trialling → piloting → test okay? If no: revision (and renewed trialling); if yes: ready to go.
2. The TestDaF: Administration
• Administration in licenced test centres
• Training and monitoring for test administrators
• Detailed security instructions and procedures
• Inspections
2. The TestDaF: Scoring
• Training of raters
• Monitoring
• Calibration materials
• Regular evaluation of rater behaviour
Originally 2 test dates per year, with a calibration/training session before each test date.
Then 6 test dates per year (4+2), with calibration materials separated from rater trainings.
From now on: 9 test dates per year (6+3).
This has meant modifications in the standardisation process for raters.
3. The TestDaF: Rater Trainings

[Chart: number of trained raters, 2010–2016, as of 10/2016]
[Chart: number of initial trainings and re-trainings per year, 2010–2016]
3. The TestDaF: Rater Trainings – Initial trainings, goals:
• Explaining construct and format
• Introducing the TestDaF criteria and the rating procedure
• Operationalizing the process and criteria: rating of performances and group discussion
• Raising awareness of rater effects
• Explaining the statistical procedures of quality assurance (MFR analysis)
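As a loose illustration of what evaluating rater behaviour can involve, the sketch below estimates each rater's severity as the mean signed deviation of their scores from the per-performance consensus. This is a deliberate simplification: TestDaF uses many-facet Rasch (MFR) analysis, which this is not, and the raters, performances and scores here are invented.

```python
from statistics import mean

# Invented data: scores[rater][performance] = band awarded.
scores = {
    "rater_a": {"p1": 4, "p2": 3, "p3": 5},
    "rater_b": {"p1": 5, "p2": 4, "p3": 5},
    "rater_c": {"p1": 3, "p2": 3, "p3": 4},
}

performances = ["p1", "p2", "p3"]

# Consensus score for each performance: the mean across raters.
consensus = {p: mean(r[p] for r in scores.values()) for p in performances}

# Severity: mean signed deviation from consensus.
# Clearly negative = harsh rater, clearly positive = lenient rater.
severity = {
    rater: mean(s[p] - consensus[p] for p in performances)
    for rater, s in scores.items()
}
```

Flagging raters whose deviation drifts beyond a threshold is one simple way such monitoring can feed back into the re-training cycle described below.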
3. The TestDaF: Rater Trainings – Initial trainings, modifications:
• Since 2008: e-learning unit to be completed before the actual 2-day training session
• Since 2009: presentation slot on practical and logistical procedures
• Since 2013: successful individual rating as a condition to be contracted
3. The TestDaF: Rater Trainings – Re-trainings, goals:
• Recollecting the goal (construct)
• Individual rating of performances and group discussion
• Discussing external effects
• Giving updates about the TestDaF-Institut
• Further training on chosen topics
• Giving the opportunity to meet "the others" – "rating is a lonely job"
3. The TestDaF: Rater Trainings – Re-trainings, modifications:
• Since 2013: re-trainings are led by specially trained senior raters
• Re-trainings take place across Germany
• Preparation weekend in January of each year
3. The TestDaF: Rater Trainings – Follow-up problem: raters feel they are losing contact with the TestDaF staff.

Since 2016: online consultation hours (Vitero team room), in each assessment phase, separately for Writing and Speaking.
Summing up

• Calibration session / calibration material
• Rater trainings: initial rater trainings and re-trainings
• Online consultation hours
Conclusion

• Calibration material
• Initial rater trainings
• Re-trainings
• Consultation hours
• Online training
Standardisation – a practical example in a lowish-stakes context
Emma Heyderman, Director of Education, Lacunza - IH
Our journey
• about us
• the now
• and the future
English & French:
• 5
• 11
• 5,500 (70:30)
• 3 hrs / wk
• 110
• 30
• That they get off to a good start with English, becoming familiar with the language in an enjoyable atmosphere and acquiring the study habits they will use in the future.

• If the Lacunza pathway is followed, by the end of secondary school your child's level will be C1 mastery of the language.
Continuous assessment of:
ATTITUDE | ATTENDANCE | PUNCTUALITY
Speaking, Listening, Structure, Vocabulary, Writing

• A-B: Performance above expected level
• C: 'On track'
• D-E: Needs improvement
Speaking
• Students are placed in level in September
• Their speaking performance is assessed:
  • informally through activities in class
  • formally through at least three assessed speaking tasks per year
• Teachers use our own Speaking & Writing Assessment Handbook
And next?
• complete the training course
• but consider the implications for:
  • teaching and learning (How do these clips inform our reflections on our teaching and our students' learning and/or performance?)
  • evaluation and assessment (How do these clips inform the decisions we make about evaluation and assessment?)
Thank you!