Testing in the time of crisis


description

BILC PROFESSIONAL SEMINAR, Stockholm, October 14-17, 2013. INNOVATIVE TEST DESIGNS AND FORMATS. Testing in the time of crisis. Lt. Col. F. Gratton. Summary: past situation (up to Sept 2013); new courses of action adopted; a do-it-all software; proposals.

Transcript of Testing in the time of crisis

  • Testing in the time of crisis
    BILC PROFESSIONAL SEMINAR, Stockholm, October 14-17, 2013
    INNOVATIVE TEST DESIGNS AND FORMATS
    Lt. Col. F. Gratton

  • Summary:
    Past situation (up to Sept 2013)
    New courses of action (COAs) adopted
    A do-it-all software
    Proposals

  • Stanag Proficiency Test 1.0: RECEPTIVE SKILLS (Listening & Reading)
    1: Multilevel test (levels 1 to 4)
    2: Multiple-choice questions (60 for L & R)
    3: No penalties for wrong answers
    4: Duration: R 105 min / L 90 min
    5: Separate sections (& levels)
    6: Number of correct answers multiplied by a coefficient
    7: Potential use of F factor

  • How STANAG levels were awarded (Stanag Proficiency Test 1.0), RECEPTIVE SKILLS
    Number of correct answers multiplied by a fixed coefficient (1.66)
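    The coefficient rule on this slide can be sketched in a few lines of Python. This is only an illustration: the function name is hypothetical, the 60-question total is taken from the earlier slide, and the slides do not say how the weighted score was then converted into a STANAG level, so that step is left out.

```python
# Minimal sketch of the 1.0 scoring rule: raw score times a fixed coefficient.
# Names are illustrative; only the coefficient and the question count
# come from the slides.

COEFFICIENT = 1.66      # fixed coefficient shown on the slide ("1,66")
TOTAL_QUESTIONS = 60    # multiple-choice questions per receptive-skill test


def weighted_score(correct_answers: int) -> float:
    """Return the number of correct answers multiplied by the coefficient."""
    if not 0 <= correct_answers <= TOTAL_QUESTIONS:
        raise ValueError("correct_answers must be between 0 and 60")
    return round(correct_answers * COEFFICIENT, 2)


if __name__ == "__main__":
    for correct in (30, 45, 60):
        print(correct, weighted_score(correct))
```

    Note that the result depends only on the total: it makes no difference which sections the correct answers come from.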

  • Proficiency Test 1.0: PRODUCTIVE SKILLS, SPEAKING (holistically assessed)
    1: Functional language assessed in a global manner
    2: Structured interview
    3: Tailored to the candidate
    4: Checks & probes
    5: 1 to 2 role plays

  • JFLT 1.0: PRODUCTIVE SKILLS, WRITING (holistically assessed)
    Three tasks (one for each level)

  • Summary:
    Past situation
    New COAs adopted
    A do-it-all software
    Proposals

  • New COAs adopted
    Specifications
    Cut-off score
    Joint Database

  • PURPOSE
    ADMINISTRATION PROCEDURES
    VALIDATION PROCEDURES
    TEST FORMAT
    LEVELS OF LINGUISTIC KNOWLEDGE
    TEST CONTENT

  • Stanag Proficiency Test 1.0: RECEPTIVE SKILLS (Listening & Reading)
    1: Multilevel test (levels 1 to 4)
    2: Multiple-choice questions (60 for L & R)
    3: No penalties for wrong answers
    4: Duration: R 105 min / L 90 min
    5: Separate sections (& levels)
    6: Number of correct answers multiplied by a coefficient
    7: Potential use of F factor

  • Stanag Proficiency Test 2.0: new key factors
    Each section is a mini-test (L & R)
    Plus levels
    Percentages

  • New COAs adopted
    Specifications
    Cut-off score
    Joint Database

  • JFLT 2.0: RDS (Listening & Reading)

    Section 1, STANAG level 1 (questions 1 to 15)

    No. of correct answers    Level awarded
    1-6                       0
    7-9                       0+
    10-15                     1

    Section 2, STANAG level 2 (questions 16 to 30)

    No. of correct answers    Level awarded
    1-6                       1
    7-9                       1+
    10-15                     2

    Section 3, STANAG level 3 (questions 31 to 45)

    No. of correct answers    Level awarded
    1-6                       2
    7-9                       2+
    10-15                     3

    Section 4, STANAG level 4 (questions 46 to 60)

    No. of correct answers    Level awarded
    1-6                       3
    7-9                       3+
    10-15                     4
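    The four cut-off tables follow one pattern, which can be written as a single lookup. The sketch below is illustrative only: the function name is hypothetical, and treating 0 correct answers like the 1-6 band is an assumption, since the tables start at 1.

```python
# Sketch of the per-section cut-off lookup from the JFLT 2.0 tables.
# Assumption (not on the slides): 0 correct answers falls in the lowest band.

QUESTIONS_PER_SECTION = 15


def section_band(section_level: int, correct: int) -> str:
    """Return the level awarded by one 15-question section.

    section_level is the STANAG level the section targets (1 to 4).
    """
    if section_level not in (1, 2, 3, 4):
        raise ValueError("section_level must be 1, 2, 3 or 4")
    if not 0 <= correct <= QUESTIONS_PER_SECTION:
        raise ValueError("correct must be between 0 and 15")
    if correct >= 10:                      # 10-15: the section's level
        return str(section_level)
    if correct >= 7:                       # 7-9: plus level below it
        return f"{section_level - 1}+"
    return str(section_level - 1)          # 1-6 (and 0): level below


if __name__ == "__main__":
    for level in (1, 2, 3, 4):
        print(level, [section_band(level, c) for c in (6, 7, 9, 10, 15)])
```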

  • EXAMPLE
    Level 1 (15 questions)
    Level 2 (15 questions)
    Level 3 (15 questions)
    Max score: 45
    All candidates answer 30 questions correctly
    With the old test they would all get the same score

  • Each level has 15 questions.

    CANDIDATE    CORRECT ANSWERS    LEVEL 1    LEVEL 2    LEVEL 3    FINAL LEVEL
    BIANCHI      30                 10         10         10         3
    ROSSI        30                 11         10         9          2+
    VERDI        30                 13         12         5          2
    GIALLI       30                 15         8          7          1+
    ARANCIONI    30                 15         6          9          1
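    The five candidates can be checked with a short sketch. The stopping rule used here (work upward through the sections and stop at the first one with fewer than 10 correct answers) is inferred from the example figures, not stated on the slides, and all names are hypothetical. The old weighted score (total times 1.66) is computed alongside for comparison.

```python
# Sketch: non-compensatory final level vs. the old coefficient score.
# The "stop at the first section not passed" rule is an inference from
# the example table, not a rule quoted from the slides.
from typing import Sequence

PASS_MARK = 10       # 10-15 correct: section level awarded
PLUS_MARK = 7        # 7-9 correct: plus level of the level below
COEFFICIENT = 1.66   # fixed coefficient of the old test


def final_level(section_scores: Sequence[int]) -> str:
    """Award the highest level sustained, section by section."""
    level = "0"
    for section_level, correct in enumerate(section_scores, start=1):
        if correct >= PASS_MARK:
            level = str(section_level)
            continue
        if correct >= PLUS_MARK:
            level = f"{section_level - 1}+"
        break  # later sections no longer count
    return level


def old_score(section_scores: Sequence[int]) -> float:
    """Old compensatory score: total correct answers times the coefficient."""
    return round(sum(section_scores) * COEFFICIENT, 2)


CANDIDATES = {
    "BIANCHI": (10, 10, 10),
    "ROSSI": (11, 10, 9),
    "VERDI": (13, 12, 5),
    "GIALLI": (15, 8, 7),
    "ARANCIONI": (15, 6, 9),
}

if __name__ == "__main__":
    for name, scores in CANDIDATES.items():
        print(name, old_score(scores), final_level(scores))
```

    Under these assumptions every candidate has the same old score, while the final levels range from 1 to 3, matching the table.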

  • New COAs adopted
    Specifications
    Cut-off score
    Joint Database

  • NEW LISTENING & READING ITEMS
    JDB (JOINT DATABASE)
    JOINT EFFORT
    HOW NOT TO MAKE JOINT EFFORTS

  • Test-writers involved
    Accustomed to military environment
    Language Testing Seminar qualified
    Norming sessions

  • THE JDB FLOW

  • TIMINGS (timeline chart: SEP, OCT, NOV, DEC 2012; JAN to AUG 2013)

  • Summary:
    Past situation
    COA (specs, JDB, cut-off score)
    A do-it-all software
    Proposals

  • WHAT'S THE DIFFERENCE?

  • PC-assessment-related terminology

    Term: definition (stakes)

    Assessment: any systematic method of obtaining evidence (through questions) for a purpose.
    Quiz: measures for the purpose of providing feedback to the student. (Low)
    Survey: to determine needs required to fulfill a defined purpose. (Low)
    Test: measures knowledge for the purpose of informing the student of their current level. (Medium)
    Exam: measures knowledge for the purpose of documenting the current level of knowledge. (High)

  • The software is used for:
    Needs analysis (surveys)
    Placement test
    Any training activity
    Assessment: first-level survey, post-course, pre-certification, certification

  • Software's system schema
    Create questions and organize them into tests using a Windows-based PC

  • Assessment definitions allow you to choose:
    Time limits
    Feedback to test-taker
    Styles (template)
    Jumps
    Question shuffling
    Instructions to test-takers
    (Diagram labels: Questions, Question Manager workstation; Assessment Definitions, Assessment Manager workstation; Assessment via browser)

  • Assessments are also created with the authoring manager by selecting questions previously created
    Any question can be chosen from the database
    (Diagram labels: Questions, Question Manager workstation; Assessment Definitions, Assessment Manager workstation; on a Windows PC; via browser)

  • Software's system schema
    Create questions and organize them into tests using a Windows-based PC
    Set security parameters, schedule assessments and link to other systems (Learning Management Systems)
    Assessments published using any browser, secure browsers, or a PC/Mac
    Result reports, CIA, graphs, gimmicks, you name it...

  • Types of Questions: Multiple Choice

  • Likert Scale (for questionnaires)

  • Essay Question
    Candidate can write free text in the space provided
    Testers will evaluate later

  • Summary:
    Past situation
    COA (specs, JDB, cut-off score)
    A do-it-all software
    Proposals

  • Testing in the time of crisis
    Wide project
    Sharing experience & capabilities
    Optimization of resources
    No alternative to B.A.T.

  • COMBINED DATABASE (CDB)
    A bilateral-based CDB
    or
    A multilateral-based CDB

  • FLOW, TIME SCHEDULE, SPECS (1, 2, 3)

  • Thank you
    [email protected] @sclingue.esercito.difesa.it

    Point 3 as a single construct (each level certifies a separate construct).

    The productive tests are left out because they do not change much.

    Too many numbers, and it highlights a major deficiency in the previous system, one that invalidated the test for all the years we used it. COMPENSATORY assessment (the coefficient system) on a proficiency scale must be NON-COMPENSATORY.

    When you mention REDS or RDS (even I did not know you had changed it), you must be ready to justify or explain why the Emerging band was omitted and how you established the RDS (remember that the REDS was the product of Angoff work over numerous sessions, plus statistics).

    1: Attributes the level based on the STANAG scale
    2: Clearly establishes the competence (nothing left to chance)
    3: Each level evaluated separately
    4: An internal scale within each level
    5: Time limit for Reading test: 120 min

    You will notice that, compared to the REDS system, there is no Emerging.

    The overarching idea was to create a common database that we would call the JDB, or Joint Database. Now, the main issue when undertaking a mutual effort is that everybody be on the same wavelength, otherwise the risk is that we ... push towards the same direction.

    So, we were lucky enough to count on people who have a common background:

    Having attended at least the LTS (Language Testing Seminar)
    Having an Academic

    Needs analysis (surveys): what do people want to learn?
    Placement test: helps to choose the course best suited to the learner's needs.
    Formative activities: provide feedback to support learning.
    Pre-course or entry tests: what is the current level of knowledge?
    Assessment:
    First-level survey (smile sheets): was it fun to learn?
    Post-course: how much has been absorbed?
    Pre-certification/practice: preparation for passing the certification test.
    Certification

    Use highlighting on this to show the flow chart.

    Start with Assessment Creation: write the questions and gather them into tests.
    Next is Administration: who takes the tests and when, how they are scheduled and how they are registered.
    Third is Question Delivery: the participants taking the test, and how we get the answers back from the student.
    Reporting is fourth: to participants, instructors and managers. Secure reporting, so managers can see what they are doing.
    This is a complete cycle, because you may need to change the questions or sessions after you look at the statistics. Stats are stored in the database, so they can be viewed at any time.

    Hopefully should be about half past the hour now.

    Highlight things on this page. Questions are in topics that are independent of sessions.

    Time limits: show a countdown timer; 5-10 minutes before the test is over, the test-taker gets a message saying time is almost up; when time is up, the software literally grabs the test out of their hands and it is closed automatically.
    Templates: control the look and feel of a session. Typically people will spend a lot of time on template design initially, then just finesse it as needed. Perception offers a template consulting service if you want.
    Jumps: adaptive testing. If they got 1 and 2 right, jump to 5; if not, jump to 3. Surveys can likewise customize the path through the survey based on responses.
    Golden questions: set up a single question, then randomize the rest so that the golden question will not show up.
    End procedures: things in score bands, such as where to go based on the score (branch to another test, a web site, send an email to the supervisor), what feedback to give, etc.
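    The jump rule mentioned in the notes (questions 1 and 2 both right: jump to 5; otherwise: jump to 3) can be sketched as plain Python. This is not the assessment software's own API or configuration format; the function and its argument are hypothetical and only illustrate the branching logic.

```python
# Sketch of the adaptive "jump" described in the notes. Illustrative only:
# this is plain Python, not how the assessment software is configured.
from typing import Mapping


def next_question(answers: Mapping[int, bool]) -> int:
    """Pick the question to show after questions 1 and 2.

    answers maps a question number to True if it was answered correctly.
    A missing entry is treated as a wrong answer (an assumption).
    """
    if answers.get(1, False) and answers.get(2, False):
        return 5   # both right: skip ahead
    return 3       # otherwise continue with question 3


if __name__ == "__main__":
    print(next_question({1: True, 2: True}))    # 5
    print(next_question({1: True, 2: False}))   # 3
```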

    Only one choice among many may be selected.

    Several paragraphs, up to 30,000 characters. The instructor sets up rubrics for scoring in advance (x points for spelling, content, etc.), then manually grades the essay and includes it in the score.