Automated Identification of Preposition Errors
Joel Tetreault, Educational Testing Service
ECOLT, October 29, 2010
Outline
• Computational Linguistics (CL) and Natural Language Processing (NLP)
• NLP at ETS (automated scoring)
• Automated Preposition Error Detection
Linguistics
D’oh!
Computational Linguistics
D’oh!
Want computers to understand language
Computational Linguistics vs. NLP
• Computational Linguistics (CL):
  – Computers understanding language
  – Modeling how people communicate
• Natural Language Processing (NLP):
  – Applications on the computer side
  – “Natural” refers to languages spoken by people (English, Swahili), as opposed to artificial languages (C++)
  – Takes CL theories and implements them as tools
• CL and NLP are often conflated
Computational Linguistics Space
• Computer Science: learning algorithms
• Linguistics: formal grammars
• Psychology: human processing modeling
[Figure: CL at the intersection of these three fields]
Computational Linguistics Space
[Figure: CL as a part of Artificial Intelligence]
Intelligent Machines:
• Perfect speech recognition
• Perfect language understanding
• Perfect speech synthesis
• Perfect discourse modeling
• Intention recognition
• World knowledge
• (Vision)
Real World Applications of NLP
• Spelling and grammar error detection/correction
  – MS Word, e-rater
• Machine translation
  – Google Translate, Bing Translator
• Opinion mining
  – Extracting the sentiment of a demographic from blogs and social media
• Speech recognition and synthesis
• Automatic document summarization
NLP at ETS: Motivation
• Millions of GRE and TOEFL tests taken each year
• Tests are moving toward more natural assessment
  – Fewer multiple-choice questions
  – Tests have an essay component
• Problem:
  – Thousands of raters required
  – Costly and time-consuming
NLP at ETS
• Use NLP techniques to automatically score essays (e-rater)
• Other scoring tools that use NLP:
  – Criterion: online writing feedback
  – SpeechRater: automatic speaking assessment
  – C-Rater: content scoring of short answers
  – Plagiarism detection
E-rater (Automated Essay Scoring)
• First deployed in 1999 for the GMAT Writing Assessment
• Operational for the GRE and TOEFL, as well as a collection of smaller assessments
• System performance (5-point essay scale):
  – E-rater/human agreement: 75% exact; 98% exact or adjacent (within 1 point)
  – Comparable to the agreement between two human raters
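Exact agreement counts identical scores. The "+1 adjacent" figure also counts pairs of scores that differ by a single point. The following is a minimal sketch, assuming integer scores on the 5-point scale. The function name `agreement` and the sample scores are hypothetical illustrations, not ETS data.

```python
def agreement(machine: list[int], human: list[int]) -> tuple[float, float]:
    """Return (exact, exact_or_adjacent) agreement rates for paired integer scores."""
    if len(machine) != len(human) or not machine:
        raise ValueError("need two non-empty score lists of equal length")
    n = len(machine)
    exact = sum(m == h for m, h in zip(machine, human)) / n
    # Adjacent agreement: the two scores differ by at most one point.
    adjacent = sum(abs(m - h) <= 1 for m, h in zip(machine, human)) / n
    return exact, adjacent

# Hypothetical scores for eight essays (illustrative only).
erater = [3, 4, 2, 5, 3, 4, 1, 3]
rater = [3, 4, 3, 5, 2, 4, 1, 5]
print(agreement(erater, rater))  # → (0.625, 0.875)
```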
E-rater (Automated Essay Scoring)
• A large collection of 50+ weighted features, organized into 5+ high-level features
• Each feature is computed by a module:
  – Simple: a collection of manual rules and/or regular expressions
  – More complex: a statistical NLP (Natural Language Processing) system behind the feature
• Features are combined using linear regression
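Combining features with linear regression means learning one weight per feature so that a weighted sum approximates the human score. The following is a minimal least-squares sketch in plain Python. e-rater's actual features and weights are not given here, and `fit_linear`, `predict` and all numbers are hypothetical. The targets were constructed to follow score = 1 + 0.5·f1 + 2·f2 exactly, so the fit should recover those weights.

```python
def fit_linear(X: list[list[float]], y: list[float]) -> list[float]:
    """Ordinary least squares via the normal equations (X^T X) w = X^T y.

    A column of ones is prepended, so w[0] is the intercept.
    """
    rows = [[1.0] + [float(v) for v in x] for x in X]
    k = len(rows[0])
    # Augmented matrix [X^T X | X^T y].
    a = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * t for r, t in zip(rows, y))]
         for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda row: abs(a[row][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(k):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [v - factor * p for v, p in zip(a[r], a[col])]
    return [a[i][k] / a[i][i] for i in range(k)]

def predict(w: list[float], x: list[float]) -> float:
    """Weighted sum of feature values plus the intercept."""
    return w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))

# Hypothetical essays: two feature values each, and the human score received.
features = [[2, 1], [4, 0], [0, 1], [6, 2], [2, 0]]
scores = [4, 3, 3, 8, 2]
w = fit_linear(features, scores)
print([round(v, 3) for v in w])  # → [1.0, 0.5, 2.0]
```

In practice a numerical library would handle the fitting. The explicit normal equations are only here to show what "weighted features combined by regression" computes.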
E-rater Features
How to Game the System
• Word Salad Detector
  – “Quick The the over brown dogs fox. Jumped. Lazy”
  – “Skfhdorla;sf[e’skas as,fr’r;/.,fkrasa”
• Unusually Short / Off-Topic Essays
  – “I don’t know how to explain this question because I took a nap. Sorry.”
  – “I THINK EVERYONE SHOULD BE ABLE TO WEAR WHATEVER THE HELL THEY WANT TO WEAR.”
NLP for English Language Learners
• Increasing need for tools for instruction in English as a Second Language (ESL)
  – 300 million ESL learners in China alone
  – 10% of US students learn English as a second language
  – Teachers are now burdened with classes of wildly varying English fluency
  – Assessments of EFL teacher proficiency
NLP for English Language Learners
• Other interested parties:
  – Microsoft Research (ESL Assistant)
  – Publishing/assessment companies (Cambridge, Oxford, Pearson)
  – Universities
Objective
• Research Goal: develop NLP tools to automatically provide feedback to ESL learners about grammatical errors
• Preposition Error Detection
  – Selection error (“They arrived to the town.”)
  – Extraneous use (“They came to outside.”)
  – Omitted (“He is fond this book.”)
Motivation
• Preposition usage is one of the most difficult aspects of English for non-native speakers
  – [Dalgish ’85]: 18% of sentences from ESL essays contain a preposition error
  – Our data: 8-10% of all prepositions in TOEFL essays are used incorrectly
Why are prepositions hard to master?
• Prepositions are problematic because they can perform so many complex roles:
  – Preposition choice in an adjunct is constrained by its object (“on Friday”, “at noon”)
  – Prepositions are used to mark the arguments of a predicate (“fond of beer.”)
  – Phrasal verbs (“give in to their demands.”), where “give in” means “acquiesce, surrender”
Why are prepositions hard to master?
• Multiple prepositions can appear in the same context:
“When the plant is horizontal, the force of the gravity causes the sap to move __ the underside of the stem.”
Preposition Error Detection
• In NLP, a computer system learns from large amounts of data
• Training phase: create a “model” of the problem area, e.g.:
  – Face detection
  – Credit card usage
  – Translating from Chinese to English
• Testing phase: use the model to classify new cases
Baseball Feature Example
• Predict the outcome of a baseball game
• Look at all the games in which the two teams played each other
• For each game (event), use features:
  – Win/loss records before the game
  – Home field advantage
  – Players’ prior performance
• Train a learning algorithm
Baseball Feature Example
| Event | Winner | Location | Prior Isotopes Win Streak | Prior Capital City Win Streak |
|---|---|---|---|---|
| Game 1 | Isotopes | Springfield | 0 | 3 |
| Game 2 | Capital City | Springfield | 4 | 0 |
| Game 3 | Capital City | Capital City | 2 | 0 |
| Game 4 | Isotopes | Springfield | 2 | 1 |
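The slide does not name a learning algorithm. As a stand-in, the following sketch trains a classic perceptron on the four games in the table, showing the train-then-predict cycle. Encoding Springfield as the Isotopes' home field is an assumption. `train_perceptron`, `predict` and the "new game" at the end are hypothetical.

```python
# Each game from the table, encoded as
# (Isotopes at home?, prior Isotopes win streak, prior Capital City win streak).
# Assumption: Springfield is the Isotopes' home field.
games = [
    ((1, 0, 3), "Isotopes"),      # Game 1
    ((1, 4, 0), "Capital City"),  # Game 2
    ((0, 2, 0), "Capital City"),  # Game 3
    ((1, 2, 1), "Isotopes"),      # Game 4
]

def train_perceptron(data, epochs=20):
    """Classic perceptron; internally +1 = Isotopes win, -1 = Capital City win."""
    w = [0.0] * len(data[0][0])
    b = 0.0
    for _ in range(epochs):
        mistakes = 0
        for x, winner in data:
            y = 1 if winner == "Isotopes" else -1
            score = sum(wi * xi for wi, xi in zip(w, x)) + b
            if y * score <= 0:  # misclassified or on the boundary: update weights
                w = [wi + y * xi for wi, xi in zip(w, x)]
                b += y
                mistakes += 1
        if mistakes == 0:  # every training game is now classified correctly
            break
    return w, b

def predict(w, b, x):
    return "Isotopes" if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else "Capital City"

w, b = train_perceptron(games)
print(w, b)                      # → [1.0, -2.0, 4.0] 1.0
print(predict(w, b, (1, 1, 2)))  # hypothetical new game → Isotopes
```

With only four events, this is a toy model. The point is that features computed for past events become the training data for predicting new ones.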
Building a Model of Preposition Usage
• Prepositions are influenced by:
  – Words in the local context, and how they interact with each other (lexical)
  – Syntactic structure of the context
  – Semantic interpretation
• Get the computer to understand correct usage:
  – Encode these influences as “features”
  – Train a learning algorithm on millions of examples of correct usage, with the associated features
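Training "on examples of correct usage" means counting which prepositions fluent writers choose in which feature contexts. The slides do not name the learning algorithm, so the sketch below substitutes a simple Naive Bayes count model with add-one smoothing. The feature names (`prior`, `next`), the training events and the function names are all hypothetical.

```python
import math
from collections import Counter, defaultdict

def train(examples):
    """Count prepositions and the (feature, value) pairs seen with each one."""
    prep_counts = Counter()
    feat_counts = defaultdict(Counter)  # preposition -> Counter of (feature, value)
    for features, prep in examples:
        prep_counts[prep] += 1
        feat_counts[prep].update(features.items())
    return prep_counts, feat_counts

def log_score(model, features, prep):
    """Naive Bayes log-score of `prep` for a context, with add-one smoothing."""
    prep_counts, feat_counts = model
    total = sum(prep_counts.values())
    s = math.log((prep_counts[prep] + 1) / (total + len(prep_counts)))
    for item in features.items():
        # Smoothed estimate of P(feature=value | prep), treating each pair as present/absent.
        s += math.log((feat_counts[prep][item] + 1) / (prep_counts[prep] + 2))
    return s

# Hypothetical training events: context features plus the preposition a fluent writer used.
examples = [
    ({"prior": "fond", "next": "beer"}, "of"),
    ({"prior": "fond", "next": "music"}, "of"),
    ({"prior": "arrive", "next": "the"}, "at"),
    ({"prior": "arrive", "next": "noon"}, "at"),
    ({"prior": "car", "next": "the"}, "with"),
]
model = train(examples)
context = {"prior": "fond", "next": "wine"}
best = max(model[0], key=lambda p: log_score(model, context, p))
print(best)  # → of
```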
Deriving the Features
• Derived using NLP tools
• Tokenizing
  – “He is fond of beer . ”
• Part-of-Speech Tagging
  – “He_PRP is_BE fond_VB of_PREP beer_NN ._.”
• Chunking / Parsing
  – “{NP He_PRP } {VP is_BE fond_VB } of_PREP {NP beer_NN } ._.”
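Real taggers and chunkers are trained statistical tools and are not reproduced here. The following sketch only covers the two mechanical pieces of this pipeline. It shows tokenizing with a simple regular expression, an assumption much cruder than a production tokenizer. It also shows reading the slide's `word_TAG` notation back into (word, tag) pairs. `tokenize` and `parse_tagged` are hypothetical names.

```python
import re

def tokenize(text: str) -> list[str]:
    """Simple tokenizer: words (allowing internal apostrophes) and single punctuation marks."""
    return re.findall(r"\w+(?:'\w+)*|[^\w\s]", text)

def parse_tagged(tagged: str) -> list[tuple[str, str]]:
    """Split 'word_TAG' tokens into (word, tag) pairs on the last underscore."""
    return [tuple(tok.rsplit("_", 1)) for tok in tagged.split()]

print(tokenize("He is fond of beer."))
# → ['He', 'is', 'fond', 'of', 'beer', '.']
print(parse_tagged("He_PRP is_BE fond_VB of_PREP beer_NN ._."))
# → [('He', 'PRP'), ('is', 'BE'), ('fond', 'VB'), ('of', 'PREP'), ('beer', 'NN'), ('.', '.')]
```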
Feature Overview
• System uses a minimum of 25 features
  – Lexical, syntactic, and semantic sources
  – Head words before and after the preposition
  – Words in the local context (+/- 2 words)
  – Part-of-speech (POS) tags of the words above
  – Combination features
  – Parse features
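The local-context features can be pictured as a small dictionary built around the preposition's position. The following is a minimal sketch, assuming tokens and tags are already available. It collects the words and POS tags within ±2 positions, padding past the sentence edges. It adds one hypothetical combination feature that joins the words on either side. The feature names and `window_features` are illustrative and not the system's actual feature set.

```python
def window_features(tokens: list[str], tags: list[str], i: int, k: int = 2) -> dict[str, str]:
    """Words and POS tags within +/-k positions of the preposition at index i."""
    feats = {}
    for offset in range(-k, k + 1):
        if offset == 0:
            continue  # skip the preposition itself
        j = i + offset
        in_range = 0 <= j < len(tokens)  # avoid Python's negative-index wraparound
        feats[f"w{offset:+d}"] = tokens[j].lower() if in_range else "<pad>"
        feats[f"t{offset:+d}"] = tags[j] if in_range else "<pad>"
    # Hypothetical combination feature: the words on either side, conjoined.
    feats["w-1_w+1"] = f"{feats['w-1']}_{feats['w+1']}"
    return feats

tokens = ["He", "is", "fond", "of", "beer", "."]
tags = ["PRP", "BE", "VB", "PREP", "NN", "."]
print(window_features(tokens, tags, 3))
```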
Preposition Feature Example
| Event | Prep | Prior Verb | Prior Noun | Following Word | POS of Following Word |
|---|---|---|---|---|---|
| Prep 1 | of | fond | <none> | beer | NN |
| Prep 2 | at | arrive | <none> | the | Det |
| Prep 3 | with | <none> | car | the | Det |

1. He is fond of beer.
2. The train will arrive at the Springfield Station.
3. The car with the broken wheel is in the shop.
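The following sketch reproduces the three rows of the preposition feature table from hand-tagged versions of the example sentences. It uses a simplifying assumption: "prior verb" or "prior noun" is taken to be the word immediately before the preposition when its tag starts with `VB` or `NN`. The deployed system's exact definitions are not given on the slide. The tags for sentences 2 and 3 were assigned by hand in the slide's style, and `table_features` is a hypothetical name.

```python
def table_features(tagged: list[tuple[str, str]], i: int) -> dict[str, str]:
    """Simplified table features for the preposition at index i of a tagged sentence."""
    prev_word, prev_tag = tagged[i - 1] if i > 0 else ("", "")
    next_word, next_tag = tagged[i + 1] if i + 1 < len(tagged) else ("<none>", "<none>")
    return {
        "prep": tagged[i][0].lower(),
        # Assumption: only the immediately preceding word is considered.
        "prior_verb": prev_word if prev_tag.startswith("VB") else "<none>",
        "prior_noun": prev_word if prev_tag.startswith("NN") else "<none>",
        "following_word": next_word,
        "following_pos": next_tag,
    }

# Hand-tagged example sentences (tags in the slide's style; illustrative).
sentences = [
    [("He", "PRP"), ("is", "BE"), ("fond", "VB"), ("of", "PREP"), ("beer", "NN"), (".", ".")],
    [("The", "Det"), ("train", "NN"), ("will", "MD"), ("arrive", "VB"), ("at", "PREP"),
     ("the", "Det"), ("Springfield", "NNP"), ("Station", "NNP"), (".", ".")],
    [("The", "Det"), ("car", "NN"), ("with", "PREP"), ("the", "Det"), ("broken", "JJ"),
     ("wheel", "NN"), ("is", "BE"), ("in", "PREP"), ("the", "Det"), ("shop", "NN"), (".", ".")],
]
rows = []
for sent in sentences:
    i = next(k for k, (_, tag) in enumerate(sent) if tag == "PREP")  # first preposition
    rows.append(table_features(sent, i))
for row in rows:
    print(row)
```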
Flagging Errors
• Train a learning algorithm on millions of events to develop a model (classifier)
• Testing (flagging errors):
  – Derive features
  – Replace the writer’s preposition with each of the other prepositions; the classifier outputs a score for each preposition
  – Compare the score of the top-scoring preposition to the score of the writer’s preposition
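The testing step can be sketched as "fill the slot with every candidate, score each, and keep the best." In the following sketch, `toy_score` is a hypothetical stand-in for the trained classifier. It uses hand-set scores keyed on the word before the slot, whereas the real system derives many features for the context. The candidate list and `compare_with_writer` are also illustrative assumptions.

```python
from typing import Callable

# Hypothetical stand-in for a trained classifier: hand-set scores keyed on the
# word just before the preposition slot.
TOY_SCORES = {"fond": {"of": 0.90, "in": 0.02, "at": 0.02, "by": 0.01, "with": 0.03}}

def toy_score(left: str, prep: str, right: str) -> float:
    return TOY_SCORES.get(left.split()[-1], {}).get(prep, 0.0)

def compare_with_writer(score_fn: Callable[[str, str, str], float],
                        left: str, writer_prep: str, right: str,
                        candidates: list[str]) -> tuple[str, float, float]:
    """Put each candidate preposition in the writer's slot and score it.

    Returns (top-scoring preposition, its score, score of the writer's preposition).
    """
    options = sorted(set(candidates) | {writer_prep})  # fixed order keeps ties deterministic
    scores = {p: score_fn(left, p, right) for p in options}
    top = max(scores, key=scores.get)
    return top, scores[top], scores[writer_prep]

print(compare_with_writer(toy_score, "He is fond", "with", "beer",
                          ["of", "in", "at", "by", "with"]))
# → ('of', 0.9, 0.03)
```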
Thresholds
[Figure: bar chart of classifier scores (0 to 100) for the candidate prepositions of, in, at, by, with]
“He is fond with beer” → FLAG AS ERROR
Thresholds
[Figure: bar chart of classifier scores (0 to 60) for the candidate prepositions of, in, around, by, with]
“My sister usually gets home by 3:00” → FLAG AS OK
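The two charts suggest a thresholded comparison: flag only when the best alternative clearly beats the writer's choice. The following sketch assumes the decision rule is a score margin with a threshold. Both the rule and the default value of 30 are illustrative assumptions. The scores below are invented for the example and are not read from the charts.

```python
def flag(scores: dict[str, float], writer_prep: str, threshold: float = 30.0) -> str:
    """Flag the writer's preposition if the top candidate beats it by more than `threshold`.

    The margin rule and the default threshold are illustrative assumptions.
    """
    top = max(scores, key=scores.get)
    margin = scores[top] - scores[writer_prep]
    return "FLAG AS ERROR" if margin > threshold else "FLAG AS OK"

# Invented scores on a 0-100 scale (illustrative only).
fond_with = {"of": 90, "in": 4, "at": 3, "by": 1, "with": 2}
home_by = {"of": 1, "in": 10, "around": 45, "by": 40, "with": 4}
print(flag(fond_with, "with"))  # → FLAG AS ERROR
print(flag(home_by, "by"))      # → FLAG AS OK
```

Raising the threshold makes the system flag fewer prepositions. This generally trades recall for precision.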
Performance
• Evaluation corpus of 5600 TOEFL essays (8200 prepositions)
  – Each preposition manually annotated
• Recall = 0.19; Precision = 0.84
  – About 1 in 5 errors is flagged
  – 84% of flagged prepositions are indeed errors
• Precision is favored over recall to reduce false positives
• State-of-the-art performance
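Precision and recall follow from comparing the system's flags with the manual annotation. The following is a minimal sketch. The preposition ids and counts are hypothetical, not the TOEFL evaluation data.

```python
def precision_recall(flagged: set[int], errors: set[int]) -> tuple[float, float]:
    """flagged: ids the system flagged; errors: ids a human annotated as errors."""
    true_pos = len(flagged & errors)  # flags that are real errors
    precision = true_pos / len(flagged) if flagged else 0.0
    recall = true_pos / len(errors) if errors else 0.0
    return precision, recall

# Hypothetical: 10 annotated errors (ids 0-9); the system flags ids 0, 1 and 42.
p, r = precision_recall({0, 1, 42}, set(range(10)))
print(round(p, 3), r)  # → 0.667 0.2
```

At the reported operating point (recall 0.19, precision 0.84), roughly 19 of every 100 annotated errors would be flagged. About 84 of every 100 flags would be real errors.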
Conclusions
• Presented an overview of:
  – NLP
  – NLP at ETS
  – One feature (prepositions) in e-rater
• Future directions:
  – Use of large-scale corpora (WWW)
  – L1-specific models
  – Training on error-annotated data
Plugs
• ETS/NLP Publications:
  – http://ets.org/research/erater.html
• 5th Workshop on Innovative Use of NLP for Educational Applications (NAACL-10)
  – http://www.cs.rochester.edu/u/tetreaul/naacl-bea5.html
• “Automated Grammatical Error Detection for Language Learners”
  – Leacock et al., 2010
  – Synthesis Series