Detecting Missing Hyphens in Learner Text. Aoife Cahill, Susanne Wolff, Nitin Madnani (Educational Testing Service); Martin Chodorow (Hunter College and the Graduate Center). ACL 2013.

Detecting Missing Hyphens in Learner Text

Aoife Cahill, Susanne Wolff, Nitin Madnani
Educational Testing Service

ACL 2013

Martin Chodorow
Hunter College and the Graduate Center

Outline

- Introduction

- Baselines

- System Description

- Evaluation

- Conclusions

Introduction

(1) Schools may have more after school sports.

(2) I went to the dentist after school today.

(3) My father like play basketball with me.

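Missing Hyphens: in (1), "after school" modifies "sports" and should be written "after-school"; in (2), the same two words need no hyphen.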

Baselines

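(1) Collins Dictionary: predict a hyphen if the hyphenated form is listed in the dictionary

(2) Wikipedia count: predict a hyphen if the hyphenated form occurs more than 1,000 times in Wikipedia

(3) Wikipedia probability: predict a hyphen if the probability of the hyphenated form, as estimated from Wikipedia, is greater than 0.66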
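
The two Wikipedia-based baselines amount to simple count rules, which can be sketched as follows. This is a minimal illustration, not the authors' implementation: the toy counts, the function names, and the assumption that the probability is the hyphenated count divided by the combined count of both forms are all hypothetical.

```python
from collections import Counter

# Hypothetical toy counts standing in for counts collected from Wikipedia.
HYPHENATED = Counter({"after-school": 1500, "well-known": 9000, "like-play": 2})
UNHYPHENATED = Counter({"after school": 600, "well known": 1000, "like play": 400})


def count_baseline(w1, w2, min_count=1000):
    """Baseline (2): the hyphenated form occurs more than min_count times."""
    return HYPHENATED[f"{w1}-{w2}"] > min_count


def hyphen_probability(w1, w2):
    """Relative frequency of the hyphenated form (assumed estimator)."""
    hyphenated = HYPHENATED[f"{w1}-{w2}"]
    unhyphenated = UNHYPHENATED[f"{w1} {w2}"]
    total = hyphenated + unhyphenated
    return hyphenated / total if total else 0.0


def probability_baseline(w1, w2, threshold=0.66):
    """Baseline (3): the estimated probability exceeds the threshold."""
    return hyphen_probability(w1, w2) > threshold
```

Note that neither rule looks at the surrounding sentence, so each one gives the same answer for "after school" in both introductory examples.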

System Description

Learner text: Schools may have more after school sports.

System Description

Model: Logistic regression model

Probability: Only predict a missing hyphen error when the probability of the prediction is greater than 0.99
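
The high-threshold decision rule can be sketched with a hand-rolled logistic function. The feature names, weights, and bias below are hypothetical and chosen only to illustrate the 0.99 cutoff; the slides do not list the actual features or learned parameters.

```python
import math


def sigmoid(z):
    """Logistic function mapping a real-valued score to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def hyphen_probability(features, weights, bias):
    """Probability of a missing hyphen under a logistic regression model."""
    score = bias + sum(weights.get(name, 0.0) * value
                       for name, value in features.items())
    return sigmoid(score)


def predict_missing_hyphen(features, weights, bias, threshold=0.99):
    """Flag an error only when the model is very confident."""
    return hyphen_probability(features, weights, bias) > threshold


# Hypothetical parameters, for illustration only.
WEIGHTS = {"bigram=after_school": 3.0, "next_is_noun": 4.0}
BIAS = -1.0

before_noun = {"bigram=after_school": 1.0, "next_is_noun": 1.0}
not_before_noun = {"bigram=after_school": 1.0}
```

With these toy parameters, `before_noun` scores about 0.998 and is flagged, while `not_before_noun` scores about 0.88 and is not. A cutoff this high trades recall for precision, which suits feedback shown to learners.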

System Description

SJM-trained:

- San Jose Mercury News corpus

- For training, hyphenated words are automatically split (e.g. well-known becomes well known)

- The training data contains 1% of the positive examples and 3% of the negative examples
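
The automatic splitting step can be sketched as below. This is an illustrative reading of the slide, not the authors' code: the function names are hypothetical, only tokens with exactly one internal hyphen and alphabetic parts are split, and the sampling of positive and negative examples is left out.

```python
def split_hyphens(tokens):
    """Split hyphenated tokens and remember where each hyphen was removed.

    Returns the new word list and the set of gap indices i such that a
    hyphen originally joined words[i] and words[i + 1].
    """
    words, positive_gaps = [], set()
    for token in tokens:
        parts = token.split("-")
        if len(parts) == 2 and all(part.isalpha() for part in parts):
            positive_gaps.add(len(words))
            words.extend(parts)
        else:
            words.append(token)
    return words, positive_gaps


def make_examples(tokens):
    """Return (left_word, right_word, label) for every adjacent word pair."""
    words, positive_gaps = split_hyphens(tokens)
    return [(words[i], words[i + 1], i in positive_gaps)
            for i in range(len(words) - 1)]
```

For the tokens of "a well-known author", this yields one positive example, ("well", "known", True), and two negative ones.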

System Description

Negative examples selected: Only contexts that occur more than 20 times are selected during training.
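
A frequency filter of this kind might look like the sketch below. The slide does not define "context", so the code assumes, purely for illustration, that a context is the (left word, right word) pair; the function name is also hypothetical.

```python
from collections import Counter


def select_negative_contexts(negative_contexts, min_count=20):
    """Keep only negative contexts seen more than min_count times."""
    counts = Counter(negative_contexts)
    return [context for context in negative_contexts
            if counts[context] > min_count]
```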

System Description

Wiki-revision-trained:

- Wikipedia articles

System Description

Combined:

- Combine both data sources

Evaluation

Artificial Data:

- Brown corpus

- 24,243 sentences

- 2,072 hyphenated words
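
Scoring a system against data like this can be sketched with precision, recall, and F1. The surviving slide text does not name the metrics, so treating them as the evaluation measures is an assumption, as is representing each missing hyphen as a (sentence index, gap index) pair.

```python
def precision_recall_f1(gold, predicted):
    """Compare predicted missing-hyphen positions against gold positions."""
    gold, predicted = set(gold), set(predicted)
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1
```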

Evaluation

Evaluation 1

Learner Text:

- CLC-FCE

- The corpus contains 1,244 exam scripts

- 173 instances of missing hyphen errors in total

Evaluation

An analysis of the 131 true positives for the learner data reveals that 87 of these are cases of a single type, the word “make-up”.

Evaluation

Evaluation 2

Learner Text:

- A data set of 1,000 student GRE and TOEFL essays

- Drawn from 295 prompts

- Ranged in length from 1 to 50 sentences

- Average of 378 words per essay

Evaluation

Learner Text (Cont.):

- Manually inspect a random sample of 100 instances where each system detected a missing hyphen

- Two native English speakers judged the instances

- Using the Chicago Manual of Style as a guide

- High agreement
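
The sampling and agreement check can be sketched as follows. The slide only says agreement was high and does not name a statistic, so raw observed agreement is used here as an assumption; the function names and the fixed random seed are likewise illustrative.

```python
import random


def sample_detections(detections, k=100, seed=0):
    """Draw up to k detections at random, without replacement."""
    rng = random.Random(seed)
    return rng.sample(detections, min(k, len(detections)))


def observed_agreement(judgments_a, judgments_b):
    """Fraction of instances on which two annotators gave the same label."""
    if not judgments_a or len(judgments_a) != len(judgments_b):
        raise ValueError("need two non-empty judgment lists of equal length")
    matches = sum(a == b for a, b in zip(judgments_a, judgments_b))
    return matches / len(judgments_a)
```

Raw agreement does not correct for chance; a chance-corrected statistic could be substituted if the label distribution is skewed.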

Conclusions

(1) Automatically detecting missing hyphen errors in learner text

(2) The classifiers generally performed better than the baseline systems

(3) Taking context into account when detecting the errors is important.