
Page 1: Using semantic associations for the detection of real-word spelling errors Jennifer Pedler School of Computer Science & Information Systems Birkbeck, University.

Using semantic associations for the detection of real-word spelling errors

Jennifer Pedler

School of Computer Science & Information Systems

Birkbeck, University of London.

Page 2:

Real-word spelling errors

There is a bored (board) for messages.

She wrote the appointment in her dairy (diary).

One word mistakenly produced for another

Page 3:

Confusable Pairs

(% Total: each word's share of the pair's combined frequency; each row sums to 100%.)

Noun pairs
Word 1       % Total   Word 2     % Total
college      98%       collage    2%
dinner       96%       diner      4%
road         93%       rod        7%
manner       76%       manor      24%
diary        70%       dairy      30%
reactor      69%       rector     31%
ear          67%       era        33%
lintel       50%       lentil     50%

Verb pairs
Word 1       % Total   Word 2     % Total
unite        99%       untie      1%
ensure       95%       ensue      5%
inflict      95%       inflect    5%
expand       94%       expend     6%
relieve      91%       relive     9%
confirm      87%       conform    13%
mediate      86%       meditate   14%
carve        84%       crave      16%
depreciate   66%       deprecate  34%
inhibit      64%       inhabit    36%

Page 4:

Semantic ‘flavour’

Carve: stone, wood, knife, ..., oak, walnut, marble, granite, chisel

Crave: man, food, success, ..., people, chocolate, attention
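The two lists suggest that each confusable attracts its own set of nouns. The slides do not say exactly how these co-occurrences were collected; one hypothetical way is to count nouns within a fixed window of each occurrence of the word. The sketch below assumes the text is already lemmatised and tagged as `(lemma, pos)` pairs with `"N"` marking nouns. That input format, the function name `cooccurring_nouns` and the window size are all assumptions for illustration.

```python
from collections import Counter, defaultdict

def cooccurring_nouns(tagged_tokens, targets, window=3):
    """Count nouns within `window` tokens of each target lemma.

    tagged_tokens: list of (lemma, pos) pairs; pos "N" marks a noun
    (a simplified, hypothetical tag set).
    Returns {target_lemma: Counter({noun: count})}.
    """
    counts = defaultdict(Counter)
    for i, (lemma, _pos) in enumerate(tagged_tokens):
        if lemma not in targets:
            continue
        lo = max(0, i - window)
        hi = min(len(tagged_tokens), i + window + 1)
        for j in range(lo, hi):
            other, pos = tagged_tokens[j]
            if j != i and pos == "N":
                counts[lemma][other] += 1
    return counts

tokens = [("doll", "N"), ("carve", "V"), ("from", "P"), ("wood", "N"),
          ("and", "C"), ("people", "N"), ("crave", "V"), ("chocolate", "N")]
counts = cooccurring_nouns(tokens, {"carve", "crave"}, window=2)
print(dict(counts["carve"]))  # {'doll': 1, 'wood': 1}
print(dict(counts["crave"]))  # {'people': 1, 'chocolate': 1}
```

Run over a large corpus, counts like these would yield lists such as the ones above, with the most frequent co-occurring nouns at the top.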

Page 5:

WordNet Relationships: Hyponymy/hypernymy (ISA relation)

material
    stone: marble, granite
    wood: oak, walnut
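Because hyponymy is an IS-A relation, every noun can be walked upward to ever more general categories. A minimal sketch using only the toy links from the tree above (a real implementation would read them from WordNet, where a word may have several senses, each with its own chain; the names `HYPERNYM` and `hypernym_chain` are hypothetical):

```python
# Toy IS-A links (hyponym -> hypernym) copied from the example tree.
HYPERNYM = {
    "marble": "stone",
    "granite": "stone",
    "oak": "wood",
    "walnut": "wood",
    "stone": "material",
    "wood": "material",
}

def hypernym_chain(word, hypernym=HYPERNYM):
    """Return [word, parent, grandparent, ...] by following IS-A links upward."""
    chain = [word]
    while chain[-1] in hypernym:
        chain.append(hypernym[chain[-1]])
    return chain

print(hypernym_chain("granite"))  # ['granite', 'stone', 'material']
```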

Page 6:

WordNet senses for stone

stone, rock (countable, as in “he threw a stone at me”)
stone, rock (uncountable, as in “stone is abundant in New England”)
stone (building material)
gem, gemstone, stone
stone, pit, endocarp (e.g. cherry stone)
stone (unit used to measure ... weight)
stone (lack of feeling...)

Page 7:

Branches for two senses for stone

Sense "stone, rock":
material
    stone: sandstone, granite, limestone, marble
    wood: oak, walnut, beech, ash

Sense "stone, pit, endocarp":
covering
    pericarp
        stone

Page 8:

Section of final carve hypernym tree

entity            0 (1028)   P 0.6
  substance       0 (342)    P 0.2
    material      0 (91)     P 0.05
      stone       19 (39)    P 0.02
        granite   4          P 0.002
        marble    12         P 0.007
      wood        10 (52)    P 0.03
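In this tree, material's bracketed figure (91) equals stone's (39) plus wood's (52), which suggests that bracketed figures are cumulative counts over a node's subtree. The sketch below assumes that reading. It also assumes P is a node's cumulative count divided by the total number of counted co-occurrences. The tree, the counts and the function names are hypothetical and are not the slide's data.

```python
from collections import defaultdict

def cumulative_counts(direct, hypernym):
    """Add each node's direct count to itself and to every ancestor."""
    total = defaultdict(int)
    for node, count in direct.items():
        current = node
        while current is not None:
            total[current] += count
            current = hypernym.get(current)  # None once we pass the root
    return dict(total)

def node_probabilities(cumulative, grand_total):
    """Assumed definition: P(node) = cumulative count / grand total."""
    return {node: c / grand_total for node, c in cumulative.items()}

# Hypothetical tree and direct co-occurrence counts.
hypernym = {"granite": "stone", "marble": "stone", "stone": "material",
            "wood": "material", "material": "substance"}
direct = {"granite": 2, "marble": 3, "stone": 5, "wood": 10}
cum = cumulative_counts(direct, hypernym)
probs = node_probabilities(cum, grand_total=sum(direct.values()))
print(cum["stone"], cum["material"], probs["stone"])  # 10 20 0.5
```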

Page 9:

Merged tree

Node         carve   crave
entity       0.96    0.04
substance    0.87    0.13
material     0.99    0.01
food         0.41    0.59
foodstuff    0.22    0.78
stone        1.0     0.0
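Each row of the merged tree gives carve and crave values that sum to 1. The slides do not say how the two trees are combined. One simple possibility, assumed here purely for illustration, is to divide each word's value at a node (a count or a probability) by the sum of both words' values at that node. `merge_trees` and the counts are hypothetical.

```python
def merge_trees(values_a, values_b):
    """Hypothetical merge: for each node in either tree, return the pair
    (a / (a + b), b / (a + b)); a node missing from one tree counts as 0."""
    merged = {}
    for node in set(values_a) | set(values_b):
        a = values_a.get(node, 0.0)
        b = values_b.get(node, 0.0)
        if a + b > 0:
            merged[node] = (a / (a + b), b / (a + b))
    return merged

carve = {"stone": 39, "food": 6}    # hypothetical counts
crave = {"food": 14, "success": 5}
merged = merge_trees(carve, crave)
print(merged["stone"], merged["food"])  # (1.0, 0.0) (0.3, 0.7)
```

If raw counts are merged rather than probabilities, the more frequent word's larger overall counts raise its share at every node.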

Page 10:

Scoring at run-time

Seventeenth century dolls carved from wood fetch very high prices...

Co-occurrence   Distance   carve score     crave score
doll            1          0.899           0.101
century         2          0.432 (0.338)   0.568 (0.512)
wood            2          0.990 (0.891)   0.010 (0.009)
price           6          0.768 (0.384)   0.232 (0.116)

Final scores: carve 0.776, crave 0.223
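The scoring above combines one score per co-occurring noun into final scores. This sketch rests on two assumptions. First, each noun's scores are multiplied by a weight that falls with distance, and the weighted sums are then normalised to add to 1. Second, the weight follows a linear decay (`distance_weight`, hypothetical). That decay was chosen because it gives the ratios 0.9 and 0.5 seen in the bracketed figures for wood (distance 2) and price (distance 6); the slides do not state the weighting actually used.

```python
def distance_weight(distance):
    """Hypothetical linear decay: 1.0 at distance 1, minus 0.1 per extra word."""
    return max(0.0, 1.0 - 0.1 * (distance - 1))

def score_confusables(evidence):
    """evidence: list of (distance, {confusable: score}), one entry per
    co-occurring word. Returns distance-weighted totals normalised to sum to 1."""
    totals = {}
    for distance, scores in evidence:
        w = distance_weight(distance)
        for word, s in scores.items():
            totals[word] = totals.get(word, 0.0) + w * s
    grand = sum(totals.values())
    if grand == 0:
        return {word: 0.0 for word in totals}  # no usable evidence
    return {word: t / grand for word, t in totals.items()}

# Unweighted per-noun scores from the table (doll, century, wood, price).
evidence = [
    (1, {"carve": 0.899, "crave": 0.101}),
    (2, {"carve": 0.432, "crave": 0.568}),
    (2, {"carve": 0.990, "crave": 0.010}),
    (6, {"carve": 0.768, "crave": 0.232}),
]
final = score_confusables(evidence)
print({w: round(s, 3) for w, s in final.items()})  # {'carve': 0.777, 'crave': 0.223}
```

The result is close to the final scores shown on the slide, but that agreement does not confirm this is the method used.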

Page 11:

Spellchecking

Test data: Flob Corpus
1 million words
1310 confusables

Flob Original:
Seventeenth century dolls carved from wood fetch very high prices...

Flob Reversed (each confusable replaced by the other member of its pair):
Seventeenth century dolls craved from wood fetch very high prices...
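A reversed corpus like this can be built by swapping every confusable for its partner. This is a minimal sketch, assuming a hypothetical table of word-form pairs (`PAIRS`); the slides do not say how inflected forms such as carved/craved were handled.

```python
import re

# Hypothetical word-form pairs; a full list would cover every inflected
# form of every confusable pair.
PAIRS = [("carved", "craved"), ("untied", "united"), ("ensuing", "ensuring")]
SWAP = {}
for a, b in PAIRS:
    SWAP[a] = b
    SWAP[b] = a

def reverse_confusables(text, swap=SWAP):
    """Replace each listed confusable with its partner, keeping an initial capital."""
    def replace(match):
        word = match.group(0)
        partner = swap.get(word.lower())
        if partner is None:
            return word
        return partner.capitalize() if word[0].isupper() else partner
    return re.sub(r"[A-Za-z]+", replace, text)

print(reverse_confusables("Seventeenth century dolls carved from wood"))
# Seventeenth century dolls craved from wood
```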

Behaviour           Correct spelling      Error
Accept as correct   Correctly accepted    Ignored error
Flag as error       False alarm           Correctly flagged

Optimum performance: minimise false alarms and maximise the number of actual errors flagged.
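The four outcomes in this table map directly onto the percentages reported in the results slides: accept and false alarm are proportions of correctly spelt confusables, while flag and ignore are proportions of errors. A sketch, assuming each test case is recorded as a hypothetical `(is_error, flagged)` pair:

```python
def evaluate(cases):
    """cases: iterable of (is_error, flagged) booleans, one per confusable.

    Returns percentages: accept / false_alarm over correctly spelt words,
    flag / ignore over errors.
    """
    counts = {"accept": 0, "false_alarm": 0, "flag": 0, "ignore": 0}
    for is_error, flagged in cases:
        if is_error:
            counts["flag" if flagged else "ignore"] += 1
        else:
            counts["false_alarm" if flagged else "accept"] += 1
    correct = counts["accept"] + counts["false_alarm"]
    errors = counts["flag"] + counts["ignore"]

    def pct(n, d):
        return 100.0 * n / d if d else 0.0

    return {
        "accept": pct(counts["accept"], correct),
        "false_alarm": pct(counts["false_alarm"], correct),
        "flag": pct(counts["flag"], errors),
        "ignore": pct(counts["ignore"], errors),
    }

cases = [(False, False)] * 9 + [(False, True)] + [(True, True)] * 3 + [(True, False)]
print(evaluate(cases))
# {'accept': 90.0, 'false_alarm': 10.0, 'flag': 75.0, 'ignore': 25.0}
```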

Page 12:

Selecting the confusable with the highest score - Examples

Flob Original (confusable correctly accepted):

There is a famous early 15th century chest in the chapel bearing a carved scene of two jousters in action.
Scores: carve 0.8326, crave 0.1674; prefer carve

This method in particular has interested dairy farmers because, being alkaline, it helps the animal's rumen to function efficiently.
Scores: dairy 0.6797, diary 0.3203; prefer dairy

Flob Reversed (error correctly flagged):

Black hands caught him, united the rope...
Scores: unite 0.2428, untie 0.7572; prefer untie

The ensuring fight is real Douglas Fairbanks Jnr stuff.
Scores: ensure 0.3233, ensue 0.6767; prefer ensue
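The decision rule in these examples can be written in a few lines: prefer the highest-scoring confusable, and flag the word in the text if the preferred word differs from it. This sketch assumes the word in the text has already been reduced to its base form (e.g. united → unite); `check_word` is a hypothetical name.

```python
def check_word(word_in_text, scores):
    """Prefer the highest-scoring confusable; flag the word in the text
    if the preferred confusable is a different word."""
    preferred = max(scores, key=scores.get)
    return preferred, preferred != word_in_text

print(check_word("carve", {"carve": 0.8326, "crave": 0.1674}))  # ('carve', False)
print(check_word("unite", {"unite": 0.2428, "untie": 0.7572}))  # ('untie', True)
```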

Page 13:

Selecting the confusable with the highest score - Results

                       Flob Original           Flob Reversed
Method                 Accept   False alarm    Flag   Ignore
Hyper Tree             91%      9%             88%    12%
Choose more frequent   89%      11%            89%    11%
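The "Choose more frequent" baseline ignores context and always prefers the more frequent member of the pair. A sketch using a few rows from the confusable-pairs table (`PAIR_SHARE` and `more_frequent` are hypothetical names):

```python
# Shares (% Total) for a few pairs, copied from the confusable-pairs table.
PAIR_SHARE = {
    ("carve", "crave"): (84, 16),
    ("ensure", "ensue"): (95, 5),
    ("manner", "manor"): (76, 24),
    ("ear", "era"): (67, 33),
}

def more_frequent(word):
    """Baseline: always prefer the more frequent member of the word's pair."""
    for (a, b), (share_a, share_b) in PAIR_SHARE.items():
        if word in (a, b):
            return a if share_a >= share_b else b
    raise KeyError(f"{word!r} is not in a known confusable pair")

word = "ensue"
print(more_frequent(word), more_frequent(word) != word)  # ensure True
```

Because the baseline's decision depends only on the word itself, a word it accepts in Flob Original is replaced in Flob Reversed by a partner that it flags, and vice versa. This is consistent with the identical 89%/11% figures in both columns of the results table.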

Page 14:

Setting a confidence threshold

... reforms now in the final making should signify a new era in local government in which results count for more than ructions.
Scores: ear 0.5136, era 0.4864; prefer ear

              Flob Original           Flob Reversed
Threshold     Accept   False alarm    Flag   Ignore
> 0.5         91%      9%             88%    12%
> 0.7         95%      5%             80%    18%
> 0.8         97%      3%             71%    29%
> 0.9         99%      1%             51%    49%
> 0.99        99.9%    0.1%           5%     95%
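One reading of the threshold table, assumed here, is that a word is flagged only when a different confusable is preferred and that confusable's score exceeds the threshold; otherwise the word is accepted. This sketch applies that rule to the ear/era example (`check_with_threshold` is a hypothetical name):

```python
def check_with_threshold(word_in_text, scores, threshold):
    """Flag only if another confusable is preferred with a score above threshold."""
    preferred = max(scores, key=scores.get)
    flagged = preferred != word_in_text and scores[preferred] > threshold
    return preferred, flagged

era_scores = {"ear": 0.5136, "era": 0.4864}
print(check_with_threshold("era", era_scores, 0.5))  # ('ear', True)  false alarm
print(check_with_threshold("era", era_scores, 0.7))  # ('ear', False) accepted
```

Raising the threshold trades detections for fewer false alarms, which matches the direction of the figures in the table.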

Page 15:

Final Examples

Less frequent word preferred with high level of confidence:

A fierce fight ensured.
Scores: ensure 0.0646, ensue 0.9354; prefer ensue

You've bought the manner House and you've got a Ferrari.
Scores: manor 0.9951, manner 0.0049; prefer manor

The one remaining false alarm:

But the firm... it's gone bankrupt and we're all out on our ears.
Scores: ear 0.0093, era 0.9907; prefer era