Developing a Specialized Corpus for ATA Translator ...ruttan/capstone/lectures/translation.pdf ·...
Transcript of Developing a Specialized Corpus for ATA Translator ...ruttan/capstone/lectures/translation.pdf ·...
Developing a Specialized Corpus for ATA Translator Certification Examinations
Geoffrey S. Koby Kent State University,Ohio, USA
Slide 2 of 41
American Translators AssociationTranslator Certification Exam
• Examinations currently handwritten(transitioning to computerized)
• Data starting with 2006 testing year
• Design of certification exam determines corpus design
Slide 3 of 41
Exam Design• 23 language pairs• Language > English or English >
language• Three source texts - about 250 words • Candidates translate two passages:• Mandatory general passage (labeled
“A”)• Either a semi-technical passage (“B”)
or a commercial/legal passage (“C”)
Slide 4 of 41
Annual Statistics• In 2006, 535 candidates• Spanish>English/English>Spanis
h about 55%• 1070 passages (x two graders)• 2140 marked passages• Estimated 535,000 words for
2006• 7,585 examinations 2001-2011• Corpus grows 690,000 words
per year
Slide 5 of 41
American Translators AssociationTranslator Certification Exam
• Graders mark passages• Error codes indicate
category/severity • Target text shows scope, type
and severity of various translation errors
Slide 6 of 41
Sample Marked Target Text Error scope
marked explicitly
Error type and severity marked
explicitly
Slide 7 of 41
Unique features of the corpus
• Multidirectional parallel corpus• All exam texts selected according
to same criteria• Common language is English• One-to-many alignment between
one source and multiple target texts
• Range of quality• Explicit error markings
(scope, severity, category)• Designing a database tool:
Slide 8 of 41
Op
enin
g S
creen
Slide 9 of 41
En
ter
New
Sou
rce
Slide 10 of 41
Revie
w S
ou
rce T
exts
Slide 11 of 41
Revie
w S
ou
rce T
ext
Slide 12 of 41
New
Exam
Reco
rd
Slide 13 of 41
Targ
et
Text
Qu
eue
Slide 14 of 41
En
ter
Targ
et
Text
Target Passage Input
Submit Target
Slide 15 of 41
Targ
et
Text
Qu
eue
Slide 16 of 41
Revie
w T
arg
et
Pass
ag
eTarget Passage Input
Submit Target
Slide 17 of 41
Gra
din
g Q
ueu
e
Slide 18 of 41
Grading Screen
Slide 19 of 41
Entering an Error
Slide 20 of 41
Enter Error Category
Slide 21 of 41
Enter Error Severity
Slide 22 of 41
ATA
Flo
wch
art
Slide 23 of 41
Normal/Disjunct Error
Slide 24 of 41
Vie
win
g G
rad
ed
Exam
s
Slide 25 of 41
List
of
Gra
ded E
xam
s
Slide 26 of 41
Viewing Graded Exam
Slide 27 of 41
Viewing Error List
Slide 28 of 41
Analy
sis
Op
tion
s
Already coded queriesUser-guided queriesStatistical analysesKeyword in Context
Slide 29 of 41
Analysis Options - Preliminary
Pass-fail analysis Average score analysis
Error analysis
Slide 30 of 41
Sele
ct S
ou
rce L
an
gu
age
Slide 31 of 41
Sele
ct T
arg
et
Lan
gu
age(s
)
Slide 32 of 41
Pass
/Fail
An
aly
sis
Slide 33 of 41
Avera
ge S
core
An
aly
sis
Slide 34 of 41
Err
or
An
aly
sis
by P
art
Slide 35 of 41
Err
ors
by Q
uart
er
Slide 36 of 41
Keyw
ord
in C
on
text
Not yet implemented
Slide 37 of 41
Handel, Clearing und Settlement sehr zu begrüßen und ökonomisch sinnvoll, so dass mit
trade, clearing and settlement is a very welcome trend and makes economic sense to both GID024:1A
trading, clearing, and settlement is very much to be greeted and economically makes sense
GID024:4MT-GID156:4MT
trading, clearing and settlement is quite welcome and makes economic sense to large inves
trade, clearing and settlement welcoming and economically sensible, which means that furt
GID023:MT4-GID165:4MT
trading, clearing, and settlement are very welcome and make economic sense so that ad
GID023:2MT/MU/G-GID024:1G GID024:1P
trade, clearing and settlement is very welcome and economically sensible, such that further GID023:2MT-GID200:2MT-GID165R:2MT GID023:1U-GID200:1U-GID165R:U1
Trading, Clearing, and Settlement is very appreciated and economically sensible, so that aGID156:1C-GID200:1C GID156:2MT-GID200:1U+2MT GID200:1L
trade, clearing and settlement is very desirable and makes economic sense for both for the
GID 023:MT8 Candidate lost his place, destroyed sense of text. – GID165: Mixed up reading of source text 4 MU
commerce , Clearing and Settlement is highly appreciated and economically useful , so thGID165:2MT|GID165:1C|GID165:1C|GID156:2MT-GID165:2MT GID156:2MT-GID165:2MT
KW
IC w
\Mark
ed
Err
ors
Slide 38 of 41
Summary and Outlook
• Multidirectional parallel corpus of translation exams
• One-to-many alignment between one source and multiple target texts
• Future research possibilities
Slide 39 of 41
Future Research Possibilities 1
• Sorting/analysis of translation errors as marked by the graders.
– error type, error severity, language pair, etc.
• Patterns of errors, types of errors, etc.
• Analyze terms by Part of Speech• Search, view, analyze by various
tags• Pass/fail rates & score ranges by
language pair and for the corpus as a whole
Slide 40 of 41
Future Research Possibilities 2
• Clustering of score ranges above and below the pass/fail boundary
• Inter-grader/intra-grader consistency and variability in error discovery/assessment
• Congruity in discovery of translation errors
– (i.e., the ratio of errors marked by both graders to those marked by only one of the two)
• Investigation of congruity in marking of severity and category of translation errors
Slide 41 of 41
Thank You for your Attention! • Dr. Geoff Koby – [email protected]
• For a complete description of the ATA Examination Program, see:
Koby, Geoffrey S. and Gertrud G. Champe. “Welcome to the Real World. Professional-Level Translator Certification.” Translation & Interpreting 5:1 (2013): 156-173