Beyond Post-Editing: The Work of the eBay MTLS
Beyond Post-Editing: How the eBay MT Language Specialists Reinvent the Linguist’s Role
November 2016
Jose Luis Bonilla Sánchez, eBay MTLS Supervisor
This presentation is about…
Me
Machine Translation
- Different views
- A brief history
- MT at eBay
The MTLS
- Their place in L10n
- Tasks
- Profile
The Future
Who am I?
My Journey
[Figure: career path drawn as a transit map with a Spain Line, a Netherlands Line, and a Silicon Valley Line. Stops: Granada University (BA, Translation & Interpreting), TLT Madrid, ISP Amsterdam, EFI, Apple, Monster, eBay 1, eBay 2. Roles: Translator & PjM, LQA Engineer, Senior LS/Lead Translator, Knowledge Engineer, MTLS Supervisor.]
The Views on MT
The Nightmare Scenario
How I see it
A Little History
The “MTree”
[Figure: family tree of MT approaches: Rule-Based MT; Statistical MT (Word-Based, Phrase-Based); Neural MT]
Rule-Based MT
The RBMT Workflow
We Write The Rules
[Figure: RBMT workflow. Source Text goes through Morphological, Syntactic, and Lexicographic Analysis, hand-written rules produce the Translation, and the result is the Target Text.]
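To make "we write the rules" concrete, here is a toy English-to-Spanish rule-based translator. It is a minimal sketch under stated assumptions: the five-word lexicon, the plural rule, the adjective-noun reordering rule, and the agreement rule are all hypothetical and far simpler than any real RBMT system. Each stage (analysis, transfer, generation) is an explicit, hand-written rule.

```python
# Hypothetical toy lexicon: English word -> (part of speech, Spanish lemma, gender)
LEXICON = {
    "the": ("DET", "el", None),
    "red": ("ADJ", "rojo", None),
    "white": ("ADJ", "blanco", None),
    "car": ("NOUN", "coche", "m"),
    "house": ("NOUN", "casa", "f"),
}

# Spanish definite articles indexed by (gender, number)
ARTICLES = {("m", "sg"): "el", ("f", "sg"): "la", ("m", "pl"): "los", ("f", "pl"): "las"}


def analyze(sentence):
    """Morphological + lexical analysis: find each word's lemma, POS, and number."""
    tokens = []
    for word in sentence.lower().split():
        number = "sg"
        # Rule: an unknown word ending in -s whose stem is known is a regular plural
        if word not in LEXICON and word.endswith("s") and word[:-1] in LEXICON:
            word, number = word[:-1], "pl"
        pos, lemma, gender = LEXICON[word]  # KeyError = a word no rule covers
        tokens.append({"pos": pos, "lemma": lemma, "gender": gender, "number": number})
    return tokens


def transfer(tokens):
    """Syntactic transfer rule: English ADJ + NOUN becomes Spanish NOUN + ADJ."""
    out = list(tokens)
    i = 0
    while i < len(out) - 1:
        if out[i]["pos"] == "ADJ" and out[i + 1]["pos"] == "NOUN":
            out[i], out[i + 1] = out[i + 1], out[i]
            i += 2
        else:
            i += 1
    return out


def generate(tokens):
    """Generation: make the article and adjective agree with the head noun."""
    head = next(t for t in tokens if t["pos"] == "NOUN")  # assumes one noun phrase
    words = []
    for t in tokens:
        if t["pos"] == "DET":
            words.append(ARTICLES[(head["gender"], head["number"])])
            continue
        form = t["lemma"]
        if t["pos"] == "ADJ" and head["gender"] == "f" and form.endswith("o"):
            form = form[:-1] + "a"  # feminine adjective agreement
        if head["number"] == "pl":
            form += "s"  # every lemma in this toy lexicon ends in a vowel
        words.append(form)
    return " ".join(words)


def translate(sentence):
    return generate(transfer(analyze(sentence)))


print(translate("the red car"))       # el coche rojo
print(translate("the white houses"))  # las casas blancas
```

Every new word, irregular plural, or construction needs yet another hand-written rule, which is exactly why the approach becomes laborious.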
The Limits
- Too laborious
- Too unique
- Hard limits
Statistical Machine Translation: Cracking the Code
How to Crack the Code
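[Figure: SMT architecture. Training on data produces a Translation Model and a Language Model; at translation time, Text (input) is translated by searching for the best possible translation using both models, producing Text (output).]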
Forget linguistics: let’s look for statistical patterns in bilingual texts.
How?
It’s All about the Patterns
[Figure: English texts containing “car” aligned with German texts containing “Auto”, e.g. “My car is red.” / “Mein Auto ist rot.”; decoding “car” can yield “Auto” or “Wagen”]
src -> trg | prob
car -> Auto | 0.9
car -> Wagen | 0.1
The Translation Model finds similarities (patterns) between source and target languages.
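The probabilities in the table above can be read as relative frequencies. The sketch below assumes the word alignments are already given (here, a hypothetical corpus in which "car" was aligned to "Auto" nine times and to "Wagen" once); real SMT systems first learn those alignments statistically, which this example skips.

```python
from collections import Counter, defaultdict


def phrase_table(aligned_pairs):
    """Relative-frequency estimate: p(trg | src) = count(src, trg) / count(src)."""
    pair_counts = Counter(aligned_pairs)
    src_counts = Counter(src for src, _ in aligned_pairs)
    table = defaultdict(dict)
    for (src, trg), n in pair_counts.items():
        table[src][trg] = n / src_counts[src]
    return table


# Hypothetical word alignments harvested from an English-German corpus
pairs = [("car", "Auto")] * 9 + [("car", "Wagen")]
table = phrase_table(pairs)
for trg, p in sorted(table["car"].items(), key=lambda kv: -kv[1]):
    print(f"car -> {trg} | {p:.1f}")
```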
…But you still need a “proofreader”
The Language Model makes it sound “natural”.
English text:
My car is red.
My car drives fast
You drive my car
I drive my car
N-gram | count
my | 4
car | 4
is | 1
… | …
my car | 4
… | …
drive my car | 2
… | …
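The n-gram counts above can be reproduced with a few lines of code. This sketch counts unigrams to trigrams over the four example sentences and derives a maximum-likelihood bigram probability, which is the simplest form of language model; production language models add smoothing for unseen n-grams, which is omitted here.

```python
import re
from collections import Counter


def ngram_counts(sentences, max_n=3):
    """Count every n-gram (n = 1..max_n) in a list of sentences."""
    counts = Counter()
    for sentence in sentences:
        tokens = re.findall(r"[a-z]+", sentence.lower())
        for n in range(1, max_n + 1):
            for i in range(len(tokens) - n + 1):
                counts[" ".join(tokens[i:i + n])] += 1
    return counts


def bigram_prob(counts, prev, word):
    """Maximum-likelihood estimate of p(word | prev) = count(prev word) / count(prev)."""
    return counts[f"{prev} {word}"] / counts[prev] if counts[prev] else 0.0


corpus = ["My car is red.", "My car drives fast", "You drive my car", "I drive my car"]
counts = ngram_counts(corpus)
print(counts["my"], counts["car"], counts["is"], counts["my car"], counts["drive my car"])  # 4 4 1 4 2
print(bigram_prob(counts, "my", "car"), bigram_prob(counts, "car", "is"))  # 1.0 0.25
```

A high p(car | my) is what lets the Language Model prefer "my car" over less natural word sequences the Translation Model might propose.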
Statistical MT Limits
- OOV (out-of-vocabulary) words: often “out of domain”
- Idioms
- Word order problems
Neural Machine Translation
What is Neural Machine Translation?
A particular application of Neural Networks
[Figure: applications of neural networks: MT, self-driving cars, script recognition, price prediction, etc.]
Some Definitions
AI: A branch of computer science dealing with the simulation of intelligent behavior in computers.
Machine Learning: A type of AI that provides computers with the ability to learn without being explicitly programmed.
Neural Networks: An ML approach consisting of a large number of simple, highly interconnected processing elements (artificial neurons) in an architecture inspired by the structure of the cerebral cortex of the brain.
How does it work?
Source words are converted to numbers (vectors) and combined (encoded) into a single numeric representation of the whole sentence, which is then decoded into the target language.
2 parts: Encoder and Decoder
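The encoder half can be sketched as a simple recurrent network that reads one word at a time and folds it into a running state. Everything here is an assumption for illustration: the 2-dimensional embeddings and fixed weights are made up (real systems learn hundreds of dimensions during training), and the decoder, which would generate target words from the final state, is omitted.

```python
import math

# Hypothetical 2-dimensional word embeddings (real ones are learned)
EMBED = {"my": [0.1, 0.4], "car": [0.9, -0.2], "is": [0.0, 0.3], "red": [-0.5, 0.8]}
# Hypothetical fixed weights; a trained encoder learns these
W_IN = [[0.6, -0.1], [0.2, 0.7]]   # input-to-state weights
W_REC = [[0.5, 0.3], [-0.4, 0.5]]  # state-to-state (recurrent) weights


def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def encode(sentence):
    """Simple recurrent encoder: h_t = tanh(W_IN x_t + W_REC h_(t-1))."""
    h = [0.0, 0.0]
    for word in sentence.lower().split():
        x = EMBED[word]
        h = [math.tanh(a + b) for a, b in zip(matvec(W_IN, x), matvec(W_REC, h))]
    return h  # one vector summarizing the whole sentence, handed to the decoder


print(encode("my car is red"))
print(encode("red is car my"))  # same words, different order, different vector
```

Because each step feeds the previous state back in, word order affects the final vector, unlike simply adding the word vectors together.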
A Closer Look
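[Figure: a small neural network with an input layer (two inputs, both 1), an intermediate (hidden) layer, and an output layer. Each connection carries a weight; each hidden neuron adds up its weighted inputs (e.g. 1, 1.3) and passes the sum through an activation function (giving e.g. 0.73, 0.79, 0.69).]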
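The forward pass in the figure can be sketched as follows. The specific weights are assumptions chosen so that the hidden neurons' weighted sums come out as 1.0, 1.3 and 0.8, and the sigmoid is assumed as the activation function; with those choices the hidden activations match the 0.73, 0.79 and 0.69 shown. The hidden-to-output weights are likewise hypothetical.

```python
import math


def sigmoid(x):
    """Activation function squashing any weighted sum into (0, 1)."""
    return 1 / (1 + math.exp(-x))


def layer(inputs, weights):
    """One fully connected layer: each neuron sums its weighted inputs,
    then applies the activation function."""
    return [sigmoid(sum(w * x for w, x in zip(neuron_w, inputs))) for neuron_w in weights]


inputs = [1, 1]
# Assumed input-to-hidden weights (hidden sums: 1.0, 1.3, 0.8)
hidden_weights = [[0.8, 0.2], [0.4, 0.9], [0.3, 0.5]]
# Assumed hidden-to-output weights (a single output neuron)
output_weights = [[0.3, 0.5, 0.9]]

hidden = layer(inputs, hidden_weights)
output = layer(hidden, output_weights)
print([round(h, 2) for h in hidden])  # [0.73, 0.79, 0.69]
print(round(output[0], 2))            # 0.77
```

Training consists of nudging these weights until the outputs match the desired ones; in NMT the same arithmetic runs over millions of weights.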
Neural MT has great potential
Vector representations keep track of long-distance connections between words (as opposed to SMT’s n-grams, which only see a few neighboring words)
Will it be a game changer for translators? We’ll get back to this.
MT AT EBAY
Who we are
“The world’s marketplace, where the world goes to shop, sell, and give.”
$2.2B revenue in Q2 2016
$20.1B GMV in Q2 2016
165M global active buyers
56% international revenue
Q3 2016 data
$9.4B mobile volume
337M app downloads
eBay by the Numbers
TRUE GLOBAL COMMERCE
56% of eBay’s business is international
95% of commercial sellers engage in exporting
13 localized languages
30+ countries with an eBay site
Why eBay needs MT
[Figure: content types plotted by word volume (1k, 10k, 100k, 1M words) against time to market (from “no rush” to “ASAP, MT-ready”): Legal, Marketing, Help/User Documentation, SW UI, Member Communication (e-mail, forums), eBay Seller Listings]
The Time-to-Market Issue
Use Cases for MT at eBay
MT at eBay. Linguist’s Perspective.
• Search Queries (eBay MT, automatic)
• Item Titles (eBay MT, automatic)
• Item Descriptions (on demand)
• Product Descriptions (eBay MT, coming up)
• Product Reviews (eBay MT, coming up)
Challenges for MT at eBay
1. Variety of context:
~12K categories on ebay.com
Challenges for MT at eBay
2. User-generated content:
• Spelling errors/typos/mixed languages: ansung, samsug, samsumg, samung, amsung, samnsung, smsung, samsuns …
• Syntax: Chattanooga Intelect Xt Vectra 2 Channel Emg Stim Chiropractic Physical Therapy
• Improper, broken English: “ull buy em rii nah thru paypal” = “you will buy them right now through PayPal”
• Ambiguous brand names: Green Apple iPhone 6 = Manzana verde (“green apple”) iPhone 6?
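One way to tame the spelling variants above is fuzzy matching against a list of known brands. The sketch below uses Python's standard `difflib.get_close_matches`; the brand list and the 0.75 similarity cutoff are hypothetical choices for illustration, not eBay's actual normalization method.

```python
import difflib

# Hypothetical brand list; a real one would come from catalog data
BRANDS = ["samsung", "apple", "sony", "nokia", "paypal"]


def normalize_brand(token, cutoff=0.75):
    """Map a misspelled token to the closest known brand, or return it unchanged."""
    matches = difflib.get_close_matches(token.lower(), BRANDS, n=1, cutoff=cutoff)
    return matches[0] if matches else token


variants = "ansung samsug samsumg samung amsung samnsung smsung samsuns".split()
print({v: normalize_brand(v) for v in variants})
print(normalize_brand("Chiropractic"))  # no close brand, returned unchanged
```

The cutoff is a trade-off: set it too low and ordinary words get "corrected" into brands, which is why such matches are better treated as candidates for linguist review than as automatic fixes.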
ENTER THE MTLS
The MTLS by the Numbers
2013: date of team creation as part of eBay’s MT initiative
6: linguists based in the US and Germany
9: languages supported: US English, UK English, French, German, Italian, Russian, Brazilian Portuguese, and Latin American / European Spanish
We are a Hybrid Team
[Figure: the MTLS as a hybrid between the MT Science Team and L10n]
WHAT DO WE DO?
MTLS ≠ Your Regular Linguist
Training data: Raw MT output → Vendor post-edits → MTLS review → Data fed into the engine
Testing data: Source text → Vendor translates → MTLS review → Data used for reference
Vendor Review: Workflow
- We need to process very large volumes.
Vendor Review: Scale
4.5M words in 2016 (estimate)
Massive Volumes x Limited Resources = Inventiveness
Our guiding principle: Adding Value
Automation (with OS tools)
Integrating QA Upstream
High-value QA: Intelligent sampling
Error pattern detection
Targeted terminology
Scalability (modular guidelines, trainings)
Examples: Patterns
We use Regular Expressions to locate errors:
- Plurals: cantos?, cases?, bab(y|ies)
- Replacing accents: câmera, camera > c.mera
- Gender agreements: nov(o|a)
- Synonyms: celular (1332 queries) vs. (cell|phone|mobile): cell 635, phone 655, mobile 474; translation contains none of them: 56 (only 4%)
- Units of measurement: source contains a digit + “in” and the translation is not there: 5 in <> 5 pol
- Detecting acronyms: [A-Z]{2,4}
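The patterns above translate almost directly into Python's `re` module. In this sketch the regular expressions come from the slide; the two checking helpers and their inputs (Portuguese query with its English MT, English title with its Portuguese MT) are hypothetical illustrations of how such checks could be run over a batch of MT output.

```python
import re

# Patterns from the slide; the helper functions below are hypothetical
PLURALS = re.compile(r"\b(cantos?|cases?|bab(?:y|ies))\b", re.IGNORECASE)
ACCENTS = re.compile(r"\bc.mera\b", re.IGNORECASE)  # "." matches câmera and camera
GENDER = re.compile(r"\bnov[oa]\b", re.IGNORECASE)  # novo / nova ("new")
ACRONYMS = re.compile(r"\b[A-Z]{2,4}\b")
PHONE_SYNONYMS = re.compile(r"cell|phone|mobile", re.IGNORECASE)
INCHES_EN = re.compile(r"\d+\s?in\b")  # "5 in", "5in"
INCHES_PT = re.compile(r"\d+\s?pol\b")  # "5 pol"


def missing_synonym(pairs):
    """Queries containing "celular" whose English MT has none of cell/phone/mobile."""
    return [(src, mt) for src, mt in pairs
            if "celular" in src.lower() and not PHONE_SYNONYMS.search(mt)]


def missing_unit(pairs):
    """English titles with a digit + "in" whose Portuguese MT lacks a digit + "pol"."""
    return [(src, mt) for src, mt in pairs
            if INCHES_EN.search(src) and not INCHES_PT.search(mt)]


print(ACRONYMS.findall("NWT White Lace Dress Size XX"))
print(missing_synonym([("capa celular", "cell phone case"),
                       ("celular samsung", "samsung galaxy")]))
print(missing_unit([("tablet case 7 in", "capa tablet 7 pol"),
                    ("tablet case 7 in", "capa tablet")]))
```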
Examples: High-value Terms
Specialized acronyms (NWT, BNWT, NOB…)
Ambiguous brand names
Polysemous words
We add value by improving the most strategic asset: queries
Linguistic QA
Mistranslated queries = bad search results = fewer sales
We perform Linguistic QA on MT systems.
We check top unique queries
Example
Russian shopper’s query: подарок 8 марта (“gift March 8”; March 8 is International Women’s Day)
Literal MT translation: March 8 gift (122 matches)
Corrected MT translation: Mother’s Day Gift (105,000 matches)
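Checking "top unique queries" implies ranking deduplicated queries by how much search traffic they carry. The sketch below is one hypothetical way to do that: it normalizes queries, counts them, and keeps the most frequent ones until they cover a chosen share of traffic. The 80% coverage target and the sample log are assumptions, not eBay figures.

```python
from collections import Counter


def top_unique_queries(query_log, coverage=0.8):
    """Return the most frequent unique queries that together account for
    `coverage` of all query traffic, most frequent first."""
    counts = Counter(q.strip().lower() for q in query_log)
    total = sum(counts.values())
    selected, covered = [], 0
    for query, n in counts.most_common():
        if covered / total >= coverage:
            break
        selected.append((query, n))
        covered += n
    return selected


# Hypothetical query log
log = ["iphone"] * 5 + ["Samsung"] * 2 + ["samsung "] + ["nokia", "sony"]
print(top_unique_queries(log))
```

Reviewing only this head of the distribution lets a small team fix the translations that affect the most searches, as in the March 8 example above.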
Ranking: Comparing the quality of 2 or more MT systems
Human Judgement: Ranking and Rating MT Systems
Rating: Assigning a quality score to the output of an MT system
Sometimes combined.
Just like with post-editing, the actual evaluation work is sent to vendors.
We add value by QA’ing our vendors’ results (intra-annotator and inter-annotator agreement).
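Agreement between annotators is commonly measured with Cohen's kappa, which corrects raw agreement for agreement expected by chance: kappa = (p_o - p_e) / (1 - p_e). The slide does not say which metric the MTLS use, so treat this as one standard option. The same function covers intra-annotator agreement if it is fed one rater's scores on repeated items; the 1-to-3 ratings below are hypothetical.

```python
from collections import Counter


def cohens_kappa(ratings_a, ratings_b):
    """Cohen's kappa for two raters: (p_o - p_e) / (1 - p_e)."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("need two equally long, non-empty rating lists")
    n = len(ratings_a)
    p_o = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n  # observed agreement
    freq_a, freq_b = Counter(ratings_a), Counter(ratings_b)
    # Chance agreement: both raters independently pick the same label
    p_e = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if p_e == 1:
        return 1.0  # both raters used one identical label throughout
    return (p_o - p_e) / (1 - p_e)


# Hypothetical quality ratings of the same five MT segments by two vendor raters
rater_1 = [1, 1, 2, 2, 3]
rater_2 = [1, 2, 2, 2, 3]
print(round(cohens_kappa(rater_1, rater_2), 4))  # 0.6875
```

A low kappa signals that the guidelines are being read differently, which is a cue to retrain raters before trusting their rankings or ratings.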
Example: tagging an eBay listing title
Reviewed by MTLS to ensure quality
- Used to identify:
  - Brands
  - Main item in the listing
  - Important aspects of the item (color, material, texture, etc.)
NER: Providing QA for Semantic Annotation
Pottery & China | item 380990996167
Herend Hungary Handpainted Porcelain QUEEN VICTORIA Leaf Dish Flowers Butterfly
b g m as as su t su/ su
Named Entity Recognition (NER) is the process of tagging words as semantic entities that will be used to improve MT performance.
We add value by providing targeted vendor QA in 2 stages:
1) Sample the vendor’s work at regular intervals
2) Target tokens (words) likely to cause problems. E.g. we filter for tokens that are:
- Tagged with multiple labels (e.g. 7 times with “a”, 4 times with “g”, etc.)
- Tagged only sometimes
- Polysemous
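Both QA stages can be sketched in a few lines. The input format, a list of (token, label) pairs with `None` for untagged tokens, is an assumption, as is the externally supplied set of polysemous words; the label letters are taken from the slide, but their meanings are not defined there.

```python
from collections import Counter, defaultdict


def sample_at_intervals(items, every):
    """Stage 1: systematic sample, one item out of every `every`."""
    return items[::every]


def risky_tokens(annotations, polysemous=frozenset()):
    """Stage 2: flag tokens whose annotations look inconsistent.

    `annotations` is a list of (token, label) pairs; label is None when
    the annotator left the token untagged."""
    labels = defaultdict(Counter)
    for token, label in annotations:
        labels[token.lower()][label] += 1
    flagged = {}
    for token, counts in labels.items():
        tagged = {lab: n for lab, n in counts.items() if lab is not None}
        reasons = []
        if len(tagged) > 1:
            reasons.append("multiple labels")
        if tagged and None in counts:
            reasons.append("tagged only sometimes")
        if token in polysemous:
            reasons.append("polysemous")
        if reasons:
            flagged[token] = reasons
    return flagged


# Hypothetical vendor annotations
annotations = ([("apple", "a")] * 7 + [("apple", "g")] * 4
               + [("leaf", "su"), ("leaf", None), ("dish", "t"), ("dish", "t")])
print(risky_tokens(annotations, polysemous={"leaf"}))
print(sample_at_intervals(list(range(10)), 3))
```

Only the flagged tokens need a linguist's attention, which keeps QA effort proportional to risk rather than to volume.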
Innovation
A real-life problem
1) eBay listings have about 20,000 frequent acronyms (NOB, NWT, etc.).
2) The MT engine used to create training data doesn’t know most of them, so it just inserts them in the target text “as is”, e.g. NWT White Lace Dress Size XX → NWT Vestido Encaje Blanco Talla XX
3) This means our vendor post-editors have to spend a long time researching.
4) Researching the equivalent acronym for each language would take too long.
What do you do?
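Whatever the solution, a useful first step is measuring the problem. The sketch below is a hypothetical illustration, not the approach the slide goes on to present: it counts acronyms (using the slide's [A-Z]{2,4} pattern) that appear unchanged in both a source title and its MT output, so the most frequent ones can be prioritized. The `ignore` set for size placeholders such as "XX" is an assumption.

```python
import re
from collections import Counter

ACRONYM = re.compile(r"\b[A-Z]{2,4}\b")  # acronym pattern from the slide


def passthrough_acronyms(pairs, ignore=frozenset({"XX"})):
    """Count source acronyms that the MT output copied over unchanged.

    `pairs` holds (source title, MT output) tuples; `ignore` lists tokens
    such as size placeholders that are expected to stay as-is."""
    counts = Counter()
    for source, target in pairs:
        copied = set(ACRONYM.findall(source)) & set(ACRONYM.findall(target))
        counts.update(copied - ignore)
    return counts


pairs = [
    ("NWT White Lace Dress Size XX", "NWT Vestido Encaje Blanco Talla XX"),
    ("BNWT Red Shoes", "BNWT Zapatos Rojos"),
    ("NWT Blue Skirt", "NWT Falda Azul"),
]
print(passthrough_acronyms(pairs).most_common())
```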
THE MTLS PROFILE
What makes a good MTLS?
The Human Side of MT
Translator skills
- Linguistic knowledge: command of source and target language grammar and style
- Cultural knowledge: at ease in two worlds (US and target language)
Post-editor skills
- Adaptability to different translation quality requirements
- Speed: to process the vast amounts of MT output
MTLS-specific skills
- Analytical mind: can detect patterns
- Excellent communicator: interfaces with the MT Science Team and with vendors, and “translates” between the two
- Versatile: our MTLS have to perform many kinds of tasks (ranking, rating, semantic annotation)
- Process improver: analyzes the QA process to improve it
- Constantly learns to find new applications for their work
A Particular Set of Skills
THE FUTURE OF LINGUISTS IN MT: NEURAL MT AND BEYOND
Neural MT is Different…
Statistical MT is a white-box technology; Neural MT is a black box.
[Figure: SMT components that can be inspected separately: Translation Model, Language Model, Alignment, Others]
…But our Role Stays the Same
The Machine Learning Flow will always have Linguist-shaped gaps
…Put Another Way
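[Figure: timeline of the translator’s role and tools across the periods 1900-1980, 1980-1990, 1990-2007, 2007-2015, and 2015-…: Translator; PCs and word processors; TMs and TDs; Internet; MT; Data Science. Possible names for the next role: MTLS? Language Data Specialist? Linguistic Trainer?]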
- Core linguistic work: review and regular LQA
- MT work: human judgement, large-scale QA
- Data science: continue and expand semantic annotation services beyond named entities (name-value pairs, polysemous words…)
- Innovation: identify quality gaps and provide data sets to fix them (e.g. profanities, idioms, etc.)
…Future Tasks for Linguists in MT
Join the conversation: eBay MT Language Specialist Series (https://www.linkedin.com/groups/7011515): >40 articles on MT from a translator’s perspective
…Want to Know More?
Visit us at the eBay Tech Blog (http://www.ebaytechblog.com/category/machine-translation/ )
…Or Just Write Me a Letter
Q&A