The tipping point
![Page 1: The tipping point](https://reader035.fdocuments.net/reader035/viewer/2022062904/5873e2cb1a28abd72e8b6563/html5/thumbnails/1.jpg)
The Tipping Point
Andrzej Zydroń, CTO, XTM Intl
Localization World 2014 Vancouver
![Page 2: The tipping point](https://reader035.fdocuments.net/reader035/viewer/2022062904/5873e2cb1a28abd72e8b6563/html5/thumbnails/2.jpg)
The Tipping Point
OCR analogy:
• 1978: Kurzweil Computer Products launches OCR
• Initial quality varied, average up to 90%
  – Still quicker and cheaper to retype and proof
• Gradual improvements, including extensive use of dictionaries
  – 1990: quality up to 97%
• 1990s
  – Better algorithms, faster processors, cheaper RAM, extensive use of dictionaries, dynamic training, multiple script support
• 2000: quality up to 99%
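A rough way to see why retyping was still cheaper at 90% accuracy is to count the corrections left on a single page. The sketch below assumes the quoted figures are per-character accuracy and that a page holds about 2,000 characters; both are illustrative assumptions, not figures from the talk.

```python
def expected_errors(chars_per_page, accuracy):
    """Expected number of wrong characters on one page at a given accuracy."""
    return round(chars_per_page * (1.0 - accuracy))

# Accuracy milestones as quoted on the slide.
milestones = [(1978, 0.90), (1990, 0.97), (2000, 0.99)]

# 2000 characters per page is an assumed, illustrative page size.
errors_by_year = {year: expected_errors(2000, acc) for year, acc in milestones}

for year, errors in errors_by_year.items():
    print(f"{year}: about {errors} characters to fix per page")
```

Under these assumptions, the correction load falls by a factor of ten between the first and last milestone, which is the kind of shift the "tipping point" analogy relies on.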
![Page 3: The tipping point](https://reader035.fdocuments.net/reader035/viewer/2022062904/5873e2cb1a28abd72e8b6563/html5/thumbnails/3.jpg)
The Tipping Point
![Page 4: The tipping point](https://reader035.fdocuments.net/reader035/viewer/2022062904/5873e2cb1a28abd72e8b6563/html5/thumbnails/4.jpg)
Language
![Page 5: The tipping point](https://reader035.fdocuments.net/reader035/viewer/2022062904/5873e2cb1a28abd72e8b6563/html5/thumbnails/5.jpg)
Global Demand
12% pa growth
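To put the growth figure in perspective, the short calculation below shows what a constant 12% annual growth rate would imply if it compounded year on year. Constant compounding is an assumption made for illustration; the slide only states the rate.

```python
import math

def years_to_double(annual_growth_rate):
    """Years needed for demand to double at a constant compound growth rate."""
    return math.log(2) / math.log(1 + annual_growth_rate)

def growth_factor(annual_growth_rate, years):
    """Multiplier applied to demand after the given number of years."""
    return (1 + annual_growth_rate) ** years

doubling_years = years_to_double(0.12)
five_year_factor = growth_factor(0.12, 5)

print(f"Doubling time: {doubling_years:.1f} years")
print(f"Growth over 5 years: x{five_year_factor:.2f}")
```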
![Page 6: The tipping point](https://reader035.fdocuments.net/reader035/viewer/2022062904/5873e2cb1a28abd72e8b6563/html5/thumbnails/6.jpg)
Average Price Paradox
![Page 7: The tipping point](https://reader035.fdocuments.net/reader035/viewer/2022062904/5873e2cb1a28abd72e8b6563/html5/thumbnails/7.jpg)
Average Price Paradox
• Automation
• More competition
• More resources
• Better technology
• Machine translation
![Page 8: The tipping point](https://reader035.fdocuments.net/reader035/viewer/2022062904/5873e2cb1a28abd72e8b6563/html5/thumbnails/8.jpg)
The Translation Puzzle
![Page 9: The tipping point](https://reader035.fdocuments.net/reader035/viewer/2022062904/5873e2cb1a28abd72e8b6563/html5/thumbnails/9.jpg)
The Translation Puzzle
![Page 10: The tipping point](https://reader035.fdocuments.net/reader035/viewer/2022062904/5873e2cb1a28abd72e8b6563/html5/thumbnails/10.jpg)
The Translation Puzzle
Project Manager requirements:
– Real-time projects
  • Creation
  • Tracking
  • Communication
– Translation assets – TM, Terminology
– Financial management
![Page 11: The tipping point](https://reader035.fdocuments.net/reader035/viewer/2022062904/5873e2cb1a28abd72e8b6563/html5/thumbnails/11.jpg)
The Translation Puzzle
Client / Requestor requirements:
– Project creation
– Cost confirmation
– Project tracking
– Quality review
– Translation pick up
![Page 12: The tipping point](https://reader035.fdocuments.net/reader035/viewer/2022062904/5873e2cb1a28abd72e8b6563/html5/thumbnails/12.jpg)
The Translation Puzzle
Linguist requirements:
– Work effectively as a team
– Access to the most up to date assets
– Ensure translation quality
– WYSIWYG preview of target files
– Meet deadlines
![Page 13: The tipping point](https://reader035.fdocuments.net/reader035/viewer/2022062904/5873e2cb1a28abd72e8b6563/html5/thumbnails/13.jpg)
Putting the Pieces Together
Swift collaboration of all the project contributors with real-time data
sharing and tracking.
![Page 14: The tipping point](https://reader035.fdocuments.net/reader035/viewer/2022062904/5873e2cb1a28abd72e8b6563/html5/thumbnails/14.jpg)
Machine Translation
In a nutshell:
– 1950s: IBM / Washington University / Georgetown University
  • Transfer systems
  • ALPAC Report – 1966
    – More expensive, slower, less accurate
    – Ambiguity/complexity of language
    – Context
– 1970s/1980s
  • Systran (USAF, Xerox, Caterpillar, European Commission), Canadian Meteo
– Statistical Machine Translation (SMT): 2000s
  • EU funded research: Moses
  • Statistical/Example based translation (Och, Ney, Koehn, Marcu)
– Big Data: 1 million+ aligned sentences
![Page 15: The tipping point](https://reader035.fdocuments.net/reader035/viewer/2022062904/5873e2cb1a28abd72e8b6563/html5/thumbnails/15.jpg)
SMT
A great success:
– Google Translate
– Microsoft Translator
– Asia Online
– Safaba
– Tauyou
– DoMY
– Etc.
![Page 16: The tipping point](https://reader035.fdocuments.net/reader035/viewer/2022062904/5873e2cb1a28abd72e8b6563/html5/thumbnails/16.jpg)
SMT
Cannot overemphasise the contribution:
– European Union
– Academic institutions:
  • Edinburgh University
  • Carnegie Mellon
  • Princeton University
  • Johns Hopkins University
  • University of Pennsylvania
  • CNGL
    – Dublin City University
    – Trinity College
    – University of Limerick
![Page 17: The tipping point](https://reader035.fdocuments.net/reader035/viewer/2022062904/5873e2cb1a28abd72e8b6563/html5/thumbnails/17.jpg)
SMT
In a nutshell:
– Based on: Information Theory
  • Bayesian theory
  • Translation model
    – Probability that the source string is the translation of the target string
    – Given enough data we can calculate the probability that word ‘A’ is the translation for word ‘X’
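In the standard noisy-channel formulation, Bayes' rule lets the decoder pick the target sentence e that maximises P(f | e) · P(e) for a source sentence f, where P(f | e) is the translation model and P(e) a language model. The sketch below illustrates only the word-level part of the translation model: it estimates word translation probabilities from aligned sentences using the expectation-maximisation procedure of IBM Model 1 (without the NULL word). The three-sentence corpus and all names are illustrative assumptions, not material from the talk, and real systems need vastly more data.

```python
from collections import defaultdict

def train_ibm_model1(pairs, iterations=10):
    """Estimate t[(target_word, source_word)] = P(target_word | source_word)."""
    target_vocab = {w for _, tgt in pairs for w in tgt}
    uniform = 1.0 / len(target_vocab)
    t = defaultdict(lambda: uniform)  # start with no preference at all

    for _ in range(iterations):
        count = defaultdict(float)  # expected co-translation counts
        total = defaultdict(float)  # expected counts per source word
        for src, tgt in pairs:
            for tw in tgt:
                # Share one unit of probability mass among the source
                # words that could have produced this target word.
                norm = sum(t[(tw, sw)] for sw in src)
                for sw in src:
                    frac = t[(tw, sw)] / norm
                    count[(tw, sw)] += frac
                    total[sw] += frac
        # Re-estimate probabilities from the expected counts.
        t = defaultdict(float, {(tw, sw): c / total[sw]
                                for (tw, sw), c in count.items()})
    return t

# Tiny illustrative German-English corpus (assumed, not from the slides).
corpus = [
    ("das haus".split(), "the house".split()),
    ("das buch".split(), "the book".split()),
    ("ein buch".split(), "a book".split()),
]
probs = train_ibm_model1(corpus)
print(f"P(house | haus) = {probs[('house', 'haus')]:.2f}")
print(f"P(the | haus)   = {probs[('the', 'haus')]:.2f}")
```

Even though "haus" only ever appears next to both "the" and "house", the evidence from the other sentences pulls "the" towards "das", so "house" ends up as the more probable translation of "haus".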
![Page 18: The tipping point](https://reader035.fdocuments.net/reader035/viewer/2022062904/5873e2cb1a28abd72e8b6563/html5/thumbnails/18.jpg)
SMT
Limitations:
– You need an awful lot of data
– Probabilities are at best a ‘guess’
– Word order issues
  • English and German
  • English and Japanese
– Morphology difficulties
  • Impoverished to rich, e.g. English to Polish
– Terminology
– Workflow
– Real time retraining
![Page 19: The tipping point](https://reader035.fdocuments.net/reader035/viewer/2022062904/5873e2cb1a28abd72e8b6563/html5/thumbnails/19.jpg)
SMT
Limitations:
– Currently these are an impediment to further SMT adoption
![Page 20: The tipping point](https://reader035.fdocuments.net/reader035/viewer/2022062904/5873e2cb1a28abd72e8b6563/html5/thumbnails/20.jpg)
FALCON:
– EU FP7 funded project
– Federated Active Linguistic data CuratiON
– Members
  • Dublin City University
  • Trinity College Dublin
  • Easyling
  • Interverbum
  • XTM International
– Currently half way into 2 year project
![Page 21: The tipping point](https://reader035.fdocuments.net/reader035/viewer/2022062904/5873e2cb1a28abd72e8b6563/html5/thumbnails/21.jpg)
– Tight integration
  • Easyling
  • TermWeb
  • XTM
– L3Data
  • Linked Language and Localisation Data
  • SPARQL linking and curation of language resources
– Advances in SMT
  • Adding Babelnet – Lexical Big Data
  • Dynamic retraining
  • Optimal segment translation sequence
  • Forcing terminology (forced decoding)
  • Workflow integration
  • L3Data curation and sharing
Lays a golden egg
![Page 22: The tipping point](https://reader035.fdocuments.net/reader035/viewer/2022062904/5873e2cb1a28abd72e8b6563/html5/thumbnails/22.jpg)
Babelnet: http://www.babelnet.org
• Lexical Big Data
• Sapienza Università di Roma
  – Roberto Navigli
  – ERC funded project
• Princeton WordNet
• Wikipedia
• Wiktionary
• DBPedia
• Google
• 9.5 million entries
• Equivalents in 50 languages
![Page 23: The tipping point](https://reader035.fdocuments.net/reader035/viewer/2022062904/5873e2cb1a28abd72e8b6563/html5/thumbnails/23.jpg)
![Page 24: The tipping point](https://reader035.fdocuments.net/reader035/viewer/2022062904/5873e2cb1a28abd72e8b6563/html5/thumbnails/24.jpg)
Moses + Babelnet:
Moses: SMT Big Data
Babelnet: Lexical Big Data
Babelnet + Moses = much improved SMT
Babelnet + Segment Alignment = much improved alignment
![Page 25: The tipping point](https://reader035.fdocuments.net/reader035/viewer/2022062904/5873e2cb1a28abd72e8b6563/html5/thumbnails/25.jpg)
Dynamic retraining:
– New feature
– Moses learns on the fly as translation/post-editing happens
– Immediate benefits from translator output
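The idea of learning on the fly can be sketched with a toy phrase table whose relative-frequency estimates are updated each time a translator confirms a segment. This is not how Moses implements incremental training; the class and method names are hypothetical and the example only shows why a suggestion can change immediately after a post-edit.

```python
from collections import Counter, defaultdict

class OnlinePhraseTable:
    """Toy phrase table updated after every confirmed translation."""

    def __init__(self):
        # source phrase -> counts of each translation seen so far
        self.counts = defaultdict(Counter)

    def update(self, source_phrase, target_phrase):
        """Record one translator-approved phrase pair."""
        self.counts[source_phrase][target_phrase] += 1

    def probability(self, source_phrase, target_phrase):
        """Relative frequency of target_phrase among translations seen."""
        options = self.counts.get(source_phrase, Counter())
        total = sum(options.values())
        return options[target_phrase] / total if total else 0.0

    def best(self, source_phrase):
        """Most frequently confirmed translation, or None if unseen."""
        options = self.counts.get(source_phrase)
        return options.most_common(1)[0][0] if options else None

table = OnlinePhraseTable()
table.update("kleines Haus", "small house")
first_suggestion = table.best("kleines Haus")

# Two later post-edits prefer a different rendering; the very next
# suggestion reflects them without any batch retraining step.
table.update("kleines Haus", "little house")
table.update("kleines Haus", "little house")
later_suggestion = table.best("kleines Haus")
```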
![Page 26: The tipping point](https://reader035.fdocuments.net/reader035/viewer/2022062904/5873e2cb1a28abd72e8b6563/html5/thumbnails/26.jpg)
Optimal translation sequence:
Prioritize translation for dynamic retraining
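The slide does not say how the sequence is chosen. One plausible heuristic, shown here purely as an assumption, is to translate first the segments whose words recur most often in the rest of the job, so that each post-edit teaches the engine as much as possible before the remaining segments are machine translated.

```python
from collections import Counter

def order_segments(segments):
    """Greedy ordering: next comes the segment whose not-yet-covered
    words occur in the most segments of the whole job."""
    tokenised = [s.lower().split() for s in segments]
    # Number of segments in which each word appears.
    freq = Counter(w for toks in tokenised for w in set(toks))
    covered = set()
    remaining = list(range(len(segments)))
    order = []
    while remaining:
        best = max(
            remaining,
            key=lambda i: sum(freq[w] for w in set(tokenised[i]) - covered),
        )
        order.append(best)
        covered |= set(tokenised[best])
        remaining.remove(best)
    return [segments[i] for i in order]

job = ["Warranty", "Press the green button", "Open the cover"]
ordered = order_segments(job)
print(ordered)
```

A production system would more likely score segments with the engine's own confidence or n-gram coverage; word overlap is used here only to keep the sketch self-contained.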
![Page 27: The tipping point](https://reader035.fdocuments.net/reader035/viewer/2022062904/5873e2cb1a28abd72e8b6563/html5/thumbnails/27.jpg)
Forced decoding:
– Terminology system integration
– Prompt the Moses decoder to use a specific term
– Immediate benefits for translator

das ist ein kleines <term translation="dwelling">Haus</term>
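The markup in the example can be generated automatically from a terminology database before a segment is sent to the decoder. The sketch below is a minimal illustration with a hypothetical `mark_terms` helper and a one-entry glossary; it matches single whitespace-separated tokens only, and how the decoder is configured to honour the markup is outside its scope.

```python
from xml.sax.saxutils import escape, quoteattr

def mark_terms(sentence, glossary):
    """Wrap glossary terms in <term translation="..."> markup."""
    marked = []
    for token in sentence.split():
        translation = glossary.get(token)
        if translation is None:
            marked.append(escape(token))  # plain text, XML-escaped
        else:
            marked.append(
                f"<term translation={quoteattr(translation)}>"
                f"{escape(token)}</term>"
            )
    return " ".join(marked)

# Hypothetical glossary entry mirroring the example on the slide.
glossary = {"Haus": "dwelling"}
marked_segment = mark_terms("das ist ein kleines Haus", glossary)
print(marked_segment)
```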
![Page 28: The tipping point](https://reader035.fdocuments.net/reader035/viewer/2022062904/5873e2cb1a28abd72e8b6563/html5/thumbnails/28.jpg)
Workflow integration:
– Making SMT part of an integrated TMS workflow
  • Terminology: forced decoding
  • Babelnet input
  • Translation Memory
  • Browser based Translator Workbench
  • Dynamic retraining
  • Optimal sequence
  • Always up to date SMT engines
![Page 29: The tipping point](https://reader035.fdocuments.net/reader035/viewer/2022062904/5873e2cb1a28abd72e8b6563/html5/thumbnails/29.jpg)
Workflow integration:
![Page 30: The tipping point](https://reader035.fdocuments.net/reader035/viewer/2022062904/5873e2cb1a28abd72e8b6563/html5/thumbnails/30.jpg)
L3Data curation and sharing:
[Diagram: three linked lifecycles]
– Lex-concept lifecycle
– Bitext lifecycle
– Content lifecycle
Step labels in the diagram: Publish; Correct & refine; Discover & use; Discover data; (Re)train MT; Revise and annotate; I18n & source QA; Trans QA; Post-edit; Automated translation; Consume; Create
![Page 31: The tipping point](https://reader035.fdocuments.net/reader035/viewer/2022062904/5873e2cb1a28abd72e8b6563/html5/thumbnails/31.jpg)
Limits of current technology
– We are making significant progress
• Big Data generated dictionaries
– 9.5 million+ entries
• Phrase based alignment/translation
• Syntax based translation
• Hierarchical phrase based translation
– Marker/Function words
![Page 32: The tipping point](https://reader035.fdocuments.net/reader035/viewer/2022062904/5873e2cb1a28abd72e8b6563/html5/thumbnails/32.jpg)
Limits of current technology
– There are limits with current technology
• Syntax
• Morphology
• Grammar
• Statistical anomalies
• Data dilution
• Idioms
• Out of Vocabulary words
– Computers can never ‘understand’ the text
– Next generation systems need a completely new approach
![Page 33: The tipping point](https://reader035.fdocuments.net/reader035/viewer/2022062904/5873e2cb1a28abd72e8b6563/html5/thumbnails/33.jpg)
John Searle’s Chinese Room
![Page 34: The tipping point](https://reader035.fdocuments.net/reader035/viewer/2022062904/5873e2cb1a28abd72e8b6563/html5/thumbnails/34.jpg)
Defining Intelligence
Human vs Computer
• Human 200 OPS
• Computer 82,300,000,000 OPS
![Page 35: The tipping point](https://reader035.fdocuments.net/reader035/viewer/2022062904/5873e2cb1a28abd72e8b6563/html5/thumbnails/35.jpg)
How the brain works
30 billion cells, 100 trillion synapses
![Page 36: The tipping point](https://reader035.fdocuments.net/reader035/viewer/2022062904/5873e2cb1a28abd72e8b6563/html5/thumbnails/36.jpg)
How the brain works
![Page 37: The tipping point](https://reader035.fdocuments.net/reader035/viewer/2022062904/5873e2cb1a28abd72e8b6563/html5/thumbnails/37.jpg)
How the brain works
• Trajectory• Velocity• Angle• Wind speed • Direction
![Page 38: The tipping point](https://reader035.fdocuments.net/reader035/viewer/2022062904/5873e2cb1a28abd72e8b6563/html5/thumbnails/38.jpg)
How the brain works
![Page 39: The tipping point](https://reader035.fdocuments.net/reader035/viewer/2022062904/5873e2cb1a28abd72e8b6563/html5/thumbnails/39.jpg)
How the brain works
![Page 40: The tipping point](https://reader035.fdocuments.net/reader035/viewer/2022062904/5873e2cb1a28abd72e8b6563/html5/thumbnails/40.jpg)
How the brain works
![Page 41: The tipping point](https://reader035.fdocuments.net/reader035/viewer/2022062904/5873e2cb1a28abd72e8b6563/html5/thumbnails/41.jpg)
Human Intelligence
Jeff Hawkins: On Intelligence, 2004, ISBN 0-8050-7456-2
• Understanding cannot be measured by external behavior
• Understanding is an internal metric of how the brain remembers things to make predictions
• AI programs do not simulate brains and are not intelligent
• All intelligence is concentrated in the neocortex and the synapses that connect different parts of the brain
• Intelligence is primarily based on hierarchical pattern matching starting with an ‘invariant form’: house, animal, dog
• All animals exploit patterns in nature
![Page 42: The tipping point](https://reader035.fdocuments.net/reader035/viewer/2022062904/5873e2cb1a28abd72e8b6563/html5/thumbnails/42.jpg)
Simulating Human Intelligence
• Beyond Turing
• Biological intelligence
• Neocortical architecture
• Numenta
• Cortical theory
• Sparse distributed architecture
• Pattern matching
• Hierarchical Temporal Memory
![Page 43: The tipping point](https://reader035.fdocuments.net/reader035/viewer/2022062904/5873e2cb1a28abd72e8b6563/html5/thumbnails/43.jpg)
Simulating Human Intelligence
Hierarchical Temporal Sequence Memory:
Regions
• Learn sequences of common spatial patterns
• Pass stable representations up hierarchy
• Unfold sequences going down hierarchy
Hierarchy
• Reduces memory and training time
• Provides means of generalization
![Page 44: The tipping point](https://reader035.fdocuments.net/reader035/viewer/2022062904/5873e2cb1a28abd72e8b6563/html5/thumbnails/44.jpg)
Question and Answer session
Better Translation Technology
![Page 45: The tipping point](https://reader035.fdocuments.net/reader035/viewer/2022062904/5873e2cb1a28abd72e8b6563/html5/thumbnails/45.jpg)
Contact Details
XTM International
www.xtm-intl.com
Register for future Webinar sessions
www.xtm-intl.com/demos
Contact
+44 (0) 1753 480 479