Post on 18-Dec-2015
L10N Standards
Warszawa 2014
Why Standards?
Why have Standards?
L10N Standards
What are we going to cover:
1. Why L10N standards are important
2. The role XML has to play
3. Key L10N data standards
4. How to leverage L10N standards
5. Creating a totally data-driven automated L10N process
6. Interoperability
Why have Standards?
Current State of Art
L10N Typical Workflow
What you need is a better crane!???
Localization without Standards
[Diagram: workflow starting from the customer's source text and passing through extract, TM process, translate, merge and QA, with extracted, prepared, translated and target text as the intermediate files]
True Cost of Translation
Standards = Uniform Data
ISO Standard
Standards = Efficiency
Standards = Lower Costs
Standards = Safe to Implement
Standards = Greater Interoperability
Standards: Unforeseen Benefits
Standards: Misuse
Standards: Abuse
Standards: Sabotage
• Sabotaged Standards:
– Proprietary extensions
– Bad implementations
The importance of XML
Everything is now XML
• HTML/XHTML
• Web Services
• Adobe FrameMaker
• Microsoft Office
• Open Office
• ASP
• XAML
• Java Properties
• DITA
• Standards: TMX, XLIFF, SRX, GMX, TBX, xml:tm
• OAXAL: Open Architecture for XML Authoring and Localization
The power of XML
Any electronic format not in XML can be converted to XML
• FrameMaker
• RTF
• Microsoft Office pre-2007
• QuarkXPress
• Windows resource files
• Java resources
• PO/POT
• YAML
• Etc.
And then converted back into the original format
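As a minimal sketch of this round trip, the Python below turns the text of a Java .properties file into a small XML document and back again. The element names (`properties`, `entry`) and both function names are assumptions made for illustration; a real filter would also have to deal with comments, escapes and line continuations.

```python
import xml.etree.ElementTree as ET

def properties_to_xml(text):
    """Convert simple key=value lines into an XML string."""
    root = ET.Element("properties")
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # this sketch drops blank lines and comments
        key, _, value = line.partition("=")
        entry = ET.SubElement(root, "entry", key=key.strip())
        entry.text = value.strip()
    return ET.tostring(root, encoding="unicode")

def xml_to_properties(xml_text):
    """Convert the XML produced above back into key=value lines."""
    root = ET.fromstring(xml_text)
    return "\n".join(f"{e.get('key')}={e.text or ''}" for e in root.iter("entry"))

source = "greeting=Hello\nfarewell=Goodbye"
as_xml = properties_to_xml(source)
print(as_xml)
print(xml_to_properties(as_xml))
```

Once the content is in XML, the same extraction and translation memory tooling can be applied to it regardless of the original format.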
Benefits of XML for L10N
• Separation of form and content
• Should make documents easier to translate
• There are some critical design decisions
• Mistakes can hinder translatability
• XML can bootstrap its own localization
The significance of XML
• XML is not just another electronic format
• XML is an eXtensible syntax
• XML is a formal IT grammar
• XML is programmable
• XML can bootstrap its own localization
Benefits of XML for L10N
Why use XML for Localization?
• Most localizable documents are now in XML
• One input format
• Elegant
• Uses the latest IT technology
• Separation of source and content
• One single data bus
• Open Standards based
• You can use XML to assist its own localization
• One extraction + TM + SMT engine
Core L10N Standards
• W3C ITS Document Rules
• ETSI LIS SRX
• ETSI LIS xml:tm
• ETSI LIS TMX
• ETSI LIS TBX
• ETSI LIS GMX
• OASIS XLIFF
• W3C/OASIS DITA (XHTML, DocBook, or any XML Vocabulary)
• Linport Interoperability: TIPP, XLIFF:doc
ITS
• Internationalization and Localization Tag Set
– http://www.w3.org/International/its
• Internationalization Tag Set – Document Rules for a given XML vocabulary:
– Inline elements (within text)
– Sub flows
– Non-translatable
– Translatable attributes
• Guidelines for localizing XML documents
• Internationalization and Localization Markup Requirements
• Version 1.0, 2007
• Version 2.0, 2013
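To make the idea of document rules concrete, here is a small sketch that reads an ITS `translateRule` and lists the elements it marks as non-translatable. The sample document and the `non_translatable` helper are assumptions for illustration. Python's ElementTree supports only a subset of XPath, so the sketch rewrites a `//name` selector to `.//name`; a full ITS processor would evaluate the selector as real XPath.

```python
import xml.etree.ElementTree as ET

ITS_NS = "http://www.w3.org/2005/11/its"

# Hypothetical sample document with one ITS global rule
DOC = """<doc xmlns:its="http://www.w3.org/2005/11/its">
  <its:rules version="2.0">
    <its:translateRule selector="//code" translate="no"/>
  </its:rules>
  <para>Press <code>Ctrl+S</code> to save.</para>
  <para>Done.</para>
</doc>"""

def non_translatable(root):
    """Return the elements selected by translate="no" rules."""
    found = []
    for rule in root.iter(f"{{{ITS_NS}}}translateRule"):
        if rule.get("translate") == "no":
            # "//code" becomes ".//code", which ElementTree can evaluate
            found.extend(root.findall("." + rule.get("selector")))
    return found

root = ET.fromstring(DOC)
print([el.text for el in non_translatable(root)])
```

An extraction tool can use the same rules to protect such elements before the text is sent for translation.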
TMX
http://www.etsi.org/deliver/etsi_gs/lis/001_099/002/01.04.02_60/gs_lis002v010402p.pdf
• Translation Memory Exchange
• Current version 1.4b, 2.0 undergoing review
• Allows for the interchange of translation memories between different vendor systems
– No translation vendor lock-in
– Free exchange of translation assets
TMX History
• First LISA OSCAR Standard
– Version 1.1 1998
– Version 1.2 1999
– Version 1.3 2001
– Version 1.4b 2002
• Moved to ETSI/LIS 2012
– Version 2.0 2014?
• Two levels of implementation:
– Level 1 (Plain Text Only)
– Level 2 (Content Markup)
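A TMX file is a list of translation units, each holding one variant per language. The sketch below builds a minimal Level 1 (plain text) TMX 1.4 document with Python's standard library. The function name, the tool name and the sample sentence pair are assumptions for illustration only.

```python
import xml.etree.ElementTree as ET

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

def build_tmx(pairs, srclang="en", tgtlang="pl"):
    """Build a plain-text TMX document from (source, target) pairs."""
    tmx = ET.Element("tmx", version="1.4")
    ET.SubElement(tmx, "header", {
        "creationtool": "example-tool",  # hypothetical tool name
        "creationtoolversion": "0.1",
        "segtype": "sentence",
        "o-tmf": "none",
        "adminlang": "en",
        "srclang": srclang,
        "datatype": "plaintext",
    })
    body = ET.SubElement(tmx, "body")
    for source, target in pairs:
        tu = ET.SubElement(body, "tu")  # one translation unit per pair
        for lang, text in ((srclang, source), (tgtlang, target)):
            tuv = ET.SubElement(tu, "tuv", {XML_LANG: lang})
            ET.SubElement(tuv, "seg").text = text
    return ET.tostring(tmx, encoding="unicode")

print(build_tmx([("sentence", "zdanie")]))
```

Because the structure is this simple, a memory exported by one vendor's tool can be imported by another, which is the point of the standard.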
SRX
http://www.gala-global.org/oscarStandards/srx/srx20.html
• Segmentation Rules Exchange
• Current version 2.0 2008
• Defines how text is segmented into sentences
• Allows for the exchange of segmentation rules using regular expressions
• Complements TMX standard
• Referenced by XLIFF, TMX and xml:tm
SRX Key Concepts
• Unicode regular expression syntax defined
• Meta characters – Unicode regular expressions: "\X", "\s", "\S" etc.
• Operators – "*", "|", "?", "+" etc.
• Defines:
– Language rules: segmentation rules
– Map rules: how to apply the segmentation rules
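The way segmentation rules work can be sketched in a few lines. Each rule pairs a "before" and an "after" regular expression with a break or no-break decision, and the first rule that matches at a position wins. The rule list and the `segment` function below are assumptions for illustration; they use Python's `re` syntax rather than the regular expression flavour defined by SRX, and they do not read real SRX files.

```python
import re

# (is_break, pattern before the position, pattern after the position)
RULES = [
    (False, r"\b(?:Mr|Mrs|Dr|etc)\.", r"\s"),  # exception: abbreviations
    (True, r"[.?!]+", r"\s+"),                 # sentence-ending punctuation
]

def segment(text, rules=RULES):
    """Split text into segments; the first matching rule decides each position."""
    compiled = [(brk, re.compile(f"(?:{before})\\Z"), re.compile(after))
                for brk, before, after in rules]
    segments, start = [], 0
    for pos in range(1, len(text)):
        for brk, before, after in compiled:
            if before.search(text[:pos]) and after.match(text[pos:]):
                if brk:
                    segments.append(text[start:pos].strip())
                    start = pos
                break  # later rules are ignored once one has matched
    segments.append(text[start:].strip())
    return segments

print(segment("Dr. Smith arrived. He was late! Why?"))
```

Listing the no-break exception before the general break rule is what keeps "Dr." from ending a segment.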
GMX
http://docbox.etsi.org/ISG/Open/ISGLIS/GMX-V/GMX-V/GMX-V-2.0.html
• Global Information Management Metrics eXchange
• GMX/V Approved LISA OSCAR Standard February 2007
• Tripartite
– GMX-V : Volume, published for public comment
– GMX-C : Complexity, initial specification
– GMX-Q : Quality
• Standard for defining a L10N job
• Allows for quantifying job complexity
• GMX/V 2.0 Approved ETSI LIS
– added support for CJK word counts
– overall character count including white space characters
GMX-V
• GIM Metrics eXchange – Volume
• Objectives:
– Unambiguous and verifiable definition of word and character counts
– A method of exchanging counts within an XML framework
• Two types of count:
– Verifiable, based on electronic documents
– Non-verifiable
• Canonical form: XLIFF based
• Word boundaries: Unicode TR29
• Unicode character encoding
• Minimum conformance
– Total Character Count
– Total Word Count
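The following is a rough sketch of the kind of counts involved, not an implementation of GMX-V. It approximates word boundaries with a simple regular expression instead of the Unicode TR29 rules, and the function name and the three count categories are simplified assumptions.

```python
import re

def count_text(text):
    """Return approximate word and character counts for a piece of text."""
    # Approximation only: GMX-V specifies Unicode TR29 word boundaries
    words = re.findall(r"\w+(?:['’-]\w+)*", text)
    return {
        "words": len(words),
        "characters": sum(1 for ch in text if not ch.isspace()),
        "characters_with_whitespace": len(text),
    }

print(count_text("Hello, world!"))
```

The value of an agreed definition is that two tools given the same text report the same numbers, so counts can be verified rather than negotiated.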
XLIFF
http://www.oasis-open.org/committees/xliff
• XLIFF – XML Localization Interchange File Format
• Current status
– XLIFF 1.1 Committee Specification (31 Oct 2003)
– XLIFF 1.2 Approved as an OASIS Standard 2008
• Segmentation support
• (X)HTML XLIFF 1.1 Representation Guide
• PO / POT XLIFF 1.1 Representation Guide
• Java / Windows / .Net Representation Guide
– XLIFF 2.0 currently out for public comment (not backwards compatible)
XLIFF
• Single format for exchanging L10N data from disparate sources
• Loss-less
• Tool-neutral
• Formalized as an XML vocabulary
• Can embed skeleton file
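A small sketch shows what the interchange looks like in practice: a tool receives an XLIFF 1.2 file containing source text and returns the same file with targets filled in. The sample file, the file name in `original` and the `apply_translations` helper are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

XLIFF_NS = "urn:oasis:names:tc:xliff:document:1.2"
NS = {"x": XLIFF_NS}

# Hypothetical XLIFF 1.2 file with two untranslated units
XLIFF = """<xliff version="1.2" xmlns="urn:oasis:names:tc:xliff:document:1.2">
  <file original="example.txt" source-language="en" target-language="pl" datatype="plaintext">
    <body>
      <trans-unit id="1"><source>sentence</source></trans-unit>
      <trans-unit id="2"><source>text</source></trans-unit>
    </body>
  </file>
</xliff>"""

def apply_translations(xliff_text, translations):
    """Add a <target> to every trans-unit whose source has a translation."""
    ET.register_namespace("", XLIFF_NS)  # keep the default namespace on output
    root = ET.fromstring(xliff_text)
    for unit in root.iterfind(".//x:trans-unit", NS):
        source = unit.find("x:source", NS)
        if source is not None and source.text in translations:
            target = ET.SubElement(unit, f"{{{XLIFF_NS}}}target")
            target.text = translations[source.text]
    return ET.tostring(root, encoding="unicode")

print(apply_translations(XLIFF, {"sentence": "zdanie", "text": "tekst"}))
```

The translation tool never needs to understand the original file format; it only reads and writes trans-units.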
xml:tm
http://www.xtm-intl.com/manuals/xml-tm/xml-tm2.0.html
• XML based Text Memory
– Radical rethink of how to handle Translation Memory
– Donated by XML INTL to LISA OSCAR
– OSCAR Standard Feb 2007
– Adopted by ETSI LIS, version 2.0 ready for adoption
• Takes the DITA reuse principle down to sentence level
– Author Memory
– Translation Memory
xml:tm - Namespace
• Namespace is a major feature of XML
• Allows the mapping of different ontological entities onto the same representation
• Allows different ways to look at the same data
• Namespaces can be made transparent
xml:tm
• XML based text memory
• Revolutionary approach to translating XML documents
• First significant advance in translation memory technology
• Uses XML namespace to transparently embed contextual information
• The one ring that binds them all
xml:tm namespace
Example of the use of tm namespace in an XML document:
<document xmlns:tm="urn:xml-Intl-tm">
  <tm:tm>
    <section>
      <para>
        <tm:te>
          <tm:tu>Namespace is very flexible.</tm:tu>
          <tm:tu>It is very easy to use.</tm:tu>
        </tm:te>
      </para>
    </section>
  </tm:tm>
</document>
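The example above can also be read programmatically. The sketch below uses the namespace URI from the example to pull out the text units and, separately, to read each paragraph as if the tm elements were not there. The sample string repeats the example with its closing tags added, and the function names are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

TM_NS = "urn:xml-Intl-tm"

DOC = """<document xmlns:tm="urn:xml-Intl-tm">
  <tm:tm>
    <section>
      <para>
        <tm:te>
          <tm:tu>Namespace is very flexible.</tm:tu>
          <tm:tu>It is very easy to use.</tm:tu>
        </tm:te>
      </para>
    </section>
  </tm:tm>
</document>"""

def text_units(xml_text):
    """tm namespace view: the individual text units (sentences)."""
    root = ET.fromstring(xml_text)
    return ["".join(tu.itertext()).strip() for tu in root.iter(f"{{{TM_NS}}}tu")]

def paragraphs(xml_text):
    """Source document view: paragraph text with the tm markup ignored."""
    root = ET.fromstring(xml_text)
    return [" ".join("".join(p.itertext()).split()) for p in root.iter("para")]

print(text_units(DOC))
print(paragraphs(DOC))
```

Both views come from the same file, which is what it means for the namespace to be transparent.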
[Diagram: one document shown in two views. Source document view: doc, title, section, para, text. Source document tm namespace view: tm, te (text element) and tu (text unit, one per sentence) overlaid on the same tree.]
xml:tm Text Memory
• Author memory
– Maintain memory of source text
– Authoring statistics
– Authoring tool input
• Translation memory
– Automatic alignment
– Maintain perfect link of source and target text
– Reduce translation costs
xml:tm DOM differencing
[Diagram: DOM differencing. Original Source Document: tu id="1" to tu id="6". Updated Source Document: tu id="1", "2", "3", "4", "7", "6" and "8", with text units marked as deleted, modified and new.]
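The idea of comparing two versions of a document at text-unit level can be sketched with Python's `difflib`. This is an illustration of the concept only, not the differencing algorithm used by xml:tm; the function name and the classification into four groups are assumptions.

```python
from difflib import SequenceMatcher

def diff_units(old, new):
    """Classify sentences as unchanged, modified, deleted or new."""
    result = {"unchanged": [], "modified": [], "deleted": [], "new": []}
    matcher = SequenceMatcher(a=old, b=new, autojunk=False)
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        if op == "equal":
            result["unchanged"].extend(old[i1:i2])
        else:
            removed, added = old[i1:i2], new[j1:j2]
            paired = min(len(removed), len(added))
            # Units that line up are treated as modifications of the old unit
            result["modified"].extend(zip(removed[:paired], added[:paired]))
            result["deleted"].extend(removed[paired:])
            result["new"].extend(added[paired:])
    return result

old = ["A.", "B.", "C.", "D."]
new = ["A.", "B changed.", "D.", "E."]
print(diff_units(old, new))
```

Only the modified and new units need to go to a translator; unchanged units can keep their existing translations.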
xml:tm translated document in Polish
[Diagram: the translated document in the same two views. Translated document view: doc, title, section, para, tekst. Translated document tm namespace view: tm, te and tu, with each tu holding a translated sentence (zdanie).]
Putting It All Together
• Open Architecture for XML Authoring and Localization (OAXAL)
– http://wiki.oasis-open.org/oaxal/FrontPage
OAXAL 2.0
OAXAL Benefits
• SOA (Service Oriented Architecture) Open Architecture
• Open Standards - Open APIs
• Easy Exchange
• Modular design
• Interoperability
• Very high level of automation
Interoperability Now!/Linport
Interoperability Now!
http://www.interoperability-now.org/
• Born out of frustration and necessity
• Early 2012
• Members
– Bioloom Group
– Kilgray
– Medtronic
– Ontram
– Spartan Software
– XTM-INTL
• The goal:
– True 100% roundtrip interoperability between TMS/CAT tools
• Now part of Linport
Interoperability Now!/Linport
Linport
http://www.linport.org/
• Language INteroperability Portfolio
• Created in 2012 by the merging of two initiatives:
– Multilingual Electronic Dossier
– The Container Project
• Sponsored by:
– the European Union DG Translation
– JIAMCATT (http://jiamcatt.org/): Joint Inter-Agency Meeting on Computer-Assisted Translation and Terminology
OAXAL in Action
Translating English Soccer Articles into Arabic 24x7
Browser-Based Workbench
OAXAL In Action
• Contact details:
– Andrzej Zydroń
– azydron@xtm-intl.com
– http://www.xtm-intl.com