Coping with Babel

How to Localize XML

Designing for Localization

• Document design can seriously impact the costs of translation and localization.

• Remember that you are designing for all languages, not just English.

• There are clear do’s and don’ts.

• The overriding principle is good XML practice.

• Always consider the target language implications.

Entity references

Do not use entity references for word substitution:

<para>Use a &tool; to release the catch.</para>

• Cause problems for inflected languages

• Cause problems for parsing/translation tools

• Use boilerplate text instead

Translatable attributes

Avoid using translatable attributes:

<para>Use a <tool id="a1098" name="claw hammer"/> to release the CPU retention catch.</para>

• Cause problems for inflected languages

• Cause extra burden for translators

• More to go wrong

CDATA sections

Avoid using CDATA sections that may contain translatable text:

• Lose syntactical control

• How are translation tools to cope?

Processing instructions

Avoid Processing Instructions in translatable text:

<para>Use a <?tool name="claw hammer"?> to release the CPU retention catch.</para>

• Syntactically weak

• Confuse translation memory operations

Infinite Naming Schemes

Avoid the use of infinite naming schemes:

<resources xml:lang="en">
<err001>Cannot open file $1.</err001>
<hint001>Hint: does file $1 exist?</hint001>
<err002>Incorrect value.</err002>
<hint002>Hint: Must be between $1 and $2.</hint002>
<err003>Connection timeout.</err003>
</resources>

• No clear element definitions
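One way to avoid the open-ended element names above is a fixed vocabulary in which the identifier moves into an attribute. The following is a minimal sketch only; the element and attribute names `msg`, `type` and `id` are assumptions for illustration and do not come from the slides.

```python
import re
import xml.etree.ElementTree as ET

# Resource file using the open-ended naming scheme shown above.
INFINITE = """<resources xml:lang="en">
<err001>Cannot open file $1.</err001>
<hint001>Hint: does file $1 exist?</hint001>
<err002>Incorrect value.</err002>
</resources>"""


def to_finite(xml_text):
    """Rewrite <err001>-style elements as <msg type="err" id="001">.

    The names msg, type and id are hypothetical, chosen for this sketch.
    """
    src = ET.fromstring(xml_text)
    dst = ET.Element("resources", src.attrib)  # keeps xml:lang
    for child in src:
        m = re.fullmatch(r"([a-z]+)(\d+)", child.tag)
        if m is None:
            raise ValueError("unexpected element: " + child.tag)
        msg = ET.SubElement(dst, "msg", {"type": m.group(1), "id": m.group(2)})
        msg.text = child.text
    return dst


finite = to_finite(INFINITE)
print(ET.tostring(finite, encoding="unicode"))
```

With a single `msg` element, a DTD or schema can define the vocabulary once, and translation tools only need one rule for translatable content.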

Typographical elements

Avoid the use of "typographical" elements:

<para>Do not use type elements.</para>

• Bad XML practice.

• Causes problems for translators.

• Target language text may be in the opposite order.

Do not break sentences

Never break a linguistically complete text unit over more than one non-inline element:

<para>

<line>This text should not be</line>

<line>broken this way – the translated text may well be in a different order.</line>

</para>

XML Translation Standards

• LISA - Localization Industry Standards Association: http://www.lisa.org

• OASIS - Organization for the Advancement of Structured Information Standards: http://www.oasis-open.org

• W3C - World Wide Web Consortium: http://www.w3c.org

• OLIF Consortium: http://www.olif.net

LISA Standards

• TMX - Translation Memory Exchange format: http://www.lisa.org/tmx

• TBX - Termbase Exchange format: http://www.lisa.org/tbx

• SRX - Segmentation Rules Exchange format: http://www.lisa.org/srx

• GMX - GILT Metrics Exchange format: http://www.lisa.org/gmx

OASIS L10n Standards

• XLIFF - XML Localization Interchange File Format: http://www.oasis-open.org/committees/tc_home.php?wg_abbrev=xliff

• TransWS - Translation Web Services: http://www.oasis-open.org/committees/tc_home.php?wg_abbrev=trans-ws

W3C and OLIF

• W3C to start on Localization Directives standard.

• OLIF - Open Lexicon Interchange Format: http://www.olif.net

xml:tm

XML Text Memory

A radical new approach to translating XML documents

Computational Linguistic Methodologies

• Machine Translation

• Translation Memory

• Hybrid Linguistic Inferencing Engines

• Terminology

Translation memory

• Advent in the early 1980s

• Intermediate format

• Alignment

• Storage

• Leveraged memory

• Fuzzy matching – statistical

• Advantages: cost reduction, consistency

• Drawbacks: proofreading, managing memories

• No significant advances in technology

XML namespace

• Major new feature of XML compared to SGML

• Allows the mapping of different ontological entities onto the same representation

• Allows different ways to look at the same data

• Namespaces can be made transparent

xml:tm namespace

• Text Memory namespace

• Can be mapped onto any XML document

• Vertical view of document in terms of ‘text segments’

• Can be totally transparent

xml:tm namespace

Example of the use of namespace in an XML document:

<document xmlns:tm="urn:xml-Intl-tm">
  <tm:tm>
    <section>
      <para>
        <tm:te>
          <tm:tu>Namespace is very flexible.</tm:tu>
          <tm:tu>It is very easy to use.</tm:tu>
        </tm:te>
      </para>
    </section>
  </tm:tm>
</document>

xml:tm namespace

[Figure: one document tree shown in two views. Original document view: section, para, text. tm namespace view: the text of each para is wrapped in a te element that holds one tu element per sentence.]

xml:tm namespace

Original document view:

<para>Namespace is very simple. It is easy to use.</para>

tm namespace view:

<para>
<tm:te id="e1">
<tm:tu id="u1.1">Namespace is very simple.</tm:tu>
<tm:tu id="u1.2">It is easy to use.</tm:tu>
</tm:te>
</para>
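The two views can be illustrated with a short Python sketch using the standard library's ElementTree. It assumes the namespace URI `urn:xml-Intl-tm` from the earlier example; the helper names `text_units` and `original_view` are hypothetical.

```python
import xml.etree.ElementTree as ET

TM_NS = "urn:xml-Intl-tm"  # namespace URI as written in the deck's example

DOC = """<document xmlns:tm="urn:xml-Intl-tm">
<tm:tm><section><para><tm:te id="e1"><tm:tu id="u1.1">Namespace is very simple.</tm:tu> <tm:tu id="u1.2">It is easy to use.</tm:tu></tm:te></para></section></tm:tm>
</document>"""


def text_units(root):
    """tm namespace view: map each text unit id to its text."""
    return {tu.get("id"): "".join(tu.itertext()).strip()
            for tu in root.iter("{%s}tu" % TM_NS)}


def original_view(elem):
    """Original document view: the element's text with tm: markup ignored."""
    return "".join(elem.itertext()).strip()


root = ET.fromstring(DOC)
units = text_units(root)
para = root.find(".//para")
print(units)
print(original_view(para))
```

Because the tm elements live in their own namespace, a consumer that only cares about `para` content can read straight through them, which is what makes the namespace transparent.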

xml:tm Text Memory

• Author memory

– Maintain memory of source text

– Authoring statistics

– Authoring tool input

• Translation memory

– Automatic alignment

– Maintain perfect link of source and target text

– Reduce translation costs

xml:tm DOM differencing

[Figure: the Source Document holds tu id="1" to tu id="6". In the Updated Source Document, tu id="1", "3", "4" and "6" are unchanged, tu id="2" is deleted, tu id="7" is a modified unit with origid="5", and tu id="8" is new.]

xml:tm Author Memory

• Namespace aware differencing

• Identify changes from the previous version

• Unique text unit identifiers are maintained

• Modification history

• Text units can be loaded into a database

• Authoring environment integration
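The identifier bookkeeping described above can be sketched in Python. This is not the xml:tm differencing algorithm itself, which works on the DOM; it is a simplified stand-in that compares flat lists of text units, uses `difflib` similarity with an assumed threshold of 0.6 to decide whether a unit counts as modified, and uses made-up sample sentences. The ids mirror the DOM differencing slide.

```python
from difflib import SequenceMatcher


def diff_units(old, new_texts, threshold=0.6):
    """Carry text unit ids from `old` (id -> text) over to `new_texts`.

    Returns (units, deleted). Each unit is a dict with id, text, status
    and, for modified units, origid. The threshold is an assumption.
    """
    next_id = max(old, default=0) + 1
    unused = dict(old)   # old units not yet matched
    units = []
    pending = []         # (position, text) without an identical old unit

    # Pass 1: identical text keeps its identifier.
    for text in new_texts:
        match = next((i for i, t in unused.items() if t == text), None)
        if match is not None:
            del unused[match]
            units.append({"id": match, "text": text, "status": "unchanged"})
        else:
            pending.append((len(units), text))
            units.append(None)

    # Pass 2: what is left is either a modified old unit or a new unit.
    for pos, text in pending:
        best, score = None, 0.0
        for i, t in unused.items():
            ratio = SequenceMatcher(None, t, text).ratio()
            if ratio > score:
                best, score = i, ratio
        unit = {"id": next_id, "text": text, "status": "new"}
        if best is not None and score >= threshold:
            unit.update(status="modified", origid=best)
            del unused[best]
        units[pos] = unit
        next_id += 1

    return units, sorted(unused)


old = {1: "Open the case.", 2: "Remove the old fan.", 3: "Locate the CPU.",
       4: "Release the catch.", 5: "Lift the CPU out carefully.",
       6: "Close the case."}
new = ["Open the case.", "Locate the CPU.", "Release the catch.",
       "Lift the CPU out very carefully.", "Close the case.",
       "Dispose of packaging responsibly."]

units, deleted = diff_units(old, new)
for u in units:
    print(u)
print("deleted:", deleted)
```

Unchanged units keep their ids, so any translation already linked to those ids stays valid across versions.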

xml:tm Translation Memory

• The tm namespace can be used to create XLIFF files

• Automatic alignment of source and target languages

• Allows for more focused translation matching

– Perfect matching

– Leveraged matching from document - identical text

– Leveraged matching from database

– Modified text unit matching

– Linguistically enhanced fuzzy matching

– Non translatable text unit identification
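The order in which these match types might be tried can be sketched as follows. This is an illustrative assumption, not the xml:tm implementation: `previous` stands for the aligned previous version (id to source and target text), `db` for a translation database, character-level `difflib` similarity stands in for linguistically enhanced fuzzy matching, and the target strings are placeholders rather than real translations.

```python
from difflib import SequenceMatcher


def classify(unit, previous, db, threshold=0.75):
    """Return (match_type, suggested_target) for one source text unit.

    unit:     dict with id, text, status and optionally origid
    previous: id -> (source_text, target_text) from the aligned old version
    db:       source_text -> target_text from a translation database
    """
    text = unit["text"]
    # Perfect match: same unit, unchanged since the last translation.
    if unit.get("status") == "unchanged" and unit["id"] in previous:
        return "perfect", previous[unit["id"]][1]
    # Leveraged match from the document: identical text elsewhere.
    for source, target in previous.values():
        if source == text:
            return "doc leveraged", target
    # Leveraged match from the database.
    if text in db:
        return "db leveraged", db[text]
    # Modified unit: offer the translation of the unit it came from.
    if unit.get("origid") in previous:
        return "modified", previous[unit["origid"]][1]
    # Fuzzy match: closest earlier source text above the threshold.
    best_score, best_target = 0.0, None
    for source, target in previous.values():
        score = SequenceMatcher(None, source, text).ratio()
        if score > best_score:
            best_score, best_target = score, target
    if best_score >= threshold:
        return "fuzzy", best_target
    return "none", None


previous = {
    1: ("Open the case.", "[pl] Open the case."),
    3: ("Locate the CPU.", "[pl] Locate the CPU."),
    5: ("Lift the CPU out carefully.", "[pl] Lift the CPU out carefully."),
}
db = {"Close the case.": "[pl] Close the case."}
updated = [
    {"id": 1, "text": "Open the case.", "status": "unchanged"},
    {"id": 8, "text": "Locate the CPU.", "status": "new"},
    {"id": 9, "text": "Close the case.", "status": "new"},
    {"id": 7, "text": "Lift the CPU out very carefully.",
     "status": "modified", "origid": 5},
    {"id": 10, "text": "Open the cases.", "status": "new"},
    {"id": 11, "text": "Dispose of packaging responsibly.", "status": "new"},
]

results = [classify(u, previous, db) for u in updated]
for u, r in zip(updated, results):
    print(u["id"], r)
```

Only perfect matches would skip review in this sketch; every other suggested target would still go to a translator for proofing or translation.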

xml:tm translation

[Figure: Source Document (tu id="1" to tu id="6"), XLIFF Document (trans-unit id="1"), and Translated Document (tu id="1" to tu id="6").]
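The step from text units to XLIFF trans-units can be sketched in Python. This is a minimal illustration under stated assumptions: it targets the XLIFF 1.2 namespace, copies each `tm:tu` id to a `trans-unit` id, and uses a hypothetical file name (`manual.xml`) and target language (`pl`).

```python
import xml.etree.ElementTree as ET

TM_NS = "urn:xml-Intl-tm"                           # tm namespace URI used in this deck
XLIFF_NS = "urn:oasis:names:tc:xliff:document:1.2"  # XLIFF 1.2 namespace

SOURCE = """<document xmlns:tm="urn:xml-Intl-tm"><para><tm:te id="e1">
<tm:tu id="u1.1">Namespace is very simple.</tm:tu>
<tm:tu id="u1.2">It is easy to use.</tm:tu>
</tm:te></para></document>"""


def q(name):
    """Qualify an element name with the XLIFF namespace."""
    return "{%s}%s" % (XLIFF_NS, name)


def to_xliff(root, original, source_lang, target_lang):
    """Build an XLIFF tree with one trans-unit per tm:tu, keeping the ids."""
    xliff = ET.Element(q("xliff"), {"version": "1.2"})
    file_el = ET.SubElement(xliff, q("file"), {
        "original": original,
        "source-language": source_lang,
        "target-language": target_lang,
        "datatype": "xml",
    })
    body = ET.SubElement(file_el, q("body"))
    for tu in root.iter("{%s}tu" % TM_NS):
        unit = ET.SubElement(body, q("trans-unit"), {"id": tu.get("id")})
        source = ET.SubElement(unit, q("source"))
        source.text = "".join(tu.itertext()).strip()
    return xliff


ET.register_namespace("", XLIFF_NS)  # serialize XLIFF as the default namespace
xliff = to_xliff(ET.fromstring(SOURCE), "manual.xml", "en", "pl")
print(ET.tostring(xliff, encoding="unicode"))
```

Keeping the ids identical on both sides is what lets translated trans-units be merged back into the right text units later.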

xml:tm translated document

[Figure: the translated document in the same two views as the source. Translated document view: section, para, tekst. Translated tm namespace view: te and tu elements wrapping each zdanie (sentence).]

xml:tm perfect alignment

[Figure: Source Document and Translated Document each hold tu id="1" to tu id="6"; units with the same id are linked, giving perfect alignment.]

xml:tm perfect matching

[Figure: the updated source document (tu id="1", "2" deleted, "3", "4", "7" modified, "6", "8") is compared with the Matched Target Document (tu id="1", "3", "4", "7", "6", "8"). Legend: Perfect Matching; requires translation.]

xml:tm contextual memory

[Figure: Source Document and Translated Document, tu id="1" to tu id="6" in each, linked by perfect alignment.]

xml:tm leveraged DB memory

[Figure: a second Source Document and Translated Document pair, tu id="1" to tu id="6" in each, linked by perfect alignment.]

xml:tm in-document leveraged matching

[Figure: updated source document (tu id="1", "2" deleted, "3", "4", "7" modified, "6", "8" new with the same text as tu id="3") against the target document (tu id="1", "3", "4", "7", "6", "8"). Legend: Perfect Matching; leveraged match, requires proofing.]

xml:tm in-document fuzzy matching

[Figure: updated source document (tu id="1", "2" deleted, "3", "4", "7" mod:origid="5", "6", "8" new:same) against the target document (tu id="1", "3", "4", "7", "6", "8"). Legend: Perfect Matching; fuzzy match, requires proofing; leveraged match.]

xml:tm db leveraged matching

[Figure: the same pair of documents with tu id="9" added to both. Legend: Perfect Matching; fuzzy match, requires proofing; doc leveraged match; DB leveraged match, requires proofing.]

xml:tm non translatable text

[Figure: the same pair of documents with tu id="2" marked non translatable (requires no translation), tu id="8" marked new:same, and tu id="9" in both. Legend: Perfect Matching; fuzzy match, requires proofing; doc leveraged match; DB leveraged match, requires proofing; non translatable.]

Traditional Translation Scenario

[Figure: workflow across Publishing and Translation. Source text, extract, extracted text, tm process, prepared text, translate, translated target text.]

xml:tm Translation Scenario

[Figure: workflow across Publishing and Translator. XML source, extract, extracted text, tm process (perfect matching, leveraged matching), prepared text, translate via web interface, QA, merge, XML target text. The extraction and merge stages are labelled Automatic Process.]

xml:tm matching

• Perfect Matching driven by Author Memory

• Leveraged Matching: 100% same text

– In document Leveraged Matching

– Database Leveraged Matching

• Fuzzy Matching

– Modified Matching

– Linguistically aware Fuzzy Matching

• Non translatable element identification

– Alphanumeric

– Numeric

– Measurements
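Identifying the three kinds of non translatable text units listed above could look like the following Python sketch. The regular expressions and the short unit list are assumptions chosen for illustration; a production system would need locale-aware number formats and a much fuller set of units.

```python
import re

# Illustrative unit list only (an assumption, not from the slides).
UNITS = r"(?:mm|cm|km|m|kg|g|V|A|Hz|MHz|MB|GB)"
NUMBER = r"[+-]?\d+(?:[.,]\d+)*"

PATTERNS = [
    ("numeric", re.compile(NUMBER)),
    ("measurement", re.compile(NUMBER + r"\s?" + UNITS)),
    # A single token mixing letters and digits, such as a part number.
    ("alphanumeric", re.compile(r"(?=.*\d)(?=.*[A-Za-z])[A-Za-z0-9-]+")),
]


def non_translatable_kind(text):
    """Return 'numeric', 'measurement' or 'alphanumeric', else None."""
    token = text.strip()
    for kind, pattern in PATTERNS:
        if pattern.fullmatch(token):
            return kind
    return None


for sample in ["42", "1,000.5", "10 mm", "2.5kg", "a1098", "Use a claw hammer."]:
    print(repr(sample), non_translatable_kind(sample))
```

Text units that match can be excluded from word counts and passed through to the target document unchanged, subject to any locale-specific number formatting rules.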

xml:tm benefits

• Enterprise level scalability

• Totally integrated within the XML framework

• Source text is automatically extracted and matched

• Word counts are controlled by the customer

• Text can be presented for translation via the web

• Online composition

• The most up to date translation is held by the customer

• Data is merged automatically at end of translation cycle

• All memory operations are totally automated

• Can be used transparently for relay translations

• Much cheaper to implement and run

• More accurate – better matching

xml:tm summary

• Can be used to build consistent authoring systems

• Can be used to produce automatic authoring statistics

• Translation Memory generation and alignment is totally automatic

• Memory is held within the documents themselves

• Extraction and merging for translation are automatic

• The system provides much more efficient matching mechanisms

• Structure of the XML document is protected during translation

xml:tm

• Fully specified XML based standard

• http://www.xml-intl.com/docs/specification/xml-tm.html

• Maintained by xml-intl.com

• http://www.xml-intl.com/dtd/tm.dtd

• http://www.xml-intl.com/dtd/tm.xsd

• Detailed article on www.xml.com

• Offered for consideration as a LISA standard

Any questions?
