Coping with Babel How to Localize XML. Designing for Localization Document design can seriously...

Page 1

Coping with Babel

How to Localize XML

Page 2

Designing for Localization

• Document design can seriously impact the costs of translation and localization.

• Remember that you are designing for all languages, not just English.

• There are clear do’s and don’ts.

• Overriding principle is good XML practice.

• Always consider the target language implications.

Page 3

Entity references

Do not use entity references for word substitution:

<para>Use a &tool; to release the catch.</para>

• Cause problems for inflected languages

• Cause problems for parsing/translation tools

• Use boilerplate text instead
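
One way to catch such substitutions before a document goes out for translation is to scan the raw source for general entity references. The following is a minimal sketch and not part of the original slides; the function name find_word_entities and the sample string are assumptions made for illustration. It works on the raw text because a parser would either expand the entity or reject it as undeclared.

```python
import re

# The five entities predefined by XML are harmless; everything else is
# reported. Numeric character references (&#...;) are not matched because
# the pattern requires a letter or underscore after the ampersand.
PREDEFINED = {"amp", "lt", "gt", "quot", "apos"}
ENTITY_REF = re.compile(r"&([A-Za-z_][\w.\-]*);")


def find_word_entities(xml_text):
    """Return the names of general entity references found in raw XML text."""
    return [name for name in ENTITY_REF.findall(xml_text)
            if name not in PREDEFINED]


sample = "<para>Use a &tool; to release the catch &amp; lift.</para>"
print(find_word_entities(sample))
```

Each name reported this way marks a place where the surrounding sentence may need a different article, case, or gender in the target language.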

Page 4

Translatable attributes

Avoid using translatable attributes:

<para>Use a <tool id="a1098" name="claw hammer"/> to release the CPU retention catch.</para>

• Cause problems for inflected languages

• Cause extra burden for translators

• More to go wrong
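
A document set can be audited for this pattern with a few lines of Python. This is a minimal sketch under stated assumptions: the set of attribute names in TEXT_ATTRIBUTES and the function name are hypothetical, and a real schema would need its own list.

```python
import xml.etree.ElementTree as ET

# Hypothetical list of attribute names that usually carry human-readable text.
TEXT_ATTRIBUTES = {"name", "title", "alt", "label"}


def find_translatable_attributes(xml_text):
    """Return (element tag, attribute name, value) for suspicious attributes."""
    root = ET.fromstring(xml_text)
    hits = []
    for element in root.iter():
        for attr, value in element.attrib.items():
            if attr in TEXT_ATTRIBUTES and value.strip():
                hits.append((element.tag, attr, value))
    return hits


sample = ('<para>Use a <tool id="a1098" name="claw hammer"/> '
          'to release the CPU retention catch.</para>')
print(find_translatable_attributes(sample))
```

The id attribute is not reported, since identifiers are not translated; only the name attribute, which holds words a translator would have to handle out of context, is flagged.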

Page 5

CDATA sections

Avoid using CDATA sections that may contain translatable text:

<tmpl><![CDATA[<p>Please refer to the <em>index page</em> for further information</p>]]></tmpl>

• Lose syntactical control

• How are translation tools to cope?
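
The loss of syntactical control can be seen by parsing the same content with and without the CDATA wrapper. This is an illustrative sketch using Python's standard xml.etree.ElementTree module; the variable names are assumptions.

```python
import xml.etree.ElementTree as ET

marked_up = ET.fromstring(
    "<tmpl><p>Please refer to the <em>index page</em> "
    "for further information</p></tmpl>")
cdata = ET.fromstring(
    "<tmpl><![CDATA[<p>Please refer to the <em>index page</em> "
    "for further information</p>]]></tmpl>")

# With real markup the parser sees the inline structure.
marked_up_tags = [el.tag for el in marked_up.iter()]
# Inside CDATA the same characters are plain text: no child elements at all.
cdata_tags = [el.tag for el in cdata.iter()]

print(marked_up_tags)
print(cdata_tags)
print(cdata.text)
```

In the CDATA case a translation tool receives one string with angle brackets in it and has to guess which parts are markup to protect and which are text to translate.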

Page 6

Processing instructions

Avoid Processing Instructions in translatable text:

<para>Use a <?tool name="claw hammer"?> to release the CPU retention catch.</para>

• Syntactically weak

• Confuse translation memory operations
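
A short experiment shows how easily the content of a processing instruction drops out of sight. The sketch below uses Python's standard xml.etree.ElementTree module purely as an example of a text-extracting tool; other tools may behave differently.

```python
import xml.etree.ElementTree as ET

source = ('<para>Use a <?tool name="claw hammer"?> '
          'to release the CPU retention catch.</para>')

# ElementTree's default tree builder does not keep processing instructions,
# so the words carried by the PI never reach code that reads the text.
para = ET.fromstring(source)
visible_text = "".join(para.itertext())

print(repr(visible_text))
print("claw hammer" in visible_text)
```

The sentence that reaches a translator or a translation memory has a hole where the tool name should be.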

Page 7

Infinite Naming Schemes

Avoid the use of infinite naming schemes:

<resources xml:lang="en">
<err001>Cannot open file $1.</err001>
<hint001>Hint: does file $1 exist?</hint001>
<err002>Incorrect value.</err002>
<hint002>Hint: Must be between $1 and $2.</hint002>
<err003>Connection timeout.</err003>
</resources>

• No clear element definitions
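
One possible remedy is to move the varying part of each name into attributes, so that the vocabulary has a single, clearly defined element. The sketch below is an assumption about what such a cleanup could look like; the element name message and the attributes type and id are hypothetical and do not come from the slides.

```python
import re
import xml.etree.ElementTree as ET

# Matches numbered names such as err001 or hint002.
NUMBERED_NAME = re.compile(r"^([A-Za-z]+?)(\d+)$")


def normalise_names(resources):
    """Rewrite <err001> style children as <message type="err" id="001">."""
    for child in resources:
        match = NUMBERED_NAME.match(child.tag)
        if match:
            child.set("type", match.group(1))
            child.set("id", match.group(2))
            child.tag = "message"
    return resources


source = ('<resources xml:lang="en">'
          '<err001>Cannot open file $1.</err001>'
          '<hint001>Hint: does file $1 exist?</hint001>'
          '</resources>')
root = normalise_names(ET.fromstring(source))
print(ET.tostring(root, encoding="unicode"))
```

With a fixed element name, a schema and a translation filter each need only one rule, however many messages are added later.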

Page 8

Typographical elements

Avoid the use of "typographical" elements:

<para><b>Do not use</b> <br/> type elements.</para>

• Bad XML practice.

• Causes problems for translators.

• Target language text may be in the opposite order.

Page 9

Do not break sentences

Never break a linguistically complete text unit over more than one non-inline element:

<para>

<line>This text should not be</line>

<line>broken this way – the translated text may well be in a different order.</line>

</para>

Page 10

XML Translation Standards

• LISA - Localization Industry Standards Association: http://www.lisa.org

• OASIS - Organization for the Advancement of Structured Information Standards: http://www.oasis-open.org

• W3C - World Wide Web Consortium: http://www.w3c.org

• OLIF Consortium: http://www.olif.net

Page 11

LISA Standards

• TMX - Translation Memory Exchange format: http://www.lisa.org/tmx

• TBX - Termbase Exchange format: http://www.lisa.org/tbx

• SRX - Segmentation Rules Exchange format: http://www.lisa.org/srx

• GMX - GILT Metrics Exchange format: http://www.lisa.org/gmx

Page 12

OASIS L10n Standards

• XLIFF - XML Localization Interchange File Format: http://www.oasis-open.org/committees/tc_home.php?wg_abbrev=xliff

• TransWS - Translation Web Services: http://www.oasis-open.org/committees/tc_home.php?wg_abbrev=trans-ws

Page 13

W3C and OLIF

• W3C to start on Localization Directives standard.

• OLIF - Open Lexicon Interchange Format: http://www.olif.net

Page 14

xml:tm

XML Text Memory

A radical new approach to translating XML documents

Page 15

Computational Linguistic Methodologies

• Machine Translation

• Translation Memory

• Hybrid Linguistic Inferencing Engines

• Terminology

Page 16

Translation memory

• Advent in the early 1980s

• Intermediate format

• Alignment

• Storage

• Leveraged memory

• Fuzzy matching – statistical

• Advantages: cost reduction, consistency

• Drawbacks: proofreading, managing memories

• No significant advances in technology

Page 17

XML namespace

• Major new feature of XML compared to SGML

• Allows the mapping of different ontological entities onto the same representation

• Allows different ways to look at the same data

• Namespaces can be made transparent

Page 18

xml:tm namespace

• Text Memory namespace

• Can be mapped onto any XML document

• Vertical view of document in terms of ‘text segments’

• Can be totally transparent

Page 19

xml:tm namespace

Example of the use of namespace in an XML document:

<document xmlns:tm="urn:xml-Intl-tm">
  <tm:tm>
    <section>
      <para>
        <tm:te>
          <tm:tu>Namespace is very flexible.</tm:tu>
          <tm:tu>It is very easy to use.</tm:tu>
        </tm:te>
      </para>
    </section>
  </tm:tm>
</document>
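
The two ways of looking at such a document can be tried out with a namespace-aware parser. This is a minimal sketch using Python's standard xml.etree.ElementTree module; the closing tags in the sample were added so that it parses, and the variable names are assumptions.

```python
import xml.etree.ElementTree as ET

# The prefix-to-URI mapping copies the xmlns:tm declaration from the slide.
NS = {"tm": "urn:xml-Intl-tm"}

document = """
<document xmlns:tm="urn:xml-Intl-tm">
  <tm:tm>
    <section>
      <para>
        <tm:te>
          <tm:tu>Namespace is very flexible.</tm:tu>
          <tm:tu>It is very easy to use.</tm:tu>
        </tm:te>
      </para>
    </section>
  </tm:tm>
</document>
"""

root = ET.fromstring(document)

# Text-memory view: only the text units, in document order.
units = [tu.text.strip() for tu in root.iterfind(".//tm:tu", NS)]
print(units)

# Document view: the same tree with the tm elements ignored.
plain_tags = [el.tag for el in root.iter() if not el.tag.startswith("{")]
print(plain_tags)
```

Code that only cares about the document structure can skip every element in the tm namespace, which is what makes the text memory layer transparent.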

Page 20

xml:tm namespace

[Figure: two views of one document. The original document view shows the tree doc, title, section, para, text. The tm namespace view shows the same tree under tm, with each text divided into text elements (te) that contain sentence-level text units (tu).]

Page 21

xml:tm namespace

[Figure: a single text in the original document view and, in the tm namespace view, the same text as one text element (te) containing two sentences, each a text unit (tu).]

Page 22

xml:tm namespace

Original document view:

<para>Namespace is very simple. It is easy to use.</para>

tm namespace view:

<para>
  <tm:te id="e1">
    <tm:tu id="u1.1">Namespace is very simple.</tm:tu>
    <tm:tu id="u1.2">It is easy to use.</tm:tu>
  </tm:te>
</para>
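
The step from the original document view to the tm namespace view can be sketched in a few lines of Python. This is an illustration only, not the xml:tm reference implementation: the sentence splitter is deliberately naive, the function name is hypothetical, and the identifier pattern simply copies the e1 / u1.1 / u1.2 form shown on the slide.

```python
import re
import xml.etree.ElementTree as ET

TM_URI = "urn:xml-Intl-tm"          # namespace URI taken from the earlier slide
ET.register_namespace("tm", TM_URI)

# Naive assumption: a sentence ends at '.', '!' or '?' followed by whitespace.
SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def add_text_memory(para, element_number):
    """Wrap the text of <para> in one tm:te holding one tm:tu per sentence."""
    sentences = [s for s in SENTENCE_END.split((para.text or "").strip()) if s]
    para.text = None
    te = ET.SubElement(para, f"{{{TM_URI}}}te", {"id": f"e{element_number}"})
    for index, sentence in enumerate(sentences, start=1):
        tu = ET.SubElement(te, f"{{{TM_URI}}}tu",
                           {"id": f"u{element_number}.{index}"})
        tu.text = sentence
    return para


para = ET.fromstring("<para>Namespace is very simple. It is easy to use.</para>")
print(ET.tostring(add_text_memory(para, 1), encoding="unicode"))
```

A production segmenter would need rules for abbreviations, inline elements, and language-specific punctuation, which is the job the SRX standard addresses.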

Page 23

xml:tm Text Memory

• Author memory

– Maintain memory of source text

– Authoring statistics

– Authoring tool input

• Translation memory

– Automatic alignment

– Maintain perfect link of source and target text

– Reduce translation costs

Page 24

xml:tm DOM differencing

[Figure: the Source Document (tu id="1" to tu id="6") compared with the Updated Source Document (tu id="1", "3", "4", "7", "6", "8"). Figure labels: deleted; modified, origid="5"; tu id="8" new.]

Page 25

xml:tm Author Memory

• Namespace aware differencing

• Identify changes from the previous version

• Unique text unit identifiers are maintained

• Modification history

• Text units can be loaded into a database

• Authoring environment integration
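
The idea of differencing two versions while keeping identifiers stable can be sketched as follows. This is not the xml:tm differencing algorithm, which the slides do not describe; it is a minimal illustration built on Python's difflib, and the function name, the tuple layout, and the 0.6 similarity threshold are all assumptions.

```python
from difflib import SequenceMatcher
from itertools import count


def diff_text_units(old_units, new_texts, similarity=0.6):
    """Classify each new sentence against the previous version.

    old_units is a list of (id, text) pairs; new_texts is a list of strings.
    Returns (updated_units, deleted_ids), where updated_units is a list of
    (id, text, status, origid) tuples.
    """
    next_id = count(max((uid for uid, _ in old_units), default=0) + 1)
    old_texts = [text for _, text in old_units]
    updated, deleted = [], []
    matcher = SequenceMatcher(a=old_texts, b=new_texts, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            # Unchanged sentences keep their identifiers.
            for uid, text in old_units[i1:i2]:
                updated.append((uid, text, "unchanged", None))
            continue
        removed = list(old_units[i1:i2])
        for text in new_texts[j1:j2]:
            # Pair a new sentence with the most similar removed one, if any.
            best = max(removed, default=None,
                       key=lambda unit: SequenceMatcher(None, unit[1], text).ratio())
            if best and SequenceMatcher(None, best[1], text).ratio() >= similarity:
                removed.remove(best)
                updated.append((next(next_id), text, "modified", best[0]))
            else:
                updated.append((next(next_id), text, "new", None))
        deleted.extend(uid for uid, _ in removed)
    return updated, deleted


old = [(1, "Open the cover."), (2, "Remove the old fan."),
       (3, "Insert the CPU."), (4, "Close the retention catch."),
       (5, "Attach the heat sink."), (6, "Close the cover.")]
new = ["Open the cover.", "Insert the CPU.", "Close the retention catch.",
       "Attach the new heat sink.", "Close the cover.", "Switch on the power."]

updated, deleted = diff_text_units(old, new)
for unit in updated:
    print(unit)
print("deleted:", deleted)
```

The sample sentences are invented so that the outcome follows the pattern in the DOM differencing figure: one unit deleted, one modified unit that records its original identifier, and one new unit.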

Page 26

xml:tm Translation Memory

• The tm namespace can be used to create XLIFF files

• Automatic alignment of source and target languages

• Allows for more focused translation matching

– Perfect matching

– Leveraged matching from document - identical text

– Leveraged matching from database

– Modified text unit matching

– Linguistically enhanced fuzzy matching

– Non translatable text unit identification

Page 27

xml:tm translation

[Figure: three documents with matching identifiers: Source Document (tu id="1" to tu id="6"), XLIFF Document (trans-unit id="1" to trans-unit id="6"), and Translated Document (tu id="1" to tu id="6").]
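
Because every text unit already has an identifier, producing an XLIFF file is largely a matter of copying each unit into a trans-unit with the same id. The sketch below builds a minimal XLIFF 1.2 skeleton with Python's standard library; the function name, the file name manual.xml, and the language codes are assumptions for illustration, and a real export would carry more metadata.

```python
import xml.etree.ElementTree as ET

XLIFF_NS = "urn:oasis:names:tc:xliff:document:1.2"


def to_xliff(units, source_lang, target_lang, original):
    """Build a minimal XLIFF 1.2 tree from (id, source text) pairs."""
    xliff = ET.Element("xliff", {"version": "1.2", "xmlns": XLIFF_NS})
    file_el = ET.SubElement(xliff, "file", {
        "original": original,
        "source-language": source_lang,
        "target-language": target_lang,
        "datatype": "xml",
    })
    body = ET.SubElement(file_el, "body")
    for unit_id, text in units:
        # Each tm:tu becomes one trans-unit carrying the same identifier.
        trans_unit = ET.SubElement(body, "trans-unit", {"id": str(unit_id)})
        ET.SubElement(trans_unit, "source").text = text
    return xliff


units = [(1, "Namespace is very simple."), (2, "It is easy to use.")]
xliff = to_xliff(units, "en", "pl", "manual.xml")
print(ET.tostring(xliff, encoding="unicode"))
```

When the translated file comes back, the shared identifiers are what allow each target segment to be merged into the right place without a separate alignment step.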

Page 28

xml:tm translated document

[Figure: the translated document view and the translated tm namespace view. Both mirror the source structure (doc, title, section, para, te, tu), with the text and sentences now in the target language ("tekst", "zdanie").]

Page 29

xml:tm perfect alignment

[Figure: Source Document and Translated Document side by side, each with tu id="1" to tu id="6". Label: Perfect alignment.]

Page 30

xml:tm perfect matching

[Figure: Updated Source Document (tu id="1", "2", "3", "4", "7", "6", "8"; labels: deleted, modified, new) set against the Matched Target Document (tu id="1", "3", "4", "7", "6", "8"). Labels: Perfect Matching; requires translation (twice).]

Page 31

xml:tm contextual memory

[Figure: Source Document and Translated Document side by side, each with tu id="1" to tu id="6". Label: Perfect alignment.]

Page 32

xml:tm leveraged DB memory

[Figure: Source Document and Translated Document side by side, each with tu id="1" to tu id="6", shown together with a database (DB). Label: Perfect alignment.]

Page 33

xml:tm in-document leveraged matching

[Figure: Updated Source Document (tu id="1", "2", "3", "4", "7", "6", "8"; labels: deleted, modified, new: same id="3") set against the Matched Target Document (tu id="1", "3", "4", "7", "6", "8"). Labels: Perfect Matching; requires translation; requires proofing; leveraged match.]

Page 34

xml:tm in-document fuzzy matching

[Figure: Updated Source Document (tu id="1", "2", "3", "4", "7", "6", "8"; labels: deleted, mod: origid="5", new: same) set against the Matched Target Document (tu id="1", "3", "4", "7", "6", "8"). Labels: Perfect Matching; requires translation; requires proofing; fuzzy match; leveraged match.]

Page 35

xml:tm db leveraged matching

[Figure: Updated Source Document (tu id="1", "2", "3", "4", "7", "6", "8", "9"; labels: deleted, mod: origid="5", new: same) set against the Matched Target Document (tu id="1", "3", "4", "7", "6", "8", "9"), with a database (DB). Labels: Perfect Matching; requires translation; requires proofing; fuzzy match; doc leveraged match; DB leveraged match (requires proofing).]

Page 36

xml:tm non translatable text

[Figure: Updated Source Document (tu id="1", "2", "3", "4", "7", "6", "8", "9"; labels: non trans, new: same) set against the Matched Target Document (tu id="1", "2", "3", "4", "7", "6", "8", "9"), with a database (DB). Labels: Perfect Matching; requires translation; requires proofing; fuzzy match; doc leveraged match; DB leveraged match (requires proofing); tu id="2" non translatable (requires no translation).]

Page 37

Traditional Translation Scenario

[Figure: workflow divided between Publishing and Translation. Stage labels, in the order shown: source text, extract, extracted text, tm process, prepared text, translate, translated text, merge, target text, QA.]

Page 38

xml:tm Translation Scenario

[Figure: workflow divided between Publishing and Translator, connected via the Web. Stage labels: xml source text, extract, extracted text, tm process (perfect matching, leveraged matching), prepared text, translate (web interface), QA, merge, xml target text. The stages on the publishing side are marked Automatic Process.]

Page 39

xml:tm matching

• Perfect Matching driven by Author Memory

• Leveraged Matching: 100% same text

– In document Leveraged Matching

– Database Leveraged Matching

• Fuzzy Matching

– Modified Matching

– Linguistically aware Fuzzy Matching

• Non translatable element identification

– Alphanumeric

– Numeric

– Measurements
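
The matching types listed on this slide can be read as a cascade in which each text unit is tried against the cheapest and most reliable match first. The sketch below is one possible reading, not the xml:tm implementation: the function names, the data layout, the regular expressions for non translatable text, and the 0.75 fuzzy threshold are assumptions, and the target strings are placeholders rather than real translations.

```python
import re
from difflib import SequenceMatcher


def is_non_translatable(text):
    """Assumed rule: plain numbers, simple measurements, and part codes."""
    text = text.strip()
    numeric = re.fullmatch(r"[\d.,]+(\s?(mm|cm|m|kg|g|V|W))?", text)
    code = re.fullmatch(r"(?=.*\d)[A-Z0-9\-]+", text)
    return bool(numeric or code)


def classify(unit_id, text, previous, in_document, database, threshold=0.75):
    """Return the match type for one text unit.

    previous:    {id: (source, target)} from the last translated version
    in_document: {source: target} for units elsewhere in the same document
    database:    {source: target} for units from other documents
    """
    if is_non_translatable(text):
        return "non-translatable"
    if unit_id in previous and previous[unit_id][0] == text:
        return "perfect"
    if text in in_document:
        return "in-document leveraged"
    if text in database:
        return "database leveraged"
    candidates = [source for source, _ in previous.values()]
    candidates += list(in_document) + list(database)
    best = max((SequenceMatcher(None, source, text).ratio()
                for source in candidates), default=0.0)
    return "fuzzy" if best >= threshold else "new"


previous = {1: ("Open the cover.", "[pl] Open the cover."),
            5: ("Attach the heat sink.", "[pl] Attach the heat sink.")}
in_document = {"Insert the CPU.": "[pl] Insert the CPU."}
database = {"Switch on the power.": "[pl] Switch on the power."}

for unit_id, text in [(1, "Open the cover."), (8, "Insert the CPU."),
                      (9, "Switch on the power."),
                      (7, "Attach the new heat sink."), (2, "A1098")]:
    print(unit_id, classify(unit_id, text, previous, in_document, database))
```

Only the perfect match depends on the unit identifier, which is why it can be trusted without proofreading, while the leveraged and fuzzy matches rely on the text alone and would normally be proofed.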

Page 40

xml:tm benefits

• Enterprise level scalability

• Totally integrated within the XML framework

• Source text is automatically extracted and matched

• Word counts are controlled by the customer

• Text can be presented for translation via the web

• Online composition

• The most up to date translation is held by the customer

• Data is merged automatically at end of translation cycle

• All memory operations are totally automated

• Can be used transparently for relay translations

• Much cheaper to implement and run

• More accurate – better matching

Page 41

xml:tm summary

• Can be used to build consistent authoring systems

• Can be used to produce automatic authoring statistics

• Translation Memory generation and alignment is totally automatic

• Memory is held within the documents themselves

• Extraction and merging for translation are automatic

• The system provides much more efficient matching mechanisms

• Structure of the XML document is protected during translation

Page 42

xml:tm

• Fully specified XML based standard

• http://www.xml-intl.com/docs/specification/xml-tm.html

• Maintained by xml-intl.com

• http://www.xml-intl.com/dtd/tm.dtd

• http://www.xml-intl.com/dtd/tm.xsd

• Detailed article on www.xml.com

• Offered for consideration as a LISA standard

Page 43

Any questions?