The XML Localisation Interchange File Format
What’s New in XLIFF 1.2?
Tony Jewtushenko, Director Research & Development, Product Innovator Ltd.
Co-Chair – OASIS XLIFF TC
Agenda
Overview of XLIFF: definition, goals, benefits, architecture and basic XLIFF concepts
What’s new in XLIFF 1.2: new and changed features of the XLIFF 1.2 normative specification
Non-Normative Representation Guides: a brief introduction to the representation guides provided with XLIFF 1.2
XLIFF Overview
A glance at the definitions, goals and benefits of the XML Localisation Interchange File Format.
What is XLIFF?
A specification
for the lossless interchange of localizable data and its related information,
which is tool-neutral,
has been formalized as an XML vocabulary,
and features an extensibility mechanism.
Why XLIFF was created…
Localisation is difficult:
• Insufficient interoperability between tools
• Lack of support for the overall localisation workflow
• Localisation tool developers have to deal with many formats
• Large number of proprietary intermediate formats
XLIFF Timeline
(Timeline from 9/00 to 9/06)
• Sep 2000: Data Definition kickoff
• Mar 2001: Draft 1.0 Spec and DTD published
• Jun 2001: Whitepaper published
• Dec 2001: OASIS XLIFF TC Proposal submitted
• Apr 2002: XLIFF 1.0 Committee Spec approved
• May 2003: XLIFF 1.1 Committee Spec approved
• Aug 03 - Sep 03: XLIFF 1.1 Public Peer Review
• Nov 03: Revised XLIFF 1.1 Committee Spec approved
• Dec 03 - May 06: XLIFF 1.2 Segmentation; Representation Guides for (X)HTML, Java, PO/POT
• May 2006: XLIFF 1.2 Committee Spec and Representation Guides approved
• 14 Jul 2006 - 12 Sep 2006: XLIFF 1.2 Public Peer Review
Contributors to XLIFF - Past and Present
Alchemy Software; Bowne Global Solutions; Convey Software; Ektron, Inc; ENLASO Corp (RWS); Globalsight; Heartsome; HP; Idiom Technologies, Inc; Lionbridge; LRC; Lotus/IBM; Microsoft; Moravia IT; Novell; Oracle; Red Hat; PASS Engineering; SAP; SDL International; Sun Microsystems; Tektronix; TRADOS; XML Intl
OASIS XLIFF TC Members as of 1 Sept 06
TC Officers:
• Chairs: Tony Jewtushenko, Product Innovator Ltd; Bryan Schnabel, Tektronix
• Secretary: Peter Reynolds, Idiom Technologies, Inc.
Current Members of TC:
• Mat Lovatt, Oracle
• Doug Domeny, Ektron
• Rodolfo Raya, Heartsome
• Eiju Akahane, IBM
• Steven Harris, Idiom Technologies, Inc.
• Fredrik Corneliusson, Lionbridge
• Joachim Schurig, Lionbridge
• Milan Karasek, Moravia IT
• Florian Sachse, Pass Engineering
• Christian Lieske, SAP
• Magnus Martikainen, SDL International
• David Pooley, SDL International
• Kevin Bargary, University of Limerick Localisation Research Centre
• Reinhard Schaler, University of Limerick Localisation Research Centre
• Andrzej Zydron, XML-Intl
OASIS: Standards Body Home of XLIFF
OASIS: Organization for the Advancement of Structured Information Standards
World’s largest independent, non-profit organization dedicated to the standardisation of XML applications and Web Services
More than 150 member companies plus individuals
Operates the XML.ORG Registry, the open community clearinghouse of XML application schemas
Technical work on XML interoperability includes XML conformance and XML Registries/Repositories
General XML technical resource
XLIFF Benefits:
Benefit areas: Cost and Time; Automation; Open Standards; Interoperability; Flexibility; Scalability
• Reduces cost and turnaround time
• Reduces effort in deploying integrated best-of-breed solutions
• Reduces vendor lock-in; re-use
• Reduces defects introduced by manual processing and handling
• Leverages services, technologies, vendors
• Easy to scale and future-proof
High Level XLIFF Architecture
An XLIFF document is a container for all data needed for a localisation project:
1. Localizable objects (e.g. text strings, graphics) in source and target languages.
2. Supplementary information (e.g. glossaries, or material to recreate the original format).
3. Administrative information (e.g. workflow data).
4. Custom data (e.g. initialization information for tools).
The XLIFF Document
An XLIFF document is designed to store the extracted data related to localisation.
Each given source container (e.g. a file, a database table, and so forth) corresponds to a <file> element in XLIFF.
Each XLIFF document can include several <file> elements.
An entire localisation project could be stored in a single XLIFF document.
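As a rough illustration of the container idea, the sketch below walks a small XLIFF document and summarises each <file> element. The sample document, its file names and the helper name list_files are made up for this example; only the namespace URI and the element names come from XLIFF 1.2.

```python
import xml.etree.ElementTree as ET

# XLIFF 1.2 namespace, as used in the examples in this deck.
NS = {"x": "urn:oasis:names:tc:xliff:document:1.2"}

# Hypothetical two-file project; file names and strings are invented.
SAMPLE = """<xliff version='1.2' xmlns='urn:oasis:names:tc:xliff:document:1.2'>
  <file original='menu.txt' source-language='en' target-language='fr' datatype='plaintext'>
    <body>
      <trans-unit id='1'><source>Open</source></trans-unit>
      <trans-unit id='2'><source>Close</source></trans-unit>
    </body>
  </file>
  <file original='help.htm' source-language='en' target-language='fr' datatype='html'>
    <body>
      <trans-unit id='1'><source>Getting started</source></trans-unit>
    </body>
  </file>
</xliff>"""


def list_files(xliff_text):
    """Return (original, source language, target language, unit count) per <file>."""
    root = ET.fromstring(xliff_text)
    summary = []
    for file_elem in root.findall("x:file", NS):
        units = file_elem.findall(".//x:trans-unit", NS)
        summary.append((
            file_elem.get("original"),
            file_elem.get("source-language"),
            file_elem.get("target-language"),
            len(units),
        ))
    return summary


for row in list_files(SAMPLE):
    print(row)
```

Each <file> keeps its own source container name and language pair, which is what lets one document carry a whole project.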
Bilingual Model
Each <file> element is designed to store one source language and one target language.
The rationale is that translations into different target languages are usually done by different people.
However, the languages in <alt-trans> elements can differ from the target language. For example, proposed matches in national (Portugal) Portuguese when translating into Brazilian Portuguese.
Localisable Objects
Besides localisable text, XLIFF can also contain other localisable object types such as binary graphics
Supplementary information can be represented in a generic way through inline codes (e.g. formatting of text)
Relationships between objects can be captured (e.g. a hierarchical menu or text related to a web graphic)
Supplementary Info
XLIFF provides “hooks” for storing supplementary information in reference elements:
• Glossaries
• Translation memories
• Segmentation rules (via SRX file)
The supplementary information can be referenced (i.e. reside outside of the document), or embedded within the document
Administrative Info
XLIFF provides mechanisms for capturing administrative information:
• For relating source material to XLIFF documents.
• For storing workflow data.
• For providing pre-translation entries.
• For keeping track of changes.
Administrative Info – Pre-Translation
A set of proposed translations can be included for each <trans-unit> element, using the <alt-trans> element.
<trans-unit id='1'>
  <source xml:lang='en'>The text</source>
  <alt-trans match-quality='high' origin='MTsystem'>
    <target xml:lang='fr'>Le texte</target>
  </alt-trans>
</trans-unit>
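A tool consuming pre-translations might collect the proposals per trans-unit along the lines of the sketch below. This is only one possible approach: the wrapper document and the helper name proposals are assumptions for the example, while match-quality and origin are attributes of <alt-trans> in XLIFF 1.2.

```python
import xml.etree.ElementTree as ET

NS = {"x": "urn:oasis:names:tc:xliff:document:1.2"}
# ElementTree exposes xml:lang under the reserved XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical wrapper document around a single pre-translated unit.
SAMPLE = """<xliff version='1.2' xmlns='urn:oasis:names:tc:xliff:document:1.2'>
  <file original='sample.txt' source-language='en' target-language='fr' datatype='plaintext'>
    <body>
      <trans-unit id='1'>
        <source xml:lang='en'>The text</source>
        <alt-trans match-quality='high' origin='MTsystem'>
          <target xml:lang='fr'>Le texte</target>
        </alt-trans>
      </trans-unit>
    </body>
  </file>
</xliff>"""


def proposals(xliff_text):
    """Map each trans-unit id to the list of translations proposed for it."""
    root = ET.fromstring(xliff_text)
    result = {}
    for unit in root.iterfind(".//x:trans-unit", NS):
        found = []
        for alt in unit.findall("x:alt-trans", NS):
            target = alt.find("x:target", NS)
            if target is None:
                continue  # nothing proposed in this alt-trans
            found.append({
                "origin": alt.get("origin"),
                "match_quality": alt.get("match-quality"),
                "lang": target.get(XML_LANG),
                "text": "".join(target.itertext()),
            })
        result[unit.get("id")] = found
    return result


print(proposals(SAMPLE))
```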
Customising XLIFF
Customise XLIFF by extending (adding) user-defined:
• Elements
• Attributes
• Attribute values
Extending Elements
Extension points are in the following elements: <alt-trans>, <bin-unit>, <group>, <header>, <tool>, <trans-unit>, and new in 1.2: <xliff> and <seg-source>.
The content of each custom element can be any valid XML content: empty content, PCDATA, mixed content, and so forth.
Custom elements are defined in a private namespace schema.
Example of Extending Elements
<xliff version='1.2'
       xmlns='urn:oasis:names:tc:xliff:document:1.2'
       xmlns:sup='http://www.ChaucerState.ac.pg/Frm/XLFSup-v1'>
  <file original='passus-1.doc' source-language='enm' datatype='plaintext'>
    <body>
      <group>
        <sup:SourceInfo>
          <sup:Book>Piers Plowman, Passus 1</sup:Book>
          <sup:Author>William Langland</sup:Author>
        </sup:SourceInfo>
        <sup:WorkInfo Task='transcription' Context='Middle-English:1360'/>
        <trans-unit id='1'>
          <source xml:lang='enm'>What this mountaigne bymeneth</source>
          <target xml:lang='en'>What this mountain means</target>
          <sup:Reference Type='strophe'>1-a</sup:Reference>
        </trans-unit>
      </group>
    </body>
  </file>
</xliff>
Non-XLIFF elements (shown in bold on the original slide) are the ones with the sup: prefix.
Non-XLIFF elements defined in XSD:
<xsd:schema targetNamespace="http://www.ChaucerState.ac.pg/Frm/XLFSup-v1"
            xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            xmlns:sup="http://www.ChaucerState.ac.pg/Frm/XLFSup-v1"
            elementFormDefault="qualified" attributeFormDefault="unqualified">
  <xsd:element name="SourceInfo">
    <xsd:complexType>
      <xsd:sequence maxOccurs="unbounded">
        <xsd:element name="Book" type="xsd:string"/>
        <xsd:element name="Author" type="xsd:string"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
  <xsd:element name="WorkInfo">
    <xsd:complexType>
      <xsd:attribute name="Task" type="xsd:string"/>
      <xsd:attribute name="Context" type="xsd:string"/>
    </xsd:complexType>
  </xsd:element>
  <xsd:element name="Reference">
    <xsd:complexType>
      <xsd:simpleContent>
        <xsd:extension base="xsd:string">
          <xsd:attribute name="Type" type="xsd:string"/>
        </xsd:extension>
      </xsd:simpleContent>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>
Extending Attributes
Attributes from a namespace other than XLIFF's can be included in these XLIFF elements: <alt-trans>, <bin-source>, <bin-target>, <bin-unit>, <bpt>, <bx/>, <ept>, <ex/>, <file>, <g>, <group>, <it>, <mrk>, <ph>, <source>, <target>, <tool>, <trans-unit>, <x/>, and new in 1.2: <xliff>, <seg-source>.
There is no specific location where non-XLIFF attributes must be inserted.
There is no limit to the number of non-XLIFF attributes that can be used in an XLIFF document.
Extending Attributes
Attributes from HTML extend <group> and <trans-unit>
<xliff version='1.2'
       xmlns='urn:oasis:names:tc:xliff:document:1.2'
       xmlns:htm='http://www.w3.org/1999/xhtml'>
  <file original='table.htm' source-language='en' datatype='html'>
    <body>
      <group restype='table' htm:border='1' htm:cellpadding='5' htm:cellspacing='0' htm:width='100%'>
        <group restype='row'>
          <trans-unit id='1' htm:valign='top' htm:width='30%'>
            <source>Text of row 1 column 1</source>
          </trans-unit>
          <trans-unit id='2' htm:valign='top' htm:width='30%'>
            <source>Text of row 1 column 2</source>
          </trans-unit>
        </group>
        <group restype='row'>
          <trans-unit id='3' htm:valign='top' htm:width='30%'>
            <source>Text of row 2 column 1</source>
          </trans-unit>
          <trans-unit id='4' htm:valign='top' htm:width='30%'>
            <source>Text of row 2 column 2</source>
          </trans-unit>
        </group>
      </group>
    </body>
  </file>
</xliff>
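When reading such a document back, a tool can tell extension attributes apart by their namespace. The sketch below is one way to do that with Python's standard library; the helper name foreign_attributes and the trimmed-down sample are assumptions for the example.

```python
import xml.etree.ElementTree as ET

XLIFF_NS = "urn:oasis:names:tc:xliff:document:1.2"
XML_NS = "http://www.w3.org/XML/1998/namespace"  # xml:lang, xml:space
NS = {"x": XLIFF_NS}

# Shortened, hypothetical variant of the table example.
SAMPLE = """<xliff version='1.2' xmlns='urn:oasis:names:tc:xliff:document:1.2'
       xmlns:htm='http://www.w3.org/1999/xhtml'>
  <file original='table.htm' source-language='en' datatype='html'>
    <body>
      <group restype='table' htm:border='1' htm:width='100%'>
        <trans-unit id='1' htm:valign='top' xml:space='preserve'>
          <source>Text of row 1 column 1</source>
        </trans-unit>
      </group>
    </body>
  </file>
</xliff>"""


def foreign_attributes(elem):
    """Return {(namespace, local name): value} for non-XLIFF attributes."""
    found = {}
    for key, value in elem.attrib.items():
        if not key.startswith("{"):
            continue  # unqualified attributes belong to XLIFF itself
        namespace, local = key[1:].split("}", 1)
        if namespace not in (XLIFF_NS, XML_NS):
            found[(namespace, local)] = value
    return found


root = ET.fromstring(SAMPLE)
for elem in root.iter():
    extras = foreign_attributes(elem)
    if extras:
        print(elem.tag, extras)
```

A filter that merges the translated table back into HTML could use these values to restore the original markup attributes.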
Extending Attribute Values
Attributes where the list of values can be extended are the following: context-type, count-type, ctype, datatype, mtype, priority, purpose, restype, size-unit, state, state-qualifier, unit; new in 1.2: alttranstype, reformat
User-defined values must start with an “x-” prefix.
There is no specified mechanism to validate individual user-defined values, beyond checking that they start with “x-”.
Example of Extending Attribute Values
The following excerpt shows how the user-defined value “x-for-engineers” can be used in a document:
...
<group>
  <context-group name='EngineersData'>
    <context context-type='x-for-engineers'>Data...</context>
  </context-group>
</group>
...
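Since the only rule for user-defined values is the “x-” prefix, a tool-side check can be very small. The sketch below is an assumption about how a tool might classify a value; the function name is invented, and the set of predefined context-type values is only an illustrative subset, not the full list from the specification.

```python
# Illustrative subset of predefined context-type values (assumption: the
# XLIFF 1.2 specification defines the complete list).
PREDEFINED_CONTEXT_TYPES = {"sourcefile", "linenumber", "record"}


def classify_value(value, predefined):
    """Classify an attribute value as predefined, user-defined or invalid."""
    if value in predefined:
        return "predefined"
    # User-defined values need the "x-" prefix plus at least one character.
    if value.startswith("x-") and len(value) > 2:
        return "user-defined"
    return "invalid"


for candidate in ("sourcefile", "x-for-engineers", "for-engineers"):
    print(candidate, classify_value(candidate, PREDEFINED_CONTEXT_TYPES))
```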
Embedding XLIFF
An entire XLIFF document, or part of one, can be embedded in another XML document.
This is valid where the host vocabulary is defined by an XML Schema (XSD) that includes an <any> element in the definition of the element where the XLIFF data is inserted.
What’s new in XLIFF 1.2
New and changed features of XLIFF 1.2 normative specification
New, Deprecated or Changed 1.1 to 1.2
• Validation via Transitional and Strict models
• Segmentation support added
• Added mid as an optional attribute for the <alt-trans> element
• Changed the name attribute for <context-group> from required to optional, and modified its description
• Added extension point at <xliff>
• Tracking/accepting suggested translations added:
  • Added an alttranstype attribute for the <alt-trans> element.
  • Deprecated the use of multiple <target> elements in a single <alt-trans>.
  • Deprecated the restype attribute for the <target> element.
  • Introduced the phase-name attribute for the <alt-trans> element.
  • Introduced a convention: more recent <alt-trans> elements should appear before older ones.
Validation in 1.2
Validation via two “flavours” of XSD (Schema):
• Transitional: Deprecated (obsolete) elements and attributes are permitted. Use to validate when reading older-version documents (XLIFF 1.1).
  xsi:schemaLocation='urn:oasis:names:tc:xliff:document:1.2 xliff-core-1.2-transitional.xsd'
• Strict: Deprecated items are not permitted. Use to validate when creating XLIFF 1.2 documents.
  xsi:schemaLocation='urn:oasis:names:tc:xliff:document:1.2 xliff-core-1.2-strict.xsd'
XLIFF 1.2 Segmentation: seg-source
How corresponding segments are referenced between <seg-source> and <target>
<trans-unit id="1">
<source>First sentence. Second sentence.</source>
<seg-source>
<mrk mtype="seg" mid="1">First sentence.</mrk>
<mrk mtype="seg" mid="2">Second sentence.</mrk>
</seg-source>
<target>
<mrk mtype="seg" mid="1">Translated first sentence.</mrk>
<mrk mtype="seg" mid="2">Translated second sentence.</mrk>
</target>
</trans-unit>
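The mid attribute is what ties a source segment to its translated counterpart. As a minimal sketch, assuming a namespace-free excerpt like the one on the slide, the pairing could be done as follows; the helper names are invented for the example.

```python
import xml.etree.ElementTree as ET

# Namespace-free excerpt, as on the slide.
SAMPLE = """<trans-unit id="1">
  <source>First sentence. Second sentence.</source>
  <seg-source>
    <mrk mtype="seg" mid="1">First sentence.</mrk>
    <mrk mtype="seg" mid="2">Second sentence.</mrk>
  </seg-source>
  <target>
    <mrk mtype="seg" mid="1">Translated first sentence.</mrk>
    <mrk mtype="seg" mid="2">Translated second sentence.</mrk>
  </target>
</trans-unit>"""


def segments_by_mid(parent):
    """Collect the text of each <mrk mtype="seg"> child, keyed by mid."""
    if parent is None:
        return {}
    return {
        mrk.get("mid"): "".join(mrk.itertext())
        for mrk in parent.findall("mrk")
        if mrk.get("mtype") == "seg"
    }


def pair_segments(trans_unit_xml):
    """Return (mid, source segment, target segment or None) in source order."""
    unit = ET.fromstring(trans_unit_xml)
    source_segments = segments_by_mid(unit.find("seg-source"))
    target_segments = segments_by_mid(unit.find("target"))
    return [
        (mid, text, target_segments.get(mid))
        for mid, text in source_segments.items()
    ]


for mid, src, tgt in pair_segments(SAMPLE):
    print(mid, src, "->", tgt)
```

Looking segments up by mid rather than by position keeps the pairing stable even if the target lists its segments in a different order.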
XLIFF 1.2 Segmentation: seg-source
Alt-trans may also be segmented:
<trans-unit id="3">
<source>First sentence. Second sentence.</source>
<alt-trans match-quality="100%">
<source>The second sentence.</source>
<seg-source>
<mrk mtype="seg" mid="1">First sentence.</mrk>
<mrk mtype="seg" mid="2">Second sentence.</mrk>
</seg-source>
<target>
<mrk mtype="seg" mid="1">Translated first sentence.</mrk>
<mrk mtype="seg" mid="2">Translated second sentence.</mrk>
</target>
</alt-trans>
</trans-unit>
XLIFF 1.2 Segmentation: merged-trans
Aggregating translations across multiple trans-units:
<group merged-trans="yes">
  <trans-unit id="t1">
    <source>The German acronym v.</source>
    <target equiv-trans="no">Niemiecki skrót v. OT oznacza górną pozycję silnika.</target>
  </trans-unit>
  <trans-unit id="t2">
    <source>OT signifies the top dead center position for an engine.</source>
    <target equiv-trans="no"/>
  </trans-unit>
</group>
XLIFF 1.2 Segmentation: equiv-trans
To denote when a translation is not a direct equivalent of its source:
<trans-unit id="t1">
<source>Constrained text for limited</source>
<target equiv-trans="no">Tekst angielski dla</target>
</trans-unit>
<trans-unit id="t2">
<source>display for English</source>
<target equiv-trans="no">ograniczonego pola</target>
</trans-unit>
XLIFF 1.2: alttranstype attribute added to the <alt-trans> element
The alttranstype attribute is optional and has the following values and meanings:
Value: Meaning
proposal (default): The <alt-trans> represents a translation proposal from a translation memory or other resource.
previous-version: The <alt-trans> represents a previous version of the <target> element.
rejected: The <alt-trans> represents a rejected version of the <target> element.
reference: The <alt-trans> represents a translation to be used for reference purposes only, for example from a related product or a different language.
accepted: The <alt-trans> represents a proposed translation that was used for the translation of the trans-unit, possibly modified.
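Because proposal is the default, a reader has to treat a missing alttranstype as a proposal. The sketch below groups the alternatives of one trans-unit by type under that assumption; the sample unit, its French strings and the helper name are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical, namespace-free trans-unit with three alternatives.
SAMPLE = """<trans-unit id="1">
  <source>The text</source>
  <target>Le texte</target>
  <alt-trans alttranstype="previous-version"><target>Le text</target></alt-trans>
  <alt-trans><target>Le texte</target></alt-trans>
  <alt-trans alttranstype="rejected"><target>La texte</target></alt-trans>
</trans-unit>"""


def group_by_type(trans_unit_xml):
    """Group alt-trans target texts by alttranstype, defaulting to 'proposal'."""
    unit = ET.fromstring(trans_unit_xml)
    groups = {}
    for alt in unit.findall("alt-trans"):
        kind = alt.get("alttranstype", "proposal")
        target = alt.find("target")
        text = "".join(target.itertext()) if target is not None else ""
        groups.setdefault(kind, []).append(text)
    return groups


print(group_by_type(SAMPLE))
```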
XLIFF 1.2 Additional revisions to alt-trans
• Introduce the phase-name attribute for <alt-trans>: makes it possible to find out who made the change, when, and in which process the change was introduced.
• Deprecate the restype attribute for the <target> element: no longer needed, as the <target> is always of the same restype as the <trans-unit> or <alt-trans> it appears in.
• Convention: more recent <alt-trans> elements should appear before older ones, which makes it possible to determine the order of changes if multiple previous versions have been introduced.
Non-Normative Representation Guides
A brief walk-through of the Representation Guides provided with XLIFF 1.2
Purpose of the Guides
• Synonymous with “profile” specifications
• Non-normative: not a requirement for “legal” XLIFF 1.2
• Guidance for consistently representing native formats as XLIFF across implementations
• Kickstart new implementations
• Better interoperability between tools
Guide Contents
Recommended Extraction Techniques and Considerations
Recommended mappings from native structures to XLIFF
Strategies for implementing Translation Memory support (using inline tags)
Detailed examples and supplementary sample files
Extract-Localize-Merge Minimalist Approach
Process:
1. Identify localisable content (resources) and non-localisable content (code)
2. Populate the XLIFF document’s trans-unit and bin-unit elements with localisable content
3. Create a “Skeleton File” with localisable content stripped out and replaced with tokens that map to XLIFF trans-unit or bin-unit IDs
4. Translate the XLIFF document
5. Merge translated data in the XLIFF with the Skeleton to generate the localised translated material
The Skeleton file is optional and not recommended in certain circumstances (e.g., HTML, or if tool interoperability is required).
In <skl>, embed the entire Skeleton file within the XLIFF file or specify the file’s location.
XLIFF doesn’t define the Skeleton file or token format.
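The extract and merge steps can be sketched in a few lines. This is a toy illustration only: the %%%n%%% token format is an assumption (XLIFF leaves it undefined), the regular expression is a stand-in for a real HTML filter, and the function names are invented.

```python
import re

# Assumed token format; XLIFF does not define one.
TOKEN = re.compile(r"%%%(\d+)%%%")
# Text between two tags that contains at least one non-whitespace character.
TEXT_BETWEEN_TAGS = re.compile(r">([^<>]*[^<>\s][^<>]*)<")


def extract(document):
    """Return (skeleton, units): text runs replaced by tokens, and id -> text."""
    units = {}

    def replace_with_token(match):
        unit_id = str(len(units) + 1)
        units[unit_id] = match.group(1)
        return ">%%%" + unit_id + "%%%<"

    skeleton = TEXT_BETWEEN_TAGS.sub(replace_with_token, document)
    return skeleton, units


def merge(skeleton, translations):
    """Put translated text back in place of each token in the skeleton."""
    return TOKEN.sub(lambda match: translations[match.group(1)], skeleton)


html = ("<html><head><title>Almost the Smallest HTML File</title></head>"
        "<body><p>Just some stuff here to fill up space</p></body></html>")
skeleton, units = extract(html)
print(skeleton)
print(units)
# Merging the untranslated units reproduces the original document.
print(merge(skeleton, units) == html)
```

In a real workflow the units would be written out as trans-unit elements, translated, and then passed to the merge step in place of the source strings.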
Convert/Transform Paradigm (maximalist approach)
Process:
1. Convert original material by mapping the entire original document to XLIFF (using representation guides)
2. Structural information (code) is stored in the XLIFF container as non-translatable trans-units / bin-units
3. Translate XLIFF content
4. Generate the native translated material directly from the XLIFF content
Best suited for textual resource formats (RCDATA, Java, PO/POT) and mark-up languages like (X)HTML and XML
Difficult and impractical for binary resource formats (e.g., EXE’s and DLL’s)
[Diagram: Original Material, Filter, XLIFF, Translated Material]
Minimalist Example – Source Content & Skeleton
A very simple HTML file:
Original Content:
<html>
  <head>
    <title>Almost the Smallest HTML File</title>
  </head>
  <body>
    <p>Just some stuff here to fill up space</p>
  </body>
</html>
Skeleton (produced by the filter):
<html>
  <head>
    <title>%%%1%%%</title>
  </head>
  <body>
    <p>%%%2%%%</p>
  </body>
</html>
XLIFF (produced by the filter):
…
<header>
  <skl>
    <external-file href='sample.skl'/>
  </skl>
</header>
<body>
  <trans-unit id='%%%1%%%'>
    <source xml:lang='en'>Almost the Smallest HTML File</source>
  </trans-unit>
  <trans-unit id='%%%2%%%' restype='x-html-p'>
    <source xml:lang='en'>Just some stuff here to fill up space</source>
  </trans-unit>
</body>
…
Maximalist Example – Transform content to XLIFF
Full Transformation:
Original Content:
<html>
  <head>
    <title class='title'>Almost the Smallest HTML File</title>
  </head>
  <body>
    <p>Just some stuff here to fill up space</p>
  </body>
</html>
XLIFF:
…
<body>
  <group restype='x-html-html'>
    <group restype='x-html-head'>
      <trans-unit id='1' restype='x-html-p-title' html:class='title'>
        <source xml:lang='en'>Almost the Smallest HTML File</source>
      </trans-unit>
    </group>
    <group restype='x-html-body'>
      <trans-unit id='2' restype='x-html-p'>
        <source xml:lang='en'>Just some stuff here to fill up space</source>
      </trans-unit>
    </group>
  </group>
</body>
…
Guides provided with XLIFF 1.2
• (X)HTML: many flavours of HTML; the guide focuses on HTML 4.01 and XHTML 1.0
• Java Resource Bundles: support for the two subclasses of the java.util.ResourceBundle abstract class, PropertyResourceBundle and ListResourceBundle
• Gettext PO/POT files: Linux resource format
To Get the Most from the Guides
• Review the document in full before commencing design or development of an XLIFF solution
  • Considerations for recommended source document structure and content
  • Identify exceptions (e.g., dynamically generated HTML via server-side processing)
• Consider the Guide’s recommended Extraction approach when designing overall architecture:
  • HTML recommends “maximalist”, but provides examples for “minimalist” as well
  • Both PO/POT and Java make no specific recommendation, but examples are “maximalist”
  • Order of Extraction recommendations: typically in the order of the data in the source document
• Refer to the Mappings Reference in each guide when designing and building filters
  • Recommendations are comprehensive with many examples
  • Non-standard structures and conventions are dealt with (especially for (X)HTML)
• Use the Sample files
  • Valuable reference for learning
  • Provides validation during development effort
  • Verify compliance by feeding sample files into the filter, either native source or XLIFF
More Representation Guides
• Late draft of the Windows 32 / .NET guide: not approved, but posted on the XLIFF website; requires more expert input
• More to follow upon request
More Information
The XLIFF TC Web Site: http://www.xliff.org
Presenter: XLIFF TC Co-Chair Tony Jewtushenko (Product Innovator Ltd) ([email protected])
Product Innovator Ltd provides product management and software process improvement training and mentoring services to technology companies seeking to maximize their productivity and revenue potential.
Contact: [email protected] +353 1 8875183 / +353.87.2479057