Why the Baltics are a prime region for driving innovation in language technology, Rihards Kalnins,...

Welcome to the Baltics: A Prime Region for Driving Innovation in Language Technology Rihards Kalniņš tilde.com/mt

‘Multilingualism...is an important element in Europe’s competitiveness. One of the objectives of the EU’s language policy is therefore that every European citizen should master two other languages in addition to their mother tongue.’

EU Language Policy

‘Mother tongue + two’ objective (Barcelona Summit, 2002)

• Luxembourg
• Netherlands
• Estonia (52%)
• Latvia (54%)
• Lithuania (52%)
• Slovenia
• Malta
• Denmark

• Estonia (1.2M)
• Latvia (2M)
• Lithuania (2.9M)

Baltic languages = morphologically rich languages with large vocabularies, lots of ambiguity, and complex word agreement

Translation quality scores

Tilde is a leading European language technology company, specializing in custom machine translation and cloud terminology tools

• 140 employees at offices in 3 countries
• Leading research + innovation team
• Clients: LSPs, EU governments, major multinationals

Tilde MT is a full-service platform for custom MT – robust, mature, and highly scalable.

Convert your raw data into fully customized MT engines for high-demand applications.

Cleaning and alignment of data in multiple file formats, preparation for engine training

Data processing

Gathering and extraction of parallel corpora from the web, using automatic extraction methods

Automatic corpora collection

Training and tuning of MT systems with parallel data (TMs), monolingual data, terminology sets (glossaries)

MT training and tuning

Custom development of linguistic components for boosting translation quality, particularly for complex languages

Linguistic components

Correct processing of formatting tags, such as HTML code and various placeholders. Accurately deals with tags in CAT tools.

Tag processing
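
One common way to keep formatting tags intact through translation is to mask them with placeholder tokens before the text reaches the engine and restore them afterwards. The following is a minimal sketch of that general technique, not a description of how Tilde MT handles tags internally. The function names, the `__TAGn__` token format, and the tag pattern (HTML-style tags and `{0}`-style placeholders) are all assumptions made for illustration.

```python
import re

# Assumed tag pattern: HTML-style tags and numbered placeholders such as {0}.
TAG_RE = re.compile(r"<[^>]+>|\{\d+\}")
TOKEN_RE = re.compile(r"__TAG(\d+)__")


def mask_tags(text):
    """Replace each tag with a numbered token; return the masked text and the tags."""
    tags = []

    def _swap(match):
        tags.append(match.group(0))
        return f"__TAG{len(tags) - 1}__"

    return TAG_RE.sub(_swap, text), tags


def unmask_tags(text, tags):
    """Put the original tags back, wherever the tokens ended up in the output."""
    return TOKEN_RE.sub(lambda m: tags[int(m.group(1))], text)


source = "Click <b>Save</b> to store {0}."
masked, tags = mask_tags(source)
# The masked string would be sent to the MT engine; here it is restored directly.
restored = unmask_tags(masked, tags)
```

Because the tokens are numbered, the tags can still be restored correctly if the engine reorders words around them in the target sentence.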

Automatically localize units (punctuation, number formatting, etc.) from source-language to target-language style.

Automatic conversions

• Distance (60 mph ➝ 96,56 km/h)
• Currency (USD 1,200.43 ➝ EUR 1 064,31)
• Time (4 PM ➝ 16:00)
• Spacing (60% ➝ 60 %)
• Temperature (86°F ➝ 30°C)
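
Several of the conversions listed above can be approximated with simple pattern rewriting. The sketch below is illustrative only and is not Tilde's implementation. The function names and regular expressions are assumptions, the target style (decimal comma, space before the percent sign, 24-hour clock) follows the examples on the slide, and currency is left out because it depends on an exchange rate the slide does not state.

```python
import re


def convert_speed(text):
    """Rewrite '60 mph' as km/h with a decimal comma (1 mile = 1.609344 km)."""
    def _conv(m):
        kmh = float(m.group(1)) * 1.609344
        return f"{kmh:.2f}".replace(".", ",") + " km/h"
    return re.sub(r"(\d+(?:\.\d+)?)\s*mph", _conv, text)


def convert_temperature(text):
    """Rewrite '86°F' as whole degrees Celsius."""
    def _conv(m):
        celsius = (float(m.group(1)) - 32) * 5 / 9
        return f"{round(celsius)}°C"
    return re.sub(r"(-?\d+(?:\.\d+)?)\s*°F", _conv, text)


def convert_time(text):
    """Rewrite '4 PM' or '4:30 PM' as 24-hour time."""
    def _conv(m):
        hour = int(m.group(1)) % 12 + (12 if m.group(3) == "PM" else 0)
        return f"{hour:02d}:{m.group(2) or '00'}"
    return re.sub(r"\b(\d{1,2})(?::(\d{2}))?\s*(AM|PM)\b", _conv, text)


def convert_spacing(text):
    """Insert a space between a number and the percent sign."""
    return re.sub(r"(\d)%", r"\1 %", text)


def localize(text):
    """Apply all conversions in sequence."""
    for step in (convert_speed, convert_temperature, convert_time, convert_spacing):
        text = step(text)
    return text


print(localize("At 4 PM it was 86°F with 60% humidity."))
```

A production system would need locale-aware rules per language pair rather than one fixed target style.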

Evaluate MT quality for each sentence individually, providing deeper insight into the performance of MT engines.

Interactive quality scoring

Set a unique QE threshold for filtering MT results, saving you time in the post-editing process

Quality Estimation
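
Threshold filtering can be pictured as splitting MT output into segments worth post-editing and segments better translated from scratch. The sketch below assumes a hypothetical per-segment score between 0 and 1, where higher means better. The `Segment` class, the score range, and the example values are invented for illustration and do not reflect the actual Tilde MT interface.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    source: str
    mt_output: str
    qe_score: float  # hypothetical quality estimate in [0, 1]


def split_by_threshold(segments, threshold):
    """Return (segments at or above the threshold, segments below it)."""
    keep = [s for s in segments if s.qe_score >= threshold]
    drop = [s for s in segments if s.qe_score < threshold]
    return keep, drop


# Placeholder data; the scores are made up for the example.
segments = [
    Segment("source 1", "MT output 1", 0.91),
    Segment("source 2", "MT output 2", 0.42),
    Segment("source 3", "MT output 3", 0.70),
]

to_post_edit, to_retranslate = split_by_threshold(segments, threshold=0.70)
```

Raising the threshold sends fewer MT suggestions to the translator, and lowering it sends more, so the value can be set per project.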

Dynamically tailor MT engines with terms in the right inflectional forms – particularly important for morphologically rich languages

Runtime terminology integration

Dynamic Learning

User post-edits are instantly returned to the engine, helping translators to improve their MT engines through use

• Custom interfaces
• Corporate intranet
• Customer services
• Internal messaging
• Mobile and desktop apps

Full integration via API

Cloud or on-premise hosting

MT for Latvian e-Gov

36% of the Latvian population speaks Russian at home.
Challenge: Enable residents to use e-services in their native language.

The Latvian language has fewer than 2 million native speakers.
Challenge: Empower access to information for all residents and visitors.

Solution: Comprehensive, large-scale machine translation service for the public sector, fully integrated into government platforms

• Novel methods of corpora collection
• Balanced various types of texts and language models to get the optimal results
• Over 100 million sentences of training data (in multiple languages)
• Addition of public sector specific terminology

Building an MT service for the public sector

LATVIA TRANSLATES WITH HUGO.LV

• MT service for the Latvian public sector
• Securely translates texts, documents, websites
• Adapted for the Latvian language and public sector texts
• Integrated into e-services and government websites
• Languages
  • Latvian-English
  • English-Latvian
  • Latvian-Russian

Developed by Tilde

Hugo.lv enables multilingual communication in the public sector and empowers access to information and e-services.

• 1 680 Presidency events
• 25 300 participants
• 800+ journalists from 40 countries
• 197 EU policy meetings

2015 Presidency of the Council of the EU

• Analyzed specific EU Presidency terminology, added these terms to the system
• “Taught” MT to use the right terms in the right inflectional forms – a challenge for SMT and morphologically rich languages

Adapting Hugo.lv for EU Presidency texts

EU Presidency Translator
• Desktop application providing translation of texts, documents, websites
• Mobile app for journalists, delegates, and Presidency visitors
• Translation kiosk at the official headquarters of the EU Presidency

hugo.lv/translate2015

Mobile MT kiosk for the EU Presidency headquarters at the Latvian National Library – translation as a utility

“There will be no digital single market without multilingualism, as few people feel comfortable operating in their second or third language.”

Robert Madelin, Chief Innovation Officer, European Commission

Four elements are necessary for the take-up of an inclusive Digital Single Market:
1. Focus on digital skills
2. Special attention to gender and age balance
3. Local involvement and local outreach actions
4. Increased considerations for language diversity

Robert Madelin, Chief Innovation Officer, European Commission

eamt2016.tilde.com

Thank you! [email protected] tilde.com/mt