19/09/2013 Machine Translation at the EPO Removing language barriers from patent documentation Paul Schwander European Patent Office The 5th Workshop on Patent Translation, MT Summit 2013


Page 1: Machine Translation at the EPO Removing language barriers ...aamtjapio.com/kenkyu/files/kenkyu03/EPO_MTSummit(20130902).pdfEspacenet Publication Server Bulk Examiner toolsp GPI EPO

19/09/2013

Machine Translation at the EPO

Removing language barriers from patent documentation

Paul Schwander

European Patent Office

The 5th Workshop on Patent Translation, MT Summit 2013


Roadmap

The context: why is MT strategic?

Machine Translation @ the EPO: state of play and future plans


Why Machine Translation?

- Reducing the language barrier in the European context: Unitary Patent system supported by MT.

- Access to global patent information for prior art searches.


Global patent filings are rising continuously, especially Chinese applications.

IP5 = Europe, USA, Japan, China and Korea


Accessing Asian-language patents for EPO examiners


Addressing the Chinese language wall

Machine-translated full text acquired: ca. 5 million documents.

An on-demand manual translation service is offered to examiners.

5 million patents, manual translation at 1 day per patent -> 22 years of work for a team of 1000 translators.
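The 22-year figure is a person-day estimate. The slide gives the document count, the rate (1 day per patent) and the team size, but not how many working days a translator has per year; the minimal sketch below assumes 225, which lands close to the slide's figure. The function name is illustrative only.

```python
# Rough effort estimate behind "5 million patents -> 22 years for 1000 translators".
# ASSUMPTION: 225 working days per translator per year (not stated on the slide).


def manual_translation_years(documents: int, days_per_document: float,
                             translators: int, working_days_per_year: float) -> float:
    """Calendar years a team needs to translate `documents` manually."""
    total_person_days = documents * days_per_document
    person_days_per_year = translators * working_days_per_year
    return total_person_days / person_days_per_year


years = manual_translation_years(
    documents=5_000_000,        # ca. 5 million Chinese documents (from the slide)
    days_per_document=1.0,      # 1 day per patent (from the slide)
    translators=1000,           # team size (from the slide)
    working_days_per_year=225,  # assumed, not from the slide
)
print(f"{years:.1f} years")  # prints "22.2 years"
```

Changing the assumed working days per year shifts the result by a few percent, but the order of magnitude (decades of work for a large team) does not change.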

Workflow: Search in MTed text -> Relevant patents -> Order a manual translation

Detect -> Understand


Patent Translate launched on 29 February 2012

The system is integrated in Espacenet, the EPO Publication Server and EPOQUE.


Patent Translate: how does it work?

Result of a collaboration between the EPO and Google

Patent data are a huge source of corpora.

Patent documents and their translations (corresponding documents in other languages) are prepared and stored in a corpora repository.

The translation system is trained on these corpora.

Translation quality is assessed before launch to test whether it reaches a fit-for-purpose level.
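The corpus-preparation step pairs each patent document with its counterparts in other languages. The slides do not say how documents are matched; the minimal sketch below assumes a shared patent family identifier and pairs every non-English family member with the English one, in both directions. `PatentDoc` and `build_parallel_corpus` are hypothetical names, and a real pipeline would also need sentence alignment and cleaning, which are left out here.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class PatentDoc:
    family_id: str  # hypothetical key linking a document to its other-language counterparts
    lang: str       # language code, e.g. "en", "de", "zh"
    text: str


def build_parallel_corpus(docs, pivot="en"):
    """Pair each pivot-language document with its family members in other languages.

    Returns {(source_lang, target_lang): [(source_text, target_text), ...]}
    covering both directions. Sentence alignment is deliberately left out.
    """
    families = defaultdict(dict)
    for doc in docs:
        # Keep the first document seen for each language within a family.
        families[doc.family_id].setdefault(doc.lang, doc.text)

    corpus = defaultdict(list)
    for members in families.values():
        if pivot not in members:
            continue  # no pivot-language counterpart, so no usable pair
        for lang, text in members.items():
            if lang == pivot:
                continue
            corpus[(lang, pivot)].append((text, members[pivot]))
            corpus[(pivot, lang)].append((members[pivot], text))
    return dict(corpus)


docs = [
    PatentDoc("F1", "en", "A screw with a threaded shank."),
    PatentDoc("F1", "de", "Eine Schraube mit einem Gewindeschaft."),
    PatentDoc("F2", "en", "An electric motor."),  # no counterpart: contributes nothing
]
corpus = build_parallel_corpus(docs)
print(sorted(corpus))  # [('de', 'en'), ('en', 'de')]
```

Pivoting on English mirrors the language pairs Patent Translate offers (English to and from each other language); whether the EPO's repository is organised this way is an assumption.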


Patent Translate: Architecture

[Diagram: Patent Translate architecture. Components shown: client applications (Espacenet, Bulk, GPI, Examiner tools, Publication Server), API (REST), Patent Translate Service (EPO), Translation Memory, Translation (Google), Corpora Repository.]
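The architecture lists a Translation Memory next to the Google translation engine behind the EPO's Patent Translate Service, but does not say how the two interact. One plausible reading is that the memory stores translations already produced, so repeated requests do not reach the engine. The minimal sketch below illustrates that caching pattern only; the REST layer is omitted, and `PatentTranslateService` and `fake_engine` are hypothetical stand-ins, not the EPO's or Google's actual interfaces.

```python
from typing import Callable, Dict, Tuple

Key = Tuple[str, str, str]  # (source_lang, target_lang, source_text)


class PatentTranslateService:
    """Hypothetical sketch: consult a translation memory before calling the MT engine."""

    def __init__(self, engine: Callable[[str, str, str], str]):
        self._engine = engine               # stands in for the external MT engine
        self._memory: Dict[Key, str] = {}   # stands in for the Translation Memory
        self.engine_calls = 0

    def translate(self, text: str, source: str, target: str) -> str:
        key = (source, target, text)
        cached = self._memory.get(key)
        if cached is not None:
            return cached                   # served from memory, engine not called
        self.engine_calls += 1
        result = self._engine(text, source, target)
        self._memory[key] = result          # store for subsequent requests
        return result


def fake_engine(text: str, source: str, target: str) -> str:
    # Placeholder engine that only tags the text; a real deployment calls an MT system.
    return f"[{source}->{target}] {text}"


service = PatentTranslateService(fake_engine)
service.translate("Eine Schraube.", "de", "en")
service.translate("Eine Schraube.", "de", "en")  # second request hits the memory
print(service.engine_calls)  # 1
```

Keying the memory on the language pair as well as the text keeps translations of the same string into different target languages apart.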


Quality level ranking


Current achievements

Patent Translate now covers translations between English and 21 other languages: Bulgarian, Chinese, Czech, Danish, Dutch, Finnish, French, German, Greek, Hungarian, Icelandic, Italian, Japanese, Norwegian, Polish, Portuguese, Romanian, Slovak, Slovenian, Spanish and Swedish.

Using the language pairs currently offered, 305 million different machine translations of complete patent documents can be accessed 'on the fly'. Done manually, this would represent 1500 years of work for 1000 translators.
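The 1500-year figure can be checked the same way as the 22-year figure for the Chinese collection. Assuming the same rate of 1 day per complete document (this slide does not restate it), the claim implies about 203 working days per translator per year, while the 22-year figure implies about 227. The sketch back-calculates that implied number; the function name is illustrative only.

```python
# Back-calculate the working days per translator per year implied by a
# "N documents = Y years for T translators" claim.
# ASSUMPTION: 1 day per document, the rate given for the Chinese collection.


def implied_working_days_per_year(documents: float, years: float, translators: int,
                                  days_per_document: float = 1.0) -> float:
    return documents * days_per_document / (years * translators)


print(round(implied_working_days_per_year(305e6, 1500, 1000), 1))  # 203.3
print(round(implied_working_days_per_year(5e6, 22, 1000), 1))      # 227.3
```

Both values fall within a typical working year, so the two estimates are of the same kind; they simply round differently.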


Patent Translate usage


Plans

Project to be completed by the end of 2014: 32 languages, European and Asian.

2013-2014: Turkish, Estonian, Croatian, Latvian, Lithuanian, Albanian, Macedonian, Serbian, Russian and Korean
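The 2014 target of 32 languages is consistent with the language lists on the "Current achievements" and "Plans" slides if English itself is counted: 21 languages already paired with English, plus 10 planned for 2013-2014, plus English. Counting English is an assumption; the quick set-based check below makes it explicit.

```python
# Language lists transcribed from the "Current achievements" and "Plans" slides.
PAIRED_WITH_ENGLISH = {
    "Bulgarian", "Chinese", "Czech", "Danish", "Dutch", "Finnish", "French",
    "German", "Greek", "Hungarian", "Icelandic", "Italian", "Japanese",
    "Norwegian", "Polish", "Portuguese", "Romanian", "Slovak", "Slovenian",
    "Spanish", "Swedish",
}
PLANNED_2013_2014 = {
    "Turkish", "Estonian", "Croatian", "Latvian", "Lithuanian", "Albanian",
    "Macedonian", "Serbian", "Russian", "Korean",
}

# ASSUMPTION: the "32 languages" target counts English itself.
target = PAIRED_WITH_ENGLISH | PLANNED_2013_2014 | {"English"}

assert PAIRED_WITH_ENGLISH.isdisjoint(PLANNED_2013_2014)  # no language listed twice
print(len(PAIRED_WITH_ENGLISH), len(PLANNED_2013_2014), len(target))  # 21 10 32
```

Using sets rather than lists means an accidental duplicate across the two slides would show up as a smaller total instead of being double-counted.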


Patent Translate: illustrative example (Description)


MT Example


MT Example


Topics around Patent MT

CLIR (cross-language information retrieval) or not?

OCR combined with MT for non-digitised collections

Fit-for-purpose quality assessment

Perception of quality
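The "CLIR or not?" question contrasts two ways of searching foreign-language patents with an English query: translate the whole collection and search the machine-translated text (the "search in MTed text" workflow used for the Chinese collection), or translate only the query and search the original-language text. The toy sketch below shows both on hypothetical German documents. The word-list lexicon and the hand-written "MT output" are illustrative assumptions; real systems use full MT rather than a dictionary.

```python
# Toy contrast of two ways to search foreign-language patents with an English query.
# All data below is hypothetical.

GERMAN_DOCS = {"DE-1": "Schraube mit Gewindeschaft", "DE-2": "Elektrischer Motor"}

# Hypothetical bilingual lexicon used for query translation (English -> German).
LEXICON_EN_DE = {"screw": "schraube", "motor": "motor", "electric": "elektrischer"}

# Hypothetical English MT output for each German document (document translation).
MT_EN = {"DE-1": "screw with threaded shank", "DE-2": "electric motor"}


def search(index, terms):
    """Return ids of documents containing every query term (case-insensitive)."""
    return sorted(doc_id for doc_id, text in index.items()
                  if all(t in text.lower().split() for t in terms))


def search_mt_text(query):
    # Option 1: translate the collection once, then search it monolingually.
    return search(MT_EN, query.lower().split())


def search_clir(query):
    # Option 2 (CLIR): translate the query, then search the original-language text.
    terms = [LEXICON_EN_DE.get(t, t) for t in query.lower().split()]
    return search(GERMAN_DOCS, terms)


print(search_mt_text("electric motor"), search_clir("electric motor"))  # ['DE-2'] ['DE-2']
```

The trade-off in miniature: option 1 costs one translation per document up front but lets any English query run unchanged, while option 2 translates only short queries but depends on how well those few words are translated.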


Conclusion

MT is more than ever a must in the context of global patent documentation:

– The size of the patent collections to search is increasing, and systematic manual translation is not an option.

– MT has proven to be fit for purpose.

– Quality will continue to improve.


Thank You

www.epo.org
