Crestec - TAUS Tokyo Forum 2015

Research on developing a system to translate Japanese laws and regulations CRESTEC Inc. Yasuhiro SEKINE ( [email protected]) April 9, 2015 @ TAUS Executive Forum 2015


Research on developing a system to translate

Japanese laws and regulations

CRESTEC Inc.

Yasuhiro SEKINE([email protected])

April 9, 2015 @ TAUS Executive Forum 2015

Outline

• Background

• Overview of the system

• Challenges for the future

Background

Japanese Law Translation (http://www.japaneselawtranslation.go.jp/?re=02)

o Launched in April 2009 by the Ministry of Justice, Japan

o 489 translated laws (as of March 31, 2015)

o More than 100,000 accesses every day

Background

Law Data Providing System (http://law.e-gov.go.jp/cgi-bin/idxsearch.cgi)

o Launched in April 2001 by the Ministry of Internal Affairs and Communications

o Provides about 8,000 texts of Japanese laws and regulations (Japanese text only)

Background

Problems

o Only 489 translations of laws are available

More than 8,000 laws and regulations are now in effect in Japan

o Most of the translations do not include the latest amendments

About 100 laws are amended every year in Japan

Translating every law and keeping the translations up to date is costly in terms of both money and human resources.

Solving this problem is one of the motivations for developing a system that uses technology to provide a translation of every law in its latest version.

Background

Japanese Law Machine Translation (http://itrd.crestec.co.jp/jlmt/default_en.aspx)

o Test version was released in July 2014

o 8,206 translated laws (as of April 9, 2015)

Outline

• Background

• Overview of the system

• Challenges for the future

Overview of the system

• Purpose of the system

o Provide a translation of every law in Japan

→ Collect every Japanese law from the Law Data Providing System and translate them all automatically

o Provide translations of the latest amended versions and keep them updated

→ Collect the latest amended versions of laws and retranslate them constantly

• Functions

o Search by keywords

o Search by category

Resources

• Source texts

o HTML files downloaded from the Law Data Providing System

• Bilingual corpus

o Made from translated laws downloaded from the Japanese Law Translation website

o Used for translation memory and machine translation

• Dictionaries

o Governmental organizations

o Positions of governmental authorities

o Titles of laws

o Place names, etc.

o Compiled by hand

Translation methods

(1) Automatic conversion

Expressions that have definitive translations

Expressions with numbers, etc.

(2) 100% match from translation memory

(3) Automatic post-edit of fuzzy match

Specific types of fuzzy matches from the translation memory are post-edited automatically

(4) Statistical machine translation

Microsoft Translator Hub

Translation order: (1) → (2) → (3) → (4)
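The translation order can be pictured as a cascade in which each segment goes to the first method that can handle it. The following is only a minimal sketch under that reading. The function names, the tiny dictionary and translation memory, and the placeholder stubs for methods (3) and (4) are all hypothetical and are not taken from the actual system.

```python
from typing import Callable, Optional

# A method takes a Japanese segment and returns an English translation,
# or None when it cannot handle the segment.
Method = Callable[[str], Optional[str]]

# Hypothetical sample data, reusing examples that appear in the slides.
DICTIONARY = {"民法": "Civil Code", "林野庁": "Forestry Agency"}
TRANSLATION_MEMORY = {
    "次の各号のいずれかに該当する者は、五百万円以下の罰金に処する。":
        "A person who falls under any of the following items shall be "
        "punished by a fine of not more than five million yen.",
}


def auto_convert(segment: str) -> Optional[str]:
    """(1) Look the whole segment up in the dictionaries."""
    return DICTIONARY.get(segment)


def exact_match(segment: str) -> Optional[str]:
    """(2) Return the translation of a 100% match in the translation memory."""
    return TRANSLATION_MEMORY.get(segment)


def auto_post_edit(segment: str) -> Optional[str]:
    """(3) Placeholder: a real version would post-edit a fuzzy match."""
    return None


def smt(segment: str) -> Optional[str]:
    """(4) Placeholder: a real version would call the trained SMT engine."""
    return None


METHODS: list[tuple[str, Method]] = [
    ("automatic conversion", auto_convert),
    ("100% match", exact_match),
    ("automatic post-edit", auto_post_edit),
    ("statistical machine translation", smt),
]


def translate_segment(segment: str) -> tuple[str, Optional[str]]:
    """Try the methods in order and report which one produced the result."""
    for name, method in METHODS:
        result = method(segment)
        if result is not None:
            return name, result
    return "unable to translate", None
```

Returning the method name together with the translation is what would allow per-method highlighting and per-method statistics later on.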

Translation method (1) Automatic conversion

• Expressions that have definitive translations are translated using dictionaries

o Law title (民法 → Civil Code)

o Position (内閣広報官 → Cabinet Public Relations Secretary)

o Organization (林野庁 → Forestry Agency)

o Place (北海道 → Hokkaido), etc.

• Expressions with numbers are translated by a conversion program

o Law number (昭和五十二年政令第二十号 → Cabinet Order No. 20 of 1977)

o Reference number (第五条第三項第二号 → Article 5, paragraph (3), item (ii))

o Date (昭和五十二年六月八日 → June 8, 1977)

o Price (千五百二十円 → 1,520 yen)

o Age (十八歳 → 18 years of age)

o Weight (二十ミリグラム → 20 milligrams)

o Length (三十キロメートル → 30 kilometers), etc.
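As an illustration of what such a conversion program might look like, here is a small sketch covering three of the listed patterns (date, price, and law number). It is not the system's actual code. The function names and regular expressions are assumptions, only the law type 政令 from the slide is mapped, and positional numerals such as 二〇一五 and the first-year form 元年 are not handled.

```python
import re
from typing import Optional

DIGITS = {c: i for i, c in enumerate("〇一二三四五六七八九")}
SMALL_UNITS = {"十": 10, "百": 100, "千": 1000}
LARGE_UNITS = {"万": 10**4, "億": 10**8}
# Year 1 of each era is offset + 1 (e.g. Showa 1 = 1926).
ERA_OFFSETS = {"明治": 1867, "大正": 1911, "昭和": 1925, "平成": 1988}
ERAS = "|".join(ERA_OFFSETS)
MONTHS = ["January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December"]
LAW_TYPES = {"政令": "Cabinet Order"}  # only the type shown in the slide
NUM = "[一二三四五六七八九十百千万億]+"


def kanji_to_int(text: str) -> int:
    """Convert a unit-based kanji numeral such as 千五百二十 to an int."""
    total = section = digit = 0
    for ch in text:
        if ch in DIGITS:
            digit = DIGITS[ch]
        elif ch in SMALL_UNITS:
            # A bare unit (e.g. 千) counts as one of that unit.
            section += (digit or 1) * SMALL_UNITS[ch]
            digit = 0
        elif ch in LARGE_UNITS:
            total += ((section + digit) or 1) * LARGE_UNITS[ch]
            section = digit = 0
        else:
            raise ValueError(f"not a kanji numeral: {text!r}")
    return total + section + digit


def convert_date(text: str) -> Optional[str]:
    m = re.fullmatch(rf"({ERAS})({NUM})年({NUM})月({NUM})日", text)
    if m is None:
        return None
    era, year, month, day = m.groups()
    western_year = ERA_OFFSETS[era] + kanji_to_int(year)
    return f"{MONTHS[kanji_to_int(month) - 1]} {kanji_to_int(day)}, {western_year}"


def convert_price(text: str) -> Optional[str]:
    m = re.fullmatch(rf"({NUM})円", text)
    return f"{kanji_to_int(m.group(1)):,} yen" if m else None


def convert_law_number(text: str) -> Optional[str]:
    types = "|".join(LAW_TYPES)
    m = re.fullmatch(rf"({ERAS})({NUM})年({types})第({NUM})号", text)
    if m is None:
        return None
    era, year, law_type, number = m.groups()
    western_year = ERA_OFFSETS[era] + kanji_to_int(year)
    return f"{LAW_TYPES[law_type]} No. {kanji_to_int(number)} of {western_year}"
```

Each converter returns None when the text does not fit its pattern, so a caller can try the converters one after another.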

Translation method (2) 100% match from translation memory

o A 100% match in the translation memory is used

o Translation memory is made from Japanese and English XML files downloaded from the Japanese Law Translation website

o Translation memory consists of 273,046 units taken from 489 laws (As of March 31, 2015)

Translation method (3) Automatic post-edit

If there is a fuzzy match and the parts that need correction can be converted as in translation method (1), the translation of the fuzzy match is post-edited automatically.

Source sentence:

次の各号のいずれかに該当する者は、三十万円以下の罰金に処する。

↓ The sentence is abstracted with variables

[次の各号のいずれかに該当する者は、<price>以下の罰金に処する。]

↓ A corresponding abstracted sentence is found in the translation memory

TM: [次の各号のいずれかに該当する者は、<price>以下の罰金に処する。]

次の各号のいずれかに該当する者は、五百万円以下の罰金に処する。

A person who falls under any of the following items shall be punished by a fine of not more than five million yen.

↓ The variable part of the TM translation is replaced with the converted value

A person who falls under any of the following items shall be punished by a fine of not more than three hundred thousand yen.

The text translated by this method is highlighted in blue on mouseover.
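The abstraction and replacement steps can be sketched in a few lines of code. This is a rough illustration assuming a single variable type, <price>, and assuming that prices appear spelled out in words in the English side of the translation memory, as in the example. The function names are hypothetical and the real system presumably handles more variable types.

```python
import re
from typing import Optional

DIGITS = {c: i for i, c in enumerate("〇一二三四五六七八九")}
SMALL_UNITS = {"十": 10, "百": 100, "千": 1000}
LARGE_UNITS = {"万": 10**4, "億": 10**8}
PRICE_RE = re.compile(r"([一二三四五六七八九十百千万億]+)円")
ONES = ["", "one", "two", "three", "four", "five", "six", "seven", "eight",
        "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
        "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]


def kanji_to_int(text: str) -> int:
    """Convert a unit-based kanji numeral such as 三十万 to an int."""
    total = section = digit = 0
    for ch in text:
        if ch in DIGITS:
            digit = DIGITS[ch]
        elif ch in SMALL_UNITS:
            section += (digit or 1) * SMALL_UNITS[ch]
            digit = 0
        else:  # PRICE_RE only lets numeral characters through
            total += ((section + digit) or 1) * LARGE_UNITS[ch]
            section = digit = 0
    return total + section + digit


def below_thousand(n: int) -> str:
    words = []
    if n >= 100:
        words += [ONES[n // 100], "hundred"]
        n %= 100
    if n >= 20:
        words.append(TENS[n // 10] + ("-" + ONES[n % 10] if n % 10 else ""))
    elif n > 0:
        words.append(ONES[n])
    return " ".join(words)


def int_to_words(n: int) -> str:
    """Spell out a non-negative integer below one trillion in English."""
    if n == 0:
        return "zero"
    parts = []
    for value, name in ((10**9, "billion"), (10**6, "million"),
                        (10**3, "thousand"), (1, "")):
        count, n = divmod(n, value)
        if count:
            parts.append((below_thousand(count) + " " + name).strip())
    return " ".join(parts)


def abstract(sentence: str) -> tuple[str, list[int]]:
    """Replace every price with <price>; return the template and the values."""
    values = [kanji_to_int(m.group(1)) for m in PRICE_RE.finditer(sentence)]
    return PRICE_RE.sub("<price>", sentence), values


def auto_post_edit(source: str, tm: list[tuple[str, str]]) -> Optional[str]:
    """Reuse a TM translation whose source differs only in its price values."""
    template, new_values = abstract(source)
    for tm_source, tm_target in tm:
        tm_template, old_values = abstract(tm_source)
        if tm_template != template:
            continue
        result = tm_target
        for old, new in zip(old_values, new_values):
            old_text = int_to_words(old) + " yen"
            if old_text not in result:
                break  # cannot locate the value; try the next TM unit
            result = result.replace(old_text, int_to_words(new) + " yen", 1)
        else:
            return result
    return None
```

Called with the source sentence above and a translation memory containing the 五百万円 unit, `auto_post_edit` returns the English sentence with "three hundred thousand yen" in place of "five million yen".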

Translation method (4) Statistical machine translation

Microsoft Translator Hub

o The translation memory used in translation methods (2) and (3) is used as training data

o The dictionaries used in translation method (1) are also used

(This does not seem to be working, though…)

o BLEU score: 36.59 (17.92 higher than Microsoft’s general-domain system)

The text translated by this method is highlighted in yellow on mouseover.

Statistics

Proportion of the translation methods by segment

Translation method         No. of segments
1) Auto conversion         923,270 (20%)
2) 100% match              1,737,505 (40%)
3) Auto post-edit          121,511 (3%)
4) SMT                     1,640,607 (37%)
5) Unable to translate     239
Total                      4,423,132

Statistics

Proportion of the translation methods by character

Translation method         No. of characters
1) Auto conversion         6,387,268 (5%)
2) 100% match              21,453,627 (17%)
3) Auto post-edit          3,428,573 (3%)
4) SMT                     95,468,688 (75%)
5) Unable to translate     563,782
Total                      127,301,938

Quality of the translation

A score that roughly indicates translation quality is calculated from the proportions of the translation methods used, and the search results are ordered by that score. The score is displayed as “★” symbols in the search results.

Translation methods, from higher score to lower score:

(1) Automatic conversion

(2) 100% match

(3) Automatic post-edit

(4) Statistical machine translation
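The slides do not give the formula for this score. Purely as an illustration, one way to obtain such a ranking is a weighted average of each method's share of a law's characters, with larger weights for the methods nearer the top of the list. The weights, the five-star scale, and all names below are hypothetical.

```python
# Hypothetical weights: the slides only state that methods earlier in the
# translation order correspond to a higher score. The numbers are assumptions.
WEIGHTS = {
    "auto_conversion": 1.0,
    "exact_match": 0.9,
    "auto_post_edit": 0.7,
    "smt": 0.3,
    "unable": 0.0,
}


def quality_score(char_counts: dict[str, int]) -> float:
    """Weighted share of characters per method, between 0.0 and 1.0."""
    total = sum(char_counts.values())
    if total == 0:
        return 0.0
    return sum(WEIGHTS[m] * n for m, n in char_counts.items()) / total


def stars(score: float, max_stars: int = 5) -> str:
    """Render a score as at least one and at most max_stars '★' symbols."""
    return "★" * max(1, round(score * max_stars))


def rank(laws: dict[str, dict[str, int]]) -> list[str]:
    """Order law titles from the highest score to the lowest."""
    return sorted(laws, key=lambda title: quality_score(laws[title]), reverse=True)
```

Under these assumed weights, a law translated mostly from 100% matches would rank above one translated mostly by SMT, which matches the ordering described on the slide.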

Outline

• Background

• Overview of the system

• Challenges for the future

Challenges for the future

• Quality evaluation of translations

o How understandable are they?

• Increasing recyclability of the translation memory

o Add more variables

o Add potentially highly recyclable translations to the translation memory

• Improving quality of statistical machine translation

o Use dictionaries

o Try other settings

o Try other MT engines

• Applying this system to other documents

o Municipal laws

o School rules

o Internal company rules

Acknowledgements

Japan Legal Information Institute,

Graduate School of Law, Nagoya University

Prof. Tomoko Masuda, Prof. Yoshiharu Matsuura,

Prof. Katsuhiko Toyama, Prof. Tokuyasu Kakuta,

Prof. Yasuhiro Ogawa, Prof. Makoto Nakamura,

and other professors and researchers

Thank you!

Japanese Law Machine Translation

http://itrd.crestec.co.jp/jlmt/

Yasuhiro SEKINE, CRESTEC Inc.

[email protected]