Language Service Management with the Language Grid

NICT Language Grid Project
Yohei Murakami
E-mail: [email protected]
Web: http://langrid.nict.go.jp/

Background

• Existing frameworks for combining language resources (data and tools) are built for NLP professionals.
• End users have difficulty combining existing language resources and using them in the field:
– Limited knowledge of language resources
– Complex contracts and intellectual property rights
• The Language Grid is a trial of service-oriented collective intelligence for sharing language resources worldwide.
– Users can combine existing language resources (machine translators, morphological analyzers, dictionaries, etc.) to create customized language services.
– Users can create their own language resources and use them to further customize the language services.
• Public Language Grid: 120 groups from 18 countries share more than 60 language services.

The Language Grid

Sharing language resources such as dictionaries and machine translators around the world

• Fields of use: Disaster Management, Education, Medical Care, and more
– Sharing Multilingual Information
– Universal Playground
– Translation Services at Hospital Receptions
– Providing Language Support for Multicultural Societies
• Organizations shown: German Research Center for Artificial Intelligence, Stuttgart University, National Research Council (Italy), Chinese Academy of Sciences, National Institute of Informatics, NICT, NTT Research Labs, Asian Disaster Reduction Center, Kookmin University, Princeton University, NECTEC, Univ. of Indonesia, Google Inc.

Service-Oriented Approach

• Enactment of a wide variety of policies and licenses
– Policies and licenses depend on the providers.
– This differs from content-based collective intelligence (CI) frameworks, which rely on a common license (e.g. Wikipedia).
• Protecting the intellectual property rights of the resources
– Access control based on policies
• Combining services freely
– Any combination of services is available.

[Figure: Users send requests through a coordination engine. Each resource provider exposes its resources (under License A, B, or C) behind a service interface with a policy, guarded by an access controller.]

Service Layers of the Language Grid

• Application System (Intercultural Collaboration Tools): Customized Multilingual Environment. Multilingual communication is supported using various language services.
• Composite Service (back translations, domain-specific translations, ...): Composite Language Services. Multiple atomic language services are composed using workflows.
• Atomic Service (machine translations, morphological analyzers, dictionaries, parallel texts, ...): Atomic Language Services. Language resources are made usable as Web services with standardized interfaces.
• P2P Service Grid: Cloud Services. Users can connect to Language Grid servers on the Internet.

P2P Service Grid

• Language Grid Core Node: language service management, search and composition, and access control.
• Language Grid Service Node: provides language resources as Web services.

[Figure: Core nodes and service nodes share information and invoke services (numbered steps ① to ⑤). Example services: Japanese Morphological Analyzer, Korean Morphological Analyzer, Ja to Ko Translator, Ja to En Translator, En to Fr Translator, Multi-language Glossary on Natural Disasters (Ja, En, Ko, Zh, Es, Fr), Life Science Dictionary (ja, en).]

Atomic Service

• Wrap language resources as Web services equipped with a standard interface.
• A language service ontology is required when wrapping language resources, in order to standardize the interfaces of machine translators and dictionaries.

[Figure: A wrapper turns each language resource (Machine Translation, Morphological Analyzer, Dictionary, Parallel Texts) into the corresponding Web service.]
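The wrapping idea can be sketched in a few lines of Python. This is only an illustration, not the Language Grid wrapper library: the class names, the sample rows, and the matching-method strings "COMPLETE" and "PREFIX" are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Translation:
    """One dictionary hit: a head word and its translations."""
    head_word: str
    target_words: tuple


class GlossaryResource:
    """Hypothetical raw language resource with its own, non-standard API."""

    def __init__(self, rows):
        # rows: iterable of (lang_a, word_a, lang_b, word_b)
        self._rows = list(rows)

    def rows(self):
        return list(self._rows)


class DictionaryWrapper:
    """Adapts GlossaryResource to a dictionary-style search() method."""

    def __init__(self, resource):
        self._resource = resource

    def search(self, head_lang, target_lang, head_word, matching_method="COMPLETE"):
        hits = {}
        for lang_a, word_a, lang_b, word_b in self._resource.rows():
            if (lang_a, lang_b) != (head_lang, target_lang):
                continue
            if matching_method == "COMPLETE":
                matched = word_a == head_word
            elif matching_method == "PREFIX":
                matched = word_a.startswith(head_word)
            else:
                raise ValueError(f"unsupported matching method: {matching_method}")
            if matched:
                hits.setdefault(word_a, []).append(word_b)
        return [Translation(word, tuple(targets)) for word, targets in hits.items()]
```

Callers see only search(), so a different glossary can be swapped in by writing another wrapper with the same method.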

Standard Interfaces

• Translation Service
– Input: translate(sourceLang, targetLang, source)
– Output: String
• Morphological Analysis Service
– Input: analyze(language, text)
– Output: Morpheme[], Morpheme={word, lemma, partOfSpeech}
• Bilingual Dictionary
– Input: search(headLang, targetLang, headWord, matchingMethod)
– Output: Translation[], Translation={headWord, targetWords[]}
• Parallel Text
– Input: search(sourceLang, targetLang, source, matchingMethod)
– Output: ParallelText[], ParallelText={source, target}
• Pictogram, Paraphrase, …

Wrapper libraries that ease the implementation of wrappers will be provided as open source. (http://langrid.nict.go.jp/langrid-developers-wiki/)
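The signatures above can be written as Python protocols. This is a sketch under assumptions, not the project's published interface definitions: the snake_case names and the toy WhitespaceAnalyzer are illustrative only.

```python
from dataclasses import dataclass
from typing import List, Protocol, runtime_checkable


@dataclass
class Morpheme:
    word: str
    lemma: str
    part_of_speech: str


@dataclass
class Translation:
    head_word: str
    target_words: List[str]


@dataclass
class ParallelText:
    source: str
    target: str


@runtime_checkable
class TranslationService(Protocol):
    def translate(self, source_lang: str, target_lang: str, source: str) -> str: ...


@runtime_checkable
class MorphologicalAnalysisService(Protocol):
    def analyze(self, language: str, text: str) -> List[Morpheme]: ...


@runtime_checkable
class BilingualDictionary(Protocol):
    def search(self, head_lang: str, target_lang: str,
               head_word: str, matching_method: str) -> List[Translation]: ...


@runtime_checkable
class ParallelTextService(Protocol):
    def search(self, source_lang: str, target_lang: str,
               source: str, matching_method: str) -> List[ParallelText]: ...


class WhitespaceAnalyzer:
    """Toy analyzer: splits on whitespace and lowercases each word as its lemma."""

    def analyze(self, language: str, text: str) -> List[Morpheme]:
        return [Morpheme(w, w.lower(), "unknown") for w in text.split()]
```

Any object with a matching method satisfies the protocol, which mirrors how a wrapped resource only needs to honor the standard interface.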

Composite Service

• To create a new language service, describe an abstract workflow.
• Register the abstract workflow with the Language Grid Core Node.
• When invoking the service, assign a concrete atomic service to each task in the abstract workflow.
– Put the binding information into the SOAP header.

[Figure: An abstract workflow for two-hop translation (Translation ja->en, then Translation en->de). The translation service bound to each task (JServer, Web Transer, ...) can be changed.]
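Binding at invocation time can be illustrated with a small Python sketch. It assumes a plain dictionary in place of the SOAP header and stub functions in place of real translation services; the task names, invoke(), and make_stub() are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

# A translator takes (source_lang, target_lang, text) and returns text.
Translator = Callable[[str, str, str], str]

# Abstract workflow: ordered tasks that name only the language pair they need.
TWO_HOP_JA_DE: List[Tuple[str, str, str]] = [
    ("Translation ja->en", "ja", "en"),
    ("Translation en->de", "en", "de"),
]


def invoke(workflow, binding: Dict[str, Translator], text: str) -> str:
    """Run the workflow, resolving each task to the service named in `binding`."""
    for task, src, dst in workflow:
        if task not in binding:
            raise KeyError(f"no service bound to task: {task}")
        text = binding[task](src, dst, text)
    return text


def make_stub(name: str) -> Translator:
    """Stand-in for a real translation service; it only tags the text."""
    def translate(src: str, dst: str, text: str) -> str:
        return f"{name}[{src}->{dst}]({text})"
    return translate
```

The same workflow can then be run with different bindings, for example invoke(TWO_HOP_JA_DE, {"Translation ja->en": make_stub("A"), "Translation en->de": make_stub("B")}, "text"), without editing the workflow itself.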

Workflow can be Complex!

[Figure: Multilingual Backtranslation workflow (ja->zh->ja, ja->de->ja, ja->en->ja). Three branches translate the input (Translation ja->en, ja->zh, ja->de) and translate each result back (Translation en->ja, zh->ja, de->ja), giving 3 translation results and 3 back translation results.]

Atomic services: Pangaea's Community Dictionary (by NPO Pangaea), JServer (by Kodensha), MeCab (by NTT CS), Web Transer (by Cross Language).
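A rough Python sketch of multilingual backtranslation follows. It reads the "+" in the diagram as a parallel split, which is an assumption, and it takes any callable translate(source_lang, target_lang, text) as a stand-in for a real translation service.

```python
from concurrent.futures import ThreadPoolExecutor


def back_translate(translate, source_lang, pivot_lang, text):
    """Translate into the pivot language and back into the source language."""
    forward = translate(source_lang, pivot_lang, text)
    backward = translate(pivot_lang, source_lang, forward)
    return pivot_lang, forward, backward


def multilingual_backtranslation(translate, source_lang, pivot_langs, text):
    """Run one back translation per pivot language, branches in parallel.

    Returns a list of (pivot_lang, translation, back_translation) tuples
    in the same order as pivot_langs.
    """
    workers = max(1, len(pivot_langs))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [
            pool.submit(back_translate, translate, source_lang, pivot, text)
            for pivot in pivot_langs
        ]
        return [future.result() for future in futures]
```

Calling it with pivot languages ["zh", "de", "en"] yields three translation results and three back translation results, matching the counts on the slide.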

[Figure: Japanese-German Domain Specific Translation workflow. Steps shown: Japanese Morphological Analysis, Technical Term Extraction, lookup in the Technical Term Multilingual Dictionary, Intermediate Code Insertion (repeated while terms remain, filling an Intermediate Code Table), Japanese-German Translation (either Translation ja->de, or Translation ja->en followed by Translation en->de), and Term Replacement (repeated while terms remain).]
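The intermediate-code idea in the domain-specific translation workflow can be sketched as follows. This simplified Python version uses plain substring matching instead of morphological analysis, and the code format XTERM<n>X, the function name, and the parameters are all assumptions made for illustration.

```python
def translate_with_terms(text, term_dict, translate, src="ja", dst="de"):
    """Translate `text` while forcing dictionary translations for known terms.

    term_dict maps source-language terms to target-language terms.
    translate is any callable (source_lang, target_lang, text) -> text.
    """
    code_table = {}  # intermediate code -> target-language term

    # Intermediate code insertion: longer terms first, so that a short term
    # cannot break up a longer term that contains it.
    for index, term in enumerate(sorted(term_dict, key=len, reverse=True)):
        if term in text:
            code = f"XTERM{index}X"
            text = text.replace(term, code)
            code_table[code] = term_dict[term]

    # General-purpose machine translation of the text with codes in place.
    translated = translate(src, dst, text)

    # Term replacement: swap each code for the dictionary translation.
    for code, target_term in code_table.items():
        translated = translated.replace(code, target_term)
    return translated
```

The sketch relies on the translator passing the codes through unchanged; a real service may alter them, so the code format would need to be chosen per translator.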

Language Service Management Architecture

[Figure: Architecture diagram spanning the Language Grid Core Node and the Language Grid Service Node. Language service users call services from an application system through a virtual endpoint. Requests travel over SOAP (steps 1 to 4) via a Service Invoker (access controller, access logging, load balancer) to a Composite Service Engine (ActiveBPEL, Java, JavaScript, etc.) or an Atomic Service Engine, where a language service wrapper with a policy fronts the language resources. Language resource providers use the Service Manager for service registration (WSDL, endpoint URL), access constraints, monitoring, and access logs.]

Service Manager

A Web-based tool for managing users, language resources, and language services on the Language Grid. (http://langrid.org/operation/service_manager)

Monitoring & Control of Language Services

• Monitor the access date, IP address, and data transfer size of each request.
• Set access rights for each user.
• Limit the number of accesses per day, month, or year, and the data transfer size.
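As a minimal sketch of these controls, the Python class below keeps per-user limits, counts calls per calendar month, and logs each permitted request. The class and field names are hypothetical, and for simplicity the response size is assumed to be known when the request is authorized.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class AccessLimit:
    max_calls_per_month: int
    max_bytes_per_call: int


class AccessController:
    def __init__(self):
        self._limits = {}               # (user, service) -> AccessLimit
        self._calls = defaultdict(int)  # (user, service, year, month) -> count
        self.log = []                   # (timestamp, ip, user, service, bytes)

    def set_limit(self, user, service, limit):
        """Grant `user` access to `service` under the given limit."""
        self._limits[(user, service)] = limit

    def authorize(self, user, service, ip, response_bytes, now):
        """Return True and log the request if it is within the user's limits."""
        limit = self._limits.get((user, service))
        if limit is None:
            return False  # no access right configured for this user
        key = (user, service, now.year, now.month)
        if self._calls[key] >= limit.max_calls_per_month:
            return False  # monthly call quota used up
        if response_bytes > limit.max_bytes_per_call:
            return False  # single transfer too large
        self._calls[key] += 1
        self.log.append((now, ip, user, service, response_bytes))
        return True
```

The log entries carry the three monitored fields (access date, IP address, transfer size), so usage reports can be built from the same data that enforces the limits.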

Case Studies (1)

• NICT
– It is hard to provide the EDR (Concept/Bilingual Dictionary) services for free because NICT sells EDR.
⇒ Set limits of 1000 accesses/month and 15 KB/access for the bilingual dictionary.
⇒ Set limits of 2000 accesses/month and 35 KB/access for the concept dictionary.
(These policies are configured so that downloading the whole data would take almost one year!)
⇒ Allow only members to access the EDR services without any restrictions.
• Kodensha Co., Ltd.
– It is hard to provide the free J-Server service to users who are Kodensha’s business targets.
⇒ Prohibit them from accessing the free J-Server service.
⇒ Allow only members to access the latest, high-quality J-Server service operated by Kodensha.
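The effect of the EDR limits can be checked with simple arithmetic. The slides do not state the size of the EDR data, so this Python snippet only computes the upper bound on what a non-member could transfer under each limit; the function name is illustrative.

```python
def transfer_budget_kb(calls_per_month, kb_per_call, months=1):
    """Upper bound on data (in KB) transferable within the given limits."""
    return calls_per_month * kb_per_call * months


# Bilingual dictionary: 1000 accesses/month at 15 KB/access.
bilingual_month = transfer_budget_kb(1000, 15)
bilingual_year = transfer_budget_kb(1000, 15, months=12)

# Concept dictionary: 2000 accesses/month at 35 KB/access.
concept_month = transfer_budget_kb(2000, 35)
concept_year = transfer_budget_kb(2000, 35, months=12)
```

Under these limits a user can transfer at most 15,000 KB per month (180,000 KB per year) from the bilingual dictionary and 70,000 KB per month (840,000 KB per year) from the concept dictionary.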

Case Studies (2)

• Kyoto University
– Has a responsibility to prevent illegal usage because Kyoto University provides services based on resources it purchased from companies.
⇒ Monitor whether the services are abused.
⇒ Detect excessive access from a specific IP address.
• GSK
– Promotes language resource distribution on behalf of language resource providers.
⇒ Deploy the language resources on GSK’s server and allow users who purchase the resources to access them.
(This hosting model can reduce the providers’ burden of selling and operating the resources.)
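Detecting excessive access from a single IP address can be as simple as counting log entries. The sketch below assumes a log of (timestamp, ip, bytes) tuples and a caller-chosen threshold; neither the log layout nor the threshold comes from the slides.

```python
from collections import Counter


def excessive_ips(access_log, threshold):
    """Return {ip: request_count} for IPs with more than `threshold` requests.

    access_log is an iterable of (timestamp, ip, transferred_bytes) tuples.
    """
    counts = Counter(ip for _, ip, _ in access_log)
    return {ip: n for ip, n in counts.items() if n > threshold}
```

An operator would typically run such a check over a fixed window (for example one day of log entries) and review the flagged addresses by hand.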

Service-Oriented Approach: Pros and Cons

Pros
• From Having to Using: The service-oriented approach can relax complex issues of intellectual property rights of language resources.
• Cloud Services: The service-oriented approach allows resource providers to scale up the usage of language resources.
• Service Federation: The service-oriented approach allows language services to be easily combined with other services, e.g. e-learning services and ambient intelligence services.

Cons
• Maintenance Cost: Language services should be maintained and provided continuously by secure providers.
• Market Pull: Language services should be designed based on market demand, which is hard for academic communities to control.

Summary

• Propose a service-oriented collective intelligence platform for managing language services
– Enable language resource providers to offer their services while retaining ownership of their resources
• Develop a language service management architecture
– Monitoring of language services
– Access control of language services
– Service Manager, a Web-based GUI
• Collect experience from the first operation of a service-oriented platform
– Several language service policies
– Pros and cons of the service-oriented approach

Role of the Language Grid

• Difficulties often arise when trying to share and combine existing language resources and use them in the field:
– Complex contracts and intellectual property rights
– Non-standardized application interfaces
• Improve the accessibility and usability of those language resources, and encourage users to create new language services that suit their needs by combining several language resources:
– Standardize the interfaces of language resources with wrappers
– Publish language resources not as source programs but as Web services
– Combine language resources with Web service workflows
– Manage the service profiles
– Control access to the resources

Language Grid Core Node and Service Node

[Figure: Language service users reach services from IC Tools through a virtual endpoint on the Language Grid Core Node. Requests pass over SOAP (steps 1 to 4) through the Service Invoker, a Composite Service Engine (ActiveBPEL, UIMA, HoG, etc.), and an Atomic Service Engine on the Language Grid Service Node, which reaches the language resources by HTTP, function call, etc. (step 5). Language Service Management covers service registration (WSDL, endpoint URL), access constraints, monitoring, and access logs for language resource providers.]

Participants / Language Services

[Figure: Chart (axis 0 to 120) broken down by Universities, Research Organizations, Research Projects, NPOs and NGOs, Public Organizations, Companies, and Others.]

• Participants (17 countries, 118 groups)
– University / Research Institute: Kyoto Univ. (Japan), Shanghai Jiaotong Univ. (China), Univ. of Stuttgart (Germany), IT Univ. of Copenhagen (Denmark), Princeton Univ. (U.S.), DFKI (Germany), CNR (Italy), Chinese Academy of Sciences (China), NECTEC (Thailand), and more.
– NPO/NGO/Public Sector: NGOs for disaster reduction, public junior high schools, city boards of education, and more.
– Corporate (CSR activities / language resource providers): NTT, Toshiba, Oki, Google, Kodensha, Translution, and more.
• Language Services (more than 60)
– Machine Translator: J-Server, Web-Transer, Toshiba, Parsit, Google Translate, and more.
– Dictionary, Parallel Text: EDR, Wordnet, Life Science Dictionary, Multi-language Glossary on Natural Disasters, and more.
– Morphological Analyzer
– Dependency Parser
– Composite Services

Atomic Service

Each resource is wrapped as a service:

• Dictionary → Dictionary Service: search for a translated word
– Hinanbasho (Disaster shelter) → disaster shelter
• Parallel Text → Parallel Text Service: search for similar translated text
– Hinanbasho ha ie kara tooi desu (The disaster shelter is far from my house) → The disaster shelter is [].
• MT → Machine Translation Service: translate by machine
– Hinanbasho ha ie kara tooi desu (The disaster shelter is far from my house) → Disaster shelter is school close from a house.
• Human Translator → Human Translation Service: translate with high quality
– Hinanbasho ha ie kara tooi desu (The disaster shelter is far from my house) → Your disaster shelter is the school closest to your house.

Fourth Layer: Intercultural Collaboration Tools
Language Grid Toolbox (developed by NICT)

• Text Translation
– Estimate the translation accuracy using backtranslation (input, translation result, back translation result)
• Multilingual BBS
– Submit messages in users’ mother languages
– Improve the translation results by manual post-editing
• Language Resource Creation
– Create multilingual dictionaries specific to users’ communities

[Figure: Text Translation, Multilingual BBS, Multilingual Dictionary, and Multilingual Corpus modules built on XOOPS (Open Source Software), holding dictionary data and parallel texts and using Language Services on the Language Grid.]

Toolbox was released as OSS. http://langrid-tools.nict.go.jp/toolbox/

Operation of the Language Grid

• Diverse stakeholders
– Language Service User
– Language Resource Provider
– Computation Resource Provider
– Language Grid Operator
• Language resource providers and computation resource providers control their own resources.
• The Language Grid for non-profit use has been operated by Kyoto Univ. since December 2007.
• The Letter of Agreement on the Language Grid is available. (http://langrid.org/operation/)
• 118 organizations (from 17 countries) have signed the agreement.

Language Grid Association (http://langrid.org/associaiton/)

Intercultural Collaboration Tools
Language Grid Playground (developed by Kyoto Univ.)

http://langrid.org/playground

M3 (developed by Wakayama Univ.)

[Figure: Screens for foreign patients and for medical staff.]

http://www.langrid.org/association/m3support/indexe.html

Pangaea Community Site (NPO Pangaea)

• Pangaea is an NPO that aims to support communication between children in various countries.
• The Pangaea Community Site allows participants and staff to:
– Communicate in their own languages using translation services (Japanese, Korean, English, German)
– Revise machine translation results for other people

Pangaea as a Language Resource Provider

• Pangaea is also a provider of language resources:
– Pictograms designed for communication between children in different countries
– The Pangaea Community Dictionary, which contains 500 terms for Pangaea’s activities, e.g. Pangaean (participants of activities), Koetsuna (ice-breaking activity for children)
• Both resources are provided as Web services and combined with other services on the Language Grid.

[Figure: Pangaea Activities (Community Site) supplies pictograms and the community dictionary to the Language Grid, which combines Korean, Japanese, English, and German morphological analyzers, the community dictionary, and 2 machine translators.]

Multilingual Communication System (Kyoto University, Ritsumeikan University)

[Figure: Screens of a Chinese user and a Japanese user, with Translation, Backtranslation, and Autocomplete functions.]

Fujimi Junior High School: 584 students in total, including Filipino 4, Chinese 6, Korean 2, Peruvian 1.