Language Service Management with the Language Grid
NICT Language Grid Project
Yohei Murakami
E-mail: [email protected]
Web: http://langrid.nict.go.jp/
Background
• Existing frameworks for combining language resources (data and tools) are built for NLP professionals.
• End users have difficulty combining existing language resources and using them in the field:
– Less knowledge of language resources
– Complex contracts and intellectual property rights
• The Language Grid is a trial of service-oriented collective intelligence to share language resources worldwide.
– Users can combine existing language resources (machine translations, morphological analyzers, dictionaries, etc.) to create customized language services.
– Users can create their own language resources and use them to further customize the language services.
• Public Language Grid: 120 groups from 18 countries share more than 60 language services.
The Language Grid
• Sharing language resources such as dictionaries and machine translators around the world
• Providing Language Support for Multicultural Societies: Disaster Management, Education, Medical Care, and more
• Examples shown: Sharing Multilingual Information, Universal Playground, Translation Services at Hospital Receptions
[Figure: world map of organizations: German Research Center for Artificial Intelligence, Stuttgart University, National Research Council (Italy), Chinese Academy of Sciences, National Institute of Informatics, NICT, NTT Research Labs, Asian Disaster Reduction Center, Kookmin University, Princeton University, NECTEC, Univ. of Indonesia, Google Inc.]
Service-Oriented Approach
• Enactment of a wide variety of policies and licenses
– Policies and licenses depend on providers
– Different from content-based CI (collective intelligence) frameworks, which rely on a common license (e.g. Wikipedia)
• Protecting intellectual property rights of the resources
– Access control based on policies
• Combining services freely
– Any combination of services is available
[Figure labels: Users; Request; Coordination engine; Access controller; Service Interface + Policy; Resource Providers; Resources + License A; Resources + License B; Resources + License C]
Service Layers of the Language Grid
• Application System (Intercultural Collaboration Tools): Customized Multilingual Environment. Multilingual communication is supported using various language services.
• Composite Service (back translations, domain-specific translations, ...): Composite Language Services. Multiple atomic language services are composed using workflows.
• Atomic Service (machine translations, morphological analyzers, dictionaries, parallel texts, ...): Atomic Language Services. Language resources are made usable as Web services with standardized interfaces.
• P2P Service Grid (P2P Grid Infrastructure): Cloud Services. Allows users to connect to Language Grid servers on the Internet.
P2P Service Grid
• Language Grid Core Node: language service management, search & composition, and access control
• Language Grid Service Node: provides language resources as Web services
[Figure: core node and service nodes with arrows labeled "Sharing Information" and "Invoking Services" and steps ① to ⑥. Services shown: Japanese Morphological Analyzer; Korean Morphological Analyzer; Ja to Ko Translator; Ja to En Translator; En to Fr Translator; Life Science Dictionary (ja, en); Multi-language Glossary on Natural Disasters (Ja, En, Ko, Zh, Es, Fr)]
Atomic Service
• Wrap language resources as Web services equipped with a standard interface.
• A language service ontology is required for wrapping language resources, to standardize the interfaces of machine translations or dictionaries.
[Figure: a Wrapper turns each Language Resource (Machine Translation, Morphological Analyzer, Dictionary, Parallel Texts) into a Web Service of the same type]
Standard Interfaces
• Translation Service
– Input: translate(sourceLang, targetLang, source)
– Output: String
• Morphological Analysis Service
– Input: analyze(language, text)
– Output: Morpheme[], Morpheme={word, lemma, partOfSpeech}
• Bilingual Dictionary
– Input: search(headLang, targetLang, headWord, matchingMethod)
– Output: Translation[], Translation={headWord, targetWords[]}
• Parallel Text
– Input: search(sourceLang, targetLang, source, matchingMethod)
– Output: ParallelText[], ParallelText={source, target}
• Pictogram, Paraphrase, …
Wrapper libraries to ease the implementation of wrappers will be provided as open source. (http://langrid.nict.go.jp/langrid-developers-wiki/)
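The bilingual dictionary interface above can be sketched in Python as follows. This is only an illustration, not the project's wrapper library: the class InMemoryDictionary and the matchingMethod values "COMPLETE" and "PREFIX" are assumptions made for this example; only the search signature and the Translation fields come from the slide.

```python
from dataclasses import dataclass
from typing import Dict, List, Protocol


@dataclass
class Translation:
    """One dictionary hit, mirroring Translation={headWord, targetWords[]}."""
    head_word: str
    target_words: List[str]


class BilingualDictionary(Protocol):
    """Mirrors search(headLang, targetLang, headWord, matchingMethod)."""

    def search(self, head_lang: str, target_lang: str,
               head_word: str, matching_method: str) -> List[Translation]: ...


class InMemoryDictionary:
    """Hypothetical wrapper: exposes a plain dict through the standard interface."""

    def __init__(self, head_lang: str, target_lang: str,
                 entries: Dict[str, List[str]]) -> None:
        self.head_lang = head_lang
        self.target_lang = target_lang
        self.entries = entries

    def search(self, head_lang: str, target_lang: str,
               head_word: str, matching_method: str) -> List[Translation]:
        # A wrapper for one language pair returns nothing for other pairs.
        if (head_lang, target_lang) != (self.head_lang, self.target_lang):
            return []
        if matching_method == "COMPLETE":      # assumed name: exact match
            keys = [k for k in self.entries if k == head_word]
        elif matching_method == "PREFIX":      # assumed name: prefix match
            keys = [k for k in self.entries if k.startswith(head_word)]
        else:
            raise ValueError(f"unsupported matching method: {matching_method}")
        return [Translation(k, list(self.entries[k])) for k in keys]
```

Because every dictionary wrapper answers the same search call, a caller can swap one dictionary for another without changing its own code, which is the point of standardizing the interface.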
Composite Service
• To create a new language service, describe an abstract workflow.
• Register the abstract workflow with the Language Grid Core Node.
• Assign a concrete atomic service to each task in the abstract workflow when invoking the service.
• Put the binding information into the SOAP header.
[Figure: an abstract workflow for two-hop translation (Translation ja->en, Translation en->de). The translation service bound to a task (JServer, Web Transer, ...) can be changed.]
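The idea of an abstract workflow whose tasks are bound to concrete services at invocation time can be sketched as below. This is a simplified illustration under stated assumptions: the task names "FirstTranslation" and "SecondTranslation", the make_stub helper, and passing bindings as a Python dict are all hypothetical; on the Language Grid the binding information travels in the SOAP header.

```python
from typing import Callable, Dict

# A translation service takes (source_lang, target_lang, text) and returns text,
# following the standard translate(sourceLang, targetLang, source) interface.
TranslationService = Callable[[str, str, str], str]


def two_hop_translation(bindings: Dict[str, TranslationService],
                        source_lang: str, pivot_lang: str,
                        target_lang: str, text: str) -> str:
    """Abstract two-hop workflow: the tasks are fixed, the services are not."""
    first = bindings["FirstTranslation"]     # hypothetical task name
    second = bindings["SecondTranslation"]   # hypothetical task name
    intermediate = first(source_lang, pivot_lang, text)
    return second(pivot_lang, target_lang, intermediate)


def make_stub(name: str) -> TranslationService:
    """Stand-in for a real atomic service; it only records what was called."""
    def translate(source_lang: str, target_lang: str, text: str) -> str:
        return f"{name}[{source_lang}->{target_lang}]({text})"
    return translate


bindings = {
    "FirstTranslation": make_stub("ServiceA"),
    "SecondTranslation": make_stub("ServiceB"),
}
result = two_hop_translation(bindings, "ja", "en", "de", "konnichiwa")
print(result)  # ServiceB[en->de](ServiceA[ja->en](konnichiwa))
```

Changing the service used for a task means changing one entry in bindings; the workflow definition itself stays untouched.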
Workflow can be Complex!
[Figure: composite workflows built from atomic services]
• Multilingual Backtranslation (ja->zh->ja, ja->de->ja, ja->en->ja): Translation ja->en and en->ja, Translation ja->zh and zh->ja, Translation ja->de and de->ja; returns 3 translation results and 3 back translation results.
• Japanese-German Translation and Japanese-German Domain Specific Translation. Boxes shown: Japanese Morphological Analysis, Technical Term Extraction, Technical Term Multilingual Dictionary, Intermediate Code Insertion, Intermediate Code Table, Term Replacement, Translation ja->en, Translation en->de, Translation ja->de, and "remaining terms?" decisions (Yes/No).
• Atomic Services: Pangaea's Community Dictionary (by NPO Pangaea), JServer (by Kodensha), MeCab (by NTT CS), Web Transer (by Cross Language)
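The multilingual backtranslation workflow named on this slide can be sketched as a loop over pivot languages. This is a minimal sketch, assuming a single translate function that follows the standard interface; the stub translator is hypothetical and only tags the text so that the call order is visible.

```python
from typing import Callable, Dict, List, Tuple

TranslationService = Callable[[str, str, str], str]


def multilingual_backtranslation(translate: TranslationService,
                                 source_lang: str,
                                 pivot_langs: List[str],
                                 text: str) -> Dict[str, Tuple[str, str]]:
    """For each pivot language, return (translation, back translation)."""
    results = {}
    for pivot in pivot_langs:
        forward = translate(source_lang, pivot, text)   # e.g. ja -> zh
        back = translate(pivot, source_lang, forward)   # e.g. zh -> ja
        results[pivot] = (forward, back)
    return results


def stub_translate(source_lang: str, target_lang: str, text: str) -> str:
    """Hypothetical stand-in for a machine translation service."""
    return f"{text}|{source_lang}>{target_lang}"


results = multilingual_backtranslation(stub_translate, "ja", ["zh", "de", "en"], "text")
```

With three pivot languages the function yields three translation results and three back translation results, matching the counts on the slide.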
Language Service Management Architecture
[Figure: architecture diagram. Nodes: Language Grid Core Node, Language Grid Service Node. Actors: Application System, Language Service Users, Language Resource Providers (get, create). Labels: Service Manager, Service Invoker, Composite Service Engine (ActiveBPEL, Java, JavaScript, etc.), Atomic Service Engine, Language Service Wrapper, Language Resources, Service Registration, WSDL, Endpoint URL, Virtual Endpoint, Access Constraint, Access Controller, Access Logging, Access Log, Monitor, Load Balancer, Policy. Message flow: 1. SOAP, 2. SOAP, 3. SOAP, 4. SOAP.]
Service Manager
Web-based tool to manage Language Grid users, language resources, and language services on the Language Grid.
(http://langrid.org/operation/service_manager/)
Monitoring & Control of Language Services
• Monitor the access date, IP address, and data transfer size of each request
• Set access rights for each user
• Control the number of accesses per day/month/year and the data transfer size
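The monitoring and control functions listed above can be sketched as a small access controller. This is an illustration only, not the Language Grid implementation: the class names, the month key format, and the choice to log rejected requests as well as accepted ones are assumptions made for this example.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class AccessPolicy:
    max_requests_per_month: int   # access count limit per user and month
    max_bytes_per_request: int    # data transfer size limit per access


class AccessController:
    """Hypothetical controller that enforces one policy and keeps an access log."""

    def __init__(self, policy: AccessPolicy) -> None:
        self.policy = policy
        self.counts = defaultdict(int)   # (user, "YYYY-MM") -> accepted requests
        self.log: List[Tuple[str, str, str, int, bool]] = []

    def check(self, user: str, month: str, ip_address: str,
              response_bytes: int) -> bool:
        key = (user, month)
        allowed = (self.counts[key] < self.policy.max_requests_per_month
                   and response_bytes <= self.policy.max_bytes_per_request)
        if allowed:
            self.counts[key] += 1
        # Every request is logged with its date, user, IP address, and size.
        self.log.append((month, user, ip_address, response_bytes, allowed))
        return allowed
```

A provider would create one AccessPolicy per service (or per user) and reject any call for which check returns False.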
Case Studies (1)
• NICT
– Hard to provide free EDR (Concept/Bilingual Dictionary) services because NICT sells EDR.
⇒ Set a limit of 1000 accesses/month and 15 KB/access for the bilingual dictionary
⇒ Set a limit of 2000 accesses/month and 35 KB/access for the concept dictionary
(These policies are configured so that downloading the whole data takes almost one year.)
⇒ Allow only members to access EDR services without any restrictions
• Kodensha Co., Ltd.
– Hard to provide the free J-Server service to users who are Kodensha's business targets
⇒ Prohibit them from accessing the free J-Server service
⇒ Allow only members to access the latest, high-quality J-Server service operated by Kodensha
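The EDR limits above put a ceiling on how much data one user can retrieve. The short calculation below only multiplies the numbers given on the slide; the sizes of the EDR dictionaries are not stated here, so the "almost one year" remark cannot be checked from these figures alone.

```python
def monthly_ceiling_kb(accesses_per_month: int, kb_per_access: int) -> int:
    """Upper bound on data retrievable in one month under the policy."""
    return accesses_per_month * kb_per_access


bilingual_month = monthly_ceiling_kb(1000, 15)   # 15000 KB per month
concept_month = monthly_ceiling_kb(2000, 35)     # 70000 KB per month

bilingual_year = bilingual_month * 12            # 180000 KB per year
concept_year = concept_month * 12                # 840000 KB per year
```

So a user who exhausts the quota every month can retrieve at most about 180 MB of bilingual dictionary data and about 840 MB of concept dictionary data per year.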
Case Studies (2)
• Kyoto University
– Has a responsibility to prevent illegal usage because Kyoto U. provides services based on resources it purchased from companies
⇒ Monitor whether the services are abused
⇒ Detect excessive access from a specific IP address
• GSK
– Promotes language resource distribution on behalf of language resource providers
⇒ Deploy the language resources on GSK's server and allow users who purchase the resources to access them
(This hosting model can reduce the burden on language resource providers of selling and operating the resources.)
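Detecting excessive access from a specific IP address, as in the Kyoto University case above, can be sketched as a count over the access log. This is a minimal sketch: the log format (IP address, transferred bytes) and the fixed threshold are assumptions for illustration, not the monitoring rule actually used.

```python
from collections import Counter
from typing import List, Tuple


def excessive_ips(access_log: List[Tuple[str, int]], threshold: int) -> List[str]:
    """Return the IP addresses with more than `threshold` logged requests."""
    counts = Counter(ip for ip, _size in access_log)
    return sorted(ip for ip, n in counts.items() if n > threshold)


# Example log using documentation-only addresses (192.0.2.0/24).
log = [("192.0.2.1", 100)] * 5 + [("192.0.2.2", 50)]
flagged = excessive_ips(log, threshold=3)
```

An operator could run such a check per day or per month and then tighten the access constraint for the flagged addresses.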
Service-Oriented Approach: Pros and Cons
Pros
• From Having to Using: the service-oriented approach can ease complex intellectual property issues around language resources.
• Cloud Services: the service-oriented approach allows resource providers to scale up the usage of language resources.
• Service Federation: the service-oriented approach allows language services to be easily combined with other services, e.g. e-learning services, ambient intelligence services, etc.
Cons
• Maintenance Cost: language services should be maintained and provided continuously by secure providers.
• Market Pull: language services should be designed based on market demand, which is hard for academic communities to control.
Summary
• Propose a service-oriented collective intelligence platform to manage language services
– Enable language resource providers to provide their services while retaining ownership of their resources
• Develop a language service management architecture
– Monitoring of language services
– Access control of language services
– Service Manager is a Web-based GUI
• Collect experience from the first operation of a service-oriented platform
– Several language service policies
– Pros and cons of the service-oriented approach
Role of the Language Grid
• Difficulties often arise when trying to share and combine existing language resources and use them in the field
– Complex contracts and intellectual property rights
– Non-standardized application interfaces
• Improve the accessibility and usability of those language resources, and encourage users to create new language services that suit their needs by combining several language resources
– Standardize interfaces of language resources with wrappers
– Publish language resources not as source programs but as Web services
– Combine language resources by Web service workflows
– Manage those service profiles
– Control access to those resources
Language Grid Core Node and Service Node
[Figure: architecture diagram. Nodes: Language Grid Core Node, Language Grid Service Node. Actors: IC Tools, Language Service Users, Language Resource Providers (get, create). Labels: Language Service Management, Service Invoker, Composite Service Engine (ActiveBPEL, UIMA, HoG, etc.), Atomic Service Engine, Language Resources, Service Registration, WSDL, Endpoint URL, Virtual Endpoint, Access Constraint, Access Log, Monitor. Message flow: 1. SOAP, 2. SOAP, 3. SOAP, 4. SOAP, 5. HTTP, Function Call, etc.]
Participants / Language Services
[Figure: chart with an axis from 0 to 120 and categories Universities, Research Organizations, Research Projects, NPOs and NGOs, Public Organizations, Companies, Others]
• Participants (17 countries, 118 groups)
– University / Research Institute: Kyoto Univ. (Japan), Shanghai Jiaotong Univ. (China), Univ. of Stuttgart (Germany), IT Univ. of Copenhagen (Denmark), Princeton Univ. (U.S.), DFKI (Germany), CNR (Italy), Chinese Academy of Sciences (China), NECTEC (Thailand), and more.
– NPO/NGO/Public Sector: NGOs for disaster reduction, public junior high schools, city boards of education, and more.
– Corporate (CSR activities / language resource providers): NTT, Toshiba, Oki, Google, Kodensha, Translution, and more.
• Language Services (more than 60)
– Machine Translator: J-Server, Web-Transer, Toshiba, Parsit, Google Translate, and more.
– Dictionary, Parallel Text: EDR, Wordnet, Life Science Dictionary, Multi-language Glossary on Natural Disasters, and more.
– Morphological Analyzer
– Dependency Parser
– Composite Services
Atomic Service
[Figure: each language resource is wrapped as an atomic service]
• Dictionary, wrapped as Dictionary Service: search translated word. Input: Hinanbasho (Disaster shelter). Output: disaster shelter.
• Parallel Text, wrapped as Parallel Text Service: search similar translated text. Input: Hinanbasho ha ie kara tooi desu (The disaster shelter is far from my house). Output: The disaster shelter is [].
• MT, wrapped as Machine Translation Service: translate by machine. Input: Hinanbasho ha ie kara tooi desu (The disaster shelter is far from my house). Output: Disaster shelter is school close from a house.
• Human Translator, wrapped as Human Translation Service: translate with high quality. Input: Hinanbasho ha ie kara tooi desu (The disaster shelter is far from my house). Output: Your disaster shelter is the school closest to your house.
Fourth Layer: Intercultural Collaboration Tools
Language Grid Toolbox (developed by NICT)
• Text Translation
・ Estimate the translation accuracy using backtranslation (Input, Translation Result, Back Translation Result)
• Multilingual BBS
・ Submit messages in users' mother languages
・ Improve the translation result by post editing manually
• Language Resource Creation
・ Create multilingual dictionaries specific to users' communities
Toolbox was released as OSS. http://langrid-tools.nict.go.jp/toolbox/
[Figure: Text Translation, Multilingual BBS, Multilingual Dictionary (Dictionary Data), and Multilingual Corpus (Parallel Texts) on XOOPS (Open Source Software), above Language Services on the Language Grid]
Operation of the Language Grid
• Diverse stakeholders
– Language Service User
– Language Resource Provider
– Computation Resource Provider
– Language Grid Operator
• Language Grid for non-profit use has been operated by Kyoto Univ. since December 2007.
• The Letter of Agreement on the Language Grid is available. (http://langrid.org/operation/)
• 118 organizations (from 17 countries) signed the agreement.
[Figure labels: Language Resource Provider; Computation Resource Provider; Language Grid Operator; Language Service User; "control their resources" (shown twice)]
Language Grid Association
(http://langrid.org/associaiton/)
Intercultural Collaboration Tools
Language Grid Playground (developed by Kyoto Univ.)
http://langrid.org/playground
M3 (developed by Wakayama Univ.)
• For foreign patients
• For medical staff
http://www.langrid.org/association/m3support/indexe.html
Pangaea Community Site (NPO Pangaea)
• Pangaea is an NPO that aims to support communication between children in various countries.
• The Pangaea Community Site allows participants and staff to:
– Communicate in their own languages using a translation service (Japanese, Korean, English, German)
– Revise the results of machine translation for other people
Pangaea as a Language Resource Provider
• Pangaea is also a provider of language resources
– Pictograms designed for communication between children in different countries
– Pangaea Community dictionary, which contains 500 terms for Pangaea's activities, e.g. Pangaean (participants of activities), Koetsuna (ice-breaking activity for children)
• Both resources are provided as Web services and combined with other services on the Language Grid
[Figure: Pangaea Activities (Community Site) and the Language Grid exchange pictograms and the community dictionary; the Language Grid combines Korean, Japanese, English, and German morphological analyzers, the community dictionary, and 2 machine translators]
Multilingual Communication System (Kyoto University, Ritsumeikan University)
[Figure: chat screens of a Chinese user and a Japanese user, with Translation, Backtranslation, and Autocomplete functions]
• Fujimi Junior High School: 584 students in total, including Filipino 4, Chinese 6, Korean 2, Peruvian 1