The European Resources Landscape
LKR2004, Tokyo March 8+9 2004
Steven Krauwer
ELSNET / Utrecht University
The Netherlands
Overview
• About ELSNET
• Main characteristics of the European scene
• Impact of EU funding policies
• Bottom-up resources infrastructure actions
• Concluding remarks
What is ELSNET
• European Network in Human Language Technologies (ca 145 academic and industrial member organisations)
• Funded by the European Commission
• Created in 1991 as one network out of (eventually) ca 25, covering all subfields of ICT
• Objectives
– bringing together the language and speech communities
– bringing together academia and industry
– facilitating R&D in language and speech technology
• Info: [email protected] http://www.elsnet.org
What we do
• Spreading knowledge, e.g.:
– Training (e.g. annual summer schools, curriculum development)
– Information dissemination (newsletter, website, etc.)
– Knowledge transfer (directories, workshops)
• Creating common foundations:
– language resources
– common standards and evaluation methods
• Roadmapping:
– Establishing a broadly supported common vision of where the language and speech field is going
Main characteristics of the European Landscape
• Multilinguality: coping with many languages and crossing language boundaries
• Fragmentation of all R&D efforts over national funding schemes and policies
• Unbalanced efforts over languages, even though all languages are equally hard
Languages in Europe
• European Union has
– 15 member states, with 11 official languages (plus quite a few ‘unofficial languages’)
– 10 new member states with (at least) 10 new official languages joining May 1st 2004
– 3 applicant countries in the waiting room with at least 3 extra languages
• Europe has
– 17 other countries, with quite a few additional languages (think of Russia!)
Languages in the world
The Ethnologue (http://www.ethnologue.org):
• Europe: 230 languages
• The Americas: 1013 languages
• The Pacific: 1311 languages
• Africa: 2058 languages
• Asia: 2197 languages
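To put the European figure in perspective, the counts quoted above can be totalled and turned into regional shares. The following is a small sketch that uses only the numbers on this slide; the variable and function names are illustrative and not part of the original talk.

```python
# Language counts per region, exactly as quoted on the slide (Ethnologue figures).
LANGUAGES_PER_REGION = {
    "Europe": 230,
    "The Americas": 1013,
    "The Pacific": 1311,
    "Africa": 2058,
    "Asia": 2197,
}


def region_shares(counts):
    """Return each region's fraction of the total number of languages."""
    total = sum(counts.values())
    return {region: n / total for region, n in counts.items()}


if __name__ == "__main__":
    total = sum(LANGUAGES_PER_REGION.values())
    print(f"Total: {total} languages")
    for region, share in region_shares(LANGUAGES_PER_REGION).items():
        print(f"{region}: {share:.1%}")
```

On these figures the five regions add up to 6809 languages, of which Europe accounts for roughly 3.4%.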
Languages in Japan
• Just one language: Japanese ….
• But even in Japan multilinguality is a factor, e.g.:
– Export market requires localized products (e.g. user interfaces)
– Users require documentation in their own language
– Business to business communication crosses language boundaries
– Immigrants
Resources in Europe
• Language resources collection started in most countries as a cultural or political activity
• Most activities in larger countries with bigger funding programmes
• Adoption or creation of resources for industrial application started much later
• Most of them addressing commercially interesting languages
• Result: very uneven coverage
Impact of the EU
• During the 70s and 80s the EU became a major funder of technology programmes
• For smaller languages the EU became the main funding source
• The political requirement of multinational consortia and balanced participation over member states gave a strong boost to resources development for smaller languages
Recent EU policies
• EU focus shifting to activities with a more direct commercial impact
• EU focus shifting from spreading excellence to boosting excellence: only invest in sectors where Europe can maintain or strengthen world leadership (over e.g. US and Japan)
• EU moves from many small projects (up to 5 million euro) to few big projects (up to 50 million)
• Language and speech technology have disappeared from the agenda, and Interfaces and Knowledge Systems have taken their place
Result of new policies
• Strong emphasis on the commercially interesting languages
• Language and speech will only appear as embedded technologies
• Creation of language resources in EU projects only if needed for the main objectives of the project, i.e. never as a goal per se
• Fragmentation of language and speech technology activities over many projects
Impact on infrastructures
• Creation and distribution of resources, standards, and evaluation are infrastructural in nature (as opposed to research and development)
• They require continuity and active industrial involvement
• Very hard to accomplish in EU funding context because of short duration of projects and requirement that industries contribute 50% of their costs themselves
• Resources actions now mostly at national level
Overall picture …
• … not very good: very little to expect from EU as far as improvement of the language resources situation is concerned for the duration of the present Framework Programme (2003-2007)
• But there are some signs that the situation will improve in the next Framework Programme,
• And there are still a number of bottom-up activities (emerging from the community, with or without EU support)
Ongoing resources infrastructure actions
• ELSNET: still running (since 1991, hopefully secured until summer 2005; funded by the EU as a series of independent 2-3 year projects), still supporting resources and evaluation, now focusing on the roadmap for language and speech technology and for language and speech resources
• ELRA/ELDA: Resources Association and Agency; European counterpart (although not twin sister) of LDC
Ongoing actions, continued
• ENABLER:
– Network aiming at coordination of national resources activities; EU funding has ended, but it remains active.
– Surveys and other useful material on website (www.enabler-network.org)
– Involved in resources roadmap and landscape (see later)
– Asian and US participation
Cocosda
• International committee for the coordination and standardisation of speech databases and assessment techniques
• International, not just European – also active Asian involvement
• Not funded, but alive
ICCWLRE
• International coordination committee for written language resources and evaluation.
• Written language counterpart of Cocosda
• Goal is to join forces with Cocosda
• To be launched at LREC 2004 in Lisbon
• International, active Asian participation
LREC
• Biennial international conference on resources and evaluation
• Initiated in 1998, very successful, and truly international
• Only conference on this topic and only conference bringing together language and speech communities
Ongoing actions, continued
• The Language Resources Roadmap:
– Joint activity of ELSNET/ENABLER/ELRA
– Aimed at creating a broadly supported common vision of where the field is going, and what the implications are for language resources
– Workshops (www.elsnet.org/roadmap.html)
– Graphical representation at elsnet.dfki.de
Ongoing actions, continued
• The Resources Landscape:
– Joint project by ELSNET/ENABLER
– Aimed at creation and continued maintenance of a full landscape of the world of language resources (actors, actions, projects, events, resources, etc.)
– Still under construction
– See www.enabler-network.org
EAGLES/ISLE/WordNet
• EAGLES (and its successor ISLE) were EU-funded projects aimed at standards in language and speech processing
• Projects have ended, but there are still some ongoing activities, such as MILE (the Multilingual ISLE Lexical Entry)
• WordNet has had a number of European spin-offs, such as EuroWordNet, BalkaNet and local instantiations for other languages
Ongoing actions: BLARK
• Define (in a language-independent way) the minimal set of language resources that is necessary to do any precompetitive R&D and education at all for a language (the Basic Language Resource Kit or BLARK)
• Determine for each language which components are already available (survey)
• Make for each language a priority plan to complete the BLARK (and to get funding)
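The three BLARK steps above (define the kit, survey what exists, plan what is missing) amount to a simple gap analysis, which can be sketched as follows. This is only an illustration under stated assumptions: the component names, the priority weights, and the survey data are hypothetical placeholders, since the slides do not list the actual BLARK contents.

```python
# Hypothetical BLARK definition: component -> priority weight (higher = more urgent).
# The real BLARK component list is not given in these slides.
BLARK_COMPONENTS = {
    "monolingual lexicon": 3,
    "annotated text corpus": 3,
    "speech corpus": 2,
    "part-of-speech tagger": 1,
}

# Hypothetical survey result: language -> components already available.
SURVEY = {
    "language A": {"monolingual lexicon", "annotated text corpus", "speech corpus"},
    "language B": {"monolingual lexicon"},
}


def priority_plan(available):
    """Return the BLARK components missing for a language, most urgent first."""
    missing = [c for c in BLARK_COMPONENTS if c not in available]
    # Sort by descending weight, then alphabetically so the order is stable.
    return sorted(missing, key=lambda c: (-BLARK_COMPONENTS[c], c))


def plans_for_all(survey):
    """Build a priority plan for every surveyed language."""
    return {lang: priority_plan(found) for lang, found in survey.items()}


if __name__ == "__main__":
    for lang, plan in plans_for_all(SURVEY).items():
        print(lang, "->", plan)
```

Keeping the kit definition language-independent and the survey per language mirrors the split between the first and second bullet; only the survey data would change from one language to the next.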
New initiatives
• Proposal to create BLARKnet: rejected by EU because language and speech are no core objectives
• In France the successful launch of the new national programme TechnoLangue, explicitly addressing resources and evaluation
• In Europe the initiative towards LangNet, a network aimed at coordination of national language and speech technology programmes (including resources and evaluation)
• Some of the new EU projects will address resources problems, but project info has not been released yet
Concluding remarks
• We have seen some problems that are inherent to the situation in Europe and that will not go away: linguistic fragmentation and uneven balance in distribution of R&D efforts over languages
• We have seen self-imposed problems (EU funding schemes and policies); they may go away if and when the funders change their minds
• But we have also seen that there is still room for a variety of resources-related initiatives in Europe, many of which could benefit from collaboration with e.g. Japan