LKR2004, Tokyo March 8+9 2004

[email protected] 1

The European Resources Landscape

Steven Krauwer

ELSNET / Utrecht University

The Netherlands

Overview

• About ELSNET

• Main characteristics of the European scene

• Impact of EU funding policies

• Bottom-up resources infrastructure actions

• Concluding remarks

What is ELSNET

• European Network in Human Language Technologies (ca 145 academic and industrial member organisations)

• Funded by the European Commission

• Created in 1991 as one network out of (eventually) ca 25, covering all subfields of ICT

• Objectives:

– bringing together the language and speech communities

– bringing together academia and industry

– facilitating R&D in language and speech technology

• Info: [email protected] http://www.elsnet.org

What we do

• Spreading knowledge, e.g.:

– Training (e.g. annual summer schools, curriculum development)

– Information dissemination (newsletter, website, etc.)

– Knowledge transfer (directories, workshops)

• Creating common foundations:

– language resources

– common standards and evaluation methods

• Roadmapping:

– Establishing a broadly supported common vision of where the language and speech field is going

Main characteristics of the European Landscape

• Multilinguality: coping with many languages and crossing language boundaries

• Fragmentation of all R&D efforts over national funding schemes and policies

• Unbalanced efforts over languages, even though all languages are equally hard

Languages in Europe

• The European Union has:

– 15 member states, with 11 official languages (plus quite a few ‘unofficial languages’)

– 10 new member states with (at least) 10 new official languages joining May 1st 2004

– 3 applicant countries in the waiting room with at least 3 extra languages

• Europe has:

– 17 other countries, with quite a few additional languages (think of Russia!)

Languages in the world

The Ethnologue (http://www.ethnologue.org):

• Europe: 230 languages

• The Americas: 1013 languages

• The Pacific: 1311 languages

• Africa: 2058 languages

• Asia: 2197 languages

Languages in Japan

• Just one language: Japanese …

• But even in Japan multilinguality is a factor, e.g.:

– Export market requires localized products (e.g. user interfaces)

– Users require documentation in their own language

– Business to business communication crosses language boundaries

– Immigrants

Resources in Europe

• Language resources collection started in most countries as a cultural or political activity

• Most activities in larger countries with bigger funding programmes

• Adoption or creation of resources for industrial application started much later

• Most of them addressing commercially interesting languages

• Result: very uneven coverage

Impact of the EU

• During 70s and 80s EU becomes a major funder of technology programmes

• For smaller languages EU becomes main funding source

• Political requirement of multinational consortia and balanced participation over member states gave strong boost to resources development for smaller languages

Recent EU policies

• EU focus shifting to activities with a more direct commercial impact

• EU focus shifting from spreading excellence to boosting excellence: only invest in sectors where Europe can maintain or strengthen world leadership (over e.g. US and Japan)

• EU moves from many small projects (up to 5 million euro) to few big projects (up to 50 million)

• Language and speech technology have disappeared from the agenda, and Interfaces and Knowledge Systems have taken their place

Result of new policies

• Strong emphasis on the commercially interesting languages

• Language and speech will only appear as embedded technologies

• Creation of language resources in EU projects only if needed for the main objectives of the project, i.e. never as a goal per se

• Fragmentation of language and speech technology activities over many projects

Impact on infrastructures

• Creation and distribution of resources, standards, and evaluation are infrastructural in nature (as opposed to research and development)

• They require continuity and active industrial involvement

• Very hard to accomplish in EU funding context because of short duration of projects and requirement that industries contribute 50% of their costs themselves

• Resources actions now mostly at national level

Overall picture …

• … not very good: very little to expect from EU as far as improvement of the language resources situation is concerned for the duration of the present Framework Programme (2003-2007)

• But there are some signs that the situation will improve in the next Framework Programme

• And there are still a number of bottom-up activities (emerging from the community, with or without EU support)

Ongoing resources infrastructure actions

• ELSNET: still running (since 1991, hopefully secured until summer 2005; funded by the EU as a series of independent 2-3 year projects), still supporting resources and evaluation, now focusing on the roadmap for language and speech technology and for language and speech resources

• ELRA/ELDA: Resources Association and Agency; European counterpart (although not twin sister) of LDC

Ongoing actions, continued

• ENABLER:

– Network aiming at coordination of national resources activities; EU funding has ended, but it remains active

– Surveys and other useful material on website (www.enabler-network.org)

– Involved in resources roadmap and landscape (see later)

– Asian and US participation

Cocosda

• International committee for the coordination and standardisation of speech databases and assessment techniques

• International, not just European – also active Asian involvement

• Not funded, but alive

ICCWLRE

• International coordination committee for written language resources and evaluation.

• Written language counterpart of Cocosda

• Goal is to join forces with Cocosda

• To be launched at LREC 2004 in Lisbon

• International, active Asian participation

LREC

• Biennial international conference on resources and evaluation

• Initiated in 1998, very successful, and truly international

• Only conference on this topic and only conference bringing together language and speech communities

Ongoing actions, continued

• The Language Resources Roadmap:

– Joint activity of ELSNET/ENABLER/ELRA

– Aimed at creating a broadly supported common vision of where the field is going, and what the implications are for language resources

– Workshops (www.elsnet.org/roadmap.html)

– Graphical representation at elsnet.dfki.de

Ongoing actions, continued

• The Resources Landscape:

– Joint project by ELSNET/ENABLER

– Aimed at creation and continued maintenance of a full landscape of the world of language resources (actors, actions, projects, events, resources, etc.)

– Still under construction

– See www.enabler-network.org

EAGLES/ISLE/WordNet

• EAGLES (and its successor ISLE) were EU-funded projects aimed at standards in language and speech processing

• Projects have ended, but there are still some ongoing activities, such as MILE (the Multilingual ISLE Lexical Entry)

• WordNet has had a number of European spin-offs, such as EuroWordNet, BalkaNet and local instantiations for other languages

Ongoing actions: BLARK

• Define (in a language-independent way) the minimal set of language resources that is necessary to do any precompetitive R&D and education at all for a language (the Basic Language Resource Kit or BLARK)

• Determine for each language which components are already available (survey)

• Make for each language a priority plan to complete the BLARK (and to get funding)
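
The three BLARK steps above (define the kit, survey what exists, plan what is missing) can be sketched as a small set computation. This is only an illustration under stated assumptions: the component names, the language labels, and the function names `blark_gaps` and `priority_order` are hypothetical, since the slides do not enumerate the kit's contents or any survey results.

```python
# Hypothetical kit contents; the slides do not list the actual BLARK components.
BLARK_COMPONENTS = {
    "monolingual lexicon",
    "written corpus",
    "speech corpus",
    "part-of-speech tagger",
}


def blark_gaps(available_by_language, required=BLARK_COMPONENTS):
    """Step 2 (survey): for each language, list the required components that are missing."""
    return {
        lang: sorted(set(required) - set(available))
        for lang, available in available_by_language.items()
    }


def priority_order(gaps):
    """Step 3 (plan): order languages by number of missing components, largest gap first."""
    return sorted(gaps, key=lambda lang: (-len(gaps[lang]), lang))


# Made-up survey input, for illustration only.
survey = {
    "language A": {"monolingual lexicon", "written corpus"},
    "language B": set(BLARK_COMPONENTS),
}
gaps = blark_gaps(survey)
for lang in priority_order(gaps):
    print(lang, "is missing:", gaps[lang] or "nothing")
```

Because the kit is defined once and independently of any language, the same gap computation applies to every language surveyed, which is the point of a language-independent definition.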

New initiatives

• Proposal to create BLARKnet: rejected by the EU because language and speech are not core objectives

• In France the successful launch of the new national programme TechnoLangue, explicitly addressing resources and evaluation

• In Europe the initiative towards LangNet, a network aimed at coordination of national language and speech technology programmes (including resources and evaluation)

• Some of the new EU projects will address resources problems, but project info has not been released yet

Concluding remarks

• We have seen some problems that are inherent to the situation in Europe and that will not go away: linguistic fragmentation and uneven balance in distribution of R&D efforts over languages

• We have seen self-imposed problems (EU funding schemes and policies); they may go away if and when the funders change their minds

• But we have also seen that there is still room for a variety of resources-related initiatives in Europe, many of which could benefit from collaboration with e.g. Japan