The CESAR Project: Challenges and Achievements
Transcript of The CESAR Project: Challenges and Achievements
![Page 1: The CESAR Project: Challenges and Achievements](https://reader035.fdocuments.net/reader035/viewer/2022062423/56814887550346895db59b23/html5/thumbnails/1.jpg)
Co-funded by the 7th Framework Programme of the European Commission through the contract T4ME, grant agreement no.: 249119.
Co-funded by the ICT PSP Programme of the European Commission through the contract CESAR, grant agreement no.: 271022.
The CESAR Project: Challenges and Achievements
Tamás Váradi, coordinator
Research Institute for Linguistics, Hungarian Academy of Sciences, Budapest, Hungary
CESAR META-NET Roadshow, Budapest, 18th January 2013
![Page 2: The CESAR Project: Challenges and Achievements](https://reader035.fdocuments.net/reader035/viewer/2022062423/56814887550346895db59b23/html5/thumbnails/2.jpg)
Outline
The CESAR consortium
Project objectives
CESAR in META-SHARE
Survey of results
Gaps and Challenges
Conclusions
http://www.cesar-project.net
![Page 3: The CESAR Project: Challenges and Achievements](https://reader035.fdocuments.net/reader035/viewer/2022062423/56814887550346895db59b23/html5/thumbnails/3.jpg)
META-NET & CESAR
![Page 4: The CESAR Project: Challenges and Achievements](https://reader035.fdocuments.net/reader035/viewer/2022062423/56814887550346895db59b23/html5/thumbnails/4.jpg)
Geo-linguistic position
CESAR stands for CEntral and Southeast EuropeAn Resources
operates as an integral part of META-NET
geo-linguistic spread: Central and Southeast Europe; three inner seas: Baltic, Adriatic, Black Sea
CESAR covers the following languages:
Polish: EU, 38M (40-48M)
Slovak: EU, 5.4M (7M)
Hungarian: EU, 10M (16M)
Croatian: EU in 2013, 4.4M (5.5M)
Serbian: candidate soon, 7.3M (9M)
Bulgarian: EU, 7.5M (9M)
all languages Slavic, except Hungarian
![Page 5: The CESAR Project: Challenges and Achievements](https://reader035.fdocuments.net/reader035/viewer/2022062423/56814887550346895db59b23/html5/thumbnails/5.jpg)
Who is CESAR?
| Participant no. | Participant organisation name | Participant short name | Country |
| --- | --- | --- | --- |
| 1 (CO) | Nyelvtudományi Intézet, Magyar Tudományos Akadémia | HASRIL | Hungary |
| 2 | Budapesti Műszaki és Gazdaságtudományi Egyetem | BME-TMIT | Hungary |
| 3 | Sveučilište u Zagrebu, Filozofski Fakultet – University of Zagreb, Faculty of Humanities and Social Sciences | FFZG | Croatia |
| 4 | Instytut Podstaw Informatyki Polskiej Akademii Nauk | IPIPAN | Poland |
| 5 | Uniwersytet Łódzki | Ulodz | Poland |
| 6 | Faculty of Mathematics, University of Belgrade | UBG | Serbia |
| 7 | Institut Mihajlo Pupin | IPUP | Serbia |
| 8 | The Institute for Bulgarian Language Prof. Lyubomir Andreychin | IBL | Bulgaria |
| 9 | Jazykovedný ústav Ľudovíta Štúra Slovenskej akadémie vied | LSIL | Slovakia |
![Page 6: The CESAR Project: Challenges and Achievements](https://reader035.fdocuments.net/reader035/viewer/2022062423/56814887550346895db59b23/html5/thumbnails/6.jpg)
The Faces behind CESAR
![Page 7: The CESAR Project: Challenges and Achievements](https://reader035.fdocuments.net/reader035/viewer/2022062423/56814887550346895db59b23/html5/thumbnails/7.jpg)
Project objectives
provide a description of the national landscape in terms of language use, language-savvy products and services, language technologies and resources
contribute to a pan-European digital language resources exchange (META-SHARE)
enhance, extend, document, standardize, cross-link, cross-align resources and tools
mobilise national and regional stakeholders, public bodies and funding
reinvigorate cooperation between key technology partners in the region
collaborate with other partner projects
bridge the technological gap between this region and the other parts of Europe by
![Page 8: The CESAR Project: Challenges and Achievements](https://reader035.fdocuments.net/reader035/viewer/2022062423/56814887550346895db59b23/html5/thumbnails/8.jpg)
Timeline
Project runs between 1st February 2011 and 31st January 2013
Three major deliverables of resources and tools
BATCH 1: M10, 30th November 2011
BATCH 2: M18, 31st July 2012
BATCH 3: M24, 31st January 2013
![Page 9: The CESAR Project: Challenges and Achievements](https://reader035.fdocuments.net/reader035/viewer/2022062423/56814887550346895db59b23/html5/thumbnails/9.jpg)
Where to find CESAR
www.meta-net.eu
![Page 10: The CESAR Project: Challenges and Achievements](https://reader035.fdocuments.net/reader035/viewer/2022062423/56814887550346895db59b23/html5/thumbnails/10.jpg)
www.cesar-project.net
![Page 11: The CESAR Project: Challenges and Achievements](https://reader035.fdocuments.net/reader035/viewer/2022062423/56814887550346895db59b23/html5/thumbnails/11.jpg)
CESAR in META-SHARE
![Page 12: The CESAR Project: Challenges and Achievements](https://reader035.fdocuments.net/reader035/viewer/2022062423/56814887550346895db59b23/html5/thumbnails/12.jpg)
www.meta-share.org
![Page 13: The CESAR Project: Challenges and Achievements](https://reader035.fdocuments.net/reader035/viewer/2022062423/56814887550346895db59b23/html5/thumbnails/13.jpg)
www.cesar-project.net/metashare
![Page 14: The CESAR Project: Challenges and Achievements](https://reader035.fdocuments.net/reader035/viewer/2022062423/56814887550346895db59b23/html5/thumbnails/14.jpg)
![Page 15: The CESAR Project: Challenges and Achievements](https://reader035.fdocuments.net/reader035/viewer/2022062423/56814887550346895db59b23/html5/thumbnails/15.jpg)
![Page 16: The CESAR Project: Challenges and Achievements](https://reader035.fdocuments.net/reader035/viewer/2022062423/56814887550346895db59b23/html5/thumbnails/16.jpg)
![Page 17: The CESAR Project: Challenges and Achievements](https://reader035.fdocuments.net/reader035/viewer/2022062423/56814887550346895db59b23/html5/thumbnails/17.jpg)
![Page 18: The CESAR Project: Challenges and Achievements](https://reader035.fdocuments.net/reader035/viewer/2022062423/56814887550346895db59b23/html5/thumbnails/18.jpg)
Results – M24
![Page 19: The CESAR Project: Challenges and Achievements](https://reader035.fdocuments.net/reader035/viewer/2022062423/56814887550346895db59b23/html5/thumbnails/19.jpg)
CESAR First Batch of Resources
Statistics of resources:
![Page 20: The CESAR Project: Challenges and Achievements](https://reader035.fdocuments.net/reader035/viewer/2022062423/56814887550346895db59b23/html5/thumbnails/20.jpg)
CESAR Second Batch of Resources
Statistics of resources:
![Page 21: The CESAR Project: Challenges and Achievements](https://reader035.fdocuments.net/reader035/viewer/2022062423/56814887550346895db59b23/html5/thumbnails/21.jpg)
CESAR Third Batch of Resources
Statistics of resources available for 3rd batch:
![Page 22: The CESAR Project: Challenges and Achievements](https://reader035.fdocuments.net/reader035/viewer/2022062423/56814887550346895db59b23/html5/thumbnails/22.jpg)
Total resources
![Page 23: The CESAR Project: Challenges and Achievements](https://reader035.fdocuments.net/reader035/viewer/2022062423/56814887550346895db59b23/html5/thumbnails/23.jpg)
‘In other words – 1st and 2nd batch’
Quick statistics of already submitted LRs:
monolingual corpus (tokens) = 1 702 565 806
parallel corpus (tokens) = 41 810 000
record/entry/lexicon = 1 640 579
divided between 32 corpora, 12 lexical resources, and 20 tools/services
![Page 24: The CESAR Project: Challenges and Achievements](https://reader035.fdocuments.net/reader035/viewer/2022062423/56814887550346895db59b23/html5/thumbnails/24.jpg)
Distribution of META-SHARE licence types
![Page 25: The CESAR Project: Challenges and Achievements](https://reader035.fdocuments.net/reader035/viewer/2022062423/56814887550346895db59b23/html5/thumbnails/25.jpg)
Hungarian resources in the 1st batch
![Page 26: The CESAR Project: Challenges and Achievements](https://reader035.fdocuments.net/reader035/viewer/2022062423/56814887550346895db59b23/html5/thumbnails/26.jpg)
Hungarian resources in the 2nd batch
![Page 27: The CESAR Project: Challenges and Achievements](https://reader035.fdocuments.net/reader035/viewer/2022062423/56814887550346895db59b23/html5/thumbnails/27.jpg)
Hungarian resources in the 3rd batch
![Page 28: The CESAR Project: Challenges and Achievements](https://reader035.fdocuments.net/reader035/viewer/2022062423/56814887550346895db59b23/html5/thumbnails/28.jpg)
NooJ
A linguistic development environment combining fast and robust finite state technology and computational power with ease of use and
Many CESAR partners had already developed a lot of valuable resources
Objective: produce open-source and multi-platform version
Institut Mihajlo Pupin in close collaboration with Max Silberztein, developer of NooJ
First phase: a version running on the Mono system
Currently, an open-source Java version is in development
![Page 29: The CESAR Project: Challenges and Achievements](https://reader035.fdocuments.net/reader035/viewer/2022062423/56814887550346895db59b23/html5/thumbnails/29.jpg)
NooJ – Mono version
![Page 30: The CESAR Project: Challenges and Achievements](https://reader035.fdocuments.net/reader035/viewer/2022062423/56814887550346895db59b23/html5/thumbnails/30.jpg)
NooJ – Java version
![Page 31: The CESAR Project: Challenges and Achievements](https://reader035.fdocuments.net/reader035/viewer/2022062423/56814887550346895db59b23/html5/thumbnails/31.jpg)
Gaps and Challenges*
* Presented at LTC’11, 25-27 November 2011, Poznań
![Page 32: The CESAR Project: Challenges and Achievements](https://reader035.fdocuments.net/reader035/viewer/2022062423/56814887550346895db59b23/html5/thumbnails/32.jpg)
Where does CESAR stand?
![Page 33: The CESAR Project: Challenges and Achievements](https://reader035.fdocuments.net/reader035/viewer/2022062423/56814887550346895db59b23/html5/thumbnails/33.jpg)
Results for language resources
Legend: below 1.000 on average; below 2.000 on average; equals 0.000 in cells
![Page 34: The CESAR Project: Challenges and Achievements](https://reader035.fdocuments.net/reader035/viewer/2022062423/56814887550346895db59b23/html5/thumbnails/34.jpg)
Results for language resources
![Page 35: The CESAR Project: Challenges and Achievements](https://reader035.fdocuments.net/reader035/viewer/2022062423/56814887550346895db59b23/html5/thumbnails/35.jpg)
Results for language tools
Legend: below 1.000 on average; below 2.000 on average; equals 0.000 in cells
![Page 36: The CESAR Project: Challenges and Achievements](https://reader035.fdocuments.net/reader035/viewer/2022062423/56814887550346895db59b23/html5/thumbnails/36.jpg)
Results for language tools
![Page 37: The CESAR Project: Challenges and Achievements](https://reader035.fdocuments.net/reader035/viewer/2022062423/56814887550346895db59b23/html5/thumbnails/37.jpg)
Conclusions
META-NET is an excellent opportunity
to promote LT in Europe
to mobilize all stakeholders around a Strategic Research Agenda
to create an invaluable stock of resources and tools
The CESAR project is actively contributing to these aims
CESAR META-SHARE node
Language Whitepaper series is a unique instrument to gain a horizontal perspective of the state of the art in various languages
Hungarian resources and tools are valuable components
There is major work ahead to bridge the technological gap
![Page 38: The CESAR Project: Challenges and Achievements](https://reader035.fdocuments.net/reader035/viewer/2022062423/56814887550346895db59b23/html5/thumbnails/38.jpg)
Thank you for your attention.
http://www.cesar-project.net
http://www.meta-net.eu
http://www.facebook.com/META.Alliance