European Language Resources Association (ELRA)
HLT Evaluations
Khalid CHOUKRI
ELRA/ELDA, 55 Rue Brillat-Savarin, F-75013 Paris, France
Tel. +33 1 43 13 33 33 -- Fax. +33 1 43 13 33 30
Email: choukri@elda.org
http://www.elda.org/ or http://www.elra.info/
Presentation Outline
• European Language Resources Association
• Evaluation to drive research progress
• Human Language Technologies Evaluation(s) » What, why, for whom, how …
• (Some figures from TC-STAR)
• Examples of evaluation campaigns
• Demo … (available afterwards)
European Language Resources Association: an improved infrastructure for data sharing & HLT evaluation
(1) An association of users of Language Resources
(2) An infrastructure for the evaluation of Human Language Technologies, providing resources, tools, methodologies, and logistics
The Association
• Membership drive: ELRA is open to European & non-European institutions. Resources are available to members & non-members (pay per resource).
• Some of the benefits of becoming a member:
  – Substantial discounts on LR prices (over 70%)
  – Substantial discounts on LREC registration fees
  – Legal and contractual assistance with respect to LR matters
  – Access to validation and production manuals (quality assessment)
  – Figures and facts about the market (results of ELRA surveys)
  – Newsletter and other publications
  – New: Fidelity program … earn miles and get more benefits
ELRA: an efficient infrastructure to serve the HLT community
Strategies for the next decade … new ELRA status:
2005: extension of ELRA's official mission to promote LRs and evaluation for Human Language Technology (HLT):
"The mission of the Association is to promote language resources (henceforth LRs) and evaluation for the Human Language Technology (HLT) sector in all their forms and all their uses."
[Figure: courtesy G. Thurmair, Malta workshop.]
What to evaluate … Levels of Evaluation
[Diagram: the pipeline from Basic Research (long term / high risk) through Technology Development to Application Development (large return on investment, evolutionary). Quantitative Evaluation and Usage Evaluation are the meeting points with technology development: research results feed quantitative evaluation and bottleneck identification; technologies needed for applications are validated for applications, where usability and acceptability are judged.]
What to evaluate … Levels of Evaluation
- Basic Research Evaluation (validate research direction)
- Technology Evaluation (assessment of a solution to a well-defined problem)
- Usage Evaluation (end-users in the field)
- Impact Evaluation (socio-economic consequences)
- Programme Evaluation (funding agencies)
Our concern: Technology Evaluation.
Why Evaluate?
• Validate research hypotheses
• Assess progress
• Choose between research alternatives
• Identify promising technologies (market)
• Benchmarking … state of the art
• Share knowledge … dedicated workshops
• Feedback … Funding agencies
• Share Costs ???
Progress & Evaluation (courtesy Charles Wayne)
Technology performance & Applications
• Bad technology may be used to design useful applications
• What about good technology? …
» Software industry
HLT Evaluations … for whom?
• MT developers want to improve the "quality" of MT output
• MT users (humans or software, e.g. CLIR) want to improve productivity by using the most suitable MT system (e.g. multilinguality)
• …
For whom … essential for technology development
Sharing of information and knowledge between participants: how to get the best results, access to data, scoring tools.
Information obtained by industrialists: state of the art, technology choices, market strategy, new products.
Information obtained by funding agencies: technology performance, progress vs. investment, priorities.
Some types of evaluations
• Comparative evaluation: the same or similar control tasks and related data, with metrics that are agreed upon
• Competitive vs. cooperative
• Black-box evaluation … glass-box
• Objective evaluation … subjective (human-based)
• Corpus-based (test suites)
• Quantitative measures … qualitative
Comparative Evaluation of Technology
Used successfully in the USA by DARPA and NIST (since 1984)
Similar efforts in Europe on a smaller scale, mainly projects (EU-funded or national programs)
• Select a common "task"
• Attract enough participants
• Organize the campaign (protocol/metrics/data)
• Follow-up workshop: interpret results and share information
Requirements for an evaluation campaign
• Reference Language Resources (data): the "truth"
• Metric(s): automatic, human judgments … scoring software
• Scale/range of performance to compare with (baseline)
• Logistics management
• Reliability assessment: an independent body
• Participants: technology providers
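Putting these ingredients together, the core of a campaign's scoring logistics is a small loop over participants' submissions. A minimal sketch in Python, with a toy metric and entirely hypothetical names (not any campaign's actual tooling):

```python
# Hypothetical sketch of a campaign scoring loop: reference data (the
# "truth"), an agreed metric, and a baseline to compare against.

def word_accuracy(hypothesis: str, reference: str) -> float:
    """Toy metric: fraction of reference words matched position by position."""
    hyp, ref = hypothesis.split(), reference.split()
    return sum(h == r for h, r in zip(hyp, ref)) / len(ref) if ref else 0.0

def score_campaign(submissions, references, baseline):
    """Score every participant on the shared test set, against the baseline."""
    for participant, outputs in submissions.items():
        scores = [word_accuracy(h, r) for h, r in zip(outputs, references)]
        mean = sum(scores) / len(scores)
        verdict = "above" if mean > baseline else "at or below"
        print(f"{participant}: {mean:.3f} ({verdict} baseline {baseline:.3f})")

score_campaign(
    submissions={"site_A": ["the cat sat"], "site_B": ["a cat sat"]},
    references=["the cat sat"],
    baseline=0.5,
)
```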
HLT Evaluation Portal … pointers to projects
• Overview
• HLT Evaluations
• Activities by technology
• Activities by geographical region
• Players
• Evaluation resources
• Evaluation services
http://www.hlt-evaluation.org/
Let us list some well-known campaigns.
Examples of Evaluation Campaigns – Capitalization
Speech & Audio/sound
• ASR: TC-STAR, CHIL, ESTER
• TTS: TC-STAR, EVASY
• Speaker identification (CHIL)
• Speech-to-speech translation
• Speech understanding (MEDIA)
• Acoustic person tracking
• Speech activity detection, …
Examples of Evaluation Campaigns – Capitalization
Multimodal – Video – Vision technologies
• Face detection
• Visual person tracking
• Visual speaker identification
• Head pose estimation
• Hand tracking
Some of the technologies being evaluated within CHIL … http://chil.server.de/
A) Vision technologies
– A.1) Face Detection
– A.2) Visual Person Tracking
– A.3) Visual Speaker Identification
– A.4) Head Pose Estimation
– A.5) Hand Tracking
B) Sound and Speech technologies
– B.1) Close-Talking Automatic Speech Recognition
– B.2) Far-Field Automatic Speech Recognition
– B.3) Acoustic Person Tracking
– B.4) Acoustic Speaker Identification
– B.5) Speech Activity Detection
– B.6) Acoustic Scene Analysis
C) Content Processing technologies
– C.1) Automatic Summarisation … Question Answering
more at the CHIL/CLEAR workshops
Examples of Evaluation Campaigns – Capitalization
Written NLP & Content
• IR, CLIR, QA (Amaryllis, EQUER, CLEF)
• Text analysers (Grace, EASY)
• MT (CESTA, TC-STAR)
• Corpus alignment & processing (Arcade, Arcade-2, Romanseval/Senseval, …)
• Term & terminology extraction
• Summarisation
Evaluation Projects … the French scene (similar projects in NL, Italy, …)
Technolangue/Evalda: the Evalda platform consists of 8 evaluation campaigns focused on spoken and written language technologies for the French language:
– ARCADE II: evaluation of bilingual corpora alignment systems
– CESART: evaluation of terminology extraction systems
– CESTA: evaluation of machine translation systems (Ar, En => Fr)
– EASY: evaluation of parsers
– ESTER: evaluation of automatic broadcast news transcription systems
– EQUER: evaluation of question answering systems
– EVASY: evaluation of speech synthesis systems
– MEDIA: evaluation of in-context and out-of-context dialogue systems
Some details from relevant projects
• CLEF
• TC-STAR
Example of Evaluation Initiatives
CLEF (Cross-Language Evaluation Forum)
Promoting research and development in Cross-Language Information Retrieval (CLIR) by:
(i) providing an infrastructure for the testing and evaluation of information retrieval systems operating on European languages, in monolingual and cross-language contexts;
(ii) creating test packages of reusable data which can be employed by system developers for benchmarking purposes.
QA-CLEF: State of the art & Improvement
[Bar chart: best-system and combination scores (0–90%) in the 2004 and 2005 QA-CLEF campaigns, for Bulgarian, German, English, Spanish, Finnish, French, Italian, Dutch and Portuguese.]
Back to evaluation tasks within TC-STAR (http://www.tc-star.org/)
Two categories of transcribing and translating tasks:
– European Parliament Plenary Sessions (EPPS): English (En) and Spanish (Es)
– Broadcast News (Voice of America, VoA): Mandarin Chinese (Zh) and English (En)
TC-STAR: speech-to-speech translation
• Packages with speech recognition, speech translation, and speech synthesis
• Development and test data, metrics & results
TC-STAR evaluations … 3 consecutive annual evaluations
1. SLT in the following directions:
   i. Chinese-to-English (Broadcast News)
   ii. Spanish-to-English (European Parliament plenary speeches)
   iii. English-to-Spanish (European Parliament plenary speeches)
2. ASR in the following languages:
   i. English (European Parliament plenary speeches)
   ii. Spanish (European Parliament plenary speeches)
   iii. Mandarin Chinese (Broadcast News)
3. TTS in Chinese, English, and Spanish under the following conditions:
   i. Complete system
   ii. Voice conversion, intralingual and crosslingual; expressive speech
   iii. Component evaluation
Improvement of SLT performance (En→Es)
[Bar chart: BLEU [%] (30–55) for the 2006 and 2007 systems from IBM, IRST, RWTH and UPC, under three input conditions: text (FTE), verbatim transcripts, and speech recognition output (ASR).]
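BLEU, the automatic metric in the chart above, combines modified n-gram precisions with a brevity penalty. A self-contained sketch of the single-reference, uniformly weighted case (the campaigns used standard scoring tools; this is only illustrative):

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """Count the n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu(hypothesis: str, reference: str, max_n: int = 4) -> float:
    """Sentence-level BLEU with one reference and uniform n-gram weights."""
    hyp, ref = hypothesis.split(), reference.split()
    log_prec = 0.0
    for n in range(1, max_n + 1):
        hyp_counts, ref_counts = ngrams(hyp, n), ngrams(ref, n)
        # Clipped n-gram precision: each hypothesis n-gram counts at most
        # as often as it appears in the reference.
        clipped = sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
        total = max(sum(hyp_counts.values()), 1)
        log_prec += math.log(max(clipped, 1e-9) / total)  # smooth zero counts
    # Brevity penalty: punish hypotheses shorter than the reference.
    bp = 1.0 if len(hyp) > len(ref) else math.exp(1 - len(ref) / max(len(hyp), 1))
    return bp * math.exp(log_prec / max_n)

print(f"{bleu('the cat sat on the mat', 'the cat is on the mat'):.3f}")
```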
Improvement of SLT performance (Es→En)
[Bar chart: BLEU [%] (30–50) for the 2005, 2006 and 2007 systems from IRST, LIMSI, RWTH and UPC, under the same three input conditions: text (FTE), verbatim transcripts, and ASR output.]
Improvement of ASR performance (En, Es, Zh)
[Line chart: WER/CER (%) (0–45) from 2004 to 2007 for English (EPPS), Spanish (EPPS), Spanish (CORTES) and Chinese (BN).]
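The WER plotted above is the word-level Levenshtein distance (substitutions + insertions + deletions) divided by the reference length; for Chinese the same computation runs over characters (CER). A minimal sketch:

```python
def wer(hypothesis: str, reference: str) -> float:
    """Word error rate: (substitutions + insertions + deletions) / reference length."""
    hyp, ref = hypothesis.split(), reference.split()
    # Single-row dynamic-programming edit distance over words.
    dist = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        prev_diag, dist[0] = dist[0], i
        for j, h in enumerate(hyp, start=1):
            prev_diag, dist[j] = dist[j], min(
                dist[j] + 1,           # deletion of reference word r
                dist[j - 1] + 1,       # insertion of hypothesis word h
                prev_diag + (r != h),  # substitution (free if words match)
            )
    return dist[len(hyp)] / len(ref) if ref else 0.0

print(f"{wer('the cat sat', 'the cat sat on the mat'):.2f}")  # 3 deletions / 6 words = 0.50
```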
Human evaluation of translations … En→Es adequacy (1–5)
[Bar chart: adequacy scores on a 1–5 scale for the FTE, verbatim and ASR input conditions, including system combinations and a commercial system.]
End-to-End
• The end-to-end evaluation is carried out for one translation direction: English-to-Spanish
• Evaluation of the ASR (ROVER) + SLT (ROVER) + TTS (UPC) system
• Same segments as for the SLT human evaluation
• Evaluation tasks:
  – Adequacy: comprehension test
  – Fluency: judgement test with several questions related to fluency and also usability of the system
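The end-to-end system is a straight pipeline: each component's output feeds the next, and judges rate only the final audio. A schematic sketch of that composition (the three stage functions are hypothetical stubs, not the actual TC-STAR components):

```python
def recognize(audio: bytes) -> str:        # ASR stage (e.g. a combined system)
    return "hello members of parliament"   # stub transcript

def translate(text: str) -> str:           # SLT stage
    return "hola miembros del parlamento"  # stub translation

def synthesize(text: str) -> bytes:        # TTS stage
    return text.encode("utf-8")            # stub waveform

def end_to_end(audio: bytes) -> bytes:
    """English audio in, Spanish audio out; only the final output is judged."""
    return synthesize(translate(recognize(audio)))

print(end_to_end(b"..."))
```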
Fluency questionnaire
• [Understanding] Do you think that you have understood the message? 1: Not at all … 5: Yes, absolutely
• [Fluent Speech] Is the speech in good Spanish? 1: No, it is very bad … 5: Yes, it is perfect
• [Effort] Rate the listening effort. 1: Very high … 5: Low, as natural speech
• [Overall Quality] Rate the overall quality of this audio sample. 1: Very bad, unusable … 5: It is very useful
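Judgments like these are typically averaged per question, over all judges and samples, into a mean score on the 1–5 scale. A small illustrative sketch (the data and names are made up, not TC-STAR results):

```python
from statistics import mean

# Each judgment: (question, score on the 1-5 scale above).
judgments = [
    ("Understanding", 4), ("Understanding", 3), ("Understanding", 5),
    ("Fluent Speech", 3), ("Fluent Speech", 4),
    ("Effort", 2), ("Effort", 3),
    ("Overall Quality", 3), ("Overall Quality", 4),
]

def mean_opinion_scores(judgments):
    """Average the 1-5 ratings per question across judges and samples."""
    by_question = {}
    for question, score in judgments:
        by_question.setdefault(question, []).append(score)
    return {q: mean(scores) for q, scores in by_question.items()}

for question, mos in mean_opinion_scores(judgments).items():
    print(f"{question}: {mos:.2f}")
```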
End-to-end results (subjective test: 1…5)
[Bar chart: scores (1–5) for Understanding, Fluent Speech, Effort and Overall Quality, for ITP vs. the TC-STAR system.]
TC-STAR Tasks
• More results from the 2007 Campaign
http://www.tc-star.org/
Evaluation packages available
Some concluding remarks on technology evaluation
• It saves developers time and money
• It helps assess progress accurately
• It produces reusable evaluation packages
• It helps to identify areas where more R&D is needed