EU Vocabularies - Facilitating the linking of legal data John Dann Le Gouvernement du Grand-Duché de Luxembourg Ministère d'État, Service central de législation Anikó Gerencsér Publications Office of the European Union Standardisation Unit, Metadata Sector Law via the Internet Conference, Florence 11-12 October 2018

Transcript of EU Vocabularies - Facilitating the linking of legal...

Page 1: EU Vocabularies - Facilitating the linking of legal datalvi2018.ittig.cnr.it/slide/III.A.Francesconi_AM/01DANN-GERENCSER.EU... · •The alignment follows usages of legal professionals

EU Vocabularies - Facilitating

the linking of legal data

John Dann Le Gouvernement du Grand-Duché de Luxembourg

Ministère d'État, Service central de législation

Anikó Gerencsér

Publications Office of the European Union Standardisation Unit, Metadata Sector

Law via the Internet Conference, Florence

11-12 October 2018

Page 2

Publications Office of the European Union

Publishing EU law and publications in the 24 official EU languages

Multilingual legal data

Controlled vocabularies:

• EuroVoc multilingual thesaurus

• 75 Authority tables

– Language codes

– Corporate bodies (names of EU institutions)

– Countries

– Legal proceeding

– Treaty

• EU Vocabularies: merge of EuroVoc and Metadata Registry websites

https://publications.europa.eu/en/web/eu-vocabularies

Page 3

Multilingual, multidisciplinary thesaurus

Legal domain

All official EU languages

Releases: 2/year

Hierarchical structure:

21 domains

127 microthesauri

7180 concepts

Page 4

Controlled vocabularies

Official Journal of Luxembourg

Page 5

Opportunities

Modernisation of publishing the OJ
• Promote Linked Open Data

Facilitate the accessibility and participation
• Access to public information - PSI Directive

Further digitalisation of public services
• ...
• Provide controlled vocabularies
‒ Re-use of existing vocabularies

Page 6

Project “Casemates”: From Draft to Publication

• Reduce costs
• Faster publications
• Efficiency

[Diagram. Labels: Legal Data; Legiwrite; Web-based XML editor; Electronic OJ with legal value; legilux.lu; User Tools; ELI Open Data; Treaties; Directives; Vocabulary; EUR-Lex; Metadata, Ontology, RDF, XML, HTML, etc.]

Page 7

Visualisation of the vocabulary

http://data.legilux.public.lu/vocabulaires/fr/

Re-use from OPUE

Page 8

Visualisation of the vocabulary

Wish to align with EuroVoc …

Page 9

Objective of alignment?

Toward a thematic cross-national search on legislation
• Industry can search for “pesticide” across national legislations
• Enrich N-Lex with thematic cross-national search
‒ http://eur-lex.europa.eu/n-lex

Allow searching national legislation using EuroVoc, or EU legislation using a national vocabulary
• Better research and access to European and national legislation

Potentially enrich national concepts with EuroVoc translations, notes, synonyms, etc.

Page 10

Objective of alignment?

Align EuroVoc thesaurus with thematic vocabularies used by Member States to annotate their national legislation

Build semantic interoperability based on thematic classification:
• between national legislations and EU legislation
• but also from Member State legislation to Member State legislation

[Diagram]
• Concept, EuroVoc: “résidu de pesticide”@fr, “pesticide residue”@en
• Concept, Luxembourg theme: “pesticide”@fr
• Concept, Member State theme: “pesticide residue”@en
• alignment 1, alignment 2
• alignment 3, inferred from alignment 1 and alignment 2
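The inference of alignment 3 from alignments 1 and 2 can be sketched as a join on the shared EuroVoc concept. This is a minimal illustration only: the identifiers (`lux:pesticide`, `ms:pesticide-residue`, `eurovoc:pesticide-residue`) and the function name `infer_links` are hypothetical placeholders, not real URIs or the tooling used in the project.

```python
from collections import defaultdict

# Hypothetical alignments: (national concept, EuroVoc concept) pairs.
# All identifiers are illustrative placeholders.
alignment_1 = [("lux:pesticide", "eurovoc:pesticide-residue")]
alignment_2 = [("ms:pesticide-residue", "eurovoc:pesticide-residue")]


def infer_links(left, right):
    """Pair concepts of two national vocabularies that share a EuroVoc concept."""
    by_pivot = defaultdict(list)
    for concept, pivot in right:
        by_pivot[pivot].append(concept)
    # Each result is (left concept, shared EuroVoc concept, right concept).
    return sorted(
        (l_concept, pivot, r_concept)
        for l_concept, pivot in left
        for r_concept in by_pivot.get(pivot, [])
    )


alignment_3 = infer_links(alignment_1, alignment_2)
print(alignment_3)
```

Links inferred this way are best treated as candidates for review: two concepts that are each only close to the same EuroVoc concept are not necessarily equivalent to each other.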

Page 11

Methodology

2 methods proposed for building alignments

Method 1: Lexical alignment between concepts
• Tools used: SILK for generating links between datasets and VocBench for uploading and managing the alignments

Method 2: Analyse transposition of EU directives in Luxembourg legislation
• Compare EuroVoc concepts used to index an EU directive with Legilux concepts used to index the corresponding transposition
• Combine this transposition analysis with a lexical analysis

Page 12

METHOD 1

Alignment between EuroVoc and LegiLux

Two methods to align

Page 13

Method 1: lexical alignment

SILK linking exercise: Legal subject theme / EuroVoc

Purpose: establish mappings between the labels of two datasets

Find pairs of exact or close matches between the lexical forms of the concepts in two vocabularies

http://silkframework.org/
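The idea of finding exact or close matches between lexical forms can be sketched with the Python standard library. This is not SILK and does not reproduce its linkage rules; the normalisation steps, the 0.85 threshold and the sample labels (including “pesticides”) are assumptions chosen for illustration.

```python
import difflib
import unicodedata


def normalise(label):
    """Lower-case, strip accents and collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", label)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(no_accents.lower().split())


def lexical_matches(source_labels, target_labels, threshold=0.85):
    """Return (source, target, kind, score) for exact or close label pairs."""
    results = []
    for s in source_labels:
        for t in target_labels:
            ns, nt = normalise(s), normalise(t)
            if ns == nt:
                results.append((s, t, "exact", 1.0))
                continue
            # Similarity ratio between the two normalised labels (0.0 to 1.0).
            score = difflib.SequenceMatcher(None, ns, nt).ratio()
            if score >= threshold:
                results.append((s, t, "close", round(score, 3)))
    return results


# Illustrative labels only, not an extract of either vocabulary.
legilux = ["Pesticide", "bruit"]
eurovoc = ["pesticide", "pesticides", "résidu de pesticide", "bruit"]
matches = lexical_matches(legilux, eurovoc)
for match in matches:
    print(match)
```

A purely lexical comparison of this kind misses pairs such as “pesticide” / “résidu de pesticide”, which is one motivation for complementing it with the transposition analysis of Method 2.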

Page 14

Statistics - EuroVoc

LegiLux                 EuroVoc    Exact matches
Treaty subject theme    EuroVoc    31%
International actor     EuroVoc    23%
Legal subject theme     EuroVoc    22%
Treaty type             EuroVoc    16%

Page 15

Management of vocabularies: VocBench

Open source, collaborative tool for managing multilingual controlled vocabularies

Semantic technologies

Funded by the ISA2 programme: https://ec.europa.eu/isa2/solutions/vocbench3_en

Supports import/export in various formats (OWL, SKOS, RDF, Excel)

Allows uploading and managing alignments
• Model: INRIA's Alignment API (same as OnaGUI)
• Validate results
• Re-export results
• Transform results into OWL/SKOS mapping triples and load them into the project's dataset

http://vocbench.uniroma2.it/

https://joinup.ec.europa.eu/solution/vocbench3

Page 16

Method 2 Alignment based on EU Directive Transposition Analysis

Page 17

Reuse existing classification work

EU texts are classified with EuroVoc

Luxembourg texts transposing EU texts are classified with the Luxembourg vocabulary

“Most” of the time, “close” concepts are used on both sides

Reuse this proximity of concepts to build an alignment based on usage

Page 18

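Approach: use transpositions

Commission Directive 2000/48/EC of 25 July 2000 (Directive 2000/48/CE de la Commission du 25 juillet 2000)
subject:
• produit pharmaceutique
• protection du consommateur
• produit phytosanitaire
• résidu de pesticide
• autorisation de vente

Grand-Ducal Regulation of 8 April 2000 (Règlement grand-ducal du 8 avril 2000)
subject:
• denrée alimentaire et produit usuel
• pesticide

[Diagram: the regulation “transposes” the directive; the EuroVoc concept “résidu de pesticide” is linked to the Lux theme “pesticide”]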

Page 19

Reuse existing classification work

Strengths of the method
• Only concepts really used to classify the law are aligned
• The alignment follows usages of legal professionals
• The alignment takes into account the difference of granularity between EuroVoc and the national vocabulary

Limits of the method
• The alignment is focused on the vocabulary used for legal domains covered by the EU. The alignment cannot be used for the civil domain or any legal domain not covered by EU legislation

Page 20

Focus alignment on concepts used to classify EU legislation and national transposition

EuroVoc (7159 concepts)
• 1956 concepts out of a total of 7159 EuroVoc concepts are used to index directives (27%)
• 3882 concepts out of a total of 7159 EuroVoc concepts are used to index directives or regulations

LegiLux thematic (1593 concepts)
• 347 concepts are used to index legislation that transposes a directive (22%)
• 876 concepts are used to index legislation that does not transpose a directive
• Total: 1223 of the 1593 concepts are used to index legislation in Luxembourg (370 concepts are not used)

Transposition analysis

Page 21

Transposition analysis example

We count the number of times a national concept used to index a transposition “co-occurs” with a EuroVoc concept used to index the transposed directive

In this example, the more meaningful alignments have a high score and the less meaningful ones a low score.

Lots of potential “relatedMatch” links

Score = number of co-occurrences / total number of transpositions indexed

National concept  EuroVoc concept                Co-occurrences  Total transpositions  Score
bruit             protection contre le bruit     7               7                     1.000
bruit             pollution acoustique           5               7                     0.714
bruit             bruit                          4               7                     0.571
bruit             programme d'action             3               7                     0.429
bruit             méthode d'évaluation           3               7                     0.429
bruit             diffusion de l'information     3               7                     0.429
bruit             accès à l'information          2               7                     0.286
bruit             niveau sonore                  2               7                     0.286
bruit             rapprochement des législations 2               7                     0.286
bruit             appareil électrodomestique     1               7                     0.143
bruit             machine électrique             1               7                     0.143
bruit             matériel de levage             1               7                     0.143
bruit             norme européenne               1               7                     0.143
bruit             harmonisation des normes       1               7                     0.143
bruit             matériel de construction       1               7                     0.143
bruit             aéroport                       1               7                     0.143
bruit             norme environnementale         1               7                     0.143
bruit             soins de santé                 1               7                     0.143
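The score in the table can be sketched as a simple co-occurrence count. The code below assumes that the “total number of transpositions” means the number of transpositions indexed with the national concept (7 for “bruit” in the table); the input structure, the function name and the two sample transpositions are hypothetical and much smaller than the real data.

```python
from collections import Counter


def cooccurrence_scores(transpositions):
    """Score (national, EuroVoc) concept pairs by co-occurrence.

    transpositions: list of (national_concepts, eurovoc_concepts), one entry
    per national text paired with the directive it transposes.
    Returns {(national, eurovoc): (co_occurrences, total, score)}.
    """
    totals = Counter()  # transpositions indexed with each national concept
    pairs = Counter()   # co-occurrences of each (national, EuroVoc) pair
    for national, eurovoc in transpositions:
        for n in set(national):
            totals[n] += 1
            for e in set(eurovoc):
                pairs[(n, e)] += 1
    return {
        (n, e): (count, totals[n], count / totals[n])
        for (n, e), count in pairs.items()
    }


# Hypothetical sample: two transpositions indexed with "bruit".
sample = [
    (["bruit"], ["protection contre le bruit", "pollution acoustique"]),
    (["bruit"], ["protection contre le bruit", "aéroport"]),
]
scores = cooccurrence_scores(sample)
for pair, values in sorted(scores.items(), key=lambda item: -item[1][2]):
    print(pair, values)
```

Pairs with a score close to 1 are natural candidates for a strong match, while low-scoring pairs are more likely to be related concepts at best, which is why a human validation step follows.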

Page 22

Transposition analysis: tool

Use of OnaGUI
1) to enrich the statistical alignment with a linguistic alignment
2) to perform human refinement and validation of the calculated alignments

Validate alignments expressed in INRIA EDOAL format (same as VocBench)

Ontology Alignment Graphical User Interface: https://github.com/lmazuel/onagui

Page 23

Validation in OnaGUI

Exact match
• Pesticide / pesticide

Specific match
• Polluant / polluant atmosphérique

Generic match
• Polluant / substance dangereuse

Related match
• Polluant / contrôle de la pollution

“No match”
• Personnel / …
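Once validated, such matches can be written out as SKOS mapping triples. The sketch below is an assumption-laden illustration: the correspondence between the validation labels and SKOS properties (specific to `skos:narrowMatch`, generic to `skos:broadMatch`) is one plausible reading, the `example.org` URIs are placeholders, and neither OnaGUI nor VocBench is involved.

```python
SKOS = "http://www.w3.org/2004/02/skos/core#"

# Assumed correspondence between validation labels and SKOS mapping properties.
RELATION_TO_SKOS = {
    "exact": "exactMatch",
    "specific": "narrowMatch",  # the target concept is more specific
    "generic": "broadMatch",    # the target concept is more generic
    "related": "relatedMatch",
}


def to_ntriples(validated):
    """Turn (source URI, relation, target URI) tuples into N-Triples lines."""
    lines = []
    for source, relation, target in validated:
        prop = RELATION_TO_SKOS.get(relation)
        if prop is None:  # "no match" (or unknown label): emit nothing
            continue
        lines.append(f"<{source}> <{SKOS}{prop}> <{target}> .")
    return lines


# Placeholder URIs, not the real LegiLux or EuroVoc identifiers.
LUX = "http://example.org/legilux/"
EV = "http://example.org/eurovoc/"
validated = [
    (LUX + "pesticide", "exact", EV + "pesticide"),
    (LUX + "polluant", "specific", EV + "polluant-atmospherique"),
    (LUX + "polluant", "generic", EV + "substance-dangereuse"),
    (LUX + "polluant", "related", EV + "controle-de-la-pollution"),
    (LUX + "personnel", "no match", ""),
]
triples = to_ntriples(validated)
print("\n".join(triples))
```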

Page 24

Conclusion

An alignment based on transpositions, consolidated with lexical proximity
• 189 exact matches, 586 other links (related, narrow, broad or close relationships)
• 76% of concepts considered are aligned, 54% with exact match

Allows producing a rich alignment: not only exact matches, but also many related / broad / narrow matches

Alignment made possible with:
• Vocabularies available as structured data (SKOS)
• SPARQL access to the indexing of legislation
• Collaboration

Potential future steps:
• Align with other controlled vocabularies as well
• Test alignments on a document retrieval use case