Setting up a CLARIN centre
Dieter Van Uytvanck
CLARIN workshop for newcomers
22 January 2020
Utrecht
Overview
• The CLARIN technical infrastructure from a bird’s eye view
• Infrastructure pillars:
  - Repositories
  - Metadata & VLO
  - Tools & LR Switchboard
  - Federated Login
  - Federated Content Search
• CLARIN centres:
  - Types of centres
  - B-centre assessment
    • Requirements
    • Core Trust Seal
    • Procedure
The CLARIN technical infrastructure from a bird’s eye view
Architecture: Repositories
Repository at a CLARIN centre:
• Language Data
  - single text or recording
  - corpus
  - lexicon
  - wordnet
  - grammar
  - …
• Metadata (describes the language data and tools)
• Language Tools
  - web application
  - web service
  - web service pipeline
  - stand-alone application
  - …
Architecture: Harvesting
[Diagram: each centre's repository holds language data, metadata and language tools; the metadata of every repository is copied into a central pool of harvested metadata.]
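Harvesting of this kind is commonly done over OAI-PMH, the protocol this deck names further on as the only hard requirement for delivering CMDI. The following is a minimal sketch of how a harvester might build its requests. The endpoint `https://repository.example.org/oai` is hypothetical, and the metadata prefix `cmdi` is an assumption about how a centre labels its CMDI records; only the `verb`, `metadataPrefix`, `set` and `resumptionToken` parameters come from the OAI-PMH protocol itself.

```python
from urllib.parse import urlencode


def list_records_url(base_url, metadata_prefix="cmdi", set_spec=None):
    """Build the first ListRecords request for an OAI-PMH endpoint.

    metadata_prefix="cmdi" is an assumption; a harvester would normally
    check the prefixes an endpoint advertises via ListMetadataFormats.
    """
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec is not None:
        params["set"] = set_spec
    return f"{base_url}?{urlencode(params)}"


def resume_url(base_url, token):
    """Build the follow-up request for the next page of records.

    With a resumptionToken, OAI-PMH expects only the verb and the token.
    """
    params = {"verb": "ListRecords", "resumptionToken": token}
    return f"{base_url}?{urlencode(params)}"
```

The sketch only builds URLs and performs no network access, so it can be tried without a running repository.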
Architecture: Processing
Architecture: Federated Content Search
[Diagram: a central (Federated) Content Search sends queries to the repositories of several centres.]
(1) enter query
(2) perform local search
(3) retrieve results
(4) show aggregated results
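The four steps of federated content search can be illustrated with a small fan-out sketch. It is not the CLARIN FCS protocol: real endpoints are remote services, whereas here each centre is assumed to be a plain Python callable that takes a query and returns a list of hits. The function name `federated_search` and the result layout are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def federated_search(query, endpoints):
    """Send one query to every endpoint and aggregate the hits.

    endpoints: dict mapping a centre name to a callable search(query)
    that returns a list of hits (a stand-in for a remote endpoint).
    """
    def search_one(item):
        name, search = item
        try:
            # Steps (2) and (3): the centre searches locally, we collect the results.
            return name, search(query)
        except Exception:
            # A failing centre should not break the aggregated answer.
            return name, []

    # Step (1) has happened already: the user's query is passed in.
    with ThreadPoolExecutor() as pool:
        per_centre = list(pool.map(search_one, endpoints.items()))

    # Step (4): one flat, aggregated result list, labelled by centre.
    return [
        {"centre": name, "hit": hit}
        for name, hits in per_centre
        for hit in hits
    ]
```

Querying the centres in parallel keeps the overall response time close to that of the slowest centre rather than the sum of all of them.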
Summing up
• Basic principles:
  - compatibility on protocol and format level
  - acknowledging the unique situation of each centre (history, organization, technology, etc.)
  - focus on the strengths that a centre can contribute
• No obligation to use specific software stacks
  - Of course it can save resources to reuse existing solutions
• Striking the right balance between do-it-yourself and reuse is one of the most important steps in the process of becoming a CLARIN centre
Infrastructure pillars
Persistent Identifiers (PIDs)
Component Metadata (CMDI)
Interview Profile
• Actor Component
  - First name: text
  - Last name: text
  - Birth Date: date
  - Role: interviewer | interviewee
• Sound recording Component
  - Format: wave | mp3
  - Length: number
• General information Component
  - Title: text
  - Creation Date: date
Concept Registry
• definition of: Title, Creation Date, Format, Length, First name, Last name, Birth date, Role, wave, mp3, interviewer, interviewee, …
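The component idea behind the interview profile can be mirrored in a few lines of code: a profile is assembled from reusable components, and every field should have a definition in a concept registry. This is only an illustrative sketch. The class names, the snake_case field names and the helper `undefined_concepts` are hypothetical, and the registry is reduced to a set of names rather than real concept links.

```python
from dataclasses import dataclass, fields
from datetime import date
from enum import Enum


class Role(Enum):
    INTERVIEWER = "interviewer"
    INTERVIEWEE = "interviewee"


class AudioFormat(Enum):
    WAVE = "wave"
    MP3 = "mp3"


@dataclass
class Actor:  # reusable component
    first_name: str
    last_name: str
    birth_date: date
    role: Role


@dataclass
class SoundRecording:  # reusable component
    format: AudioFormat
    length: float


@dataclass
class GeneralInformation:  # reusable component
    title: str
    creation_date: date


@dataclass
class InterviewProfile:  # a profile combines components
    general: GeneralInformation
    recording: SoundRecording
    actors: list[Actor]


# Stand-in for the concept registry: the names that have a definition.
CONCEPT_REGISTRY = {
    "title", "creation_date", "format", "length",
    "first_name", "last_name", "birth_date", "role",
}


def undefined_concepts(component_cls, registry):
    """Return the field names of a component that the registry does not define."""
    return [f.name for f in fields(component_cls) if f.name not in registry]
```

Because the components are separate classes, another profile could reuse `Actor` or `GeneralInformation` unchanged, which is the kind of re-use the slide describes.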
CMDI: Basic ideas behind it
• Allowing for the flexibility needed: many different subcommunities have their own wishes to provide detailed metadata descriptions
• Stimulating re-use: most providers should be able to re-use an existing profile
• Provide a standard way to
  - Refer to
    • digital objects (in the fixed header)
    • landing pages
    • search pages
    • search services
  - Express hierarchies in metadata files
CMDI: What we currently have in place
• 179 profiles, 1252 components at https://clarin.eu/componentregistry
• Over 1900 concepts registered at https://clarin.eu/ccr
• Over 22 CLARIN centres providing native CMDI metadata
• Important conversion workflows in place:
  - Europeana Data Model (121,000 records)
  - OLAC & Dublin Core
  - MODS
  - TEI headers
  - under investigation & in preparation:
    • DDI (social sciences)
    • DataCite metadata
• Catalogue of harvested metadata: https://vlo.clarin.eu
CMDI: Must reads & myths
• CMDI best practice guide: https://www.clarin.eu/content/cmdi-best-practices-guide
• Metadata in CLARIN – the FAQ: https://www.clarin.eu/faq-page/267
• Myth: CMDI must be used as backend format in my repository.
  - Incorrect! The only requirement is to deliver it via OAI-PMH.
• Myth: CMDI records must be created manually with an editor.
  - Incorrect!
    • A simple and automatic conversion from your existing metadata format or database can be sufficient.
    • For large amounts of metadata, manual editors are inconvenient.
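An automatic conversion of the kind mentioned above can be very small. The sketch below turns one row from an existing database into an XML record. It is deliberately not a valid CMDI record: the element names are hypothetical and simply mirror the interview profile example, the input keys (`title`, `created`, `format`, `length_seconds`) are assumed, and a real record would follow the schema generated for a profile in the component registry.

```python
import xml.etree.ElementTree as ET


def row_to_xml(row):
    """Convert one database row (a dict) into an XML metadata record.

    Element names are illustrative only, not taken from a CMDI schema.
    """
    root = ET.Element("InterviewRecord")

    general = ET.SubElement(root, "GeneralInformation")
    ET.SubElement(general, "Title").text = row["title"]
    ET.SubElement(general, "CreationDate").text = row["created"]

    recording = ET.SubElement(root, "SoundRecording")
    ET.SubElement(recording, "Format").text = row["format"]
    ET.SubElement(recording, "Length").text = str(row["length_seconds"])

    return ET.tostring(root, encoding="unicode")


def rows_to_xml(rows):
    """Convert many rows in one go, which is where manual editing stops scaling."""
    return [row_to_xml(row) for row in rows]
```

Running such a script over the whole database whenever the source data changes keeps the delivered records in step with the repository's own backend format.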
Repositories
• https://www.clarin.eu/content/repositories
• Off-the-shelf:
  - LINDAT-DSpace
  - Dataverse (for a C-centre, not fully B-centre ready)
    • But it is open source…
• Half-products
  - META-share repository + CLARIN-specific adjustments
  - Islandora (Fedora Commons + Drupal)
    • As used at HZSK and FLAT (TLA, Meertens-HuC)
• From scratch
  - Based on Fedora Commons
  - Based on your existing home-built repository
Repositories: learn more
• https://www.clarin.eu/event/2017/clarin-plus-workshop-facilitating-creation-national-consortia-repositories
• https://www.clarin.eu/event/2016/clarin-workshop-dspace-digital-repository
• Lindat-DSpace tutorial:
  - https://www.youtube.com/playlist?list=PLlKmS5dTMgw3lJ4TffnBJOhprLd_-20jd
Tools & LR Switchboard
[Diagram: the Language Resource Switchboard connects language data with language tools.]
Applications that can hand language data to the Switchboard:
• CLARIN gateway applications: Virtual Language Observatory (Virtual Collection Registry, …)
• EUDAT B2DROP [cloud storage]
• Parthenos D4Science [Virtual Research Environment]
• …
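The role of the Switchboard as a broker between data and tools can be pictured with a toy matcher. The slides do not say how matching works, so the criteria used here (media type and language) are an assumption, and the `Tool` class, the tool names and `matching_tools` are hypothetical rather than part of the Switchboard's interface.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tool:
    name: str
    media_types: frozenset  # media types the tool accepts (assumed criterion)
    languages: frozenset    # language codes the tool supports (assumed criterion)


def matching_tools(tools, media_type, language):
    """Return the names of all tools that accept the given resource."""
    return [
        tool.name
        for tool in tools
        if media_type in tool.media_types and language in tool.languages
    ]


# Hypothetical registry entries, for illustration only.
EXAMPLE_TOOLS = [
    Tool("tokenizer", frozenset({"text/plain"}), frozenset({"en", "nl"})),
    Tool("parser", frozenset({"text/plain"}), frozenset({"nl"})),
    Tool("aligner", frozenset({"audio/wav"}), frozenset({"en"})),
]
```

With such a registry, a gateway application only needs to pass on the resource and its properties; it does not need to know any tool in advance.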
Tools & LR Switchboard: learn more
• Connecting a web application:
  - https://switchboard.clarin.eu/help
• Use cases
  - https://office.clarin.eu/v/CE-2018-1196-language_resource_switchboard_use_cases.pdf
• Demonstration case:
  - https://www.clarin.eu/showcase/eosc-portal-demonstration
Federated Login
• Reading material:
  - Background (good starting point)
    • History and functioning of the Service Provider Federation: https://office.clarin.eu/v/CE-2017-1014-CLARINPLUS-D2_7.pdf
  - Overview (e.g. to get the paperwork running):
    • https://www.clarin.eu/spf
  - Technical instructions on creating and integrating a Service Provider:
    • https://www.clarin.eu/content/creating-and-testing-shibboleth-sp
Federated Content Search
• Specifications and background information:
  - https://www.clarin.eu/content/federated-content-search-clarin-fcs
  - https://trac.clarin.eu/wiki/FCS
• Endpoint libraries on the way:
  - Korp
  - Kontext
  - NoSketchEngine
• Possible extensions:
  - Some first notes on FCS for treebank searches:
    • https://www.clarin.eu/blog/blog-post-jan-niestadt-mini-workshop-korp-strix-and-blacklab-gothenburg
  - Some first ideas about extending FCS to lexical searches
CLARIN centres
Centres: an introduction
• B-centres (Service Providing Centres aka Certified Centres)
• C-centres (Metadata Providing Centres: their metadata are integrated with CLARIN, but they need not offer any further services)
• K-centres (Knowledge Centres, part of the CLARIN Knowledge Sharing Infrastructure)
• E-centres (External Centres offering central services without being part of any national consortium)
B-centres
• Assessment procedure description:
  - https://www.clarin.eu/node/3767
Centre registry
• Accessing information:
  - https://centres.clarin.eu/
• Adding information:
  - https://www.clarin.eu/content/clarin-centres > Register a new centre
  - Adding an entry requires a CLARIN account
    • https://user.clarin.eu
Checklist
Misc links, developer resources
• clarin.eu/dev
  - Trac, especially Infrastructure Overview
• Slack: request access via [email protected]
• Mailing lists: all-centers
• Newsflash
• Matomo
• https://www.clarin.eu/applications
Thank you for your attention!
More information:
• www.clarin.eu
Feel free to contact
• our support addresses at https://www.clarin.eu/content/support
• me via [email protected]