Digital Author Identification, UKSG, 17 – 18 April 2007, Daniel van Spanje


Digital Author Identification

UKSG, 17 – 18 April 2007

Daniel van Spanje

2

DAI in DARE

• DARE: Digital Academic REpositories

– Universities + KNAW + NWO + KB
– Infrastructure for linking the IR
– Stimulate production of digital scientific output
– 2003 – 2006

• 2007 – 2010: SURFshare

3

Main issues in DAI

• Unique identifying number for researchers / authors

• National scale

• Benefits:
– Improve searching for electronic publications
– Integrate searching for electronic and non-electronic publications
– Link Library (Catalogue) and research environment (Metis)

4

Two projects

• Pilot in 2005 – 2006
– one university: Groningen

• Roll-out 2006 – 2007
– 13 academic research organizations

• Project leader: Anneloes Degenaar

• DAI website at University of Groningen:
– http://dai.weblog.ub.rug.nl/
– http://dai-uitrol.ub.rug.nl/

5

Organizations involved in DAI

• 13 universities + CWI + KNAW

• SURF

• UCI

• OCLC PICA

6

Systems involved

• Institutional repository / DAREnet

• Metis

• Dutch Union Catalogue (NCC/PiCarta)

7

Institutional Repositories / DAREnet

8

Institutional Repositories / DAREnet

9

METIS

10

METIS

11

METIS

12

National Union Catalogue

13

Shared Cataloguing System (GGC)

14

Shared Cataloguing System (GGC)

15

Names and other issues

• Authors with the same name
• Use of one or more initials
• Changing names
• Spelling variants
• Diacritics
• Pseudonyms
• Name in religion
• Nicknames
• Collective names
• Different structure of names in other languages and cultures
• …

• Discussions on standardization and unification started in the Netherlands in the Orion project (2003-2004)
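
Several of the issues listed above (diacritics, spelling variants, one or more initials) can be illustrated with a small comparison key. The following is a minimal sketch, not the matching logic used in the DAI project: the function names `strip_diacritics` and `match_key` are hypothetical, and the choice to compare only surname, prefix and first initial is an assumption made for illustration.

```python
import unicodedata


def strip_diacritics(text: str) -> str:
    """Remove combining marks, so that 'Müller' and 'Muller' compare equal."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def match_key(surname: str, initials: str = "", prefix: str = "") -> tuple[str, str, str]:
    """Build a coarse key (surname, prefix, first initial) for finding candidates.

    Assumption: only the first initial is kept, because sources differ in
    how many initials they record for the same person.
    """
    clean_surname = strip_diacritics(surname).casefold().strip()
    clean_prefix = strip_diacritics(prefix).casefold().strip()
    letters = [ch for ch in strip_diacritics(initials).casefold() if ch.isalpha()]
    first_initial = letters[0] if letters else ""
    return (clean_surname, clean_prefix, first_initial)
```

Two records with the same key are only candidates for being the same person. Different authors with the same name produce identical keys, which is why a unique identifying number is needed in addition to the name.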

16

Proposed solution

• Need established

• “External” requirements:
– use existing mechanisms
– local management
– national function

• Solution: use “collocation” mechanism of libraries and Metis as source

17

Cataloguing and Metis

[Diagram with the components Cataloguing, Metis, GGC, Repository and NTA]

18

Use authority records (NTA) in Metis

[Diagram with the components Metis, GGC, Repository, NTA, Cataloguing and CWI]

19

How did we link

• Mechanisms
– Initial load per organization
– Online input buttons (web templates)
– XML output
– Synchronization mechanisms

• Requirements
– No overwrite of library data!
– Deduplication (matching/merging)

20

Datamodel developed

• Datamodel copied from bibliographic model: three levels

• Metis name-information added to library data; no overwrite

• Affiliations and other fields added
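
The three-level model with the "no overwrite" rule can be sketched as nested record types. This is a minimal illustration under stated assumptions, not the actual GGC/NTA record format: all class and attribute names are hypothetical, and the DAI value used in any example is a placeholder.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Affiliation:
    """Third level: one period of employment at an organisation."""
    organisation_code: str
    start_date: str
    end_date: Optional[str] = None


@dataclass
class MetisEntry:
    """Second level: name information supplied by one organisation's Metis."""
    organisation: str
    metis_name: str
    affiliations: list[Affiliation] = field(default_factory=list)


@dataclass
class AuthorityRecord:
    """First level: the library's authority data, plus attached Metis entries."""
    dai: str
    library_name: str
    variant_names: tuple[str, ...] = ()
    metis_entries: list[MetisEntry] = field(default_factory=list)

    def add_metis_entry(self, entry: MetisEntry) -> None:
        # Metis data is stored next to the library data and never replaces it.
        self.metis_entries.append(entry)
```

Keeping the Metis name in its own entry, rather than writing it into `library_name`, is one way to satisfy the requirement that library data is never overwritten.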

21

Structure of bibliographic data

[Diagram: three levels of a bibliographic record]

• General level: bibliographic metadata (YoP / LoP / Title / Author / Imprint / LCSH / DDC), with a linked authority record
• Local level: data per library (Groningen bibdat: subject headings; Amsterdam bibdat: subject headings)
• Copy level: location, holding, shelf number (one block per copy)

22

Structure of authority data

[Diagram: three levels of an authority record]

• Library record: thesaurus record (name of author, variant names), with a linked authority record
• Metis level: data per organization (Groningen data: Metis name; Amsterdam data: Metis name)
• Affiliation level: begin and end (one block per affiliation)

23

Example authority record + added fields

Library data

Metis Researcher Name

Affiliation data

24

Example authority record + added fields

25

Datamodel: fields

Authority file

• Nationality
• Language
• Name (best known)
• Name (most complete)
• Maiden name
• Name variants
• Date of birth
• Date of death
• Profession / subject
• Link to pseudonyms
• Notes
• Entry date
• Update date

• Note: proper name field includes subfields for first name, middle name, last name, prefix, suffix

Added fields

• Local researcher number
• Metis name (preferred)
• Metis name
• Sex

• Code organisation
• Name organisation
• Start date employment
• End date employment
• Code function
• Description of function
• Code of employment
• Notes
• Entry date
• Update date

26

Initial load

Metis makes list of names

Manual dedup of list

Dedup in Metis

Make Metis export

Format conversion

Load B-records (? Duplicates?)

Export DAI’s to Metis

Manual dedup by library staff

Load DAI in Metis

Load new names (not found)

Merge names with names found

Match names with auth file

27

Initial load

• Data enrichment in Metis
• Export from Metis
• Conversion to cataloguing system
• Matching
• Merging: merge / new / B-record

• Results depend on the quality of the metadata:
– 95% automatic / 5% manual
– 70% automatic / 30% manual
– 50% automatic / 50% manual
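
The matching step and the automatic/manual split can be sketched as follows. This is a simplified illustration, not the project's matching procedure: names are compared by exact, case-insensitive equality only, and the `review` bucket for ambiguous matches is an assumption. The slides do not define what a B-record is, so the sketch does not try to model it. The function names `classify` and `automatic_share` are hypothetical.

```python
from collections import defaultdict


def classify(metis_names: list[str], authority_names: dict[str, str]) -> dict[str, list]:
    """Sort Metis names into 'merge', 'new' and 'review' buckets.

    authority_names maps an authority record id to the name in that record.
    """
    index = defaultdict(list)
    for record_id, name in authority_names.items():
        index[name.casefold()].append(record_id)

    outcome = {"merge": [], "new": [], "review": []}
    for name in metis_names:
        hits = index.get(name.casefold(), [])
        if len(hits) == 1:
            outcome["merge"].append((name, hits[0]))   # one match: merge automatically
        elif not hits:
            outcome["new"].append(name)                # no match: load as new name
        else:
            outcome["review"].append((name, hits))     # ambiguous: manual dedup
    return outcome


def automatic_share(outcome: dict[str, list]) -> float:
    """Fraction of names handled without manual work."""
    total = sum(len(bucket) for bucket in outcome.values())
    automatic = len(outcome["merge"]) + len(outcome["new"])
    return automatic / total if total else 0.0
```

With cleaner source metadata fewer names end up in the ambiguous bucket, which is consistent with the range of automatic/manual ratios reported above.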

28

Online process

• DAI-button in Metis to create DAI-number

• Export DAI-button in NTA / cataloguing system to Metis

• DAI-button in IR to create DAI-number

• Separate DAI-http-request for online input

• Online input via current cataloguing tool

• + Offline synchronization mechanisms between Metis and NTA

29

DAI-button in Metis

30

URL link instead of button

• http://www.pica.nl/dai/dai_redirect.php?action=maak_dai&user=<usernumber>&metis_export_url=http://oras.service.rug.nl:1111/metisdad&p_onderzoekernummer=00033&p_naam_medewerker=Rotteveel&p_voorletter=R&p_voorvoegsel=&p_titulatuur=&p_voorkeur=J&p_geslacht=M&p_geboortedatum=01-07-1974&p_code_functie=20&p_functie=Universitair%20hoofddocent&p_code_organisatie=22020200&p_organisatie_a=Medical%20Microbiology&p_begin_aanstelling=01-01-2005&p_einde_aanstelling=01-01-2006
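
A link of this shape can be assembled from its query parameters. The sketch below only builds the URL string and does not send a request. The base address and parameter names are taken from the example above; the function name `build_dai_request` is hypothetical, and whether the service accepts percent-encoded values other than those shown is an assumption.

```python
from urllib.parse import quote, urlencode

BASE_URL = "http://www.pica.nl/dai/dai_redirect.php"


def build_dai_request(user: str, metis_export_url: str, researcher: dict[str, str]) -> str:
    """Assemble a 'maak_dai' (create DAI) link from researcher fields.

    researcher holds the p_* fields, for example p_onderzoekernummer
    (researcher number) or p_naam_medewerker (employee name).
    """
    params = {
        "action": "maak_dai",
        "user": user,
        "metis_export_url": metis_export_url,
    }
    params.update(researcher)
    # quote_via=quote encodes spaces as %20, as in the example link;
    # safe=":/" leaves the embedded export URL readable.
    return BASE_URL + "?" + urlencode(params, safe=":/", quote_via=quote)
```

Empty fields such as `p_voorvoegsel` (name prefix) are kept in the query string with an empty value, matching the example link.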

31

Input form for Metis fields

32

Results of the DAI project

• Now:
– 50% of the researchers have a DAI
– Procedure for initial load in place
– Start with online procedure
– Privacy statement

• Autumn 2007
– Online procedure in place
– Procedure for synchronization in place
– 100% of the researchers will have a DAI in 2007 (ca. 40,000)

33

Things to do

• Finalize the roll-out, develop services (passport …) and set up a user group

• Add DAI to metadata standards (DCX, MODS)

• International standardisation: ISPI

• Involve authors in checking and updating

34

Concluding remark

35

• Thanks