IWIR-CRIS '06

Data retrieval in PURE

Data retrieval in the 4-year old PURE CRIS project at 9 universities

atira
Niels Jernes Vej 10
DK-9220 Aalborg
+45 9635 6100
www.atira.dk

Agenda

■ Overview
■ Retrieval
  Validated manual data gathering
  Dynamic integration to local back-end systems
  Aggregation, enrichment and import of historic data
  Experiments with automated imports of historic data
■ Exposure
  Two web services
  OAI
  Z39.50
  Reports
  Portal framework
■ Archiving
■ Near future

Overview

■ Brief overview
■ … in order to discuss ingestion, integration, conversion and import in a specific context

Overview

■ Brief overview
■ History
  Development began in 2002
■ Users
  9 universities (DK+SE), several hospitals + other research institutions
■ Platform and architecture
  J2EE enterprise application
  Release management: all users have instances of the same release version, same code-base
■ Business model
  Commercial software licenses, powerful user group, shared budgets
■ Modular
  Basic module, Reporting module, Student thesis module, External publications module, Bibliometrics module, Press module

Retrieval

■ Manual data gathering
■ User roles/rights + workflow:
  = de-centralized data gathering
  = validated data gathering
  = continuous data gathering
■ GUI example
■ Management focus is necessary
  Reports and statistics, KPI-management, etc.
■ Adding value to researchers is necessary
  Instantly in Google indexes, instantly updated personal websites, instantly updated CV, increased citations (source in paper), etc.

Retrieval

■ Dynamic integration
■ Dynamic integration to local back-end systems:
  Personnel systems, payroll systems (for data retrieval)
  LDAPs, Active Directories (for data retrieval + authentication)
  Single sign-on systems (for authentication)
  … to automatically create object types such as “person” or “organization”
■ … and yes, PURE hosts data, too
  We need complete objects according to the meta-data model
■ Plug-in architecture in PURE:
  Pro = individually adapted integration
  Con = individually programmed plug-in necessary
  Future = GUI, standardized plug-ins
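
As a rough illustration of the plug-in idea above, the sketch below shows how one institution-specific plug-in might map records from a back-end personnel system onto "person" objects. It is not PURE's actual API: the names `Person`, `SourcePlugin`, `PersonnelCsvPlugin` and `synchronize`, as well as the CSV column names, are hypothetical and chosen only for this example.

```python
from __future__ import annotations

import csv
import io
from dataclasses import dataclass
from typing import Iterable, Protocol


@dataclass(frozen=True)
class Person:
    """Hypothetical target object; the real meta-data model is richer."""
    source_id: str
    name: str
    organization: str


class SourcePlugin(Protocol):
    """Hypothetical plug-in contract: one implementation per back-end system."""

    def fetch_persons(self) -> Iterable[Person]: ...


class PersonnelCsvPlugin:
    """Example plug-in for an imaginary personnel system that exports CSV."""

    def __init__(self, csv_text: str) -> None:
        self._csv_text = csv_text

    def fetch_persons(self) -> Iterable[Person]:
        # Column names (emp_no, first, last, dept) are assumptions.
        for row in csv.DictReader(io.StringIO(self._csv_text)):
            yield Person(
                source_id=row["emp_no"].strip(),
                name=f'{row["first"].strip()} {row["last"].strip()}',
                organization=row["dept"].strip(),
            )


def synchronize(plugin: SourcePlugin) -> dict[str, Person]:
    """Collect persons keyed by source id, standing in for object creation."""
    return {person.source_id: person for person in plugin.fetch_persons()}
```

The "Con" on the slide is visible here: every back-end system needs its own `fetch_persons` implementation, while the calling side stays the same.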

Retrieval

■ Import
■ Historic data
■ Many sources
  More or less useful data
  More or less consistent use of formats :-)
■ The PXA format
  PURE XML Archive format - .zip based
  Meta-data, relations between entities, binary files
■ Aggregation > enrichment > conversion > import
  The process is external to PURE
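
The slide states only that PXA is a .zip-based archive carrying meta-data, relations between entities and binary files; its internal layout is not described. The following sketch shows the general idea of such a container. The file name `metadata.xml`, the `files/` directory, the element names and the function `build_archive` are all assumptions for illustration, not the real PXA structure.

```python
import io
import xml.etree.ElementTree as ET
import zipfile


def build_archive(records, relations, binaries):
    """Pack meta-data, relations and binary files into one in-memory .zip.

    records:   dict mapping record id -> title
    relations: iterable of (source_id, target_id) pairs
    binaries:  dict mapping file name -> bytes
    """
    root = ET.Element("archive")
    for record_id, title in records.items():
        record = ET.SubElement(root, "record", id=record_id)
        ET.SubElement(record, "title").text = title
    for source_id, target_id in relations:
        ET.SubElement(root, "relation", source=source_id, target=target_id)

    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        # One XML document for meta-data and relations (assumed layout).
        archive.writestr("metadata.xml", ET.tostring(root, encoding="utf-8"))
        # Binary files stored next to it under files/ (assumed layout).
        for name, payload in binaries.items():
            archive.writestr(f"files/{name}", payload)
    return buffer.getvalue()
```

Keeping meta-data, relations and binaries in one file means the conversion step, which runs outside PURE, can hand over a single self-contained unit for import.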

Retrieval

■ Experiments
■ Experiments with automated imports of historic data from specific, identified sources
■ [source format] > PXA conversion > import > enrichment/validation
■ Very poor data quality demands the concept of “draft objects” in PURE
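
One way to picture the "draft object" concept is a validation step that never rejects an imported record, but marks it as a draft with a list of problems to be fixed during later enrichment and validation. The sketch below assumes a minimal set of required fields; `REQUIRED_FIELDS`, `ImportedObject` and `classify` are hypothetical names, and the real validation rules in PURE are not described in the slides.

```python
from dataclasses import dataclass, field

# Assumed minimum for a valid object; the real meta-data model will differ.
REQUIRED_FIELDS = ("title", "year", "authors")


@dataclass
class ImportedObject:
    data: dict
    status: str = "valid"  # either "valid" or "draft"
    problems: list = field(default_factory=list)


def classify(record: dict) -> ImportedObject:
    """Return a valid object, or a draft listing what must be fixed manually."""
    problems = [f"missing {name}" for name in REQUIRED_FIELDS if not record.get(name)]
    year = record.get("year")
    if year and not str(year).isdigit():
        problems.append("year is not numeric")
    status = "draft" if problems else "valid"
    return ImportedObject(data=record, status=status, problems=problems)
```

For example, `classify({"title": "T", "year": "ca. 2004"})` yields a draft because the authors are missing and the year is not numeric, while the record itself is kept for manual repair.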

Exposure

■ Web services
■ RPC/encoded + document/literal
■ Rich libraries of methods
  ■ Including format-specific methods: APA, MLA, HARVARD, VANCOUVER and CBE
■ Free and near-instant adding of methods
■ WS code example (if time)

Exposure

■ OAI support
■ OAI-PMH data provider
■ OAI-PMH formats
  ■ DC
  ■ DDF-MXD (Danish national format)
  ■ SVEP (Swedish national format)
  … more to come
■ Also used to harvest other PURE-repositories for “external publications”
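
To make the harvesting side concrete, here is a minimal sketch of an OAI-PMH `ListRecords` client using only the Python standard library. The verb, the `metadataPrefix` and `resumptionToken` arguments, the `oai_dc` prefix and the XML namespaces come from the OAI-PMH 2.0 specification; the base URL in the usage note is a placeholder, and the prefixes a given PURE instance uses for DDF-MXD or SVEP are not stated in the slides (they would have to be looked up with the `ListMetadataFormats` verb).

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url, metadata_prefix="oai_dc", resumption_token=None):
    """Build a ListRecords request URL for an OAI-PMH data provider."""
    if resumption_token:
        # In OAI-PMH, resumptionToken is an exclusive argument.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return f"{base_url}?{urlencode(params)}"


def parse_titles(response_xml):
    """Return (titles, resumption_token) from a ListRecords response."""
    root = ET.fromstring(response_xml)
    titles = [element.text for element in root.iterfind(".//dc:title", NS)]
    token_element = root.find(".//oai:resumptionToken", NS)
    has_token = token_element is not None and token_element.text
    return titles, (token_element.text if has_token else None)
```

A harvester would fetch `list_records_url("http://pure.example.org/oai")` (placeholder address), pass the response body to `parse_titles`, and repeat with the returned token until it is `None`.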

Exposure

■ Z39.50
■ Enabling of searches in PURE from library systems
■ SRW/SRU
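
For the SRU variant, a search from a library system is an HTTP GET with a CQL query. The sketch below builds such a request; the parameter names (`operation`, `version`, `query`, `startRecord`, `maximumRecords`) follow the SRU 1.1 specification, while the endpoint address and the assumption that the server supports the `dc.title` index are placeholders, since the slides do not describe PURE's SRU configuration.

```python
from urllib.parse import urlencode


def sru_search_url(base_url, cql_query, max_records=10, start_record=1):
    """Build an SRU 1.1 searchRetrieve URL for the given CQL query."""
    params = {
        "operation": "searchRetrieve",
        "version": "1.1",
        "query": cql_query,
        "startRecord": start_record,
        "maximumRecords": max_records,
    }
    # urlencode percent-encodes the CQL query so it is safe in a URL.
    return f"{base_url}?{urlencode(params)}"
```

A call such as `sru_search_url("http://pure.example.org/sru", 'dc.title = "research"')` produces a URL that any HTTP client can fetch; the response is an XML document defined by the SRU specification.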

Exposure

■ Reports
■ PURE reporting module
■ GUI example

Exposure

■ Reference manager
■ Export of data to local Reference Manager installation
■ Using RM-formatted export file
■ Promotes registering to the repository rather than in RM
■ GUI example
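
The slide does not say which file format the export uses. Assuming it is the tagged RIS format that Reference Manager imports, an export routine could look like the sketch below. The function `to_ris` and the dictionary keys are hypothetical, and only a handful of RIS tags (`TY`, `AU`, `TI`, `PY`, `ER`) are covered.

```python
def to_ris(record):
    """Serialize one publication record as a RIS entry.

    Each line is a two-letter tag, two spaces, a hyphen and a space,
    followed by the value; the entry is closed by an ER tag.
    """
    lines = [f"TY  - {record.get('type', 'JOUR')}"]
    for author in record.get("authors", []):
        lines.append(f"AU  - {author}")
    if record.get("title"):
        lines.append(f"TI  - {record['title']}")
    if record.get("year"):
        lines.append(f"PY  - {record['year']}")
    lines.append("ER  - ")
    # RIS files conventionally use CRLF line endings.
    return "\r\n".join(lines) + "\r\n"
```

Writing the result of `to_ris` for each selected publication into one `.ris` file gives a file that a local reference manager can import, so the repository remains the place where data is registered.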

Exposure

■ Portal framework
■ PUREportal – free PURE-specific framework for custom development of research exhibition portals
■ Online example
■ Typical cost scenario € 20,000
■ Typical delivery time 1 month
■ Little need for requirements specification
■ Automatic PURE-API maintenance

Archiving

■ Data archiving – 2 levels
■ SQL environment
  ■ Meta-data and relations
  ■ Binary files just stored in server file system
■ FEDORA via connector (not PURE-specific, Open Source)
  ■ Facilitates:
    Higher quality archival of binary files
    Long term preservation in general
    Adoption of PURE in institutions’ general FEDORA strategies

Near future

■ The near future regarding data retrieval
■ More automated imports using increasingly advanced converters
■ Automated data delivery (push and harvest) to:
  Industry specific search services (e.g. PubMed, Nordicom)
  Documentary data collections (such as clinicaltrials.org), and national collections (such as DDF (DK), ForskDok (NO), etc.)
■ Temporary import objects
  When imported data are not of sufficient quality to create valid objects
  When data cannot be properly related to other objects upon import