
The ZBW is a member of the Leibniz Association (Leibniz-Gemeinschaft)

Statistical Research Data on the Semantic Web

SWIB 2012
Cologne, Germany

Daniel Bahls
Leibniz Information Centre for Economics (ZBW)


Outline

1. Introduction

2. Research data in economics and scientific practices

3. Thoughts on data representation

4. Repeatability of research results

5. Outlook

6. Data access and retrieval

7. Proxies and empirical models


MaWiFo Project

Management of Economic Research Data


“What researchers want”

Source: Feijen (2011)

• Tools and services must be in tune with researchers’ workflows, which are often discipline-specific

• They must be easy to use

• “Cafeteria model”: researchers can pick and choose from a set of tools and services

• Benefits must be clearly visible – not in three years’ time, but now


Research Data as Bibliographic Artefacts

• Re-use

Data Sharing gives more opportunities for research

• Citation

Data acquisition and assignment of Persistent Identifiers

• Transparency

Reproducibility: a fundamental criterion for good scientific practice


Research data in economics and scientific practices

Target Group: Researchers in Economics

Community Building for Knowledge Exchange:

Economists – Data Librarians – Computer Scientists

Interviews on:

• Sources

• Data Management

• Processing

• Sharing

• Publishing


What does Research Data look like in Economics?


Interviews with Researchers in Economics

Sources

• Data Agencies

• Statistical Offices

• Trusted Institutes and Researchers

• Own Surveys & Studies

Data Management

• Local File System

• Backup Server

• DVD, External HD, ...

Processing

• SPSS, Stata, Matlab, ...

• Programming Languages

• High Performance Computing

• Execution Times: seconds, minutes, hours

Sharing

• Within Teams

• Trusted Colleagues

• On Request (?)

Publishing

• practiced sometimes

• Zip Files

• not included in review process


Particular Findings

Research is driven by the availability of data

(to some extent)

Some research is based on external data,

Some research is based on self-conducted studies

Combining and Merging of data sets

On average, an estimated 66% of the data comes from external sources


Particular Findings

Data Usage Rights – e.g. Thomson-Reuters Datastream

Data Protection

on-site access, virtual access

sample data to understand structure

analysis scripts

aggregation

protection maintained?

Copy to third party?


Thoughts on Data Representation

data review, curation, transparency, re-use, repeatability

Often, the legal situation does not allow for publishing the entire data set as was used


Interim Conclusion

A model based on copying is insufficient

We suggest fine-grained referencing

single data items must be referenceable (merging, curation)

highly distributable (distributed data sources)

extensible (heterogeneous long tail data, curation)

LOD-based approach


[Diagram: a UserDataSet (type: DataSet) linked via includesData to Data Items from the user's own survey and to Data Items from an external dataset]


RDF-Representation for Statistical Data

Source: Data Cube vocabulary

Used for our example: StatsWales, Life Expectancy, Dataset 003311


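[Diagram: a DataSet with Dimensions (each with a label); each Item has a dataProperty value and points to one DimValue per Dimension]

Example: Item X with rdf:value 83.7 and the dimension values

• A: time = 2005-7

• B: region = Cardiff

• C: gender = Female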
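The example item can be written down as plain subject-predicate-object triples. The following is a minimal sketch using only the Python standard library; the `http://example.org/data/` namespace and the `item/`, `dimension/` and `dimvalue/` path segments are invented for illustration and are not taken from the slides or from StatsWales.

```python
from typing import NamedTuple

# Hypothetical namespace, invented for this sketch.
BASE = "http://example.org/data/"
# Standard RDF / RDFS property URIs.
RDF_VALUE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#value"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


def describe_item(item_id, value, dimensions):
    """Return the triples for one data item and its dimension values."""
    item = BASE + "item/" + item_id
    triples = [Triple(item, RDF_VALUE, str(value))]
    for dim_name, dim_value in dimensions.items():
        dim_prop = BASE + "dimension/" + dim_name
        dim_node = BASE + "dimvalue/" + dim_name + "/" + dim_value
        # Link the item to the dimension value node ...
        triples.append(Triple(item, dim_prop, dim_node))
        # ... and give that node a human-readable label.
        triples.append(Triple(dim_node, RDFS_LABEL, dim_value))
    return triples


triples = describe_item(
    "X", 83.7, {"time": "2005-7", "region": "Cardiff", "gender": "Female"}
)
for t in triples:
    print(t.subject, t.predicate, t.obj)
```

Because the item and every dimension value have their own URI, each of them can be referenced individually, which is what the fine-grained referencing approach relies on.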


RDF-Representation for Statistical Data

Using the semantic model, data can be referenced at a very detailed level without the data itself having to be public.

[Diagram: the example Item X (time = 2005-7, region = Cardiff, gender = Female) with its rdf:value 83.7 marked as protected]

Single information items, such as the value itself, can be omitted, yet the data is still referenceable.

Challenge: stable URIs are required for every single data item.
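One way to picture the omission of protected values is a filter over the triples of an item. This is only a sketch under stated assumptions: the `example.org` URIs and the function name `publishable_view` are hypothetical, and real access control would need more than dropping triples.

```python
# Standard RDF property URI for the value of a node.
RDF_VALUE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#value"


def publishable_view(triples, protected_predicates=frozenset({RDF_VALUE})):
    """Drop triples whose predicate is protected; keep everything else."""
    return [t for t in triples if t[1] not in protected_predicates]


# Hypothetical URIs, invented for this sketch.
item = "http://example.org/data/item/X"
full = [
    (item, RDF_VALUE, "83.7"),
    (item, "http://example.org/data/dimension/time", "2005-7"),
    (item, "http://example.org/data/dimension/region", "Cardiff"),
    (item, "http://example.org/data/dimension/gender", "Female"),
]

public = publishable_view(full)
# The item URI still appears as a subject, so it can still be referenced,
# but the protected value itself is no longer part of the public description.
subjects = {s for s, _, _ in public}
print(len(public), item in subjects)
```

The item keeps its URI and its dimension coordinates, so a citing data set can still point at it even though the value 83.7 is withheld.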


SCOVO


RDF Data Cube Vocabulary (QB)

Source: http://publishing-statistical-data.googlecode.com/svn/trunk/specs/src/main/html/cube.html


Repeatability of research results

aggregation and data cleaning:

missing values

seasonal adjustment

purchasing power adjustment

plausibility tests

basket analyses

...

Interesting read: McCullough, B. D. “Got Replicability? The _Journal of Money, Credit and Banking_ Archive.” Econ Journal Watch, 2007, 4, 326-337


Repeatability of research results

scripts (“do-files”)

working copies of data

parameters are changed so that the effect can be shown clearly

no overall build process


A build script for empirical analyses

Maven-like, ANT-like
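To illustrate what a Maven-like or ANT-like build for an empirical analysis could look like, the sketch below runs named steps in dependency order using `graphlib.TopologicalSorter` from the Python standard library (Python 3.9 or later). The step names, the toy data and the `run_build` helper are assumptions made for this example, not part of the project described in the slides.

```python
from graphlib import TopologicalSorter


def run_build(steps, dependencies):
    """Run every step after the steps it depends on.

    steps:        step name -> function taking a dict of its inputs
    dependencies: step name -> set of step names it depends on
    """
    graph = {name: dependencies.get(name, set()) for name in steps}
    order = list(TopologicalSorter(graph).static_order())
    results = {}
    for name in order:
        inputs = {dep: results[dep] for dep in graph[name]}
        results[name] = steps[name](inputs)
    return order, results


# Toy pipeline (hypothetical): load raw data, drop missing values, analyse.
steps = {
    "load_raw": lambda inputs: [2.0, None, 4.0, 6.0],
    "clean": lambda inputs: [v for v in inputs["load_raw"] if v is not None],
    "analyse": lambda inputs: sum(inputs["clean"]) / len(inputs["clean"]),
}
dependencies = {"clean": {"load_raw"}, "analyse": {"clean"}}

order, results = run_build(steps, dependencies)
print(order)
print(results["analyse"])
```

Declaring the steps and their dependencies in one place gives the overall build process that ad hoc scripts and working copies lack: anyone with the same inputs can rerun the whole chain from sources to results.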


[Diagram: a UserDataSet (type: DataSet) linked via includesData to Data Items from the user's own survey and to Data Items from an external dataset, with an attached buildScript]

No gaps

Trust

Incentive


Communication & Architecture

[Diagram: Client, Digital Library, and Archives A, B, C and D; labelled connections: DOI, Reference Model, Authenticate & Request Data]
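One possible reading of this architecture is sketched below: a client looks up a reference model by DOI in the digital library and then requests each referenced item from the archive that holds it, which checks authorisation first. Everything here is hypothetical, including the `Archive` class, the in-memory "digital library" dictionary, the placeholder DOI and the user names; it is a sketch of the idea, not the project's actual protocol.

```python
class Archive:
    """In-memory stand-in for a data archive that serves authorised users only."""

    def __init__(self, items, authorised_users):
        self.items = items
        self.authorised_users = set(authorised_users)

    def request(self, user, item_id):
        if user not in self.authorised_users:
            raise PermissionError(f"{user} may not access {item_id}")
        return self.items[item_id]


def reconstruct(reference_model, archives, user):
    """Fetch every referenced item the user may access; list the rest as denied."""
    data, denied = {}, []
    for archive_name, item_id in reference_model:
        try:
            data[(archive_name, item_id)] = archives[archive_name].request(user, item_id)
        except PermissionError:
            denied.append((archive_name, item_id))
    return data, denied


# Hypothetical setup: the digital library maps a DOI to a reference model,
# i.e. a list of (archive, item) references rather than a copy of the data.
digital_library = {"doi:hypothetical/dataset-1": [("A", "x1"), ("B", "y7")]}
archives = {
    "A": Archive({"x1": 83.7}, authorised_users=["alice"]),
    "B": Archive({"y7": 79.2}, authorised_users=["bob"]),
}

model = digital_library["doi:hypothetical/dataset-1"]
data, denied = reconstruct(model, archives, user="alice")
print(data, denied)
```

The design point is that the published artefact is the reference model, so usage rights stay with the individual archives while the composition of the data set remains transparent.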


Open Challenges (practical)

Researchers in economics would love to re-use data from others.

Researchers in economics hesitate to share their data.

Competitive advantage:

“We put too much effort into data production,

so we want to be the ones to publish on it.”

“The code discloses too much of our know-how.”

Incentives needed:

Data citation

Trust in research results (no gaps from data sources to results)


Open Challenges (technical)

Precise referencing:

A unique URI for every data item / table cell?

How about curation and data versioning?
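As a thought experiment for these two questions, the sketch below mints a deterministic URI for a table cell from the dataset identifier, a version number and the cell's dimension coordinates. The URI layout and the `example.org` base are assumptions made for illustration; the slides leave the actual scheme open.

```python
from urllib.parse import quote

# Hypothetical base URI, invented for this sketch.
BASE = "http://example.org/data"


def mint_item_uri(dataset_id, version, coordinates):
    """Build a stable URI for one cell from its dimension coordinates.

    Coordinates are sorted by dimension name so that the same cell always
    yields the same URI, regardless of the order in which they are given.
    """
    key = ";".join(
        f"{quote(dim, safe='')}={quote(val, safe='')}"
        for dim, val in sorted(coordinates.items())
    )
    return f"{BASE}/{quote(dataset_id, safe='')}/v{version}/item?{key}"


cell = {"time": "2005-7", "region": "Cardiff", "gender": "Female"}
original = mint_item_uri("003311", 1, cell)
curated = mint_item_uri("003311", 2, cell)
print(original)
print(curated)
```

Under this scheme a curated value gets a new versioned URI while the URI of the original cell stays valid, so earlier analyses can still be reconstructed exactly. The cost is one URI per cell and version, which is precisely the open challenge named above.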

Maven-like build scripts:

How to specify entire system environments and software modules?

Vocabulary extensions:

Specific data needs specific description:

where do the necessary rdf:Properties come from?


Summing up

• Reference model for exact reconstruction of research data sets

• Build scripts and dependency management for repeatability

• Transparency of data sources and processes

• “executable paper”, learning from others, data reviews, ...

• rerun analysis – with curated values – with latest data


Thank you