IATUL Porto, May 21, 2006

DOI and e-Science

Dr Anne E Trefethen

Oxford e-Research Centre

Anne.Trefethen@oerc.ox.ac.uk

A Definition of e-Science

‘e-Science is about global collaboration in key areas of science, and the next generation of infrastructure that will enable it.’

John Taylor

Director General of Research Councils

Office of Science and Technology, 2001

myGrid: Directly Supporting the e-Scientist

Partners

• Manchester, EBI, Southampton, Nottingham, Newcastle, Sheffield

• AstraZeneca

• GlaxoSmithKline

• Merck KGaA

• Epistemics Ltd

• GeneticXchange

• Network Inference

• IBM

• SUN Microsystems

http://mygrid.man.ac.uk

(myGrid slides courtesy of Carole Goble)

myGrid Project

• Imminent ‘deluge’ of genomics data

• Highly heterogeneous

• Highly complex and inter-related

• Convergence of data and literature archives

(courtesy of Carole Goble, Manchester)

An in silico experiment = a web of interconnected information and components

Provenance record of workflow runs

Provenance of the workflow template. Related workflows.

People

Ontologies describing workflows

Services used

Notes

Data in and out

Literature

(courtesy of Carole Goble, Manchester)

The eBank Project

• Building links between e-research data, from the CombeChem project, with scholarly communication and other on-line sources

• Investigating the role of aggregator services in linking data sets from Grid-enabled projects to open data archives held in digital repositories, and through to peer-reviewed articles, as resources in portals

• JISC-funded project led by UKOLN in partnership with the Universities of Southampton and Manchester

(eBank slides courtesy of Liz Lyon and Jeremy Frey)

Comb-e-Chem Project

[Diagram labels: X-Ray e-Lab; Analysis; Properties; Properties e-Lab; Simulation; Video; Diffractometer; Grid Middleware; Structures Database]

(eBank slides courtesy of Liz Lyon and Jeremy Frey)

Goals of e-Bank Project

• Provide a self-archive of results plus the raw and analysed data

• Links from traditionally published work provide the provenance of the work

• Disseminate for “Public Review” – raw data provided so that users can check it themselves

• Avoid the “publication bottleneck” but still provide the quality check

(eBank slides courtesy of Liz Lyon and Jeremy Frey)

Crystallographic e-Prints: Direct Access to Raw Data from Scientific Papers

Raw data sets can be very large; these are stored at the National Datastore using an SRB (Storage Resource Broker) server

(eBank slides courtesy of Liz Lyon and Jeremy Frey)

e-Bank: Some Comments

• Data as well as traditional bibliographic information is made available

• Can construct high-level searches on data
  – aggregate data from many e-print systems

• Build new data services

• Will extend to provision of real spectra, rather than very reduced summaries, for chemistry publications

(eBank slides courtesy of Liz Lyon and Jeremy Frey)

Entire E-Science Cycle: encompassing experimentation, analysis, publication, research, learning

[Diagram labels: Grid; E-Experimentation; E-Scientists; Graduate Students; Undergraduate Students; Virtual Learning Environment; Digital Library; Institutional Archive; Local Web; Publisher Holdings; Technical Reports; Reprints; Peer-Reviewed Journal & Conference Papers; Preprints & Metadata; Certified Experimental Results & Analyses; Data, Metadata & Ontologies]

(eBank slides courtesy of Liz Lyon)

Data Publishing

• Databases, notably in biology, are replacing (paper) publications as a medium of communication
  – Built and maintained with a great deal of human effort
  – Often do not contain source experimental data, sometimes just annotation/metadata
  – Borrow extensively from, and refer to, other databases
  – Researchers are now judged by databases as well as (paper) publications
  – Upwards of 1000 (public databases) in genetics

• Integration of literature and data analysis of increasing importance - linking bio-database to literature, using publishers to check, complete or complement contents of such databases

This is where DOI comes in…

• DOI use for publications, papers, talks etc

• Now also for Data

• Enables reference, citation of the data, data sets

• Enables ready tracking of use of data (Possibly for RAE!)

• Need to think carefully about granularity of citation and DOI

DOI Registration Agency for Scientific Data

• May 2005 – New DOI Registration Agency appointed

• TIB – the Technische Informationsbibliothek, the German National Library of Science and Technology

DOI in e-Bank

• Creating either a single DOI for all data from a crystal structure determination, or separate DOIs for the files from each stage of the structure determination

• DOI points to the e-Bank web page, i.e. the data set surrounded by the metadata about the people, the material, etc.

Citation: Coles, S.J., Hursthouse, M.B., Frey, J.G. and Rousay, E. (2004), Southampton, UK, University of Southampton, Crystal Structure Report Archive. (doi:10.1594/ecrystals.chem.soton.ac.uk/145)
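To make the citation above concrete, here is a minimal Python sketch of how a DOI string like the one cited splits into a registrant prefix and a suffix, and how a resolvable link can be built from it. The resolver address https://doi.org/ and the helper names `split_doi` and `resolver_url` are assumptions for illustration; the slides do not name a resolver or any software.

```python
from typing import Tuple
from urllib.parse import quote

# Assumption: the public DOI proxy; the slides do not say which resolver is used.
RESOLVER = "https://doi.org/"


def split_doi(doi: str) -> Tuple[str, str]:
    """Split a DOI into (prefix, suffix) at the first slash."""
    doi = doi.strip()
    if doi.lower().startswith("doi:"):
        doi = doi[4:]
    prefix, sep, suffix = doi.partition("/")
    if not sep or not prefix.startswith("10.") or not suffix:
        raise ValueError(f"not a DOI: {doi!r}")
    return prefix, suffix


def resolver_url(doi: str) -> str:
    """Build a link that a resolver can redirect to the landing page."""
    prefix, suffix = split_doi(doi)
    # Keep "/" readable; percent-encode anything else unsafe in a URL path.
    return f"{RESOLVER}{prefix}/{quote(suffix, safe='/')}"


if __name__ == "__main__":
    cited = "doi:10.1594/ecrystals.chem.soton.ac.uk/145"
    print(split_doi(cited))
    print(resolver_url(cited))
```

The suffix is opaque to the resolver, so the choice of what it identifies (a whole data set or a single file) rests with whoever registers the DOI.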

Notes

• eBank/eCrystals uses a full schema to describe the metadata, the datasets, and the relationships between the dataset and the separate files – it took many iterations between the chemists and the digital library groups to refine it to its current structure.

• German climate groups are also using DOIs for data
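The relationship between a dataset and its separate files, and the question of which of them carries a DOI, can be sketched as a small data structure. This is not the eBank/eCrystals schema, which the slides do not reproduce; the class names, field names, stage labels, file names and the 10.9999 identifiers are all hypothetical placeholders. Only the dataset-level DOI and title are taken from the citation on the earlier slide.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class DataFile:
    stage: str                 # hypothetical stage label
    filename: str              # hypothetical file name
    doi: Optional[str] = None  # set only when DOIs are issued per file


@dataclass
class Dataset:
    title: str
    doi: Optional[str] = None  # set when one DOI covers the whole data set
    files: List[DataFile] = field(default_factory=list)

    def citable_dois(self) -> List[str]:
        """Every DOI a citation could point at, dataset-level first."""
        dois = [self.doi] if self.doi else []
        dois += [f.doi for f in self.files if f.doi]
        return dois


# Coarse granularity: one DOI for all data from the determination.
coarse = Dataset(
    title="Crystal Structure Report Archive",
    doi="10.1594/ecrystals.chem.soton.ac.uk/145",
    files=[DataFile("raw", "frames.dat"), DataFile("final", "structure.cif")],
)

# Fine granularity: a DOI per file (placeholder identifiers, not registered DOIs).
fine = Dataset(
    title="Crystal Structure Report Archive",
    files=[
        DataFile("raw", "frames.dat", doi="10.9999/placeholder/145/raw"),
        DataFile("final", "structure.cif", doi="10.9999/placeholder/145/final"),
    ],
)

if __name__ == "__main__":
    print(coarse.citable_dois())
    print(fine.citable_dois())
```

The coarse form gives one citable identifier per determination; the fine form lets a citation target a single stage, at the price of more identifiers to register and maintain.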

Some issues…

• Issue of DOI via TIB (in Germany)

• Cost of DOI

• Where does it point? Laboratory store? Institutional repository? National repository?

• Global resolving vs community level?

Conclusions

• Publication of data and “paper” becoming integrated in the digital scholarly research cycle

• DOI has a role to play

• Still a lot to learn regarding optimal use

• Have only implicitly touched on Open Access, but as policies begin to apply to data as well as to published research outputs, the points above will matter even more.

Acknowledgements

With special thanks to Jeremy Frey, Andrew Milsted, Liz Lyon, Carole Goble