Biodiversity Informatics at the Natural History Museum Ed Baker Terrestrial Invertebrates,...

Post on 12-Jan-2016

Biodiversity Informatics at the Natural History Museum

Ed Baker

Terrestrial Invertebrates, Department of Life Sciences & NHM Informatics Initiative

http://dx.doi.org/10.6084/m9.figshare.722897

Science as a Slow Cooker

• Only the surface visible

• Lid kept on for extended periods of time

• Uses cheap cuts of raggy meat

• Ingredients lose their nutritional value

• Children at risk due to high temperatures

http://ispiders.blogspot.co.uk/2011/11/realtime-web.html

We like data

• 70 million+ specimens collected over 400 years

• 350,000+ books

• ??? Unpublished datasets in archive, notebooks, computers

• ??? In the minds of staff

How do we provide access?

• Digitisation of specimens and associated data

• Scanning and transcribing books, journals, archives

• Providing tools for managing the data life cycle

• Changing the way we publish: data publication

Flowing Data

Publication

Collection Curation Use

Flowing Data

Collection Curation

Somebody retires Somebody dies Project is cancelled

Sits in desk drawer or on a hard drive until….

Flowing Data

Collection Curation Use

Data Publication

Re-use

Publication

Re-use Re-use Re-use

Flowing Data: from collection to reuse

Collection Curation Use

Data Publication

Re-use

Publication

Re-use Re-use Re-use

Collection

Citizen Science

Automated identification and monitoring

Traditional taxonomic sources

Flowing Data: from collection to reuse

Curation Use

Data Publication

Re-use

Publication

Re-use Re-use Re-use

Curation

Websites for communities to publish and curate:

• Taxonomy / nomenclature

• Bibliographies

• Specimen information

• Character matrices

Flowing Data: from collection to reuse

Use

Data Publication

Re-use

Publication

Re-use Re-use Re-use

Use: Oboe

Flowing Data: from collection to reuse

Data Publication

Re-use

Publication

Re-use Re-use Re-use

Publication (Data)

• Datasets

• Single species descriptions

• Checklists

• Software

Flowing Data: from collection to reuse

Re-use

Publication

Re-use Re-use Re-use

Publication (Research)

• Traditional research

• Systematic zoology

• Phylogeny

• Biogeography

Flowing Data: from collection to reuse

Re-use Re-use Re-use Re-use

The Problem of Scale

Data is being generated by tens of thousands of researchers, in thousands of institutions

• Hard to find what you need

• Hard to know if what you need actually exists

• Impossible to go through researcher by researcher

NHM Data Portal

• Aggregator for NHM science data

• Visualisation tools for datasets

• Allows export of NHM data for re-use

The Informatics Landscape

>18K specimen records (local small scale coverage)

>276M specimen records (worldwide coverage)

The Informatics Landscape

A webpage for every species

Aggregate specimen and observation data globally

Wikimedian in Residence

• Make NHM content available under open licenses for use on Wikimedia projects (and elsewhere)

• Reach of Wikipedia: BBC, Encyclopedia of Life

• Wikisource: Transcription and translation crowd-sourcing

Flowing Data: from collection to reuse

?

"Everybody makes mistakes. And if you don't expose your raw data, nobody will find your mistakes." Jean-Claude Bradley

http://bit.ly/146ugIv