20140623 swets agosti_final

Open Access and the Future of (Biodiversity-) Research. "SWETS Be Open" event, Bern, Switzerland, June 23, 2014. Donat Agosti, Plazi

SWETS – Be Open!

Donat Agosti

Plazi, Bern

23.6.2014, Universität Bern

Open Access and the Future of (Biodiversity-) Research

Future of Biodiversity Research

Data Mining

Background

Rio Earth Summit 1992

Background

Biodiversity Crisis

Background

Indicators

Indicators as powerful widely understood tool

Reed Elsevier, Annual Reports and Financial Statements 2013

http://www.reedelsevier.com/investorcentre/reports%202007/Documents/2013/reed_elsevier_ar_2013.pdf

39% profit

Biodiversity research and conservation planning

Multi-Taxon Specimen Data for Setting Conservation Priorities

Source: Kremen C, et al. 2008. Science 320: 222-226.

Consensus conservation priority areas and actual and proposed protected areas

2003: Madagascar announces it will triple protected land to 10% coverage

Politics

IPBES: Intergovernmental Platform on Biodiversity & Ecosystem Services

EU-Political and Science Decision to support IPBES

EU-BON: European Biodiversity Observation Network

EU-FP7 funded

An EU decision in support of environmental policy making

EU-BON

• To build a European Biodiversity Observation Network

• Measure and predict change over space and time

• Combine remote sensing data and on-the-ground observation data in predictive modeling

• Tools to inform decision makers (EU-politicians)

The basic science question

Hardisty, Nature 502, 171 (2013)

BUT: predictive ecology has substantial data needs

Harfoot, BIH2013, Rome, 2013

What is the future of the biological world?

Imagine if we could:

…Predict community-level dynamics of ecosystems at scales from local to global, based on the ecology and biology of all individual organisms

Modeling life on earth

Can we do it?

A realistic goal?

Communication

EU-BON a child of GEOSS

Global Earth Observation System of Systems

Open Access to remote sensing data from all over the world

The impact of remote sensing data on understanding biodiversity

With sophisticated technologies we can identify different trees in the Amazon…

http://video.ted.com/talk/podcast/2013G/None/GregAsner_2013G-480p.mp4

Access to data

…we could create a link to the related data in our biodiversity literature

http://video.ted.com/talk/podcast/2013G/None/GregAsner_2013G-480p.mp4

Names as information tags in life sciences

Names

Characteristics

Publications

Genes

Collections

Specimens

Distribution

Treatments as bits of information

Treatment: sections of publications documenting the features or distribution of a related group of organisms (called a “taxon”, plural “taxa”) in ways adhering to highly formalized conventions. (Catapano, 2010)

Formica obsoleta, Linnaeus 1758: 580

Treatments as part of publications

[Diagram. Labels: Publication (Treatment, Treatment, Treatment, Table, Appendix); linked items: DNA, Specimens, Observations, Institution, Pharmacology/epidemiology, Biology/ecology, Reference to other biota, Publication (Treatment), Publication]

Text (e.g. PDF)

<tax:treatment>
  <tax:nomenclature>
    <tax:name>
      <tax:xid source="HNS" identifier="193329"/>
      <tax:xmldata>
        <dc:Genus>Mystrium</dc:Genus>
        <dc:Species>leonie</dc:Species>
      </tax:xmldata>
      Mystrium leonie
    </tax:name> Bohn &amp; Verhaagh
    <tax:status>n. sp.</tax:status>
    Fig 1 D - F
  </tax:nomenclature>
  <tax:div type="description">
    <tax:p>HOLOTYPE WORKER: TL 3.95, HL 1.02, HW 0.95, CI 93, SL
    1.30, SI 137, PW 0.73, ML 0.38. Mandible outer margin strongly curving
    to a sharp apical tooth, the apex parallel to the anterior clypeal margin.
    (Holotype with material in mandibles, so mandibles and anterior clypeus
    described below from paratypes.) Median clypeus
    ....
    </tax:p>
  </tax:div>
</tax:treatment>

Enhanced and linked text (XML: TaxonX / TaxPub JATS)
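As a rough illustration of why marked-up treatments are easier to mine than PDF text, the sketch below reads the name elements out of a shortened copy of the fragment above with Python's standard library. The two namespace URIs are placeholders (the slide does not show the xmlns declarations), and `extract_name` is a hypothetical helper, not part of any Plazi tool.

```python
import xml.etree.ElementTree as ET

# Placeholder namespace URIs: the slide omits the xmlns declarations,
# so these values are assumptions made only so the fragment parses.
NS = {
    "tax": "http://example.org/taxonx",
    "dc": "http://example.org/dc",
}

# Shortened copy of the treatment shown on the slide.
TREATMENT_XML = """
<tax:treatment xmlns:tax="http://example.org/taxonx"
               xmlns:dc="http://example.org/dc">
  <tax:nomenclature>
    <tax:name>
      <tax:xid source="HNS" identifier="193329"/>
      <tax:xmldata>
        <dc:Genus>Mystrium</dc:Genus>
        <dc:Species>leonie</dc:Species>
      </tax:xmldata>
      Mystrium leonie
    </tax:name>
    <tax:status>n. sp.</tax:status>
  </tax:nomenclature>
</tax:treatment>
"""


def extract_name(xml_text):
    """Return the structured name data of one treatment as a dict."""
    root = ET.fromstring(xml_text.strip())
    name = root.find("tax:nomenclature/tax:name", NS)
    xid = name.find("tax:xid", NS)
    return {
        "genus": name.findtext("tax:xmldata/dc:Genus", namespaces=NS),
        "species": name.findtext("tax:xmldata/dc:Species", namespaces=NS),
        "status": root.findtext("tax:nomenclature/tax:status", namespaces=NS),
        "xid_source": xid.get("source"),
        "xid": xid.get("identifier"),
    }


if __name__ == "__main__":
    print(extract_name(TREATMENT_XML))
```

The same lookup against a PDF would require layout analysis and name recognition first, which is the gap the markup is meant to close.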

Plazi: Semantically enhanced treatments

Automatic extraction and visualization of treatment content

Countries: Madagascar

Anochetus grandidieri Forel

Datamining of treatments

Pseudomyrmex ants and Vachellia ant-acacias are a classic example of mutualism in biology.

[Figure labels, species epithets in the network: allenii, melanoceras, ruddiae, chiapensis, collinsii, cookii, cornigera, globulifera, hindsii, janzenii, mayana, sphaerocephala, boopis, flavicornis, hesperius, ita, janzeni, kuenckeli, mixtecus, nigrocinctus, nigropilosus, opaciceps, particeps, peperi, reconditus, satanicus, simulans, spinicola, subtilissimus, veneficus, ferrugineus, gentlei, gracilis]

Transbiotic link network: associated species linked through references in taxonomic treatments

Acacia-ant species: Pseudomyrmex gracilis

Treatment: original description

Treatment: redescription

Associated ant-acacia: Acacia gentlei

Ants Plants

Photo credits: Alex Wild
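A minimal sketch of how such a link network could be assembled once associated species have been extracted from treatments. The input format (pairs of ant and plant names) and the function name `build_network` are assumptions for illustration; the single pair used here is the one named on the slide.

```python
from collections import defaultdict

# Hypothetical extraction output: (ant species, associated plant species)
# pairs, as they might be read from references inside treatments.
treatment_links = [
    ("Pseudomyrmex gracilis", "Acacia gentlei"),
]


def build_network(links):
    """Build an undirected adjacency map from (ant, plant) pairs."""
    network = defaultdict(set)
    for ant, plant in links:
        network[ant].add(plant)
        network[plant].add(ant)
    return network


if __name__ == "__main__":
    network = build_network(treatment_links)
    for species in sorted(network):
        print(species, "->", sorted(network[species]))
```

Storing both directions makes it cheap to start from either the ant or the plant, which is the point of linking treatments across taxonomic groups.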

Treatment

Treatments linked through citations

The Plazi approach

From treatment to treatment repository

The Plazi approach

Agosti, D., W. Egloff. 2009. Taxonomic information exchange and copyright: the Plazi approach. BMC Research Notes 2:53. doi:10.1186/1756-0500-2-53

The Plazi approach

Plazi workflow

Plazi SRS

find → scan → OCR → markup → store

Analyzing a large corpus of publications: Plazi repository

14,590 specimens

8900 plottable specimens from

1138 unique locations

Analyzing a journal: Journal of Hymenoptera Research

5170 specimens

4062 plottable specimens from

1138 unique locations
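The specimen counts on these two slides can be turned into a plottable share with simple arithmetic. The sketch below uses only the figures quoted above; the function name `plottable_share` is illustrative.

```python
def plottable_share(plottable, total):
    """Fraction of extracted specimens that can be placed on a map."""
    return plottable / total


# (total specimens, plottable specimens) as quoted on the slides
corpora = {
    "Plazi repository": (14590, 8900),
    "Journal of Hymenoptera Research": (5170, 4062),
}

if __name__ == "__main__":
    for name, (total, plottable) in corpora.items():
        share = plottable_share(plottable, total)
        print(f"{name}: {share:.1%} plottable")
    # Plazi repository: 61.0% plottable
    # Journal of Hymenoptera Research: 78.6% plottable
```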

The biodiversity community

Plants

3,400 Herbaria worldwide

10,000 Associate curators and specialists

350,000,000 specimens in collections

180,000,000 specimens digitized

2,000,000,000 specimens including animals

The biodiversity community

200,000,000+ printed pages

1,900,000 species described

20,000,000+ species treatments

17,000 new species per year

The taxonomy publishing world

12,000 taxonomic papers on 42,000 spiders since 1757

Publications widely scattered

Source: Jeremy Miller

Why is the system broken?

WHY does it NOT work?

Access to data limited

…we cannot create a link to the related biodiversity data

http://video.ted.com/talk/podcast/2013G/None/GregAsner_2013G-480p.mp4

Communication

200,000,000+ printed pages

1,900,000 species described

20,000,000+ species treatments

17,000 new species per year

BUT: The data are hidden

• Incomplete digitization

• Publications are not semantically enhanced

• Collections are incomplete

• Data are not linked

• Most data are not open

Why is the system broken?

Access to a corpus

NOT

single PDF, data point

Why is the system broken?

Access to content

NOT

representations

Why is the system broken?

Legal issues

Technical issues

Social issues

Legal issues: Copyright

Access to ant taxonomic publications through antbase.org / Smithsonian Institution, currently including the entire body of non-copyrighted publications since 1758 (>4,000 publications or 85,000 pages)

Legal issues: licences

Legal licences for 1000+ journals cannot be tracked by scientists

Technical issues: Digital Object Identifiers

DOI missing

CrossRef an exclusive club

Technical issues: Journal publishing workflow

Journal publishing workflows:

From structured data to unstructured text

Technical issues: Content extraction

Conversion of legacy literature prohibitively expensive

Markup costs, including materials citations

[Chart: pages (0 to 40) against minutes (0 to 700)]

Source: Spider Pilot, Jeremy Miller

Plazi SRS

find → scan → OCR → markup → store

Average: 6 min/page, complete

OCR: 0.80 EUR per page (vendor)
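To see why legacy conversion is called prohibitively expensive, the per-page figures above can be scaled to the 200,000,000 printed pages quoted earlier in the talk. This is a back-of-the-envelope sketch, assuming every page is processed at the average rate; the function name and the idea of scaling linearly are assumptions, not figures from the slides.

```python
def conversion_estimate(pages, minutes_per_page, ocr_eur_per_page):
    """Scale per-page markup time and OCR cost linearly to a whole corpus."""
    total_minutes = pages * minutes_per_page
    return {
        "hours": total_minutes / 60,
        "ocr_eur": pages * ocr_eur_per_page,
    }


if __name__ == "__main__":
    # 200,000,000 pages, 6 min/page, 0.80 EUR/page: figures from the slides
    estimate = conversion_estimate(200_000_000, 6, 0.80)
    print(f"Markup time: {estimate['hours']:,.0f} hours")
    print(f"OCR cost:    {estimate['ocr_eur']:,.0f} EUR")
    # Markup time: 20,000,000 hours
    # OCR cost:    160,000,000 EUR
```

Real costs would differ, since page complexity varies and tooling improves, but the order of magnitude explains the emphasis on automation and on publishing new literature in structured form.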

Social issues: data sharing

The misunderstood attribution

Why is the system broken?

WHY NOT

make it work?

European Open Biodiversity Knowledge Management System

European Union FP7 funded project

European Open Biodiversity Knowledge Management System

Prepare the ground for the creation of a system for intelligent management of biodiversity knowledge which will improve the present system of taxonomic literature.

Legal issues: Copyright: The Blue List

The Blue List

elements of taxonomic information that are not subject to copyright

Patterson, D. J., Egloff, W., Agosti, D., Eades, D., Franz, N., Hagedorn, G., Rees, J. A. and Remsen, D. P. 2014. Scientific names of organisms: attribution, rights, and licensing. BMC Research Notes 7:79. doi:10.1186/1756-0500-7-79

Legal issues: Copyright: Legal exceptions for research

Legal exceptions for research

Egloff W, Patterson D, Agosti D, Hagedorn G 2014. Open exchange of scientific knowledge and European copyright: The case of biodiversity information. ZooKeys 414, 109-135. DOI: 10.3897/zookeys.414.7717

Legal issues: Copyright: Open Access

Open Access

Legal issues: Copyright: Creative Commons Licence

Technical issues: DOI

Persistent identifiers for data objects and physical objects

Linking data using agreed vocabularies

http://wiki.pro-ibiosphere.eu/wiki/Best_practices_for_stable_URIs

Technical issues: DOI

Biodiversity Literature Repository @ Zenodo

Public repository for legacy literature using DataCite DOIs

CrossRef to cite (Zenodo) DataCite DOI?!

Technical issues: semantically enhanced publishing

Semantically enhanced publishing

TaxPub JATS

Use DOI as widely as possible
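Using DOIs widely only helps if they are written in a form machines can resolve. The sketch below normalizes the DOI notations that appear in this talk's references into resolver URLs at https://doi.org/. The function name `doi_to_url` and the validation pattern are assumptions; the pattern is a simplified heuristic, not the formal DOI syntax, and special characters are not percent-encoded.

```python
import re

# Simplified heuristic for modern DOIs: "10.", a registrant code, "/", a suffix.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")

# Common ways a DOI is written in reference lists.
PREFIXES = (
    "doi:",
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
)


def doi_to_url(raw):
    """Turn a DOI written in a common notation into a resolver URL."""
    doi = raw.strip()
    lowered = doi.lower()
    for prefix in PREFIXES:
        if lowered.startswith(prefix):
            doi = doi[len(prefix):].strip()
            break
    if not DOI_PATTERN.match(doi):
        raise ValueError(f"not a recognizable DOI: {raw!r}")
    return "https://doi.org/" + doi


if __name__ == "__main__":
    for ref in ("doi:10.1186/1756-0500-7-79", "DOI: 10.3897/zookeys.414.7717"):
        print(doi_to_url(ref))
```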

Technical issues: machine access

(well documented) API

Technical issues: semantic publishing

Advanced publishing and dissemination

Form based

Semantically enhanced, TaxPub JATS-based publishing

Social issues: Bouchout Declaration

http://bouchoutdeclaration.org/ launched June 12, 2014

10 Principles

• Free and open use of digital resources

• Use of persistent identifiers and linking of data

• Policy developments

• Developing sustainable business models

Social issues: Bouchout Declaration

Technical issues: business plan

Conclusions

If we want to conserve the world’s biodiversity, we need one-stop open shopping for biodiversity research results.

Conclusions

We scientists are getting our act together.

Conclusions

Will the publishers too?

Thank you!

Donat Agosti

agosti@plazi.org

http://plazi.org