Plazi or the challenge to free biodiversity data caught in hundreds of millions of pages of over 250 years of scientific publications

Donat Agosti, Plazi. Opportunities & Challenges of Citizen Science, ETH Zentrum, Zurich, 23.1.2015


Page 1

Donat Agosti, Plazi

Opportunities & Challenges of Citizen Science ETH Zentrum, Zurich

23.1.2015

Plazi or the challenge to free biodiversity data caught in hundreds of millions of pages of over 250 years of scientific publications

Page 2

4 years after the Rio Earth Summit

Page 3

Image of ant book: not an easy social task; Harvard library; one book

1996: The Museum of Comparative Zoology at Harvard University was the only place on Earth with a complete collection of ant taxonomic publications.

The ant community got together to create a standard protocol to collect ants.

1996

Page 4

14 years after the Rio Earth Summit

Page 5

Image of online catalogue and library: access for everybody

2006: Antbase.org, the first of its kind, provided online open access to all the literature, with up to 10,000 visitors per month.

2006

Page 6

23 years after the Rio Earth Summit

Page 7

45,000,000 pages scanned by the Biodiversity Heritage Library; many more, but in private digital repositories

Millions of specimens digitized

Better than before

2015

Page 8

23 years after the Rio Earth Summit

2015

Page 9

These are only scans of a fraction of the estimated 500,000,000 pages, with poor OCR.

We still have no complete «phone book» of the species of the world, nor access to the data provided when they were described or re-used.

2015

Page 10

because

23 years after the Rio Earth Summit

Page 11

Taxonomic publications

PDFs are stupid: only we can understand them, and many of them are copyrighted

despite…

Page 12

Text

<tax:treatment>
  <tax:nomenclature>
    <tax:name>
      <tax:xid source="HNS" identifier="193329"/>
      <tax:xmldata>
        <dc:Genus>Mystrium</dc:Genus>
        <dc:Species>leonie</dc:Species>
      </tax:xmldata>
      Mystrium leonie
    </tax:name>
    <tax:status>n. sp.</tax:status>
    Fig 1 D - F
  </tax:nomenclature>
  <tax:div type="description">
    <tax:p>HOLOTYPE WORKER: TL 3.95, HL 1.02, HW 0.95, CI 93, SL 1.30, SI 137, PW 0.73, ML 0.38. Mandible outer margin
    to a sharp apical tooth, the apex parallel to the anterior
    (Holotype with material in mandibles, so mandibles and
    $ described below from paratypes.) Median clypeus
    …</tax:p>
  </tax:div>
</tax:treatment>

Semantically enhanced text

… alternatives: from human-readable to machine-readable text

RDF
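The index values in the holotype description are derived from the raw measurements. Assuming the conventional definitions used in ant taxonomy, CI = HW / HL × 100 and SI = SL / HW × 100 (the excerpt itself does not define them), a short Python check reproduces the published CI 93 and SI 137:

```python
# Assumed conventional ant morphometric indices (not defined in the excerpt):
#   CI (cephalic index) = HW / HL * 100
#   SI (scape index)    = SL / HW * 100

def cephalic_index(hw: float, hl: float) -> int:
    """Head width relative to head length, rounded to a whole number."""
    return round(hw / hl * 100)

def scape_index(sl: float, hw: float) -> int:
    """Scape length relative to head width, rounded to a whole number."""
    return round(sl / hw * 100)

# Holotype worker measurements (mm) as given in the Mystrium leonie treatment.
holotype = {"TL": 3.95, "HL": 1.02, "HW": 0.95, "SL": 1.30, "PW": 0.73, "ML": 0.38}

ci = cephalic_index(holotype["HW"], holotype["HL"])
si = scape_index(holotype["SL"], holotype["HW"])
print(f"CI {ci}, SI {si}")  # CI 93, SI 137
```

Recomputing derived values like this is one simple check that becomes possible once measurements are marked up rather than locked in a PDF.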

Page 13

What does this mean?

The Linking Open Data cloud diagram

Linked Open Data Cloud
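The step from the XML mark-up to Linked Open Data can be illustrated with a minimal Python sketch: it reads the genus and species marked up in a TaxonX-style fragment like the Mystrium leonie example and emits N-Triples statements using Darwin Core term URIs. The namespace URIs for the `tax:` and `dc:` prefixes and the subject URI are hypothetical placeholders (the excerpt omits them), and mapping the elements to `dwc:genus`, `dwc:specificEpithet` and `dwc:scientificName` is an illustrative choice, not Plazi's actual RDF export.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URIs: the slide excerpt omits the xmlns declarations.
TAX = "urn:example:taxonx"
DC = "urn:example:dc"
DWC = "http://rs.tdwg.org/dwc/terms/"

SAMPLE = f"""<tax:treatment xmlns:tax="{TAX}" xmlns:dc="{DC}">
  <tax:nomenclature>
    <tax:name>
      <tax:xid source="HNS" identifier="193329"/>
      <tax:xmldata>
        <dc:Genus>Mystrium</dc:Genus>
        <dc:Species>leonie</dc:Species>
      </tax:xmldata>
      Mystrium leonie
    </tax:name>
    <tax:status>n. sp.</tax:status>
  </tax:nomenclature>
</tax:treatment>"""

def treatment_to_ntriples(xml_text: str, subject: str) -> list[str]:
    """Turn the marked-up name parts into N-Triples lines about `subject`."""
    root = ET.fromstring(xml_text)
    ns = {"tax": TAX, "dc": DC}
    genus = root.findtext(".//dc:Genus", namespaces=ns)
    species = root.findtext(".//dc:Species", namespaces=ns)
    statements = [
        (DWC + "genus", genus),
        (DWC + "specificEpithet", species),
        (DWC + "scientificName", f"{genus} {species}"),
    ]
    # Literals are not escaped here; fine for these names, not in general.
    return [f'<{subject}> <{pred}> "{obj}" .' for pred, obj in statements]

for line in treatment_to_ntriples(SAMPLE, "http://example.org/treatment/mystrium-leonie"):
    print(line)
```

Each printed line is a self-contained statement with global identifiers, which is what lets data from different sources be merged into one graph.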

Page 14

zt03131p034

lsid:zoobank.org:pub:8EDE33EB-3C43-4DFA-A1F4-5CC86DED76C8

http://treatment.plazi.org/id/BDA70EC9-F8AB-AED6-C2B7-628596A1714A

Jeremy Miller http://orcid.org/0000-0001-8918-9775

Pardosa zyuzini new species

Developing persistent and openly accessible digital taxonomic literature

Pardosa zyuzini Kronestedt & Marusik, 2011

Torbjörn Kronestedt & Yuri M. Marusik, 2011, Studies on species of Holarctic Pardosa groups (Araneae, Lycosidae). VII. The Pardosa tesquorum group, Zootaxa 3131, pp. 1-34: 25-28. DOI: 10.5281/zenodo.10109
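The page attaches several kinds of persistent identifiers to one treatment: a ZooBank LSID for the publication, a Plazi treatment URI, an ORCID iD for a person and a Zenodo DOI for the source article. A minimal Python sketch of telling them apart and of checking an ORCID iD's final check character (ORCID uses the ISO 7064 MOD 11-2 algorithm). The prefix rules in the hypothetical helper `classify_identifier` are simplifying assumptions, not an official grammar for any of these schemes.

```python
import re

def orcid_check_digit(base_digits: str) -> str:
    """Check character for the first 15 digits of an ORCID iD (ISO 7064 MOD 11-2)."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)

def is_valid_orcid(orcid: str) -> bool:
    """Accepts a bare iD (0000-0001-...) or an orcid.org URL."""
    digits = orcid.rstrip("/").rsplit("/", 1)[-1].replace("-", "")
    if not re.fullmatch(r"\d{15}[\dX]", digits):
        return False
    return orcid_check_digit(digits[:15]) == digits[15]

def classify_identifier(identifier: str) -> str:
    """Rough, prefix-based guess at the identifier scheme (simplified rules)."""
    if identifier.startswith(("lsid:", "urn:lsid:")):
        return "LSID"
    if "orcid.org/" in identifier:
        return "ORCID"
    if "treatment.plazi.org/id/" in identifier:
        return "Plazi treatment"
    if re.match(r"(doi:)?10\.\d{4,9}/", identifier):
        return "DOI"
    return "unknown"

def doi_url(doi: str) -> str:
    """Resolvable URL for a bare DOI via the doi.org proxy."""
    return "https://doi.org/" + doi.removeprefix("doi:")

for ident in [
    "lsid:zoobank.org:pub:8EDE33EB-3C43-4DFA-A1F4-5CC86DED76C8",
    "http://treatment.plazi.org/id/BDA70EC9-F8AB-AED6-C2B7-628596A1714A",
    "http://orcid.org/0000-0001-8918-9775",
    "10.5281/zenodo.10109",
]:
    print(classify_identifier(ident), ident)

print(is_valid_orcid("http://orcid.org/0000-0001-8918-9775"))  # True
print(doi_url("10.5281/zenodo.10109"))  # https://doi.org/10.5281/zenodo.10109
```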

Pardosa zyuzini sp. nov.

Figs 7-8, 22-23, 28, 31, 94-106, 116

Pardosa paratesquorum (misidentification, in part): Schenkel 1963: 360, fig. 208b (♀, not ♂) Plazi.

Pardosa paratesquorum (misidentification): Logunov & Marusik 1995: 115 Plazi; Marusik et al. 1996: 35-36 Plazi; Logunov et al. 1998: 139 Plazi; Marusik & Logunov 1999: 247 Plazi.

Pardosa cf. paratesquorum: Marusik et al. 2000: 84 Plazi; Marusik & Buchar 2003: 157 Plazi; Logunov & Marusik 2004: 63 Plazi.

Pardosa sp. 2: Marusik & Logunov 2009: 151 Plazi.

Type material. Holotype ♂ and allotype ♀ from MONGOLIA, Oevoerkhangai Aimag, Zuunbayan-Ulaan Somon, Zamtyn Davaa (46°43'N, 102°51'E), 2000 m, 14-18 June 1997 (Y.M. Marusik) in ZMMU. Paratypes. MONGOLIA. Oevoerkhangai Aimag. Same data as holotype (CAS, ISEA, IZAS, NHRS, ZMMU), 110♂ 44♀. Bayankhongor Aimag. Gurvanbulag Somon, Lake Khokh-Nuur (47°32'N, 98°32'E), 2600 m, 7-10 June 1997 (Y.M., IBPN), 10♂. Assonge, Tola (Tuul) River, 1909 (du Chazaud, MNHN), 1♀. Arkhangai Aimag. Uu-bulan, Saikhany saravi, 24 June 1976 (TsugEnkhtuyaa, IBPN), 1♂ 1♀. RUSSIA. Altai: 8 km S of Chagan-Uzun Village (50°04'N, 88°24'E), 1800 m, grassy bank of Chuya River, 13 June 2009 (A.A. Fomichev, ISEA), 2♀; 2 km SE of Kosh-Agach, 27 June 1996 (A. & R. Dudko, ISEA), 1♂; 70-75 km W of Kosh-Agach, 40-45 km W of Bel'tir, Taltura (Chagan-Uzun) River canyon, 2300-2500 m, mountain stony steppe, 26-28 June 1999 (V.V. Glupov, ISEA), 2♂ 1♀; Kosh-Agach Village (50°01'N, 88°38'E), 1800 m, saline swamps, 13 July 2009 (A.A. Fomichev, ISEA), 1♂ 2♀.
Tuva: Mongun-Taiga Distr., 12 km downstream from Mugur-Aksy by Kargy River, 1800 m, river bank, 14 June 1989 (D.L., ISEA:SZM 001.1505), 2♂ 1♀; SE part of Kyzyl, steppe, 22-24 July 1996 (Y.M., IBPN), 3♂ 2♀; Ovyur Distr., pass between Sagly and Onachy rivers, 2200 m, ca. 20-25 km W of Sagly Village, wet habitats, 13 June 1989 (D.L., ISEA:SZM 001.1506), 2♂; Ulug-Khem Distr., 6-7 km E of Choduraa, Chulaanych site, near creek, 10 May 1990 (D.L., ISEA:SZM 001.1514), 14♂; Tere-Khol' Lake, Sharlaa stand and around (50°1.47'N, 95°3.45'E), 1050 m, 6-14 July 1996 (Y.M., ISEA), 19♂ 6♀; 30-35 km W of Erzin, Shara-Nur Lake (50°12'N, 94°32'E), 900 m, 8 June 1995 (Y.M., ISEA:SZM 001.1512), 5♂ 7♀; Erzin Distr., 20 km NW of Erzin Village, Dus-Khol' Lake, Tes-Khem River, 800 m, 31 May 1989 (D.L., ISEA:SZM 001.1515 & 001.1517), 39♂ 14♀; ~20 km WNW of Erzin, Dus-Khol' Lake
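The treatment page plots these localities on a "map of georeferenced points", which requires turning coordinates written in several typographic variants in the source (spaces around the degree sign, straight or curly apostrophes, decimal minutes) into decimal degrees. A minimal Python sketch, assuming degree-minute notation only (no seconds) as in this treatment; it is not Plazi's actual parser.

```python
import re

# One coordinate component: degrees, whole or decimal minutes, hemisphere.
# Tolerates spaces around the degree sign and straight or curly apostrophes.
_COMPONENT = re.compile(r"(\d{1,3})\s*°\s*(\d{1,2}(?:\.\d+)?)\s*['’]\s*([NSEW])")

def to_decimal(degrees: str, minutes: str, hemisphere: str) -> float:
    """Degrees + minutes/60, negative for the southern and western hemispheres."""
    value = float(degrees) + float(minutes) / 60.0
    return -value if hemisphere in "SW" else value

def parse_coordinates(text: str):
    """Return (latitude, longitude) in decimal degrees, or None if incomplete."""
    lat = lon = None
    for degrees, minutes, hemisphere in _COMPONENT.findall(text):
        value = to_decimal(degrees, minutes, hemisphere)
        if hemisphere in "NS" and lat is None:
            lat = value
        elif hemisphere in "EW" and lon is None:
            lon = value
    if lat is None or lon is None:
        return None
    return round(lat, 5), round(lon, 5)

# Variants as they appear in the extracted materials citations.
for raw in ["(46°43'N102°51'E)", "(47 ° 32'N98 ° 32'E)",
            "(50°04’N,88°24’ E)", "(50 ° 1.47'N95° 3.45'E)"]:
    print(raw, "->", parse_coordinates(raw))
```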

Treatment


Kingdom: Animalia [224130, 24756]; Phylum: Arthropoda [1869565, 15066]; Class: Arachnida [2959, 6934]; Order: Araneae [2785, 5430]; Family: Lycosidae [90, 850]; Genus: Pardosa [35, 557]

[Figure: timeline (1960 to 2020) of earlier treatments reporting Pardosa zyuzini Kronestedt & Marusik 2011 under other names: Pardosa sp. 2 (Marusik & Logunov 2009); Pardosa cf. paratesquorum (Logunov & Marusik 2004; Marusik & Buchar 2003; Marusik et al. 2000); Pardosa paratesquorum (Marusik & Logunov 1999; Logunov et al. 1998; Marusik et al. 1996; Logunov & Marusik 1995; Schenkel 1963)]

Callouts on the treatment page: taxonomy [treatments, specimens]; specimens; verbatim taxon name; taxonomic status; treatment; RDA of cited treatments; count of treatments and specimens for Taxon Kingdom, Taxon Phylum, Taxon Class, Taxon Order, Taxon Family and Taxon Genus for this species (e.g., Pardosa zyuzini); taxon name, authority (1), authority (2), year; mods:identifier; map of georeferenced points; publication ID, publication LSID, persistent identifier, treatment provided by, scientific name, status.

Page 15

FIGURES 29–31. Epigynes in dorsal view. 29, Pardosa eskovi sp. nov. (from Yakutia: Suntar). 30, P. mulaiki Gertsch (from Saskatchewan: Hanley). 31, P. zyuzini sp. nov. (from type locality). cd, copulatory duct; sp, spermatheca. Scale line (applies

FIGURES 94–101. Pardosa zyuzini sp. nov. 94, left bulbus, ventral view. 95–96, left terminal part of bulbus in ventral (95) and retrolateral (96) view. 97, left male palp (patella, tibia and cymbium), dorsal view. 98, embolus of left palp,

FIGURES 102–106. Pardosa zyuzini sp. nov., male (from type locality). 99, terminal part of left bulbus in ventral (102), retrolateral (103) and ventro-frontal (104) view. 105, left tegulum with tegular apophysis in ventral view. 106, tarsus and

FIGURE 116. Distribution of Pardosa eskovi sp. nov. (■), P. mulaiki (), P. paratesquorum (), P. tesquorumoides () and

Downloads: Darwin Core materials citations, Darwin Core Archive, Plain XML, TaxonX
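Darwin Core Archive (DwC-A) is one of the download formats offered. A DwC-A is a zip file holding delimited data files plus a `meta.xml` descriptor that maps each column to a Darwin Core term. The Python sketch below builds a minimal archive with a single occurrence core; the column selection, the file name `occurrence.txt`, the helper `build_dwca` and the sample record are assumptions for illustration, and a real Plazi export carries many more fields and may be structured differently.

```python
import csv
import io
import zipfile

DWC = "http://rs.tdwg.org/dwc/terms/"
# Column 0 is the record identifier; the other columns map to Darwin Core terms.
COLUMNS = ["occurrenceID", "scientificName", "decimalLatitude", "decimalLongitude", "individualCount"]

def build_dwca(records: list[dict]) -> bytes:
    """Return a minimal Darwin Core Archive (zip bytes) with one occurrence core."""
    # Tab-delimited core data file with one header row. Values containing tabs
    # or quotes would need extra handling, since the descriptor declares no quoting.
    data = io.StringIO()
    writer = csv.writer(data, delimiter="\t", lineterminator="\n")
    writer.writerow(COLUMNS)
    for rec in records:
        writer.writerow([rec.get(col, "") for col in COLUMNS])

    # meta.xml tells consumers how to read the file and which term each column holds.
    fields = "\n".join(
        f'    <field index="{i}" term="{DWC}{col}"/>'
        for i, col in enumerate(COLUMNS)
        if i > 0
    )
    meta = f"""<archive xmlns="http://rs.tdwg.org/dwc/text/">
  <core encoding="UTF-8" fieldsTerminatedBy="\\t" linesTerminatedBy="\\n"
        fieldsEnclosedBy="" ignoreHeaderLines="1"
        rowType="http://rs.tdwg.org/dwc/terms/Occurrence">
    <files><location>occurrence.txt</location></files>
    <id index="0"/>
{fields}
  </core>
</archive>
"""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("meta.xml", meta)
        zf.writestr("occurrence.txt", data.getvalue())
    return buffer.getvalue()

# Hypothetical record for the holotype locality (46°43'N, 102°51'E) in decimal degrees.
archive = build_dwca([{
    "occurrenceID": "hypothetical-occurrence-1",
    "scientificName": "Pardosa zyuzini",
    "decimalLatitude": 46.71667,
    "decimalLongitude": 102.85,
    "individualCount": 1,
}])
print(zipfile.ZipFile(io.BytesIO(archive)).namelist())  # ['meta.xml', 'occurrence.txt']
```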

Shared with

[Figure: treatment reference graph (1960 to 2020) linking Pardosa zyuzini Kronestedt & Marusik 2011 to the treatments it cites: Schenkel 1963; Logunov & Marusik 1995; Marusik et al. 1996; Logunov et al. 1998; Marusik & Logunov 1999; Marusik et al. 2000; Marusik & Buchar 2003; Logunov & Marusik 2004; Marusik & Logunov 2009]
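A treatment reference graph is essentially a set of edges from the 2011 treatment to each earlier treatment cited in its synonymy. A minimal Python sketch using only the citations listed in the Pardosa zyuzini treatment, assuming a plain list of (citing, cited, name used) edges; the helper `reference_edges` is hypothetical and does not reflect any particular graph format Plazi may use.

```python
# Treatments cited in the synonymy of Pardosa zyuzini (Kronestedt & Marusik 2011),
# keyed by the name under which each earlier treatment reported the species.
CITATIONS = {
    "Pardosa paratesquorum (misidentification, in part)": [("Schenkel", 1963)],
    "Pardosa paratesquorum (misidentification)": [
        ("Logunov & Marusik", 1995),
        ("Marusik et al.", 1996),
        ("Logunov et al.", 1998),
        ("Marusik & Logunov", 1999),
    ],
    "Pardosa cf. paratesquorum": [
        ("Marusik et al.", 2000),
        ("Marusik & Buchar", 2003),
        ("Logunov & Marusik", 2004),
    ],
    "Pardosa sp. 2": [("Marusik & Logunov", 2009)],
}

def reference_edges(citing: str, citations: dict) -> list[tuple[str, str, str]]:
    """One (citing, cited, name used) edge per cited treatment, oldest first."""
    rows = sorted(
        (year, f"{authors} {year}", name_used)
        for name_used, refs in citations.items()
        for authors, year in refs
    )
    return [(citing, cited, name_used) for _, cited, name_used in rows]

for citing, cited, name_used in reference_edges("Pardosa zyuzini Kronestedt & Marusik 2011", CITATIONS):
    print(f"{citing} -> {cited} [{name_used}]")
```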

view. 1, Pardosa eskovi sp. nov.♀ from Yakutia: Suntar. 2–3, P. mulaiki Gertsch ♂ (2) ♀ (3), both from Saskatchewan: Rosetown. 4, P. paratesquorum Schenkel ♂(paratype of P. daqingshanicaTang, Urita &

ventral view. 19, Pardosa eskovisp. nov. (from Yakutia: Suntar). 20, P. mulaiki Gertsch (from Saskatchewan). 21, P. tesquorumoides Song & Yu (from Sichuan: Hongyuan Co.). 22–23, P. zyuzini sp. nov. (from

ventral view. 25, Pardosa eskovisp. nov. (Yakutia: Suntar). 26, P. mulaiki Gertsch (from Saskatchewan: Hanley). 27, P. tesquorumoides Song & Yu (from Sichuan: Hongyuan Co.). 28, P. zyuzini sp. nov. (from type

Page 16

Journal of Hymenoptera Research

5170 specimens

4062 plottable specimens from 1138 unique locations
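The slide summarizes the specimen data as total specimens, plottable specimens and unique locations. Assuming that "plottable" means a record has both coordinates and that "unique locations" are distinct coordinate pairs (the slide defines neither), the three counts could be derived as in this Python sketch; the `Specimen` type, the rounding precision and the records are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Specimen:
    catalog_number: str
    latitude: Optional[float] = None
    longitude: Optional[float] = None

def map_summary(specimens: list[Specimen], precision: int = 4) -> tuple[int, int, int]:
    """Return (all specimens, plottable specimens, unique locations)."""
    # A record is treated as plottable only if it has both coordinates.
    plottable = [s for s in specimens if s.latitude is not None and s.longitude is not None]
    # Distinct locations = distinct coordinate pairs after rounding.
    locations = {(round(s.latitude, precision), round(s.longitude, precision)) for s in plottable}
    return len(specimens), len(plottable), len(locations)

# Hypothetical records: two share a locality, one lacks coordinates.
sample = [
    Specimen("hypothetical-1", 46.71667, 102.85),
    Specimen("hypothetical-2", 46.71667, 102.85),
    Specimen("hypothetical-3", 50.0245, 95.0575),
    Specimen("hypothetical-4"),
]
print(map_summary(sample))  # (4, 3, 2)
```

The choice of rounding precision decides whether nearby records count as one location, so it directly affects the "unique locations" figure.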

Page 17

Treatment

Plazi Search and Retrieval Server: Access to data

DwC-A

You

You

You

human

machine

Page 18

Treatment

Linking of treatments to external resources

Page 19

Pseudomyrmex ants and Vachellia ant-acacias are a classic example of mutualism in biology.

Species in the network diagram: allenii, melanoceras, ruddiae, chiapensis, collinsii, cookii, cornigera, globulifera, hindsii, janzenii, mayana, sphaerocephala, boopis, flavicornis, hesperius, ita, janzeni, kuenckeli, mixtecus, nigrocinctus, nigropilosus, opaciceps, particeps, peperi, reconditus, satanicus, simulans, spinicola, subtilissimus, veneficus, ferrugineus, gentlei, gracilis

Transbiotic link network: associated species linked through references in taxonomic treatments

Acacia-ant species: Pseudomyrmex gracilis

Treatment: redescription

Associated ant-acacia: Acacia gentlei

Ants Plants

Photo credits: Alex Wild

Treatment

Treatments linked through citations

Treatment
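The transbiotic network treats each reference in a treatment to another taxon, such as the ant-acacia Acacia gentlei cited in the redescription of Pseudomyrmex gracilis, as a link between species. A minimal Python sketch of such a network as an undirected adjacency map; apart from the P. gracilis to A. gentlei link shown on the slide, the data are hypothetical, and nothing here reflects how Plazi actually stores these links.

```python
from collections import defaultdict

def build_link_network(treatments):
    """treatments: iterable of (treated taxon, associated taxa cited in it).
    Returns an undirected adjacency map: taxon -> set of linked taxa."""
    graph = defaultdict(set)
    for taxon, associates in treatments:
        for other in associates:
            graph[taxon].add(other)
            graph[other].add(taxon)
    return graph

def co_associates(graph, taxon):
    """Taxa two links away, e.g. other ants sharing a host plant."""
    result = set()
    for neighbour in graph.get(taxon, ()):
        result |= graph.get(neighbour, set())
    result.discard(taxon)
    return result

treatments = [
    ("Pseudomyrmex gracilis", ["Acacia gentlei"]),  # link stated on the slide
    ("Hypothetical ant X", ["Acacia gentlei"]),     # hypothetical placeholder
]
network = build_link_network(treatments)
print(sorted(network["Acacia gentlei"]))                # ['Hypothetical ant X', 'Pseudomyrmex gracilis']
print(co_associates(network, "Pseudomyrmex gracilis"))  # {'Hypothetical ant X'}
```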

Page 20

Solution: Biodiversity Literature Repository @ Zenodo/CERN

Page 21

• Swiss-based international NGO and SME

• Founded in 2008

• Mission to foster (Open Access) Linked Open (scientific) Data

• EU and volunteer support

Page 22

As a start, build a TreatmentBank that includes direct access to 1 million citable treatments, related metadata and digital copies of the source publications

Page 23

Goal: Giant Global Species Graph

Legal, Social, Technical, Ontologies, Infrastructure

Page 24

The paradox

Page 25

Citizen Scientists (red, probably gray) create discoveries:

Current descriptions of new species in Europe (10 years: 5881 species)

https://doi.org/10.1371/journal.pone.0036881

Page 26

Citizen scientists, aka amateurs, are major contributors to this knowledge.

At best, they have difficult access to its digital content, and even more so in the South, where most of the biodiversity is.

Page 27

In fact, citizen scientists have to discover the world’s biodiversity a second time, this time by making it digitally accessible.

The double work of citizen scientists

Page 28

The challenge

Page 29

Scientists and funding can only do so much.

digitizing collaboration, Open Access, sharing

Page 30

How can we mobilize citizen scientists to complement the scientists’ effort to build an Open Biodiversity Knowledge Commons by creating semantically enhanced, linked data out of 500,000,000 printed pages, tens of millions of digitized specimens, gene sequences, etc.?

Page 31

I want to be able to find out what species this ant is and whether I can still see this ant in 20 years here in the Mediterranean city of Zurich.

I want to know how many species live on Earth.

Page 32

Thank you!

Donat Agosti

Page 33

Links

Further reading: http://plazi.org/?q=plazi_publications

Catapano, 2011 (http://www.ncbi.nlm.nih.gov/books/NBK47081/)

Bouchout Declaration (http://bouchoutdeclaration.org)

Blue List (http://plazi.org/?q=blue_list)

Biodiversity Literature Repository (https://zenodo.org/collection/user-biosyslit)

Zenodo (https://zenodo.org/about)

Refindit (http://refindit.org)

Refbank (http://refbank.org)

Pro-iBiosphere (http://pro-ibiosphere.eu/)

Introduction to persistent identifiers (http://wiki.pro-ibiosphere.eu/wiki/Best_practices_for_stable_URIs)

Twitter: @plazi_treat, @bouchoutdec, @myrmoteras

Page 34

Additional resources on the impact of amateurs (By Frank Krell)

In Austria about fifteen years ago, four fifths of all entomologists were amateurs (Malicky 1978). In a recent compilation of the entomologists of Rhenania (Germany) over the last 250 years, Evers (1992: 100) found that only 23 % of the living entomologists are professionals and 77 % are amateurs (n = 78), while 13 % of the deceased entomologists were professionals and 87 % amateurs (n = 111); altogether, 17 % are professionals and 83 % amateurs (n = 189). At least in Central and Western Europe and Japan, the situation is the same. Bello et al. (1992) reported that 13 % of Spanish taxonomists were amateurs (56 % are professionals, 16 % have a temporary status relying on grants, and 13 % are "Other"). I don’t know if there is anything newer.

The references cited are:

Bello, E., Becerra, J.M. & G.-Valdecasas, A. 1992. Counting on taxonomy. Nature 357: 531.

Evers, A.M.J. 1992. Entomologie und Entomologen des Rheinlandes, insbesondere des Niederrheins. Entomologische Blätter für Biologie und Systematik der Käfer 88: 93-102.

Malicky, H. 1978. Am Beispiel der Entomologie. Amateurwissenschaftler und Amateurforschung. Österreichische Hochschulzeitung 30(4): 19-20.
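Evers' combined figure is a pooled proportion weighted by sample size, not the mean of the two percentages. Because the published percentages are rounded, the recovered head counts are approximate (about 18 living and 14 deceased professionals), but pooling them reproduces the stated 17 %, whereas the unweighted mean of 23 % and 13 % would give 18 %. A short Python check; the helper name `pooled_percentage` is illustrative:

```python
def pooled_percentage(groups: list[tuple[float, int]]) -> float:
    """groups: (percentage, sample size) pairs; returns the size-weighted percentage."""
    total = sum(n for _, n in groups)
    count = sum(pct / 100 * n for pct, n in groups)  # approximate head count
    return 100 * count / total

living = (23, 78)     # % professionals among living entomologists, n
deceased = (13, 111)  # % professionals among deceased entomologists, n

print(round(pooled_percentage([living, deceased])))  # 17
print((living[0] + deceased[0]) / 2)                 # 18.0 (unweighted mean, not comparable)
```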