Advancing the International Plant Names Index (IPNI)

Page 1: Advancing the International Plant Names Index (IPNI)

Nicky Nicolson, Alan Paton, Jim Croft, James Macklin, Paul Morris, Greg Whitbread, Kanchi Gandhi

Page 2: Advancing the International Plant Names Index (IPNI)

Advancing IPNI

• Current - where IPNI is now
• Issues
• Future - where we’d like to go and how to get there

Page 3: Advancing the International Plant Names Index (IPNI)

What data?

• What data types:
  – ICBN governed nomenclatural acts
  – Standardised author list
  – Publications
• Which groups:
  – Vascular plants
• Which ranks:
  – Family and below

How is data entered?

• Data entry:
  – From literature scanning, journals received by library at Kew, Harvard, Canberra (2 years - 95%)
  – User reports of missing nomenclatural acts, usually accompanied by a link to digitised literature page (BHL)
• How many?
  – About 7400 names entered in average year
  – About 6100 nomenclatural acts published / year
  – … of these about 2800 are tax. novs.

Page 14: Advancing the International Plant Names Index (IPNI)

How is data managed?

• Full audit history on core objects – names / authors / publications.
• Average 300,000 edits on name records / year
• Standardisation effort ongoing:
  – Epithet
  – Author citation
  – Publication title
  – Collation
  – Year

Page 15: Advancing the International Plant Names Index (IPNI)

Standardisation – author and title

[Chart: Author and Title standardization. Vertical axis: 30% to 90%. Series: standardized author citations; standardized publication title.]

Page 16: Advancing the International Plant Names Index (IPNI)

Standardisation – epithet updates

[Chart: epithet updates over time. Vertical axis: 0 to 5000. Horizontal axis: 2006-01 to 2011-05.]

Page 17: Advancing the International Plant Names Index (IPNI)

Standardisation of epithets

• Why important
  – Main search criterion
  – Improving epithets enables other improvements in dataset e.g.:
    • basionym linkage
    • de-duplication
  – Errors propagate

Page 18: Advancing the International Plant Names Index (IPNI)

Rhus keamcyi was an OCR error for Rhus kearneyi but the incorrect value persists in datasets derived from IPNI

Page 19: Advancing the International Plant Names Index (IPNI)

Statistics

• Dataset can be used for trends analysis:
  – Publication rates
  – Combination rates
  – Author collaborations
• Audit history used to determine changes in data-set over time

http://www.ipni.org/stats.html

Page 20: Advancing the International Plant Names Index (IPNI)

http://www.ipni.org/stats.html

Page 21: Advancing the International Plant Names Index (IPNI)

As well as the data…

• IPNI editors respond to user queries about the data, dealing with c. 50 cases / month

• Includes an expert service re interpretation of ICBN

• Can provide worked examples illustrating particular articles of the code

Page 22: Advancing the International Plant Names Index (IPNI)

Why should anyone care?

• c. 55,000 searches / day

BUT

• dataset is not being used to full advantage
• inputs not being handled efficiently:
  – limited to partnership
  – missing out on community input
• expertise is hidden

Page 23: Advancing the International Plant Names Index (IPNI)

Future

• Increase efficiency of input
  – provision of core data
  – annotating and linking existing data
  – solving nomenclatural problems
• Increase output
  – usage of IPNI data
  – benefit from on-going curation effort
  – benefit from nomenclatural expertise

Page 24: Advancing the International Plant Names Index (IPNI)

Data in - contributor services

• Pre-publication data entry
• Batch submission of datasets
• Annotation
• Addition of links within dataset
• Facilitate interpretation of nomenclatural issues
• Accreditation – credit for helping improve the data

Page 25: Advancing the International Plant Names Index (IPNI)

Pre-publication data entry

• Workflow currently being trialled
  – Author or publisher submits data to IPNI once article has been accepted for publication
  – Generated record suppressed until publication effective under the code
  – But this is not yet automated!
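The suppression step in this workflow could be automated along the following lines. This is only a minimal sketch: the `PrePublicationRecord` class, its field names, and the dates are hypothetical illustrations, not IPNI's actual schema or data.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class PrePublicationRecord:
    """Hypothetical record shape; not IPNI's actual schema."""
    name: str
    accepted_on: date                    # article accepted for publication
    effective_on: Optional[date] = None  # filled in once publication is effective

    def is_visible(self, today: date) -> bool:
        # Suppressed until an effective publication date is known and reached.
        return self.effective_on is not None and self.effective_on <= today


# Illustrative values only.
record = PrePublicationRecord("Example name", accepted_on=date(2011, 1, 10))
print(record.is_visible(date(2011, 2, 1)))  # no effective date yet, so suppressed

record.effective_on = date(2011, 3, 1)
print(record.is_visible(date(2011, 3, 2)))  # effective date reached, so visible
```

Keeping the effective date optional means a record can be created and curated as soon as the paper is accepted, while visibility is decided by a single check.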

Page 26: Advancing the International Plant Names Index (IPNI)

Electronic Publication Example - PhytoKeys

A nomenclator of Pacific oceanic island Phyllanthus (Phyllanthaceae), including Glochidion

Warren L. Wagner, David H. Lorence

• 5. Phyllanthus atalotrichus (A.C. Sm.) W.L. Wagner & Lorence, comb. nov.

urn:lsid:ipni.org:names:77112693-1

PhytoKeys 4: 67–94 (2011)
doi: 10.3897/phytokeys.4.1581
www.phytokeys.com
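The identifier printed with the new combination is an LSID, which follows the general pattern `urn:lsid:authority:namespace:object`, with an optional trailing revision. As a rough sketch, a consumer could split it into its parts as below; the `Lsid` type and `parse_lsid` function are names invented for this illustration and are not part of any IPNI tool.

```python
from typing import NamedTuple, Optional


class Lsid(NamedTuple):
    authority: str
    namespace: str
    object_id: str
    revision: Optional[str]


def parse_lsid(text: str) -> Lsid:
    """Split an LSID of the form urn:lsid:authority:namespace:object[:revision]."""
    parts = text.split(":")
    if (
        len(parts) not in (5, 6)
        or parts[0].lower() != "urn"
        or parts[1].lower() != "lsid"
    ):
        raise ValueError(f"not an LSID: {text!r}")
    revision = parts[5] if len(parts) == 6 else None
    return Lsid(parts[2], parts[3], parts[4], revision)


lsid = parse_lsid("urn:lsid:ipni.org:names:77112693-1")
print(lsid.authority, lsid.namespace, lsid.object_id)
```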

Page 27: Advancing the International Plant Names Index (IPNI)

Pre-publication issues

• Name squatting – mitigated by only entering names which are in papers accepted for publication
• Curation of record throughout publication process
• Electronic and effective publication – before this the record will not be visible
• IPNI editors provide visible expert service re validity of name

Page 28: Advancing the International Plant Names Index (IPNI)

Where IPNI data are placed

Any name occurrence: e.g. specimens, reports, literature citation

concepts

Standard form of name

Page 29: Advancing the International Plant Names Index (IPNI)

Data out - links

• To concept layer:
  – embed IPNI identifiers
  – storage of factual concepts / links to concept layer
• To name occurrence layer:
  – seed lexical reconciliation projects (e.g. GNI)
• To allied information:
  – literature
  – types

Page 30: Advancing the International Plant Names Index (IPNI)

Links to concept layer

Embed IPNI identifiers in externally held names lists

• IPNI holds curated name data, labelled with persistent identifiers.
• Need a tool to seed IPNI identifiers into datasets (in prototype)
• Can devolve curation of name elements in other systems to IPNI

Benefit from on-going curation:

• 300,000 edits per year

Report on changes in name list since date
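A "changes since date" report for an external list that already holds IPNI identifiers might look roughly like this. It is a sketch under stated assumptions: the `Edit` structure, the `changes_since` function, the placeholder identifiers, and the dates are all hypothetical and do not describe IPNI's audit tables. Only the Rhus epithet correction is taken from the earlier slide.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable, List, Set


@dataclass(frozen=True)
class Edit:
    """One audit-history entry (hypothetical shape)."""
    name_id: str
    field: str
    old: str
    new: str
    edited_on: date


def changes_since(audit_log: Iterable[Edit], tracked_ids: Set[str], since: date) -> List[Edit]:
    """Return edits made on or after `since` to names held in the external list."""
    return [e for e in audit_log if e.name_id in tracked_ids and e.edited_on >= since]


# Placeholder identifiers and dates, for illustration only.
audit_log = [
    Edit("id-1", "epithet", "keamcyi", "kearneyi", date(2010, 6, 1)),
    Edit("id-2", "year", "1901", "1902", date(2009, 1, 15)),
    Edit("id-3", "epithet", "macrophylius", "macrophyllus", date(2010, 9, 30)),
]
my_list_ids = {"id-1", "id-2"}

for edit in changes_since(audit_log, my_list_ids, date(2010, 1, 1)):
    print(edit.name_id, edit.field, edit.old, "->", edit.new)
```

Because the external list stores only identifiers, the downstream system picks up corrections without re-curating the name strings itself.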

Page 31: Advancing the International Plant Names Index (IPNI)

Links to the Concept Layer

Example: The Plant List

Page 32: Advancing the International Plant Names Index (IPNI)

Link to name occurrence layer

• IPNI’s version history can be used to seed lexical reconciliation projects (GNI), e.g.:
  – Plectranthus macrophylius -> Plectranthus macrophyllus
• These editorialised translations are of higher value than programmatically derived operations of the same edit distance, e.g.:
  – Plectranthus microphyllus -> Plectranthus macrophyllus
• Standardisation tools and techniques opened up for use in allied projects
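The two examples on this slide can be checked with a standard Levenshtein edit distance. The sketch below shows that both pairs are one substitution apart, so distance alone cannot tell a spelling correction from two distinct epithets. The `curated` mapping stands in for corrections drawn from version history; it and `reconcile` are illustrative names, not an existing IPNI or GNI interface.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed row by row with dynamic programming."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


target = "Plectranthus macrophyllus"
print(edit_distance("Plectranthus macrophylius", target))  # 1
print(edit_distance("Plectranthus microphyllus", target))  # 1

# Editor-confirmed corrections (stand-in for version history).
curated = {"Plectranthus macrophylius": "Plectranthus macrophyllus"}


def reconcile(name: str) -> str:
    # Rewrite a name only when an editor has recorded the correction.
    return curated.get(name, name)


print(reconcile("Plectranthus macrophylius"))
print(reconcile("Plectranthus microphyllus"))
```

A purely distance-based matcher would merge both inputs into the target, whereas the curated mapping changes only the first and leaves the second name untouched.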

Page 33: Advancing the International Plant Names Index (IPNI)

Conclusion

• Facilitate electronic publication - pilot registration

• Foster larger community to support the data and automate workflows

• Stronger links between:
  – the people who produce names
  – the places where they are published
  – the downstream users

• Technical redevelopment

Page 34: Advancing the International Plant Names Index (IPNI)