Taxonomic 'data' exchange as expression and synthesis of phylogenetic claims

Jonathan A. Rees, National Evolutionary Synthesis Center. iEvoBio, 25 June 2014


'Slides' from a 5-minute presentation at iEvoBio 2014.


Page 1

Taxonomic 'data' exchange as expression and synthesis of phylogenetic claims

Jonathan A. Rees, National Evolutionary Synthesis Center

iEvoBio, 25 June 2014

Page 2

Synergy

CoL, IRMNG, NCBI, GBIF, EOL, Union4, Treebase, OpenTree, ...

Finding inconsistencies = good, but hard

Collecting information is useful

Page 3

'Data' – BAH!

'data' 'information' 'representation' 'format' 'nomenclature' - how bland. Distracting.

Claims, not data. Consequential.

Page 4

Terminology

Taxon: a set determined by a membership rule. ['taxon concept']

Character-based

Descent-based

Conspecificity-based

Taxonomy: a collection of taxa that form a hierarchy.

Some taxonomies are phylogenetic (all clades).
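To make the set reading concrete, here is a minimal Python sketch. It assumes, purely for illustration, that each taxon's membership rule has already been evaluated against a finite pool of hypothetical specimens, so a taxon can be stored as a frozenset (real taxa are defined by rules, not enumerated members). Under that assumption, "a collection of taxa that form a hierarchy" can be checked as: every two taxa are either nested or disjoint.

```python
from itertools import combinations


def is_hierarchy(taxa: dict[str, frozenset[str]]) -> bool:
    """Return True if every pair of taxa is nested or disjoint.

    Taxa are modelled extensionally (a simplifying assumption): each
    name maps to the finite set of specimens its rule admits.
    """
    for (_, a), (_, b) in combinations(taxa.items(), 2):
        nested = a <= b or b <= a
        disjoint = a.isdisjoint(b)
        if not (nested or disjoint):
            return False
    return True


# Hypothetical specimens s1..s4, for illustration only.
taxa = {
    "X": frozenset({"s1", "s2", "s3", "s4"}),
    "A": frozenset({"s1", "s2"}),
    "B": frozenset({"s3"}),
    "C": frozenset({"s4"}),
}
print(is_hierarchy(taxa))  # True
taxa["D"] = frozenset({"s2", "s3"})  # overlaps A and B without nesting
print(is_hierarchy(taxa))  # False
```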

Page 5

Taxonomies are collections of claims

[Diagram: taxon X, with A, B, and C beneath it]

X includes A, B, and C

A, B, and C are mutually disjoint

X, A, B, and C are clades - if phylogenetic.
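Reading a taxonomy as claims can be done mechanically. The Python sketch below walks a nested taxonomy and emits the claims this slide lists for each node, in the includes/disjoint/clade style used later in the talk. The nested-dict input format is a hypothetical choice made only for this sketch.

```python
def claims_from_tree(tree: dict, phylogenetic: bool = False) -> list[str]:
    """Read off the claims made by a nested taxonomy.

    `tree` maps a taxon name to a dict of its children (a hypothetical
    input format). Clade claims are emitted only if `phylogenetic`.
    """
    claims: list[str] = []

    def walk(name: str, children: dict) -> None:
        if phylogenetic:
            claims.append(f"clade({name})")
        # The parent includes each child...
        for child in children:
            claims.append(f"includes({name}, {child})")
        # ...and sibling taxa are mutually disjoint.
        if len(children) > 1:
            claims.append(f"disjoint({', '.join(children)})")
        for child, grandchildren in children.items():
            walk(child, grandchildren)

    for root, children in tree.items():
        walk(root, children)
    return claims


for claim in claims_from_tree({"X": {"A": {}, "B": {}, "C": {}}}, phylogenetic=True):
    print(claim)
```

For the X/A/B/C example this prints the three includes claims, one disjoint claim, and four clade claims.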

Page 6

The important claims are about biology

X includes Y

X1, X2, X3, … are mutually disjoint

X is a clade

X is a species

Page 7

Claims about taxa

We have to designate taxa somehow when we express a claim.

Many taxon names are polysemous.

To be clear, always say 'in the sense of' some static document (article or database snapshot):
X = Mammalia sensu http://dx.doi.org/10.1126/science.1211028

If a name is used multiple ways in some document, give further qualification.
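A designation can be treated as a value: a name paired with the static document it is used in. The Python sketch below assumes that representation; the `qualifier` field is a hypothetical addition standing in for the "further qualification" mentioned above, and its rendering is invented for this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sensu:
    """A taxon designation: a name as used in one static document."""
    name: str
    source: str          # article DOI or database snapshot label
    qualifier: str = ""  # hypothetical: disambiguates several uses in one source

    def __str__(self) -> str:
        extra = f", {self.qualifier!r}" if self.qualifier else ""
        return f"sensu({self.name!r}, {self.source}{extra})"


x = Sensu("Mammalia", "http://dx.doi.org/10.1126/science.1211028")
y = Sensu("Mammalia", "NCBI.20140515")

# Same name, different documents: distinct designations. Whether they
# denote the same taxon is a separate claim, not something equality decides.
print(x == y)  # False
print(y)       # sensu('Mammalia', NCBI.20140515)
```

Because the dataclass is frozen, designations are hashable and can serve as keys in claim tables.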

Page 8

Reasoning with claims

X includes Y and Y includes Z → X includes Z

X includes Y → X and Y are not disjoint

X and Y are clades → one includes the other, or they are disjoint
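The first two rules are enough to find some inconsistencies automatically. The Python sketch below closes `includes` under transitivity, pushes disjointness down the hierarchy (if A and B are disjoint and A includes C, then C and B are disjoint), and reports pairs claimed both to be included and disjoint. It assumes every taxon is non-empty, which is what makes "X includes Y → X and Y are not disjoint" valid. The clade rule is not implemented here.

```python
from itertools import product


def close_includes(includes: set[tuple[str, str]]) -> set[tuple[str, str]]:
    """Transitive closure: X includes Y and Y includes Z -> X includes Z."""
    closed = set(includes)
    changed = True
    while changed:
        changed = False
        for (x, y), (y2, z) in product(list(closed), repeat=2):
            if y == y2 and (x, z) not in closed:
                closed.add((x, z))
                changed = True
    return closed


def contradictions(includes: set[tuple[str, str]],
                   disjoint: set[tuple[str, str]]) -> set[tuple[str, str]]:
    """Return includes-pairs that are also (derivably) disjoint.

    Assumes all taxa are non-empty, so includes(X, Y) rules out
    disjoint(X, Y).
    """
    inc = close_includes(includes)
    dis = {frozenset(p) for p in disjoint if p[0] != p[1]}
    changed = True
    while changed:
        changed = False
        for a, b in [tuple(p) for p in dis]:
            for x, c in inc:
                # If `here` includes c, then c is disjoint from `other`.
                for here, other in ((a, b), (b, a)):
                    new = frozenset((c, other))
                    if x == here and len(new) == 2 and new not in dis:
                        dis.add(new)
                        changed = True
    return {(x, y) for x, y in inc if frozenset((x, y)) in dis}


# X and W are claimed disjoint, yet both end up including Z.
print(sorted(contradictions({("X", "Y"), ("Y", "Z"), ("W", "Z")}, {("X", "W")})))
# [('W', 'Z'), ('X', 'Z'), ('Y', 'Z')]
```

A report like this does not say which claim is wrong, only that the set of claims cannot all hold.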

Page 9

Two ways to be wrong

Wrong about designation

Wrong about science

Page 10

'Alignment' = estimating coreference

Alignment claims:

X = Y (X and Y are the same taxon)

Mammalia sensu http://dx.doi.org/10.1126/science.1211028 = Mammalia sensu NCBI.20140515

Heuristics based on properties and relations (including names...)

Manual 'curation' if necessary
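One very simple alignment heuristic is sketched below in Python: match on name, and when the other taxonomy has several taxa with that name (homonyms), use the parent name as a tie-breaker. Whatever remains ambiguous is set aside for manual curation. The `(name, parent name)` record format and the ids are hypothetical, and this is an illustration of the idea, not the heuristics any particular project uses.

```python
from collections import defaultdict


def align(source_a: dict[str, tuple[str, str]],
          source_b: dict[str, tuple[str, str]]):
    """Propose same(a, b) alignment claims between two taxonomies.

    Each taxonomy maps a local id to (name, parent name), a hypothetical
    minimal record. Returns (matches, ids needing manual curation).
    """
    by_name = defaultdict(list)
    for bid, (name, parent) in source_b.items():
        by_name[name].append((bid, parent))

    same, needs_curation = [], []
    for aid, (name, parent) in source_a.items():
        candidates = by_name.get(name, [])
        if len(candidates) > 1:  # homonyms in B: try the parent as tie-breaker
            narrowed = [c for c in candidates if c[1] == parent]
            if len(narrowed) == 1:
                candidates = narrowed
        if len(candidates) == 1:
            same.append((aid, candidates[0][0]))
        elif len(candidates) > 1:
            needs_curation.append(aid)
        # No candidates: taxon absent from B; no alignment claim.
    return same, needs_curation


# Illustrative records for the Morganella homonym (bacterium vs fungus).
a = {"a1": ("Morganella", "Fungi")}
b = {"b1": ("Morganella", "Bacteria"), "b2": ("Morganella", "Fungi")}
print(align(a, b))  # ([('a1', 'b2')], [])
```

Each returned pair is itself a claim, and can be wrong about designation in the sense of the previous slide.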

Page 11

Incertae sedis

Confusing.

X is incertae sedis in A means:
(1) A includes X
(2) it's not known which of A's non-incertae-sedis 'children' X belongs to, if any

(2) is not a claim about biology.

Logical content = (1).
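In claim terms, an incertae sedis child contributes an includes claim and nothing else. The Python sketch below generates claims for one taxonomy node whose children are split into regular and incertae sedis lists; that split is a hypothetical input format chosen for this sketch.

```python
def node_claims(parent: str, children: list[str],
                incertae_sedis: list[str]) -> list[str]:
    """Claims for one node whose incertae sedis children are listed
    separately (hypothetical input format)."""
    # (1): the parent includes every child, incertae sedis or not.
    claims = [f"includes({parent}, {c})" for c in children + incertae_sedis]
    # Only the regular children are claimed to be mutually disjoint.
    if len(children) > 1:
        claims.append(f"disjoint({', '.join(children)})")
    # (2) is about our knowledge, not biology, so it yields no claim;
    # in particular, X is not claimed disjoint from B or C.
    return claims


print(node_claims("A", ["B", "C"], ["X"]))
# ['includes(A, B)', 'includes(A, C)', 'includes(A, X)', 'disjoint(B, C)']
```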

Page 12

'Data exchange'

Taxonomies - NP

Page 13

Exchanging 'corrections'

'Rozella belongs in Fungi.'

'Rhodophyceae is the same as Rhodophyta.'

'SILVA's Morganella isn't the same as Index Fungorum's Morganella.'

'Anolis isn't a clade unless Norops is merged into it.'

Page 14

Interpreting advice

“Rozella is in Fungi.”

Rozella sensu SILVA115 and Fungi sensu SILVA115 belong to a clade disjoint from the other SILVA115 children of Nucletmycea.

How about applying the label 'Fungi' to such a clade, and not to Fungi sensu SILVA115?
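The interpreted advice can be written out as explicit claims about a new clade. The Python sketch below does this for the Rozella example. The clade id `N` and the sibling name `SiblingA` are hypothetical placeholders (the slide does not list the other SILVA115 children of Nucletmycea); per the slide's suggestion, `N` is what the label 'Fungi' would be applied to.

```python
def s(name: str, source: str = "SILVA115") -> str:
    """Render a designation in the sensu('Name', source) style."""
    return f"sensu({name!r}, {source})"


def interpret(member: str, group: str, other_children: list[str],
              new_clade: str) -> list[str]:
    """Turn '<member> is in <group>' into claims about a new clade that
    includes both and is disjoint from the other sibling taxa."""
    claims = [
        f"clade({new_clade})",
        f"includes({new_clade}, {s(member)})",
        f"includes({new_clade}, {s(group)})",
    ]
    if other_children:
        siblings = [new_clade] + [s(c) for c in other_children]
        claims.append(f"disjoint({', '.join(siblings)})")
    return claims


# "N" and "SiblingA" are hypothetical placeholders.
for claim in interpret("Rozella", "Fungi", ["SiblingA"], "N"):
    print(claim)
```

Note that nothing here asserts that Fungi sensu SILVA115 includes Rozella sensu SILVA115; the advice is about a larger clade.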

Page 15

Notation is not so important, but for example:

includes(X, Y)

disjoint(A, B, C, …)

clade(X)

node(X, A, B, C, …) - abbreviation

species(X)

same(X, Y) notSame(X, Y)

sensu('Name', source)

+ nomenclatural claims
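As one possible reading of this notation, the Python sketch below parses flat claims and expands the `node(...)` abbreviation. It assumes that `node(X, A, B, C)` abbreviates the claims from the "Taxonomies are collections of claims" slide (X includes each child, the children are mutually disjoint, and all are clades); the slides do not spell out the expansion. Arguments are split on commas, so nested forms such as sensu('Name', source) are not handled.

```python
import re

CLAIM = re.compile(r"^\s*(\w+)\((.*)\)\s*$")


def parse(text: str) -> tuple[str, list[str]]:
    """Parse a flat claim such as 'disjoint(A, B, C)'."""
    m = CLAIM.match(text)
    if m is None:
        raise ValueError(f"not a claim: {text!r}")
    pred, body = m.groups()
    args = [a.strip() for a in body.split(",") if a.strip()]
    return pred, args


def expand(text: str) -> list[str]:
    """Expand node(...) into primitive claims; other claims pass through.

    Assumed expansion: includes + disjoint + clade claims.
    """
    pred, args = parse(text)
    if pred != "node":
        return [text.strip()]
    parent, kids = args[0], args[1:]
    out = [f"includes({parent}, {k})" for k in kids]
    if len(kids) > 1:
        out.append(f"disjoint({', '.join(kids)})")
    out += [f"clade({t})" for t in [parent] + kids]
    return out


for claim in expand("node(X, A, B, C)"):
    print(claim)
```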

Page 16

On and on

Synthesis

Identifier stability

Alignment details

Compare 'macrotaxonomy' and 'microtaxonomy'

Defense of scruffy

Compare Rod's github proposal

Philosophy of language

Page 17

Bottom line

Separate science from nomenclature.

Use logic to do science.

Always use names with sensu.

Use heuristics to prevent paralysis.

Don't 'represent data' – express claims!

https://github.com/OpenTreeOfLife/reference-taxonomy/wiki/Expressing-phylogenetic-claims

Page 18

Ack

Nico Franz, David Thau, Rod Page

Open Tree: Karen Cranston, Stephen Smith, Mark Holder, and legions of others

Gerald Jay Sussman

Jonathan A. Rees 2014

Copyright waived CC0 1.0