Franz ludaescher tdwg 2016 an update on taxonomic concept reasoning

An update on taxonomic concept reasoning. Please @taxonbytes. Nico Franz (1) & Bertram Ludäscher (2). (1) School of Life Sciences, Arizona State University; (2) iSchool, University of Illinois at Urbana-Champaign. TDWG 2016 – Biodiversity Information Standards, December 06, 2016, Instituto Tecnológico de Costa Rica (#TDWG16). @ http://www.slideshare.net/taxonbytes/franz-ludaescher-tdwg-2016-an-update-on-taxonomic-concept-reasoning

Transcript of Franz ludaescher tdwg 2016 an update on taxonomic concept reasoning

Page 1: Franz ludaescher tdwg 2016 an update on taxonomic concept reasoning

An update on taxonomic concept reasoning

Please

@taxonbytes

Nico Franz (1) & Bertram Ludäscher (2)

(1) School of Life Sciences, Arizona State University

(2) iSchool, University of Illinois at Urbana-Champaign

TDWG 2016 – Biodiversity Information Standards

December 06, 2016 – Instituto Tecnológico de Costa Rica (#TDWG16)

@ http://www.slideshare.net/taxonbytes/franz-ludaescher-tdwg-2016-an-update-on-taxonomic-concept-reasoning

Page 2: Franz ludaescher tdwg 2016 an update on taxonomic concept reasoning

The big picture:

Why taxonomic concept reasoning?

Page 3: Franz ludaescher tdwg 2016 an update on taxonomic concept reasoning

The pluralistic domain of human taxonomy making

Source: Rylands & Mittermeier. 2014. Primate taxonomy: species and conservation. doi:10.1002/evan.21387

"100 years of primate taxonomies"

Page 4: Franz ludaescher tdwg 2016 an update on taxonomic concept reasoning

The pluralistic domain of human taxonomy making

• Taxonomies are endorsed by us (humans); more or less democratically.

• They consist of sets of labels, data, and theories about the natural world.

Source: Rylands & Mittermeier. 2014. Primate taxonomy: species and conservation. doi:10.1002/evan.21387

"100 years of primate taxonomies"

Page 5: Franz ludaescher tdwg 2016 an update on taxonomic concept reasoning

The pluralistic domain of human taxonomy making

• Taxonomies are endorsed by us (humans); more or less democratically.

• They consist of sets of labels, data, and theories about the natural world.

• Over time, these theories change – converge or conflict (often in parallel).

Source: Rylands & Mittermeier. 2014. Primate taxonomy: species and conservation. doi:10.1002/evan.21387

"100 years of primate taxonomies"

Page 6: Franz ludaescher tdwg 2016 an update on taxonomic concept reasoning

A model to separate the human-made versus natural domains

• While human taxonomy making unfolds (e.g. 1758 onwards), natural taxa – which 'took' millions of years to realize – tend to not change much.

Domain of human taxonomy making ("mimic")

Page 7: Franz ludaescher tdwg 2016 an update on taxonomic concept reasoning

• While human taxonomy making unfolds (e.g. 1758 onwards), natural taxa – which 'took' millions of years to realize – tend to not change much.

Natural domain ("model")

A model to separate the human-made versus natural domains

Domain of human taxonomy making ("mimic")

Page 8: Franz ludaescher tdwg 2016 an update on taxonomic concept reasoning

• While human taxonomy making unfolds (e.g. 1758 onwards), natural taxa – which 'took' millions of years to realize – tend to not change much.

• At any time, our labels and theories (concepts) aim to stand for taxa; yet the alignment may be approximate.

Reliable?

Reliable?

Reliable?

A model to separate the human-made versus natural domains

Natural domain ("model")

Domain of human taxonomy making ("mimic")

Page 9: Franz ludaescher tdwg 2016 an update on taxonomic concept reasoning

Concepts: tracking progress and conflict in the human domain

• Taxonomic names and nomenclatural relationships are only so-so in terms of tracking congruent and incongruent taxonomic perspectives.

Page 10: Franz ludaescher tdwg 2016 an update on taxonomic concept reasoning

Remsen: Using names, we're lucky when revisions are infrequent

"In biology, there are many taxa that are so under-studied that they are only known from their original description and none or very few subsequent references […]. The name alone, so long as it is a unique name, is sufficient to locate all related material."

– David Remsen 2016: 213

Source: Remsen. 2016. The use and limits of scientific names in biological informatics. doi:10.3897/zookeys.550.9546

Page 11: Franz ludaescher tdwg 2016 an update on taxonomic concept reasoning

• Taxonomic names and nomenclatural relationships are only so-so in terms of tracking congruent and incongruent taxonomic perspectives.

• Logic-based multi-taxonomic alignments require better contextualization of labels and relationships, and better specification of "taxonomic sameness".

[Figure: 1912 vs. 1967. Logically reconcilable? Δ = ?]

Concepts: tracking progress and conflict in the human domain

Page 12: Franz ludaescher tdwg 2016 an update on taxonomic concept reasoning

Still bigger (re: Synthesis):

Why taxonomic concept reasoning?

Page 13: Franz ludaescher tdwg 2016 an update on taxonomic concept reasoning

Why promote taxonomic pluralism? *

• Our work extends and complements prior TDWG efforts related to the Taxonomic Concept Transfer Schema (https://github.com/tdwg/tcs).

* See also Franz & Sterner @ TDWG16, Friday, 11:30 am (#1134) in Session "Data Gaps, Trust, Knowledge Acquisition"

Page 14: Franz ludaescher tdwg 2016 an update on taxonomic concept reasoning

• Our work extends and complements prior TDWG efforts related to the Taxonomic Concept Transfer Schema (https://github.com/tdwg/tcs).

• This work is necessary because using only Darwin Core tends to suppress taxonomic pluralism:

• DwC syntax is too under-powered for tracking multi-taxonomy alignments.

• DwC semantics ("Taxon") are too ambiguous to enforce a consistent recognition of the two domains (human taxonomy making vs. natural world).

* See also Franz & Sterner @ TDWG16, Friday, 11:30 am (#1134) in Session "Data Gaps, Trust, Knowledge Acquisition"

Why promote taxonomic pluralism? *

Page 15: Franz ludaescher tdwg 2016 an update on taxonomic concept reasoning

• Our work extends and complements prior TDWG efforts related to the Taxonomic Concept Transfer Schema (https://github.com/tdwg/tcs).

• This work is necessary because using only Darwin Core tends to suppress taxonomic pluralism:

• DwC syntax is too under-powered for tracking multi-taxonomy alignments.

• DwC semantics ("Taxon") are too ambiguous to enforce a consistent recognition of the two domains (human taxonomy making vs. natural world).

• Technical and political means of suppressing taxonomic pluralism "by design" have implications for data quality and trust in data aggregation.

* See also Franz & Sterner @ TDWG16, Friday, 11:30 am (#1134) in Session "Data Gaps, Trust, Knowledge Acquisition"

Why promote taxonomic pluralism? *

Page 16: Franz ludaescher tdwg 2016 an update on taxonomic concept reasoning

• Our work extends and complements prior TDWG efforts related to the Taxonomic Concept Transfer Schema (https://github.com/tdwg/tcs).

• This work is necessary because using only Darwin Core tends to suppress taxonomic pluralism:

• DwC syntax is too under-powered for tracking multi-taxonomy alignments.

• DwC semantics ("Taxon") are too ambiguous to enforce a consistent recognition of the two domains (human taxonomy making vs. natural world).

• Technical and political means of suppressing taxonomic pluralism "by design" have implications for data quality and trust in data aggregation.

• "Synthesis" does not necessarily require taxonomic monism ("backbone"). Logic-reconciled pluralism can provide a trust-generating path for systematists' contributions towards large-scale taxonomic data integration.

* See also Franz & Sterner @ TDWG16, Friday, 11:30 am (#1134) in Session "Data Gaps, Trust, Knowledge Acquisition"

Why promote taxonomic pluralism? *

Page 17: Franz ludaescher tdwg 2016 an update on taxonomic concept reasoning

An update on Euler/X:

Logic, use cases, and novel services

Page 18: Franz ludaescher tdwg 2016 an update on taxonomic concept reasoning

Euler/X – logically consistent RCC–5 alignments

• Input: multiple taxonomies and/or phylogenies; expert-provided articulations.

• Output: logic consistency checking; Maximally Informative Relations (MIR); alignment visualizations.

Page 19: Franz ludaescher tdwg 2016 an update on taxonomic concept reasoning

Products – concept taxonomy in theory and in practice

ZooKeys. doi:10.3897/zookeys.528.6001

Semantic Web. doi:10.3233/SW-160220

Biological Theory (accepted). doi:10.1101/022145

PLoS ONE. doi:10.1371/journal.pone.0118247

Systematics Biodiv. doi:10.1080/14772000.2013.806371

Systematic Biology. doi:10.1093/sysbio/syw023

Biodiversity Data Journal (accepted). #6093

Research Ideas and Outcomes. doi:10.3897/rio.2.e10610

Page 20: Franz ludaescher tdwg 2016 an update on taxonomic concept reasoning

Source: Thau, D.M. 2010. Reasoning about taxonomies. Thesis, UC Davis. http://gradworks.proquest.com/3422778.pdf

Region Connection Calculus (set constraints)

== < > >< !

• Two regions N, M are either:

• congruent (N == M)

• properly inclusive (N < M)

• inversely properly inclusive (N > M)

• overlapping (N >< M)

• exclusive of each other (N ! M)

Page 21: Franz ludaescher tdwg 2016 an update on taxonomic concept reasoning

Source: Thau, D.M. 2010. Reasoning about taxonomies. Thesis, UC Davis. http://gradworks.proquest.com/3422778.pdf

Region Connection Calculus (set constraints)

== < > >< !

• Two regions N, M are either:

• congruent (N == M)

• properly inclusive (N < M)

• inversely properly inclusive (N > M)

• overlapping (N >< M)

• exclusive of each other (N ! M)

• RCC–5 articulations answer the query: "can we join regions N and M?"

• Taxonomies have multiple RCC–5 alignable components: nodes (parents, children), node-associated traits, even node-anchoring specimens.
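The five RCC–5 relations listed above can be illustrated with ordinary sets. The sketch below is not Euler/X code; it assumes each region is given extensionally as a non-empty set of members, and the function name `rcc5` and the specimen identifiers are hypothetical.

```python
def rcc5(n, m):
    """Return the RCC-5 symbol relating two non-empty regions given as sets."""
    if not n or not m:
        raise ValueError("RCC-5 regions must be non-empty")
    if n == m:
        return "=="   # congruent
    if n < m:
        return "<"    # properly inclusive
    if n > m:
        return ">"    # inversely properly inclusive
    if n & m:
        return "><"   # overlapping
    return "!"        # exclusive of each other


# Hypothetical specimen identifiers, for illustration only.
region_a = frozenset({"s1", "s2"})
region_b = frozenset({"s1", "s2", "s3"})
region_c = frozenset({"s3", "s4"})

print(rcc5(region_a, region_b))  # <
print(rcc5(region_b, region_c))  # ><
print(rcc5(region_a, region_c))  # !
```

Exactly one of the five symbols holds for any pair of non-empty regions, which is why a single articulation per concept pair is enough to describe their relationship.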

Page 22: Franz ludaescher tdwg 2016 an update on taxonomic concept reasoning

Use cases – primate classifications & avian phylogenies

1. Primate classifications sec. MSW2 (1993) versus MSW3 (2005)

a. Microcebus + Mirza sec. MSW3 (2005) with coverage constraint

b. Quantifying name (identifier) reliability

c. Reasoning achieves scalability (matrix)

2. Avian phylogenies sec. Prum et al. (2015) versus Jarvis et al. (2014)

a. Psittaciformes with & without coverage

b. Alignment of the "Neoavian explosion"

Page 23: Franz ludaescher tdwg 2016 an update on taxonomic concept reasoning

Use case 1:

Two primate classifications –

MSW2 (1993) versus MSW3 (2005)

Page 24: Franz ludaescher tdwg 2016 an update on taxonomic concept reasoning

Use case 1.a. Aligning Microcebus + Mirza sec. MSW3 (2005)

"Taxonomic concept labels" identify input concept regions

RCC–5 articulations provided for each species-level concept

• Input visualization: MSW3 (2005) versus MSW2 (1993)

Source: Franz et al. 2016. Two influential primate classifications logically aligned. doi:10.1093/sysbio/syw023

Page 25: Franz ludaescher tdwg 2016 an update on taxonomic concept reasoning

• Alignment visualization: "grey means taxonomically congruent"

Use case 1.a. Aligning Microcebus + Mirza sec. MSW3 (2005)

Page 26: Franz ludaescher tdwg 2016 an update on taxonomic concept reasoning

One name & congruent region

Use case 1.a. Aligning Microcebus + Mirza sec. MSW3 (2005)

• Alignment visualization: "grey means taxonomically congruent"

Page 27: Franz ludaescher tdwg 2016 an update on taxonomic concept reasoning

One name & congruent region

Many names & congruent region

Use case 1.a. Aligning Microcebus + Mirza sec. MSW3 (2005)

• Alignment visualization: "grey means taxonomically congruent"

Page 28: Franz ludaescher tdwg 2016 an update on taxonomic concept reasoning

One name & congruent region

Many names & congruent region

One name & non-congruent regions

Use case 1.a. Aligning Microcebus + Mirza sec. MSW3 (2005)

• Alignment visualization: "grey means taxonomically congruent"

Page 29: Franz ludaescher tdwg 2016 an update on taxonomic concept reasoning

One name & congruent region

Many names & congruent region

One name & non-congruent regions

Many names & non-congruent regions

Use case 1.a. Aligning Microcebus + Mirza sec. MSW3 (2005)

• Alignment visualization: "grey means taxonomically congruent"

Page 30: Franz ludaescher tdwg 2016 an update on taxonomic concept reasoning

One name & congruent region

Many names & congruent region

One name & non-congruent regions

Many names & non-congruent regions

New names & exclusive regions

Use case 1.a. Aligning Microcebus + Mirza sec. MSW3 (2005)

• Alignment visualization: "grey means taxonomically congruent"

Page 31: Franz ludaescher tdwg 2016 an update on taxonomic concept reasoning

One name & congruent region

Many names & congruent region

One name & non-congruent regions

Many names & non-congruent regions

New names & exclusive regions

• Application of coverage constraint: parent-to-parent articulations (><) are fully defined by alignment signal propagated from their respective children.

Sensible when complete sampling of children is intended.

Use case 1.a. Aligning Microcebus + Mirza sec. MSW3 (2005)

• Alignment visualization: "grey means taxonomically congruent"
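The coverage constraint can be sketched with sets: if a parent region is exactly the union of its children, the parent-to-parent articulation follows from the child regions alone. This is a toy illustration, not the Euler/X implementation; the concept labels and region atoms (`r1` to `r4`) are hypothetical.

```python
def rcc5(n, m):
    """RCC-5 symbol for two non-empty regions given as sets."""
    if n == m:
        return "=="
    if n < m:
        return "<"
    if n > m:
        return ">"
    return "><" if n & m else "!"


def parent_region(child_regions):
    """Coverage constraint: the parent is exactly the union of its children."""
    return frozenset().union(*child_regions)


# Hypothetical child concepts, each mapped to the region atoms it occupies.
t1_children = {"sp. X sec. T1": {"r1", "r2"}, "sp. Y sec. T1": {"r3"}}
t2_children = {"sp. X sec. T2": {"r2"}, "sp. Z sec. T2": {"r3", "r4"}}

genus_t1 = parent_region(t1_children.values())
genus_t2 = parent_region(t2_children.values())
print(rcc5(genus_t1, genus_t2))  # ><
```

Here no expert articulation is given for the two parents; the overlap (><) is propagated entirely from the children, which is only sensible when the children are meant to sample the parent completely.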

Page 32: Franz ludaescher tdwg 2016 an update on taxonomic concept reasoning

Use case 1.b.: Quantifying name (identifier) reliability

One name & congruent region

• Alignment visualization: RCC–5 as an identifier assessment tool [good / not]

Many names & congruent region

One name & non-congruent regions

Many names & non-congruent regions

New names & exclusive regions

Page 33: Franz ludaescher tdwg 2016 an update on taxonomic concept reasoning

One name & congruent region

• Alignment visualization: RCC–5 as an identifier assessment tool [good / not]

Many names & congruent region

One name & non-congruent regions

Many names & non-congruent regions

New names & exclusive regions

• Query services rendered: (1) MSW3 destabilizes MSW2; (2) non-congruence is not only caused by differential low-level sampling; (3) alignment constitutes a taxonomic meaning integration map to navigate across MSW3 & MSW2.

Use case 1.b.: Quantifying name (identifier) reliability

Page 34: Franz ludaescher tdwg 2016 an update on taxonomic concept reasoning

1 in 3 names is unreliable across MSW2/MSW3 classifications

Source: Franz et al. 2016. Two influential primate classifications logically aligned. doi:10.1093/sysbio/syw023
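One possible way to operationalize name reliability is sketched below: for every name used in both classifications, look up the RCC–5 relation between its two concepts and count the name as reliable only if that relation is congruence (==). This is an assumption for illustration, not necessarily the metric used in the cited paper; the names and relations are invented toy data.

```python
def name_reliability(same_name_relations):
    """Split shared names by whether their two concepts are congruent (==)."""
    report = {"reliable": [], "unreliable": []}
    for name, relation in sorted(same_name_relations.items()):
        key = "reliable" if relation == "==" else "unreliable"
        report[key].append(name)
    return report


# Toy data: name -> RCC-5 relation between its concept in T1 and in T2.
toy_alignment = {
    "Genus alpha": "==",
    "Genus beta": "<",
    "Genus gamma": "==",
}

report = name_reliability(toy_alignment)
share = len(report["unreliable"]) / len(toy_alignment)
print(report["unreliable"])  # ['Genus beta']
print(f"{share:.2f}")        # 0.33
```

In this toy input one of three shared names changes meaning between the two sources, so relying on the name alone would silently merge non-congruent concepts.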

Page 35: Franz ludaescher tdwg 2016 an update on taxonomic concept reasoning

Use case 1.c.: Reasoning achieves scalability (MIR matrix)

Source: Dang et al. 2015. ProvenanceMatrix: a visualization tool for multi-taxonomy alignments. CEUR Workshop Proceedings 1456: 13–24. http://ceur-ws.org/Vol-1456/paper2.pdf

• Input: 402 articulations. Output: 153,111 Maximally Informative Relations

Salmon cells ↔ reasoning
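The scale of the output follows from the shape of the matrix: it holds one Maximally Informative Relation per cross-taxonomy concept pair. The sketch below shows the pair enumeration on toy labels and checks the arithmetic. The slide does not state the two taxonomy sizes; 317 and 483 are an assumption chosen here because their product equals the reported 153,111.

```python
from itertools import product


def mir_cells(concepts_a, concepts_b):
    """One matrix cell (one MIR to be inferred) per cross-taxonomy concept pair."""
    return list(product(concepts_a, concepts_b))


# Toy input with hypothetical concept labels.
cells = mir_cells(["A1", "A2", "A3"], ["B1", "B2"])
print(len(cells))  # 6

# Assumed taxonomy sizes (not given on the slide); their product matches the output count.
SIZE_A, SIZE_B = 317, 483
print(SIZE_A * SIZE_B)       # 153111
print(round(153_111 / 402))  # 381
```

Under that reading, each expert-provided articulation is leveraged into roughly 381 relations, most of which are inferred by the reasoner rather than asserted by hand.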

Page 36: Franz ludaescher tdwg 2016 an update on taxonomic concept reasoning

Use case 2:

Avian phylogenies sec. Prum et al. (2015)

versus Jarvis et al. (2014)

Page 37: Franz ludaescher tdwg 2016 an update on taxonomic concept reasoning

Source: Thomas, G.H. 2015. An avian explosion. Nature 526: 516–517. doi:10.1038/nature15638

2015 2014

Phylogenetic inferences can vary over time.

Page 38: Franz ludaescher tdwg 2016 an update on taxonomic concept reasoning

Use case 2: Aves sec. Prum et al. (2015) versus Jarvis et al. (2014)

• Sampling is highly differential: 198 versus 48 species-level entities

• Only 12 species-level concept pairs are congruent [green cells]

Page 39: Franz ludaescher tdwg 2016 an update on taxonomic concept reasoning

Use case 2.a.: Psittaciformes with & without coverage constraint

• Psittaciformes sec. 2015 – with global coverage constraint

Input visualization: only disjoint articulations

Page 40: Franz ludaescher tdwg 2016 an update on taxonomic concept reasoning

• Psittaciformes sec. 2015 – with global coverage constraint

• No low-level congruence ↔ no congruent alignment regions

Input visualization: only disjoint articulations

Alignment visualization: 108 MIR; all disjoint

Use case 2.a.: Psittaciformes with & without coverage constraint

Page 41: Franz ludaescher tdwg 2016 an update on taxonomic concept reasoning

• Psittaciformes sec. 2015 – with coverage locally relaxed

Input visualization

Use case 2.a.: Psittaciformes with & without coverage constraint

Page 42: Franz ludaescher tdwg 2016 an update on taxonomic concept reasoning

• Psittaciformes sec. 2015 – with coverage locally relaxed

• "No coverage" constraint for 2014/2015.[Psittacidae, Nestor]

Input visualization

Use case 2.a.: Psittaciformes with & without coverage constraint

Page 43: Franz ludaescher tdwg 2016 an update on taxonomic concept reasoning

• Psittaciformes sec. 2015 – with coverage locally relaxed

• "No coverage" constraint for 2014/2015.[Psittacidae, Nestor]

• Allows for 3 congruent & 7 inclusive RCC–5 articulations

Input visualization

Use case 2.a.: Psittaciformes with & without coverage constraint

Page 44: Franz ludaescher tdwg 2016 an update on taxonomic concept reasoning

• Psittaciformes sec. 2015 – with coverage locally relaxed

• Higher-level congruence despite low-level non-congruence

• 160 MIR: 10 congruent; 65 (inversely) properly inclusive

Alignment visualization

Use case 2.a.: Psittaciformes with & without coverage constraint

Page 45: Franz ludaescher tdwg 2016 an update on taxonomic concept reasoning

• Psittaciformes sec. 2015 – with coverage locally relaxed

• Higher-level congruence despite low-level non-congruence

• 160 MIR: 10 congruent; 65 (inversely) properly inclusive

Alignment visualization

Additional 2015 low-level sampling

Use case 2.a.: Psittaciformes with & without coverage constraint
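Why relaxing coverage permits higher-level congruence can be shown by brute force. In this sketch (an illustration under stated assumptions, not how Euler/X reasons), a parent without coverage may contain unsampled members beyond its sampled children, modelled as any extra atoms from a small hypothetical universe. With coverage, disjoint children force disjoint parents; without it, the children alone no longer decide the parent relation, so an expert articulation such as congruence becomes admissible.

```python
from itertools import chain, combinations


def rcc5(n, m):
    """RCC-5 symbol for two non-empty regions given as sets."""
    if n == m:
        return "=="
    if n < m:
        return "<"
    if n > m:
        return ">"
    return "><" if n & m else "!"


def subsets(items):
    """All subsets of items, as tuples."""
    items = list(items)
    return chain.from_iterable(combinations(items, k) for k in range(len(items) + 1))


def possible_parent_relations(children_a, children_b, universe, coverage):
    """RCC-5 relations the two parents could stand in, given their sampled children."""
    base_a = frozenset().union(*children_a)
    base_b = frozenset().union(*children_b)
    if coverage:
        return {rcc5(base_a, base_b)}
    relations = set()
    for extra_a in subsets(universe - base_a):
        for extra_b in subsets(universe - base_b):
            relations.add(rcc5(base_a | set(extra_a), base_b | set(extra_b)))
    return relations


# Hypothetical atoms: each source sampled a different species of the same clade.
universe = frozenset({"a", "b", "x"})
print(possible_parent_relations([{"a"}], [{"b"}], universe, coverage=True))   # {'!'}
print(sorted(possible_parent_relations([{"a"}], [{"b"}], universe, coverage=False)))
```

The second call returns all five relations, including ==, which mirrors the slide's point: differential low-level sampling need not translate into higher-level non-congruence once coverage is relaxed.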

Page 46: Franz ludaescher tdwg 2016 an update on taxonomic concept reasoning

Use case 2.b.: Alignment of the "Neoavian explosion"

• Aves sec. 2015/2014, down to ordinal level – with coverage locally relaxed

Page 47: Franz ludaescher tdwg 2016 an update on taxonomic concept reasoning

• Aves sec. 2015/2014, down to ordinal level – with coverage locally relaxed

Non-congruence within 2015.Paleognathae

Non-congruence within 2014.Pelecanimorphae

Use case 2.b.: Alignment of the "Neoavian explosion"

Page 48: Franz ludaescher tdwg 2016 an update on taxonomic concept reasoning

• Aves sec. 2015/2014, down to ordinal level – with coverage locally relaxed

Non-congruence within 2015/2014.Neoaves

(see next slide)

Use case 2.b.: Precise semiotics for the "avian explosion"

Page 49: Franz ludaescher tdwg 2016 an update on taxonomic concept reasoning

• Neoaves sec. 2015/2014, and 3–4 less inclusive levels

26 overlapping articulations in the sub-Neoavian alignment region cannot be assigned to differential sampling.

'Genuine' phylogenetic conflict

Use case 2.b.: Precise semiotics for the "avian explosion"

Page 50: Franz ludaescher tdwg 2016 an update on taxonomic concept reasoning

In conclusion:

Achievements, challenges, promise

Page 51: Franz ludaescher tdwg 2016 an update on taxonomic concept reasoning

Taxonomic concept reasoning – now & soon?

• Current reasoning toolkit can typically handle:

• 2-6 input taxonomies at once,

• maximally with ca. 3,200 input concepts.

Page 52: Franz ludaescher tdwg 2016 an update on taxonomic concept reasoning

Taxonomic concept reasoning – now & soon?

• Current reasoning toolkit can typically handle:

• 2-6 input taxonomies at once,

• maximally with ca. 3,200 input concepts.

• Wider adoption is increasingly a matter of making the case, generating will at various levels: publishing systematists, TDWG, aggregators, publishers, etc.

• Theory and reasoning performance are no longer the most pressing limitations.

Page 53: Franz ludaescher tdwg 2016 an update on taxonomic concept reasoning

Taxonomic concept reasoning – now & soon?

• Current reasoning toolkit can typically handle:

• 2-6 input taxonomies at once,

• maximally with ca. 3,200 input concepts.

• Wider adoption is increasingly a matter of making the case, generating will at various levels: publishing systematists, TDWG, aggregators, publishers, etc.

• Theory and reasoning performance are no longer the most pressing limitations.

• Two new applications in planning:

• Integration of taxonomic concept syntax and semantics into Pensoft's "Open Biodiversity Knowledge Management System" (OBKMS).

• Transition of a specimen-based Symbiota flora portal (SERNEC) to utilizing (only) taxonomic concepts and RCC–5 relationships.

Page 54: Franz ludaescher tdwg 2016 an update on taxonomic concept reasoning

Acknowledgements & links to products and references

• TDWG#16 organizers, especially Gail Kampmeier & William Ulate!

• Euler/X & ETC teams (extended): Shawn Bowers, Mingmin Chen, Hong Cui, Parisa Kianmajd, James Macklin, Timothy McPhillips, Robert Morris, Thomas Rodenhausen, and Shizhuo Yu.

• ProvenanceMatrix: Tuan Nhon Dang.

• NSF DEB–1155984, DBI–1342595 (PI Franz).

• NSF IIS–118088, DBI–1147273 (PI Ludäscher).

• Information @ http://taxonbytes.org/tag/concept-taxonomy/

• Euler/X code @ https://github.com/EulerProject/EulerX

Page 55: Franz ludaescher tdwg 2016 an update on taxonomic concept reasoning

Interested in exploring multi-taxonomy & -phylogeny alignments? Please contact me.

[email protected] @taxonbytes

https://biokic.asu.edu/