Schuh ecn2013 tcn_data_structure

The structure of insect-plant host data as derived from museum collections: An analysis based on data from the NSF-funded Tritrophic Database – Thematic Collections Network (TTD-TCN). Randall T. Schuh, Katja Seltmann, Christine A. Johnson


Page 1

The structure of insect-plant host data as derived from museum collections: An analysis based on data from the NSF-funded Tritrophic Database – Thematic Collections Network (TTD-TCN)

Randall T. Schuh, Katja Seltmann, Christine A. Johnson

American Museum of Natural History

Page 2

TTD-TCN Rationale

“The data captured via ADBC funding will dramatically improve our understanding of the relationships among the more than 11,000 species of North American Hemiptera (scale insects, aphids, leafhoppers, true bugs, and relatives), their food plants, and the wasps that parasitize the hemipterans.”

Page 3

The data we will evaluate today were captured with Arthropod Easy Capture (AEC), a Web-based application developed with NSF Planetary Biodiversity Inventory (PBI) funding and used by the TTD-TCN. AEC is open-source software. It is being implemented as an appliance by the ADBC-funded Home Uniting Bio-collections (HUB, iDigBio), and through that implementation it will be installable with a “one-click” installer. The server code is online at SourceForge:

http://sourceforge.net/projects/arthropodeasy/

Page 4

Graph: Specimen Count by Project (1,144,240)

Page 5

Sources of Insect-Plant Host Data

Page 6

Data on insect-plant relationships are available primarily from labels on insect specimens, as opposed to labels on plant specimens. Substantial amounts of such data were captured for the family Miridae on a worldwide basis under NSF Planetary Biodiversity Inventory funding between 2003 and 2011.

The TTD-TCN is a collaboration among 17 US entomological institutions. The institutional contributions to these two projects (the PBI and the TTD-TCN), measured as numbers of specimen records, are shown in the following graph.

The TTD-TCN is defining the field structure for host data as used by iDigBio and by other Web aggregators such as DiscoverLife.org.

Page 7
Page 8

Choice of Groups for Analysis

Page 9

To evaluate the nature of insect-host plant data derived from collections, we need to examine groups that offer large data sets. The necessary attributes are:

1. Large numbers of specimen records with host information
2. Large numbers of collecting events
3. Substantial diversity of host taxa

At the present time the following taxa in our database meet those criteria:

Page 10

Hemiptera

Sternorrhyncha: Aphididae (4,400 species worldwide)

Auchenorrhyncha: Membracidae (3,200 species worldwide)

Heteroptera: Miridae (11,000 species worldwide)

Raw data for each taxon are distributed as seen in the following four graphs.
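The three selection criteria above can be tallied directly from specimen records. The following is a minimal sketch, not TTD-TCN or AEC code: it assumes each record has been flattened into a hypothetical SpecimenRecord tuple of insect family, collecting-event identifier, and host plant genus (None when the label gives no host), and the toy records are placeholders rather than database values.

```python
from collections import defaultdict
from typing import NamedTuple, Optional


class SpecimenRecord(NamedTuple):
    """Hypothetical flattened specimen record (not the AEC schema)."""
    family: str                 # insect family, e.g. "Miridae"
    event_id: str               # collecting event identifier
    host_genus: Optional[str]   # host plant genus, or None if unrecorded


def summarize_taxa(records):
    """Tally the three selection criteria for each insect family."""
    hosted = defaultdict(int)      # criterion 1: records with host information
    events = defaultdict(set)      # criterion 2: distinct collecting events
    host_taxa = defaultdict(set)   # criterion 3: distinct host genera
    for r in records:
        events[r.family].add(r.event_id)
        if r.host_genus is not None:
            hosted[r.family] += 1
            host_taxa[r.family].add(r.host_genus)
    return {
        fam: {
            "records_with_host": hosted[fam],
            "collecting_events": len(events[fam]),
            "host_genera": len(host_taxa[fam]),
        }
        for fam in events
    }


records = [
    SpecimenRecord("Miridae", "E1", "Larrea"),
    SpecimenRecord("Miridae", "E1", "Larrea"),
    SpecimenRecord("Miridae", "E2", "Quercus"),
    SpecimenRecord("Aphididae", "E3", None),
]
summary = summarize_taxa(records)
print(summary["Miridae"])
```

Cut-off values for "large numbers" and "substantial diversity" are not given on the slide, so any thresholds applied to these tallies would be the analyst's own choice.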

Page 11

Graphs (Membracidae, Aphididae, Miridae, and combined data): Collection Events vs. Year Specimens Collected

Page 12

Graph: Host Records as a Proportion of Collecting Events (categories: without hosts; hosts non-unique; hosts unique)
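The three categories in this graph can be computed per collecting event. This is a minimal sketch under an interpretation the slide does not state: each collecting event contributes one (insect species, host genus) entry, "unique" means the host genus is recorded for that insect species in only one collecting event, and "non-unique" means it recurs across events. The function name and toy data are hypothetical.

```python
from collections import Counter

CATEGORIES = ("without hosts", "hosts non-unique", "hosts unique")


def host_record_proportions(events):
    """events: one (insect_species, host_genus or None) pair per collecting event.

    Returns the proportion of collecting events in each category.
    """
    events = list(events)
    if not events:
        return {c: 0.0 for c in CATEGORIES}
    # Number of collecting events that record each species-host pair
    pair_counts = Counter((sp, host) for sp, host in events if host is not None)
    tally = Counter()
    for sp, host in events:
        if host is None:
            tally["without hosts"] += 1
        elif pair_counts[(sp, host)] > 1:
            tally["hosts non-unique"] += 1  # host repeated across events
        else:
            tally["hosts unique"] += 1      # host seen in only one event
    return {c: tally[c] / len(events) for c in CATEGORIES}


events = [
    ("species_a", "Larrea"),
    ("species_a", "Larrea"),
    ("species_a", "Ambrosia"),
    ("species_b", None),
]
print(host_record_proportions(events))
```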

Page 13

Panels: Miridae, Aphididae, Membracidae

Page 14
Page 15
Page 16
Page 17

Algorithmic Assessment of Data Quality

Page 18

COLLECTING EVENT DATA: The occurrence of an insect species on a plant genus

Compute frequency of occurrence on a particular plant genus

Compare with all insect collecting events on any plant

HEURISTIC DATA: Larvae present? Multiple specimens? Voucher specimen available?

Scores: High, Medium, or Low confidence in insect-plant association

ANALYSIS: Evaluate insect/plant associations with different scores

Modify algorithm to improve fit of model to data based on results
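The first step of this workflow, computing how often an insect species occurs on one plant genus relative to all of its collecting events on any plant, can be sketched as follows. This is a hypothetical reading, not the authors' code: y counts distinct collecting events recording the species on the genus, x is the sum of such counts over every host genus recorded for the species, and f = y / x. Multiple specimens from one event count once, because the unit is the collecting event. The heuristic data and scoring steps are not modeled here.

```python
from collections import defaultdict


def host_frequencies(records):
    """records: (event_id, insect_species, host_genus) tuples, one per specimen.

    Returns {(species, genus): (y, x, f)} where y counts distinct collecting
    events on the genus, x sums y over all host genera of the species,
    and f = y / x.
    """
    events_on_genus = defaultdict(set)
    for event_id, species, genus in records:
        # A set, so several specimens from one event count once
        events_on_genus[(species, genus)].add(event_id)

    totals = defaultdict(int)  # x for each species
    for (species, _genus), event_ids in events_on_genus.items():
        totals[species] += len(event_ids)

    return {
        (species, genus): (len(ids), totals[species], len(ids) / totals[species])
        for (species, genus), ids in events_on_genus.items()
    }


records = [
    ("E1", "species_a", "Larrea"),
    ("E1", "species_a", "Larrea"),   # second specimen, same event
    ("E2", "species_a", "Larrea"),
    ("E3", "species_a", "Larrea"),
    ("E4", "species_a", "Ambrosia"),
]
for pair, (y, x, f) in host_frequencies(records).items():
    print(pair, y, x, f"{f:.0%}")
```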

Page 19

Data: specimen collecting events
Heuristics (biological: larvae, vouchers, # specimens)
x = 1
x = y′ + y
High: f(y) ≥ 15.00% ∧ y ≥ 5
Medium: (f(y) ≥ 2.00% ∧ y ≥ 3) ∨ (f(y) ≥ 15.00% ∧ y ≥ 2)
Low: not high or medium
Analysis
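The thresholds on this slide can be written as a small scoring function. This is a sketch under assumptions the slide does not spell out: y is read as the number of collecting events recording the insect species on a plant genus, y′ (y_prime) as its collecting events on other plants, x = y′ + y, and f(y) = y / x. The pairing of each frequency test with its y test (f(y) ≥ 15% with y ≥ 5 for high; f(y) ≥ 2% with y ≥ 3, or f(y) ≥ 15% with y ≥ 2, for medium) is inferred from the flowchart layout, and the heuristics (larvae, vouchers, specimen numbers) are not modeled because the slide does not say how they adjust a score. Percentages are compared with integer arithmetic to avoid floating-point edge cases at exactly 15% or 2%.

```python
def score_association(y: int, y_prime: int) -> str:
    """Score one insect-species/plant-genus association as high, medium, or low.

    Hypothetical reading of the flowchart thresholds:
      y       -- collecting events of the species on this plant genus
      y_prime -- collecting events of the same species on other plants
    """
    if y < 0 or y_prime < 0:
        raise ValueError("event counts must be non-negative")
    x = y_prime + y  # all collecting events for the species
    if x == 0:
        raise ValueError("species has no collecting events")

    def f_at_least(percent: int) -> bool:
        # f(y) = y / x >= percent / 100, rearranged to stay in integers
        return 100 * y >= percent * x

    if f_at_least(15) and y >= 5:
        return "high"
    if (f_at_least(2) and y >= 3) or (f_at_least(15) and y >= 2):
        return "medium"
    # "not high or medium"; a single collecting event (x == 1) always lands here
    return "low"


print(score_association(5, 10))   # f = 5/15, y = 5
print(score_association(3, 100))  # f = 3/103, y = 3
print(score_association(1, 0))    # single collecting event
```

Under these thresholds a single collecting event can never score above low, which fits the point made later that such associations gain credence only from heuristics or further fieldwork.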

Page 20

Results of Analyses

Page 21
Page 22

Using Larrea (creosote bush) as an example host

Page 23
Page 24

Miridae/Larrea Association Network

Page 25

Miridae/Larrea Association Network with High Confidence
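The two network views differ only in which edges are kept. A minimal sketch, assuming each association has already been scored and is stored as a hypothetical (insect species, plant genus, score) triple; the placeholder species names below are not real data. Filtering on one host genus gives a star-shaped network centred on that genus, and raising the minimum score yields the high-confidence view.

```python
RANK = {"low": 0, "medium": 1, "high": 2}


def association_network(scored, genus, min_score="low"):
    """Return {insect_species: score} for edges to one plant genus
    whose score is at least min_score."""
    return {
        species: score
        for species, plant_genus, score in scored
        if plant_genus == genus and RANK[score] >= RANK[min_score]
    }


scored = [
    ("mirid_a", "Larrea", "high"),
    ("mirid_b", "Larrea", "low"),
    ("mirid_c", "Larrea", "medium"),
    ("mirid_a", "Ambrosia", "low"),
]
print(association_network(scored, "Larrea"))                     # full network
print(association_network(scored, "Larrea", min_score="high"))   # high confidence only
```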

Page 26

Reasons for Low Host Scores and Methods for Improving Data Quality

Page 27

Reasons for Low Scores

1. Actual low host specificity: Indicated when a large number of collecting events are distributed across many plant taxa.

Page 28

2. Movement of adult specimens to alternative food sources: The algorithm reveals apparent vagility when there are multiple hosts and little or no host repetition across collecting events.

Page 29

3. Commingling of specimens in the field: The algorithm reveals this problem when insect specimen numbers are low for a host taxon and the host occurrence is not repeated.

Page 30

4. Mislabeling of host data for insects from a collecting event: Difficult to distinguish from actual polyphagy when all specimens from an event are mislabeled. Often seen as a unique host for a given insect taxon. More fieldwork is needed.

Page 31

5. Single collecting events: Indistinguishable from absolute host fidelity based on multiple events, except that no confidence limit can be assessed. Heuristics such as the presence of larvae and large numbers of specimens lend credence to the presumed association. Resolved only by further fieldwork.
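The patterns described in reasons 1 to 5 can be expressed as simple diagnostic flags for one insect species. This is a hypothetical sketch, not the authors' algorithm: it assumes per-genus counts of collecting events and specimens are available, and the thresholds few_specimens, many_genera, and many_events are illustrative values the slides do not provide. Reason 4 cannot be separated from reason 3 or from genuine polyphagy using counts alone, so the sketch reports them together.

```python
def low_score_flags(host_events, few_specimens=2, many_genera=5, many_events=10):
    """Flag patterns that tend to produce low host scores for one insect species.

    host_events maps plant genus -> (collecting events, specimens).
    Threshold defaults are hypothetical illustration values.
    """
    x = sum(events for events, _ in host_events.values())
    if x == 1:
        # Reason 5: no repetition is possible, so no confidence limit
        return ["single collecting event (reason 5)"]
    flags = []
    repeated = any(events > 1 for events, _ in host_events.values())
    if len(host_events) >= many_genera and x >= many_events:
        flags.append("many events spread over many plant taxa: "
                     "possible low host specificity (reason 1)")
    if len(host_events) > 1 and not repeated:
        flags.append("multiple hosts, no repetition: possible adult vagility (reason 2)")
    for genus, (events, specimens) in sorted(host_events.items()):
        if events == 1 and specimens <= few_specimens:
            flags.append(f"unrepeated host {genus} with few specimens: "
                         "possible commingling or mislabeling (reasons 3-4)")
    return flags


print(low_score_flags({"Larrea": (1, 12)}))
print(low_score_flags({"Larrea": (1, 1), "Ambrosia": (1, 3)}))
```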

Page 32

Implication of Results

Page 33
Page 34

Conclusions

Page 35

1. Insect collections offer substantial data on host relationships even though a majority of the specimens lack such information.

2. Our algorithm demonstrates a method for assessing data quality on a large scale. Our initial analyses show that:

- We can have confidence in a significant proportion of the available information

- The data demonstrate a substantial degree of host specificity in our three target groups.

3. Assessing the degree of host specificity requires a scoring method that takes into account biological attributes, collecting techniques, and approaches to data capture in the field.

Page 36

Acknowledgments

• Participating TCN and PBI Institutions
• iDigBio
• AMNH Database Data-entry Personnel
• Participating TCN Data-entry Personnel
• Michael D. Schwartz
• National Science Foundation