Presentation by Randall Schuh, American Museum of Natural History


Page 1: Presentation by Randall Schuh, American Museum of Natural History

NSF ADBC Digitization TCN-TTD
Plants, Herbivores, and Parasitoids
A Model System for the study of Tri-Trophic Associations

Ten months later…

presentation by

Randall Schuh, American Museum of Natural History
Rob Naczi, New York Botanical Garden
Christiane Weirauch, University of California Riverside
Katja Seltmann, American Museum of Natural History

http://tcn.amnh.org

Page 2

The Tri-Trophic Approach
Capturing Data for the Nearctic Biota

•85% of 11,000 Hemiptera from the Nearctic are herbivorous with high host specificity

•Bias in plant groups attacked, e.g., Pinaceae, Poaceae, Asteraceae, Chenopodiaceae, Rosaceae

•Some serious agricultural pests (armored scales, mealybugs, potato leafhoppers, Lygus bugs)

•Vectors of viral and bacterial diseases (green peach aphid is a vector of over 100 plant viruses)

•Parasitic Hymenoptera are beneficial as biological control agents

Page 3

[Map of participating institutions]

Botanical Institutions: MICH, MO, NYBG, EMC, WIS, MIN, KANU, ISC, COLO, MAINE, MU, TEX, ILL, ILLS

Page 4

[Map of participating institutions]

Botanical Institutions: MICH, MO, NYBG, EMC, WIS, MIN, KANU, ISC, COLO, MAINE, MU, TEX, ILL, ILLS

Botanical Data Providers: SEINET, CCH, CPNH

Page 5

[Map of participating institutions]

Botanical Institutions: MICH, MO, NYBG, EMC, WIS, MIN, KANU, ISC, COLO, MAINE, MU, TEX, ILL, ILLS

Botanical Data Providers: SEINET, CCH, CPNH

Entomological Collections: AMNH, CDFA, UCRC, CAS, BPBM, MEM, CMNH, INHS, CUIC, CSUC, TAMU, OSAC, NCSU, SEMC, UDCC, EMEC, UMEC, UKIC

Page 6

Project management

• Steering Committee of 10 PIs + Project Manager
  ▫ Decision-making on overall project goals, directions, and progress

• Full-time Project Manager at AMNH (Katja Seltmann)
  ▫ Day-to-day project management, technical capability, data analysis, training of entomology partners, vetting and upload of authority files, centralized georeferencing

• Full-time Project Coordinator at NYBG (Kim Watson)
  ▫ Training of botany partners, barcoding of NYBG specimens, and label-data capture for all partner institutions

Page 7

Entomological Databasing

Page 8

Streamlined Interface for Rapid Data Entry

Taxon names

Locality data

Collection Events

Specimen Data

Host names

Page 9

Database Attributes
• Web enabled
• Open-source software
• Centralized data storage, backup, and management

Database Benefits
• Single-product management
• Simplified user training
• Centralized authority-file management
• Centralized georeferencing
• Data aggregation shifted to HUB and DiscoverLife.org

Page 10

Authority Files

Botanical
• Tropicos database used across entire project

Entomological
• Published catalogs and unpublished lists from specialists

Objectives
• Present uniform up-to-date taxonomy
• Reduce decision making by data-entry personnel
• Limit entry of new names by data-entry personnel

Page 11

Data Aggregation and Dissemination

leveraging DiscoverLife.org

Page 12
Page 13
Page 14

Approaches to Outreach

AMNH Short Course in Collection Databasing Fundamentals
• Train graduate students through participant-support funding
• Involve students from multiple graduate programs
• Provide fundamentals, including database options, data structures, unique specimen identification, specimen handling, georeferencing, research tools, data dissemination

Undergraduate Research Projects
• REU projects joining project data to student research involvement

Community Outreach
• http://research.amnh.org/pbi/heteropteraspeciespage/

Page 15

Rob Naczi
New York Botanical Garden

Page 16

Botanical Specimen Imaging

Page 17

Insect Specimen Imaging

• Image representative specimens for each species

• Use existing imaging stations at partner institutions

• About 30% of Hemiptera are already imaged

• Expect to produce about 20,000 new images

Page 18

Use of OCR for Populating Botanical Records

Workflow
• JPEGs of specimen sheets batch-cropped to labels
• labels saved as new set of JPEGs, then exported to ABBYY FineReader 11 Corporate Edition
• overnight, labels batch-processed through ABBYY
• each OCR output file saved as individual text file tied to barcode no.
• individual text files merged into Excel spreadsheet, in which data can be searched, grouped, and parsed
• parsed fields pushed to database

Challenges
• increasing accuracy of parsing
• hand-written labels (now experimenting with out-sourcing)

Page 19

Data Storage Issues

Botany
• botanical images are valuable products of our digitization efforts, but also challenges, due to storage demands
• our concern is with long-term storage (archiving) of uncompressed, original images
• have encouraged home institutions of our partners to step up, but some unable/unwilling
• our solution for now is storage on portable drives, but this is a tenuous fix and not reliable enough for truly archival storage

Entomology
• no major issues

Page 20

Christiane Weirauch
University of California Riverside

Page 21

Subcontract Management

Setup
• 7 collaborating institutions, 27 subawards
• Benefit: long-term data capture across >30 institutions

Issues
1) Delays: administrative and accounting issues
2) Database selection: which one to use?
3) Training: onsite versus remote training?
4) Tracking productivity of subawards not using PBI database

Solutions/suggestions
1) Streamlined administrative and accounting procedures
2) Encourage use of a default database; more discussion
3) Combination of onsite and remote training and monitoring
4) Regular contact with subawards

Page 22

Unique Specimen Identifiers (USIs)

AMNH Matrix-code labels

• Setup: Matrix codes (barcode scanner) and string of prefix and 8-digit number (human eye) encode the same unique identifier

• Benefit: Tracking of specimens; connect images to records

• Format: 8-character prefix (acronym and identifier) followed by an 8-digit number, e.g., UCRC_ENT XXXXXXXX

•Non-standard USIs: accepted in the database

• Exceptions: collections that were previously databased without USIs (e.g., Aphidoidea, certain mirid taxa)
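
A check for the standard USI format described above might look like the sketch below. The exact character rules are an assumption: the slides give only the example `UCRC_ENT XXXXXXXX`, so the pattern here (8 uppercase letters or underscores, one space, 8 digits) and the function name `parse_usi` are illustrative, not the project's specification.

```python
import re

# Assumed pattern: an 8-character prefix of uppercase letters and underscores
# (e.g., "UCRC_ENT"), one space, then an 8-digit number.
USI_PATTERN = re.compile(r"^(?P<prefix>[A-Z_]{8}) (?P<number>\d{8})$")


def parse_usi(usi):
    """Return (prefix, number) for a standard USI, or None if non-standard."""
    match = USI_PATTERN.match(usi.strip())
    if match is None:
        return None
    return match.group("prefix"), int(match.group("number"))
```

Returning `None` instead of raising an error reflects the point that non-standard USIs are still accepted in the database; a caller can simply treat them as opaque strings.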

Page 23

Collection Staging
Organizing, sorting, and identifying specimens in preparation for databasing

• Importance: highest identification level and accuracy will yield most useful data for future applications

• Priority: well-curated and well-identified collections

• TTD: limited budget for staging by experts; very successful for, e.g., Miridae and Membracidae

• Issue: routine staging more time-consuming than anticipated

• Possible solution: budget for graduate students or post docs to help with staging (and training/supervision of databasing crew)

Page 24

Tri-trophic concept: Hemiptera, plants, parasitoids

Capture of host data

•New TTD records: 26% with host records (compared to 24% previously databased); added >800 new hosts

Challenges of integrating parasitoid data

•Level of identification of parasitoids (undescribed species; accurate identification requires skilled personnel)

•Level of host identification (e.g., “white fly”)

•Incorporation of host information from secondary sources (e.g., taxonomic literature)?

On the right track; prioritize specimens with quality host records & integrate secondary host information

Page 25

Katja Seltmann
The American Museum of Natural History

Page 26

Efficiency of Data Capture: Insects

• Total as of October 17, 2012 = 198,409
  ▫ Includes Illinois, Texas, and Kansas
  ▫ All 20 subcontracts are digitizing now
  ▫ 53 contributors for TTD-TCN project

Numbers from NHCR database (central database at AMNH – 11 subcontracts)

• $20,000 in equipment costs
• Average time per specimen: 3-3.5 min (range 1.2-6)
• Cost per specimen: $0.93 (includes equipment)
• Peak in July (more hours digitizing)
• 65 collecting events on Christmas Day
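
Reading the reported 3-3.5 min/specimen as time per specimen, the equivalent hourly rate is a one-line conversion. The short Python sketch below only restates that arithmetic; the function name `specimens_per_hour` is made up for the example.

```python
def specimens_per_hour(minutes_per_specimen):
    """Convert a per-specimen handling time (minutes) to an hourly rate."""
    if minutes_per_specimen <= 0:
        raise ValueError("minutes_per_specimen must be positive")
    return 60.0 / minutes_per_specimen


for minutes in (3.0, 3.5):
    print(f"{minutes} min/specimen = {specimens_per_hour(minutes):.1f} specimens/hour")
```

This prints 20.0 and 17.1 specimens/hour, so the reported average corresponds to roughly 17 to 20 specimens per hour of data entry.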

Page 27

Efficiency of Data Capture: Plants
All but three institutions up and running

• As of October 9, 2012 have 102,651 images
  ▫ 3 of 15 institutions not yet begun

• 4 plant collections report:
  ▫ $30,482.51 equipment costs
  ▫ $0.73 per specimen image
  ▫ The unmentioned curator volunteerism: 4-8 hrs/week depending on institution/taxon, ~19 hours a week total

Page 28

Training Methods: Insects (NHCR Database)

• Curators also training (sexing specimens, database)

• Online training via Skype
  ▫ Digitizers clubhouse (building community)
  ▫ Online manuals
  ▫ Online videos
  ▫ Remote training

• Using the central database, data quality can be assessed
  ▫ Flag when new name is entered
  ▫ Flag when more than 10 specimens entered in one min by one person
  ▫ Flag when exact duplicate collecting events or localities (check training)
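
The three quality flags listed above can be sketched as simple checks. This is a hedged illustration in Python, not the NHCR database's implementation: the slides do not describe its schema, so the input shapes (lists of names, `(user, minute)` pairs, and event tuples) and all function names are assumptions.

```python
from collections import Counter


def flag_new_names(entered_names, authority_names):
    """Names entered by digitizers that are absent from the authority file."""
    return sorted(set(entered_names) - set(authority_names))


def flag_rapid_entry(entries, max_per_minute=10):
    """(user, minute) pairs in which one person entered too many specimens.

    entries: iterable of (user, minute) tuples, one per specimen record,
    where minute is the entry timestamp truncated to the minute.
    """
    counts = Counter(entries)
    return sorted(key for key, n in counts.items() if n > max_per_minute)


def flag_duplicate_events(events):
    """Collecting-event records that appear more than once, field for field.

    events: iterable of hashable tuples, e.g. (locality, date, collector).
    """
    counts = Counter(events)
    return sorted(event for event, n in counts.items() if n > 1)
```

Each function returns the items to review rather than rejecting them, which matches the slide's framing of flags as prompts to check training rather than hard validation errors.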

Page 29

Training Methods: Plants

▫ Site visits to subcontract institutions
  Kim Watson, Melissa Tulig
  Install imaging equipment
  Personal involvement

Page 30

Quality Assessment of Transformed Records (NHCR)

Determination

Completeness

Note Language

(A,B,B) ; (A,A,A) ; (A,C,B)

Page 31

Georeferencing: NHCR database
130,000 specimen records

Present total   1487   9134
Canada            14     96
USA             1441   8564
Mexico            32    474

Page 32

Georeferencing: NHCR database

• GEOLocate (North America)
• Discover Life validation
• Centralized and controlled georeferencing (NYBG, AMNH)
• Volunteer georeferencing

Page 33

Difficult data Issues: specimen relationships

Page 34

Difficult data Issues: means for curation?

Page 35

Summary and Predictions:

• over 50,000 locality records from NHCR

• will reach 1 million new specimen records for insects (harder to predict for plants at the moment)

• less than $1 a specimen (inclusive)

• Arthropod (NHCR) data concerns will become more central as other groups come online

Page 36

Thanks to

National Science Foundation
co-PIs and collaborators

http://tcn.amnh.org