The BARCODE Data Standard: CBOL’s Partnership with the International Nucleotide Sequence Database...
![Page 1: The BARCODE Data Standard: CBOL’s Partnership with the International Nucleotide Sequence Database Collaboration (INSDC) David E. Schindel, Executive Secretary.](https://reader038.fdocuments.net/reader038/viewer/2022102911/5697bff01a28abf838cbae0d/html5/thumbnails/1.jpg)
The BARCODE Data Standard:
CBOL’s Partnership with the International Nucleotide Sequence
Database Collaboration (INSDC)
David E. Schindel, Executive Secretary
National Museum of Natural History
Smithsonian Institution
[email protected]; http://www.barcoding.si.edu
202/633-0812; fax 202/633-2938
Infrastructure of Taxonomy: Fragmented, Disconnected
• Collections and databases of specimens
• Seedbanks, culture/cell line collections
• Compilations of taxonomic names
• Floristic and faunistic surveys/inventories
• Monographs, Taxonomic revisions
• Data repositories (gene sequences, characters, images, trees)
• The (undigitized) Taxonomic Literature
Linking Logical Categories (1): Specimens, Names, Opinions
Journal Publication
Species Name
Voucher Specimen
??
Linking Logical Categories (2): Naming and defining species
Journal Publication
Species Name
Voucher Specimen
Holotype specimens
Linking Logical Categories (3): Establishing species boundaries
Journal Publication
Species Name
Voucher Specimen
??
Species concept beyond holotype
– Paratype series
– Typological versus population thinking
– Genetic lineages
– BSC (hard to apply)
Linking Logical Categories (4): Interpreting species boundaries
Journal Publication
Species Name
Voucher Specimen
??
Other assigned specimens:
• Species philosophy of original author
• Interpretation of user
Databases of Names, Specimens, Species Distributions
Journal Publication
Species Name
Voucher Specimen
Authority files of taxonomic names
Museum databases of associated data
Databases of species occurrences and distribution (OBIS)
DNA Barcodes: A Key Variable for Biodiversity Informatics
Journal Publication
Species Name
Voucher Specimen
Barcode Sequence
Authority files of taxonomic names
Museum databases of associated data
Databases of species occurrences and distribution (OBIS)
CBOL’s Working Groups
• Database: Designing/constructing the Barcode Section of GenBank
• DNA: Protocols for formalin-fixed and old museum specimens; Producing LIMS for dissemination
• Data Analysis: Beyond phenetic methods; population genetics perspective
• (Plants: Initiated discussions of plant barcode gene region(s))
BARCODE Data Standards
• Consultations with GenBank, ITIS, museum database developers, GBIF, ISIS, from 2004
• Consensus results of Front Royal meeting
– GBIF, ITIS, GRIN
– NBII, Species2000, IPNI
– ICZN, ZooRecord, OBIS
• GenBank proposed to International Nucleotide Sequence Database Collaboration (EMBL, DDBJ)
• Approved by CBOL and INSDC mid-2005
Reserved Keyword “BARCODE”
• GenBank reviews records against standard
• Adds keyword “BARCODE” in annotation field
• Can be removed by CBOL
Requirements
• Species name selected from authority
• Sequence from COI or other barcode region approved by CBOL
• Structured link to voucher specimen
• Online access to metadata
• Trace files and quality scores
• Primer sequences and names
• Minimum sequence length (500bp for COI)
• Geographic locality
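The requirements above could be checked mechanically before a record is submitted. The following is only a minimal sketch: the `BarcodeRecord` class, its field names, and the `missing_requirements` helper are hypothetical illustrations, not INSDC qualifiers or GenBank's actual review procedure. Only the 500 bp COI minimum comes from the slide.

```python
from dataclasses import dataclass, field

MIN_COI_LENGTH = 500  # minimum COI length in bp, as stated on the slide


@dataclass
class BarcodeRecord:
    """Hypothetical record; field names are illustrative, not INSDC qualifiers."""
    species_name: str = ""
    gene_region: str = ""
    sequence: str = ""
    voucher_id: str = ""
    metadata_url: str = ""
    trace_files: list = field(default_factory=list)
    primers: list = field(default_factory=list)  # (name, sequence) pairs
    locality: str = ""


def missing_requirements(record, authority_names, approved_regions=("COI",)):
    """Return a list of unmet requirements (an empty list means all checks pass)."""
    problems = []
    if record.species_name not in authority_names:
        problems.append("species name not found in authority list")
    if record.gene_region not in approved_regions:
        problems.append("gene region not approved")
    if not record.voucher_id:
        problems.append("no structured voucher link")
    if not record.metadata_url:
        problems.append("no online metadata")
    if not record.trace_files:
        problems.append("no trace files")
    if not record.primers:
        problems.append("no primer names and sequences")
    if record.gene_region == "COI" and len(record.sequence) < MIN_COI_LENGTH:
        problems.append("COI sequence shorter than 500 bp")
    if not record.locality:
        problems.append("no geographic locality")
    return problems
```

Returning a list of problems rather than a single pass/fail value lets a submitter see every gap at once.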
Recommended fields, added to INSDC at CBOL’s request
• Latitude and longitude
• Name of the identifier
• Name of the collector
• Date of collection
New Data Fields
Latitude/Longitude
Collection date
Collector’s name
Identifier’s name
BARCODE Keyword in GenBank
BARCODE Records in INSDC
• Barcode Sequence: trace files, primers
• Voucher Specimen
• Species Name
– Indices: Catalogue of Life, GBIF/ECAT
– Nomenclators: Zoo Record, IPNI, NameBank
– Publication links: new species
– Databases: provisional sp.
• Specimen Metadata: georeference, habitat, character sets, images, behavior, other genes
• Literature (link to content or citation)
• Other databases: phylogenetic, pop’n genetics, ecological
Structured link to Vouchers
[Diagram repeated from the “BARCODE Records in INSDC” slide: Barcode Sequence, Voucher Specimen, Species Name, Specimen Metadata, Literature, and their linked resources]
What constitutes a voucher?
• Long-term reference tied to BARCODE
• Corroborates the species identification
• Provides additional tissue
• CBOL relies on community decisions:
– Full specimen?
– Parts for morphologic features (e.g., feather?)
– Frozen tissue?
– E-Vouchers for large specimens, destructive samples, catch-and-release?
Where’s the voucher?
Linking to Vouchers
Structured Voucher IDs
Voucher Specimen ID
• Based on Darwin Core
• Eventually will be replaced by GUID
• Triplet:
Institution Acronym : Collection : Specimen #
NMNH : FISH : 123456
• CBOL, GBIF and NCBI discussing global registry of:
– Institutional acronyms
– Collection codes
– “Pre-accession” specimen IDs
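The voucher triplet (institution acronym, collection, specimen number) is easy to handle in software. The sketch below is an assumption-laden illustration: `VoucherID`, `parse_voucher_id` and `format_voucher_id` are hypothetical names, and the only rule enforced is "three non-empty colon-separated fields", which the slide implies but does not spell out.

```python
from typing import NamedTuple


class VoucherID(NamedTuple):
    """Hypothetical container for the triplet shown on the slide."""
    institution: str
    collection: str
    specimen: str


def parse_voucher_id(text):
    """Split 'Institution : Collection : Specimen #' into its three parts."""
    parts = [part.strip() for part in text.split(":")]
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected three non-empty fields, got {text!r}")
    return VoucherID(*parts)


def format_voucher_id(voucher):
    """Join the three parts back into a compact colon-separated string."""
    return ":".join(voucher)


# The slide's own example triplet.
voucher = parse_voucher_id("NMNH : FISH : 123456")
print(voucher.institution, voucher.collection, voucher.specimen)
```

A registry of institutional acronyms and collection codes, as discussed on the slide, would let the parser also confirm that the first two fields are recognized values.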
Link to Species Names
[Diagram repeated from the “BARCODE Records in INSDC” slide: Barcode Sequence, Voucher Specimen, Species Name, Specimen Metadata, Literature, and their linked resources]
Species names in INSDC
NCBI Taxonomy Browser
The good, the bad, and the ugly
• Species names provided by submitters
• Checked against compilations
• Linkout to Catalogue of Life, other sources
• Names not found added to Taxonomy Browser
• Submitters informed of errors but not forced to make corrections
NCBI Taxonomy Browser
NCBI Taxonomy Browser
Some names have no other source
Other names linked to GBIF and Catalogue of Life…
…and primary data source
Authoritative Species Lists
• Catalogue of Life
• Species lists compiled by barcoding projects
– FISH-BOL from FishBase, CoF
– MBI mosquito catalog
• Nomenclators
• NameBank
• New names in publications
• Eventually, central registries (e.g., ZooBank)
Provisional Species ID
• Uncertain identifications
• Species complexes
• Newly discovered variants
• Ecogenomic samples
• Need general guidelines to ensure:
– Globally unique
– Stable, retrievable
– Can’t be confused with valid species name
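One way to picture the three properties listed above (unique, stable and retrievable, not confusable with a valid name) is a small registry. This is purely a sketch: the slide says guidelines are still needed, so the label format `Genus sp. SOURCE-0001`, the `ProvisionalRegistry` class, and the binomial pattern are all assumptions made for illustration.

```python
import re

# Approximate shape of a Latin binomial such as "Gadus morhua".
BINOMIAL = re.compile(r"^[A-Z][a-z]+ [a-z]+$")


class ProvisionalRegistry:
    """Hypothetical in-memory registry of provisional species IDs."""

    def __init__(self):
        self._issued = {}

    def issue(self, genus, source_code, serial):
        """Create a label, refusing duplicates and binomial look-alikes."""
        label = f"{genus} sp. {source_code}-{serial:04d}"
        # Guard kept in case the label format above is ever changed.
        if BINOMIAL.match(label):
            raise ValueError("label could be read as a valid species name")
        if label in self._issued:
            raise ValueError(f"{label} already issued")
        self._issued[label] = (genus, source_code, serial)
        return label

    def lookup(self, label):
        """Retrieve the components recorded for an issued label."""
        return self._issued[label]
```

A real scheme would need a shared, persistent registry rather than an in-memory dictionary for the IDs to be globally unique and stable.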
BARCODE Records in INSDC
[Diagram repeated from the earlier slide of the same title: Barcode Sequence, Voucher Specimen, Species Name, Specimen Metadata, Literature, and their linked resources]
Improving links to taxonomic journals
Connecting taxonomic articles
Links to Taxonomic Literature
• Library-Laboratory meeting in London, 2005, on electronic access to taxonomic literature
• Led to formation of Biodiversity Heritage Library initiative
• Proactive steps with PubMed to add taxonomic journals to online abstracts
• Aggressive negotiation with publishers of barcoding papers
• Involvement in Encyclopedia of Life
Long-term data curation of BARCODE records
Data records assembled
IDs consistent with other records?
Compliant with BARCODE standards?
Data records released on INSDC
Data records published in BOLD
Community feedback
Update records (audit trail of species names retained)
CBOL control of BARCODE flag
GenBank adds BARCODE flag
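Two parts of this curation flow, the retained audit trail of species names and the BARCODE flag that can be added or withdrawn, can be sketched as a small data structure. `CuratedRecord` and its methods are hypothetical; they illustrate the idea only and do not describe how GenBank or BOLD actually store records.

```python
from dataclasses import dataclass, field


@dataclass
class CuratedRecord:
    """Hypothetical record with a name audit trail and a keyword set."""
    accession: str
    species_name: str
    keywords: set = field(default_factory=set)
    name_history: list = field(default_factory=list)  # (old name, reason) pairs

    def rename(self, new_name, reason):
        """Update the species name, keeping the old one in the audit trail."""
        self.name_history.append((self.species_name, reason))
        self.species_name = new_name

    def set_barcode_flag(self, compliant):
        """Add the BARCODE keyword if compliant, otherwise remove it."""
        if compliant:
            self.keywords.add("BARCODE")
        else:
            self.keywords.discard("BARCODE")
```

Appending to `name_history` before overwriting the name means community feedback can revise an identification without losing what the record was previously called.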
Acknowledgements
Robert Hanner, University of Guelph, Chair of CBOL’s Database Working Group
Scott Federhen, NCBI Taxonomy Browser
Donald Hobern, Head of Informatics, GBIF