Post on 18-May-2015
The BARCODE Data Standard as a Cross-Cultural Bridge
David E. Schindel, Executive Secretary
National Museum of Natural History
Smithsonian Institution
SchindelD@si.edu; http://www.barcoding.si.edu
202/633-0812; fax 202/633-2938
Gaining Large Scale Through Standards
Are our data meant only for small segregated communities of practice or bigger audiences?
Accelerate progress, economies of scale
– Re-use and new use of data, synthesis, comparative analysis
– Shared hardware and software
– Standardized protocols, easier training and technical assistance
– Applications by non-specialists (regulatory agencies, citizen scientists, K-12 classroom)
www.e-biosphere09.org
Species Identification Matters
Basic research:
– One more character set, but digital and calibrated
– Standardized yardstick for measuring variability and divergence
– Objective comparison across taxa, distance
– Links to Linnean names
– Triage by non-specialists for species discovery
– Ecology of juveniles, gut contents, fecal matter
– Shallow phylogenies showing history of community assemblages
– Subject to weaknesses of any single character (convergence, pseudogenes, introgression, etc.)
Species Identification Matters
Applied research/regulation by non-specialists:
– Agricultural pests/beneficial species
– Endangered/protected species
– Disease vectors/pathogens
– Environmental quality indicators
– Invasive species (e.g., in ballast water)
– Managing for sustainable harvesting
– Consumer protection, ensuring food quality
– Fidelity of seedbanks, culture collections
An Internal ID System for All Animals
[Figure: The Mitochondrial Genome. A typical animal cell, its mitochondrion, and a map of the mtDNA with labeled H-strand and L-strand regions: D-Loop, small ribosomal RNA, ND1, ND2, ND3, ND4, ND4L, ND5, ND6, COI, COII, COIII, cytochrome b, ATPase subunit 6, ATPase subunit 8.]
Non-COI regions for other taxa
Land plants:
– Chloroplast matK and rbcL approved Nov 09
– 70-75% resolving ability, higher in angiosperms
– Non-coding plastid and nuclear regions being explored
Fungi:
– CBOL Working Group met this week in Amsterdam
– Agreed to recommend ITS; 72% effective
Protists:
– CBOL Working Group July meeting, Berlin
How Barcoding Works
PHASE 1: Build a barcode reference library:
– Well-identified specimen
– Tissue subsample
– DNA extraction, PCR amplification
– DNA sequencing
– Data submission to GenBank
PHASE 2: Identify unknowns:
– Any unidentified juvenile, adult, fragment, product
– Tissue sample, DNA, sequencing
– Comparison with sequences in reference library
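The comparison step in Phase 2 can be illustrated with a minimal sketch. It assumes the sequences are already aligned to the same length, uses a simple proportion-of-differences distance, and applies a hypothetical cutoff (`max_distance`); none of these choices comes from the slides, and the toy sequences are invented rather than real COI data.

```python
def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites between two aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    if not a:
        raise ValueError("sequences must not be empty")
    diffs = sum(1 for x, y in zip(a.upper(), b.upper()) if x != y)
    return diffs / len(a)


def identify(query: str, library: dict[str, str], max_distance: float = 0.02):
    """Return (species, distance) for the closest reference sequence.

    Returns (None, distance) when even the closest reference is farther
    away than max_distance (an assumed, illustrative cutoff).
    """
    best_species, best_dist = None, float("inf")
    for species, ref in library.items():
        d = p_distance(query, ref)
        if d < best_dist:
            best_species, best_dist = species, d
    if best_dist > max_distance:
        return None, best_dist
    return best_species, best_dist


# Toy reference library standing in for Phase 1 output (invented sequences).
reference_library = {
    "Species A": "ACGTACGTACGTACGTACGT",
    "Species B": "ACGTTCGTACCTACGAACGT",
}

unknown = "ACGTACGTACGTACGTACGA"  # differs from Species A at one site
match, dist = identify(unknown, reference_library, max_distance=0.1)
print(match, round(dist, 3))
```

A real workflow would query a curated reference library with an alignment-aware search; the sketch only shows the nearest-match idea.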
• Promote barcoding as a global standard
• Build participation
• Working Groups
• BARCODE standard
• International Conferences
• Increase production of public BARCODE records
Networks, Projects, Organizations
Barcode of Life Community
1,264,000 specimens already barcoded from 104,500 species
Barcode of Life Data Systems (BOLD)
University of Guelph
Workbench with 1.27M records, 105K species/OTUs
[Figure: BARCODE Record Flow Chart. Labeled elements: USER, /GenBank. Key: Mirroring, Update Channel, Private Records.]
BARCODE Records in GenBank
Submission of BARCODE Records to EBI and DDBJ
[Figure: iBOL Barcodes By Node. Bar chart of barcodes (axis from 0 to 90,000) for Canada, United States, Europe, China, Australia, Mexico, South Africa, Brazil, New Zealand, Norway, Argentina, India, Costa Rica, Madagascar, Panama, Peru, Pakistan, Russia, Kenya, South Korea, Colombia, Saudi Arabia.]
Barcode Sequence
Voucher Specimen
Species Name
Specimen Metadata
Literature (link to content or citation)
BARCODE Records in INSDC
Indices - Catalogue of Life - GBIF/ECAT
Nomenclators - Zoo Record - IPNI - NameBank
Publication links - New species
Georeference
Habitat
Character sets
Images
Behavior
Other genes
Trace files
Other Databases - Phylogenetic - Pop’n Genetics - Ecological
Primers
Databases - Provisional sp.
Traditional Taxonomy
GSC Minimum Standards (MI*)
Traditional GenBank
Voucher specimen ID XXX XXX
Species ID XXX X X
Identified by XXX
DNA sequence XXX XXX
Gene region XXX
Geographic origin (country, ocean) XXX X
Latitude/Longitude XXX XXX
Collection date, collector name XXX XXX
Trace files XXX XX
Primer information X XX
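The data elements listed on this slide can be modeled as a simple record type. The sketch below is illustrative only: the class name, field names, and the `missing_fields` helper are hypothetical, and treating every element as expected is an assumption. The slide's marks suggest the elements carry different weights across the compared standards, and those weights are not encoded here.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class BarcodeRecord:
    """Hypothetical container for the data elements named on the slide."""
    voucher_specimen_id: Optional[str] = None
    species_id: Optional[str] = None
    identified_by: Optional[str] = None
    dna_sequence: Optional[str] = None
    gene_region: Optional[str] = None
    geographic_origin: Optional[str] = None   # country or ocean
    latitude_longitude: Optional[str] = None
    collection_date: Optional[str] = None
    collector_name: Optional[str] = None
    trace_files: Optional[str] = None
    primer_information: Optional[str] = None


def missing_fields(record: BarcodeRecord) -> list[str]:
    """List the elements that are empty or unset, in declaration order."""
    return [f.name for f in fields(record) if not getattr(record, f.name)]


# Example values are invented for illustration.
record = BarcodeRecord(
    voucher_specimen_id="NHM:LEP:123456",
    species_id="Example species",
    dna_sequence="ACGT",
    gene_region="COI",
)
print(missing_fields(record))
```

A check like this could run before submission to flag incomplete records; which elements are actually mandatory should be taken from the BARCODE standard itself.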
Linkout from GenBank to BOLD
ISBER: 13 May 2009
Linkout from GenBank to Taxonomy
Link from GenBank to Museums
Darwin Core Triplet
Structured Link to Vouchers
Institutional Acronym : Collection Code : Catalog ID
NHM:LEP:123456
personal:DHJanzen:SRNP12345
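A structured voucher link of this kind is easy to split by machine. The following is a minimal sketch that assumes the three parts are written colon-separated (institutional acronym, collection code, catalog ID); the `VoucherTriplet` type and `parse_triplet` function are hypothetical names used for illustration.

```python
from typing import NamedTuple


class VoucherTriplet(NamedTuple):
    institution: str   # institutional acronym, or "personal"
    collection: str    # collection code
    catalog_id: str    # catalog ID within that collection


def parse_triplet(text: str) -> VoucherTriplet:
    """Split 'institution:collection:catalog_id' into its three parts."""
    parts = [p.strip() for p in text.split(":")]
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected institution:collection:catalog_id, got {text!r}")
    return VoucherTriplet(*parts)


print(parse_triplet("NHM:LEP:123456"))
print(parse_triplet("personal:DHJanzen:SRNP12345").collection)
```

Rejecting anything that does not have exactly three non-empty parts keeps malformed voucher strings from silently entering a database.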
NCBI’s Biorepository List
Compiled from Index Herbariorum, literature sources, GenBank submissions
6,936 records
1,177 records with non-unique acronyms
517 homonymous acronyms
374 shared by two records
143 shared by three records
AMNH; Icelandic Institute of Natural History, Akureyri Division; Akureyri; Iceland
AMNH; American Museum of Natural History; New York; USA
UNL; Universidad Autónoma de Nuevo León; Monterrey, Nuevo León; Mexico
UNL; University of Nebraska State Museum; Lincoln, Nebraska; USA
UNL; Centro de Estratigrafia e Paleobiologia da Universidade Nova de Lisboa; Monte de Caparica; Portugal
ZMK; Zoological Museum, Kristiania; Oslo; Norway
ZMK; Zoologisches Museum der Universität Kiel; Kiel; Germany
ZMK; Zoological Museum, Copenhagen; Copenhagen; Denmark
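Homonymous acronyms like these can be found by grouping records on the acronym. The sketch below uses only the eight example rows from the slide; the tuple layout and the `homonymous_acronyms` function are assumptions made for illustration, not the registry's actual schema.

```python
from collections import defaultdict

# (acronym, institution, country) rows taken from the examples above.
records = [
    ("AMNH", "Icelandic Institute of Natural History, Akureyri Division", "Iceland"),
    ("AMNH", "American Museum of Natural History", "USA"),
    ("UNL", "Universidad Autónoma de Nuevo León", "Mexico"),
    ("UNL", "University of Nebraska State Museum", "USA"),
    ("UNL", "Centro de Estratigrafia e Paleobiologia da Universidade Nova de Lisboa", "Portugal"),
    ("ZMK", "Zoological Museum, Kristiania", "Norway"),
    ("ZMK", "Zoologisches Museum der Universität Kiel", "Germany"),
    ("ZMK", "Zoological Museum, Copenhagen", "Denmark"),
]


def homonymous_acronyms(rows):
    """Map each acronym used by more than one institution to those institutions."""
    by_acronym = defaultdict(list)
    for acronym, institution, country in rows:
        by_acronym[acronym].append((institution, country))
    return {a: insts for a, insts in by_acronym.items() if len(insts) > 1}


clashes = homonymous_acronyms(records)
for acronym, insts in clashes.items():
    print(acronym, len(insts))
```

The counts on the preceding slide are internally consistent: 374 + 143 = 517 homonymous acronyms, and 374 × 2 + 143 × 3 = 1,177 records with non-unique acronyms.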
CBOL/GBIF/NCBI Registry of Biorepositories
www.biorepositories.org
Collecting events, specimens
Specimen clustering
Formal naming
Comparisons, concept validation
Taxon concept formation, refinement
BARCODE data release with provisional nomenclature (PLoS)
Specimen data release (GBIF)
Collaborative consensus-building of taxon concepts (CATE)
Accessibility
Two Taxonomic Research Processes
Sharing of non-BARCODE data (ScratchPads)
Long-term data curation of BARCODE records
Data records assembled in BOLD
IDs consistent with other records?
Compliant with BARCODE standards?
Data records released on INSDC
Data records published in BOLD
Community feedback
Update records (audit trail of species names retained)
CBOL control of BARCODE flag
GenBank adds BARCODE flag