Ilene Mizrachi - Opening Plenary
National Center for Biotechnology Information – National Library of Medicine – Bethesda MD 20892 USA
BARCODE SEQUENCE DATAFLOW INTO GENBANK
Ilene Mizrachi
November 30, 2011
Fourth International Barcode of Life Conference
Barcode Project - 2003 and beyond
• Barcode of Life project was initiated in 2003
• INSDC would be the repository for raw and assembled sequence data
• INSDC adopts new source fields to accommodate Barcode metadata requirements
• Barcode of Life Database (BOLD) established as a community workbench and sequencing center
What is a Barcode?
• A global reference library of DNA barcode sequences that is integrated with other systems of biodiversity information (e.g., databases of specimens, species, biogeographic information).
• Mechanism to link DNA sequences to vouchered specimens and valid species names.
• A reserved BARCODE keyword was adopted for data that met strict barcode standards
Barcode Standard
• Formally described species or a provisional label for an unpublished species
• Voucher specimen identifier, preferably in a biorepository using a structured field
• Country code using the controlled vocabulary used by GenBank
• Sequence from a gene region specified by CBOL
  • COI for animals
  • matK and rbcL for plants
  • ITS for fungi
• Contain at least 75% contiguous, high quality bases from within the approved region
• Electropherogram trace files for bidirectional sequencing runs
• Sequences of all forward and reverse primers
• Strongly recommended data elements
  • GPS coordinates
  • Name of the identifier
  • Name of the collector
  • Date of collection
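The required elements of the standard can be sketched as a simple checklist. This is a minimal illustration, not the GenBank compliance tool: the `BarcodeRecord` field names, the `group` labels, the quality threshold `min_q=20`, and the reading of "75% contiguous" as "longest run of good bases divided by read length" are all assumptions made for the example. The country check here only tests presence, not membership in the controlled vocabulary.

```python
from dataclasses import dataclass, field

# Gene regions listed on the slide, keyed by an illustrative group label.
APPROVED_REGIONS = {
    "animal": {"COI"},
    "plant": {"matK", "rbcL"},
    "fungus": {"ITS"},
}


@dataclass
class BarcodeRecord:
    # Field names are hypothetical; they are not GenBank qualifiers.
    species: str = ""
    voucher: str = ""
    country: str = ""
    group: str = ""
    gene: str = ""
    quality: list = field(default_factory=list)  # per-base quality scores
    has_traces: bool = False
    primers: list = field(default_factory=list)


def longest_good_run(quality, min_q):
    """Length of the longest run of consecutive scores >= min_q."""
    best = run = 0
    for q in quality:
        run = run + 1 if q >= min_q else 0
        best = max(best, run)
    return best


def compliance_problems(rec, min_q=20, min_fraction=0.75):
    """Return the required elements that are missing or failing."""
    problems = []
    if not rec.species:
        problems.append("species name or provisional label")
    if not rec.voucher:
        problems.append("voucher specimen identifier")
    if not rec.country:
        problems.append("country")
    if rec.gene not in APPROVED_REGIONS.get(rec.group, set()):
        problems.append("approved gene region")
    # Assumed reading of the 75% rule: longest good run / read length.
    if not rec.quality or (
        longest_good_run(rec.quality, min_q) / len(rec.quality) < min_fraction
    ):
        problems.append("75% contiguous high quality bases")
    if not rec.has_traces:
        problems.append("trace files")
    if not rec.primers:
        problems.append("primer sequences")
    return problems
```

An empty list from `compliance_problems` means every required element in this sketch is present; the strongly recommended elements are deliberately not enforced.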
Compliant Barcode Record
Barcode records in GenBank
Life of an iBOL Record
Submissions from BOLD
Data Sharing Works
http://www.ncbi.nlm.nih.gov/WebSub/?tool=barcode
QA checks at GenBank
To ensure that the sequence data is of high quality, the following checks are run:
• Barcode data element compliance
• Consistency checks such as:
  • reported latitude-longitude falls within cited country
  • collection date has already occurred
• Sequence quality checks
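The two consistency checks named above could look roughly like the following. This is a sketch under stated assumptions, not GenBank's implementation: the function names are hypothetical, and the single bounding box in `COUNTRY_BOXES` is a rough illustrative rectangle. A real check would need actual country boundaries rather than rectangles.

```python
from datetime import date

# Hypothetical lookup: country -> (min_lat, max_lat, min_lon, max_lon).
# The numbers are a coarse illustrative box, not authoritative borders.
COUNTRY_BOXES = {
    "Iceland": (63.0, 67.0, -25.0, -13.0),
}


def lat_lon_in_country(lat, lon, country, boxes=COUNTRY_BOXES):
    """True/False if the point is inside the country's box, None if unknown."""
    box = boxes.get(country)
    if box is None:
        return None  # cannot decide without boundary data
    min_lat, max_lat, min_lon, max_lon = box
    return min_lat <= lat <= max_lat and min_lon <= lon <= max_lon


def collection_date_has_occurred(collected, today=None):
    """A collection date in the future is treated as an error."""
    return collected <= (today or date.today())
```

Returning `None` for an unknown country keeps "could not check" distinct from "failed the check", so such a record can be routed to manual review instead of being rejected.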
Compliance tool
Checking Sequence Quality
• Trim primer sequences
• Check congruence between fwd and reverse reads
• Align sequences to check for gaps
• Translate sequences to check for internal stops
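Three of these steps can be sketched in a few lines; the alignment step is left out. This is only an illustration with hypothetical function names. It assumes exact primer matches, forward and reverse reads that cover the same span at equal length, and the vertebrate mitochondrial genetic code as the default for a COI barcode. Other taxa need a different translation table.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
_STANDARD_AA = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD = {
    "".join(codon): aa
    for codon, aa in zip(product("TCAG", repeat=3), _STANDARD_AA)
}
# Vertebrate mitochondrial code: four codons differ from the standard code.
VERT_MITO = {**STANDARD, "TGA": "W", "ATA": "M", "AGA": "*", "AGG": "*"}

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    return seq.upper().translate(_COMPLEMENT)[::-1]


def trim_primer(seq, primer):
    """Remove the primer only if the read starts with it exactly."""
    seq, primer = seq.upper(), primer.upper()
    return seq[len(primer):] if primer and seq.startswith(primer) else seq


def read_mismatches(forward, reverse):
    """Positions where the forward read and the reverse-complemented
    reverse read disagree (reads assumed to span the same region)."""
    rc = reverse_complement(reverse)
    if len(rc) != len(forward):
        raise ValueError("reads must have equal length in this sketch")
    return [i for i, (a, b) in enumerate(zip(forward.upper(), rc)) if a != b]


def stop_codon_offsets(seq, table=VERT_MITO, frame=0):
    """Nucleotide offsets of stop codons in the given reading frame."""
    seq = seq.upper()
    return [
        i for i in range(frame, len(seq) - 2, 3)
        if table.get(seq[i:i + 3]) == "*"
    ]
```

Because a barcode is an internal fragment of a coding region, any offset returned by `stop_codon_offsets` in the correct frame is a warning sign. Choosing the wrong table produces false alarms: `TGA` is a stop in the standard code but encodes tryptophan in the mitochondrial code used here.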
Updates Are Critical
• Primary data repository: sequence records owned by submitter
• Submitter is responsible for providing additional data and metadata as they become available:
  • Publication
  • Sequence
  • Taxonomy
  • Voucher
• Third party updates are welcome!
Challenges
• If Reference Barcodes are to be used for species identification, phylogenetics, ecological forensics, conservation, and macro-analysis of biodiversity patterns, then the minimal requirement should be (a) high quality sequence, (b) link to specimen, and (c) taxonomic identification
• Need to support rapid data release, including preliminary taxonomic classifications, similar to the “Fort Lauderdale Principles” of the genomics community
• Data updated asynchronously at BOLD and in GenBank. Need to continue work on update channel
• Need to work with communities to devise strict QA tests for plant and fungal Barcodes
Acknowledgements
• Taxonomy Group: Scott Federhen, Conrad Schoch, Lu Sun, Carol Hotton, Detlef Leipe
• GenBank Group: Susan Schafer, Michael Fetchko
• Software Support: Colleen Bollin, Kamen Todorov, Vasuki Gobu