Post on 05-Jan-2016
VectorBase BRC Overview
Scott Emrich
BRC 2011 – Annual Meeting
UT Southwestern Medical Center
Dallas, TX
26-27 September 2011
VectorBase http://www.vectorbase.org
Scott Emrich (on behalf of the VectorBase consortium), University of Notre Dame
BRC Meeting, September 2011
Upcoming vector genomes
NHGRI White papers
Sandflies: Lutzomyia longipalpis, Phlebotomus papatasi
Anopheles (AGCC): Anopheles arabiensis, Anopheles quadriannulatus, Anopheles merus, Anopheles melas, Anopheles christyi, Anopheles epiroticus, Anopheles stephensi, Anopheles maculatus, Anopheles funestus, Anopheles minimus, Anopheles culicifacies, Anopheles farauti, Anopheles dirus, Anopheles atroparvus, Anopheles albimanus
Glossina: Glossina palpalis, Glossina fuscipes, Glossina pallidipes, Glossina brevipalpis, Glossina austeni, Stomoxys calcitrans, Musca domestica
Simulium: Simulium vittatum, Simulium sirbanum, Simulium damnosum, Simulium ochraceum, Simulium squamosum, Simulium thyolense, Simulium sanctipauli, Simulium woodi, Simulium exiguum, Simulium yahense
Ticks & Mites: Leptotrombidium deliense, Ixodes scapularis*, Dermacentor variabilis, Ornithodoros turicata
Anopheles: Anopheles darlingi*, Anopheles stephensi
Others
Aedes: Aedes albopictus
Culex cluster?
Aedes cluster?
...
Summary of current contents
Species                  Genome  Gene set  Transcriptomics  Gene expression  PopGen
Aedes aegypti            ✓       ✓         ✓                ✓                ✕
Anopheles gambiae        ✓       ✓         ✓                ✓                ✓
Culex quinquefasciatus   ✓       ✓         ✕                ✓                ✕
Glossina morsitans       ✓       ✓         ✓                ✕                ✕
Ixodes scapularis        ✓       ✓         ✕                ✕                ✕
Pediculus humanus        ✓       ✓         ✕                ✕                ✕
Rhodnius prolixus        ✓       ✓         ✓                ✕                ✕
Upcoming challenges
• We expect to receive over 30 vector genomes in the next 1-2 years
• Further, our community is generating “-omics” transcriptome data for emerging genomes that need to be integrated
• To address these issues, we introduced “prerelease” sites
Pre-sites for upcoming genomes
Genome browser, BLAST search
Supporting species without genomic resources
VectorBase RNAseq data
Leslie Vosshall, Rockefeller University
Integrating experimental data
RNA-Seq
Integrating legacy (BRC#1) annotation data
EBI Projection from reference
Aim:
• Gene prediction using a ‘high’ quality reference gene set from a related species.
Overview
• When annotating a species for which we have a closely related reference species, we can align the genomes and project the ‘high’ quality gene set onto the new assembly.
• This is more effective than a similarity build because it allows genes to be built across contigs, regardless of the assembly.
• Whole-genome alignment (WGA) between reference and target using BLASTZ.
• Custom filter to ensure that each bp in the target genome is aligned to no more than one position in the reference genome.
• Project predictions by transforming coordinates between the reference and target assemblies.
Summary
• Effective for low-coverage and poor-quality assemblies.
• Limited to loci that are orthologous between reference and target, i.e. no novel gene prediction.
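The filter and projection steps can be illustrated with a minimal Python sketch. It is not the EBI pipeline: the `Block` class, the greedy score-based filter, and the restriction to ungapped, forward-strand blocks are all assumptions made for illustration, and the coordinates are invented.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Block:
    """Hypothetical ungapped alignment block (0-based, half-open, forward strand)."""
    ref_start: int
    tgt_start: int
    length: int
    score: int

    @property
    def ref_end(self) -> int:
        return self.ref_start + self.length

    @property
    def tgt_end(self) -> int:
        return self.tgt_start + self.length


def filter_unique_target(blocks: List[Block]) -> List[Block]:
    """Keep best-scoring blocks first and drop any block whose target interval
    overlaps a kept one, so each target bp aligns to at most one reference bp.
    (Assumed greedy strategy; the real custom filter is not described.)"""
    kept: List[Block] = []
    for b in sorted(blocks, key=lambda blk: blk.score, reverse=True):
        if all(b.tgt_end <= k.tgt_start or b.tgt_start >= k.tgt_end for k in kept):
            kept.append(b)
    return sorted(kept, key=lambda blk: blk.ref_start)


def project_interval(start: int, end: int, blocks: List[Block]) -> List[Tuple[int, int]]:
    """Transform a reference interval [start, end) into target coordinates,
    returning one piece per alignment block that it overlaps."""
    pieces = []
    for b in blocks:
        lo, hi = max(start, b.ref_start), min(end, b.ref_end)
        if lo < hi:
            shift = b.tgt_start - b.ref_start
            pieces.append((lo + shift, hi + shift))
    return pieces


# Invented example: the second block competes for target bases 1020-1050 and loses.
blocks = [Block(100, 1000, 50, 90), Block(300, 1020, 40, 30), Block(200, 2000, 100, 80)]
kept = filter_unique_target(blocks)
print(project_interval(140, 210, kept))  # [(1040, 1050), (2000, 2010)]
```

An exon that spans an alignment gap comes back as two pieces, which mirrors how projection can build a gene across separate contigs of the target assembly.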
Examples of integrating data
http://funcgen.vectorbase.org/PopulationBETA/
• Still under active development
• Currently > 15k samples from 1600 field collections
UC-Davis data, IR-base data, Neafsey et al. SNP-chip data
GMOD natdiv consortium:
GMOD Natural Diversity module
• Lightweight schema
  – All objects defined by ontologies
• General
  – SO / GO / PATO
• Spp. specific
  – IDOMAL / MIRO
• Flexible
  – Can handle all data from consortium
• Vector spp. & butterflies
• Rice & peaches
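As a rough illustration of "all objects defined by ontologies", the Python sketch below types a sample and each of its properties with an ontology term. It does not reproduce the actual GMOD module tables: the class names, the `values_for` helper, the term names, and the accessions (deliberately marked as placeholders) are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Term:
    """Reference to an ontology term; accessions used below are placeholders."""
    ontology: str
    accession: str
    name: str


@dataclass
class Property:
    """A value whose meaning is given entirely by its ontology term."""
    type: Term
    value: str


@dataclass
class Sample:
    """A sample typed by a term, carrying any number of typed properties."""
    name: str
    type: Term
    properties: List[Property] = field(default_factory=list)

    def values_for(self, ontology: str) -> List[str]:
        """Return the values of all properties typed by the given ontology."""
        return [p.value for p in self.properties if p.type.ontology == ontology]


# Hypothetical terms: real accessions would come from the ontologies themselves.
field_sample = Term("VBCV", "VBCV:PLACEHOLDER1", "field collected sample")
resistance = Term("MIRO", "MIRO:PLACEHOLDER2", "insecticide resistance assay result")
body_part = Term("TGMA", "TGMA:PLACEHOLDER3", "anatomical part sampled")

sample = Sample(
    "example-001",
    field_sample,
    [Property(resistance, "resistant"), Property(body_part, "whole body")],
)
print(sample.values_for("MIRO"))  # ['resistant']
```

Because no property name is hard-coded, a new kind of data needs only new terms rather than a schema change, which is one way to read the "flexible" claim.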
TGMA – Mosquito Anatomy Ontology; CARO/BFO
TADS – Tick Anatomy Ontology; CARO/BFO
MIRO – Ontology of Insecticide Resistance
IDOMAL – Malaria Ontology; extension: transmission
“VBCV” – Ontology/CV for “completion” of PopGen
OPL (Parasite Lifecycle) with Priti Parikh, Chris Stoeckert et al.
New IDO extensions: “IDODEN” (with S. Lozano & R. Scheuermann) and “IDOCHA”
Ontologies hosted by VB
Goal: Anopheles gambiae reference
• Many issues with the PEST assembly as a reference
• S molecular form is proposed as the next reference
Sanger*
Illumina†
454
Hybrid assembly strategy
Metrics of success
• Project existing gene predictions
• De novo prediction in novel regions
• Re-map important datasets
Kolymbari Meeting, July 2011
Anopheles gambiae reference sequence
Validation of the assembly by normal metrics
Emphasis on the concordance with the large-scale restriction map (optical map)
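The slide does not say which metrics were used. As a sketch, assuming N50 is among the "normal metrics" and that optical-map concordance starts from an in silico digest of the assembly, the two calculations might look like this in Python. The function names are hypothetical, and cutting at the start of the recognition site is a simplification.

```python
from typing import List


def n50(lengths: List[int]) -> int:
    """Length L such that contigs of length >= L hold at least half the assembly."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("no contigs given")


def in_silico_fragments(sequence: str, site: str) -> List[int]:
    """Fragment lengths after cutting at the start of every occurrence of `site`.
    Simplified: real enzymes cut at a fixed offset within the site."""
    if not site:
        raise ValueError("recognition site must be non-empty")
    cuts = []
    pos = sequence.find(site)
    while pos != -1:
        cuts.append(pos)
        pos = sequence.find(site, pos + 1)
    bounds = [0] + cuts + [len(sequence)]
    # Drop zero-length fragments (a site at position 0).
    return [b - a for a, b in zip(bounds, bounds[1:]) if b > a]


print(n50([2, 2, 2, 3, 3, 4, 8, 8]))                        # 8
print(in_silico_fragments("AAGAATTCGGGAATTCTT", "GAATTC"))  # [2, 8, 8]
```

The ordered fragment lengths predicted from the assembly can then be compared with the fragment sizes measured on the optical map; large disagreements point to possible misassemblies.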
Acknowledgements
EMBL-EBI: Daniel Lawson, Derek Wilson, Gautier Koscielny, Karyn Megy, Martin Hammond, Daniel Hughes, Ewan Birney, Paul Kersey
Imperial College: Fotis Kafatos, Bob MacCallum, George Christophides, Seth Redmond
Notre Dame: Frank Collins, Nora Besansky, Greg Madey, Rob Bruggner, Nate Konopinski, EO Stinson, Scott Emrich, Andrew Sheehan, Rory Carmichael, Dave Cieslak, Dave Campbell, Ryan Butler, Katie Cybulski, Neil Lobo
Harvard: Bill Gelbart, Susan Russo, Dave Emmert, Pinlei Zhou, Lynn Crosby, Kathy Campbell
IMBB: Kitsos Louis, Pantelis Topalis, Emmanuel Dialynas
New Mexico: Maggie Werner-Washburne, Phil Baker
Sequencers: TIGR/JCVI, WashU, Broad Institute, Baylor
Ensembl