VectorBase BRC Overview Scott Emrich BRC 2011 – Annual Meeting UT Southwestern Medical Center Dallas, TX 26-27 September 2011



Page 2

VectorBase (http://www.vectorbase.org)

Scott Emrich (on behalf of the VectorBase consortium), University of Notre Dame

Page 6


BRC Meeting, September 2011

Upcoming vector genomes

NHGRI white papers

• Sandflies: Lutzomyia longipalpis, Phlebotomus papatasi
• Anopheles (AGCC): Anopheles arabiensis, Anopheles quadriannulatus, Anopheles merus, Anopheles melas, Anopheles christyi, Anopheles epiroticus, Anopheles stephensi, Anopheles maculatus, Anopheles funestus, Anopheles minimus, Anopheles culicifacies, Anopheles farauti, Anopheles dirus, Anopheles atroparvus, Anopheles albimanus
• Glossina: Glossina palpalis, Glossina fuscipes, Glossina pallidipes, Glossina brevipalpis, Glossina austeni, Stomoxys calcitrans, Musca domestica
• Simulium: Simulium vittatum, Simulium sirbanum, Simulium damnosum, Simulium ochraceum, Simulium squamosum, Simulium thyolense, Simulium santipauli, Simulium woodi, Simulium exiguum, Simulium yahense
• Ticks & mites: Leptotrombidium deliense, Ixodes scapularis*, Dermacentor variabilis, Ornithodoros turicata
• Anopheles: Anopheles darlingi*, Anopheles stephensi

Others

• Aedes: Aedes albopictus
• Culex cluster?
• Aedes cluster?

...

Page 7

Summary of current contents

Species                 Genome  Gene set  Transcriptomics  Gene expression  PopGen
Aedes aegypti           ✓       ✓         ✓                ✓                ✕
Anopheles gambiae       ✓       ✓         ✓                ✓                ✓
Culex quinquefasciatus  ✓       ✓         ✕                ✓                ✕
Glossina morsitans      ✓       ✓         ✓                ✕                ✕
Ixodes scapularis       ✓       ✓         ✕                ✕                ✕
Pediculus humanus       ✓       ✓         ✕                ✕                ✕
Rhodnius prolixus       ✓       ✓         ✓                ✕                ✕

Page 8

Upcoming challenges

• We expect to receive over 30 vector genomes in the next 1-2 years

• In addition, our community is generating transcriptome and other “-omics” data for these emerging genomes, and these data need to be integrated

• To address these issues, we introduced “prerelease” sites

Page 9

Pre-sites for upcoming genomes

Page 10

Pre-sites for upcoming genomes

Genome browser; BLAST search

Page 11

Supporting species without genomic resources


Page 12

VectorBase RNAseq data

Leslie Vosshall, Rockefeller University

Page 13

Integrating experimental data

RNA-Seq


Page 14

Integrating legacy (BRC#1) annotation data

EBI projection from a reference

Aim:

• Gene prediction using a ‘high’-quality reference set from a related species.

Overview

• When annotating a species that has a closely related reference species, we can align the two genomes and project genes from the ‘high’-quality set onto the new assembly.
• This is more effective than a similarity-based build because it allows genes to be built across contigs, regardless of how fragmented the assembly is.
• Whole-genome alignment (WGA) between reference and target is performed with BLASTZ.
• A custom filter ensures that each base pair in the target genome is aligned to no more than one position in the reference genome.
• Predictions are projected by transforming coordinates between the reference and target assemblies.

Summary

• Effective for low-coverage and poor-quality assemblies.
• Limited to orthologous loci between reference and target, i.e., no novel genes are predicted.
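The slide describes the projection pipeline only at a high level. The Python sketch below illustrates its last two steps (the one-to-one filter and the coordinate transformation) under simplifying assumptions of our own: alignments are already parsed into hypothetical gap-free, same-strand `Block` records, the filter greedily keeps longer blocks, and each exon is projected independently. The names `Block`, `one_to_one_filter`, `project_position` and `project_exons` are illustrative, not EBI code, and BLASTZ output parsing is not shown.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Block:
    """Hypothetical gap-free, same-strand alignment block: reference bases
    [ref_start, ref_start + length) align one-for-one to target bases
    [tgt_start, tgt_start + length) on target sequence `tgt_seq`."""
    ref_start: int
    tgt_seq: str
    tgt_start: int
    length: int


def one_to_one_filter(blocks: List[Block]) -> List[Block]:
    """Greedily keep the longest blocks first, dropping any block that would
    align an already-claimed target base to a second reference position."""
    claimed: Dict[str, List[Tuple[int, int]]] = {}
    kept: List[Block] = []
    for b in sorted(blocks, key=lambda blk: blk.length, reverse=True):
        start, end = b.tgt_start, b.tgt_start + b.length
        intervals = claimed.setdefault(b.tgt_seq, [])
        if all(end <= s or start >= e for s, e in intervals):
            intervals.append((start, end))
            kept.append(b)
    return kept


def project_position(ref_pos: int, blocks: List[Block]) -> Optional[Tuple[str, int]]:
    """Transform a 0-based reference coordinate into (target sequence, coordinate),
    or return None if no kept block covers it."""
    for b in blocks:
        if b.ref_start <= ref_pos < b.ref_start + b.length:
            return b.tgt_seq, b.tgt_start + (ref_pos - b.ref_start)
    return None


def project_exons(exons: List[Tuple[int, int]],
                  blocks: List[Block]) -> List[Tuple[str, int, int]]:
    """Project half-open reference exons one by one; an exon is kept only if its
    first and last bases map, in order, onto the same target sequence."""
    projected: List[Tuple[str, int, int]] = []
    for start, end in exons:
        first = project_position(start, blocks)
        last = project_position(end - 1, blocks)
        if first and last and first[0] == last[0] and last[1] >= first[1]:
            projected.append((first[0], first[1], last[1] + 1))
    return projected


# Usage: one reference gene whose two exons fall on different target contigs.
blocks = [
    Block(ref_start=0, tgt_seq="contig1", tgt_start=100, length=50),
    Block(ref_start=60, tgt_seq="contig2", tgt_start=0, length=40),
    Block(ref_start=200, tgt_seq="contig1", tgt_start=120, length=20),  # overlaps block 1 on target
]
kept = one_to_one_filter(blocks)
print(project_exons([(10, 30), (70, 90)], kept))
# [('contig1', 110, 130), ('contig2', 10, 30)]
```

Because exons are projected individually, the two exons of one reference gene can land on different target contigs, which is the sense in which projection can build genes across contigs. Exons with no orthologous alignment are simply dropped, matching the slide's point that projection cannot predict novel genes. A production pipeline would presumably also need to handle reverse-strand and gapped alignments.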


Page 15

Examples of integrating data

http://funcgen.vectorbase.org/PopulationBETA/

• Still under active development
• Currently more than 15k samples from 1,600 field collections

UC Davis data; IR-base data; Neafsey et al. SNP-chip data

Page 18

GMOD natdiv consortium:

Page 19

GMOD Natural Diversity module

• Lightweight schema: all objects are defined by ontologies
  • General: SO / GO / PATO
  • Species-specific: IDOMAL / MIRO

• Flexible: can handle all data from the consortium
  • Vector species & butterflies
  • Rice & peaches

Page 20

TGMA – Mosquito Anatomy Ontology; CARO/BFO

TADS – Tick Anatomy Ontology; CARO/BFO

MIRO – Ontology of Insecticide Resistance

IDOMAL – Malaria Ontology; extension: transmission

“VBCV” – Ontology/CV for “completion” of PopGen

OPL (Parasite Lifecycle) with Priti Parikh, Chris Stoeckert et al.

New IDO extensions: “IDODEN” (with S. Lozano & R. Scheuermann) and “IDOCHA”


Ontologies hosted by VB

Page 21

Goal: Anopheles gambiae reference

• Many issues with the PEST assembly as a reference
• The S molecular form is proposed as the next reference

Hybrid assembly strategy: Sanger*, Illumina†, 454

Metrics of success:
• Project existing gene predictions
• de novo prediction in novel regions
• Re-map important datasets


Page 22

Kolymbari Meeting, July 2011

Anopheles gambiae reference sequence

Validation of the assembly by standard metrics

Emphasis on concordance with the large-scale restriction map (optical map)
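The slide does not say which standard metrics were used. As a hedged illustration, the Python sketch below computes N50, one widely used contiguity statistic, and performs a naive in-silico restriction digest, which yields the kind of fragment-length list that can be compared against an optical map. The helper names are hypothetical, the cut position is simplified to the start of the recognition site, and the site `GGATCC` in the usage lines is arbitrary; the enzyme actually used for the An. gambiae optical map is not stated here.

```python
from typing import List


def n50(contig_lengths: List[int]) -> int:
    """Length of the contig at which the running total, taken from the longest
    contig down, first reaches half the assembly size (0 for an empty list)."""
    lengths = sorted(contig_lengths, reverse=True)
    half = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half:
            return length
    return 0


def in_silico_digest(sequence: str, site: str) -> List[int]:
    """Cut `sequence` at the start of every (possibly overlapping) occurrence
    of the recognition `site` and return the resulting fragment lengths."""
    seq, motif = sequence.upper(), site.upper()
    cuts: List[int] = []
    pos = seq.find(motif)
    while pos != -1:
        cuts.append(pos)
        pos = seq.find(motif, pos + 1)
    boundaries = [0] + cuts + [len(seq)]
    # Skip zero-length fragments, e.g. when a site starts at position 0.
    return [b - a for a, b in zip(boundaries, boundaries[1:]) if b > a]


# Usage with made-up contig lengths and a toy sequence.
print(n50([100, 80, 50, 30, 20]))                        # 80
print(in_silico_digest("AAGGATCCTTTGGATCCA", "GGATCC"))  # [2, 9, 7]
```

A real optical-map comparison would also have to tolerate fragment-sizing error and missing or extra cut sites, which this sketch ignores.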

Page 23

Acknowledgements

EMBL-EBI: Daniel Lawson, Derek Wilson, Gautier Koscielny, Karyn Megy, Martin Hammond, Daniel Hughes, Ewan Birney, Paul Kersey

Imperial College: Fotis Kafatos, Bob MacCallum, George Christophides, Seth Redmond

Notre Dame: Frank Collins, Nora Besansky, Greg Madey, Rob Bruggner, Nate Konopinski, EO Stinson, Scott Emrich, Andrew Sheehan, Rory Carmichael, Dave Cieslak, Dave Campbell, Ryan Butler, Katie Cybulski, Neil Lobo

Harvard: Bill Gelbart, Susan Russo, Dave Emmert, Pinlei Zhou, Lynn Crosby, Kathy Campbell

IMBB: Kitsos Louis, Pantelis Topalis, Emmanuel Dialynas

New Mexico: Maggie Werner-Washburne, Phil Baker

Sequencers: TIGR/JCVI, WashU, Broad Institute, Baylor

Ensembl
