VectorBase BRC Overview: Scott Emrich, BRC 2011 Annual Meeting, UT Southwestern Medical Center, Dallas, TX, 26-27 September 2011

  • VectorBase BRC Overview: Scott Emrich, BRC 2011 Annual Meeting, UT Southwestern Medical Center, Dallas, TX, 26-27 September 2011

  • VectorBase (http://www.vectorbase.org): Scott Emrich (on behalf of the VectorBase consortium), University of Notre Dame

  • Upcoming vector genomes (BRC Meeting September 2011)

    NHGRI white papers
    Sandflies: Lutzomyia longipalpis, Phlebotomus papatasi
    Anopheles (AGCC): Anopheles arabiensis, Anopheles quadriannulatus, Anopheles merus, Anopheles melas, Anopheles christyi, Anopheles epiroticus, Anopheles stephensi, Anopheles maculatus, Anopheles funestus, Anopheles minimus, Anopheles culicifacies, Anopheles farauti, Anopheles dirus, Anopheles atroparvus, Anopheles albimanus
    Glossina: Glossina palpalis, Glossina fuscipes, Glossina pallidipes, Glossina brevipalpis, Glossina austeni
    Stomoxys calcitrans, Musca domestica
    Simulium: Simulium vittatum, Simulium sirbanum, Simulium damnosum, Simulium ochraceum, Simulium squamosum, Simulium thyolense, Simulium santipauli, Simulium woodi, Simulium exiguum, Simulium yahense
    Ticks & mites: Leptotrombidium deliense, Ixodes scapularis*, Dermacentor variabilis, Ornithodoros turicata
    Anopheles: Anopheles darlingi*, Anopheles stephensi
    Others: Aedes (Aedes albopictus), Culex cluster?, Aedes cluster?...

  • Summary of current contents

    Data types: Genome, Gene set, Transcriptomics, Gene expression, PopGen
    Species: Aedes aegypti, Anopheles gambiae, Culex quinquefasciatus, Glossina morsitans, Ixodes scapularis, Pediculus humanus, Rhodnius prolixus

  • Upcoming challenges: we expect to receive more than 30 vector genomes in the next 1-2 years.

    Further, our community is generating transcriptome (-omics) data for emerging genomes, and these data need to be integrated.

    To address these issues, we introduced pre-release sites.

  • Pre-sites for upcoming genomes

  • Pre-sites for upcoming genomes: genome browser, BLAST search

  • Supporting species without genomic resources

  • VectorBase RNA-Seq data: Leslie Vosshall, Rockefeller University

  • Integrating experimental data: RNA-Seq

  • Integrating legacy (BRC#1) annotation data: EBI projection from a reference

    Aim: gene prediction using a high-quality reference set from a related species.

    Overview: when annotating a species for which we have a closely related reference species, we can align the two genomes and project the high-quality gene set onto the new assembly. This is more effective than a similarity-based build because it allows genes to be built across contigs regardless of assembly contiguity.
    Whole-genome alignment (WGA) between reference and target using BLASTZ.
    A custom filter ensures that each bp in the target genome is aligned to no more than one position in the reference genome.
    Predictions are projected by transforming coordinates between the reference and target assemblies.

    Summary: effective for low-coverage and poor-quality assemblies. Limited to orthologous loci shared by reference and target, i.e. no novel gene prediction.
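The projection steps above can be sketched in Python. This is a minimal illustration under stated assumptions, not the EBI pipeline: it assumes the BLASTZ output has already been reduced to ungapped blocks (the `Block` record and the function names are hypothetical), enforces the one-position-per-target-base rule with a simple greedy highest-score-first filter, and projects a reference interval block by block, so one gene can land on several target contigs.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Block:
    """Hypothetical ungapped alignment block; 0-based, half-open coordinates."""
    ref_seq: str
    ref_start: int
    target_seq: str
    target_start: int
    length: int
    strand: int  # +1 or -1: target orientation relative to the reference
    score: float


def one_to_one_filter(blocks: List[Block]) -> List[Block]:
    """Keep blocks greedily by descending score so no target base is used twice."""
    kept: List[Block] = []
    claimed: Dict[str, List[Tuple[int, int]]] = {}
    for b in sorted(blocks, key=lambda blk: blk.score, reverse=True):
        start, end = b.target_start, b.target_start + b.length
        taken = claimed.setdefault(b.target_seq, [])
        # Accept only if this target span overlaps nothing already accepted.
        if all(end <= s or start >= e for s, e in taken):
            taken.append((start, end))
            kept.append(b)
    return kept


def project_interval(
    blocks: List[Block], ref_seq: str, start: int, end: int
) -> List[Tuple[str, int, int, int]]:
    """Transform reference interval [start, end) into target segments."""
    segments = []
    for b in blocks:
        if b.ref_seq != ref_seq:
            continue
        s = max(start, b.ref_start)
        e = min(end, b.ref_start + b.length)
        if s >= e:
            continue  # the interval does not touch this block
        lo, hi = s - b.ref_start, e - b.ref_start  # offsets within the block
        if b.strand == 1:
            t_start, t_end = b.target_start + lo, b.target_start + hi
        else:  # reverse strand: offsets are counted from the block's far end
            t_start, t_end = b.target_start + b.length - hi, b.target_start + b.length - lo
        segments.append((b.target_seq, t_start, t_end, b.strand))
    return sorted(segments)


blocks = [
    Block("chrA", 0, "ctg1", 100, 50, 1, 90.0),
    Block("chrA", 50, "ctg2", 0, 50, -1, 80.0),
    Block("chrB", 0, "ctg1", 120, 40, 1, 10.0),  # overlaps ctg1 [100, 150): dropped
]
kept = one_to_one_filter(blocks)
print(project_interval(kept, "chrA", 40, 60))
# [('ctg1', 140, 150, 1), ('ctg2', 40, 50, -1)]
```

Here a reference interval spanning two blocks is split across two target contigs, one of them on the reverse strand. A real pipeline would also handle gaps inside alignments and reassemble projected exons into transcripts; those steps are left out.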

  • Examples of integrating data: http://funcgen.vectorbase.org/PopulationBETA/
    Still under active development; currently more than 15k samples from 1600 field collections
    UC-Davis data, IR-base data, Neafsey et al. SNP-chip data

  • GMOD natdiv consortium:

  • GMOD Natural Diversity module
    Lightweight schema
    All objects defined by ontologies: general (SO / GO / PATO) and species-specific (IDOMAL / MIRO)
    Flexible: can handle all data from the consortium (vector spp. & butterflies; rice & peaches)
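The idea of a lightweight schema in which every object is defined by ontology terms can be illustrated with a toy data model. This is not the Chado Natural Diversity schema itself: the `Term` and `Entity` classes, the `select` helper, the term names and the placeholder accessions are all hypothetical. The point is only that a new species group would need new ontology terms, not new tables or classes.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Term:
    """An ontology term; accessions here are placeholders, not real IDs."""
    ontology: str
    accession: str
    name: str


@dataclass
class Entity:
    """Generic record (e.g. a sample or field collection) typed by a term."""
    kind: Term
    label: str
    props: List[Tuple[Term, str]] = field(default_factory=list)

    def get(self, term: Term) -> Optional[str]:
        """Return the value stored under `term`, or None if absent."""
        for t, value in self.props:
            if t == term:
                return value
        return None


def select(entities: List[Entity], kind: Term, term: Term, value: str) -> List[str]:
    """Labels of entities of a given kind whose `term` property equals `value`."""
    return [e.label for e in entities if e.kind == kind and e.get(term) == value]


# Placeholder terms standing in for general (e.g. PATO) and
# domain-specific (e.g. MIRO) terms; none of these IDs are real.
SAMPLE = Term("example", "example:0001", "field sample")
SPECIES = Term("example", "example:0002", "species")
RESISTANCE = Term("example", "example:0003", "insecticide resistance status")

samples = [
    Entity(SAMPLE, "s1", [(SPECIES, "Anopheles gambiae"), (RESISTANCE, "resistant")]),
    Entity(SAMPLE, "s2", [(SPECIES, "Anopheles gambiae"), (RESISTANCE, "susceptible")]),
]
print(select(samples, SAMPLE, RESISTANCE, "resistant"))  # ['s1']
```

Because both the record type and each property are terms, supporting a new organism group means loading its ontology and reusing the same structures.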

  • Ontologies hosted by VB
    TGMA: Mosquito Anatomy Ontology; CARO/BFO
    TADS: Tick Anatomy Ontology; CARO/BFO
    MIRO: Ontology of Insecticide Resistance
    IDOMAL: Malaria Ontology; extension: transmission
    VBCV: Ontology/CV for completion of PopGen
    OPL (Parasite Lifecycle), with Priti Parikh, Chris Stoeckert et al.
    New IDO extensions: IDODEN (with S. Lozano & R. Scheuermann) and IDOCHA

  • Goal: Anopheles gambiae reference
    Many issues with the PEST assembly as a reference; the S molecular form is proposed as the next reference
    Hybrid assembly strategy: Sanger*, Illumina, 454
    Metrics of success

    Project existing gene predictions; de novo prediction in novel regions; re-map important datasets

  • Anopheles gambiae reference sequence (Kolymbari Meeting July 2011)
    Validation of the assembly by normal metrics
    Emphasis on concordance with a large-scale restriction map (optical map)

  • Acknowledgements

    EMBL-EBI: Daniel Lawson, Derek Wilson, Gautier Koscielny, Karyn Megy, Martin Hammond, Daniel Hughes, Ewan Birney, Paul Kersey

    Imperial College: Fotis Kafatos, Bob MacCallum, George Christophides, Seth Redmond

    Notre Dame: Frank Collins, Nora Besansky, Greg Madey, Rob Bruggner, Nate Konopinski, EO Stinson, Scott Emrich, Andrew Sheehan, Rory Carmichael, Dave Cieslak, Dave Campbell, Ryan Butler, Katie Cybulski, Neil Lobo

    Harvard: Bill Gelbart, Susan Russo, Dave Emmert, Pinlei Zhou, Lynn Crosby, Kathy Campbell

    IMBB: Kitsos Louis, Pantelis Topalis, Emmanuel Dialynas

    New Mexico: Maggie Werner-Washburne, Phil Baker

    Sequencers: TIGR/JCVI, WashU, Broad Institute, Baylor

    Ensembl

  • Still developing visualisations and curating data, but we already have a dev site up and have imported all the UC Davis and IR-base data, as well as the Neafsey data we saw in the genome browser: 15,000 samples. *We entered into a collaboration with a diverse group of other sites, including several other model organism DBs, and GMOD (Generic Model Organism Database; an extension of the DB used in FlyBase). What we came up with:* a SMALL database of under 25 tables (Ensembl core = 78 + variation = 38, 116 combined). It is not species-specific: it uses general ontologies and the ones Kitsos described; for a new species group, plug in a new ontology.

    Very flexible.

    On top of this we've built*

    (*) The Sanger assembly is the NHGRI-funded one, assembled by JCVI using data from JCVI/WashU. () 150x coverage of Illumina from the Broad Institute; Dan Neafsey will update Wed 09:40.

    Normal metrics: N50 values and an extended set of metrics from the Assemblathon; coverage statistics for EST/RNA-Seq and re-sequencing experiments.
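Two of the normal metrics mentioned here are simple to compute. The sketch below (function names are hypothetical) computes N50 from a list of contig or scaffold lengths, and average sequencing depth as total sequenced bases divided by genome size, which is the usual sense of a figure such as 150x Illumina coverage. It does not attempt the extended Assemblathon metric set or per-base coverage from read alignments.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Length L such that contigs of length >= L hold at least half of all bases."""
    ordered = sorted(lengths, reverse=True)
    total = sum(ordered)
    running = 0
    for length in ordered:
        running += length
        if 2 * running >= total:  # crossed the 50% mark
            return length
    return 0  # empty assembly


def mean_depth(read_lengths: Iterable[int], genome_size: int) -> float:
    """Average coverage: total sequenced bases divided by genome size."""
    return sum(read_lengths) / genome_size


print(n50([2, 3, 4, 5, 6, 7, 8, 9, 10]))  # 8
# Toy numbers: 1,000 reads of 150 bp over a 1 kb genome.
print(mean_depth([150] * 1000, 1000))  # 150.0
```

In the N50 example the three longest contigs (10 + 9 + 8 = 27) reach half of the 54 total bases, so N50 is 8.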