Page 1: April 2006 March 2007 Xosé Mª Fernández European Bioinformatics Institute Browsing Genomes with Ensembl.

April 2006 March 2007

Xosé Mª Fernández
European Bioinformatics Institute

Browsing Genomes with Ensembl

Page 2:

Outline of talk

• Overview of Ensembl
• Making genomes useful
• Beyond Ensembl

Page 3:

Outline of talk

• Overview of Ensembl
  – Ensembl - Project
  – Exploring genomes
  – Gene annotation
• Making genomes useful
• Beyond Ensembl

Page 4:

Ensembl - Project

• Joint project
  – EMBL – European Bioinformatics Institute (EBI)
  – Wellcome Trust Sanger Institute
• Produce accurate, automatic genome annotation
• Focused on selected eukaryotic genomes
• Integrate external (distributed) biological data
• Presentation of the analysis to all via the Web at http://www.ensembl.org
• Open distribution of the analysis to the community
• Development of open, collaborative software (databases and APIs)

Page 6:

Beyond classical ab initio gene prediction

• Ensembl automatic gene prediction relies on homology ‘supporting evidence’ to avoid overprediction.
• Classical ab initio gene prediction (e.g. GENSCAN) relies partly on global statistics of protein-coding potential, which the cell itself does not use.
• Genes are just a series of short signals
  – Transcription start site
  – Translation start site
  – 5’ & 3’ intron splicing signals
  – Termination signals
• Short signal sequences are difficult to recognise over background noise in large genomes
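The point about short signals and background noise can be made concrete with a little arithmetic. The sketch below is illustrative only and is not part of the talk: it assumes a random sequence in which the four bases are independent and equally frequent (real genomes are not like this), and the function names are made up for the example. Under that assumption a fixed motif of length k is expected to occur by chance about once every 4^k positions, so a three-base start codon or a six-base signal turns up a very large number of times in a genome of roughly 3 billion bases.

```python
import random


def expected_random_hits(motif_length: int, sequence_length: int) -> float:
    """Expected chance matches to one fixed motif on one strand.

    Assumes independent, equally frequent bases (a deliberate simplification).
    """
    positions = max(sequence_length - motif_length + 1, 0)
    return positions * 0.25 ** motif_length


def count_hits(sequence: str, motif: str) -> int:
    """Count occurrences of motif in sequence, including overlapping ones."""
    count = 0
    start = sequence.find(motif)
    while start != -1:
        count += 1
        start = sequence.find(motif, start + 1)
    return count


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the run is repeatable
    random_dna = "".join(rng.choice("ACGT") for _ in range(100_000))
    print("ATG observed in 100 kb of random sequence:", count_hits(random_dna, "ATG"))
    print("ATG expected in 100 kb:", expected_random_hits(3, 100_000))
    # Approximate size of the human genome, used only for scale.
    print("6-base signal expected in 3 Gb:", expected_random_hits(6, 3_000_000_000))
```

Because chance matches vastly outnumber real signals, a predictor that looks at signals alone will overpredict, which is the motivation for requiring supporting evidence.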

Page 8:

Ensembl v43

Page 10:

DAS Registry

http://www.dasregistry.org

Page 11:

DAS

Page 13:

Pre! and Archive! sites

http://pre.ensembl.org

http://www.ensembl.org

http://archive.ensembl.org

Page 15:

Open source open standards

• Object model
  – standard interface makes it easy for others to build custom applications on top of Ensembl data
• Open discussion of design ([email protected])
• Most major pharma and many academics represented on mailing list and code is being actively developed externally
• Ensembl locally
  – Both industry & academia

Page 16:

Ensembl – Open source

Page 18:

APIs

• Used to retrieve data from and to store data in Ensembl databases.
• Ensembl Perl API:
  – Written in Object-Oriented Perl,
  – Foundation for the Ensembl Pipeline and Ensembl Web interface.

Page 19:

• Overview of Ensembl
  – Ensembl - Project
  – Exploring genomes
  – Gene annotation
• Making genomes useful
• Beyond Ensembl

Page 20:

Making genomes useful

• Interpretation
  – Where are the interesting parts of the genome?
  – What do they do?
  – How are they related to elements in other genomes?
• Access
  – for bench biologists
  – for non-programming mid-scale groups
  – for good programming groups

Page 21:

Access… bench biologists

• Mainly via the web
• Web site designed for non-programming biologists who are not that genome-aware
  – Simple things to find are simple to find
  – Graphical displays and overviews
  – Consistency of layout, colour and text

Page 22:

Ensembl

[Diagram; labels: Analysis DB, CPU, Final DB, Supporting Databases, SNP, Manual Annotation]

Page 23:

Genome browsing: why present the whole genome?

• Explore what is in a chromosome region
• See features in and around a specific gene
• Search & retrieve across the whole genome
• Investigate genome organization
• Compare to other genomes

Page 24:

Introduction to the Ensembl web site

Ensembl …

• takes genomic sequence assemblies
  – human build 36, mouse, rat, mosquito…
• adds annotation and links
  – automated process
• presents all the data on a web site

Page 25:

Basic Genome Annotation

• Genes
  – Genomic location
  – Gene model structures
    • Exons
    • Introns
    • UTRs
  – Transcript(s)
    • Pseudogenes
    • Non-coding RNA
  – Protein(s)
  – Links to other sources of information
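The gene model structure listed above (a gene with a genomic location, one or more transcripts, and exons, introns and UTRs within each transcript) maps naturally onto a small data structure. The following is a minimal sketch for illustration, not the Ensembl schema or API: the class and field names are invented here, coordinates are assumed to be 1-based and inclusive, and introns and UTR length are derived from the exons and the coding region rather than stored.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Exon:
    start: int  # 1-based, inclusive (assumed convention for this sketch)
    end: int

    @property
    def length(self) -> int:
        return self.end - self.start + 1


@dataclass
class Transcript:
    name: str
    exons: List[Exon]
    coding_start: Optional[int] = None  # None for non-coding transcripts
    coding_end: Optional[int] = None

    def introns(self) -> List[Tuple[int, int]]:
        """Gaps between consecutive exons, as (start, end) pairs."""
        ordered = sorted(self.exons, key=lambda exon: exon.start)
        return [(a.end + 1, b.start - 1) for a, b in zip(ordered, ordered[1:])]

    def spliced_length(self) -> int:
        return sum(exon.length for exon in self.exons)

    def coding_length(self) -> int:
        """Exonic bases that fall inside the coding region."""
        if self.coding_start is None or self.coding_end is None:
            return 0
        total = 0
        for exon in self.exons:
            overlap_start = max(exon.start, self.coding_start)
            overlap_end = min(exon.end, self.coding_end)
            total += max(overlap_end - overlap_start + 1, 0)
        return total

    def utr_length(self) -> int:
        """Exonic bases outside the coding region (0 if non-coding)."""
        if self.coding_start is None or self.coding_end is None:
            return 0
        return self.spliced_length() - self.coding_length()


@dataclass
class Gene:
    name: str
    chromosome: str
    strand: int  # +1 or -1
    transcripts: List[Transcript] = field(default_factory=list)

    def location(self) -> Tuple[str, int, int]:
        """Chromosome plus the span covered by all exons of all transcripts."""
        exons = [exon for t in self.transcripts for exon in t.exons]
        return (self.chromosome, min(e.start for e in exons), max(e.end for e in exons))


# Hypothetical example gene with made-up coordinates.
example_transcript = Transcript(
    name="example-001",
    exons=[Exon(100, 199), Exon(300, 399), Exon(500, 599)],
    coding_start=150,
    coding_end=549,
)
example_gene = Gene("example", "1", 1, [example_transcript])
```

Deriving introns from exons keeps the model consistent: there is no way for a stored intron to disagree with the exon boundaries.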

Page 26:

Advanced Genome Annotation

• Cytogenetic bands
• Polymorphic markers
  – Sequence Tagged Sites (STS)
• Genetic variation
  – Single Nucleotide Polymorphisms (SNPs)
  – Deletion-Insertion Polymorphisms (DIPs)
  – Short Tandem Repeats (STRs)
• Repetitive sequences
• Expressed Sequence Tags (ESTs)
• cDNAs or mRNAs from related species
• Regions of sequence homology
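To illustrate how the three kinds of genetic variation named above differ, here is a small sketch. It is an assumption-laden toy rather than how Ensembl classifies variation: the function names are hypothetical, alleles are plain strings with "-" standing for an empty allele, and a short tandem repeat is measured simply as the longest run of back-to-back copies of a given unit.

```python
def classify_variant(reference: str, alternate: str) -> str:
    """Label a variant as 'SNP', 'DIP' or 'other' from its two alleles.

    '-' is treated as an empty allele (an assumed notation for this sketch).
    """
    ref = reference.replace("-", "")
    alt = alternate.replace("-", "")
    if len(ref) == 1 and len(alt) == 1 and ref != alt:
        return "SNP"  # one base substituted for another
    if len(ref) != len(alt):
        return "DIP"  # bases deleted or inserted
    return "other"


def longest_tandem_run(sequence: str, unit: str) -> int:
    """Largest number of consecutive copies of unit found in sequence."""
    if not unit:
        raise ValueError("unit must be a non-empty string")
    best = 0
    for offset in range(len(sequence)):
        copies = 0
        position = offset
        while sequence.startswith(unit, position):
            copies += 1
            position += len(unit)
        best = max(best, copies)
    return best
```

For example, `classify_variant("A", "G")` gives "SNP", `classify_variant("-", "AT")` gives "DIP", and two individuals whose sequences give different values of `longest_tandem_run(sequence, "CA")` at the same locus would differ at an STR.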

Page 27:

How to get started …

• Species homepage

• Map View

• Text search

• BLAST

• SSAHA

Page 28:

Homepage

Page 29:

MapView

Page 30:

BLAST and SSAHA

See blast hit on genome

Page 31:

Regions, maps and markers

MarkerView

SNPView

GeneSNPView

ContigView

CytoView

SyntenyView

MultiContigView

Page 32:

Ensembl ContigView

Page 33:

ContigView close-up

Transcripts: red & black (Ensembl predictions), blue (Vega) & gold (HAVANA, only in human)

Pop-up menu

Page 34:

ContigView - Navigation

Click and drag mouse to select region

Page 35:

CytoView

Page 36:

GeneSNPView

Page 37:

SNPView

Page 38:

MarkerView

Page 39:

MultiContigView

Page 40:

Genes & gene products

GeneView

TransView

ExonView

ProteinView

FamilyView

GOView

Page 41:

Ensembl GeneView

Page 42:

ExonView

TransView

Page 43:

ProteinView

Page 44:

FamilyView

Page 45:

GOView

Page 46:

Data retrieval

BioMart

Data sets on ftp site

MySQL queries of databases

Perl API access to databases

Export View

Page 47:

ExportView

Page 48:

Help!

• context sensitive help pages - click

• access other documentation via generic home page

• email the helpdesk

Page 49:

Ensembl Team
July 2006

Page 50:

Ensembl Team
March 2007

Leaders: Ewan Birney (EBI), Tim Hubbard (Sanger Institute)

Database Schema and Core API: Glenn Proctor, Andreas Kähäri, Ian Longden, Patrick Meidl

BioMart: Arek Kasprzyk, Damian Smedley, Richard Holland, Syed Haider

Distributed Annotation System (DAS): Eugene Kulesha

Outreach: Xosé M Fernández, Bert Overduin, Giulietta Spudich, Michael Schuster

Web Team: James Smith, Bethan Pritchard, Fiona Cunningham, Anne Parker, Stephen Rice, Steve Trevanion (VEGA), Matt Wood

Comparative Genomics: Abel Ureta-Vidal, Kathryn Beal, Benoît Ballester, Stephen Fitzgerald, Javier Herrero Sánchez, Albert Vilella

Analysis and Annotation Pipeline: Val Curwen, Steve Searle, Bronwen Aken, Julio Banet, Laura Clarke, Sarah Dyer, Jan-Hinnerck Vogel, Kevin Howe, Felix Kokocinski, Stephen Rice, Simon White

Functional Genomics: Paul Flicek, Yuan Chen, Stefan Gräf, Nathan Johnson, Daniel Rios

Zebrafish Annotation: Kerstin Howe, Mario Caccamo, Tina Eyre, Ian Sealy

VectorBase Annotation: Martin Hammond, Dan Lawson, Karyn Megy

Systems & Support: Guy Coates, Tim Cutts, Shelley Goddard

Research: Damian Keefe, Guy Slater, Michael Hoffman, Alison Meynert, Benedict Paten, Daniel Zerbino, Dace Ruklisa

Page 51:

Training... Somewhere near you