Variation and the VEP: Ensembl Online Webinar series
Denise Carvalho-Silva, Ensembl Outreach team
European Molecular Biology Laboratory
European Bioinformatics Institute
Ensembl online
training series 2016
Course Objectives
What is Ensembl?
What type of data can you get in Ensembl?
How to navigate the Ensembl browser website?
How to connect with Ensembl
This online course
Date | Webinar topic | Instructor
24th March | Introduction to Ensembl | Emily Perry
31st March | Ensembl genes | Denise Carvalho-Silva
7th April | Data export with BioMart | Helen Sparrow
14th April | Variation data in Ensembl and the Ensembl VEP | Denise Carvalho-Silva
21st April | Comparing genes and genomes with Ensembl Compara | Helen Sparrow
28th April | Finding features that regulate genes – the Ensembl Regulatory Build | Emily Perry
5th May | Uploading your data to Ensembl and advanced ways to access Ensembl data | Ben Moore
http://www.ebi.ac.uk/training/events/2016/ensembl-online-training-series-2016
Our Polls: finding out more about you
• Previous webinars
Introduction, Genes and Transcripts, BioMart
• Poll 1: Attendance • Poll 2: Exercises
Structure for this one-hour webinar
Presentation: SNPs, CNVs, SVs available in Ensembl; VEP: tool for variant annotation
Demo: View variants / run the VEP
Exercises: On the Train online course
Questions?
• We’ve muted all the microphones • Ask questions in the Chat box in the webinar interface
• My Ensembl colleagues will respond during my talk
• Please respond with @username
Helen Sparrow Ben Moore Emily Perry
EBI is an Outstation of the European Molecular Biology Laboratory.
Comparative Genomics Gene models
Regulation Variation
Custom data display Programmatic access
Toolkit
Ensembl Features
Module 4: Genetic Variation in Ensembl
Outline
• Classes of variation, species and sources
• Browsing variation data: some entry points: Location tab, Gene tab, Variation tab
• Phenotype and population genetics data
• How to annotate your own variants
Genetic variation
1) Large scale: structural (> 50 base pairs)
duplication, deletion, inversion, translocation, loss
2) Short scale: SNPs (or SNVs), indels
G A C T G A C T A T C G G G G T T T C C C A A A
G A A T G A C T T T C G G - G - T T C C - A A A
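As a minimal sketch of the short-scale classes above, the two rows can be compared column by column. This assumes the rows form a pairwise alignment with the top row as the reference, which the flattened slide does not state explicitly; the function name `classify_columns` is made up for illustration.

```python
# Assumption: the two slide rows are an alignment, top row = reference.
REF_ROW = "GACTGACTATCGGGGTTTCCCAAA"
ALT_ROW = "GAATGACTTTCGG-G-TTCC-AAA"


def classify_columns(ref_row, alt_row):
    """Return (column, ref, alt, kind) for every column where the rows differ."""
    if len(ref_row) != len(alt_row):
        raise ValueError("aligned rows must have the same length")
    calls = []
    for column, (ref, alt) in enumerate(zip(ref_row, alt_row), start=1):
        if ref == alt:
            continue  # identical column, no variant
        if ref == "-":
            kind = "insertion"  # base present only in the second row
        elif alt == "-":
            kind = "deletion"   # base missing from the second row
        else:
            kind = "SNV"        # single-base substitution
        calls.append((column, ref, alt, kind))
    return calls


calls = classify_columns(REF_ROW, ALT_ROW)
for call in calls:
    print(call)
```

Under that assumption, substitutions are reported as SNVs and gap columns as indels; columns are 1-based positions in the alignment, not genomic coordinates.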
Species with variation data
Understand the types of genetic variation data and how to view them in the context of our genomes
Sources of variation data
• Import alleles and frequencies
• Annotate variants
http://www.ensembl.org/info/docs/variation/sources_documentation.html
Location tab: across a region
SVs SNPs
Ensembl genes
Gene tab: gene-centric SNPs
SVs
Variation tab: variant centric
summary data
SNP or SV
Variants on the karyotype
Phenotype data in Ensembl species and sources
Population data for variants
http://hapmap.ncbi.nlm.nih.gov/
http://www.1000genomes.org
pie charts: 1KG super populations
Human Population Genetics
Coffee intake is a worldwide phenomenon,
with Finland at the top and the UK in 44th
place. Is caffeine consumption in our genes?
A) What are the chromosome locations of variants associated with this phenotype?
B) Which variant has the most significant association?
C) What is the ancestral allele of this variant? Is it conserved in eutherian mammals?
D) What is the most frequent allele in Great Britain?
E) Can you download this variant and 200 nt of upstream and downstream flanking sequence in RTF (Rich Text Format)?
Live demo
You can annotate your SNPs and SVs too!
• Variant Effect Predictor
• Different input formats
• SIFT/PolyPhen for missense variants
PMID: 20562413
Perl script Web interface REST API
XML
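For the REST API route listed above, a hedged sketch of how a single-variant request URL could be assembled. The path pattern `vep/:species/region/:region/:allele` and the GRCh37 server name reflect my understanding of the public Ensembl REST service and should be checked against its current documentation; `vep_region_url` is a hypothetical helper, and the coordinates are those of the first tutorial variant later in this transcript. No request is sent.

```python
# Assumption: GRCh37 requests go to a dedicated REST server.
GRCH37_SERVER = "https://grch37.rest.ensembl.org"


def vep_region_url(chrom, start, end, allele, strand=1,
                   species="human", server=GRCH37_SERVER):
    """Build a VEP region URL for one alternative allele (hypothetical helper)."""
    region = f"{chrom}:{start}-{end}:{strand}"
    return (f"{server}/vep/{species}/region/{region}/{allele}"
            "?content-type=application/json")


# Chromosome 9, position 131084628, alternative allele A, forward strand.
url = vep_region_url(9, 131084628, 131084628, "A")
print(url)
```

The URL could then be fetched with any HTTP client; the JSON response layout is not described in these slides, so it is not parsed here.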
CODING Synonymous
INTRONIC 5’ UTR
ATG AAAAAAA
Regulatory
Splice sites
CODING Missense
3’ UTR 5’ Upstream 3’ downstream
Mapping variants on transcripts
Identify transcripts that overlap variants and predict the consequence of these on Ensembl (or RefSeq) transcripts using
Consequence terms for variants
www.ensembl.org/info/genome/variation/predicted_data.html#consequence_type_table
* defined by the Sequence Ontology (SO) project (http://www.sequenceontology.org/)
Consequence: missense GAG >GGG Glu > Gly
SIFT sift.jcvi.org/
PolyPhen-2 genetics.bwh.harvard.edu/pph2/ Condel
dbNSFP
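The missense example above (GAG > GGG, Glu > Gly) comes down to translating the reference and alternative codons and comparing the amino acids. The sketch below is a simplified illustration using the standard genetic code and four Sequence Ontology terms; it is not how the VEP itself is implemented, and `coding_consequence` is a made-up name.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (one-letter amino acids).
BASES = "TCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def coding_consequence(ref_codon, alt_codon):
    """Return (SO term, ref amino acid, alt amino acid) for a codon change."""
    ref_aa = CODON_TABLE[ref_codon.upper()]
    alt_aa = CODON_TABLE[alt_codon.upper()]
    if ref_aa == alt_aa:
        term = "synonymous_variant"
    elif alt_aa == "*":
        term = "stop_gained"
    elif ref_aa == "*":
        term = "stop_lost"
    else:
        term = "missense_variant"
    return term, ref_aa, alt_aa


print(coding_consequence("GAG", "GGG"))  # Glu (E) to Gly (G)
print(coding_consequence("GAG", "GAA"))  # Glu (E) to Glu (E)
```

Only missense results are scored by SIFT and PolyPhen; a synonymous change such as GAG > GAA leaves the protein sequence unchanged.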
Ensembl tools http://www.ensembl.org/tools.html
http://www.ensembl.org/vep
Inputting data into the
Chromosome Start End Alleles Strand
Output options in the
GAG > GGG Glu > Gly
GAG > GAA Glu > Glu
Queued Running Done Failed
Save to your account (log in) Edit and resubmit your job
Delete job
Ticket system in the
Ticket identifier Job name
Viewing the results
SO consequence terms*
*http://www.sequenceontology.org/index.html
ensembl.org/info/docs/tools/vep/online/results.html#summary
Table • Before / after filtering • novel / existing variants
Pie charts (consequence terms) • total observed (more than one per variant) • Separate chart: coding consequences
Viewing the results
Navigate results (one row per variant/ transcript overlap)
Show/hide columns in results table more columns: scroll right
• Download results • Send results to BioMart
Create and edit filters
ensembl.org/info/docs/tools/vep/online/results.html#table
results table
Filters consist of three components:
Field • e.g. Consequence, biotype
Operator • e.g. is, matches (partial string matches)
Value • the value to compare against • some fields have autocomplete values
Multiple filters allowed with logical relationship (AND, OR)
Active filters can be edited too!
ensembl.org/info/docs/tools/vep/online/results.html#filter
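The field / operator / value structure described above can be sketched as a small filter over result rows. This only illustrates the logic, not the web interface's code; the rows are invented placeholders rather than real VEP output, and only the two operators named on the slide are modelled.

```python
# Operators named on the slide: exact match and partial string match.
OPERATORS = {
    "is": lambda cell, value: cell == value,
    "matches": lambda cell, value: value in cell,
}


def apply_filters(rows, filters, mode="AND"):
    """Keep rows passing all (AND) or any (OR) of the (field, operator, value) filters."""
    if mode not in ("AND", "OR"):
        raise ValueError("mode must be 'AND' or 'OR'")
    combine = all if mode == "AND" else any
    return [
        row for row in rows
        if combine(OPERATORS[op](str(row.get(field, "")), value)
                   for field, op, value in filters)
    ]


# Invented example rows, one per variant/transcript overlap.
rows = [
    {"Consequence": "missense_variant", "Biotype": "protein_coding"},
    {"Consequence": "intron_variant", "Biotype": "protein_coding"},
    {"Consequence": "synonymous_variant", "Biotype": "nonsense_mediated_decay"},
]

coding_rows = apply_filters(
    rows, [("Consequence", "matches", "variant"), ("Biotype", "is", "protein_coding")])
either_rows = apply_filters(
    rows, [("Consequence", "is", "missense_variant"),
           ("Biotype", "is", "nonsense_mediated_decay")], mode="OR")
print(len(coding_rows), len(either_rows))
```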
Filtering the results
I’ve got a list of genetic variants from my resequencing project of a cohort study of breast cancer in Cambridge. The positions are all on chromosome 9, GRCh37 assembly:
131084628 C/A (positive strand)
131085358 C/G (positive strand)
131085196 G/A (positive strand)
1) Do any of these cause a change at the amino acid level?
2) Are these predicted to be deleterious?
3) Can I get the flanking sequence (200 nucleotides both up and downstream)
for the known variants in this set?
Tutorial: VEP
Tutorial: VEP
1) Have I got genomic coordinates? Ensembl default format
9 131084628 131084628 C/A +
9 131085358 131085358 C/G +
9 131085196 131085196 G/A +
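For a longer list, these input lines can be generated rather than typed. A minimal sketch, assuming single-nucleotide variants only (so start equals end) and the column order chromosome, start, end, alleles, strand; `to_default_format` is a hypothetical helper name.

```python
# The three tutorial variants: chromosome, position, reference allele, alternative allele.
VARIANTS = [
    ("9", 131084628, "C", "A"),
    ("9", 131085358, "C", "G"),
    ("9", 131085196, "G", "A"),
]


def to_default_format(chrom, position, ref, alt, strand="+"):
    """Format one SNV as 'chromosome start end ref/alt strand'."""
    return f"{chrom} {position} {position} {ref}/{alt} {strand}"


lines = [to_default_format(*variant) for variant in VARIANTS]
print("\n".join(lines))
```

Insertions and deletions use different start and end conventions, so this helper should not be reused for them without checking the VEP input documentation.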
VEP video
http://tinyurl.com/vep-video
Things to bear in mind
1) No distinction between polymorphisms and mutations. Exception: HGMD and COSMIC, where all entries are mutations;
2) C/T → the first allele is the one in the reference genome, not necessarily the major or the ancestral allele;
3) Ensembl reports all alleles on the forward strand (different from dbSNP).
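Point 3 matters when comparing allele strings from a source that reports on the reverse strand: each allele has to be reverse-complemented before it can be matched against Ensembl. A small sketch of that conversion; the function name and the accepted strand labels are assumptions made for illustration.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def to_forward_strand(allele_string, strand):
    """Convert an allele string such as 'C/T' to the forward strand."""
    if strand in ("+", 1):
        return allele_string  # already on the forward strand
    if strand not in ("-", -1):
        raise ValueError(f"unknown strand: {strand!r}")
    # Reverse-complement each allele; '-' (a deletion) is left unchanged.
    return "/".join(allele.translate(COMPLEMENT)[::-1]
                    for allele in allele_string.split("/"))


print(to_forward_strand("C/T", "-"))
print(to_forward_strand("C/T", "+"))
```

Complementary pairs such as A/T and C/G read the same after conversion, so their strand cannot be inferred from the alleles alone.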
Next webinar – Comparing genes and genomes
Ensembl allows you to perform detailed analysis of gene models between species. During this webinar we will take a look at the gene trees and homologues of a set of genes, and at whole genome alignments between pairs and groups of species. See you next week, same time.
Helen Sparrow
Course exercises
http://www.ebi.ac.uk/training/online/course/ensembl-browser-webinar-series-2016
This text will be replaced by a YouTube (link to YouKu too) video of the webinar and a pdf of the slides.
The “next page” will be the exercises
A link to exercises and their solutions will appear in the page hierarchy
The “previous page” contains the solutions to previous modules’ exercises
Get help with the exercises
• Use the exercise solutions in the online course
• Join our Facebook group and discuss the exercises with everybody (see the online course for the link)
• Email us [email protected]
Connect with Ensembl
www.youtube.com/user/EnsemblHelpdesk www.ensembl.org/info/genome/genebuild/index.html
Acknowledgements: The Entire Ensembl Team
Funding
Co-funded by the European Union