MES7594-01 Genome Informatics I - Lecture VIII. Interpreting variants Sangwoo Kim, Ph.D. Assistant Professor, Severance Biomedical Research Institute, Yonsei University College of Medicine Genome Informatics I (2015 Spring)

Page 1:

MES7594-01 Genome Informatics I

- Lecture VIII. Interpreting variants

Sangwoo Kim, Ph.D.
Assistant Professor, Severance Biomedical Research Institute, Yonsei University College of Medicine

Genome Informatics I (2015 Spring)

Page 2:

Overview

• Goal of this lecture
– You will learn how to interpret discovered variants, filtering and prioritizing them for an associated phenotype (e.g. disease), and practice doing so

• Predicting functional impact of variants
– Utilizing sequence features
– Utilizing protein features

• Popular methods and practice
– PolyPhen-2
– MutationAssessor
– SeattleSeq

Page 3:

FUNCTIONAL IMPACT OF VARIANTS

Page 4:

We usually have too many variants

Saksena et al., “Developing Algorithms to Discover Novel Cancer Genes: A look at the challenges and approaches”

We want to narrow the number of “called” variants down to as few as possible

Page 6:

A simple mutation calling does not give you the final answer

mutation calling (NGS)

A lot of candidate variants
• some from sequencing error
• some from polymorphisms
• some from mapping error
• some are passengers

A few real pathogenic variants

Page 7:

Gold mining

Bunch of candidate variants

Many variants

A few variants

Strategy I: Do they really exist?
- Any mistakes in sequencing and variant calling?
- Any non-disease-causing polymorphisms?

Strategy II: Are they functional?
- Are they damaging? Pathogenic?
- Are they related to phenotypes?

Page 10:

Five ways to narrow down

Strategy I
1. Include control data
   – eliminate germline variants
2. Use a stricter variant quality threshold
   – work on only confident variants
3. Filter out polymorphisms
   – remove non-damaging polymorphisms

Strategy II
4. Predict functional impacts
   – find damaging levels
5. Use disease-specific knowledge
   – to acquire final candidates

Page 11:

1. Include control data

[Figure: variant counts. Germline: 100,000~500,000; somatic: 100~1000; somatic: 100~1000]

We should eliminate unwanted germline variants
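
As a rough illustration of why a matched control helps, the sketch below removes every tumor call that also appears in the normal sample. The variant tuples and the function name `subtract_germline` are hypothetical, and real somatic callers use statistical models on the read data rather than a plain set difference.

```python
# Hypothetical variant keys: (chromosome, position, ref allele, alt allele).
def subtract_germline(tumor_calls, normal_calls):
    """Keep tumor variants that are absent from the matched normal sample."""
    normal_keys = set(normal_calls)
    return [v for v in tumor_calls if v not in normal_keys]


# Made-up example calls, not real data.
tumor = [
    ("chr22", 100, "A", "T"),
    ("chr22", 250, "G", "C"),
    ("chr22", 900, "C", "T"),
]
normal = [
    ("chr22", 100, "A", "T"),
    ("chr22", 900, "C", "T"),
]

somatic_candidates = subtract_germline(tumor, normal)
print(somatic_candidates)
```

Only the call that is private to the tumor remains as a somatic candidate.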

Page 12:

When controls are unavailable

• Single nucleotide polymorphism rate = 1/100~1/1000

• Whole Genome Sequencing
– Total DNA length = 3 billion bp
– Expected SNP numbers = 3~30 million

• Whole Exome Sequencing
– Total DNA length = 50 million bp
– Expected SNP numbers = 50~500 thousand

• Targeted Sequencing (Panel)
– Total DNA length = 100~1000 thousand bp
– Expected SNP numbers = 100~10,000

• Hotspot Panel (only for very well known variants)
– Controls can be omitted
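
The expected SNP counts follow directly from multiplying the target length by the polymorphism rate. The sketch below redoes that arithmetic; the function name and the two separate panel sizes (100 kb and 1 Mb, the ends of the range on the slide) are illustrative choices.

```python
def expected_snp_range(target_length_bp, low_denominator=1000, high_denominator=100):
    """Expected SNP count for a SNP rate between 1/1000 and 1/100 per base."""
    return target_length_bp // low_denominator, target_length_bp // high_denominator


targets = {
    "Whole genome": 3_000_000_000,
    "Whole exome": 50_000_000,
    "Panel (100 kb)": 100_000,
    "Panel (1 Mb)": 1_000_000,
}

for name, length_bp in targets.items():
    low, high = expected_snp_range(length_bp)
    print(f"{name}: {low:,} to {high:,} expected SNPs")
```

Even a small panel can carry hundreds to thousands of polymorphisms, which is why unmatched data needs a separate polymorphism filter.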

Page 13:

2. Use a stricter quality threshold

• Variant quality

Low variant quality
- This variant (although it has been called) can be false

Causes of low quality
- Low read depth (insufficient observation)
- Bad basecall/mapping quality
- Low allele frequency

Page 14:

2. Use a stricter quality threshold

• Possible actions
– Cut out variants based on
  • Variant quality (e.g. QUAL<10)
  • Total read depth (e.g. <20)
  • Number of alt-depth (e.g. <5)
  • Allele frequency (e.g. <0.1)
– Prioritize variants
  • Sort by variant quality and inspect from the top
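
The two actions above can be sketched as a filter followed by a sort. The thresholds are the example values from the slide; the record layout (`qual`, `depth`, `alt_depth`) and the sample records are assumptions made for illustration, not the output format of any particular caller.

```python
def passes_filters(variant, min_qual=10, min_depth=20, min_alt_depth=5, min_af=0.1):
    """Apply the example cutoffs: QUAL, total depth, alt depth, allele frequency."""
    depth = variant["depth"]
    allele_frequency = variant["alt_depth"] / depth if depth else 0.0
    return (
        variant["qual"] >= min_qual
        and depth >= min_depth
        and variant["alt_depth"] >= min_alt_depth
        and allele_frequency >= min_af
    )


# Made-up calls for illustration only.
calls = [
    {"id": "a", "qual": 50, "depth": 40, "alt_depth": 12},
    {"id": "b", "qual": 8, "depth": 60, "alt_depth": 30},   # fails QUAL
    {"id": "c", "qual": 35, "depth": 15, "alt_depth": 6},   # fails total depth
    {"id": "d", "qual": 90, "depth": 100, "alt_depth": 6},  # fails allele frequency
    {"id": "e", "qual": 70, "depth": 30, "alt_depth": 5},
]

kept = [v for v in calls if passes_filters(v)]
# Prioritize: highest variant quality first.
prioritized = sorted(kept, key=lambda v: v["qual"], reverse=True)
print([v["id"] for v in prioritized])
```

Suitable cutoffs depend on sequencing depth and sample purity, so the defaults here should be treated as starting points.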

Page 15:

3. Filter out polymorphisms

• When you had no control data (panel)
– Check if the variants have been reported as polymorphisms

• When you had control data
– You may not have polymorphisms
  • Because somatic mutation callers remove germline calls
– However, in some cases polymorphisms can still be reported (as somatic mutations)
  • For example, low read depth in the control sample

[Figure labels: low depth, bad region; Variant Undetected; Variant Detected]

Page 17:

dbSNP

• Database of SNPs

chr7:11584142 A>T
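
A minimal sketch of the lookup idea: compare each call against a table of reported polymorphisms and separate the matches. The set `known_polymorphisms` stands in for a local copy of dbSNP records and holds only the slide's example position; the second variant and the function name are made up.

```python
# Stand-in for a table of reported polymorphisms (hypothetical contents).
known_polymorphisms = {("chr7", 11584142, "A", "T")}


def split_by_known(variants, known):
    """Return (reported, novel) lists according to membership in `known`."""
    reported = [v for v in variants if v in known]
    novel = [v for v in variants if v not in known]
    return reported, novel


calls = [
    ("chr7", 11584142, "A", "T"),  # the example from the slide
    ("chr7", 20000000, "G", "A"),  # made-up call, absent from the table
]

reported, novel = split_by_known(calls, known_polymorphisms)
print("reported:", reported)
print("novel:", novel)
```

Matching on position alone is not enough; the alternate allele is part of the key so that a different substitution at the same site is not discarded.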

Page 18:

4. Predict functional impacts

• Types of point mutations
– Coding mutations
  • Synonymous (silent): amino acid unchanged
  • Missense: amino acid changed
  • Nonsense: stop codon gained
  • Readthrough: stop codon lost
– Non-coding mutations
  • Intron
  • Splice variants
  • Variants in regulatory elements
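
The four coding categories can be derived by translating the reference and mutated codons with the standard genetic code. The sketch below does this for a single codon; the function name is illustrative, and it assumes the codons are already given in the coding-strand reading frame.

```python
# Standard genetic code, codons enumerated in T, C, A, G order ("*" = stop).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def classify_coding_change(ref_codon, alt_codon):
    """Classify a codon substitution by its effect on the encoded amino acid."""
    ref_aa = CODON_TABLE[ref_codon.upper()]
    alt_aa = CODON_TABLE[alt_codon.upper()]
    if ref_aa == alt_aa:
        return "synonymous"
    if alt_aa == "*":
        return "nonsense"      # stop codon gained
    if ref_aa == "*":
        return "readthrough"   # stop codon lost
    return "missense"


for ref, alt in [("GAA", "GAG"), ("GAA", "GTA"), ("TGG", "TGA"), ("TAA", "CAA")]:
    print(ref, "->", alt, classify_coding_change(ref, alt))
```

Annotation tools additionally need the transcript model to find the codon and strand for a genomic position, which this sketch leaves out.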

Page 19:

Functional impacts

• Types of indels
– Inframe
  • Insertion or deletion in a multiple of 3 base-pairs
– Frameshift
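
Whether a coding indel is inframe or frameshift depends only on the length change modulo 3. A small sketch, assuming VCF-style REF and ALT allele strings; the function name and example alleles are made up.

```python
def classify_indel(ref_allele, alt_allele):
    """Inframe if the length change is a multiple of 3 bases, else frameshift."""
    length_change = abs(len(alt_allele) - len(ref_allele))
    if length_change == 0:
        return "not an indel"
    return "inframe" if length_change % 3 == 0 else "frameshift"


print(classify_indel("A", "ATTT"))  # 3-base insertion
print(classify_indel("ACG", "A"))   # 2-base deletion
print(classify_indel("A", "AT"))    # 1-base insertion
```

This applies to indels inside coding sequence; an indel that spans a splice site or lies in an intron needs separate handling.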

Page 21:

General classification (priority)

[Figure labels: high-impact, low-incidence, low-confidence; high incidence]

Page 22:

Functional impact prediction of missense mutations

• How critical is an AA change to its protein function?
– Amino acid conservation
  • If the AA is essential, it would be conserved through evolution
– Amino acid in protein conformation
  • Substitution of an AA in the active site would be more damaging

Page 23:

Amino acid conservation
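
One simple way to quantify conservation is the fraction of aligned sequences that share the most common residue at each position. The sketch below uses a made-up four-sequence alignment; it is a deliberately crude score and not the scoring used by PolyPhen-2 or MutationAssessor.

```python
from collections import Counter


def column_conservation(column):
    """Fraction of sequences carrying the most common residue in this column."""
    counts = Counter(column)
    return counts.most_common(1)[0][1] / len(column)


# Hypothetical alignment of one protein segment across four species.
alignment = [
    "MKTAY",
    "MKSAY",
    "MRTAY",
    "MKTGY",
]

# zip(*alignment) walks the alignment column by column.
scores = [column_conservation(col) for col in zip(*alignment)]
print(scores)
```

Positions scoring 1.0 are invariant in this toy alignment, so a substitution there would be ranked as more likely damaging than one at a variable position.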

Page 24:

Protein Structure

Page 25:

5. Use disease-specific knowledge

• Your knowledge about the disease
– e.g. cancer
– “Has it been reported in other previous samples?”
– Search for it in COSMIC; if you find it is recurrent, it is likely to be functional
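
The recurrence check amounts to counting how often each candidate has been seen before. In the sketch below, `previous_reports` is a hypothetical in-memory list standing in for records exported from a resource such as COSMIC; the gene names and protein changes are placeholders, not real entries.

```python
from collections import Counter

# Placeholder records: one (gene, protein change) entry per previously reported sample.
previous_reports = [
    ("GENE_A", "p.X10Y"),
    ("GENE_A", "p.X10Y"),
    ("GENE_A", "p.X10Y"),
    ("GENE_B", "p.X20Z"),
]
recurrence = Counter(previous_reports)

candidates = [("GENE_B", "p.X20Z"), ("GENE_C", "p.X30W"), ("GENE_A", "p.X10Y")]

# Rank candidates by how many earlier samples carried the same change.
ranked = sorted(candidates, key=lambda c: recurrence[c], reverse=True)
for candidate in ranked:
    print(candidate, recurrence[candidate])
```

A candidate with zero prior reports is not ruled out by this check; it simply gets no support from recurrence.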

Page 27:

Five ways to narrow down
1. Include control data
   – eliminate germline variants
2. Use a stricter variant quality threshold
   – work on only confident variants
3. Filter out polymorphisms
   – remove non-damaging polymorphisms
4. Predict functional impacts
   – find damaging levels
5. Use disease-specific knowledge
   – to acquire final candidates

Many, uncertain variants

A few, reliable variants

Functional study, mechanism study

Page 28:

SUMMARY OF PART I

Page 29:

- Connect to Linux cluster, job script writing and submission
- NGS technologies, NGS data
- Short read alignment
- Variant calling, CNV, SV calling
- Interpretation of discovered variants

Page 30:

In the remaining classes

• Genomic data to expression data
– Gene → mRNA → Protein → Pathways and Networks → Phenotype

• Use high throughput data for your study
• Don’t forget your project

Page 31:

PRACTICE - FUNCTIONAL VARIANT ANNOTATION WITH SEATTLESEQ

Page 32:

Today’s data

• Somatic variants in chr22 of an anonymous cancer sample, called with Virmid

• Data location
– /scratch/2015_GenomeInformatics/{yourdir}/virmidoutput
– If you did not complete the somatic calling practice, copy it from /scratch/2015_GenomeInformatics/public

Page 33:

data download to local PC

① move to your virmid out directory

② check your virmid output

③ click FTP

Page 34:

④ double click

Page 35:

seattle-seq

search then click here!!!

Page 36:

seattle-seq

① write your email

② input your VCF file

③ check!!

④ check!!

Page 37:

① click file > open..

② select ‘all file’

③ select annotated file

Page 38:
Page 39:

①②

Page 40:

Filtering phase

• accession (column H)
– for filtering curated isoforms
  • NM: mRNA
  • XM: predicted mRNA model (filter out)

• functionGVS (column I)
– for filtering damaging mutation types
  • missense, missense-near-splice
  • stop-gain, stop-loss
  • splice-donor, splice-acceptor
  • The others (filter out)
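
The same two filters can be applied outside the spreadsheet. The sketch below assumes a tab-delimited annotation file with `accession` and `functionGVS` columns, as named on the slide; the accession numbers and the non-damaging category labels in the sample rows are made up, and the exact category spellings in a real SeattleSeq output may differ by version.

```python
import csv
import io

# Mutation types kept as potentially damaging (names as listed on the slide).
DAMAGING_TYPES = {
    "missense", "missense-near-splice",
    "stop-gain", "stop-loss",
    "splice-donor", "splice-acceptor",
}

# Hypothetical annotated rows; a real file would be opened with open(path).
annotated = io.StringIO(
    "accession\tfunctionGVS\n"
    "NM_000001.1\tmissense\n"
    "XM_000002.1\tmissense\n"
    "NM_000003.1\tsynonymous\n"
    "NM_000004.1\tstop-gain\n"
    "NM_000005.1\tintron\n"
)

kept = [
    row
    for row in csv.DictReader(annotated, delimiter="\t")
    # Keep curated mRNA isoforms (NM) and drop predicted models (XM).
    if row["accession"].startswith("NM_")
    and row["functionGVS"] in DAMAGING_TYPES
]
print([row["accession"] for row in kept])
```

Filtering in a script makes the selection reproducible when the annotation is rerun on a new call set.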

Page 41:

① ②

Page 42:

①②

Page 43:

IGV download

search then click here!!!

Page 44:

IGV download

download then double click!!

Page 45:

IGV view

Page 46:

IGV view

Page 47:

IGV view

① input disease bam file

② input normal bam file

③ input VCF file

Page 48:

IGV view