VAAST: Deciphering Genetic Disease with Next-Generation Sequencing

Posted on 10-May-2015


Barry Moore, M.S., Research Scientist, Department of Human Genetics and Department of Biomedical Informatics

Outline

The VAAST Analysis Pipeline

Ogden Syndrome: Application of VAAST to a Genetic Disease of Unknown Cause

The Future of VAAST Development

$10,000,000: Venter Genome

$1,000,000: Watson Genome

$5,000: You?

[Figure: geneA, geneB, geneX, geneY and geneZ in disease versus healthy genomes]

Next-Generation Sequencing → Variant Annotation → Variant Selection → Variant Analysis

[Figure: one tool per step: Variant Annotation Tool; Variant Selection Tool; Variant Annotation, Analysis and Search Tool]

VAAST Pipeline

[Figure: about 3.5 million variants per genome, in GVF format, are annotated by VAT (Variant Annotation Tool) using the reference genome (FASTA) and reference genes (GFF3). VST (Variant Selection Tool) merges the annotated variant sets into a CDR file.]
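GVF builds on GFF3: each data line has nine tab-separated columns, and the ninth holds semicolon-separated `key=value` attributes. As a minimal sketch (not VAT's own parser), the function below splits one such line. The attribute names `Reference_seq`, `Variant_seq` and `Zygosity` come from the GVF specification, while the example line and the helper name `parse_gvf_line` are hypothetical.

```python
from urllib.parse import unquote

def parse_gvf_line(line: str) -> dict:
    """Split one GVF data line into its nine GFF3 columns plus an attribute dict.

    Hypothetical helper for illustration; directives (##) and comments are not handled.
    """
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:
        raise ValueError(f"expected 9 tab-separated columns, got {len(fields)}")
    seqid, source, so_type, start, end, score, strand, phase, attrs = fields
    attributes = {}
    for pair in attrs.split(";"):
        if not pair:
            continue  # tolerate a trailing semicolon
        key, _, value = pair.partition("=")
        # Multi-valued attributes (e.g. Variant_seq) are comma-separated;
        # GFF3 percent-escapes reserved characters, so decode each value.
        attributes[key] = [unquote(v) for v in value.split(",")]
    return {
        "seqid": seqid,
        "source": source,
        "type": so_type,      # a Sequence Ontology term such as SNV
        "start": int(start),  # 1-based, inclusive (GFF3 convention)
        "end": int(end),
        "strand": strand,
        "attributes": attributes,
    }

# Hypothetical heterozygous SNV record
example = ("chr1\tsamtools\tSNV\t12345\t12345\t.\t+\t.\t"
           "ID=var1;Reference_seq=G;Variant_seq=A,G;Zygosity=heterozygous")
record = parse_gvf_line(example)
print(record["type"], record["attributes"]["Variant_seq"])  # SNV ['A', 'G']
```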


Variant Type: sequence_alteration, deletion, insertion, duplication, inversion, substitution, SNV, MNP, complex_substitution, translocation

Variant Effect: sequence_variant, gene_variant, five_prime_UTR_variant, three_prime_UTR_variant, exon_variant, splice_region_variant, splice_donor_variant, splice_acceptor_variant, intron_variant, coding_sequence_variant, stop_retained, stop_lost, stop_gained, synonymous_codon, non_synonymous_codon, amino_acid_substitution, frameshift_variant, inframe_variant
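Several of the coding effect terms above follow directly from translating the reference and variant codons. The sketch below shows that logic for a single-base change inside one codon on the coding strand. It is a hypothetical illustration, not VAT's implementation, and it ignores splice, UTR and frameshift effects.

```python
BASES = "TCAG"
# Standard genetic code with codons enumerated in TCAG order (first, second, third base)
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {
    first + second + third: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, first in enumerate(BASES)
    for j, second in enumerate(BASES)
    for k, third in enumerate(BASES)
}

def classify_coding_snv(ref_codon: str, pos: int, alt_base: str) -> str:
    """Return a variant-effect term for one base change within a codon.

    Hypothetical helper: ref_codon is on the coding strand, pos is 0, 1 or 2.
    """
    ref_codon, alt_base = ref_codon.upper(), alt_base.upper()
    if pos not in (0, 1, 2):
        raise ValueError("pos must be 0, 1 or 2")
    alt_codon = ref_codon[:pos] + alt_base + ref_codon[pos + 1:]
    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    if ref_aa == "*":  # reference codon is a stop codon
        return "stop_retained" if alt_aa == "*" else "stop_lost"
    if alt_aa == "*":
        return "stop_gained"
    return "synonymous_codon" if ref_aa == alt_aa else "non_synonymous_codon"

print(classify_coding_snv("TGG", 2, "A"))  # stop_gained (Trp TGG -> stop TGA)
```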


[Figure: VAAST compares CDR files from target genomes against CDR files from background genomes and produces a VAAST report of prioritized candidate genes]

Key Features of VAAST

• Probabilistic

• Feature Based

• Both Allele and Amino Acid Substitution (AAS) Frequencies

• Considers Inheritance Model

• Fast

• Standardized, Ontology-Based Format

• Modular and Flexible in Design
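To make "Considers Inheritance Model" concrete, here is a deliberately simplified filter: under a dominant model every affected target genome needs at least one candidate allele in a gene, and under a recessive model at least two (homozygous or compound heterozygous). This is a hypothetical sketch; VAAST builds the inheritance model into its scoring rather than applying a hard filter like this, and the helper name and counts are invented for illustration.

```python
from typing import Dict, List

# Minimum candidate alleles each target genome must carry (simplified, hypothetical)
REQUIRED_ALLELES = {"dominant": 1, "recessive": 2}

def genes_fitting_model(candidate_alleles: Dict[str, List[int]], model: str) -> List[str]:
    """Return genes in which every target genome carries enough candidate alleles.

    candidate_alleles maps a gene to one count per target genome: the number of
    candidate (e.g. rare, damaging) alleles that genome carries in the gene.
    Under the recessive model, two alleles may come from one homozygous site
    or from two different sites (compound heterozygote).
    """
    required = REQUIRED_ALLELES[model]
    return sorted(
        gene
        for gene, counts in candidate_alleles.items()
        if counts and all(c >= required for c in counts)
    )

counts = {"geneA": [2, 2], "geneB": [1, 1], "geneX": [1, 0], "geneY": [2, 1]}
print(genes_fitting_model(counts, "dominant"))   # ['geneA', 'geneB', 'geneY']
print(genes_fitting_model(counts, "recessive"))  # ['geneA']
```

Counting alleles does not check phase, so two variants on the same haplotype would wrongly pass the recessive test; trio or pedigree data can resolve that.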

VAAST Uses Variant Frequencies in a Probabilistic Fashion

Likelihood Ratio Test

$$\lambda = \frac{\text{Maximum likelihood of the null model (no difference)}}{\text{Maximum likelihood of the alternative model (there is a difference)}}$$
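A minimal sketch of a composite likelihood ratio for one gene, assuming a plain binomial allele-frequency model: the null model fits one shared alternate-allele frequency to target and background, the alternative model fits separate frequencies, and per-site log-likelihoods are summed as if sites were independent. The function reports the conventional statistic $-2\ln\lambda$. VAAST's own model also incorporates amino acid substitution frequencies and the inheritance model, which this sketch omits; the function names and counts are hypothetical.

```python
import math
from typing import Iterable, Tuple

def binom_loglik(k: int, n: int, p: float) -> float:
    """Binomial log-likelihood of k alt alleles out of n, without the binomial
    coefficient (it is identical under both models and cancels in the ratio)."""
    ll = 0.0
    if k > 0:
        ll += k * math.log(p)
    if n - k > 0:
        ll += (n - k) * math.log(1.0 - p)
    return ll

def composite_lrt(sites: Iterable[Tuple[int, int]], n_target: int, n_background: int) -> float:
    """Return -2 ln(lambda) for one gene, summed over its variant sites.

    sites: (target_alt_alleles, background_alt_alleles) for each site.
    n_target, n_background: alleles observed per site (2 x number of genomes).
    Summing per-site terms assumes the sites are independent.
    """
    stat = 0.0
    for t, b in sites:
        p_t = t / n_target                         # alternative: separate frequencies
        p_b = b / n_background
        p_0 = (t + b) / (n_target + n_background)  # null: one shared frequency
        ll_alt = binom_loglik(t, n_target, p_t) + binom_loglik(b, n_background, p_b)
        ll_null = binom_loglik(t, n_target, p_0) + binom_loglik(b, n_background, p_0)
        stat += -2.0 * (ll_null - ll_alt)          # -2 ln(L_null / L_alt)
    return stat

# Hypothetical gene: 1 target genome (2 alleles) vs. 10 background genomes (20 alleles)
print(composite_lrt([(1, 0), (1, 1)], n_target=2, n_background=20))
```

Because the independence assumption rarely holds for nearby sites, the statistic is not referred to a chi-square table here; significance comes from permutation instead.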

• VAAST gives us the likelihood of the composite genotype at GENE X in the target genomes, given the background genomes.

• Question asked: do allele frequencies differ between background and target genomes within a given gene or feature?

• The composite likelihood calculation assumes independence across sites. To control for linkage disequilibrium (LD), statistical significance is estimated by a permutation test.

• Correcting for multiple tests across the number of features (~20,000) is about two orders of magnitude less stringent than correcting across the number of variants (~3,500,000).
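The last two points can be sketched together. A label-permutation test shuffles whole genomes between the target and background groups, so the sites within each genome move together and their LD is preserved; the empirical p-value is the fraction of shuffles whose statistic is at least as extreme as the observed one. The Bonferroni arithmetic then shows the gap between the two correction burdens (3,500,000 / 20,000 = 175, roughly two orders of magnitude). The toy statistic, helper names and genotypes below are hypothetical; VAAST's actual permutation scheme may differ.

```python
import random
from typing import Callable, List, Sequence

Genome = Sequence[int]  # alt-allele count (0, 1 or 2) at each site of one gene

def alt_freq_difference(targets: List[Genome], background: List[Genome]) -> float:
    """Toy statistic: pooled alt-allele frequency in targets minus background."""
    def freq(genomes: List[Genome]) -> float:
        alt = sum(sum(g) for g in genomes)
        total = 2 * sum(len(g) for g in genomes)
        return alt / total
    return freq(targets) - freq(background)

def permutation_p_value(genomes: List[Genome], n_targets: int,
                        statistic: Callable[[List[Genome], List[Genome]], float],
                        n_perm: int = 999, seed: int = 0) -> float:
    """Empirical p-value from shuffling target/background labels.

    genomes[:n_targets] are the targets. Whole genomes are shuffled, so the
    sites within each genome stay together and their LD is preserved.
    """
    rng = random.Random(seed)
    observed = statistic(genomes[:n_targets], genomes[n_targets:])
    shuffled = list(genomes)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if statistic(shuffled[:n_targets], shuffled[n_targets:]) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)  # add-one so the estimate is never zero

def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold under a Bonferroni correction."""
    return alpha / n_tests

genomes = [[2, 2], [2, 2]] + [[0, 0] for _ in range(20)]  # hypothetical genotypes
p = permutation_p_value(genomes, n_targets=2, statistic=alt_freq_difference)
per_feature = bonferroni_threshold(0.05, 20_000)     # ~20,000 features (genes)
per_variant = bonferroni_threshold(0.05, 3_500_000)  # ~3,500,000 variants
print(p, per_feature, per_variant, per_feature / per_variant)
```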

Noise Decreases Dramatically with Increasing Number of Genomes

[Figure panels: 1 target genome vs. 1 background genome; 1 target genome vs. 10 background genomes; 1 target genome vs. 250 background genomes]
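One way to see why the background noise falls so quickly: the binomial standard error of an allele frequency estimated from n diploid genomes (2n alleles) is $\sqrt{p(1-p)/2n}$, so it shrinks with the square root of the number of genomes. The sketch below evaluates it for the three background sizes in the panels, using a hypothetical allele frequency of 0.05; it illustrates sampling noise in general, not VAAST's internal calculation.

```python
import math

def allele_freq_standard_error(p: float, n_genomes: int) -> float:
    """Binomial standard error of an allele frequency estimated from n diploid genomes."""
    return math.sqrt(p * (1.0 - p) / (2 * n_genomes))

for n in (1, 10, 250):  # background sizes from the figure panels
    print(n, round(allele_freq_standard_error(0.05, n), 4))  # p = 0.05 is hypothetical
```

Going from 1 to 250 background genomes cuts the standard error by a factor of about 16 (the square root of 250).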

Alleles Responsible for Miller Syndrome in Utah Kindred

[Figure: trio data (Mom, Dad) showing alleles in DHODH on chromosome 16 (G:R, G:A) and in DNAH5 on chromosome 5 (R:Q, R:*)]

• Ng et al., Nature Genetics 42, 30–35 (2010). doi:10.1038/ng.499
• Roach et al., Science 328, 636 (2010).

Schematic of VAAST Analysis of Utah Miller Kindred Using a Single Quartet

[Figure: quartet (Mom, Dad, son, daughter) with their DNAH5 and DHODH alleles]

Average Rank for 100 Dominant and Recessive Diseases

[Figure: average genome-wide rank for dominant and recessive diseases by size of case cohort (2, 4 or 6 allele copies); 443 genomes in background]

Impact of Missing Data

[Figure: average genome-wide rank for dominant and recessive diseases with 2 of 6, 4 of 6 or 6 of 6 allele copies; 443 genomes in background]


A Rare X-Linked Mendelian Disorder

• A Utah family that has been coming to the University Hospital for more than 20 years

• About half of the male offspring die around 1 year of age

• Aged appearance

• Craniofacial anomalies

• Hypotonia

• Global developmental delays

• Cardiac arrhythmias

Four Affected Boys over Two Generations

[Figure: pedigree spanning generations I, II and III]

Exome Sequencing

• Agilent SureSelect In-Solution X Chromosome Capture

• Covaris S-series sonication (150–200 bp)

• 76 bp single-end reads on one lane each of the Illumina GAIIx

Variant Calling

• Sequence alignment with BWA

• Remove duplicate reads with Picard

• Realign indel regions with GATK

• Variant calling with SAMtools and GATK
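The four steps above can be sketched as a dry-run command builder. The slides do not give versions or options, so everything here is an assumption: BWA's `aln`/`samse` mode for single-end reads, a Picard release packaged as `picard.jar`, GATK 3.x (`GenomeAnalysisTK.jar`), and reference index files (BWA index, `.fai`, `.dict`) already in place. File names are hypothetical, and the SAMtools calling branch is not shown.

```python
from typing import List

def build_pipeline(ref: str, reads: str, prefix: str) -> List[List[str]]:
    """Return argv lists for one single-end sample (hypothetical helper, dry run only)."""
    sai, sam = f"{prefix}.sai", f"{prefix}.sam"
    sorted_bam, dedup_bam = f"{prefix}.sorted.bam", f"{prefix}.dedup.bam"
    intervals, realigned = f"{prefix}.intervals", f"{prefix}.realigned.bam"
    read_group = f"@RG\\tID:{prefix}\\tSM:{prefix}"  # GATK requires read groups
    picard = ["java", "-jar", "picard.jar"]
    gatk = ["java", "-jar", "GenomeAnalysisTK.jar"]
    return [
        # 1. Align reads with BWA (aln + samse for single-end data)
        ["bwa", "aln", "-f", sai, ref, reads],
        ["bwa", "samse", "-r", read_group, "-f", sam, ref, sai, reads],
        # Coordinate-sort before duplicate removal
        picard + ["SortSam", f"I={sam}", f"O={sorted_bam}", "SORT_ORDER=coordinate"],
        # 2. Remove duplicate reads with Picard
        picard + ["MarkDuplicates", f"I={sorted_bam}", f"O={dedup_bam}",
                  f"M={prefix}.dup_metrics.txt", "REMOVE_DUPLICATES=true"],
        ["samtools", "index", dedup_bam],
        # 3. Realign indel regions with GATK
        gatk + ["-T", "RealignerTargetCreator", "-R", ref, "-I", dedup_bam, "-o", intervals],
        gatk + ["-T", "IndelRealigner", "-R", ref, "-I", dedup_bam,
                "-targetIntervals", intervals, "-o", realigned],
        # 4. Call variants with GATK
        gatk + ["-T", "UnifiedGenotyper", "-R", ref, "-I", realigned, "-o", f"{prefix}.vcf"],
    ]

if __name__ == "__main__":
    for cmd in build_pipeline("ref.fa", "sample.fq", "sample"):
        # Dry run: print each command instead of subprocess.run(cmd, check=True)
        print(" ".join(cmd))
```

Keeping commands as argv lists avoids shell quoting problems if they are later passed to `subprocess.run`.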

Identifying Candidate Genes

VAAST Identifies NAA10 as the Candidate Gene

• About 20 min run time

• Proband only: 3 candidate genes, with NAA10 ranked 2nd

• With the pedigree: 1 candidate gene (NAA10)

Additional Analyses

• Microarray-based CNV analysis: no likely causal variants found

• Sanger sequencing confirmation: the variant segregates perfectly with disease in 13 family members

• Haplotype sharing (STR genotyping): ~11 Mb shared between two affected boys

• A second family was discovered with the same mutation

• IBD relatedness analysis: independent mutational events in the two families

N(alpha)-acetyltransferase

• N-alpha-acetylation is one of the most common protein modifications that occurs during protein synthesis.

• NAA10 (hARD1) is the catalytic subunit of NatA

• Eight exons, Crick strand, highly conserved

• A:G transition causes p.Ser37Pro
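Because NAA10 lies on the Crick (minus) strand, an A>G change read on the plus strand is a T>C change on the coding strand. The slide does not give the reference codon, so the sketch below enumerates every serine codon and keeps those that a single T>C change turns into proline. Only first-position changes in TCN codons qualify, which is consistent with p.Ser37Pro; the helper name is hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def to_coding_strand(genomic_base: str, gene_on_minus_strand: bool) -> str:
    """Express a plus-strand base on the gene's coding strand (hypothetical helper)."""
    return genomic_base.translate(COMPLEMENT) if gene_on_minus_strand else genomic_base

# Serine and proline codons of the standard genetic code
SERINE = {"TCT", "TCC", "TCA", "TCG", "AGT", "AGC"}
PROLINE = {"CCT", "CCC", "CCA", "CCG"}

ref_base = to_coding_strand("A", gene_on_minus_strand=True)  # 'T'
alt_base = to_coding_strand("G", gene_on_minus_strand=True)  # 'C'

# Which serine codons become proline through one ref_base -> alt_base change?
hits = []
for codon in sorted(SERINE):
    for pos, base in enumerate(codon):
        if base == ref_base:
            mutant = codon[:pos] + alt_base + codon[pos + 1:]
            if mutant in PROLINE:
                hits.append((codon, pos + 1, mutant))  # 1-based codon position
print(hits)  # [('TCA', 1, 'CCA'), ('TCC', 1, 'CCC'), ('TCG', 1, 'CCG'), ('TCT', 1, 'CCT')]
```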

Functional Analyses

• Quantitative in vitro N-terminal acetylation assay (RP-HPLC).

• Four peptide substrates previously shown to be acetylated by NatA (NAA10)

• Assays indicate a loss-of-function allele.


VAAST in Summary

• Probabilistic Disease Gene Finder

• Feature Based, not Variant Based

• Both Allele and Amino Acid Substitution (AAS) Frequencies

• Considers Inheritance Model

• As few as two target genomes can be sufficient to identify the causative gene.

• Background Genomes are “Reusable”

• Not Limited to Human Analyses

VAAST: Future Directions

• Indel support

• Splice-site support

• No-call support

• Pedigree support

• Phylogenetic conservation

Acknowledgements

VAAST Development: Chad Huff, Hao Hu, Lynn Jorde, Barry Moore, Martin Reese, Marc Singleton, Jinchuan Xing, Mark Yandell

Ogden Syndrome: John Carey, Steven Chin, Heidi Deborah Fain, Gholson Lyon, John Opitz, Theodore J. Pysher, Alan Rope, Reid Robison, Sarah T. South

Chad Huff, Evan Johnson, Barry Moore, Christa Schank, Kai Wang, Jinchuan Xing

Yandell Lab: Michael Campbell, Daniel Ence, Guozhen Fan, Steven Flygare, Hao Hu, Zev Kronenberg, Barry Moore, Marc Singleton, Robert Ross, Mark Yandell

Thomas Arnesen, Rune Evjenth, Johan R. Lillehaug

Leslie G. Biesecker, Jennifer J. Johnston, Cathy A. Stevens

Brian Dalley, Tao Jiang, Jeffrey Swensen

Hakon Hakonarson, Lynn B. Jorde, Mark Yandell