ASHG 2012 Poster


description

Overview of bioinformatics support for clinical diagnostics at InVitae. This pipeline is the result of a large team effort.

Transcript of ASHG 2012 Poster


InVitae reports findings only for requisitioned conditions. A report for the 150 conditions currently offered is 250 pages. Online reporting and organization make the report easily navigable. A few features of our reports are shown below.

A consistent definition of a transcript's exon structure is essential to reliably mapping and interpreting variants. Inconsistencies lead to incorrect translations of research findings to clinical settings. We account for the following challenges:

Curators and developers may easily generate reports for simulated samples with arbitrary collections of curated and novel variants across multiple conditions. Tests may be saved for future execution and regression testing.

one sample, one requisition

one report, up to 150 conditions

two weeks, one lab, one price

Computational and informatics challenges in providing clinically-relevant genome interpretation from high-throughput sequencing data.

Reece Hart; InVitae Team, San Francisco, CA, 94107

InVitae provides sequencing and clinically-relevant genome interpretation services to physicians from patient blood samples. Our value is based on three essential components: a database of high-quality associations of variants and conditions, carefully designed targeted sequencing assays, and a sophisticated analysis pipeline for interpreting variants. The current process requires less than two weeks from the arrival of blood to the delivery of a clinical report covering over 10,000 curated variants in 250 genes for up to 150 conditions (subject to physician's requisition). This poster summarizes the computational and informatics tools that enable this process.

Clinician's view of InVitae

InVitae's process features online requisitioning and reporting, CLIA-certified sequencing, and HIPAA-compliant information management.

intake

The Trouble with Transcripts

Report Excerpts

Variant Simulation and Report Testing

similar conditions grouped together

carriers of known pathogenic variants

condition groups sorted by risk level and evidence

ancestry-dependent quantitative risks

known pathogenic variants have strongest evidence of association

predicted effect(s)

supporting publications

frequency in 1000 Genomes Project

haplotype alleles, inferred haplotypes, and risk association

absence of known pathogenic variants (covered regions and qualities shown at end of report)

pathogenic variants inferred from condition-specific rules for the interpretation of novel variants

variants of unknown significance, with and without prior observations

ancestry-aware inference of risk from combination of odds ratios

regions where transcript sequences differ from the reference genome are not interpretable
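The "inference of risk from combination of odds ratios" callout above can be illustrated with a minimal sketch. The poster does not describe the actual risk model, so everything here is an assumption: the multiplicative (independent-variant) model, the function name `combined_risk`, the ancestry labels, and all numbers are hypothetical.

```python
from math import prod

def combined_risk(baseline_risk, odds_ratios):
    """Return the absolute risk after applying odds ratios to a baseline risk.

    Assumption: variants act independently, so their odds ratios multiply.
    """
    if not 0.0 < baseline_risk < 1.0:
        raise ValueError("baseline_risk must be strictly between 0 and 1")
    baseline_odds = baseline_risk / (1.0 - baseline_risk)
    odds = baseline_odds * prod(odds_ratios)  # prod([]) == 1, so no variants -> baseline
    return odds / (1.0 + odds)

# Hypothetical ancestry-specific baseline risks for a single condition.
BASELINE_BY_ANCESTRY = {"ancestry_A": 0.10, "ancestry_B": 0.05}

# Two hypothetical risk alleles observed in a sample of ancestry_A.
risk = combined_risk(BASELINE_BY_ANCESTRY["ancestry_A"], [1.5, 1.2])
print(f"{risk:.3f}")
```

Working in odds rather than probabilities keeps the combined result between 0 and 1 no matter how many odds ratios are applied.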

simulate variants for specified genders and ancestry

simulate new variants for VUS analysis

select curated variants

create homozygous, heterozygous, and no-data loci
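The simulation features listed above could be sketched roughly as follows. This is not InVitae's tool: the `SimulatedCall` record, the `simulate_sample` function, the variant identifiers, and the random assignment of zygosity are all hypothetical, chosen only to show how a simulated sample might be made reproducible for regression testing.

```python
import random
from dataclasses import dataclass

ZYGOSITIES = ("homozygous", "heterozygous", "no-data")

@dataclass(frozen=True)
class SimulatedCall:
    variant_id: str  # hypothetical identifier of a curated or novel variant
    zygosity: str    # one of ZYGOSITIES

def simulate_sample(variant_ids, gender, ancestry, seed=0):
    """Build a simulated sample with one call per selected variant."""
    rng = random.Random(seed)  # fixed seed so a saved test replays identically
    calls = [SimulatedCall(v, rng.choice(ZYGOSITIES)) for v in variant_ids]
    return {"gender": gender, "ancestry": ancestry, "calls": calls}

sample = simulate_sample(
    ["curated-1", "curated-2", "novel-1"], gender="female", ancestry="ancestry_A", seed=42
)
for call in sample["calls"]:
    print(call.variant_id, call.zygosity)
```

Storing only the variant list and the seed would be enough to regenerate the same sample later, which is one plausible way to save a test for future execution.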

disagreement between reference genome and transcript (3514/33165 transcripts)

exon structure changes for a single RefSeq accession, e.g., NM_001035.2 (RYR2)

suboptimal alignments to the reference genome, e.g., ALMS1

structure and CDS equivalence of RefSeq and Ensembl transcripts

transcript records with atypical record formats (all 18 DMD transcripts)

[Figure labels: NM_012345.6, NM_012345.6, ENST987654, NM_123456.7]
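The first challenge above, disagreement between the reference genome and a transcript, can be illustrated by splicing the genomic exons together and comparing the result with the transcript sequence. This is a minimal sketch under simplifying assumptions (plus-strand transcript, substitutions only, toy sequences); the function name `exon_mismatches` is hypothetical and not part of any InVitae tool.

```python
def exon_mismatches(genome, exons, transcript):
    """Return 0-based transcript positions that differ from the genome.

    genome     -- reference sequence of the region (plus strand)
    exons      -- list of (start, end) half-open, 0-based genomic intervals
    transcript -- spliced transcript sequence
    Assumption: plus-strand transcript with substitutions only (no indels).
    """
    spliced = "".join(genome[start:end] for start, end in exons)
    if len(spliced) != len(transcript):
        raise ValueError("length differs: exon structure does not match transcript")
    return [i for i, (g, t) in enumerate(zip(spliced, transcript)) if g != t]

genome = "AAACCCGGGTTTACGT"
exons = [(0, 6), (9, 12)]  # exon 1 = AAACCC, exon 2 = TTT

print(exon_mismatches(genome, exons, "AAACCCTTT"))  # [] (transcript agrees with genome)
print(exon_mismatches(genome, exons, "AAACGCTTT"))  # [4] (one substitution)
```

Positions returned by such a check mark regions where a variant called against the genome cannot be mapped cleanly onto the transcript.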

The InVitae pipeline is designed to provide at least 50x depth across all targeted regions for all covered genes/conditions. Samples that do not meet stringent criteria for sequence depth, sequence coverage, and coverage of known pathogenic variants for requisitioned conditions are rerun or failed. Personal Health Information remains on premises; the rest of the pipeline (reads through anonymized report) executes on the Amazon Web Services platform.

☞ See also: 3692W, lab process (Session I)
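A coverage check of the kind described above might look like the following sketch. Only the 50x figure comes from the poster; the function `qc_sample`, its input dictionaries, and the choice to apply the same threshold to known pathogenic sites are assumptions, and the poster does not say how a sample is assigned to "rerun" versus "failed".

```python
MIN_DEPTH = 50  # from the poster: at least 50x depth across targeted regions

def qc_sample(region_depths, pathogenic_site_depths, min_depth=MIN_DEPTH):
    """Return (status, low_regions, low_sites) for one sample.

    region_depths          -- {region name: minimum read depth within the region}
    pathogenic_site_depths -- {variant id: read depth at a known pathogenic site}
    status is "pass" when nothing falls below min_depth, otherwise "rerun".
    """
    low_regions = [r for r, d in region_depths.items() if d < min_depth]
    low_sites = [v for v, d in pathogenic_site_depths.items() if d < min_depth]
    status = "pass" if not low_regions and not low_sites else "rerun"
    return status, low_regions, low_sites

status, low_regions, low_sites = qc_sample(
    {"GENE1:exon1": 80, "GENE1:exon2": 49},  # hypothetical region names and depths
    {"curated-1": 60},
)
print(status, low_regions, low_sites)  # rerun ['GENE1:exon2'] []
```

Returning the offending regions and sites alongside the status makes it possible to report which requisitioned conditions are affected.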

[Pipeline figure labels: blood; reads; alignments; variants; known pathogenic; inferred pathogenic; VUS; report]

sample intake
● online requisitioning
● barcoding
● information security

sequencing assay
● assay design
● PCR fill-in
● multiplexing
● automation
● LIMS

alignment
● bwa
● base quality recalibration
● automated
● coverage analysis

variant calling
● GATK
● polyMNP caller
● variant phasing
● haplotype calling

variant annotation
● classification
● variant effect/VUS pipeline
● quantitative risk modeling

reporting
● overall pipeline versioning
● lab director oversight

known pathogenic, novel pathogenic, and VUS variants appear in distinct sections of the report
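The split of variants into distinct report sections can be sketched as a simple lookup followed by a rule check. The poster does not give the condition-specific rules, so the rule used here (treating certain predicted effects as pathogenic), the variant schema, and the function name `report_section` are all hypothetical.

```python
def report_section(variant, curated_ids, pathogenic_effects):
    """Pick the report section for one observed variant.

    variant            -- dict with "id" and "effect" keys (hypothetical schema)
    curated_ids        -- ids of known pathogenic variants from the curation database
    pathogenic_effects -- predicted effects that a hypothetical condition-specific
                          rule treats as pathogenic for novel variants
    """
    if variant["id"] in curated_ids:
        return "known pathogenic"
    if variant["effect"] in pathogenic_effects:
        return "inferred pathogenic"
    return "VUS"  # variant of unknown significance

curated = {"curated-1"}
rule_effects = {"nonsense", "frameshift"}  # assumed rule for one condition

observed = [
    {"id": "curated-1", "effect": "missense"},
    {"id": "novel-1", "effect": "frameshift"},
    {"id": "novel-2", "effect": "missense"},
]
for v in observed:
    print(v["id"], "->", report_section(v, curated, rule_effects))
```

Checking the curation database first reflects the poster's statement that known pathogenic variants have the strongest evidence of association.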

[Figure row labels: process; computing challenges]

The heart of InVitae is the curation database, a manually curated compendium of associations of genomic variants and clinical conditions derived from literature and public sources. The curation database informs assay design and variant interpretation.

curation database

Curated genomic variants and clinical findings derived from literature and public databases.

☞ See also: 1766W curation process; 1771W variant classification (both Session I)

Curation Database

Sequence Analysis and Variant Interpretation

Requisitioning and Laboratory Information Management System

[Figure labels: interpretation; sequencing; director review]