Forsharing cshl2011 sequencing
Uploaded by sean-davis
![Page 1: Forsharing cshl2011 sequencing](https://reader033.fdocuments.net/reader033/viewer/2022060200/55986ba31a28ab1a0b8b478a/html5/thumbnails/1.jpg)
High-Resolution Views of Cancer Genomes
![Page 2: Forsharing cshl2011 sequencing](https://reader033.fdocuments.net/reader033/viewer/2022060200/55986ba31a28ab1a0b8b478a/html5/thumbnails/2.jpg)
![Page 3: Forsharing cshl2011 sequencing](https://reader033.fdocuments.net/reader033/viewer/2022060200/55986ba31a28ab1a0b8b478a/html5/thumbnails/3.jpg)
The Central Dogma
![Page 4: Forsharing cshl2011 sequencing](https://reader033.fdocuments.net/reader033/viewer/2022060200/55986ba31a28ab1a0b8b478a/html5/thumbnails/4.jpg)
![Page 5: Forsharing cshl2011 sequencing](https://reader033.fdocuments.net/reader033/viewer/2022060200/55986ba31a28ab1a0b8b478a/html5/thumbnails/5.jpg)
![Page 6: Forsharing cshl2011 sequencing](https://reader033.fdocuments.net/reader033/viewer/2022060200/55986ba31a28ab1a0b8b478a/html5/thumbnails/6.jpg)
+
![Page 7: Forsharing cshl2011 sequencing](https://reader033.fdocuments.net/reader033/viewer/2022060200/55986ba31a28ab1a0b8b478a/html5/thumbnails/7.jpg)
![Page 8: Forsharing cshl2011 sequencing](https://reader033.fdocuments.net/reader033/viewer/2022060200/55986ba31a28ab1a0b8b478a/html5/thumbnails/8.jpg)
Your Nature Paper
![Page 9: Forsharing cshl2011 sequencing](https://reader033.fdocuments.net/reader033/viewer/2022060200/55986ba31a28ab1a0b8b478a/html5/thumbnails/9.jpg)
Our First Experiment
![Page 10: Forsharing cshl2011 sequencing](https://reader033.fdocuments.net/reader033/viewer/2022060200/55986ba31a28ab1a0b8b478a/html5/thumbnails/10.jpg)
Overview of BAC in the Genome
![Page 11: Forsharing cshl2011 sequencing](https://reader033.fdocuments.net/reader033/viewer/2022060200/55986ba31a28ab1a0b8b478a/html5/thumbnails/11.jpg)
Sequencing a BAC
![Page 12: Forsharing cshl2011 sequencing](https://reader033.fdocuments.net/reader033/viewer/2022060200/55986ba31a28ab1a0b8b478a/html5/thumbnails/12.jpg)
Sequence Coverage
![Page 13: Forsharing cshl2011 sequencing](https://reader033.fdocuments.net/reader033/viewer/2022060200/55986ba31a28ab1a0b8b478a/html5/thumbnails/13.jpg)
Repeats
![Page 14: Forsharing cshl2011 sequencing](https://reader033.fdocuments.net/reader033/viewer/2022060200/55986ba31a28ab1a0b8b478a/html5/thumbnails/14.jpg)
Repeats
![Page 15: Forsharing cshl2011 sequencing](https://reader033.fdocuments.net/reader033/viewer/2022060200/55986ba31a28ab1a0b8b478a/html5/thumbnails/15.jpg)
Repeats are not created equal
![Page 16: Forsharing cshl2011 sequencing](https://reader033.fdocuments.net/reader033/viewer/2022060200/55986ba31a28ab1a0b8b478a/html5/thumbnails/16.jpg)
Genomic Sequencing
Targeting the Exome
![Page 17: Forsharing cshl2011 sequencing](https://reader033.fdocuments.net/reader033/viewer/2022060200/55986ba31a28ab1a0b8b478a/html5/thumbnails/17.jpg)
Long oligos synthesized on arrays (DNA)
RNA baits synthesized from DNA oligo template
RNA baits hybridized to DNA sequencing library
Targets captured using beads and biotin-labeled baits
RNA bait degraded, leaving sequencing library enriched for target regions
![Page 18: Forsharing cshl2011 sequencing](https://reader033.fdocuments.net/reader033/viewer/2022060200/55986ba31a28ab1a0b8b478a/html5/thumbnails/18.jpg)
Data Flow
FASTQ files generated by Illumina pipeline
Aligned to reference genome (hg18, excluding _random, unmapped, and hap) using Novoalign
SAM/BAM used extensively
Follow Broad Institute GATK pipeline for exome capture
Use Picard Java library for quality assessment
Processed BAM files available via local HTTP for browsing
![Page 19: Forsharing cshl2011 sequencing](https://reader033.fdocuments.net/reader033/viewer/2022060200/55986ba31a28ab1a0b8b478a/html5/thumbnails/19.jpg)
Data Pipeline....
Samtools import
Samtools sort
Picard MarkDuplicates
GATK Indel Realignment
GATK Quality Recalibration
Picard QC metrics
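To make the duplicate-marking step concrete, here is a minimal Python sketch of the idea behind it. It is a simplification and not Picard's implementation: the `Read` record and its fields are hypothetical, and real tools also consider read pairs and clipping. Reads that share a chromosome, 5' start, and strand are treated as copies of one fragment, and all but the highest-quality copy are flagged.

```python
from collections import defaultdict
from typing import NamedTuple


class Read(NamedTuple):
    """Hypothetical, simplified aligned read (not a real SAM record)."""
    name: str
    chrom: str
    pos: int               # 5' alignment start
    strand: str            # "+" or "-"
    base_quality_sum: int  # sum of base qualities, used to pick the copy to keep


def mark_duplicates(reads):
    """Return the names of reads flagged as duplicates."""
    groups = defaultdict(list)
    for read in reads:
        groups[(read.chrom, read.pos, read.strand)].append(read)

    duplicates = set()
    for group in groups.values():
        # Keep the copy with the highest summed base quality; flag the rest.
        best = max(group, key=lambda r: r.base_quality_sum)
        duplicates.update(r.name for r in group if r is not best)
    return duplicates


reads = [
    Read("r1", "chr1", 100, "+", 900),
    Read("r2", "chr1", 100, "+", 850),  # same start and strand as r1
    Read("r3", "chr1", 100, "-", 800),  # opposite strand, so not a duplicate
    Read("r4", "chr2", 100, "+", 700),
]
print(sorted(mark_duplicates(reads)))
```

Flagging rather than deleting duplicates keeps the BAM file complete while letting downstream callers ignore the extra copies.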
![Page 20: Forsharing cshl2011 sequencing](https://reader033.fdocuments.net/reader033/viewer/2022060200/55986ba31a28ab1a0b8b478a/html5/thumbnails/20.jpg)
Realignment around Indels
The problem
- Aligners align each read independently
- Potentially leads to increased error rates around indels
A potential solution
- Locally realign reads in regions that might harbor an indel
- Goal is to align reads overlying indels more accurately, reducing errors in each read and, in turn, reducing SNV call error rates
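A toy Python example may help show why this matters. The reference sequence, read, and deletion below are invented for illustration, and this is not the GATK algorithm. A read that ends shortly after a deletion can be placed without a gap, at the cost of a run of mismatches that a SNV caller could mistake for variants. Placing the same read with the deletion removes them.

```python
def count_mismatches(a: str, b: str) -> int:
    """Count positions where two equal-length sequences differ."""
    return sum(1 for x, y in zip(a, b) if x != y)


def ungapped_mismatches(ref: str, read: str, start: int) -> int:
    """Mismatches when the read is placed at `start` with no gap."""
    return count_mismatches(ref[start:start + len(read)], read)


def deletion_aware_mismatches(ref, read, start, del_start, del_len):
    """Mismatches when the read skips `del_len` reference bases at `del_start`."""
    left_len = del_start - start
    left = count_mismatches(ref[start:del_start], read[:left_len])
    right_start = del_start + del_len
    right_ref = ref[right_start:right_start + len(read) - left_len]
    return left + count_mismatches(right_ref, read[left_len:])


# Invented example: the sample carries a 3-base deletion at reference position 12.
REF = "GATTACAGGCTTAACCGGTTAGCATCG"
DEL_START, DEL_LEN = 12, 3
SAMPLE = REF[:DEL_START] + REF[DEL_START + DEL_LEN:]
READ = SAMPLE[4:16]  # 12-base read with only 4 bases past the deletion

print("ungapped:", ungapped_mismatches(REF, READ, 4))
print("with deletion:", deletion_aware_mismatches(REF, READ, 4, DEL_START, DEL_LEN))
```

Local realignment uses all reads in a region together, so the deletion seen clearly in reads that span it centrally can also be applied to reads like this one.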
![Page 21: Forsharing cshl2011 sequencing](https://reader033.fdocuments.net/reader033/viewer/2022060200/55986ba31a28ab1a0b8b478a/html5/thumbnails/21.jpg)
Quality Recalibration
Since most SNV callers will rely on quality scores to estimate error probabilities, having the best possible estimates for error rates is important
Reported error rates from the Illumina sequencer generally reflect technical parameters of the base call process, but not other systematic biases
Quality recalibration can include covariates to account for systematic biases
- Cycle count, dinucleotide context, original quality, and sample/library variables
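The core idea can be sketched in a few lines of Python. This is a simplified illustration, not the GATK implementation: the input tuples and the add-one pseudocount are assumptions made for the example. Bases are binned by their covariates, mismatches against the reference are counted per bin (at sites not known to vary), and each bin's observed error rate is converted back to a Phred-scaled quality.

```python
import math
from collections import defaultdict


def phred(error_probability: float) -> float:
    """Convert an error probability to a Phred-scaled quality."""
    return -10.0 * math.log10(error_probability)


def empirical_qualities(observations):
    """Empirical quality per covariate bin.

    `observations` is an iterable of hypothetical tuples:
    (reported_quality, cycle, dinucleotide, is_mismatch).
    """
    counts = defaultdict(lambda: [0, 0])  # bin -> [mismatches, total]
    for reported_q, cycle, dinuc, is_mismatch in observations:
        bin_counts = counts[(reported_q, cycle, dinuc)]
        bin_counts[0] += int(is_mismatch)
        bin_counts[1] += 1
    # The pseudocount (an assumption here) keeps empty or error-free bins finite.
    return {
        key: phred((mismatches + 1) / (total + 2))
        for key, (mismatches, total) in counts.items()
    }


# Two bins with the same reported quality (Q30) but different cycles.
obs = [(30, 1, "AC", False)] * 98
obs += [(30, 75, "GG", True)] * 8 + [(30, 75, "GG", False)] * 90
for key, quality in sorted(empirical_qualities(obs).items()):
    print(key, round(quality, 1))
```

In this invented data both bins were reported as Q30, but the late-cycle bin has a much higher observed mismatch rate, which is the kind of systematic bias recalibration is meant to correct.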
![Page 22: Forsharing cshl2011 sequencing](https://reader033.fdocuments.net/reader033/viewer/2022060200/55986ba31a28ab1a0b8b478a/html5/thumbnails/22.jpg)
Variant Calling and Evaluation
A developing art
![Page 23: Forsharing cshl2011 sequencing](https://reader033.fdocuments.net/reader033/viewer/2022060200/55986ba31a28ab1a0b8b478a/html5/thumbnails/23.jpg)
![Page 24: Forsharing cshl2011 sequencing](https://reader033.fdocuments.net/reader033/viewer/2022060200/55986ba31a28ab1a0b8b478a/html5/thumbnails/24.jpg)
Sequencing Tumor/Normal Pairs
![Page 25: Forsharing cshl2011 sequencing](https://reader033.fdocuments.net/reader033/viewer/2022060200/55986ba31a28ab1a0b8b478a/html5/thumbnails/25.jpg)
Good SNP
![Page 26: Forsharing cshl2011 sequencing](https://reader033.fdocuments.net/reader033/viewer/2022060200/55986ba31a28ab1a0b8b478a/html5/thumbnails/26.jpg)
Suspect Variant
![Page 27: Forsharing cshl2011 sequencing](https://reader033.fdocuments.net/reader033/viewer/2022060200/55986ba31a28ab1a0b8b478a/html5/thumbnails/27.jpg)
Somatic (tumor only) Variant
![Page 28: Forsharing cshl2011 sequencing](https://reader033.fdocuments.net/reader033/viewer/2022060200/55986ba31a28ab1a0b8b478a/html5/thumbnails/28.jpg)
Likely False Positive (normal only)
![Page 29: Forsharing cshl2011 sequencing](https://reader033.fdocuments.net/reader033/viewer/2022060200/55986ba31a28ab1a0b8b478a/html5/thumbnails/29.jpg)
LOH
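The tumor/normal categories on the preceding slides can be summarized as a small decision function. The Python below is a sketch under stated assumptions: genotypes are reduced to three hypothetical labels (`ref`, `het`, `hom`), and the "germline", "no variant", and "unclassified" labels are added here for completeness. A real pipeline would also weigh read depth, base quality, and strand support before trusting a call.

```python
def classify_site(normal: str, tumor: str) -> str:
    """Classify one site from genotype calls in a matched pair.

    Genotypes are 'ref', 'het', or 'hom' (homozygous variant).
    """
    valid = {"ref", "het", "hom"}
    if normal not in valid or tumor not in valid:
        raise ValueError("genotypes must be 'ref', 'het', or 'hom'")
    if normal == "ref" and tumor == "ref":
        return "no variant"
    if normal == "ref":
        return "somatic (tumor only)"
    if tumor == "ref":
        # Following the slides: a variant seen only in the normal is suspicious.
        return "likely false positive (normal only)"
    if normal == "het" and tumor == "hom":
        return "LOH"
    if normal == tumor:
        return "germline"
    return "unclassified"


for pair in [("ref", "het"), ("het", "ref"), ("het", "hom"), ("het", "het")]:
    print(pair, "->", classify_site(*pair))
```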
![Page 30: Forsharing cshl2011 sequencing](https://reader033.fdocuments.net/reader033/viewer/2022060200/55986ba31a28ab1a0b8b478a/html5/thumbnails/30.jpg)
NCI60 Exome Sequencing
No Normals Available!
![Page 31: Forsharing cshl2011 sequencing](https://reader033.fdocuments.net/reader033/viewer/2022060200/55986ba31a28ab1a0b8b478a/html5/thumbnails/31.jpg)
![Page 32: Forsharing cshl2011 sequencing](https://reader033.fdocuments.net/reader033/viewer/2022060200/55986ba31a28ab1a0b8b478a/html5/thumbnails/32.jpg)
![Page 33: Forsharing cshl2011 sequencing](https://reader033.fdocuments.net/reader033/viewer/2022060200/55986ba31a28ab1a0b8b478a/html5/thumbnails/33.jpg)
Variants by Genomic Location
![Page 34: Forsharing cshl2011 sequencing](https://reader033.fdocuments.net/reader033/viewer/2022060200/55986ba31a28ab1a0b8b478a/html5/thumbnails/34.jpg)
All Coding Variants
![Page 35: Forsharing cshl2011 sequencing](https://reader033.fdocuments.net/reader033/viewer/2022060200/55986ba31a28ab1a0b8b478a/html5/thumbnails/35.jpg)
Type 1: in dbSNP, Type 2: not in dbSNP
![Page 36: Forsharing cshl2011 sequencing](https://reader033.fdocuments.net/reader033/viewer/2022060200/55986ba31a28ab1a0b8b478a/html5/thumbnails/36.jpg)
Coding, novel (no dbSNP)
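With no matched normals, membership in dbSNP is the filter used here to separate known from novel variants. A minimal Python sketch of that split follows. The `Variant` record, its fields, and the set of known sites are hypothetical stand-ins for real annotation output.

```python
from typing import NamedTuple


class Variant(NamedTuple):
    """Hypothetical, simplified variant record."""
    chrom: str
    pos: int
    alt: str
    coding: bool


def dbsnp_type(variant: Variant, known_sites: set) -> int:
    """Return 1 if the variant is in dbSNP (known), 2 if it is not."""
    key = (variant.chrom, variant.pos, variant.alt)
    return 1 if key in known_sites else 2


def coding_novel(variants, known_sites):
    """Keep coding variants that are absent from dbSNP (type 2)."""
    return [v for v in variants if v.coding and dbsnp_type(v, known_sites) == 2]


known = {("chr1", 1000, "T")}  # invented dbSNP entries
calls = [
    Variant("chr1", 1000, "T", True),   # coding, in dbSNP
    Variant("chr1", 2000, "G", True),   # coding, novel
    Variant("chr2", 3000, "A", False),  # non-coding, novel
]
print(coding_novel(calls, known))
```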
![Page 37: Forsharing cshl2011 sequencing](https://reader033.fdocuments.net/reader033/viewer/2022060200/55986ba31a28ab1a0b8b478a/html5/thumbnails/37.jpg)
![Page 38: Forsharing cshl2011 sequencing](https://reader033.fdocuments.net/reader033/viewer/2022060200/55986ba31a28ab1a0b8b478a/html5/thumbnails/38.jpg)
Copy Number from Exomes
![Page 39: Forsharing cshl2011 sequencing](https://reader033.fdocuments.net/reader033/viewer/2022060200/55986ba31a28ab1a0b8b478a/html5/thumbnails/39.jpg)
![Page 40: Forsharing cshl2011 sequencing](https://reader033.fdocuments.net/reader033/viewer/2022060200/55986ba31a28ab1a0b8b478a/html5/thumbnails/40.jpg)
![Page 41: Forsharing cshl2011 sequencing](https://reader033.fdocuments.net/reader033/viewer/2022060200/55986ba31a28ab1a0b8b478a/html5/thumbnails/41.jpg)
![Page 42: Forsharing cshl2011 sequencing](https://reader033.fdocuments.net/reader033/viewer/2022060200/55986ba31a28ab1a0b8b478a/html5/thumbnails/42.jpg)
Complete Genome Sequencing
Complete Genomics Data
![Page 43: Forsharing cshl2011 sequencing](https://reader033.fdocuments.net/reader033/viewer/2022060200/55986ba31a28ab1a0b8b478a/html5/thumbnails/43.jpg)
Data
Delivery
- Via USB results
Storage
- Sizes are LARGE
  - 400GB per sample as delivered with raw reads included
- Should use 2-location backed-up storage
  - Not trivial to find such storage, so might resort to multiple USB drives
- Minimize:
  - Data movement
  - Keeping multiple copies indefinitely
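A quick back-of-the-envelope calculation shows how fast this adds up. The Python below is only arithmetic on the 400GB-per-sample figure above. The sample count in the example is an assumption.

```python
GB_PER_SAMPLE = 400  # as delivered, raw reads included


def storage_needed_gb(n_samples: int, n_locations: int = 2,
                      gb_per_sample: int = GB_PER_SAMPLE) -> int:
    """Total storage when every sample is kept at each backup location."""
    return n_samples * n_locations * gb_per_sample


# Hypothetical project of 10 genomes kept in 2 locations.
print(storage_needed_gb(10), "GB")
```

Every extra copy kept indefinitely adds another multiple of the full data set, which is why minimizing copies matters.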
![Page 44: Forsharing cshl2011 sequencing](https://reader033.fdocuments.net/reader033/viewer/2022060200/55986ba31a28ab1a0b8b478a/html5/thumbnails/44.jpg)
Breakdown of Data Sizes
![Page 45: Forsharing cshl2011 sequencing](https://reader033.fdocuments.net/reader033/viewer/2022060200/55986ba31a28ab1a0b8b478a/html5/thumbnails/45.jpg)
![Page 46: Forsharing cshl2011 sequencing](https://reader033.fdocuments.net/reader033/viewer/2022060200/55986ba31a28ab1a0b8b478a/html5/thumbnails/46.jpg)
Data
Delivery Storage Processing
Data are typically tab-delimited text files, so Excel can be useful for examining individual small files
Generally, command-line tools are needed
MacOS and Linux are the only supported operating systems, but Windows might work....
Some analyses (snpdiff) require large memory
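For files too large for Excel, tab-delimited text can be streamed one row at a time so that memory use stays small. The Python sketch below uses only the standard library. The column names and the convention of skipping lines that start with `#` are assumptions for illustration, not a description of the delivered file format.

```python
import csv
import io
from collections import Counter


def iter_rows(handle, comment_prefix="#"):
    """Yield each data row as a dict, skipping blank and comment lines."""
    lines = (
        line for line in handle
        if line.strip() and not line.startswith(comment_prefix)
    )
    yield from csv.DictReader(lines, delimiter="\t")


def count_by(handle, column):
    """Tally the values in one column without loading the whole file."""
    return Counter(row[column] for row in iter_rows(handle))


# Invented example data; a real file would be opened with open(path).
example = io.StringIO(
    "#meta\tvalue\n"
    "chromosome\tbegin\tvarType\n"
    "chr1\t100\tsnp\n"
    "chr1\t200\tdel\n"
    "chr2\t50\tsnp\n"
)
print(count_by(example, "varType"))
```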
![Page 47: Forsharing cshl2011 sequencing](https://reader033.fdocuments.net/reader033/viewer/2022060200/55986ba31a28ab1a0b8b478a/html5/thumbnails/47.jpg)
Directory Structure
![Page 48: Forsharing cshl2011 sequencing](https://reader033.fdocuments.net/reader033/viewer/2022060200/55986ba31a28ab1a0b8b478a/html5/thumbnails/48.jpg)
Workflows
Tumor/Normal
- Copy Number
- Structural Variation
- Annotated Somatic Variants
Germline
- List of annotated genotypes per individual, summarized into a single file that can be used for filtering
![Page 49: Forsharing cshl2011 sequencing](https://reader033.fdocuments.net/reader033/viewer/2022060200/55986ba31a28ab1a0b8b478a/html5/thumbnails/49.jpg)
Germline Workflow
![Page 50: Forsharing cshl2011 sequencing](https://reader033.fdocuments.net/reader033/viewer/2022060200/55986ba31a28ab1a0b8b478a/html5/thumbnails/50.jpg)
Germline Workflow
Output
Future directions
- Be “smarter” about inheritance framework
- Further refinements of comparison to other data types (exomes, SNP arrays, RNA-seq)
![Page 51: Forsharing cshl2011 sequencing](https://reader033.fdocuments.net/reader033/viewer/2022060200/55986ba31a28ab1a0b8b478a/html5/thumbnails/51.jpg)
Tumor/Normal Workflow
![Page 52: Forsharing cshl2011 sequencing](https://reader033.fdocuments.net/reader033/viewer/2022060200/55986ba31a28ab1a0b8b478a/html5/thumbnails/52.jpg)
Medvedev et al., Nature 2009
![Page 53: Forsharing cshl2011 sequencing](https://reader033.fdocuments.net/reader033/viewer/2022060200/55986ba31a28ab1a0b8b478a/html5/thumbnails/53.jpg)
![Page 54: Forsharing cshl2011 sequencing](https://reader033.fdocuments.net/reader033/viewer/2022060200/55986ba31a28ab1a0b8b478a/html5/thumbnails/54.jpg)
![Page 55: Forsharing cshl2011 sequencing](https://reader033.fdocuments.net/reader033/viewer/2022060200/55986ba31a28ab1a0b8b478a/html5/thumbnails/55.jpg)
![Page 56: Forsharing cshl2011 sequencing](https://reader033.fdocuments.net/reader033/viewer/2022060200/55986ba31a28ab1a0b8b478a/html5/thumbnails/56.jpg)
![Page 57: Forsharing cshl2011 sequencing](https://reader033.fdocuments.net/reader033/viewer/2022060200/55986ba31a28ab1a0b8b478a/html5/thumbnails/57.jpg)
![Page 58: Forsharing cshl2011 sequencing](https://reader033.fdocuments.net/reader033/viewer/2022060200/55986ba31a28ab1a0b8b478a/html5/thumbnails/58.jpg)
The Cancer Genome Atlas Research Network Nature 000, 1-8 (2008) doi:10.1038/nature07385
Frequent genetic alterations in three critical signalling pathways.
![Page 59: Forsharing cshl2011 sequencing](https://reader033.fdocuments.net/reader033/viewer/2022060200/55986ba31a28ab1a0b8b478a/html5/thumbnails/59.jpg)
![Page 60: Forsharing cshl2011 sequencing](https://reader033.fdocuments.net/reader033/viewer/2022060200/55986ba31a28ab1a0b8b478a/html5/thumbnails/60.jpg)
![Page 61: Forsharing cshl2011 sequencing](https://reader033.fdocuments.net/reader033/viewer/2022060200/55986ba31a28ab1a0b8b478a/html5/thumbnails/61.jpg)
Chromatin
Chromatin is the complex of protein and DNA that makes up the chromosomes. It is not a static structure.
![Page 62: Forsharing cshl2011 sequencing](https://reader033.fdocuments.net/reader033/viewer/2022060200/55986ba31a28ab1a0b8b478a/html5/thumbnails/62.jpg)
DNAse is an enzyme that cuts DNA at locations where DNA is accessible
These “accessible” regions have been associated with open chromatin
Regions of open chromatin are necessary for transcriptional and regulatory machinery to have access to gene neighborhoods and facilitate transcription
![Page 63: Forsharing cshl2011 sequencing](https://reader033.fdocuments.net/reader033/viewer/2022060200/55986ba31a28ab1a0b8b478a/html5/thumbnails/63.jpg)
DNAse Hypersensitivity
Method for finding regions of “open” chromatin
In data published with the ENCODE consortium, DNAse hypersensitive (HS) sites were shown to be correlated with:
- Histone modification
- Transcription start sites
- Early replicating regions
- Transcription factor binding sites (experimentally determined by ChIP/chip, etc.)
Identification and analysis of functional elements in 1% of the human genome by the ENCODE pilot project. The ENCODE Consortium. Nature, 2007.
![Page 64: Forsharing cshl2011 sequencing](https://reader033.fdocuments.net/reader033/viewer/2022060200/55986ba31a28ab1a0b8b478a/html5/thumbnails/64.jpg)
DNAse-chip Method
Crawford, G.E., Davis, S., Scacheri, P.C., Renaud, G., Halawi, M.J., Erdos, M.R., Green, R., Meltzer, P.S., Wolfsberg, T.G., and Collins, F.S. Nat Methods, 2006
![Page 65: Forsharing cshl2011 sequencing](https://reader033.fdocuments.net/reader033/viewer/2022060200/55986ba31a28ab1a0b8b478a/html5/thumbnails/65.jpg)
DNAse-Seq Method
Crawford, G.E., Davis, S., Scacheri, P.C., Renaud, G., Halawi, M.J., Erdos, M.R., Green, R., Meltzer, P.S., Wolfsberg, T.G., and Collins, F.S. Nat Methods, 2006
![Page 66: Forsharing cshl2011 sequencing](https://reader033.fdocuments.net/reader033/viewer/2022060200/55986ba31a28ab1a0b8b478a/html5/thumbnails/66.jpg)
![Page 67: Forsharing cshl2011 sequencing](https://reader033.fdocuments.net/reader033/viewer/2022060200/55986ba31a28ab1a0b8b478a/html5/thumbnails/67.jpg)
DNAse Sites Relative to Genes
![Page 68: Forsharing cshl2011 sequencing](https://reader033.fdocuments.net/reader033/viewer/2022060200/55986ba31a28ab1a0b8b478a/html5/thumbnails/68.jpg)
DNAse HS Sites and Gene Expression
DNAse HS sites near transcription start sites are associated with actively transcribed genes.
![Page 69: Forsharing cshl2011 sequencing](https://reader033.fdocuments.net/reader033/viewer/2022060200/55986ba31a28ab1a0b8b478a/html5/thumbnails/69.jpg)
![Page 70: Forsharing cshl2011 sequencing](https://reader033.fdocuments.net/reader033/viewer/2022060200/55986ba31a28ab1a0b8b478a/html5/thumbnails/70.jpg)
Nucleosome Positioning
Distances between sequences in non-DNAse HS regions have an oscillating pattern with a period that corresponds to a single turn of the double helix
DNAse is known to cut preferentially in the minor groove, which is exposed every 10.4 bases when wrapped around a nucleosome
A nucleosome core is wrapped by 147 base pairs of DNA
Implication: Nucleosomes are positioned in a highly organized, precise manner
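The numbers on this slide can be checked with a short calculation. The Python sketch below uses the slide's two figures (147 base pairs and 10.4 bases per helical turn). The assumption that the first exposed site sits at position 0 is made only for illustration. It lists where the minor groove would face outward along one nucleosome and confirms that neighboring sites are about 10 bases apart, which is the spacing the oscillating pattern reflects.

```python
NUCLEOSOME_BP = 147    # DNA wrapped around one nucleosome
BASES_PER_TURN = 10.4  # helical repeat on the nucleosome surface


def exposed_positions(length=NUCLEOSOME_BP, period=BASES_PER_TURN):
    """Base positions where the minor groove faces outward, starting at 0."""
    positions = []
    turn = 0
    while round(turn * period) < length:
        positions.append(round(turn * period))
        turn += 1
    return positions


positions = exposed_positions()
spacings = [b - a for a, b in zip(positions, positions[1:])]

print("helical turns per nucleosome:", round(NUCLEOSOME_BP / BASES_PER_TURN, 2))
print("exposed positions:", positions)
print("distinct spacings:", sorted(set(spacings)))
```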
![Page 71: Forsharing cshl2011 sequencing](https://reader033.fdocuments.net/reader033/viewer/2022060200/55986ba31a28ab1a0b8b478a/html5/thumbnails/71.jpg)
![Page 72: Forsharing cshl2011 sequencing](https://reader033.fdocuments.net/reader033/viewer/2022060200/55986ba31a28ab1a0b8b478a/html5/thumbnails/72.jpg)
![Page 73: Forsharing cshl2011 sequencing](https://reader033.fdocuments.net/reader033/viewer/2022060200/55986ba31a28ab1a0b8b478a/html5/thumbnails/73.jpg)
The Last Mile