Basics of high-throughput sequencing
Olivier Elemento, PhD; TA: Jenny Giannopoulou, PhD
Institute for Computational Biomedicine
CSHL High Throughput Data Analysis Workshop, June 2012
Plan
1. What high-throughput sequencing is used for
2. Illumina technology
3. Primary data analysis (alignment, QC)
4. Read formats
5. Secondary analysis (mutation calling, transcript-level quantification, etc.)
6. Read data visualization
7. Useful R/BioC packages
8. Challenges and evolution of sequencing and its analysis
1. What high-throughput sequencing is used for
Full genome sequencing
Targeted sequencing
Exome sequencing
DNA methylation profiling
[Figure: bisulfite treatment converts unmethylated C to U (read as T after PCR), while methylated C (mC) stays C]
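The conversion rule in the figure can be turned into a per-position methylation call. The sketch below is a minimal illustration under simplifying assumptions (forward-strand reads only, already aligned without indels to the unconverted reference, no C-to-T SNPs); `call_methylation` is a hypothetical helper, not part of any bisulfite pipeline.

```python
from collections import Counter


def call_methylation(reference, reads):
    """Fraction of reads supporting methylation at each reference C.

    reads: list of (0-based start on the reference, read sequence) pairs.
    Simplifying assumptions: forward-strand reads only, no indels, no SNPs.
    """
    methylated, covered = Counter(), Counter()
    for start, seq in reads:
        for offset, base in enumerate(seq):
            pos = start + offset
            if pos >= len(reference) or reference[pos] != "C":
                continue
            if base == "C":      # C survived bisulfite: it was methylated (mC)
                methylated[pos] += 1
                covered[pos] += 1
            elif base == "T":    # C -> U -> T after PCR: it was unmethylated
                covered[pos] += 1
            # any other base is treated as a sequencing error and skipped
    return {pos: methylated[pos] / covered[pos] for pos in sorted(covered)}


ref = "ACGTCCGA"
reads = [(0, "ACGTTCGA"), (0, "ATGTCCGA"), (2, "GTTTGA")]
levels = call_methylation(ref, reads)
print(levels)  # methylated fraction at reference C positions 1, 4 and 5
```

Real bisulfite data also contains reverse-strand reads, where the same logic applies to G/A on the opposite strand.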
RNA-seq
ChIP-seq
[Figure: an antibody against the transcription factor of interest captures the DNA bound by that factor]
High-throughput mapping of chromatin interactions (HiC)
Elemento lab (more on this next week)
And many others
• Gene fusion detection
• Translational profiling (which mRNAs localize to ribosomes)
• Small RNA/miRNA sequencing
• Bacterial communities
• Protein-RNA interactions (PAR-CLIP, HITS-CLIP)
• …
2. Illumina technology
Illumina SBS Technology: Reversible Terminator Chemistry Foundation
[Figure: DNA (0.1-1.0 µg) → sample preparation → single-molecule array → cluster growth → sequencing → image acquisition → base calling (e.g. T G C T A C G A T …)]
© Illumina, Inc. http://www.illumina.com/technology/sequencing_technology.ilmn
http://seqanswers.com/forums/showthread.php?t=21
Single-end vs paired-end sequencing
What comes out of the machine: short reads in fastq format
@D3B4KKQ1_0166:8:1101:1960:2190#CGATGT/1
CTCCTGGAAAACGCTTTGGTAGATTTGGCCAGGAGCTTTCTTTTATGTAAATTG
+D3B4KKQ1_0166:8:1101:1960:2190#CGATGT/1
[^^cedeefee`cghhhfcRX`_gfghf^bZbecg^eeb[caef`ef^a_`eXa
@D3B4KKQ1_0166:8:1101:2154:2137#CGATGT/1
TCCANCCATGGCAAATTCCATGGCACCGTCAAGGCTGAGAACGGGAAGCTTGTC
+D3B4KKQ1_0166:8:1101:2154:2137#CGATGT/1
ab_eBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB
@D3B4KKQ1_0166:8:1101:2249:2171#CGATGT/1
TACAAGTGCAGCATCAAGGAGCGAATGCTCTACTCCAGCTGCAAGAGCCGCCTC
+D3B4KKQ1_0166:8:1101:2249:2171#CGATGT/1
_[_ceeec[^eeghdffffhh^efh_egfhfgeec_fbafhhhhd`caegfheh
@D3B4KKQ1_0166:8:1101:2043:2187#CGATGT/1
GAAGGAGAGAAGGGGAGGAGGGCGGGGGGCACCTACTACATCGCCCTCCACATC
+D3B4KKQ1_0166:8:1101:2043:2187#CGATGT/1
\^_accceg`gga`f[fgcb`Ucgfaa_LVV^[bbbbbRWW`W^Y[_[^bbbbb
@D3B4KKQ1_0166:8:1101:2188:2232#CGATGT/1
GTGGCCGATTCCTGAGCTGTGTTTGAGGAGAGGGCGGAGTGCCATCTGGGTAGC
+D3B4KKQ1_0166:8:1101:2188:2232#CGATGT/1
aa_eeeeegggggihhiiifgeghfeghbgcghifiidg^dbgggeeeee`dcd
@D3B4KKQ1_0166:8:1101:2358:2174#CGATGT/1
CTGACCTGGGTCCTGTGGTGCTCAGCCTTTTGAAGATGCCAGAAAAATACGTCG
+D3B4KKQ1_0166:8:1101:2358:2174#CGATGT/1
\^_cccccg^Y`ega`fg`ebegfhd^egghhghfffhghdhbfffhhhfgfcf
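Quality score (QS) to integer in R: as.integer(charToRaw('e')) - 64 for Phred+64 data such as the reads above (Illumina 1.3-1.7); use - 33 for Phred+33 data (Sanger, Illumina 1.8+)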
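The same decoding in a few lines of Python, as a sketch: the offset must match the encoding (64 for the pre-1.8 reads shown above, 33 for Sanger/Illumina 1.8+), and the Phred definition Q = -10·log10(P_error) turns a score into an error probability. With offset 64, the long runs of `B` decode to Q2, which Illumina 1.5-1.7 pipelines used to flag unreliable read tails.

```python
def phred_scores(quality, offset=64):
    """Convert a FASTQ quality string to integer Phred scores.

    offset=64 for Illumina 1.3-1.7 (Phred+64, like the reads above);
    offset=33 for Sanger / Illumina 1.8+ (Phred+33).
    """
    return [ord(ch) - offset for ch in quality]


def error_probability(q):
    """Phred definition: Q = -10 * log10(P_error), so P_error = 10 ** (-Q / 10)."""
    return 10 ** (-q / 10)


scores = phred_scores("ab_eBBBB")
print(scores)                        # [33, 34, 31, 37, 2, 2, 2, 2]
print(error_probability(scores[0]))  # roughly 5e-4
```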
Paired-end sequencing
s_8_1_sequence.txt.gz (read 1):
@D3B4KKQ1_0166:8:1101:1960:2190#CGATGT/1
CTCCTGGAAAACGCTTTGGTAGATTTGGCCAGGAGCTTTCTTTTATGTAAATTG
+D3B4KKQ1_0166:8:1101:1960:2190#CGATGT/1
[^^cedeefee`cghhhfcRX`_gfghf^bZbecg^eeb[caef`ef^a_`eXa
@D3B4KKQ1_0166:8:1101:2154:2137#CGATGT/1
TCCANCCATGGCAAATTCCATGGCACCGTCAAGGCTGAGAACGGGAAGCTTGTC
+D3B4KKQ1_0166:8:1101:2154:2137#CGATGT/1
ab_eBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB
@D3B4KKQ1_0166:8:1101:2249:2171#CGATGT/1
TACAAGTGCAGCATCAAGGAGCGAATGCTCTACTCCAGCTGCAAGAGCCGCCTC
+D3B4KKQ1_0166:8:1101:2249:2171#CGATGT/1
_[_ceeec[^eeghdffffhh^efh_egfhfgeec_fbafhhhhd`caegfheh
@D3B4KKQ1_0166:8:1101:2043:2187#CGATGT/1
GAAGGAGAGAAGGGGAGGAGGGCGGGGGGCACCTACTACATCGCCCTCCACATC
+D3B4KKQ1_0166:8:1101:2043:2187#CGATGT/1
\^_accceg`gga`f[fgcb`Ucgfaa_LVV^[bbbbbRWW`W^Y[_[^bbbbb
@D3B4KKQ1_0166:8:1101:2188:2232#CGATGT/1
GTGGCCGATTCCTGAGCTGTGTTTGAGGAGAGGGCGGAGTGCCATCTGGGTAGC
+D3B4KKQ1_0166:8:1101:2188:2232#CGATGT/1
aa_eeeeegggggihhiiifgeghfeghbgcghifiidg^dbgggeeeee`dcd
…
s_8_2_sequence.txt.gz (read 2):
@D3B4KKQ1_0166:8:1101:1960:2190#CGATGT/2
GGCATATTTAACAGCATTGAACAGAATTCTGTGTCCTGTAAAAAAATTAGCTTA
+D3B4KKQ1_0166:8:1101:1960:2190#CGATGT/2
a__aaa`ce`cgcffdf_acda^ea]befffbeged`g[a`e_caaac]cb`gb
@D3B4KKQ1_0166:8:1101:2154:2137#CGATGT/2
TTGAGGCTGTTGTCATACTTCTCATGGTTCACACCCATGACGAACATGGGGGCG
+D3B4KKQ1_0166:8:1101:2154:2137#CGATGT/2
a__eeeeeggegefhhhiiihhhhhiieghhhghhiiffhiififhhiihegic
@D3B4KKQ1_0166:8:1101:2249:2171#CGATGT/2
CGGGGTGCACCTCGTCGTAGAGGAACTCTGCCGTCAGCTCTGCCCCATCGCCAA
+D3B4KKQ1_0166:8:1101:2249:2171#CGATGT/2
^__ee__cge`cghghhfgddgfgi]ehhfffff^ec[beegidffhhfhadba
@D3B4KKQ1_0166:8:1101:2043:2187#CGATGT/2
CTTAGTCTCAGTTTTCCTCCAGCAGCCTGAGGAAACTCAAAGGCACAGTTCCCA
+D3B4KKQ1_0166:8:1101:2043:2187#CGATGT/2
_abeaaacg^g^eghhhhgafghhdfghfedeghfiiicfbgdHYagfeecggf
@D3B4KKQ1_0166:8:1101:2188:2232#CGATGT/2
TAGGCTCAAAGTCTAACGCCAATCCCGAACCTGGGCATCTGTACACACACACAC
+D3B4KKQ1_0166:8:1101:2188:2232#CGATGT/2
abbeceeegggcghiihiihhhhiifhiiiiihiiiiiiihegh`eggfebfhg
…
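Because mate 1 and mate 2 sit at the same record index in the two files, a paired-end reader walks both files in lockstep. A minimal sketch, assuming strict 4-line FASTQ records and pre-1.8 names ending in `/1` and `/2` as above; `read_fastq` and `read_pairs` are illustrative helpers, not a library API.

```python
import io
from itertools import zip_longest


def read_fastq(handle):
    """Yield (name, sequence, quality) from a 4-line-per-record FASTQ stream."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return
        seq = handle.readline().rstrip("\n")
        handle.readline()  # '+' separator line (may repeat the read name)
        qual = handle.readline().rstrip("\n")
        if not header.startswith("@") or len(seq) != len(qual):
            raise ValueError(f"malformed FASTQ record: {header!r}")
        yield header[1:], seq, qual


def read_pairs(handle1, handle2):
    """Yield (pair name, (seq1, qual1), (seq2, qual2)), checking that mates match."""
    for rec1, rec2 in zip_longest(read_fastq(handle1), read_fastq(handle2)):
        if rec1 is None or rec2 is None:
            raise ValueError("read-1 and read-2 files have different numbers of reads")
        name1, name2 = rec1[0].rsplit("/", 1)[0], rec2[0].rsplit("/", 1)[0]
        if name1 != name2:  # strip /1 and /2 before comparing
            raise ValueError(f"mates out of sync: {rec1[0]} vs {rec2[0]}")
        yield name1, rec1[1:], rec2[1:]


r1 = io.StringIO("@X:1#CGATGT/1\nACGT\n+\nabcd\n@X:2#CGATGT/1\nGGCC\n+\nhhhh\n")
r2 = io.StringIO("@X:1#CGATGT/2\nTTGA\n+\nabcd\n@X:2#CGATGT/2\nCCAA\n+\nhhhh\n")
pairs = list(read_pairs(r1, r2))
print(pairs[0])
```

For the gzipped files above, `gzip.open("s_8_1_sequence.txt.gz", "rt")` and its read-2 counterpart can be passed instead of the in-memory streams.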
Illumina sequencing using HiSeq2000
• Previously: GAIIx: ~30M reads per lane, 8 lanes (1QC)
• Now: HiSeq2000 + TruSeq v3: 200M reads per lane, 8-16 lanes (1-2QC) in parallel with HiSeq2000
• Multiplexing: attach barcode, mix samples, sequence, identify and remove barcode
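The "identify" step of demultiplexing can be sketched as matching each read's barcode against a sample sheet. This assumes the barcode is already in the read name (the text after `#` in the reads above), uses a hypothetical sample sheet built from barcodes that appear in this document, and tolerates one mismatch; that tolerance is a common choice, not something the slide specifies.

```python
def barcode_of(read_name):
    """Index sequence from a pre-1.8 Illumina read name, e.g.
    'D3B4KKQ1_0166:8:1101:1960:2190#CGATGT/1' -> 'CGATGT'."""
    return read_name.split("#", 1)[1].split("/", 1)[0]


def hamming(a, b):
    """Number of differing positions (a length difference counts as mismatches)."""
    return sum(x != y for x, y in zip(a, b)) + abs(len(a) - len(b))


def assign_sample(read_name, sample_sheet, max_mismatches=1):
    """Return the sample whose barcode is closest to the read's barcode, or
    None if nothing is within max_mismatches or the best match is a tie."""
    observed = barcode_of(read_name)
    hits = sorted((hamming(observed, bc), sample) for bc, sample in sample_sheet.items())
    best_dist, best_sample = hits[0]
    if best_dist > max_mismatches:
        return None
    if len(hits) > 1 and hits[1][0] == best_dist:
        return None  # two barcodes equally close: ambiguous
    return best_sample


# Hypothetical sample sheet (sample names are made up for illustration).
SAMPLE_SHEET = {"CGATGT": "sample_A", "TTAGGC": "sample_B", "CTTGTA": "sample_C"}
print(assign_sample("D3B4KKQ1_0166:8:1101:1960:2190#CGATGT/1", SAMPLE_SHEET))
```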
Full Genome Sequencing using Illumina technology
• ~$4-6K reagent with Illumina (storage+analysis costs not included)
• Exercise: you want to sequence 1 human genome at 100X coverage; how many lanes?
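One way to work the exercise: lanes = (genome size × coverage) / (reads per lane × read length), rounded up. The inputs below are assumptions, not from the slide: a ~3.0 Gb genome and 100-bp reads, together with the 200M reads per lane quoted for HiSeq2000 above. Whether "200M reads" counts clusters or individual reads changes the paired-end answer by a factor of 2, and unmapped or duplicate reads would push the real number up.

```python
def lanes_needed(genome_size_bp, coverage, reads_per_lane, read_length_bp, reads_per_fragment=1):
    """Lanes = bases required / bases produced per lane, rounded up."""
    bases_needed = genome_size_bp * coverage
    bases_per_lane = reads_per_lane * read_length_bp * reads_per_fragment
    return -(-bases_needed // bases_per_lane)  # integer ceiling division


# Assumptions: 3.0 Gb genome, 200M clusters per lane, 100-bp reads.
print(lanes_needed(3_000_000_000, 100, 200_000_000, 100))     # single-end: 15
print(lanes_needed(3_000_000_000, 100, 200_000_000, 100, 2))  # paired-end 2x100: 8
```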
QC for Illumina (part 1)
[Figure: sequencing-by-synthesis schematic]
3. Primary data analysis (alignment, QC)
Read alignment programs
• BWA (Burrows-Wheeler Aligner)
  – http://bio-bwa.sourceforge.net/
  – Fast, accurate, can find (short) indels
  – Allows 1-3 mismatches by default
  – Can also align longer 454 reads
• Bowtie
  – http://bowtie-bio.sourceforge.net/index.shtml
  – Ultrafast, accurate; the newest version (Bowtie 2) finds indels too
  – Allows 1-3 mismatches by default
  – Integrated into TopHat (splice aligner)
• Others: Eland, Maq, SOAP, etc.
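BWA and Bowtie both search a Burrows-Wheeler transform (FM-index) of the genome. The toy sketch below shows only the core idea, exact-match backward search; it is not how either tool is implemented (no mismatches, no gaps, naive index construction that would never scale to a genome).

```python
def bwt_index(text):
    """Build a toy FM-index: suffix array, C table and an occurrence function.

    Naive suffix-array construction: fine for short demo strings only.
    """
    text += "$"  # unique sentinel, lexicographically smallest character
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    bwt = "".join(text[i - 1] for i in sa)  # text[-1] is '$' when i == 0
    C, total = {}, 0  # C[c] = number of characters in text smaller than c
    for ch in sorted(set(text)):
        C[ch] = total
        total += text.count(ch)

    def occ(ch, i):
        """Occurrences of ch in bwt[:i] (real indexes store sampled counts)."""
        return bwt[:i].count(ch)

    return sa, C, occ


def backward_search(pattern, sa, C, occ):
    """Return sorted 0-based positions where pattern occurs exactly."""
    lo, hi = 0, len(sa)  # suffix-array interval [lo, hi) of current matches
    for ch in reversed(pattern):  # extend the match one character to the left
        if ch not in C:
            return []
        lo = C[ch] + occ(ch, lo)
        hi = C[ch] + occ(ch, hi)
        if lo >= hi:
            return []
    return sorted(sa[lo:hi])


index = bwt_index("GATTACAGATTACA")
print(backward_search("TTA", *index))  # [2, 9]
```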
BWA tutorial (aligning single-end reads to the genome)
• Get the genome, e.g., from UCSC
  – http://hgdownload.cse.ucsc.edu/goldenPath/hg19/bigZips/chromFa.tar.gz
• Combine into 1 file
  – tar zxvf chromFa.tar.gz
  – cat *.fa > wg.fa
• Index the genome
  – bwa index -p hg19bwaidx -a bwtsw wg.fa
• Align
  – bwa aln -t 4 hg19bwaidx s_3_sequence.txt.gz > s_3_sequence.txt.bwa
• Convert to SAM format
  – bwa samse hg19bwaidx s_3_sequence.txt.bwa s_3_sequence.txt.gz > s_3_sequence.txt.sam
Aligning paired-end reads
• Align the two files separately
  – bwa aln -t 4 hg19bwaidx s_3_1_sequence.txt.gz > s_3_1_sequence.txt.bwa
  – bwa aln -t 4 hg19bwaidx s_3_2_sequence.txt.gz > s_3_2_sequence.txt.bwa
• Convert to SAM format
  – bwa sampe hg19bwaidx s_3_1_sequence.txt.bwa s_3_2_sequence.txt.bwa s_3_1_sequence.txt.gz s_3_2_sequence.txt.gz > s_3_sequence.txt.sam
TopHat (spliced alignment)
Trapnell et al., 2009
tophat -r 100 -p 4 -o outdir/ hg18 s_1_1_sequence.txt s_1_2_sequence.txt
(-r: expected mean inner distance D between mate pairs, here D ~ 100 bp)
Download genome index ftp://ftp.cbcb.umd.edu/pub/data/bowtie_indexes/hg18.ebwt.zip
Basic QC
• Fraction of mapped reads
• How many unique mappers?
• Fraction of clonal reads (PCR duplicates)
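The three QC numbers can be computed directly from single-end SAM records. A sketch under stated assumptions: "unique mapper" is taken to mean MAPQ >= 1 (conventions differ between aligners, e.g. BWA's `XT:A:U` or TopHat's `NH:i:1`), a read is "clonal" if an earlier mapped read has the same chromosome, position and strand, and only the first five SAM columns are read, so the demo lines are truncated records.

```python
def basic_qc(sam_lines, min_mapq=1):
    """Mapped fraction, unique-mapper fraction and clonal fraction from
    single-end SAM text lines (illustrative definitions, see lead-in)."""
    total = mapped = unique = clonal = 0
    seen = set()
    for line in sam_lines:
        if line.startswith("@"):              # skip header lines
            continue
        fields = line.rstrip("\n").split("\t")
        flag, chrom, pos, mapq = int(fields[1]), fields[2], int(fields[3]), int(fields[4])
        if flag & 0x900:                      # secondary (0x100) or supplementary (0x800)
            continue
        total += 1
        if flag & 0x4:                        # read unmapped
            continue
        mapped += 1
        unique += mapq >= min_mapq
        key = (chrom, pos, bool(flag & 0x10))  # 0x10 = reverse strand
        if key in seen:
            clonal += 1
        else:
            seen.add(key)
    return {
        "mapped_fraction": mapped / total if total else 0.0,
        "unique_fraction": unique / mapped if mapped else 0.0,
        "clonal_fraction": clonal / mapped if mapped else 0.0,
    }


demo = ["\t".join(f) for f in [
    ("r1", "0", "chr1", "100", "37"),
    ("r2", "0", "chr1", "100", "37"),   # same position and strand as r1: clonal
    ("r3", "16", "chr1", "100", "0"),   # reverse strand, ambiguous (MAPQ 0)
    ("r4", "4", "*", "0", "0"),         # unmapped
]]
qc = basic_qc(demo)
print(qc)
```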
4. Read formats
Read formats
• SAM/BAM
• Eland/Eland Export
SAM format
DH1608P1_0130:6:1103:10579:166379#TTAGGC 16 chr1 1249828 37 51M * 0 0 GGGCGTGACTCTGATCTCAGGCATCGTCTCCGCCGCGCTCCCGGACCCGCG eb`XXYbZdadee^ceV]X][ccTcc^ebeeceeeeWbeeeeeeeceeaee XX:Z:NM_017871,32 NM:i:0 MD:Z:51
DH1608P1_0130:6:1102:3415:150915#TTAGGC 16 chr1 1249828 37 51M * 0 0 GGGCGGGACTCTGATCTCAGGCATCGTCTCCGCCGCGCTCCCGGACCCGCG BBBBBBBBBBBac]bbbceedaeddeZceeea_ba_\_eeeeeeedaeeee XX:Z:NM_017871,32 NM:i:1 MD:Z:5T45
DH1608P1_0130:6:1102:13118:62644#TTAGGC 16 chr1 1249828 37 51M * 0 0 GGGCGTGCCTCGGATCTCAGGCATCGTCTCCGCCGCGCTCCCGGACCCGCG BBBBBBBBBBBBBBBBBBBBB`XTbSa`cffegdggeccbeeffdeggggg XX:Z:NM_017871,32 NM:i:2 MD:Z:7A3T39
DH1608P1_0130:6:1203:3012:157120#TTAGGC 16 chr1 1249826 25 51M * 0 0 AAGGCCGTGACTCTGATCTCAGCCCTCGTCTCCGCCGCGCTCCCGGACCCG BBBBBBBB^`QWZZ]UXYSZSTFRU]Z__SO[adcc[acdV\`Y]YWY][_ XX:Z:NM_017871,34 NM:i:3 MD:Z:4G17G1A26
DH1608P1_0130:6:2206:4445:12756#TTAGGC 16 chr1 1246336 25 1M3487N50M * 0 0 CCAAAGGGTGTGACTCTGATCTCGGGCATCGTCTCCGCCGCGCTCCCGGAC BBBBBBBBBBBBBBBBBBBBBBBB`YdddYdc\cacaNddddcdddaeeee XX:Z:NM_017871,37 NM:i:3 MD:Z:2C5C14A27
DH1608P1_0130:6:2203:7903:43788#TTAGGC 16 chr1 1246336 37 1M3487N50M * 0 0 CCCAAGGGCGTGACTCTGATCTCAGGCATCGTCTCCGCCGCGCTCCCGGAC adbe[fbcbccb_cb^cb^^c^edgegggggdfggefffgfbfggggegeg XX:Z:NM_017871,37 NM:i:0 MD:Z:51
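MD tag, e.g. MD:Z:4T46 = 4 matches, 1 mismatch (reference base T; the MD letter is always the reference base), 46 matches
CIGAR string, e.g. 5M3487N46M = 5-bp aligned block, 3,487-bp skipped region of the reference (N, e.g. an intron), 46-bp aligned block
XT tag (BWA), e.g. XT:A:U = unique mapper; XT:A:R = repeat (more than 1 equally high-scoring match)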
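Both strings are easy to decode programmatically. A minimal sketch following the SAM specification's operator definitions: M, D, N, = and X consume reference bases, and in MD a number is a run of matches, a letter is the reference base at a mismatch, and `^` introduces bases deleted from the read. Offsets returned by `md_mismatches` count aligned reference bases only, because MD does not describe N-skipped introns.

```python
import re

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
MD_TOKEN = re.compile(r"(\d+)|(\^[A-Z]+)|([A-Z])")


def reference_span(cigar):
    """Reference bases covered by an alignment (M, D, N, =, X consume the reference)."""
    return sum(int(n) for n, op in CIGAR_OP.findall(cigar) if op in "MDN=X")


def md_mismatches(md):
    """(offset along aligned reference bases, reference base) for each mismatch."""
    pos, out = 0, []
    for run, deletion, base in MD_TOKEN.findall(md):
        if run:
            pos += int(run)              # run of matching bases (may be "0")
        elif deletion:
            pos += len(deletion) - 1     # '^AC': reference bases missing from the read
        else:
            out.append((pos, base))      # mismatch; the letter is the reference base
            pos += 1
    return out


print(reference_span("1M3487N50M"))  # 3538
print(md_mismatches("4G17G1A26"))    # [(4, 'G'), (22, 'G'), (24, 'A')]
```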
Paired-end SAM
D3B4KKQ1_0161:8:2206:11080:31374#CTTGTA 83 chr1 4481348 255 51M = 4481165 0 TTAGATGCATTTTCTTACCATTGTAAGAAAAATGAAAATTTTACAATTAAG hiiiiiiihihhdhghggdiiihihffihhheihihhhgggggeeeeebbb NM:i:0 NH:i:1
D3B4KKQ1_0161:8:2206:8294:192062#CTTGTA 147 chr1 4481355 255 51M = 4481284 0 CATTTTCTTACCATTGTAAGAAAAATGAAAATTTTACAATTAAGTATACAC efehffhgfdiihhhhhihghiiihfhihdhiihgghigefggeeeeebbb NM:i:0 NH:i:1
D3B4KKQ1_0161:8:2204:6985:145082#CTTGTA 147 chr1 4481360 255 51M = 4481202 0 TCTTACCATTGTAAGAAAAATGAAAATTTTACAATTAAGTATACACTTCTA ghfhgihihghgihgiiiifiiiiihhhhfifhihhiigggeeceeeea__ NM:i:0 NH:i:1
D3B4KKQ1_0161:8:2205:15014:60805#CTTGTA 83 chr1 4481360 255 51M = 4481238 0 TCTTACCATTGTAAGAAAAATGAAAATTTTACAATTAAGTATACACTTCTA hihheiihiiiiiiiiiiiiiiiiiifhiefhiiiiiigggggeceeebba NM:i:0 NH:i:1
D3B4KKQ1_0161:8:1105:17802:25847#CTTGTA 83 chr1 4481362 255 51M = 4481198 0 TTACCATTGTAAGAAAAATGAAAATTTTACAATTAAGTATACACTTCTAAT gheiiiihhhiiiiiiiiiihiiiiiihgfiiiiiiiigeggceeeeebb_ NM:i:0 NH:i:1
D3B4KKQ1_0161:8:1208:2232:73719#CTTGTA 147 chr1 4481366 255 51M = 4481277 0 CATTGTAAGAAAAATGAAAATTTTACAATTAAGTATACACTTCTAATTGTA fhghiiiiiiiiiiiiiiiiiiihghiihiiiiihgggegfggeeeeebbb NM:i:0 NH:i:1
D3B4KKQ1_0161:8:2104:18142:93861#CTTGTA 83 chr1 4481367 255 51M = 4481198 0 ATTGTAAGAAAAATGAAAATTTTACAATTAAGTATACACTTCTAATTGTAT ihghiiiheiiiiihhihfhifgghhhhfgfhiggge_ggggeeeeee_bb NM:i:0 NH:i:1
NM = edit distance; NH = number of alignments for that read
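The second SAM column (FLAG) is a bit field; the values 16, 83 and 147 in the examples above decode as follows. The bit meanings are those of the SAM specification; the helper name is illustrative.

```python
SAM_FLAG_BITS = [
    (0x1, "paired"),
    (0x2, "proper pair"),
    (0x4, "unmapped"),
    (0x8, "mate unmapped"),
    (0x10, "read reverse strand"),
    (0x20, "mate reverse strand"),
    (0x40, "first in pair"),
    (0x80, "second in pair"),
    (0x100, "secondary alignment"),
    (0x200, "fails QC"),
    (0x400, "PCR/optical duplicate"),
    (0x800, "supplementary alignment"),
]


def decode_flag(flag):
    """Names of the bits set in a SAM FLAG value."""
    return [name for bit, name in SAM_FLAG_BITS if flag & bit]


for flag in (16, 83, 147):
    print(flag, decode_flag(flag))
```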
BAM format
• Compressed (binary), indexable version of SAM
• Can be displayed in the UCSC Genome Browser (loaded by URL as a custom track)
SAMtools
• http://samtools.sourceforge.net/
• Convert SAM to BAM
  – samtools view -bS file.sam > file.bam
• Sort BAM file
  – samtools sort file.bam file.sorted # (will create file.sorted.bam; samtools 1.x uses: samtools sort -o file.sorted.bam file.bam)
• Index BAM file
  – samtools index file.sorted.bam
• Convert BAM to SAM
  – samtools view -h file.bam > file.sam # (-h keeps the header)
• Rsamtools
  – http://www.bioconductor.org/packages/2.6/bioc/html/Rsamtools.html
• Get alignment statistics
  – samtools flagstat pairendfile.bam
149923886 in total
0 QC failure
0 duplicates
124520915 mapped (83.06%)
149923886 paired in sequencing
74961943 read1
74961943 read2
120504218 properly paired (80.38%)
121586068 with itself and mate mapped
2934847 singletons (1.96%)
482748 with mate mapped to a different chr
143256 with mate mapped to a different chr (mapQ>=5)
• Get pileup
  – samtools pileup file.sorted.bam (later samtools versions replace pileup with samtools mpileup)
chr1 1156 T 26 tTttTTTtTttTttTtTtTTGTTTTT ggggeggggg^Vgf_fggggJceb_g
chr1 1157 T 26 tTttTTTtTttTttTtTtTTTTTTTT ggggfggggg[RgfNfgfgg`ed^]f
chr1 1158 G 26 g$GggGGGgGggGggGgGgGGGGGGGG gggg_ggggg[Ugfddgggga_eW\c
chr1 1159 A 25 AaaAAAaAaaAaaAaAaAAAAAAAA gggaefggg_Xgf_fggggadd]Zg
chr1 1160 A 25 AaaAAAaAaaAaaAaAaAAAAAAAA ggefggggdNVgbZbgggg`ee[\g
chr1 1161 C 25 C$c$c$CCCcCccCccCcCcCCCCCCCC gfgfggfggYYgeadgggg`ea^\g
chr1 1162 C 23 C$CCcCccCccCcCcCCCCCCCC^FC fgggge_`gf_dgggge_e]_gg
chr1 1163 T 22 T$T$tTttTttTtTtTTTTTTTTT ggffg\Rgf_dggeggde]_cg
chr1 1164 C 20 cCccCccCcCcCCCCCCCCC ggg`[gf_dggggg\d[]fg
chr1 1165 A 22 a$AaaAaaAaAaAAAAAAAAA^FA^FA ged_]ggadffgggecX^ggfg
chr1 1166 G 21 G$g$g$GggGgGgGGGGGGGGGGG ggc`gfWfggfggcaSdggfe
chr1 1167 C 19 CccCcCcCCCCCCCCCCC^FC agg\dgggggbZUdfgfgg
chr1 1168 T 19 TttTtTtTTTTTTTTTTTT eggcbfgfgg_cXdegfgg
chr1 1169 T 19 TttTtTtTTTTTTTTTTTT aggccggdggccZdggfgf
chr1 1170 T 19 TttTtTtTTTTTTTTTTTT `gfcfgggggccUcggcgg
chr1 1171 A 19 AaaAaAaAAAAAAAAAAAA ege_fgggggcc[aggcgg
chr1 1172 A 19 A$aaAaAaAAAAAAAAAAAA XggLfggfggdeM_ggagg
chr1 1173 G 18 g$gGgGgGGGGGGGGGGGG gf\fgggggcfPcggegg
chr1 1174 A 17 a$AaAaAAAAAAAAAAAA fce[gggg_eL]ggfdf
chr1 1175 A 16 A$aAaAAAAAAAAAAAA dfggfggdfS[ggegg
^ = start of a read at that position (followed by one character encoding the read's mapping quality); $ = end of a read at that position; upper/lower case = read on the forward/reverse strand
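To turn a pileup column into allele counts, the `^` (plus its mapping-quality character) and `$` markers must be skipped, as must indel annotations of the form `+n`/`-n` followed by n bases. A sketch of that parsing; strand is collapsed by upper-casing, and `count_pileup_bases` is an illustrative helper, not part of samtools.

```python
import re
from collections import Counter


def count_pileup_bases(bases):
    """Count alleles in a pileup base column such as 'C$CCcCcc^FC'.

    Upper case = forward-strand read, lower case = reverse strand (merged here).
    '^' marks a read start and is followed by one mapping-quality character;
    '$' marks a read end; '+n'/'-n' introduce an n-base indel that follows.
    """
    counts = Counter()
    i = 0
    while i < len(bases):
        ch = bases[i]
        if ch == "^":
            i += 2                               # skip '^' and the mapping-quality char
        elif ch == "$":
            i += 1
        elif ch in "+-":
            m = re.match(r"\d+", bases[i + 1:])
            if m is None:
                raise ValueError(f"indel without length at {i}: {bases!r}")
            i += 1 + len(m.group()) + int(m.group())  # skip sign, digits, indel bases
        else:
            counts[ch.upper()] += 1
            i += 1
    return counts


column = count_pileup_bases("tTttTTTtTttTttTtTtTTGTTTTT")  # chr1 1156 above
print(column, column["G"] / sum(column.values()))          # one G among 26 reads
```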
• Removing clonal reads
  – Multiple reads that map to the same position with the same orientation are usually considered PCR duplicates
  – For mutation detection (less important for RNA-seq), they need to be collapsed into 1 read (e.g. the read with the highest quality score)
  – samtools rmdup -s file.bam file_noclonal.bam (input must be coordinate-sorted; -s treats reads as single-end)
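The collapsing rule described above (same position, same orientation, keep the best read) can be sketched in a few lines. The record layout is hypothetical, and the key is the leftmost position as in SAM's POS column; real tools such as samtools rmdup use their own rules (for paired-end data they also consider mate positions).

```python
def remove_clonal(reads):
    """Keep one read per (chromosome, position, strand): the one with the
    highest summed base quality.

    reads: iterable of dicts with keys 'name', 'chrom', 'pos', 'reverse' and
    'quals' (list of Phred scores); a hypothetical record layout.
    """
    best = {}
    for read in reads:
        key = (read["chrom"], read["pos"], read["reverse"])
        if key not in best or sum(read["quals"]) > sum(best[key]["quals"]):
            best[key] = read
    return list(best.values())


reads = [
    {"name": "a", "chrom": "chr1", "pos": 1000, "reverse": False, "quals": [10, 10]},
    {"name": "b", "chrom": "chr1", "pos": 1000, "reverse": False, "quals": [30, 30]},
    {"name": "c", "chrom": "chr1", "pos": 1000, "reverse": False, "quals": [20, 20]},
    {"name": "d", "chrom": "chr1", "pos": 1000, "reverse": True, "quals": [5, 5]},
]
kept = remove_clonal(reads)
print(sorted(r["name"] for r in kept))  # ['b', 'd']
```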
5. Secondary Analysis (transcript level quantification, mutation calling)
RPKM
Reads per kilobase of transcript per million mapped reads
• R: count how many reads map to a transcript
• K: divide by (length of transcript / 1,000)
• M: divide by (total number of mapped reads in sample / 1,000,000)
Cufflinks uses FPKM (same as RPKM, F = fragment, for paired-end reads)
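The three steps combine into RPKM = R × 10^9 / (transcript length × total mapped reads). A direct transcription, with made-up numbers for the example; for FPKM the count would be fragments (read pairs counted once) instead of reads.

```python
def rpkm(read_count, transcript_length_bp, total_mapped_reads):
    """R, divided by transcript length in kb, divided by mapped reads in millions."""
    per_kb = read_count / (transcript_length_bp / 1_000)
    return per_kb / (total_mapped_reads / 1_000_000)


# Hypothetical transcript: 500 reads, 2,000 bp long, sample with 10M mapped reads.
print(rpkm(500, 2_000, 10_000_000))  # 500 / 2 kb / 10 M = 25.0
```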
Cufflinks
Trapnell et al., 2010
cufflinks -p 4 -o outdir/ s_1_sequence.txt.sorted.bam
http://www.broadinstitute.org/software/scripture/
http://genes.mit.edu/burgelab/miso/
Detecting Single Nucleotide Variations (SNVs)
AAAAATTCTCCCAAAACAAAAAAATACGCGTATTCTCCCAAAACAATATCTTACAAGATGTAAATATACCCAAGATG
Reference Human Genome (hg18)
AAAATACGCGTATTCTCCCAAAACAATATC
Short read
AAAAATTCTCCCAAAACAAAAAAATACGCGTATTCTCCCAAAACAATATCTTACAAGATGTAAATATACCCAAGATG
Reference Human Genome (hg18)
AAAATACGCCTATTCTCCCAAAACAATATC
Short read
AAAAATTCTCCCAAAACAAAAAAATACGCGTATTCTCCCAAAACAATATCTTACAAGATGTAAATATACCCAAGATG
Reference Human Genome (hg18)
AAAATACGCCTATTCTCCCATAACAATATC
Short read
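A brute-force version of what these slides show: slide the short read along the reference without gaps and keep the offset with the fewest mismatches. This is only an illustration of the idea; aligners like BWA and Bowtie use an index instead of trying every offset. Mismatch positions are 0-based positions in the read.

```python
def best_ungapped_hit(reference, read):
    """Try every offset (no gaps) and return (offset, mismatching read positions)
    for the offset with the fewest mismatches; the first such offset wins ties."""
    best_offset, best_mm = -1, None
    for offset in range(len(reference) - len(read) + 1):
        window = reference[offset:offset + len(read)]
        mm = [i for i, (r, g) in enumerate(zip(read, window)) if r != g]
        if best_mm is None or len(mm) < len(best_mm):
            best_offset, best_mm = offset, mm
    return best_offset, best_mm


REF = "AAAAATTCTCCCAAAACAAAAAAATACGCGTATTCTCCCAAAACAATATCTTACAAGATGTAAATATACCCAAGATG"
READS = ("AAAATACGCGTATTCTCCCAAAACAATATC",   # exact match
         "AAAATACGCCTATTCTCCCAAAACAATATC",   # one mismatch
         "AAAATACGCCTATTCTCCCATAACAATATC")   # two mismatches
for read in READS:
    print(best_ungapped_hit(REF, read))
```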
Sequencing has a high error rate
Mismatch = real variation OR sequencing error
AAAAATTCTCCCAAAACAAAAAAATACGCGTATTCTCCCAAAACAATATCTTACAAGATGTAAATATACCCAAGATG
Reference Human Genome (hg18)
AAAATACGCCTATTCTCCCAAAACAATATC
Short read
Typical mismatch rate of entire datasets = 0.5-2% (errors >> real variations)
chr2, pos=85623221 bp
Single Nucleotide Variation
chr14, pos=35859525 bp
Single Nucleotide Variation
chr1, pos=220952447
Single Nucleotide Variation
All cells in tumor have heterozygous mutation
A fraction of cells have heterozygous mutation
Loss of heterozygosity due to loss of genetic material
Cancer mutations
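The three scenarios differ in the fraction of reads expected to carry the mutation. A simple mixing model makes this concrete, assuming reads are sampled in proportion to allele copies, normal cells are diploid with only reference alleles, and no mapping bias; a subclonal mutation is modeled by setting `purity` to the fraction of cells that carry it. This model is an illustration, not from the slide.

```python
def expected_vaf(purity, tumor_copies=2, mutant_copies=1, normal_copies=2):
    """Expected fraction of reads showing the mutant allele.

    purity: fraction of cells carrying the tumor genotype (or, for a subclonal
    mutation, the fraction of cells carrying the mutation).
    """
    mutant = purity * mutant_copies
    total = purity * tumor_copies + (1 - purity) * normal_copies
    return mutant / total


print(expected_vaf(1.0))                                   # all cells heterozygous: expected 0.5
print(expected_vaf(0.4))                                   # 40% of cells heterozygous: expected 0.2
print(expected_vaf(1.0, tumor_copies=1, mutant_copies=1))  # LOH, wild-type copy lost: expected 1.0
```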
The error/mismatch rate is not uniform across read length
[Figure: mismatch rate along the read length]
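The non-uniformity can be measured by tallying mismatches per read position (sequencing cycle). A sketch, assuming each read comes paired with the reference window it aligned to without gaps (e.g. as reported by an aligner); plotting the resulting profile is one way to see the pattern described above.

```python
def mismatch_rate_by_position(pairs):
    """pairs: (read, reference window) strings aligned without gaps.
    Returns the mismatch fraction at each read position (sequencing cycle)."""
    length = max(len(read) for read, _ in pairs)
    mismatches = [0] * length
    covered = [0] * length
    for read, ref in pairs:
        for i, (r, g) in enumerate(zip(read, ref)):
            covered[i] += 1
            mismatches[i] += r != g
    return [m / c if c else 0.0 for m, c in zip(mismatches, covered)]


pairs = [("ACGT", "ACGA"), ("ACGT", "ACGT"), ("ACCT", "ACGA")]
rates = mismatch_rate_by_position(pairs)
print(rates)  # mismatches concentrated at the last two positions
```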
Popular SNV calling programs
• GATK http://www.broadinstitute.org/gsa/wiki/index.php/The_Genome_Analysis_Toolkit
• VarScan http://varscan.sourceforge.net/
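SNVseeqer: Single Nucleotide Variation detection from deep sequencing data
• N reads cover the considered genome position; k of them carry the mutation
• Each read i has its own error rate p_i
[Figure: reads at the position, labeled with their error rates p1, p3, p5, p6, p8, p9, p10, p11, p14, p17]
• Is k greater than expected by chance, given the error rates p_i?
• The number of mismatches produced by errors alone follows the Poisson-Binomial distribution
Wacker et al., 2012; Jiang et al., 2012; Chen & Liu, 1997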
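The Poisson-Binomial question can be answered exactly with a short dynamic program over the reads. This is a sketch of the statistical idea only; SNVseeqer's actual model may differ. It assumes reads err independently and, conservatively, counts any error as producing the observed mismatch (a refinement would use p_i/3 for one specific alternative base). Per-read error rates come from Phred qualities, p = 10^(-Q/10); the pileup below is hypothetical.

```python
import math


def poisson_binomial_pmf(probs):
    """P(K = k), k = 0..N, where K counts successes among independent
    Bernoulli trials with success probabilities probs (dynamic programming)."""
    pmf = [1.0]
    for p in probs:
        nxt = [0.0] * (len(pmf) + 1)
        for k, mass in enumerate(pmf):
            nxt[k] += mass * (1 - p)   # this read is correct
            nxt[k + 1] += mass * p     # this read carries an error
        pmf = nxt
    return pmf


def snv_pvalue(error_probs, k):
    """P(K >= k): chance that errors alone produce at least k mismatching reads."""
    return math.fsum(poisson_binomial_pmf(error_probs)[k:])


# Hypothetical pileup: 10 reads with per-read error rates from base qualities.
quals = [30, 30, 20, 35, 25, 30, 10, 30, 40, 20]
error_probs = [10 ** (-q / 10) for q in quals]
print(snv_pvalue(error_probs, 3))  # a small probability: 3 mismatches are unlikely to be errors
```

With all p_i equal, the same code reduces to the ordinary binomial tail, which is a convenient sanity check.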
Indel calling
• Complicated because indels often occur within microsatellite regions, e.g. CACACACA
  – CA--CACACA is as good as CACA--CACA or CACACA--CA
• Since reads are aligned independently, local realignment is needed
• DINDEL (used in the 1000 Genomes Project)
  – http://www.sanger.ac.uk/resources/software/dindel/
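The CACACACA ambiguity can be shown directly: deleting "CA" at different offsets in a dinucleotide repeat yields the same sequence, so equivalent calls must be normalized, conventionally by shifting the indel as far left as possible. A minimal sketch for deletions; it illustrates the ambiguity only and is not how DINDEL works internally.

```python
def apply_deletion(ref, pos, length):
    """Sequence left after deleting ref[pos:pos + length]."""
    return ref[:pos] + ref[pos + length:]


def left_align_deletion(ref, pos, length):
    """Shift a deletion left while the result is unchanged: deleting
    ref[pos-1:pos-1+length] gives the same sequence exactly when
    ref[pos - 1] == ref[pos + length - 1]."""
    while pos > 0 and ref[pos - 1] == ref[pos + length - 1]:
        pos -= 1
    return pos


ref = "CACACACACA"
for pos in (2, 4, 6):
    print(pos, apply_deletion(ref, pos, 2), left_align_deletion(ref, pos, 2))
```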
Variant annotation
• Variants can be either mutations or (more often) polymorphisms; dbSNP catalogs known polymorphisms
• Missense, nonsense, intron, 3’UTR, 5’UTR, etc.
  – SeattleSNP http://pga.gs.washington.edu/
• Severity of missense mutations
  – PolyPhen http://genetics.bwh.harvard.edu/pph2/
  – Mutation Assessor http://mutationassessor.org/
• GATK for variant annotation http://www.broadinstitute.org/gsa/wiki/index.php/The_Genome_Analysis_Toolkit
• Cross-species conservation
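For a variant already known to fall in a coding exon, the missense/nonsense distinction comes from translating the codon before and after the change with the standard genetic code. A sketch assuming the codon and the variant's position within it are known on the coding strand; classes such as intron or UTR need a transcript annotation and are not covered.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def classify_snv(codon, position, alt):
    """Classify a substitution at position 0-2 of a coding-strand codon."""
    mutant = codon[:position] + alt + codon[position + 1:]
    before, after = CODON_TABLE[codon], CODON_TABLE[mutant]
    if before == after:
        return "synonymous"
    if after == "*":
        return "nonsense"     # premature stop codon
    if before == "*":
        return "stop-loss"
    return "missense"


print(classify_snv("TGG", 2, "A"))  # Trp -> stop: nonsense
print(classify_snv("GAA", 1, "T"))  # Glu -> Val: missense
```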
6. Read data visualization
SAMtools
samtools tview file.sorted.bam wg.fa
UCSC Genome Browser
• Load the BAM file as a custom track: put the BAM file and its .bai index somewhere UCSC can reach over the web (e.g. your own web page) and give the browser its URL
Integrative Genomics Viewer (IGV)
Read densities
[Figure: reads stacked along a genomic sequence and the resulting read count at each genome position]
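Read densities are per-position read counts (coverage). A minimal sketch, assuming reads are given as hypothetical 0-based, half-open (start, end) alignment intervals on one chromosome: add +1 where each read starts and -1 where it ends, then take a running sum. In practice `samtools depth` or `bedtools genomecov` compute this directly from a BAM file.

```python
from itertools import accumulate

def coverage(reads, genome_length):
    """Per-position read depth from 0-based, half-open (start, end) intervals."""
    diff = [0] * (genome_length + 1)
    for start, end in reads:
        diff[start] += 1   # the read starts covering here
        diff[end] -= 1     # and stops covering here (end is exclusive)
    # The running sum of the +1/-1 events is the depth at each position.
    return list(accumulate(diff))[:genome_length]

reads = [(0, 5), (2, 7), (3, 8)]  # hypothetical aligned reads
print(coverage(reads, 10))  # [1, 1, 2, 3, 3, 2, 2, 1, 0, 0]
```

Recording only the two endpoints of each read keeps the cost proportional to the number of reads plus the genome length, rather than to the total number of aligned bases.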
Wiggle files for Genome Browser
variableStep chrom=chr1 span=10
1471 0.3
1481 0.6
1491 0.6
1501 0.6
1511 0.6
1521 0.6
1531 1.1
1541 1.7
1551 1.9
1561 2.1
1571 2.5
1581 2.8
1591 3.2
1601 3.9
1611 3.9
1621 4.5
1631 4.8
1641 4.2
1651 3.9
1661 3.8
1671 3.2
1681 2.4
1691 1.9
1701 1.4
1711 1.3
1721 0.8
1871 1.4
1881 4.9
1891 9.1
1901 9.7
1911 10.7
1921 11.2
1931 12.3
1941 16.5
1951 23.4
1961 29.9
1971 32.6
1981 31.8
1991 28.0
2001 29.6
2011 30.6
2021 32.7
2031 32.7
2041 29.2
http://genome.ucsc.edu/goldenPath/help/bigWig.html
http://genome.ucsc.edu/goldenPath/help/wiggle.html
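A variableStep wiggle track can be produced by averaging per-base depth over fixed-size bins. A minimal sketch, assuming a per-base depth list for one chromosome starting at its first base: it averages each full bin of `span` bases, drops a trailing partial bin, skips empty bins (variableStep allows positions to be omitted), and writes 1-based start positions as the wiggle format requires. The helper names are hypothetical. For large tracks, UCSC's `wigToBigWig` utility converts wiggle to the indexed bigWig format.

```python
def bin_means(depth, span):
    """Mean depth over consecutive full bins of `span` bases (trailing partial bin dropped)."""
    return [sum(depth[i:i + span]) / span
            for i in range(0, len(depth) - span + 1, span)]

def variable_step_wig(chrom, depth, span):
    """Render binned depth as a variableStep wiggle track with 1-based positions."""
    lines = [f"variableStep chrom={chrom} span={span}"]
    for k, mean in enumerate(bin_means(depth, span)):
        if mean > 0:  # empty bins may simply be omitted in variableStep
            lines.append(f"{k * span + 1} {mean:.1f}")
    return "\n".join(lines) + "\n"

depth = [0] * 10 + [1] * 10 + [3] * 5 + [1] * 5  # hypothetical per-base depth
print(variable_step_wig("chr1", depth, 10), end="")
```

Here the first bin has zero depth and is skipped, so the track starts at position 11.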
7. Bioconductor packages for high-throughput sequencing
BioC packages
• IRanges http://bioconductor.org/packages/release/bioc/html/IRanges.html
• Rsamtools http://bioconductor.org/packages/2.7/bioc/html/Rsamtools.html
• ShortRead http://bioconductor.org/packages/release/bioc/html/ShortRead.html
• rtracklayer http://bioconductor.org/packages/2.8/bioc/html/rtracklayer.html
• BSgenome http://bioconductor.org/packages/release/bioc/html/BSgenome.html
And many more
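SAMtools, Unix programs and R/BioC
• Rsamtools
• Unix commands can be run from R with system(), e.g. removing clonal (duplicate) reads from single-end data:
system("samtools rmdup -s file.bam file_noclonal.bam")  # -s: single-end mode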
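The `rmdup` call above removes clonal (PCR duplicate) reads. As a conceptual sketch only, not samtools's exact algorithm: for single-end data, reads whose alignments share chromosome, 5' position and strand are treated as duplicates, and one is kept, following the keep-the-highest-mapping-quality rule the samtools manual describes for read pairs. The `Read` record and its fields are hypothetical stand-ins for BAM records.

```python
from typing import NamedTuple

class Read(NamedTuple):
    name: str
    chrom: str
    five_prime: int  # hypothetical: 5' end coordinate of the alignment
    strand: str      # '+' or '-'
    mapq: int        # mapping quality

def remove_clonal(reads):
    """Keep one read per (chrom, 5' position, strand): the one with the highest MAPQ."""
    best = {}
    for r in reads:
        key = (r.chrom, r.five_prime, r.strand)
        if key not in best or r.mapq > best[key].mapq:
            best[key] = r
    return sorted(best.values(), key=lambda r: (r.chrom, r.five_prime, r.strand))

reads = [
    Read("r1", "chr1", 100, "+", 30),
    Read("r2", "chr1", 100, "+", 60),  # duplicate of r1 with higher MAPQ
    Read("r3", "chr1", 100, "-", 20),  # same position, other strand: kept
    Read("r4", "chr1", 250, "+", 37),
]
print([r.name for r in remove_clonal(reads)])
```

Strand is part of the key because reads starting at the same coordinate on opposite strands come from different fragments.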
8. Challenges and evolution of sequencing and its analysis
Storage is becoming a real problem
Kahn, 2011, Science
Sequencing is becoming faster
Reads are becoming longer
PacBio
How do you interpret sequencing data in a clinical context?
Data integration
ChIP-seq for BCL6, BCOR, SMRT, H3K79me2, H3K4me1, H3K4me3, H3K27Ac, H3K9Ac, H3K27me3, and DNA methylation (HELP) in LY1 cells
Integrative statistical model
Predictions / Mechanisms
Experiments: ChIP-seq / siRNA etc.
HiC