20091110 Technical Seminar ChIP-seq Data Analysis

Tools and challenges for ChIP-seq data analysis Alba Jené Sanz Biomedical Genomics Lab (UPF)

Page 1: 20091110 Technical Seminar  ChIP-seq Data Analysis

Tools and challenges for ChIP-seq data analysis

Alba Jené Sanz, Biomedical Genomics Lab (UPF)

Page 2: 20091110 Technical Seminar  ChIP-seq Data Analysis

Overview

1. ChIP-seq – The basics

2. Typical pipeline

3. Challenges in ChIP-seq data analysis

4. To take into account

5. Available tools

6. Analysis example

7. Future Challenges

8. Where to look for help

Page 3: 20091110 Technical Seminar  ChIP-seq Data Analysis

1. ChIP-seq – The Basics

Page 4: 20091110 Technical Seminar  ChIP-seq Data Analysis

ChIP-on-chip

ChIP-seq

1. ChIP-seq – The Basics

Page 5: 20091110 Technical Seminar  ChIP-seq Data Analysis

ChIP-on-chip

ChIP-seq

Bioinformatics

1. ChIP-seq – The Basics

Page 6: 20091110 Technical Seminar  ChIP-seq Data Analysis

35 bp

500 bp 35 bp

1. ChIP-seq – The Basics

Page 7: 20091110 Technical Seminar  ChIP-seq Data Analysis

1. ChIP-seq – The Basics

Page 8: 20091110 Technical Seminar  ChIP-seq Data Analysis

2. Typical pipeline

Page 9: 20091110 Technical Seminar  ChIP-seq Data Analysis

2. Typical pipeline

Page 10: 20091110 Technical Seminar  ChIP-seq Data Analysis

2. Typical pipeline

Bowtie

Page 11: 20091110 Technical Seminar  ChIP-seq Data Analysis

2. Typical pipeline

Bowtie

MACS

Page 12: 20091110 Technical Seminar  ChIP-seq Data Analysis

2. Typical pipeline

Bowtie

MACS

CEAS

Page 13: 20091110 Technical Seminar  ChIP-seq Data Analysis

2. Typical pipeline

Mapping…

Peak calling…

Unique / multiple locations

Allowing mismatches – seed sequence

Balance accuracy / performance
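The mapping trade-offs above correspond to command-line options in Bowtie. The following is only a sketch, assuming Bowtie 1 is installed; the index name `hg18`, the file names and every parameter value are hypothetical and chosen for illustration, not taken from the seminar.

```python
import shlex


def bowtie_command(index, reads, out, seed_len=28, seed_mismatches=2,
                   max_hits=1, threads=4):
    """Build a Bowtie 1 command line (all default values are illustrative)."""
    return [
        "bowtie",
        "-l", str(seed_len),         # seed length
        "-n", str(seed_mismatches),  # mismatches allowed within the seed
        "-m", str(max_hits),         # discard reads with more than max_hits alignments
        "--best",                    # report the best alignment found
        "-p", str(threads),          # number of threads (speed)
        "--solexa1.3-quals",         # input qualities are Phred+64
        index, reads, out,
    ]


cmd = bowtie_command("hg18", "reads.fq", "reads.bwt")
print(shlex.join(cmd))
# To actually run the mapping: subprocess.run(cmd, check=True)
```

With `-m 1` only reads that map to a unique location are kept; raising `-n` or shortening the seed with `-l` tolerates more mismatches at the cost of speed and specificity.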

Page 14: 20091110 Technical Seminar  ChIP-seq Data Analysis

2. Typical pipeline

Page 15: 20091110 Technical Seminar  ChIP-seq Data Analysis

3. Challenges in ChIP-seq data analysis

Millions of segments that need fast mapping to the genome (allowing mismatches or gaps; performance issues)

Peak detection – find the exact binding site

Data normalization – compare results, background noise

Visualization – thousands of enriched regions. UCSC, JBrowse…

Page 16: 20091110 Technical Seminar  ChIP-seq Data Analysis

4. To take into account

Transcription Factors vs Nucleosomes / Histone modifications

Control available?

Sequencing depth bias in Control vs IP

Different alignment methods produce different peak-calling results, but the difference is smaller than the one due to a different peak caller or replicate

Many differences between peak callers can be explained by the different thresholds used

Some peak callers may be specific to some data types

Consistency may be used to set threshold if replicates are available
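One possible reading of the last point: score how well two replicates agree at several candidate thresholds and pick the threshold where agreement is acceptable. This is a minimal sketch under that assumption; the peak tuples `(chrom, start, end, score)` and all values below are made up for illustration.

```python
def overlaps(a, b):
    """True if two (chrom, start, end, ...) intervals share at least one base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def consistency(rep1, rep2, threshold):
    """Fraction of rep1 peaks scoring >= threshold that overlap a rep2 peak
    which also scores >= threshold."""
    kept1 = [p for p in rep1 if p[3] >= threshold]
    kept2 = [p for p in rep2 if p[3] >= threshold]
    if not kept1:
        return 0.0
    hits = sum(any(overlaps(p, q) for q in kept2) for p in kept1)
    return hits / len(kept1)


# Hypothetical peaks from two replicates
rep1 = [("chr1", 100, 200, 80.0), ("chr1", 500, 600, 55.0), ("chr2", 100, 200, 90.0)]
rep2 = [("chr1", 150, 250, 75.0), ("chr2", 1000, 1100, 60.0)]

for t in (50, 70, 85):
    print(t, consistency(rep1, rep2, t))
```

The all-against-all comparison is quadratic and only suitable for small lists; real peak sets would need sorted intervals or an interval tree.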

Page 17: 20091110 Technical Seminar  ChIP-seq Data Analysis

4. To take into account

There are many tools for the analysis of ChIP-seq data, but no standards yet

Page 18: 20091110 Technical Seminar  ChIP-seq Data Analysis

5. Available tools

Page 19: 20091110 Technical Seminar  ChIP-seq Data Analysis

5. Available tools

Page 20: 20091110 Technical Seminar  ChIP-seq Data Analysis

5. Available tools

Page 21: 20091110 Technical Seminar  ChIP-seq Data Analysis

5. Available tools

Page 22: 20091110 Technical Seminar  ChIP-seq Data Analysis

Uses regional averaging to mitigate sample fluctuations in the control library

Uses the control to model the distribution across the genome using the Poisson distribution (BG). After identifying candidate peaks significantly enriched over the BG, a local lambda is estimated using windows around each peak to eliminate local biases

Open-source, open to contributions (Artistic License) and being actively improved

Easy to use, with fast-responding developers

Compares very well to other methods
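The Poisson test with a local lambda can be sketched as follows. This is not the tool's actual code: taking the maximum of the genome-wide rate and the rates from the surrounding windows is an assumption about how the local lambda is chosen, and the function names and numbers are hypothetical. All rates are assumed to be already scaled to the width of the candidate peak.

```python
import math


def poisson_sf(k, lam):
    """P(X >= k) for X ~ Poisson(lam), computed as 1 - P(X <= k - 1)."""
    if k <= 0:
        return 1.0
    term = math.exp(-lam)   # P(X = 0)
    cdf = term
    for i in range(1, k):
        term *= lam / i     # P(X = i) from P(X = i - 1)
        cdf += term
    return max(0.0, 1.0 - cdf)


def local_lambda(bg_lambda, window_rates):
    """Assumed rule: the most conservative (largest) expected tag count."""
    return max([bg_lambda] + list(window_rates))


bg = 2.0               # expected tags per peak-sized region, genome-wide
windows = [3.5, 5.0]   # expected tags estimated from control windows around the peak
lam = local_lambda(bg, windows)
print(lam, poisson_sf(15, lam), poisson_sf(15, bg))
```

A region with high local background gets a larger lambda and therefore a less significant p-value than the genome-wide rate alone would give. Subtracting the cumulative sum from 1 loses precision for very small tails, which is acceptable for a sketch but not for ranking strong peaks.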

5. Available tools

Page 23: 20091110 Technical Seminar  ChIP-seq Data Analysis

lane5_SNAIL_F9_qseq.txt

SOLEXA 90320 5 1 0 476 0 1 .ACGGGGGAGGG.C...CAAC..A...C............ BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB 0

SOLEXA 90320 5 1 0 1222 0 1 .AATTGAAAAAT.A..TTTAA..G...A............ DO[[XVX[BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB 0

SOLEXA 90320 5 1 0 133 0 1 .CCAGTCTATTAATT.TTGCC..GA..C............ DPXXXYYYYYYBBBBBBBBBBBBBBBBBBBBBBBBBBBBB 0

SOLEXA 90320 5 1 0 145 0 1 .ATTGTTTCTGACTA.TTGAT..GC..T............ BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB 0

SOLEXA 90320 5 1 0 153 0 1 .ACCGCTATCAGTAC.TAGCT..GT..A............ DMUYUVWBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB 0

SOLEXA 90320 5 1 0 215 0 1 .TGTTGCCATTGCTA.AGGCA..GT..T............ DOVWBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB 0

SOLEXA 90320 5 1 1 827 0 1 AGGAGATCGGCCGGTTGATGAGCCGAGTG........... \Z__U_]PXYXTGRZ]QXBBBBBBBBBBBBBBBBBBBBBB 0

SOLEXA 90320 5 1 1 56 0 1 AAAAATCGACGCTCAAGTCAGAGGTGGCG........... ababba`\_aabaab`U\ba\BBBBBBBBBBBBBBBBBBB 1

SOLEXA 90320 5 1 1 925 0 1 TGCAGCACTGGGGCCAGATGGTAAGCCCT........... _Z_\T]`\]M]OLP^^\[`WBBBBBBBBBBBBBBBBBBBB 1

SOLEXA 90320 5 1 2 1637 0 1 GGGCTTCTGCCCCGGTGGGTACATGAGTA........... aaa`a`a`aa`aX_`^^\^[``BBBBBBBBBBBBBBBBBB 1

QSEQ files (Solexa's FASTQ with ASCII Phred64, the 3rd FASTQ type)

@3_1_3_89

CACAGTGTCCTCCAGGTTCATCCC................

+3_1_3_89

abbab^aaaaaaa\VUVbBBBBBBBBBBBBBBBBBBBBBB

@3_1_3_762

GCAAACAAATGGCGGAAAGCGGCG................

+3_1_3_762

aabba`b`a`]]TXOY`BBBBBBBBBBBBBBBBBBBBBBB

@3_1_90_512

GCTGAGGCAGGAGAATTGCTTGAACTGGGAAGGCAGAGGT

+3_1_90_512

ab`a_a`X``WTGW]T]S\Z]T[aXa_T^]XP\]\H_VXY

@3_1_90_1028

GTTACGGCTTATCCTGCACATTACGACCGTTTGCGTAACG

+3_1_90_1028

`bba`X_^ab_aWS`_\b[`aa_]TZ^VY\a`VW^`^b`a

@3_1_90_1651

TAATTTTAGATTTTATCCTTGACATTGTAAATATTACATT

+3_1_90_1651

aUVF[aa\`VU_`aaU[__aaa\YV^aP`aQU\a`_^\_a

@3_1_90_1670

@FC30C11AAXX:8:1:1649:1790

GAAAAGTATTTGCAATTTGTTGCCTCTCATCCAAGAATGAAATTCCTATTG

+

<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<:::::6776666666566663

@FC30C11AAXX:8:1:1655:1811

GAAGCAGAAGCCTATATACCCTGTAGAACTGGGAGCCAATTAAACCTCTTT

+

<<<<<<<<<<<<<<<<<<5<<<<><<:>:<<;<:;3665537.6+6.33.+

@FC30C11AAXX:8:1:1609:1848

GATGTGGTTTCACATAAATTGACATATATAGTTCCAGGCTGTAAATGTTGT

+

<<;<<;;+<<7:<<<<<<7::7:<<<<<<:7,777402-4.-+*20+0-%-

@FC30C11AAXX:8:1:1667:1880

GTTTTATACAAATCAAAACCATAGTGAGATACCATCTCACACTAGTCAGAA

+

<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<:::::6767366767357566

@FC30C11AAXX:8:1:1577:1853

GTGATGGGAGGAAAGCTAGGGGGCTATAATGTCTATTACAAGGCTCAGTAG

+

<6<<<<<<<<<<<<<<<<<<<<<<<<<<<<:99:9646066,6+0604044

6. Analysis example

Page 24: 20091110 Technical Seminar  ChIP-seq Data Analysis

6. Analysis example

Filter qualities and parse
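A minimal sketch of what this step could look like, not the script used in the seminar. It assumes the 11-column QSEQ layout (machine, run, lane, tile, x, y, index, read number, sequence, quality, filter flag), keeps only reads whose filter flag is 1, names reads `lane_tile_x_y`, trims trailing `.` (no-call) bases and shifts qualities from Phred+64 to Phred+33. The trimming and the quality shift are choices that reproduce read 5_1_2_1637 as it appears in SNAIL_F9.bwt; whether they were done by the parsing script or by Bowtie is not stated.

```python
def qseq_to_fastq(line):
    """Convert one QSEQ line to a FASTQ record, or None if the read is dropped."""
    fields = line.split()
    if len(fields) != 11:
        raise ValueError("expected 11 QSEQ fields, got %d" % len(fields))
    lane, tile, x, y = fields[2:6]
    seq, qual, passed = fields[8], fields[9], fields[10]
    if passed != "1":              # read failed the instrument's quality filter
        return None
    seq = seq.rstrip(".")          # drop trailing no-calls
    if not seq:
        return None
    seq = seq.replace(".", "N")    # remaining no-calls become N
    # Phred+64 -> Phred+33: shift every quality character down by 31
    qual = "".join(chr(ord(c) - 31) for c in qual[:len(seq)])
    name = "_".join((lane, tile, x, y))
    return "@%s\n%s\n+\n%s\n" % (name, seq, qual)


line = (r"SOLEXA 90320 5 1 2 1637 0 1 GGGCTTCTGCCCCGGTGGGTACATGAGTA..........."
        r" aaa`a`a`aa`aX_`^^\^[``BBBBBBBBBBBBBBBBBB 1")
record = qseq_to_fastq(line)
print(record, end="")
```

Reads with a filter flag of 0, such as the first lines of lane5_SNAIL_F9_qseq.txt, return `None` and are skipped.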

Page 25: 20091110 Technical Seminar  ChIP-seq Data Analysis

6. Analysis example

BOWTIE

Filter qualities and parse

Page 26: 20091110 Technical Seminar  ChIP-seq Data Analysis

SNAIL_F9.bwt

5_1_0_1409 + gi|51511750|ref|NC_000021.7|NC_000021 34604194 AGTTGCACCTTTAACAATTTCCCAT %/6::9::;;;;7279######### 0 17:G>T,24:G>T

5_1_0_811 + gi|89161218|ref|NC_000023.9|NC_000023 77246408 TTCTGCAAGCCTCCGGAGCGCACGTG BBB@5<?=9<9@>96/:0######## 0 25:C>G

5_1_1_1665 + gi|89161199|ref|NC_000002.10|NC_000002 201785208 GCCCAGCTGTCACTGTGGTTTTGATTTGC BBCCCBBBCBBB@BABBBACCA####### 0

5_1_2_1637 + gi|51511731|ref|NC_000015.8|NC_000015 92942360 GGGCTTCTGCCCCGGTGGGTACATGAGTA BBBABABABBAB9@A??=?<AA####### 0

5_1_2_1359 + gi|89161205|ref|NC_000003.10|NC_000003 101351498 CAATTCCCTCCTTGAAAGGCTCCTCCACC BCCBBBBAAAABA9@B?59@ABA###### 0

5_1_2_730 - gi|51511721|ref|NC_000005.8|NC_000005 1314600 GGACTTCCATGCAAACAAGCTGCTTTCCA ########BB>9@B@;@B<;??ABCBABB 0

5_1_2_1118 - gi|89161213|ref|NC_000007.12|NC_000007 157199758 CATCTTTGATGAGTTACTACCTGTGGGGT ########@B@?=B@;8@659@@BAABAB 0

5_1_3_920 + gi|51511727|ref|NC_000011.8|NC_000011 133317176 GGTAGACTCACAAAACTACCAAAGTCCTCTAC ABABAABCBBBBCBCBCBBBCCA>@@###### 0

5_1_3_971 + gi|89161190|ref|NC_000012.10|NC_000012 7497006 TTTTCATGCAGCCCGAGACATCAAGCTAGCAG B@86646330/250################## 0 31:T>G

6. Analysis example

Page 27: 20091110 Technical Seminar  ChIP-seq Data Analysis

SNAIL_F9.bwt.bed

chr21 34604194 34604219 5_1_0_1409 . +

chr23 77246408 77246434 5_1_0_811 . +

chr02 201785208 201785237 5_1_1_1665 . +

chr15 92942360 92942389 5_1_2_1637 . +

chr03 101351498 101351527 5_1_2_1359 . +

chr05 1314600 1314629 5_1_2_730 . -

chr07 157199758 157199787 5_1_2_1118 . -

chr11 133317176 133317208 5_1_3_920 . +

chr12 7497006 7497038 5_1_3_971 . +

chr01 201404048 201404081 5_1_3_1986 . +

6. Analysis example

Parsing
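The parsing from Bowtie output to BED can be sketched as below. It assumes Bowtie's default output columns (read name, strand, reference name, 0-based offset, sequence, qualities, ...) and derives the chromosome name from the RefSeq accession at the end of the reference name, which matches the `chr21`, `chr02` style shown in SNAIL_F9.bwt.bed. The function name is hypothetical.

```python
def bowtie_to_bed(line):
    """Convert one default-format Bowtie alignment line to a 6-column BED line."""
    fields = line.split()
    name, strand, ref, offset, seq = fields[:5]
    accession = ref.split("|")[-1]                  # e.g. "NC_000021"
    chrom = "chr%02d" % int(accession.split("_")[1])
    start = int(offset)                             # Bowtie offsets are 0-based, like BED
    end = start + len(seq)                          # BED end is exclusive
    return "\t".join((chrom, str(start), str(end), name, ".", strand))


hit = ("5_1_0_1409 + gi|51511750|ref|NC_000021.7|NC_000021 34604194 "
       "AGTTGCACCTTTAACAATTTCCCAT %/6::9::;;;;7279######### 0 17:G>T,24:G>T")
print(bowtie_to_bed(hit))
```

With this naming NC_000023 (chromosome X) becomes `chr23`, as in the BED listing; a genome browser would usually expect `chrX` and unpadded names such as `chr2`, so a renaming step may still be needed.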

Page 28: 20091110 Technical Seminar  ChIP-seq Data Analysis

6. Analysis example

Parsing

MACS

Page 29: 20091110 Technical Seminar  ChIP-seq Data Analysis

MACS pipeline

Output:

- Peak locations in BED and XLS format (genome browser)

- Tag count in wiggle format (genome browser)

- Bimodal model in R scripts

6. Analysis example

Page 30: 20091110 Technical Seminar  ChIP-seq Data Analysis

PolII

H3K27me3

6. Analysis example

Page 31: 20091110 Technical Seminar  ChIP-seq Data Analysis

snail_mfold_15_MACS.wig

track type=wiggle_0 name="MACS_counts_after_shifting" description="Shifted Merged MACS tag counts for every 10 bp"

variableStep chrom=chr10 span=10

85171 1

85181 1

85191 1

85201 1

85211 1

85221 1

85231 2

85371 2

snail_mfold_15_tsize41_newbwt_peaks.bed

track name="MACS peaks for snail_mfold_15_tsize41_newbwt"

chr1 559644 559924 MACS_peak_1 79.29

chr1 2435221 2435542 MACS_peak_2 51.58

chr1 14624217 14624571 MACS_peak_3 66.12

chr1 15610639 15611000 MACS_peak_4 56.69

chr1 16822564 16822753 MACS_peak_5 52.84

chr1 18411948 18412187 MACS_peak_6 82.46

chr1 22857612 22857985 MACS_peak_7 88.74

chr1 27541904 27542134 MACS_peak_8 69.47
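To work with the tag counts outside a genome browser, the variableStep wiggle file can be read into intervals. A minimal sketch, assuming the track header sits on a single line and that only `variableStep` sections occur; the function name is hypothetical. Wiggle positions are 1-based, so a line `85171 1` with `span=10` covers bases 85171 to 85180.

```python
def parse_variable_step(lines):
    """Yield (chrom, start, end, value) with 1-based inclusive coordinates."""
    chrom, span = None, 1
    for line in lines:
        line = line.strip()
        if not line or line.startswith("track"):
            continue
        if line.startswith("variableStep"):
            # e.g. "variableStep chrom=chr10 span=10"
            params = dict(p.split("=", 1) for p in line.split()[1:])
            chrom = params["chrom"]
            span = int(params.get("span", 1))
            continue
        pos, value = line.split()
        start = int(pos)
        yield chrom, start, start + span - 1, float(value)


wig = [
    'track type=wiggle_0 name="MACS_counts_after_shifting"',
    "variableStep chrom=chr10 span=10",
    "85171 1",
    "85231 2",
]
for interval in parse_variable_step(wig):
    print(interval)
```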

6. Analysis example

Page 32: 20091110 Technical Seminar  ChIP-seq Data Analysis

6. Analysis example

CEAS

Page 33: 20091110 Technical Seminar  ChIP-seq Data Analysis

Input:

- BED format peak locations

- Optional signal profile in wiggle format

- BED format extra regions of interest

6. Analysis example

Page 34: 20091110 Technical Seminar  ChIP-seq Data Analysis

CEAS output

Page 35: 20091110 Technical Seminar  ChIP-seq Data Analysis

CEAS output

Page 36: 20091110 Technical Seminar  ChIP-seq Data Analysis

CEAS output

Page 37: 20091110 Technical Seminar  ChIP-seq Data Analysis

CEAS output

Page 38: 20091110 Technical Seminar  ChIP-seq Data Analysis

CEAS output

Page 39: 20091110 Technical Seminar  ChIP-seq Data Analysis

CEAS output

Page 40: 20091110 Technical Seminar  ChIP-seq Data Analysis

7. Future challenges

Re-analyze data with new algorithms – sequences remain the same

ChIP-seq combined with Chromatin Conformation Capture (3C) – long-range physical interactions

Technical improvements: RNA-seq will benefit from longer reads

Integrated computational analyses – integration of TF, histone marks, methylation, polymerase loading to predict regulatory output

Page 41: 20091110 Technical Seminar  ChIP-seq Data Analysis

8. Where to look for help...

Seqanswers.com

Page 42: 20091110 Technical Seminar  ChIP-seq Data Analysis

8. Where to look for help...

Seqanswers.com

Google groups, mailing lists of each project

MACS

CEAS FindPeaks

Page 43: 20091110 Technical Seminar  ChIP-seq Data Analysis

8. Where to look for help...

Seqanswers.com

Google groups, mailing lists of each project

Lab mates!

MACS

CEAS FindPeaks

Page 44: 20091110 Technical Seminar  ChIP-seq Data Analysis