Python meetup 2014

Unveiling Epigenetic Regulation with Next Generation Sequencing (NGS) and Python Ying Liu Weill Cornell Medical College The New York Python Meetup, 05-29-2014


Page 1: Python meetup 2014

Unveiling Epigenetic Regulation with

Next Generation Sequencing (NGS) and Python

Ying Liu

Weill Cornell Medical College

The New York Python Meetup, 05-29-2014

Page 2: Python meetup 2014

About Me

• PhD candidate, Weill Cornell Medical College

• Major Area: Stem cell epigenomics, Computational Biology

• Graduation: Fall 2014

• Email: [email protected]

• LinkedIn: https://www.linkedin.com/pub/ying-liu/b/669/605

Page 3: Python meetup 2014

Generate Pluripotent Stem Cells from Mature Adult Cells

2012 Nobel Prize in Physiology or Medicine

[Diagram: Adult cells → express pluripotent stem cell specific genes (4 genes) → reprogram > 20 days (thousands of genes change expression) → induced pluripotent stem (iPS) cells]

Limitation

• Reprogramming efficiency: 0.01 - 0.1%

• Molecular mechanism is not fully understood

Page 4: Python meetup 2014

Human Genome Project

• Human genome: ~3 billion DNA base pairs

• First sequence draft: 2001

• Complete sequence: 2003

Page 5: Python meetup 2014

Epigenetic Regulation

• Epigenetics: study of heritable changes in gene activity that are NOT caused by changes in the DNA sequence

• One of the major epigenetic regulators: histone proteins

[Figure labels: Histone proteins, DNA, Gene Expression. Source: Nature 454, 711-715]

My project: Histone X, which is enriched at expressed genes

Page 6: Python meetup 2014

Generate Pluripotent Stem Cells from Mature Adult Cells

2012 Nobel Prize in Physiology or Medicine

[Diagram: Adult cells → express pluripotent stem cell specific genes (4 genes) → reprogram > 20 days (thousands of genes change expression) → induced pluripotent stem (iPS) cells]

Project

Detect histone X function in initiating the reprogramming of adult cells to iPS cells.

Experiment

• Collect cells at the beginning of reprogramming (Day 0, 3, 6, 10) and after reprogramming (iPS);

• Map genome-wide histone X localization with Next Generation Sequencing (NGS);

• Analyze the dynamic change of genome-wide histone X localization with Python programs and frameworks.
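The last experiment step, following how genome-wide localization changes between time points, could be sketched roughly as below. This is only an illustration and not the pipeline used in the talk: the read positions are made up, and `bin_counts`, `reads_per_million` and the 1 kb bin size are hypothetical choices.

```python
from collections import Counter


def bin_counts(read_starts, bin_size=1000):
    """Count reads per fixed-width genomic bin, keyed by bin index."""
    return Counter(start // bin_size for start in read_starts)


def reads_per_million(counts, total_reads):
    """Scale raw bin counts by library size so samples are comparable."""
    return {b: c * 1_000_000 / total_reads for b, c in counts.items()}


# Hypothetical read start positions on one chromosome (not real data).
samples = {
    "Day 0": [100, 250, 1200, 5300],
    "Day 3": [120, 1100, 1300, 1450, 1800, 5100],
}

normalized = {
    day: reads_per_million(bin_counts(starts), len(starts))
    for day, starts in samples.items()
}

# Change in normalized signal for bin 1 (positions 1000-1999) between days.
delta_bin1 = normalized["Day 3"].get(1, 0.0) - normalized["Day 0"].get(1, 0.0)
print(f"Bin 1 signal change, Day 0 to Day 3: {delta_bin1:.0f} reads per million")
```

Normalizing by library size before comparing matters because each time point is sequenced to a different depth.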

Page 7: Python meetup 2014

Genome-wide Analysis of Epigenetic Regulation

• Next Generation DNA Sequencing (Illumina, Inc)

• Align DNA sequence to chromosome

• Display in genome browser (by gene)

• Computational analysis (by genome). Tools: Python, R, etc.

Page 8: Python meetup 2014

Histone X Enriches Near Stem Cell Specific Genes At the Beginning of Cell Reprogramming

[Figure: genome browser (IGV) tracks for Histone X and K27me3 at Day 0, 3, 6 and 10 around Pou5f1 and Nanog; scale bar 10 kb]

Alignment output (BED format). Columns: Chrom, Start, End, read name, score, Strand

chr1 3000062 3000113 HWI-1KL117_0134:6:2101:14893:19331#ACAGTG/A..GTG. 37 +
chr1 3000113 3000164 HWI-1KL117_0134:6:2302:6790:10626#ACAGTG/A..GT.. 37 +
chr1 3000146 3000197 HWI-1KL117_0134:6:2303:8145:108924#ACAGTG/A..GT.. 37 -
chr1 3000154 3000205 HWI-1KL117_0134:6:2202:14995:109690#ACAGTG/A..GT.. 37 -
chr1 3000241 3000292 HWI-1KL117_0134:6:1304:12589:77263#ACAGTG/A..GT.. 25 -
chr1 3000311 3000362 HWI-1KL117_0134:6:1101:17212:111473#ACAGTG/A..GT.. 37 -
chr1 3000334 3000385 HWI-1KL117_0134:6:2308:10385:78074#ACAGTG/A..GT.. 25 -
chr1 3000385 3000436 HWI-1KL117_0134:6:2102:20734:102615#ACAGTG/A..GG.. 37 +
chr1 3000498 3000549 HWI-1KL117_0134:6:1203:3146:72739#ACAGTG/A..GTG. 37 -
chr1 3000538 3000589 HWI-1KL117_0134:6:1101:1921:57017#ACAGTG/A..GT.. 37 +
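As a small illustration, the BED alignment records on this slide can be read with plain Python. The sketch assumes whitespace-separated six-column BED (chrom, start, end, name, score, strand) with 0-based, half-open coordinates; `BedRead` and `parse_bed_line` are hypothetical names, and the fifth column is treated simply as an integer score.

```python
from typing import NamedTuple


class BedRead(NamedTuple):
    chrom: str
    start: int   # 0-based, inclusive
    end: int     # exclusive
    name: str
    score: int
    strand: str


def parse_bed_line(line: str) -> BedRead:
    """Split one six-column BED record into typed fields."""
    chrom, start, end, name, score, strand = line.split()[:6]
    return BedRead(chrom, int(start), int(end), name, int(score), strand)


# Three records copied from the slide.
SAMPLE = """\
chr1 3000062 3000113 HWI-1KL117_0134:6:2101:14893:19331#ACAGTG/A..GTG. 37 +
chr1 3000146 3000197 HWI-1KL117_0134:6:2303:8145:108924#ACAGTG/A..GT.. 37 -
chr1 3000241 3000292 HWI-1KL117_0134:6:1304:12589:77263#ACAGTG/A..GT.. 25 -
"""

reads = [parse_bed_line(line) for line in SAMPLE.splitlines() if line.strip()]

# With half-open coordinates, the read length is simply end - start.
read_lengths = [r.end - r.start for r in reads]
strand_counts = {s: sum(r.strand == s for r in reads) for s in "+-"}
print(read_lengths, strand_counts)
```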

Page 9: Python meetup 2014

Computational Pipeline for Genome-wide DNA

Sequence Analysis

Bardet AF, Stark A, Nature Protocols, 2012

Alignment

• BWA

• Picard

• Samtools

Analysis (Python, Perl)

• MACS, Cistrome (X. Shirley Liu Lab)

• ChIPseeqer (Olivier Elemento Lab)

Page 10: Python meetup 2014

Peak Identification with Python Program: Model-based Analysis of ChIP-Seq (MACS)

Zhang Y, Liu XS, et al. Genome Biology 2008

Feng J, Liu XS, et al. Nature Protocols 2012

[Figure: panels (1)-(4), including reads on the 5'→3' and 3'→5' strands separated by distance d]

d: estimated DNA fragment size

• Read distribution: Poisson distribution

• Use dynamic λ_local to capture local biases in the genome

λ_local = max(λ_BG, [λ_region, λ_1k], λ_5k, λ_10k)

λ_BG: constant estimated from the genome background

λ_region: estimated from the candidate region

λ_1k, λ_5k, λ_10k: estimated from 1 kb, 5 kb, 10 kb local windows in the control

• p-value: default threshold is 10^-5

Requirement: ~3 GB of RAM, 1.5 h per data set with 30 million DNA sequence reads.
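To make the dynamic local λ concrete, here is a minimal sketch of the idea rather than MACS's actual code. It takes the maximum of the available rate estimates and then computes a Poisson upper-tail p-value for an observed read count. The function names and all numeric values are hypothetical, and treating the bracketed terms as optional is an assumption.

```python
import math


def local_lambda(lam_bg, lam_5k, lam_10k, lam_region=None, lam_1k=None):
    """Return the largest available rate estimate (the bracketed ones are optional)."""
    candidates = [lam_bg, lam_5k, lam_10k]
    candidates += [lam for lam in (lam_region, lam_1k) if lam is not None]
    return max(candidates)


def poisson_upper_tail(k, lam):
    """P(X >= k) for X ~ Poisson(lam), summing the tail directly for accuracy."""
    if k <= 0:
        return 1.0
    if lam <= 0:
        return 0.0
    # Start from P(X = k), computed in log space to avoid overflow.
    term = math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))
    total = 0.0
    i = k
    while term > total * 1e-17:
        total += term
        i += 1
        term *= lam / i   # P(X = i) from P(X = i - 1)
    return min(1.0, total)


# Hypothetical expected read counts for one candidate region.
lam_bg, lam_1k, lam_5k, lam_10k = 2.0, 6.0, 4.0, 3.0
observed = 15          # hypothetical read count in the ChIP sample
threshold = 1e-5       # default p-value threshold stated on the slide

lam_local = local_lambda(lam_bg, lam_5k, lam_10k, lam_1k=lam_1k)
p_local = poisson_upper_tail(observed, lam_local)
p_bg = poisson_upper_tail(observed, lam_bg)

print("peak vs. local lambda:", p_local < threshold)
print("peak vs. genome background only:", p_bg < threshold)
```

With these made-up numbers the region would pass the threshold against the genome-wide background alone, but not against the local estimate. That is the kind of local bias the dynamic λ is meant to absorb.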

Page 11: Python meetup 2014

Galaxy / Cistrome

MACS integrated web-based application: http://cistrome.org/ap/

Page 12: Python meetup 2014

ChIPseeqer

• Graphical User Interface

• Command-line

http://physiology.med.cornell.edu/faculty/elemento/lab/chipseq.shtml

Page 13: Python meetup 2014

Histone X Enriches Near Stem Cell Specific Genes At the Beginning of Cell Reprogramming

[Figure repeated from Page 8: genome browser tracks for Histone X and K27me3 at Day 0, 3, 6 and 10 around Pou5f1 and Nanog; scale bar 10 kb]

Page 14: Python meetup 2014

Histone X Enriches At Stem Cell Specific Gene Promoters Prior to Gene Expression Activation

[Figure: heatmaps of Expression and Histone X at Day 0, 3, 6, 10, iPS and E, each on a scale from L to H, with genes grouped into "Expression Change" and "Expression Stable"]

Genes shown: Pou5f1, Sox2, Cdh1, Cldn3, Jag2, Zbtb32, Elf3, Msh6, Lefty1, Piwil2, Notch4, Tjp3, Fbxo15, Cldn6, Foxh1, Zp3, Fgf15, Nodal, Tdgf1, Gdf3, Nanog, Fgf4, Dppa3

Gene Ontology Analysis

[Figure: heatmap of GO term enrichment across columns X0-X6, on a scale from 5 (Enrichment) to -5 (Depletion). Columns: group "-", then groups a, b, c for Expression Active and groups a, b, c for Expression Stable]

Groups

a. Histone X enriched during Day 0 – 10

b. Histone X enriched in iPS (after Day 10)

c. Histone X not enriched

GO terms

• Embryonic placenta development, GO:0001892

• Stem cell maintenance, GO:0019827

• Response to nutrient, GO:0007584

• Cell-cell signaling, GO:0007267

• DNA metabolic process, GO:0006259

• DNA recombination, GO:0006310

• Formation of primary germ layer, GO:0001704

• Chromosome organization, GO:0051276

• Mesoderm development, GO:0007498

• Cell fate commitment, GO:0045165

• Stem cell differentiation, GO:0048863

• Blastocyst formation, GO:0001825

• Meiosis, GO:0007126

• Sexual reproduction, GO:0019953

• Thyroid hormone metabolic process, GO:0042403

• Cellular response to abiotic stimulus, GO:0071214
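The slide does not say which statistic produces the enrichment/depletion scale. One common choice for Gene Ontology enrichment is a hypergeometric test, sketched here purely as an assumption about how such a score might be obtained. The function name and the gene counts are hypothetical.

```python
from math import comb


def hypergeom_upper_tail(k, population, annotated, drawn):
    """P(X >= k): chance of seeing at least k annotated genes in a random draw.

    population: genes in the background set
    annotated:  background genes carrying the GO term
    drawn:      genes in the group being tested (e.g. group a)
    """
    upper = min(annotated, drawn)
    favorable = sum(
        comb(annotated, i) * comb(population - annotated, drawn - i)
        for i in range(k, upper + 1)
    )
    return favorable / comb(population, drawn)


# Hypothetical counts: 20,000 background genes, 150 annotated with the term,
# 300 genes in the tested group, 12 of which carry the term.
p_enrich = hypergeom_upper_tail(12, 20_000, 150, 300)
print(f"enrichment p-value: {p_enrich:.3g}")
```

A depletion score would use the lower tail, P(X <= k), in the same way.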

Page 15: Python meetup 2014

Generate Pluripotent Stem Cells from Mature Adult Cells

[Diagram: Adult cells → express pluripotent stem cell specific genes (4 genes) → reprogram > 20 days (thousands of genes change expression) → induced pluripotent stem (iPS) cells]

Limitation

• Reprogramming efficiency: 0.01 - 0.1%

• Molecular mechanism is not fully understood

Our genome-wide analysis suggests: Histone X participates in stem cell gene activation at the early stage of adult cell reprogramming.

Page 16: Python meetup 2014

Acknowledgement

Thesis advisors

Dr. Shahin Rafii (Weill Cornell Medical College)

Dr. C. David Allis (Rockefeller Univ.)

Collaborators

Dr. Olivier Elemento (Weill Cornell Medical College)

Dr. Eugenia Giannopoulou (Hospital for Special Surgery)