The genetic code of gene regulatory elements · 2008. 12. 14. · • Regulatory elements (REs)...

The genetic code of gene regulatory elements Ivan Ovcharenko Computational Biology Branch National Center for Biotechnology Information National Institutes of Health October 23, 2008


Page 1

The genetic code of gene regulatory elements

Ivan Ovcharenko

Computational Biology Branch
National Center for Biotechnology Information

National Institutes of Health

October 23, 2008

Page 2

Outline

• Gene deserts and distant gene regulation

• Genetic encryption of gene regulation

• Heart regulatory code

• Regulation of regulators: how transcription factors regulate themselves

Page 3

The Genome Sequence: The Ultimate Code of Life

3 billion letters

~3% codes for proteins

~45% is “junk” (repetitive elements)

Gene regulatory elements (REs) reside SOMEWHERE in the remaining ~50%

Page 4

Hirschsprung disease is associated with a noncoding SNP

[Figure: RET locus]

Page 5

Comparative Sequence Analysis

[Figure: panels labeled 1880s, 1920s, 1950s, 2000s]

Biologically functional regions in the genome tend to stay conserved throughout evolution.

Therefore, by aligning homologous sequences from different but related species, we can identify Evolutionary Conserved Regions (ECRs) of putative functional importance.
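The ECR idea can be sketched as a sliding-window identity scan over a pairwise alignment. This is a minimal sketch under stated assumptions: the input is an already gapped alignment of two homologous sequences, and the thresholds (windows of 100 columns with at least 70% identity) are assumed for illustration, since the talk does not state the criteria. Coordinates are alignment columns, not genome positions.

```python
def find_ecrs(aligned_a: str, aligned_b: str, window: int = 100,
              min_identity: float = 0.70) -> list[tuple[int, int]]:
    """Call ECRs from a pairwise alignment (a sketch with assumed thresholds).

    Returns merged half-open (start, end) intervals in alignment-column
    coordinates covered by windows of `window` columns whose identity is at
    least `min_identity`. Gap columns never count as matches.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    n = len(aligned_a)
    if n < window:
        return []
    # 1 where both sequences carry the same base, 0 otherwise.
    match = [int(a == b and a != "-")
             for a, b in zip(aligned_a.upper(), aligned_b.upper())]
    ecrs: list[tuple[int, int]] = []
    matches_in_window = sum(match[:window])
    for start in range(n - window + 1):
        if start > 0:
            # Slide the window one column: add the new column, drop the old one.
            matches_in_window += match[start + window - 1] - match[start - 1]
        if matches_in_window / window >= min_identity:
            end = start + window
            if ecrs and start <= ecrs[-1][1]:
                ecrs[-1] = (ecrs[-1][0], end)  # extend the current ECR
            else:
                ecrs.append((start, end))
    return ecrs
```

Overlapping passing windows are merged into one region, so a long conserved element is reported once rather than as many overlapping windows.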

Page 6

In vivo validation of a regulatory element

[Figure: IL4/IL13/RAD50/IL5 locus with the E1 element, wild type vs. E1 deletion]

• Regulatory elements (REs) orchestrate temporal and spatial expression of genes

• Genetic deletion of the E1 element decreased the expression of IL4, IL13, and IL5 3-fold

• REs have been identified for only a handful of genes in the human genome

• Knocking out a single candidate RE can take up to 2 years…

[Figure: IL4, IL13, and IL5 levels (pg/ml) in wild type (wt) vs. deletion]

Loots et al., Science, 2001

Page 7

Comparative genomics to predict regulatory elements

• Functionally important elements in genomes mutate at slower rates than the neutrally evolving background

– 1% of sequence is conserved between humans and fish

– 75% of genes are conserved between humans and fish

• RE E1 is highly conserved in the human and mouse genomes

[Figure: E1 element]

• Comparative genomics can be used to prioritize functional elements

Page 8

Gene Deserts in the Human Genome

Gene deserts make up 25% of the human genome sequence

Page 9

Gene deserts

~500 gene deserts in the human genome

[Figure: human chromosome 13 (HSA13)]

~50% of HSA13 consists of gene deserts

Gene deserts are NOT enriched in repetitive elements
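A gene desert can be located by looking for unusually long gene-free stretches between neighboring genes. The sketch below assumes gene coordinates on a single chromosome as half-open (start, end) pairs and a hypothetical length cutoff `min_length` (500 kb here, for illustration only; the talk does not state the cutoff used). Gaps before the first gene and after the last gene are ignored.

```python
Interval = tuple[int, int]  # (start, end), 0-based half-open coordinates


def find_gene_deserts(genes: list[Interval], min_length: int = 500_000) -> list[Interval]:
    """Return gene-free gaps of at least `min_length` bp between consecutive genes."""
    # Sort genes and merge overlapping ones, so every reported gap is truly gene-free.
    merged: list[Interval] = []
    for start, end in sorted(genes):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    deserts: list[Interval] = []
    for (_, prev_end), (next_start, _) in zip(merged, merged[1:]):
        if next_start - prev_end >= min_length:
            deserts.append((prev_end, next_start))
    return deserts


def desert_fraction(deserts: list[Interval], chromosome_length: int) -> float:
    """Fraction of a chromosome covered by gene deserts (cf. ~50% for HSA13)."""
    return sum(end - start for start, end in deserts) / chromosome_length
```

Merging overlapping genes first matters: without it, a short gene nested inside a longer one could create a spurious "gap".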

Page 10

DACH gene deserts on chromosome 13

[Figure: DACH gene deserts (430 kb, 0.9 Mb, 1.3 Mb)]

Over 1,000 human/mouse noncoding ECRs (ncECRs)

Page 11

Phylogenetic conservation of DACH gene deserts

[Figure: reporter gene assay; construct with ncECR, promoter (Prom), and reporter gene]

Nobrega et al., Science, 2004

Page 12

Dichotomy in the evolutionary conservation of gene deserts

~200 stable gene deserts

- DACH

- OTX2

- SOX2

-vs-

~300 variable gene deserts

- Deletion does not lead to a phenotype

Page 13

Function of genes flanking gene deserts

Stable gene deserts:

- transcription

- regulation of transcription

- regulation of metabolism

- development

etc.

Variable gene deserts

Page 14

Distant gene regulation

1. Do stable gene deserts harbor distant regulatory elements?

2. Does the distance between an RE and a gene matter?

[Figure: alternative gene/RE arrangements at different distances]

Page 15

Chromosomal stability of gene deserts

[Figure: gene and its distant RE]

Page 16

Distant REs are distance-independent

[Figure: gene and RE placed at different distances]

Page 17

Gene deserts :: Summary

25% of the human genome consists of gene deserts

There are two classes of gene deserts: stable and variable

Stable gene deserts are evolutionarily protected from chromosome rearrangements, indicating the presence of distant regulatory elements

Experimental validation of deeply conserved sequences in some stable gene deserts confirms their enhancer activity

Gene regulation based on distant regulatory elements probably does not depend on the distance between a regulatory element and the gene it regulates

Page 18

Outline

• Gene deserts and distant gene regulation

• Genetic encryption of gene regulation

• Heart regulatory code

• Regulation of regulators: how transcription factors regulate themselves

Page 20

Patterns of transcription factor binding sites define biological functions of REs

• Transcription factor (TF) proteins bind to very short (6 to 10 nucleotide) binding sites (TFBSs)

• Combinatorial binding of multiple TFs to an RE defines a specific pattern of gene expression

• Correlating patterns of TFBSs in REs with biological function will help decode gene regulation

[Figure: Protein A, Protein B, and Protein C bound to three TFBSs in a REGULATORY ELEMENT (RE) in the DNA of a GENE; example RE sequence:]

aaCTGACTCTGACTgaaaagaaaaCTGATATTGCTGATATTGacagtacagtTTGTTGTTGTTGTTGTTGttaattaa
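The slide's example RE can be scanned for binding sites by exact string matching on both DNA strands. This is a minimal sketch: the three motif strings in `MOTIFS` are hypothetical stand-ins for the sites of Protein A, B, and C, chosen to match the uppercase blocks that appear to mark the three TFBSs on the slide. Real TFBS searches usually score position weight matrices rather than exact strings.

```python
# Minimal exact-match TFBS scanner (a sketch; real tools score position weight matrices).
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an A/C/G/T string (returned in uppercase)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def scan_motif(seq: str, motif: str) -> list[tuple[int, str]]:
    """Return (0-based start, strand) for every exact motif hit on either strand.

    Overlapping hits are kept. A palindromic site matches both strands at the
    same position but is reported once, on the "+" strand.
    """
    seq, motif = seq.upper(), motif.upper()
    k = len(motif)
    rc = reverse_complement(motif)
    hits = []
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window == motif:
            hits.append((i, "+"))
        elif window == rc:
            hits.append((i, "-"))
    return hits


# Example RE sequence from the slide.
RE_SEQ = "aaCTGACTCTGACTgaaaagaaaaCTGATATTGCTGATATTGacagtacagtTTGTTGTTGTTGTTGTTGttaattaa"

# Hypothetical site strings for the three proteins drawn on the slide.
MOTIFS = {"Protein A": "CTGACT", "Protein B": "CTGATATTG", "Protein C": "TTGTTG"}

if __name__ == "__main__":
    for protein, motif in MOTIFS.items():
        print(protein, motif, scan_motif(RE_SEQ, motif))
```

Each hypothetical motif hits its block more than once (for example, the repeated TTG block yields several overlapping TTGTTG hits), which illustrates how clustered, repeated sites form the "pattern" the slide refers to.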

Page 21

[Figure: reporter gene driven by regulatory module A or regulatory module B; expression domains labeled Limbs and CNS]

Page 22

Computational identification of TFBS

-- Transcription factor binding sites are very short (~6-10 bp)

-- Computational predictions of transcription factor binding sites are overwhelmed with false positives

[Figure: Protein A, Protein B, and Protein C bound to three TFBSs in a REGULATORY ELEMENT (RE) of a GENE; example RE sequence:]

aaCTGACTCTGACTgaaaagaaaaCTGATATTGCTGATATTGacagtacagtTTGTTGTTGTTGTTGTTGttaattaa
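Why short sites produce so many false positives can be shown with a back-of-the-envelope count. Under a simple null model (an assumption: independent positions, each base with probability 1/4), one specific k-mer is expected to occur about 2(L - k + 1)/4^k times in a genome of length L when both strands are searched. Palindromic k-mers, which would be double-counted, are ignored here.

```python
def expected_random_hits(genome_length: int, k: int, both_strands: bool = True) -> float:
    """Expected number of exact matches to one specific k-mer in a random genome.

    Null model (an assumption): positions are independent and each base
    occurs with probability 1/4, so each window matches with probability 4**-k.
    """
    windows = genome_length - k + 1
    strands = 2 if both_strands else 1
    return strands * windows / 4 ** k


if __name__ == "__main__":
    genome = 3_000_000_000  # "3 billion letters"
    for k in (6, 8, 10):
        print(f"{k}-bp site: ~{expected_random_hits(genome, k):,.0f} chance hits")
```

With 3 billion letters, a 6-bp site is expected roughly 1.46 million times by chance and even a 10-bp site roughly 5,700 times, far more than the number of functional sites any single TF is likely to use.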

Page 23

Deciphering the genetic code of enhancers

[Figure: a GENE surrounded by multiple ncECRs]

Page 24

Enhancer Identification (EI) method

GNF Expression Atlas 2: 79 human tissues, 15k human genes

Tissue A (15K genes): 300 highly expressed genes -vs- 3,000 lowly expressed genes

• Selecting candidate enhancers based on noncoding conservation

• Mapping conserved TFBS in candidate enhancers

• Assigning tissue weights to TFs

• TFBS signatures in candidate enhancers defining tissue-specific gene expression

• Predicting tissue-specific enhancers

[Figure: matrix of tissue weights for TF1, TF2, …, TFN across tissue1, tissue2, …, tissueM (example values 1.4, 0.5, -1.2)]

[Figure legend: highly expressed gene; lowly expressed gene; promoter; 3+ top noncoding ECR; no prediction; TFBS]
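The slide assigns each TF a weight per tissue (e.g., 1.4, 0.5, -1.2) and uses TFBS signatures to predict tissue-specific enhancers, but it does not give the scoring formula. The sketch below is a hypothetical stand-in, not the EI algorithm itself: for one tissue, a TF's weight is the smoothed log-odds of finding its conserved site in candidate enhancers of highly vs. lowly expressed genes, and a candidate enhancer is scored by summing the weights of the TFs with sites in it. The names `tf_tissue_weights` and `enhancer_score` are illustrative.

```python
import math
from collections import Counter


def tf_tissue_weights(high: list[set[str]], low: list[set[str]],
                      pseudocount: float = 1.0) -> dict[str, float]:
    """Log-odds weight per TF for one tissue (hypothetical scheme).

    `high` / `low`: one set per candidate enhancer, listing the TFs that have a
    conserved binding site in it, for highly vs. lowly expressed genes.
    Positive weight = the TF's sites are enriched near highly expressed genes.
    """
    high_counts = Counter(tf for sites in high for tf in sites)
    low_counts = Counter(tf for sites in low for tf in sites)
    weights = {}
    for tf in high_counts.keys() | low_counts.keys():
        # Laplace-smoothed fraction of enhancers containing a site for this TF.
        p_high = (high_counts[tf] + pseudocount) / (len(high) + 2 * pseudocount)
        p_low = (low_counts[tf] + pseudocount) / (len(low) + 2 * pseudocount)
        weights[tf] = math.log(p_high / p_low)
    return weights


def enhancer_score(sites: set[str], weights: dict[str, float]) -> float:
    """Score a candidate enhancer as the sum of the weights of TFs with sites in it."""
    return sum(weights.get(tf, 0.0) for tf in sites)


if __name__ == "__main__":
    # Toy data (hypothetical): TF1 sites occur mostly near highly expressed genes.
    high = [{"TF1", "TF2"}, {"TF1"}, {"TF1", "TF3"}]
    low = [{"TF2"}, {"TF3"}, {"TF2", "TF3"}, set()]
    w = tf_tissue_weights(high, low)
    print(w)
    print(enhancer_score({"TF1"}, w), enhancer_score({"TF2", "TF3"}, w))
```

In such a scheme, candidates whose score exceeds a threshold (which would need to be tuned on held-out data) would be called tissue-specific enhancers; the pseudocount keeps TFs never seen in one group from getting infinite weights.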

Page 25

Tissue-specificity noncoding signatures

[Figure: bar chart per tissue, y-axis 0% to 100%, series Overexpressed and Neutral. Tissues: trachea, trigeminal ganglion, skin, thymus, pancreas, adrenal gland, bone marrow, uterus, pancreatic islets, tongue, fetal liver, lung, skeletal muscle, prostate, heart, adipocyte, kidney, thyroid, cardiac myocytes, liver, bronchial epithelial cells, BM-CD71+ early erythroid, whole blood, BM-CD34+, testis, BM-CD33+ myeloid, PB-BDCA4+ dendritic cells, smooth muscle, placenta, PB-CD19+ B cells, PB-CD8+ T cells, spinal cord, whole brain, fetal brain]

Page 26

Known skeletal muscle enhancer

Page 27

DACH brain/CNS enhancers

Page 28

EI Performance

Precision (the fraction of predicted tissue-specific enhancers that are correct):

[Figure: precision for heart, liver, and all 79 human tissues]
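The precision definition above can be written as true positives divided by all predictions. A minimal sketch, assuming predictions and experimentally confirmed enhancers are given as sets of identifiers (the IDs below are hypothetical):

```python
def precision(predicted: set[str], confirmed: set[str]) -> float:
    """Fraction of predicted tissue-specific enhancers that are correct:
    true positives / (true positives + false positives)."""
    if not predicted:
        raise ValueError("precision is undefined with no predictions")
    true_positives = len(predicted & confirmed)
    return true_positives / len(predicted)


if __name__ == "__main__":
    # Hypothetical enhancer IDs: 3 of the 4 predictions are confirmed.
    print(precision({"e1", "e2", "e3", "e4"}, {"e1", "e2", "e3", "e9"}))
```

Here 3 of 4 predictions are confirmed, so precision is 0.75. Precision says nothing about how many true enhancers were missed (that would be recall).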

Page 29

Associating TFs with tissue specificities

Page 30

Database of enhancers in the human genome

EI → Optimization → ~4k heart candidate enhancers

EI → Optimization → ~8k cancer candidate enhancers

http://www.dcode.org/EI

Page 31

Gene regulatory code :: Summary

Combining comparative genomics, gene expression data, and TFBS clustering provides a tool to define the sequence code of tissue-specific enhancers

Over 7,000 candidate tissue-specific enhancers have been predicted in the human genome

Enhancers for several tissues (including heart and liver) were predicted with high precision

Predicting the TFs associated with tissue-specific expression provides a means to move from microarray expression data directly to the identification of active TFs