Genomics of Gene Regulation Genomic and Proteomic Approaches to Heart, Lung, Blood and Sleep...

53
Genomics of Gene Regulation Genomic and Proteomic Approaches to Heart, Lung, Blood and Sleep Disorders Jackson Laboratories Ross Hardison September 9, 2008

Transcript of Genomics of Gene Regulation Genomic and Proteomic Approaches to Heart, Lung, Blood and Sleep...

Page 1: Genomics of Gene Regulation Genomic and Proteomic Approaches to Heart, Lung, Blood and Sleep Disorders Jackson Laboratories Ross Hardison September 9,

Genomics of Gene Regulation

Genomic and Proteomic Approaches to Heart, Lung, Blood and Sleep Disorders

Jackson Laboratories

Ross Hardison

September 9, 2008

Page 2: Genomics of Gene Regulation Genomic and Proteomic Approaches to Heart, Lung, Blood and Sleep Disorders Jackson Laboratories Ross Hardison September 9,

Heritable variation in gene regulation

“Simple” Mendelian traits, e.g. thalassemias

Variation in expression is common in normal individuals

Variation in expression may be a major contributor to complex traits (including heart, lung, blood and sleep disorders)

Page 3: Genomics of Gene Regulation Genomic and Proteomic Approaches to Heart, Lung, Blood and Sleep Disorders Jackson Laboratories Ross Hardison September 9,

Deletions of noncoding DNA can affect gene expression

Forget and Hardison, Chapter in Disorders of Hemoglobin, 2nd edition

Page 4: Genomics of Gene Regulation Genomic and Proteomic Approaches to Heart, Lung, Blood and Sleep Disorders Jackson Laboratories Ross Hardison September 9,

Substitutions in promoters can affect expression

Forget and Hardison, Chapter in Disorders of Hemoglobin, 2nd edition

Page 5: Genomics of Gene Regulation Genomic and Proteomic Approaches to Heart, Lung, Blood and Sleep Disorders Jackson Laboratories Ross Hardison September 9,

Variation of gene expression among individuals

• Levels of expression of many genes varies in humans (and other species)

• Variation in expression is heritable• Determinants of variability map to discrete genomic intervals• Often multiple determinants• Points to an abundance of cis-regulatory variation in the human

genome• "We predict that variants in regulatory regions make a greater

contribution to complex disease than do variants that affect protein sequence" Manolis Dermitzakis, ScienceDaily– Microarray expression analyses of 3554 genes in 14 families

• Morley M … Cheung VG (2004) Nature 430:743-747

– Expression analysis of EBV-transformed lymphoblastoid cells from all 270 individuals genotypes in HapMap

• Stranger BE … Dermitzakis E (2007) Nature Genetics 39:1217-1224

Page 6: Genomics of Gene Regulation Genomic and Proteomic Approaches to Heart, Lung, Blood and Sleep Disorders Jackson Laboratories Ross Hardison September 9,

Risk loci in noncoding regions

(2007) Science 316: 1336-1341

Page 7: Genomics of Gene Regulation Genomic and Proteomic Approaches to Heart, Lung, Blood and Sleep Disorders Jackson Laboratories Ross Hardison September 9,

DNA sequences involved in regulation of gene transcription

Protein-DNA interactions

Chromatin effects

Page 8: Genomics of Gene Regulation Genomic and Proteomic Approaches to Heart, Lung, Blood and Sleep Disorders Jackson Laboratories Ross Hardison September 9,

Specific DNA sequences bind proteins that recruit transcriptional machinery

Maston G, Evans S and Green MR (2006) Annu Rev Genomics Hum Genetics 7:29-59

Page 9: Genomics of Gene Regulation Genomic and Proteomic Approaches to Heart, Lung, Blood and Sleep Disorders Jackson Laboratories Ross Hardison September 9,

Distinct classes of regulatory regions

Maston G, Evans S and Green M (2006) Annu Rev Genomics Hum Genetics 7:29-59

Act in cis, affecting expression of a gene on the same chromosome.

Cis-regulatory modules (CRMs)

Page 10: Genomics of Gene Regulation Genomic and Proteomic Approaches to Heart, Lung, Blood and Sleep Disorders Jackson Laboratories Ross Hardison September 9,

CRMs are clusters of specific binding sites for transcription factors

Hardison (2002) on-line textbook Working with Molecular Genetics http://www.bx.psu.edu/~ross/

Page 11: Genomics of Gene Regulation Genomic and Proteomic Approaches to Heart, Lung, Blood and Sleep Disorders Jackson Laboratories Ross Hardison September 9,

Silent and repressed chromatin

Hardison (2002) on-line textbook Working with Molecular Genetics http://www.bx.psu.edu/~ross/

Page 12: Genomics of Gene Regulation Genomic and Proteomic Approaches to Heart, Lung, Blood and Sleep Disorders Jackson Laboratories Ross Hardison September 9,

Transcription initiation and pausing

General transcription initiation factors, GTIFs

Assemble on promoter

Repressors bindto negative controlelements

Page 13: Genomics of Gene Regulation Genomic and Proteomic Approaches to Heart, Lung, Blood and Sleep Disorders Jackson Laboratories Ross Hardison September 9,

Basal and activated transcription

Activators bind to enhancers

Page 14: Genomics of Gene Regulation Genomic and Proteomic Approaches to Heart, Lung, Blood and Sleep Disorders Jackson Laboratories Ross Hardison September 9,

Histone modifications modulate chromatin structure

http://www.imt.uni-marburg.de/bauer/images/fig2.jpg Uta-Maria Bauer

H3K27me3H3K4me2, 3

Page 15: Genomics of Gene Regulation Genomic and Proteomic Approaches to Heart, Lung, Blood and Sleep Disorders Jackson Laboratories Ross Hardison September 9,

Biochemical features of DNA in CRMs

Pol IIaPol II

Coactivators

Accessible to cleavage: DNase hypersensitive site

Bound by specific transcription factors

Associated with RNA polymerase and general transcription factors

Nucleosomes with histone modifications:Acetylation of H3 and H4Methylation of H3K4

Clusters of binding site motifs

Page 16: Genomics of Gene Regulation Genomic and Proteomic Approaches to Heart, Lung, Blood and Sleep Disorders Jackson Laboratories Ross Hardison September 9,

Examples of genome-wide data on CRM features

• RNA polymerase II, preinitiation complex– IMR90 cells: Kim TH …Ren B (2005) Nature 436: 876-880

• Start sites for transcription– Carninci et al. (2006) Nature Genetics 38:626-635

• Histone modifications– T cells: Roh ... Zhao K (2006) PNAS 103:15782-15878

• Insulator protein CTCF– Primary fibroblasts: Kim TH … Ren B (2007) Cell 128:1231-1245

• DNase hypersensitive sites– CD4+ T cells: Boyle… Crawford G (2008) Cell 132:311-322

• Many datastreams: ENCODE project– Birney et al. (2007) Nature 477:799-816

Page 17: Genomics of Gene Regulation Genomic and Proteomic Approaches to Heart, Lung, Blood and Sleep Disorders Jackson Laboratories Ross Hardison September 9,

Chromatin immunoprecipitation: Greatly enrich for DNA occupied by a protein

Elaine Mardis (2007) Nature Methods 4: 613-614

Page 18: Genomics of Gene Regulation Genomic and Proteomic Approaches to Heart, Lung, Blood and Sleep Disorders Jackson Laboratories Ross Hardison September 9,

ChIP-chip: High throughput mapping of DNA sequences occupied by protein

http://www.chiponchip.org Bing Ren’s lab

Page 19: Genomics of Gene Regulation Genomic and Proteomic Approaches to Heart, Lung, Blood and Sleep Disorders Jackson Laboratories Ross Hardison September 9,

Enrichment of sequence tags reveals function

Barbara Wold & Richard M Myers (2008) “Sequence Census Methods” Nature Methods 5:19-21

Page 20: Genomics of Gene Regulation Genomic and Proteomic Approaches to Heart, Lung, Blood and Sleep Disorders Jackson Laboratories Ross Hardison September 9,

Genomic features at T2D risk variants

Overlap of SNP rh564398 with DHS suggests a role in transcriptional regulation,but overlap with an exon of a noncoding RNA suggests a role in post-transcriptionalregulation. Different hypotheses to test in future work.

Page 21: Genomics of Gene Regulation Genomic and Proteomic Approaches to Heart, Lung, Blood and Sleep Disorders Jackson Laboratories Ross Hardison September 9,

GATA-1 occupancy in erythroid cells

Page 22: Genomics of Gene Regulation Genomic and Proteomic Approaches to Heart, Lung, Blood and Sleep Disorders Jackson Laboratories Ross Hardison September 9,

GATA-1 is required for erythroid maturation

Aria Rad, 2007 http://commons.wikimedia.org/wiki/Image:Hematopoiesis_(human)_diagram.png

MEP Hematopoietic stem cell

Commonmyeloidprogenitor

Myeloblast

Basophil

Commonlymphoidprogenitor

Neutrophil

Eosinophil

Monocyte, macrophage

GATA-1G1E cells

G1E-ER4 cells

Page 23: Genomics of Gene Regulation Genomic and Proteomic Approaches to Heart, Lung, Blood and Sleep Disorders Jackson Laboratories Ross Hardison September 9,

GATA-1 occupancy over a large chromosomal region

----( )---- Ahsp

enhancer

ChIP: antibody to GATA-1chip: NimbleGen high density tiling arrayYong Cheng, Lou Dore,…Xinmin Zhang, Roland Green, Mitch Weiss, R.H.

Page 24: Genomics of Gene Regulation Genomic and Proteomic Approaches to Heart, Lung, Blood and Sleep Disorders Jackson Laboratories Ross Hardison September 9,

ChIP-chip for GATA-1 at Hbb locus

Page 25: Genomics of Gene Regulation Genomic and Proteomic Approaches to Heart, Lung, Blood and Sleep Disorders Jackson Laboratories Ross Hardison September 9,

GATA-1 ChIP-chip hits localize to targets of this transcription factor

GATA-1

Page 26: Genomics of Gene Regulation Genomic and Proteomic Approaches to Heart, Lung, Blood and Sleep Disorders Jackson Laboratories Ross Hardison September 9,

Almost all sites occupied by GATA-1 have the consensus binding site motif WGATAR

• Of the 63 validated ChIP-chip hits, 60 (95%) have at least one WGATAR motif

– Other 3 have AGATAT, GGATAT, CGATAG, …– Of 6000 randomly chosen DNA intervals of 500bp from the 66Mb, 3886 (65%)

have a WGATAR motif– Occupied sites are about 1.4-fold enriched for the motif

• GATA-1 discriminates exquisitely among available sites– Only 94 out of 78,013 potential sites (500bp interval with at least one WGATAR)

are occupied– About 1 in 1000 intervals are occupied– Indicates exquisite specificity of the ChIP-chip data (<99%)

Page 27: Genomics of Gene Regulation Genomic and Proteomic Approaches to Heart, Lung, Blood and Sleep Disorders Jackson Laboratories Ross Hardison September 9,

DNA segments occupied by GATA-1 were tested for enhancer activity on transfected plasmids

Occupiedsegments

Page 28: Genomics of Gene Regulation Genomic and Proteomic Approaches to Heart, Lung, Blood and Sleep Disorders Jackson Laboratories Ross Hardison September 9,

Some of the DNA segments occupied by GATA-1 are active as enhancers

Page 29: Genomics of Gene Regulation Genomic and Proteomic Approaches to Heart, Lung, Blood and Sleep Disorders Jackson Laboratories Ross Hardison September 9,

Comparative genomics for predicting CRMs

• Sometimes high quality data on biochemical signatures of CRMs is not available

• Use sequence properties of CRMs for prediction• Clusters of binding site motifs for transcription factors

– Low specificity - MANY false positives

• Deep conservation of noncoding DNA sequences, from humans to fish or chicken – Low sensitivity - less than 5% of CRMs show signs of constraint across

vertebrates

• Conservation of clusters of transcription factor binding sites in mammals

• Conservation patterns that distinguish CRMs from neutral DNA

Page 30: Genomics of Gene Regulation Genomic and Proteomic Approaches to Heart, Lung, Blood and Sleep Disorders Jackson Laboratories Ross Hardison September 9,

Finding clusters of binding sites for transcription factors

• Resources and servers for finding transcription factor binding sites (TFBSs)– TRANSFAC http://www.gene-regulation.com/

– JASPAR http://jaspar.cgb.ki.se/cgi-bin/jaspar_db.pl

– TESS http://www.cbil.upenn.edu/cgi-bin/tess/tess

– MOTIF (GenomeNet) http://motif.genome.jp/

– MatInspector http://www.genomatix.de/

Page 31: Genomics of Gene Regulation Genomic and Proteomic Approaches to Heart, Lung, Blood and Sleep Disorders Jackson Laboratories Ross Hardison September 9,

Finding known motifs in a query sequence

MatInspector at http://www.genomatix.de/K. Cartharius et al. (2006) MatInspector and beyond: promoter analysis based on transcription factor binding sites. Bioinformatics 21:2933-2942. Genomatix Software GmbH, Munchen, Germany

Query: an enhancer in SOX61356 bp

About 1 in 4 bp is the start of a TFBS match!

Page 32: Genomics of Gene Regulation Genomic and Proteomic Approaches to Heart, Lung, Blood and Sleep Disorders Jackson Laboratories Ross Hardison September 9,

Three modes of evolution

Page 33: Genomics of Gene Regulation Genomic and Proteomic Approaches to Heart, Lung, Blood and Sleep Disorders Jackson Laboratories Ross Hardison September 9,

Negative and positive selection observed at different phylogenetic distances

:

Page 34: Genomics of Gene Regulation Genomic and Proteomic Approaches to Heart, Lung, Blood and Sleep Disorders Jackson Laboratories Ross Hardison September 9,

phastCons score identifies conserved DNA segments

Siepel et al. 2005, Genome Research

Page 35: Genomics of Gene Regulation Genomic and Proteomic Approaches to Heart, Lung, Blood and Sleep Disorders Jackson Laboratories Ross Hardison September 9,

Ultraconserved elements = UCEs

• At least 200 bp with no interspecies differences– Bejerano et al. (2004) Science 304:1321-1325

– 481 UCEs with no changes among human, mouse and rat

– Also conserved between out to dog and chicken

– More highly conserved than vast majority of coding regions

• Most do not code for protein – Only 111 out of 481overlap with protein-coding exons

– Some are developmental enhancers.

– Nonexonic UCEs tend to cluster in introns or in vicinity of genes encoding transcription factors regulating development

– 88 are more than 100 kb away from an annotated gene; may be distal enhancers

Page 36: Genomics of Gene Regulation Genomic and Proteomic Approaches to Heart, Lung, Blood and Sleep Disorders Jackson Laboratories Ross Hardison September 9,

Intronic UCE in SOX6 enhances expression in melanocytes in transgenic mice

Pennacchio et al., http://enhancer.lbl.gov/

UCEsTested UCEs

Page 37: Genomics of Gene Regulation Genomic and Proteomic Approaches to Heart, Lung, Blood and Sleep Disorders Jackson Laboratories Ross Hardison September 9,

Distinctive divergence rates for different types of functional DNA sequences

pTRRs: putative transcriptional regulatory region; likely CRMs

Sites identified as occupied by sequence-specific transcription factors based on high-throughput chromatin immunoprecipitation assayed by hybridization to high density tiling arrays of genomic DNA= ChIP-chip

Page 38: Genomics of Gene Regulation Genomic and Proteomic Approaches to Heart, Lung, Blood and Sleep Disorders Jackson Laboratories Ross Hardison September 9,

Genes likely regulated by clade-specific pTRRs are enriched for distinctive functions

310

450

91

173

Millions ofyears

Percentage of pTRRs that align no further than:

Primates: 3%

Eutherians: 71%

Marsupials: 21%

Tetrapods: 4%

Vertebrates: 1%

David King

Enriched GO categories

q-value for FDR

Immune response

Protease inhibition

Mitosis and cell cycle

Transcriptional regulation

0.0006

0.0005

0.0005

0.004

0.012Ion transport

King, Taylor, et al. (2007) Genome Research

Page 39: Genomics of Gene Regulation Genomic and Proteomic Approaches to Heart, Lung, Blood and Sleep Disorders Jackson Laboratories Ross Hardison September 9,

Conservation of TFBSs between species

• Servers to find conserved matches to factor binding sites– Comparative genomics at Lawrence Livermore http://www.dcode.org/

• zPicture and rVista• Mulan and multiTF• ECR browser

– Consite http://mordor.cgb.ki.se/cgi-bin/CONSITE/consite• Conserved TFBSs are available for some assemblies of human genome at UCSC Genome

Browser

Binding site for GATA-1

Page 40: Genomics of Gene Regulation Genomic and Proteomic Approaches to Heart, Lung, Blood and Sleep Disorders Jackson Laboratories Ross Hardison September 9,

Clusters of conserved TFBSs: PReMods

Blanchette et al. (2006) Genome Research

http://genomequebec.mcgill.ca/PReMod/

Page 41: Genomics of Gene Regulation Genomic and Proteomic Approaches to Heart, Lung, Blood and Sleep Disorders Jackson Laboratories Ross Hardison September 9,

ESPERREvolutionary and Sequence Pattern

Extraction through Reduced Representation

Taylor et al. (2006) Genome Research 16:1596-1604

Page 42: Genomics of Gene Regulation Genomic and Proteomic Approaches to Heart, Lung, Blood and Sleep Disorders Jackson Laboratories Ross Hardison September 9,

ESPERR: a different approach

• Don’t assume a database of known binding motifs • Don’t assume strict conservation of the important

sequence signals • Instead, use alignments of validated examples to

learn sequence and evolutionary patterns that characterize a class of elements

• Machine learning approach to discriminate functional classes of DNA based on patterns in alignments

Page 43: Genomics of Gene Regulation Genomic and Proteomic Approaches to Heart, Lung, Blood and Sleep Disorders Jackson Laboratories Ross Hardison September 9,

Regulatory potential (RP) to distinguish functional classes

Page 44: Genomics of Gene Regulation Genomic and Proteomic Approaches to Heart, Lung, Blood and Sleep Disorders Jackson Laboratories Ross Hardison September 9,

Good performance of ESPERR for gene regulatory regions (RP)

-1

Page 45: Genomics of Gene Regulation Genomic and Proteomic Approaches to Heart, Lung, Blood and Sleep Disorders Jackson Laboratories Ross Hardison September 9,

Predicted cis-Regulatory Modules (preCRMs) Around Erythroid Genes

- Gene is known to respond to the restoration of GATA-1 in an erythroid cell line - DNA segment with positive regulatory potential (RP) score - DNA segment contains at least one match to the GATA-1 binding site (WGATAR) that is preserved in multiple mammalian lineages Wang et al. (2006) Genome Research 16: 1480-1492

Page 46: Genomics of Gene Regulation Genomic and Proteomic Approaches to Heart, Lung, Blood and Sleep Disorders Jackson Laboratories Ross Hardison September 9,

Examples of validated preCRMs

Page 47: Genomics of Gene Regulation Genomic and Proteomic Approaches to Heart, Lung, Blood and Sleep Disorders Jackson Laboratories Ross Hardison September 9,

Validation status for 99 tested fragments

cc = consensus binding site motif is conserved and matches the consensus in multiple mammalian lineagescnc = binding site motif has a mismatch from the consensus but is conserved

Wang et al. (2006) Genome Research 16: 1480-1492

Page 48: Genomics of Gene Regulation Genomic and Proteomic Approaches to Heart, Lung, Blood and Sleep Disorders Jackson Laboratories Ross Hardison September 9,

preCRMs with High RP and Conserved Consensus GATA-1 Tend To Be Validated

Page 49: Genomics of Gene Regulation Genomic and Proteomic Approaches to Heart, Lung, Blood and Sleep Disorders Jackson Laboratories Ross Hardison September 9,

Accurate prediction of a GATA-1 responsive enhancer for miR-144, 451

A

Dore L, Amigo JD et al. (2008) PNAS 105:3333-3338.

Page 50: Genomics of Gene Regulation Genomic and Proteomic Approaches to Heart, Lung, Blood and Sleep Disorders Jackson Laboratories Ross Hardison September 9,

Constraint on a binding site motif in an occupied DNA segment strongly correlates with enhancement

Cheng et al. (2008) revised manuscript submitted

Page 51: Genomics of Gene Regulation Genomic and Proteomic Approaches to Heart, Lung, Blood and Sleep Disorders Jackson Laboratories Ross Hardison September 9,

Comparative genomics signals suggestive of CRMs around T2D risk variants

Page 52: Genomics of Gene Regulation Genomic and Proteomic Approaches to Heart, Lung, Blood and Sleep Disorders Jackson Laboratories Ross Hardison September 9,

Summary: Genomics of Gene Regulation

• Genetic determinants of variation in expression levels may contribute to complex traits - phenotype is not just determined by coding regions

• Biochemical features associated with cis-regulatory modules are being determined genome-wide for a range of cell types.

• These can be used to predict CRMs, but occupancy does not necessarily mean that the DNA is actively involved in regulation.

• Comparative genomics is a complementary approach to predicting CRMs.

• Evolutionary preservation of binding site motifs within regions containing other indicators of CRMs (e.g. regulatory potential or protein occupancy) is a good predictor of function.

Page 53: Genomics of Gene Regulation Genomic and Proteomic Approaches to Heart, Lung, Blood and Sleep Disorders Jackson Laboratories Ross Hardison September 9,

Many thanks …

Alignments, chains, nets, browsers, ideas, …Webb Miller, Jim Kent, David Haussler

RP scores and other bioinformatic input:Francesca Chiaromonte, James Taylor

Funding from NIDDK, NHGRI, Huck Institutes of Life Sciences at PSU

Erythroid cell biology and biochemistry:Mitch Weiss, Gerd Blobel, Barry Paw

Yong Cheng, Demesew Abebe, Christine Dorman, …, Ying Zhang, David King, Swathi Ashok Kumar