MW 11:00-12:15 in Redwood G19. Profs: Serafim Batzoglou, Gill Bejerano. TA: Cory McLean

http://cs273a.stanford.edu [Bejerano Aut07/08]


Lecture 17

Ultraconservation


Sequence Conservation implies Function

(but which function/s?...)

Comparative Genomics of Distantly Related Species:

[Figure: human and another species, descended from a common ancestor, with their sequences aligned; a conserved stretch is marked “functional region!”]

...CTTTGCGA-TGAGTAGCATCTACTATTT...
...ACGTGGGACTGACTA-CATCGACTACGA...

Note: the inverse, “no conservation, no function,” is a much weaker statement given current knowledge.
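The Python below is a minimal sketch of the comparative idea. It scores an aligned pair column by column and reports sliding windows whose identity clears a cutoff. The window length (8 columns) and the 85% cutoff are illustrative assumptions. Gap columns are simply counted as mismatches. Real conservation analyses compare against a model of neutral evolution rather than a fixed threshold. The example rows are the two aligned fragments from the slide.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent of alignment columns where both rows carry the same base.

    Gap columns ('-') count as mismatches.
    """
    if len(a) != len(b):
        raise ValueError("aligned rows must have equal length")
    matches = sum(1 for x, y in zip(a, b) if x == y and x != "-")
    return 100.0 * matches / len(a)


def conserved_windows(a: str, b: str, window: int = 8, min_identity: float = 85.0):
    """Return (start, identity) for each sliding window at or above min_identity.

    Starts are 0-based alignment columns. window and min_identity are
    illustrative defaults, not parameters from the lecture.
    """
    hits = []
    for start in range(len(a) - window + 1):
        pid = percent_identity(a[start:start + window], b[start:start + window])
        if pid >= min_identity:
            hits.append((start, pid))
    return hits


# The two aligned rows from the slide (flanking "..." dropped).
row1 = "CTTTGCGA-TGAGTAGCATCTACTATTT"
row2 = "ACGTGGGACTGACTA-CATCGACTACGA"

print(round(percent_identity(row1, row2), 1))  # overall identity
print(conserved_windows(row1, row2))           # most conserved 8-column windows
```

On these rows the overall identity is about 61%. The two most conserved windows, starting at columns 16 and 17, each reach 87.5%.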


How They Measured

[Figure: all human-mouse alignments vs. the human-mouse ancestral repeats alignment]

Difference: 5% of the Human Genome

[Mouse consortium, Nature 2002]
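The measurement compares two score distributions. One comes from all human-mouse alignments. The other comes from human-mouse alignments of ancestral repeats, which are taken to evolve neutrally. The excess of conserved sequence over that neutral expectation is what gets attributed to selection.

The Python below is a deliberately crude, hypothetical estimator of that excess; it is not the consortium's procedure. It assumes every selected window scores at or above a threshold t. Under that assumption the genome-wide tail P_all(S >= t) equals f + (1 - f) * P_neutral(S >= t), which can be solved for the selected fraction f. The scores are invented.

```python
def tail_fraction(scores, threshold):
    """Fraction of scores at or above threshold."""
    return sum(1 for s in scores if s >= threshold) / len(scores)


def selected_fraction(all_scores, neutral_scores, threshold):
    """Crude mixture estimate of the fraction of windows under selection.

    Hypothetical assumption: every selected window scores >= threshold, so
    P_all = f + (1 - f) * P_neutral, giving f = (P_all - P_neutral) / (1 - P_neutral).
    """
    p_all = tail_fraction(all_scores, threshold)
    p_neutral = tail_fraction(neutral_scores, threshold)
    if p_neutral >= 1.0:
        raise ValueError("threshold too low: every neutral window passes it")
    return max(0.0, (p_all - p_neutral) / (1.0 - p_neutral))


# Made-up per-window conservation scores, for illustration only.
neutral = [0.1, 0.2, 0.2, 0.3, 0.3, 0.4, 0.4, 0.5, 0.6, 0.9]
genome = [0.1, 0.2, 0.3, 0.3, 0.4, 0.4, 0.5, 0.6, 0.95, 0.97]
print(selected_fraction(genome, neutral, threshold=0.8))
```

With these scores, 20% of genome windows pass the threshold but only 10% of neutral windows do. The estimate is therefore (0.2 - 0.1) / 0.9, or about 11%.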


What They Found
[Science 2004 Breakthrough of the Year, 5th runner up]

Human Genome: 3 x 10^9 letters
• 1.5% of known function
• >50% junk
• compared to other species: >5% of the human genome is functional, i.e. 3x more functional DNA than known!

~10^6 substrings do not code for protein. What do they do then?
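The "3x" claim follows from simple arithmetic on the slide's own figures: 3 x 10^9 letters, 1.5% of known function, and >5% functional. The quick check below treats 5% as the lower bound the slide states.

```python
GENOME_BP = 3e9              # ~3 x 10^9 letters
KNOWN_FRACTION = 0.015       # ~1.5% of known function
FUNCTIONAL_FRACTION = 0.05   # >5% estimated functional (a lower bound)

known_bp = GENOME_BP * KNOWN_FRACTION
functional_bp = GENOME_BP * FUNCTIONAL_FRACTION
ratio = functional_bp / known_bp

print(f"known function:   {known_bp:.2e} bp")
print(f"functional:       {functional_bp:.2e} bp")
print(f"functional/known: {ratio:.1f}x")
```

Because 5% is a lower bound, the resulting ratio of about 3.3 is also a lower bound, which is consistent with "3x more".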


Ultraconservation


Typical DNA Conservation levels

[Bejerano et al., Science 2004]

Conserved elements between human and mouse are on average 85% identical. [mouse consortium, 2002]


Ultraconserved Elements

[Bejerano et al., Science 2004]

[Figure label: fish]


Ultraconserved Elements

[Bejerano et al., Science 2004]


No known function requires this much conservation

[Figure: known functional classes (CDS, ncRNA, TFBS) vs. a fully conserved sequence (*****) of unknown function (?)]


Discovery can be fun

58,300 results (compared to 4 results the day before our Science Express paper)


Ultra Conserved (UC) Elements

Any contiguous block of human-mouse-rat alignment that is identical in all three species, syntenic and ≥200 bp.

(p = 10^-22 of finding even one such element in 2.9G of neutral DNA evolving at the slowest rate)

Turns out there are 481(!) such blocks of sizes 200-779bp (total of 126Kb) in all chromosomes but 21 and Y.

*68 (61%) of the 111 exonic ultras are associated with alt-spliced exons
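The UC definition above translates directly into a scan over alignment columns. The Python below is a minimal sketch that assumes the three rows are already one gapped, syntenic human-mouse-rat alignment block, given as equal-length strings. It looks for maximal runs of gap-free columns where all three species carry the same base. Synteny checking, mapping columns back to genome coordinates, and special handling of masked or ambiguous bases are all left out. The toy alignment is invented.

```python
def find_ultras(human: str, mouse: str, rat: str, min_len: int = 200):
    """Return (start, end) column spans (0-based, end-exclusive) of maximal runs
    where all three rows carry the same unambiguous base, with no gaps,
    and the run is at least min_len columns long."""
    if not (len(human) == len(mouse) == len(rat)):
        raise ValueError("alignment rows must have equal length")
    human, mouse, rat = human.upper(), mouse.upper(), rat.upper()
    spans = []
    run_start = None
    for i, (h, m, r) in enumerate(zip(human, mouse, rat)):
        identical = h == m == r and h in "ACGT"
        if identical and run_start is None:
            run_start = i                      # a new identical run begins
        elif not identical and run_start is not None:
            if i - run_start >= min_len:       # close the run; keep it if long enough
                spans.append((run_start, i))
            run_start = None
    if run_start is not None and len(human) - run_start >= min_len:
        spans.append((run_start, len(human)))  # run reaching the alignment end
    return spans


# Toy alignment with a lowered length cutoff so the example stays short.
human = "ACGGGTTTACCC"
mouse = "TCGGGTTTACCC"
rat = "ACGGGTTTGCCC"
print(find_ultras(human, mouse, rat, min_len=5))
```

With the real 200 bp cutoff the toy alignment yields no ultras. The lowered cutoff simply shows how a run is opened, closed and filtered by length.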


Ultra Clusters

By joining two ultras into a cluster when they are separated by <675Kb, we obtained 89 clusters (each named after prominent gene/s).

Non-exonic elements tend to congregate in clusters. Exonic elements are distributed more randomly (and tend to overlap an alt-spliced exon).

Legend: • exonic • non-exonic • possibly exonic
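Joining ultras separated by <675Kb amounts to single-linkage clustering along each chromosome. Below is a minimal Python sketch. The slide does not say how the separation was measured, so measuring it from the end of one element to the start of the next is an assumption. The example coordinates are invented.

```python
from typing import List, Tuple

Element = Tuple[str, int, int]  # (chromosome, start, end), 0-based half-open


def cluster_ultras(elements: List[Element], max_gap: int = 675_000) -> List[List[Element]]:
    """Single-linkage clustering along each chromosome.

    After sorting, an element joins the current cluster when it is on the same
    chromosome and starts less than max_gap bp after the previous element ends.
    Assumes elements do not overlap one another.
    """
    clusters: List[List[Element]] = []
    for elem in sorted(elements):
        chrom, start, _ = elem
        if clusters:
            prev_chrom, _, prev_end = clusters[-1][-1]
            if chrom == prev_chrom and start - prev_end < max_gap:
                clusters[-1].append(elem)
                continue
        clusters.append([elem])
    return clusters


# Hypothetical coordinates.
elements = [
    ("chr2", 1_000_000, 1_000_300),
    ("chr2", 1_500_000, 1_500_250),
    ("chr2", 3_000_000, 3_000_400),
    ("chrX", 1_000_100, 1_000_500),
]
print([len(c) for c in cluster_ultras(elements)])
```

The first two chr2 elements are about 500Kb apart and so share a cluster. The third chr2 element and the chrX element each stand alone.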


Genomic Distribution

Legend: • exonic • non-exonic • possibly exonic


Associate Ultras with Nearby Genes

481 UCEs identified: 111 exonic, 114 possibly exonic, 256 non-exonic

256 non-exonic: 100 intronic, 156 intergenic

Associated genes: 93 type I genes, 225 type II genes
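One simple way to produce an exonic / intronic / intergenic split like the one above is interval overlap against a gene annotation. The Python sketch below assumes a single chromosome, half-open coordinates, and gene spans that include their introns. The annotation and element coordinates are invented, and the lecture's "possibly exonic" class is not modelled.

```python
from collections import Counter
from typing import List, Tuple

Interval = Tuple[int, int]  # 0-based, half-open [start, end) on one chromosome


def overlaps(a: Interval, b: Interval) -> bool:
    """True when two half-open intervals share at least one base."""
    return a[0] < b[1] and b[0] < a[1]


def classify(element: Interval, genes: List[Interval], exons: List[Interval]) -> str:
    """Label an element 'exonic', 'intronic' or 'intergenic' by simple overlap.

    The lecture's 'possibly exonic' class is not modelled in this sketch.
    """
    if any(overlaps(element, exon) for exon in exons):
        return "exonic"
    if any(overlaps(element, gene) for gene in genes):
        return "intronic"   # inside a gene span but touching no exon
    return "intergenic"


# Hypothetical annotation: one gene with two exons.
genes = [(1_000, 5_000)]
exons = [(1_000, 1_200), (4_800, 5_000)]
elements = [(1_100, 1_300), (2_000, 2_200), (8_000, 8_200)]
print(Counter(classify(e, genes, exons) for e in elements))
```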


Functional Annotation of Related Genes

[Figure: observed vs. expected No. of Genes per GO category (Homeobox, DNA binding, Transcription reg., RNA rec. motif, RNA splicing, RNA binding), for ultra-related genes split into Exonic and Non Exonic sets (gene counts shown: 86, 7, 218). p-values in one panel: 1.3 x 10^-19, 4.8 x 10^-10, 5.6 x 10^-18, 1.2 x 10^-4, 9.1 x 10^-6, 4.4 x 10^-6; in the other: 0.39, 0.44, 0.77, 2.5 x 10^-20, 1.8 x 10^-15, 7.8 x 10^-15]

Exonic: RNA processing @ transcription regulation
Non-exonic: regulation of transcription at the DNA level
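GO enrichment p-values like those in the chart are commonly computed with a one-sided hypergeometric test (equivalent to a one-sided Fisher's exact test). It asks how surprising it is to see at least k term-carrying genes among n study genes, when K of the N background genes carry the term. The slide does not state which test was used, so the Python sketch below is an assumption. The counts are made up.

```python
from math import comb


def enrichment_p_value(k: int, n: int, K: int, N: int) -> float:
    """One-sided hypergeometric tail P(X >= k).

    N: genes in the annotated background
    K: background genes carrying the GO term
    n: genes in the study set (e.g. genes near ultras)
    k: study-set genes carrying the GO term
    """
    top = min(n, K)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, top + 1))
    return tail / comb(N, n)


def expected_count(n: int, K: int, N: int) -> float:
    """Number of study-set genes expected to carry the term by chance."""
    return n * K / N


# Made-up counts: 4 of 4 study genes carry a term held by 5 of 20 background genes.
print(expected_count(4, 5, 20), enrichment_p_value(4, 4, 5, 20))
```

The "expected" bars in the chart correspond to expected_count. The p-value measures how far the observed count sits in the upper tail.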


Non-Exonic Enhancers

The non-exonic ultras are often found in “gene deserts” (140 / 256 lie >10Kb from a known gene; 88 lie >100Kb away).

The genes flanking these ultras are GO-enriched for development (p = 10^-6), particularly early developmental tasks (p = 2-7 x 10^-5), suggesting distal enhancer roles.

Indeed, uc.351 is contained in a proven enhancer of DACH, located 225Kb upstream of it [Nobrega et al., Science 2003].
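The gene-desert counts above (>10Kb and >100Kb from a known gene) come down to a nearest-gene distance for each element. The Python below is a minimal sketch assuming single-chromosome, half-open coordinates, with each gene given as one plain interval. The coordinates are invented.

```python
from typing import Dict, List, Tuple

Interval = Tuple[int, int]  # 0-based, half-open [start, end) on one chromosome


def distance_to_nearest_gene(element: Interval, genes: List[Interval]) -> int:
    """Gap in bp between an element and the closest gene span (0 if they overlap)."""
    if not genes:
        raise ValueError("no genes on this chromosome")
    best = None
    for g_start, g_end in genes:
        if element[0] < g_end and g_start < element[1]:
            return 0                          # element overlaps the gene
        if g_start >= element[1]:
            gap = g_start - element[1]        # gene lies downstream
        else:
            gap = element[0] - g_end          # gene lies upstream
        best = gap if best is None else min(best, gap)
    return best


def desert_counts(elements: List[Interval], genes: List[Interval],
                  thresholds=(10_000, 100_000)) -> Dict[int, int]:
    """Count elements lying more than each threshold (bp) from any gene."""
    distances = [distance_to_nearest_gene(e, genes) for e in elements]
    return {t: sum(d > t for d in distances) for t in thresholds}


genes = [(0, 50_000), (1_000_000, 1_200_000)]
elements = [(55_000, 55_300), (200_000, 200_300), (960_000, 960_300)]
print(desert_counts(elements, genes))
```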



Zoom to uc.351, 225Kb upstream of DACH

[Figure: ultra conserved track; two panels at embryonic day 12.5 (e.d 12.5)]


Validating Regulatory Elements

[Figure: transgenic construct of Conserved Element + Minimal Promoter + Reporter Gene; wild type vs. transgenic]

Where is the wild-type gene expressed? Where is the reporter gene expressed?


Predictions and Proofs I

Based on public-domain genome-wide data:

ultraconserved elements
• one subset codes protein
• a larger subset does not

Generate testable hypotheses for function from existing knowledge (2004).

[Pennacchio et al., Nature, 2006]


The most conserved elements in the genome

If one concatenates ultras that are 1-2bp apart, all four of the longest ultras (1044, 779, 731, 711bp) lie at the 3’ end of POLA, near ARX, on chr X. The longest has 8 substitutions and no indels (99.3% identity) in chicken.

ARX is a homeobox gene involved in CNS development; defects in the gene are linked to epilepsy, mental retardation, autism and cerebral malformations.



Exonic uc’s correlate with Alt Splicing

68 / 111 exonic ultras overlap an exon that shows clear evidence of being alternatively spliced.

Of the 59 GO-annotated genes containing these elements:
• 24 are RNA binding (p = 8.1 x 10^-18), including HNRPU, HNRPDL, HNRPH1, HNRPK, HNRPM.
• 16 contain the RNA recognition motif (p = 9.1 x 10^-19), including SFRS1, SFRS3, SFRS6, SFRS7, SFRS10, SFRS11.

These ultras often overlap a short exon that is retained only in some tissues.

For example, the explicitly studied uc.33 in PTBP2 (length 312bp) overlaps a 34bp exon that is included in the mature transcript only in the brain [Rahman et al., Genomics 2004].


Predictions and Proofs II

Based on public-domain genome-wide data:

ultraconserved elements
• one subset codes protein
• a larger subset does not

Generate testable hypotheses for function from existing knowledge (2004).

[Pennacchio et al., Nature, 2006]

post-transcriptional modification


Splicing Microarrays


Nonsense-Mediated Decay (NMD)


[Ni et al., Genes & Dev 2007]

Of the 29 exonic ultraconserved elements in RNA-binding protein genes in human, 15 have human and/or mouse EST evidence suggesting the presence of AS-NMD in those regions.
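AS-NMD couples alternative splicing to decay: an isoform whose splicing introduces a premature termination codon (PTC) is degraded. A widely used heuristic for mammalian transcripts is that a stop codon lying more than about 50-55 nt upstream of the last exon-exon junction triggers NMD.

The Python below applies that rule as a sketch. Three choices are assumptions for illustration, not the criteria used by Ni et al.: the 50-nt threshold, measuring from the first base of the stop codon, and the exon lengths.

```python
from itertools import accumulate
from typing import List


def predicts_nmd(exon_lengths: List[int], stop_start: int, threshold: int = 50) -> bool:
    """Heuristic '50-nt rule' for mammalian NMD.

    exon_lengths: lengths of the spliced exons, in transcript order.
    stop_start:   0-based mRNA position of the first base of the stop codon.
    Returns True when the stop codon lies more than `threshold` nt upstream
    of the last exon-exon junction.
    """
    if len(exon_lengths) < 2:
        return False                                    # no junction to sense
    last_junction = list(accumulate(exon_lengths))[-2]  # end of the penultimate exon
    return last_junction - stop_start > threshold


# Hypothetical three-exon isoform: junctions fall at mRNA positions 100 and 300.
exons = [100, 200, 150]
print(predicts_nmd(exons, 240))  # PTC 60 nt upstream of the last junction
print(predicts_nmd(exons, 320))  # stop codon in the last exon
```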


Model for Homeostatic Auto/Cross-regulation

[Ni et al., Genes & Dev 2007]
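A toy model can make the homeostasis argument concrete. Assume, hypothetically, that protein P shifts splicing of its own pre-mRNA toward an NMD-targeted isoform. Then only a fraction K / (K + P) of transcripts produced at rate s are productive. If P decays at rate d, the steady state satisfies s*K / (K + P) = d*P. The functional form, names and parameter values are illustrative assumptions, not the model of Ni et al.

```python
from math import sqrt


def steady_state_autoregulated(s: float, d: float, K: float) -> float:
    """Positive root of s*K/(K + P) = d*P, i.e. d*P**2 + d*K*P - s*K = 0."""
    return (-d * K + sqrt((d * K) ** 2 + 4 * d * s * K)) / (2 * d)


def steady_state_unregulated(s: float, d: float) -> float:
    """Without feedback every transcript is productive, so P* = s/d."""
    return s / d


def simulate(s: float, d: float, K: float, p0: float = 0.0,
             dt: float = 0.01, steps: int = 5_000) -> float:
    """Forward-Euler integration of dP/dt = s*K/(K + P) - d*P."""
    p = p0
    for _ in range(steps):
        p += dt * (s * K / (K + p) - d * p)
    return p


# Tripling synthesis (s = 2 -> 6) with d = K = 1.
for s in (2.0, 6.0):
    print(s, steady_state_unregulated(s, 1.0), steady_state_autoregulated(s, 1.0, 1.0))
```

In this toy model, tripling the synthesis rate triples the unregulated protein level (2 to 6) but only doubles the auto-regulated one (1 to 2). That buffering is the kind of homeostasis the slide's title refers to.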


Similar Results

[Figure legend: 100% conservation, associated with AS; normal splice; alternative splice; retained intron; normal stop codon; premature termination codon]

[Lareau et al., Nature 2007]


Ultraconserved Non-coding RNA

[Calin et al., Cancer Cell, 2007]

[Figure: miRNA complementarity]

About 1/3 of all ultras are expressed. Some are predicted to provide microRNA targets. A few are anti-correlated with miRNA expression levels. A few even act as oncogenes.
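Predicted microRNA target sites are usually anchored on seed complementarity: the target carries the reverse complement of nucleotides 2-8 of the mature miRNA. The Python below is a minimal exact-match sketch. Real target-prediction tools add further criteria, such as site type, context and conservation, that are not modelled here. This is not the prediction method of Calin et al., and both sequences are invented.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def seed_site(mirna: str) -> str:
    """DNA site complementary to the 7-nt seed (positions 2-8) of a mature miRNA."""
    seed = mirna.upper().replace("U", "T")[1:8]
    return seed.translate(COMPLEMENT)[::-1]   # reverse complement


def find_seed_matches(target: str, mirna: str) -> list:
    """0-based start positions of every exact seed site in a DNA target."""
    site = seed_site(mirna)
    target = target.upper()
    hits = []
    pos = target.find(site)
    while pos != -1:
        hits.append(pos)
        pos = target.find(site, pos + 1)   # allow overlapping sites
    return hits


# Both sequences are made up for illustration.
mirna = "UCCGAAUCGAUGGACUAGCAUC"
target = "AAGATTCGGTTTGATTCGGA"
print(seed_site(mirna), find_seed_matches(target, mirna))
```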


Ultras are Under Strong Human Selection

[Figure: derived allele frequency (DAF) distributions for ultras vs. non-synonymous (NonSyn) sites]

[Katzman et al., Science, 2007]

Mutational cold spots? NO. Rare (new) mutations are introduced to the population.

Fierce purifying selection? YES. Very few of these get anywhere near fixation.

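Example: chimp A; humans A, A, G, A (the derived allele G, absent in chimp, is carried by 1 of the 4 human alleles shown)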
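The derived allele frequency (DAF) in the example is the share of human alleles that differ from the chimp allele, with chimp taken as ancestral. That choice is a standard simplification that ignores back-mutation and ancestral polymorphism. The Python below is a minimal sketch of that calculation, plus a binned spectrum over many sites. Katzman et al.'s actual pipeline is not reproduced, and the bin count is an arbitrary assumption.

```python
from typing import List, Sequence


def derived_allele_frequency(human_alleles: Sequence[str], chimp_allele: str) -> float:
    """Fraction of sampled human alleles that differ from the chimp (ancestral) allele."""
    derived = [a for a in human_alleles if a != chimp_allele]
    if len(set(derived)) > 1:
        raise ValueError("more than one derived allele: site is not biallelic")
    return len(derived) / len(human_alleles)


def daf_spectrum(dafs: Sequence[float], n_bins: int = 10) -> List[float]:
    """Fraction of sites falling in each of n_bins equal-width DAF bins on [0, 1]."""
    counts = [0] * n_bins
    for f in dafs:
        counts[min(int(f * n_bins), n_bins - 1)] += 1
    return [c / len(dafs) for c in counts]


# The slide's example site: chimp A, humans A A G A.
print(derived_allele_frequency(["A", "A", "G", "A"], "A"))
```

Under fierce purifying selection, a spectrum built from many ultra sites is expected to be skewed toward the lowest DAF bins. The slide's figure compares that skew against non-synonymous sites.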


Relation to Human Disease

[Derti et al., Nature Genetics, 2006]

[Figure: the SHH and LMBR1 locus on a 1Mb scale; limb]

[Lettice et al., HMG 2003 12: 1725-35]


Link to Disease Remains Elusive


A Vertebrate Innovation?

Only 24 ultras can be partially traced back through direct sequence search to Ciona, C. elegans or Drosophila.

All overlap coding exons from known genes (17 of which show clear evidence of alt-splicing, including EIF2C1, DDX, BCL11A, EVI1, ZFR, CLK4, HNRPH1, GRIA3).

No intronic element in human was found to be coding in another species, although in some cases EST evidence indicates intron retention, presumably not as CDS.

Interestingly, ribosomal DNA (not part of the draft genomes) also harbors 6 ultraconserved elements in 18S and 28S.


Genomic Distribution of Ultraconserved Elements

Legend: • exonic • non-exonic • possibly exonic


Computationally Driven Biology Simplified

[Diagram linking CS and BIO: case study, hypothesis, generalize, set, survey, analyze, candidates, experiment]


What we do understand

• Ultraconserved elements exist.
• They are maintained via strong ongoing selection.
• It is a heterogeneous bunch:
  • Some mediate splicing
  • Some regulate gene expression
  • Some express ncRNAs
  (categories are not necessarily mutually exclusive)
• Knockouts of four regulatory ultras did not lead to severe phenotypes (similar protein cases: Pbx2, Nkx6.2, Gli1)


What we don’t understand

• Their functional density
• How did they come to be?
• What is the selective advantage that lets them persist?


Broad Guess

It’s about 3-D structure.

Observation: rDNA (18S, 28S) has ultraconserved stretches under multiple constraints in a complex 3-D structure, the Ribosome.

• ncRNA ultras: structure confers function
• Splicing-related ultras: the Spliceosome
• Cis-reg ultras: TSS 3-D proximity, chromatin and/or packed TFBS (Transcription factories?)