MW 11:00-12:15 in Redwood G19 Profs: Serafim Batzoglou, Gill Bejerano TA: Cory McLean
http://cs273a.stanford.edu [Bejerano Aut07/08] 1
Lecture 17
Ultraconservation
Sequence Conservation implies Function
(but which function/s?...)
human
another species
common ancestor
...CTTTGCGA-TGAGTAGCATCTACTATTT...
...ACGTGGGACTGACTA-CATCGACTACGA...
functional region!
Comparative Genomics of Distantly related species:
Note: the inverse “no conservation, no function” is a much weaker statement given current knowledge
How They Measured
[Figure: conservation in all human-mouse alignments compared with the human-mouse ancestral repeats alignment]
Difference: 5% of Human Genome
[Mouse consortium, Nature 2002]
Human Genome: 3 x 10^9 letters
What They Found
[Science 2004 Breakthrough of the Year, 5th runner up]
1.5% known function
>50% junk
3x more functional DNA than known!
compare to other species
>5% human genome functional
~10^6 substrings do not code for protein
What do they do then?
Ultraconservation
Typical DNA Conservation levels
[Bejerano et al., Science 2004]
Conserved elements between human and mouse are on average 85% identical. [mouse consortium, 2002]
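The 85% figure is a percent identity over aligned positions. A minimal sketch of how such a number could be computed from one pairwise alignment follows. The function name and the choice to skip gapped columns are assumptions made for illustration, not the consortium's procedure.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent of ungapped alignment columns where the two sequences agree.

    Assumption: columns containing a gap ('-') in either sequence are skipped.
    Other conventions (e.g. counting gaps as mismatches) give lower values.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    # Keep only columns where neither sequence has a gap.
    columns = [(x, y) for x, y in zip(a.upper(), b.upper()) if x != "-" and y != "-"]
    if not columns:
        return 0.0
    matches = sum(1 for x, y in columns if x == y)
    return 100.0 * matches / len(columns)
```

For example, `percent_identity("ACGT", "ACGA")` counts three matching columns out of four.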
Ultraconserved Elements
[Bejerano et al., Science 2004]
fish
No known function requires this much conservation
[Figure: conservation (*****) along a seq. for CDS, ncRNA, TFBS and an unknown function (?)]
Discovery can be fun
(compared to 4 results the day before our Science Express paper)
58,300
Ultra Conserved (UC) Elements
Any contiguous block of human-mouse-rat alignment that is identical in all three species, syntenic and ≥200 bp.
(p = 10^-22 of finding one such element in 2.9G of neutral DNA at the slowest rate)
Turns out there are 481(!) such blocks of sizes 200-779bp (total of 126Kb) on all chromosomes but 21 and Y.
*68 (61%) associated with alt-spliced exons
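The definition above can be sketched as a scan over alignment columns. This is a minimal illustration, assuming the three-way alignment is already available as equal-length gapped strings. The function name is hypothetical and the synteny requirement is not checked here.

```python
from typing import List, Tuple


def ultraconserved_blocks(rows: List[str], min_len: int = 200) -> List[Tuple[int, int]]:
    """Return half-open (start, end) column ranges where every row carries the
    same nucleotide, with no gaps, for at least min_len consecutive columns.

    Assumption: rows are aligned sequences (e.g. human, mouse, rat) of equal
    length. Synteny must be verified separately.
    """
    if not rows or len({len(r) for r in rows}) != 1:
        raise ValueError("need at least one row, all of equal length")
    n = len(rows[0])
    blocks = []
    run_start = None
    # Iterate one past the end so that a run reaching the last column is closed.
    for i in range(n + 1):
        identical = (
            i < n
            and rows[0][i].upper() in "ACGT"
            and all(r[i].upper() == rows[0][i].upper() for r in rows[1:])
        )
        if identical:
            if run_start is None:
                run_start = i
        else:
            if run_start is not None and i - run_start >= min_len:
                blocks.append((run_start, i))
            run_start = None
    return blocks
```

With `min_len=200` this matches the length threshold on the slide. Smaller values are convenient for toy inputs.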
Ultra Clusters
By joining two ultras into a cluster when separated <675Kb we obtained 89 clusters (each named after prominent gene/s)
Non exonic elements tend to congregate in clusters.
Exonic elements are distributed more randomly (tend to overlap an alt-spliced exon).
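The joining rule for clusters can be sketched as single-linkage grouping of sorted coordinates per chromosome. The tuple layout and function name below are assumptions for illustration. The sketch also returns elements that have no neighbour within range as singleton groups, which may differ from how the 89 clusters were counted.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Element = Tuple[str, int, int]  # (chromosome, start, end), half-open coordinates


def cluster_elements(elements: List[Element], max_gap: int = 675_000) -> List[List[Element]]:
    """Group elements on the same chromosome whose separation is < max_gap."""
    by_chrom: Dict[str, List[Tuple[int, int]]] = defaultdict(list)
    for chrom, start, end in elements:
        by_chrom[chrom].append((start, end))

    clusters: List[List[Element]] = []
    for chrom in sorted(by_chrom):
        current: List[Element] = []
        last_end = None
        for start, end in sorted(by_chrom[chrom]):
            # Start a new cluster when the gap to the previous cluster is too large.
            if current and start - last_end >= max_gap:
                clusters.append(current)
                current = []
                last_end = None
            current.append((chrom, start, end))
            last_end = end if last_end is None else max(last_end, end)
        if current:
            clusters.append(current)
    return clusters
```

Filtering the result with `[c for c in clusters if len(c) > 1]` keeps only groups that contain at least two elements.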
Legend: • exonic • non-exonic • possibly exonic
Genomic Distribution
Legend: • exonic • non-exonic • possibly exonic
Associate Ultras with Nearby Genes
481 UCEs identified
• 111 exonic
• 114 possibly exonic
• 256 non-exonic (100 in introns, 156 intergenic)
93 type I genes
225 type II genes
Functional Annotation of Related Genes
[Figure: observed vs. expected No. of Genes (axis 0-70) per annotation category (Homeobox, DNA binding, Transcription reg., RNA rec. motif, RNA splicing, RNA binding), in two panels, Exonic and Non Exonic. p-values shown: 1.3 x 10^-19, 4.8 x 10^-10, 5.6 x 10^-18, 1.2 x 10^-4, 9.1 x 10^-6, 4.4 x 10^-6; 0.39, 0.44, 0.77, 2.5 x 10^-20, 1.8 x 10^-15, 7.8 x 10^-15. Inset: ultra related genes, 86 / 7 / 218.]
Exonic – RNA processing @ transcription regulation
Non Exonic – regulation of transcription at DNA level
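An observed vs. expected comparison of this kind is commonly scored with a one-sided hypergeometric test. The sketch below assumes that test, since the slides do not say which statistic was used. All argument names are hypothetical.

```python
from math import comb


def expected_count(population: int, annotated: int, sample: int) -> float:
    """Expected number of annotated genes in a random sample of the given size."""
    return sample * annotated / population


def hypergeom_enrichment_p(population: int, annotated: int, sample: int, observed: int) -> float:
    """P(X >= observed) when drawing `sample` genes without replacement from
    `population` genes, of which `annotated` carry the annotation.

    Assumption: a hypergeometric null model, used here for illustration only.
    """
    upper = min(annotated, sample)
    total = comb(population, sample)
    # math.comb returns 0 when k > n, so impossible terms vanish on their own.
    tail = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(observed, upper + 1)
    )
    return tail / total
```

Here `population` would be the number of annotated genes in the genome, `sample` the number of ultra related genes, and `observed` how many of those fall in a category such as Homeobox.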
Non Exonic Enhancers
The non exonic ultras are often found in “gene deserts” (140 / 256 are >10Kb from a known gene; 88 are >100Kb away).
The genes flanking these ultras are GO enriched for development (p = 10^-6), particularly early developmental tasks (p = 2-7 x 10^-5), suggesting distal enhancer roles.
Indeed, uc.351 is contained in a proven enhancer of DACH, located 225Kb upstream of it [Nobrega et al., Science 2003].
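Counts such as 140 / 256 come from measuring each element's distance to the nearest known gene. A minimal sketch of that measurement follows, assuming half-open (start, end) intervals on a single chromosome. The function names and the linear scan are illustrative choices.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open (start, end) on one chromosome


def distance_to_nearest_gene(start: int, end: int, genes: List[Interval]) -> int:
    """Distance in bp from an element to the closest gene, 0 if they overlap."""
    if not genes:
        raise ValueError("no genes given for this chromosome")
    best = None
    for g_start, g_end in genes:
        if g_end <= start:        # gene lies entirely upstream of the element
            d = start - g_end
        elif g_start >= end:      # gene lies entirely downstream of the element
            d = g_start - end
        else:                     # intervals overlap
            d = 0
        best = d if best is None else min(best, d)
    return best


def count_farther_than(elements: List[Interval], genes: List[Interval], threshold: int) -> int:
    """Number of elements whose nearest gene is more than `threshold` bp away."""
    return sum(1 for s, e in elements if distance_to_nearest_gene(s, e, genes) > threshold)
```

Calling `count_farther_than(elements, genes, 10_000)` and again with `100_000` would reproduce the two thresholds quoted on the slide, given real coordinates.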
ultra conserved
Zoom to uc.351, 225Kb upstream of DACH
ultra conserved
[Figure panels: e.d 12.5, e.d 12.5]
Validating Regulatory Elements
Reporter Gene, Minimal Promoter, Conserved Element
transgenic: where is the reporter gene expressed?
wild type: where is the wild type gene expressed?
Predictions and Proofs I
Based on public domain genome wide data:
ultraconserved elements
one subset codes protein
larger subset does not
generate testable hypotheses for function from existing knowledge (2004)
[Pennacchio et al., Nature, 2006]
The most conserved elements in the genome
If one concatenates ultras that are 1-2bp away, all 4 longest ultras (1044, 779, 731, 711bp) lie at the 3’ end of POLA, near ARX, on chr X. The longest has 8 subs. and no indels (99.3% id) in chicken.
ARX is a homeobox gene involved in CNS development; defects in the gene are linked to epilepsy, mental retardation, autism and cerebral malformations.
ultra conserved
Exonic uc’s correlate with Alt Splicing
68 / 111 exonic ultras overlap an exon that shows clear evidence of being alternatively spliced.
Of the 59 GO annotated genes containing these elements:
• 24 are RNA binding (p = 8.1 x 10^-18), including HNRPU, HNRPDL, HNRPH1, HNRPK, HNRPM.
• 16 contain the RNA recognition motif (p = 9.1 x 10^-19), including SFRS1, SFRS3, SFRS6, SFRS7, SFRS10, SFRS11.
These ultras often overlap a short exon that is retained only in some tissues.
One explicitly studied example is uc.33 in PTBP2 (length 312bp), which overlaps a 34bp exon included in the mature transcript only in the brain [Rahman et al., Genomics 2004].
Predictions and Proofs II
Based on public domain genome wide data:
ultraconserved elements
one subset codes protein
larger subset does not
generate testable hypotheses for function from existing knowledge (2004)
[Pennacchio et al., Nature, 2006]
post transcriptional modification
Splicing Microarrays
Nonsense-Mediated Decay (NMD)
[Ni et al., Genes & Dev 2007]
Of the 29 exonic ultraconserved elements in RNA-binding protein genes in human, 15 have human and/or mouse EST evidence suggesting the presence of AS-NMD in those regions.
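AS-NMD couples alternative splicing to decay: a splice form that introduces a premature termination codon is degraded rather than translated. A commonly used heuristic flags a transcript when its first in-frame stop codon lies more than about 50 nt upstream of the last exon-exon junction. The sketch below applies that heuristic. It is an illustration under this assumption, not the procedure of Ni et al., and the function names and the exact distance convention are hypothetical.

```python
from typing import List, Optional

STOP_CODONS = {"TAA", "TAG", "TGA"}


def first_stop_position(mrna: str, cds_start: int) -> Optional[int]:
    """Index of the first in-frame stop codon at or after cds_start, or None."""
    for i in range(cds_start, len(mrna) - 2, 3):
        if mrna[i:i + 3].upper() in STOP_CODONS:
            return i
    return None


def is_nmd_candidate(exons: List[str], cds_start: int, min_distance: int = 50) -> bool:
    """Apply the ~50 nt heuristic to a spliced transcript given as exon sequences.

    Assumption: distance is measured from the end of the stop codon to the
    last exon-exon junction. Published variants of the rule use 50-55 nt.
    """
    if len(exons) < 2:
        return False  # no exon-exon junction at all
    mrna = "".join(exons)
    last_junction = len(mrna) - len(exons[-1])
    stop = first_stop_position(mrna, cds_start)
    if stop is None:
        return False
    return last_junction - (stop + 3) > min_distance
```

An alternatively spliced exon that shifts the frame or carries a stop codon would move `first_stop_position` upstream and turn the result to `True`.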
Model for Homeostatic Auto/Cross-regulation
[Ni et al., Genes & Dev 2007]
Similar Results
[Lareau et al., Nature 2007]
[Figure legend: 100% conservation, associated with AS; normal splice; alternative splice; retained intron; normal stop codon; premature termination codon]
Ultraconserved Non-coding RNA
[Calin et al., Cancer Cell, 2007]
miRNA complementarity
About 1/3 of all ultras are expressed.
Some are predicted to provide microRNA targets.
A few are anti-correlated with miRNA expression levels.
A few even act as oncogenes.
Ultras are Under Strong Human Selection
[Figure panels: Ultra DAF, NonSyn DAF]
[Katzman et al., Science, 2007]
Mutational cold spots? NO. Rare (new) mutations are introduced to the population.
Fierce purifying selection? YES. Very few of these get anywhere near fixation.
chimp: A
humans: A A G A
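DAF stands for derived allele frequency: the fraction of sampled human chromosomes that carry the non-ancestral allele at a polymorphic site. A minimal sketch follows, assuming the chimp base is taken as the ancestral state, as the small example on the slide suggests. The function name is hypothetical.

```python
from typing import Sequence


def derived_allele_frequency(ancestral: str, human_alleles: Sequence[str]) -> float:
    """Fraction of sampled human alleles that differ from the ancestral allele.

    Assumption: the outgroup (chimp) base is the ancestral state.
    """
    if not human_alleles:
        raise ValueError("need at least one sampled allele")
    derived = sum(1 for a in human_alleles if a.upper() != ancestral.upper())
    return derived / len(human_alleles)


# The slide's toy example: chimp carries A, four human chromosomes carry A, A, G, A.
print(derived_allele_frequency("A", ["A", "A", "G", "A"]))  # prints 0.25
```

A distribution of such values skewed toward low frequencies, compared with a reference class such as nonsynonymous sites, is the signature of purifying selection described above.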
Relation to Human Disease
[Derti et al., Nature Genetics, 2006]
[Figure: SHH, LMBR1, 1Mb, Limb]
[Lettice et al., HMG 2003, 12: 1725-35]
Link to Disease Remains Elusive
A Vertebrate Innovation?
Only 24 ultras can be partially traced back through direct sequence search to Ciona, C. elegans or Drosophila.
All overlap coding exons from known genes (17 of which show clear evidence of alt-splicing inc. EIF2C1, DDX, BCL11A, EVI1, ZFR, CLK4, HNRPH1, GRIA3).
No intronic element in human was found to be coding in another species, although in some cases EST evidence indicates intron retention, presumably not as CDS.
Interestingly, ribosomal DNA (not part of the draft genomes) also harbors 6 ultraconserved elements in 18S, 28S.
Genomic Distribution of Ultraconserved Elements
Legend: • exonic • non-exonic • possibly exonic
Computationally Driven Biology Simplified
[Diagram: case study, hypothesis, set, generalize, survey, analyze, candidates, experiment; CS / BIO]
What we do understand..
Ultraconserved elements exist.
They are maintained via strong on-going selection.
It is a heterogeneous bunch:
• Some mediate splicing
• Some regulate gene expression
• Some express ncRNAs
(categories are not necessarily mutually exclusive)
Knockouts of four regulatory ultras did not lead to severe phenotypes (similar protein cases: Pbx2, Nkx6.2, Gli1)
What we don’t understand
Their functional density:
How did they come to be?
What is the selective advantage that lets them persist?
Broad Guess
It’s about 3-D structure.
Observation: rDNA (18S, 28S) have ultraconserved stretches, multiple constraints in a complex 3-D structure, the Ribosome.
• ncRNA ultras: structure confers function
• Splicing related ultras: the Spliceosome
• Cis-reg ultras: TSS 3-D proximity, chromatin and/or packed TFBS (Transcription factories?)
TSS