MW 12:50-2:05pm in Beckman B302 Profs: Serafim Batzoglou & Gill Bejerano
description
Transcript of MW 12:50-2:05pm in Beckman B302 Profs: Serafim Batzoglou & Gill Bejerano
![Page 1: MW 12:50-2:05pm in Beckman B302 Profs: Serafim Batzoglou & Gill Bejerano](https://reader036.fdocuments.net/reader036/viewer/2022062305/5681637e550346895dd45e8e/html5/thumbnails/1.jpg)
http://cs273a.stanford.edu [BejeranoFall15/16] 1
MW 1:30-2:50pm in Clark S361* (behind Peet’s)Profs: Serafim Batzoglou & Gill BejeranoCAs: Karthik Jagadeesh & Johannes Birgmeier* Mostly: track on website/piazza
CS273A
Lecture 3: Non Coding Genes
![Page 2: MW 12:50-2:05pm in Beckman B302 Profs: Serafim Batzoglou & Gill Bejerano](https://reader036.fdocuments.net/reader036/viewer/2022062305/5681637e550346895dd45e8e/html5/thumbnails/2.jpg)
• http://cs273a.stanford.edu/– Lecture slides, problem sets, etc.
• Course communications via Piazza– Auditors please sign up too
• Second Tutorial this Fri 10/2. UCSC genome browser. We recommend bringing your laptop.
•We are starting to take attendance today.
http://cs273a.stanford.edu [BejeranoFall15/16] 2
Announcements
![Page 3: MW 12:50-2:05pm in Beckman B302 Profs: Serafim Batzoglou & Gill Bejerano](https://reader036.fdocuments.net/reader036/viewer/2022062305/5681637e550346895dd45e8e/html5/thumbnails/3.jpg)
http://cs273a.stanford.edu [BejeranoFall15/16] 3
“non coding” RNAs (ncRNA)
![Page 4: MW 12:50-2:05pm in Beckman B302 Profs: Serafim Batzoglou & Gill Bejerano](https://reader036.fdocuments.net/reader036/viewer/2022062305/5681637e550346895dd45e8e/html5/thumbnails/4.jpg)
4
Central Dogma of Biology:
![Page 5: MW 12:50-2:05pm in Beckman B302 Profs: Serafim Batzoglou & Gill Bejerano](https://reader036.fdocuments.net/reader036/viewer/2022062305/5681637e550346895dd45e8e/html5/thumbnails/5.jpg)
5
Active forms of “non coding” RNA
reverse transcription(of eg retrogenes)
long non-coding RNA
microRNA
rRNA, snRNA, snoRNA
![Page 6: MW 12:50-2:05pm in Beckman B302 Profs: Serafim Batzoglou & Gill Bejerano](https://reader036.fdocuments.net/reader036/viewer/2022062305/5681637e550346895dd45e8e/html5/thumbnails/6.jpg)
6
What is ncRNA?• Non-coding RNA (ncRNA) is an RNA that functions without being
translated to a protein.• Known roles for ncRNAs:
– RNA catalyzes excision/ligation in introns.– RNA catalyzes the maturation of tRNA.– RNA catalyzes peptide bond formation.– RNA is a required subunit in telomerase.– RNA plays roles in immunity and development (RNAi).– RNA plays a role in dosage compensation.– RNA plays a role in carbon storage.– RNA is a major subunit in the SRP, which is important in protein trafficking.– RNA guides RNA modification.
– RNA can do so many different functions, it is thought in the beginning there was an RNA World, where RNA was both the information carrier and active molecule.
![Page 7: MW 12:50-2:05pm in Beckman B302 Profs: Serafim Batzoglou & Gill Bejerano](https://reader036.fdocuments.net/reader036/viewer/2022062305/5681637e550346895dd45e8e/html5/thumbnails/7.jpg)
http://cs273a.stanford.edu [BejeranoFall15/16] 7
“non coding” RNAs (ncRNA) subtype:
Small structural RNAs (ssRNA)
![Page 8: MW 12:50-2:05pm in Beckman B302 Profs: Serafim Batzoglou & Gill Bejerano](https://reader036.fdocuments.net/reader036/viewer/2022062305/5681637e550346895dd45e8e/html5/thumbnails/8.jpg)
8
AAUUGCGGGAAAGGGGUCAACAGCCGUUCAGUACCAAGUCUCAGGGGAAACUUUGAGAUGGCCUUGCAAAGGGUAUGGUAAUAAGCUGACGGACAUGGUCCUAACCACGCAGCCAAGUCCUAAGUCAACAGAUCUUCUGUUGAUAUGGAUGCAGUUCA
ssRNA Folds into Secondary and 3D Structures
P 6b
P 6a
P 6
P 4
P 5P 5a
P 5b
P 5c
120
140
160
180
200
220
240
260
AAU
UGCGGG
AAA
GGGGUCA
ACAGCCG UUCAGU
ACCA
AGUCUCAGGGGAAA
CUUUGAGAU
GGCCUUGCA A A G G G U A U
GGUAA
UA AGC
UGACGGACA
UGGUCC
UAA
CCA CGCA
GC
CAA
GUCC
UAA G
UCAACAG
AU C U
UCUGUUGAU
AU
GGAU
GC
AGU
UC A
Cate, et al. (Cech & Doudna).(1996) Science 273:1678.
Waring & Davies. (1984) Gene 28: 277.
Computational Challenge: predict 2D & 3D structure from sequence (1D).
![Page 9: MW 12:50-2:05pm in Beckman B302 Profs: Serafim Batzoglou & Gill Bejerano](https://reader036.fdocuments.net/reader036/viewer/2022062305/5681637e550346895dd45e8e/html5/thumbnails/9.jpg)
For example, tRNA Activity
Complementation is a “brilliant” way for the Genome to “get things done”.
Replication, Transcription, Translation, …Compliment(Compliment(Identity)) = Identity
![Page 10: MW 12:50-2:05pm in Beckman B302 Profs: Serafim Batzoglou & Gill Bejerano](https://reader036.fdocuments.net/reader036/viewer/2022062305/5681637e550346895dd45e8e/html5/thumbnails/10.jpg)
tRNA Structure
![Page 11: MW 12:50-2:05pm in Beckman B302 Profs: Serafim Batzoglou & Gill Bejerano](https://reader036.fdocuments.net/reader036/viewer/2022062305/5681637e550346895dd45e8e/html5/thumbnails/11.jpg)
ssRNA structure rules• Canonical basepairs:
– Watson-Crick basepairs:• G ≡ C• A = U
– Wobble basepair:• G − U
• Stacks: continuous nested basepairs. (energetically favorable)
• Non-basepaired loops:– Hairpin loop– Bulge– Internal loop– Multiloop
• Pseudo-knots
![Page 12: MW 12:50-2:05pm in Beckman B302 Profs: Serafim Batzoglou & Gill Bejerano](https://reader036.fdocuments.net/reader036/viewer/2022062305/5681637e550346895dd45e8e/html5/thumbnails/12.jpg)
Ab initio RNA structure prediction: lots of Dynamic Programming
• Objective: Maximizing # of base pairings (Nussinov et al, 1978)
simple model:(i, j) = 1 iff GC|AU|GUfancier model:GC > AU > GU
![Page 13: MW 12:50-2:05pm in Beckman B302 Profs: Serafim Batzoglou & Gill Bejerano](https://reader036.fdocuments.net/reader036/viewer/2022062305/5681637e550346895dd45e8e/html5/thumbnails/13.jpg)
Dynamic programming Compute S(i,j) recursively (dynamic
programming)– Compares a sequence against itself in a dynamic
programming matrix
Three steps
![Page 14: MW 12:50-2:05pm in Beckman B302 Profs: Serafim Batzoglou & Gill Bejerano](https://reader036.fdocuments.net/reader036/viewer/2022062305/5681637e550346895dd45e8e/html5/thumbnails/14.jpg)
1. Initialization
Example:
GGGAAAUCC
G G G A A A U C CG 0G 0 0G 0 0A 0 0A 0 0A 0 0U 0 0C 0 0C 0 0
the main diagonal
the diagonal belowL: the length of input sequence
![Page 15: MW 12:50-2:05pm in Beckman B302 Profs: Serafim Batzoglou & Gill Bejerano](https://reader036.fdocuments.net/reader036/viewer/2022062305/5681637e550346895dd45e8e/html5/thumbnails/15.jpg)
2. RecursionG G G A A A U C C
G 0 0 0 0G 0 0 0 0 0G 0 0 0 0 0A 0 0 0 0 ?A 0 0 0 1A 0 0 1 1U 0 0 0 0C 0 0 0C 0 0
Fill up the table (DP matrix) -- diagonal by diagonal
j
i
![Page 16: MW 12:50-2:05pm in Beckman B302 Profs: Serafim Batzoglou & Gill Bejerano](https://reader036.fdocuments.net/reader036/viewer/2022062305/5681637e550346895dd45e8e/html5/thumbnails/16.jpg)
3. Traceback
G G G A A A U C CG 0 0 0 0 0 0 1 2 3G 0 0 0 0 0 0 1 2 3G 0 0 0 0 0 1 2 2A 0 0 0 0 1 1 1A 0 0 0 1 1 1A 0 0 1 1 1U 0 0 0 0C 0 0 0C 0 0
The structure is:
What are the other “optimal” structures?
![Page 17: MW 12:50-2:05pm in Beckman B302 Profs: Serafim Batzoglou & Gill Bejerano](https://reader036.fdocuments.net/reader036/viewer/2022062305/5681637e550346895dd45e8e/html5/thumbnails/17.jpg)
Pseudoknots drastically increase computational complexity
![Page 18: MW 12:50-2:05pm in Beckman B302 Profs: Serafim Batzoglou & Gill Bejerano](https://reader036.fdocuments.net/reader036/viewer/2022062305/5681637e550346895dd45e8e/html5/thumbnails/18.jpg)
http://cs273a.stanford.edu [BejeranoFall15/16] 18
Objective: Minimize Secondary StructureFree Energy at 37 OC:
C G U U U G G GUU
CACAAACG
-2 .0
-2 .1
-0 .9
-0 .9
-1 .8
-1 .6
+ 5 .0
G helix = GC GG C
+ G
G UC A
+ 2G
U UA A
+ G
U GA C
=
-2.0 kcal/mol - 2.1 kcal/mol + 2x(-0.9) kcal/mol - 1.8 kcal/mol = -7.7 kcal/mol
Ghairpin loop = G
initiation (6 nucleotides) + Gmismatch
G GC A
=
5.0 kcal/mol - 1.6 kcal/mol = 3.4 kcal/mol
Gtotal = G
hairpin + Ghelix = 3.4 kcal/mol - 7.7 kcal/mol = -4.3 kcal/mol
Mathews, Disney, Childs, Schroeder, Zuker, & Turner. 2004. PNAS 101: 7287.
Instead of (i, j), measure and sum energies:
![Page 19: MW 12:50-2:05pm in Beckman B302 Profs: Serafim Batzoglou & Gill Bejerano](https://reader036.fdocuments.net/reader036/viewer/2022062305/5681637e550346895dd45e8e/html5/thumbnails/19.jpg)
Zuker’s algorithm MFOLD: computing loop dependent energies
![Page 20: MW 12:50-2:05pm in Beckman B302 Profs: Serafim Batzoglou & Gill Bejerano](https://reader036.fdocuments.net/reader036/viewer/2022062305/5681637e550346895dd45e8e/html5/thumbnails/20.jpg)
Bafna 1
RNA structure• Base-pairing defines a secondary
structure. The base-pairing is usually non-crossing.
• In this example the pseudo-knot makes for an exception.
![Page 21: MW 12:50-2:05pm in Beckman B302 Profs: Serafim Batzoglou & Gill Bejerano](https://reader036.fdocuments.net/reader036/viewer/2022062305/5681637e550346895dd45e8e/html5/thumbnails/21.jpg)
S aSu
S cSg
S gSc
S uSa
S a
S c
S g
S u
S SS
1. A CFG
S aSu
acSgu
accSggu
accuSaggu
accuSSaggu
accugScSaggu
accuggSccSaggu
accuggaccSaggu
accuggacccSgaggu
accuggacccuSagaggu
accuggacccuuagaggu
2. A derivation of “accuggacccuuagaggu”3. Corresponding structure
Stochastic context-free grammar
![Page 22: MW 12:50-2:05pm in Beckman B302 Profs: Serafim Batzoglou & Gill Bejerano](https://reader036.fdocuments.net/reader036/viewer/2022062305/5681637e550346895dd45e8e/html5/thumbnails/22.jpg)
ssRNA transcription• ssRNAs like tRNAs are usually encoded by short
“non coding” genes, that transcribe independently.• Found in both the UCSC “known genes” track, and
as a subtrack of the RepeatMasker track
http://cs273a.stanford.edu [BejeranoFall15/16] 22
![Page 23: MW 12:50-2:05pm in Beckman B302 Profs: Serafim Batzoglou & Gill Bejerano](https://reader036.fdocuments.net/reader036/viewer/2022062305/5681637e550346895dd45e8e/html5/thumbnails/23.jpg)
http://cs273a.stanford.edu [BejeranoFall15/16] 23
“non coding” RNAs (ncRNA) subtype:
microRNAs (miRNA/miR)
![Page 24: MW 12:50-2:05pm in Beckman B302 Profs: Serafim Batzoglou & Gill Bejerano](https://reader036.fdocuments.net/reader036/viewer/2022062305/5681637e550346895dd45e8e/html5/thumbnails/24.jpg)
http://cs273a.stanford.edu [BejeranoFall15/16] 24
MicroRNA (miR)
mRNA
~70nt ~22nt miR match to target mRNAis quite loose.
a single miR can regulate the expression of hundreds of genes.
![Page 25: MW 12:50-2:05pm in Beckman B302 Profs: Serafim Batzoglou & Gill Bejerano](https://reader036.fdocuments.net/reader036/viewer/2022062305/5681637e550346895dd45e8e/html5/thumbnails/25.jpg)
http://cs273a.stanford.edu [BejeranoFall15/16] 25
MicroRNA Transcription
mRNA
![Page 26: MW 12:50-2:05pm in Beckman B302 Profs: Serafim Batzoglou & Gill Bejerano](https://reader036.fdocuments.net/reader036/viewer/2022062305/5681637e550346895dd45e8e/html5/thumbnails/26.jpg)
http://cs273a.stanford.edu [BejeranoFall15/16] 26
MicroRNA Transcription
mRNA
![Page 27: MW 12:50-2:05pm in Beckman B302 Profs: Serafim Batzoglou & Gill Bejerano](https://reader036.fdocuments.net/reader036/viewer/2022062305/5681637e550346895dd45e8e/html5/thumbnails/27.jpg)
http://cs273a.stanford.edu [BejeranoFall15/16] 27
MicroRNA (miR)
mRNA
~70nt ~22nt miR match to target mRNAis quite loose.
a single miR can regulate the expression of hundreds of genes.
Computational challenges:Predict miRs.Predict miR targets.
![Page 28: MW 12:50-2:05pm in Beckman B302 Profs: Serafim Batzoglou & Gill Bejerano](https://reader036.fdocuments.net/reader036/viewer/2022062305/5681637e550346895dd45e8e/html5/thumbnails/28.jpg)
http://cs273a.stanford.edu [BejeranoFall15/16] 28
MicroRNA Therapeutics
mRNA
~70nt ~22nt miR match to target mRNAis quite loose.
a single miR can regulate the expression of hundreds of genes.
Idea: bolster/inhibit miR production tobroadly modulate protein productionHope: “right” the good guys and/or “wrong” the bad guysChallenge: and not vice versa.
![Page 29: MW 12:50-2:05pm in Beckman B302 Profs: Serafim Batzoglou & Gill Bejerano](https://reader036.fdocuments.net/reader036/viewer/2022062305/5681637e550346895dd45e8e/html5/thumbnails/29.jpg)
http://cs273a.stanford.edu [BejeranoFall15/16] 29
Other Non Coding Transcripts
![Page 30: MW 12:50-2:05pm in Beckman B302 Profs: Serafim Batzoglou & Gill Bejerano](https://reader036.fdocuments.net/reader036/viewer/2022062305/5681637e550346895dd45e8e/html5/thumbnails/30.jpg)
lncRNAs (long non coding RNAs)
http://cs273a.stanford.edu [BejeranoFall15/16] 30
Don’t seem to fold into clear structures (or only a sub-region does).Diverse roles only now starting to be understood.Example: bind splice site, cause intron retention.
Challenges: 1) detect and 2) predict function computationally
![Page 31: MW 12:50-2:05pm in Beckman B302 Profs: Serafim Batzoglou & Gill Bejerano](https://reader036.fdocuments.net/reader036/viewer/2022062305/5681637e550346895dd45e8e/html5/thumbnails/31.jpg)
X chromosome inactivation in mammals
X X X Y
X
Dosage compensation
![Page 32: MW 12:50-2:05pm in Beckman B302 Profs: Serafim Batzoglou & Gill Bejerano](https://reader036.fdocuments.net/reader036/viewer/2022062305/5681637e550346895dd45e8e/html5/thumbnails/32.jpg)
Xist – X inactive-specific transcript
Avner and Heard, Nat. Rev. Genetics 2001 2(1):59-67
![Page 33: MW 12:50-2:05pm in Beckman B302 Profs: Serafim Batzoglou & Gill Bejerano](https://reader036.fdocuments.net/reader036/viewer/2022062305/5681637e550346895dd45e8e/html5/thumbnails/33.jpg)
System output measurementsWe can measure non/coding gene expression!(just remember – it is context dependent)1. First generation mRNA (cDNA) and EST sequencing:
http://cs273a.stanford.edu [BejeranoFall15/16] 33
In UCSC Browser:
![Page 34: MW 12:50-2:05pm in Beckman B302 Profs: Serafim Batzoglou & Gill Bejerano](https://reader036.fdocuments.net/reader036/viewer/2022062305/5681637e550346895dd45e8e/html5/thumbnails/34.jpg)
2. Gene Expression Microarrays (“chips”)
http://cs273a.stanford.edu [BejeranoFall15/16] 34
![Page 35: MW 12:50-2:05pm in Beckman B302 Profs: Serafim Batzoglou & Gill Bejerano](https://reader036.fdocuments.net/reader036/viewer/2022062305/5681637e550346895dd45e8e/html5/thumbnails/35.jpg)
3. RNA-seq“Next” (2nd) generation sequencing.
http://cs273a.stanford.edu [BejeranoFall15/16] 35
![Page 36: MW 12:50-2:05pm in Beckman B302 Profs: Serafim Batzoglou & Gill Bejerano](https://reader036.fdocuments.net/reader036/viewer/2022062305/5681637e550346895dd45e8e/html5/thumbnails/36.jpg)
http://cs273a.stanford.edu [BejeranoFall15/16] 36
Transcripts, transcripts everywhere
Human Genome
Transcribed* (Tx)
Tx from both strands*
* True size of set unknown
![Page 37: MW 12:50-2:05pm in Beckman B302 Profs: Serafim Batzoglou & Gill Bejerano](https://reader036.fdocuments.net/reader036/viewer/2022062305/5681637e550346895dd45e8e/html5/thumbnails/37.jpg)
Or are they?
http://cs273a.stanford.edu [BejeranoFall15/16] 37
![Page 38: MW 12:50-2:05pm in Beckman B302 Profs: Serafim Batzoglou & Gill Bejerano](https://reader036.fdocuments.net/reader036/viewer/2022062305/5681637e550346895dd45e8e/html5/thumbnails/38.jpg)
http://cs273a.stanford.edu [BejeranoFall15/16] 38
The million dollar question
Human Genome
Transcribed* (Tx)
Tx from both strands*
* True size of set unknown
Leaky tx?
Functional?
![Page 39: MW 12:50-2:05pm in Beckman B302 Profs: Serafim Batzoglou & Gill Bejerano](https://reader036.fdocuments.net/reader036/viewer/2022062305/5681637e550346895dd45e8e/html5/thumbnails/39.jpg)
http://cs273a.stanford.edu [BejeranoFall15/16] 39
Gene Finding II: technology dependenceChallenge:“Find the genes, the whole genes, and nothing but the genes”
We started out trying to predict genes directly from the genome.When you measure gene expression, the challenge changes:
Now you want to build gene models from your observations.These are both technology dependent challenges.The hybrid: what we measure is a tiny fraction of the space-time state
space for cells in our body. We want to generalize from measured states and improve our predictions for the full compendium of states.