Where Will They Strike Next? microRNA targeting tactics in the war on gene expression Jeff Reid...
Where Will They Strike Next?
microRNA targeting tactics in the war on gene expression
Jeff Reid
Miller “Lab”
Baylor College of Medicine
Outline
• Introduction to miRNAs
• The “ask Bartel” model for targeting
• Our proposed model
• Discuss predictions made by our model
– Not all positions on the miRNA are equal
– A given miRNA’s targets share function
• Have a quantitative model that does not suffer from the arbitrariness of ask Bartel
Plant microRNAs
• This talk is about plant miRNAs
– Animal miRNAs are different and more complicated
– If you want to know more about them, ask Tuan Tran!
• What is a microRNA (miRNA)?
• ~21 nt single-stranded non-coding RNAs
• Processed from stem-loop precursors
• Bind to mRNA in the cytoplasm
• Regulate genes
– Often relevant to development
microRNA biogenesis (conventional wisdom)
1. miRNA gene is transcribed, producing the primary transcript (pri-miRNA)
2. pri-miRNA is processed by Dicer…
3. …producing the miRNA duplex
4. Duplex moves out of the nucleus
5. Helicase activity unzips the duplex
6. Mature miRNA forms the RNA-induced silencing complex (RISC)
7. RISC recognizes a target site
8. Targeted mRNA is regulated (mRNA cleavage or translational repression)
Figure from Bartel, D.P. (2004). Cell 116, 281-297.
Target “Acquisition”
• How does the RISC identify target sites?
• Based solely on mature miRNA sequence
– Consistent with all known examples
– “Just” string manipulation
– With that in mind, consider a simple model…
• Targets have a small “mismatch score”, M
– Count non-Watson-Crick (non-WC) pairs in the miRNA/target duplex
– Score is independent of position
[Figure: RISC-bound miRNA paired with a target site on an mRNA, strand ends labeled 5’ and 3’. The example duplex has M = 2.]
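The mismatch score M can be sketched in a few lines of Python. This is an illustration only, not the authors' code: the function name `mismatch_score`, the convention that both strands are written 5'→3', and the choice to count G:U wobble pairs as mismatches (a literal reading of "non-WC") are all assumptions.

```python
# Watson-Crick pairs for RNA. Anything else, including G:U wobble,
# is counted as a mismatch here (an assumption, see the text above).
WC_PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}

def mismatch_score(mirna: str, site: str) -> int:
    """Count non-Watson-Crick pairs in an ungapped miRNA/target duplex.

    Both sequences are written 5'->3'. The strands pair antiparallel,
    so the miRNA is compared against the site read from its 3' end.
    """
    if len(mirna) != len(site):
        raise ValueError("miRNA and target site must have the same length")
    mirna, site = mirna.upper(), site.upper()
    return sum((a, b) not in WC_PAIRS for a, b in zip(mirna, reversed(site)))
```

Every position contributes equally to this score, which is exactly the position-independence of the simple model.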
Complementarity Model*
• Look for 21-mers (mRNA sequence) with M < 4
– Find targets…
– mir172a1 [AP2, APETALA2 transcription factor]: At5g60120 (1), At4g36920 (2), At2g28550 (3), At5g67180 (3), At5g12900 (3)
– …turns out most targets of a given miRNA are in genes which share a common function
• There are some ask Bartel elements to the model
– M = 4 targets sharing function included case-by-case
– Single bulges are sometimes allowed (mir162, mir163)
– Model specificity is problematic…
*Rhoades et al. (2002). Cell 110, 513-520.
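The search step of the complementarity model amounts to sliding a window along the mRNA and keeping windows whose mismatch score is below the cutoff. The sketch below makes several assumptions: the names `find_candidate_sites` and `max_m` are hypothetical, sequences are written 5'→3', G:U wobble counts as a mismatch, and bulges and the case-by-case M = 4 exceptions are not handled.

```python
WC_PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}

def mismatch_score(mirna, site):
    """Number of non-Watson-Crick pairs in the antiparallel duplex."""
    return sum((a, b) not in WC_PAIRS for a, b in zip(mirna, reversed(site)))

def find_candidate_sites(mirna, mrna, max_m=3):
    """Return (start, site, M) for every mRNA window with M <= max_m.

    max_m=3 corresponds to the M < 4 cutoff on the slide.
    """
    n = len(mirna)
    hits = []
    for start in range(len(mrna) - n + 1):
        site = mrna[start:start + n]
        m = mismatch_score(mirna, site)
        if m <= max_m:
            hits.append((start, site, m))
    return hits
```

For a real 21-mer miRNA the same call would be made once per transcript, and the hits would then be grouped by the function of the gene they fall in.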
Selectivity and Specificity
• Selectivity (false negatives)
– Bartel’s model finds “everything” for M < 5
• Putative targets from this model (most confirmed by experiment) define the target population
• Specificity (false positives)
– Bartel’s model is problematic
• M < 5 includes many false positives
• M < 4 and qualitative ask Bartel elements are necessary for model specificity
• Our goal is to develop a quantitative model
Position Dependent Model
• Ask Bartel has been spectacularly successful
• Build on existing model & make it quantitative
• No a priori justification of the position-independence assumed by the ask Bartel model
• Extend to a position-dependent mismatch model
– Assign a mismatch at position i the weight β_i
• For the ask Bartel model, β_i = 1
• Quantify target “strength” with binding probability
– p_t is the probability of finding the miRNA bound to target site t in the mRNA population
• Now the “mismatch score” is position-dependent
• Boltzmann factor gives the binding probability
• Quantitative model built, but how to find the β_i?
Boltzmann factors
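m = miRNA* sequence
t = target site sequence
β = mismatch parameters

$$E(m, t, \beta) = \sum_{i=1}^{L} \beta_i \bigl(1 - \delta_{m_i, t_i}\bigr)$$

$$Z(m, \beta) = \sum_{g} e^{-E(m, g, \beta)}$$

$$p_t \equiv p(m, t, \beta) = \frac{e^{-E(m, t, \beta)}}{Z(m, \beta)}$$

(the sum over g runs over the candidate sites)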
[Figure: RISC-bound miRNA paired with an mRNA target site, strand ends labeled 5’ and 3’, with duplex positions numbered 1 to 5.]
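A minimal Python sketch of the Boltzmann-factor model follows. It assumes, as the "miRNA*" notation suggests, that m is given as the strand complementary to the miRNA, so that a perfect target satisfies m_i == t_i at every position. The function names and the use of a plain list for the per-position mismatch weights are illustrative choices, not taken from the talk.

```python
import math

def energy(m_star, site, beta):
    """Sum of the weights beta_i over positions where the site differs from m_star."""
    return sum(b for a, c, b in zip(m_star, site, beta) if a != c)

def binding_probabilities(m_star, sites, beta):
    """Boltzmann probability of binding each candidate site.

    Z is the sum of exp(-E) over all candidate sites, so the
    returned probabilities sum to 1.
    """
    energies = [energy(m_star, g, beta) for g in sites]
    z = sum(math.exp(-e) for e in energies)
    return [math.exp(-e) / z for e in energies]
```

Setting every weight to 1 recovers the position-independent score: the energy is then just the mismatch count M. Setting a weight to 0 makes a mismatch at that position free.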
Model Comparison
• Follow DNA binding protein example*
– Consider a thought experiment…
• Mix many copies of the genome and N copies of the protein and count the number of examples, n_t, of protein bound to site t
– f_t = n_t / N
• If the model works, f_t and p_t must agree!
• Determine β_i by looking for this agreement
– Maximize the probability that the data (f_t) could have come from the model (p_t)…
*Brown, C.T., and Callan, C.G. (2004). Proc. Natl. Acad. Sci. 101, 2404.
Model Testing
• Probability of data arising from our position-dependent mismatch model

$$P(f, m, \beta) = \prod_{g} p(m, g, \beta)^{f_g}$$

• Obtain best match of model to data by maximizing the log probability

$$L(f, m, \beta) = \ln P(f, m, \beta) = -\sum_{g} f_g \bigl\{ E(m, g, \beta) + \ln[Z(m, \beta)] \bigr\}$$

• Yields the set of parameters β_i which maximizes the probability of getting the data from our model
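The maximization can be illustrated with plain gradient ascent. This is a sketch under stated assumptions, not the optimizer used in the talk (which the slides do not specify): each candidate site is reduced to a 0/1 mismatch vector, the log probability is L = Σ_g f_g ln p_g, and its gradient with respect to the weight at position i is Σ_g (F p_g − f_g) x_gi, where F = Σ_g f_g. All function names, the step size and the step count are hypothetical.

```python
import math

def mismatch_vector(m_star, site):
    """x_i = 1 where the site differs from the miRNA* sequence, else 0."""
    return [0.0 if a == b else 1.0 for a, b in zip(m_star, site)]

def site_probabilities(beta, mism):
    """Boltzmann probabilities p_g for every candidate site."""
    weights = [math.exp(-sum(b * x for b, x in zip(beta, row))) for row in mism]
    z = sum(weights)
    return [w / z for w in weights]

def log_likelihood(beta, mism, f):
    """L = sum_g f_g * ln p_g."""
    p = site_probabilities(beta, mism)
    return sum(fg * math.log(pg) for fg, pg in zip(f, p))

def fit_beta(mism, f, steps=5000, rate=0.1):
    """Gradient ascent on L, starting from the position-independent model."""
    n_pos = len(mism[0])
    beta = [1.0] * n_pos
    total_f = sum(f)
    for _ in range(steps):
        p = site_probabilities(beta, mism)
        # dL/dbeta_i = sum_g (F * p_g - f_g) * x_gi
        grad = [sum((total_f * pg - fg) * row[i]
                    for pg, fg, row in zip(p, f, mism))
                for i in range(n_pos)]
        beta = [b + rate * g for b, g in zip(beta, grad)]
    return beta
```

Because L is concave in the weights, a fixed small step is enough for a toy problem; a real fit over many 21-mer sites would more likely use a library optimizer.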
Optimization Cartoon
[Figure, shown as three animation frames: a panel of parameter controls (1 to 5) is adjusted. The inputs are the miRNA sequence (example: UAGCA) and the measured fractions bound f1, f2, f3, f4, f5, ..., f24. The outputs are the binding probabilities, compared against the data (for example f24 against p24).]
• Maximize L to get β_i
Review
• Application of this procedure to miRNAs
• Optimize to get best agreement between
– position-dependent mismatch model: p_g
– ask Bartel complementarity model: f_g
• Equal binding probability for each training target
• Minimal binding to everything else (background)
– A contribution we made to the method
– Necessary to avoid overfitting
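One possible way to encode this training setup is to give every training target the same fraction f_g and to spread a small remaining share evenly over the background sites. The slides do not say how the background level was chosen, so `background_share` and the function name are assumptions made purely for illustration.

```python
def training_fractions(n_sites, training_idx, background_share=0.01):
    """Build f_g: equal weight per training target, small equal weight elsewhere.

    background_share is the total fraction assigned to all background
    sites together (a hypothetical knob, not a value from the talk).
    """
    training = set(training_idx)
    n_train = len(training)
    n_back = n_sites - n_train
    if n_train == 0:
        raise ValueError("need at least one training target")
    if n_back == 0:
        background_share = 0.0
    f_train = (1.0 - background_share) / n_train
    f_back = background_share / n_back if n_back else 0.0
    return [f_train if g in training else f_back for g in range(n_sites)]
```

The resulting list sums to 1 and can be used as the data term f_g when maximizing the log probability.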
Multi-miRNA Optimization
• Given the amount of data we have, this method would fail on DNA binding proteins
• All miRNAs share the same machinery for target recognition (all form the RISC)
– DNA binding protein recognition depends on each specific protein
• Solution to our problem
– Simultaneously optimize for several miRNAs
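Simultaneous optimization can be read as summing the per-miRNA log probabilities while every miRNA shares one set of position weights. The sketch below assumes that reading. The names are hypothetical, and each dataset is a pair of (0/1 mismatch vectors for the candidate sites, fractions f_g).

```python
import math

def log_likelihood(beta, mism, f):
    """Per-miRNA log probability: -sum_g f_g * (E_g + ln Z)."""
    energies = [sum(b * x for b, x in zip(beta, row)) for row in mism]
    log_z = math.log(sum(math.exp(-e) for e in energies))
    return -sum(fg * (e + log_z) for fg, e in zip(f, energies))

def joint_log_likelihood(beta, datasets):
    """Sum over miRNAs; the same beta is used for every one of them."""
    return sum(log_likelihood(beta, mism, f) for mism, f in datasets)
```

Because the weights are shared, training targets from every miRNA constrain the same few parameters, which is what makes a small number of known targets per miRNA usable.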
Results - Parameters
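• Multi-miRNA optimization of nine Arabidopsis miRNAs
– 157b, 159b, 160b, 164a, 165b, 167b, 168a, 171, 172a1
– A set of functionally diverse (21-mer) miRNAs
[Figure: fitted mismatch parameter β(i) plotted against miRNA position i, with the 3’ and 5’ ends marked on the position axis.]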
Position 14
• Mismatch at position 14
– Has no effect on a target’s binding probability!
• Surprising and exciting because…
• …this position is known to be special
– mir162a target
• At1g01040 DEAD/DEAH box helicase
– Has a bulge at position 14
• This analysis did not include mir162a!
• A provocative result…
[Figure: mir162a paired with its target, strand ends labeled 5’ and 3’, positions 1, 14, 15 and 21 marked.]
Results - Targets
• Training targets should have low energy
– Found by ask Bartel model
– Reside in genes which share majority function
• Targets in the background have high energy
– Background targets with low energy are interesting
• We are particularly interested in all the majority function targets for a given miRNA
– Especially those which are not training targets
• Look at distributions of target energies
– For each value of M
[Figure: histograms N(E) of target energies for mir165b (HD-Zip) and mir159b (MYB), two panels each. Legend: training targets (majority function); majority function targets that are not training targets!]
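The energy distributions for each value of M can be assembled by grouping sites by their plain mismatch count and recording their position-weighted energy. The sketch assumes each site is already reduced to a 0/1 mismatch vector. The function name and the example weights in the usage note are hypothetical and are not results from the talk.

```python
from collections import defaultdict

def energies_by_mismatch_count(mism_rows, beta):
    """Map each mismatch count M to the list of energies E of sites with that M."""
    groups = defaultdict(list)
    for row in mism_rows:
        m = int(sum(row))                            # position-independent score M
        e = sum(b * x for b, x in zip(beta, row))    # position-weighted energy E
        groups[m].append(e)
    return dict(groups)
```

For example, with illustrative weights `[2.0, 0.5, 1.0]`, two sites that both have M = 1 can sit at quite different energies depending on which position carries the mismatch. That spread within a fixed M is what a histogram N(E) makes visible.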
Conclusions
• Refined the qualitative complementarity model
– A quantitative model which is much less arbitrary
• Whatever we get, we get – not “ask Miller”
– Majority function targets group together at low energy
– Bartel finds most targets, our model finds all targets
• Appropriate experiments could falsify our model
– How important is position 14?
– Look at some specific ask Bartel targets
• Advanced technology of optimization
– Resolution of the overfitting problem
– Simultaneous optimization
Encoding of Networks
• Networks
– miRNA families
• A single target mRNA can be regulated by different miRNAs
• And a single miRNA can regulate many different mRNAs
– Apparently an overlapping and probably redundant regulatory network
• Encoding
– All this regulation encoded in mere text!
– How is this encoded in the sequence?
– Why is it encoded in this way?
Acknowledgements
• Miller Lab Posse
– Jon Miller
– Tuan Tran
– Will Salerno
– Gerald Lim
• Curtis Callan (Princeton)
• Keck Center for Computational and Structural Biology
• BCM Biochemistry Department