Setting Up a Replica Exchange Approach to Motif Discovery in DNA
![Page 1: Setting Up a Replica Exchange Approach to Motif Discovery in DNA](https://reader036.fdocuments.net/reader036/viewer/2022070410/56814656550346895db36f57/html5/thumbnails/1.jpg)
Setting Up a Replica Exchange Approach to Motif Discovery in DNA
Jeffrey Goett
Advisor: Professor Sengupta
![Page 2: Setting Up a Replica Exchange Approach to Motif Discovery in DNA](https://reader036.fdocuments.net/reader036/viewer/2022070410/56814656550346895db36f57/html5/thumbnails/2.jpg)
Protein Synthesis from DNA
[Diagram labels: transcription; translation to proteins; regulation; RNA polymerase; binding proteins; binding sites; gene]
![Page 3: Setting Up a Replica Exchange Approach to Motif Discovery in DNA](https://reader036.fdocuments.net/reader036/viewer/2022070410/56814656550346895db36f57/html5/thumbnails/3.jpg)
Binding Sites
Sequence A:
Binding site (binding protein “A”):
A - A - C - G - A - C -
T - T - G - C - T - G -
Code for protein:
T - T - C - A - A - C - C - A -
A - A - G - T - T - G - G - T -
Sequence B:
Binding site (binding protein “A”):
A - A - G - G - A - C -
T - T - C - C - T - G -
Code for protein:
C - G - T - T - G - C - T - C -
G - C - A - A - C - G - A - G -
![Page 4: Setting Up a Replica Exchange Approach to Motif Discovery in DNA](https://reader036.fdocuments.net/reader036/viewer/2022070410/56814656550346895db36f57/html5/thumbnails/4.jpg)
Discovering New Binding Motifs
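…ATCG GCTCAG CTAG…
…CACT GATCAG AGTA…
…TTCC GCTCTG TAAC…
…GCTA GCTCAA ATCG…
Motif Probability Model
$$
\begin{array}{c|cccccc}
 & 1 & 2 & 3 & 4 & 5 & 6 \\ \hline
A & 0 & .25 & 0 & 0 & .75 & .25 \\
T & 0 & 0 & 1 & 0 & .25 & 0 \\
C & 0 & .75 & 0 & 1 & 0 & 0 \\
G & 1 & 0 & 0 & 0 & 0 & .75
\end{array}
$$
Motif: GCTCAG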
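The motif probability model on this slide can be recomputed from the four motif instances by counting letters column by column. The following is a minimal sketch; the function names `frequency_matrix` and `consensus` are illustrative and do not come from the slides.

```python
from collections import Counter

ALPHABET = "ATCG"  # row order used on the slide


def frequency_matrix(instances):
    """Return {letter: [frequency at each motif position]} for equal-length instances."""
    width = len(instances[0])
    matrix = {letter: [] for letter in ALPHABET}
    for j in range(width):
        column = Counter(instance[j] for instance in instances)
        for letter in ALPHABET:
            matrix[letter].append(column[letter] / len(instances))
    return matrix


def consensus(matrix):
    """Pick the most frequent letter at each motif position."""
    width = len(matrix[ALPHABET[0]])
    return "".join(
        max(ALPHABET, key=lambda letter: matrix[letter][j]) for j in range(width)
    )


# The four motif instances highlighted on the slide.
instances = ["GCTCAG", "GATCAG", "GCTCTG", "GCTCAA"]
pfm = frequency_matrix(instances)
motif = consensus(pfm)
```

With these instances, each row of `pfm` matches the corresponding row of the slide's table, and `motif` is `GCTCAG`.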
![Page 5: Setting Up a Replica Exchange Approach to Motif Discovery in DNA](https://reader036.fdocuments.net/reader036/viewer/2022070410/56814656550346895db36f57/html5/thumbnails/5.jpg)
Modeling Motifs in Sequences
ATATCCGTA
AATCGAGAC
TCGATGTGT
CCACCTGCA
Assume:
The DNA is broken into N sequences
Each sequence has one instance of the motif embedded in random background
Instances of the motif vary by point mutation, but not by insertion or deletion
![Page 6: Setting Up a Replica Exchange Approach to Motif Discovery in DNA](https://reader036.fdocuments.net/reader036/viewer/2022070410/56814656550346895db36f57/html5/thumbnails/6.jpg)
Modeling Motifs in Sequences
AT ATC CGTA
A ATC GAGAC
TCG ATG TGT
CC ACC TGCA
The “Alignment:” Starting position of motif in each sequence
$$\vec{x} = \{x_1, x_2, x_3, \ldots, x_N\}$$
$$\text{ex: } \vec{x} = \{3, 2, 4, 3\}$$
The “Motif Probability Distribution:” Probability of each letter occurring at each motif position
$$
p_{j,\rho} =
\begin{array}{c|ccc}
 & j=1 & j=2 & j=3 \\ \hline
A & 1 & 0 & 0 \\
T & 0 & .75 & 0 \\
C & 0 & .25 & .75 \\
G & 0 & 0 & .25
\end{array}
$$
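To make the link between an alignment and the letter counts concrete, the sketch below cuts the motif window out of each sequence at its 1-based starting position and tallies the letters per motif position. The helper names `motif_windows` and `column_counts` are hypothetical; the sequences, alignment, and width are the ones shown on the slide.

```python
def motif_windows(sequences, alignment, width):
    """Return the width-long window that starts at each 1-based alignment position."""
    windows = []
    for seq, start in zip(sequences, alignment):
        begin = start - 1  # the slides count positions from 1
        if begin < 0 or begin + width > len(seq):
            raise ValueError(f"start {start} does not fit in sequence {seq!r}")
        windows.append(seq[begin:begin + width])
    return windows


def column_counts(windows):
    """Count how often each letter occurs at each motif position."""
    width = len(windows[0])
    return [
        {letter: sum(w[j] == letter for w in windows) for letter in "ATCG"}
        for j in range(width)
    ]


sequences = ["ATATCCGTA", "AATCGAGAC", "TCGATGTGT", "CCACCTGCA"]
windows = motif_windows(sequences, [3, 2, 4, 3], width=3)
counts = column_counts(windows)
```

Here `windows` comes out as `ATC`, `ATC`, `ATG`, `ACC`, and dividing each count by N = 4 gives the probabilities in the slide's matrix.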
![Page 7: Setting Up a Replica Exchange Approach to Motif Discovery in DNA](https://reader036.fdocuments.net/reader036/viewer/2022070410/56814656550346895db36f57/html5/thumbnails/7.jpg)
Scoring a Model
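$$
p(\vec{x}, p_{j,\rho} \mid S) = \frac{p(S \mid \vec{x}, p_{j,\rho})\, p(\vec{x})\, p(p_{j,\rho})}{p(S)}
$$
“Log-likelihood” score:
$$
p(\vec{x}, p_{j,\rho} \mid S) \longrightarrow
\log\!\left(\frac{p(S \mid \vec{x}, p_{j,\rho})\, p(p_{j,\rho})}{p(S \mid p^0_\rho)}\right) + \log p(\vec{x}) - \log p(S)
= \frac{1}{N} \sum_{j=1}^{w} \sum_{\rho \in \Sigma} n_{j,\rho} \log\!\left(\frac{\hat{p}_{j,\rho}}{p^0_\rho}\right) + \text{constant}
$$
Here $n_{j,\rho}$ counts the occurrences of letter $\rho$ at motif position $j$, and $p^0_\rho$ is the background probability of $\rho$.
$p(S \mid \vec{x}, p_{j,\rho})$:
ATATCCGTA: $p^0_A \cdot p_{1,T}\, p_{2,A}\, p_{3,T} \cdot p^0_C\, p^0_C\, p^0_G\, p^0_T\, p^0_A$
AATCGAGAC: $p^0_A\, p^0_A\, p^0_T\, p^0_C\, p^0_G \cdot p_{1,A}\, p_{2,G}\, p_{3,A} \cdot p^0_C$
TCGATGTGT: $p^0_T\, p^0_C\, p^0_G \cdot p_{1,A}\, p_{2,T}\, p_{3,G} \cdot p^0_T\, p^0_G\, p^0_T$
CCACCTGCA: $p_{1,C}\, p_{2,C}\, p_{3,A} \cdot p^0_C\, p^0_C\, p^0_T\, p^0_G\, p^0_C\, p^0_A$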
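The per-sequence product above (motif probabilities inside the window, background probabilities everywhere else) can be evaluated in log space. This is a sketch under stated assumptions: the uniform background of 0.25 per letter and the name `sequence_log_prob` are illustrative choices, not taken from the slides.

```python
import math


def sequence_log_prob(seq, start, pwm, background):
    """Log-probability of one sequence given a 1-based motif start.

    pwm is a list with one dict per motif position, mapping letter -> probability.
    Letters missing from a dict are treated as probability 0.
    """
    width = len(pwm)
    begin = start - 1
    total = 0.0
    for i, letter in enumerate(seq):
        if begin <= i < begin + width:
            prob = pwm[i - begin].get(letter, 0.0)  # motif position
        else:
            prob = background[letter]  # background position
        if prob == 0.0:
            return float("-inf")  # the model forbids this sequence
        total += math.log(prob)
    return total


# Motif probabilities from the earlier 3-column example; background is assumed uniform.
pwm = [{"A": 1.0}, {"T": 0.75, "C": 0.25}, {"C": 0.75, "G": 0.25}]
background = {letter: 0.25 for letter in "ATCG"}
logp = sequence_log_prob("ATATCCGTA", 3, pwm, background)
```

Summing this quantity over all N sequences gives the log of the full product. A start position whose window contains a letter with probability 0 returns negative infinity.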
![Page 8: Setting Up a Replica Exchange Approach to Motif Discovery in DNA](https://reader036.fdocuments.net/reader036/viewer/2022070410/56814656550346895db36f57/html5/thumbnails/8.jpg)
Example Models
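AT ATC CGTA
A ATC GAGAC
TCG ATG TGT
CC ACC TGCA
$$\vec{x} = \{3, 2, 4, 3\}$$
$$
p_{j,\rho} =
\begin{array}{c|ccc}
 & j=1 & j=2 & j=3 \\ \hline
A & 1 & 0 & 0 \\
T & 0 & .75 & 0 \\
C & 0 & .25 & .75 \\
G & 0 & 0 & .25
\end{array}
$$
$$L(S \mid \vec{x}, p_{j,\rho}, p^0_\rho) \approx 3$$
A TAT CCGTA
AAT CGA GAC
TCGATG TGT
CC ACC TGCA
$$\vec{x} = \{2, 4, 7, 3\}$$
$$
p_{j,\rho} =
\begin{array}{c|ccc}
 & j=1 & j=2 & j=3 \\ \hline
A & .25 & .25 & .25 \\
T & .5 & 0 & .5 \\
C & .25 & .25 & .25 \\
G & 0 & .5 & 0
\end{array}
$$
$$L(S \mid \vec{x}, p_{j,\rho}, p^0_\rho) \approx 1.1$$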
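The two example scores can be approximated by evaluating the count-based log-likelihood sum directly. The sketch below assumes natural logarithms and a uniform background of 0.25 per letter; neither is stated on the slides, and the function name `log_likelihood_score` is hypothetical.

```python
import math
from collections import Counter


def log_likelihood_score(sequences, alignment, width, background):
    """(1/N) * sum over motif positions and letters of n * log(p_hat / p0)."""
    n_seqs = len(sequences)
    score = 0.0
    for j in range(width):
        # Letters found at motif position j, using 1-based starting positions.
        column = Counter(
            seq[start - 1 + j] for seq, start in zip(sequences, alignment)
        )
        for letter, count in column.items():
            p_hat = count / n_seqs
            score += count * math.log(p_hat / background[letter])
    return score / n_seqs


sequences = ["ATATCCGTA", "AATCGAGAC", "TCGATGTGT", "CCACCTGCA"]
background = {letter: 0.25 for letter in "ATCG"}  # assumed uniform
score_a = log_likelihood_score(sequences, [3, 2, 4, 3], 3, background)
score_b = log_likelihood_score(sequences, [2, 4, 7, 3], 3, background)
```

Under these assumptions `score_a` is about 3.03 and `score_b` is about 1.04, close to the values of roughly 3 and 1.1 quoted on the slide. The better-conserved alignment scores higher.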
![Page 9: Setting Up a Replica Exchange Approach to Motif Discovery in DNA](https://reader036.fdocuments.net/reader036/viewer/2022070410/56814656550346895db36f57/html5/thumbnails/9.jpg)
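The Gibbs Sampler
We want to find the $p_{j,\rho}$ that maximizes $p(p_{j,\rho} \mid S)$:
$$
p(p_{j,\rho} \mid S) = \int p(p_{j,\rho}, \vec{x} \mid S)\, d\vec{x}
$$
[Figure: $L(p_{j,\rho}, \vec{x} \mid S)$ plotted over axes $p_{j,\rho}$ and $\vec{x}$]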
![Page 10: Setting Up a Replica Exchange Approach to Motif Discovery in DNA](https://reader036.fdocuments.net/reader036/viewer/2022070410/56814656550346895db36f57/html5/thumbnails/10.jpg)
The Gibbs Sampler
[Figure: panels of $p(p_{j,\rho}, \vec{x} \mid S)$ over axes $p_{j,\rho}$ and $\vec{x}$]
![Page 11: Setting Up a Replica Exchange Approach to Motif Discovery in DNA](https://reader036.fdocuments.net/reader036/viewer/2022070410/56814656550346895db36f57/html5/thumbnails/11.jpg)
The Gibbs Sampler
[Figure: times visited versus $p_{j,\rho}$]
Over time, the frequency distribution approaches $p(p_{j,\rho} \mid S)$
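One common way to realize such a sampler for motif finding is a site sampler: re-estimate the motif probabilities from all sequences but one, then resample that sequence's starting position. The sketch below follows that general scheme and is not necessarily the implementation used for these slides. The pseudocount, the uniform background, and all function names are assumptions.

```python
import math
import random
from collections import Counter

ALPHABET = "ACGT"


def estimate_pwm(windows, pseudocount=0.5):
    """Estimate per-position letter probabilities, smoothed so none is zero."""
    width = len(windows[0])
    total = len(windows) + pseudocount * len(ALPHABET)
    pwm = []
    for j in range(width):
        column = Counter(w[j] for w in windows)
        pwm.append({a: (column[a] + pseudocount) / total for a in ALPHABET})
    return pwm


def gibbs_site_sampler(sequences, width, sweeps, rng):
    """Alternate between updating the motif model and resampling start positions."""
    background = {a: 1.0 / len(ALPHABET) for a in ALPHABET}  # assumed uniform
    # Random initial alignment, stored 0-based internally.
    starts = [rng.randrange(len(s) - width + 1) for s in sequences]
    for _ in range(sweeps):
        for i, seq in enumerate(sequences):
            # Motif model from every sequence except the one being resampled.
            others = [
                s[x:x + width]
                for k, (s, x) in enumerate(zip(sequences, starts))
                if k != i
            ]
            pwm = estimate_pwm(others)
            # Weight of each candidate start: motif vs. background likelihood ratio.
            weights = []
            for start in range(len(seq) - width + 1):
                log_ratio = sum(
                    math.log(pwm[j][seq[start + j]] / background[seq[start + j]])
                    for j in range(width)
                )
                weights.append(math.exp(log_ratio))
            starts[i] = rng.choices(range(len(weights)), weights=weights)[0]
    windows = [s[x:x + width] for s, x in zip(sequences, starts)]
    # Report 1-based positions to match the slides.
    return [x + 1 for x in starts], estimate_pwm(windows)


sequences = ["ATATCCGTA", "AATCGAGAC", "TCGATGTGT", "CCACCTGCA"]
alignment, pwm = gibbs_site_sampler(sequences, width=3, sweeps=20, rng=random.Random(0))
```

Recording how often each alignment or motif model is visited over many sweeps gives the empirical frequency distribution the slide refers to.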
![Page 12: Setting Up a Replica Exchange Approach to Motif Discovery in DNA](https://reader036.fdocuments.net/reader036/viewer/2022070410/56814656550346895db36f57/html5/thumbnails/12.jpg)
Optimization Technique
If we assume that areas of local maximization contribute the most during “integration” to the local maxima of $p(p_{j,\rho} \mid S)$, then biasing our search toward these areas may find the $p_{j,\rho}$ values that maximize $p(p_{j,\rho} \mid S)$ faster.
![Page 13: Setting Up a Replica Exchange Approach to Motif Discovery in DNA](https://reader036.fdocuments.net/reader036/viewer/2022070410/56814656550346895db36f57/html5/thumbnails/13.jpg)
Multiple Gibbs Samplers
By combining results from Gibbs samplers begun at random positions, we may find the $p_{j,\rho}$ that maximizes $p(p_{j,\rho} \mid S)$ sooner.
![Page 14: Setting Up a Replica Exchange Approach to Motif Discovery in DNA](https://reader036.fdocuments.net/reader036/viewer/2022070410/56814656550346895db36f57/html5/thumbnails/14.jpg)
Replica Exchange/Parallel Tempering
“Low-sensitivity” samplers, which “scout out the area,” periodically swap with “high-sensitivity” samplers, which are good at focused searches, if the swap appears promising.
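The slides do not state how a swap is judged “promising.” A standard choice in parallel tempering is the Metropolis criterion sketched below, which is assumed here rather than taken from the source. It treats each replica as sampling in proportion to exp(beta * score), and the dict layout with keys `beta`, `score`, and `state` is a hypothetical convention.

```python
import math
import random


def swap_probability(beta_i, beta_j, score_i, score_j):
    """Metropolis acceptance probability for exchanging two replicas' states."""
    exponent = (beta_i - beta_j) * (score_j - score_i)
    return 1.0 if exponent >= 0 else math.exp(exponent)


def maybe_swap(replicas, i, j, rng):
    """Exchange the states of replicas i and j in place; return True if swapped."""
    a, b = replicas[i], replicas[j]
    if rng.random() < swap_probability(a["beta"], b["beta"], a["score"], b["score"]):
        a["state"], b["state"] = b["state"], a["state"]
        a["score"], b["score"] = b["score"], a["score"]
        return True
    return False


# A focused sampler (large beta) holding a poor alignment, and a broad
# sampler (small beta) that has found a better one.
replicas = [
    {"beta": 2.0, "score": 1.0, "state": [2, 4, 7, 3]},
    {"beta": 0.5, "score": 3.0, "state": [3, 2, 4, 3]},
]
swapped = maybe_swap(replicas, 0, 1, random.Random(1))
```

When the low-beta replica holds the higher score, the swap is always accepted, so the focused sampler takes over the promising region. The reverse swap is accepted only with probability exp((beta_i - beta_j)(score_j - score_i)).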
![Page 15: Setting Up a Replica Exchange Approach to Motif Discovery in DNA](https://reader036.fdocuments.net/reader036/viewer/2022070410/56814656550346895db36f57/html5/thumbnails/15.jpg)
Controlling Sensitivity
Adjust the relative probability of sampling an $x_i$ by adjusting a new parameter $\beta$ in the distribution:
$$
\tilde{p}(x_i \mid p_{j,\rho}, S) = e^{\beta L(x_i, p_{j,\rho} \mid S)}
$$
Small $\beta$: Search breadth of space
Large $\beta$: Focused search of region
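The effect of β can be seen by normalizing the weights exp(βL) over a handful of candidate positions. The scores below are made-up illustrative values, and `tempered_probabilities` is a hypothetical helper name.

```python
import math


def tempered_probabilities(scores, beta):
    """Normalize exp(beta * score) over the candidates (numerically stable)."""
    scaled = [beta * s for s in scores]
    peak = max(scaled)
    weights = [math.exp(v - peak) for v in scaled]  # subtract peak to avoid overflow
    total = sum(weights)
    return [w / total for w in weights]


scores = [0.0, 1.0, 3.0]  # illustrative scores for three candidate positions
broad = tempered_probabilities(scores, beta=0.1)    # nearly uniform
focused = tempered_probabilities(scores, beta=2.0)  # concentrated on the best score
```

At β = 0 every candidate is equally likely. As β grows, almost all of the probability moves to the highest-scoring candidate.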
![Page 16: Setting Up a Replica Exchange Approach to Motif Discovery in DNA](https://reader036.fdocuments.net/reader036/viewer/2022070410/56814656550346895db36f57/html5/thumbnails/16.jpg)
Testing the Sensitivity
When run on randomly generated sequences to see which motifs are found, samplers of different sensitivity converge to different scores.
Betas
21.9.1
![Page 17: Setting Up a Replica Exchange Approach to Motif Discovery in DNA](https://reader036.fdocuments.net/reader036/viewer/2022070410/56814656550346895db36f57/html5/thumbnails/17.jpg)
Predicting Convergence Score
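Measure of Similarity: magnetization
$$
m = \frac{1}{N} \sum_{i=1}^{N} s_i
$$
“Configuration Score:” energy
$$
E = -\frac{1}{2} \sum_{i=1}^{N} \sum_{\substack{j=1 \\ j \neq i}}^{N} J s_i s_j
$$
Ex:
$$
\begin{array}{ccc}
m & E & p \\ \hline
.5 & 0 & \approx e^{-\beta \cdot 0} \\
1 & -6J & \approx e^{\beta 6J} \\
0 & 2J & \approx e^{-\beta 2J} \\
0 & 2J & \approx e^{-\beta 2J} \\
0 & 2J & \approx e^{-\beta 2J}
\end{array}
$$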
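The example values are consistent with four spins of ±1, which the short sketch below checks. The four-spin configurations and the choice J = 1 are assumptions made to reproduce the listed numbers, and the function names are illustrative.

```python
import math


def magnetization(spins):
    """m = (1/N) * sum of spins."""
    return sum(spins) / len(spins)


def energy(spins, coupling=1.0):
    """E = -(1/2) * sum over i != j of J * s_i * s_j."""
    total = 0.0
    for i, s_i in enumerate(spins):
        for j, s_j in enumerate(spins):
            if i != j:
                total += coupling * s_i * s_j
    return -0.5 * total


def boltzmann_weight(spins, beta, coupling=1.0):
    """Unnormalized probability exp(-beta * E) of a configuration."""
    return math.exp(-beta * energy(spins, coupling))


all_up = [1, 1, 1, 1]       # m = 1,  E = -6J
one_down = [1, 1, 1, -1]    # m = .5, E = 0
two_down = [1, 1, -1, -1]   # m = 0,  E = 2J
```

The fully aligned configuration has the lowest energy and therefore the largest weight, and a larger β widens the gap between it and the disordered configurations.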
![Page 18: Setting Up a Replica Exchange Approach to Motif Discovery in DNA](https://reader036.fdocuments.net/reader036/viewer/2022070410/56814656550346895db36f57/html5/thumbnails/18.jpg)
Alignment Analogue
[Figure: example alignments with rows labeled A, B, C]
$$
\begin{array}{ccc}
m & E & p \\ \hline
1 & -9J & \approx e^{\beta 9J} \\
.77 & -5J & \approx e^{\beta 5J}
\end{array}
$$
![Page 19: Setting Up a Replica Exchange Approach to Motif Discovery in DNA](https://reader036.fdocuments.net/reader036/viewer/2022070410/56814656550346895db36f57/html5/thumbnails/19.jpg)
Test Results
$L < |\text{alphabet}|^{w}$
![Page 20: Setting Up a Replica Exchange Approach to Motif Discovery in DNA](https://reader036.fdocuments.net/reader036/viewer/2022070410/56814656550346895db36f57/html5/thumbnails/20.jpg)
Test Results
$L > |\text{alphabet}|^{w}$
![Page 21: Setting Up a Replica Exchange Approach to Motif Discovery in DNA](https://reader036.fdocuments.net/reader036/viewer/2022070410/56814656550346895db36f57/html5/thumbnails/21.jpg)
Test Results
![Page 22: Setting Up a Replica Exchange Approach to Motif Discovery in DNA](https://reader036.fdocuments.net/reader036/viewer/2022070410/56814656550346895db36f57/html5/thumbnails/22.jpg)
Test Results
![Page 23: Setting Up a Replica Exchange Approach to Motif Discovery in DNA](https://reader036.fdocuments.net/reader036/viewer/2022070410/56814656550346895db36f57/html5/thumbnails/23.jpg)
Hidden Motifs: Gibbs Sampler
[Figure panels: Beta = .1, Beta = .5, Beta = .9, Beta = 1.3, Beta = 1.7, Beta = 2]
W=5, l=500
![Page 24: Setting Up a Replica Exchange Approach to Motif Discovery in DNA](https://reader036.fdocuments.net/reader036/viewer/2022070410/56814656550346895db36f57/html5/thumbnails/24.jpg)
Hidden Motifs: Replica Exchange
Betas
.9
.93
.961
.8
1.5
![Page 25: Setting Up a Replica Exchange Approach to Motif Discovery in DNA](https://reader036.fdocuments.net/reader036/viewer/2022070410/56814656550346895db36f57/html5/thumbnails/25.jpg)