Algorithms for Bioinformatics · Algorithms for Bioinformatics Lecture 2: Exhaustive search and...

Algorithms for Bioinformatics

Lecture 2: Exhaustive search and randomized algorithms for motifdiscovery

13.9.2017

These slides are based on previous years’ slides byAlexandru Tomescu, Leena Salmela and Veli Mäkinen

These slides use material from http://bix.ucsd.edu/bioalgorithms/slides.php

http://bix.ucsd.edu/bioalgorithms/slides.php

Outline

DNA Sequence Motifs

Motif Finding Problem

Brute Force Motif Finding

Median String Problem

Greedy Motif Search

Randomized Algorithms

2 / 59

DNA Sequence Motif

I A DNA sequence motif is a recurring pattern in DNA presumed tohave some biological funtion.

I Often the occurences of the motif in DNA are binding sites whereproteins such as transcription factors can attach.

3 / 59

Transcription Factors

I Every gene contains a regulatory region typically stretching 100-1000bp upstream of the transcriptional start site.

I Transcription factor (TF) is a regulatory protein that can bind to aregulatory region and promote or inhibit the transcription of the gene.

I A single TF typically regulates multiple (co-regulated) genes.

4 / 59

Transcription Factor Binding Sites and Motifs

I Transcription Factor Binding Site (TFBS) is the DNA location wherethe TF binds to.

I May be located anywhere in the regulatory region.

I A TF binds to a specific kind of DNA sequence.I Similar but not identical in different co-regulated genes.I Some positions are more important and thus conserved while other

positions allow variation.

I Motif is a pattern that describes the TFBS sequences of a given TF.

5 / 59

TBFSs

gene

gene

gene

gene

gene

ATCCCG

ATCCCG

GGTCCT

AT GCCG

AT GCC C

6 / 59

Motif Representations

I Set of sequences

TGGGGGATGAGAGATGGGGGATGAGAGATGAGGGA

I Consensus sequence TGAGGGA

I Profile matrix

A 0 0 3 0 2 0 5C 0 0 0 0 0 0 0G 0 5 2 5 3 5 0T 5 0 0 0 0 0 0

I Motif logo

7 / 59

Outline

DNA Sequence Motifs




Greedy Motif Search


8 / 59

Identifying Motifs

I Identify a set of co-regulated genes.

I Extract their regulatory regions.

I Find a pattern shared by all of the regions: the motif.

I The motif can be used for finding other (previously unknown) genesregulated by the same TF.

cctgatagacgctatctggctatccaGgtacTtaggtcctctgtgcgaatctatgcgtttccaaccat

agtactggtgtacatttgatCcAtacgtacaccggcaacctgaaacaaacgctcagaaccagaagtgc

aaacgtTAgtgcaccctctttcttcgtggctctggccaacgagggctgatgtataagacgaaaatttt

agcctccgatgtaagtcatagctgtaactattacctgccacccctattacatcttacgtCcAtataca

ctgttatacaacgcgtcatggcggggtatgcgttttggtcgtcgtacgctcgatcgttaCcgtacgGc

9 / 59

The Motif Finding Problem: Formulation

I Goal: Find the best motif in a set of DNA sequences

I Input: A collection of t strings Dna and an integer kI Output: The collection of t k-mers, one from each input string,

forming the best motif.I We will later consider what is the “best” motif.

k = 8

t = 5

cctgatagacgctatctggctatccaGgtacTtaggtcctctgtgcgaatctatgcgtttccaaccat

agtactggtgtacatttgatCcAtacgtacaccggcaacctgaaacaaacgctcagaaccagaagtgc

aaacgtTAgtgcaccctctttcttcgtggctctggccaacgagggctgatgtataagacgaaaatttt

agcctccgatgtaagtcatagctgtaactattacctgccacccctattacatcttacgtCcAtataca

ctgttatacaacgcgtcatggcggggtatgcgttttggtcgtcgtacgctcgatcgttaCcgtacgGc︸︷︷︸n = 69

10 / 59

Planted (k , d)-Motif Problem

Artificially generated problem instance

1. Generate a random k-mer M (the motif)

2. Generate t random sequences of length n

3. Implant M into a random position in each of the t sequences

4. In each implanted copy of M, make d random mutations

11 / 59

Example: Generate sequences

gatactgataccgtatttggcctaggcgtacacattagataaacgtatgaagtacgttagactcggcgccgcc

tttttgagcagatttagtgacctggaaaaaaaatttgagtacaaaacttttccgaatactgggcataaggtac

cctgggatgacttttgggaacactatagtgctctcccgatttttgaatatgtaggatcattcgccagggtccg

ttggatgaccttgtaagtgttttccacgcaatcgcgaaccaacgcggacccaaaggcaagaccgataaaggag

gcggtaatgtgccgggaggctggttacgtagggaagccctaacggacttaatggcccacttagtccacttata

tgttcttgtgaatggatttttaactgagggcatagaccgcttggcgcacccaaattcagtgtgggcgagcgca

gcccttgttagaggcccccgtactgatggaaactttcaattatgagagagctaatctatcgcgtgcgtgttca

ttggtttcgaaaatgctctggggcacatacaagaggagtcttccttatcagttaatgctgtatgacactatgt

ttggctaaaagcccaacttgacaaatggaagatagaatccttgcatttcaacgtatgccgaaccgaaagggaa

caacgacagattcttacgtgcattagctcgcttccggggatctaatagcacgaagcttctgggtactgatagc

12 / 59

Example: Implant motif AAAAAAAAGGGGGGG

gatactgatAAAAAAAAGGGGGGGggcgtacacattagataaacgtatgaagtacgttagactcggcgccgcc

tttttgagcagatttagtgacctggaaaaaaaatttgagtacaaaacttttccgaataAAAAAAAAGGGGGGG

cctgggatgacttAAAAAAAAGGGGGGGtgctctcccgatttttgaatatgtaggatcattcgccagggtccg

ttggatgAAAAAAAAGGGGGGGtccacgcaatcgcgaaccaacgcggacccaaaggcaagaccgataaaggag

gcggtaatgtgccgggaggctggttacgtagggaagccctaacggacttaatAAAAAAAAGGGGGGGcttata

tgttcttgtgaatggatttAAAAAAAAGGGGGGGgaccgcttggcgcacccaaattcagtgtgggcgagcgca

gcccttgttagaggcccccgtAAAAAAAAGGGGGGGcaattatgagagagctaatctatcgcgtgcgtgttca

ttAAAAAAAAGGGGGGGctggggcacatacaagaggagtcttccttatcagttaatgctgtatgacactatgt

ttggctaaaagcccaacttgacaaatggaagatagaatccttgcatAAAAAAAAGGGGGGGaccgaaagggaa

caacgacagattcttacgtgcattagctcgcttccggggatctaatagcacgaagcttAAAAAAAAGGGGGGG

13 / 59

Where are the implanted motifs?

gatactgataaaaaaaagggggggggcgtacacattagataaacgtatgaagtacgttagactcggcgccgcc

tttttgagcagatttagtgacctggaaaaaaaatttgagtacaaaacttttccgaataaaaaaaaaggggggg

cctgggatgacttaaaaaaaagggggggtgctctcccgatttttgaatatgtaggatcattcgccagggtccg

ttggatgaaaaaaaagggggggtccacgcaatcgcgaaccaacgcggacccaaaggcaagaccgataaaggag

gcggtaatgtgccgggaggctggttacgtagggaagccctaacggacttaataaaaaaaagggggggcttata

tgttcttgtgaatggatttaaaaaaaaggggggggaccgcttggcgcacccaaattcagtgtgggcgagcgca

gcccttgttagaggcccccgtaaaaaaaagggggggcaattatgagagagctaatctatcgcgtgcgtgttca

ttaaaaaaaagggggggctggggcacatacaagaggagtcttccttatcagttaatgctgtatgacactatgt

ttggctaaaagcccaacttgacaaatggaagatagaatccttgcataaaaaaaagggggggaccgaaagggaa

caacgacagattcttacgtgcattagctcgcttccggggatctaatagcacgaagcttaaaaaaaaggggggg

14 / 59

Example: Add four mutations per motif occurrence

gatactgatAgAAgAAAGGttGGGggcgtacacattagataaacgtatgaagtacgttagactcggcgccgcc

tttttgagcagatttagtgacctggaaaaaaaatttgagtacaaaacttttccgaatacAAtAAAAcGGcGGG

cctgggatgacttAAAAtAAtGGaGtGGtgctctcccgatttttgaatatgtaggatcattcgccagggtccg

ttggatgcAAAAAAAGGGattGtccacgcaatcgcgaaccaacgcggacccaaaggcaagaccgataaaggag

gcggtaatgtgccgggaggctggttacgtagggaagccctaacggacttaatAtAAtAAAGGaaGGGcttata

tgttcttgtgaatggatttAAcAAtAAGGGctGGgaccgcttggcgcacccaaattcagtgtgggcgagcgca

gcccttgttagaggcccccgtAtAAAcAAGGaGGGccaattatgagagagctaatctatcgcgtgcgtgttca

ttAAAAAAtAGGGaGccctggggcacatacaagaggagtcttccttatcagttaatgctgtatgacactatgt

ttggctaaaagcccaacttgacaaatggaagatagaatccttgcatActAAAAAGGaGcGGaccgaaagggaa

caacgacagattcttacgtgcattagctcgcttccggggatctaatagcacgaagcttActAAAAAGGaGcGG

15 / 59

Where are the implanted motifs???

gatactgatagaagaaaggttgggggcgtacacattagataaacgtatgaagtacgttagactcggcgccgcc

tttttgagcagatttagtgacctggaaaaaaaatttgagtacaaaacttttccgaatacaataaaacggcggg

cctgggatgacttaaaataatggagtggtgctctcccgatttttgaatatgtaggatcattcgccagggtccg

ttggatgcaaaaaaagggattgtccacgcaatcgcgaaccaacgcggacccaaaggcaagaccgataaaggag

gcggtaatgtgccgggaggctggttacgtagggaagccctaacggacttaatataataaaggaagggcttata

tgttcttgtgaatggatttaacaataagggctgggaccgcttggcgcacccaaattcagtgtgggcgagcgca

gcccttgttagaggcccccgtataaacaaggagggccaattatgagagagctaatctatcgcgtgcgtgttca

ttaaaaaatagggagccctggggcacatacaagaggagtcttccttatcagttaatgctgtatgacactatgt

ttggctaaaagcccaacttgacaaatggaagatagaatccttgcatactaaaaaggagcggaccgaaagggaa

caacgacagattcttacgtgcattagctcgcttccggggatctaatagcacgaagcttactaaaaaggagcgg

16 / 59

Why finding (15,4)-motifs is hard?

gatactgatAgAAgAAAGGttGGGggcgtacacattagataaacgtatgaagtacgttagactcggcgccgcc

tttttgagcagatttagtgacctggaaaaaaaatttgagtacaaaacttttccgaatacAAtAAAAcGGcGGG

I Occurrences can differ in 8 positionsAgAAgAAAGGttGGG

..|..|||.|..|||

cAAtAAAAcGGcGGG

I Difficult to recognize similarity even if you know the positions

17 / 59

Planted (k , d)-Motif Problem

I Common benchmark or challenge problem for motif finding

I Adjustable difficulty

I For example, the planted (15, 4)-motif problem is difficult but notimpossible

Real data is noisier and more complex

I No upper bound d on mutation

I Some positions have little variation, others have a lot

18 / 59

Outline

DNA Sequence Motifs




Greedy Motif Search


19 / 59

Scoring Motifs

I The output of motif finding is a set of k-mers, one from each inputsequence

I In a desired output, the k-mers are as similar to each other aspossible, representing a coherent motif

I How to measure the goodness of the motif?

20 / 59

Consensus Motif and Score

a G g t a c T tC c A t a c g t

Motif occurrences a c g t T A g ta c g t C c A tC c g t a c g G

A 3 0 1 0 3 1 1 0Counts C 2 4 0 0 1 4 0 0

G 0 1 4 0 0 0 3 1T 0 0 0 5 1 0 1 4

Consensus A C G T A C G T

Score 2+1+1+0+2+1+2+1=10

I t motifs occurrences,one from each sequence

I Count symbols in eachcolumn

I Consensus formed bymost frequent symbols

I Score is the number ofdifferences fromconsensus

21 / 59

Consensus Motif and Score

I Good model for planted motif problem

I In real data, mutations should be expensive in conserved positions butmuch cheaper in variable positions

22 / 59

BruteForceMotifSearch

I Compute the score for every possible combination of motifs

I Output the set of motifs with the smallest score

23 / 59

Running Time of BruteForceMotifSearch

I (n − k + 1) different k-mers in each sequenceI (n − k + 1)t different combinations of motifsI kt time to compute score for one set of motifs

I kt(n − k + 1)t = O(ktnt) time in totalI E.g. for t = 20, n = 600, k = 15 we must perform approximately

1058 computations — it would take billions of years

24 / 59

Outline

DNA Sequence Motifs




Greedy Motif Search


25 / 59

The Median String Problem

I Given a set of t DNA sequences find a pattern that appears in all tsequences with the minimum number of total mismatches

I This pattern will be the shared motif

26 / 59

Hamming Distance

I The Hamming distance d(v ,w) the number of mismatches betweentwo k-mers v and w

I For example:d(AAAAAA,ACAAAC) = 2

27 / 59

Computing Score

a G g t a c T t 2C c A t a c g t 2

Motifs a c g t T A g t 2a c g t C c A t 2C c g t a c g G 2

Score(Motifs) 2+1+1+0+2+1+2+1=10

Consensus(Motifs) A C G T A C G T

I Score is the number ofmismatching symbols

I Can be computedcolumn by column orrow by row

I Row sums are Hammingdistances

28 / 59

Computing Score

Define

I Motifs = {Motif 1,Motif 2, . . . ,Motif t}I d(Pattern,Motifs) =

∑ti=1 d(Pattern,Motif i )

Then

I Score(Motifs) = d(Consensus(Motifs),Motifs)

29 / 59

Best Match Distance

I Assume |String | > |Pattern| = kI The best match distance d(Pattern, String) is the smallest Hamming

distance d(Pattern,Motif ) between Pattern and any k-mer Motif inString

I Example: d(ACGTACGT, gcaaaAGGTACTTccaa) = 2

Generalize for a set of strings

I Dna = {Dna1,Dna2, . . . ,Dnat}I d(Pattern,Dna) =

∑ti=1 d(Pattern,Dnai )

30 / 59

The Median String Problem

I Goal: Given a set of DNA sequences, find a median string

I Input: A collection of strings Dna and an integer k

I Output: A k-mer Pattern minimizing d(Pattern,Dna) among allk-mers Pattern

31 / 59

Motif Finding Problem = Median String Problem

I Input: Dna, k

I Motifs: output of Motif Finding (t k-mers)

I Pattern: output of Median String

I Score(Motifs) = d(Pattern,Dna)

Why?

I If Score(Motifs) < d(Pattern,Dna), we could chooseConsensus(Motifs) as a better Pattern

I If Score(Motifs) > d(Pattern,Dna), we could choose the best matchoccurrences of Pattern as better Motifs

32 / 59

Median String Algorithm

MedianString(DNA, k)

1: BestPattern← AAA...A2: for each k-mer Pattern from AAA...A to TTT...T do3: if d(Pattern,DNA) < d(BestPattern,DNA) then4: BestPattern← Pattern5: return BestPattern

33 / 59

Running Time of MedianString

I 4k different k-mers

I O(k · n) time to compute the best match distance to one stringI O(knt4k) time in total

I E.g. for t = 20, n = 600, k = 15 this is about about 1013

— still a lot but much less than 1058

I Reformulating a problem can help!

34 / 59

Outline

DNA Sequence Motifs




Greedy Motif Search


35 / 59

Search Space

I BruteForceMotifSearch and MedianString algorithms haveexponential running time

I This is because the search space, the set of possible solutions, isexponential

I nt different ways to choose MotifsI 4k different ways to choose Pattern

36 / 59

Exploring Only Part of Search Space

Branch and bound algorithms (covered in study groups)

I Avoid regions that cannot improve solution

I Still exponential in the worst case

Greedy algorithms

I Search the most promising directions

I No guarantee of finding an optimal solution

Randomized algorithms

I Add randomness to greedy search

I Avoids getting stuck in a dead end

37 / 59

Profile Matrixa G g t a c T tC c A t a c g t

Motifs a c g t T A g ta c g t C c A tC c g t a c g G

A 3 0 1 0 3 1 1 0Count(Motifs) C 2 4 0 0 1 4 0 0

G 0 1 4 0 0 0 3 1T 0 0 0 5 1 0 1 4

A .6 0 .2 0 .6 .2 .2 0Profile(Motifs) C .4 .8 0 0 .2 .8 0 0

G 0 .2 .8 0 0 0 .6 .2T 0 0 0 1 .2 0 .2 .8

Consensus(Motifs) A C G T A C G T

I Profile represents theprobability of eachnucleotide in eachposition

I More detailed summaryof the set of motifs thanconsensus

38 / 59

k-Mer Probabilities

A .6 0 .2 0 .6 .2 .2 0Profile C .4 .8 0 0 .2 .8 0 0

G 0 .2 .8 0 0 0 .6 .2T 0 0 0 1 .2 0 .2 .8

The probability of a k-mer given a profile

I Pr(AGGTACTT | Profile) = .6 · .2 · .8 · 1 · .6 · .8 · .2 · .8 = 0.0073728I Measure how well the k-mer matches the motif

I Does 0.0073728 imply a good match?

39 / 59

Profile-Most Probable k-mer

I The k-mer with the highest probability in a string

I Considered the best matching motif

I Example: The Profile-most probable 8-mer ingcaaaAGGTACTTccaa is AGGTACTT

I Pr(AGGTACTT | Profile) = 0.0073728

A .6 0 .2 0 .6 .2 .2 0Profile C .4 .8 0 0 .2 .8 0 0

G 0 .2 .8 0 0 0 .6 .2T 0 0 0 1 .2 0 .2 .8

40 / 59

Problem: Zero Probabilities

A .6 0 .2 0 .6 .2 .2 0Profile C .4 .8 0 0 .2 .8 0 0

G 0 .2 .8 0 0 0 .6 .2T 0 0 0 1 .2 0 .2 .8

Consensus A C G T A C G T

Pr(TCGTACGT | Profile) = 0 · .8 · .8 · 1 · .6 · .8 · .6 · .8 = 0I Only one mismatch compared to consensus

I Should this probability really be 0?

41 / 59

Pseudocounts

I Add one to all counts

I Avoids zero counts

A 3 0 1 0 3 1 1 0Count C 2 4 0 0 1 4 0 0

G 0 1 4 0 0 0 3 1T 0 0 0 5 1 0 1 4

A 4 1 2 1 4 2 2 1PseudoCount C 3 5 1 1 2 5 1 1

G 1 2 5 1 1 1 4 2T 1 1 1 6 2 1 2 5

42 / 59

Laplace’s Rule of Succession

I Use pseudocounts instead of counts to compute probabilities

I As if we had seen one occurrence of each symbol before the main data

A 4 1 2 1 4 2 2 1PseudoCount C 3 5 1 1 2 5 1 1

G 1 2 5 1 1 1 4 2T 1 1 1 6 2 1 2 5

A 4/9 1/9 2/9 1/9 4/9 2/9 2/9 1/9Profile C 3/9 5/9 1/9 1/9 2/9 5/9 1/9 1/9

G 1/9 2/9 5/9 1/9 1/9 1/9 4/9 2/9T 1/9 1/9 1/9 6/9 2/9 1/9 2/9 5/9

Pr(TCGTACGT | Profile) = 1/9 · 5/9 · 5/9 · 6/9 · 4/9 · 5/9 · 4/5 · 5/9 =60000/43046721 = 0.001393834

43 / 59

Greedy Motif Search

Solve Motif Finding problemI For each input string, choose the profile-most probable k-mer as the

motifI Greedy choice

I The profile is computed from motifs of earlier strings

I In first string, try all k-mers

44 / 59

Greedy Motif Search

GreedyMotifSearch(DNA, k , t)

1: BestMotifs ← the first k-mer of each string in DNA2: for each k-mer Motif in the first string in DNA do3: Motif 1 ← Motif4: for i ← 2 to t do5: form Profile from Motif 1, . . . ,Motif i−16: Motif i ← Profile-most probable k-mer in the i-th string in DNA7: Motifs ← Motif 1, . . . ,Motif t8: if Score(Motifs) < Score(BestMotifs) then9: BestMotifs ← Motifs

10: return BestMotifs

45 / 59

Performance of GreedyMotifSearch

I Running time O(n · t · k · (t + n))I polynomial not exponential

I May not find the best motifsI Early choices may lead to a wrong direction

46 / 59

Outline

DNA Sequence Motifs




Greedy Motif Search


47 / 59


I Make random choices during computation

I Use random number generator to “toss coins” or to “roll dice”

Why Randomness Helps?

I If a greedy algorithm fails for some input,it will always fail for that input

I If a randomized algorithm fails,it is unlikely to fail again in the same way

I We can run it many times and choose the best output

48 / 59

Monte Carlo and Las Vegas Algorithms

Monte Carlo algorithm

I May return an incorrect or inoptimal result

I Returns a correct answer or a good approximation with highprobability (if repeated sufficiently many times)

Las Vegas algorithm

I Always returns a correct/optimal result

I Very long runtime is possible but very unlikely

49 / 59

Turning Monte Carlo into Las Vegas

1. Run the Monte Carlo algorithm

2. If the result is good, stop. Otherwise return to Step 1.

I Requires that a correct or optimal result can be easily recognizedI This is not the case with the Motif Finding problem

I The following algoriths are Monte Carlo algorithms

50 / 59

Randomized Motif Search

Improving a set of motifsI Starting with a set of motifs (one from each sequence)

1. Compute a profile from the motifs2. Find the profile-most probable motif in each sequence

I The result is a potentially better set of motifs

I Repeat this as long as the set of motifs keeps improving

Randomization

I Start with a random set of motifs

51 / 59

Randomized Motif Search

RandomizedMotifSearch(DNA, k, t)

1: randomly select k-mers Motifs = (Motif 1, . . . ,Motif t), one from eachstring in DNA

2: BestMotifs ← Motifs3: while forever do4: Profile ← Profile(Motifs)5: for i ← 1 to t do6: Motif i ← Profile-most probable k-mer in the i-th string in DNA7: Motifs ← Motif 1, . . . ,Motif t8: if Score(Motifs) < Score(BestMotifs) then9: BestMotifs ← Motifs

10: else11: return BestMotifs

52 / 59

Why Randomized Motif Search Works?

I If Motifs is a random set, the expectation is that Profile(Motifs) hasabout the same probability 0.25 for each symbol in each column

I If Motifs contains some of the true motifs, it is not random andProfile(Motifs) reflects this

I Then Profile(Motifs) is more likely to match the other true motifs

I Thus we might need just a few of the true motifs in the initial set

I This will happen eventually if repeated many times (may requirethousands of repeats)

53 / 59

Gibbs Sampler

I Gibbs Sampler is a more refined randomized algorithmI Compared to Randomized Motif Search Gibbs Sampler is

I More cautiousI More randomized

54 / 59

Gibbs Sampler Is More Cautious

I Randomized Motif Search might get some true motifs right but throwthem all away in the next round

I Gibbs Sampler changes just one motif in each round

55 / 59

Gibbs Sampler Is More Randomized

I Randomized Motif Search uses randomness only in the beginningI Gibbs Sampler uses randomness in every round

I Choose a random motif to discardI Replace it with a random motif (from the same sequence)I The second random choice is biased: a profile-randomly generated

k-mer

56 / 59

Profile-Randomly Generated k-Mer

I Given a Profile and a String

1. Compute probabilities of all k-mers in String2. Choose one of the k-mers randomly but biased by the probabilities

I The probabilities with respect to Profile do not usually sum up to 1and have to be normalized: Replace p1, . . . , pn with p1/C , . . . p1/C ,where C =

∑ni=1 pi

I ExampleI p1 = 0.1, p2 = 0.2, p3 = 0.3I C = 0.1 + 0.2 + 0.3 = 0.6I p1/C = 1/6, p2/C = 1/3, p3/C = 1/2I p1/C + p2/C + p3/C = 1/6 + 1/3 + 1/2 = 1

57 / 59

Gibbs Sampler

GibbsSampler(DNA, k , t,N)

1: randomly select k-mers Motifs = (Motif 1, . . . ,Motif t) in each stringin Dna

2: BestMotifs ← Motifs3: for j ← 1 to N do4: i ← Random(t)5: Profile ← profile matrix constructed from all strings in Motifs

except for Motif i6: Motif i ← profile-randomly generated k-mer in the i-th sequence in

DNA7: if Score(Motifs) < Score(BestMotifs) then8: BestMotifs ← Motifs9: return BestMotifs

58 / 59

Gibbs Sampler

I Because of randomness in every round, Gibbs Sampler can keep onrunning without getting stuck to single solution

I However, it may end up exploring the same small set of solutionsrepeatedly: It gets stuck in a local optimum

I This can be corrected by restarting from a new random set of motifsevery now and then

59 / 59

DNA Sequence MotifsMotif Finding ProblemBrute Force Motif FindingMedian String ProblemGreedy Motif SearchRandomized Algorithms

Algorithms for Bioinformatics · Algorithms for Bioinformatics Lecture 2: Exhaustive search and...

Documents

Transcript of Algorithms for Bioinformatics · Algorithms for Bioinformatics Lecture 2: Exhaustive search and...