Post on 14-Dec-2015
Algorithms to Search Position Specific Scoring Matrices in Biosequences
Cinzia PizziDipartimento di Ingegneria dell’Informazione
Università degli Studi di Padova
C.Pizzi, DEI – Univ. Of Padova (Italy) 2
Outline Weighted patterns in Biology The problem of profile matching The look-ahead method
Suffix based Algorithms Aho-Corasick Extension (ACE) Look-ahead Filtration Algorithm (LFA) Superalphabet (NS)
Some experimental results
What are Motifs? Motifs are biologically significant
elements that are responsible for common structures or functions
Motifs are statistically significant substrings in bio-sequences
Assumption: if two entities share same function or structure, common over-represented elements might be responsible for observed similarity
C.Pizzi, DEI – Univ. Of Padova (Italy) 3
Motif Discovery Take set of co-expressed genes Compare their promoter regions Common over-represented
substrings are good candidates for TFBS
Need counted/expected frequency
C.Pizzi, DEI – Univ. Of Padova (Italy) 4
Promoters of co-expressedgenes
C.Pizzi, DEI – Univ. Of Padova (Italy) 5
Motif Discovery TFBS, DNA motifs Motifs = binding sites = substrings
Intrinsic variability of biological sequences Mismatches, indels, wildcards,
superalphabets...
Promoters of co-expressedgenes
Motif Representation Binding sites of the same factor
are not exactly the same in all sequences
ACATACCCGAATATGCATGCCTACTCCAAATTCGAAACGGACTCCTATGCCCACTCGGAA
1 2 3 4 5 6A
GC
T
Profile -> matrix representation
C.Pizzi, DEI – Univ. Of Padova (Italy)
Motif Representation Protein classification: each family
is modeled by a matrix
ACDEHNPVACCCDEGAMMATATHCATVVST
1 2 3 4 5 6A
DC
...
C.Pizzi, DEI – Univ. Of Padova (Italy)
1 2 3 4 5 6A
DC
... 1 2 3 4 5 6A
DC
...WVDEHNPVAC
Profile Weighted pattern p oflength m
defined over alphabet Σ |Σ| x m matrix defines scores
1 2 3 4 5 6A 0.
30.0
0.1
0.2
1.0
0.3
C 0.1
0.8
0.5
0.2
0.0
0.4
G 0.2
0.0
0.4
0.3
0.0
0.0
T 0.4
0.2
0.0
0.3
0.0
0.3
C.Pizzi, DEI – Univ. Of Padova (Italy)
Segment Score
S = s1 s2 … sm
1 2 3 4 5 6A 0.
30.0
0.1
0.2
1.0
0.3
C 0.1
0.8
0.5
0.2
0.0
0.4
G 0.2
0.0
0.4
0.3
0.0
0.0
T 0.4
0.2
0.0
0.3
0.0
0.3
s1 s2 s3 s4 s5 s6
m
ii isMScore
1
],[
C.Pizzi, DEI – Univ. Of Padova (Italy)
Meaning of the score
C.Pizzi, DEI – Univ. Of Padova (Italy) 10
)|(
)|(lnlnln],[
1
1
1,
,
1
,
1 BSP
MSP
p
f
p
f
p
fisMScore
m
im
is
m
iis
s
ism
i s
ism
ii
i
i
i
i
i
i
Segment Score Example
Score = 2.1
1 2 3 4 5 6A 0.
30.0
0.1
0.2
1.0
0.3
C 0.1
0.8
0.5
0.2
0.0
0.4
G 0.2
0.0
0.4
0.3
0.0
0.0
T 0.4
0.2
0.0
0.3
0.0
0.3
G T A C A C
C.Pizzi, DEI – Univ. Of Padova (Italy)
Profile Matching Problem Text T of length n defined over Σ Profile p (|Σ| x m) Score threshold th Score Si of the segment of length
m starting at position i Find all positions i in T where Si ≥
th
C.Pizzi, DEI – Univ. Of Padova (Italy)
Example: th = 2CGTACACTCGGTA
Score = 0.6
Not a match!
1 2 3 4 5 6
A 0.3
0.0
0.1
0.2
1.0
0.3
C 0.1
0.8
0.5
0.2
0.0
0.4
G 0.2
0.0
0.4
0.3
0.0
0.0
T 0.4
0.2
0.0
0.3
0.0
0.3
C.Pizzi, DEI – Univ. Of Padova (Italy)
Example: th = 2CGTACACTCGGTA
Score = 2.1
Match at pos 2!
1 2 3 4 5 6
A 0.3
0.0
0.1
0.2
1.0
0.3
C 0.1
0.8
0.5
0.2
0.0
0.4
G 0.2
0.0
0.4
0.3
0.0
0.0
T 0.4
0.2
0.0
0.3
0.0
0.3
C.Pizzi, DEI – Univ. Of Padova (Italy)
Example: th = 2CGTACACTCGGTA
Score = 1.4
Not a match!
1 2 3 4 5 6
A 0.3
0.0
0.1
0.2
1.0
0.3
C 0.1
0.8
0.5
0.2
0.0
0.4
G 0.2
0.0
0.4
0.3
0.0
0.0
T 0.4
0.2
0.0
0.3
0.0
0.3
C.Pizzi, DEI – Univ. Of Padova (Italy)
Example: th = 2CGTACACTCGGTA
Score = 1.8
Not a match!
1 2 3 4 5 6
A 0.3
0.0
0.1
0.2
1.0
0.3
C 0.1
0.8
0.5
0.2
0.0
0.4
G 0.2
0.0
0.4
0.3
0.0
0.0
T 0.4
0.2
0.0
0.3
0.0
0.3
C.Pizzi, DEI – Univ. Of Padova (Italy)
Example: th = 2CGTACACTCGGTA
Score = 0.9
Not a match!
1 2 3 4 5 6
A 0.3
0.0
0.1
0.2
1.0
0.3
C 0.1
0.8
0.5
0.2
0.0
0.4
G 0.2
0.0
0.4
0.3
0.0
0.0
T 0.4
0.2
0.0
0.3
0.0
0.3
C.Pizzi, DEI – Univ. Of Padova (Italy)
Example: th = 2CGTACACTCGGTA
Score = 1.3
Not a match!
1 2 3 4 5 6
A 0.3
0.0
0.1
0.2
1.0
0.3
C 0.1
0.8
0.5
0.2
0.0
0.4
G 0.2
0.0
0.4
0.3
0.0
0.0
T 0.4
0.2
0.0
0.3
0.0
0.3
C.Pizzi, DEI – Univ. Of Padova (Italy)
Example: th = 2CGTACACTCGGTA
Score = 1.4
Not a match!
1 2 3 4 5 6
A 0.3
0.0
0.1
0.2
1.0
0.3
C 0.1
0.8
0.5
0.2
0.0
0.4
G 0.2
0.0
0.4
0.3
0.0
0.0
T 0.4
0.2
0.0
0.3
0.0
0.3
C.Pizzi, DEI – Univ. Of Padova (Italy)
Example: th = 2CGTACACTCGGTA
Score = 2.2
Match at pos 8!
1 2 3 4 5 6
A 0.3
0.0
0.1
0.2
1.0
0.3
C 0.1
0.8
0.5
0.2
0.0
0.4
G 0.2
0.0
0.4
0.3
0.0
0.0
T 0.4
0.2
0.0
0.3
0.0
0.3
C.Pizzi, DEI – Univ. Of Padova (Italy)
Scenarios of applications Online Algorithms (no indexing)
Database of profile matrices (e.g. TRANSFAC, JASPAR for TFBS)
Input sequence to be searched Offline algorithms (indexing)
Sequence or set of sequences Input matrix to search for matches
C.Pizzi, DEI – Univ. Of Padova (Italy)
Summary of current methods
Look-ahead method LA (Wu et al,00)
Offline methods based on LA: Suffix-tree (Dorohonceanu et al, 00) Suffix-array (Beckstette et al, 04,06) Truncated Suffix Tree (Pizzi and
Favaretto, 10) Online methods based on LA:
Aho-Corasick,Filtering(Pizzi et al. 07,09)
C.Pizzi, DEI – Univ. Of Padova (Italy)
Summary of current methods Pattern Matching
Shift-Add (Salmela e Tarhio, 08) KMP (Liefoghee et al, 09)
Matrix partitioning (Liefhooghe et al.,06, Pizzi et al., 07, 09)
FFT based (Rajasekaran et al., 02) Compression based(Freschi et al., 05)
C.Pizzi, DEI – Univ. Of Padova (Italy)
The look-ahead approach
]1max[][
],[]max[
ithiP
isMi
th
m
iki
1 2 3 4 5 6
A 0.3 0.0 0.1 0.2 1.0 0.3
C 0.1 0.8 0.5 0.2 0.0 0.4
G 0.2 0.0 0.4 0.3 0.0 0.0
T 0.4 0.2 0.0 0.3 0.0 0.3
max 3.0 2.2 1.7 1.4 0.4
Pth -1.0 -0.2 0.3 0.6 1.6 2.0
C.Pizzi, DEI – Univ. Of Padova (Italy)
The look-ahead approach
1 2 3 4 5 6
A 0.3 0.0 0.1 0.2 1.0 0.3
C 0.1 0.8 0.5 0.2 0.0 0.4
G 0.2 0.0 0.4 0.3 0.0 0.0
T 0.4 0.2 0.0 0.3 0.0 0.3
max 3.0 2.2 1.7 1.4 0.4
Pth -1.0 -0.2 0.3 0.6 1.6 2.0
C G T A C A 0.1
C.Pizzi, DEI – Univ. Of Padova (Italy)
The look-ahead approach
1 2 3 4 5 6
A 0.3 0.0 0.1 0.2 1.0 0.3
C 0.1 0.8 0.5 0.2 0.0 0.4
G 0.2 0.0 0.4 0.3 0.0 0.0
T 0.4 0.2 0.0 0.3 0.0 0.3
max 3.0 2.2 1.7 1.4 0.4
Pth -1.0 -0.2 0.3 0.6 1.6 2.0
C G T A C A 0.1 0.1
C.Pizzi, DEI – Univ. Of Padova (Italy)
The look-ahead approach
1 2 3 4 5 6
A 0.3 0.0 0.1 0.2 1.0 0.3
C 0.1 0.8 0.5 0.2 0.0 0.4
G 0.2 0.0 0.4 0.3 0.0 0.0
T 0.4 0.2 0.0 0.3 0.0 0.3
max 3.0 2.2 1.7 1.4 0.4
Pth -1.0 -0.2 0.3 0.6 1.6 2.0
C G T A C A 0.1 0.1 0.1 Don’t need to compare these ones!
C.Pizzi, DEI – Univ. Of Padova (Italy)
The suffix tree of T data structure suffix tree, Tree(T),
is compacted trie that represents all the suffixes of string T
linear size: |Tree(T)| = O(|T|) can be constructed in linear time
O(|T|)
C.Pizzi, DEI – Univ. Of Padova (Italy)
Suffix trie and suffix tree
ab
b
aaa
aa
b
b
b
a
baab
baabab
abaabbaabaababb
Trie(abaab) Tree(abaab)
C.Pizzi, DEI – Univ. Of Padova (Italy)
Tree(T) is of linear size only the internal branching nodes
and the leaves represented explicitly
edges labeled by substrings of T v = node(α) if the path from root to
v spells α one-to-one correspondence of
leaves and suffixes |T| leaves, hence < |T| internal
nodesC.Pizzi, DEI – Univ. Of Padova (Italy) 30
Tree(hattivatti)hattivatt
i
attivatti
ttivatti
tivatti
ivatti
vatti
atti
tti
ti
i
hattivatti attivatt
i
ttivatti
tivatti
ivatti
vatti
vattivatti
attiti
i
i
tti
ti
t
i
vatti
vatti
vatti
hattivatti
atti
C.Pizzi, DEI – Univ. Of Padova (Italy)
Tree(hattivatti)hattivatt
i
attivatti
ttivatti
tivatti
ivatti
vatti
atti
tti
ti
i
1 2 34
5
6
6,106,10
2,5 4,510
8
9
3,3
vatti
vatti
hattivatti
hattivatti
7
C.Pizzi, DEI – Univ. Of Padova (Italy)
Tree(T) is full text indexTree(T)
P
31 8
P occurs in T at locations 8, 31, …
P occurs in T P is a prefix of some suffix of T Path for P exists in Tree(T)
All occurrences of P in time O(|P| + #occ)C.Pizzi, DEI – Univ. Of Padova (Italy)
LA over a Suffix Tree
CG
T
Score(CG)=0.2 > -0.2 = Th(2)Score(CGT)=0.2 < 0.3 = Th(3) : Skip the subtree
C.Pizzi, DEI – Univ. Of Padova (Italy)
TCC
G
LA over a Suffix Tree
CG
T
Score(TCC)=1.9 > 0.3 = Th(3)Score(TCCG)=2.2 > 2 = Th(6) : Match, all the subtree
C.Pizzi, DEI – Univ. Of Padova (Italy)
TCC
G
Suffix array: example
suffix array = lexicographic order of the suffixes
hattivatti
attivatti
ttivatti
tivatti
ivatti
vatti
atti
tti
ti
i
ε
εatti
attivatti
hattivatti
i
ivatti
ti
tivatti
tti
ttivatti
vatti
11
7
2
1
10
5
9
4
8
3
6
C.Pizzi, DEI – Univ. Of Padova (Italy)
37
Suffix array suffix array SA(T) = an array giving
the lexicographic order of the suffixes of T
practitioners like suffix arrays (simplicity, space efficiency)
theoreticians like suffix trees (explicit structure)
C.Pizzi, DEI – Univ. Of Padova (Italy)
LA over a Suffix Array
C.Pizzi, DEI – Univ. Of Padova (Italy)
In terms of suffix trees, skp[i] is the lexicographically next leaf that does notoccur in the subtree below the branching node corresponding to the longest common prefix of Ssuf[i-1] and Ssuf[i].skp[i] = min({n + 1} U [ j in [i + 1; n] | lcp[i] > lcp[j])
LA over Truncated ST Build TST with truncation factor h L = max length of a matrix in the
DB if h=L, simply work as ST if h<L, filtering
if a leaf is reached take corresponding positions (p1, p2, …, pt)
For each pi check positions pi+j, h<j<=m with lookahead
C.Pizzi, DEI – Univ. Of Padova (Italy) 39
LA over Truncated ST
C.Pizzi, DEI – Univ. Of Padova (Italy) 40
hL
p1 p3p2p1 + h
p1
p2 +h p3 +hL-h L-h L-h
p2p3
Aho-Corasick Expansion (ACE) Pattern matching + LA
Lookahead Filtration Algorithm(LFA) Score for fixed length prefix as a filter
+ LA Naive Superalphabet (NS)
Encode k-mers in superalphabet symbol
Online Profile Matching
C.Pizzi, DEI – Univ. Of Padova (Italy)
The Aho-Corasick Algorithm
A trie for D = {he, she, his, hers}
C.Pizzi, DEI – Univ. Of Padova (Italy)
The Aho-Corasick algorithm
Add failure links his -- she
Time O(n+m)Space depends on Dm = sum of word lengths
C.Pizzi, DEI – Univ. Of Padova (Italy)
The Fast Aho-Corasick
s
0 1 2 8 9
6 7
3 4 5
h e r s
si
s
h ee,i,r
h
r s
h,sh e,i
s
Time O(n)Space depends on D and Σ
C.Pizzi, DEI – Univ. Of Padova (Italy)
AC and profile matching Build AC automaton for all the words
that are a match for the matrix LA partial threshold limits the number of
words to those that actually match O(|D||Σ|m + m|Σ|) pre-processing |D|≤|Σ|m depends on matrix and threshold
Search the text with AC automaton O(n) search
C.Pizzi, DEI – Univ. Of Padova (Italy)
AC-Extension by LA First position
1 2 3 4 5 6
A 0.3 0.0 0.1 0.2 1.0 0.3
C 0.1 0.8 0.5 0.2 0.0 0.4
G 0.2 0.0 0.4 0.3 0.0 0.0
T 0.4 0.2 0.0 0.3 0.0 0.3
Pth -1.0
-0.2
0.3 0.6 1.6 2.0
[C,0.1]
[G,0.2] [A,0.3]
[T,0.4]
C.Pizzi, DEI – Univ. Of Padova (Italy)
AC-Extension by LA Second position
1 2 3 4 5 6
A 0.3 0.0 0.1 0.2 1.0 0.3
C 0.1 0.8 0.5 0.2 0.0 0.4
G 0.2 0.0 0.4 0.3 0.0 0.0
T 0.4 0.2 0.0 0.3 0.0 0.3
Pth -1.0
-0.2
0.3 0.6 1.6 2.0
[C,0.1]
[G,0.2] [A,0.3]
[T,0.4]
[A,0.1]
[G,0.1] [T,0.3][C,0.9]
C.Pizzi, DEI – Univ. Of Padova (Italy)
AC-Extension by LA Third position
1 2 3 4 5 6
A 0.3 0.0 0.1 0.2 1.0 0.3
C 0.1 0.8 0.5 0.2 0.0 0.4
G 0.2 0.0 0.4 0.3 0.0 0.0
T 0.4 0.2 0.0 0.3 0.0 0.3
Pth -1.0
-0.2
0.3 0.6 1.6 2.0
[C,0.1]
[G,0.2] [A,0.3]
[T,0.4]
[A,0.1]
[G,0.1] [T,0.3][C,0.9]
[G,0.5] [C,0.6]
C.Pizzi, DEI – Univ. Of Padova (Italy)
ACE Example
CGTACACTCGGTA
g t a c
cg g
t
t
a c
ac
7
Match at p-m+1 = 7-6+1=2
C.Pizzi, DEI – Univ. Of Padova (Italy)
Minimum Gain for ACE Dual Concept of look-ahead Compute for every prefix the
minimum contribution of the remaining positions in the pattern
If current_score(i) + min_gain(i) > Th Report a match
Adv: in the automaton save a full subtree of height m-i
C.Pizzi, DEI – Univ. Of Padova (Italy)
Example: M0003, MSS=0.85
[G,18500]
[C,37000]
[C,55500]
• GCC is sufficient to detect a match
h=3
C.Pizzi, DEI – Univ. Of Padova (Italy)
Example: M0003, MSS=0.85
[G,18500]
[C,37000]
[C,55500]
hm
i
iA1
||• Save 5464 nodes out of 5468
h=3
C.Pizzi, DEI – Univ. Of Padova (Italy)
Look-ahead Filtration Compute the scores for all words of fixed
length k and store them O(|Σ|k) pre-processing
Sliding window of size k When score ≥ Pth[k], check remaining symbols
with LA (up to m-k)
O(n + (m -k)r) search; k is the prefix length, r is avg number of full scoring
C.Pizzi, DEI – Univ. Of Padova (Italy)
Lookahaed Filtration ExampleK=3
SCORE
AAA
0.4
... ...
ATT 0.5
CAA 0.2
... ...
CGT 0.1
CTT 0.3
GAA
0.3
... ...
GTA 0.5
... ...
GTT 0.4
TAA 0.5
... ...
TTT 0.6
Pth[3]=0.3
CGTACACTCGGTA
Score(CGT) = 0.1 < Pth[3]
Shift and concatenate to obtain thenext 3-mer
|Σ|k
entries
C.Pizzi, DEI – Univ. Of Padova (Italy)
Filtered Lookahaed Example
K=3
SCORE
AAA
0.4
... ...
ATT 0.5
CAA 0.2
... ...
CGT 0.1
CTT 0.3
GAA
0.3
... ...
GTA 0.5
... ...
GTT 0.4
TAA 0.5
... ...
TTT 0.6
Pth[3]=0.3
CGTACACTCGGTA
Score(GTA) = 0.5 > Pth[3]
Check at most m-k remaining symbols
Score(GTAC) = 0.7 > Pth[4]Score(GTACA) = 1.7 > Pth[5]Score(GTACAC) = 2.1 > th
Match!
|Σ|k
entries
C.Pizzi, DEI – Univ. Of Padova (Italy)
More on ACE and LF It is possible to combine both
methods Automaton build on qualifying
prefixes only Multi-matrix version
C.Pizzi, DEI – Univ. Of Padova (Italy) 67
Super-Alphabet Code words of length k to super-
alphabet symbols |Σ|k symbols are needed
Code the matrix M into matrix M’ (|Σ|k x m/k)
Run the naive algorithm on the sequence O(nm/k)
C.Pizzi, DEI – Univ. Of Padova (Italy)
SuperAlphabet ExampleK=2 SCORE 1-2 SCORE 3-4 SCORE 5-6
AA 0.3 0.3 1.3
AC 1.1 0.3 1.4
AG 0.3 0.4 1.0
AT 0.3 0.4 1.3
CA 0.1 0.7 0.3
CC 0.9 0.7 0.4
CG 0.1 0.8 0.0
CT 0.3 0.8 0.3
GA 0.2 0.6 0.3
GC 1.0 0.6 0.4
GG 0.2 0.7 0.0
GT 0.4 0.7 0.3
TA 0.4 0.2 0.3
TC 1.2 0.2 0.4
TG 0.4 0.3 0.0
TT 0.6 0.3 0.3
CGTACACTCGGTA
Score = 0.6 < Th
|Σ|k
entries
C.Pizzi, DEI – Univ. Of Padova (Italy)
SuperAlphabet ExampleK=2 SCORE 1-2 SCORE 3-4 SCORE 5-6
AA 0.3 0.3 1.3
AC 1.1 0.3 1.4
AG 0.3 0.4 1.0
AT 0.3 0.4 1.3
CA 0.1 0.7 0.3
CC 0.9 0.7 0.4
CG 0.1 0.8 0.0
CT 0.3 0.8 0.3
GA 0.2 0.6 0.3
GC 1.0 0.6 0.4
GG 0.2 0.7 0.0
GT 0.4 0.7 0.3
TA 0.4 0.2 0.3
TC 1.2 0.2 0.4
TG 0.4 0.3 0.0
TT 0.6 0.3 0.3
CGTACACTCGGTA
Score = 2.1 match!
|Σ|k
entries
C.Pizzi, DEI – Univ. Of Padova (Italy)
Experiments Jaspar Database: 123 TFBS
matrices (DNA), PRINTS database (proteins)
Test sequence about 50M bases P-value defines threshold 3 GHz Intel Pentium IV processor
with 2 gigabytes of main memory, running under Linux.
C.Pizzi, DEI – Univ. Of Padova (Italy)
Conclusions Searching matrix is a core step for
many bioinformatics applications (searching, discovery, classification…)
Several approaches have been developed in recent years
Online methods based on filtering are currently the most efficient
C.Pizzi, DEI – Univ. Of Padova (Italy) 78
References C.Pizzi, P.Rastas, E.Ukkonen
Fast Search Algorithms for Position Specific Scoring Matrices In Proc. of the 1st Conference on Bioinformatics Research and Development (BIRD 07), Berlin, Germany, March 2007, LNCS/LCBI 4414 pp 239--250
C.Pizzi, E.UkkonenFast Profile Matching Algorithms - a survey Theoretical Computer Science, 395(2-3), 2008, pp 137--157, Special Issue SAIL: String Algorithms, Information and Learning
C.Pizzi, P.Rastas, E.UkkonenFinding significant matches of position weight matrices in linear time Accepted for publication by IEEE Transaction on Computational Biology and Bioinformatics, 2009
J.Korhonen, P.Martinmaki, C.Pizzi, P.Rastas, E.Ukkonen MOODS: fast search for position weight matrix matches in DNA sequences Bioinformatics 2009 25(23):3181-3182
C.Pizzi, DEI – Univ. Of Padova (Italy) 79
Acknowledgements Esko Ukkonen, Pasi Rastas, Janne
Korhonen, P.Martinmaki Academy of Finland grant “From Data
to knowledge” EU Project “Regulatory Networks”
Premio di Ricerca `Avere Trent’Anni’ Univ.Padova, Parco Scientifico Galileo,
Il Mattino, Giovani Confindustria, Scuola Galileiana di Studi SuperioriC.Pizzi, DEI – Univ. Of Padova (Italy)
Length 100
NA = Naïve AlgorithmLSA = Look-ahead Search AlgorithmLFA = Look-ahead Filter Algorithm (k=7)NS = Naïve Superalphabet (k=7)
• 13 patterns obtained by concateneting Jaspar matrices
• MSS: Matrix Similarity Score (% of maximal score)
C.Pizzi, DEI – Univ. Of Padova (Italy)
Length 0 to 15 (108 matrices)
NA = Naïve AlgorithmLSA = Look-ahead Search AlgorithmACE = Aho-Corasick ExpansionLFA = Look-ahead Filtration Algorithm (k=7)NS = Naïve Super-alphabet (k=7)
C.Pizzi, DEI – Univ. Of Padova (Italy)
Length 16 to 30 (15 matrices)
NA = Naïve AlgorithmLSA = Look-ahead Search AlgorithmLFA = Look-ahead Filtration AlgorithmNS = Naïve Super-alphabet
C.Pizzi, DEI – Univ. Of Padova (Italy)
Length 100
NA = Naïve AlgorithmLSA = Look-ahead Search AlgorithmLFA = Look-ahead Filter Algorithm (k=7)NS = Naïve Superalphabet (k=7)
• 13 patterns obtained by concateneting Jaspar matrices
P=10-5 P=10-4 P=10-3 P=10-2
NA 10.234 10.244 10.434 11.080
LSA 11.835 12.675 13.335 15.118
LFA 9.955 10.347 11.096 12.965
NS 3.576 3.677 4.593 9.918
C.Pizzi, DEI – Univ. Of Padova (Italy)