Spaced seeds: a brief presentation and some prospects
1/30
Graines espacées : un rapide tour d’horizon et quelques perspectives
Spaced seeds: a brief presentation and some prospects
Laurent Noé
CRIStAL (UMR 9189 Lille/CNRS) - Inria Lille, Villeneuve d’Ascq, France
LIX/LRI seminar - AMIB
12 May 2016 - Palaiseau
Laurent Noe Graines espacees : un rapide tour d’horizon et quelques perspectives
2/30
Outline
1 Spaced seeds
    Definition
    Seed shape and problems involved
    Bad news, approximations
2 Recent work
    Alignment-free distances, Classification
    Seed Coverage
    Automata and Semi-rings
3 Parameter-free models
    Principle
    Perspectives [your work here]
3/30
Sequence alignment
11111011011111011111
ATCAGTGCGAATGCGCAAGA
|||||:||:|||||.|||||
ATCAGCGCAAATGCTCAAGA
4/30
Spaced Seeds [Burkhardt and Kärkkäinen, 2001; Ma et al., 2002]
Definition
A spaced seed π is a word over the binary alphabet {1, *}:
1 : accepts only the match symbol | (encoded as 1) ← must match
* : accepts any alignment symbol (1 or 0) ← don’t care
s : span (the length of π, |π|); w : weight (the number of 1 symbols in π, |π|_1)
Example
π = 111*1*11 (span s = 8, weight w = 6)
11111011011111011111
ATCAGTGCGAATGCGCAAGA
|||||:||:|||||.|||||
ATCAGCGCAAATGCTCAAGA
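As an illustration of the definition, here is a minimal Python sketch that encodes a gapless alignment as a binary word and lists the offsets where a seed hits it. The function names (alignment_string, span, weight, hits) are hypothetical helpers introduced for this example, and the 0-based offsets are an assumption, not a convention taken from the slides.

```python
def alignment_string(top: str, bottom: str) -> str:
    """Encode a gapless alignment as a binary word: 1 = match, 0 = mismatch."""
    if len(top) != len(bottom):
        raise ValueError("a gapless alignment needs two sequences of equal length")
    return "".join("1" if x == y else "0" for x, y in zip(top, bottom))


def span(seed: str) -> int:
    """Span s: total length of the seed."""
    return len(seed)


def weight(seed: str) -> int:
    """Weight w: number of must-match (1) symbols in the seed."""
    return seed.count("1")


def hits(seed: str, alignment: str) -> list[int]:
    """Return the 0-based offsets at which the seed hits the alignment.

    The seed hits at offset p when every '1' of the seed faces a '1' (match)
    of the alignment; '*' positions accept both 0 and 1.
    """
    s = len(seed)
    return [
        p
        for p in range(len(alignment) - s + 1)
        if all(c == "*" or a == "1" for c, a in zip(seed, alignment[p:p + s]))
    ]


if __name__ == "__main__":
    aln = alignment_string("ATCAGTGCGAATGCGCAAGA", "ATCAGCGCAAATGCTCAAGA")
    print(aln)                     # 11111011011111011111
    print(hits("111*1*11", aln))   # [0, 9, 11]
```

On the example alignment, π = 111*1*11 hits at offsets 0, 9 and 11 under this encoding.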
5/30
Example
ATCAGTGCAAATGCTCAAGA
||||||||||||||||||||
ATCAGTGCAAATGCTCAAGA
[Figure: the contiguous seed 111111 and the spaced seed 111*1*11, each drawn at successive offsets along this alignment.]
Example (one mismatch)
ATCAGTGCAAATGCTCAAGA
|||||:||||||||||||||
ATCAGCGCAAATGCTCAAGA
[Figure: the contiguous seed 111111 and the spaced seed 111*1*11, each drawn at successive offsets along this alignment.]
Example (two mismatches)
ATCAGTGCAAATGCGCAAGA
|||||:||||||||.|||||
ATCAGCGCAAATGCTCAAGA
[Figure: the contiguous seed 111111 and the spaced seed 111*1*11, each drawn at successive offsets along this alignment.]
Example (three mismatches)
ATCAGTGCGAATGCGCAAGA
|||||:||:|||||.|||||
ATCAGCGCAAATGCTCAAGA
[Figure: the contiguous seed 111111 and the spaced seed 111*1*11, each drawn at successive offsets along this alignment.]
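The successive alignments of this example can be replayed with a short Python sketch that counts, for each stage, how many offsets give a hit for the contiguous seed and for the spaced seed. The helper names (count_hits, compare) and the PAIRS list are assumptions made for the illustration; only the sequences and the two seeds come from the slides.

```python
# The four stages of the example: (top sequence, bottom sequence).
PAIRS = [
    ("ATCAGTGCAAATGCTCAAGA", "ATCAGTGCAAATGCTCAAGA"),  # no mismatch
    ("ATCAGTGCAAATGCTCAAGA", "ATCAGCGCAAATGCTCAAGA"),  # one mismatch
    ("ATCAGTGCAAATGCGCAAGA", "ATCAGCGCAAATGCTCAAGA"),  # two mismatches
    ("ATCAGTGCGAATGCGCAAGA", "ATCAGCGCAAATGCTCAAGA"),  # three mismatches
]


def count_hits(seed: str, top: str, bottom: str) -> int:
    """Count the offsets where every '1' of the seed faces a matching column."""
    alignment = "".join("1" if x == y else "0" for x, y in zip(top, bottom))
    s = len(seed)
    return sum(
        all(c == "*" or alignment[p + i] == "1" for i, c in enumerate(seed))
        for p in range(len(alignment) - s + 1)
    )


def compare(seeds=("111111", "111*1*11")):
    """For each stage, return the hit counts of the given seeds, in order."""
    return [
        tuple(count_hits(seed, top, bottom) for seed in seeds)
        for top, bottom in PAIRS
    ]


if __name__ == "__main__":
    for stage, (contiguous, spaced) in enumerate(compare()):
        print(f"{stage} mismatch(es): 111111 -> {contiguous} hits, "
              f"111*1*11 -> {spaced} hits")
```

Under these assumptions the contiguous seed drops from 15 hits to none once the three mismatches are present, while the spaced seed of the same weight still has 3 hits.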
6/30
Seed shape and problems involved
π = 111*1*11
Seed shape (the placement of the * symbols among the w 1 symbols) is essential!
Two problems are commonly considered when searching for the best shape(s):
1 lossless problems [Burkhardt and Kärkkäinen, 2001]
    every alignment of length m with at most k mismatches is found by at least one hit of π
    → π is (m, k)-lossless
    Example
    111*1*11 is (m = 20, k = 3)-lossless . . .
    but . . . 11*1****11*1 is (m = 19, k = 3)-lossless (better?)
2 lossy problems [Ma et al., 2002]
    if all possible alignments of length m are generated by a probabilistic model, compute the probability that π has at least one hit
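Both problems can be sketched in a few lines of Python. This is only an illustration under stated assumptions: is_lossless uses plain brute-force enumeration of mismatch positions (not the method of the cited papers), and hit_probability assumes the simplest probabilistic model, an i.i.d. Bernoulli model where each column is a match with probability p. Both function names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def is_lossless(seed: str, m: int, k: int) -> bool:
    """Brute-force check that the seed is (m, k)-lossless (assumes k <= m).

    Testing exactly k mismatches is enough: turning a mismatch back into a
    match can only add hits, never remove one.
    """
    ones = [i for i, c in enumerate(seed) if c == "1"]
    offsets = range(m - len(seed) + 1)
    for mismatches in combinations(range(m), k):
        bad = set(mismatches)
        if not any(all(p + o not in bad for o in ones) for p in offsets):
            return False  # this mismatch placement escapes every seed offset
    return True


def hit_probability(seed: str, m: int, p: float) -> float:
    """Probability of at least one hit on a random alignment of length m.

    Assumed model: independent columns, each a match with probability p.
    The dynamic program tracks the mass of hit-free prefixes, indexed by
    their last (span - 1) symbols.
    """
    s = len(seed)
    no_hit = {"": 1.0}
    for _ in range(m):
        nxt = defaultdict(float)
        for suffix, mass in no_hit.items():
            for symbol, prob in (("1", p), ("0", 1.0 - p)):
                window = suffix + symbol
                if len(window) == s and all(
                    c == "*" or w == "1" for c, w in zip(seed, window)
                ):
                    continue  # a hit ends here: drop this mass
                nxt[window[max(0, len(window) - (s - 1)):]] += mass * prob
        no_hit = nxt
    return 1.0 - sum(no_hit.values())


if __name__ == "__main__":
    print(is_lossless("111*1*11", 20, 3))
    print(is_lossless("11*1****11*1", 19, 3))
    print(hit_probability("111*1*11", 64, 0.7))
```

The brute-force check costs one pass over all C(m, k) mismatch placements, which is fine for the slide's values (m = 20, k = 3) but grows quickly with m and k.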
7/30
Seed shape and problems involved
2 lossy problems [Ma et al., 2002]
if all possible alignments of length m are generated by a
probabilistic model, compute the probability that π has at least one hit
Example
if alignments of length m = 20
are generated by a Bernoulli model where
{P(1) = 0.7, P(0) = 0.3},
then the probability that 111*1*11 hits is
0.630869
⇒ 111*1*11 is the best seed among all seeds of weight w = 6
But does it make sense to set {P(1) = 0.7, P(0) = 0.3} and m = 20?
8/30
Seed shape and problems involved
Does it make sense to set {P(1) = 0.7, P(0) = 0.3} and m = 20?
[Figure plot_n1_w6_s12_bernoulli_python_maxima.pdf: horizontal axis = match probability (0.0 to 1.0), vertical axis = alignment length (10 to 60); each region is labelled with the number of a seed from the key below]
0 = 111111, 1 = 111*111, 2 = 11*1*111, 3 = 11*1111, 4 = 11*1**111, 5 = 1*11****1*11, 6 = 1*11*111, 7 = 11***1*1*11, 8 = 1**1*1*111
9/30
Problems involved: some bad news
2 lossy problems
Computing the hit probability of a given seed is NP-hard
(there exists a probabilistic PTAS based on sampling)
[Ma and Li, 2007]
Probability [Keich et al., 2004, Choi and Zhang, 2004, Buhler et al., 2005, Kucherov et al., 2006, Kong, 2007, Mak and Benson, 2009]
⇒ Overlap Complexity & Variance [Ilie and Ilie, 2007, Yang and Zhang, 2008, Ilie et al., 2011, Morgenstern et al., 2015, Do and Tran-Thi, 2015, Gheraibia et al., 2015, Hahn et al., 2015]
The i.i.d. seed optimization problem is at least as hard as optimal Golomb ruler design
[Ma and Yao, 2009]
(Many) heuristics (hill-climbing [Buhler et al., 2005], quadratic residues [Egidi and Manzini, 2013] ...)
1 lossless problems
Non-Detection¹ is NP-complete [Nicolas and Rivals, 2008]
¹ Given a seed π and integers m, k, does there exist an (m, k) similarity that is not detected?
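Sampling gives a simple way to approximate the hit probability when exact computation is too costly. The sketch below is a plain Monte Carlo estimate under a Bernoulli model, not the PTAS of [Ma and Li, 2007]; the function name, the default number of samples and the fixed random seed are assumptions made for the illustration.

```python
import random


def estimate_hit_probability(seed, m, p, samples=20000, rng=None):
    """Monte Carlo estimate of the probability that `seed` hits a random
    alignment of length m whose columns match with probability p."""
    rng = rng or random.Random(0)  # fixed seed so that runs are repeatable
    offsets = [i for i, c in enumerate(seed) if c == "1"]
    span = len(seed)
    hits = 0
    for _ in range(samples):
        # Draw one random alignment: True = match, False = mismatch.
        x = [rng.random() < p for _ in range(m)]
        if any(all(x[s + o] for o in offsets) for s in range(m - span + 1)):
            hits += 1
    return hits / samples


if __name__ == "__main__":
    print(estimate_hit_probability("111*1*11", 20, 0.7))
```

The standard error of the estimate decreases as one over the square root of `samples`, so each extra digit of precision costs about a hundred times more samples.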
10/30
Recent work related to spaced seeds
1 Alignment-free distances [Leimeister et al., 2014, Horwege et al., 2014, Boden et al., 2013]
2 SVM classification [Onodera and Shibuya, 2013, Ghandi et al., 2014]
3 Read clustering [Bao et al., 2011, Chong et al., 2012, Hauser et al., 2013]
4 Metagenomic classification, ... [Brinda et al., 2015, Ounit and Lonardi, 2015]
11/30
“New Uses for Old Things”
little boy ⇒ frying pan²
ATCAGTGCGAATGCGCAAGA
|||||:||:|||||.|||||
ATCAGCGCAAATGCTCAAGA
111*1*11
⇒
ATCAGTGCGAATGCGCAAGA
|||||:||:|||||.|||||
ATCAGCGCAAATGCTCAAGA
111*1*11
         111*1*11
           111*1*11
² http://arch5541.wordpress.com/2012/11/16/and-then-there-was-teflon/
12/30
Coverage measure for a seed
Definition
Number of matches covered by at least one 1 symbol from any seed hit
[Benson and Mak, 2008, Noe and Martin, 2014, Martin and Noe, 2015]
Example
ATCAGTGCGAATGCGCAAGA
|||||:||:|||||.|||||
ATCAGCGCAAATGCTCAAGA
111*1*11
         111*1*11
           111*1*11
Coverage is 15
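The coverage of the example can be recomputed directly from the two sequences. This is a small sketch; the helper names `alignment_string` and `coverage` are hypothetical, and the alignment is assumed to be gap-free as in the example.

```python
def alignment_string(seq_a, seq_b):
    """Binary alignment: '1' for a match, '0' for a mismatch (no gaps)."""
    return "".join("1" if a == b else "0" for a, b in zip(seq_a, seq_b))


def coverage(seed, alignment):
    """Number of match positions covered by a '1' of at least one seed hit."""
    offsets = [i for i, c in enumerate(seed) if c == "1"]
    covered = set()
    for start in range(len(alignment) - len(seed) + 1):
        if all(alignment[start + o] == "1" for o in offsets):
            # Only the must-match positions of the hit count as covered.
            covered.update(start + o for o in offsets)
    return len(covered)


if __name__ == "__main__":
    x = alignment_string("ATCAGTGCGAATGCGCAAGA", "ATCAGCGCAAATGCTCAAGA")
    print(x)                        # 11111011011111011111
    print(coverage("111*1*11", x))  # 15
```

The seed hits at three positions, and the union of their must-match columns contains 15 of the 17 matches.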
13/30
Coverage measure for a seed / a set of seeds
alignment : x = 101111001011111
Example
seed : π = 11*1
π occ1       1 1 * 1
π occ2                       1 1 * 1
π occ3                         1 1 * 1
x =      1 0 1 1 1 1 0 0 1 0 1 1 1 1 1
covered      • •   •         • • • • •
set of seeds : {π1, π2} = {11*1, 1*1*1}
π2 occ1  1 * 1 * 1
π1 occ2      1 1 * 1
π2 occ3                  1 * 1 * 1
π1 occ4                      1 1 * 1
π2 occ5                      1 * 1 * 1
π1 occ6                        1 1 * 1
x =      1 0 1 1 1 1 0 0 1 0 1 1 1 1 1
covered  •   • • • •     •   • • • • •
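For a set of seeds, the covered positions are the union over all occurrences of all seeds. The sketch below lists the occurrences and the covered positions for the example above; the function names are hypothetical and positions are 0-based, unlike the 1-based reading of the slide.

```python
def occurrences(seeds, alignment):
    """All (start, seed) pairs such that `seed` hits `alignment` at `start`."""
    found = []
    for seed in seeds:
        offsets = [i for i, c in enumerate(seed) if c == "1"]
        for start in range(len(alignment) - len(seed) + 1):
            if all(alignment[start + o] == "1" for o in offsets):
                found.append((start, seed))
    return sorted(found)


def covered_positions(seeds, alignment):
    """Sorted 0-based positions covered by a '1' of at least one occurrence."""
    covered = set()
    for start, seed in occurrences(seeds, alignment):
        covered.update(start + i for i, c in enumerate(seed) if c == "1")
    return sorted(covered)


if __name__ == "__main__":
    x = "101111001011111"
    print(len(covered_positions(["11*1"], x)))           # 8
    print(len(covered_positions(["11*1", "1*1*1"], x)))  # 11
```

Adding the second seed raises the coverage from 8 to 11 on this alignment, with three occurrences of each seed.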
Coverage measure for a seed / a set of seeds
{π1, π2} = {11*1, 1*1*1}
[Figure: automaton for {π1, π2}. States are labelled with binary words (for example 110,1010 or 01101,10101,11101, where • marks covered positions); transitions are labelled 0 or 1, some with an increment from (+1) to (+4)]
14/30
Problems involved and Semi-rings
2 lossy problems (... probability for π to have at least one hit)
Bernoulli model: ⊕ = +, ⊗ = ×, {w(1) = Pr(1), w(0) = Pr(0)} ⇒ Probability semi-ring
1 lossless problems (... π finds all length m alignments with k errors)
(m, k)-lossless?: ⊕ = min, ⊗ = +, {w(1) = 0, w(0) = 1} ⇒ Tropical (cost/score) semi-ring(s)
0 other (own) problems: e.g. Pearson / Spearman correlation ...
...: ⊕ = +, ⊗ = ×, {w(1) = 1, w(0) = 1} ⇒ Counting semi-ring
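One dynamic program can serve the three settings listed above once the operations ⊕ and ⊗ are passed as parameters. The sketch below makes an assumption of its own: it aggregates over the alignments that the seed does not detect (the complement of the hit set), and the names `Semiring` and `undetected_weight` are hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class Semiring:
    plus: Callable[[Any, Any], Any]   # the operation written ⊕ above
    times: Callable[[Any, Any], Any]  # the operation written ⊗ above
    zero: Any                         # neutral element of plus
    one: Any                          # neutral element of times


PROBABILITY = Semiring(lambda a, b: a + b, lambda a, b: a * b, 0.0, 1.0)
TROPICAL = Semiring(min, lambda a, b: a + b, float("inf"), 0)
COUNTING = Semiring(lambda a, b: a + b, lambda a, b: a * b, 0, 1)


def undetected_weight(seed, m, sr, w1, w0):
    """⊕ over all length-m alignments NOT hit by `seed` of the ⊗ of their
    column weights (w1 for a match, w0 for a mismatch)."""
    span = len(seed)
    must = sum(1 << (span - 1 - i) for i, c in enumerate(seed) if c == "1")
    window = (1 << span) - 1
    table = {0: sr.one}  # state = last `span` columns, as a bitmask
    for length in range(1, m + 1):
        nxt = {}
        for state, value in table.items():
            for bit, w in ((1, w1), (0, w0)):
                new = ((state << 1) | bit) & window
                if length >= span and new & must == must:
                    continue  # seed hit: the alignment is detected
                nxt[new] = sr.plus(nxt.get(new, sr.zero), sr.times(value, w))
        table = nxt
    total = sr.zero
    for value in table.values():
        total = sr.plus(total, value)
    return total


if __name__ == "__main__":
    seed, m = "111*1*11", 20
    print(1.0 - undetected_weight(seed, m, PROBABILITY, 0.7, 0.3))  # hit probability
    print(undetected_weight(seed, m, TROPICAL, 0, 1))  # fewest mismatches that escape the seed
    print(undetected_weight(seed, m, COUNTING, 1, 1))  # number of undetected alignments
```

With the tropical semi-ring, the seed is (m, k)-lossless exactly when the returned minimum is greater than k.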
17/30
Parameter-free models
Does it make sense to set ...
0 ... {P(1) = 0.7, P(0) = 0.3} and m = 20?
1 ... {P(1) = p, P(0) = 1 − p} and m = 20?
p is a free variable, but it would be convenient to remove this dependency
⇒ p-polynomials roots (π1 vs π2 comparison) [Mak and Benson, 2009]
18/30
19/30
Parameter-free models
Does it make sense to set {P(1) = p, P(0) = 1 − p} and m = 20?
[Figure plot_n1_w6_s12_bernoulli_python_maxima.pdf: horizontal axis = match probability (0.0 to 1.0), vertical axis = alignment length (10 to 60); each region is labelled with the number of a seed from the key below]
0 = 111111, 1 = 111*111, 2 = 11*1*111, 3 = 11*1111, 4 = 11*1**111, 5 = 1*11****1*11, 6 = 1*11*111, 7 = 11***1*1*11, 8 = 1**1*1*111
m = 20: optimal seed, followed by the upper bound of its interval of match probability p
['111111'] 0.077330519411998251862651931755
['111*111'] 0.112233522767708459228220149449
['11*1111'] 0.436105861818672948722472931027
['11*1*111'] 0.988223467835828245511583556919
['11*1111'] 1.0
m = 20: hit probability of each seed as a polynomial in p
['111111']
15*p^6*(1-p)^(20-6) + 196*p^7*(1-p)^(20-7) + 1183*p^8*(1-p)^(20-8) + ... + 20*p^19*(1-p)^(20-19) + 1*p^20
['111*111']
14*p^6*(1-p)^(20-6) + 196*p^7*(1-p)^(20-7) + 1261*p^8*(1-p)^(20-8) + ... + 20*p^19*(1-p)^(20-19) + 1*p^20
['11*1111']
14*p^6*(1-p)^(20-6) + 196*p^7*(1-p)^(20-7) + 1261*p^8*(1-p)^(20-8) + ...
...
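The coefficients of these polynomials count, for each number j of matches, the alignments of length m with exactly j matches that the seed hits. A minimal sketch of that count follows; the function names are hypothetical, and the dynamic program is one possible way to obtain the coefficients, not necessarily the one used for the slides.

```python
from math import comb


def hit_counts(seed, m):
    """counts[j] = number of binary alignments of length m with exactly
    j matches that contain at least one hit of `seed`."""
    span = len(seed)
    must = sum(1 << (span - 1 - i) for i, c in enumerate(seed) if c == "1")
    window = (1 << span) - 1
    # missed[(state, j)] = number of hit-free prefixes with j matches whose
    # last columns are encoded in `state`.
    missed = {(0, 0): 1}
    for length in range(1, m + 1):
        nxt = {}
        for (state, j), n in missed.items():
            for bit in (1, 0):
                new = ((state << 1) | bit) & window
                if length >= span and new & must == must:
                    continue  # seed hit: not a hit-free prefix any more
                key = (new, j + bit)
                nxt[key] = nxt.get(key, 0) + n
        missed = nxt
    # Hit alignments = all alignments with j matches minus the hit-free ones.
    counts = [comb(m, j) for j in range(m + 1)]
    for (_, j), n in missed.items():
        counts[j] -= n
    return counts


def evaluate(counts, p):
    """Value of the polynomial sum_j counts[j] * p^j * (1-p)^(m-j)."""
    m = len(counts) - 1
    return sum(c * p ** j * (1 - p) ** (m - j) for j, c in enumerate(counts))


if __name__ == "__main__":
    for seed in ("111111", "111*111", "11*1111"):
        print(seed, hit_counts(seed, 20)[6:9])
```

Two seeds can then be compared for every p at once: the values of p where one seed overtakes the other are the roots in (0, 1) of the difference of their polynomials.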
Parameter-free models
Does it make sense to set ...
0 ... {P(1) = 0.7, P(0) = 0.3} and m = 20?
1 ... {P(1) = p, P(0) = 1 − p} and m = 20?
p is a free variable, but it would be convenient to remove this dependency
⇒ p-polynomials roots [Mak and Benson, 2009]
2 ... {P(1) = 0.7, P(0) = 0.3} and m = n or ∞?
∞ : [Choi and Zhang, 2004, Buhler et al., 2005, Zhang, 2007]
20/30
Parameter-free models
Does it make sense to set . . .
0 . . . {P(1) = 0.7,P(0) = 0.3} and m = 20 ?
1 . . . {P(1) = p,P(0) = 1− p} and m = 20 ?p is a free variable, but it would be convenient to remove this dependency
⇒ p-polynomials roots [Mak and Benson, 2009]
2 . . . {P(1) = 0.7,P(0) = 0.3} and m = n or ∞ ?∞ : [Choi and Zhang, 2004, Buhler et al., 2005, Zhang, 2007]
20/30
21/30
Parameter-free models
Does it make sense to set {P(1) = 0.7, P(0) = 0.3} and m = n or ∞?
[Figure: regions of the (match probability, alignment length) plane labelled with seed numbers 0 to 8; x-axis: match probability, 0.0 to 1.0; y-axis: alignment length, 10 to 60.]
Seed legend: 0 = 111111, 1 = 1111*111, 2 = 11*1*111, 3 = 11*1111, 4 = 11*1**111, 5 = 1*11****1*11, 6 = 1*11*111, 7 = 11***1*1*11, 8 = 1**1*1*111
1 - Pr(π) → β × λ^n
π = 111111         λ = 0.95284   β = 0.54942
π = 1*1*1*1*1*1    λ = 0.95284   β = 0.30186
π = 111*1*11       λ = 0.93306   β = 0.47946
π = 11**1*1*11     λ = 0.92295   β = 0.30998
π = 11*1****11*1   λ = 0.93218   β = 0.26931
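The decay rate λ in 1 - Pr(π) → β × λ^n can be estimated numerically. The sketch below assumes the Bernoulli model with P(1) = 0.7 from the question above and tracks the probability of having no hit with a small dynamic program over the last columns of the alignment; λ is then approximated by the ratio of two consecutive no-hit probabilities. The names `no_hit_probabilities` and `decay_rate` are hypothetical, the seed is assumed to start with a 1, and β is not computed because its normalisation is not specified on the slide.

```python
def no_hit_probabilities(seed, p, n_max):
    """Return [f(0), ..., f(n_max)], f(n) = Pr(no hit of `seed` in n columns).

    Columns are i.i.d. with P(match) = p. Assumes the seed starts with '1',
    so padding the history with mismatches cannot create a spurious hit.
    """
    span = len(seed)
    # Bit j of a window is the column j steps back; bit 0 is the newest column.
    must = sum(1 << j for j, c in enumerate(reversed(seed)) if c == "1")
    keep = (1 << (span - 1)) - 1      # mask keeping the last span-1 columns
    alive = {0: 1.0}                  # history -> probability, no hit so far
    out = [1.0]
    for _ in range(n_max):
        nxt = {}
        for hist, pr in alive.items():
            for bit, weight in ((1, p), (0, 1.0 - p)):
                window = (hist << 1) | bit
                if window & must == must:
                    continue          # the seed hits here: drop this mass
                key = window & keep
                nxt[key] = nxt.get(key, 0.0) + pr * weight
        alive = nxt
        out.append(sum(alive.values()))
    return out

def decay_rate(seed, p, n=400):
    """Approximate lambda by f(n) / f(n-1) for a large n."""
    f = no_hit_probabilities(seed, p, n)
    return f[n] / f[n - 1]

if __name__ == "__main__":
    for seed in ("111111", "111*1*11", "11**1*1*11"):
        print(seed, round(decay_rate(seed, 0.7), 5))
```

A smaller λ means the probability of missing an alignment vanishes faster as the alignment grows, which is how the spaced seeds in the list compare with the contiguous seed 111111.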
22/30
Parameter-free models
Does it make sense to set ...
0 ... {P(1) = 0.7, P(0) = 0.3} and m = 20?
1 ... {P(1) = p, P(0) = 1 − p} and m = 20?
    p is a free variable, but it would be convenient to remove this dependency
    ⇒ roots of the p-polynomials [Mak and Benson, 2009]
2 ... {P(1) = 0.7, P(0) = 0.3} and m = n or ∞?
    ∞: [Choi and Zhang, 2004, Buhler et al., 2005, Zhang, 2007]
3 ... {P(1) = p, P(0) = 1 − p} and m = n or ∞???
23/30
Parameter-free models [your work here]
Does it make sense to set {P(1) = p, P(0) = 1 − p} and m = n or ∞?
π = 11*1
[Figure: automaton for π = 11*1 with six states q1 (start) to q6; edges labelled 0 and 1, including one edge labelled 0,1.]
\[
\mathrm{seed}(x) = (2 - p)\,\mathrm{seed}(x-1) - (p^2 - 2p + 1)\,\mathrm{seed}(x-2) + (p^2 - p)\,\mathrm{seed}(x-3) + (p^4 - 2p^3 + p^2)\,\mathrm{seed}(x-4) + (p^5 - 3p^4 + 3p^3 - p^2)\,\mathrm{seed}(x-5) - (p^5 - 2p^4 + p^3)\,\mathrm{seed}(x-6)
\]
Chomsky & Schutzenberger ... (in fact ... mostly via Maple ...)
\[
q_6(z) := \frac{\bigl((p^2 - p)\,z - 1\bigr)\,p^3 z^4}{(1 - z)\,\bigl((p^5 - 2p^4 + p^3)\,z^5 + (p^4 - 2p^3 + p^2)\,z^4 + (-p^2 + p)\,z^2 + (1 - p)\,z - 1\bigr)}
\]
\[
q_1(z) + q_2(z) + q_3(z) + q_4(z) + q_5(z) + q_6(z) = \frac{1}{1 - z}
\]
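The recurrence for seed(x) can be checked against a direct computation. The sketch below assumes that seed(x) denotes the probability that π = 11*1 hits at least once within x i.i.d. columns with P(1) = p (an interpretation, consistent with the leading term p^3 z^4 of q6(z)), and uses exact rational arithmetic so that the comparison is an equality rather than a floating-point approximation. The function names are hypothetical, and the dynamic program works on the last three columns instead of the six-state automaton of the slide.

```python
from fractions import Fraction

def hit_probabilities(seed, p, n_max):
    """Return [seed(0), ..., seed(n_max)]: Pr(at least one hit in x columns)."""
    span = len(seed)
    # Bit j of a window is the column j steps back; bit 0 is the newest column.
    must = sum(1 << j for j, c in enumerate(reversed(seed)) if c == "1")
    keep = (1 << (span - 1)) - 1
    alive = {0: Fraction(1)}          # history -> probability, no hit so far
    out = [Fraction(0)]
    for _ in range(n_max):
        nxt = {}
        for hist, pr in alive.items():
            for bit, weight in ((1, p), (0, 1 - p)):
                window = (hist << 1) | bit
                if window & must == must:
                    continue          # hit: this mass leaves the no-hit set
                key = window & keep
                nxt[key] = nxt.get(key, 0) + pr * weight
        alive = nxt
        out.append(1 - sum(alive.values()))
    return out

def recurrence_rhs(s, x, p):
    """Right-hand side of the order-6 recurrence for pi = 11*1."""
    return ((2 - p) * s[x - 1]
            - (p**2 - 2*p + 1) * s[x - 2]
            + (p**2 - p) * s[x - 3]
            + (p**4 - 2*p**3 + p**2) * s[x - 4]
            + (p**5 - 3*p**4 + 3*p**3 - p**2) * s[x - 5]
            - (p**5 - 2*p**4 + p**3) * s[x - 6])

if __name__ == "__main__":
    p = Fraction(7, 10)               # any rational value of p can be used
    s = hit_probabilities("11*1", p, 30)
    print(all(s[x] == recurrence_rhs(s, x, p) for x in range(6, 31)))
```

The recurrence coefficients are those of the denominator of q6(z) once the factor (1 - z) is expanded, so the check applies from x = 6 onward.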
24/30
...
Collaborators: Donald E. K. Martin, Martin C. Frith, Gregory Kucherov, Mikhail A. Roytberg & Eugenia Furletova, Marta Gîrdea, Slawomir Lasota & Anna Gambin & Ewa Szczurek, Yann Ponty :-)
Image: https://c1.staticflickr.com/7/6169/6203268626_1c1fa2ff7a_b.jpg
25/30
References I
Bao, E., Jiang, T., Kaloshian, I., and Girke, T. (2011).
SEED: efficient clustering of next-generation sequences. Bioinformatics, 27(18):2502–2509.
Benson, G. and Mak, D. Y. (2008).
Exact distribution of a spaced seed statistic for DNA homology detection. In Amir, A., Turpin, A., and Moffat, A., editors, Proceedings of the 15th International Symposium on String Processing and Information Retrieval (SPIRE), Melbourne (Australia), volume 5280 of Lecture Notes in Computer Science, pages 282–293. Springer.
Boden, M., Schoneich, M., Horwege, S., Lindner, S., Leimeister, C., and Morgenstern, B. (2013).
Alignment-free sequence comparison with spaced k-mers. In Proceedings of the German Conference on Bioinformatics (GCB), volume 34 of OpenAccess Series in Informatics (OASIcs), pages 24–34.
Brinda, K., Sykulski, M., and Kucherov, G. (2015).
Spaced seeds improve metagenomic classification. Bioinformatics, 31(22):3584–3592.
Buhler, J., Keich, U., and Sun, Y. (2005).
Designing seeds for similarity search in genomic DNA. Journal of Computer and System Sciences, 70(3):342–363. (earlier version in RECOMB 2003).
26/30
References II
Burkhardt, S. and Karkkainen, J. (2001).
Better filtering with gapped q-grams. In Proceedings of the 12th Symposium on Combinatorial Pattern Matching (CPM), volume 2089 of Lecture Notes in Computer Science, pages 73–85. Springer.
Choi, K. P. and Zhang, L. (2004).
Sensitivity analysis and efficient method for identifying optimal spaced seeds. Journal of Computer and System Sciences, 68(1):22–40.
Chong, Z., Ruan, J., and Wu, C.-I. (2012).
Rainbow: an integrated tool for efficient clustering and assembling RAD-seq reads. Bioinformatics, 28(21):2732–2737.
Do, P.-T. and Tran-Thi, C.-G. (2015).
An improvement of the overlap complexity in the spaced seed searching problem between genomic DNAs. In Proceedings of the 2nd National Foundation for Science and Technology Development Conference on Information and Computer Science (NICS), Ho Chi Minh City, Vietnam, pages 271–276.
Egidi, L. and Manzini, G. (2013).
Better spaced seeds using quadratic residues. Journal of Computer and System Sciences, 79(7):1144–1155.
Ghandi, M., Lee, D., Mohammad-Noori, M., and Beer, M. A. (2014).
Enhanced regulatory sequence prediction using gapped k-mer features. PLoS Computational Biology, 10(7):e1003711.
27/30
References III
Gheraibia, Y., Moussaoui, A., Djenouri, Y., Kabir, S., Yin, P.-Y., and Mazouzi, S. (2015).
Penguin search optimisation algorithm for finding optimal spaced seeds. International Journal of Software Science and Computational Intelligence (IJSSCI), 7(2):85–99.
Hahn, L., Leimeister, C.-A., and Morgenstern, B. (2015).
RasBhari: optimizing spaced seeds for database searching, read mapping and alignment-free sequence comparison. arXiv.
Hauser, M., Mayer, C. E., and Soding, J. (2013).
kClust: fast and sensitive clustering of large protein sequence databases. BMC Bioinformatics, 14(248).
Horwege, S., Lindner, S., Boden, M., Hatje, K., Kollmar, M., Leimeister, C.-A., and Morgenstern, B. (2014).
Spaced words and kmacs: Fast alignment-free sequence comparison based on inexact word matches. Nucleic Acids Research, 42(W1):W7–W11.
Ilie, L. and Ilie, S. (2007).
Multiple spaced seeds for homology search. Bioinformatics, 23(22):2969–2977.
Ilie, L., Ilie, S., and Mansouri Bigvand, A. (2011).
SpEED: fast computation of sensitive spaced seeds. Bioinformatics, 27(17):2433–2434.
28/30
References IV
Keich, U., Li, M., Ma, B., and Tromp, J. (2004).
On spaced seeds for similarity search. Discrete Applied Mathematics, 138(3):253–263. (earlier version in 2002).
Kong, Y. (2007).
Generalized correlation functions and their applications in selection of optimal multiple spaced seeds for homology search. Journal of Computational Biology, 14(2):238–254.
Kucherov, G., Noe, L., and Roytberg, M. A. (2006).
A unifying framework for seed sensitivity and its application to subset seeds. Journal of Bioinformatics and Computational Biology, 4(2):553–569.
Leimeister, C.-A., Boden, M., Horwege, S., Lindner, S., and Morgenstern, B. (2014).
Fast alignment-free sequence comparison using spaced-word frequencies. Bioinformatics, 30(14):1991–1999.
Ma, B. and Li, M. (2007).
On the complexity of spaced seeds. Journal of Computer and System Sciences, 73(7):1024–1034.
Ma, B., Tromp, J., and Li, M. (2002).
PatternHunter: Faster and more sensitive homology search. Bioinformatics, 18(3):440–445.
29/30
References V
Ma, B. and Yao, H. (2009).
Seed optimization for i.i.d. similarities is no easier than optimal Golomb ruler design. Information Processing Letters, 109(19):1120–1124. (earlier version in APBC 2008).
Mak, D. Y. and Benson, G. (2009).
All hits all the time: parameter free calculation of seed sensitivity. Bioinformatics, 25(3):302–308. (earlier version in APBC 2007).
Martin, D. E. K. and Noe, L. (2015).
Faster exact distributions of pattern statistics through sequential elimination of states. Annals of the Institute of Statistical Mathematics, pages 1–18.
Morgenstern, B., Zhu, B., Horwege, S., and Leimeister, C.-A. (2015).
Estimating evolutionary distances between genomic sequences from spaced-word matches. Algorithms for Molecular Biology, 10(5).
Nicolas, F. and Rivals, E. (2008).
Hardness of optimal spaced seed design. Journal of Computer and System Sciences, 74(5):831–849. (earlier version in CPM 2005).
Noe, L. and Martin, D. E. K. (2014).
A coverage criterion for spaced seeds and its applications to support vector machine string kernels and k-mer distances. Journal of Computational Biology, 21(12):947–963.
30/30
References VI
Onodera, T. and Shibuya, T. (2013).
The gapped spectrum kernel for support vector machines. In Proceedings of the International Conference on Machine Learning and Data Mining in Pattern Recognition (MLDM), volume 7988 of Lecture Notes in Computer Science, pages 1–15. Springer.
Ounit, R. and Lonardi, S. (2015).
Higher classification accuracy of short metagenomic reads by discriminative spaced k-mers. In Proceedings of the 15th International Workshop on Algorithms in Bioinformatics (WABI), Atlanta, (USA), volume 9289 of Lecture Notes in Bioinformatics, pages 286–295. Springer.
Yang, J. and Zhang, L. (2008).
Run probability of high-order seed patterns and its applications to finding good transition seeds. In Brazma, A., Miyano, S., and Akutsu, T., editors, Proceedings of the 6th Asia Pacific Bioinformatics Conference (APBC), 14-17 January 2008, Kyoto, Japan, volume 6 of Advances in Bioinformatics and Computational Biology, pages 123–132. Imperial College Press.
Zhang, L. (2007).
Superiority of spaced seeds for homology search. IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB), 4(3):496–505.