Spaced seeds: a brief presentation and some prospects


Graines espacées : un rapide tour d’horizon et quelques perspectives

Spaced seeds: a brief presentation and some prospects

Laurent Noé

CRIStAL (UMR 9189 Lille / CNRS) - Inria Lille, Villeneuve d’Ascq, France

LIX/LRI seminar - AMIB

12 May 2016 - Palaiseau

Laurent Noé - Graines espacées : un rapide tour d’horizon et quelques perspectives


Outline

1 Spaced seeds
    Definition
    Seed shape and problems involved
    Bad news, approximations

2 Recent work
    Alignment-free distances, Classification
    Seed Coverage
    Automata and Semi-rings

3 Parameter-free models
    Principle
    Perspectives [your work here]


Sequence alignment

11111011011111011111

ATCAGTGCGAATGCGCAAGA
|||||:||:|||||.|||||
ATCAGCGCAAATGCTCAAGA
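As a minimal sketch, the binary line above can be derived from the two aligned sequences by writing 1 for a match and 0 for a mismatch. The function name `match_string` is an illustrative assumption, not something taken from the talk, and the sketch only handles ungapped alignments of equal length.

```python
def match_string(top: str, bottom: str) -> str:
    """Encode an ungapped alignment as a word over {1, 0}: 1 = match, 0 = mismatch."""
    if len(top) != len(bottom):
        raise ValueError("ungapped alignment: both sequences must have the same length")
    return "".join("1" if a == b else "0" for a, b in zip(top, bottom))


if __name__ == "__main__":
    # The two sequences of the slide; prints the binary word shown above them.
    print(match_string("ATCAGTGCGAATGCGCAAGA", "ATCAGCGCAAATGCTCAAGA"))
```

The transition/transversion distinction (`:` versus `.`) is deliberately dropped here, since the seed model on the following slides only looks at match versus mismatch.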


Spaced Seeds [Burkhardt and Kärkkäinen, 2001; Ma et al., 2002] ...

Definition

A spaced seed π is defined as a binary word over the alphabet {1, *} :

1 : accepts only the match symbol | (1) ← must match

* : accepts any alignment symbol (1 or 0) ← don’t care

s : span (|π|), w : weight (number of 1 symbols, |π|_1)

Example

π = 111*1*11

11111011011111011111

ATCAGTGCGAATGCGCAAGA
|||||:||:|||||.|||||
ATCAGCGCAAATGCTCAAGA
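The definition can be sketched in a few lines: a seed hits at a position when every 1 of the seed faces a 1 of the binary alignment word, while * positions are ignored. The helper names `span`, `weight` and `seed_hits` are assumptions made for illustration, and positions are 0-based.

```python
def span(seed: str) -> int:
    """Span s = |π|: total length of the seed."""
    return len(seed)


def weight(seed: str) -> int:
    """Weight w = |π|_1: number of must-match symbols."""
    return seed.count("1")


def seed_hits(seed: str, alignment: str) -> list:
    """Return the 0-based start positions where the seed hits the alignment.

    `alignment` is a word over {1, 0}; a hit requires a 1 in the alignment
    under every 1 of the seed, and '*' positions accept anything.
    """
    must_match = [j for j, c in enumerate(seed) if c == "1"]
    last_start = len(alignment) - len(seed)
    return [
        p
        for p in range(last_start + 1)
        if all(alignment[p + j] == "1" for j in must_match)
    ]


if __name__ == "__main__":
    seed = "111*1*11"
    alignment = "11111011011111011111"
    print(span(seed), weight(seed))    # span 8, weight 6
    print(seed_hits(seed, alignment))  # start positions of the hits
```

On the example alignment this sketch reports hits at positions 0, 9 and 11 for 111*1*11.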


Example

ATCAGTGCAAATGCTCAAGA
||||||||||||||||||||
ATCAGTGCAAATGCTCAAGA

[Figure: all placements of the contiguous seed 111111 and of the spaced seed 111*1*11 along this alignment, one row per position; here the alignment has no mismatch.]


Example

ATCAGTGCAAATGCTCAAGA
|||||:||||||||||||||
ATCAGCGCAAATGCTCAAGA

[Figure: all placements of the contiguous seed 111111 and of the spaced seed 111*1*11 along this alignment, one row per position; here the alignment has one mismatch.]


Example

ATCAGTGCAAATGCGCAAGA
|||||:||||||||.|||||
ATCAGCGCAAATGCTCAAGA

[Figure: all placements of the contiguous seed 111111 and of the spaced seed 111*1*11 along this alignment, one row per position; here the alignment has two mismatches.]


Example

ATCAGTGCGAATGCGCAAGA
|||||:||:|||||.|||||
ATCAGCGCAAATGCTCAAGA

[Figure: all placements of the contiguous seed 111111 and of the spaced seed 111*1*11 along this alignment, one row per position; here the alignment has three mismatches.]
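The comparison on this slide can be reproduced with a short sketch that counts, for each of the four alignment states of the example (zero to three mismatches), how many placements of each seed are hits. The helper names and the `STATES` list are illustrative assumptions; the sequences are copied from the slide.

```python
def match_string(top, bottom):
    """1 for a match, 0 for a mismatch (ungapped alignment)."""
    return "".join("1" if a == b else "0" for a, b in zip(top, bottom))


def count_hits(seed, alignment):
    """Number of start positions where every 1 of the seed faces a 1."""
    must_match = [j for j, c in enumerate(seed) if c == "1"]
    return sum(
        all(alignment[p + j] == "1" for j in must_match)
        for p in range(len(alignment) - len(seed) + 1)
    )


# The four successive states of the example: (top sequence, bottom sequence).
STATES = [
    ("ATCAGTGCAAATGCTCAAGA", "ATCAGTGCAAATGCTCAAGA"),  # no mismatch
    ("ATCAGTGCAAATGCTCAAGA", "ATCAGCGCAAATGCTCAAGA"),  # one mismatch
    ("ATCAGTGCAAATGCGCAAGA", "ATCAGCGCAAATGCTCAAGA"),  # two mismatches
    ("ATCAGTGCGAATGCGCAAGA", "ATCAGCGCAAATGCTCAAGA"),  # three mismatches
]


def hit_counts(seed):
    """Hit count of `seed` on each state, in the order of STATES."""
    return [count_hits(seed, match_string(top, bottom)) for top, bottom in STATES]


if __name__ == "__main__":
    for seed in ("111111", "111*1*11"):
        print(seed, hit_counts(seed))
```

Under these assumptions the contiguous seed goes from 15 hits to 9, 3 and then 0, while the spaced seed of the same weight goes from 13 hits to 9, 5 and 3: with three mismatches only the spaced seed still detects the alignment.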


Seed shape and problems involved

π = 111*1*11

The seed shape (the placement of the * symbols among the w 1 symbols) is essential!

Two problems are frequently considered when looking for the best shape(s):

1 Lossless problem [Burkhardt and Kärkkäinen, 2001]

all alignments of length m with at most k mismatches are found by at least one hit of π

→ π is (m, k)-lossless

Example

111*1*11 is (m = 20, k = 3)-lossless ...

but ... 11*1****11*1 is (m = 19, k = 3)-lossless (better?)

2 Lossy problem [Ma et al., 2002]

if all the possible alignments of length m are generated by a probabilistic model, compute the probability for π to have at least one hit
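Both problems can be illustrated with a small sketch. The lossless check below is a plain brute force over mismatch placements, not the method of the cited papers. The hit probability assumes the simplest probabilistic model, an i.i.d. Bernoulli model where each alignment position is a match with probability p (the slide only says "a probabilistic model"). The names `is_lossless` and `hit_probability` are hypothetical.

```python
from itertools import combinations


def has_hit(seed, mismatches, m):
    """True if the seed hits an alignment of length m whose 0s are `mismatches`."""
    must_match = [j for j, c in enumerate(seed) if c == "1"]
    return any(
        all((p + j) not in mismatches for j in must_match)
        for p in range(m - len(seed) + 1)
    )


def is_lossless(seed, m, k):
    """Brute-force (m, k)-lossless test.

    Only alignments with exactly k mismatches are enumerated: an alignment
    with fewer mismatches has more 1s, so it is hit whenever those are.
    """
    return all(has_hit(seed, set(c), m) for c in combinations(range(m), k))


def hit_probability(seed, m, p):
    """P(at least one hit) on a length-m alignment, Bernoulli(p) matches.

    Dynamic programming over the last span-1 symbols (most recent in bit 0),
    keeping only the probability mass of prefixes without any hit so far.
    """
    span = len(seed)
    required = sum(1 << (span - 1 - j) for j, c in enumerate(seed) if c == "1")
    keep = (1 << (span - 1)) - 1
    no_hit = {0: 1.0}
    for t in range(1, m + 1):
        nxt = {}
        for state, prob in no_hit.items():
            for bit, q in ((1, p), (0, 1.0 - p)):
                window = (state << 1) | bit
                if t >= span and (window & required) == required:
                    continue  # a hit occurs here: this mass leaves the no-hit set
                key = window & keep
                nxt[key] = nxt.get(key, 0.0) + prob * q
        no_hit = nxt
    return 1.0 - sum(no_hit.values())


if __name__ == "__main__":
    print(is_lossless("111*1*11", 20, 3))      # the first example of the slide
    print(is_lossless("11*1****11*1", 19, 3))  # the second example of the slide
    print(hit_probability("111*1*11", 20, 0.7))
```

The brute force agrees with both examples of the slide, and it also reports that 111*1*11 is not (19, 3)-lossless, which is why the second seed can be considered better. The enumeration grows as C(m, k), so it is only practical for small m and k.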

Laurent Noe Graines espacees : un rapide tour d’horizon et quelques perspectives
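The (m, k)-lossless definition above can be checked by brute force on small instances. The sketch below is only an illustration, not the method of [Burkhardt and Karkkainen, 2001]; the function names `seed_offsets` and `is_lossless` are made up here, and positions are 0-based.

```python
from itertools import combinations

def seed_offsets(seed):
    """Positions of the must-match symbols '1' inside the seed."""
    return [i for i, c in enumerate(seed) if c == "1"]

def is_lossless(seed, m, k):
    """Brute-force test: does the seed hit every alignment of length m
    that contains at most k mismatches?"""
    offsets = seed_offsets(seed)
    starts = range(m - len(seed) + 1)
    if not starts:
        return False  # seed longer than the alignment: no hit is possible
    # Adding a mismatch can only destroy hits, so testing exactly
    # min(k, m) mismatches also covers the cases with fewer mismatches.
    for mismatches in combinations(range(m), min(k, m)):
        bad = set(mismatches)
        if not any(all(s + o not in bad for o in offsets) for s in starts):
            return False  # this mismatch placement escapes every hit
    return True

print(is_lossless("111*1*11", 20, 3))
print(is_lossless("111*1*11", 19, 3))
```

For length 19, mismatches at positions 7, 8 and 11 leave no hit of 111*1*11, which is why the second call returns False. The enumeration grows as C(m, k), so this is only usable for small m and k.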


7/30

Seed shape and problems involved

2 Lossy problems [Ma et al., 2002]

if all possible alignments of length m are generated by a probabilistic model, compute the probability that π has at least one hit

Example

if alignments of length m = 20 are generated by a Bernoulli model with {P(1) = 0.7, P(0) = 0.3}, then the probability that 111*1*11 hits is 0.630869

⇒ 111*1*11 is the best seed among all seeds of weight w = 6

But does it make sense to set {P(1) = 0.7, P(0) = 0.3} and m = 20 ?
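The hit probability under a Bernoulli model can be computed exactly for short seeds with a small dynamic program over the last span − 1 alignment symbols. This is a minimal sketch under that assumption, not the algorithm of any cited paper; `hit_probability` is a name chosen here for illustration.

```python
from collections import defaultdict

def hit_probability(seed, m, p):
    """Probability that the seed hits at least once in a random alignment
    of length m where each column is a match ('1') with probability p."""
    span = len(seed)
    must = [i for i, c in enumerate(seed) if c == "1"]
    # states: last (span - 1) symbols read so far -> probability of
    # having read them with no hit yet
    states = {"": 1.0}
    for _ in range(m):
        nxt = defaultdict(float)
        for suffix, prob in states.items():
            for sym, q in (("1", p), ("0", 1.0 - p)):
                window = suffix + sym
                if len(window) == span and all(window[i] == "1" for i in must):
                    continue  # a hit ends here: this mass leaves the no-hit set
                key = window[-(span - 1):] if span > 1 else ""
                nxt[key] += prob * q
        states = nxt
    return 1.0 - sum(states.values())

print(hit_probability("111*1*11", 20, 0.7))
print(hit_probability("111111", 20, 0.7))
```

The first printed value can be compared with the 0.630869 quoted above, and the second with it to see the gain over the contiguous seed. The number of states is at most 2^(span − 1), so the sketch is practical only for short seeds.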


8/30

Seed shape and problems involved

Does it make sense to set {P(1) = 0.7, P(0) = 0.3} and m = 20 ?

[Figure plot_n1_w6_s12_bernoulli_python_maxima.pdf: horizontal axis, match probability (0.0 to 1.0); vertical axis, alignment length (10 to 60); the plane is divided into regions labelled with the seed indices listed below.]

0 = 111111, 1 = 111*111, 2 = 11*1*111, 3 = 11*1111, 4 = 11*1**111, 5 = 1*11****1*11, 6 = 1*11*111, 7 = 11***1*1*11, 8 = 1**1*1*111


9/30

Problems involved : some bad news

2 Lossy problems

Computing the probability for a given seed is NP-hard (there exists a probabilistic PTAS based on sampling) [Ma and Li, 2007]

Probability [Keich et al., 2004, Choi and Zhang, 2004, Buhler et al., 2005, Kucherov et al., 2006, Kong, 2007, Mak and Benson, 2009]

⇒ Overlap Complexity & Variance [Ilie and Ilie, 2007, Yang and Zhang, 2008, Ilie et al., 2011, Morgenstern et al., 2015, Do and Tran-Thi, 2015, Gheraibia et al., 2015, Hahn et al., 2015]

The i.i.d. seed optimization problem is at least as hard as optimal Golomb ruler design [Ma and Yao, 2009]

(Many) heuristics (hill-climbing [Buhler et al., 2005], quadratic residues [Egidi and Manzini, 2013] . . . )

1 Lossless problems

Non-Detection¹ is NP-complete [Nicolas and Rivals, 2008]

¹ Given a seed π and integers m, k, does there exist an (m, k) similarity that is not detected?


10/30

Recent work related to spaced seeds

1 Alignment-free distances [Leimeister et al., 2014, Horwege et al., 2014, Boden et al., 2013]

2 SVM classification [Onodera and Shibuya, 2013, Ghandi et al., 2014]

3 Read clustering [Bao et al., 2011, Chong et al., 2012, Hauser et al., 2013]

4 Metagenomic classification, . . . [Brinda et al., 2015, Ounit and Lonardi, 2015]

11/30

“New Uses for Old Things”

little boy ⇒ frying pan²

ATCAGTGCGAATGCGCAAGA
|||||:||:|||||.|||||
ATCAGCGCAAATGCTCAAGA
111*1*11

⇒

ATCAGTGCGAATGCGCAAGA
|||||:||:|||||.|||||
ATCAGCGCAAATGCTCAAGA
111*1*11
         111*1*11
           111*1*11

² http://arch5541.wordpress.com/2012/11/16/and-then-there-was-teflon/


12/30

Coverage measure for a seed

Definition

Number of matches covered by at least one 1 symbol from any seed hit [Benson and Mak, 2008, Noe and Martin, 2014, Martin and Noe, 2015]

Example

ATCAGTGCGAATGCGCAAGA
|||||:||:|||||.|||||
ATCAGCGCAAATGCTCAAGA
111*1*11
         111*1*11
           111*1*11

Coverage is 15
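The coverage value of the example can be recomputed directly from the two sequences. The sketch below is a plain illustration of the definition; the helper names `match_string`, `seed_hits` and `coverage` are made up here, and positions are 0-based.

```python
def match_string(seq_a, seq_b):
    """Turn a gapless pairwise alignment into its 1/0 (match/mismatch) word."""
    return "".join("1" if a == b else "0" for a, b in zip(seq_a, seq_b))

def seed_hits(seed, x):
    """Start positions where every '1' of the seed faces a match in x."""
    offsets = [i for i, c in enumerate(seed) if c == "1"]
    return [s for s in range(len(x) - len(seed) + 1)
            if all(x[s + o] == "1" for o in offsets)]

def coverage(seed, x):
    """Number of match positions covered by a '1' of at least one hit."""
    offsets = [i for i, c in enumerate(seed) if c == "1"]
    covered = {s + o for s in seed_hits(seed, x) for o in offsets}
    return len(covered)

x = match_string("ATCAGTGCGAATGCGCAAGA", "ATCAGCGCAAATGCTCAAGA")
print(x)                          # 11111011011111011111
print(seed_hits("111*1*11", x))   # [0, 9, 11]
print(coverage("111*1*11", x))    # 15
```

The three hits overlap, so the coverage (15) is smaller than three times the seed weight (18).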


13/30

Coverage measure for a seed / a set of seeds

alignment : x = 101111001011111

Example

seed : π = 11*1

π occ1        1 1 * 1
π occ2                        1 1 * 1
π occ3                          1 1 * 1
x =       1 0 1 1 1 1 0 0 1 0 1 1 1 1 1
covered       • •   •         • • • • •

set of seeds : {π1, π2} = {11*1, 1*1*1}

π2 occ1   1 * 1 * 1
π1 occ2       1 1 * 1
π2 occ3                   1 * 1 * 1
π1 occ4                       1 1 * 1
π2 occ5                       1 * 1 * 1
π1 occ6                         1 1 * 1
x =       1 0 1 1 1 1 0 0 1 0 1 1 1 1 1
covered   •   • • • •     •   • • • • •
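For a set of seeds, the occurrences of all seeds are pooled before counting covered positions. The sketch below lists the occurrences in left-to-right order and the covered positions for x = 101111001011111; the function names are illustrative only, and positions are 0-based.

```python
def occurrences(seeds, x):
    """All (start, seed) hits on x, ordered by start position and,
    for equal starts, by the order of the seeds in the list."""
    occ = []
    for start in range(len(x)):
        for seed in seeds:
            fits = start + len(seed) <= len(x)
            if fits and all(x[start + i] == "1"
                            for i, c in enumerate(seed) if c == "1"):
                occ.append((start, seed))
    return occ

def covered_positions(seeds, x):
    """Sorted positions of x covered by a '1' of at least one occurrence."""
    return sorted({start + i
                   for start, seed in occurrences(seeds, x)
                   for i, c in enumerate(seed) if c == "1"})

x = "101111001011111"
print(occurrences(["11*1", "1*1*1"], x))
print(len(covered_positions(["11*1"], x)))           # 8
print(len(covered_positions(["11*1", "1*1*1"], x)))  # 11
```

With the single seed 11*1 the coverage is 8; adding 1*1*1 raises it to 11, which here is every match position of x.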

14/30 to 16/30

Coverage measure for a seed / a set of seeds

{π1, π2} = {11*1, 1*1*1}

[Figure: automaton diagram for the seed set; the nodes are labelled with binary words (for example 1111 or 10111) and the transitions with 0 or 1, some of them annotated with a coverage increment such as 1(+1), 1(+2), 1(+3) or 01(+4).]

17/30

Problems involved and Semi-rings

2 Lossy problems (. . . probability for π to have at least one hit)

Bernoulli model : ⊕ = +, ⊗ = ×, {w(1) = Pr(1), w(0) = Pr(0)} ⇒ Probability semi-ring

1 Lossless problems (. . . π finds all length-m alignments with k errors)

(m, k)-lossless ? : ⊕ = min, ⊗ = +, {w(1) = 0, w(0) = 1} ⇒ Tropical (cost/score) semi-ring(s)

0 Other (own) problems : e.g. Pearson / Spearman correlation . . .

. . . : ⊕ = +, ⊗ = ×, {w(1) = 1, w(0) = 1} ⇒ Counting semi-ring
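The three semi-rings above can be plugged into one and the same dynamic program. The sketch below sums, in the chosen semi-ring, the weights of all alignments that the seed does not hit; it is an illustrative reading of the slide, not the automaton construction used in the talk, and `no_hit_total` and its parameters are names invented here.

```python
import math
from operator import add, mul

def no_hit_total(seed, m, plus, times, zero, one, w1, w0):
    """Semi-ring sum (plus), over all length-m alignments the seed does NOT
    hit, of the semi-ring product (times) of their symbol weights."""
    span = len(seed)
    must = [i for i, c in enumerate(seed) if c == "1"]
    states = {"": one}  # last (span - 1) symbols -> accumulated weight
    for _ in range(m):
        nxt = {}
        for suffix, value in states.items():
            for sym, weight in (("1", w1), ("0", w0)):
                window = suffix + sym
                if len(window) == span and all(window[i] == "1" for i in must):
                    continue  # the seed hits here: drop this alignment
                key = window[-(span - 1):] if span > 1 else ""
                v = times(value, weight)
                nxt[key] = plus(nxt[key], v) if key in nxt else v
        states = nxt
    total = zero
    for value in states.values():
        total = plus(total, value)
    return total

seed, m, p = "111*1*11", 20, 0.7
# Probability semi-ring: 1 - Pr(no hit) is the hit probability.
print(1 - no_hit_total(seed, m, add, mul, 0.0, 1.0, p, 1 - p))
# Tropical semi-ring: fewest mismatches in an undetected alignment.
print(no_hit_total(seed, m, min, add, math.inf, 0, 0, 1))
# Counting semi-ring: number of undetected alignments.
print(no_hit_total(seed, m, add, mul, 0, 1, 1, 1))
```

In the tropical case the seed is (m, k)-lossless exactly when the returned minimum is greater than k; for 111*1*11 and m = 20 the minimum is 4, which agrees with the (20, 3)-lossless example.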


18/30

Parameter-free models

Does it make sense to set . . .

0 . . . {P(1) = 0.7, P(0) = 0.3} and m = 20 ?

1 . . . {P(1) = p, P(0) = 1 − p} and m = 20 ?

p is a free variable, but it would be convenient to remove this dependency

⇒ roots of polynomials in p (π1 vs π2 comparison) [Mak and Benson, 2009]
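One way to read the remark on polynomials in p, given here as an assumption rather than as the exact formulation of [Mak and Benson, 2009]: for a fixed length m, group the alignments hit by a seed π of weight w according to their number k of matches, and write c_k(π) for the number of such alignments (a symbol introduced here for illustration). The hit probability is then a polynomial in p, and comparing two seeds of weight w reduces to the sign of a difference of polynomials:

```latex
P_{\pi}(p) = \sum_{k=w}^{m} c_k(\pi)\, p^{k} (1-p)^{m-k},
\qquad
P_{\pi_1}(p) - P_{\pi_2}(p)
  = \sum_{k=w}^{m} \bigl( c_k(\pi_1) - c_k(\pi_2) \bigr)\, p^{k} (1-p)^{m-k}.
```

The difference can change sign only at its roots in (0, 1), so between two consecutive roots one of the two seeds is at least as good as the other for every p in that interval.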


19/30

Parameter-free models

Does it make sense to set {P(1) = p, P(0) = 1 − p} and m = 20 ?

[Figure: plot_n1_w6_s12_bernoulli_python_maxima.pdf. x-axis: match probability (0.0 to 1.0); y-axis: alignment length (10 to 60). Regions of the plot are labelled with the seed indices 0 to 8 listed below.]

0 = 111111, 1 = 111*111, 2 = 11*1*111, 3 = 11*1111, 4 = 11*1**111, 5 = 1*11****1*11, 6 = 1*11*111, 7 = 11***1*1*11, 8 = 1**1*1*111


['111111'] 0.077330519411998251862651931755

['111*111'] 0.112233522767708459228220149449

['11*1111'] 0.436105861818672948722472931027

['11*1*111'] 0.988223467835828245511583556919

['11*1111'] 1.0

['111111']

15*p^6*(1-p)^(20-6) + 196*p^7*(1-p)^(20-7) + 1183*p^8*(1-p)^(20-8) + ... + 20*p^19*(1-p)^(20-19) + 1*p^20

['111*111']

14*p^6*(1-p)^(20-6) + 196*p^7*(1-p)^(20-7) + 1261*p^8*(1-p)^(20-8) + ... + 20*p^19*(1-p)^(20-19) + 1*p^20

['11*1111']

14*p^6*(1-p)^(20-6) + 196*p^7*(1-p)^(20-7) + 1261*p^8*(1-p)^(20-8) + ...

... ...

... ...
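The coefficients in these polynomials count alignments: the coefficient of p^k (1-p)^(20-k) is the number of length-20 alignments with exactly k matches that the seed hits. A minimal sketch of that count, assuming a dynamic programme over the last |π| − 1 symbols (the names `hit_counts` and `sensitivity` are illustrative, not from the slides):

```python
from collections import defaultdict
from math import comb


def hit_counts(seed, m):
    """counts[k] = number of length-m alignments with k matches that the seed hits."""
    span = len(seed)
    must = [j for j, c in enumerate(seed) if c == "1"]
    # missed[(suffix, k)]: prefixes with k matches, no hit so far, ending in `suffix`
    missed = {((), 0): 1}
    for _ in range(m):
        nxt = defaultdict(int)
        for (suffix, k), n in missed.items():
            for bit in (0, 1):
                window = suffix + (bit,)
                if len(window) == span and all(window[j] for j in must):
                    continue  # the seed hits here, so this prefix is no longer missed
                tail = window[-(span - 1):] if span > 1 else ()
                nxt[(tail, k + bit)] += n
        missed = nxt
    per_k = [0] * (m + 1)
    for (_, k), n in missed.items():
        per_k[k] += n
    # hit alignments = all alignments with k matches minus the missed ones
    return [comb(m, k) - per_k[k] for k in range(m + 1)]


def sensitivity(counts, p):
    """Evaluate sum_k counts[k] * p^k * (1-p)^(m-k)."""
    m = len(counts) - 1
    return sum(c * p**k * (1 - p) ** (m - k) for k, c in enumerate(counts))


for seed in ("111111", "111*111"):
    counts = hit_counts(seed, 20)
    print(seed, counts[6:9], sensitivity(counts, 0.7))
```

The first coefficients reproduce the listing above: 15, 196, 1183 for 111111 and 14, 196, 1261 for 111*111.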


Parameter-free models

Does it make sense to set . . .

0 . . . {P(1) = 0.7, P(0) = 0.3} and m = 20 ?

1 . . . {P(1) = p, P(0) = 1 − p} and m = 20 ?

p is a free variable, but it would be convenient to remove this dependency

⇒ roots of the p-polynomials [Mak and Benson, 2009]

2 . . . {P(1) = 0.7, P(0) = 0.3} and m = n or ∞ ?

∞ : [Choi and Zhang, 2004, Buhler et al., 2005, Zhang, 2007]

20/30


21/30

Parameter-free models

Does it make sense to set {P(1) = 0.7, P(0) = 0.3} and m = n or ∞ ?

[Figure: x-axis: match probability (0.0 to 1.0); y-axis: alignment length (10 to 60). Regions of the plot are labelled with the seed indices 0 to 8 listed below.]

0 = 111111, 1 = 111*111, 2 = 11*1*111, 3 = 11*1111, 4 = 11*1**111, 5 = 1*11****1*11, 6 = 1*11*111, 7 = 11***1*1*11, 8 = 1**1*1*111


1 − Pr(π) → β × λ^n

π = 111111: λ = 0.95284, β = 0.54942

π = 1*1*1*1*1*1: λ = 0.95284, β = 0.30186

π = 111*1*11: λ = 0.93306, β = 0.47946

π = 11**1*1*11: λ = 0.92295, β = 0.30998

π = 11*1****11*1: λ = 0.93218, β = 0.26931
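A sketch of how λ can be estimated numerically, assuming i.i.d. matches with probability p and a seed that starts with 1. The code tracks the distribution over the last |π| − 1 symbols of alignments not yet hit and renormalises it at every step; the fraction that survives one more step converges to λ. The name `decay_rate` and the number of steps are illustrative choices. β is left out because the slide does not state how n is normalised.

```python
def decay_rate(seed, p, steps=300):
    """Estimate lambda, the per-symbol decay rate of the miss probability 1 - Pr(seed)."""
    span = len(seed)
    must = sum(1 << (span - 1 - j) for j, c in enumerate(seed) if c == "1")
    keep = (1 << (span - 1)) - 1      # mask for the last span-1 symbols
    dist = {0: 1.0}                   # conditional distribution of the recent history
    lam = 1.0
    for _ in range(steps):
        nxt = {}
        for state, mass in dist.items():
            for bit, pr in ((1, p), (0, 1.0 - p)):
                window = (state << 1) | bit
                if window & must == must:
                    continue          # the seed hits here: this mass leaves the system
                key = window & keep
                nxt[key] = nxt.get(key, 0.0) + mass * pr
        lam = sum(nxt.values())       # Pr[no hit at this step | no hit before]
        dist = {k: v / lam for k, v in nxt.items()}
    return lam


for seed in ("111111", "1*1*1*1*1*1", "111*1*11"):
    print(seed, round(decay_rate(seed, 0.7), 5))
```

With p = 0.7 the estimate for 111111 agrees with the slide's λ = 0.95284. The seed 1*1*1*1*1*1 gives the same λ, as expected: even and odd positions form two independent copies of the contiguous problem.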


Parameter-free models

Does it make sense to set . . .

0 . . . {P(1) = 0.7, P(0) = 0.3} and m = 20 ?

1 . . . {P(1) = p, P(0) = 1 − p} and m = 20 ?

p is a free variable, but it would be convenient to remove this dependency

⇒ roots of the p-polynomials [Mak and Benson, 2009]

2 . . . {P(1) = 0.7, P(0) = 0.3} and m = n or ∞ ?

∞ : [Choi and Zhang, 2004, Buhler et al., 2005, Zhang, 2007]

3 . . . {P(1) = p, P(0) = 1 − p} and m = n or ∞ ???

22/30


23/30

Parameter-free models [your work here]

Does it make sense to set {P(1) = p,P(0) = 1− p} and m = n or ∞ ?

π = 11*1

[Figure: automaton for π with six states q1 (start), q2, q3, q4, q5, q6; transitions are labelled 0, 1 and 0,1.]

\[
\begin{aligned}
\mathrm{seed}(x) ={} & (2 - p)\,\mathrm{seed}(x-1) - (p^2 - 2p + 1)\,\mathrm{seed}(x-2) + (p^2 - p)\,\mathrm{seed}(x-3) \\
& + (p^4 - 2p^3 + p^2)\,\mathrm{seed}(x-4) + (p^5 - 3p^4 + 3p^3 - p^2)\,\mathrm{seed}(x-5) - (p^5 - 2p^4 + p^3)\,\mathrm{seed}(x-6)
\end{aligned}
\]

Chomsky & Schützenberger . . . (in fact . . . mostly via Maple . . . )

\[
q_6(z) := \frac{\bigl((p^2 - p)\,z - 1\bigr)\,p^3 z^4}{(1 - z)\,\bigl((p^5 - 2p^4 + p^3)\,z^5 + (p^4 - 2p^3 + p^2)\,z^4 + (-p^2 + p)\,z^2 + (1 - p)\,z - 1\bigr)}
\]

\[
q_1(z) + q_2(z) + q_3(z) + q_4(z) + q_5(z) + q_6(z) = \frac{1}{1 - z}
\]
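The recurrence for π = 11*1 can be checked against a direct computation. The sketch below assumes that seed(x) denotes the probability that π hits an i.i.d. alignment of length x (which is how the first coefficients of q6(z) read), and uses exact rational arithmetic so that the comparison is an equality test. The names `hit_probabilities` and `recurrence_rhs` are illustrative.

```python
from fractions import Fraction


def hit_probabilities(seed, p, n):
    """[Pr(seed hits an i.i.d. alignment of length x) for x = 0..n]; seed starts with '1'."""
    span = len(seed)
    must = sum(1 << (span - 1 - j) for j, c in enumerate(seed) if c == "1")
    keep = (1 << (span - 1)) - 1      # mask for the last span-1 symbols
    dist = {0: Fraction(1)}           # history padded with mismatches (cannot create a hit)
    out = [Fraction(0)]
    for _ in range(n):
        nxt = {}
        for state, mass in dist.items():
            for bit, pr in ((1, p), (0, 1 - p)):
                window = (state << 1) | bit
                if window & must == must:
                    continue          # the seed hits here
                key = window & keep
                nxt[key] = nxt.get(key, 0) + mass * pr
        dist = nxt
        out.append(1 - sum(dist.values()))
    return out


def recurrence_rhs(seed_pr, x, p):
    """Right-hand side of the order-6 recurrence for the seed 11*1."""
    c = [
        2 - p,
        -(p**2 - 2 * p + 1),
        p**2 - p,
        p**4 - 2 * p**3 + p**2,
        p**5 - 3 * p**4 + 3 * p**3 - p**2,
        -(p**5 - 2 * p**4 + p**3),
    ]
    return sum(c[k] * seed_pr[x - 1 - k] for k in range(6))


p = Fraction(7, 10)
pr = hit_probabilities("11*1", p, 30)
print(all(pr[x] == recurrence_rhs(pr, x, p) for x in range(6, 31)))
print(float(pr[20]))
```

The recurrence is applied from x = 6 onwards, with seed(x) = 0 for x < 4, seed(4) = p^3 and seed(5) = p^3 (2 − p^2) as initial values.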


24/30

...

Collaborators: Donald E. K. Martin, Martin C. Frith, Gregory Kucherov, Mikhail A. Roytberg & Eugenia Furletova, Marta Gîrdea, Sławomir Lasota & Anna Gambin & Ewa Szczurek, Yann Ponty :-)

Image: https://c1.staticflickr.com/7/6169/6203268626_1c1fa2ff7a_b.jpg


25/30

References I

Bao, E., Jiang, T., Kaloshian, I., and Girke, T. (2011).
SEED: efficient clustering of next-generation sequences. Bioinformatics, 27(18):2502–2509.

Benson, G. and Mak, D. Y. (2008).
Exact distribution of a spaced seed statistic for DNA homology detection. In Amir, A., Turpin, A., and Moffat, A., editors, Proceedings of the 15th International Symposium on String Processing and Information Retrieval (SPIRE), Melbourne (Australia), volume 5280 of Lecture Notes in Computer Science, pages 282–293. Springer.

Boden, M., Schöneich, M., Horwege, S., Lindner, S., Leimeister, C., and Morgenstern, B. (2013).
Alignment-free sequence comparison with spaced k-mers. In Proceedings of the German Conference on Bioinformatics (GCB), volume 34 of OpenAccess Series in Informatics (OASIcs), pages 24–34.

Břinda, K., Sykulski, M., and Kucherov, G. (2015).
Spaced seeds improve metagenomic classification. Bioinformatics, 31(22):3584–3592.

Buhler, J., Keich, U., and Sun, Y. (2005).
Designing seeds for similarity search in genomic DNA. Journal of Computer and System Sciences, 70(3):342–363. (earlier version in RECOMB 2003).


26/30

References II

Burkhardt, S. and Kärkkäinen, J. (2001).
Better filtering with gapped q-grams. In Proceedings of the 12th Symposium on Combinatorial Pattern Matching (CPM), volume 2089 of Lecture Notes in Computer Science, pages 73–85. Springer.

Choi, K. P. and Zhang, L. (2004).
Sensitivity analysis and efficient method for identifying optimal spaced seeds. Journal of Computer and System Sciences, 68(1):22–40.

Chong, Z., Ruan, J., and Wu, C.-I. (2012).
Rainbow: an integrated tool for efficient clustering and assembling RAD-seq reads. Bioinformatics, 28(21):2732–2737.

Do, P.-T. and Tran-Thi, C.-G. (2015).
An improvement of the overlap complexity in the spaced seed searching problem between genomic DNAs. In Proceedings of the 2nd National Foundation for Science and Technology Development Conference on Information and Computer Science (NICS), Ho Chi Minh City, Vietnam, pages 271–276.

Egidi, L. and Manzini, G. (2013).
Better spaced seeds using quadratic residues. Journal of Computer and System Sciences, 79(7):1144–1155.

Ghandi, M., Lee, D., Mohammad-Noori, M., and Beer, M. A. (2014).
Enhanced regulatory sequence prediction using gapped k-mer features. PLoS Computational Biology, 10(7):e1003711.


27/30

References III

Gheraibia, Y., Moussaoui, A., Djenouri, Y., Kabir, S., Yin, P.-Y., and Mazouzi, S. (2015).
Penguin search optimisation algorithm for finding optimal spaced seeds. International Journal of Software Science and Computational Intelligence (IJSSCI), 7(2):85–99.

Hahn, L., Leimeister, C.-A., and Morgenstern, B. (2015).
RasBhari: optimizing spaced seeds for database searching, read mapping and alignment-free sequence comparison. arXiv.

Hauser, M., Mayer, C. E., and Söding, J. (2013).
kClust: fast and sensitive clustering of large protein sequence databases. BMC Bioinformatics, 14(248).

Horwege, S., Lindner, S., Boden, M., Hatje, K., Kollmar, M., Leimeister, C.-A., and Morgenstern, B. (2014).
Spaced words and kmacs: Fast alignment-free sequence comparison based on inexact word matches. Nucleic Acids Research, 42(W1):W7–W11.

Ilie, L. and Ilie, S. (2007).
Multiple spaced seeds for homology search. Bioinformatics, 23(22):2969–2977.

Ilie, L., Ilie, S., and Mansouri Bigvand, A. (2011).
SpEED: fast computation of sensitive spaced seeds. Bioinformatics, 27(17):2433–2434.


28/30

References IV

Keich, U., Li, M., Ma, B., and Tromp, J. (2004).
On spaced seeds for similarity search. Discrete Applied Mathematics, 138(3):253–263. (earlier version in 2002).

Kong, Y. (2007).
Generalized correlation functions and their applications in selection of optimal multiple spaced seeds for homology search. Journal of Computational Biology, 14(2):238–254.

Kucherov, G., Noé, L., and Roytberg, M. A. (2006).
A unifying framework for seed sensitivity and its application to subset seeds. Journal of Bioinformatics and Computational Biology, 4(2):553–569.

Leimeister, C.-A., Boden, M., Horwege, S., Lindner, S., and Morgenstern, B. (2014).
Fast alignment-free sequence comparison using spaced-word frequencies. Bioinformatics, 30(14):1991–1999.

Ma, B. and Li, M. (2007).
On the complexity of spaced seeds. Journal of Computer and System Sciences, 73(7):1024–1034.

Ma, B., Tromp, J., and Li, M. (2002).
PatternHunter: Faster and more sensitive homology search. Bioinformatics, 18(3):440–445.


29/30

References V

Ma, B. and Yao, H. (2009).
Seed optimization for i.i.d. similarities is no easier than optimal Golomb ruler design. Information Processing Letters, 109(19):1120–1124. (earlier version in APBC 2008).

Mak, D. Y. and Benson, G. (2009).
All hits all the time: parameter free calculation of seed sensitivity. Bioinformatics, 25(3):302–308. (earlier version in APBC 2007).

Martin, D. E. K. and Noé, L. (2015).
Faster exact distributions of pattern statistics through sequential elimination of states. Annals of the Institute of Statistical Mathematics, pages 1–18.

Morgenstern, B., Zhu, B., Horwege, S., and Leimeister, C.-A. (2015).
Estimating evolutionary distances between genomic sequences from spaced-word matches. Algorithms for Molecular Biology, 10(5).

Nicolas, F. and Rivals, E. (2008).
Hardness of optimal spaced seed design. Journal of Computer and System Sciences, 74(5):831–849. (earlier version in CPM 2005).

Noé, L. and Martin, D. E. K. (2014).
A coverage criterion for spaced seeds and its applications to support vector machine string kernels and k-mer distances. Journal of Computational Biology, 21(12):947–963.


30/30

References VI

Onodera, T. and Shibuya, T. (2013).
The gapped spectrum kernel for support vector machines. In Proceedings of the International Conference on Machine Learning and Data Mining in Pattern Recognition (MLDM), volume 7988 of Lecture Notes in Computer Science, pages 1–15. Springer.

Ounit, R. and Lonardi, S. (2015).
Higher classification accuracy of short metagenomic reads by discriminative spaced k-mers. In Proceedings of the 15th International Workshop on Algorithms in Bioinformatics (WABI), Atlanta (USA), volume 9289 of Lecture Notes in Bioinformatics, pages 286–295. Springer.

Yang, J. and Zhang, L. (2008).
Run probability of high-order seed patterns and its applications to finding good transition seeds. In Brazma, A., Miyano, S., and Akutsu, T., editors, Proceedings of the 6th Asia Pacific Bioinformatics Conference (APBC), 14-17 January 2008, Kyoto, Japan, volume 6 of Advances in Bioinformatics and Computational Biology, pages 123–132. Imperial College Press.

Zhang, L. (2007).
Superiority of spaced seeds for homology search. IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB), 4(3):496–505.
