Flipping letters to minimize the support of a string Giuseppe Lancia, Franca Rinaldi, Romeo Rizzi...

77
Flipping letters to minimize Flipping letters to minimize the support of a string the support of a string Giuseppe Lancia, Franca Rinaldi, Romeo Rizzi University of Udine

Transcript of Flipping letters to minimize the support of a string Giuseppe Lancia, Franca Rinaldi, Romeo Rizzi...

Page 1: Flipping letters to minimize the support of a string Giuseppe Lancia, Franca Rinaldi, Romeo Rizzi University of Udine.

Flipping letters to Flipping letters to minimize minimize

the support of a stringthe support of a stringGiuseppe Lancia, Franca Rinaldi, Romeo Rizzi

University of Udine

Page 2: Flipping letters to minimize the support of a string Giuseppe Lancia, Franca Rinaldi, Romeo Rizzi University of Udine.

Outline of talk:

1. Problem definition2. Parametrized complexity3. Polynomial cases4. NP-hardness5. ILP formulations

Page 3: Flipping letters to minimize the support of a string Giuseppe Lancia, Franca Rinaldi, Romeo Rizzi University of Udine.

1. Problem definition1. Problem definition

Page 4: Flipping letters to minimize the support of a string Giuseppe Lancia, Franca Rinaldi, Romeo Rizzi University of Udine.

We are given a string s and a parameter k (e.g., k = 3)

The string has a set of k-mers, its support, K(s)

010010011

Page 5: Flipping letters to minimize the support of a string Giuseppe Lancia, Franca Rinaldi, Romeo Rizzi University of Udine.

K(s) = { 010 }

We are given a string s and a parameter k (e.g., k = 3)

The string has a set of k-mers, its support, K(s)

010010011

Page 6: Flipping letters to minimize the support of a string Giuseppe Lancia, Franca Rinaldi, Romeo Rizzi University of Udine.

K(s) = { 010, 100 }

We are given a string s and a parameter k (e.g., k = 3)

The string has a set of k-mers, its support, K(s)

010010011

Page 7: Flipping letters to minimize the support of a string Giuseppe Lancia, Franca Rinaldi, Romeo Rizzi University of Udine.

K(s) = { 010, 100, 001}

We are given a string s and a parameter k (e.g., k = 3)

The string has a set of k-mers, its support, K(s)

010010011

Page 8: Flipping letters to minimize the support of a string Giuseppe Lancia, Franca Rinaldi, Romeo Rizzi University of Udine.

K(s) = { 010, 100, 001}

We are given a string s and a parameter k (e.g., k = 3)

The string has a set of k-mers, its support, K(s)

010010011

Page 9: Flipping letters to minimize the support of a string Giuseppe Lancia, Franca Rinaldi, Romeo Rizzi University of Udine.

K(s) = { 010, 100, 001}

We are given a string s and a parameter k (e.g., k = 3)

The string has a set of k-mers, its support, K(s)

010010011

Page 10: Flipping letters to minimize the support of a string Giuseppe Lancia, Franca Rinaldi, Romeo Rizzi University of Udine.

K(s) = { 010, 100, 001}

We are given a string s and a parameter k (e.g., k = 3)

The string has a set of k-mers, its support, K(s)

010010011

Page 11: Flipping letters to minimize the support of a string Giuseppe Lancia, Franca Rinaldi, Romeo Rizzi University of Udine.

K(s) = { 010, 100, 001, 011}

We are given a string s and a parameter k (e.g., k = 3)

The string has a set of k-mers, its support, K(s)

010010011

Page 12: Flipping letters to minimize the support of a string Giuseppe Lancia, Franca Rinaldi, Romeo Rizzi University of Udine.

K(s) = { 010, 100, 001, 011} | K(s) | = 4

We are given a string s and a parameter k (e.g., k = 3)

The string has a set of k-mers, its support, K(s)

010010011

Page 13: Flipping letters to minimize the support of a string Giuseppe Lancia, Franca Rinaldi, Romeo Rizzi University of Udine.

K(s) = { 010, 100, 001, 011} | K(s) | = 4

By flipping some bits, we could reduce the number of k-mers

010010011

We are given a string s and a parameter k (e.g., k = 3)

Page 14: Flipping letters to minimize the support of a string Giuseppe Lancia, Franca Rinaldi, Romeo Rizzi University of Udine.

K(s) = { 010, 100, 001, 011} | K(s) | = 4

By flipping some bits, we could reduce the number of k-mers

010010011010010010

S’=

We are given a string s and a parameter k (e.g., k = 3)

Page 15: Flipping letters to minimize the support of a string Giuseppe Lancia, Franca Rinaldi, Romeo Rizzi University of Udine.

K(s) = { 010, 100, 001, 011} | K(s) | = 4

By flipping some bits, we could reduce the number of k-mers

010010011010010010

S’=

K(s’) = { 010, 100, 001} | K(s’) | = 3

We are given a string s and a parameter k (e.g., k = 3)

Page 16: Flipping letters to minimize the support of a string Giuseppe Lancia, Franca Rinaldi, Romeo Rizzi University of Udine.

The Problem :The Problem :

- A string s over an alphabet

IngredienIngredients:ts:

Page 17: Flipping letters to minimize the support of a string Giuseppe Lancia, Franca Rinaldi, Romeo Rizzi University of Udine.

The Problem :The Problem :

- A string s over an alphabet

- A parameter k (k-mer size)

IngredienIngredients:ts:

Page 18: Flipping letters to minimize the support of a string Giuseppe Lancia, Franca Rinaldi, Romeo Rizzi University of Udine.

The Problem :The Problem :

- A string s over an alphabet

- A parameter k (k-mer size)

- A budget B

IngredienIngredients:ts:

Page 19: Flipping letters to minimize the support of a string Giuseppe Lancia, Franca Rinaldi, Romeo Rizzi University of Udine.

The Problem :The Problem :

- A string s over an alphabet

- A parameter k (k-mer size)

- A budget B

IngredienIngredients:ts:

Change at most B letters in s so as resulting s’ has as few distinct k-mers as possible

ObjectiveObjective::

Page 20: Flipping letters to minimize the support of a string Giuseppe Lancia, Franca Rinaldi, Romeo Rizzi University of Udine.

The Problem :The Problem :

- A string s over an alphabet

- A parameter k (k-mer size)

- A budget B

IngredienIngredients:ts:

Find a string s’ with d(s,s’) <= B with the smallest number of kmers

ObjectiveObjective::

s

s’

Page 21: Flipping letters to minimize the support of a string Giuseppe Lancia, Franca Rinaldi, Romeo Rizzi University of Udine.

Motivation :Motivation :

Curiosity-driven (it’s a cute combinatorial problem)Real:Real:

Page 22: Flipping letters to minimize the support of a string Giuseppe Lancia, Franca Rinaldi, Romeo Rizzi University of Udine.

Motivation :Motivation :

Curiosity-driven (it’s a cute combinatorial problem)Real:Real:

Analysis of DNA sequencesFictious:Fictious:

atcgattgatccttta

atc, tcg, cga, gat, ….

3-mers are aminoacid codons. Protein complexity relates to # of codons. Mutations may reduce complexity….

Page 23: Flipping letters to minimize the support of a string Giuseppe Lancia, Franca Rinaldi, Romeo Rizzi University of Udine.

Our results:Our results:

The problem has many parameters (|s|, ||, k, B), we study all versions (when possibly some of the parameters are bounded)

- Polynomial special cases (e.g. for B fixed or both k,|| fixed)

- NP-hard special cases (even k=2 or ||=2)

Page 24: Flipping letters to minimize the support of a string Giuseppe Lancia, Franca Rinaldi, Romeo Rizzi University of Udine.

2. Parametrized 2. Parametrized complexitycomplexity

Page 25: Flipping letters to minimize the support of a string Giuseppe Lancia, Franca Rinaldi, Romeo Rizzi University of Udine.

s NO B NO

s YES B NO

s NO B YES

s YES B YES

NO k NO

YES k NO

NO k YES

YES k YES

Page 26: Flipping letters to minimize the support of a string Giuseppe Lancia, Franca Rinaldi, Romeo Rizzi University of Udine.

s NO B NO

s YES B NO

s NO B YES

s YES B YES

We can assume : k <= |s|

NO k NO

YES k NO

NO k YES

YES k YES

Page 27: Flipping letters to minimize the support of a string Giuseppe Lancia, Franca Rinaldi, Romeo Rizzi University of Udine.

s NO B NO

s YES B NO

s NO B YES

s YES B YES

We can assume : k <= |s|

NO k NO

YES k NO

NO k YES

YES k YES

Page 28: Flipping letters to minimize the support of a string Giuseppe Lancia, Franca Rinaldi, Romeo Rizzi University of Udine.

s NO B NO

s YES B NO

s NO B YES

s YES B YES

We can assume : B <= |s|

NO k NO

YES k NO

NO k YES

YES k YES

Page 29: Flipping letters to minimize the support of a string Giuseppe Lancia, Franca Rinaldi, Romeo Rizzi University of Udine.

NO k NO

YES k NO

NO k YES

YES k YES

s NO B NO

s YES B NO

s NO B YES

s YES B YES

We can assume : || <= |s| (we don’t need any symbol not already in s)

Page 30: Flipping letters to minimize the support of a string Giuseppe Lancia, Franca Rinaldi, Romeo Rizzi University of Udine.

NO k NO

YES k NO

NO k YES

YES k YES

s NO B NO

s YES B NO

s NO B YES

s YES B YES

Page 31: Flipping letters to minimize the support of a string Giuseppe Lancia, Franca Rinaldi, Romeo Rizzi University of Udine.

NO k NO

YES k NO

NO k YES

YES k YES

s NO B NO

s YES B NO

s NO B YES

s YES B YES

Polynomial cases

)1(O

)|(| )1(2 BsO

|)(| sO

)||( )1(2 BskO

Page 32: Flipping letters to minimize the support of a string Giuseppe Lancia, Franca Rinaldi, Romeo Rizzi University of Udine.

NO k NO

YES k NO

NO k YES

YES k YES

s NO B NO

s YES B NO

s NO B YES

s YES B YES

NP-hard cases

)1(O

)|(| )1(2 BsO

|)(| sO

)||( )1(2 BskO

NP-hard NP-hardfor ||=2

NP-hardfor k=2

Page 33: Flipping letters to minimize the support of a string Giuseppe Lancia, Franca Rinaldi, Romeo Rizzi University of Udine.

3. Polynomial cases3. Polynomial cases

Page 34: Flipping letters to minimize the support of a string Giuseppe Lancia, Franca Rinaldi, Romeo Rizzi University of Udine.

The case |The case || and k fixed:| and k fixed:

Page 35: Flipping letters to minimize the support of a string Giuseppe Lancia, Franca Rinaldi, Romeo Rizzi University of Udine.

The case |The case || and k fixed:| and k fixed:

NO k NO

YES k NO

NO k YES

YES k YES

s NO B NO

s YES B NO

s NO B YES

s YES B YES )1(O

)|(| )1(2 BsO

|)(| sO

)||( )1(2 BskO

NP-hard NP-hardfor ||=2

NP-hardfor k=2

Page 36: Flipping letters to minimize the support of a string Giuseppe Lancia, Franca Rinaldi, Romeo Rizzi University of Udine.

The case |The case || and k fixed:| and k fixed:

We start with this subproblem: SUB(A): Given a set of kmers A, can we correct s within budget so as it has all of its kmers in A?

Page 37: Flipping letters to minimize the support of a string Giuseppe Lancia, Franca Rinaldi, Romeo Rizzi University of Udine.

The case |The case || and k fixed:| and k fixed:

s = 01000100101110A = 0100, 1001, 0010 , 0001 B = 3

We start with this subproblem: SUB(A): Given a set of kmers A, can we correct s within budget so as it has all of its kmers in A?

Page 38: Flipping letters to minimize the support of a string Giuseppe Lancia, Franca Rinaldi, Romeo Rizzi University of Udine.

The case |The case || and k fixed:| and k fixed:

s = 01000100101110A = 0100, 1001, 0010 , 0001 B = 3

0100

1001

0010

0001

0100

1001

0010

0001

0100

1001

0010

0001

……

……

……

0100

1001

0010

0001

0100

1001

0010

0001

0100

1001

0010

0001

We start with this subproblem: SUB(A): Given a set of kmers A, can we correct s within budget so as it has all of its kmers in A?

Page 39: Flipping letters to minimize the support of a string Giuseppe Lancia, Franca Rinaldi, Romeo Rizzi University of Udine.

The case |The case || and k fixed:| and k fixed:

s = 01000100101110A = 0100, 1001, 0010 , 0001 B = 3

0100

1001

0010

0001

0100

1001

0010

0001

0100

1001

0010

0001

……

……

……

0100

1001

0010

0001

0100

1001

0010

0001

0100

1001

0010

0001

We start with this subproblem: SUB(A): Given a set of kmers A, can we correct s within budget so as it has all of its kmers in A?

Page 40: Flipping letters to minimize the support of a string Giuseppe Lancia, Franca Rinaldi, Romeo Rizzi University of Udine.

The case |The case || and k fixed:| and k fixed:

s = 01000100101110A = 0100, 1001, 0010 , 0001 B = 3

0100

1001

0010

0001

0100

1001

0010

0001

0100

1001

0010

0001

……

……

……

0100

1001

0010

0001

0100

1001

0010

0001

0100

1001

0010

0001

We start with this subproblem: SUB(A): Given a set of kmers A, can we correct s within budget so as it has all of its kmers in A?

Page 41: Flipping letters to minimize the support of a string Giuseppe Lancia, Franca Rinaldi, Romeo Rizzi University of Udine.

The case |The case || and k fixed:| and k fixed:

s = 01000100101110A = 0100, 1001, 0010 , 0001 B = 3

0100

1001

0010

0001

0100

1001

0010

0001

0100

1001

0010

0001

……

……

……

0100

1001

0010

0001

0100

1001

0010

0001

0100

1001

0010

0001

0100 0 1 ….. 1 1 0

We start with this subproblem: SUB(A): Given a set of kmers A, can we correct s within budget so as it has all of its kmers in A?

Page 42: Flipping letters to minimize the support of a string Giuseppe Lancia, Franca Rinaldi, Romeo Rizzi University of Udine.

The case |The case || and k fixed:| and k fixed:

s = 01000100101110A = 0100, 1001, 0010 , 0001 B = 3

0100

1001

0010

0001

0100

1001

0010

0001

0100

1001

0010

0001

……

……

……

0100

1001

0010

0001

0100

1001

0010

0001

0100

1001

0010

0001

0100 0 1 ….. 1 1 0

We start with this subproblem: SUB(A): Given a set of kmers A, can we correct s within budget so as it has all of its kmers in A?

Page 43: Flipping letters to minimize the support of a string Giuseppe Lancia, Franca Rinaldi, Romeo Rizzi University of Udine.

The case |The case || and k fixed:| and k fixed:

s = 01000100101110A = 0100, 1001, 0010 , 0001 B = 3

0100

1001

0010

0001

0100

1001

0010

0001

0100

1001

0010

0001

……

……

……

0100

1001

0010

0001

0100

1001

0010

0001

0100

1001

0010

0001

0100 0 1 ….. 1 1 0

0

0

0

0

0

3

2

2

0

1

1

0

0

1

1

0

0

1

1

0

0

0

1

1

We start with this subproblem: SUB(A): Given a set of kmers A, can we correct s within budget so as it has all of its kmers in A?

Page 44: Flipping letters to minimize the support of a string Giuseppe Lancia, Franca Rinaldi, Romeo Rizzi University of Udine.

The case |The case || and k fixed:| and k fixed:

s = 01000100101110A = 0100, 1001, 0010 , 0001 B = 3

0100

1001

0010

0001

0100

1001

0010

0001

0100

1001

0010

0001

……

……

……

0100

1001

0010

0001

0100

1001

0010

0001

0100

1001

0010

0001

0100 0 1 ….. 1 1 0

0

0

0

0

0

3

2

2

0

1

1

0

0

1

1

0

0

1

1

0

0

0

1

1

Each path corresponds to a string s’ with all its kmers in A

We start with this subproblem: SUB(A): Given a set of kmers A, can we correct s within budget so as it has all of its kmers in A?

Page 45: Flipping letters to minimize the support of a string Giuseppe Lancia, Franca Rinaldi, Romeo Rizzi University of Udine.

The case |The case || and k fixed:| and k fixed:

s = 01000100101110A = 0100, 1001, 0010 , 0001 B = 3

0100

1001

0010

0001

0100

1001

0010

0001

0100

1001

0010

0001

……

……

……

0100

1001

0010

0001

0100

1001

0010

0001

0100

1001

0010

0001

0100 0 1 ….. 1 1 0

0

0

0

0

0

3

2

2

0

1

1

0

0

1

1

0

0

1

1

0

0

0

1

1

The length of the path is the Hamming distance d(s’, s)

We start with this subproblem: SUB(A): Given a set of kmers A, can we correct s within budget so as it has all of its kmers in A?

Page 46: Flipping letters to minimize the support of a string Giuseppe Lancia, Franca Rinaldi, Romeo Rizzi University of Udine.

The case |The case || and k fixed:| and k fixed:

We start with this subproblem: SUB(A): Given a set of kmers A, can we correct s within budget so as it has all of its kmers in A?

s = 01000100101110A = 0100, 1001, 0010 , 0001 B = 3

0100

1001

0010

0001

0100

1001

0010

0001

0100

1001

0010

0001

……

……

……

0100

1001

0010

0001

0100

1001

0010

0001

0100

1001

0010

0001

0100 0 1 ….. 1 1 0

0

0

0

0

0

3

2

2

0

1

1

0

0

1

1

0

0

1

1

0

0

0

1

1

SUB(A) has a solution iff the shortest path is <= B

Page 47: Flipping letters to minimize the support of a string Giuseppe Lancia, Franca Rinaldi, Romeo Rizzi University of Udine.

The case |The case || and k fixed:| and k fixed:

We start with this subproblem: SUB(A): Given a set of kmers A, can we correct s within budget so as it has all of its kmers in A?

- we can solve SUB(A) in polytime (O|A||||s|) = O(|s|) sincekA 2||

Page 48: Flipping letters to minimize the support of a string Giuseppe Lancia, Franca Rinaldi, Romeo Rizzi University of Udine.

The case |The case || and k fixed:| and k fixed:

We start with this subproblem: SUB(A): Given a set of kmers A, can we correct s within budget so as it has all of its kmers in A?

- we can solve SUB(A) in polytime (O|A||||s|) = O(|s|) since

- There are “only” possible subsets A to try…

)1(2 || Ok

problem is solved in polytime O(|s|)

kA 2||

Page 49: Flipping letters to minimize the support of a string Giuseppe Lancia, Franca Rinaldi, Romeo Rizzi University of Udine.

The case of B fixed:The case of B fixed:

Page 50: Flipping letters to minimize the support of a string Giuseppe Lancia, Franca Rinaldi, Romeo Rizzi University of Udine.

The case of B fixed:The case of B fixed:

NO k NO

YES k NO

NO k YES

YES k YES

s NO B NO

s YES B NO

s NO B YES

s YES B YES )1(O

)|(| )1(2 BsO

|)(| sO

)||( )1(2 BskO

NP-hard NP-hardfor ||=2

NP-hardfor k=2

Page 51: Flipping letters to minimize the support of a string Giuseppe Lancia, Franca Rinaldi, Romeo Rizzi University of Udine.

The case of B fixed:The case of B fixed:

For B fixed, we can try all possible solutions. There are

)|(||| BsO

B

s

possible choices of bits to flip. We can try them all, and count the # of k-mers.

Since ||<=|s|, the way to flip them is bounded by )|(| ksO

Page 52: Flipping letters to minimize the support of a string Giuseppe Lancia, Franca Rinaldi, Romeo Rizzi University of Udine.

4. NP-hardness4. NP-hardness

Page 53: Flipping letters to minimize the support of a string Giuseppe Lancia, Franca Rinaldi, Romeo Rizzi University of Udine.

- Theorem: the problem is NP-hard even for k=2.

Page 54: Flipping letters to minimize the support of a string Giuseppe Lancia, Franca Rinaldi, Romeo Rizzi University of Udine.

- Theorem: the problem is NP-hard even for k=2.

NO k NO

YES k NO

NO k YES

YES k YES

s NO B NO

s YES B NO

s NO B YES

s YES B YES )1(O

)|(| )1(2 BsO

|)(| sO

)||( )1(2 BskO

NP-hard NP-hardfor ||=2

NP-hardfor k=2

Page 55: Flipping letters to minimize the support of a string Giuseppe Lancia, Franca Rinaldi, Romeo Rizzi University of Udine.

- Theorem: the problem is NP-hard even for k=2.

- Proof: reduction from COMPACT BIPARTITE SUBGRAPH (CBS)

INSTANCE: a bipartite graph G=(U,V;E). Integers n, m

PROBLEM: does there exist a set VUX such that

(i)

(ii)

nX ||

mXGE |])[(|

Page 56: Flipping letters to minimize the support of a string Giuseppe Lancia, Franca Rinaldi, Romeo Rizzi University of Udine.

- Theorem: the problem is NP-hard even for k=2.

- Proof: reduction from COMPACT BIPARTITE SUBGRAPH (CBS)

INSTANCE: a bipartite graph G=(U,V;E). Integers n, m

PROBLEM: does there exist a set such that

(i)

(ii)

nX ||

mXGE |])[(|

n = 6, m = 5

VUX

Page 57: Flipping letters to minimize the support of a string Giuseppe Lancia, Franca Rinaldi, Romeo Rizzi University of Udine.

- Theorem: the problem is NP-hard even for k=2.

- Proof: reduction from COMPACT BIPARTITE SUBGRAPH (CBS)

INSTANCE: a bipartite graph G=(U,V;E). Integers n, m

PROBLEM: does there exist a set such that

(i)

(ii)

nX ||

mXGE |])[(|

n = 6, m = 5

VUX

Page 58: Flipping letters to minimize the support of a string Giuseppe Lancia, Franca Rinaldi, Romeo Rizzi University of Udine.

- Theorem: the problem is NP-hard even for k=2.

- Proof: reduction from COMPACT BIPARTITE SUBGRAPH (CBS)

INSTANCE: a bipartite graph G=(U,V;E). Integers n, m

PROBLEM: does there exist a set such that

(i)

(ii)

nX ||

mXGE |])[(|

n = 6, m = 5

( note that CBS is NP-hard because itincludes MAX BALANCED BIP GRAPH,for n = 2t, m = t^2 )

VUX

Page 59: Flipping letters to minimize the support of a string Giuseppe Lancia, Franca Rinaldi, Romeo Rizzi University of Udine.

CBS: does there exist VUX such that nX || mXGE |])[(|

The reduction:The reduction:and

?

a

b

c

d

e

f

g

h

i

l

Page 60: Flipping letters to minimize the support of a string Giuseppe Lancia, Franca Rinaldi, Romeo Rizzi University of Udine.

CBS: does there exist VUX such that nX || mXGE |])[(|

The reduction:The reduction:and

?

a

b

c

d

e

f

g

h

i

l

Let = {a,b,c,d,e,f,g,h,i,l} U { , } B = |E| - m

Page 61: Flipping letters to minimize the support of a string Giuseppe Lancia, Franca Rinaldi, Romeo Rizzi University of Udine.

CBS: does there exist VUX such that nX || mXGE |])[(|

The reduction:The reduction:and

?

a

b

c

d

e

f

g

h

i

l

Let = {a,b,c,d,e,f,g,h,i,l} U { , } B = |E| - m

IDEA: Encode an edge (i,j) as …ij…

and make all k-mers x, x and unavoidable(i.e., insert a LOT of each of them in s)

Page 62: Flipping letters to minimize the support of a string Giuseppe Lancia, Franca Rinaldi, Romeo Rizzi University of Udine.

CBS: does there exist VUX such that nX || mXGE |])[(|

The reduction:The reduction:and

?

a

b

c

d

e

f

g

h

i

l

Let = {a,b,c,d,e,f,g,h,i,l} U { , } B = |E| - m

The only kmers that can be destroyed are of the form x or x, and this is achieved by “flippling” the into a . This corresponds to removing theedge.

ij.. ...ij

IDEA: Encode an edge (i,j) as …ij…

and make all k-mers x, x and unavoidable(i.e., insert a LOT of each of them in s)

Page 63: Flipping letters to minimize the support of a string Giuseppe Lancia, Franca Rinaldi, Romeo Rizzi University of Udine.

CBS: does there exist VUX such that nX || mXGE |])[(|

The reduction:The reduction:and

?

a

b

c

d

e

f

g

h

i

l

Let = {a,b,c,d,e,f,g,h,i,l} U { , } B = |E| - m

The set of kmers of type x or xwhich remain define the set X whichcovers at least m edges

IDEA: Encode an edge (i,j) as …ij…

and make all k-mers x, x and unavoidable(i.e., insert a LOT of each of them in s)

The only kmers that can be destroyed are of the form x or x, and this is achieved by “flippling” the into a . This corresponds to removing theedge.

ij.. ...ij

Page 64: Flipping letters to minimize the support of a string Giuseppe Lancia, Franca Rinaldi, Romeo Rizzi University of Udine.

CBS: does there exist VUX such that nX || mXGE |])[(|

The reduction:The reduction:and

?

a

b

c

d

e

f

g

h

i

l

Let = {a,b,c,d,e,f,g,h,i,l} U { , } B = |E| - m

IDEA: Encode an edge (i,j) as …ij…

and make all k-mers x, x and unavoidable(i.e., insert a LOT of each of them in s)

S =aa. . .aabb. . .bb. . .lll. . .lagbgei

Page 65: Flipping letters to minimize the support of a string Giuseppe Lancia, Franca Rinaldi, Romeo Rizzi University of Udine.

-With similar reductions, we can also prove the

Theorem: the problem is NP-hard even for || = 2

Page 66: Flipping letters to minimize the support of a string Giuseppe Lancia, Franca Rinaldi, Romeo Rizzi University of Udine.

5. Integer Linear 5. Integer Linear ProgrammingProgrammingformulationsformulations

Page 67: Flipping letters to minimize the support of a string Giuseppe Lancia, Franca Rinaldi, Romeo Rizzi University of Udine.

Let K be the set of all possible kmers

hx

iz

Define a 0/1 variable in K

and a 0/1 variable for each position

x

Exponential-sizeExponential-size formulation (formulation (={0,1})={0,1})

K={ 000, 001, 010, 011, 100, 101, 110, 111 }

||,...,1 si

Page 68: Flipping letters to minimize the support of a string Giuseppe Lancia, Franca Rinaldi, Romeo Rizzi University of Udine.

x min

Exponential-sizeExponential-size formulation (formulation (={0,1})={0,1})

Page 69: Flipping letters to minimize the support of a string Giuseppe Lancia, Franca Rinaldi, Romeo Rizzi University of Udine.

x min

||

1

s

ii Bz

Exponential-sizeExponential-size formulation (formulation (={0,1})={0,1})

Page 70: Flipping letters to minimize the support of a string Giuseppe Lancia, Franca Rinaldi, Romeo Rizzi University of Udine.

)1()1(),(),(

kzzxrEQLi

irDIFFi

i

x min

||

1

s

ii Bz

1||,...,1 ksr

Exponential-sizeExponential-size formulation (formulation (={0,1})={0,1})

Page 71: Flipping letters to minimize the support of a string Giuseppe Lancia, Franca Rinaldi, Romeo Rizzi University of Udine.

Exponential n. of variables/constraints

pricing & separation problems

Our P&S strategy is exponential in the general case (polynomial for k fixed)

Page 72: Flipping letters to minimize the support of a string Giuseppe Lancia, Franca Rinaldi, Romeo Rizzi University of Udine.

}0:{ * xA

Exponential n. of variables/constraints

pricing & separation problems

Our P&S strategy is exponential in the general case (polynomial for k fixed)

Fractional solutions may lead to effective heuristics (e.g., solving SUB(A) with

Page 73: Flipping letters to minimize the support of a string Giuseppe Lancia, Franca Rinaldi, Romeo Rizzi University of Udine.

Polynomial-sizePolynomial-size formulation (formulation (={0,1})={0,1})

Page 74: Flipping letters to minimize the support of a string Giuseppe Lancia, Franca Rinaldi, Romeo Rizzi University of Udine.

ijw

Polynomial-sizePolynomial-size formulation (formulation (={0,1})={0,1})

In addition to variables ,

there are variables for positions i and j saying

“is the kmer starting at i identical to the kmer starting at j ? “

iz

The variables w depend on s and z via linear constraints

Page 75: Flipping letters to minimize the support of a string Giuseppe Lancia, Franca Rinaldi, Romeo Rizzi University of Udine.

hx

ijw

1iu

To count only kmers that have no identical kmer following them:

if all kmers starting after i are different from kmer at i

Polynomial-sizePolynomial-size formulation (formulation (={0,1})={0,1})

In addition to variables ,

there are variables for positions i and j saying

“is the kmer starting at i identical to the kmer starting at j ? “

iz

The variables w depend on s and z via linear constraints

Page 76: Flipping letters to minimize the support of a string Giuseppe Lancia, Franca Rinaldi, Romeo Rizzi University of Udine.

Polynomial-sizePolynomial-size formulation (formulation (={0,1})={0,1})

-The formulation (not show) is basically a big boolean formula

-It yields a poor bound compared to the exponential formulation

-It can be improved in many ways (expecially via valid cuts)

-At this stage it’s not clear which method is best for not too small instances

-We’ll run experiments with variants of both formulations

Page 77: Flipping letters to minimize the support of a string Giuseppe Lancia, Franca Rinaldi, Romeo Rizzi University of Udine.

<EOT><EOT>