Readability Combinatorial Pattern Matching (CPM) June 29, 2015 Rayan Chikhi, CNRS Lille Sofya Raskhodnikova, Penn State Paul Medvedev, Penn State Martin Milanič, University of Primorska


Readability

Combinatorial Pattern Matching (CPM), June 29, 2015

Rayan Chikhi, CNRS Lille
Sofya Raskhodnikova, Penn State
Paul Medvedev, Penn State
Martin Milanič, University of Primorska


Overlap Digraph (definition)
• A string x overlaps a string y if there is a suffix of x that is equal to a prefix of y.
• They overlap properly if, in addition, the suffix and prefix are both proper.
• The overlap digraph of a set of strings is a digraph where each string is a vertex and there is an edge (x, y) if and only if x properly overlaps y.
• Variants of overlap graphs are used in bioinformatics applications.

Figure: example strings ACGTA, GTAAC, CCCCT, GGACT.
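
The definition above can be sketched in a few lines of Python. This is only an illustration, not code from the talk: the function names are hypothetical, the overlap is assumed to be nonempty, and a string that properly overlaps itself is assumed to produce a loop.

```python
from itertools import product


def proper_overlap_lengths(x, y):
    """Return every k such that the length-k suffix of x equals the
    length-k prefix of y, with 0 < k < min(len(x), len(y)).

    Assumption: "proper" means the suffix and prefix are shorter than
    their strings, and the empty overlap does not count.
    """
    return [k for k in range(1, min(len(x), len(y))) if x[-k:] == y[:k]]


def overlap_digraph(strings):
    """Return the edge set {(x, y)} where x properly overlaps y.

    Assumption: pairs with x == y are tested too, so loops may appear.
    """
    return {
        (x, y)
        for x, y in product(strings, repeat=2)
        if proper_overlap_lengths(x, y)
    }


if __name__ == "__main__":
    # ACGTA and GTAAC are two of the strings shown on the slide.
    for edge in sorted(overlap_digraph(["ACGTA", "GTAAC"])):
        print(edge)
```

On the two slide strings, ACGTA overlaps GTAAC through "GTA", GTAAC overlaps ACGTA through "AC", and ACGTA overlaps itself through "A".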


Questions
• Do overlap digraphs have any properties or structure that can be exploited?
– Given a graph, Braga and Meidanis (2002) showed how to label the vertices so that the graph is an overlap graph
• How does the set of graphs generated depend on the string length?
– BM labeling used strings of length
– Limiting the string length limits the graphs that can be generated


Readability in the digraph model
• A labeling is an assignment of strings to vertices.
• Let D be a directed graph.
• An overlap labeling is a labeling such that (x, y) is an edge if and only if the string of x properly overlaps the string of y.
• The readability of a digraph D, denoted r(D), is the smallest nonnegative integer r such that there exists an injective overlap labeling of D with strings of length r.

Figure: example labeling with the strings ACGTA, GTAAC, CCCCT, GGACT, annotated 2 < r(G) ≤ 5.
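
A labeling can be checked against this definition mechanically. The sketch below is an illustration with hypothetical names; it assumes that all labels must have the same length r and that loops are part of the edge set. A labeling that passes with strings of length r shows only that the readability is at most r.

```python
def properly_overlaps(x, y):
    """True if a nonempty proper suffix of x equals a proper prefix of y."""
    return any(x[-k:] == y[:k] for k in range(1, min(len(x), len(y))))


def is_injective_overlap_labeling(edges, label):
    """Check a labeling against a digraph.

    edges: set of (x, y) vertex pairs (loops allowed).
    label: dict mapping every vertex to its string.
    """
    vertices = list(label)
    strings = list(label.values())
    # Injective: no two vertices share a string.
    if len(set(strings)) != len(strings):
        return False
    # Assumption: "strings of length r" means one common length.
    if len({len(s) for s in strings}) > 1:
        return False
    # (x, y) is an edge if and only if label[x] properly overlaps label[y].
    return all(
        ((x, y) in edges) == properly_overlaps(label[x], label[y])
        for x in vertices
        for y in vertices
    )


if __name__ == "__main__":
    label = {"a": "ACGTA", "b": "GTAAC"}
    edges = {("a", "b"), ("b", "a"), ("a", "a")}
    print(is_injective_overlap_labeling(edges, label))
```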


Readability in the bipartite graph model
• Let G be a bipartite graph.
• An overlap labeling is a labeling such that (x, y) is an edge if and only if the string of x properly overlaps the string of y.
• The readability of a bipartite graph G, denoted r(G), is the smallest nonnegative integer r such that there exists an injective overlap labeling of G with strings of length r.
• Thm: There exists a bijection such that for all
– = set of bipartite graphs with nodes in each part
– = set of all digraphs with nodes .

Figure: example labels ACA, CAC, AGA, CAT.


Examples
• Complete bipartite graph on vertices ()
• Even cycle on vertices ()

Figure: labels 41, 12, 12, 23, 23, 34, 34, 41.


Is there a simple and useful string-free formulation of readability?


P4-rule and P4 Lemma
• A decomposition of size k is a weight function w : E → {1, …, k}.
• Given an overlap labeling ℓ, the ℓ-decomposition is a decomposition assigning each edge (u, v) the length of the minimum overlap between ℓ(u) and ℓ(v).
• P4 Lemma: If ℓ is an overlap labeling, then the ℓ-decomposition satisfies the following (called the P4-rule):
– For every induced P4, if middle edge e2 has the maximum weight, then w(e2) ≥ w(e1) + w(e3)

Figure: induced P4 on vertices u1, u2, v1, v2 with edges e1, e2, e3, strings ℓ(u1), ℓ(u2), ℓ(v1), ℓ(v2), and weights w(e1), w(e2), w(e3).
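
The ℓ-decomposition and the P4-rule can be sketched for a small bipartite example. This is an illustration only: the function names are hypothetical, edges are assumed to run from a left vertex u to a right vertex v, and the rule's conclusion is taken to be w(e2) ≥ w(e1) + w(e3), which is an assumption because the inequality did not survive in the transcript.

```python
from itertools import product


def min_overlap(x, y):
    """Smallest k with 0 < k < min(len(x), len(y)) such that the
    length-k suffix of x equals the length-k prefix of y, else None."""
    for k in range(1, min(len(x), len(y))):
        if x[-k:] == y[:k]:
            return k
    return None


def decomposition(left, right):
    """Weights of the overlap graph: {(u, v): minimum overlap length}.

    left, right: dicts mapping vertex names to strings.
    """
    w = {}
    for u, v in product(left, right):
        k = min_overlap(left[u], right[v])
        if k is not None:
            w[(u, v)] = k
    return w


def satisfies_p4_rule(w):
    """Check the (assumed) P4-rule on a weighted bipartite edge set."""
    edges = set(w)
    for (u, v) in edges:                      # candidate middle edge e2
        for (ua, v1) in edges:                # e1 = (u, v1) shares u
            if ua != u or v1 == v:
                continue
            for (u2, vb) in edges:            # e3 = (u2, v) shares v
                if vb != v or u2 == u:
                    continue
                if (u2, v1) in edges:         # path v1-u-v-u2 not induced
                    continue
                e1, e2, e3 = w[(u, v1)], w[(u, v)], w[(u2, v)]
                if e2 >= max(e1, e3) and e2 < e1 + e3:
                    return False
    return True


if __name__ == "__main__":
    # Hypothetical split of the slide's example strings into two sides.
    w = decomposition({"u1": "ACA", "u2": "AGA"}, {"v1": "CAC", "v2": "CAT"})
    print(w, satisfies_p4_rule(w))
```

In a bipartite graph the only pair that can spoil an induced P4 of the form v1, u, v, u2 is (u2, v1), which is why the check tests that single non-edge.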


Trees
• Given a decomposition w, we say that labeling ℓ achieves w if it is an overlap labeling and w is the ℓ-decomposition.
• Let be a tree. Theorem:
• P4 Lemma implies
• Claim: if w satisfies the P4-rule, then there exists a labeling achieving w
• Order edges by non-decreasing weight, and def
• Inductively construct labeling for . Let
– Note that , because of P4-rule and is -free
– Relabel and with
– where A has length and is composed of new, non-repeating characters

Figure: edge (u, v) of weight w(u, v), with strings built from ℓ_j(u), ℓ_j(v), and A; the lengths |ℓ_j(u)| and |ℓ_j(v)| are marked.


Proof of claim (key idea)

Figure: two cases, each showing vertices u, v, u′, the weight w(u, v), and the strings ℓ_j(u), ℓ_j(v), ℓ_j(u′); the first case also includes A.


For cycles, theorem not true

Figure: cycle example with labels 2, 4, 2, 3, 1, 2, 3.


-free bipartite graphs
• The strict P4-rule is
– For every induced P4, if middle edge has the maximum weight, then
• Theorem: For a -free bipartite graph
• For graphs with , theorem not true

Figure: example with labels 4, 2, 3, 3, 1, 1, 1.


General bipartite graphs
• Let be the subgraph of including only edges with weight .
• Define as the size of the smallest decomposition satisfying the HUB-rule: for all
– bicliques: is a disjoint union of bicliques
– hierarchical: If and have the same neighborhoods in , then they have the same neighborhoods in for .
• Thm:

Figure: vertices u1, u2, v1, v2 with edges labeled i, and strings ℓ(u1), ℓ(u2), ℓ(v1).
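
The "bicliques" half of the HUB-rule can be sketched as follows. This is an illustration under stated assumptions, with hypothetical names: the transcript lost the exact definition of the per-weight subgraph, so `weight_class` supports both readings (edges of weight exactly i by default, or at most i with `cumulative=True`). The hierarchical condition is not implemented because its index ranges are missing.

```python
from collections import defaultdict


def weight_class(w, i, cumulative=False):
    """Edges of weight exactly i, or at most i if cumulative is True."""
    if cumulative:
        return {e for e, wt in w.items() if wt <= i}
    return {e for e, wt in w.items() if wt == i}


def is_disjoint_union_of_bicliques(edges):
    """True if the bipartite graph with edges (u, v), u on the left,
    is a disjoint union of complete bipartite graphs.

    Uses the fact that this holds exactly when any two left vertices
    have neighborhoods that are either equal or disjoint. Vertices
    without edges are treated as isolated and ignored.
    """
    nbrs = defaultdict(set)
    for u, v in edges:
        nbrs[u].add(v)
    hoods = list(nbrs.values())
    return all(
        a == b or a.isdisjoint(b)
        for idx, a in enumerate(hoods)
        for b in hoods[idx + 1:]
    )


def satisfies_biclique_condition(w, cumulative=False):
    """Check the biclique condition for every weight that occurs in w."""
    return all(
        is_disjoint_union_of_bicliques(weight_class(w, i, cumulative))
        for i in set(w.values())
    )


if __name__ == "__main__":
    w = {("u1", "v1"): 1, ("u1", "v2"): 1, ("u2", "v2"): 2}
    print(satisfies_biclique_condition(w))
    print(satisfies_biclique_condition(w, cumulative=True))
```

The example shows why the reading matters: the same weights pass when each weight class is taken separately but fail cumulatively, because the three edges together form an induced path on four vertices.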


How large can readability be?

• Theorem: Almost all graphs have readability
– via counting argument


Distinctness
• Distinctness of two vertices in the same bipartition is the number of vertices in one neighborhood and not the other (taking the max of the two values)
• Distinctness of G is the minimum distinctness over all pairs
• Thm:
– Consider the decomposition of an optimal labeling
– Case 1: every is a matching
• Adding a matching can increase the distinctness by at most one
– Case 2: Let be the last one that is not a matching
• Using the fact that the decomposition satisfies the HUB-rule

Figure: vertices u1 and u2 with edges labeled j, annotated ≥ D(G).
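
The distinctness definition translates directly into code. The sketch below is an illustration with hypothetical names; it assumes that "all pairs" means all pairs of distinct vertices on the same side, taken over both sides, and that left and right vertex names do not collide.

```python
from itertools import combinations


def pair_distinctness(n_a, n_b):
    """Max of |N(a) minus N(b)| and |N(b) minus N(a)| for two neighborhoods."""
    return max(len(n_a - n_b), len(n_b - n_a))


def distinctness(left, right, edges):
    """Minimum pair distinctness over all same-side pairs.

    left, right: lists of vertex names (assumed disjoint).
    edges: iterable of (u, v) with u in left and v in right.
    Requires at least one side with two or more vertices.
    """
    nbr = {x: set() for x in list(left) + list(right)}
    for u, v in edges:
        nbr[u].add(v)
        nbr[v].add(u)
    pairs = list(combinations(left, 2)) + list(combinations(right, 2))
    return min(pair_distinctness(nbr[a], nbr[b]) for a, b in pairs)


if __name__ == "__main__":
    left = ["u1", "u2"]
    right = ["v1", "v2", "v3"]
    edges = [("u1", "v1"), ("u1", "v2"), ("u2", "v2"), ("u2", "v3")]
    print(distinctness(left, right, edges))
```

A complete bipartite graph has distinctness 0 under this definition, since vertices on the same side share one neighborhood.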


Hadamard Graphs
• bipartite graph
– vertices assigned -long binary codewords
– edge if the inner-product of the codewords is odd
• Theorem:

Figure: codewords 00, 01, 10, 11 in each part; labels H_3 and H_2.
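
The construction described on this slide can be sketched as below. This is an illustration, not the authors' code: the function name and the parameter `bits` (the codeword length) are hypothetical, and every binary codeword of that length is assumed to appear once in each part, as in the figure.

```python
def hadamard_graph(bits):
    """Edges (a, b) between left codeword a and right codeword b
    whose inner product is odd.

    Assumption: both parts contain all 2**bits binary codewords.
    """
    words = [format(i, "0{}b".format(bits)) for i in range(2 ** bits)]

    def inner(a, b):
        # Inner product of two 0/1 strings, computed over the integers.
        return sum(int(p) & int(q) for p, q in zip(a, b))

    return {(a, b) for a in words for b in words if inner(a, b) % 2 == 1}


if __name__ == "__main__":
    for edge in sorted(hadamard_graph(2)):
        print(edge)
```

With 2-bit codewords the all-zero word is isolated on both sides, and 11 is adjacent to 01 and 10 but not to itself.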


Trees
• Thm:
– For all trees ,
– For full k-ary tree of height k,
• Assume, for the sake of contradiction, that there exists an optimal decomposition of size
• A path from root to leaf with distinct edge weights, with values, with edges

Figure (k = 3): tree with edge weights w1, w2, w3, where w2 > w1 and w3 > w2; labels 1, 12, 2, 2, 3, 3, 33.


Conclusions

Results
• A string-free formulation of readability that is
– exactly equivalent for trees
– asymptotically equivalent for -free bipartite graphs
– “weakly” equivalent for general graphs
• Existence of a graph family with readability of

Open problems
• Find other rules that an ℓ-decomposition must satisfy to close the gap :
• Let
– We know
– Do there exist graphs with ?
• Complexity
• Understand graphs that have poly-logarithmic readability


The end



General graphs
• Define for as the subgraph of including only edges with weight at most .
• Lem: An ℓ-decomposition satisfies the following (HUB-rule), for all
– is a disjoint union of bicliques
– If and have the same neighborhoods in , then they have the same neighborhoods in for .
• Define as the size of the smallest decomposition satisfying the HUB-rule.
• Thm:

Figure: vertices u1, u2, v1, v2 with edges labeled i, and strings ℓ(u1), ℓ(u2), ℓ(v1).


Questions/Results
• Do there exist graphs with readability


Almost all graphs have readability

• Counting argument
– There are bipartite graphs with vertices.
– There are at most labellings of length