Simple Algorithm for Sorting the Fibonacci String Rotations
Manolis Christodoulakis, King’s College London
Joint work with Costas S. Iliopoulos and Yoan José Pinzón Ardila
SOFSEM 2006 2
Our Goal
What makes Fibonacci strings a best-case input for the Burrows-Wheeler Transform (BWT)?
Relationship between different rotations of a Fibonacci string
What is their lexicographic order?
Side effect: we can deduce the symbol stored at any position of any Fibonacci string in constant time (without using , provided that the f_n values are known)
Fibonacci Strings & Numbers
The n-th Fibonacci string: F_n = F_{n-1} F_{n-2} for n ≥ 2, with F_0 = b, F_1 = a
The n-th Fibonacci number: f_n = f_{n-1} + f_{n-2} for n ≥ 2, with f_0 = 1, f_1 = 1
F_0 = b        f_0 = 1
F_1 = a        f_1 = 1
F_2 = ab       f_2 = 2
F_3 = aba      f_3 = 3
F_4 = abaab    f_4 = 5
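The two recurrences above can be sketched in a few lines of Python. The function names fib_string and fib_number are illustrative choices, not names used in the talk.

```python
def fib_string(n):
    """Return F_n, with F_0 = 'b', F_1 = 'a' and F_n = F_{n-1} F_{n-2}."""
    prev, cur = "b", "a"              # F_0, F_1
    for _ in range(n):
        prev, cur = cur, cur + prev   # slide the window one step forward
    return prev


def fib_number(n):
    """Return f_n, with f_0 = f_1 = 1 and f_n = f_{n-1} + f_{n-2}."""
    prev, cur = 1, 1                  # f_0, f_1
    for _ in range(n):
        prev, cur = cur, cur + prev
    return prev


for n in range(6):
    # len(F_n) equals f_n for every n
    print(n, fib_string(n), fib_number(n))
```

With this indexing the length of F_n is exactly f_n, which is what the later modular formulas rely on.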
Notation
The i-th rotation of a string x = x[0..n-1]:
x      = x[0] x[1] … x[i-1] x[i] … x[n-1]
R_i(x) = x[i] … x[n-1] x[0] x[1] … x[i-1]
where i is taken modulo n.
rank(i, x) = the rank of R_i(x) in the sorted list of rotations of x
rot(ρ, x) = the rotation whose rank is ρ
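A brute-force Python sketch of this notation, assuming ranks are counted from 0 and that all rotations of x are distinct (true for Fibonacci strings). The helper sorted_rotation_indices is a hypothetical name, and rot here returns the index i of the rotation rather than the rotated string itself.

```python
def rotation(i, x):
    """R_i(x): move the first i symbols of x to the end (i taken modulo n)."""
    i %= len(x)
    return x[i:] + x[:i]


def sorted_rotation_indices(x):
    """Indices i ordered by the lexicographic order of R_i(x)."""
    return sorted(range(len(x)), key=lambda i: rotation(i, x))


def rank(i, x):
    """0-based position of R_i(x) in the sorted list of rotations."""
    return sorted_rotation_indices(x).index(i % len(x))


def rot(rho, x):
    """Index i of the rotation whose 0-based rank is rho."""
    return sorted_rotation_indices(x)[rho]


x = "abaababa"
print(rotation(1, x))   # baababaa
print(rank(0, x))       # 3
print(rot(0, x))        # 7
```

Both rank and rot sort all rotations on every call, so this is only a reference implementation to test faster formulas against.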
Burrows-Wheeler Transform (BWT)
M. Burrows and D. J. Wheeler, 1994
Purpose: to make a string more compressible
BWT Algorithm:
1. Create the list of all rotations
2. Sort them
3. Output the last symbol of every rotation
4. Output the rank of the 0-th rotation
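The four steps can be written out naively as follows. This is a quadratic-space sketch for illustration only, not how a practical compressor would compute the transform, and the function name bwt is an assumption.

```python
def bwt(x):
    """Naive Burrows-Wheeler Transform following the four steps literally.

    Returns (last_column, rank_of_R0), with ranks counted from 0.
    """
    n = len(x)
    rotations = [x[i:] + x[:i] for i in range(n)]             # step 1
    order = sorted(range(n), key=lambda i: rotations[i])      # step 2
    last_column = "".join(rotations[i][-1] for i in order)    # step 3
    return last_column, order.index(0)                        # step 4


print(bwt("banana"))   # ('nnbaaa', 3)
```

The rank of the 0-th rotation is what allows a decoder to pick the original string out of the reconstructed list of rotations.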
BWT on Fibonacci Strings
F_5 = abaababa, f_5 = 8
R_0(F_5) = abaababa
R_1(F_5) = baababaa
R_2(F_5) = aababaab
R_3(F_5) = ababaaba
R_4(F_5) = babaabaa
R_5(F_5) = abaabaab
R_6(F_5) = baabaaba
R_7(F_5) = aabaabab
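To see what sorting does to this list, the following short script (a sketch; the variable names are illustrative) prints the rotations of F_5 in lexicographic order together with their last symbols.

```python
F5 = "abaababa"
n = len(F5)

rotations = [F5[i:] + F5[:i] for i in range(n)]
order = sorted(range(n), key=lambda i: rotations[i])   # rotation indices by rank

for rho, i in enumerate(order):
    print(f"rank {rho}: R_{i}(F_5) = {rotations[i]}   last symbol: {rotations[i][-1]}")

bwt_output = "".join(rotations[i][-1] for i in order)
print(bwt_output, order.index(0))   # prints: bbbaaaaa 3
```

The sorted order of indices is 7, 2, 5, 0, 3, 6, 1, 4: each index is the previous one plus f_3 = 3, modulo f_5 = 8. All three b's end up at the front of the output.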
Properties of Fibonacci Strings
The number of ‘b’s in F_n is f_{n-2}
Proof: By induction.
C. S. Iliopoulos, D. W. Moore and W. F. Smyth, 1997:
F_n = F_{n-2} F_{n-3} … F_1 u_n, where u_n = ba (n odd), u_n = ab (n even)
Let’s call this the IMS formula.
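Both properties are easy to check numerically for small n. The sketch below assumes n ≥ 2 (for n = 2 the product F_{n-2} … F_1 is empty); the name ims_factorisation is a hypothetical helper, not something taken from the cited paper.

```python
def fib_string(n):
    """F_0 = 'b', F_1 = 'a', F_n = F_{n-1} F_{n-2}."""
    prev, cur = "b", "a"
    for _ in range(n):
        prev, cur = cur, cur + prev
    return prev


def ims_factorisation(n):
    """Build F_{n-2} F_{n-3} ... F_1 u_n for n >= 2."""
    u_n = "ba" if n % 2 == 1 else "ab"
    return "".join(fib_string(k) for k in range(n - 2, 0, -1)) + u_n


def properties_hold(n):
    """Check the b-count property and the IMS formula for one value of n."""
    F = fib_string(n)
    b_count_ok = F.count("b") == len(fib_string(n - 2))   # len(F_{n-2}) = f_{n-2}
    return b_count_ok and F == ims_factorisation(n)


print(all(properties_hold(n) for n in range(2, 15)))
```

For example, F_5 = F_3 F_2 F_1 u_5 = aba · ab · a · ba = abaababa.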
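Similarities in Rotations
R_0(F_n) differs from R_{f_{n-2}}(F_n) in 2 symbols
Proof:
R_0(F_n) = F_{n-2} F_{n-3} … F_1 u_n   (IMS formula)
R_{f_{n-2}}(F_n) = F_{n-3} … F_1 u_n F_{n-2}   (1)
R_0(F_n) = F_{n-1} F_{n-2} = F_{n-3} … F_1 u_{n-1} F_{n-2}   (2)
(1) and (2) differ only where u_n meets u_{n-1}, i.e. ba against ab: 2 symbols.
R_i(F_n) differs from R_{i+f_{n-2}}(F_n) in 2 symbols
Proof:
R_i(F_n) = R_i(R_0(F_n))
R_{i+f_{n-2}}(F_n) = R_i(R_{f_{n-2}}(F_n))
Rotating both strings by the same i does not change the number of mismatches.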
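A quick numerical check of the two-mismatch claim, written as a sketch: mismatches is simply the Hamming distance, and the helper names are illustrative.

```python
def fib_string(n):
    """F_0 = 'b', F_1 = 'a', F_n = F_{n-1} F_{n-2}."""
    prev, cur = "b", "a"
    for _ in range(n):
        prev, cur = cur, cur + prev
    return prev


def rotation(i, x):
    """R_i(x), with i taken modulo len(x)."""
    i %= len(x)
    return x[i:] + x[:i]


def mismatches(u, v):
    """Number of positions where two equal-length strings differ."""
    return sum(c != d for c, d in zip(u, v))


def two_mismatches_everywhere(n):
    """True if R_i(F_n) and R_{i+f_{n-2}}(F_n) differ in exactly 2 symbols for all i."""
    F = fib_string(n)
    shift = len(fib_string(n - 2))   # f_{n-2}
    return all(
        mismatches(rotation(i, F), rotation(i + shift, F)) == 2
        for i in range(len(F))
    )


print(all(two_mismatches_everywhere(n) for n in range(3, 13)))
```

For F_5, R_0 = abaababa and R_3 = ababaaba differ only at positions 3 and 4.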
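Relative Order of Rotations
R_i(F_n) < R_{i+f_{n-2}}(F_n) for n odd, i ≠ f_{n-1} - 1
Proof:
R_0(F_n) = F_{n-3} … F_1 u_{n-1} F_{n-2} = F_{n-3} … F_1 ab F_{n-2}
R_{f_{n-2}}(F_n) = F_{n-3} … F_1 u_n F_{n-2} = F_{n-3} … F_1 ba F_{n-2}
The first mismatch is a against b, and this remains true after rotating both strings by i, unless the rotation splits the mismatching pair.
For i = f_{n-1} - 1 the pair is split and the order is reversed:
R_i(F_n) = b F_{n-2} F_{n-3} … F_1 a
R_{i+f_{n-2}}(F_n) = a F_{n-2} F_{n-3} … F_1 b
Similarly, R_i(F_n) > R_{i+f_{n-2}}(F_n) for n even, i ≠ f_{n-1} - 1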
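The inequality and its single exception can be checked by brute force. In this sketch the function order_claim_holds (an illustrative name) returns True when, for every i, the comparison goes the way the slide states for the parity of n, with the direction flipped at i = f_{n-1} - 1.

```python
def fib_string(n):
    """F_0 = 'b', F_1 = 'a', F_n = F_{n-1} F_{n-2}."""
    prev, cur = "b", "a"
    for _ in range(n):
        prev, cur = cur, cur + prev
    return prev


def rotation(i, x):
    """R_i(x), with i taken modulo len(x)."""
    i %= len(x)
    return x[i:] + x[:i]


def order_claim_holds(n):
    """Check R_i vs R_{i+f_{n-2}} for every i, including the exceptional index."""
    F = fib_string(n)
    f_n1 = len(fib_string(n - 1))    # f_{n-1}
    f_n2 = len(fib_string(n - 2))    # f_{n-2}
    for i in range(len(F)):
        lhs, rhs = rotation(i, F), rotation(i + f_n2, F)
        # n odd: '<' everywhere except at i = f_{n-1} - 1; n even: the opposite
        expect_less = (n % 2 == 1) != (i == f_n1 - 1)
        if (lhs < rhs) != expect_less:
            return False
    return True


print(all(order_claim_holds(n) for n in range(2, 13)))
```

For n = 5 the exceptional index is i = 4: R_4 = babaabaa is the largest rotation and R_7 = aabaabab the smallest.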
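Sorted List of Rotations
We proved (n odd): R_i(F_n) < R_{i+f_{n-2}}(F_n) for i ≠ f_{n-1} - 1   (3)
We will now prove that there is no j s.t. R_i(F_n) < R_j(F_n) < R_{i+f_{n-2}}(F_n)
Proof: (constructive)
Start at i = f_n - 1 and construct the partial list
R_i, R_{i+f_{n-2}}, R_{i+2f_{n-2}}, R_{i+3f_{n-2}}, …, R_{i+k f_{n-2}}, …
for as long as i + k f_{n-2} ≢ f_{n-1} - 1 (mod f_n), i.e. k ≠ f_n - 1
By (3) the list is increasing, and it holds f_n entries (k = 0, …, f_n - 1), which are distinct because f_{n-2} and f_n are coprime.
I.e. the list is complete!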
Identify Rotation (i) by Rank (ρ)
Therefore, for n odd:
rot(ρ, F_n) = (f_n - 1 + ρ f_{n-2}) mod f_n
            = (ρ f_{n-2} - 1) mod f_n
Similarly, for n even, the sorted list is constructed bottom-up giving
rot(ρ, F_n) = (-(ρ+1) f_{n-2} - 1) mod f_n
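The closed forms can be compared against a brute-force sort. This is a sketch assuming 0-based ranks and n ≥ 2; rot_formula and rot_brute_force are illustrative names, and both return the rotation index i.

```python
def fib_string(n):
    """F_0 = 'b', F_1 = 'a', F_n = F_{n-1} F_{n-2}."""
    prev, cur = "b", "a"
    for _ in range(n):
        prev, cur = cur, cur + prev
    return prev


def fib_number(n):
    """f_0 = f_1 = 1, f_n = f_{n-1} + f_{n-2}."""
    prev, cur = 1, 1
    for _ in range(n):
        prev, cur = cur, cur + prev
    return prev


def rot_formula(rho, n):
    """Index of the rotation of F_n with 0-based rank rho, via the closed form."""
    f_n, f_n2 = fib_number(n), fib_number(n - 2)
    if n % 2 == 1:
        return (rho * f_n2 - 1) % f_n
    return (-(rho + 1) * f_n2 - 1) % f_n   # Python's % is non-negative here


def rot_brute_force(rho, x):
    """Same quantity obtained by actually sorting all rotations of x."""
    order = sorted(range(len(x)), key=lambda i: x[i:] + x[:i])
    return order[rho]


print([rot_formula(rho, 5) for rho in range(8)])   # [7, 2, 5, 0, 3, 6, 1, 4]
print([rot_formula(rho, 4) for rho in range(5)])   # [2, 0, 3, 1, 4]
```

Once f_n and f_{n-2} are known, each lookup is a constant number of arithmetic operations, whereas the brute-force version sorts f_n strings of length f_n.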
Identify Rank (ρ) of a Rotation (i)
This is simply the inverse of the previous function
n odd: rank(i, F_n) = ((i+1) f_{n-2}) mod f_n
n even: rank(i, F_n) = ((i+1) f_{n-2} - 1) mod f_n
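A matching sketch for the rank formulas, again assuming 0-based ranks and n ≥ 2. The names rank_formula and sorted_rotation_indices are illustrative.

```python
def fib_string(n):
    """F_0 = 'b', F_1 = 'a', F_n = F_{n-1} F_{n-2}."""
    prev, cur = "b", "a"
    for _ in range(n):
        prev, cur = cur, cur + prev
    return prev


def fib_number(n):
    """f_0 = f_1 = 1, f_n = f_{n-1} + f_{n-2}."""
    prev, cur = 1, 1
    for _ in range(n):
        prev, cur = cur, cur + prev
    return prev


def rank_formula(i, n):
    """0-based rank of R_i(F_n) among all rotations of F_n, via the closed form."""
    f_n, f_n2 = fib_number(n), fib_number(n - 2)
    if n % 2 == 1:
        return ((i + 1) * f_n2) % f_n
    return ((i + 1) * f_n2 - 1) % f_n


def sorted_rotation_indices(x):
    """Rotation indices of x in lexicographic order of the rotations."""
    return sorted(range(len(x)), key=lambda i: x[i:] + x[:i])


print([rank_formula(i, 5) for i in range(8)])   # [3, 6, 1, 4, 7, 2, 5, 0]
```

For F_5 this says, for instance, that R_0 has rank 3 and R_7 has rank 0, in agreement with the sorted list of rotations of F_5.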
Symbols of Fibonacci Strings
F_n[i] = ?
Observe that F_n[i] = R_i(F_n)[0]
In the sorted list of rotations, the first f_{n-1} rotations start with ‘a’, the rest with ‘b’
Thus F_n[i] can be deduced from rank(i, F_n):
if rank(i, F_n) < f_{n-1} (ranks counted from 0) then F_n[i] = a, else F_n[i] = b.
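Putting the rank formula and this observation together gives a constant-time symbol lookup once the Fibonacci numbers are tabulated. The sketch below assumes n ≥ 2, 0-based ranks, and a precomputed list fib with fib[k] = f_k; the function names are illustrative.

```python
def fib_string(n):
    """F_0 = 'b', F_1 = 'a', F_n = F_{n-1} F_{n-2} (used only for checking)."""
    prev, cur = "b", "a"
    for _ in range(n):
        prev, cur = cur, cur + prev
    return prev


def fib_table(limit):
    """List [f_0, f_1, ..., f_limit] with f_0 = f_1 = 1."""
    fib = [1, 1]
    while len(fib) <= limit:
        fib.append(fib[-1] + fib[-2])
    return fib


def symbol(i, n, fib):
    """F_n[i] without building F_n: compute rank(i, F_n), compare with f_{n-1}."""
    f_n, f_n1, f_n2 = fib[n], fib[n - 1], fib[n - 2]
    if n % 2 == 1:
        r = ((i + 1) * f_n2) % f_n
    else:
        r = ((i + 1) * f_n2 - 1) % f_n
    return "a" if r < f_n1 else "b"


fib = fib_table(20)
print("".join(symbol(i, 5, fib) for i in range(fib[5])))   # abaababa
```

The comparison uses a strict inequality because the f_{n-1} rotations starting with ‘a’ occupy ranks 0 to f_{n-1} - 1.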
BWT & Fibonacci: The Quick Way
The first f_{n-2} symbols of BWT are ‘b’
Proof: (n odd)
We proved the first f_{n-2} rotations have index (ρ·f_{n-2} - 1) mod f_n for 0 ≤ ρ < f_{n-2}
The last symbol of these rotations is F_n[(ρ·f_{n-2} - 1 + f_n - 1) mod f_n]
Which for 0 ≤ ρ < f_{n-2} is ‘b’
The next f_{n-1} symbols of BWT are ‘a’
Proof: Consequence of previous lemma
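Under these two lemmas the BWT of F_n can be written down without sorting anything. The sketch below compares that shortcut with a naive BWT for small n. The rank of the 0-th rotation is taken from the rank formulas with i = 0, which is an inference made here rather than something stated on this slide; the check is empirical and covers even n as well. All function names are illustrative.

```python
def fib_string(n):
    """F_0 = 'b', F_1 = 'a', F_n = F_{n-1} F_{n-2}."""
    prev, cur = "b", "a"
    for _ in range(n):
        prev, cur = cur, cur + prev
    return prev


def fib_number(n):
    """f_0 = f_1 = 1, f_n = f_{n-1} + f_{n-2}."""
    prev, cur = 1, 1
    for _ in range(n):
        prev, cur = cur, cur + prev
    return prev


def naive_bwt(x):
    """Sort all rotations; return (last column, 0-based rank of R_0)."""
    order = sorted(range(len(x)), key=lambda i: x[i:] + x[:i])
    last_column = "".join(x[(i - 1) % len(x)] for i in order)  # last symbol of R_i
    return last_column, order.index(0)


def quick_bwt(n):
    """BWT of F_n (n >= 2) from Fibonacci numbers alone."""
    f_n1, f_n2 = fib_number(n - 1), fib_number(n - 2)
    last_column = "b" * f_n2 + "a" * f_n1
    # rank(0, F_n): f_{n-2} for n odd, f_{n-2} - 1 for n even
    rank_of_r0 = f_n2 if n % 2 == 1 else f_n2 - 1
    return last_column, rank_of_r0


print(quick_bwt(5))   # ('bbbaaaaa', 3)
print(all(quick_bwt(n) == naive_bwt(fib_string(n)) for n in range(2, 13)))
```

The output is a run of b's followed by a run of a's, which is the most compressible shape a two-letter BWT output can take.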