Lower Bounds for Read / Write Streams
![Page 1: Lower Bounds for Read / Write Streams](https://reader034.fdocuments.net/reader034/viewer/2022050908/56814064550346895dabdd72/html5/thumbnails/1.jpg)
Lower Bounds for Read/Write Streams
Paul Beame
Joint work with Trinh Huynh (Dang-Trinh Huynh-Ngoc)
University of Washington
![Page 2: Lower Bounds for Read / Write Streams](https://reader034.fdocuments.net/reader034/viewer/2022050908/56814064550346895dabdd72/html5/thumbnails/2.jpg)
Data stream Algorithms
• Many huge successes
– No need to remind people at this workshop!
• Some problems provably hard
– E.g. frequency moments $F_k$, $k > 2$, require space $\Omega(n^{1-2/k})$ [Bar-Yossef-Jayram-Kumar-Sivakumar 02], [Chakrabarti-Khot-Sun 03]
![Page 3: Lower Bounds for Read / Write Streams](https://reader034.fdocuments.net/reader034/viewer/2022050908/56814064550346895dabdd72/html5/thumbnails/3.jpg)
Beyond Data Streams
• Disk storage can be huge
– Can stream data to/from disks in real time
• Sequential access hides latency
– Motivates multipass streams
• Analyzed by similar methods to single pass
• Why stop at a single copy?
– Working with more than one copy at once may make computations easier
• Why stream the data onto disks exactly as read?
– Can make modifications to data while writing
![Page 4: Lower Bounds for Read / Write Streams](https://reader034.fdocuments.net/reader034/viewer/2022050908/56814064550346895dabdd72/html5/thumbnails/4.jpg)
Read/write streams model
• Disks as read/write streams
– Key parameters: space, #passes = reversals
– Assume #streams is constant
• Introduced by [Grohe-Schweikardt 05]
[Figure: read/write streams of bits attached to a small memory]
![Page 5: Lower Bounds for Read / Write Streams](https://reader034.fdocuments.net/reader034/viewer/2022050908/56814064550346895dabdd72/html5/thumbnails/5.jpg)
Read/write streams model
• Much more powerful than data-stream model
– Sort with O(log n) passes, O(log n) space, 3 streams
  • MergeSort
– Exactly compute any frequency moment
  • Data-stream requires passes × space = Ω(n)
– Θ(log n) passes, O(1) space gives all of LOGSPACE [Hernich-Schweikardt 08]
What can be computed in o(log n) passes + small space?
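To make the "exactly compute any frequency moment" bullet concrete: once the stream is sorted, one further pass with a few counters suffices. The sketch below assumes the standard definition $F_k = \sum_i f_i^k$ (with $f_i$ the number of occurrences of item $i$), which the slides do not spell out. The function name is illustrative, and Python's built-in `sorted` merely stands in for the O(log n)-pass MergeSort on 3 streams.

```python
def frequency_moment_sorted(sorted_stream, k):
    """One left-to-right pass over a sorted stream, keeping only a few counters.

    Assumes F_k = sum over distinct items of (frequency ** k).
    """
    total = 0        # running value of F_k
    current = None   # item whose run is being counted
    run = 0          # length of the current run of equal items
    for item in sorted_stream:
        if run > 0 and item == current:
            run += 1
        else:
            if run > 0:
                total += run ** k   # close the previous run
            current, run = item, 1
    if run > 0:
        total += run ** k           # close the final run
    return total


# Stand-in for the streaming MergeSort: sort in memory, then make one pass.
stream = [3, 1, 3, 2, 3, 1]
print(frequency_moment_sorted(sorted(stream), 2))  # 3^2 + 2^2 + 1^2 = 14
```

The point of the sketch is only that sorting reduces exact computation of $F_k$ to run-length counting; the sorting itself is where the O(log n) passes go.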
![Page 6: Lower Bounds for Read / Write Streams](https://reader034.fdocuments.net/reader034/viewer/2022050908/56814064550346895dabdd72/html5/thumbnails/6.jpg)
Previous lower bounds for R/W streams
• In o(log n) passes need $\Omega(n^{1-\varepsilon})$ space to
– Sort n numbers [Grohe-Schweikardt 05]
– Test set-equality A=B, multiset equality, XQuery, XPath [Grohe-Hernich-Schweikardt 06]
• Same lower bounds apply for randomized algorithms with one-sided error [Grohe-Hernich-Schweikardt 06]
![Page 7: Lower Bounds for Read / Write Streams](https://reader034.fdocuments.net/reader034/viewer/2022050908/56814064550346895dabdd72/html5/thumbnails/7.jpg)
Previous lower bounds for R/W streams
• Lower bounds for general randomness and two-sided error:
– In o(log n / log log n) passes, need $\Omega(n^{1-\varepsilon})$ space to:
  • Approximate F* within factor 2
  • Find Empty-Join, XQuery/XPath-Filtering etc.
[B-Jayram-Rudra 07]
What about approximating frequency moments $F_k$ for $k > 2$?
![Page 8: Lower Bounds for Read / Write Streams](https://reader034.fdocuments.net/reader034/viewer/2022050908/56814064550346895dabdd72/html5/thumbnails/8.jpg)
Our Main Result
Theorem: Any randomized R/W-stream algorithm using o(log n) passes needs $\Omega(n^{1-4/k-\varepsilon})$ space to 2-approximate $F_k$
• Implies polynomial space for k > 4
• Compare with: $\Theta(n^{1-2/k})$ on data streams
R/W streams with o(log n) passes don’t help much for approximating frequency moments.
![Page 9: Lower Bounds for Read / Write Streams](https://reader034.fdocuments.net/reader034/viewer/2022050908/56814064550346895dabdd72/html5/thumbnails/9.jpg)
Methods
![Page 10: Lower Bounds for Read / Write Streams](https://reader034.fdocuments.net/reader034/viewer/2022050908/56814064550346895dabdd72/html5/thumbnails/10.jpg)
1. Reduce testing t-party set-disjointness to $F_k$
Easy!
2. Simulate any data-stream algorithm by a multi-party number-in-hand communication game
Trivial!
3. Apply $\Omega(n/t)$ communication lower bound on t-party set-disjointness [AMS 96, Saks-Sun 02, Bar-Yossef-Jayram-Kumar-Sivakumar 02, Chakrabarti-Khot-Sun 03, Grönemeier 09] (tight!)
[Alon-Matias-Szegedy 96] approach to lower bounding Fk in data streams
Fails for R/W streams!
Solved easily by R/W streams!
Cannot be applied to R/W streams!
![Page 11: Lower Bounds for Read / Write Streams](https://reader034.fdocuments.net/reader034/viewer/2022050908/56814064550346895dabdd72/html5/thumbnails/11.jpg)
Promise Set-Disjointness (DISJ)
$$\mathrm{DISJ}_{n,t}(x_1,\ldots,x_t) = \begin{cases} 0 & \text{if } x_1,\ldots,x_t \text{ are pairwise disjoint} \\ 1 & \text{if } \exists a \text{ s.t. } a \in x_i \text{ for every } i \\ \text{undefined} & \text{otherwise} \end{cases}$$
[Figure: example with five sets $x_1,\ldots,x_5$, each shown as a row of bits]
• t-party NIH communication: $\Omega(n/t)$
• Approximating $F_k$ ⇒ testing $\mathrm{DISJ}_{n,t}$ for $t \approx n^{1/k}$
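The link between approximating $F_k$ and testing DISJ can be sketched with a short calculation. It assumes the standard definition $F_k = \sum_a f_a^k$, where here $f_a$ is the number of sets $x_i$ that contain element $a$ when the $t$ sets are written out as one stream, and it assumes the disjoint sets together contain at most $n$ elements; neither is spelled out on the slide.

```latex
\begin{align*}
\text{pairwise disjoint:}\quad
  & f_a \le 1 \text{ for all } a
  \;\Longrightarrow\; F_k = \sum_a f_a \le n, \\
\text{common element } a^*:\quad
  & f_{a^*} = t
  \;\Longrightarrow\; F_k \ge t^k .
\end{align*}
% A 2-approximation returns at most 2n in the first case and at least t^k/2
% in the second, so the two cases are separated as soon as t^k > 4n,
% i.e. for t on the order of n^{1/k}.
```

So a 2-approximation of $F_k$ decides $\mathrm{DISJ}_{n,t}$ under the promise once $t^k > 4n$, which is where $t \approx n^{1/k}$ comes from.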
![Page 12: Lower Bounds for Read / Write Streams](https://reader034.fdocuments.net/reader034/viewer/2022050908/56814064550346895dabdd72/html5/thumbnails/12.jpg)
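R/W streams easily solve $\mathrm{DISJ}_{n,t}$
• Input: $x_1,x_2,\ldots,x_t \in \{0,1\}^n$
• Testing $\mathrm{DISJ}_{n,t}$ with 2 streams, 3 passes, O(log n) space
[Figure: two streams, labeled $x_1\,x_2 \cdots x_{t-1}\,x_t$ and $x_t\,x_{t-1} \cdots x_2\,x_1$]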
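One way to see why two streams and three passes are enough is sketched below. It is not necessarily the construction pictured on the slide: it relies on the promise (if any element is shared, it is shared by all sets, so comparing $x_1$ with $x_2$ settles the answer), assumes $t \ge 2$, and models streams as Python lists with explicit head positions. The function name is illustrative.

```python
def disj_promise_rw(blocks):
    """Decide promise-DISJ with 2 streams and 3 passes, using only counters.

    blocks: list of t >= 2 equal-length 0/1 lists, promised to be either
    pairwise disjoint or to share a common element.
    Returns 1 if a common element exists, else 0.
    """
    t, n = len(blocks), len(blocks[0])
    stream_a = [bit for x in blocks for bit in x]   # input stream x1 x2 ... xt
    stream_b = []                                   # initially empty work stream

    # Pass 1 (left to right): copy the input onto the second stream.
    for bit in stream_a:
        stream_b.append(bit)
    head_a = head_b = t * n

    # Pass 2 (right to left): rewind head A to the start of x1 and
    # head B only as far as the start of x2.
    while head_a > 0:
        head_a -= 1
        if head_b > n:
            head_b -= 1

    # Pass 3 (left to right): read x1 on stream A and x2 on stream B in lockstep.
    for _ in range(n):
        if stream_a[head_a] == 1 and stream_b[head_b] == 1:
            return 1
        head_a += 1
        head_b += 1
    return 0


print(disj_promise_rw([[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0]]))  # disjoint
print(disj_promise_rw([[1, 0, 1, 0], [0, 0, 1, 0], [0, 0, 1, 1]]))  # share index 2
```

The only state kept between steps is the pair of head positions and a loop counter, which is logarithmic in the stream length.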
![Page 13: Lower Bounds for Read / Write Streams](https://reader034.fdocuments.net/reader034/viewer/2022050908/56814064550346895dabdd72/html5/thumbnails/13.jpg)
• Lower bounds [GS05], [GHS06], [BJR07] for R/W streams don’t use the [AMS96] outline
– Introduce permuted 2-party versions of problems
– Employ ad-hoc combinatorial arguments
How to prove lower bounds in R/W streams?
We take a more general approach related to [AMS96] directly using NIH comm. complexity
![Page 14: Lower Bounds for Read / Write Streams](https://reader034.fdocuments.net/reader034/viewer/2022050908/56814064550346895dabdd72/html5/thumbnails/14.jpg)
Our approach to lower bound $F_k$
R/W streams algorithm for t-party-permuted-DISJ on input size $n$
⇒ Number-in-hand communication protocol for t-party-DISJ on input size $n/t^2$
![Page 15: Lower Bounds for Read / Write Streams](https://reader034.fdocuments.net/reader034/viewer/2022050908/56814064550346895dabdd72/html5/thumbnails/15.jpg)
[Alon, Matias, Szegedy 96]’s approach to lower bound $F_k$ in data streams
1. Reduce testing t-party set-disjointness to $F_k$
Easy!
2. Simulate data-stream algorithms by multi-party number-in-hand communication game
3. Apply communication lower bound on t-party set-disjointness [AMS96, SS02, B-YJKS02, CKS03, G09] (tight!)
Our approach to lower bound $F_k$ in R/W streams (steps 1 and 2 replaced, step 3 kept)
1. Reduce testing permuted t-party DISJ to $F_k$
2. Simulate R/W streams for permuted DISJ by NIH comm. for DISJ on slightly smaller input size
Apply our simulation
![Page 16: Lower Bounds for Read / Write Streams](https://reader034.fdocuments.net/reader034/viewer/2022050908/56814064550346895dabdd72/html5/thumbnails/16.jpg)
Ideas from the proof
![Page 17: Lower Bounds for Read / Write Streams](https://reader034.fdocuments.net/reader034/viewer/2022050908/56814064550346895dabdd72/html5/thumbnails/17.jpg)
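Segmenting $\mathrm{DISJ}_{n,t}$
Input: $x_1,x_2,\ldots,x_t \in \{0,1\}^n$
• View $\mathrm{DISJ}_{n,t}$ as an OR of m subproblems $\mathrm{DISJ}_{n/m,t}$
[Figure: each of $x_1, x_2, \ldots, x_{t-1}, x_t$ is split into segments 1, 2, …, m, each of length $n/m$]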
![Page 18: Lower Bounds for Read / Write Streams](https://reader034.fdocuments.net/reader034/viewer/2022050908/56814064550346895dabdd72/html5/thumbnails/18.jpg)
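Permuted DISJ
Fix permutations $\pi_1,\pi_2,\ldots,\pi_t$ on [m]
Permuted-$\mathrm{DISJ}_{n,m,t}$: $\mathrm{DISJ}_{n,t}$ on the shuffled inputs $\pi_1(x_1), \pi_2(x_2), \ldots, \pi_t(x_t)$
• View Permuted-$\mathrm{DISJ}_{n,m,t}$ as an OR of m subproblems $\mathrm{DISJ}_{n/m,t}$
[Figure: stream $\pi_i(x_i)$ holds the m segments of $x_i$, each of length $n/m$, in the order $\pi_i(1), \pi_i(2), \ldots, \pi_i(m)$; segments with the same index across the t streams form one $\mathrm{DISJ}_{n/m,t}$ subproblem]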
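A small sketch may help fix the definition. The convention used here, that position j of the shuffled string holds segment number `pi[j]`, is an assumption for illustration, as are all function names; the slides do not specify the direction of the permutation.

```python
def segments(x, m):
    """Split a 0/1 list x into m consecutive segments of equal length."""
    size = len(x) // m
    return [x[j * size:(j + 1) * size] for j in range(m)]


def permute_segments(x, pi, m):
    """Shuffle x segment-wise: position j of the result holds segment pi[j]."""
    segs = segments(x, m)
    return [bit for j in range(m) for bit in segs[pi[j]]]


def disj(xs):
    """Promise DISJ: 1 if some coordinate is 1 in every x_i, else 0."""
    return int(any(all(column) for column in zip(*xs)))


def permuted_disj(shuffled, pis, m):
    """OR over the m subproblems, each gathered from the shuffled streams."""
    t = len(shuffled)
    segs = [segments(v, m) for v in shuffled]
    return int(any(
        # segment number s of party i sits at position pis[i].index(s)
        disj([segs[i][pis[i].index(s)] for i in range(t)])
        for s in range(m)
    ))


xs = [[1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0],
      [0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0],
      [0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0]]
pis = [[2, 0, 1], [0, 1, 2], [1, 2, 0]]
shuffled = [permute_segments(x, pi, 3) for x, pi in zip(xs, pis)]
print(permuted_disj(shuffled, pis, 3) == disj(xs))
```

The answer is unchanged by the shuffle; what changes is where in each stream the matching segments sit, which is what makes few-pass algorithms struggle.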
![Page 19: Lower Bounds for Read / Write Streams](https://reader034.fdocuments.net/reader034/viewer/2022050908/56814064550346895dabdd72/html5/thumbnails/19.jpg)
Why is permuted-DISJ hard?
• Intuitively, to solve a subproblem (e.g. blue), we need to compare at least two blue segments
• Need to compare at least two segments of every color
• If segments are shuffled, many passes are needed
[Figure: streams $\pi_i(x_i)$, $\pi_j(x_j)$, $\pi_l(x_l)$ with segments colored by their $\mathrm{DISJ}_{n/m,t}$ subproblem]
![Page 20: Lower Bounds for Read / Write Streams](https://reader034.fdocuments.net/reader034/viewer/2022050908/56814064550346895dabdd72/html5/thumbnails/20.jpg)
Permuted DISJ
• Good subproblem: computation always depends only on at most one of its t segments (and the memory/state)
• If segments are randomly shuffled: with o(log m) passes and $t = o(m^{1/2})$ parties, 99% of the m subproblems are good
• Reduction idea: Try to embed an ordinary $\mathrm{DISJ}_{n/m,t}$ in one of the good subproblems
Catch: Which subproblems are good depends on input
![Page 21: Lower Bounds for Read / Write Streams](https://reader034.fdocuments.net/reader034/viewer/2022050908/56814064550346895dabdd72/html5/thumbnails/21.jpg)
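Simulation
s-space R/W streams algo A for permuted-$\mathrm{DISJ}_{n,m,t}$ ⇒ NIH comm. protocol for $\mathrm{DISJ}_{n/m,t}$
t players on input $y_1,y_2,\ldots,y_t$:
1. Generate m-1 $\mathrm{DISJ}_{n/m,t}$’s that look like* $y_1,y_2,\ldots,y_t$
2. Shuffle with $\pi_1,\pi_2,\ldots,\pi_t$
• $(y_1,y_2,\ldots,y_t)$ is good w.h.p.
3. Run A on $\pi_1(x_1),\ldots,\pi_t(x_t)$
*same sizes but don’t intersect
[Figure: $y_1$, $y_2$ embedded as segments of $x_1$, $x_2$, which are shuffled to $\pi_1(x_1)$, $\pi_2(x_2)$]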
![Page 22: Lower Bounds for Read / Write Streams](https://reader034.fdocuments.net/reader034/viewer/2022050908/56814064550346895dabdd72/html5/thumbnails/22.jpg)
Generating the extended input
Given $y_1,y_2,\ldots,y_t$, players
– Exchange the sizes of each of the sets
  • O(t log n) bits
– Choose random consistent reordering of the indices of each $y_1,y_2,\ldots,y_t$
– Generate m-1 random inputs to $\mathrm{DISJ}_{n/m,t}$ with same set sizes as $y_1,y_2,\ldots,y_t$ but that are disjoint
– Place $y_1,y_2,\ldots,y_t$ in random position and then shuffle
Key observation: If $y_1,y_2,\ldots,y_t$ are disjoint then this resolves the catch
– After shuffling, all the subproblems look the same, so the probability that the subproblem where $y_1,y_2,\ldots,y_t$ lands is good does not depend on the input
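The generation steps can be sketched as follows. This is an illustration under stated assumptions, not the protocol from the talk: the players' shared randomness is modeled by a single `rng`, the set sizes are assumed to sum to at most the segment length so that disjoint dummies exist, the final shuffle by the fixed permutations is left out, and all names are hypothetical.

```python
import random


def random_disjoint_like(sizes, r, rng):
    """Random pairwise-disjoint 0/1 vectors of length r with the given set sizes.

    Assumes sum(sizes) <= r.
    """
    chosen = rng.sample(range(r), sum(sizes))
    vectors, start = [], 0
    for s in sizes:
        v = [0] * r
        for a in chosen[start:start + s]:
            v[a] = 1
        vectors.append(v)
        start += s
    return vectors


def extend_input(ys, m, rng):
    """Embed (y_1..y_t) among m-1 disjoint look-alike subproblems.

    Returns (xs, slot): the t extended inputs, each made of m segments,
    and the index of the segment where the real input was placed.
    """
    t, r = len(ys), len(ys[0])
    sizes = [sum(y) for y in ys]              # exchanged using O(t log n) bits
    relabel = list(range(r))
    rng.shuffle(relabel)                      # same reordering for every y_i
    ys = [[y[relabel[a]] for a in range(r)] for y in ys]
    slot = rng.randrange(m)                   # random position for the real input
    subproblems = [ys if j == slot else random_disjoint_like(sizes, r, rng)
                   for j in range(m)]
    xs = [[bit for j in range(m) for bit in subproblems[j][i]]
          for i in range(t)]
    return xs, slot


rng = random.Random(0)
ys = [[1, 0, 0, 0, 0, 0], [0, 1, 0, 0, 0, 0], [0, 0, 1, 1, 0, 0]]
xs, slot = extend_input(ys, 4, rng)
print(len(xs), len(xs[0]))  # t strings, each of m * (n/m) bits
```

When the real input is disjoint, every segment is a disjoint instance with the same set sizes, which is the sense in which all subproblems "look the same" after shuffling.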
![Page 23: Lower Bounds for Read / Write Streams](https://reader034.fdocuments.net/reader034/viewer/2022050908/56814064550346895dabdd72/html5/thumbnails/23.jpg)
Simulating R/W stream algorithm A using NIH communication
• As A executes on input $v = \pi_1(x_1),\ldots,\pi_t(x_t)$, players know all inputs except $y_1,\ldots,y_t$
– each player builds up copy of a dependency graph σ(v) for the elements of each stream so far
• Using σ(v), at each step all players either
– know the next move, or
– know which one player knows next block of moves
  • that player communicates
– know that need two players’ info: simulation “fails”
• If subproblem $y_1,\ldots,y_t$ is good for v then simulation does not fail
• If players detect failure they output “not disjoint”
– If input was disjoint then only 1% chance of this
![Page 24: Lower Bounds for Read / Write Streams](https://reader034.fdocuments.net/reader034/viewer/2022050908/56814064550346895dabdd72/html5/thumbnails/24.jpg)
Dependency Graph
Vertices: Elements of each stream in each pass
Edges: From element to elements in previous pass that contained heads at same time it did
[Figure: the streams at pass 0, pass 1, …, pass j-1, pass j, pass j+1, each scanned either L to R or R to L]
![Page 25: Lower Bounds for Read / Write Streams](https://reader034.fdocuments.net/reader034/viewer/2022050908/56814064550346895dabdd72/html5/thumbnails/25.jpg)
Why most subproblems are good
• Simple case: algorithm just makes copies of the input stream and compares them
– # of subproblems with > 1 segment read at same time on single pass through the streams (L-to-R or R-to-L on each stream)
  • ≤ # segments appearing in the same (or reversed) order
– Almost surely, for random permutations $\pi_1,\pi_2,\ldots,\pi_t$ no pair has a common subsequence or inverted subsequence longer than $2e\,m^{1/2}$
– When t is $o(m^{1/2})$ the total is o(m).
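The quantity in the simple case is easy to compute experimentally. The sketch below uses the standard fact that the longest common subsequence of two permutations equals the longest increasing subsequence of one permutation rewritten in the other's positions; the function names and the choice of m and seed are illustrative.

```python
import bisect
import math
import random


def lcs_of_permutations(sigma, tau):
    """Length of the longest common subsequence of two permutations.

    Rewrites sigma in terms of positions in tau, then finds the longest
    increasing subsequence by patience sorting in O(m log m) time.
    """
    position_in_tau = {value: index for index, value in enumerate(tau)}
    sequence = [position_in_tau[value] for value in sigma]
    tails = []  # tails[k] = smallest possible tail of an increasing run of length k+1
    for x in sequence:
        k = bisect.bisect_left(tails, x)
        if k == len(tails):
            tails.append(x)
        else:
            tails[k] = x
    return len(tails)


def random_pair_check(m, seed=0):
    """Return (longest common or inverted subsequence, 2e*sqrt(m)) for a random pair."""
    rng = random.Random(seed)
    a, b = list(range(m)), list(range(m))
    rng.shuffle(a)
    rng.shuffle(b)
    common = lcs_of_permutations(a, b)
    inverted = lcs_of_permutations(a, b[::-1])
    return max(common, inverted), 2 * math.e * math.sqrt(m)


longest, bound = random_pair_check(400)
print(longest, "<=", round(bound, 1))
```

For random pairs the observed length is typically well below the $2e\,m^{1/2}$ threshold quoted on the slide.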
![Page 26: Lower Bounds for Read / Write Streams](https://reader034.fdocuments.net/reader034/viewer/2022050908/56814064550346895dabdd72/html5/thumbnails/26.jpg)
Why most subproblems are good
• General case: May combine information about all streams onto a single stream in single pass
– What is combined may depend on the input values
– Each element depends on the segments that it can reach in the input stream via the dependency graph
![Page 27: Lower Bounds for Read / Write Streams](https://reader034.fdocuments.net/reader034/viewer/2022050908/56814064550346895dabdd72/html5/thumbnails/27.jpg)
Why most subproblems are good
• For each fixed v, after p = o(log m) passes:
– Each element can depend on only $2^{O(p)}$ different input segments
– For any one stream, the sequence of its elements’ dependencies on input segments is the interleaving of $2^{O(p)}$ monotone subsequences from $\pi_1,\pi_2,\ldots,\pi_t$
Only $2^{O(p)}\, t\, m^{1/2} = o(m)$ bad subproblems on input v
![Page 28: Lower Bounds for Read / Write Streams](https://reader034.fdocuments.net/reader034/viewer/2022050908/56814064550346895dabdd72/html5/thumbnails/28.jpg)
Communication Cost of Simulation
• For each fixed v, after p = o(log m) passes:
– Only $2^{O(p)}\,t$ elements depend on a segment and have a neighbor that does not depend on it
• Players only need to communicate when segment dependencies change
– only happens $2^{O(p)}\,t$ times at cost of O(ps) bits per time
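A back-of-the-envelope calculation suggests where the exponent $1-4/k$ in the main result comes from. It combines the communication count above with the $\Omega(n/t)$ lower bound applied to the embedded subproblem $\mathrm{DISJ}_{n/m,t}$. The choice $m = t^{2+\delta}$ and the dropping of lower-order terms (such as the $O(t \log n)$ bits for exchanging set sizes) are simplifying assumptions made here, not statements from the slides.

```latex
\begin{align*}
\underbrace{2^{O(p)}\, t}_{\text{rounds}} \cdot \underbrace{O(ps)}_{\text{bits per round}}
  \;&\ge\; \Omega\!\left(\frac{n/m}{t}\right)
  && \text{(NIH lower bound for } \mathrm{DISJ}_{n/m,t}\text{)} \\
\Longrightarrow\quad
s \;&\ge\; \frac{n}{m\, t^{2}\, p\, 2^{O(p)}} \\
  &=\; n^{\,1 - (4+\delta)/k - o(1)}
  && \text{for } m = t^{2+\delta},\ t = n^{1/k},\ p = o(\log m).
\end{align*}
```

The loss relative to the data-stream bound $n^{1-2/k}$ is the extra factor of roughly $m \approx t^2$ by which the input shrinks in the reduction.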
![Page 29: Lower Bounds for Read / Write Streams](https://reader034.fdocuments.net/reader034/viewer/2022050908/56814064550346895dabdd72/html5/thumbnails/29.jpg)
Limitations and Future Work
![Page 30: Lower Bounds for Read / Write Streams](https://reader034.fdocuments.net/reader034/viewer/2022050908/56814064550346895dabdd72/html5/thumbnails/30.jpg)
Limitation of using permuted-DISJ
R/W streams algo for permuted-$\mathrm{DISJ}_{n,m,t}$ ⇒ NIH CC protocol for $\mathrm{DISJ}_{n/m,t}$
• Gap from data stream due to loss in input size
• Most of this loss is necessary
– Need $n/m$ to be $\Omega(t^2)$ to use Ω(n/t) CC lower bound for $\mathrm{DISJ}_{n/m,t}$
– Efficient R/W algo for permuted-$\mathrm{DISJ}_{n,m,t}$ unless $m \ge t^{3/2}$
– Implies that n is $\Omega(mt^2)$ which is $\Omega(t^{3.5})$
Since we need $t \approx n^{1/k}$, the lower bound Ω(n/t) is trivial for k < 3.5
![Page 31: Lower Bounds for Read / Write Streams](https://reader034.fdocuments.net/reader034/viewer/2022050908/56814064550346895dabdd72/html5/thumbnails/31.jpg)
A longest-common-subsequence problem on permutations
• Algorithm for permuted-$\mathrm{DISJ}_{n,m,t}$ follows from the following theorem:
Theorem: In any 3 permutations on [m] there is a pair with longest common subsequence length ≥ $m^{1/3}$.
Proof: For each $i \in [m]$ define a triple $t_i$ of integers: for each of the 3 pairs of permutations put length of the longest common subsequence for that pair that ends with value i. Can show that all m triples are different. So some triple must contain a coordinate ≥ $m^{1/3}$.
• Tight even for 4 permutations
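The triple construction from the proof can be checked on examples. The sketch below computes, for each value i and each of the 3 pairs of permutations, the length of the longest common subsequence ending with i, using a simple quadratic dynamic program; the function names and the random test instance are illustrative.

```python
import random


def lcs_ending_at(sigma, tau):
    """For each value v: length of the longest common subsequence of the
    permutations sigma and tau that ends with v (quadratic-time DP)."""
    position_in_tau = {value: index for index, value in enumerate(tau)}
    best = {}
    for index, v in enumerate(sigma):
        # extend the best common subsequence ending at some u that
        # precedes v in both sigma and tau
        best[v] = 1 + max(
            (best[u] for u in sigma[:index]
             if position_in_tau[u] < position_in_tau[v]),
            default=0,
        )
    return best


def lcs_triples(p1, p2, p3):
    """Map each value i to its triple of pairwise 'LCS ending at i' lengths."""
    a = lcs_ending_at(p1, p2)
    b = lcs_ending_at(p1, p3)
    c = lcs_ending_at(p2, p3)
    return {i: (a[i], b[i], c[i]) for i in p1}


def random_permutations(m, count, seed=0):
    """Return `count` random permutations of range(m) (illustrative helper)."""
    rng = random.Random(seed)
    perms = []
    for _ in range(count):
        p = list(range(m))
        rng.shuffle(p)
        perms.append(p)
    return perms


m = 27
triples = lcs_triples(*random_permutations(m, 3))
print(len(set(triples.values())) == m)              # all m triples differ
print(max(max(tr) for tr in triples.values()) ** 3 >= m)
```

The triples differ because any two values i and j appear in the same relative order in at least two of the three permutations, and for that pair the later value has a strictly longer common subsequence ending at it. With m distinct triples, some coordinate must reach $m^{1/3}$.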
![Page 32: Lower Bounds for Read / Write Streams](https://reader034.fdocuments.net/reader034/viewer/2022050908/56814064550346895dabdd72/html5/thumbnails/32.jpg)
R/W stream algorithm for permuted-$\mathrm{DISJ}_{n,m,t}$ for large t
$t > m^{2/3}$, any permutations: Testing permuted-$\mathrm{DISJ}_{n,m,t}$ with 2 streams, 3 passes, O(log nmt) space
In any three permutations on [m] there is a pair with longest common subsequence length ≥ $m^{1/3}$.
[Figure: two copies of the stream $\pi_1(x_1)\ \pi_2(x_2)\ \pi_3(x_3)\ \pi_4(x_4)\ \pi_5(x_5)\ \pi_6(x_6)$]
• Compare $m^{1/3}$ blocks each time
![Page 33: Lower Bounds for Read / Write Streams](https://reader034.fdocuments.net/reader034/viewer/2022050908/56814064550346895dabdd72/html5/thumbnails/33.jpg)
Open problems
• Is $\Omega(n^{1-4/k-\varepsilon})$ lower bound for R/W streams tight?
– Gap from $O(n^{1-2/k})$ upper bound in data stream
  • Can’t use permuted-$\mathrm{DISJ}_{n,m,t}$ to close it
– Polynomial space to compute $F_k$ for 2 < k ≤ 4 ?
• Other problems on R/W streams?
• L(m,k): maximum LCS length that can be guaranteed between some pair in any set of k permutations on [m].
– We show $L(m,3) \approx L(m,4) \approx m^{1/3}$
– What is L(m,k) for other values of k?
– [B-Blais-Huynh 08] $L(m,k) = m^{1/3+o(1)}$ for $k = m^{O(1)}$