Post on 04-Apr-2018
7/30/2019 2001-05-02 Ideas on Treaps
Ideas on Treaps
Maverick Woo
[Figure: example treap, nodes labeled key,priority: U,2 P,8 K,7 W,6 S,12 M,14 E,9 T,17 H,20 N,33 Z,4]
May 2, 2001
Disclaimer
Articles of interest
Raimund Seidel and Cecilia R. Aragon. Randomized search trees. Algorithmica 16 (1996), 464-497.
Guy E. Blelloch and Margaret Reid-Miller. Fast Set Operations Using Treaps. In Proc. 10th Annual ACM SPAA, 1998.
Of course this is joint work with Guy.
Hopefully Daniel will also show up.
Background
Very high level talk
No analysis
To make this a technical talk
Some background
Splay Trees (zig, zig zig, zig zig zig)
Treaps, if you still remember
Agenda
Data structure research overview
Treaps refresher
Some current issues on Treaps
Data Structure Research
I am not qualified to say yet, but I do have some feelings about it.
Not that many high-level problems.
Representing a set/ordering
Support some operations
Some say it's all about applications.
Applications don't have to be very specific.
But they need to be specific enough that we can make assumptions.
What Operations?
Basic
Insert, Membership
Intermediate
Delete (e.g. Binomial vs. Fibonacci Heaps)
Disjoint-Union (e.g. Union-Find)
Higher Level
Union, Intersection, Difference
Finger Search
Behavior Restrictions
Persistence
Functional
More later
Architecture Independence
Relatively new, a.k.a. Cache-oblivious
Runs efficiently on hierarchical memory
Avoid memory-specific parameterization
Forget data block size, cache line width etc.
Not my theme today
Why Persistence?
Many reasons for persistence
It's practical with good garbage collectors.
Functional programming makes everyone's life easier.
For the theoretician
You don't need to worry about side effects.
Better analysis possible: NESL
For the programmer
You don't need to worry about side effects.
Fewer memory leaks, fewer dangling pointers
Real-life example 1
You have operations working on multiple instances.
You index the web.
You build your indices with your cool data structures.
Conjunction query (AND) is intersection.
You do the intersection on two indices.
Now one of the indices can get corrupted.
Real-life example 2
You are rich.
Once upon a time, in a dot-com far away
You run a multi-processor machine.
You learned that Splay Trees are cool.
You even learned how to write multi-threaded programs.
Thread1 searches for x on SplayInstance42.
Thread2 searches for y on SplayInstance42.
Real-world situation: search engines
Data Structure vs. Hacking
Examples
To learn more about Splay Trees
Dial (412)-HACKERS.
Ask for Danny Sleator
OK, real example
(Persistent) FIFO Queues
Operations IsEmpty(Q), Enqueue(Q,x), Dequeue(Q)
Need to grow, let's use a Linked List
FIFO Queues
Linked List is bad though
Traversing to the tail takes linear time.
Either Enqueue or Dequeue is going to be linear time.
How about double-ended queues (deques)?
With that much extra space, it may be faster to use a tree.
If one is not good enough, use two.
Suppose the queue is x_1, x_2, ..., x_i, y_{i+1}, y_{i+2}, ..., y_n.
Represent it as [x_1, x_2, ..., x_i], [y_n, y_{n-1}, ..., y_{i+1}].
You can figure out the details yourself.
In the end, isn't this just a hack?
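One way to fill in the details of the two-list representation is sketched below. It is only an illustration, not the code from the talk: the names `EMPTY`, `enqueue`, `dequeue` and the cons-cell encoding (`None` or `(head, tail)`) are assumptions made for this example. The front list holds the oldest elements in order, the back list holds the newest elements reversed, and the back list is reversed into the front only when the front runs out.

```python
# A queue is a pair (front, back); each part is a cons list: None or (head, tail).
# front holds x_1..x_i oldest first, back holds y_n..y_{i+1} newest first.
EMPTY = (None, None)

def is_empty(q):
    front, back = q
    return front is None and back is None

def enqueue(q, x):
    """Return a new queue with x at the tail; q itself is not modified."""
    front, back = q
    return (front, (x, back))            # O(1): push onto the reversed back list

def _reverse(lst):
    """Reverse a cons list without mutating it."""
    out = None
    while lst is not None:
        head, lst = lst
        out = (head, out)
    return out

def dequeue(q):
    """Return (oldest element, new queue); q itself is not modified."""
    front, back = q
    if front is None:                    # front exhausted: flip the back list over
        front, back = _reverse(back), None
    if front is None:
        raise IndexError("dequeue from empty queue")
    head, rest = front
    return head, (rest, back)

q3 = enqueue(enqueue(enqueue(EMPTY, 1), 2), 3)
first, q4 = dequeue(q3)
second, q5 = dequeue(q4)
print(first, second)                     # old versions such as q3 remain usable
```

Every operation returns a new version and never assigns into an old one, so the queue is persistent. The usual amortized constant-time argument assumes each version is used once; dequeuing repeatedly from the same old version can repeat the reversal.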
Agenda
Data structure research overview
Treaps refresher
Some current issues on Treaps
Treaps Refresher
A Treap is a recursive data structure.
  datatype 'a Treap =
      E | T of priority * 'a Treap * 'a * 'a Treap
Each node has a key and a priority.
Assume all unique
Arrange keys in in-order, priorities in heap-order
Priority is chosen uniformly at random.
8-way independence suffices for the analysis
Can be computed with hash functions
Don't need to store the priority
A key's priority can be made consistent across runs
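The idea of computing priorities from the keys can be sketched as follows. This is an illustration under assumptions, not the talk's implementation: SHA-256 merely stands in for a hash family with the independence the analysis needs, and `priority`, `build` and `is_heap_ordered` are names invented here. Nodes are plain `(left, key, right)` tuples, so no priority is stored anywhere.

```python
import hashlib

def priority(key: str) -> int:
    """64-bit priority derived from the key alone, identical on every run."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")

def build(keys):
    """Treap as nested (left, key, right) tuples; None is the empty treap."""
    if not keys:
        return None
    root = max(keys, key=priority)       # heap order: highest priority on top
    return (build([k for k in keys if k < root]), root,
            build([k for k in keys if k > root]))

def is_heap_ordered(t, bound=None):
    """Check that every node's priority is below its parent's."""
    if t is None:
        return True
    left, key, right = t
    p = priority(key)
    return ((bound is None or p < bound)
            and is_heap_ordered(left, p) and is_heap_ordered(right, p))

keys = list("UPKWSMETHNZ")
treap = build(keys)
```

Because the priority depends only on the key, the same key set yields the same treap shape regardless of insertion order or which run built it.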
Treap Operations
Membership
As in binary search trees
Insert
Add as leaf by key (in-order)
Rotate up by priority (heap-order)
Delete
Reverse what insert does
Find-min, etc.
Walk on the left spine, etc.
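The operations above might look like this in a persistent style. It is a minimal sketch, assuming max-heap order on priorities (larger priority nearer the root) and using the key/priority pairs from the example figure; `Node`, `insert`, `member`, `find_min` and `inorder` are illustrative names. Delete is omitted.

```python
from typing import NamedTuple, Optional

class Node(NamedTuple):
    p: int                      # priority, max-heap order
    left: Optional["Node"]
    k: str                      # key, in-order
    right: Optional["Node"]

def member(t, k):
    """Plain binary-search-tree lookup; priorities are ignored."""
    while t is not None:
        if k == t.k:
            return True
        t = t.left if k < t.k else t.right
    return False

def insert(t, k, p):
    """Return a new treap containing k; the input treap is left untouched."""
    if t is None:
        return Node(p, None, k, None)            # add as a leaf by key
    if k < t.k:
        l = insert(t.left, k, p)
        if l.p > t.p:                            # heap order broken: rotate right
            return Node(l.p, l.left, l.k, Node(t.p, l.right, t.k, t.right))
        return Node(t.p, l, t.k, t.right)
    if k > t.k:
        r = insert(t.right, k, p)
        if r.p > t.p:                            # heap order broken: rotate left
            return Node(r.p, Node(t.p, t.left, t.k, r.left), r.k, r.right)
        return Node(t.p, t.left, t.k, r)
    return t                                     # key already present

def find_min(t):
    """Walk down the left spine of a non-empty treap."""
    while t.left is not None:
        t = t.left
    return t.k

def inorder(t):
    return [] if t is None else inorder(t.left) + [t.k] + inorder(t.right)

ITEMS = [("U", 2), ("P", 8), ("K", 7), ("W", 6), ("S", 12), ("M", 14),
         ("E", 9), ("T", 17), ("H", 20), ("N", 33), ("Z", 4)]
treap = None
for key, prio in ITEMS:
    treap = insert(treap, key, prio)
```

The rotations are written as fresh node constructions rather than pointer updates, so each insert copies only the nodes on its search path.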
Treap Split
Want top-down split (it's faster)
(less, x, gtr) = Split(root, k)
  If root is empty
    (E, null, E)
  Else if (root.k > k) // want to split left subtree
    Let (l1, m, r1) = Split(root.left, k)
    (l1, m, T(root.p, r1, root.k, root.right))
  Else if (root.k < k) // want to split right subtree
    Let (l1, m, r1) = Split(root.right, k)
    (T(root.p, root.left, root.k, l1), m, r1)
  Else // found k
    (root.left, root.k, root.right)
Treap Split Example
[Figure: Split(Tr, V) on the example treap (nodes labeled key,priority: U,2 P,8 K,7 W,6 S,12 M,14 E,9 T,17 H,20 N,33 Z,4). Before: the whole treap Tr. After: two treaps, less and gtr.]
Treap Split Persistence
These figures are deceptive.
Only 4 new nodes created
All on the search path to V
[Figure: the same before/after treaps, with less and gtr marked.]
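The split and the node count can be checked with a short sketch. Assumptions: max-heap order on priorities, an added base case for the empty treap, and helper names (`Node`, `build`, `split`, `nodes`, `inorder`) chosen for this example only. The treap is rebuilt from the key/priority pairs in the figure and split at V.

```python
from typing import NamedTuple, Optional

class Node(NamedTuple):
    p: int                      # priority, max-heap order
    left: Optional["Node"]
    k: str                      # key, in-order
    right: Optional["Node"]

def build(items):
    """Build the unique treap for (key, priority) pairs: max priority at the root."""
    if not items:
        return None
    k, p = max(items, key=lambda kp: kp[1])
    return Node(p, build([i for i in items if i[0] < k]), k,
                build([i for i in items if i[0] > k]))

def split(root, k):
    """Return (less, found, gtr); found is k if present, else None."""
    if root is None:
        return None, None, None
    if root.k > k:                                   # split the left subtree
        l1, m, r1 = split(root.left, k)
        return l1, m, Node(root.p, r1, root.k, root.right)
    if root.k < k:                                   # split the right subtree
        l1, m, r1 = split(root.right, k)
        return Node(root.p, root.left, root.k, l1), m, r1
    return root.left, root.k, root.right

def nodes(t):
    """Yield every node object reachable from t."""
    if t is not None:
        yield t
        yield from nodes(t.left)
        yield from nodes(t.right)

def inorder(t):
    return [] if t is None else inorder(t.left) + [t.k] + inorder(t.right)

ITEMS = [("U", 2), ("P", 8), ("K", 7), ("W", 6), ("S", 12), ("M", 14),
         ("E", 9), ("T", 17), ("H", 20), ("N", 33), ("Z", 4)]
tr = build(ITEMS)
old_ids = {id(n) for n in nodes(tr)}
less, found, gtr = split(tr, "V")
# Keys of nodes that were newly allocated by the split (not shared with tr).
fresh = sorted(n.k for part in (less, gtr) for n in nodes(part)
               if id(n) not in old_ids)
print(fresh)
```

The newly allocated nodes carry the keys N, T, W and U, which is exactly the search path to V; every other node in `less` and `gtr` is shared with the original `tr`, which is itself unchanged.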
Treap Join
Join(less, gtr) // all keys in less < all keys in gtr
  Handle empty less or gtr
  If (less.p > gtr.p)
    T(less.p, less.left, less.k, Join(less.right, gtr))
  Else
    T(gtr.p, Join(less, gtr.left), gtr.k, gtr.right)
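A runnable rendering of the Join pseudocode, as a sketch under the same assumptions used for the other examples (max-heap priorities, `None` as the empty treap, illustrative names `Node`, `build`, `join`). The two inputs are built from the figure's keys below and above V.

```python
from typing import NamedTuple, Optional

class Node(NamedTuple):
    p: int                      # priority, max-heap order
    left: Optional["Node"]
    k: str                      # key, in-order
    right: Optional["Node"]

def build(items):
    """Build the unique treap for (key, priority) pairs: max priority at the root."""
    if not items:
        return None
    k, p = max(items, key=lambda kp: kp[1])
    return Node(p, build([i for i in items if i[0] < k]), k,
                build([i for i in items if i[0] > k]))

def join(less, gtr):
    """Join two treaps where every key in less is smaller than every key in gtr."""
    if less is None:
        return gtr
    if gtr is None:
        return less
    if less.p > gtr.p:          # root of less wins; merge down its right spine
        return Node(less.p, less.left, less.k, join(less.right, gtr))
    # root of gtr wins; merge down its left spine
    return Node(gtr.p, join(less, gtr.left), gtr.k, gtr.right)

ITEMS = [("U", 2), ("P", 8), ("K", 7), ("W", 6), ("S", 12), ("M", 14),
         ("E", 9), ("T", 17), ("H", 20), ("N", 33), ("Z", 4)]
less = build([i for i in ITEMS if i[0] < "V"])
gtr = build([i for i in ITEMS if i[0] > "V"])
joined = join(less, gtr)
```

Only nodes on the right spine of `less` and the left spine of `gtr` are copied; with distinct priorities the result has the same shape as the treap built directly from all the keys.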
Treap Join Example
[Figure: Join(less, gtr) on the example treap. Before: the two treaps less and gtr. After: the single joined treap.]
Treap Running Time
All expected O(lg n)
Also of note is Finger Search
Given a finger in a treap
Find the key that is d away in sorted order
Expected O(lg d) time
Requires parent pointers
Evil: wastes so much space
See Seidel and Aragon for details.
Treap Union
Treaps really shine in set operations.
Union(a,b)
Suppose roots are (k1,p1), (k2,p2)
WLOG assume p1 > p2.
Let (less,x,gtr) = Split(b,k1).
T(p1, Union(a.left, less), k1, Union(a.right, gtr))
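The Union recipe can be sketched as runnable code. Assumptions made for this example: max-heap priorities, each key carries the same priority in both treaps (as with hash-derived priorities), empty inputs are handled explicitly, and the names `Node`, `build`, `split`, `union` are illustrative.

```python
from typing import NamedTuple, Optional

class Node(NamedTuple):
    p: int                      # priority, max-heap order
    left: Optional["Node"]
    k: str                      # key, in-order
    right: Optional["Node"]

def build(items):
    """Build the unique treap for (key, priority) pairs: max priority at the root."""
    if not items:
        return None
    k, p = max(items, key=lambda kp: kp[1])
    return Node(p, build([i for i in items if i[0] < k]), k,
                build([i for i in items if i[0] > k]))

def split(root, k):
    """Return (less, found, gtr); found is k if present, else None."""
    if root is None:
        return None, None, None
    if root.k > k:
        l1, m, r1 = split(root.left, k)
        return l1, m, Node(root.p, r1, root.k, root.right)
    if root.k < k:
        l1, m, r1 = split(root.right, k)
        return Node(root.p, root.left, root.k, l1), m, r1
    return root.left, root.k, root.right

def union(a, b):
    if a is None:
        return b
    if b is None:
        return a
    if a.p < b.p:               # WLOG: a holds the higher-priority root
        a, b = b, a
    less, _, gtr = split(b, a.k)    # a duplicate of a.k in b is simply dropped
    # The two recursive calls are independent and could run in parallel.
    return Node(a.p, union(a.left, less), a.k, union(a.right, gtr))

PRIO = {"U": 2, "P": 8, "K": 7, "W": 6, "S": 12, "M": 14,
        "E": 9, "T": 17, "H": 20, "N": 33, "Z": 4}
a = build([(k, PRIO[k]) for k in "EHKNSU"])
b = build([(k, PRIO[k]) for k in "HMNPTWZ"])
both = union(a, b)
```

Neither input is modified, so `a` and `b` remain valid after the call.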
Treap Intersection
Inter(a,b)
Suppose roots are (k1,p1), (k2,p2); p1>p2
Let (less,x,gtr) = Split(b,k1)
If x is null // k1 is not in b, sorry dude
Join(Inter(a.left, less), Inter(a.right, gtr))
Else
T(p1, Inter(a.left, less), k1, Inter(a.right, gtr))
Treap Difference
Similar to intersection
Change the logic a bit
Messier because it is not symmetric
Leave as an exercise to the reader.
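For readers who want to check their answer, here is one possible sketch of intersection together with a difference along the same lines. It is not taken from the talk or the cited paper: the case split in `diff` is just one way to handle the asymmetry, and it assumes max-heap priorities, consistent per-key priorities, and the illustrative helpers `Node`, `build`, `split`, `join`.

```python
from typing import NamedTuple, Optional

class Node(NamedTuple):
    p: int
    left: Optional["Node"]
    k: str
    right: Optional["Node"]

def build(items):
    """Unique treap for (key, priority) pairs: max priority at the root."""
    if not items:
        return None
    k, p = max(items, key=lambda kp: kp[1])
    return Node(p, build([i for i in items if i[0] < k]), k,
                build([i for i in items if i[0] > k]))

def split(root, k):
    if root is None:
        return None, None, None
    if root.k > k:
        l1, m, r1 = split(root.left, k)
        return l1, m, Node(root.p, r1, root.k, root.right)
    if root.k < k:
        l1, m, r1 = split(root.right, k)
        return Node(root.p, root.left, root.k, l1), m, r1
    return root.left, root.k, root.right

def join(less, gtr):
    if less is None:
        return gtr
    if gtr is None:
        return less
    if less.p > gtr.p:
        return Node(less.p, less.left, less.k, join(less.right, gtr))
    return Node(gtr.p, join(less, gtr.left), gtr.k, gtr.right)

def inter(a, b):
    if a is None or b is None:
        return None
    if a.p < b.p:                       # symmetric, so swapping is harmless
        a, b = b, a
    less, x, gtr = split(b, a.k)
    l, r = inter(a.left, less), inter(a.right, gtr)
    return join(l, r) if x is None else Node(a.p, l, a.k, r)

def diff(a, b):
    """Keys of a that are not in b. Not symmetric, so no swapping of a and b."""
    if a is None or b is None:
        return a
    if a.p >= b.p:                      # split b around a's root
        less, x, gtr = split(b, a.k)
        l, r = diff(a.left, less), diff(a.right, gtr)
        return Node(a.p, l, a.k, r) if x is None else join(l, r)
    less, _, gtr = split(a, b.k)        # split a around b's root; b.k is dropped
    return join(diff(less, b.left), diff(gtr, b.right))

def inorder(t):
    return [] if t is None else inorder(t.left) + [t.k] + inorder(t.right)

PRIO = {"U": 2, "P": 8, "K": 7, "W": 6, "S": 12, "M": 14,
        "E": 9, "T": 17, "H": 20, "N": 33, "Z": 4}
a = build([(k, PRIO[k]) for k in "EHKNSU"])
b = build([(k, PRIO[k]) for k in "HMNPTWZ"])
```

As with union, no assignment to an existing node ever happens, and the pairs of recursive calls are independent of each other.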
Points of Note
Persistence
Did you see a side effect? (assignments?)
Parallelization
Parallelizing without persistence is a pain.
Very natural divide-and-conquer
Run the two recursive calls on different CPUs
Running times
Set Operation Running Time
For two sets of size m and n (m ≤ n)
Optimal is Θ(m lg(n/m))
What's known before this work
With AVL Trees, O(m lg(n/m))
Rather complicated algorithms
For the sake of your smooth digestion
Compare this to O(m+n) or O(m lg n)
With Treaps
Can use Finger Search if we have parent pointers
Does not parallelize: multiple fingers???
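The optimal bound quoted above is usually justified by a counting argument, sketched here as background rather than as part of the talk. In the comparison model, merging two disjoint ordered sets of sizes m and n must distinguish every way of interleaving them, so about lg C(m+n, m) comparisons are needed:

```latex
\[
\lg\binom{m+n}{m} \;\ge\; \lg\left(\frac{m+n}{m}\right)^{m}
  \;=\; m \lg\left(1 + \frac{n}{m}\right),
\qquad
\lg\binom{m+n}{m} \;\le\; \lg\left(\frac{e\,(m+n)}{m}\right)^{m}
  \;=\; m \lg\frac{e\,(m+n)}{m}.
\]
\[
\text{Hence, for } m \le n:\qquad
\lg\binom{m+n}{m} \;=\; \Theta\!\left(m \lg\left(1 + \frac{n}{m}\right)\right).
\]
```

For m much smaller than n this is far below both m + n and m lg n, which is why the comparison with those two bounds matters. When m = n the expression lg(n/m) should be read as lg(1 + n/m) so that the bound stays linear rather than collapsing to zero.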
Set Operation Running Time
What's known after this work
No parent pointers
Parallelize naturally
Optimal expected running time O(m lg (n/m))
Analysis available in Blelloch and Reid-Miller
Relatively simple algorithm
Experimental results
6.3-6.8 speedup on 8-processor SGI machine
4.1-4.4 speedup on 5-processor Sun machine
Agenda
Data structure research overview
Treaps refresher
Some current issues on Treaps
A Word on Splay Trees
Splay Trees are slow in practice!
Even a single simple search would require O(lg n) pointer updates!
Skip Lists are way simpler and faster.
Let's switch all Splay Trees to Skip Lists.
Danny???
Bruce said
First find Danny.
Ditch Splay Trees: say they are slow.
Then praise Skip Lists.
Danny will refute by quoting experimental studies.
Splay Trees are not much slower than Skip Lists in practice.
He will ask who's my advisor.
I wonder if that works. So I tried.
Current Issues on Treaps
Treaps are simpler than Splay Trees
No famous conjecture for my back pocket
Neat idea from Adam Kalai
Not self-adjusting
Access introduces more explicit changes
Adding data compression to Treaps
Finger search on Treaps
Work by Guy + Daniel Blandford
Adding Compression to Treaps
Search engines
Infrequent offline update (once a month)
Frequent online query and set operations
Keys are unique.
Keys can be huge and occur sparsely.
Let's compress the keys!
Assume they are 64-bit integers.
We've got a problem!
I don't know how to deploy data compression to general data structures.
Begin with the simplest: Array
The naive approach
Compress the whole array
When need to access an element
decompress the whole array
do the access
compress the whole array again
Isn't that dumb?
Any suggestions?
Use chunking
Divide the array into blocks of size C.
Compress each block individually.
Now we are back to constant time!
Shh!!! That could be a trade secret.
Of course they use something better than a vanilla array.
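Chunking an array can be sketched in a few lines. This is only an illustration: `zlib` stands in for whatever compressor would really be used, the block size of 64 is an arbitrary choice for C, and `compress_array` and `get` are names made up for the example.

```python
import struct
import zlib

BLOCK = 64  # C: number of keys per block (arbitrary choice for this sketch)

def compress_array(values, block=BLOCK):
    """Compress each run of `block` unsigned 64-bit integers on its own."""
    chunks = []
    for start in range(0, len(values), block):
        part = values[start:start + block]
        raw = struct.pack(f"<{len(part)}Q", *part)   # little-endian, 8 bytes each
        chunks.append(zlib.compress(raw))
    return chunks

def get(chunks, i, block=BLOCK):
    """Read element i by decompressing only the block that holds it."""
    raw = zlib.decompress(chunks[i // block])
    return struct.unpack_from("<Q", raw, (i % block) * 8)[0]

values = [i * i for i in range(1000)]
chunks = compress_array(values)
print(get(chunks, 777))
```

An access now costs time proportional to C rather than to the array length, which is constant once C is fixed. Larger blocks usually compress better but make each access slower.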
Chunking a Treap
A sub-tree is a chunk.
Desire consistent chunk size
But Treaps are usually not full.
Need better chunking rules
Chunks
Can't be too big: hurts running time
Can't be too small: hurts compression (space)
Vocab
Internal node and Leaf block
More precisely
  datatype tblock =
      Packed of int * key * key * key vector
    | UnPacked of int * int * key vector
  datatype trearray =
      TE
    | TB of tblock
    | TN of trearray * key * trearray
All running times are in the expected case.
Idea 1 Thresholds
Priority is in the range 1 to maxP
Invent a threshold Pth
e.g. maxP - log(maxP)
For n = (p,k)
If p > Pth, then n is an internal node.
Otherwise, n is in some leaf block.
Trick done when a key is inserted.
Also maintained by various operations.
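The threshold rule is easy to state in code. A minimal sketch, assuming base-2 logarithms (the slide does not say which base) and priorities drawn uniformly from 1 to maxP; `p_threshold` and `is_internal` are illustrative names.

```python
import math

def p_threshold(max_p: int) -> float:
    """Pth = maxP - log(maxP), with log taken base 2 as an assumption."""
    return max_p - math.log2(max_p)

def is_internal(p: int, max_p: int) -> bool:
    """A node (p, k) is internal if its priority clears the threshold."""
    return p > p_threshold(max_p)

MAX_P = 1024
# Count how many of the possible priorities 1..MAX_P land above the threshold.
internal_priorities = sum(is_internal(p, MAX_P) for p in range(1, MAX_P + 1))
print(internal_priorities)
```

With maxP = 1024 exactly ten priority values clear the threshold, so a uniformly random priority marks a key as internal with probability log(maxP)/maxP. If maxP is close to the number of keys N, that gives roughly log N internal nodes in expectation, with everything else pushed into leaf blocks.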
Idea 1 Features
On average, constant ratio between internal keys and keys in blocks.
With Pth = maxP - log(maxP), N keys give about log N internal nodes
Height is log log N.
O(log N) bottom nodes, each with a block
Expect (N - log N) / O(log N) keys per block
Binary search in a block takes O(log N)
Idea 1 Running Time
Query is still O(log n).
Insert is also O(log n).
Join, Split both take O(log n).
Set operations rely on Join and Split's O(log n) running time.
Looking good
Idea 1 Problems
Asymptotic bound
Need to work out the constants
Exact analysis in progress
I now think of Knuth even higher
SML implementation
Make the idea as concrete as code
Can now do more experiments
Idea 1 Questions
Do we really need to maintain consistent priority across runs?
Make things simpler
But Union looks suspicious
What compression algorithm to use?
No general data compression
Take advantage of index distribution
Idea 2 Small Blocks
Want a more-or-less constant block size
Small blocks are more realistic
Say 20
Processor specific: fit cache line size
How well can we compress 20 integers?
Leave for second stage investigation
Perhaps I can share
Writing down algorithm as code helps
Pseudocode is good for short algorithms
Real code is more concrete.
Good for sloppy people like me.
Actual SML code
You can figure out you missed some cases.
Now if SML has a debugger
Space time tradeoff is very real
http://www.cs.cmu.edu/~maverick/Trearray.html
Treap Finger Search
Daniel is working on it.
No parent pointers needed
Can mimic parent pointers by reversing the root-to-(last accessed leaf) path
Should probably leave this to him
Q&A / Suggestions
Work in progress, welcome suggestions
Danny, don't kick me too hard