
1

Persistent data structures

2

Ephemeral: A modification destroys the version that we modify.

Persistent: Modifications are nondestructive. Each modification creates a new version. All versions coexist.

We have one big data structure that represents all versions.

3

Partially persistent: Can access any version, but can modify only the most recent one.

[Figure: in the partially persistent setting the versions V1, V2, V3, V4, V5 form a path; only the last one can be modified]

4

Fully persistent: Can access and modify any version any number of times.

[Figure: in the fully persistent setting the versions form a tree rooted at V1]

5

Confluently persistent: Fully persistent, and there is an operation that combines two or more versions into one new version.

[Figure: with such a combining operation the versions form a DAG rather than a tree]

6

Purely functional: You are not allowed to change a field in a node after it is initialized.

This is everything you can do in a pure functional programming language

7

Example -- stacks

Two operations:

S' = push(x, S)
(x, S') = pop(S)

[Figure: starting from a stack S, the operations S' = push(x, S), S'' = push(z, S'), and (y, S3) = pop(S) create versions that all share the nodes of S]

Stacks are automatically fully persistent.

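To make the claim concrete, here is a minimal Python sketch (the names are illustrative, not from the lecture). A version of the stack is just a pointer to an immutable node, so push and pop never disturb existing versions:

    class Node:
        __slots__ = ("value", "below")
        def __init__(self, value, below):
            self.value = value          # top element of this version
            self.below = below          # shared tail, never modified

    EMPTY = None                        # the empty stack

    def push(x, s):
        return Node(x, s)               # O(1); s is left untouched

    def pop(s):
        return s.value, s.below         # O(1); s is left untouched

    S1 = push('a', EMPTY)
    S2 = push('b', S1)                  # S1 and S2 coexist
    y, S3 = pop(S1)                     # does not affect S2
    assert y == 'a' and S3 is EMPTY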

8

Example -- queues

Two operations:

Q' = inject(x, Q)
(x, Q') = pop(Q)

[Figure: starting from a queue Q, the operations Q' = inject(x, Q), Q'' = inject(z, Q'), and (y, Q3) = pop(Q'') create versions that share the nodes of Q]

This gives partial persistence: we never want to store two different values in the same field.

How do we make the queue fully persistent?
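Here is a Python sketch of the idea (my own naming, not the lecture's). A version is a small handle (front, back, size) into one shared singly linked list; as long as inject is applied only to the newest version, each node's next field is written exactly once, which is exactly why we get partial persistence for free:

    class QNode:
        __slots__ = ("value", "next")
        def __init__(self, value):
            self.value, self.next = value, None

    class Queue:
        """One lightweight handle per version; the nodes are shared."""
        def __init__(self, front=None, back=None, size=0):
            self.front, self.back, self.size = front, back, size

    def inject(x, q):
        """Only legal on the most recent version (partial persistence)."""
        node = QNode(x)
        if q.back is not None:
            q.back.next = node              # the single write into a shared field
        return Queue(q.front or node, node, q.size + 1)

    def pop(q):
        """Safe on any version: it reads shared nodes but writes nothing."""
        if q.size == 1:
            return q.front.value, Queue()   # size tells us where this version ends
        return q.front.value, Queue(q.front.next, q.back, q.size - 1)

    Q1 = inject('x', Queue())
    Q2 = inject('y', Q1)
    y, Q3 = pop(Q1)                         # Q1 and Q2 remain valid
    assert (y, Q2.size, Q3.size) == ('x', 2, 0)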

9

Example -- double-ended queues

Four operations:

Q' = push(x, Q)
(x, Q') = pop(Q)
Q' = inject(x, Q)
(x, Q') = eject(Q)

[Figure: (x, Q') = eject(Q) followed by Q'' = inject(z, Q') changes fields at both ends of the list]

Here it is not even obvious how to get partial persistence.

10

Maybe we should use stacks

Stacks are easy. We know how to simulate queues with stacks. So we should be able to get persistent queues this way...

[Figure: a deque simulated by two stacks; push/pop operate on one stack, inject/eject on the other]

When one of the stacks becomes empty, we split the other.

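An ephemeral Python sketch of the simulation (illustrative; persistence is the subject of the next slides). Lists play the role of stacks with the top at the end, and when one stack runs empty the bottom half of the other is reversed onto it:

    class StackDeque:
        """Deque order, left to right, is reversed(left) followed by right."""
        def __init__(self):
            self.left, self.right = [], []

        def push(self, x):   self.left.append(x)     # insert at the left end
        def inject(self, x): self.right.append(x)    # insert at the right end

        def pop(self):                               # remove from the left end
            if not self.left:
                self._split(self.right, self.left)
            return self.left.pop()

        def eject(self):                             # remove from the right end
            if not self.right:
                self._split(self.left, self.right)
            return self.right.pop()

        @staticmethod
        def _split(full, empty):
            # Reverse the bottom half of `full` onto the empty stack.
            # Cost O(|full|), but the size difference of the stacks collapses.
            cut = (len(full) + 1) // 2
            empty[:] = list(reversed(full[:cut]))
            del full[:cut]

    d = StackDeque()
    for i in (1, 2, 3, 4):
        d.inject(i)                                  # deque is now 1 2 3 4
    assert d.pop() == 1 and d.eject() == 4           # the pop triggers a split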

11

Deque by stack simulation (ephemeral analysis)

Φ = | |S_l| - |S_r| |

Each operation changes the potential by O(1)

The amortized cost of the reverse is 0.
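For completeness, a short check of both claims; this is only a sketch, with c taken to be a constant at least as large as the per-element cost of a split:

    \Phi \;=\; c \,\bigl|\, |S_l| - |S_r| \,\bigr|

    \text{push/pop/inject/eject:}\quad \text{actual cost } O(1),\ \ |\Delta\Phi| \le c
        \;\Longrightarrow\; \text{amortized cost } O(1)

    \text{split (one stack empty, the other holds } k\text{):}\quad
        \text{actual cost} \le ck,\ \ \Delta\Phi \le c - ck
        \;\Longrightarrow\; \text{amortized cost} \le c = O(1)

Taking the constant in Φ slightly larger than the per-element split cost makes the amortized cost of the reverse 0, which is the form of the claim above.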


In a persistent setting it is not clear that this potential is well defined

12

Deque by stack simulation (partial persistence)

Φ = | |S_l| - |S_r| |

where S_l and S_r are the stacks of the “live” version, the one which we can still modify.

Everything still works


When we do the reversal, we copy the nodes in order not to modify any other version!

13

Deque by stack simulation (full persistence)

Can repeat the expensive operation over and over again:

a sequence of n operations that costs Θ(n²).

[Figure: the same expensive eject repeated again and again from the same version]

14

Summary so far

Stacks are automatically fully persistent

Got partially persistent queues in O(1) time per pop/inject

Got partially persistent deques in O(1) amortized time per operation

How about fully persistent queues?

Partially persistent search trees, other data structures?

Can we do something general?

16

How about search trees?

All modifications occur on a path.

So it suffices to copy one path.

This is the path copying method.
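A minimal Python sketch of path copying, using a plain (unbalanced) binary search tree rather than the 2-4 trees of the figures, just to keep the code short. An insertion rebuilds only the nodes on its search path; every subtree off the path is shared with the old version:

    class TNode:
        __slots__ = ("key", "left", "right")
        def __init__(self, key, left=None, right=None):
            self.key, self.left, self.right = key, left, right

    def insert(root, key):
        """Return the root of a new version; the old version stays intact."""
        if root is None:
            return TNode(key)
        if key < root.key:
            return TNode(root.key, insert(root.left, key), root.right)
        if key > root.key:
            return TNode(root.key, root.left, insert(root.right, key))
        return root                      # key already present: share the node

    v1 = insert(None, 16)
    v2 = insert(v1, 12)
    v3 = insert(v2, 18)                  # v1, v2, v3 all remain searchable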

17

Example -- path copying

[Figure: the original search tree, rooted at 16]

18

Example -- path copying

[Figure: after the update, the nodes on the search path (including the root 16) are copied; all subtrees hanging off the path are shared with the previous version]

19

Path copying -- analysis

O(log n) time for update and access

O(log n) space per update

Want the space bound to be proportional to the number of field modifications that the ephemeral update did.

In the case of search trees we want the space consumption of an update to be O(1) (at least amortized).

Gives fully persistent search trees!

20

Application -- planar point location

Suppose that the Euclidean plane is subdivided into polygons by n line segments that intersect only at their endpoints.

Given such a polygonal subdivision and an on-line sequence of query points in the plane, the planar point location problem is to determine, for each query point, the polygon containing it.

Measure an algorithm by three parameters:

1) The preprocessing time.

2) The space required for the data structure.

3) The time per query.

21

Planar point location -- example

22

Planar point location -- example

23

Solving planar point location (Cont.)

Partition the plane into vertical slabs by drawing a vertical line through each endpoint.

Within each slab the lines are totally ordered.

Allocate a search tree per slab containing the lines at its leaves; with each line associate the polygon above it.

Allocate another search tree over the x-coordinates of the vertical lines.

24

Solving planar point location (Cont.)

To answer a query:

first find the appropriate slab,

then search within the slab to find the polygon.
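A Python sketch of such a query (the data layout is mine, not the lecture's): xs holds the sorted x-coordinates of the endpoints, and slabs[i] holds the segments spanning slab i, sorted from bottom to top, each paired with the polygon above it. Both steps are binary searches, so with a balanced tree per slab the query takes O(log n):

    import bisect

    def y_at(seg, x):
        # height of a non-vertical segment ((x1, y1), (x2, y2)) at abscissa x
        (x1, y1), (x2, y2) = seg
        return y1 + (y2 - y1) * (x - x1) / (x2 - x1)

    def locate(qx, qy, xs, slabs):
        # assumes xs[0] <= qx <= xs[-1]
        i = bisect.bisect_right(xs, qx) - 1            # 1) find the slab
        segs = slabs[i]                                # [(segment, polygon_above)]
        lo, hi = 0, len(segs)
        while lo < hi:                                 # 2) lowest segment above q
            mid = (lo + hi) // 2
            if y_at(segs[mid][0], qx) < qy:
                lo = mid + 1
            else:
                hi = mid
        # the answer is the polygon above the segment directly below q
        return segs[lo - 1][1] if lo > 0 else None     # None: below every segment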

25

Planar point location -- example

26

Planar point location -- analysis

Query time is O(log n)

How about the space?

Θ(n²)

And so could be the preprocessing time.

27

Planar point location -- bad example

The total number of lines is O(n), yet the number of lines crossing each slab can be Θ(n).

28

Planar point location & persistence

So how do we improve the space bound?

Key observation: The lists of the lines in adjacent slabs are very similar.

Create the search tree for the first slab.

Then obtain the next one by deleting the lines that end at the corresponding vertex and adding the lines that start at that vertex.

How many insertions/deletions are there altogether?

2n: each line is inserted once and deleted once.
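A Python sketch of that sweep. The object ptree and its methods insert, delete, and current_version are placeholders for whatever persistent search tree we end up using; they are not defined in the lecture. One left-to-right pass performs the 2n updates and records one tree version per slab:

    def build_slab_versions(events, ptree):
        """events: for each endpoint, left to right, a pair
        (segments_ending_here, segments_starting_here)."""
        versions = []
        for ending, starting in events:
            for seg in ending:
                ptree.delete(seg)            # each segment is deleted once overall
            for seg in starting:
                ptree.insert(seg)            # each segment is inserted once overall
            versions.append(ptree.current_version())
        return versions                      # versions[i] answers queries in slab i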

29

Planar point location & persistence (cont)

Updates should be persistent since we need all search trees at the end.

Partial persistence is enough

Well, we already have the path copying method, let's use it. What do we get?

O(n log n) space and O(n log n) preprocessing time.

We shall improve the space bound to O(n).

30

What are we after?

Break each operation into elementary access steps (ptr traversal) and update steps (assignments, allocations).

Want a persistent simulation which consumes O(1) time per update or access step, and O(1) space per update step.

31

Making data structures persistent (DSST 89)

We will show a general technique to make data structures partially and later fully persistent.

The time penalty of the transformation would be O(1) per elementary access and update step.

In particular, this would give us an O(n) space solution to the planar point location problem

The space penalty of the transformation would be O(1) per update step.

32

The fat node method

Every pointer field can store many values, each tagged with a version number.

[Figure: a fat node whose pointer field stores several values, each tagged with a version number]

33

The fat node method (Cont.)

[Figure: the same fat node as before]

Simulation of an update step when producing version i:

• When the ephemeral update creates a new node, we create a new fat node; each value of a field in the new node is tagged with version i.

• When we change the value of a field f to v, we add an entry to the list of f with key i and value v.

34

The fat node method (Cont.)

[Figure: the same fat node as before]

Simulation of an access step when navigating in version i:

• The relevant value is the one tagged with the largest version number that is at most i.
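A Python sketch of a single fat field in the partially persistent setting (my naming). Since versions are created in increasing order and only the newest one is modified, a write is an append and a read is a binary search for the largest stamp that is at most i:

    import bisect

    class FatField:
        def __init__(self):
            self.stamps = []                 # version numbers, strictly increasing
            self.values = []                 # values[k] was written in stamps[k]

        def write(self, version, value):     # update step producing `version`
            self.stamps.append(version)      # partial persistence: versions only grow
            self.values.append(value)

        def read(self, version):             # access step while navigating `version`
            k = bisect.bisect_right(self.stamps, version) - 1
            return self.values[k] if k >= 0 else None

    f = FatField()
    f.write(1, "A"); f.write(4, "B"); f.write(7, "C")
    assert f.read(1) == "A" and f.read(6) == "B" and f.read(9) == "C"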

35

Partially persistent deques via the fat node method

[Figure: a fat-node list: starting from V1 holding x, the operations V2 = inject(y, V1), V3 = eject(V2), and V4 = inject(z, V3) add version-tagged values to the pointer fields instead of overwriting them]

36

Fat node -- analysis

Space is ok -- O(1) per update step

That would give O(n) space for planar point location since each insertion/deletion does O(1) changes amortized.

We screwed up the access time: it may take O(log m) to traverse a pointer, where m is the # of versions.

So the query time goes up to O(log² n) and the preprocessing time is O(n log² n).

37

Node copying

This is a general method to make pointer-based data structures partially persistent.

Idea: It is similar to the fat node method, except that we won't let nodes get too fat.

We will show this method first for balanced search trees, which are a slightly simpler case than the general one.

Nodes must have bounded indegree and bounded outdegree.

38

Partially persistent balanced search trees via node copying

Here it suffices to allow one extra pointer field in each node

When the ephemeral update changes a pointer field: if the extra pointer is empty, use it; otherwise copy the node and try to store a pointer to the new copy in its parent.

If the extra pointer at the parent is occupied, copy the parent as well and continue going up this way.

Each extra pointer is tagged with a version number and a field name.

When the ephemeral update allocates a new node you allocate a new node as well.
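A Python sketch of this write path (illustrative; the key-handling details of a real 2-4 tree are omitted). Each node carries one extra slot; the update walks back up the access path, using a free slot where it can and copying nodes where it cannot:

    class PNode:
        def __init__(self, key, left=None, right=None):
            self.key, self.left, self.right = key, left, right
            self.extra = None                 # (version, field_name, child) or None

        def get(self, field, version):
            """Read a child field as it is in `version`."""
            if self.extra and self.extra[1] == field and self.extra[0] <= version:
                return self.extra[2]
            return getattr(self, field)

    def set_child(path, new_child, version):
        """path = [(node, field), ...] from the root down to the pointer being
        changed. Returns the root of the new version."""
        root = path[0][0]
        for node, field in reversed(path):
            if node.extra is None:            # free extra slot: use it and stop
                node.extra = (version, field, new_child)
                return root
            copy = PNode(node.key,            # slot occupied: copy the node with
                         node.get("left", version),   # its current children ...
                         node.get("right", version))
            setattr(copy, field, new_child)
            new_child = copy                  # ... and push the change to the parent
        return new_child                      # the root itself was copied: new root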

39

Insert into persistent 2-4 trees with node copying

[Figure: the original 2-4 tree, rooted at 16]

40

Insert into persistent 2-4 trees with node copying

[Figure: the first insertion creates version 1; the change is recorded in an extra pointer tagged with version 1]

41

Insert into persistent 2-4 trees with node copying

[Figure: next, key 29 is inserted]

42

Insert into persistent 2-4 trees with node copying

[Figure: the leaf that receives 29 is copied]

43

Insert into persistent 2-4 trees with node copying

[Figure: a pointer to the new copy is stored in the parent's extra slot, tagged with version 2]

44

Node copying -- analysis

The time slowdown per access step is O(1) since there is only a constant # of extra pointers per node.

What about the space blowup?

O(1) (amortized) new nodes per update step due to nodes that would have been created by the ephemeral implementation as well.

How about nodes that are created due to node copying when the extra pointer is full?

45

Node copying -- analysis

We'll show that only O(1) copyings occur on average per update step.

Amortized space consumption = real space consumption + ΔΦ

Φ = #(used extra slots in live nodes)

A node is live if it is reachable from the root of the most recent version.

When a node is copied, the old node (whose extra slot was full) stops being live, so Φ drops and pays for the new copy.

==> Amortized space cost of node copying is 0.

46

Node copying in general

Each persistent node has d + p + e + 1 pointer fields:

d = original outgoing pointers

e = extra pointers

p = predecessor pointers

1 = copy pointer.

[Figure: a persistent node with its version-tagged pointer slots; the label “live” marks the current copy]

47

Simulating an update step in node x

When there is no free extra pointer in x, copy x.

When you copy node x: if x points to y, then c(x) should also point to y, so update the corresponding predecessor pointer in y. Add x to the set S of copied nodes.

(S contains no nodes initially)

[Figure: x is copied to c(x); both point to y, and y's predecessor pointer is updated to c(x)]

48

Node copying in general (cont)

Take a node x out of S, go to the nodes pointing to x and update them, possibly copying more nodes.



52

Node copying in general (cont)

Remove any node x from S,

for each node y indicated by a predecessor pointer in x

find in y the live pointer to x.

• If this ptr has version stamp i, replace it by a ptr to c(x). Update the corresponding reverse pointer

• If this ptr has version stamp less than i, add to y a ptr to c(x) with version stamp i. If there is no room, copy y as before, and add it to S. Update the corresponding reverse pointer

53

Node copying (analysis)

Actual space consumed is |S|.

Φ = #(used extra fields in live nodes)

ΔΦ ≤ -e|S| + p|S|, which is at most -|S| if e > p. (Actually e ≥ p suffices if we are more careful.)

So whether there were any copyings or not, the amortized space cost of a single update step is O(1).

54

The fat node method - full persistence

Does it also work for full persistence?

[Figure: a fat field whose values are tagged with versions that now form a tree rather than a path]

We have a navigation problem.

55

The fat node method - full persistence (cont)

Maintain a total order of the version tree.

[Figure: a version tree over versions 5, 6, 7, 8, 9 and the corresponding version list, updated as versions 8 and 9 are created]

56

The fat node method - full persistence (cont)

When a new version is created add it to the list immediately after its parent.

==> The list is a preorder of the version tree.
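A toy Python sketch of this rule; a plain list stands in for the Dietz and Sleator order-maintenance structure mentioned later, which supports insert-after and order queries in O(1):

    version_list = [0]                         # version 0 is the initial version

    def new_version(parent):
        """Create a child of `parent` and place it right after it in the list."""
        child = max(version_list) + 1          # any fresh name will do
        version_list.insert(version_list.index(parent) + 1, child)
        return child

    def precedes(u, v):
        """Order query used when reading a fat field."""
        return version_list.index(u) < version_list.index(v)

    v1 = new_version(0)                        # list: [0, 1]
    v2 = new_version(0)                        # list: [0, 2, 1], a preorder of the tree
    assert precedes(v2, v1)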

57

The fat node method - full persistence (cont)

When traversing a field in version i, the relevant value is the one recorded with the version closest to i from the left in the version list (i itself included).

[Figure: reading a fat field in some version: scan left from that version in the version list to the nearest recorded value]

58

The fat node method - full persistence (cont)

How do we update?

[Figure: a new version 10 is created and inserted into the version list; recording its new value naively would also change what the versions that follow it in the list see]

59

The fat node method - full persistence (cont)


So what is the algorithm in general?

60

The fat node method - full persistence (cont)

Suppose that when we create version i we change field f to have value v.

[Figure: field f and the version list around i, with i1 to its left and i2 to its right]

Let i1 (i2) be the first version to the left (right) of i that has a value recorded at field f

61

The fat node method - full persistence (cont)

We add the pair (i,v) to the list of f

Let i+ be the version following i in the version list

[Figure: the version list around i, now also showing i+, the version immediately after i]

If i+ < i2 (in the version-list order), or i+ exists and i2 does not, add the pair (i+, v') where v' is the value associated with i1.

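A Python sketch of this update rule (my naming; order is assumed to map a version to its current position in the version list, and a sorted Python list stands in for the per-field search tree described later):

    import bisect

    class FullFatField:
        def __init__(self, order):
            self.order = order                 # version -> position in the version list
            self.entries = []                  # (position, version, value), kept sorted

        def read(self, i):
            """Value seen by version i: nearest entry at or before i in the list."""
            k = bisect.bisect_right(self.entries, (self.order(i), float("inf"))) - 1
            return self.entries[k][2] if k >= 0 else None

        def write(self, i, v, i_plus):
            """Version i sets the field to v; i_plus is the version right after i
            in the version list (None if there is none)."""
            inherited = self.read(i)                           # the value v' from i1
            pos = bisect.bisect_right(self.entries, (self.order(i), float("inf")))
            i2_pos = self.entries[pos][0] if pos < len(self.entries) else None
            self.entries.insert(pos, (self.order(i), i, v))
            # keep the versions between i and i2 (starting at i+) seeing the old value
            if i_plus is not None and (i2_pos is None or self.order(i_plus) < i2_pos):
                self.entries.insert(pos + 1, (self.order(i_plus), i_plus, inherited))

    # e.g. with the version list [0, 2, 1]: version 2 was created last, as a child of 0
    f = FullFatField(order=[0, 2, 1].index)
    f.write(0, "A", None); f.write(2, "B", 1)
    assert f.read(1) == "A" and f.read(2) == "B"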

62

Fully persistent 2-4 trees with the fat node method

[Figure: the initial 2-4 tree (version 0), rooted at 16]

63

Insert into fully persistent 2-4 trees (fat nodes)

[Figure: the first insertion creates version 1; the affected fat fields now hold values tagged 0 and 1]

64

Insert into fully persistent 2-4 trees (fat nodes)

[Figure: next, key 29 is inserted, creating version 2]

65

Insert into fully persistent 2-4 trees (fat nodes)

[Figure: the insertion of 29 continues; more fields receive values tagged with version 2]

66

Insert into fully persistent 2-4 trees (fat nodes)

[Figure: the final state after the insertion that created version 2]

67

Fat node method (cont)

How do we efficiently find the right value of a field in version i?

Store the values sorted by the order determined by the version list. Use a search tree to represent this sorted list.

To carry out a find on such a search tree we need, at each node, to answer an order query on the version list.

Use Dietz and Sleator's order-maintenance structure for the version list; it answers such order queries in O(1) time.

68

Fat node method (summary)

We can find the value to traverse in O(log m) time, where m is the number of versions.

We get O(1) space increase per ephemeral update step

O(log m) time slowdown per ephemeral access step

69

Node splitting

Similar to node copying (slightly more involved).

Allows us to avoid the O(log m) time slowdown.

Converts any pointer-based data structure with constant indegree and outdegree to a fully persistent one.

The time slowdown per access step is O(1) (amortized).

The space blowup per update step is O(1) (amortized)

70

Search trees via node splitting

You get fully persistent search trees in which each operation takes O(log n) amortized time and space.

Why is the space O(log n)?

Since in the ephemeral setting the space consumption per update is O(1) only amortized, and an amortized bound does not carry over to full persistence, where the same expensive version can be updated again and again.

71

Search trees via node splitting

So what do we need in order to get persistent search trees with O(1) space cost per update (amortized)?

We need an ephemeral structure in which the space consumption per update is O(1) in the worst case.

You can do it!

==> Red-black trees with lazy recoloring

72

What about deques?

We can apply node splitting to get fully persistent deques with O(1) time per operation.

We can also transform the simulation by stacks into a real-time simulation and get an O(1) time solution.

What if we want to add the operation concatenate?

None of the methods seems to extend...