Datastructuren (Data Structures)

Fenia Aivaloglou, Hendrik Jan Hoogeboom
Informatica – LIACS, Universiteit Leiden
Autumn (najaar) 2019

Slides: liacs.leidenuniv.nl/~hoogeboomhj/dat/ohp/dat-present.pdf


Table of Contents I

1 Basic Data Structures

2 Tree Traversal

3 Binary Search Trees

4 Balancing Binary Trees

5 Priority Queues

6 B-Trees

7 Graphs

8 Hash Tables

9 Data Compression

10 Pattern Matching

1 Basic Data Structures

Contents
  Linear lists
  Abstract Data Structures
  Advanced C++ programming
  Trees and their Representations

Linear lists

hierarchy of lists

A deque ("double-ended queue") is a linear list for which all insertions and deletions (and usually all accesses) are made at the ends of the list. A deque is therefore more general than a stack or a queue; it has some properties in common with a deck of cards, and it is pronounced the same way. (Knuth, TAoCP vol. 1)

linear list
  deque ('deck')
    stack (Dutch: stapel), LIFO
    queue (Dutch: rij), FIFO

[Figure: a linear list with its first element, its last element, and a position in between (shown as [6]) marked. Operations at a position: inspect, change, insert, delete.]

implementation: doubly linked list

[Figure: a doubly linked list holding 6, 13, -2, 6, 4. Every node has a prv and a nxt pointer; first and last point to the two end nodes, whose outward pointers are Λ (nil). A second drawing shows the variant with a sentinel node.]
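The sentinel variant could be sketched as follows. This is a minimal sketch, assuming a circular list in which one dummy node replaces both Λ ends; the field names `prv` and `nxt` follow the figure, while `Node`, `DList` and all member functions are hypothetical names chosen for this illustration.

```cpp
#include <cassert>
#include <vector>

// Doubly linked list with one circular sentinel node (an assumption of this
// sketch): sentinel.nxt is the first element, sentinel.prv the last.
struct Node {
    int info;
    Node* prv;
    Node* nxt;
};

class DList {
public:
    DList() { sentinel.info = 0; sentinel.prv = sentinel.nxt = &sentinel; }
    ~DList() { while (!empty()) erase(sentinel.nxt); }
    DList(const DList&) = delete;
    DList& operator=(const DList&) = delete;

    bool empty() const { return sentinel.nxt == &sentinel; }

    // Insert value before position pos; pos may be the sentinel (= append).
    Node* insertBefore(Node* pos, int value) {
        Node* n = new Node{value, pos->prv, pos};
        pos->prv->nxt = n;
        pos->prv = n;
        return n;
    }
    void pushFront(int v) { insertBefore(sentinel.nxt, v); }
    void pushBack(int v)  { insertBefore(&sentinel, v); }

    // Unlink and free a real node (never the sentinel itself).
    void erase(Node* n) {
        n->prv->nxt = n->nxt;
        n->nxt->prv = n->prv;
        delete n;
    }

    Node* first() { return sentinel.nxt; }
    Node* last()  { return sentinel.prv; }

    std::vector<int> toVector() const {
        std::vector<int> out;
        for (const Node* p = sentinel.nxt; p != &sentinel; p = p->nxt)
            out.push_back(p->info);
        return out;
    }

private:
    Node sentinel;
};
```

With the sentinel in place, insertion and deletion need no special cases for the empty list or for the two ends, which is the usual reason for choosing this variant.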

Babel

Names of the deque operations in three languages (footnote 1):

          insert                  remove                 inspect
          back       front        back      front        back   front
C++       push_back  push_front   pop_back  pop_front    back   front
Perl      push       unshift      pop       shift        [-1]   [0]
Python    append     appendleft   pop       popleft      [-1]   [0]

Footnote 1: Double-ended queue, Operations
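The C++ row of the table can be tried out on `std::deque`. A small sketch; the function name `dequeDemo` is hypothetical.

```cpp
#include <cassert>
#include <deque>

// Exercises the six C++ operations from the table on a std::deque<int>.
inline std::deque<int> dequeDemo() {
    std::deque<int> d;
    d.push_back(2);    // insert at back:   2
    d.push_back(3);    //                   2 3
    d.push_front(1);   // insert at front:  1 2 3
    assert(d.front() == 1 && d.back() == 3);  // inspect both ends
    d.pop_front();     // remove at front:  2 3
    d.pop_back();      // remove at back:   2
    return d;
}
```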

singly linked list

[Figure: a stack (Dutch: stapel) as a singly linked list of x1, ..., xn with a single top pointer, and a queue (Dutch: (wacht-)rij) as a singly linked list of x1, ..., xn with a first and a last pointer. Λ marks the nil pointer that ends each list.]

Programmeermethoden

class stapel {                          // the stack itself ('stapel' = stack)
  public:
    stapel ( ) {
      bovenste = NULL; }                // create an empty stack
    ~stapel ( );                        // destructor
    void zetopstapel (int);             // push
    void haalvanstapel (int&);          // pop
    bool isstapelleeg ( ) {             // is the stack empty?
      return bovenste == NULL;
    }//isstapelleeg
    ...
  private:                              // the head of the list is
    vakje* bovenste;                    // the top of the stack
};//stapel

void stapel::zetopstapel (int getal) {  // push
  vakje* temp = new vakje;
  temp->info = getal;
  temp->volgende = bovenste;
  bovenste = temp;
}//stapel::zetopstapel
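The slide declares `haalvanstapel` and the destructor without showing their bodies. A possible completion is sketched below, under two assumptions that are not in the slides: `vakje` is a plain struct with the fields `info` and `volgende` used by the push code, and popping an empty stack is treated as a precondition violation guarded by `assert`.

```cpp
#include <cassert>
#include <cstddef>

// Assumed node type: the slide uses vakje without showing its definition.
struct vakje {
    int info;
    vakje* volgende;
};

class stapel {
public:
    stapel() : bovenste(NULL) {}
    ~stapel() {                       // free all remaining nodes
        int dummy;
        while (!isstapelleeg()) haalvanstapel(dummy);
    }
    stapel(const stapel&) = delete;
    stapel& operator=(const stapel&) = delete;

    void zetopstapel(int getal) {     // push, as on the slide
        vakje* temp = new vakje;
        temp->info = getal;
        temp->volgende = bovenste;
        bovenste = temp;
    }
    void haalvanstapel(int& getal) {  // pop: hands back the top value in getal
        assert(bovenste != NULL);     // popping an empty stack is undefined
        vakje* temp = bovenste;
        getal = temp->info;
        bovenste = temp->volgende;
        delete temp;
    }
    bool isstapelleeg() const { return bovenste == NULL; }

private:
    vakje* bovenste;                  // top of the stack = head of the list
};
```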

contiguous representation

[Figure: a stack (Dutch: stapel) stored in an array, with top marking the last used cell; a queue (Dutch: (wacht-)rij) stored cyclically in an array, drawn once with first before last and once wrapped around the end of the array, with last before first.]

empty vs. full (?)
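The "empty vs. full" question arises because in a cyclic array the indices first and last stand in the same relation for an empty and for a full queue. One common way out, sketched here as an assumption rather than as the course's solution, is to store the number of elements next to the first index (another is to leave one cell unused). The class name `CyclicQueue` and its members are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Fixed-capacity cyclic queue. Storing the element count (an assumption of
// this sketch) distinguishes empty (count == 0) from full (count == MAX),
// two states in which the first/last indices alone look identical.
template <std::size_t MAX>
class CyclicQueue {
    static_assert(MAX > 0, "capacity must be positive");
public:
    bool isEmpty() const { return count == 0; }
    bool isFull()  const { return count == MAX; }
    std::size_t size() const { return count; }

    void enqueue(int x) {
        assert(!isFull());
        inhoud[(first + count) % MAX] = x;   // cell just behind the last element
        ++count;
    }
    int front() const {
        assert(!isEmpty());
        return inhoud[first];
    }
    void dequeue() {
        assert(!isEmpty());
        first = (first + 1) % MAX;           // wrap around the array end
        --count;
    }

private:
    int inhoud[MAX];
    std::size_t first = 0;   // index of the oldest element
    std::size_t count = 0;   // number of stored elements
};
```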

Programmeermethoden

const int MAX = 100;

class stapel {                          // holds at most MAX integers
  public:
    stapel ( ) { bovenste = -1; }       // constructor
    void zetopstapel (int);             // push
    void haalvanstapel (int&);          // pop
    bool isstapelleeg ( ) {             // is the stack empty?
      return ( bovenste == -1 ); }
    ...
  private:
    int inhoud[MAX];
    int bovenste;                       // index of the top value
};//stapel

void stapel::zetopstapel (int getal) {  // push; does not check for a full stack
  bovenste++;
  inhoud[bovenste] = getal;
}//stapel::zetopstapel

Abstract Data Structures

OOP: object oriented programming

object, class:
– data members +
– methods

data encapsulation ⇒ nicer modelling
localization of operations ⇒ easier error finding
information hiding ⇒ avoiding errors

see Programmeermethoden

black box

[Figure: a stack as a black box. Inside: the data, cells 0 to 5 counted from the bottom, with top at the upper end. Interface: push (insert), pop (remove), isEmpty and top (query).]

[Figure: diagram of what makes up a data structure, with the labels: elements, domain, operations, structure, specification, representation, implementation.]

ADT – what, not how

Definition
An abstract data structure (ADT) is a specification of the values stored in the data structure as well as the description and signatures of the operations that can be performed.

no representation or implementation in ADT
"mathematical model"

abstract "native" data structures
– float: ℝ
– int: ℤ

now get used to considering stacks (etc.) that way

structure

unordered    linear    hierarchical    network
set          list      tree            graph

ADT Stack

Initialize: void → stack<T>. Construct an empty sequence ().

IsEmpty: void → Boolean. Check whether the stack is empty (i.e., contains no elements).

Size: void → Integer. Return the number n of elements, the length of the sequence (x1, ..., xn).

Top: void → T. Return the top xn of the sequence (x1, ..., xn). Undefined on the empty sequence.

Push(x): T → void. Add the given element x to the top of the sequence (x1, ..., xn), so afterwards the sequence is (x1, ..., xn, x).

Pop: void → void. Remove the topmost element xn of the sequence (x1, ..., xn), so afterwards the sequence is (x1, ..., xn−1). Undefined on the empty sequence.

ADT Queue

Initialize: construct an empty sequence ().

IsEmpty: check whether the queue is empty (i.e., contains no elements).

Size: return the number n of elements, the length of the sequence (x1, ..., xn).

Front: return the first element x1 of the sequence (x1, ..., xn). Undefined on the empty sequence.

Enqueue(x): add the given element x to the end/back of the sequence (x1, ..., xn), so afterwards the sequence is (x1, ..., xn, x).

Dequeue: remove the first element of the sequence (x1, ..., xn), so afterwards the sequence is (x2, ..., xn). Undefined on the empty sequence.


other ADTs (and implementation)

Set ⇒ (balanced) binary trees, hash tables

Map

Priority Queue ⇒ binary heap, leftist heap

Graph

Union-Find

Advanced C++ programming

templates

templated function

template <typename T>
T max(T a, T b) { return a>b ? a : b ; }

templated class

template <typename Typ>
class Stack {
    ...
  private:
    vector<Typ> storage;
};

Stack<int> intStack;
Stack<string> stringStack;
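The elided part of the templated `Stack` could be filled in along the lines of the ADT Stack operations. A minimal sketch: the member names (`isEmpty`, `size`, `top`, `push`, `pop`) are assumptions modelled on the ADT, and the "undefined on the empty sequence" cases are guarded with `assert`.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

template <typename Typ>
class Stack {
public:
    bool isEmpty() const { return storage.empty(); }
    std::size_t size() const { return storage.size(); }
    const Typ& top() const {            // undefined on the empty stack
        assert(!storage.empty());
        return storage.back();
    }
    void push(const Typ& x) { storage.push_back(x); }
    void pop() {                        // undefined on the empty stack
        assert(!storage.empty());
        storage.pop_back();
    }

private:
    std::vector<Typ> storage;           // back of the vector is the top
};
```

Because the same template is instantiated for `int` and for `std::string`, the representation is written once and checked by the compiler for each element type.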


standard template library

container

iterator

algorithm

stl container classes

helper: pair

sequences:
  contiguous: array (fixed length), vector (flexible length), deque (double ended)
  linked: forward_list (single), list (double)

adaptors:
  based on one of the sequences: stack (lifo), queue (fifo)
  based on binary heap: priority_queue

associative: based on balanced trees:
  set, map, multiset, multimap

unordered: based on hash table:
  unordered_set, unordered_map, unordered_multiset, unordered_multimap

STL vector of pair

#include <iostream>
#include <string>
#include <queue>
#include <vector>
using namespace std;

using paar = pair<string, unsigned int>;   // replacing typedef

int main() {
  vector <paar> club                       // 'modern' initialization
    { {"Jan", 1}, {"Piet", 6}, {"Katrien", 5}, {"Ramon", 2} };
  for (auto& mem: club) {                  // range based for-loop
    cout << mem.first << " " ;
  }
  cout << endl;
  return 0;
}

Output:
Jan Piet Katrien Ramon

STL priority queue

class Comp {
  public:
    bool operator() ( const paar& p1, const paar& p2 ) const {
      return p1.second < p2.second;
    }
};

int main() {
  vector <paar> club                       // 'modern' initialization
    { {"Jan", 1}, {"Piet", 6}, {"Katrien", 5}, {"Ramon", 2} };
  using pqtype = priority_queue< paar, vector <paar>, Comp > ;
  pqtype pq (club.begin(), club.end() );   // wow! converts into
                                           // priority_queue
  while ( !pq.empty() ) {
    cout << pq.top().first << " (" << pq.top().second << ") ";
    pq.pop();
  }
  return 0;
}

Output:
Piet (6) Katrien (5) Ramon (2) Jan (1)

Trees and their Representations

trees

Tree
  structure (AVL, B-, red-black)
    • number of children
    • height
  contents (Heap, BST)
    • relative position of values

Definition (Binary Tree)
A binary tree is: an empty tree (without any nodes), or a node with two children L and R where L and R are binary trees.

representing binary trees: pointers

template <class T>
class BinKnp {
  public:                                // CONSTRUCTOR
    BinKnp ( const T& i,
             BinKnp<T> *l = nullptr,     // default
             BinKnp<T> *r = nullptr )
      : info(i)                          // constructor of type T
      { links = l; rechts = r; }
  private:                               // DATA
    T info;
    BinKnp<T> *links, *rechts;           // left and right child
};

binary search tree vs heap order

[Figure: two binary trees. The first, a binary search tree, carries the keys 35, 20, 10, 5, 14, 30, 26, 23, 45, 39, 51, 56; the second, in heap order, carries the keys 83, 70, 10, 5, 7, 30, 26, 23, 45, 39, 37, 3.]

AVL-tree and B-tree

[Figure: an AVL-tree on the keys 1 to 13 with a balance factor written next to each internal node, and a B-tree with root key 42, internal keys 30, 38, 50, 56 and leaf keys 10, 20, 25, 32, 34, 40, 41, 44, 46, 52, 54, 58, 60.]

text compression: Huffman & ZLW (Ziv-Lempel-Welch)

[Figure: a Huffman code tree with leaves a to f and edges labelled 0 and 1, and a ZLW dictionary tree with nodes numbered 1 to 11 and edges labelled a, b, c.]

expression tree

[Figure: an expression tree with leaves 3, π, x, 2, 0, 1 and internal nodes for the operators ·, +, sin and cos.]


full binary tree

complete binary tree → array

Nodes are numbered level by level (shown as position:value):

                   1:33
         2:42                3:17
     4:8      5:24       6:3      7:3
  8:98 9:55 10:10 11:19 12:5

position   1   2   3   4   5   6   7   8   9   10  11  12
value      33  42  17  8   24  3   3   98  55  10  19  5
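With this level-by-level numbering starting at 1, no pointers are needed: node i has parent i/2 (integer division) and children 2i and 2i+1, as far as these positions exist. The sketch below checks this on the tree from the slide; the function names are hypothetical and index 0 of the vector is left unused so that array indices equal positions.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// 1-based level-order numbering of a complete binary tree:
// node i has parent i/2 and children 2i and 2i+1 (if they are <= n).
inline std::size_t parentOf(std::size_t i) { return i / 2; }
inline std::size_t leftOf(std::size_t i)   { return 2 * i; }
inline std::size_t rightOf(std::size_t i)  { return 2 * i + 1; }

// The tree from the slide; index 0 is unused so that positions match.
inline std::vector<int> slideTree() {
    return {0, 33, 42, 17, 8, 24, 3, 3, 98, 55, 10, 19, 5};
}
```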

left-child right-sibling

[Figure: a tree with nodes a to h, drawn three times: as a general tree, with left-child and right-sibling links, and as the resulting binary tree.]

trie ("retrieval")

[Figure: a trie with single-letter edges and its compressed form with string-labelled edges; $ marks the end of a word.]

2 Tree Traversal

Contents
  Definitions and representation
  Recursion
  Using a Stack
  Using Inorder Threads
  Morris Traversal
  Link Inversion

Definitions and representation

Traversal

The process of visiting each node (precisely once) in a systematic way:

– breadth-first
– NLR preorder
– LNR inorder
– LRN postorder

– recursion
– (parent pointer)
– iterative, with stack
– threads
– link inversion

Recursion

recursion (binary trees)

recursive
  traversal( node )
    if (node != nil) then
      // pre-visit(node)
      traversal(node.left)
      // in-visit(node)
      traversal(node.right)
      // post-visit(node)
    fi
  end // traversal

[Figure: the walk around a node passes it three times: pre, in, post.]

Algoritmiek (footnote 2)

class knoop {                     // a struct is fine too ('knoop' = node)
  public:
    knoop ( ) {                   // constructor
      info = 0;
      links = NULL;
      rechts = NULL;
    }
    // but perhaps private
    int info;
    knoop* links;                 // left child
    knoop* rechts;                // right child
}; // knoop

void preorde (knoop* root) {      // preorder
  if ( root != NULL ) {
    cout << root->info << endl;
    preorde (root->links);
    preorde (root->rechts);
  } // if
} // preorde

void symmetrisch (knoop* root) {  // symmetric order = inorder
  if ( root != NULL ) {
    symmetrisch (root->links);
    cout << root->info << endl;
    symmetrisch (root->rechts);
  } // if
} // symmetrisch

Footnote 2: yes, we have seen that

pre-traversal( node )
  if (node != nil) then
    pre-visit(node)
    pre-traversal(node.left)
    pre-traversal(node.right)
  fi
end

[Figure: example tree with nodes a to k, numbered in preorder: a1 b2 d3 c4 e5 g6 i7 h8 j9 k10 f11.]

NLR = preorder: a b d c e g i h j k f

in-traversal( node )
  if (node != nil) then
    in-traversal(node.left)
    in-visit(node)
    in-traversal(node.right)
  fi
end

[Figure: the same example tree, numbered in inorder: b1 d2 a3 g4 i5 e6 j7 h8 k9 c10 f11.]

LNR = inorder: b d a g i e j h k c f

post-traversal( node )
  if (node != nil) then
    post-traversal(node.left)
    post-traversal(node.right)
    post-visit(node)
  fi
end

[Figure: the same example tree, numbered in postorder: d1 b2 i3 g4 j5 k6 h7 e8 f9 c10 a11.]

LRN = postorder: d b i g j k h e f c a
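The three listed sequences can be reproduced with the recursive scheme. In the sketch below the shape of the example tree is an assumption: it is not copied from the figure but inferred from the three sequences themselves (for instance d as right child of b, i as right child of g). The names `Node`, `Order` and `traverseExample` are hypothetical.

```cpp
#include <cassert>
#include <string>

struct Node {
    char info;
    Node* left;
    Node* right;
};

// The three orders differ only in where the node itself is emitted.
inline void preorder(const Node* n, std::string& out) {
    if (n == nullptr) return;
    out += n->info;
    preorder(n->left, out);
    preorder(n->right, out);
}
inline void inorder(const Node* n, std::string& out) {
    if (n == nullptr) return;
    inorder(n->left, out);
    out += n->info;
    inorder(n->right, out);
}
inline void postorder(const Node* n, std::string& out) {
    if (n == nullptr) return;
    postorder(n->left, out);
    postorder(n->right, out);
    out += n->info;
}

enum class Order { Pre, In, Post };

// Builds the example tree (shape inferred, see lead-in) and traverses it.
inline std::string traverseExample(Order order) {
    Node d{'d', nullptr, nullptr}, i{'i', nullptr, nullptr};
    Node j{'j', nullptr, nullptr}, k{'k', nullptr, nullptr};
    Node f{'f', nullptr, nullptr};
    Node b{'b', nullptr, &d};
    Node g{'g', nullptr, &i};
    Node h{'h', &j, &k};
    Node e{'e', &g, &h};
    Node c{'c', &e, &f};
    Node a{'a', &b, &c};
    std::string out;
    switch (order) {
        case Order::Pre:  preorder(&a, out);  break;
        case Order::In:   inorder(&a, out);   break;
        case Order::Post: postorder(&a, out); break;
    }
    return out;
}
```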

Using a Stack

generic binary tree traversal

visit   direction    visit at next node
1       down-left    1; (stay) 2 if node has no left-child
2       down-right   1; (stay) 3 if node has no right-child
3       up           2 if at left-child; 3 if at right-child

[Figure: the moves from the current node to the next node for each of the three visits.]

problem: going up to parent

visit = 1; node = root
while (visit != 3 or not S.isEmpty() )
  case visit of
    1 : if (node.left != nil) then
          S.push(node)
          node = node.left
        else
          visit = 2
        fi
    2 : if (node.right != nil) then
          S.push(node)
          node = node.right
          visit = 1
        else
          visit = 3
        fi
    3 : parent = S.pop()
        if (parent.left == node) then   // coming up from the left child
          visit = 2
        else                            // coming up from the right child
          visit = 3
        fi
        node = parent
  end//case
end//while
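The three-visit scheme could be rendered in C++ as below. This is a sketch under assumptions: the stack holds the path from the root to the parent of the current node, the parameter `which` (1, 2 or 3) selects which of the three visits is recorded, and the names `Node` and `genericTraversal` are hypothetical.

```cpp
#include <cassert>
#include <stack>
#include <string>

struct Node {
    char info;
    Node* left;
    Node* right;
};

// Every node is reached three times; only visit number `which` is recorded:
// 1 gives preorder, 2 gives inorder, 3 gives postorder.
inline std::string genericTraversal(Node* root, int which) {
    std::string out;
    if (root == nullptr) return out;
    std::stack<Node*> S;          // path from the root to the parent of node
    Node* node = root;
    int visit = 1;
    while (true) {
        if (visit == which) out += node->info;
        switch (visit) {
        case 1:                   // first visit: try to go down-left
            if (node->left != nullptr) {
                S.push(node);
                node = node->left;
            } else {
                visit = 2;
            }
            break;
        case 2:                   // second visit: try to go down-right
            if (node->right != nullptr) {
                S.push(node);
                node = node->right;
                visit = 1;
            } else {
                visit = 3;
            }
            break;
        default:                  // third visit: go up, or stop at the root
            if (S.empty()) return out;
            Node* parent = S.top();
            S.pop();
            visit = (parent->left == node) ? 2 : 3;
            node = parent;
            break;
        }
    }
}
```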

which nodes on stack

pre: right children
in: left parents

[Figure: the example tree twice, once numbered in preorder and once in inorder, showing which nodes are on the stack.]

pre-order

iterative-preorder( root )
  S : Stack
  S.create()
  S.push( root )
  while ( not S.isEmpty() ) do
    node = S.pop()
    if (node != nil) then
      visit( node )
      S.push( node.right )
      S.push( node.left )
    fi
  od
end // iterative-preorder

[Figure: snapshot of the traversal; the legend distinguishes the current node (*), visited nodes, and nodes on the stack.]

pre-order (2)

iterative-preorder( root )
  S : Stack
  S.create()
  S.push( root )
  while ( not S.isEmpty() ) do
    node = S.pop()
    while (node != nil) do
      visit( node )
      S.push( node.right )
      node = node.left
    od
  od
end // iterative-preorder [bis]
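A C++ sketch of the second pre-order variant, using `std::stack`. The names `Node` and `iterativePreorder` are hypothetical, and unlike the pseudocode the sketch does not push nil right children, which is a small deviation that saves useless pops.

```cpp
#include <cassert>
#include <stack>
#include <string>

struct Node {
    char info;
    Node* left;
    Node* right;
};

// Follow left links directly and stack only the right children that still
// have to be traversed.
inline std::string iterativePreorder(Node* root) {
    std::string out;
    std::stack<Node*> S;
    S.push(root);
    while (!S.empty()) {
        Node* node = S.top();
        S.pop();
        while (node != nullptr) {
            out += node->info;             // visit
            if (node->right != nullptr)    // deviation: skip nil children
                S.push(node->right);
            node = node->left;
        }
    }
    return out;
}
```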

in-order

iterative-inorder( root : Node)
  S : Stack
  S.create()
  // move to first node (left-most)
  walkLeft( root, S )
  while ( not S.isEmpty() ) do
    node = S.pop()
    visit( node )
    walkLeft( node.right, S )
  od
end // iterative-inorder

walkLeft( node : Node, S : Stack)
  while (node != nil) do
    S.push( node )
    node = node.left
  od
end // walkLeft

[Figure: snapshot of the in-order traversal, marking the current node (*) and the nodes on the stack (X).]

in-order (2)

iterative-inorder( root )
  S : Stack
  S.create()
  node = root;
  while (node != nil or not S.isEmpty() ) do
    if (node != nil) then
      S.push( node )
      node = node.left
    else
      node = S.pop()
      visit( node )
      node = node.right
    fi
  od
end // iterative-inorder [bis]
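The second in-order variant translates almost line by line to C++. A sketch with hypothetical names `Node` and `iterativeInorder`; the stack holds the "left parents" whose own visit is still pending.

```cpp
#include <cassert>
#include <stack>
#include <string>

struct Node {
    char info;
    Node* left;
    Node* right;
};

inline std::string iterativeInorder(Node* root) {
    std::string out;
    std::stack<Node*> S;               // nodes whose left subtree is in progress
    Node* node = root;
    while (node != nullptr || !S.empty()) {
        if (node != nullptr) {
            S.push(node);              // postpone the visit, go left first
            node = node->left;
        } else {
            node = S.top();
            S.pop();
            out += node->info;         // visit
            node = node->right;
        }
    }
    return out;
}
```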

post-order

iterative-postorder( root )
  S : Stack                  // contains path from root
  S.create()
  last = nil
  node = root
  while (not S.isEmpty() or node != nil) do
    if (node != nil) then
      S.push(node)
      node = node.left
    else
      peek = S.top()
      if (peek.right != nil and last != peek.right) then
        // right child exists AND traversing from left, move right
        node = peek.right
      else
        visit(peek)
        last = S.pop()
      fi
    fi
  od
end // iterative-postorder

[Figure: snapshot of the post-order traversal, marking the current node (*) and the nodes on the stack (X).]
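The post-order pseudocode could look as follows in C++. A sketch with hypothetical names; `last` remembers the most recently visited node, so that on returning to a stacked node one can tell whether its right subtree has already been finished.

```cpp
#include <cassert>
#include <stack>
#include <string>

struct Node {
    char info;
    Node* left;
    Node* right;
};

inline std::string iterativePostorder(Node* root) {
    std::string out;
    std::stack<Node*> S;               // path from the root
    Node* last = nullptr;              // most recently visited node
    Node* node = root;
    while (!S.empty() || node != nullptr) {
        if (node != nullptr) {
            S.push(node);
            node = node->left;
        } else {
            Node* peek = S.top();
            if (peek->right != nullptr && last != peek->right) {
                node = peek->right;    // right subtree still to be done
            } else {
                out += peek->info;     // both subtrees done: visit
                last = peek;
                S.pop();
            }
        }
    }
    return out;
}
```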

Using Inorder Threads

using inorder threads

threads: replace nil-pointers, explicitly store inorder successors

can be used to perform stack-less traversal

need one bit [boolean] per node to mark thread

Morris-variant: temporary threads, no extra bit

nb. inorder = symmetric

inorder successor with threads

[Figure: the two cases for finding the inorder successor succ of a node curr, and a threaded binary search tree on the keys 1 to 12 whose last thread is nil.]

traversal with symmetric threads

inorder threads
  // assuming Root != nil, find first position in inorder
  Curr = walkLeft( Root );
  while (Curr != nil) do
    inOrderVisit( Curr );
    if (Curr.IsThread) then
      Curr = Curr.right;     // to inorder successor
    else
      Curr = walkLeft (Curr.right)
    fi
  od

walkLeft( node : Node)
  while (node.left != nil) do
    node = node.left
  od
  return node
end // walkLeft
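A C++ sketch of the threaded traversal. It assumes a right-threaded tree in which every node without a right child has its thread bit set, including the last node in inorder, whose thread is nil; the names `TNode`, `isThread` and `threadedInorder` are hypothetical.

```cpp
#include <cassert>
#include <string>

// Right-threaded node: if isThread is set, right is not a child link but
// points to the inorder successor (nullptr for the last node in inorder).
// Assumption: every node without a right child has isThread == true.
struct TNode {
    char info;
    TNode* left;
    TNode* right;
    bool isThread;
};

inline TNode* walkLeft(TNode* node) {
    while (node->left != nullptr) node = node->left;
    return node;
}

inline std::string threadedInorder(TNode* root) {
    std::string out;
    if (root == nullptr) return out;
    TNode* curr = walkLeft(root);          // first node in inorder
    while (curr != nullptr) {
        out += curr->info;                 // inorder visit
        if (curr->isThread)
            curr = curr->right;            // follow thread to the successor
        else
            curr = walkLeft(curr->right);  // leftmost node of right subtree
    }
    return out;
}
```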

what about pre-order traversal with inorder threads?

Morris Traversal

Morris: temporary threads

inorder successor (to left parents)

stack vs. threads

[Figure: the example tree numbered in inorder, once with the inorder successor links to left parents drawn in, and once as a snapshot comparing the nodes that would be on the stack with the temporary threads.]

Morris: basics

inorder successor (to left parents)

two visits

1 (pre-order): from parent, via child-link (left or right); add thread to current node

2 (inorder): from subtree, via thread; delete thread

algorithm does not know threads, so does not know which visit, but will check!

Morris traversal - algorithm

no left subtree: 1st and 2nd visit
  go right (by edge or by thread)

new subtree: 1st visit
  construct thread
  go left

been there: 2nd visit
  delete thread
  go right

[Figure: the three cases, showing Curr, its inorder predecessor Pred, and the nil (Λ) right pointer that receives the thread.]

Morris traversal - pseudo code

morris-algo
  Curr = Root;
  while (Curr != nil) do
    if (Curr.left = nil) then
      inOrderVisit( Curr )
      Curr = Curr.right
    else
      // find predecessor
      Pred = Curr.left
      while (Pred.right != Curr and Pred.right != nil) do
        Pred = Pred.right
      od
      if (Pred.right = nil) then
        // no thread: subtree not yet visited
        Pred.right = Curr
        Curr = Curr.left
      else
        // been there, remove thread
        Pred.right = nil
        inOrderVisit( Curr )
        Curr = Curr.right
      fi
    fi
  od
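The Morris pseudocode carries over to C++ directly. A sketch with hypothetical names `Node` and `morrisInorder`; it uses no stack and no thread bit, modifies right pointers temporarily, and leaves the tree as it found it.

```cpp
#include <cassert>
#include <string>

struct Node {
    char info;
    Node* left;
    Node* right;
};

inline std::string morrisInorder(Node* root) {
    std::string out;
    Node* curr = root;
    while (curr != nullptr) {
        if (curr->left == nullptr) {
            out += curr->info;               // inorder visit
            curr = curr->right;              // by edge or by thread
        } else {
            // find the inorder predecessor: rightmost node of left subtree
            Node* pred = curr->left;
            while (pred->right != curr && pred->right != nullptr)
                pred = pred->right;
            if (pred->right == nullptr) {
                pred->right = curr;          // 1st visit: construct thread
                curr = curr->left;
            } else {
                pred->right = nullptr;       // 2nd visit: delete thread
                out += curr->info;           // inorder visit
                curr = curr->right;
            }
        }
    }
    return out;
}
```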

alternative view: tree transformation

[Figure: six snapshots of a binary search tree on the keys 1 to 9 (root 6) during Morris traversal, showing how the temporary threads rearrange the tree and how it is restored.]

Link Inversion

features

– use generic traversal
– at each step we know which visit
– no stack, invert links on path from root
– use bit on path (tag) to distinguish left/right

bit stack?
– keep parent
– global visit counter (pre-/in-/post-order)
– single traversal at a time

inverted links

[Figure: a node * in a binary tree at its three visits, with curr and parent pointers. After the 1st visit the left link of * is inverted (tag=0) and points to its parent; on the way into the right subtree the right link is inverted instead (tag=1); after the 3rd visit both links of * are restored.]
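One way the listed features could fit together is sketched below. Everything here is an assumption made for illustration, not the slides' own code: a node stores a `tag` bit, tag false meaning that its left link currently points to its parent and tag true that its right link does; `curr` and `parent` are the only extra pointers; `which` selects the recorded visit as in the generic traversal.

```cpp
#include <cassert>
#include <string>

struct LNode {
    char info;
    LNode* left;
    LNode* right;
    bool tag;      // meaningful only while the node is on the inverted path
};

// Generic three-visit traversal without a stack: the path back to the root
// is stored in the tree itself by inverting the link that was followed.
inline std::string linkInversion(LNode* root, int which) {
    std::string out;
    if (root == nullptr) return out;
    LNode* parent = nullptr;
    LNode* curr = root;
    int visit = 1;
    while (true) {
        if (visit == which) out += curr->info;
        if (visit == 1) {
            if (curr->left != nullptr) {       // go down-left, invert left link
                LNode* next = curr->left;
                curr->left = parent;
                curr->tag = false;
                parent = curr;
                curr = next;
            } else {
                visit = 2;
            }
        } else if (visit == 2) {
            if (curr->right != nullptr) {      // go down-right, invert right link
                LNode* next = curr->right;
                curr->right = parent;
                curr->tag = true;
                parent = curr;
                curr = next;
                visit = 1;
            } else {
                visit = 3;
            }
        } else {                               // visit 3: go up and restore
            if (parent == nullptr) return out; // third visit at the root
            if (!parent->tag) {                // we are the left child
                LNode* grand = parent->left;
                parent->left = curr;
                curr = parent;
                parent = grand;
                visit = 2;
            } else {                           // we are the right child
                LNode* grand = parent->right;
                parent->right = curr;
                curr = parent;
                parent = grand;
                visit = 3;
            }
        }
    }
}
```

While a traversal is in progress the tree is not in a usable state, which matches the remark that only a single traversal can run at a time.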

3 Binary Search Trees

Contents
  Introduction
  BST use cases
  Constructing BSTs
  Analysis of trees
  ADT Set and Dictionary


Binary Search Trees

Introduction

binary search tree (BST) [3]

[Figure: a node with key K; keys < K in its left subtree, keys > K in its right subtree.]

Definition

A binary search tree is a binary tree such that for each node:

all nodes in its left subtree have smaller values, and

all nodes in its right subtree have larger values

[3] In Dutch: BZB (binaire zoekboom); see the Algoritmiek course


Binary Search Trees

Introduction

comparables

[Figure: three examples of comparable keys.]

names: chico, harpo, groucho, gummo, marx, zeppo

integers: 4, 5, 11, 18, 25, 30

dates: 11.6.1509, 28.5.1533, 30.5.1536, 6.1.1540, 28.7.1540, 12.7.1543


Binary Search Trees

Introduction

binary search tree BST

worst case search complexity: unsuccessful search in

linear tree: O(n)

optimal tree: O(log2(n)) (complete tree)

Average case behaviour: see later


Binary Search Trees

Introduction

BST with 31 most common English words

top five frequencies indicated: the 15568, of 9767, and 7638, to 5739, a 5074

[Figure: BST on the 31 most common English words, with "the" at the root.]

Inserted in BST by decreasing order of frequency

Successful search of BST requires 4.042 comparisons (on avg.)


Binary Search Trees

Introduction

balanced BST

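[Figure: balanced BST on the same 31 words, in alphabetical (inorder) sequence: a (5074), and (7638), are, as, at, be, but, by, for, from, had, have, he, her, his, I, in, is, it, not, of (9767), on, or, that, the (15568), this, to (5739), was, which, with, you.]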

Perfectly balanced BST

Successful search requires 4.393 comparisons (on avg.)


Binary Search Trees

Introduction

optimal BST

[Figure: optimal BST, listed by level from the root down:
of (9767)
for, the (15568)
and (7638), in, that, to (5739)
a (5074), be, he, it, on, this, with
as, by, had, his, is, not, or, was, you
are, at, but, from, have, her, I, which]

Optimal tree taking frequencies into account

Successful search requires 3.437 comparisons (on avg.)

source: Knuth TAoCP Vol.3 (Sorting and Searching)


Binary Search Trees

BST use cases

search value

bool contains( const Comparable & x, Node *t ) const {
  if( t == nullptr )
    return false;
  else if( x < t->element )
    return contains( x, t->left );
  else if( t->element < x )
    return contains( x, t->right );
  else
    return true; // found
}

call with: contains(v,root);


Binary Search Trees

BST use cases

find min/max value

BinaryNode * findMin( BinaryNode *t ) const {
  if( t == nullptr )
    return nullptr;
  if( t->left == nullptr )
    return t;
  return findMin( t->left );
}

BinaryNode * findMax( BinaryNode *t ) const {
  if( t != nullptr )
    while( t->right != nullptr )
      t = t->right;
  return t;
}

call with: findMin(root); and findMax(root);


Binary Search Trees

BST use cases

inorder is sorted

[Figure: BST on the keys 8, 11, 15, 20, 26, 33, 34, 42, 51, 57, 61; each node is labelled with its inorder rank 1 to 11.]

inorder : 8 11 15 20 26 33 34 42 51 57 61


Binary Search Trees

BST use cases

find k-th element

Augment each node with the size of its subtree

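[Figure: BST with key (subtree size) at each node: 35 (11) at the root; 20 (6) and 45 (4); 10 (3), 30 (2), 39 (1), 51 (2); 5 (1), 14 (1), 26 (1), 56 (1).]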

Let r be left->size + 1

If k = r: stop! This node has kth item

If k < r: search kth item in left subtree

If k > r: search (k − r)th item in right subtree
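A minimal sketch of these three rules in C++, assuming each node stores the size of its subtree. The type `SizedNode` and the functions `subtreeSize` and `select` are names invented for this example.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical node type: a BST node augmented with its subtree size.
struct SizedNode {
    int key;
    std::size_t size;      // number of nodes in the subtree rooted here
    SizedNode* left;
    SizedNode* right;
};

std::size_t subtreeSize(const SizedNode* t) { return t ? t->size : 0; }

// Returns the node holding the k-th smallest key (k is 1-based),
// or nullptr if k is out of range.
const SizedNode* select(const SizedNode* t, std::size_t k) {
    while (t != nullptr) {
        std::size_t r = subtreeSize(t->left) + 1;  // rank of t in its own subtree
        if (k == r) return t;                      // this node has the k-th item
        if (k < r) {
            t = t->left;                           // k-th item is in the left subtree
        } else {
            k -= r;                                // (k - r)-th item of the right subtree
            t = t->right;
        }
    }
    return nullptr;
}
```

Each step descends one level, so the cost is proportional to the height of the tree; the price is that insertions and deletions must keep the `size` fields up to date.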


Binary Search Trees

BST use cases

counting items in [12, 52]

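[Figure: BST on the keys 3, 6, 9, ..., 57; the nodes inspected while counting the items in [12, 52] are marked X, with partial counts shown at the subtrees that lie entirely inside the interval.]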


Binary Search Trees

Constructing BSTs

insertion (implementation)

template<class T>
void Node<T>::insert(const T& el, Node<T> * & p) {
  if( p == nullptr ) {
    p = new Node{el, nullptr, nullptr};
  } else if (el < p->data) {
    insert(el, p->left);
  } else if (el > p->data) {
    insert(el, p->right);
  } else {
    ; // Duplicate; do nothing
  }
}

call with: insert(el,root);


Binary Search Trees

Constructing BSTs

deletion “by copying”

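[Figure: deletion "by copying" in two cases. If the node × has an empty subtree (Λ), its parent f is linked directly to the remaining subtree T1. If × has two subtrees T1 and T2, the key of its predecessor p (a node of T1 with an empty right subtree) is copied into ×, and p is deleted instead.]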


Binary Search Trees

Constructing BSTs

deletion (implementation)

void remove( const Comparable & x, Node * & t ) {
  if( t == nullptr ) return;                        // not found
  if( x < t->element ) remove( x, t->left );
  else if( t->element < x ) remove( x, t->right );
  else if( t->left != nullptr && t->right != nullptr ) {
    // two children: copy the predecessor, then delete it from the left subtree
    Node *pred = findMax( t->left );
    t->element = pred->element;
    remove( t->element, t->left );
  }
  else {
    // at most one child: link that child (or nullptr) in place of t
    Node *oldNode = t;
    if( t->left != nullptr ) t = t->left;
    else t = t->right;
    delete oldNode;
  }
}

call with: remove(el,root);


Binary Search Trees

Analysis of trees

counting trees

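[Figure: a binary tree whose root is the $i$-th node in inorder, with a left subtree of $i-1$ nodes ($B_{i-1}$ shapes) and a right subtree of $n-i$ nodes ($B_{n-i}$ shapes).]

Unlabeled n-node binary trees

$B_n = \sum_{i=1}^{n} B_{i-1} \cdot B_{n-i}$ with $B_0 = 1$

nth Catalan number: $B_n = \frac{1}{n+1}\binom{2n}{n} = \frac{(2n)!}{(n+1)!\,n!} \sim \frac{4^n}{n^{3/2}\sqrt{\pi}}$

this is also the number of BSTs with given values: there is a unique way to store the values in a given [unlabeled] tree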
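As a quick check, the recurrence and the closed form can be compared numerically. This is a small sketch; the function names `countBinaryTrees` and `catalanClosedForm` are chosen here, and 64-bit integers limit it to small n (roughly n ≤ 30).

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// B_0 = 1, B_n = sum over i = 1..n of B_{i-1} * B_{n-i}
// (i-1 nodes in the left subtree, n-i nodes in the right subtree).
std::vector<std::uint64_t> countBinaryTrees(int maxN) {
    std::vector<std::uint64_t> B(maxN + 1, 0);
    B[0] = 1;
    for (int n = 1; n <= maxN; ++n)
        for (int i = 1; i <= n; ++i)
            B[n] += B[i - 1] * B[n - i];
    return B;
}

// Closed form binom(2n, n) / (n + 1), computed incrementally with
// C_{k+1} = C_k * 2(2k+1) / (k+2); every division is exact.
std::uint64_t catalanClosedForm(int n) {
    std::uint64_t c = 1;
    for (int k = 0; k < n; ++k)
        c = c * 2 * (2 * k + 1) / (k + 2);
    return c;
}
```

For n = 4 both give 14, the number of 4-node BST shapes.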


Binary Search Trees

Analysis of trees

internal path length

[Figure: 6-node binary tree with each node labelled by its path length: 0 (root); 1, 1; 2, 2, 2.]

ipl = 0 + 1 + 1 + 2 + 2 + 2 = 8

Path length of node: # edges from root to node

Definition (Internal path length)

ipl = sum of all path lengths to all nodes

Avg # comparisons in successful search: $\frac{ipl}{n} + 1$


Binary Search Trees

Analysis of trees

external path length

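[Figure: the same 6-node tree, extended with its 7 external leaves (the nil positions): six at depth 3 and one at depth 2.]

E = 3 + 3 + 3 + 3 + 2 + 3 + 3 = 20

Definition (External path length)

E = sum of all path lengths to the ‘extended’ leaves

Avg # comparisons in unsuccessful search: $\frac{E}{n+1}$ ($n+1$ leaves)

Relation to ipl: $E = ipl + 2n$ (proof: induction)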
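Both path lengths are easy to compute recursively, which gives a way to test the relation E = ipl + 2n on concrete trees. The sketch below uses a hypothetical key-less `Node` type; all names are chosen for this example.

```cpp
#include <cassert>

// Hypothetical shape-only node: keys do not matter for path lengths.
struct Node {
    Node* left;
    Node* right;
};

// Sum of the depths of all (internal) nodes.
long internalPathLength(const Node* t, long depth = 0) {
    if (t == nullptr) return 0;
    return depth + internalPathLength(t->left, depth + 1)
                 + internalPathLength(t->right, depth + 1);
}

// Sum of the depths of all nil positions (the 'extended' leaves).
long externalPathLength(const Node* t, long depth = 0) {
    if (t == nullptr) return depth;
    return externalPathLength(t->left, depth + 1)
         + externalPathLength(t->right, depth + 1);
}

long countNodes(const Node* t) {
    return t ? 1 + countNodes(t->left) + countNodes(t->right) : 0;
}
```

For the 6-node example tree this yields ipl = 8 and E = 20 = 8 + 2·6.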


Binary Search Trees

Analysis of trees

path length extremal trees

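[Figure: path lengths per node in a complete tree (0; 1, 1; 2, 2, 2, 2) and in a linear tree (0, 1, 2, ...).]

optimal (balanced)

$h$ levels: $n = 2^h - 1$ nodes, $h = \lg(n+1)$

$ipl = \sum_{i=0}^{h-1} i \cdot 2^i$, $E = 2^h \cdot h$

$\Rightarrow ipl = (n+1)\lg(n+1) - 2n$

$avg = \frac{n+1}{n}\lg(n+1) - 1$

worst case (linear)

$ipl = \sum_{i=0}^{n-1} i = \frac{n(n-1)}{2}$

$E = ipl + 2n = \frac{n(n+3)}{2}$

$avg = \frac{n+1}{2}$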


Binary Search Trees

Analysis of trees

average tree

intuition: more balance ⇒ more permutations yield that tree

example: 4-node BSTs

[Figure: the seven 4-node BSTs with root 1 or 2, each with the insertion orders that produce it.]

insertion orders       ipl
1234                   6
1243                   6
1324, 1342             5
1423                   6
1432                   6
2134, 2314, 2341       4
2143, 2413, 2431       4

14 BSTs (7 symmetric to above)

4! = 24 permutations

average ipl: $\frac{1}{24}(12 \times 4 + 4 \times 5 + 8 \times 6) = \frac{116}{24} = \frac{29}{6}$


Binary Search Trees

Analysis of trees

average ipl BST

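$I_n$: average internal path length of a BST with $n$ nodes

insert permutation 1, . . . , n into BST ⇒ tree structure

we average over permutations

[Figure: BST with root 5, left subtree on the keys 1 to 4 (2 with children 1 and 4, 4 with left child 3) and right subtree 6 with right child 7. The permutation determines the left and right subtrees: 2 4 1 3 | 5 | 6 7.]

any k can be root = first element

$I_n = (n-1) + \frac{1}{n}\sum_{k=1}^{n}(I_{k-1} + I_{n-k})$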


Binary Search Trees

Analysis of trees

telescope!

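$I_n$: average internal path length of a BST with $n$ nodes

so $I_n = (n-1) + 2(I_0 + I_1 + \cdots + I_{n-1})/n$

also $I_{n-1} = (n-2) + 2(I_0 + I_1 + \cdots + I_{n-2})/(n-1)$

subtract: $n I_n - (n-1) I_{n-1} = 2n - 2 + 2 I_{n-1}$

thus $n I_n = (n+1) I_{n-1} + 2n - 2$

dividing by $n(n+1)$:

$\frac{I_n}{n+1} = \frac{I_{n-1}}{n} + \frac{2}{n+1} - \frac{2}{n(n+1)}$

$\frac{I_{n-1}}{n} = \frac{I_{n-2}}{n-1} + \frac{2}{n} - \frac{2}{(n-1)n}$

. . .

$\frac{I_1}{2} = \frac{I_0}{1} + \frac{2}{2} - \frac{2}{1 \cdot 2}$

summing all lines:

$\frac{I_n}{n+1} = \frac{I_0}{1} + O(\ln n) - \frac{2n}{n+1}$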
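The recurrence can be evaluated directly to see the O(n ln n) growth. The sketch below also compares it with the closed form $I_n = 2(n+1)H_n - 4n$ (with $H_n$ the n-th harmonic number), which follows from the telescoped sum but is not stated on the slide; the function names are chosen for this example.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// I_0 = 0 and I_n = (n-1) + (2/n) * (I_0 + ... + I_{n-1}).
std::vector<double> averageIpl(int maxN) {
    std::vector<double> I(maxN + 1, 0.0);
    double prefix = 0.0;                    // I_0 + ... + I_{n-1}
    for (int n = 1; n <= maxN; ++n) {
        prefix += I[n - 1];
        I[n] = (n - 1) + 2.0 * prefix / n;
    }
    return I;
}

// Closed form 2(n+1)H_n - 4n, obtained by summing the telescoped terms.
double averageIplClosedForm(int n) {
    double H = 0.0;
    for (int k = 1; k <= n; ++k) H += 1.0 / k;
    return 2.0 * (n + 1) * H - 4.0 * n;
}
```

For n = 4 the recurrence gives 29/6, the average found by enumerating all 24 permutations.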


Binary Search Trees

ADT Set and Dictionary

ADT Set

Initialize: construct an empty set.

IsEmpty: check whether the set is empty (∅, contains no elements).

Size: return the number of elements, the cardinality of the set.

IsElement(a): returns whether a given object from the domain belongs to the set, a ∈ A.

Insert(a): add an element to the set (if it is not present, A ∪ {a}).

Delete(a): removes an element from the set (if it is present, A \ {a}).

Efficient implementation of ADT Set possible with BST


Balancing Binary Trees

Contents

4 Balancing Binary Trees
Tree rotation
AVL Trees
Adding an item to an AVL Tree
Deletion in an AVL Tree
Splay Trees


Balancing Binary Trees

Tree rotation

single rotation

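root: p, pivot: q ⇒

[Figure: a single rotation and its inverse on two nodes p and q with subtrees T1, T2, T3; the left-to-right order of the subtrees is the same in both trees.]

⇐ root: q, pivot: p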
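A single rotation only relinks three pointers. A minimal C++ sketch, assuming a plain `Node` type and passing the subtree root by reference so the parent's link is updated as well (the names `rotateRight` and `rotateLeft` are chosen here):

```cpp
#include <cassert>

// Hypothetical node type for this sketch.
struct Node {
    int key;
    Node* left;
    Node* right;
};

// Right rotation: the left child (the pivot) becomes the subtree root.
// Precondition: root and root->left are not nullptr.
void rotateRight(Node*& root) {
    Node* pivot = root->left;
    root->left = pivot->right;   // middle subtree changes parent
    pivot->right = root;
    root = pivot;                // update the link from above
}

// Left rotation: the mirror image; the right child becomes the subtree root.
// Precondition: root and root->right are not nullptr.
void rotateLeft(Node*& root) {
    Node* pivot = root->right;
    root->right = pivot->left;
    pivot->left = root;
    root = pivot;
}
```

The two operations are each other's inverse, and neither changes the inorder sequence of the keys.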


Balancing Binary Trees

Tree rotation

double rotation

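[Figure: double rotation on the nodes r, p, q with subtrees T1, T2, T3, T4, shown in two steps; q ends up on top with the other two nodes as its children, and the subtrees keep their order T1 T2 T3 T4.]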

rotate two times with pivot=q


Balancing Binary Trees

Tree rotation

Day/Stout/Warren

[Figure: Day/Stout/Warren on a tree with the keys 2, 4, 6, 8, 12, shown in three snapshots.]


Balancing Binary Trees

Tree rotation

Day/Stout/Warren Algorithm - createBackBone

rotate(root, pivot) { ... }

createBackBone(root)
  tmp = root;
  while (tmp != nil) do
    if (tmp.left != nil) then
      rotate(tmp, tmp.left);
      tmp = tmp.left;
    else
      tmp = tmp.right;
    fi
  od
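One possible C++ rendering of the backbone phase is sketched below. It does not reproduce the slide's `rotate(root, pivot)` helper, whose body is not shown; instead it inlines a right rotation and uses a dummy header node to track the last node already on the backbone. `Node` and `createBackbone` are names chosen for this example.

```cpp
#include <cassert>

// Hypothetical node type for this sketch.
struct Node {
    int key;
    Node* left;
    Node* right;
};

// Turns a BST into a right-leaning linear tree (the backbone) by rotating
// right as long as the current node has a left child. Returns the new root.
Node* createBackbone(Node* root) {
    Node dummy{0, nullptr, root};
    Node* parent = &dummy;               // last node that is already on the backbone
    Node* tmp = root;
    while (tmp != nullptr) {
        if (tmp->left != nullptr) {
            Node* pivot = tmp->left;     // rotate right at tmp
            tmp->left = pivot->right;
            pivot->right = tmp;
            parent->right = pivot;
            tmp = pivot;                 // continue at the new subtree root
        } else {
            parent = tmp;                // tmp is final: move down the backbone
            tmp = tmp->right;
        }
    }
    return dummy.right;
}
```

Every rotation moves one more node onto the right spine, so at most n − 1 rotations are needed.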


Balancing Binary Trees

Tree rotation

Day/Stout/Warren Algorithm

createCompleteTree(root)
  createBackBone(root);
  n = number of nodes
  m = 2^floor(log(n+1)) - 1;
  rotate n-m times at every other node in the backbone
  while (m>1) do
    m = m/2;
    rotate m times at every other node in the backbone
  od


Balancing Binary Trees

AVL Trees

stl container classes

helper: pair

sequences:
  contiguous: array (fixed length), vector (flexible length), deque (double ended)
  linked: forward_list (single), list (double)

adaptors:
  based on one of the sequences: stack (lifo), queue (fifo)
  based on binary heap: priority_queue

associative: based on balanced trees: set, map, multiset, multimap

unordered: based on hash table: unordered_set, unordered_map, unordered_multiset, unordered_multimap


Balancing Binary Trees

AVL Trees

balance factor

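[Figure: BST with keys 35, 20, 10, 5, 14, 30, 26, 23, 45, 39, 51, 56 and the balance factors 35: -1, 30: -2, 45: +1 indicated; a depth scale (0 to 4) and a height scale are drawn alongside.]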


Balancing Binary Trees

AVL Trees

Definition (AVL Tree)

An AVL tree is a BST where for each node: |balance(node)| ≤ 1

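[Figure: AVL tree on the keys 1 to 13 with balance factors, by level: 6 (+1); 3 (0), 9 (+1); 2 (-1), 5 (-1), 8 (-1), 11 (+1); 1, 4, 7, 10, 12 (+1); 13.]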
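The balance condition can be checked with a few lines of C++. The sketch assumes that the balance factor is height(right) − height(left), which is how the signs in the examples on these slides read, and that the empty tree has height 0; it checks only the balance condition, not the BST order. All names are chosen for this illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdlib>

// Hypothetical node type for this sketch.
struct Node {
    int key;
    Node* left;
    Node* right;
};

// Height in nodes: the empty tree has height 0 (an assumption of this sketch).
int height(const Node* t) {
    return t ? 1 + std::max(height(t->left), height(t->right)) : 0;
}

// Balance factor, assumed here to be height(right) - height(left).
int balance(const Node* t) {
    return height(t->right) - height(t->left);
}

// True if |balance(node)| <= 1 holds for every node of the tree.
bool isHeightBalanced(const Node* t) {
    if (t == nullptr) return true;
    return std::abs(balance(t)) <= 1
        && isHeightBalanced(t->left)
        && isHeightBalanced(t->right);
}
```

This recomputes heights at every node, which is fine for checking small examples; an actual AVL implementation would store the balance factor (or height) in each node.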


Balancing Binary Trees

AVL Trees

Fibonacci ‘worst’ AVL tree

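[Figure: Fibonacci trees, the AVL trees with the fewest nodes for their height; the tree for height h consists of a root with the trees for heights h−2 and h−1 as subtrees.]

$F_h = F_{h-2} + F_{h-1} + 1 \approx \left(\frac{1+\sqrt{5}}{2}\right)^h$, thus worst-case search in an AVL tree grows as $O(\lg n)$ in the number of nodes $n$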


Balancing Binary Trees

Adding an item to an AVL Tree

Adding in left subtree

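(balance factors are written as before/after the insertion)

a) p: +1/0, new node added in the left subtree: ok, stop

b) p: 0/-1: ok, go up

c) p: -1/-2 =⇒ rebalance (next 2 slides), stop; the new subtree root q has balance 0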


Balancing Binary Trees

Adding an item to an AVL Tree

rebalance: LL-case

[Figure: LL-case, single rotation.]

before: q: -1/-2, p: 0/-1 =⇒ after: q: 0, p: 0


Balancing Binary Trees

Adding an item to an AVL Tree

Rebalance: LR-cases

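[Figure: LR-cases, double rotation; the new node is added in either subtree of q.]

before: r: -1/-2, p: 0/+1, q: 0/±1

=⇒ after: r: +1, p: 0, q: 0 OR r: 0, p: -1, q: 0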


Balancing Binary Trees

Adding an item to an AVL Tree

example: adding 11

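[Figure, before: root 4 (+1/+2) with left child 2 (0; children 1 and 3) and right child 7 (0/+1); 7 has children 6 (-1; left child 5) and 9 (0/+1); 9 has children 8 and 10 (0/+1); the new node 11 is added below 10.
After: root 7 (0) with children 4 (0) and 9 (+1); 4 has children 2 (0; children 1 and 3) and 6 (-1; left child 5); 9 has children 8 and 10 (+1; right child 11).]

imbalance at 4, RR-case, so rotate at 4 with pivot=7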


Balancing Binary Trees

Adding an item to an AVL Tree

example: adding 5

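[Figure, before: root 4 (+1/+2) with left child 2 (0; children 1 and 3) and right child 9 (0/-1); 9 has children 7 (0/-1) and 10 (+1; right child 11); 7 has children 6 (0/-1) and 8; the new node 5 is added below 6.
After: root 7 (0) with children 4 (0) and 9 (+1); 4 has children 2 (0; children 1 and 3) and 6 (-1; left child 5); 9 has children 8 and 10 (+1; right child 11).]

imbalance at 4, RL-case, so rotate twice with pivot=7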


Balancing Binary Trees

Deletion in an AVL Tree

Deletion: RR cases

q0

p+1/+2

=⇒ q+1

p-1

q+1

p+1/+2

=⇒ q0

p0


Balancing Binary Trees

Deletion in an AVL Tree

Delete: RL cases (ε = 0,±1)

p-1

r+1/+2

=⇒

p0,+1

q0

r0,-1


Balancing Binary Trees

Deletion in an AVL Tree

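[Figure: deletion example in four snapshots of an AVL tree on the keys 1 to 11 with root 8. (1) The node marked × below 11 is deleted. (2) The balance of 11 becomes -2. (3) After rebalancing, 10 (0) has children 9 and 11, and the balance of 8 becomes -2. (4) After a rotation at 8, the root is 5 (0) with children 3 (-1) and 8 (0); 8 has children 7 (-1) and 10 (0).]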


Balancing Binary Trees

Splay Trees

splay zig-zag (LR)

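[Figure: zig-zag (LR) step on the nodes g (grandparent), p (parent) and x with subtrees T1, T2, T3, T4 =⇒ x on top with p and g as its children; the subtrees keep their order T1 T2 T3 T4.]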


Balancing Binary Trees

Splay Trees

splay zig-zig (LL)

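[Figure: zig-zig (LL) step on the nodes g, p and x, which form a left chain with subtrees T1, T2, T3, T4 =⇒ x on top, p its right child, g the right child of p.]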


Balancing Binary Trees

Splay Trees

splay linear tree

[Figure: splaying in a linear tree on the keys 1 to 7, shown in successive snapshots.]


Priority Queues

Contents

5 Priority Queues
ADT Priority Queue
Binary Heap
Leftist heaps
Pairing Heap (not covered)
Double-ended Priority Queues


Priority Queues

Abstract Data Structures

ADT – what, not how

Definition

An abstract data structure (ADT) is a specification of the values stored in the data structure as well as the description and signatures of the operations that can be performed.

no representation or implementation in ADT

“mathematical model”


Priority Queues

Abstract Data Structures

stl container classes

helper: pair

sequences:
  contiguous: array (fixed length), vector (flexible length), deque (double ended)
  linked: forward_list (single), list (double)

adaptors:
  based on one of the sequences: stack (lifo), queue (fifo)
  based on binary heap: priority_queue

associative: based on balanced trees: set, map, multiset, multimap

unordered: based on hash table: unordered_set, unordered_map, unordered_multiset, unordered_multimap


Priority Queues

Abstract Data Structures

STL priority queue

#include <iostream>
#include <queue>
#include <string>
#include <utility>
#include <vector>
using namespace std;

using paar = pair<string, int>;   // assumed definition: (name, priority)

class Comp {
public:
  bool operator() ( const paar& p1, const paar& p2 ) const {
    return p1.second < p2.second;   // compare on priority only
  }
};

int main() {
  vector<paar> club   // 'modern' initialization
    { {"Jan", 1}, {"Piet", 6}, {"Katrien", 5}, {"Ramon", 2} };
  using pqtype = priority_queue< paar, vector<paar>, Comp >;
  pqtype pq( club.begin(), club.end() );
  // wow! converts into priority_queue
  while ( !pq.empty() ) {
    cout << pq.top().first << " (" << pq.top().second << ") ";
    pq.pop();
  }
  return 0;
}

Piet (6) Katrien (5) Ramon (2) Jan (1)


Priority Queues

ADT Priority Queue

dictionary vs. priority queue

Both store a set of (key,value) pairs

{ (’Detra’,17), (’Nova’,84), (’Charlie’,22), (’Henry’,75), (’Elsa’,29) }

both:
  Insert(’Roxanne’,29)

dictionary:
  Delete(’Detra’)
  Find(’Elsa’) returns 29
  Set(’Henry’,76)

priority queue:
  FindMax() returns (’Nova’,84)
  DeleteMax()


Priority Queues

ADT Priority Queue

ADT dictionary / map / associative array

Stores a set of (key,value) pairs

Initialize, IsEmpty, Size

Insert: add (key,value) pair, provided key is not yet present

Delete: deletes (key,value) pair, given the key

Find: returns the value associated to a given key

Set: reassigns a new value to a (existing) given key

usually implemented as a (balanced) binary search tree, or as a hash table (“unordered”)


Priority Queues

ADT Priority Queue

ADT priority queue

Initialize: construct an empty queue.

IsEmpty: check whether there are any elements in the queue.

Size: returns the number of elements.

Insert: given a data element with its priority, it is added to the queue.

DeleteMax: returns a data element with maximal priority, and deletes it.

GetMax: returns a data element with maximal priority.

IncreaseKey: given an element with its position in the queue, it is assigned a higher priority.

Meld, or Union: takes two priority queues and returns a new priority queue containing the data elements from both.


Priority Queues

ADT Priority Queue

min & max queues

max-queue ≥

Initialize, IsEmpty, Size, Insert, DeleteMax, GetMax, IncreaseKey, Meld

min-queue ≤

Initialize, IsEmpty, Size, Insert, DeleteMin, GetMin, DecreaseKey, Meld

pay attention to which ordering is used

elements often carry data as well (not only a priority)


Priority Queues

ADT Priority Queue

priority queue - use cases

sorting (heapsort)

graph algorithms (Dijkstra shortest path, Prim’s algorithm)

compression (Huffman)

operating systems: task queue, print job queue

discrete event simulation


Priority Queues

ADT Priority Queue

implementations

              Binary      Leftist     Pairing      Fibonacci    Brodal
GetMax        Θ(1)        Θ(1)        Θ(1)         Θ(1)         Θ(1)
Insert        O(log n)    Θ(log n)    Θ(1)         Θ(1)         Θ(1)
DeleteMax     Θ(log n)    Θ(log n)    O(log n)†    O(log n)†    O(log n)
IncreaseKey   Θ(log n)    Θ(log n)    O(log n)†    Θ(1)†        Θ(1)
Meld          Θ(n)        Θ(log n)    Θ(1)         Θ(1)         Θ(1)

† amortized complexity

“. . . is based on heap ordered trees where [. . . ] nodes may violate heap order.” “The data structure presented is quite complicated.”


Priority Queues

Binary Heap

binary search tree vs heap order

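[Figure: two trees side by side. Binary search tree order on the keys 35, 20, 10, 5, 14, 30, 26, 23, 45, 39, 51, 56; heap order on the keys 83, 70, 10, 5, 7, 30, 26, 23, 45, 39, 37, 3.]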


Priority Queues

Binary Heap

representing binary tree with an array

root at index 1; the left/right child of the node at index i is at index 2i / 2i+1

[Figure: complete binary tree with 12 nodes, numbered level by level with the binary indices 1; 10, 11; 100, 101, 110, 111; 1000, 1001, 1010, 1011, 1100.]

index   1  2  3  4  5  6  7  8  9 10 11 12
key    33 42 17  8 24  3  3 98 55 10 19  5

works well for complete binary trees

waste of space when there are ‘missing’ nodes


Priority Queues

Binary Heap

binary heap: three levels

functioning: abstract (priority queue)

understanding: binary tree

implementation: array

internal operations (change key at position): bubble up, trickle down

“To add an element to a heap we must perform an up-heap operation (also known as bubble-up, percolate-up, sift-up, trickle-up, swim-up, heapify-up, or cascade-up), . . . ” What’s in a name? [Wikipedia]


Priority Queues

Binary Heap

increasekey / bubble up

[Figure: heap as tree and as array, before and after; the key at index 12 has just been increased to 71.]

index    1  2  3  4  5  6  7  8  9 10 11 12 13
before  98 57 55 42 24 17  3  8 33 10 19 71 13
after   98 57 71 42 24 55  3  8 33 10 19 17 13

BubbleUp : swap with parent until heap-ordered
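A minimal sketch of BubbleUp on the array representation, assuming a max-heap stored in a `std::vector<int>` with index 0 unused so that the parent of index i is i/2. The name `bubbleUp` and this layout are choices made for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Max-heap in heap[1..n]; heap[0] is unused.
// Moves the key at index i up while it is larger than its parent.
void bubbleUp(std::vector<int>& heap, std::size_t i) {
    while (i > 1 && heap[i / 2] < heap[i]) {
        std::swap(heap[i / 2], heap[i]);
        i /= 2;
    }
}
```

Calling `bubbleUp(heap, 12)` on the array of this slide swaps 71 with 17 and then with 55, and stops below 98.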


Priority Queues

Binary Heap

decreasekey / trickle down

[Figure: heap as tree and as array, before and after; the key at index 1 has just been decreased to 37.]

index    1  2  3  4  5  6  7  8  9 10 11 12 13
before  37 57 55 42 24 17  3  8 33 10 19  5 13
after   57 42 55 37 24 17  3  8 33 10 19  5 13

TrickleDown : swap with largest child until heap-ordered
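TrickleDown can be sketched in the same style, again assuming a max-heap in a `std::vector<int>` with index 0 unused; `n` is the index of the last element in use. The name and signature are assumptions of this example.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Max-heap in heap[1..n]; heap[0] is unused.
// Moves the key at index i down, swapping with its largest child,
// until both children are smaller or equal (or i is a leaf).
void trickleDown(std::vector<int>& heap, std::size_t i, std::size_t n) {
    while (2 * i <= n) {
        std::size_t child = 2 * i;                       // left child
        if (child + 1 <= n && heap[child + 1] > heap[child])
            ++child;                                     // right child is larger
        if (heap[i] >= heap[child]) break;               // heap-ordered: stop
        std::swap(heap[i], heap[child]);
        i = child;
    }
}
```

On the array of this slide, `trickleDown(heap, 1, 13)` swaps 37 with 57 and then with 42, and stops above 8 and 33.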


Priority Queues

Binary Heap

Insert to priority queue

[Figure: heap as tree and as array, before and after; the new key 29 is added at index 14.]

index    1  2  3  4  5  6  7  8  9 10 11 12 13 14
before  98 57 55 42 24 17  3  8 33 10 19  5 13 29
after   98 57 55 42 24 17 29  8 33 10 19  5 13  3

Insert: add as last, BubbleUp


Priority Queues

Binary Heap

DeleteMax from priority queue

[Figure: heap as tree and as array, before and after; the maximum 98 is removed from index 1 and the last key 13 takes its place.]

index    1  2  3  4  5  6  7  8  9 10 11 12 13
before  98 57 55 42 24 17  3  8 33 10 19  5 13
after   57 42 55 33 24 17  3  8 13 10 19  5

DeleteMax: move last element to root, trickleDown
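Insert and DeleteMax together give a small working priority queue. The class below is a sketch under the same assumptions as before (max-heap of `int` keys in a vector, index 0 unused); `MaxHeap` and its member names are invented for this example and carry no data besides the priority.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

class MaxHeap {
public:
    bool empty() const { return a.size() == 1; }
    std::size_t size() const { return a.size() - 1; }
    int getMax() const { return a[1]; }            // precondition: !empty()

    // Insert: add as last element, then bubble up.
    void insert(int key) {
        a.push_back(key);
        std::size_t i = a.size() - 1;
        while (i > 1 && a[i / 2] < a[i]) {
            std::swap(a[i / 2], a[i]);
            i /= 2;
        }
    }

    // DeleteMax: move the last element to the root, then trickle down.
    int deleteMax() {                               // precondition: !empty()
        int top = a[1];
        a[1] = a.back();
        a.pop_back();
        std::size_t n = a.size() - 1, i = 1;
        while (2 * i <= n) {
            std::size_t child = 2 * i;
            if (child + 1 <= n && a[child + 1] > a[child]) ++child;
            if (a[i] >= a[child]) break;
            std::swap(a[i], a[child]);
            i = child;
        }
        return top;
    }

private:
    std::vector<int> a{0};                          // a[0] is unused
};
```

Repeatedly calling `deleteMax` returns the keys in non-increasing order, which is the idea behind heapsort.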


Priority Queues

Binary Heap

heapify (1)

[Figure: array as tree, before and after the first heapify phase (the subtrees rooted at indices 6 and 4 become heap-ordered).]

index    1  2  3  4  5  6  7  8  9 10 11 12 13
before  33 42 17  8 24 13  3 98 57 10 19  5 55
after   33 42 17 98 24 55  3  8 57 10 19  5 13

TrickleDown new key: swap with largest child until heap-ordered


Priority Queues

Binary Heap

heapify (2)

[Figure: array as tree, before and after the second heapify phase (the subtrees rooted at indices 3 and 2 become heap-ordered).]

index    1  2  3  4  5  6  7  8  9 10 11 12 13
before  33 42 17 98 24 55  3  8 57 10 19  5 13
after   33 98 55 57 24 17  3  8 42 10 19  5 13


Priority Queues

Binary Heap

heapify (3)

[Figure: array as tree, before and after the last heapify phase (the root trickles down).]

index    1  2  3  4  5  6  7  8  9 10 11 12 13
before  33 98 55 57 24 17  3  8 42 10 19  5 13
after   98 57 55 42 24 17  3  8 33 10 19  5 13


Priority Queues

Binary Heap

complexity heapify

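Lemma: $\sum_{d=0}^{h} d\,2^d = (h-1)2^{h+1} + 2$

$n$ levels, $N = 2^n - 1$ keys

top-down: $\sum_{\ell=0}^{n-1} 2^\ell\,\ell = (n-2)2^n + 2 \approx N \lg N$

bottom-up: $\sum_{\ell=0}^{n-1} 2^\ell (n-1-\ell) = \sum_{\ell=0}^{n-1} 2^\ell (n-1) - \sum_{\ell=0}^{n-1} 2^\ell\,\ell = 2^n - n - 1$

which is O(N)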
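The bottom-up variant can be sketched as follows: trickle down every non-leaf, from the last one back to the root. As in the earlier examples on these slides, a max-heap in a `std::vector<int>` with index 0 unused is assumed, and the names `trickleDown` and `buildHeap` are chosen for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Moves heap[i] down within heap[1..n], swapping with the largest child.
void trickleDown(std::vector<int>& heap, std::size_t i, std::size_t n) {
    while (2 * i <= n) {
        std::size_t child = 2 * i;
        if (child + 1 <= n && heap[child + 1] > heap[child]) ++child;
        if (heap[i] >= heap[child]) break;
        std::swap(heap[i], heap[child]);
        i = child;
    }
}

// Bottom-up heapify of heap[1..n] (heap[0] unused): the leaves are already
// heaps, so start at the last node that has a child and work back to the root.
void buildHeap(std::vector<int>& heap) {
    if (heap.size() < 2) return;          // nothing to do
    std::size_t n = heap.size() - 1;
    for (std::size_t i = n / 2; i >= 1; --i)
        trickleDown(heap, i, n);
}
```

Applied to the 13 keys of the heapify example, this produces the heap shown on the last heapify slide. Most nodes are near the bottom and trickle down only a few levels, which is where the O(N) bound comes from.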


Priority Queues

Leftist heaps

leftist heaps

npl(x): nil path length (“leaf distance”), the shortest distance from x to an external leaf

Definition (Leftist tree)

An (extended) binary tree where for each internal node x, npl(left(x)) ≥ npl(right(x)).

Definition (Leftist heap)

A leftist tree where the priorities satisfy the heap order.

structure vs. node order


Priority Queues

Leftist heaps

leftist tree (structure)

npl(left(x)) ≥ npl(right(x))

[Figure: example binary trees with the npl value written in each node.]


Priority Queues

Leftist heaps

basic (internal) operation: ZIP

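[Figure: Zip of two heaps with roots a and b, where a ≥ b. Root a has subtrees T1 and T2, root b has subtrees T3 and T4. In the result a is the root and keeps T1; its subtree T2 is zipped recursively with the heap rooted at b.]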


Priority Queues

Leftist heaps

example (step 1: recursive Zipping)

[Figure: recursive Zip, in three stages, of a heap with keys 38, 37, 25, 29, 10 and a heap with keys 35, 31, 32, 28, 30.]


Priority Queues

Leftist heaps

example (step 2: bottom-up swapping)

[Figure: the zipped tree on the keys 38, 37, 35, 29, 31, 32, 28, 30, 25, 10 with npl values at the nodes, and the tree after swapping children bottom-up.]


Priority Queues

Leftist heaps

complexity

Lemma

Let T be a leftist tree with root v such that npl(v) = k. Then
(1) T contains at least $2^k - 1$ (internal) nodes, and
(2) the rightmost path in T has exactly k (internal) nodes.

[Figure: leftist tree with the npl value written in each node.]


Priority Queues

Leftist heaps

priority queue operations: Insert

[Figure: Insert(27) as a Zip of the heap with keys 38, 37, 25, 29, 10 and the single-node heap 27, followed by the swap step.]


Priority Queues

Leftist heaps

priority queue operations: DeleteMax

[Figure: DeleteMax on the heap with keys 38, 37, 25, 29, 10: the root 38 is removed and its two subtrees are zipped; the new root is 37.]
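The leftist-heap operations above can be sketched as follows. This is an illustrative Python version under stated assumptions: a max-heap, npl of an external leaf equal to 0, and the zip and swap steps combined in one recursive function. The names `Node`, `zip_heaps`, `insert` and `delete_max` are chosen here.

```python
class Node:
    def __init__(self, key):
        self.key, self.left, self.right, self.npl = key, None, None, 1


def npl(t):
    """Nil path length; an external leaf (None) has npl 0."""
    return t.npl if t else 0


def zip_heaps(a, b):
    """Merge two leftist max-heaps along their rightmost paths."""
    if a is None:
        return b
    if b is None:
        return a
    if a.key < b.key:
        a, b = b, a                      # a now holds the larger root
    a.right = zip_heaps(a.right, b)      # recursive zipping
    if npl(a.left) < npl(a.right):       # swapping on the way back up
        a.left, a.right = a.right, a.left
    a.npl = npl(a.right) + 1
    return a


def insert(heap, key):
    return zip_heaps(heap, Node(key))


def delete_max(heap):
    """Return (largest key, remaining heap)."""
    return heap.key, zip_heaps(heap.left, heap.right)


heap = None
for k in [38, 37, 25, 29, 10]:
    heap = insert(heap, k)
largest, heap = delete_max(heap)
print(largest, heap.key)
```

Both Insert and DeleteMax reduce to a single Zip, so their cost is bounded by the lengths of the rightmost paths involved.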


Priority Queues

Double-ended Priority Queues

dual structure min-max heap

min-heap, by level:

3
11 5
14 15 9

max-heap, by level:

15
14 9
3 11 5

position              1   2   3   4   5   6   7
min-heap key          3   11  5   14  15  9   -
pointer to max-heap   4   5   6   2   1   3
max-heap key          15  14  9   3   11  5   -
pointer to min-heap   5   4   6   1   2   3

Pointer from min-heap item to same item in max-heap (and back)

Insertion: as in an ordinary heap, but twice: once in each heap

Deletion: find the item to delete in the other heap using the pointer, move the last element to that position and do a normal deletion


Priority Queues

Double-ended Priority Queues

interval heap

2-92

8-80 11-75

17-69 42-70 44-73 14-39

24-33 23-65 55-60 44-50 54-57 61

[8,80] ⊆ [2,92]


Priority Queues

Double-ended Priority Queues

interval heap: insert

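Insert 80 (only part of the heap is shown). The new key is placed next to 61 in the last node:

2-92
11-75
44-73 14-39
54-57 61, 80

The last node becomes the interval 61-80:

2-92
11-75
44-73 14-39
54-57 61-80

80 moves up along the max side until the parent interval contains it:

2-92
11-80
44-75 14-39
54-57 61-73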


Priority Queues

Double-ended Priority Queues

embedded min&max heap

2

8 11

17 42 44 14

24 23 55 44 54 61

92

80 75

69 70 73 39

33 65 60 50 57


Priority Queues

Double-ended Priority Queues

Double ended priority queue - use case

One example application of the double-ended priority queue is external sorting. In an external sort, there are more elements than can be held in the computer's memory. (Wikipedia)


Priority Queues

Double-ended Priority Queues

Quiz [4]

AVL tree

71

42

23

14 35

56

67

98

89 110

insert 7

Binary min-heap

14

23

42

98 71

56

67

35

89 110

70

insert 7

[4] A quiz is a brief assessment used in education to measure growth in knowledge, abilities, and/or skills. (Wikipedia)


B-Trees

Contents

6 B-Trees
Definition & Insertion
Deleting Keys
Red-Black Trees


B-Trees

AVL-tree and B-tree

[Figure: an AVL-tree on the keys 1 to 13 with balance factors at the nodes, next to a B-tree.]

B-tree, by level from the leaves up:

10 20 25 32 34 40 41 44 46 52 54 58 60
30 38 50 56
42


B-Trees

multiway search tree

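[Figure: a binary search tree node with key K and subtrees T0, T1, next to a multiway node with keys K1, K2, ..., Kℓ and subtrees T0, T1, T2, ..., Tℓ.]

$T_0 < K_1 < T_1 < \dots < K_\ell < T_\ell$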


B-Trees

Definition & Insertion

B-tree (Bayer & McCreight, 1972)

Definition

A B-tree of order m is a multi-way search tree such that

every node has at most $m$ children (contains at most $m-1$ keys),

every node other than the root has at least $\lceil m/2 \rceil$ children (contains at least $\lceil m/2 \rceil - 1$ keys),

the root contains at least one key, and

all leaves are on the same level of the tree.


B-Trees

Definition & Insertion

B-tree of order 5

3 7 11 15 19 23 27 31 35 39 43 47 51 55 59 63 67 71 75 79 83

13 29 41 49 69 77

61

order $m = 5$: between $\lceil m/2 \rceil - 1 = 2$ and $m - 1 = 4$ keys.


B-Trees

Definition & Insertion

adding keys

Add the new key to a leaf.

If the leaf then exceeds its maximal capacity, split it and move the middle key up to the parent. Recurse on the parent.

Splits can reach the root. We then obtain a new root with a single key.
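The insertion rule above can be sketched in Python. This is a minimal illustration, assuming that a node is split once it holds m keys (one more than allowed); the class `Node` and the functions `_insert` and `insert` are names chosen here, not part of the slides.

```python
import bisect


class Node:
    def __init__(self, keys=None, children=None):
        self.keys = keys or []
        self.children = children or []   # empty list for a leaf


def _insert(node, key, m):
    """Insert key below node; return (middle key, new right node) if node was split."""
    i = bisect.bisect_left(node.keys, key)
    if i < len(node.keys) and node.keys[i] == key:
        return None                      # key already present
    if not node.children:
        node.keys.insert(i, key)         # add the new key to a leaf
    else:
        split = _insert(node.children[i], key, m)
        if split is None:
            return None
        middle, right = split            # a child was split: take over its middle key
        node.keys.insert(i, middle)
        node.children.insert(i + 1, right)
    if len(node.keys) < m:
        return None                      # at most m - 1 keys: no overflow
    h = len(node.keys) // 2
    middle = node.keys[h]
    right = Node(node.keys[h + 1:], node.children[h + 1:])
    node.keys, node.children = node.keys[:h], node.children[:h + 1]
    return middle, right


def insert(root, key, m=5):
    """Insert key into the B-tree of order m; return the (possibly new) root."""
    split = _insert(root, key, m)
    if split is None:
        return root
    middle, right = split
    return Node([middle], [root, right])  # the split reached the root


root = Node([30, 42], [Node([10, 20, 25]), Node([32, 38, 40, 41]), Node([44, 50, 56])])
for k in (34, 58, 60, 46, 52, 54):
    root = insert(root, k)
print(root.keys, [c.keys for c in root.children])
```

The usage lines start from the order-5 tree of the slides that follow and add the same keys, which should end with the single root key 42.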


B-Trees

Definition & Insertion

adding a key (order 5)

10 20 25 32 38 40 41 44 50 56

30 42

+34

32 34 38 40 41

10 20 25 32 34 40 41 44 50 56

30 38 42


B-Trees

Definition & Insertion

adding more keys (order 5)

10 20 25 32 34 40 41 44 50 56

30 38 42

+58,+60

44 50 56 58 60

10 20 25 32 34 40 41 44 50 58 60

30 38 42 56


B-Trees

Definition & Insertion

adding even more keys (order 5)

10 20 25 32 34 40 41 44 50 58 60

30 38 42 56

+46,+52,+54

44 46 50 52 54

30 38 42 50 56

10 20 25 32 34 40 41 44 46 52 54 58 60

30 38 50 56

42


B-Trees

Deleting Keys

deleting keys

For non-leaves: swap the key with its predecessor (the key moves to a leaf)

Deleting from leaves:

If below minimal capacity, move a key from a sibling with surplus keys to the parent, and move the separating key from the parent to the underfull node.
If no sibling has surplus keys: merge with a sibling and take the separating key from the parent. Recurse with the parent.

Due to the recursion with the parent, deletion may reach the root and can collapse a level.


B-Trees

Deleting Keys

deleting keys (order 5)

OK

10 20 25 32 34 40 41 42

30 38

45


B-Trees

Deleting Keys

deleting keys (order 5)

10 20 25 32 34 40 41 42

30 38

45

swap predecessor

40 41 ×

30 38

42


B-Trees

Deleting Keys

deleting keys (order 5)

10 20 25 32

borrow (via parent)

34 40 41 42

30 38

45

10 20 × 30 34 40 41 42

25 38

45


B-Trees

Deleting Keys

deleting, ctd (order 5)

10 20 25 32 34 40

underfull: merge with sibling

41

×

30 38

42

10 20 25 32 34 38 40

30

underfull: merge with sibling

× 50 56

42

new root 30 42 50 56


B-Trees

Red-Black Trees

2-4-tree to red-black tree

20 37 40 41 44 50

30 42 4230

20 4037 41 44 50

42

30

20 40

37 41

44

50


B-Trees

Red-Black Trees

2-4-tree vs red-black tree

a b c

b

a c

a b

b

a

a

b

a

a


B-Trees

Red-Black Trees

red-black tree

Definition

A red-black tree is a binary search tree such that each node is either black or red, where

the root is black,

no red node is the child of another red node,

the number of black nodes on each path from the root to an extended leaf (NIL-pointer) is the same.
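The three conditions of the definition can be checked mechanically. The sketch below is an illustration in Python; the class `Node` with a boolean `red` field and the functions `black_height` and `is_red_black` are assumptions made here.

```python
class Node:
    def __init__(self, key, red, left=None, right=None):
        self.key, self.red, self.left, self.right = key, red, left, right


def black_height(t, lo=float("-inf"), hi=float("inf")):
    """Number of black nodes on every path from t down to a NIL leaf.

    Raises ValueError if the subtree violates the search-tree order,
    has a red node with a red child, or has unequal black counts.
    """
    if t is None:
        return 0
    if not lo < t.key < hi:
        raise ValueError("search-tree order violated")
    if t.red and any(c is not None and c.red for c in (t.left, t.right)):
        raise ValueError("red node with red child")
    left = black_height(t.left, lo, t.key)
    right = black_height(t.right, t.key, hi)
    if left != right:
        raise ValueError("unequal number of black nodes")
    return left + (0 if t.red else 1)


def is_red_black(root):
    if root is not None and root.red:
        return False                     # the root must be black
    try:
        black_height(root)
        return True
    except ValueError:
        return False


tree = Node(30, False, Node(20, True), Node(42, True))
print(is_red_black(tree))
```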


B-Trees

Red-Black Trees

examples

42

30

20 40

37 41

44

50

40

30 42

20 37 41 44

35 50


B-Trees

Red-Black Trees

fun fact

every AVL-tree can be red-black coloured.

[Figure: an AVL-tree with a number at each node, illustrating a red-black colouring.]


B-Trees

Red-Black Trees

insertion in red-black tree

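Insert as a red leaf. If a red node x then has a red parent p (with grandparent g and uncle u):

If the uncle is red: flag flip (p and u become black, g becomes red). Continue at the grandparent.

If the uncle is black: rotate (see AVL-trees), repaint and stop.

[Figure: flag flip on g with children p, u and grandchild x: the shape is unchanged, only the colours change. Rotation: p becomes the root of the subtree with children x and g, and u becomes a child of g.]

If the root has been coloured red, make it black.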


B-Trees

Red-Black Trees

just classical single/double rotation

42

30

20

30

20 42

42

30

40

40

30 42


B-Trees

Red-Black Trees

example: adding key

42

30

20 40

37 41

44

50

35 (new)

42

30

20 40

37 41

44

50

35

40

30 42

20 37 41 44

35 50


B-Trees

Red-Black Trees

GNU C++ stl tree.h

"Red-black tree class, designed for use in implementing STL associative containers (set, multiset, map, and multimap). The insertion and deletion algorithms are based on those in Cormen, Leiserson, and Rivest, Introduction to Algorithms (MIT Press, 1990), except that . . . "

Linux

"There are a number of red-black trees in use in the kernel. The anticipatory, deadline, and CFQ I/O schedulers all employ rbtrees to track requests; the packet CD/DVD driver does the same. The high-resolution timer code uses an rbtree to organize outstanding timer requests. The ext3 filesystem tracks directory entries in a red-black tree. Virtual memory areas (VMAs) are tracked with red-black trees, as are epoll file descriptors, cryptographic keys, and network packets in the "hierarchical token bucket" scheduler." lwn.net/Articles/184495/


Graphs

Contents

7 Graphs
Definition
Representation
Graph traversal
Disjoint Sets, ADT Union-Find
Minimal Spanning Trees
Shortest Paths


Graphs

Definition

graph definition

see the Algoritmiek course!

Definition

A graph is a pair G = (V, E) where:

V is a set of vertices, or nodes

E ⊆ V × V is a set of edges, or arcs, lines

directed / undirected

vertices / edges can have labels (string, number)

complexity is expressed in |V| and |E|; note that $|E| \le |V|^2$


Graphs

Representation

adjacency matrix

[Figure: directed graph on the vertices 1 to 7.]

    1 2 3 4 5 6 7
1   · 1 · · · 1 ·
2   · · 1 · · · ·
3   · · · · 1 1 1
4   · · · · · · ·
5   · · · 1 · 1 ·
6   1 · · · · · 1
7   1 · · · · · ·


Graphs

Representation

adjacency lists

[Figure: directed graph on the vertices 1 to 7.]

1: 2, 6
2: 3
3: 5, 6, 7
4: (empty)
5: 4, 6
6: 1, 7
7: 1
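Both representations describe the same graph, and one can be derived from the other. A small Python sketch, assuming 1-based vertex numbers and a 0/1 matrix stored as a list of rows (the name `to_adjacency_lists` is chosen here):

```python
matrix = [
    [0, 1, 0, 0, 0, 1, 0],
    [0, 0, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 1, 1],
    [0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 1, 0, 1, 0],
    [1, 0, 0, 0, 0, 0, 1],
    [1, 0, 0, 0, 0, 0, 0],
]


def to_adjacency_lists(matrix):
    """Map each vertex v (1-based) to the list of vertices w with an edge (v, w)."""
    n = len(matrix)
    return {v + 1: [w + 1 for w in range(n) if matrix[v][w]] for v in range(n)}


adj = to_adjacency_lists(matrix)
print(adj)
```

The matrix needs |V|^2 entries regardless of the number of edges, while the lists need space proportional to |V| + |E|.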


Graphs

Graph traversal

depth first search

Recursive DFS

void DFS(v)
{ visit(v)
  mark(v)
  for each w adjacent to v
  do if w is not marked
     then DFS(w)
     fi
  od
}

Iterative DFS

// start with unmarked nodes
S.push(init)                    // S.push((init,init))
while S is not empty
do v = S.pop()                  // (p,v) = S.pop
   if v is not marked
   then mark v
        // add (p,v) to DFS tree (if p!=v)
        for all edges (v, w)
        do if w is unmarked
           then S.push(w)       // S.push((v,w))
           fi
        od
   fi
od
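The two variants above can be compared in a runnable form. This Python sketch assumes the graph is given as a dictionary of adjacency lists; the function names are chosen here. The iterative version pushes neighbours in reverse order, an assumption made so that both variants visit the vertices in the same order.

```python
def dfs_recursive(adj, start):
    """Return the vertices in the order in which they are visited."""
    order, marked = [], set()

    def visit(v):
        marked.add(v)
        order.append(v)
        for w in adj[v]:
            if w not in marked:
                visit(w)

    visit(start)
    return order


def dfs_iterative(adj, start):
    """Same traversal with an explicit stack; a vertex may be pushed more than once."""
    order, marked, stack = [], set(), [start]
    while stack:
        v = stack.pop()
        if v not in marked:
            marked.add(v)
            order.append(v)
            for w in reversed(adj[v]):
                if w not in marked:
                    stack.append(w)
    return order


adj = {1: [2, 6], 2: [3], 3: [5, 6, 7], 4: [], 5: [4, 6], 6: [1, 7], 7: [1]}
print(dfs_recursive(adj, 1), dfs_iterative(adj, 1))
```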


Graphs

Graph traversal

dfs tree (directed)

[Figure: a directed graph on the vertices a to g and two different DFS trees for it, with the vertices numbered in order of visit and the non-tree edges labelled forward, back or cross.]

DFS tree with edge classification (tree, back, cross, forward); the DFS tree is not unique


Graphs

Graph traversal

dfs edges

[Figure: two DFS trees on the vertices 1 to 7; in the first the non-tree edges are labelled forward, back and cross, in the second only back.]


Graphs

Graph traversal

applications of DFS

topological sorting

articulation points

A DFS traversal itself and the forest-like representation of the graph it provides have proved to be extremely helpful for the development of efficient algorithms for checking many important properties of graphs. Note that the DFS yields two orderings of vertices: the order in which the vertices are reached for the first time (pushed onto the stack) and the order in which the vertices become dead ends (popped off the stack). These orders are qualitatively different, and various applications can take advantage of either of them. [Levitin, Design & Analysis of Algorithms]


Graphs

Graph traversal

application: topological sort

[Figure: a directed acyclic graph on the vertices 1 to 8, drawn twice.]


Graphs

Graph traversal

Let G = (V,E) be a directed graph.

Definition

A topological ordering [or sort] of G is an ordering $(v_1, \dots, v_n)$ of $V$, such that if $(v_i, v_j) \in E$ then $i < j$.

finding a topological sort:

1 pick a node without incoming edges and output it

2 remove that node together with its outgoing edges, and go to step 1

(or use depth-first search)
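The two-step procedure above can be sketched with in-degree counters, so that edges do not have to be removed physically. This is an illustrative Python version; the function name `topological_sort` and the dictionary representation of the graph are assumptions made here.

```python
from collections import deque


def topological_sort(adj):
    """Return a topological ordering of the directed graph adj, or raise on a cycle."""
    indegree = {v: 0 for v in adj}
    for v in adj:
        for w in adj[v]:
            indegree[w] += 1
    ready = deque(v for v in adj if indegree[v] == 0)   # nodes without incoming edges
    order = []
    while ready:
        v = ready.popleft()
        order.append(v)
        for w in adj[v]:               # "remove" the outgoing edges of v
            indegree[w] -= 1
            if indegree[w] == 0:
                ready.append(w)
    if len(order) != len(adj):
        raise ValueError("graph has a cycle, no topological ordering exists")
    return order


print(topological_sort({1: [2, 4], 2: [3], 3: [], 4: [3]}))
```

If no node without incoming edges is left while nodes remain, the graph contains a cycle; the sketch reports this with an exception.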


Graphs

Graph traversal

application: topological sort

[Figure: the graph on the vertices 1 to 8 with the DFS pre and post number at each vertex.]

vertex   1  2  3  4  5  6  7  8
pre      1  7  8  4  2  5  3  6
post     6  7  8  5  2  3  1  4


Graphs

Graph traversal

application: articulation points

[Figure: an undirected example graph on the vertices 1 to 13.]


Graphs

Graph traversal

articulation points with dfs tree

[Figure: the example graph on the vertices 1 to 13 and a DFS tree for it.]

vertex v is an articulation point if

v is the root of the DFS tree and has two or more children, or

v is not the root and has a child whose subtree contains no node with a back edge reaching above v
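The two conditions above can be tested during a single DFS by recording, for every vertex, the smallest pre number reachable from its subtree by one back edge. The sketch below is one common way to do this in Python; the names `pre`, `low` and `articulation_points` are chosen here, and the graph is assumed to be undirected and given as adjacency lists.

```python
def articulation_points(adj):
    """Return the set of articulation points of an undirected graph."""
    pre, low, result = {}, {}, set()

    def dfs(v, parent):
        pre[v] = low[v] = len(pre) + 1
        children = 0
        for w in adj[v]:
            if w not in pre:                       # tree edge
                children += 1
                dfs(w, v)
                low[v] = min(low[v], low[w])
                # no back edge from the subtree of w reaches above v
                if parent is not None and low[w] >= pre[v]:
                    result.add(v)
            elif w != parent:                      # back edge
                low[v] = min(low[v], pre[w])
        if parent is None and children >= 2:       # root with two or more children
            result.add(v)

    for v in adj:
        if v not in pre:
            dfs(v, None)
    return result


graph = {1: [2, 3], 2: [1, 3], 3: [1, 2, 4], 4: [3]}
print(articulation_points(graph))
```

The recursion depth equals the depth of the DFS tree, so for very large graphs an explicit stack would be needed.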


Graphs

Graph traversal

breadth-first search (BFS)

Iterative BFS

// Q is a queue of vertices
// start with unmarked nodes
mark init
dist[init] = 0
Q.enqueue(init)
while Q is not empty
do v = Q.dequeue()
   newdist = dist[v] + 1
   for all edges (v, w)
   do if w is not marked
      then mark w
           dist[w] = newdist
           Q.enqueue(w)
      fi
   od
od
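A runnable counterpart of the BFS pseudocode, given as a sketch: it assumes adjacency lists in a dictionary and uses membership in `dist` as the mark, so a vertex is marked at the moment it is enqueued. The name `bfs_distances` is chosen here.

```python
from collections import deque


def bfs_distances(adj, start):
    """Return the number of edges on a shortest path from start to each reachable vertex."""
    dist = {start: 0}                 # a vertex is marked once it has a distance
    queue = deque([start])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist


adj = {1: [2, 6], 2: [3], 3: [5, 6, 7], 4: [], 5: [4, 6], 6: [1, 7], 7: [1]}
print(bfs_distances(adj, 1))
```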


Graphs

Graph traversal

bfs: ’floodfill’

[Figure: BFS as 'floodfill' on a grid; each cell is labelled with its distance from the start cell, from 0 up to 10.]


Graphs

Minimal Spanning Trees

minimal spanning tree

[Figure: a weighted graph on the vertices A to H with edge weights between 2 and 9, and a minimal spanning tree of it with edge weights 2, 2, 2, 2, 4, 4, 5.]

Definition (Minimal spanning tree of weighted graph)

A tree containing all nodes of the graph, with minimal total sum of edge weights


Graphs

Minimal Spanning Trees

minimal spanning tree Kruskal vs. Prim

[Figure: construction of a minimal spanning tree of the graph on A to H in stages, three stages for Kruskal and three stages for Prim, with the edge weights and the current costs of the nodes.]


Graphs

Minimal Spanning Trees

minimal spanning tree - Kruskal

High-level algorithm:

repeat
  consider edge with smallest weight
  if it does not yield a cycle
    add it to the tree
  otherwise discard the edge
until no edges left


Graphs

Minimal Spanning Trees

spanning tree

[Figure: a partial spanning forest on the vertices 1 to 10 with two candidate edges marked '?'.]

finding edges that do not cause a cycle: use the union-find ADT


Graphs

Minimal Spanning Trees

partition of the domain D = {1, 2, . . . , n}; each set has a name, a representative

Initialize: construct the initial partition; each component consists of a singleton set {d}, with d ∈ D.

Find: retrieves the name of the component, i.e., Find(u) = Find(v) iff u and v belong to the same set in the partition.

Union: given two elements u and v, the sets they belong to are merged. Has no effect when u and v already belong to the same set.
Usually it is assumed that u, v are representatives, i.e., names of components, not arbitrary elements.


Graphs

Minimal Spanning Trees

Union-Find implementation with path-compression

element   1  2  3  4  5  6  7  8  9  10
parent    1  2  1  4  5  9  6  5  9  9
size      2  1  .  1  2  .  .  .  4  .

[Figure: the corresponding forest; the roots are 1, 2, 4, 5 and 9.]


Graphs

Minimal Spanning Trees

Union-Find implementation with path-compression

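[Figure: union of two trees with roots v and w, where one root becomes a child of the other, and path compression: after a find, the nodes on the search path, with their subtrees T1 to T4, hang directly below the root v.]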
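The parent/size representation and the two tree operations can be sketched as follows. This Python version is an illustration under stated assumptions: elements are numbered 1 to n, the smaller tree is linked below the larger one, and `union` accepts arbitrary elements by calling `find` first. The class name `UnionFind` is chosen here.

```python
class UnionFind:
    def __init__(self, n):
        self.parent = list(range(n + 1))   # elements 1..n; index 0 is unused
        self.size = [1] * (n + 1)          # only meaningful for roots

    def find(self, u):
        """Return the representative of the set containing u, compressing the path."""
        root = u
        while self.parent[root] != root:
            root = self.parent[root]
        while self.parent[u] != root:      # path compression
            nxt = self.parent[u]
            self.parent[u] = root
            u = nxt
        return root

    def union(self, u, v):
        """Merge the sets of u and v; return False if they were already the same set."""
        ru, rv = self.find(u), self.find(v)
        if ru == rv:
            return False
        if self.size[ru] < self.size[rv]:
            ru, rv = rv, ru
        self.parent[rv] = ru               # smaller tree goes below the larger root
        self.size[ru] += self.size[rv]
        return True


uf = UnionFind(10)
for u, v in [(1, 3), (5, 8), (9, 10), (6, 7), (9, 6)]:
    uf.union(u, v)
print(uf.find(7) == uf.find(10), uf.find(1) == uf.find(5))
```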


Graphs

Minimal Spanning Trees

minimal spanning tree - Kruskal

detailed algorithm with priority queue and union-find ADTs:

Kruskal

KRUSKAL(G):
  A = emptyTree
  PQ = empty
  foreach vertex v:
    MAKE-SET(v)
  foreach edge (u,v):
    PQ.insert( weight(u,v), (u,v) )
  repeat until PQ is empty:
    (u,v) = PQ.DELETE-MIN()
    if (FIND-SET(u) != FIND-SET(v)) then
      A.add((u, v))
      UNION(FIND-SET(u), FIND-SET(v))
    fi
  return A
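A compact runnable version of the algorithm above, given as a sketch. It assumes vertices numbered 1 to n and edges given as (weight, u, v) triples, uses Python's `heapq` as the priority queue, and embeds a minimal union-find with path halving. Unlike the pseudocode it stops as soon as n - 1 edges have been chosen; all names are chosen here.

```python
import heapq


def kruskal(n, edges):
    """Return the edges (u, v, weight) of a minimal spanning forest."""
    parent = list(range(n + 1))            # MAKE-SET for the vertices 1..n

    def find(u):
        while parent[u] != u:
            parent[u] = parent[parent[u]]  # path halving
            u = parent[u]
        return u

    pq = list(edges)
    heapq.heapify(pq)
    tree = []
    while pq and len(tree) < n - 1:
        weight, u, v = heapq.heappop(pq)   # DELETE-MIN
        ru, rv = find(u), find(v)
        if ru != rv:                       # the edge does not close a cycle
            tree.append((u, v, weight))
            parent[ru] = rv                # UNION on the representatives
    return tree


edges = [(1, 1, 2), (2, 2, 3), (3, 1, 3), (4, 3, 4), (5, 2, 4)]
print(kruskal(4, edges))
```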


Graphs

Minimal Spanning Trees

Algorithms from the Book

116  union-find (Galler and Fischer)
109  Knuth-Morris-Pratt (pattern matching)
 94  Blum, Floyd, Pratt, Rivest, Tarjan (median)
 89  binary search
 84  Floyd-Warshall (all-pairs shortest path)
 79  Euclidean algorithm (greatest common divisor, GCD)
 73  quicksort (Tony Hoare)
 59  Huffman coding (data compression)
 51  Miller-Rabin (primality test)
 50  Schwartz-Zippel lemma (polynomial identity)
 46  depth first search
 42  sieve of Eratosthenes (primes)
 42  Dijkstra (shortest path)

3.11'19


Graphs

Minimal Spanning Trees

Prim

cost[source] = 0          // infinite for other nodes
prev[source] = 0          // code for the root
Q = V                     // all vertices
while Q is not empty
do u is node in Q with minimal cost[u]
   remove u from Q
   for each edge (u,v) with v outside tree
   do if length(u,v) < cost[v]
      then cost[v] = length(u,v)
           prev[v] = u
      fi
   od
od

high-level algorithm:

initialize tree with randomly chosen node

repeat until all vertices are connected:
  link unconnected node attached to edge with minimum weight
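Prim's algorithm can be sketched in Python with a binary heap. Because `heapq` has no decrease-key operation, this version is a variant: it pushes a new entry for every candidate edge and skips entries for vertices that are already in the tree. The function name `prim` and the graph format (a dictionary mapping each vertex to a list of (neighbour, length) pairs, with every undirected edge listed in both directions) are assumptions made here.

```python
import heapq


def prim(adj, source):
    """Return (total weight, prev) of a minimal spanning tree of a connected graph."""
    prev = {}
    in_tree = set()
    total = 0
    pq = [(0, source, source)]             # (cost, vertex, tree vertex it attaches to)
    while pq:
        cost, u, p = heapq.heappop(pq)
        if u in in_tree:
            continue                       # stale entry: u was linked earlier
        in_tree.add(u)
        prev[u] = None if u == source else p
        total += cost
        for v, length in adj[u]:
            if v not in in_tree:
                heapq.heappush(pq, (length, v, u))
    return total, prev


adj = {
    1: [(2, 1), (3, 3)],
    2: [(1, 1), (3, 2), (4, 5)],
    3: [(1, 3), (2, 2), (4, 4)],
    4: [(3, 4), (2, 5)],
}
print(prim(adj, 1))
```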


Graphs

Minimal Spanning Trees

directed graphs not supported

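[Figure: two small directed graphs with weighted edges; on the first (start node u, weights 6, 4, 2) Prim fails, on the second (weights 4, 6, 2, 1) Kruskal fails.]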


Graphs

Shortest Paths

Dijkstra

dist[source] = 0          // infinite for other nodes
prev[source] = 0          // code for the root
PQ = V                    // all nodes
while PQ is not empty
do u is node in PQ with minimal dist[u]
   remove u from PQ
   for each edge (u,v)
   do newdist = dist[u] + length(u,v)
      if newdist < dist[v]
      then dist[v] = newdist
           prev[v] = u
      fi
   od
od

finds shortest paths from a fixed source node to all other nodes
also: shortest path from source to a target node
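A runnable sketch of Dijkstra's algorithm in Python. It assumes non-negative edge lengths and a graph given as a dictionary of (neighbour, length) lists. Instead of starting with all nodes in the queue and lowering their keys, this variant inserts a node whenever its distance improves and ignores outdated entries; the function name `dijkstra` is chosen here.

```python
import heapq


def dijkstra(adj, source):
    """Return (dist, prev) for shortest paths from source; lengths must be non-negative."""
    dist = {v: float("inf") for v in adj}
    prev = {v: None for v in adj}
    dist[source] = 0
    pq = [(0, source)]
    while pq:
        d, u = heapq.heappop(pq)
        if d > dist[u]:
            continue                       # outdated entry, u was finished earlier
        for v, length in adj[u]:
            newdist = d + length
            if newdist < dist[v]:
                dist[v] = newdist
                prev[v] = u
                heapq.heappush(pq, (newdist, v))
    return dist, prev


adj = {"A": [("B", 2), ("C", 5)], "B": [("C", 1), ("D", 4)], "C": [("D", 1)], "D": []}
print(dijkstra(adj, "A"))
```

The shortest path to a single target can be read off by following `prev` backwards from the target to the source.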


Graphs

Shortest Paths

distance vs. bottleneck

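[Figure: the weighted graph on the vertices A to H, once with the shortest-distance value at each node and once with the bottleneck value at each node.]

node         A   B   C   D   E   F   G   H
distance     2   4   0   6   11  8   10  12
bottleneck   4   6   ∞   7   6   9   5   5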


Graphs

Shortest Paths

all pairs distance

$L_k(i,j) = \min\bigl(L_{k-1}(i,j),\; L_{k-1}(i,k) + L_{k-1}(k,j)\bigr)$

Floyd-Warshall

// initially dist equals the adjacency matrix
for each edge (i,j)
do prev[i,j] = i
od
for k from 1 to n
do for i from 1 to n
   do for j from 1 to n
      do if dist[i,k] + dist[k,j] < dist[i,j]
         then dist[i,j] = dist[i,k] + dist[k,j]
              prev[i,j] = prev[k,j]
         fi
      od
   od
od


Graphs

Shortest Paths

example Floyd

[Figure: example graph on the vertices 1, 2, 3 with edge weights 9, 3, 6, 2.]


Graphs

Shortest Paths

Floyd

partial result A3, and distances via node 4

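$$A_3 = \begin{pmatrix} 0 & 2 & 1 & 6 \\ 3 & 0 & 1 & 4 \\ 4 & 1 & 0 & 5 \\ -2 & 0 & -1 & \cdot \end{pmatrix}$$

distances via node 4:

$$\begin{pmatrix} \cdot & 6+0 & 6-1 & 6 \\ 4-2 & \cdot & 4-1 & 4 \\ 5-2 & 5+0 & \cdot & 5 \\ -2 & 0 & -1 & \cdot \end{pmatrix}$$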


Graphs

Shortest Paths

path reconstruction

Path-reconstruction

Path(u, v)
  if prev[u][v] = null then
    return []
  path = [v]
  while u != v do
    v = prev[u][v]
    path.insert_at_begin(v)
  od
  return path
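The all-pairs algorithm and the path reconstruction fit together as in the following Python sketch. It assumes vertices numbered 0 to n - 1, edges given as a dictionary from (i, j) to a weight, and no negative cycles; the names `floyd_warshall` and `path` are chosen here.

```python
INF = float("inf")


def floyd_warshall(n, edges):
    """Return (dist, prev) matrices for the directed graph with the given edge weights."""
    dist = [[0 if i == j else INF for j in range(n)] for i in range(n)]
    prev = [[None] * n for _ in range(n)]
    for (i, j), weight in edges.items():
        dist[i][j] = weight
        prev[i][j] = i
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if dist[i][k] + dist[k][j] < dist[i][j]:
                    dist[i][j] = dist[i][k] + dist[k][j]
                    prev[i][j] = prev[k][j]
    return dist, prev


def path(prev, u, v):
    """Follow prev backwards from v to u; empty list if no path was recorded."""
    if prev[u][v] is None:
        return []
    result = [v]
    while u != v:
        v = prev[u][v]
        result.insert(0, v)
    return result


dist, prev = floyd_warshall(3, {(0, 1): 4, (1, 2): 1, (0, 2): 7, (2, 0): 2})
print(dist[0][2], path(prev, 0, 2))
```

In this small example the direct edge from 0 to 2 has weight 7, while the route through vertex 1 costs 4 + 1.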


Graphs

Shortest Paths

Warshall

// initially conn equals the adjacency matrix
// with additionally 1=true on the diagonal
for k from 1 to n
do for i from 1 to n
   do for j from 1 to n
      do conn[i,j] = conn[i,j] or ( conn[i,k] and conn[k,j] )
      od
   od
od


Hash Tables

Contents

8 Hash Tables
Perfect Hash Function
Open Addressing
Chaining
Choosing a hash function



Hash Tables

ADT map, dictionary

associative array, [hash-]map, symbol table, or dictionary (wiki)

is composed of a collection of (key, value) pairs; each possible key appears at most once

                  find          insert        delete        order
                  av     wc     av     wc     av     wc
unordered list    n             1             n             no
bin tree          log n  n      log n  n      log n  n      yes
balanced          log n         log n         log n         yes
hash table        1      n      1      n      1      n      no

av=average, wc=worst case


Hash Tables

Hashing

Store keys of arbitrary size (usually large domains) in table offixed size (usually small)

Hash table: ADT that performs finds, insertions and deletionsin (on avg) constant time

Used to implement unordered sets, maps (C++ STL, Java),store passes, checksums (MD5, CRC32)

Hash function calculates position in table: h(k)mod TableSize

Collision: attempt to store key k when h(k) is occupied

Collision resolution

perfect hashing: Keys are known a-priori; can avoid collisions

open addressing: Collision resolved by storing key elsewhere

chained hashing: Store multiple keys at the same address(i.e. table entries are linked lists of items with same hash)

Hash Tables

Perfect Hash Function

Cichelli

h(w) = |w| + v(first(w)) + v(last(w)), with v defined by:

letter   a   b   c  f   g  h   i   l   m   n   p   r   s  t  u   v   w  y
v        11  15  1  15  3  15  13  15  15  13  15  14  6  6  14  10  6  13

Value for other letters: 0

The 36 keywords fill table positions 2 to 37:

 2 do          11 while    20 record      29 array
 3 end         12 const    21 packed      30 if
 4 else        13 div      22 not         31 nil
 5 case        14 and      23 then        32 for
 6 downto      15 set      24 procedure   33 begin
 7 goto        16 or       25 with        34 until
 8 to          17 of       26 repeat      35 label
 9 otherwise   18 mod      27 var         36 function
10 type        19 file     28 in          37 program

h(goto) = |goto| + v(g) + v(o) = 4 + 3 + 0 = 7

h(const) = |const| + v(c) + v(t) = 5 + 1 + 6 = 12
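That this function is perfect for the keyword set can be checked mechanically. The sketch below is a minimal Python check; the letter values and the keyword list are read from the slide, and the names `V`, `KEYWORDS` and `cichelli` are illustrative.

```python
# Letter values v from the slide; every other letter has value 0.
V = dict(zip("abcfghilmnprstuvwy",
             [11, 15, 1, 15, 3, 15, 13, 15, 15, 13, 15, 14, 6, 6, 14, 10, 6, 13]))

KEYWORDS = ("do end else case downto goto to otherwise type while const div "
            "and set or of mod file record packed not then procedure with "
            "repeat var in array if nil for begin until label function program").split()

def cichelli(word):
    """h(w) = |w| + v(first(w)) + v(last(w))."""
    return len(word) + V.get(word[0], 0) + V.get(word[-1], 0)

# Map each hash value to its keyword; a collision would overwrite an entry.
table = {cichelli(w): w for w in KEYWORDS}
```

If no two keywords collide, `table` has one entry per keyword and its keys are exactly the positions 2 to 37.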

Hash Tables

Open Addressing

If an insert causes a collision, attempt to store the key elsewhere in the table.

Extend the hash function with an extra parameter i, the number of the attempt to store the key.

General structure of the hash function:
h(k, i) = (g(k) − f(i)) mod TableSize

Linear probing: f(i) is linear in i, i.e. f(i) = i
Quadratic probing: f(i) is quadratic in i, i.e. f(i) = i^2
Double hashing: h(k, i) = (g(k) − i ∗ f(k)) mod TableSize, where f is a second hash function of the key

Insert, find and remove operations

Insert/Find: probe at h(k, 0), h(k, 1), h(k, 2), ...
Delete: keep a tag for each cell: active, deleted, empty

Hash Tables

Open Addressing

linear

keys 60, 29, 74, 38, 19, 23 and 40, with addresses g(K) = 5, 7, 8, 5, 8, 1 and 7
address function g(K) = K mod 11
probe function f(i) = i
hash function h(K, i) = ((K mod 11) − i) mod 11

position   0   1   2   3   4   5   6   7   8   9   10
key            23      40  38  60  19  29  74

60, 29, 74 and 23 land on their own address. 38 collides at 5 and moves to 4; 19 collides at 8 and 7 and moves to 6; 40 collides at 7, 6, 5 and 4 and moves to 3.
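The linear probing example, together with the active/deleted/empty tags, can be sketched as follows. This is a minimal Python sketch under stated assumptions: the keys are read as 60, 29, 74, 38, 19, 23, 40 (the trailing digit in the extracted text being the address K mod 11), the class name `LinearProbingTable` is illustrative, and `insert` does not check for duplicates.

```python
EMPTY, DELETED = object(), object()   # cell tags; any other value is an active key

class LinearProbingTable:
    def __init__(self, size=11):
        self.size = size
        self.cells = [EMPTY] * size

    def _probe(self, key):
        # h(K, i) = ((K mod size) - i) mod size for i = 0, 1, 2, ...
        for i in range(self.size):
            yield (key % self.size - i) % self.size

    def insert(self, key):
        for pos in self._probe(key):
            if self.cells[pos] is EMPTY or self.cells[pos] is DELETED:
                self.cells[pos] = key
                return pos
        raise OverflowError("table is full")

    def find(self, key):
        for pos in self._probe(key):
            if self.cells[pos] is EMPTY:      # a never-used cell ends the search
                return None
            if self.cells[pos] == key:        # deleted cells are skipped
                return pos
        return None

    def delete(self, key):
        pos = self.find(key)
        if pos is not None:
            self.cells[pos] = DELETED         # tag the cell, do not empty it

table = LinearProbingTable(11)
positions = {k: table.insert(k) for k in (60, 29, 74, 38, 19, 23, 40)}
```

The `DELETED` tag matters: if deleting 38 emptied cell 4, a later search for 40 would stop there and never reach cell 3.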

Hash Tables

Open Addressing

linear, step size 3: f(i) = 3i

h(K, i) = ((K mod 10) − 3i) mod 10

position   0   1   2   3   4   5   6   7   8   9
key        65      32  43      55  72          19

Neighbors (relative to step size 3):

position   0   3   6   9   2   5   8   1   4   7
key        65  43  72  19  32  55

Primary clustering: keys with "nearby" hashes end up in the same cluster.

Careful! Pick a step size coprime to the table size; otherwise inserts can fail even if the table is not full, because not all positions are probed.

Hash Tables

Open Addressing

quadratic

keys 60, 29, 74, 38, 19, 23 and 40, with addresses h(K) = 5, 7, 8, 5, 8, 1 and 7
address function h(K) = K mod 11
probe function: f(i) = ±i^2
hash function: h(K, i) = (h(K) ± i^2) mod 11
probes at h(K) ± 1, h(K) ± 4, h(K) ± 9, ...

position   0   1   2   3   4   5   6   7   8   9   10
key            23          38  60  40  29  74  19

38 collides at 5 and moves to 4; 19 collides at 8 and moves to 9; 40 collides at 7 and moves to 6.

Secondary clustering: only keys with the same hash cluster.
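A minimal Python sketch of the quadratic example follows. The slide does not say in which order the two signs of ±i² are tried; the sketch assumes minus before plus, which is one order that reproduces the table in the figure. The keys are read as 60, 29, 74, 38, 19, 23, 40, and the function name `quadratic_insert` is illustrative.

```python
def quadratic_insert(table, key):
    """Insert key, probing h, h-1, h+1, h-4, h+4, ... modulo the table size."""
    size = len(table)
    home = key % size
    offsets = [0]
    for i in range(1, size):
        offsets += [-i * i, i * i]     # assumed order: minus before plus
    for off in offsets:
        pos = (home + off) % size
        if table[pos] is None:
            table[pos] = key
            return pos
    raise OverflowError("no free cell on the probe sequence")

table = [None] * 11
for k in (60, 29, 74, 38, 19, 23, 40):
    quadratic_insert(table, k)
```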

Hash Tables

Open Addressing

double

keys 60, 29, 74, 38, 19, 23 and 40, with addresses g(K) = 5, 7, 8, 5, 8, 1 and 7
table size: 11
address function g(K) = K mod 11
probe function p(K) = (K mod 4) + 1
hash function h(K, i) = (g(K) − i ∗ p(K)) mod 11

position   0   1   2   3   4   5   6   7   8   9   10
key            23  38      19  60  40  29  74

38 collides at 5 and steps by p(38) = 3 to 2; 19 collides at 8 and steps by p(19) = 4 to 4; 40 collides at 7 and steps by p(40) = 1 to 6.

Minimize clustering: different probe sequences even for keys with the same hash.
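The double hashing example can be replayed with a few lines of Python. This is a sketch under the assumption that the keys are 60, 29, 74, 38, 19, 23, 40; the function name `double_hash_insert` is illustrative.

```python
def double_hash_insert(table, key):
    """Insert key using h(K, i) = (g(K) - i * p(K)) mod size."""
    size = len(table)
    g = key % size          # address function g(K) = K mod 11
    p = key % 4 + 1         # probe function p(K) = (K mod 4) + 1, never 0
    for i in range(size):
        pos = (g - i * p) % size
        if table[pos] is None:
            table[pos] = key
            return pos
    raise OverflowError("no free cell on the probe sequence")

table = [None] * 11
for k in (60, 29, 74, 38, 19, 23, 40):
    double_hash_insert(table, k)
```

Because the table size 11 is prime, every step size 1 to 4 is coprime to it, so each probe sequence visits all cells.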

Hash Tables

Open Addressing

Expected number of probes as a function of the load factor α; the ranges are for α = 0.5 to 0.8.

             find / successful                                       add / unsuccessful
linear       $\frac12\left(1+\frac{1}{1-\alpha}\right)$     1.5–3      $\frac12\left(1+\frac{1}{(1-\alpha)^2}\right)$       2.5–13
quadratic    $1+\ln\frac{1}{1-\alpha}-\frac{\alpha}{2}$     1.4–2.2    $\frac{1}{1-\alpha}-\alpha+\ln\frac{1}{1-\alpha}$   2.2–5.8
double       $\frac{1}{\alpha}\ln\frac{1}{1-\alpha}$        1.4–2.0    $\frac{1}{1-\alpha}$                                 2–5

[Figure: expected probes (1 to 6) against load factor α (0.1 to 1.0) for linear hash, quadratic and double hash]
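The ranges in the table follow from evaluating the formulas at α = 0.5 and α = 0.8. A small Python sketch (the function name `expected_probes` is illustrative) makes that easy to reproduce:

```python
from math import log

def expected_probes(alpha):
    """Return {method: (successful, unsuccessful)} for load factor 0 < alpha < 1."""
    ln = log(1 / (1 - alpha))
    return {
        "linear":    (0.5 * (1 + 1 / (1 - alpha)),
                      0.5 * (1 + 1 / (1 - alpha) ** 2)),
        "quadratic": (1 + ln - alpha / 2,
                      1 / (1 - alpha) - alpha + ln),
        "double":    (ln / alpha,
                      1 / (1 - alpha)),
    }

half = expected_probes(0.5)
high = expected_probes(0.8)
```

For example, an unsuccessful search with linear probing costs about 2.5 probes at α = 0.5 but about 13 at α = 0.8.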

Hash Tables

Chaining

chaining

keys 60, 29, 74, 38, 19, 23 and 40, with addresses h(K) = 5, 7, 8, 5, 8, 1 and 7
table size: 11
address function h(K) = K mod 11

 0   Λ
 1   → 23 → Λ
 2   Λ
 3   Λ
 4   Λ
 5   → 38 → 60 → Λ
 6   Λ
 7   → 29 → 40 → Λ
 8   → 19 → 74 → Λ
 9   Λ
10   Λ
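A chained hash table is short to write down. The sketch below is a minimal Python version with Python lists standing in for the linked lists. In the figure the chains appear in ascending key order, so the sketch keeps each chain sorted; that ordering is an assumption, and inserting at the front or back of the chain works equally well. The class name `ChainedTable` is illustrative, and the keys are read as 60, 29, 74, 38, 19, 23, 40.

```python
from bisect import insort

class ChainedTable:
    def __init__(self, size=11):
        self.buckets = [[] for _ in range(size)]

    def _bucket(self, key):
        return self.buckets[key % len(self.buckets)]   # h(K) = K mod size

    def insert(self, key):
        bucket = self._bucket(key)
        if key not in bucket:        # each key appears at most once
            insort(bucket, key)      # assumption: chains are kept sorted

    def find(self, key):
        return key in self._bucket(key)

    def delete(self, key):
        bucket = self._bucket(key)
        if key in bucket:
            bucket.remove(key)

table = ChainedTable(11)
for k in (60, 29, 74, 38, 19, 23, 40):
    table.insert(k)
```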

Hash Tables

Choosing a hash function

A good hash function h(K) should

• be fast to compute,
• distribute the keys evenly and deterministically over the table, and
• depend on all "distinctive bits" of the key K.

Techniques:

• extraction: compute address based on selected bits of key

• division: address = key mod TSize, choose TSize carefully

• folding: chop key into parts, combine (add/xor) parts

• mid-squaring: square key and take middle bits
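The four techniques can be illustrated on integer keys. The following Python sketch is only an illustration: the table size 1009, the 10-bit address width and the choice of which bits to extract are assumptions made for the example, not values from the slides.

```python
TSIZE = 1009   # hypothetical table size (a prime) for the division method

def extraction(key):
    """Use selected bits of the key: here simply the lowest 10 bits."""
    return key & 0x3FF

def division(key, tsize=TSIZE):
    """address = key mod TSize."""
    return key % tsize

def folding(key, bits=10):
    """Chop the key into 10-bit parts and combine the parts with xor."""
    h = 0
    while key:
        h ^= key & ((1 << bits) - 1)
        key >>= bits
    return h

def mid_square(key, bits=10):
    """Square a 32-bit key and take 10 bits from the middle of the 64-bit square."""
    square = (key & 0xFFFFFFFF) ** 2
    return (square >> (32 - bits // 2)) & ((1 << bits) - 1)
```

Extraction ignores most of the key, so keys that differ only in the ignored bits always collide; folding and mid-squaring let every bit of the key influence the address.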

Hash Tables

Choosing a hash function

MurmurHash

Murmur3_32(key, len, seed)
    // integer arithmetic with unsigned 32 bit integers.
    c1 := 0xcc9e2d51
    c2 := 0x1b873593
    r1 := 15
    r2 := 13
    m := 5
    n := 0xe6546b64

    hash := seed

    for each fourByteChunk of key
        k := fourByteChunk
        k := k * c1
        k := (k << r1) OR (k >> (32-r1))
        k := k * c2

        hash := hash XOR k
        hash := (hash << r2) OR (hash >> (32-r2))
        hash := hash * m + n

    with any remainingBytesInKey
        // (also do Endian swapping on big-endian machines.)
        remainingBytes := remainingBytesInKey * c1
        remainingBytes := (remainingBytes << r1) OR (remainingBytes >> (32 - r1))
        remainingBytes := remainingBytes * c2
        hash := hash XOR remainingBytes

    hash := hash XOR len

    hash := hash XOR (hash >> 16)
    hash := hash * 0x85ebca6b
    hash := hash XOR (hash >> 13)
    hash := hash * 0xc2b2ae35
    hash := hash XOR (hash >> 16)
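The pseudocode translates almost line by line into Python. Since Python integers are unbounded, the sketch masks every product and rotation to 32 bits. It assumes the key is a `bytes` object whose chunks are read in little-endian order; the function name `murmur3_32` mirrors the pseudocode, and the helper `rotl` is illustrative.

```python
def murmur3_32(key: bytes, seed: int = 0) -> int:
    MASK = 0xFFFFFFFF                      # emulate unsigned 32-bit arithmetic
    c1, c2 = 0xCC9E2D51, 0x1B873593

    def rotl(x, r):
        return ((x << r) | (x >> (32 - r))) & MASK

    h = seed & MASK
    n_blocks = len(key) // 4
    for i in range(n_blocks):              # for each fourByteChunk of key
        k = int.from_bytes(key[4 * i:4 * i + 4], "little")
        k = (k * c1) & MASK
        k = rotl(k, 15)
        k = (k * c2) & MASK
        h ^= k
        h = rotl(h, 13)
        h = (h * 5 + 0xE6546B64) & MASK

    tail = key[4 * n_blocks:]              # with any remaining bytes (1 to 3)
    if tail:
        k = int.from_bytes(tail, "little")
        k = (k * c1) & MASK
        k = rotl(k, 15)
        k = (k * c2) & MASK
        h ^= k

    h ^= len(key)                          # finalization mix
    h ^= h >> 16
    h = (h * 0x85EBCA6B) & MASK
    h ^= h >> 13
    h = (h * 0xC2B2AE35) & MASK
    h ^= h >> 16
    return h
```

A table address is then obtained as `murmur3_32(key) % TableSize`.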

Data Compression

Contents

9 Data Compression
    Huffman Coding
    Lempel-Ziv-Welch
    Burrows-Wheeler

Data Compression

lossless (reversible)

GIF

[Figure: a GIF image as a grid of pixel values 0, 1 and 2, each value an index into the colour table 0, 1, 2]

Data Compression

lossless (reversible)

Huffman vs. LZW coding

[Figure, left: Huffman prefix code tree with leaves a, b, e, f, c, d and edges labelled 0/1. Right: LZW trie with nodes numbered 1 to 11 and edges labelled a, b, c.]

Huffman: e ↦ 011, f ↦ 10
LZW: cb ↦ 00111 (code 7), aaa ↦ 01011 (code 11)

Huffman                            LZW
frequencies given                  self learning
single letter to variable bits     variable string to fixed length
prefix code-tree                   trie
- letters as leaves                - letters along edges
- bits left/right                  - code in node
store code                         decoder learns too

Data Compression

va-kan-tie-oord (Dutch: "holiday resort")

Morse

[Figure: Morse code as a binary tree, one branch for dot and one for dash. Levels from the root downwards:]

E T
I A N M
S U R W D K G O
H V F L P J B X C Y Z Q

BYOXO              Are you trying to weasel out of our deal?
tks om bv cu 73    thanks, old man, bon voyage, see you, best regards

Data Compression

Huffman Coding

Frequencies: a:19, b:8, c:9, d:9, e:7

Shannon–Fano: split into {a, d} (19 + 9) and {c, b, e} (9 + 8 + 7); a, d and c get 2 bits, b and e get 3 bits.
2 · (19 + 9 + 9) + 3 · (8 + 7) = 119

Huffman: a gets 1 bit; d, c, b and e get 3 bits each.
1 · 19 + 3 · (9 + 9 + 8 + 7) = 118 :)

Data Compression

Huffman Coding

David Albert Huffman (1925–1999)

[Photo: 1978, UCSC (maa.org)]

[Photo: 1991, Matthew Mulbry (SciAm / huffmancoding.com)]

Data Compression

Huffman Coding

Huffman (1952)

variable length code (bitstring) for single letters
$a_1, \dots, a_n \in \Sigma \mapsto w_1, \dots, w_n \in \{0,1\}^*$

based on character frequencies (known in advance)
$f_1, \dots, f_n$

optimal expected code length (for a prefix code)
$\sum_{i=1}^{n} f_i \cdot |w_i|$

the code has to be known by the decoder

the 'old' Shannon-Fano algorithm does not always produce an optimal code

Data Compression

Huffman Coding

Huffman

// initialize:
for each input letter: create tree with that letter
                       and its frequency
repeat until one tree left:
    take two trees of minimal frequencies
    join these as children in a new tree,
    with combined (summed) frequency
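The repeat-until-one-tree loop maps naturally onto a priority queue. Below is a minimal Python sketch using `heapq`; instead of building explicit tree nodes it tracks the depth of every letter, which is enough to compute code lengths and total cost. The function names and the tie-breaking counter are illustrative choices.

```python
import heapq
from itertools import count

def huffman_code_lengths(freqs):
    """Return {letter: code length in bits} for a dict of letter frequencies."""
    tiebreak = count()                  # keeps heap entries comparable on ties
    # One single-letter tree per input letter, at depth 0.
    heap = [(f, next(tiebreak), {letter: 0}) for letter, f in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        # Take the two trees of minimal frequency ...
        f1, _, depths1 = heapq.heappop(heap)
        f2, _, depths2 = heapq.heappop(heap)
        # ... and join them: every letter in them moves one level down.
        merged = {x: d + 1 for x, d in {**depths1, **depths2}.items()}
        heapq.heappush(heap, (f1 + f2, next(tiebreak), merged))
    return heap[0][2]

def cost(freqs):
    """Total number of bits: sum of frequency times code length."""
    lengths = huffman_code_lengths(freqs)
    return sum(freqs[x] * lengths[x] for x in freqs)
```

When several trees have the same frequency, the choice is arbitrary; different choices can give different trees, but the total cost is the same.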

Data Compression

Huffman Coding

Example with frequencies a:18, b:8, c:12, d:13, e:7, f:21

start:    a:18  b:8  c:12  d:13  e:7  f:21
step 1:   join b and e (8 + 7 = 15)             a:18  c:12  d:13  f:21  (b e):15
step 2:   join c and d (12 + 13 = 25)           a:18  f:21  (b e):15  (c d):25
step 3:   join a and (b e) (18 + 15 = 33)       f:21  (c d):25  (a (b e)):33
step 4:   join f and (c d) (21 + 25 = 46)       (a (b e)):33  (f (c d)):46
step 5:   join the two remaining trees (33 + 46 = 79)

Resulting code (0 = left, 1 = right): a ↦ 00, b ↦ 010, e ↦ 011, f ↦ 10, c ↦ 110, d ↦ 111

Data Compression

Huffman Coding

choices, choices, ...

a:10, b:5, c:5, d:5

Balanced tree ((a b) (c d)), every letter 2 bits:
2 · 10 + 2 · 5 + 2 · 5 + 2 · 5 = 50

Skewed tree (a (b (c d))), a 1 bit, b 2 bits, c and d 3 bits:
1 · 10 + 2 · 5 + 3 · 5 + 3 · 5 = 50

Data Compression

Lempel-Ziv-Welch

Ziv-Lempel & Welch (1977, 1984)

fixed length code for repeating patterns in the input
$x_1, \dots, x_n \in \Sigma^* \mapsto w_1, \dots, w_n \in \{0,1\}^k$

the strings $x_i$ and their codes are learned while reading the input

the code is also learned by the decoder and does not have to be transmitted

Data Compression

Lempel-Ziv-Welch

Ziv-Lempel & Welch - compression

ZLW-compress

initialize dict with codes for single characters
w = "";
while ( not end of input )
do
    read next character c
    if w+c exists in the dict
        w = w+c;
    else
        add to dict: w+c;
        output code(w);
        w = c;
    fi
od
output code(w)
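The compression loop can be run as is in Python. The sketch below assumes the alphabet a, b, c with initial codes 1, 2, 3, as in the example of these slides, and returns the codes as a list of integers rather than as fixed-length bit strings. The function name `lzw_compress` is illustrative.

```python
def lzw_compress(text, alphabet="abc"):
    """Return (list of output codes, final dictionary) for the given text."""
    # Initialize the dictionary with codes 1, 2, 3, ... for single characters.
    codes = {ch: i for i, ch in enumerate(alphabet, start=1)}
    w, output = "", []
    for c in text:
        if w + c in codes:
            w += c                          # keep extending the current string
        else:
            codes[w + c] = len(codes) + 1   # learn the new string
            output.append(codes[w])
            w = c
    if w:
        output.append(codes[w])             # flush the last string
    return output, codes

out, codes = lzw_compress("ababcbababaaaa")
```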

Data Compression

Lempel-Ziv-Welch

input abab cbab abaa aa

w      c    in dict?   output   new code
       a    yes
a      b    no         1        4 ↦ ab
b      a    no         2        5 ↦ ba
a      b    yes
ab     c    no         4        6 ↦ abc
c      b    no         3        7 ↦ cb
b      a    yes
ba     b    no         5        8 ↦ bab
b      a    yes
ba     b    yes
bab    a    no         8        9 ↦ baba
a      a    no         1        10 ↦ aa
a      a    yes
aa     a    no         10       11 ↦ aaa
a      ⊥               1

[Figure: the dictionary as a trie with nodes 1 to 11 and edges labelled a, b, c]

Dictionary:
 0 (end)
 1 a      2 b      3 c
 4 ab     5 ba     6 abc    7 cb
 8 bab    9 baba   10 aa    11 aaa

Data Compression

Lempel-Ziv-Welch

Ziv-Lempel & Welch - decompression

Decoding 1 2 4 3 5 8 1 10 1.

code   text   new codes
              1, 2, 3 ↦ a, b, c    initialization
1      a                           we learn the new code one step late
2      b      4 ↦ ab               last text + first letter
4      ab     5 ↦ ba
3      c      6 ↦ abc
5      ba     7 ↦ cb
8      bab    8 ↦ bab              the new code is too late! It is of the form last text (ba) + first (b)
1      a      9 ↦ baba
10     aa     10 ↦ aa              too late again
1      a      11 ↦ aaa

Data Compression

Lempel-Ziv-Welch

Ziv-Lempel & Welch - decompression

ZLW-decompress

initialize dict with codes for single characters
read first code in variable prev and output str(prev)
while ( not end of input )
do
    read w;
    if w exists in the dict
        output str(w);
        add to dict: str(prev) + firstchar(str(w));
    else
        // special case: w is the code that is about to be defined
        output str(prev) + firstchar(str(prev));
        add to dict: str(prev) + firstchar(str(prev));
    fi
    prev = w;
od
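The decoder, including the special case, can be sketched in Python as follows. It assumes the same initial dictionary as the example (a, b, c with codes 1, 2, 3) and a non-empty list of codes; the function name `lzw_decompress` is illustrative.

```python
def lzw_decompress(codes, alphabet="abc"):
    """Rebuild the text from a non-empty list of LZW codes."""
    strings = {i: ch for i, ch in enumerate(alphabet, start=1)}
    prev = strings[codes[0]]
    output = [prev]
    for code in codes[1:]:
        if code in strings:
            entry = strings[code]
        else:
            # Special case: the code is the one about to be defined, so its
            # string must be the previous string plus its own first letter.
            entry = prev + prev[0]
        # The encoder added "previous string + first letter of this string".
        strings[len(strings) + 1] = prev + entry[0]
        output.append(entry)
        prev = entry
    return "".join(output)

text = lzw_decompress([1, 2, 4, 3, 5, 8, 1, 10, 1])
```

In the example, codes 8 and 10 both arrive before the decoder has stored them, which is exactly the "too late" situation.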

Data Compression

Burrows-Wheeler

a little trick

MISSISSIPPI ↦ SSMP-PISSIII

Data Compression

Burrows-Wheeler

MISSISSIPPI

rotate (with end marker "-")

 1  M I S S I S S I P P I -
 2  I S S I S S I P P I - M
 3  S S I S S I P P I - M I
 4  S I S S I P P I - M I S
 5  I S S I P P I - M I S S
 6  S S I P P I - M I S S I
 7  S I P P I - M I S S I S
 8  I P P I - M I S S I S S
 9  P P I - M I S S I S S I
10  P I - M I S S I S S I P
11  I - M I S S I S S I P P
12  - M I S S I S S I P P I

alphabetize ("-" sorts after the letters), then read the last column

 8  I P P I - M I S S I S S
 5  I S S I P P I - M I S S
 2  I S S I S S I P P I - M
11  I - M I S S I S S I P P
 1  M I S S I S S I P P I -
10  P I - M I S S I S S I P
 9  P P I - M I S S I S S I
 7  S I P P I - M I S S I S
 4  S I S S I P P I - M I S
 6  S S I P P I - M I S S I
 3  S S I S S I P P I - M I
12  - M I S S I S S I P P I

last column: SSMP-PISSIII
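The rotate-and-sort construction fits in a few lines of Python. The sketch assumes, as in the example, that the text ends with a unique marker "-" which sorts after all letters; the sort key implementing that order and the function name `bwt` are illustrative. Sorting all rotations explicitly is quadratic in memory, so this is for small examples only.

```python
def bwt(text, end="-"):
    """Burrows-Wheeler transform of a text that ends with the marker `end`."""
    def key(rotation):
        # The end marker sorts after every other character.
        return [(1, 0) if ch == end else (0, ord(ch)) for ch in rotation]

    rotations = sorted((text[i:] + text[:i] for i in range(len(text))), key=key)
    return "".join(r[-1] for r in rotations)   # read the last column

transformed = bwt("MISSISSIPPI-")
```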

Data Compression

Burrows-Wheeler

decode

Column 12 (last) is the transformed text; column 1 (first) is the same text sorted. Equal letters are numbered in order of occurrence, so S1 in the last column and S1 in the first column denote the same letter of the original text.

row   last (12)   first (1)
 1    S1          I1
 2    S2          I2
 3    M1          I3
 4    P1          I4
 5    -           M1
 6    P2          P1
 7    I1          P2
 8    S3          S1
 9    S4          S2
10    I2          S3
11    I3          S4
12    I4          -

Start at the row whose last letter is the end marker "-". Repeatedly output the first letter of the current row and jump to the row that has this same letter in its last column:

M1 I3 S4 S2 I2 S3 S1 I1 P2 P1 I4
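The decoding walk can be written compactly in Python. The numbering of equal letters corresponds to a stable sort of the last column: the i-th S in the first column is the i-th S in the last column. The sketch assumes the end marker "-" sorts after all letters, as in the example; the function name `inverse_bwt` is illustrative.

```python
def inverse_bwt(last, end="-"):
    """Recover the original text (without end marker) from the last column."""
    def key(item):
        ch, _ = item
        return (1, 0) if ch == end else (0, ord(ch))

    # Stable sort of (letter, row in last column): entry r describes the first
    # letter of row r and the row where that same letter sits in the last column.
    first = sorted(((ch, i) for i, ch in enumerate(last)), key=key)

    row = last.index(end)          # the row that is the original text
    out = []
    for _ in range(len(last) - 1):
        ch, row = first[row]       # output first letter, jump to its last-column row
        out.append(ch)
    return "".join(out)

original = inverse_bwt("SSMP-PISSIII")
```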

Quiz

Add the final step for Floyd

$A_3 = \begin{pmatrix} 0 & 2 & 1 & 6 \\ 3 & 0 & 1 & 4 \\ 4 & 1 & 0 & 5 \\ -2 & 0 & -1 & 0 \end{pmatrix}$
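To check an answer to the quiz, one round of Floyd's algorithm can be coded directly. The sketch below assumes that A3 is the distance matrix after vertices 1, 2 and 3 have been allowed as intermediates, so that the final step allows vertex 4 (index 3 in 0-based Python). The function name `floyd_step` is illustrative.

```python
def floyd_step(dist, k):
    """One round of Floyd's algorithm: allow vertex k (0-based) as intermediate."""
    n = len(dist)
    return [[min(dist[i][j], dist[i][k] + dist[k][j]) for j in range(n)]
            for i in range(n)]

A3 = [[0, 2, 1, 6],
      [3, 0, 1, 4],
      [4, 1, 0, 5],
      [-2, 0, -1, 0]]
A4 = floyd_step(A3, 3)   # final step: vertex 4 in the slide's 1-based numbering
```

Printing `A4` and comparing it with a hand computation shows which entries improve by routing through vertex 4.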

Pattern Matching

Contents

10 Pattern Matching
    Knuth-Morris-Pratt
    Aho-Corasick
    Comparing texts

Pattern Matching

naive

T = ABABCABCABABCC...
P = ABCABABC

attempt   start in T   comparisons   result
1         1            3             mismatch at P[3]
2         2            1             mismatch at P[1]
3         3            6             mismatch at P[6]
4         4            1             mismatch at P[1]
5         5            1             mismatch at P[1]
6         6            8             match
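The naive method and its comparison count can be reproduced with a short Python sketch. It uses 0-based indices, so the match at text position 6 on the slide is reported as index 5; the function name `naive_search` is illustrative.

```python
def naive_search(text, pattern):
    """Return (0-based start of first match or -1, number of letter comparisons)."""
    comparisons = 0
    for start in range(len(text) - len(pattern) + 1):
        matched = 0
        while matched < len(pattern):
            comparisons += 1
            if text[start + matched] != pattern[matched]:
                break                      # mismatch: shift the pattern by one
            matched += 1
        if matched == len(pattern):
            return start, comparisons
    return -1, comparisons

result = naive_search("ABABCABCABABCC", "ABCABABC")
```

The six attempts cost 3 + 1 + 6 + 1 + 1 + 8 = 20 comparisons in total; after every mismatch the search backs up in the text.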

Pattern Matching

Knuth-Morris-Pratt

match pattern against itself

T = ...ABCABAB?...
P =    ABCABAB×        mismatch at position 8

[Figure: after the mismatch the pattern is slid along the already matched text ABCABAB. Starting the pattern at positions 2, 3, 4 and 5 fails against the known text; starting at position 6 aligns the prefix AB with the suffix AB, so matching continues at pattern position 3 against text position 8.]

Pattern Matching

Knuth-Morris-Pratt

linear-time algorithm (1970, 1977)

Donald Knuth, Vaughan Pratt, and James H. Morris

failure links

linear time preprocessing

search will never back-up in text

Pattern Matching

Knuth-Morris-Pratt

[Figure: for k = 2, ..., 7 the prefix P1 ... Pk−1 is shifted against itself; the longest overlap gives FLink[k] = 1, 1, 1, 2, 3, 2.]

k          1  2  3  4  5  6  7  8
P[k]       A  B  C  A  B  A  B  C
FLink[k]   0  1  1  1  2  3  2  3

FLink[k] at position k: the maximal r < k such that
$P_1 \dots P_{r-1} = P_{k-r+1} \dots P_{k-1}$

On a mismatch at position k, continue at position FLink[k] of the pattern (and at the same position in the text).

FLink[k] = 0: continue at the next position in the text and the first position of the pattern.

Pattern Matching

Knuth-Morris-Pratt

[Figure: the pattern A B C A B A B C drawn as a chain of states 0 to 9. Match edges advance one state, mismatch (fail) edges follow FLink[k] = 0 1 1 1 2 3 2 3, and state 0 means: skip to the next letter in the text.]

Pattern Matching

Knuth-Morris-Pratt

KMP search

// using failure links
Pos = 1     // position in pattern
TPos = 1    // position in text
while ((Pos <= PatLen) and (TPos <= TextLen)) do
    if (P[Pos] == Text[TPos]) then
        Pos++
        TPos++
    else
        Pos = FLink[Pos]
        if (Pos == 0) then
            // start from scratch at next position in text
            Pos = 1
            TPos++
        fi
    fi
od
if (Pos > PatLen) then
    // TPos is now one past the last matched letter
    Pattern found in text at position TPos-PatLen
fi
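The search loop can be exercised in Python with the failure links given as input. The sketch keeps the 1-based positions of the pseudocode (index 0 of `flink` is unused) and returns the 1-based start of the match, which is the text position minus the pattern length once the whole pattern has been matched. The names `kmp_search` and `FLINK` are illustrative; the link values are those of the example pattern ABCABABC.

```python
def kmp_search(text, pattern, flink):
    """flink[k] for k = 1..len(pattern); flink[0] is unused. Returns 1-based start."""
    pos, tpos = 1, 1                     # positions in pattern and text
    while pos <= len(pattern) and tpos <= len(text):
        if pattern[pos - 1] == text[tpos - 1]:
            pos += 1
            tpos += 1
        else:
            pos = flink[pos]             # follow the failure link
            if pos == 0:
                # start from scratch at the next position in the text
                pos = 1
                tpos += 1
    if pos > len(pattern):
        return tpos - len(pattern)       # tpos is one past the last matched letter
    return None

FLINK = [None, 0, 1, 1, 1, 2, 3, 2, 3]   # failure links for ABCABABC
start = kmp_search("ABABCABCABABCC", "ABCABABC", FLINK)
```

Note that `tpos` never decreases: the search does not back up in the text.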

Pattern Matching

Knuth-Morris-Pratt

computing KMP failure links

FLink[1] = 0
for k = 2 to PatLen do      // k: position in pattern
    Fail = FLink[k-1]
    while ( (Fail > 0) and (P[Fail] != P[k-1]) ) do
        Fail = FLink[Fail]
    od
    FLink[k] = Fail+1
od

[Figure: the link for position k is found from the link for position k−1 by comparing P[k−1] with the letter at the link target]
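A direct Python transcription of the preprocessing follows. It pads the pattern with one dummy character so that the 1-based indices of the pseudocode can be kept; the function name `failure_links` is illustrative, and a non-empty pattern is assumed.

```python
def failure_links(pattern):
    """Return the list flink with flink[k] for k = 1..len(pattern); flink[0] unused."""
    p = " " + pattern                      # shift to 1-based indexing
    flink = [None, 0] + [0] * (len(pattern) - 1)
    for k in range(2, len(pattern) + 1):
        fail = flink[k - 1]
        # Walk down the chain of shorter borders until P[k-1] can extend one.
        while fail > 0 and p[fail] != p[k - 1]:
            fail = flink[fail]
        flink[k] = fail + 1
    return flink

links = failure_links("ABCABABC")
```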

Pattern Matching

Knuth-Morris-Pratt

why does this work?

All prefixes that are also a suffix,
$P_1 \dots P_{t-1} = P_{k-t+1} \dots P_{k-1}$,
can be found by following failure links: $t_0 = \mathrm{FLink}[k]$ and $t_i = \mathrm{FLink}[t_{i-1}]$.

[Figure: positions t1 < t0 < k in the pattern. The prefix P1 ... P(t0−1) equals the suffix P(k−t0+1) ... P(k−1), and the shorter prefix P1 ... P(t1−1) equals the suffix P(k−t1+1) ... P(k−1). A second stage of the figure extends the picture to position k + 1.]

Pattern Matching

Knuth-Morris-Pratt

k           1  2  3  4  5  6  7  8
P[k]        A  B  C  A  B  A  B  C
FLink[k]    0  1  1  1  2  3  2  3
FLink′[k]   0  1  1  0  1  3  1  1

improving KMP failure links

for Pos = 2 to PatLen
do  if ( P[Pos] == P[FLink[Pos]] )
    then FLink[Pos] = FLink[FLink[Pos]]
    fi
od
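The improvement step can be checked against the FLink′ row with a small Python sketch. If the letter at the link target equals the letter that just mismatched, the comparison would fail again, so the link is shortened. The sketch works on a copy and keeps 1-based indices (index 0 unused); the function name `improve_failure_links` is illustrative.

```python
def improve_failure_links(pattern, flink):
    """Return the improved links; flink[k] for k = 1..len(pattern), flink[0] unused."""
    p = " " + pattern                  # shift to 1-based indexing
    better = list(flink)
    for pos in range(2, len(pattern) + 1):
        target = better[pos]
        # Same letter at the target: the mismatch would simply repeat there.
        if target > 0 and p[pos] == p[target]:
            better[pos] = better[target]
    return better

improved = improve_failure_links("ABCABABC", [None, 0, 1, 1, 1, 2, 3, 2, 3])
```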

Pattern Matching

Aho-Corasick

{aaa, abc, baa, baba, cb}

[Figure: trie for the five patterns, nodes numbered 1 to 12, edges labelled a, b, c. The slide is shown in four stages: the trie, the trie with failure links, searching the text aaba..., and constructing the next failure link.]
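The stages of the figure (trie, failure links, search) can be sketched in Python. This is one common formulation of Aho-Corasick under stated assumptions, not necessarily the node numbering or construction order of the slide: nodes are numbered in order of creation with 0 as root, failure links are computed breadth-first, and each node inherits the matches of its failure target. The names `build_automaton` and `search` are illustrative.

```python
from collections import deque

def build_automaton(patterns):
    goto, fail, out = [{}], [0], [[]]          # node 0 is the root
    for pat in patterns:                       # stage 1: build the trie
        node = 0
        for ch in pat:
            if ch not in goto[node]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[node][ch] = len(goto) - 1
            node = goto[node][ch]
        out[node].append(pat)
    queue = deque(goto[0].values())            # stage 2: failure links, level by level
    while queue:
        node = queue.popleft()
        for ch, child in goto[node].items():
            f = fail[node]
            while f and ch not in goto[f]:     # follow links until ch can be read
                f = fail[f]
            fail[child] = goto[f].get(ch, 0)
            out[child] += out[fail[child]]     # inherit matches ending here
            queue.append(child)
    return goto, fail, out

def search(text, patterns):
    """Return sorted (0-based start, pattern) pairs for all occurrences."""
    goto, fail, out = build_automaton(patterns)
    node, hits = 0, []
    for i, ch in enumerate(text):
        while node and ch not in goto[node]:
            node = fail[node]
        node = goto[node].get(ch, 0)
        hits += [(i - len(p) + 1, p) for p in out[node]]
    return sorted(hits)

hits = search("babaaabc", ["aaa", "abc", "baa", "baba", "cb"])
```

As with KMP, the text is read once from left to right; only the current trie node moves back along failure links.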

Pattern Matching

Comparing texts

alignment

enzymes and their amino acids

82 TYHMCQFHCRYVNNHSGEKLYECNERSKAFSCPSHLQCHKRRQIGEKTHEHNQCGKAFPT 60

81 --------------------YECNQCGKAFAQHSSLKCHYRTHIGEKPYECNQCGKAFSK 40

****: .***: * *:** * :****.:* *******..

82 PSHLQYHERTHTGEKPYECHQCGQAFKKCSLLQRHKRTHTGEKPYE-CNQCGKAFAQ- 116

81 HSHLQCHKRTHTGEKPYECNQCGKAFSQHGLLQRHKRTHTGEKPYMNVINMVKPLHNS 98

**** *:***********:***:**.: .*************** : *.: :

Pattern Matching

Comparing texts

similarity: TCAGACGATTG and TCGGAGCTG

TCAG-ACG-ATTG
TC-GGA-GC-T-G

TCAGACGATTG
TCGGA-GCT-G

Each column is a match (G over G), a mismatch (A over G) or an insdel (gap: - over G, or A over -).

Pattern Matching

Comparing texts

global alignment

TTCAT vs. TGCATCGT

[Figure: alignment grid with TGCATCGT along the top and TTCAT down the side. Horizontal and vertical steps are insdels, diagonal steps are matches or mismatches; an alignment corresponds to a path through the grid.]

as shortest path

Pattern Matching

Comparing texts

global versus local alignment

[Figure: global versus local alignment in the dynamic-programming grid; labels in the figure: (1,1), (m,n), 0, max]

Needleman-Wunsch (1970), Smith-Waterman (1981)

Levenshtein distance (1966)
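The Levenshtein distance is the simplest instance of the grid computation: match costs 0, mismatch and insdel cost 1, and the answer is the cheapest path from corner to corner. Below is a minimal Python sketch that keeps only one row of the grid at a time; the function name `levenshtein` is illustrative, and the unit costs are the standard ones rather than a scoring scheme taken from the slides.

```python
def levenshtein(s, t):
    """Minimal number of insertions, deletions and substitutions turning s into t."""
    prev = list(range(len(t) + 1))          # row 0: insert the first j letters of t
    for i, a in enumerate(s, start=1):
        cur = [i]                           # column 0: delete the first i letters of s
        for j, b in enumerate(t, start=1):
            cur.append(min(prev[j] + 1,               # insdel: delete a
                           cur[j - 1] + 1,            # insdel: insert b
                           prev[j - 1] + (a != b)))   # match (0) or mismatch (1)
        prev = cur
    return prev[-1]

distance = levenshtein("TTCAT", "TGCATCGT")
```

For the example pair, one mismatch (T against G) and three insdels are needed.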

end.