CLR Explained
Introduction to Algorithms 6.046J/18.401J
Prof. Charles E. Leiserson
LECTURE 1: Analysis of Algorithms
• Insertion sort
• Merge sort
September 8, 2004 Introduction to Algorithms L1.2
Course information
1. Staff  2. Prerequisites  3. Lectures  4. Recitations  5. Handouts  6. Textbook (CLRS)  7. Extra help  8. Registration  9. Problem sets  10. Describing algorithms  11. Grading policy  12. Collaboration policy
Course information handout
© 2001–4 by Charles E. Leiserson
Analysis of algorithms
The theoretical study of computer-program performance and resource usage.
What’s more important than performance?
• modularity • correctness • maintainability • functionality • robustness • user-friendliness • programmer time • simplicity • extensibility • reliability
Why study algorithms and performance?
• Algorithms help us to understand scalability.
• Performance often draws the line between what is feasible and what is impossible.
• Algorithmic mathematics provides a language for talking about program behavior.
• Performance is the currency of computing.
• The lessons of program performance generalize to other computing resources.
• Speed is fun!
The problem of sorting
Input: sequence ⟨a1, a2, …, an⟩ of numbers.
Output: permutation ⟨a′1, a′2, …, a′n⟩ such that a′1 ≤ a′2 ≤ … ≤ a′n.
Example:
Input: 8 2 4 9 3 6
Output: 2 3 4 6 8 9
Insertion sort
INSERTION-SORT (A, n)    ⊳ A[1 . . n]
    for j ← 2 to n
        do key ← A[j]
           i ← j − 1
           while i > 0 and A[i] > key
               do A[i+1] ← A[i]
                  i ← i − 1
           A[i+1] ← key
“pseudocode”: A[1 . . j−1] is already sorted, and key = A[j] is inserted into its place.
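The pseudocode above translates directly into Python; a minimal sketch (0-indexed, so the pseudocode’s j = 2 . . n becomes indices 1 . . n−1; the function name is mine):

```python
def insertion_sort(a):
    """Sort list a in place, mirroring INSERTION-SORT(A, n)."""
    for j in range(1, len(a)):        # pseudocode's j = 2 .. n
        key = a[j]
        i = j - 1
        while i >= 0 and a[i] > key:  # shift larger elements one slot right
            a[i + 1] = a[i]
            i -= 1
        a[i + 1] = key                # drop key into the gap
    return a

print(insertion_sort([8, 2, 4, 9, 3, 6]))  # -> [2, 3, 4, 6, 8, 9]
```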
Example of insertion sort
8 2 4 9 3 6
2 8 4 9 3 6
2 4 8 9 3 6
2 4 8 9 3 6
2 3 4 8 9 6
2 3 4 6 8 9   done
Running time
• The running time depends on the input: an already sorted sequence is easier to sort.
• Parameterize the running time by the size of the input, since short sequences are easier to sort than long ones.
• Generally, we seek upper bounds on the running time, because everybody likes a guarantee.
Kinds of analyses
Worst-case: (usually)
• T(n) = maximum time of algorithm on any input of size n.
Average-case: (sometimes)
• T(n) = expected time of algorithm over all inputs of size n.
• Need assumption of statistical distribution of inputs.
Best-case: (bogus)
• Cheat with a slow algorithm that works fast on some input.
Machine-independent time
What is insertion sort’s worst-case time? It depends on the speed of our computer:
• relative speed (on the same machine),
• absolute speed (on different machines).
BIG IDEA:
• Ignore machine-dependent constants.
• Look at growth of T(n) as n → ∞.
“Asymptotic analysis”
Θ-notation
Math:
Θ(g(n)) = { f(n) : there exist positive constants c1, c2, and n0 such that 0 ≤ c1 g(n) ≤ f(n) ≤ c2 g(n) for all n ≥ n0 }
Engineering:
• Drop low-order terms; ignore leading constants.
• Example: 3n³ + 90n² − 5n + 6046 = Θ(n³)
Asymptotic performance
[figure: T(n) versus n, with a crossover at n0]
When n gets large enough, a Θ(n²) algorithm always beats a Θ(n³) algorithm.
• We shouldn’t ignore asymptotically slower algorithms, however.
• Real-world design situations often call for a careful balancing of engineering objectives.
• Asymptotic analysis is a useful tool to help to structure our thinking.
Insertion sort analysis
Worst case: Input reverse sorted.
T(n) = ∑_{j=2}^{n} Θ(j) = Θ(n²)   [arithmetic series]
Average case: All permutations equally likely.
T(n) = ∑_{j=2}^{n} Θ(j/2) = Θ(n²)
Is insertion sort a fast sorting algorithm?
• Moderately so, for small n.
• Not at all, for large n.
Merge sort
MERGE-SORT A[1 . . n]
1. If n = 1, done.
2. Recursively sort A[1 . . ⌈n/2⌉] and A[⌈n/2⌉+1 . . n].
3. “Merge” the 2 sorted lists.
Key subroutine: MERGE
Merging two sorted arrays
Arrays: 2 7 13 20 and 1 9 11 12.
Repeatedly compare the smallest remaining elements of the two arrays and output the smaller:
Output: 1 2 7 9 11 12 …
Time = Θ(n) to merge a total of n elements (linear time).
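The merge step can be sketched in Python as follows (a straightforward rendering of the slide; the function name is mine):

```python
def merge(left, right):
    """Merge two sorted lists in Theta(n) time, n = len(left) + len(right)."""
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:      # output the smaller front element
            out.append(left[i]); i += 1
        else:
            out.append(right[j]); j += 1
    out.extend(left[i:])             # at most one of these is nonempty
    out.extend(right[j:])
    return out

print(merge([2, 7, 13, 20], [1, 9, 11, 12]))  # -> [1, 2, 7, 9, 11, 12, 13, 20]
```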
Analyzing merge sort
MERGE-SORT A[1 . . n]                                  T(n)
1. If n = 1, done.                                     Θ(1)
2. Recursively sort A[1 . . ⌈n/2⌉] and A[⌈n/2⌉+1 . . n].   2T(n/2)
3. “Merge” the 2 sorted lists.                         Θ(n)
Abuse/sloppiness: Should be T(⌈n/2⌉) + T(⌊n/2⌋), but it turns out not to matter asymptotically.
Recurrence for merge sort
T(n) = Θ(1) if n = 1;
T(n) = 2T(n/2) + Θ(n) if n > 1.
• We shall usually omit stating the base case when T(n) = Θ(1) for sufficiently small n, but only when it has no effect on the asymptotic solution to the recurrence.
• CLRS and Lecture 2 provide several ways to find a good upper bound on T(n).
Recursion tree
Solve T(n) = 2T(n/2) + cn, where c > 0 is constant.

cn
cn/2        cn/2
cn/4  cn/4  cn/4  cn/4
…
Θ(1)   (leaves)

Height h = lg n; each level sums to cn; #leaves = n, contributing Θ(n).
Total = cn · lg n + Θ(n) = Θ(n lg n)
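The recursion-tree total can be checked numerically; a small sketch (names and the choice c = 1, T(1) = 1 are mine) that evaluates the recurrence on powers of two and compares against n lg n:

```python
import math
from functools import lru_cache

@lru_cache(maxsize=None)
def T(n):
    """The recurrence T(n) = 2T(n/2) + cn with c = 1 and T(1) = 1 (n a power of 2)."""
    if n == 1:
        return 1
    return 2 * T(n // 2) + n

for n in (2**5, 2**10, 2**15):
    print(n, T(n) / (n * math.log2(n)))  # ratio tends to a constant (here 1 + 1/lg n)
```

For these parameters the closed form is T(n) = n lg n + n, so the printed ratio approaches 1 as n grows.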
Conclusions
• Θ(n lg n) grows more slowly than Θ(n²).
• Therefore, merge sort asymptotically beats insertion sort in the worst case.
• In practice, merge sort beats insertion sort for n > 30 or so.
• Go test it out for yourself!
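One way to test it yourself is a quick timing sketch (timings are machine-dependent, and the n ≈ 30 crossover is approximate, so treat the numbers as illustrative; function names are mine):

```python
import random
import timeit

def insertion_sort(a):
    """Theta(n^2) worst case."""
    for j in range(1, len(a)):
        key, i = a[j], j - 1
        while i >= 0 and a[i] > key:
            a[i + 1] = a[i]
            i -= 1
        a[i + 1] = key
    return a

def merge_sort(a):
    """Theta(n lg n)."""
    if len(a) <= 1:
        return a
    mid = len(a) // 2
    left, right = merge_sort(a[:mid]), merge_sort(a[mid:])
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            out.append(left[i]); i += 1
        else:
            out.append(right[j]); j += 1
    return out + left[i:] + right[j:]

for n in (10, 100, 1000):
    data = [random.random() for _ in range(n)]
    t_ins = timeit.timeit(lambda: insertion_sort(data[:]), number=20)
    t_mrg = timeit.timeit(lambda: merge_sort(data[:]), number=20)
    print(f"n={n:5d}  insertion {t_ins:.4f}s  merge {t_mrg:.4f}s")
```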
Introduction to Algorithms 6.046J/18.401J
Prof. Charles E. Leiserson
LECTURE 2: Asymptotic Notation
• O-, Ω-, and Θ-notation
Recurrences
• Substitution method
• Iterating the recurrence
• Recursion tree
• Master method
September 13, 2004 Introduction to Algorithms L2.2
Asymptotic notation
O-notation (upper bounds):
We write f(n) = O(g(n)) if there exist constants c > 0, n0 > 0 such that 0 ≤ f(n) ≤ cg(n) for all n ≥ n0.
EXAMPLE: 2n² = O(n³)   (c = 1, n0 = 2)
Here O(·) relates functions, not values, and the “=” is a funny, “one-way” equality.
Set definition of O-notation
O(g(n)) = { f(n) : there exist constants c > 0, n0 > 0 such that 0 ≤ f(n) ≤ cg(n) for all n ≥ n0 }
EXAMPLE: 2n² ∈ O(n³)
(Logicians: λn.2n² ∈ O(λn.n³), but it’s convenient to be sloppy, as long as we understand what’s really going on.)
Macro substitution
Convention: A set in a formula represents an anonymous function in the set.
EXAMPLE: f(n) = n³ + O(n²) means f(n) = n³ + h(n) for some h(n) ∈ O(n²).
Ω-notation (lower bounds)
O-notation is an upper-bound notation. It makes no sense to say f(n) is at least O(n²).
Ω(g(n)) = { f(n) : there exist constants c > 0, n0 > 0 such that 0 ≤ cg(n) ≤ f(n) for all n ≥ n0 }
EXAMPLE: √n = Ω(lg n)
Θ-notation (tight bounds)
Θ(g(n)) = O(g(n)) ∩ Ω(g(n))
EXAMPLE: ½n² − 2n = Θ(n²)
Theorem. The leading constant and low-order terms don’t matter.
Solving recurrences
• The analysis of merge sort from Lecture 1 required us to solve a recurrence.
• Solving recurrences is like solving integrals, differential equations, etc.: learn a few tricks.
• Lecture 3: Applications of recurrences to divide-and-conquer algorithms.
Substitution method
The most general method:
1. Guess the form of the solution.
2. Verify by induction.
3. Solve for constants.
EXAMPLE: T(n) = 4T(n/2) + n
• [Assume that T(1) = Θ(1).]
• Guess O(n³). (Prove O and Ω separately.)
• Assume that T(k) ≤ ck³ for k < n.
• Prove T(n) ≤ cn³ by induction.
Example of substitution
T(n) = 4T(n/2) + n
     ≤ 4c(n/2)³ + n
     = (c/2)n³ + n
     = cn³ − ((c/2)n³ − n)   ← desired − residual
     ≤ cn³
whenever (c/2)n³ − n ≥ 0, for example, if c ≥ 2 and n ≥ 1.
Example (continued)
• We must also handle the initial conditions, that is, ground the induction with base cases.
• Base: T(n) = Θ(1) for all n < n0, where n0 is a suitable constant.
• For 1 ≤ n < n0, we have “Θ(1)” ≤ cn³, if we pick c big enough.
This bound is not tight!
A tighter upper bound?
We shall prove that T(n) = O(n²).
Assume that T(k) ≤ ck² for k < n:
T(n) = 4T(n/2) + n
     ≤ 4c(n/2)² + n
     = cn² + n
     = O(n²)
Wrong! We must prove the I.H.:
     = cn² − (−n)   [desired − residual]
     ≤ cn² for no choice of c > 0. Lose!
A tighter upper bound!
IDEA: Strengthen the inductive hypothesis.
• Subtract a low-order term.
Inductive hypothesis: T(k) ≤ c1k² − c2k for k < n.
T(n) = 4T(n/2) + n
     = 4(c1(n/2)² − c2(n/2)) + n
     = c1n² − 2c2n + n
     = c1n² − c2n − (c2n − n)
     ≤ c1n² − c2n   if c2 ≥ 1.
Pick c1 big enough to handle the initial conditions.
Recursion-tree method
• A recursion tree models the costs (time) of a recursive execution of an algorithm.
• The recursion-tree method can be unreliable, just like any method that uses ellipses (…).
• The recursion-tree method promotes intuition, however.
• The recursion-tree method is good for generating guesses for the substitution method.
Example of recursion tree
Solve T(n) = T(n/4) + T(n/2) + n²:

n²
(n/4)²          (n/2)²
(n/16)² (n/8)²  (n/8)² (n/4)²
…
Θ(1)

Level sums: n², (5/16)n², (25/256)n², …
Total = n² (1 + 5/16 + (5/16)² + (5/16)³ + ⋯) = Θ(n²)   [geometric series]
The master method
The master method applies to recurrences of the form
T(n) = a T(n/b) + f (n) , where a ≥ 1, b > 1, and f is asymptotically positive.
Three common cases
Compare f(n) with n^{log_b a}:
1. f(n) = O(n^{log_b a − ε}) for some constant ε > 0.
   • f(n) grows polynomially slower than n^{log_b a} (by an n^ε factor).
   Solution: T(n) = Θ(n^{log_b a}).
2. f(n) = Θ(n^{log_b a} lg^k n) for some constant k ≥ 0.
   • f(n) and n^{log_b a} grow at similar rates.
   Solution: T(n) = Θ(n^{log_b a} lg^{k+1} n).
Three common cases (cont.)
3. f(n) = Ω(n^{log_b a + ε}) for some constant ε > 0.
   • f(n) grows polynomially faster than n^{log_b a} (by an n^ε factor), and f(n) satisfies the regularity condition that a f(n/b) ≤ c f(n) for some constant c < 1.
   Solution: T(n) = Θ(f(n)).
Examples
EX. T(n) = 4T(n/2) + n
a = 4, b = 2 ⇒ n^{log_b a} = n²; f(n) = n.
CASE 1: f(n) = O(n^{2−ε}) for ε = 1. ∴ T(n) = Θ(n²).

EX. T(n) = 4T(n/2) + n²
a = 4, b = 2 ⇒ n^{log_b a} = n²; f(n) = n².
CASE 2: f(n) = Θ(n² lg⁰n), that is, k = 0. ∴ T(n) = Θ(n² lg n).

EX. T(n) = 4T(n/2) + n³
a = 4, b = 2 ⇒ n^{log_b a} = n²; f(n) = n³.
CASE 3: f(n) = Ω(n^{2+ε}) for ε = 1, and 4(n/2)³ ≤ cn³ (reg. cond.) for c = 1/2. ∴ T(n) = Θ(n³).

EX. T(n) = 4T(n/2) + n²/lg n
a = 4, b = 2 ⇒ n^{log_b a} = n²; f(n) = n²/lg n.
Master method does not apply. In particular, for every constant ε > 0, we have n^ε = ω(lg n).
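For driving functions of the simple polynomial form f(n) = n^k, the case comparison can be mechanized; a sketch (a hypothetical helper of mine, not from the lecture — it deliberately ignores lg factors like the last example, where the master method does not apply):

```python
import math

def master_case(a, b, k):
    """Classify T(n) = a*T(n/b) + n**k by the master method (f(n) = n^k only)."""
    crit = math.log(a, b)  # exponent of n^{log_b a}
    if k < crit:
        return f"CASE 1: T(n) = Theta(n^{crit:g})"
    if k == crit:
        return f"CASE 2: T(n) = Theta(n^{crit:g} lg n)"
    # For f(n) = n^k with k > log_b a, the regularity condition holds automatically.
    return f"CASE 3: T(n) = Theta(n^{k})"

print(master_case(4, 2, 1))  # T(n) = 4T(n/2) + n
print(master_case(4, 2, 2))  # T(n) = 4T(n/2) + n^2
print(master_case(4, 2, 3))  # T(n) = 4T(n/2) + n^3
```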
Idea of master theorem
Recursion tree: the root costs f(n) and has a children, each costing f(n/b); the next level has a² nodes, each costing f(n/b²); and so on down to the Τ(1) leaves. Height h = log_b n.
Level sums: f(n), a f(n/b), a² f(n/b²), …
#leaves = a^h = a^{log_b n} = n^{log_b a}, contributing n^{log_b a} Τ(1).
CASE 1: The weight increases geometrically from the root to the leaves. The leaves hold a constant fraction of the total weight. ⇒ Θ(n^{log_b a})
CASE 2: (k = 0) The weight is approximately the same on each of the log_b n levels. ⇒ Θ(n^{log_b a} lg n)
CASE 3: The weight decreases geometrically from the root to the leaves. The root holds a constant fraction of the total weight. ⇒ Θ(f(n))
Appendix: geometric series
1 + x + x² + x³ + ⋯ = 1/(1 − x)   for |x| < 1
1 + x + x² + ⋯ + xⁿ = (1 − x^{n+1})/(1 − x)   for x ≠ 1
Introduction to Algorithms 6.046J/18.401J
Prof. Charles E. Leiserson
LECTURE 3: Divide and Conquer
• Binary search
• Powering a number
• Fibonacci numbers
• Matrix multiplication
• Strassen’s algorithm
• VLSI tree layout
September 15, 2004 Introduction to Algorithms L3.2
The divide-and-conquer design paradigm
1. Divide the problem (instance) into subproblems.
2. Conquer the subproblems by solving them recursively.
3. Combine subproblem solutions.
Merge sort
1. Divide: Trivial.
2. Conquer: Recursively sort 2 subarrays.
3. Combine: Linear-time merge.
T(n) = 2 T(n/2) + Θ(n)
(# subproblems = 2; subproblem size = n/2; Θ(n) work dividing and combining.)
Master theorem (reprise)
T(n) = a T(n/b) + f(n)
CASE 1: f(n) = O(n^{log_b a − ε}), constant ε > 0 ⇒ T(n) = Θ(n^{log_b a}).
CASE 2: f(n) = Θ(n^{log_b a} lg^k n), constant k ≥ 0 ⇒ T(n) = Θ(n^{log_b a} lg^{k+1} n).
CASE 3: f(n) = Ω(n^{log_b a + ε}), constant ε > 0, and regularity condition ⇒ T(n) = Θ(f(n)).
Merge sort: a = 2, b = 2 ⇒ n^{log_b a} = n^{log_2 2} = n ⇒ CASE 2 (k = 0) ⇒ T(n) = Θ(n lg n).
Binary search
Find an element in a sorted array:
1. Divide: Check middle element.
2. Conquer: Recursively search 1 subarray.
3. Combine: Trivial.
Example: Find 9 in 3 5 7 8 9 12 15.
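The three steps translate into a short Python routine (an iterative rendering of the recursion; the function name is mine):

```python
def binary_search(a, x):
    """Return an index of x in sorted list a, or -1 if absent."""
    lo, hi = 0, len(a) - 1
    while lo <= hi:
        mid = (lo + hi) // 2   # divide: check middle element
        if a[mid] == x:
            return mid
        if a[mid] < x:         # conquer: search one subarray
            lo = mid + 1
        else:
            hi = mid - 1
    return -1                  # combine: trivial

print(binary_search([3, 5, 7, 8, 9, 12, 15], 9))  # -> 4
```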
Recurrence for binary search
T(n) = 1 T(n/2) + Θ(1)
(# subproblems = 1; subproblem size = n/2; Θ(1) work dividing and combining.)
n^{log_b a} = n^{log_2 1} = n⁰ = 1 ⇒ CASE 2 (k = 0) ⇒ T(n) = Θ(lg n).
Powering a number
Problem: Compute a^n, where n ∈ N.

Naive algorithm: Θ(n).

Divide-and-conquer algorithm:
  a^n = a^(n/2) ⋅ a^(n/2)               if n is even;
  a^n = a^((n–1)/2) ⋅ a^((n–1)/2) ⋅ a   if n is odd.

T(n) = T(n/2) + Θ(1) ⇒ T(n) = Θ(lg n) .
© 2001–4 by Charles E. Leiserson
September 15, 2004 Introduction to Algorithms L3.19
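The even/odd recursion above, sketched in Python (`power` is an illustrative name; Python's built-in `pow` does the same job):

```python
def power(a, n):
    """Compute a**n for a natural number n with Theta(lg n) multiplications."""
    if n == 0:
        return 1                       # a^0 = 1
    half = power(a, n // 2)            # a^(n/2) if n is even, a^((n-1)/2) if odd
    if n % 2 == 0:
        return half * half
    return half * half * a

print(power(3, 13))                    # 1594323 == 3**13
```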
Fibonacci numbers

Recursive definition:
  Fn = 0            if n = 0;
       1            if n = 1;
       Fn–1 + Fn–2  if n ≥ 2.

0, 1, 1, 2, 3, 5, 8, 13, 21, 34, …

Naive recursive algorithm: Ω(φ^n) (exponential time), where φ = (1 + √5)/2 is the golden ratio.
© 2001–4 by Charles E. Leiserson
September 15, 2004 Introduction to Algorithms L3.21
Computing Fibonacci numbers
Naive recursive squaring: Fn = φ^n/√5 rounded to the nearest integer.
• Recursive squaring: Θ(lg n) time.
• This method is unreliable, since floating-point arithmetic is prone to round-off errors.

Bottom-up:
• Compute F0, F1, F2, …, Fn in order, forming each number by summing the two previous.
• Running time: Θ(n).
© 2001–4 by Charles E. Leiserson
September 15, 2004 Introduction to Algorithms L3.23
Recursive squaring

Theorem:

  [ F(n+1)  F(n)   ]     [ 1  1 ]^n
  [ F(n)    F(n–1) ]  =  [ 1  0 ]

Algorithm: Recursive squaring. Time = Θ(lg n).

Proof of theorem. (Induction on n.)

Base (n = 1):

  [ F(2)  F(1) ]     [ 1  1 ]^1
  [ F(1)  F(0) ]  =  [ 1  0 ]

Inductive step (n ≥ 2):

  [ F(n+1)  F(n)   ]     [ F(n)    F(n–1) ]   [ 1  1 ]
  [ F(n)    F(n–1) ]  =  [ F(n–1)  F(n–2) ] ⋅ [ 1  0 ]

                         [ 1  1 ]^(n–1)   [ 1  1 ]     [ 1  1 ]^n
                      =  [ 1  0 ]       ⋅ [ 1  0 ]  =  [ 1  0 ]   ∎
© 2001–4 by Charles E. Leiserson
September 15, 2004 Introduction to Algorithms L3.27
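Combining the theorem with recursive squaring gives a Θ(lg n) exact Fibonacci routine in integer arithmetic, avoiding the floating-point pitfall noted earlier. A sketch with illustrative names:

```python
def mat2_mult(X, Y):
    """Product of two 2x2 matrices given as nested lists."""
    return [[X[0][0] * Y[0][0] + X[0][1] * Y[1][0],
             X[0][0] * Y[0][1] + X[0][1] * Y[1][1]],
            [X[1][0] * Y[0][0] + X[1][1] * Y[1][0],
             X[1][0] * Y[0][1] + X[1][1] * Y[1][1]]]

def mat2_pow(M, n):
    """M**n (n >= 1) by recursive squaring: Theta(lg n) matrix products."""
    if n == 1:
        return M
    half = mat2_pow(M, n // 2)
    sq = mat2_mult(half, half)
    return sq if n % 2 == 0 else mat2_mult(sq, M)

def fib(n):
    """F(n) via the theorem: [[1,1],[1,0]]**n == [[F(n+1),F(n)],[F(n),F(n-1)]]."""
    if n == 0:
        return 0
    return mat2_pow([[1, 1], [1, 0]], n)[0][1]

print([fib(n) for n in range(10)])   # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
```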
Matrix multiplication

Input: n×n matrices A = [a_ij] and B = [b_ij].
Output: C = [c_ij] = A ⋅ B, where for i, j = 1, 2, …, n:

  c_ij = Σ_{k=1}^{n} a_ik ⋅ b_kj
© 2001–4 by Charles E. Leiserson
September 15, 2004 Introduction to Algorithms L3.28
Standard algorithm

for i ← 1 to n
  do for j ← 1 to n
       do c_ij ← 0
          for k ← 1 to n
            do c_ij ← c_ij + a_ik ⋅ b_kj

Running time = Θ(n³)
© 2001–4 by Charles E. Leiserson
September 15, 2004 Introduction to Algorithms L3.30
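The triple loop translates directly to 0-indexed Python (`mat_mult` is an illustrative name):

```python
def mat_mult(A, B):
    """Schoolbook Theta(n^3) product of two n x n matrices (nested lists)."""
    n = len(A)
    C = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            for k in range(n):
                C[i][j] += A[i][k] * B[k][j]   # c_ij += a_ik * b_kj
    return C

print(mat_mult([[1, 2], [3, 4]], [[5, 6], [7, 8]]))   # [[19, 22], [43, 50]]
```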
Divide-and-conquer algorithm

IDEA: n×n matrix = 2×2 matrix of (n/2)×(n/2) submatrices:

  [ r  s ]     [ a  b ]   [ e  f ]
  [ t  u ]  =  [ c  d ] ⋅ [ g  h ]        C = A ⋅ B

r = ae + bg
s = af + bh
t = ce + dg
u = cf + dh

8 recursive mults of (n/2)×(n/2) submatrices
4 adds of (n/2)×(n/2) submatrices
© 2001–4 by Charles E. Leiserson
September 15, 2004 Introduction to Algorithms L3.32
Analysis of D&C algorithm

T(n) = 8 T(n/2) + Θ(n²)
  (# submatrices = 8; submatrix size = n/2; work adding submatrices = Θ(n²))

n^(log_b a) = n^(log_2 8) = n³ ⇒ CASE 1 ⇒ T(n) = Θ(n³).

No better than the ordinary algorithm.
© 2001–4 by Charles E. Leiserson
September 15, 2004 Introduction to Algorithms L3.35
Strassen’s idea
• Multiply 2×2 matrices with only 7 recursive mults.

P1 = a ⋅ (f – h)
P2 = (a + b) ⋅ h
P3 = (c + d) ⋅ e
P4 = d ⋅ (g – e)
P5 = (a + d) ⋅ (e + h)
P6 = (b – d) ⋅ (g + h)
P7 = (a – c) ⋅ (e + f)

r = P5 + P4 – P2 + P6
s = P1 + P2
t = P3 + P4
u = P5 + P1 – P3 – P7

7 mults, 18 adds/subs. Note: No reliance on commutativity of mult!

Check of r:
r = P5 + P4 – P2 + P6
  = (a + d)(e + h) + d(g – e) – (a + b)h + (b – d)(g + h)
  = ae + ah + de + dh + dg – de – ah – bh + bg + bh – dg – dh
  = ae + bg ✓

© 2001–4 by Charles E. Leiserson
September 15, 2004 Introduction to Algorithms L3.40
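The seven products can be sanity-checked on ordinary numbers; because the identities never rely on commutativity, the same algebra carries over to matrix blocks. A small sketch (`strassen_combine` is an illustrative name):

```python
def strassen_combine(a, b, c, d, e, f, g, h):
    """Strassen's 7 products and the 4 combinations for one 2x2 step."""
    P1 = a * (f - h)
    P2 = (a + b) * h
    P3 = (c + d) * e
    P4 = d * (g - e)
    P5 = (a + d) * (e + h)
    P6 = (b - d) * (g + h)
    P7 = (a - c) * (e + f)
    r = P5 + P4 - P2 + P6
    s = P1 + P2
    t = P3 + P4
    u = P5 + P1 - P3 - P7
    return r, s, t, u

# Check against r = ae+bg, s = af+bh, t = ce+dg, u = cf+dh on sample values:
a, b, c, d, e, f, g, h = 2, 3, 5, 7, 11, 13, 17, 19
assert strassen_combine(a, b, c, d, e, f, g, h) == \
       (a*e + b*g, a*f + b*h, c*e + d*g, c*f + d*h)
```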
Strassen’s algorithm
1. Divide: Partition A and B into (n/2)×(n/2) submatrices. Form terms to be multiplied using + and –.
2. Conquer: Perform 7 multiplications of (n/2)×(n/2) submatrices recursively.
3. Combine: Form C using + and – on (n/2)×(n/2) submatrices.

T(n) = 7 T(n/2) + Θ(n²)
© 2001–4 by Charles E. Leiserson
September 15, 2004 Introduction to Algorithms L3.42
Analysis of Strassen

T(n) = 7 T(n/2) + Θ(n²)

n^(log_b a) = n^(log_2 7) ≈ n^2.81 ⇒ CASE 1 ⇒ T(n) = Θ(n^(lg 7)).

The number 2.81 may not seem much smaller than 3, but because the difference is in the exponent, the impact on running time is significant. In fact, Strassen’s algorithm beats the ordinary algorithm on today’s machines for n ≥ 32 or so.

Best to date (of theoretical interest only): Θ(n^2.376…).
© 2001–4 by Charles E. Leiserson
September 15, 2004 Introduction to Algorithms L3.46
VLSI layout

Problem: Embed a complete binary tree with n leaves in a grid using minimal area.

(The figure shows the naive embedding drawn in an H(n) × W(n) bounding box.)

  H(n) = H(n/2) + Θ(1) = Θ(lg n)
  W(n) = 2W(n/2) + Θ(1) = Θ(n)

  Area = Θ(n lg n)

© 2001–4 by Charles E. Leiserson
September 15, 2004 Introduction to Algorithms L3.51
H-tree embedding

(The figure shows an L(n) × L(n) layout built from four L(n/4) × L(n/4) H-trees joined by Θ(1) wiring.)

  L(n) = 2L(n/4) + Θ(1) = Θ(√n)

  Area = Θ(n)
© 2001–4 by Charles E. Leiserson
September 15, 2004 Introduction to Algorithms L3.54
Conclusion
• Divide and conquer is just one of several powerful techniques for algorithm design.
• Divide-and-conquer algorithms can be analyzed using recurrences and the master method (so practice this math).
• The divide-and-conquer strategy often leads to efficient algorithms.
© 2001–4 by Charles E. Leiserson
Introduction to Algorithms6.046J/18.401J
Lecture 4Prof. Piotr Indyk
September 20, 2004 (c) Piotr Indyk & Charles Leiserson L4.2
Today
• Randomized algorithms: algorithms that flip coins
  – Matrix product checker: is AB = C?
  – Quicksort:
    • Example of divide and conquer
    • Fast and practical sorting algorithm
    • Other applications on Wednesday
September 20, 2004 (c) Piotr Indyk & Charles Leiserson L4.3
Randomized Algorithms
• Algorithms that make random decisions
• That is:
  – Can generate a random number x from some range 1…R
  – Make decisions based on the value of x
• Why would it make sense?
September 20, 2004 (c) Piotr Indyk & Charles Leiserson L4.4
Two cups, one coin

• An adversary hides a $1 coin under one of two cups; you lift one cup and keep whatever you find.
• If you always choose a fixed cup, the adversary will put the coin in the other one, so the expected payoff = $0
• If you choose a random cup, the expected payoff = $0.5
September 20, 2004 (c) Piotr Indyk & Charles Leiserson L4.5
Randomized Algorithms
• Two basic types:
  – Typically fast (but sometimes slow): Las Vegas
  – Typically correct (but sometimes output garbage): Monte Carlo
• The probabilities are defined by the random numbers of the algorithm! (not by random choices of the problem instance)
September 20, 2004 (c) Piotr Indyk & Charles Leiserson L4.6
Matrix Product
• Compute C = A×B
  – Simple algorithm: O(n³) time
  – Multiply two 2×2 matrices using 7 mults → O(n^2.81…) time [Strassen’69]
  – Multiply two 70×70 matrices using 143640 multiplications → O(n^2.795…) time [Pan’78]
  – …
  – O(n^2.376…) [Coppersmith-Winograd]
September 20, 2004 (c) Piotr Indyk & Charles Leiserson L4.7
Matrix Product Checker
• Given: n×n matrices A, B, C
• Goal: is A×B = C?
• We will see an O(n²) algorithm that:
  – If answer = YES, then Pr[output = YES] = 1
  – If answer = NO, then Pr[output = YES] ≤ ½
September 20, 2004 (c) Piotr Indyk & Charles Leiserson L4.8
The algorithm
• Algorithm:
  – Choose a random binary vector x[1…n], such that Pr[xi = 1] = ½ independently for i = 1…n
  – Check if ABx = Cx
• Does it run in O(n²) time?
  – YES, because ABx = A(Bx)
September 20, 2004 (c) Piotr Indyk & Charles Leiserson L4.9
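A sketch of the checker (often attributed to Freivalds) with illustrative names; the `trials` parameter implements the amplification idea discussed on the later "reduce ½ to ¼" slide:

```python
import random

def product_check(A, B, C, trials=10):
    """Randomized check whether A*B == C for n x n matrices (nested lists).

    Each trial picks a random 0/1 vector x and compares ABx with Cx,
    computing ABx as A(Bx) so one trial costs only O(n^2).
    If A*B != C, a single trial wrongly says YES with probability <= 1/2,
    so `trials` independent trials err with probability <= 2**(-trials).
    """
    n = len(A)
    for _ in range(trials):
        x = [random.randint(0, 1) for _ in range(n)]
        Bx = [sum(B[i][j] * x[j] for j in range(n)) for i in range(n)]
        ABx = [sum(A[i][j] * Bx[j] for j in range(n)) for i in range(n)]
        Cx = [sum(C[i][j] * x[j] for j in range(n)) for i in range(n)]
        if ABx != Cx:
            return False      # certainly A*B != C
    return True               # probably A*B == C

A, B = [[1, 2], [3, 4]], [[5, 6], [7, 8]]
print(product_check(A, B, [[19, 22], [43, 50]]))   # True: correct product
```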
Correctness
• Let D = AB; we need to check if D = C
• What if D = C?
  – Then Dx = Cx, so the output is YES
• What if D ≠ C?
  – Presumably there exists x such that Dx ≠ Cx
  – We need to show there are many such x
September 20, 2004 (c) Piotr Indyk & Charles Leiserson L4.10
Vector product
• Consider vectors d ≠ c (say, d_i ≠ c_i)
• Choose a random binary x
• We have dx = cx iff (d – c)x = 0
• Pr[(d – c)x = 0] = ?

  (d – c) ⋅ x = Σ_{j≠i} (d_j – c_j) x_j + (d_i – c_i) x_i
September 20, 2004 (c) Piotr Indyk & Charles Leiserson L4.12
Analysis, ctd.
• If x_i = 0, then (d – c)x = S₁, where S₁ = Σ_{j≠i} (d_j – c_j) x_j
• If x_i = 1, then (d – c)x = S₂ = S₁ + (d_i – c_i) ≠ S₁
• So, at least one of the two choices gives (d – c)x ≠ 0
  → Pr[cx = dx] ≤ ½
September 20, 2004 (c) Piotr Indyk & Charles Leiserson L4.13
Matrix Product Checker
• Is A×B = C?
• We have an algorithm that:
  – If answer = YES, then Pr[output = YES] = 1
  – If answer = NO, then Pr[output = YES] ≤ ½
• What if we want to reduce ½ to ¼?
  – Run the algorithm twice, using independent random numbers
  – Output YES only if both runs say YES
• Analysis:
  – If answer = YES, then Pr[output1 = YES, output2 = YES] = 1
  – If answer = NO, then
      Pr[output = YES] = Pr[output1 = YES, output2 = YES]
                       = Pr[output1 = YES] ⋅ Pr[output2 = YES] ≤ ¼
September 20, 2004 (c) Piotr Indyk & Charles Leiserson L4.14
Quicksort
• Proposed by C.A.R. Hoare in 1962.
• Divide-and-conquer algorithm.
• Sorts “in place” (like insertion sort, but not like merge sort).
• Very practical (with tuning).
• Can be viewed as a randomized Las Vegas algorithm.
September 20, 2004 (c) Piotr Indyk & Charles Leiserson L4.15
Divide and conquer

Quicksort an n-element array:
1. Divide: Partition the array into two subarrays around a pivot x such that elements in lower subarray ≤ x ≤ elements in upper subarray.
2. Conquer: Recursively sort the two subarrays.
3. Combine: Trivial.

  [ ≤ x | x | ≥ x ]

Key: Linear-time partitioning subroutine.
September 20, 2004 (c) Piotr Indyk & Charles Leiserson L4.16
Pseudocode for quicksort

QUICKSORT(A, p, r)
  if p < r
    then q ← PARTITION(A, p, r)
         QUICKSORT(A, p, q–1)
         QUICKSORT(A, q+1, r)

Initial call: QUICKSORT(A, 1, n)
September 20, 2004 (c) Piotr Indyk & Charles Leiserson L4.17
Partitioning subroutine

PARTITION(A, p, r)   ⊳ A[p . . r]
  x ← A[p]           ⊳ pivot = A[p]
  i ← p
  for j ← p + 1 to r
    do if A[j] ≤ x
         then i ← i + 1
              exchange A[i] ↔ A[j]
  exchange A[p] ↔ A[i]
  return i

Invariant:  [ ≤ x | ≥ x | ? ]   over positions p…i, i+1…j–1, j…r
September 20, 2004 (c) Piotr Indyk & Charles Leiserson L4.18
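The two routines translate directly to 0-indexed Python (a sketch, not a tuned implementation):

```python
def partition(A, p, r):
    """Partition A[p..r] around pivot x = A[p]; return the pivot's final index."""
    x = A[p]
    i = p
    for j in range(p + 1, r + 1):
        if A[j] <= x:
            i += 1
            A[i], A[j] = A[j], A[i]
    A[p], A[i] = A[i], A[p]       # put the pivot between the two regions
    return i

def quicksort(A, p=0, r=None):
    """Sort A[p..r] in place."""
    if r is None:
        r = len(A) - 1
    if p < r:
        q = partition(A, p, r)
        quicksort(A, p, q - 1)
        quicksort(A, q + 1, r)

A = [6, 10, 13, 5, 8, 3, 2, 11]
quicksort(A)
print(A)   # [2, 3, 5, 6, 8, 10, 11, 13]
```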
Example of partitioning (pivot x = 6):

  6 10 13  5  8  3  2 11
  6  5 13 10  8  3  2 11
  6  5  3 10  8 13  2 11
  6  5  3  2  8 13 10 11
  2  5  3  6  8 13 10 11   ⊳ final exchange A[p] ↔ A[i]; return i
September 20, 2004 (c) Piotr Indyk & Charles Leiserson L4.30
Analysis of quicksort

• Assume all input elements are distinct.
• In practice, there are better partitioning algorithms for when duplicate input elements may exist.
• What is the worst-case running time of Quicksort?
September 20, 2004 (c) Piotr Indyk & Charles Leiserson L4.31
Worst-case of quicksort
• Input sorted or reverse sorted.
• Partition around min or max element.
• One side of partition always has no elements.

T(n) = T(0) + T(n–1) + Θ(n)
     = Θ(1) + T(n–1) + Θ(n)
     = T(n–1) + Θ(n)
     = Θ(n²)    (arithmetic series)
September 20, 2004 (c) Piotr Indyk & Charles Leiserson L4.32
Worst-case recursion tree

T(n) = T(0) + T(n–1) + cn

The tree degenerates into a path of height h = n: the root costs cn, with children T(0) = Θ(1) and T(n–1); the latter expands to c(n–1), then c(n–2), and so on down to Θ(1), with a Θ(1) leaf hanging off each level.

Total: Θ( Σ_{k=1}^{n} k ) = Θ(n²)
September 20, 2004 (c) Piotr Indyk & Charles Leiserson L4.39
Nice-case analysis

If we’re lucky, PARTITION splits the array evenly:

  T(n) = 2T(n/2) + Θ(n)
       = Θ(n lg n)    (same as merge sort)

What if the split is always 1/10 : 9/10 ?

  T(n) = T(n/10) + T(9n/10) + Θ(n)
September 20, 2004 (c) Piotr Indyk & Charles Leiserson L4.40
Analysis of nice case

Recursion tree for T(n) = T(n/10) + T(9n/10) + cn:

  level 0:  cn                                               = cn
  level 1:  (1/10)cn + (9/10)cn                              = cn
  level 2:  (1/100)cn + (9/100)cn + (9/100)cn + (81/100)cn   = cn
  …
  Θ(1) leaves appear from depth log₁₀ n (leftmost branch) down to depth log_{10/9} n (rightmost branch).

Every full level contributes cn, so

  cn log₁₀ n ≤ T(n) ≤ cn log_{10/9} n + O(n)
September 20, 2004 (c) Piotr Indyk & Charles Leiserson L4.45
Randomized quicksort

• Partition around a random element, i.e., around A[t], where t is chosen uniformly at random from p…r
• We will show that the expected time is O(n log n)
September 20, 2004 (c) Piotr Indyk & Charles Leiserson L4.46
“Paranoid” quicksort

• We modify the algorithm to make it easier to analyze:
  Repeat:
    • Choose the pivot to be a random element of the array
    • Perform PARTITION
  Until the resulting split is “lucky”, i.e., not worse than 1/10 : 9/10
  Recurse on both sub-arrays
September 20, 2004 (c) Piotr Indyk & Charles Leiserson L4.47
Analysis

• Let T(n) be an upper bound on the expected running time on any array of n elements
• Consider any input of size n
• The time needed to sort the input is bounded from above by the sum of:
  • The time needed to sort the left subarray
  • The time needed to sort the right subarray
  • The number of iterations until we get a lucky split, times cn
September 20, 2004 (c) Piotr Indyk & Charles Leiserson L4.48
Expectations
• By linearity of expectation:

  T(n) ≤ max_i [ T(i) + T(n–i) ] + E[#partitions] ⋅ cn

  where the maximum is taken over i ∈ [n/10, 9n/10]
• We will show that E[#partitions] is ≤ 10/8
• Therefore:

  T(n) ≤ max_i [ T(i) + T(n–i) ] + 2cn,   i ∈ [n/10, 9n/10]
September 20, 2004 (c) Piotr Indyk & Charles Leiserson L4.49
Final bound
• Can use the recursion tree argument:• Tree depth is Θ(log n)• Total expected work at each level is at most 10/8 cn• The total expected time is Ο(n log n)
September 20, 2004 (c) Piotr Indyk & Charles Leiserson L4.50
Lucky partitions
• The probability that a random pivot induces lucky partition is at least 8/10(we are not lucky if the pivot happens to be among the smallest/largest n/10 elements)
• If we flip a coin, with heads prob. p=8/10 , the expected waiting time for the first head is 1/p = 10/8
September 20, 2004 (c) Piotr Indyk & Charles Leiserson L4.51
Quicksort in practice
• Quicksort is a great general-purpose sorting algorithm.
• Quicksort is typically over twice as fast as merge sort.
• Quicksort can benefit substantially from code tuning.
• Quicksort behaves well even with caching and virtual memory.
• Quicksort is great!
September 20, 2004 (c) Piotr Indyk & Charles Leiserson L4.52
More intuition

Suppose we alternate lucky, unlucky, lucky, unlucky, lucky, ….

  L(n) = 2U(n/2) + Θ(n)    lucky
  U(n) = L(n – 1) + Θ(n)   unlucky

Solving:

  L(n) = 2(L(n/2 – 1) + Θ(n/2)) + Θ(n)
       = 2L(n/2 – 1) + Θ(n)
       = Θ(n lg n)    Lucky!

How can we make sure we are usually lucky?
September 20, 2004 (c) Piotr Indyk & Charles Leiserson L4.53
Randomized quicksort analysis
Let T(n) = the random variable for the running time of randomized quicksort on an input of size n, assuming random numbers are independent.

For k = 0, 1, …, n–1, define the indicator random variable

  X_k = 1 if PARTITION generates a k : n–k–1 split,
        0 otherwise.

E[X_k] = Pr{X_k = 1} = 1/n, since all splits are equally likely, assuming elements are distinct.
September 20, 2004 (c) Piotr Indyk & Charles Leiserson L4.54
Analysis (continued)
T(n) = T(0) + T(n–1) + Θ(n)    if 0 : n–1 split,
       T(1) + T(n–2) + Θ(n)    if 1 : n–2 split,
       ⋮
       T(n–1) + T(0) + Θ(n)    if n–1 : 0 split

     = Σ_{k=0}^{n–1} X_k ( T(k) + T(n–k–1) + Θ(n) ) .
September 20, 2004 (c) Piotr Indyk & Charles Leiserson L4.55
Calculating expectation

Take expectations of both sides:

E[T(n)] = E[ Σ_{k=0}^{n–1} X_k ( T(k) + T(n–k–1) + Θ(n) ) ]

        = Σ_{k=0}^{n–1} E[ X_k ( T(k) + T(n–k–1) + Θ(n) ) ]
          ⊳ linearity of expectation

        = Σ_{k=0}^{n–1} E[X_k] ⋅ E[ T(k) + T(n–k–1) + Θ(n) ]
          ⊳ independence of X_k from other random choices

        = (1/n) Σ_{k=0}^{n–1} E[T(k)] + (1/n) Σ_{k=0}^{n–1} E[T(n–k–1)] + (1/n) Σ_{k=0}^{n–1} Θ(n)
          ⊳ linearity of expectation; E[X_k] = 1/n

        = (2/n) Σ_{k=1}^{n–1} E[T(k)] + Θ(n)
          ⊳ the two summations have identical terms
September 20, 2004 (c) Piotr Indyk & Charles Leiserson L4.60
Hairy recurrence
E[T(n)] = (2/n) Σ_{k=2}^{n–1} E[T(k)] + Θ(n)

(The k = 0, 1 terms can be absorbed in the Θ(n).)

Prove: E[T(n)] ≤ an lg n for constant a > 0.

Use fact:  Σ_{k=2}^{n–1} k lg k ≤ (1/2) n² lg n – (1/8) n²   (exercise).

• Choose a large enough so that an lg n dominates E[T(n)] for sufficiently small n ≥ 2.
September 20, 2004 (c) Piotr Indyk & Charles Leiserson L4.61
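The "use fact" bound can be spot-checked numerically for small n (the exercise asks for a proof; this only checks instances):

```python
from math import log2

def fact_holds(n):
    """Check sum_{k=2}^{n-1} k lg k <= (1/2) n^2 lg n - (1/8) n^2."""
    lhs = sum(k * log2(k) for k in range(2, n))
    rhs = 0.5 * n * n * log2(n) - n * n / 8
    return lhs <= rhs

print(all(fact_holds(n) for n in range(2, 300)))   # True
```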
Substitution method

Substitute the inductive hypothesis E[T(k)] ≤ ak lg k:

E[T(n)] ≤ (2/n) Σ_{k=2}^{n–1} ak lg k + Θ(n)

        ≤ (2a/n) ( (1/2) n² lg n – (1/8) n² ) + Θ(n)
          ⊳ use fact

        = an lg n – ( an/4 – Θ(n) )
          ⊳ express as desired – residual

        ≤ an lg n ,

if a is chosen large enough so that an/4 dominates the Θ(n).
September 20, 2004 (c) Piotr Indyk & Charles Leiserson L4.65
September 20, 2004 (c) Piotr Indyk & Charles Leiserson L4.66
Randomized Algorithms
• Algorithms that make decisions based on random coin flips.
• Can “fool” the adversary.
• The running time (or even correctness) is a random variable; we measure the expected running time.
• We assume all random choices are independent.
• This is not the average case!
Introduction to Algorithms6.046J/18.401J
Lecture 5Prof. Piotr Indyk
Introduction to Algorithms September 22, 2004 L5.2© Piotr Indyk and Charles Leiserson
Today
• Order statistics (e.g., finding median)• Two O(n) time algorithms:
– Randomized: similar to Quicksort– Deterministic: quite tricky
• Both are examples of divide and conquer
Introduction to Algorithms September 22, 2004 L5.3© Piotr Indyk and Charles Leiserson
Order statistics

Select the ith smallest of n elements (the element with rank i).
• i = 1: minimum;
• i = n: maximum;
• i = ⌊(n+1)/2⌋ or ⌈(n+1)/2⌉: median.

How fast can we solve the problem?
• Min/max: O(n)
• General i: O(n log n) by sorting
• We will see how to do it in O(n) time
Introduction to Algorithms September 22, 2004 L5.4© Piotr Indyk and Charles Leiserson
Randomized Algorithm for Finding the ith element
• Divide and Conquer Approach
• Main idea: PARTITION around a pivot x

  [ ≤ x | x | ≥ x ]
    p       q     r        k = rank(x)

• If i < k, recurse on the left
• If i > k, recurse on the right
• Otherwise, output x
Introduction to Algorithms September 22, 2004 L5.5© Piotr Indyk and Charles Leiserson
Randomized Divide-and-Conquer
RAND-SELECT(A, p, r, i)
  if p = r then return A[p]
  q ← RAND-PARTITION(A, p, r)
  k ← q – p + 1   ⊳ k = rank(A[q])
  if i = k then return A[q]
  if i < k
    then return RAND-SELECT(A, p, q – 1, i)
    else return RAND-SELECT(A, q + 1, r, i – k)

  [ ≤ A[q] | A[q] | ≥ A[q] ]   (p … q … r)
Introduction to Algorithms September 22, 2004 L5.6© Piotr Indyk and Charles Leiserson
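RAND-SELECT in 0-indexed Python (a sketch; `partition` is the subroutine from the quicksort lecture, repeated here so the example is self-contained, and a random pivot swap stands in for RAND-PARTITION):

```python
import random

def partition(A, p, r):
    """Partition A[p..r] around pivot x = A[p]; return the pivot's final index."""
    x, i = A[p], p
    for j in range(p + 1, r + 1):
        if A[j] <= x:
            i += 1
            A[i], A[j] = A[j], A[i]
    A[p], A[i] = A[i], A[p]
    return i

def rand_select(A, p, r, i):
    """Return the ith smallest (1-indexed) element of A[p..r]; expected O(n)."""
    if p == r:
        return A[p]
    t = random.randint(p, r)      # RAND-PARTITION: pick a random pivot position
    A[p], A[t] = A[t], A[p]
    q = partition(A, p, r)
    k = q - p + 1                 # k = rank(A[q]) within A[p..r]
    if i == k:
        return A[q]
    if i < k:
        return rand_select(A, p, q - 1, i)
    return rand_select(A, q + 1, r, i - k)

A = [6, 10, 13, 5, 8, 3, 2, 11]
print(rand_select(A[:], 0, len(A) - 1, 7))   # 7th smallest = 11
```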
Example

Select the i = 7th smallest:

  6 10 13 5 8 3 2 11    (pivot = 6)

Partition:

  2 5 3 6 8 13 10 11    k = 4

Select the 7 – 4 = 3rd smallest recursively in the upper subarray.
Introduction to Algorithms September 22, 2004 L5.7© Piotr Indyk and Charles Leiserson
Analysis
• Recall that a lucky partition splits into arrays with size ratio at most 9 : 1
• What if all partitions are lucky?

  Lucky:  T(n) = T(9n/10) + Θ(n)
          n^(log_{10/9} 1) = n^0 = 1 ⇒ CASE 3 ⇒ T(n) = Θ(n)

• What is the worst-case running time?

  Unlucky:  T(n) = T(n – 1) + Θ(n)
                 = Θ(n²)    (arithmetic series)
Introduction to Algorithms September 22, 2004 L5.8© Piotr Indyk and Charles Leiserson
Expected Running Time
• The probability that a random pivot induces a lucky partition is at least 8/10 (Lecture 4)
• Let t_i be the number of partitions performed between the (i–1)th and the ith lucky partition
• The total time is at most
    T = t₁ n + t₂ (9/10) n + t₃ (9/10)² n + …
• The total expected time is at most:
    E[T] = E[t₁] n + E[t₂] (9/10) n + E[t₃] (9/10)² n + …
         = 10/8 ⋅ [ n + (9/10)n + … ]
         = O(n)
Introduction to Algorithms September 22, 2004 L5.9© Piotr Indyk and Charles Leiserson
Digression: 9 to 1
• Do we need to define the lucky partition as 9 : 1 balanced?
• No. It suffices to say that both sides have size ≥ αn, for 0 < α < ½
• The probability of getting a lucky partition is 1 – 2α
Introduction to Algorithms September 22, 2004 L5.10© Piotr Indyk and Charles Leiserson
How Does it Work In Practice?
• Need 7 volunteers (a.k.a. elements)• Will choose the median according to height
Introduction to Algorithms September 22, 2004 L5.12© Piotr Indyk and Charles Leiserson
Summary of randomized order-statistic selection
• Works fast: linear expected time.
• Excellent algorithm in practice.
• But, the worst case is very bad: Θ(n²).

Q. Is there an algorithm that runs in linear time in the worst case?
A. Yes, due to [Blum-Floyd-Pratt-Rivest-Tarjan’73].

IDEA: Generate a good pivot recursively.
Introduction to Algorithms September 22, 2004 L5.13© Piotr Indyk and Charles Leiserson
Worst-case linear-time order statistics
SELECT(i, n)
1. Divide the n elements into groups of 5. Find the median of each 5-element group by hand.
2. Recursively SELECT the median x of the ⌈n/5⌉ group medians to be the pivot.
3. Partition around the pivot x. Let k = rank(x).
4. if i = k then return x
   elseif i < k
     then recursively SELECT the ith smallest element in the lower part
     else recursively SELECT the (i–k)th smallest element in the upper part

(Step 4 is the same as in RAND-SELECT.)
Introduction to Algorithms September 22, 2004 L5.14© Piotr Indyk and Charles Leiserson
Choosing the pivot

1. Divide the n elements into groups of 5. Find the median of each 5-element group by rote.
2. Recursively SELECT the median x of the ⌈n/5⌉ group medians to be the pivot.

(In the original figure, each group is drawn as a column of 5 with its lesser elements on one side of the group median and its greater elements on the other; x is the median of the column medians.)
Introduction to Algorithms September 22, 2004 L5.18© Piotr Indyk and Charles Leiserson
Analysis

At least half the group medians are ≤ x, which is at least ⌊⌈n/5⌉/2⌋ = ⌊n/10⌋ group medians.
• Therefore, at least 3⌊n/10⌋ elements are ≤ x.
• Similarly, at least 3⌊n/10⌋ elements are ≥ x.
Introduction to Algorithms September 22, 2004 L5.21© Piotr Indyk and Charles Leiserson
Developing the recurrence

SELECT(i, n)
1. Divide the n elements into groups of 5. Find the median of each 5-element group by rote.   ⊳ Θ(n)
2. Recursively SELECT the median x of the ⌈n/5⌉ group medians to be the pivot.                ⊳ T(n/5)
3. Partition around the pivot x. Let k = rank(x).                                             ⊳ Θ(n)
4. if i = k then return x
   elseif i < k then recursively SELECT the ith smallest element in the lower part
   else recursively SELECT the (i–k)th smallest element in the upper part                     ⊳ T(7n/10)
Introduction to Algorithms September 22, 2004 L5.22© Piotr Indyk and Charles Leiserson
Solving the recurrence

T(n) = T(n/5) + T(7n/10) + Θ(n)

Substitution: assume T(n) ≤ cn.

T(n) ≤ c(n/5) + c(7n/10) + Θ(n)
     = (18/20) cn + Θ(n)
     = cn – ( (2/20) cn – Θ(n) )
     ≤ cn ,

if c is chosen large enough to handle the Θ(n).
Introduction to Algorithms September 22, 2004 L5.23© Piotr Indyk and Charles Leiserson
Minor simplification

• For n ≥ 50, we have 3⌊n/10⌋ ≥ n/4.
• Therefore, for n ≥ 50 the recursive call to SELECT in Step 4 is executed on ≤ 3n/4 elements.
• Thus, the recurrence for the running time can assume that Step 4 takes time T(3n/4) in the worst case.
• For n < 50, we know that the worst-case time is T(n) = Θ(1).
Introduction to Algorithms September 22, 2004 L5.24© Piotr Indyk and Charles Leiserson
Conclusions
• Since the work at each level of recursion is a constant fraction (18/20) of the level above, the work per level forms a geometric series dominated by the linear work at the root.
• In practice, this algorithm runs slowly, because the constant in front of n is large.
• The randomized algorithm is far more practical.
Exercise: Why not divide into groups of 3?
Introduction to Algorithms
6.046J/18.401J
Lecture 6
Prof. Piotr Indyk
Introduction to Algorithms September 27, 2004 L6.2© Charles E. Leiserson and Piotr Indyk
Today: sorting
• Show that Θ(n lg n) is the best possible running time for a sorting algorithm.
• Design an algorithm that sorts in O(n) time.• Hint: different models ?
Introduction to Algorithms September 27, 2004 L6.3© Charles E. Leiserson and Piotr Indyk
Comparison sort
All the sorting algorithms we have seen so far are comparison sorts: they use only comparisons to determine the relative order of elements.
• E.g., insertion sort, merge sort, quicksort,
heapsort.
Introduction to Algorithms September 27, 2004 L6.4© Charles E. Leiserson and Piotr Indyk
Partitioning subroutine
PARTITION(A, p, r)  ⊳ A[p . . r]
x ← A[p]  ⊳ pivot = A[p]
i ← p
for j ← p + 1 to r
    do if A[j] ≤ x
         then i ← i + 1
              exchange A[i] ↔ A[j]
exchange A[p] ↔ A[i]
return i
Invariant: x = A[p]; elements in A[p+1 . . i] are ≤ x; elements in A[i+1 . . j–1] are ≥ x; A[j . . r] are not yet examined.
Introduction to Algorithms September 27, 2004 L6.5© Charles E. Leiserson and Piotr Indyk
Comparison sort
• All of our algorithms used comparisons.
• All of our algorithms have worst-case running time Ω(n lg n).
• Is that the best we can do using just comparisons?
• Answer: YES, via decision trees.
Introduction to Algorithms September 27, 2004 L6.6© Charles E. Leiserson and Piotr Indyk
Decision-tree example
1:2
2:3
123   1:3
132   312
1:3
213   2:3
231   321
Each internal node is labeled i:j for i, j ∈ {1, 2, …, n}.
• The left subtree shows subsequent comparisons if ai ≤ aj.
• The right subtree shows subsequent comparisons if ai ≥ aj.
Sort ⟨a1, a2, …, an⟩  (n = 3)
Introduction to Algorithms September 27, 2004 L6.7© Charles E. Leiserson and Piotr Indyk
Decision-tree example
1:2
2:3
123   1:3
132   312
1:3
213   2:3
231   321
Each internal node is labeled i:j for i, j ∈ {1, 2, …, n}.
• The left subtree shows subsequent comparisons if ai ≤ aj.
• The right subtree shows subsequent comparisons if ai ≥ aj.
Sort ⟨a1, a2, a3⟩ = ⟨9, 4, 6⟩:
Introduction to Algorithms September 27, 2004 L6.8© Charles E. Leiserson and Piotr Indyk
Decision-tree example
1:2
2:3
123   1:3
132   312
1:3
213   2:3
231   321
Each internal node is labeled i:j for i, j ∈ {1, 2, …, n}.
• The left subtree shows subsequent comparisons if ai ≤ aj.
• The right subtree shows subsequent comparisons if ai ≥ aj.
Sort ⟨a1, a2, a3⟩ = ⟨9, 4, 6⟩:  9 ≥ 4
Introduction to Algorithms September 27, 2004 L6.9© Charles E. Leiserson and Piotr Indyk
Decision-tree example
1:2
2:3
123   1:3
132   312
1:3
213   2:3
231   321
Each internal node is labeled i:j for i, j ∈ {1, 2, …, n}.
• The left subtree shows subsequent comparisons if ai ≤ aj.
• The right subtree shows subsequent comparisons if ai ≥ aj.
Sort ⟨a1, a2, a3⟩ = ⟨9, 4, 6⟩:  9 ≥ 6
Introduction to Algorithms September 27, 2004 L6.10© Charles E. Leiserson and Piotr Indyk
Decision-tree example
1:2
2:3
123   1:3
132   312
1:3
213   2:3
231   321
Each internal node is labeled i:j for i, j ∈ {1, 2, …, n}.
• The left subtree shows subsequent comparisons if ai ≤ aj.
• The right subtree shows subsequent comparisons if ai ≥ aj.
Sort ⟨a1, a2, a3⟩ = ⟨9, 4, 6⟩:  4 ≤ 6
Introduction to Algorithms September 27, 2004 L6.11© Charles E. Leiserson and Piotr Indyk
Decision-tree example
1:2
2:3
123   1:3
132   312
1:3
213   2:3
231   321
Each leaf contains a permutation ⟨π(1), π(2), …, π(n)⟩ to indicate that the ordering aπ(1) ≤ aπ(2) ≤ ⋯ ≤ aπ(n) has been established.
Sort ⟨a1, a2, a3⟩ = ⟨9, 4, 6⟩:  4 ≤ 6 ≤ 9
Introduction to Algorithms September 27, 2004 L6.12© Charles E. Leiserson and Piotr Indyk
Decision-tree model
A decision tree can model the execution of any comparison sort:
• One tree for each input size n.
• View the algorithm as splitting whenever it compares two
elements.• The tree contains the comparisons along all possible
instruction traces.• The number of comparisons done by the algorithm on a given
input = the length of the path taken.
• Worst-case number of comparisons = max path length = height of tree.
• Worst-case time ≥ worst-case number of comparisons
Introduction to Algorithms September 27, 2004 L6.13© Charles E. Leiserson and Piotr Indyk
Lower bound for decision-tree sorting
Theorem. Any decision tree that can sort n elements must have height Ω(n lg n) .
Corollary. Any comparison sorting algorithm has worst-case running time Ω(n lg n).
Corollary 2. Merge sort and heapsort are asymptotically optimal comparison sorting algorithms.
Introduction to Algorithms September 27, 2004 L6.14© Charles E. Leiserson and Piotr Indyk
Lower bound for decision-tree sorting
Theorem. Any decision tree that can sort n elements must have height Ω(n lg n).
Proof.
• The tree must contain ≥ n! leaves, since there are n! possible permutations.
• A height-h binary tree has ≤ 2^h leaves.
• Thus, 2^h ≥ #leaves ≥ n!, or h ≥ lg(n!).
Introduction to Algorithms September 27, 2004 L6.15© Charles E. Leiserson and Piotr Indyk
Proof, ctd.
2^h ≥ n! ≥ n·(n–1)⋯(n/2) ≥ (n/2)^(n/2)
⇒ h ≥ lg((n/2)^(n/2)) = (n/2)(lg n – lg 2) = Ω(n lg n).
Introduction to Algorithms September 27, 2004 L6.16© Charles E. Leiserson and Piotr Indyk
Example: sorting 3 elements
Recall h ≥ lg(n!)
• n = 3
• n! = 6
• lg 6 ≈ 2.58
• Sorting 3 elements requires ≥ 3 comparisons in the worst case.
Introduction to Algorithms September 27, 2004 L6.17© Charles E. Leiserson and Piotr Indyk
Decision-tree for n=3
1:2
2:3
123   1:3
132   312
1:3
213   2:3
231   321
Sort ⟨a1, a2, a3⟩
Introduction to Algorithms September 27, 2004 L6.18© Charles E. Leiserson and Piotr Indyk
Sorting in linear time
Counting sort: No comparisons between elements.
• Input: A[1 . . n], where A[j] ∈ {1, 2, …, k}.
• Output: B[1 . . n], sorted*
• Auxiliary storage: C[1 . . k].
*Actually, we require the algorithm to construct a permutation of the inputarray A that produces the sorted array B. This permutation can be obtainedby making small changes to the last loop of the algorithm.
Introduction to Algorithms September 27, 2004 L6.19© Charles E. Leiserson and Piotr Indyk
Counting sort
for i ← 1 to k
    do C[i] ← 0
for j ← 1 to n
    do C[A[j]] ← C[A[j]] + 1  ⊳ C[i] = |{key = i}|
for i ← 2 to k
    do C[i] ← C[i] + C[i–1]  ⊳ C[i] = |{key ≤ i}|
for j ← n downto 1
    do B[C[A[j]]] ← A[j]
       C[A[j]] ← C[A[j]] – 1
Introduction to Algorithms September 27, 2004 L6.20© Charles E. Leiserson and Piotr Indyk
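The four loops above translate almost line for line into Python; a minimal sketch with 0-indexed lists standing in for the pseudocode's 1-indexed arrays:

```python
def counting_sort(A, k):
    """Stable counting sort of A, whose keys come from {1, 2, ..., k}."""
    n = len(A)
    C = [0] * (k + 1)              # C[1..k]; slot 0 is unused padding
    for key in A:                  # loop 2: C[i] = |{key = i}|
        C[key] += 1
    for i in range(2, k + 1):      # loop 3: C[i] = |{key <= i}|
        C[i] += C[i - 1]
    B = [None] * n
    for j in range(n - 1, -1, -1):   # loop 4: scan right to left (stability)
        B[C[A[j]] - 1] = A[j]        # C holds 1-based output positions
        C[A[j]] -= 1
    return B
```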
Counting-sort example
A: 4 1 3 4 3
B:
1 2 3 4 5
C:
1 2 3 4
Introduction to Algorithms September 27, 2004 L6.21© Charles E. Leiserson and Piotr Indyk
Loop 1
A: 4 1 3 4 3
B:
1 2 3 4 5
C: 0 0 0 0
1 2 3 4
for i ← 1 to kdo C[i] ← 0
Introduction to Algorithms September 27, 2004 L6.22© Charles E. Leiserson and Piotr Indyk
Loop 2
A: 4 1 3 4 3
B:
1 2 3 4 5
C: 0 0 0 1
1 2 3 4
for j ← 1 to ndo C[A[ j]] ← C[A[ j]] + 1 ⊳ C[i] = |key = i|
Introduction to Algorithms September 27, 2004 L6.23© Charles E. Leiserson and Piotr Indyk
Loop 2
A: 4 1 3 4 3
B:
1 2 3 4 5
C: 1 0 0 1
1 2 3 4
for j ← 1 to ndo C[A[ j]] ← C[A[ j]] + 1 ⊳ C[i] = |key = i|
Introduction to Algorithms September 27, 2004 L6.24© Charles E. Leiserson and Piotr Indyk
Loop 2
A: 4 1 3 4 3
B:
1 2 3 4 5
C: 1 0 1 1
1 2 3 4
for j ← 1 to ndo C[A[ j]] ← C[A[ j]] + 1 ⊳ C[i] = |key = i|
Introduction to Algorithms September 27, 2004 L6.25© Charles E. Leiserson and Piotr Indyk
Loop 2
A: 4 1 3 4 3
B:
1 2 3 4 5
C: 1 0 1 2
1 2 3 4
for j ← 1 to ndo C[A[ j]] ← C[A[ j]] + 1 ⊳ C[i] = |key = i|
Introduction to Algorithms September 27, 2004 L6.26© Charles E. Leiserson and Piotr Indyk
Loop 2
A: 4 1 3 4 3
B:
1 2 3 4 5
C: 1 0 2 2
1 2 3 4
for j ← 1 to ndo C[A[ j]] ← C[A[ j]] + 1 ⊳ C[i] = |key = i|
Introduction to Algorithms September 27, 2004 L6.27© Charles E. Leiserson and Piotr Indyk
Loop 3
A: 4 1 3 4 3
B:
1 2 3 4 5
C: 1 0 2 2
1 2 3 4
C': 1 1 2 2
for i ← 2 to kdo C[i] ← C[i] + C[i–1] ⊳ C[i] = |key ≤ i|
Introduction to Algorithms September 27, 2004 L6.28© Charles E. Leiserson and Piotr Indyk
Loop 3
A: 4 1 3 4 3
B:
1 2 3 4 5
C: 1 0 2 2
1 2 3 4
C': 1 1 3 2
for i ← 2 to kdo C[i] ← C[i] + C[i–1] ⊳ C[i] = |key ≤ i|
Introduction to Algorithms September 27, 2004 L6.29© Charles E. Leiserson and Piotr Indyk
Loop 3
A: 4 1 3 4 3
B:
1 2 3 4 5
C: 1 0 2 2
1 2 3 4
C': 1 1 3 5
for i ← 2 to kdo C[i] ← C[i] + C[i–1] ⊳ C[i] = |key ≤ i|
Introduction to Algorithms September 27, 2004 L6.30© Charles E. Leiserson and Piotr Indyk
Loop 4
A: 4 1 3 4 3
B: – – – – –
1 2 3 4 5
C': 1 1 3 5
for j ← n downto 1do B[C[A[ j]]] ← A[ j]
C[A[ j]] ← C[A[ j]] – 1
1 2 3 4
Introduction to Algorithms September 27, 2004 L6.31© Charles E. Leiserson and Piotr Indyk
Loop 4
A: 4 1 3 4 3
B: – – 3 – –
1 2 3 4 5
C': 1 1 3 5
for j ← n downto 1do B[C[A[ j]]] ← A[ j]
C[A[ j]] ← C[A[ j]] – 1
1 2 3 4
Introduction to Algorithms September 27, 2004 L6.32© Charles E. Leiserson and Piotr Indyk
Loop 4
A: 4 1 3 4 3
B: – – 3 – –
1 2 3 4 5
C': 1 1 2 5
for j ← n downto 1do B[C[A[ j]]] ← A[ j]
C[A[ j]] ← C[A[ j]] – 1
1 2 3 4
Introduction to Algorithms September 27, 2004 L6.33© Charles E. Leiserson and Piotr Indyk
Loop 4
A: 4 1 3 4 3
B: – – 3 – 4
1 2 3 4 5
C': 1 1 2 5
for j ← n downto 1do B[C[A[ j]]] ← A[ j]
C[A[ j]] ← C[A[ j]] – 1
1 2 3 4
Introduction to Algorithms September 27, 2004 L6.34© Charles E. Leiserson and Piotr Indyk
Loop 4
A: 4 1 3 4 3
B: – – 3 – 4
1 2 3 4 5
C': 1 1 2 4
for j ← n downto 1do B[C[A[ j]]] ← A[ j]
C[A[ j]] ← C[A[ j]] – 1
1 2 3 4
Introduction to Algorithms September 27, 2004 L6.35© Charles E. Leiserson and Piotr Indyk
Loop 4
A: 4 1 3 4 3
B: – 3 3 – 4
1 2 3 4 5
C': 1 1 2 4
for j ← n downto 1do B[C[A[ j]]] ← A[ j]
C[A[ j]] ← C[A[ j]] – 1
1 2 3 4
Introduction to Algorithms September 27, 2004 L6.36© Charles E. Leiserson and Piotr Indyk
Loop 4
A: 4 1 3 4 3
B: – 3 3 – 4
1 2 3 4 5
C': 1 1 1 4
for j ← n downto 1do B[C[A[ j]]] ← A[ j]
C[A[ j]] ← C[A[ j]] – 1
1 2 3 4
Introduction to Algorithms September 27, 2004 L6.37© Charles E. Leiserson and Piotr Indyk
Loop 4
A: 4 1 3 4 3
B: 1 3 3 – 4
1 2 3 4 5
C': 1 1 1 4
for j ← n downto 1do B[C[A[ j]]] ← A[ j]
C[A[ j]] ← C[A[ j]] – 1
1 2 3 4
Introduction to Algorithms September 27, 2004 L6.38© Charles E. Leiserson and Piotr Indyk
Loop 4
A: 4 1 3 4 3
B: 1 3 3 – 4
1 2 3 4 5
C': 0 1 1 4
for j ← n downto 1do B[C[A[ j]]] ← A[ j]
C[A[ j]] ← C[A[ j]] – 1
1 2 3 4
Introduction to Algorithms September 27, 2004 L6.39© Charles E. Leiserson and Piotr Indyk
Loop 4
A: 4 1 3 4 3
B: 1 3 3 4 4
1 2 3 4 5
C': 0 1 1 4
for j ← n downto 1do B[C[A[ j]]] ← A[ j]
C[A[ j]] ← C[A[ j]] – 1
1 2 3 4
Introduction to Algorithms September 27, 2004 L6.40© Charles E. Leiserson and Piotr Indyk
Loop 4
A: 4 1 3 4 3
B: 1 3 3 4 4
1 2 3 4 5
C': 0 1 1 3
for j ← n downto 1do B[C[A[ j]]] ← A[ j]
C[A[ j]] ← C[A[ j]] – 1
1 2 3 4
Introduction to Algorithms September 27, 2004 L6.41© Charles E. Leiserson and Piotr Indyk
B vs C
B: 1 3 3 4 4        C': 1 1 3 5
In the end, each element i occupies the rangeB[C[i-1]+1 … C[i]]
(B indexed 1 … 5; C' indexed 1 … 4)
Introduction to Algorithms September 27, 2004 L6.42© Charles E. Leiserson and Piotr Indyk
Analysis
for i ← 1 to k
    do C[i] ← 0                        Θ(k)
for j ← 1 to n
    do C[A[j]] ← C[A[j]] + 1           Θ(n)
for i ← 2 to k
    do C[i] ← C[i] + C[i–1]            Θ(k)
for j ← n downto 1
    do B[C[A[j]]] ← A[j]
       C[A[j]] ← C[A[j]] – 1           Θ(n)
Total: Θ(n + k)
Introduction to Algorithms September 27, 2004 L6.43© Charles E. Leiserson and Piotr Indyk
Running time
If k = O(n), then counting sort takes Θ(n) time.• But, sorting takes Ω(n lg n) time!• Why ?
Answer:• Comparison sorting takes Ω(n lg n) time.• Counting sort is not a comparison sort.• In fact, not a single comparison between
elements occurs!
Introduction to Algorithms September 27, 2004 L6.44© Charles E. Leiserson and Piotr Indyk
Stable sorting
Counting sort is a stable sort: it preserves the input order among equal elements.
A: 4 1 3 4 3
B: 1 3 3 4 4
Introduction to Algorithms September 27, 2004 L6.45© Charles E. Leiserson and Piotr Indyk
Sorting integers
• We can sort n integers from {1, 2, …, k} in O(n + k) time.
• This is nice if k = O(n).
• What if, say, k = n²?
Introduction to Algorithms September 27, 2004 L6.46© Charles E. Leiserson and Piotr Indyk
Radix sort
• Origin: Herman Hollerith’s card-sorting machine for the 1890 U.S. Census. (See Appendix .)
• Digit-by-digit sort.• Hollerith’s original (bad) idea: sort on
most-significant digit first.• Good idea: Sort on least-significant digit
first with auxiliary stable sort.
Introduction to Algorithms September 27, 2004 L6.47© Charles E. Leiserson and Piotr Indyk
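The least-significant-digit strategy can be sketched in Python with a stable counting sort per decimal digit. This is an illustrative sketch for non-negative integers; the lecture's analysis uses base-2^r digits instead of base 10:

```python
def radix_sort(A, digits):
    """LSD radix sort of non-negative ints with at most `digits` decimal
    digits, using a stable counting sort on each digit position."""
    for t in range(digits):          # least-significant digit first
        place = 10 ** t
        C = [0] * 10
        for x in A:                  # count occurrences of each digit
            C[(x // place) % 10] += 1
        for i in range(1, 10):       # prefix sums -> output positions
            C[i] += C[i - 1]
        B = [None] * len(A)
        for x in reversed(A):        # right-to-left keeps the pass stable
            d = (x // place) % 10
            B[C[d] - 1] = x
            C[d] -= 1
        A = B
    return A
```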
Operation of radix sort
input     pass 1    pass 2    pass 3
329       720       720       329
457       355       329       355
657       436       436       436
839       457       839       457
436       657       355       657
720       329       457       720
355       839       657       839
Introduction to Algorithms September 27, 2004 L6.48© Charles E. Leiserson and Piotr Indyk
• Sort on digit t
Correctness of radix sortInduction on digit position • Assume that the numbers
are sorted by their low-order t – 1 digits.
720        329
329        355
436        436
839   →    457
355        657
457        720
657        839
Introduction to Algorithms September 27, 2004 L6.49© Charles E. Leiserson and Piotr Indyk
• Sort on digit t
Correctness of radix sortInduction on digit position • Assume that the numbers
are sorted by their low-order t – 1 digits.
720        329
329        355
436        436
839   →    457
355        657
457        720
657        839
Two numbers that differ in digit t are correctly sorted.
Introduction to Algorithms September 27, 2004 L6.50© Charles E. Leiserson and Piotr Indyk
• Sort on digit t
Correctness of radix sortInduction on digit position • Assume that the numbers
are sorted by their low-order t – 1 digits.
720        329
329        355
436        436
839   →    457
355        657
457        720
657        839
Two numbers that differ in digit t are correctly sorted.Two numbers equal in digit tare put in the same order as the input ⇒ correct order.
Introduction to Algorithms September 27, 2004 L6.51© Charles E. Leiserson and Piotr Indyk
Analysis of radix sort
• Assume counting sort is the auxiliary stable sort.
• Sort n computer words of b bits each.
  E.g., if we sort elements in {1, …, n²}, then b = 2 lg n.
• Each word can be viewed as having b/r base-2^r digits.
Example: a 32-bit word split into four 8-bit pieces (8 | 8 | 8 | 8):
r = 8 ⇒ b/r = 4 passes of counting sort on base-2^8 digits; or
r = 16 ⇒ b/r = 2 passes of counting sort on base-2^16 digits.
Introduction to Algorithms September 27, 2004 L6.52© Charles E. Leiserson and Piotr Indyk
Analysis (continued)Recall: Counting sort takes Θ(n + k) time to sort n numbers in the range from 0 to k – 1.If each b-bit word is broken into r-bit pieces, each pass of counting sort takes Θ(n + 2r) time. Since there are b/r passes, we have
T(n, b) = Θ((b/r)(n + 2^r)).
Choose r to minimize T(n, b):
• Increasing r means fewer passes, but once r > lg n, the time grows exponentially.
Introduction to Algorithms September 27, 2004 L6.53© Charles E. Leiserson and Piotr Indyk
Choosing r
T(n, b) = Θ((b/r)(n + 2^r))
Minimize T(n, b) by differentiating and setting to 0. Or, just observe that we don't want 2^r > n, and there's no harm asymptotically in choosing r as large as possible subject to this constraint.
Choosing r = lg n implies T(n, b) = Θ(bn/lg n).
• For numbers in the range from 0 to n^d – 1, we have b = d lg n ⇒ radix sort runs in Θ(dn) time.
Introduction to Algorithms September 27, 2004 L6.54© Charles E. Leiserson and Piotr Indyk
Conclusions
Example (32-bit numbers):
• At most 3 passes when sorting ≥ 2000 numbers.
• Merge sort and quicksort do at least ⌈lg 2000⌉ = 11 passes.
In practice, radix sort is fast for large inputs, as well as simple to code and maintain.
Downside: Unlike quicksort, radix sort displays little locality of reference.
Introduction to Algorithms September 27, 2004 L6.55© Charles E. Leiserson and Piotr Indyk
Appendix: Punched-card technology
• Herman Hollerith (1860-1929)• Punched cards• Hollerith’s tabulating system• Operation of the sorter• Origin of radix sort• “Modern” IBM card• Web resources on punched-
card technologyReturn to last slide viewed.
Introduction to Algorithms September 27, 2004 L6.56© Charles E. Leiserson and Piotr Indyk
Herman Hollerith(1860-1929)
• The 1880 U.S. Census took almost 10 years to process.
• While a lecturer at MIT, Hollerith prototyped punched-card technology.
• His machines, including a “card sorter,” allowed the 1890 census total to be reported in 6 weeks.
• He founded the Tabulating Machine Company in 1896, which merged with other companies in 1911 and was renamed International Business Machines in 1924.
Image removed due to copyright considerations.
Introduction to Algorithms September 27, 2004 L6.57© Charles E. Leiserson and Piotr Indyk
Punched cards• Punched card = data record.• Hole = value. • Algorithm = machine + human operator.
Image removed due to copyright considerations.
Introduction to Algorithms September 27, 2004 L6.58© Charles E. Leiserson and Piotr Indyk
Hollerith’s tabulating system•Pantograph card punch
•Hand-press reader•Dial counters•Sorting box
Image removed due to copyright considerations.
Introduction to Algorithms September 27, 2004 L6.59© Charles E. Leiserson and Piotr Indyk
Operation of the sorter• An operator inserts a card into
the press.• Pins on the press reach through
the punched holes to make electrical contact with mercury-filled cups beneath the card.
• Whenever a particular digit value is punched, the lid of the corresponding sorting bin lifts.
• The operator deposits the card into the bin and closes the lid.
• When all cards have been processed, the front panel is opened, and the cards are collected in order, yielding one pass of a stable sort.
Image removed due to copyright considerations.
Introduction to Algorithms September 27, 2004 L6.60© Charles E. Leiserson and Piotr Indyk
Origin of radix sort
Hollerith’s original 1889 patent alludes to a most-significant-digit-first radix sort:
“The most complicated combinations can readily be counted with comparatively few counters or relays by first assorting the cards according to the first items entering into the combinations, then reassorting each group according to the second item entering into the combination, and so on, and finally counting on a few counters the last item of the combination for each group of cards.”
Least-significant-digit-first radix sort seems to be a folk invention originated by machine operators.
Introduction to Algorithms September 27, 2004 L6.61© Charles E. Leiserson and Piotr Indyk
“Modern” IBM card
So, that’s why text windows have 80 columns!
• One character per column.
Image removed due to copyright considerations.
Introduction to Algorithms September 27, 2004 L6.62© Charles E. Leiserson and Piotr Indyk
Web resources on punched-card technology
• Doug Jones’s punched card index• Biography of Herman Hollerith• The 1890 U.S. Census• Early history of IBM• Pictures of Hollerith’s inventions• Hollerith’s patent application (borrowed
from Gordon Bell’s CyberMuseum)• Impact of punched cards on U.S. history
Introduction to Algorithms
6.046J/18.401J
Lecture 7
Prof. Piotr Indyk
Introduction to Algorithms September 29, 2004 L7.2© Charles Leiserson and Piotr Indyk
Data Structures
• Role of data structures:– Encapsulate data– Support certain operations (e.g., INSERT,
DELETE, SEARCH)
• What data structures do we know already?
• So far, the heap:
  – INSERT(x)
  – DELETE-MIN
Introduction to Algorithms September 29, 2004 L7.3© Charles Leiserson and Piotr Indyk
Dictionary problem
Dictionary T holding n records:
[Figure: a record x; the key field key[x] plus other fields containing satellite data.]
Operations on T:• INSERT(T, x)• DELETE(T, x)• SEARCH(T, k)
How should the data structure T be organized?
Introduction to Algorithms September 29, 2004 L7.4© Charles Leiserson and Piotr Indyk
Assumptions
Assumptions:
• The set of keys is K ⊆ U = {0, 1, …, u–1}
• Keys are distinct
What can we do ?
Introduction to Algorithms September 29, 2004 L7.5© Charles Leiserson and Piotr Indyk
Direct access table
• Create a table T[0…u-1]:
• Benefit:– Each operation takes constant time
• Drawbacks:– The range of keys can be large:
• 64-bit numbers (which represent 18,446,744,073,709,551,616 different keys),
• character strings (even larger!)
T[k] = x if k ∈ K and key[x] = k,NIL otherwise.
Introduction to Algorithms September 29, 2004 L7.6© Charles Leiserson and Piotr Indyk
As each key is inserted, h maps it to a slot of T.
Hash functions
Solution: Use a hash function h to map the universe U of all keys into {0, 1, …, m–1}.
[Figure: keys k1, …, k5 ∈ K ⊆ U are hashed into slots of the table T[0 … m–1]; here h(k2) = h(k5), a collision.]
When a record to be inserted maps to an already occupied slot in T, a collision occurs.
Introduction to Algorithms September 29, 2004 L7.7© Charles Leiserson and Piotr Indyk
Collision resolution by chaining
• Records in the same slot are linked into a list.
h(49) = h(86) = h(52) = i
[Figure: slot i of T points to the chain 49 → 86 → 52.]
Introduction to Algorithms September 29, 2004 L7.8© Charles Leiserson and Piotr Indyk
Hash functions
• Designing good functions is quite non-trivial
• For now, we assume they exist. Namely, we assume simple uniform hashing:
  – Each key k ∈ K is equally likely to be hashed to any slot of table T, independent of where other keys are hashed.
Introduction to Algorithms September 29, 2004 L7.9© Charles Leiserson and Piotr Indyk
Analysis of chaining
Let n be the number of keys in the table, and let m be the number of slots.Define the load factor of T to be
α = n/m= average number of keys per slot.
Introduction to Algorithms September 29, 2004 L7.10© Charles Leiserson and Piotr Indyk
Search cost
Expected time to search for a record with a given key = Θ(1 + α):
Θ(1) to apply the hash function and access the slot, plus Θ(α) to search the collision list.
Expected search time = Θ(1) if α = O(1), or equivalently, if n = O(m).
Introduction to Algorithms September 29, 2004 L7.11© Charles Leiserson and Piotr Indyk
Other operations
• Insertion time ?– Constant: hash and add to the list
• Deletion time? Recall that we defined DELETE(T, x).
  – Also constant, if x has a pointer into the collision list and the list is doubly linked.
  – Otherwise, do SEARCH first.
Introduction to Algorithms September 29, 2004 L7.12© Charles Leiserson and Piotr Indyk
Delete
[Figure: deleting record x from the doubly linked chain 49 → 86 → 52 at slot i.]
Introduction to Algorithms September 29, 2004 L7.13© Charles Leiserson and Piotr Indyk
Dealing with wishful thinking
The assumption of simple uniform hashing is hard to guarantee, but several common techniques tend to work well in practice as long as their deficiencies can be avoided.
Desiderata:
• A good hash function should distribute the keys uniformly into the slots of the table.
• Regularity in the key distribution (e.g., an arithmetic progression) should not affect this uniformity.
Introduction to Algorithms September 29, 2004 L7.14© Charles Leiserson and Piotr Indyk
Hashing in practice
Leaving the realm of Provable
Introduction to Algorithms September 29, 2004 L7.15© Charles Leiserson and Piotr Indyk
Division method
Define h(k) = k mod m.
Extreme deficiency: If m = 2^r, then the hash doesn't even depend on all the bits of k:
• If k = 1011000111011010₂ and r = 6, then h(k) = 011010₂.
Deficiency: Don’t pick an m that has a small divisor d. A preponderance of keys that are congruent modulo d can adversely affect uniformity.
Introduction to Algorithms September 29, 2004 L7.16© Charles Leiserson and Piotr Indyk
Division method (continued)
h(k) = k mod m.
Pick m to be a prime.
Annoyance:• Sometimes, making the table size a prime is
inconvenient.But, this method is popular, although the next method we’ll see is usually superior.
Introduction to Algorithms September 29, 2004 L7.17© Charles Leiserson and Piotr Indyk
Multiplication method
Assume that all keys are integers, m = 2r, and our computer has w-bit words. Define
h(k) = (A·k mod 2w) rsh (w – r),where rsh is the “bit-wise right-shift” operator and A is an odd integer in the range 2w–1 < A < 2w.• Don’t pick A too close to 2w.• Multiplication modulo 2w is fast.• The rsh operator is fast.
Introduction to Algorithms September 29, 2004 L7.18© Charles Leiserson and Piotr Indyk
Multiplication method example
h(k) = (A·k mod 2^w) rsh (w – r)
Suppose that m = 8 = 2³ and that our computer has w = 7-bit words:

    1011001₂  (= A)
  × 1101011₂  (= k)
  = 10010100110011₂

h(k) is the high r = 3 bits of the low-order w = 7 bits of the product: 011₂.
[Figure: the "modular wheel" picture, showing A, 2A, 3A, … wrapping around a wheel of 2^w positions.]
Introduction to Algorithms September 29, 2004 L7.19© Charles Leiserson and Piotr Indyk
Back to the realm of Provable
Introduction to Algorithms September 29, 2004 L7.20© Charles Leiserson and Piotr Indyk
A weakness of hashing “as we saw it”
Problem: For any hash function h, a set of keys exists that can cause the average access time of a hash table to skyrocket.
• An adversary can pick all keys from h⁻¹(i) = {k ∈ U : h(k) = i} for a slot i.
• There is a slot i for which |h⁻¹(i)| ≥ u/m.
Introduction to Algorithms September 29, 2004 L7.21© Charles Leiserson and Piotr Indyk
Solution
• Randomize!
• Choose the hash function at random from some family of functions, independently of the keys.
• Even if an adversary can see your code, he or she cannot find a bad set of keys, since he or she doesn’t know exactly which hash function will be chosen.
• What family of functions should we select ?
Introduction to Algorithms September 29, 2004 L7.22© Charles Leiserson and Piotr Indyk
Family of hash functions
• Idea #1: Take the family of all functions h: U → {0, …, m–1}.
That is, choose each of h(0), h(1), …, h(u–1) independently at random from {0, …, m–1}.
• Benefit:– The uniform hashing assumption is true!
• Drawback:– We need u random numbers to specify h.
Where to store them ?
Introduction to Algorithms September 29, 2004 L7.23© Charles Leiserson and Piotr Indyk
Universal hashingIdea #2: Universal Hashing
• Let H be a finite collection of hash functions, each mapping U to {0, 1, …, m–1}.
• We say H is universal if for all x, y ∈ U, where x ≠ y, we have
  Pr_{h∈H} {h(x) = h(y)} = 1/m.
Introduction to Algorithms September 29, 2004 L7.24© Charles Leiserson and Piotr Indyk
Universality is good
Theorem. Let h be a hash function chosen (uniformly) at random from a universal set H of hash functions. Suppose h is used to hash n arbitrary keys into the m slots of a table T. Then, for a given key x, we have
E[#collisions with x] < n/m.
Introduction to Algorithms September 29, 2004 L7.25© Charles Leiserson and Piotr Indyk
Proof of theorem
Proof. Let C_x be the random variable denoting the total number of collisions of keys in T with x, and let
c_xy = 1 if h(x) = h(y), and 0 otherwise.
Note: E[c_xy] = 1/m and C_x = Σ_{y ∈ T–{x}} c_xy.
Introduction to Algorithms September 29, 2004 L7.26© Charles Leiserson and Piotr Indyk
Proof (continued)
E[C_x] = E[ Σ_{y ∈ T–{x}} c_xy ]        • Take expectation of both sides.
Introduction to Algorithms September 29, 2004 L7.27© Charles Leiserson and Piotr Indyk
Proof (continued)
E[C_x] = E[ Σ_{y ∈ T–{x}} c_xy ]        • Take expectation of both sides.
       = Σ_{y ∈ T–{x}} E[c_xy]          • Linearity of expectation.
Introduction to Algorithms September 29, 2004 L7.28© Charles Leiserson and Piotr Indyk
Proof (continued)
E[C_x] = E[ Σ_{y ∈ T–{x}} c_xy ]        • Take expectation of both sides.
       = Σ_{y ∈ T–{x}} E[c_xy]          • Linearity of expectation.
       = Σ_{y ∈ T–{x}} 1/m              • E[c_xy] = 1/m.
Introduction to Algorithms September 29, 2004 L7.29© Charles Leiserson and Piotr Indyk
Proof (continued)
E[C_x] = E[ Σ_{y ∈ T–{x}} c_xy ]        • Take expectation of both sides.
       = Σ_{y ∈ T–{x}} E[c_xy]          • Linearity of expectation.
       = Σ_{y ∈ T–{x}} 1/m              • E[c_xy] = 1/m.
       = (n – 1)/m                      • Algebra.
Introduction to Algorithms September 29, 2004 L7.30© Charles Leiserson and Piotr Indyk
Constructing a set of universal hash functions
• Let m be prime.
• Decompose key k into r + 1 digits, each with value in the set {0, 1, …, m–1}.
• That is, let k = ⟨k0, k1, …, kr⟩, where 0 ≤ ki < m.
Randomized strategy:
• Pick a = ⟨a0, a1, …, ar⟩ where each ai is chosen randomly from {0, 1, …, m–1}.
• Define h_a(k) = ( Σ_{i=0}^{r} a_i k_i ) mod m.
• Denote H = {h_a : a as above}.
Introduction to Algorithms September 29, 2004 L7.31© Charles Leiserson and Piotr Indyk
Universality of dot-product hash functions
Theorem. The set H = ha is universal.
Proof. Suppose that x = ⟨x0, x1, …, xr⟩ and y = ⟨y0, y1, …, yr⟩ are distinct keys. Thus, they differ in at least one digit position, wlog position 0. What is the probability that x and y collide, that is, h_a(x) = h_a(y)?
Introduction to Algorithms September 29, 2004 L7.32© Charles Leiserson and Piotr Indyk
Proof (continued)
h_a(x) = h_a(y)
⇔  Σ_{i=0}^{r} a_i x_i  ≡  Σ_{i=0}^{r} a_i y_i            (mod m)
⇔  Σ_{i=0}^{r} a_i (x_i – y_i)  ≡  0                      (mod m)
⇔  a_0 (x_0 – y_0) + Σ_{i=1}^{r} a_i (x_i – y_i)  ≡  0    (mod m)
⇔  a_0 (x_0 – y_0)  ≡  – Σ_{i=1}^{r} a_i (x_i – y_i)      (mod m)
Introduction to Algorithms September 29, 2004 L7.33© Charles Leiserson and Piotr Indyk
Recall PS 2
Theorem. Let m be prime. For any z ∈ Zmsuch that z ≠ 0, there exists a unique z–1 ∈ Zmsuch that
z · z–1 ≡ 1 (mod m).
Introduction to Algorithms September 29, 2004 L7.34© Charles Leiserson and Piotr Indyk
Back to the proof
We have
a_0 (x_0 – y_0)  ≡  – Σ_{i=1}^{r} a_i (x_i – y_i)   (mod m),
and since x_0 ≠ y_0, an inverse (x_0 – y_0)^(–1) must exist, which implies that
a_0  ≡  ( – Σ_{i=1}^{r} a_i (x_i – y_i) ) · (x_0 – y_0)^(–1)   (mod m).
Thus, for any choices of a_1, a_2, …, a_r, exactly one choice of a_0 causes x and y to collide.
Introduction to Algorithms September 29, 2004 L7.35© Charles Leiserson and Piotr Indyk
Proof (completed)
Q. What is the probability that x and y collide?
A. There are m choices for a_0, but exactly one choice causes x and y to collide, namely
a_0 = ( ( – Σ_{i=1}^{r} a_i (x_i – y_i) ) · (x_0 – y_0)^(–1) ) mod m.
Thus, the probability of x and y colliding is 1/m.
Introduction to Algorithms September 29, 2004 L7.36© Charles Leiserson and Piotr Indyk
Recap
• Showed how to implement dictionary so that INSERT, DELETE, SEARCH work in expected constant time under the uniform hashing assumption
• Relaxed the assumption to universal hashing
• Constructed a universal family for keys in {0, …, m^(r+1) – 1}
Introduction to Algorithms September 29, 2004 L7.37© Charles Leiserson and Piotr Indyk
Perfect hashing
Given a set of n keys, construct a static hash table of size m = O(n) such that SEARCH takes Θ(1) time in the worst case.
IDEA: Two-level scheme with universal hashing at both levels. No collisions at level 2!
[Figure: level-1 table T[0…6]. Slot 1 stores (m, a) = (4, 31) for the level-2 table S1 = {14, 27}; slot 4 stores (1, 0) for S4 = {26}; slot 6 stores (9, 86) for S6 = {40, 37, 22}. The annotation h31(14) = h31(27) = 1 marks a level-2 collision, which would force re-choosing a.]
Introduction to Algorithms September 29, 2004 L7.38© Charles Leiserson and Piotr Indyk
Collisions at level 2
Theorem. Let H be a class of universal hash functions for a table of size m = n². Then, if we use a random h ∈ H to hash n keys into the table, the expected number of collisions is at most 1/2.
Proof. By the definition of universality, the probability that 2 given keys in the table collide under h is 1/m = 1/n². Since there are C(n, 2) pairs of keys that can possibly collide, the expected number of collisions is
C(n, 2) · 1/n²  =  (n(n – 1)/2) · (1/n²)  <  1/2.
Introduction to Algorithms September 29, 2004 L7.39© Charles Leiserson and Piotr Indyk
No collisions at level 2
Corollary. The probability of no collisions is at least 1/2.
Thus, just by testing random hash functions in H, we’ll quickly find one that works.
Proof. Markov’s inequality says that for any nonnegative random variable X, we have
Pr{X ≥ t} ≤ E[X]/t.
Applying this inequality with t = 1, we find that the probability of 1 or more collisions is at most 1/2.
Introduction to Algorithms September 29, 2004 L7.40© Charles Leiserson and Piotr Indyk
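The corollary suggests a simple construction loop: keep drawing random hash functions for a size-n² table until one is collision-free. The sketch below uses the common ((a·k + b) mod p) mod m family for integer keys, which is an illustrative choice, not the lecture's dot-product family:

```python
import random

def find_collision_free_hash(keys, max_tries=100):
    """Draw random hash functions for a table of size m = n**2 until one
    is collision-free on `keys`; by the corollary, each draw succeeds
    with probability >= 1/2, so very few tries are expected.
    """
    n = len(keys)
    m = n * n
    p = 2_147_483_647                # a prime larger than any key we pass in
    for _ in range(max_tries):
        a = random.randrange(1, p)
        b = random.randrange(p)

        def h(k, a=a, b=b):          # bind this trial's (a, b)
            return ((a * k + b) % p) % m

        if len({h(k) for k in keys}) == n:   # injective on the keys
            return h
    raise RuntimeError("no collision-free hash found (very unlikely)")
```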
Analysis of storage
For the level-1 hash table T, choose m = n, and let n_i be the random variable for the number of keys that hash to slot i in T. By using n_i² slots for the level-2 hash table S_i, the expected total storage required for the two-level scheme is therefore
E[ Σ_{i=0}^{m–1} Θ(n_i²) ] = Θ(n),
since the analysis is identical to the analysis from recitation of the expected running time of bucket sort. (For a probability bound, apply Markov.)
Introduction to Algorithms September 29, 2004 L7.41© Charles Leiserson and Piotr Indyk
Resolving collisions by open addressing
No storage is used outside of the hash table itself.• Insertion systematically probes the table until an
empty slot is found.• The hash function depends on both the key and
probe number:
h : U × {0, 1, …, m–1} → {0, 1, …, m–1}.
• The probe sequence ⟨h(k,0), h(k,1), …, h(k,m–1)⟩ should be a permutation of {0, 1, …, m–1}.
• The table may fill up, and deletion is difficult (but not impossible).
Introduction to Algorithms September 29, 2004 L7.42© Charles Leiserson and Piotr Indyk
Example of open addressing
Insert key k = 496 into a table T (slots 0 … m–1) already holding keys such as 586, 133, 204, 481:
0. Probe h(496,0) — collision
1. Probe h(496,1) — collision
2. Probe h(496,2) — empty slot: insert 496
Introduction to Algorithms September 29, 2004 L7.43–45© Charles Leiserson and Piotr Indyk

Example of open addressing
Search for key k = 496 uses the same probe sequence, terminating successfully if it finds the key and unsuccessfully if it encounters an empty slot.
Introduction to Algorithms September 29, 2004 L7.46© Charles Leiserson and Piotr Indyk
Probing strategies
Linear probing: Given an ordinary hash function h′(k), linear probing uses the hash function
h(k,i) = (h′(k) + i) mod m.
This method, though simple, suffers from primary clustering, where long runs of occupied slots build up, increasing the average search time. Moreover, the long runs of occupied slots tend to get longer.
Introduction to Algorithms September 29, 2004 L7.47© Charles Leiserson and Piotr Indyk
Probing strategies
Double hashing: Given two ordinary hash functions h1(k) and h2(k), double hashing uses the hash function
h(k,i) = (h1(k) + i⋅h2(k)) mod m.
This method generally produces excellent results, but h2(k) must be relatively prime to m. One way is to make m a power of 2 and design h2(k) to produce only odd numbers.
Introduction to Algorithms September 29, 2004 L7.48© Charles Leiserson and Piotr Indyk
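A minimal open-addressing sketch using double hashing; the table size m = 16 (a power of 2) and the particular h1, h2 below are illustrative assumptions.

```python
def probe_sequence(k, m, h1, h2):
    """Double hashing: h(k, i) = (h1(k) + i*h2(k)) mod m.  With m a
    power of 2 and h2(k) always odd, h2(k) is relatively prime to m,
    so the sequence visits every slot."""
    for i in range(m):
        yield (h1(k) + i * h2(k)) % m

def insert(table, k, h1, h2):
    for j in probe_sequence(k, len(table), h1, h2):
        if table[j] is None or table[j] == k:
            table[j] = k
            return j
    raise RuntimeError("table is full")

def search(table, k, h1, h2):
    for j in probe_sequence(k, len(table), h1, h2):
        if table[j] == k:
            return j
        if table[j] is None:       # empty slot: k cannot be present
            return None
    return None

m = 16                             # power of 2
h1 = lambda k: k % m
h2 = lambda k: (k % (m - 1)) | 1   # always odd, hence coprime to m
T = [None] * m
for key in (586, 133, 204, 481, 496):
    insert(T, key, h1, h2)
assert search(T, 496, h1, h2) is not None
assert search(T, 7, h1, h2) is None
```

Note how search stops at the first empty slot: if the key were present, insertion would have placed it no later than that probe.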
Analysis of open addressing
We make the assumption of uniform hashing:• Each key is equally likely to have any one of
the m! permutations as its probe sequence.
Theorem. Given an open-addressed hash table with load factor α = n/m < 1, the expected number of probes in an unsuccessful search is at most 1/(1–α).
Introduction to Algorithms September 29, 2004 L7.49© Charles Leiserson and Piotr Indyk
Proof of the theoremProof.• At least one probe is always necessary.• With probability n/m, the first probe hits an
occupied slot, and a second probe is necessary.• With probability (n–1)/(m–1), the second probe
hits an occupied slot, and a third probe is necessary.
• With probability (n–2)/(m–2), the third probe hits an occupied slot, etc.
Observe that (n−i)/(m−i) < n/m = α for i = 1, 2, …, n.
Introduction to Algorithms September 29, 2004 L7.50© Charles Leiserson and Piotr Indyk
Proof (continued)
Therefore, the expected number of probes is
1 + (n/m)(1 + ((n−1)/(m−1))(1 + ((n−2)/(m−2))(⋯(1 + 1/(m−n+1))⋯)))
≤ 1 + α(1 + α(1 + α(⋯)))
≤ 1 + α + α² + α³ + ⋯
= Σ_{i=0}^{∞} α^i
= 1/(1−α).
The textbook has a more rigorous proof.
Introduction to Algorithms September 29, 2004 L7.51© Charles Leiserson and Piotr Indyk
Implications of the theorem
• If α is constant, then accessing an open-addressed hash table takes constant time.
• If the table is half full, then the expected number of probes is 1/(1–0.5) = 2.
• If the table is 90% full, then the expected number of probes is 1/(1–0.9) = 10.
Introduction to Algorithms September 29, 2004 L7.52© Charles Leiserson and Piotr Indyk
Dot-product method
Randomized strategy:
Let m be prime. Decompose key k into r + 1 digits, each with value in the set {0, 1, …, m–1}. That is, let k = ⟨k0, k1, …, kr⟩, where 0 ≤ ki < m.
Pick a = ⟨a0, a1, …, ar⟩ where each ai is chosen randomly from {0, 1, …, m–1}.
Define h_a(k) = (Σ_{i=0}^{r} ai ki) mod m.
• Excellent in practice, but expensive to compute.
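A sketch of the dot-product hash; the prime m = 101 and r = 3 are arbitrary choices for illustration.

```python
import random

m = 101  # a prime table size (an assumption of this sketch)
r = 3    # keys have r + 1 base-m digits

def digits(k):
    """Decompose key k into r + 1 base-m digits k_0 ... k_r."""
    return [(k // m**i) % m for i in range(r + 1)]

def make_dot_product_hash():
    # pick a = <a_0, ..., a_r> uniformly at random
    a = [random.randrange(m) for _ in range(r + 1)]
    return lambda k: sum(ai * ki for ai, ki in zip(a, digits(k))) % m

h = make_dot_product_hash()
assert 0 <= h(123456) < m
assert h(123456) == h(123456)  # deterministic once a is fixed
```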
Introduction to Algorithms 6.046J/18.401J
Lecture 8
Prof. Piotr Indyk
Data structures
• Previous lecture: hash tables
– Insert, Delete, Search in (expected) constant time
– Works for integers from {0, …, m^r – 1}
• This lecture: Binary Search Trees
– Insert, Delete, Search (Successor) – Works in comparison model
© Piotr Indyk Introduction to Algorithms October 6, 2004 L7.2
Binary Search Tree
• Each node x has:– key[x]
– Pointers: • left[x] • right[x] • p[x]
[Figure: example BST — root 9 with children 5 and 12; 5 has children 1 and 6; 6 has right child 7; 7 has right child 8.]
© Piotr Indyk Introduction to Algorithms October 6, 2004 L7.3
Binary Search Tree (BST)
• Property: for any node x:
  – For all nodes y in the left subtree of x: key[y] ≤ key[x]
  – For all nodes y in the right subtree of x: key[y] ≥ key[x]
• Given a set of keys, is the BST for those keys unique?
© Piotr Indyk Introduction to Algorithms October 6, 2004 L7.4
No uniqueness
[Figure: two different BSTs on the keys {1, 5, 6, 7, 8, 9, 12} — the example tree rooted at 9, and another rooted at 7 with children 5 and 9.]
© Piotr Indyk Introduction to Algorithms October 6, 2004 L7.5
What can we do given BST ?
• Sort !• Inorder-Walk(x):
If x≠NIL then– Inorder-Walk( left[x] ) – print key[x] – Inorder-Walk( right[x] )
• Output: 1 5 6 7 8 9 12
© Piotr Indyk Introduction to Algorithms October 6, 2004 L7.6
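The inorder walk can be sketched directly on the example tree; the tuple encoding (key, left, right) is an assumption of this sketch.

```python
# Each node is (key, left, right); None is the empty tree.
tree = (9,
        (5, (1, None, None),
            (6, None, (7, None, (8, None, None)))),
        (12, None, None))

def inorder(x, out):
    # left subtree, then the key, then the right subtree
    if x is not None:
        key, left, right = x
        inorder(left, out)
        out.append(key)
        inorder(right, out)
    return out

assert inorder(tree, []) == [1, 5, 6, 7, 8, 9, 12]
```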
Sorting, ctd.
• What is the running time of Inorder-Walk?
• It is O(n) • Because:
– Each link is traversed twice
– There are O(n) links
© Piotr Indyk Introduction to Algorithms October 6, 2004 L7.7
Sorting, ctd.
• Does it mean that we can sort n keys in O(n) time ?
• No
• It just means that building a BST takes Ω(n log n)time (in the comparison model)
© Piotr Indyk Introduction to Algorithms October 6, 2004 L7.8
BST as a data structure
• Operations: – Insert(x) – Delete(x) – Search(k)
© Piotr Indyk Introduction to Algorithms October 6, 2004 L7.9
Search
Search(x, k):
• If x ≠ NIL then
  – If key[x] = k then return x
  – If k < key[x] then return Search(left[x], k)
  – If k > key[x] then return Search(right[x], k)
• Else return NIL
Search(8.5):
Search(8):
© Piotr Indyk Introduction to Algorithms October 6, 2004 L7.10
Predecessor/Successor
• Can modify Search (into Search’) such that,if k is not stored in BST, we get x such that: – Either it has the largest key[x]<k, or– It has the smallest key[x]>k
• Useful when k is prone to errors
• What if we always want a successor of k?
– x=Search’(k) – If key[x]<k, then return Successor(x) – Else return x
© Piotr Indyk Introduction to Algorithms October 6, 2004 L7.11
Successor
Successor(x):• If right[x] ≠ NIL then
return Minimum( right[x] ) • Otherwise
– y ← p[x] – While y≠NIL and x=right[y] do
• x ← y • y ← p[y]
– Return y
© Piotr Indyk Introduction to Algorithms October 6, 2004 L7.12
Minimum
Minimum( x )• While left[x]≠NIL do
– x ← left[x] • Return x
© Piotr Indyk Introduction to Algorithms October 6, 2004 L7.13
Nearest Neighbor
• Assuming keys are numbers• For a key k, can we find x such that |k-key[x]| is
minimal ? • Yes:
– key[x] must be either a predecessor orsuccessor of k
– y=Search’(k) //y is either succ or pred of k – y’ =Successor(y) – y’’=Predecessor(y) – Report the closest of key[y], key[y’], key[y’’]
© Piotr Indyk Introduction to Algorithms October 6, 2004 L7.14
Analysis
• How much time does all of this take ?
• Worst case: O(height) • Height really important • Tree better be balanced
© Piotr Indyk Introduction to Algorithms October 6, 2004 L7.15
Constructing BST
Insert(z):
• y ← NIL
• x ← root
• While x ≠ NIL do
  – y ← x
  – If key[z] < key[x] then x ← left[x] else x ← right[x]
• p[z] ← y
• If y = NIL then root ← z
• Else if key[z] < key[y] then left[y] ← z else right[y] ← z
[Figure: Insert(8.5) and Insert(5.5) on the example BST; each new key is attached as a child of the last node y on the search path.]
© Piotr Indyk Introduction to Algorithms October 6, 2004 L7.16
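The BST operations of this lecture — Insert, Search, Minimum, Successor — can be sketched together; the Node class and the driver at the bottom are assumptions of this sketch.

```python
class Node:
    def __init__(self, key):
        self.key, self.left, self.right, self.p = key, None, None, None

def search(x, k):
    while x is not None and x.key != k:
        x = x.left if k < x.key else x.right
    return x

def minimum(x):
    while x.left is not None:
        x = x.left
    return x

def successor(x):
    if x.right is not None:              # leftmost node of the right subtree
        return minimum(x.right)
    y = x.p                              # else climb until we turn left
    while y is not None and x is y.right:
        x, y = y, y.p
    return y

def insert(root, z):                     # walk down, attach z as a leaf under y
    y, x = None, root
    while x is not None:
        y = x
        x = x.left if z.key < x.key else x.right
    z.p = y
    if y is None:
        return z                         # tree was empty; z is the new root
    if z.key < y.key:
        y.left = z
    else:
        y.right = z
    return root

root = None
for k in (9, 5, 12, 1, 6, 7, 8):
    root = insert(root, Node(k))
assert minimum(root).key == 1
assert successor(search(root, 8)).key == 9
assert successor(search(root, 12)) is None
```

All of these run in O(height) time, which motivates the height analysis that follows.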
Analysis
• After we insert n elements, what is the worst possible BST height ?
• Pretty bad: n-1
[Figure: a path-shaped BST obtained by inserting 1, 2, …, 6 in increasing order.]
© Piotr Indyk Introduction to Algorithms October 6, 2004 L7.17
Average case analysis
• Consider keys 1,2,…,n, in a random order • Each permutation equally likely • For each key perform Insert • What is the likely height of the tree ? • It is O(log n)
© Piotr Indyk Introduction to Algorithms October 6, 2004 L7.18
Introduction to Algorithms October 6, 2004 L7.19© Piotr Indyk
Creating a random BST
• n = 9
[Figure: inserting a random permutation of 1…9 — here 3, 6, 8, 5, 1, 2, 7, 4, 9 — recursively partitions the remaining keys around each inserted root, just like the partitions of randomized quicksort.]
Observations
• Each edge corresponds to a random partition
• Element x has height h ⇒ x participated in h partitions
• Let hx be a random variable denoting height of x
• What is Pr[hx >t] , where t=c lg n ?
© Piotr Indyk Introduction to Algorithms October 6, 2004 L7.20
Partitions
• A partition is lucky if the ratio is at least 1:3, i.e., each side has size ≥ 25%
• Probability of lucky partition is ½
• After log_{4/3} n lucky partitions the element becomes a leaf
• h_x > t ⇒ in t = c log_{4/3} n partitions we had < log_{4/3} n lucky ones
• Toss t = c log_{4/3} n coins; what is the probability you get < k = log_{4/3} n heads?
© Piotr Indyk Introduction to Algorithms October 6, 2004 L7.21
Concentration inequalities
• CLRS, p. 1118: the probability of at most k heads in t trials is at most (t choose k)/2^(t−k)
Pr[h_x > t] ≤ (t choose k)/2^(t−k)
≤ (et/k)^k / 2^(t−k)
= (ce)^(log_{4/3} n) / 2^((c−1) log_{4/3} n)
= 2^(lg(ce)·log_{4/3} n) / 2^((c−1) log_{4/3} n)
= 2^([lg(ce) − (c−1)]·(lg n)/lg(4/3))
≤ 2^(−1.1 lg n) = 1/n^1.1, for sufficiently large c
© Piotr Indyk Introduction to Algorithms October 6, 2004 L7.22
Final Analysis
• We know that for each x, Pr[hx >t] ≤ 1/n1.1
• We want Pr[h1>t or h2>t or … or hn>t] • This is at most
Pr[h1>t]+Pr[h2>t] +…+ Pr[hn>t] ≤ n * 1/n1.1
= 1/n0.1
• As n grows, the probability of height > c lg n becomes arbitrarily small
© Piotr Indyk Introduction to Algorithms October 6, 2004 L7.23
Summing up
• We have seen BSTs
• Support Search, Successor, Nearest Neighbor etc, as well as Insert
• Worst case: O(n) • But O(log n) on average • Next week: O(log n) worst case
© Piotr Indyk Introduction to Algorithms October 6, 2004 L7.24
Introduction to Algorithms 6.046J/18.401J
Lecture 9
Prof. Piotr Indyk
Today
• Balanced search trees, or how to avoid this even in the worst case
[Figure: a path-shaped BST on the keys 1–6.]
© Piotr Indyk and Charles E. Leiserson Introduction to Algorithms October 13, 2004 L9.2
Balanced search trees
Balanced search tree: A search-tree data structure for which a height of O(lg n) is guaranteed when implementing a dynamic set of n items.
Examples:
• AVL trees
• 2-3 trees
• 2-3-4 trees
• B-trees
• Red-black trees
© Piotr Indyk and Charles E. Leiserson Introduction to Algorithms October 13, 2004 L9.3
Red-black trees
BSTs with an extra one-bit color field in each node. Red-black properties: 1. Every node is either red or black. 2. The root and leaves (NIL’s) are black. 3. If a node is red, then its parent is black.4. All simple paths from any node x to a
descendant leaf have the same number of black nodes.
© Piotr Indyk and Charles E. Leiserson Introduction to Algorithms October 13, 2004 L9.4
Example of a red-black tree
[Figure: a red-black tree with keys 3, 7, 8, 10, 11, 18, 22, 26; every leaf (NIL) is black.]
© Piotr Indyk and Charles E. Leiserson Introduction to Algorithms October 13, 2004 L9.5
Use of red-black trees
• What properties would we like to prove about red-black trees ? – They always have O(log n) height– There is an O(log n)–time insertion
procedure which preserves the red-black properties
• Is it true that, after we add a new element to a tree (as in the previous lecture), we can always recolor the tree to keep it red-black ?
© Piotr Indyk and Charles E. Leiserson Introduction to Algorithms October 13, 2004 L9.6
Example of a red-black tree
[Figure: the same red-black tree after adding the key 7.5 below 8.]
© Piotr Indyk and Charles E. Leiserson Introduction to Algorithms October 13, 2004 L9.7
Use of red-black trees
• What properties would we like to prove about red-black trees ?– They always have O(log n) height– There is an O(log n)–time insertion procedure
which preserves the red-black properties • Is it true that, after we add a new element to a tree (as
in the previous lecture), we can always recolor thetree to keep it red-black ?
• NO • After insertions, sometimes we need to juggle nodes
around
© Piotr Indyk and Charles E. Leiserson Introduction to Algorithms October 13, 2004 L9.8
Rotations
[Figure: RIGHT-ROTATE(B) turns the tree with root B, left child A, and subtrees α, β, γ into the tree with root A; LEFT-ROTATE(A) is the inverse.]
Rotations maintain the inorder ordering of keys:
• a ∈ α, b ∈ β, c ∈ γ ⇒ a ≤ A ≤ b ≤ B ≤ c.
A rotation can be performed in O(1) time.
© Piotr Indyk and Charles E. Leiserson Introduction to Algorithms October 13, 2004 L9.9
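A left rotation can be sketched in a few lines; the dict encoding of nodes is an assumption of this sketch.

```python
def left_rotate(a):
    """LEFT-ROTATE(A): A with right child B becomes the left child of B.
    The subtrees alpha, beta, gamma keep their inorder positions."""
    b = a['right']
    a['right'] = b['left']   # beta moves under A
    b['left'] = a
    return b                 # B is the new root of this subtree

alpha = {'key': 1, 'left': None, 'right': None}
beta  = {'key': 3, 'left': None, 'right': None}
gamma = {'key': 5, 'left': None, 'right': None}
b = {'key': 4, 'left': beta, 'right': gamma}
a = {'key': 2, 'left': alpha, 'right': b}

root = left_rotate(a)
assert root['key'] == 4 and root['left']['key'] == 2
assert root['left']['right'] is beta   # inorder ordering preserved
```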
Rotations can reduce height
[Figure: LEFT-ROTATE(A) applied at the root of a right-leaning chain turns A–B with subtrees 1, 2, 3 into a tree rooted at B, reducing the height.]
© Piotr Indyk and Charles E. Leiserson Introduction to Algorithms October 13, 2004 L9.10
Red-black tree wrap-up
• Can show how
– O(log n) re-colorings
– 1 rotation
can restore red-black properties after an insertion
• Instead, we will see 2-3 trees (but will come back to red-black trees at the end)
© Piotr Indyk and Charles E. Leiserson Introduction to Algorithms October 13, 2004 L9.11
2-3 Trees
• The simplest balanced trees on the planet! • Although a little bit more wasteful
© Piotr Indyk and Charles E. Leiserson Introduction to Algorithms October 13, 2004 L9.12
2-3 Trees
• Degree of each node is either 2 or 3
• Keys are in the leaves
• All leaves have equal depth
• Leaves are sorted
[Figure: a 2-3 tree with leaves 1, 5, 6, 7, 8, 9, 12; internal nodes store maxima 6, 8, 12; the root stores 12.]
• Each node x contains the maximum key in the sub-tree, denoted x.max
© Piotr Indyk and Charles E. Leiserson Introduction to Algorithms October 13, 2004 L9.13
Internal nodes
• Internal nodes:– Values:
• x.max: maximum key in the sub-tree – Pointers:
• left[x] • mid[x] • right[x] : can be null • p[x] : can be null for the root • …
• Leaves: – x.max : the key
© Piotr Indyk and Charles E. Leiserson Introduction to Algorithms October 13, 2004 L9.14
Height of 2-3 tree
• What is the maximum height h of a 2-3 tree with n nodes ?
• Alternatively, what is the minimum number of nodes in a 2-3 tree of height h ?
• It is 1 + 2 + 2² + 2³ + … + 2^h = 2^(h+1) − 1
• n ≥ 2^(h+1) − 1 ⇒ h = O(log n)
• Full binary tree is the worst-case example!
© Piotr Indyk and Charles E. Leiserson Introduction to Algorithms October 13, 2004 L9.15
Searching
• How can we search for a key k ?
Search(x, k):
• If x = NIL then return NIL
• Else if x is a leaf then
  – If x.max = k then return x
  – Else return NIL
• Else
  – If k ≤ left[x].max then Search(left[x], k)
  – Else if k ≤ mid[x].max then Search(mid[x], k)
  – Else Search(right[x], k)
Examples: Search(8), Search(13)
© Piotr Indyk and Charles E. Leiserson Introduction to Algorithms October 13, 2004 L9.16
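The 2-3 search can be sketched over a small tuple encoding; the encoding and the handling of a variable number of children are assumptions of this sketch.

```python
def leaf(k):
    return ('leaf', k)          # x.max of a leaf is its key

def node(*children):
    # an internal node caches x.max = max of its rightmost child
    return ('node', children[-1][1], list(children))

def search23(x, k):
    if x is None:
        return None
    if x[0] == 'leaf':
        return x if x[1] == k else None
    children = x[2]
    for child in children[:-1]:  # descend into first subtree with max >= k
        if k <= child[1]:
            return search23(child, k)
    return search23(children[-1], k)

# the example tree: leaves 1 5 6 7 8 9 12, internal maxima 6, 8, 12
t = node(node(leaf(1), leaf(5), leaf(6)),
         node(leaf(7), leaf(8)),
         node(leaf(9), leaf(12)))
assert search23(t, 8) == ('leaf', 8)
assert search23(t, 13) is None
```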
Insertion
• How to insert x ? • Perform Search for the
key of x• Let y be the last internal
node • Insert x into y in a
sorted order• At the end, update the
max values on the pathto root
[Figure: inserting 7.5, 13, and 5.5 as new leaves of the example 2-3 tree.]
Examples: Insert(7.5), Insert(13), Insert(5.5) (continued on the next slide)
© Piotr Indyk and Charles E. Leiserson Introduction to Algorithms October 13, 2004 L9.17
Insertion, ctd.
(continued from the previous slide)
• If y has 4 children, then Split(y)
© Piotr Indyk and Charles E. Leiserson Introduction to Algorithms October 13, 2004 L9.18
Split
• Split y into two nodes y1, y2
• Both are linked to *z=parent(y)
• If z has 4 children, split z
*If y is a root, then create new parent(y)=new root
[Figure: a node y with four children a, b, c, d is split into y1 (children a, b) and y2 (children c, d), both linked to z = parent(y).]
© Piotr Indyk and Charles E. Leiserson Introduction to Algorithms October 13, 2004 L9.19
Split
[Figure: three animation frames showing Split restoring the 2-3 tree after the insertions of 7.5, 13, and 5.5.]
• Insert and Split preserve heights, unless a new root is created, in which case all heights are increased by 1
• After Split, all nodes have 2 or 3 children
• Everything takes O(log n) time
© Piotr Indyk and Charles E. Leiserson Introduction to Algorithms October 13, 2004 L9.22
Delete
• How to delete x?
• Let y = p(x)
• Remove x from y
• If y has 1 child:
  – Remove y
  – Attach x to y's sibling z
Example: Delete(8)
[Figure: deleting leaf 8 from the example 2-3 tree.]
© Piotr Indyk and Charles E. Leiserson Introduction to Algorithms October 13, 2004 L9.23
Delete
• How to delete x?
• Let y = p(x)
• Remove x from y
• If y has 1 child:
  – Remove y
  – Attach x to y's sibling z
• If z has 4 children, then Split(z)
Example: Delete(8) — INCOMPLETE; SEE THE END FOR FULL VERSION
© Piotr Indyk and Charles E. Leiserson Introduction to Algorithms October 13, 2004 L9.24
Summing up
• 2-3 Trees: – O(log n) depth ⇒ Search in O(log n) – Insert, Delete (and Split) in O(log n)
• We will now see 2-3-4 trees – Same idea, but:
• Each parent has 2,3 or 4 children • Keys in the inner nodes • More complicated procedures
© Piotr Indyk and Charles E. Leiserson Introduction to Algorithms October 13, 2004 L9.25
2-3-4 Trees
[Figure: a 2-3-4 tree with root keys 5, 9 and children holding 1 2 4; 7 8; 10 12.]
© Piotr Indyk and Charles E. Leiserson Introduction to Algorithms October 13, 2004 L9.26
Height of a red-black tree
Theorem. A red-black tree with n keys has heighth ≤ 2 lg(n + 1).
INTUITION:• Merge red nodes
into their black parents.
© Piotr Indyk and Charles E. Leiserson Introduction to Algorithms October 13, 2004 L9.27
Height of a red-black tree
Theorem. A red-black tree with n keys has heighth ≤ 2 lg(n + 1).
INTUITION:• Merge red nodes
into their black parents.
• This process produces a tree in which each node has 2, 3, or 4 children.
• The 2-3-4 tree has uniform depth h′ of leaves. © Piotr Indyk and Charles E. Leiserson Introduction to Algorithms October 13, 2004 L9.32
Summing up
• We have seen: – Red-black trees – 2-3 trees (in detail) – 2-3-4 trees
• Red-black trees are undercover 2-3-4 trees
• In most cases, it does not matter which one you use
© Piotr Indyk and Charles E. Leiserson Introduction to Algorithms October 13, 2004 L9.33
2-3 Trees: Deletions
• Problem: there is an internal node that has only 1 child
[Figure: after a deletion, an internal node is left with only one child.]
© Piotr Indyk and Charles E. Leiserson Introduction to Algorithms October 13, 2004 L9.34
Full procedure for Delete(x)
• Special case: x is the only element in the tree: delete everything
• Not-so-special case: x is one of two elements in the tree. In this case, the procedure on the next slide will delete x
• Both NIL and y are special 2-3 trees © Piotr Indyk and Charles E. Leiserson Introduction to Algorithms October 13, 2004 L9.35
Procedure for Delete(x)• Let y=p(x) • Remove x • If y≠root then
– Let z be the sibling of y. – Assume z is the right sibling of y, otherwise the code is
symmetric.– If y has only 1 child w left
Case 1: z has 3 children • Attach left[z] as the rightmost child of y • Update y.max and z.max Case 2: z has 2 children: • Attach the child w of y as the leftmost child of z • Update z.max • Delete(y) (recursively*)
– Else • Update max of y, p(y), p(p(y)) and so on until root
• Else – If root has only one child u
• Remove root • Make u the new root
*Note that the input of Delete does not have to be a leaf
© Piotr Indyk and Charles E. Leiserson Introduction to Algorithms October 13, 2004 L9.36
Example
[Figure: four animation frames showing Delete(8) on the example 2-3 tree — 8 is removed, its parent is left with one child, that child is attached to the sibling, and the empty parent is deleted.]
© Piotr Indyk and Charles E. Leiserson Introduction to Algorithms October 13, 2004 L9.37–40
Introduction to Algorithms 6.046J/18.401J/SMA5503
Lecture 10
Prof. Piotr Indyk
Introduction to Algorithms October 18, 2004 L10.2© 2004 by Erik Demaine and Piotr Indyk
Today
• A data structure for a new problem• Amortized analysis
Introduction to Algorithms October 18, 2004 L10.3© 2004 by Erik Demaine and Piotr Indyk
2-3 Trees: Deletions
• Problem: there is an internal node that has only 1 child
• Solution: delete recursively
[Figure: the one-child node is removed and deletion continues one level up.]
Introduction to Algorithms October 18, 2004 L10.4© 2004 by Erik Demaine and Piotr Indyk
Example
[Figure: four animation frames repeating Delete(8), now with the recursive deletion of the one-child parent.]
Introduction to Algorithms October 18, 2004 L10.8© 2004 by Erik Demaine and Piotr Indyk
Procedure for Delete(x)• Let y=p(x)• Remove x• If y≠root then
– Let z be the sibling of y.– Assume z is the right sibling of y, otherwise the code is
symmetric.– If y has only 1 child w left
Case 1: z has 3 children • Attach left[z] as the rightmost child of y• Update y.max and z.maxCase 2: z has 2 children:• Attach the child w of y as the leftmost child of z• Update z.max• Delete(y) (recursively*)
– Else• Update max of y, p(y), p(p(y)) and so on until root
• Else – If root has only one child u
• Remove root• Make u the new root
*Note that the input of Delete does not have to be a leaf
Introduction to Algorithms October 18, 2004 L10.9© 2004 by Erik Demaine and Piotr Indyk
2-3 Trees
• The simplest balanced trees on the planet! (but, nevertheless, not completely trivial)
Introduction to Algorithms October 18, 2004 L10.10© 2004 by Erik Demaine and Piotr Indyk
Dynamic Maintenance of Sets
• Assume we have a collection of elements
• The elements are clustered
• Initially, each element forms its own cluster/set
• We want to enable two operations:
  – FIND-SET(x): report the cluster containing x
  – UNION(C1, C2): merges the clusters C1, C2
Introduction to Algorithms October 18, 2004 L10.11© 2004 by Erik Demaine and Piotr Indyk
Disjoint-set data structure(Union-Find)
Problem:
• Maintain a collection of pairwise-disjoint sets S = {S1, S2, …, Sr}.
• Each Si has one representative element x = rep[Si].
• Must support three operations:
  • MAKE-SET(x): adds new set {x} to S with rep[{x}] = x (for any x ∉ Si for all i).
  • UNION(x, y): replaces sets Sx, Sy with Sx ∪ Sy in S, for any x, y in distinct sets Sx, Sy.
  • FIND-SET(x): returns representative rep[Sx] of set Sx containing element x.
Introduction to Algorithms October 18, 2004 L10.12© 2004 by Erik Demaine and Piotr Indyk
Quiz
• If we have a WEAKUNION( x, y) that works only if x, y are representatives, how can we implement UNION that works for any x, y ?
• UNION( x, y)=WEAKUNION( FIND-SET(x) , FIND-SET(y) )
Introduction to Algorithms October 18, 2004 L10.13© 2004 by Erik Demaine and Piotr Indyk
Representation
[Figure: an element x stored as a record with a Data field plus other fields containing data of our choice.]
Introduction to Algorithms October 18, 2004 L10.14© 2004 by Erik Demaine and Piotr Indyk
Applications
• Data clustering• Killer App: Minimum
Spanning Tree (Lecture 13)
• Amortized analysis
Introduction to Algorithms October 18, 2004 L10.15© 2004 by Erik Demaine and Piotr Indyk
Ideas ?
• How can we implement this data structure efficiently ?– MAKE-SET
– UNION
– FIND-SET
Introduction to Algorithms October 18, 2004 L10.16© 2004 by Erik Demaine and Piotr Indyk
Bad case for UNION or FIND
1 2 … n n+1 … 2n
Introduction to Algorithms October 18, 2004 L10.17© 2004 by Erik Demaine and Piotr Indyk
Simple linked-list solution
Store set Si = {x1, x2, …, xk} as an (unordered) doubly linked list. Define representative element rep[Si] to be the front of the list, x1.
Si : x1 ↔ x2 ↔ … ↔ xk, with rep[Si] pointing at x1.
• MAKE-SET(x) initializes x as a lone node. — Θ(1)
• FIND-SET(x) walks left in the list containing x until it reaches the front of the list. — Θ(n)
• UNION(x, y) concatenates the lists containing x and y, leaving rep. as FIND-SET[x]. — Θ(n)
How can we improve it ?
Introduction to Algorithms October 18, 2004 L10.18© 2004 by Erik Demaine and Piotr Indyk
Augmented linked-list solution
Si : x1 ↔ x2 ↔ … ↔ xk, with rep[Si] pointing at x1.
Store set Si = {x1, x2, …, xk} as an unordered doubly linked list. Each xj also stores pointer rep[xj] to the head.
• FIND-SET(x) returns rep[x].
• UNION(x, y) concatenates the lists containing x and y, and updates the rep pointers for all elements in the list containing y.
Example of augmented linked-list solution
[Figure: three frames — Sx : x1 ↔ x2 and Sy : y1 ↔ y2 ↔ y3 are concatenated into Sx ∪ Sy, and the rep pointers of y1, y2, y3 are redirected to rep[Sx ∪ Sy].]
Introduction to Algorithms October 18, 2004 L10.22© 2004 by Erik Demaine and Piotr Indyk
Augmented linked-list solution
Si : x1 ↔ x2 ↔ … ↔ xk, with rep[Si] pointing at x1.
Store set Si = {x1, x2, …, xk} as an unordered doubly linked list. Each xj also stores pointer rep[xj] to the head.
• FIND-SET(x) returns rep[x]. — Θ(1)
• UNION(x, y) concatenates the lists containing x and y, and updates the rep pointers for all elements in the list containing y. — Θ(n) worst case; amortized: ?
Introduction to Algorithms October 18, 2004 L10.23© 2004 by Erik Demaine and Piotr Indyk
Amortized analysis
• So far, we focused on worst-case time of each operation.– E.g., UNION takes Θ(n) time for some operations
• Amortized analysis: count the total time spent by any sequence of operations
• Total time is always at mostworst-case-time-per-operation * #operations
but it can be much better!• E.g., if times are 1,1,1,…,1,n,1,…,1• Can we modify the linked-list data structure so that any
sequence of m MAKE-SET, FIND-SET, UNION operations cost less than m*Θ(n) time?
Introduction to Algorithms October 18, 2004 L10.24© 2004 by Erik Demaine and Piotr Indyk
Alternative
UNION(x, y) could instead
• concatenate the lists containing y and x, and
• update the rep pointers for all elements in the list containing x.
[Figure: Sx : x1 ↔ x2 and Sy : y1 ↔ y2 ↔ y3 before the union.]
Introduction to Algorithms October 18, 2004 L10.25© 2004 by Erik Demaine and Piotr Indyk
Smaller into larger
• Concatenate the smaller list onto the end of the larger list (each list stores its weight = # elements)
• Cost = Θ(length of smaller list).
Let n denote the overall number of elements(equivalently, the number of MAKE-SET operations).Let m denote the total number of operations.
Theorem: Cost of all UNION’s is O(n lg n).Corollary: Total cost is O(m + n lg n).
Introduction to Algorithms October 18, 2004 L10.28© 2004 by Erik Demaine and Piotr Indyk
Total UNION cost is O(n lg n)Proof:• Monitor an element x and set Sx containing it• After initial MAKE-SET(x), weight[Sx] = 1• Consider any time when Sx is merged with set Sy
– If weight[Sy] ≥ weight[Sx]• pay 1 to update rep[x]• weight[Sx] at least doubles (increasing by weight[Sy])
– Otherwise• pay nothing• weight[Sx] only increases
• Thus:
  – Each time we pay 1, the weight doubles
  – Maximum possible weight is n
  – Maximum pay for x is ≤ lg n, or O(n lg n) overall
Introduction to Algorithms October 18, 2004 L10.29© 2004 by Erik Demaine and Piotr Indyk
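The augmented linked list with the smaller-into-larger rule can be sketched as follows; the dict-of-lists representation is an assumption of this sketch.

```python
class DisjointSets:
    """Each element stores rep[x]; UNION splices the smaller set into the
    larger one, so any element's rep pointer is rewritten at most lg n
    times over all UNIONs."""
    def __init__(self):
        self.members = {}   # representative -> list of elements in its set
        self.rep = {}       # element -> representative

    def make_set(self, x):
        self.members[x] = [x]
        self.rep[x] = x

    def find_set(self, x):
        return self.rep[x]

    def union(self, x, y):
        rx, ry = self.rep[x], self.rep[y]
        if rx == ry:
            return
        if len(self.members[rx]) < len(self.members[ry]):
            rx, ry = ry, rx              # rx now names the larger set
        for z in self.members.pop(ry):   # pay 1 per element of the smaller set
            self.rep[z] = rx
            self.members[rx].append(z)

ds = DisjointSets()
for v in range(6):
    ds.make_set(v)
ds.union(0, 1); ds.union(2, 3); ds.union(0, 2)
assert ds.find_set(3) == ds.find_set(1)
assert ds.find_set(4) != ds.find_set(0)
```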
Final Result
• We have a data structure for dynamic sets which supports:– MAKE-SET: O(1) worst case– FIND-SET: O(1) worst case– UNION:
• Any sequence of any m operations* takes O(m log n) time, or• … the amortized complexity of the operations* is O(log n)
* I.e., MAKE-SET, FIND-SET or UNION
Introduction to Algorithms October 18, 2004 L10.30© 2004 by Erik Demaine and Piotr Indyk
Amortized vs Average
• What is the difference between average case complexity and amortized complexity ?– “Average case” assumes random
distribution over the input (e.g., random sequence of operations)
– “Amortized” means we count the totaltime taken by any sequence of moperations (and divide it by m)
Introduction to Algorithms October 18, 2004 L10.31© 2004 by Erik Demaine and Piotr Indyk
Can we do better ?
• One can do:– MAKE-SET: O(1) worst case– FIND-SET: O(lg n) worst case– WEAKUNION: O(1) worst case– Thus, UNION: O(lg n) worst case
Introduction to Algorithms October 18, 2004 L10.32© 2004 by Erik Demaine and Piotr Indyk
Representing sets as trees
• Each set Si = {x1, x2, …, xk} stored as a tree; rep[Si] is the tree root.
S1 = {x1, x2, x3, x4, x5, x6}, S2 = {x7}
• MAKE-SET(x) initializes x as a lone node.
• FIND-SET(x) walks up the tree containing x until it reaches the root.
• UNION(x, y) concatenates the trees containing x and y: UNION(rep[S1], rep[S2]) yields rep[S1 ∪ S2].
[Figure: S1 drawn as a tree rooted at x1 = rep[S1]; S2 is the lone node x7 = rep[S2].]
Introduction to Algorithms October 18, 2004 L10.33© 2004 by Erik Demaine and Piotr Indyk
Time Analysis
• MAKE-SET(x) initializes x as a lone node. — O(1)
• FIND-SET(x) walks up the tree containing x until it reaches the root. — O(depth) = ?
• WEAKUNION(x, y) concatenates the trees containing x and y. — O(1)
Introduction to Algorithms October 18, 2004 L10.34© 2004 by Erik Demaine and Piotr Indyk
“Smaller into Larger” in trees
Algorithm: Merge the tree with smaller weight into the tree with larger weight.
• Height of a tree increases only when its size doubles
• Height logarithmic in weight
Introduction to Algorithms October 18, 2004 L10.35© 2004 by Erik Demaine and Piotr Indyk
“Smaller into Larger” in trees
Proof:• Monitor the height of an element z• Each time the height of z increases, the
weight of its tree doubles
• Maximum weight is n
• Thus, height of z is ≤ lg n
Introduction to Algorithms October 18, 2004 L10.36© 2004 by Erik Demaine and Piotr Indyk
Tree implementation
• We have:– MAKE-SET: O(1) worst case– FIND-SET: O(depth) = O(lg n) worst case– WEAKUNION: O(1) worst case
• Can amortized analysis buy us anything ? • Need another trick…
Introduction to Algorithms October 18, 2004 L10.37© 2004 by Erik Demaine and Piotr Indyk
Trick 2: Path compression
When we execute a FIND-SET operation and walk up a path p to the root, we know the representative for all the nodes on path p. Path compression makes all of those nodes direct children of the root.
[Figure: FIND-SET(y2) walks up through the y-nodes to the root x1; afterwards every node on the path hangs directly off x1.]
Cost of FIND-SET(x) is still Θ(depth[x]).
Introduction to Algorithms October 18, 2004 L10.40© 2004 by Erik Demaine and Piotr Indyk
The Theorem

Theorem: In general, the amortized cost is O(α(n)), where α(n) grows really, really, really slowly.
Introduction to Algorithms October 18, 2004 L10.41© 2004 by Erik Demaine and Piotr Indyk
Ackermann’s function A
Define

  Ak(j) = { j + 1              if k = 0,
          { Ak–1^(j+1)(j)      if k ≥ 1     ⊳ iterate Ak–1(·) j+1 times.

Define α(n) = min {k : Ak(1) ≥ n}.

A0(j) = j + 1                                   A0(1) = 2
A1(j) = A0(…(A0(j))…) ≈ 2j                      A1(1) = 3
A2(j) = A1(…(A1(j))…) ≈ j 2^j                   A2(1) = 7
A3(j) > 2^2^…^2 (a tower of j 2s)               A3(1) = 2047
A4(j) is a lot bigger.                          A4(1) > 2^2^…^2^2047 (a tower of height 2048)
Introduction to Algorithms October 18, 2004 L10.42© 2004 by Erik Demaine and Piotr Indyk
The Theorem

Theorem: In general, the amortized cost is O(α(n)), where α(n) grows really, really, really slowly.
Proof: Really, really, really long. (See CLRS, p. 509.)
Introduction to Algorithms October 18, 2004 L10.43© 2004 by Erik Demaine and Piotr Indyk
Application: Dynamic connectivity

Sets of vertices represent connected components. Suppose a graph is given to us incrementally by
• ADD-VERTEX(v) – MAKE-SET(v)
• ADD-EDGE(u, v) – if not CONNECTED(u, v) then UNION(u, v)
and we want to support connectivity queries:
• CONNECTED(u, v): FIND-SET(u) = FIND-SET(v)
  Are u and v in the same connected component?

For example, we want to maintain a spanning forest, so we check whether each new edge connects a previously disconnected pair of vertices.
Introduction to Algorithms October 18, 2004 L10.45© 2004 by Erik Demaine and Piotr Indyk
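The dynamic-connectivity reduction above can be sketched in a few lines. This is a minimal illustration, not the lecture's code; it uses union by size ("smaller into larger") plus path compression, and the class and method names are illustrative choices.

```python
class DisjointSet:
    """Union-find with union by size and path compression (a sketch)."""
    def __init__(self):
        self.parent = {}
        self.size = {}

    def make_set(self, x):            # ADD-VERTEX
        if x not in self.parent:
            self.parent[x] = x
            self.size[x] = 1

    def find_set(self, x):            # walk up to the root...
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        while self.parent[x] != root: # ...then compress the path
            self.parent[x], x = root, self.parent[x]
        return root

    def union(self, x, y):            # smaller-into-larger linking
        rx, ry = self.find_set(x), self.find_set(y)
        if rx == ry:
            return
        if self.size[rx] < self.size[ry]:
            rx, ry = ry, rx
        self.parent[ry] = rx
        self.size[rx] += self.size[ry]

# Dynamic connectivity on top of the disjoint-set structure:
ds = DisjointSet()
for v in "abcd":
    ds.make_set(v)                    # ADD-VERTEX(v)
ds.union("a", "b")                    # ADD-EDGE(a, b)
ds.union("c", "d")                    # ADD-EDGE(c, d)
connected = ds.find_set("a") == ds.find_set("b")   # CONNECTED(a, b)
```

With both tricks in place, each operation runs in O(α(n)) amortized time, matching the theorem above.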
Simple balanced-tree solution
Store each set Si = {x1, x2, …, xk} as a balanced tree (ignoring keys). Define the representative element rep[Si] to be the root of the tree.

[Figure: tree for Si = {x1, x2, x3, x4, x5}, rooted at x1 = rep[Si].]

• MAKE-SET(x) initializes x as a lone node. – Θ(1)
• FIND-SET(x) walks up the tree containing x until it reaches the root. – Θ(lg n)
• UNION(x, y) concatenates the trees containing x and y, changing rep. – Θ(lg n)
Introduction to Algorithms October 18, 2004 L10.46© 2004 by Erik Demaine and Piotr Indyk
Plan of attackWe will build a simple disjoint-union data structurethat, in an amortized sense, performs significantlybetter than Θ(lg n) per op., even better thanΘ(lg lg n), Θ(lg lg lg n), etc., but not quite Θ(1).
To reach this goal, we will introduce two key tricks.Each trick converts a trivial Θ(n) solution into asimple Θ(lg n) amortized solution. Together, thetwo tricks yield a much better solution.
First trick arises in an augmented linked list.Second trick arises in a tree structure.
Introduction to Algorithms October 18, 2004 L10.47© 2004 by Erik Demaine and Piotr Indyk
Each element xj stores a pointer rep[xj] to rep[Si].

UNION(x, y)
• concatenates the lists containing x and y, and
• updates the rep pointers for all elements in the list containing y.
Introduction to Algorithms October 18, 2004 L10.48© 2004 by Erik Demaine and Piotr Indyk
Analysis of Trick 2 alone
Theorem: The total cost of FIND-SETs is O(m lg n).
Proof: Amortization by potential function.
The weight of a node x is the number of nodes in its subtree.
Define φ(x1, …, xn) = Σi lg weight[xi].
UNION(xi, xj) increases the potential of root(FIND-SET(xi)) by at most lg weight[root(FIND-SET(xj))] ≤ lg n.
Each step down p → c made by FIND-SET(xi), except the first, moves c's subtree out of p's subtree. Thus if weight[c] ≥ ½ weight[p], then φ decreases by ≥ 1, paying for the step down. There can be at most lg n steps p → c for which weight[c] < ½ weight[p].
Introduction to Algorithms October 18, 2004 L10.49© 2004 by Erik Demaine and Piotr Indyk
Analysis of Trick 2 aloneTheorem: If all UNION operations occur beforeall FIND-SET operations, then total cost is O(m).
Proof: If a FIND-SET operation traverses a path with k nodes, costing O(k) time, then k – 2 nodes are made new children of the root. This change can happen only once for each of the n elements, so the total cost of the f FIND-SET operations is O(f + n) = O(m).
Introduction to Algorithms October 18, 2004 L10.50© 2004 by Erik Demaine and Piotr Indyk
UNION(x, y)
• Every tree has a rank.
• Rank is an upper bound on height.
• When we take UNION(x, y):
  – If rank[x] > rank[y], then link y to x.
  – If rank[x] < rank[y], then link x to y.
  – If rank[x] = rank[y], then
    • link x to y,
    • rank[y] ← rank[y] + 1.
• Can show that 2^rank(x) ≤ #elements in x's tree (Exercise 21.4-2).
• Therefore, height is O(lg n).
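The linking rules above can be sketched directly. This is an illustrative version using flat parent/rank arrays (a layout choice of this sketch, not from the slides); path compression is deliberately omitted to isolate union by rank.

```python
# Union by rank, following the three linking rules above.
parent = list(range(8))   # parent[i] == i means i is a root
rank = [0] * 8            # rank upper-bounds the height of each root's tree

def find_set(x):
    while parent[x] != x:
        x = parent[x]
    return x

def union(x, y):
    x, y = find_set(x), find_set(y)
    if x == y:
        return
    if rank[x] > rank[y]:
        parent[y] = x          # link y to x
    elif rank[x] < rank[y]:
        parent[x] = y          # link x to y
    else:
        parent[x] = y          # equal ranks: link x to y...
        rank[y] += 1           # ...and increment y's rank

for i in range(7):
    union(i, i + 1)            # union 8 elements into one tree
# Since 2^rank(root) <= #elements, the root's rank is at most lg 8 = 3.
```

The invariant 2^rank(x) ≤ #elements is exactly what bounds the height by O(lg n).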
Introduction to Algorithms6.046J/18.401J
Prof. Charles E. Leiserson
LECTURE 11Amortized analysis• Dynamic tables• Aggregate method• Accounting method• Potential method
Introduction to Algorithms October 20, 2004 L14.2© 2001–4 by Charles E. Leiserson
How large should a hash table be?
Problem: What if we don't know the proper size in advance?

Goal: Make the table as small as possible, but large enough so that it won't overflow (or otherwise become inefficient).

Solution: Dynamic tables.

IDEA: Whenever the table overflows, “grow” it by allocating (via malloc or new) a new, larger table. Move all items from the old table into the new one, and free the storage for the old table.
Introduction to Algorithms October 20, 2004 L14.3© 2001–4 by Charles E. Leiserson
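The grow-on-overflow idea can be sketched as a tiny class. This is purely illustrative (Python lists already implement a similar scheme internally); the `moves` counter is added here to make the later cost analysis visible.

```python
# A minimal dynamic table with doubling, mirroring the IDEA above.
class DynamicTable:
    def __init__(self):
        self.capacity = 1
        self.num = 0
        self.slots = [None]            # the "malloc'd" table
        self.moves = 0                 # count item moves, for analysis

    def insert(self, item):
        if self.num == self.capacity:  # overflow: allocate a table
            self.capacity *= 2         # twice as large...
            new_slots = [None] * self.capacity
            for i in range(self.num):  # ...move every old item over...
                new_slots[i] = self.slots[i]
                self.moves += 1
            self.slots = new_slots     # ...and drop the old table
        self.slots[self.num] = item
        self.num += 1

t = DynamicTable()
for i in range(1, 17):
    t.insert(i)
# Doublings move 1 + 2 + 4 + 8 = 15 < 16 = n items, so total work is Θ(n).
```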
Example of a dynamic table

1. INSERT 1 — table of size 1 now holds {1}.
2. INSERT 2 — overflow: allocate a table of size 2, move item 1, insert 2.
3. INSERT 3 — overflow: allocate a table of size 4, move items 1–2, insert 3.
4. INSERT 4 — room available; insert 4.
5. INSERT 5 — overflow: allocate a table of size 8, move items 1–4, insert 5.
6. INSERT 6 — insert 6.
7. INSERT 7 — insert 7.
Introduction to Algorithms October 20, 2004 L14.14© 2001–4 by Charles E. Leiserson
Worst-case analysis
Consider a sequence of n insertions. The worst-case time to execute one insertion is Θ(n). Therefore, the worst-case time for n insertions is n · Θ(n) = Θ(n^2).

WRONG! In fact, the worst-case cost for n insertions is only Θ(n).

Let's see why.
Introduction to Algorithms October 20, 2004 L14.15© 2001–4 by Charles E. Leiserson
Tighter analysis
Let ci = the cost of the i th insertion
       = i if i – 1 is an exact power of 2,
         1 otherwise.

i       1  2  3  4  5  6  7  8  9  10
sizei   1  2  4  4  8  8  8  8  16 16
ci      1  2  3  1  5  1  1  1  9  1
      = 1  1  1  1  1  1  1  1  1  1    (the insertion itself)
      +    1  2     4           8       (moving items on overflow)
Introduction to Algorithms October 20, 2004 L14.17© 2001–4 by Charles E. Leiserson
Tighter analysis (continued)
Cost of n insertions = Σ_{i=1}^{n} ci
                     ≤ n + Σ_{j=0}^{⌊lg(n–1)⌋} 2^j
                     ≤ 3n
                     = Θ(n).
Thus, the average cost of each dynamic-table operation is Θ(n)/n = Θ(1).
Introduction to Algorithms October 20, 2004 L14.18© 2001–4 by Charles E. Leiserson
Amortized analysisAn amortized analysis is any strategy for analyzing a sequence of operations to show that the average cost per operation is small, even though a single operation within the sequence might be expensive.
Even though we're taking averages, probability is not involved!
• An amortized analysis guarantees the average performance of each operation in the worst case.
Introduction to Algorithms October 20, 2004 L14.19© 2001–4 by Charles E. Leiserson
Types of amortized analysesThree common amortization arguments:• the aggregate method,• the accounting method,• the potential method.We’ve just seen an aggregate analysis. The aggregate method, though simple, lacks the precision of the other two methods. In particular, the accounting and potential methods allow a specific amortized cost to be allocated to each operation.
Introduction to Algorithms October 20, 2004 L14.20© 2001–4 by Charles E. Leiserson
Accounting method
• Charge the i th operation a fictitious amortized cost ĉi, where $1 pays for 1 unit of work (i.e., time).
• This fee is consumed to perform the operation.
• Any amount not immediately consumed is stored in the bank for use by subsequent operations.
• The bank balance must not go negative! We must ensure that

      Σ_{i=1}^{n} ci ≤ Σ_{i=1}^{n} ĉi

  for all n.
• Thus, the total amortized costs provide an upper bound on the total true costs.
Introduction to Algorithms October 20, 2004 L14.21© 2001–4 by Charles E. Leiserson
Accounting analysis of dynamic tables

Charge an amortized cost of ĉi = $3 for the i th insertion.
• $1 pays for the immediate insertion.
• $2 is stored for later table doubling.
When the table doubles, $1 pays to move a recent item, and $1 pays to move an old item.

Example: [Figure: a table just after doubling — the moved items carry $0 each; each newly inserted item carries $2, ready to pay for the next doubling.]
Introduction to Algorithms October 20, 2004 L14.24© 2001–4 by Charles E. Leiserson
Accounting analysis (continued)
Key invariant: Bank balance never drops below 0. Thus, the sum of the amortized costs provides an upper bound on the sum of the true costs.
i       1  2  3  4  5  6  7  8  9  10
sizei   1  2  4  4  8  8  8  8  16 16
ci      1  2  3  1  5  1  1  1  9  1
ĉi      2* 3  3  3  3  3  3  3  3  3
banki   1  2  2  4  2  4  6  8  2  4
*Okay, so I lied. The first operation costs only $2, not $3.
Introduction to Algorithms October 20, 2004 L14.25© 2001–4 by Charles E. Leiserson
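The bank-balance table above is easy to check mechanically. The sketch below recomputes it, charging $3 per insertion ($2 for the first, per the footnote) and verifying the key invariant that the balance never goes negative.

```python
def is_power_of_two(k):
    """True iff k is an exact power of 2 (k >= 1)."""
    return k >= 1 and (k & (k - 1)) == 0

def true_cost(i):
    # ci = i if i - 1 is an exact power of 2, else 1
    return i if is_power_of_two(i - 1) else 1

bank = 0
banks = []
for i in range(1, 11):
    c_hat = 2 if i == 1 else 3        # first insertion charged only $2
    bank += c_hat - true_cost(i)      # deposit the surplus (or pay out)
    banks.append(bank)

# banks reproduces the banki row: [1, 2, 2, 4, 2, 4, 6, 8, 2, 4],
# and it never drops below 0, so the amortized costs are valid.
```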
Potential method
IDEA: View the bank account as the potential energy (à la physics) of the dynamic set.
Framework:
• Start with an initial data structure D0.
• Operation i transforms Di–1 to Di.
• The cost of operation i is ci.
• Define a potential function Φ : {Di} → R, such that Φ(D0) = 0 and Φ(Di) ≥ 0 for all i.
• The amortized cost ĉi with respect to Φ is defined to be ĉi = ci + Φ(Di) – Φ(Di–1).
Introduction to Algorithms October 20, 2004 L14.26© 2001–4 by Charles E. Leiserson
Understanding potentialsĉi = ci + Φ(Di) – Φ(Di–1)
potential difference ∆Φi
• If ∆Φi > 0, then ĉi > ci. Operation i stores work in the data structure for later use.
• If ∆Φi < 0, then ĉi < ci. The data structure delivers up stored work to help pay for operation i.
Introduction to Algorithms October 20, 2004 L14.27© 2001–4 by Charles E. Leiserson
The amortized costs bound the true costs

The total amortized cost of n operations is

  Σ_{i=1}^{n} ĉi = Σ_{i=1}^{n} (ci + Φ(Di) – Φ(Di–1))
                 = Σ_{i=1}^{n} ci + Φ(Dn) – Φ(D0)      (the series telescopes)
                 ≥ Σ_{i=1}^{n} ci,                     since Φ(Dn) ≥ 0 and Φ(D0) = 0.
Introduction to Algorithms October 20, 2004 L14.30© 2001–4 by Charles E. Leiserson
Potential analysis of table doubling
Define the potential of the table after the i th insertion by Φ(Di) = 2i – 2^⌈lg i⌉. (Assume that 2^⌈lg 0⌉ = 0.)
Note:
• Φ(D0) = 0,
• Φ(Di) ≥ 0 for all i.
Example: a table of size 8 holding 6 items has Φ = 2·6 – 2^3 = 4 (matching the $2 + $2 stored by the accounting method).
Introduction to Algorithms October 20, 2004 L14.31© 2001–4 by Charles E. Leiserson
Calculation of amortized costs
The amortized cost of the i th insertion is

ĉi = ci + Φ(Di) – Φ(Di–1)

   = { i + (2i – 2^⌈lg i⌉) – (2(i–1) – 2^⌈lg (i–1)⌉)   if i – 1 is an exact power of 2,
     { 1 + (2i – 2^⌈lg i⌉) – (2(i–1) – 2^⌈lg (i–1)⌉)   otherwise.
Introduction to Algorithms October 20, 2004 L14.32© 2001–4 by Charles E. Leiserson
Calculation (Case 1)
Case 1: i – 1 is an exact power of 2.
ĉi = i + (2i – 2^⌈lg i⌉) – (2(i–1) – 2^⌈lg (i–1)⌉)
   = i + 2 – (2^⌈lg i⌉ – 2^⌈lg (i–1)⌉)
   = i + 2 – (2(i – 1) – (i – 1))
   = i + 2 – 2i + 2 + i – 1
   = 3
Introduction to Algorithms October 20, 2004 L14.33© 2001–4 by Charles E. Leiserson
Calculation (Case 2)
Case 2: i – 1 is not an exact power of 2.
ĉi = 1 + (2i – 2^⌈lg i⌉) – (2(i–1) – 2^⌈lg (i–1)⌉)
   = 1 + 2 – (2^⌈lg i⌉ – 2^⌈lg (i–1)⌉)
   = 3,
since ⌈lg i⌉ = ⌈lg (i–1)⌉ when i – 1 is not an exact power of 2.
Therefore, n insertions cost Θ(n) in the worst case.
Exercise: Fix the bug in this analysis to show that the amortized cost of the first insertion is only 2.
Introduction to Algorithms October 20, 2004 L14.34© 2001–4 by Charles E. Leiserson
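The two-case calculation above can be checked numerically. The sketch below evaluates ĉi = ci + Φ(Di) – Φ(Di–1) with Φ(Di) = 2i – 2^⌈lg i⌉ and confirms that every insertion after the first has amortized cost exactly 3 (and the first costs 2, the "bug" the exercise asks about).

```python
import math

def phi(i):
    # Potential after the i-th insertion: 2i - 2^ceil(lg i), with phi(0) = 0.
    return 0 if i == 0 else 2 * i - 2 ** math.ceil(math.log2(i))

def true_cost(i):
    # ci = i if i - 1 is an exact power of 2, else 1.
    return i if i >= 2 and (i - 1) & (i - 2) == 0 else 1

amortized = [true_cost(i) + phi(i) - phi(i - 1) for i in range(1, 17)]
# amortized == [2, 3, 3, 3, ...]: constant per insertion, so n insertions
# cost Θ(n) in the worst case.
```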
Conclusions• Amortized costs can provide a clean abstraction
of data-structure performance.• Any of the analysis methods can be used when
an amortized analysis is called for, but each method has some situations where it is arguably the simplest.
• Different schemes may work for assigning amortized costs in the accounting method, or potentials in the potential method, sometimes yielding radically different bounds.
Introduction to Algorithms6.046J/18.401J
Prof. Charles E. Leiserson
LECTURE 12
Dynamic programming
• Longest common subsequence
• Optimal substructure
• Overlapping subproblems
Introduction to Algorithms October 25, 2004 L12.2© 2001–4 by Charles E. Leiserson
Dynamic programming
Design technique, like divide-and-conquer.

Example: Longest Common Subsequence (LCS)
• Given two sequences x[1 . . m] and y[1 . . n], find a longest subsequence common to them both (“a” not “the”).

x: A B C B D A B
y: B D C A B A

BCBA = LCS(x, y) — functional notation, but not a function.
Introduction to Algorithms October 25, 2004 L12.6© 2001–4 by Charles E. Leiserson
Brute-force LCS algorithm

Check every subsequence of x[1 . . m] to see if it is also a subsequence of y[1 . . n].

Analysis
• Checking = O(n) time per subsequence.
• 2^m subsequences of x (each bit-vector of length m determines a distinct subsequence of x).
Worst-case running time = O(n 2^m) = exponential time.
Introduction to Algorithms October 25, 2004 L12.8© 2001–4 by Charles E. Leiserson
Towards a better algorithm
Simplification:
1. Look at the length of a longest-common subsequence.
2. Extend the algorithm to find the LCS itself.

Notation: Denote the length of a sequence s by | s |.

Strategy: Consider prefixes of x and y.
• Define c[i, j] = | LCS(x[1 . . i], y[1 . . j]) |.
• Then, c[m, n] = | LCS(x, y) |.
Introduction to Algorithms October 25, 2004 L12.11© 2001–4 by Charles E. Leiserson
Recursive formulation
Theorem.

c[i, j] = { c[i–1, j–1] + 1             if x[i] = y[j],
          { max{c[i–1, j], c[i, j–1]}   otherwise.

Proof. Case x[i] = y[j]:

[Figure: x[1 . . m] and y[1 . . n] aligned, with x[i] = y[j] matched.]

Let z[1 . . k] = LCS(x[1 . . i], y[1 . . j]), where c[i, j] = k. Then, z[k] = x[i], or else z could be extended. Thus, z[1 . . k–1] is a CS of x[1 . . i–1] and y[1 . . j–1].
Introduction to Algorithms October 25, 2004 L12.14© 2001–4 by Charles E. Leiserson
Proof (continued)
Claim: z[1 . . k–1] = LCS(x[1 . . i–1], y[1 . . j–1]).

Suppose w is a longer CS of x[1 . . i–1] and y[1 . . j–1], that is, |w| > k–1. Then, cut and paste: w || z[k] (w concatenated with z[k]) is a common subsequence of x[1 . . i] and y[1 . . j] with |w || z[k]| > k. Contradiction, proving the claim.

Thus, c[i–1, j–1] = k–1, which implies that c[i, j] = c[i–1, j–1] + 1.
The other cases are similar.
Introduction to Algorithms October 25, 2004 L12.16© 2001–4 by Charles E. Leiserson
Dynamic-programming hallmark #1

Optimal substructure
An optimal solution to a problem (instance) contains optimal solutions to subproblems.

If z = LCS(x, y), then any prefix of z is an LCS of a prefix of x and a prefix of y.
Introduction to Algorithms October 25, 2004 L12.18© 2001–4 by Charles E. Leiserson
Recursive algorithm for LCS

LCS(x, y, i, j)
  if x[i] = y[j]
    then c[i, j] ← LCS(x, y, i–1, j–1) + 1
    else c[i, j] ← max{LCS(x, y, i–1, j),
                       LCS(x, y, i, j–1)}

Worst case: x[i] ≠ y[j], in which case the algorithm evaluates two subproblems, each with only one parameter decremented.
Introduction to Algorithms October 25, 2004 L12.20© 2001–4 by Charles E. Leiserson
Recursion tree
m = 3, n = 4:

[Figure: recursion tree of subproblems (i, j), rooted at (3,4) with children (2,4) and (3,3), continuing down to leaves such as (1,3) and (2,2); the subtree rooted at (2,3) appears twice — the same subproblem.]

Height = m + n ⇒ work potentially exponential, but we're solving subproblems already solved!
Introduction to Algorithms October 25, 2004 L12.23© 2001–4 by Charles E. Leiserson
Dynamic-programming hallmark #2

Overlapping subproblems
A recursive solution contains a “small” number of distinct subproblems repeated many times.

The number of distinct LCS subproblems for two strings of lengths m and n is only mn.
Introduction to Algorithms October 25, 2004 L12.25© 2001–4 by Charles E. Leiserson
Memoization algorithm
Memoization: After computing a solution to a subproblem, store it in a table. Subsequent calls check the table to avoid redoing work.

LCS(x, y, i, j)
  if c[i, j] = NIL
    then if x[i] = y[j]
           then c[i, j] ← LCS(x, y, i–1, j–1) + 1
           else c[i, j] ← max{LCS(x, y, i–1, j),
                              LCS(x, y, i, j–1)}
⊳ same as before

Time = Θ(mn) = constant work per table entry.
Space = Θ(mn).
Introduction to Algorithms October 25, 2004 L12.28© 2001–4 by Charles E. Leiserson
Dynamic-programming algorithm

IDEA: Compute the table bottom-up.

[Figure: the c[i, j] table for the prefixes of x = ABCBDAB and y = BDCABA, filled bottom-up with entries 0 through 4; tracing backwards from c[m, n] spells out the LCS BCBA.]

Time = Θ(mn).
Reconstruct the LCS by tracing backwards.
Space = Θ(mn). Exercise: O(min{m, n}).
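The bottom-up algorithm with traceback can be sketched as follows. This mirrors the recurrence and the backwards trace described above; the variable names are illustrative.

```python
def lcs(x, y):
    """Length table c[i][j] computed bottom-up, then traced backwards."""
    m, n = len(x), len(y)
    c = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if x[i - 1] == y[j - 1]:                  # case x[i] = y[j]
                c[i][j] = c[i - 1][j - 1] + 1
            else:                                     # take the better prefix
                c[i][j] = max(c[i - 1][j], c[i][j - 1])
    # Reconstruct an LCS by tracing backwards from c[m][n].
    out, i, j = [], m, n
    while i > 0 and j > 0:
        if x[i - 1] == y[j - 1]:
            out.append(x[i - 1]); i -= 1; j -= 1
        elif c[i - 1][j] >= c[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return "".join(reversed(out))

# The slides' example has LCS length 4 ("a" not "the": BCBA is one answer).
length = len(lcs("ABCBDAB", "BDCABA"))
```

Time and space are both Θ(mn), matching the analysis; the O(min{m, n})-space exercise would keep only two rows of c (at the price of the traceback).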
Introduction to Algorithms6.046J/18.401J
Prof. Charles E. Leiserson
LECTURE 13
Graph algorithms
• Graph representation
• Minimum spanning trees
Greedy algorithms
• Optimal substructure
• Greedy choice
• Prim's greedy MST algorithm
Introduction to Algorithms October 27, 2004 L13.2© 2001–4 by Charles E. Leiserson
Graphs (review)
Definition. A directed graph (digraph) G = (V, E) is an ordered pair consisting of
• a set V of vertices (singular: vertex),
• a set E ⊆ V × V of edges.
In an undirected graph G = (V, E), the edge set E consists of unordered pairs of vertices.
In either case, we have |E| = O(V^2). Moreover, if G is connected, then |E| ≥ |V| – 1, which implies that lg |E| = Θ(lg V). (Review CLRS, Appendix B.)
Introduction to Algorithms October 27, 2004 L13.3© 2001–4 by Charles E. Leiserson
Adjacency-matrix representation

The adjacency matrix of a graph G = (V, E), where V = {1, 2, …, n}, is the matrix A[1 . . n, 1 . . n] given by

A[i, j] = { 1 if (i, j) ∈ E,
          { 0 if (i, j) ∉ E.

[Figure: digraph on vertices 1, 2, 3, 4 with edges (1,2), (1,3), (2,3), (4,3).]

A   1 2 3 4
1   0 1 1 0
2   0 0 1 0
3   0 0 0 0
4   0 0 1 0

Θ(V^2) storage ⇒ dense representation.
Introduction to Algorithms October 27, 2004 L13.5© 2001–4 by Charles E. Leiserson
Adjacency-list representation
An adjacency list of a vertex v ∈ V is the list Adj[v] of vertices adjacent to v.

[Figure: the same digraph on vertices 1, 2, 3, 4.]

Adj[1] = {2, 3}
Adj[2] = {3}
Adj[3] = {}
Adj[4] = {3}

For undirected graphs, |Adj[v]| = degree(v).
For digraphs, |Adj[v]| = out-degree(v).

Handshaking Lemma: Σ_{v∈V} degree(v) = 2|E| for undirected graphs ⇒ adjacency lists use Θ(V + E) storage — a sparse representation (for either type of graph).
Introduction to Algorithms October 27, 2004 L13.8© 2001–4 by Charles E. Leiserson
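The adjacency-list representation and the Handshaking Lemma can be sketched together. This uses the four-vertex example digraph above; the dict-of-lists layout is one common choice, not the only one.

```python
from collections import defaultdict

edges = [(1, 2), (1, 3), (2, 3), (4, 3)]

# Directed adjacency lists: |Adj[u]| = out-degree(u).
adj = defaultdict(list)
for u, v in edges:
    adj[u].append(v)

# Undirected version: store each edge in both endpoints' lists,
# so |Adj[v]| = degree(v).
undirected = defaultdict(list)
for u, v in edges:
    undirected[u].append(v)
    undirected[v].append(u)

# Handshaking Lemma: the degrees sum to 2|E|.
degree_sum = sum(len(undirected[v]) for v in list(undirected))
```

Total storage is Θ(V + E): one list header per vertex plus one entry per edge endpoint.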
Minimum spanning trees

Input: A connected, undirected graph G = (V, E) with weight function w : E → R.
• For simplicity, assume that all edge weights are distinct. (CLRS covers the general case.)

Output: A spanning tree T — a tree that connects all vertices — of minimum weight:

  w(T) = Σ_{(u,v)∈T} w(u, v).
Introduction to Algorithms October 27, 2004 L13.10© 2001–4 by Charles E. Leiserson
Example of MST

[Figure: a graph with edge weights 6, 12, 5, 14, 3, 8, 10, 15, 9, 7; a second frame highlights the MST edges.]
Introduction to Algorithms October 27, 2004 L13.12© 2001–4 by Charles E. Leiserson
Optimal substructure
MST T: (Other edges of G are not shown.)

Remove any edge (u, v) ∈ T. Then, T is partitioned into two subtrees T1 and T2.

Theorem. The subtree T1 is an MST of G1 = (V1, E1), the subgraph of G induced by the vertices of T1:
  V1 = vertices of T1,
  E1 = { (x, y) ∈ E : x, y ∈ V1 }.
Similarly for T2.
Introduction to Algorithms October 27, 2004 L13.17© 2001–4 by Charles E. Leiserson
Proof of optimal substructure

Proof. Cut and paste:
  w(T) = w(u, v) + w(T1) + w(T2).
If T1′ were a lower-weight spanning tree than T1 for G1, then T′ = {(u, v)} ∪ T1′ ∪ T2 would be a lower-weight spanning tree than T for G.

Do we also have overlapping subproblems?
• Yes.

Great, then dynamic programming may work!
• Yes, but MST exhibits another powerful property which leads to an even more efficient algorithm.
Introduction to Algorithms October 27, 2004 L13.20© 2001–4 by Charles E. Leiserson
Hallmark for “greedy” algorithms

Greedy-choice property
A locally optimal choice is globally optimal.

Theorem. Let T be the MST of G = (V, E), and let A ⊆ V. Suppose that (u, v) ∈ E is the least-weight edge connecting A to V – A. Then, (u, v) ∈ T.
Introduction to Algorithms October 27, 2004 L13.22© 2001–4 by Charles E. Leiserson
Proof of theorem
Proof. Suppose (u, v) ∉ T. Cut and paste.

[Figure: tree T with u ∈ A, v ∈ V – A, and (u, v) the least-weight edge connecting A to V – A.]

Consider the unique simple path from u to v in T. Swap (u, v) with the first edge on this path that connects a vertex in A to a vertex in V – A. A lighter-weight spanning tree T′ than T results.
Prim’s algorithm

IDEA: Maintain V – A as a priority queue Q. Key each vertex in Q with the weight of the least-weight edge connecting it to a vertex in A.

Q ← V
key[v] ← ∞ for all v ∈ V
key[s] ← 0 for some arbitrary s ∈ V
while Q ≠ ∅
    do u ← EXTRACT-MIN(Q)
       for each v ∈ Adj[u]
           do if v ∈ Q and w(u, v) < key[v]
                 then key[v] ← w(u, v)    ⊳ DECREASE-KEY
                      π[v] ← u

At the end, {(v, π[v])} forms the MST.
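As a sketch, the pseudocode above can be rendered in Python with a binary heap standing in for the priority queue. The standard-library heapq has no DECREASE-KEY, so this version pushes duplicate entries and skips stale ones on extraction; the sample graph is illustrative, not the lecture's example.

```python
import heapq

def prim_mst(adj, s):
    """adj: {vertex: [(neighbor, weight), ...]} for an undirected graph.
    Returns the MST as a list of edges (pi[v], v)."""
    key = {v: float('inf') for v in adj}
    pi = {v: None for v in adj}
    key[s] = 0
    in_tree = set()                    # the set A of tree vertices
    pq = [(0, s)]                      # lazy priority queue of (key, vertex)
    while pq:
        k, u = heapq.heappop(pq)       # EXTRACT-MIN
        if u in in_tree:
            continue                   # stale entry: u was already extracted
        in_tree.add(u)
        for v, w in adj[u]:
            if v not in in_tree and w < key[v]:
                key[v] = w             # implicit DECREASE-KEY
                pi[v] = u
                heapq.heappush(pq, (w, v))
    return [(pi[v], v) for v in adj if pi[v] is not None]

adj = {
    'a': [('b', 4), ('c', 8)],
    'b': [('a', 4), ('c', 2), ('d', 5)],
    'c': [('a', 8), ('b', 2), ('d', 5)],
    'd': [('b', 5), ('c', 5)],
}
mst = prim_mst(adj, 'a')
```

The lazy-deletion trick trades the Θ(lg V) DECREASE-KEY of the analysis below for extra heap entries, but the asymptotic bound O(E lg V) for a binary heap is unchanged.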
Example of Prim’s algorithm

[Figure sequence: Prim’s algorithm run on an 8-vertex undirected graph with edge weights 3, 5, 6, 7, 8, 9, 10, 12, 14, 15. All keys start at ∞ except key[s] = 0. Each EXTRACT-MIN moves a vertex from V – A into A, and the keys of its neighbors still in Q are decreased to the weights of their lightest connecting edges (e.g. 7, 10, 15 after the first extraction; 5, 9, 12 after the second; and so on) until Q is empty.]
Analysis of Prim

Q ← V
key[v] ← ∞ for all v ∈ V
key[s] ← 0 for some arbitrary s ∈ V        ⊳ Θ(V) total
while Q ≠ ∅                                ⊳ |V| times
    do u ← EXTRACT-MIN(Q)
       for each v ∈ Adj[u]                 ⊳ degree(u) times
           do if v ∈ Q and w(u, v) < key[v]
                 then key[v] ← w(u, v)
                      π[v] ← u

Handshaking Lemma ⇒ Θ(E) implicit DECREASE-KEY’s.

Time = Θ(V)·T_EXTRACT-MIN + Θ(E)·T_DECREASE-KEY
Analysis of Prim (continued)

Time = Θ(V)·T_EXTRACT-MIN + Θ(E)·T_DECREASE-KEY

Q              | T_EXTRACT-MIN     | T_DECREASE-KEY  | Total
array          | O(V)              | O(1)            | O(V²)
binary heap    | O(lg V)           | O(lg V)         | O(E lg V)
Fibonacci heap | O(lg V) amortized | O(1) amortized  | O(E + V lg V) worst case
MST algorithms

Kruskal’s algorithm (see CLRS):
• Uses the disjoint-set data structure (Lecture 10).
• Running time = O(E lg V).

Best to date:
• Karger, Klein, and Tarjan [1993].
• Randomized algorithm.
• O(V + E) expected time.
Introduction to Algorithms
6.046J/18.401J

Prof. Charles E. Leiserson

LECTURE 14: Shortest Paths I
• Properties of shortest paths
• Dijkstra’s algorithm
• Correctness
• Analysis
• Breadth-first search
Introduction to Algorithms November 1, 2004 L14.2© 2001–4 by Charles E. Leiserson
Paths in graphs

Consider a digraph G = (V, E) with edge-weight function w : E → R. The weight of path p = v1 → v2 → ⋯ → vk is defined to be

    w(p) = Σ_{i=1}^{k–1} w(v_i, v_{i+1}) .

Example: for the path v1 →(4) v2 →(–2) v3 →(–5) v4 →(1) v5, we have w(p) = 4 – 2 – 5 + 1 = –2.
Shortest paths

A shortest path from u to v is a path of minimum weight from u to v. The shortest-path weight from u to v is defined as
    δ(u, v) = min{w(p) : p is a path from u to v}.

Note: δ(u, v) = ∞ if no path from u to v exists.
Optimal substructure

Theorem. A subpath of a shortest path is a shortest path.

Proof. Cut and paste: if some subpath could be replaced by a lighter path between the same endpoints, the replacement would yield a lighter path overall, contradicting the minimality of the original shortest path.
Triangle inequality

Theorem. For all u, v, x ∈ V, we have
    δ(u, v) ≤ δ(u, x) + δ(x, v).

Proof. δ(u, v) is the weight of a lightest path from u to v, and the particular path that goes from u to x along a shortest path and then from x to v along a shortest path has weight δ(u, x) + δ(x, v); the minimum can be no larger.
Well-definedness of shortest paths

If a graph G contains a negative-weight cycle, then some shortest paths may not exist.

Example: [Figure: a path from u to v passing through a cycle of total weight < 0. Traversing the cycle repeatedly yields paths of arbitrarily small weight, so no minimum exists.]
Single-source shortest paths

Problem. From a given source vertex s ∈ V, find the shortest-path weights δ(s, v) for all v ∈ V.
If all edge weights w(u, v) are nonnegative, all shortest-path weights must exist.

IDEA: Greedy.
1. Maintain a set S of vertices whose shortest-path distances from s are known.
2. At each step add to S the vertex v ∈ V – S whose distance estimate from s is minimal.
3. Update the distance estimates of vertices adjacent to v.
Dijkstra’s algorithm

d[s] ← 0
for each v ∈ V – {s}
    do d[v] ← ∞
S ← ∅
Q ← V    ⊳ Q is a priority queue maintaining V – S
while Q ≠ ∅
    do u ← EXTRACT-MIN(Q)
       S ← S ∪ {u}
       for each v ∈ Adj[u]
           do if d[v] > d[u] + w(u, v)
                 then d[v] ← d[u] + w(u, v)    ⊳ relaxation step (implicit DECREASE-KEY)
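The pseudocode maps directly onto a heap-based implementation. A minimal Python sketch, again using a lazy heapq in place of DECREASE-KEY; the sample digraph's weights are chosen to echo the lecture's five-vertex example, but the graph itself is this sketch's assumption:

```python
import heapq

def dijkstra(adj, s):
    """adj: {u: [(v, w), ...]} with nonnegative weights w.
    Returns d with d[v] = delta(s, v)."""
    d = {v: float('inf') for v in adj}
    d[s] = 0
    done = set()                      # the set S of finished vertices
    pq = [(0, s)]
    while pq:
        du, u = heapq.heappop(pq)     # EXTRACT-MIN
        if u in done:
            continue                  # stale queue entry
        done.add(u)
        for v, w in adj[u]:
            if d[v] > du + w:         # relaxation step
                d[v] = du + w
                heapq.heappush(pq, (d[v], v))
    return d

adj = {
    'A': [('B', 10), ('C', 3)],
    'B': [('C', 1), ('D', 2)],
    'C': [('B', 4), ('D', 8), ('E', 2)],
    'D': [('E', 7)],
    'E': [('D', 9)],
}
d = dijkstra(adj, 'A')
```

Note that the greedy argument (Correctness, Parts I–III below) relies on the weights being nonnegative; with negative weights a finished vertex could later be improved, and Bellman-Ford is needed instead.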
Example of Dijkstra’s algorithm

[Figure: a 5-vertex digraph on A, B, C, D, E with nonnegative edge weights 10, 3, 1, 4, 7, 9, 8, 2, 2.]

Trace (d-values after each EXTRACT-MIN and the relaxation of all edges leaving the extracted vertex):

S                  A    B    C    D    E
initialize         0    ∞    ∞    ∞    ∞
{A}                0   10    3    ∞    ∞
{A, C}             0    7    3   11    5
{A, C, E}          0    7    3   11    5
{A, C, E, B}       0    7    3    9    5
{A, C, E, B, D}    0    7    3    9    5
Correctness — Part I

Lemma. Initializing d[s] ← 0 and d[v] ← ∞ for all v ∈ V – {s} establishes d[v] ≥ δ(s, v) for all v ∈ V, and this invariant is maintained over any sequence of relaxation steps.

Proof. Suppose not. Let v be the first vertex for which d[v] < δ(s, v), and let u be the vertex that caused d[v] to change: d[v] = d[u] + w(u, v). Then,
    d[v] < δ(s, v)              supposition
         ≤ δ(s, u) + δ(u, v)    triangle inequality
         ≤ δ(s, u) + w(u, v)    shortest path ≤ specific path
         ≤ d[u] + w(u, v)       v is first violation
Contradiction.
Correctness — Part II

Lemma. Let u be v’s predecessor on a shortest path from s to v. Then, if d[u] = δ(s, u) and edge (u, v) is relaxed, we have d[v] = δ(s, v) after the relaxation.

Proof. Observe that δ(s, v) = δ(s, u) + w(u, v). Suppose that d[v] > δ(s, v) before the relaxation. (Otherwise, we’re done.) Then, the test d[v] > d[u] + w(u, v) succeeds, because d[v] > δ(s, v) = δ(s, u) + w(u, v) = d[u] + w(u, v), and the algorithm sets d[v] = d[u] + w(u, v) = δ(s, v).
Correctness — Part III

Theorem. Dijkstra’s algorithm terminates with d[v] = δ(s, v) for all v ∈ V.

Proof. It suffices to show that d[v] = δ(s, v) for every v ∈ V when v is added to S. Suppose u is the first vertex added to S for which d[u] > δ(s, u). Let y be the first vertex in V – S along a shortest path from s to u, and let x be its predecessor:

[Figure: shortest path s ⇝ x → y ⇝ u, with s and x inside S just before u is added.]

Since u is the first vertex violating the claimed invariant, we have d[x] = δ(s, x). When x was added to S, the edge (x, y) was relaxed, which implies that d[y] = δ(s, y) ≤ δ(s, u) < d[u]. But, d[u] ≤ d[y] by our choice of u. Contradiction.
Analysis of Dijkstra

while Q ≠ ∅                                ⊳ |V| times
    do u ← EXTRACT-MIN(Q)
       S ← S ∪ {u}
       for each v ∈ Adj[u]                 ⊳ degree(u) times
           do if d[v] > d[u] + w(u, v)
                 then d[v] ← d[u] + w(u, v)

Handshaking Lemma ⇒ Θ(E) implicit DECREASE-KEY’s.

Time = Θ(V)·T_EXTRACT-MIN + Θ(E)·T_DECREASE-KEY

Note: Same formula as in the analysis of Prim’s minimum spanning tree algorithm.
Analysis of Dijkstra (continued)

Time = Θ(V)·T_EXTRACT-MIN + Θ(E)·T_DECREASE-KEY

Q              | T_EXTRACT-MIN     | T_DECREASE-KEY  | Total
array          | O(V)              | O(1)            | O(V²)
binary heap    | O(lg V)           | O(lg V)         | O(E lg V)
Fibonacci heap | O(lg V) amortized | O(1) amortized  | O(E + V lg V) worst case
Unweighted graphs

Suppose that w(u, v) = 1 for all (u, v) ∈ E. Can Dijkstra’s algorithm be improved?
• Use a simple FIFO queue instead of a priority queue.

Breadth-first search:

while Q ≠ ∅
    do u ← DEQUEUE(Q)
       for each v ∈ Adj[u]
           do if d[v] = ∞
                 then d[v] ← d[u] + 1
                      ENQUEUE(Q, v)

Analysis: Time = O(V + E).
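A minimal Python sketch of the same idea, with collections.deque as the FIFO queue (the small sample graph is illustrative):

```python
from collections import deque

def bfs_distances(adj, s):
    """Unweighted single-source shortest paths: d[v] = number of edges on a
    shortest path from s to v (infinity if v is unreachable)."""
    d = {v: float('inf') for v in adj}
    d[s] = 0
    q = deque([s])                 # FIFO queue replaces the priority queue
    while q:
        u = q.popleft()            # DEQUEUE
        for v in adj[u]:
            if d[v] == float('inf'):
                d[v] = d[u] + 1    # first discovery is along a shortest path
                q.append(v)        # ENQUEUE
    return d

adj = {
    'a': ['b', 'd'],
    'b': ['a', 'c'],
    'c': ['b'],
    'd': ['a', 'e'],
    'e': ['d'],
}
d = bfs_distances(adj, 'a')
```

Because every edge has weight 1, a vertex's first discovery already gives its final distance, so no DECREASE-KEY is ever needed; this is exactly why the FIFO queue suffices.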
Example of breadth-first search

[Figure sequence: BFS on a 9-vertex graph with vertices a, b, c, d, e, f, g, h, i. Vertices are enqueued (and later dequeued) in the order a, b, d, c, e, g, i, f, h, receiving distances d[a] = 0; d[b] = d[d] = 1; d[c] = d[e] = 2; d[g] = d[i] = 3; d[f] = d[h] = 4.]
Correctness of BFS

while Q ≠ ∅
    do u ← DEQUEUE(Q)
       for each v ∈ Adj[u]
           do if d[v] = ∞
                 then d[v] ← d[u] + 1
                      ENQUEUE(Q, v)

Key idea: The FIFO Q in breadth-first search mimics the priority queue Q in Dijkstra.
• Invariant: v comes after u in Q implies that d[v] = d[u] or d[v] = d[u] + 1.
Introduction to Algorithms
6.046J/18.401J

Prof. Charles E. Leiserson

LECTURE 15: Shortest Paths II
• Bellman-Ford algorithm
• DAG shortest paths
• Linear programming and difference constraints
• VLSI layout compaction
Introduction to Algorithms November 3, 2004 L15.2© 2001–4 by Charles E. Leiserson
Negative-weight cycles

Recall: If a graph G = (V, E) contains a negative-weight cycle, then some shortest paths may not exist.

Example: [Figure: a path from u to v passing through a cycle of total weight < 0.]

Bellman-Ford algorithm: Finds all shortest-path lengths from a source s ∈ V to all v ∈ V or determines that a negative-weight cycle exists.
Bellman-Ford algorithm

d[s] ← 0                                         ⊳ initialization
for each v ∈ V – {s}
    do d[v] ← ∞
for i ← 1 to |V| – 1
    do for each edge (u, v) ∈ E
           do if d[v] > d[u] + w(u, v)           ⊳ relaxation step
                 then d[v] ← d[u] + w(u, v)
for each edge (u, v) ∈ E
    do if d[v] > d[u] + w(u, v)
           then report that a negative-weight cycle exists

At the end, d[v] = δ(s, v), if no negative-weight cycles. Time = O(VE).
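A direct Python transcription of the pseudocode, with an edge list as input (the CLRS-style sample graph below is illustrative, not the lecture's exact figure):

```python
def bellman_ford(vertices, edges, s):
    """edges: list of (u, v, w) triples.
    Returns (d, has_negative_cycle)."""
    d = {v: float('inf') for v in vertices}
    d[s] = 0
    for _ in range(len(vertices) - 1):        # |V| - 1 passes over E
        for u, v, w in edges:
            if d[u] + w < d[v]:               # relaxation step
                d[v] = d[u] + w
    # One more pass: any further improvement witnesses a negative-weight
    # cycle reachable from s.
    for u, v, w in edges:
        if d[u] + w < d[v]:
            return d, True
    return d, False

vertices = ['A', 'B', 'C', 'D', 'E']
edges = [('A', 'B', -1), ('A', 'C', 4), ('B', 'C', 3), ('B', 'D', 2),
         ('B', 'E', 2), ('D', 'B', 1), ('D', 'C', 5), ('E', 'D', -3)]
d, neg = bellman_ford(vertices, edges, 'A')
```

Unlike Dijkstra, no extraction order is needed: each full pass over E extends the set of correctly settled vertices by at least one hop along every shortest path, which is the content of the correctness proof below.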
Example of Bellman-Ford

[Figure sequence: Bellman-Ford run on a 5-vertex digraph on A, B, C, D, E with edge weights –1, 4, 1, 2, –3, 2, 5, 3, the eight edges being relaxed in a fixed order (1–8) on every pass. Initialization sets d[A] = 0 and every other d-value to ∞. Pass 1 produces d[B] = –1 and d[C] = 4, then improves d[C] to 2. Pass 2 produces d[D] = 1 and d[E] = –2, after which no d-value changes: end of pass 2 (and 3 and 4).]
Correctness

Theorem. If G = (V, E) contains no negative-weight cycles, then after the Bellman-Ford algorithm executes, d[v] = δ(s, v) for all v ∈ V.

Proof. Let v ∈ V be any vertex, and consider a shortest path p from s to v with the minimum number of edges:
    p: s = v0 → v1 → v2 → ⋯ → vk = v.
Since p is a shortest path, we have
    δ(s, vi) = δ(s, vi–1) + w(vi–1, vi) .
Correctness (continued)

Initially, d[v0] = 0 = δ(s, v0), and d[v0] is unchanged by subsequent relaxations (because of the lemma from Lecture 14 that d[v] ≥ δ(s, v)).
• After 1 pass through E, we have d[v1] = δ(s, v1).
• After 2 passes through E, we have d[v2] = δ(s, v2).
    ⋮
• After k passes through E, we have d[vk] = δ(s, vk).
Since G contains no negative-weight cycles, p is simple. The longest simple path has ≤ |V| – 1 edges, so |V| – 1 passes suffice.
Detection of negative-weight cycles
Corollary. If a value d[v] fails to converge after |V| – 1 passes, there exists a negative-weight cycle in G reachable from s.
Linear programming

Let A be an m×n matrix, b be an m-vector, and c be an n-vector. Find an n-vector x that maximizes cᵀx subject to Ax ≤ b, or determine that no such solution exists.

[Figure: the m×n matrix A times the n-vector x, componentwise ≤ the m-vector b, while maximizing the scalar cᵀx.]
Introduction to Algorithms November 3, 2004 L15.31© 2001–4 by Charles E. Leiserson
Linear-programming algorithms
Algorithms for the general problem• Simplex methods — practical, but worst-case
exponential time.• Interior-point methods — polynomial time and
competes with simplex.
Introduction to Algorithms November 3, 2004 L15.32© 2001–4 by Charles E. Leiserson
Linear-programming algorithms
Algorithms for the general problem• Simplex methods — practical, but worst-case
exponential time.• Interior-point methods — polynomial time and
competes with simplex.
Feasibility problem: No optimization criterion. Just find x such that Ax ≤ b.• In general, just as hard as ordinary LP.
Solving a system of difference constraints

Linear programming where each row of A contains exactly one 1, one –1, and the rest 0’s: every constraint has the form xj – xi ≤ wij.

Example:
    x1 – x2 ≤ 3
    x2 – x3 ≤ –2
    x1 – x3 ≤ 2

Solution:
    x1 = 3
    x2 = 0
    x3 = 2

Constraint graph: each constraint xj – xi ≤ wij becomes an edge from vi to vj with weight wij. (The “A” matrix has dimensions |E| × |V|.)
Unsatisfiable constraints

Theorem. If the constraint graph contains a negative-weight cycle, then the system of differences is unsatisfiable.

Proof. Suppose that the negative-weight cycle is v1 → v2 → ⋯ → vk → v1. Then, we have
    x2 – x1 ≤ w12
    x3 – x2 ≤ w23
        ⋮
    xk – xk–1 ≤ wk–1,k
    x1 – xk ≤ wk1
Summing these inequalities, the left-hand sides telescope to 0 while the right-hand sides sum to the weight of the cycle, which would force 0 ≤ weight of cycle < 0. Therefore, no values for the xi can satisfy the constraints.
Satisfying the constraints

Theorem. Suppose no negative-weight cycle exists in the constraint graph. Then, the constraints are satisfiable.

Proof. Add a new vertex s to V with a 0-weight edge to each vertex vi ∈ V.

[Figure: s with 0-weight edges to vertices v1, v3, v4, v7, v9.]

Note: No negative-weight cycles are introduced ⇒ shortest paths exist.
Proof (continued)

Claim: The assignment xi = δ(s, vi) solves the constraints.

Consider any constraint xj – xi ≤ wij, and consider the shortest paths from s to vj and vi:

[Figure: shortest paths of weight δ(s, vi) to vi and δ(s, vj) to vj, with edge (vi, vj) of weight wij.]

The triangle inequality gives us δ(s, vj) ≤ δ(s, vi) + wij. Since xi = δ(s, vi) and xj = δ(s, vj), the constraint xj – xi ≤ wij is satisfied.
Introduction to Algorithms November 3, 2004 L15.43© 2001–4 by Charles E. Leiserson
Bellman-Ford and linear programming
Corollary. The Bellman-Ford algorithm can solve a system of m difference constraints on n variables in O(mn) time.

Single-source shortest paths is a simple LP problem. In fact, Bellman-Ford maximizes x1 + x2 + ⋯ + xn subject to the constraints xj – xi ≤ wij and xi ≤ 0 (exercise). Bellman-Ford also minimizes maxi xi – mini xi (exercise).
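The corollary above can be sketched in executable form. A minimal Python sketch, assuming the encoding below (the function name, variable numbering, and tuple format are illustrative, not from the lecture):

```python
# Sketch: solve a system of difference constraints x_j - x_i <= w
# with Bellman-Ford, as in the corollary above. Variables are 0..n-1.

def solve_difference_constraints(n, constraints):
    """constraints: list of (i, j, w) meaning x_j - x_i <= w.
    Returns a satisfying assignment, or None if a negative-weight
    cycle makes the system unsatisfiable."""
    # Constraint graph: edge v_i -> v_j of weight w per constraint,
    # plus a virtual source s = n with 0-weight edges to every vertex.
    edges = list(constraints) + [(n, v, 0) for v in range(n)]
    INF = float("inf")
    d = [INF] * (n + 1)
    d[n] = 0

    # Standard Bellman-Ford: |V| - 1 = n rounds of relaxation.
    for _ in range(n):
        for (u, v, w) in edges:
            if d[u] + w < d[v]:
                d[v] = d[u] + w

    # One more pass: any further relaxation means a negative cycle.
    for (u, v, w) in edges:
        if d[u] + w < d[v]:
            return None

    return d[:n]  # x_i = delta(s, v_i)
```

For example, the constraints x1 – x0 ≤ 1, x2 – x1 ≤ –2, x2 – x0 ≤ –1 are satisfiable, while x1 – x0 ≤ –1, x0 – x1 ≤ 0 form a negative-weight cycle.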
Application to VLSI layout compaction
Integrated-circuit features: [figure showing two features separated by a minimum separation λ]

Problem: Compact (in one dimension) the space between the features of a VLSI layout without bringing any features too close together.
VLSI layout compaction
[figure: feature 1 of width d1 at position x1, feature 2 at position x2]

Constraint: x2 – x1 ≥ d1 + λ, equivalently x1 – x2 ≤ –(d1 + λ), a difference constraint. Bellman-Ford minimizes maxi xi – mini xi, which compacts the layout in the x-dimension.
Introduction to Algorithms 6.046J/18.401J

Prof. Charles E. Leiserson

LECTURE 16: Shortest Paths III
• All-pairs shortest paths
• Matrix-multiplication algorithm
• Floyd-Warshall algorithm
• Johnson’s algorithm
Introduction to Algorithms November 8, 2004 L16.2© 2001–4 by Charles E. Leiserson
Shortest paths

Single-source shortest paths
• Nonnegative edge weights: Dijkstra’s algorithm, O(E + V lg V)
• General: Bellman-Ford algorithm, O(VE)
• DAG: one pass of Bellman-Ford, O(V + E)

All-pairs shortest paths
• Nonnegative edge weights: Dijkstra’s algorithm |V| times, O(VE + V 2 lg V)
• General: three algorithms today.
All-pairs shortest paths

Input: Digraph G = (V, E), where V = {1, 2, …, n}, with edge-weight function w : E → R.
Output: n × n matrix of shortest-path lengths δ(i, j) for all i, j ∈ V.

IDEA:
• Run Bellman-Ford once from each vertex.
• Time = O(V 2E).
• Dense graph (n2 edges) ⇒ Θ(n4) time in the worst case.

Good first try!
Dynamic programming

Consider the n × n adjacency matrix A = (aij) of the digraph, and define

dij(m) = weight of a shortest path from i to j that uses at most m edges.

Claim: We have

dij(0) = 0 if i = j, ∞ if i ≠ j;

and for m = 1, 2, …, n – 1,

dij(m) = mink {dik(m–1) + akj}.
Proof of claim

dij(m) = mink {dik(m–1) + akj}

[figure: a shortest path from i to j using at most m edges decomposes, for some penultimate vertex k, into a shortest path from i to k using at most m – 1 edges followed by the edge (k, j)]

Relaxation:
for k ← 1 to n
    do if dij > dik + akj
        then dij ← dik + akj

Note: No negative-weight cycles implies
δ(i, j) = dij(n–1) = dij(n) = dij(n+1) = ⋯
Matrix multiplication

Compute C = A · B, where C, A, and B are n × n matrices:

cij = ∑k=1..n aik bkj .

Time = Θ(n3) using the standard algorithm.

What if we map “+” → “min” and “·” → “+”?

cij = mink {aik + bkj}.

Thus, D(m) = D(m–1) “×” A.

Identity matrix = I = the matrix with 0’s on the diagonal and ∞ everywhere else = D(0) = (dij(0)).
Matrix multiplication (continued)

The (min, +) multiplication is associative, and with the real numbers, it forms an algebraic structure called a closed semiring.

Consequently, we can compute

D(1) = D(0) · A = A1
D(2) = D(1) · A = A2
  ⋮
D(n–1) = D(n–2) · A = An–1,

yielding D(n–1) = (δ(i, j)).

Time = Θ(n·n3) = Θ(n4). No better than n × B-F.
Improved matrix multiplication algorithm

Repeated squaring: A2k = Ak × Ak.
Compute A2, A4, …, A2⌈lg(n–1)⌉, i.e., O(lg n) squarings.

Time = Θ(n3 lg n).

To detect negative-weight cycles, check the diagonal for negative values in O(n) additional time.

Note: An–1 = An = An+1 = ⋯.
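A minimal Python sketch of the repeated-squaring scheme above (the matrix encoding, with 0 on the diagonal and float("inf") for absent edges, and the function names are assumptions for illustration):

```python
# Sketch of APSP by repeated (min, +) squaring, following the slides.

def min_plus_multiply(A, B):
    """(min, +) product: C[i][j] = min over k of A[i][k] + B[k][j]."""
    n = len(A)
    return [[min(A[i][k] + B[k][j] for k in range(n))
             for j in range(n)] for i in range(n)]

def apsp_repeated_squaring(A):
    """Compute D = (delta(i, j)) by squaring A until paths of
    n - 1 edges are covered: Theta(n^3 lg n) time."""
    n = len(A)
    D, m = A, 1
    while m < n - 1:
        D = min_plus_multiply(D, D)   # D now covers paths of <= 2m edges
        m *= 2
    return D
```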
Floyd-Warshall algorithm
Also dynamic programming, but faster!
Define cij(k) = weight of a shortest path from i to j with intermediate vertices belonging to the set {1, 2, …, k}.

Thus, δ(i, j) = cij(n). Also, cij(0) = aij.
Floyd-Warshall recurrence

cij(k) = min {cij(k–1), cik(k–1) + ckj(k–1)}

[figure: a shortest path from i to j with intermediate vertices in {1, 2, …, k} either avoids vertex k, with weight cij(k–1), or passes through k, with weight cik(k–1) + ckj(k–1)]
Pseudocode for Floyd-Warshall
for k ← 1 to n
    do for i ← 1 to n
        do for j ← 1 to n
            do if cij > cik + ckj
                then cij ← cik + ckj    ⊳ relaxation

Notes:
• Okay to omit superscripts, since extra relaxations can’t hurt.
• Runs in Θ(n3) time.
• Simple to code.
• Efficient in practice.
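The pseudocode above translates almost line for line into Python. A sketch, assuming the same adjacency-matrix convention (0 on the diagonal, float("inf") for absent edges):

```python
# Direct transcription of the Floyd-Warshall pseudocode above.
# Superscripts are omitted, since extra relaxations can't hurt.

def floyd_warshall(a):
    n = len(a)
    c = [row[:] for row in a]                 # work on a copy of a
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if c[i][k] + c[k][j] < c[i][j]:   # relaxation
                    c[i][j] = c[i][k] + c[k][j]
    return c
```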
Transitive closure of a directed graph
Compute tij = 1 if there exists a path from i to j, 0 otherwise.

IDEA: Use Floyd-Warshall, but with (∨, ∧) instead of (min, +):

tij(k) = tij(k–1) ∨ (tik(k–1) ∧ tkj(k–1)).

Time = Θ(n3).
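A sketch of the same triple loop with (∨, ∧) replacing (min, +); the 0/1 matrix encoding and the reflexive convention t[i][i] = 1 (the empty path) are assumptions for illustration:

```python
# Transitive closure via the Floyd-Warshall loop structure,
# using (or, and) in place of (min, +). adj is a 0/1 matrix.

def transitive_closure(adj):
    n = len(adj)
    # Start from the adjacency relation, made reflexive.
    t = [[1 if i == j else adj[i][j] for j in range(n)] for i in range(n)]
    for k in range(n):
        for i in range(n):
            for j in range(n):
                t[i][j] = t[i][j] or (t[i][k] and t[k][j])
    return t
```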
Graph reweightingTheorem. Given a function h : V → R, reweight each edge (u, v) ∈ E by wh(u, v) = w(u, v) + h(u) – h(v). Then, for any two vertices, all paths between them are reweighted by the same amount.
Graph reweighting

Proof. Let p = v1 → v2 → ⋯ → vk be a path in G. We have

wh(p) = ∑i=1..k–1 wh(vi, vi+1)
      = ∑i=1..k–1 (w(vi, vi+1) + h(vi) – h(vi+1))
      = ∑i=1..k–1 w(vi, vi+1) + h(v1) – h(vk)    ⊳ telescoping sum
      = w(p) + h(v1) – h(vk).

Since the correction h(v1) – h(vk) depends only on the endpoints, every path between the same two vertices is reweighted by the same amount.
Shortest paths in reweighted graphs

Corollary. δh(u, v) = δ(u, v) + h(u) – h(v).

IDEA: Find a function h : V → R such that wh(u, v) ≥ 0 for all (u, v) ∈ E. Then, run Dijkstra’s algorithm from each vertex on the reweighted graph.

NOTE: wh(u, v) ≥ 0 iff h(v) – h(u) ≤ w(u, v).
Johnson’s algorithm

1. Find a function h : V → R such that wh(u, v) ≥ 0 for all (u, v) ∈ E by using Bellman-Ford to solve the difference constraints h(v) – h(u) ≤ w(u, v), or determine that a negative-weight cycle exists.
   • Time = O(VE).
2. Run Dijkstra’s algorithm using wh from each vertex u ∈ V to compute δh(u, v) for all v ∈ V.
   • Time = O(VE + V 2 lg V).
3. For each (u, v) ∈ V × V, compute δ(u, v) = δh(u, v) – h(u) + h(v).
   • Time = O(V 2).

Total time = O(VE + V 2 lg V).
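The three steps can be sketched as follows; the edge-list encoding and names are illustrative, not from the lecture:

```python
# Compact sketch of Johnson's algorithm: Bellman-Ford for the
# potentials h, then heap-based Dijkstra on the reweighted graph.

import heapq

INF = float("inf")

def johnson(n, edges):
    """edges: list of (u, v, w), vertices 0..n-1. Returns the n x n
    matrix of shortest-path weights, or None on a negative cycle."""
    # Step 1: Bellman-Ford from a virtual source with 0-weight edges
    # to every vertex; starting all h at 0 has the same effect.
    h = [0] * n
    for _ in range(n):
        for (u, v, w) in edges:
            if h[u] + w < h[v]:
                h[v] = h[u] + w
    for (u, v, w) in edges:
        if h[u] + w < h[v]:
            return None  # negative-weight cycle

    # Reweight: w_h(u, v) = w(u, v) + h(u) - h(v) >= 0.
    adj = [[] for _ in range(n)]
    for (u, v, w) in edges:
        adj[u].append((v, w + h[u] - h[v]))

    # Step 2: Dijkstra from each vertex on the reweighted graph.
    delta = [[INF] * n for _ in range(n)]
    for src in range(n):
        dist = [INF] * n
        dist[src] = 0
        pq = [(0, src)]
        while pq:
            d, u = heapq.heappop(pq)
            if d > dist[u]:
                continue
            for (v, w) in adj[u]:
                if d + w < dist[v]:
                    dist[v] = d + w
                    heapq.heappush(pq, (dist[v], v))
        # Step 3: undo the reweighting.
        for v in range(n):
            if dist[v] < INF:
                delta[src][v] = dist[v] - h[src] + h[v]
    return delta
```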
Introduction to Algorithms 6.046J/18.401

Lecture 18
Prof. Piotr Indyk

Today

• We have seen algorithms for:
  – “numerical” data (sorting, median)
  – graphs (shortest path, MST)
• Today and the next lecture: algorithms for geometric data
© 2003 by Piotr Indyk Introduction to Algorithms November 10, 2004 L17.2
Computational Geometry

• Algorithms for geometric problems
• Applications: CAD, GIS, computer vision, …
• E.g., the closest pair problem:
  – Given: a set of points P = {p1, …, pn} in the plane, such that pi = (xi, yi)
  – Goal: find a pair pi ≠ pj that minimizes ||pi – pj||, where ||p – q|| = [(px – qx)2 + (py – qy)2]1/2
• We will see more examples in the next lecture
Closest Pair

• Find a closest pair among p1, …, pn
• Easy to do in O(n2) time:
  – For all pi ≠ pj, compute ||pi – pj|| and choose the minimum
• We will aim for O(n log n) time
Divide and conquer

• Divide:
  – Compute the median of x-coordinates
  – Split the points into PL and PR, each of size n/2
• Conquer: compute the closest pairs for PL and PR
• Combine the results (the hard part)
Combine

• Let d = min(d1, d2), where d1 and d2 are the closest-pair distances found in PL and PR
• Observe:
  – Need to check only pairs which cross the dividing line
  – Only interested in pairs within distance < d
• Suffices to look at points in the 2d-width strip around the median line
Scanning the strip

• Sort all points in the strip by their y-coordinates, forming q1, …, qk, k ≤ n
• Let yi be the y-coordinate of qi
• dmin = d
• For i = 1 to k:
  – j = i – 1
  – While yi – yj < d:
    • If ||qi – qj|| < dmin then dmin = ||qi – qj||
    • j := j – 1
• Report dmin (and the corresponding pair)
Analysis

• Correctness: easy
• Running time is more involved
• Can we have many qj’s that are within distance d from qi?
• No
• Proof by packing argument
Analysis, ctd.

Theorem: there are at most 7 qj’s such that yi – yj ≤ d.

Proof:
• Each such qj must lie either in the left or in the right d × d square below qi
• Within each square, all points have distance ≥ d from each other
• We can pack at most 4 such points into one square, so we have 8 points total (incl. qi)
Packing bound

• Proving “4” is not easy
• Will prove “5”:
  – Draw a disk of radius d/2 around each point
  – Disks are disjoint
  – The disk-square intersection has area ≥ π(d/2)2/4 = (π/16) d2
  – The square has area d2
  – Can pack at most 16/π ≈ 5.1 points
Running time

• Divide: O(n)
• Combine: O(n log n) because we sort by y
• However, we can:
  – Sort all points by y at the beginning
  – Divide preserves the y-order of points
  Then combine takes only O(n)
• We get T(n) = 2T(n/2) + O(n), so T(n) = O(n log n)
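The whole divide-and-conquer scheme can be sketched as below. For brevity this version re-sorts the strip by y inside each call, giving O(n log2 n); pre-sorting by y once, as suggested above, removes the extra log factor. Names are illustrative:

```python
# Sketch of the divide-and-conquer closest-pair algorithm: split at
# the median x, recurse, then scan a 2d-wide strip in y-order.

import math

def dist(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])

def closest_pair(points):
    """Return the smallest pairwise distance among >= 2 points."""
    pts = sorted(points)                      # sort by x once

    def solve(P):
        n = len(P)
        if n <= 3:                            # brute-force base case
            return min(dist(P[i], P[j])
                       for i in range(n) for j in range(i + 1, n))
        mid = n // 2
        median_x = P[mid][0]
        d = min(solve(P[:mid]), solve(P[mid:]))
        # Points within distance d of the dividing line, in y-order.
        strip = sorted((p for p in P if abs(p[0] - median_x) < d),
                       key=lambda p: p[1])
        best = d
        for i, qi in enumerate(strip):
            j = i + 1
            while j < len(strip) and strip[j][1] - qi[1] < d:
                best = min(best, dist(qi, strip[j]))
                j += 1
        return best

    return solve(pts)
```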
Close pair

• Given: P = {p1, …, pn}
• Goal: check if there is any pair pi ≠ pj within distance R from each other
• Will give an O(n) time algorithm, using… radix sort! (assuming coordinates are small integers)
Algorithm

• Impose a square grid onto the plane, where each cell is an R × R square
• Put each point into a bucket corresponding to the cell it belongs to. That is:
  – For each point p = (x, y), compute its bucket ID b(p) = (⌊x/R⌋, ⌊y/R⌋)
  – Radix sort all b(p)’s
  – Each sequence of the same b(p) forms a bucket
• If there is a bucket with > 4 points in it, answer YES and exit
• Otherwise, for each p ∈ P:
  – Let c = b(p)
  – Let C be the set of bucket IDs of the 8 cells adjacent to c
  – For all points q from buckets in C ∪ {c}:
    • If ||p – q|| ≤ R, then answer YES and exit
• Answer NO

Example of radix-sorted bucket IDs: (1,1), (1,2), (1,2), (2,1), (2,2), (2,2), (2,3), (3,1), (3,2)
Bucket access

• Given a bucket ID c, how can we quickly retrieve all points p such that b(p) = c?
• This is exactly the dictionary problem (Lecture 7)
• E.g., we can use hashing.
Analysis

• Running time:
  – Putting points into the buckets: O(n) time
  – Checking if there is a heavy bucket: O(n)
  – Checking the cells: 9 × 4 × n = O(n)
• Overall: linear time
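A sketch of the bucket algorithm, with a Python dict standing in for the radix-sort-and-scan bucket machinery (it plays the same dictionary role discussed under “Bucket access”); the function name is illustrative:

```python
# Grid-based close-pair test: one R x R cell per bucket; a pair
# within distance R must lie in the same or adjacent cells.

import math

def has_close_pair(points, R):
    """Return True iff two points pi != pj are within distance R."""
    buckets = {}
    for idx, (x, y) in enumerate(points):
        b = (math.floor(x / R), math.floor(y / R))
        buckets.setdefault(b, []).append(idx)
        if len(buckets[b]) > 4:     # heavy bucket, per the slides
            return True
    for (bx, by), idxs in buckets.items():
        for i in idxs:
            for dx in (-1, 0, 1):   # this cell and its 8 neighbors
                for dy in (-1, 0, 1):
                    for j in buckets.get((bx + dx, by + dy), []):
                        if j != i and math.dist(points[i], points[j]) <= R:
                            return True
    return False
```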
Computational Model

• In these two lectures, we assume that:
  – The input (e.g., point coordinates) consists of real numbers
  – We can perform (natural) operations on them in constant time, with perfect precision
• Advantage: simplicity
• Drawbacks: highly non-trivial issues:
  – Theoretical: if we allow arbitrary operations on reals, we can compress n numbers into one number
  – Practical: algorithms designed for infinite precision sometimes fail on real computers
Introduction to Algorithms 6.046J/18.401

Lecture 17
Prof. Piotr Indyk

Computational Geometry ctd.

• Segment intersection problem:
  – Given: a set of n distinct segments s1, …, sn, represented by coordinates of endpoints
  – Detection: detect if there is any pair si ≠ sj that intersects
  – Reporting: report all pairs of intersecting segments
© 2003 by Piotr Indyk Introduction to Algorithms November 15, 2004 L18.2
Segment intersection

• Easy to solve in O(n2) time
• Is it possible to get a better algorithm for the reporting problem?
• NO (in the worst case)
• However:
  – We will see we can do better for the detection problem
  – Moreover, the number of intersections P is usually small. Then, we would like an output-sensitive algorithm, whose running time is low if P is small.
Result

• We will show:
  – O(n log n) time for detection
  – O((n + P) log n) time for reporting
• We will use … (no, not divide and conquer) … binary search trees
• Specifically: the line sweep approach
Orthogonal segments

• All segments are either horizontal (H-segments) or vertical (V-segments)
• Assumption: all coordinates are distinct
• Therefore, only vertical-horizontal intersections exist
Orthogonal segments

• Sweep line:
  – A vertical line sweeps the plane from left to right
  – It “stops” at all “important” x-coordinates, i.e., when it hits a V-segment or endpoints of an H-segment
  – Invariant: all intersections on the left side of the sweep line have been already reported
Orthogonal segments ctd.

• We maintain sorted y-coordinates of the H-segments currently intersected by the sweep line (using a balanced BST V)
• When we hit the left endpoint of an H-segment, we add its y-coordinate to V
• When we hit the right endpoint of an H-segment, we delete its y-coordinate from V
Orthogonal segments ctd.

• Whenever we hit a V-segment with y-coordinates (ytop, ybot), we report all H-segments in V with y-coordinates in [ybot, ytop]
Algorithm

• Sort all V-segments and endpoints of H-segments by their x-coordinates – this gives the “trajectory” of the sweep line
• Scan the elements in the sorted list:
  – Left endpoint: add segment to tree V
  – Right endpoint: remove segment from V
  – V-segment: report intersections with the H-segments stored in V
Analysis

• Sorting: O(n log n)
• Add/delete H-segments to/from the vertical data structure V:
  – O(log n) per operation
  – O(n log n) total
• Processing V-segments:
  – O(log n) per intersection (see below)
  – O(P log n) total
• Overall: O((P + n) log n) time
• Can be improved to O(P + n log n)
Analyzing intersections

• Given:
  – A BST V containing y-coordinates
  – An interval I = [ybot, ytop]
• Goal: report all y’s in V that belong to I
• Algorithm:
  – y = Successor(ybot)
  – While y ≤ ytop:
    • Report y
    • y := Successor(y)
• Time: (number of reported y’s) · O(log n) + O(log n)
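A sketch of the orthogonal sweep; a sorted Python list with bisect stands in for the balanced BST V (list insertion is O(n), so this matches the structure of the algorithm but not its exact bounds). The function name and tuple encodings are illustrative:

```python
# Orthogonal-segment sweep: insert/delete H-segment y-coordinates as
# the sweep line crosses their endpoints; at each V-segment, count
# the stored y's inside [ybot, ytop].

import bisect

def orthogonal_intersections(h_segments, v_segments):
    """h_segments: list of (y, x1, x2) with x1 < x2.
    v_segments: list of (x, ybot, ytop) with ybot < ytop.
    Returns the number of H-V intersections (distinct coordinates
    assumed, as on the slides)."""
    events = []                      # (x, kind, data), kinds: 0=L, 1=V, 2=R
    for (y, x1, x2) in h_segments:
        events.append((x1, 0, y))    # left endpoint: insert y
        events.append((x2, 2, y))    # right endpoint: delete y
    for (x, ybot, ytop) in v_segments:
        events.append((x, 1, (ybot, ytop)))
    events.sort()

    V = []                           # sorted y's on the sweep line
    count = 0
    for x, kind, data in events:
        if kind == 0:
            bisect.insort(V, data)
        elif kind == 2:
            V.pop(bisect.bisect_left(V, data))
        else:
            ybot, ytop = data
            lo = bisect.bisect_left(V, ybot)
            hi = bisect.bisect_right(V, ytop)
            count += hi - lo         # report y's in [ybot, ytop]
    return count
```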
The general case

• Assumption: all coordinates of endpoints and intersections are distinct
• In particular:
  – No vertical segments
  – No three segments intersect at one point
Sweep line

• Invariant (as before): all intersections on the left of the sweep line have been already reported
• Stops at all “important” x-coordinates, i.e., when it hits endpoints or intersections
• We do not know the intersections in advance!
• The list of intersection coordinates is constructed and maintained dynamically (in a “horizontal” data structure H)
Sweep line

• Also need to maintain information about the segments intersecting the sweep line
• Cannot keep the values of the y-coordinates of the segments!
• Instead, we will maintain their order. I.e., at any point, we maintain all segments intersecting the sweep line, sorted by the y-coordinates of the intersections (in a “vertical” data structure V)
Algorithm

• Initialize the “vertical” BST V (to “empty”)
• Initialize the “horizontal” priority queue H (to contain the segments’ endpoints sorted by x-coordinates)
• Repeat:
  – Take the next “event” p from H:
    // Update V
    – If p is the left endpoint of a segment, add the segment to V
    – If p is the right endpoint of a segment, remove the segment from V
    – If p is the intersection point of s and s’, swap the order of s and s’ in V, and report p
Algorithm ctd.

    // Update H
    – For each new pair of neighbors s and s’ in V:
      • Check if s and s’ intersect on the right side of the sweep line
      • If so, add their intersection point to H
    – Remove the possible duplicates in H
• Until H is empty
Analysis

• Initializing H: O(n log n)
• Updating V:
  – O(log n) per operation
  – O((P + n) log n) total
• Updating H:
  – O(log n) per intersection
  – O(P log n) total
• Overall: O((P + n) log n) time
Correctness

• All reported intersections are correct
• Assume there is an intersection not reported. Let p = (x, y) be the first such unreported intersection (of s and s’)
• Let x’ be the last event before p. Observe that:
  – At time x’, segments s and s’ are neighbors on the sweep line
  – Since no intersections were missed until then, V maintained the right order of intersecting segments
  – Thus, s and s’ were neighbors in V at time x’. Thus, their intersection should have been detected
Changes

• At an intersection event, the y-order of the two segments changes (hence the swap in V)
Introduction to Algorithms 6.046J/18.401J

LECTURE 19: Take-home exam
• Instructions
• Academic honesty
• Strategies for doing well

Prof. Charles E. Leiserson

Take-home quiz

The take-home quiz contains 5 problems worth 25 points each, for a total of 125 points.
• 1 easy
• 2 moderate
• 1 hard
• 1 very hard
© 2001–4 by Charles E. Leiserson Introduction to Algorithms November 17, 2004 L19.2
End of quiz

• Your exam is due between 10:00 and 11:00 A.M. on Monday, November 22, 2004.
• Late exams will not be accepted unless you obtain a Dean’s Excuse or make prior arrangements with your recitation instructor.
• You must hand in your own exam in person.
Planning

• The quiz should take you about 12 hours to do, but you have five days in which to do it.
• Plan your time wisely. Do not overwork, and get enough sleep.
• Ample partial credit will be given for good solutions, especially if they are well written.
• The better your asymptotic running-time bounds, the higher your score.
• Bonus points will be given for exceptionally efficient or elegant solutions.
Format

• Each problem should be answered on a separate sheet (or sheets) of 3-hole-punched paper.
• Mark the top of each problem with:
  • your name,
  • 6.046J/18.410J,
  • the problem number,
  • your recitation time,
  • and your TA.
Executive summary

• Your solution to a problem should start with a topic paragraph that provides an executive summary of your solution.
• This executive summary should describe:
  • the problem you are solving,
  • the techniques you use to solve it,
  • any important assumptions you make, and
  • the running time your algorithm achieves.
Solutions

• Write up your solutions cleanly and concisely to maximize the chance that we understand them.
• Be explicit about running time and algorithms.
• For example, don’t just say you sort n numbers; state that you are using heapsort, which sorts the n numbers in O(n lg n) time in the worst case.
• When describing an algorithm, give an English description of the main idea of the algorithm.
• Use pseudocode only if necessary to clarify your solution.
Solutions

• Give examples, and draw figures.
• Provide succinct and convincing arguments for the correctness of your solutions.
• Do not regurgitate material presented in class.
• Cite algorithms and theorems from CLRS, lecture, and recitation to simplify your solutions.
Assumptions

• Part of the goal of this exam is to test engineering common sense.
• If you find that a question is unclear or ambiguous, make reasonable assumptions in order to solve the problem.
• State clearly in your write-up what assumptions you have made.
• Be careful what you assume, however, because you will receive little credit if you make a strong assumption that renders a problem trivial.
Bugs, etc.

• If you think that you’ve found a bug, please send email.
• Corrections and clarifications will be sent to the class via email.
• Check your email daily to avoid missing potentially important announcements.
• If you did not receive an email last night reminding you about Quiz 2, then you are not on the class email list. Please let your recitation instructor know immediately.
Academic honesty

• This quiz is “limited open book.”
• You may use:
  • your course notes,
  • the CLRS textbook,
  • lecture videos,
  • basic reference materials such as dictionaries, and
  • any of the handouts posted on the server.
• No other sources whatsoever may be consulted!
Academic honesty

• For example, you may not use notes or solutions from other times that this course or other related courses have been taught, or materials on the server.
• These materials will not help you, but you may not use them anyhow.
• You may not communicate with any person except members of the 6.046 staff about any aspect of the exam until after noon on Monday, November 22, even if you have already handed in your exam.
Academic honesty

• If at any time you feel that you may have violated this policy, it is imperative that you contact the course staff immediately.
• It will be much the worse for you if third parties divulge your indiscretion.
• If you have any questions about what resources may or may not be used during the quiz, send email.
Poll of 78 quiz takers

Question 1: “Did you cheat?”
• 76 — “No.”
• 1 — “Yes.”
• 1 — “Abstain.”
Poll of 78 quiz takers

Question 2: “How many people do you know who cheated?”
• 72 — “None.”
• 2 — “3 people compared answers.”
• 1 — “Suspect 2, but don’t know.”
• 1 — “Either 0 or 2.”
• 1 — “Abstain.”
• 1 — “10” (the cheater).
Reread instructions

Please reread the exam instructions in their entirety at least once a day during the exam.
Test-taking strategies

• Manage your time.
• Manage your psyche.
• Brainstorm.
• Write up early and often.
Manage your time

• Work on all problems the first day.
• Budget time for write-ups and debugging.
• Don’t get sucked into one problem at the expense of others.
• Replan your strategy every day.
Manage your psyche

• Get enough sleep.
• Maintain a patient, persistent, and positive attitude.
• Use adrenaline productively.
• Relax, and have fun.
• It’s not the end of the world!
Brainstorm

• Get an upper bound, even if it is loose.
• Look for analogies with problems you’ve seen.
• Exploit special structure.
• Solve a simpler problem.
• Draw diagrams.
• Contemplate.
• Be wary of self-imposed constraints — think “out of the box.”
• Work out small examples, and abstract.
• Understand things in two ways: sanity checks.
Write up early and often

• Write up partial solutions.
• Groom your work every day.
• Work on shortening and simplifying.
• Provide an executive summary.
• Ample partial credit will be given!
• Unnecessarily long answers will be penalized.
Positive attitude
Introduction to Algorithms 6.046J/18.401J

LECTURE 20: Network Flow I
• Flow networks
• Maximum-flow problem
• Flow notation
• Properties of flow
• Cuts
• Residual networks
• Augmenting paths

Prof. Charles E. Leiserson
Flow networks

Definition. A flow network is a directed graph G = (V, E) with two distinguished vertices: a source s and a sink t. Each edge (u, v) ∈ E has a nonnegative capacity c(u, v). If (u, v) ∉ E, then c(u, v) = 0.
© 2001–4 by Charles E. Leiserson Introduction to Algorithms November 24, 2004 L20.2
Example:

[figure: a flow network with source s, sink t, and edge capacities between 1 and 3]
Flow networks

Definition. A positive flow on G is a function p : V × V → R satisfying the following:
• Capacity constraint: For all u, v ∈ V,
  0 ≤ p(u, v) ≤ c(u, v).
• Flow conservation: For all u ∈ V – {s, t},
  ∑v∈V p(u, v) – ∑v∈V p(v, u) = 0.

The value of a flow is the net flow out of the source:

∑v∈V p(s, v) – ∑v∈V p(v, s).
A flow on a network

[figure: the example network with each edge labeled “positive flow : capacity”; vertex u has incoming flows 2 and 1 and outgoing flows 0, 1, and 2]

Flow conservation (like Kirchhoff’s current law):
• Flow into u is 2 + 1 = 3.
• Flow out of u is 0 + 1 + 2 = 3.
The value of this flow is 1 – 0 + 2 = 3.
The maximum-flow problem

Maximum-flow problem: Given a flow network G, find a flow of maximum value on G.

[figure: a maximum flow on the example network, edges labeled “flow : capacity”]

The value of the maximum flow is 4.
Flow cancellation

Without loss of generality, positive flow goes either from u to v, or from v to u, but not both.

[figure: flows 2:3 from u to v and 1:2 from v to u are equivalent to 1:3 from u to v and 0:2 from v to u; the net flow from u to v is 1 in both cases]

The capacity constraint and flow conservation are preserved by this transformation.

INTUITION: View flow as a rate, not a quantity.
A notational simplification

IDEA: Work with the net flow between two vertices, rather than with the positive flow.

Definition. A (net) flow on G is a function f : V × V → R satisfying the following:
• Capacity constraint: For all u, v ∈ V,
  f (u, v) ≤ c(u, v).
• Flow conservation: For all u ∈ V – {s, t},
  ∑v∈V f (u, v) = 0.
• Skew symmetry: For all u, v ∈ V,
  f (u, v) = –f (v, u).

One summation instead of two.
Equivalence of definitions

Theorem. The two definitions are equivalent.

Proof. (⇒) Let f (u, v) = p(u, v) – p(v, u).
• Capacity constraint: Since p(u, v) ≤ c(u, v) and p(v, u) ≥ 0, we have f (u, v) ≤ c(u, v).
• Flow conservation:
  ∑v∈V f (u, v) = ∑v∈V (p(u, v) – p(v, u))
                = ∑v∈V p(u, v) – ∑v∈V p(v, u)
                = 0    ⊳ by conservation of the positive flow p
• Skew symmetry:
  f (u, v) = p(u, v) – p(v, u) = – (p(v, u) – p(u, v)) = – f (v, u).
Proof (continued)

(⇐) Let
  p(u, v) = f (u, v)  if f (u, v) > 0,
  p(u, v) = 0         if f (u, v) ≤ 0.

• Capacity constraint: By definition, p(u, v) ≥ 0. Since f (u, v) ≤ c(u, v), it follows that p(u, v) ≤ c(u, v).
• Flow conservation: If f (u, v) > 0, then p(u, v) – p(v, u) = f (u, v). If f (u, v) ≤ 0, then p(u, v) – p(v, u) = – f (v, u) = f (u, v) by skew symmetry. Therefore,
  ∑v∈V p(u, v) – ∑v∈V p(v, u) = ∑v∈V f (u, v).
Notation

Definition. The value of a flow f, denoted by | f |, is given by

| f | = ∑v∈V f (s, v) = f (s, V).

Implicit summation notation: A set used in an arithmetic formula represents a sum over the elements of the set.
• Example — flow conservation: f (u, V) = 0 for all u ∈ V – {s, t}.
Simple properties of flow

Lemma.
• f (X, X) = 0,
• f (X, Y) = – f (Y, X),
• f (X∪Y, Z) = f (X, Z) + f (Y, Z) if X∩Y = ∅.

Theorem. | f | = f (V, t).
Proof.
  | f | = f (s, V)
        = f (V, V) – f (V–s, V)     Omit braces: V–s means V–{s}.
        = f (V, V–s)
        = f (V, t) + f (V, V–s–t)
        = f (V, t).
Flow into the sink

[Figure: the running example network, each edge labeled f : c.]

| f | = f (s, V) = 4        f (V, t) = 4
Cuts

Definition. A cut (S, T) of a flow network G = (V, E) is a partition of V such that s ∈ S and t ∈ T. If f is a flow on G, then the flow across the cut is f (S, T).

[Figure: the example network partitioned into S and T, edges labeled f : c.]

f (S, T) = (2 + 2) + (– 2 + 1 – 1 + 2) = 4
Another characterization of flow value

Lemma. For any flow f and any cut (S, T), we have | f | = f (S, T).
Proof.
  f (S, T) = f (S, V) – f (S, S)
           = f (S, V)
           = f (s, V) + f (S–s, V)
           = f (s, V)
           = | f |.
Capacity of a cut

Definition. The capacity of a cut (S, T) is c(S, T).

[Figure: the same cut, edges labeled f : c.]

c(S, T) = (3 + 2) + (1 + 2 + 3) = 11
Upper bound on the maximum flow value

Theorem. The value of any flow is bounded above by the capacity of any cut.
Proof.
  | f | = f (S, T)
        = ∑u∈S ∑v∈T f (u, v)
        ≤ ∑u∈S ∑v∈T c(u, v)
        = c(S, T).
Residual network

Definition. Let f be a flow on G = (V, E). The residual network Gf = (V, Ef ) is the graph with strictly positive residual capacities
  cf (u, v) = c(u, v) – f (u, v) > 0.
Edges in Ef admit more flow.

Example: [Figure: G has edge (u, v) labeled 3 : 5 and edge (v, u) labeled 0 : 1; in Gf , edge (u, v) has residual capacity 2 and edge (v, u) has residual capacity 4.]

Lemma. | Ef | ≤ 2| E |.
Augmenting paths

Definition. Any path from s to t in Gf is an augmenting path in G with respect to f. The flow value can be increased along an augmenting path p by
  cf (p) = min(u, v)∈p cf (u, v).

Example: [Figure: a network G with edges labeled f : c and its residual network Gf ; the bottleneck edge on the highlighted augmenting path gives cf (p) = 2.]
Max-flow, min-cut theorem

Theorem. The following are equivalent:
1. f is a maximum flow.
2. Gf contains no augmenting paths.
3. | f | = c(S, T) for some cut (S, T) of G.

Proof (and algorithms). Next time.
Introduction to Algorithms
6.046J/18.401J

LECTURE 21: Network Flow II
• Max-flow, min-cut theorem
• Ford-Fulkerson algorithm and analysis
• Edmonds-Karp algorithm and analysis
• Best algorithms to date

Prof. Charles E. Leiserson
Recall from Lecture 20
• Flow value: | f | = f (s, V).
• Cut: Any partition (S, T) of V such that s ∈ S and t ∈ T.
• Lemma. | f | = f (S, T) for any cut (S, T).
• Corollary. | f | ≤ c(S, T) for any cut (S, T).
• Residual graph: The graph Gf = (V, Ef ) with strictly positive residual capacities cf (u, v) = c(u, v) – f (u, v) > 0.
• Augmenting path: Any path from s to t in Gf .
• Residual capacity of an augmenting path: cf (p) = min(u, v)∈p cf (u, v).
© 2001–4 by Charles E. Leiserson Introduction to Algorithms November 29, 2004 L21.2
Max-flow, min-cut theorem

Theorem. The following are equivalent:
1. | f | = c(S, T) for some cut (S, T).
2. f is a maximum flow.
3. f admits no augmenting paths.

Proof. (1) ⇒ (2): Since | f | ≤ c(S, T) for any cut (S, T) (by the corollary from Lecture 20), the assumption that | f | = c(S, T) implies that f is a maximum flow.
(2) ⇒ (3): If there were an augmenting path, the flow value could be increased, contradicting the maximality of f.
Proof (continued)

(3) ⇒ (1): Suppose that f admits no augmenting paths. Define S = {v ∈ V : there exists a path in Gf from s to v}, and let T = V – S. Observe that s ∈ S and t ∈ T, and thus (S, T) is a cut. Consider any vertices u ∈ S and v ∈ T.

[Figure: a path in Gf from s to u ∈ S, with v on the T side of the cut.]

We must have cf (u, v) = 0, since if cf (u, v) > 0, then v ∈ S, not v ∈ T as assumed. Thus, f (u, v) = c(u, v), since cf (u, v) = c(u, v) – f (u, v) = 0. Summing over all u ∈ S and v ∈ T yields f (S, T) = c(S, T), and since | f | = f (S, T), the theorem follows.
Ford-Fulkerson max-flow algorithm
Algorithm:f [u, v] ← 0 for all u, v ∈ V while an augmenting path p in G wrt f exists
do augment f by cf (p)
Can be slow:

[Figure: a four-vertex network G — two vertex-disjoint s-to-t paths whose four edges have capacity 10^9, joined by a middle edge of capacity 1. The augmenting paths alternately use the middle edge in one direction and then cancel it in the other, so each augmentation increases | f | by only 1.]

2 billion iterations on a graph with 4 vertices!
Edmonds-Karp algorithm

Edmonds and Karp noticed that many people's implementations of Ford-Fulkerson augment along a breadth-first augmenting path: a shortest path in Gf from s to t where each edge has weight 1. These implementations would always run relatively fast. Since a breadth-first augmenting path can be found in O(E) time, their analysis, which provided the first polynomial-time bound on maximum flow, focuses on bounding the number of flow augmentations. (In independent work, Dinic also gave polynomial-time bounds.)
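This strategy can be sketched compactly in Python (my own code, not from the course; the dict-of-dicts graph representation is an assumption). BFS finds a shortest augmenting path, and the residual capacities cf (u, v) are kept up to date in place:

```python
from collections import deque

def edmonds_karp(cap, s, t):
    """Max flow via Ford-Fulkerson with breadth-first augmenting paths."""
    # residual[u][v] starts at c(u, v); reverse edges start at 0
    residual = {u: dict(vs) for u, vs in cap.items()}
    for u in cap:
        for v in cap[u]:
            residual.setdefault(v, {}).setdefault(u, 0)
    flow = 0
    while True:
        # BFS in Gf: follow only edges with cf(u, v) > 0
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:
            u = queue.popleft()
            for v, cf in residual[u].items():
                if cf > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            return flow                      # no augmenting path: f is maximum
        path = []                            # recover the s-to-t path
        v = t
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        bottleneck = min(residual[u][v] for u, v in path)   # cf(p)
        for u, v in path:                    # augment f by cf(p)
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
        flow += bottleneck

cap = {"s": {"a": 10, "b": 10}, "a": {"b": 1, "t": 10}, "b": {"t": 10}}
print(edmonds_karp(cap, "s", "t"))  # 20
```

On the slow Ford-Fulkerson example above, BFS ignores the capacity-1 middle edge (the two-edge paths are shorter), so only two augmentations are needed.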
Monotonicity lemma

Lemma. Let δ(v) = δf (s, v) be the breadth-first distance from s to v in Gf . During the Edmonds-Karp algorithm, δ(v) increases monotonically.

Proof. Suppose that augmenting a flow f on G produces a new flow f ′. Let δ′(v) = δf ′(s, v). We'll show δ′(v) ≥ δ(v) by induction on δ′(v). For the base case, δ′(v) = 0 implies v = s, and since δ(s) = 0, we have δ′(v) ≥ δ(v). For the inductive case, consider a breadth-first path s → ⋯ → u → v in Gf ′. We must have δ′(v) = δ′(u) + 1, since subpaths of shortest paths are shortest paths. Hence, we have δ′(u) ≥ δ(u) by induction, because δ′(v) > δ′(u). Certainly, (u, v) ∈ Ef ′.
Proof of Monotonicity Lemma — Case 1

Consider two cases depending on whether (u, v) ∈ Ef .

Case 1: (u, v) ∈ Ef . We have
  δ(v) ≤ δ(u) + 1    (triangle inequality)
       ≤ δ′(u) + 1   (induction)
       = δ′(v)       (breadth-first path),
and thus monotonicity of δ(v) is established.
Proof of Monotonicity Lemma — Case 2

Case 2: (u, v) ∉ Ef . Since (u, v) ∈ Ef ′ , the augmenting path p that produced f ′ from f must have included (v, u). Moreover, p is a breadth-first path in Gf :
  p = s → ⋯ → v → u → ⋯ → t .
Thus, we have
  δ(v) = δ(u) – 1    (breadth-first path)
       ≤ δ′(u) – 1   (induction)
       = δ′(v) – 2   (breadth-first path)
       < δ′(v) ,
thereby establishing monotonicity for this case, too.
Counting flow augmentations

Theorem. The number of flow augmentations in the Edmonds-Karp algorithm (Ford-Fulkerson with breadth-first augmenting paths) is O(V E).

Proof. Let p be an augmenting path, and suppose that we have cf (u, v) = cf (p) for edge (u, v) ∈ p. Then, we say that (u, v) is critical, and it disappears from the residual graph after flow augmentation.

Example: [Figure: the residual network Gf from before, with a critical edge on an augmenting path of residual capacity cf (p) = 2.]
Counting flow augmentations (continued)

The first time an edge (u, v) is critical, we have δ(v) = δ(u) + 1, since p is a breadth-first path. We must wait until (v, u) is on an augmenting path before (u, v) can be critical again. Let δ′ be the distance function when (v, u) is on an augmenting path. Then, we have
  δ′(u) = δ′(v) + 1   (breadth-first path)
        ≥ δ(v) + 1    (monotonicity)
        = δ(u) + 2    (breadth-first path).

Example: [Figure sequence: when (u, v) is first critical, δ(u) = 5 and δ(v) = 6; by the time (v, u) lies on an augmenting path, δ(v) ≥ 6 and hence δ(u) ≥ 7; when (u, v) next becomes critical, δ(v) ≥ 8.]
Running time of Edmonds-Karp

Distances start out nonnegative, never decrease, and are at most |V| – 1 until the vertex becomes unreachable. Thus, (u, v) occurs as a critical edge O(V) times, because δ(v) increases by at least 2 between occurrences. Since the residual graph contains O(E) edges, the number of flow augmentations is O(V E).

Corollary. The Edmonds-Karp maximum-flow algorithm runs in O(V E^2) time.
Proof. Breadth-first search runs in O(E) time, and all other bookkeeping is O(V) per augmentation.
Best to date

• The asymptotically fastest algorithm to date for maximum flow, due to King, Rao, and Tarjan, runs in O(V E log_(E/(V lg V)) V) time.
• If we allow running times as a function of edge weights, the fastest algorithm for maximum flow, due to Goldberg and Rao, runs in time
  O(min{V^(2/3), E^(1/2)} · E lg(V^2/E + 2) · lg C),
  where C is the maximum capacity of any edge in the graph.
Introduction to Algorithms
6.046J/18.401J

Lecture 22
Prof. Piotr Indyk

Today
• String matching problems
• HKN Evaluations (last 15 minutes)
• Graded Quiz 2 (outside)
© Piotr Indyk Introduction to Algorithms December 1, 2004 L22.2
String Matching

• Input: Two strings T[1…n] and P[1…m], containing symbols from alphabet Σ. E.g.:
  – Σ = {a, b, …, z}
  – T[1…18] = “to be or not to be”
  – P[1…2] = “be”
• Goal: find all “shifts” 0 ≤ s ≤ n–m such that T[s+1…s+m] = P. E.g., 3 and 16.
Simple Algorithm

for s ← 0 to n–m
    Match ← 1
    for j ← 1 to m
        if T[s+j] ≠ P[j] then
            Match ← 0
            exit loop
    if Match = 1 then output s
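The same loop in runnable form (a sketch; the slide's pseudocode is 1-based, the Python below is 0-based, and the function name is mine):

```python
# Naive O(nm) string matching: try every shift and compare.

def all_shifts(T, P):
    """Return all 0-based shifts s with T[s:s+m] == P."""
    n, m = len(T), len(P)
    return [s for s in range(n - m + 1) if T[s:s + m] == P]

print(all_shifts("to be or not to be", "be"))  # [3, 16]
```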
Results

• Running time of the simple algorithm:
  – Worst-case: O(nm)
  – Average-case (random text): O(n)
    • Ts = time spent on checking shift s
    • E[Ts] ≤ 2
    • E[∑s Ts] = ∑s E[Ts] = O(n)
Worst-case

• Is it possible to achieve O(n) for any input?
  – Knuth-Morris-Pratt '77: deterministic
  – Karp-Rabin '81: randomized
Karp-Rabin Algorithm

• A very elegant use of an idea that we have encountered before, namely…
  HASHING !
• Idea:
  – Hash all substrings T[1…m], T[2…m+1], …, T[n–m+1…n]
  – Hash the pattern P[1…m]
  – Report the substrings that hash to the same value as P
• Problem: how to hash n–m+1 substrings, each of length m, in O(n) time?
Attempt 0

• In Lecture 7, we have seen
  ha(x) = ∑i ai xi mod q
  where a = (a1, …, ar), x = (x1, …, xr)
• To implement it, we would need to compute
  ha(T[s…s+m–1]) = ∑i ai T[s+i] mod q, for s = 0…n–m
• How to compute it in O(n) time?
• A big open problem!
Attempt 1

• Assume Σ = {0, 1}
• Think about each Ts = T[s+1…s+m] as a number in binary representation, i.e.,
  ts = T[s+1]·2^(m–1) + T[s+2]·2^(m–2) + … + T[s+m]·2^0
• Find a fast way of computing ts+1 given ts
• Output all s such that ts is equal to the number p represented by P
The great formula

• How to transform
  ts = T[s+1]·2^(m–1) + T[s+2]·2^(m–2) + … + T[s+m]·2^0
  into
  ts+1 = T[s+2]·2^(m–1) + T[s+3]·2^(m–2) + … + T[s+m+1]·2^0 ?
• Three steps:
  – Subtract T[s+1]·2^(m–1)
  – Multiply by 2 (i.e., shift the bits by one position)
  – Add T[s+m+1]·2^0
• Therefore:
  ts+1 = (ts – T[s+1]·2^(m–1))·2 + T[s+m+1]·2^0
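The three-step update can be checked against direct recomputation (a sketch with 0-based indexing; the names are mine):

```python
# Verify the rolling update t_{s+1} = (t_s - T[s]*2^(m-1))*2 + T[s+m]
# against recomputing each window value from scratch.

def direct(T, s, m):
    """Value of the length-m window starting at s, read as binary."""
    return sum(T[s + i] << (m - 1 - i) for i in range(m))

T = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]
m = 4
t = direct(T, 0, m)
for s in range(len(T) - m):
    assert t == direct(T, s, m)
    t = (t - (T[s] << (m - 1))) * 2 + T[s + m]   # the three-step update
print("rolling update matches direct computation")
```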
Algorithm

ts+1 = (ts – T[s+1]·2^(m–1))·2 + T[s+m+1]·2^0

• Can compute ts+1 from ts using 3 arithmetic operations
• Therefore, we can compute all t0, t1, …, tn–m using O(n) arithmetic operations
• We can compute a number corresponding to P using O(m) arithmetic operations
• Are we done?
Problem

• To get O(n) time, we would need to perform each arithmetic operation in O(1) time
• However, the arguments are m bits long!
• If m is large, it is unreasonable to assume that operations on such big numbers can be done in O(1) time
• We need to reduce the number range to something more manageable
Attempt 2: Hashing

• We will instead compute
  t′s = T[s+1]·2^(m–1) + T[s+2]·2^(m–2) + … + T[s+m]·2^0 mod q
  where q is an “appropriate” prime number
• One can still compute t′s+1 from t′s :
  t′s+1 = (t′s – T[s+1]·2^(m–1))·2 + T[s+m+1]·2^0 mod q
• If q is not large, i.e., has O(log n) bits, we can compute all t′s (and p′) in O(n) time
Problem

• Unfortunately, we can have false positives, i.e., Ts ≠ P but ts mod q = p mod q
• Need to use a random q
• We will show that the probability of a false positive is small → randomized algorithm
False positives

• Consider any ts ≠ p. We know that both numbers are in the range 0…2^m – 1
• How many primes q are there such that ts mod q = p mod q, i.e., (ts – p) = 0 mod q ?
• Such a prime has to divide x = (ts – p) ≤ 2^m
• Represent x = p1^e1 · p2^e2 · … · pk^ek, pi prime, ei ≥ 1.
  What is the largest possible value of k ?
  – Since 2 ≤ pi , we have x ≥ 2^k
  – But x ≤ 2^m
  – So k ≤ m
• There are ≤ m primes dividing x
Algorithm

• Algorithm:
  – Let Π be a set of 2nm primes, each having O(log n) bits
  – Choose q uniformly at random from Π
  – Compute t0 mod q, t1 mod q, …, and p mod q
  – Report s such that ts mod q = p mod q
• Analysis:
  – For each s, the probability that Ts ≠ P but ts mod q = p mod q is at most m/2nm = 1/2n
  – The probability of any false positive is at most (n–m)/2n ≤ 1/2
“Details”

• How do we know that such a Π exists? (That is, a set of 2nm primes, each having O(log n) bits)
• How do we choose a random prime from Π in O(n) time?
Prime density

• Primes are “dense”. I.e., if PRIMES(N) is the set of primes smaller than N, then asymptotically
  |PRIMES(N)| / N ~ 1 / ln N
• If N is large enough, then
  |PRIMES(N)| ≥ N / (2 ln N)
• Proof: Trust me.
Prime density continued

• Set N = C mn ln(mn)
• There exists C = O(1) such that
  N / (2 ln N) ≥ 2mn
  (Note: for such N we have |PRIMES(N)| ≥ 2mn)
• Proof:
  C mn ln(mn) / [2 ln(C mn ln(mn))]
    ≥ C mn ln(mn) / [2 ln(C (mn)^2)]
    ≥ C mn ln(mn) / [4 (ln C + ln(mn))]
• All elements of PRIMES(N) are log N = O(log n) bits long
Prime selection

• Still need to find a random element of PRIMES(N)
• Solution:
  – Choose a random element from 1…N
  – Check if it is prime
  – If not, repeat
Prime selection analysis

• A random element q from 1…N is prime with probability ~ 1/ln N
• We can check if q is prime in time polynomial in log N :
  – Randomized: Rabin, Solovay-Strassen in 1976
  – Deterministic: Agrawal et al in 2002
• Therefore, we can generate a random prime q in o(n) time
Final Algorithm

• Set N = C mn ln(mn)
• Repeat
  – Choose q uniformly at random from 1…N
• Until q is prime
• Compute t0 mod q, t1 mod q, …, and p mod q
• Report s such that ts mod q = p mod q
Introduction to Algorithms
6.046J/18.401J

Lecture 24
Prof. Piotr Indyk
Dealing with Hard Problems

• What to do if:
  – Divide and conquer
  – Dynamic programming
  – Greedy
  – Linear Programming/Network Flows
  – …
  does not give a polynomial time algorithm?
© Piotr Indyk Introduction to Algorithms December 8, 2004 L24.2
Dealing with Hard Problems

• Solution I: Ignore the problem
  – Can't do it! There are thousands of problems for which we do not know polynomial time algorithms
  – For example:
    • Traveling Salesman Problem (TSP)
    • Set Cover
Traveling Salesman Problem

• Traveling Salesman Problem (TSP)
  – Input: undirected graph with lengths on edges
  – Output: shortest cycle that visits each vertex exactly once
• Best known algorithm: O(n·2^n) time.
Set Covering

• Set Cover:
  – Input: subsets S1…Sn of X, ∪i Si = X, |X| = m
  – Output: C ⊆ {1…n}, such that ∪i∈C Si = X, and |C| minimal
• Best known algorithm: O(2^n m) time (?)

Bank robbery problem:
• X = {plan, shoot, safe, drive, scary}
• Sets:
  – SJoe = {plan, safe}
  – SJim = {shoot, scary, drive}
  – …
Dealing with Hard Problems

• Exponential time algorithms for small inputs. E.g., (100/99)^n time is not bad for n < 1000.
• Polynomial time algorithms for some (e.g., average-case) inputs
• Polynomial time algorithms for all inputs, but which return approximate solutions
Approximation Algorithms

• An algorithm A is ρ-approximate if, on any input of size n:
  – the cost CA of the solution produced by the algorithm, and
  – the cost COPT of the optimal solution
  are such that CA ≤ ρ·COPT
• We will see:
  – 2-approximation algorithm for TSP in the plane
  – ln(m)-approximation algorithm for Set Cover
Comments on Approximation

• “CA ≤ ρ·COPT” makes sense only for minimization problems
• For maximization problems, replace by “CA ≥ (1/ρ)·COPT”
• Additive approximation “CA ≤ ρ + COPT” also makes sense, although it is difficult to achieve
2-approximation for TSP

• Compute MST T
  – An edge between any pair of points
  – Weight = distance between endpoints
• Compute a tree-walk W of T
  – Each edge visited twice
• Convert W into a cycle C using shortcuts
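The three steps can be sketched as follows (my own code, not from the lecture; Prim's algorithm for the MST and a stack-based preorder walk, which applies the shortcuts implicitly, are implementation choices):

```python
import math

def dist(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])

def tsp_2apx(points):
    """Length of the MST-based 2-approximate tour over points in the plane."""
    n = len(points)
    # Step 1: Prim's algorithm grows the MST T from vertex 0
    in_tree = [False] * n
    best = [math.inf] * n     # cheapest edge connecting each vertex to T
    parent = [0] * n
    best[0] = 0
    children = [[] for _ in range(n)]
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if u != 0:
            children[parent[u]].append(u)
        for v in range(n):
            d = dist(points[u], points[v])
            if not in_tree[v] and d < best[v]:
                best[v], parent[v] = d, u
    # Steps 2-3: preorder walk of T = the tree-walk W with shortcuts applied
    cycle, stack = [], [0]
    while stack:
        u = stack.pop()
        cycle.append(u)
        stack.extend(reversed(children[u]))
    cycle.append(0)           # close the cycle C
    return sum(dist(points[cycle[i]], points[cycle[i + 1]])
               for i in range(len(cycle) - 1))

print(tsp_2apx([(0, 0), (0, 1), (1, 1), (1, 0)]))  # 4.0 (optimal here)
```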
2-approximation: Proof

• Let COPT be the optimal cycle
• Cost(T) ≤ Cost(COPT)
  – Removing an edge from COPT gives a spanning tree, and T is a spanning tree of minimum cost
• Cost(W) = 2 Cost(T)
  – Each edge visited twice
• Cost(C) ≤ Cost(W)
  – Triangle inequality
⇒ Cost(C) ≤ 2 Cost(COPT)
Approximation for Set Cover

Greedy algorithm:
• Initialize C = ∅
• Repeat until all elements are covered:
  – Choose Si which contains the largest number of yet-not-covered elements
  – Add i to C
  – Mark all elements in Si as covered
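A sketch of this greedy loop (my own code; the dict-of-sets representation and the function name are assumptions), run on the example from the next slide:

```python
# Greedy set cover: repeatedly take the set covering the most
# yet-uncovered elements.

def greedy_set_cover(X, sets):
    """Return a list of indices whose sets cover X (greedy choice)."""
    uncovered, C = set(X), []
    while uncovered:
        # choose the S_i covering the most yet-not-covered elements
        i = max(sets, key=lambda i: len(sets[i] & uncovered))
        C.append(i)
        uncovered -= sets[i]
    return C

sets = {1: {1, 2}, 2: {3, 4}, 3: {5, 6}, 4: {1, 3, 5}}
print(greedy_set_cover({1, 2, 3, 4, 5, 6}, sets))
# picks set 4 first, then needs three more sets: one more than optimal
```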
Greedy Algorithm: Example

• X = {1, 2, 3, 4, 5, 6}
• Sets:
  – S1 = {1, 2}
  – S2 = {3, 4}
  – S3 = {5, 6}
  – S4 = {1, 3, 5}
• Algorithm picks C = {4, 1, 2, 3}
• Not optimal! ({1, 2, 3} already covers X.)
ln(m)-approximation

• Notation:
  – COPT = optimal cover
  – k = |COPT|
• Fact: At any iteration of the algorithm, there exists an Sj which contains a ≥ 1/k fraction of the yet-not-covered elements
• Proof: by contradiction.
  – If every set covered < 1/k of the yet-not-covered elements, there would be no way to cover them using k sets
  – But COPT does exactly that!
• Therefore, at each iteration greedy covers a ≥ 1/k fraction of the yet-not-covered elements
ln(m)-approximation

• Let ui be the number of yet-not-covered elements at the end of step i = 0, 1, 2, …
• We have
  u0 = m
  ui+1 ≤ ui (1 – 1/k)
• Therefore, after t = k ln m steps, we have
  ut ≤ u0 (1 – 1/k)^t = m (1 – 1/k)^(k ln m) < m (1/e)^(ln m) = 1
• I.e., all elements are covered by the k ln m sets chosen by the greedy algorithm
• Opt size is k ⇒ greedy is ln(m)-approximate
Approximation Algorithms

• Very rich area
  – Algorithms use greedy, linear programming, dynamic programming
    • E.g., 1.01-approximate TSP in the plane
  – Sometimes one can show that approximating a problem is as hard as finding the exact solution!
    • E.g., 0.99 ln(m)-approximate Set Cover