Average Case Analysis - University of Helsinki · AveragecaseanalysisofQuicksort function...

24
Average Case Analysis October 11, 2011

Transcript of Average Case Analysis - University of Helsinki · AveragecaseanalysisofQuicksort function...

Page 1: Average Case Analysis - University of Helsinki · AveragecaseanalysisofQuicksort function Quicksort(A): if jAj 1: return A SelectpivotpfromA. PartitionAintosmaller,equal,andlargerthanp.

Average Case Analysis

October 11, 2011

Page 2: Average Case Analysis - University of Helsinki · AveragecaseanalysisofQuicksort function Quicksort(A): if jAj 1: return A SelectpivotpfromA. PartitionAintosmaller,equal,andlargerthanp.

Worst-case analysis

Worst-case analysis gives an upper bound for the running timeof a single execution of an algorithm with a worst-case inputand worst-case random choices.

Does this reflect the performance of the algorithm in practice?

Page 3: Average Case Analysis - University of Helsinki · AveragecaseanalysisofQuicksort function Quicksort(A): if jAj 1: return A SelectpivotpfromA. PartitionAintosmaller,equal,andlargerthanp.

Practical considerations

We are often more interested in having a reasonable estimateor reasonable guarantee of the performance of the algorithm inpractice.

I Typical case performance?I Performance most of the time?I Performance in all but some pathological cases?

Page 4: Average Case Analysis - University of Helsinki · AveragecaseanalysisofQuicksort function Quicksort(A): if jAj 1: return A SelectpivotpfromA. PartitionAintosmaller,equal,andlargerthanp.

Amortized analysis

Amortized analysis is often used for sequences of operationsthat update a data structure. It gives an upper bound for theaverage running time of a single operation in a worst-casesequence of operations, starting from some neutral state.

The fundamental assumption is that while some operationscan be expensive, such operations are rare in long sequences ofoperations.

Page 5: Average Case Analysis - University of Helsinki · AveragecaseanalysisofQuicksort function Quicksort(A): if jAj 1: return A SelectpivotpfromA. PartitionAintosmaller,equal,andlargerthanp.

Typical scenario in amortized analysis

A data structure has space for up to n elements. Inserting ordeleting a single element takes Θ(1) time. When the datastructure becomes full, the elements are moved into a datastructure of size 2n in Θ(n) time. If the structure becomes1/4 full, the elements are moved into a data structure of sizen/2 in Θ(n) time.

In a neutral state, the data structure has n/2 elements. Bydefining the potential of a data structure with m elements as|m − n/2|, we can easily show that the amortized time forinserting or deleting an element is Θ(1).

Page 6: Average Case Analysis - University of Helsinki · AveragecaseanalysisofQuicksort function Quicksort(A): if jAj 1: return A SelectpivotpfromA. PartitionAintosmaller,equal,andlargerthanp.

Average case analysis

Average case analysis gives an upper bound for the expectedrunning time of a single execution of a deterministic algorithmwith a random input selected according to some distribution.

Is the average case also typical case?

Page 7: Average Case Analysis - University of Helsinki · AveragecaseanalysisofQuicksort function Quicksort(A): if jAj 1: return A SelectpivotpfromA. PartitionAintosmaller,equal,andlargerthanp.

Average case analysis of Quicksort

function Quicksort(A):if |A| ≤ 1:

return ASelect pivot p from A.Partition A into smaller, equal, and larger than p.return Quicksort(smaller)+equal+Quicksort(larger)

Quicksort is an interesting algorithm for average case analysis,because its worst-case complexity is Θ(n2), while the typicalperformance is much better than for other Θ(n2)-timealgorithms (e.g. Insertion sort).

Page 8: Average Case Analysis - University of Helsinki · AveragecaseanalysisofQuicksort function Quicksort(A): if jAj 1: return A SelectpivotpfromA. PartitionAintosmaller,equal,andlargerthanp.

Average case analysis of Quicksort

In the analysis, we always choose the first element as thepivot. The input is assumed to be a permutation of integers1, . . . , n, with each of the permutations equally likely.

The running time of QuickSort is determined by the number ofelement comparisons, plus Θ(n) to cover the base cases.Integers i and j are compared at most once. This happens, ifone of i and j is chosen as a pivot before any of i + 1, . . . , j − 1is chosen. This happens with probability 2/(j − i + 1).

This suggests that we could use indicator variables in theanalysis.

Page 9: Average Case Analysis - University of Helsinki · AveragecaseanalysisofQuicksort function Quicksort(A): if jAj 1: return A SelectpivotpfromA. PartitionAintosmaller,equal,andlargerthanp.

Average case analysis of Quicksort

Let Xi ,j indicate whether integers i and j (with i < j) arecompared during sorting. Now Xi ,j = 1 with probability2/(j − i + 1), and Xi ,j = 0 otherwise.

The analysis is greatly simplified by the linearity ofexpectations:

E

[ ∑1≤i<j≤n

Xi ,j

]=

∑1≤i<j≤n

E[Xi ,j ].

Page 10: Average Case Analysis - University of Helsinki · AveragecaseanalysisofQuicksort function Quicksort(A): if jAj 1: return A SelectpivotpfromA. PartitionAintosmaller,equal,andlargerthanp.

Average case analysis of Quicksort

E

[ ∑1≤i<j≤n

Xi ,j

]=

∑1≤i<j≤n

E[Xi ,j ] =∑

1≤i<j≤n

2j − i + 1

= 2 ·n−1∑i=1

n−i+1∑j=2

1j≤ 2 ·

n−1∑i=1

Hi

= Θ(n log n),

where Hi = Θ(log i) is the ith harmonic number.

Hence the expected running time of Quicksort isΘ(n log n + n) = Θ(n log n).

Page 11: Average Case Analysis - University of Helsinki · AveragecaseanalysisofQuicksort function Quicksort(A): if jAj 1: return A SelectpivotpfromA. PartitionAintosmaller,equal,andlargerthanp.

Average case analysis of Quicksort

We can also base the average case analysis on the followingrecurrence:

T (n) =

Θ(1) if n ≤ 11n

∑ni=1 (T (i − 1) + T (n − i) + Θ(n))

The general case is further divided into cases, where the pivotis the ith smallest element in the array.

Page 12: Average Case Analysis - University of Helsinki · AveragecaseanalysisofQuicksort function Quicksort(A): if jAj 1: return A SelectpivotpfromA. PartitionAintosmaller,equal,andlargerthanp.

Average case analysis of Quicksort

We start by simplifying the general case:

T (n) =1n

n∑i=1

(T (i − 1) + T (n − i) + Θ(n))

=2n

n−1∑i=0

T (i) + Θ(n).

For large enough n, it holds that

nT (n)− (n − 1)T (n − 1) = 2T (n − 1) + Θ(n).

Page 13: Average Case Analysis - University of Helsinki · AveragecaseanalysisofQuicksort function Quicksort(A): if jAj 1: return A SelectpivotpfromA. PartitionAintosmaller,equal,andlargerthanp.

Average case analysis of Quicksort

T (n) =n + 1n

T (n − 1) + Θ(1)

=n + 1n − 1

T (n − 2) + Θ

(1 +

n + 1n

)= · · ·

=n + 12

T (1) + Θ

((n + 1)

n+1∑i=3

1i

)= Θ(n log n).

Page 14: Average Case Analysis - University of Helsinki · AveragecaseanalysisofQuicksort function Quicksort(A): if jAj 1: return A SelectpivotpfromA. PartitionAintosmaller,equal,andlargerthanp.

Average case analysis of Quicksort

A typical implementation uses Quicksort for arrays larger thank elements, and sorts small arrays with Insertion sort. AsInsertion sort can move an element for at most k − 1 positions,the total time complexity for all Insertion sorts is Θ(nk).

By setting T (n) = Θ(1) for n ≤ k , we get

T (n) =n + 1k + 1

T (k) + Θ

((n + 1)

n+1∑i=k+2

1i

)= Θ

(n log

nk

)for the Quicksort phase, for a total of Θ(n(k + log(n/k))).

Page 15: Average Case Analysis - University of Helsinki · AveragecaseanalysisofQuicksort function Quicksort(A): if jAj 1: return A SelectpivotpfromA. PartitionAintosmaller,equal,andlargerthanp.

Average case analysis

Average case analysis gives an upper bound for the expectedrunning time of a single execution of a deterministic algorithmwith a random input selected according to some distribution.

OR

Average case analysis gives an upper bound for the expectedrunning time of a single execution of a randomized algorithmwith a worst-case input.

Page 16: Average Case Analysis - University of Helsinki · AveragecaseanalysisofQuicksort function Quicksort(A): if jAj 1: return A SelectpivotpfromA. PartitionAintosmaller,equal,andlargerthanp.

Average case analysis

Randomized Quicksort selects a random element as the pivot.The same average case analysis works for this variant as well.

Should we draw the same conclusions from the analysis ofdeterministic Quicksort and randomized Quicksort?

Page 17: Average Case Analysis - University of Helsinki · AveragecaseanalysisofQuicksort function Quicksort(A): if jAj 1: return A SelectpivotpfromA. PartitionAintosmaller,equal,andlargerthanp.

Average case analysis

For randomized Quicksort, any input may require Ω(n2) time,but this is extremely unlikely.

For deterministic Quicksort, some inputs require Ω(n2) time. Ifthe first element is always chosen as the pivot, an array that isalready sorted is one of these inputs.

It is hard to find an input distribution that reflects the actualdistribution of inputs in practice, and is easy to analyze.

Page 18: Average Case Analysis - University of Helsinki · AveragecaseanalysisofQuicksort function Quicksort(A): if jAj 1: return A SelectpivotpfromA. PartitionAintosmaller,equal,andlargerthanp.

Average case analysis

Given a distribution of running times,

I worst-case analysis gives an upper bound for themaximum, while

I average case analysis gives an upper bound for theexpected value.

In statistics, there is no single parameter that always capturesthe relevant properties of a distribution. Similarly, there is nosingle way of analyzing an algorithm that always givesreasonable bounds for its performance in practice. One has tounderstand the nature of a particular algorithm to know, whichmethod of analysis accurately describes its performance.

Page 19: Average Case Analysis - University of Helsinki · AveragecaseanalysisofQuicksort function Quicksort(A): if jAj 1: return A SelectpivotpfromA. PartitionAintosmaller,equal,andlargerthanp.

Smoothed analysis

Smoothed analysis gives an upper bound for the expectedrunning time of a single execution of an algorithm for arandom input that is close to the worst-case input.

Daniel A. Spielman, Shang-Hua Teng: Smoothed analysisof algorithms: Why the simplex algorithm usuallytakes polynomial time.Journal of the ACM 51(3): 385-463, 2004.

Perhaps the most important advance in the analysis ofalgorithms in the 2000s.

Page 20: Average Case Analysis - University of Helsinki · AveragecaseanalysisofQuicksort function Quicksort(A): if jAj 1: return A SelectpivotpfromA. PartitionAintosmaller,equal,andlargerthanp.

Linear optimization

Discrete Optimization (4 cr), period II, Mon, Wed 14–16.Registration starts today!

The canonical form of a linear optimization problem is

maximize cᵀxsubject to Ax ≤ band x ≥ 0.

With n variables and m constraints, a solution x is a point inRn, and the m half-spaces defined by A and b limit thefeasible region in Rn. The goal is to maximize the linearreal-valued target function cᵀx in the feasible region.

Page 21: Average Case Analysis - University of Helsinki · AveragecaseanalysisofQuicksort function Quicksort(A): if jAj 1: return A SelectpivotpfromA. PartitionAintosmaller,equal,andlargerthanp.

Simplex algorithm (simplified)

function Simplex(A,b, c):Find a feasible solution x.repeat:

Find a direction that increases cᵀx.Move x in that direction as far as possible.

until the solution no longer improves.return x

The average running time of one version of Simplex is mO(√

n),yet the algorithm is very efficient in practice.

Page 22: Average Case Analysis - University of Helsinki · AveragecaseanalysisofQuicksort function Quicksort(A): if jAj 1: return A SelectpivotpfromA. PartitionAintosmaller,equal,andlargerthanp.

Smoothed analysis of the Simplex algorithm

Spielman and Teng add Gaussian noise with normalizedstandard deviation σ to the constraints A and b. They thenproceed to show that the running time of the Simplexalgorithm is polynomial in n, m, and 1/σ.

Even though the average running time is exponential, most ofit is produced by small neighborhoods of hard instances. If theconstraints are imprecise (e.g. due to measurement errors),then these hard instances are almost never found in practice.

Page 23: Average Case Analysis - University of Helsinki · AveragecaseanalysisofQuicksort function Quicksort(A): if jAj 1: return A SelectpivotpfromA. PartitionAintosmaller,equal,andlargerthanp.

String Processing Algorithms (4 cr)

Period II, Tue, Thu 12–14Juha Kärkkäinen

I Exact and approximate string matching.I String sorting and suffix sorting.I Suffix trees, suffix arrays, and related data structures.I Data structures for sets of strings.

Period III: Data Compression Techniques (4 cr), Project inString Processing Algorithms (2 cr)

A major research area at CS Department.

Page 24: Average Case Analysis - University of Helsinki · AveragecaseanalysisofQuicksort function Quicksort(A): if jAj 1: return A SelectpivotpfromA. PartitionAintosmaller,equal,andlargerthanp.