Crash Course on Data Stream Algorithms
Part I: Basic Definitions and Numerical Streams

Andrew McGregor, University of Massachusetts Amherst

Goals of the Crash Course

- Goal: Give a flavor of the theoretical results and techniques from the hundreds of papers on the design and analysis of stream algorithms.

  "When we abstract away the application-specific details, what are the basic algorithmic ideas and challenges in stream processing? What is and isn't possible?"

- Disclaimer: Talks will be theoretical/mathematical but shouldn't require much in the way of prerequisites.

- Request:
  - If you get bored, ask questions...
  - If you get lost, ask questions...
  - If you'd like to ask questions, ask questions...


Outline

- Basic Definitions
- Sampling
- Sketching
- Counting Distinct Items
- Summary of Some Other Results


Data Stream Model

- Stream: m elements from a universe of size n, e.g.,

  〈x1, x2, . . . , xm〉 = 3, 5, 3, 7, 5, 4, . . .

- Goal: Compute a function of the stream, e.g., the median, the number of distinct elements, the length of the longest increasing subsequence.

- Catch:
  1. Limited working memory, sublinear in n and m
  2. Access the data sequentially
  3. Process each element quickly

- Origins in the 70s, but the model has become popular in the last ten years because of a growing theory and an abundance of applications.

(A minimal code skeleton of the one-pass interface follows.)

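To make the model concrete, here is a minimal sketch (mine, not from the talk) of the one-pass interface that every algorithm below fits: see each element once, in order, keep a small amount of state, and answer a query at the end.

class StreamAlgorithm:
    """One-pass interface: small state, sequential access, fast updates."""

    def update(self, x):
        # Called once per stream element, in arrival order.
        raise NotImplementedError

    def query(self):
        # Called at any point to extract the current estimate.
        raise NotImplementedError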

Why's it become popular?

- Practical Appeal:
  - Faster networks, cheaper data storage, and ubiquitous data-logging result in massive amounts of data to be processed.
  - Applications to network monitoring, query planning, I/O efficiency for massive data, sensor-network aggregation...

- Theoretical Appeal:
  - Easy-to-state problems that are hard to solve.
  - Links to communication complexity, compressed sensing, embeddings, pseudo-random generators, approximation...


Sampling and Statistics

- Sampling is a general technique for tackling massive amounts of data.

- Example: To compute the median packet size of some IP packets, we could just sample some and use the median of the sample as an estimate for the true median. Statistical arguments relate the size of the sample to the accuracy of the estimate.

- Challenge: But how do you take a sample from a stream of unknown length, or from a "sliding window"?


Reservoir Sampling

- Problem: Find a uniform sample s from a stream of unknown length.

- Algorithm:
  - Initially, s = x1
  - On seeing the t-th element, s ← xt with probability 1/t

- Analysis: What's the probability that s = xi at some time t ≥ i?

  P[s = xi] = 1/i × (1 − 1/(i + 1)) × · · · × (1 − 1/t) = 1/t

- To get k samples we use O(k log n) bits of space. (A code sketch follows.)

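A minimal Python sketch of the single-sample algorithm above; for k samples, run k independent copies. The class and variable names are mine.

import random

class ReservoirSampler:
    """Maintains a uniform random sample from a stream of unknown length."""

    def __init__(self):
        self.t = 0        # number of elements seen so far
        self.s = None     # current sample

    def update(self, x):
        self.t += 1
        # Replace the sample with probability 1/t. By the telescoping
        # product in the analysis above, every element seen so far is
        # the sample with probability exactly 1/t.
        if random.random() < 1.0 / self.t:
            self.s = x

    def query(self):
        return self.s

# Example: a uniform sample from a stream of 100,000 values.
sampler = ReservoirSampler()
for x in (random.randrange(1000) for _ in range(100_000)):
    sampler.update(x)
print(sampler.query())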

Priority Sampling for Sliding Windows

- Problem: Maintain a uniform sample from the last w items.

- Algorithm:
  1. For each xi, pick a random value vi ∈ (0, 1)
  2. In a window 〈xj−w+1, . . . , xj〉, return the xi with the smallest vi
  3. To do this, maintain the set S of all elements in the sliding window whose v-value is minimal among all subsequent v-values

- Analysis:
  - The probability that the j-th oldest element is in S is 1/j, so the expected number of items in S is

    1/w + 1/(w − 1) + . . . + 1/1 = O(log w)

  - Hence, the algorithm only uses O(log w log n) bits of memory in expectation. (A code sketch follows.)

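A sketch of the sliding-window sampler above (my implementation; it stores full-precision priorities, whereas the stated space bound assumes O(log n)-bit values). The set S is exactly a monotone deque: a newly arrived element evicts every older element with a larger priority, so priorities along the deque are increasing and the front is always the in-window minimum.

import random
from collections import deque

class SlidingWindowSampler:
    """Uniform sample from the last w stream elements."""

    def __init__(self, w):
        self.w = w
        self.t = 0          # index of the most recent element
        self.S = deque()    # (index, priority, value), priorities increasing

    def update(self, x):
        self.t += 1
        v = random.random()     # priority v_i in (0, 1)
        # Older candidates with priority >= v can never again be the
        # minimum of a window containing x, so discard them.
        while self.S and self.S[-1][1] >= v:
            self.S.pop()
        self.S.append((self.t, v, x))
        # Expire candidates that have fallen out of the window.
        while self.S[0][0] <= self.t - self.w:
            self.S.popleft()

    def query(self):
        # The front of the deque holds the minimum priority in the window.
        return self.S[0][2]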

Other Types of Sampling

- Universe sampling: For a random i ∈R [n], compute fi = |{j : xj = i}|

- Minwise hashing: Sample i ∈R {i : there exists j such that xj = i}

- AMS sampling: Sample xj for j ∈R [m] and compute r = |{j′ ≥ j : xj′ = xj}|. Handy when estimating quantities like ∑i g(fi) because

  E[m(g(r) − g(r − 1))] = ∑i g(fi)

(A code sketch of AMS sampling follows.)

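A sketch of AMS sampling for the special case g(x) = x², i.e., estimating the second frequency moment ∑i fi². For clarity this version takes the stream as a list; a true one-pass version would pick the position j by reservoir sampling and count occurrences of xj on the fly. Averaging independent estimators (and taking medians of averages) reduces the variance.

import random

def ams_estimator(stream):
    """One unbiased estimator of sum_i f_i^2: with g(x) = x^2,
    m * (g(r) - g(r - 1)) = m * (2r - 1)."""
    m = len(stream)
    j = random.randrange(m)    # uniform position in the stream
    r = sum(1 for k in range(j, m) if stream[k] == stream[j])
    return m * (2 * r - 1)

stream = [random.randrange(10) for _ in range(10_000)]
estimate = sum(ams_estimator(stream) for _ in range(500)) / 500
exact = sum(stream.count(i) ** 2 for i in set(stream))
print(estimate, exact)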


Sketching

- Sketching is another general technique for processing streams.

- Basic idea: Apply a linear projection "on the fly" that takes the high-dimensional data to a lower-dimensional space. Post-process the lower-dimensional image to estimate the quantities of interest.


Estimating the difference between two streams

- Input: A stream 〈x1, x2, . . . , xm〉 of values from [n], each tagged as coming from one of two sources, say "red" and "blue".

- Goal: Estimate the difference between the distribution of the red values and the blue values, e.g.,

  ∑i∈[n] |fi − gi|

  where fi = |{k : xk = i and xk is red}| and gi = |{k : xk = i and xk is blue}|


p-Stable Distributions and Algorithm

- Defn: A p-stable distribution µ has the following property:

  for X, Y, Z ∼ µ and a, b ∈ R: aX + bY ∼ (|a|^p + |b|^p)^{1/p} Z

  e.g., the Gaussian distribution is 2-stable and the Cauchy distribution is 1-stable.

- Algorithm:
  - Generate a random matrix A ∈ R^{k×n} where Aij ∼ Cauchy, k = O(ε⁻²)
  - Compute the sketches Af and Ag incrementally
  - Return median(|t1|, . . . , |tk|) where t = Af − Ag

- Analysis:
  - By the 1-stability property, for Zi ∼ Cauchy,

    |ti| = |∑j Ai,j(fj − gj)| ∼ |Zi| ∑j |fj − gj|

  - For k = O(ε⁻²), since median(|Zi|) = 1, with high probability,

    (1 − ε) ∑j |fj − gj| ≤ median(|t1|, . . . , |tk|) ≤ (1 + ε) ∑j |fj − gj|

(A code sketch follows.)

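A sketch of the 1-stable (Cauchy) algorithm above. A standard Cauchy variate can be drawn as tan(π(U − 1/2)) for uniform U. Note that this toy version stores the k×n matrix A explicitly, which defeats the point of sublinear space; a real implementation regenerates entries of A pseudo-randomly from a seed. Class and function names here are mine.

import math
import random

def cauchy():
    # Standard Cauchy variate: tan(pi * (U - 1/2)) for U uniform in (0, 1).
    return math.tan(math.pi * (random.random() - 0.5))

class L1DifferenceSketch:
    """Estimates sum_j |f_j - g_j| via median(|A(f - g)|), A_ij ~ Cauchy."""

    def __init__(self, n, k):
        self.A = [[cauchy() for _ in range(n)] for _ in range(k)]
        self.t = [0.0] * k    # running sketch t = A(f - g)

    def update(self, j, red):
        # Linearity: a red occurrence of value j adds column j of A,
        # a blue occurrence subtracts it, so t always equals A(f - g).
        sign = 1.0 if red else -1.0
        for i, row in enumerate(self.A):
            self.t[i] += sign * row[j]

    def query(self):
        # median(|Z_i|) = 1 for Cauchy Z_i, so this concentrates around
        # sum_j |f_j - g_j| once k = O(eps^-2).
        return sorted(abs(v) for v in self.t)[len(self.t) // 2]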

A Useful Multi-Purpose Sketch: Count-Min Sketch

- Heavy Hitters: Find all i such that fi ≥ φm

- Range Sums: Estimate ∑_{i≤k≤j} fk when i, j aren't known in advance

- Find k-Quantiles: Find values q0, . . . , qk such that

  q0 = 0, qk = n, and ∑_{i≤q_{j−1}} fi < jm/k ≤ ∑_{i≤q_j} fi

- Algorithm: Count-Min Sketch
  - Maintain an array of counters ci,j for i ∈ [d] and j ∈ [w]
  - Construct d random hash functions h1, h2, . . . , hd : [n] → [w]
  - Update counters: on seeing value v, increment ci,hi(v) for each i ∈ [d]
  - To get an estimate of fk, return f̂k = mini ci,hi(k)

- Analysis: For d = O(log 1/δ) and w = O(1/ε²),

  P[f̂k − εm ≤ fk ≤ f̂k] ≥ 1 − δ

(A code sketch follows.)

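A sketch of the Count-Min update and point query described above. For illustration the hash functions are built from Python's salted built-in hash; the analysis only needs them drawn from a pairwise-independent family. Names are mine.

import random

class CountMinSketch:
    """d rows of w counters; estimates never underestimate f_k."""

    def __init__(self, d, w, seed=0):
        rng = random.Random(seed)
        self.salts = [rng.getrandbits(64) for _ in range(d)]
        self.counters = [[0] * w for _ in range(d)]
        self.w = w

    def _h(self, i, v):
        # Illustrative stand-in for a pairwise-independent hash h_i.
        return hash((self.salts[i], v)) % self.w

    def update(self, v):
        for i in range(len(self.counters)):
            self.counters[i][self._h(i, v)] += 1

    def estimate(self, k):
        # Each row only overcounts f_k (colliding items add to the same
        # counter), so the minimum over rows is the best estimate.
        return min(self.counters[i][self._h(i, k)]
                   for i in range(len(self.counters)))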


Counting Distinct Elements

- Input: A stream 〈x1, x2, . . . , xm〉 ∈ [n]^m

- Goal: Estimate the number of distinct values in the stream up to a multiplicative factor (1 + ε), with high probability.

Algorithm

- Algorithm:
  1. Apply a random hash function h : [n] → [0, 1] to each element
  2. Compute φ, the t-th smallest hash value seen, where t = 21/ε²
  3. Return r̃ = t/φ as an estimate for r, the number of distinct items

  Intuition: the r distinct hash values are spread uniformly in [0, 1], so the t-th smallest should be close to t/r.

- Analysis:
  1. The algorithm uses O(ε⁻² log n) bits of space.
  2. We'll show the estimate has good accuracy with reasonable probability:

     P[|r̃ − r| ≤ εr] ≥ 9/10

(A code sketch follows.)

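A sketch of the distinct-elements algorithm above. It keeps the t smallest distinct hash values in a max-heap so the largest retained value can be evicted quickly; the _hash method is an illustrative stand-in for a random function h : [n] → [0, 1], and the names are mine.

import heapq

class DistinctCounter:
    """Estimates the number of distinct items as t / (t-th smallest hash)."""

    def __init__(self, t):
        self.t = t
        self.heap = []      # negated hashes: a max-heap of the t smallest
        self.kept = set()   # hash values currently retained

    def _hash(self, x):
        # Illustrative stand-in for a random hash into [0, 1).
        return (hash(('salt', x)) % (1 << 30)) / (1 << 30)

    def update(self, x):
        h = self._hash(x)
        if h in self.kept:
            return                         # duplicate item, nothing to do
        if len(self.heap) < self.t:
            heapq.heappush(self.heap, -h)
            self.kept.add(h)
        elif h < -self.heap[0]:            # smaller than the current t-th
            evicted = -heapq.heappushpop(self.heap, -h)
            self.kept.discard(evicted)
            self.kept.add(h)

    def query(self):
        if len(self.heap) < self.t:
            return len(self.heap)          # fewer than t distinct items seen
        phi = -self.heap[0]                # t-th smallest hash value
        return self.t / phi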

Accuracy Analysis

1. Suppose the distinct items are a1, . . . , ar.

2. Over-estimation:

   P[r̃ ≥ (1 + ε)r] = P[t/φ ≥ (1 + ε)r] = P[φ ≤ t/(r(1 + ε))]

3. Let Xi = 1[h(ai) ≤ t/(r(1 + ε))] and X = ∑ Xi. Then

   P[φ ≤ t/(r(1 + ε))] = P[X ≥ t] = P[X ≥ (1 + ε)E[X]]

4. By a Chebyshev analysis,

   P[X ≥ (1 + ε)E[X]] ≤ 1/(ε²E[X]) ≤ 1/20

5. Under-estimation: A similar analysis shows P[r̃ ≤ (1 − ε)r] ≤ 1/20.

(The Chebyshev step is spelled out below.)

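For completeness, here is the Chebyshev step spelled out, under the assumption (implicit in the talk) that h comes from a pairwise-independent family, so the Xi are pairwise independent:

  E[X] = r · t/(r(1 + ε)) = t/(1 + ε)   and   Var[X] = ∑i Var[Xi] ≤ ∑i E[Xi] = E[X]

so, by Chebyshev's inequality,

  P[X ≥ (1 + ε)E[X]] = P[X − E[X] ≥ εE[X]] ≤ Var[X]/(εE[X])² ≤ 1/(ε²E[X]) = (1 + ε)/(ε²t) = (1 + ε)/21

which is at most 1/20 once ε is sufficiently small.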


Some Other Results

Correlations:

- Input: 〈(x1, y1), (x2, y2), . . . , (xm, ym)〉
- Goal: Estimate the strength of the correlation between x and y via the distance between the joint distribution and the product of the marginals.
- Result: (1 + ε) approximation in O(ε^{−O(1)}) space.

Linear Regression:

- Input: A stream defining a matrix A ∈ R^{n×d} and a vector b ∈ R^{n×1}
- Goal: Find x such that ‖Ax − b‖₂ is minimized.
- Result: (1 + ε) estimation in O(d²ε⁻¹) space.


Some More Other Results

Histograms:

- Input: 〈x1, x2, . . . , xm〉 ∈ [n]^m
- Goal: Determine the B-bucket histogram H : [m] → R minimizing ∑_{i∈[m]} (xi − H(i))²
- Result: (1 + ε) estimation in O(B²ε⁻¹) space

Transpositions and Increasing Subsequences:

- Input: 〈x1, x2, . . . , xm〉 ∈ [n]^m
- Goal: Estimate the number of transpositions, |{i < j : xi > xj}|
- Goal: Estimate the length of the longest increasing subsequence
- Results: (1 + ε) approximation in O(ε⁻¹) and O(ε⁻¹√n) space, respectively


Thanks!

- Blog: http://polylogblog.wordpress.com

- Lectures: Piotr Indyk, MIT
  http://stellar.mit.edu/S/course/6/fa07/6.895/

- Books:
  "Data Streams: Algorithms and Applications", S. Muthukrishnan (2005)
  "Algorithms and Complexity of Stream Processing", A. McGregor and S. Muthukrishnan (forthcoming)