Approximating the Depth via Sampling and Emptiness


Page 1: Approximating the Depth via Sampling and Emptiness

Approximating the Depth via Sampling and Emptiness

Lecture : Adi Vardi

Page 2: Approximating the Depth via Sampling and Emptiness

Depth(r,S) – Given a set S of n objects and a query range r, depth(r,S) is the number of objects of S intersected by r.
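As a baseline for this definition, the depth can be computed by brute force in O(n) time. The sketch below assumes axis-aligned rectangles as the objects; the names `Rect`, `intersects` and `depth` are illustrative and do not come from the slides.

```python
from typing import NamedTuple, Sequence


class Rect(NamedTuple):
    """Closed axis-aligned rectangle [x1, x2] x [y1, y2] (assumed object type)."""
    x1: float
    y1: float
    x2: float
    y2: float


def intersects(a: Rect, b: Rect) -> bool:
    # Two closed rectangles intersect iff they overlap on both axes.
    return a.x1 <= b.x2 and b.x1 <= a.x2 and a.y1 <= b.y2 and b.y1 <= a.y2


def depth(r: Rect, S: Sequence[Rect]) -> int:
    # depth(r, S): number of objects of S intersected by r.
    return sum(1 for s in S if intersects(r, s))


S = [Rect(0, 0, 2, 2), Rect(1, 1, 3, 3), Rect(5, 5, 6, 6), Rect(2, 0, 4, 1)]
print(depth(Rect(1.5, 0.5, 2.5, 1.5), S))
```

The rest of the lecture is about avoiding this linear scan at query time.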

Page 3: Approximating the Depth via Sampling and Emptiness

Depth(r,S) = 3

Page 4: Approximating the Depth via Sampling and Emptiness

Range counting queries – preprocess a set S of n objects and a class of ranges so that, given a query range r, one can quickly report the number of objects of S intersecting r.

Page 5: Approximating the Depth via Sampling and Emptiness

Example: Range tree

• S = set of points in the plane
• r = query rectangle

Page 6: Approximating the Depth via Sampling and Emptiness

Range tree – primary tree

• Balanced binary search tree
• Ordered by x-coordinate
• Each point is stored at a leaf node

Page 7: Approximating the Depth via Sampling and Emptiness

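Range tree query
• Given an interval [a,b], search for a and b
• Find where the two search paths split, then look at the subtrees along the search paths

[Figure: the search paths for a and b in the primary tree, with the node where the paths split marked]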
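For intuition, in one dimension the "search for a and b" step reduces to two binary searches on a sorted array. The sketch below is a 1-D stand-in rather than a full range tree; the function names are illustrative, and it relies on Python's standard `bisect` module.

```python
from bisect import bisect_left, bisect_right


def build(points):
    # Preprocessing: sort the coordinates once, O(n log n).
    return sorted(points)


def count_in_interval(sorted_pts, a, b):
    # Number of points p with a <= p <= b, found with two binary searches.
    if a > b:
        return 0
    return bisect_right(sorted_pts, b) - bisect_left(sorted_pts, a)


def is_empty(sorted_pts, a, b):
    # Emptiness query: does the interval [a, b] miss every point?
    return count_in_interval(sorted_pts, a, b) == 0


pts = build([5, 1, 9, 3, 7, 3])
print(count_in_interval(pts, 3, 7), is_empty(pts, 10, 12))
```

In one dimension counting and emptiness cost the same; the gap between them only opens up for ranges such as halfspaces.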

Page 8: Approximating the Depth via Sampling and Emptiness

Range tree complexity

• Query time: $O(\log^{d-1} n)$
• Space: $O(n \log^{d-1} n)$
• Construction time: $O(n \log^{d-1} n)$

Page 9: Approximating the Depth via Sampling and Emptiness

Range emptiness queries - preprocessing a set S of objects and a class of ranges, so that given a range r, one can quickly report whether r intersects any of the objects in S

Page 10: Approximating the Depth via Sampling and Emptiness

Emptiness(r,S) = not_empty

Page 11: Approximating the Depth via Sampling and Emptiness

Emptiness(r,S) = empty

Page 12: Approximating the Depth via Sampling and Emptiness

Sampling - selection of a subset of objects from a set S to estimate characteristics of S

[Figure: the set S and a sample S’]

Page 13: Approximating the Depth via Sampling and Emptiness

Approximate – Let $\mu_r = \mathrm{depth}(r,S)$ denote the depth of r. For a prespecified ε > 0, the data structure outputs a number $\alpha_r$ such that $(1-\varepsilon)\mu_r \le \alpha_r \le \mu_r$.

Example: $\mu_r = 20$, ε = 0.1
o Valid $\alpha_r$: 18, 19, 20
o Invalid $\alpha_r$: 17, 21
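The acceptance condition can be checked mechanically. A minimal sketch follows; the function name is illustrative, and `Fraction` is used only so that the boundary case (1 - ε)μ = 18 is compared exactly.

```python
from fractions import Fraction


def is_valid_approximation(alpha, mu, eps):
    # alpha is acceptable iff (1 - eps) * mu <= alpha <= mu.
    return (1 - eps) * mu <= alpha <= mu


eps = Fraction(1, 10)
valid = [a for a in range(15, 25) if is_valid_approximation(a, 20, eps)]
print(valid)
```

Note that the guarantee is one-sided: the output never overestimates the depth.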

Page 14: Approximating the Depth via Sampling and Emptiness

Motivation
• Counting queries are much harder than emptiness queries
• Example: halfspace queries in 2 and 3 dimensions
o Counting queries can be answered in time polynomial in n
o Emptiness queries can be answered in logarithmic time
• Goal: answer approximate range counting queries using a polylogarithmic number of emptiness queries

Page 15: Approximating the Depth via Sampling and Emptiness

Halfplane emptiness
• Preprocessing: find the convex hull
• Query
o Rotate polygon
o Find upper and lower envelope – binary search
o Find extreme points – binary search
o Orientation test
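One common way to realise this idea is sketched below; it is not necessarily the exact procedure on the slide. It assumes the halfplane is given as a*x + b*y >= c, builds the two hull chains with Andrew's monotone chain algorithm, and binary searches the chain facing the halfplane normal for the extreme vertex. All function names are illustrative.

```python
def cross(o, a, b):
    # Orientation test: > 0 for a left turn o -> a -> b, < 0 for a right turn.
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def build_hulls(points):
    """Preprocessing: lower and upper hull chains, both ordered left to right."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts, pts
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) >= 0:
            upper.pop()
        upper.append(p)
    return lower, upper


def _max_on_chain(chain, a, b):
    # a*x + b*y first increases and then decreases along the chain, so we
    # binary search for the first edge along which it stops increasing.
    lo, hi = 0, len(chain) - 1
    while lo < hi:
        mid = (lo + hi) // 2
        dx = chain[mid + 1][0] - chain[mid][0]
        dy = chain[mid + 1][1] - chain[mid][1]
        if a * dx + b * dy > 0:
            lo = mid + 1
        else:
            hi = mid
    return a * chain[lo][0] + b * chain[lo][1]


def halfplane_is_empty(hulls, a, b, c):
    """True iff no input point satisfies a*x + b*y >= c."""
    lower, upper = hulls
    if not upper:
        return True
    if b > 0:
        best = _max_on_chain(upper, a, b)
    elif b < 0:
        best = _max_on_chain(lower, a, b)
    else:
        # Vertical boundary: only the leftmost or rightmost x can be extreme.
        best = max(a * upper[0][0], a * upper[-1][0])
    return best < c
```

After the hull is built, each query inspects O(log h) hull edges, where h is the number of hull vertices.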

Page 16: Approximating the Depth via Sampling and Emptiness

Idea 1

• Create M independent samples $R_1,\dots,R_M$ of S
• Each sample is formed by picking each element of S with probability p
• $X_i = 1$ if r intersects any object of $R_i$, $X_i = 0$ otherwise
• $Y_r = \sum_i X_i$
• $\alpha_r = f(Y_r, M, p)$
• Problems?

Page 17: Approximating the Depth via Sampling and Emptiness

Idea 1 – problems
• Number of samples M = M(ε,n) (number of "Bernoulli trials")
• Probability of picking an object p = p($\mu_r$)
• A wrong p might yield an invalid result
o Large p for "light" depth
o Small p for "heavy" depth
• How to find an appropriate p?

Page 18: Approximating the Depth via Sampling and Emptiness

Idea 1 – problems
• "Heavy" depth
o n = 1000, $\mu_r = 997$, ε → 0
o Sample size must be less than 4 (otherwise none of the samples will be "empty")
• "Light" depth
o n = 1000, $\mu_r = 3$
o Sample size must be very large (otherwise we won't be able to catch the "non-empty" objects)

Page 19: Approximating the Depth via Sampling and Emptiness

Idea 2
• Start with a guess z for depth(r,S)
• Probability of picking an object: p = 1/z
• Probability that a range r of depth k intersects $R_i$: $p_z(k) = 1 - (1 - \frac{1}{z})^k$
• If r has depth z, then $\Delta = E[Y_r] = M p_z(z)$
• If $Y_r > \Delta$ then $\mu_r > z$, otherwise $\mu_r \le z$
• Perform binary search on [0,n]
• Problems?
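To see why comparing the count against Δ separates shallow from deep ranges, it helps to tabulate the hit probability for a fixed guess z. A small sketch follows; the function names and the values z = 100, M = 1000 are chosen only for illustration.

```python
def p_z(z, k):
    """Probability that a range of depth k meets a sample taken with rate 1/z."""
    return 1.0 - (1.0 - 1.0 / z) ** k


def expected_nonempty(M, z, k):
    # E[Y_r] for true depth k: M independent samples, each hit with probability p_z(k).
    return M * p_z(z, k)


z, M = 100, 1000
for k in (50, 100, 200):
    print(k, round(expected_nonempty(M, z, k)))
```

The expectation grows monotonically with the true depth k, which is what makes the threshold test meaningful; the difficulty raised next is that the observed count only concentrates around this expectation.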

Page 20: Approximating the Depth via Sampling and Emptiness

Idea 2 – problems
• Only the expectation of $Y_r$ is $M p_z(z)$
• The decision $\mu_r < z$ might be mistaken
• How can we overcome mistakes and guarantee $(1-\varepsilon)\mu_r \le \alpha_r \le \mu_r$?

Page 21: Approximating the Depth via Sampling and Emptiness

Halfspace complexity status

• Emptiness query – logarithmic time
• Binary search – logarithmic time (probably)
• Number of samples M = M(ε,n) – logarithmic?

Page 22: Approximating the Depth via Sampling and Emptiness

The decision procedure

Given parameters z ∈ [0,n] and 1/2 > ε > 0, we construct a data structure such that, for any δ with 1/2 > δ ≥ ε and any query range r, we can decide, with high probability, whether $\mu_r < z$ or $\mu_r \ge z$.

The data structure is allowed to make a mistake if $\mu_r \in [(1-\delta)z, (1+\delta)z]$.

[Figure: number line marked (1-δ)z, z, (1+δ)z; the answer is "<" to the left of the interval, ">" to the right of it, and "?" inside it]

Page 23: Approximating the Depth via Sampling and Emptiness

Why δ?
• Trade-off between query time and accuracy
• Large δ when the binary search range is large
• Small δ when the range is small

[Figure: number line marked (1-δ)z, z, (1+δ)z]

Page 24: Approximating the Depth via Sampling and Emptiness

The data structure
• Let $R_1,\dots,R_M$ be M independent random samples of S, formed by picking every element with probability 1/z, where $M(\varepsilon) = \lceil c_3 \varepsilon^{-2} \log n \rceil$ and $c_3$ is a sufficiently large absolute constant.
• Build M separate emptiness-query data structures $D_1,\dots,D_M$ for $R_1,\dots,R_M$, respectively, and put $D = D(z,\varepsilon) := \{D_1,\dots,D_M\}$.

Page 25: Approximating the Depth via Sampling and Emptiness

Answering a query
• $X_i = 1$ if r intersects any object of $R_i$, $X_i = 0$ otherwise, for i = 1,…,M(δ)
• $Y_r = \sum_i X_i$
• Probability that a range r of depth k intersects $R_i$: $p_z(k) = 1 - (1 - \frac{1}{z})^k$
• If r has depth z, then $\Delta = E[Y_r] = M p_z(z)$
• Our data structure returns "depth(r,S) < z" if $Y_r < \Delta$, and "depth(r,S) ≥ z" otherwise
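The construction and the query rule can be put together in a few lines. The sketch below is only illustrative: it replaces each emptiness data structure by a brute-force scan of the sample, uses an arbitrary small value `c3 = 4.0` (the analysis needs a sufficiently large constant), and the class and parameter names are assumptions rather than part of the lecture.

```python
import math
import random


def p_z(z, k):
    return 1.0 - (1.0 - 1.0 / z) ** k


class DecisionStructure:
    """Sketch of D(z, eps): M random samples, each queried for emptiness."""

    def __init__(self, objects, z, eps, c3=4.0, seed=0):
        rng = random.Random(seed)
        self.z, self.c3 = z, c3
        self.log_n = math.log(max(len(objects), 2))
        m = math.ceil(c3 * eps ** -2 * self.log_n)
        # Each sample keeps every object independently with probability 1/z.
        self.samples = [[o for o in objects if rng.random() < 1.0 / z]
                        for _ in range(m)]

    def query(self, intersects, delta):
        # Use only the first M(delta) samples: a larger delta means fewer queries.
        m = min(len(self.samples),
                math.ceil(self.c3 * delta ** -2 * self.log_n))
        # Brute-force emptiness test standing in for a real emptiness structure.
        y = sum(1 for sample in self.samples[:m]
                if any(intersects(o) for o in sample))
        threshold = m * p_z(self.z, self.z)  # Delta = M * p_z(z)
        return "LOW" if y < threshold else "HIGH"
```

A possible use, with objects modelled as integers and a range that intersects the first k of them: `DecisionStructure(list(range(1000)), z=100, eps=0.25).query(lambda o: o < 10, delta=0.25)`.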

Page 26: Approximating the Depth via Sampling and Emptiness

Lemma 1

$\Pr[Y_r > \Delta \mid \mathrm{depth}(r,S) \le (1-\delta)z]$ does not exceed $n^{-c_4}$, where $c_4 = c_4(c_3) > 0$ depends only on $c_3$ and can be made arbitrarily large by choosing a sufficiently large $c_3 > 0$.

Page 27: Approximating the Depth via Sampling and Emptiness

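Reminder - Chernoff bound

Let $X_1,\dots,X_n$ be n independent Bernoulli trials with $\Pr[X_i = 1] = p_i$ and $\Pr[X_i = 0] = q_i = 1 - p_i$.

$X = \sum_{i=1}^{n} X_i$, $\mu = E[X]$

For any δ > 0:
o $\Pr[X \ge (1+\delta)\mu] \le \left( \frac{e^{\delta}}{(1+\delta)^{1+\delta}} \right)^{\mu}$
o $\Pr[X < (1-\delta)\mu] \le \exp(-\mu\delta^2/2)$

For any δ ≤ 2e – 1:
o $\Pr[X \ge (1+\delta)\mu] \le \exp(-\mu\delta^2/4)$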

Page 28: Approximating the Depth via Sampling and Emptiness

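Lemma 1 - proof

Observation 1: for 0 ≤ x ≤ y < 1, $\frac{1-x}{1-y} = 1 + \frac{y-x}{1-y} \ge 1 + y - x$.

Let α denote the probability bounded in Lemma 1 (the proof is written with ε as the accuracy parameter). α is maximized when depth(r,S) = (1-ε)z. Therefore: $\alpha \le \Pr[Y_r > \Delta \mid \mathrm{depth}(r,S) = (1-\varepsilon)z]$.

$\Pr[X_i = 1] = p_z((1-\varepsilon)z)$.

$E[Y_r] = M p_z((1-\varepsilon)z) = M\left[1 - (1-\frac{1}{z})^{(1-\varepsilon)z}\right] \ge M(1 - e^{-(1-\varepsilon)}) \ge \frac{M}{3}$,

since $(1-\frac{1}{z})^z \le e^{-1}$ and $\varepsilon \le \frac{1}{2}$.

According to Observation 1, with $y = (1-\frac{1}{z})^{(1-\varepsilon)z}$ and $x = (1-\frac{1}{z})^z$:

$\xi = \frac{\Delta}{E[Y_r]} = \frac{M\left[1-(1-\frac{1}{z})^{z}\right]}{M\left[1-(1-\frac{1}{z})^{(1-\varepsilon)z}\right]} \ge 1 + (1-\frac{1}{z})^{(1-\varepsilon)z} - (1-\frac{1}{z})^{z} = 1 + (1-\frac{1}{z})^{(1-\varepsilon)z}\left[1 - (1-\frac{1}{z})^{\varepsilon z}\right]$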

Page 29: Approximating the Depth via Sampling and Emptiness

Lemma 1 - proof

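$\xi = \frac{\Delta}{E[Y_r]} \ge 1 + (1-\frac{1}{z})^{(1-\varepsilon)z}\left[1 - (1-\frac{1}{z})^{\varepsilon z}\right]$

Applying $\exp(-2x) \le 1 - x$ and $1 + x \le \exp(x)$:

$\xi \ge 1 + \exp\left[-\frac{2}{z}(1-\varepsilon)z\right]\left[1 - \exp(-\frac{1}{z}\varepsilon z)\right] \ge 1 + \frac{1}{e^2}\left[1 - \exp(-\varepsilon)\right]$

$\ge 1 + \frac{1}{e^2}\left[1 - (1 - \frac{\varepsilon}{2})\right] \ge 1 + \frac{\varepsilon}{15}$

Deploying Chernoff's inequality:

$\alpha = \Pr[Y_r > \Delta] \le \Pr[Y_r > \xi E[Y_r]] \le \Pr[Y_r > (1+\varepsilon/15)E[Y_r]]$

$\le \exp\left[-E[Y_r]\frac{1}{4}\left(\frac{\varepsilon}{15}\right)^2\right] \le \exp\left(-\frac{M\varepsilon^2}{c}\right) \le \exp\left(-\frac{\varepsilon^2\lceil c_3\varepsilon^{-2}\log n\rceil}{c}\right) \le n^{-c_4}$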

Page 30: Approximating the Depth via Sampling and Emptiness

Lemma 2

Given a set S of n objects, a parameter 0 < ε < 1/2 and z ∈ [0,n], one can construct a data structure D(z) which, given a range r and a parameter 1/2 > δ ≥ ε, returns either LOW or HIGH.
• If it returns LOW, then $\mu_r \le (1+\delta)z$, and if it returns HIGH, then $\mu_r \ge (1-\delta)z$.
• The data structure might return either answer if $\mu_r \in [(1-\delta)z, (1+\delta)z]$.
• The data structure D consists of $M = O(\varepsilon^{-2}\log n)$ emptiness data structures.

Page 31: Approximating the Depth via Sampling and Emptiness

Lemma 2 - cont.
• The space needed is $O(S(2n/z)\,\varepsilon^{-2}\log n)$, where S(m) is the space needed for a single emptiness data structure storing m objects.
• The query time is $O(Q(2n/z)\,\delta^{-2}\log n)$, where Q(m) is the time needed for a single query in such a structure storing m objects.
• All bounds hold with high probability.

Proof?

Page 32: Approximating the Depth via Sampling and Emptiness

Lemma 3
• Given the data structure of Lemma 2, z and δ > $c_5$, where $c_5$ is a sufficiently large constant, one can decide for a query range r whether $\mu_r < z/(1+\delta)$ or $\mu_r \ge z(1+\delta)$.
• The data structure is allowed to return any answer if $\mu_r \in [z/(1+\delta), z(1+\delta)]$.
• This requires $M = \lceil c_6 (\log n)/\ln\delta \rceil$ emptiness queries, and the answer returned is correct with high probability, where $c_6$ is an appropriate absolute constant.

Proof?

Page 33: Approximating the Depth via Sampling and Emptiness

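Range counting data structure
• "Light" depth values
o Build a separate data structure $D_i = D(v_i, \varepsilon_i)$ of Lemma 2
o $v_i = \frac{i}{2}$, $\varepsilon_i = \frac{1}{8i}$
o i = 1,2,…,U, where $U = O(\varepsilon^{-1})$
• "Heavy" depth values
o Build a separate data structure $D_j = D(v_j, \varepsilon_j)$ of Lemma 2
o $v_j = (U/4)(1+\varepsilon/16)^j$, $\varepsilon_j = \varepsilon/16$
o j = U+1, U+2, …, W, where $W := c\log_{1+\varepsilon/16} n = O(\varepsilon^{-1}\log n)$ and $v_W = n$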

Page 34: Approximating the Depth via Sampling and Emptiness

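Range counting data structure

$\log_{1+\varepsilon/16} n = \frac{\log n}{\log(1+\frac{\varepsilon}{16})}$

For 0 < ε ≤ 1: $50\log(1+\varepsilon/16) \ge \varepsilon$, so $\log_{1+\varepsilon/16} n \le 50\,\varepsilon^{-1}\log n$.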
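The two families of thresholds can be generated directly from these formulas. In the sketch below the constant in U = O(1/ε) is not given on the slides, so `c = 16` is an assumption made only to have something runnable, and the heavy list simply stops once a threshold reaches n.

```python
import math


def build_thresholds(n, eps, c=16):
    """Return (light, heavy) lists of (v, eps_v) pairs for a structure of Lemma 2."""
    U = math.ceil(c / eps)  # assumed constant; the slides only say U = O(1/eps)
    # "Light" values: v_i = i/2 with eps_i = 1/(8i), for i = 1..U.
    light = [(i / 2, 1 / (8 * i)) for i in range(1, U + 1)]
    # "Heavy" values: v_j = (U/4)(1 + eps/16)^j with eps_j = eps/16, for j = U+1, ...
    heavy = []
    j = U + 1
    while True:
        v = (U / 4) * (1 + eps / 16) ** j
        heavy.append((v, eps / 16))
        if v >= n:
            break
        j += 1
    return light, heavy


light, heavy = build_thresholds(10 ** 6, 0.5)
print(len(light), len(heavy))
```

The light list has O(1/ε) entries with additive spacing 1/2, while the heavy list grows geometrically and therefore has O(ε^{-1} log n) entries.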

Page 35: Approximating the Depth via Sampling and Emptiness

Answering a query
• Given a query range r, each data structure in our list returns LOW or HIGH.
• If we were to query all the data structures, we would get a sequence of HIGHs followed by a sequence of LOWs.
• The value associated with the last data structure returning HIGH (rounded to the nearest integer) yields the required approximation.
• We can use binary search on $D_1,\dots,D_W$ to locate this changeover value, using a total of $O(\log W) = O(\log(\varepsilon^{-1}\log n))$ queries.
• Overall time is $O(Q(n)\,\varepsilon^{-2}(\log n)\log(\varepsilon^{-1}\log n))$.

[Figure: the list $D_1,\dots,D_W$ with the changeover value $\alpha_r$ marked]
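The changeover search is an ordinary binary search over the answers. A minimal sketch follows, assuming an idealised oracle `query(i)` that returns "HIGH" or "LOW" for the i-th structure and that the answers really form a run of HIGHs followed by LOWs; the function name is illustrative.

```python
def last_high(num_structures, query):
    """Index of the last structure answering HIGH, or -1 if none does."""
    # Invariant: index lo answers HIGH (or lo == -1), index hi answers LOW
    # (or hi is one past the end of the list).
    lo, hi = -1, num_structures
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if query(mid) == "HIGH":
            lo = mid
        else:
            hi = mid
    return lo


values = [i / 2 for i in range(1, 41)]          # light thresholds 0.5, 1.0, ...
idx = last_high(len(values), lambda i: "HIGH" if values[i] <= 3 else "LOW")
print(values[idx])
```

Each step costs one call to a decision structure, so the number of structures consulted is logarithmic in the length of the list.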

Page 36: Approximating the Depth via Sampling and Emptiness

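“Light” depth

Key observation: error ≤ Uε = $c\varepsilon^{-1}\varepsilon$ = c. Small error range.

Example: $\mu_r = 3$, ε = 0.1, error = 0.3. $\alpha_r$ must be equal to $\mu_r$!

Assume $\mu_r = x$.
• $D_{2x}$ might return HIGH or LOW
• $v_{2x} - v_{2x-1} = \frac{1}{2} > \frac{1}{16} = \frac{2x-1}{2}\cdot\frac{1}{8(2x-1)} = v_{2x-1}\varepsilon_{2x-1}$, so $D_{2x-1}$ must return HIGH
• $v_{2x+1} - v_{2x} = \frac{1}{2} > \frac{1}{16} = \frac{2x+1}{2}\cdot\frac{1}{8(2x+1)} = v_{2x+1}\varepsilon_{2x+1}$, so $D_{2x+1}$ must return LOW

[Figure: $D_{2x-1}$, $D_{2x}$, $D_{2x+1}$, with "?" at $D_{2x}$]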

Page 37: Approximating the Depth via Sampling and Emptiness

“Light” depth – option 1

$D_{2x}$ = HIGH

$\alpha_r = v_{2x} = x = \mu_r$

[Figure: $D_{2x-1}$, $D_{2x}$, $D_{2x+1}$]

Page 38: Approximating the Depth via Sampling and Emptiness

“Light” depth – option 2

$D_{2x}$ = LOW

$\alpha_r = \lceil v_{2x-1} \rceil = \lceil x - \frac{1}{2} \rceil = x = \mu_r$

[Figure: $D_{2x-1}$, $D_{2x}$, $D_{2x+1}$]

Page 39: Approximating the Depth via Sampling and Emptiness

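“Heavy” depth

$v_{j+1} = (1+\frac{\varepsilon}{16})v_j$, $\varepsilon_j = \frac{\varepsilon}{16}$

Observation: $v_{j+1} - v_j = v_j\varepsilon_j$

Assume $\mu_r = v_j$.
• $D_{j-2}$ must return HIGH
• $v_{j+2} - v_j = v_j\frac{\varepsilon}{16} + v_{j+1}\frac{\varepsilon}{16} = v_j\frac{\varepsilon}{16}\left[1 + (1+\frac{\varepsilon}{16})\right] > 2v_j\frac{\varepsilon}{16} > v_j\frac{\varepsilon}{16}(1+\frac{\varepsilon}{16})^2 = v_{j+2}\varepsilon_{j+2}$, so $D_{j+2}$ must return LOW

[Figure: $D_{j-2}$, $D_{j-1}$, $D_j$, $D_{j+1}$, $D_{j+2}$, with "?" at $D_{j-1}$, $D_j$ and $D_{j+1}$]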

Page 40: Approximating the Depth via Sampling and Emptiness

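“Heavy” depth

$v_{j+1} \ge \alpha_r \ge v_{j-2}$

If we choose the data structure just before the changeover:

$\mu_r = v_j \ge \alpha_r \ge v_{j-3} \ge v_j(1 - 3\frac{\varepsilon}{16}) \ge v_j(1-\varepsilon)$

[Figure: $D_{j-2}$, $D_{j-1}$, $D_j$, $D_{j+1}$, $D_{j+2}$, with "?" at $D_{j-1}$, $D_j$ and $D_{j+1}$]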

Page 41: Approximating the Depth via Sampling and Emptiness

Improved data structure
• Treat $D_1,\dots,D_W$ as a linked list $L_M$, where $M = \lceil \log W \rceil = O(\log(\varepsilon^{-1}\log n))$.
• Build a data structure where $L_{i-1}$ is formed from $L_i$ by picking every other element
• The base list $L_1$ has 4-8 elements

[Figure: lists L1, L2, L3, L4]

Page 42: Approximating the Depth via Sampling and Emptiness

Answering a query
• Search top-down, starting from $L_1$
• At the i-th stage, maintain pointers to four consecutive DS in $L_i$, such that the left two return HIGH and the right two return LOW.
• The corresponding portion of $L_{i+1}$ delimited by these two DS in $L_i$ is a sublist of at most seven DS in $L_{i+1}$.
• We query at most three new DS to maintain the sublist.
• Key observation: at each level we use $\delta_i$ as large as possible, such that the error intervals of all these data structures are disjoint
• Query time: $O(\varepsilon^{-2}Q(n)\log n)$

Page 43: Approximating the Depth via Sampling and Emptiness

Answering a query

[Figure: lists L1, L2, L3, L4]

Page 44: Approximating the Depth via Sampling and Emptiness

Answering a query

[Figure: lists L1, L2, L3, L4]

Page 45: Approximating the Depth via Sampling and Emptiness

Answering a query

[Figure: lists L1, L2, L3, L4]

Page 46: Approximating the Depth via Sampling and Emptiness

Answering a query

[Figure: lists L1, L2, L3, L4, with the result $\alpha_r$ marked]

Page 47: Approximating the Depth via Sampling and Emptiness

Answering a query
• $\delta_1 = n^{1/4}$
• During the coarse search
o O(log log n) levels
o $\delta_i = O(\sqrt{\delta_{i-1}})$
• During the refine search
o $O(\log(\varepsilon^{-1}))$ levels
o $\delta'_j = 1/2^j$

Page 48: Approximating the Depth via Sampling and Emptiness

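Coarse search

After $O(\log(\varepsilon^{-1}))$ levels all "light" depth DS disappear.

In the i-th level:

$v_j(1+\delta_i) < v_{j+1}/(1+\delta_i)$

$(1+\delta_i)^2 < v_{j+1}/v_j = C_i = (1+\varepsilon/16)^{2^i}$

$1 + \delta_i = \sqrt{C_i}$

[Figure: values $V_j$, $V_j(1+\varepsilon/16)$, $V_j(1+\varepsilon/16)^2$ on one level and $V_j$, $V_j(1+\varepsilon/16)^2$ on the level above]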

Page 49: Approximating the Depth via Sampling and Emptiness

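Coarse search

As long as $\delta_i > c_5$, $\sqrt{C_i} \gg 1$. Therefore: $\delta_i \approx \sqrt{C_i}$

At the first level we have only 4 elements. Therefore $C_1 = n^{1/2}$, $\delta_1 = n^{1/4}$.

Since $C_{i-1} = C_i^2$ and $\delta_i \approx \sqrt{C_i}$, $\delta_i = O(\sqrt{\delta_{i-1}})$

[Figure: values $V_j$, $V_j(1+\varepsilon/16)$, $V_j(1+\varepsilon/16)^2$ on one level and $V_j$, $V_j(1+\varepsilon/16)^2$ on the level above]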

Page 50: Approximating the Depth via Sampling and Emptiness

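Coarse search

$C_1 = n^{1/2}$, $C_2 = n^{1/4}$, $C_3 = n^{1/8}$, $C_4 = n^{1/16}$, …

$\delta_1 = n^{1/4}$, $\delta_2 = n^{1/8}$, $\delta_3 = n^{1/16}$, $\delta_4 = n^{1/32}$, …

There are O(log log n) levels, since $n^{1/4} = 2^{(\log n)/4}$ becomes constant after O(log log n) square-root operations.

[Figure: values $V_j$, $V_j(1+\varepsilon/16)$, $V_j(1+\varepsilon/16)^2$ on one level and $V_j$, $V_j(1+\varepsilon/16)^2$ on the level above]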

Page 51: Approximating the Depth via Sampling and Emptiness

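Refine search – “Light” depth

$\delta(v_j+2d) + \delta(v_j+d) < d$

$\delta(v_j+d) + \delta v_j < d$

Therefore: $2\delta(v_j+2d) + 2\delta v_j = \delta(v_j+2d) + \delta(v_j+d) + \delta(v_j+d) + \delta v_j < 2d$

$\delta_{i-1} = 2\delta_i$

[Figure: values $V_j$, $V_j + d$, $V_j + 2d$ with gaps d and d on one level, and $V_j$, $V_j + 2d$ with gap 2d on the level above]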

Page 52: Approximating the Depth via Sampling and Emptiness

Refine search – “Heavy” depth

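Observation 2: for $k \le \log(\varepsilon^{-1})$, $(1+\varepsilon/16)^{2^k} \approx 1 + 2^k\varepsilon/16$.

Proof: $(1+\varepsilon/16)^{2^k} = 1 + \sum_{i=1}^{2^k} \binom{2^k}{i}(\varepsilon/16)^i$

$\binom{2^k}{i+1}(\varepsilon/16)^{i+1} = \binom{2^k}{i}(\varepsilon/16)^{i}\,\frac{\varepsilon(2^k - i)}{16(i+1)} \le \frac{1}{16}\binom{2^k}{i}(\varepsilon/16)^{i}$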

Page 53: Approximating the Depth via Sampling and Emptiness

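Refine search – “Heavy” depth

$\delta v_j(1+\varepsilon/16)^{2^k} + \delta v_j(1+\varepsilon/16)^{2^{k-1}} < d_2$

$\delta v_j(1+\varepsilon/16)^{2^{k-1}} + \delta v_j < d_1$

$k \le \log(\varepsilon^{-1})$. Applying Observation 2:

$2\delta v_j(1+\varepsilon/16)^{2^k} + 2\delta v_j \approx \delta v_j(1+\varepsilon/16)^{2^k} + \delta v_j(1+2^k\varepsilon/16) + 2\delta v_j$

$= \delta v_j(1+\varepsilon/16)^{2^k} + 2\delta v_j(1+2^{k-1}\varepsilon/16) + \delta v_j \approx \delta v_j(1+\varepsilon/16)^{2^k} + 2\delta v_j(1+\varepsilon/16)^{2^{k-1}} + \delta v_j < d_1 + d_2$

$\delta_{i-1} = 2\delta_i$

[Figure: values $V_j$, $V_j(1+\varepsilon/16)^{2^{k-1}}$, $V_j(1+\varepsilon/16)^{2^k}$ with gaps $d_1$ and $d_2$ on one level, and $V_j$, $V_j(1+\varepsilon/16)^{2^k}$ with gap $d_1 + d_2$ on the level above]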

Page 54: Approximating the Depth via Sampling and Emptiness

Complexity analysis – coarse search

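During the coarse search $\delta_i = O(\sqrt{\delta_{i-1}})$, and Lemma 3 bounds the number of emptiness queries. Therefore:

$\sum_{i:\,\delta_i > c_5} O\left(\frac{\log n}{\log \delta_i}\right)$

$= \frac{\log n}{\log n^{1/4}} + \frac{\log n}{\log n^{1/8}} + \frac{\log n}{\log n^{1/16}} + \dots + \frac{\log n}{\log n^{2^{-O(\log\log n)}}}$

$= \frac{\log n}{\log 2^{(\log n)/4}} + \frac{\log n}{\log 2^{(\log n)/8}} + \frac{\log n}{\log 2^{(\log n)/16}} + \dots + \frac{\log n}{\log 2^{O(1)}}$

= O(log n), because the terms 4, 8, 16, … form a geometric series dominated by its last term.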
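Both the number of coarse levels and the total query count can be checked numerically. The sketch below assumes a placeholder value `c5 = 3.0` for the constant of Lemma 3 and takes $c_6 = 1$; neither value comes from the slides.

```python
import math


def coarse_search_cost(n, c5=3.0):
    """Return (levels, total) for delta_1 = n^(1/4), delta_i = sqrt(delta_{i-1})."""
    delta = math.sqrt(math.sqrt(n))
    levels, total = 0, 0.0
    while delta > c5:
        levels += 1
        # Cost of one level by Lemma 3, with the constant c6 taken as 1.
        total += math.log(n) / math.log(delta)
        delta = math.sqrt(delta)
    return levels, total


print(coarse_search_cost(2 ** 64))
```

For n = 2^64 the per-level costs are 4, 8, 16 and 32, so the total stays within a constant factor of log n even though the last level alone already costs about half of it.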

Page 55: Approximating the Depth via Sampling and Emptiness

Complexity analysis – refine search

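During the refine search $\delta'_j = 1/2^j$, and Lemma 1 bounds the number of emptiness queries. Therefore:

$\sum_{j=2}^{O(\log\frac{1}{\varepsilon})} O\left(\frac{\log n}{{\delta'_j}^{2}}\right) = \sum_{j=2}^{O(\log\frac{1}{\varepsilon})} O(2^{2j}\log n)$

$= O(\varepsilon^{-2}\log n)$.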
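The last equality is a geometric sum. A short derivation follows, assuming the refine search has exactly $J = \lceil \log_2(1/\varepsilon) \rceil$ levels; the slides only state $O(\log(1/\varepsilon))$, so this J and the resulting constant 16/3 are illustrative.

```latex
\sum_{j=2}^{J} 2^{2j}\log n
  = \log n \cdot \frac{4^{J+1}-16}{3}
  \le \frac{4}{3}\,4^{J}\log n ,
\qquad
4^{J} \le 4\cdot 4^{\log_2(1/\varepsilon)} = 4\,\varepsilon^{-2}
\;\Longrightarrow\;
\sum_{j=2}^{J} 2^{2j}\log n \le \frac{16}{3}\,\varepsilon^{-2}\log n
  = O(\varepsilon^{-2}\log n).
```

As in the coarse search, the finest level dominates the cost.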

Page 56: Approximating the Depth via Sampling and Emptiness

Summary
• Query time: $O(\varepsilon^{-2}Q(n)\log n)$
• Space: $O(S(n)\,\varepsilon^{-3}\log^2 n)$
• Construction time: $O(T(n)\,\varepsilon^{-3}\log^2 n)$
• In some cases we can reduce the space and construction time requirements. Intuition:
o S(2n/z), T(2n/z)
o Degree λ: $S(n/i) = O(S(n)/i^{\lambda})$

Page 57: Approximating the Depth via Sampling and Emptiness

Applications - Halfplane

Emptiness queries can be answered in logarithmic time when d = 2,3.

• S(n) = O(n), T(n) = O(n log n), Q(n) = O(log n).
• Approximating counting queries
o Query time: $O(\varepsilon^{-2}\log^2 n)$
o Space: $O(n\varepsilon^{-2}\log n)$
o Construction time: $O(n\varepsilon^{-2}\log^2 n)$

Page 58: Approximating the Depth via Sampling and Emptiness

Applications - Disks
• Use the standard lifting of points in $R^2$ to the paraboloid in $R^3$ (it maps balls in $R^d$ to hyperplanes in $R^{d+1}$ and points in $R^d$ to points on the standard paraboloid in $R^{d+1}$).
• A disk range query in the plane reduces to a halfspace range query in three dimensions.
• Similar results for disk range counting
o Query time: $O(\varepsilon^{-2}\log^2 n)$
o Space: $O(n\varepsilon^{-2}\log n)$
o Construction time: $O(n\varepsilon^{-2}\log^2 n)$

Page 59: Approximating the Depth via Sampling and Emptiness

Applications – Depth queries

Pseudo-disks: a set of objects is a collection of pseudo-disks if the boundaries of every pair of them intersect at most twice.

Page 60: Approximating the Depth via Sampling and Emptiness

Applications – Depth queries
• Compute the union of n pseudo-disks in the plane and preprocess the union for point-location queries
• Emptiness queries can be answered in logarithmic time.
• S(n) = O(n), T(n) = O(n log n), Q(n) = O(log n).
• Approximating counting queries
o Query time: $O(\varepsilon^{-2}\log^2 n)$
o Space: $O(n\varepsilon^{-2}\log n)$
o Construction time: $O(n\varepsilon^{-2}\log^2 n)$

Page 61: Approximating the Depth via Sampling and Emptiness

Relative approximation via sampling

• Approximate depth(r,S) for a query point r.
• Use a single sample.
• Sample each object with probability p into a random sample R.
• If r has sufficiently large depth, then its depth can be estimated reliably by depth(r,R)/p.
• The deeper r is, the better this estimate is.
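The single-sample estimator is short enough to write out. In the sketch below the objects and the predicate `intersects` are placeholders for a real geometric setting, and the function name is illustrative.

```python
import random


def estimate_depth(objects, intersects, p, rng):
    # Keep each object independently with probability p ...
    sample = [o for o in objects if rng.random() < p]
    # ... and scale the depth measured in the sample back up by 1/p.
    return sum(1 for o in sample if intersects(o)) / p


rng = random.Random(7)
# 20000 objects, of which the query "intersects" the first 10000.
print(estimate_depth(range(20000), lambda o: o < 10000, 0.276, rng))
```

The estimate is unbiased for any depth; what improves with depth is its relative error.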

Page 62: Approximating the Depth via Sampling and Emptiness

Lemma 4 – reliable sampling
• Let S be a set of n objects, 0 < ε < 1/2, and let r be a point of depth u ≥ k in S.
• Let R be a random sample of S, such that every element is picked with probability $p = \frac{8}{k\varepsilon^2}\ln\frac{1}{\delta}$.
• Let X be the depth of r in R (accurate depth).
• X/p lies in the interval [(1-ε)u, (1+ε)u]. This estimate succeeds with probability $\ge 1 - \delta^{u/k} \ge 1 - \delta$.

Page 63: Approximating the Depth via Sampling and Emptiness

Lemma 4 – example

u = 10000, k = 5000, ε = 0.2, δ = 0.001.

$p = \frac{8}{5000 \cdot 0.2^2}\ln\frac{1}{0.001} = 0.276$

X/p ∈ [8000, 12000]. Pr[success] ≥ 1 – δ = 0.999
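The numbers in this example can be reproduced directly from the formula of Lemma 4. In the sketch below the function name is illustrative, and capping p at 1 is an addition made here because p is used as a probability.

```python
import math


def sampling_probability(k, eps, delta):
    # p = 8 / (k * eps^2) * ln(1 / delta), capped at 1.
    return min(1.0, 8.0 / (k * eps ** 2) * math.log(1.0 / delta))


u, k, eps, delta = 10000, 5000, 0.2, 0.001
p = sampling_probability(k, eps, delta)
low, high = (1 - eps) * u, (1 + eps) * u
print(round(p, 3), low, high)
```

Since p scales as 1/k, doubling the depth threshold k halves the expected sample size.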

Page 64: Approximating the Depth via Sampling and Emptiness

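Lemma 4 – proof

μ = E[X] = pu. By Chernoff's inequality:

$\Pr[X \notin [(1-\varepsilon)\mu, (1+\varepsilon)\mu]] = \Pr[X < (1-\varepsilon)\mu] + \Pr[X > (1+\varepsilon)\mu] \le \exp(-pu\varepsilon^2/2) + \exp(-pu\varepsilon^2/4)$

$\le \exp\left(-4\frac{u}{k}\ln\frac{1}{\delta}\right) + \exp\left(-2\frac{u}{k}\ln\frac{1}{\delta}\right) \le \delta^{u/k}$,

since u ≥ k.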

Page 65: Approximating the Depth via Sampling and Emptiness

Lemma 4 – conclusions
• If depth(r,S) is (say) u ≤ 10k, then $\mathrm{depth}(r,R) \le (1+\varepsilon)pu = O\left(\frac{1}{\varepsilon^2}\ln\frac{1}{\delta}\right)$.
• This is a (relatively) small number.
• Therefore, via sampling, we turned the task of estimating the depth of a heavy range into the task of estimating the depth of a shallow range.
• We can perform a binary search for the depth of r using a sequence of coarser to finer samples.