Approximating the Depth via Sampling and Emptiness
Transcript of Approximating the Depth via Sampling and Emptiness
Lecture: Adi Vardi
Depth(r,S) – Given a set S of n objects and a query range r, let depth(r,S) be the number of objects of S intersected by r.
Depth(r,S) = 3
Range counting queries – preprocessing a set S of n objects and a class of ranges, so that given a range r, one can quickly report the number of objects in S intersecting r.
Example: Range tree
• S = set of points in the plane
• r = query rectangle
Range tree – primary tree
• Balanced binary search tree
• Ordered by x-coordinate
• Each point is stored at a leaf node
Range tree query
• Given interval [a,b], search for a and b
• Find where the paths split, and look at the subtrees hanging off the search paths
Range tree complexity
• Query time: O(log^(d−1) n)
• Space: O(n·log^(d−1) n)
• Construction time: O(n·log^(d−1) n)
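To make the counting/emptiness query notions concrete, here is a minimal 1-D sketch (not the full range tree the slides describe for the plane): with the points kept in a sorted array, both counting and emptiness for an interval [a, b] take O(log n) time via binary search.

```python
import bisect

def build(points):
    # the 1-D "range tree" degenerates to a sorted array
    return sorted(points)

def range_count(tree, a, b):
    # number of points p with a <= p <= b
    return bisect.bisect_right(tree, b) - bisect.bisect_left(tree, a)

def range_empty(tree, a, b):
    return range_count(tree, a, b) == 0

tree = build([7, 1, 4, 9, 3, 12])
print(range_count(tree, 3, 9))    # points 3, 4, 7, 9 -> 4
print(range_empty(tree, 10, 11))  # no point in [10, 11] -> True
```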
Range emptiness queries – preprocessing a set S of objects and a class of ranges, so that given a range r, one can quickly report whether r intersects any object of S.
Emptiness(r,S) = not_empty
Emptiness(r,S) = empty
Sampling - selection of a subset of objects from a set S to estimate characteristics of S
Approximate – Let μr = depth(r,S) denote the depth of r. For a prespecified ε > 0 the data structure outputs a number αr such that (1 - ε)μr ≤ αr ≤ μr
Example: μr = 20, ε = 0.1
o Valid αr: 18, 19, 20
o Invalid αr: 17, 21
Motivation: counting queries are much harder than emptiness queries.
Example: halfspace queries in 2 and 3 dimensions
o Counting queries can only be answered in polynomial time
o Emptiness queries can be answered in logarithmic time
Goal: answer approximate range counting queries using polylogarithmic emptiness queries.
Halfplane emptiness
Preprocessing: find the convex hull.
Query:
o Rotate the polygon
o Find the upper and lower envelopes – binary search
o Find extreme points – binary search
o Orientation test
Idea 1
• Create M independent samples R1,…,RM of S
• Each sample is formed by picking each element of S with probability p
• Xi = 1 if r intersects any object of Ri, Xi = 0 otherwise
• Yr = Σi Xi
• αr = f(Yr, M, p)
• Problems?
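A small simulation of Idea 1, with an illustrative choice of the unspecified function f (inverting the hit probability E[Yr/M] = 1 − (1 − p)^μ); the M Bernoulli "hit" events are simulated directly rather than via explicit samples.

```python
import math
import random

random.seed(1)

def estimate_depth(depth, p, M):
    # Each sample Ri contains an object hit by r with probability
    # 1 - (1-p)^depth; simulate these M independent hit events.
    Y = sum(random.random() < 1 - (1 - p) ** depth for _ in range(M))
    if Y == M:                      # every sample hit: the estimate saturates
        return float("inf")
    # one natural f(Yr, M, p): solve E[Yr/M] = 1 - (1-p)^mu for mu
    return math.log(1 - Y / M) / math.log(1 - p)

# true depth 50, p = 0.01, M = 20000 samples
print(estimate_depth(50, 0.01, 20000))  # close to 50
```

With p badly mismatched to the true depth (the "problems" discussed next), Y/M crowds toward 0 or 1 and the inversion becomes unreliable.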
Idea 1 – problems
Number of samples M = M(ε, n) (the number of "Bernoulli trials")
Probability to pick an object: p = p(μr)
A wrong p might yield an invalid result:
o "Light" depth requires a large p
o "Heavy" depth requires a small p
How to find an appropriate p?
Idea 1 – problems
"Heavy" depth
o n = 1000, μr = 997, ε → 0
o Sample size must be less than 4 (otherwise none of the samples will be "empty")
"Light" depth
o n = 1000, μr = 3
o Sample size must be very large (otherwise we won't be able to catch the "non-empty" objects)
Idea 2
Guess a starting value z = depth(r,S).
Probability to pick an object: p = 1/z.
Probability that a range r of depth k intersects Ri:
pz(k) = 1 − (1 − 1/z)^k
If r has depth z, then Δ = E[Yr] = M·pz(z).
If Yr > Δ then μr > z, otherwise μr ≤ z.
Perform a binary search on [0, n].
Problems?
Idea 2 – problems
Only the expectation of Yr is M·pz(z).
The decision μr < z might be mistaken.
How can we overcome mistakes and guarantee (1 − ε)μr ≤ αr ≤ μr?
Halfspace complexity status
Emptiness query – logarithmic time
Binary search – logarithmic time (probably)
Number of samples M = M(ε, n) – logarithmic?
The decision procedure
Given parameters z ∈ [0, n] and 1/2 > ε > 0, we construct a data structure, such that for any δ with 1/2 > δ ≥ ε, and a query range r, we can decide with high probability whether μr < z or μr ≥ z.
The data structure is allowed to make a mistake if μr ∈ [(1 − δ)z, (1 + δ)z].
Why δ? A trade-off between query time and accuracy: use a large δ when the binary-search range is large, and a small δ when the range is small.
The data structure
Let R1,…,RM be M independent random samples of S, formed by picking every element with probability 1/z, where M(ε) = ⌈c3·ε^(−2)·log n⌉ and c3 is a sufficiently large absolute constant.
Build M separate emptiness-query data structures D1,…,DM, respectively, and put D = D(z,ε) := {D1,…,DM}.
Answering a query
For i = 1,…,M(δ): Xi = 1 if r intersects any object of Ri, Xi = 0 otherwise.
Yr = Σi Xi
Probability that a range r of depth k intersects Ri: pz(k) = 1 − (1 − 1/z)^k.
If r has depth z, then Δ = E[Yr] = M·pz(z).
Our data structure returns "depth(r,S) < z" if Yr < Δ, and "depth(r,S) ≥ z" otherwise.
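A sketch of this decision procedure, assuming objects are plain indices and the emptiness query against a sample Ri reduces to a set-disjointness test; the sizes n, z, M below are illustrative, not the ⌈c3·ε^(−2)·log n⌉ of the construction.

```python
import random

random.seed(0)

n, z, M = 1000, 100, 400   # illustrative sizes, not the lemma's constants

def build_structure():
    # M independent samples of {0,...,n-1}; each object is picked w.p. 1/z
    return [{i for i in range(n) if random.random() < 1 / z} for _ in range(M)]

def decide(samples, hit_set):
    # hit_set = objects intersected by r, so depth(r,S) = len(hit_set);
    # the "emptiness query" against Ri is a disjointness test in this model
    Y = sum(1 for Ri in samples if Ri & hit_set)
    delta = len(samples) * (1 - (1 - 1 / z) ** z)  # E[Yr] at depth exactly z
    return "depth >= z" if Y >= delta else "depth < z"

D = build_structure()
print(decide(D, set(range(300))))  # true depth 300, well above z
print(decide(D, set(range(20))))   # true depth 20, well below z
```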
Lemma 1
Pr[Yr > Δ | depth(r,S) ≤ (1 − δ)z] does not exceed n^(−c4), where c4 = c4(c3) > 0 depends only on c3 and can be made arbitrarily large by choosing a sufficiently large c3 > 0.
Reminder – Chernoff bound
Let X1,…,Xn be n independent Bernoulli trials with Pr[Xi = 1] = pi and Pr[Xi = 0] = qi = 1 − pi.
Let X = Σ_{i=1}^{n} Xi and μ = E[X].
For any δ > 0:
o Pr[X ≥ (1+δ)μ] ≤ (e^δ / (1+δ)^(1+δ))^μ
o Pr[X < (1−δ)μ] ≤ exp(−μδ²/2)
For any δ ≤ 2e − 1:
o Pr[X ≥ (1+δ)μ] ≤ exp(−μδ²/4)
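The lower-tail bound can be sanity-checked empirically; the sketch below estimates Pr[X < (1 − δ)μ] by simulation and compares it with exp(−μδ²/2) (the parameters are arbitrary).

```python
import math
import random

random.seed(2)

# X = sum of n Bernoulli(p) trials; compare the empirical lower tail
# Pr[X < (1 - delta) * mu] against the Chernoff bound exp(-mu * delta^2 / 2).
n, p, delta, trials = 500, 0.3, 0.2, 2000
mu = n * p
bound = math.exp(-mu * delta ** 2 / 2)

hits = sum(
    sum(random.random() < p for _ in range(n)) < (1 - delta) * mu
    for _ in range(trials)
)
empirical = hits / trials
print(empirical, "<=", bound)  # the bound should hold comfortably
```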
Lemma 1 – proof
Observation 1: for 0 ≤ x ≤ y < 1, (1−x)/(1−y) = 1 + (y−x)/(1−y) ≥ 1 + y − x.
The probability α is maximized when depth(r,S) is as large as allowed, and since δ ≥ ε it suffices to bound α ≤ Pr[Yr > Δ | depth(r,S) = (1−ε)z].
Then Pr[Xi = 1] = pz((1−ε)z), and
E[Yr] = M·pz((1−ε)z) = M·[1 − (1 − 1/z)^((1−ε)z)] ≥ M·(1 − e^(−(1−ε))) ≥ M/3,
since (1 − 1/z)^z ≤ e^(−1) and ε ≤ 1/2.
By Observation 1, with y = (1 − 1/z)^((1−ε)z) and x = (1 − 1/z)^z:
ξ = Δ/E[Yr] = M·[1 − (1 − 1/z)^z] / M·[1 − (1 − 1/z)^((1−ε)z)]
≥ 1 + (1 − 1/z)^((1−ε)z) − (1 − 1/z)^z = 1 + (1 − 1/z)^((1−ε)z)·[1 − (1 − 1/z)^(εz)]
Lemma 1 – proof (cont.)
ξ = Δ/E[Yr] ≥ 1 + (1 − 1/z)^((1−ε)z)·[1 − (1 − 1/z)^(εz)]
Applying exp(−2x) ≤ 1 − x (for 0 ≤ x ≤ 1/2) and 1 + x ≤ exp(x):
ξ ≥ 1 + exp(−(2/z)·(1−ε)z)·[1 − exp(−(1/z)·εz)] ≥ 1 + (1/e²)·[1 − exp(−ε)]
≥ 1 + (1/e²)·[1 − (1 − ε/2)] = 1 + ε/(2e²) ≥ 1 + ε/15
Deploying the Chernoff inequality:
α = Pr[Yr > Δ] ≤ Pr[Yr > ξ·E[Yr]] ≤ Pr[Yr > (1 + ε/15)·E[Yr]]
≤ exp(−E[Yr]·(1/4)·(ε/15)²) ≤ exp(−Mε²/c) = exp(−ε²·⌈c3·ε^(−2)·log n⌉/c) ≤ n^(−c4)
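The key inequality ξ ≥ 1 + ε/15 can be checked numerically straight from its definition (the common factor M cancels):

```python
def xi(z, eps):
    # xi = Delta / E[Yr] from the proof; the factor M cancels
    return (1 - (1 - 1 / z) ** z) / (1 - (1 - 1 / z) ** ((1 - eps) * z))

for z in (10, 100, 10_000):
    for eps in (0.05, 0.1, 0.4):
        assert xi(z, eps) >= 1 + eps / 15
print("xi >= 1 + eps/15 on all tested (z, eps)")
```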
Lemma 2
Given a set S of n objects, a parameter 0 < ε < 1/2 and z ∈ [0, n], one can construct a data structure D(z) which, given a range r and a parameter 1/2 > δ ≥ ε, returns either LOW or HIGH.
If it returns LOW, then μr ≤ (1+δ)z, and if it returns HIGH, then μr ≥ (1−δ)z.
The data structure might return either answer if μr ∈ [(1−δ)z, (1+δ)z].
The data structure D consists of M = O(ε^(−2)·log n) emptiness data structures.
Lemma 2 – cont.
The space needed is O(S(2n/z)·ε^(−2)·log n), where S(m) is the space needed for a single emptiness data structure storing m objects.
The query time is O(Q(2n/z)·δ^(−2)·log n), where Q(m) is the time needed for a single query in such a structure storing m objects.
All bounds hold with high probability.
Proof?
Lemma 3
Given the data structure of Lemma 2, z, and δ > c5 (where c5 is a sufficiently large constant), one can decide for a query range r whether μr < z/(1+δ) or μr ≥ z(1+δ).
The data structure is allowed to return any answer if μr ∈ [z/(1+δ), z(1+δ)].
This requires M = ⌈c6·(log n)/ln δ⌉ emptiness queries, and the answer returned is correct with high probability, where c6 is an appropriate absolute constant.
Proof?
Range counting data structure
"Light" depth values:
o Build a separate data structure Di = D(vi, εi) of Lemma 2
o vi = i/2, εi = 1/(8i)
o i = 1, 2, …, U = O(ε^(−1))
"Heavy" depth values:
o Build a separate data structure Dj = D(vj, εj) of Lemma 2
o vj = (U/4)·(1 + ε/16)^j, εj = ε/16
o j = U+1, U+2, …, W, where W := c·log_{1+ε/16} n = O(ε^(−1)·log n) and vW ≈ n
Range counting data structure
Note: log_{1+ε/16} n = (log n)/log(1 + ε/16), and for 0 < ε ≤ 1 we have 50·log(1 + ε/16) ≥ ε, hence W = O(ε^(−1)·log n).
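The two families of scales can be sketched directly; the constant in U = ⌈4/ε⌉ and the starting value of the heavy range are illustrative choices, not the exact constants of the construction.

```python
import math

def build_scales(n, eps):
    # "Light" scales: v_i = i/2 with eps_i = 1/(8i); U = O(1/eps)
    U = math.ceil(4 / eps)          # the constant 4 is illustrative
    light = [(i / 2, 1 / (8 * i)) for i in range(1, U + 1)]
    # "Heavy" scales: geometric with ratio (1 + eps/16), up to n
    heavy, v = [], U / 4
    while v < n:
        v *= 1 + eps / 16
        heavy.append((v, eps / 16))
    return light, heavy

light, heavy = build_scales(n=10 ** 6, eps=0.25)
print(len(light), len(heavy))   # O(1/eps) light scales, O(log(n)/eps) heavy ones
```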
Answering a query
Given a query range r, each data structure in our list returns LOW or HIGH.
If we were to query all the data structures, we would get a sequence of HIGHs, followed by a sequence of LOWs.
The value associated with the last data structure returning HIGH (rounded to the nearest integer) yields the required approximation.
We can use binary search on D1,…,DW to locate this changeover value using a total of O(log W) = O(log(ε^(−1)·log n)) queries.
Overall time: O(Q(n)·ε^(−2)·(log n)·log(ε^(−1)·log n)).
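The changeover can be located exactly as described: binary search over a monotone HIGH…LOW sequence. The sketch below uses a callback oracle in place of the real data structures Di and counts how many are actually queried.

```python
def find_changeover(query, W):
    # query(i) -> "HIGH" or "LOW"; answers are monotone (HIGHs, then LOWs),
    # so O(log W) probes locate the last structure answering HIGH
    lo, hi = -1, W              # lo: last known HIGH, hi: first known LOW
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if query(mid) == "HIGH":
            lo = mid
        else:
            hi = mid
    return lo

calls = []
answers = ["HIGH"] * 7 + ["LOW"] * 5
oracle = lambda i: (calls.append(i), answers[i])[1]
idx = find_changeover(oracle, len(answers))
print(idx, len(calls))  # changeover at index 6, found with 4 probes
```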
“Light” depth
Key observation: error ≤ U·ε = c·ε^(−1)·ε = c. A small error range.
Example: μr = 3, ε = 0.1, error = 0.3. αr must be equal to μr!
Assume μr = x.
D2x might return HIGH or LOW.
v2x − v2x−1 = 1/2 > 1/16 = ((2x−1)/2)·(1/(8(2x−1))) = v2x−1·ε2x−1,
so D2x−1 must return HIGH.
v2x+1 − v2x = 1/2 > 1/16 = ((2x+1)/2)·(1/(8(2x+1))) = v2x+1·ε2x+1,
so D2x+1 must return LOW.
“Light” depth – option 1
D2x = HIGH ⇒ αr = v2x = x = μr
“Light” depth – option 2
D2x = LOW ⇒ αr = ⌈v2x−1⌉ = ⌈x − 1/2⌉ = x = μr
“Heavy” depth
vj+1 = (1 + ε/16)·vj, εj = ε/16
Observation: vj+1 − vj = vj·εj
Assume μr = vj. Then Dj−2 must return HIGH, and
vj+2 − vj = vj·(ε/16) + vj+1·(ε/16) = vj·(ε/16)·[1 + (1 + ε/16)] > 2·vj·(ε/16)
> vj·(ε/16)·(1 + ε/16)² = vj+2·εj+2,
so Dj+2 must return LOW. Dj−1, Dj and Dj+1 may return either answer.
“Heavy” depth
vj+1 ≥ αr ≥ vj−2. If we choose the data structure previous to the changeover:
μr = vj ≥ αr ≥ vj−3 ≥ vj·(1 − 3ε/16) ≥ vj·(1 − ε)
Improved data structure
Treat D1,…,DW as a linked list LM, where M = ⌈log W⌉ = O(log(ε^(−1)·log n)).
Build a hierarchy of lists in which Li−1 is formed from Li by picking every other element.
The base list L1 has 4–8 elements.
Answering a query
Search top-down, starting from L1.
At the i-th stage, maintain pointers to four consecutive data structures in Li, such that the left two return HIGH and the right two return LOW.
The portion of Li+1 delimited by these two data structures in Li is a sublist of at most seven data structures in Li+1.
We query at most three new data structures to maintain the sublist.
Key observation: at each level we use the largest δi such that the error intervals of all these data structures are disjoint.
Query time: O(ε^(−2)·Q(n)·log n)
Answering a query
δ1 = n^(1/4).
During the coarse search:
o O(log log n) levels
o δi = O(√δi−1)
During the refine search:
o O(log(ε^(−1))) levels
o δ′j = 1/2^j
Coarse search
After O(log(ε^(−1))) levels, all "light"-depth data structures disappear.
At the i-th level we need
vj·(1 + δi) < vj+1/(1 + δi), i.e. (1 + δi)² < vj+1/vj = Ci = (1 + ε/16)^(2^i),
so 1 + δi = √Ci.
Coarse search
As long as δi > c5, √Ci ≫ 1, and therefore δi ≈ √Ci.
At the first level there are only 4 elements, therefore C1 = n^(1/2) and δ1 = n^(1/4).
Since Ci−1 = Ci² and δi ≈ √Ci, we get δi = O(√δi−1).
Coarse search
C1 = n^(1/2), C2 = n^(1/4), C3 = n^(1/8), C4 = n^(1/16), …
δ1 = n^(1/4), δ2 = n^(1/8), δ3 = n^(1/16), δ4 = n^(1/32), …
There are O(log log n) levels, since n^(1/4) = 2^((log n)/4) becomes constant after O(log log n) square-root operations.
Refine search – “light” depth
Suppose δ·(vj + 2d) + δ·(vj + d) < d and δ·(vj + d) + δ·vj < d, i.e. the error intervals at spacing d are disjoint. Then
2δ·(vj + 2d) + 2δ·vj = δ·(vj + 2d) + δ·(vj + d) + δ·(vj + d) + δ·vj < 2d,
so at spacing 2d the doubled parameter still keeps the intervals disjoint: δi−1 = 2δi.
Refine search – “heavy” depth
Observation 2: for k ≤ log(ε^(−1)), (1 + ε/16)^(2^k) ≈ 1 + 2^k·ε/16.
Proof: (1 + ε/16)^(2^k) = 1 + Σ_{i=1}^{2^k} C(2^k, i)·(ε/16)^i, and
C(2^k, i+1)·(ε/16)^(i+1) = C(2^k, i)·(ε/16)^i · ε·(2^k − i)/(16·(i+1)) ≤ (1/16)·C(2^k, i)·(ε/16)^i,
since ε·2^k ≤ 1; so the terms decrease geometrically and the first term dominates.
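Observation 2 is easy to check numerically; the sketch below compares (1 + ε/16)^(2^k) with 1 + 2^k·ε/16 over the stated range of k.

```python
import math

eps = 0.01
for k in range(int(math.log2(1 / eps)) + 1):
    exact = (1 + eps / 16) ** (2 ** k)
    approx = 1 + (2 ** k) * eps / 16
    # the approximation error is a small fraction of exact - 1
    assert abs(exact - approx) <= 0.05 * (exact - 1)
print("Observation 2 holds on the tested range of k")
```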
Refine search – “heavy” depth
Suppose δ·vj·(1 + ε/16)^(2^k) + δ·vj·(1 + ε/16)^(2^(k−1)) < d2
and δ·vj·(1 + ε/16)^(2^(k−1)) + δ·vj < d1.
For k ≤ log(ε^(−1)), applying Observation 2:
2δ·vj·(1 + ε/16)^(2^k) + 2δ·vj ≈ δ·vj·(1 + ε/16)^(2^k) + δ·vj·(1 + 2^k·ε/16) + 2δ·vj
= δ·vj·(1 + ε/16)^(2^k) + 2δ·vj·(1 + 2^(k−1)·ε/16) + δ·vj
≈ δ·vj·(1 + ε/16)^(2^k) + 2δ·vj·(1 + ε/16)^(2^(k−1)) + δ·vj < d1 + d2,
so again δi−1 = 2δi.
Complexity analysis – coarse search
During the coarse search δi = O(√δi−1), and Lemma 3 bounds the number of emptiness queries per level by O((log n)/log δi). Therefore the total is
Σ_{i: δi > c5} O((log n)/log δi) = (log n)/((log n)/4) + (log n)/((log n)/8) + (log n)/((log n)/16) + … + (log n)/2^(O(1))
= 4 + 8 + 16 + … + O(log n) = O(log n),
since log δi = (log n)/2^(i+1), so the series is geometric and dominated by its last, O(log n), term.
Complexity analysis – refine search
During the refine search δ′j = 1/2^j, and the structure of Lemma 2 uses O((log n)/δ′j²) emptiness queries per level. Therefore the total is
Σ_{j=2}^{O(log(1/ε))} O((log n)/δ′j²) = Σ_{j=2}^{O(log(1/ε))} O(2^(2j)·log n) = O(ε^(−2)·log n).
Summary
Query time: O(ε^(−2)·Q(n)·log n)
Space: O(S(n)·ε^(−3)·log² n)
Construction time: O(T(n)·ε^(−3)·log² n)
In some cases we can reduce the space and construction time requirements. Intuition:
o the bounds really involve S(2n/z) and T(2n/z)
o degree λ: S(n/i) = O(S(n)/i^λ)
Applications – halfplanes
Emptiness queries can be answered in logarithmic time when d = 2, 3.
S(n) = O(n), T(n) = O(n·log n), Q(n) = O(log n).
Approximate counting queries:
o Query time: O(ε^(−2)·log² n)
o Space: O(n·ε^(−2)·log n)
o Construction time: O(n·ε^(−2)·log² n)
Applications – disks
Use the standard lifting of points in R² to the paraboloid in R³ (it maps balls in R^d to halfspaces in R^(d+1), and points in R^d to points on the standard paraboloid in R^(d+1)).
A disk range query in the plane thus reduces to a halfspace range query in three dimensions.
Similar results hold for disk range counting:
o Query time: O(ε^(−2)·log² n)
o Space: O(n·ε^(−2)·log n)
o Construction time: O(n·ε^(−2)·log² n)
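The lifting can be written out directly: a point (x, y) maps to (x, y, x² + y²), and membership in a disk becomes a linear (halfspace) test on the lifted point. A quick consistency check:

```python
# A disk with center (a, b) and radius r maps to the halfspace
#   z - 2a*x - 2b*y + (a^2 + b^2 - r^2) <= 0
# in lifted coordinates, where z = x^2 + y^2.
def in_disk(px, py, a, b, r):
    return (px - a) ** 2 + (py - b) ** 2 <= r ** 2

def below_lifted_plane(px, py, a, b, r):
    z = px * px + py * py              # lift the query point
    return z - 2 * a * px - 2 * b * py + (a * a + b * b - r * r) <= 0

for p in [(0, 0), (1, 2), (3, 3), (-1, 0.5)]:
    assert in_disk(*p, 1, 1, 2) == below_lifted_plane(*p, 1, 1, 2)
print("disk test agrees with lifted halfspace test")
```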
Applications – depth queries
Pseudo-disks: a set of objects is a collection of pseudo-disks if the boundaries of every pair of them intersect at most twice.
Applications – depth queries
Compute the union of n pseudo-disks in the plane and preprocess the union for point-location queries.
Emptiness queries can then be answered in logarithmic time.
S(n) = O(n), T(n) = O(n·log n), Q(n) = O(log n).
Approximate counting queries:
o Query time: O(ε^(−2)·log² n)
o Space: O(n·ε^(−2)·log n)
o Construction time: O(n·ε^(−2)·log² n)
Relative approximation via sampling
Approximate depth(r,S) for a query point r using a single sample: pick each object into a random sample R with probability p.
If r has sufficiently large depth, then its depth can be estimated reliably by depth(r,R)/p.
The deeper r is, the better this estimate is.
Lemma 4 – reliable sampling
Let S be a set of n objects, 0 < ε < 1/2, and let r be a point of depth u ≥ k in S.
Let R be a random sample of S, such that every element is picked with probability p = (8/(k·ε²))·ln(1/δ).
Let X be the depth of r in R (the exact depth). Then X/p lies in the interval [(1−ε)u, (1+ε)u], and this estimate succeeds with probability ≥ 1 − δ^(u/k) ≥ 1 − δ.
Lemma 4 – example
u = 10000, k = 5000, ε = 0.2, δ = 0.001.
p = (8/(5000·0.2²))·ln(1/0.001) ≈ 0.276.
X/p ∈ [8000, 12000] with Pr[success] ≥ 1 − δ = 0.999.
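The example can be replayed as a simulation, drawing the sampling of the u objects containing r directly (parameters as in the example above).

```python
import math
import random

random.seed(3)

u, k, eps, delta = 10_000, 5_000, 0.2, 0.001
p = 8 / (k * eps ** 2) * math.log(1 / delta)   # ~0.276, as in the example

# each of the u objects containing r enters the sample independently w.p. p
X = sum(random.random() < p for _ in range(u))
estimate = X / p
print((1 - eps) * u <= estimate <= (1 + eps) * u)  # succeeds w.p. >= 1 - delta
```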
Lemma 4 – proof
μ = E[X] = pu. By Chernoff's inequality:
Pr[X ∉ [(1−ε)μ, (1+ε)μ]] = Pr[X < (1−ε)μ] + Pr[X > (1+ε)μ] ≤ exp(−puε²/2) + exp(−puε²/4)
= exp(−4·(u/k)·ln(1/δ)) + exp(−2·(u/k)·ln(1/δ)) ≤ δ^(u/k),
since u ≥ k.
Lemma 4 – conclusions
If depth(r,S) is (say) u ≤ 10k, then depth(r,R) ≤ (1+ε)·pu = O(ε^(−2)·ln(1/δ)), which is a (relatively) small number.
Therefore, via sampling, we turned the task of estimating the depth of a heavy range into estimating the depth of a shallow range.
We can thus perform a binary search for the depth of r using a sequence of coarser-to-finer samples.