Beyond Set Disjointness: The Communication Complexity of Finding the Intersection
Grigory Yaroslavtsev, http://grigory.us
Joint with Brody, Chakrabarti, Kondapally and Woodruff
Communication Complexity [Yao’79]
Alice: $x$, Bob: $y$, $f(x, y) = ?$
Shared randomness; messages … end with $f(x, y)$
• $R(f)$ = min. communication (error $1/3$)
• $R^{(r)}(f)$ = min. $r$-round communication (error $1/3$)
Set Intersection
$x = S$, $y = T$, $f(x, y) = S \cap T$
$S \subseteq [n]$, $|S| \le k$; $T \subseteq [n]$, $|T| \le k$
$R(k\text{-Intersection})$ = ?
$k$ is big, $n$ is huge, where huge $\gg$ big
Our results
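Let $\operatorname{ilog}^{r} k = \underbrace{\log \log \cdots \log}_{r \text{ times}} k$
• $R^{(O(r))}(k\text{-Intersection}) = O(k \operatorname{ilog}^{r} k)$ [Brody, Chakrabarti, Kondapally, Woodruff, Y.; PODC’14]
• $R^{(r)}(k\text{-Intersection}) = \Omega(k \operatorname{ilog}^{r} k)$ [Saglam-Tardos FOCS’13; Brody, Chakrabarti, Kondapally, Woodruff, Y.; RANDOM’14]
$R^{(r)}(k\text{-Intersection}) = \Theta(k)$ for $r = O(\log^{*} k)$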
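To get a feel for how slowly the iterated logarithm grows, here is a small Python sketch. The function names `ilog` and `log_star` are illustrative, and base-2 logarithms are an assumption (the slides do not fix the base).

```python
import math

def ilog(r: int, k: float) -> float:
    """Apply log2 to k exactly r times, i.e. ilog^r k (base 2 assumed)."""
    value = float(k)
    for _ in range(r):
        if value <= 0:
            raise ValueError("iterated log undefined: intermediate value <= 0")
        value = math.log2(value)
    return value

def log_star(k: float) -> int:
    """Number of times log2 must be applied until the value is at most 1."""
    count = 0
    value = float(k)
    while value > 1:
        value = math.log2(value)
        count += 1
    return count
```

For example, with k = 2^16 a single logarithm gives 16, two give 4, and `log_star` is 4, so under these assumptions the factor multiplying k shrinks very quickly as rounds are added.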
Applications
• Exact Jaccard index (for $\epsilon$-approximate use MinHash [Broder’98; Li-Konig’11; Pagh-Stöckel-Woodruff’14])
• Rarity, distinct elements, joins, …
• Multi-party set intersection (later)
• Contrast:
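As a sketch of the first application: once the exact intersection is known, the Jaccard index $|S \cap T| / |S \cup T|$ follows from the two set sizes alone, since $|S \cup T| = |S| + |T| - |S \cap T|$. The helper name below is hypothetical, and it assumes the players have also exchanged $|S|$ and $|T|$.

```python
def jaccard_from_intersection(size_s: int, size_t: int, intersection: set) -> float:
    """Exact Jaccard index given |S|, |T| and the already computed S ∩ T."""
    inter = len(intersection)
    union = size_s + size_t - inter  # inclusion-exclusion
    # Convention assumed here: two empty sets have Jaccard index 1.
    return 1.0 if union == 0 else inter / union
```

Usage: for S = {1, 2, 3, 4} and T = {3, 4, 5}, calling `jaccard_from_intersection(4, 3, {3, 4})` divides 2 by 5.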
1-round $O(k \log k)$-protocol
$h : [n] \to [k^{3}]$
$S, T \subseteq [n]$ are mapped to $h(S), h(T) \subseteq [k^{3}]$
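A minimal sketch of the one-round idea, assuming a simple linear hash modulo a large prime as the shared hash function (the slides do not specify the hash family; `shared_hash` and `one_round_intersection` are hypothetical names). Alice sends the hashed set, about $3 \log k$ bits per element, and Bob keeps the elements of $T$ whose hash appears in the message.

```python
import random

PRIME = (1 << 61) - 1  # Mersenne prime, assumed larger than any element of [n]

def shared_hash(seed: int, range_size: int):
    """Hash x -> ((a*x + b) mod PRIME) mod range_size, drawn from a shared seed."""
    rng = random.Random(seed)  # stands in for shared randomness
    a = rng.randrange(1, PRIME)
    b = rng.randrange(PRIME)
    return lambda x: ((a * x + b) % PRIME) % range_size

def one_round_intersection(S: set, T: set, k: int, seed: int = 0) -> set:
    h = shared_hash(seed, k ** 3)
    message = {h(s) for s in S}               # Alice -> Bob, single round
    return {t for t in T if h(t) in message}  # Bob's output
```

The output always contains $S \cap T$; it can contain extra elements of $T$ only when two distinct elements collide under $h$, which is unlikely with a range as large as $k^3$.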
Hashing
$k / \log k$ = # of buckets
$h : [n] \to [k / \log k]$
Expected # of elements per bucket: $\log k$
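The bucketing step can be sketched as follows, again assuming a linear hash modulo a large prime and base-2 logarithms (neither is fixed by the slides; `bucketize` is a hypothetical name).

```python
import math
import random

PRIME = (1 << 61) - 1  # assumed to exceed every element of [n]

def bucketize(items, k, seed=0):
    """Split items into about k / log2(k) buckets with a shared random hash."""
    rng = random.Random(seed)                      # stands in for shared randomness
    num_buckets = max(1, round(k / math.log2(k)))  # requires k >= 2
    a, b = rng.randrange(1, PRIME), rng.randrange(PRIME)
    buckets = [[] for _ in range(num_buckets)]
    for x in items:
        buckets[((a * x + b) % PRIME) % num_buckets].append(x)
    return buckets
```

With k = 1024 this gives 102 buckets, so 1024 items land about 10 per bucket on average, matching the $\log k$ expectation.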
Secondary Hashing
$k / \log k$ = # of hash functions (one per bucket)
$h_i : [n] \to [\log^{3} k]$ where $i = 1, \ldots, k / \log k$
2-Round $O(k \log\log k)$-protocol
$|h_i(S)|, |h_i(T)| = O(\log k \log\log k)$
Total communication = $\frac{k}{\log k} \cdot O(\log k \log\log k) = O(k \log\log k)$
Collisions
$k / \log k$ buckets, each hashed into $[\log^{3} k]$
$\Pr[\text{collision}] = O\left(\frac{1}{\log k}\right)$
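Key fact: If $S' = T'$ then also $S' = S \cap T$, where $S' = \{s \in S : h_i(s) \in h_i(T)\}$ and $T' = \{t \in T : h_i(t) \in h_i(S)\}$ are the candidate intersections computed by Alice and Bob within a bucket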
• Second round:
  – For each bucket send $O(\log k)$-bit equality check (total $O(k)$-communication)
  – Correct intersection computed in buckets where the equality check passes
  – Expected # items in incorrect buckets: $O(k / \log k)$
  – Use 1-round protocol for incorrect buckets
  – Total communication: $O(k \log\log k)$
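The two-round structure above can be simulated locally as in the sketch below. It is an illustration under stated assumptions, not the authors' implementation: the hash family is a linear hash modulo a prime, the per-bucket equality check is done by comparing the two candidate sets directly (a real protocol would exchange short fingerprints), and all function names are hypothetical.

```python
import math
import random

PRIME = (1 << 61) - 1  # assumed to exceed every element of [n]

def make_hash(rng, range_size):
    a, b = rng.randrange(1, PRIME), rng.randrange(PRIME)
    return lambda x: ((a * x + b) % PRIME) % range_size

def two_round_intersection(S, T, k, seed=0):
    rng = random.Random(seed)                   # stands in for shared randomness
    log_k = max(1.0, math.log2(k))
    num_buckets = max(1, int(k / log_k))
    primary = make_hash(rng, num_buckets)       # h : [n] -> [k / log k]
    secondary_range = max(1, int(log_k ** 3))   # h_i : [n] -> [log^3 k]
    fallback = make_hash(rng, k ** 3)           # 1-round protocol hash

    result, bad_S, bad_T = set(), set(), set()
    for i in range(num_buckets):
        S_i = {s for s in S if primary(s) == i}
        T_i = {t for t in T if primary(t) == i}
        h = make_hash(rng, secondary_range)
        # First round: hashed buckets h_i(S_i) and h_i(T_i) are exchanged.
        hashed_S, hashed_T = {h(s) for s in S_i}, {h(t) for t in T_i}
        alice_candidate = {s for s in S_i if h(s) in hashed_T}
        bob_candidate = {t for t in T_i if h(t) in hashed_S}
        # Second round: equality check per bucket (exact comparison here).
        if alice_candidate == bob_candidate:
            result |= bob_candidate             # key fact: this is S_i ∩ T_i
        else:
            bad_S |= S_i
            bad_T |= T_i
    # Incorrect buckets are handled with the 1-round protocol.
    message = {fallback(s) for s in bad_S}
    result |= {t for t in bad_T if fallback(t) in message}
    return result
```

In this simulation the output always contains $S \cap T$ and is always a subset of $T$; extra elements can appear only through a collision in the fallback hash.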
Main protocol
$k$ = # of buckets
$h : [n] \to [k]$
Expected # of elements per bucket: $O(1)$
Verification tree, node degree: … $\operatorname{ilog}^{r-1} k$
Buckets = leaves of the verification tree
Verification bottom-up
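Children: $(S_1, T_1)$ and $(S_2, T_2)$, with candidate intersections $S_1 \cap T_1$ and $S_2 \cap T_2$
Parent: $(S_1 \cup S_2, T_1 \cup T_2)$, with candidate intersection $(S_1 \cup S_2) \cap (T_1 \cup T_2)$
EQUALITY CHECK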
[Figure: two verification-tree diagrams. The children $S_1 \cap T_1$ and $S_2 \cap T_2$ are labeled Correct and Incorrect; the parent $(S_1 \cup S_2) \cap (T_1 \cup T_2)$ is labeled Incorrect in the first diagram and Correct in the second.]
[Figure: children $S_1 \cap T_1$ and $S_2 \cap T_2$ labeled Correct and Incorrect under the parent $(S_1 \cup S_2) \cap (T_1 \cup T_2)$.]
EQUALITY CHECK FAILS => RESTART THE SUBTREE
[Figure: the same subtree after the restart, with the parent labeled Correct.]
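The bottom-up verification with restarts can be simulated with the sketch below. Several simplifications are assumptions of this sketch rather than details from the slides: the tree is binary instead of having the degrees shown on the slides, equality checks are simulated by a coin that wrongly accepts unequal sets with probability $2^{-\text{bits}}$, the number of bits grows with the subtree size, and the final check at the root is exact. All names are hypothetical.

```python
import random

PRIME = (1 << 61) - 1  # assumed to exceed every set element

def random_hash(rng, range_size):
    a, b = rng.randrange(1, PRIME), rng.randrange(PRIME)
    return lambda x: ((a * x + b) % PRIME) % range_size

def noisy_equal(alice, bob, bits, rng):
    """Simulated equality check: equal sets always pass, unequal sets
    pass with probability 2**-bits."""
    return alice == bob or rng.randrange(2 ** bits) == 0

def leaf_candidates(S_i, T_i, rng, hash_range=8):
    """Basic intersection protocol in a leaf: may err on hash collisions."""
    h = random_hash(rng, hash_range)
    hashed_S, hashed_T = {h(s) for s in S_i}, {h(t) for t in T_i}
    alice = {s for s in S_i if h(s) in hashed_T}
    bob = {t for t in T_i if h(t) in hashed_S}
    return alice, bob

def verify(buckets, lo, hi, rng):
    """Return (alice, bob) candidates for buckets[lo:hi]."""
    while True:
        if hi - lo == 1:
            S_i, T_i = buckets[lo]
            alice, bob = leaf_candidates(S_i, T_i, rng)
        else:
            mid = (lo + hi) // 2
            a1, b1 = verify(buckets, lo, mid, rng)
            a2, b2 = verify(buckets, mid, hi, rng)
            alice, bob = a1 | a2, b1 | b2
        if noisy_equal(alice, bob, bits=2 * (hi - lo), rng=rng):
            return alice, bob
        # Equality check failed: loop again, i.e. restart this subtree.

def intersect_with_verification(S, T, num_buckets, seed=0):
    rng = random.Random(seed)                 # stands in for shared randomness
    primary = random_hash(rng, num_buckets)
    buckets = [({s for s in S if primary(s) == i},
                {t for t in T if primary(t) == i})
               for i in range(num_buckets)]
    while True:
        alice, bob = verify(buckets, 0, num_buckets, rng)
        if alice == bob:                      # root check, assumed exact here
            return bob
```

Because Alice's candidate is a subset of $S$, Bob's is a subset of $T$, and both contain $S \cap T$, an exact match at the root forces the output to equal $S \cap T$ in this simulation.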
[Figure: verification tree with leaves $(S_1^1, T_1^1), (S_2^1, T_2^1), \ldots, (S_i^1, T_i^1), \ldots, (S_k^1, T_k^1)$ and levels labeled $p_1, \ldots, p_{r-2}, p_{r-1}$.]
Analysis of Stage $i$
• $p_i$ = Pr[node at stage $i$ computed correctly]
• Set $p_i$ =
  – Run equality checks and basic intersection protocols with success probability $p_i$
  – Key lemma: E[# of restarts per leaf] = $O(1)$ => Cost of Intersection in leaves = $O(k)$
  – Cost of Equality =
• Pr[protocol succeeds] =
Multi-party extensions
$m$ players: $S_1, \ldots, S_m$, where $S_i \subseteq [n]$, $|S_i| \le k$
• Boost error probability of 2-player protocol to
• Average per player (using coordinator): in rounds
• Worst-case per player (using a tournament): in rounds
Open Problems
• ($k$-Intersection) = ?
• Better protocols for the multi-party setting?
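$k$-Disjointness
• $S, T \subseteq [n]$, $|S|, |T| \le k$; $f(S, T) = 1$ iff $S \cap T = \emptyset$
• $R(k\text{-Disjointness}) = \Theta(k)$ [Razborov’92; Hastad-Wigderson’96]
• $R^{(1)}(k\text{-Disjointness}) = \Theta(k \log k)$ [Folklore + Dasgupta, Kumar, Sivakumar’12; Buhrman, Garcia-Soriano, Matsliah, De Wolf’12]
• $R^{(r)}(k\text{-Disjointness}) = \Theta(k \operatorname{ilog}^{r} k)$ [Saglam, Tardos’13]
• $R(k\text{-Disjointness}) = \frac{2}{\ln 2} k \pm o(k)$ [Braverman, Garg, Pankratov, Weinstein’13]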