Distributed Top-K Monitoring
1
Distributed Top-K Monitoring
Brian Babcock & Chris Olston
Presented by Yuval Altman
To be presented at ACM SIGMOD 2003 International Conference on Management of Data
2
The problem
Continuously report the k largest values obtained from
distributed data streams.
3
Motivation
Google is the most popular search engine in the world.
Servers in multiple sites in the world handle millions of queries an hour.
What are the top 20 search terms?
4
The problem
Continuously report the k largest values obtained from distributed data streams.
Multiple sources, physically far apart: communication is expensive, and it is inefficient to transmit large amounts of data
Streaming model: values change over time
Approximation may be sufficient
5
Motivation – Detecting DDoS attacks
6
Formal problem definition
m+1 nodes:
Monitor nodes: N1, N2, …, Nm
Coordinator node: N0
Set of n data objects U = {O1, O2, …, On}, e.g. search terms, IP addresses
Objects are associated with real values V1, V2, …, Vn, e.g. the number of requests (DNS queries) to an IP address in the last 15 minutes
7
Distributed streaming model
Updates to values arrive as a sequence of tuples <Oi, Nj, Δ>, where Nj detects a change of Δ in the value Vi of Oi. The change is not seen by the other nodes Nk ($k \ne j$).
For each node Nj, define partial values $V_{1,j}, V_{2,j}, \ldots, V_{n,j}$: $V_{i,j} = \sum \Delta$ over all tuples <Oi, Nj, Δ>
The value Vi for an object Oi: $V_i = \sum_j V_{i,j}$
8
Model example
U = {O1, O2, O3, O4}
N1: <O1, N1, 2> <O2, N1, 3> <O4, N1, 4> <O3, N1, 2> <O1, N1, 1>
N2: <O2, N2, 3> <O4, N2, 5> <O4, N2, -2> <O3, N2, 4> <O3, N2, 5>
N3: <O2, N3, -1> <O3, N3, 4> <O2, N3, 2> <O3, N3, 3> <O2, N3, 5>
V1,1 = 3, V2,1 = 3, V3,1 = 2, V4,1 = 4
V1,2 = 0, V2,2 = 3, V3,2 = 9, V4,2 = 3
V1,3 = 0, V2,3 = 6, V3,3 = 7, V4,3 = 0
V1 = 3, V2 = 12, V3 = 18, V4 = 7
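As a minimal sketch of the model, the update tuples of this example can be folded into partial values and totals as follows. The function and variable names are illustrative assumptions, not part of the original slides.

```python
from collections import defaultdict

# Update tuples (object, node, delta), copied from the model example.
UPDATES = [
    ("O1", "N1", 2), ("O2", "N1", 3), ("O4", "N1", 4), ("O3", "N1", 2), ("O1", "N1", 1),
    ("O2", "N2", 3), ("O4", "N2", 5), ("O4", "N2", -2), ("O3", "N2", 4), ("O3", "N2", 5),
    ("O2", "N3", -1), ("O3", "N3", 4), ("O2", "N3", 2), ("O3", "N3", 3), ("O2", "N3", 5),
]

def partial_values(updates):
    """Return {(object, node): V_ij}, the sum of the deltas seen at each node."""
    partial = defaultdict(int)
    for obj, node, delta in updates:
        partial[(obj, node)] += delta
    return dict(partial)

def total_values(partial):
    """Return {object: V_i}, the sum of the partial values over all nodes."""
    totals = defaultdict(int)
    for (obj, _node), value in partial.items():
        totals[obj] += value
    return dict(totals)

partial = partial_values(UPDATES)
totals = total_values(partial)
print(totals)
```

Objects a node never saw (such as O1 at N2) simply have no entry, which corresponds to a partial value of 0.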
9
Using the model
Top-k IP addresses in the last 15 minutes: emit <IPAddr, Router, 1> when receiving a request for an IP address, and a cancelling <IPAddr, Router, -1> 15 minutes afterwards
A different strategy can be adopted: emit <IPAddr, Router, 15> when receiving a request, then <IPAddr, Router, -1> once a minute, 15 times
10
The problem
The coordinator node N0 must report a set $T \subseteq U$, $|T| = k$, that represents the top-k data objects.
It must be correct within $\epsilon$.
Formally, if $O_t \in T$ and $O_s \in U - T$: $V_t + \epsilon \ge V_s$
Example: $\epsilon = 5$
100, 97, 95, 92, 90, 88, 87, 83, 80, 75
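A small sketch of the correctness condition, using the ten values of the example. The slide does not state k, so k = 3 and the object names O1..O10 are assumptions made only for illustration.

```python
def is_valid_topk(values, top_k, epsilon):
    """True if every object in top_k, boosted by epsilon, is at least as large
    as every object outside top_k."""
    inside = [values[obj] for obj in top_k]
    outside = [v for obj, v in values.items() if obj not in top_k]
    if not inside or not outside:
        return True
    return min(inside) + epsilon >= max(outside)

# Values from the example; object names are hypothetical.
VALUES = dict(zip(
    ["O1", "O2", "O3", "O4", "O5", "O6", "O7", "O8", "O9", "O10"],
    [100, 97, 95, 92, 90, 88, 87, 83, 80, 75],
))

print(is_valid_topk(VALUES, {"O1", "O2", "O3"}, 5))  # exact top 3
print(is_valid_topk(VALUES, {"O1", "O2", "O5"}, 5))  # 90 + 5 >= 95
print(is_valid_topk(VALUES, {"O1", "O2", "O6"}, 5))  # 88 + 5 < 95
```

Under these assumptions several different sets are acceptable answers, which is what gives the algorithm room to avoid communication.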
11
Related work
One-time distributed top-k calculation: Bruno, Gravano, Marian 2002; Fagin, Lotem, Naor 2001
Much better than transmitting all the values to the coordinator node
Not streaming: no means to detect changes to data. Running the algorithm continuously is very expensive
Monitor nodes have limited query capabilities: sorted (GetNext) and random (GetValue)
12
Related work
Streaming top-k monitoring from a single source: Charikar, Chen, Farach-Colton 2002; Manku, Motwani 2002; Gibbons, Matias 1998
Randomized algorithms
Focus on minimizing space
Reminder: the objective here is to minimize communication costs
13
Overview of algorithm
Initialize a top-k set at the coordinator node
Set arithmetic constraints at the monitor nodes; they depend on the current top-k set
Constraints valid: no communication
Constraints invalidated: resolution, possibly a new top-k set, reallocation of constraints
14
Choosing the constraints
Ideally, data is distributed evenly across the monitor nodes, so that the local top-k sets are all the same
In this case, the global top-k set matches the local top-k sets
It suffices that the local constraints remain valid
N1 (US): Money=100, Sex=98, Health=94, Mail=92
N2 (Germany): Sex=30, Money=20, Mail=5, Health=3
N3 (Japan): Money=50, Sex=5, Mail=4, Health=1
Global list: Money=170, Sex=133, Mail=101, Health=98
15
Adjustment factors
In real life, data is not distributed evenly
N1 (US): Money=100, Health=94, Mail=92, Sex=90
N2 (Germany): Sex=30, Money=20, Mail=5, Health=3
N3 (Japan): Money=50, Health=6, Sex=5, Mail=4
Global list: Money=170, Sex=125, Health=103, Mail=101
Local constraints are invalidated, but global top-k still valid
<N1,Sex,-8> <N3,Health,5>
16
Adjustment factors
For each node Nj and object Oi, associate an adjustment factor $\delta_{i,j}$
Constraints are evaluated after adding the adjustment factors. If $O_t \in T$ and $O_s \in U - T$: $V_{t,j} + \delta_{t,j} \ge V_{s,j} + \delta_{s,j}$
Adjustment factors for each object sum to zero: $\sum_j \delta_{i,j} = 0$. This ensures the sum remains valid
17
Adjustment factors example
Partial values:
N1 (US): Money=100, Health=94, Mail=92, Sex=90
N2 (Germany): Sex=30, Money=20, Mail=5, Health=3
N3 (Japan): Money=50, Health=6, Sex=5, Mail=4
Global list: Money=170, Sex=125, Health=103, Mail=101
$\delta_{\text{sex},1} = 10$, $\delta_{\text{sex},2} = -15$, $\delta_{\text{sex},3} = 5$
Adjusted values (partial value plus adjustment factor):
N1 (US): Money=100, Sex=100, Health=94, Mail=92
N2 (Germany): Money=20, Sex=15, Mail=5, Health=3
N3 (Japan): Money=50, Sex=10, Health=6, Mail=4
Global list: Money=170, Sex=125, Health=103, Mail=101
18
Coordinator adjustment factor
For each object Oi, add an adjustment factor $\delta_{i,0}$ at the coordinator node
Factors for each object Oi must still sum to 0
To allow error, if $O_t \in T$ and $O_s \in U - T$: give the $O_t$ values a "bonus" of $\epsilon$
Let $V_{t,0} = V_{s,0} = 0$
The constraint: $\delta_{t,0} + \epsilon \ge \delta_{s,0}$
19
Allowing error – example
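N1 (US): Money=100, Sex=98, Health=94, Mail=92
N2 (Germany): Sex=30, Money=20, Mail=5, Health=3
N3 (Japan): Money=50, Health=41, Sex=5, Mail=4
Global list: Money=170, Health=138, Sex=133, Mail=101
<N3,Health,40>, $\epsilon = 5$
$\delta_{\text{sex},1} = -4$, $\delta_{\text{sex},2} = -25$, $\delta_{\text{sex},3} = 29$
$\delta_{\text{health},2} = 2$, $\delta_{\text{health},3} = -7$
The trick: $\delta_{\text{health},0} = 5$, so that $\delta_{\text{sex},0} + 5 \ge \delta_{\text{health},0}$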
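The example can be checked mechanically. The sketch below copies the partial values and the non-zero adjustment factors from the slide; it assumes that the top-k set is {Money, Sex} (k = 2) and that every factor not listed is 0. All names are illustrative.

```python
EPSILON = 5
TOP_K = {"Money", "Sex"}  # assumed current top-k set (k = 2)
OBJECTS = ["Money", "Sex", "Health", "Mail"]

# Partial values V_ij per monitor node, from the example.
PARTIAL = {
    "N1": {"Money": 100, "Sex": 98, "Health": 94, "Mail": 92},
    "N2": {"Money": 20, "Sex": 30, "Health": 3, "Mail": 5},
    "N3": {"Money": 50, "Sex": 5, "Health": 41, "Mail": 4},
}
# Non-zero adjustment factors; "N0" is the coordinator, whose partial values are 0.
DELTA = {
    "N0": {"Health": 5},
    "N1": {"Sex": -4},
    "N2": {"Sex": -25, "Health": 2},
    "N3": {"Sex": 29, "Health": -7},
}

def adjusted(node, obj):
    """Partial value plus adjustment factor (missing entries count as 0)."""
    return PARTIAL.get(node, {}).get(obj, 0) + DELTA.get(node, {}).get(obj, 0)

def node_ok(node, bonus=0):
    """Every top-k object (plus bonus) is at least every other object at this node."""
    low = min(adjusted(node, t) for t in TOP_K)
    high = max(adjusted(node, s) for s in OBJECTS if s not in TOP_K)
    return low + bonus >= high

def factors_sum_to_zero(obj):
    return sum(DELTA[node].get(obj, 0) for node in DELTA) == 0

print([node_ok(n) for n in ("N1", "N2", "N3")], node_ok("N0", bonus=EPSILON))
```

Only the coordinator receives the bonus of epsilon; without it, the coordinator constraint for Health would fail.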
20
Why do adjustment factors work?
For $O_t \in T$ and $O_s \in U - T$:
As long as, for each node Nj, the adjusted constraints and the coordinator constraint are valid:
$V_{t,j} + \delta_{t,j} \ge V_{s,j} + \delta_{s,j}$
$\delta_{t,0} + \epsilon \ge \delta_{s,0}$
we can sum over the nodes and the error constraint and get: $V_t + \epsilon \ge V_s$
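Written out, the summation step looks like this. It uses only the definitions given earlier: $V_i = \sum_j V_{i,j}$ over the monitor nodes, and the adjustment factors of each object summing to zero over all nodes including the coordinator.

```latex
\begin{align*}
\sum_{j=1}^{m} \bigl(V_{t,j} + \delta_{t,j}\bigr)
  &\ge \sum_{j=1}^{m} \bigl(V_{s,j} + \delta_{s,j}\bigr)
  && \text{(sum of the monitor constraints)} \\
\delta_{t,0} + \epsilon &\ge \delta_{s,0}
  && \text{(coordinator constraint)} \\
V_t + \sum_{j=0}^{m} \delta_{t,j} + \epsilon
  &\ge V_s + \sum_{j=0}^{m} \delta_{s,j}
  && \text{(add the two lines)} \\
V_t + \epsilon &\ge V_s
  && \text{(since } \textstyle\sum_{j=0}^{m} \delta_{i,j} = 0 \text{ for every } i\text{)}
\end{align*}
```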
21
Algorithm details
Coordinator node N0 maintains:
Current approximate top-k set
All adjustment factors $\delta_{i,j}$
Each monitor node Nj maintains:
Current approximate top-k set
For each object Oi, the partial value $V_{i,j}$
For each object Oi, the relevant adjustment factor $\delta_{i,j}$
22
Algorithm details
Initialization. The coordinator:
Computes the approximate top-k set once
Chooses adjustment factors
Sends the adjustment factors and the top-k set to the monitors
Monitor node constraints: for $O_t \in T$ and $O_s \in U - T$: $V_{t,j} + \delta_{t,j} \ge V_{s,j} + \delta_{s,j}$
Adjustment factor constraints:
For each object Oi: $\sum_j \delta_{i,j} = 0$
For objects $O_t \in T$ and $O_s \in U - T$: $\delta_{t,0} + \epsilon \ge \delta_{s,0}$
23
Algorithm for monitor node Nj
While (1)
  Read tuple <Oi, Nj, Δ>
  $V_{i,j} = V_{i,j} + \Delta$
  Check constraints: for $O_t \in T$ and $O_s \in U - T$: $V_{t,j} + \delta_{t,j} \ge V_{s,j} + \delta_{s,j}$
  If invalid, initiate resolution.
End
To check constraints: use two heaps (or Fibonacci heaps)
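One possible rendering of the monitor loop in Python is sketched below. It deliberately swaps the two heaps suggested on the slide for a plain scan over all (top-k, non-top-k) pairs, which is simpler but slower. The class name, method names and the sample values are assumptions for illustration.

```python
class MonitorNode:
    """State of one monitor node Nj: top-k set, partial values, adjustment factors."""

    def __init__(self, top_k, partial, delta):
        self.top_k = set(top_k)
        self.partial = dict(partial)  # V_ij for each object
        self.delta = dict(delta)      # delta_ij for each object

    def adjusted(self, obj):
        return self.partial.get(obj, 0) + self.delta.get(obj, 0)

    def violated(self):
        """Return F, the set of objects involved in violated constraints."""
        objects = set(self.partial) | set(self.delta) | self.top_k
        failed = set()
        for t in self.top_k:
            for s in objects - self.top_k:
                if self.adjusted(t) < self.adjusted(s):
                    failed |= {t, s}
        return failed

    def update(self, obj, change):
        """Apply one tuple <obj, this node, change>.
        Returns F; an empty set means no resolution is needed."""
        self.partial[obj] = self.partial.get(obj, 0) + change
        return self.violated()


# Hypothetical starting state for a node with top-k set {Money, Sex}.
node = MonitorNode(
    top_k={"Money", "Sex"},
    partial={"Money": 50, "Mail": 4, "Sex": 5, "Health": 1, "Love": 0},
    delta={},
)
print(node.update("Mail", 6))  # Mail rises to 10 and overtakes Sex
```

A non-empty result is the point at which the node would initiate resolution with the coordinator.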
24
Resolution – phase 1
First, Nj sends a message to N0 with:
F: the set of objects involved in violated constraints
All partial values for objects in $R = F \cup T$
The border value $B_j$: the maximum adjusted value not in the resolution set
N3 (Japan): Money=50, Mail=10, Sex=5, Health=1, Love=0
F3 = {Mail, Sex}, R3 = {Money, Mail, Sex}
$V_{\text{money},3} = 50$, $V_{\text{mail},3} = 10$, $V_{\text{sex},3} = 5$, $B_3 = 1$
25
Resolution – phase 2
The coordinator N0 attempts to resolve the constraints using the $\delta_{*,0}$ slack
For each violated constraint N0 tests:
$V_{t,j} + \delta_{t,j} + \delta_{t,0} + \epsilon \ge V_{s,j} + \delta_{s,j} + \delta_{s,0}$
If all tests succeed, the top-k set is valid, and there is no need to communicate with other nodes. N0 reallocates adjustment factors. Resolution is over
If at least one test fails, proceed to phase 3
26
Phase 2 resolution example
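$\epsilon = 5$, $\delta_{*,*} = 0$
Before:
N1: Money=100, Sex=98, Mail=96, Health=92
N2: Money=35, Sex=20, Mail=5, Health=3
N3: Money=50, Sex=5, Mail=4, Health=1
Global list: Money=185, Sex=123, Mail=105, Health=96
After <N2,Mail,17>:
N1: Money=100, Sex=98, Mail=96, Health=92
N2: Money=35, Mail=22, Sex=20, Health=3
N3: Money=50, Sex=5, Mail=4, Health=1
Global list: Money=185, Sex=123, Mail=122, Health=96
To fix: $\delta_{\text{sex},0} = -2$, $\delta_{\text{sex},2} = 2$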
27
Phase 2 resolution failure
After <N2,Sex,5>:
N1: Money=100, Sex=98, Mail=96, Health=92
N2: Money=35, Sex=27, Mail=22, Health=3
N3: Money=50, Sex=5, Mail=4, Health=1
Global list: Money=185, Sex=128, Mail=122, Health=96
$\delta_{\text{sex},0} = -2$, $\delta_{\text{sex},2} = 2$
After <N3,Mail,5>:
N1: Money=100, Sex=98, Mail=96, Health=92
N2: Money=35, Sex=27, Mail=22, Health=3
N3: Money=50, Mail=9, Sex=5, Health=1
Global list: Money=185, Sex=128, Mail=127, Health=96
Can’t “loan” 4 from $\delta_{\text{sex},0}$
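The phase 2 test can be sketched as a single predicate and run against the two examples above. The numbers are taken from the slides (with epsilon = 5 and Sex in the top-k set); the function signature is an assumption.

```python
def phase2_resolves(violations, partial, delta_node, delta_coord, epsilon):
    """Coordinator-side test for phase 2.

    violations:  (t, s) pairs, t in the top-k set and s outside it
    partial:     partial values V_ij reported by the violating node Nj
    delta_node:  adjustment factors delta_ij at that node
    delta_coord: adjustment factors delta_i0 at the coordinator
    Missing adjustment factors count as 0.
    """
    return all(
        partial[t] + delta_node.get(t, 0) + delta_coord.get(t, 0) + epsilon
        >= partial[s] + delta_node.get(s, 0) + delta_coord.get(s, 0)
        for t, s in violations
    )

# Success case: <N2,Mail,17> with all adjustment factors at 0.
ok = phase2_resolves(
    [("Sex", "Mail")], {"Sex": 20, "Mail": 22}, {}, {}, epsilon=5
)

# Failure case: <N3,Mail,5> after delta_sex,0 was lowered to -2.
failed = phase2_resolves(
    [("Sex", "Mail")], {"Sex": 5, "Mail": 9}, {}, {"Sex": -2}, epsilon=5
)
print(ok, failed)
```

In the failure case the left side is 5 - 2 + 5 = 8 against 9 on the right, so the coordinator has to move on to phase 3.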
28
Resolution – phase 3
The coordinator N0 contacts all the nodes Ni excluding Nj, requesting:
Partial values for objects in $R = F \cup T$
Border values $B_i$
N0 sums the partial values and sorts them to compute the new top-k list T’
N0 reallocates new adjustment factors for T’
N0 sends T’ and the adjustment factors to all nodes
29
Resolution – summary
Phase 1: Nj detects failed constraints and notifies N0. Initiates resolution for $R = F \cup T$
Phase 2: N0 attempts to resolve the constraints using $\delta_{*,0}$, the “bank”. On success, reallocate adjustment factors and stop
Phase 3: N0 requests all updated partial values for R, sorts, and computes the new top-k list. Reallocate adjustment factors
30
Resolution Performance
Means to measure algorithm performance
Messages are usually small: only the resolution set $R = F \cup T$ is involved
Two-phase resolution: initiation + reallocation. Only two messages
Three-phase resolution: initiation + query + reallocation. 1 + 2(m-1) + m = 3m - 1 messages
31
Adjustment factor reallocation
Input: top-k list T’, partial values in resolution set R, border values
Output: new adjustment factors $\delta_{i,j}$
Method, for each object: meet border value constraints, calculate leeway, distribute leeway evenly
Example: Money=50, Mail=10, Sex=5, Health=1, Love=0
F = {Mail, Sex}, R = {Money, Mail, Sex}
$V_{\text{money}} = 50$, $V_{\text{mail}} = 10$, $V_{\text{sex}} = 5$, $B = 1$
32
Leeway computation
For each object in R compute the leeway $\lambda$: the slack above the sum of border values
Define:
Sum of border values: $B = \sum_j B_j$
Computed values: $V_i = \sum_j V_{i,j}$
$V_{i,0} = 0$; $B_0 = \max(\delta_{i,0})$ where Oi is not in R
If $O_i \in T'$: $\lambda_i = V_i - B + \epsilon$
Otherwise: $\lambda_i = V_i - B$
33
Leeway computation example
N1 (US): Money=100, Sex=98, Health=94, Mail=92, Love=85
N2 (Germany): Sex=30, Money=20, Mail=5, Love=5, Health=3
N3 (Japan): Money=50, Mail=10, Sex=5, Health=1, Love=0
Global list: Money=170, Sex=133, Mail=107, Health=98, Love=90
$\epsilon = 0$
B = 94 + 5 + 1 = 100
$\lambda_{\text{money}} = 170 - B = 70$
$\lambda_{\text{sex}} = 133 - B = 33$
$\lambda_{\text{mail}} = 107 - B = 7$
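The leeway computation of this example can be sketched in a few lines. The new top-k set T' = {Money, Sex} is an assumption (the slide does not name it, and with epsilon = 0 it does not affect the result), and the coordinator's border value is taken to be 0.

```python
def leeways(totals, resolution_set, new_top_k, border_values, epsilon=0):
    """Leeway per object in R: value minus the sum of border values,
    plus epsilon for objects in the new top-k set."""
    border_sum = sum(border_values)
    return {
        obj: totals[obj] - border_sum + (epsilon if obj in new_top_k else 0)
        for obj in resolution_set
    }

TOTALS = {"Money": 170, "Sex": 133, "Mail": 107}  # global values for objects in R
BORDERS = [94, 5, 1]                              # B1, B2, B3 from the example

result = leeways(TOTALS, ["Money", "Sex", "Mail"], {"Money", "Sex"}, BORDERS)
print(result)
```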
34
Leeway distribution
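Initialization (meet constraints): $\delta_{i,j} = B_j - V_{i,j}$
For $O_i \in T'$, $j = 0$: $\delta_{i,0} = B_0 - \epsilon$
Leeway distribution: $\delta_{i,j} = \delta_{i,j} + \lambda_i / m$
Correctness: $V_{t,j} + \delta_{t,j} \ge V_{s,j} + \delta_{s,j}$
If $O_s \notin R$: follows from $V_{t,j} + \delta_{t,j} \ge B_j$
If $O_s \in R$: follows from $\lambda_t \ge \lambda_s$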
35
Leeway distribution example
N1 (US): Money=100, Sex=98, Health=94, Mail=92, Love=85
N2 (Germany): Sex=30, Money=20, Mail=5, Love=5, Health=3
N3 (Japan): Money=50, Mail=10, Sex=5, Health=1, Love=0
Global list: Money=170, Sex=133, Mail=107, Health=98, Love=90
$\lambda_{\text{sex}} = 33$
$\delta_{\text{sex},1} = B_1 - V_{\text{sex},1} + 33/3 = 94 - 98 + 11 = 7$
$\delta_{\text{sex},2} = B_2 - V_{\text{sex},2} + 33/3 = 5 - 30 + 11 = -14$
$\delta_{\text{sex},3} = B_3 - V_{\text{sex},3} + 33/3 = 1 - 5 + 11 = 7$
36
Leeway distribution example
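$\lambda_{\text{money}} = 70$
$\delta_{\text{money},1} = B_1 - V_{\text{money},1} + 70/3 = 94 - 100 + 24 = 18$
$\delta_{\text{money},2} = B_2 - V_{\text{money},2} + 70/3 = 5 - 20 + 23 = 8$
$\delta_{\text{money},3} = B_3 - V_{\text{money},3} + 70/3 = 1 - 50 + 23 = -26$
$\lambda_{\text{mail}} = 7$
$\delta_{\text{mail},1} = B_1 - V_{\text{mail},1} + 7/3 = 94 - 92 + 3 = 5$
$\delta_{\text{mail},2} = B_2 - V_{\text{mail},2} + 7/3 = 5 - 5 + 2 = 2$
$\delta_{\text{mail},3} = B_3 - V_{\text{mail},3} + 7/3 = 1 - 10 + 2 = -7$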
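The even distribution in these examples can be reproduced with a short sketch. It handles monitor nodes only (no share for the coordinator, epsilon = 0), and it assumes that when the leeway does not divide evenly the remainder goes to the first nodes, which is inferred from the 24/23/23 and 3/2/2 splits in the figures rather than stated on the slides.

```python
def split_evenly(amount, parts):
    """Split an integer into `parts` integer shares that sum to `amount`;
    the first shares receive the remainder."""
    base, extra = divmod(amount, parts)
    return [base + 1 if j < extra else base for j in range(parts)]

def reallocate(partial, border, leeway):
    """New adjustment factors for one object across the monitor nodes:
    delta_ij = B_j - V_ij + (share of the leeway)."""
    shares = split_evenly(leeway, len(partial))
    return [b - v + share for v, b, share in zip(partial, border, shares)]

BORDER = [94, 5, 1]  # B1, B2, B3

sex = reallocate([98, 30, 5], BORDER, 33)
money = reallocate([100, 20, 50], BORDER, 70)
mail = reallocate([92, 5, 10], BORDER, 7)
print(sex, money, mail)
```

Each list sums to zero, so the adjusted global values stay equal to the true global values.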
37
Reallocation Results
Partial values:
N1 (US): Money=100, Sex=98, Health=94, Mail=92, Love=85
N2 (Germany): Sex=30, Money=20, Mail=5, Love=5, Health=3
N3 (Japan): Money=50, Mail=10, Sex=5, Health=1, Love=0
Global list: Money=170, Sex=133, Mail=107, Health=98, Love=90
Adjusted values after reallocation:
N1 (US): Money=118, Sex=105, Mail=97, Health=94, Love=85
N2 (Germany): Money=28, Sex=16, Mail=7, Love=5, Health=3
N3 (Japan): Money=24, Sex=12, Mail=3, Health=1, Love=0
Global list: Money=170, Sex=133, Mail=107, Health=98, Love=90
38
Leeway distribution to N0
Leeway is also distributed to the coordinator node
$\epsilon$ is added to the leeway computation for $O_t \in T'$
The initialization of $\delta_{t,0}$ for $O_t \in T'$ is $B_0 - \epsilon$
Any addition can be “loaned” to monitor nodes
Amount distributed to N0:
Higher ($\lambda_i / 2$): less chance of phase 3 in resolution
Lower (0): fewer resolutions (more leeway to the monitor nodes)
39
Proportional leeway distribution
Allocate more leeway to monitor nodes that are updated more often
Top-k likely to change more
Good for monitor nodes that exhibit characteristic behavior: Google locations, enterprise routers
40
Experiments
Query 1: FIFA ’98. Servers at 4 locations throughout the world. Top 20 Web site pages by hit statistics
Query 2: Most loaded server in a cluster. Single value per monitor node
Query 3: Berkeley-to-world WAN link, with 4 monitor points. Top 20 destination hosts by number of outgoing TCP packets
41
Results – Query 1
42
Results – Query 2
43
Results – Query 3
44
Analysis of results
Allowing error improves results dramatically
Leeway for N0: the dominant factor
Low $\epsilon$: half of the leeway to N0. Low $\epsilon$ means little leeway, so resolutions are bound to happen; make them less expensive
High $\epsilon$: no leeway to N0
45
Analysis of results
Even or proportional leeway distribution: which is better depends on the query
Server load: proportional
Berkeley WAN: monitor nodes are simulated, so even distribution is better
FIFA: proportional for lower $\epsilon$, even for higher $\epsilon$
46
Comparison to alternative
Caching
The coordinator holds cached partial data values
A monitor must send an update to the coordinator when a partial value deviates by $\epsilon / (2m)$
The cached values are therefore always correct within $\epsilon / 2$
The top-k list is always correct within $\epsilon$
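A minimal sketch of the monitor side of this caching alternative, for a single object. The class and attribute names are assumptions, and "deviates by" is read here as "deviates by more than the threshold".

```python
class CachingMonitor:
    """Tracks one partial value and reports it only when the coordinator's
    cached copy has drifted by more than epsilon / (2m)."""

    def __init__(self, epsilon, num_monitors):
        self.threshold = epsilon / (2 * num_monitors)
        self.value = 0   # true partial value at this monitor
        self.cached = 0  # value last sent to the coordinator

    def update(self, change):
        """Apply a change; return the value to send, or None if no message is needed."""
        self.value += change
        if abs(self.value - self.cached) > self.threshold:
            self.cached = self.value
            return self.value
        return None


monitor = CachingMonitor(epsilon=12, num_monitors=3)  # threshold = 2.0
sent = [monitor.update(1) for _ in range(3)]
print(sent)
```

With m monitors each off by at most epsilon / (2m), the coordinator's sum for an object is off by at most epsilon / 2, which is where the bound on the top-k list comes from.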
47
Results (note the log scale!)
48
Summary
Problem: find the top-k set within error $\epsilon$
Distributed: multiple sources
Streaming: frequent updates
Naive approach: transmit the streams to the coordinator node. If error is allowed, transmit only when the deviation from the cached value threatens correctness
The new approach offers a dramatic improvement over the naive approach for low to medium $\epsilon$.
49
Summary
Use adjustment factors to establish constraints
A monitor node initiates resolution when a constraint gets broken
Resolution:
Attempt to use the coordinator node leeway. If successful, fix the constraints by adjustment factor reallocation.
Otherwise, get partial values for the resolution set from all nodes, compute a new top-k set, and reallocate leeway to all nodes.
Reallocation:
Distribute leeway evenly between the monitor nodes
Distribute leeway to the coordinator node on low $\epsilon$
50
Questions?