Quasar: A Probabilistic Publish-Subscribe System for Social Networks over P2P Kademlia network

David Arinzon. Supervisor: Gil Einziger. April 2012.



Quasar

• Quasar is a "Publish-Subscribe" mechanism whose routing is based on Bloom filters.


Bloom Filter

• A Bloom filter is "a space-efficient probabilistic data structure that is used to test whether an element is a member of a set" (Wikipedia). The structure was introduced by Burton H. Bloom in 1970 and is also covered in Donald Knuth's "The Art of Computer Programming".

• In this structure, false positives are possible, but false negatives are not.

• When an element is added, it is passed through k hash functions, which yield k positions in the bit array; those bits are set to 1.

• To query for an element, the same k positions are computed. If at least one of those bits is 0, the element is definitely not in the set; if all of them are 1, the element is probably in the set.
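The add and query steps described above can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used in the project: the class name, the bit-array size, and the choice of salted SHA-256 digests as the k hash functions are all assumptions made for the example.

```python
import hashlib


class BloomFilter:
    """Toy Bloom filter; sizes and hashing scheme are illustrative assumptions."""

    def __init__(self, num_bits=64, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = [0] * num_bits

    def _positions(self, item):
        # Derive k positions by salting one hash function with the index i.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        # Set the k bits selected by the hash functions.
        for pos in self._positions(item):
            self.bits[pos] = 1

    def might_contain(self, item):
        # Any 0 bit proves absence; all 1 bits only suggest presence.
        return all(self.bits[pos] for pos in self._positions(item))
```

A `False` answer from `might_contain` is always correct, while a `True` answer may be a false positive caused by other entries setting the same bits.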


Bloom Filters in Quasar

• An entry in a Bloom filter is the ID of a subscription (in our case, the publisher ID).

• Each node contains an "enhanced" routing table.

• The radius defines "how much each node knows about the surrounding subscription interests".

• For a radius of k, each node holds a set of attenuated Bloom filters with k levels of "closeness" (0 to k-1).

– The Bloom filter at level n contains subscription information about nodes n+1 hops away.

– The information is stored in the attenuated Bloom filter of the immediate neighbor through which those nodes are reached.

– Each entry also records the set of nodes reachable through it.


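Bloom Filters in Quasar (example)

• For a radius of 2, as seen from node 1: nodes 2 and 3 are immediate neighbors (1 hop), nodes 4 and 5 are level 2 neighbors (2 hops), and nodes 6 and 7, one hop further out, fall outside the "recognition radius".

[Figure: node 1 with immediate neighbors 2 and 3; node 4 is reached through node 2 and node 5 through node 3; nodes 6 and 7 are one hop further out.]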


Bloom Filters in Quasar (example)

Level | Neighbor | Bloom Filter | Reachable nodes
0     | 2        |              | 2, 4
      | 3        |              | 3, 5

• Node 4's subscription information is set in the Bloom filter of neighbor 2, but at level 1. The same holds for node 5, in the Bloom filter of neighbor 3.

Level | Neighbor | Bloom Filter | Reachable nodes
0     | 2        |              | 2
      | 3        |              | 3
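The per-level, per-neighbor layout of the routing table can be sketched as follows. This is an illustrative sketch under stated assumptions: the names `Entry`, `RoutingTable`, `record` and `lookup` are hypothetical, and a plain Python `set` stands in for each Bloom filter so that the structure stays readable.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    # Assumption: a plain set stands in for the Bloom filter bitmap.
    subscriptions: set = field(default_factory=set)
    reachable: set = field(default_factory=set)


class RoutingTable:
    """One dict per level; each dict maps an immediate neighbor to an Entry."""

    def __init__(self, radius):
        self.radius = radius
        self.levels = [dict() for _ in range(radius)]

    def record(self, neighbor, origin, hops, topics):
        # Level n holds information about nodes n + 1 hops away.
        level = hops - 1
        if not 0 <= level < self.radius:
            return False  # beyond the radius: nothing is stored
        entry = self.levels[level].setdefault(neighbor, Entry())
        entry.subscriptions.update(topics)
        entry.reachable.add(origin)
        return True

    def lookup(self, topic):
        # Search level by level and return the first (level, neighbor) match.
        for level, entries in enumerate(self.levels):
            for neighbor, entry in entries.items():
                if topic in entry.subscriptions:
                    return level, neighbor
        return None


# Node 1 in the example, radius 2 (topic names are made up for illustration).
table = RoutingTable(radius=2)
table.record(neighbor=2, origin=2, hops=1, topics={"topic-a"})
table.record(neighbor=2, origin=4, hops=2, topics={"topic-b"})
table.record(neighbor=3, origin=5, hops=2, topics={"topic-c"})
```

Here node 4's subscription lands in the entry of neighbor 2 at level 1, which matches the bullet above, and a record arriving from 3 hops away is rejected because it lies beyond the radius.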


Subscription mechanism

• Each node periodically sends its subscription list to its neighbors (how often depends on whether the network is static), and they propagate it further as far as the allowed TTL permits.

• A receiving node updates the matching routing table entry (attenuated Bloom filter) according to the information and its direction: which node originally sent it, and which immediate neighbor delivered it.
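As a rough sketch of the TTL-limited propagation, the function below floods one node's subscription list outward and reports, for every node reached, the hop count and the immediate neighbor that delivered it. The function name and the adjacency list are assumptions; in particular, attaching node 6 to node 4 and node 7 to node 5 is a guess about the example topology.

```python
from collections import deque

# Assumed topology for the example: 6 - 4 - 2 - 1 - 3 - 5 - 7.
GRAPH = {1: [2, 3], 2: [1, 4], 3: [1, 5], 4: [2, 6], 5: [3, 7], 6: [4], 7: [5]}


def propagate(graph, origin, ttl):
    """Return {receiver: (hops, delivering_neighbor)} for one subscription update."""
    updates = {}
    queue = deque([(origin, 0)])
    while queue:
        node, hops = queue.popleft()
        if hops == ttl:
            continue  # TTL exhausted: this node does not forward further
        for nxt in graph[node]:
            if nxt == origin or nxt in updates:
                continue  # skip the sender and nodes that already got the update
            updates[nxt] = (hops + 1, node)
            queue.append((nxt, hops + 1))
    return updates
```

With a TTL of 2, an update from node 4 reaches node 1 in two hops through node 2, so node 1 would file it at level 1 under neighbor 2.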


Subscription mechanism (Alternative)

• During our simulations we found that the mechanism described above is very costly in time and traffic. A different mechanism was therefore used to achieve the same goal.

– The simulation ran on UDP over a Kademlia-based KeyBasedRouting network, so each node can reach any other node regardless of the radius defined for Quasar.

– In place of the Quasar subscription mechanism, two steps were applied. First, each node requests information about its neighbors at each radius level. Once a node has built a picture of its radius neighborhood, it sends its own subscription information directly to each node in it.
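The two steps can be sketched as below. This is a simplified model rather than the simulation code: `neighborhood` and `build_tables` are hypothetical names, plain sets stand in for Bloom filters, direct delivery is modeled as a function call instead of a UDP message, and the chain topology is assumed.

```python
# Assumed topology for the example: 6 - 4 - 2 - 1 - 3 - 5 - 7.
GRAPH = {1: [2, 3], 2: [1, 4], 3: [1, 5], 4: [2, 6], 5: [3, 7], 6: [4], 7: [5]}


def neighborhood(graph, node, radius):
    """Step 1: discover, level by level, {other: (hops, first_hop_neighbor)}."""
    found = {}
    frontier = [(n, n) for n in graph[node]]
    for hops in range(1, radius + 1):
        next_frontier = []
        for other, first_hop in frontier:
            if other == node or other in found:
                continue
            found[other] = (hops, first_hop)
            next_frontier.extend((n, first_hop) for n in graph[other])
        frontier = next_frontier
    return found


def build_tables(graph, radius, subscriptions):
    """Step 2: every node sends its subscriptions directly to its neighborhood."""
    pictures = {n: neighborhood(graph, n, radius) for n in graph}
    tables = {n: [dict() for _ in range(radius)] for n in graph}
    messages = 0
    for sender, picture in pictures.items():
        for receiver in picture:
            # The receiver files the data under the neighbor that leads to the sender.
            hops, via = pictures[receiver][sender]
            tables[receiver][hops - 1].setdefault(via, set()).update(subscriptions[sender])
            messages += 1
    return tables, messages
```

Each subscription update costs exactly one direct message per node in the radius neighborhood, instead of being relayed hop by hop.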


Publication mechanism

• When a node decides to publish a topic (the publisher node), it replicates the message alpha times and sends the copies to a random set of neighbors.

• When a node receives a publication, it acts in one of several ways:

– If it is the publisher (the message was routed back), it acts as a "middle" node and routes the message randomly.

– If the node is subscribed to the topic, it renews the TTL and sends the message on to alpha random neighbors (as if it had published it).

– Otherwise, the node searches its routing table level by level for the first entry whose Bloom filter contains this subscription, and routes the message accordingly.
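The receive-side decision can be sketched as one function. Several details here are assumptions that the slides do not state: the values of `ALPHA` and `INITIAL_TTL`, decrementing the TTL by one per hop, dropping messages whose TTL has run out, and falling back to a random neighbor when no routing table entry matches. Nodes and messages are plain dictionaries, and sets stand in for Bloom filters.

```python
import random

ALPHA = 2        # assumed replication factor
INITIAL_TTL = 4  # assumed starting TTL


def on_receive(node, msg, rng):
    """Return the list of (next_hop, message) pairs this node sends out."""
    if msg["ttl"] <= 0:
        return []  # assumption: an expired message is dropped
    forwarded = {**msg, "ttl": msg["ttl"] - 1}
    if node["id"] == msg["publisher"]:
        # Routed back to the publisher: behave as a middle node, route randomly.
        return [(rng.choice(node["neighbors"]), forwarded)]
    if msg["topic"] in node["subscriptions"]:
        # Subscriber: renew the TTL and replicate to alpha random neighbors.
        fresh = {**msg, "ttl": INITIAL_TTL}
        count = min(ALPHA, len(node["neighbors"]))
        return [(n, dict(fresh)) for n in rng.sample(node["neighbors"], count)]
    for level in node["table"]:
        for neighbor, topics in level.items():
            if msg["topic"] in topics:
                return [(neighbor, forwarded)]  # first matching entry wins
    # Assumption: with no matching entry, fall back to a random neighbor.
    return [(rng.choice(node["neighbors"]), forwarded)]
```

Searching the levels in order means that a subscriber one hop away is preferred over one two hops away.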


Publication mechanism (Continued)

• The publication method can prevent a message from "leaving a gravity well": a case in which nodes within a small radius of the publishing node are subscribed to it and keep routing the message among themselves.

• A set of methods (negative information) was applied to counter this:

– Each message carries information about the subscribers that have already received it. To complement that, each node stores information about the publications it has already received.

– When routing, if a candidate entry is found (the publication ID exists in the Bloom filter), the entry is not used if one of the "received subscribers" is in its list of reachable nodes.

– A subscriber that receives a publication more than once routes it randomly without duplicating it.
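The negative-information rules can be sketched as two small functions. The data layout is an assumption made for illustration: each routing table entry is a dictionary with a `topics` set (standing in for the Bloom filter) and a `reachable` set, each message carries a `received` set, and each node keeps a `seen` set of publication IDs. `ALPHA` is an assumed value.

```python
import random

ALPHA = 2  # assumed replication factor


def pick_entry(table, topic, received):
    """Return the first neighbor whose entry matches and avoids served subscribers."""
    for level in table:
        for neighbor, entry in level.items():
            if topic not in entry["topics"]:
                continue
            if entry["reachable"] & received:
                continue  # leads back to a subscriber that already has the message
            return neighbor
    return None


def on_subscriber_receive(node, msg, rng):
    """Replicate on first receipt; on a repeat, forward a single copy randomly."""
    if msg["id"] in node["seen"]:
        return [(rng.choice(node["neighbors"]), msg)]
    node["seen"].add(msg["id"])
    updated = {**msg, "received": msg["received"] | {node["id"]}}
    count = min(ALPHA, len(node["neighbors"]))
    return [(n, updated) for n in rng.sample(node["neighbors"], count)]
```

Skipping entries that only lead to already-served subscribers is what pushes the message out of the gravity well instead of letting it circulate among nearby subscribers.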


Simulation, scenarios and comparison

• As mentioned before, the simulation ran over the Kademlia-based KeyBasedRouting network developed in the CS faculty.

• The main focus of the comparison was the behavior of the attenuated Bloom filters when routing publications.

• As the baseline for comparison, messages were propagated to a random neighbor instead of using the routing table.


Simulation, scenarios and comparison

• Three scenarios were tested:

– Scenario 1: Each node is randomly assigned ten subscriptions. Afterwards, each node in turn publishes once.

– Scenario 2: A subset of publishers (10% of all the nodes) is selected. They also act as subscribers, but not to their own publications. Each node is randomly assigned a set of publications to subscribe to (a random number between 1 and half the number of publishers). Afterwards, every 5 seconds, three publishers are chosen at random to publish.

– Scenario 3: A publisher node is chosen at random, and 10% of all the nodes are chosen to be subscribers of that publisher. Afterwards, the publisher node publishes once.
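The subscription assignment for the three scenarios could be generated along the following lines. This is a sketch of one reading of the descriptions above, not the simulation's setup code: the function names are hypothetical, subscriptions are taken to be publisher IDs, and it is assumed that a node never subscribes to itself and that the scenario 3 publisher is not among its own subscribers.

```python
import random


def scenario_1(nodes, rng, per_node=10):
    # Every node subscribes to ten other nodes chosen at random.
    return {n: set(rng.sample([m for m in nodes if m != n], per_node)) for n in nodes}


def scenario_2(nodes, rng, publisher_percent=10):
    # 10% of the nodes publish; each node subscribes to between 1 and
    # half the number of publishers, never to itself.
    publishers = rng.sample(nodes, max(1, len(nodes) * publisher_percent // 100))
    subscriptions = {}
    for n in nodes:
        candidates = [p for p in publishers if p != n]
        count = rng.randint(1, max(1, len(publishers) // 2))
        subscriptions[n] = set(rng.sample(candidates, min(count, len(candidates))))
    return publishers, subscriptions


def scenario_3(nodes, rng, subscriber_percent=10):
    # One random publisher; 10% of the nodes subscribe to it.
    publisher = rng.choice(nodes)
    others = [n for n in nodes if n != publisher]
    subscribers = set(rng.sample(others, len(nodes) * subscriber_percent // 100))
    return publisher, subscribers
```

For the 250 stations used in the charts, this yields 25 publishers in scenario 2 and 25 subscribers in scenario 3.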


[Figure: Scenario 1 - Delivery ratio, 250 stations. Publication delivery ratio (y-axis, 0 to 1) against publication TTL (x-axis, 4 and 6) for random routing, Quasar routing with radius 2, and Quasar routing with radius 3.]


[Figure: Scenario 1 - Traffic, 250 stations. Publication messages traffic (y-axis, 10000 to 70000) against publication TTL (x-axis, 4 and 6) for random routing, Quasar routing with radius 2, and Quasar routing with radius 3.]


Scenario 1 conclusions

• This scenario shows the advantage of the routing table Bloom filters used in Quasar. With a routing table that holds information about the surrounding subscriptions, messages were routed toward subscribers, which results in a high "hit rate". Note that the high hit rate came with much higher traffic, because on each "first successful hit" the message is duplicated alpha times.


[Figure: Scenario 2 - Delivery ratio, 250 stations. Publication delivery ratio (y-axis, 0.7 to 1) against publication TTL (x-axis, 4 and 6) for random routing, Quasar routing with radius 2, and Quasar routing with radius 3.]


[Figure: Scenario 2 - Traffic, 250 stations. Publication messages traffic (y-axis, 70000 to 150000) against publication TTL (x-axis, 4 and 6) for random routing, Quasar routing with radius 2, and Quasar routing with radius 3.]


Scenario 2 conclusions

• This scenario is meant to represent a "general" state of the network, in which a set of publishers periodically publish to the entire network. Although Quasar appears to reduce network traffic considerably, its hit rate is noticeably lower (at least 15% lower). One possible explanation is a limitation of the Bloom filter: attenuated Bloom filters can contain false positive entries, which in our case can cause messages to be routed in the wrong direction.
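For reference, the standard approximation of a Bloom filter's false positive probability is given below. It assumes independent, uniformly distributed hash functions; the symbols m (number of bits), n (number of inserted entries) and k (number of hash functions) are introduced here for illustration, and their values in the simulation are not stated in the slides.

```latex
p \approx \left(1 - e^{-kn/m}\right)^{k}
```

Under this approximation, a filter of fixed size m gives more false positives as n grows. This would fit the explanation above if the filters in scenario 2 hold many subscriptions, though the slides do not report the filter parameters.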


[Figure: Scenario 3 - Delivery ratio, 250 stations. Publication delivery ratio (y-axis, 0.5 to 1) against publication TTL (x-axis, 4 and 6) for random routing, Quasar routing with radius 2, and Quasar routing with radius 3.]


[Figure: Scenario 3 - Traffic, 250 stations. Publication messages traffic (y-axis, 200 to 650) against publication TTL (x-axis, 4 and 6) for random routing, Quasar routing with radius 2, and Quasar routing with radius 3.]


Scenario 3 conclusions

• In this scenario, unlike the 1st scenario, the difference is much smaller. Still, looking at a single publication, it can again be observed that Quasar's routing policy, which is based on information from the neighbors and the attenuated Bloom filters, provides better routing for publish-subscribe.

• Note that random routing showed very high variance: in some runs the delivery ratio was 1, in others 0 or 0.5. Quasar produced a much more stable ratio.