Herald: Achieving a Global Event Notification Service
Marvin Theimer
Joint work with Michael B. Jones, Helen Wang, Alec Wolman
Microsoft Research
Redmond, Washington

What this Talk will be About
• Event notification: more than just simple multicast.
• Problems you run into at large scale.
• Herald event notification service: Design criteria and initial solution strategies.
• Some results on overlay networks.

Event Notification: It’s more than just Simple Multicast
• Additional types of functionality:
  – Reliability/Persistence
  – QoS
  – Security
  – Richer publish/subscribe semantics
• Herald will be exploring a subset of the potential design space.

Multicast + Reliability/Persistence
• Unreliable multicast
• Reliable, un-ordered multicast
• Ordered multicast
• Persistent multicast
• Multicast with history
• Persistent, ordered multicast
• Persistent, ordered multicast with history

Multicast + QoS
• Best-effort multicast
• Bounded delivery time multicast
• Simultaneous delivery time multicast

Multicast + Security
• Unsecured multicast
• Integrity of messages, authorized senders
• Confidentiality of messages, authorized receivers
• Federated forwarders
• Revocation
Diagram axes: Sender issues, Receiver issues, Forwarder issues

Richer Publish/Subscribe Semantics
• Unfiltered, single event topics
• Filtered subscriptions, e.g. event.type==exception or max delivery = 1 msg/sec
• Subscription patterns, e.g. machines/*/cpu-load
• Topic composition, e.g. a | (b&c)
• Standing DB queries, e.g. if ServerFarm.VirusAlarm then Select machine where CombinedLoad(machine.CpuLoad, machine.NetLoad)>0.9 and Contains(machine.Processes, WebServer)
Diagram axes: Broaden subscription, Narrow subscription
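As a rough illustration of the first three refinements on this slide, the sketch below checks an event against a filter, a subscription pattern, and a topic composition. The function names and the dictionary-based event shape are assumptions made for this example; the slides do not specify a Herald API for any of them.

```python
from fnmatch import fnmatchcase


def passes_filter(event: dict) -> bool:
    """Filtered subscription from the slide: event.type == exception."""
    return event.get("type") == "exception"


def matches_pattern(pattern: str, topic: str) -> bool:
    """Subscription pattern such as machines/*/cpu-load.

    Topics are compared one path segment at a time, so that '*' stands for
    exactly one segment (an assumption; the slide does not define '*').
    """
    p_parts, t_parts = pattern.split("/"), topic.split("/")
    return len(p_parts) == len(t_parts) and all(
        fnmatchcase(t, p) for p, t in zip(p_parts, t_parts)
    )


def composed(active_topics: set) -> bool:
    """Topic composition a | (b & c) over the set of topics that have fired."""
    return "a" in active_topics or {"b", "c"} <= active_topics
```

Each step narrows or broadens what a subscriber receives without changing how events are delivered, which is why such semantics can be layered above a simple delivery core.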

Global Event Notification Services
• Local-node and local-area event notification have well-established user communities.
• Communication via event notification is also well-suited
  – for Internet-scale distributed applications (e.g. instant messaging and multi-player games)
  – for loosely-coupled eCommerce applications
• General event notification systems currently:
  – scale to tens of thousands of clients
  – do not have global reach

Internet Scalability is Hard
• Internet scale implies that individual systems can never have complete information:
  – Information will be unavailable, stale or even inaccurate
• Some participants will always be unreachable
  – Partitions, disconnection, and node downtime will always be occurring somewhere
• There will (probably) not be a single organization that owns the entire event notification infrastructure. Hence a federated design is required.

Scaling to “Extreme” Event Notification
• 10^11 computers and embedded systems in the world (soon):
  – > 10^11 event topics
  – > 10^11 publishers & subscribers in aggregate
• Global audiences:
  – 10^10 subscribers for some event topics
• “Trust no one”:
  – Potentially 10^10 federated security domains

Some Event Notification Semantics Don’t Scale Well
• Fully general subscription semantics
• Simultaneous delivery times
• Highly available, reliable, ordered events from multiple publishers

Global Event Notification is Hard to Validate
• What kinds of workloads?
  – Instant messaging++?
  – Billions of inter-connected gizmos (BIG)?
  – eCommerce?
• How do you explore and validate a design that should scale to tens of billions?
  – Rule of thumb: every order of magnitude results in qualitatively different system behavior

No Good Means of Exploration/Validation
• Build a real system
  – Most accurate approach; lets you discover the unexpected.
  – Current test beds only go up to a few thousand nodes, but can virtualize to get another factor of 5-10.
• Simulation:
  – Detailed (non-distributed) simulators don’t scale beyond a few thousand nodes; plus they only model the effects you’re already aware of.
  – Crude (non-distributed) simulators scale to about a million nodes; but they ignore many important factors.
  – Distributed simulators: Are accurate ones easier to build or more scalable than a real system?

Herald: Scalable Event Notification as an Infrastructure Service
• The design space of interesting choices is huge.
• Focus on the scalability of basic message delivery and distributed state management capabilities first (i.e. bottom-up approach)
  – Employ a very simple message-oriented design.
  – Try to layer richer event notification semantics on top.
  – Explore the trade-off between scalability and various forms of semantic richness.

Herald Event Notification Model
[Diagram: a Creator, a Publisher, and a Subscriber interact with an Event Topic inside the Herald Service.]
1: Create Event Topic
2: Subscribe
3: Publish
4: Notify
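The four numbered interactions in the diagram can be mimicked with a toy, single-process stand-in. The class name `EventService` and its method names are hypothetical, chosen only to mirror the labels on the slide; nothing here reflects the actual Herald interfaces, distribution, or security model.

```python
from typing import Callable, Dict, List

Callback = Callable[[str, object], None]


class EventService:
    """In-process stand-in for the service in the diagram (hypothetical API)."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, List[Callback]] = {}

    def create_topic(self, topic: str) -> None:
        # Step 1: a creator establishes the event topic.
        self._subscribers.setdefault(topic, [])

    def subscribe(self, topic: str, callback: Callback) -> None:
        # Step 2: raises KeyError if the topic was never created.
        self._subscribers[topic].append(callback)

    def publish(self, topic: str, event: object) -> int:
        # Step 3: a publisher hands an event to the service.
        delivered = 0
        for callback in self._subscribers[topic]:
            callback(topic, event)  # Step 4: notify each subscriber.
            delivered += 1
        return delivered


received = []
service = EventService()
service.create_topic("machines/web01/cpu-load")
service.subscribe("machines/web01/cpu-load", lambda t, e: received.append((t, e)))
delivered_count = service.publish("machines/web01/cpu-load", 0.93)
```

Note that creator, publisher, and subscriber are separate roles: the party that creates a topic need not be the one that publishes to it.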

Design Criteria
• The “usual” criteria:
  – Scalability
  – Resilience
  – Self-administration
  – Timeliness
• Additional criteria:
  – Heterogeneous federation
  – Security
  – Support for disconnected and partitioned operation

Heterogeneous Federation
• Federation of machines within cooperating but mutually suspicious domains of trust
• Federated parties may include both small and large domains
  – Can we support fully peer-to-peer models?
  – How will “lumpy” systems, containing both small and large domains, behave?

Security
• What’s the right threat model?
  – Trust all nodes within an administrative domain?
  – “Trust no one”?
• Dealing with large numbers:
  – Managing large numbers of access control groups.
  – Revocation in the face of high change volumes.
• How should anonymity and privacy be treated and supported?

Support for Disconnected and Partitioned Operation
• Capabilities wanted by some applications:
  – Eventual delivery to disconnected subscribers.
  – Event histories to allow a posteriori examination of bounded regions of the past.
  – Continued operation on both sides of a network partition.
  – Eventual (out-of-order) delivery after partition healing.
• What’s the cost of supporting disconnected and partitioned operation and who should pay it?
  – Who maintains event histories?
  – How do dissemination trees get reconfigured when partitions and partition healings occur?
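One way to picture a bounded event history that supports eventual delivery to a reconnecting subscriber is a fixed-capacity log with sequence numbers. This is a minimal sketch under assumed names (`EventHistory`, `replay_after`); the slides leave open who maintains such histories and how they are bounded.

```python
from collections import deque
from typing import Deque, List, Tuple


class EventHistory:
    """Keeps the most recent `capacity` events of one topic, numbered from 1."""

    def __init__(self, capacity: int) -> None:
        self._events: Deque[Tuple[int, object]] = deque(maxlen=capacity)
        self._next_seq = 1

    def append(self, event: object) -> int:
        seq = self._next_seq
        self._events.append((seq, event))  # the deque drops the oldest entry when full
        self._next_seq += 1
        return seq

    def replay_after(self, last_seen: int) -> List[Tuple[int, object]]:
        """Events a reconnecting subscriber missed, oldest first."""
        return [(seq, event) for seq, event in self._events if seq > last_seen]
```

A subscriber that was disconnected for longer than the history covers sees a gap in the sequence numbers, which makes the cost question on the slide concrete: a larger capacity means fewer gaps but more state held by whoever maintains the history.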

Non-Goals
• Developing the “best” way to do:
  – Naming
  – Filtering
  – Complex subscription queries
• Eventually want to support filtering and complex subscription queries.
• Ideally, Herald should be agnostic to the choices made for these topics.

Initial Solution Strategies
• Keep it simple:
  – Small set of core mechanisms.
  – Everything else driven with policy modules.
• Peer-to-peer network of managed servers
  – Servers are spread around the “edge” of the Internet.
  – Leverage smart clients when possible.

Keep it Simple
• We believe we only need these mechanisms:
  – State replication and transfer
  – Overlay distribution networks
  – Time contracts (to age & discard state)
  – Event histories
  – (Non-scalable) administrative event topics
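The slide describes time contracts only as a way to age and discard state. A common way to realize that idea is lease-style soft state, sketched below; the `LeaseTable` class, its method names, and the explicit `now` argument are all assumptions for illustration, not Herald's design.

```python
from typing import Dict, List


class LeaseTable:
    """Soft state with expiry times; the caller supplies the clock value."""

    def __init__(self) -> None:
        self._expires_at: Dict[str, float] = {}

    def renew(self, key: str, now: float, duration: float) -> None:
        # Creating and refreshing a contract are the same operation.
        self._expires_at[key] = now + duration

    def expire(self, now: float) -> List[str]:
        """Discard and return every entry whose contract has run out."""
        dead = sorted(k for k, t in self._expires_at.items() if t <= now)
        for key in dead:
            del self._expires_at[key]
        return dead

    def __contains__(self, key: str) -> bool:
        return key in self._expires_at
```

State that nobody renews disappears on its own, so a server never has to be told that a subscriber or peer has gone away for good.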

Distributed Systems Alternatives
• Centralized mega-services don’t meet our design criteria.
• Peer-to-peer Internet clients make poor service providers:
  – Limited bandwidth links.
  – Intermittently available.
  – Trust issues.
• Peer-to-peer network of managed servers:
  – Avoid the pitfalls of using clients as servers.
  – Can reflect administrative boundaries.
  – But still want to be able to take advantage of client capabilities/resources.

Project Status
• Have focused mostly on peer-to-peer overlay networks so far:
  – Scalable topic name service.
  – Event distribution algorithms:
    + Broadcasting over per-topic overlays versus event forwarding trees.
    + Pastry versus CAN overlay networks.
• Prototyped using MSR Cambridge network simulator.

Next Steps
• Build a full Herald implementation
• Run implementation on test bed of real machines
  – Use Farsite cluster of ~300 machines
  – Eventually look for larger, more distributed test beds.
• Tackle federation & security issues
• Understand behavior under a variety of environmental and workload scenarios

An Evaluation of Scalable Application-Level Multicast Built Using Peer-To-Peer Overlay Networks
Miguel Castro, Michael B. Jones, Anne-Marie Kermarrec, Antony Rowstron, Marvin Theimer, Helen Wang, Alec Wolman
Microsoft Research
Cambridge, England and Redmond, Washington

Scalable Application-Level Multicast
• Peer-to-peer overlay networks such as CAN, Chord, Pastry, and Tapestry can be used to implement scalable application-level multicast.
• Two approaches:
  – Build multicast tree per group
    + Deliver messages by routing over tree
  – Build separate overlay network per group
    + Deliver messages by intelligently flooding to entire group
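The difference between the two approaches shows up in how many messages one multicast costs. The sketch below is deliberately naive: the tree is given as a parent-to-children map, and the flood simply forwards to every neighbour except the one a message arrived from. The "intelligent" flooding referred to on the slide suppresses more duplicates than this; the graph shapes and function names are assumptions for illustration.

```python
from collections import deque
from typing import Dict, List, Optional, Set, Tuple

Graph = Dict[str, List[str]]


def deliver_over_tree(children: Graph, root: str) -> Tuple[Set[str], int]:
    """Forward one message down a per-group tree: (nodes reached, messages sent)."""
    reached, sent, queue = {root}, 0, deque([root])
    while queue:
        node = queue.popleft()
        for child in children.get(node, []):
            sent += 1
            reached.add(child)
            queue.append(child)
    return reached, sent


def flood(neighbors: Graph, source: str) -> Tuple[Set[str], int, int]:
    """Naive flood over a per-group overlay: (reached, messages sent, duplicates)."""
    reached, sent, duplicates = {source}, 0, 0
    queue: deque = deque([(source, None)])
    while queue:
        node, heard_from = queue.popleft()
        for peer in neighbors[node]:
            if peer == heard_from:
                continue  # never echo a message back to its sender
            sent += 1
            if peer in reached:
                duplicates += 1  # peer already has the message
            else:
                reached.add(peer)
                queue.append((peer, node))
    return reached, sent, duplicates


ring = {"a": ["b", "d"], "b": ["a", "c"], "c": ["b", "d"], "d": ["a", "c"]}
tree = {"a": ["b", "d"], "b": ["c"]}
tree_result = deliver_over_tree(tree, "a")
flood_result = flood(ring, "a")
```

On this four-node example the tree reaches everyone with three messages, while the naive flood sends five, two of them duplicates.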

Classes of Overlay Networks
• Chord, Pastry and Tapestry all use a form of generalized hypercube routing with longest-prefix address matching.
  – O(Log(N)) routing state; O(Log(N)) routing hops.
• CAN uses a numerical distance metric to route through a Cartesian hyper-space.
  – O(D) routing state; O(N^(1/D)) routing hops.
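To get a feel for the two asymptotic bounds, the following sketch evaluates the hop estimates usually quoted in the original Pastry and CAN papers: ceil(log base 2^b of N) for prefix routing and (D/4)·N^(1/D) as the average CAN path length. These constants come from those papers, not from the slides, and the default b = 4 is an assumed typical setting.

```python
import math


def prefix_routing_hops(n: int, b: int = 4) -> int:
    """ceil(log_{2^b} n): hop bound commonly quoted for Pastry-style prefix routing."""
    return math.ceil(math.log(n, 2 ** b))


def can_routing_hops(n: int, d: int) -> float:
    """(d / 4) * n^(1/d): average path length commonly quoted for a d-dimensional CAN."""
    return (d / 4) * n ** (1 / d)


pastry_estimate = prefix_routing_hops(80_000)
can_estimate_2d = can_routing_hops(10_000, 2)
can_estimate_10d = can_routing_hops(80_000, 10)
```

Raising D shortens CAN paths but enlarges per-node routing state, which is the knob behind the "routing table size" axis in the CAN results later in the talk.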

Observation
• Approach to multicast is independent of overlay network choice.
• Possible to perform a head-to-head comparison of flooding versus tree-based multicast on both styles of overlays.

What Should One Use?
• Evaluate:
  – Forwarding tree versus flooding multicast approaches
  – On both CAN and Pastry
• On the same simulation platform
• Running the same experiments

Simulation Platform
• Packet-level discrete event simulator.
• Counts the number of packets sent over each physical link and assigns a constant delay to each link.
• Does not model queuing delays or packet losses.
• Georgia Tech transit/stub network topology with 5050 core router nodes.
• Overlay nodes attached to the routers via LAN links.

Experiments
• Two sets of experiments:
  – Single multicast group with all overlay nodes as members (80,000 total).
  – 1500 multicast groups with a range of membership sizes and locality characteristics
    + Zipf-like distribution governing membership size.
    + Both uniform and Zipf-like distributions governing locality of members.
• Each experiment had 2 phases:
  – Members subscribe to groups.
  – One message multicast to each group.
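A Zipf-like membership distribution means that a few groups are very large and most are small. The sketch below generates such sizes by rank; the exponent, the minimum size, and the function name are hypothetical, since the slide does not give the parameters used in the experiments.

```python
from typing import List


def zipf_like_group_sizes(
    num_groups: int, max_size: int, exponent: float = 1.0, min_size: int = 1
) -> List[int]:
    """Size of the group with rank r is roughly max_size * r^(-exponent)."""
    return [
        max(min_size, round(max_size * rank ** -exponent))
        for rank in range(1, num_groups + 1)
    ]


sizes = zipf_like_group_sizes(num_groups=1500, max_size=80_000)
```

With these assumed parameters the largest group contains all 80,000 nodes, the second half of them, and the smallest a few dozen.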

Evaluation Criteria
• Relative delay penalty
  – RMD: max delay_app-mcast / max delay_ip-mcast
  – RAD: avg delay_app-mcast / avg delay_ip-mcast
• Link stress
• Node stress
  – Number of routing table entries
  – Number of forwarding tree table entries
• Duplicates
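The delay and link-stress criteria reduce to simple ratios and counts. The sketch below computes them from per-receiver delays and per-message link paths; the input shapes and function names are assumptions, and averaging link stress over only the links that carried traffic is one possible convention, not necessarily the one used in the study.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

Link = Tuple[str, str]


def relative_delay_penalties(
    app_delay: Dict[str, float], ip_delay: Dict[str, float]
) -> Tuple[float, float]:
    """Return (RMD, RAD) for one multicast over the same set of receivers."""
    receivers = list(ip_delay)
    rmd = max(app_delay[r] for r in receivers) / max(ip_delay[r] for r in receivers)
    avg_app = sum(app_delay[r] for r in receivers) / len(receivers)
    avg_ip = sum(ip_delay[r] for r in receivers) / len(receivers)
    return rmd, avg_app / avg_ip


def link_stress(paths: Iterable[List[Link]]) -> Tuple[int, float]:
    """Max and average packets per physical link, over links that carried traffic."""
    counts = Counter(link for path in paths for link in path)
    return max(counts.values()), sum(counts.values()) / len(counts)


rmd, rad = relative_delay_penalties({"x": 30.0, "y": 40.0}, {"x": 10.0, "y": 20.0})
max_stress, avg_stress = link_stress(
    [[("r1", "r2"), ("r2", "r3")], [("r1", "r2")]]
)
```

An RMD or RAD of 1.0 would mean the application-level multicast is as fast as IP multicast; a link stress of 1 would mean no physical link carries the same message twice.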

RDP for Flooding on CAN
[Chart: Relative Delay Penalty (RMD and RAD) versus routing table size (19, 29, 38, 59, 111).]

RDP for Trees on CAN
[Chart: Relative Delay Penalty (RMD and RAD) versus routing table size (19, 29, 38, 59, 111).]

Link Stress with CAN

Flooding
State size      29        38        59        111
Join phase
  Max           134378    143454    398174    480926
  Average       182       217       276       424
Flood phase
  Max           1897      1343      958       652
  Average       4.13      3.35      3.06      2.94

Tree-based
State size      29        38        59        111
Mcast phase
  Max           219       178       188       202
  Average       1.52      1.42      1.35      1.31

Tree-Per-Group vs. Overlay-Per-Group
• Tree-based multicast approach consistently outperforms flooding approach.
• Biggest disadvantage of flooding is cost of constructing a new overlay for each multicast group.
• Results consistent for both Pastry and CAN.
• But the flooding approach doesn’t require trusting third-party intermediary nodes.

CAN vs. Pastry for Tree-Based Multicast
• Equivalent per-node routing table state, with representative tuned settings:
  – RDP values are 20% to 50% better with Pastry than with CAN
  – Comparable average link stress values; but max. link stress was twice as high for Pastry as for CAN
  – Max. number of forwarding tree entries was about three times as high for Pastry as for CAN.
• Pastry can be “de-tuned” to provide comparable RDP values to CAN in exchange for comparable costs.
• Conclusion: Pastry can optionally provide higher performance than CAN but at a higher cost.

Routing Choice Observations
• Pastry employs a single mechanism to obtain good routing choices: greedy selection of routing table alternatives based on network proximity
  – Easier to implement
  – Easier to tune
• CAN employs a multitude of different mechanisms:
  – Multiple dimensions
  – Multiple realities
  – Greedy routing based on network proximity
  – Overloaded zones
  – Uniformly distributed zones
  – Topologically aware assignment of zone addresses (most useful)
• CAN is more difficult to implement and adjust for alternate uses (such as multicast)
• CAN is harder to tune

Programming Model Observations
• Pastry model simple:
  – Unicast delivery to node with nearest ID in ID space
  – Can send to a specific node with a known ID
• CAN model more complex:
  – Anycast delivery to sets of nodes inhabiting same regions of hypercube or hypercubes
  – Requires application-level coordination

Topologically Aware Address Assignment
• Topological assignment
  – Choosing node IDs or regions based on topology
• Two methods:
  – Landmark-based (CAN only)
  – Transit/stub aware clustering
• Hard to get right for CAN
• Helps CAN a lot, both for unicast and multicast
• Helps Pastry-based flooding; hurts Pastry-based forwarding trees

Summary and Conclusions
• Embedded forwarding trees are preferable to flooding of mini-overlay networks.
• Pastry is capable of lower delays than CAN, but may incur higher maximal costs; Pastry can be tuned to offer equivalent delays to CAN at equivalent costs.
• Topologically aware assignment:
  – Most important optimization for CAN.
  – Mixed results for Pastry.
• CAN:
  – More complex programming model.
  – Difficult to tune.

Future Work
• Behavior with respect to fault tolerance.
• Better understanding of topologically aware address assignment.

Related Work
• Non-global event notification systems
  – Gryphon, Ready, Siena, …
• Netnews
• P2P systems
  – Gnutella, Farsite, …
• Overlay & multicast networks
  – CAN, Chord, Pastry, Tapestry, Scribe, …
• Content Distribution Networks (CDNs)
  – Akamai, …
• OceanStore

Some Useful Pointers
• Talk with me:
  – Marvin Theimer - [email protected]
  – http://research.microsoft.com/~theimer/
• Herald Scalable Event Notification Project
  – http://research.microsoft.com/sn/Herald/
• Peer-to-peer measurements
  – A Measurement Study of Peer-to-Peer File Sharing Systems (UW)
• Overlay network with dominant control traffic
  – Resilient Overlay Networks (MIT)
• Scalability limits of ISIS/Horus
  – “Bimodal Multicast” (Cornell); August 1999 TOCS

Some Useful Pointers (cont)
• CAN - ICIR (formerly ACIRI)
  – A Scalable Content-Addressable Network
• Chord - MIT
  – Chord: A Scalable Peer-to-peer Lookup Service for Internet Applications
• Pastry - MSR Cambridge & Rice
  – Storage management and caching in PAST, a large-scale, persistent peer-to-peer storage utility
• Tapestry - Berkeley
  – Tapestry: An Infrastructure for Fault-tolerant Wide-area Location and Routing
• End-System Multicast - CMU
  – A Case For End System Multicast