Alex Shraer, Principles of Reliable Distributed Systems, Technion EE, Spring 2008
Principles of Reliable Distributed Systems
Tutorial 4: SkipNet
Spring 2008
Alex Shraer
Reading Material
• SkipNet: A Scalable Overlay Network with Practical Locality Properties. Harvey, Jones, Saroiu, Theimer, Wolman. Microsoft Research.
Reminder: DHT Advantages
• Peer-to-peer: no centralized control or infrastructure
• Scalability: O(log N) routing, routing tables, join time
• Load-balancing
• Overlay robustness
DHT Disadvantages: SkipNet Motivation
• No control over where data is stored
  – Data may be stored far from its users
  – Data may be stored outside its administrative domain
    • hard to administer privileges
    • invites different security attacks
  – Local accesses leave the local organization
• In practice, organizations want:
  – Content Locality: explicitly place data where we want (inside the organization)
  – Path Locality: guarantee that local traffic (a user in the organization looks for a file of the organization) remains local
• No prefix search
  – Search(key) returns files whose names have key as a prefix
Practical Requirements
• Data Controllability:
  – Organizations want control over their own data
  – Even if local data is globally available
• Manageability:
  – Data control allows for data administration, provisioning and manageability
Practical Requirements (cont’d)
• Security:
  – Content and path locality are key building blocks for dealing with certain external attacks (DoS, traffic analysis)
• Data availability
  – Local data survives network partitions
• Performance
  – Data can be stored near clients that use it
SkipNet Content Locality
• Place files at nodes according to names
• Name ID space (DNS-like)
  – for files and nodes
  – node name = reverse DNS name of the host (com.microsoft.host1)
  – file names have the same prefix
• Problem?
Constrained Load-Balancing
• Data uniformly distributed in a designated subset of nodes
  – e.g., inside an organization
• How can this be achieved?
• Numeric ID space!
  – similar to Chord, Pastry and others
  – nodes are randomly distributed
  – Hashes of the node names and content identifiers are mapped into the numeric ID space.
  – Content is stored on the node with the ID closest to the content’s hashed name.
• Key property of SkipNet: two address spaces
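The two address spaces can be illustrated with a minimal sketch. The slides do not name a hash function or an ID length, so SHA-1 and 8-bit IDs are assumptions here, and the helper names `name_id` and `numeric_id` are hypothetical.

```python
import hashlib

def name_id(dns_name: str) -> str:
    """Name ID: the host's DNS name with its labels reversed."""
    return ".".join(reversed(dns_name.split(".")))

def numeric_id(text: str, bits: int = 8) -> str:
    """Numeric ID: the first `bits` bits of a hash, as a binary string.
    SHA-1 and the 8-bit length are assumptions made for this sketch."""
    digest = hashlib.sha1(text.encode("utf-8")).digest()
    as_int = int.from_bytes(digest, "big")
    return format(as_int >> (len(digest) * 8 - bits), f"0{bits}b")

hosts = ["host1.microsoft.com", "www.sun.com", "host2.microsoft.com", "cs.ucb.edu"]
nodes = sorted(name_id(h) for h in hosts)

# Sorting by name ID keeps each organization's nodes adjacent,
# while the hashed numeric IDs scatter the same nodes pseudo-randomly.
for n in nodes:
    print(n, numeric_id(n))
```

The name ID order is what gives locality; the numeric ID is what spreads load.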
Skip Lists - Reminder
• In-memory dictionary data structure.
  – Sorted linked list with a subset of nodes having additional links to skip over many list elements
• Perfect (deterministic) skip list:
  – Pointer at level h skips over 2^h elements
  – Search: O(log N), where N is the number of nodes in the list
  – Insertion/deletion: expensive/awkward
Skip Lists - Reminder
• Probabilistic skip list:
  – Node at level h with probability 1/2^h
  – Search, Insert, Delete: O(log N) w.h.p.
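As a reminder of how the probabilistic variant works, here is a minimal sketch of a skip list with insert and search. The class name, the dictionary-based node layout and the `max_level` cap are choices made for this illustration, not part of the tutorial.

```python
import random

class SkipList:
    """Probabilistic skip list over comparable keys (a minimal sketch)."""

    def __init__(self, max_level: int = 16, seed: int = 0):
        self.max_level = max_level
        self.rng = random.Random(seed)
        # Head sentinel: no key, one forward pointer per level.
        self.head = {"key": None, "next": [None] * (max_level + 1)}
        self.level = 0  # highest level currently in use

    def _random_level(self) -> int:
        # A node reaches level h with probability 1/2^h.
        h = 0
        while h < self.max_level and self.rng.random() < 0.5:
            h += 1
        return h

    def _predecessors(self, key):
        # For each level, the last node whose key is smaller than `key`.
        preds = [self.head] * (self.max_level + 1)
        node = self.head
        for h in range(self.level, -1, -1):
            while node["next"][h] is not None and node["next"][h]["key"] < key:
                node = node["next"][h]
            preds[h] = node
        return preds

    def insert(self, key) -> None:
        preds = self._predecessors(key)
        nxt = preds[0]["next"][0]
        if nxt is not None and nxt["key"] == key:
            return  # already present
        h = self._random_level()
        self.level = max(self.level, h)
        node = {"key": key, "next": [None] * (h + 1)}
        for i in range(h + 1):
            # Splice the new node in after its predecessor at each level.
            node["next"][i] = preds[i]["next"][i]
            preds[i]["next"][i] = node

    def contains(self, key) -> bool:
        nxt = self._predecessors(key)[0]["next"][0]
        return nxt is not None and nxt["key"] == key

    def keys(self):
        node = self.head["next"][0]
        while node is not None:
            yield node["key"]
            node = node["next"][0]

sl = SkipList()
for k in [5, 1, 9, 3, 7]:
    sl.insert(k)
print(list(sl.keys()))
```

Note that every search starts at the head sentinel, which is exactly the "lookup starts from root only" limitation that SkipNet removes.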
Skip List: Good for Us?
• The Good:
  – Sorted list: path locality for name-based search
  – O(log N) search with skip pointers
  – Up to log(N) skip pointers: O(log N) insertion
• The Bad:
  – Lookup starts from root only
  – Unequal load
    • nodes on the top levels have a high chance of being on the routing path
SkipNet Global View
[Figure: the full SkipNet routing infrastructure for an 8 node system, including the ring labels.]
Level L = 0: Root Ring = {A, D, M, O, T, V, X, Z}
Level L = 1: Ring 0 = {A, M, T, X}; Ring 1 = {D, O, V, Z}
Level L = 2: Ring 00 = {A, T}; Ring 01 = {M, X}; Ring 10 = {O, Z}; Ring 11 = {D, V}
Level L = 3: Ring 000 to Ring 111, one node each
SkipNet Structure
• Skip Graph = Distributed Skip List
  – Every node belongs to rings at all levels
  – Search can start at any node
  – Use doubly linked lists at each level to account for the absence of head and tail nodes.
• Perfect vs. Probabilistic
  – Perfect: pointers at level h point to nodes that are exactly 2^h nodes to the left and right.
  – Probabilistic: a node at level h probabilistically determines which ring it belongs to.
• All rings are sorted according to Name IDs
• Ring membership is according to Numeric IDs
  – All nodes sharing the same prefix of Numeric IDs of length h are members of the same ring at level h
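The ring-membership rule can be sketched for the 8-node example. The first two bits of each numeric ID below follow the ring memberships of the example (Ring 00 = {A, T}, Ring 01 = {M, X}, Ring 10 = {O, Z}, Ring 11 = {D, V}); the third bit is an assumption, and the function names are hypothetical.

```python
from collections import defaultdict

# Numeric IDs for the 8-node example. The first two bits follow the ring
# memberships of the example; the third bit of each ID is an assumption.
NUMERIC_ID = {"A": "000", "T": "001", "M": "010", "X": "011",
              "Z": "100", "O": "101", "D": "110", "V": "111"}

def rings_at_level(h: int) -> dict:
    """Group nodes by the first h bits of their numeric ID.
    Each ring is kept sorted by name ID."""
    rings = defaultdict(list)
    for name in sorted(NUMERIC_ID):
        rings[NUMERIC_ID[name][:h]].append(name)
    return dict(rings)

def routing_table(node: str, levels: int = 3) -> dict:
    """Level -> (clockwise neighbor, counterclockwise neighbor) in the node's ring."""
    table = {}
    for h in range(levels):
        ring = rings_at_level(h)[NUMERIC_ID[node][:h]]
        i = ring.index(node)
        table[h] = (ring[(i + 1) % len(ring)], ring[i - 1])
    return table

print(rings_at_level(1))
print(routing_table("A"))
print(routing_table("V"))
```

The tables for A and V do not depend on the assumed third bit, since only levels 0 to 2 hold more than one node.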
SkipNet Routing Tables
[Figure: the ring diagram for levels L = 0 to L = 3 (Root Ring, Ring 0 and Ring 1, Ring 00 to Ring 11, Ring 000 to Ring 111), with node A’s routing table highlighted.]
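An Alternative View
[Figure: SkipNet nodes ordered by name ID (A, D, M, O, T, V, X, Z), each labeled with a 3-bit numeric ID (000 to 111). Routing tables of nodes A and V shown.]
Node A’s routing table (clockwise neighbor, counterclockwise neighbor):
Level 2: T, T
Level 1: M, X
Level 0: D, Z
Node V’s routing table (clockwise neighbor, counterclockwise neighbor):
Level 2: D, D
Level 1: Z, O
Level 0: X, T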
Routing By Name ID
• Routing in Skip Graph = Search in Skip Lists
• Simple Rule:
  – Forward the message to the node that is closest to the destination, without going too far.
• Route either clockwise or counterclockwise
• Terminates when the message arrives at a node whose name ID is closest to the destination.
• Number of hops is O(log N) w.h.p.
Example: Routing from A to V
[Figure, three slides: the ring diagram for levels L = 0 to L = 3 with the route from A to V highlighted step by step; the second slide shows node T’s routing table.]
Example: Routing to Object
• Route from A to F -> Terminates at E
[Figure: the ring diagram for levels L = 0 to L = 3, here with root ring A, D, E, O, T, V, X, Z.]
Name ID Routing Algorithm
SendMsg(nameID, msg) {
  // Load balancing: no common prefix with the destination,
  // so the direction is chosen at random.
  if( LongestPrefix(nameID,localNode.nameID)==0 )
    msg.dir = RandomDirection();
  // Path locality: otherwise route toward the destination name.
  else if( nameID<localNode.nameID )
    msg.dir = counterClockwise;
  else
    msg.dir = clockwise;
  msg.nameID = nameID;
  RouteByNameID(msg);
}

// Invoked at all nodes (including the source and
// destination nodes) along the routing path.
RouteByNameID(msg) {
  // Forward along the longest pointer
  // that is between us and msg.nameID.
  h = localNode.maxHeight;
  while (h >= 0) {
    nbr = localNode.RouteTable[msg.dir][h];
    if (LiesBetween(localNode.nameID, nbr.nameID, msg.nameID, msg.dir)) {
      SendToNode(msg, nbr);
      return;
    }
    h = h - 1;
  }
  // h<0 implies we are the closest node.
  DeliverMessage(msg.msg);
}
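The routing rule can be tried out on the 8-node example with a small simulation. This is a sketch with a global view of all nodes rather than a message-passing implementation; the third bit of each numeric ID is an assumption, the direction is passed in explicitly instead of being chosen by SendMsg, and `neighbor`, `lies_between` and `route_by_name` are hypothetical names.

```python
# 8-node example; the third bit of each numeric ID is an assumption.
NUMERIC_ID = {"A": "000", "T": "001", "M": "010", "X": "011",
              "Z": "100", "O": "101", "D": "110", "V": "111"}
LEVELS = 3  # levels 0..2 hold rings with more than one node

def neighbor(node, h, clockwise):
    """Neighbor of `node` in its level-h ring (ring sorted by name ID)."""
    ring = sorted(n for n in NUMERIC_ID
                  if NUMERIC_ID[n][:h] == NUMERIC_ID[node][:h])
    step = 1 if clockwise else -1
    return ring[(ring.index(node) + step) % len(ring)]

def lies_between(local, nbr, dest, clockwise):
    """True if nbr is reached no later than dest when leaving local
    in the given direction around the name ID ring."""
    if clockwise:
        if local < dest:
            return local < nbr <= dest
        return nbr > local or nbr <= dest   # interval wraps around
    if dest < local:
        return dest <= nbr < local
    return nbr < local or nbr >= dest       # interval wraps around

def route_by_name(src, dest, clockwise=True):
    """Return the list of nodes visited, ending at the delivery node."""
    path, local = [src], src
    while local != dest:
        # Try the longest pointer first, then shorter ones.
        for h in range(LEVELS - 1, -1, -1):
            nbr = neighbor(local, h, clockwise)
            if nbr != local and lies_between(local, nbr, dest, clockwise):
                local = nbr
                path.append(local)
                break
        else:
            break  # no pointer qualifies: local is the closest node
    return path

print(route_by_name("A", "V"))
print(route_by_name("A", "V", clockwise=False))
print(route_by_name("A", "F"))
```

With these IDs the clockwise route from A to V passes through T, and a message for the non-existent name F stops at the last node that does not overshoot it.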
Routing By Numeric ID
• Numeric IDs are random, no ring is sorted by them
  – We can’t route top-down!
• Bottom-up Routing
  – Routing begins in the level 0 ring and continues until a node is found whose numeric ID matches the destination numeric ID in the first digit.
  – Messages are forwarded from a ring at level h, R_h, to a ring at level h+1, R_{h+1}, such that nodes in R_{h+1} share h+1 digits with the destination numeric ID.
  – Terminates when the message is delivered, or when none of the nodes in R_h share h+1 digits with the destination numeric ID; in that case it ends at a node in R_h with the closest possible numeric ID.
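The bottom-up procedure can be sketched as a simulation over the 8-node example. Several things here are assumptions: the third bit of each numeric ID, the clockwise walk direction, and the use of absolute numeric difference as the meaning of "closest" when no further match exists. The function names are hypothetical.

```python
# 8-node example; the third bit of each numeric ID is an assumption.
NUMERIC_ID = {"A": "000", "T": "001", "M": "010", "X": "011",
              "Z": "100", "O": "101", "D": "110", "V": "111"}

def ring_of(node, h, ids):
    """Members of the node's level-h ring, sorted by name ID."""
    return sorted(n for n in ids if ids[n][:h] == ids[node][:h])

def route_by_numeric(src, dest_id, ids=NUMERIC_ID):
    """Return the nodes visited while routing toward dest_id."""
    path, local, h = [src], src, 0
    while h < len(dest_id):
        ring = ring_of(local, h, ids)
        start = ring.index(local)
        # Walk clockwise around the level-h ring, starting at the local node.
        for i in range(len(ring)):
            node = ring[(start + i) % len(ring)]
            if i > 0:
                path.append(node)        # one hop along the level-h ring
            if ids[node][h] == dest_id[h]:
                local, h = node, h + 1   # climb to the level-(h+1) ring
                break
        else:
            # Full circle without a match: finish at the ring member whose
            # numeric ID is closest (absolute difference is an assumption).
            closest = min(ring, key=lambda n: abs(int(ids[n], 2) - int(dest_id, 2)))
            if closest != path[-1]:
                path.append(closest)
            return path
    return path

print(route_by_numeric("A", "101"))
```

Because no ring is sorted by numeric ID, the walk inside each ring is a linear scan; it stays short only because about half of the ring members match the next digit.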
Example: Routing by Numeric ID
– Hash(“Foo.c”) = 101
[Figure: the ring diagram for levels L = 0 to L = 3, with node A and the object Foo.c marked.]
Routing by Numeric ID
• The same routing tables are used for routing by name ID and by numeric ID
• When Numeric IDs are binary: in each ring R_h, in expectation only 2 nodes are visited before encountering one belonging to the next ring R_{h+1}
  – The number of message hops is O(log N) w.h.p.
Base (k) for Numeric IDs
• If a higher base k>2 is used for Numeric IDs, the routing is O(k log_k N) w.h.p.
• When we increase k -> more rings in each level -> fewer levels -> fewer pointers in the routing table -> less state but more hops…
• Optimization: dense routing table (R-Table)
• Normal (sparse) R-Table + k-1 pointers to contiguous nodes in both directions at each level. More state but fewer hops
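The state-versus-hops trade-off can be made concrete with rough arithmetic. The figures below are order-of-magnitude estimates with constants dropped, not measured values; the function name and the dictionary keys are hypothetical.

```python
import math

def estimates(n_nodes: int, k: int) -> dict:
    """Rough per-node estimates for numeric IDs in base k (constants dropped)."""
    levels = math.log(n_nodes, k)
    return {
        "levels": levels,
        "sparse_pointers": 2 * levels,           # one pointer per direction per level
        "sparse_hops": k * levels,               # the O(k log_k N) bound
        "dense_pointers": 2 * (k - 1) * levels,  # k-1 pointers per direction per level
    }

e2 = estimates(4096, 2)
e8 = estimates(4096, 8)
for name, e in (("k=2", e2), ("k=8", e8)):
    print(name, {key: round(value, 1) for key, value in e.items()})
```

For N = 4096 the estimate gives about 12 levels for k = 2 and about 4 levels for k = 8: the sparse table shrinks, while the k log_k N hop bound grows from about 24 to about 32.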
Node Join
• Two-stage process: (1) bottom-up + (2) top-down
• Bottom-up: find the top level ring that matches the node’s numeric ID.
• Top-down: build the new node’s routing table
  – Find a neighbor in the top ring using name ID search.
  – Starting from this neighbor, search for the name ID at the next lower level and thus find neighbors at the lower level.
  – Repeat until the search reaches the root.
• Update of the existing nodes’ routing tables:
  – after the new node has joined the root ring.
Node join illustrated
[Figure: Ring P and its child rings Ring P0 and Ring P1, with the joining node marked; annotation: "Only a few in expectation".]
Node Join - Analysis
• Key ideas:
  – Climb to a weakly populated ring.
  – Search for the node’s neighbors at the lower levels only after finding the neighbors at the higher levels.
  – The range of traversed nodes at a level = the range of neighbors at the next higher level.
• Insertion traverses O(log N) hops w.h.p.
  – Expected O(log N) levels, constant number of neighbors at each level.
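The two stages of a join can be sketched with a global view of the system. This is a simplification: a real node would locate its neighbors by sending search messages, while here they are found with a binary search over each ring. The example assumes the 8-node system with M missing and M joining with numeric ID 010 (the third bit of every ID is an assumption); `top_level` and `join` are hypothetical names.

```python
import bisect

# The 8-node example without M. Third bit of each numeric ID is an assumption.
existing = {"A": "000", "T": "001", "X": "011",
            "Z": "100", "O": "101", "D": "110", "V": "111"}

def top_level(new_id: str, ids: dict) -> int:
    """Bottom-up stage: the highest level whose ring (numeric ID prefix
    shared with the new node) still contains an existing node."""
    h = 0
    while h < len(new_id) and any(v[:h + 1] == new_id[:h + 1] for v in ids.values()):
        h += 1
    return h

def join(new_name: str, new_id: str, ids: dict) -> dict:
    """Top-down stage: choose (clockwise, counterclockwise) neighbors in
    each ring, from the top ring down to the root ring."""
    table = {}
    for h in range(top_level(new_id, ids), -1, -1):
        ring = sorted(n for n, v in ids.items() if v[:h] == new_id[:h])
        i = bisect.bisect_left(ring, new_name)   # position by name ID
        table[h] = (ring[i % len(ring)], ring[i - 1])
    ids[new_name] = new_id  # existing nodes would update their pointers now
    return table

table = join("M", "010", existing)
print(table)
```

The top ring found for M contains only X, which reflects the "climb to a weakly populated ring" idea: the first name search happens where there are few nodes to examine.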
Node Departure/Failure
• Graceful (notified) vs. crash departure
• Key issue: updating the routing tables
• Key idea: separate vital info from optimizations
  – Routing is correct as long as the root level ring is maintained.
  – Other levels are regarded as optimization hints
  – Does this remind you of something?
• Upper-ring membership is maintained through a background repair process.
Leaf Sets
• Idea = use redundant pointers at level 0:
  – Store L/2 pointers in each direction
  – SkipNet uses L=16
  – Not an original SkipNet idea: used in Pastry.
• Protect from independent failures
• Improve the search performance
  – route directly using the leaf set once within L/2 of the target
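A leaf set can be sketched as two short lists of root-ring neighbors. The 40 node names below are made up for the illustration, the sketch assumes the ring has more than L nodes, and `leaf_set` and `direct_hop` are hypothetical names.

```python
def leaf_set(node, root_ring, size=16):
    """Return (clockwise, counterclockwise) lists of size/2 root-ring
    neighbors each. Assumes the ring has more than `size` nodes."""
    i, n, half = root_ring.index(node), len(root_ring), size // 2
    clockwise = [root_ring[(i + d) % n] for d in range(1, half + 1)]
    counter = [root_ring[(i - d) % n] for d in range(1, half + 1)]
    return clockwise, counter

def direct_hop(node, dest, root_ring, size=16):
    """Return dest if it can be reached directly through the leaf set,
    otherwise None (the caller falls back to normal routing)."""
    clockwise, counter = leaf_set(node, root_ring, size)
    return dest if dest in clockwise or dest in counter else None

# Hypothetical root ring of 40 nodes, already sorted by name ID.
ring = [f"node{i:02d}" for i in range(40)]
cw, ccw = leaf_set("node00", ring)
print(cw)
print(ccw)
print(direct_hop("node00", "node05", ring), direct_hop("node00", "node20", ring))
```

Keeping several pointers per direction means that a single failed neighbor does not disconnect the root ring, which is the level that routing correctness depends on.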
Constrained Load Balancing (CLB)
• Multiple DHTs with differing scopes using a single SkipNet structure
  – A result of the ability to route in both address spaces
• Divide data object names into two parts with "!": CLB Domain ! CLB Suffix
  – Example: microsoft.com!skipnet.html
  – CLB Domain (microsoft.com): name routing; CLB Suffix (skipnet.html): numeric routing
• microsoft.com/skipnet.html! : controlled placement
• !microsoft.com/skipnet.html : global DHT
CLB Example
• File ID = “com.microsoft!skipnet.html”
  – Route by name ID to com.microsoft
  – Inside com.microsoft, route by numeric ID to hash(“skipnet.html”)
[Figure: name ID ring with the segments com.sun, edu.ucb, gov.irs and com.microsoft; skipnet.html is placed inside com.microsoft.]
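The two-step placement can be sketched as follows. Only the "domain!suffix" and "!suffix" forms are handled; the hash function (SHA-1), the 16-bit ID length, the node names, and absolute difference as the notion of "closest" are all assumptions made for this illustration, and the function names are hypothetical.

```python
import hashlib

def numeric_id(text: str, bits: int = 16) -> int:
    """First `bits` bits of SHA-1 as an integer (hash choice is an assumption)."""
    digest = hashlib.sha1(text.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") >> (len(digest) * 8 - bits)

def in_domain(node_name: str, domain: str) -> bool:
    """An empty domain means the whole system (global DHT)."""
    return domain == "" or node_name == domain or node_name.startswith(domain + ".")

def place(object_name: str, node_names: list) -> str:
    """Pick the storage node for a name of the form 'domain!suffix'."""
    domain, _, suffix = object_name.partition("!")
    # Step 1 (name ID routing): restrict to the nodes of the CLB domain.
    candidates = [n for n in node_names if in_domain(n, domain)]
    # Step 2 (numeric ID routing): closest numeric ID to the hashed suffix.
    target = numeric_id(suffix)
    return min(candidates, key=lambda n: abs(numeric_id(n) - target))

nodes = ["com.microsoft.host1", "com.microsoft.host2", "com.sun.www",
         "edu.ucb.cs", "gov.irs.tax"]
constrained = place("com.microsoft!skipnet.html", nodes)
unconstrained = place("!microsoft.com/skipnet.html", nodes)
print(constrained, unconstrained)
```

The constrained name always lands on a com.microsoft node, whichever one the hash selects, while the name with an empty domain may land on any node in the system.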
SkipNet Path Locality
• Organizations correspond to contiguous SkipNet segments
  – Internal routing by NameID remains internal
• Nodes have left / right pointers
[Figure: name ID ring with the segments com.sun, edu.ucb, gov.irs and com.microsoft, with com.microsoft.research inside com.microsoft.]
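The contiguity claim is easy to check on a small ring. The host names below are made up for the illustration and `segment` is a hypothetical helper.

```python
def segment(ring, prefix):
    """Indices of the nodes in the sorted ring whose name ID starts with prefix."""
    return [i for i, name in enumerate(ring) if name.startswith(prefix)]

def is_contiguous(indices):
    return indices == list(range(indices[0], indices[-1] + 1))

# Hypothetical name IDs, sorted as they would be in the root ring.
ring = sorted([
    "com.microsoft.host1",
    "com.microsoft.research.r1",
    "com.microsoft.research.r2",
    "com.sun.www",
    "edu.ucb.cs",
    "gov.irs.tax",
])

org = segment(ring, "com.microsoft.")
sub_org = segment(ring, "com.microsoft.research.")
print(org, is_contiguous(org))
print(sub_org, is_contiguous(sub_org))
```

Since every name lying between two names with a common prefix also carries that prefix, a name ID route between two nodes of an organization that never overshoots its destination only visits nodes of that organization.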