A Framework for Highly-Available Real-Time Internet Services Bhaskaran Raman, EECS, U.C.Berkeley.
A Framework for Highly-Available
Real-Time Internet Services
Bhaskaran Raman, EECS, U.C. Berkeley
Problem Statement
• Real-time services with long-lived sessions
• Need to provide continued service in the face of failures
[Figure: real-time services (a video smoothing proxy, a transcoding proxy) sit between a content server and clients across the Internet]
Problem Statement
• Goals:
1. Quick recovery
2. Scalability
[Figure: a client session with a video-on-demand server; monitoring must detect both network path failure and service failure, with fail-over to a service replica]
Our Approach
• Service infrastructure
– Computer clusters deployed at several points on the Internet
– For service replication & path monitoring
– (path = path of the data stream in a client session)
Summary of Related Work
• Fail-over within a single cluster
– Active Services (e.g., video transcoding)
– TACC (web proxy)
– Only service failure is handled
• Web mirror selection
– E.g., SPAND
– Does not handle failure during a session
• Fail-over for network paths
– Internet route recovery
• Not quick enough
– ATM, telephone networks, MPLS
• No mechanism for the wide-area Internet
No architecture to address network path failure during a real-time session
Research Challenges
• Wide-area monitoring
– Feasibility: how quickly and reliably can failures be detected?
– Efficiency: per-session monitoring would impose significant overhead
• Architecture
– Who monitors whom?
– How are services replicated?
– What is the mechanism for fail-over?
Is Wide-Area Monitoring Feasible?
Monitoring for liveness of path using keep-alive heartbeat
[Figure: two heartbeat timelines. Top: a real failure, detected when no heartbeat arrives within the timeout period. Bottom: a false positive, where a failure is detected incorrectly because heartbeats were merely delayed or lost]
There is a trade-off between time-to-detection and the rate of false positives.
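The timeout-based detection logic above can be sketched as follows (a minimal illustration, not the talk's implementation; function and variable names are made up):

```python
def detect_failure(heartbeat_times, timeout, now):
    """Declare a path failure if no heartbeat has arrived within `timeout`
    seconds of `now`. A larger timeout means fewer false positives but a
    longer time-to-detection -- the trade-off the slide describes."""
    if not heartbeat_times:
        return False
    return (now - heartbeat_times[-1]) > timeout

# Heartbeats every 300 ms, then silence after t = 0.9 s:
arrivals = [0.0, 0.3, 0.6, 0.9]
assert detect_failure(arrivals, timeout=2.0, now=1.5) == False  # still within timeout
assert detect_failure(arrivals, timeout=2.0, now=3.5) == True   # 2.6 s of silence
```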
Is Wide-Area Monitoring Feasible?
• False positives due to:
– Simultaneous losses
– Sudden increase in RTT
• Related studies:
– Internet RTT study, Acharya & Saltz, UMD 1996
• RTT spikes are isolated
– TCP RTO study, Allman & Paxson, SIGCOMM 1999
• Significant RTT increase is quite transient
• Our experiments:
– Ping data from ping servers
– UDP heartbeats between Internet hosts
Ping measurements
• Ping servers
– 12 geographically distributed servers chosen
– Approximation of a keep-alive stream
• Count the number of loss runs with > 4 simultaneous losses
– Could be an actual failure or just intermittent losses
– If we have 1-second HBs and time out after losing 4 HBs
– This count gives an upper bound on the number of false positives
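The counting rule above can be sketched as a pass over a per-second loss trace (an illustrative sketch; the function name and trace are invented, not from the measurement code):

```python
def count_long_loss_runs(received, threshold=4):
    """Count maximal runs of consecutive lost probes longer than `threshold`.

    `received` is a per-probe boolean trace (True = reply arrived). Each run
    of more than `threshold` losses would have triggered the timeout, so the
    count is an upper bound on false positives: some of the runs may have
    been genuine failures rather than intermittent loss.
    """
    count, run = 0, 0
    for ok in received:
        run = 0 if ok else run + 1
        if run == threshold + 1:  # the run has just exceeded the threshold
            count += 1            # count each qualifying run exactly once
    return count

trace = [True] * 10 + [False] * 5 + [True] * 3 + [False] * 2 + [True] * 4
assert count_long_loss_runs(trace) == 1  # only the 5-loss run exceeds 4
```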
[Figure: measurement setup. (1) The Berkeley host issues an HTTP request to a ping server; (2) the ping server sends ICMP pings to an Internet host; (3) the results are returned to Berkeley]
Ping measurements
| Ping server | Ping host | Total time (h:mm:ss) | > 4 misses |
|---|---|---|---|
| cgi.shellnet.co.uk | hnets.uia.ac.be | 15:14:14 | 12 |
| hnets.uia.ac.be | sites.inka.de | 20:55:58 | 29 |
| cgi.shellnet.co.uk | hnets.uia.ac.be | 14:53:14 | 0 |
| www.hit.net | d18183f47.rochester.rr.com | 20:27:32 | 10 |
| d18183f47.rochester.rr.com | www.his.com | 17:30:31 | 0 |
| www.hit.net | www.his.com | 20:28:05 | 2 |
| www.csu.net | zeus.lyceum.com | 20:01:15 | 17 |
| zeus.lyceum.com | www.atmos.albany.edu | 20:01:17 | 62 |
| www.csu.net | www.atmos.albany.edu | 20:01:18 | 7 |
| www.interworld.net | www.interlog.com | 14:19:44 | 6 |
| www.interlog.com | inoc.iserv.net | 20:32:07 | 24 |
| www.interworld.net | inoc.iserv.net | 14:39:56 | 0 |
UDP-based keep-alive stream
• Geographically distributed hosts:
– Berkeley, Stanford, UIUC, TU-Berlin, UNSW
• UDP heartbeat every 300 ms
• Measure gaps between receipt of successive heartbeats
• False positive:
– No heartbeat received for > 2 seconds, but received before 30 seconds
• Failure:
– No HB for > 30 seconds
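The gap classification above is simple enough to state directly in code (an illustrative sketch; the function name and thresholds mirror the slide's methodology):

```python
def classify_gap(gap_seconds):
    """Classify an inter-heartbeat gap per the measurement methodology:
    a gap over 30 s counts as a failure; a gap over 2 s that ends before
    30 s is a false positive (a 2 s timeout would have fired, but the
    path recovered); anything shorter is normal operation."""
    if gap_seconds > 30:
        return "failure"
    if gap_seconds > 2:
        return "false_positive"
    return "ok"

gaps = [0.3, 0.31, 5.0, 0.3, 45.0]
assert [classify_gap(g) for g in gaps] == \
    ["ok", "ok", "false_positive", "ok", "failure"]
```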
UDP-based keep-alive stream
| HB destination | HB source | Total time (h:mm:ss) | Num. false positives |
|---|---|---|---|
| Berkeley | UNSW | 130:48:45 | 135 |
| UNSW | Berkeley | 130:51:45 | 9 |
| Berkeley | TU-Berlin | 130:49:46 | 27 |
| TU-Berlin | Berkeley | 130:50:11 | 174 |
| TU-Berlin | UNSW | 130:48:11 | 218 |
| UNSW | TU-Berlin | 130:46:38 | 24 |
| Berkeley | Stanford | 124:21:55 | 258 |
| Stanford | Berkeley | 124:21:19 | 2 |
| Stanford | UIUC | 89:53:17 | 4 |
| UIUC | Stanford | 76:39:10 | 74 |
| Berkeley | UIUC | 89:54:11 | 6 |
| UIUC | Berkeley | 76:39:40 | 3 |
What does this mean?
• With a failure detection scheme using:
– A timeout of 2 sec
– False positives can be as low as once a day
– For many pairs of Internet hosts
• In comparison, BGP route recovery:
– Takes > 30 seconds
– Can take up to tens of minutes, Labovitz & Ahuja, SIGCOMM 2000
– Worse with multi-homing (an increasing trend)
Architectural Requirements
• Efficiency:
– Per-session monitoring: too much overhead
– End-to-end monitoring: more latency, so monitoring is less effective
• Need aggregation
– Client-side aggregation: using a SPAND-like server
– Server-side aggregation: clusters
– But not all clients have the same server, and vice versa
• Service infrastructure to address this
– Several service clusters on the Internet
Architecture
[Figure: overlay topology over the Internet. Nodes = service clusters (compute clusters capable of running services); links = monitoring channels (keep-alive streams) between clusters. A source-to-client session is routed via monitored paths and could go through an intermediate service; local recovery uses a backup path]
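The local-recovery idea can be sketched as a cluster picking a precomputed backup next-hop when the monitored link to its primary times out (an illustrative sketch; the overlay graph, cluster names, and function are all hypothetical):

```python
# Hypothetical overlay: nodes are service clusters, links are monitored paths.
# Each cluster stores a precomputed backup next-hop for the session, so when
# the keep-alive stream on a link times out it re-routes locally, without
# involving the source.
overlay = {
    "A": {"primary": "B", "backup": "C"},
    "B": {"primary": "D", "backup": "D"},
    "C": {"primary": "D", "backup": "D"},
    "D": {"primary": None, "backup": None},  # cluster nearest the client
}

def next_hop(cluster, link_alive):
    """Use the primary next-hop if its monitored path is alive, else the backup."""
    entry = overlay[cluster]
    if entry["primary"] is None:
        return None
    if link_alive(cluster, entry["primary"]):
        return entry["primary"]
    return entry["backup"]

# Simulate failure of the A->B monitored path: A recovers locally via C.
failed = {("A", "B")}
alive = lambda u, v: (u, v) not in failed
assert next_hop("A", alive) == "C"
assert next_hop("B", alive) == "D"
```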
Architecture
• Monitoring within a cluster for process/machine failure
• Monitoring across clusters for network path failure
• Peering of service clusters:
– to serve as backups for one another
– or to monitor the path between them
Architecture: Advantages
• 2-level monitoring
– Process/machine failures still handled within the cluster
– Common failure cases do not require wide-area mechanisms
• Aggregation of monitoring across clusters
– Efficiency
• Model works for cascaded services as well

[Figure: a cascaded session from source through services S1 and S2 to the client]
Architecture: Potential Criticism
• Potential criticism:
– Does not handle resource reservation
• Response:
– Related issue, but could be orthogonal
– Aggregated/hierarchical reservation schemes exist (e.g., the Clearing House)
– Even if reservation is solved (or is not needed), we still need to address failures
Architecture: Issues (1) Routing

[Figure: source and client connected through the overlay across the Internet]

Given the overlay topology, we need a routing algorithm to go from source to destination via service(s).
Also need: (a) WA-SDS, (b) the closest service cluster to a given host.
Architecture: Issues (2) Topology

• How many service clusters? (nodes)
• How many monitored paths? (links)
Ideas on Routing
• BGP
– Border router
– Peering session
• Heartbeats
– IP route
• Destination based
• Service infrastructure
– Service cluster
– Peering session
• Heartbeats
– Route across clusters
• Based on destination and intermediate service(s)

Similarities with BGP
Ideas on Routing

• How is routing in the overlay topology different from BGP?
• Overlay topology
– More freedom than the physical topology
– Constraints on the graph can be imposed more easily
– For example, can have local recovery

[Figure: local recovery between S (Berkeley) and S' (San Francisco)]
Ideas on Routing
[Figure: a session from source to client through the overlay; access points AP1, AP2, AP3]

• BGP exchanges a lot of information
– Increases with the number of APs: O(100,000)
• Service clusters need to exchange very little information
• The problems of (a) service discovery and (b) knowing the nearest service cluster are decoupled from routing
Ideas on Routing
• BGP routers do not have per-session state
• But service clusters can maintain per-session state
– Local recovery can be pre-determined for each session
• Finally, we probably do not need as many nodes as there are BGP routers
– Can have a more aggressive routing algorithm

It is feasible for fail-over in the overlay topology to be quicker than Internet route recovery with BGP.
Ideas on topology
• Decision criteria
– Additional latency for the client session
• Sparse topology: more latency
– Monitoring overhead
• Many monitored paths: more overhead, but the additional flexibility might reduce end-to-end latency
Ideas on topology
• Number of nodes:
– Claim: an upper bound is one per AS
– This will ensure that there is a service cluster "close" to every client
– Topology will be close to the Internet backbone topology
• Number of monitoring channels:
– # ASes: ~7,000 as of 1999
– Monitoring overhead: ~100 bytes/sec
• Can easily have ~1,000 peering sessions per service cluster
– A dense overlay topology is possible (to minimize additional latency)
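A back-of-the-envelope check of the overhead claim above (the per-channel rate and session count are the order-of-magnitude figures from the slides, not measurements):

```python
# At ~100 bytes/sec per keep-alive channel, even 1,000 peering sessions per
# cluster cost only ~100 KB/s of monitoring traffic in aggregate -- modest
# for a well-connected compute cluster, which is why a dense overlay is
# considered feasible.
bytes_per_channel = 100       # heartbeat traffic per monitored path, B/s
peering_sessions = 1000       # peers per service cluster
total = bytes_per_channel * peering_sessions
assert total == 100_000       # ~100 KB/s per service cluster
```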
Implementation + Performance
[Figure: a service cluster with a manager node peering with a peer cluster; the two exchange session information plus a monitoring heartbeat]
Implementation + Performance
• PCM-to-GSM codec service
– Overhead of a "hot" backup service: 1 idle process
– Service reinstantiation time: ~100 ms
• End-to-end recovery over the wide area has three components:
– Detection time: O(2 sec)
– Communication with the replica: one RTT, O(100 ms)
– Replica activation: ~100 ms
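Summing the three components gives the end-to-end recovery budget (the values are the order-of-magnitude figures from the slides, so the total is approximate, and is dominated by the detection timeout):

```python
detection = 2.0    # failure-detection timeout, O(2 s)
signaling = 0.1    # one RTT to the replica cluster, O(100 ms)
activation = 0.1   # service reinstantiation on the replica, ~100 ms

total = detection + signaling + activation
assert abs(total - 2.2) < 1e-9  # ~2.2 s end to end, vs. > 30 s for BGP recovery
```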
Implementation + Performance
• Overhead of monitoring
– Keep-alive heartbeat
– One per 300 ms in our implementation
– O(100 bytes/sec)
• Overhead of a false positive in failure detection
– Session transfer
– One message exchange across adjacent clusters
• A few hundred bytes
Summary
• Real-time applications with long-lived sessions
– No support exists for path-failure recovery (unlike, say, the PSTN)
• Service infrastructure to provide this support
– Wide-area monitoring for path liveness:
• O(2 sec) failure detection with a low rate of false positives
– Peering and replication model for quick fail-over
• Interesting issues:
– Nature of the overlay topology
– Algorithms for routing and fail-over
Questions
• What are the factors in deciding topology?
• Strategies for handling failure near the client/source
– Currently deployed mechanisms for RIP/OSPF and ATM/MPLS failure recovery
– What is the time to recover?
• Applications: video/audio streaming
– More?
– Games? Proxies for games on hand-held devices?
http://www.cs.berkeley.edu/~bhaskar
(Presentation running under VMWare under Linux)