Posted on 22-Feb-2016
Moving beyond end-to-end path information to optimize CDN performance
Krishnan, R., et al. In Proceedings of IMC '09, 2009. New York, NY, USA: ACM.
Reported by: Eraser Huang (eraser.osiris@gmail.com), 2011-03-23 @ SYSU
Agenda
◦Problem
◦Abstract
◦Overview
◦Path Latency Analysis
◦Diagnose Cases of Inefficient Routing
◦Some Examples
◦Limitations
Problem: Client-Server Applications
Problem: Content Distribution Networks (CDNs)
Problem: Content Distribution Networks (CDNs) (cont.)
Abstract
The main results of this paper are:
◦Redirecting every client to the server with the least latency does not suffice to optimize client latencies
◦Queuing delays often override the benefits of a client interacting with a nearby server
The dataset analyzed in this paper is available at: http://research.google.com/pubs/pub35590.html
Overview: Google’s CDN Architecture
◦Aims to redirect each client to the node to which it has the least latency
◦The RTT measured to one client is taken to be representative of the client’s entire prefix
◦This redirection, however, is based on the prefix corresponding to the IP address of the DNS server that resolves the URL of the content on the client’s behalf
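The redirection logic described above reduces to a minimum-RTT selection per prefix. A minimal sketch, assuming per-prefix RTT measurements are already available; node names and values are hypothetical, not data from the paper:

```python
def pick_node(rtts_by_node):
    """rtts_by_node: {node_name: rtt_ms} measured for one client prefix.
    Returns the node with the least latency."""
    return min(rtts_by_node, key=rtts_by_node.get)

# Hypothetical RTTs (ms) from three CDN nodes to one client prefix.
rtts = {"node_us": 120.0, "node_eu": 45.0, "node_asia": 210.0}
print(pick_node(rtts))  # node_eu
```

As the slide notes, the real system keys this decision on the prefix of the client's DNS resolver rather than the client itself, so the chosen node can be suboptimal when the resolver is far from the client.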
Overview: Goals
◦Understand the efficacy of latency-based redirection in enabling a CDN to deliver the best RTTs possible to its clients
◦Identify the broad categories of causes for poor RTTs experienced by clients
◦Implement a system to detect instances of poor RTTs and diagnose the root causes underlying them
Overview: WhyHigh’s Data Sources
The authors used WhyHigh to diagnose several instances of inflated latencies, drawing on:
◦BGP tables from routers
◦Mapping of routers to geographic locations
◦RTT logs for connections from clients
◦Traffic volume information
◦Active probes, such as traceroutes and pings, when necessary
The data covers approximately 170K prefixes spread across the world.
Overview: Data Set - RTT Measurement
The RTT is measured at the CDN node during TCP connection setup, as the time between the node sending its SYN-ACK and receiving the client’s ACK.
Overview: Data Set - Data Pre-Processing
1. Map to prefix
• Map every client to its routable prefix
• Using a BGP snapshot
2. Add geo-info
• Tag prefixes with geographic information
• From a commercial geolocation database
3. Pruning
• Prune out prefixes with incorrect geographic information
• Using Geo-RTT checks, a “confidence” measure, and region spanning
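Step 1 above (mapping a client to its routable prefix) amounts to a BGP-style longest-prefix match. A minimal sketch using Python's ipaddress module, with a hypothetical two-entry prefix table standing in for a real BGP snapshot:

```python
import ipaddress

def map_to_prefix(client_ip, prefixes):
    """Return the most specific routable prefix containing client_ip
    (a longest-prefix match), or None if nothing matches."""
    ip = ipaddress.ip_address(client_ip)
    matches = [p for p in prefixes if ip in p]
    return max(matches, key=lambda p: p.prefixlen) if matches else None

# Hypothetical snapshot: a /16 with a more specific /24 inside it.
snapshot = [ipaddress.ip_network("203.0.0.0/16"),
            ipaddress.ip_network("203.0.113.0/24")]
print(map_to_prefix("203.0.113.7", snapshot))   # 203.0.113.0/24
print(map_to_prefix("198.51.100.1", snapshot))  # None
```

The more specific /24 wins over the covering /16, mirroring how routers select among overlapping BGP announcements.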
Path Latency Analysis: Distribution of RTTs
Figure 2
Path Latency Analysis: Three Main Components of TCP-Layer RTT
◦Transmission delay (time to put a packet onto the wire): typical control packets are about 50 bytes, so the transmission delay is small, well under 10 ms even on a 56 kbps dialup link
◦Propagation delay (time spent traveling from one end of the wire to the other): significant when the client is far from the node to which it has the lowest latency
◦Queuing delay (time spent by a packet waiting to be forwarded)
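The first two components lend themselves to back-of-the-envelope arithmetic. The link speeds and distance below are illustrative assumptions, not figures from the paper:

```python
# Transmission delay: time to serialize a packet onto the link.
def transmission_delay_ms(packet_bytes, link_bps):
    return packet_bytes * 8 / link_bps * 1000

# Propagation delay: distance divided by signal speed in the medium
# (light in fiber travels at roughly 200,000 km/s).
def propagation_delay_ms(distance_km, speed_km_per_s=200_000):
    return distance_km / speed_km_per_s * 1000

# A 50-byte control packet on a 56 kbps dialup link vs. a 1 Mbps link:
print(round(transmission_delay_ms(50, 56_000), 2))     # 7.14
print(round(transmission_delay_ms(50, 1_000_000), 2))  # 0.4
# Propagation over 5000 km of fiber dominates for distant clients:
print(round(propagation_delay_ms(5000), 1))            # 25.0
```

This is why redirection focuses on propagation delay: for small control packets, transmission delay is bounded and fixed, while propagation grows with client-to-node distance.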
Path Latency Analysis: Effectiveness of Client Redirection
Figure 3
Path Latency Analysis: Characterizing Latency Inflation
More than 20%
Figure 4
Path Latency Analysis: Data Set Partition
1) Prefixes closest to the node geographically (80%)
2) All other prefixes (20%)
Figure 3
Path Latency Analysis: Characterizing Latency Inflation (after data set partition)
More than 20%
Figure 5
Path Latency Analysis: Characterizing Delays
More than 40%
Figure 4
Path Latency Analysis: Change of Route (Inefficient Routing)
4K 6K
Path Latency Analysis: Characterizing Queuing Delays
Figure 7
Path Latency Analysis: Summary
◦Redirection based on end-to-end RTTs results in most clients being served from a geographically nearby node;
◦A significant fraction of prefixes have inefficient routes to their nearby nodes;
◦Clients in most prefixes incur significant latency overheads due to queuing of packets.
Diagnose Cases of Inefficient Routing: Identifying Inflated Prefixes
◦Compare the minimum RTT measured at the node across all connections to the prefix with the minimum RTT measured at the same node across all connections to clients within the prefix’s region
◦Declare a prefix to be inflated if that difference is greater than 50 ms
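The inflation test above reduces to comparing two minima against a threshold. A sketch assuming per-connection RTT logs are already grouped by prefix and by region; all RTT values are hypothetical:

```python
INFLATION_THRESHOLD_MS = 50.0  # threshold stated on the slide

def is_inflated(prefix_rtts_ms, region_rtts_ms):
    """prefix_rtts_ms: RTTs of all connections from one prefix to a node.
    region_rtts_ms: RTTs of all connections from the prefix's region
    to the same node. The minima discard queuing-delay noise."""
    return min(prefix_rtts_ms) - min(region_rtts_ms) > INFLATION_THRESHOLD_MS

print(is_inflated([130.0, 145.0, 160.0], [40.0, 55.0]))  # True (90 ms gap)
print(is_inflated([60.0, 80.0], [40.0, 55.0]))           # False (20 ms gap)
```

Using minima rather than means is the key design choice: the minimum RTT approximates the propagation component alone, so the comparison isolates routing inefficiency from transient queuing.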
Diagnose Cases of Inefficient Routing: Identifying Causes of Latency Inflation
◦Snapshots of the BGP routing table provide information on the AS path used to route packets to each prefix
◦A log of all BGP updates reveals the alternative paths available to each prefix
◦A traceroute from the node to a destination in the prefix, combined with pings to intermediate routers seen on the traceroute, gives visibility into the reverse path back from the prefix
Diagnose Cases of Inefficient Routing: Identifying Causes of Latency Inflation (cont.)
◦Circuitousness along the forward path, inferred from the sequence of locations traversed along the traceroute to the prefix
◦Circuitousness along the reverse path, inferred from: a significant RTT increase on a single hop of the traceroute; the return TTL on the response from a probed interface; flow records gathered at border routers in Google’s network
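One of the reverse-path signals above, a large RTT increase concentrated at a single traceroute hop, can be sketched as a scan over per-hop RTTs. The 50 ms jump threshold and the hop RTTs below are illustrative assumptions:

```python
def rtt_jumps(hop_rtts_ms, jump_threshold_ms=50.0):
    """Return indices of traceroute hops whose RTT exceeds the previous
    hop's RTT by more than the threshold; such a jump suggests the
    reply from that hop returns over a circuitous reverse path."""
    return [i for i in range(1, len(hop_rtts_ms))
            if hop_rtts_ms[i] - hop_rtts_ms[i - 1] > jump_threshold_ms]

# Hypothetical per-hop RTTs: the ~90 ms jump at hop 3 stands out,
# since forward propagation alone cannot explain it.
print(rtt_jumps([1.2, 2.5, 4.0, 95.0, 97.5]))  # [3]
```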
Diagnose Cases of Inefficient Routing: Helping Administrators Troubleshoot
◦Identifying path inflation at several granularities:
(i) Prefixes sharing the same PoP-level path measured by traceroute
(ii) Prefixes sharing the same AS path and the same exit and entry PoPs out of and into Google’s network
(iii) Prefixes sharing the same AS path
(iv) Prefixes belonging to the same AS
◦Ranking CDN nodes by:
The fraction of nearby prefixes that have inflated latencies
The fraction of nearby prefixes that are served elsewhere
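The two ranking metrics above can be computed per node from per-prefix flags. The records below are hypothetical stand-ins for what WhyHigh derives from RTT logs and BGP data:

```python
def node_rank_metrics(nearby_prefixes):
    """nearby_prefixes: list of dicts, one per prefix near the node,
    with boolean fields 'inflated' and 'served_elsewhere'.
    Returns (fraction inflated, fraction served elsewhere)."""
    n = len(nearby_prefixes)
    frac_inflated = sum(p["inflated"] for p in nearby_prefixes) / n
    frac_elsewhere = sum(p["served_elsewhere"] for p in nearby_prefixes) / n
    return frac_inflated, frac_elsewhere

# Hypothetical flags for four prefixes near one node.
prefixes = [
    {"inflated": True,  "served_elsewhere": False},
    {"inflated": False, "served_elsewhere": True},
    {"inflated": True,  "served_elsewhere": True},
    {"inflated": False, "served_elsewhere": False},
]
print(node_rank_metrics(prefixes))  # (0.5, 0.5)
```

Ranking nodes by these fractions lets administrators focus on the nodes whose nearby clients are worst served.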
Diagnose Cases of Inefficient Routing: Ranking of 13 CDN Nodes
System Architecture of WhyHigh: Steps Involved in the WhyHigh Pipeline
Diagnose Cases of Inefficient Routing: Identifying Root Causes of Inflation
◦Lack of peering
◦Limited bandwidth capacity
◦Routing misconfiguration
◦Traffic engineering
Some Examples: Illustrative Cases
◦Case 2: No peering, and shorter path on less specific prefix
Data used in troubleshooting Case 2: (a) extract of traceroute, and (b) AS paths received by Google.
- Traffic engineering
A node in India measured RTTs above 400 ms to prefixes in IndSP1
Some Examples: Illustrative Cases
◦Case 3: Peering, but inflated reverse path
Data used in troubleshooting Case 4: (a) extract of traceroute, and (b) pings to routers at the peering link.
A node in Japan measured RTTs above 100 ms to prefixes in IndSP1
Some Examples: Summarizing the Use of WhyHigh
WhyHigh’s classification of inflated paths
Limitations
Traceroutes yield path information only at the IP routing layer:
◦Path inflation could occur below layer 3, e.g., in MPLS tunnels
◦Inflation may not be explainable by the geographic locations of traceroute hops
WhyHigh only has access to RTT data:
◦TCP transfer times of medium to large objects could be inflated by other factors, such as loss rate and bandwidth
Thank You!