Diagnosis via monitoring & tracing
Greg Ganger, Garth Gibson, Majd Sakr (adapted from Raja Sambasivan)
15-719: Advanced Cloud Computing, Spring 2017
15-719/18-847b: Advanced cloud computing, CMU. Revised: 04/3/2017
Problem diagnosis is difficult
• For developers of clouds
• For cloud users (i.e., software developers)
  • Must debug own applications
  • Must debug interactions w/cloud
  • E.g., is a slowdown due to other VMs or my app?
Monitoring via perf. counters
• Yields counters of low-level data
• E.g., CPU time, disk I/Os, etc.
• E.g., AWS CloudWatch, Ganglia
• Pros: Lightweight, commonly available
• Cons: Black-box; machine oriented
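A minimal sketch of what counter-based monitoring looks like from inside one process, assuming only the Python standard library. The function names `sample_counters` and `cpu_utilization` are made up for illustration. It shows the "black-box" limitation: the counters say how busy the process was, not which request caused the work.

```python
import os
import time

def sample_counters():
    """Snapshot a few low-level counters for the current process."""
    t = os.times()
    return {
        "user_cpu_s": t.user,        # CPU seconds spent in user mode
        "system_cpu_s": t.system,    # CPU seconds spent in the kernel
        "wall_s": time.monotonic(),  # monotonic wall-clock reference
    }

def cpu_utilization(before, after):
    """Fraction of wall-clock time spent on CPU between two snapshots."""
    cpu = (after["user_cpu_s"] - before["user_cpu_s"]) + (
        after["system_cpu_s"] - before["system_cpu_s"]
    )
    wall = after["wall_s"] - before["wall_s"]
    return cpu / wall if wall > 0 else 0.0
```

Taking one snapshot before and one after a region of interest and passing both to `cpu_utilization` gives a per-process number comparable to what a monitoring agent would report per machine.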
Logging events of interest
• Yields detailed text describing system’s behavior (e.g., application, OS, VM, etc.)
• Available in most systems (in some form)
• Pros: White-box approach
• Cons: High overhead; machine-oriented
End-to-end activity tracing
• Similar to logging, but workflow-based
• E.g., Dapper, Stardust, X-Trace, etc.
• Pros: White box, shows workflow
• Cons: Requires system modifications
Monitoring
A key cloud-specific issue
• Cloud providers and users usually do not wish to share information
• As such:
  • Counters normalized to VM capacity
    • E.g., percentage of AWS instance
  • Provider logs/traces not visible to users
Ganglia [Massie04]
• Designed for HPC environments
• Paper assumes bare-metal hardware
• Collects and aggregates counters
• Counters can be app or machine specific
• Within cluster, counters visible everywhere
• Counters from multiple clusters aggregated
Ganglia architecture
Ganglia federates multiple clusters together using a tree of point-to-point connections. Each leaf node specifies a node in a specific cluster being federated, while nodes higher up in the tree specify aggregation points. Since each cluster node contains a complete copy of its cluster’s monitoring data, each leaf node logically represents a distinct cluster while each non-leaf node logically represents a set of clusters. (We specify multiple cluster nodes for each leaf to handle failures.) Aggregation at each point in the tree is done by polling child nodes at periodic intervals. Monitoring data from both leaf nodes and aggregation points is then exported using the same mechanism, namely a TCP connection to the node being polled followed by a read of all its monitoring data.
4. Implementation
The implementation consists of two daemons, gmond and gmetad, a command-line program gmetric, and a client side library. The Ganglia monitoring daemon (gmond) provides monitoring on a single cluster by implementing the listen/announce protocol and responding to client requests by returning an XML representation of its monitoring data. gmond runs on every node of a cluster. The Ganglia Meta Daemon (gmetad), on the other hand, provides federation of multiple clusters. A tree of TCP connections between multiple gmetad daemons allows monitoring information for multiple clusters to be aggregated. Finally, gmetric is a command-line program that applications can use to publish application-specific metrics, while the client side library provides programmatic access to a subset of Ganglia’s features.
4.1. Monitoring on a single cluster
Monitoring on a single cluster is implemented by the Ganglia monitoring daemon (gmond). gmond is organized as a collection of threads, each assigned a specific task.
[Figure: a client connects to a tree of gmetad daemons. Each gmetad polls its children, with failover polling of alternate gmond nodes in each cluster. Each cluster is a set of nodes running gmond. Legend: XML over TCP; XDR over UDP.]
Fig. 1. Ganglia architecture.
Source: M.L. Massie et al., "The Ganglia distributed monitoring system: design, implementation, and experience," Parallel Computing, Volume 30, Issue 7, July 2004, pp. 817–840 (excerpt from p. 822).
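The excerpt above says a poller opens a TCP connection to a node and reads all of its monitoring data as XML. A minimal sketch of that polling step and of a per-cluster aggregate follows. The element and attribute names (`CLUSTER`, `HOST`, `METRIC`, `NAME`, `VAL`) are modeled on gmond's XML output, but the sample document, host names, and function names are assumptions made for illustration.

```python
import socket
import xml.etree.ElementTree as ET

# Made-up sample in the general shape of a gmond XML dump (assumption).
SAMPLE_XML = """<GANGLIA_XML>
  <CLUSTER NAME="cluster-a">
    <HOST NAME="node1"><METRIC NAME="load_one" VAL="0.50"/></HOST>
    <HOST NAME="node2"><METRIC NAME="load_one" VAL="1.50"/></HOST>
  </CLUSTER>
</GANGLIA_XML>"""

def read_all(host, port, timeout=5.0):
    """Poll one node: connect over TCP and read until the peer closes."""
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as sock:
        while True:
            data = sock.recv(65536)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks)

def summarize(xml_data, metric):
    """Aggregate one metric per cluster: host count and mean value."""
    root = ET.fromstring(xml_data)
    summary = {}
    for cluster in root.iter("CLUSTER"):
        values = [
            float(m.get("VAL"))
            for host in cluster.iter("HOST")
            for m in host.iter("METRIC")
            if m.get("NAME") == metric
        ]
        if values:
            summary[cluster.get("NAME")] = {
                "hosts": len(values),
                "mean": sum(values) / len(values),
            }
    return summary
```

An aggregation point in the tree would call `read_all` on each child at a periodic interval and merge the per-cluster summaries; `summarize(SAMPLE_XML, "load_one")` exercises the parsing step without a network.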
AWS CloudWatch
• Provides monitoring for all AWS resources
• EC2 counters show VM-normalized values
• Also, can monitor app-specific metrics
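A hedged sketch of publishing an app-specific metric. It assumes the third-party `boto3` SDK and its CloudWatch `put_metric_data` call; the namespace, metric name, and dimension used in the usage note are hypothetical, and the publish step needs AWS credentials and a region, so only the payload builder runs offline.

```python
from datetime import datetime, timezone

def build_datum(name, value, unit="Milliseconds", dimensions=None):
    """Build one entry for the MetricData list of put_metric_data."""
    return {
        "MetricName": name,
        "Dimensions": [
            {"Name": key, "Value": val}
            for key, val in sorted((dimensions or {}).items())
        ],
        "Timestamp": datetime.now(timezone.utc),
        "Value": float(value),
        "Unit": unit,
    }

def publish(namespace, datum):
    """Send the datum to CloudWatch (requires boto3 and AWS credentials)."""
    import boto3  # imported lazily so the builder works without the SDK

    client = boto3.client("cloudwatch")
    client.put_metric_data(Namespace=namespace, MetricData=[datum])
```

For example, `publish("MyApp", build_datum("RequestLatency", 42, dimensions={"Service": "frontend"}))` would record one latency sample under a hypothetical `MyApp` namespace.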
End-to-end tracing
End-to-end tracing overview
• Focus of many research efforts for ~10 yrs
• Currently used at Google, Bing, etc.
• Traces show causally related activity
  • Trace: set of events from different threads/machines, merged & sorted by causality
  • E.g., flow of individual requests (request flows)
End-to-end tracing implementation
• Tracing infrastructure tracks trace points touched by individual requests
  • Some “start” traces (e.g., user request received)
  • Others propagate the trace ID created at the start
• Traces obtained by stitching together trace points accessed by individual requests
• Hard to account for async and batched work
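The bullets above can be sketched in a few lines, under heavy simplifying assumptions: everything runs in one process, a shared counter stands in for causal ordering, and the component and trace-point names are invented. Real systems carry the trace ID in RPC metadata and collect records from many machines.

```python
import itertools
import uuid
from collections import defaultdict

_clock = itertools.count()  # logical clock standing in for causal order
TRACE_LOG = []              # records: (trace_id, seq, component, trace_point)

def trace_point(trace_id, component, name):
    """Record that a request touched a trace point."""
    TRACE_LOG.append((trace_id, next(_clock), component, name))

def storage_read(trace_id):
    trace_point(trace_id, "storage", "READ_START")
    trace_point(trace_id, "storage", "READ_END")

def app_server(trace_id):
    trace_point(trace_id, "app", "HANDLE_START")
    storage_read(trace_id)  # the trace ID is propagated with the call
    trace_point(trace_id, "app", "HANDLE_END")

def handle_user_request():
    """Entry point: this trace point starts a trace by creating the ID."""
    trace_id = uuid.uuid4().hex
    trace_point(trace_id, "frontend", "REQUEST_RECEIVED")
    app_server(trace_id)
    trace_point(trace_id, "frontend", "REPLY_SENT")
    return trace_id

def stitch(log):
    """Group records by trace ID and order them to rebuild each request flow."""
    traces = defaultdict(list)
    for trace_id, _seq, component, name in sorted(log, key=lambda r: r[1]):
        traces[trace_id].append((component, name))
    return dict(traces)
```

Calling `handle_user_request()` a few times and then `stitch(TRACE_LOG)` yields one ordered list of trace points per request. Asynchronous or batched work breaks the assumption that each trace point belongs to exactly one request, which is why the last bullet calls it hard.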
Throttling by Sampling
• Users trace too little or too much
  • Limit user bytes added per trace span
• Request sampling to limit global overhead
  • Collects all trace points for a request or none
  • Hash trace ID to [0,1] and keep if < threshold
• Allows end-to-end tracing to be “always on”
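A minimal sketch of the hash-and-threshold rule, assuming string trace IDs and SHA-256 as the hash (the slide does not name a hash function). Because the decision depends only on the trace ID, every component reaches the same verdict for a given request, so a request's trace points are all kept or all dropped.

```python
import hashlib

def trace_fraction(trace_id):
    """Map a trace ID deterministically to a float in [0, 1)."""
    digest = hashlib.sha256(trace_id.encode("utf-8")).digest()
    # Keep 53 bits so the quotient is exactly representable and below 1.0.
    return (int.from_bytes(digest[:8], "big") >> 11) / 2**53

def should_sample(trace_id, threshold):
    """Keep the whole trace if its hashed ID falls below the threshold."""
    return trace_fraction(trace_id) < threshold
```

With `threshold=0.1`, roughly one request in ten is traced, and lowering the threshold reduces overhead without changing any instrumentation.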
End-to-end tracing architecture
[Figure: end-to-end tracing architecture. Conceptual choices: causality slice choice and trace representation. Components: trace points, causal tracking, and the sampling decision inside the traced system (client, server, app server, table store, distributed filesystem); trace storage (optional) and trace construction; presentation layer (visualization).]
A few key design questions
• How much representational power?
  • DAGs, trees, or paths?
• What causal relationships to preserve?
  • Read-after-write, contention, etc.
• How many request flows to sample?
• Where to make sampling decision?
A DAG-based request flow
[Figure: DAG for an NFS read that misses in the cache and reads from two storage nodes. Trace points: NFS Read Call, Cache Miss, SN1 Read Call, SN2 Read Call, SN1 Reply, SN2 Reply, NFS Reply, spread across the NFS server, storage node 1, and storage node 2 (with work on SN1 and SN2 marked). Edge latencies shown: 10μs, 100μs, 1,000μs, 1,500μs, 3,500μs, 5,500μs. Response time: 8,120μs.]
Nodes show trace points & edges show latencies
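One way to hold such a request flow in code is an adjacency list whose edges carry latencies; the slowest path through the DAG then bounds the response time. This is a sketch only: the node names follow the figure's labels, but which latency belongs to which edge is a guess, since the figure layout is not recoverable from the text, so the computed total need not equal the figure's response time.

```python
from functools import lru_cache

# Hypothetical edge assignment: node -> list of (next node, latency in μs).
EDGES = {
    "NFS Read Call": [("Cache Miss", 10)],
    "Cache Miss": [("SN1 Read Call", 100), ("SN2 Read Call", 100)],
    "SN1 Read Call": [("Work on SN1", 1000)],
    "SN2 Read Call": [("Work on SN2", 1000)],
    "Work on SN1": [("SN1 Reply", 3500)],
    "Work on SN2": [("SN2 Reply", 5500)],
    "SN1 Reply": [("NFS Reply", 1500)],
    "SN2 Reply": [("NFS Reply", 1500)],
    "NFS Reply": [],
}

def critical_path(edges, start):
    """Return (total latency, path) of the slowest path from start."""

    @lru_cache(maxsize=None)
    def longest(node):
        best = (0, (node,))
        for child, latency in edges[node]:
            sub_cost, sub_path = longest(child)
            candidate = (latency + sub_cost, (node,) + sub_path)
            if candidate[0] > best[0]:
                best = candidate
        return best

    return longest(start)
```

With these assumed edges, `critical_path(EDGES, "NFS Read Call")` runs through storage node 2, the slower of the two parallel branches. A tree or path representation could not express the join at the reply, which is the representational-power question raised earlier.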
Dapper [Sigelman10]
• Google’s implementation of end-to-end tracing
  • In use since at least 2008
• Similar in architecture to other examples
  • But optimized for traces expected at Google
• Trace records gathered in external system
  • Median collection latency: 15 s; 25% of the time, the 98th-percentile latency is > hours
Dapper design decisions
• Traces represented as trees of RPCs
  • Node contains all work done for an RPC
  • Edges indicate new RPC calls/replies
• Core tracing infrastructure + developer-added annotations
• Sampling decision made at request entry
  • Based on hash of root ID (keep x% of traces)
Example Dapper trace tree
[...] system on behalf of a given initiator. For example, Figure 1 shows a service with 5 servers: a front-end (A), two middle-tiers (B and C) and two backends (D and E). When a user request (the initiator in this case) arrives at the front end, it sends two RPCs to servers B and C. B can respond right away, but C requires work from backends D and E before it can reply to A, which in turn responds to the originating request. A simple yet useful distributed trace for this request would be a collection of message identifiers and timestamped events for every message sent and received at each server.
Two classes of solutions have been proposed to aggregate this information so that one can associate all record entries with a given initiator (e.g., RequestX in Figure 1), black-box and annotation-based monitoring schemes. Black-box schemes [1, 15, 2] assume there is no additional information other than the message record described above, and use statistical regression techniques to infer that association. Annotation-based schemes [3, 12, 9, 16] rely on applications or middleware to explicitly tag every record with a global identifier that links these message records back to the originating request. While black-box schemes are more portable than annotation-based methods, they need more data in order to gain sufficient accuracy due to their reliance on statistical inference. The key disadvantage of annotation-based methods is, obviously, the need to instrument programs. In our environment, since all applications use the same threading model, control flow and RPC system, we found that it was possible to restrict instrumentation to a small set of common libraries, and achieve a monitoring system that is effectively transparent to application developers.
We tend to think of a Dapper trace as a tree of nested RPCs. However, our core data model is not restricted to our particular RPC framework; we also trace activities such as SMTP sessions in Gmail, HTTP requests from the outside world, and outbound queries to SQL servers. Formally, we model Dapper traces using trees, spans, and annotations.
2.1 Trace trees and spans
In a Dapper trace tree, the tree nodes are basic units of work which we refer to as spans. The edges indicate a causal relationship between a span and its parent span. Independent of its place in a larger trace tree, though, a span is also a simple log of timestamped records which encode the span’s start and end time, any RPC timing data, and zero or more application-specific annotations as discussed in Section 2.3.
We illustrate how spans form the structure of a larger trace in Figure 2. Dapper records a human-readable span name for each span, as well as a span id and parent id in order to reconstruct the causal relationships between the individual spans in a single distributed trace. Spans created without a parent id are known as root spans. All spans associated with a specific trace also share a common trace id (not shown in the figure). All of these ids are probabilistically unique 64-bit integers. In a typical Dapper trace we expect to find a single span for each RPC, and each additional tier of infrastructure adds an additional level of depth to the trace tree.
Figure 2: The causal and temporal relationships between five spans in a Dapper trace tree.
Figure 3 provides a more detailed view of the logged events in a typical Dapper trace span. This particular span describes the longer of the two “Helper.Call” RPCs in Figure 2. Span start and end times as well as any RPC timing information are recorded by Dapper’s RPC library instrumentation. If application owners choose to augment the trace with their own annotations (like the “foo” annotation in the figure), these are also recorded with the rest of the span data.
It is important to note that a span can contain information from multiple hosts; in fact, every RPC span contains annotations from both the client and server processes, making two-host spans the most common ones. Since the timestamps on client and server come from [...]
Figure 3: A detailed view of a single span from Figure 2.
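The span model in the excerpt (span name, span id, parent id, shared trace id, root spans without a parent) is enough to rebuild a trace tree from unordered records. A minimal sketch follows, using the five-server example from the excerpt; the `Span` class, the span names, and the small integer ids are illustrative assumptions (Dapper uses probabilistically unique 64-bit integers).

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Span:
    trace_id: int
    span_id: int
    parent_id: Optional[int]  # None marks a root span
    name: str

def build_tree(spans):
    """Index spans by parent id and locate the single root span."""
    children = defaultdict(list)
    roots = []
    for span in spans:
        if span.parent_id is None:
            roots.append(span)
        else:
            children[span.parent_id].append(span)
    if len(roots) != 1:
        raise ValueError("expected exactly one root span per trace")
    return roots[0], children

def render(span, children, depth=0):
    """Return the tree as indented lines, one per span."""
    lines = ["  " * depth + span.name]
    for child in sorted(children.get(span.span_id, []), key=lambda s: s.span_id):
        lines.extend(render(child, children, depth + 1))
    return lines

# Records arrive in arbitrary order; ids alone recover the structure.
SPANS = [
    Span(7, 4, 3, "C->D RPC"),
    Span(7, 2, 1, "A->B RPC"),
    Span(7, 1, None, "A: user request"),
    Span(7, 5, 3, "C->E RPC"),
    Span(7, 3, 1, "A->C RPC"),
]
```

`render(*build_tree(SPANS))` produces one line per span, indented by depth, so each additional tier of infrastructure shows up as one more level of indentation.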
Dapper UI Example
End-to-end tracing analysis tools
Spectroscope [Sambasivan11]
• Localizes performance degradations
  • By ID’ing changed request flows
• Output:
  • Groups of before/after request flows
  • Some changes automatically ID’d
Spectroscope workflow
[Figure: Spectroscope workflow. Inputs: before-degradation and after-degradation request flows. Stages shown: grouping, structural change identification, response-time only change identification, ranking, presentation.]
Automatically ID’d changes
• Same structure means same trace points
• Groups w/structural changes
  • Identified via heuristics (e.g., frequency of types)
• Groups w/response-time changes
  • Have identical flows in both periods
  • ID’d via statistical significance testing
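To illustrate the significance-testing step, here is a sketch that compares response times of one group across the two periods. A permutation test on the difference of means is swapped in as a simple, dependency-free choice; the slide does not say which test Spectroscope uses, and the function name and sample values are made up.

```python
import random
from statistics import mean

def permutation_p_value(before, after, rounds=2000, seed=0):
    """Estimate how often a random relabeling of the samples produces a
    mean difference at least as large as the observed one."""
    rng = random.Random(seed)
    observed = abs(mean(after) - mean(before))
    pooled = list(before) + list(after)
    n = len(before)
    hits = 0
    for _ in range(rounds):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[n:]) - mean(pooled[:n]))
        if diff >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (rounds + 1)

# Made-up response times in μs for one group of structurally identical flows.
BEFORE = [110, 105, 115, 108, 112, 109, 111, 107]
AFTER_SLOW = [1090, 1100, 1085, 1095, 1080, 1092, 1088, 1105]
AFTER_SAME = [111, 106, 114, 109, 113, 108, 110, 107]
```

A small p-value for `BEFORE` versus `AFTER_SLOW` flags the group as having a response-time change, while `BEFORE` versus `AFTER_SAME` should not be flagged.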
Group w/structural changes
[Figure: before and after request flows for an NFS lookup, each passing through the trace points NFS Lookup Call, MDS DB Lock, MDS DB Unlock, NFS Lookup Reply. Edge latencies shown: 10μs, 20μs, 50μs, and 350μs.]
Developers localize the root cause by ID’ing how the flows differ before/after degradation
Group w/response-time changes
[Figure: request flow NFS Read Call → SN1 Read Start → SN1 Read End → NFS Read Reply. Edge latencies: avg. 10μs, avg. 80μs, and one edge that changes from avg. 20μs to avg. 1,000μs.]
Before degradation avg. response time: 110μs
After degradation avg. response time: 1,090μs
Root cause localized by ID’ing responsible interaction
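The localization step can be sketched as a per-edge comparison of average latencies. The averages come from the slide (10 + 20 + 80 = 110μs before, 10 + 1,000 + 80 = 1,090μs after), but assigning the changed latency to the SN1 Read Start → SN1 Read End edge is an assumption, as is the function name.

```python
# (source trace point, destination trace point, average latency in μs)
BEFORE = [
    ("NFS Read Call", "SN1 Read Start", 10),
    ("SN1 Read Start", "SN1 Read End", 20),
    ("SN1 Read End", "NFS Read Reply", 80),
]
AFTER = [
    ("NFS Read Call", "SN1 Read Start", 10),
    ("SN1 Read Start", "SN1 Read End", 1000),
    ("SN1 Read End", "NFS Read Reply", 80),
]

def responsible_edge(before, after):
    """Return the edge whose average latency grew the most, and by how much."""
    before_latency = {(src, dst): lat for src, dst, lat in before}
    deltas = {
        (src, dst): lat - before_latency[(src, dst)] for src, dst, lat in after
    }
    edge = max(deltas, key=deltas.get)
    return edge, deltas[edge]
```

Under these assumptions the interaction between the two SN1 trace points accounts for the whole slowdown, which points the developer at storage node 1 rather than the NFS server.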
Summary
• Debugging distributed systems is hard
• Performance debugging is harder still
• Monitoring is counting without causation
• But people want examples (traces)
• Too much statistical analysis slows trust
• Traces are logistically expensive, quick to rot