A Runtime Verification Based Trace-Oriented Monitoring Framework for Cloud Systems
Jingwen Zhou1, Zhenbang Chen1, Ji Wang1, Zibin Zheng2, and Wei Dong1
Email: {jwzhou, zbchen}@nudt.edu.cn
1 PDL & College of Computer, NUDT, Changsha, China
2 Shenzhen Research Institute, CUHK, Shenzhen, China
Motivation
• August 2013 meltdowns: Amazon: $7,000,000/100 min; Google: $550,000/5 min
• March 13-14, 2008: Malfunction in Windows Azure (lasted 22 hours)
• February 24, 2009: Gmail and Google Apps Engine outage (lasted 2.5 hours)
• January 31, 2009: Google search outage due to programming error (lasted 40 min)
• June 17, 2008: Google AppEngine partial outage due to programming error (lasted 5 hours)
• February 15, 2008: S3 outage: the authentication service overload leading to unavailability (lasted 2 hours)
• July 20, 2008: S3 outage: single bit error leading to gossip protocol blowup (lasted >6 hours)
• August 11, 2008: Gmail site unavailable due to outage in contacts system (lasted 1.5 hours)
• June 29, 2010: intermittent performance problems (lasted 3 hours)
• ...
Detection Diagnosis Fixing …
User request trace-oriented monitoring is an important method to improve system reliability at runtime.
• Currently, many trace-oriented monitoring frameworks exist, such as Dapper, Zipkin, X-ray, P-Tracer, MTracer, …
• However, two aspects need further investigation:
  – 1. The method for specifying monitoring requirements.
[Figure labels: Developers or Administrators; Monitoring Requirements; Pip; IRONModel]
  – 2. The efficiency of monitoring.
[Figure: timeline of 1 sec containing Request_1 … Request_100 … Request_1000 …]
Each request is handled in a complex process. For example, a simple Google search request triggers more than 200 sub-requests and crosses hundreds of servers.
Huge trace data vs. real-time monitoring: a problem faced by all existing methods!
• To address these issues, we bring runtime verification (RV) into the field of trace-oriented monitoring for cloud systems.
• Runtime Verification
  – Expressive specification languages
  – Automatic monitor generation
  – Efficient monitoring
Framework
[Figure: framework architecture. Components: A Cloud System, Tracing System, Preprocess, Monitors, Monitor Generator, PDB. Labeled flows: trace data collecting, traces, properties, results.]
The trace records the execution path of a user request.
Trace = events + relationships
Event: function name and latency …
Relationship: local and remote function calls …
Trace → Trace Tree
Nodes correspond to events.
Edges correspond to relationships, e.g., a and c.
Trace Tree → linear event sequence
DFS: 1,2,4,3,5
Call and Return: C1C2C4R4R2C3C5R5R3R1
Compared with resource-oriented methods, traces can record more fine-grained information, e.g., RPCs and execution time.
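The two linearizations above can be sketched in a few lines of Python. This is only an illustration: the tree shape (node 1 calls 2 and 3, node 2 calls 4, node 3 calls 5) is inferred from the call-and-return sequence on the slide, and the names `TREE`, `dfs_sequence`, and `call_return_sequence` are hypothetical, not part of the authors' framework.

```python
from typing import Dict, List

# Assumed trace tree, inferred from the slide's sequences:
# 1 calls 2 and 3, 2 calls 4, 3 calls 5.
TREE: Dict[int, List[int]] = {1: [2, 3], 2: [4], 3: [5], 4: [], 5: []}


def dfs_sequence(tree: Dict[int, List[int]], root: int) -> List[int]:
    """Return node ids in depth-first (pre-order) visiting order."""
    order = [root]
    for child in tree.get(root, []):
        order.extend(dfs_sequence(tree, child))
    return order


def call_return_sequence(tree: Dict[int, List[int]], root: int) -> List[str]:
    """Emit a call event on entering a node and a return event on leaving it."""
    seq = [f"C{root}"]
    for child in tree.get(root, []):
        seq.extend(call_return_sequence(tree, child))
    seq.append(f"R{root}")
    return seq


print(dfs_sequence(TREE, 1))                   # [1, 2, 4, 3, 5]
print("".join(call_return_sequence(TREE, 1)))  # C1C2C4R4R2C3C5R5R3R1
```

The call-and-return form keeps the nesting of the tree, so the tree can be rebuilt from it, whereas the plain DFS order alone cannot distinguish different tree shapes.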
Preliminary Evaluation
• We collected a trace data set (TraceBench) from an HDFS cluster deployed in a real environment,
  – Considering different kinds of user requests
    • write: uploading files to HDFS
    • read: downloading files from HDFS
    • rpc: file management, like querying, removing, renaming, …
  – Injecting various faults
  – With various cluster sizes, request speeds, etc.
http://mtracer.github.io/TraceBench/
• Based on TraceBench, we extracted many properties, and we can express all of them correctly and flexibly. Some samples follow.
Each read request contains at least one reading operation.
And the last reading operation should be successful. Otherwise, we say it is a failed read request.
• Using SQL queries, we check the traces against the above properties in all data sets with injected faults.
• 100% of failed traces are identified without false positives (FPs).
• Several failed traces are also found in the Normal set, caused by events lost in the tracing system.
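As a rough illustration of checking the read property with SQL, the sketch below uses Python's built-in `sqlite3` module. The authors' actual queries and storage schema are not shown in the slides, so the `events(trace_id, seq, name)` table, the query, the helper `failed_read_traces`, and the sample rows (including the `open` event) are all assumptions. The event names `blockSeekTo` and `checksumOK` are the ones used in the backup slides.

```python
import sqlite3

# A read trace fails if it has no blockSeekTo at all, or if no checksumOK
# occurs after its last blockSeekTo. Schema and query are hypothetical.
FAILED_READS = """
SELECT t.trace_id
FROM (SELECT DISTINCT trace_id FROM events) AS t
LEFT JOIN (
    SELECT trace_id, MAX(seq) AS last_seek
    FROM events
    WHERE name = 'blockSeekTo'
    GROUP BY trace_id
) AS s ON s.trace_id = t.trace_id
WHERE s.last_seek IS NULL
   OR NOT EXISTS (
        SELECT 1 FROM events AS e
        WHERE e.trace_id = t.trace_id
          AND e.name = 'checksumOK'
          AND e.seq > s.last_seek)
ORDER BY t.trace_id
"""


def failed_read_traces(rows):
    """Load (trace_id, seq, name) rows into an in-memory DB and run the check."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (trace_id TEXT, seq INTEGER, name TEXT)")
    conn.executemany("INSERT INTO events VALUES (?, ?, ?)", rows)
    result = [row[0] for row in conn.execute(FAILED_READS)]
    conn.close()
    return result


# Made-up sample traces for illustration only.
SAMPLE = [
    ("ok", 1, "blockSeekTo"), ("ok", 2, "checksumOK"),
    ("retry", 1, "blockSeekTo"), ("retry", 2, "blockSeekTo"), ("retry", 3, "checksumOK"),
    ("bad", 1, "blockSeekTo"), ("bad", 2, "checksumOK"), ("bad", 3, "blockSeekTo"),
    ("empty", 1, "open"),
]
print(failed_read_traces(SAMPLE))  # ['bad', 'empty']
```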
• We checked the traces in the killDN set against Property 2 on a notebook with a 4×2.5 GHz CPU and 4 GB of memory.
• Under these conditions, about 10,000 traces can be checked per second, which is a promising result.
• In addition, the efficiency can be further improved with various optimizations.
Future work
• Integrating existing RV frameworks into our tracing system
• Highly efficient and scalable monitoring algorithms and effective machine learning methods for properties.
• Using RV to monitor the performance aspects
• More applications on other real world cloud systems
Framework and Data Set
Tracing Framework available at: http://mtracer.github.io/MTracer/
Data set available at: http://mtracer.github.io/TraceBench/
Online demonstration at: http://www.wsdream.net/mtracer-viz/
Thanks for Your Attention!
And Any Questions?
Backup Slides
HDFS
In the rest of the talk, the traces discussed are collected from HDFS, a widely used cloud file storage system.
rpc
• Starts with getFileInfo, to check whether the file exists.
• Followed by some other RPCs, for related operations.
• A failure occurs when a violation happens.
read
• Consists of many data block reading operations,
  – starting with blockSeekTo (B).
• The last one should be correct,
  – indicated by checksumOK (K).
• A failure occurs when a violation happens.
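To make the read property concrete, here is a minimal sketch of an online monitor that consumes events one at a time, using the slide's abbreviations B (blockSeekTo) and K (checksumOK). It is hand-written for illustration, not output of the authors' monitor generator; the names `ReadMonitor` and `check_read` are hypothetical, and events other than B and K are simply ignored, which is an assumption.

```python
from typing import Iterable


class ReadMonitor:
    """Tracks whether the most recent block read has been confirmed by K."""

    def __init__(self) -> None:
        self.seen_read = False       # at least one B observed so far
        self.last_confirmed = False  # a K observed after the latest B

    def observe(self, event: str) -> None:
        if event == "B":
            # A new reading operation starts; it is not yet confirmed.
            self.seen_read = True
            self.last_confirmed = False
        elif event == "K" and self.seen_read:
            self.last_confirmed = True
        # Any other event is ignored in this sketch.

    def verdict(self) -> bool:
        """True if the trace seen so far satisfies the read property."""
        return self.seen_read and self.last_confirmed


def check_read(events: Iterable[str]) -> bool:
    monitor = ReadMonitor()
    for event in events:
        monitor.observe(event)
    return monitor.verdict()


print(check_read("BKBK"))  # True: the last read is confirmed
print(check_read("BKB"))   # False: the last read has no checksumOK
print(check_read(""))      # False: no reading operation at all
```

The monitor keeps only two booleans per trace, so memory use stays constant regardless of trace length.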
write
• Similar to read, consists of many data block writing operations,
  – by calling createBlockOutputStream (C).
• The last one should be correct,
  – indicated by the equality of receiveBlock (R) and writeBlock (W).
• In the property, Oa is the abstract next operator in CaRet.
• A failure occurs when a violation happens.
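The write property can be sketched in a similar way, with one important caveat: the slides do not spell out what "equality of receiveBlock (R) and writeBlock (W)" means. The sketch below assumes it means that, within the last writing operation (everything after the last C), the number of R events equals the number of W events. That reading, and the name `check_write`, are assumptions for illustration, not the authors' CaRet formula.

```python
from typing import Iterable


def check_write(events: Iterable[str]) -> bool:
    """Check the assumed write property over a sequence of C/R/W events.

    Assumption: the last writing operation is correct when the counts of
    R (receiveBlock) and W (writeBlock) after the last C are equal.
    """
    seen_write = False
    received = 0
    written = 0
    for event in events:
        if event == "C":
            # A new block writing operation starts; reset the counters.
            seen_write = True
            received = 0
            written = 0
        elif event == "R" and seen_write:
            received += 1
        elif event == "W" and seen_write:
            written += 1
        # Any other event is ignored in this sketch.
    return seen_write and received == written


print(check_write("CWRCWR"))   # True: last operation has one W and one R
print(check_write("CWRCWWR"))  # False: last operation has two W but one R
print(check_write(""))         # False: no writing operation at all
```

Note that under this reading a last operation with no R and no W events counts as satisfied; a stricter variant could additionally require at least one of each.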