CISC 879 - Machine Learning for Solving Systems Problems Presented by: Suman Chander B Dept of Computer & Information Sciences University of Delaware Automatic Software Fault Diagnosis by Exploiting Application Signatures


CISC 879 - Machine Learning for Solving Systems Problems

Presented by: Suman Chander B

Dept of Computer & Information Sciences

University of Delaware

Automatic Software Fault Diagnosis by Exploiting Application Signatures


Motivation

• Diagnosing application problems in complex enterprise environments

• Large number of possible causes; most failures are due to runtime interactions with the system environment

• Troubleshooting these problems requires extensive experience and time


Overview

• A black-box approach to diagnosing several kinds of application faults

• Application signatures

• Approach to detecting application faults

• Details of the tool's design and implementation

• Evaluation of the tool's effectiveness in diagnosing faulty application behaviour

• Case studies supporting the approach


Application Behaviour

• Factors that help capture application behaviour:
  • System calls
  • Signals
  • Environment variables
  • Resource limits
  • Access control

• Collecting and keeping historical information helps find the root cause of a problem quickly


Building Signatures...

• Choice of attributes: made with a goodness-of-fit test, the Kolmogorov-Smirnov (KS) test
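The slide names the KS test but not exactly how it is applied. A minimal sketch, assuming the test compares the observed values of one numeric attribute in a new trace (here a hypothetical list of write sizes) against the values recorded in normal runs: the two-sample statistic D is the largest gap between the two empirical CDFs, and the constant 1.358 is the standard large-sample coefficient for a 5% significance level. The function names are illustrative, not taken from the tool.

```python
import bisect
import math


def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic: the largest gap between the two empirical CDFs."""
    if not sample_a or not sample_b:
        raise ValueError("both samples must be non-empty")
    a, b = sorted(sample_a), sorted(sample_b)
    d = 0.0
    # Both ECDFs are step functions that only jump at observed values,
    # so checking every observed value is enough to find the maximum gap.
    for x in a + b:
        fa = bisect.bisect_right(a, x) / len(a)  # fraction of a that is <= x
        fb = bisect.bisect_right(b, x) / len(b)  # fraction of b that is <= x
        d = max(d, abs(fa - fb))
    return d


def ks_rejects(sample_a, sample_b, c_alpha=1.358):
    """True if the two samples look drawn from different distributions.

    Uses the large-sample critical value c(alpha) * sqrt((n + m) / (n * m));
    c_alpha = 1.358 corresponds to a 5% significance level.
    """
    n, m = len(sample_a), len(sample_b)
    critical = c_alpha * math.sqrt((n + m) / (n * m))
    return ks_statistic(sample_a, sample_b) > critical
```

For example, with normal write sizes `[4096] * 20 + [8192] * 20` and a faulty run of forty zero-byte writes, D is 1.0 against a critical value of about 0.30, so the attribute would be flagged as deviating.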


Signatures for system calls...


Handling multiple processes...

• Data is collected for each process separately
  • Relations between system calls are reflected correctly only after interleaved system calls are separated
  • Some attributes (e.g., signals, UID) are specific to a process

• For multithreaded applications, data is collected and signatures are built separately for each thread

• The current approach does not handle user-level threads
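Separating interleaved records matters because relations such as "which call follows which" are only meaningful within one process or thread. A minimal sketch, assuming each trace record is a hypothetical `(tid, syscall_name)` pair: in the raw interleaved order, thread 102's `socket` appears to be followed by thread 101's `read`, a pair that never occurs inside either thread.

```python
from collections import defaultdict


def split_by_thread(events):
    """Group interleaved (tid, syscall_name) records into per-thread sequences, keeping order."""
    per_thread = defaultdict(list)
    for tid, name in events:
        per_thread[tid].append(name)
    return dict(per_thread)


def call_pairs(seq):
    """Set of consecutive (previous, next) system-call pairs within one sequence."""
    return set(zip(seq, seq[1:]))


events = [(101, "open"), (102, "socket"), (101, "read"), (102, "connect"), (101, "close")]
per_thread = split_by_thread(events)
print(per_thread)                            # {101: ['open', 'read', 'close'], 102: ['socket', 'connect']}
print(sorted(call_pairs(per_thread[101])))   # [('open', 'read'), ('read', 'close')]
```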


Tool Design

System Architecture


One...Application Tracer...

• The tracer tool runs the target application under its control

e.g. ‘tracer application_program’

• Low overhead is crucial

• Uses the ptrace interface to collect the system call data from which signatures are built

• Some runtime behaviours (environment variables, resource limits, user ID, etc.) are not directly related to system calls
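Because these behaviours do not show up as system call arguments, a tracer has to record them by other means. A minimal sketch, assuming a Unix host, of snapshotting a few of them with Python's standard `os` and `resource` modules; which variables and limits to record is an illustrative choice, not the tool's actual list.

```python
import os
import resource


def snapshot_runtime_context(env_keys=("PATH", "HOME", "LANG")):
    """Capture runtime attributes that are not visible as system call arguments."""
    limits = {
        # Each entry is a (soft, hard) tuple as returned by getrlimit.
        "RLIMIT_NOFILE": resource.getrlimit(resource.RLIMIT_NOFILE),
        "RLIMIT_FSIZE": resource.getrlimit(resource.RLIMIT_FSIZE),
        "RLIMIT_CORE": resource.getrlimit(resource.RLIMIT_CORE),
    }
    return {
        "uid": os.getuid(),
        "euid": os.geteuid(),
        # Missing variables are recorded as None rather than skipped.
        "env": {key: os.environ.get(key) for key in env_keys},
        "rlimits": limits,
    }


print(snapshot_runtime_context())
```

RLIMIT_FSIZE is a useful limit to record because exceeding the file-size limit is one way a write can fail with EFBIG.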


Two...Signature Bank...


Three...Fault Diagnosis...

• The classifier tool identifies the root cause of a deviation from normal behaviour:
  • Access the signature bank for normal traces
  • Compare them with the faulty trace
  • Determine the root cause of the fault
  • Report the diagnosis to the user
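The comparison step can be pictured as looking up each attribute of the faulty trace among the values seen in normal runs. A minimal sketch, assuming a hypothetical bank layout of `{syscall: {attribute: set_of_seen_values}}` and treating any unseen value as a candidate root cause; the tool's real signatures (including its KS-based handling of numeric attributes) are richer than this exact-match check.

```python
def diagnose(signature_bank, faulty_trace):
    """List (syscall, attribute, value) triples in a faulty trace that no normal run produced.

    signature_bank: {syscall_name: {attribute: set_of_values_seen_in_normal_runs}}
    faulty_trace:   list of dicts, each with a "syscall" key plus attribute values
    """
    deviations = []
    for call in faulty_trace:
        seen = signature_bank.get(call["syscall"], {})
        for attr, value in call.items():
            if attr == "syscall":
                continue
            # A value never observed in normal runs is reported as a candidate root cause.
            if value not in seen.get(attr, set()):
                deviations.append((call["syscall"], attr, value))
    return deviations


bank = {
    "open": {"errno": {"OK"}},
    "write": {"fd": {3, 4}, "errno": {"OK"}},
}
faulty = [
    {"syscall": "open", "errno": "OK"},
    {"syscall": "write", "fd": 4, "errno": "EFBIG"},
]
print(diagnose(bank, faulty))  # [('write', 'errno', 'EFBIG')]
```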


Case Studies


Testing with Apache...

• WebStone 2.5 is used to test the tool with Apache

• WebStone 2.5 is a free benchmarking tool for web servers

• The signature bank was built by performing each operation ten times to generate the corresponding traces

• Example:
  • Faulty execution of the write system call
  • Unable to write to the log file
  • Root cause: error number EFBIG, indicating that the file is too large
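The root cause in this example is reported as an error number. At the Linux kernel interface, which is what a ptrace-based tracer observes, a failing system call returns a small negative value, -errno, so turning the raw return value into the EFBIG diagnosis is a table lookup. A sketch using the standard `errno` module and `os.strerror`; the function name and message format are illustrative.

```python
import errno
import os


def describe_syscall_result(syscall, ret):
    """Turn a raw Linux system call return value into a readable diagnosis string."""
    # On Linux, a failing system call returns -errno, a value in the range -4095..-1.
    if -4095 <= ret < 0:
        code = -ret
        name = errno.errorcode.get(code, f"errno {code}")
        return f"{syscall} failed: {name} ({os.strerror(code)})"
    return f"{syscall} returned {ret}"


print(describe_syscall_result("write", -errno.EFBIG))
print(describe_syscall_result("write", 512))  # write returned 512
```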


Testing with Apache...


Observation 1

• Comparison showing how the trace size changes over a 45-minute period

• 6.3 MB of space holds the record of nearly 11 million system call invocations
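As a rough check on how compact this record is (assuming 1 MB = 10^6 bytes), the figures above work out to well under one byte per invocation, which suggests the stored data is a merged or summarized form rather than one raw record per call:

```latex
\frac{6.3 \times 10^{6}\ \text{bytes}}{11 \times 10^{6}\ \text{calls}} \approx 0.57\ \text{bytes per call}
```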


Observation 2

• Comparison of how the trace file and the signature bank grow with the number of traces run

• The signature bank grows slowly because redundant data are merged
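Why the bank grows slowly can be seen with a set-based sketch: if each attribute's signature is simply the set of values observed in normal runs (an assumed, simplified representation), merging a trace adds only values that are new, so repeated behaviour costs nothing. All names and the sample trace are hypothetical.

```python
from collections import defaultdict


def new_bank():
    """Hypothetical bank layout: syscall -> attribute -> set of values seen in normal runs."""
    return defaultdict(lambda: defaultdict(set))


def merge_trace(bank, trace):
    """Fold one normal trace into the bank; values already present add nothing."""
    for call in trace:
        for attr, value in call.items():
            if attr != "syscall":
                bank[call["syscall"]][attr].add(value)
    return bank


def bank_size(bank):
    """Total number of distinct (syscall, attribute, value) entries stored."""
    return sum(len(values) for attrs in bank.values() for values in attrs.values())


trace = [
    {"syscall": "open", "path": "/etc/hosts", "errno": "OK"},
    {"syscall": "read", "size": 4096, "errno": "OK"},
]
bank = new_bank()
for _ in range(10):          # ten identical normal runs
    merge_trace(bank, trace)
print(bank_size(bank))       # 4: the repeats were merged away
```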


From other tests...

• CVS
  • Average slowdown: 29.6%
  • Collected 26 traces ranging from 0.1 MB to 1.6 MB
  • Recorded signature bank is 6.5 MB, consisting of about 1.8 million system calls

• PostgreSQL
  • Average slowdown: 15.7%
  • Collected traces ranging from 0.6 MB to 2.1 MB
  • Recorded signature bank is 3.2 MB


Limiting false positives

• The first cause is related to the KS test, which, like any statistical test, occasionally flags normal variation as abnormal

• The second cause is that the signature bank cannot cover all normal variations of the attributes
  • Aggregating more traces makes the bank more complete and gradually reduces false positives


Performance measure

• Most of the overhead comes from collecting information and updating the trace file whenever a system call occurs

• Sources of overhead:
  • Switching from the kernel to the tracer and back, at both system call entry and exit
  • Retrieving the system call number, return value, and related attributes
  • Looking up the user stack to read its contents

• Performance was improved by modifying the ptrace code to add two primitives:
  • PTRACE_SETBATCHSIZE and PTRACE_READBUFFER
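The slides do not describe the two new primitives beyond their names. Reading them as "let the kernel buffer records for a batch of system calls" and "let the tracer fetch that buffer in one request" (an assumption), a simple count of kernel-to-tracer round trips shows why batching should help: standard ptrace system call tracing stops the tracee at both entry and exit of every call, whereas a batch of B calls would need only one wake-up.

```python
import math


def tracer_stops(num_syscalls, batch_size=None):
    """Count kernel-to-tracer round trips under a simple, assumed cost model.

    batch_size=None models standard ptrace system call tracing, where the tracee
    stops at both entry and exit of every call. A batch size B models a kernel
    that buffers records and wakes the tracer once per batch (an assumption about
    PTRACE_SETBATCHSIZE / PTRACE_READBUFFER, not their documented behaviour).
    """
    if batch_size is None:
        return 2 * num_syscalls
    # One wake-up per full batch, plus one for a final partial batch.
    return math.ceil(num_syscalls / batch_size)


print(tracer_stops(1000))      # 2000
print(tracer_stops(1000, 64))  # 16
```

The model ignores the cost of copying each buffered record, so it overstates the savings; it only illustrates where the context-switch overhead goes.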


Improvement...


Limitations

• Labelling an application execution trace as faulty
  • Must be indicated manually

• Conservative approach to the amount of information captured in each trace
  • More analysis is needed to identify the minimum set of data that still provides high accuracy in detecting problems

• Results are limited, drawn from only a few case studies


THANK YOU