Debugging Skynet: A Machine Learning Approach to Log Analysis - Ianir Ideses, logz.io - DevOpsDays...

Post on 08-Jan-2017


Debugging Skynet

A Machine Learning Approach to Log Analysis

Ianir Ideses - Logz.io

The Problem - Overlogging

• Millions of logs per week

• Important logs get lost in the clutter

• Need to surface the relevant logs, deemphasize irrelevant logs

Proposed Solution

• A Machine Learning approach

• Can sift through large amounts of data

• Can evolve and react to changes in data

• Requires large amounts of data to be effective

Machine Learning

• Unsupervised
  • Clustering
  • Anomaly detection

• Supervised
  • Recommender systems
  • Classifiers

Unsupervised Machine Learning

• No labels are needed, just lots of data

• Useful for reducing a large number of data points to a smaller set of clusters

Unsupervised Machine Learning

"GET /twiki/bin/edit/Main/Double_bounce_sender?topicparent=Main.Confi
"GET /twiki/bin/rdiff/TWiki/NewUserTemplate?rev1=1.3&rev2=1.2 HTTP/1.
"GET /mailman/listinfo/hsdivision HTTP/1.1" 200 6291
"GET /twiki/bin/view/TWiki/WikiSyntax HTTP/1.1" 200 7352
"GET /twiki/bin/view/Main/DCCAndPostFix HTTP/1.1" 200 5253
"GET /twiki/bin/oops/TWiki/AppendixFileSystem?template=oopsmore&param1=1.
"GET /twiki/bin/view/Main/PeterThoeny HTTP/1.1" 200 4924
"GET /twiki/bin/edit/Main/Header_checks?topicparent=Main.Configuratio
"GET /twiki/bin/attach/Main/OfficeLocations HTTP/1.1" 401 12851
"GET /twiki/bin/view/TWiki/WebTopicEditTemplate HTTP/1.1" 200 3732

"GET /app_dev.php/ HTTP/1.1" 200 6715 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/42.0.2311.90 Safari/537.36"
"GET /bundles/framework/css/body.css HTTP/1.1" 200 6657 "http://my.log-sandbox/app_dev.php/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/42.0.231
"GET /bundles/framework/css/structure.css HTTP/1.1" 200 1191 "http://my.log-sandbox/app_dev.php/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/42.
"GET /bundles/acmedemo/css/demo.css HTTP/1.1" 200 2204 "http://my.log-sandbox/app_dev.php/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/42.0.2311
"GET /bundles/acmedemo/images/welcome-quick-tour.gif HTTP/1.1" 200 4770 "http://my.log-sandbox/app_dev.php/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_3) AppleWebKit/537.36 (KHTML, like Gecko)
"GET /bundles/acmedemo/images/welcome-demo.gif HTTP/1.1" 200 4053 "http://my.log-sandbox/app_dev.php/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrom

Nov 20 17:27:55 HANNIBAL MyProgram[13163]: Program started by User 1000
Nov 21 17:27:53 HANNIBAL MyProgram[13163]: Program terminated by User 1000
Nov 21 17:27:58 JANE MyProgram[13163]: Program started by User 555
Nov 23 18:27:53 ARILOU MyProgram[13163]: Program stopped by User 777

Supervised Machine Learning

• Learning from labeled examples

• Requires a well-defined question:
  • Is this email spam?
  • Is this object a car?
  • Is this log interesting?

• Deployed successfully in many domains; the most notable classifiers are neural networks (NN), SVMs, and Bayesian classifiers

Supervised Machine Learning - SVM

• Data elements are arranged in vectors

• Each vector index is assigned a weight in the training phase

• A score is computed by summing up the relevant weights

Term weights:

connection: 0.1
error: 0.5
success: -0.9
failure: 0.3

“Connection failure”: 0.1 + 0.3 = 0.4

“Connection success”: 0.1 - 0.9 = -0.8
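The two scores above can be reproduced with a minimal Python sketch. It assumes binary word features (a word either occurs or it does not) and omits the bias term that a trained linear SVM would normally carry; the function name `score` is illustrative and not from the talk.

```python
# Term weights taken from the slide; in practice they come out of training.
WEIGHTS = {"connection": 0.1, "error": 0.5, "success": -0.9, "failure": 0.3}

def score(message, weights=WEIGHTS):
    """Sum the weights of the known terms that occur in the message."""
    words = set(message.lower().split())
    return sum(w for term, w in weights.items() if term in words)

for msg in ("Connection failure", "Connection success"):
    print(f"{msg!r}: {score(msg):+.1f}")
```

A positive score is read here as "more likely interesting", which matches the slide's example but is an assumption about the sign convention.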

Log Relevancy

• An ill-posed problem

• Relevancy is user specific

• People tend to search for known issues

• There are also unknown unknowns

• Labels are potentially very tedious to acquire

Proposed Solution - Labels

• Acquiring labels:
  • Implicit/explicit user behavior
  • Inter-user similarities
  • Public knowledge bases

Machine Learning in Practice

• Data is textual, numerical and alphanumerical

• Classifiers that have shown good results:
  • Random Forests, which resemble flow-chart decision making
  • Linear SVM

• Both classifiers are easy to interpret in the feature space

Machine Learning in Practice

connected: -0.157199772246
to provider: -0.15319903564
connected successfully: -0.15319903564

unable: 0.671539714688
topic: 0.678756599452
error: 0.788508324168
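One way to read the weights listed above is to sort them and split them by sign. The sketch below does only that; it assumes positive weights push a log toward "interesting" and negative weights push it away, which the slide suggests but does not state. The helper name `rank_features` is illustrative.

```python
# Feature weights copied from the slide.
weights = {
    "connected": -0.157199772246,
    "to provider": -0.15319903564,
    "connected successfully": -0.15319903564,
    "unable": 0.671539714688,
    "topic": 0.678756599452,
    "error": 0.788508324168,
}

def rank_features(weights):
    """Return (feature, weight) pairs sorted from most negative to most positive."""
    return sorted(weights.items(), key=lambda item: item[1])

ranked = rank_features(weights)
negative = [name for name, w in ranked if w < 0]
positive = [name for name, w in ranked if w > 0]
print("push the score down:", negative)
print("push the score up:", positive)
```

Because each feature is a word or a short phrase, a list like this can be inspected directly, which is what makes a linear model easy to interpret in the feature space.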

Machine Learning in Practice - Modules

• Log normalization

• Label acquisition

• Model training

• Log classification and enhancement

Log Normalization

• Lowercase, stem, remove stop words

• Identify common fields (timestamp, severity, etc.)

• Identify variable, function, and class names

• Identify known reserved words

• Cluster logs that share the same prototype
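The normalization steps above might look roughly like the following Python sketch, applied to the syslog samples shown earlier. The regular expressions, the stop-word list, and the `<num>` placeholder are assumptions made for illustration; stemming and identifier detection are left out to keep the example short.

```python
import re
from collections import defaultdict

# Hypothetical patterns and stop words, chosen only for this example.
SYSLOG = re.compile(
    r"^(?P<timestamp>\w{3}\s+\d{1,2}\s+\d{2}:\d{2}:\d{2})\s+"
    r"(?P<host>\S+)\s+(?P<message>.*)$"
)
NUMBER = re.compile(r"\d+")
STOP_WORDS = {"by", "the", "a", "to"}

def normalize(line):
    """Reduce a raw log line to a prototype string."""
    match = SYSLOG.match(line.strip())
    # Drop the timestamp and host fields when the line looks like syslog.
    message = match.group("message") if match else line.strip()
    # Lowercase and mask variable numeric values such as PIDs and user IDs.
    message = NUMBER.sub("<num>", message.lower())
    tokens = [t for t in message.split() if t not in STOP_WORDS]
    return " ".join(tokens)

def cluster(lines):
    """Group log lines that share the same prototype."""
    groups = defaultdict(list)
    for line in lines:
        groups[normalize(line)].append(line)
    return dict(groups)

logs = [
    "Nov 20 17:27:55 HANNIBAL MyProgram[13163]: Program started by User 1000",
    "Nov 21 17:27:53 HANNIBAL MyProgram[13163]: Program terminated by User 1000",
    "Nov 21 17:27:58 JANE MyProgram[13163]: Program started by User 555",
    "Nov 23 18:27:53 ARILOU MyProgram[13163]: Program stopped by User 777",
]
clusters = cluster(logs)
for prototype, members in clusters.items():
    print(len(members), prototype)
```

With these samples, the two "Program started" lines collapse into one prototype even though host and user ID differ.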

Labeler

• Different sources for labels:
  • CQA (community question answering) sites
  • Explicit user interaction
  • Implicit user interaction
  • Heuristics

Log Enhancer

• Use knowledge about log events to add prior data

• Suggest solutions to known problems

• Tag relevant logs for display to the user
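As a rough illustration of the enhancer's role, the Python sketch below tags a log as relevant from a classifier score and attaches a suggestion when the message matches a known issue. The `KNOWN_ISSUES` entries, the `enhance` function, and the zero threshold are all hypothetical; the talk does not describe the actual implementation.

```python
# Hypothetical knowledge base mapping a phrase to a suggested fix.
KNOWN_ISSUES = {
    "connection failure": "Check that the upstream service is reachable.",
}

def enhance(log, relevance_score, threshold=0.0):
    """Return the log with a relevance tag and any matching suggestions."""
    text = log.lower()
    suggestions = [fix for phrase, fix in KNOWN_ISSUES.items() if phrase in text]
    return {
        "message": log,
        "relevant": relevance_score > threshold,
        "suggestions": suggestions,
    }

print(enhance("Connection failure on node 3", 0.4))
print(enhance("Connection success", -0.8))
```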

Flow

[Diagram with boxes: Logs, Log Normalization, Labeler, ML - Training, Classifiers, Log Enhancer, Logs]

Machine Learning at Scale

• Use Spark to drive high throughput, high scale

• Terabytes of data, daily

• Spot Instances to keep costs at bay

To Sum Up

• Formulate your question

• Get enough data

• Get enough labels

• Clean data

• Train your classifier