Towards Detecting Performance Anti-patterns Using Classification Techniques

Towards Detecting Performance Anti-Patterns Using Classification Techniques. Manjala Peiris and James H. Hill. 1st International Workshop on Machine Learning & Information Retrieval for Software Evolution, Nov 11, 2013, Silicon Valley, California, USA.

Description

This is the talk I gave on behalf of my Ph.D. student at the Machine Learning and Information Retrieval (MALIR) for Software Evolution (MALIR-SE) workshop at ASE 2013.

Transcript of Towards Detecting Performance Anti-patterns Using Classification Techniques

Page 1: Towards Detecting Performance Anti-patterns Using Classification Techniques

Towards Detecting Performance Anti-Patterns Using Classification Techniques. Manjala Peiris and James H. Hill. 1st International Workshop on Machine Learning & Information Retrieval for Software Evolution, Nov 11, 2013, Silicon Valley, California, USA.

Page 2

Motivation: Software Performance Anti-Patterns

• Common design choices that have negative consequences

• Focus solely on the performance of the system
  • e.g., throughput, response time

• Suggest solutions and refactorings
  • e.g., One Lane Bridge, Excessive Dynamic Allocations, God Class

Page 3

One Lane Bridge (Smith et al.)

Reasons for Anti-Pattern
• Lack of concurrency
• Limited number of resources
• Not utilizing available resources

Consequences
• Low system throughput
• High latency
• High response time

Only one or a few processes/threads are allowed to execute concurrently.
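In code, the anti-pattern is simply a single lock that every worker must cross. The following is an illustrative sketch, not code from the talk (the name `run_workers` and the counter scheme are made up); it measures how many threads are ever inside the critical section at the same time:

```python
import threading

def run_workers(n_threads, n_iterations, serialize):
    """Run workers and report the peak number of threads that were
    inside the critical section simultaneously."""
    bridge = threading.Lock()   # the "one lane bridge"
    gauge = threading.Lock()    # protects the concurrency counters
    in_flight = 0
    peak = 0

    def work():
        nonlocal in_flight, peak
        for _ in range(n_iterations):
            if serialize:
                bridge.acquire()   # every thread funnels through one lock
            with gauge:
                in_flight += 1
                peak = max(peak, in_flight)
            # ... the actual request processing would happen here ...
            with gauge:
                in_flight -= 1
            if serialize:
                bridge.release()

    threads = [threading.Thread(target=work) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return peak
```

With `serialize=True`, the peak is always 1 regardless of how many threads or cores exist, which is exactly the throughput and response-time consequence listed above: available resources go unused.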

Page 4

Excessive Dynamic Allocations (Smith et al.)

Reason for Anti-Pattern
• Objects are created when they are first accessed and then destroyed when no longer needed

Consequences
• The cost of dynamic allocations: N × (Sc + Sd), where N is the number of calls and Sc, Sd are the costs of an object creation and deletion
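Written out, the cost term reconstructed from the slide's legend (N calls, Sc creation cost, Sd deletion cost) is a simple model; the function names below are illustrative, not from the talk:

```python
def dynamic_allocation_overhead(n_calls, create_cost, delete_cost):
    """Total overhead when an object is created on every first access
    and destroyed after use: N * (Sc + Sd)."""
    return n_calls * (create_cost + delete_cost)

def pooled_overhead(create_cost, delete_cost):
    """With an object pool, creation/deletion cost is paid once,
    not once per call."""
    return create_cost + delete_cost
```

For example, at 1 million calls with 2 µs to create and 1 µs to delete, per-call allocation costs about 3 seconds in total, while a pool pays roughly 3 µs once.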

Page 5

Why Automatic Detection of Performance Anti-Patterns
• Difficult to manually analyze large amounts of performance data
• Makes sense of large amounts of performance data rather than just showing it to users
• Provides intuition to system designers about where refactoring is required

Page 6

Current Approaches for Anti-Pattern Detection

Approach based on software design artifacts
1. Annotate the software design
2. Run simulations and gather performance data
3. Apply rules

Approaches based on runtime data
• Architecture dependent (e.g., J2EE anti-patterns)
• Require architecture-specific deployment details

Page 7

Non-intrusive Performance Anti-Pattern Detector (NiPAD)
• Collect system performance metrics
  • Software execution with a performance anti-pattern (Class 0)
  • Software execution without the performance anti-pattern (Class 1)
• Normalize the data
• Train a classifier
  • Naïve Bayes, Logistic Regression, FLD, SVM (Linear), SVM (RBF)
• Predict for new performance data for which the class label is unknown
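The steps above can be sketched end to end. This is not the authors' implementation: it uses min-max normalization (one plausible choice; the slides do not say which scheme NiPAD uses) and a from-scratch Gaussian Naïve Bayes, one of the classifiers listed; all names and data shapes are assumptions:

```python
import math
from statistics import mean, stdev

def normalize(rows):
    """Min-max normalize each feature column to [0, 1]."""
    cols = list(zip(*rows))
    lo = [min(c) for c in cols]
    hi = [max(c) for c in cols]
    return [[(v - l) / (h - l) if h > l else 0.0
             for v, l, h in zip(r, lo, hi)] for r in rows]

def train_gaussian_nb(rows, labels):
    """Per class: prior probability plus per-feature mean and stdev."""
    model = {}
    for c in set(labels):
        members = [r for r, y in zip(rows, labels) if y == c]
        cols = list(zip(*members))
        model[c] = (len(members) / len(rows),
                    [(mean(col), stdev(col) or 1e-9) for col in cols])
    return model

def predict(model, row):
    """Pick the class with the highest log posterior."""
    def log_gauss(v, mu, sd):
        sd = max(sd, 1e-9)
        return -math.log(sd * math.sqrt(2 * math.pi)) - (v - mu) ** 2 / (2 * sd * sd)
    return max(model, key=lambda c: math.log(model[c][0]) +
               sum(log_gauss(v, mu, sd)
                   for v, (mu, sd) in zip(row, model[c][1])))
```

Training would use labeled metric vectors from runs with the anti-pattern (Class 0) and without it (Class 1); prediction then labels metric vectors from runs whose status is unknown.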

Page 8

System-level Metrics

Metric            Description
CPU Idle Time     The time the CPU is idle, not doing any work
CPU User Time     CPU utilization for user applications
CPU System Time   CPU utilization for system-level programs
Free Memory       Total free memory when invoking the application
Cached Memory     Total cached memory available when invoking the application
Total Commits     Total number of commits

• Metrics are collected in 1-second epochs
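On Linux, the CPU times above can be sampled from /proc/stat (the slides do not say how NiPAD collects them, so this is just one plausible source). A minimal parser, fed a literal sample line here rather than a live read; a per-epoch collector would read the file once a second and diff successive samples:

```python
def parse_cpu_line(line):
    """Parse the aggregate 'cpu' line of /proc/stat.

    Field order (per proc(5)): user nice system idle iowait irq softirq ...
    Values are cumulative clock ticks since boot."""
    fields = line.split()
    if fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    user, nice, system, idle = (int(v) for v in fields[1:5])
    return {"user": user, "nice": nice, "system": system, "idle": idle}

# Literal sample in /proc/stat format, for illustration:
sample = parse_cpu_line("cpu  4705 150 1120 16250 520 22 14 0 0 0")
```

A real collector would open "/proc/stat" each epoch, call `parse_cpu_line` on its first line, and subtract the previous sample to get per-epoch utilization.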

Page 9

CPU Times with One Lane Bridge

Page 10

CPU Times without One Lane Bridge

Page 11

Experiments with Apache Web Server
• Emulating the One Lane Bridge anti-pattern
• Use Apache Benchmark to generate a load
• Server configurations:

One Lane Bridge: 300 concurrent clients sending 1 million requests; server has 150 threads
Without One Lane Bridge: 300 concurrent clients sending 1 million requests; server has 300 threads

• 200 records for training, 400 records for testing
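Apache Benchmark (`ab`) exposes the two load parameters used here as `-n` (total requests) and `-c` (concurrent clients). A small helper that assembles such an invocation; the URL is a placeholder, and the slides do not show the exact command the authors ran:

```python
def ab_command(url, total_requests, concurrency):
    """Build an Apache Benchmark invocation:
    -n total number of requests, -c number of concurrent clients."""
    return ["ab", "-n", str(total_requests), "-c", str(concurrency), url]

# Load matching the table: 300 concurrent clients, 1 million requests.
cmd = ab_command("http://localhost/", 1_000_000, 300)
```

The resulting list can be handed to `subprocess.run` against the server under test.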

Page 12

Classification Results for One Lane Bridge

[Bar chart: accuracy (y-axis 0 to 1) for each classifier: Naïve Bayes, Logistic Regression, FLD, SVM (Linear), SVM (RBF)]

Page 13

Classification Results for One Lane Bridge with Noise

[Bar chart: accuracy (y-axis 0 to 0.8) for each classifier: Naïve Bayes, Logistic Regression, FLD, SVM (Linear), SVM (RBF)]

Page 14

Experiments with Apache Web Server
• Emulating the Excessive Dynamic Allocations anti-pattern
• Server configurations:

Excessive Dynamic Allocations: 300 concurrent clients sending 1 million requests; server has 300 threads; memory pool size of 1 KB
Without Excessive Dynamic Allocations: 300 concurrent clients sending 1 million requests; server has 300 threads; memory pool size of 1 MB

• 200 records for training, 400 records for testing

Page 15

Classification Results for Excessive Dynamic Allocations

[Bar chart: accuracy (y-axis 0 to 0.7) for each classifier: Naïve Bayes, Logistic Regression, FLD, SVM (Linear), SVM (RBF)]

Reason for poor classification performance
• Emery et al. show that custom memory allocation techniques do not provide much advantage

Page 16

Cost Analysis for One Lane Bridge

• The positive class is the one that does not have the anti-pattern
  • Predicts these situations more accurately
• The cost of misclassification depends on the nature of the software and the software development cost
• This technique will eliminate unnecessary software testing
  • Not good for real-time software systems

Classifier      Sensitivity   Specificity   Precision   Accuracy
Logistic        0.95          0.62          0.53        0.76
FLD             0.95          0.66          0.56        0.7
Naïve Bayes     0.92          0.28          0.38        0.5
SVM (Linear)    0.98          0.92          0.84        0.94
SVM (RBF)       0.96          0.7           0.61        0.75
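The table's four measures follow directly from confusion-matrix counts. A small helper; the counts below are made up for illustration, since the slides do not give the raw true/false positive and negative numbers:

```python
def classification_metrics(tp, fp, tn, fn):
    """Sensitivity (recall), specificity, precision, and accuracy
    from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "precision":   tp / (tp + fp),
        "accuracy":    (tp + tn) / (tp + fp + tn + fn),
    }

# Hypothetical counts, not the study's data:
m = classification_metrics(tp=98, fp=16, tn=184, fn=2)
```

High sensitivity with lower specificity, as in several rows above, means anti-pattern-free runs are recognized reliably while some runs with the anti-pattern are misclassified as clean.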


Page 18

Concluding Remarks

Limitations
• System-level performance metrics may not show enough variation
  • e.g., Excessive Dynamic Allocations
• Bad performance may have other causes
  • e.g., configuration errors, bad user inputs

Future work
• Currently including the behavior of the software application in the analysis
• Applying this technique to other software applications

Page 19

Questions