Works on my machine, your problem now? - QCon 2014

description

Can you get away with that answer after crashing your production website with a change you just deployed? Usually you can’t, and instead you’re tasked with figuring out and fixing the problem. In this session, we will talk about typical architectural, coding and deployment problems you might recognize, show what data you need to quickly identify them, and how to catch them before impacting the business.

Transcript of Works on my machine, your problem now? - QCon 2014

Page 1

COMPANY CONFIDENTIAL – DO NOT DISTRIBUTE

Works on my machine – your problem now?

Wolfgang Gottesheim

Compuware APM

Page 2

Business comes up with new features

Page 3

Testing?

Page 4

And this is what you end up with…

Page 5

System Unresponsive?

Page 6

What Operations tells Developers…

Page 7

…and what Devs would like to know

Page 8

…and what Devs would like to know

Top Contributor is related to String handling

99% of that time comes from RegEx Pattern Matching

Page Rendering is the main component

Page 9

Attitudes like this don’t help either

Image taken from https://www.scriptrock.com/blog/devops-whats-hype-about/

Page 10

Very “expensive” to work on these issues

~80% of problems are caused by ~20% of patterns

YES we know this

80% Dev Time in Bug Fixing

$60B Defect Costs

BUT

Page 11

Page 12

#1: Exhausted Resource Pools

Page 13

#2: Maxing out Worker Threads

The timeline shows how these active worker threads are distributed across all JVMs

At ~10:10 AM almost all JVMs max out their available worker threads

Detailed information for every single JVM

Page 14

Root Cause: Class Loading as Performance Hotspot

Most of the time is spent in CLASSLOADING during Peak Load

But the same is true for “normal” load. Classloading seems to be a general problem that is not load related

Page 15

Root Cause: Trying to Load a Missing Class

Class Loading impacts ALL transactions (fast or slow)

Class Loader tries to load a class ending in TransferValidatorBPBeanInfo

It’s a class that doesn’t exist

Page 16

#3: Deployment Mistakes

Page 17

Root Cause: Missing File

Page 18

#4: Different settings in Test & Prod

Page 19

#5: Real-world Data != Test Data

Page 20

#6: N+1 Query Problem
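
The slide only names the pattern, so here is a minimal sketch of what an N+1 query problem typically looks like, using Python's built-in sqlite3 module. The table names, the QueryCounter helper, and both loader functions are hypothetical and not taken from the talk. The first loader runs one query for the orders plus one more per order. The second fetches the same data with a single JOIN.

```python
import sqlite3


class QueryCounter:
    """Hypothetical helper: wraps a connection and counts executed queries."""

    def __init__(self, conn):
        self.conn = conn
        self.count = 0

    def query(self, sql, params=()):
        self.count += 1
        return self.conn.execute(sql, params).fetchall()


def build_db():
    """Create an in-memory database with 5 orders and 2 items per order."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT);
        CREATE TABLE items (id INTEGER PRIMARY KEY, order_id INTEGER, name TEXT);
        """
    )
    for order_id in range(1, 6):
        conn.execute(
            "INSERT INTO orders VALUES (?, ?)", (order_id, f"customer{order_id}")
        )
        for n in range(2):
            conn.execute(
                "INSERT INTO items (order_id, name) VALUES (?, ?)",
                (order_id, f"item{order_id}-{n}"),
            )
    return conn


def load_n_plus_one(db):
    """1 query for the orders, then 1 additional query per order (N+1)."""
    result = {}
    for order_id, _customer in db.query("SELECT id, customer FROM orders ORDER BY id"):
        rows = db.query(
            "SELECT name FROM items WHERE order_id = ? ORDER BY id", (order_id,)
        )
        result[order_id] = [name for (name,) in rows]
    return result


def load_with_join(db):
    """The same data fetched in a single roundtrip to the database."""
    result = {}
    rows = db.query(
        "SELECT o.id, i.name FROM orders o "
        "JOIN items i ON i.order_id = o.id ORDER BY o.id, i.id"
    )
    for order_id, name in rows:
        result.setdefault(order_id, []).append(name)
    return result


if __name__ == "__main__":
    slow = QueryCounter(build_db())
    fast = QueryCounter(build_db())
    load_n_plus_one(slow)
    load_with_join(fast)
    print("N+1 queries:", slow.count, "| JOIN queries:", fast.count)
```

With test data of a handful of rows the difference is invisible, which is one reason this pattern tends to surface only under production data volumes.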

Page 21

#7: Misconfigured Caching Framework

798772 DB Calls in 30 minutes with NO TRAFFIC

Page 22

#8: Memory Leaks

Still crashes

Problem fixed!

Fixed Version Deployed

Page 23

#9: Bloated Web Sites

17! JS Files – 1.7MB in Size

Useless information! Might even be a security risk!

Page 24

Recent example: Healthcare.gov

55 JS Files, 16 jQuery related!

Merging files can reduce roundtrips by 95%
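
The 95% figure can be sanity-checked with simple arithmetic. The sketch below assumes one roundtrip per JS file and assumes the 55 files are merged into 3 bundles. The bundle count is an illustrative assumption, not a number from the slide.

```python
def roundtrip_reduction(files_before, files_after):
    """Fraction of roundtrips saved, assuming one roundtrip per file."""
    if files_before <= 0 or not 0 <= files_after <= files_before:
        raise ValueError("expected 0 <= files_after <= files_before, files_before > 0")
    return 1 - files_after / files_before


if __name__ == "__main__":
    # 55 JS files (from the slide) merged into 3 bundles (assumed).
    print(f"{roundtrip_reduction(55, 3):.1%}")
```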

Page 25

#10: Browser caches

62! Resources not cached

49! Resources with short expiration
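
A check like the one behind these two counts can be approximated by inspecting the Cache-Control response header of each resource. The sketch below is a simplification under stated assumptions: the function name and the one-day threshold for "short expiration" are hypothetical, only max-age is considered (the Expires header is ignored), and browsers may still apply heuristic caching when no header is present.

```python
import re

# Assumed threshold: anything below one day counts as "short expiration".
SHORT_EXPIRATION_SECONDS = 24 * 60 * 60


def classify_caching(headers):
    """Classify a resource as 'not cached', 'short expiration' or 'cached'."""
    normalized = {key.lower(): value.lower() for key, value in headers.items()}
    cache_control = normalized.get("cache-control", "")
    if "no-store" in cache_control or "no-cache" in cache_control:
        return "not cached"
    match = re.search(r"max-age=(\d+)", cache_control)
    if match is None:
        return "not cached"
    max_age = int(match.group(1))
    if max_age == 0:
        return "not cached"
    if max_age < SHORT_EXPIRATION_SECONDS:
        return "short expiration"
    return "cached"


if __name__ == "__main__":
    print(classify_caching({"Cache-Control": "public, max-age=300"}))
```

Running such a check over every resource of a page in an automated test makes caching regressions visible before a release.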

Page 26

Problems that could have been avoided

BUT WHY are they still making it to Production? HOW can we catch them earlier?

Page 27

Root Cause: Disconnected Teams

Page 28

Solution: DevOps + Performance Focus

Page 29

Culture: Become ONE Team

Page 30

Culture: Testability

Page 31

Automate & Measure … Performance

Page 32

Automate & Measure … Scalability

Page 33

Automate Deployment

Page 34

Share Tools

Page 35

How? Performance Focus in Test Automation

          Test Framework Results      Architectural Data
Build #   Test Case      Status       # SQL   # Excep   CPU
17        testPurchase   OK           12      0         120ms
          testSearch     OK           3       1         68ms
18        testPurchase   FAILED       12      5         60ms
          testSearch     OK           3       1         68ms
19        testPurchase   OK           75      0         230ms
          testSearch     OK           3       1         68ms
20        testPurchase   OK           12      0         120ms
          testSearch     OK           3       1         68ms

We identified a regression

Problem solved

Let's look behind the scenes

Exceptions are probably the reason for the failed tests

Problem fixed, but now we have an architectural regression

Now we have functional and architectural confidence
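
The idea of looking beyond pass/fail can be sketched as a small check that compares each build's architectural metrics against a baseline build. The numbers below are the ones from the table on this slide. The BuildResult type, the find_regressions function, and the 20% tolerance are hypothetical choices for illustration, not how the tooling shown in the talk works.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BuildResult:
    build: int
    test: str
    passed: bool
    sql_count: int
    exceptions: int
    cpu_ms: int


# Values taken from the slide's table (builds 17 to 20).
RESULTS = [
    BuildResult(17, "testPurchase", True, 12, 0, 120),
    BuildResult(17, "testSearch", True, 3, 1, 68),
    BuildResult(18, "testPurchase", False, 12, 5, 60),
    BuildResult(18, "testSearch", True, 3, 1, 68),
    BuildResult(19, "testPurchase", True, 75, 0, 230),
    BuildResult(19, "testSearch", True, 3, 1, 68),
    BuildResult(20, "testPurchase", True, 12, 0, 120),
    BuildResult(20, "testSearch", True, 3, 1, 68),
]


def find_regressions(results, baseline_build, tolerance=0.2):
    """Flag failed tests and passing tests whose metrics drifted from baseline.

    tolerance is an assumed allowance for growth in the SQL statement count.
    """
    baseline = {r.test: r for r in results if r.build == baseline_build}
    findings = []
    for r in results:
        base = baseline.get(r.test)
        if r.build == baseline_build or base is None:
            continue
        if not r.passed:
            findings.append((r.build, r.test, "functional"))
        elif (
            r.sql_count > base.sql_count * (1 + tolerance)
            or r.exceptions > base.exceptions
        ):
            findings.append((r.build, r.test, "architectural"))
    return findings


if __name__ == "__main__":
    for finding in find_regressions(RESULTS, baseline_build=17):
        print(finding)
```

With build 17 as the baseline, build 18 is flagged because testPurchase fails, and build 19 is flagged even though every test passes, because testPurchase jumps from 12 to 75 SQL statements.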

Page 36

How? Performance Focus in Test Automation

Embed your Architectural Results in Jenkins

Page 37

Look beyond test pass/fail!

Diagram components: Developer, Version Control System, CI Server, dynaTrace Server

Diagram labels: Commit; Trigger build; Build and run tests; Publish performance metrics; Drilldown for further analysis; Inform about build status

Page 38

How? Performance Focus in Test Automation

Analyzing All Unit / Performance Tests

Analyze Perf Metrics

Identify Regressions

Page 39

How? Performance Focus in Test Automation

Cross Impact of KPIs

Page 40

Share Results

Page 41

© 2011 Compuware Corporation – All Rights Reserved

Simply Smarter