Transcript of Changing rules 1_stopcheating_slideshare
CLOUD TESTING: Rewriting the Rules of Performance Testing
RULE 1: Stop Cheating and Start Running Realistic Tests
SOASTA Webinar Series
BC (Before Cloud): We Worked With What We Had…
Before the web, when apps served hundreds, there was…
Circa 1991
When apps peaked at thousands, we had a few more options
Turn of 21st Century
“Virtual Users” were a valuable commodity
1 VU = $1200!
Yet many were left wanting
Untested websites, 2011: 75%
Necessity Led to Workarounds: How we’ve “cheated” to get the job done
1) Modified “Think Time” to stretch VUs
Example: 2 virtual users ≠ 1 virtual user divided in 2
2) Extrapolated results based on small lab tests
Educated or assisted guessing is no match for measuring at real scale
Necessity Led to Workarounds: How we’ve “cheated” to get the job done
3) Tested pages or assets in a silo, ignoring the realistic pace and flow of user behavior
Optimizes limited test hardware, but disregards session state, caching, etc.
4) Accepted blind spots by focusing on limited, single metrics (e.g. response time)
Without complete end-to-end views, everything’s a black box
Let’s Look at the NEW RULES
Scott Barber
Establishing Accuracy and Realism
1) Modifying Think Time: The Wrong Way
“If all you have is a hammer, everything looks like a nail” -- Bernard Baruch

To Cheat a Software License
• We did what we had to so we could generate some semblance of load
• We often found real and serious performance issues
• Compared to *not* cheating, we added value
• But the issues we found were often not the “right” ones
• We still couldn’t simulate production, and we still got burned

Stretch Limited Hardware
• We had the same issue with hardware, so we overloaded what we had
• Again, we found real and serious performance issues
• Again, it added value, but again, we rarely found the “right” issues
• And, again, we got burned in production
1) Modifying Think Time: The Right Way
The only way to simulate production… is to simulate production.

Users Think… and Type
• Guess what? They all do it at different speeds!
• Guess what else? It’s your job to figure out how to model and script those varying speeds

Determine how long they think
• Log files
• Industry research
• Observation
• Educated guess/intuition
• Combinations are best
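The varying think times described above can be scripted rather than hard-coded. Below is a minimal sketch in Python, assuming a log-normal distribution (right-skewed, a common choice for human pauses); the median and sigma values are illustrative assumptions, not measured figures -- derive real ones from your log files, research, or observation.

```python
import math
import random

def think_time(median_s=8.0, sigma=0.6):
    """Sample one think time (seconds) from a log-normal distribution:
    right-skewed, like real users -- most pause briefly, a few pause
    much longer. median_s and sigma are illustrative assumptions."""
    return random.lognormvariate(math.log(median_s), sigma)

# Every virtual user pauses for a different, random interval each time,
# instead of one fixed value stretched to fake extra users.
pauses = [think_time() for _ in range(5)]
print([round(p, 1) for p in pauses])
```

The same sampling function can feed whatever load tool you use; the point is that each iteration draws a fresh pause instead of replaying a single constant.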
1) Modifying Think Time: The Right Way
When you get it wrong, it’s… Frightening
When you get it right, it’s… Not Frightening
2) Extrapolating Capacity: The Wrong Way
Extrapolating performance test results is black magic… unless you are, or were trained by, Connie Smith, Ph.D.

The most common type of bad extrapolation:
• 1 leg of an n-leg system ≠ 1/nth capacity
• Fractional virtual resources ≠ fractional capacity

Other types of bad extrapolation:
• Faster processors in production ≠ faster response time
• More resources ≠ faster response time
• Any extrapolation that presumes linear correlations

DON’T DO IT
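A quick way to see why linear extrapolation fails: even the simplest textbook queueing model, M/M/1, predicts response times that explode as utilization approaches capacity. The numbers below are illustrative, and the model is deliberately crude -- it is here only to show the nonlinearity, not to predict any real system.

```python
def mm1_response_time(service_time_s, utilization):
    """Mean response time of an M/M/1 queue: R = S / (1 - rho).
    A deliberately simple model, used only to illustrate that
    response time grows nonlinearly with load."""
    if not 0 <= utilization < 1:
        raise ValueError("utilization must be in [0, 1)")
    return service_time_s / (1.0 - utilization)

S = 0.1  # 100 ms of pure service time (illustrative)
for rho in (0.4, 0.8, 0.95):
    ms = mm1_response_time(S, rho) * 1000
    print(f"utilization {rho:.0%}: mean response time {ms:.0f} ms")
```

Doubling the load from 40% to 80% utilization triples the response time here, and 95% is an order of magnitude worse -- which is exactly why “half the users on half the hardware” tells you very little about full-scale behavior.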
2) Measuring Capacity: The Right Way
Realistically, there are 3 ways to predict capacity
Trust your gut & cross your fingers• Gut feelings are sometimes very accurate• They can also cost you your job
Reverse cross-validate• Use post-release production data to modify & re-measure test
environment• Use new results to make predictions for prod• Check new predictions vs. reality, revise repeat
Find a way to run some tests in the actual production environment• You can learn a lot from loads below expected peak• A few of hours of scheduled maintenance in the middle of the night
can change *everything*
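The reverse cross-validation loop above can be sketched as a simple calibration step: learn a per-metric correction factor from how production diverged from the test environment, then apply it to the next round of lab results. The function name, the plain-ratio model, and all the numbers below are hypothetical illustrations, not a real methodology.

```python
def calibrate(test_results, prod_results):
    """Derive a per-metric correction factor (prod / test) from
    post-release data. A plain ratio is an illustrative assumption;
    real calibration may need a richer model."""
    return {m: prod_results[m] / test_results[m] for m in test_results}

# Measured after release (hypothetical numbers):
test = {"p95_ms": 800, "throughput_rps": 1200}
prod = {"p95_ms": 1100, "throughput_rps": 950}
factors = calibrate(test, prod)

# Next release's lab results, adjusted by the learned factors:
new_test = {"p95_ms": 700, "throughput_rps": 1400}
prediction = {m: new_test[m] * factors[m] for m in new_test}
print({m: round(v) for m, v in prediction.items()})
# Then check the prediction against the next production release and
# revise the factors -- the "revise, repeat" loop.
```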
3) Modeling User Flows: The Wrong Way
You can’t test everything… the possibilities are literally endless.

Implementing functional use cases or scenarios…
• Will have you scripting until the sun explodes, AND
• Will regularly miss “easy” stuff by choosing and prioritizing poorly

Picking the most common, or most “important,” flow…
• Is unlikely to catch the worst performance issues
• Is likely to lead the application to be “hyper-tuned” for that scenario
• Is likely to yield unwanted surprises
3) Modeling User Flows: The Right Way

Tell lots of little lies? …No! FIBLOTS:
• Frequent: common activities (get from logs)
• Intensive: e.g. resource hogs (get from developers/admins)
• Business-critical: even if these activities are both rare and not risky
• Legal: SLAs, contracts, and other stuff that will get you sued
• Obvious: what the users will see and are most likely to complain about; what is likely to earn you bad press
• Technically risky: new technologies, old technologies, places where it’s failed before, previously under-tested areas
• Stakeholder-mandated: don’t argue with the boss (too much)
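One way to put a flow-selection heuristic like this to work is a simple scoring matrix: rate each candidate flow against each criterion and let the totals drive what gets scripted first. The flows, scores, and equal weighting below are hypothetical examples, not recommendations.

```python
# Score each candidate user flow 0-3 on each criterion, then use the
# totals to prioritize what gets scripted. All values are hypothetical.
CRITERIA = ["Frequent", "Intensive", "Business-critical", "Legal",
            "Obvious", "Technically risky", "Stakeholder-mandated"]

flows = {
    "search":        [3, 1, 2, 0, 3, 1, 1],
    "checkout":      [2, 2, 3, 2, 3, 2, 3],
    "report_export": [0, 3, 1, 1, 0, 2, 0],
}

ranked = sorted(flows.items(), key=lambda kv: sum(kv[1]), reverse=True)
for name, scores in ranked:
    detail = " ".join(f"{c[0]}={s}" for c, s in zip(CRITERIA, scores))
    print(f"{name:14s} total={sum(scores):2d}  {detail}")
```

A weighted sum (e.g. doubling the legal column when contracts carry penalties) is a natural next step; the equal weights here are just the simplest starting point.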
4) Measuring Performance: The Wrong Way
All three have an average of 4.
Which has the “best” performance?
How do you know?
4) Measuring Performance: The Right Way
Now which has the “best” performance?
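The point of the slide can be reproduced in a few lines: three response-time samples with the identical average of 4 that tell completely different performance stories once you look past the mean. The data sets below are illustrative stand-ins for the ones on the slide.

```python
import statistics

# Three response-time samples (seconds), all with mean 4 -- but very
# different behavior. Numbers are illustrative.
a = [4, 4, 4, 4, 4]    # perfectly steady
b = [1, 2, 4, 6, 7]    # moderate spread
c = [1, 1, 1, 1, 16]   # one catastrophic outlier

for name, data in (("a", a), ("b", b), ("c", c)):
    print(name,
          "mean:", statistics.mean(data),
          "stdev:", round(statistics.pstdev(data), 2),
          "max:", max(data))
```

The mean alone cannot distinguish them; spread, maxima, and percentiles are what separate the steady system from the one that occasionally strands a user for 16 seconds -- which is why a single averaged metric is a blind spot.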