Performance Oriented Design

Posted on 11-Nov-2014

Performance Oriented Design, presented at QCon São Paulo 2011 by Rodrigo Campos

QCon São Paulo 2011

Rodrigo Albani de Campos - @xinu

camposr@gmail.com

Agenda

• Performance & Design

• Why should I care ?

• What should I measure ?

• References

What is performance ?

the capabilities of a machine or product, esp. when observed under particular conditions : the hardware is put through tests which assess the performance of the processor.

What is design ?

his design of reaching the top: intention, aim, purpose, plan, intent, objective, object, goal, end, target; hope, desire, wish, dream, aspiration, ambition.

McLaren MP4 12c GT3

Underlying Operating Systems

Read IOPS
Write IOPS
Run Queue
USER CPU
SYSTEM CPU
Interrupts
IO Wait
Page in
Page out
Page Faults
Network Traffic
Network Collision
Packet Loss
Disk Usage
# processes
# users
Memory Usage
Resident Size
Buffers
Kernel Tables

What about code ?

Apr 25, 2011 5:44:02 PM org.apache.fop.fo.FOTreeBuilder fatalError
SEVERE: javax.xml.transform.TransformerException: java.lang.NullPointerException: Parameter alpha must not be null
Apr 25, 2011 5:44:02 PM org.apache.fop.cli.Main startFOP
SEVERE: Exception
javax.xml.transform.TransformerException: java.lang.NullPointerException: Parameter alpha must not be null
    at org.apache.fop.cli.InputHandler.transformTo(InputHandler.java:217)
    at org.apache.fop.cli.InputHandler.renderTo(InputHandler.java:125)
    at org.apache.fop.cli.Main.startFOP(Main.java:166)
    at org.apache.fop.cli.Main.main(Main.java:197)
Caused by: javax.xml.transform.TransformerException: java.lang.NullPointerException: Parameter alpha must not be null
    at org.apache.xalan.transformer.TransformerImpl.executeChildTemplates(TransformerImpl.java:2416)
    at org.apache.xalan.templates.ElemLiteralResult.execute(ElemLiteralResult.java:1374)
    at org.apache.xalan.templates.ElemApplyTemplates.transformSelectedNodes(ElemApplyTemplates.java:393)
    at org.apache.xalan.templates.ElemApplyTemplates.execute(ElemApplyTemplates.java:176)

We’ve been riding space shuttles blindfolded and handcuffed

Why should I care ?

Capacity planning is not just about the future anymore.

Today, there is a serious need to squeeze more out of your current capital equipment.

The Guerrilla Manual Online

http://www.perfdynamics.com/Manifesto/gcaprules.html

Why should I care ?

“Our systems are very simple, there’s no need for such performance metrics”

It goes like this...

The Internet

Web Server

Application Server

Database

It goes like this...

The Internet

Web Server

Application Server

Slaves RO

Master RW

It goes like this...

The Internet

Web Server

Application Server

Master RW

Slaves RO

Evil Machines Corporation

Caches

CPUs will be idle

Disks will be sleeping

Network will be underused

... and your users will be complaining...

Why should I care ?

“But we are using the Cloud !”

Why should I care ?

• So now you’re in a utility computing model

• You’re charged per usage

Why should I care ?

“Updating performance counters will make my code run slower”

Why should I care ?

• Average datacenter CPU utilization is around 15%

• If updating performance counters is a problem, then you really need them

• Those microseconds will save you hours of troubleshooting !
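To put "those microseconds" in perspective, the sketch below times a single counter update. It is only an illustration: `CallCounter` and `update_cost_seconds` are hypothetical names that do not come from the talk, and the figure you get depends on your machine and runtime.

```python
import timeit


class CallCounter:
    """Hypothetical in-process counter: number of calls and total service time."""

    def __init__(self):
        self.calls = 0
        self.total_seconds = 0.0

    def update(self, elapsed_seconds):
        # Two additions per measured call: this is the whole "overhead".
        self.calls += 1
        self.total_seconds += elapsed_seconds


def update_cost_seconds(iterations=100_000):
    """Return the average wall-clock cost of one counter update, in seconds."""
    counter = CallCounter()
    total = timeit.timeit(lambda: counter.update(0.001), number=iterations)
    return total / iterations


if __name__ == "__main__":
    cost = update_cost_seconds()
    print(f"one counter update costs about {cost * 1e6:.3f} microseconds")
```

Comparing that cost with the service time of the operation being measured shows how small the instrumentation overhead is in relative terms.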

Why should I care ?

“These are non-functional requirements”

Why should I care ?

Server delay   Distinct       Query        Revenue/User   Any Clicks   Satisfaction   Time to Click
               Queries/User   Refinement                                              (increase in ms)
50 ms          0              0            0              0            0              0
200 ms         0              0            0              -0.30%       -0.40%         500
500 ms         0              -0.60%       -1.20%         -1.00%       -0.90%         1200
1000 ms        -0.70%         -0.90%       -2.80%         -1.90%       -1.60%         1900
2000 ms        -1.80%         -2.10%       -4.30%         -4.40%       -3.80%         3100

The User and Business Impact of Server Delays, Additional Bytes, and HTTP Chunking in Web Search - Eric Schurman (Amazon), Jake Brutlag (Google)

http://velocityconf.com/velocity2009/public/schedule/detail/8523

Why should I care ?

“Fast isn’t a feature, fast is a requirement”

Jesse Robbins - Opscode

What should I measure ?

Queues

The not so typical performance metrics

Agner Krarup Erlang

• Invented the fields of traffic engineering and queuing theory

• 1909 - Published “The Theory of Probabilities and Telephone Conversations”

• 1917 - Published “Solution of some Problems in the Theory of Probabilities of Significance in Automatic Telephone Exchanges”

Queues

The not so typical performance metrics

• 1961 - CTSS was first demonstrated at MIT

• 1965 - Allan Scherr used the machine repairman problem to model a time-shared system as part of Project MAC

• Another offspring of Project MAC is Multics

• 1972 - IBM System/370 Model 158-3: 1.0 MIPS @ 1.0 MHz

• Average purchase price: $771,000*

• No disks or peripherals included

• About $4,082,039 in 2011 dollars

• 2011 - Intel Core i7 Extreme Edition 990X peaks at 159,000 MIPS @ 3.46 GHz

* Source: http://www-03.ibm.com/ibm/history/exhibits/mainframe/mainframe_PP3135.html

Queues

The not so typical performance metrics

Computer System

CPU

Disks

Queues

The not so typical performance metrics

[Diagram: a queue and its server in an open/closed network, annotated with the symbols below]

A   Arrival Count
λ   Arrival Rate (A/T)
W   Time spent in Queue
S   Service Time
R   Residence Time (W+S)
C   Completed tasks count
X   System Throughput (C/T)
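The quantities in the legend are related by simple arithmetic. The following is a minimal sketch, assuming T is the length of the observation window (the slide uses T without defining it). The class name, field names and sample numbers are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class QueueObservation:
    """Raw measurements for one queue over one observation window."""
    arrivals: int            # A: arrival count
    completions: int         # C: completed tasks count
    window_seconds: float    # T: length of the window (assumed meaning of T)
    wait_seconds: float      # W: average time spent in queue
    service_seconds: float   # S: average service time

    @property
    def arrival_rate(self):
        """λ = A / T"""
        return self.arrivals / self.window_seconds

    @property
    def throughput(self):
        """X = C / T"""
        return self.completions / self.window_seconds

    @property
    def residence_time(self):
        """R = W + S"""
        return self.wait_seconds + self.service_seconds


if __name__ == "__main__":
    # Hypothetical one-minute sample.
    obs = QueueObservation(arrivals=1200, completions=1180,
                           window_seconds=60.0,
                           wait_seconds=0.030, service_seconds=0.015)
    print("arrival rate:", obs.arrival_rate, "per second")
    print("throughput:", obs.throughput, "per second")
    print("residence time:", obs.residence_time, "seconds")
```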

Arrival Rate (λ)

• Pretty straightforward

• Requests per second/hour/day

• Not the same as throughput (X)

• Although in a steady state:

• A = C as T →∞

• λ = X
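The steady-state claim can be checked with a toy model. The sketch below assumes a single FIFO server, one arrival every 100 ms and a fixed 80 ms service time; none of these numbers come from the talk. Over a short window the throughput lags the arrival rate, and over a long window the two nearly coincide.

```python
def simulate_fifo(arrival_times_ms, service_ms):
    """Return the completion time of each request on a single FIFO server."""
    completions = []
    server_free_at = 0
    for arrived in arrival_times_ms:
        start = max(arrived, server_free_at)  # wait if the server is busy
        server_free_at = start + service_ms
        completions.append(server_free_at)
    return completions


def rates(arrival_times_ms, completion_times_ms, window_ms):
    """Return (arrival rate, throughput) per second over [0, window_ms]."""
    a = sum(1 for t in arrival_times_ms if t < window_ms)       # A
    c = sum(1 for t in completion_times_ms if t <= window_ms)   # C
    seconds = window_ms / 1000
    return a / seconds, c / seconds


if __name__ == "__main__":
    arrivals = list(range(0, 200_000, 100))   # one request every 100 ms
    completions = simulate_fifo(arrivals, service_ms=80)
    for window_ms in (150, 1_050, 100_050):
        lam, x = rates(arrivals, completions, window_ms)
        print(f"T={window_ms / 1000:>8.2f}s  lambda={lam:.3f}/s  X={x:.3f}/s")
```

The gap between A and C never exceeds the requests still in flight, so dividing by a growing T makes λ and X converge.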

Service Time (S)

• Time spent in processing

• Web server response time

• Total query time

• I/O operation duration
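One way to collect service times like these is to wrap the operation in a timer. The sketch below is an assumption about how that could look, not the talk's implementation; `service_timer` is a hypothetical helper and the `time.sleep` call stands in for real work.

```python
import time
from contextlib import contextmanager


@contextmanager
def service_timer(samples):
    """Append the elapsed wall-clock time of the wrapped block to `samples`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        # Recorded even if the block raises, so failures are measured too.
        samples.append(time.perf_counter() - start)


if __name__ == "__main__":
    samples = []
    with service_timer(samples):
        time.sleep(0.01)  # stand-in for a query or an I/O operation
    print(f"service time: {samples[0] * 1000:.2f} ms")
```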

What to look for ?

• Stretch factor

• Method Count

• Method Service Time

• Geolocation

• Inbound & Outbound Traffic

• Round Trip Delays
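The slide does not define the stretch factor. A minimal sketch, assuming it means residence time divided by service time (R/S), which is how the term is commonly used in queueing literature:

```python
def stretch_factor(residence_ms, service_ms):
    """Return R / S: how much queueing stretches the bare service time.

    A value of 1.0 means no waiting at all; larger values mean requests
    spend proportionally more time in the queue. This definition is an
    assumption, since the slide only names the metric.
    """
    if service_ms <= 0:
        raise ValueError("service time must be positive")
    return residence_ms / service_ms


if __name__ == "__main__":
    # Hypothetical request: 15 ms of service inside a 45 ms residence time.
    print(stretch_factor(45, 15))
```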

What should I measure ?

Average Hits/s = 65.142

Average Svc time = 0.0159

What should I measure ?

• A simple tag collection data store

• For each data operation:

• A 64 bit counter for the number of calls

• An average counter for the service time
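The two counters per operation can be sketched as follows. Everything here is illustrative: `OperationStats`, `STATS`, `measured` and `fetch_datum` are hypothetical names, and the dictionary lookup stands in for a real data operation.

```python
import time
from collections import defaultdict
from functools import wraps


class OperationStats:
    """Call counter plus running average of service time for one operation."""

    def __init__(self):
        self.calls = 0      # Python ints never overflow; in C this would be a uint64_t
        self.avg_ms = 0.0   # running average service time, in milliseconds

    def record(self, elapsed_ms):
        self.calls += 1
        # Incremental mean: no need to keep every sample around.
        self.avg_ms += (elapsed_ms - self.avg_ms) / self.calls


STATS = defaultdict(OperationStats)  # one entry per operation name


def measured(name):
    """Decorator that records call count and service time under `name`."""
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return func(*args, **kwargs)
            finally:
                STATS[name].record((time.perf_counter() - start) * 1000.0)
        return wrapper
    return decorator


@measured("fetchDatum")
def fetch_datum(store, key):
    # Stand-in for the real data operation; here just a dict lookup.
    return store.get(key)


if __name__ == "__main__":
    store = {"color": "red"}
    for _ in range(3):
        fetch_datum(store, "color")
    stats = STATS["fetchDatum"]
    print(stats.calls, f"{stats.avg_ms:.4f} ms")
```

Keeping a running average rather than every sample keeps the per-call cost to a handful of arithmetic operations.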

Method             Call Count    Avg. Service Time (ms)
dbConnect               1,876     11.2
fetchDatum         19,987,182     12.4
postDatum           1,285,765     98.4
deleteDatum           312,873     31.1
fetchKeys          27,334,983    278.3
fetchCollection    34,873,194    211.9
createCollection      118,853    219.4
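Multiplying each call count by its service time shows where the system spends most of its time. The sketch below uses the numbers from the table above, reading the call counts as whole numbers and the service times as average milliseconds per call; that reading of the number format is an assumption.

```python
# (method, call count, average service time in ms), copied from the table above
METHODS = [
    ("dbConnect", 1_876, 11.2),
    ("fetchDatum", 19_987_182, 12.4),
    ("postDatum", 1_285_765, 98.4),
    ("deleteDatum", 312_873, 31.1),
    ("fetchKeys", 27_334_983, 278.3),
    ("fetchCollection", 34_873_194, 211.9),
    ("createCollection", 118_853, 219.4),
]


def total_time_ms(calls, avg_service_ms):
    """Total time spent in a method: call count x average service time."""
    return calls * avg_service_ms


def rank_by_total_time(methods):
    """Return the methods sorted from most to least total time consumed."""
    return sorted(methods, key=lambda m: total_time_ms(m[1], m[2]), reverse=True)


if __name__ == "__main__":
    for name, calls, avg_ms in rank_by_total_time(METHODS):
        hours = total_time_ms(calls, avg_ms) / 3_600_000  # ms per hour
        print(f"{name:<18}{hours:>10.1f} hours")
```

Ranking by the product rather than by either column alone puts the methods that are both frequent and slow at the top of the list.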

What should I measure ?

[Scatter plot: Call Count x Service Time. Horizontal axis: Call Count; vertical axis: Service Time (ms). One point per method: dbConnect, fetchDatum, postDatum, deleteDatum, fetchKeys, fetchCollection, createCollection.]

References

Guerrilla Capacity Planning: A Tactical Approach to Planning for Highly Scalable Applications and Services
Neil J. Gunther
http://goo.gl/59wJg

Analyzing Computer Systems Performance: With Perl::PDQ
Neil J. Gunther
http://goo.gl/MZ85L

Performance by Design: Computer Capacity Planning By Example
Daniel A. Menasce et al.
http://goo.gl/NJhwT

The Art of Capacity Planning: Scaling Web Resources
John Allspaw
http://goo.gl/l8szV

Capacity Planning for Web Performance: Metrics, Models, and Methods
Daniel Menasce & Virgilio Almeida
http://goo.gl/KsBM6

Measure IT: http://www.cmg.org/measureit/
CMG Conference Proceedings: http://www.cmg.org/proceedings/
CMG Brazilian Chapter: http://cmgbrasil.posterous.com/

Last but not least...

Measure what is measurable, and make measurable what is not so.

Galileo Galilei