Performance Oriented Design
QCon São Paulo 2011
Rodrigo Albani de Campos - @xinu
Agenda
• Performance & Design
• Why should I care ?
• What should I measure ?
• References
What is performance ?
the capabilities of a machine or product, esp. when observed under particular conditions : the hardware is put through tests which assess the performance of the processor.
What is design ?
his design of reaching the top: intention, aim, purpose, plan, intent, objective, object, goal, end, target; hope, desire, wish, dream, aspiration, ambition.
McLaren MP4 12c GT3
Underlying Operating Systems
Read IOPS
Run Queue
USER CPU
SYSTEM CPU
Interrupts
Write IOPS
Page in
Page out
Network Traffic
IO Wait
Network Collision
Packet Loss
Disk Usage
# processes
# users
Memory Usage
Page Faults
Resident Size
Buffers
Kernel Tables
What about code ?
Apr 25, 2011 5:44:02 PM org.apache.fop.fo.FOTreeBuilder fatalError
SEVERE: javax.xml.transform.TransformerException: java.lang.NullPointerException: Parameter alpha must not be null
Apr 25, 2011 5:44:02 PM org.apache.fop.cli.Main startFOP
SEVERE: Exception
javax.xml.transform.TransformerException: java.lang.NullPointerException: Parameter alpha must not be null
    at org.apache.fop.cli.InputHandler.transformTo(InputHandler.java:217)
    at org.apache.fop.cli.InputHandler.renderTo(InputHandler.java:125)
    at org.apache.fop.cli.Main.startFOP(Main.java:166)
    at org.apache.fop.cli.Main.main(Main.java:197)
Caused by: javax.xml.transform.TransformerException: java.lang.NullPointerException: Parameter alpha must not be null
    at org.apache.xalan.transformer.TransformerImpl.executeChildTemplates(TransformerImpl.java:2416)
    at org.apache.xalan.templates.ElemLiteralResult.execute(ElemLiteralResult.java:1374)
    at org.apache.xalan.templates.ElemApplyTemplates.transformSelectedNodes(ElemApplyTemplates.java:393)
    at org.apache.xalan.templates.ElemApplyTemplates.execute(ElemApplyTemplates.java:176)
We’ve been riding space shuttles, blindfolded, handcuffed
Why should I care ?
Capacity planning is not just about the future anymore.
Today, there is a serious need to squeeze more out of your current capital equipment.
The Guerrilla Manual Online - http://www.perfdynamics.com/Manifesto/gcaprules.html
Why should I care ?
“Our systems are very simple, there’s no need for such performance metrics”
It goes like this...
The Internet
Web Server
Application Server
Database
It goes like this...
The Internet
Web Server
Application Server
Master RW
Slaves RO
It goes like this...
The Internet
Web Server
Application Server
Master RW
Slaves RO
Evil Machines Corporation
Caches
CPUs will be idle
Disks will be sleeping
Network will be underused
... and your users will be complaining...
Why should I care ?
“But we are using the Cloud !”
Why should I care ?
• So now you’re in a utility computing model
• You’re charged per usage
Why should I care ?
“Updating performance counters will make my code run slower”
Why should I care ?
• Datacenter average CPU utilization is around 15%
• If updating performance counters is a problem then you really need them
• Those microseconds will save you hours of troubleshooting !
Why should I care ?
“These are non-functional requirements”
Why should I care ?
Server Delay  Distinct Queries/User  Query Refinement  Revenue/User  Any Clicks  Satisfaction  Time to Click (increase in ms)
50ms          0                      0                 0             0           0             0
200ms         0                      0                 0             -0.30%      -0.40%        500
500ms         0                      -0.60%            -1.20%        -1.00%      -0.90%        1200
1000ms        -0.70%                 -0.90%            -2.80%        -1.90%      -1.60%        1900
2000ms        -1.80%                 -2.10%            -4.30%        -4.40%      -3.80%        3100
The User and Business Impact of Server Delays, Additional Bytes, and HTTP Chunking in Web Search - Eric Schurman (Amazon), Jake Brutlag (Google)
http://velocityconf.com/velocity2009/public/schedule/detail/8523
Why should I care ?
“Fast isn’t a feature, fast is a Requirement”
Jesse Robbins - Opscode
What should I measure ?
Queues: The not so typical performance metrics
Agner Krarup Erlang
• Invented the fields of traffic engineering and queuing theory
• 1909 - Published “The Theory of Probabilities and Telephone Conversations”
• 1917 - Published “Solution of some Problems in the Theory of Probabilities of Significance in Automatic Telephone Exchanges”
Queues: The not so typical performance metrics
• 1961 - CTSS was first demonstrated at MIT
• 1965 - Allan Scherr used machine repairman problem to model a time-shared system as part of Project MAC
• Another offspring of Project MAC is Multics
• IBM System/370 model 158-3 - 1.0 MIPS @ 1.0 MHz - 1972
• Average purchase price: $771,000*
• No disks or peripherals included
• $4,082,039 when adjusted to 2011
• Intel Core i7 Extreme Edition 990X, released in 2011, peaks at 159,000 MIPS @ 3.46 GHz
* Source: http://www-03.ibm.com/ibm/history/exhibits/mainframe/mainframe_PP3135.html
Queues: The not so typical performance metrics
[Diagram: Computer System, containing CPU and Disks]
Queues: The not so typical performance metrics
[Diagram: a queue in an Open/Closed Network, labeled with (A), λ, W, S, R, X and (C)]
A  Arrival Count
λ  Arrival Rate (A/T)
W  Time spent in Queue
R  Residence Time (W+S)
S  Service Time
X  System Throughput (C/T)
C  Completed tasks count
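As a minimal sketch of how these symbols relate, the Python below derives λ, X and R from raw measurements. The class name `Observation`, its field names, the reading of T as the length of the observation period, and the sample numbers are all illustrative assumptions, not something shown in the talk.

```python
from dataclasses import dataclass


@dataclass
class Observation:
    """Raw measurements for one observation period (hypothetical names)."""
    arrivals: int      # A: requests that arrived during the period
    completions: int   # C: requests that completed during the period
    period_s: float    # T: length of the observation period, in seconds
    wait_s: float      # W: mean time spent waiting in the queue, in seconds
    service_s: float   # S: mean time spent being served, in seconds

    @property
    def arrival_rate(self) -> float:
        """lambda = A / T"""
        return self.arrivals / self.period_s

    @property
    def throughput(self) -> float:
        """X = C / T"""
        return self.completions / self.period_s

    @property
    def residence_time(self) -> float:
        """R = W + S"""
        return self.wait_s + self.service_s


# Made-up sample: one minute of traffic.
obs = Observation(arrivals=3600, completions=3540, period_s=60.0,
                  wait_s=0.004, service_s=0.016)
print(f"lambda = {obs.arrival_rate:.1f}/s, X = {obs.throughput:.1f}/s, "
      f"R = {obs.residence_time * 1000:.1f} ms")
```

Keeping the raw counts (A, C) and deriving the rates on demand means the same record can be re-aggregated over longer periods later.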
Arrival Rate (λ)
• Pretty straightforward
• Requests per second/hour/day
• Not the same as throughput (X)
• Although in a steady state:
• A = C as T →∞
• λ = X
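The steady-state claim above can be illustrated with a small simulation. This is only a sketch under stated assumptions: a single FIFO server, exponentially distributed interarrival and service times, and made-up parameters (5 requests/s, 0.1 s mean service time). The function name `simulate_queue` is hypothetical.

```python
import random


def simulate_queue(arrival_rate, service_time, horizon, seed=42):
    """Simulate one FIFO server and return (A, C) observed during [0, horizon]."""
    rng = random.Random(seed)
    clock = 0.0            # time of the most recent arrival
    server_free_at = 0.0   # time at which the server finishes its backlog
    arrivals = completions = 0
    while True:
        clock += rng.expovariate(arrival_rate)
        if clock > horizon:
            break
        arrivals += 1
        # A request starts service when it arrives or when the server frees up.
        start = max(clock, server_free_at)
        server_free_at = start + rng.expovariate(1.0 / service_time)
        if server_free_at <= horizon:
            completions += 1
    return arrivals, completions


for horizon in (10.0, 100.0, 1000.0, 10000.0):
    a, c = simulate_queue(arrival_rate=5.0, service_time=0.1, horizon=horizon)
    print(f"T={horizon:>8.0f}s  lambda={a / horizon:.3f}  X={c / horizon:.3f}")
```

The gap between A and C is just the work still in the system when the observation ends, so it becomes negligible relative to A as T grows, provided the server can keep up with the arrivals.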
Service Time (S)
• Time spent in processing
• Web server response time
• Total query time
• I/O operation duration
What to look for ?
• Stretch factor
• Method Count
• Method Service Time
• Geolocation
• Inbound & Outbound Traffic
• Round Trip Delays
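For the stretch factor in the list above, a minimal sketch follows. It assumes the definition used in Neil Gunther's books, residence time divided by service time (R/S); the function name and the sample values are hypothetical.

```python
def stretch_factor(residence_time: float, service_time: float) -> float:
    """Return R / S: how much queueing stretches a request beyond its service time.

    A value of 1.0 means no waiting at all; larger values mean more time
    spent in the queue relative to the time spent being served.
    """
    if service_time <= 0:
        raise ValueError("service time must be positive")
    return residence_time / service_time


# Made-up example: 30 ms residence time for 12 ms of actual service.
print(stretch_factor(30.0, 12.0))  # prints 2.5
```

Both arguments only need to share the same unit, since the result is a ratio.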
What should I measure?
Average Hits/s = 65.142
Average Svc time = 0.0159
What should I measure ?
• A simple tag collection data store
• For each data operation:
• A 64 bit counter for the number of calls
• An average counter for the service time
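The two counters per data operation described above could be kept roughly as follows. This is a sketch, not the implementation from the talk: the class `OperationStats`, its methods and the use of a lock are assumptions. Python integers do not overflow, so the 64-bit width only matters in languages with fixed-size integers.

```python
import threading
import time
from contextlib import contextmanager


class OperationStats:
    """Per-operation call count and running average service time (hypothetical)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._calls = {}     # operation name -> number of calls
        self._mean_s = {}    # operation name -> mean service time in seconds

    def record(self, name, elapsed_s):
        """Count one call and fold its duration into the running average."""
        with self._lock:
            n = self._calls.get(name, 0) + 1
            mean = self._mean_s.get(name, 0.0)
            self._calls[name] = n
            self._mean_s[name] = mean + (elapsed_s - mean) / n

    @contextmanager
    def timed(self, name):
        """Time the enclosed block and record it under the given name."""
        start = time.perf_counter()
        try:
            yield
        finally:
            self.record(name, time.perf_counter() - start)

    def snapshot(self):
        """Return {name: (call count, mean service time in ms)}."""
        with self._lock:
            return {n: (c, self._mean_s[n] * 1000.0)
                    for n, c in self._calls.items()}


stats = OperationStats()
with stats.timed("fetchDatum"):
    time.sleep(0.01)   # stand-in for the real data operation
print(stats.snapshot())
```

The running average avoids storing every sample, so the cost per call is one lock acquisition and a few arithmetic operations.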
Method            Call Count   Service Time (ms)
dbConnect              1,876    11.2
fetchDatum        19,987,182    12.4
postDatum          1,285,765    98.4
deleteDatum          312,873    31.1
fetchKeys         27,334,983   278.3
fetchCollection   34,873,194   211.9
createCollection     118,853   219.4
What should I measure ?
[Chart: Call Count x Service Time. Axes: Call Count and Service Time (ms). Points: dbConnect, fetchDatum, postDatum, deleteDatum, createCollection, fetchKeys, fetchCollection]
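One way to read the call counts together with the service times is to multiply them, which estimates the total time spent in each method. The sketch below does that with the figures from the table, reading call counts as whole numbers and service times as mean milliseconds per call. Ranking by total time is an assumption about how to combine the two metrics, not necessarily what the chart plots.

```python
# (call count, mean service time in ms) per method, taken from the table.
METHODS = {
    "dbConnect": (1_876, 11.2),
    "fetchDatum": (19_987_182, 12.4),
    "postDatum": (1_285_765, 98.4),
    "deleteDatum": (312_873, 31.1),
    "fetchKeys": (27_334_983, 278.3),
    "fetchCollection": (34_873_194, 211.9),
    "createCollection": (118_853, 219.4),
}

MS_PER_HOUR = 3_600_000.0


def total_hours(calls, mean_ms):
    """Estimated cumulative time spent in a method, in hours."""
    return calls * mean_ms / MS_PER_HOUR


# Methods ordered from most to least cumulative time.
ranked = sorted(METHODS, key=lambda m: total_hours(*METHODS[m]), reverse=True)

for name in ranked:
    calls, mean_ms = METHODS[name]
    print(f"{name:<17} {total_hours(calls, mean_ms):10.1f} h")
```

Under this reading, fetchKeys and fetchCollection dominate because they are both frequently called and slow, while dbConnect barely registers.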
References
• Guerrilla Capacity Planning: A Tactical Approach to Planning for Highly Scalable Applications and Services - Neil J. Gunther - http://goo.gl/59wJg
• Analyzing Computer System Performance with Perl::PDQ - Neil J. Gunther - http://goo.gl/MZ85L
• Performance by Design: Computer Capacity Planning By Example - Daniel A. Menasce et al. - http://goo.gl/NJhwT
• The Art of Capacity Planning: Scaling Web Resources - John Allspaw - http://goo.gl/l8szV
• Capacity Planning for Web Performance: Metrics, Models, and Methods - Daniel Menasce & Virgilio Almeida - http://goo.gl/KsBM6
• Measure IT: http://www.cmg.org/measureit/
• CMG Conference Proceedings: http://www.cmg.org/proceedings/
• CMG Brazilian Chapter: http://cmgbrasil.posterous.com/
Last but not least...
Measure what is measurable, and make measurable what is not so.
Galileo Galilei