Performance Analysis in the Cloud · 2019. 6. 12. · – Ben Shneiderman, The New ABCs of Research...


Page 1:

Performance Analysis in the Cloud

A Reflective Practitioner’s Perspective

Page 2:

Naur, Peter. “Programming as Theory Building.” Computing: A Human Activity, 37-49. ACM Press, 1992. (Originally published in: Microprocessing and Microprogramming 15:253-261, 1985)

Page 3:

Schön, Donald. 1983. The Reflective Practitioner: How Professionals Think in Action. New York: Basic Books.

Page 4:

“The design community celebrates Donald Schön’s 1983 book The Reflective Practitioner: How Professionals Think in Action for its illuminating discussion of creative problem-solving by experienced professionals.”

– Ben Shneiderman, The New ABCs of Research

Page 5:

Reflection-in-action

Page 6:

Views of Professional Practice

Page 7:

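Technical Rationality (Positivist View) vs. Reflection-in-Action

• Technical Rationality: Means, Ends

• Reflection-in-Action: Means <-> Ends

• Technical Rationality: Problem Solving = Technical Procedure + Pre-Established Objective

• Reflection-in-Action: Problem Setting and Framing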

Page 8:

Technical Rationality (Positivist View) vs. Reflection-in-Action

• Technical Rationality: Research, Practice

• Reflection-in-Action: Practice is a kind of Research

• Technical Rationality: Practice = Application of objective, research-based theories which are based on controlled experiment

• Reflection-in-Action: Reflective conversation with the situation, rigorous on-the-spot experiments

Page 9:

Technical Rationality (Positivist View) vs. Reflection-in-Action

• Technical Rationality: Knowing, Doing

• Reflection-in-Action: Knowing <-> Doing

• Technical Rationality: Action = implementation + test of technical decision

• Reflection-in-Action: Inquiry is a transaction with the situation where knowing and doing are inseparable

Page 10:

“In his 1967 article, Dilemmas of an Engineering Education, Harvey Brooks described the predicament of the practicing engineer who is expected to bridge the rapidly changing body of knowledge and the rapidly changing expectations of society”

Page 11:

Devs / Systems Engineers / Data Scientists

Page 12:

Team Leads

Page 13:

Managers

Page 14:

Managers of Managers

Page 15:

Cloud

Page 16:
Page 17:

Anecdotes you can use

From Helper to Helper

Page 18:

Performance Modelling

“Data comes from the Devil, only Models come from God”

— Neil Gunther

Page 19:

Problem Solving Setting & Frame

Page 20:

T-Shirt Counter at FrOSCon

(or the cold brew counter)

Page 21:

Guiding Principles / Appreciative System / Overarching Theory

• Queuing Theory

• Statistics and Data Analysis

Page 22:

• I present the following exposition as a “technician” of queues, not as a mathematician. This is in equal parts an admission of weakness and an expression of wonder.

“I confess that life is rather a subject of wonder, than of didactics” — Ralph Waldo Emerson

Page 23:
Page 24:
Page 25:

M/M/1 Queue at the T-Shirt Counter

M / M / 1

• First M: arrival distribution

• Second M: service distribution

• 1: number of servers (or FrOSCon helpers)

Inter-arrival times and service times are assumed to be statistically random (exponentially distributed)

Page 26:

PDQ Metrics

Page 27:

Remember: Averages

Page 28:

Inputs for M/M/1 queue

Symbol  Metric        PDQ    Value  Units
λ       Arrival Rate  Input  0.75   participants/minute
S       Service Time  Input  1.0    minute

Page 29:

Outputs

Metric  Formula        Calculated Value
ρ       λS             0.75
R       S / (1 - λS)   1 / (1 - (3/4) × 1) = 4
Q       λR             3
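The output formulas above can be checked with a few lines of Python. This is a minimal sketch and not the PDQ model shown in the talk; the function name `mm1_metrics` is an illustrative assumption, and the inputs are the arrival rate and service time from the inputs table.

```python
def mm1_metrics(arrival_rate, service_time):
    """Return (utilization, response time, queue length) for an M/M/1 queue.

    These are steady-state averages and only hold while utilization < 1.
    """
    rho = arrival_rate * service_time              # utilization: rho = lambda * S
    if rho >= 1:
        raise ValueError("unstable queue: utilization must be below 1")
    response_time = service_time / (1 - rho)       # R = S / (1 - lambda * S)
    queue_length = arrival_rate * response_time    # Q = lambda * R (Little's Law)
    return rho, response_time, queue_length


# T-shirt counter inputs: 0.75 participants/minute, 1.0 minute of service each
rho, r, q = mm1_metrics(0.75, 1.0)
print(rho, r, q)
```

Pushing the arrival rate toward one participant per minute makes R and Q grow without bound, which is the reason the function rejects a utilization of 1 or more.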

Page 30:

Demo! PDQ Model of FrOSCon T-Shirt Counter in R

Page 31:

FrOSCon as a Flow of Participants, Helpers and Organizers In & Out of Halls, Counters, Dev Rooms and All the Doors

• Impossible to calculate metrics by hand and formulae alone

• PDQ for analytical and Pretty Damn Quick computation!

Page 32:

From FrOSCon to AWS

Page 33:
Page 34:
Page 35:
Page 36:

Performance Metrics

• Throughput (X)

• Response Time (R)

Page 37:

Derived Performance Metrics

• Little’s Law

• Number of Concurrent Requests

• N = X * R [ Macroscopic version ]

• Service Time

• U = X * S [ Microscopic version ]

• S = U / X

Page 38:

Timestamp, Xdat, Ndat, Sest, Rdat, Udat
1486771200000.000000, 502.171674, 170.266663, 0.000912, 0.336740, 0.458120
1486771500000.000000, 494.403035, 175.375000, 0.001043, 0.355975, 0.515420
1486771800000.000000, 509.541751, 188.866669, 0.000885, 0.360924, 0.450980
1486772100000.000000, 507.089094, 188.437500, 0.000910, 0.367479, 0.461700
1486772400000.000000, 532.803039, 191.466660, 0.000880, 0.362905, 0.468860
1486772700000.000000, 528.587722, 201.187500, 0.000914, 0.366283, 0.483160
1486773000000.000000, 533.439054, 202.600006, 0.000892, 0.378207, 0.476080
1486773300000.000000, 531.708059, 208.187500, 0.000909, 0.392556, 0.483160
1486773600000.000000, 532.693783, 203.266663, 0.000894, 0.379749, 0.476020
1486773900000.000000, 519.748550, 200.937500, 0.000895, 0.381078, 0.465260
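As a rough cross-check of the derived metrics, the first three rows of the table above can be run through both forms of Little's Law. This is only a sketch: the helper names are illustrative, and it assumes Sest was derived as U / X, which the numbers appear to support.

```python
# First three rows of the table above: (Xdat, Ndat, Sest, Rdat, Udat)
rows = [
    (502.171674, 170.266663, 0.000912, 0.336740, 0.458120),
    (494.403035, 175.375000, 0.001043, 0.355975, 0.515420),
    (509.541751, 188.866669, 0.000885, 0.360924, 0.450980),
]


def concurrency(throughput, response_time):
    """Macroscopic Little's Law: N = X * R."""
    return throughput * response_time


def service_time(utilization, throughput):
    """Microscopic Little's Law rearranged: S = U / X."""
    return utilization / throughput


for x, n, s, r, u in rows:
    print(f"N measured={n:.1f}  X*R={concurrency(x, r):.1f}  "
          f"S listed={s:.6f}  U/X={service_time(u, x):.6f}")
```

The computed U / X reproduces the Sest column to the printed precision, and X * R lands within a few percent of the measured Ndat.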

Page 39:

Modelling

Page 40:

Thread-limited Throughput (X plotted against N)

Time Independent View (Steady State)

Page 41:

Thread-limited Latency (R plotted against N)

Time Independent View (Steady State)

Page 42:

[Plot: Production Data July 2016. Throughput (req/s) against concurrent users.]

Page 43:

[Plot: PDQ Model of Production Data July 2016. Throughput (req/s) against concurrent users, Data vs. PDQ. Nopt = 174.5367, thrds = 250.00]

[Plot: PDQ Model of Production Data July 2016. Response time (s) against concurrent users, Data vs. PDQ. Nopt = 174.5367, thrds = 250.00]

Page 44:

Model Evaluation

• Looks good visually!

• Required 350 “dummy queues” internally for correct Rmin

• What do the dummy queues represent?

• Polling?

• Parallelism?

Page 45:

[Plot: Production data October 2016. Throughput (req/s) against concurrent users.]

Page 46:

October 2016 data breaks the model!

Page 47:

Adjusted PDQ Model

Page 48:
Page 49:

Spot is cheap, show me the scaling metric

Reflection-in-action for on-the-spot experiments

Page 50:

• Up to 90% discount compared to on-demand

• To improve the probability of obtaining instances, diversify across availability zones, instance types and sizes.

• E.g., the market only has m4.2xlarge instances, whereas the application was only tested on m4.10xlarge

• Application re-configuration required

• CPU%, Latency, Throughput not useful for autoscaling

Page 51:

Use N: the number of concurrent requests the application can handle
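One way to read this slide: if N is known per instance type, the fleet size follows from the observed concurrency via Little's Law. The sketch below is an illustration under that assumption, not the scaling policy used in the talk; `instances_needed` and the per-instance capacities are hypothetical.

```python
import math


def instances_needed(throughput, response_time, n_per_instance):
    """Instances required so that each one handles at most n_per_instance
    concurrent requests (n_per_instance is an assumed, measured capacity)."""
    concurrent = throughput * response_time      # Little's Law: N = X * R
    return max(1, math.ceil(concurrent / n_per_instance))


# Hypothetical load of 500 req/s at 0.36 s, i.e. about 180 concurrent requests
print(instances_needed(500, 0.36, 50))   # larger instance type
print(instances_needed(500, 0.36, 25))   # smaller instance type
```

Because N is expressed per instance type, the same load figure can be mapped onto whichever instance sizes the spot market happens to offer.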

Page 52:
Page 53:

Disk Utilization

The Achilles Heel of Administration: Kafka and Graphite Unplugged

Page 54:

Metrics, metrics everywhere… Oops! Graphite’s got a disk problem!

Metrics, metrics nowhere?

• Capacity planning for Graphite

• Performance issue discovered during data collection itself

• Fewer IOPS than the EBS volume was capable of. Why are we losing IOPS?

Page 55:

• Multiple investigations carried out by AWS and myself

• Hello throttling!

• But no easy way for the end user to know!

Page 56:

• iostat showed high and bursty writes/sec

• Configure Graphite to reduce writes/sec

• Compromise, but stable behaviour

• Ask the right questions.

Page 57:

Predicting disk usage of Kafka

• Queueing theory as the guiding principle

• Data didn't make sense: what exactly is disk utilisation (to get the service time)? How does it relate to the disk metrics from AWS?

• Went as far as block level IO metrics

• EBS queue length, throughput and latency metrics

• Linux iostat for throughput and util%

• svctm: “The average service time for I/O requests that were issued to the device. Warning! Do not trust this field any more. This field will be removed in a future sysstat version.”

• “No Service, No Queues”

Page 58:

• Discussion with AWS

• AWS didn’t consider queueing theory, but suggested statistical approaches instead, e.g. linear regression.
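For comparison, the statistical approach mentioned above could look like a plain least-squares trend line over disk usage samples. This is a sketch only: the sample values are synthetic and made up for illustration, and `fit_line` is a hypothetical helper, not something shown in the talk.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Synthetic samples (not real data): day index and disk used in GB
days = [0, 1, 2, 3, 4]
used_gb = [100.0, 120.0, 140.0, 160.0, 180.0]

slope, intercept = fit_line(days, used_gb)
forecast_day_10 = intercept + slope * 10
print(slope, intercept, forecast_day_10)
```

A trend line like this extrapolates growth but says nothing about where queues form, which is the gap the queueing-theory view tries to fill.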

Page 59:

Serverless but not Serviceless

Concurrency in Serverless Applications

Page 60:

Lambda: Concurrency

• An invocation of a lambda function is the unit of concurrency

• Lambda requires specifying memory (AWS calculates CPU accordingly)

• In some sense, a lambda function itself acts as a server (when invoked).

Page 61:

Lambda: Concurrency

concurrency = events per second * function duration (for non-stream based event sources)

Little’s Law!
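The concurrency formula is the same N = X * R relation used earlier, with the event rate as throughput and the function duration as response time. A minimal sketch, with a hypothetical function name and a made-up workload:

```python
def lambda_concurrency(events_per_second, avg_duration_seconds):
    """Estimated concurrent executions for non-stream event sources
    (Little's Law: N = X * R)."""
    return events_per_second * avg_duration_seconds


# Hypothetical workload: 200 events/s, each invocation running for 0.5 s
print(lambda_concurrency(200, 0.5))
```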

Page 62:

Lambda: Concurrency

• You have to instrument your code to calculate the event rate. See references at end of presentation.

• By default, the limit on concurrent executions is 1000

• After that, throttling kicks in

• In some sense, throttling is a queueing mechanism.
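The instrumentation step can be sketched as a sliding-window event counter whose rate is then compared against the concurrency limit. This is an assumption-heavy illustration: `EventRateMeter` and `would_throttle` are hypothetical names, and in a real deployment the timestamps would have to be collected outside the function, since separate invocations do not share memory.

```python
from collections import deque


class EventRateMeter:
    """Sliding-window counter over event timestamps (in seconds)."""

    def __init__(self, window_seconds=60.0):
        self.window = window_seconds
        self.events = deque()

    def record(self, timestamp):
        self.events.append(timestamp)
        # Drop timestamps that have fallen out of the window
        while self.events and self.events[0] <= timestamp - self.window:
            self.events.popleft()

    def rate(self):
        """Events per second, averaged over the window."""
        return len(self.events) / self.window


def would_throttle(events_per_second, avg_duration_seconds, limit=1000):
    """True when the estimated concurrency exceeds the limit
    (1000 is the default quoted on the slide)."""
    return events_per_second * avg_duration_seconds > limit


meter = EventRateMeter(window_seconds=10.0)
for i in range(100):
    meter.record(i * 0.1)          # hypothetical: one event every 0.1 s
print(meter.rate(), would_throttle(meter.rate(), 0.5))
```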

Page 63:

Threads or Lambdas?

• Threads, Lambdas and Lambdas with Threads (https://www.awsadvent.com/2016/12/04/exploring-concurrency-in-python-aws/)

• Summary: use a mix of lambdas and threads (and lambdas with threads) to achieve a good level of concurrency.
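The "lambdas with threads" variant can be sketched as a Lambda-style handler that fans work out to a thread pool inside a single invocation. This is an illustration rather than code from the linked article; `fetch` is a hypothetical stand-in for an I/O-bound call, and the pool size is an arbitrary assumption.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch(item):
    """Hypothetical stand-in for an I/O-bound call; it only transforms the input."""
    return item * 2


def handler(event, context=None):
    """Lambda-style handler that processes items on threads within one invocation."""
    items = event.get("items", [])
    with ThreadPoolExecutor(max_workers=8) as pool:
        results = list(pool.map(fetch, items))   # map preserves input order
    return {"results": results}


print(handler({"items": [1, 2, 3]}))
```

Threads help most when the work is I/O-bound; fanning out to more lambda invocations adds concurrency across separate execution environments instead.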

Page 64:

Monitoring in the Cloud

The Medium is the Message

Page 65:

Notebooks Vs. Dashboards

Page 66:

Better metrics and programming language support

• Python vs. Java: metrics-wise, Java has better support with JMX etc.

• In monitoring Tomcat, busy threads are less significant than the number of threads actually servicing requests (which is also harder to obtain)

• Where are the queues forming?

• Slow transfers to S3 manifested as delayed data transfer and processing

• netstat showed large Send-Q values; this led to changes in the application as well as the S3 SDK code

Page 67:

Data Visualization

Quiz Time!

Page 68:

References and Resources

Page 69:
Page 70:

• https://speakerdeck.com/alcy/exploring-concurrency-in-python-and-aws

• http://www.perfdynamics.com/papers.html#tth_sEc1

Page 71:

Q & A

About Your Helper

• a1cy on twitter

• Queue Technician

• No longer a data visualisation dilettante

• Re-inventing Automation

• Learning to Speak Squeak