Scaling apps for the big time
Pro IT Consulting
The Challenge?
• You have an app that works
• You have users that like it
Awesome
• Performance is suffering as you scale.
• Reliability is getting worse, not better.
• As your data sets grow, the problems are more pronounced.
• The operations team are talking about problems, not solutions
Not so awesome
So what happens if you win big?
You are not alone – unfortunately…
• Your cool app may end up supported by lots of things you can’t control
What is the root cause?
• Take the time to understand what happens when your code asks the server to do some task.
select * from some_production_table_with_100_000_000_records
is really not the same workload as
select * from some_dev_table_with_100_records
• Look for evidence in logs and tools that provide real insight.
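The difference between the two queries above can be felt even on a laptop. The following is a minimal sketch, assuming an in-memory SQLite database from Python's standard library; the table names, the row counts (100 versus 200,000) and the 50-character payload are illustrative assumptions, not figures from any real system.

```python
import sqlite3
import time


def build_table(conn, name, rows):
    """Create a table and fill it with `rows` identical rows."""
    conn.execute(f"CREATE TABLE {name} (id INTEGER PRIMARY KEY, payload TEXT)")
    conn.executemany(
        f"INSERT INTO {name} (payload) VALUES (?)",
        (("x" * 50,) for _ in range(rows)),
    )
    conn.commit()


def time_full_select(conn, name):
    """Run SELECT * and return (row count, elapsed seconds)."""
    start = time.perf_counter()
    rows = conn.execute(f"SELECT * FROM {name}").fetchall()
    return len(rows), time.perf_counter() - start


conn = sqlite3.connect(":memory:")
build_table(conn, "dev_table", 100)            # hypothetical dev-sized table
build_table(conn, "prod_like_table", 200_000)  # hypothetical larger table

dev_count, dev_secs = time_full_select(conn, "dev_table")
prod_count, prod_secs = time_full_select(conn, "prod_like_table")

print(f"dev:       {dev_count:>7} rows in {dev_secs:.6f} s")
print(f"prod-like: {prod_count:>7} rows in {prod_secs:.6f} s")
```

The same statement does very different amounts of work depending on the data behind it, and a real production table adds disk, network and concurrency costs that this sketch does not model.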
Issues of priority…
• Disk drive, single user session
• Disk drives, multiple users…
Issues of Scale…
• Fetching Blocks, single user session
• Fetching Blocks, enterprise workload
Storage
• Many database and operating system vendor recommendations are woefully out of date.
• Modern techniques utilising flash in the right way can deliver millions of random IOPS.
• SAN and flash vendors have made dramatic changes over the last few years that invalidate many of the old recommendations.
• Some principles still hold and are important for optimised performance:
  – One process writes to each disk group
  – Avoid reads and writes occurring simultaneously if possible
CPU
• CPUs are not all created equal.
• Use SPECint to compare if it matters for your workload.
• Split up the work and scale wide if you can. There is a reason the web scale companies have.
• Don’t process work now that can wait until later.
• Later might be in a few seconds and on another box.
• Schedule intensive workloads like reports.
• Don’t expect your laptop and the production server to scale the same way.
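One way to picture "don't process work now that can wait until later" is a request handler that answers immediately and hands the expensive part to a background worker. This is only a sketch using Python's standard `queue` and `threading` modules; `handle_request`, the order numbers and the "report" job are hypothetical names, and in production the queue would more likely be a message broker feeding another box.

```python
import queue
import threading

work_queue = queue.Queue()
results = []


def worker():
    """Drain deferred jobs until a None sentinel arrives."""
    while True:
        job = work_queue.get()
        if job is None:
            work_queue.task_done()
            break
        results.append(job())
        work_queue.task_done()


def handle_request(order_id):
    """Answer the user straight away and defer the slow work."""
    work_queue.put(lambda: f"report for order {order_id}")
    return f"order {order_id} accepted"


thread = threading.Thread(target=worker, daemon=True)
thread.start()

responses = [handle_request(i) for i in range(3)]

work_queue.put(None)  # tell the worker to stop once the backlog is done
work_queue.join()     # wait for every queued item to be processed
thread.join()

print(responses)
print(results)
```

The user-facing path only pays for the enqueue; the report generation happens a moment later, off the critical path.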
Memory
• Memory is addressable in various forms with performance tradeoffs for capacity.
• Use the lowest latency one you can afford.
Memory type    Typical capacity    Approximate access time
CPU cache      30 MB               < 10 ns
DDR3           64 GB               < 100 ns
SSD            ~ 800 GB            < 20,000 ns
FC or SAS      ~ 1 TB              < 20,000,000 ns
SATA           4 TB +              < 8,000,000 ns
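The gaps between these tiers are easier to appreciate as ratios. The short sketch below takes the access times listed above and expresses each one relative to CPU cache; treating the "less than" upper bounds as representative values is an assumption made purely for illustration.

```python
# Upper-bound access times from the slide, in nanoseconds.
access_time_ns = {
    "CPU cache": 10,
    "DDR3": 100,
    "SSD": 20_000,
    "FC or SAS": 20_000_000,
    "SATA": 8_000_000,
}

baseline = access_time_ns["CPU cache"]

# How many CPU-cache accesses fit into one access at each tier.
relative = {tier: ns / baseline for tier, ns in access_time_ns.items()}

for tier, factor in relative.items():
    print(f"{tier:<10} ~{factor:,.0f}x the CPU cache access time")
```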
Network
• Why is it that we conceptualise networks from an individual point of view?
The best transport is context dependent
• Latency & bandwidth are not the same thing.
  – Think satellite delay on a TV interview
• In this context we use these definitions:
  – Latency is the amount of time data takes to reach the other end of the network.
  – Bandwidth is the rate at which we can successfully transmit data to the other end.
• This is why you need to test your app through a latency generator.
  – There are capable free open source tools such as WANem
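A rough model shows why latency hurts a chatty app even when bandwidth is plentiful. The sketch below assumes strictly sequential request/response exchanges, a round trip costing twice the one-way latency, and a transfer cost of payload size divided by bandwidth; the request count, payload size and link figures are illustrative assumptions rather than measurements.

```python
def total_time_s(requests, payload_bytes, one_way_latency_s, bandwidth_bps):
    """Time for `requests` sequential exchanges under a simple model."""
    round_trip = 2 * one_way_latency_s
    transfer = payload_bytes * 8 / bandwidth_bps
    return requests * (round_trip + transfer)


# Same app, same 10 Mbit/s link, only the latency differs.
lan = total_time_s(requests=100, payload_bytes=1250,
                   one_way_latency_s=0.0005, bandwidth_bps=10_000_000)
wan = total_time_s(requests=100, payload_bytes=1250,
                   one_way_latency_s=0.05, bandwidth_bps=10_000_000)

print(f"LAN-like latency: {lan:.2f} s")
print(f"WAN-like latency: {wan:.2f} s")
```

Under these assumptions the data transferred is identical in both cases; nearly all of the extra time on the high-latency link is spent waiting for round trips, which is the effect a latency generator exposes during testing.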
Middleware
• WebSphere, WebLogic, JBoss, Tomcat
  – Garbage collection tradeoffs between JVM size and system memory/CPU capacities.
• Django
  – Read High Performance Django by the team from Lincoln Loop
  – Sponsored by the Common Code team
SQL databases
• Microsoft SQL Server, Oracle DB, PostgreSQL & MySQL.
• Each has various strengths & weaknesses, but they have some key things in common.
• Offload reporting away from OLTP workloads
• Indexes are important
• Transaction logs are a performance bottleneck
• Think deeply about scaling out
• Think about caching queries
• Backups are critical because you will need to restore one day
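To illustrate the point about indexes, the sketch below asks SQLite (via Python's standard `sqlite3` module) for its query plan before and after an index is created. The `orders` table, its columns and the index name `idx_orders_customer` are hypothetical, and the exact wording of the plan text varies between SQLite versions; other databases expose similar information through their own EXPLAIN facilities.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)"
)
conn.executemany(
    "INSERT INTO orders (customer_id, total) VALUES (?, ?)",
    ((i % 1000, i * 1.5) for i in range(50_000)),
)


def query_plan(conn, sql, params=()):
    """Return the human-readable detail column of EXPLAIN QUERY PLAN."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[-1] for row in rows)


sql = "SELECT * FROM orders WHERE customer_id = ?"

plan_before = query_plan(conn, sql, (42,))
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
plan_after = query_plan(conn, sql, (42,))

print("before index:", plan_before)
print("after index: ", plan_after)
```

Without the index the plan reports a scan of the whole table; with it, the plan reports a search using the index, which is the difference that matters once the table has grown to production size.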
Backup is about Restore
• Enterprise wide backup will find all your infrastructure failings by pushing more data for longer while other work continues.
• Test your restores. Really, test them.
• Offload large backups away from your production systems.
Questions?
How to get in touch?
James Clifford
Email: [email protected]: 0421 648 034
Brenton Carbins
Email: [email protected]: 0409 779 230