Keeping the Site Up Under Extreme Traffic

This is a talk I gave about keeping a website up under extreme traffic conditions, presented jointly with Akamai at IRCE 2012.

Transcript of Keeping the Site Up Under Extreme Traffic

Page 1: Keeping the Site Up Under Extreme Traffic

Keeping the Site Up Under Extreme Traffic

Page 2: Keeping the Site Up Under Extreme Traffic

• CTO of HauteLook
• Oversee all custom applications built in-house and out
• Major focus on customer experience based applications
• Utilize Agile Scrum and Lean Kanban SDLC
• Manage single data center cage
• Private cloud environment
• Open source technology stack

Who am I? Kevin Diamond

Page 3: Keeping the Site Up Under Extreme Traffic

• Private sale, members-only limited-time sale events
• Premium fashion and lifestyle brands at exclusive prices of 50-75% off
• Over 8 million active members
• Acquired by Nordstrom in 2011
• Increased sales by over 60% in 2011 and on pace to do the same in 2012
• Over 20 new sale events begin each morning at 8am PST

Who is HauteLook

Page 4: Keeping the Site Up Under Extreme Traffic

• Every morning at 8am when our new sale events go live, we get the “Black Friday” equivalent spike in traffic

• And on special event days (really big brands) we can have spikes that are 3x higher!

Why do we know about extreme traffic?

Page 5: Keeping the Site Up Under Extreme Traffic

• Measure everything!
• Free tools like Ganglia and Cacti
• Cacti is great for measuring single services and servers
• Ganglia is great for measuring clusters

So how do you plan for spikes in traffic?

Page 6: Keeping the Site Up Under Extreme Traffic

Graphs galore!

Page 7: Keeping the Site Up Under Extreme Traffic

• Experienced Systems Administrators (Infrastructure Gurus) will save you money in the long run

• Even to properly identify your bottlenecks and ensure your monitoring is right… you need the right people

• They then need to evaluate how to solve your bottlenecks
• What solutions exist and what they cost
• Run the RFP process for things you need to buy
• Know the open source tools that you can get for free
• Implement your solutions
• And again monitor the results and work with vendors to tweak until perfect

The right people make all the difference

Page 8: Keeping the Site Up Under Extreme Traffic

• The only way to scale quickly and cheaply is horizontally
• Build and use smaller applications
• Virtualize your environment into small VMs
• Load balance across many VMs
• Ensure your load balancer can quickly add and remove nodes
• Also ensure your load balancer can detect when a VM is operational

Build to scale horizontally
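
To make the last two bullets concrete, here is a minimal sketch of a pool that can add and remove nodes quickly and only routes to nodes that pass a health check. It is an illustration, not how any particular load balancer works: the `Pool` class, the node names, and the `health_check` callable are all hypothetical, and a real probe would typically be an HTTP request to each VM.

```python
from dataclasses import dataclass, field
from itertools import count
from typing import Callable, Dict


@dataclass
class Pool:
    """Round-robin pool that only routes to nodes passing a health check."""
    health_check: Callable[[str], bool]                    # hypothetical probe
    nodes: Dict[str, bool] = field(default_factory=dict)   # node -> last known health
    _counter: count = field(default_factory=count)

    def add(self, node: str) -> None:
        # New nodes receive no traffic until their first successful probe.
        self.nodes[node] = False

    def remove(self, node: str) -> None:
        self.nodes.pop(node, None)

    def probe_all(self) -> None:
        # Refresh the health flag of every registered node.
        for node in self.nodes:
            self.nodes[node] = self.health_check(node)

    def pick(self) -> str:
        # Rotate over the currently healthy nodes in a stable order.
        healthy = sorted(n for n, ok in self.nodes.items() if ok)
        if not healthy:
            raise RuntimeError("no healthy nodes")
        return healthy[next(self._counter) % len(healthy)]
```

In use, `probe_all()` would run on a short timer, so a VM that has just booted starts taking traffic within one probe interval and a failed VM drops out just as fast.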

Page 9: Keeping the Site Up Under Extreme Traffic

• For extreme traffic scaling you need a cloud
• If you have a lot of server hardware available, build a private cloud
    – Scale the number of running VMs up and down when certain thresholds are reached
    – Have priority levels so that certain VMs can even be shut off if more resources are needed than are free
    – Requires centralized storage to move VM images around to hardware
• If you don’t, scale to the public cloud
    – Amazon AWS, Microsoft Azure, etc.
    – This requires Global Load Balancing
    – But provides infinite growth potential
    – Watch out for latency
• And/or go Dynamic Site Acceleration
    – Akamai product to scale dynamic page caching to their edge network
    – Almost infinite growth potential with no latency issues

Advanced scaling
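
The private cloud bullets describe two decisions: when to change the number of running VMs, and which VMs to sacrifice when the hardware is full. A rough sketch of both follows. Every name and number in it is an assumption made for illustration (the 75% and 30% load thresholds, the "higher number means more important" priority convention, and the simplification that each VM occupies one slot); none of it comes from the talk.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class VM:
    name: str
    priority: int   # assumed convention: higher number = more important
    running: bool


def plan_scaling(load: float, web_vms: int, scale_up_at: float = 0.75,
                 scale_down_at: float = 0.30, min_vms: int = 2) -> int:
    """Return +1, -1 or 0 web VMs for the current average load (0.0 to 1.0)."""
    if load >= scale_up_at:
        return 1
    if load <= scale_down_at and web_vms > min_vms:
        return -1
    return 0


def vms_to_shut_off(vms: List[VM], slots_needed: int, slots_free: int) -> List[str]:
    """Pick the lowest-priority running VMs to stop when free capacity is short."""
    shortfall = max(0, slots_needed - slots_free)
    candidates = sorted((v for v in vms if v.running), key=lambda v: v.priority)
    return [v.name for v in candidates[:shortfall]]
```

For example, if three new web VMs are needed and only one slot is free, `vms_to_shut_off` returns the two least important running VMs, such as reporting or batch jobs, which can be restarted once the spike has passed.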

Page 10: Keeping the Site Up Under Extreme Traffic

• Set a threshold that your system should be able to expand to handle
• Keep raising the threshold as your traffic continues to grow
• At HauteLook that is 3x our last PEAK
• Plan for that threshold
• Buy for that threshold
• Test to that threshold
• Load testing is a must; don’t trust that all things WILL scale as planned
• Identify your bottlenecks at scale

Know your threshold
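
The threshold rule is simple arithmetic, sketched below. Only the 3x multiplier comes from the slide. The function name, the requests-per-second unit, and the example figures are assumptions, and the per-node capacity should come from load testing rather than a guess.

```python
import math


def required_nodes(last_peak_rps: float, per_node_rps: float,
                   multiplier: float = 3.0) -> int:
    """Nodes needed to serve `multiplier` times the last observed peak."""
    target_rps = last_peak_rps * multiplier
    return math.ceil(target_rps / per_node_rps)
```

With a hypothetical last peak of 2,000 requests per second and nodes that each sustain 450, the target is 6,000 requests per second and `required_nodes(2000, 450)` rounds up to 14 nodes.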

Page 11: Keeping the Site Up Under Extreme Traffic

• Bandwidth
• CPU
• Memory
• Hard Disk I/O

Most common bottlenecks

Page 12: Keeping the Site Up Under Extreme Traffic

• Reserve your bandwidth for things that change
• Get a CDN to offload static and cached objects
• Get a burstable pipe (e.g. commit to 1GigE but on a 10GigE port)
• Ensure you are billed on 95/5 (95th percentile) for that pipe

Bandwidth
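
To show why 95/5 billing suits a short daily spike, here is a sketch of the usual calculation: sort the periodic bandwidth samples, discard the top 5%, and bill on the highest sample that remains, with the commit as a floor. Providers differ on sampling interval and rounding, so treat this as an approximation; the function name and the sample figures are made up.

```python
from typing import Sequence


def billable_mbps(samples_mbps: Sequence[float], commit_mbps: float) -> float:
    """95th-percentile billing: ignore the top 5% of samples, never go below the commit."""
    if not samples_mbps:
        return commit_mbps
    ordered = sorted(samples_mbps)
    keep = (len(ordered) * 95 + 99) // 100   # ceil(0.95 * n) in integer arithmetic
    return max(ordered[keep - 1], commit_mbps)
```

If the spike fits inside the top 5% of samples, the burst above the commit costs nothing extra; once it lasts longer than that, the bill follows the burst.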

Page 13: Keeping the Site Up Under Extreme Traffic

• CPU & Memory can be solved in much the same way
• You can often find huge savings just by tweaking your services
    – fastcgi.server = ( ".php" => ( "localhost" => (
          "max-procs" => 4,
          "min-procs" => 1,
          "bin-environment" => (
              "PHP_FCGI_CHILDREN" => "4",
              "PHP_FCGI_MAX_REQUESTS" => "5000"
          )
      )))
• Or switching to newer/better services
    – Apache vs. lighttpd vs. NGINX
    – mod_php vs. FastCGI PHP vs. PHP-FPM
    – WebSphere vs. Tomcat vs. GlassFish
• Then tweak your applications
• Go VM to encapsulate your application environments to run leaner
• Lastly, buy more physical hardware

CPU & Memory
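
The two numbers in the lighttpd snippet multiply, which is why small tweaks to them matter. As I understand the lighttpd mod_fastcgi documentation, each of the `max-procs` PHP parent processes forks `PHP_FCGI_CHILDREN` workers, giving `max-procs * (PHP_FCGI_CHILDREN + 1)` PHP processes in total. The sketch below works through that arithmetic. The 50 MB per process is an invented figure, so measure your own before relying on it.

```python
def php_process_count(max_procs: int, php_fcgi_children: int) -> int:
    """Total PHP processes: each parent process plus the children it forks."""
    return max_procs * (php_fcgi_children + 1)


def php_memory_mb(max_procs: int, php_fcgi_children: int,
                  mb_per_process: float) -> float:
    """Rough memory footprint, assuming every process uses about the same amount."""
    return php_process_count(max_procs, php_fcgi_children) * mb_per_process
```

For the settings on the slide, `php_process_count(4, 4)` gives 20 processes, and at the assumed 50 MB each that is about 1,000 MB per VM.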

Page 14: Keeping the Site Up Under Extreme Traffic

• More hard drives
    – RAID-0 or RAID-10
    – Get a SAN, which will also provide centralized storage for a private cloud
    – EMC or NetApp
• Faster hard drives
    – SSD is quickly coming down in price
    – Either loaded in-server or in your SAN
• Specialty hardware
    – Fusion-IO in-server flash storage
    – Fusion-IO ioTurbine middleware flash caching between VM and SAN
    – PureStorage Flash Array, all SSD SAN

Hard disk I/O

Page 15: Keeping the Site Up Under Extreme Traffic

• So you couldn’t scale, what now?
• Put up a static page “We’re sorry!!”
• Put up a queue system
    – Static page refreshes that slowly allow traffic through to your dynamic site
    – Delivered from a CDN or the cloud
    – Can be made to prioritize best customers using cookies (only if planned for in advance…)
• Don’t let it happen again!

Disaster Recovery
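
One way to picture the queue system is as an admission check that runs each time the static waiting page refreshes. The sketch below is a guess at the general shape, not HauteLook's implementation. The `WaitingRoom` class, the `vip` cookie name, and the reserved-slot approach to prioritizing best customers are all assumptions.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class WaitingRoom:
    capacity: int          # sessions the dynamic site can serve at once
    vip_reserve: int = 0   # slots held back for priority customers
    active: int = 0        # sessions currently admitted

    def try_admit(self, cookies: Dict[str, str]) -> bool:
        """Admit the visitor if there is room, otherwise keep them on the static page."""
        is_vip = cookies.get("vip") == "1"   # hypothetical cookie name
        limit = self.capacity if is_vip else self.capacity - self.vip_reserve
        if self.active < limit:
            self.active += 1
            return True
        return False   # caller serves the static page, which refreshes and retries

    def release(self) -> None:
        """Call when an admitted session ends, freeing a slot."""
        self.active = max(0, self.active - 1)
```

A plain cookie like this is trivially forged, so a production version would want a signed value. That is part of why the slide says prioritization only works if it is planned in advance.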

Page 16: Keeping the Site Up Under Extreme Traffic

• Thanks for taking the time with me today
• If you have questions, please email me
• [email protected]

Thank you!