Building Scalable Web Architectures

Post on 14-Jan-2016

65 views 0 download

Tags:

description

Building Scalable Web Architectures. Aaron Bannert aaron@apache.org / aaron@codemass.com. Goal. To build a reliable , scalable , cheap , flexible , extendable internet application. The Age of LAMP. What does a LAMP architecture give us?. Scalability. Grows in small steps - PowerPoint PPT Presentation

Transcript of Building Scalable Web Architectures

BuildingScalable Web Architectures

Aaron Bannertaaron@apache.org / aaron@codemass.com

QuickTime™ and aTIFF (Uncompressed) decompressorare needed to see this picture.

Goal

To build a reliable, scalable, cheap, flexible,

extendable internet application.

The Age of LAMP

What does a LAMP architecture give us?

Scalability

Grows in small steps

Stays up when it counts

Can grow with your traffic

Room for the future

Reliability

High Quality of Service

Minimal Downtime

Stability

Redundancy

Resilience

Low Cost

Little or no software licensing costs

Minimal hardware requirements

Abundance of talent

Reduced maintenance costs

Flexible

Modular Components

Public APIs

Open Architecture

Vendor Neutral

Many options at all levels

Extendable

Free/Open Source Licensing Right to Use Right to Inspect Right to Improve

Plugins Some Free Some Commercial Can always customize

Free as in Beer?

Price

Speed

Quality

Pick any two.

LAMP-like Architectures

The Big Picture

External Caching Tier

Web Serving Tier

Application Server Tier

Internal Cache Tier

Database Tier

Misc. Services (DNS, Mail, etc…)

The Glue

•Routers

•Switches

•Firewalls

•Load Balancers

Software Choices

Building LAMP Software

External Caching Tier

External Caching Tier

What is this?

Squid

Apache’s mod_proxy

Commercial HTTP Accelerator

External Caching Tier

What does it do?

Caches outbound HTTP objects

Images, CSS, XML, HTML, etc…

Flushes Connections

Useful for modem users, frees up web tier

Denial of Service Defense

External Caching Tier

Hardware Requirements Lots of Memory Moderate to little CPU Fast Network Moderate Disk Capacity

Room for cache, logs, etc… (disks are cheap)

One slow disk is OK

Two Cheapies > One Expensive

External Caching Tier

Other Questions

What to cache?

How much to cache?

Where to cache (internal vs. external)?

Web Serving Tier

Web Serving Tier

What is this? Apache

thttpd

Tux Web Server

IIS

Netscape

Web Serving Tier

What does it do? HTTP, HTTPS

Serves Static Content from disk

Generates Dynamic Content CGI/PHP/Python/mod_perl/etc…

Dispatches requests to the App Server Tier Tomcat, Weblogic, Websphere, JRun, etc…

Web Serving Tier

Hardware Requirements Lots and lots of Memory

Memory is main bottleneck in web serving

Memory determines max number of users

Fast Network

CPU depends on usage Dynamic content needs CPU

Static file serving requires very little CPU

Cheap slow disk, enough to hold your content

Web Serving Tier: Zero-copy

Performance Hint Dedicated static content servers

Modern web servers are very good at serving static

content such as

• HTML

• CSS

• Images

• Zip/GZ/Tar files

Web Serving Tier

Performance Hint Stateless Sessions

Each connection is a fresh start

Server remembers nothing

Benefits? Allows Better Caching

Scales Horizontally

Web Serving Tier

Choices How much dynamic content?

When to offload dynamic processing?

When to offload database operations?

When to add more web servers?

Application Server Tier

Application Server Tier

What does it do? Dynamic Page Processing

JSP Servlets Standalone mod_perl/PHP/Python engines

Internal Services

Eg. Search, Shopping Cart, Credit Card Processing

Application Server Tier

1. How does it work?1. Web Tier generates the request using

HTTP (aka “REST”, sortof) RPC/Corba Java RMI XMLRPC/Soap (or something homebrewed)

2. App Server processes request and responds

Application Server Tier

Caveats Decoupling of services is GOOD

Manage Complexity using well-defined APIs

Don’t decouple for scaling, change your algorithms!

Remote Calling overhead can be expensive Marshaling of data

Sockets, net latency, throughput constraints…

XML, Soap, XMLRPC, yuck (don’t scale well)

Better to use Java’s RMI, good old RPC or even Corba

Application Server Tier

More Caveats Remote Calling can introduce new failure

scenarios Classic Distributed Problems

• How to detect remote failures?

How long to wait until deciding it’s failed?

• How to react to remote failures?

What do we do when all app servers have failed?

Application Server Tier

Hardware Requirements Lots and Lots and Lots of Memory

App Servers are very memory hungry

Java was hungry to being with

Consider going to 64bit for larger memory-space

Disk depends on application, typically minimal needed

FAST CPU required, and lots of them

(This will be an expensive machine.)

Database Tier

Database Tier

Available DB Products Free/Open Source DBs

PostgreSQL GNU DBM Ingres SQLite

Commercial Oracle MS SQL IBM DB2 Sybase SleepyCat

MySQL SQLite mSQL Berkeley DB

Database Tier

What does it do? Data Storage and Retrieval Data Aggregation and Computation Sorting Filtering ACID properties

(Atomic, Consistent, Isolated, Durable)

Database Tier

Choices How much logic to place inside the DB? Use Connection Pooling? Data Partitioning?

Spreading a dataset across multiple logical database “slices” in order to achieve better performance.

Database Tier

Hardware Requirements Entirely dependent upon application. Likely to be your most expensive machine(s).

Tons of Memory Spindles galore RAID is useful (in software or hardware)

Reliability usually trumps Speed• RAID levels 0, 5, 1+0, and 5+0 are useful

CPU also important Dual power supplies Dual Network

Internal Cache Tier

Internal Cache Tier

What is this? Object Cache

What Applications? Memcache Local Lookup Tables

BDB, GDBM, SQL-based Application-local Caching (eg. LRU tables) Homebrew Caching (disk or memory)

Internal Cache Tier

What does it do? Caches objects closer to the

Application or Web Tiers Tuned for your application Very Fast Access Scales Horizontally

Internal Cache Tier

Hardware Requirements Lots of Memory

Note that 32bit processes are typically limited to 2GB of RAM

Little or no disk Moderate to low CPU Fast Network

Misc. Services (DNS, Mail, etc…)

Misc. Services (DNS, Mail, etc…)

Why mention these? Every LAMP system has them

Crucial but often overlooked

Source of hidden problems

Misc. Services: DNS

Important Points Always have an offsite NS slave

Always have an onsite NS slave

Minimize network latency Don’t use NAT, load balancers, etc…

Misc. Services: Time Synchronization

Synchronize the clocks on your systems!

Hints: Use NTPDATE at boot time to set clock

Use NTPD to stay in synch

Don’t ever change the clock on a running

system!

Misc. Services: Monitoring

System Health Monitoring Nagios

Big Brother

Orcalator

Ganglia

Fault Notification

The Glue

•Routers

•Switches

•Firewalls

•Load Balancers

Routers and Switches

ExpensiveComplexCrucial Piece of the System

Hints Use GigE if you can

Jumbo Frames are GOOD VLans to manage complexity LACP (802.3ad) for failover/redundancy

Load Balancers

Hardware vs. Software Software is complex to set up, but cheaper

Hardware is expensive, but dedicated

IMHO: Use SW at first, graduate to HW

Load Balancers

What services to balance? HTTP Caches and Servers, App Servers, DB

SlavesWhat NOT to balance?

DNS LDAP NIS Memcache Spread Anything with it’s own built-in balancing

Message Busses

What is out there? Spread JMS MQSeries Tibco Rendezvous

What does it do? Various forms of distributed message delivery.

Guaranteed Delivery, Broadcasting, etc… Useful for heterogeneous distributed systems

What about the OS?

Operating System Selection

Lots of OS choices

LinuxFreeBSDNetBSDOpenBSDOpenSolarisCommercial Unix

What’s Important?

Maintainability Upgrade Path Security Updates Bug Fixes

Usability Do your engineers like it?

Cost Hardware Requirements (you don’t need a commercial Unix anymore)

Features to look for

Multi-processor Support

64bit Capable

Mature Thread Support

Vibrant User Community

Support for your devices

Hardware Choices

Building LAMP Hardware

Commodity Hardware Discussion

Consistency vs. Specialization Consistency reduces maintenance costs

Less Burn-in testing Fewer drivers to support Fewer OS variants Fewer types of security updates, upgrades In Sort: “Don’t throw hardware at the problem.”

However, specialization may improve ROI Put the money where best needed

Commodity Hardware Discussion

What I do when planning for growth: Specialize in the beginning

When cost is more important And designs aren’t yet mature

Design for horizontal scalability Plan on machine-sized pieces Want to grow by just adding more boxes

Eventually settle on two or three machine types

In-House vs. Colocation

Almost no reason to stay in-house these days

Colos keep getting cheaperLeased lines are still expensive

Beige-Box vs. Name Brand

Determine your Req’s ahead of time Talk to your engineers First

How important is a support plan? Hardware will break, plan on it

Name Brand usually has fewer options Works well if they have exactly what you need

Seek a neutral technical advisorIn the end it should come down to cost

Disk Drive Technologies

SCSI Expensive Big (300GB) Fast Reliable

IDE Cheap Huge (500GB!) Slow On-board support, often

w/ RAID0/1

Use SCSI for Performance Use IDE for cluster nodes

IDE w/ RAID for cheap speed

Disk Drive Technologies

PATA Immature drivers

Particularly w/ OSS• Linux has poor support

Prices coming down Unnecessary addons

Hot Swap not often needed, costs more

SATA Tried and Tested Obsolete

SATA is not SCSI The fast SATAs cost as much as SCSI

SATA not quite there for servers

Disk Drive Technology: Spindles

Number of Spindles More spindles can give

Higher Throughput Higher Concurrency

• Concurrency is crucial for Databases Reliability

• Failover drives, mirrors

Memory Technologies

ECC Expensive

Use only for keystone machines

Non-ECC Cheap, Fast

Use for cluster nodes

Processor Technologies: SMP

Multiple Processors Significantly higher cost

ECC, Dual Power Expensive Chassis, Motherboard

Less Reliable More parts to break

Requires MP-capable OS Good in Linux 2.6, Solaris, FreeBSD 5.x

Dual CPU systems cost more than double Possible exception: Dual-Core CPUs

Processor Technologies: 64bit

Most 32bit OSes limit each process to 2GB Some 32bit BIOSes are limited to 3.6GB RAM 64bit chips are still expensive 64bit OSes are becoming quite mature

Solaris 10 (AMD64) Linux 2.6 (x86_64)

Programs work but not yet tuned Java looks good MySQL not so good

Summary

Design for Horizontal Scalability

Design Stateless Systems

Decouple Internal Services

Write well-defined APIs

Standardize Hardware

Use Commodity Software

(Open Source!)

Avoid Fads

Use Commodity Parts

THE END

Thank You