Post on 14-Jan-2016
description
BuildingScalable Web Architectures
Aaron Bannertaaron@apache.org / aaron@codemass.com
QuickTime™ and aTIFF (Uncompressed) decompressorare needed to see this picture.
Goal
To build a reliable, scalable, cheap, flexible,
extendable internet application.
The Age of LAMP
What does a LAMP architecture give us?
Scalability
Grows in small steps
Stays up when it counts
Can grow with your traffic
Room for the future
Reliability
High Quality of Service
Minimal Downtime
Stability
Redundancy
Resilience
Low Cost
Little or no software licensing costs
Minimal hardware requirements
Abundance of talent
Reduced maintenance costs
Flexible
Modular Components
Public APIs
Open Architecture
Vendor Neutral
Many options at all levels
Extendable
Free/Open Source Licensing Right to Use Right to Inspect Right to Improve
Plugins Some Free Some Commercial Can always customize
Free as in Beer?
Price
Speed
Quality
Pick any two.
LAMP-like Architectures
The Big Picture
External Caching Tier
Web Serving Tier
Application Server Tier
Internal Cache Tier
Database Tier
Misc. Services (DNS, Mail, etc…)
The Glue
•Routers
•Switches
•Firewalls
•Load Balancers
Software Choices
Building LAMP Software
External Caching Tier
External Caching Tier
What is this?
Squid
Apache’s mod_proxy
Commercial HTTP Accelerator
External Caching Tier
What does it do?
Caches outbound HTTP objects
Images, CSS, XML, HTML, etc…
Flushes Connections
Useful for modem users, frees up web tier
Denial of Service Defense
External Caching Tier
Hardware Requirements Lots of Memory Moderate to little CPU Fast Network Moderate Disk Capacity
Room for cache, logs, etc… (disks are cheap)
One slow disk is OK
Two Cheapies > One Expensive
External Caching Tier
Other Questions
What to cache?
How much to cache?
Where to cache (internal vs. external)?
Web Serving Tier
Web Serving Tier
What is this? Apache
thttpd
Tux Web Server
IIS
Netscape
Web Serving Tier
What does it do? HTTP, HTTPS
Serves Static Content from disk
Generates Dynamic Content CGI/PHP/Python/mod_perl/etc…
Dispatches requests to the App Server Tier Tomcat, Weblogic, Websphere, JRun, etc…
Web Serving Tier
Hardware Requirements Lots and lots of Memory
Memory is main bottleneck in web serving
Memory determines max number of users
Fast Network
CPU depends on usage Dynamic content needs CPU
Static file serving requires very little CPU
Cheap slow disk, enough to hold your content
Web Serving Tier: Zero-copy
Performance Hint Dedicated static content servers
Modern web servers are very good at serving static
content such as
• HTML
• CSS
• Images
• Zip/GZ/Tar files
Web Serving Tier
Performance Hint Stateless Sessions
Each connection is a fresh start
Server remembers nothing
Benefits? Allows Better Caching
Scales Horizontally
Web Serving Tier
Choices How much dynamic content?
When to offload dynamic processing?
When to offload database operations?
When to add more web servers?
Application Server Tier
Application Server Tier
What does it do? Dynamic Page Processing
JSP Servlets Standalone mod_perl/PHP/Python engines
Internal Services
Eg. Search, Shopping Cart, Credit Card Processing
Application Server Tier
1. How does it work?1. Web Tier generates the request using
HTTP (aka “REST”, sortof) RPC/Corba Java RMI XMLRPC/Soap (or something homebrewed)
2. App Server processes request and responds
Application Server Tier
Caveats Decoupling of services is GOOD
Manage Complexity using well-defined APIs
Don’t decouple for scaling, change your algorithms!
Remote Calling overhead can be expensive Marshaling of data
Sockets, net latency, throughput constraints…
XML, Soap, XMLRPC, yuck (don’t scale well)
Better to use Java’s RMI, good old RPC or even Corba
Application Server Tier
More Caveats Remote Calling can introduce new failure
scenarios Classic Distributed Problems
• How to detect remote failures?
How long to wait until deciding it’s failed?
• How to react to remote failures?
What do we do when all app servers have failed?
Application Server Tier
Hardware Requirements Lots and Lots and Lots of Memory
App Servers are very memory hungry
Java was hungry to being with
Consider going to 64bit for larger memory-space
Disk depends on application, typically minimal needed
FAST CPU required, and lots of them
(This will be an expensive machine.)
Database Tier
Database Tier
Available DB Products Free/Open Source DBs
PostgreSQL GNU DBM Ingres SQLite
Commercial Oracle MS SQL IBM DB2 Sybase SleepyCat
MySQL SQLite mSQL Berkeley DB
Database Tier
What does it do? Data Storage and Retrieval Data Aggregation and Computation Sorting Filtering ACID properties
(Atomic, Consistent, Isolated, Durable)
Database Tier
Choices How much logic to place inside the DB? Use Connection Pooling? Data Partitioning?
Spreading a dataset across multiple logical database “slices” in order to achieve better performance.
Database Tier
Hardware Requirements Entirely dependent upon application. Likely to be your most expensive machine(s).
Tons of Memory Spindles galore RAID is useful (in software or hardware)
Reliability usually trumps Speed• RAID levels 0, 5, 1+0, and 5+0 are useful
CPU also important Dual power supplies Dual Network
Internal Cache Tier
Internal Cache Tier
What is this? Object Cache
What Applications? Memcache Local Lookup Tables
BDB, GDBM, SQL-based Application-local Caching (eg. LRU tables) Homebrew Caching (disk or memory)
Internal Cache Tier
What does it do? Caches objects closer to the
Application or Web Tiers Tuned for your application Very Fast Access Scales Horizontally
Internal Cache Tier
Hardware Requirements Lots of Memory
Note that 32bit processes are typically limited to 2GB of RAM
Little or no disk Moderate to low CPU Fast Network
Misc. Services (DNS, Mail, etc…)
Misc. Services (DNS, Mail, etc…)
Why mention these? Every LAMP system has them
Crucial but often overlooked
Source of hidden problems
Misc. Services: DNS
Important Points Always have an offsite NS slave
Always have an onsite NS slave
Minimize network latency Don’t use NAT, load balancers, etc…
Misc. Services: Time Synchronization
Synchronize the clocks on your systems!
Hints: Use NTPDATE at boot time to set clock
Use NTPD to stay in synch
Don’t ever change the clock on a running
system!
Misc. Services: Monitoring
System Health Monitoring Nagios
Big Brother
Orcalator
Ganglia
Fault Notification
The Glue
•Routers
•Switches
•Firewalls
•Load Balancers
Routers and Switches
ExpensiveComplexCrucial Piece of the System
Hints Use GigE if you can
Jumbo Frames are GOOD VLans to manage complexity LACP (802.3ad) for failover/redundancy
Load Balancers
Hardware vs. Software Software is complex to set up, but cheaper
Hardware is expensive, but dedicated
IMHO: Use SW at first, graduate to HW
Load Balancers
What services to balance? HTTP Caches and Servers, App Servers, DB
SlavesWhat NOT to balance?
DNS LDAP NIS Memcache Spread Anything with it’s own built-in balancing
Message Busses
What is out there? Spread JMS MQSeries Tibco Rendezvous
What does it do? Various forms of distributed message delivery.
Guaranteed Delivery, Broadcasting, etc… Useful for heterogeneous distributed systems
What about the OS?
Operating System Selection
Lots of OS choices
LinuxFreeBSDNetBSDOpenBSDOpenSolarisCommercial Unix
What’s Important?
Maintainability Upgrade Path Security Updates Bug Fixes
Usability Do your engineers like it?
Cost Hardware Requirements (you don’t need a commercial Unix anymore)
Features to look for
Multi-processor Support
64bit Capable
Mature Thread Support
Vibrant User Community
Support for your devices
Hardware Choices
Building LAMP Hardware
Commodity Hardware Discussion
Consistency vs. Specialization Consistency reduces maintenance costs
Less Burn-in testing Fewer drivers to support Fewer OS variants Fewer types of security updates, upgrades In Sort: “Don’t throw hardware at the problem.”
However, specialization may improve ROI Put the money where best needed
Commodity Hardware Discussion
What I do when planning for growth: Specialize in the beginning
When cost is more important And designs aren’t yet mature
Design for horizontal scalability Plan on machine-sized pieces Want to grow by just adding more boxes
Eventually settle on two or three machine types
In-House vs. Colocation
Almost no reason to stay in-house these days
Colos keep getting cheaperLeased lines are still expensive
Beige-Box vs. Name Brand
Determine your Req’s ahead of time Talk to your engineers First
How important is a support plan? Hardware will break, plan on it
Name Brand usually has fewer options Works well if they have exactly what you need
Seek a neutral technical advisorIn the end it should come down to cost
Disk Drive Technologies
SCSI Expensive Big (300GB) Fast Reliable
IDE Cheap Huge (500GB!) Slow On-board support, often
w/ RAID0/1
Use SCSI for Performance Use IDE for cluster nodes
IDE w/ RAID for cheap speed
Disk Drive Technologies
PATA Immature drivers
Particularly w/ OSS• Linux has poor support
Prices coming down Unnecessary addons
Hot Swap not often needed, costs more
SATA Tried and Tested Obsolete
SATA is not SCSI The fast SATAs cost as much as SCSI
SATA not quite there for servers
Disk Drive Technology: Spindles
Number of Spindles More spindles can give
Higher Throughput Higher Concurrency
• Concurrency is crucial for Databases Reliability
• Failover drives, mirrors
Memory Technologies
ECC Expensive
Use only for keystone machines
Non-ECC Cheap, Fast
Use for cluster nodes
Processor Technologies: SMP
Multiple Processors Significantly higher cost
ECC, Dual Power Expensive Chassis, Motherboard
Less Reliable More parts to break
Requires MP-capable OS Good in Linux 2.6, Solaris, FreeBSD 5.x
Dual CPU systems cost more than double Possible exception: Dual-Core CPUs
Processor Technologies: 64bit
Most 32bit OSes limit each process to 2GB Some 32bit BIOSes are limited to 3.6GB RAM 64bit chips are still expensive 64bit OSes are becoming quite mature
Solaris 10 (AMD64) Linux 2.6 (x86_64)
Programs work but not yet tuned Java looks good MySQL not so good
Summary
Design for Horizontal Scalability
Design Stateless Systems
Decouple Internal Services
Write well-defined APIs
Standardize Hardware
Use Commodity Software
(Open Source!)
Avoid Fads
Use Commodity Parts
THE END
Thank You