Prepare to Scale
Bill O’Connor, CTO
d.o.: csevb10
Basic Infrastructure
• Single server
  – Database and application on the same server
  – Start by optimizing what you have
    • Apache
    • Drupal
    • PHP
    • Database
  – Optimizations you make for the first server will carry over to future servers
  – Strategy: optimize what you have, then divert traffic through caching and specialization
Drupal
• 1-word: Pressflow
  – Support for database replication
  – Support for Squid/Varnish
  – MySQL optimizations
  – PHP5 optimizations
  – http://fourkitchens.com/pressflow-makes-drupal-scale/downloads
DB
• MyISAM
  – Default storage engine for Drupal 6 and earlier
  – Good for selects
    • Suits mostly read-only websites
  – Poor read/write performance for large websites
DB, Cont.
• Falcon
  – Beta-stage project for MySQL
  – Different performance characteristics than other engines (both + & -)
    • Not ready for primetime, but worth watching
DB, Cont.
• InnoDB is your friend in most scenarios.
  – Row-level vs. table-level locking
  – Improves read/write performance
    • Does slow pure-read performance to some degree
  – Easier to do it right from the start than to revisit the issue later when you have users and traffic
  – Default storage engine of Drupal 7+
  – Best bet at the moment for allowing your site to scale
PHP
• Opcode caching
  – Sort of like having a compiled version of your application
  – Optimizes your application
  – Keeps the compiled PHP bytecode in memory, ready for execution
  – Result: smaller PHP memory footprint (read: more users with less hardware) and faster execution of code
• Virtually a necessity for any large-scale/high-volume Drupal deployment
PHP, Cont.
• Opcode caching
  – eAccelerator
    • Off-and-on maintenance
    • Only works with threadsafe PHP
    • Has, in my experience, led to some strange crashing, WSOD, etc.
  – XCache
    • Reasonable performance improvement, though it tends to benchmark slowest of the 3
    • Actively maintained
    • Stable, but still prone to cache corruption, WSOD, etc.
PHP, Cont.
• Opcode caching, cont.
  – APC
    • Current opcode cache of choice
      – Most actively updated
      – Most stable of the 3
      – Usually the winner in performance benchmarks
    • Maintained by core PHP developers (Rasmus)
Static Caching
• Static caching modules
  – Create and store rendered versions of the HTML
    • Rather than building the page on request
  – Avoids having to load any aspect of your application, depending on the implementation
  – Acts as a layer between the user and actual execution of your program
    • Alleviates DB issues since the DB is no longer involved
    • Simplifies any PHP execution
Static Caching, Cont.
• Static caching modules, cont.
  – Boost module
    • Static file caching
    • Good for anonymous traffic only
    • Great performance for small sites
    • Ideal for shared hosts
  – AuthCache module
    • Static file caching
    • Attempts to handle logged-in traffic
    • Plays nice with and/or can utilize multiple caching engines (more on those later)
    • Can be a bit of a pain for user-specific content, as you have to write particular cases for each user-specific area
Static Caching, Cont.
• Static caching modules, cont.
  – Shameless plug: Ajaxify Regions
    • Aptly named... or not.
      – Actually pulls blocks, not regions, via Ajax
    • Early release with plenty of work to do; needs more real-world testing, etc.
    • Automatically handles all user-specific block content based on block-caching settings
      – BLOCK_NO_CACHE
      – BLOCK_CACHE_PER_USER
      – BLOCK_CACHE_PER_ROLE
    • Concept: Ajax-load anything that can’t be cached for everyone.
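The concept on this slide can be sketched as a simple partition of a page's blocks. This is an illustration under assumptions, not the module's code: the enum names mirror Drupal's block cache settings but their values here are arbitrary, and the `Block` type, `split_blocks`, and the block names are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum, auto


class BlockCache(Enum):
    """Names mirror Drupal's block cache settings; the values are arbitrary."""
    NO_CACHE = auto()
    PER_USER = auto()
    PER_ROLE = auto()
    PER_PAGE = auto()
    GLOBAL = auto()


# The settings the slide lists as handled via Ajax.
AJAX_SETTINGS = {BlockCache.NO_CACHE, BlockCache.PER_USER, BlockCache.PER_ROLE}


@dataclass
class Block:
    name: str  # hypothetical block identifier
    cache: BlockCache


def split_blocks(blocks):
    """Return (blocks baked into the cached page, blocks fetched via Ajax)."""
    cached = [b.name for b in blocks if b.cache not in AJAX_SETTINGS]
    ajax = [b.name for b in blocks if b.cache in AJAX_SETTINGS]
    return cached, ajax


page_blocks = [
    Block("main_menu", BlockCache.GLOBAL),
    Block("user_login", BlockCache.NO_CACHE),
    Block("my_account_links", BlockCache.PER_USER),
    Block("admin_shortcuts", BlockCache.PER_ROLE),
    Block("related_content", BlockCache.PER_PAGE),
]
cached_names, ajax_names = split_blocks(page_blocks)
```

Everything in `cached_names` can live in a page that is cached for all visitors; everything in `ajax_names` is left as a placeholder and requested separately for each user.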
Object-level Caching
• Object-level caching
  – Provides a way to store fully generated objects
  – Can be the amalgam of many queries
    • Think of all the queries run on a node_load vs. retrieving all that information in 1 query.
  – Stores the information in memory for fast access
  – Performance characteristics not significantly different than MySQL when MySQL can handle the load
    • BUT can handle a much higher load
  – Protects the DB (the area most likely to inhibit performance for Drupal) from becoming overwhelmed
Object-level Caching, Cont.
• Object-level caching, cont.
  – APC
    • Not a typo.
    • APC can handle object caching as well as opcode caching.
    • It’s fast: everything is stored in local memory.
    • It caches only for one server.
      – This means that you could have synchronization issues between servers if you have more than one.
      – If that’s not an issue, it’s a quick and easy solution.
    • Ideal for single-server implementations or when synchronization isn’t an issue.
Object-level Caching, Cont.
• Object-level caching, cont.
  – Memcache
    • Utilized by most high-profile sites.
      – Facebook, for instance, makes tremendous use of lots and lots of memcache servers.
      – Drupal.org uses it.
    • Provides an object cache that can be used by multiple servers.
    • Slower than APC in the single-server case, but keeps servers synchronized.
    • Multiple silos/buckets can be created for information, so you can distribute information across multiple servers.
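One common way to spread cached items across several cache servers is to hash each key and pick a server from the result. The sketch below shows plain modulo hashing under that assumption; it is not the Drupal memcache module's configuration, and the server addresses and `server_for` helper are hypothetical.

```python
import hashlib

# Hypothetical memcache server addresses.
SERVERS = ["10.0.0.11:11211", "10.0.0.12:11211", "10.0.0.13:11211"]


def server_for(key: str, servers=SERVERS) -> str:
    """Pick a server from a stable hash of the key (simple modulo hashing)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    index = int.from_bytes(digest[:4], "big") % len(servers)
    return servers[index]


# Place 1000 example keys and group them by the server they land on.
keys = [f"node:{nid}" for nid in range(1, 1001)]
placement = {}
for key in keys:
    placement.setdefault(server_for(key), []).append(key)
```

Because every web server computes the same hash, they all agree on where a given key lives. A known weakness of modulo hashing is that most keys move when the number of servers changes, which is why many clients use consistent hashing instead.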
Advanced Infrastructure (ex)
[Diagram: Load Balancer, Static-Caching, Application, Database, Slave DB, Solr, Memcache, Deployment]
Specialization
• Specialized servers/services
  – DB server
  – Solr
  – Memcache
  – Static caching
  – CDN
Specialization
• MySQL server
  – One of the fastest ways to improve performance is to separate your MySQL DB from your application
  – This allows both your application and your DB to make full use of independent hardware
  – The change is basically transparent at the application layer: just a single change to settings.php
Specialization
• Search
  – Problem: search is incredibly hard on the system
    • Particularly with multiple search terms
    • Drupal search works but, despite great efforts, is still not as quick or useful as an outside solution
    • Search is particularly hard on the DB, Drupal’s traditional bottleneck
      – In other words, search makes a bad problem worse
Specialization
• Search, cont.
  – Solution: Solr
    • Communication layer between the website and the Lucene search index
    • Offloads all of the complex processing to a separate box
      – More power for searches (search faster!)
      – Doesn’t lock up your website DB
      – Website can focus on what it does, search can focus on what it does
    • Additional benefit: faceting (filtering), sorting
      – Ability to search content based on specific criteria (content type, author, taxonomy terms) and sort based on criteria (title, date, author, content type)
    • Hosted model (Acquia Search) or can be installed on a server in your infrastructure
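To make the faceting and sorting point concrete, here is a sketch that builds a Solr query URL using Solr's standard `q`, `fq`, `sort`, `facet`, and `facet.field` parameters. The endpoint, the field names (`type`, `author`, `created`), and the `build_search_url` helper are assumptions; a real schema will differ.

```python
from urllib.parse import urlencode

# Hypothetical Solr endpoint; field names below depend on your own schema.
SOLR_SELECT_URL = "http://solr.example.com:8983/solr/select"


def build_search_url(terms, content_type=None, author=None,
                     sort_field=None, descending=True):
    """Build a search URL with optional filters, sorting, and facet counts."""
    params = [("q", terms), ("wt", "json")]
    if content_type:
        params.append(("fq", f"type:{content_type}"))  # filter by content type
    if author:
        params.append(("fq", f"author:{author}"))  # filter by author
    if sort_field:
        direction = "desc" if descending else "asc"
        params.append(("sort", f"{sort_field} {direction}"))
    # Ask Solr for per-value counts that a filtering UI can display.
    params += [("facet", "true"),
               ("facet.field", "type"),
               ("facet.field", "author")]
    return f"{SOLR_SELECT_URL}?{urlencode(params)}"


url = build_search_url("scaling drupal", content_type="article",
                       sort_field="created")
```

The website only assembles and sends this request; the filtering, sorting, and counting all happen on the search box rather than in the site's database.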
Specialization
• Static caching
  – Static caching on the same server as the website provides a performance improvement
    • Downside: there’s still a lot of wasted overhead. Apache loads everything it needs for a full website, not just for serving HTML; PHP also has to load.
  – Static caching elsewhere provides the opportunity to optimize the server for static caching
    • Side effect: your web server now has more memory free to handle requests that require PHP processing.
Specialization
• Static caching, cont.
  – Squid
    • Free
    • Not specifically designed just for HTTP acceleration
    • Difficult to set up/configure
    • Performance improvement, but less than the competition
Specialization
• Static caching, cont.
  – Varnish
    • Free (to download)
    • Pressflow built to work with Varnish
    • Varnish servers set up for Drupal and usable on Amazon EC2 (developed by Chapter 3) ($.34/hr + $.17/GB)
    • Designed from the ground up for HTTP acceleration
    • Can take time/expertise to get the performance you want
    • Can create a significant performance improvement once configured correctly
Specialization
• Static caching, cont.
  – AI-Cache
    • Best performance of the bunch
    • Simple configuration
    • Provides additional features for caching
      – Header recognition
      – Session caching
    • Drop-in solution
    • Not free
    • Amazon EC2 instance is available ($.68/hr + $.20/GB)
Specialization
• CDN
  – Cache content that is static (outside of full pages)
    • Images
    • Video
    • CSS
    • JS
  – Popular examples
    • Akamai
    • LimeLight
    • Amazon CloudFront
  – Separate domains, more bandwidth, geographic servers all equal faster loading
  – Can be an expensive option
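A small sketch of the "separate domain for static content" idea, assuming asset URLs are rewritten when the page is built. The CDN host, the extension list, and the `asset_url` helper are hypothetical and not tied to any specific CDN provider.

```python
from pathlib import PurePosixPath

# Hypothetical CDN host; static assets are served from a separate domain.
CDN_HOST = "https://cdn.example.com"
STATIC_EXTENSIONS = {".png", ".jpg", ".gif", ".css", ".js", ".mp4"}


def asset_url(path: str) -> str:
    """Send static files to the CDN; leave everything else on the origin."""
    if PurePosixPath(path).suffix.lower() in STATIC_EXTENSIONS:
        return f"{CDN_HOST}{path}"
    return path


urls = [asset_url(p) for p in [
    "/sites/default/files/logo.png",
    "/misc/drupal.js",
    "/node/1",
]]
```

Images, scripts, and stylesheets are pointed at the CDN, while page requests such as `/node/1` still go to the web server.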
Summary
• Start small and make the easy optimizations:
  – Pressflow
  – InnoDB
  – APC
• Add servers and services as necessary and based on individual traffic:
  – MySQL
  – Solr
  – Memcache
  – Static cache
  – CDN
The End.
• Questions?