Scalable Internet Architectures (people.apache.org/~jim/ApacheCons/ApacheCon2002/pdf/sch...)
Scalable Internet Architectures
George Schlossnagle
Theo Schlossnagle
Agenda
• Choosing Hardware
• Choosing an Application/Availability Architecture
• Deciding between Third-Party and Custom-Built Software
• Case Study I: Building Fast Scalable Web Forums
• Case Study II: Distributed Logging
Choosing Hardware
• 'Enterprise' Hardware
  – Expensive
  – Reliable
• Commodity Hardware
  – Cheap
  – Fast
  – Unreliable
Setting Up Apache
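• Turn KeepAlive off, so idle client connections do not tie up server processes
• As the cluster grows, keep MaxClients tuned to avoid excessive database connections (if each child holds a persistent connection, the database may see up to nodes × MaxClients connections)
• For dynamic content, consider using a local proxy instance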
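Here is a rough sizing sketch for the MaxClients point. It assumes every Apache child can hold one persistent database connection and that the database has a fixed connection limit. The function name, the 500-connection limit, and the 20 reserved connections are hypothetical.

```python
def max_clients_per_node(db_max_connections: int, nodes: int, reserved: int = 0) -> int:
    """Largest per-node MaxClients that keeps the cluster under the DB limit.

    Assumes (hypothetically) one persistent connection per Apache child, so the
    worst case is nodes * MaxClients open connections. `reserved` leaves
    headroom for maintenance tools, cron jobs, and the like.
    """
    if nodes <= 0:
        raise ValueError("nodes must be positive")
    available = db_max_connections - reserved
    if available < nodes:
        raise ValueError("not enough connections for one child per node")
    return available // nodes


# A database allowing 500 connections, with 20 kept back for maintenance:
for n in (2, 4, 8, 16):
    print(n, "nodes -> MaxClients", max_clients_per_node(500, n, reserved=20))
```

The per-node ceiling halves each time the cluster doubles. That is why MaxClients has to be revisited as nodes are added.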
Configuring A Local Proxy
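• Run 2 Apache instances on a single host
• The public instance handles high-latency clients, using mod_rewrite/mod_proxy to forward dynamic requests
• The local instance handles dynamic content and only makes low-latency connections to the public instance, so its heavyweight processes are not tied up feeding pages to slow clients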
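The benefit of the split can be estimated with Little's law: busy processes ≈ request rate × time each request holds a process. The figures below are hypothetical: 50 requests/s, 100 ms to generate a page, and 2 s to deliver it to a slow client. With the proxy in front, the sketch assumes the dynamic process is released as soon as the page has crossed the loopback connection, and it treats that hand-off as taking no extra time.

```python
def busy_processes(requests_per_sec: float, hold_time_ms: float) -> float:
    """Little's law: average number of processes occupied at once."""
    return requests_per_sec * hold_time_ms / 1000.0


RATE = 50          # requests per second (hypothetical)
GENERATE_MS = 100  # time to build a dynamic page (hypothetical)
DELIVER_MS = 2000  # time to trickle the page out to a slow client (hypothetical)

# Without a proxy, the dynamic (e.g. mod_perl) process is held for
# generation *and* for delivery to the remote client.
without_proxy = busy_processes(RATE, GENERATE_MS + DELIVER_MS)

# With a local proxy, the dynamic process only generates the page; a
# lightweight proxy process waits for it and then does the slow delivery.
with_proxy_dynamic = busy_processes(RATE, GENERATE_MS)
with_proxy_frontend = busy_processes(RATE, GENERATE_MS + DELIVER_MS)

print(f"dynamic processes without proxy:  {without_proxy:.0f}")
print(f"dynamic processes with proxy:     {with_proxy_dynamic:.0f}")
print(f"light proxy processes with proxy: {with_proxy_frontend:.0f}")
```

The illustrative numbers have the same shape as the configurations that follow: a large MaxClients on the exterior instance and a much smaller one on the interior instance.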
Configuring A Local Proxy
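• Exterior (proxy) Instance
<IfDefine PROXY>
    DocumentRoot /var/apache/htdocs
    Listen myexternal_ip:80
    MaxSpareServers 32
    MaxClients 128
    MaxRequestsPerChild 100000
    KeepAlive off
    LoadModule proxy_module libexec/libproxy.so
    LoadModule rewrite_module libexec/mod_rewrite.so
    AddModule mod_proxy.c
    AddModule mod_rewrite.c
    ProxyRequests on
    # NoCache requires an argument; "*" disables caching for all hosts
    NoCache *
    # Rewrite Location: headers from the interior instance back to this host
    ProxyPassReverse / http://127.0.0.1/
    # RewriteRule lines are ignored unless the rewrite engine is enabled
    RewriteEngine on
    # Refuse forward-proxy requests so the server is not an open proxy
    RewriteRule ^proxy: - [F]
    RewriteRule ^(http:|ftp:) - [F]
    # Hand .html requests to the interior (dynamic) instance over loopback
    RewriteRule ^/(.*\.html)$ http://127.0.0.1/$1 [P,L]
</IfDefine>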
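The exterior rule set can be modelled in Python to show how a request URI is classified. The rules are tried in order: [F] stops with a 403, and [P,L] hands the request to the interior instance. This only illustrates the rules' effect, not how mod_rewrite works internally, and `route` is a hypothetical name.

```python
import re

# (pattern, action) pairs, checked in the same order as the RewriteRule lines.
RULES = [
    (re.compile(r"^proxy:"), "forbidden"),        # RewriteRule ^proxy: - [F]
    (re.compile(r"^(http:|ftp:)"), "forbidden"),  # RewriteRule ^(http:|ftp:) - [F]
    (re.compile(r"^/(.*\.html)$"), "proxy"),      # RewriteRule ^/(.*\.html)$ ... [P,L]
]


def route(uri: str) -> str:
    """Return what the exterior instance would do with `uri`."""
    for pattern, action in RULES:
        m = pattern.match(uri)
        if m is None:
            continue
        if action == "forbidden":
            return "403 Forbidden"
        # [P]: proxy to the interior instance on the loopback address.
        return "proxy -> http://127.0.0.1/" + m.group(1)
    # No rule matched: served directly from the exterior DocumentRoot.
    return "static"


for uri in ("/news/index.html", "/images/logo.gif", "http://example.com/"):
    print(uri, "=>", route(uri))
```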
Configuring A Local Proxy
• Interior Instance
<IfDefine DYNAMIC>
    DocumentRoot /var/apache/htdocs
    Listen localhost:80
    MaxClients 40
    MaxRequestsPerChild 0
    KeepAlive off
    LoadModule perl_module libexec/libperl.so
    AddModule mod_perl.c
    <Files *.asp>
        SetHandler perl-script
        # perl-script also needs a PerlHandler; Apache::ASP is assumed here
        PerlHandler Apache::ASP
    </Files>
</IfDefine>
Designing an HA/LB (high-availability/load-balancing) scheme that's right for you
• Recognize the difference between replicable and non-replicable data
• Replicable data needs only marginal protection; use commodity hardware
• Non-replicable data needs single-point reliability; consider Enterprise hardware
• Bring the data to the session, not vice versa
• Leverage distributed systems technology
• Avoid creating artificial points of failure
Typical Three-Tier Architecture
Modern Two-Tier Model
Choosing between Custom and Commercial Software
Commercial
• Code 'maturity'
• Dedicated support
Homegrown
• Designed for your particular needs
• In-house support
Case Study I: Caching Web Objects
• How well does your data match the original design goals of any commercial products being considered?
• Is the data static?
• Is the data static for a short period of time?
• Is the data static for a short period of time for each client?
• Does the data contain components which are static for each client for a short period of time?
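The last two questions describe data that stays fixed for a short time per client. One way to exploit that is a cache keyed on (client, object) with a short time-to-live. The class name, the TTL value, and the injectable clock below are illustrative choices, not part of the original design.

```python
import time
from typing import Callable, Dict, Hashable, Tuple


class PerClientTTLCache:
    """Caches one value per (client, key) pair for `ttl` seconds."""

    def __init__(self, ttl: float, clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl = ttl
        self.clock = clock                      # injectable for testing
        self._store: Dict[Tuple[Hashable, Hashable], Tuple[float, object]] = {}

    def get(self, client: Hashable, key: Hashable, build: Callable[[], object]) -> object:
        now = self.clock()
        entry = self._store.get((client, key))
        if entry is not None and now - entry[0] < self.ttl:
            return entry[1]                     # still fresh for this client
        value = build()                         # missing or expired: rebuild
        self._store[(client, key)] = (now, value)
        return value


# Usage with a fake clock: alice's page is reused until 30 seconds have passed.
now = [0.0]
cache = PerClientTTLCache(ttl=30, clock=lambda: now[0])
builds = []


def render() -> str:
    builds.append(now[0])
    return f"page built at t={now[0]}"


first = cache.get("alice", "/inbox", render)
again = cache.get("alice", "/inbox", render)   # served from alice's entry
other = cache.get("bob", "/inbox", render)     # bob gets his own entry
now[0] = 31.0
later = cache.get("alice", "/inbox", render)   # expired, so rebuilt
```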
Detailed Example: Web Forums
Original Implementation
Every page is generated by a database query that returns a sorted list of all messages, which is then rendered and returned to the user.
• Inefficient, database intensive, and scales poorly as message volume increases.
• Takes no advantage of the select/update ratio (forum pages are typically read far more often than new messages are posted).
Second Implementation: Add Black-Box Caching
The last-modification time is stored on every update and is used to mark message listings as cacheable.
• Takes advantage of high cache locality.
• Provides good scalability.
• Requires a 3-tier architecture.
• Minimal application modification required.
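Here is one plausible reading of this design, sketched in Python. The application exposes the stored time as an HTTP Last-Modified value, and a generic caching proxy (the black box) revalidates with If-Modified-Since. An unchanged listing then costs only a timestamp lookup. The names and the integer timestamps are hypothetical. The sketch also always revalidates, whereas a real proxy can serve a copy without contacting the application while it is still fresh (for example under Expires or Cache-Control: max-age).

```python
from typing import Dict, Optional, Tuple

# Hypothetical application state: a last-modified stamp per forum,
# updated on every post ("stored on every update").
last_modified: Dict[int, int] = {1: 100}
full_renders = 0


def app(forum_id: int, if_modified_since: Optional[int]) -> Tuple[int, str, int]:
    """Origin server: answer 304 from the stored stamp when nothing changed."""
    global full_renders
    stamp = last_modified[forum_id]
    if if_modified_since is not None and stamp <= if_modified_since:
        return 304, "", stamp                 # cheap: no message query needed
    full_renders += 1                         # expensive path: build the listing
    return 200, f"listing for forum {forum_id} @ {stamp}", stamp


class BlackBoxCache:
    """Stand-in for a generic caching proxy in front of the application tier."""

    def __init__(self) -> None:
        self.store: Dict[int, Tuple[str, int]] = {}

    def get(self, forum_id: int) -> str:
        cached = self.store.get(forum_id)
        status, body, stamp = app(forum_id, cached[1] if cached else None)
        if status == 304 and cached:
            return cached[0]                  # revalidated: reuse cached copy
        self.store[forum_id] = (body, stamp)
        return body


cache = BlackBoxCache()
cache.get(1)
cache.get(1)                 # revalidated with a 304, no re-render
last_modified[1] = 200       # a new post updates the stamp
print(cache.get(1), full_renders)   # listing for forum 1 @ 200 2
```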
Third Implementation: Application-Integrated Caching
Static pages are written to a shared filesystem and rewritten on update.
• Takes advantage of high cache locality.
• Efficient use of hardware.
• Good scalability.
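The write-and-rewrite-on-update step can be sketched as follows. The function name and directory layout are assumptions. Writing to a temporary file and renaming it into place is an added safeguard, not something the slide mentions: it keeps other processes from ever serving a half-written page.

```python
import os
import tempfile


def publish(docroot: str, uri: str, content: str) -> str:
    """Write the page for `uri` under `docroot`, replacing any previous copy.

    The page goes to a temporary file in the target directory and is then
    renamed into place, so a concurrent reader sees either the old page or the
    new one. `uri` is assumed to be validated already (no ".." segments).
    """
    path = os.path.join(docroot, uri.lstrip("/"))
    directory = os.path.dirname(path)
    os.makedirs(directory, exist_ok=True)
    fd, tmp = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            f.write(content)
        os.chmod(tmp, 0o644)           # mkstemp creates owner-only files
        os.replace(tmp, path)          # rename(2): atomic on a POSIX filesystem
    except BaseException:
        os.unlink(tmp)
        raise
    return path


# Usage: the first request caches the page; a later update rewrites it.
docroot = tempfile.mkdtemp()
page_path = publish(docroot, "/forums/12", "<html>v1</html>")
publish(docroot, "/forums/12", "<html>v2</html>")
with open(page_path, encoding="utf-8") as f:
    print(f.read())                    # <html>v2</html>
```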
Fourth Implementation: Application-Integrated Caching (II), Leveraging Distributed Systems Technology
Static pages are written locally; nodes use group communication tools to coordinate static page removal on updates.
• Ideal use of commodity hardware.
• Takes advantage of high cache locality.
• Excellent scalability and avoidance of single points of failure (SPoFs).
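The coordination step can be simulated in a single process. Each node keeps pages in its own local directory and joins a group. An update multicasts the page's URI, and every member deletes its local copy. `GroupBus` is a hypothetical in-memory stand-in for a toolkit such as Spread and does not use Spread's API.

```python
import os
import tempfile
from typing import Callable, List


class GroupBus:
    """In-memory stand-in for a group communication channel such as Spread.

    A real deployment would multicast each message to every node in the
    group; here each member's handler is simply called in turn.
    """

    def __init__(self) -> None:
        self.members: List[Callable[[str], None]] = []

    def join(self, handler: Callable[[str], None]) -> None:
        self.members.append(handler)

    def multicast(self, uri: str) -> None:
        for handler in self.members:
            handler(uri)


class Node:
    """A web node keeping static copies in its own local document root."""

    def __init__(self, bus: GroupBus) -> None:
        self.docroot = tempfile.mkdtemp()
        bus.join(self.purge_local)

    def path(self, uri: str) -> str:
        return os.path.join(self.docroot, uri.lstrip("/"))

    def cache(self, uri: str, content: str) -> None:
        os.makedirs(os.path.dirname(self.path(uri)), exist_ok=True)
        with open(self.path(uri), "w", encoding="utf-8") as f:
            f.write(content)

    def purge_local(self, uri: str) -> None:
        try:
            os.unlink(self.path(uri))     # the next request regenerates the page
        except FileNotFoundError:
            pass                          # this node never cached the page


bus = GroupBus()
nodes = [Node(bus) for _ in range(3)]
for node in nodes[:2]:                    # only two nodes have served the page
    node.cache("/forums/12", "<html>old</html>")
before = [os.path.exists(n.path("/forums/12")) for n in nodes]
bus.multicast("/forums/12")               # an update anywhere purges every copy
after = [os.path.exists(n.path("/forums/12")) for n in nodes]
```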
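Implementation
• mod_rewrite setup
RewriteEngine on
# In server context %{REQUEST_FILENAME} is still the URL path, so test the mapped file
RewriteCond %{DOCUMENT_ROOT}%{REQUEST_URI} !-f
RewriteRule ^/forums/(.*)$ /admin/generator.php?forumid=$1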
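These two lines serve the static copy if it exists and otherwise fall through to the generator. That behaviour can be modelled in a few lines of Python. `resolve` is a hypothetical name. The existence check uses the document root because that is where the generator writes its copies.

```python
import os
import re
import tempfile

FORUM_URL = re.compile(r"^/forums/(.*)$")   # same pattern as the RewriteRule


def resolve(docroot: str, uri: str) -> str:
    """Model of the rule pair: static file if present, else the generator."""
    static_path = os.path.join(docroot, uri.lstrip("/"))
    if os.path.isfile(static_path):          # the !-f condition fails: no rewrite
        return static_path
    match = FORUM_URL.match(uri)
    if match:                                # cache miss on a forum page
        return "/admin/generator.php?forumid=" + match.group(1)
    return static_path                       # rule does not apply


docroot = tempfile.mkdtemp()
miss = resolve(docroot, "/forums/12")        # nothing cached yet
os.makedirs(os.path.join(docroot, "forums"))
with open(os.path.join(docroot, "forums", "12"), "w", encoding="utf-8") as f:
    f.write("<html>cached</html>")
hit = resolve(docroot, "/forums/12")         # now served as a static file
```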
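Implementation
• generator.php
<?php
// Invoked by mod_rewrite when no static copy of the requested forum page exists.
// generate_page() and return_error() are application helpers not shown here.
$forumid = isset($_GET['forumid']) ? $_GET['forumid'] : '';
// After an internal rewrite, REQUEST_URI still holds the originally requested
// path (under /forums/), which is where the static copy should be written.
$uri = parse_url($_SERVER['REQUEST_URI'], PHP_URL_PATH);
if (!$uri || $forumid === '' || strpos($uri, '..') !== false) {
    return_error();
    exit;
}
ob_start();                                  // capture the generated page
if (generate_page($forumid)) {
    $content = ob_get_contents();
    $fp = fopen($_SERVER['DOCUMENT_ROOT'] . $uri, 'w');
    if ($fp) {                               // cache the page for later requests
        fwrite($fp, $content);
        fclose($fp);
    }
    ob_end_flush();                          // also send it to this client
    exit;
}
ob_end_clean();                              // discard partial output on failure
return_error();
?>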
Implementation
• update page:
<?php
…
update_page($uri);
purge_cache($uri);
?>
purge_cache() can be as simple as unlink() if we have a single machine or are using a shared mount point. Otherwise, we can use a group communication toolkit such as Spread to coordinate invalidation ("poisoning") of all the caches.
Case Study II: Distributed Logging
• Need to consolidate logs across multiple web servers for auditing
• Need to do real-time analysis of logs
First (Traditional) Implementation
• Web logs are written locally on every machine, periodically copied to a central server, and sorted/merged
• Consolidation is slow
• Real-time log processing is not possible
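The sort/merge step on the central server is a k-way merge of logs that are each already in time order. The record layout below is hypothetical. The merge can only include a server's records once that server's file has been copied over, and that is the delay the slide describes.

```python
import heapq
from typing import Iterable, Iterator, List, Tuple

# Hypothetical record layout: (unix timestamp, server, request line).
LogRecord = Tuple[int, str, str]


def merge_logs(per_server: Iterable[List[LogRecord]]) -> Iterator[LogRecord]:
    """k-way merge of per-server logs, each already in timestamp order.

    heapq.merge consumes its inputs lazily, so with file iterators the
    central server need not load every log into memory and re-sort it.
    """
    return heapq.merge(*per_server, key=lambda record: record[0])


web1 = [(100, "web1", "GET /"), (105, "web1", "GET /forums/12")]
web2 = [(101, "web2", "GET /about"), (103, "web2", "GET /")]
merged = list(merge_logs([web1, web2]))
for record in merged:
    print(record)
```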
Candidate Solutions
Commercial Solutions
• Expensive
• Lack flexibility
Existing Open Source Solutions
• Syslog logging
  – Unreliable
  – Unicast
• Database logging
  – Reliable
  – Unicast
Custom Solution (mod_log_spread)
• Designed as an Apache module for maintainability
• Reliable multicast transport for maximum flexibility
• The aggregated log stream can be used to maintain/track user state and server health across multiple servers, asynchronously but in real time
• Multicast transport allows additional monitoring facilities to be added for 'free': a new consumer simply joins the group, with no change to the web servers
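Every member of a multicast group sees the same stream, so an auditor, a health monitor, and a user-state tracker can all consume it independently. The sketch below shows that fan-out. `LogChannel` is a hypothetical in-memory stand-in for the multicast group and is not Spread's or mod_log_spread's API. The record layout is also hypothetical.

```python
from collections import Counter, defaultdict
from typing import Callable, DefaultDict, List, Tuple

# Hypothetical log record: (server, client address, HTTP status, path).
Record = Tuple[str, str, int, str]


class LogChannel:
    """In-memory stand-in for a multicast log group; not Spread's API.

    Every subscriber receives every record, so a new consumer can be added
    without touching the web servers that publish.
    """

    def __init__(self) -> None:
        self.subscribers: List[Callable[[Record], None]] = []

    def subscribe(self, handler: Callable[[Record], None]) -> None:
        self.subscribers.append(handler)

    def publish(self, record: Record) -> None:
        for handler in self.subscribers:
            handler(record)


audit_log: List[Record] = []                 # consolidated log for auditing
errors_by_server: Counter = Counter()        # server health: 5xx responses per server
pages_by_client: DefaultDict[str, List[str]] = defaultdict(list)  # per-user state


def track_health(record: Record) -> None:
    server, _, status, _ = record
    if status >= 500:
        errors_by_server[server] += 1


def track_clients(record: Record) -> None:
    _, client, _, path = record
    pages_by_client[client].append(path)


channel = LogChannel()
for consumer in (audit_log.append, track_health, track_clients):
    channel.subscribe(consumer)

# Each web server would publish its own requests as they complete.
for record in [("web1", "10.0.0.5", 200, "/"),
               ("web2", "10.0.0.5", 200, "/forums/12"),
               ("web2", "10.0.0.9", 503, "/forums/12")]:
    channel.publish(record)
```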
Thanks!