Week 4: Building Scalable and Reliable e-commerce Site.
1: Introduction
2: Multi-tier Architectures
3: Application Taxonomy
4: Requirements of Web Applications
5: Techniques for Scaling
6: Caching and Replication
7: Load Balancing
8: Failure Detection
Scalability: SUN Server Farm
Akamai
Idea: distribute content over many servers spaced around the world
SOURCE: AKAMAI
9700 Servers
650 Networks
56 Countries
Akamai
SOURCE: AKAMAI
Distributing servers allows better handling of peak loads
Guarantee uptime all the time and more reliable infrastructure
1. Introduction
What the User Sees
In the Browser, in the Address Box:
Address http://www.amazon.com
Behind The Browser – Summary
1. Request the IP address for the domain name
   Receive IP address 208.216.181.15
2. Send full URL plus extra info to IP address (amazon.com’s WWW server)
3. Server Processing
   Optionally:
   3.1 Server requests to Other Servers
   3.2 Receives info. from Other Servers
4. Returns an HTML Page (plus supporting files)
Advanced Interaction
Complex heterogeneous infrastructures are a reality!
[Figure: an example infrastructure. Components shown: Internet Firewalls, Load Balancers, DNS Server, Web Servers, Data Caches, Application Servers, Data Servers, Business Data, Storage Area Network, Existing Applications and Data, Director and Security Services, and BPs and External Services.]
Dozens of systems and applications
Hundreds of components
Thousands of tuning parameters
One of the Data Centers (500 servers)
[Figure: Canyon Park Data Center, Microsoft.com Network Diagram. Cisco 7000 routers and Catalyst 5000 and 2926 switches (HSRP, ATM and Fast Ethernet links) front groups of IIS web servers for sites such as WWW.MICROSOFT.COM, SEARCH.MICROSOFT.COM, REGISTER.MICROSOFT.COM, SUPPORT.MICROSOFT.COM, MSDN.MICROSOFT.COM, and DOWNLOAD.MICROSOFT.COM, backed by Live, Backup, Consolidator, and Misc. SQL Servers, plus Stagers, Build Servers, FTP, SMTP, PPTP / Terminal Servers, and Monitoring Servers.]
Drawn by: Matt Groshong
Last Updated: April 12, 2000
IP addresses removed by Jim Gray to protect security

Microsoft.com Server Count
FTP                   6
Build Servers        32
IIS                 210
Application           2
Exchange             24
Network/Monitoring   12
SQL                 120
Search                2
NetShow               3
NNTP                 16
SMTP                  6
Stagers              26
Total               459
2. Multi-Tier Architectures
Where it All Takes Place
Client/Server Model
Fundamental to the Internet
  packet switching decouples computers
Functional modules with well-defined interfaces
Client requests service; server provides it
Data exchanged only through real-time messages
  no global variables, no common databases
Server may become a client to a different server
SOURCE: NETWORK COMPUTING
2-tier Vs. n-tier Architecture
[Figure: 2-tier versus N-tier. In the 2-tier case the Client talks directly to the Database. In the N-tier case the Client (Browser) reaches Databases and Data through Tier 2 Logic and Tier 3 Logic.]
Two-Tier Architecture
TIER 1: CLIENT
TIER 2: SERVER
Server performs processing directly
SOURCE: FOURNIER
Two-Tiered Architectures “Gartner Group Configurations”
SOURCE: NETWORK COMPUTING
Why 2-tier?
(Often called “Client-Server”, which is a bad name because it’s too general)
Simple
Better for dynamic queries
Potentially more efficient (probably not in reality)
Perhaps more processing off-loaded to client (for better or worse)
Global data modeling is not practical
Examples of Two-,Three-,and Four-Tiered Infrastructures
Three-Tier Architecture
TIER 1: CLIENT
TIER 2: SERVER
TIER 3: BACKEND
Application server offloads processing to tier 3
SOURCE: FOURNIER
N-Tier Architecture
SOURCE: FOURNIER
Data Warehousing Architecture
SOURCE: FOURNIER
Why n-tier?
Modularity via objects, not enterprise-wide data model
“Thin” clients since “Fat” clients infeasible
Security
Replication of business logic easier
Flexibility
Performance (Due to flexibility)
Manageability
All data not in one data model
All data not in one database brand
Etc.
Even with n-tier, Databases Crucial
Databases need to have all functions required in 2-tier and more.
  Data model support
  Concurrency Control
  Security
  Integrity
  Performance
  Manageability
  Support for heterogeneity
Databases in a Heterogeneous World
There needs to be semantic consistency while using multiple databases
  Atomicity
  Consistency
  Isolation
  Durability
  Transactions will be covered later
It is desirable that there be interoperability of applications with multiple databases
  Same API to access multiple databases
  And, ability to access multiple databases
  Hence, motivation for JDBC and ODBC, which can be considered as middleware
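The "same API to access multiple databases" idea can be illustrated outside Java as well. The following is a minimal sketch in Python, whose DB-API 2.0 convention plays a role comparable to JDBC and ODBC. It assumes two in-memory SQLite databases stand in for two different database products; the `orders` table and the helper names `count_rows` and `make_db` are hypothetical.

```python
import sqlite3

def count_rows(conn, table):
    """Count rows using only the generic DB-API calls.

    cursor(), execute() and fetchone() are provided by any DB-API 2.0
    driver, so this function does not care which database is behind conn.
    """
    cur = conn.cursor()
    # The table name is trusted here because it is a fixed, hypothetical example.
    cur.execute(f"SELECT COUNT(*) FROM {table}")
    return cur.fetchone()[0]

def make_db(items):
    """Create a stand-in database holding a small orders table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, item TEXT)")
    conn.executemany("INSERT INTO orders (item) VALUES (?)", [(i,) for i in items])
    conn.commit()
    return conn

db_a = make_db(["book", "cd"])   # pretend this is database product A
db_b = make_db(["dvd"])          # pretend this is database product B

# The application code is identical for both databases.
total = sum(count_rows(db, "orders") for db in (db_a, db_b))
print(total)
```

Swapping either connection for one from a different DB-API driver would leave `count_rows` unchanged, which is the interoperability the slide asks for.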
3. Application Taxonomy
Characterizing Web Applications
Applications
Applications typically made up of many interactions with a client
How the application must be built depends on the type of interactions that comprise it
  This seems trivial, but it is where all architecture starts
All interactions are to varying degrees Asynchronous or Synchronous
Influencing all interactions are requirements for concurrency, throughput, latency, ...
Interactions are sometimes called “transactions,” though no specific semantic properties are applied to the word transaction when used in this way.
Workload Characteristics
Application Functionality
  Types of Interaction - Inquiry (Static and Dynamic) vs. Transactions
  Volume of Transactions
  Volume of User-Specific Responses (Personalization)
  Amount of Cross-Session Info
  Transaction Complexity
  Data Volatility
  Integration with legacy systems
Usage Patterns
  Number of Unique Items
  Number of Page Views
  Volume of Dynamic Searches
  Transaction Volume Swings
Infrastructure Constraints
  % Secure Pages (privacy)
  Security: Authentication, Integrity, Non-repudiation, Regulations
Types of Web Applications
Publish and Subscribe
  Web Portals such as yahoo.com, excite.com
  Media Sites such as www.nfc.co.il, zdnet.com
  Events such as www.usopen.org, www.wimbeldon.org
Shopping
  Exact Inventory Sites - Victoriassecret.com, Abercrombie.com
  Inexact Inventory Sites - buy.com, dvdexpress.com
Customer Self Service
  Home banking - bankone.com, wingspanbank.com
  Travel Sites - Travelocity
  Insurance - amica.com
Trading
  Online Brokerages - schwab.com, fidelity.com, etrade.com
  Auction Sites - ebay.com, priceline.com
  Games – Interactive group game servers
Workload Characteristics of Web Applications
[Table: System Workload Characteristics, each rated Low, Medium, or High for the four application types: Publish & Subscribe, Shopping, Customer Self Svc., Trading]
Rows:
  Transaction Volumes
  Dynamic Content
  Dynamic Searches
  User Specific Responses (Personalization)
  Cross-session Information
  Legacy Integration
  Data Volatility
  Transaction Volume Swings
  Number of content Publishers/Sources
  Number of Unique Items per page
  Page Content Volatility
  Number of Page Views
  Security, Authentication etc.
  Percentage of Secure Pages
  Transaction Complexity
Application Taxonomy: Read Transactions
Read-only transactions
  Highly static: X-Ray, Corporate Information, Entertainment Video, 1990 Census
  Nearly static: Train Schedule, Catalog without quantities
  Dynamic: Weather Forecast, Catalog with quantities
  Dynamic with high consistency requirements: Account balance, Catalog with quantities
  Dynamic data with high consistency and rapid update: rock concert sales with assigned seating
Application Taxonomy: Update Transactions
Update w/ modest integrity: Amazon book comment
Update w/ high integrity: Billing record
Update w/ asynchronous processing: Stock Trade
Update w/ loosely coupled processing: Buying a physical product over the net, or ordering/provisioning a new ISDN line
Issues
The type of application along the read-only and update dimensions greatly impacts
  How applications are architected
  What system support is needed
For each of the previous examples, it is worth considering the implications
4. Requirements of Web Applications
Requirements - Summary
Availability
Scalability
Security
Performance
Integrity
Manageability
Malleability/Longevity
Integration
Cost
Availability
Defined as measurement of perceived uptime by a user
There are 86,400 seconds in a day (~100,000) and 31,536,000 seconds in a year (~30 million)
99% uptime represents 1% downtime, which is
  864 seconds/day or 14.4 minutes/day
  315,360 seconds/year or 5,256 minutes/year or 88 hours/year

Percentage Uptime      Downtime
99.99%                 53 minutes/year (or 0.14 minutes/day)
99.999%                5 minutes/year
99.9999%               30 seconds/year
99.99999% (7 nines)    3 seconds/year
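The downtime figures above follow directly from the number of seconds in a day and in a year. A minimal sketch of the arithmetic, assuming a 365-day year as the slide does; the function name `downtime_seconds` is hypothetical.

```python
SECONDS_PER_DAY = 24 * 60 * 60              # 86,400
SECONDS_PER_YEAR = 365 * SECONDS_PER_DAY    # 31,536,000

def downtime_seconds(uptime_percent, period_seconds):
    """Seconds of downtime allowed in a period at a given uptime percentage."""
    return (100.0 - uptime_percent) / 100.0 * period_seconds

for uptime in (99.0, 99.99, 99.999, 99.9999, 99.99999):
    per_year = downtime_seconds(uptime, SECONDS_PER_YEAR)
    per_day = downtime_seconds(uptime, SECONDS_PER_DAY)
    print(f"{uptime}% uptime: {per_year / 60:.1f} min/year, {per_day:.2f} s/day")
```

Each additional nine divides the allowed downtime by ten, which is why the table drops from minutes per year to seconds per year.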
Availability - Discussion
What do you see on the web?
Why?
What will be required in the future?
In the News
Source: Gartner Group
Downtime Costs (per Hour)
Brokerage operations          $6,450,000
Credit card authorization     $2,600,000
Ebay (1 outage 22 hours)      $225,000
Amazon.com                    $180,000
Package shipping services     $150,000
Home shopping channel         $113,000
Catalog sales center          $90,000
Airline reservation center    $89,000
Cellular service activation   $41,000
On-line network fees          $25,000
ATM service fees              $14,000
Sources: InternetWeek 4/3/2000 + Fibre Channel: A Comprehensive Introduction, R. Kembel 2000, p.8. “...based on a survey done by Contingency Planning Research.”
September 11, 2001
Only 15% of the companies in the World Trade Center had a working business continuity plan
One law firm did not have a backup outside of the building – it went out of business
One of the trading firms was able to successfully, immediately transition over to a backup site across the river with absolutely no interruption to their customers
An investment bank had only a tape backup. It took them four days to recover
Scalability
The capability of a system to adapt readily to a greater or lesser intensity of use, volume, or demand while still meeting its business objectives (acceptable levels of performance, availability, manageability etc.)
[Figure: Resource Utilization (y-axis) versus Load (x-axis), with four curves]
Ideal - Gracefully degrade as load increases. Seldom happens
Bad situation - Think it's OK until load increases. Poor design
Utilization increases faster than the load - Typical
Utilization increases linearly with load - Good Situation
Security
Privacy
Authentication
Authorization
Audit
Non-repudiation
Performance
Top-level metrics
  Latency: How long does it take to get a response to a request from the system?
  Throughput: How many transactions can be completed in a unit of time (Capacity)?
Subsidiary metrics
  CPU
  Network Bandwidth
  I/O of various types
  ...
Integrity
Data correctness
Data permanence
Disaster recovery
Data currency
Manageability
Consider number of elements in a web application
Consistency
Security
Modifications
Performance
Configuration
Training level required of operators
Malleability/Longevity
Continuous availability (despite update and failure)
Time period of use of program
Integration
Note: millions of person-years are spent every year on applications
This represents a total multi-trillion dollar investment
Hence, integration is a necessity
Integration approaches
  Application to application
  Data sharing by multiple applications
  Process (Complex application integration)
For some applications, integration cost is 7x cost of system, yet this is less than recreating existing applications or losing benefits of integrated systems
Cost
Initial implementation
Modification
Installation
Management (management is greater than development cost – usually at least double)
Total Cost of Ownership
Backup Restore    30%
Downtime          20%
Purchase          20%
Environmental     14%
Administration    13%
HW management      3%
Administration: all people time
Backup Restore: devices, media, and people time
Environmental: floor space, power, air conditioning
Cause of System Crashes
Cause                                      1985    1993    2001 (est.)
Hardware failure                            20%     10%      5%
Operating System failure                    50%     18%      5%
System management: actions + N/problem      15%     53%     69%
Other: app, power, network failure          15%     18%     21%
Current State of the ART
Failures due to people up, hard to measure
  VAX crashes ‘85, ‘93 [Murp95]; extrap. to ‘01
  HW/OS 70% in ‘85 to 28% in ‘93. In ‘01, 10%?
  How get administrator to admit mistake? (Heisenberg?)
(based on the lecture “Recovery Oriented Computing” by Dave Patterson, Berkeley)
5. Techniques for Scaling
Techniques for achieving the requirements
Motivation
Defined: Data is stored without overlap across multiple sites and each site processes its data the same way
This is the architecture of the web (Order of magnitude circa 10^12 hits/day)
Back of the envelope thought exercise:
  Assume a server can handle average number of hits ranging from 10^1/sec. – 10^4/sec
  Then, there must be 10^3 – 10^6 web sites to meet load…
Examples (data partitioning – segmented workload):
  1999 data on one site, 1998 on another…
  a’s on one site, b’s on another…
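The back-of-the-envelope estimate above can be checked with a few lines of arithmetic. A minimal sketch, assuming the slide's figures of 10^12 hits/day and 10^1 to 10^4 hits/sec per server; the function name `sites_needed` is hypothetical.

```python
HITS_PER_DAY = 10**12        # order-of-magnitude web load from the slide
SECONDS_PER_DAY = 86_400     # roughly 10**5

def sites_needed(hits_per_second_per_site):
    """Number of sites needed if each sustains the given hit rate all day."""
    hits_per_site_per_day = hits_per_second_per_site * SECONDS_PER_DAY
    return HITS_PER_DAY / hits_per_site_per_day

fewest = sites_needed(10**4)   # every site handles 10^4 hits/sec
most = sites_needed(10**1)     # every site handles only 10^1 hits/sec
print(f"between about {fewest:,.0f} and {most:,.0f} sites")
```

Both ends land in the 10^3 and 10^6 orders of magnitude quoted on the slide.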
Some typical Web site loads over a 24-hour period
Example Response Time Budget
Client Request: 5%
Request Network Latency: 5%
Server Time: 55%
Response Network Latency: 20%
Client Response Processing: 15%
How Latency Varies Based on Workload Pattern and Tier
Achieving the Requirements
Faster Machines (Vertical Growth)
Replicated Machines (Horizontal Growth)
Specialized Machines
Segmented Workloads
Request Batching
User Data Aggregation
Connection Management and Caching
It is important to note that a detailed understanding of the application is key to the successful implementation
Faster Machines - Vertical Growth
Scalability can be achieved through the use of faster machines. This technique can include:
  moving to hardware that is bigger than the current environment. For example: moving a web server from a PC-based server running NT to a UNIX-based server
  using machines with more CPUs to leverage the operating system's multitasking and multiprocessing capabilities
  using machines that leverage other computing paradigms such as parallel computing
  using better software that is optimized for the CPU
  using faster hardware components such as memory, cache, disk and I/O devices etc.
Replicated Machines - Clusters
Adding more machines of the same type and load balancing requests across these machines. In order to implement this technique we have to implement additional components in the architecture such as:
  Dispatcher node that can monitor and load balance processing requests across the replicated machines
  A synchronization node that synchronizes the content and data across the machines
  A mechanism for managing sessions across replicated machines
Specialized Machines
Individual components of the architecture can be scaled by using specialized machines that perform a certain function much faster. This technique is typically used in architectures to facilitate:
  Intelligent routing of traffic and data across replicated machines
  Dynamic caching, used extensively by event sites and other media sites to speed up access to frequently accessed content
  Security and encryption, used by high volume sites to speed up the SSL encryption and decryption
Segmented Workload
This is a technique that is typically used in conjunction with replicated machines. It involves the partitioning of the workload of an application to achieve optimum performance. There are several ways of implementing this technique:
  URL references, which is the simplest form of segmenting the workload: analyze the URL and direct the requests to appropriate servers
  Functional Partitioning, which looks at the application and builds the partitioning of the workload in through custom programming
  Data Partitioning, placing segments of the data in different machines
[Figure: workload segmented into Function 1, Function 2, and Function 3]
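Two of the segmentation approaches above, URL references and data partitioning, can be sketched in a few lines. This is only an illustration under assumptions: the URL prefixes, pool names, and letter ranges are hypothetical, and a real dispatcher would sit in front of actual servers.

```python
def route_by_url(path, routes, default):
    """URL-based segmentation: pick the pool with the longest matching prefix."""
    best = None
    for prefix in routes:
        if path.startswith(prefix) and (best is None or len(prefix) > len(best)):
            best = prefix
    return routes[best] if best is not None else default

def partition_for_key(key, partitions):
    """Data partitioning: each server owns a non-overlapping range of first letters."""
    first = key[0].lower()
    for (low, high), server in partitions.items():
        if low <= first <= high:
            return server
    raise KeyError(f"no partition owns {key!r}")

# Hypothetical configuration for illustration only.
ROUTES = {"/search": "search-pool", "/images": "static-pool", "/cart": "cart-pool"}
PARTITIONS = {("a", "m"): "db-1", ("n", "z"): "db-2"}

print(route_by_url("/search?q=books", ROUTES, default="web-pool"))
print(partition_for_key("Smith", PARTITIONS))
```

Because the partitions do not overlap, every key has exactly one home, which is what distinguishes partitioning from replication.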
Request Batching
Multi-tier communication places a large computational load on both the client-tier (requester) and the server-tier. It also introduces considerable latency. Furthermore, the overhead costs of virtually all cross-tier requests are equal; therefore it is much better to make fewer, but larger requests.
The goal of this technique is to reduce the number of requests that are sent between requesters and responders (such as between tiers or processes) by allowing the requester to define new requests that combine multiple requests.
[Figure: a Client sending separate requests to several Servers, versus a Client sending one combined Command]
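The effect of batching can be shown by counting round trips. A minimal sketch, assuming an in-process stand-in for the remote tier; the class `CountingServer` and its `get` and `get_many` methods are hypothetical names, not part of any real library.

```python
class CountingServer:
    """Stand-in for a remote tier that counts how many requests it receives."""

    def __init__(self, data):
        self.data = data
        self.round_trips = 0

    def get(self, key):
        # One cross-tier request per key.
        self.round_trips += 1
        return self.data.get(key)

    def get_many(self, keys):
        # One combined request that carries all keys at once.
        self.round_trips += 1
        return {key: self.data.get(key) for key in keys}

server = CountingServer({"price": 10, "stock": 3, "title": "Book"})
keys = ["price", "stock", "title"]

unbatched = {key: server.get(key) for key in keys}
trips_unbatched = server.round_trips

server.round_trips = 0
batched = server.get_many(keys)
trips_batched = server.round_trips

print(trips_unbatched, trips_batched)
```

Both calls return the same data, but the batched form pays the fixed per-request overhead once instead of once per key.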
User Data Aggregation
This technique aggregates the most commonly accessed data from multiple backend systems to speed up the overall performance of the architecture. This technique is typically implemented using:
  Custom Programming
  Intelligent Middleware and
  Data replication
[Figure: Client and Server diagrams]
Connection Management
This technique aims to achieve scalability by reducing the most expensive operations within an application's workflows. This includes connections to legacy systems, databases and other servers
[Figure: an Incoming Request reaches the WEB Application Server, where a Servlet/App obtains a Connection from the Pool through the Connection Manager and uses it to reach the Resource. Arrows labeled 1-7 and A-B correspond to the steps below.]
1. WAS passes a user request to a Servlet/App
2. The Servlet requests a connection from the Mgr.
3. The Mgr gets a connection from the pool and gives the Servlet/App a connection.
4. The Servlet uses the connection to the resource
5. The resource returns data back
6. The Servlet returns the connection to the Manager and the connection is returned to the pool
7. The Servlet/App sends the response back
If a connection is not available:
A. The CM requests a new connection
B. Adds the connection to the pool
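The connection manager steps above can be sketched as a small pool. This is an illustration under assumptions, not the slide author's implementation: the class and method names are hypothetical, a plain `object()` stands in for a real connection, and a newly created connection joins the pool when it is first released rather than at creation time.

```python
import queue

class ConnectionManager:
    """Hands out pooled connections and creates new ones only when needed."""

    def __init__(self, factory, max_size):
        self._factory = factory          # callable that opens a new connection
        self._max_size = max_size
        self._pool = queue.Queue()       # idle connections
        self.created = 0                 # not protected against races in this sketch

    def acquire(self):
        try:
            return self._pool.get_nowait()        # step 3: reuse an idle connection
        except queue.Empty:
            if self.created >= self._max_size:
                raise RuntimeError("connection pool exhausted")
            self.created += 1                      # step A: request a new connection
            return self._factory()

    def release(self, conn):
        self._pool.put(conn)                       # steps 6 and B: back into the pool

manager = ConnectionManager(factory=lambda: object(), max_size=2)
first = manager.acquire()     # pool is empty, so a connection is created
manager.release(first)
second = manager.acquire()    # the same connection is reused, nothing new is opened
print(second is first, manager.created)
```

The expensive operation, opening the connection, happens once; later requests only pay for a queue lookup.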
Caching
Defined: Storage of and reference to data in a location that can be accessed faster and/or with higher aggregate bandwidth
Done at every level of a system
  Processor/memory
  Computer/disk
  Browser
  Web
Simplest when only one, infrequent writer of the data
Issues:
  Write through caches
  Cache invalidation
Caching (continued)
More complex when multiple writers and/or higher frequency updates
There is the distributed cache consistency problem
This happens in:
  Computer architecture
  Multi-computer architectures
  Distributed systems of all types, including the web
Examples:
  Browser cache
  DNS
  Mirror sites
  Etc.
Techniques Applied to Web Tiers
Dimensions of the Scaling Techniques
Scaling Technique | Increase Power | Improve Efficiency | Shift / Reduce Load
Faster Machine X
Replicate Machines X
Specialized Machines X X
Segmented Workload X X
Request Batching X
User Data Aggregation X
Connection Management X
Caching X X
6. Caching and Replication
The Technology Behind the Techniques
Cache Consistency Techniques
Fuzzy
  Use time-out and hope for the best
  Setting time-out is very tricky and error-prone
Consistent caching
  Use distributed cache consistency algorithms
  There are trade-offs between availability and consistency
  Algorithms are very tricky but can be gotten right
  Typical approach is the concept of token management
The concept of token management...
  Read token
  Write token
  Usually more tokens required to make things really work
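The "fuzzy" approach, a time-out with no consistency protocol, can be sketched as follows. This is a minimal illustration under assumptions: the class name `TimeoutCache`, the 30-second time-out, and the injectable clock (used so the example runs without waiting) are all hypothetical choices.

```python
import time

class TimeoutCache:
    """Serves cached values until they expire, then refetches from the source."""

    def __init__(self, fetch, ttl_seconds, clock=time.monotonic):
        self._fetch = fetch            # callable that reads the real data
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}             # key -> (value, expiry time)

    def get(self, key):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now < entry[1]:
            return entry[0]            # may be stale: "hope for the best"
        value = self._fetch(key)
        self._entries[key] = (value, now + self._ttl)
        return value

# A fake clock makes the staleness window visible without sleeping.
now = [0.0]
source = {"score": "10-8"}
cache = TimeoutCache(source.__getitem__, ttl_seconds=30, clock=lambda: now[0])

first = cache.get("score")     # fetched from the source
source["score"] = "12-8"       # the real data changes
now[0] = 10.0
stale = cache.get("score")     # still the old value, inside the time-out
now[0] = 31.0
fresh = cache.get("score")     # time-out passed, so the new value is fetched
print(first, stale, fresh)
```

The stale read in the middle is the cost of this approach, and choosing the time-out is the trade-off the slide calls tricky.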
Replication
Definition: Explicit creation, maintenance, and access of multiple copies of some resource
  Processors
  Bandwidth
  Data
  Etc.
Why replicate?
  Throughput
  Bandwidth
  Availability
  Integrity
Replication vs. Data Partitioning
Replication
  Same or overlapping data stored at multiple locations
Partitioning
  Data non-overlapping
  Typically, only one “home” for any data element
Replication vs. Caching
Difference between caching and replication
  Caching: there is a fundamental difference between a cached copy and the real “backing” data. Loss of the cache is not a failure except from the perspective of performance
  Replication: all replicas are of the same type, albeit not necessarily identical. Loss of a replica is a failure and could result in higher likelihood of lost data
Semantics of Replication
Consistency/fuzzy replication
  Same issue as in caching as above
What does consistency mean?
  Ticket Sales (OK to not show all the seats)
  Latest Score in basketball game (Can lag by up to n seconds)
  Weather forecast (Variable lag, depending on severity of change)
  Prices for certain goods (Perhaps they need to be exact, as differentials would cause customer dissatisfaction)
Replication Algorithms Abound
Unanimous Update
  Always update all copies
  Read from any copy
  Excellent read throughput
  Excellent read availability
  Very poor write throughput
  Very poor write availability
Unanimous Read
  Always read all copies
  Update any copy
  Excellent write throughput and availability
  Very poor read throughput and availability
Additional Replication Algorithms
Primary Copy
  Must update primary copy
  Primary copy ensures all other copies get updated
  Read from any copy
  Excellent read throughput and availability
  Poor write availability
  Significant complexity in ensuring primary copy updates all other replicas
Voting
  Assume n copies
  Read from any r
  Write to n-r+1
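The voting rule works because a read set of r copies and a write set of n-r+1 copies together exceed n, so they always share at least one copy. A minimal in-memory sketch, assuming version numbers to identify the newest copy, a single writer at a time, and no failures; the class name `VotingStore` is hypothetical.

```python
import random

class VotingStore:
    """n copies; reads consult any r copies, writes update any n - r + 1 copies."""

    def __init__(self, n, r):
        if not 1 <= r <= n:
            raise ValueError("need 1 <= r <= n")
        self.n, self.r = n, r
        self.w = n - r + 1                    # write quorum size from the slide
        self.replicas = [(0, None)] * n       # each copy holds (version, value)

    def _latest(self, rng):
        # Any r copies must include one from the most recent write quorum.
        chosen = rng.sample(range(self.n), self.r)
        return max((self.replicas[i] for i in chosen), key=lambda copy: copy[0])

    def read(self, rng=random):
        return self._latest(rng)[1]

    def write(self, value, rng=random):
        version = self._latest(rng)[0] + 1    # newer than anything a reader can see
        for i in rng.sample(range(self.n), self.w):
            self.replicas[i] = (version, value)

store = VotingStore(n=5, r=2)
store.write("price=10")
latest = store.read()
print(latest, store.w)
```

With n = 5 and r = 2, writes touch 4 copies, so reads are cheap and writes are expensive; choosing a larger r shifts the cost the other way.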
Replication Conclusions
All algorithms quite difficult to implement
But, replication has compelling benefits
  Best long term approach for high data availability
    Software update or data reorganization
    Disaster recovery
  Obvious performance benefits as well, at least for data which is either read or written infrequently. (Often, one of these is true.)
Systems support for replication required if implementation is to be feasible
  Systems Support – Atomic Transactions in particular
7. Load Balancing
Load Balancing
Definition: Load Balancing refers to a technique that uses a load balancing algorithm (LBA) to choose a replica
Definition: An LBA is an algorithm (typically distributed) that permits a client to select a replica that meets performance & availability goals
Participants in the algorithm include clients and commonly replicas and other intermediaries
May want priority for certain requests
Load Balancing In Use - Examples
Direct a data read or write to:
  An unloaded replica
  A nearby replica
  A replica that will not charge much for its service
  …
Direct a processing request to:
  A replica that will complete the request with minimum latency
  A node that has been used for similar processing, so its cache is primed
  …
Many Approaches to Load Balancing
Maintain a replicated directory service
  Client can consult an instance of it to gain an address of a replica
  Approaches
    Directory can return set of replicas and client can use algorithm to determine proper replica
    Or, Directory service can apply algorithm and return proper replica
Can use a replicated intermediary that is a forwarding service
Algorithms for Directing Load
Randomization
Round-robin
Dynamic: Based on recent replica performance
Locality-based (recent usage)
Content-based
Geography or Topology-based
Negotiation-based (Request for Proposal -- direction to lowest bidder)
Randomization
Simple
Excellent if
  Locality effects are not important
  Reasonable distribution of requests
    Timing
    Duration
  No need for priority-based execution
  Willingness to accept stochastically good performance
Round-Robin
What is meant by Round-Robin
  Intra-client round robin?
  Inter-client round robin?
Simple
Excellent if
  Locality effects are unimportant (or non-existent)
  Requests have similar duration
Add’l Topics for Randomization & RR
Algorithms should take into account:
  Differential capacity of replicas
  Differential capacity of networks
  Ownership of resources
  Security issues
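Randomization, round-robin, and one way of accounting for differential replica capacity can each be sketched in a line or two. This is an illustration only: the replica names and capacity weights are hypothetical, and weighting random choice by capacity is one possible answer to the point above, not a method prescribed by the slides.

```python
import itertools
import random

def random_choice(replicas, rng=random):
    """Randomization: every replica is equally likely for every request."""
    return rng.choice(replicas)

def round_robin(replicas):
    """Round-robin: an endless iterator that cycles through the replicas in order."""
    return itertools.cycle(replicas)

def weighted_choice(capacities, rng=random):
    """Randomization adjusted for capacity: bigger replicas are chosen more often."""
    names = list(capacities)
    weights = [capacities[name] for name in names]
    return rng.choices(names, weights=weights, k=1)[0]

replicas = ["A", "B", "C"]
rr = round_robin(replicas)
rr_order = [next(rr) for _ in range(5)]
print(rr_order)
print(random_choice(replicas))
print(weighted_choice({"big": 3, "small": 1}))
```

Round-robin here is the intra-client kind: each client holding its own iterator cycles independently of the others.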
Dynamic Load Balancing
Can track in one or more places:
  Actual performance by replica
  Metrics of replica loading
  Results of probes
That information can be used to determine best replica
Complex
Advantages
  Can provide excellent results in situations that randomized or round-robin load-balancing does not
  Can be customized to provide priority, etc.
A Strawman LBA
Assumptions below…
Clients 1..n, DataGatherer, & Replicas A & B
DataGatherer
  Probes replicas every 60 seconds (Time = 0, 60, …)
  Chooses least loaded replica & reports it for 60 secs
Clients issue
  Requests to replicas based upon consulting DataGatherer
  Service time for requests is ~10 secs w/ low variance
What’s the Result?
A meta-stable system: all load oscillates between Replica A and Replica B
Problem: reported load not tracking actual load
Solutions
  More frequent probes: probes should happen more frequently than 1/average(service time)
  LBA should be less definitive in nature; e.g., somewhat stochastic
In any case, designing good load balancing algorithms is hard without knowing lots of information about the load
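The oscillation can be reproduced with a small discrete-time simulation of the strawman. This sketch makes assumptions beyond the slides: one request arrives per second, every request takes exactly 10 seconds, load is measured as requests in flight, and ties go to Replica A. The function name `simulate` is hypothetical.

```python
def simulate(probe_interval, duration=600, service_time=10):
    """Return the replicas reported at each probe and the peak load on one replica."""
    in_flight = {"A": [], "B": []}      # completion times of unfinished requests
    reported = "A"
    choices, peak = [], 0
    for t in range(duration):
        # Retire requests that have completed by time t.
        for name in in_flight:
            in_flight[name] = [done for done in in_flight[name] if done > t]
        # The DataGatherer probes and reports the least loaded replica.
        if t % probe_interval == 0:
            reported = min(in_flight, key=lambda name: len(in_flight[name]))
            choices.append(reported)
        # Every client sends its request to whichever replica was last reported.
        in_flight[reported].append(t + service_time)
        peak = max(peak, len(in_flight[reported]))
    return choices, peak

slow_choices, slow_peak = simulate(probe_interval=60)
fast_choices, fast_peak = simulate(probe_interval=1)
print(slow_choices, slow_peak, fast_peak)
```

With 60-second probes the reported replica alternates A, B, A, B and one replica carries the entire load at a time; probing more often than the service time spreads the same load across both.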
Locality-based
Premise is that a replica that has serviced a certain type of request recently should do so again
Why?
  Efficiency due to already available resources
    E.g., open files or databases
  Efficiency due to security
    E.g., secure communication sessions
Complexity: how to combine with other techniques, as Locality may not be enough
Content-based
As in data partitioning, assume certain types of data can best be handled by certain sites
  Site A stores “aa…az” in random access memory
  Site B does the same for “ba..bz”
  Therefore, “a” requests should generally go to Site A.
This is actually an approach for achieving locality
Geography or Topology-based
Based on co-location of client and replica
May be an indicator of: higher bandwidth; shorter latency; increased reliability; better security
Domain names are now registered with geographical coordinates
Negotiation-based
Virtual capitalism in action: issue RFP; evaluate the responses; ship work as appropriate
Cost of load-balancing overhead must be less than benefit
This approach can get very interesting quickly: contractual commitments and compensation if unmet; a way to do Pareto optimal scheduling
Useful to implement for real load balancing in business-to-business e-commerce
Role of Caching
Cache results of LBA for performance and availability
The usual problem of cache correctness: how long until cache refresh?
Time-outs too short -> the load balancing algorithm is consulted too often and carries too much load
Time-outs too long -> data is insufficiently fresh
What happens when the cache sends you to a failed site?
If the cached data is faulty, go back and refetch
This leads to the definition of a Hint: a cached entry which is right with high probability, but can be and always is checked for validity prior to use
The issue of time-out appears again
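The hint idea can be sketched as a cache whose entries are validated on every use and refetched when they turn out to be wrong. The class name and the two callables (`lookup`, `is_alive`) are hypothetical stand-ins for the real load balancing lookup and a liveness check.

```python
class HintCache:
    """Cache LBA results as hints: always checked for validity before use."""

    def __init__(self, lookup, is_alive):
        self.lookup = lookup        # authoritative (slow) LBA lookup
        self.is_alive = is_alive    # cheap validity check for a site
        self.hints = {}

    def resolve(self, service):
        site = self.hints.get(service)
        # A missing or invalid hint forces a refetch from the LBA.
        if site is None or not self.is_alive(site):
            site = self.lookup(service)
            self.hints[service] = site
        return site
```

In this sketch the cost of a stale hint is one extra validity check plus one lookup, not a request sent to a failed site.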
Example: Load Balancing to HTTP Server
User specifies http://www.xxx.com
Request should actually be handled by one of many HTTP servers to provide higher throughput
One approach: can do request re-direction (a type of forwarder)
See HTTP protocol definition as in assigned reading
The forwarder is a potential bottleneck
Approach 1 – Round Robin DNS
DNS entries allow 32 server addresses per record.
DNS (name) servers will cycle through the entries, therefore providing round-robin load balancing
Advantages: cheap; easy
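The cycling behavior can be sketched as a name server rotating its address list after every answer. This models the idea only and is not how any particular DNS server is implemented. The addresses come from the 192.0.2.0/24 documentation range and are placeholders.

```python
from collections import deque

class RoundRobinName:
    """Rotate the address list for one name after each query (sketch)."""

    def __init__(self, addresses):
        self.addresses = deque(addresses)

    def answer(self):
        # Clients typically use the first address in the answer.
        result = list(self.addresses)
        self.addresses.rotate(-1)   # move the first address to the end
        return result

www = RoundRobinName(["192.0.2.1", "192.0.2.2", "192.0.2.3"])
first_choices = [www.answer()[0] for _ in range(4)]
```

The rotation never checks whether a server is up, so a failed server keeps receiving its share of clients.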
Round Robin DNS - Problems
Addresses of unavailable servers will remain until an administrator removes the entries
It takes hours or days for the DNS database to replicate
So, the system hands out addresses of down servers for a long time
Addresses of recently added servers take a while to become visible
All servers are treated equally
New servers will likely be faster than the old ones and could handle more load
Some servers may handle multiple workloads and should get fewer requests
Cisco Local and Distributed Director
See: http://www.cisco.com/warp/public/cc/pd/cxsr/400/tech/scale_wp.htm
Session redirection accomplished by rewriting IP header using a mapping table
Intelligent load balancing to servers within a cluster
Takes into account status of servers
Uses only a single DNS entry for entire server complex: simplifies administration; hot standby feasible
Fancier load balancing of this type
Routes requests based on topological distance
Routing decisions can be based on hop counts, network usage, & round-trip latency.
IBM Secureway Network Dispatcher
http://www-4.ibm.com/software/network/dispatcher/about/features/keyfeatures.html
Network dispatcher
Doesn't modify packets (vs. LocalDirector which does)
Only inspects inbound requests (LocalDirector looks at both)
So, responses go back directly to the requester (greater efficiency)
Background processes check servers to ensure that they are up
"advisors" support HTTP, SSL, FTP, NNTP, POP3, SMTP, Telnet
This way requests don't go to down servers.
Balances load across servers of different sizes: servers send CPU, Disk, I/O metrics to dispatcher
Supports hot standby for high availability of dispatcher
Uses a "sticky" port option to route client requests to the same server to ensure state is preserved across requests: recall locality topic
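The sticky idea can be illustrated with an affinity table that remembers which server each client was first sent to. This is a generic sketch and not Network Dispatcher's actual mechanism. The class name, the round-robin choice for new clients, and the missing expiry of entries are all assumptions.

```python
class StickyTable:
    """Send each client to the server it was first assigned to (sketch)."""

    def __init__(self, servers):
        self.servers = servers
        self.assigned = {}      # client -> server affinity
        self.next_index = 0

    def route(self, client):
        if client not in self.assigned:
            # New clients are spread round-robin (assumption).
            server = self.servers[self.next_index % len(self.servers)]
            self.assigned[client] = server
            self.next_index += 1
        return self.assigned[client]

table = StickyTable(["s1", "s2"])
routes = [table.route(c) for c in ["c1", "c2", "c1", "c3", "c2"]]
```

A production table would also expire idle entries and reassign clients whose server has failed.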
8. Failure Detection
Failure Detection
Explicit – clear indication that failure has occurred
Timely
Semantics clean, … as far as they go
Voting
Implicit – timeout
Requester does not receive response after waiting a while
Unclean: does not necessarily mean remote system failed
Timeout often used in very many places/levels: communication; naming, …; and, ultimately, end-to-end
Some have argued that only end-to-end timeouts are valuable, but this is incorrect
Timeout In More Depth
Problems with timeouts: semantics; specification of timeout length
Particularly difficult when requests take variable amounts of time
And the requester cannot dynamically set the time-out interval
Long intervals lead to poor customer satisfaction – imagine an ATM that made you wait 10 minutes before failing and giving you your card back
Therefore, timeouts are used at multiple system levels
Lower levels have more predictable performance so can trigger timely failures better
Higher levels are required for ultimate correctness
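Implicit failure detection by timeout can be sketched with the standard `concurrent.futures` module. The helper name and its return convention are assumptions for the example. As noted above, a timeout only tells the requester that no answer arrived in time, not that the remote side failed.

```python
import concurrent.futures
import time

def call_with_timeout(func, timeout_secs):
    """Return ("ok", result) or ("timeout", None) for a single call."""
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = pool.submit(func)
    try:
        return ("ok", future.result(timeout=timeout_secs))
    except concurrent.futures.TimeoutError:
        # The call may still be running; we only know it was too slow.
        return ("timeout", None)
    finally:
        pool.shutdown(wait=False)   # do not block on the slow call

fast = call_with_timeout(lambda: 42, timeout_secs=1.0)
slow = call_with_timeout(lambda: time.sleep(0.5), timeout_secs=0.05)
```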
The Role of the Sequence Number
Sequence number in communication protocol: failure; duplicate detection; flow control
Sequence number in replication algorithms: as discussed previously
Sequence number in site crash detection
Sites increment a number post failure
Therefore possible to tell if site has crashed
This is important to not miss getting work done on a site
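Crash detection by sequence number can be sketched as follows: a site bumps a counter each time it recovers, and a peer compares the value it sees with the one it saw last. The class and method names are hypothetical, and a real site would keep the counter in stable storage.

```python
class Site:
    """A site that increments a number after each failure (sketch)."""

    def __init__(self):
        self.incarnation = 0    # assumed to survive crashes in practice

    def recover_from_crash(self):
        self.incarnation += 1

class Peer:
    """Remembers the last incarnation number seen for each site."""

    def __init__(self):
        self.last_seen = {}

    def crashed_since_last_contact(self, name, incarnation):
        previous = self.last_seen.get(name)
        self.last_seen[name] = incarnation
        # A larger number than before means the site crashed in between.
        return previous is not None and incarnation > previous

site, peer = Site(), Peer()
first_contact = peer.crashed_since_last_contact("X", site.incarnation)
site.recover_from_crash()
after_crash = peer.crashed_since_last_contact("X", site.incarnation)
```

When `after_crash` is true, the peer knows that work it handed to the site before the crash may have been lost and should be resubmitted.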
Voting
Discussed wrt: Weighted Voting Algorithm
Used to determine most up-to-date copies
What if used to detect incorrect data?
N-way computation
Structure
N-inputs: vote on them and determine most typical input
N-computations on most typical input
Vote on result
N-outputs which go into next stage of computation
Or go to some device which itself votes
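One stage of the N-way structure above might look like the sketch below. It takes "most typical" to mean a strict majority, which is an assumption, and the function names are made up for the example.

```python
from collections import Counter

def vote(values):
    """Return the value held by a strict majority, or raise ValueError."""
    winner, count = Counter(values).most_common(1)[0]
    if count * 2 <= len(values):
        raise ValueError("no majority among the inputs")
    return winner

def n_way_stage(inputs, computations):
    """Vote on the inputs, run every computation, then vote on the results."""
    agreed_input = vote(inputs)
    outputs = [compute(agreed_input) for compute in computations]
    return vote(outputs)

# Three inputs (one corrupted) and three computations (one faulty).
double = lambda x: x * 2
faulty = lambda x: x * 2 + 1
stage_result = n_way_stage([3, 3, 4], [double, double, faulty])
```

Both the corrupted input and the faulty computation are outvoted, so the stage still produces the correct value.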
Yahoo Denial of Service Attack
Mostly unavailable 10:20AM – 12:00PM PST 2/7/00
Reported cause (NYT, 2/8/00)
50 computers “flooded” Yahoo site
1 gigabyte/second or 20 mbytes/computer/second
“Clogging” Yahoo’s site and routers
Difficult to trace due to use of hijacked computers
Solutions: Audit, Filter, Legal System
Typical Yahoo availability: 99.3%, according to Keynote Systems
Corresponds to being down 61 hours/year
And, Yahoo is a good site
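The two figures above can be checked with a few lines of arithmetic. The sketch assumes a 365-day year and 1 gigabyte = 1000 megabytes; the function names are illustrative.

```python
HOURS_PER_YEAR = 365 * 24   # 8760 hours, ignoring leap years

def downtime_hours_per_year(availability_percent):
    """Convert an availability percentage into hours of downtime per year."""
    return (100.0 - availability_percent) / 100.0 * HOURS_PER_YEAR

def per_machine_rate(total_mb_per_sec, machines):
    """Split an aggregate traffic rate evenly across attacking machines."""
    return total_mb_per_sec / machines

yahoo_downtime = downtime_hours_per_year(99.3)   # about 61 hours
attack_rate = per_machine_rate(1000, 50)         # 20 MB/s per computer
```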
Technique (2)
Part 3: Now, one by one:
Stop a CtrReplicaGrp
Start the new version
Do for all CtrReplicaGrps
Now, there is a new function available.
Finally, do Part 1: test what we have so carefully installed, so we haven’t just (methodically) inserted a bug into the entire, supposedly fault-tolerant, system
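The one-by-one procedure can be sketched as a loop that stops and restarts a single replica group at a time, followed by a test of the whole system. The dictionary layout, the `smoke_test` callable, and the returned minimum count of running groups are all assumptions made for illustration.

```python
def rolling_upgrade(groups, new_version, smoke_test):
    """Upgrade replica groups one at a time.

    Returns (fewest groups running at any moment, result of the final test).
    """
    min_running = len(groups)
    for group in groups:
        group["running"] = False                  # stop one replica group
        running_now = sum(g["running"] for g in groups)
        min_running = min(min_running, running_now)
        group["version"] = new_version            # start the new version
        group["running"] = True
    # Finally, test what was installed.
    return min_running, smoke_test(groups)

groups = [{"name": f"grp{i}", "version": 1, "running": True} for i in range(3)]
all_upgraded = lambda gs: all(g["version"] == 2 and g["running"] for g in gs)
upgrade_result = rolling_upgrade(groups, 2, all_upgraded)
```

With three groups, at least two stay in service throughout. That margin disappears if another group fails during the upgrade.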
Issues
Too many steps for a human being to get right
So, need automation via console
May not handle a simultaneous failure during upgrading
So, more replicas may be needed
Cost of availability: the shape of this curve is right, though the calibration is unknown and the curve undoubtedly flattens as experience grows
[Figure: cost-of-availability curve; axis ticks 0 to 70]
Window of Vulnerability
If transactions are used, there is a potential availability problem during the “Window of Vulnerability”
The only solution is that transaction coordinators must be rather reliable and be guaranteed to recover quickly after a crash
Availability
So, considerable thought is required to achieve high availability in malleable systems
Better when not needed
However, when high availability is required, every level of the system needs to be studied and addressed
The Architecture As We’re Studying It
[Figure: layered architecture diagram. Recoverable labels: Client; Servlet/JSP; EJB; DB; MS; Integrated Dev’t Environment; Java Runtime Environment; Security/Directory (X509, LDAP, Kerberos); Linux, NT, AIX, Solaris, Sys/390; Reusable Components; Modeling and Other Softw’ Eng. Tools; Systems Mgmt; Reliable Messaging; Workflow Management]