CS6320 – Systems, Networking and intro to Performance


Transcript of CS6320 – Systems, Networking and intro to Performance

Page 1: CS6320 – Systems, Networking and intro to Performance

CS6320 – Systems, Networking and intro to Performance

L. Grewe

Page 2: CS6320 – Systems, Networking and intro to Performance

Systems and Issues

Common ingredients of the Web (review)
• URL, HTML, and HTTP
• HTTP: the protocol and its stateless property

Web Systems Components (review)
• Clients
• Servers
• DNS (Domain Name System)

Interaction with underlying network protocol: TCP

Scalability and performance enhancement
• Server farms
• Web Proxy
• Content Distribution Network (CDN)

Page 3: CS6320 – Systems, Networking and intro to Performance

Web History

1970s-1980s
• Internet used mainly by researchers and academics
• Log in to remote machines, transfer files, exchange e-mail

Internet growth and commercialization
• 1988: ARPANET gradually replaced by the NSFNET
• Early 1990s: NSFNET begins to allow commercial traffic

Initial proposal for the Web by Berners-Lee in 1989

Enablers for the success of the Web
• 1980s: Home computers with graphical user interfaces
• 1990s: Power of PCs increases, and cost decreases

Page 4: CS6320 – Systems, Networking and intro to Performance

Common ingredients of the Web

URL
• Denotes the globally unique location of the web resource
• Formatted string, e.g., http://www.princeton.edu/index.html
  Protocol for communicating with server (e.g., http)
  Name of the server (e.g., www.princeton.edu)
  Name of the resource (e.g., index.html)

HTML
• Actual content of web resource, represented in ASCII
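The three URL pieces listed above can be pulled apart with Python's standard `urllib.parse` module. This is only a small sketch; the helper name `url_parts` and the dictionary keys are chosen here for illustration and do not come from the slides.

```python
from urllib.parse import urlsplit


def url_parts(url):
    """Split a URL into the three pieces named on the slide."""
    parts = urlsplit(url)
    return {
        "protocol": parts.scheme,               # e.g., http
        "server": parts.hostname,               # e.g., www.princeton.edu
        "resource": parts.path.lstrip("/"),     # e.g., index.html
    }


print(url_parts("http://www.princeton.edu/index.html"))
```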

Page 5: CS6320 – Systems, Networking and intro to Performance

Common ingredients of the Web: HTML

HyperText Markup Language (HTML)
• Format text, reference images, embed hyperlinks
• Representation of hypertext documents in ASCII format
• Interpreted by Web browsers when rendering a page

Web page
• Base HTML file
• Referenced objects (e.g., images); each object has its own URL

Straightforward and easy to learn
• Simplest HTML document is a plain text file
• Automatically generated by authoring programs

Page 6: CS6320 – Systems, Networking and intro to Performance

Main ingredients of the Web: HTTP

Client program
• E.g., Web browser
• Running on end host
• Requests service

Server program
• E.g., Web server
• Provides service

Diagram: the client sends “GET /index.html”; the server replies “Site under construction”.


Page 8: CS6320 – Systems, Networking and intro to Performance

HTTP Example: Request and Response Message

Request
GET /courses/archive/spring06/cos461/ HTTP/1.1
Host: www.cs.princeton.edu
User-Agent: Mozilla/4.03
<CRLF>

Response
HTTP/1.1 200 OK
Date: Mon, 6 Feb 2006 13:09:03 GMT
Server: Netscape-Enterprise/3.5.1
Last-Modified: Mon, 6 Feb 2006 11:12:23 GMT
Content-Length: 23
<CRLF>
Site under construction

Page 9: CS6320 – Systems, Networking and intro to Performance

HTTP Request Message

Request message sent by a client
• Request line: method, resource, and protocol version
• Request headers: provide information or modify the request
• Body: optional data (e.g., to “POST” data to the server)

Example
GET /somedir/page.html HTTP/1.1      (request line: GET, POST, HEAD commands)
Host: www.someschool.edu             (header lines)
User-agent: Mozilla/4.0
Connection: close
Accept-language: fr
(extra carriage return, line feed)

The extra carriage return and line feed mark the end of the header lines, and of the whole message when there is no body.

Page 10: CS6320 – Systems, Networking and intro to Performance

HTTP Response Message

Response message sent by a server
• Status line: protocol version, status code, status phrase
• Response headers: provide information
• Body: optional data

Example
HTTP/1.1 200 OK                      (status line: protocol, status code, status phrase)
Connection: close                    (header lines)
Date: Thu, 06 Aug 1998 12:00:15 GMT
Server: Apache/1.3.0 (Unix)
Last-Modified: Mon, 22 Jun 1998 …
Content-Length: 6821
Content-Type: text/html

data data data data data ...         (data, e.g., requested HTML file)
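The message layout described above (start line, header lines, blank line, optional body) can be taken apart with a few string operations. The sketch below is a simplification: `parse_http_message` is a name invented here, the sample response is made up for the example, and a real parser would work on bytes and honor `Content-Length`.

```python
def parse_http_message(raw):
    """Split an HTTP message into (start line, headers dict, body)."""
    # The first blank line (CRLF CRLF) separates the headers from the body.
    head, _, body = raw.partition("\r\n\r\n")
    lines = head.split("\r\n")
    start_line = lines[0]
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        # Header names are case-insensitive, so store them in lower case.
        headers[name.strip().lower()] = value.strip()
    return start_line, headers, body


# Illustrative response; Content-Length matches the 23-byte body.
response = (
    "HTTP/1.1 200 OK\r\n"
    "Content-Length: 23\r\n"
    "Content-Type: text/html\r\n"
    "\r\n"
    "Site under construction"
)
start_line, headers, body = parse_http_message(response)
version, code, phrase = start_line.split(" ", 2)
print(version, code, phrase, headers, body)
```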

Page 11: CS6320 – Systems, Networking and intro to Performance

HTTP: Request Methods and Response Codes

Request methods include
• GET: return current value of resource, …
• HEAD: return the meta-data associated with a resource
• POST: update a resource, provide input to a program, …
• Etc.

Response code classes
• 1xx: informational (e.g., “100 Continue”)
• 2xx: success (e.g., “200 OK”)
• 3xx: redirection (e.g., “304 Not Modified”)
• 4xx: client error (e.g., “404 Not Found”)
• 5xx: server error (e.g., “503 Service Unavailable”)
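Because the class of a response code is given by its first digit, a client can classify any code with integer division. A minimal sketch follows; the function name `status_class` and the "unknown" fallback are assumptions made for this example.

```python
STATUS_CLASSES = {
    1: "informational",
    2: "success",
    3: "redirection",
    4: "client error",
    5: "server error",
}


def status_class(code):
    """Return the class name for a three-digit HTTP status code."""
    return STATUS_CLASSES.get(code // 100, "unknown")


for code in (100, 200, 304, 404, 503):
    print(code, status_class(code))
```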

Page 12: CS6320 – Systems, Networking and intro to Performance

HTTP is a Stateless Protocol

Stateless
• Each request-response exchange treated independently
• Clients and servers not required to retain state

Statelessness to improve scalability
• Avoids need for the server to retain info across requests
• Enables the server to handle a higher rate of requests


Page 14: CS6320 – Systems, Networking and intro to Performance

Web Systems Components

Clients
• Send requests and receive responses
• Browsers, spiders, and agents

Servers
• Receive requests and send responses
• Store or generate the responses

DNS (Domain Name System)
• Distributed network infrastructure
• Transforms site name -> IP address
• Direct clients to servers

Page 15: CS6320 – Systems, Networking and intro to Performance

Web Browser

Generating HTTP requests
• User types URL, clicks a hyperlink, or selects bookmark
• User clicks “reload”, or “submit” on a Web page
• Automatic downloading of embedded images

Layout of response
• Parsing HTML and rendering the Web page
• Invoking helper applications (e.g., Acrobat, PowerPoint)

Maintaining a cache
• Storing recently-viewed objects
• Checking that cached objects are fresh

Page 16: CS6320 – Systems, Networking and intro to Performance

Web Transaction

User clicks on a hyperlink
• http://www.cnn.com/index.html

Browser learns the IP address of the server
• Invokes gethostbyname(www.cnn.com)
• And gets a return value of 64.236.16.20

Browser establishes a TCP connection
• Selects an ephemeral port for its end of the connection
• Contacts 64.236.16.20 on port 80

Browser sends the HTTP request
• “GET /index.html HTTP/1.1
  Host: www.cnn.com”
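The transaction steps above map fairly directly onto Python's `socket` module. The sketch below is illustrative only: `build_get_request` and `fetch` are names invented here, the `Connection: close` header is an added assumption so that the read loop ends when the server closes the connection, and error handling is omitted. Nothing touches the network until `fetch` is called.

```python
import socket
from urllib.parse import urlsplit


def build_get_request(url):
    """Return (host, port, request bytes) for a simple GET of the URL."""
    parts = urlsplit(url)
    host = parts.hostname
    port = parts.port or 80          # default HTTP port
    path = parts.path or "/"
    request = (
        f"GET {path} HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        "Connection: close\r\n"      # assumption: ask the server to close when done
        "\r\n"
    )
    return host, port, request.encode("ascii")


def fetch(url, timeout=5.0):
    """Resolve the name, open a TCP connection, send the request, read the reply."""
    host, port, request = build_get_request(url)
    address = socket.gethostbyname(host)          # name -> IPv4 address
    # The operating system picks the ephemeral local port for us.
    with socket.create_connection((address, port), timeout=timeout) as sock:
        sock.sendall(request)
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:                          # server closed the connection
                break
            chunks.append(data)
    return b"".join(chunks)
```

Calling `fetch("http://www.cnn.com/index.html")` would perform the whole exchange; what comes back depends on the live server.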

Page 17: CS6320 – Systems, Networking and intro to Performance

Web Transaction (Continued)

Browser parses the HTTP response message
• Extract the URL for each embedded image
• Create new TCP connections and send new requests
• Render the Web page, including the images

Opportunities for caching in the browser
• HTML file
• Each embedded image
• IP address of the Web site


Page 19: CS6320 – Systems, Networking and intro to Performance

Web Server

Web site vs. Web server
• Web site: collection of Web pages associated with a particular host name
• Web server: program that satisfies client requests for Web resources

Handling a client request
• Accept the TCP connection
• Read and parse the HTTP request message
• Translate the URL to a filename
• Determine whether the request is authorized
• Generate and transmit the response

Page 20: CS6320 – Systems, Networking and intro to Performance

Web Server: Generating a Response

Returning a file
• URL corresponds to a file (e.g., /www/index.html)
• … and the server returns the file as the response
• … along with the HTTP response header

Returning meta-data with no body
• Example: client requests object “if-modified-since”
• Server checks if the object has been modified
• … and simply returns a “HTTP/1.1 304 Not Modified”

Dynamically-generated responses
• URL corresponds to a program the server needs to run
• Server runs the program and sends the output to client
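The “if-modified-since” case above comes down to one date comparison on the server. The following sketch assumes the server knows the object's last-modified time as a timezone-aware `datetime`; the function name `conditional_status` is hypothetical, and only the status line is produced.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def conditional_status(last_modified, if_modified_since=None):
    """Pick the status line for a GET that may carry If-Modified-Since."""
    if if_modified_since is not None:
        since = parsedate_to_datetime(if_modified_since)   # parse the HTTP date
        if last_modified <= since:
            # The client's copy is still current: send meta-data only, no body.
            return "HTTP/1.1 304 Not Modified"
    return "HTTP/1.1 200 OK"


modified = datetime(2006, 2, 6, 11, 12, 23, tzinfo=timezone.utc)
print(conditional_status(modified, "Mon, 06 Feb 2006 11:12:23 GMT"))
print(conditional_status(modified, "Sun, 05 Feb 2006 00:00:00 GMT"))
```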

Page 21: CS6320 – Systems, Networking and intro to Performance

Hosting: Multiple Sites Per Machine

Multiple Web sites on a single machine
• Hosting company runs the Web server on behalf of multiple sites (e.g., www.foo.com and www.bar.com)

Problem: returning the correct content
• www.foo.com/index.html vs. www.bar.com/index.html
• How to differentiate when both are on same machine?

Solution: multiple servers on the same machine
• Run multiple Web servers on the machine
• Have a separate IP address for each server

Page 22: CS6320 – Systems, Networking and intro to Performance

Hosting: Multiple Machines Per Site (performance improvement)

Replicating a popular Web site
• Running on multiple machines to handle the load
• … and to place content closer to the clients

Problem: directing client to a particular replica
• To balance load across the server replicas
• To pair clients with nearby servers

Solution:
• Takes advantage of Domain Name System (DNS)


Page 24: CS6320 – Systems, Networking and intro to Performance

DNS Query steps

User types or clicks on a URL
• E.g., http://www.cnn.com/2006/leadstory.html

Browser extracts the site name
• E.g., www.cnn.com

Browser calls gethostbyname() to learn IP address
• Triggers resolver code to query the local DNS server

Eventually, the resolver gets a reply
• Resolver returns the IP address to the browser

Then, the browser contacts the Web server
• Creates and connects socket, and sends HTTP request

Page 25: CS6320 – Systems, Networking and intro to Performance

Multiple DNS Queries

Often a Web page has embedded objects
• E.g., HTML file with embedded images

Each embedded object has its own URL
• … and potentially lives on a different Web server
• E.g., http://www.myimages.com/image1.jpg

Browser downloads embedded objects
• Usually done automatically, unless configured otherwise
• Requires learning the address for www.myimages.com

Page 26: CS6320 – Systems, Networking and intro to Performance

When are DNS Queries Unnecessary?

Browser is configured to use a proxy
• E.g., browser sends all HTTP requests through a proxy
• Then, the proxy takes care of issuing the DNS request

Requested Web resource is locally cached
• E.g., cache has http://www.cnn.com/2006/leadstory.html
• No need to fetch the resource, so no need to query

Resulting IP address is locally cached
• Browser recently visited http://www.cnn.com
• So, the browser already called gethostbyname()
• … and may be locally caching the resulting IP address

Page 27: CS6320 – Systems, Networking and intro to Performance

Directing Web Clients to Replicas

Simple approach: different names
• www1.cnn.com, www2.cnn.com, www3.cnn.com
• But, this requires users to select specific replicas

More elegant approach: different IP addresses
• Single name (e.g., www.cnn.com), multiple addresses
• E.g., 64.236.16.20, 64.236.16.52, 64.236.16.84, …

Authoritative DNS server returns many addresses
• And the local DNS server selects one address
• Authoritative server may vary the order of addresses

Page 28: CS6320 – Systems, Networking and intro to Performance

Clever Load Balancing Schemes

The problem: selecting the “best” IP address to return
• Based on server performance
• Based on geographic proximity
• Based on network load
• …

Example policies
• Round-robin scheduling to balance server load
• U.S. queries get one address, Europe another
• Tracking the current load on each of the replicas
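The round-robin policy above can be imitated by rotating the address list between replies. This is a toy model, not how any particular DNS server is implemented: the class `RoundRobinAuthority` and the helper `local_pick` are invented for this example, and the local server is assumed to simply take the first address it is given.

```python
from collections import deque


class RoundRobinAuthority:
    """Toy authoritative server that varies the order of its addresses."""

    def __init__(self, addresses):
        self._addresses = deque(addresses)

    def answer(self):
        reply = list(self._addresses)
        self._addresses.rotate(-1)     # next reply starts with the next replica
        return reply


def local_pick(reply):
    """Assumed local-server behaviour: use the first address in the reply."""
    return reply[0]


dns = RoundRobinAuthority(["64.236.16.20", "64.236.16.52", "64.236.16.84"])
picks = [local_pick(dns.answer()) for _ in range(4)]
print(picks)
```

Successive queries land on successive replicas and then wrap around, which spreads load evenly without tracking server state.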


Page 30: CS6320 – Systems, Networking and intro to Performance

TCP Interaction: Multiple Transfers

Most Web pages have multiple objects
• E.g., HTML file and multiple embedded images

Serializing the transfers is not efficient
• Sending the images one at a time introduces delay
• Cannot start retrieving the second image until the first arrives

A solution: parallel connections
• Browser opens multiple TCP connections (e.g., 4)
• … and retrieves a single image on each connection

Performance trade-offs
• Multiple downloads sharing the same network links
• Unfairness to other traffic traversing the links
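The parallel-connection idea can be sketched with a thread pool in which each worker stands in for one TCP connection. To keep the example self-contained, `fake_fetch` only sleeps and returns a string instead of opening a socket; both function names, the delay, and the example URLs are assumptions for illustration.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_fetch(url, delay=0.05):
    """Stand-in for downloading one object over its own connection."""
    time.sleep(delay)                  # pretend this is network latency
    return f"contents of {url}"


def fetch_all(urls, connections=4):
    """Fetch objects using at most `connections` simultaneous workers."""
    with ThreadPoolExecutor(max_workers=connections) as pool:
        # map() returns results in the same order as the input URLs.
        return list(pool.map(fake_fetch, urls))


images = [f"http://www.myimages.com/image{i}.jpg" for i in range(1, 5)]
print(fetch_all(images))
```

With four workers the four simulated downloads overlap, so the total wait is close to one delay rather than four.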

Page 31: CS6320 – Systems, Networking and intro to Performance

TCP Interaction: Short Transfers

Most HTTP transfers are short
• Very small request message (e.g., a few hundred bytes)
• Small response message (e.g., a few kilobytes)

TCP overhead may be big
• Three-way handshake to establish connection
• Four-way handshake to tear down the connection

Figure: timeline with one RTT to initiate the TCP connection, one RTT to request the file, then the time to transmit the file until the file is received.
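The timeline above suggests a simple estimate: one RTT for connection setup, one RTT for the request and first byte of the response, plus the transmission time of the file. The sketch below turns that into arithmetic; the function name and the example numbers (100 ms RTT, 4000-byte response, 1 Mbit/s link) are assumptions, and teardown and TCP slow start are ignored.

```python
def short_transfer_time(rtt, size_bytes, bandwidth_bps):
    """Estimated seconds to fetch one object over a fresh TCP connection."""
    transmit = size_bytes * 8 / bandwidth_bps   # time to push the bits onto the link
    return 2 * rtt + transmit                   # setup RTT + request RTT + transmit


total = short_transfer_time(rtt=0.1, size_bytes=4000, bandwidth_bps=1_000_000)
overhead_share = (2 * 0.1) / total
print(f"total {total:.3f} s, round trips are {overhead_share:.0%} of it")
```

For small responses the two round trips dominate, which is the sense in which the TCP overhead "may be big".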

Page 32: CS6320 – Systems, Networking and intro to Performance

A solution: TCP Interaction: Persistent Connections

Handle multiple transfers per connection
• Maintain the TCP connection across multiple requests
• Either the client or server can tear down the connection
• Added to HTTP after the Web became very popular

Performance advantages
• Avoid overhead of connection set-up and tear-down
• Allow TCP to learn a more accurate RTT estimate
• Allow the TCP congestion window to increase
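The set-up saving can be estimated with back-of-the-envelope arithmetic. The model below is a deliberate simplification with assumed numbers: objects are fetched one after another, each request costs one RTT plus transmission time, a new connection costs one extra RTT, and the effects on RTT estimation and the congestion window are not modelled.

```python
def non_persistent_time(n, rtt, transmit):
    """n objects, each on its own connection: setup RTT + request RTT + transmit."""
    return n * (2 * rtt + transmit)


def persistent_time(n, rtt, transmit):
    """n objects on one connection: one setup RTT, then request RTT + transmit each."""
    return rtt + n * (rtt + transmit)


n, rtt, transmit = 5, 0.1, 0.032     # assumed: 5 objects, 100 ms RTT, 32 ms each
print(f"non-persistent: {non_persistent_time(n, rtt, transmit):.2f} s")
print(f"persistent:     {persistent_time(n, rtt, transmit):.2f} s")
```

Under these assumptions the persistent connection saves one RTT for every object after the first.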


Page 34: CS6320 – Systems, Networking and intro to Performance

Web Content Delivery

Page 35: CS6320 – Systems, Networking and intro to Performance

Scalability Limitation


Page 37: CS6320 – Systems, Networking and intro to Performance

Server Farms (motivated by scalability)

Page 38: CS6320 – Systems, Networking and intro to Performance

Server Farms

Definition
• A collection of computer servers to accomplish server needs far beyond the capacity of one machine
• Often have both a primary and backup server allocated to a single task (for fault tolerance)

Web Farms
• Common use of server farms is for web hosting


Page 40: CS6320 – Systems, Networking and intro to Performance

Web Proxies

Page 41: CS6320 – Systems, Networking and intro to Performance

Web Proxies are Intermediaries

Proxies play both roles
• A server to the client
• A client to the server

Figure labels: Proxy, www.cnn.com, www.google.com

Page 42: CS6320 – Systems, Networking and intro to Performance

How can an intermediary help – Proxy Caching

Client #1 requests http://www.foo.com/fun.jpg
• Client sends “GET fun.jpg” to the proxy
• Proxy sends “GET fun.jpg” to the server
• Server sends response to the proxy
• Proxy stores the response, and forwards to client

Client #2 requests http://www.foo.com/fun.jpg (cached case)
• Client sends “GET fun.jpg” to the proxy
• Proxy sends response to the client from the cache

Benefits
• Faster response time to the clients
• Lower load on the Web server
• Reduced bandwidth consumption inside the network
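The two-client scenario above can be mimicked with a dictionary keyed by URL. In this sketch `CachingProxy` and `origin` are hypothetical names, the origin server is replaced by a plain function that records each call, and cache freshness checks and eviction are left out.

```python
class CachingProxy:
    """Toy proxy: forwards misses to the origin and remembers the responses."""

    def __init__(self, fetch_from_origin):
        self._fetch = fetch_from_origin
        self._cache = {}

    def get(self, url):
        if url in self._cache:
            return self._cache[url]        # cached case: origin is not contacted
        response = self._fetch(url)        # miss: act as a client to the server
        self._cache[url] = response        # store, then forward to the client
        return response


origin_hits = []


def origin(url):
    """Stand-in for the Web server; records every request it receives."""
    origin_hits.append(url)
    return f"bytes of {url}"


proxy = CachingProxy(origin)
first = proxy.get("http://www.foo.com/fun.jpg")    # client #1
second = proxy.get("http://www.foo.com/fun.jpg")   # client #2, served from cache
print(len(origin_hits))
```

Both clients receive the same response while the origin sees a single request, which is where the lower server load and reduced bandwidth come from.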

Page 43: CS6320 – Systems, Networking and intro to Performance

Getting Requests to the Proxy

Explicit configuration
• Browser configured to use a proxy
• Directs all requests through the proxy
• Problem: requires user action

Transparent proxy (or “interception proxy”)
• Proxy lies in path from the client to the servers
• Proxy intercepts packets en route to the server
• … and interposes itself in the data transfer
• Benefit: does not require user action

Page 44: CS6320 – Systems, Networking and intro to Performance

Other Functions of Web Proxies

Anonymization
• Server sees requests coming from the proxy address
• … rather than the individual user IP addresses

Transcoding
• Converting data from one form to another
• E.g., reducing the size of images for cell-phone browsers

Prefetching
• Requesting content before the user asks for it

Filtering
• Blocking access to sites, based on URL or content


Page 46: CS6320 – Systems, Networking and intro to Performance

Why CDN?

Providers want to offer content to consumers
• Efficiently
• Reliably
• Securely
• Inexpensively

The server and its link can be overloaded

Peering points between ISPs can be congested

Alternative solution: Content Distribution Networks
• Geographically diverse servers serving content from many sources

Page 47: CS6320 – Systems, Networking and intro to Performance

Content Delivery Networks

Page 48: CS6320 – Systems, Networking and intro to Performance

CDN Architecture

Proactively replicate data by caching static pages

Architecture
• Backend servers
• Geographically distributed surrogate servers
• Redirectors (according to network proximity, balancing)
• Clients

Redirector Mechanisms
• Augment DNS to return different server addresses
• Server-based redirection: based on HTTP redirect feature
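Server-based redirection can be pictured as a lookup that answers with an HTTP redirect pointing at a surrogate. The sketch below is purely illustrative: the region keys, the `example.net` host names, and the function `redirect_response` are all made up, and a real redirector would choose by measured proximity and load rather than a fixed table.

```python
# Hypothetical surrogate servers, keyed by a coarse client region.
SURROGATES = {
    "us": "us.cdn.example.net",
    "eu": "eu.cdn.example.net",
}
BACKEND = "origin.example.net"     # hypothetical backend server


def redirect_response(client_region, path):
    """Build an HTTP redirect that sends the client to a nearby surrogate."""
    host = SURROGATES.get(client_region, BACKEND)
    return (
        "HTTP/1.1 302 Found\r\n"
        f"Location: http://{host}{path}\r\n"
        "\r\n"
    )


print(redirect_response("eu", "/images/logo.png"))
```

The DNS-based mechanism reaches the same goal earlier, by returning the surrogate's address during name resolution instead of after the first HTTP request.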

Page 49: CS6320 – Systems, Networking and intro to Performance

CDN Architecture