1 Content Delivery Networks Web caching. 2 Replica Placement Permanent replicas (mirroring)...

Content Delivery Networks
Web caching

Replica Placement

• Permanent replicas (mirroring)
• Server-initiated replicas (push caching)
• Client-initiated replicas (pull/client caching)

Web Caching

• Example of the web to illustrate caching and replication issues
  – Simpler model: clients are read-only, only server updates data

[Figure: two request/response paths. With a proxy: browser ↔ Web proxy cache ↔ Web server. Without a proxy: browser ↔ Web server.]

Consistency Issues

• Web pages tend to be updated over time
  – Some objects are static, others are dynamic
  – Different update frequencies (few minutes to few weeks)
• How can a proxy cache maintain consistency of cached data?
  – Send invalidate or update
  – Push versus pull

Push-based Approach

• Server tracks all proxies that have requested objects
• If a web page is modified, notify each proxy
• Notification types
  – Indicate object has changed [invalidate]
  – Send new version of object [update]
• How to decide between invalidate and updates?
  – Pros and cons?
  – One approach: send updates for more frequent objects, invalidate for rest
[Figure: Web server pushes notifications to the proxy]
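The push scheme above can be sketched in a few lines of Python. This is only an illustrative model, not a real protocol: the class names, the `update_threshold` parameter, and the reading of "more frequent objects" as "objects requested often by proxies" are all assumptions made for the example.

```python
from collections import defaultdict


class PushServer:
    """Origin server that remembers which proxies fetched each object."""

    def __init__(self, update_threshold=3):
        self.objects = {}                       # url -> current content
        self.subscribers = defaultdict(set)     # url -> proxies holding a copy
        self.request_counts = defaultdict(int)  # url -> number of proxy fetches
        self.update_threshold = update_threshold  # hypothetical popularity cutoff

    def get(self, url, proxy):
        # Server-side state: record the proxy so it can be notified later.
        self.subscribers[url].add(proxy)
        self.request_counts[url] += 1
        return self.objects[url]

    def modify(self, url, content):
        self.objects[url] = content
        popular = self.request_counts[url] >= self.update_threshold
        for proxy in self.subscribers[url]:
            if popular:
                proxy.on_update(url, content)   # send the new version
            else:
                proxy.on_invalidate(url)        # only say "it changed"
        if not popular:
            # Invalidated proxies no longer hold a copy, so drop their state.
            self.subscribers[url].clear()


class Proxy:
    """Passive proxy: it caches objects and reacts to server notifications."""

    def __init__(self, server):
        self.server = server
        self.cache = {}

    def read(self, url):
        if url not in self.cache:
            self.cache[url] = self.server.get(url, self)
        return self.cache[url]

    def on_invalidate(self, url):
        self.cache.pop(url, None)

    def on_update(self, url, content):
        self.cache[url] = content
```

An update costs more bandwidth per notification but saves the proxy a later fetch; an invalidation is cheap but makes the next read a miss.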

Push-based Approaches

• Advantages
  – Provide tight consistency [minimal stale data]
  – Proxies can be passive
• Disadvantages
  – Need to maintain state at the server
    • Recall that HTTP is stateless
    • Need mechanisms beyond HTTP
  – State may need to be maintained indefinitely
    • Not resilient to server crashes

Pull-based Approaches

• Proxy is entirely responsible for maintaining consistency
• Proxy periodically polls the server to see if object has changed
  – Use If-Modified-Since HTTP messages
• Key question: when should a proxy poll?
  – Server-assigned Time-to-Live (TTL) values
    • No guarantee if the object will change in the interim
[Figure: proxy polls the Web server; the server sends a response]
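The polling behaviour above can be modelled without a network. The sketch below is an in-memory simulation, assuming plain numbers for timestamps (real HTTP carries dates in the If-Modified-Since header) and hypothetical class names; it shows both the 304 "not modified" path and the window in which a TTL lets stale data be served.

```python
class OriginServer:
    """Stateless about proxies: it only knows when each object last changed."""

    def __init__(self):
        self.store = {}  # url -> (last_modified, body)

    def put(self, url, body, now):
        self.store[url] = (now, body)

    def conditional_get(self, url, if_modified_since=None):
        last_modified, body = self.store[url]
        if if_modified_since is not None and last_modified <= if_modified_since:
            return 304, last_modified, None   # not modified: no body sent
        return 200, last_modified, body


class PollingProxy:
    """Serves from cache until the TTL expires, then revalidates."""

    def __init__(self, server, ttl):
        self.server = server
        self.ttl = ttl
        self.cache = {}  # url -> (body, last_modified, expires_at)

    def read(self, url, now):
        entry = self.cache.get(url)
        if entry and now < entry[2]:
            return entry[0]                   # fresh enough: no poll
        since = entry[1] if entry else None
        status, last_modified, body = self.server.conditional_get(url, since)
        if status == 304:
            body = entry[0]                   # keep the cached copy
        self.cache[url] = (body, last_modified, now + self.ttl)
        return body
```

With a TTL of 10, an object modified at time 5 is still served in its old version at time 6 and only refreshed on the first read after time 10.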

Pull-based Approach: Intelligent Polling

• Proxy can dynamically determine the refresh interval
  – Compute based on past observations
    • Start with a conservative refresh interval
    • Increase interval if object has not changed between two successive polls
    • Decrease interval if object is updated between two polls
    • Adaptive: No prior knowledge of object characteristics needed
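One simple way to realise this adaptive rule is multiplicative increase and decrease with bounds. The factors and limits below are hypothetical values chosen for illustration; the slides do not prescribe any.

```python
def next_interval(interval, changed, min_s=60.0, max_s=86400.0,
                  grow=2.0, shrink=0.5):
    """Return the next polling interval in seconds.

    interval: the interval just used
    changed:  True if the object was modified since the previous poll
    """
    factor = shrink if changed else grow
    # Clamp so the proxy never polls too aggressively or too rarely.
    return max(min_s, min(max_s, interval * factor))


# Trace: start conservatively at 60 s, then observe a sequence of polls.
interval = 60.0
history = []
for changed in [False, False, False, True, False]:
    interval = next_interval(interval, changed)
    history.append(interval)
```

In the trace, three unchanged polls stretch the interval from 60 s to 480 s, one observed change halves it to 240 s, and the next unchanged poll doubles it again.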

Pull-based Approach

• Advantages
  – Implementation using HTTP (If-Modified-Since)
  – Server remains stateless
  – Resilient to both server and proxy failures
• Disadvantages
  – Weaker consistency guarantees (objects can change between two polls and proxy will contain stale data until next poll)
    • Strong consistency only if poll before every HTTP response
  – More sophisticated proxies required
  – High message overhead

A Hybrid Approach: Leases

• Lease: duration of time for which server agrees to notify proxy of modification
• Issue lease on first request, send notification until expiry
  – Need to renew lease upon expiry
• Smooth tradeoff between state and messages exchanged
  – Zero duration => polling, Infinite leases => server-push
• Efficiency depends on the lease duration
[Figure: the client reads through the proxy; the proxy sends "Get + lease req" to the server, the server answers "Reply + lease" and later sends "Invalidate/update"]
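The lease mechanism can be sketched as a server that keeps per-proxy state only until a lease expires. The class and method names are hypothetical, timestamps are plain numbers, and the sketch uses invalidation (not update) as the notification.

```python
class LeaseServer:
    def __init__(self, lease_duration):
        self.lease_duration = lease_duration
        self.objects = {}   # url -> content
        self.leases = {}    # (url, proxy) -> expiry time

    def get(self, url, proxy, now):
        # Grant (or renew) a lease together with the reply.
        expiry = now + self.lease_duration
        self.leases[(url, proxy)] = expiry
        return self.objects[url], expiry

    def modify(self, url, content, now):
        self.objects[url] = content
        for (leased_url, proxy), expiry in list(self.leases.items()):
            if leased_url != url:
                continue
            if now < expiry:
                proxy.invalidate(url)         # lease still valid: must notify
            # Notified or expired, the server can forget this lease.
            del self.leases[(leased_url, proxy)]


class LeaseProxy:
    def __init__(self, server):
        self.server = server
        self.cache = {}     # url -> (content, lease_expiry)

    def read(self, url, now):
        entry = self.cache.get(url)
        if entry and now < entry[1]:
            return entry[0]                   # covered by a valid lease
        content, expiry = self.server.get(url, self, now)
        self.cache[url] = (content, expiry)
        return content

    def invalidate(self, url):
        self.cache.pop(url, None)
```

With `lease_duration=0` every read goes back to the server, which is polling; with `float("inf")` the proxy never re-fetches on its own and relies entirely on server notifications, which is server-push.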

Policies for Lease Duration

• Age-based lease
  – Based on bi-modal nature of object lifetimes
  – The larger the expected lifetime, the longer the lease
• Renewal-frequency based
  – Based on skewed popularity
  – Proxy at which an object is popular gets a longer lease
• Server load based
  – Based on adaptively controlling the state space
  – Shorter leases during heavy load

Cooperative Caching

• Caching infrastructure can have multiple web proxies
  – Proxies can be arranged in a hierarchy or other structures
    • Overlay network of proxies: content distribution network
  – Proxies can cooperate with one another
    • Answer client requests
    • Propagate server notifications

Hierarchical Proxy Caching

Examples: Squid, Harvest

[Figure: hierarchy of clients, leaf caches, a parent cache, and the server; links are labelled HTTP and ICP, and steps 1 to 3 trace a client's "Read A"]

Locating and Accessing Data

Minimize cache hops on hit; do not slow down misses

Properties
• Lookup is local
• Hit at most 2 hops
• Miss at most 2 hops (1 extra on wrong hint)

[Figure: clients, caches at Node X, Node Y, and Node Z, and the server for B; labels include the hint (A,X), Read A / Get A, and Read B / Get B]

CDN Issues (Content Delivery Network)

• Which proxy answers a client request?
  – Ideally the “closest” proxy
  – Akamai uses a DNS-based approach
• Propagating notifications
  – Can use multicast or application level multicast to reduce overheads (in push-based approaches)
• Active area of research
  – Numerous research papers available

Case Study: Akamai

Basic concepts

Serving Web content from a single location can present serious problems:
• Scalability
• Reliability
• Performance

How to serve requests from a variable number of surrogate origin servers … at the network’s edge?
• Flash crowds?
• Seasonal traffic spikes?
• Over-provisioning?
• Capacity planning?

Caching at the edge as a shock absorber

Content Distribution Bottlenecks

• Congestion delay
• Local outage lost data
• 1st mile problem:
  – Connection between content/service provider & ISP
• Backbone & router/switch load & failures
• Peering between ISPs
• Last mile problem:
  – Connection to users’ access points

Why bother with a company?

Content Distribution Networks

• Content Provider != Content Distributor
  – CDNs “promise” improved response times & reliability, including handling of “hot spots”
    • Without tremendous infrastructure & personnel investments by the content providers
• Components of a CDN:
  – Distributed server load balancing
  – DNS redirection, hashing & fault tolerance
  – Distributed system monitoring
  – Distributed software configuration management
  – Live stream distribution & entry points
  – Log collection, reporting & performance monitoring
  – Client provisioning mechanism
  – Content management & replication

Hosted e-business architecture

3-tier CDN architecture

Market Size of Internet CDN

• Assuming 250M Web users
  – With average B/W consumption 10 Kbits/sec
    • 10% of users online at any given time
• … the total B/W requirement is 250 Gbits/sec

[Figure: chart of percent of Internet traffic (y-axis, 0 to 45) against number of Web sites (x-axis: 20, 100, 300, 1000)]
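The 250 Gbits/sec figure follows directly from the three stated assumptions. A quick check, using integer arithmetic and variable names chosen only for this example:

```python
web_users = 250_000_000          # 250M Web users
percent_online = 10              # 10% online at any given time
per_user_bps = 10_000            # 10 Kbits/sec per online user

concurrent_users = web_users * percent_online // 100   # 25,000,000
total_bps = concurrent_users * per_user_bps
total_gbps = total_bps / 1e9
```

Here `total_gbps` evaluates to 250.0, matching the slide.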

Akamai Services

• EdgeSuite & FreeFlow (core products):
  – GIF & static HTML delivery, streaming delivery, reporting
• StorageFlow:
  – replicated hosting of large files
• Digital Parcel Service:
  – digital rights mgmt for large files
• FirstPoint: global server load balancing
  – Up-to-date map of the best routes
  – Mirror fail-over

15,000+ servers in over 1,200 networks in 60+ countries

State diagram

[Figure: state diagram linking the content provider site, a nearby ghost, the “top level” DNS, the “low level” DNS, the user, and the user's DNS server, with steps numbered 1 to 6]

Serving user requests

“Akamaized” content (I)

[Figure: two versions of a CNN web page. In one, the HTML is delivered by CNN and the embedded objects are marked “Akamaized”; in the other, the entire web page is delivered by CNN.]

“Akamaized” content (II)

Akamaized URL:
http://a8.g.akamai.net/f/8/1162/1h/images.cnn.com/ads/advertiser/fidelity/0104/160X60Fidelity.gif

Callouts on the URL:
• 3-stage DNS name for GSLB
• Page consistency policy
• Embedded original hostname

“Akamaizer” filter/plug-in
• for IIS, Apache, …
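To make the URL structure concrete, the sketch below splits the example URL with Python's standard `urllib.parse`. The rule used to find the embedded hostname (the first path segment that contains a dot) and the field names in the result are assumptions for illustration, not Akamai's documented format.

```python
from urllib.parse import urlsplit


def split_akamaized_url(url):
    parts = urlsplit(url)
    segments = parts.path.lstrip("/").split("/")
    # Assumption: the first path segment containing a dot is the
    # embedded original hostname; everything before it is CDN metadata.
    idx = next(i for i, seg in enumerate(segments) if "." in seg)
    return {
        "cdn_host": parts.hostname,
        "prefix": segments[:idx],
        "origin_host": segments[idx],
        "origin_path": "/" + "/".join(segments[idx + 1:]),
    }


info = split_akamaized_url(
    "http://a8.g.akamai.net/f/8/1162/1h/"
    "images.cnn.com/ads/advertiser/fidelity/0104/160X60Fidelity.gif"
)
```

For the example URL, `info["cdn_host"]` is the DNS name that drives server selection and `info["origin_host"]` is `images.cnn.com`, the host an edge server would contact on a cache miss.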

“Akamaized” content (III)

- Define the default metadata for the domain(s) that you want to serve, using the distributed architecture. This default metadata can be overridden at a per-object level using host-response headers or URL-prepending metadata.
- For dynamically generated content, mark up the content for assembly at the edge by inserting the appropriate ESI tags in your templates or in your HTML or in your JSP/ASP pages.
- Finally, integrate the content generation environment with the content assembly environment through a 3-step hand-off of DNS:
  1) Turn recursion off on the authoritative name server(s).
  2) Set up a private hostname for Akamai to poll.
  3) CNAME the live hostname to Akamai.

“Akamaized” content (IV)

When an end-user of the Web application visits www.yoursite.com, the user’s local name server gets directed to Akamai’s DNS.

The Akamai DNS then resolves the reference to the optimally located edge server.

The edge server assembles the relevant content based on the rules established in the metadata instructions that are stored locally.

- Static content is typically retrieved from cache to be served to the browser.
- Dynamic content is either assembled from page fragments stored in the cache, or retrieved from the origin server. The page assembly engine then assembles these fragments to be tailored & personalized to the user’s geographic location, cookie, device or other chosen mechanisms.

“Akamaized” content (V)

• http://www.mydomain.com/frontpage.jpg is rewritten as http://xxxx.yy.zzzz.net/aaaa/frontpage.jpg
  – xxxx: serial number
  – yy: lower-level DNS
  – zzzz: top-level DNS
  – aaaa: fingerprint

ghost1467.ghosting.akamai.net

1. Determine client’s location (IP block)
2. Top-level DNS server uses map to locate a close-by low-level DNS server … set TTL to a relatively high value
3. Client’s local DNS server contacts close-by low-level DNS server to request a lookup for a surrogate server … set TTL to a relatively low value

• Keep track of each server’s projected load …
• Buddy system for servers
• Lookups return list of servers
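The three resolution steps above can be mimicked with two lookup tables. Everything concrete here is hypothetical: the IP blocks (documentation ranges), the `.example` hostnames, the load figures, and the TTL values. Only the structure follows the slide: a long-lived answer from the top level, a short-lived list of servers from the low level.

```python
import ipaddress

# Hypothetical map from client IP block to a nearby low-level DNS server.
TOP_LEVEL_MAP = {
    ipaddress.ip_network("203.0.113.0/24"): "lowdns-east.example",
    ipaddress.ip_network("198.51.100.0/24"): "lowdns-west.example",
}

# Hypothetical projected load (0.0 to 1.0) of the edge servers in each region.
LOW_LEVEL_SERVERS = {
    "lowdns-east.example": {"edge-e1.example": 0.9,
                            "edge-e2.example": 0.2,
                            "edge-e3.example": 0.5},
    "lowdns-west.example": {"edge-w1.example": 0.4,
                            "edge-w2.example": 0.1},
}

HIGH_TTL_S = 3600   # region assignments change slowly
LOW_TTL_S = 20      # server choice must track load, so answers expire fast


def top_level_lookup(client_ip):
    """Steps 1 and 2: locate the client by IP block, pick a low-level DNS."""
    addr = ipaddress.ip_address(client_ip)
    for block, low_dns in TOP_LEVEL_MAP.items():
        if addr in block:
            return low_dns, HIGH_TTL_S
    raise LookupError(f"no mapping for {client_ip}")


def low_level_lookup(low_dns, count=2):
    """Step 3: return a list of the least-loaded servers with a short TTL."""
    load = LOW_LEVEL_SERVERS[low_dns]
    servers = sorted(load, key=load.get)[:count]
    return servers, LOW_TTL_S
```

The short TTL at the low level is what lets the mapping react to load changes, at the cost of more DNS queries.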

“Akamaized” content (VI)

• Load data is circulated within each region
  – … for each serial number
• Serial numbers are processed in increasing order of projected load
  – For each serial number, a random priority list of desired servers is assigned
    • … using consistent hashing
  – Each serial number is then resolved to the smallest initial segment of servers from the priority list so that no server becomes overloaded
• Initially, every serial number is mapped to every server.
• Iterative refinement of the assignments so as to balance the load with the minimum amount of replication
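The "smallest initial segment" rule can be illustrated with a single greedy pass. This is a simplified sketch, not Akamai's algorithm: it ranks servers by a hash of (serial, server) as a stand-in for the consistent-hashing step, assumes load is split evenly across the chosen servers, uses one hypothetical `capacity` for all servers, and skips the iterative refinement mentioned above.

```python
import hashlib


def priority_list(serial, servers):
    """Deterministic pseudo-random server order for one serial number."""
    def rank(server):
        return hashlib.sha256(f"{serial}:{server}".encode()).hexdigest()
    return sorted(servers, key=rank)


def assign_serials(loads, servers, capacity):
    """loads: serial -> projected load. Returns (assignment, load per server)."""
    used = {s: 0.0 for s in servers}
    assignment = {}
    # Process serial numbers in increasing order of projected load.
    for serial in sorted(loads, key=loads.get):
        order = priority_list(serial, servers)
        chosen = order  # fallback: spread over every server
        # Try prefixes of length 1, 2, ... and keep the first that fits.
        for k in range(1, len(order) + 1):
            share = loads[serial] / k
            if all(used[s] + share <= capacity for s in order[:k]):
                chosen = order[:k]
                break
        share = loads[serial] / len(chosen)
        for s in chosen:
            used[s] += share
        assignment[serial] = chosen
    return assignment, used
```

A lightly loaded serial number ends up on a single server, which keeps replication low, while a hot one is spread over just enough servers to stay under capacity.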

Akamai in action (1st request)

Akamai in action (subsequent requests)

CDN Reduced latency benefit

• Bandwidth x Delay product: limit on outstanding packets (in-flight, unacknowledged)
• TCP: ~8 RTTs to fill 1 Mbps pipe
  – ~11 RTTs, by including DNS round-trip & TCP handshake
  – 128 KB over 11 RTTs
  – If RTT=60 msec (e.g. US coast-to-coast), we need ~600 msec to fill the pipe
  – If RTT=3 msec (e.g. nearby CDN node), we need ~30 msec to fill the pipe
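The latency comparison is a product of round trips and RTT. The sketch below reproduces it; the slide's ~600 msec and ~30 msec are rounded from 660 and 33. The second function is an extra illustration under stated assumptions (a window that starts at one segment and doubles every round trip, and a hypothetical 512-byte segment), and is not taken from the slides.

```python
def startup_delay_ms(rtt_ms, round_trips=11):
    """Delay before the pipe is full: number of round trips times the RTT."""
    return rtt_ms * round_trips


def slow_start_bytes(round_trips, segment_bytes=512):
    """Bytes sent if the window starts at 1 segment and doubles each RTT."""
    segments = 2 ** round_trips - 1   # 1 + 2 + 4 + ... over the round trips
    return segments * segment_bytes


coast_to_coast = startup_delay_ms(60)   # 660 ms
nearby_cdn = startup_delay_ms(3)        # 33 ms
```

Under those assumptions, 8 doubling rounds carry 255 segments, or 130,560 bytes, which happens to be close to the 128 KB figure quoted above. The point of the comparison is that the start-up cost scales linearly with RTT, so a nearby CDN node cuts it by the same factor as the RTT.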

The Black Art of “Network Mapping” (P. Danzig, 2001)

• Network mapping chooses reasonable data centers to satisfy a client request.

• Factors to consider:
  – Contracted data center bandwidth
  – Path characteristics: RTT, Bottleneck Bandwidth, “Experience”, Autonomous Systems Crossed, Hop Count, Observed loss rates, etc.
  – How do you measure these factors?

Network Mapping “Options”

• Cisco Boomerang
  – Synchronized DNS servers
• Radware’s DSLB box
  – Linear combination of hop count & RTT
• F5’s 3DNS
  – ICMP ping
• Alteon, Foundry, Resonate, …
• Akamai’s FirstPoint
  – Lowers Yahoo’s response times by ~18%

Consistent Hashing

• Hash function that maps URLs to a dynamic set of available caches
  – A machine can locally compute exactly which cache should contain a given object.
• Push the page-location task down to the individual clients
  – A unicast suffices to get the object or determine that it is not cached, decreasing network usage compared to multicast or directory schemes.
    • It also discovers misses faster than multicast schemes that must wait for all caches to respond.
  – It avoids the maintenance & query overhead associated with directory based schemes.
  – It does not create new points of failure for the system
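A minimal hash-ring sketch shows how a machine can compute the responsible cache locally and why a change in the cache set disturbs few URLs. The class name, the `replicas` parameter (virtual points per cache), and the use of SHA-256 are choices made for this example only.

```python
import bisect
import hashlib


def _point(key):
    """Map a string to a position on the ring."""
    return int(hashlib.sha256(key.encode()).hexdigest(), 16)


class HashRing:
    def __init__(self, caches, replicas=100):
        self.replicas = replicas   # virtual points per cache, to even out load
        self._points = []          # sorted ring positions
        self._owner = {}           # ring position -> cache name
        for cache in caches:
            self.add(cache)

    def add(self, cache):
        for i in range(self.replicas):
            p = _point(f"{cache}#{i}")
            bisect.insort(self._points, p)
            self._owner[p] = cache

    def remove(self, cache):
        for i in range(self.replicas):
            p = _point(f"{cache}#{i}")
            self._points.remove(p)
            del self._owner[p]

    def lookup(self, url):
        """Return the cache owning the first ring point at or after the URL."""
        idx = bisect.bisect(self._points, _point(url)) % len(self._points)
        return self._owner[self._points[idx]]


ring = HashRing(["cache-a", "cache-b", "cache-c"])
urls = [f"http://example.com/{i}" for i in range(200)]
before = {u: ring.lookup(u) for u in urls}
ring.remove("cache-b")
after = {u: ring.lookup(u) for u in urls}
```

After `cache-b` is removed, only the URLs it used to own move to another cache; every other URL keeps its previous owner, so the remaining caches stay warm.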

Break-down of Web traffic

• GIF & JPEG: 55%
• HTML: 25%
• Misc: 20%
• Delivering static HTML from caches is fast!
• HTML: 1/3 static, 2/3 dynamic
  – How can we make dynamic HTML faster?
• EdgeSuite service:
  – Construct or “assemble” dynamic HTML within the CDN, via proprietary language extensions
    • ESI (Edge-Side Includes): http://www.esi.org
  – Akamai + Oracle initiative

Update of dynamic content

Live Streaming CDN

References

• Akamai Inc: http://www.akamai.com
• D. Karger, E. Lehman, T. Leighton, M. Levine, D. Lewin and R. Panigrahy, “Consistent Hashing and Random Trees: Tools for Relieving Hot Spots on the World Wide Web”, Proc. ACM Symposium on Theory of Computing, 1997.
• D.R. Karger, A. Sherman, A. Berkheimer, W. Bogstad, R. Dhanidina, K. Iwamoto, B. Kim, L. Matkins, Y. Yerushalmi, “Web Caching with Consistent Hashing”, Proc. 8th WWW Conference, 1999.
• J. Dilley et al., “Globally Distributed Content Delivery”, IEEE Internet Computing, vol. 6, no. 5, pp. 50-59, 2002.
• US Patent #6,108,703, Aug. 2000: “Global hosting system”
• L. Kontothanassis et al., “A Transport Layer for Live Streaming in a Content Delivery Network”, Proc. of the IEEE, vol. 92, no. 9, 2004.
• A. Sherman et al., “ACMS: The Akamai Configuration Management System”, Proc. USENIX NSDI, 2004.
• J. Jung, B. Krishnamurthy, M. Rabinovich, “Flash Crowds and Denial of Service Attacks: Characterization and Implications for CDNs and Web Sites”, Proc. 11th WWW Conference, 2002.