1
Mining Web Traces: Workload Characterization, Performance Diagnosis, and Applications
Lili Qiu, Microsoft Research
Internal Talk, November 2002
2
Motivation
Why do we care about Web traces?
Content providers
- How do users come to visit the Web site?
- Why do users leave the Web site? Is poor performance the cause?
- Where are the performance bottlenecks?
- What content are users interested in?
- How does users' interest change over time?
- How does users' interest change across different geographical regions?
3
Motivation (Cont.)
Web hosting companies
- Accounting & billing
- Server selection
- Provisioning server farms: where to place servers
ISPs
- How to save bandwidth by deploying proxy caches?
- Traffic engineering & provisioning
System designers
- Where are the performance bottlenecks? How to improve Web performance?
- Examples: traffic measurements have influenced the design of HTTP (e.g., persistent connections and pipelining) and TCP (e.g., initial congestion window)
4
Outline
- Background
- Web workload characterization
- Performance diagnosis
- Applications of traces
- Bibliography
5
Part I: Background
- Web software components
- Web semantic components
- Web protocols
- Types of Web traces
6
Web Software Components
Web clients
- An application that establishes connections to send Web requests
- E.g., Mosaic, Netscape Navigator, Microsoft IE
Web servers
- An application that accepts connections and services requests by sending back responses
- E.g., Apache, Microsoft IIS
Web proxies (optional)
Web replicas (optional)
[Diagram: Web clients reach the Web servers across the Internet, possibly through proxies and replicas]
7
Web Semantic Components
Uniform Resource Identifier (URI)
- An identifier for a Web resource
- Name of the protocol: http, https, ftp, ...
- Name of the server
- Name of the resource on the server
- E.g., http://www.foobar.com/info.html
Hypertext Markup Language (HTML)
- Platform-independent styles (indicated by markup tags) that define the various components of a Web document
Hypertext Transfer Protocol (HTTP)
- Defines the syntax and semantics of messages exchanged between Web software components
8
Example of a Web Transaction
The browser, DNS server, and Web server interact as follows:
1. DNS query
2. Set up TCP connection
3. HTTP request
4. HTTP response
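Steps 3 and 4 can be sketched in code offline. Steps 1-2 (the DNS lookup and TCP setup) would be handled by `socket.create_connection` in a real client, so this minimal sketch only composes the request and parses a response status line; the host and path are the slide's example values:

```python
# Steps 3-4 of the transaction above, offline: compose an HTTP/1.0 GET
# request and parse a response status line.

def build_request(host: str, path: str) -> bytes:
    """Step 3: compose a minimal HTTP/1.0 GET request."""
    return (f"GET {path} HTTP/1.0\r\n"
            f"Host: {host}\r\n"
            f"\r\n").encode("ascii")

def parse_status_line(response: bytes) -> tuple:
    """Step 4: split the response's first line into version, code, reason."""
    status_line = response.split(b"\r\n", 1)[0].decode("ascii")
    version, code, reason = status_line.split(" ", 2)
    return version, int(code), reason

req = build_request("www.foobar.com", "/info.html")
version, code, reason = parse_status_line(
    b"HTTP/1.0 200 OK\r\nContent-Length: 3410\r\n\r\nhello")
```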
9
Internet Protocol Stack
Application layer: application programs (HTTP, Telnet, FTP, DNS)
Transport layer: error control + flow control (TCP, UDP)
Network layer: routing (IP)
Datalink layer: handles hardware details (Ethernet, ATM)
Physical layer: moves bits (coaxial cable, optical fiber)
10
HTTP Protocol
HTTP 1.0 [BLFF96]
- The most widely used HTTP version
- A "stop and wait" protocol
HTTP 1.1 [GMF+99]
- Persistent connections: use one TCP connection for multiple HTTP requests
- Pipelining: send multiple requests without waiting for a response between requests
- Content negotiation, range requests, caching control, ...
11
Types of Web Traces
Application-level traces
- Server logs: CLF and ECLF formats
- CLF format: <remoteIP, remoteID, usrName, time, request, responseCode, contentLength>
  e.g., 192.1.1.1, -, -, 8/1/2000, 10:00:00, "GET /news/index.asp HTTP/1.1", 200, 3410
- Proxy logs: CLF and ECLF formats
- Client logs: no standard logging format
Packet-level traces
- Collection method: monitor a network link
- Available tools: tcpdump, libpcap, netmon
- Concerns: packet dropping, timestamp accuracy
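A CLF-style line like the example above can be parsed with a short script. A sketch; the comma-separated field layout is assumed from the example line rather than a formal CLF grammar:

```python
import re

# Hypothetical parser for the CLF-style line shown above; the layout
# (comma-separated, date and time as two fields) is assumed from the example.
CLF_RE = re.compile(
    r'(?P<ip>\S+), (?P<rid>\S+), (?P<user>\S+), (?P<date>\S+), (?P<time>\S+), '
    r'"(?P<request>[^"]+)", (?P<code>\d+), (?P<length>\d+)')

def parse_clf(line: str) -> dict:
    m = CLF_RE.match(line)
    if m is None:
        raise ValueError(f"not a CLF line: {line!r}")
    rec = m.groupdict()
    rec["code"] = int(rec["code"])
    rec["length"] = int(rec["length"])
    # Split the quoted request into method, URI, and protocol version.
    rec["method"], rec["uri"], rec["version"] = rec["request"].split()
    return rec

rec = parse_clf('192.1.1.1, -, -, 8/1/2000, 10:00:00, '
                '"GET /news/index.asp HTTP/1.1", 200, 3410')
```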
12
Tutorial Outline
- Background
- Web workload characterization
- Performance diagnosis
- Applications of traces
- Bibliography
13
Part II: Web Workload Characterization
- Overview of workload characterization
- Content dynamics
- Access dynamics
- Common pitfalls
- Case studies
14
Overview of Workload Characterization
- Process of trace analyses
- Common analysis techniques
- Common analysis tools
- Challenges in workload characterization
15
Process of Trace Analyses
1. Collect traces: where to monitor, how to collect (e.g., efficiency, privacy, accuracy)
2. Determine key metrics to characterize
3. Process the traces
4. Draw inferences from the data
5. Apply the traces, or the insights gained from the trace analyses, to design better protocols & systems
16
Common Analysis Techniques - Statistics
- Mean: mean(x) = (1/N) * sum_{i=1..N} x_i
- Median
- Geometric mean: GM = exp((1/N) * sum_{i=1..N} log x_i); less sensitive to outliers
- Variance and standard deviation: var(x) = (1/(N-1)) * sum_{i=1..N} (x_i - mean(x))^2, std(x) = sqrt(var(x))
- Confidence interval: a range of values that has a specified probability of containing the parameter being estimated; computed as mean(x) +/- u * std(x)/sqrt(N), where u is the quantile for the chosen confidence level
- Example: a 95% confidence interval 10 <= mean <= 20
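The statistics above, as a minimal sketch (the 1.96 z-value used for the 95% normal-approximation interval is a standard constant, not from the slides):

```python
import math

# Sample mean, unbiased variance, standard deviation, a 95% confidence
# interval under a normal approximation, and geometric mean.

def mean(xs):
    return sum(xs) / len(xs)

def variance(xs):
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)   # 1/(N-1) form

def std(xs):
    return math.sqrt(variance(xs))

def conf_interval_95(xs):
    # mean +/- z * std / sqrt(N), with z = 1.96 for a 95% interval
    m = mean(xs)
    half = 1.96 * std(xs) / math.sqrt(len(xs))
    return m - half, m + half

def geometric_mean(xs):
    # exp of the mean of the logs: less sensitive to outliers
    return math.exp(mean([math.log(x) for x in xs]))

data = [10, 12, 14, 16, 18, 20]
lo, hi = conf_interval_95(data)
```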
17
Common Analysis Techniques – Statistics (Cont.)
- Cumulative distribution function (CDF): (a, P(X <= a)); complementary CDF: (a, P(X > a))
- Probability density function (PDF): derivative of the CDF, f(x) = dF(x)/dx
- Check for a heavy-tailed distribution: make a log-log complementary plot and check its tail
- Example: Pareto distribution: F(x) = P[X <= x] = 1 - (a/x)^alpha, for x >= a and alpha, a > 0
  - If alpha <= 2, the distribution has infinite variance (a heavy tail)
  - If alpha <= 1, the distribution also has infinite mean
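The log-log complementary plot test can also be done numerically: for a Pareto sample, log P[X > x] is linear in log x with slope -alpha, so a least-squares fit to the empirical CCDF should roughly recover the alpha used to generate the data. A sketch with made-up parameters:

```python
import math
import random

def pareto_sample(alpha, a, n, rng):
    # Inverse-CDF sampling from F(x) = 1 - (a/x)^alpha
    return [a / rng.random() ** (1.0 / alpha) for _ in range(n)]

def ccdf_slope(xs):
    """Least-squares slope of log CCDF vs log x; ~ -alpha for a Pareto tail."""
    xs = sorted(xs)
    n = len(xs)
    # Empirical CCDF at the i-th smallest value; skip the last point,
    # where the CCDF would reach 0 and the log would blow up.
    pts = [(math.log(x), math.log(1 - i / n))
           for i, x in enumerate(xs) if i < n - 1]
    mx = sum(px for px, _ in pts) / len(pts)
    my = sum(py for _, py in pts) / len(pts)
    num = sum((px - mx) * (py - my) for px, py in pts)
    den = sum((px - mx) ** 2 for px, _ in pts)
    return num / den

sample = pareto_sample(alpha=1.5, a=1.0, n=20_000, rng=random.Random(42))
slope = ccdf_slope(sample)   # should be near -1.5
```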
18
Common Analysis Techniques – Data Fitting
- Visually compare two distributions
- Chi-squared tests [AS86, Jain91]
  - Divide the data points into k bins
  - Compute X^2 = sum_{i=1..k} (x_i - E_i)^2 / E_i, where x_i is the observed count and E_i the expected count in bin i
  - If X^2 <= X^2(alpha, k - c), the two distributions are close, where alpha is the significance level and c is the number of estimated parameters for the distribution + 1
  - Needs enough samples
- Kolmogorov-Smirnov tests [AS86, Jain91]
  - Compare two distributions by finding the maximum difference between the two variables' cumulative distribution functions
  - Need to fully specify the distribution
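The chi-squared statistic is a one-liner once the data are binned; accepting or rejecting the fit then depends on comparing X^2 against the critical value for the chosen significance level (from a table or a statistics package). The bin counts below are made up for illustration:

```python
def chi_squared(observed, expected):
    """X^2 = sum over bins of (O_i - E_i)^2 / E_i."""
    assert len(observed) == len(expected)
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Observed request counts per bin vs counts expected under a candidate model
# (illustrative numbers, not from any trace):
x2 = chi_squared([48, 35, 15, 2], [50, 30, 15, 5])
```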
19
Common Analysis Techniques – Data Fitting (Cont.)
- Anderson-Darling test [Ste74]
  - A modification of the Kolmogorov-Smirnov test, giving more weight to the tails
  - A^2 = -N - S, where S = sum_{i=1..N} ((2i - 1)/N) * [ln F(Y_i) + ln(1 - F(Y_{N+1-i}))]
  - If A^2 <= the critical value, the two distributions are similar; otherwise they are not (F is the CDF, and the Y_i are the ordered data)
- Quantile-quantile plots [AS86, Jain91]
  - Compare two distributions by plotting the inverse of the cumulative distribution function F^{-1}(x) for the two variables, and find the best-fitting line
  - If the slope of the line is close to 1 and the y-intercept is close to 0, the two data sets are almost identically distributed
20
Common Analysis Tools
- Scripting languages: VB, Perl, awk, UNIX shell scripts, ...
- Databases: SQL, DB2, ...
- Statistics packages: Matlab, S+, R, SAS, ...
- Writing our own low-level programs: C, C++, C#, ...
21
Challenges in Workload Characterization
- Workload characteristics vary both in space and in time
- Each of the Web components provides a limited perspective on the functioning of the Web
[Diagram: clients reach servers across the Internet through proxies and replicas; each vantage point sees only part of the traffic]
22
Workload Variation
- Varies with measurement points
- Varies with the sites being measured
  - Information servers (news sites), e-commerce servers, query servers, streaming servers, upload servers
  - US vs. Europe, ...
- Varies with the clients being measured
  - Internet clients vs. wireless clients
  - University clients vs. home users
  - US vs. Europe, ...
- Varies in time
  - Day vs. night; weekday vs. weekend
  - Changes with new applications and recent events
  - Evolves over time, ...
23
Different Web Components’ Views
View from clients
- Knows details of client activities, such as requests satisfied by browser caches and client aborts
- Has the ability to record detailed information
View from servers
- Sees most requests to the servers, excluding those satisfied by browser & proxy caches
- May not log detailed information, to ensure fast processing of client requests
View from proxies
- Depends on the proxy's location
- A proxy close to the clients sees requests from a small number of clients to a large number of servers [KR00]
- A proxy close to the servers sees requests from a large number of clients to a small number of servers [KR00]
- Requests satisfied by browser caches, or by proxy caches encountered earlier, will not appear in the logs
24
Part II: Web Workload
- Overview
- Content dynamics
- Access dynamics
- Common pitfalls
- Case studies
25
Content Dynamics
- File types
- File size distribution
- File update patterns: how often files are updated, and how much files are updated
26
File Types
- Text files: HTML, plain text, ...
- Images: JPEG, GIF, bitmap, ...
- Applications: JavaScript, CGI, ASP, PDF, PS, gzip, PPT, ...
- Multimedia files: audio, video
- ...
27
File Size Distribution
Two definitions:
- D1: size of all files on a Web server
- D2: size of all files transferred by a Web server
D1 != D2, because some files can be transferred multiple times or only in part, while other files are never transferred
Studies show that the distribution of file sizes under both definitions exhibits a heavy tail (i.e., P[F > x] ~ x^(-alpha), 0 < alpha < 2)
28
File Update Interval
Varies in time
- Hot, fast-changing events require more frequent updates (e.g., the World Cup)
Varies across sites
- Depends on server update policies & update tools
- Depends on the nature of the content (e.g., university sites have a slower update rate than news sites)
Recent studies
- A study of proxy traces collected at DEC and AT&T in 1996 showed that the rate of file change depended on content type, top-level domain, etc. [DFK+97]
- A study of 1999 MSNBC logs shows that modification history yields a rough predictor of future modification intervals [PQ00]
29
Extent of Change upon Modifications
Varies in time
- Different events trigger different amounts of updates
Varies across sites
- Depends on servers' update policies and update tools
- Depends on the nature of the content
Recent studies
- Studies of 1996 DEC and AT&T proxy logs [MDF+97] and 1999 MSNBC logs [PQ00] show that most file modifications are small => delta encoding can be very useful
30
Part II: Web Workload
- Motivation
- Limitations of workload measurements
- Content dynamics
- Access dynamics
- Common pitfalls
- Case studies
31
Access Dynamics
- File popularity distribution
- Temporal stability
- Spatial locality
- User request arrivals & durations
32
Document Popularity
Web requests follow a Zipf-like distribution
- Request frequency is proportional to 1/i^alpha, where i is a document's popularity ranking
- The value of alpha depends on the point of measurement:
  - Between 0.6 and 1 for client traces and proxy traces
  - Close to or larger than 1 for server traces [ABC+96, PQ00]
- The value of alpha varies over time (e.g., it is larger during hot events)
[Chart: measured alpha values for clients/proxies, less popular servers, and MSNBC]
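A quick way to see what alpha means in practice: with request frequency proportional to 1/i^alpha, compute the share of requests absorbed by the most popular documents for a proxy-like and a server-like alpha (the document count and the two alpha values are illustrative, not from any trace):

```python
def top_fraction(alpha, n_docs, top_frac):
    """Fraction of requests going to the most popular top_frac of documents."""
    weights = [1.0 / i ** alpha for i in range(1, n_docs + 1)]
    k = int(n_docs * top_frac)
    return sum(weights[:k]) / sum(weights)

share_proxy = top_fraction(alpha=0.8, n_docs=10_000, top_frac=0.10)   # proxy-like
share_server = top_fraction(alpha=1.5, n_docs=10_000, top_frac=0.10)  # server-like
```

A larger alpha concentrates requests on the top documents, which is why server-side caching of the hottest files pays off.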
33
Impact of the alpha value
- A larger alpha means accesses are more concentrated on popular documents => caching is more beneficial
- 90% of the accesses are accounted for by:
  - The top 36% of files in proxy traces [BCF+99, PQ00]
  - The top 10% of files in the small departmental server logs reported in [AW96]
  - The top 2-4% of files in MSNBC traces
[Figure: cumulative percentage of requests vs. percentage of documents (sorted by popularity), for the 12/17/98 and 08/01/99 server traces and the 10/06/99 proxy traces]
34
Temporal Stability Metrics
- Coarse-grained: the likely duration that a set of currently popular files remains popular
  - E.g., the overlap between the sets of popular documents on day 1 and day 2
- Fine-grained: how soon a requested file will be requested again
  - E.g., LRU stack distance [ABC+96]
  - Example: with stack [File 5, File 4, File 3, File 2, File 1] (top first), a request for File 2 finds it at depth 4 (stack distance = 4), and the stack becomes [File 2, File 5, File 4, File 3, File 1]
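The stack distance in the example above can be computed for a whole reference stream with a simple simulated stack; a minimal sketch:

```python
def lru_stack_distances(refs):
    """LRU stack distance per reference: the depth (1 = top) at which the
    item is found, after which it moves to the top; first references get
    infinite distance."""
    stack, out = [], []
    for r in refs:
        if r in stack:
            d = stack.index(r) + 1      # depth from the top, 1-based
            stack.remove(r)
        else:
            d = float("inf")            # first reference to this item
        out.append(d)
        stack.insert(0, r)              # move/push to the top of the stack
    return out

# Reproduce the slide's example: after referencing files 1..5 the stack is
# [5, 4, 3, 2, 1] (top first); the final request for file 2 has distance 4.
dists = lru_stack_distances([1, 2, 3, 4, 5, 2])
```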
35
Spatial Locality
Refers to whether users in the same geographical location, or at the same organization, tend to request a similar set of content
- E.g., compare the degree to which requests are shared locally
36
Spatial Locality (Cont.)
[Figures: fraction of requests shared vs. domain ID, on a normal day and during a hot event, comparing domain-based clustering with random clustering]
Domain membership is significant except when there is a "hot" event of global interest
37
User Request Arrivals & Duration
User workload at three levels:
- Session: a consecutive series of requests from a user to a Web site
- Click: a user action to request a page, submit a form, etc.
- Request: each click generates one or more HTTP requests
Characteristics:
- Session duration: exponential distribution [LNJV99, KR01]
- Heavy-tailed distributions [KR01]: # clicks in a session (most in the range of 4-6 [Mah97]); # embedded references in a Web page
- Think time: time between clicks
- Active time: time to download a Web page and its embedded images
38
Common Pitfalls
- Believing that trace analyses are all about writing scripts & plotting nice graphs. The real challenges are:
  - Trace collection: where to monitor, how to collect (e.g., efficiency, privacy, accuracy)
  - Identifying important metrics, and understanding why they are important
  - Sound measurement requires discipline [Pax97]
  - Dealing with errors and outliers
  - Drawing implications from the data analyses
- Not understanding the limitations of the traces
  - There are no representative traces: workloads change in time and in space
  - Try to diversify data sets (e.g., collect traces at different places and different sites) before jumping to conclusions
- Drawing inferences beyond what the data show
39
Part II: Web Workload
- Motivation
- Limitations of workload measurements
- Content dynamics
- Access dynamics
- Common pitfalls
- Case studies
  - Boston University client log study
  - UW proxy log study
  - MSNBC server log study
  - Popular mobile server log study
40
Case Study I: BU Client Log Study
Overview
- One of the few client log studies
- Analyzes clients' browsing patterns and their impact on network traffic [CBC95]
Approaches
- Trace collection: modify Mosaic and distribute it to machines in the CS Dept. at Boston Univ. to collect client traces in 1995
- Log format: <client machine, request time, user id, URI, document size, retrieval time>
- Data analyses:
  - Distribution of document size and document popularity
  - Relationship between retrieval latency and response size
  - Implications for caching strategies
41
Major Findings
- Power-law distributions:
  - Distribution of document sizes
  - Distribution of user requests for documents
  - # requests to documents as a function of their popularity
- Caching strategies should take document size into account (i.e., give preference to smaller documents)
42
Case Study II: UW Proxy Log Study
Overview
- Proxy traces collected at the University of Washington
Approaches [WVS+99a, WVS+99b]
- Trace collection: deploy a passive network sniffer between the Univ. of Washington and the rest of the Internet in May 1999
- Set well-defined objectives:
  - Understand the extent of document sharing within an organization and across different organizations
  - Understand the performance benefit of cooperative proxy caching
43
Major Findings
- Members of an organization are more likely to request the same documents than a random set of clients
- Most popular documents are globally popular
- Cooperative caching is most beneficial for small organizations
- Cooperative caching among large organizations yields only minor improvement, if any
44
Case Study III: MSNBC Server Log Study
Overview of the MSNBC server site
- A large news site
- Server cluster with 40 nodes
- 25 million accesses a day (HTML content alone)
- Period studied: Aug. - Oct. 1999, plus Dec. 17, 1998 (a flash crowd)
45
Approaches
Trace collection
- HTTP access logs
- Content Replication System (CRS) logs
- HTML content logs
Data analyses
- Content dynamics
  - How often are files modified? How can the modification interval be predicted?
  - How much does a file change upon modification?
- Access dynamics
  - Document popularity, temporal stability, spatial locality
  - Correlation between document age and popularity
46
Major Findings
Content dynamics
- Modification history is a rough guide for setting TTLs, but an alternative mechanism (e.g., callback-based invalidation) is needed as backup
- Frequent but minimal file modifications => delta encoding
Access dynamics
- The set of popular files remains stable for days => push/prefetch previously hot data that have undergone modifications
- Domain membership has a significant bearing on client accesses, except during a flash crowd of global interest => it makes sense to have a proxy cache per organization
- Zipf-like distribution of file popularity, but with a much larger alpha than at proxies => potential for reverse caching and replication
47
Case Study IV: Popular Mobile Server Log
Overview of a popular commercial Web site for mobile clients
- Content: news, weather, stock quotes, email, yellow pages, travel reservations, entertainment, etc.
- Services: notification and browse
- Period studied:
  - 3.25 million notifications in Aug. 20 - 26, 2000
  - 33 million browse requests in Aug. 15 - 26, 2000
48
Approaches
Analyze by user category
- Cellular users: browse the Web in real time using cellular technologies
- Offline users: download content onto their PDAs for later (offline) browsing, e.g., with AvantGo
- Desktop users: sign up for services and specify preferences
Analyze by Web service
- Browse
- Notifications
Use a SQL database to manage the data
49
Major Findings
Notification services
- The popularity of notification messages follows a Zipf-like distribution, with the top 1% of notification objects responsible for 54-64% of total messages => multicast notifications
- Exhibit geographical locality => useful to provide localized notification services
Browse services
- 0.1% - 0.5% of queries account for 90% of requests => cache the results of popular queries
- The set of popular queries remains stable => cache a stable set of queries, or optimize queries based on a stable workload
Correlation between the two services
- The correlation is limited => influences the design of pricing plans
50
Tutorial Outline
- Background
- Web workload
- Performance diagnosis
- Applications of traces
51
Part III: Performance Diagnosis
- Overview of performance diagnosis
- Inferring the causes of high end-to-end delay in Web transfers [BC00]
- Inferring the causes of high end-to-end loss rates in Web transfers [CDH+99, DPP+01, NC01, PQ02, PQW02]
52
Overview of Performance Diagnosis
Goal: determine trouble spot locations
Metrics of interest
- Delay, loss rate, raw bandwidth, available bandwidth, traffic rate
Why interesting
- Resolve the trouble spots
- Server selection
- Placement of mirror servers
[Diagram: a Web server reached across multiple ISPs (Sprint, AT&T, UUNET, MCI, Qwest, AOL, Earthlink); a client wonders "Why so slow?"]
53
Finding the Sources of Delays
Goal
- Why is my Web transfer slow? Is it because of the server, the network, or the client?
Sources of delay in a Web transfer
- DNS lookup
- Server delays
- Client delays
- Network delays
  - Propagation delays
  - Queuing delays
  - Delays introduced by packet losses (e.g., signaled by the fast retransmit mechanism or TCP timeouts)
54
TCPEval Tool
- Inputs: tcpdump packet traces taken at the communicating Web server and client
- Generates a variety of statistics for file transactions: file and packet transfer latencies, packet drop characteristics, and packet and byte counts per unit time
- Generates both timeline and sequence plots for transactions
- Generates critical path profiles and statistics for transactions
55
Critical Path Analysis Tool [BC00]
[Diagram: the data flow between client and server, and the critical path through it; the critical path decomposes total latency into network delays, server delays, client delays, and network delays due to packet loss]
56
Finding Sources of Packet Losses
Goal
- Identify lossy links
[Diagram: a server tree with links l1 - l8 leading to clients over paths p1 - p5]
Each path's success rate is the product of the success rates of its links:
(1 - l1) * (1 - l2) * (1 - l4) = (1 - p1)
(1 - l1) * (1 - l2) * (1 - l5) = (1 - p2)
...
(1 - l1) * (1 - l3) * (1 - l8) = (1 - p5)
Challenges
- An under-constrained system of equations
- Measurement errors
57
Approaches
Active probing
- Probing: multicast probes, striped unicast probes
- Technique: Expectation Maximization (EM) [CDH+99, DPP+01]
  - A numerical algorithm to compute the theta that maximizes P(D|theta), where D are the observations and theta is the ensemble of link loss rates
58
Approaches (Cont.)
Passive monitoring
- Random sampling
  - Randomly sample the solution space, and draw conclusions based on the samples
  - Akin to Monte Carlo sampling
- Linear optimization
  - Determine a unique solution by optimizing an objective function
- Gibbs sampling
  - Determine P(theta|D) by drawing samples, where theta is the ensemble of loss rates of the links in the network, and D is the observed packet transmissions and losses at the clients
- EM
  - A numerical algorithm to compute the theta that maximizes P(D|theta)
59
Other Performance Studies using Web traces
- Characterizing Internet performance (e.g., spatial & temporal locality) [BSS+97]
- Studying the behavior of TCP during Web transfers [BPS+98]
60
Tutorial Outline
- Background
- Web workload
- Performance diagnosis
- Applications of traces
- Bibliography
61
Part IV: Applications of Traces
- Synthetic workload generation
- Cache design
  - Cache replacement policies [CI97, BCF+99]
  - Cache consistency algorithms [LC97, YBS99, YAD+01]
  - Cooperative caching or not [WVS+99]
  - Cache infrastructure
- Pre-fetching algorithms [CB98, FJC+99]
- Placement of Web proxies/replicas [QPV01]
- Other optimizations
  - Improving TCP for Web transfers [Mah97, PK98, ZQK00]
  - Concurrent downloads, pipelining, compression, ...
- ...
62
Synthetic Workload Generation
Generate user requests
- Generate user sessions using a Poisson arrival process
- For each user session, determine the # clicks using a Pareto distribution
- Assign each click to a request for a Web page, while making sure that:
  - The popularity distribution of files follows a Zipf-like distribution [BC98]
  - The temporal locality of successive requests for the same resource is captured
- Generate the next click from the same user with a think time following a Pareto distribution
63
Synthetic Workload Generation (Cont.)
Generate Web pages
- Determine the number of Web pages
- Generate the size of each Web page using a log-normal distribution
- Associate each page with some number of embedded pages using an empirical (heavy-tailed) distribution
- Generate file modification events
Examples of generators
- WebBench [Wbe], WebStone [TS95], SURGE [BC98], SPECweb99 [SP99], Web Polygraph [WP], ...
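The recipe above can be sketched end to end: Poisson session arrivals, a Pareto-distributed number of clicks per session, Zipf-like page selection, and Pareto think times. All parameter values below are illustrative, not taken from any of the cited generators:

```python
import bisect
import random

def pareto(alpha, a, rng):
    # Inverse-CDF sampling from F(x) = 1 - (a/x)^alpha
    return a / rng.random() ** (1.0 / alpha)

def make_zipf_sampler(n_pages, alpha):
    """Return a sampler that picks a page rank i with P(i) ~ 1/i^alpha."""
    weights = [1.0 / i ** alpha for i in range(1, n_pages + 1)]
    total = sum(weights)
    cdf, acc = [], 0.0
    for w in weights:
        acc += w
        cdf.append(acc / total)
    return lambda rng: bisect.bisect_left(cdf, rng.random()) + 1  # 1-based rank

def generate_sessions(n_sessions, arrival_rate, rng):
    zipf_page = make_zipf_sampler(n_pages=10_000, alpha=1.0)
    t, sessions = 0.0, []
    for _ in range(n_sessions):
        t += rng.expovariate(arrival_rate)               # Poisson arrivals
        n_clicks = int(pareto(alpha=1.5, a=4, rng=rng))  # heavy-tailed # clicks
        click_t, session = t, []
        for _ in range(n_clicks):
            session.append((click_t, zipf_page(rng)))    # (time, page rank)
            click_t += pareto(alpha=1.5, a=1.0, rng=rng) # think time
        sessions.append(session)
    return sessions

sessions = generate_sessions(1000, arrival_rate=0.5, rng=random.Random(1))
```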
64
Cache Replacement Policies
Problem formulation
- Given a fixed-size cache, which pages should be evicted, once the cache is full, to maximize the hit ratio?
- Hit ratio: fraction of requests satisfied by the cache; byte hit ratio: fraction of the total size of the requested data satisfied by the cache
Factors to consider
- Request frequency
- Modification frequency
- Benefit of caching: reduction in latency & bandwidth
- Cost of caching: storage
- Caveat: NOT all hits are equal; hit ratios do NOT map directly to performance improvement
65
Cache Replacement Policies (Cont.)
Approaches
- Least recently used (LRU)
- Least frequently used (LFU)
  - Perfect: maintain counters for all pages ever seen
  - In-cache: maintain counters only for pages currently in the cache
- GreedyDual-Size [CI97]
  - Assign a utility value to each object, and replace the one with the lowest utility
Use of traces
- Evaluate the algorithms using trace-driven simulations
- Analytically derive the hit ratios for different replacement policies based on a workload model
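A trace-driven comparison of two of the policies above, as a sketch: LRU, and GreedyDual-Size with utility cost/size aged by an inflation value L (following the idea in [CI97]). The tiny trace and file sizes are made up so that GDS's preference for keeping small hot files over one large cold file shows up:

```python
from collections import OrderedDict

def lru_hit_ratio(trace, sizes, capacity):
    cache, used, hits = OrderedDict(), 0, 0   # doc -> size, in recency order
    for doc in trace:
        if doc in cache:
            cache.move_to_end(doc)
            hits += 1
            continue
        while used + sizes[doc] > capacity and cache:
            _, s = cache.popitem(last=False)  # evict least recently used
            used -= s
        if sizes[doc] <= capacity:
            cache[doc] = sizes[doc]
            used += sizes[doc]
    return hits / len(trace)

def gds_hit_ratio(trace, sizes, capacity, cost=lambda d: 1.0):
    cache, used, hits, L = {}, 0, 0, 0.0      # doc -> (H value, size)
    for doc in trace:
        if doc in cache:
            hits += 1
            cache[doc] = (L + cost(doc) / sizes[doc], sizes[doc])
            continue
        while used + sizes[doc] > capacity and cache:
            victim = min(cache, key=lambda d: cache[d][0])
            L = cache[victim][0]              # age the clock to the evicted H
            used -= cache.pop(victim)[1]
        if sizes[doc] <= capacity:
            cache[doc] = (L + cost(doc) / sizes[doc], sizes[doc])
            used += sizes[doc]
    return hits / len(trace)

sizes = {"a": 1, "b": 1, "c": 9}              # two small hot files, one big
trace = ["a", "b", "c", "b", "c", "b", "c", "a", "b"]
lru = lru_hit_ratio(trace, sizes, capacity=10)
gds = gds_hit_ratio(trace, sizes, capacity=10)
```

On this trace both policies score hits on the alternating b/c run, but at the final requests GDS has kept the small files and LRU has kept the large one, so GDS ends with the higher hit ratio.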
66
References [AS86] R. B. D’Agostino and M. A. Stephens. Goodness-of-Fit Techniques. Marcel
Dekker, New York, NY 1986. [ABC+96] Virgilio Almeida, Azer Bestavros, Mark Crovella and Adriana de Oliveria.
Characterizing reference locality in the WWW. In Proceedings of 1996 International Conference on Parallel and Distributed Information Systems (PDIS'96), December 1996.
[ABQ01] A. Adya, P. Bahl, and L. Qiu. Analyzing Browse Patterns of Mobile Clients. In Proc. of SIGCOMM Measurement Workshop, Nov. 2001.
[ABQ02] A. Adya, P. Bahl, and L. Qiu. Characterizing Alert and Browse Services for Mobile Clients. In Proc. of USENIX, Jun. 2002.
[AL01] P. Albitz, and C. Liu. DNS and BIND (4th Edition), O’Reilly & Associates, Apr. 2001.
[AW97] M. Arlitt and C. Williamson. Internet Web Servers: Workload Characterization and Performance Implications. IEEE/ACM Transactions on Networking, Vol. 5, No. 5, pp. 631-645, October 1997.
[BC98] P. Barford and M. Crovella. Generating representative workloads for network and server performance evaluation. In Proc. of SIGMETRICS, 1998.
67
References (Cont.) [BBC+98] P. Barford, A. Bestavros, M. Crovella, and A. Bradley. Changes in
Web Client Access Patterns: Characteristics and Caching Implications, Special Issue on World Wide Web Characterization and Performance Evaluation; World Wide Web Journal, December 1998.
[BCF+99] L. Breslau, P. Cao, L. Fan, G. Phillips, and S. Shenker. Web Caching and Zipf-like Distributions: Evidence and Implications. In Proc. of INFOCOM, Mar. 1999.
[BC00] P. Barford and M. Crovella. Critical Path Analysis of TCP Transactions. In Proc. of ACM SIGCOMM, Aug. 2000.
[BLFF96] T. Berners-Lee, R. Fielding, and H. Frystyk. Hypertext Transfer Protocol -- HTTP/1.0. RFC 1945, May 1996.
[BPS+98] H. Balakrishnan, V. N. Padmanabhan, S. Seshan, M. Stemm and R. H. Katz. TCP Behavior of a Busy Internet Server: Analysis and Improvements. In Proc. IEEE Infocom, San Francisco, CA, USA, March 1998.
[BSS+97] H. Balakrishnan, S. Seshan, M. Stemm, and R. H. Katz. Analyzing Stability in Wide-Area Network Performance. In Proc. of SIGMETRICS, Jun. 1997.
68
References (Cont.) [CDH+99] R. Caceres, N. G. Duffield, J. Horowitz, D. Towsley, T. Bu. Multicast-
Based Inference of Network Internal Loss Characteristics. In Proc. Infocom, Mar. 1999.
[CB98] M. Crovella and P. Barford. The network effects of prefetching. In Proc. of INFOCOM, 1998.
[CBC95] C. R. Cunha, A. Bestavros, and M. E. Crovella. Characteristics of WWW client-based traces. Technical Report BU-CS-95-010, CS Dept., Boston University, 1995.
[CI97] P. Cao and S. Irani. Cost-Aware WWW proxy caching algorithms. In Proc. of USITS, Dec. 1997.
[DFK+97] F. Douglis, A. Feldmann, B. Krishnamurthy, and J. Mogul. Rate of change and other metrics: a live study of the World Wide Web. In Proc. of USITS, 1997.
[DPP+01] N. G. Duffield, F. Lo Presti, V. Paxson, and D. Towsley. Inferring Link Loss Using Striped Unicast Probes. In Proc. Infocom, Apr. 2001.
[FCD+99] A. Feldmann, R. Caceres, F. Douglis, and M. Rabinovich. Performance of Web Proxy Caching in heterogeneous bandwidth environments. In Proc. of INFOCOM, March 1999.
69
References (Cont.) [FJC+99] L. Fan, Q. Jacobson, P. Cao and W. Lin. Web Prefetching Between Low-
Bandwidth Clients and Proxies: Potential and Performance. In Proc. of SIGMETRICS, 1999.
[FCT+02] Y. Fu, L. Cherkasova, W. Tang, and A. Vahdat. EtE: Passive End-to-End Internet Service Performance Monitoring. In Proc. of USENIX, Jun. 2002.
[GMF+99] J. Gettys, J. Mogul, H. Frystyk, L. Masinter, P. Leach, T. Berners-Lee. Hypertext Transfer Protocol – HTTP 1.1. RFC 2616, Jun. 1999.
[JK88] V. Jacobson, M. J. Karels. Congestion Avoidance and Control. In Proc. SIGCOMM, Aug. 1988.
[JJK+01] S. Jamin, C. Jin, A. R. Kurc, D. Raz, and Y. Shavitt. Constrained Mirror Placement on the Internet. In Proc. of INFOCOM, Apr. 2001.
[Jain91] R. Jain. The Art of Computer Systems Performance Analysis. John Wiley and Sons, 1991.
[Kel02] T. Kelly. Thin-Client Web Access Patterns: Measurements from a Cache-Busting Proxy. Computer Communications, Vol. 25, No. 4 (March 2002), pages 357-366.
[KR01] B. Krishnamurthy and J. Rexford. Web Protocols and Practice, HTTP/1.1, Networking Protocols, Caching, and Traffic Measurement. Addison-Wesley, May 2001.
70
References (Cont.) [LC97] C. Liu and P. Cao. Maintaining Strong Cache Consistency in the World-
Wide Web. In Proc. of ICDCS'97, pp. 12-21, May 1997. [LNJV99] Z. Liu, N. Niclausse, and C. Jalpa-Villaneuva. Web Traffic Modeling
and Performance Comparison Between HTTP 1.0 and HTTP 1.1. In Erol Gelenbe, editor, System Performance Evaluation: Methodologies and Applications. CRC Press, Aug. 1999.
[Mah97] Bruce Mah. An empirical model of HTTP network traffic. In Proc. of INFOCOM, April 1997.
[Mogul95] Jeffrey C. Mogul. The Case for Persistent-Connection HTTP. In Proc. SIGCOMM '95, pages 299-313. Cambridge, MA, August, 1995.
[MDF+97] J. C. Mogul, F. Douglis, A. Feldmann, and B. Krishnamurthy. Potential benefits of delta-encoding and data compression for HTTP, In Proc. of SIGCOMM, September 1997.
[NC01] R. Nowak and M. Coates. Unicast Network Tomography using the EM algorithm. Submitted to IEEE Transactions on Information Theory, Dec. 2001
[Pad95] V. N. Padmanabhan. Improving World Wide Web Latency. Technical Report UCB/CSD-95-875, University of California, Berkeley, May 1995.
71
References (Cont.) [PQ00] V. N. Padmanabhan and L. Qiu. The Content and Access Dynamics of a
Busy Web Server. In Proc. of SIGCOMM, Aug. 2000. [PQ02] V. N. Padmanabhan and L. Qiu. Network Tomography using Passive End-
to-End Measurements, DIMACS on Internet and WWW Measurement, Mapping and Modeling, Feb. 2002.
[PQW02] V. N. Padmanabhan, L. Qiu, and H. J. Wang. Passive Network Tomography using Bayesian Inference. Internet Measurement Workshop, Nov. 2002.
[QPV01] L. Qiu, V. N. Padmanabhan, and G. M. Voelker. On the Placement of Web Server Replicas. In Proc. of INFOCOM, Apr. 2001.
[SP99] SPECWeb99 Benchmark. http://www.spec.org/osg/web99/. [Pax98] V. Paxson. An Introduction to Internet Measurement and Modeling.
SIGCOMM'98 tutorial, August 1998. [Ste74] M. A. Stephens. EDF Statistics for Goodness of Fit and Some Comparisons.
Journal of the American Statistical Association, Vol. 69, pp. 730 – 737. [TS95] G. Trent and M. Sake. WebStone: The First Generation in HTTP Server
Benchmarking, Feb. 1995. http://www.mindcraft.com/webstone/paper.html.
72
References (Cont.) [Wbe] Webbench. http://www.zdnet.com/etestinglabs
/stories/benchmarks/0,8829,2326243,00.html. [WP] Web Polygraph: Proxy performance benchmark. http://polygraph.ircache.net/. [WVS+99a] A. Wolman, G. Voelker, N. Sharma, N. Cardwell, M. Brown, T.
Landray,D. Pinnel, A. Karlin, and H. Levy. Organization-Based Analysis of Web-Object Sharing and Caching. In Proc. of the Second USENIX Symposium on Internet Technologies and Systems, Boulder, CO, October 1999.
[WVS+99b] A. Wolman, G. M. Voelker, N. Sharma, N. Cardwell, A. Karlin, and H. M. Levy. On the scale and performance of cooperative Web proxy caching. In Proc. of the 17th ACM Symposium on Operating Systems Principles, Kiawah Island, SC, Dec. 1999.
[YAD01] J. Yin, L. Alvisi, M. Dahlin, and A. Iyengar. Engineering server-driven consistency for large scale dynamic services. In Proc. of WWW, May 2001.
[YBS99] H. Yu, L. Breslau, and S. Shenker. A Scalable Web Cache Consistency Architecture. In Proc. of SIGCOMM, August 1999.
73
Acknowledgement
Thanks to Alec Wolman for his helpful comments.
74
Thank you!
\\valleyview\pub\internal-web-perf.ppt