1

Increasing the Scalability of Dynamic Web Applications

Thesis Defense

Amit Manjhi March 4, 2008

Thesis committee: Bruce Maggs (co-chair) Todd Mowry (co-chair) Chris Olston (co-chair) Mahadev Satyanarayanan Mike Franklin (UC Berkeley)

School of Computer Science, Carnegie Mellon

2

Typical Architecture of Dynamic Web Applications

[Diagram: users send a request over the Internet to the home server (web server, app server, database); the home server executes code and accesses the database, and a response is returned to the users.]

Web applications need to provision for variable and unpredictable load

3

An Example of Unpredictable Load

CNN, NY Times, and ABC News were unavailable from 9-10 AM (Eastern Time)

Applications face a dilemma: how many resources should they provision?

Need on-demand scalability

[Figure: daily page views (in millions) for CNN.com]

4

Content Delivery Networks

Users

• Scales central web server
• Works well for static content

CDN nodes

Internet

1. Large infrastructure: handles load spikes

2. Shared infrastructure: charges on a usage basis

5

CDN Application Services

Users

CDN nodes

Database server is still a bottleneck

Internet

6

A distributed architecture still has the database as a bottleneck

users:

home server database

Content Delivery Network

7

Methods to Scale the Database Component

In-house database scalability [DBCache, DBProxy, MTCache, NEC Cache Portal]: Not economical

Database outsourcing: Database as a service [Hacigumus+ ICDE ’02, Hacigumus+ SIGMOD ’02]: Applications have to cede control of data

Database outsourcing: Commercial efforts [Amazon SimpleDB, Longjump, Zoho Creator]
• Useful only for simple applications
• Must trust the provider

8

Secondary Goals

Generate response as the application developer intended [Ramaswamy+ WWW ’04, Challenger+ INFOCOM ’00]

Execute code written for the traditional architecture [Yang+ ICDE ’06, WWW ’07]

Must work on three benchmark applications: AUCTION (ebay.com), BBOARD (slashdot.org), BOOKSTORE (amazon.com)

9

Our Approach

Database Scalability Service (DBSS): Shared infrastructure that caches applications’ data [Olston, Manjhi+ CIDR ’05, Manjhi+ SIGMOD ’06, Manjhi+ ICDE ’07]

Apply benefits of CDN to scaling the database

1. Large infrastructure: handles load spikes

2. Shared infrastructure: charges on a usage basis

10

Database Scalability Service Architecture

Database Scalability Service (DBSS)

home server databases

users:

Request

Database queries and updates

Query results

Data

Content Delivery Network

Response

Database queries and updates

• Data security concerns

• Reducing user latency

11

Thesis Statement

It is possible to economically scale dynamic Web applications while respecting their security concerns

12

Outline

Need for on-demand scalability

Guaranteeing security in a DBSS setting
• Security-scalability tradeoff
• Security without hurting scalability
• General framework to manage the tradeoff

Reducing user latency in a DBSS setting

Contributions

13

Guaranteeing Security in a DBSS Setting

All data passing through the DBSS can be encrypted: queries, updates, and query results

Goal: limit DBSS from observing an application’s data

Database Scalability Service

Content Delivery Network

Home server handles updates directly

DBSS caches query results, kept consistent by invalidation
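The caching-with-invalidation idea on this slide can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the DBSS implementation: `CacheNode`, `run_on_home_server`, and `should_invalidate` are hypothetical names, and the cache is simply keyed by the query string.

```python
class CacheNode:
    """Hypothetical cache node: stores query results, drops them on invalidation."""

    def __init__(self, run_on_home_server):
        self._run = run_on_home_server   # callable: query string -> result
        self._cache = {}
        self.home_server_hits = 0        # how often the home server was contacted

    def query(self, q):
        # Serve from the cache when possible; otherwise ask the home server.
        if q not in self._cache:
            self.home_server_hits += 1
            self._cache[q] = self._run(q)
        return self._cache[q]

    def on_update(self, should_invalidate):
        # should_invalidate: callable(query string) -> bool, decided per update.
        for q in [q for q in self._cache if should_invalidate(q)]:
            del self._cache[q]
```

How precise `should_invalidate` can be depends on how much of the query, update, and result the node is allowed to see, which is the tradeoff the following slides discuss.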

14

A Simple Example

Home server database table: comments (id, rating, story), with rows (11, 1, Intel) and (15, 1, Intel)

Q: SELECT id FROM comments WHERE story="Intel" AND rating>0

The DBSS node starts empty, forwards Q to the home server, and caches the result Q: id=11,15

U: UPDATE comments SET rating=2 WHERE id=15

After U, the rows are (11, 1, Intel) and (15, 2, Intel)

Nothing is encrypted: no invalidations

Results are encrypted: the cached result for Q is invalidated

More encryption can lead to more invalidations

15

Security-Scalability Space for Query Result Caching

[Figure, not to scale, just for illustration: scalability versus security (from "No" to "Full"). Labeled points: "No encryption" and "Encrypt everything" (maximum security, read-only scalability).]

Easy to either get good scalability or good security

16

Providing Scalability While Guaranteeing Security

More encryption: conservative invalidation (favors security)

Less encryption: precise invalidation (favors scalability)

Security-scalability tradeoff

When updates occur, DBSS must decide what to invalidate

Applications face a dilemma in what to encrypt (secure)

17

Outline

Need for on-demand scalability

Guaranteeing security in a DBSS setting
• Security-scalability tradeoff
• Security without hurting scalability
• General framework to manage the tradeoff

Reducing user latency in a DBSS setting

Contributions

18

Key Insight: Arbitrary Queries and Updates Not Possible

function get_toy_id ($toy_name) {
    $template := "SELECT toy_id FROM toys WHERE toy_name=?";
    $query := attach_to_template ($template, $toy_name);
    $result := execute ($query);
}

Given templates: an algorithm for statically identifying data that does not help in invalidation (important contribution)

19

Examples of Data Not Useful for Invalidation

Example 1:
SELECT toy_id FROM toys WHERE toy_name=?
SELECT toy_name FROM toys WHERE toy_id=?
Any data passing through the DBSS is not useful

Example 2:
SELECT toy_id FROM toys WHERE toy_name=?
DELETE FROM toys WHERE toy_id=?
Query parameters are not useful for invalidation
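For the template pair "SELECT toy_id FROM toys WHERE toy_name=?" and "DELETE FROM toys WHERE toy_id=?", the claim that the query parameter does not help invalidation can be checked by brute force on a tiny domain. The sketch below is only an illustration and is not the static analysis algorithm from the thesis; the two-toy universe and the names `all_databases`, `run_query`, and `must_invalidate` are assumptions made for the example.

```python
from itertools import product

IDS = (1, 2)          # assumed tiny universe of toy_ids
NAMES = ("a", "b")    # assumed tiny universe of toy_names

def all_databases():
    # Each toy_id is either absent (None) or present with one name.
    for names in product((None,) + NAMES, repeat=len(IDS)):
        yield {i: n for i, n in zip(IDS, names) if n is not None}

def run_query(db, name):
    # SELECT toy_id FROM toys WHERE toy_name=?
    return frozenset(i for i, n in db.items() if n == name)

def must_invalidate(deleted_id, seen_param=None, seen_result=None):
    """True if some database consistent with what the cache node saw
    would return a different result after DELETE ... WHERE toy_id=deleted_id."""
    for db in all_databases():
        for name in NAMES:
            if seen_param is not None and name != seen_param:
                continue
            before = run_query(db, name)
            if seen_result is not None and before != seen_result:
                continue
            after_db = {i: n for i, n in db.items() if i != deleted_id}
            if run_query(after_db, name) != before:
                return True
    return False
```

In this toy setting, `must_invalidate(d, seen_result=r)` and `must_invalidate(d, seen_param=p, seen_result=r)` always agree, so exposing the parameter never changes the decision, whereas exposing the result does.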

20

Security without Hurting Scalability

Scalability Conscious Security Approach [Manjhi+ SIGMOD ’06]

Data not useful for invalidation can be secured “for free” (without hurting scalability)

As a result, the tradeoff has to be managed only over the remaining data

21

Security-Scalability Space for Query Result Caching

[Figure, not to scale, just for illustration: scalability versus security (from "No" to "Full"). Labeled points: "No encryption"; "Encrypt everything" (maximum security, read-only scalability); "SCSA": encrypt data not useful for invalidation [Manjhi+ SIGMOD 06]. The region between these points is marked "Want solutions in this space".]

22

Outline

Need for on-demand scalability

Guaranteeing security in a DBSS setting
• Security-scalability tradeoff
• Security without hurting scalability
• General framework to manage the tradeoff

Reducing user latency in a DBSS setting

Contributions

23

Invalidation Clues: Motivation

#1
SELECT toy_id, price FROM toys WHERE toy_name=?
DELETE FROM toys WHERE toy_id=?
Want to encrypt part of the query result

#2
BULLETIN-BOARD: comments(id, rating, story)
SELECT id FROM comments WHERE story=‘Intel’ AND rating>0
UPDATE comments SET rating=? WHERE id=?
Knowing ‘story’ of the comment helps in invalidation (if the comment’s story is not ‘Intel’, no invalidations)

24

How do invalidation clues work? [Manjhi+ ICDE 07]

[Diagram: the DBSS starts empty and forwards queries and updates to the home server (database). Each query result comes back with a query clue, each update arrives with an update clue, and invalidations are decided from (query clue, update clue).]

Home servers attach query clues to query results and update clues to updates. DBSS uses query and update clues for invalidation.

25

Security-Scalability Space for Query Result Caching

[Figure, not to scale, just for illustration: scalability versus security (from "No" to "Full"). Labeled points: "No encryption"; "Encrypt everything"; "SCSA": encrypt data not useful for invalidation [Manjhi+ SIGMOD 06]; "Database" (code-analysis security, maximum scalability). The region between them is marked "Want solutions in this space".]

Clues offer a fine-grained tradeoff

26

Minimizing Invalidations in the Clues Framework

What is the “most precise” invalidation that can be done? It may need more data than what passes through the DBSS

SELECT id FROM comments WHERE story=? AND rating>?
UPDATE comments SET rating=? WHERE id=?

Invalidation logic on an update with id ‘5’: Is comment id ‘5’ present in the result?
• Yes: invalidation decision is based on rating values
• No: based on rating values, need to know story

Database Inspection Strategy: Invalidate as if using the database
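The invalidation logic for this query/update template pair can be written out as a small Python function. This is a sketch under stated assumptions rather than the thesis implementation: the function name, its parameters, and the use of `None` for "story not supplied" are all hypothetical.

```python
def must_invalidate(result_ids, threshold, query_story,
                    updated_id, new_rating, updated_story=None):
    """Cached query: SELECT id FROM comments WHERE story=? AND rating>?
    Update:       UPDATE comments SET rating=? WHERE id=?
    updated_story is the extra database information (None if not supplied)."""
    qualifies_after = new_rating > threshold
    if updated_id in result_ids:
        # The story already matches, so only the new rating decides.
        return not qualifies_after
    if not qualifies_after:
        # The comment was not in the result and still fails the rating test.
        return False
    if updated_story is None:
        # Story unknown: invalidate conservatively.
        return True
    # The comment enters the result only if its story matches the query.
    return updated_story == query_story
```

For a cached result {11, 15} with story "Intel" and threshold 0, an update of comment 5 to rating 2 forces an invalidation when the story is unknown, but not when the story is known to be something other than "Intel".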

27

Database Inspection Strategy and Beyond

SELECT id FROM comments WHERE story=? AND rating>?
UPDATE comments SET rating=? WHERE id=?

On an update, need the story of the comment id being updated

Query clue: auxiliary view (id, story); concerns: 1. Consistency 2. Privacy

OR

Update clue: send story of the comment (on-the-fly)

Opportunistic Strategy: Use database clues only when benefits exceed overhead

28

Methodology of Sample Experiment

[Diagram: Home server, CDN and DBSS, Users; link latencies 5 ms and 100 ms]

Scalability: max # concurrent users with response time less than 2 seconds

Machines on Emulab

29

Scalability Benefits of Clues

[Figure: bar chart of scalability (number of concurrent users supported, axis from 0 to 900) for the benchmark applications Auction, Bboard, and Bookstore. Series: No DBSS; Clues (excl. DB clues); Clues (incl. DB clues); Hybrid.]

1. Factor of 2-5 improvement over using no DBSS

2. Using more clues is not necessarily a win

30

Related Work: View Invalidation

View invalidation strategies: Levy and Sagiv VLDB ’93, Candan+ VLDB ’02, Choi and Luo APWeb ’04

View maintenance: Gupta and Blakeley Information Systems ’95, Quass+ PDIS ’96

Database update clues: Candan+ VLDB ’02

Cheap but conservative invalidator: Satya PODS ’96

Our work:
• compares view-invalidation strategies
• studies database update clues formally

31

Related Work: Privacy

Order-preserving encryption [Agrawal+ SIGMOD ’04]
• Fails under a model where DBSS can pose as a user

Privacy-scalability tradeoff in the “coarseness” of index on encrypted data [Hore+ VLDB ’04]
• Different domain and different objectives

Privacy metrics: k-anonymity [Sweeney IJUFKS ’02], L-diversity [Machanavajjhala+ ICDE ’06], t-closeness [Li+ ICDE ’07]
• The tradeoff does not depend on the privacy metric

32

Managing Security Scalability Tradeoff: Contributions

Identify security-scalability tradeoff

Static analysis of database templates for identifying data not useful for invalidation
• Most data encrypted for free is moderately sensitive

Study “precise” invalidation: database (update) clues
• Using database clues is not always good for scalability, hence the hybrid strategy

Applications can manage tradeoff at a fine granularity

Factor of 2-5 improvement in scalability

33

Outline

Need for on-demand scalability

Guaranteeing security in a DBSS setting
• Security-scalability tradeoff
• Security without hurting scalability
• General framework to manage the tradeoff

Reducing user latency in a DBSS setting

Contributions

34

Contributors to User Latency

Traditional architecture (Web server, App server, Database): a single HTTP request and its response cross the high-latency link

DBSS architecture (CDN, DBSS, Database): multiple database requests cross the high-latency link

35

Sample Web Application Code

function find_comments ($user_id) {
    $template := "SELECT from_id, body FROM comments WHERE to_id=?"
    $query := attach_to_template ($template, $user_id)
    $result := execute ($query)
    foreach ($row in $result)
        print (get_body ($row), get_name (get_id ($row)))
}

(N+1) queries are issued because:

• Convenient for programmers to abstract database values

• No effect on performance in the traditional setting

Found many examples in the benchmark applications

36

Reducing User Latency in a DBSS Setting

Transformations to reduce number of round-trips
1. Group execution of queries: MERGING transformation
2. Overlap execution of queries: NONBLOCKING transformation

[Diagram: a procedural program with embedded SQL (web application code) goes through holistic transformations using src-to-src compilers, yielding a transformed program and SQL (transformed code).]

37

The MERGING Transformation

Names of users who have posted comments about John

1. Find user_ids who have made comments

2. For each user_id, find name of the user

[Diagram: John's request to www.ebay.com is served through the Content Delivery Network and the Database Scalability Service; 1 query followed by N queries cross the high-latency link.]

38

The MERGING Transformation

SELECT from_id, u.name
FROM comments, users u
WHERE from_id = u.id AND to_id = ?

Find names of users who have commented about John

Names of users who have posted comments about John

1. Find user_ids who have made comments

2. For each user_id, find name of the user

Assuming a constant cache hit rate, the number of round-trips to the database decreases by a factor of (N+1)
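The round-trip saving can be seen in a small, self-contained Python sketch using the standard `sqlite3` module. The table contents and the function names `names_application_join` and `names_merged` are made up for illustration; an in-memory SQLite database stands in for the remote home server database, so each `execute` call represents one round-trip.

```python
import sqlite3

def make_db():
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE comments (from_id INTEGER, to_id INTEGER, body TEXT);
        INSERT INTO users VALUES (1, 'John'), (2, 'Ann'), (3, 'Bob');
        INSERT INTO comments VALUES (2, 1, 'great seller'), (3, 1, 'fast shipping');
    """)
    return db

def names_application_join(db, to_id):
    """Original pattern: 1 query for the ids, then 1 query per id."""
    round_trips = 1
    rows = db.execute("SELECT from_id FROM comments WHERE to_id=?", (to_id,)).fetchall()
    names = []
    for (from_id,) in rows:
        round_trips += 1
        row = db.execute("SELECT name FROM users WHERE id=?", (from_id,)).fetchone()
        names.append(row[0])
    return names, round_trips

def names_merged(db, to_id):
    """After MERGING: a single join query, hence a single round-trip."""
    rows = db.execute(
        "SELECT from_id, u.name FROM comments, users u "
        "WHERE from_id = u.id AND to_id = ?", (to_id,)).fetchall()
    return [name for _, name in rows], 1
```

With two comments about John, the application join issues three queries while the merged version issues one, and both return the same set of names.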

39

The NONBLOCKING Transformation

[Diagram: John requests the home page of www.amazon.com through the Content Delivery Network and the Database Scalability Service. The page needs two queries over the high-latency link: 1. Greet user 2. Get names of related books.]

Issue queries concurrently to reduce latency
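Issuing independent queries concurrently can be sketched in Python with a thread pool. This is only an illustration of the idea, not the source-to-source transformation itself: `slow_query` is a hypothetical stand-in that sleeps to mimic a high-latency link, and the labels are placeholders for the two home-page queries.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def slow_query(label, latency_s=0.05):
    """Stand-in for a database query behind a high-latency link."""
    time.sleep(latency_s)
    return f"result of {label}"

def home_page_blocking():
    # Each query waits for the previous one: latencies add up.
    greeting = slow_query("greet user")
    related = slow_query("related books")
    return greeting, related

def home_page_nonblocking():
    # Both queries are in flight at once: latencies overlap.
    with ThreadPoolExecutor(max_workers=2) as pool:
        greeting = pool.submit(slow_query, "greet user")
        related = pool.submit(slow_query, "related books")
        return greeting.result(), related.result()
```

The two versions return the same values; the non-blocking one is only valid because neither query depends on the other's result.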

40

Applicability of the Transformations

Either transformation applies to 25% (Auction), 75% (Bboard), and 50% (Bookstore) of the dynamic runtime interactions

41

BBOARD Application: Impact on Latency

[Figure: average latency in ms for different transformations]

Overall latency decreases by 38%, the DBSS-DB latency decreases by 65%

42

Impact of Latency on Scalability

[Figure: latency versus simultaneous users supported. Scalability is the point where the latency curve crosses the threshold; the reduced latency curve crosses it later, giving improved scalability.]

Reducing latency improves scalability
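Treating scalability as the largest number of users whose latency stays under a threshold, the effect of a lower latency curve can be worked through in Python. The 2000 ms threshold mirrors the 2-second response-time criterion used in the experiments; the linear latency curves in the usage note are invented purely for illustration.

```python
def scalability(latency_ms_at, threshold_ms=2000, max_users=10_000):
    """Largest user count whose latency stays under the threshold.
    Assumes latency_ms_at(users) is non-decreasing in users."""
    supported = 0
    for users in range(1, max_users + 1):
        if latency_ms_at(users) >= threshold_ms:
            break
        supported = users
    return supported
```

With the hypothetical curves `500 + 5 * users` and `300 + 5 * users`, lowering the base latency by 200 ms raises the supported user count from 299 to 339.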

43

Effect of the Transformations on Scalability

[Figure: scalability (number of concurrent users supported)]

44

Effect of the Transformations on Scalability

Applying both transformations yields the best scalability

[Figure: scalability (number of concurrent users supported)]

45

Related Work: MERGING transformation

Cassyopia [HotOS ’03]: cluster system calls
• Preliminary work; in different domain

Hilda [Yang+ WWW ’07], Abacus [Amiri+ ATC ’00]
• Use a custom language

Stored procedures
• Difficult to optimize and cache

Nested query optimization [TODS ’82, SIGMOD ’87], multi-query optimization [SIGMOD ’00]
• Database optimizes instead of compiler

46

Related Work: NONBLOCKING transformation

Use application specific knowledge for prefetching [Brown+ OSDI ’00, Mowry+ OSDI ’96], [Patterson+ SOSP ’95]
• Different domain: No SQL analysis was necessary

Issue prefetches by detecting patterns in misses: page faults [Curewitz+ SIGMOD ’93], web pages [Nanopoulos+ TKDE ’03], file-systems [Kroeger+ ATC ’96]
• Patterns must be established
• Mis-prediction if pattern changes

47

Reducing User Latency in a DBSS Setting: Contributions

Proposed two holistic transformations that

Reduce the #round-trips in accessing the data

Apply in 25% to 75% of the interactions

Improve scalability by over 10% in a DBSS setting

Can be applied automatically by src-to-src compilers

48

Thesis Contributions

Identified and studied the security-scalability tradeoff
• Secured about 75% of data without hurting scalability
• Proposed invalidation clues that provide better tradeoffs

Proposed transformations to reduce user latency
• Improved scalability by 10%

Evaluated all techniques on a prototype DBSS using three benchmark applications
• Overall scalability improved by a factor of 3

49

Thanks!

Questions?

50

Backup Slides

51

Number of requests a website receives is also unpredictable

[Figure: page views/day for CNN.com (in millions)]

CNN, NYtimes, ABCnews unavailable from 9-10 EDT

Source:
1. CNN news release Sept 12, 2001: http://archives.cnn.com/2001/TECH/internet/09/12/attacks.internet/
2. Keynote’s news release Sept 11, 2001: http://www.keynote.com/news_events/releases_2001/091101.html

52

An appealing solution is to use a CDN

[Figure: traffic at CNN.com, showing page views/day (in millions) and page size (in kB); Akamai was used on Election Day]

1. Large infrastructure: handles load spikes

2. Shared infrastructure: charges on a usage basis

Source: http://www.tcsa.org/lisa2001/cnn.txt http://www.akamai.com/en/html/about/press/press479.html

53

CDNs do not provide a way to scale the database component

[Diagram: users send a request to the home server (web server, app server, DB); the home server executes code and accesses the DB, and a response is returned.]

Dynamic content sites are becoming increasingly popular

54

Trusting the Site of Code Execution

Code is executed at a much larger trustworthy company
• Akamai vs. database-scalability-service startup

Code is executed by the application
• Database is the big bottleneck

Code is executed at the end-user’s site
• Trusted computing initiative

55

A Simple Example

Home server database table: toys (toy_id, toy_name), with rows (11, Barbie) and (15, GI Joe)

Q1: SELECT toy_id FROM toys WHERE toy_name=“GI Joe”

The DBSS starts empty, forwards Q1 to the home server, and caches the result Q1: toy_id=15

U1: DELETE FROM toys WHERE toy_id=5

Nothing is encrypted: no invalidations

Results are encrypted: the cached result for Q1 is invalidated

Encryption leads to more invalidations

56

Security-Scalability Tradeoff

Q1: SELECT toy_id FROM toys WHERE toy_name=?

Q2: SELECT qty FROM toys WHERE toy_id=?

Q3: SELECT cust_name FROM customers WHERE cust_id=?

U1: DELETE FROM toys WHERE toy_id=5

Strategy | Exposed to the DBSS | Invalidations
Blind | nothing | All Q1, Q2, Q3
Template | template | All Q1, Q2
Statement | template, parameters | All Q1, Q2 with toy_id=5
View | template, parameters, query result | Q1 with toy_id=5, Q2 with toy_id=5

[The slide's arrows indicate that scalability and security move in opposite directions across these strategies.]
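The four invalidation strategies on this slide (blind, template, statement, view) can be sketched for the Q1/Q2/Q3 templates and the update U1. The code below is an illustrative reading of the slide, not the thesis implementation: the `Cached` record, the `TABLE_OF` map, and the hard-coded per-template rules are assumptions made for this example.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Cached:
    template: str   # "Q1", "Q2", or "Q3"
    param: object   # the query parameter
    result: tuple   # the cached query result

TABLE_OF = {"Q1": "toys", "Q2": "toys", "Q3": "customers"}

def invalidated(strategy, cache, deleted_toy_id):
    """Cached entries dropped for U1: DELETE FROM toys WHERE toy_id=?"""
    def hit(c):
        if strategy == "blind":
            return True                        # sees nothing: drop everything
        if TABLE_OF[c.template] != "toys":
            return False                       # U1 cannot affect Q3
        if strategy == "template":
            return True                        # sees templates only
        if c.template == "Q2":                 # Q2's parameter is a toy_id
            return c.param == deleted_toy_id
        if strategy == "statement":
            return True                        # Q1's parameter is a toy_name
        return deleted_toy_id in c.result      # view: Q1's result holds toy_ids
    return [c for c in cache if hit(c)]
```

On a cache holding two Q1 entries, two Q2 entries, and one Q3 entry, the number of invalidated entries shrinks as the strategy is allowed to see more.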

57

Security-Scalability Tradeoff

[Figure: security-scalability tradeoff for the BOOKSTORE application. Horizontal axis: security (number of query templates with encrypted results), 0 to 30. Vertical axis: scalability (number of concurrent users supported), 0 to 900. Labeled points: "Nothing encrypted" and "Everything encrypted".]

58

Opportunity for Managing the Tradeoff

Not all data is equally sensitive

Data sensitivity:
• Extremely sensitive (credit card information): secure at all costs
• Moderately sensitive (inventory records, customer records): care, but worried about scalability impact
• Completely insensitive (bestsellers list): don’t care

But for most data, nontrivial to assess:
1. Data sensitivity
2. Scalability impact of securing the data

59

SCSA [SIGMOD ’06]

Tradeoff needs to be managed over reduced data

Invalidation Matrix (IM) characterization results

Construct IM for each template pair

Find data not useful for invalidation

Apply a greedy algorithm

Privacy law

Other constraints

60

Methodology of Sample Experiment

[Diagram: Home server, CDN and DBSS, Users; link latencies 5 ms and 100 ms]

Scalability: max # concurrent users with acceptable response times

Security: # templates with encrypted results

BOOKSTORE application

61

Scalability Conscious Security Approach (SCSA) for Managing the Tradeoff

[Figure: horizontal axis: security (number of query templates with encrypted results), 0 to 30. Vertical axis: scalability (number of concurrent users supported), 0 to 900. Labeled points: "Nothing encrypted", "SCSA", and "Everything encrypted".]

1. Easy to either get good scalability or good security

2. SCSA presents a shortcut to manage the tradeoff

62

Magnitude of Security-Scalability Tradeoff

[Figure: scalability (number of concurrent users supported) for the benchmark applications]

63

Security Results

[Figure: query and result data that can be encrypted “for free” for Auction, Bboard, and Bookstore. Values shown in the figure, in extraction order: 18, 6, 4, 17, 7, 12, 14, 7, 7.]

64

Security Results in Detail

Auction: The historical record of user bids was not exposed

Bboard: The rating users give one another based on the quality of their posting

Bookstore: Book purchase association rules discovered by the vendor – customers who purchase book A also purchase book B

65

Scalability Conscious Security Approach: Contributions

Identify security-scalability tradeoff

Shortcut to manage the tradeoff
• Static analysis of database templates for identifying data not useful for invalidation
• Tradeoff must be managed over the remaining data

Evaluation
• Blanket encryption hurts scalability
• Most data encrypted for free is moderately sensitive

66

Invalidation Clues: Motivation

Augmented example template (the figure labels the template and its parameter):
SELECT toy_id, price FROM toys WHERE toy_name=“GI Joe”
DELETE FROM toys WHERE toy_id=5

Previous solution:

1. Coarse grained: either encrypt query result or not

2. Not possible to get the best scalability

3. No general framework for studying the tradeoff

4. Did not consider specific attack models from DBSS

67

Invalidation Clues [ICDE 2007]

Limit unnecessary invalidations
• Rule out most unnecessary invalidations

Limit revealed information
• Achieve a target security/privacy by hiding information from the DBSS

Limit database overhead
• Don’t enumerate what to invalidate; provide “hints”

68

Illustrative Example of Clues

QT: SELECT item_id, category, end_date FROM items WHERE seller = ?

UT: UPDATE items SET end_date = ? WHERE item_id = ?

Example update: UPDATE items SET end_date = 20080304 WHERE item_id = 7

Query clue | Update clue | Query result invalidated if
none | none | any update occurs
query result | 20080304, 7 | item_id = 7 in query result
item_id values | 7 | item_id = 7 in query result
Bloom-filter of item_id values | Bloom-filter of {7} | item_id = 7 present as per Bloom-filter
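The Bloom-filter variant of the clues can be sketched in Python. This is a generic Bloom filter written for illustration, not the construction used in the thesis: the class, its parameters (`size_bits`, `num_hashes`), the SHA-256-based hashing, and the method name `may_contain_all` are all assumptions.

```python
import hashlib

class BloomFilter:
    """Small Bloom filter over item_id values (illustrative parameters)."""

    def __init__(self, items=(), size_bits=64, num_hashes=3):
        self.size_bits, self.num_hashes, self.bits = size_bits, num_hashes, 0
        for item in items:
            self.add(item)

    def _positions(self, item):
        # Derive num_hashes bit positions from salted SHA-256 digests.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def may_contain_all(self, other):
        """True if every bit set in `other` is also set here.
        False means at least one of other's items is definitely absent."""
        return (self.bits & other.bits) == other.bits
```

Here the query clue would be `BloomFilter(item_ids_in_result)` and the update clue `BloomFilter([7])`; the cached result is invalidated when `query_clue.may_contain_all(update_clue)` is true. False positives cause some unnecessary invalidations, but a Bloom filter never reports a present item as absent.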

69

Database Update Clues: UPDATE

SELECT item_id FROM items

WHERE items.category=‘books’

AND items.end_date>=tomorrow

UPDATE items SET end_date=end_date+? DAYS WHERE item_id=?

For “precise” invalidation need to know: category of the item

70

Database Update Clues: INSERT

SELECT item_id FROM items, users

WHERE items.seller=users.user_id

AND items.category=‘books’

AND items.end_date>=tomorrow

AND users.region=PA

INSERT INTO items VALUES (…)

For “precise” invalidation need to know: category of the item, region of the seller

71

An application has to make multiple round-trips to access its data

function get_comments_on_user ($user_id) {
    $template := "SELECT from_user_id FROM comments WHERE to_user_id=?"
    $query := set_parameters ($template, $user_id)
    $result := execute ($query)

    foreach ($row in $result) {
        $from_id := get_id_from_row ($row)
        $template := "SELECT user_name FROM users WHERE user_id=?"
        $query := set_parameters ($template, $from_id)
        $name_result := execute ($query)
    }
}

Affects interactivity in a DBSS setting

72

MERGING Transformation

comments (from_id,to_id,…), users (id,name)

Names of users who have posted comments about John

Application join:

$query1 := "SELECT from_id FROM comments WHERE to_id=?";
$result1 := execute ($query1);
foreach ($from_id in $result1) {
    $query2 := "SELECT name FROM users WHERE id=$from_id";
    $result2 := execute ($query2);
}

73

Example for NONBLOCKING Transformation

items(iid, iname, related), users(uid, uname)

User viewing details of a book

Related item: SELECT i1.iname FROM items i1, items i2 WHERE i1.iid=i2.related AND i2.iid=?

Greet user: SELECT uname FROM users WHERE uid=?

User latency decreased by issuing the queries concurrently

Do it automatically by code analysis tools

74

Why do opportunities for applying these transformations exist?

Almost no overhead for code like “application join” in a centralized setting

Developers find it convenient to abstract database elements as values (ORMs like Ruby-on-Rails), and use object-oriented development

When presenting data to the user, developers find it convenient to get data as and when needed

75

Scalability Effects of Increasing Home Server Bandwidth

Home server bandwidth was the bottleneck

Scalability increased by 20% in each case

[Figure: scalability (number of concurrent users supported)]

76

Applicability of the Transformations

[Figure: % of runtime interactions classified as Applicable, Not applicable, or Static for AUCTION, BBOARD, and BOOKSTORE]

Transformations widely applicable

77

Benchmark Applications

Auction (RUBiS, from Rice)
• Modeled after eBay

Bulletin board (RUBBoS, from Rice)
• Modeled after Slashdot

Bookstore (TPC-W, from UW-Madison)
• Online bookseller, a standard web benchmark
• Changed the popularity of books

Benchmarks model popular websites

78

Related Work: Consistency

Two levels of consistency

Best-effort consistency (eventual consistency): sacrifice consistency for performance (e.g., BBOARD)

Strong consistency: civic emergency example

If queries carry “freshness constraints”, serializability can be guaranteed

79

Coverage of the MERGING Transformation

80

Coverage of the NONBLOCKING Transformation

81

Impact of the MERGING Transformation on Latency

The MERGING transformation is more effective in reducing latency of the BBOARD benchmark