Consistent Join Queries for Cloud Data Stores

Zhou Wei 1,2, Guillaume Pierre 1, Chi-Hung Chi 2

1 VU University Amsterdam
2 Tsinghua University Beijing

Background

• Cloud computing platforms for Web apps
  – Unlimited resources
  – Scalable NoSQL data stores
  – Pay-as-you-go

Problems of Cloud data stores

• Problem 1: No join queries
  – Only queries within a single table are supported
  – No queries across tables

• Problem 2: Relaxed data consistency
  – Single-row transactions only
  – No consistency guarantees across multiple items

• Web apps need consistent join queries

Why do Web apps need joins?

• Data normalization leads to foreign-key equi-joins

Denormalized:
PK  Title   Comment
1   IPhone  Cool
2   IPhone  Nice
3   IPhone  Expensive

Normalized:

Post
PK  Title
1   IPhone

Comment
PK  Post_id  Comment
1   1        Cool
2   1        Nice
3   1        Expensive
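As a minimal sketch of the foreign-key equi-join on this slide, the normalized Post and Comment tables can be joined back into the denormalized view. The list-of-dicts representation and the function name `equi_join` are illustrative assumptions, not something the slides define.

```python
# Normalized tables from the slide, held as lists of dicts (an assumed representation).
posts = [{"pk": 1, "title": "IPhone"}]
comments = [
    {"pk": 1, "post_id": 1, "comment": "Cool"},
    {"pk": 2, "post_id": 1, "comment": "Nice"},
    {"pk": 3, "post_id": 1, "comment": "Expensive"},
]


def equi_join(posts, comments):
    """Foreign-key equi-join: match comment.post_id against post.pk."""
    posts_by_pk = {p["pk"]: p for p in posts}
    return [
        (c["pk"], posts_by_pk[c["post_id"]]["title"], c["comment"])
        for c in comments
        if c["post_id"] in posts_by_pk
    ]


rows = equi_join(posts, comments)
```

Each resulting tuple has the shape (PK, Title, Comment), matching the columns of the denormalized table.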

Why consistent?

• Application correctness is not optional
• Correctness is hard to maintain on top of an inconsistent data store
  – Requires skillful programmers
  – Error-prone
  – Costs more time

Alternatives

• SQL databases (MySQL, Oracle, …)
  – Centralized
  – Full replication
  – NOT scalable to update workload
  – Complicated queries
  – ACID transactions

• NoSQL data stores (Google Bigtable, Amazon SimpleDB, Cassandra, …)
  – Distributed
  – Partitioned
  – Scalable to update workload
  – No join queries
  – Single-row transactions

Our Position

• Scalable NoSQL data stores supporting equi-join queries for Web applications
• Join query as an ACID transaction
• Queries of Web applications access only a small portion of the data
  – Also true for join queries
  – All of them are foreign-key equi-joins

How does it work?

[Figure: architecture diagram. Components: Web Application; CloudTPS (TM, TM, TM); Cloud Data Store. Arrow labels: Transactions, Join Queries, Load Data, Checkpoint.]

Outline

• Join Algorithm
• Transaction Protocol
• Performance Evaluation
• Conclusion

Join Algorithm

• Web applications are response-time sensitive
  – Table scan is not an option

• The basic idea
  – Link records by FK relationships in advance
  – Each join query has well-identified “root records”
  – Start by accessing each “root record”
  – Recursively obtain its related records

Link records for forward join

Comment
PK  FK
1   2
2   3
3   2

Given a referring row (FK), identify the referenced row (PK).

Example: comments (1, 3) both refer to post 2.

Link records for backward join

Comment
PK  FK
1   2
2   3
3   2

[Figure: the Post table carries an Index column listing the referring Comment rows.]

Given a referenced row (PK), identify the referring rows (FK).
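A forward link needs no extra structure, because the referring row already stores the foreign key. A backward link needs an index from each referenced primary key to the rows that refer to it. The following is a minimal sketch of building such an index from the Comment rows on this slide; the dictionary layout and the name `build_backward_index` are assumptions made for illustration.

```python
from collections import defaultdict

# Comment table from the slide: primary key -> foreign key (referenced post).
comment_fk = {1: 2, 2: 3, 3: 2}


def build_backward_index(fk_by_pk):
    """Map each referenced PK to the list of referring PKs, in PK order."""
    index = defaultdict(list)
    for pk, fk in sorted(fk_by_pk.items()):
        index[fk].append(pk)
    return dict(index)


post_index = build_backward_index(comment_fk)
forward_target = comment_fk[1]  # forward join: comment 1 refers to post 2
```

With this index, finding the comments of a post is a single lookup rather than a scan of the Comment table.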

Joining two tables

SELECT * FROM post, comment WHERE post.id = 1 AND comment.post_id = post.id

[Figure: join tree with root table post and child table comment, linked by post.id = comment.post_id.]
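The query above can be answered without a table scan by starting from the root record in post and following a precomputed index to the matching comment rows. This is a minimal sketch under assumptions: the sample rows, the in-memory dictionaries, and the name `join_post_comment` are hypothetical and only illustrate the access pattern.

```python
# Illustrative in-memory tables keyed by primary key (sample data is an assumption).
post = {1: {"title": "Hello"}, 2: {"title": "Good"}}
comment = {1: {"post_id": 1}, 2: {"post_id": 1}, 3: {"post_id": 2}}
# Precomputed backward index: post PK -> PKs of the comments that refer to it.
post_to_comments = {1: [1, 2], 2: [3]}


def join_post_comment(post_id):
    """Join post and comment for one root record, following the index only."""
    root = post.get(post_id)  # access the root record by primary key
    if root is None:
        return []
    result = []
    for comment_pk in post_to_comments.get(post_id, []):
        row = {"post.id": post_id, "comment.id": comment_pk}
        row.update(root)
        row.update(comment[comment_pk])
        result.append(row)
    return result


rows = join_post_comment(1)
```

The work done is proportional to the number of joined rows, not to the size of either table.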

Multi-Joins

• The table of the “root record” is the root

• Recursively join child tables

• Length of critical execution path = 2

[Figure: join tree whose tables include Comment and Revision.]
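The recursive traversal can be sketched as follows. The schema (post, then comment, then revision), the index contents, and the function name `fetch` are assumptions chosen so that the longest chain of dependent lookups is 2; the slides do not give the exact tree.

```python
# Hypothetical schema: post -> comment -> revision, with precomputed child indexes.
# children[table] lists (child_table, index) pairs; index maps a parent PK to child PKs.
children = {
    "post": [("comment", {1: [1, 2]})],
    "comment": [("revision", {1: [10], 2: [11, 12]})],
    "revision": [],
}


def fetch(table, pk, depth=0):
    """Recursively collect (table, pk) records reachable from a root record.

    Returns the records and the longest path length (in join steps) followed.
    """
    records = [(table, pk)]
    longest = depth
    for child_table, index in children[table]:
        for child_pk in index.get(pk, []):
            sub_records, sub_depth = fetch(child_table, child_pk, depth + 1)
            records.extend(sub_records)
            longest = max(longest, sub_depth)
    return records, longest


records, path_length = fetch("post", 1)
```

Lookups at the same depth are independent of each other, so in a distributed setting only the path length bounds the number of sequential rounds.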

Distributed Transaction

[Figure: a TM splits a transaction into sub-transactions SubT(1), SubT(2), SubT(3), which are coordinated with two-phase commit (2PC).]

Problem

• 2PC requires all records to be identified in advance
• Join queries and index management identify accessed records on the fly
• Solution: extend 2PC to allow adding records dynamically

The Original 2PC

Participants: Data1, Data2

1. Submit
2. Vote
3. Commit/Abort

Extended 2PC

• Dynamically add data items to the transaction

Participants: Data1, Data2 (Data3 is added during the protocol)

1. Submit
2. Vote (figure labels: Commit, Add Data3)
3. Derive
4. Vote (figure label: Commit)
5. Commit
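The five steps can be sketched as a loop in which a vote may name further data items, so that voting continues until no new items appear. This is a simplified, single-process illustration and not the CloudTPS implementation: the function `extended_2pc`, the `vote_fn` callback, and the choice of which participant requests Data3 are all assumptions, and networking, timeouts, and failures are left out.

```python
def extended_2pc(initial_items, vote_fn):
    """Run a simplified extended 2PC voting loop.

    vote_fn(item) returns (vote, extra_items): vote is "COMMIT" or "ABORT",
    and extra_items are data items the participant wants added to the transaction.
    Returns the decision and the final set of participating items.
    """
    participants = set()
    pending = list(initial_items)  # step 1: submit to the items known up front
    all_commit = True
    while pending:
        item = pending.pop(0)
        if item in participants:
            continue
        participants.add(item)
        vote, extra = vote_fn(item)  # steps 2 and 4: collect a vote
        all_commit = all_commit and vote == "COMMIT"
        # step 3: derive sub-transactions for newly discovered items
        pending.extend(e for e in extra if e not in participants)
    decision = "COMMIT" if all_commit else "ABORT"  # step 5: final decision
    return decision, participants


# Hypothetical participant behaviour: Data2 discovers that Data3 is also needed.
def vote_fn(item):
    return ("COMMIT", ["Data3"]) if item == "Data2" else ("COMMIT", [])


decision, items = extended_2pc(["Data1", "Data2"], vote_fn)
```

The final decision is taken only after every participant, including those added dynamically, has voted.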

Experiment Setup

• Implemented a prototype of CloudTPS
  – Java Servlet
  – Deployed in Tomcat v6.0

• Environments
  – 1. DAS-3 + HBase + Tomcat (99% of requests < 100 ms)
  – 2. EC2 + SimpleDB + Tomcat (90% of requests < 100 ms)
    • High-CPU medium instances

Micro-benchmarks

• Generate specific workloads
  – Containing only join queries or RW transactions
  – Varying the number of accessed data items
  – Varying the length of the critical execution path
  – Switching index updates on and off

• All configured with 10 CloudTPS nodes
• Measuring maximum sustainable throughput
  – Under a response-time constraint
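One way to read "maximum sustainable throughput under a response-time constraint" is the highest measured load at which a given percentage of requests still finishes within a latency limit. The sketch below assumes the constraint has the form stated in the experiment setup (a percentage of requests under 100 ms). The function names and the sample measurements are hypothetical and are not results from the talk.

```python
def meets_constraint(latencies_ms, percentile=99.0, limit_ms=100.0):
    """True if at least `percentile` percent of requests finished under limit_ms."""
    under = sum(1 for latency in latencies_ms if latency < limit_ms)
    return under * 100.0 >= percentile * len(latencies_ms)


def max_sustainable_throughput(runs, percentile=99.0, limit_ms=100.0):
    """runs: iterable of (throughput_tps, latencies_ms) measured at different loads."""
    ok = [
        tps
        for tps, latencies in runs
        if latencies and meets_constraint(latencies, percentile, limit_ms)
    ]
    return max(ok) if ok else None


# Made-up measurements for illustration only.
sample_runs = [
    (100, [20] * 100),
    (200, [20] * 99 + [150]),
    (300, [20] * 90 + [150] * 10),
]
best_99 = max_sustainable_throughput(sample_runs, percentile=99.0)
best_90 = max_sustainable_throughput(sample_runs, percentile=90.0)
```

Relaxing the percentile from 99 to 90 admits the heaviest sample run, which shows how strongly the reported throughput depends on the chosen constraint.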

Two-table join evaluation

[Figure: two panels, transactions per second and records per second, for an increasing number of accessed records in a transaction.]

Multi-Joins Evaluation

Increasing the height of the join-query tree
All join queries access 12 records

RW transactions evaluation

[Figure: two panels, transactions per second and records per second, for an increasing number of accessed records in a transaction.]

Scalability Evaluation

• Macro-benchmark
  – Derived from TPC-W, simulating an online book store
  – Our workload includes only the join queries and RW transactions
  – No simple queries

Scalability Results

Linear scalability: any increase in workload can be addressed by adding more machines

CloudTPS vs. PostgreSQL

PostgreSQL: binary replication

Fault Tolerance

• Recovers from 3 network partitions and 2 machine failures

Conclusion

• We demonstrate support for consistent join queries over partitioned and distributed data

• Join queries from Web applications only access a small portion of data

• CloudTPS scales linearly even under an intensive workload of only join queries and read-write transactions

Questions?

Thank you!

Details of the macro-benchmark workload

[Table: workload details, with columns for join queries and RW transactions.]

Secondary-key Queries

• SELECT * FROM Table1 WHERE SK = 'B'
• Create a separate index table for the SK
• Translate the query into a join query

Table 1
PK  SK
1   A
2   B
3   B

Index Table
PK' (SK)  T1
A         1
B         2,3
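The translation can be sketched with the two tables from this slide: the secondary-key value becomes the primary key of an index row, and that row is the root record of a join back into Table 1. The dictionary layout and the name `select_by_sk` are illustrative assumptions.

```python
# Table 1 from the slide: primary key -> row containing the secondary key.
table1 = {1: {"sk": "A"}, 2: {"sk": "B"}, 3: {"sk": "B"}}
# Separate index table keyed by the secondary key; each row lists matching Table 1 PKs.
index_table = {"A": [1], "B": [2, 3]}


def select_by_sk(sk):
    """Answer the secondary-key query as a join rooted at the index row."""
    rows = []
    for pk in index_table.get(sk, []):
        row = {"pk": pk}
        row.update(table1[pk])
        rows.append(row)
    return rows


rows = select_by_sk("B")
```

The query touches one index row plus the matching Table 1 rows, so no scan of Table 1 is needed.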

Index Management

• Indexes link the related records

post
PK  Title  T1 (index)
1   Hello  1,2
2   Good   3
3   Well   -

comment
PK  FK
1   1
2   1
3   2

Index Management (cont.)

• Case: a new comment is inserted
• The indexes must be updated atomically with the insert

comment
PK  FK
1   1
2   1
3   2
4   2

post
PK  T1 (index)
1   1,2
2   3,4
3   -

• Consistency violated, case 1 (comment 4 is inserted, but the index is not updated)

comment
PK  FK
1   1
2   1
3   2
4   2

post
PK  T1 (index)
1   1,2
2   3
3   -

• Consistency violated, case 2 (the index lists comment 4, but the comment is not inserted)

post
PK  Title  T1 (index)
1   Hello  1,2
2   Good   3,4
3   OK     -

comment
PK  FK
1   1
2   1
3   2
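Both violation cases come from applying only one of the two writes. A minimal sketch of treating the insert and the index update as one all-or-nothing step follows. It is a single-process toy that uses a snapshot for rollback, whereas the talk relies on distributed transactions; the class `Store` and its methods are hypothetical.

```python
import copy


class Store:
    """Toy in-memory store; a real system would use a distributed transaction."""

    def __init__(self):
        self.comment = {1: 1, 2: 1, 3: 2}  # comment PK -> post FK
        self.post_index = {1: [1, 2], 2: [3], 3: []}  # post PK -> comment PKs

    def insert_comment(self, pk, post_id):
        """Insert a comment and update the post index as one all-or-nothing step."""
        snapshot = copy.deepcopy((self.comment, self.post_index))
        try:
            if pk in self.comment:
                raise KeyError(f"comment {pk} already exists")
            self.comment[pk] = post_id
            self.post_index[post_id].append(pk)  # KeyError if the post is missing
        except KeyError:
            self.comment, self.post_index = snapshot  # roll back both writes together
            raise

    def is_consistent(self):
        """The index must list exactly the comments that refer to each post."""
        index_ok = all(
            sorted(pks) == sorted(c for c, fk in self.comment.items() if fk == post)
            for post, pks in self.post_index.items()
        )
        fk_ok = all(fk in self.post_index for fk in self.comment.values())
        return index_ok and fk_ok
```

After `Store().insert_comment(4, 2)` the index entry for post 2 lists comments 3 and 4. A failed insert, for example one that refers to a missing post, leaves both the comment table and the index unchanged.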

Fault tolerance

• Maintain ACID during server failure
  – Transactions that reach an agreement of “COMMIT” should commit
  – Abort other transactions

• Replication to N transaction managers
  – Values of data items
  – States of transactions (vote result and “redo logs”)

• Failover of a failed server
  – Transfer ownership of data items
  – Transfer leadership of transactions being coordinated

Optimization for Join Queries

• A join query is a read-only transaction
  – Remove the final Commit phase
  – Sub-transactions are removed immediately after local execution
  – RO transactions do not block other transactions

Concurrency Control

• Timestamp ordering

[Figure: sub-transactions SubT(1) and SubT(2), each carrying timestamp 5.]

Read-Only Transaction

[Figure sequence: Join(Data1) is submitted to the TMs and executed as read-only sub-transactions ROSubT(1) and ROSubT(2) with timestamp 6, alongside read-write sub-transactions SubT(1) and SubT(2) with timestamp 5 and SubT(1) with timestamp 7; a Result is returned.]
