Post on 25-Feb-2016
description
Consistent Join Queries for Cloud Data Stores
Zhou Wei 1,2
Guillaume Pierre1, Chi-Hung Chi2
1 VU University Amsterdam2 Tsinghua University Beijing
Background
• Cloud Computing Platforms for Web Apps– Unlimited resources– Scalable NoSQL data stores– Pay-as-you-go
Problems of Cloud data stores
• Problem 1: No Join queries– Only support queries within a single table– No queries across tables
• Problem 2: Relaxed data consistency– Single-row transaction only– No consistency guarantee across multi-items
• Web apps need consistent join queries
Why Web App needs join?
• Data Normalization Foreign-key Equi-joins
PK Title Comment1 IPhone Cool2 IPhone Nice3 IPhone Expensive
PostPK Title1 IPhone
PK Post_id Comment1 1 Cool2 1 Nice3 1 Expensive
Post
Comment
Why consistent?
• Application correctness is not an option• Hard to maintain correctness with an
inconsistent data store– Require skillful programmers– Error-Prone– Cost more time
Alternatives
• SQL databases– Centralized– Full Replication– NOT scalable to update
workload– Complicated queries– ACID Transaction
• NoSQL data stores– Distributed– Partitioned– Scalable to update
workload– No Join Queries– Single-row Transaction
Google BigtableAmazon SimpleDBCassandra …
MySQLOracle …
Our Position
• Scalable NoSQL data stores supporting equi-joins queries for Web applications
• Join query as a ACID transaction• Queries of Web applications access only a
small portion of data– Also true for join queries– All of them are foreign-key equi-joins
8
How it works?
Web Application
Cloud Data Store
Load Data
TM
Checkpoint
Transactions
TM TM TM
CloudTPSJoin Queries
Outline
• Join Algorithm• Transaction Protocol• Performance Evaluation• Conclusion
Join Algorithm
• Web applications are response-time sensitive– Table scan is not an option
• The basic idea– Linking records by FK relationships in advance– Each join query has well identified “root records”– Begin from accessing each “root record”– Recursively obtaining its related records
Link records for forward join
PK FK1 22 33 2
Comment
PK123
Post
Given referring row identify referenced row(FK) (PK)
Comment
(1,3)2
Link records for backward join
PK FK1 22 33 2
PK123
Comment Post Index
Give referenced row identify referring rows(PK) (FK)
Joining two tables
SELECT * FROM post, comment WHERE post.id = 1 and comment.post_id = post.id
post
comment
id=1
post.id = comment.post_id
Multi-Joins
• The table of the “root record” is the root
• Recursively join child tables
• Length of critical execution path = 2
Post
Comment
Id=1
User
Revision
Outline
• Join Algorithm• Transaction Protocol• Performance Evaluation• Conclusion
Distributed Transaction
TM
RW-TC
Data1
Transaction
•SubT(1)•SubT(2)•SubT(3) TM
RW-TC
Data2
TM
RW-TC
Data3
SubT(1) SubT(2) SubT(3)
2 Phase Commit (2PC)
Problem
• 2PC requires all records identified in advance• Join queries/index management identify
accessed records on the fly• Solution: extend 2PC to allow adding records
dynamically
The Original 2PC
TC
Data1 Data2
TC
Data1 Data2
TC
Data1 Data2
2. Vote 3. Commit/Abort1. Submit
Extended 2PC
• Dynamically add data items to the transaction
TC
Data1 Data2
TC
Data1 Data2
2. Vote1. Submit
Commit Add Data3
Extended 2PC
TC
Data1 Data2
3. Derive
Data3
Extended 2PC
TC
Data1 Data2
4. Vote
Data3
Commit
Extended 2PC
TC
Data1 Data2
5. Commit
Data3
Outline
• Join Algorithm• Transaction Protocol• Performance Evaluation• Conclusion
Experiment Setup
• Implement a prototype of CloudTPS – Java Servlet– Deployed in Tomcat v6.0
• Environments– 1. Das3+Hbase+Tomcat (99% reqs < 100ms)– 2. EC2+SimpleDB+Tomcat (90% reqs < 100ms)• High-CPU medium instances
Micro-benchmarks
• Generate specific workload– Containing only Join queries/RW transactions– Changing the number of accessed data items– Changing the length of critical execution path– Switch between Update indexes/Not update
• All configured with 10 CloudTPS nodes• Measuring maximum sustainable throughput– Under response time constraint
Two-table join evaluation
Transaction Per Second Record Per Second
Increasing number of accessed records in a transaction
Multi-Joins Evaluation
Increasing the height of join-query treeAll join queries access 12 records
RW transactions evaluation
Transaction Per Second Record Per Second
Increasing number of accessed records in a transaction
Scalability Evaluation
• Macro-Benchmark– Derived from TPC-W, simulating an online book
store– Our workload includes only the join queries and
RW transactions– No simple queries
Scalability Results
Linearly Scalability: Any increase of workload can be addressed by adding more machines
CloudTPS vs. Postgresql
Postgresql: binary-replication
Fault Tolerance
• Recovers from 3 network partitions and 2 machine failures
Conclusion
• We demonstrate supporting consistent join queries over partitioned and distributed data
• Join queries from Web applications only access a small portion of data
• It scales linearly even under an intensive workload of only join queries and read-write transactions
Questions?
Thank you!
Details in Macro-benchmark workload
Join queries RW transactions
PK SK1 A2 B3 B
Secondary-key Queries
• Select * from Table1 Where SK = ‘B’• Create a separate index table for the SK• Translate into a join query
Table 1Index Table
PK’(SK) T1A 1B 2,3
Index Management• Indexes link the related records
PK Title1 Hello2 Good3 Well
PK FK1 12 13 2
post comment
T11,23-
Index
Index Management (cont.)
• Case: a new comment inserted• Updating indexes atomically
PK Title1 Hello2 Good3 Well
PK FK1 12 13 24 2
post comment
T11,23, 4
-
Index
Index Management (cont.)
• Consistency Violated Case 1
PK Title1 Hello2 Good3 Well
PK FK1 12 13 24 2
post comment
T11,23-
Index
?
Index Management (cont.)
• Consistency Violated Case 2
PK Title1 Hello2 Good3 OK
PK FK1 12 13 2
post comment
T11,23, 4
-
Index
?
Fault tolerance
• Maintain ACID during server failure– Transactions that reach an agreement of “COMMIT”
should Commit– Abort other transactions
• Replication to N transaction managers– Values of Data Items– States of Transactions (Vote result and “redo logs”)
• Failover of a failed server– Transfer ownership of data items– Transfer leadership of transactions being coordinated
Optimization for Join Queries
• Join Query is a Read-only transaction– Remove the final Commit phase– Sub-transactions removed immediately after local
execution– RO transactions will not block other transactions
43
Concurrency Control
Data1
SubT(1)
SubT(1)5
6
Timestamp Ordering
Data2
SubT(2)
SubT(2)5
6
Read-Only Transaction
Data1
ROSubT(1)
SubT(1)5
6
Data2
SubT(2)5
TM
RW-TC
Data3
RO-TC
Join(Data1) TM TM
SubT(1)7
Read-Only Transaction (Cont.)
Data1
ROSubT(1)6
Data2
TM
RW-TC
Data3
RO-TC
Join(Data1) TM TM
Data2
SubT(1)7
Read-only Transaction (cont.)
Data2
TM
RW-TC
Data3
RO-TC
TM
NoneROSubT(2)6
Result
Data1
SubT(1)7