Cloud Computing 15-319 / 15-619, Recitation 9 (msakr/15619-s19/recitations/S19_Recitation09.pdf)
15-319 / 15-619 Cloud Computing
Recitation 9
Mar 19, 2019
Overview
● Last week’s reflection
○ Project 3.2
○ OLI Unit 4 - Module 14 (Storage)
○ Quiz 7
● This week’s schedule
○ Project 3.3
○ OLI Unit 4 - Modules 15, 16 & 17
○ Quiz 8 due on Friday, March 22nd
● Team Project, Twitter Analytics
○ Q2M and Q2H correctness due on 3/24
○ Phase 1 due, Mar 31
Last Week
● OLI: Module 14 - Cloud Storage
○ Quiz 7
● Project 3.2
○ Social Networking Timeline with Heterogeneous Backends
■ MySQL
■ Neo4j
■ MongoDB
■ Choosing Databases
● Multi-Threaded Online Programming Exercise on Cloud9
This Week
● OLI: Modules 15, 16 & 17
○ Quiz 8 - Friday, March 22
● Project 3.3 - Sunday, March 24
○ Task 1: Implement a Strong Consistency Model for distributed data stores
○ Task 2: Implement a Strong Consistency Model for cross-region data stores
○ Bonus task: Implement an Eventual Consistency Model
● Team Project, Twitter Analytics - Sunday, March 24
○ Q2M and Q2H correctness
● Online Programming Exercise - Scheduling
Conceptual Topics - OLI Content
OLI UNIT 4: Cloud Storage
● Module 15: Case Studies: Distributed File System
○ HDFS
○ Ceph
● Module 16: Case Studies: NoSQL Databases
● Module 17: Case Studies: Cloud Object Storage
● Quiz 8
○ DUE on Friday, March 22nd
Individual Projects
● DONE
○ P3.1: Files vs Databases - comparison and usage of flat files, MySQL, Redis, and HBase
○ NoSQL Primer
○ HBase Basics Primer
● Done
○ P3.2: Social networking with heterogeneous backends
○ MongoDB Primer
● Now
○ P3.3: Replication and Consistency models
○ Introduction to multithreaded programming in Java
○ Introduction to consistency models
Scale of Data is Growing
International Data Corporation's (IDC) Digital Universe Study predicts an increase in the amount of data created globally from
● 16 zettabytes in 2016 to
● 160 zettabytes in 2025.
Guo H. Big Earth data: A new frontier in Earth and information sciences[J]. Big Earth Data, 2017, 1(1-2): 4-20.
Users are Global
● Speed of light (≈3.00×10⁸ m/s)
● Inherent latencies
[Figure: map showing San Francisco, Pittsburgh, and Moscow, with link latencies of ~14ms and ~26ms]
Typical End-To-End Latency
● Typical end-to-end latency
○ The client sends the request to the server
■ Network latency
○ The backend processes the request and sends the response
■ Overhead of fetching and processing data from the backend
■ Network latency
○ The client receives the response
Latency with a Single Backend
[Figure: Client 1 (San Francisco), Client 2 (Pittsburgh), and Client 3 (Moscow) all connect to a single Backend Storage; link latencies ~20ms, ~40ms, and ~320ms]
Client Statistics:
● Min Latency: 20ms
● Max Latency: 320ms
● Average Latency: ~126.7ms
Replicate the Data Globally
[Figure: Client 1 (San Francisco), Client 2 (Pittsburgh), and Client 3 (Moscow) served by Backend Storage 1: USA West and Backend Storage 2: Europe Central; link latencies ~20ms, ~40ms, and ~20ms]
Client Statistics:
● Min Latency: 20ms
● Max Latency: 40ms
● Average Latency: ~26.7ms
Replicate the Data Close to Users
[Figure: Client 1 (San Francisco), Client 2 (Pittsburgh), and Client 3 (Moscow) served by Backend Storage 1: USA West, Backend Storage 2: Europe Central, and Backend Storage 3: USA East; every link latency is ~20ms]
Client Statistics:
● Min Latency: 20ms
● Max Latency: 20ms
● Average Latency: 20ms
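The client statistics on the three preceding slides can be recomputed from the per-client latencies. The following is a minimal sketch; the class and method names are illustrative, and the latency arrays are simply the values labeled on the slides.

```java
import java.util.Arrays;
import java.util.Locale;

/** Recomputes min, max, and average latency for the three deployments on the slides. */
class LatencyStats {
    static int min(int[] latenciesMs) {
        return Arrays.stream(latenciesMs).min().getAsInt();
    }

    static int max(int[] latenciesMs) {
        return Arrays.stream(latenciesMs).max().getAsInt();
    }

    static double average(int[] latenciesMs) {
        return Arrays.stream(latenciesMs).average().getAsDouble();
    }

    static String summarize(String name, int[] latenciesMs) {
        // Locale.ROOT keeps the decimal separator a dot on every machine.
        return String.format(Locale.ROOT, "%s: min=%dms max=%dms avg=%.1fms",
                name, min(latenciesMs), max(latenciesMs), average(latenciesMs));
    }

    public static void main(String[] args) {
        // One latency per client, as labeled on the slides.
        System.out.println(summarize("single backend", new int[] {20, 40, 320}));
        System.out.println(summarize("two replicas", new int[] {20, 40, 20}));
        System.out.println(summarize("three replicas", new int[] {20, 20, 20}));
    }
}
```

Note that the maximum falls from 320ms to 20ms, while the average moves from roughly 126.7ms to 20ms, so the client farthest from the backend benefits the most.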
Replication
● As you can see, by adding replicas to strategic locations in the world, we can significantly reduce the latency seen by our global clients
● Each added datacenter decreases the average latency
● But how about the cost?
What If We Continue to Replicate?
Client Statistics:
● Min Latency: ??
● Max Latency: ??
● Average Latency: ??
Cost: ?????
We have to consider cost as well as data consistency across replicas, which increases the latency for writes.
Replication READ
[Figure: Client 1 (San Francisco), Client 2 (Pittsburgh), and Client 3 (Moscow) served by Backend Storage 1: USA West, Backend Storage 2: USA East, and Backend Storage 3: Europe Central; every link latency is ~20ms]
Read Operation:
● Min Latency: 20ms
● Max Latency: 20ms
● Average Latency: 20ms
Replication WRITE
[Figure: Client 1 (San Francisco), Client 2 (Pittsburgh), and Client 3 (Moscow) with Backend Storage 1: USA West, Backend Storage 2: USA East, and Backend Storage 3: Europe Central; link latencies ~20ms, ~40ms, and ~240ms]
Write Operation:
● Latency for Client 2 = 20ms + MAX(40ms, 240ms) = 260ms
● All the clients suffer from long latency
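The write-latency formula on this slide can be expressed as a small function. This is a sketch under the assumption that the nearest replica forwards the update to the other replicas in parallel, which is why the slide takes the MAX rather than the sum; the class and method names are illustrative.

```java
import java.util.Arrays;

class WriteLatency {
    /**
     * Latency of a write that is acknowledged only after every replica is updated:
     * the hop to the nearest replica plus the slowest onward propagation.
     */
    static int synchronousWriteMs(int toNearestReplicaMs, int... toOtherReplicasMs) {
        // With no other replicas there is nothing to wait for, so the extra cost is 0.
        int slowestPropagation = Arrays.stream(toOtherReplicasMs).max().orElse(0);
        return toNearestReplicaMs + slowestPropagation;
    }

    public static void main(String[] args) {
        // Client 2 on the slide: 20ms to its nearest replica, then 40ms and 240ms onward.
        System.out.println(synchronousWriteMs(20, 40, 240) + "ms"); // prints "260ms"
    }
}
```

The slowest replica link dominates the result, so adding a distant replica raises write latency for every client even though it lowers read latency for nearby ones.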
Replication Reads and Writes
● Read operations are very fast!
○ All clients have a replica close to them to access
● Write requests are quite slow
○ Write requests must update all the replicas
○ If there are multiple write requests for a certain key, then they may have to wait for each other to complete
Pros and Cons of Replication
● Duplicate the data across multiple instances
● Advantages
○ Low latency for reads
○ Reduce the workload of a single backend server (Load balance for hot keys)
○ Handle failures of nodes (High availability)
● Disadvantages
○ Requires more storage capacity and cost
○ Updates are slower
○ Changes must reflect on all datastores either instantly or eventually (Data Consistency)
Data Consistency Becomes Necessary
● Data consistency across replicas is important
○ Five consistency levels: Strict, Strong (Linearizability), Sequential, Causal and Eventual Consistency
● This week’s task: Implement Strong Consistency
○ All datastores must return the same value for a key at all times
○ The order in which the values are updated must be preserved at all replicas
● Bonus: Implement Eventual Consistency
Choosing a Consistency Level: Bad Example
Account xxxxx-4437, Balance $100
● Two requests arrive: Withdraw $100 and Withdraw $100
● Both withdrawals pay out $100; the balance becomes $0
● Bank lost $100
Choosing a Consistency Level: Good Example
Account xxxxx-4437, Balance $100
● Two requests arrive: Withdraw $100 and Withdraw $100
● One withdrawal pays out $100 and the other pays out $0; the balance becomes $0
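The good example can be reproduced on a single machine by making the balance check and the debit one atomic step. The sketch below is illustrative only: `Account` and `WithdrawalDemo` are hypothetical names, and a Java monitor stands in for whatever cross-replica coordination a real bank would need.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical account whose check-and-debit runs as one atomic step. */
class Account {
    private int balance;

    Account(int openingBalance) {
        this.balance = openingBalance;
    }

    /** Returns the amount paid out: the full amount, or 0 if funds are insufficient. */
    synchronized int withdraw(int amount) {
        if (balance < amount) {
            return 0;
        }
        balance -= amount;
        return amount;
    }

    synchronized int balance() {
        return balance;
    }
}

class WithdrawalDemo {
    /** Issues two concurrent $100 withdrawals and returns the total paid out. */
    static int totalPaidOut(Account account) {
        AtomicInteger paid = new AtomicInteger();
        Runnable withdrawal = () -> paid.addAndGet(account.withdraw(100));
        Thread first = new Thread(withdrawal);
        Thread second = new Thread(withdrawal);
        first.start();
        second.start();
        try {
            first.join();
            second.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return paid.get();
    }

    public static void main(String[] args) {
        Account account = new Account(100);
        System.out.println("paid out: $" + totalPaidOut(account));
        System.out.println("balance: $" + account.balance());
    }
}
```

Without `synchronized`, both threads could pass the balance check before either debits, which is the bad example: $200 paid out of a $100 account.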
P3.3: Consistency Models
Tradeoff: Consistency vs. Latency
● Strict
● Strong
● Sequential
● Causal
● Eventual
P3.3 Task 1: Strong Consistency
Coordinator:
● A request router that routes the web requests from the clients to the datacenters
● Preserves the order of both READ and WRITE requests
Datastore:
● The actual backend storage that persists collections of data
P3.3 Task 1: Strong Consistency
Single PUT request for key ‘X’
● Block all GET requests for key ‘X’ until all datastores are updated
● GET requests for a different key ‘Y’ should not be blocked
Multiple PUT requests for ‘X’
● Resolved in order of their timestamp received from the Truetime Server
● Any GET request in between 2 PUTs must return the first PUT value
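One way to picture the per-key blocking rules is a store that runs the operations on each key strictly in timestamp order. The sketch below is not the project skeleton: `TimestampOrderedStore`, `announce`, and `OrderingDemo` are hypothetical names, it models a single datastore in one JVM, and `announce` only plays the role that a precommit-style notification might play.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.PriorityQueue;

/** Hypothetical single-datastore sketch: operations on a key run in timestamp order. */
class TimestampOrderedStore {
    private final Map<String, String> data = new HashMap<>();
    private final Map<String, PriorityQueue<Long>> pending = new HashMap<>();

    /** Registers an upcoming operation on a key before it is executed. */
    synchronized void announce(String key, long timestamp) {
        pending.computeIfAbsent(key, k -> new PriorityQueue<>()).add(timestamp);
    }

    synchronized void put(String key, String value, long timestamp) {
        awaitTurn(key, timestamp);
        data.put(key, value);
        finish(key);
    }

    synchronized String get(String key, long timestamp) {
        awaitTurn(key, timestamp);
        String value = data.get(key);
        finish(key);
        return value;
    }

    // Blocks until this timestamp is the oldest announced operation for the key.
    private void awaitTurn(String key, long timestamp) {
        PriorityQueue<Long> queue = pending.get(key);
        if (queue == null || !queue.contains(timestamp)) {
            throw new IllegalStateException("operation was not announced");
        }
        while (queue.peek() != timestamp) {
            try {
                wait(); // releases the monitor, so other keys keep making progress
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
    }

    // Removes the finished operation and wakes waiters so the next timestamp can run.
    private void finish(String key) {
        PriorityQueue<Long> queue = pending.get(key);
        queue.poll();
        if (queue.isEmpty()) {
            pending.remove(key);
        }
        notifyAll();
    }
}

class OrderingDemo {
    /** Starts the later operations first and shows they still run in timestamp order. */
    static String run() {
        TimestampOrderedStore store = new TimestampOrderedStore();
        for (long ts = 1; ts <= 4; ts++) {
            store.announce("X", ts);
        }
        String[] seen = new String[1];
        Thread secondPut = new Thread(() -> store.put("X", "second", 3));
        Thread reader = new Thread(() -> seen[0] = store.get("X", 2));
        secondPut.start();
        reader.start();
        // Key Y is independent of the pending operations on X, so this does not block.
        store.announce("Y", 1);
        store.put("Y", "other", 1);
        store.put("X", "first", 1); // unblocks the reader, then the second PUT
        try {
            secondPut.join();
            reader.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return seen[0] + "," + store.get("X", 4);
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints "first,second"
    }
}
```

The GET with timestamp 2 sits between the two PUTs and returns the first PUT's value, while the operation on key Y never waits for key X. A single monitor with `notifyAll` keeps the sketch short; finer-grained per-key locks would be a reasonable alternative.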
P3.3 Task 2: Architecture
Global Coordinators and Data Stores
[Figure: three regions (us-west, us-east, Singapore), each labeled with DCI, coordinator, and datacenter]
P3.3 Tasks 1 & 2: Strong Consistency
● Every request has a global timestamp order, where the timestamp is issued by a Truetime Server.
● Operations must be ordered by the timestamps
Requirement: At any given point of time, all clients should read the same data from any datacenter replica
Task 2 Workflow and Example
• Launch a total of 8 machines (3 data centers, 3 coordinators, 1 truetime server and 1 client).
• All machines should be launched in the US East region. We will simulate global latencies for you.
• The “US East” here has nothing to do with the simulated location of datacenters and coordinators in the project.
• Your task: implement the code for the Coordinators and Datastores
P3.3 Task 2: Architecture
PRECOMMIT
● This API method will contact the Data center of a given region, and notify it that a PUT request is being serviced for the specified key, starting at the specified timestamp.
P3.3 Task 2: Complete KeyValueStore.java (in DCs) and Coordinator.java (in Coordinators)
[Figure sequence: a Client, a TrueTime Server, three coordinators (US-EAST, US-WEST, SINGAPORE), and three DCs (US-EAST, US-WEST, SINGAPORE). The messages shown across the slides, in order:]
1. put?key=X&value=1
2. KeyValueLib.getTime()
3. precommit?key=X&timestamp=1
4. PUT(REGIONAL-DNS, "X", "1", 1, "strong")
5. Response back
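The message sequence on these slides can be mimicked with a small simulation that only records which messages a coordinator would send, and in what order. Everything here is a hypothetical stand-in: `CoordinatorSketch` and `handlePut` are not part of the project skeleton, the `AtomicLong` replaces the TrueTime Server, and sending the precommit to every region before any PUT is an assumption drawn from the hint about locking a key across all data centers.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

class CoordinatorSketch {
    // Region names as they appear on the slides.
    static final List<String> REGIONS = Arrays.asList("US-EAST", "US-WEST", "SINGAPORE");

    /**
     * Handles one PUT and returns the ordered list of messages it would send.
     * The clock is a hypothetical stand-in for the TrueTime Server.
     */
    static List<String> handlePut(AtomicLong clock, String key, String value) {
        List<String> messages = new ArrayList<>();
        // Obtain a globally ordered timestamp for this request.
        long timestamp = clock.incrementAndGet();
        // Announce the upcoming PUT to every region so each one can lock the key.
        for (String region : REGIONS) {
            messages.add(region + " precommit?key=" + key + "&timestamp=" + timestamp);
        }
        // Apply the PUT in every region with the same timestamp.
        for (String region : REGIONS) {
            messages.add(region + " put key=" + key + " value=" + value
                    + " timestamp=" + timestamp);
        }
        // Only now is the client told that the write succeeded.
        messages.add("response to client");
        return messages;
    }

    public static void main(String[] args) {
        for (String message : handlePut(new AtomicLong(), "X", "1")) {
            System.out.println(message);
        }
    }
}
```

A real coordinator would issue these calls over the network through the provided library; this sketch only illustrates the ordering of timestamp, precommit, PUT, and response.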
P3.3: Eventual Consistency (Bonus)
● Write requests are performed in the order received by the local coordinator
○ Operations may not be blocked for replica consensus (no communication between servers across regions)
● Clients that request data may receive multiple versions of the data, or stale data
○ Problems left for the application owner to resolve
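The stale-read behavior described above can be seen in a simplified in-memory model. This is a sketch, not the bonus-task design: `EventualStore` and its methods are hypothetical, replicas are plain maps, and `deliverPending` stands in for the network eventually delivering queued updates.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Queue;

/** Hypothetical model: writes apply locally first and reach other replicas later. */
class EventualStore {
    private final List<Map<String, String>> replicas = new ArrayList<>();
    private final Queue<Runnable> inFlight = new ArrayDeque<>();

    EventualStore(int replicaCount) {
        for (int i = 0; i < replicaCount; i++) {
            replicas.add(new HashMap<>());
        }
    }

    /** Applies the write at the local replica and queues it for the others without waiting. */
    void put(int localReplica, String key, String value) {
        replicas.get(localReplica).put(key, value);
        for (int i = 0; i < replicas.size(); i++) {
            if (i != localReplica) {
                Map<String, String> remote = replicas.get(i);
                inFlight.add(() -> remote.put(key, value));
            }
        }
    }

    /** Reads whatever the chosen replica currently holds, which may be stale. */
    String get(int replica, String key) {
        return replicas.get(replica).get(key);
    }

    /** Simulates the network eventually delivering every queued update, in order. */
    void deliverPending() {
        while (!inFlight.isEmpty()) {
            inFlight.poll().run();
        }
    }
}

class EventualDemo {
    public static void main(String[] args) {
        EventualStore store = new EventualStore(3);
        store.put(0, "X", "1");
        System.out.println("remote read before delivery: " + store.get(2, "X"));
        store.deliverPending();
        System.out.println("remote read after delivery: " + store.get(2, "X"));
    }
}
```

The write returns as soon as the local replica is updated, which is what keeps write latency low; the price is that a read at another replica can miss the update until delivery completes.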
Hints - PRECOMMIT
● In strong consistency, “PRECOMMIT” should be useful to help you lock requests because it is able to communicate with the Data centers.
● Locking needs to be performed on Data centers.
● Lock by the key across all the Data centers in strong consistency
● Remember to update both KeyValueStore.java and Coordinator.java in Eventual Consistency
Suggestions
● Read the two primers (PLEASE!)
● Consider the differences between the 2 consistency models before writing code
● Think about possible race conditions
● Read the hints in the writeup and skeleton code carefully
● Don’t modify any class except Coordinator.java and KeyValueStore.java
How to Run Your Program
● Run “./copy_code_to_instances” in the client instance to copy your code to the servers on each of the Data center and Coordinator instances.
● Run “./start_servers” in the client instance to start the servers on each of the data center instances, coordinator instances and the truetime server instance.
● Use “./consistency_checker strong” or “./consistency_checker eventual” to test your implementation of each consistency model. (Our grader uses the same checker)
● If you want to test one simple PUT/GET request, you could directly send the request to Data centers or Coordinators.
Start early!
TEAM PROJECT
Twitter Data Analytics
Team Project
Twitter Analytics Web Service
• Given ~1TB of Twitter data
• Build a performant web service to analyze tweets
• Explore web frameworks
• Explore and optimize database systems
[Figure: Web-tier and Storage-tier]
Twitter Analytics System Architecture
● Web server architectures
● Dealing with large scale real world tweet data
● HBase and MySQL optimization
[Figure: Web-tier and Storage-tier; GCP Dataproc, Azure HDInsight, or Amazon EMR]
Suggested Tasks for Phase 1
| Phase 1 weeks | Tasks | Deadline |
|---|---|---|
| Week 1 (2/25) | Team meeting; Writeup; Complete Q1 code & achieve correctness; Q2 Schema, think about ETL | Q1 Checkpoint due on 3/3; Checkpoint Report due on 3/3 |
| Week 2 (3/4) | Q1 target reached; Q2 ETL & Initial schema design completed | Q1 final target due on 3/10 |
| Week 3 (Spring Break) | Take a break or make progress (up to your team) | |
| Week 4 (3/18) | Achieve correctness for both Q2 MySQL, Q2 HBase & basic throughput | Q2 MySQL Checkpoint due on 3/24; Q2 HBase Checkpoint due on 3/24 |
| Week 5 (3/25) | Optimizations to achieve target throughputs for Q2 MySQL and Q2 HBase | Q2 MySQL final target due on 3/31; Q2 HBase final target due on 3/31 |
Reminders on penalties
● M family instances only; must be ≤ large type
✓ m5.large, m5.medium, m4.large
✗ m5.2xlarge, m3.medium, t2.micro
● Only General Purpose (gp2) SSDs are allowed for storage
○ m5d (which uses NVMe storage) is forbidden
● Other types are allowed (e.g., t2.micro) but only for testing
○ Using these for any submissions = 100% penalty
● $0.85/hour applies to every submission, not just the livetest
● AWS endpoints only (EC2/ELB).
Budget
● AWS budget of $45 for Phase 1
● Your web service should cost at most $0.85 per hour
○ Including: EC2 cost, EBS cost, ELB cost
○ Excluding: data transfer, EMR
● Even if you use spot instances, we will calculate your cost
using the on-demand instance price
● Q2 target RPS: 12000 for both MySQL and HBase
Query 2: Tips
1. Libraries can be bottlenecks
2. MySQL connection configuration
3. MySQL warmup
4. Response formatting: be careful with \n \t
5. Understand the three types of scores completely.
Query 2: More Tips
1. Consider doing ETL on GCP/Azure to save AWS budget
2. Be careful about encoding 😁 (use utf8mb4 in MySQL)
3. Pre-compute as much as possible
4. ETL can be expensive, so read the write-up carefully
Piazza FAQ
1. Search before asking a question
2. Post public questions when possible
https://piazza.com/class/jqsp37y8m572vm?cid=1336
This Week’s Deadlines
• Quiz 8:
Due: Friday, March 22nd, 2019 11:59PM ET
• Complete OPE task scheduling
Due: This week
• Project 3.3: Consistency
Due: Sunday, March 24th, 2019 11:59PM ET
• Team Project Phase 1 Q2M and Q2H Correctness
Due: Sunday, March 24th, 2019 11:59PM ET