August 2nd, 2006 Department of Computer Sciences, UT Austin
PRACTI Replication: Towards a Unified Theory of Replication
Nalini Belaramani, Mike Dahlin, Lei Gao,
Amol Nayate, Arun Venkataramani, Praveen Yalagandula,
Jiandan Zheng
University of Texas at Austin
2nd August 2006
Replication Systems Galore
Server Replication
Bayou [Terry et al 95]
• All servers have the full set of data
• Nodes exchange the updates made since their previous synchronization
• Any server can exchange updates with any other server
• Eventually, all nodes agree on the order of updates to the data
[Figure: clients read from and write to each server replica]
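A minimal sketch of why servers holding the same set of writes end up agreeing: if every write carries an accept stamp and the id of the accepting server (a simplified encoding assumed here), each server can replay its writes in the same deterministic order regardless of arrival order. The `Write` type and helper names are hypothetical. This models only Bayou's tentative ordering; real Bayou also relies on a primary server to commit a final order.

```python
# Illustrative sketch; Write and its fields are hypothetical, not Bayou's API.
from dataclasses import dataclass


@dataclass(frozen=True)
class Write:
    accept_stamp: int  # logical clock value when the write was accepted
    node_id: str       # server that accepted the write
    obj: str
    body: str


def ordered(writes):
    """Deterministic total order: accept stamp first, node id breaks ties."""
    return sorted(writes, key=lambda w: (w.accept_stamp, w.node_id))


def apply_all(writes):
    """Replay writes in the agreed order; the last write to an object wins."""
    state = {}
    for w in ordered(writes):
        state[w.obj] = w.body
    return state


w1 = Write(5, "A", "x", "from A")
w2 = Write(5, "B", "x", "from B")
w3 = Write(3, "B", "y", "hello")
# Two servers received the same writes in different arrival orders.
server1 = apply_all([w1, w2, w3])
server2 = apply_all([w3, w2, w1])
```

Both servers end with `x` set by B's write, because `(5, "B")` sorts after `(5, "A")` on every replica.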
Client-Server Model
Coda [Kistler et al 92]
• Data is cached on client machines
• Callbacks are established so that clients are notified of changes
• Clients can get updates only from the server
[Figure: two clients read A; one client writes A to the server, which notifies the other client that A was modified]
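The callback mechanism above can be sketched as follows. The `Server` and `Client` classes and their methods are hypothetical, and the model leaves out disconnected operation, hoarding, and conflict handling.

```python
# Illustrative sketch of Coda-style callbacks; all names are hypothetical.
class Server:
    def __init__(self):
        self.data = {}
        self.callbacks = {}  # object -> set of clients holding a callback

    def fetch(self, client, obj):
        """Return the object and register a callback for this client."""
        self.callbacks.setdefault(obj, set()).add(client)
        return self.data.get(obj)

    def store(self, writer, obj, value):
        """Accept a write and break the callbacks held by every other client."""
        self.data[obj] = value
        for client in self.callbacks.get(obj, set()) - {writer}:
            client.break_callback(obj)
        self.callbacks[obj] = {writer}


class Client:
    def __init__(self, server):
        self.server = server
        self.cache = {}  # objects covered by a valid callback

    def read(self, obj):
        if obj not in self.cache:  # miss: only the server can supply data
            self.cache[obj] = self.server.fetch(self, obj)
        return self.cache[obj]

    def write(self, obj, value):
        self.cache[obj] = value
        self.server.store(self, obj, value)

    def break_callback(self, obj):
        """Server notification: obj was modified by another client."""
        self.cache.pop(obj, None)


server = Server()
laptop, desktop = Client(server), Client(server)
laptop.write("A", "v1")
first = desktop.read("A")   # desktop now holds a callback on A
laptop.write("A", "v2")     # "A modified": desktop's callback is broken
second = desktop.read("A")  # miss again, so desktop refetches from the server
```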
File System for PlanetLab
• Data is replicated on geographically distributed nodes
• Updates need to be propagated from node to node
• Strong consistency may be needed, depending on the application
• Some file systems assume complete connectivity among nodes
Personal File System
Data at multiple locations
• Desktop, server, laptop, PDA, colleague's laptop
Desirable properties
• Download updates only for the data I want
• Need not connect to the server to get updates
• Some consistency guarantee
See a Similarity?
They are all data replication systems
• Data is replicated on multiple nodes
They differ in
• How much data is replicated at each node
• Who each node talks to
• What consistency to guarantee
So many existing replication systems
• 14 systems in SOSP/OSDI in the last 10 years
• New application or new domain -> build the system from scratch
• Need characteristics from different systems -> build the system from scratch
Motivation
What if we had a toolkit that
• Supports the mechanisms required by replication systems
• Lets you mix and match mechanisms to build a system for your requirements
• Lets you pay only for what you need
We would have
• A way to build better replication systems
• A better way to build replication systems
Our Work
3 properties to characterize replication systems
• PR – Partial Replication
• AC – Arbitrary Consistency
• TI – Topology Independence
Mechanisms to support above properties
• Practi prototype
• Subsumes existing replication systems
• Better trade-offs
Policy elegantly characterized
• Policy as topology
• Concise declarative rules + configuration parameters
Grand Challenge
How can I convince you?
• Better tradeoffs
• Build 14 OSDI/SOSP systems on the prototype
  • With less than 1000 lines of code each
[Chart: synchronization time (s) of Client-Server, Full Replication, and PRACTI (PR + AC + TI) in the Office, Home, Hotel, and Plane scenarios]
Outline
PRACTI Taxonomy
Achieving Practi
• PRACTI prototype
• Evaluation
Making Policy Easier
• Building on PRACTI
• Policy as topology
Ongoing and Future Work
Practi Taxonomy
Characterizing Replication Systems
PRACTI Taxonomy
Topology Independence (TI): any node can communicate with any other node
Arbitrary Consistency (AC): support the consistency requirements of the application
Partial Replication (PR): replicate any subset of data to any node
Existing systems:
• Hierarchy, Client/Server (e.g., Coda, Hier-AFS)
• DHT (e.g., CFS, PAST)
• Object Replication (e.g., Ficus, Pangaea)
• Server Replication (e.g., Bayou, TACT)
• PRACTI: all three properties (PR + AC + TI)
Why is Practi Hard?
[Figure: over time, one node writes module A and then module B under project/module/ (a, b, …, z); another node, which replicates only project/module/b, reads module B and then module A]
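The hazard in the figure can be made concrete with a toy model (illustrative only, not PRACTI code; the log format and the `naive_partial_sync` helper are hypothetical). If a node receives only updates under project/module/b and nothing at all about other objects, it can observe the new module B alongside a stale module A, even though the write to B causally followed the write to A.

```python
# Toy model of naive partial replication; all names are hypothetical.
# Writer's log: module A is written first, then module B (B depends on A).
log = [
    ("project/module/a", 1, "A v2"),
    ("project/module/b", 2, "B v2"),
]

# The reader starts with old versions of both modules.
reader = {"project/module/a": "A v1", "project/module/b": "B v1"}


def naive_partial_sync(store, updates, prefix):
    """Ship only updates under `prefix`; everything else is silently dropped."""
    for path, _stamp, body in updates:
        if path.startswith(prefix):
            store[path] = body


naive_partial_sync(reader, log, "project/module/b")
b_then_a = (reader["project/module/b"], reader["project/module/a"])
# The new B is visible, but the A it depends on is stale, and the
# reader has no way to know that its copy of A is out of date.
```

Steps 2 and 3 below address this by still delivering (possibly summarized) invalidations, so the reader at least learns that its copy of module A is stale.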
Achieving Practi
Practi Prototype
Evaluation
Step 1: Peer-to-Peer Log Exchange
Peer-to-Peer Log Exchange [Petersen et al 97]
Log exchanges for updates
• <objID, timestamp, body>
• Order of updates is maintained
Write = <objId, acceptStamp, BODY>
[Figure: Node A and Node B, each with a log and a checkpoint, exchange writes]
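A minimal sketch of pairwise log exchange, assuming each write's acceptStamp is a `(counter, node)` pair and each node keeps a version vector recording the highest counter it has seen from every node. The sender ships exactly the writes the receiver's vector does not cover, in acceptStamp order, so every node holds a per-writer prefix of each log (the prefix property). The `Node` class and its methods are illustrative, not the prototype's API.

```python
# Illustrative sketch of anti-entropy log exchange; names are hypothetical.
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    clock: int = 0
    log: list = field(default_factory=list)  # (obj_id, (counter, node), body)
    vv: dict = field(default_factory=dict)   # node -> highest counter seen

    def write(self, obj_id, body):
        self.clock += 1
        self._append((obj_id, (self.clock, self.name), body))

    def _append(self, w):
        counter, origin = w[1]
        self.log.append(w)
        self.vv[origin] = max(self.vv.get(origin, 0), counter)
        self.clock = max(self.clock, counter)  # Lamport-style clock advance
        self.log.sort(key=lambda e: e[1])      # every node orders writes alike

    def sync_from(self, peer):
        """Pull every write in peer's log that this node has not seen yet."""
        missing = [w for w in peer.log if w[1][0] > self.vv.get(w[1][1], 0)]
        for w in missing:  # peer.log is already in acceptStamp order
            self._append(w)
        return len(missing)


a, b, c = Node("A"), Node("B"), Node("C")
a.write("x", "1")
b.write("y", "2")
b.sync_from(a)           # B learns A's write
pulled = c.sync_from(b)  # C never talks to A, yet receives both writes
a.sync_from(c)           # A learns B's write through C
```

Because any pair of nodes can run `sync_from`, updates reach every node over whatever connections happen to exist, which is the TI property the next slide refers to.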
Peer-to-Peer Log Exchange
[Figure: Nodes 1 to 4 exchange logs pairwise]
Log exchanges for updates
• TI: Pairwise exchange with any peer
• AC: Careful ordering of updates in logs
  • Prefix property, causal/eventual consistency
  • Broad range of consistency [Yu and Vahdat 2002]
• No PR: All nodes store all data and see all updates
Step 2: Separation of Metadata and Data Paths
Separate Data and Metadata Paths
Log exchange:
• Ordered streams of metadata (invalidations)
• Invalidation: <object, timestamp>
• All nodes see all invalidations (logically)
Checkpoints track which objects are VALID
Nodes receive only the bodies they are interested in
[Figure: Node A's Write = <foo, <10, A>, body> reaches Node B's log only as Invalidation = <foo, <10, A>>; when Node B reads foo, the body <foo, <10, A>, body> is sent separately. Each node keeps a log and a checkpoint.]
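A minimal sketch of the separated paths, assuming a checkpoint that records, per object, the newest timestamp learned from invalidations and the body actually stored. Invalidations (the metadata path) only advance the timestamp and so make an object INVALID; a body (the data path) makes it VALID only if no newer invalidation has arrived. The `Checkpoint` class and its methods are hypothetical; timestamps follow the slide's `<10, A>` form as `(10, "A")`.

```python
# Illustrative sketch of a checkpoint with VALID/INVALID state; names are hypothetical.
class Checkpoint:
    def __init__(self):
        self.stamp = {}  # obj -> newest timestamp seen (from invalidations)
        self.body = {}   # obj -> (timestamp, body) actually stored

    def apply_inval(self, obj, stamp):
        """Metadata path: record that obj changed at `stamp`."""
        if stamp > self.stamp.get(obj, (0, "")):
            self.stamp[obj] = stamp

    def apply_body(self, obj, stamp, body):
        """Data path: keep a body only if it is not already superseded."""
        self.apply_inval(obj, stamp)  # a body implies its own invalidation
        if stamp == self.stamp[obj]:
            self.body[obj] = (stamp, body)

    def is_valid(self, obj):
        return obj in self.body and self.body[obj][0] == self.stamp.get(obj)

    def read(self, obj):
        """Return the body if VALID; None signals a miss for the policy to handle."""
        return self.body[obj][1] if self.is_valid(obj) else None


node_b = Checkpoint()
node_b.apply_inval("foo", (10, "A"))        # Invalidation = <foo, <10, A>>
miss = node_b.read("foo")                   # INVALID: the body has not arrived
node_b.apply_body("foo", (10, "A"), "body")
hit = node_b.read("foo")                    # VALID now
node_b.apply_inval("foo", (11, "A"))        # a newer write happened elsewhere
stale = node_b.read("foo")                  # INVALID again
```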
Separate Data and Metadata Paths
Separation of data and metadata paths:
• TI: Pairwise exchange with any peer
• AC: Careful ordering of updates in logs
• Partial PR: Partial replication of bodies, but full replication of invalidations
[Figure: Nodes 1 to 4 exchange a full invalidation stream; a body is sent separately to the node that needs it]
Step 3: Summarize Unneeded Metadata
Summarize Unneeded Metadata
Imprecise invalidation
• Summary of a group of invalidations
• <objectSet, [start]*, [end]*>
• “One or more objects in objectSet were modified between start time and end time”
Conservative summary
• ObjectSet may include a superset of the targets
• Compact encoding of a large number of invalidations
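A sketch of folding a run of precise invalidations into one conservative imprecise invalidation. It assumes object sets are written as path prefixes ending in `/*` and that `[start]*` and `[end]*` are per-node maps of the first and last counters covered; the `summarize` helper is hypothetical. The resulting set may name objects that were never modified, which is exactly what "conservative" permits.

```python
# Illustrative sketch of building an imprecise invalidation; names are hypothetical.
import posixpath


def summarize(invals):
    """Fold a non-empty list of precise invalidations (path, (counter, node)).

    Returns (object_set, start, end): object_set is the longest common
    directory prefix plus "/*"; start/end map each node to the first/last
    counter in the run.
    """
    dirs = [posixpath.dirname(path) for path, _ in invals]
    prefix = posixpath.commonpath(dirs)
    start, end = {}, {}
    for _, (counter, node) in invals:
        start[node] = min(start.get(node, counter), counter)
        end[node] = max(end.get(node, counter), counter)
    return prefix.rstrip("/") + "/*", start, end


ii = summarize([
    ("/proj/b/x", (11, "A")),
    ("/proj/b/y", (12, "A")),
    ("/proj/c/z", (13, "A")),
])
# "/proj/*" also covers, e.g., /proj/d/w, which was never written.
```

Three invalidations collapse into one fixed-size record; a longer run under the same directory would still produce a record of the same size.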
Summarize Unneeded Metadata (2)
Imprecise invalidations act as “placeholders”
• In log and checkpoint
• Receiver knows that it is missing information
• Receiver blocks operations that depend on missing information
[Figure: Node B subscribes to Node A for green objects and receives the precise invalidation PI:<green, <10, A>> and the imprecise invalidation II:<non-green, <11, A>, <13, A>> in its log; Node B then reads foo]
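A sketch of the placeholder rule, assuming a node remembers which interest sets it currently tracks precisely (like "green" above) and treats everything else as covered only by imprecise summaries. A read is allowed only when the object lies in a precisely tracked set and its body is valid; otherwise the node knows it is missing information and must block or fetch. The `InterestSets` class, its method names, and the path-prefix representation of sets are all assumptions.

```python
# Illustrative sketch of blocking on imprecise state; names are hypothetical.
def in_set(object_set, path):
    """Prefix-style membership: '/green/*' covers '/green/foo'."""
    if object_set.endswith("*"):
        return path.startswith(object_set[:-1])
    return path == object_set


class InterestSets:
    def __init__(self):
        self.precise = set()  # object sets kept current with precise invals

    def became_precise(self, object_set):
        self.precise.add(object_set)

    def became_imprecise(self, object_set):
        self.precise.discard(object_set)

    def can_read(self, path, valid):
        """Allow a read only if path is tracked precisely and its body is valid."""
        return valid and any(in_set(s, path) for s in self.precise)


node = InterestSets()
node.became_precise("/green/*")                      # subscribed for green
ok_green = node.can_read("/green/foo", valid=True)   # served locally
blocked = node.can_read("/red/bar", valid=True)      # only an II covers it: block
```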
Summarize Unneeded Metadata (3)
[Figure: Nodes 1 to 4 exchange partial invalidation streams; a body is sent separately to the node that needs it]
Summarize unneeded metadata:
• TI: Pairwise exchange with any peer
• AC: Careful ordering of updates in logs
• PR: Partial replication of bodies and partial replication of invalidations
Summary of Approach
3 key ideas
• Peer-to-Peer log exchange
• Separation of data and metadata paths
• Summarize unneeded metadata
[Figure: Nodes 1 to 4 exchange logs; in later builds of the slide, bodies travel separately from the invalidation stream]
Why is this better?
How to evaluate?
• Compare with
  • AC-TI server replication (e.g., Bayou, TACT)
  • PR-AC client-server (e.g., Coda, NFS)
  • PR-TI object replication (e.g., Ficus, Pangaea)
• Key question
  • Does the system provide significant advantages?
Prototype benchmarking
• Java + Berkeley DB
PRACTI v. Client/Server v. Full Replication
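[Figure: network diagram for the HOTEL scenario, with link bandwidths of 10 Mb/s, 1 Mb/s, 50 Kb/s, and 0 Mb/s between devices and over the Internet]
Device configurations (storage, dirty data, wireless bandwidth, Internet bandwidth):
• Office server: 1 TB, 100 MB, 10 Mb/s, 100 Mb/s
• Home desktop: 10 GB, 10 MB, 10 Mb/s, 1 Mb/s
• Laptop: 10 GB, 10 MB, 10 Mb/s, 50 Kb/s (hotel)
• Palmtop: 100 MB, 100 KB, 1 Mb/s, N/A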
• Client-server (e.g., Coda)
  • Limited by network to server – Not an attractive solution
• Full Replication (e.g., Bayou)
  • Limited by fraction of shared data – Not a feasible solution
• PRACTI
  • Up to order of magnitude better – Does what you want!
Synchronization time, Palmtop <-> Laptop (seconds; PRACTI / Client-Server / Full Replication):
• Office: 1.6 / 2.0 / 81
• Home: 1.7 / 5.1 / 81
• Hotel: 1.7 / 66 / 81
• Plane: 1.7 / Infinity / 81
Making Policy Easier
Building on Practi
Policy as Topology
Practi as a toolkit
Practi Prototype
• Provides all 3 properties
• Subsumes existing replication systems
• Gives you the mechanisms
• Implement policy over PRACTI for different systems
[Figure: a policy layer (Bayou, Coda, PlanetLab FS, Personal FS, . . .) built on top of the mechanism layer (PRACTI Prototype)]
System Overview
[Figure: the PractiCore exposes a Local Interface (Read(), Write(), Delete()) to applications and a Controller Interface (requests and events) to the Controller; it sends requests to and receives requests from remote cores, and exchanges Inval Streams and Body Streams with them]
Core – mechanisms
• Asynchronous message passing
Controller – policy
PRACTI Basics
Subscription Streams
• 2 types of streams: inval streams and body streams
• Every stream is associated with a subscription set
• Received invals and bodies are forwarded to the appropriate outgoing streams
Controller
• Implements the policy
• Whom to establish subscriptions to
• What to do on a read miss
• Whom to send updates to
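A minimal sketch of forwarding received invalidations to outgoing streams according to each stream's subscription set. It assumes subscription sets are lists of path patterns in which a trailing `*` matches any suffix (as in `/*`). Invalidations outside a stream's set are only counted here, standing in for the imprecise invalidations of Step 3. `OutStream`, `matches`, and `forward` are hypothetical names.

```python
# Illustrative sketch of subscription-set forwarding; names are hypothetical.
from dataclasses import dataclass, field


def matches(pattern, path):
    """'/*' matches everything; '/proj/b/*' matches paths under /proj/b/."""
    if pattern.endswith("*"):
        return path.startswith(pattern[:-1])
    return path == pattern


@dataclass
class OutStream:
    peer: str
    subscription: list                  # e.g. ["/*"] or ["/proj/b/*"]
    sent: list = field(default_factory=list)
    skipped: int = 0                    # would be summarized imprecisely

    def offer(self, inval):
        path, _stamp = inval
        if any(matches(p, path) for p in self.subscription):
            self.sent.append(inval)
        else:
            self.skipped += 1


def forward(inval, streams):
    """Hand one received invalidation to every outgoing stream."""
    for s in streams:
        s.offer(inval)


full = OutStream("server-peer", ["/*"])
part = OutStream("palmtop", ["/proj/b/*"])
for inval in [("/proj/a/x", (1, "A")), ("/proj/b/y", (2, "A"))]:
    forward(inval, [full, part])
```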
Controller Interface
Notification of key events
• Stream begin/end
• Invalidation arrival
• Body arrival
• Local read miss
• Became Precise
• Became Imprecise
Directs communication among cores
• Subscribe to inval or body stream
• Request demand read body
Local housekeeping
• Log garbage collection
• Cache replacement
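The controller interface can be pictured as event callbacks coming up from the core plus requests going back down to it. The sketch below is hypothetical throughout: it models only a few of the events listed above, uses a toy core that records requests instead of sending them, and shows a client/server-style controller that answers a local read miss by demand-reading from a configured server and subscribing to its invalidations for that object. It is not the prototype's Java interface.

```python
# Illustrative sketch of a core/controller split; all names are hypothetical.
from abc import ABC, abstractmethod


class Core:
    """Toy stand-in for the core: records requests instead of sending them."""

    def __init__(self):
        self.requests = []

    def subscribe_inval(self, peer, object_set):
        self.requests.append(("subscribeInval", peer, object_set))

    def demand_read(self, peer, obj):
        self.requests.append(("demandRead", peer, obj))


class Controller(ABC):
    """Policy side: the core calls these methods when key events occur."""

    def __init__(self, core):
        self.core = core

    @abstractmethod
    def on_local_read_miss(self, obj): ...

    def on_inval_arrival(self, obj, stamp):  # default policy: do nothing
        pass

    def on_stream_end(self, peer):           # default policy: do nothing
        pass


class ClientServerController(Controller):
    def __init__(self, core, server):
        super().__init__(core)
        self.server = server

    def on_local_read_miss(self, obj):
        self.core.demand_read(self.server, obj)      # fetch the body
        self.core.subscribe_inval(self.server, obj)  # callback-like subscription


core = Core()
ctrl = ClientServerController(core, "server:5000")
ctrl.on_local_read_miss("/proj/a")
```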
Not all that simple yet
Need to take care of
• Stream management, timeouts, etc.
In some systems
• Arrival of a body or inval may require special processing
• Read misses occur and need to be dealt with
• The replication set is based on priorities or access patterns
Policy: 39 methods to do the magic
• Can we make it easier?
Policy as Topology
Characterizing Policy Elegantly
Policy & Topology
Overlay topology question:
• Among all the possible nodes I am connected to, whom do I communicate with?
Replication policy questions:
• If data is not available locally, whom do I contact?
• If data is updated locally
  • Whom do I send updates to?
  • Whom do I send invalidations to?
• Whom do I prefetch from?
≈ Topology
• Replication set
• Consistency semantics
≈ Configuration parameters
Policy Revisited
Policy now separated into several dimensions
• Propagation of updates -> Topology
  • If there are updates, or if I have a local read miss, whom do I contact?
• Consistency requirements -> Local interface
  • Can we read stale/invalid data? How stale?
• Replication of data -> Config file
  • What subset of data does each node have?
• Other policy essentials -> Config file
  • How long is the timeout?
  • How many times to retry?
  • How often do I GC logs?
  • How much storage do I have?
  • Conflict resolution?
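A hypothetical configuration file covering the replication set and the "other policy essentials" above, together with a small loader that turns it into typed fields. Every key name and value here is an illustrative assumption; the slides do not give the prototype's actual parameter names or format.

```python
# Illustrative config sketch; every key and value is a hypothetical example.
from dataclasses import dataclass, fields

CONFIG_TEXT = """
# replication of data: which subset this node stores
replication_set = /proj/*
# other policy essentials
timeout_ms = 5000
max_retries = 3
log_gc_interval_s = 600
storage_bytes = 10000000000
conflict_resolution = last_writer_wins
"""


@dataclass
class PolicyConfig:
    replication_set: str
    timeout_ms: int
    max_retries: int
    log_gc_interval_s: int
    storage_bytes: int
    conflict_resolution: str


def load_config(text):
    """Parse 'key = value' lines, skipping blanks and # comments, into typed fields."""
    raw = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        raw[key.strip()] = value.strip()
    types = {f.name: f.type for f in fields(PolicyConfig)}
    return PolicyConfig(**{k: types[k](v) for k, v in raw.items()})


cfg = load_config(CONFIG_TEXT)
```

Keeping these values in a file rather than in rules means the same topology rules can be reused across deployments that differ only in timeouts or storage budgets.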
Bayou Policy
Bayou Policy
• Propagation of updates
  • When connected to a neighbor, exchange updates for everything
  -> establish an update subscription from the neighbor for “/*”
• Replication: full replication
• Local interface: reads return only precise and valid objects
• On read miss: should not happen, since every node holds all data
How to specify topology?
In concise rules
Overlog/P2
Overlog [Loo et al 05]
• Declarative routing language based on Datalog
• Expressive and compact rules to specify topology
• Relational data model
  • Tuples
  • Stored in tables, or transient
• Rules
  • Fired by a combination of tuples and conditions
  • A tuple is generated after a rule is fired
• Inter-node access: through remote table access or tuples
• Basic syntax
  • <Action> :- <Event>, <Condition1>, <Condition2>, …, <ConditionN>
  • @ - location specifier
  • _ - wild card
P2
• Runtime system for Overlog
• Parses Overlog and sets up data flows between nodes, etc.
Overlog 101
Ping every neighbor periodically to find live neighbors.
/* Tables */
neighbor(X,Y)
liveNeighbor(X,Y)
/* generate ping event every PING_PERIOD seconds */
pg0 pingEvent@X(X) :- periodic@X(X, E, PING_PERIOD).
/* generate a ping request */
pg1 pingRequest@X(X, Y) :- pingEvent@X(X), neighbor@X(X, Y).
/* send reply to ping request */
pg2 pingReply@Y(Y, X) :- pingRequest@X(X, Y).
/* add to live neighbor table */
pg3 liveNeighbor@X(X, Y) :- pingReply@Y(Y, X).
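To see how the four rules fire, here is a small Python simulation of one ping round. It assumes that a tuple whose location specifier names a down node is simply lost. It follows the rules as written: X derives pingRequest locally (pg1), ships pingReply to Y (pg2), and Y, if it is up, ships liveNeighbor back to X (pg3). The `ping_round` function is hypothetical and only illustrates the dataflow; it is not how P2 executes Overlog.

```python
# Illustrative simulation of rules pg0-pg3; not how P2 actually runs Overlog.
def ping_round(neighbor, up):
    """neighbor: set of (X, Y) tuples; up: set of live nodes.

    Returns the liveNeighbor tuples derived in one round.
    """
    live_neighbor = set()
    for x in sorted({x for x, _ in neighbor}):
        if x not in up:
            continue
        # pg0: the periodic timer fires pingEvent@X(X)
        # pg1: pingRequest@X(X, Y) for every neighbor@X(X, Y)
        requests = [(x, y) for (nx, y) in neighbor if nx == x]
        for _x, y in requests:
            # pg2: pingReply@Y(Y, X) is sent to Y and is lost if Y is down
            if y not in up:
                continue
            # pg3: liveNeighbor@X(X, Y) is derived at Y and sent back to X
            live_neighbor.add((x, y))
    return live_neighbor


neighbor = {("a", "b"), ("a", "c"), ("b", "a")}
result = ping_round(neighbor, up={"a", "b"})  # node c is down
```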
Practi & P2 Overview
Wrapper
• Handles conversion between Overlog tuples and PRACTI requests and events
• Takes care of reconnections and time-outs
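[Figure: on each node, Overlog/P2 exchanges tuples with the Wrapper, which issues requests to and receives events from the PractiCore's Controller Interface; the nodes' P2 instances communicate through dataflows, their cores through streams, and applications perform local reads and writes through the Local Interface]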
Practi & Overlog
Implement policy with Overlog rules
• Overlog tuples/tables -> invoke mechanisms in PRACTI
  • AddInvalSubscription / RemoveInvalSubscription
  • AddBodySubscription / RemoveBodySubscription
  • DemandRead
• PRACTI events -> Overlog tuples
  • LocalRead / LocalWrite / LocalReadMiss
  • RecvInval
• Example
  • Policy: subscribe to invalidations for /* from all neighbors
  • Overlog rule:
AddInvalSubscription@X(X, N, SS) :- Neighbor@X(X, N), SS := "/*".
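A sketch of the wrapper's two directions, with hypothetical names throughout: action tuples produced by rules are dispatched to core mechanisms by tuple name, and core events are turned back into tuples for P2. The example evaluates the rule above by hand, yielding one AddInvalSubscription tuple per Neighbor tuple with SS bound to "/*". Reconnections and time-outs are left out.

```python
# Illustrative wrapper sketch; class, method, and tuple layouts are hypothetical.
class RecordingCore:
    """Stand-in for the PRACTI core; records the mechanism calls it receives."""

    def __init__(self):
        self.calls = []

    def add_inval_subscription(self, peer, sset):
        self.calls.append(("addInval", peer, sset))

    def add_body_subscription(self, peer, sset):
        self.calls.append(("addBody", peer, sset))

    def demand_read(self, peer, obj):
        self.calls.append(("demandRead", peer, obj))


class Wrapper:
    def __init__(self, core):
        self.dispatch = {
            "AddInvalSubscription": core.add_inval_subscription,
            "AddBodySubscription": core.add_body_subscription,
            "DemandRead": core.demand_read,
        }

    def on_tuple(self, name, local, *args):
        """Tuple -> mechanism; `local` is the location field (X) and is dropped."""
        self.dispatch[name](*args)

    def on_event(self, name, local, *args):
        """Event -> tuple, returned so P2 can match it against rule bodies."""
        return (name, local) + args


# Hand evaluation of: AddInvalSubscription@X(X, N, SS) :- Neighbor@X(X, N), SS := "/*".
neighbors = [("nodeX", "nodeA"), ("nodeX", "nodeB")]
core = RecordingCore()
wrapper = Wrapper(core)
for x, n in neighbors:
    wrapper.on_tuple("AddInvalSubscription", x, n, "/*")
miss = wrapper.on_event("LocalReadMiss", "nodeX", "/proj/a")
```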
Bayou in Overlog
Bayou Policy
• Replication
  • Full replication
• Local interface
  • Reads: only precise and valid objects
• On read miss
  • Should not happen
• Propagation of updates
  • When connected to a neighbor, exchange updates (anti-entropy)
  -> establish an update subscription from the neighbor for “/*”
• In Overlog:
subscriptionSet("localhost:5000", "/*").
AddUpdateSubscription@X(X, Y, SS) :- liveNeighbor@X(X, Y),
    subscriptionSet@X(X, SS).
Coda in Overlog
Policy for Coda (Single Server)
• Replication
  • Server: all data
  • Client: hoard set + objects currently being accessed
• Local interface (client)
  • Reads: only precise & valid objects (blocks otherwise)
  • Writes: to locally valid objects (otherwise conflict)
• Read miss
  • Get the object from the server and establish a callback
  • Callback: establish an inval subscription for the object
• Propagation of updates
  • Client sends updates to the server
  • Server: break the callback for all other clients that have the object
  • To break a callback: remove the object from the inval subscription stream
• Hoarding
  • Periodically, fetch all (invalid) objects and establish callbacks on them
Coda in Overlog 2
• Client: on read miss
  • Get Obj from Server
DemandRead@X(X, S, Obj, Offset, Length) :-
    localReadMiss@X(X, Obj, Offset, Length), Server@X(X, S), isConnected@X(X, V), V == 1.
  • Establish callback
AddInvalSubscription@X(X, S, Obj) :-
    localReadMiss@X(X, Obj, _, _), Server@X(X, S), isConnected@X(X, V), V == 1.
  • Set up subscription for updates
AddInvalSubscription@S(S, X, Obj) :-
    localReadMiss@X(X, Obj, _, _), Server@X(X, S), isConnected@X(X, V), V == 1.
AddBodySubscription@S(S, X, Obj) :-
    localReadMiss@X(X, Obj, _, _), Server@X(X, S), isConnected@X(X, V), V == 1.
• Server: on receiving an update from a client
  • Break callbacks for other clients
removeOutgoingInvalSubscription@X(X, C2, Obj) :-
    receivedInval@X(X, C1, Obj, _, _, _, _, _),
    establishedOutgoingInvalSubscriptions@X(X, C2, Obj), C1 != C2.
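The server-side rule can be checked by evaluating its join by hand. The sketch below simplifies both tuple layouts to `(client, object)` pairs (an assumption; the real receivedInval tuple has more fields) and computes which outgoing invalidation subscriptions the rule removes when client C1 sends an invalidation: every subscription on the same object held by a different client C2. The `break_callbacks` function is hypothetical.

```python
# Illustrative hand evaluation of the callback-breaking rule; names are hypothetical.
def break_callbacks(received_invals, established_subs):
    """Evaluate removeOutgoingInvalSubscription as a join with C1 != C2.

    received_invals:  list of (C1, Obj) from receivedInval tuples
    established_subs: set of (C2, Obj) from establishedOutgoingInvalSubscriptions
    Returns the set of (C2, Obj) subscriptions to remove.
    """
    return {
        (c2, obj)
        for c1, obj in received_invals
        for c2, sub_obj in established_subs
        if sub_obj == obj and c1 != c2
    }


subs = {("laptop", "/doc"), ("palmtop", "/doc"), ("palmtop", "/todo")}
removed = break_callbacks([("laptop", "/doc")], subs)
# The writer keeps its own subscription; the palmtop loses only /doc.
```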
Grand Challenge
How can I convince you?
• Better Tradeoffs
• Build 14 OSDI/SOSP Systems on Prototype
• Experience so far
• Bayou – 1 rule + 10 config parameters
• Coda – 13 rules + 10 config parameters
Figure: Time (s) by location and replication approach
Location   PRACTI   Client-Server   Full Replication
Office     1.6      2.0             81
Home       1.7      5.1             81
Hotel      1.7      66              81
Plane      1.7      Infinity        81
Overlog/P2 – not quite perfect
• No guarantee of atomicity or ordering among rules.
• Difficult to specify access-based policies.
• Difficult to specify policies that store information in the replicated object itself.
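The first limitation can be illustrated with a toy model. Assume, as a simplification rather than a description of P2's scheduler, that the outputs of rules fired in the same step are applied as independent add/remove actions in an arbitrary order. Enumerating every order then shows that two rules touching the same subscription can leave different final states. The helper names and the `("srv", "c2", "f")` subscription are hypothetical.

```python
from itertools import permutations


def apply_action(state, action):
    """Apply one rule output (add or remove a subscription) to a state."""
    kind, sub = action
    new = set(state)
    if kind == "add":
        new.add(sub)
    else:
        new.discard(sub)
    return frozenset(new)


def possible_outcomes(initial, actions):
    """Final states reachable if the actions may be applied in any order."""
    results = set()
    for order in permutations(actions):
        state = frozenset(initial)
        for action in order:
            state = apply_action(state, action)
        results.add(state)
    return results


# Hypothetical: in the same step, one rule re-adds c2's inval subscription
# for object f (say, a fresh read miss) while the break-callback rule removes it.
sub = ("srv", "c2", "f")
actions = [("add", sub), ("remove", sub)]
print(possible_outcomes(set(), actions))
```

More than one reachable outcome means the policy's behavior depends on an ordering the language does not specify.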
Ongoing and Future Work
To make a dream a reality
• Overlog + Practi integration
• 14 OSDI/SOSP Systems
• NFS interface
• New Systems: Personal File System, Enterprise File System
• Scalability: 1000s of nodes
• Security
• …
Conclusions
Identified 3 properties which can be used to classify existing replication systems.
A way to build better replication systems
• First replication architecture which provides all three properties
• Subsumes existing systems
• Exposes new points in the design space
A better way to build replication systems
• Policy elegantly characterized
• Policy as topology and configuration parameters
• Policy can be written as concise rules + config parameters
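One way to picture "policy as topology" is as a set of subscription edges. The sketch below is hypothetical: the `Subscription` type and the `full_replication` and `client_server` helpers are illustrative, and configuration parameters are omitted. It contrasts full replication, where every node receives invalidations and bodies for everything, with client-server, where each client holds invalidation-only callbacks on its interest set and pushes its updates back to the server. Bayou-style pairwise anti-entropy is simplified here to a static all-to-all topology.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Subscription:
    """One edge of a hypothetical policy topology."""
    src: str      # node that sends updates
    dst: str      # node that receives them
    objects: str  # object-set prefix, e.g. "/" for everything
    bodies: bool  # True: invalidations and bodies; False: invalidations only


def full_replication(nodes):
    # Every node receives invalidations and bodies for all objects from every other node.
    return {Subscription(a, b, "/", True) for a in nodes for b in nodes if a != b}


def client_server(server, interest):
    # interest maps each client to the object-set prefix it caches.
    subs = set()
    for client, prefix in interest.items():
        subs.add(Subscription(server, client, prefix, False))  # callbacks: invalidations only
        subs.add(Subscription(client, server, prefix, True))   # updates flow back to the server
    return subs


for sub in sorted(client_server("srv", {"c1": "/home/c1"}), key=lambda s: (s.src, s.dst)):
    print(sub)
```

Under this view, switching between the two designs changes only the set of edges, not the underlying replication mechanism.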
Thank You
Towards a unified replication architecture
http://www.cs.utexas.edu/~dahlin/unifiedReplication
Why is this better?
• Subsumes existing systems
• Client-Server, server, object replication, P2P, quorums, …
• Exposes new points in the design space
• Makes it easier to build new systems
• Builds better systems