Data Currency in Replicated DHTs
Reza Akbarinia, Esther Pacitti and Patrick Valduriez
University of Nantes, France, INRIA
ACM SIGMOD 2007
Presenter Jerry Wu
Motivation
• P2P data sharing systems
  – Enable a large number of users to share a massive number of files
  [Figure: a user sends a query, receives a reply, sends a request, and downloads the file]
• Message forwarding in these systems
  – Flooding: KaZaA, Gnutella
  – DHT: CAN, Chord, Pastry, etc.
Distributed Hash Table (DHT)
• Use hash functions to locate files
  – h(meta data) = k (for identification)
  – g(k) = k1 (for routing)
[Figure: user U looks up FreeLoop.mp3 among peers A–F; g(k) = k1 routes the request to peer A, which holds the metadata of FreeLoop.mp3]
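The two-level hashing on this slide (h for identification, g for routing) can be sketched as follows. This is a minimal sketch assuming a Chord-style ring: SHA-1 stands in for both h and g, and the 16-bit identifier space M and the peer IDs for A–F are hypothetical values chosen for illustration. A key is stored at the first peer whose ID is at or after g(k) on the ring.

```python
import hashlib
from bisect import bisect_left

M = 2 ** 16  # hypothetical identifier-space size (Chord itself uses 160-bit IDs)


def h(metadata: str) -> int:
    """Identification: hash the file's metadata to a key k."""
    return int.from_bytes(hashlib.sha1(metadata.encode()).digest(), "big")


def g(k: int) -> int:
    """Routing: map key k to a position k1 on the identifier ring."""
    return int.from_bytes(hashlib.sha1(str(k).encode()).digest(), "big") % M


def responsible_peer(position: int, peers: dict) -> str:
    """Chord-style successor rule: first peer whose ID is >= position, wrapping around."""
    ring = sorted(peers.items(), key=lambda item: item[1])
    ids = [peer_id for _, peer_id in ring]
    i = bisect_left(ids, position)
    return ring[i % len(ring)][0]


# Hypothetical ring positions for the peers in the figure
peers = {"A": 9000, "B": 21000, "C": 33000, "D": 45000, "E": 52000, "F": 60000}
k = h("FreeLoop.mp3")
k1 = g(k)
print(responsible_peer(k1, peers))
```

Using a successor rule means that when a peer leaves, only the keys it held move to its neighbor on the ring.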
Data Replication
• What if node A fails?
• Keep several copies (replicas) of each file
[Figure: the metadata of FreeLoop.mp3 is replicated with several hash functions: g(h(FreeLoop.mp3)) = k1 at peer A, g2(h(FreeLoop.mp3)) = k2 at peer D, g3(h(FreeLoop.mp3)) = k3 at peer E]
Basic Operations
• put_g(meta key k, File D), for g ∈ H
  – Insert a file into the DHT
• get_g(meta key k)
  – Retrieve the file from the DHT
• $H = \{ g \mid g \text{ is a hash function used for replication} \}$
  – $|H|$: the replication level of the system
  – Each file is stored at $|H|$ peers
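Replication with a set of hash functions H can be sketched as below. This is a toy sketch, not the paper's implementation: ToyDHT, put_H and get_H are hypothetical names, and the hash functions g_i(k) = k + i (mod number of peers) are chosen only so that the |H| = 3 replicas land on distinct peers.

```python
class ToyDHT:
    """Hypothetical in-memory DHT: one dict per peer, plus a set of failed peers."""

    def __init__(self, peer_names):
        self.names = sorted(peer_names)
        self.store = {name: {} for name in self.names}
        self.failed = set()

    def rsp(self, g, k):
        # Peer responsible for key k under hash function g (toy modulo placement)
        return self.names[g(k) % len(self.names)]

    def put(self, g, k, data):
        self.store[self.rsp(g, k)][k] = data

    def get(self, g, k):
        peer = self.rsp(g, k)
        if peer in self.failed:
            return None  # the responsible peer has left or failed
        return self.store[peer].get(k)


# Hypothetical H with |H| = 3: g_i(k) = k + i
H = [lambda k, i=i: k + i for i in range(3)]


def put_H(dht, k, data):
    """Store the file at |H| peers, one per hash function in H."""
    for g in H:
        dht.put(g, k, data)


def get_H(dht, k):
    """Try each replica in turn until one answers."""
    for g in H:
        data = dht.get(g, k)
        if data is not None:
            return data
    return None


dht = ToyDHT("ABCDEF")
put_H(dht, 42, "FreeLoop.mp3")
dht.failed.add("A")       # peer A fails
print(get_H(dht, 42))     # the copy on peer B is still reachable
```

Note that nothing here tells the reader whether the copy it got back is the latest one; that is the problem the following slides address.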
Additional Problems
• What if the owner can modify the data?
• The nature of P2P systems
  – Peers can join and leave dynamically
• What if an update happens while some peers are away, and they rejoin later?
• What about concurrent updates?
Solution
• What if we attach a timestamp to every insert/update transaction?
  – The currency of a file is judged by its timestamp
  – FileX = File + timestamp
  – Put (k, FileX) instead of (k, File) into the DHT
• Then we know how fresh each copy is
• Only the latest update can succeed
How Can We Get A Timestamp?
• KTS (Key-based Timestamp Service)
  – Issues timestamps for each transaction
  – gen_ts(key k)
    • Generates a timestamp for key k
  – last_ts(key k)
    • Returns the last timestamp issued for key k
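The two KTS operations can be illustrated with per-key counters. This is a minimal sketch assuming timestamps are integer counters kept in memory by the single peer that issues timestamps for a key; the class name KTSPeer is hypothetical.

```python
from collections import defaultdict


class KTSPeer:
    """Hypothetical timestamp issuer: one monotonic counter per key."""

    def __init__(self):
        self.counters = defaultdict(int)  # key -> last issued timestamp (0 = none yet)

    def gen_ts(self, k):
        # Each call returns a timestamp strictly greater than any earlier one for k
        self.counters[k] += 1
        return self.counters[k]

    def last_ts(self, k):
        # The most recently issued timestamp for k; no new timestamp is generated
        return self.counters[k]


kts = KTSPeer()
t1 = kts.gen_ts("k")
t2 = kts.gen_ts("k")
print(t1, t2, kts.last_ts("k"))
```

Because counters are per key, timestamps only need to be ordered among updates to the same file, which is why a single responsible peer per key suffices.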
The New DHT Functions
• Based on the KTS service
• Insert(key k, FileX D, hash function set Hr)
  – Insert or update the file with identity key k in the DHT
• Retrieve(k, Hr)
  – Retrieve the latest copy of the file with identity key k
Insert A File
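[Figure: user U inserts P.avi, replicated with hash functions g and g2]
• U computes h(P.avi) = k
• The KTS (Timestamp Service) issues a timestamp: gen_ts(k) = tA
• U stores (tA, P.avi) at every replica: put_g(k, (tA, P.avi)) at peer A (g(k) = k1) and put_g2(k, (tA, P.avi)) at peer C (g2(k) = k2)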
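The insert procedure above can be sketched as: obtain one timestamp from the KTS, then store the same (timestamp, file) pair under every hash function in Hr. This is a minimal sketch under stated assumptions: the KTS is a local dict of counters, peers are numbered 0–7, placement is g(k) mod 8, and the two hash functions in Hr are hypothetical. The replica-side timestamp check is deliberately left out here.

```python
from typing import Callable, Dict, List, Tuple

Replica = Tuple[int, str]          # (timestamp, file contents)
HashFn = Callable[[int], int]

NUM_PEERS = 8                      # hypothetical network size
peers: Dict[int, Dict[int, Replica]] = {p: {} for p in range(NUM_PEERS)}
kts_counters: Dict[int, int] = {}  # stand-in for the KTS peer's per-key counters


def gen_ts(k: int) -> int:
    kts_counters[k] = kts_counters.get(k, 0) + 1
    return kts_counters[k]


def put(g: HashFn, k: int, replica: Replica) -> None:
    # Store at the peer responsible for k under g (toy placement: g(k) mod NUM_PEERS).
    # Plain overwrite: the replica-side "newer timestamp wins" check is omitted here.
    peers[g(k) % NUM_PEERS][k] = replica


def insert(k: int, data: str, Hr: List[HashFn]) -> int:
    ts = gen_ts(k)                 # 1. one timestamp per insert/update, from the KTS
    for g in Hr:                   # 2. the same (ts, data) pair goes to every replica
        put(g, k, (ts, data))
    return ts


Hr: List[HashFn] = [lambda k: k, lambda k: k + 1]  # hypothetical g and g2
insert(5, "P.avi", Hr)
```

All replicas written by one Insert share a single timestamp, so any replica holding that timestamp is known to be current.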
Retrieve A File
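[Figure: user U retrieves P.avi]
• U computes h(P.avi) = k
• The KTS (Timestamp Service) returns the last issued timestamp: last_ts(k) = tA
• get_g(k) at peer A (g(k) = k1) returns an older copy (t0, P.avi); get_g2(k) at peer C (g2(k) = k2) returns the current copy (tA, P.avi)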
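Retrieval can be sketched as one KTS lookup followed by fetching replicas until one carries the latest timestamp. This is a minimal sketch; the fallback of returning the freshest stale copy when no current replica is reachable is an assumption of the sketch. The stores, the peer mapping and the numeric timestamps (t0 = 1, tA = 2) are hypothetical values mirroring the figure.

```python
from typing import Callable, Dict, List, Optional, Tuple

Replica = Tuple[int, str]          # (timestamp, file)
HashFn = Callable[[int], int]


def retrieve(k: int, Hr: List[HashFn],
             last_ts: Callable[[int], int],
             get: Callable[[HashFn, int], Optional[Replica]]) -> Optional[str]:
    ts_latest = last_ts(k)         # 1. one KTS lookup: last timestamp issued for k
    freshest: Optional[Replica] = None
    for g in Hr:                   # 2. fetch replicas one at a time
        replica = get(g, k)
        if replica is None:
            continue               # responsible peer is gone; try the next hash function
        if replica[0] == ts_latest:
            return replica[1]      # current replica found: stop early
        if freshest is None or replica[0] > freshest[0]:
            freshest = replica     # remember the most recent stale copy seen so far
    # Assumption of this sketch: with no current replica, fall back to the freshest copy
    return None if freshest is None else freshest[1]


# Toy setup: peer A holds a stale copy (t0 = 1), peer C the current one (tA = 2)
stores: Dict[str, Dict[int, Replica]] = {"A": {7: (1, "P.avi (old)")}, "C": {7: (2, "P.avi")}}
peer_of = ["A", "C"]                               # g -> A, g2 -> C
Hr: List[HashFn] = [lambda k: 0, lambda k: 1]      # hypothetical g and g2


def get(g: HashFn, k: int) -> Optional[Replica]:
    return stores[peer_of[g(k)]].get(k)


print(retrieve(7, Hr, lambda k: 2, get))
```

The early exit is what keeps the expected number of replica fetches low when most replicas are current.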
Update A File
[Figure: put_g(k, (tsx, File D)) arrives at a peer that stores:]
Key   TS    File
k     ts0   File D (P.avi)
k1    ts1   File D1 (X.mp3)
k2    ts2   File D2 (Y.m4v)
k3    ts3   File D3 (Z.tar)
• If (tsx > ts0) then
  – Update File D
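The replica-side rule on this slide can be sketched in a few lines: an incoming put overwrites the stored copy only if its timestamp is newer. This is a minimal sketch; apply_put is a hypothetical name, and treating a missing key as a plain insert is an assumption.

```python
from typing import Dict, Tuple


def apply_put(table: Dict[str, Tuple[int, str]], k: str, ts_x: int, data: str) -> bool:
    """Accept put_g(k, (ts_x, data)) only if ts_x is newer than the stored timestamp."""
    current = table.get(k)
    if current is None or ts_x > current[0]:
        table[k] = (ts_x, data)
        return True
    return False  # stale or duplicate update: ignored


table = {"k": (3, "P.avi")}
apply_put(table, "k", 5, "P.avi v2")   # accepted: 5 > 3
apply_put(table, "k", 4, "P.avi v1")   # rejected: 4 < 5
```

Because each replica applies this check independently, a delayed or out-of-order update cannot overwrite a newer copy, so only the latest update survives.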
Retrieval Cost Analysis
• $C = C_{kts} + N \cdot C_{ret}$
  – $C_{kts} = C_{ret} = O(\log n)$, where n is the number of peers
• N: number of retrievals needed to get the latest copy
  – Let X be the random variable for N
  – $p_t$: the probability that a retrieved replica is fresh (current)
  – $\Pr(X = i) = p_t (1 - p_t)^{i-1}$
• $|H_r|$: number of replicas in the system
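The expected cost follows from the geometric distribution above. Without a cap, $E[X] = 1/p_t$; since at most $|H_r|$ replicas exist, the sketch below caps N at $|H_r|$ and uses $E[N] = \sum_{i=0}^{|H_r|-1} \Pr(X > i) = \sum_{i=0}^{|H_r|-1} (1 - p_t)^i$. The cap and the choice of $\log_2 n$ hops for both $C_{kts}$ and $C_{ret}$ are assumptions for illustration.

```python
import math


def expected_retrievals(p_t: float, num_replicas: int) -> float:
    """E[N] with N = min(X, |Hr|) and Prob(X = i) = p_t * (1 - p_t) ** (i - 1)."""
    return sum((1 - p_t) ** i for i in range(num_replicas))


def expected_cost(n_peers: int, p_t: float, num_replicas: int) -> float:
    """E[C] = C_kts + E[N] * C_ret, taking C_kts = C_ret = log2(n) hops (an assumption)."""
    hops = math.log2(n_peers)
    return hops + expected_retrievals(p_t, num_replicas) * hops


print(expected_retrievals(0.3, 13))       # bounded above by 1 / 0.3
print(expected_cost(10000, 0.3, 13))
```

Even with only 30% of replicas current, the expected number of replica fetches stays below about 3.3, so the total cost remains a small multiple of a single O(log n) lookup.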
• Then, how can we get a timestamp?
  – Key-based Timestamp Service (KTS)
The KTS Service
• Uses the same DHT, but with a different hash function, hts
[Figure: TimeStampRequest(k) is routed through the DHT's hash tables in steps 1–4 to peer p, where Req(k, hts) = p]
The KTS Service
• How can peer p generate timestamps for key k when it becomes responsible for k?
  – Direct initialization: receive the counters from the leaving peer
    • The DHT distributes the load of a leaving peer to its neighbors
  – Indirect initialization: send a request for the file with key k to obtain its latest timestamp
    • Takes place if the leaving peer fails
The KTS Service
• Indirect initialization
  – Failure probability $p_f$: the chance that none of the $|H|$ replicas is current
  – $p_f = (1 - p_t)^{|H|}$
  – If $p_t = 30\%$ and $|H| = 13$, then $p_f < 1\%$
• After initialization, the timestamp for k is incremented on every timestamp request
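The failure bound and the indirect initialization can be checked numerically. This is a minimal sketch: indirect_init and next_ts are hypothetical names, and seeding the counter with the largest timestamp found among the retrieved replicas (0 if none answered) is an assumption about how the latest timestamp is recovered.

```python
def failure_probability(p_t: float, num_replicas: int) -> float:
    """p_f = (1 - p_t) ** |H|: probability that none of the |H| replicas is current."""
    return (1 - p_t) ** num_replicas


def indirect_init(replica_timestamps: list) -> int:
    """Hypothetical indirect initialization: seed the counter for k with the
    largest timestamp found among the retrieved replicas (0 if none answered)."""
    return max(replica_timestamps, default=0)


def next_ts(counter: int) -> int:
    """After initialization, each timestamp request increments the counter."""
    return counter + 1


print(failure_probability(0.3, 13))   # about 0.0097, i.e. below 1%
counter = indirect_init([3, 7, 5])
print(next_ts(counter))
```

With $p_t = 30\%$, 12 replicas give $0.7^{12} \approx 1.4\%$, so 13 is the smallest replication level that meets the 1% target on the slide.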
Experiments And Simulations
• Environments
  – 64-node cluster
  – 10,000 simulated nodes on the SimJava platform
• Metrics
  – Response time: time to return a current replica in response to a query
  – Communication cost: number of messages sent to answer a query
The Competitor - BRICKS
• Uses a function to map key k to multiple keys (k1, k2, k3, k4, …)
• Each replica has a version number
  – Suffers from concurrent update problems
  – Must retrieve all replicas to find the newest one
[Figure: response time vs. DHT size]
[Figure: communication cost vs. DHT size]
[Figure: response time vs. number of replicas]
[Figure: failure rate vs. response time]
Conclusion
• Pros
  – Using the DHT itself to provide the timestamp service is smart!
  – Addresses the concurrent update problem
  – Easy to apply to existing DHTs
• Cons
  – The KTS service can add extra communication overhead
Thank You