Data Currency in Replicated DHTs

Reza Akbarinia, Esther Pacitti and Patrick Valduriez, University of Nantes, France, INRIA. ACM SIGMOD 2007. Presenter: Jerry Wu


Page 1: Data Currency in Replicated DHTs

Data Currency in Replicated DHTs

Reza Akbarinia, Esther Pacitti and Patrick Valduriez

University of Nantes, France, INRIA

ACM SIGMOD 2007

Presenter Jerry Wu

Page 2: Data Currency in Replicated DHTs

Motivation

• P2P data sharing systems
  – Enable a large number of users to share a massive number of files
  – Query → Reply → Send request → Download
• Message forwarding on these systems
  – Flooding: KaZaA, Gnutella
  – DHT: CAN, Chord, Pastry, etc.

Page 3: Data Currency in Replicated DHTs

Distributed Hash Table (DHT)

• Use hash functions to locate files
  – h(meta data) = k (for identification)
  – g(k) = k1 (for routing)

[Figure: DHT with peers A, B, C, D, E, F and user U. The metadata of FreeLoop.mp3 is hashed to k, and g(k) = k1 maps to peer A.]
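As a rough illustration of the two hash functions on this slide, the sketch below derives an identification key k from a file's metadata and then maps k to a peer. The use of SHA-1, the 16-bit identifier space, the evenly spaced peer ids and the "first peer at or after k1" placement rule are all assumptions made for illustration; the slides do not specify them.

```python
import hashlib
from bisect import bisect_left

ID_BITS = 16  # assumed size of the identifier space


def h(metadata: str) -> int:
    """Identification hash: metadata -> key k."""
    digest = hashlib.sha1(metadata.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (1 << ID_BITS)


def g(k: int) -> int:
    """Routing hash: key k -> position k1 in the identifier space."""
    digest = hashlib.sha1(b"g:" + k.to_bytes(4, "big")).digest()
    return int.from_bytes(digest, "big") % (1 << ID_BITS)


def responsible_peer(k1: int, ring: list[tuple[int, str]]) -> str:
    """Return the first peer whose id is >= k1, wrapping around the ring."""
    ids = [peer_id for peer_id, _ in ring]
    index = bisect_left(ids, k1) % len(ring)
    return ring[index][1]


# Hypothetical peers A..F placed evenly on the ring, sorted by id.
RING = [(i * (1 << ID_BITS) // 6, name) for i, name in enumerate("ABCDEF")]

k = h("FreeLoop.mp3")
k1 = g(k)
print(k, k1, responsible_peer(k1, RING))
```

Which peer ends up responsible depends entirely on the assumed hash and ring layout, so the printed peer will not necessarily be A as in the figure.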

Page 4: Data Currency in Replicated DHTs

Data Replication

• What if node A fails?
• Duplicate several copies

[Figure: user U stores the metadata of FreeLoop.mp3 at three peers: g(h(FreeLoop.mp3)) = k1 (A), g2(h(FreeLoop.mp3)) = k2 (D), g3(h(FreeLoop.mp3)) = k3 (E).]

Page 5: Data Currency in Replicated DHTs

Basic Operations

• putH(meta key k, File D)
  – Insert a file into the DHT
• getH(meta key k)
  – Retrieve the file from the DHT

H : { g(k , D) | g is used as a hash function }

|H| : The replication level of the system

Each file will be stored at |H| peers
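A minimal sketch of how put and get could be repeated over a set H of hash functions so that a file ends up in |H| places. `ToyDHT`, the hash function names `g1` to `g3` and the in-memory dictionaries are hypothetical stand-ins, not the interface of any real DHT.

```python
import hashlib


class ToyDHT:
    """In-memory stand-in for a DHT: one dict per peer."""

    def __init__(self, num_peers: int):
        self.peers = [dict() for _ in range(num_peers)]

    def _peer_for(self, g_name: str, k: str) -> dict:
        # Each named hash function g sends k to one responsible peer.
        digest = hashlib.sha1(f"{g_name}:{k}".encode("utf-8")).digest()
        return self.peers[int.from_bytes(digest, "big") % len(self.peers)]

    def put(self, g_name: str, k: str, data: bytes) -> None:
        self._peer_for(g_name, k)[(g_name, k)] = data

    def get(self, g_name: str, k: str):
        return self._peer_for(g_name, k).get((g_name, k))


H = ["g1", "g2", "g3"]  # replication level |H| = 3 (assumed)

dht = ToyDHT(num_peers=8)
for g_name in H:
    dht.put(g_name, "k", b"FreeLoop.mp3 contents")

copies = [dht.get(g_name, "k") for g_name in H]
print(len(copies))
```

In this toy version two hash functions may pick the same peer; a real system would have to decide whether that is acceptable.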

Page 6: Data Currency in Replicated DHTs

Additional Problems

• If the owner can modify the data …

• The nature of P2P systems
  – Peers can join and leave dynamically

• What if an update happens while some peers are gone and they rejoin later?

• Concurrent updates?

Page 7: Data Currency in Replicated DHTs

Solution

• What if we have a timestamp for each update/insert transaction?
  – The currency of the file is judged by its timestamp
  – FileX = File + timestamp
  – Put (k, FileX) instead of (k, File) into the DHT!
• Then we know the freshness of the file
• Only the latest update can succeed

Page 8: Data Currency in Replicated DHTs

How Can We Get A Timestamp?

• KTS (Key-based Timestamp Service)
  – Issue timestamps for each transaction
  – gen_ts(key k)
    • Generate a timestamp w.r.t. key k
  – last_ts(key k)
    • Return the last timestamp issued for key k
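The two KTS operations can be sketched as a per-key counter. `LocalKTS` is a hypothetical single-process stand-in: it ignores distribution, peer failures and concurrency, and returning 0 for a key that has no timestamp yet is an assumption.

```python
from collections import defaultdict


class LocalKTS:
    """Single-process stand-in for KTS: one counter per key."""

    def __init__(self):
        self._counters = defaultdict(int)

    def gen_ts(self, k: str) -> int:
        """Issue a new timestamp for k, larger than all earlier ones for k."""
        self._counters[k] += 1
        return self._counters[k]

    def last_ts(self, k: str) -> int:
        """Return the last timestamp issued for k (0 if none yet)."""
        return self._counters[k]


kts = LocalKTS()
t1 = kts.gen_ts("k")
t2 = kts.gen_ts("k")
print(t1, t2, kts.last_ts("k"))
```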

Page 9: Data Currency in Replicated DHTs

The New DHT Functions

• Based on the KTS service
• Insert(key k, FileX D, Hash function set Hr)
  – Insert or update a file with identity key k into the DHT
• Retrieve(k, Hr)
  – Retrieve the latest copy of the file with identity key k
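A minimal sketch of how Insert and Retrieve might sit on top of gen_ts and last_ts. The dictionaries standing in for KTS and the DHT are hypothetical, and falling back to the most recent replica seen when no current one is reachable is an assumption of this sketch.

```python
from collections import defaultdict

counters = defaultdict(int)   # stand-in for KTS: one counter per key
replicas = {}                 # stand-in for the DHT: (g, k) -> (ts, data)


def gen_ts(k):
    counters[k] += 1
    return counters[k]


def last_ts(k):
    return counters[k]


def insert(k, data, Hr):
    """Stamp the data once, then store the pair under every function in Hr."""
    ts = gen_ts(k)
    for g in Hr:
        replicas[(g, k)] = (ts, data)
    return ts


def retrieve(k, Hr):
    """Probe replicas one by one; stop at the first that carries last_ts(k)."""
    current = last_ts(k)
    newest = None
    for g in Hr:
        pair = replicas.get((g, k))
        if pair is None:
            continue                  # replica unavailable
        if pair[0] == current:
            return pair[1]            # current replica found, stop probing
        if newest is None or pair[0] > newest[0]:
            newest = pair
    # No current replica reachable: return the most recent one seen, if any.
    return newest[1] if newest else None


Hr = ["g1", "g2"]
insert("k", "v1", Hr)
insert("k", "v2", ["g2"])     # simulates the g1 replica missing the update
result = retrieve("k", Hr)
print(result)
```

The early return is the point of the design: once a replica carrying last_ts(k) is found, the remaining replicas need not be contacted.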

Page 10: Data Currency in Replicated DHTs

Insert A File

[Figure: user U inserts P.avi into a DHT with peers A, B, C, D, E, F, G and the KTS timestamp service. Labels: h(P.avi) = k; gen_ts(k) = tA; g(k) = k1 (A); g2(k) = k2 (C); putg(k, (tA, P.avi)); putg2(k, (tA, P.avi)).]

Page 11: Data Currency in Replicated DHTs

Retrieve A File

[Figure: user U gets P.avi from a DHT with peers A, B, C, D, E, F, G and the KTS timestamp service. Labels: h(P.avi) = k; last_ts(k) = tA; g(k) = k1 (A); g2(k) = k2 (C); getg(k); getg2(k); replies (t0, P.avi) and (tA, P.avi).]

Page 12: Data Currency in Replicated DHTs

Update A File

putg(k, (tsx, File D))

• If (tsx > ts0) then
  – Update File D

Key   TS    File
k     ts0   File D (P.avi)
k1    ts1   File D1 (X.mp3)
k2    ts2   File D2 (Y.m4v)
k3    ts3   File D3 (Z.tar)
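The update rule on this slide amounts to a timestamp comparison at the peer that stores the replica. A minimal sketch, where `put_if_newer` and the dictionary-based table are hypothetical names chosen for illustration:

```python
def put_if_newer(table, k, ts_x, data):
    """Store (ts_x, data) under k only if ts_x is newer than the stored timestamp."""
    stored = table.get(k)
    if stored is None or ts_x > stored[0]:
        table[k] = (ts_x, data)
        return True
    return False


table = {"k": (3, "P.avi v3")}
accepted = put_if_newer(table, "k", 5, "P.avi v5")   # newer than 3: replaces
rejected = put_if_newer(table, "k", 4, "P.avi v4")   # older than 5: ignored
print(accepted, rejected, table["k"])
```

Because a late-arriving older update is simply dropped, the order in which updates reach a peer does not affect which version it ends up holding.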

Page 13: Data Currency in Replicated DHTs

Retrieval Cost Analysis

• $C = C_{kts} + N \cdot C_{ret}$
• $C_{kts} = C_{ret} = O(\log n)$, n = # of peers
• N : Number of retries to get the latest copy
• Let X be the random variable for N
• $p_t$ : The probability of finding a fresh copy
• $\mathrm{Prob}(X = i) = p_t (1 - p_t)^{i-1}$
• $|H_r|$ = number of replicas of the system
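As a worked step under the slide's model, the expected number of retries follows from the geometric distribution. Treating the probes as independent, each succeeding with probability $p_t$, and capping the number of probes at $|H_r|$ are assumptions of this sketch.

```latex
\begin{align*}
E[X] &= \sum_{i \ge 1} i \, p_t (1 - p_t)^{i-1} = \frac{1}{p_t}, \\
E[\min(X, |H_r|)] &= \sum_{i=1}^{|H_r|} \Pr(X \ge i)
                   = \sum_{i=1}^{|H_r|} (1 - p_t)^{i-1}
                   = \frac{1 - (1 - p_t)^{|H_r|}}{p_t}
                   \le \min\!\left(\frac{1}{p_t},\, |H_r|\right).
\end{align*}
```

Under these assumptions the expected cost is at most $C_{kts} + \min(1/p_t, |H_r|) \cdot C_{ret}$, which stays $O(\log n)$ when $p_t$ and $|H_r|$ are treated as constants.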

Page 14: Data Currency in Replicated DHTs

Retrieval Cost Analysis

• Then, how can we get a timestamp?
  – Key-based Timestamp Service (KTS)

Page 15: Data Currency in Replicated DHTs

The KTS Service

• Use the same DHT but with a different hash function hts

[Figure: a TimeStampRequest(k) is routed through the peers' hash tables using hts, in steps 1 to 4, to the peer Req(k, hts) = p.]

Page 16: Data Currency in Replicated DHTs

The KTS Service

• How can node p generate timestamps w.r.t. key k?
  – Receive the counters from a leaving peer
    • DHT system will distribute the load of the leaving peer to its neighbors
    • Direct initialization
  – Send a file request w.r.t. key k to obtain the latest timestamp
    • Takes place if the leaving peer fails
    • Indirect initialization
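The two initialization paths can be sketched as one function that picks a starting counter for key k. `init_counter`, its arguments and the default of 0 when nothing is known are hypothetical choices for illustration; the slides only say where the value comes from in each case.

```python
def init_counter(k, handed_over, replica_timestamps):
    """Pick a starting counter for key k on the newly responsible peer."""
    if k in handed_over:
        # Direct initialization: the leaving peer passed its counter on.
        return handed_over[k]
    if replica_timestamps:
        # Indirect initialization: use the newest timestamp found on replicas.
        return max(replica_timestamps)
    # Nothing known about k yet (assumed default).
    return 0


direct = init_counter("k", {"k": 7}, [3, 4])
indirect = init_counter("k", {}, [3, 9, 4])
print(direct, indirect)
```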

Page 17: Data Currency in Replicated DHTs

The KTS Service

• Indirect initialization
  – The probability to fail: $p_f$
  – $p_f = (1 - p_t)^{|H|}$
  – If $p_t$ = 30%, $|H|$ = 13, then $p_f$ < 1%

• After initialization, increase timestamp on every timestamp request
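The numeric example on this slide can be checked directly. The function name below is hypothetical, and the formula assumes that each replica is current independently with probability pt.

```python
def indirect_init_failure(p_t: float, num_replicas: int) -> float:
    """Probability that none of the probed replicas is current."""
    return (1.0 - p_t) ** num_replicas


p_f = indirect_init_failure(0.30, 13)
print(f"{p_f:.4f}")  # about 0.0097, i.e. below 1%
```

With 12 replicas instead of 13 the same formula gives roughly 1.4%, so 13 is the smallest replication level that pushes the failure probability under 1% at pt = 30%.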

Page 18: Data Currency in Replicated DHTs

Experiments And Simulations

• Environments
  – 64 node cluster
  – 10000 nodes on the SimJava platform
• Metrics
  – Response time : Time to return a current replica in response to a query
  – Communication cost : # of messages to send to answer a query

Page 19: Data Currency in Replicated DHTs

The Competitor - BRICKS

• Use a function to map key k to multiple keys (k1, k2, k3, k4, …)
• Each replica has a version number
  – Concurrent update problems
  – Must extract all replicas to find the newest one

Page 20: Data Currency in Replicated DHTs

Response Time VS DHT Size

Page 21: Data Currency in Replicated DHTs

Communication Cost VS DHT Size

Page 22: Data Currency in Replicated DHTs

Response Time VS # of Replicas

Page 23: Data Currency in Replicated DHTs

Failure Rate VS Response Time

Page 24: Data Currency in Replicated DHTs

Conclusion

• Pros
  – Using the DHT to provide the timestamp service is smart!
  – Considers the concurrent update problem
  – Easy to apply to existing DHTs
• Cons
  – KTS service can raise additional communication overhead

Page 25: Data Currency in Replicated DHTs

Thank You