Shared Memory Emulation UCSDcircuit.ucsd.edu/~yhk/ece293-aut16/pdfs/aut16-wang.pdf · writer reader...

Information‐Theoretic Lower Bounds for Shared Memory Emulation

Zhiying WangUniversity of California, Irvine

Joint with Viveck R. Cadambe (PSU), Nancy Lynch (MIT)

Distributed Algorithms• Algorithms that are supposed to work in distributed networks, or on multiprocessors

• ~30 years research

• Applications– Machine learning– Networking– Multiprocessor programming

Timing• Synchronous

– Processes communicate every round– Requires “clock”

• Asynchronous: take action at any time

Changing Information in Distributed System

• Modern key value stores ‐ Amazon Dynamo DB, Couch DB, Apache Cassandra DB, Google Spanner, Voldermort DB …

• Used for transactions, reservation systems, multi‐player gaming, social networks, news feeds, distributed computing tasks…

• High frequency trading• Theoretical: relates to shared memory,

connects two fundamental communication models of asynchronous systems

Images source: google image 4

A Motivating Example• N servers • Tolerate f server failures• A message generated every time slot• Channel: delay between 0 and T‐1• Encode: a function of its own messages• Decode: from any N‐f servers get the latest

common message or something later

servers

A Motivating Example• N servers • Tolerate f server failures• A message generated every time slot• Channel: delay between 0 and T‐1, say T=2• Encode: a function of its own messages• Decode: from any N‐f servers get the latest

common message or something later• At time slot t

servers

Server 1 Server 2 Server 30, 1,…, t‐1, 0, 1,…, t‐1,

t0, 1,…, t‐1,t

A Motivating Example• T=2• At time t, only need to consider t‐1 and t• Call them message 1 and message 2• In general, T represents the concurrency level• Q: What is the storage cost?

Solution 1: Replication• Storage size = log|V|• Decode “latest” among the connected servers

Msg 1 Msg 1 Msg 1 Msg 1

Msg 2 Msg 2

Msg 1Msg 1

N=7, N‐f=3, T=2

Solution 2: Static Coding• Recover the message from any 3 pieces• Every server only stores log|V| × 1/3!• Examples: Facebook, Windows Azure…

Original message zx y

Coded message zx yParityx+9y+4z x+2y+3z x+3y+7z x+8y+z

Solution 2: Static Coding• However, need to keep history (Msg 1)• (Worst‐case) storage size = log|V| × 2/3 • Decode “Latest common”1 2 3 4 5

1/3 1/3 1/3 1/3

1/3 1/3

N=7, N‐f=3, T=2

Our Solution: Multi‐Version Code• Storage size = log|V| × 1/2 • Decode “latest common or something later”

1/2 1/2 1/2 1/2

1/2 1/2

1 2 3 4 5 6 7

N=7, N‐f=3, T=2

[Wang, Cadambe, ISIT 2014, Allerton 2014, arxiv 2015] 11

More Generally• Messages generated (write invocated) at any time point

• Channel delays can be arbitrary and unbounded

• Can design write and read protocols• Multiple write and read clients

Timing• Synchronous

– Processes communicate every round– Requires “clock”

• Asynchronous: take action at any time– Message passing model– Shared memory model

Message Passing Model

writer

reader

serversmessage

reader

message

messagemessage

shared memory

Shared Memory Model

writer

reader

message

reader

message

messagemessage

shared memory

messagemessagemessage

Emulate Shared Memory In Message Passing Model

shared memory

Message passing

Emulation algorithm

any algorithm for shared memory model => an algorithm for message passing model

In This TalkInformation‐theoretic lower bounds on the storage cost of the shared memory emulation algorithms

• Problem settings• First lower bound for any algorithm• Second lower bound for a restricted class of algorithms

Emulation of Shared Memoryin Message‐passing Networks

• Setup– Distributed, unreliable, asynchronous– Processes: N servers, any number of write and read clients– Failures: up to f servers, arbitrary number of clients can crash

– Concurrent access allowed• Requirements

– Atomicity– Liveness

Atomicitywrite m1 write m2

read m1

read m1 read m1 read m2 read m2

read m1 read m2

read m2

✓write m3

✗write m1 write m2

read m2read m0

write m3

✗write m1 write m2

read m2 read m1 [Lamport 86] 20

Liveness

write m2write m1

write m3

write m4

read m1 read m2✗

write m1 write m2

read m1

read m1 read m1 read m2 read m2

read m1 read m2

read m2

✓write m3

Challenges • Asynchronous communication

– A process has no global view– Atomicity requires a global ordering

• Unbounded delay– A very long delay?– A failure?

• Failures

What should the client do?• Write to 1 server?

writer

message

What should the client do?• Write to all servers?

writer

message

message25

What should the client do?• Write to or read from a “quorum”

– A quorum = any subset of size N‐f

N=3, f=1

writer

message

What should the client do?• Write to or read from a “quorum”

– A quorum = any subset of size N‐f

N=3, f=1

readerwriter

message

Other Concerns• Add “tags” to messages• Readers “write back”

✗write m1write m1 write m2write m2

read m1read m1 read m2read m2 read m1read m1

Replication Based Algorithm• Server stores the latest message it received

– According to the tag

• Can recover what has been written– When the write and read quorums have an overlap

[ABD, Attiya, Bar‐Noy, Dolev, 1990]

1 1 1 1 0 0tagcode

Improve by Coding• Coding based algorithms

– [HGR 04], [AJL 05], [CT 05, 06], [DGL 08], [DKLMSV 13], [ACDV 14], [CLMP 14]

– (Worst‐case) Storage cost is proportional to number of concurrent active writes at any time point, v

– Per‐server storage cost

• Replication based algorithms [ABD]– Per‐server storage cost =

Storage Cost Independent of v?

Writer 1 deliver some piecesMsg 1 = (x,y,z)

x y x+y+z

z’x+x’ y x+y+zWriter 2 deliver some piecesMsg 2 = (x’,y’,z’)

z+z’x+x’ y+y’ x+y+z+x’+y’+z’

Both writers deliver all pieces

zx y x+y+zTwo writers decide to store msg 1Works even if writer 1 fails

Baseline Lower BoundTheorem.

• Analogous to Singleton bound in coding theory

N = 4, f=1

x+y+zzx y

0 2 4 6 8 10 12 14 16 18 200

ABD algorithm

Baseline lower bound First lower bound

Erasure coding based algorithms

Storage Co

Number of concurrent writes

Summary of Results

[Cadambe, W, Lynch, 2016]

0 2 4 6 8 10 12 14 16 18 200

ABD algorithm

Baseline lower bound First lower bound

Second lower bound*

Erasure coding based algorithms

Number of concurrent writes

Summary of Results

[Cadambe, W, Lynch, 2016]

Storage Co

First Lower BoundTheorem.

Holds even for single‐writer single‐reader case

Always read 1 Always read 2Fail f servers.

Reads 1 Always reads 2

Prove by a counting argument.Both values can be returned from the two points.These points differ in at most 2 server states.A total of N‐f+2 server states.

Fail f servers.

Second Lower Bound

under certain assumptions

Informally, our assumptions prevent interactionsWriters send the value only once

Theorem.

• Writer actions are “black‐box actions”– oblivious to the actual value

• Write protocol operates in phases• For every write operation, there is at most one phase where

value‐dependent messages are sent

• Most existing algorithms fit above assumptions

Assumptions on Write Protocol

Second Lower Bound

Prove for the case of v<f.

Theorem.

One Time Point• v writers. Every writer has one

• If one writer delivers value‐dependent msgs to all alive servers, and any value‐independent msgs can be delivered,– Liveness: that write operation

should return– Atomicity: reader should return

somethingRead quorum

Msg4321

1 servers

• Lemma. There exist the following “worst‐case” assignments of N‐f+v‐1 server states such that they contain enough information about all the v messages

• Consequence of liveness and atomicity• In the bad‐case server states, the average storage cost per server is at least

v/(N‐f+v‐1)

Bad‐case Server States

Construct the Bad‐case Execution• Want to create an execution of the algorithm that has a time point

corresponding to the bad‐case server states• Let writers act until the phase to send value‐dependent messages• Let writers send the value‐dependent messages, keep them in the

channel• Let channel deliver according to the worst‐case server states

write m1

write m3

write m2

Construct the Worst‐case Execution• Want to create an execution of the algorithm that has a time

point corresponding to the bad‐case server states

Deliveredmessage

In the channels

Second Lower Bound

Proved for the case of v<f.

Theorem.

Open Question: Beat the O(min(v,f)) Bound?• Second bound

– Assumption: write once • Similar bound in [Spiegelman, Cassuto, Chockler, Keidar, 2016]

– Assumption: no coding across messages• In order to get storage cost < O(min(v,f))

– Need to write multiple times– And need to code across messages

• Maybe not possible to beat!

Other Future Directions• Improve performance of distributed algorithms with coding– Dynamic network configuration– Distributed optimization

Thank you!

Shared Memory Emulation UCSDcircuit.ucsd.edu/~yhk/ece293-aut16/pdfs/aut16-wang.pdf · writer reader...

Documents

Transcript of Shared Memory Emulation UCSDcircuit.ucsd.edu/~yhk/ece293-aut16/pdfs/aut16-wang.pdf · writer reader...

A MESSAGE FROM THE COMMISSIONER - ctdol.state.ct.us · A MESSAGE FROM THE COMMISSIONER. Dear Reader: Welcome to the Connecticut Career Paths, your personal guide to career decision-making.

EECS 570 Lecture 2 Message Passing & Shared Memory

NCHU System & Network Lab Lab 10 Message Queue and Shared Memory.

Shared Services Canada · Shared Services Canada 5 Minister’s Message As Minister responsible for Shared Services Canada (SSC), I am reminded daily of the critically important role

Message Passing Concurrent Programming Absence Of Shared Memory In previous lectures interaction between threads has been via shared memory – In Java, we refer to shared objects.

Efficient OpenMP Implementation and Translation For ... · Parallel Programming Models Message-passing Shared-address-space Memory Comm. Distributed memory (private memory) Shared

Task 통신 및 동기화 : Message Queue, Semaphore Shared Memory

2 Reader Management 1.0 · 211 10 Message/Transport Bindings (MTBs) ... 214 10.3 XML Message Format ... and management software.

WebSphere Message Broker In Shared Runtime Environments

Distributed Shared Memory (DSM). Distributed Shared Memory Message Passing Paradigm Send (recipient, data) Receive (data) Shared Memory Paradigm Data=

Business Message Standard (BMS)€¦ · Business Message Standard (BMS), Shared Common Library Release 3.1.13, 28-Jan-2020, Issue 1.8 All contents copyright ©

ELECTRONIC MESSAGE CENTERS/DIGITAL READER BOARDS

Shared Memory Message Queue Transport (MQT) and Pool

Optimizing a Parallel Video Encoder with Message Passing and a Shared Memory Architecture

10장 파일시스템 - Yonsei Universitycsys.yonsei.ac.kr/lect/os/o10-1.pdf · shared lock –reader lock과유사(여러프로세스가동시획득가능) exclusive lock –writer

If the reader of this message does not want certain ...

Aravind Venkataraman. Interprocess Communication Shared Memory Request/Reply Communication Concept Message PassingRemote Procedure Call.

Parallel Programming Concepts Shared Nothing Parallelism - MPI · 2014-01-06 · Message Passing Programming paradigm targeting shared-nothing infrastructures Implementations for

Distributed Memory Programming with MPI · MPI Programming 5 Message Passing and MPI Message passing is the principal alternative to shared memory parallel programming – Message

Scalable Reader-Writer Synchronization for Shared- Memory Multiprocessors Mellor-Crummey and Scott Presented by Robert T. Bauer.