An Implementation of User-level Distributed Shared Memory

Page 1: An Implementation of User-level Distributed Shared Memory

An Implementation of User-level Distributed Shared Memory

Wei Zhang & Shu Liu

Page 2: An Implementation of User-level Distributed Shared Memory

23/4/21 Final Report 2

DSM: Shared Memory + Distributed Memory

Page 3: An Implementation of User-level Distributed Shared Memory

Problems & Solutions

Problems                 Solutions
Granularity              Use a 4-Kbyte page as the unit of sharing
Data location/mapping    Centralized server
Communication            MPI (Message Passing Interface)
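
Since the unit of sharing is a 4-Kbyte page, every access to the shared region resolves to a page number plus an offset inside that page. A minimal sketch of that arithmetic follows; the helper names page_of, offset_in_page and pages_for are hypothetical and do not come from the slides.

```c
#include <assert.h>
#include <stddef.h>

#define SHARE_PAGE_SIZE 4096u   /* 4-Kbyte unit of sharing, as stated on the slide */

/* Page number that contains byte `offset` of the shared region (hypothetical helper). */
static inline size_t page_of(size_t offset) {
    return offset / SHARE_PAGE_SIZE;
}

/* Position of byte `offset` inside its page (hypothetical helper). */
static inline size_t offset_in_page(size_t offset) {
    return offset % SHARE_PAGE_SIZE;
}

/* Number of pages needed to hold `bytes` bytes, rounding up (hypothetical helper). */
static inline size_t pages_for(size_t bytes) {
    return (bytes + SHARE_PAGE_SIZE - 1) / SHARE_PAGE_SIZE;
}
```

Under this scheme an allocation of 8193 bytes occupies three pages, and two variables that fall in the same page are always transferred and invalidated together.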

Page 4: An Implementation of User-level Distributed Shared Memory

Cont.

Problems                           Solutions
Memory coherence in parallelism    a: each page has one dynamic owner
                                   b: multiple readers (make copies)
                                   c: single writer (only the owner can write the page)
                                   d: lock & barrier (synchronize page operations)
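
Rules a to c can be read as an invariant over the per-node access rights of one page. The sketch below checks that invariant on a global snapshot; the page_view type and the coherent function are hypothetical names used only for illustration, and the requirement that the owner holds a valid copy is an assumption. Rule d (lock & barrier) is not modelled.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_NODES 32

typedef enum { ACC_NONE, ACC_READ, ACC_WRITE } access_t;

/* Snapshot of one page across all nodes (hypothetical, for checking only;
 * a running DSM keeps this information distributed). */
typedef struct {
    int      owner;              /* rule a: the single current owner */
    access_t access[MAX_NODES];  /* what each node may do with its copy */
} page_view;

/* Returns true when the snapshot satisfies rules a to c. */
static bool coherent(const page_view *p, int nodes) {
    if (p->owner < 0 || p->owner >= nodes)
        return false;                        /* rule a: exactly one valid owner */
    if (p->access[p->owner] == ACC_NONE)
        return false;                        /* assumption: the owner holds the page */
    for (int n = 0; n < nodes; n++) {
        /* rule b: any number of ACC_READ copies is fine */
        if (p->access[n] == ACC_WRITE && n != p->owner)
            return false;                    /* rule c: only the owner may write */
    }
    return true;
}
```

A snapshot with the owner holding write access and two other nodes holding read copies passes; granting write access to a non-owner makes it fail.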

Page 5: An Implementation of User-level Distributed Shared Memory

Design Overview

[Diagram: the Program accesses Shared Memory by Global Address, which maps to a Physical Address in local Physical Memory. Other labels: page faults, page_fault_handler, page_table_create, page_table_set_entry, Server, Remote Physical Memory. Message arrows: MPI Send, MPI Forward, MPI Send / Recv.]

Page 6: An Implementation of User-level Distributed Shared Memory

For a read

[Figure: node1 reads page 6, which node2 owns. Annotations: Read page 6; page_fault_handler(pt, 6); Server; MPI send. The tables show the state after the read.]

node1: Shared Memory (page table)
page#  data  frame  access bit  copyset
0      A     0      RW          0
1      B     5      RW          0
2      C     2      RW          0
3      D     1      RW          0
4      E     4      RW          0
5      F     -      -           0
6      G     3      R           0
7      H     -      -           0
8      I     -      -           0
9      J     -      -           0
...

node1: Physical Memory
frame  data
0      A
1      D
2      C
3      G
4      E
5      B

node2: Shared Memory (page table)
page#  data  frame  access bit  copyset
0      A     -      -           0
1      B     -      -           0
2      C     -      -           0
3      D     -      -           0
4      E     -      -           0
5      F     1      RW          0
6      G     3      RW          1
7      H     0      RW          0
8      I     2      RW          0
9      J     5      RW          0
...

node2: Physical Memory
frame  data
0      H
1      F
2      I
3      G
4      -
5      J

Server
page#   0  1  2  3  4  5  6  7  8  9  10  11  12  ...
owner   1  1  1  1  1  2  2  2  2  2  3   3   3   ...

node1 entry for page 6 before the read (page#, data, frame, access bit, copyset, lock): 6 G - - 0 0
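
The read scenario on this slide can be sketched as a small single-process simulation. Everything here is an assumption made for illustration: the sim_t and pte types, the read_fault function, the use of one letter in place of 4 Kbytes of page data, and the bitmask encoding of copyset. In a real run the copy would travel over MPI; here it is a plain assignment. As in the figure, the owner keeps RW access and only records the reader.

```c
#include <assert.h>

#define NPAGES 10
#define NNODES 4   /* index 0 unused so that ids match the figure (node1, node2, ...) */

typedef enum { ACC_NONE, ACC_R, ACC_RW } access_t;

/* One page-table entry (hypothetical layout; frames and locks are left out). */
typedef struct {
    char     data;     /* one letter stands in for the page contents */
    access_t access;
    unsigned copyset;  /* bitmask of nodes holding read copies; used by the owner */
} pte;

typedef struct {
    pte table[NNODES][NPAGES];  /* page table of each node */
    int owner[NPAGES];          /* the server's page# -> owner map */
} sim_t;

/* Node `req` read-faults on `page`: look up the owner and fetch a read-only copy. */
static void read_fault(sim_t *s, int req, int page) {
    int own = s->owner[page];            /* ask the server who owns the page */
    pte *src = &s->table[own][page];
    pte *dst = &s->table[req][page];
    dst->data = src->data;               /* stands in for the MPI page transfer */
    dst->access = ACC_R;                 /* the reader gets a read-only copy */
    src->copyset |= 1u << req;           /* the owner remembers the reader */
}
```

With page 6 owned by node2, calling read_fault(&s, 1, 6) leaves node1 with an R copy of G and sets node1's bit in node2's copyset, which is the state the figure's tables show.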

Page 7: An Implementation of User-level Distributed Shared Memory

For a write

[Figure: node1 writes page 6. Annotations: Write page 6; page_fault_handler(pt, 6); Server; invalidate; page_table_set_entry. The tables show the state after the write.]

node1: Shared Memory (page table)
page#  data  frame  access bit  copyset
0      A     0      RW          0
1      B     5      RW          0
2      C     2      RW          0
3      D     1      RW          0
4      E     4      RW          0
5      F     -      -           0
6      G     3      W           0
7      H     -      -           0
8      I     -      -           0
9      J     -      -           0
...

node1: Physical Memory
frame  data
0      A
1      D
2      C
3      G
4      E
5      B

node2: Shared Memory (page table)
page#  data  frame  access bit  copyset
0      A     -      -           0
1      B     -      -           0
2      C     -      -           0
3      D     -      -           0
4      E     -      -           0
5      F     1      RW          0
6      G     -      -           0
7      H     0      RW          0
8      I     2      RW          0
9      J     5      RW          0
...

node2: Physical Memory
frame  data
0      H
1      F
2      I
3      -
4      -
5      J

Server
page#   0  1  2  3  4  5  6  7  8  9  10  11  12  ...
owner   1  1  1  1  1  2  2  2  2  2  3   3   3   ...

Entries for page 6 before the write (page#, data, frame, access bit, copyset, lock):
node1: 6 G 3 R 0 0
node2: 6 G 3 RW 1 0
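
The write scenario on this slide can be sketched in the same single-process style. The types, the write_fault function and the bitmask copyset are assumptions made for illustration, with plain assignments standing in for MPI messages. The ownership transfer to the writer is inferred from the "single writer (only owner can write the page)" rule rather than read off the figure.

```c
#include <assert.h>

#define NPAGES 10
#define NNODES 4   /* index 0 unused so that ids match the figure (node1, node2, ...) */

typedef enum { ACC_NONE, ACC_R, ACC_RW } access_t;

/* One page-table entry (hypothetical layout; frames and locks are left out). */
typedef struct {
    char     data;     /* one letter stands in for the page contents */
    access_t access;
    unsigned copyset;  /* bitmask of nodes holding read copies; used by the owner */
} pte;

typedef struct {
    pte table[NNODES][NPAGES];  /* page table of each node */
    int owner[NPAGES];          /* the server's page# -> owner map */
} sim_t;

/* Node `req` write-faults on `page`: invalidate all other copies, then take ownership. */
static void write_fault(sim_t *s, int req, int page) {
    int old = s->owner[page];                 /* ask the server who owns the page */
    pte *src = &s->table[old][page];
    char data = src->data;
    unsigned readers = src->copyset;

    for (int n = 1; n < NNODES; n++)          /* invalidate every other read copy */
        if ((readers & (1u << n)) && n != req)
            s->table[n][page].access = ACC_NONE;

    if (old != req) {                         /* the old owner gives the page up */
        src->access = ACC_NONE;
        src->copyset = 0;
    }

    pte *dst = &s->table[req][page];
    dst->data = data;                         /* stands in for the MPI page transfer */
    dst->access = ACC_RW;
    dst->copyset = 0;                         /* no readers remain after invalidation */
    s->owner[page] = req;                     /* the server records the new owner */
}
```

Starting from the state after the read (node2 owns page 6 with node1 in its copyset), write_fault(&s, 1, 6) leaves node1 as the only node with access to page 6 and clears both copysets.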

Page 8: An Implementation of User-level Distributed Shared Memory

Implementation

• Data structures:
  – Page Table in each node
  – Pageinfo in server
• Important system calls
  – mmap()
  – mprotect()
• SIGSEGV signal: handle page fault
• pthread: receive page fault request and send data

Page Table (each node)
page#  data  access bit  frame  copyset  lock

Pageinfo (server)
Page#   0  1  2  3  4  5  6  7  8  9  ...
Owner   1  1  1  1  1  2  2  2  2  2  ...
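
The combination of mmap(), mprotect() and SIGSEGV listed above is the usual way to catch page faults in user space. The sketch below shows the mechanism on Linux only, under stated assumptions: the names region_init, on_segv, region and faults are hypothetical, and the handler simply unlocks the faulting page where a DSM would first fetch its contents from the owner. Calling mprotect() inside a signal handler is common practice for this technique, although POSIX does not list it as async-signal-safe.

```c
#define _GNU_SOURCE          /* for MAP_ANONYMOUS under strict compiler modes */
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

static char *region;                  /* start of the protected region */
static size_t region_len;             /* its length in bytes */
static long page_sz;                  /* system page size */
static volatile sig_atomic_t faults;  /* number of faults served so far */

/* SIGSEGV handler: if the fault lies inside the region, make the page
 * accessible and return so that the faulting instruction is retried. */
static void on_segv(int sig, siginfo_t *si, void *ctx) {
    (void)sig;
    (void)ctx;
    char *addr = (char *)si->si_addr;
    if (addr < region || addr >= region + region_len)
        _exit(1);                     /* a genuine crash outside the region */
    size_t idx = (size_t)(addr - region) / (size_t)page_sz;
    char *page = region + idx * (size_t)page_sz;
    /* A DSM would request the page contents from its owner at this point. */
    if (mprotect(page, (size_t)page_sz, PROT_READ | PROT_WRITE) != 0)
        _exit(2);
    faults++;
}

/* Map `pages` inaccessible pages and install the handler. Returns 0 on success. */
static int region_init(size_t pages) {
    page_sz = sysconf(_SC_PAGESIZE);
    region_len = pages * (size_t)page_sz;
    region = mmap(NULL, region_len, PROT_NONE,
                  MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (region == MAP_FAILED)
        return -1;
    struct sigaction sa;
    sa.sa_flags = SA_SIGINFO;         /* deliver siginfo_t with the fault address */
    sigemptyset(&sa.sa_mask);
    sa.sa_sigaction = on_segv;
    return sigaction(SIGSEGV, &sa, NULL);
}
```

After region_init(2), the first access to each page triggers one fault and later accesses to the same page run at full speed. Accessing the region through a volatile pointer in test code keeps the compiler from reordering the access around reads of the fault counter.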

Page 9: An Implementation of User-level Distributed Shared Memory

Cont.

• MPI: creates the cluster and handles communication
• #include "dsm.h": a simple yet powerful API

Name                    Function
dsm_startup()           initialization
dsm_malloc(int size)    allocate shared memory for the process
dsm_barrier()           global synchronization
dsm_clock()             count elapsed time
dsm_lock()              page synchronization
dsm_exit()              clean up and shut DSM down

Page 10: An Implementation of User-level Distributed Shared Memory

Cont.

• Include dsm header file
• Start dsm system
• Allocate shared memory
• Synchronize
• Free shared memory
• Exit

Page 11: An Implementation of User-level Distributed Shared Memory

Evaluation

• Assumptions:
  – server congestion is not the bottleneck
  – network is reliable
• Benchmarks:
  – Jacobi: partial differential equations: Ax=b
  – MM: parallel matrix multiply: C=AB
  – Scan: multi-iteration scan program
  – Focus: multi-iteration write program
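
The slides do not say how the MM benchmark divides its work. One common approach, shown here purely as an assumption, is to give each node a contiguous block of rows of C, so that each node writes only its own pages of the result. The names row_range, rows_for and mm_rows are hypothetical.

```c
#include <assert.h>

/* Half-open row range [begin, end) handled by one node. */
typedef struct { int begin, end; } row_range;

/* Split n rows over `nodes` workers as evenly as possible;
 * the first n % nodes workers receive one extra row. */
static row_range rows_for(int n, int nodes, int rank) {
    int base = n / nodes;
    int extra = n % nodes;
    row_range r;
    r.begin = rank * base + (rank < extra ? rank : extra);
    r.end = r.begin + base + (rank < extra ? 1 : 0);
    return r;
}

/* Compute the rows of C = A * B that fall in r; all matrices are n x n, row-major. */
static void mm_rows(const double *A, const double *B, double *C, int n, row_range r) {
    for (int i = r.begin; i < r.end; i++) {
        for (int j = 0; j < n; j++) {
            double sum = 0.0;
            for (int k = 0; k < n; k++)
                sum += A[i * n + k] * B[k * n + j];
            C[i * n + j] = sum;
        }
    }
}
```

In a DSM setting A, B and C would live in shared memory and each node would call mm_rows with the range for its own rank, followed by a barrier before the result is read.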

Page 12: An Implementation of User-level Distributed Shared Memory

Cont.

• Speedup

Page 13: An Implementation of User-level Distributed Shared Memory

Cont.

• Page Fault

Page 14: An Implementation of User-level Distributed Shared Memory

Conclusion & Future work

• Achieved what we claimed
• Improvement:
  – Blocking communication -> non-blocking communication
  – Other memory consistency model (MRMW)
  – Decrease network communication

Page 15: An Implementation of User-level Distributed Shared Memory

Thank you!