An Active Reliable Multicast Framework for the Grids

An Active Reliable Multicast Framework for the Grids M. Maimour & C. Pham ICCS 2002, Amsterdam Network Support and Services for Computational Grids Sunday, April 21st, 2002 http://www.ens-lyon.fr/LIP/RESAM Action INRIA-RESO

Page 1: An Active Reliable Multicast Framework for the Grids

An Active Reliable Multicast Framework for the Grids

M. Maimour & C. Pham

ICCS 2002, Amsterdam

Network Support and Services for Computational Grids

Sunday, April 21st, 2002

http://www.ens-lyon.fr/LIP/RESAM

Action INRIA-RESO

Page 2: An Active Reliable Multicast Framework for the Grids

Outline

Motivations behind (reliable) multicast

Use of active networks: the DyRAM protocol

DyRAM main services

Simulation results

Conclusion

Page 3: An Active Reliable Multicast Framework for the Grids

From unicast…

Problem: Sending the same data to many receivers via unicast is inefficient.

[Figure: a sender delivering the same data to three receivers over unicast.]

Page 4: An Active Reliable Multicast Framework for the Grids

…to multicast on the Internet.

Problem: Sending the same data to many receivers via unicast is inefficient.

Solution: Using multicast is more efficient.

[Figure: a sender delivering the same data to three receivers over multicast.]
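The gap between the two delivery modes can be illustrated by counting how many link transmissions one packet costs on a small distribution tree. This is only a sketch: the topology, the node names and the `parent` map are hypothetical and do not come from the slides.

```python
# Hypothetical tree: each node maps to its parent; "S" is the sender.
parent = {"A": "S", "B": "A", "R1": "B", "R2": "B", "R3": "A"}
receivers = ["R1", "R2", "R3"]


def path_to_sender(node):
    """Return the links (child, parent) crossed from a node up to the sender."""
    links = []
    while node in parent:
        links.append((node, parent[node]))
        node = parent[node]
    return links


def unicast_cost(receivers):
    """One separate copy per receiver: every link on every path carries a copy."""
    return sum(len(path_to_sender(r)) for r in receivers)


def multicast_cost(receivers):
    """One copy per link of the tree; packets are duplicated only at branch points."""
    tree_links = set()
    for r in receivers:
        tree_links.update(path_to_sender(r))
    return len(tree_links)


print(unicast_cost(receivers), multicast_cost(receivers))  # prints "8 5"
```

On this toy tree the shared links S-A and A-B carry the packet once under multicast instead of once per receiver, which is where the saving comes from.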

Page 5: An Active Reliable Multicast Framework for the Grids

Reliable multicast

At the routing level, IP Multicast efficiently delivers packets to all the receivers subscribed to a multicast session but without any reliability guarantees.

Reliability (including flow and congestion control) is to be addressed at the transport level.

Page 6: An Active Reliable Multicast Framework for the Grids

Reliable multicast: a big win for grids

Data replications

Database updates

Code & data transfers

Data communications for distributed applications (collective & gather operations, sync. barrier)

[Figure: grid sites joined to multicast group address 224.2.0.1: SDSC IBM SP (1024 procs, 5x12x17 = 1020), NCSA Origin Array (256+128+128, 5x12x(4+2+2) = 480), CPlant cluster (256 nodes).]

Page 7: An Active Reliable Multicast Framework for the Grids

Reliable multicast strategies

End-to-end solutions: only the end hosts (the source and/or the receivers) are involved. Problem: the end hosts lack topology information.

In-network solutions: some intermediate nodes (routers/servers) are involved in the recovery process.

Page 8: An Active Reliable Multicast Framework for the Grids

Active networking solutions

Active routers can perform customized computations on incoming packets: data caching, feedback aggregation, filtering, subcasting, …

Page 9: An Active Reliable Multicast Framework for the Grids

The DyRAM framework for grids (Dynamic Replier Active Reliable Multicast)

To enable distributed grid applications, the main design goals are:

low recovery latency, using local recovery

low memory usage in routers: local recovery is performed from the receivers (no cache in routers)

low processing overhead in routers: light active services

Page 10: An Active Reliable Multicast Framework for the Grids

DyRAM loss recovery strategy: main active services

DyRAM is NACK-based …

Global NACK suppression

Early packet loss detection

Subcast of repair packets

Dynamic replier election

Page 11: An Active Reliable Multicast Framework for the Grids

Global NACK suppression

[Figure: several NACK4 messages and packet data4 at an active router.]

Only one NACK is forwarded to the source.
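A minimal sketch of how a router could forward only one NACK per lost packet is shown below. The class and method names are hypothetical, and the slides do not say how long the suppression state is kept; here it is simply cleared when the repair passes through, whereas a real active service would presumably also use a timer.

```python
class NackSuppressor:
    """Forward only the first NACK seen for each lost sequence number."""

    def __init__(self):
        self.pending = set()  # sequence numbers already requested upstream

    def on_nack(self, seq):
        """Return True if this NACK should be forwarded towards the source."""
        if seq in self.pending:
            return False  # duplicate request: suppress it
        self.pending.add(seq)
        return True

    def on_repair(self, seq):
        """The repair went through, so a later loss of seq may be requested again."""
        self.pending.discard(seq)


router = NackSuppressor()
forwarded = [router.on_nack(4) for _ in range(5)]  # five receivers send NACK4
print(forwarded.count(True))  # prints 1
```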

Page 12: An Active Reliable Multicast Framework for the Grids

Early packet loss detection

The repair latency can be reduced if the lost packet is requested as soon as possible.

[Figure: packets data3, data4, data5 and several NACK4 messages at an active router, annotated "A NACK is sent by the router" and "These NACKs are ignored!"]
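One way a router could request a lost packet early is to watch for gaps in the sequence numbers of the data it forwards. The sketch below assumes consecutively numbered packets arriving in order on the upstream link; the class name and the `first_seq` parameter are hypothetical, not part of the slides.

```python
class LossDetector:
    """Track the next expected sequence number and report gaps."""

    def __init__(self, first_seq=0):
        self.expected = first_seq  # next sequence number the router expects

    def on_data(self, seq):
        """Return the sequence numbers the router should NACK right away."""
        missing = list(range(self.expected, seq))  # empty when there is no gap
        self.expected = max(self.expected, seq + 1)
        return missing


detector = LossDetector(first_seq=3)
print(detector.on_data(3))  # prints []
print(detector.on_data(5))  # prints [4]: data4 was skipped, so NACK4 is sent
```

A repair or late packet with a lower sequence number produces no new request, because `expected` never moves backwards.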

Page 13: An Active Reliable Multicast Framework for the Grids

Replier election

A receiver is elected to be a replier for each lost packet (one recovery tree per packet)

Load balancing can be taken into account for the replier election
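The election with load balancing can be sketched as follows. Two points are assumptions rather than statements from the slides: a link is treated as a replier candidate when no NACK for the packet arrived from it, and load is balanced by preferring the link elected least often so far. The class name is hypothetical.

```python
from collections import Counter


class ReplierElector:
    """Elect one downstream link as replier for each lost packet."""

    def __init__(self, links):
        self.links = list(links)
        self.elected = Counter()  # how many times each link served as replier

    def elect(self, nack_links):
        """Return the replier link for one lost packet, or None if all links lost it."""
        candidates = [link for link in self.links if link not in nack_links]
        if not candidates:
            return None  # no downstream copy: the NACK has to go upstream
        # Load balancing: prefer the link elected least often so far.
        choice = min(candidates, key=lambda link: self.elected[link])
        self.elected[choice] += 1
        return choice


elector = ReplierElector(links=[0, 1, 2])
print([elector.elect({1}) for _ in range(3)])  # prints [0, 2, 0]
```

Because the choice is made per lost packet, successive losses can be repaired by different receivers, which matches the idea of one recovery tree per packet.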

Page 14: An Active Reliable Multicast Framework for the Grids

Replier election and repair subcast

[Figure: example topology with nodes R1 to R7, D0 and D1, connected by IP multicast and DyRAM elements over links numbered 0 to 2. Labels: "NAK 2,@", "NAK 2 from link 1", "NAK 2 from link 2", "NAK 2", "Repair 2".]
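The subcast part can be sketched as a router remembering which downstream links asked for a packet and forwarding the repair only on those links. The class and method names are hypothetical; the link numbers in the usage lines mirror the "NAK 2 from link 1" and "NAK 2 from link 2" labels of the figure.

```python
class RepairSubcaster:
    """Forward a repair only on the links that reported the loss."""

    def __init__(self):
        self.nack_links = {}  # seq -> set of downstream links that sent a NACK

    def on_nack(self, seq, link):
        """Record that a NACK for seq arrived from the given downstream link."""
        self.nack_links.setdefault(seq, set()).add(link)

    def on_repair(self, seq):
        """Return the sorted links on which the repair should be forwarded."""
        return sorted(self.nack_links.pop(seq, set()))


router = RepairSubcaster()
router.on_nack(2, link=1)
router.on_nack(2, link=2)
print(router.on_repair(2))  # prints [1, 2]: link 0 does not receive Repair 2
```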

Page 15: An Active Reliable Multicast Framework for the Grids

The DyRAM framework for grids

[Figure: sources, a Gbit-rate core network (1000 Base FX), several active routers and 100 Base FX links.]

The backbone is very fast, so it performs nothing other than fast forwarding functions.

• NACK suppression
• Subcast
• Loss detection

A hierarchy of active routers can be used to process specific functions at different layers of the hierarchy.

Any receiver can be elected as a replier for a lost packet.

• NACK suppression
• Subcast
• Replier election

Page 16: An Active Reliable Multicast Framework for the Grids

Some simulation results

Network model and metrics used

Local recovery from the receivers

DyRAM vs. ARM (cache in routers)

DyRAM: early lost packet detection

Page 17: An Active Reliable Multicast Framework for the Grids

Network model

10 MBytes file transfer

[Figure: simulated network topology; label: "Source router".]

Page 18: An Active Reliable Multicast Framework for the Grids

Metrics

Load at the source: the number of retransmissions from the source.

Load on the network: the consumed bandwidth.

Completion time per packet (latency).

Page 19: An Active Reliable Multicast Framework for the Grids

Local recovery from the receivers (1)

Local recovery reduces the end-to-end delay (especially for high loss rates and a large number of receivers).

[Plot parameters: #grp: 6…24, 4 receivers/group, p=0.25]
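A back-of-the-envelope calculation suggests why repairs can usually be served from a neighbouring receiver. It rests on assumptions the slide does not state: that p is the probability that a given receiver misses a packet, and that losses are independent across receivers. The function name is hypothetical.

```python
def local_repair_probability(p, group_size):
    """Probability that at least one other receiver in the group holds the packet."""
    others = group_size - 1
    return 1.0 - p ** others


# With the slide's parameters (p = 0.25, 4 receivers per group):
print(local_repair_probability(0.25, 4))  # prints 0.984375
```

Under these assumptions, a receiver that misses a packet finds a copy inside its own group in more than 98% of cases. Correlated losses on a shared upstream link would lower this figure.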

Page 20: An Active Reliable Multicast Framework for the Grids

Local recovery from the receivers (2)

As the group size increases, doing the recoveries from the receivers greatly reduces the bandwidth consumption.

[Plot parameters: 48 receivers distributed in g groups, #grp: 2…24]

Page 21: An Active Reliable Multicast Framework for the Grids

DyRAM vs ARM

ARM performs better than DyRAM only for very low loss rates, and only with considerable caching requirements.

Page 22: An Active Reliable Multicast Framework for the Grids

DyRAM: early lost packet detection

The end-to-end latency is decreased when the early lost packet detection is enabled.

[Plot parameters: #grp: 6…24, 4 receivers/group]

Page 23: An Active Reliable Multicast Framework for the Grids

Conclusions

Reliability in large-scale multicast is difficult.

Active services can provide more efficient solutions to the problems of reliable multicast.

The main DyRAM design goal is to reduce end-to-end latency using active services, which are kept as light as possible, making DyRAM more suitable for grid applications.