State of IP Multicast

Page 1: State of IP Multicast

Copyright © 1999 Sun Microsystems, Inc.

Radia Perlman 1

State of IP Multicast

Radia Perlman

[email protected]

Outline

• Addresses

• IGMP

• Various Routing Protocols
  – review of DVMRP, MOSPF, CBT, PIM-DM, PIM-SM, MSDP, BGMP/MASC
  – problems (scaling, etc.)
  – potential solutions: Simple Multicast/Express

Addresses

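• IP address is 4 bytes
  – “Class A”: top bit is 0
  – “Class B”: top bits are 10
  – “Class C”: top bits are 110

• IP multicast address is “class D”: top bits are 1110

• Mapping to layer 2: copy the bottom 23 bits of the group address into the MAC address. The top 24 bits are the OUI, and one more bit is fixed so that ISOC keeps the other half of the OUI's addresses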
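The following is a minimal Python sketch of the classful tests and the layer 2 mapping above. It assumes the standard 01-00-5E MAC prefix (the OUI with the group bit set) used for IP multicast. The function names are illustrative only.

```python
import ipaddress

def address_class(addr: str) -> str:
    """Classify an IPv4 address by its leading bits (classful addressing)."""
    first = int(ipaddress.IPv4Address(addr)) >> 24   # first octet
    if first >> 7 == 0b0:
        return "A"        # 0xxxxxxx
    if first >> 6 == 0b10:
        return "B"        # 10xxxxxx
    if first >> 5 == 0b110:
        return "C"        # 110xxxxx
    if first >> 4 == 0b1110:
        return "D"        # 1110xxxx: multicast
    return "E"            # 1111xxxx: reserved

def multicast_mac(group: str) -> str:
    """Map an IPv4 multicast group to its Ethernet multicast MAC address."""
    if address_class(group) != "D":
        raise ValueError(f"{group} is not a class D address")
    low23 = int(ipaddress.IPv4Address(group)) & 0x7FFFFF  # bottom 23 bits only
    mac = (0x01005E << 24) | low23    # 01-00-5E prefix, 25th bit left at 0
    return "-".join(f"{(mac >> s) & 0xFF:02x}" for s in range(40, -8, -8))
```

A class D address has 28 group bits but only 23 are copied, so 32 groups share each MAC address. For example, 224.1.1.1 and 225.129.1.1 both map to 01-00-5e-01-01-01. A host's interface may therefore still pass up traffic for groups it never joined.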

IGMP (Internet Group Management Protocol)

• Purpose: router on a LAN discovers which multicast addresses have receivers on LAN

• Router sends query. Members respond
  – V1: IGMP response sent to the derived layer 2 multicast address after a random delay, so other members hear it and suppress their own. Router listens promiscuously
  – V2: Resign (leave). Router queries again
  – V3: join ({S’s},G) sent to the router's layer 2 address
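The V1 random-delay trick can be sketched as a toy model. It assumes a single group and a fixed time for a report to reach the other members. The function names are hypothetical and are not taken from any IGMP implementation.

```python
import random

def random_delays(members, max_delay=10.0, seed=None):
    """Each member picks a random response delay after hearing the query."""
    rng = random.Random(seed)
    return {m: rng.uniform(0.0, max_delay) for m in members}

def v1_reporters(delays, hear_delay=0.0):
    """Return the members that actually send a report.

    The first member whose timer fires sends the report to the group's
    layer 2 multicast address. The others hear it and cancel their own
    timers, unless their timer fires before the report reaches them.
    """
    if not delays:
        return []                          # no members: the router hears nothing
    first = min(delays.values())
    return sorted(m for m, t in delays.items()
                  if t == first or t < first + hear_delay)
```

The router only needs to learn that at least one member exists. One report per group is therefore enough, and the random delay keeps the rest quiet.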

There are two ways of constructing a design. One way is to make it so simple there are obviously no deficiencies. The other way is to make it so complicated that there are no obvious deficiencies. --- Tony Hoare

DVMRP

• Flood and prune
  – send data everywhere (optimization: reverse path forwarding)
  – send prune (S,G)
  – remember who you sent prunes to (in case a join happens, so you can de-prune)
  – remember prunes you received (so you can filter)
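The four pieces of flood-and-prune state map onto a small per-router model. This is a hypothetical sketch, not DVMRP's actual data structures. A "de-prune" is modeled as `local_join` returning True when the router must undo its prune upstream.

```python
from collections import defaultdict

class DvmrpRouter:
    """Toy flood-and-prune state for a single router."""

    def __init__(self, links):
        self.links = set(links)
        # Prunes received: (S,G) -> downstream links that said "don't send".
        self.prunes_received = defaultdict(set)
        # Prunes sent: (S,G) pairs this router pruned upstream.
        self.prunes_sent = set()

    def out_links(self, s, g, in_link):
        """Flood on every link except the arrival link and pruned links."""
        return self.links - {in_link} - self.prunes_received[(s, g)]

    def receive_prune(self, s, g, link):
        self.prunes_received[(s, g)].add(link)   # remembered so we can filter

    def send_prune(self, s, g):
        self.prunes_sent.add((s, g))             # remembered so we can de-prune

    def local_join(self, s, g):
        """A member appeared. True means: undo our prune upstream."""
        if (s, g) in self.prunes_sent:
            self.prunes_sent.remove((s, g))
            return True
        return False
```

Both tables are keyed by (S,G), so their size grows with the number of active sessions.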

Flooding/RPF

• Forward received packet onto all links except the one it was received on
  – exponential overhead

• RPF: only accept a packet with source S on link L if you’d send to S via L
  – n² overhead: each packet goes on each link
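The following sketch floods one packet with the RPF check on a small graph. Hop-count BFS stands in for the unicast routing table, which is an assumption. The function counts link transmissions so the overhead can be compared with the number of links.

```python
from collections import deque

def rpf_parents(graph, source):
    """BFS from the source: parent[v] is v's next hop toward the source,
    i.e. the neighbor v would use to send unicast to S."""
    parent = {source: None}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in sorted(graph[u]):
            if v not in parent:
                parent[v] = u
                queue.append(v)
    return parent

def rpf_flood(graph, source):
    """Flood one packet from source using the RPF check.
    Returns (number of link transmissions, set of routers that accepted it)."""
    parent = rpf_parents(graph, source)
    sent = 0
    accepted = {source}
    queue = deque([(source, None)])      # (forwarding router, where it came from)
    while queue:
        u, came_from = queue.popleft()
        for v in graph[u]:
            if v == came_from:
                continue                 # never send back out the arrival link
            sent += 1
            # RPF check: v accepts only if u is v's next hop toward the source.
            if parent.get(v) == u and v not in accepted:
                accepted.add(v)
                queue.append((v, u))
    return sent, accepted
```

In a full mesh of n routers, the count comes to (n-1)². For 5 routers that is 16 transmissions, the n² behavior described above. Without the RPF check, every received copy would be re-flooded.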

Why DVMRP Doesn’t Scale

• Periodically leaking even a few packets for each of millions of sessions (prune state times out, so data is flooded again)

• Prune state: (S,G) pairs, per neighbor, for groups they DON’T want (most of the millions)

MOSPF

• Pass information about all members for all groups in routing protocol

• Calculate the shortest-path tree from the source when the first packet for (S,G) arrives (and cache the result)

• Scaling issues:
  – routing control overhead (all group members)
  – CPU for multiple Dijkstra calculations
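The following sketch shows what an MOSPF-style router does when the first packet for (S,G) arrives. It runs Dijkstra from S over the link-state database, keeps only the branches leading to members, and caches the result. The class name and data layout are assumptions for illustration.

```python
import heapq

def shortest_path_tree(graph, source):
    """Dijkstra: parent pointers of the shortest-path tree from source.
    graph: node -> {neighbor: link cost}."""
    dist = {source: 0}
    parent = {source: None}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue                          # stale heap entry
        for v, cost in graph[u].items():
            nd = d + cost
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                parent[v] = u
                heapq.heappush(heap, (nd, v))
    return parent

class MospfRouter:
    def __init__(self, graph, members):
        self.graph = graph        # full topology (link-state database)
        self.members = members    # group -> routers with members (from routing)
        self.cache = {}           # (S, G) -> set of tree edges
        self.dijkstra_runs = 0

    def tree(self, s, g):
        """Tree edges for (S,G): the SPT from S, pruned to member branches."""
        if (s, g) not in self.cache:
            self.dijkstra_runs += 1           # one Dijkstra per new (S,G)
            parent = shortest_path_tree(self.graph, s)
            edges = set()
            for m in self.members.get(g, ()):
                while m in parent and parent[m] is not None:  # walk toward S
                    edges.add((parent[m], m))
                    m = parent[m]
            self.cache[(s, g)] = edges
        return self.cache[(s, g)]
```

Every router holds the membership of every group, and runs a separate Dijkstra for every new (S,G). These are the two scaling costs listed above.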

CBT

[Figure: core-based tree rooted at core C, through routers A, B, D, F, to receivers R]

CBT

• Build bidirectional tree rooted at Core

• Only routers on tree need to know about tree

• Only problem: Who is the core?

• Two mechanisms specified:
  – configure the routers with (C,G) mappings
  – use the PIM-SM bootstrap protocol (see next)
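The following sketch shows how a CBT-style join could build the bidirectional tree. The `next_hop` table is a hypothetical stand-in for unicast routing toward the core. The join stops at the first router already on the tree, so only routers on the tree learn about it.

```python
def cbt_join(next_hop, on_tree, router, core):
    """Send a join from `router` toward `core`.

    next_hop[(r, core)]: r's unicast next hop toward the core.
    on_tree: routers already on the tree for this group (updated in place).
    Returns the new tree branches as (child, parent) pairs; data may flow
    in both directions along them.
    """
    branches = []
    while router not in on_tree:
        on_tree.add(router)
        if router == core:
            break
        parent = next_hop[(router, core)]
        branches.append((router, parent))
        router = parent
    return branches
```

In the example below, the second join (from E) stops as soon as it reaches A, because A is already on the tree.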

PIM-SM

• Unidirectional Shared Tree (tunnel packet to core)

• Plus dynamically formed per-source trees when (enough) traffic occurs

Unidirectional Tree

[Figure: unidirectional tree rooted at core C, with routers A, B, D, F and receivers R]

Bidirectional Tree

[Figure: bidirectional tree rooted at core C, with routers A, B, D, F, receivers R, and a node labeled G]

Dynamically Formed Per-Source Trees

• If enough traffic from S

• join a tree rooted at S

• prune off from shared tree for (S,G)

• Routers keep more trees and more prune state

• State times out, so a bursty source keeps losing its tree between bursts (bursty source problem)
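The following sketch models a receiver's router switching to a per-source tree. A plain packet count stands in for the traffic-rate threshold, and both `threshold` and `timeout` are assumed parameters. The sketch also shows how a bursty source loses its state between bursts.

```python
class SptSwitch:
    """Hypothetical model of when a router moves (S,G) off the shared tree."""

    def __init__(self, threshold, timeout):
        self.threshold = threshold   # packets from S before switching (assumed)
        self.timeout = timeout       # idle time before (S,G) state expires
        self.count = {}              # (S,G) -> packets seen on the shared tree
        self.source_trees = {}       # (S,G) -> time of the last packet

    def expire(self, now):
        for key, last in list(self.source_trees.items()):
            if now - last > self.timeout:
                del self.source_trees[key]       # state timed out

    def packet(self, s, g, now):
        """Return 'shared' or 'source': the tree this packet arrived on."""
        self.expire(now)
        if (s, g) in self.source_trees:
            self.source_trees[(s, g)] = now
            return "source"
        self.count[(s, g)] = self.count.get((s, g), 0) + 1
        if self.count[(s, g)] >= self.threshold:
            # Join the tree rooted at S and prune (S,G) off the shared tree.
            self.source_trees[(s, g)] = now
            del self.count[(s, g)]
        return "shared"
```

With `threshold=3` and `timeout=10`, packets at t=0, 1, 2, 3 arrive on the shared, shared, shared, and source trees. A packet at t=20 is back on the shared tree, because the (S,G) state expired during the gap.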

(Simplified) PIM core mapping

• PIM: “bootstrap” routers flood advertisements throughout domain

• Core capable routers register with elected BSR

• BSR announces list of cores

PIM core mapping (cont’d)

• Hash algorithm maps M to one of the set of currently alive core capable routers

• Core not necessarily near group, so shared tree can be really bad

• Advertisements don’t scale, so this is intra-domain only
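The following sketch hashes a group onto the set of live core capable routers. The constants are intended to follow the hash function in the PIM-SM specification, which is an assumption to check against the spec. The real procedure first compares RP priorities, which this sketch omits, and the mask length of 30 is an assumed default.

```python
import ipaddress

def pim_hash(group, mask_len, rp):
    """Hash of (group & mask) combined with a candidate core's address."""
    g = int(ipaddress.IPv4Address(group))
    c = int(ipaddress.IPv4Address(rp))
    mask = (0xFFFFFFFF << (32 - mask_len)) & 0xFFFFFFFF
    return (1103515245 * ((1103515245 * (g & mask) + 12345) ^ c) + 12345) % 2**31

def choose_core(group, alive_cores, mask_len=30):
    """Highest hash wins; ties go to the highest core address."""
    return max(alive_cores,
               key=lambda rp: (pim_hash(group, mask_len, rp),
                               int(ipaddress.IPv4Address(rp))))
```

Every router runs the same function over the same advertised set, so they all agree on the core without further messages. If a core dies, only the groups that hashed to it move. Nothing in the hash takes the group's members into account, which is why the chosen core can be far from the group.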

Interdomain

• Use protocols that don’t scale within domains

• Find some way of gluing domains together:
  – BGMP/MASC
  – MSDP

BGMP/MASC

• For interdomain: have each domain dynamically choose and defend a block of multicast addresses

• Have interdomain routing protocol pass around “reachability” of multicast address blocks

• Join is in direction of multicast address prefix
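Sending a join "in the direction of the multicast address prefix" amounts to a longest-prefix match of the group address against the advertised multicast blocks. A minimal sketch follows. The prefixes and next-hop names in the example are made up.

```python
import ipaddress

def join_direction(group, advertised):
    """Pick the neighbor toward which a join for `group` is sent.

    advertised: multicast address block (prefix string) -> next hop toward
    the domain that claimed it. The longest matching prefix wins, just as
    for unicast routes. Returns None if no block covers the group.
    """
    g = ipaddress.IPv4Address(group)
    best = None
    for prefix, next_hop in advertised.items():
        net = ipaddress.IPv4Network(prefix)
        if g in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, next_hop)
    return best[1] if best else None
```

In the example below, a join for 233.10.200.1 goes toward the domain advertising the more specific /17, not the one advertising the /16.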

Scaling Problems

• MASC
  – Harder than asking the entire Internet to automatically number itself with IP addresses
  – Too much bandwidth used
  – Too hard to debug
  – Too much of a burden on BGP
  – Will run out of addresses

MSDP

• Multicast Source Discovery Protocol

• “Interim solution” until BGMP/MASC is done

• Configure tunnels between core capable routers in various domains, enough that the Internet is hopefully connected

• Flood (S,G) for all active (S,G)s throughout the Internet
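The following sketch floods one active (S,G) announcement across configured MSDP peerings. Simple duplicate suppression stands in for MSDP's actual peer-RPF rules, an assumption made for brevity.

```python
from collections import deque

def flood_sa(peers, origin):
    """Flood one (S,G) announcement over MSDP peerings.

    peers: router -> set of configured peers (tunnels).
    A router forwards a new announcement to every peer except the one it
    came from, and drops copies it has already seen.
    Returns (messages sent, routers reached).
    """
    seen = {origin}
    sent = 0
    queue = deque([(origin, None)])
    while queue:
        u, sender = queue.popleft()
        for v in peers[u]:
            if v == sender:
                continue
            sent += 1
            if v not in seen:
                seen.add(v)
                queue.append((v, u))
    return sent, seen
```

Every active (S,G) costs one such flood. Total traffic therefore grows with the number of active (S,G) pairs times the number of peerings, which is the scaling concern raised about MSDP.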

MSDP

[Figure: MSDP diagram]

Why MSDP Won’t Scale

• Too much information to pass around (all active (S,G) pairs)

• Too many tunnels to configure

“Current approach”

• Use protocols that don’t scale within a domain

• Find some way of hooking domains together for groups with members in different domains

• MSDP or MASC

Simple Multicast

• What causes the greatest complexity, scalability problems in the design?

• Remove the need for those

• Result: one scalable mechanism that will work both inside and between domains

• Doesn’t need to be called a “new protocol”. It can be a modification of something else

Solve 90% of the problem as simply as possible. Then remove the remaining 10% from the problem requirements --- Marshall Rose

First Simplification

• Don’t bother dynamically creating per-source trees

• Instead use a single shared, good bidirectional tree

• Less state

• Better shared tree (bidirectional)

Bidirectional Suboptimal?

• Cost to network to deliver data is NOT MORE

• Core is NOT a bottleneck

• Core can be an endnode; it does not need to forward data

• With a single exit point from a “domain”, the delay difference from a source tree is negligible

• Don’t need “optimal”. Need “good enough”

Bidirectional Trees Best

• Per-source trees
  – Do NOT make network overhead lower (unless core is poorly chosen)
  – More state for the net (n trees rather than one)
  – Only metric under which a per-source tree is better is delay from source to each receiver

• Bidirectional tree, with slight care, can ensure short paths to nearby members from any source

Choosing good bidirectional tree

• From each domain (or region separated by expensive links), have routers agree on one exit point per IP address prefix

• Choose core to be a member of the group, or close to a member of the group

• No “bandwidth bottleneck” around core--it’s just a node in the tree

• C can be an endnode (it only forwards tunneled packets)

Good Bidirectional Tree

[Figure: good bidirectional tree connecting R1, R2, R3, R4]

Next simplification

• Forcing all routers in Internet to figure out C from M is too expensive and complicated

• Instead, make group ID 8 bytes

• Only extra work for endnode: look up 8 byte group ID rather than 4 bytes.

• Eliminate need for multicast address allocation, domain-wide core advertisements, etc.

Simple Multicast

• Bidirectional Tree

• Group ID is (C,M)

• To create group: choose C, ask C for M

• Member discovers 8 byte (C,M) – via email, web page, SDR, directory, etc.

• Include C and M in join or IGMP reply

• Include C and M in data messages
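The following sketch builds the 8-byte group ID: a 4-byte core address C followed by a 4-byte number M that the core hands out. The `Core` class and its counter-based allocation are hypothetical. The point is that M only has to be unique at one core, so no address-allocation protocol is needed.

```python
import ipaddress
import struct

def pack_group_id(core, m):
    """8-byte group ID: 4-byte core address followed by 4-byte M."""
    return struct.pack("!4sI", ipaddress.IPv4Address(core).packed, m)

def unpack_group_id(gid):
    core, m = struct.unpack("!4sI", gid)
    return ipaddress.IPv4Address(core), m

class Core:
    """Hypothetical core C: "to create a group, choose C, ask C for M"."""

    def __init__(self, address):
        self.address = address
        self.next_m = 1

    def allocate(self):
        m = self.next_m          # unique at this core, hence (C,M) is unique
        self.next_m += 1
        return pack_group_id(self.address, m)
```

A member that learns the 8 bytes (from email, a web page, SDR, or a directory) can recover C directly and send its join toward it.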

Simple Multicast Variants

• (C,G) in join, not in data messages
  – requires unique G’s
  – what if there is disagreement about C for G?

• (C,G) in both join and data
  – explicitly (e.g., IP option)
  – MPLS
  – use link-local destination address

Link Local Destination Address

[Figure: path A, R1, R2, C. At each hop a Join C,G is answered by Ack C,G with a link-local address to use (X1, X2, X3 on successive hops); data then travels as dest=X1, dest=X2, dest=X3.]
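The link-local destination variant can be sketched as follows. Each router answers a Join (C,G) with a label of its own choosing, and data is rewritten hop by hop. The class and the label format are hypothetical, and only the toward-core direction is modeled.

```python
class Hop:
    """Hypothetical router that assigns link-local labels for (C,G)."""

    def __init__(self, name, upstream=None):
        self.name = name
        self.upstream = upstream    # next router toward the core (None at C)
        self.assigned = {}          # (C,G) -> label downstream must use
        self.by_label = {}          # label -> (C,G)
        self.out_label = {}         # (C,G) -> label the upstream router gave us

    def receive_join(self, c, g):
        """Handle Join C,G; the return value is the 'Ack C,G, use X' label."""
        if (c, g) not in self.assigned:
            label = f"{self.name}.{len(self.assigned) + 1}"  # unique only here
            self.assigned[(c, g)] = label
            self.by_label[label] = (c, g)
            if self.upstream is not None:
                # Propagate the join toward the core and keep its answer.
                self.out_label[(c, g)] = self.upstream.receive_join(c, g)
        return self.assigned[(c, g)]

    def forward(self, label):
        """Data arrives with dest=label; rewrite it for the next hop.
        Returns the (router, dest) pairs seen along the path."""
        c, g = self.by_label[label]
        trace = [(self.name, label)]
        if self.upstream is not None:
            trace += self.upstream.forward(self.out_label[(c, g)])
        return trace
```

A second join for the same (C,G) at R1 gets back the same label and is not propagated, since R1 is already on the tree.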

Simple Multicast Variants, Cont’d

• Express
  – 8-byte group ID (S,G)
  – Unidirectional tree
  – If multiple senders:
    • create multiple trees
    • tunnel to S

Issues (with good answers)

• Access Control: controlling who sends by configuring “one” node

• Reliability if core goes down

• Backward compatibility (migrating nodes one at a time)

“Access Control”

• Suppose you want to restrict senders?

• Express: S can choose not to forward from others

• PIM: RP can be configured with authorized senders and refuse to forward for others (but members below the first-hop router will still receive those packets)

• SM: Core can be configured, and tells the others in its heartbeat

Access Control, Cont’d

• What if the list doesn’t fit in the heartbeat message?
  – Only say no S “if needed” (after bad S sends)
  – Only say yes S if needed (S tunnels to core or asks permission of core)
  – Can have list of yes’s, no’s, or both
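The following sketch shows the check a core might apply, or a router told by the core's heartbeat, with a yes list, a no list, or both. The function and its precedence rule (a "no" overrides a "yes") are assumptions.

```python
def sender_allowed(sender, yes=None, no=None):
    """Hypothetical sender check for one group.

    yes: if given, only these senders may transmit.
    no:  senders explicitly refused (e.g. added after a bad S sends).
    """
    if no is not None and sender in no:
        return False              # an explicit "no" always wins
    if yes is not None:
        return sender in yes      # a yes list is exclusive
    return True                   # no lists: anyone may send
```

Keeping only the entries that have actually been needed is what lets the lists stay small enough for the heartbeat.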

Multiple Groups for Availability

• Rather than “backup core”, just create multiple groups (C1,M1), (C2,M2) and members join both

• Transmit on one (one where you’re getting heartbeats). Receive on both.

• Or, if the application requires absolute timeliness, transmit on both

• Also, create multiple groups for load sharing
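The following sketch picks which group to transmit on when a member has joined several groups, such as (C1,M1) and (C2,M2). The heartbeat bookkeeping and the `max_age` cutoff are hypothetical. The member keeps receiving on every group it joined.

```python
def pick_transmit_group(last_heartbeat, now, max_age):
    """Choose which of several joined groups to send on.

    last_heartbeat: group ID -> time the core's last heartbeat was heard.
    Returns the group with the freshest heartbeat no older than max_age,
    or None if every core looks dead.
    """
    alive = {gid: t for gid, t in last_heartbeat.items() if now - t <= max_age}
    if not alive:
        return None
    return max(alive, key=alive.get)
```

If a core stops sending heartbeats, the member simply moves its transmissions to the other group, with no "backup core" machinery in the network.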

Multiple Groups

• Interdomain policy might require a tree per source domain
  – Create a single tree for each domain rather than one per source in that domain
  – Can use shared tree like RP: if you create an extra auxiliary tree, have it advertised via heartbeat

Distributed Cores

• If really want failover to another core

• Have protocol among core capable routers

• They advertise among themselves

• Winner injects host route

• Will be less overhead than PIM BSR protocol advertising throughout domain

Backward Compatibility

• Simplest: look different so other multicast protocols won’t forward the packet

• Assume incremental deployment

• Join is sent to the core; non-SM routers forward it as ordinary unicast

• Data destination=core or M or tunnel endpoint

Automatically discovering Tunnel

• R1 sends “join”. Destination=core

• Forwarded until it reaches R2

• R2 notes that the packet was received from non-neighbor R1

• Adds “tunnel port” to R1 to state for (C,M)

• Sends join-ack to R1

• R1 creates “tunnel port” to R2 as parent port for (C,M)
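The tunnel discovery steps can be sketched as two handlers. The class is hypothetical. It omits what R2 does next (continuing the join toward C if R2 is not yet on the tree).

```python
class SmRouter:
    """Hypothetical Simple Multicast router. Routers that don't run SM
    just forward the join as ordinary unicast toward the core."""

    def __init__(self, name, neighbors):
        self.name = name
        self.neighbors = set(neighbors)   # directly attached routers
        self.child_ports = {}             # (C,M) -> ports toward members
        self.parent_port = {}             # (C,M) -> port toward the core

    def _port_to(self, other):
        # A direct link if `other` is a neighbor, otherwise a tunnel port.
        return other if other in self.neighbors else ("tunnel", other)

    def on_join(self, c, m, sender):
        """Intercept a join headed for C that was sent by SM router `sender`."""
        self.child_ports.setdefault((c, m), set()).add(self._port_to(sender))
        return ("join-ack", c, m, self.name)

    def on_join_ack(self, ack):
        _, c, m, responder = ack
        self.parent_port[(c, m)] = self._port_to(responder)
```

When the join arrives from a direct neighbor, the same code records an ordinary link instead of a tunnel port.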

Tunnel needed

[Figure: path R1, r, r, R2, r, C, where r marks routers that do not run Simple Multicast; other nodes A, R3, B, D]

R1 -- R2 and R2 -- C are “tunnels”

IP option contains both C and M

IP destination address has C or tunnel endpoint or M

New Protocol or New version of existing protocol?

• No reason to do a “totally new thing”

• Two suggestions: bidirectional shared trees, and group ID=(C,G)

• Suggestions are orthogonal

• CBT and BGMP already do bidirectional trees. PIM could be modified to do it

• Easy to modify any of them to get the core from the packet

Summary

• Shared bidirectional trees
  – fewer trees to keep track of and maintain
  – more efficient than tunneling to core

• Group ID C+M
  – trivial address allocation
  – no extra info for BGP to pass around
  – no “core capable router advertisements”
  – controlled selection of core for group

Summary

• This stuff doesn’t have to be so complicated

• It would be good for Internet if multicast really could allow millions of groups, easily formed by anyone