Post on 23-Dec-2015
1
Building Compilers for Reconfigurable Switches
Lavanya Jose, Lisa Yan, Nick McKeown, and George Varghese
Research funded by AT&T, Intel, Open Networking Research Center.
2
In the next 20 minutes
• Fixed-function switch chips will be replaced by reconfigurable switch chips
• We will program them using languages like P4
• We need a compiler to compile P4 programs to reconfigurable switch chips.
3
Fixed-Function Switch Chips
[Figure: fixed-function switch pipeline. Labels: Packet, Parser, L2 Stage, IPv4 Stage, IPv6 Stage, ACL Stage, Queues, Packet.]
4
Control Flow Graph
[Figure: the Switch Pipeline (Parser, L2 Table, IPv4 Table, IPv6 Table, ACL Table, each with a Fixed Action, and Queues) shown next to the Control Flow Graph with nodes L2, v4, v6, ACL.]
5
Fixed-Function Switch Chips Are Limited
1. Can’t add new forwarding functionality
2. Can’t add new monitoring functionality
6
Fixed-Function Switch Chips
[Figure: the same Switch Pipeline (Parser, L2, IPv4, IPv6 and ACL Tables with Fixed Actions, Queues) and Control Flow Graph (L2, v4, v6, ACL). A new MyEncap node is marked "?".]
7
Fixed-Function Switch Chips Are Limited
1. Can’t add new forwarding functionality
2. Can’t add new monitoring functionality
3. Can’t move resources between functions
[Figure: fixed pipeline with Parser, L2, IPv4, IPv6 and ACL Stages, each holding its Table and a Fixed Action, and Queues.]
8
Reconfigurable Switch Chips
[Figure: Control Flow Graph (L2, v4, v6, ACL) and Switch Pipeline. The pipeline consists of Parser, generic Match Tables with Action Macros, and Queues, shown against the fixed L2, IPv4, IPv6 and ACL Tables with Fixed Actions.]
9
Mapping Control Flow to Reconfigurable Chip
[Figure: each Control Flow Graph node (L2, v4, v6, ACL) maps to a Match Table and Action Macro in the Switch Pipeline: L2 Table, IPv4 Table, IPv6 Table and ACL Table, between Parser and Queues.]
10
Reconfigurable Switch Chips
[Figure: the Control Flow Graph now includes MyEncap alongside L2, v4, v6 and ACL. The Switch Pipeline (Parser, Queues) holds L2, IPv4, IPv6, ACL and MyEncap tables with their actions.]
11
Protocol Independent Switch
[Figure: Parser followed by L2, IPv4, IPv6 and ACL Tables in Match Memory, with the L2, v4, v6 and ACL Action Macros in Action ALUs.]
12
[Figure: pipeline with Parser, L2 Table, IPv4 Table (shown three times), IPv6 and ACL Tables, L2 and v4 Action Macros, and three Actions.]
13
Match + Action Processor: pipelined and in-parallel
14
Reconfigurability: the norm in 5 years
• Reconfigurability adds mostly to logic.
• Logic is getting relatively smaller.
• The cost of reconfigurability is going down.
• Fixed switch chip area today:
  – I/O (40%), Memory (40%)
  – Wires, Logic
[Figure: chip area chart. Labels: Switch I/O (30%), Memory (30%), Wires (20%), Logic (20%).]
15
Fixed Function: Broadcom Tomahawk, 3.2 Tbps
Reconfigurable: Cavium Xpliant, 3.2 Tbps
16
Reconfigurable chips are inevitable.
17
Configuring Switch Chips
[Figure: P4 code goes into the Compiler, which configures the Compiler Target: a pipeline of Parser, Match Tables (L2, IPv4, IPv6, ACL) with Action Macros, and Queues.]
18
P4 (http://p4.org/)
parser parse_ethernet {
    extract(ethernet);
    return select(latest.etherType) {
        0x800  : parse_ipv4;   // IPv4
        0x86DD : parse_ipv6;   // IPv6
    }
}

table ipv4_lpm {
    reads {
        ipv4.dstAddr : lpm;    // longest-prefix match on the destination address
    }
    actions {
        set_next_hop;
        drop;
    }
}

control ingress {
    apply(l2_table);
    if (valid(ipv4)) {
        apply(ipv4_table);
    }
    if (valid(ipv6)) {
        apply(ipv6_table);
    }
    apply(acl);
}
[Figure labels: Parser (ANCS’13), Match Action Tables, and Control Flow Graph mark the three snippets. The pipeline shows L2, v4, v6, ACL and four Action Macros.]
19
What does reconfigurability buy us?
20
Benefits of Reconfigurability
• Use resources efficiently
  – Multiple tables per stage
  – Big table in multiple stages
• Use fewer stages
[Figure labels: L2, IPv4, IPv6, ACL.]
21
Naïve Mapping: Control Flow Graph
[Figure: the tables of the Control Flow (L2, v4, v6, ACL) are placed in order into the Match Tables of the Switch Pipeline (L2 Table, IPv4 Table, IPv6 Table, ACL Table), each with its Action Macro, between Parser and Queues.]
22
Table Dependency Graph (TDG)
[Figure: the Control Flow Graph (L2, v4, v6, ACL) shown next to the Table Dependency Graph over the same tables (L2, v4, v6, ACL).]
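To illustrate why a dependency graph exposes parallelism that the control flow hides, here is a minimal Python sketch. It is not the authors' compiler. The edge list is an assumption made for illustration (L2 before v4 and v6, both before ACL, no edge between v4 and v6), and it further assumes that every dependency forces the dependent table into a strictly later stage. The function name `earliest_stages` is hypothetical.

```python
from collections import defaultdict

# Hypothetical dependency edges (an assumption, not taken from the slides):
# L2 must run before v4 and v6, and both must run before ACL.
# v4 and v6 do not depend on each other.
TDG_EDGES = [("L2", "v4"), ("L2", "v6"), ("v4", "ACL"), ("v6", "ACL")]


def earliest_stages(edges):
    """Return {table: earliest stage}, numbering stages from 1.

    A table's earliest stage is the length of the longest dependency
    chain that ends at it, computed in topological order.
    """
    successors = defaultdict(list)
    in_degree = defaultdict(int)
    tables = set()
    for before, after in edges:
        successors[before].append(after)
        in_degree[after] += 1
        tables.update((before, after))

    stage = {t: 1 for t in tables}
    ready = sorted(t for t in tables if in_degree[t] == 0)
    visited = 0
    while ready:
        table = ready.pop()
        visited += 1
        for nxt in successors[table]:
            # A dependent table sits at least one stage after this one.
            stage[nxt] = max(stage[nxt], stage[table] + 1)
            in_degree[nxt] -= 1
            if in_degree[nxt] == 0:
                ready.append(nxt)
    if visited != len(tables):
        raise ValueError("dependency graph has a cycle")
    return stage


stages = earliest_stages(TDG_EDGES)
min_stages = max(stages.values())
```

Under these assumed edges, v4 and v6 land in the same stage, so the dependency bound is three stages, where placing the four tables one per stage in control-flow order would use four. Resource limits are ignored here.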
23
Efficient Mapping: TDG
[Figure: Control Flow Graph and Table Dependency Graph (L2, v4, v6, ACL) mapped onto the Switch Pipeline: Parser, L2 Table, IPv4 Table, IPv6 Table and ACL Table with their actions, and Queues.]
24
Resource constraints
[Figure: Control Flow Graph (L2, v4, v6, ACL) and Switch Pipeline with Parser, L2 Table, IPv4, IPv6, ACL Table, their Action Macros, and Queues.]
25
More resource constraints
• Header widths
• Action ALU input
• Memory type
• Table parallelism
• Action memory
26
The Compiler Problem
Map match action tables in a TDG to a switch pipeline while respecting dependency and resource constraints.
27
Step 1: P4 Program
Step 2: Control Flow Graph (L2, v4, v6, ACL)
Step 3: Table Dependency Graph (L2, v4, v6, ACL)
Step 4: Table Configuration
28
Is that it?
29
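Two Switches We Studied
• RMT: 32 stages (SIGCOMM 2013)
• FlexPipe: 5 stages (Intel FM6000)
[Figure: stage diagrams for each switch. RMT stages are numbered 1, 2, 3, 4, … 32 and FlexPipe stages are numbered 1 to 5.]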
30
Additional switch features
• Table shaping in RMT
• Table sharing in FlexPipe
[Figure: L2, v4, v6 and ACL tables illustrating the two features.]
31
The Compiler Problem
Map match action tables in a TDG to a switch pipeline while respecting dependency and resource constraints.
Constraints: table shaping, table sharing, header widths, action ALU input, memory type, table parallelism, action memory.
32
First approach: Greedy
• Prioritize one constraint
• Sort tables
• Map tables one at a time
[Figure: tables numbered in sorted order and placed after the Parser, shown for two sort keys: number of dependencies and match width.]
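The three greedy steps can be sketched in Python as follows. This is a minimal illustration, not the heuristics evaluated in the talk. The `Table` fields, the single "memory blocks per stage" resource, the table sizes, and the function name `greedy_map` are all assumptions made for the example. Dependencies are assumed to force a strictly later stage.

```python
from dataclasses import dataclass


@dataclass
class Table:
    name: str
    blocks: int          # memory blocks needed (hypothetical unit, must be >= 1)
    match_width: int     # bits matched
    deps: tuple = ()     # names of tables that must sit in an earlier stage


def greedy_map(tables, num_stages, blocks_per_stage, sort_key):
    """Place tables one at a time in sort_key order; return {name: [stages]}.

    A table may spill across several stages and several tables may share
    a stage. Returns None if some table cannot be placed.
    """
    free = [blocks_per_stage] * num_stages
    placement = {}
    pending = sorted(tables, key=sort_key)
    while pending:
        # Highest-priority table whose dependencies are already placed.
        table = next(
            (t for t in pending if all(d in placement for d in t.deps)), None
        )
        if table is None:
            return None  # cycle or unknown dependency
        pending.remove(table)
        # First allowed stage: one past the last stage used by any dependency.
        stage = max((placement[d][-1] + 1 for d in table.deps), default=0)
        need, used = table.blocks, []
        while need > 0 and stage < num_stages:
            take = min(need, free[stage])
            if take > 0:
                free[stage] -= take
                need -= take
                used.append(stage)
            stage += 1
        if need > 0:
            return None  # ran out of stages or memory
        placement[table.name] = used
    return placement


# Illustrative sizes only; none of these numbers come from the slides.
tables = [
    Table("L2", blocks=2, match_width=48),
    Table("v4", blocks=3, match_width=32, deps=("L2",)),
    Table("v6", blocks=3, match_width=128, deps=("L2",)),
    Table("ACL", blocks=2, match_width=160, deps=("v4", "v6")),
]
by_deps = greedy_map(tables, 4, 4, sort_key=lambda t: -len(t.deps))
by_width = greedy_map(tables, 4, 4, sort_key=lambda t: -t.match_width)
too_small = greedy_map(tables, 3, 4, sort_key=lambda t: -len(t.deps))
```

With these made-up sizes the two sort keys give different placements: sorting by dependency count keeps v4 in one stage and splits v6 across two, while sorting by match width does the reverse. With only three stages neither ordering fits.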
33
34
Too many constraints for Greedy
• Any greedy heuristic must sort tables by a metric that is a fixed function of the constraints.
• As the number of constraints grows, it gets harder for a fixed function to capture the interplay between all of them.
• Can we do better than greedy?
35
Second approach: Integer Linear Programming (ILP)
Find an optimal mapping.
Pros:
• Takes in all constraints
• Different objectives
• Solvers exist (CPLEX)
Cons:
• Blackbox solver
• Encoding is an art
• Slow
36
ILP Setup
min # stages
subject to:
  table sizes assigned ≥ table sizes specified
  memories assigned ≤ memories in physical stage
  dependency constraints
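One way to write this setup in symbols is sketched below. Every symbol is an assumption introduced for illustration and none appears on the slides: $w_{t,s}$ is the number of entries of table $t$ placed in stage $s$, $W_t$ the number of entries the program specifies for $t$, $m_{t,s}$ the number of memory blocks given to $t$ in stage $s$, $M_s$ the blocks available in stage $s$, $c$ the entries per block, $E$ the edge set of the TDG, and $\sigma$ the number of stages used. $\mathrm{first}(t)$ and $\mathrm{last}(t)$ denote the first and last stage where $w_{t,s} > 0$.

```latex
\begin{aligned}
\min\quad & \sigma \\
\text{s.t.}\quad
& \sum_{s=1}^{S} w_{t,s} \;\ge\; W_t && \forall t
  && \text{(table sizes assigned vs. specified)} \\
& w_{t,s} \;\le\; c\, m_{t,s} && \forall t, s
  && \text{(entries fit in the assigned blocks)} \\
& \sum_{t} m_{t,s} \;\le\; M_s && \forall s
  && \text{(memories assigned vs. memories in stage)} \\
& \mathrm{last}(a) \;<\; \mathrm{first}(b) && \forall (a,b) \in E
  && \text{(dependency constraints)} \\
& \mathrm{last}(t) \;\le\; \sigma && \forall t
\end{aligned}
```

In an actual ILP, $\mathrm{first}$ and $\mathrm{last}$ would have to be expressed with extra 0/1 indicator variables. The strict inequality assumes every dependency needs a later stage, which may be stricter than a given chip requires.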
37
Experiment Setup
• 4 datacenter use cases from Intel, Barefoot
• Differ in tables, table sizes, and dependencies
38
Example Use Case
A Typical TDG
[Figure: a TDG and its configuration for RMT. Tables: IPv6-Mcast, EG-ACL1, EG-Phy-Meta, IG-Agg-Intf, IG-Dmac, IPv4-Mcast, IPv4-Nexthop, IPv6-Nexthop, IG-Props, IG-Router-Mac, Ipv4-Ecmp, IG-Smac, Ipv4-Ucast-LPM, Ipv4-Ucast-Host, Ipv6-Ucast-Host, Ipv6-Ucast-LPM, Ipv6-Ecmp, IG_ACL2, IG_Bcast_Storm, Ipv4_Urpf, Ipv6_Urpf, IG_ACL1, EG_Props, IG_Phy_Meta.]
39
Metrics: Greedy vs ILP
1. Ability to fit program in chip
2. Optimality
3. Runtime
40
Setup: Greedy vs ILP
1. Ability to fit: FlexPipe
   – Variants of use cases in 5-stage pipeline.
2. Optimality: RMT
   – Minimum stage, pipeline latency, power
3. Runtime: both switches
Results: Greedy vs ILP
1. Can Greedy fit my program?
   – Yes, if resources aplenty (RMT, 32 stages)
   – No, if resources constrained (FlexPipe, 5 stages): can’t fit 25% of programs.
2. How close to optimal is Greedy?
   – 30% more time for packet to get through RMT pipeline.
3. Hmm... looks like I need ILP. How slow is it?
   – 100x slower than Greedy
   – Reasonable if programs don’t change often.
41
42
If we have time, we should run ILP.
43
Use ILP to suggest best Greedy for program type.
Critical constraints
• Dependency critical: 16 → 13 stages
• Additional resource constraints less important
Critical resources
• TCAM memories critical: 16 → 14 stages
– Results for one of our datacenter L2/L3 use cases
44
Conclusion
• Challenge: Parallelism and constraints in reconfigurable chips make compiling difficult.
• TDG: highlights parallelism in program.
• ILP: better if enough time, fitting is critical, or objectives are complicated.
• Best Greedy: ILP can choose via notion of critical constraints and critical resources.
45
Thank you!
ILP Run time
• Number of constraints? Not obvious. E.g., RMT
  – Min. stage: few secs.
  – Min. power: few secs.
  – Min. pipeline latency: 10x slower
• Number of variables? How fine-grained is the resource assignment? E.g., FlexPipe
  – One match entry at a time: many days
  – 100-500 match entries at a time: < 1 hr