Design Automation Conference (DAC), June 6th, 2013 1
Taming the Complexity of Coordinated Place and Route
Jin Hu†, Myung-Chul Kim†† and Igor L. Markov†††
†Systems and Technology Group (STG), IBM, Fishkill, NY
††Systems and Technology Group (STG), IBM, Austin, TX
†††Dept. of Computer Science and Engineering (CSE), University of Michigan
[speaker]
Placement Solutions Must Be Routable
DAC 1998
DAC 2000 (Capo)
Back to the Future? Things have changed:
  Much larger, more challenging P&R instances
  Stronger, faster baseline placers & routers
  Very precise evaluation of results through contests
  Recent trend: simultaneous P&R
Our contributions to P&R:
  1. Speed
  2. Speed
  3. Speed
1st place at ICCAD'12
What DAC'13 Reviewers Thought of Our Contributions
The Need for Speed
Congestion estimation during global placement requires fast routing
  We can do 75K nets/sec (1 thread)
  Our competition – 6K nets/sec (1 thread)
  Our router is called 20 times during GP and still takes <15% of total runtime
The secret – simplify the router’s task
Additional secrets
  Scrap Dijkstra and A*-search
  Use array-based, cache-friendly algorithms
How to Avoid Extra Work
The placer invokes a router
The router works hard to reduce violations
The placer changes locations
The placer invokes a router
The router works hard to reduce violations
The placer changes locations
The placer invokes a router
The router works hard to reduce violations
…
How to Avoid Extra Work
Spread the router’s work across global-placement iterations
Do not solve routing every time – no need to!
Reuse work
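As a rough illustration of spreading the router's work across iterations, the sketch below interleaves placement steps with occasional routing calls that start from the previous congestion estimate instead of from scratch. The names `coordinate`, `place_step`, `route_step`, and `period` are hypothetical, and the stub callbacks only stand in for a real placer and router.

```python
def coordinate(place_step, route_step, iterations, period):
    """Run placement steps, calling the router only every `period` steps."""
    congestion = None          # latest congestion estimate, kept between calls
    router_calls = 0
    for it in range(1, iterations + 1):
        place_step(congestion)                    # placer reacts to the estimate
        if it % period == 0:
            congestion = route_step(congestion)   # refine, do not start over
            router_calls += 1
    return congestion, router_calls


# Stub callbacks (assumptions for illustration only).
def demo_place(congestion):
    pass


def demo_route(previous):
    # Pretend each call refines the previous estimate a little.
    return 1 if previous is None else previous + 1


final_estimate, calls = coordinate(demo_place, demo_route, iterations=9, period=3)
```

The point of the structure is that the router receives its own previous output, so each call can do a bounded amount of new work.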
Incremental Global Routing
When movable objects stay in same GCells, reuse routes
When objects move a little, reuse routes
When relative positions do not change, reuse routes
When everything changes, try to reuse routes
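The first reuse rule above (pins stay in the same GCells) can be sketched as a small cache keyed by the GCells of a net's pins. This is a minimal sketch, not the authors' data structure: `RouteCache`, `gcell_of`, and the placeholder two-pin router `l_route` are all hypothetical names, and the other reuse rules on the slide are not covered.

```python
from typing import Callable, Dict, List, Sequence, Tuple

GCell = Tuple[int, int]


def gcell_of(x: float, y: float, gcell_size: float) -> GCell:
    """Map a placement coordinate to the index of its global-routing cell."""
    return (int(x // gcell_size), int(y // gcell_size))


class RouteCache:
    """Keeps the last route of each net, keyed by the GCells of its pins."""

    def __init__(self, gcell_size: float):
        self.gcell_size = gcell_size
        self.signature: Dict[str, Tuple[GCell, ...]] = {}
        self.route: Dict[str, List[GCell]] = {}
        self.reused = 0
        self.rerouted = 0

    def get_route(self, net: str, pins: Sequence[Tuple[float, float]],
                  router: Callable[[Tuple[GCell, ...]], List[GCell]]) -> List[GCell]:
        sig = tuple(sorted(gcell_of(x, y, self.gcell_size) for x, y in pins))
        if self.signature.get(net) == sig:
            self.reused += 1          # pins stayed in the same GCells
            return self.route[net]
        self.rerouted += 1            # some pin crossed a GCell boundary
        path = router(sig)
        self.signature[net] = sig
        self.route[net] = path
        return path


def l_route(gcells: Tuple[GCell, ...]) -> List[GCell]:
    """Placeholder two-pin router: horizontal leg first, then vertical leg."""
    (x0, y0), (x1, y1) = gcells[0], gcells[-1]
    step_x = 1 if x1 >= x0 else -1
    step_y = 1 if y1 >= y0 else -1
    path = [(x, y0) for x in range(x0, x1, step_x)]
    path += [(x1, y) for y in range(y0, y1 + step_y, step_y)]
    return path


cache = RouteCache(gcell_size=10.0)
cache.get_route("n1", [(3.0, 4.0), (25.0, 17.0)], l_route)   # first call: routed
cache.get_route("n1", [(7.0, 8.0), (21.0, 12.0)], l_route)   # same GCells: reused
cache.get_route("n1", [(7.0, 8.0), (35.0, 17.0)], l_route)   # new GCell: rerouted
```

In the usage lines, both pins move between the first and second call but stay inside their GCells, so the cached route is returned without calling the router.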
Incremental Maze Routing?
Dijkstra and A*-search don’t do incremental!
They are also slow – pointer-chasing in binary heaps is no good for cache
Need something else!
The Answer – Bellman-Ford (BF)
Bellman-Ford can be incremental
Bellman-Ford uses no pointers
Bellman-Ford is cache-friendly
Worst-case complexity is O(V^2) versus O(V log V) for Dijkstra and A*-search
So, many skeptics:
The trick: run 1 pass of BF at a time, incrementally
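To make "one pass at a time" concrete, here is a minimal sketch of a single Bellman-Ford pass over a routing grid stored in flat arrays, with distances kept between calls. It assumes a 4-neighbor grid, a positive cost for entering each cell, and costs that do not change between calls; the function name `bf_pass` and this cost model are illustrative assumptions, and the slides do not say how the authors handle cost updates.

```python
INF = float("inf")


def bf_pass(cost, dist, rows, cols, forward=True):
    """Relax every grid cell once, in place; return True if any distance improved.

    cost[i] is the cost of entering cell i, dist[i] the best known distance
    from the source. Both are flat, row-major arrays with no pointers or heaps.
    """
    n = rows * cols
    order = range(n) if forward else range(n - 1, -1, -1)
    changed = False
    for i in order:
        r, c = divmod(i, cols)
        best = dist[i]
        if r > 0:
            best = min(best, dist[i - cols] + cost[i])   # from the cell above
        if r < rows - 1:
            best = min(best, dist[i + cols] + cost[i])   # from the cell below
        if c > 0:
            best = min(best, dist[i - 1] + cost[i])      # from the left
        if c < cols - 1:
            best = min(best, dist[i + 1] + cost[i])      # from the right
        if best < dist[i]:
            dist[i] = best
            changed = True
    return changed


rows, cols = 3, 3
cost = [1.0] * (rows * cols)
dist = [INF] * (rows * cols)
dist[0] = 0.0                                   # source in the top-left corner
first = bf_pass(cost, dist, rows, cols, forward=True)
second = bf_pass(cost, dist, rows, cols, forward=False)
```

Because `dist` persists, a caller can run one pass per router invocation and stop as soon as a pass returns `False`. On this open grid the first forward pass already reaches every cell, so the second pass changes nothing.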
More on Bellman-Ford in Our Paper
Finishes sooner with alternating passes (known)
Generalizes monotonic routing (obvious)
Finds some non-monotonic routes in one pass (with our improvements)
Theorem 1: optimal routes with k monotonic segments are found in k passes
  For most nets, k is very small
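A small experiment in the spirit of Theorem 1, not a proof of it: the sketch counts how many alternating sweeps are needed before the target's distance reaches its final value. It assumes that one row-major sweep (or its reverse) counts as one pass, which may differ from the paper's definition. The names `sweep` and `sweeps_needed` and the two test grids are hypothetical.

```python
INF = float("inf")


def sweep(cost, dist, forward):
    """Relax every cell once, in row-major order or its reverse, in place."""
    rows, cols = len(cost), len(cost[0])
    rs = range(rows) if forward else range(rows - 1, -1, -1)
    cs = range(cols) if forward else range(cols - 1, -1, -1)
    changed = False
    for r in rs:
        for c in cs:
            for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
                if 0 <= nr < rows and 0 <= nc < cols:
                    cand = dist[nr][nc] + cost[r][c]
                    if cand < dist[r][c]:
                        dist[r][c] = cand
                        changed = True
    return changed


def sweeps_needed(cost, source, target):
    """Return (sweep at which the target's final distance first appears, distance)."""
    rows, cols = len(cost), len(cost[0])
    dist = [[INF] * cols for _ in range(rows)]
    dist[source[0]][source[1]] = 0.0
    history = []
    forward = True
    while sweep(cost, dist, forward):        # alternate sweep directions
        history.append(dist[target[0]][target[1]])
        forward = not forward
    final = dist[target[0]][target[1]]
    return history.index(final) + 1, final


W = INF  # blocked cell
open_grid = [[1, 1, 1],
             [1, 1, 1],
             [1, 1, 1]]
wall_grid = [[1, 1, W, 1, 1],
             [1, 1, W, 1, 1],
             [1, 1, 1, 1, 1]]
monotone = sweeps_needed(open_grid, (0, 0), (2, 2))   # route goes only right/down
detour = sweeps_needed(wall_grid, (0, 0), (0, 4))     # route goes down, right, then up
```

On the open grid the monotone route is found in the first sweep. With the wall, the route has to go down and right, then back up, and the target is reached in the second sweep.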
What About Scenic Routes?
Router invocations mark routing congestion
The placer spreads cells to eliminate congestion
Do not waste time on scenic routes (but incremental Bellman-Ford can find them anyway)
Congestion map vs. Estimate (Early GP, Mid GP, Late GP)
[Figures: for each stage of global placement, three comparisons, each pairing one estimator (LIRE, LZ Routing, or L Routing) with 1 Iteration of BFG-R]
Congestion Classification:
Cell-based congestion: cell-to-cell proximity
  Solution: cell bloating (known)
Layout-based congestion: due to static design properties (blockages, routing obstacles)
  Solution: static whitespace injection
Remotely-induced layout-based congestion: caused by non-local factors, e.g., long nets
  Solution: tricky
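Cell bloating, the known remedy for cell-based congestion, can be sketched as widening cells that sit in overused GCells so that the placer spreads them apart. The specific rule below (linear in the demand/capacity ratio, capped at `max_bloat`) and all names are assumptions for illustration; the slides do not give the authors' formula.

```python
def bloat_cells(cells, congestion, gcell_size, threshold=1.0, max_bloat=2.0):
    """Return new cell widths; cells in overused GCells are widened.

    cells:      name -> (x, y, width)
    congestion: GCell index (gx, gy) -> routing demand / capacity
    """
    widths = {}
    for name, (x, y, width) in cells.items():
        g = (int(x // gcell_size), int(y // gcell_size))
        ratio = congestion.get(g, 0.0)
        # No shrinking below the real width, and no unbounded growth.
        factor = min(max(ratio / threshold, 1.0), max_bloat)
        widths[name] = width * factor
    return widths


cells = {"a": (5.0, 5.0, 2.0), "b": (15.0, 5.0, 2.0), "c": (25.0, 5.0, 2.0)}
congestion = {(0, 0): 0.6, (1, 0): 1.25, (2, 0): 3.0}
new_widths = bloat_cells(cells, congestion, gcell_size=10.0)
```

Cell `a` sits in an underused GCell and keeps its width, `b` grows in proportion to the overuse, and `c` hits the cap.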
Packing Peanut vs. Macro Expansion
[Figure labels: After 4 invocations of placement; Initial Macro; After 2 invocations of placement; Packing peanut]
Facilitates full use of available resources; does not overconstrain placement
Example During Global Placement
[Figure sequence over six slides: Congestion Map alongside Placement]
Empirical Validation
Compares against official results from ICCAD 2012 Contest [Viswanathan et al. – ICCAD 2012]
CoPR implemented in C++ (g++ 4.7.0) with OpenMP
CoPR is 1% slower than SimPLR, which was 5.7x faster than Ripple
CoPR has 2% and 7% better quality than SimPLR and NTUplace, respectively [no runtime factor]
CoPR invokes LIRE once every 3 placement iterations, contributing 14.3% of total runtime
Conclusions
Crazy fast coordinated place-and-route
  Through incremental routing and better algorithms
Three congestion types + ways to relieve them in global placement
Placement Routing
1st place at ICCAD'12
Backup: Placements Visualized
Backup: Detailed Placement
[Figure panels: Congestion Aware DP; After Global Placement; Congestion UNaware DP]