Ph.D. Preliminary Exam
04/19/23Ph.D. Preliminary Exam 1
Ph.D. Preliminary Exam
Mozammel Hossain
Colorado State University, Department of Electrical and Computer Engineering
Nest Circuit Lead, IBM, Austin, TX
Advisor: Prof. Tom W. Chen
Committee Members: Prof. Yashwant Malaiya
Dr. Sudeep Pasricha
Dr. Ali Pezeshki
Research Area
• Synthesis-based design and implementation methodology for:
  – High-speed, high-performance units (Large Block Synthesis, LBS)
  – Sync-Async interface timing
  – Arrays with clock gating
• To convert these designs to synthesizable macros
Outline
• Introduction
• Overview of Present Synthesis Methodology
• Future: Research and Innovation in Synthesis Methodology
  – Problem definitions
    1. Large Block Synthesis (LBS): L2 Cache Unit
    2. Sync-Async Interface Timing
    3. Clock Gating Support for Array Design
  – Approaches
  – Preliminary results
• Conclusion and Future Work
• Acknowledgement
Introduction
The technology market demands faster turnaround of IC design, and designers struggle to meet performance requirements. Costs for design, validation, and time to market keep increasing. Past generations of microprocessors relied more on custom circuit design to win the battle for tighter cycle times. The industry is moving toward a common synthesizable design methodology, in most cases sacrificing some of the chip's desired speed in favor of new functionality and shorter time to market.
Introduction: Design Methodology
Introduction: Macro Design Spectrum
(Figure: macro design spectrum; axes: Design Effort and Design Customization)
0) “Vanilla” synthesis
1) VHDL structuring, parm customization
2) Preplace LCBs/latches
3) Embed custom components
4) Custom prerouting
5) Custom design (conventional)
ATTRIBUTE BLOCK_DATA of add64 : label is "LOGIC_STYLE=/xxxx/";
Introduction: Trend of Design Methodology for the Last 16 Years
(Figure: macro design history in the L2 unit over the last 16 years. x-axis: design methodology (Custom, Hybrid, RLM, LBS); y-axis: % of macro type (0% to 90%); series: 90nm, 65nm, 45nm, 32nm, 22nm, 14nm.)
Overview of Present Synthesis Methodology
Synthesis flow:
– VHDL: compile VHDL
– PDSRTL: front-end synthesis
– PDSEMPAD: early-mode padding
– MAR: routing
– RAPIDS: post-routing optimization
– PROMOTE: promote the routed design
Then run all backend tools (PDV, extraction, timing).
Overview of Present Synthesis Methodology
(Figure: backend tools, labeled PDV, RLMB, Cadence Space, and MAR/Rapids)
Overview of Present Synthesis Methodology
Slack-sharing example:
• Look at timing across multiple latches
• Consider sharing positive slack
(Figure: a broken path next to a path that has margin to share)
Slack-sharing example (continued):
• Delayed the 1st clock by 17 ps
• Balanced slack of +3 ps across 2 latches
(Figure: both paths now show balanced slack)
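The slide gives only the clock shift (17 ps) and the resulting balanced slack (+3 ps). Assuming that delaying a latch clock by d ps adds d ps of slack to the path feeding that latch and removes d ps from the path it launches, the balancing step reduces to a midpoint calculation. The minimal sketch below illustrates that arithmetic; the function name and the starting slacks of -14 ps and +20 ps are back-computed for illustration and do not appear in the original.

```python
def balance_slack(slack_in_ps: float, slack_out_ps: float) -> tuple[float, float]:
    """Return (clock_delay_ps, balanced_slack_ps) for one shared latch.

    Assumes (illustration only) that delaying the latch clock by d moves
    d ps of slack from the outgoing path to the incoming path, one for one.
    """
    # Solve slack_in + d == slack_out - d for the clock delay d.
    delay = (slack_out_ps - slack_in_ps) / 2
    balanced = slack_in_ps + delay
    return delay, balanced


# Hypothetical starting slacks chosen to be consistent with the slide's result.
delay, balanced = balance_slack(-14.0, 20.0)
print(f"delay clock by {delay:g} ps -> slack {balanced:+g} ps on both paths")
```

A real slack-sharing step must also respect early-mode (hold) constraints on the shifted latch, which this sketch ignores.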
Overview of Present Synthesis Methodology
Overview of Present Synthesis Methodology
The present methodology works very well on:
– Traditional control macros with 2.5M to 5M transistors, or about 20K to 40K latches
– Timing-noncritical macros
– Macros without embedded IP
– Macros without parent blockages (unit buffers, latches, clock blockages)
– Slack sharing within a synchronous clock domain
– Designs without clock gating after the Local Clock Buffer (LCB)
Future: Research and innovation in Synthesis Methodology
1. Problem definition: Large Block Synthesis (LBS)
– The current methodology does not work well for much bigger designs such as the L2 cache unit (20M transistors).
  • It needs techniques such as IP pre-placement, dataflow structuring, and hierarchical embedded synthesis.
  • It needs techniques for wire traits, soft hierarchy, interior pins, and congestion analysis in critical timing and wiring areas.
– Develop a synthesis methodology that supports:
  • A significantly shorter design cycle
  • A significant reduction in physical design resources
  • Potential area reduction
Future: Research and innovation in Synthesis Methodology
LBS test case to develop the methodology: why the L2 cache unit?
– Area-challenged unit
– Has both 1:1 and 2:1 clocking
  • 1:1 clocking runs at the same speed as the core clock
  • Paths on 1:1 clocking are highly timing-challenged
– Requires dual-voltage routing and clock gating
– Combination of dataflow and control macros
– Big unit that challenges tool-flow run time and data management
Why the L2 unit as a test case?
(Figure: chip floorplan with cores C0 to C11; the core, L2 unit, and L3 unit are labeled)
Future: Research and innovation in Synthesis Methodology
LBS: Why the L2 unit as a test case?
• Total cache size: 512 KB
• >4 GHz: core interface, control, and dataflow interface
• >2 GHz: cache, directory, address, L3, and fabric interfaces
• Unit size: >4.0 mm² in 22nm
• Total black boxes: 82
• Number of transistors including cache: 44M
• Number of synthesizable transistors: 19M
Future: Research and innovation in Synthesis Methodology
LBS: Physical Design Resource Comparison with Proposed Methodology
Physical Design Resource | Traditional Approach (man-months) | Synthesizable Unit Approach (man-months)
Circuit Designer | 18 | 0
Unit Timer | 6 | 0
Unit Integrator | 6 | 0
Unit Circuit Lead | 6 | 12
Total Resources | 36 | 12
Future: Research and innovation in Synthesis Methodology
2. Problem definition: Synthesis timing methodology for the Sync-Async interface
– Slack sharing cannot be done at the Sync-Async interface because it can result in a metastable condition. A methodology needs to be developed.
– To handle slack sharing in the synthesis and timing environment: identify the latches involved and turn off slack sharing for them.
– Support this through design automation.
Future: Research and innovation in Synthesis Methodology
Slack sharing cannot be done at the Sync-Async interface
Future: Research and innovation in Synthesis Methodology
Slack sharing at the Sync-Async interface can result in a metastability condition
(Figure: metastability at the latch point)
Future: Research and innovation in Synthesis Methodology
3. Problem definition: Clock gating support for array design in the synthesis methodology
– The compilable array offers a fixed menu with limited read/write ports and does not support clock gating.
– The current methodology does not allow any gates between the LCB (Local Clock Buffer) and the latch, to prevent electrical rule violations.
– Wiring, gate placement, and timing constraints need to be developed, with minimum custom design (only the array column).
Potential benefits:
– Around 20% reduction in physical design resources
– Significantly shorter design cycle
– Apply the learning to other array designs for more savings
– Potential area savings in the synthesis flow
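The rule that no gate may sit between an LCB output and a latch clock pin can be checked mechanically by walking the clock network from each LCB. The minimal sketch below assumes a toy netlist stored as a dict from each cell name to its cell type and the cells its output drives; the cell types ("LCB", "LATCH", "AND2"), instance names, and data layout are hypothetical and are not IBM's tool data model.

```python
from collections import deque

# Hypothetical toy netlist: cell name -> (cell type, cells driven by its output)
Netlist = dict[str, tuple[str, list[str]]]


def gated_clock_violations(netlist: Netlist) -> list[tuple[str, str]]:
    """Return (lcb, latch) pairs where the LCB reaches the latch through a gate.

    Breadth-first search from every LCB output; traversal stops at latches,
    so data paths launched by the latches are not followed.
    """
    violations = []
    for lcb, (ctype, lcb_fanout) in netlist.items():
        if ctype != "LCB":
            continue
        # Each queue entry records whether a gate has been crossed so far.
        queue = deque((sink, False) for sink in lcb_fanout)
        seen = set()
        while queue:
            cell, gated = queue.popleft()
            if (cell, gated) in seen:
                continue
            seen.add((cell, gated))
            cell_type, fanout = netlist[cell]
            if cell_type == "LATCH":
                if gated:
                    violations.append((lcb, cell))
                continue  # do not follow the latch output
            # Any non-latch cell on the clock path counts as a gate.
            queue.extend((sink, True) for sink in fanout)
    return violations


example = {
    "lcb0": ("LCB", ["lat_a", "and_cg"]),
    "and_cg": ("AND2", ["lat_b"]),  # clock-gating AND between LCB and latch
    "lat_a": ("LATCH", []),
    "lat_b": ("LATCH", []),
}
print(gated_clock_violations(example))
```

Here `lat_b` is reported because its clock passes through `and_cg`. Supporting clock gating in arrays would mean relaxing exactly this check under new wiring, placement, and timing constraints.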
Future: Research and innovation in Synthesis Methodology
Proposed Array Design in Synthesis Methodology
• LCB: Local Clock Buffer
• Generates the clock for the MS (master-slave) latch
Approaches:
Pre-placing hard IP in LBS (pseudo algorithm):
begin_place
  place <inst_name> xloc <> yloc <> <rot> movetype=fixed
end_place
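The pre-placement pseudo algorithm emits one fixed `place` command per hard IP instance inside a `begin_place`/`end_place` block. The minimal Python sketch below generates such a block from a list of instances. The `HardIP` record, the micron units, the rotation codes, and the instance names are assumptions, and the output only mimics the slide's pseudo syntax rather than any real tool's command language.

```python
from dataclasses import dataclass


@dataclass
class HardIP:
    """Hypothetical record for one hard IP instance to pre-place."""
    inst_name: str
    xloc: float  # assumed to be in microns
    yloc: float
    rot: str = "R0"  # assumed rotation/orientation code


def place_block(ips: list[HardIP]) -> str:
    """Build a begin_place/end_place block that fixes each IP in place."""
    lines = ["begin_place"]
    for ip in ips:
        lines.append(
            f"  place {ip.inst_name} xloc {ip.xloc:g} yloc {ip.yloc:g} "
            f"{ip.rot} movetype=fixed"
        )
    lines.append("end_place")
    return "\n".join(lines)


print(place_block([HardIP("l2_dir_array", 120.0, 40.5),
                   HardIP("l2_cache_arr0", 0, 300, "MX")]))
```

Because each instance is marked `movetype=fixed`, the placer can only place the synthesized logic around these IPs and never moves the IPs themselves.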
Wire trait example in LBS (pseudo parms file):
<Flow>: <wire_code> <time gain> <routing layers>
synthesis_layer_traits : W20S10L15 3 3 M2 X3
fine_opt_layer_traits : W20S10L15 3 3 M2 X3
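The format line names three fields, but each example line carries two integers between the wire code and the layer names, so the slide does not settle how the integers map onto "time gain". The minimal parser below, written under that uncertainty, keeps all integers in one list and splits the wire code into letter/number pairs without assigning them a meaning. The `WireTrait` record and the grammar it assumes are illustrative only.

```python
import re
from dataclasses import dataclass


@dataclass
class WireTrait:
    flow: str
    wire_code: dict[str, int]  # e.g. {"W": 20, "S": 10, "L": 15}; letter meanings not given on the slide
    numeric: list[int]         # integer field(s) after the wire code ("time gain")
    layers: list[str]          # routing-layer tokens, e.g. ["M2", "X3"]


def parse_wire_trait(line: str) -> WireTrait:
    """Parse '<flow> : <wire_code> <ints...> <layers...>' (assumed grammar)."""
    flow, _, rest = line.partition(":")
    tokens = rest.split()
    if not tokens:
        raise ValueError(f"no wire code in: {line!r}")
    # Split a code such as W20S10L15 into letter -> number pairs.
    code = {k: int(v) for k, v in re.findall(r"([A-Z])(\d+)", tokens[0])}
    i = 1
    numeric = []
    while i < len(tokens) and tokens[i].isdigit():
        numeric.append(int(tokens[i]))
        i += 1
    return WireTrait(flow.strip(), code, numeric, tokens[i:])


trait = parse_wire_trait("synthesis_layer_traits : W20S10L15 3 3 M2 X3")
print(trait)
```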
Approaches: Soft-Hierarchy in LBS
Algorithm: inst_name=rlctl prefix=l2rlctl xlow=< > ylow=< > width= height=
where
<inst_name>: user-specified name used to recognize gates
prefix: name prefix of the logic gates used in VHDL
xlow, ylow: lower-left coordinate
width, height: width and height of the macro in microns
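A soft-hierarchy entry binds every gate whose VHDL-derived name starts with `prefix` to a rectangular region, without creating a hard macro boundary. The minimal sketch below parses one entry in the slide's `key=value` form and reports member gates that a placement put outside the region. The field semantics follow the `where` clause above. The coordinates, gate names, and the idea of checking after placement are illustrative assumptions, not the tool's actual mechanism.

```python
from dataclasses import dataclass


@dataclass
class SoftRegion:
    """One soft-hierarchy entry: gates named with `prefix` belong in the box."""
    inst_name: str
    prefix: str
    xlow: float   # lower-left corner, microns
    ylow: float
    width: float
    height: float

    def contains(self, x: float, y: float) -> bool:
        return (self.xlow <= x <= self.xlow + self.width
                and self.ylow <= y <= self.ylow + self.height)


def parse_region(spec: str) -> SoftRegion:
    """Parse 'inst_name=.. prefix=.. xlow=.. ylow=.. width=.. height=..'."""
    fields = dict(item.split("=", 1) for item in spec.split())
    return SoftRegion(fields["inst_name"], fields["prefix"],
                      *(float(fields[k]) for k in ("xlow", "ylow", "width", "height")))


def out_of_region(region: SoftRegion, placement: dict[str, tuple[float, float]]) -> list[str]:
    """Gates that belong to the region (by name prefix) but sit outside it."""
    return [g for g, (x, y) in placement.items()
            if g.startswith(region.prefix) and not region.contains(x, y)]


# Hypothetical coordinates and gate names for illustration.
rlctl = parse_region("inst_name=rlctl prefix=l2rlctl xlow=100 ylow=50 width=200 height=80")
placed = {"l2rlctl_nand3": (150.0, 60.0),
          "l2rlctl_inv7": (320.0, 60.0),
          "l2dir_and2": (10.0, 10.0)}
print(out_of_region(rlctl, placed))
```

Only `l2rlctl_inv7` is reported. `l2dir_and2` lies outside the box too, but its name does not carry the region's prefix.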
Approaches: Synthesis Parms in LBS
VT upgrade:
*user_native_vt: 1
*user_alternate_vt: 2 3
Interior PIN:
*pds_assign_interior_pins: true
*pds_pin_spec: "<metal layer> <width> <height>"
*pds_horizontal_pin_spacing: "<metal layer> <spacing>"
*pds_vertical_pin_spacing: "<metal layer> <spacing>"
Rapids
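The parms above follow a simple `*key: value` pattern, with section labels such as "VT upgrade" on their own lines. The small reader below assumes that pattern holds for every parm line. It is not the actual parms-file grammar used by the synthesis tools, and the pin-spec values in the example (`M4 0.05 0.10`) are made up.

```python
def parse_parms(text: str) -> dict[str, list[str]]:
    """Parse '*key: value' lines into {key: [tokens]} (assumed format).

    Lines that do not start with '*' (section labels such as 'VT Upgrade')
    are skipped. Surrounding straight or typographic quotes are removed
    before the value is split into whitespace-separated tokens.
    """
    parms = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line.startswith("*") or ":" not in line:
            continue
        key, _, value = line[1:].partition(":")
        parms[key.strip()] = value.strip().strip('"\u201c\u201d').split()
    return parms


example = """
VT Upgrade
*user_native_vt: 1
*user_alternate_vt: 2 3
Interior PIN
*pds_assign_interior_pins: true
*pds_pin_spec: "M4 0.05 0.10"
"""
parms = parse_parms(example)
print(parms["user_alternate_vt"])
```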
Approaches: Congestion Analysis
• Routing resource allocation at the top level
• Negotiate routing resources with the macro (IP)
• Negotiate pin placement with the macro (IP)
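One common way to find where routing resources must be negotiated is a grid-based congestion map, as used in global routing. The die is divided into bins, and each bin's routing demand is compared with the tracks left after macro blockages are removed. The sketch below shows that generic technique. It is not necessarily what the tools on these slides compute, and the bin grid and track counts are hypothetical.

```python
def congestion_hotspots(demand: list[list[int]],
                        capacity: list[list[int]],
                        threshold: float = 1.0) -> list[tuple[int, int, float]]:
    """Return (row, col, utilization) for bins whose demand/capacity exceeds threshold.

    demand[r][c]: routing tracks needed in bin (r, c)
    capacity[r][c]: tracks still available after macro (IP) blockages
    """
    hot = []
    for r, (drow, crow) in enumerate(zip(demand, capacity)):
        for c, (d, cap) in enumerate(zip(drow, crow)):
            if cap > 0:
                util = d / cap
            else:
                # Fully blocked bin (e.g. under a hard IP): any demand overflows.
                util = float("inf") if d > 0 else 0.0
            if util > threshold:
                hot.append((r, c, round(util, 2)))
    return hot


# Hypothetical 2x2 bin grid.
demand = [[8, 12], [5, 3]]
capacity = [[10, 10], [0, 4]]
print(congestion_hotspots(demand, capacity))
```

Bins flagged this way are the candidates for renegotiating routing tracks or pin locations with the macro (IP) owner.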
Approaches: Synthesis Methodology at Sync-Async Interface
Application of Sync-Async Latch:
(Figure: logic and latches on the path from data_in to data_out, with a Sync-Async latch at the interface; clock labels NCLK and NASYNC)
Approaches: Synthesis Methodology at Sync-Async Interface
Pseudo algorithm to exclude the Sync-Async latch from slack borrowing:
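The pseudo algorithm itself did not survive in this transcript. Following the problem definition (identify the latches involved, then turn off slack sharing for them), one possible sketch treats a latch as "involved" when it launches into, or captures from, a latch on a different clock. The `Latch` record, this clock-domain test, and the `excluded_by_user` override (standing in for the user-controlled parms mentioned in the conclusion) are assumptions, not the author's algorithm.

```python
from dataclasses import dataclass, field


@dataclass
class Latch:
    name: str
    clock: str  # clock net, e.g. "NCLK" or "NASYNC"
    fanin: list[str] = field(default_factory=list)  # latches that launch into this one


def sync_async_latches(latches: dict[str, Latch]) -> set[str]:
    """Flag both ends of every latch-to-latch path that crosses clock domains."""
    flagged = set()
    for lat in latches.values():
        for src in lat.fanin:
            if latches[src].clock != lat.clock:
                flagged.update((src, lat.name))
    return flagged


def slack_sharing_candidates(latches: dict[str, Latch],
                             excluded_by_user=()) -> list[str]:
    """Latches whose clocks may still be shifted for slack sharing."""
    blocked = sync_async_latches(latches) | set(excluded_by_user)
    return sorted(n for n in latches if n not in blocked)


# Hypothetical chain: a -> b on NCLK, then s (Sync-Async latch) -> c on NASYNC.
latches = {
    "a": Latch("a", "NCLK"),
    "b": Latch("b", "NCLK", ["a"]),
    "s": Latch("s", "NASYNC", ["b"]),
    "c": Latch("c", "NASYNC", ["s"]),
}
print(slack_sharing_candidates(latches))
```

Only the crossing latches `b` and `s` are frozen. Slack sharing stays available inside each synchronous domain, which is where the present methodology already works well.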
Preliminary Results: Placed and Timed Gates of L2
Number of transistors including cache: 44M
Number of synthesizable transistors: 19M
Preliminary Results: Slack Take-Down of L2
(Figure: slack data. x-axis: negative slack bins (>40 ps, 40-36 ps, 35-31 ps, 30-26 ps, 26-21 ps, 20-16 ps, 15-11 ps, 10-6 ps, 5-0 ps); y-axis: number of failed paths (0 to 40,000); series: Baseline, Soft Hierarchy, Vt Upgrade, Interior PIN, Rapids, eFinale)
Preliminary Results: Clock Gating at Array Interface
• Clock gating is not working
• Violates timing at the array cell and the electrical check
(Figure: LCBs and latches (L); red shapes/lines show the current routing and placement, blue shapes/lines the desired routing and placement)
Preliminary Results: Clock Gating at Array Interface
• Clock gating is not working
Conclusion and Future Works
With robust tool sets, the newly proposed synthesis methodology, and design guidelines, the L2 cache unit can be designed with about 50% fewer resources, even without dedicated unit timing and integration resources.
– Preliminary data is very promising.
– Further experiments with 10% less unit area once the design is closed.
A timing methodology for the Sync-Async interface in the synthesis flow is being developed, with user-controlled parms.
Clock-gating work is in progress in collaboration with the IBM tool development team.
– Saves 20% of design effort in the present application in RF design
– Could lead to further physical design effort savings in all types of array design, e.g., SRAM, CAM, DRAM
Acknowledgement
Advisor: Prof. Tom W. Chen
Committee Members: Prof. Yashwant Malaiya
Dr. Sudeep Pasricha
Dr. Ali Pezeshki
IBM: Joshua Friedrich
Dr. Vikas Agarwal
Chirag Desai
John Badar
Personal Background
Educational
– BS in Electrical Engineering, BUET, Dhaka, Bangladesh
– ME in Electrical Engineering, CUNY, New York
Professional
– Product Development Engineer, Advanced Micro Devices (AMD), TX: 1994–1997
  Circuit design, critical timing path analysis, layout for the K5 development team
– Hardware Development Engineer, Mentor Graphics Corporation, NJ: 1997–1999
  Test chip, data path design, Verilog models for ROM/RAM
– Member of Technical Staff, Hewlett Packard (HP), CO: 1999–2002
  Circuit design for FPU, high-speed IO driver, place and route, timing analysis
– Senior Engineer, International Business Machines (IBM), TX: 2003–Present
  Fabric unit interim/co-circuit lead: P6 GX, TP, CLIB, PC
  Unit circuit lead: P6 DD1 L2, L3, NCU
  Circuit lead: P6 DD2 L2, NCU
  Circuit lead: P7 DD1, DD2
  Nest circuit lead for P8, P9