Kazi Spring 2008 CSCI 660 1
CSCI-660
Introduction to VLSI Design
Khurram Kazi

Overview of Synthesis Flow

Fundamental Steps to a Good Design
If you have a good start, the project will go smoothly.
Partitioning the design is a good start. Partition by:
• Functionality
• Don’t mix two different clock domains in a single block
• Don’t make the blocks too large
• Optimize for synthesis

Block diagram of the Framer, receiver direction: Is it partitioned well? Does it follow the suggestions of the previous slide?
[Figure: block diagram of FRAMER.vhd. Inputs: ser_in, reset_b, clk. Blocks: clock generation (byte clock); serial to parallel converter (8 bits data); Frame_detect and framing state machine; bit counter and byte counter; overhead bytes RAM controller (generates signals for RAM); overhead bytes RAM (D1_12 data); D1_3 bytes reader from RAM with parallel to serial data; D4_12 bytes reader from RAM with parallel to serial data; SPE data out processor (transports data, generates SPE_Valid, etc.). Outputs: D1_3_data, D1_3_clk, D1_3_data_val, D4_12_data, D4_12_clk, D4_12_data_val, SPE_data, SPE_val.]

Partitioning
Partition the design into smaller components. Partitioning can be done:
• In HDL
• During synthesis

Recommended rules for Synthesis
• Share resources whenever possible
• When implementing combinatorial paths, do not spread them across hierarchy
• Register all outputs
• Do not implement glue logic between blocks; partition them well
• Separate designs on functional boundaries
• Keep block sizes reasonable
• Separate core logic, pads, clock and JTAG
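
Resource Sharing
HDL description:
  -- "select" is a reserved word in VHDL, so the control signal is named sel here
  if sel = '1' then
    sum <= A + B;
  else
    sum <= C + D;
  end if;
[Figure: two implementations of the description. One possible implementation: two adders compute A + B and C + D, and a mux controlled by the select signal picks sum. Another implementation (shared resource, area-efficient): two muxes controlled by the select signal pick A or C and B or D, and a single adder produces sum.]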
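The shared-resource implementation can also be written out explicitly. The following is a minimal sketch, not the course's reference code: the entity name shared_adder, the port names and the 8-bit width are assumptions made for illustration, and the carry out of the addition is dropped.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical example entity: one adder shared between A + B and C + D.
entity shared_adder is
  port (
    sel        : in  std_logic;             -- '1' selects A + B, '0' selects C + D
    a, b, c, d : in  unsigned(7 downto 0);  -- 8-bit width is an assumption
    sum        : out unsigned(7 downto 0)
  );
end entity shared_adder;

architecture rtl of shared_adder is
  signal op1, op2 : unsigned(7 downto 0);
begin
  -- Two muxes choose the operands ...
  op1 <= a when sel = '1' else c;
  op2 <= b when sel = '1' else d;
  -- ... and a single adder produces the result.
  sum <= op1 + op2;
end architecture rtl;
```

Writing the muxes in front of the adder states the area-efficient structure directly instead of leaving the choice to the tool; a synthesis tool may still restructure the logic if timing constraints require it.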

Sharable HDL Operators
The following HDL (VHDL and Verilog) synthetic operators can result in a shared implementation:
*  +  -  >=  <  <=  =  /=  ==
Operators can be shared only within the same block (i.e., they are in the same process).

DesignWare Implementation Selection
• DesignWare implementation selection depends on area and timing goals
• The smallest implementation that meets the timing goals is selected
[Figure: the synthetic module "+" maps to implementations ranging from smallest (ripple carry) to fastest (carry look-ahead).]

Sharing Common Sub-Expressions
• Design Compiler tries to share common sub-expressions to reduce the number of resources needed to implement the design, which saves area as long as the timing goals are met
  SUM1 <= A + B + C;
  SUM2 <= A + B + D;
  SUM3 <= A + B + E;
[Figure: one adder computes A + B; three further adders add C, D and E to that result to produce SUM1, SUM2 and SUM3.]

Sharing Common Sub-Expressions: Limitations
• Sharable terms must be in the same order within each expression
  sum1 <= A + B + C;
  sum2 <= B + A + D; -> not sharable
  sum3 <= A + B + E; -> sharable
• Sharable terms must occur in the same position (or use parentheses to maintain ordering)
  sum1 <= A + B + C;
  sum2 <= D + A + B; -> not sharable
  sum3 <= E + (A + B); -> sharable
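One way to avoid depending on term order is to compute the common term once and reuse it. The sketch below assumes 8-bit unsigned operands; the entity name shared_subexpr and the intermediate signal a_plus_b are hypothetical names introduced for illustration.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical example entity: A + B is computed once and reused three times.
entity shared_subexpr is
  port (
    a, b, c, d, e    : in  unsigned(7 downto 0);  -- 8-bit width is an assumption
    sum1, sum2, sum3 : out unsigned(7 downto 0)
  );
end entity shared_subexpr;

architecture rtl of shared_subexpr is
  signal a_plus_b : unsigned(7 downto 0);
begin
  a_plus_b <= a + b;        -- the common sub-expression, one adder
  sum1     <= a_plus_b + c; -- three more adders, one per output
  sum2     <= a_plus_b + d;
  sum3     <= a_plus_b + e;
end architecture rtl;
```

This describes four adders in total instead of six, regardless of how the terms would have been ordered in a single expression.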

How to Infer a Specific Implementation (Adder with Carry-In)
• The following expression infers an adder with carry-in:
  sum <= A + B + Cin;
  where A and B are vectors and Cin is a single bit
[Figure: a single adder with inputs A, B and Cin and output sum.]
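A complete entity around this expression might look like the sketch below. It assumes the ieee.numeric_std package, where the carry bit is first placed in a one-bit vector so that the addition works in VHDL-93 as well; the entity name adder_cin, the generic WIDTH and the signal cin_vec are hypothetical. Whether a given tool maps this to a single adder with carry-in depends on the tool and library.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical example entity: adder with carry-in and a carry-out bit.
entity adder_cin is
  generic (WIDTH : positive := 8);  -- operand width, an assumption
  port (
    a, b : in  unsigned(WIDTH - 1 downto 0);
    cin  : in  std_logic;
    sum  : out unsigned(WIDTH downto 0)  -- one extra bit holds the carry out
  );
end entity adder_cin;

architecture rtl of adder_cin is
  signal cin_vec : unsigned(0 downto 0);
begin
  -- Turn the single carry bit into a one-bit vector for the addition.
  cin_vec(0) <= cin;
  -- Extend the operands by one bit so the carry out is not lost.
  sum <= resize(a, WIDTH + 1) + resize(b, WIDTH + 1) + cin_vec;
end architecture rtl;
```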

Operator Reordering
• Design Compiler can reorder arithmetic operators to produce the fastest design
• For example:
  Z <= A + B + C + D; (Z is time constrained)
  Initially the ordering is from left to right
[Figure: a chain of three adders; A and B are added first, then C, then D, producing Z.]

Reordering of the Operators for a Fast Design
• If the arrival time of all the signals A, B, C and D is the same, Design Compiler will reorder the operators into a balanced tree architecture
[Figure: balanced tree; one adder computes A + B, another computes C + D, and a third adds the two results to produce Z.]

Reordering of the Operators for a Fast Design
• If signal A arrives latest, Design Compiler will reorder the operators to accommodate the late-arriving signal
[Figure: a chain of three adders; C, B and D are added first, and A enters only the final adder that produces Z.]
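The two orderings can also be written by hand with parentheses, which the earlier slide on sub-expression sharing describes as a way to maintain ordering. This is only a sketch: the entity name add4, the architecture names and the 8-bit width are assumptions, and a synthesis tool may or may not keep the grouping depending on its settings and constraints.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical example entity: four-input adder, Z = A + B + C + D.
entity add4 is
  port (
    a, b, c, d : in  unsigned(7 downto 0);  -- 8-bit width is an assumption
    z          : out unsigned(7 downto 0)
  );
end entity add4;

-- Balanced tree: every input passes through two adder levels.
architecture balanced of add4 is
begin
  z <= (a + b) + (c + d);
end architecture balanced;

-- Late-arriving a: b, c and d are summed first, a enters only the last adder.
architecture late_a of add4 is
begin
  z <= ((b + c) + d) + a;
end architecture late_a;
```

In the balanced version the longest path crosses two adders; in the late_a version the path from a crosses only one adder, at the cost of three adder levels for b and c.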

Avoid hierarchical combinatorial blocks
• The path between reg1 and reg2 is divided among three different blocks
• Due to the hierarchical boundaries, the combinatorial logic cannot be optimized as a whole
• Because synthesis tools (Synopsys) maintain the integrity of the I/O ports, combinatorial optimization cannot be achieved between blocks (unless “grouping” is used)
• Not a recommended design practice
[Figure: reg1 in Block A drives Combinatorial Logic 1 (Block A), Combinatorial Logic 2 (Block B) and Combinatorial Logic 3 (Block C) in series before reaching reg2 in Block C.]

Recommended way to handle Combinatorial Paths
• All the combinatorial circuitry is grouped in the same block whose output is connected to the destination flip-flop
• This allows optimal minimization of the combinatorial logic during synthesis
• It also simplifies the description of the timing interface
• Recommended practice
[Figure: reg1 in Block A drives Block C, which contains Combinatorial Logic 1, 2 and 3 together with reg2.]

Register all outputs
• Simplifies the synthesis design environment: inputs to the individual blocks arrive within the same relative delay (caused by wire delays)
• Output requirements do not really need to be specified, since paths start at flip-flop outputs
• Take care of fanout; as a rule of thumb, keep the fanout to 16 (this depends on the technology and on the components driven by the output)
[Figure: Block X with output register reg1 driving Block Y registers reg2 and reg3.]
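A block with a registered output can be sketched as follows. The entity name block_x, the ports a, b and q, the 8-bit width and the choice of an active-low asynchronous reset are assumptions for illustration; reset_b and clk follow the signal names used in the framer diagram.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical example entity: the only output comes straight from a flip-flop.
entity block_x is
  port (
    clk     : in  std_logic;
    reset_b : in  std_logic;              -- active-low asynchronous reset (assumed)
    a, b    : in  unsigned(7 downto 0);   -- 8-bit width is an assumption
    q       : out unsigned(7 downto 0)
  );
end entity block_x;

architecture rtl of block_x is
begin
  process (clk, reset_b)
  begin
    if reset_b = '0' then
      q <= (others => '0');
    elsif rising_edge(clk) then
      -- The combinatorial logic sits in front of the output register,
      -- in the same block as its destination flip-flop.
      q <= a + b;
    end if;
  end process;
end architecture rtl;
```

Because q is driven directly by a register, the timing seen by the next block starts at a flip-flop output, and the adder stays in the same block as the register it feeds.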

NO GLUE LOGIC between blocks
• No glue logic between blocks, no matter what the temptation
• Under time pressure, a bug may be found that could be fixed simply by adding some glue logic. RESIST THE TEMPTATION!
• At this level in the hierarchy, the glue logic cannot be absorbed into any lower-level block
[Figure: Top contains Block X (reg1) and Block Y (reg3) with glue logic placed between them at the top level.]

Separate designs with different goals
• reg1 may be driven by a time-critical function, hence it will have different optimization constraints
• reg3 may be driven by slow logic, hence there is no need to constrain it for speed
[Figure: Top contains a time-critical path driving reg1 and slow logic driving reg3.]

Optimization based on design requirements
• Use different entities to partition design blocks
• This allows different constraints during synthesis, to optimize for area, speed, or both
[Figure: Top contains a speed-optimized block (time-critical path driving reg1) and an area-optimized block (slow logic driving reg3).]

Separate FSM from random logic
• Separating the FSM from the random logic allows you to use FSM-optimized synthesis
[Figure: Top contains an FSM block driving reg1 (use the FSM optimization tool) and a random logic block driving reg3 (standard optimization techniques used).]
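An FSM kept in its own entity might look like the following sketch. The entity name simple_fsm, the three states and the ports start, done and busy are hypothetical and not taken from the framer design; the point is only that the entity contains a state register and next-state logic, with no datapath or random logic mixed in.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical FSM-only entity: state register plus next-state logic.
entity simple_fsm is
  port (
    clk     : in  std_logic;
    reset_b : in  std_logic;  -- active-low asynchronous reset (assumed)
    start   : in  std_logic;
    done    : in  std_logic;
    busy    : out std_logic   -- registered output, high while in state RUN
  );
end entity simple_fsm;

architecture rtl of simple_fsm is
  type state_t is (IDLE, RUN, FINISH);
  signal state, next_state : state_t;
begin
  -- State register and registered output
  process (clk, reset_b)
  begin
    if reset_b = '0' then
      state <= IDLE;
      busy  <= '0';
    elsif rising_edge(clk) then
      state <= next_state;
      if next_state = RUN then
        busy <= '1';
      else
        busy <= '0';
      end if;
    end if;
  end process;

  -- Next-state logic
  process (state, start, done)
  begin
    next_state <= state;  -- default: hold the current state
    case state is
      when IDLE =>
        if start = '1' then
          next_state <= RUN;
        end if;
      when RUN =>
        if done = '1' then
          next_state <= FINISH;
        end if;
      when FINISH =>
        next_state <= IDLE;
    end case;
  end process;
end architecture rtl;
```

Using an enumerated state type leaves the state encoding open, so it can be chosen during synthesis.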

Maintain a reasonable block size
• Partition your design so that each block is between 1000 and 10000 gates (this is strictly tool and technology dependent)
• The larger the blocks, the longer the run time, so quick iterations cannot be done

Partitioning of Full ASIC
• The top-level block includes the I/O pads and the Mid block instantiation
• Mid includes the clock generator, JTAG and CORE logic
• CORE logic includes all the functionality and the internal scan circuitry
[Figure: Top contains the I/O pads and Mid; Mid contains the clock generator (PLL etc.), JTAG and CORE logic.]

Synthesis Constraints
Specifying an area goal:
• Area constraints are vendor/library dependent (units may be, e.g., 2-input NAND gates, square mils, grids, etc.)
• Design Compiler has the Max Area constraint as one of the constraint attributes

Timing constraints for synchronous designs
• Define the timing paths within the design, i.e., paths leading into the design, internal paths and paths leading out of the design
• Define the clock
• Define the I/O timing relative to the clock
[Figure: block to be synthesized with registers reg2 and reg3 clocked by clk; path segments A, B, C, D and E run from the input to the output.]

Define a clock for synthesis
• Clock source
• Period
• Duty cycle
• Defining the clock constrains the internal timing paths
[Figure: block to be synthesized with registers reg2 and reg3 clocked by clk and path segments B, C and D; the clk waveform shows the clock period and duty cycle, and the register-to-register path spans 1 clock cycle.]

Timing goals for synchronous design
• Define timing constraints for all paths within a design
• Define the clocks
• Define the I/O timing relative to the clock
[Figure: block to be synthesized with registers reg2 and reg3 and path segments A to E; the register-to-register path C is constrained by clk, while paths B and D are still unconstrained.]

Constraining input path
• Input delay is specified relative to the clock
• External logic uses some time within the clock period, i.e., TclkToQ (clock-to-Q delay) + Tw (net delay), measured at the input to B
• Example command for this in Synopsys Design Compiler:
  dc_shell> set_input_delay -clock clk 5 [get_ports A] (where 5 is the input delay and A is the input port in the figure)
[Figure: an external register clocked by clk drives input A over net W; TclkToQ and Tw are spent outside the block, then the signal passes through logic B to reg2 inside the block to be synthesized.]

Constraining output path
• Output delay is specified relative to the clock
• How much of the clock period does the external logic (shown by cloud b) use up? Tb + Tsetup, which is the amount to specify as the output delay
[Figure: reg2 in the block to be synthesized launches data after TclkToQ; outside the block, the external logic b (Tb) and the setup time of the external register (Tsetup) use up part of the clock period.]
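The input and output budgets can be written out as a sketch. TclkToQ, Tw, Tb and Tsetup are the quantities named on the slides; T_clk (the clock period), T_B (delay of the input logic B), T_setup,int (setup time of the register inside the block) and T_out (delay from the clock edge inside the block to the output port) are symbols assumed here for illustration. Clock skew and uncertainty are ignored.

```latex
\begin{align*}
\text{input\_delay}  &= T_{clkToQ} + T_{w} \\
T_{B} + T_{setup,int} &\le T_{clk} - \text{input\_delay} \\[4pt]
\text{output\_delay} &= T_{b} + T_{setup} \\
T_{out} &\le T_{clk} - \text{output\_delay}
\end{align*}
```

For example, with the input delay of 5 from the set_input_delay command and an assumed clock period of 10 in the same time units, the logic B and the setup time of the internal register together must fit in 10 - 5 = 5.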

Timing paths

Combinatorial logic may have multiple paths
• Static timing analysis uses the longest path to calculate a maximum delay and the shortest path to calculate a minimum delay

Schematic converted into a timing graph
• Each arrow represents a net or a cell delay (timing arc)

Calculating a path’s delay
[Figure: timing graph with a delay on each arc; two paths are traced through it.]
Path delay = 1.0 + 0.5 + 0.34 + 0.25 + 0.12 = 2.21
Path delay = 0.75 + 0.45 + 0.56 + 0.1 + 0.2 + 0.1 = 2.16
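How static timing analysis would use these two numbers can be sketched as follows, under the assumption that both paths run between the same start point and end point (the figure is not available to confirm this). The symbols D_1, D_2, D_max and D_min are introduced here for illustration.

```latex
\begin{align*}
D_1 &= 1.0 + 0.5 + 0.34 + 0.25 + 0.12 = 2.21 \\
D_2 &= 0.75 + 0.45 + 0.56 + 0.1 + 0.2 + 0.1 = 2.16 \\
D_{max} &= \max(D_1, D_2) = 2.21 \\
D_{min} &= \min(D_1, D_2) = 2.16
\end{align*}
```

Under that assumption, the longest path (2.21) would be the one checked against the maximum delay and the shortest path (2.16) the one checked against the minimum delay.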

Summarizing: High-level synthesis is constraint driven
• Resource sharing, sharing of common sub-expressions and implementation selection all depend on the design constraints and the coding style
• Based on the timing constraints, Design Compiler decides what to share, how to implement it and which operator ordering to use
• If no constraints are given, area-based optimization is performed (this may be a good start to get an idea of the synthesized circuit)
• It is imperative to set realistic constraints prior to compilation
• High-level synthesis takes place only when optimizing an HDL description