Verilog FAQ TIDBITS

Verilog FAQ

What is VCD and is there any free tool to view it?

VCD - Value Change Dump format - is an ASCII file that contains the "Changes in Values of Signals". This is a STANDARD format and is compatible between different waveform viewers etc. Also most of the simulators can write out VCD files - both VHDL & Verilog, though in Verilog you could do it more easily (than in VHDL - where you have to go through your simulator's C-API) with the system tasks like $dumpvars.

A Verilog simulator dumps the simulation information for waveform viewing in VCD (Value Change Dump) format. GTKWave is one freely available waveform viewer that can read VCD files.
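As a minimal sketch of how a VCD file is produced from Verilog, the testbench below uses the standard $dumpfile and $dumpvars system tasks. The module name tb_dump, the counter and the file name waves.vcd are made up for illustration only.

```verilog
`timescale 1ns/1ps
// Hypothetical testbench: a free-running clock and a small counter,
// dumped to a VCD file for viewing in a waveform viewer.
module tb_dump;
  reg       clk;
  reg [3:0] count;

  initial clk = 1'b0;
  always #5 clk = ~clk;          // 10 ns clock period

  initial count = 4'd0;
  always @ (posedge clk) count <= count + 4'd1;

  initial begin
    $dumpfile("waves.vcd");      // output file name (arbitrary choice)
    $dumpvars(0, tb_dump);       // 0 = dump tb_dump and everything below it
    #100 $finish;                // stop after 100 ns
  end
endmodule
```

After the simulation finishes, waves.vcd can be opened in any viewer that understands the VCD format.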

Different types of Verilog Simulators

There are mainly two types of simulators available.

Event Driven
Cycle Based

Event-based Simulator:

This Digital Logic Simulation method sacrifices performance for rich functionality: every active signal is calculated for every device it propagates through during a clock cycle. Full Event-based simulators support 4-28 states; simulation of Behavioral HDL, RTL HDL, gate, and transistor representations; full timing calculations for all devices; and the full HDL standard. Event-based simulators are like a Swiss Army knife with many different features but none are particularly fast.

Event based simulators are further categorized into two types.

Compiled-Code Simulators:

This technique takes the input definition (HDL) of the design and spends time compiling it into a new data structure in order to enable much faster calculations during run-time. You sacrifice compile time to be able to run large numbers of tests faster. It is used in some high end, Event-based simulators.

e.g. Synopsys Inc.'s VCS Simulator converts Verilog files into C code, which is then compiled and run just like any other executable file. It is 10 to 50 times faster than an interpretive simulator. See http://www.synopsys.com/products/simulation/vcs_ds.html


Cadence's Native Compiled Verilog generates direct machine language instructions from Verilog files. See http://www.cadence.com/datasheets/affirma_nc_verilog_sim.html

Interpreted Code Simulators:

This method of simulation allows for rapid change of the source HDL of the design and restart of the simulation since there is little or no compilation involved after every design change. This is good for interaction but leads to poor run times of large tests compared to Compiled Code Techniques.

e.g. Cadence Design Systems Verilog-XL. See http://www.cadence.com/technology/pcb/products/prev_ds/verilog-xl-family.html

Cycle Based Simulator:

This is a Digital Logic Simulation method that eliminates unnecessary calculations to achieve huge performance gains in verifying Boolean logic:

1.) Results are only examined at the end of every clock cycle; and

2.) The digital logic is the only part of the design simulated (no timing calculations). By limiting the calculations, Cycle based Simulators can provide huge increases in performance over conventional Event-based simulators.

Cycle based simulators are more like a high speed electric carving knife in comparison because they focus on a subset of the biggest problem: logic verification.

Cycle based simulators are almost invariably used along with Static Timing verifier to compensate for the lost timing information coverage.

The following table summarizes the differences between Event based and Cycle based simulation.

Event Based Simulation                     | Cycle Based Simulation
-------------------------------------------|----------------------------------------------------------
Evaluates inputs looking for state change  | Evaluates entire design every clock cycle
Schedules events in time                   | No event scheduling
Calculates time delay                      | No delay calculations or timing checks
Stores state values and time information   | No such storage. Very fast, very efficient memory usage.
Identifies timing violations               | Does not identify timing violations

Where two simulations are appropriate

Comparison between Event Based and Cycle based Simulation

What is the difference between cycle and event based Verilog simulators?

Cycle based Simulator: Cycle simulation is a technique (i.e. an algorithm) for digital circuit simulation. It does not simulate detailed circuit timing, but instead computes the steady state response of a circuit at each clock cycle. The user cannot see the glitch behavior of signals between clock cycles. Instead the user observes circuit signals once per clock cycle. Cycle based simulators work only with synchronous designs.

Event based Simulator: Simulation based on events in logic means that whenever there is a change on an input, the output is evaluated. This makes the simulation very slow compared to Cycle based simulators. Verilog-XL is an event based simulator.

Consider the circuit below: if a cycle based simulator runs a simulation on the circuit below, then it will evaluate B, C, D and E only at each cycle. In the case of an event based simulator, B, C, D and E are evaluated not only at clock cycle, but also when any of the events at the input of gates and flip-flops occurs.

What is the difference between compiled and interpreted Verilog simulators?

Compiled Simulator: This kind of simulator converts the whole Verilog code into machine dependent code and then runs the simulation. Example: VCS generates a binary file, which can be run from the command prompt. Compiled simulators are very fast.

Interpreted Simulator: This kind of simulator executes the code line by line, and is thus very slow compared to a compiled simulator. Verilog-XL is one such simulator.

Finite State Machines in Verilog

State machine design is becoming more complex due to increasing time constraints and verification issues. The following paper provides good insight into design and optimization.

1] State Machine Design Techniques for Verilog and VHDL, by Steve Golson, Trilobyte Systems.
PDF version of article: http://bawankule.com/verilogfaq/files/jhld099401.pdf
Text version: http://www.synopsys.com/news/pubs/JHLD/JHLD-099401

TIDBITS

Wire And Reg

Well I had this doubt when I was learning Verilog: What is the difference between reg and wire? Well I won't tell stories to explain this, rather I will give you some examples to show the difference.

From the college days we know that wire is something which connects two points, and thus does not have any driving strength. In the figure below, in_wire is a wire which connects the AND gate input to the driving source, clk_wire connects the clock to the flip-flop input, d_wire connects the AND gate output to the flip-flop D input.

There is something else about wire which sometimes confuses: the wire data type can be used for connecting the output port to the actual driver. Below is code which, when synthesized, gives an AND gate as output; as we know, an AND gate can drive a load.

    module wire_example( a, b, y);
    input a, b;
    output y;

    wire a, b, y;

    // Continuous assignment: y is driven by the AND gate
    assign y = a & b;

    endmodule

SYNTHESIS OUTPUT


What this implies is that wire is used for designing combinational logic, as we all know that this kind of logic can not store a value. As you can see from the example above, a wire can be assigned a value by an assign statement. Default data type is wire: this means that if you declare a variable without specifying reg or wire, it will be a 1-bit wide wire.

Now, coming to the reg data type: a reg can store a value and has driving strength. Something that we need to know about reg is that it can be used for modeling both combinational and sequential logic. A reg can be assigned from initial and always blocks.

Reg data type as Combinational element

    module reg_combo_example( a, b, y);
    input a, b;
    output y;

    reg y;
    wire a, b;

    // Level sensitive list: re-evaluate whenever a or b changes
    always @ ( a or b)
    begin
      y = a & b;
    end

    endmodule

SYNTHESIS OUTPUT

This gives the same output as that of the assign statement, with the only difference that y is declared as reg. There are distinct advantages to have reg modeled as combinational element; reg type is useful when a "case" statement is required (refer to the Verilog section for more on this).
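To illustrate why a reg is handy when a "case" statement is needed, here is a small sketch of a 4-to-1 multiplexer coded as combinational logic. The module and signal names (mux4_case, sel, d0 to d3) are invented for this example and do not come from the original text.

```verilog
// Hypothetical 4-to-1 mux: y is a reg only because it is assigned
// inside an always block; no storage element is inferred.
module mux4_case( sel, d0, d1, d2, d3, y);
input [1:0] sel;
input d0, d1, d2, d3;
output y;

reg y;
wire [1:0] sel;
wire d0, d1, d2, d3;

// All inputs are in the sensitivity list and every case assigns y,
// so the block describes pure combinational logic.
always @ ( sel or d0 or d1 or d2 or d3)
begin
  case (sel)
    2'b00   : y = d0;
    2'b01   : y = d1;
    2'b10   : y = d2;
    default : y = d3;
  endcase
end

endmodule
```

The default branch matters here: if some value of sel left y unassigned, a synthesis tool would typically infer a latch instead of a plain mux.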

To model a sequential element using reg, we need to have edge sensitive variables in the sensitivity list of the always block.

Reg data type as Sequential element

    module reg_seq_example( clk, reset, d, q);
    input clk, reset, d;
    output q;

    reg q;
    wire clk, reset, d;

    // Edge sensitive list: flip-flop with asynchronous, active high reset
    always @ (posedge clk or posedge reset)
    if (reset) begin
      q <= 1'b0;
    end else begin
      q <= d;
    end

    endmodule

SYNTHESIS OUTPUT

There is a difference in the way we assign to a reg: when modeling combinational logic we use blocking assignments, while when modeling sequential logic we use nonblocking ones.

Blocking and Nonblocking Statements

Blocking Statements: A blocking statement must be executed before the execution of the statements that follow it in a sequential block. In the example below the first time statement to get executed is a = b followed by

Nonblocking Statements: Nonblocking statements allow you to schedule assignments without blocking the procedural flow. You can use the nonblocking procedural statement whenever you want to make several register assignments within the same time step without regard to order or dependence upon each other. It means that nonblocking statements resemble actual hardware more than blocking assignments.

    module block_nonblock();
    reg a, b, c, d , e, f ;

    // Blocking assignments
    initial begin
      a = #10 1'b1;// The simulator assigns 1 to a at time 10
      b = #20 1'b0;// The simulator assigns 0 to b at time 30
      c = #40 1'b1;// The simulator assigns 1 to c at time 70
    end

    // Nonblocking assignments
    initial begin
      d <= #10 1'b1;// The simulator assigns 1 to d at time 10
      e <= #20 1'b0;// The simulator assigns 0 to e at time 20
      f <= #40 1'b1;// The simulator assigns 1 to f at time 40
    end

    endmodule

Example - Blocking

    module blocking (clk,a,c);
    input clk;
    input a;
    output c;

    wire clk;
    wire a;
    reg c;
    reg b;

    always @ (posedge clk )
    begin
      b = a;
      c = b;
    end

    endmodule

Synthesis Output

Example - Nonblocking

    module nonblocking (clk,a,c);
    input clk;
    input a;
    output c;

    wire clk;
    wire a;
    reg c;
    reg b;

    always @ (posedge clk )
    begin
      b <= a;
      c <= b;
    end

    endmodule

Synthesis Output
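To see the difference between the two examples in simulation, the following is a small self-contained sketch that puts both coding styles side by side in one testbench. The testbench name and the signal names (tb_block_vs_nonblock, b_blk, c_blk, b_nb, c_nb) are made up for illustration.

```verilog
// Hypothetical testbench comparing blocking and nonblocking
// assignments in a clocked always block.
module tb_block_vs_nonblock;
  reg clk = 1'b0;
  reg a   = 1'b0;
  reg b_blk, c_blk;   // updated with blocking assignments
  reg b_nb,  c_nb;    // updated with nonblocking assignments

  always #5 clk = ~clk;   // rising edges at 5, 15, 25, ...

  // Blocking: c_blk sees the new value of b_blk in the same edge,
  // so it behaves like a single flip-flop fed by a.
  always @ (posedge clk) begin
    b_blk = a;
    c_blk = b_blk;
  end

  // Nonblocking: c_nb gets the old value of b_nb,
  // so it behaves like two flip-flops in series.
  always @ (posedge clk) begin
    b_nb <= a;
    c_nb <= b_nb;
  end

  initial begin
    $monitor("t=%0t a=%b c_blk=%b c_nb=%b", $time, a, c_blk, c_nb);
    #12 a = 1'b1;     // change a between two clock edges
    #40 $finish;
  end
endmodule
```

In this sketch c_blk picks up the new value of a at the first rising clock edge after a changes, while c_nb picks it up one clock edge later.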

Introduction

Basically a FSM consists of combinational, sequential and output logic. Combinational logic is used to decide the next state of the FSM, sequential logic is used to store the current state of the FSM. The output logic is a mixture of both combo and seq logic as shown in the figure below.

Types of State Machines

There are many ways to code these state machines, but before we get into the coding styles, let's first understand the basics a bit. There are two types of state machines:

Mealy State Machine : Its output depends on current state and current inputs. In the above picture, the blue dotted line makes the circuit a Mealy state machine.

Moore State Machine : Its output depends on current state only. In the above picture, when blue dotted line is removed the circuit becomes a Moore state machine.

Depending on the need, we can choose the type of state machine. Most of the time we end up using a Mealy FSM.

Encoding Style

Since we need to represent the state machine in a digital circuit, we need to represent each state in one of the following ways:

Binary encoding : each state is represented in binary code (e.g. 000, 001, 010, ...)
Gray encoding : each state is represented in gray code (e.g. 000, 001, 011, ...)
One Hot : only one bit is high and the rest are low (e.g. 0001, 0010, 0100, 1000)
One Cold : only one bit is low, the rest are high (e.g. 1110, 1101, 1011, 0111)

Most of the time we use one hot encoding.

Example

To help you follow the tutorial, I have taken a simple arbiter as the example; this has got two request inputs and two grant outputs, as shown in the signal diagram below.

When req_0 is asserted, gnt_0 is asserted.
When req_1 is asserted, gnt_1 is asserted.
When both req_0 and req_1 are asserted, gnt_0 is asserted, i.e. higher priority is given to req_0 over req_1.

We can symbolically translate this into an FSM diagram as shown in the figure below. The FSM has the following states:

IDLE : In this state FSM waits for the assertion of req_0 or req_1 and drives both gnt_0 and gnt_1 to inactive state (low). This is the default state of the FSM, it is entered after the reset and also during fault recovery condition.

GNT0 : FSM enters this state when req_0 is asserted, and remains here as long as req_0 is asserted. When req_0 is de-asserted, FSM returns to the IDLE state.

GNT1 : FSM enters this state when req_1 is asserted, and remains there as long as req_1 is asserted. When req_1 is de-asserted, FSM returns to the IDLE state.

Coding Methods

Now that we have described our state machine clearly, let's look at various methods of coding a FSM.

We use one-hot encoding, and all the FSMs will have the following code in common, so it will not be repeated again and again.

Using A Function For Combo Logic

    //-----------------------------------------------------
    // This is FSM demo program using function
    // Design Name : fsm_using_function
    // File Name   : fsm_using_function.v
    //-----------------------------------------------------
    module fsm_using_function (
    clock      , // clock
    reset      , // Active high, syn reset
    req_0      , // Request 0
    req_1      , // Request 1
    gnt_0      , // Grant 0
    gnt_1
    );
    //-------------Input Ports-----------------------------
    input   clock,reset,req_0,req_1;
    //-------------Output Ports----------------------------
    output  gnt_0,gnt_1;
    //-------------Input ports Data Type-------------------
    wire    clock,reset,req_0,req_1;
    //-------------Output Ports Data Type------------------
    reg     gnt_0,gnt_1;
    //-------------Internal Constants--------------------------
    parameter SIZE = 3 ;
    parameter IDLE = 3'b001,GNT0 = 3'b010,GNT1 = 3'b100 ;
    //-------------Internal Variables---------------------------
    reg   [SIZE-1:0]          state        ;// Seq part of the FSM
    wire  [SIZE-1:0]          next_state   ;// combo part of FSM
    //----------Code starts Here------------------------
    assign next_state = fsm_function(state, req_0, req_1);
    //----------Function for Combo Logic-----------------
    function [SIZE-1:0] fsm_function;
      input  [SIZE-1:0]  state ;
      input    req_0 ;
      input    req_1 ;
      case(state)
        IDLE : if (req_0 == 1'b1) begin
                 fsm_function = GNT0;
               end else if (req_1 == 1'b1) begin
                 fsm_function = GNT1;
               end else begin
                 fsm_function = IDLE;
               end
        GNT0 : if (req_0 == 1'b1) begin
                 fsm_function = GNT0;
               end else begin
                 fsm_function = IDLE;
               end
        GNT1 : if (req_1 == 1'b1) begin
                 fsm_function = GNT1;
               end else begin
                 fsm_function = IDLE;
               end
        default : fsm_function = IDLE;
      endcase
    endfunction
    //----------Seq Logic-----------------------------
    always @ (posedge clock)
    begin : FSM_SEQ
      if (reset == 1'b1) begin
        state <= #1 IDLE;
      end else begin
        state <= #1 next_state;
      end
    end
    //----------Output Logic-----------------------------
    always @ (posedge clock)
    begin : OUTPUT_LOGIC
      if (reset == 1'b1) begin
        gnt_0 <= #1 1'b0;
        gnt_1 <= #1 1'b0;
      end
      else begin
        case(state)
          IDLE : begin
                   gnt_0 <= #1 1'b0;
                   gnt_1 <= #1 1'b0;
                 end
          GNT0 : begin
                   gnt_0 <= #1 1'b1;
                   gnt_1 <= #1 1'b0;
                 end
          GNT1 : begin
                   gnt_0 <= #1 1'b0;
                   gnt_1 <= #1 1'b1;
                 end
          default : begin
                      gnt_0 <= #1 1'b0;
                      gnt_1 <= #1 1'b0;
                    end
        endcase
      end
    end // End Of Block OUTPUT_LOGIC

    endmodule // End of Module arbiter

Using Two Always Blocks

    //-----------------------------------------------------
    // This is FSM demo program using always block
    // Design Name : fsm_using_always
    // File Name   : fsm_using_always.v
    //-----------------------------------------------------
    module fsm_using_always (
    clock      , // clock
    reset      , // Active high, syn reset
    req_0      , // Request 0
    req_1      , // Request 1
    gnt_0      , // Grant 0
    gnt_1
    );
    //-------------Input Ports-----------------------------
    input   clock,reset,req_0,req_1;
    //-------------Output Ports----------------------------
    output  gnt_0,gnt_1;
    //-------------Input ports Data Type-------------------
    wire    clock,reset,req_0,req_1;
    //-------------Output Ports Data Type------------------
    reg     gnt_0,gnt_1;
    //-------------Internal Constants--------------------------
    parameter SIZE = 3 ;
    parameter IDLE = 3'b001,GNT0 = 3'b010,GNT1 = 3'b100 ;
    //-------------Internal Variables---------------------------
    reg   [SIZE-1:0]          state        ;// Seq part of the FSM
    reg   [SIZE-1:0]          next_state   ;// combo part of FSM
    //----------Code starts Here------------------------
    always @ (state or req_0 or req_1)
    begin : FSM_COMBO
      next_state = 3'b000;
      case(state)
        IDLE : if (req_0 == 1'b1) begin
                 next_state = GNT0;
               end else if (req_1 == 1'b1) begin
                 next_state = GNT1;
               end else begin
                 next_state = IDLE;
               end
        GNT0 : if (req_0 == 1'b1) begin
                 next_state = GNT0;
               end else begin
                 next_state = IDLE;
               end
        GNT1 : if (req_1 == 1'b1) begin
                 next_state = GNT1;
               end else begin
                 next_state = IDLE;
               end
        default : next_state = IDLE;
      endcase
    end
    //----------Seq Logic-----------------------------
    always @ (posedge clock)
    begin : FSM_SEQ
      if (reset == 1'b1) begin
        state <= #1 IDLE;
      end else begin
        state <= #1 next_state;
      end
    end
    //----------Output Logic-----------------------------
    always @ (posedge clock)
    begin : OUTPUT_LOGIC
      if (reset == 1'b1) begin
        gnt_0 <= #1 1'b0;
        gnt_1 <= #1 1'b0;
      end
      else begin
        case(state)
          IDLE : begin
                   gnt_0 <= #1 1'b0;
                   gnt_1 <= #1 1'b0;
                 end
          GNT0 : begin
                   gnt_0 <= #1 1'b1;
                   gnt_1 <= #1 1'b0;
                 end
          GNT1 : begin
                   gnt_0 <= #1 1'b0;
                   gnt_1 <= #1 1'b1;
                 end
          default : begin
                      gnt_0 <= #1 1'b0;
                      gnt_1 <= #1 1'b0;
                    end
        endcase
      end
    end // End Of Block OUTPUT_LOGIC

    endmodule // End of Module arbiter

Using Single Always For Sequential, Combo And Output Logic

    //====================================================
    // This is FSM demo program using single always
    // for both seq and combo logic
    // Design Name : fsm_using_single_always
    // File Name   : fsm_using_single_always.v
    //=====================================================
    module fsm_using_single_always (
    clock      , // clock
    reset      , // Active high, syn reset
    req_0      , // Request 0
    req_1      , // Request 1
    gnt_0      , // Grant 0
    gnt_1
    );
    //=============Input Ports=============================
    input   clock,reset,req_0,req_1;
    //=============Output Ports===========================
    output  gnt_0,gnt_1;
    //=============Input ports Data Type===================
    wire    clock,reset,req_0,req_1;
    //=============Output Ports Data Type==================
    reg     gnt_0,gnt_1;
    //=============Internal Constants======================
    parameter SIZE = 3 ;
    parameter IDLE = 3'b001,GNT0 = 3'b010,GNT1 = 3'b100 ;
    //=============Internal Variables======================
    reg   [SIZE-1:0]          state        ;// Seq part of the FSM
    reg   [SIZE-1:0]          next_state   ;// combo part of FSM
    //==========Code starts Here==========================
    always @ (posedge clock)
    begin : FSM
      if (reset == 1'b1) begin
        state <= #1 IDLE;
        gnt_0 <= 0;
        gnt_1 <= 0;
      end else
        case(state)
          IDLE : if (req_0 == 1'b1) begin
                   state <= #1 GNT0;
                   gnt_0 <= 1;
                 end else if (req_1 == 1'b1) begin
                   gnt_1 <= 1;
                   state <= #1 GNT1;
                 end else begin
                   state <= #1 IDLE;
                 end
          GNT0 : if (req_0 == 1'b1) begin
                   state <= #1 GNT0;
                 end else begin
                   gnt_0 <= 0;
                   state <= #1 IDLE;
                 end
          GNT1 : if (req_1 == 1'b1) begin
                   state <= #1 GNT1;
                 end else begin
                   gnt_1 <= 0;
                   state <= #1 IDLE;
                 end
          default : state <= #1 IDLE;
        endcase
    end

    endmodule // End of Module arbiter

What is metastability?

Whenever there are setup and hold time violations in any flip-flop, it enters a state where its output is unpredictable: this state is known as metastable state (quasi stable state); at the end of metastable state, the flip-flop settles down to either '1' or '0'. This whole process is known as metastability. In the figure below Tsu is the setup time and Th is the hold time. Whenever the input signal D does not meet the Tsu and Th of the given D flip-flop, metastability occurs.

When a flip-flop is in the metastable state, its output oscillates between '0' and '1' as shown in the figure below (here the flip-flop output settles down to '0'). How long it takes to settle down depends on the technology of the flip-flop.

If we look deep inside of the flip-flop we see that the quasi-stable state is reached when the flip-flop setup and hold times are violated. Assuming the use of a positive edge triggered "D" type flip-flop, when the rising edge of the flip-flop clock occurs at a point in time when the D input to the flip-flop is causing its master latch to transition, the flip-flop is highly likely to end up in a quasi-stable state. This rising clock causes the master latch to try to capture its current value while the slave latch is opened allowing the Q output to follow the "latched" value of the master. The most perfectly "caught" quasi-stable state (on the very top of the hill) results in the longest time required for the flip-flop to resolve itself to one of the stable states.

How long does it stay in this state?

The relative stability of states shown in the figure above shows that the logic 0 and logic 1 states (being at the base of the hill) are much more stable than the somewhat stable state at the top of the hill. In theory, a flip-flop in this quasi-stable hilltop state could remain there indefinitely but in reality it won't. Just as the slightest air current would eventually cause a ball on the illustrated hill to roll down one side or the other, thermal and induced noise will jostle the state of the flip-flop causing it to move from the quasi-stable state into either the logic 0 or logic 1 state.

What are the cases in which metastability occurs?

As we have seen, metastability occurs whenever the setup or hold time is violated, so we have to see when signals violate this timing requirement:

When the input signal is an asynchronous signal.
When the clock skew/slew is too much (rise and fall times are more than the tolerable values).
When interfacing two domains operating at two different frequencies, or at the same frequency but with different phase.
When the combinational delay is such that the flip-flop data input changes in the critical window (setup + hold window).

What is MTBF?

MTBF is Mean Time Between Failures; what does that mean? Well, MTBF gives us information on how often a particular element will fail, or in other words, it gives the average time interval between two successive failures. The figure below shows a typical MTBF of a flip-flop and also gives the MTBF equation. I am not looking here to derive the MTBF equation :-)

So how do I avoid metastability?

In reality, one cannot avoid metastability and increased clock-to-Q delays in synchronizing asynchronous inputs, without the use of tricky self-timed circuits. So a more appropriate question might be "How do I tolerate metastability?"

In the simplest case, designers can tolerate metastability by making sure the clock period is long enough to allow for the resolution of quasi-stable states and for the delay of whatever logic may be in the path to the next flip-flop. This approach, while simple, is rarely practical given the performance requirements of most modern designs.

The most common way to tolerate metastability is to add one or more successive synchronizing flip-flops to the synchronizer. This approach allows for an entire clock period (except for the setup time of the second flip-flop) for metastable events in the first synchronizing flip-flop to resolve themselves. This does, however, increase the latency in the synchronous logic's observation of input changes.

Neither of these approaches can guarantee that metastability cannot pass through the synchronizer; they simply reduce the probability to practical levels.

In quantitative terms, if the Mean Time Between Failure (MTBF) of a particular flip-flop in the context of a given clock rate and input transition rate is 33.33 seconds, then the MTBF of two such flip-flops used to synchronize the input would be (33.33 * 33.33) = 1110.9 seconds, or about 18.5 minutes. Well, I have taken the worst flip-flop ever designed in the history of mankind :-). The figure below shows how to connect two flip-flops in series to achieve this and also the resultant MTBF.

Normally, we can:

Use a metastable hardened flip-flop.
Cascade two or three D flip-flops (a two or three stage synchronizer).
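As a rough sketch of the two stage synchronizer mentioned above, the module below cascades two D flip-flops on the destination clock. The module and signal names (sync_2ff, async_in, meta, sync_out) are invented here; the original figure is not available, so this only shows the common textbook arrangement.

```verilog
// Hypothetical two stage synchronizer for a single-bit signal.
module sync_2ff (clk, async_in, sync_out);
input  clk;        // destination domain clock
input  async_in;   // signal asynchronous to clk
output sync_out;   // synchronized version, two clock edges later

wire clk, async_in;
reg  meta;         // first stage: may go metastable
reg  sync_out;     // second stage: samples meta one clock period later

always @ (posedge clk)
begin
  meta     <= async_in;
  sync_out <= meta;
end

endmodule
```

Only sync_out should be used by downstream logic; meta is given a full clock period to settle before it is sampled again. A third stage can be added in the same way for a three stage synchronizer.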

All About Reset

Synchronous Reset : Reset is sampled with respect to clock

Asynchronous Reset : Reset takes effect independently of the clock.

Synchronous Reset                                         | Asynchronous Reset
----------------------------------------------------------|------------------------------------------------------------
Requires more gates to implement (see the example below)  | Requires fewer gates to implement (see the example below)
Requires the clock to be always active                    | Does not require the clock to be always active
Does not have metastability problems                      | Suffers from metastability problems
Is slow                                                   | Is fast

Code Example

Synchronous Reset

    module syn_reset (clk,reset,a,c);
    input clk;
    input reset;
    input a;
    output c;

    wire clk;
    wire reset;
    wire a;
    reg c;

    // reset is only looked at on the rising edge of clk
    always @ (posedge clk )
    if ( reset == 1'b1) begin
      c <= 0;
    end else begin
      c <= a;
    end

    endmodule

Asynchronous Reset

    module asyn_reset(clk,reset,a,c);
    input clk;
    input reset;
    input a;
    output c;

    wire clk;
    wire reset;
    wire a;
    reg c;

    // reset is in the sensitivity list, so it acts without waiting for clk
    always @ (posedge clk or posedge reset)
    if ( reset == 1'b1) begin
      c <= 0;
    end else begin
      c <= a;
    end
    endmodule

Synthesis Output

Synchronize the asynchronous external reset signal, and use this synchronized reset as input to all the asynchronous reset flip-flops inside the design, as shown in the figure below. We do this because an asynchronous reset flip-flop takes less logic to implement, is faster, and consumes less power.
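One common way to do this is sketched below, assuming an active high external reset: the reset asserts asynchronously and is released synchronously to the clock. The names reset_sync, ext_reset, rff1 and reset_out are made up for this sketch, and the circuit in the original figure may differ in detail.

```verilog
// Hypothetical reset synchronizer: asynchronous assertion,
// synchronous de-assertion.
module reset_sync (clk, ext_reset, reset_out);
input  clk;
input  ext_reset;   // external reset, asynchronous, active high
output reset_out;   // reset to feed the flip-flops inside the design

wire clk, ext_reset;
reg  rff1;
reg  reset_out;

always @ (posedge clk or posedge ext_reset)
if (ext_reset == 1'b1) begin
  // Assert immediately, no clock needed
  rff1      <= 1'b1;
  reset_out <= 1'b1;
end else begin
  // Release only after two clock edges with ext_reset low
  rff1      <= 1'b0;
  reset_out <= rff1;
end

endmodule
```

Because reset_out is released on a clock edge, the flip-flops it drives leave reset at a known point relative to the clock instead of at an arbitrary time.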

Introduction

There are times when a designer needs to interface two systems working at two different clocks. This interfacing is difficult in the sense that the design becomes asynchronous at the boundary of the interface, which results in setup and hold violations, metastability and unreliable data transfers. So we need special design and interfacing techniques.

Any two systems are considered asynchronous to each other:

When they operate at two different frequencies

When they operate at same frequency, but at two different clock phase angles

Here we have two systems, which are asynchronous in nature to each other. In such a case if we need to do data transfer, there are very few methods to achieve this:

Handshake Signaling method.
Asynchronous FIFO.

Handshake Signaling

In this method system (module) A sends data to system (module) B based on the handshake signals req and ack. The protocol for this uses the same old method that is found with the 8155 chip used with the 8085.

Protocol

Transmitter asserts the req (request) signal, asking the receiver to accept the data on the data bus.

Receiver asserts the ack (acknowledge) signal, indicating that it has accepted the data.

This method is straightforward, but it too has loopholes: system B samples system A's req line and system A samples system B's ack line with respect to their own internal clocks, so there will be setup and hold time violations. To avoid this we go for double or triple stage synchronizers, which increase the MTBF and thus make the interface immune to metastability to a good extent. The figure below shows how this is done with respect to the above example.

If we do the double or triple stage synchronizing, then the transfer rate comes down, due to the fact that a lot of clock cycles are wasted just handshaking.

Sometimes it is good to synchronize the data also to be double sure, but normally we don't do this, as it takes a lot of logic and what we gain is very small. The figure below shows one such case (there is no difference between circuits shown for req and data, they are one and same).
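The receiving side of such a handshake can be sketched as follows. This is only an illustration under stated assumptions: an 8-bit data bus, a synchronous active high reset, and a transmitter that keeps data stable and req high until it sees ack. All names (handshake_rx, clk_b, reset_b, data_b, data_valid) are invented for the sketch.

```verilog
// Hypothetical receiver (system B) for a req/ack handshake.
module handshake_rx (clk_b, reset_b, req, data, ack, data_b, data_valid);
input        clk_b;       // system B clock
input        reset_b;     // active high, synchronous reset
input        req;         // from system A, asynchronous to clk_b
input  [7:0] data;        // held stable by A while req is high
output       ack;         // back to system A
output [7:0] data_b;      // data captured in B's clock domain
output       data_valid;  // one clk_b cycle pulse per transfer

reg        req_meta, req_sync;   // double stage synchronizer for req
reg        ack, data_valid;
reg  [7:0] data_b;

always @ (posedge clk_b)
if (reset_b == 1'b1) begin
  req_meta   <= 1'b0;
  req_sync   <= 1'b0;
  ack        <= 1'b0;
  data_valid <= 1'b0;
  data_b     <= 8'h00;
end else begin
  req_meta   <= req;
  req_sync   <= req_meta;
  // A new request is one that is synchronized but not yet acknowledged
  data_valid <= req_sync & ~ack;
  if (req_sync & ~ack) data_b <= data;
  // ack follows the synchronized req, and drops after A removes req
  ack        <= req_sync;
end

endmodule
```

Only req passes through the synchronizer here; data is sampled directly, which is safe only because the transmitter is assumed to hold it stable for the whole time req is high. System A would synchronize ack into its own clock domain in the same way.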

Asynchronous FIFO

An Asynchronous FIFO has got two interfaces, one for writing the data into the FIFO and the other for reading the data out. It has got two clocks, one for writing and the other for reading. System A writes the data in the FIFO and System B reads out the data from it. To facilitate error free operations, we have FIFO full and FIFO empty signals. These signals are generated with respect to the corresponding clock. FIFO full signal is used by system A (as when FIFO is full, we don't want system A to write data into FIFO, this data will be lost), so it will be driven by the write clock. Similarly, FIFO empty will be driven by the read clock. Here read clock means system B clock and write clock means system A clock.

An Asynchronous FIFO is used where performance matters, when one does not want to waste clock cycles on handshake signals, and when plenty of system resources are available.

How to design an Asynchronous FIFO is not in the scope of this document, but what I would like to point out is that one should be careful with the generation of FIFO full and FIFO empty signals, as it may, in certain cases, cause metastability.

Introduction

One of the most common questions in interviews is how to calculate the depth of a FIFO. A FIFO is used as a buffering or queueing element in a system, and by common sense it is required only when the reading side is slower than the writing side. So the size of the FIFO is basically the amount of data that has to be buffered, which depends on the rate at which data is written and the rate at which data is read. Statistically, the data rate varies in the system, mainly depending on the load. So to obtain a safe FIFO size we need to consider the worst case scenario for the data transfer across the FIFO under consideration.

For the worst case scenario, the difference between the write and read data rates should be maximum. Hence, the maximum data rate should be considered for the write operation and the minimum data rate for the read operation.

So in the question itself, the data rate of the read operation is specified by the number of idle cycles, and for the write operation the maximum data rate, with no idle cycles, should be considered.

For the write operation, data rate = number of data * clock rate. The writing side is the source and the reading side is the sink; how much data the reading side removes depends on the writing side data rate and on its own reading rate, which is Frd/Idle_cycle_rd.

In order to know the data rate of the write operation, we need to know the number of data in a burst, which we assume to be B.

A burst of B data takes B/Fwr to write, and in that time the reading side removes (B/Fwr) * (Frd/Idle_cycle_rd) data. What is left over has to be buffered, which gives the equation: Fifo size = Size to be buffered = B - B * Frd / (Fwr * Idle_cycle_rd).

Here we have not considered the synchronizing latency if the write and read clocks are asynchronous. The greater the synchronizing latency, the larger the FIFO has to be to buffer the additional data written.

Example : FIFO Depth Calculation

Assume that we have to design a FIFO with the following requirements, and we want to calculate the minimum FIFO depth:

A synchronized FIFO
Writing clock 30MHz - F1
Reading clock 40MHz - F2
Writing burst size - B
Case 1 : There is 1 idle clock cycle for the reading side - I
Case 2 : There are 10 idle clock cycles for the reading side - I

FIFO depth calculation = B - B * F2/(F1*I)

Case 1: we have alternate read cycles, i.e. there is one IDLE cycle between two read cycles, so one item is read every 2 read clock cycles.

FIFO depth calculation = B - B * F2/(F1*2)

In our present problem FIFO depth = B - B *40/(30*2)

= B(1-2/3)

= B/3

That means if our burst is 10 data items, FIFO depth = 10/3 = 3.33, rounded up to 4.

If B = 20, FIFO depth = 20/3 = 6.67, rounded up to 7, or 8 if the clocks are asynchronous.

If B = 30, FIFO depth = 30/3 = 10, or 10 + 1 = 11 if the clocks are asynchronous.

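Case 2: there are 10 IDLE cycles between two read cycles. Taking I = 10 as the divisor (counting as in Case 1, where 1 idle cycle gave a divisor of 2, 10 idle cycles would strictly give 11):

FIFO depth = B - B * F2/(F1*10)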

= B(1-4/30)

= B * 26 /30
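As a rough check, the worked example above can be reproduced with a few lines of Python. This is a minimal sketch and not part of the original calculation: the function name fifo_depth and the sync_margin parameter are assumptions made for illustration, and read_period is the divisor I used in the formulas (2 for alternate read cycles, 10 in the second case).

```python
from fractions import Fraction
from math import ceil


def fifo_depth(burst, f_wr, f_rd, read_period, sync_margin=0):
    """Minimum FIFO depth from the formula B - B*Frd/(Fwr*I).

    burst       -- number of data items written back to back (B)
    f_wr, f_rd  -- write and read clock frequencies in the same unit (F1, F2)
    read_period -- divisor I from the text (2 for alternate read cycles)
    sync_margin -- extra entries for synchronizer latency (hypothetical knob)
    """
    # Exact rational arithmetic avoids float rounding on values such as 30/3.
    leftover = Fraction(burst) * (1 - Fraction(f_rd, f_wr * read_period))
    # Round up to whole entries; a reader that keeps up needs no buffering.
    return max(0, ceil(leftover)) + sync_margin


for b in (10, 20, 30):
    print(b, fifo_depth(b, 30, 40, 2))   # depths 4, 7 and 10
print(fifo_depth(30, 30, 40, 10))        # 26, i.e. 30 * 26/30
```

With sync_margin=1 the B = 30 case gives 11, matching the asynchronous figure quoted above.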

Verification Flow With Specman

The verification flow with Specman is the same as with any other HVL. The figure below shows the verification flow with Specman.

The verification flow starts with understanding the specification of the chip/block under verification. Once the specification is understood, a test cases document is prepared, which documents all possible test cases. Once this document covers 70-80 percent of the functionality, a testbench architecture document is prepared. In the past, the architecture document was prepared first and the test cases document next. This style has a drawback: the test cases document may call for a particular functionality to be verified that the testbench does not support, because the architecture document was prepared before the test cases were known. If we have a test cases document to refer to, writing the architecture document becomes much easier, as we know for sure what is expected from the testbench.

Note: This section was written in a hurry, so it is very far from what I really want it to be!!!

Test Cases

Identify the test cases from the design specification: a simple task for simple cases. Normally each requirement in the specification becomes a test case. Anything that the specification describes with "can do" or "will have" becomes a test case. Corner test cases normally take a lot of thinking to identify.

Testbench Architecture

Typical testbench architecture looks as shown below. The main blocks in a testbench are base object, transaction generator, driver, monitor, checker/scoreboard.

The block in red is the DUT, and boxes in orange are the testbench components. Coverage is a separate block which gets events from the input and output monitors. It is the same as the scoreboard, but does something more.

Base Object

The base object is the data structure that will be used across the testbench. If you are verifying a memory, for example, the base object would contain:

<'
struct mem_object {
  addr  : uint (bits:8);
  data  : uint (bits:8);
  rd_wt : uint [0..100];
  wr_wt : uint [0..100];
  rd_wr : bool;
  keep soft rd_wt == 50;
  keep soft wr_wt == 50;

  keep gen (wr_wt) before (rd_wr);
  keep gen (rd_wt) before (rd_wr);
  // Default operation is Write
  keep soft rd_wr == FALSE;

  keep soft rd_wr == select {
    rd_wt : TRUE;
    wr_wt : FALSE;
  };
};
'>

Here mem_object is the name of the base object, in the same way as we have a module name for each module in Verilog or an entity name in VHDL. The address, data, read/write weights and read/write flag are the fields of mem_object. Normally we have some default constraints and some methods (functions) which can manipulate the fields of the base object.
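The weighted select constraint in mem_object picks a read with weight rd_wt and a write with weight wr_wt. The idea can be sketched in Python as below. This is only an illustration of weighted random choice, not of how Specman's constraint solver works, and the function name pick_rd_wr is an assumption.

```python
import random


def pick_rd_wr(rd_wt=50, wr_wt=50, rng=random):
    """Return True for a read and False for a write, weighted like the select."""
    return rng.choices([True, False], weights=[rd_wt, wr_wt], k=1)[0]


rng = random.Random(1)                      # fixed seed so the run is repeatable
reads = sum(pick_rd_wr(80, 20, rng) for _ in range(1000))
print(reads, "reads out of 1000")           # roughly 800 expected
```

Setting one weight to 0 forces the other operation, which is how a test can turn the default 50/50 mix into a write-only or read-only sequence.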

Transaction Generator

The transaction generator generates transactions based on the test constraints. Normally the transaction generator applies the test case constraints to the base object and generates a base object that satisfies them. Once generated, the transaction generator passes it to the driver.

A typical transaction generator would be like this:

<'
struct mem_txgen {
  ! mem_base : mem_object;
  //driver : mem_driver;
  ! num_cmds : uint;
  // This method generates the commands and
  // calls the driver
  genrate_cmds()@sys.any is {
    for {var i:uint = 0 ; i < num_cmds; i+=1} do {
      // Generate a write access
      gen mem_base keeping {
        it.addr == 0x10;
        it.data == 0x22;
        it.rd_wr == FALSE;
      };
      // call the driver
      //driver.drive_object(mem_base);
    };
  };
};
'>

Driver

Driver drives the base object generated by the transaction generator to the DUT. To do this, it implements the DUT input protocol. Something like this:

<'
unit mem_driver {
  event clk is rise('top.mem_clk') @sim;
  // This method drives the DUT
  drive_mem(mem_base : mem_object)@clk is {
    wait cycle;
    // Drive ce, addr, rd_wr command
    'top.mem_ce' = 1;
    'top.mem_addr' = mem_base.addr;
    'top.mem_rd_wr' = mem_base.rd_wr;
    if (mem_base.rd_wr == FALSE) {
      'top.mem_wr_data' = mem_base.data;
    };
    // Deassert all the driven signals
    wait cycle;
    'top.mem_ce' = 0;
    'top.mem_addr' = 0;
    'top.mem_rd_wr' = 0;
    'top.mem_wr_data' = 0;
  };
};
'>

Input Monitor

The input monitor monitors the input signals to the DUT. Example: in an ethernet switch, each incoming packet is picked up by the input monitor and passed to the checker.

Output Monitor

The output monitor monitors the output signals from the DUT. Example: in an ethernet switch, each outgoing packet from the switch is picked up by the output monitor and passed to the checker.

Checker/Scoreboard

The checker or scoreboard checks whether the output coming from the DUT is correct. In the e language, scoreboards are usually implemented using keyed lists.
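The keyed-list idea can be sketched in Python with a dictionary, which plays the same role of looking up an expected item by key. This is a minimal sketch under assumptions: the class name Scoreboard, its method names, and the choice of the address as the key are all hypothetical, and a real e scoreboard would use a keyed list instead.

```python
class Scoreboard:
    """Expected results keyed by address, checked when the DUT output arrives."""

    def __init__(self):
        self.expected = {}   # address -> expected data (from the input monitor)
        self.errors = []     # human-readable mismatch messages

    def add_expected(self, addr, data):
        self.expected[addr] = data

    def check(self, addr, data):
        """Compare one observed output against the stored expectation."""
        if addr not in self.expected:
            self.errors.append(f"unexpected output at 0x{addr:02x}")
            return False
        want = self.expected.pop(addr)   # each expectation is consumed once
        if want != data:
            self.errors.append(
                f"0x{addr:02x}: expected 0x{want:02x}, got 0x{data:02x}")
            return False
        return True


sb = Scoreboard()
sb.add_expected(0x10, 0x22)      # input monitor saw a write of 0x22 to 0x10
print(sb.check(0x10, 0x22))      # output monitor saw the matching read
print(sb.errors)
```

Anything still left in expected at the end of the test is an output the DUT never produced, which is worth reporting as well.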

TestBench Coding

Testbench coding starts after the testbench architecture document is complete. Typically we start with:

base object
transaction generator
driver
input monitor
output monitor
scoreboard

If the project is big, all the tasks can start at the same time, as many engineers will be working on them.

Test Case Execution

In this phase, the test execution team executes the test cases in priority order. Typically, once the focused test cases pass and some level of random test cases pass, we move to regression. In regression, all the test cases are run with different seeds every time there is a change in the RTL.
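Running every test case with several seeds can be sketched as a simple loop. The Python below is only an illustration of the idea; run_regression and the toy test write_then_read are hypothetical names, and a real regression would launch simulator runs rather than Python functions.

```python
import random


def run_regression(tests, seeds):
    """Run every test with every seed; return the (name, seed) pairs that failed."""
    failures = []
    for name, test in tests.items():
        for seed in seeds:
            rng = random.Random(seed)   # private generator: a failure replays from its seed
            if not test(rng):
                failures.append((name, seed))
    return failures


def write_then_read(rng):
    """Toy test: write a random byte to a dictionary 'memory' and read it back."""
    mem = {}
    addr, data = rng.randrange(256), rng.randrange(256)
    mem[addr] = data
    return mem[addr] == data


print(run_regression({"write_then_read": write_then_read}, seeds=range(5)))
```

Recording the seed with each failure is the important part: rerunning the same test with the same seed reproduces the same stimulus.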

Post Processing

In post processing, code coverage and functional coverage are checked to see whether all the possible DUT functionality is covered.

Code Coverage

Code coverage shows which parts of the RTL are tested, and is thus used as a measure of how well the DUT is verified. Code coverage also shows how good the functional coverage matrix is.

There are many types of code coverage as listed below:

Line Coverage
Branch Coverage
Expression Coverage
Toggle Coverage
FSM Coverage

Line Coverage

Line coverage or block coverage or segment coverage shows how many times each line is executed.

Branch Coverage

Branch coverage shows if all the possible branches of if..else or case statements are reached or not.

Expression Coverage

The gold standard of all coverage types. Expression coverage shows whether all possible legal boolean values of an expression have been reached. Generally, expression coverage of 95% and above for a large design is considered good.
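One common way to picture expression coverage is to record which combinations of an expression's inputs were seen during simulation. The Python below is a minimal sketch under that assumption; the class name ExpressionCoverage is hypothetical, and real coverage tools use their own, often more refined, scoring rules.

```python
from itertools import product


class ExpressionCoverage:
    """Track which input combinations of a boolean expression were observed."""

    def __init__(self, num_inputs):
        self.num_inputs = num_inputs
        self.all_combos = set(product([False, True], repeat=num_inputs))
        self.seen = set()

    def sample(self, *inputs):
        if len(inputs) != self.num_inputs:
            raise ValueError("wrong number of inputs")
        self.seen.add(tuple(bool(x) for x in inputs))

    def percent(self):
        return 100.0 * len(self.seen) / len(self.all_combos)

    def missing(self):
        return sorted(self.all_combos - self.seen)


cov = ExpressionCoverage(2)              # e.g. an expression such as (ce && rd_wr)
for ce, rd_wr in [(1, 0), (1, 1), (1, 1)]:
    cov.sample(ce, rd_wr)
print(cov.percent())                     # 2 of the 4 combinations were seen
print(cov.missing())                     # the combinations with ce == 0
```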

Toggle Coverage

Toggle coverage shows which bits in the RTL have toggled. Toggle coverage is used for power analysis mainly.
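Toggle tracking can be sketched with two bit masks, one for bits seen rising and one for bits seen falling. The Python below is an illustrative sketch only; the class name ToggleCoverage is hypothetical, and it assumes a bit counts as toggled once it has gone both 0 to 1 and 1 to 0.

```python
class ToggleCoverage:
    """Record 0->1 and 1->0 transitions for each bit of a signal."""

    def __init__(self, width):
        self.width = width
        self.mask = (1 << width) - 1
        self.prev = None     # no previous sample yet
        self.rose = 0        # bit mask of bits seen going 0 -> 1
        self.fell = 0        # bit mask of bits seen going 1 -> 0

    def sample(self, value):
        value &= self.mask
        if self.prev is not None:
            self.rose |= ~self.prev & value
            self.fell |= self.prev & ~value
        self.prev = value

    def untoggled_bits(self):
        """Bit positions that have not yet toggled in both directions."""
        full = self.rose & self.fell
        return [i for i in range(self.width) if not (full >> i) & 1]


tog = ToggleCoverage(4)
for value in (0b0000, 0b0011, 0b0001):
    tog.sample(value)
print(tog.untoggled_bits())   # only bit 1 has both risen and fallen so far
```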

FSM Coverage

FSM coverage shows whether all states have been reached and whether all possible state transitions have occurred.
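State and transition tracking can be sketched as two sets that fill up as the state is sampled. The Python below is a minimal illustration; the class name FsmCoverage and the IDLE/READ/WRITE state machine are hypothetical and not taken from any design described here.

```python
class FsmCoverage:
    """Track which states and state transitions were observed."""

    def __init__(self, states, transitions):
        self.states = set(states)
        self.transitions = set(transitions)
        self.seen_states = set()
        self.seen_transitions = set()
        self.current = None

    def sample(self, state):
        self.seen_states.add(state)
        # Staying in the same state is not counted as a transition here.
        if self.current is not None and self.current != state:
            self.seen_transitions.add((self.current, state))
        self.current = state

    def missing(self):
        """Return (unreached states, transitions never taken)."""
        return (self.states - self.seen_states,
                self.transitions - self.seen_transitions)


fsm = FsmCoverage(
    states=["IDLE", "READ", "WRITE"],
    transitions=[("IDLE", "READ"), ("READ", "IDLE"),
                 ("IDLE", "WRITE"), ("WRITE", "IDLE")],
)
for state in ["IDLE", "READ", "IDLE"]:
    fsm.sample(state)
print(fsm.missing())   # WRITE and both transitions involving it are still missing
```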
