CDC Tutorial Slides


Clock Domain Crossing: Issues and Solutions

Udit Kumar, DCG
Sakshi Gupta, TRnD
Bhanu Prakash, DCG

March 12, 2012

[Figure: Clock Domain 1 and Clock Domain 2]

Agenda

- What is Clock Domain Crossing (CDC)?
- The problem and the trend: CDC bugs lead to chip failure, with no software fix.
- What CDC issues look like, and recommendations around them
- Special-case CDC issues: the design is CDC OK but the silicon has issues
- Different flows available (with Atrenta Spyglass), and when to use which
- Appendix

List of Keywords

Keyword            Meaning
CDC                Clock domain crossing
Structural check   Looking into the structure of the design.
Functional check   Applying formal methods to prove the design works as expected. (Kind of simulation!)
Gray code          Encoding where each state change flips only 1 bit.
Spyglass           An EDA tool from Atrenta that performs CDC analysis.
ip_block           Spyglass term, used to check CDC at an IP boundary.
sgdc               Spyglass design constraints.
Abstract model     A CDC model of an IP which can be used at SoC level.

Clock in a Digital System

- A digital system consists of combinational and sequential logic.
- The clock triggers flops, leading to state changes (viz. state machines).
- The clock is the fastest signal in a synchronous system.

[Figure: flops FF1 and FF2 driven by a common clock, Clk]

Main Properties of a Clock

- A clock has a period of repetition, Tp (linked to its frequency).
- Phase depicts the rise and fall transitions.
- A flop can be triggered on either clock phase (+ve or -ve).

[Figure: clock waveform showing the period Tp and the +ve and -ve phases]

Domain of a Clock

- A clock domain is the logic triggered by one clock (or by clocks derived from it). It is also known as a synchronous system.
- Conversely, logic driven by clocks whose relative phase and frequency vary belongs to different clock domains. Such clocks are called asynchronous.

[Figure: a 100 MHz domain and a 27 MHz domain]

Why Have Multiple Clock Domains?

- SoCs have multiple interfaces with very different clock frequencies.
- To reduce clock-tree balancing complexity, the clocks to various blocks can be considered asynchronous, even when they are in fact synchronous.

The CDC Path

- When clocks are asynchronous, the signals that cross between their domains are called clock domain crossing (CDC) paths.
- Within each domain, data transfer is protected by the setup and hold checks of the flops.
- No such check exists on a CDC path.

[Figure: setup/hold checks apply inside each domain; CDC is related to asynchronous clock domains]

The CDC Paradigm

- Heterogeneous applications have dozens of clocks.
- Traditional STA checks and functional simulation do not verify CDC problems!
- Responsibility for validating CDC issues is shifting from verification engineers to logic designers.

Most of you are designing complex chips that go into devices like the Blackberry, Wii, iPhone, HD camcorders, etc. These devices combine heterogeneous functions such as wireless, networking, audio and video, which involve many asynchronous clock domains. Traditional methods like STA and functional simulation do not adequately catch CDC issues, since CDC checking is a manual process, typically done through design/code reviews and by writing test benches for specific protocols.

Another trend we are seeing is that logic designers are now responsible for validating and signing off CDC checks, since they are responsible for adding synchronizers in RTL.

Why Is It Important?

- There have been multiple cases of chip failure due to this effect across the world.
- Past chip failures and their cost: wrong clock connections, and an IP data convergence issue.

[Figure: failure example with signals ctrl_req, soft_reset_active_stbus and ctrl_req_int, and clocks gdp_proc_clk and st_ck]

Clock Domain Crossings: Dos & Don'ts

Metastability

- A flip-flop needs its input to be stable for a window before and after the clock edge (the setup and hold times).
- On a CDC path, setup and hold violations will occur.
- After such a violation, the flip-flop output may take much longer than usual to reach a valid logic level. This is called metastability.

[Figure: data transition very close to the capturing clock edge]

MTBF (Mean Time Between Failures)

- MTBF is the reciprocal of the failure rate.
- It should be as high as possible.
- For a synchronizer, failure means the signal goes metastable after the first-stage flop and is still metastable one cycle later, when it is sampled by the second-stage flop.
- Synchronizer MTBF depends on the synchronizing clock frequency, the data changing frequency, and the duration of the metastable output (1/Tau), which is technology dependent.

Example: How to Calculate MTBF

MTBF (Mean Time Between Failures) is the average time a system runs between failures.

A system has 4000 components, each with a failure rate of 0.02% per 1000 hours. Calculate the MTBF.

Failures per hour = (failure rate) * (number of components) = (0.02 / 100) * (1 / 1000) * 4000 = 8 * 10^-4 per hour

MTBF = 1 / (8 * 10^-4) = 1250 hours
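The arithmetic in this example can be checked with a short Python sketch. The function names and parameters are illustrative and do not come from the slides. The second function rests on an assumption: the slides list the factors that drive synchronizer MTBF but give no formula, so it uses the commonly quoted approximation MTBF = exp(t_resolve / tau) / (t0 * f_clk * f_data), where t0 is a technology-dependent metastability window.

```python
import math


def mtbf_from_failure_rate(num_components, pct_per_1000_hours):
    """Return system MTBF in hours, assuming independent, identical components."""
    # Convert "percent per 1000 hours" into failures per hour per component.
    per_component_per_hour = (pct_per_1000_hours / 100.0) / 1000.0
    system_failures_per_hour = per_component_per_hour * num_components
    return 1.0 / system_failures_per_hour


def synchronizer_mtbf(t_resolve, tau, t0, f_clk, f_data):
    """Assumed approximation (not given in the slides); result in seconds.

    t_resolve: time allowed for metastability to resolve (s)
    tau, t0:   technology-dependent constants (s)
    f_clk:     synchronizing clock frequency (Hz)
    f_data:    data changing frequency (Hz)
    """
    return math.exp(t_resolve / tau) / (t0 * f_clk * f_data)


if __name__ == "__main__":
    # Slide example: 4000 components at 0.02% per 1000 hours.
    print(round(mtbf_from_failure_rate(4000, 0.02)))  # 1250
```

Under this approximation, the exponential term explains why a synchronizer cell with a smaller Tau, or one more cycle of resolution time, improves MTBF far more than a modest reduction in clock or data frequency.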

Clock Crossing: Minimum Solution

"A synchronizer is a device that samples an asynchronous signal and outputs a signal that is synchronized to a local or sample clock" [1].

[Figure: synchronizer built from flops FF1 and FF2, between the asynchronous signal and the synchronized signal]

- The synchronizing cell should come from a special cell library.

Special Cell Properties

- The total number of synchronizer instances in the system contributes to the system MTBF.
- A synchronizer cell has a smaller Tau.

Having a synchronizer is not enough; one needs to follow more rules!

No Combinational Logic at the Crossing Point

- An unconstrained path has delay imbalance, leading to loss of data and glitches.
- Make sure that the CDC signal comes directly from a flop.

Re-convergence after Sync

- Chip killer!
- Can also lead to bad FSM triggering.
- Compute the controls first, and then do a single transfer across the domain.

The re-convergence problem occurs when a group of signals crosses clock domains and then re-converges. Even when these signals are properly synchronized, they can still encounter race conditions. (Typically, these signals form an address bus to a FIFO.) Referring to the slide's example:

- Suppose delay B is greater than delay A.
- Signal X2 will then be captured by clk_B one cycle earlier than Y2.
- Signals X3 and Y3 are clearly out of sync, which will impact downstream logic.
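The race described above can be reproduced with a small timing model. This is a minimal sketch, not the circuit from the slides: it assumes ideal two-flop synchronizers (no metastability), clk_B edges at integer multiples of its period, and hypothetical delays chosen so that a clk_B edge falls between the arrival of the two bits.

```python
def sync_visible_cycle(arrival_ps, clk_b_period_ps, stages=2):
    """clk_B cycle index at which a change first appears at the synchronizer output."""
    # First clk_B edge at or after the arrival time (ceiling division).
    capture_edge = -(-arrival_ps // clk_b_period_ps)
    # Each further synchronizer stage adds one clk_B cycle.
    return capture_edge + stages - 1


def observe(old, new, launch_ps, delays_ps, clk_b_period_ps, cycles):
    """Return the bit tuple seen in the clk_B domain on each cycle.

    All bits change on the same clk_A edge (launch_ps) but travel through
    paths with different delays before reaching their own synchronizers.
    """
    visible = [sync_visible_cycle(launch_ps + d, clk_b_period_ps) for d in delays_ps]
    return [
        tuple(n if cycle >= v else o for o, n, v in zip(old, new, visible))
        for cycle in range(cycles)
    ]


if __name__ == "__main__":
    # (X, Y) goes from (0, 1) to (1, 0); delay B (700 ps) > delay A (300 ps).
    seq = observe(old=(0, 1), new=(1, 0), launch_ps=9500,
                  delays_ps=(300, 700), clk_b_period_ps=10000, cycles=5)
    print(seq)  # [(0, 1), (0, 1), (1, 1), (1, 0), (1, 0)]
```

For one cycle the receiving domain sees (1, 1), a value the sender never produced. This is why the slide recommends computing the controls first and sending a single signal across the domain.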

Crossing from a Fast to a Slow Domain

- Data hold problem: the signal crosses from a fast clock domain to a slow clock domain.
- To avoid CDC issues, hold the data until a time-out (using pulse extenders).
- Hold the data for a minimum of 3 RX clock edges, until the transfer takes place (like traffic police).

[Figure: pulse extender in which a counter clocked by Clk1 holds Q1 until the count equals N, before capture by clk2]

The data hold problem occurs when a signal crosses from a fast clock domain to a slower clock domain. In the waveform diagram, signal A is held for only one period of domain A, so the slower clock, clk_B, will miss it.
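A rough way to size the pulse extender is to count how many slow-clock edges fall inside the pulse. The sketch below assumes ideal clocks with slow edges at integer multiples of the slow period and all times in picoseconds. The 100 MHz and 27 MHz figures are borrowed from the earlier domain example, and the function names are illustrative.

```python
def slow_edges_seeing_pulse(start_ps, width_ps, slow_period_ps):
    """Count slow-clock edges that fall inside the pulse [start, start + width)."""
    # First slow edge at or after the start of the pulse (ceiling division).
    edge = -(-start_ps // slow_period_ps) * slow_period_ps
    end = start_ps + width_ps
    count = 0
    while edge < end:
        count += 1
        edge += slow_period_ps
    return count


def min_hold_cycles(fast_period_ps, slow_period_ps, rx_edges=3):
    """Fast-clock cycles to hold the data so that at least rx_edges slow edges see it."""
    # A window of rx_edges slow periods always contains rx_edges slow edges.
    return -(-(rx_edges * slow_period_ps) // fast_period_ps)


if __name__ == "__main__":
    fast, slow = 10000, 37037  # about 100 MHz and 27 MHz, in ps
    print(slow_edges_seeing_pulse(5000, fast, slow))      # 0: one-cycle pulse is missed
    n = min_hold_cycles(fast, slow)                       # 12 fast cycles
    print(n, slow_edges_seeing_pulse(5000, n * fast, slow))  # 12 3
```

The value returned by `min_hold_cycles` plays the role of N in the counter comparison on the slide; a real design would add margin for synchronizer latency and clock uncertainty.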

Handshake-Based Data Transfer

[Figure: handshake between the clk_A domain and the clk_B domain]

- In a handshake, the control path is synchronized and the data path follows.
- Logic in the control path ensures that the transfer on the data bus is coherent.
- Disadvantage: the delay in synchronizing the control signals (in both directions) reduces throughput.

Bulk Data Transfer Using a FIFO

[Figure: FIFO between the clk_A domain and the clk_B domain]

- When throughput is important, FIFO-based synchronizers fit well.
- Control bits transferred across the domains should be Gray coded.
- Flow control uses the FIFO's overflow/underflow.
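The reason for Gray coding the control bits (for example, FIFO pointers) can be shown with the standard binary-to-Gray conversion. This is an illustrative sketch: the 4-bit pointer width and the helper names are assumptions, not details from the slides.

```python
def bin_to_gray(n):
    """Convert a binary count to its reflected Gray code."""
    return n ^ (n >> 1)


def gray_to_bin(g):
    """Convert a reflected Gray code back to the binary count."""
    n = 0
    while g:
        n ^= g
        g >>= 1
    return n


def bits_changed(a, b):
    """Number of bit positions that differ between a and b."""
    return bin(a ^ b).count("1")


if __name__ == "__main__":
    width = 4  # hypothetical pointer width
    size = 1 << width
    for i in range(size):
        nxt = (i + 1) % size  # includes the wrap-around from 15 to 0
        print(i, bits_changed(i, nxt), bits_changed(bin_to_gray(i), bin_to_gray(nxt)))
```

In binary, the step from 7 to 8 flips four bits at once, so a receiver sampling mid-transition could see any mix of old and new bits. In Gray code every step, including the wrap-around, flips exactly one bit, so the sampled pointer is always either the old or the new value.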

Sometimes more sophisticated synchronization schemes are employed:

- A handshake synchronization scheme is good for portability, IP, and reuse.
- A FIFO synchronization scheme is good for high-volume data transfer.

Types of CDC Checks

- Structural checks look for the presence of corrective logic (viz. a synchronizer) at the crossing.
- Functional checks formally verify that the protection is error free.

Structural CDC:

1. Missing or incorrect synchronizer.
2. Incorrectly implemented CDC protocols.
3. Re-convergence issues.

Functional CDC (assertion based)

Why Functional CDC Checks?

They validate, by looking at the hardware, questions such as:

- Is the FIFO being written while it is in overflow?
- Is the handshaking scheme functionally incorrect?
- Does the Gray-code behavior break?

Not-So-Obvious Silicon Issues

- CDC is OK on RTL. Can we assume the silicon will work?

Multi-bit Data Bus CDC Issue, Due to Physical Implementation

- Logic in the serial (control) path is used to ensure that the transfer on the multi-bit (data) path is coherent.
- The loop delay of the control path should be less than the stable time of the data.

Physical View

[Figure: path delays of 1 ns and 100 ns against a capture period of 5 ns]

- Huge delay imbalance: the data path is severely skewed because it has no constraints, and also because of physical constraints.
- This is not checked in the flow.

Constraints for Such Paths

[Figure: path delays of 1 ns and 100 ns against a capture period of 5 ns, with a skew limit for the bus]

- The data bus needs to be constrained even though the transfer is on an asynchronous interface.
- Set a skew limit for the bus.

Related Issues

- Shoot-through within a clock domain.
- Clock assignment leading to shoot-through: any assignment in the clock path.

[Figure: shoot-through circuit with labels data_in, data_int, dout_out, clk, clock and clock2]

- VHDL: due to delta delay, shoot-through may occur if the capture clock is delayed.
- Shoot-through at the IP.
- In Verilog, a non-blocking assignment can trigger new events.

How to Solve the Shoot-Through Problem

- Silicon behavior is no longer the same as RTL simulation.
- The root cause is a change in the scheduling of events in an RTL simulator.

How to avoid it? Force event scheduling at such crossings with an explicit delay (X_delayed).