Data transfer between asynchronous clock domains

SNUG Europe 2000, Palais des Congrès, Paris, France


Data Transfer between Asynchronous Clock Domains without Pain

Research Institute for Integrated Circuits

Markus Schutti, Markus Pfaff, Richard Hagelauer
Freistädter Str. 315, A-4040 Linz, Austria, email: [email protected]

Abstract

A fully synchronous design style with a single global clock is brilliant, but not always possible. Every designer suffers from time to time from the burden caused by an implementation requiring multiple clock domains that are inherently asynchronous. Synthesis is possibly not the main barrier, but recurrent hurdles during simulation, verification, timing analysis, and (scan) test integration could jeopardize the entire design.

This article describes a design style developed for circuits with extensive data transfer between asynchronous clock domains, achieving an error-free and even warning-free synthesis flow. As a result, the entire design can be examined properly by the (built-in) static timing analysis of Design Compiler, and scan test insertion can be done in a straightforward manner by Test Compiler without any tricks.

Of course the proposed design style does not utilize the common “3-stage shift-register” synchronizing mechanism! Instead, a more refined interface circuitry is presented, improving system robustness and cutting down implementation effort. The cost of a data transfer from the slower clock domain to the faster one is a simple 2:1 multiplexer, and in the reverse direction a 2:1 multiplexer plus one D-type flip-flop: certainly less hardware effort than any other “synchronization” scheme.

The theory of this approach, typical clock/data waveforms, and the timing calculation will be shown and discussed. The proposed design style has been successfully employed on a 25k gate design with two different clock domains (using partial shutdown of the clock branches), an embedded 8051 core, and several on-chip memories.

Introduction

ASICs developed nowadays often require different clocks. SoC (System-on-a-Chip) designs are inherently built of macros and IP cores, delivered by distinct suppliers, using separate clock domains. It must be assumed that these clock domains are asynchronous, having neither a common clock base nor an even-numbered ratio of their frequencies. The specification of such an ASIC might comprise a microcontroller running at 10 MHz, supported by a special DSP core clocked at 85 MHz, communicating with a USB interface (12 MHz) and some peripheral parallel ports clocked by external components at 1.1 MHz, and not to forget the real time clock operating at 32,768 Hz.



[Figure 1 diagram: Clock Domain A (Clock A) and Clock Domain B (Clock B) exchange data bitwise; the two clock signals are completely asynchronous.]

Figure 1. Data Transfer between asynchronous clock domains.

In this paper we want to limit the problem to only two different clock domains as shown in Figure 1. However, the proposed interface circuitry can be extended to an arbitrary number of clocks. Data transfer takes place in both directions. The term “Data Transfer” is valid for bitwise or bit-parallel data without any underlying protocol.

The Problem

Complex digital circuits (implemented in an HDL and synthesized by a synthesis tool like the Design Compiler) obey the synchronous-sequential design style. Such designs can be processed automatically; timing paths are examined by the built-in static timing analyzer. But at the junction of clock domains this design style is violated, resulting in a potential cause for malfunction when a signal has to be captured in a “foreign” clock domain.

[Figure 2 diagram: flip-flop Data in Clock Domain A, clocked by Clock A, drives combinational logic whose output is captured by flip-flop Sink in Clock Domain B, clocked by Clock B.]

Figure 2. The conflict-causing path from flip-flop “Data” to flip-flop “Sink”.

Figure 2 depicts such a conflict-causing signal path in detail. The path starts at a flip-flop sensitive to Clock A, passes some combinational logic (consisting of simple non-sequential logic elements) with further signal inputs, and ends at a flip-flop sensitive to Clock B.

The circuit of Figure 2 is examined in detail through the waveforms given in Figure 3. Waveforms Clock A and Clock B represent the clock signals of flip-flops Data and Sink respectively. The content of flip-flop Data is loaded each and every cycle. This signal has to be latched into the target flip-flop.


[Figure 3 waveforms: Clock A, Data, Clock B, Sink; setup time violations at Sink are marked and lead to metastability.]

Figure 3. Setup (and Hold) Time Violation at flip-flop “Sink” results in Metastability. The shaded areas mark time zones where no data change is allowed.

Since frequency and phase of Clock A and B are not related, the setup (and hold)¹ time of the target flip-flop (here: Sink) will be violated from time to time. The Setup Time Condition is violated when data arriving at the D input of the flip-flop changes too close to the rising (active) edge of the clock (in Figure 3: Clock B). A Setup Time Violation causes metastability at the flip-flop. Metastability manifests itself either as hyperactivity (oscillation) or as an invalid logic level at output Q.²

Metastability must be avoided. Metastable elements cause an increase in power consumption, diminish system reliability, and are the reason for malfunction.
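How often such a violation can occur may be estimated with a short sketch. The Python fragment below is an illustration only and not part of the original design flow: the clock periods (roughly 85 MHz and 12 MHz, borrowed from the example in the introduction), the 500 ps setup window, and the function name `count_setup_violations` are assumptions, and clock-to-output as well as combinational delays are ignored.

```python
def count_setup_violations(t_a_ps, t_b_ps, t_setup_ps, n_edges):
    """Count Clock B edges that fall less than t_setup_ps after a Data change.

    Assumption: Data changes exactly at every rising edge of Clock A
    (edges at 0, t_a_ps, 2*t_a_ps, ...); all times are integer picoseconds.
    """
    violations = 0
    for k in range(n_edges):
        t_edge_b = k * t_b_ps              # k-th rising edge of Clock B
        since_change = t_edge_b % t_a_ps   # time since the last Data change
        if since_change < t_setup_ps:      # Data changed inside the setup window
            violations += 1
    return violations


# Hypothetical numbers: Clock A ~85 MHz, Clock B ~12 MHz, 500 ps setup time.
T_A_PS, T_B_PS, T_SETUP_PS = 11765, 83333, 500
edges = T_A_PS                             # one full beat pattern of both clocks
hits = count_setup_violations(T_A_PS, T_B_PS, T_SETUP_PS, edges)
print(f"{hits} of {edges} Clock B edges violate setup ({100 * hits / edges:.2f} %)")
```

Under these assumptions the fraction of endangered capture edges is roughly the setup time divided by the period of Clock A, so violations are rare per edge but unavoidable over time.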

Standard Approach

The common way of solving this problem is to use a series of D-type flip-flops (typically three), similar to a shift-register. Using a chain of flip-flops reduces the probability of metastability.

[Figure 4 diagram: every signal crossing between Clock Domain A (Clock A) and Clock Domain B (Clock B) passes through its own chain of three flip-flops (FF).]

Figure 4. Data Synchronization by shift-register structures.

This means each data path going from one clock domain to another has to be provided with a 3-stage shift register sensitive to the rising edge of the clock of the target clock domain. As illustrated in Figure 4, this approach is feasible but not optimal: in terms of hardware a 3-stage shift register is expensive and introduces a large delay.
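The delay introduced by such a chain can be seen in a small behavioural sketch. The class name `ShiftRegisterSynchronizer` is hypothetical, and the model is purely logical: it shows the latency of the chain, not metastability itself.

```python
class ShiftRegisterSynchronizer:
    """Behavioural model of an n-stage shift register clocked by the target clock."""

    def __init__(self, n_stages=3):
        self.stages = [0] * n_stages

    def clock_edge(self, async_input):
        """Apply one rising edge of the target clock and return the last stage."""
        # Every stage takes over the value of its predecessor.
        self.stages = [async_input] + self.stages[:-1]
        return self.stages[-1]


sync = ShiftRegisterSynchronizer(n_stages=3)
outputs = [sync.clock_edge(bit) for bit in [1, 1, 1, 0, 0, 0]]
print(outputs)  # the input pattern reappears delayed by the register depth
```

One such chain per transferred signal is what makes the standard approach costly in both area and latency.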

Smart Approach

The aim of the proposed circuitry is to cut down the hardware effort of the clock interface. To accomplish this objective, the two clocks have to be distinguished into one clock running at a faster and one clock running at a slower frequency. Thus we will use the terms ClockF and ClockS (instead of A and B).

¹ The following text deals only with the setup time violation since most of the flip-flops used today have a hold time of zero. However, this simplification imposes no restriction because the proposed interface circuitry addresses a larger time zone around the active edge of the clock.
² For better characterization, metastability is presented in the timing diagrams as oscillation.


Figure 5 gives an example with ClockF about 8 times faster than ClockS. In a first step we will focus only on the data transfer (DataS) from the slow clock domain to the fast clock domain. Hence signal DataS has its source at a flip-flop sensitive to ClockS and has to be captured at a flip-flop sensitive to ClockF, also passing some combinational logic on its path.

[Figure 5 waveforms: ClockF, ClockS, DataS, SyncClkS(1), SyncClkS(2), SyncClkS(3) = SyncSignal, DataSinkS; annotations mark the setup time violation zones and the points where DataS is captured in the ClockF domain.]

Figure 5. Data Transfer from Slow Clock to Fast Clock Domain.

The shaded time zones mark those areas where data capture must be suspended to avoid a timing violation at the target flip-flop. Data capture (at the target flip-flop) is not allowed in the vicinity of the rising edge of ClockS (because a stable data signal cannot be expected during this period).

• The time immediately before the rising edge of ClockS cannot be determined: it is not possible to look into the future. Advanced means (e.g. an additional phase-shifted clock signal, a timer, a PLL) are out of the scope of this article.

• The time immediately after the rising edge of ClockS can be detected by synchronizing the signal waveform of ClockS into ClockF. Using the above-mentioned 3-stage shift-register for this purpose, we get signal SyncSignal as shown in Figure 5 (signals SyncClkS(1..3) being the three stages of the shift-register). Consequently the high period of SyncSignal represents the delayed high period of ClockS.

In other words: data capture can be enabled during the delayed high period³ of ClockS. During the delayed high period of ClockS no signal change has to be expected on a data line coming from that clock domain. Thus it is safe to strobe data into a flip-flop triggered by a foreign clock (ClockF), provided that this clock produces at least some active clock edges during this period of time. This condition is guaranteed as long as the timing requirement of section Clock Ratio is met.

Time Window

It follows that the high period of SyncSignal can be used as a valid “time window” to capture data into a foreign clock domain. DataS is guaranteed to be stable for the period of this time window.

The SyncSignal is generated by the standard 3-stage shift-register structure as shown in Figure 6. Please note that this signal is generated only once: globally for the entire ASIC or, to be more precise, once for each junction of clock domains.
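The position of the time window relative to ClockS can be illustrated with a short sketch. The function name `sync_signal_rise_delay`, the normalized period `t_fast = 1.0`, and the idealized sampling (no metastability in the first stage) are assumptions made for illustration.

```python
def sync_signal_rise_delay(phase, t_fast=1.0, n_stages=3):
    """Delay from a rising edge of ClockS (at time `phase`) to the rising
    edge of SyncSignal, with ClockF rising edges at 0, t_fast, 2*t_fast, ...

    Assumption: ClockS stays high long enough to travel through all stages.
    """
    stages = [0] * n_stages
    k = 0
    while True:
        t = k * t_fast                       # rising edge of ClockF
        clock_s = 1 if t >= phase else 0     # level of ClockS sampled at this edge
        stages = [clock_s] + stages[:-1]     # shift by one stage
        if stages[-1]:                       # last stage is SyncSignal
            return t - phase
        k += 1


for phase in (0.01, 0.25, 0.5, 0.99):
    print(f"phase {phase:.2f}: SyncSignal rises {sync_signal_rise_delay(phase):.2f} T_FAST later")
```

In this idealized model the window opens between n-1 and n periods of ClockF after the rising edge of ClockS, depending on the phase relation of the two clocks.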

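[Figure 6 diagram: three D-type flip-flops FFSYNC1, FFSYNC2, and FFSYNC3 connected in series, clocked by Clock F and cleared by Reset; Clock S drives the D input of FFSYNC1, and the output of FFSYNC3 is SyncSignal.]

Figure 6. Generation of SyncSignal.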

³ The high period is the first half of a clock cycle.


Some extra delay caused by combinational cells on the path from DataS to the target flip-flop DataSinkS is also taken into account. This delay has to be smaller than n times the clock period of ClockF (with n being the number of stages of the shift-register):

$$t_{\text{combinational delay}} < n \cdot T_{FAST}$$

If this time period is too small, the number of stages n of the shift-register generating SyncSignal can be increased.
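As a small worked example of this condition, the following sketch returns the smallest number of stages that satisfies the inequality. The helper name `min_sync_stages` and the numeric values are hypothetical; the sketch covers only the delay condition stated above and says nothing about how many stages are advisable for other reasons.

```python
import math


def min_sync_stages(t_comb_delay, t_fast):
    """Smallest integer n with t_comb_delay < n * t_fast (at least one stage)."""
    return max(1, math.floor(t_comb_delay / t_fast) + 1)


# Hypothetical values in picoseconds: ClockF period 11765 ps (about 85 MHz).
for delay_ps in (5000, 25000, 40000):
    n = min_sync_stages(delay_ps, 11765)
    print(f"combinational delay {delay_ps} ps -> at least n = {n} stages")
```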

Interface Circuitry

The “gate” on the signal path from DataS to DataSinkS should be opened during the time window (when SyncSignal is ‘1’) and otherwise be closed. Clearly this “gate” can be implemented as a simple multiplexer as shown in Figure 7 or as the Enable input of the target flip-flop as shown in Figure 8.

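[Figure 7 diagram: the source flip-flop (ClockS, slow clock) drives DataS through a combinational delay to input 1 of a 2:1 multiplexer in the interface; input 0 comes from the Clock F domain; SyncSignal is the select input; the multiplexer output feeds the D input of flip-flop DataSinkS (ClockF, fast clock).]

Figure 7. Interface Circuitry for Data Transfer from the Slow to the Fast Clock Domain using a 2:1 multiplexer.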

The circuitry of Figure 7 allows the target flip-flop to be loaded with data from its own clock domain (e.g. to clear or to set the flip-flop), whereas the simpler circuitry of Figure 8 allows only the straightforward data capture.

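[Figure 8 diagram: the source flip-flop (ClockS, slow clock) drives DataS through a combinational delay to the D input of flip-flop DataSinkS (ClockF, fast clock), whose Enable input is driven by SyncSignal.]

Figure 8. Interface Circuitry for Data Transfer from the Slow to the Fast Clock Domain using a flip-flop with Enable.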
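The behaviour of both variants can be summarized in a minimal behavioural sketch. The class `GatedCaptureFlipFlop` and the optional `local_data` argument are hypothetical names introduced for illustration; the model is cycle-based and ignores all delays.

```python
class GatedCaptureFlipFlop:
    """Target flip-flop DataSinkS in the ClockF domain with a gate at its input."""

    def __init__(self):
        self.q = 0

    def clock_f_edge(self, data_s, sync_signal, local_data=None):
        """Apply one rising edge of ClockF.

        sync_signal == 1: the gate is open and the foreign signal DataS is captured.
        sync_signal == 0: the gate is closed; local_data (multiplexer variant)
        is loaded if given, otherwise the old value is kept (Enable variant).
        """
        if sync_signal:
            self.q = data_s
        elif local_data is not None:
            self.q = local_data
        return self.q


ff = GatedCaptureFlipFlop()
trace = [ff.clock_f_edge(data_s=d, sync_signal=s)
         for d, s in [(5, 0), (5, 1), (7, 1), (9, 0), (9, 0)]]
print(trace)  # DataS is only taken over while SyncSignal is 1
```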

Data Transfer from the Fast to the Slow Clock Domain

Until now we have covered the data transfer from the slow to the fast clock domain. As we will see, the reverse direction is somewhat more difficult to handle. The source data comes from a flip-flop sensitive to ClockF. Thus a data change has to be expected in each clock cycle of the fast running clock. That happens several times during the period of the slow running clock. Furthermore, it is almost certain that from time to time a data change will occur exactly at the rising edge of ClockS, resulting in metastability at the target flip-flop DataSinkF.


[Figure 9 waveforms: ClockF, DataF, ClockS, SyncClkS(1), SyncClkS(2), DataF locked, DataSinkF; annotations mark the points where DataF is captured in the ClockS domain.]

Figure 9. Data Transfer from Fast Clock to Slow Clock Domain.

The shaded time zones of Figure 9 indicate those areas where the data to be captured at the target flip-flop DataSinkF has to be stable. Obviously this condition is not true for the source signal DataF.

If the design engineer cannot ensure that DataF is not modified (not loaded with new data) during the shaded time zone (as is assumed here), we need an additional flip-flop to freeze the data signal temporarily. The data signal has to be frozen in the region of the rising (active) edge of ClockS. This time period corresponds to the delayed low period of ClockS, covering the shaded time zones of Figure 9. The delayed low period of ClockS is the same as the low period of the above-introduced SyncSignal.

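[Figure 10 diagram: DataF from the source flip-flop (Clock F, fast clock) enters a 2:1 multiplexer in the interface that is controlled by SyncSignal; the multiplexer feeds an interface flip-flop clocked by Clock F, whose output (“Data F locked”) drives the target flip-flop DataSinkF clocked by Clock S (slow clock). The locked data does not change during the (delayed) low period of Clock S.]

Figure 10. Interface Circuitry for Data Transfer from the Fast to the Slow Clock Domain.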

Therefore DataF is latched into a flip-flop sensitive to ClockF and locked there during the low period of SyncSignal. The output of this flip-flop is used to deliver the data signal safely to the target flip-flop DataSinkF. It is exactly on this path that the crossing of the two clock domains takes place. However, this crossing is safe since the interface circuitry guarantees a stable data signal at the rising edge of ClockS.
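The locking mechanism can be sketched as a cycle-based behavioural model. The class `FastToSlowInterface` and its method names are hypothetical; the caller is assumed to apply the clock edges in their real order, and all delays are ignored.

```python
class FastToSlowInterface:
    """Interface flip-flop (ClockF) plus target flip-flop DataSinkF (ClockS)."""

    def __init__(self):
        self.locked = 0   # interface flip-flop, clocked by ClockF
        self.sink = 0     # target flip-flop DataSinkF, clocked by ClockS

    def clock_f_edge(self, data_f, sync_signal):
        """Rising edge of ClockF: follow DataF while SyncSignal is 1, else hold."""
        if sync_signal:
            self.locked = data_f
        return self.locked

    def clock_s_edge(self):
        """Rising edge of ClockS: capture the locked, and therefore stable, value."""
        self.sink = self.locked
        return self.sink


iface = FastToSlowInterface()
# DataF changes every ClockF cycle; SyncSignal drops before the next ClockS edge.
for data_f, sync_signal in [(10, 1), (11, 1), (12, 0), (13, 0)]:
    iface.clock_f_edge(data_f, sync_signal)
print(iface.clock_s_edge())  # value that was locked before SyncSignal went low
```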

Clock Ratio

The presented interface circuitry makes a clear distinction between both transfer directions involved. The interface circuitry works as expected, provided that ClockS runs somewhat slower than ClockF!

The following timing calculation is executed separately for both transfer directions. n is a placeholder for the number of stages of the shift-register generating SyncSignal (in the examples above n is 3). A duty cycle of 50% to 50% (for the high and low period) of ClockS and ClockF is assumed.


Fast to Slow

In the worst case, SyncSignal rises to high n cycles of ClockF after the rising edge of ClockS. Then it takes the high period of SyncSignal until DataF is locked in the interface flip-flop. Finally, some combinational delay (or wire delay) and the setup time of the target flip-flop have to be considered before reaching the next rising edge of ClockS, where stable data is required to ensure safe data strobing. Thus we can say:

$$n \cdot T_{FAST} + \frac{T_{SLOW}}{2} + t_{\text{combinational delay}} + t_{setup} < T_{SLOW}$$

Assuming that the combinational delay together with the setup time is a value in the range of the clock cycle time of ClockF,

$$t_{\text{combinational delay}} + t_{setup} \approx T_{FAST}$$

we can simplify the formula and get

$$T_{SLOW} > 2 \cdot (n + 1) \cdot T_{FAST}$$

That means the frequency of ClockS must be at least 8 times⁴ lower than the frequency of ClockF (as long as a duty cycle of 1:1 is guaranteed).

$$8 \cdot f_{SLOW} < f_{FAST} \quad \text{with } n = 3$$
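The inequality can be evaluated directly for concrete numbers. The sketch below uses hypothetical function names and example values in nanoseconds; it implements the unsimplified condition for a 50% duty cycle, so the delay and setup terms can be chosen freely.

```python
def min_slow_period(n_stages, t_fast, t_comb_delay, t_setup):
    """Lower bound for T_SLOW, obtained by solving
    n*T_FAST + T_SLOW/2 + t_comb_delay + t_setup < T_SLOW for T_SLOW."""
    return 2 * (n_stages * t_fast + t_comb_delay + t_setup)


def transfer_is_safe(t_slow, n_stages, t_fast, t_comb_delay, t_setup):
    """True if the period of ClockS is long enough for the interface circuitry."""
    return t_slow > min_slow_period(n_stages, t_fast, t_comb_delay, t_setup)


# Hypothetical example: T_FAST = 10 ns, n = 3, delay + setup = 8 ns + 2 ns = T_FAST.
bound = min_slow_period(n_stages=3, t_fast=10, t_comb_delay=8, t_setup=2)
print(f"T_SLOW must exceed {bound} ns, i.e. {bound // 10} periods of ClockF")
print(transfer_is_safe(81, 3, 10, 8, 2), transfer_is_safe(43, 3, 10, 8, 2))
```

With delay plus setup time equal to one period of ClockF, the bound reduces to the factor 2·(n+1) derived above.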

Slow to Fast

For the data transfer from the slow to the fast clock domain we can argue very similarly. In the worst case, SyncSignal rises to high n cycles of ClockF after the rising edge of ClockS. Then it takes the high period of SyncSignal until the “multiplexer gate” is closed again. Finally, some combinational delay (or wire delay) and the setup time of the target flip-flop have to be considered before we reach the next rising edge of ClockS, where we require stable data to ensure safe data strobing. Thus we have exactly the same calculation as before.

The given formula is the upper limit of the frequency of the slower clock. The interface circuitry imposes no lower limit. Likewise, there is no absolute boundary for the frequency; only the ratio of both frequencies is the crucial factor.

Clock Ratio Scenarios

The following screenshots show the results of some simulation runs with different clock ratios. Data transfer is performed in both directions. The data type is an integer number. The source data is updated each clock cycle with an increasing integer number.

Figure 11. Clock Ratio of 1:31.1

⁴ With n = 3.


The simulation of Figure 11 was done with a clock ratio of 1 : 31.1. The transfer of data packets 568 and 600 (from the fast to the slow clock domain) is visible in the middle of the waveform. In the other direction data packets 115, 116, and 117 are transferred.


The simulation of Figure 12 was done with a clock ratio of 1 : 8.1, near the limit of the operating conditions. This can be seen in the transfer of data packet 961 (from the fast to the slow clock domain). The transfer process was executed successfully. But if data packet 961 had been locked only one clock cycle of ClockF later, metastability would have occurred at the target flip-flop: the next rising edge of ClockF is too near to the rising edge of ClockS!


The simulation of Figure 13 was done with a clock ratio of 1 : 4.3, a ratio far outside the operating conditions. The transfer of data packets 583, 600, and 617 (from the fast to the slow clock domain) fails because the data is not locked in the interface flip-flop. The content of the interface flip-flop is changed near the rising edge of ClockS, causing metastability in the target flip-flop DataSinkF (illustrated as a fast oscillating or black waveform in the window).

The same happens in the other transfer direction (from the slow to the fast clock domain). The transfer of data packet 210 fails due to SyncSignal still being ‘1’ near the rising edge of ClockF. That means the “multiplexer gate” on the transfer path is not closed in time, causing metastability in the target flip-flop DataSinkS.
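The three scenarios can be approximated for the fast-to-slow direction with a rough timing sketch. It is not the simulation that produced the screenshots: the function name `min_capture_margin`, the normalized period `t_fast = 1.0`, the phase offset 0.37, and the budget of one ClockF period for combinational delay plus setup time are assumptions, and metastability in the SyncSignal shift-register is not modelled.

```python
import math


def min_capture_margin(ratio, n_stages=3, t_fast=1.0, phase=0.37, slow_cycles=200):
    """Smallest time between the last update of the interface flip-flop and
    the following rising edge of ClockS (fast-to-slow direction)."""
    t_slow = ratio * t_fast
    sync = [0] * n_stages            # shift register; sync[-1] is SyncSignal
    last_update = -math.inf          # time of the latest interface flip-flop update
    next_rise = phase + t_slow       # next rising edge of ClockS to be checked
    margins = []
    k = 0
    while len(margins) < slow_cycles:
        t = k * t_fast               # rising edge of ClockF
        while next_rise <= t:        # ClockS edges passed since the previous ClockF edge
            margins.append(next_rise - last_update)
            next_rise += t_slow
        if sync[-1]:                 # SyncSignal high before the edge: DataF is loaded
            last_update = t
        clock_s = 1 if ((t - phase) % t_slow) < t_slow / 2 else 0
        sync = [clock_s] + sync[:-1] # sample ClockS and shift
        k += 1
    return min(margins)


for ratio in (31.1, 8.1, 4.3):
    margin = min_capture_margin(ratio)
    verdict = "ok" if margin > 1.0 else "too close"
    print(f"ratio 1:{ratio}: smallest margin {margin:.2f} T_FAST -> {verdict}")
```

Under these assumptions the margin stays above one period of ClockF for the ratios 1 : 31.1 and 1 : 8.1 and drops below it for 1 : 4.3, in line with the behaviour described for Figures 11 to 13.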

Test Issues

Scan test insertion is almost mandatory for complex digital ASICs. Having multiple clock domains is always risky when doing scan insertion and test pattern generation. Fortunately the presented interface circuitry can be recognized properly by the Test Compiler. ATPG is also no hurdle. It is possible to arrange separate scan clocks for the different clock domains, and it is also possible to use a common scan clock for all clock domains together. When using different scan clocks, the test designer should take care of the test variable test_capture_clock_skew and the command set_scan_configuration -clock_mixing.

test_capture_clock_skew = no_skew | small_skew | large_skew

... can be used to avoid creating unreliable capture conditions and thus to prevent Test Compiler ATPG from generating invalid vectors. The appropriate setting of the variable can be determined only when the timing of the test clocks is known.

set_scan_configuration -clock_mixing no_mix | mix_edges | mix_clocks

... can be used to specify whether insert_scan includes cells from different clock domains in the same scan chain. If set to no_mix, scan chains inserted by insert_scan can contain only cells clocked by the same clock edge. If set to mix_edges, scan chains can contain cells clocked by different edges of the same clock. If set to mix_clocks, scan chains can contain cells clocked by different clocks.

Summary

The presented interface circuitry can be seen as a smart substitute for the standard n-stage shift-register synchronization approach. In terms of hardware the proposed interface is a significant simplification. Assume an ASIC with about 100 transfer signals (e.g. two 32-bit data buses, one 32-bit address bus, and some more control signals) from one clock domain to another. The standard approach would normally cause a hardware effort of 100 × 3 × 10 = 3,000 gates (using a 3-stage shift-register and calculating with approx. 10 gates per (scan) flip-flop). Compared to that, the proposed interface circuitry requires only 100 × 3 = 300 gates for the direction from the slow to the fast clock domain, or 100 × (3 + 10) = 1,300 gates for the direction from the fast to the slow clock domain (calculating with approx. 3 gates per 2:1 multiplexer). The overhead for generating SyncSignal, a 3-stage shift-register (3 × 10 = 30 gates), is not significant.
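The estimate can be repeated for other signal counts with a few lines of Python. The function name `gate_estimates` and the dictionary keys are hypothetical; the default values are the approximate figures used in the text.

```python
def gate_estimates(n_signals=100, n_stages=3, gates_per_ff=10, gates_per_mux=3):
    """Rough gate counts for the standard and the proposed interface."""
    return {
        # standard approach: one n-stage shift register per transferred signal
        "shift_register_per_signal": n_signals * n_stages * gates_per_ff,
        # proposed, slow to fast: one 2:1 multiplexer per signal
        "slow_to_fast": n_signals * gates_per_mux,
        # proposed, fast to slow: one multiplexer plus one flip-flop per signal
        "fast_to_slow": n_signals * (gates_per_mux + gates_per_ff),
        # one shift register generating SyncSignal per junction of clock domains
        "sync_signal_overhead": n_stages * gates_per_ff,
    }


for name, gates in gate_estimates().items():
    print(f"{name}: {gates} gates")
```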

[Figure 14 diagram: Clock Domain A (faster, Clock A) and Clock Domain B (slower, Clock B) exchange data through interface circuits controlled by a single SyncSignal.]

Figure 14. Data Synchronization by SyncSignal. Less hardware effort compared to the “3-stage shift-register” standard approach.

Another benefit of the proposed interface circuitry is the better latency when transferring data to the slow clock domain. The transferred data is available in the slow clock domain not later than one and a half clock cycles of ClockS after new data has been issued in the fast clock domain. The old-fashioned shift-register circuitry would have a latency of n to n+1 cycles of ClockS.

Moreover, the interface circuitry is easy to implement, can be applied for a wide range of frequency ratios, cuts down the possibility of metastability to a single point (at the shift-register generating SyncSignal) and thus improves system robustness, and saves power and area.


