An Adjoint Sensitivity Technique for Dynamic Neural-Network Modeling and Design of High-Speed Interconnect

Yi Cao,1 Jianjun Xu,1 Vijaya K. Devabhaktuni,1 Runtao Ding,2 Qi-Jun Zhang1

1 Department of Electronics, Carleton University, Ottawa, Ontario K1S 5B6, Canada
2 School of Electronics and Information Engineering, Tianjin University, Tianjin, 300072, China

Received 12 May 2005; accepted 4 August 2005

ABSTRACT: In this article, we develop an adjoint dynamic neural network (ADNN) technique aimed at enhancing computer-aided design (CAD) of high-speed VLSI modules. A novel formulation for exact sensitivities is achieved by defining an adjoint of a dynamic neural network (DNN). We further present an in-depth description of how our ADNN is computationally linked with the original DNN in the transient-simulation environment in order to improve the efficiency of solving the ADNN. Using ADNN-enabled sensitivities, we develop a new training algorithm that facilitates DNN learning of nonlinear transients directly from continuous time-domain waveform data. The proposed algorithm is also expanded to enable physics-based nonlinear circuit CAD through faster sensitivity computations. Applications of our ADNN approach in transient modeling and circuit design are demonstrated by the examples of modeling physics-based high-speed interconnect drivers and gradient-based signal integrity optimization. © 2006 Wiley Periodicals, Inc. Int J RF and Microwave CAE 16: 385–399, 2006.

Keywords: computer-aided design; sensitivity; neural networks; nonlinear circuits; modeling and design

I. INTRODUCTION

Artificial neural networks (ANNs) have gained recognition as fast and reliable vehicles for high-frequency/microwave modeling and design [1, 2]. ANN-based approaches have been exploited to model a wide variety of microwave components/circuits such as FETs [3], amplifiers [4, 5], and so forth. More recently, a unique category of ANN called the dynamic neural network (DNN) [6] has been introduced for dynamic nonlinear CAD in the harmonic balance (HB) environment. Unlike feed-forward neural networks, such as the popularly used multilayer perceptron (MLP) [1], which model nonlinear algebraic relationships, the DNN addresses nonlinear dynamic relationships in circuits and systems. Compared with recurrent neural networks [5], the DNN approach is more compatible with nonlinear circuit simulation requirements and more effective in addressing the challenges of nonlinear circuit modeling.

This article focuses on the application of the DNN to a new area, that is, nonlinear transient modeling and high-speed interconnect circuit design. Accurate and fast representation of nonlinear transient behavior is key to successful CAD of digital high-speed VLSI interconnects, including multichip modules and multilayer printed circuit boards (PCBs). Sensitivity analysis of transmission lines [7] and transient-oriented optimization [8] of VLSI interconnects have been used in the design of high-speed packages to improve signal integrity.

Correspondence to: Q.-J. Zhang; email: [email protected]
DOI 10.1002/mmce.20159
Published online 3 April 2006 in Wiley InterScience (www.interscience.wiley.com).
© 2006 Wiley Periodicals, Inc.

Currently, CAD of interconnect modules with nonlinear terminations is an active research subject [7, 9], and this article aims to accomplish efficient neural-based nonlinear transient modeling and design of high-speed interconnect components, including physics-based effects. In order for the DNN to learn transient data, sensitivities (derivatives) of the corresponding training error with respect to the DNN weights are essential. In addition, transient design requires sensitivities of the target functions with respect to the geometrical/physical parameters of nonlinear components. However, because the transient response is dynamic in nature and the DNN is not a directly algebraic mapping, the sensitivity information cannot be obtained by simple differentiation of the DNN equations. A brute-force approximation is the perturbation approach, in which the DNN differential equations have to be re-solved for a perturbation of each DNN variable. For DNN training with transient data and many internal weight variables, this would lead to a substantially long training time. This is one of the major reasons why circuit transient modeling by neural networks is still a rarely explored subject. Our motivation for solving this bottleneck problem is the adjoint sensitivity concept pioneered in [10]. Following this, various circuit-based adjoint techniques [11] have been established, and circuit optimization via an adjoint Lagrangian formulation [12] has also been studied.

In this article, we present an ADNN technique for nonlinear transients such as those in high-speed interconnects in the presence of nonlinear elements [13]. By defining an adjoint network of the DNN, the exact adjoint sensitivities of a general energy function of the DNN are derived. To further enhance the efficiency of ADNN evaluation, the present article expands the work presented in [13] to include advanced computational features of the ADNN model that allow the ADNN to be solved with a small computational overhead once the original DNN solutions are obtained. Through the ADNN-based sensitivity formulas, we develop a new algorithm that facilitates DNN training directly from circuit transient waveforms. This algorithm is also generalized to cover the DNN sensitivities with respect to the external static inputs of the model (such as the physical/geometrical values of nonlinear components), thus permitting faster physics-based nonlinear circuit design, including the geometrical/physical parameters.

The rest of the article is organized as follows. Section II starts with a brief review of the DNN formulation for modeling nonlinear dynamic circuits. Then the ADNN technique is presented for the exact sensitivities of the DNN energy function with respect to both the DNN internal weights and the external inputs. In section III, we describe the computational features of the ADNN in the environment of circuit transient simulation. Based on the ADNN exact sensitivities, a comprehensive algorithm for DNN transient training as well as for the sensitivity analysis of a trained DNN is described in section IV. Finally, in section V, the proposed ADNN algorithm is applied to DNN modeling of a physics-based nonlinear interconnect driver and to gradient-based signal integrity optimization. The validity of the proposed approach is verified, and the advantages of using the ADNN are demonstrated through a substantial speedup in transient-based training of the circuit buffer model and in signal-integrity-based optimization.

II. ADNN SENSITIVITY TECHNIQUE

A. Dynamic Neural Network

In the time domain, a nonlinear dynamic circuit can be represented by a DNN [6] of order n. Inputs to the DNN include the dynamic inputs u(t), their kth-order time derivatives u^(k)(t), and the static inputs θ. Here, θ is an N_θ vector containing static inputs such as the geometrical/physical parameters of the circuit to be modeled. Let Ny be the number of DNN outputs. The DNN equations modeling a nonlinear circuit are given by

$$\dot{v}_1(t) = v_2(t)$$

$$\vdots$$

$$\dot{v}_{n-1}(t) = v_n(t)$$

$$\dot{v}_n(t) = f_{ANN}\!\left(\theta,\, u^{(n-1)}(t), \ldots, u^{(1)}(t), u(t),\, v_n(t), \ldots, v_1(t) \,\middle|\, w\right), \quad (1)$$

where each v_i is an Ny vector representing a set of states of the DNN. The output signals of the DNN model are given by y(t) = v_1(t). Here, f_ANN represents an MLP neural network [1] with trainable weight parameters w. For example, if the DNN is used to model a microwave amplifier circuit, u(t) and y(t) would represent the amplifier input voltage and output voltage, respectively. In this case, the order n would reflect the dynamic nature of the amplifier, that is, the effects of internal capacitors and inductors in the circuit. As such, the total DNN model realizes a nonlinear dynamic relationship between the circuit input and output signals.
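To make the structure of eq. (1) concrete, the following minimal Python sketch (an illustration under assumed values, not the authors' implementation) integrates an order n = 2 DNN whose f_ANN is a toy six-neuron sigmoid MLP; the weights, the input waveform u(t), and the static input θ are placeholders standing in for trained and measured quantities.

```python
import numpy as np
from scipy.integrate import solve_ivp

# Toy MLP f_ANN: input vector [theta, u', u, v2, v1] -> scalar dv2/dt.
# Random weights are placeholders standing in for trained values.
rng = np.random.default_rng(0)
W1h, b1h = 0.3 * rng.normal(size=(6, 5)), np.zeros(6)   # hidden layer
W2h, b2h = 0.3 * rng.normal(size=(1, 6)), np.zeros(1)   # output layer

def f_ann(x):
    z = 1.0 / (1.0 + np.exp(-(W1h @ x + b1h)))          # sigmoid neurons
    return float(W2h @ z + b2h)

def u(t):                                               # assumed input waveform
    return 0.5 + 0.5 * np.tanh(4.0 * (t - 1.0))

def du(t, h=1e-4):                                      # u'(t), finite difference
    return (u(t + h) - u(t - h)) / (2.0 * h)

theta = 1.0                                             # assumed static input

def dnn_rhs(t, v):
    # eq. (1) with n = 2: v1' = v2, v2' = f_ANN(theta, u', u, v2, v1 | w)
    v1, v2 = v
    return [v2, f_ann(np.array([theta, du(t), u(t), v2, v1]))]

# Forward integration over [T1, T2] = [0, 6] from a DC-like initial condition.
sol = solve_ivp(dnn_rhs, (0.0, 6.0), [u(0.0), 0.0], max_step=0.01)
y = sol.y[0]                                            # DNN output y(t) = v1(t)
```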


B. Problem Statement

We define an energy function E for the DNN that represents a typical design function in transient analysis as

$$E = \int_{T_1}^{T_2} f\!\left(y(t)\right) dt, \quad (2)$$

where [T1, T2] is the time interval of interest. An important application of (2) is training the DNN with a transient signal. Let f(y(t)) be the instantaneous difference between the DNN output signal and the DNN training signal. The integration in (2) then yields the waveform-based training error, which measures the signal difference accumulated over the entire time range. In this way, E can be conveniently used to represent the error criterion for DNN training.

The purpose of our sensitivity technique is to determine the derivatives of E with respect to the DNN internal weights w and static inputs θ, that is, dE/dw and dE/dθ. Such derivatives are very useful for training the DNN to learn nonlinear transient data, and for sensitivity analysis and optimization of high-speed interconnect circuits utilizing the trained DNN model. The challenge here is that y(t) has a dynamic, rather than a simple algebraic, relationship with w and θ. In this article, we exploit the adjoint concept to develop an efficient sensitivity technique for transient-oriented DNN training and applications.

C. The ADNN Technique

By formulating our task as a nonlinear optimization problem with dynamic constraints, we are able to define a new ADNN system as follows [13]:

$$\dot{\hat{v}}_1 = -\left(\frac{\partial f_{ANN}}{\partial v_1}\right)^{\!T} \hat{v}_n - \frac{\partial f}{\partial y}$$

$$\dot{\hat{v}}_2 = -\left(\frac{\partial f_{ANN}}{\partial v_2}\right)^{\!T} \hat{v}_n - \hat{v}_1$$

$$\vdots$$

$$\dot{\hat{v}}_n = -\left(\frac{\partial f_{ANN}}{\partial v_n}\right)^{\!T} \hat{v}_n - \hat{v}_{n-1}, \quad (3)$$

where each v̂_j is an Ny vector representing a set of ADNN states. The outputs of the ADNN are given by ŷ(t) = v̂_n(t). The ADNN is a linear time-varying dynamic system, which is simulated backward in time from T2 to T1. The boundary conditions for the ADNN are imposed at the upper time limit as v̂(T2) = 0. Our new ADNN has a close relationship with the original DNN in terms of model structure. Figure 1 shows exact circuit representations of both the DNN and the ADNN. The input to the ADNN (that is, ∂f/∂y) excites the state v̂_1, which corresponds to the output of the original DNN. In Figure 1(a), the DNN state v_n is controlled by all the DNN states v_1, v_2, ..., v_n through the neural network f_ANN, while in Figure 1(b) the ADNN state v̂_n conversely controls each of the ADNN states v̂_1, v̂_2, ..., v̂_n.
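Continuing the sketch above, eq. (3) for the same toy order-2 model can be integrated backward as follows; the partial derivatives of f_ANN are formed here by finite differences for brevity (backpropagation yields the same quantities), and the training target y_d(t) is an assumed placeholder.

```python
from scipy.interpolate import interp1d

# Interpolants of the stored forward DNN solution from the previous sketch.
v1_t = interp1d(sol.t, sol.y[0], bounds_error=False, fill_value="extrapolate")
v2_t = interp1d(sol.t, sol.y[1], bounds_error=False, fill_value="extrapolate")
y_d = lambda t: 0.5 + 0.5 * np.tanh(4.0 * (t - 1.2))    # assumed training target

def dfann_dv(t, v1, v2, h=1e-5):
    # Partial derivatives of f_ANN with respect to the states v1 and v2.
    f = lambda a, b: f_ann(np.array([theta, du(t), u(t), b, a]))
    return ((f(v1 + h, v2) - f(v1 - h, v2)) / (2 * h),
            (f(v1, v2 + h) - f(v1, v2 - h)) / (2 * h))

def adnn_rhs(t, vh):
    vh1, vh2 = vh
    d1, d2 = dfann_dv(t, float(v1_t(t)), float(v2_t(t)))
    dfdy = float(v1_t(t)) - y_d(t)      # excitation y(t) - y_d(t), see section IV
    return [-d1 * vh2 - dfdy,           # eq. (3), first adjoint state
            -d2 * vh2 - vh1]            # eq. (3), last adjoint state (n = 2)

# Zero boundary condition at T2 = 6, integrated backward to T1 = 0.
adj = solve_ivp(adnn_rhs, (6.0, 0.0), [0.0, 0.0], max_step=0.01)
```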

The final sensitivity expressions can be subsequently derived according to the locations of the weight parameters w in f_ANN. Let f_ANN represent a

Figure 1. Circuit schematic of (a) the original DNN and (b) the new ADNN. An MLP feed-forward neural network is used as a nonlinear controlling function in the original DNN schematic, and the derivative of the neural network is used as the controlling coefficient for all the states in the ADNN.


commonly used three-layer MLP neural network. Let N_l denote the number of neurons in the lth layer, and let w_ij^l represent the weight of the link between the jth neuron of the (l-1)th layer and the ith neuron of the lth layer. We define z_i^l(t) as the instantaneous output of the ith neuron in the lth layer. Based on the solutions of the ADNN given by eq. (3), dE/dw and dE/dθ can be systematically evaluated as

$$\frac{dE}{dw_{ij}^{l}} = \begin{cases} -\displaystyle\int_{T_1}^{T_2} \hat{y}_k(t)\, z_j^{l-1}(t)\, dt, & l = 3,\; k = i,\; k = 1, 2, \ldots, N_y \\[2ex] -\displaystyle\sum_{k=1}^{N_y} w_{ki}^{l+1} \int_{T_1}^{T_2} \hat{y}_k(t)\, z_i^{l}(t)\left(1 - z_i^{l}(t)\right) z_j^{l-1}(t)\, dt, & l = 2, \end{cases} \quad (4)$$

$$\frac{dE}{d\theta_i} = -\sum_{k=1}^{N_y} \sum_{j=1}^{N_l} w_{kj}^{l+1}\, w_{ji}^{l} \int_{T_1}^{T_2} \hat{y}_k(t)\, z_j^{2}(t)\left(1 - z_j^{2}(t)\right) dt, \quad l = 2, \quad (5)$$

respectively, where ŷ_k(t) represents the kth ADNN output and the location of the static DNN inputs θ within the MLP input vector x is defined as θ_i = x_i, i = 1, 2, ..., N_θ.
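As a sketch of how eqs. (4) and (5) might be evaluated numerically once the waveforms are sampled (the array names, shapes, and helper functions below are our assumptions, not the authors' code):

```python
import numpy as np

# Assumed sampled quantities on a common time grid t:
#   y_hat: (Nt, Ny) ADNN outputs; z1: (Nt, N1) MLP input-layer outputs;
#   z2: (Nt, N2) sigmoid hidden outputs; w3: (Ny, N2) output-layer weights.
def dE_dw3(t, y_hat, z2, i, j):
    # eq. (4), l = 3: dE/dw_ij^3 = -int y_hat_i(t) z_j^2(t) dt
    return -np.trapz(y_hat[:, i] * z2[:, j], t)

def dE_dw2(t, y_hat, z1, z2, w3, i, j):
    # eq. (4), l = 2: chain through output weights and sigmoid derivative z(1-z)
    s = z2[:, i] * (1.0 - z2[:, i])
    return -sum(w3[k, i] * np.trapz(y_hat[:, k] * s * z1[:, j], t)
                for k in range(y_hat.shape[1]))
```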

III. COMPUTATIONAL FEATURES OF ADNN

We examine how the ADNN is solved and relate this to the solution process of the original DNN. Since we are dealing with transient responses, we need numerical techniques for solving the differential equations in order to evaluate the original and adjoint dynamic neural models. Here we employ the most frequently used type of such integration techniques, that is, implicit integration methods. Applying a general multistep implicit integration formula [11] with fixed step size Δt, the DNN equations (1) can be reduced to a set of nonlinear algebraic equations, given by

$$\begin{bmatrix} v_1(t_{k+1}) \\ \vdots \\ v_{n-1}(t_{k+1}) \\ v_n(t_{k+1}) \end{bmatrix} = \begin{bmatrix} \sum_{i=0}^{m} \left[ a_i\, v_1(t_{k-i}) + \Delta t\, b_i\, v_2(t_{k-i}) \right] \\ \vdots \\ \sum_{i=0}^{m} \left[ a_i\, v_{n-1}(t_{k-i}) + \Delta t\, b_i\, v_n(t_{k-i}) \right] \\ \sum_{i=0}^{m} \left[ a_i\, v_n(t_{k-i}) + \Delta t\, b_i\, f_{ANN}(t_{k-i}) \right] \end{bmatrix} + \Delta t\, b_{-1} \begin{bmatrix} v_2(t_{k+1}) \\ \vdots \\ v_n(t_{k+1}) \\ f_{ANN}(t_{k+1}) \end{bmatrix}, \quad (6)$$

with

$$f_{ANN}(t_j) = f_{ANN}\!\left(\theta,\, u^{(n-1)}(t_j), \ldots, u^{(1)}(t_j), u(t_j),\, v_n(t_j), \ldots, v_1(t_j) \,\middle|\, w\right) \quad \text{for } j = k-m,\, k-m+1,\, \ldots,\, k+1,$$

where t_k and t_{k+1} represent two consecutive time instances and Δt = t_{k+1} - t_k is the time step for forward integration. Here, m represents the order of the integration formula, while a_i and b_i are real coefficients defining a specific implicit integration method. To solve the nonlinear equations in (6), Newton-Raphson (NR) iterative techniques are commonly used [11]. At each iteration, the solution process involves LU factorization of the NR equation

$$J\left(v^{l+1} - v^{l}\right) = -F\!\left(v^{l}\right), \quad (7)$$

where


$$J_{nN_y \times nN_y} = \begin{bmatrix} I & -\Delta t\, b_{-1} I & 0 & \cdots & 0 \\ 0 & I & -\Delta t\, b_{-1} I & \cdots & 0 \\ \vdots & & \ddots & \ddots & \vdots \\ 0 & \cdots & 0 & I & -\Delta t\, b_{-1} I \\ -\Delta t\, b_{-1} \dfrac{\partial f_{ANN}}{\partial v_1^T}(t_{k+1}) & -\Delta t\, b_{-1} \dfrac{\partial f_{ANN}}{\partial v_2^T}(t_{k+1}) & \cdots & -\Delta t\, b_{-1} \dfrac{\partial f_{ANN}}{\partial v_{n-1}^T}(t_{k+1}) & -\Delta t\, b_{-1} \dfrac{\partial f_{ANN}}{\partial v_n^T}(t_{k+1}) + I \end{bmatrix} \quad (8)$$

is the Jacobian matrix representing the derivatives of the equations in (6) with respect to the DNN state variables at time t_{k+1}, and

$$F\!\left(v^{l}\right) = \begin{bmatrix} v_1(t_{k+1}) \\ \vdots \\ v_{n-1}(t_{k+1}) \\ v_n(t_{k+1}) \end{bmatrix} - \begin{bmatrix} \sum_{i=0}^{m} \left[ a_i\, v_1(t_{k-i}) + \Delta t\, b_i\, v_2(t_{k-i}) \right] \\ \vdots \\ \sum_{i=0}^{m} \left[ a_i\, v_{n-1}(t_{k-i}) + \Delta t\, b_i\, v_n(t_{k-i}) \right] \\ \sum_{i=0}^{m} \left[ a_i\, v_n(t_{k-i}) + \Delta t\, b_i\, f_{ANN}(t_{k-i}) \right] \end{bmatrix} - \Delta t\, b_{-1} \begin{bmatrix} v_2(t_{k+1}) \\ \vdots \\ v_n(t_{k+1}) \\ f_{ANN}(t_{k+1}) \end{bmatrix}. \quad (9)$$

Here, I denotes an Ny × Ny identity matrix and the superscript l indicates the NR iteration sequence.
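The following sketch illustrates one implicit time step for a generic state equation v̇ = F(v), using the trapezoidal rule (m = 0, b₋₁ = 1/2) and the NR iteration of eq. (7); F and dF are toy stand-ins for the DNN right-hand side and its Jacobian blocks, so this shows the mechanics rather than the full block structure of eq. (8).

```python
import numpy as np
from scipy.linalg import lu_factor, lu_solve

def step_trapezoidal(F, dF, v_k, dt, tol=1e-10, max_iter=50):
    """One trapezoidal step v(t_k) -> v(t_{k+1}) by Newton-Raphson, eq. (7)."""
    v = np.array(v_k, dtype=float)
    rhs_known = v_k + 0.5 * dt * F(v_k)            # explicit part of the step
    for _ in range(max_iter):
        res = v - 0.5 * dt * F(v) - rhs_known      # residual F(v^l)
        J = np.eye(len(v)) - 0.5 * dt * dF(v)      # Jacobian, cf. eq. (8)
        lu, piv = lu_factor(J)                     # LU factors: keep for the ADNN
        dv = lu_solve((lu, piv), -res)             # NR update, eq. (7)
        v += dv
        if np.linalg.norm(dv) < tol:
            break
    return v, (lu, piv)

# Later, the ADNN step at the same time point solves J^T x = c (see eq. (10))
# with no new factorization:  x = lu_solve((lu, piv), c, trans=1)
```

The last line is the computational point developed below: scipy's lu_solve with trans=1 solves the transposed system directly from the factors of J.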

Now we examine the solution process for the ADNN, which can be solved using implicit integration backward in time from T2 to T1. Applying the multistep implicit integration formula to the ADNN of (3) with fixed step size -Δt, we obtain a set of time-varying linear algebraic equations:

$$\begin{bmatrix} I & 0 & \cdots & 0 & -\Delta t\, b_{-1} \dfrac{\partial f_{ANN}^T}{\partial v_1}(t_{k+1}) \\ -\Delta t\, b_{-1} I & I & \cdots & 0 & -\Delta t\, b_{-1} \dfrac{\partial f_{ANN}^T}{\partial v_2}(t_{k+1}) \\ \vdots & \ddots & \ddots & & \vdots \\ 0 & \cdots & -\Delta t\, b_{-1} I & I & -\Delta t\, b_{-1} \dfrac{\partial f_{ANN}^T}{\partial v_{n-1}}(t_{k+1}) \\ 0 & \cdots & 0 & -\Delta t\, b_{-1} I & -\Delta t\, b_{-1} \dfrac{\partial f_{ANN}^T}{\partial v_n}(t_{k+1}) + I \end{bmatrix} \begin{bmatrix} \hat{v}_1(t_{k+1}) \\ \hat{v}_2(t_{k+1}) \\ \vdots \\ \hat{v}_{n-1}(t_{k+1}) \\ \hat{v}_n(t_{k+1}) \end{bmatrix} = c, \quad (10)$$


where c is an nNy vector determined by the ADNN solutions at previous time instances (that is, t = t_{k+2}, ..., t_{k+m+1}, as backward integration is performed) and by the present input of the ADNN, (∂f/∂y)(t_{k+1}). By solving the linear equations (10), all the state variables v̂_1(t_{k+1}), v̂_2(t_{k+1}), ..., v̂_n(t_{k+1}) can be obtained. It should be noted that the matrix to be inverted in (10) is exactly the transpose of the Jacobian matrix J in (8) used to solve the original DNN equations (1). Therefore, the LU factors of the Jacobian required to solve the ADNN are just the transpose of the LU factors used in solving the original DNN. Because the J matrix in (8) has already been built and decomposed into LU factors during the integration of the original DNN, the linearized ADNN equations (10) can be solved without redoing the LU decomposition. In this way, the solution of the adjoint DNN can be achieved with incremental computational effort once the original DNN is solved. As an example, if we apply the trapezoidal-rule integration method [11], the expressions of the J matrix and the linearized ADNN equations are specialized by setting m = 0, a_0 = 1, b_0 = 1/2, and b_{-1} = 1/2 in eqs. (8), (9), and (10).

This solution process can be made even more efficient by exploiting the special structures of eqs. (7) and (10). Let [φ_1 φ_2 ... φ_n]^T be an nNy vector representing the right-hand side of eq. (7). To find the DNN states v, first solve a reduced set of linear equations (Ny equations) as follows:

$$\left[ I - \sum_{i=1}^{n} \left(\Delta t\, b_{-1}\right)^{n-i+1} \frac{\partial f_{ANN}}{\partial v_i^T} \right] v_n = \phi_n + \sum_{i=1}^{n-1} \frac{\partial f_{ANN}}{\partial v_i^T} \sum_{j=i}^{n-1} \left(\Delta t\, b_{-1}\right)^{j-i+1} \phi_j \quad (11)$$

in order to obtain v_n. Then all the other v_i values can be solved recursively as follows:

$$v_i = \Delta t\, b_{-1}\, v_{i+1} + \phi_i, \quad i = n-1, n-2, \ldots, 2, 1. \quad (12)$$

Let c in eq. (10) be an nNy vector [c_1 c_2 ... c_n]^T. Likewise, to find the ADNN states v̂, we initially obtain v̂_n by solving

$$\left[ I - \sum_{i=1}^{n} \left(\Delta t\, b_{-1}\right)^{n-i+1} \frac{\partial f_{ANN}^T}{\partial v_i} \right] \hat{v}_n = \sum_{i=1}^{n} \left(\Delta t\, b_{-1}\right)^{n-i} c_i. \quad (13)$$

The recursive relations to solve all the other v̂_i are given by

$$\hat{v}_1 = \Delta t\, b_{-1} \frac{\partial f_{ANN}^T}{\partial v_1} \hat{v}_n + c_1,$$

$$\hat{v}_i = \Delta t\, b_{-1}\, \hat{v}_{i-1} + \Delta t\, b_{-1} \frac{\partial f_{ANN}^T}{\partial v_i} \hat{v}_n + c_i, \quad i = 2, 3, \ldots, n-2, n-1. \quad (14)$$

Let M_{Ny×Ny} represent the left-hand-side matrix in (11). It is to be noted that the left-hand-side matrix in (13) is the exact transpose of M in (11). Once the original DNN solutions are achieved, the LU factors of M and the matrices (∂f_ANN/∂v_i^T) (i = 1, 2, ..., n) can be stored and reused in the solution process of the ADNN in (13) and (14). A clear advantage of this procedure is that the formulas for computing v and v̂ are much more compact than the direct solutions of the DNN and ADNN using (7)-(10), avoiding the factorization of the entire Jacobian matrix J as well as the storage of its LU factors at each time instance. This computational feature becomes even more beneficial for sensitivity analysis with respect to multiple variables. In the DNN model, the number of hidden neurons in the f_ANN part reflects the degree of nonlinearity between the DNN dynamic inputs and outputs. More nonlinear problems require more hidden neurons, and therefore more internal weights w have to be trained. For sensitivity analysis of the neural-network training error with respect to many weight parameters, the brute-force perturbation method requires solving a perturbed DNN for each weight, leading to a massive number of DNN simulations. Using the proposed technique, only one original DNN and one adjoint DNN need to be solved, regardless of the number of variables. In this way, the proposed technique substantially speeds up DNN sensitivity analysis in the case of multiple variables.
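A minimal sketch of the reduced-system solution of eqs. (11) and (12) at one time point, assuming the blocks dfdv[i] ≈ ∂f_ANN/∂v_{i+1}^T and the right-hand-side segments phi[i] are available (names are hypothetical); the adjoint counterpart (13)-(14) follows the same pattern with M replaced by its transpose.

```python
import numpy as np

def solve_dnn_step(dfdv, phi, a):
    """dfdv: list of n (Ny x Ny) blocks dfANN/dv_i^T; phi: list of n Ny-vectors;
    a = dt * b_{-1}. Returns the n state segments v_1 .. v_n (0-based here)."""
    n, Ny = len(dfdv), len(phi[0])
    # M of eq. (11): I - sum_{i=1..n} a^(n-i+1) * dfANN/dv_i^T
    M = np.eye(Ny) - sum(a ** (n - i) * dfdv[i] for i in range(n))
    # Right-hand side of eq. (11)
    r = phi[n - 1].copy()
    for i in range(n - 1):
        r = r + dfdv[i] @ sum(a ** (j - i + 1) * phi[j] for j in range(i, n - 1))
    v = [None] * n
    v[n - 1] = np.linalg.solve(M, r)               # v_n from eq. (11)
    for i in range(n - 2, -1, -1):                 # back-substitution, eq. (12)
        v[i] = a * v[i + 1] + phi[i]
    return v
```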

IV. ADNN-ENABLED DNN TRAINING AND SENSITIVITY ANALYSIS

A. Training Objective

Let u_d^i(t) and y_d^i(t) represent the ith input and output transient waveforms, sufficiently sampled in the time interval [T1, T2], to be used as the training data of the DNN. Let N_T be the total number of transient waveforms for training. The objective of DNN training is to adjust the DNN parameters w such that the total training-error function,

$$E_d = \sum_{i=1}^{N_T} E_{d_i} = \int_{T_1}^{T_2} \sum_{i=1}^{N_T} \frac{1}{2} \left\| y^i(t) - y_d^i(t) \right\|^2 dt, \quad (15)$$

is minimized. Here, E_{d_i} is defined as the waveform-based training error for (u_d^i(t), y_d^i(t)), and y^i(t) represents the DNN prediction of the ith output signal. To allow direct utilization of the ADNN sensitivity expressions presented in section II for DNN training, we judiciously choose f(y(t)) in (2) as an l2 error function,

$$f\!\left(y(t)\right) = \sum_{i=1}^{N_T} \frac{1}{2} \left\| y^i(t) - y_d^i(t) \right\|^2, \quad (16)$$

in order to establish consistency between the energy function E and the training error E_d. To evaluate the ADNN equations for sensitivity calculation, the adjoint excitation in (3) can be formulated in terms of the DNN outputs y^i(t) as

$$\frac{\partial f}{\partial y} = y^i(t) - y_d^i(t). \quad (17)$$
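On sampled waveforms, eqs. (15)-(17) reduce to simple quadrature and a pointwise difference; a minimal sketch (assuming a common uniform time grid t):

```python
import numpy as np

def waveform_error(t, y, yd):
    # Per-waveform l2 training error E_d_i of eq. (15)
    return np.trapz(0.5 * (y - yd) ** 2, t)

def adnn_excitation(y, yd):
    # Adjoint excitation df/dy of eq. (17)
    return y - yd
```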

B. Algorithm of ADNN-Enabled Transient Training

Preparation Stage. Define the dynamic inputs and outputs of the DNN model, that is, u(t) and y(t). Generate transient-waveform training data u_d^i(t) and y_d^i(t), i = 1, 2, ..., N_T, from detailed simulation or measurement of the original circuit. By supplying different input waveforms u_d(t) to the original circuit, we obtain different output waveform data y_d(t). Optionally, we generate time-derivative data of each u_d^i(t) and y_d^i(t) by connecting unit inductors/capacitors and controlled sources to the inputs and outputs of the original circuit.

Initial Training. Using the optional data generated in the preparation stage, we train the static MLP part, that is, f_ANN, to learn the data of the input-output waveforms u_d(t) and y_d(t), as well as their time derivatives up to the nth order. Conventional MLP training techniques such as back-propagation (BP) [1] or conjugate-gradient algorithms [1] can be used. The result of this training is an approximate f_ANN for which the model dynamics have not yet been enforced. This result serves as a good starting point for the transient-based training of the dynamic behavior of the DNN to follow.

Transient-Oriented Training. The overall transient-oriented DNN training exploiting ADNN sensitivity analysis is summarized in the following steps.

Step 1: Excite the original DNN with the ith input waveform data u_d^i(t); initially, i = 1. Use numerical integration to solve the original DNN equations forward from time T1 up to T2 with the user-specified initial condition v(T1). At each time step, the M matrix is factorized, and the resulting LU factors and the matrices (∂f_ANN/∂v_i^T) (i = 1, 2, ..., n) are recorded after being used for integration. Compute the l2-based per-waveform training error E_{d_i} using the ith output waveform data y_d^i(t) and the DNN solution y(t), that is, eq. (15).

Step 2: Excite the ADNN using the waveform difference between the DNN model output waveform and the present training waveform, that is, eq. (17). Use numerical integration to solve the ADNN equations backward from time T2 to T1, starting with zero boundary conditions at T2. The LU factors and the matrices (∂f_ANN/∂v_i^T) (i = 1, 2, ..., n) stored in step 1 are recalled for reuse in solving the ADNN.

Step 3: Compute the derivatives of the neural network with respect to its internal weights, (∂f_ANN/∂w). This is achieved by the BP method [1] (or by the static adjoint neural-network method [14] if the sensitivity to be computed is with respect to the DNN external inputs θ instead of the internal weights w). The ADNN solution from step 2, that is, ŷ(t), and the derivatives (∂f_ANN/∂w) are then used to calculate the sensitivities of the per-waveform error E_{d_i} of (15) with respect to all the weight parameters in w.

Note that, as indicated in the above step, a single ADNN solution is adequate for computing the error sensitivities with respect to all DNN internal weights w. This is a significant computational advantage over the perturbation method, where one additional simulation of the original DNN is required for computing the sensitivity with respect to each weight parameter.

Step 4: If the waveform used is not the first waveform (that is, if i > 1), add the per-waveform training error E_{d_i} computed in step 1 and its sensitivities with respect to w computed in step 3 to those of the previous waveforms. Proceed to the next input-output waveform pair, that is, set i = i + 1; if i ≤ N_T, go to step 1.

Step 5: Based upon the total training error E_d and its sensitivities accumulated over all the waveforms in the training data, use an efficient gradient-based training algorithm (for example, the quasi-Newton method [1]) to calculate the update to the weight parameters w.


Step 6: Update the DNN weight parameters w and continue with steps 1-6 until the total training error satisfies the user-defined threshold.

Through this systematic algorithm, the DNN model can be trained to learn nonlinear transient behavior from input-output waveforms. The use of the adjoint sensitivity technique makes the training process much more efficient than using the perturbation technique.
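The steps above can be summarized as the following high-level loop. Every helper here (solve_dnn_forward, solve_adnn_backward, error_gradient_wrt_w, quasi_newton_update) is a hypothetical placeholder for the corresponding operation described in steps 1-6, not an actual API:

```python
def train_dnn(w, waveforms, t, tol, max_epochs=200):
    """waveforms: list of (ud, yd) sampled waveform pairs on time grid t."""
    for epoch in range(max_epochs):
        Ed, grad = 0.0, 0.0
        for ud, yd in waveforms:                      # i = 1 .. NT
            # Step 1: forward DNN solve; store LU factors of M and dfANN/dv_i^T
            y, stored = solve_dnn_forward(w, ud, t)
            Ed += waveform_error(t, y, yd)
            # Step 2: backward ADNN solve, reusing the stored factors
            y_hat = solve_adnn_backward(w, stored, adnn_excitation(y, yd), t)
            # Steps 3-4: accumulate dE_d/dw via eq. (4) and backpropagation
            grad = grad + error_gradient_wrt_w(w, y_hat, stored, t)
        if Ed < tol:                                  # Step 6: stopping test
            return w
        w = quasi_newton_update(w, Ed, grad)          # Step 5: gradient update
    return w
```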

C. ADNN-Based Algorithm for Determining Sensitivity with Respect to Static Inputs to DNN

The algorithm presented above can also be extended to cover another case of the ADNN technique, that is, sensitivity analysis of a trained DNN with respect to its external static input variables θ. This sensitivity is useful for transient optimization of nonlinear circuits using a DNN model where θ contains the optimization variables. In this case, the dynamic input waveform exciting the trained DNN is a signal in the circuit. The algorithm for such a sensitivity analysis consists of steps 1-3 as described above, with the following changes in steps 1 and 2. In step 1, the set of input waveform data is replaced by the one input waveform u(t), which is the excitation received by the trained DNN, and the training-error function is replaced by the circuit design objective function. In step 2, the excitation to the ADNN is replaced by ∂f/∂y, as defined in eq. (3). A block diagram, shown in Figure 2, demonstrates the process of computing the exact DNN model sensitivities using the ADNN technique.

D. Discussions

The DNN energy function E introduced in eq. (2) has a very general form, based on the time interval of interest [T1, T2] and the outputs of the DNN model y(t). The requirement on T1 and T2 is that T2 > T1 ≥ 0. For the purpose of DNN transient training, T1 is normally set to 0 so as to allow the DNN model to learn the complete waveform starting from the DC

Figure 2. Block diagram of the ADNN algorithm for computing the sensitivities of the DNN energy function with respect to both the DNN weights w and the static inputs θ. D is an operator for time differentiation. The original DNN is solved by numerical integration forward in time, and the proposed ADNN is evaluated using numerical integration backward in time. The excitation signal to the ADNN is derived from the solutions of the original DNN and the energy function.


solution. In this case, f(y(t)) in eq. (2) can be specialized to represent the least-squares norm of the difference between the DNN outputs and the training waveform data. In addition, the energy function E can be formulated as various design objective functions in order to further expand the ADNN-based sensitivities to accommodate circuit design optimization. For example, the energy function E can represent the average power of the DNN output signal, which extends the application of the ADNN-based sensitivities to cover physics-based circuit design. As another example, the energy function E can describe the widely adopted least-pth optimization function, enabling signal-integrity-based circuit design using efficient ADNN sensitivities. Detailed examples of setting up the DNN energy function E for the purpose of nonlinear circuit design are provided in the next section.

To solve the ADNN model, the LU factors of the original DNN Jacobian matrix have to be stored at each time point during the simulation. If we use the circuit-based adjoint techniques [10, 12] for solving the corresponding circuit of the ADNN, the matrix size to be stored per time point has to be nNy × nNy. Through our proposed recursive approach, the matrix size is effectively reduced by a factor of n, to Ny × Ny, thus improving the efficiency of the ADNN simulation.

The stability of a nonlinear transient model is a relevant issue for ensuring its reliability in circuit transient simulation. The stability investigation of a nonlinear dynamic system typically involves seeking an appropriate Lyapunov function to prove that the equilibrium state of the nonlinear system is stable [15]. How to combine the theory of Lyapunov functions with the DNN nonlinear model would be an interesting future research direction.

V. NUMERICAL EXAMPLES

A. Modeling of Physics-Based Multistage Driver

In this example, we demonstrate the use of the ADNN technique for physics-based transient modeling and design. Physics-based modeling and design are important for next-generation CAD of high-speed/high-frequency circuits and systems [16]. We consider modeling a four-stage CMOS driver circuit implemented in 1-μm technology [13]. Training data is obtained using the physics-based semiconductor device simulator MINIMOS-NT [17]. The driver load is a transmission line with parameters R = 36 Ω/m, L = 360 nH/m, C = 100 pF/m, and G = 0.01 S/m, terminated with a 5-pF capacitor. Training waveforms are generated for different values of the rise time Tr ∈ [0.25 ns, 0.75 ns] and the pulse amplitude Amp ∈ [4.5 V, 5.5 V], and for varying interconnect length d ∈ [0.08 m, 0.14 m]. The driver size is also perturbed ±50% around its nominal value. Let W1 be the transistor gate width of

Figure 3. The sensitivities of the average output power of the four-stage CMOS driver evaluated at three different driver sizes. The average power was used as the energy function E while computing the sensitivities using the proposed ADNN technique.


the 1st-stage buffer. The dynamic and static DNN inputs are u = [v_in, i_out]^T and θ = [W1], respectively, and the DNN output is y(t) = v_out(t). The time interval of interest [T1, T2] is [0 ns, 6 ns]. The DC values of the input signals are chosen as the DNN initial conditions v(T1) for integrating the original DNN forward in time. For this example, a DNN structure with dynamic order n = 1 and 30 hidden neurons in f_ANN is utilized. Applying the proposed algorithm summarized in subsection IV.B, training of the DNN with physics-based transient data has been carried out. Table I shows the good match between the sensitivities of the dynamic training error E_d with respect to different DNN weights obtained using our ADNN and those using the perturbation approach, thus confirming the validity of the ADNN-based sensitivities. After training, the resulting DNN model has an average test error of 0.25% on a set of independent MINIMOS-NT test waveforms that were never used in training.

Electrical power is an important criterion for high-speed digital design. We use our ADNN technique for computing the sensitivity of the average output power of the driver with respect to the driver size under transient excitation. To represent the signal power, we choose the energy function E as

$$E = \int_{T_1}^{T_2} f(y)\, dt = \frac{1}{T_2 - T_1} \int_{T_1}^{T_2} v_{out}(t)\, i_{out}(t)\, dt. \quad (18)$$

Let h(t) symbolically represent the impulse response of the transmission-line load. In order to define the ADNN excitation ∂f/∂y, we symbolically use the convolution relationship $i_{out}(t) = \int_{-\infty}^{\infty} v_{out}(\tau)\, h(t - \tau)\, d\tau$ and differentiate the energy function in (18) to obtain

$$\frac{dE}{d\theta} = \int_{T_1}^{T_2} \frac{\partial f}{\partial y} \frac{\partial y}{\partial \theta}\, dt = \frac{1}{T_2 - T_1} \int_{T_1}^{T_2} \frac{\partial \left( v_{out}(t)\, i_{out}(t) \right)}{\partial \theta}\, dt = \frac{1}{T_2 - T_1} \int_{T_1}^{T_2} \left[ i_{out}(t) + \hat{\imath}_{out}(t) \right] \frac{\partial v_{out}(t)}{\partial \theta}\, dt, \quad (19)$$

where $\hat{\imath}_{out}(t) = \int_{-\infty}^{\infty} v_{out}(\tau)\, h(\tau - t)\, d\tau$. Therefore, the excitation of the ADNN, that is, ∂f/∂y in eq. (3), can be set up as follows:

$$\frac{\partial f}{\partial y} = \left[ i_{out}(t) + \hat{\imath}_{out}(t) \right] / \left( T_2 - T_1 \right). \quad (20)$$

There is no need to perform numerical convolution for i_out(t) and î_out(t): i_out(t) is obtained by a standard transmission-line simulation with excitation v_out(t), and î_out(t) is obtained by exciting the same transmission line with a time-reversed version of the signal v_out(t), that is, v_out(T2 - t). Utilizing the proposed algorithm in subsection IV.C, the power sensitivities are computed. As shown in Figure 3, the sensitivities computed using our ADNN accurately match those from physics-based MINIMOS-NT perturbations. The total CPU time for sensitivity analysis taken by the ADNN is 2 s, as compared to 6254 s taken by the MINIMOS-NT perturbations, thus proving the significance of our ADNN technique in nonlinear transient design. The ADNN helps bridge the gap between physics-based CAD and circuit-based CAD, achieving physics-oriented solutions for circuit design at fast computation speeds.
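The time-reversal trick can be checked numerically; in the sketch below (our illustration, with an assumed sampled causal impulse response h and uniform spacing dt), exciting the line with the reversed waveform and reversing the result reproduces î_out(t).

```python
import numpy as np

def i_out(vout, h, dt):
    # i_out(t) = int vout(tau) h(t - tau) dtau  -> ordinary discrete convolution
    return dt * np.convolve(vout, h)[:len(vout)]

def i_out_hat(vout, h, dt):
    # i_hat_out(t) = int vout(tau) h(tau - t) dtau: drive the same line with
    # the time-reversed signal vout(T2 - t), then reverse the response.
    return i_out(vout[::-1], h, dt)[::-1]
```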

B. Physics-Based Interconnect Driver Optimization and Design

In this example, we demonstrate the use of another type of energy function in eq. (2) and the application of the ADNN technique to physics-based time-domain optimization of high-speed interconnects, including the physical/geometrical parameters as optimization variables. The four-stage interconnect driver circuit of example A is again used in MINIMOS-NT to generate training data. Inputs to the DNN model of the four-stage driver include the driver input voltage v_in as the dynamic input, and the transistor width W1 of the 1st-stage buffer and the scaling coefficient S for the subsequent stages as static inputs. The combinations of W1 and S represent the buffer geometrical dimensions at different stages; that is, W1·S^i (the ith power of S) is the transistor width of the (i + 1)th-stage buffer. The output of the DNN model is the driver output voltage v_out.

TABLE I. Examples of Sensitivity Comparison for the Physics-Based Training of the DNN Buffer Model in Example A

DNN Sensitivity     Perturbation Method    Proposed ADNN Technique    Difference [%]
∂E_d/∂w_21^2         1.1248E-01             1.1246E-01                 0.0178
∂E_d/∂w_53^2        -6.6431E-02            -6.6396E-02                 0.0527
∂E_d/∂w_45^2        -1.0136E+00            -1.0137E+00                 0.0099
∂E_d/∂w_13^3         6.8080E-01             6.8079E-01                 0.0015
∂E_d/∂w_16^3         5.2605E-01             5.2606E-01                 0.0019
∂E_d/∂w_19^3        -5.9834E+00            -5.9832E+00                 0.0033


The transient training and test waveform data are gathered in MINIMOS-NT by varying the transistor widths under different combinations of W1 and S.

Utilizing the proposed algorithm given in subsection IV.B to provide error sensitivities during the training process, a DNN driver model with dynamic order n = 1 and 25 hidden neurons in f_ANN is trained. The average test error of the trained DNN is within 0.5%. Figure 4 shows the comparison of the DNN model outputs with test waveforms gathered from MINIMOS-NT simulation. An excellent match is achieved, even though these test waveforms were never used in training, thus validating the accuracy of the DNN model trained using the ADNN technique. The CPU time taken by ADNN-based training versus DNN perturbation-based training is 45 min versus 5.5 h, showing a significant speed-up of training by utilizing the ADNN technique.

The trained DNN driver model is then used in a high-speed VLSI interconnect circuit for time-domain optimization under specifications imposed on the transient responses of the interconnect network. A transmission-line load 100 mm in length is connected to the DNN model. The specifications used in our transient optimization are defined as follows:

$$V_{spec}(t) = \begin{cases} 4.75\ \text{V as lower specification} & \text{for } 1\ \text{ns} \le t \le 2\ \text{ns} \\ 0.25\ \text{V as upper specification} & \text{for } t \ge 3.5\ \text{ns} \\ 0\ \text{V} & \text{otherwise.} \end{cases} \quad (21)$$

For transient optimization, error functions can be formulated in terms of the design specifications, following those in [8]. We formulate a continuous time-domain error function e(t) as

$$e(t) = v_{out}(t) - V_{spec}(t). \quad (22)$$

To facilitate our continuous time-domain optimization, we formulate a continuous version of the one-sided least-pth function [18] as

$$H\!\left(e(t)\right) = \left\{ \int_{T_1}^{T_2} \left[ g(t)\, e(t) \right]^p dt \right\}^{1/p}, \quad (23)$$

where g(t) is a weighting function described by

Figure 4. The output voltage waveforms of the four-stage CMOS driver in example B at three different geometries: (1) W1 = 1.0 μm, S = 2.5; (2) W1 = 1.5 μm, S = 4.0; (3) W1 = 3.0 μm, S = 3.0. Excellent agreement is achieved between the responses of the DNN and those of MINIMOS-NT, even though these MINIMOS-NT test waveforms were never used in training.


$$g(t) = \begin{cases} 1 & \text{if } e(t) > 0 \text{ and } V_{spec}(t) \text{ is the upper specification} \\ -1 & \text{if } e(t) < 0 \text{ and } V_{spec}(t) \text{ is the lower specification} \\ 0 & \text{otherwise.} \end{cases} \quad (24)$$

We define the objective function for optimization in the form of the energy function (2) as

$$E = \int_{T_1}^{T_2} f\!\left(y(t)\right) dt = H^p\!\left(e(t)\right) = \int_{T_1}^{T_2} \left[ g(t) \left( v_{out}(t) - V_{spec}(t) \right) \right]^p dt, \quad (25)$$

where y = v_out(t) is the DNN model output and [T1, T2] is defined as [0 ns, 5 ns] in this example.

The optimization problem now is to find feasible W1 and S such that the objective function E is minimized until all the specifications are satisfied. To perform such an optimization, we need the sensitivity information dE/dW1 and dE/dS, where W1 and S are the external static inputs to the DNN model, that is, θ in eq. (1). We computed these sensitivities using the ADNN and compared them with those from direct MINIMOS-NT perturbation, as shown in Figure 5, for various combinations of W1 and S. The excellent match between the ADNN-based sensitivities and those from perturbation again confirms the validity of the ADNN technique. Randomly choosing W1 = 1.5 μm and S = 2.2 as a starting point, the optimization is carried out using derivatives obtained from the proposed algorithm in subsection IV.C. In this example, the ADNN excitation of eq. (3) is

$$\frac{\partial f}{\partial y} = p \left[ g(t) \left( v_{out}(t) - V_{spec}(t) \right) \right]^{p-1}. \quad (26)$$

In our example, we used p = 3. During the optimization process, both the original DNN and the ADNN are numerically integrated by the trapezoidal-rule method with a fixed step size Δt = 0.01 ns. When solving the original DNN, at each time step the scalars M and ∂f_ANN/∂v_out in the last iteration of the NR algorithm are saved and reused at the corresponding time point of the ADNN integration. Let N_t be the number of time samples per waveform. Upon finishing the original DNN analysis, N_t forward/backward substitutions with the previously stored M and ∂f_ANN/∂v_out are performed to obtain the ADNN solutions at all time points, leading to the sensitivity solutions for optimization. After performing the optimization, the optimal values are W1 = 2.8 μm and S = 3.6.

Buffer sizing is an important aspect of high-speed VLSI interconnect design. For example, long

Figure 5. Sensitivity of the objective function for time-domain signal-integrity optimization of high-speed VLSI interconnects (in example B) with respect to the scaling coefficient S at different device geometries: (1) W1 = 1.0 μm; (2) W1 = 1.3 μm; (3) W1 = 1.6 μm. The one-sided least-pth function was used to formulate the energy function E while computing the sensitivities using the proposed ADNN technique.


transmission lines often need to be driven by a large driver buffer to maintain signal quality, while short transmission lines can be driven by a small buffer to minimize power. A good buffer model together with transmission-line loads is necessary to perform such a design task effectively. Here, we use our DNN buffer model trained with physics data to achieve this purpose. In practice, to cope with different interconnect loads, the correct buffer sizes can be found through the optimization process. To illustrate the ADNN technique on a basic buffer-sizing example, we performed a second optimization with the same goal as eq. (21) in order to determine suitable buffer sizes for a long interconnect load (with a transmission-line length of 200 mm). Following a procedure similar to the first optimization, the optimal values of the design variables are W1 = 5.7 μm and S = 3.4. The transient responses v_out before and after optimization, together with the MINIMOS-NT results, are shown in Figure 6 for both optimization cases. As can be observed, all the specifications have been satisfied. The improvements in signal quality and the reduction in signal delay are evident. The signal responses of the interconnect circuit simulated with the DNN at the optimal buffer

Figure 6. (a) Responses of the four-stage CMOS driver models before and after DNN-based signal-integrity optimization (the DNN-based optimal solutions are verified by direct physics-based MINIMOS-NT simulation, that is, MINIMOS-NT simulation of the solutions obtained by DNN-based optimization); (b) example of ADNN inputs and outputs used to generate sensitivities for optimization; (c) and (d) are similar to (a) and (b) except that they are for long transmission-line loads, yielding optimal buffer sizing for different interconnects. DNN-based optimization utilizing the proposed ADNN sensitivities achieves physics-level accuracy at a fraction of the time needed for direct physics-based optimization.


sizes are in excellent agreement with those simulated by MINIMOS-NT. In addition, the buffer sizes after the second optimization are considerably larger than those from the first optimization, demonstrating the effect of buffer sizing for different transmission-line load scenarios.

The ADNN sensitivities enabled effective training of the DNN for this transient modeling and optimization example. Using the trained DNN buffer model together with ADNN sensitivities, the buffer-sizing optimization using physics-based buffer information was achieved in 8 s. Without the neural-network approach, direct optimization using the MINIMOS-NT physics simulator with perturbation-based derivatives would require 10.5 h. Our ADNN approach makes it possible to use DNN models for high-speed interconnect design with physics-level accuracy at only a fraction of the time needed for direct physics-based optimization.

VI. CONCLUSION

We have presented an effective ADNN technique for obtaining exact adjoint sensitivities of DNN models in a transient environment. The solution process has been described by efficiently relating it to the solution process of the original DNN, further reducing the cost of adjoint sensitivity computations. This technique has been used to enable training of the DNN to learn the transient behaviors of nonlinear circuits within a practically acceptable computational time frame. In this way, the ADNN opens the door to expanding neural-network techniques into a new area of application, namely, transient CAD of nonlinear high-frequency/high-speed circuits. The technique also allows the sensitivities of electrical criteria with respect to the geometrical/physical design parameters of a nonlinear component to be computed efficiently. These features make it possible to use neural-network models trained with device physics data in high-speed circuit optimization, thus achieving physics-level accuracy in only a fraction of the time needed by direct physics-based optimization.

REFERENCES

1. Q.J. Zhang and K.C. Gupta, Neural networks for RF and microwave design, Artech House, Norwood, MA, 2000.

2. P. Burrascano, S. Fiori, and M. Mongiardo, A review of artificial neural networks applications in microwave computer-aided design, Int J RF and Microwave CAE 9 (1999), 158-174.

3. D. Schreurs, J. Verspecht, S. Vandenberghe, and E. Vandamme, Straightforward and accurate nonlinear device model parameter-estimation method based on vectorial large-signal measurements, IEEE Trans Microwave Theory Tech 50 (2002), 2315-2319.

4. V. Rizzoli, A. Neri, D. Masotti, and A. Lipparini, A new family of neural-network-based bidirectional and dispersive behavioral models for nonlinear RF/microwave subsystems, Int J RF and Microwave CAE 12 (2002), 51-70.

5. Y. Fang, M.C.E. Yagoub, F. Wang, and Q.J. Zhang, A new macromodeling approach for nonlinear microwave circuits based on recurrent neural networks, IEEE Trans Microwave Theory Tech 48 (2000), 2335-2344.

6. J.J. Xu, M. Yagoub, R.T. Ding, and Q.J. Zhang, Neural-based dynamic modeling of nonlinear microwave circuits, IEEE Trans Microwave Theory Tech 50 (2002), 2769-2780.

7. A. Dounavis, R. Achar, and M.S. Nakhla, Efficient sensitivity analysis of lossy multiconductor transmission lines with nonlinear terminations, IEEE Trans Microwave Theory Tech 49 (2001), 2292-2299.

8. Q.J. Zhang, S. Lum, and M. Nakhla, Minimization of delay and crosstalk in high-speed VLSI interconnects, IEEE Trans Microwave Theory Tech 40 (1992), 1555-1563.

9. I.S. Stievano, Z. Chen, D. Becker, F.G. Canavero, G. Katopis, and I.A. Maio, Behavioral modeling of digital IC input and output ports, Proc 10th IEEE Topical Mtg Elect Performance Electron Packag (EPEP), Cambridge, MA, 2001, pp. 331-334.

10. S.W. Director and R.A. Rohrer, The generalized adjoint network and network sensitivities, IEEE Trans Circuit Theory 16 (1969), 318-323.

11. J. Vlach and K. Singhal, Computer methods for circuit analysis and design, Van Nostrand Reinhold, New York, NY, 1993.

12. A.R. Conn, R.A. Haring, C. Visweswariah, and C.W. Wu, Circuit optimization via adjoint Lagrangians, Proc IEEE/ACM Int Conf CAD, San Jose, CA, 1997, pp. 281-288.

13. Y. Cao, J.J. Xu, V.K. Devabhaktuni, R.T. Ding, and Q.J. Zhang, An adjoint dynamic neural network technique for exact sensitivities in nonlinear transient modeling and high-speed interconnect design, IEEE MTT-S Int Microwave Symp, Philadelphia, PA, 2003, pp. 163-168.

14. J.J. Xu, M.C.E. Yagoub, R.T. Ding, and Q.J. Zhang, Exact adjoint sensitivity analysis for neural-based microwave modeling and design, IEEE Trans Microwave Theory Tech 51 (2003), 226-237.

15. A.P. Derek, Stability of nonlinear systems, Research Studies Press, Chichester, NY, 1981.

16. M.B. Steer, J.W. Bandler, and C.M. Snowden, Computer-aided design of RF and microwave circuits and systems, IEEE Trans Microwave Theory Tech 50 (2002), 996-1005.

17. MINIMOS-NT v.2.0, Institute for Microelectronics, Technical University Vienna, Vienna, Austria.

18. J.W. Bandler and S.H. Chen, Circuit optimization: the state of the art, IEEE Trans Microwave Theory Tech 36 (1988), 424-443.


BIOGRAPHIES

Yi Cao graduated with a B.Eng. degree from Tianjin University, China in 1999 and obtained his M.Sc. degree from Carleton University, Ottawa, Canada in 2003, both in electrical engineering. Currently, he is working toward a Ph.D. degree at Carleton University. His research interests include computer-aided design of VLSI modules and applications of artificial neural networks in high-frequency/high-speed circuit modeling and design. He is the recipient of the 2004 Carleton University Indira Gandhi Memorial Fellowship.

Jianjun Xu received his B.Eng. degree from Tianjin University, Tianjin, China in 1998, and his Ph.D. degree in electrical engineering from Carleton University, Ottawa, Canada in 2004. He is currently a Post-Doctoral Fellow in the Department of Electronics at Carleton University. His research interests include neural networks, modeling, and their applications in computer-aided design of electronic circuits. He is the recipient of numerous academic awards and scholarships, including the Ontario Graduate Scholarship in Science and Technology, the Ontario Graduate Scholarship, and the Senate Medal at Carleton University for outstanding work at the doctoral level. He was also the recipient of a Student Paper Award at IMS-2001.

Vijaya K. Devabhaktuni received his B.Eng. degree in electrical and electronics engineering and his M.Sc. degree in physics, both from the Birla Institute of Technology and Science, Pilani, Rajasthan, India in 1996, and his Ph.D. degree in electronics from Carleton University, Ottawa, Ontario, Canada in 2003. During 2003-2004, he was a Natural Sciences and Engineering Research Council of Canada Postdoctoral Fellow in the RFIC Group, ATIPS Laboratory, Department of Electrical and Computer Engineering, University of Calgary, Calgary, Alberta, Canada. In January 2005, he joined the School of Engineering and Engineering Technology, Pennsylvania State University, Erie, PA, as a Visiting Assistant Professor. He is currently an Assistant Professor and Canada Research Chair in Computer-Aided High-Frequency Modeling and Design in the Department of Electrical and Computer Engineering, Concordia University, Montreal, Quebec, Canada. He is a two-time recipient (in the 1999-2000 and 2000-2001 academic years) of the Ontario Graduate Scholarship presented by the Ministry of Training, Colleges and Universities of the Province of Ontario, Canada. He received the Carleton University Senate Medal in 2003 for his outstanding academic achievements at the doctoral level. He is also the recipient of the 2001 Teaching Excellence Award in Engineering presented by the Carleton University Students' Association and the 2003 Teaching Excellence Award in Engineering presented by the University of Calgary Engineering Students' Society. His research interests include electronic design automation, neural networks, optimization methods, RF/microwave modeling, signal processing, and wireless sensor networks. He is a member of the Association of Professional Engineers, Geologists, and Geophysicists of the province of Alberta, Canada.

Runtao Ding was born in Shanghai, China in 1938. He received his Diploma degree from Tianjin University, Tianjin, China, in 1961. Since 1961, he has been with the Department of Electronic Engineering, School of Electronic Information Engineering, Tianjin University, where he is currently a Professor. From 1991 to 1996 and from 1996 to 1999, he was the Chairman of the Department of Electronic Engineering and the Dean of the School of Electronic Information Engineering, respectively. His research interests include nonlinear signal processing, image processing, neural networks, and circuit design. He was a co-chair of the Technical Program Committee (TPC) of the IEEE APCCAS'2000.

Qi-Jun Zhang received his B.Eng. degree from the East China Engineering Institute, Nanjing, China, in 1982, and his Ph.D. degree in electrical engineering from McMaster University, Hamilton, Ontario, Canada in 1987. From 1982 to 1983, he was with the System Engineering Institute, Tianjin University, Tianjin, China. From 1988 to 1990, he was with Optimization Systems Associates (OSA) Inc., Dundas, Ontario, Canada, where he developed advanced microwave optimization software. In 1990, he joined the Department of Electronics, Carleton University, Ottawa, Ontario, Canada, where he is currently a Professor. His research interests involve neural-network and optimization methods for high-speed/high-frequency circuit design, and he has more than 170 publications in the area. He is an author of "Neural Networks for RF and Microwave Design" (Artech House, 2000), a coeditor of "Modeling and Simulation of High-Speed VLSI Interconnects" (Kluwer, 1994), and a contributor to "Encyclopedia of RF and Microwave Engineering" (Wiley, 2005), "Fundamentals of Nonlinear Behavioral Modeling: Foundations and Applications" (Artech House, 2005), and "Analog Methods for Computer-Aided Analysis and Diagnosis" (Marcel Dekker, 1988). He was Guest Co-Editor for the Special Issue on High-Speed VLSI Interconnects of the International Journal of Analog Integrated Circuits and Signal Processing (Kluwer, 1994), and twice a Guest Editor for the Special Issues on Applications of ANN to RF and Microwave Design of the International Journal of RF and Microwave CAE (Wiley, 1999, 2002). He is a Fellow of the IEEE and a Member of the Professional Engineers Ontario. He is on the editorial boards of the IEEE Transactions on Microwave Theory and Techniques, the International Journal of RF and Microwave CAE, and the International Journal of Numerical Modeling. He is a member of the Technical Committee on CAD of the IEEE MTT Society.
