
Microelectronics Journal 41 (2010) 601–615


journal homepage: www.elsevier.com/locate/mejo

Test-access mechanism optimization for core-based three-dimensional SOCs☆,☆☆

Xiaoxia Wu a, Yibo Chen b, Krishnendu Chakrabarty c,*, Yuan Xie b

a Qualcomm, San Diego, CA 92121, USA
b Department of Computer Science and Engineering, Penn State University, University Park, PA 16802, USA
c Department of Electrical and Computer Engineering, Duke University, Durham, NC 27708, USA

Article info

Article history:
Received 22 January 2010
Received in revised form 18 June 2010
Accepted 22 June 2010
Available online 22 July 2010

Keywords:

Three-dimensional integration

Through silicon via

Test access mechanism

Integer linear programming

Randomized rounding

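0026-2692/$ - see front matter © 2010 Elsevier Ltd. All rights reserved.
doi:10.1016/j.mejo.2010.06.015

☆ A preliminary conference version of this paper was published in Proceedings of the IEEE International Conference on Computer Design, 2008.
☆☆ This work was supported in part by NSF 0905365, 0903432, 0702617 and SRC grants.
* Corresponding author.
E-mail address: [email protected] (K. Chakrabarty).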

Abstract

Embedded cores in a core-based system-on-chip (SOC) are not easily accessible via chip I/O pins. Test-access mechanisms (TAMs) and test wrappers (e.g., the IEEE Standard 1500 wrapper) have been proposed for the testing of embedded cores in a core-based SOC in a modular fashion. We show that such a modular testing approach can also be used for emerging three-dimensional integrated circuits (3D ICs) based on through-silicon vias (TSVs). Core-based SOCs based on 3D IC technology are being advocated as a means to continue technology scaling and overcome interconnect-related bottlenecks. We present an optimization technique for minimizing the post-bond test time for 3D core-based SOCs under constraints on the number of TSVs, the TAM bitwidth, and thermal limits. The proposed optimization method is based on a combination of integer linear programming, LP-relaxation, and randomized rounding. It considers the Test Bus and TestRail architectures, and incorporates wire-length constraints in test-access optimization. Simulation results are presented for the ITC'02 SOC Test Benchmarks, and the test times are compared to those obtained when methods developed earlier for two-dimensional ICs are applied to 3D ICs. The dependence of test time on various 3D parameters (e.g., 3D placement, the number of layers, thermal constraints, and the number of TSVs) is also studied.

© 2010 Elsevier Ltd. All rights reserved.

1. Introduction

Embedded cores are widely used in large system-on-a-chip (SOC) designs. However, since embedded cores are not directly accessible via chip inputs and outputs, special access mechanisms are required to test them at the SOC level. A test-access architecture, also referred to as a test-access mechanism (TAM), provides the means for on-chip test data transport. It can be used to transport test stimuli from the test-pattern source to the core-under-test, and to transport test responses from the core-under-test to the test-response sink [1]. Test wrappers are also important components of the test-access infrastructure in a core-based SOC. Test wrappers form the interface between the embedded core and its environment, and connect the terminals of the embedded core to other cores in the SOC and to the TAM [1].

Modular testing of core-based SOCs allows embedded cores in SOCs to be tested as black boxes, and it facilitates the reuse of test patterns. Therefore, it is an especially attractive test solution for core-based SOCs. Test-wrapper design and TAM optimization are vital for modular testing because they directly impact SOC testing time, and therefore the test cost. The recent IEEE 1500 standard addresses wrapper design, but it leaves TAM optimization to the system integrator [2]. Therefore, TAM optimization and test scheduling at the SOC level have been an active area of research in recent years [3–11]. Prior work has, however, been limited to traditional two-dimensional (2D) integrated circuits (ICs), which are the most widely used technology for IC manufacturing today. With the emergence of three-dimensional (3D) ICs and the increasing likelihood of core-based design being the design style of choice for 3D ICs [12], there is a need to develop modular testing methods for 3D ICs. While leveraging today's solutions, these test-access infrastructure methods must also address TSV limits, core placement, and thermal problems as constraints that are unique to 3D integration.

1.1. Background on 3D integration

With continued technology scaling, interconnect has emerged as the dominant source of circuit delay and power consumption. The reduction of interconnect delay and power consumption is of paramount importance for deep-submicron designs. 3D ICs have recently emerged as a promising means to mitigate these interconnect-related problems [13–15]. Several 3D integration technologies have been explored recently, including wire-bonded, microbump, contactless (capacitive or inductive), and through-silicon-via (TSV) vertical interconnects [16]. TSV 3D integration has the potential to offer the greatest vertical interconnect density, and is therefore the most promising of the vertical interconnect technologies. In 3D ICs that are based on TSV technology, multiple active device layers are stacked together (through wafer stacking or die stacking) with direct vertical TSV interconnects [17]. Fig. 1 shows two 3D IC structures with face-to-face and face-to-back bonding.

Fig. 1. 3D IC structures with face-to-face and face-to-back bonding.

3D ICs offer a number of advantages over traditional two-dimensional (2D) design [17]:

• Shorter global interconnect, because the vertical distance (or the length of TSVs) between two layers is usually in the range of 10–100 μm [17], depending on manufacturing processes.
• Higher performance because of the reduction of average interconnect length, as well as bandwidth improvement due to die stacking.
• Lower interconnect power consumption due to wiring length reduction (reduced capacitance).
• Higher packing density and smaller footprint.
• Support for the implementation of mixed-technology chips: each die can be based on different technologies.

The fabrication of 3D ICs is now viable. For example, in early 2007, IBM announced breakthroughs that enable the transition from horizontal 2D chip layouts to 3D chip stacking. Even though 3D manufacturing is feasible, 3D IC design will not be commercially viable without the support of relevant 3D design-automation tools, which are needed to allow IC designers to efficiently exploit the benefits of 3D technologies. Design and test-automation tools are also needed to carry out appropriate tradeoffs because the number of available TSVs, which directly affects the chip area, is limited.

1.2. Research need and paper contributions

Test techniques and design-for-testability solutions for 3D ICs have remained largely unexplored in the research community, even though experts in industry have identified a number of test challenges related to the lack of probe access for wafers, test access to modules in stacked wafers/dies, thermal concerns, test economics, and new defects arising from unique processing steps such as wafer thinning, alignment, and bonding. Solutions are needed for both pre-bond and post-bond test of 3D stacked chips. In pre-bond test, each die or wafer is tested separately, so that only fault-free parts are bonded and packaged. Pre-bond test increases the overall yield of 3D stacking, but it requires major advances in probe access and DFT methods for partial designs on 3D layers. Post-bond test, in which a 3D chip is tested after stacking, is necessary to reduce defect escape and qualify the assembly process.

In this paper, we address TAM optimization for post-bond test time minimization for 3D SOCs using the wrapper design method from [3]. We do not consider pre-bond testing in this work. While there are obvious similarities between TAM optimization for 2D ICs and that for 3D ICs, additional constraints related to the number of TSVs, placement of cores on different layers, and thermal limits must be taken into account for 3D integration. In 3D SOCs, the various cores are placed on different layers [18]; this approach reduces interconnect length, decreases chip footprint, and allows more efficient mixed-technology integration. However, all the chip pins are at the lowest layer nearest to the package for post-bond test, and due to the limited availability of TSVs for testing, the availability of bitwidth for test access to cores on higher layers is adversely affected. TSVs not only occupy considerable area on a chip (especially with the need for a “keep-out area” for each TSV), but most of them are needed for functional interconnects, power/ground routing, and clock signals. Therefore, the test-access infrastructure needs to be designed to use only a few TSVs.

The rest of the paper is organized as follows: Section 2 uses a simple example to illustrate the problem investigated in this paper. Section 3 presents related prior work on 3D ICs and TAM optimization for 2D ICs. Section 4 describes integer linear programming models, as well as a heuristic method based on LP-relaxation and randomized rounding. It addresses both the Test Bus and TestRail architectures, and constraints related to wire length. Section 5 presents experimental results on benchmark SOCs. Finally, Section 6 presents conclusions and outlines directions for future work.

2. Problem formulation

2.1. Motivational example

Fig. 2 shows an example to illustrate the motivation of our research. A total of five (wrapped) cores are shown in a 3D IC. Four cores are placed on Layer 0, while the fifth core is placed on Layer 1. Suppose that a total of $2W$ channels (TAM wires) are available from the tester to access the cores, out of which $W$ channels must be used for transporting test stimuli and the remaining $W$ channels must be used for collecting the test responses. We also have an upper limit on the number of TSVs that can be used for the test-access infrastructure. The TAM design needs to be optimized to minimize the test application time. Therefore, the objective here is to allocate wires (bitwidths) to the different TAMs used to access the cores. Fig. 2 shows two possible TAM designs for the same 3D SOC design:

• Approach 1: In Fig. 2(a), we have two TAMs of width $w_1$ and $w_2$, respectively, where $w_1 + w_2 = W$ and the core on Layer 1 is accessed using the second TAM. A total of $2 \cdot w_1$ TSVs are needed for this test-infrastructure design.
• Approach 2: Fig. 2(b) presents an alternative design in which three TAMs are used and the core on Layer 1 is accessed using the third TAM ($v_1 + v_2 + v_3 = W$).

Note that both $2 \cdot w_1$ (in Fig. 2(a)) and $2 \cdot v_3$ (in Fig. 2(b)) are no more than the number of TSVs dedicated for test access. In Approach 1, there are only two TAMs and each TAM can be wider, but TAM 1 has to test three cores sequentially. In Approach 2, there are three TAMs and each TAM may be narrower, but each TAM connects at most two cores. It is not obvious to the designer which approach is better in terms of test time. Therefore, optimization methods and design tools are needed to determine a test-access architecture (e.g., number of TAMs, width of the TAMs, assignment of cores to TAMs, etc.) that leads to the minimum test time under constraints on the number of TSVs.

Fig. 2. Two solutions for TAM optimization in a 3D IC.

2.2. Problem statement

The general problem of 3D SOC test planning includes the design/optimization of a test-access architecture, optimization of core wrappers, and test scheduling. The goal is to minimize the testing time under limits on the number of TSVs and the SOC-level TAM width. Additional constraints are also imposed by thermal considerations that are of paramount importance in 3D integration. The TAM optimization problem 3DPPAW (PAW stands for Partition of TAM width, Assignment of cores to each TAM, and Wrapper design for each core) that we address in this paper is as follows. Given the test set parameters (number of test patterns, number of I/Os, number of scan chains, and scan-chain lengths) for the embedded cores in the SOC, the total TAM width, 3D-technology constraints (maximum number of TSVs, thermal limits, etc.), and the 3D placement for each core, determine the partition of the total TAM width among the TAM partitions, an optimal assignment of cores to each TAM partition, and an optimal wrapper design for each core, such that the overall SOC testing time is minimized.

In order to solve 3DPPAW, we first examine two simpler problems, in the same way that the general TAM optimization problem was decomposed into a progression of simpler problems in [3]. The first problem 3DPW (W stands for Wrapper design for each core) addresses test-wrapper design, which is identical to 2D wrapper design [3] since we assume that any given core is placed on only one layer (i.e., a core is not partitioned across multiple layers). We choose coarse-grain partitioning (each core is placed on a single layer) instead of fine-grain partitioning (a core is partitioned across several layers) because fine-grain partitioning results in a larger number of TSVs and increases design and testing complexity. The second problem 3DPAW (AW stands for Assignment of cores to each TAM, and Wrapper design for each core) addresses 3D TAM optimization; it considers a predefined TAM width for each TAM partition. We formulate the relevant optimization problems and develop integer linear programming (ILP) models for both 3DPAW and 3DPPAW. These two optimization problems have been shown in [3] to be NP-hard [19]. Therefore, we also develop a heuristic technique based on a combination of ILP modeling, LP-relaxation, and randomized rounding [20] to efficiently obtain near-optimal results.

3. Related prior work

A number of test-access architectures and TAM optimization methods have been proposed in the literature. These include multiplexed access [21], partial isolation rings [22], core transparency [23], dedicated test bus [24], TestRail architecture [25,26], etc. TAM optimization and test scheduling problems have been tackled using ILP [3,27], rectangle packing [10,28], iterative refinement [4,29], and other heuristics [7,30]. Wire-length and test time minimization has also been studied [31,32]. However, all these methods are targeted at traditional 2D technologies.

3D technologies have attracted considerable attention in the past few years. Research has been focused on 3D fabrication, 3D electronic design automation (EDA), and 3D system architectures [13–15]. Among all EDA challenges for 3D IC design, tools and methodologies for 3D IC testing are regarded as the “No. 1 challenge” [33]. However, research on testing for 3D ICs is still in its infancy. It is only recently that progress has been reported on the testing of 3D ICs. For example, Lewis et al. have proposed a scan-island based pre-bond test method to facilitate the testability of die-stacked microprocessors [34]. Wu et al. have proposed several 3D scan-chain design techniques [35]. A recently developed TAM optimization technique does not impose any limits on the number of TSVs used for the TAM, but considers pre-bond test considerations and wire-length limits [36]. A drawback of this approach is that since it does not limit the number of TSVs for the TAM, it ignores constraints related to the “keep-out area” that is associated with a TSV. Recent work has also been reported on the testing of TSVs [37] and an optimization method for reusing the TAM for both pre-bond and post-bond test [38].

4. Optimization of 3D TAMs

In this section, we first assume the “Test Bus” model for TAM design. We assume that the $B$ TAMs on the 3D SOC are independent of each other; however, the cores on each TAM are tested in sequential order. In Section 4.6, we show that the model can be easily modified for the TestRail architecture. We do not address the design of hierarchical TAMs [39,40] or test scheduling for hierarchical SOCs [40]. The SOC hierarchy is flattened for the purpose of TAM design, and hierarchical cores are considered to be at the same design level in test mode. We first review the wrapper-design problem 3DPW from [41] and [3]. Next, we describe ILP models to solve the 3DPAW and 3DPPAW problems, and show how randomized rounding can be used to solve these problems efficiently.

4.1. Test-wrapper design

A test wrapper is a layer of design-for-test logic that connects a TAM to a core for the purpose of testing [1]. An embedded core typically contains multiple functional I/Os as well as several internal scan chains. To perform test-width adaptation, wrapper scan chains are constructed by connecting core I/Os with internal scan chains. The number of wrapper scan chains thus constructed is equal to the TAM width provided to the core; hence each wrapper scan chain is assigned to a unique TAM line. Thus the test-data width (number of core terminals) of the core is adapted to the TAM width. Balanced wrapper scan chains are those that are as equal in length to each other as possible. Balanced wrapper scan chains are important because the number of clock cycles to scan in (out) a test pattern to (from) a core is a function of the length of the longest wrapper scan-in (scan-out) chain. Let $s_i$ ($s_o$) be the length of the longest wrapper scan-in (scan-out) chain for a core. The time required to apply the entire test set to the core is then given by $T = (1 + \max(s_i, s_o)) \cdot p + \min(s_i, s_o)$, where $p$ is the number of test patterns [41]. Note that $T$ decreases as both $s_i$ and $s_o$ are reduced, i.e., as the wrapper scan-in (and scan-out) chains become more equal in length.
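The test-time expression above can be evaluated directly. The following minimal sketch is illustrative only; the function name `core_test_time` and the sample values are assumptions and do not come from the benchmark data.

```python
def core_test_time(s_i, s_o, p):
    """Clock cycles to apply p patterns to a core whose longest wrapper
    scan-in chain has length s_i and longest scan-out chain has length s_o.

    Implements T = (1 + max(s_i, s_o)) * p + min(s_i, s_o).
    """
    return (1 + max(s_i, s_o)) * p + min(s_i, s_o)


# Hypothetical core: shortening the longer chain from 13 to 8 cuts the test time.
print(core_test_time(13, 13, 10))
print(core_test_time(8, 5, 10))
```

The dominant term is the longest chain multiplied by the pattern count, which is why balancing the wrapper scan chains matters more than the total number of scan elements.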

The test-wrapper design problem PW is defined as follows [41,3]: Given a core with $n$ functional inputs, $m$ functional outputs, $sc$ internal scan chains of lengths $l_1, l_2, \ldots, l_{sc}$, respectively, and TAM width $k$, assign the $n + m + sc$ wrapper scan chain elements to $k' \le k$ wrapper scan chains such that (i) $\max\{s_i, s_o\}$ is minimized, where $s_i$ ($s_o$) is the length of the longest wrapper scan-in (scan-out) chain, and (ii) $k'$ is minimum, subject to priority (i).

We adopt the approximation algorithm based on the best fit decreasing (BFD) heuristic [3,19] to solve PW efficiently. The algorithm has three main parts: (i) partition the internal scan chains among a minimum number of wrapper scan chains to minimize the longest wrapper scan chain length, (ii) assign the functional inputs to the wrapper scan chains created in part (i), and (iii) assign the functional outputs to the wrapper scan chains created in part (i).
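The three parts of the heuristic can be sketched as follows. This is a simplified illustration written for this description, not the exact procedure of [3]: the function name `design_wrapper`, the tie-breaking rules, and the sample core are all assumptions.

```python
def design_wrapper(scan_chains, n_inputs, n_outputs, k):
    """Return (s_i, s_o) for a core on a TAM of width k (BFD-style sketch).

    scan_chains: lengths of the internal scan chains.
    """
    chains = [0] * k
    # Part (i): place internal scan chains, longest first. Prefer the fullest
    # wrapper chain that does not grow beyond the current longest one (best
    # fit); otherwise fall back to the currently shortest wrapper chain.
    for length in sorted(scan_chains, reverse=True):
        longest = max(chains)
        fits = [b for b in range(k) if chains[b] + length <= longest]
        if fits:
            target = max(fits, key=lambda b: chains[b])
        else:
            target = min(range(k), key=lambda b: chains[b])
        chains[target] += length

    # Part (ii): each functional input extends the shortest scan-in chain.
    scan_in = list(chains)
    for _ in range(n_inputs):
        scan_in[min(range(k), key=lambda b: scan_in[b])] += 1

    # Part (iii): each functional output extends the shortest scan-out chain.
    scan_out = list(chains)
    for _ in range(n_outputs):
        scan_out[min(range(k), key=lambda b: scan_out[b])] += 1

    return max(scan_in), max(scan_out)


# Hypothetical core: four scan chains, two inputs, two outputs.
print(design_wrapper([8, 8, 4, 4], n_inputs=2, n_outputs=2, k=2))
print(design_wrapper([8, 8, 4, 4], n_inputs=2, n_outputs=2, k=4))
```

In the second call one wrapper chain receives no internal scan chain, which mirrors the goal of using as few wrapper chains as possible without lengthening the longest one.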

4.2. ILP model for 3DPAW

We next present the ILP model for solving 3DPAW. This model serves as the basis for the ILP model for the more general problem 3DPPAW. The goal in ILP is to minimize a linear objective function on a set of integer variables, while satisfying a set of linear constraints [42]. A typical ILP model can be described as follows:

Minimize $Cx$ subject to $Ax \le B$, where $x \ge 0$. In this model, $x$ represents a vector of variables, $C$ is a cost vector, $A$ is a constraint matrix, and $B$ refers to a vector of constants.

The 3DPAW problem is formally defined as follows: Given $N$ cores that have been placed in a 3D floorplan, $B$ TAM partitions of bitwidths $w_1, w_2, \ldots, w_B$, respectively, and the maximum number of TSVs available for the TAM, determine an assignment of cores to TAM partitions and a wrapper design for each core, such that the testing time is minimized.

Let $T_i(w_j)$ be the number of clock cycles needed to test Core $i$ if it is assigned to TAM partition $j$. The testing time is calculated as follows [3]: $T_i(w_j) = (1 + \max(s_i, s_o)) \cdot p_i + \min(s_i, s_o)$, where $p_i$ is the number of test patterns for Core $i$ (which is known) and $s_i$ ($s_o$) is the length of the longest wrapper scan-in (scan-out) chain obtained from the Design Wrapper procedure. As in [3], we define the binary variable $x_{ij}$ as follows:

\[ x_{ij} = \begin{cases} 1 & \text{if Core } i \text{ is assigned to TAM } j \\ 0 & \text{otherwise} \end{cases} \tag{1} \]

The time needed to test all cores on TAM partition $j$ is given by $\sum_{i=1}^{N} T_i(w_j) \cdot x_{ij}$. Since test data can be transferred on the different TAM partitions in parallel, the SOC testing time is simply $\max_{1 \le j \le B} \sum_{i=1}^{N} (T_i(w_j) \cdot x_{ij})$.

Next we introduce constraints that are imposed by 3D IC technology. Even though the various cores can be placed on different layers, all the I/O pins in a 3D IC are only at the bottom layer, i.e., Layer 0 [17]. We therefore assume that all the I/Os for the TAM are also in Layer 0. An integer $L_i$ is introduced to represent the layer assignment for Core $i$, e.g., $L_1 = 2$ means that Core 1 is assigned to Layer 2. We also add a binary variable $z_{ijk}$, which is 1 only if both Core $i$ and Core $j$ are in TAM partition $k$. We use this variable to calculate the number of TSVs that are needed for 3D TAM design. It can be easily seen that $z_{ijk} = x_{ik} \cdot x_{jk}$. This nonlinear equation can be linearized by treating $z_{ijk}$ as an independent variable and adding two new constraints:

(1) $x_{ik} + x_{jk} - z_{ijk} \le 1$
(2) $x_{ik} + x_{jk} - 2 \cdot z_{ijk} \ge 0$.

Fig. 3. The mathematical programming model for 3DPAW.
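That the two linear constraints reproduce the product of the two binary variables can be checked by enumeration. The short sketch below does this for all four input combinations; the helper name `feasible_z` is an assumption introduced for illustration.

```python
from itertools import product


def feasible_z(x_ik, x_jk):
    """Binary z values allowed by the two linearization constraints."""
    return [z for z in (0, 1)
            if x_ik + x_jk - z <= 1          # constraint (1)
            and x_ik + x_jk - 2 * z >= 0]    # constraint (2)


# For every 0/1 combination, exactly one z is feasible: the product x_ik * x_jk.
for x_ik, x_jk in product((0, 1), repeat=2):
    assert feasible_z(x_ik, x_jk) == [x_ik * x_jk]
    print(x_ik, x_jk, feasible_z(x_ik, x_jk))
```

Constraint (1) forces the new variable to 1 when both inputs are 1, and constraint (2) forces it to 0 when either input is 0. The same pattern applies to any product of two binary variables.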
We next calculate the minimum number of TSVs needed to implement TAM partition $k$.

Theorem 1. The minimum number of TSVs, $v_k$, needed to implement TAM partition $k$ of width $w_k$ is given by

\[ v_k = 2 \cdot w_k \cdot \max_{1 \le i,j \le N} \left\{ z_{ijk} \cdot \max_{1 \le i,j \le N} \{ (L_i - L_j), L_i, L_j \} \right\} \]

Proof. We prove the theorem using the principle of mathematical induction. Let $L_{ij} = \max_{1 \le i,j \le N} ((L_i - L_j), L_i, L_j)$. We therefore have $v_k = 2 \cdot w_k \cdot \max_{1 \le i,j \le N} (z_{ijk} \cdot L_{ij})$.

Induction basis: Suppose all cores assigned to TAM partition $k$ are on Layer 0. No TSVs are needed for this case since for all $i$ and $j$, $L_{ij} = 0$, so we get $v_k = 0$. Next suppose that all the cores assigned to TAM partition $k$ are on Layer 1. Now we have $L_{ij} = 1$ for all $i$ and $j$, and therefore $v_k = 2 \cdot w_k \cdot 1 = 2 \cdot w_k$, which is valid since TAM partition $k$ is $w_k$ bits wide and access to Layer 1 needs $2 \cdot w_k$ bits ($w_k$ bits for input and $w_k$ bits for output). Finally, suppose some cores assigned to TAM partition $k$ are on Layer 0 while others are on Layer 1. There exists a pair $i, j$ such that $z_{ijk} = 1$, therefore $L_{ij} = 1$ and $v_k = 2 \cdot w_k$.

Induction hypothesis: Suppose the equation holds when there are $m$ layers (numbered 0 to $m-1$).

Inductive step: We have to consider two cases:

• Case 1: If no core on Layer $m$ is connected to TAM partition $k$, it is obvious that the equation for $v_k$ holds.
• Case 2: There is at least one core on Layer $m$ that is connected to TAM partition $k$. The number of TSVs is given by

\[ v_k = 2 \cdot w_k \cdot \max_{1 \le i,j \le N} \{ z_{ijk} \cdot L_{ij} \} + 2 \cdot w_k \tag{2} \]

\[ = 2 \cdot w_k \cdot \max_{1 \le i,j \le N} \{ z_{ijk} \cdot (L_{ij} + 1) \} \tag{3} \]

\[ = 2 \cdot w_k \cdot \max_{1 \le i,j \le N} ( z_{ijk} \cdot \tilde{L}_{ij} ) \tag{4} \]

where $\tilde{L}_{ij} = L_{ij} + 1 = \max_{1 \le i,j \le N} \{ (\tilde{L}_i - \tilde{L}_j), \tilde{L}_i, \tilde{L}_j \}$. The parameters $\tilde{L}_i$ and $\tilde{L}_j$ denote the fact that we are moving from $m$ layers to $m+1$ layers.

Therefore, the number of TSVs for chain $k$ can be represented by

\[ v_k = 2 \cdot w_k \cdot \max_{1 \le i,j \le N} \left( z_{ijk} \cdot \max_{1 \le i,j \le N} ((L_i - L_j), L_i, L_j) \right) \]

□

The total number of TSVs for the TAM architecture is given by $\sum_{k=1}^{B} v_k$, which must not exceed a predefined limit. We can also add an upper limit on the number of TSVs for each TAM partition. The complete mathematical programming model for 3DPAW is shown in Fig. 3. As a final step, the min–max objective and max constraints can be linearized as in [3]. The number of variables for this model is $B + N \cdot B + N^2 \cdot B = O(N^2 \cdot B)$ and the number of constraints is $N + 2 \cdot B + 2 \cdot N^2 \cdot B = O(N^2 \cdot B)$.
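For very small instances, the 3DPAW model can be illustrated by exhaustive enumeration instead of an ILP solver. The sketch below is not the ILP formulation itself: it enumerates every core-to-TAM assignment, evaluates the TSV count of Theorem 1 (reading $L_{ij}$ as a per-pair quantity), and keeps the assignment with the smallest SOC test time. The function names, the test-time table `T`, and the sample instance are hypothetical.

```python
from itertools import product


def tsv_count(width, layers_on_tam):
    """TSVs for one TAM partition: 2 * w_k * max over pairs of L_ij.

    Only cores on this TAM are passed in, i.e., pairs with z_ijk = 1.
    """
    worst = 0
    for li in layers_on_tam:
        for lj in layers_on_tam:
            worst = max(worst, li - lj, li, lj)
    return 2 * width * worst


def solve_3dpaw_bruteforce(T, layers, widths, tsv_limit):
    """T[i][k]: test time of core i on TAM k; layers[i]: layer of core i."""
    n_cores, n_tams = len(T), len(widths)
    best = None
    for assign in product(range(n_tams), repeat=n_cores):
        tsvs = sum(
            tsv_count(widths[k],
                      [layers[i] for i in range(n_cores) if assign[i] == k])
            for k in range(n_tams))
        if tsvs > tsv_limit:
            continue  # violates the limit on the total number of TSVs
        # Cores on one Test Bus are tested sequentially; TAMs run in parallel.
        soc_time = max(
            sum(T[i][k] for i in range(n_cores) if assign[i] == k)
            for k in range(n_tams))
        if best is None or soc_time < best[0]:
            best = (soc_time, assign)
    return best


# Hypothetical instance: three cores, the third on Layer 1; TAM widths 4 and 2.
T = [[10, 20], [10, 20], [30, 60]]
layers = [0, 0, 1]
widths = [4, 2]
print(solve_3dpaw_bruteforce(T, layers, widths, tsv_limit=4))
print(solve_3dpaw_bruteforce(T, layers, widths, tsv_limit=8))
```

With the tighter limit, the Layer 1 core can only reach the narrow TAM, which lengthens the overall test. The enumeration grows as $B^N$ and is therefore only a didactic aid.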

4.3. ILP model for 3DPPAW

In the 3DPAW problem, the TAM width for each TAM chain is predefined. However, appropriate sizing of each TAM partition results in lower test time for an SOC [3]. Therefore, we next address the more general problem 3DPPAW, as defined in Section 2.

From Theorem 1 in [3], the width of each TAM partition does not need to exceed an upper bound $k_{max}$ for any core in the SOC. We denote this upper bound on the width of the individual TAM partitions as $w_{max}$. If the width of a TAM partition is greater than $w_{max}$, there is no further decrease in testing time. The objective of the 3DPPAW problem is therefore: Minimize $\max_j \sum_{i=1}^{N} T_i(w_j) \cdot x_{ij}$.

Since both $w_j$ and $x_{ij}$ are variables, the objective function needs to be linearized. We add a new binary variable $d_{jk}$ ($1 \le j \le B$, $1 \le k \le w_{max}$), defined as

\[ d_{jk} = \begin{cases} 1 & \text{if TAM partition } j \text{ is } k \text{ bits wide} \\ 0 & \text{otherwise} \end{cases} \tag{5} \]

Therefore, $T_i(w_j)$ is described as $T_i(w_j) = \sum_{k=1}^{w_{max}} d_{jk} \cdot T_i(k)$. In addition, two other constraints are included:

(1) $w_j = \sum_{k=1}^{w_{max}} k \cdot d_{jk}$, i.e., a TAM partition can have a width between 1 and $w_{max}$.
(2) $\sum_{k=1}^{w_{max}} d_{jk} = 1$, i.e., each TAM partition has exactly one width.

The objective function can therefore be written as

\[ \max_j \sum_{i=1}^{N} \sum_{k=1}^{w_{max}} d_{jk} \cdot T_i(k) \cdot x_{ij} \]
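The role of the one-hot width variables can be made concrete with a small evaluation routine. The sketch below assumes a look-up table `T` with `T[i][k-1]` holding $T_i(k)$; the function names and sample data are hypothetical.

```python
def tam_widths(d):
    """Recover w_j = sum_k k * d_jk from the one-hot rows of d."""
    return [sum((k + 1) * bit for k, bit in enumerate(row)) for row in d]


def objective(T, x, d):
    """Evaluate max_j sum_i sum_k d_jk * T_i(k) * x_ij.

    T[i][k-1]: test time of core i at width k (look-up table).
    x[i][j]:   1 if core i is assigned to TAM j, else 0.
    d[j][k-1]: 1 if TAM j is k bits wide, else 0 (exactly one 1 per row).
    """
    for row in d:
        assert sum(row) == 1, "each TAM partition must have exactly one width"
    n_cores, n_tams, w_max = len(x), len(d), len(d[0])
    return max(
        sum(d[j][k] * T[i][k] * x[i][j]
            for i in range(n_cores) for k in range(w_max))
        for j in range(n_tams))


# Hypothetical data: two cores, two TAMs, widths 1..3.
T = [[40, 20, 15], [30, 16, 12]]
x = [[1, 0], [0, 1]]          # core 0 on TAM 0, core 1 on TAM 1
d = [[0, 1, 0], [1, 0, 0]]    # TAM 0 is 2 bits wide, TAM 1 is 1 bit wide
print(tam_widths(d), objective(T, x, d))
```

Because exactly one $d_{jk}$ is 1 per partition, the inner sum over $k$ simply selects the table entry for the chosen width, which is what allows the width-dependent test time to enter the model as constants.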

The testing time $T_i(k)$ is obtained from the Design Wrapper procedure [3] and stored in a look-up table for use in the ILP models. Next, we linearize the nonlinear term $d_{jk} \cdot x_{ij}$ by introducing a new binary variable $y_{ijk}$ and two constraints:

(1) $x_{ij} + d_{jk} - y_{ijk} \le 1$
(2) $x_{ij} + d_{jk} - 2 \cdot y_{ijk} \ge 0$.


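The number of TSVs for chain $k$ is as follows:

\[ v_k = 2 \cdot w_k \cdot \max_{1 \le i,j \le N} \left\{ z_{ijk} \cdot \max_{1 \le i,j \le N} ((L_i - L_j), L_i, L_j) \right\} \]

Since $w_k$ is also a variable and it is given by $w_k = \sum_{m=1}^{w_{max}} m \cdot d_{km}$, a new binary variable $n_{ijkm}$ is introduced to replace $d_{km} \cdot z_{ijk}$. Two more constraints are added:

(1) $d_{km} + z_{ijk} - n_{ijkm} \le 1$
(2) $d_{km} + z_{ijk} - 2 \cdot n_{ijkm} \ge 0$.

Now the number of TSVs becomes

\[ v_k = \sum_{m=1}^{w_{max}} 2 \cdot m \cdot \max_{1 \le i,j \le N} \{ n_{ijkm} \cdot L_{ij} \}, \quad 1 \le k \le B \]

where $L_{ij}$ is defined as $L_{ij} = \max_{1 \le i,j \le N} \{ (L_i - L_j), L_i, L_j \}$. The complete mathematical programming model for the 3DPPAW problem is shown in Fig. 4. The number of variables for this model is $B + B \cdot W + N \cdot B + N^2 \cdot B + N^2 \cdot B \cdot W = O(N^2 \cdot B \cdot W)$ and the number of constraints is $1 + 4 \cdot B + N + 2 \cdot N^2 \cdot B + 2 \cdot N^2 \cdot B \cdot W = O(N^2 \cdot B \cdot W)$.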
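As with 3DPAW, a tiny 3DPPAW instance can be explored by enumeration to see how width partitioning interacts with the TSV limit. The sketch below is a didactic stand-in for the ILP model, under the assumptions that the partition widths sum exactly to the total width $W$ and that the highest layer reached by a TAM determines its TSV count (equivalent to Theorem 1 for non-negative layer indices). All names and data are hypothetical.

```python
from itertools import product


def compositions(total, parts, w_max):
    """Yield tuples of `parts` positive integers <= w_max that sum to `total`."""
    if parts == 1:
        if 1 <= total <= w_max:
            yield (total,)
        return
    for first in range(1, min(w_max, total - (parts - 1)) + 1):
        for rest in compositions(total - first, parts - 1, w_max):
            yield (first,) + rest


def solve_3dppaw_bruteforce(T, layers, W, B, tsv_limit):
    """T[i][k-1]: test time of core i at TAM width k; layers[i]: its layer."""
    n_cores, w_max = len(T), len(T[0])
    best = None
    for widths in compositions(W, B, w_max):
        for assign in product(range(B), repeat=n_cores):
            tsvs = sum(
                2 * widths[k] * max(
                    (layers[i] for i in range(n_cores) if assign[i] == k),
                    default=0)
                for k in range(B))
            if tsvs > tsv_limit:
                continue
            soc_time = max(
                sum(T[i][widths[k] - 1]
                    for i in range(n_cores) if assign[i] == k)
                for k in range(B))
            if best is None or soc_time < best[0]:
                best = (soc_time, widths, assign)
    return best


# Hypothetical instance: total width 4 split over two TAMs, widths 1..3.
T = [[40, 20, 14], [40, 20, 14], [60, 30, 20]]
layers = [0, 0, 1]
print(solve_3dppaw_bruteforce(T, layers, W=4, B=2, tsv_limit=2))
print(solve_3dppaw_bruteforce(T, layers, W=4, B=2, tsv_limit=6))
```

In this toy instance, a limit of two TSVs forces the Layer 1 core onto a 1-bit TAM, whereas a limit of six allows a balanced solution with a shorter test time.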

4.4. Thermal-aware 3DPPAW

Power and thermal issues have become a primary concern in traditional 2D IC testing. The design under test (DUT) can consume significantly more power in test mode than in normal operation. Consequently, heat dissipation during test mode becomes a major problem since the temperature depends on the power density. The stacking of multiple active layers in 3D design leads to even higher power densities than for the 2D counterpart, thereby exacerbating thermal problems. With increased temperature, the chip under test may be damaged permanently. In addition, the test results may be invalidated due to changes in the path delay caused by “hotspots”. Therefore, it is essential to conduct thermal-aware 3D TAM optimization, maintaining test time requirements without violating a temperature threshold.

Fig. 4. The mathematical programming model for 3DPPAW.

If several cores with high test power consumption are assigned to different TAM partitions, they may be tested at the same time depending on the test scheduling, causing very high test power dissipation. There are two ways to tackle this problem. First, “post-processing” can be done to reorder the cores in test scheduling to ensure that the parallelism is limited. During test scheduling, the order in which the cores are tested is changed. In this way, cores with high power consumption are tested in parallel to a lesser extent. The second solution is to add constraints in the TAM design to limit test parallelism of “high-power” cores. Since we do not directly consider test scheduling in our model, the second solution is easier to implement. The constraints can be added to the 3D TAM optimization problem 3DPPAW. We count the number of pairs of cores that are assigned to different TAM partitions to estimate the amount of parallelism. A new binary constant $P_{ij}$ is added: if Cores $i$ and $j$ are both high-power cores, then $P_{ij} = 1$. We also set an upper limit $P_r$ on the number of pairs of cores that can be tested in parallel. Therefore, we have a new constraint:

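\[ \sum_{i=1,\,j=1}^{N} P_{ij} \cdot x_{ik_1} \cdot x_{jk_2} \le 2 \cdot P_r \]

where $1 \le k_1, k_2 \le B$, $k_1 \ne k_2$. As before, the nonlinear equation $q_{ijk_1k_2} = x_{ik_1} \cdot x_{jk_2}$ can be linearized by treating $q_{ijk_1k_2}$ as an independent variable and adding two new constraints:

(1) $x_{ik_1} + x_{jk_2} - q_{ijk_1k_2} \le 1$
(2) $x_{ik_1} + x_{jk_2} - 2 \cdot q_{ijk_1k_2} \ge 0$.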
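The thermal constraint can be checked for a given assignment by counting high-power core pairs that sit on different TAM partitions. The sketch below assumes that the sum is taken over all $k_1 \ne k_2$, so that each unordered pair is counted twice (which would explain the factor of 2 on the right-hand side); this reading, the function names, and the sample data are assumptions.

```python
def parallel_high_power_pairs(assign, high_power):
    """Count unordered pairs of high-power cores placed on different TAMs.

    assign[i]:  TAM partition of core i.
    high_power: indices of the cores considered high-power (P_ij = 1 for
                every pair drawn from this set).
    """
    cores = sorted(high_power)
    return sum(
        1
        for a in range(len(cores))
        for b in range(a + 1, len(cores))
        if assign[cores[a]] != assign[cores[b]])


def satisfies_thermal_limit(assign, high_power, pair_limit):
    """True if the ordered-pair count stays within 2 * P_r."""
    ordered_pairs = 2 * parallel_high_power_pairs(assign, high_power)
    return ordered_pairs <= 2 * pair_limit


# Hypothetical assignment of four cores to three TAMs; cores 0, 1, 2 are hot.
assign = [0, 1, 1, 2]
print(parallel_high_power_pairs(assign, {0, 1, 2}))
print(satisfies_thermal_limit(assign, {0, 1, 2}, pair_limit=1))
```

Placing two high-power cores on the same Test Bus removes that pair from the count, since cores on one bus are tested sequentially.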

4.5. Randomized rounding

While the ILP models discussed in this section can be used to optimally solve 3D TAM optimization problems, they do not scale well for large SOC designs. ILP problems are known to be NP-hard [19]; however, linear programming (LP) problems can be solved optimally in polynomial time [42]. Therefore, we adopt the method of LP-relaxation and combine it with randomized rounding [20]. In LP-relaxation, the binary variables are relaxed to real-valued variables such that the solution to the relaxed LP problem provides a lower bound for the cost function (test time in this case). However, the fractional values obtained for the $x_{ij}$ variables are inadmissible in practice; these variables must be mapped to either 0 or 1. For this purpose, we use the method of randomized rounding.

The randomized rounding technique for ILP problems consists of three steps. The first step is to solve the corresponding LP problem, fixing all $x_{ij}$ variables that are assigned to 1. The second step is to randomly pick a variable from the set of variables with fractional values and assign it to 1 with a probability equal to the fractional value. For example, if the solution to the LP problem assigns the value 0.4 to a variable, we generate a random number between 0 and 1. If this random number is less than or equal to 0.4, the variable is set to 1. Otherwise it is set to 0. In the third step, the LP problem is solved again, and the randomized rounding step is repeated until all variables are set to either 0 or 1.

Note that in Step 2 of the randomized rounding technique, the violation of constraints must be prevented in order to ensure that the LP problem remains feasible. For example, in the 3DPAW problem, we randomly pick a variable $x_{ij}$ with a fractional value, and assign the value 1 to it according to the random number that we generate. Before the assignment, we check whether $x_{ik}$ ($k \ne j$) is already assigned to 1, i.e., whether Core $i$ has already been assigned to another TAM partition. If this is the case, we can only assign 0 to $x_{ij}$; otherwise the first constraint is violated, and the LP problem becomes infeasible. Therefore, we need to guarantee that there is no constraint violation with existing fixed-variable values.
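The three steps can be sketched as a loop around an LP solver. In the sketch below the solver is abstracted as a callback `solve_relaxation`, which is a hypothetical interface rather than a real library call, and `toy_relaxation` is a stand-in that merely spreads each core evenly over its remaining TAMs so that the loop can be run without an LP package.

```python
import random

EPS = 1e-9


def randomized_rounding(solve_relaxation, n_tams, rng):
    """Round fractional x_ij values to 0/1, one variable per LP re-solve.

    solve_relaxation(fixed) must return {(i, j): value in [0, 1]} and
    respect the entries already present in `fixed`.
    """
    fixed = {}

    def assign(i, j):
        # Fixing x_ij = 1 also fixes x_ik = 0 for every other TAM k, so the
        # "one TAM per core" constraint cannot be violated later.
        for k in range(n_tams):
            fixed[(i, k)] = 1 if k == j else 0

    while True:
        solution = solve_relaxation(fixed)
        # Step 1: keep every variable that the LP already sets to 1.
        for (i, j), value in solution.items():
            if (i, j) not in fixed and value >= 1 - EPS:
                assign(i, j)
        fractional = [key for key, value in solution.items()
                      if key not in fixed and EPS < value < 1 - EPS]
        if not fractional:
            break
        # Step 2: round one fractional variable up with probability equal
        # to its LP value, otherwise fix it to 0.
        i, j = rng.choice(fractional)
        if rng.random() <= solution[(i, j)]:
            assign(i, j)
        else:
            fixed[(i, j)] = 0
        # Step 3: loop back and re-solve with the updated fixed values.
    return {i: j for (i, j), value in fixed.items() if value == 1}


def toy_relaxation(fixed, n_cores=3, n_tams=2):
    """Stand-in for an LP solver (assumption, for demonstration only)."""
    solution = {}
    for i in range(n_cores):
        free = [j for j in range(n_tams) if (i, j) not in fixed]
        for j in range(n_tams):
            solution[(i, j)] = (fixed[(i, j)] if (i, j) in fixed
                                else 1.0 / len(free))
    return solution


assignment = randomized_rounding(toy_relaxation, n_tams=2, rng=random.Random(0))
print(assignment)
```

Each pass fixes at least one variable, so the loop terminates after at most $N \cdot B$ LP solves. In practice the callback would wrap the relaxed 3DPAW or 3DPPAW model.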

4.6. TestRail architecture

Previous subsections have focused on the Test Bus architecture, in which modules connected to the same test bus can only be tested sequentially. Therefore, the overall SOC test time is the maximum test time over all the test buses. Another test architecture that has been widely advocated for core-based SOC testing is called TestRail [25,26]. In this architecture, modules connected to the same TestRail can be tested simultaneously as well as sequentially. The advantage of the TestRail architecture over the Test Bus architecture is that it allows module-external testing, i.e., testing of the circuitry and wires between the modules. The extension of the optimization framework to TestRail is straightforward. Assuming that the modules connected to the same TestRail can be tested simultaneously, the test time on a TAM partition for the TestRail architecture is the maximum test time of the individual modules, $\max_{1 \le j \le B,\, 1 \le i \le N} (T_i(w_j) \cdot x_{ij})$, on that partition plus the bypass time and load pattern latency. The constraints are the same as for the Test Bus architecture. We have evaluated 3D test-access optimization for the TestRail architecture, and results are presented in Section 5.6.

Fig. 5. New constraints for wire-length incorporation in the optimization flow.
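The difference between the two per-partition test-time models can be summarized in a few lines. In this sketch the bypass time and load-pattern latency are lumped into a single `overhead` parameter, which is a simplifying assumption; the function names and numbers are hypothetical.

```python
def tam_time_test_bus(core_times):
    """Test Bus: cores on the same bus are tested one after another."""
    return sum(core_times)


def tam_time_testrail(core_times, overhead=0):
    """TestRail (simultaneous testing assumed): longest core plus overhead."""
    return max(core_times, default=0) + overhead


def soc_time(per_tam_core_times, tam_time):
    """TAM partitions operate in parallel, so the slowest one dominates."""
    return max(tam_time(times) for times in per_tam_core_times)


# Hypothetical SOC with two TAM partitions.
partitions = [[10, 30], [20]]
print(soc_time(partitions, tam_time_test_bus))
print(soc_time(partitions, tam_time_testrail))
```

Only the per-partition expression changes between the two architectures, which is why the constraint set of the optimization model carries over unchanged.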

4.7. Test-application time and wire-length co-optimization

In SOC test-architecture design, wire-length minimization is also an important objective, besides test time minimization [31,32]. In this section, we discuss how to incorporate wire length into our optimization model for the Test Bus architecture. Our wire-length cost model extends [32] by including the vertical interconnect length due to TSVs. The placement of cores on the various layers and within a layer is obtained from a thermal-aware 3D floorplanning and placement tool [15]. The distance between two modules is calculated in terms of the Manhattan distance, including vertical TSV length if these two modules are in different layers.

Our objective is to minimize the test time as well as the total wire length connecting the cores to form the TAM partitions (chains). Assume that the number of TAM chains is B and the number of cores is N. We define a binary variable $c_{ijk}$, $1 \le i \le N$, $1 \le j \le N$, such that $c_{ijk} = 1$ if Core j is immediately connected to Core i in TAM chain k. Each TAM chain k has a TAM-in pin $b_k$ and a TAM-out pin $e_k$, which are predefined in our model. The interconnect length from Core i to Core j is denoted by $t_{ij}$, which is obtained from the thermal-aware floorplanning and placement tool. The vertical length of a TSV is set to 50 μm. The additional constraints due to wire-length incorporation are listed in Fig. 5.

In Fig. 5, line 1 provides a constraint that there is only one immediate successor per core in each TAM chain. Line 2 ensures that there is only one immediate predecessor per core in each TAM chain. Line 3 implies that there is no immediate predecessor for the TAM-in pin in each TAM chain. Line 4 ensures that there is no immediate successor for the TAM-out pin in each TAM chain. Line 5 models the limitation that a core cannot connect to itself. Lines 6 and 7 indicate that a core can be in only one TAM chain. Lines 8–11 prevent the generation of cycles in each TAM chain. A new binary variable $u_{ijk}$ is defined, indicating whether cell i is located before cell j in the ordered scan chain. Since the term $c_{ihk} \cdot u_{hjk}$ in line 8 is nonlinear, it is replaced by a new binary variable $s_{ijhk}$. In addition, two new constraints are added:

$$c_{ihk} + u_{hjk} \le s_{ijhk} \quad \text{and} \quad c_{ihk} + u_{hjk} \ge 2 \cdot s_{ijhk}$$

Line 9 introduces the constraint that Core i is either before Core j or after Core j in the TAM chain if those two cores are in that chain. Lines 10 and 11 constrain the source nodes and end nodes of each chain, since the source node is before every other node and the end node is after every other node in each chain. The wire length of each TAM chain is the sum of the distance between the TAM-in pin and the core connected to that pin, the distances between all subsequent pairs of cores in that TAM chain, and the distance between the TAM-out pin and the core connected to it. The locations of the TAM-in and TAM-out pins are predefined. The length for TAM chain k is calculated as

$$l_k = w_k \cdot \left( \sum_{i=1}^{N} \sum_{j=1}^{N} c_{ijk} \cdot t_{ij} \right),$$

where $w_k$ is the width of the TAM chain. The total wire length is the sum of the wire lengths of all TAM chains, i.e.,

$$L = \sum_{k=1}^{B} l_k.$$
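As a small worked illustration of the two formulas above, the following Python sketch computes the length of each TAM chain and the total wire length. All node names, distances, and widths are hypothetical, and each chain is represented directly by the ordered list of links (i, j) for which $c_{ijk} = 1$, with the TAM-in and TAM-out pins treated as ordinary nodes.

```python
from typing import Dict, List, Tuple

Link = Tuple[str, str]  # (i, j): node j is connected immediately after node i


def chain_wire_length(width: int, links: List[Link], dist: Dict[Link, float]) -> float:
    """l_k = w_k * sum of t_ij over the links with c_ijk = 1."""
    return width * sum(dist[link] for link in links)


def total_wire_length(
    chains: List[Tuple[int, List[Link]]], dist: Dict[Link, float]
) -> float:
    """L = sum of l_k over all TAM chains."""
    return sum(chain_wire_length(w, links, dist) for w, links in chains)


# Hypothetical distances (Manhattan distance plus TSV length where applicable).
dist = {
    ("in1", "A"): 10.0, ("A", "B"): 20.0, ("B", "out1"): 30.0,
    ("in2", "C"): 5.0, ("C", "out2"): 15.0,
}
chains = [
    (8, [("in1", "A"), ("A", "B"), ("B", "out1")]),  # chain 1, width 8
    (16, [("in2", "C"), ("C", "out2")]),             # chain 2, width 16
]

l1 = chain_wire_length(*chains[0], dist)  # 8 * (10 + 20 + 30)
total = total_wire_length(chains, dist)   # 480 + 16 * (5 + 15)
```

Because each chain's length is scaled by its width, a wide chain that spans layers contributes disproportionately to the total, which is why the width must be known in advance for the model to stay linear.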

The objective function in the ILP model is changed to

$$A \cdot \max_j \left( \sum_{i=1}^{N} \sum_{k=1}^{w_{max}} \delta_{jk} \cdot T_i(k) \cdot x_{ij} \right) + (1 - A) \cdot L$$

where A is a weight used in the multiobjective cost function ($0 \le A \le 1$). Since wire length and test time are measures with different units, we use normalized values in the objective function. We divide the test time $T_i$ by the maximum test time (corresponding to A = 0, when wire-length minimization is given maximum weight), and divide the wire length L by the maximum wire length (corresponding to A = 1, when test-time minimization is given maximum weight). Since $w_k$ in problem 3DPPAW is also a variable, and the multiplication of two non-binary variables is not supported by standard ILP solvers, we incorporate wire-length minimization into problem 3DPAW, in which $w_k$ is predefined. Incorporating wire length into 3DPPAW requires a heuristic approach, which is left for future work.
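The normalization described above can be illustrated with a short Python sketch. The function name and argument layout are hypothetical; the numbers are the B = 2 entries of Table 10, with the maximum test time taken from the A = 0 solution and the maximum wire length from the A = 1 solution, as the text prescribes.

```python
def weighted_cost(
    test_time: float,
    wire_length: float,
    a: float,
    max_test_time: float,
    max_wire_length: float,
) -> float:
    """Normalized cost: A * T / T_max + (1 - A) * L / L_max."""
    if not 0.0 <= a <= 1.0:
        raise ValueError("weight A must lie in [0, 1]")
    return a * test_time / max_test_time + (1.0 - a) * wire_length / max_wire_length


# B = 2 row of Table 10: test times in clock cycles, wire lengths as listed there.
T_MAX, L_MAX = 74224, 505600  # test time at A = 0, wire length at A = 1

# Three candidate solutions evaluated with equal weights (A = 0.5).
cost_wire_optimal = weighted_cost(74224, 223200, 0.5, T_MAX, L_MAX)  # A = 0 solution
cost_time_optimal = weighted_cost(69619, 505600, 0.5, T_MAX, L_MAX)  # A = 1 solution
cost_balanced = weighted_cost(69619, 223200, 0.5, T_MAX, L_MAX)      # A = 0.5 solution
```

With equal weights, the solution reported for A = 0.5 has a lower normalized cost than either single-objective solution, which is consistent with it being selected by the co-optimization.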

5. Experiments and results

5.1. SOC test benchmarks and experimental setup

To illustrate the proposed 3D TAM optimization method and to demonstrate its effectiveness, we use four representative SOCs from the ITC’02 SOC test benchmarks, namely d695, p22810, p34392, and p93791 [43]. We use Xpress-MP [44], a commercial ILP solver, to solve the ILP model with randomized rounding for the 3DPPAW problem.

5.2. Thermal-aware 3D SOC floorplanning

One of the major concerns in 3D SOC design is the potential increase in chip temperature. The stacking of multiple active layers can lead to higher power densities, and the on-chip temperature depends on the power density [14–16]. Therefore, it is essential to have a thermal-aware floorplanner for 3D SOC design. In this paper, we adopt the thermal-aware floorplanner from [15] for the placement of the 3D SOC benchmarks. This 3D thermal-aware floorplanner is based on a simulated annealing algorithm. The inputs to the floorplanning algorithm are the area and the functional power of all functional modules in a 3D SOC. In thermal-aware floorplanning, each core is considered as a block with fixed coordinates. The inputs of the floorplanner include the area of each core (including x and y positions, assuming a square layout) and its power consumption. Since there is no detailed area information for the ITC’02 SOC test benchmarks, we estimate the area of each module based on its number of flip-flops. Since only the test power for each module is available [43], we scale down the test power to approximate the functional power for 3D SOC floorplanning. After we feed the area and power values into the placement algorithm, it outputs the thermal-aware floorplan with the layer assignment as well as the x and y positions of each core, which is used as an input to our 3D TAM optimization models.

5.3. Results for 3DPPAW problem

Fig. 6 shows a TAM design obtained for SOC d695 (B = 2, W = 32, two layers). The upper limit on the number of TSVs is set to 60/80, where the maximum number of TSVs for each TAM partition is 60 and the maximum number of TSVs for the SOC is 80. Cores 2, 5, 6, 8, and 9 are placed in Layer 0 and Cores 1, 3, 4, 7, and 10 are placed in Layer 1. The first partition of width 19 bits is connected to Cores 1, 2, 3, 4, 8, and 10. The second TAM partition of width 13 bits is used to access Cores 5, 6, 7, and 9. This TAM partition results in the minimum test time under the TSV-limit constraints and the thermal-aware placement.

Fig. 6. TAM design obtained for d695 (B = 2, W = 32, two layers). [Figure: floorplans of Layer 0 and Layer 1 showing TAM1 (13 bits) and TAM2 (19 bits) routed from their TAM-in to their TAM-out pins through the cores, with TSV bundles of 13 and 19 wires between the layers.]

Table 1 presents the testing time and the TAM partitions for SOC d695 for B = 2, 3, 4 and for different values of W. The upper limit on the number of TSVs is set to 48/60 for each case, where the maximum number of TSVs for each TAM partition is 48 and the maximum number of TSVs for the SOC is 60. The first two columns list the total TAM width W and the number of TAM chains B, respectively. The third column shows the TAM partition for different values of W and B. Column 4 shows the baseline 2D test time in clock cycles (#CC), where we apply the TAM optimization method (based on LP-relaxation and randomized rounding) separately to each layer. The TAM width limit for each layer is set as follows: if W > TSV/2, the width limit for Layer 0 is set to W and the TAM width limit for Layers 1, 2, ... is set to TSV/2. If W ≤ TSV/2, the TAM width limit for each layer is W. After computing the test time for each layer, we obtain the total test time by summing up the test times for all the layers. Other "baseline methods" that attempt to parallelize the testing of multiple layers are implicitly considered in the proposed optimization method.

The test time (in clock cycles, #CC) for the proposed 3D method is shown in Column 5. Column 6 lists the test time results obtained from LP relaxation before randomized rounding, which provides a lower bound on the test time. Column 7 lists the run time of LP-relaxation combined with randomized rounding for this SOC. Tables 2–4 present similar results for SOCs p22810, p34392, and p93791 with B = 2, 3, 4 (three layers), B = 2, 3, 4 (three layers), and B = 2, 3, 4, 5 (four layers), respectively. The TSV limits for the three larger SOCs are set to 60/80.

We draw several conclusions from the results in Tables 1–4. First, the test time is considerably higher if we simply use 2D TAM optimization on a layer-by-layer basis. Therefore, 3D TAM optimization is especially important for 3D SOCs in order to reduce test cost. Second, as the total TAM width W is increased, the testing time decreases, which is consistent with the results obtained using earlier methods for 2D ICs [3]. The third observation is that the run time becomes larger when the total TAM width increases because there are more variables and constraints in the LP models. However, with a combination of LP-relaxation and randomized rounding techniques, results can be obtained much more efficiently compared to previous 2D counterparts [3]. The fourth conclusion is that the results obtained before and after randomized rounding are within a reasonable range of each other, indicating the effectiveness of the randomized rounding technique.

Table 1. Results on TAM optimization for SOC d695 for B = 2, 3, 4 (two layers, TSV limits 48/60).

| W | B | 3D TAM partition | Test time 2D (#CC) | Test time 3D (#CC) | Lower bound (#CC) | CPU time for 3D (min) |
|---|---|---|---|---|---|---|
| 16 | 2 | (7,9) | 44 225 | 24 257 | 20 355 | 0.04 |
| | 3 | (4,5,7) | 46 108 | 22 984 | 19 714 | 0.13 |
| | 4 | (1,1,4,10) | 51 707 | 27 403 | 23 380 | 0.3 |
| 24 | 2 | (5,19) | 31 070 | 21 953 | 18 868 | 0.04 |
| | 3 | (1,4,19) | 31 077 | 20 764 | 16 221 | 0.14 |
| | 4 | (1,3,4,16) | 41 611 | 16 549 | 13 984 | 0.5 |
| 32 | 2 | (11,21) | 28 524 | 20 934 | 17 101 | 0.04 |
| | 3 | (4,10,18) | 23 944 | 16 085 | 13 798 | 0.15 |
| | 4 | (2,4,10,16) | 33 782 | 13 505 | 10 334 | 1.0 |
| 40 | 2 | (19,21) | 25 761 | 18 604 | 15 364 | 0.04 |
| | 3 | (4,17,19) | 21 070 | 13 095 | 11 720 | 0.15 |
| | 4 | (4,4,13,19) | 26 961 | 11 282 | 8437 | 2.0 |
| 48 | 2 | (21,27) | 25 761 | 17 331 | 14 232 | 0.06 |
| | 3 | (7,19,22) | 21 070 | 11 808 | 9375 | 0.16 |
| | 4 | (3,4,7,34) | 25 086 | 9522 | 6775 | 1.0 |
| 56 | 2 | (23,33) | 23 256 | 15 392 | 13 768 | 0.06 |
| | 3 | (4,19,33) | 21 070 | 11 808 | 8112 | 0.17 |
| | 4 | (2,4,16,34) | 24 852 | 8452 | 6177 | 1.1 |
| 64 | 2 | (23,41) | 23 232 | 13 984 | 12 153 | 0.06 |
| | 3 | (5,22,37) | 21 070 | 9731 | 6860 | 0.21 |
| | 4 | (13,16,16,19) | 24 852 | 7923 | 5752 | 2.2 |

Table 2. Results on TAM optimization for SOC p22810 for B = 2, 3, 4 (three layers, TSV limits 60/80).

| W | B | 3D TAM partition | Test time 2D (#CC) | Test time 3D (#CC) | Lower bound (#CC) | CPU time for 3D (min) |
|---|---|---|---|---|---|---|
| 16 | 2 | (5,11) | 334 276 | 289 986 | 265 329 | 0.3 |
| | 3 | (4,4,8) | 323 526 | 275 537 | 257 791 | 3.7 |
| | 4 | (1,4,4,7) | 421 172 | 286 734 | 286 146 | 1.6 |
| 24 | 2 | (11,13) | 326 153 | 263 892 | 236 842 | 0.6 |
| | 3 | (1,10,13) | 299 590 | 254 732 | 220 664 | 3.8 |
| | 4 | (1,1,10,12) | 352 227 | 249 274 | 233 497 | 2.1 |
| 32 | 2 | (7,25) | 314 522 | 230 925 | 223 984 | 0.8 |
| | 3 | (10,4,18) | 289 293 | 232 810 | 181 696 | 3.8 |
| | 4 | (1,10,10,11) | 348 318 | 205 831 | 166 188 | 9.0 |
| 40 | 2 | (13,27) | 310 323 | 215 506 | 203 672 | 0.5 |
| | 3 | (10,13,17) | 287 143 | 201 221 | 158 628 | 21.0 |
| | 4 | (7,10,4,19) | 348 318 | 181 687 | 144 182 | 26.1 |
| 48 | 2 | (13,35) | 306 389 | 205 782 | 192 256 | 0.9 |
| | 3 | (7,13,28) | 283 272 | 172 191 | 133 103 | 4.7 |
| | 4 | (10,10,7,21) | 289 710 | 169 392 | 145 543 | 13.8 |
| 56 | 2 | (15,41) | 304 831 | 204 155 | 179 474 | 2.3 |
| | 3 | (10,13,33) | 281 965 | 153 282 | 123 137 | 9.8 |
| | 4 | (4,10,16,26) | 285 825 | 162 818 | 141 406 | 10.1 |
| 64 | 2 | (15,49) | 304 810 | 179 688 | 170 123 | 0.5 |
| | 3 | (13,19,32) | 279 400 | 142 210 | 121 908 | 21.3 |
| | 4 | (10,13,19,22) | 285 825 | 157 460 | 138 737 | 130.0 |

Table 3. Results on TAM optimization for SOC p34392 for B = 2, 3, 4 (three layers, TSV limits 55/70).

| W | B | 3D TAM partition | Test time 2D (#CC) | Test time 3D (#CC) | Lower bound (#CC) | CPU time for 3D (min) |
|---|---|---|---|---|---|---|
| 16 | 2 | (7,9) | 1 338 043 | 1 134 919 | 1 113 301 | 0.2 |
| | 3 | (1,7,8) | 1 365 702 | 999 543 | 999 201 | 0.9 |
| | 4 | (1,4,4,7) | 2 025 402 | 1 288 341 | 1 174 558 | 0.5 |
| 24 | 2 | (9,15) | 1 202 389 | 913 550 | 859 758 | 0.1 |
| | 3 | (7,7,10) | 1 055 242 | 762 841 | 762 669 | 0.8 |
| | 4 | (1,1,10,12) | 1 377 077 | 798 449 | 743 508 | 18.5 |
| 32 | 2 | (11,21) | 1 183 301 | 843 301 | 779 966 | 0.1 |
| | 3 | (10,7,15) | 1 045 242 | 665 445 | 592 325 | 0.3 |
| | 4 | (4,7,10,11) | 1 359 846 | 684 524 | 563 604 | 9.2 |
| 40 | 2 | (13,27) | 1 047 913 | 752 782 | 716 114 | 0.2 |
| | 3 | (13,7,20) | 982 127 | 552 231 | 506 506 | 0.8 |
| | 4 | (4,1,16,19) | 1 026 819 | 584 301 | 497 810 | 100 |
| 48 | 2 | (13,35) | 1 026 760 | 693 465 | 671 796 | 0.4 |
| | 3 | (7,16,25) | 982 127 | 546 152 | 490 218 | 1.4 |
| | 4 | (1,7,19,21) | 995 186 | 544 579 | 474 016 | 13.5 |
| 56 | 2 | (13,43) | 1 014 317 | 637 606 | 621 753 | 2.3 |
| | 3 | (16,7,33) | 922 127 | 540 069 | 485 039 | 4.5 |
| | 4 | (1,7,19,29) | 972 797 | 544 579 | 455 916 | 6.7 |
| 64 | 2 | (13,51) | 987 564 | 559 758 | 539 403 | 0.5 |
| | 3 | (16,7,41) | 932 476 | 534 212 | 481 386 | 17.0 |
| | 4 | (10,7,19,28) | 948 945 | 544 330 | 429 201 | 19.0 |
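The per-layer width limits used for the layer-by-layer 2D baseline in this subsection can be sketched as follows. This is only an illustration of the stated rule: the function names are hypothetical, TSV/2 is assumed to be an integer division, and the text does not say whether "TSV" refers to the per-partition limit or the SOC-wide limit, so the sketch simply takes it as a parameter.

```python
from typing import List


def baseline_width_limits(total_width: int, tsv_limit: int, num_layers: int) -> List[int]:
    """TAM width limit per layer (index 0 = Layer 0) for the 2D baseline."""
    half = tsv_limit // 2  # assumption: TSV/2 is rounded down
    if total_width > half:
        # Layer 0 gets the full width; upper layers are capped at TSV/2.
        return [total_width] + [half] * (num_layers - 1)
    # Otherwise every layer can use the full width W.
    return [total_width] * num_layers


def baseline_total_time(layer_times: List[int]) -> int:
    """The layers are tested one after another, so their test times add up."""
    return sum(layer_times)


# Example with W = 64 and W = 16 on two layers, using a TSV limit of 60.
wide = baseline_width_limits(64, 60, 2)    # upper layer is TSV-limited
narrow = baseline_width_limits(16, 60, 2)  # both layers can use W
```

The summation in `baseline_total_time` is what makes the baseline pessimistic: it cannot overlap the testing of cores on different layers, whereas the 3D formulation can.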

5.4. Dependence of test time on 3D placement, the number of layers, and the maximum number of TSVs

In this subsection, we examine the dependence of test time on different 3D placements, the number of layers, and upper limits on the number of TSVs. Fig. 7 shows the dependence of the test time on 3D placements other than the one produced by the thermal-aware 3D floorplanner [15], for all benchmarks. The limit on the number of TSVs is assumed to be the same for each benchmark. Three different values of W, namely W = 16, 32, 64, are considered. We examine three placements: tough, random, and nice. Tough placement places larger cores on top layers, starving the test bitwidth under the limit on the number of TSVs. Random placement is generated randomly. Nice placement places larger cores on the bottom layer. We note that different placements result in considerably different test times for the same value of W. Therefore, an effective placement in the 3D SOC design space is essential for minimizing the test time. Fig. 8 shows the dependence of test time on the limit on the number of TSVs for all benchmarks. When the number of TSVs exceeds a certain limit, there is no further reduction in the test time because sufficient bitwidth is now available for the higher layers. Similar results are obtained for the three larger benchmark SOCs. Fig. 9 shows the dependence of test time on the number of layers for all benchmarks. With an increase in the number of layers, the test time usually increases, because the limit on the number of TSVs reduces the bitwidth for cores placed on higher layers.

Table 4. Results on TAM optimization for SOC p93791 for B = 2, 3, 4, 5 (four layers, TSV limits 60/80).

| W | B | 3D TAM partition | Test time 2D (#CC) | Test time 3D (#CC) | Lower bound (#CC) | CPU time for 3D (min) |
|---|---|---|---|---|---|---|
| 16 | 2 | (7,9) | 1 869 200 | 1 800 413 | 1 712 598 | 0.3 |
| | 3 | (4,4,8) | 2 021 836 | 1 779 401 | 1 779 298 | 1.6 |
| | 4 | (1,4,4,7) | 3 651 006 | 2 179 527 | 2 177 620 | 1.1 |
| | 5 | (1,1,4,1,9) | 4 379 506 | 2 030 868 | 1 913 331 | 1.4 |
| 24 | 2 | (7,17) | 1 294 512 | 1 259 711 | 1 169 532 | 0.3 |
| | 3 | (4,4,16) | 1 316 252 | 1 197 683 | 1 108 672 | 1.6 |
| | 4 | (1,7,7,9) | 1 709 129 | 1 337 682 | 1 303 728 | 6.3 |
| | 5 | (1,1,7,7,8) | 2 115 963 | 1 263 801 | 1 152 322 | 8.4 |
| 32 | 2 | (9,23) | 966 752 | 894 463 | 860 629 | 0.3 |
| | 3 | (4,4,24) | 970 585 | 910 957 | 891 713 | 2.4 |
| | 4 | (1,10,10,11) | 1 296 534 | 944 807 | 944 706 | 14 |
| | 5 | (4,4,7,4,13) | 1 913 443 | 1 008 684 | 972 991 | 17 |
| 40 | 2 | (15,25) | 778 480 | 778 296 | 767 018 | 0.7 |
| | 3 | (7,10,23) | 874 504 | 817 805 | 750 008 | 46.4 |
| | 4 | (1,4,16,19) | 1 017 802 | 890 145 | 786 207 | 63.0 |
| | 5 | (4,4,10,1,21) | 1 430 320 | 780 202 | 728 064 | 40.8 |
| 48 | 2 | (15,33) | 766 270 | 758 806 | 644 230 | 1.3 |
| | 3 | (7,16,25) | 827 277 | 722 042 | 632 076 | 7.7 |
| | 4 | (4,13,13,18) | 933 243 | 713 347 | 650 231 | 25.2 |
| | 5 | (1,10,10,7,20) | 1 349 575 | 727 978 | 660 292 | 35.8 |
| 56 | 2 | (11,45) | 686 577 | 662 686 | 626 878 | 3.1 |
| | 3 | (13,16,27) | 685 314 | 635 095 | 536 410 | 72.8 |
| | 4 | (1,7,22,26) | 889 290 | 664 447 | 608 740 | 60.0 |
| | 5 | (1,1,16,1,37) | 1 300 670 | 680 635 | 628 872 | 65.1 |
| 64 | 2 | (15,49) | 662 198 | 630 365 | 614 287 | 5.5 |
| | 3 | (16,22,26) | 662 808 | 572 342 | 488 366 | 85.5 |
| | 4 | (1,13,22,28) | 870 037 | 604 553 | 520 967 | 50.4 |
| | 5 | (1,3,16,7,37) | 1 273 410 | 621 235 | 533 928 | 77.1 |

Fig. 7. Test time dependence on 3D placement for different benchmarks. [Figure: four panels (d695, p22810, p34392, p93791); x-axis: placement instance (tough, random, nice); y-axis: test time (clock cycles); series: W = 16, W = 32, W = 64.]

5.5. Results for thermal-aware 3DPPAW

Finally, we present results for thermal-aware 3DPPAW. Tables 5–8 compare the TAM partitions, test times, and CPU run times for 3DPPAW with those for thermal-aware 3DPPAW for benchmarks d695, p22810, p34392, and p93791, respectively. The test power for each module is obtained from [43]. For example, in d695, there are four cores, namely Cores 2, 5, 7, and 10, which consume significantly more power than the other cores in test mode. Therefore, we set $P_{ij} = 1$ for $i, j \in \{2, 5, 7, 10\}$. We set the upper limit Pr to 2, limiting the test parallelism for high-power cores. For p22810, p34392, and p93791, the upper limit Pr is set to 2, 3, and 4, respectively, since there are 6, 8, and 10 high-power cores in these benchmarks. The results show that the optimized TAM partitions are different when thermal constraints are taken into account. The test time is higher with thermal constraints, but it is still less than that for the 2D case for d695, p22810, and p34392. However, for p93791, the test time with thermal constraints is even higher than for the 2D case because of the limited amount of test parallelism available. As a tradeoff, the thermal constraints can be relaxed according to the test-time budget and the cooling solutions available.

Fig. 8. Test time dependence on limits on the number of TSVs for different benchmarks. [Figure: four panels (d695, p22810, p34392, p93791); x-axis: limit on the number of TSVs; y-axis: test time (clock cycles); series: W = 16, W = 32, W = 64.]

Fig. 9. Test time dependence on number of layers for different benchmarks. [Figure: four panels (d695, p22810, p34392, p93791); x-axis: number of layers (2, 3, 4); y-axis: test time (clock cycles); series: W = 16, W = 32, W = 64.]

5.6. Result for TestRail architecture

In this section, we provide the test-time results for benchmark d695 for the TestRail architecture. Table 9 compares the test times of d695 for the Test Bus and TestRail architectures. Column 1 lists the total TAM width. The number of TestRails is provided in Column 2. Columns 3 and 4 give the test time in the 2D case for the Test Bus and TestRail architectures. Columns 5 and 6 present the TAM partition and the test time for the Test Bus architecture. Columns 7 and 8 show the TAM partition and the test time for the TestRail architecture. Since we assume a parallel schedule in the TestRail architecture, the cores connected to one TestRail are tested in parallel. When one of the cores finishes testing using its given test patterns, the bypass mode of this core is turned on, and the other cores continue to be tested. The bypass takes one cycle in the test access path. The latency to load patterns (at the beginning of the test) is normally several cycles, depending on how many cores are connected to the TestRail. Since the number of cores in one TestRail is small in d695, the bypass and load-pattern latency is small enough to be ignored in the test time result.

We observe that the test time for TestRail is less than that for Test Bus since it allows parallel testing for each TestRail. Another observation is that when the total TAM width is larger than 24, the test time for TestRail remains unchanged, even though the TAM partitions are different. The reason is that the test time in the TestRail architecture is determined by individual cores rather than by the sum of the test times of the cores connected to one chain. We also show that a 3D-aware TestRail optimization method, as described in this paper, leads to lower test time than the use of a 3D-oblivious 2D optimization approach, as published earlier.

Table 5. Comparison between 3DPPAW and thermal-aware 3DPPAW for SOC d695 for B = 2, 3, 4 (two layers, TSV limits 48/60).

| W | B | 2D test time (#CC) | Thermal-oblivious TAM partition | Thermal-oblivious test time (#CC) | Thermal-oblivious CPU time (min) | Thermal-aware TAM partition | Thermal-aware test time (#CC) | Thermal-aware CPU time (min) |
|---|---|---|---|---|---|---|---|---|
| 16 | 2 | 44 225 | (7,9) | 24 257 | 0.04 | (5,11) | 44 815 | 0.8 |
| | 3 | 46 108 | (4,5,7) | 22 984 | 0.13 | (2,5,9) | 44 809 | 1.3 |
| | 4 | 51 707 | (1,1,4,10) | 27 403 | 0.3 | (1,1,5,9) | 44 809 | 14.5 |
| 24 | 2 | 31 070 | (5,19) | 21 953 | 0.04 | (3,21) | 31 170 | 0.2 |
| | 3 | 31 077 | (1,4,19) | 20 764 | 0.14 | (1,2,21) | 31 132 | 7.5 |
| | 4 | 41 611 | (1,3,4,16) | 16 549 | 0.5 | (1,1,1,21) | 31 132 | 53.8 |
| 32 | 2 | 28 524 | (11,21) | 20 934 | 0.04 | (16,16) | 26 578 | 0.4 |
| | 3 | 23 944 | (4,10,18) | 16 085 | 0.15 | (4,9,19) | 21 856 | 46.8 |
| | 4 | 33 782 | (2,4,10,16) | 13 505 | 1.0 | (1,2,10,19) | 22 427 | 20.0 |
| 40 | 2 | 25 761 | (19,21) | 18 604 | 0.04 | (20,20) | 22 848 | 0.6 |
| | 3 | 21 070 | (4,17,19) | 13 095 | 0.15 | (5,9,26) | 21 005 | 10.9 |
| | 4 | 26 961 | (4,4,13,19) | 11 282 | 2.0 | (1,4,9,26) | 21 005 | 7.6 |
| 48 | 2 | 25 761 | (21,27) | 17 331 | 0.06 | (16,32) | 20 314 | 0.9 |
| | 3 | 21 070 | (7,19,22) | 11 808 | 0.16 | (4,12,32) | 18 799 | 2.0 |
| | 4 | 25 086 | (3,4,7,34) | 9522 | 1.0 | (2,2,12,32) | 18 799 | 43.4 |
| 56 | 2 | 23 256 | (23,33) | 15 392 | 0.06 | (24,32) | 19 457 | 0.9 |
| | 3 | 21 070 | (4,19,33) | 11 808 | 0.17 | (5,17,34) | 13 575 | 1.3 |
| | 4 | 24 852 | (2,4,16,34) | 8452 | 1.1 | (1,4,17,34) | 13 575 | 13.8 |
| 64 | 2 | 23 232 | (23,41) | 13 984 | 0.06 | (18,32) | 19 457 | 0.8 |
| | 3 | 21 070 | (5,22,37) | 9731 | 0.21 | (7,17,39) | 12 941 | 1.3 |
| | 4 | 24 852 | (13,16,16,19) | 7923 | 2.2 | (3,6,16,39) | 12 841 | 14.2 |

Table 6. Comparison between 3DPPAW and thermal-aware 3DPPAW for SOC p22810 for B = 2, 3, 4 (three layers, TSV limits 60/80).

| W | B | 2D test time (#CC) | Thermal-oblivious TAM partition | Thermal-oblivious test time (#CC) | Thermal-oblivious CPU time (min) | Thermal-aware TAM partition | Thermal-aware test time (#CC) | Thermal-aware CPU time (min) |
|---|---|---|---|---|---|---|---|---|
| 16 | 2 | 334 276 | (5,11) | 289 986 | 0.3 | (6,10) | 324 902 | 0.2 |
| | 3 | 323 526 | (4,4,8) | 275 537 | 3.7 | (4,4,8) | 321 172 | 1.6 |
| | 4 | 421 172 | (1,4,4,7) | 286 734 | 1.6 | (1,1,1,13) | 309 905 | 0.4 |
| 24 | 2 | 326 153 | (11,13) | 263 892 | 0.6 | (11,13) | 308 674 | 1.1 |
| | 3 | 299 590 | (1,10,13) | 254 732 | 3.8 | (1,10,13) | 282 849 | 0.5 |
| | 4 | 352 227 | (1,1,10,12) | 249 274 | 2.1 | (1,4,4,15) | 271 146 | 4.2 |
| 32 | 2 | 314 522 | (7,25) | 230 925 | 0.8 | (15,17) | 270 379 | 0.3 |
| | 3 | 289 293 | (10,4,18) | 232 810 | 3.8 | (6,10,16) | 258 778 | 38.9 |
| | 4 | 348 318 | (1,10,10,11) | 205 831 | 9.0 | (4,1,13,14) | 240 120 | 45.5 |
| 40 | 2 | 310 323 | (13,27) | 215 506 | 0.5 | (19,21) | 265 452 | 0.7 |
| | 3 | 287 143 | (10,13,17) | 201 221 | 21.0 | (10,13,17) | 230 462 | 47.8 |
| | 4 | 348 318 | (7,10,4,19) | 181 687 | 26.1 | (3,7,13,17) | 220 194 | 73.5 |
| 48 | 2 | 306 389 | (13,35) | 205 782 | 0.9 | (19,29) | 265 452 | 0.5 |
| | 3 | 283 272 | (7,13,28) | 172 191 | 4.7 | (10,13,25) | 226 844 | 37.4 |
| | 4 | 289 710 | (10,10,7,21) | 169 392 | 13.8 | (4,13,10,21) | 214 379 | 297.5 |
| 56 | 2 | 304 831 | (15,41) | 204 155 | 2.3 | (19,37) | 245 234 | 0.3 |
| | 3 | 281 965 | (10,13,33) | 153 282 | 9.8 | (10,13,33) | 226 813 | 17.6 |
| | 4 | 285 825 | (4,10,16,26) | 162 818 | 10.1 | (10,4,13,29) | 189 146 | 315 |
| 64 | 2 | 304 810 | (15,49) | 179 688 | 0.5 | (19,45) | 245 234 | 0.7 |
| | 3 | 279 400 | (13,19,32) | 142 210 | 21.3 | (13,10,41) | 223 079 | 6.5 |
| | 4 | 285 825 | (10,13,19,22) | 157 460 | 130.0 | (1,13,10,40) | 181 698 | 407 |

5.7. Result for test time and wire-length co-optimization

Table 10 presents the test time and wire length results for problem 3DPAW, in which the width of each TAM chain is predefined. Column 1 lists the number of TAM chains. Column 2 lists the width of each TAM chain. Columns 3–5 show the test time for different values of the weight A. Columns 6–8 show the wire length for different values of A. We make some key observations: (1) the test time reduces or remains unchanged with an increase in A, since larger values of A add more weight to the test time in the objective function. Due to the predefined width of each TAM chain, the optimization search space is limited, so that for some cases, the test time is the same for A = 0.5 and A = 1; (2) the wire length increases or remains unchanged with an increase in A, since a larger A leads to less weight for the wire length in the objective function; (3) when the number of TAM chains increases, the test time decreases as expected in most cases. However, when A = 0, i.e., when test time is not considered in the objective function, the test time does not follow this trend. Our results demonstrate that our optimization framework is general and that it can easily incorporate wire-length constraints.

Table 7. Comparison between 3DPPAW and thermal-aware 3DPPAW for SOC p34392 for B = 2, 3, 4 (three layers, TSV limits 55/70).

| W | B | 2D test time (#CC) | Thermal-oblivious TAM partition | Thermal-oblivious test time (#CC) | Thermal-oblivious CPU time (min) | Thermal-aware TAM partition | Thermal-aware test time (#CC) | Thermal-aware CPU time (min) |
|---|---|---|---|---|---|---|---|---|
| 16 | 2 | 1 338 043 | (7,9) | 1 134 919 | 0.2 | (8,8) | 1 081 570 | 1.9 |
| | 3 | 1 365 702 | (1,7,8) | 999 543 | 0.9 | (1,7,8) | 1 005 088 | 1.0 |
| | 4 | 2 025 402 | (1,4,4,7) | 1 288 341 | 0.5 | (1,1,1,13) | 1 391 607 | 3.8 |
| 24 | 2 | 1 202 389 | (9,15) | 913 550 | 0.1 | (9,15) | 1 064 994 | 5.5 |
| | 3 | 1 055 242 | (7,7,10) | 762 841 | 0.8 | (1,10,13) | 1 049 435 | 2.6 |
| | 4 | 1 377 077 | (1,1,10,12) | 798 449 | 18.5 | (1,1,1,21) | 1 391 607 | 0.8 |
| 32 | 2 | 1 183 301 | (11,21) | 843 301 | 0.1 | (15,17) | 831 177 | 6.7 |
| | 3 | 1 045 242 | (10,7,15) | 665 445 | 0.3 | (10,7,15) | 933 906 | 10.8 |
| | 4 | 1 359 846 | (4,7,10,11) | 684 524 | 9.2 | (1,1,7,23) | 834 826 | 7.0 |
| 40 | 2 | 1 047 913 | (13,27) | 752 782 | 0.2 | (9,31) | 831 177 | 7.0 |
| | 3 | 982 127 | (13,7,20) | 552 231 | 0.8 | (7,16,17) | 859 998 | 4.8 |
| | 4 | 1 026 819 | (4,1,16,19) | 584 301 | 100 | (1,10,1,28) | 790 995 | 45.5 |
| 48 | 2 | 1 026 760 | (13,35) | 693 465 | 0.4 | (16,9) | 808 844 | 5.7 |
| | 3 | 982 127 | (7,16,25) | 546 152 | 1.4 | (7,13,28) | 843 401 | 34.5 |
| | 4 | 995 186 | (1,7,19,21) | 544 579 | 13.5 | (1,13,13,21) | 754 307 | 26.5 |
| 56 | 2 | 1 014 317 | (13,43) | 637 606 | 2.3 | (16,9) | 808 844 | 5.6 |
| | 3 | 922 127 | (16,7,33) | 540 069 | 4.5 | (10,13,33) | 832 498 | 19.6 |
| | 4 | 972 797 | (1,7,19,29) | 544 579 | 6.7 | (7,13,10,26) | 661 844 | 225.3 |
| 64 | 2 | 987 564 | (13,51) | 559 758 | 0.5 | (16,9) | 808 844 | 5.6 |
| | 3 | 932 476 | (16,7,41) | 534 212 | 17.0 | (10,13,41) | 689 711 | 2.2 |
| | 4 | 948 945 | (10,7,19,28) | 544 330 | 19.0 | (7,13,10,26) | 661 844 | 201.7 |

Table 8. Comparison between 3DPPAW and thermal-aware 3DPPAW for SOC p93791 for B = 2, 3, 4 (four layers, TSV limits 60/80).

| W | B | 2D test time (#CC) | Thermal-oblivious TAM partition | Thermal-oblivious test time (#CC) | Thermal-oblivious CPU time (min) | Thermal-aware TAM partition | Thermal-aware test time (#CC) | Thermal-aware CPU time (min) |
|---|---|---|---|---|---|---|---|---|
| 16 | 2 | 1 869 200 | (7,9) | 1 800 413 | 0.3 | (3,13) | 2 103 700 | 8.2 |
| | 3 | 2 021 836 | (4,4,8) | 1 779 401 | 1.6 | (1,1,16) | 2 229 152 | 1.9 |
| | 4 | 3 651 006 | (1,4,4,7) | 2 179 527 | 1.1 | (1,1,2,12) | 4 023 112 | 2.2 |
| 24 | 2 | 1 294 512 | (7,17) | 1 259 711 | 0.3 | (4,20) | 1 406 320 | 22.0 |
| | 3 | 1 316 252 | (4,4,16) | 1 197 683 | 1.6 | (1,4,19) | 1 412 113 | 3.5 |
| | 4 | 1 709 129 | (1,7,7,9) | 1 337 682 | 6.3 | (1,1,1,21) | 3 823 747 | 7.0 |
| 32 | 2 | 966 752 | (9,23) | 894 463 | 0.3 | (5,20) | 1 405 440 | 160.6 |
| | 3 | 970 585 | (4,4,24) | 910 957 | 2.4 | (1,7,24) | 1 009 262 | 6.9 |
| | 4 | 1 296 534 | (1,10,10,11) | 944 807 | 14 | (1,1,4,26) | 3 695 454 | 2.0 |
| 40 | 2 | 778 480 | (15,25) | 778 296 | 0.7 | (5,20) | 1 405 440 | 160.6 |
| | 3 | 874 504 | (7,10,23) | 817 805 | 46.4 | (1,16,23) | 999 931 | 53.8 |
| | 4 | 1 017 802 | (1,4,16,19) | 890 145 | 63.0 | (1,10,1,28) | 2 699 795 | 55.4 |
| 48 | 2 | 766 270 | (15,33) | 758 806 | 1.3 | (5,20) | 1 241 245 | 160.6 |
| | 3 | 827 277 | (7,16,25) | 722 042 | 7.7 | (7,16,25) | 995 650 | 96.3 |
| | 4 | 933 243 | (4,13,13,18) | 713 347 | 25.2 | (4,13,1,30) | 1 903 700 | 220.5 |
| 56 | 2 | 686 577 | (11,45) | 662 686 | 3.1 | (5,20) | 1 152 364 | 160.6 |
| | 3 | 685 314 | (13,16,27) | 635 095 | 72.8 | (13,19,24) | 995 522 | 302.8 |
| | 4 | 889 290 | (1,7,22,26) | 664 447 | 60.0 | (4,1,25,26) | 1 121 150 | 252.2 |
| 64 | 2 | 662 198 | (15,49) | 630 365 | 5.5 | (5,20) | 1 008 220 | 160.6 |
| | 3 | 662 808 | (16,22,26) | 572 342 | 85.5 | (4,25,35) | 995 522 | 46.6 |
| | 4 | 870 037 | (1,13,22,28) | 604 553 | 50.4 | (1,10,25,28) | 995 522 | 339.7 |

6. Conclusions

We have shown that a modular post-bond testing approach can be used for emerging 3D ICs. We have presented an optimization technique for minimizing the test time for 3D core-based SOCs under constraints on the number of TSVs and the TAM width. We have carried out a series of simulations for four ITC’02 SOC test benchmarks by considering thermal-aware placement on a 3D substrate. Simulation results show that the proposed method leads to lower test times compared to a baseline method that applies 2D TAM optimization to each layer of the 3D SOC. We have also presented results to study how the test time varies with the number of IC layers, core placement, and the upper limit on the number of TSVs. This work is expected to pave the way for core-based testing of emerging 3D SOCs. As part of ongoing work, we are studying post-bond testing of 3D ICs designed using hierarchical cores [40] and the optimization of a pre-bond test infrastructure.

Table 9. Test time comparison between Test Bus architecture and TestRail architecture (B = 2, 3, 4, two layers, TSV limits 48/60).

| W | B | Test Bus 2D test time (#CC) | TestRail 2D test time (#CC) | Test Bus TAM partition | Test Bus test time (#CC) | TestRail TAM partition | TestRail test time (#CC) |
|---|---|---|---|---|---|---|---|
| 16 | 2 | 44 225 | 33 437 | (7,9) | 24 257 | (4,12) | 14 873 |
| | 3 | 46 108 | 33 437 | (4,5,7) | 22 984 | (1,3,12) | 14 873 |
| | 4 | 51 707 | 33 782 | (1,1,4,10) | 27 403 | (1,1,2,12) | 14 873 |
| 24 | 2 | 31 070 | 17 750 | (5,19) | 21 953 | (2,22) | 7881 |
| | 3 | 31 077 | 17 750 | (1,4,19) | 20 764 | (4,3,17) | 7881 |
| | 4 | 41 611 | 17 750 | (1,3,4,16) | 16 549 | (4,1,2,17) | 7881 |
| 32 | 2 | 28 524 | 17 750 | (11,21) | 20 934 | (8,22) | 7881 |
| | 3 | 23 944 | 17 750 | (4,10,18) | 16 085 | (3,7,22) | 7881 |
| | 4 | 33 782 | 17 750 | (2,4,10,16) | 13 505 | (3,4,7,18) | 7881 |
| 40 | 2 | 25 761 | 17 750 | (19,21) | 18 604 | (12,22) | 7881 |
| | 3 | 21 070 | 17 750 | (4,17,19) | 13 095 | (2,12,22) | 7881 |
| | 4 | 26 961 | 17 750 | (4,4,13,19) | 11 282 | (2,8,12,18) | 7881 |
| 48 | 2 | 25 761 | 17 750 | (21,27) | 17 331 | (3,24) | 7881 |
| | 3 | 21 070 | 17 750 | (7,19,22) | 11 808 | (12,14,22) | 7881 |
| | 4 | 25 086 | 17 750 | (3,4,7,34) | 9522 | (4,10,12,22) | 7881 |
| 56 | 2 | 23 256 | 17 750 | (23,33) | 15 392 | (16,32) | 7881 |
| | 3 | 21 070 | 17 750 | (4,19,33) | 11 808 | (15,17,24) | 7881 |
| | 4 | 24 852 | 17 750 | (2,4,16,34) | 8452 | (4,8,20,24) | 7881 |
| 64 | 2 | 23 232 | 17 750 | (23,41) | 13 984 | (16,40) | 7881 |
| | 3 | 21 070 | 17 750 | (5,22,37) | 9731 | (4,18,24) | 7881 |
| | 4 | 24 852 | 17 750 | (13,16,16,19) | 7923 | (3,9,16,24) | 7881 |

Table 10. Test time and wire length results for SOC d695 with different B (two layers, TSV limits 48/60).

| B | TAM partition | Test time (#CC), A = 0 | Test time (#CC), A = 0.5 | Test time (#CC), A = 1 | Wire length (μm), A = 0 | Wire length (μm), A = 0.5 | Wire length (μm), A = 1 |
|---|---|---|---|---|---|---|---|
| 2 | 8+16 | 74 224 | 69 619 | 69 619 | 223 200 | 223 200 | 505 600 |
| 3 | 8+16+32 | 69 683 | 69 619 | 69 619 | 512 800 | 512 800 | 1 025 600 |
| 4 | 8+12+16+20 | 74 224 | 69 619 | 69 619 | 511 600 | 511 600 | 874 000 |
| 5 | 3+5+17+19+23 | 175 861 | 66 024 | 66 024 | 610 500 | 610 500 | 711 300 |
| 6 | 4+7+16+17+28+31 | 131 122 | 49 248 | 49 248 | 938 400 | 938 400 | 1 109 900 |
| 7 | 5+7+11+24+27+34+37 | 105 895 | 46 952 | 46 952 | 1 320 700 | 1 320 700 | 1 350 400 |

References

[1] Y. Zorian, E.J. Marinissen, S. Dey, Testing embedded-core-based system chips,Computer 32 (6) (1999) 52–60 0018-9162..

[2] IEEE Std. 1500: IEEE Standard Testability Method for Embedded Core-basedIntegrated Circuits. IEEE, New York, 2005.

[3] V. Iyengar, K. Chakrabarty, E.J. Marinissen, Test wrapper and test accessmechanism co-optimization for system-on-chip, Journal of Electronic Testing:Theory and Applications 18 (2002) 213–230.

[4] S. K. Goel, E.J. Marinissen, Effective and efficient test architecture design forSOCs, in: International Test Conference, 2002, pp. 529–538.

[5] Q. Xu, N. Nicolici, Resource-constrained system-on-a-chip test: a survey,IEE Proceedings Computers and Digital Techniques 152 (1) (2005) 67–811350-2387.

[6] E. Larsson, H. Fujiwara, Optimal system-on-chip test scheduling, in: AsianTest Symposium, 2003, pp. 306–311.

[7] E. Larsson, K. Arvidsson, H. Fujiwara, Z. Peng, Efficient test solutions for core-based designs, IEEE Transactions on Computer-Aided Design of IntegratedCircuits and Systems 23 (5) (2004) 758–775 0278-0070.

[8] Q. Xu, N. Nicolici, Modular SOC testing with reduced wrapper count, IEEETransactions on Computer-Aided Design of Integrated Circuits and Systems24 (12) (2005) 1894–1908 0278-0070.

[9] Z. Wei, S.M. Reddy, I. Pomeranz, Y. Huang, SOC test scheduling usingsimulated annealing, in: VLSI Test Symposium, 2003, pp. 325–330.

[10] Y. Huang, S.M. Reddy, W. Cheng, P. Reuter, N. Mukherjee, C.Tsai, O. Samman,Y. Zaidan, Optimal core wrapper width selection and SOC test scheduling

based on 3-D bin packing algorithm, in: International Test Conference, 2002,pp. 74–82.

[11] T. Yoneda, M. Imanishi, H. Fujiwara, An SoC test scheduling algorithm usingreconfigurable union wrappers, in: DATE, 2007, pp. 231–236.

[12] R. Weerasekera, L. Zheng, D. Pamunuwa, A. Hannu Tenhunen, Extendingsystems-on-chip to the third dimension: performance, cost and technologicaltradeoffs, in: ICCAD, 2007, pp. 212–219.

[13] B. Black, D.W. Nelson, C. Webb, N. Samra, 3D processing technology and itsimpact on iA32 microprocessors, in: ICCD, 2004, pp. 316–318.

[14] J. Cong, J. Wei, Y. Zhang, A thermal-driven floorplanning algorithm for 3D ICs,in: International Conference on Computer Aided Design, 2004, pp. 306–313.

[15] W.L. Hung, G.M. Link, Y. Xie, V. Narayanan, M.J. Irwin, Interconnect andthermal-aware floorplanning for 3D microprocessors, in: InternationalSymposium on Quality Electronic Design, 2006, pp. 98–104.

[16] W.R. Davis, J. Wilson, S. Mick, J. Xu, H. Hua, C. Mineo, A.M. Sule, M. Steer, P.D.Franzon, Demystifying 3D ICs: the pros and cons of going vertical, IEEEDesign and Test of Computers 22 (6) (2005) 498–510.

[17] Y. Xie, G.H. Loh, B. Black, K. Bernstein, Design space exploration for 3Darchitectures, Journal on Emerging Technologies in Computing Systems 2 (2)(2006) 65–103.

[18] J.W. Joyner, P. Zarkesh-Ha, J.D. Meindl, Global interconnect design in a three-dimensional system-on-a-chip, IEEE Transactions on Very Large ScaleIntegration (VLSI) Systems 12 (4) (2004) 367–372 1063-8210.

[19] M.R. Garey, D.S. Johnson, Computers and Intractability: A Guide to the Theory of NP-Completeness, Freeman, 1979.

[20] P. Raghavan, C. Thompson, Randomized rounding: a technique for provably good algorithms and algorithmic proofs, Combinatorica 7 (4) (1987) 365–374.

[21] V. Immaneni, S. Raman, Direct access test scheme-design of block and core cells for embedded ASICs, in: International Test Conference, 1990, pp. 488–492.

[22] N.A. Touba, B. Pouya, Using partial isolation rings to test core-based designs, IEEE Design & Test of Computers 14 (4) (1997) 52–59 0740-7475.


[23] I. Ghosh, S. Dey, N.K. Jha, A fast and low cost testing technique for core-based system-on-chip, in: Design Automation Conference, 1998, pp. 542–547.

[24] P. Varma, B. Bhatia, A structured test re-use methodology for core-based system chips, in: International Test Conference, 1998, pp. 294–302.

[25] E.J. Marinissen, R. Arendsen, G. Bos, H. Dingemanse, M. Lousberg, C. Wouters, A structured and scalable mechanism for test access to embedded reusable cores, in: International Test Conference, October 1998, pp. 284–293.

[26] S.K. Goel, E.J. Marinissen, Cluster-based test architecture design for system-on-chip, in: VLSI Test Symposium, 2002, pp. 259–264.

[27] K. Chakrabarty, Optimal test access architectures for system-on-a-chip, ACM Transactions on Design Automation of Electronic Systems 6 (2001) 26–49.

[28] V. Iyengar, K. Chakrabarty, E.J. Marinissen, Test access mechanism optimization, test scheduling and tester data volume reduction for system-on-chip, IEEE Transactions on Computers 52 (2003) 1619–1632.

[29] S.K. Goel, E.J. Marinissen, SOC test architecture design for efficient utilization of test bandwidth, ACM Transactions on Design Automation of Electronic Systems 8 (4) (2003) 399–429.

[30] Q. Xu, N. Nicolici, Multi-frequency test access mechanism design for modular SOC testing, in: Asian Test Symposium, 2004, pp. 2–7.

[31] K. Chakrabarty, Design of system-on-a-chip test access architectures under place-and-route and power constraints, in: Design Automation Conference, 2000, pp. 432–437.

[32] S.K. Goel, E.J. Marinissen, Layout-driven SOC test architecture design for test time and wire length minimization, in: Design, Automation and Test in Europe Conference, 2003, pp. 738–743.

[33] H.-H.S. Lee, K. Chakrabarty, Test challenges for 3D integrated circuits, IEEE Design & Test of Computers 26 (5) (2009) 26–35.

[34] D.L. Lewis, H.S. Lee, A scan-island based design enabling pre-bond testability in die-stacked microprocessors, in: Proceedings of the International Test Conference, 2007, pp. 1–8.

[35] X. Wu, P. Falkenstern, K. Chakrabarty, Y. Xie, Scan-chain design and optimization for three-dimensional integrated circuits, Journal on Emerging Technologies in Computing Systems 5 (2) (2009) 1–26.

[36] L. Huang, L. Jiang, Q. Xu, Test architecture design and optimization for three-dimensional SoCs, in: DATE, 2009, pp. 220–225.

[37] P.-Y. Chen, C.-W. Wu, On-chip TSV testing for 3D IC before bonding using sense amplification, in: Asian Test Symposium, 2009, pp. 450–455.

[38] L. Jiang, Q. Xu, K. Chakrabarty, T.M. Mak, Layout driven test-architecture design and optimization for 3D SoCs under pre-bond test pin-count constraint, in: IEEE International Conference on Computer-Aided Design, 2009, pp. 191–196.

[39] K. Chakrabarty, V. Iyengar, M. Krasniewski, Test planning for modular testing of hierarchical SOCs, IEEE Transactions on Computer-Aided Design of Integrated Circuits & Systems 24 (2005) 435–448.

[40] S. Goel, E.J. Marinissen, A. Sehgal, K. Chakrabarty, Testing of SoCs with hierarchical cores: common fallacies, test access optimization and test scheduling, IEEE Transactions on Computers 58 (3) (2009) 409–423.

[41] E.J. Marinissen, S.K. Goel, M. Lousberg, Wrapper design for embedded core test, in: International Test Conference, 2000, pp. 911–920.

[42] D. Bertsimas, J. Tsitsiklis, Introduction to Linear Optimization, Athena Scientific, 1997.

[43] S. Samii, E. Larsson, K. Chakrabarty, Z. Peng, Cycle-accurate test power modeling and its application to SoC test scheduling, in: ITC, 2006, pp. 1–10.

[44] Xpress-MP. <www.dashoptimization.com>, 2007.
