[IEEE 2013 IEEE 24th International Conference on Application-specific Systems, Architectures and...

Design Space Exploration for Reliable mm-Wave Wireless NoC Architectures

Paul Wettin, Partha Pratim Pande, Deukhyoun Heo, Benjamin Belzer

School of Electrical Engineering and Computer Science

Washington State University

Pullman, USA

{pwettin, pande, dheo, belzer}@eecs.wsu.edu

Sujay Deb

Department of Electronics and Communications Engineering

Indraprastha Institute of Information Technology Delhi, India


Amlan Ganguly

Department of Computer Engineering

Rochester Institute of Technology Rochester, USA


Abstract—The Network-on-Chip (NoC) paradigm is used as a scalable interconnection infrastructure for multi-core chips. On-chip wireless interconnect has emerged as a radically different technology for enhancing the performance of conventional interconnect-based multi-core chips. However, this emerging interconnect paradigm poses significant challenges for reliable integration and design. In this paper, we focus on two types of mm-wave wireless NoC architectures. The first is a hierarchical architecture with long-range wireless shortcuts, and the second is a small-world network with power-law connectivity and no hierarchy. We demonstrate that the hierarchical architecture offers more bandwidth and lower energy dissipation than its small-world counterpart, but it has significantly more area overhead. Moreover, the small-world wireless NoC with power-law connectivity is more robust in the presence of wireless link failures.

Keywords—NoC; wireless; mm-wave; small world

I. INTRODUCTION

Conventionally, the Network-on-Chip (NoC) paradigm is used as a scalable interconnection infrastructure for multi-core chips. Despite their widespread adoption, traditional NoCs suffer from limitations arising from planar metal wire-based, multi-hop communication, which leads to high power and latency. NoCs with mm-wave wireless links operating in the 10-100 GHz range are a promising alternative for alleviating the performance limitations of their traditional wireline counterparts [1]. NoCs with long-range wireless links can be designed in various ways. One possibility is a hierarchical NoC architecture with long-range mm-wave wireless shortcuts (HmWNoC), as proposed in [2]. In this design, the system is partitioned into multiple small clusters of neighboring cores called subnets. The subnets have regular NoC architectures such as a mesh or ring. The cores in a subnet are connected to a centrally located hub through wired links, and the hubs from all subnets are connected in a second-level network to form a hierarchical structure. This second level is built by interconnecting adjacent hubs with wireline links and by introducing a few long-range mm-wave wireless shortcuts between distant hubs. As an alternative, in this work we consider the design of a small-world NoC architecture with mm-wave wireless links (mSWNoC), in which the nodes are connected according to a power-law model. Our aim is to quantify the tradeoffs among performance, reliability, and area overhead for the HmWNoC and mSWNoC architectures. On-chip wireless links are an emerging technology and may suffer high failure rates due to integration challenges. At the same time, they enable the performance benefits of both HmWNoC and mSWNoC. It is therefore important to study how wireless link failures affect the performance of both architectures.

II. RELATED WORK

Conventional NoCs use multi-hop, packet-switched communication. NoCs have been shown to perform better when long-range wired links are inserted according to the principles of small-world graphs [3]. A comprehensive survey of wireless NoC (WiNoC) architectures and their design principles is presented in [1]. Novel architectures enabled by on-chip wireless communication have been explored in [2] and [4], which proposed hierarchical and hybrid WiNoC architectures using long-range wireless shortcuts.

In this work, we introduce a design methodology for a small-world mm-wave wireless NoC whose network architecture follows power-law connectivity and avoids the additional area overhead inherent in the hierarchical structure. We present a detailed performance evaluation of the proposed architecture against the previously proposed hierarchical architecture. We also show that mSWNoC performance degrades less than HmWNoC performance when wireless links fail.

III. NETWORK TOPOLOGY

In this section we describe the salient characteristics of the HmWNoC and mSWNoC architectures.

A. HmWNoC Architecture

The design of an HmWNoC was proposed in [2]. The whole system is divided into multiple small clusters of neighboring cores called subnets. Within each subnet, the cores are interconnected as in a standard NoC, for example a mesh or ring. Each core is also connected to a centrally located hub through a direct link, and the hubs from all the subnets are connected in a second-level network to form a hierarchical structure. The hubs are connected in a mesh topology, and a few long-range mm-wave wireless links between distant hubs are overlaid on this mesh. The wireless interfaces (WIs) are distributed optimally to reduce hop count. The hybrid and hierarchical nature of the HmWNoC allows many possible overall system architectures. It has already been demonstrated that a star-ring subnet architecture combined with a mesh-based upper level with long-range wireless links provides the best performance-overhead tradeoff [2]. The hierarchical divisions that achieve the best performance for 64-, 128-, and 256-core systems are 8, 8, and 16 subnets, respectively [2].

This work was supported in part by the US National Science Foundation (NSF) grants CCF-0845504, CNS-1059289, and CCF-1162202, and Army Research Office grant W911NF-12-1-0373.

978-1-4799-0493-8/13/$31.00 © 2013 IEEE ASAP 2013

B. Architecture of mSWNoC

In the proposed mSWNoC topology, each core is connected to a switch, and the switches are interconnected using both wireline and wireless links. The topology of the mSWNoC is a small-world network in which the links between switches are established following a power-law model [5]. Such a small-world network still contains several long wireline interconnects. These are extremely costly in terms of power and delay, so we use mm-wave wireless links to connect switches that are far apart. In [6], we demonstrated that three non-overlapping channels can be created with on-chip mm-wave wireless links. Using these three channels, we overlay wireless links on the wireline small-world connectivity so that a few switches gain an additional wireless port, each equipped with a WI. Each WI in the network is assigned one of the three frequency channels, and WIs that communicate more frequently are assigned the same channel to optimize the overall hop count. Finally, one WI is replaced by a gateway WI that has all three channels assigned to it, which enables data exchange between the non-overlapping wireless channels.

We assume an average number of connections, <k>, from each switch to other switches. We set <k> to four so that the mSWNoC introduces no additional switch overhead compared with a conventional mesh. We also impose an upper bound, kmax, on the number of wireline links attached to any switch so that no switch in the mSWNoC becomes unrealistically large. This bound also reduces the skew in the distribution of links among the switches. Neither <k> nor kmax includes the local switch port that connects to the core.
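The sketch below shows how <k> and kmax constrain a power-law wireline topology on a 2D grid of switches. It is a minimal illustration under stated assumptions and is not the exact model of [5]. It assumes that a link between switches i and j is drawn with probability proportional to d_ij^(-alpha), where d_ij is the Manhattan distance and the exponent `alpha` is a hypothetical parameter. It adds N<k>/2 links so that the average degree equals <k>, and it rejects any link that would push a switch past kmax. The function name and parameter values are illustrative. The sketch does not check that the resulting graph is connected.

```python
import itertools
import random


def build_small_world(rows, cols, k_avg=4, k_max=7, alpha=1.8, seed=0):
    """Hypothetical sketch: distance-based power-law link insertion.

    Assumption: P(link i-j) is proportional to manhattan(i, j) ** -alpha.
    Degrees count switch-to-switch links only (the core port is excluded).
    """
    rng = random.Random(seed)
    nodes = [(r, c) for r in range(rows) for c in range(cols)]
    n_links = len(nodes) * k_avg // 2          # average degree <k> -> N<k>/2 links
    pairs = list(itertools.combinations(range(len(nodes)), 2))

    def dist(i, j):
        (r1, c1), (r2, c2) = nodes[i], nodes[j]
        return abs(r1 - r2) + abs(c1 - c2)

    weights = [dist(i, j) ** -alpha for i, j in pairs]
    degree = [0] * len(nodes)
    links = set()
    while len(links) < n_links:
        i, j = rng.choices(pairs, weights=weights, k=1)[0]
        if (i, j) in links or degree[i] >= k_max or degree[j] >= k_max:
            continue                           # duplicate link or k_max exceeded
        links.add((i, j))
        degree[i] += 1
        degree[j] += 1
    return nodes, links, degree
```

For a 64-core system (an 8 x 8 grid with kmax = 7), `build_small_world(8, 8)` returns 128 links. Rejection sampling keeps the sketch short, but it would be slow if kmax were close to <k>.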

IV. COMMUNICATION AND CHANNELIZATION

This section describes the overall communication mechanism, including routing and flow control, and the WI components for both the HmWNoC and mSWNoC.

A. Routing and Flow Control

For the HmWNoC, intra-subnet data routing depends on the Ring-Star subnet topology. Within each subnet, deadlock is avoided by adopting the Red Rover algorithm [7]. Routing among the hubs combines dimension-order routing for hubs without WIs with a South-East routing algorithm for hubs with wireless shortcuts. This routing algorithm is deadlock free [3].
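The sketch below illustrates dimension-order routing among hubs without WIs using the common XY variant on the hub mesh. It assumes that hubs are addressed by (x, y) mesh coordinates and that the X dimension is resolved before Y. It does not model the South-East routing used at hubs with wireless shortcuts, and the function name is hypothetical.

```python
def xy_route(src, dst):
    """Hypothetical sketch: dimension-order (XY) route between mesh hubs."""
    x, y = src
    path = [(x, y)]
    while x != dst[0]:                 # first correct the X coordinate
        x += 1 if dst[0] > x else -1
        path.append((x, y))
    while y != dst[1]:                 # then correct the Y coordinate
        y += 1 if dst[1] > y else -1
        path.append((x, y))
    return path
```

Because every route finishes its X hops before taking any Y hop, a Y-to-X turn never occurs. This is why dimension-order routing on a mesh is deadlock free on its own.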

For the mSWNoC, we adopt an up/down tree-based routing scheme [8] to ensure deadlock-free routing. All wireless and wireline links that are not part of the tree are introduced as shortcuts. An allowed route never uses an up-direction link after it has taken a down-direction link. Cyclic channel dependencies therefore cannot form, and deadlock freedom is achieved [8].
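The up/down rule can be expressed as a check on the directions of the hops a route takes. The sketch below uses the common convention that moving to a switch closer to the tree root is "up", with ties broken by lower switch ID. This labeling is an assumption, since the source does not state how [8] assigns directions. The labels apply to every link, including shortcuts. The function names are hypothetical.

```python
from collections import deque


def bfs_levels(adj, root):
    """Hop distance of every switch from the root of the BFS tree."""
    level = {root: 0}
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in level:
                level[v] = level[u] + 1
                queue.append(v)
    return level


def is_up(u, v, level):
    # Assumed convention: a hop toward a switch closer to the root
    # (ties broken by lower switch ID) is the "up" direction.
    return (level[v], v) < (level[u], u)


def is_legal_up_down(path, level):
    """A route may not take an 'up' hop after it has taken a 'down' hop."""
    gone_down = False
    for u, v in zip(path, path[1:]):
        if is_up(u, v, level):
            if gone_down:
                return False
        else:
            gone_down = True
    return True
```

On a 4-switch ring rooted at switch 0, the route 1 → 0 → 2 (up, then down) is allowed. The route 1 → 3 → 2 (down, then up) is rejected.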

For a given source-destination pair, a wireless link through the WIs is chosen only if the wireless path is shorter than the wireline path and the WI holds the token, as described below. Because many messages may try to use the WI shortcuts at the same time, the WIs can become hotspots and cause higher latency and energy dissipation. Token flow control [9] is used to alleviate this problem. In addition, an arbitration mechanism grants access to the wireless medium to one WI at a time, which avoids interference and contention among WIs that share the same frequency channel. To avoid centralized control and synchronization, the arbitration policy is a wireless token-passing protocol [2].
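The path-selection rule and token-passing arbitration described above can be sketched together. The sketch assumes one token per frequency channel, passed round-robin among the WIs on that channel, and compares path lengths in hops. The class and function names are hypothetical, and the token flow control of [9] is not modeled.

```python
class ChannelToken:
    """One token circulating round-robin among the WIs sharing a channel."""

    def __init__(self, wi_ids):
        self.wi_ids = list(wi_ids)
        self.pos = 0

    def holder(self):
        return self.wi_ids[self.pos]

    def pass_on(self):
        # The current holder releases the channel to the next WI in the ring.
        self.pos = (self.pos + 1) % len(self.wi_ids)


def choose_path(wired_hops, wireless_hops, wi, token):
    """Use the wireless shortcut only if it is shorter and the WI holds the token."""
    if wireless_hops < wired_hops and token.holder() == wi:
        return "wireless"
    return "wireline"
```

Only the token holder can transmit on a channel at any instant, so WIs on the same frequency never contend. No central controller is needed because each WI only has to know its successor in the ring.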

B. Wireless Interface

The two principal WI components are the antenna and the transceiver. The on-chip antenna for both architectures has to provide the best power gain for the smallest area overhead. A metal zig-zag antenna [10] has been demonstrated to possess these characteristics. To ensure high throughput and energy efficiency, the WI transceiver circuitry has to provide a very wide bandwidth as well as low power consumption. The detailed description of the transceiver circuit is out of the scope of this paper and is available in [2] and [6].

V. EXPERIMENTAL RESULTS

A. Experimental Setup

All wired links are as wide as the flit, which is 32 bits in this paper. Each packet consists of 64 flits. All input and output ports, including those connected to WIs, have four virtual channels each. All ports except those associated with the WIs have a buffer depth of two flits. The ports associated with the WIs have an increased buffer depth of eight flits to avoid excessive latency penalties while waiting for the token. Increasing the buffer depth beyond this limit produces no further performance improvement for this packet size, but it adds area overhead [2].

B. Determination of Topology

First, our aim is to determine a suitable maximum number of ports for the switches of mSWNoC. In [5] it is shown that the optimum values of kmax are 7, 8, and 8 for system sizes of 64, 128, and 256, respectively.

As mentioned previously, in HmWNoC all of the cores in a subnet are connected to the hub via wireline links. These direct connections, however, add area overhead. Reducing the number of direct core-to-hub connections in each subnet lowers the hub area overhead but also degrades performance. To analyze the area-bandwidth tradeoff of HmWNoC, we consider three core-to-hub connectivity configurations.


An HmWNoC in which all cores in each subnet are connected to the hub is called an HmWNoC with full connection (HmWNoC_fc). Halving the number of core-to-hub connections gives a network with intermediate connection (HmWNoC_ic). Halving the connections again gives an HmWNoC with sparse connection (HmWNoC_sc) [2]. For the 64-core HmWNoC, the numbers of core-to-hub connections for HmWNoC_fc, HmWNoC_ic, and HmWNoC_sc are 8, 4, and 2, respectively. For both the 128-core and the 256-core HmWNoC, the numbers are 16, 8, and 4, respectively. It is shown in [2] that reducing the number of direct core-to-hub connections reduces the hub area overhead but also significantly lowers the achievable bandwidth of the system. A suitable HmWNoC configuration should therefore be chosen by weighing the relative importance of area overhead and bandwidth for the given design requirements.
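The connection counts above can be reproduced by dividing the cores evenly among the subnets and then halving twice. The sketch uses the subnet counts of 8, 8, and 16 reported in Section III.A. The function name is hypothetical.

```python
SUBNETS = {64: 8, 128: 8, 256: 16}   # best hierarchical divisions reported in [2]


def core_to_hub_links(num_cores):
    """Per-subnet core-to-hub connections for the fc, ic and sc variants."""
    full = num_cores // SUBNETS[num_cores]     # every core in the subnet
    return {"fc": full, "ic": full // 2, "sc": full // 4}


for n in (64, 128, 256):
    print(n, core_to_hub_links(n))
```

The 128-core and 256-core systems both have 16 cores per subnet, so they share the same counts.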

C. Wireless Channel Characteristics

The metal zig-zag antennas described earlier are used to establish the on-chip wireless links. We are able to obtain three different channels with 3 dB bandwidths of 16 GHz and center frequencies of 31, 57.5, and 120 GHz, respectively, with a communication range of 20 mm [6]. The wireless transceiver circuitry is designed and simulated using the TSMC 65-nm CMOS process and occupies an area of 0.3 mm². The wireless link dissipates 2.3 pJ/bit and sustains a data rate of 16 Gbps [2].
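Combining the setup parameters with these link figures gives a rough scale for one packet crossing a single wireless hop. This is a back-of-the-envelope estimate and not a result from the paper. It assumes a 32-bit flit as in Section V.A and ignores token waiting, headers, and any wireline hops.

```python
FLIT_BITS = 32            # flit width (Section V.A)
FLITS_PER_PACKET = 64
ENERGY_PER_BIT_PJ = 2.3   # wireless link energy [2]
DATA_RATE_GBPS = 16       # wireless link data rate [2]

packet_bits = FLIT_BITS * FLITS_PER_PACKET            # 2048 bits
serialization_ns = packet_bits / DATA_RATE_GBPS        # bits / (Gbit/s) = ns
energy_nj = packet_bits * ENERGY_PER_BIT_PJ / 1000     # pJ -> nJ

print(f"{packet_bits} bits, {serialization_ns:.0f} ns, {energy_nj:.2f} nJ")
```

Under these assumptions, one packet occupies the wireless channel for 128 ns and costs about 4.7 nJ on the wireless link.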

D. Optimum Number of WIs and Placement

We next determine the optimum number and location of WIs for both architectures at each system size. It was shown in [2] that WI placement is optimum when the distance between WIs is at least 7 mm (in the 65-nm technology node). Under this 7 mm constraint, the optimum numbers of WIs for the HmWNoC are 5, 5, and 7 for system sizes of 64, 128, and 256, respectively [2]. The second-level network of the HmWNoC is much smaller than the mSWNoC, where WIs can be placed at any switch, so the total possible number of WIs is also smaller. Moreover, the hierarchical division itself yields significant performance gains, so fewer WIs are needed to achieve a given performance gain.

In mSWNoC, creating the small-world network according to [5] can produce wireline links between switches that are far apart. We replace these long wireline links with more energy-efficient wireless shortcuts while still respecting the 7 mm minimum distance between WIs mentioned above. WI placement differs between the two architectures. For mSWNoC, we place WIs following the power-law connectivity. For HmWNoC, we start with a random placement of WIs in the hub mesh and then use simulated annealing (SA) to find the optimum placement. In mSWNoC, however, not all long wireline links can be replaced, because WIs must be at least 7 mm apart and a wireline link that is part of the routing tree [8] cannot be replaced. As in the HmWNoC, the allowable number of WIs depends on the token return period. The optimum numbers of WIs are 12, 13, and 13 for system sizes of 64, 128, and 256, respectively [5]. Because of the link-length and tree-routing constraints above, the number of possible WI locations does not increase with system size. The token return period further limits the allowable number of WIs. As a result, the optimum number of WIs does not scale with system size.
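The constraints on replacing long wireline links in the mSWNoC can be expressed as a greedy filter. This is a hypothetical illustration and not the procedure of [5]. It scans candidate links from longest to shortest and skips links that belong to the routing tree. It also skips any link whose endpoints would lie closer than 7 mm to an already placed WI. It stops when a cap on the number of WIs is reached, and this cap stands in for the token-return-period limit. Switch coordinates in millimetres and the cap value are assumed inputs.

```python
import math


def select_wireless_links(links, pos, tree_links, min_sep_mm=7.0, max_wis=12):
    """Hypothetical sketch: greedily pick long wireline links to make wireless.

    links      -- iterable of (u, v) wireline links
    pos        -- {switch: (x_mm, y_mm)} physical switch coordinates
    tree_links -- links of the up/down routing tree (these must stay wired)
    """
    def length(u, v):
        return math.dist(pos[u], pos[v])

    def far_enough(s, placed):
        return all(math.dist(pos[s], pos[w]) >= min_sep_mm for w in placed)

    wis, replaced = [], []
    for u, v in sorted(links, key=lambda e: length(*e), reverse=True):
        if len(wis) + 2 > max_wis:
            break                                  # WI budget exhausted
        if (u, v) in tree_links or (v, u) in tree_links:
            continue                               # tree links stay wired
        if length(u, v) < min_sep_mm:
            break                                  # all remaining links are short
        if far_enough(u, wis) and far_enough(v, wis):
            wis += [u, v]                          # one WI at each endpoint
            replaced.append((u, v))
    return replaced, wis
```

A greedy longest-first scan is used here only to keep the example short. It shows how the separation, tree, and budget constraints limit the WI count independently of system size, but it does not claim to find the optimum placement.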

E. Comparative Performance Analysis

Figs. 1 and 2 show the achievable bandwidth and energy dissipation of the mSWNoC and the three HmWNoC architectures for all three system sizes. All of these wireless NoC architectures offer more bandwidth and dissipate less energy than a conventional mesh, but at the cost of additional area overhead. Fig. 3 shows this area overhead relative to a standard mesh of the same system size. Our aim is to find the wireless architecture that provides the best performance-area overhead trade-off.

Figs. 1 and 2 show that HmWNoC_fc outperforms mSWNoC. This performance benefit stems mainly from the hierarchical nature of the network, but it comes at the cost of higher area overhead. In HmWNoC, the area overhead comes from both the subnet hubs, which have extra ports, and the wireless transceivers. In mSWNoC, the area overhead comes from the wireless transceivers only. For example, when the 128-core HmWNoC is divided into 8 subnets of 16 cores each, each hub has 16 extra ports. Hence, although the mSWNoC has more WIs, the hub area in HmWNoC still results in a much larger overall area overhead. Moreover, the area overhead of the mSWNoC does not grow with system size, whereas that of the HmWNoC does. As the system size increases, either the hub size or the number of hubs in the HmWNoC increases. In the mSWNoC, by contrast, the number of WIs is the same for the 128- and 256-core systems and only one fewer for the 64-core system.

Reducing the number of core-to-hub connections in HmWNoC lowers its bandwidth and raises its energy dissipation, but it also reduces its area overhead. As Figs. 1 and 2 show, HmWNoC_ic still performs better than mSWNoC, again because of the hierarchical nature of the network. However, the effects of having fewer core-to-hub connections start to become evident. For HmWNoC_sc, the reduction in core-to-hub connections results in lower bandwidth and higher energy dissipation than mSWNoC. The area overhead of HmWNoC_sc is lower than that of mSWNoC for the 64- and 128-core systems, but it is higher for the 256-core system. In HmWNoC_sc, data from cores without a direct connection to the hub must traverse multiple hops within the subnet before reaching the wireless channel, which degrades performance. Consequently, the benefit of the hierarchical network diminishes and mSWNoC performs better. This analysis shows that if the area overhead of the HmWNoC is reduced to the level of the mSWNoC, the mSWNoC can outperform the HmWNoC. The higher performance of a fully connected HmWNoC comes at the cost of more area.

Figure 1. Bandwidth of mSWNoC, HmWNoC, and Mesh.

Figure 2. Energy dissipation of mSWNoC, HmWNoC, and Mesh.

Figure 3. Area overhead of mSWNoC and HmWNoC architectures.

F. Robustness to WI Failure

The performance advantages of all these wireless NoC architectures arise principally from the long-range wireless links. WI failures can therefore be expected to compromise those performance gains. In this section, we evaluate the performance of mSWNoC and the three HmWNoC architectures in the presence of WI failures. Failures are injected by randomly disabling a given percentage of the WIs.
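The sketch below shows failure injection by randomly disabling a fraction of the WIs. Rounding the number of failed WIs to the nearest integer is an assumption, because the source does not say how fractional counts are handled. The helper name `inject_wi_failures` is hypothetical.

```python
import random


def inject_wi_failures(wi_ids, fail_fraction, seed=None):
    """Randomly disable round(fail_fraction * #WIs) WIs; return (alive, failed)."""
    rng = random.Random(seed)
    n_fail = round(fail_fraction * len(wi_ids))
    failed = set(rng.sample(list(wi_ids), n_fail))
    alive = [w for w in wi_ids if w not in failed]
    return alive, failed


# Example: the 128-core mSWNoC has 13 WIs, so 75% failure disables 10 of them.
alive, failed = inject_wi_failures(range(13), 0.75, seed=1)
```

Averaging each measurement over several seeds would separate the effect of the failure fraction from the effect of which particular WIs happen to fail.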

Fig. 4 shows the bandwidth of the 128-core mSWNoC, HmWNoC_fc, HmWNoC_ic, and HmWNoC_sc with WI failures under uniform random traffic. It also shows the additional silicon area required by each architecture. As explained above, this area overhead includes contributions from both wireline and wireless resources. HmWNoC_fc and HmWNoC_ic perform better than mSWNoC when there are no WI failures. However, HmWNoC loses much more performance than mSWNoC when WIs fail. As Fig. 4 shows, when 75% of the WIs fail, the bandwidth of mSWNoC degrades by only 19%. Under the same failure rate, the bandwidth of HmWNoC_fc, HmWNoC_ic, and HmWNoC_sc degrades by 26%, 53%, and 54%, respectively. At 75% WI failure, mSWNoC also outperforms HmWNoC_ic and HmWNoC_sc and performs slightly worse than HmWNoC_fc. Although HmWNoC_fc outperforms mSWNoC even under high WI failure rates, it has a larger area overhead, and this trade-off must be considered. With 75% WI failure, HmWNoC_fc achieves 1.3 times the bandwidth of mSWNoC but occupies twice the area. With 50% and 75% WI failures, mSWNoC provides more bandwidth than HmWNoC_ic with less area overhead. Only HmWNoC_sc requires less area overhead than mSWNoC, but it achieves much lower bandwidth in all cases. This implies that mSWNoC provides a better performance-area overhead trade-off than HmWNoC and is more robust to WI failures.
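The two figures quoted above for 75% WI failure can be combined into a simple bandwidth-per-area-overhead ratio that makes the trade-off explicit. This normalization is an illustrative choice and is not a metric defined in the paper. It reads "twice the area" as referring to the area overhead shown in Fig. 4 and uses the rounded values 1.3 and 2. Here B denotes bandwidth and A denotes area overhead.

```latex
\frac{(B/A)_{\mathrm{fc}}}{(B/A)_{\mathrm{mSW}}}
  = \frac{B_{\mathrm{fc}} / B_{\mathrm{mSW}}}{A_{\mathrm{fc}} / A_{\mathrm{mSW}}}
  = \frac{1.3}{2} = 0.65
```

Under this reading, for each unit of area overhead, HmWNoC_fc delivers roughly 65% of the bandwidth that mSWNoC delivers when 75% of the WIs have failed.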

VI. CONCLUSION

In this paper, we have carried out a detailed performance evaluation of mm-wave wireless NoC architectures and established the relevant design trade-offs. We have demonstrated that a hierarchical architecture with long-range wireless shortcuts achieves more bandwidth and lower energy dissipation than the small-world architecture based on power-law connectivity. However, these benefits come at the cost of more silicon area. If the area overhead of the hierarchical architecture is reduced to that of the small-world architecture, the small-world architecture offers better performance. The small-world wireless NoC is also more resilient to wireless link failures, although its raw performance may be lower.

REFERENCES

[1] S. Deb, et al., “Wireless NoC as interconnection backbone for multicore chips: promises and challenges,” IEEE J. on Emerg. Sel. Topics in Circuits Syst., vol. 2, no. 2, 2012, pp. 228-239.

[2] K. Chang, et al., “Performance evaluation and design trade-offs for wireless network-on-chip architectures,” ACM J. Emerg. Technol. Comput. Syst., vol. 8, no. 3, 2012.

[3] U. Y. Ogras and R. Marculescu, “It’s a small world after all: NoC performance optimization via long-range link insertion,” IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 14, no. 7, 2006, pp. 693-706.

[4] A. Ganguly, et al., “Scalable hybrid wireless network-on-chip architectures for multi-core systems,” IEEE Trans. Comput., vol. 60, no. 10, 2011, pp. 1485-1502.

[5] P. Wettin, et al., “Energy-efficient multicore chip design through cross-layer approach,” Proc. of IEEE Design, Automation and Test in Europe (DATE), 2013.

[6] S. Deb, et al., “CMOS compatible many-core NoC architectures with multi-channel millimeter-wave wireless links,” Proceedings of Great Lakes Symposium on VLSI (GLSVLSI), May 2012.

[7] J. Draper and F. Petrini, “Routing in bidirectional k-ary n-cube switch the red rover algorithm,” Proceedings of the International Conference on Parallel and Distributed Processing Techniques and Applications, 1997, pp. 1184-1193.

[8] H. Chi and C. Tang, “A deadlock-free routing scheme for interconnection networks with irregular topology,” Proceedings of ICPADS, pp. 88-95.

[9] A. Kumar, et al., “Token flow control,” Proceedings of the 41st IEEE/ACM International Symposium on Microarchitecture (MICRO), 2008, pp. 342-353.

[10] A. B. Floyd, et al., “Intra-chip wireless interconnect for clock distribution implemented with integrated antennas, receivers, and transmitters,” IEEE J. of Solid-State Circuits, vol. 37, no. 5, 2002, pp. 543-552.

Figure 4. Bandwidth and area of 128-core systems with WI failure.

