
Architectures for Networks‐on‐Chips with Emerging Interconnect Technologies 


CMPE 750 – Project Presentation

Sagar Saxena

Contents
• System-on-Chip: idea and overview
• Need for multi-core chips
• The NoC paradigm: traditional approaches and their limitations
• Conventional 2D NoC architectures and wiring complexity
• Emerging interconnect technologies
• 3D NoCs and photonic NoCs
• Corona and Firefly architectures
• On-chip RF/wireless interconnects and on-chip antennas
• Small-world networks: the Watts–Strogatz model

System on Chip: Technology Scaling
• Transistor count: exponential growth
• Prohibitive manufacturing costs: over 100 million transistors in today's SoCs; $10M–$100M for 0.13-micron designs
• Response: reusability and platform-based design

System on Board vs. System on Chip
Conceptually, System on Chip refers to integrating the components of a board onto a single chip. This looks straightforward, but productivity levels are still too low to make it a reality.

Examples of IP Blocks in Use Today
• RISC: ARM, MIPS, PowerPC, SPARC
• CISC: 680x0, x86
• Interfaces: USB, PCI, UART, Rambus
• Encryption: DES, AES
• Multimedia: JPEG coder, MPEG decoder
• Networking: ATM switch, Ethernet
• Microcontroller: HC11, etc.
• DSP: Oak, TI, etc.

SoC is forcing companies to develop high-quality IP blocks to stay in business.

Overview of Design and Technology Trends

• 1.5 GHz Itanium (Intel): 410M transistors, 374 mm², 130 W @ 1.3 V
• 1.1 GHz POWER4 (IBM): 170M transistors, 115 W @ 1.5 V
  – If these trends continue, power will become unmanageable.
• 150 MHz Sony graphics processor: 7.5M transistors (logic) + 280M transistors (memory) ≈ 288M transistors, 400 mm², 10 W @ 1.8 V
  – If this trend continues, most designs in the future will have a high percentage of memory.

Contd…

Interconnects are the biggest bottleneck

We need to look beyond the metal/dielectric‐based planar architectures

Optical, 3D integration and Wireless are the emerging alternatives

• Single-chip Bluetooth transceiver (Alcatel): 400 mm², 150 mW @ 2.5 V
  – Required 30 designers over 2.5 years (75 person-years).
  – If this trend continues, it will be difficult to integrate larger systems on a single chip in a reasonable time.

Moore’s Law

• The number of transistors in a chip doubles approximately every two years.
• This has provided the computational power demanded so far.
• Challenge: scaling up frequency causes elevated power dissipation.

[Figure: Original Moore's Law graph, 1965. Source: Intel]

Power Dissipation

[Figure: power density (W/cm²) of Intel processors from the 4004 (1971) through the Pentium family, rising toward the levels of a hot plate, a nuclear reactor, a rocket nozzle, and the Sun's surface. Source: Intel]

• Scaling up speed/frequency further is no longer feasible.

Multi‐Core Systems Era

Challenge: keeping up with the demand for computational power when clock frequency cannot be scaled.

Solution: increase the number of cores (parallelism).
• Intel and AMD dual-core and quad-core CPUs
• Custom Systems-on-Chip (SoCs)
• The number of cores will increase manifold in the next 5–10 years.

New challenge: interconnection of the cores!

[Figure: Intel Single-chip Cloud Computer processor]

NoC: The Network-on-Chip Paradigm
• Packet-switched on-chip network
• "Route packets, not wires" – Bill Dally, 2000

Dedicated infrastructure for data transport

Decoupling of functionality from communication

NoC infrastructure

Multiple publications in IEEE ISSCC, 2010 from Intel, IBM, AMD, Renesas Tech. and Sun Microsystems show that multi-core NoC is a reality

Limitations of a Traditional NoC
• Multi-hop wireline communication
• High latency and energy dissipation

[Figure: multi-hop path from source to destination; legend: core, NoC interface, NoC switch]

80% of chip power will be from on-chip interconnects in the next 5 years – ITRS, 2007

Novel Interconnect Paradigms for Multicore Designs
• Wireless/RF interconnects
• Optical interconnects
• Three-dimensional integration

Goal: high bandwidth and low energy dissipation.


The Network-on-Chip Paradigm

Driven by

Increased levels of integration

Complexity of large SoCs

– New designs counting 100s of IP blocks

Need for platform‐based design methodologies

DSM constraints (power, delay, time‐to‐market, etc…)


Some Common Architectures: (a) Mesh, (b) Folded Torus (FT), and (c) Butterfly Fat-Tree (BFT)

[Figure legend: functional IP, switch; panels (a), (b), (c)]

Data Transmission
• Packet-based communication: low memory requirement.
• Packet switching.
• Wormhole routing: packets are broken down into flow-control units, or flits, which are then routed in a pipelined fashion.
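The packet-to-flit decomposition used by wormhole routing can be sketched as follows. This is purely illustrative: the flit width and the head/body/tail framing are assumptions, not values from the slides.

```python
# Sketch of wormhole flit segmentation: a packet becomes a header flit,
# a sequence of body flits, and a tail flit, which the network then
# forwards hop-by-hop in a pipelined fashion.
FLIT_BYTES = 4  # assumed flit payload width

def packetize(payload: bytes, flit_bytes: int = FLIT_BYTES):
    """Split a packet payload into HEAD, BODY..., TAIL flits."""
    body = [payload[i:i + flit_bytes] for i in range(0, len(payload), flit_bytes)]
    flits = [("HEAD", b"")]                      # header carries routing info
    flits += [("BODY", chunk) for chunk in body] # payload flits
    flits.append(("TAIL", b""))                  # tail releases the channel
    return flits
```

Because only the header carries routing state, each switch reserves the outgoing channel for the whole flit train until the tail passes, which is what keeps per-switch buffering low.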

Connecting Different IP Blocks Using a Tree Architecture

2D Networks-on-Chip
• A current SoC contains several different microprocessor components, memory subsystems, and I/Os together on a single chip.
• The SoC market is developing fast, including multiprocessor SoC (MP-SoC) platforms for embedded processors.
• An MP-SoC combines numerous IPs that perform computation at different clock frequencies and communicate with each other.

Challenges & Need
• Integration of all the blocks on a single chip.
• Global wires are non-scalable: delay increases rapidly with wire length.
• In ultra-deep sub-micron processes, 80% of delay is due to interconnects.
• FIFOs are used for data synchronization, but this becomes costly due to process variability and the power dissipated in achieving cross-chip signaling in a single clock cycle.
• Hence the need for an on-chip interconnect architecture; the most frequently used is the shared-medium arbitrated bus.
• Hence the use of a network-on-chip: parallelism, modularity, and lower power consumption.
• High throughput and low latency, low energy consumption, with little area overhead.

SPIN (Scalable, Programmable Interconnect Network)
• Guerrier and Greiner proposed SPIN.
• A fat-tree in which each parent is replicated four times.
• The picture shows N = 16 nodes, so 16 IP blocks can be connected.
• Network growth = (N log N)/8.
• Switches are placed at the vertices; the number of switches = 3N/4.
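The SPIN scaling figures quoted above can be checked with a small helper. The formulas are taken directly from the slide; the helper itself is just a sketch.

```python
import math

# SPIN scaling per the slide: for N connected IP blocks,
# switches = 3N/4 and network (wiring) growth = (N log N)/8.
def spin_switches(n: int) -> int:
    """Number of switches in a SPIN network with n IP blocks."""
    return (3 * n) // 4

def spin_growth(n: int) -> float:
    """Network growth figure (N log2 N)/8 from the slide."""
    return n * math.log2(n) / 8
```

For the 16-node example pictured on the slide, this gives 12 switches, matching 3N/4 with N = 16.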

CLICHÉ (Chip-Level Integration of Communicating Heterogeneous Elements)
• An m × n mesh of switches interconnecting IPs.
• Every switch connects to up to four neighboring switches.
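The four-neighbor connectivity of the CLICHÉ mesh can be sketched with a minimal helper; the coordinates and mesh size used here are illustrative.

```python
# Neighbors of switch (r, c) in an m x n mesh: each switch connects to up to
# four adjacent switches (fewer at the edges and corners).
def mesh_neighbors(r, c, m, n):
    candidates = [(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)]
    return [(i, j) for i, j in candidates if 0 <= i < m and 0 <= j < n]
```

A corner switch has two neighbors and an interior switch has four; a torus variant would instead wrap the coordinates modulo m and n.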

2D Torus
• Dally and Towles proposed the 2D torus.

Folded Torus
• An enhanced version of the 2D torus.
• Well suited to VLSI implementation.
• The length of each wire segment remains the same.

Octagon
• Karim et al. proposed Octagon for MP-SoCs.
• The picture shows 8 nodes and 12 bidirectional links.
• Communication between any pair of nodes takes at most 2 hops within the basic octagonal unit.
• For larger systems, the octagonal unit is extended to multidimensional space.

Butterfly Fat-Tree (BFT)
• IPs are placed at the leaves and switches at the vertices.
• Nodes are denoted by (l, p), where l denotes the level and p the position.
• A switch is denoted by S(l, p).
• Each switch has four down links to child ports and two up links to parent ports; the number of switches at level j is N/2^(j+1), so at j = 1 there are N/4 switches.
• Going up each level, the number of switches required halves.
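The BFT switch-count formula above can be sketched as a small helper. The per-level count follows the slide's S(j) = N/2^(j+1); the total below simply sums levels until fewer than one switch remains, which glosses over how a real BFT terminates its topmost level.

```python
# Switch counts in a Butterfly Fat-Tree with n_ips leaf IP blocks,
# per the slide's formula: switches at level j = N / 2^(j+1).
def bft_switches_at_level(n_ips: int, level: int) -> int:
    return n_ips // (2 ** (level + 1))

def bft_total_switches(n_ips: int) -> int:
    """Sum of switches over all levels (sketch; top-level handling varies)."""
    total, level = 0, 1
    while n_ips // (2 ** (level + 1)) >= 1:
        total += n_ips // (2 ** (level + 1))
        level += 1
    return total
```

For N = 64 this gives 16, 8, 4, 2, 1 switches at successive levels, illustrating the halving described in the last bullet.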

Performance Metrics
• An architecture should exhibit high throughput, low latency, energy efficiency, and low area overhead.
• Message throughput: the rate at which messages are delivered by the network.
• Transport latency: the time from the occurrence of the message header at the source to the occurrence of its tail at the destination.
• Energy: the energy consumed in the interconnects and switches.
• Area requirements: the area consumed by the switches and interconnects; some interconnects need to be buffered, which further increases area.
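The throughput and energy metrics above are commonly formalized as follows. This is a sketch based on the standard NoC evaluation literature (e.g., the Pande et al. trade-offs paper in the reference list); the slide's original expressions appeared as images and may differ in detail.

```latex
\mathrm{TP} \;=\; \frac{(\text{total messages completed}) \times (\text{message length})}
{(\text{number of IP blocks}) \times (\text{total time})}
\qquad
E_{\text{packet}} \;=\; \sum_{h=1}^{n_{\text{hops}}} \left( E_{\text{switch},h} + E_{\text{interconnect},h} \right)
```

Throughput is thus expressed in flits per IP block per cycle, consistent with the flits-per-destination-per-unit-time measurement described in the evaluation methodology.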

Evaluation Methodology
• Developed a flit-level, event-driven wormhole-routing simulator to study the characteristics of the interconnect architectures.
• Traffic injection follows two distributions: Poisson and self-similar.
• Each simulation runs for 1,000 cycles to allow transient effects to stabilize, then executes for 20,000 measured cycles.
• Throughput is calculated as the number of flits reaching each destination per unit time.
• Average latency and energy are calculated as shown on the previous slide.
• For area, a complete VHDL model was synthesized in a 0.13 µm technology.
• To determine interconnect energy, the capacitance was extracted for each topology.
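The warm-up-then-measure protocol described above can be sketched in a few lines. This is not a wormhole simulator: routing and contention are abstracted into a random per-packet delay, Poisson injection is approximated by a Bernoulli trial per core per cycle, and all parameter values are illustrative.

```python
import random

def run_sim(num_cores=16, inj_rate=0.1, warmup=1000, measure=20000, seed=0):
    """Sketch of the measurement bookkeeping: warm up, then count deliveries."""
    random.seed(seed)
    delivered = 0
    for cycle in range(warmup + measure):
        for _ in range(num_cores):
            if random.random() < inj_rate:            # packet injected
                arrival = cycle + random.randint(2, 6)  # abstract network delay
                if arrival >= warmup:                  # count post-warmup only
                    delivered += 1
    # throughput: flits delivered per core per measured cycle
    return delivered / (num_cores * measure)
```

At low load the measured throughput tracks the injection rate, which is the sanity check a real flit-level simulator would also pass before saturation.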

[Figures: variation of throughput with number of virtual channels under spatially uniform traffic; variation of latency with virtual channels; energy dissipation profile for uniform traffic]

Chip Area Based on Topology
• Wire lengths between switches in the BFT and SPIN architectures vary with the switch level.
• To keep the inter-switch delay within one clock cycle, some wires need to be buffered.
• In CLICHÉ and the folded torus, all inter-switch wires are of equal length and the delay is within one clock cycle.
• In Octagon, the inter-switch wires connecting functional IP blocks in disjoint octagon units need to be buffered.

Area Overhead
• Silicon area overhead for all platforms under consideration across different technology nodes, assuming a 20 mm × 20 mm die and functional IP blocks of 100k gates (2-input NAND equivalents) each.
• SPIN and Octagon have higher silicon area overhead due to their higher degree of connectivity.
• The percentage of silicon area overhead increases slightly with technology scaling, but the relative areas of the platforms remain unchanged.

Wiring Complexity
• Wire lengths between switches in the BFT and SPIN depend on the level of the switches; the inter-switch wire length is given by a level-dependent expression (shown as a figure on the slide).
• For CLICHÉ and the folded torus, all wire segments are of the same length; in the folded torus, the wire lengths are double those of CLICHÉ.
• In Octagon, the links are classified as long links (0–4 and 7–3), medium links (0–7, 3–4, 1–5, 2–6), and short links (0–1, 1–2, 2–3, 4–5, 5–6, 6–7).

Wiring: Octagon

Wiring: SPIN

Wiring: BFT

Conclusion
• NoC-based architectures were examined with respect to topology: SPIN, folded torus, CLICHÉ, Octagon, and BFT were discussed in terms of power consumption, latency, area, wiring complexity, throughput, bandwidth, etc.
• The results show that some topologies achieve high data rates at the expense of high energy, while others provide lower data rates with low energy consumption.
• Locality is also an important aspect of NoC design, as it can reduce power consumption and increase throughput significantly.
• These are important aspects of characterizing emerging network-on-chip architectures.

So Far…
• The NoC paradigm
• Traditional approaches and their limitations
• Conventional 2D NoC architectures
• Wiring complexity

Now…
• Emerging interconnect technologies
• 3D NoCs and photonic NoCs
• Corona and Firefly architectures
• On-chip RF/wireless interconnects and on-chip antennas

Why Go 3D?
• Reduction in total power consumption
• Smaller footprint
• Shorter interconnects
• Circuit security, and many more…

Figure 1. Stacked IC [2]

Three-Dimensional Integrated Circuits
• Developing very fast.
• Active devices stacked in multiple layers.
• Reasons:
  – Limited floor-planning choices
  – Desire to integrate disparate technologies (GaAs, SOI, SiGe, BiCMOS) and disparate signals (analog, digital, RF)
  – The interconnect bottleneck

[Figure: 2D IC vs. 3D IC cross-sections; inter-layer distances as small as 20 µm]

3D NoC
• Pavlidis et al., "3-D Topologies for Networks-on-Chip," IEEE Transactions on Very Large Scale Integration (TVLSI), 2007.
• Stacking multiple active layers.
• Manufacturability:
  – Mismatch between the various layers
  – Yield is an issue
• Temperature concerns:
  – Despite the power advantages, the reduced footprint increases power density.

Architectures


Photonic NoC
• High-bandwidth photonic links for high-payload transfers.
• Challenges (ongoing research): on-chip integration of photonic components.
• A. Shacham et al., "Photonic Network-on-Chip for Future Generations of Chip Multi-Processors," IEEE Transactions on Computers, 2008.

Corona Architecture
• HP Labs
• Serpentine waveguide
• A dedicated 3D layer for photonic devices

Firefly Architecture
• Yan Pan, Prabhat Kumar, John Kim, Gokhan Memik, Yu Zhang, and Alok Choudhary, "Firefly: Illuminating Future Network-on-Chip with Nanophotonics," in Proceedings of the 36th Annual International Symposium on Computer Architecture (ISCA '09), ACM, New York, NY, USA, 429–440.

On-Chip RF/Wireless Interconnects
• Replace long-distance wires.
• Use waveguides outside the package, or IC structures such as parallel metal wires.
• Chang et al. demonstrated a transmission-line-based RF interconnect for on-chip communication; not truly wireless.

State of the Art of NoC: On-Chip Antennas
• Several types: metal and carbon nanotube (CNT).
• High bandwidth at the frequencies of light (IR, visible, UV).
• Small wavelengths mean small antennas and less area.
• CNTs as optical antennas: directional radiation characteristics that agree with radio antenna theory; laser excitation.
• Kempa et al., "Carbon Nanotubes as Optical Antennae," Advanced Materials, 2007.
• Slepyan et al., "Theory of Optical Scattering by Achiral Carbon Nanotubes and Their Potential as Optical Nanoantennas," Physical Review B.

Reliability of CNT Antennas
• Various uncertainties in CNT growth:
  – Length
  – Diameter
  – Electronic properties (semiconducting vs. metallic)
• The rates of these uncertainties tend to be high, much higher than in CMOS processes.
• They can lead to poor performance and failure of wireless links.

Design Constraints
• Limited wireless bandwidth
• Off-chip laser sources
• Design of metal antennas/transceivers
• On-chip wireless nodes have associated overhead: transceiver, modulator/demodulator, antennas

How to efficiently distribute the wireless resources?


Hybrid Nature of the WiNoC
• Augment wireline links with wireless links; not completely wireless.
• Wireless links are used only for long-distance communication (> 10 mm).

Architectural Solutions in Nature
• Natural complex networks:
  – Brain: efficient function and memory retention
  – Microbes: biological networks
  – Social networks: Paul Erdős

Topology
• Small-world graphs: the Watts–Strogatz model.
• Often found in nature.
• Scales well: low average distance.
• A few high-speed shortcuts: wireless.
• Local, shorter links: wireline.

[Figure: regular lattice (high path length L, high clustering C); small-world (low L, high C); random graph (low L, low C)]
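The Watts–Strogatz construction mentioned above can be sketched in pure Python: start from a ring lattice where each node links to its k nearest neighbors, then rewire each edge with probability p into a random long-range shortcut (the role the wireless links play in a WiNoC). The n, k, and p values here are illustrative.

```python
import random

def watts_strogatz(n=16, k=4, p=0.1, seed=0):
    """Small-world graph sketch: ring lattice + probabilistic rewiring."""
    random.seed(seed)
    edges = set()
    for i in range(n):
        for j in range(1, k // 2 + 1):
            edges.add((i, (i + j) % n))          # ring-lattice edges
    rewired = set()
    for (u, v) in edges:
        if random.random() < p:                   # rewire into a shortcut
            w = random.randrange(n)
            while w == u or (u, w) in rewired or (w, u) in rewired:
                w = random.randrange(n)
            rewired.add((u, w))
        else:
            rewired.add((u, v))                   # keep the local link
    return rewired
```

Even a small p gives the "few shortcuts, mostly local links" mix the slide describes, sharply lowering the average hop distance while keeping most wiring short.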

Small-World and Scale-Free Network Models
• Stochastic links.
• Power-tail probability distributions:
  – Scale-free (SF): node degree
  – Small-world (SW): link length

[Figure: probability P versus link length l or degree d]

Fault-Tolerant Wireless NoC
• Small-world topology: inherently fault-tolerant.
• Link insertion follows a distance- and traffic-dependent probability distribution:
  P(i, j) = l_ij f_ij / Σ_(i,j) l_ij f_ij
  where l_ij is the length of the link between switches i and j and f_ij is the frequency of communication between them.
• A few high-speed shortcuts: wireless interconnects.
• Local, shorter links: wireline.
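The probabilistic link insertion can be sketched as weighted sampling, assuming (as the P(i, j) expression above suggests) that a candidate link (i, j) is chosen with probability proportional to l_ij × f_ij. The candidate table below is purely illustrative.

```python
import random

def pick_link(candidates, rng=random.random):
    """Choose one candidate link (i, j) with probability ∝ l_ij * f_ij.

    candidates: dict mapping (i, j) -> (l_ij, f_ij).
    """
    weights = {ij: l * f for ij, (l, f) in candidates.items()}
    total = sum(weights.values())
    r = rng() * total            # uniform point in the cumulative weight
    acc = 0.0
    for ij, w in weights.items():
        acc += w
        if r <= acc:
            return ij
    return ij                    # numerical edge case: return the last key
```

Weighting by distance times traffic frequency biases the shortcuts toward exactly the long, heavily used paths where a wireless link saves the most hops and energy.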

Experimental Setup
• Topology: small-world, with both wireline and wireless links.
• System size: 64 and 256 cores, representative of current industry designs.
• Wireless links: 24.
• B. G. Lee et al., "Ultrahigh-Bandwidth Silicon Photonic Nanowire Waveguides for On-Chip Networks," IEEE Photonics Technology Letters, 2008.

Experimental Results
• Throughput (flits/core/cycle) for 64 and 256 cores: minimal effect on performance with wireless link failures.
• A. Ganguly et al., "Complex Network Inspired Fault-Tolerant NoC Architectures with Wireless Links," Proc. of the IEEE/ACM International Symposium on Networks-on-Chip (NOCS), May 2011.

Comparison of Saturation Throughput
• Hierarchical wireless NoC (WiNoC) with the same CNT wireless links, for 64 and 256 cores.
• The small-world WiNoC shows better fault tolerance than the hierarchical NoC.

Packet Energy Dissipation
• Packet energy dissipation is significantly less than in wireline NoCs.
• The increase is insignificant with wireless link failures.

Wiring & Area Overheads
• Wiring requirements for 64-core systems.
• Area overheads for 64-core systems.
• Insignificant real-estate overheads.

Conclusion
• On-chip wireless communication with CNT antennas can support efficient NoC architectures, but emerging interconnect technologies are defect-prone.
• The inherent resilience of natural complex networks inspires wireless NoC topologies based on the small-world architecture.
• Benefits: a better power-performance trade-off, fault tolerance, and security.

Reference Papers
• Networks-on-Chip in a Three-Dimensional Environment: A Performance Evaluation.
• Scalable Hybrid Wireless Network-on-Chip Architectures for Multi-Core Systems.
• Co-design of 3D Wireless Network-on-Chip Architectures with Microchannel-Based Cooling.
• Performance Evaluation and Design Trade-Offs for Network-on-Chip Interconnect Architectures.
• Corona: System Implications of Emerging Nanophotonic Technology.
• Firefly: Illuminating Future Network-on-Chip with Nanophotonics.
• "It's a Small World After All": NoC Performance Optimization via Long-Range Link Insertion.