
University of Rostock Institute of Applied Microelectronics and Computer Engineering

Monitoring and Control of Temperature in Networks-on-Chip

Tim Wegner, Claas Cornelius, Andreas Tockhorn, Dirk Timmermann

MEMICS 2010, Mikulov, Czech Republic, October 22-24

Tim Wegner - 23 October 2010

Outline

1. Introduction

2. Networks-on-Chip (NoCs)

3. Impact of Temperature on Reliability

4. Monitoring & Control of Temperature in NoCs

5. Summary

1. Introduction

Impacts of technological development

[Figure: transistor count, with 1954: IBM 704 mainframe, 1981: IBM PC 5150, 2007: Apple iPhone]

Increasing integration density → rising complexity, shrinking device sizes

NoCs able to deal with arising requirements (e.g. for communication)

But: Reliability becomes a dominant factor for chip design

Goal: Increase reliability in NoC-based systems


2. Networks-on-Chip

Properties

Infrastructure for on-chip interconnection

Point-to-point links replace long global busses

Parallel packet-based communication

Separation of communication & computation

Globally asynchronous locally synchronous (GALS)

Modularity of IP cores (not part of actual NoC): reusability, high abstraction level

NoCs are able to satisfy requirements of modern VLSI systems

[Figure: NoC with four IP cores, four routers (R) and four clock domains (CLK0 to CLK3)]


3. Impact of Temperature on Reliability

Impacts of technological progress

Increasing integration densities, progress of nanotechnology

Growing number of transistors per chip = raised probability of failure

Decreasing structural size of ICs = higher susceptibility to environmental influences & deterioration

Intel 8086 (1978): ≈879 transistors/mm²

Intel Bloomfield (2008): ≈2.78 million transistors/mm²

Why is thermal awareness important?

Particular physical effects (e.g. TDDB, EM) contribute to deterioration

Abetted by high temperatures

Correlation between temperature & failure mechanisms established by Arrhenius model

Exponential decrease of IC lifetime with temperature:

$T_{fail} \propto e^{\frac{E_a}{k_b \cdot T}}$

Growing influence of on-chip temperature distribution on lifetime, operability, performance etc.
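To get a feeling for the exponential dependence, the following is a minimal sketch that evaluates the Arrhenius relation as a ratio of lifetimes at two temperatures, so that the unknown proportionality constant cancels. The activation energy of 0.7 eV and the two temperatures are assumed values for illustration only; they do not come from the talk.

```python
import math

# Boltzmann constant in eV/K.
K_B_EV_PER_K = 8.617333262e-5


def relative_lifetime(temp_k: float, ref_temp_k: float, e_a_ev: float) -> float:
    """Return T_fail(temp_k) / T_fail(ref_temp_k) under the Arrhenius model.

    The proportionality constant of the model cancels in this ratio.
    """
    if temp_k <= 0 or ref_temp_k <= 0:
        raise ValueError("temperatures must be absolute (kelvin) and positive")
    exponent = (e_a_ev / K_B_EV_PER_K) * (1.0 / temp_k - 1.0 / ref_temp_k)
    return math.exp(exponent)


# Assumed values for illustration only (not taken from the slides).
ASSUMED_E_A_EV = 0.7
ratio = relative_lifetime(temp_k=358.15, ref_temp_k=348.15, e_a_ev=ASSUMED_E_A_EV)
print(f"Relative lifetime at 85 °C vs. 75 °C: {ratio:.2f}")
```

With these assumed numbers, a rise of 10 K roughly halves the expected lifetime; other activation energies give other factors.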


4. Monitoring and Control of Temperature for NoCs

Objective:

Mitigate effects contributing to deterioration & delay occurrence of failures

Control of on-chip temperature distribution

Effective mechanisms to monitor & control on-chip temperature

Requirements:

Integration into existing NoC

Preservation of modularity & reusability

Minimum costs (area, frequency)

Maximum performance of monitoring and control

Minimum impact on system performance

4.1 Mechanisms for monitoring

Concept: attach physical monitoring probes to every IP core

Event-driven:

Temperature variation ∆T

Continuous checking of T_IPC

|T_IPC,old - T_IPC,new| ≥ ∆T → report T_IPC,new

Area: 66 LUT/FF pairs, Frequency: 227 MHz

Time-driven:

Period of time ∆t

Report T_IPC,new every ∆t

Area: 80 LUT/FF pairs, Frequency: 338 MHz
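The two reporting policies can be sketched in software as follows. This is a behavioural illustration only, not the hardware probes from the talk. The class names, the sample values, and the choices that the first sample is always reported and that T_IPC,old is the last reported value are all assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class EventDrivenProbe:
    """Reports only when the temperature moved by at least delta_T."""
    delta_T: float
    last_reported: Optional[float] = None

    def sample(self, t_ipc_new: float) -> Optional[float]:
        # Assumption: the first sample is always reported, and T_IPC,old
        # is the most recently reported value.
        changed = (
            self.last_reported is None
            or abs(self.last_reported - t_ipc_new) >= self.delta_T
        )
        if changed:
            self.last_reported = t_ipc_new
            return t_ipc_new
        return None


@dataclass
class TimeDrivenProbe:
    """Reports the current temperature once every period_samples samples."""
    period_samples: int
    _count: int = 0

    def sample(self, t_ipc_new: float) -> Optional[float]:
        self._count += 1
        if self._count >= self.period_samples:
            self._count = 0
            return t_ipc_new
        return None


# Hypothetical temperature trace in °C.
readings = [50.0, 50.2, 50.4, 53.1, 53.0, 53.2, 56.5, 56.4]
event_probe = EventDrivenProbe(delta_T=2.0)
time_probe = TimeDrivenProbe(period_samples=2)
event_reports = [r for r in map(event_probe.sample, readings) if r is not None]
time_reports = [r for r in map(time_probe.sample, readings) if r is not None]
print(event_reports)  # [50.0, 53.1, 56.5]
print(time_reports)   # [50.2, 53.1, 53.2, 56.4]
```

On this trace the event-driven probe sends three packets and the time-driven probe four, two of which carry almost unchanged values.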

4.2 Mechanisms for control

[Figure: NoC with four routers (R), three IP cores and the CCU]

Central Control Unit (CCU):

Reception & interpretation of probe packets

Instructions for Dynamic Frequency Scaling to probes (if necessary)

Area: 507 LUT/FF pairs, Frequency: 165 MHz

Not the smartest approach, but suffices to test functionality!
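The behaviour of such a central unit might look like the sketch below. The slides do not describe the actual decision logic, so the two thresholds, the hysteresis between them, the packet fields and the frequency scale values are all hypothetical and serve only to make the "receive, interpret, instruct if necessary" flow concrete.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ProbePacket:
    source_id: int        # hypothetical identifier of the reporting probe
    temperature_c: float  # reported T_IPC,new in °C


@dataclass(frozen=True)
class DfsInstruction:
    target_id: int
    frequency_scale: float  # hypothetical: fraction of the nominal frequency


class CentralControlUnit:
    """Interprets probe packets and answers with DFS instructions if needed."""

    def __init__(self, throttle_above_c: float, restore_below_c: float) -> None:
        if restore_below_c >= throttle_above_c:
            raise ValueError("restore threshold must lie below throttle threshold")
        self.throttle_above_c = throttle_above_c
        self.restore_below_c = restore_below_c
        self._throttled = set()  # ids of probes currently scaled down

    def receive(self, packet: ProbePacket) -> Optional[DfsInstruction]:
        hot = packet.temperature_c >= self.throttle_above_c
        cool = packet.temperature_c <= self.restore_below_c
        if hot and packet.source_id not in self._throttled:
            self._throttled.add(packet.source_id)
            return DfsInstruction(packet.source_id, frequency_scale=0.5)
        if cool and packet.source_id in self._throttled:
            self._throttled.remove(packet.source_id)
            return DfsInstruction(packet.source_id, frequency_scale=1.0)
        return None  # no instruction necessary


ccu = CentralControlUnit(throttle_above_c=80.0, restore_below_c=70.0)
print(ccu.receive(ProbePacket(source_id=1, temperature_c=85.0)))
print(ccu.receive(ProbePacket(source_id=1, temperature_c=75.0)))
print(ccu.receive(ProbePacket(source_id=1, temperature_c=65.0)))
```

The gap between the two assumed thresholds keeps the unit from toggling the frequency on every small temperature change.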

4.3 Integration of monitoring

3 approaches: into IP core, router port of IP core, extra router port

Different impact on performance & costs

[Figure: the three integration variants of the probe (P) relative to IP core and router (R)]

Penalties (one set per approach):

Area penalty: 30.5%, Freq. penalty: 8.2%

Area penalty: 7.3%, Freq. penalty: / (but Mux/Demux)

Area penalty: /, Freq. penalty: /

4.4 Impact on system performance

4.5 Performance of monitoring & control

5. Summary

Implementation of 2 approaches for monitoring on-chip temperature + 3 methods for integration into NoC

Investigation of:

Costs (area, frequency)

Impact on system performance

Performance of monitoring & control

Conclusion

Event-driven approach preferable (situational monitoring, better performance, no redundant traffic, lower area costs)

Integration into NoC using router port of IP core best trade-off between costs & preservation of modularity/non-intrusiveness

Thanks for your attention! Any questions?

University of Rostock, Germany, Institute of Applied Microelectronics and Computer Engineering

Contact: tim.wegner@uni-rostock.de

Homepage: www.networks-on-chip.com

Arrhenius Model

Establishes relationship between temperature and failure mechanisms

Describes dependence of chemical reactions on temperature changes

Assumption: all other parameters constant

$T_{fail} \propto e^{\frac{E_a}{k_b \cdot T}}$

Lifetime of ICs decreases exponentially with temperature

[Figure: T_fail plotted over temperature]
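A short worked form of the model may help here. Writing the relation with an explicit constant A (a symbol introduced here for illustration, standing for all parameters assumed constant), with E_a the activation energy, k_b the Boltzmann constant and T the absolute temperature, taking the logarithm gives a straight line in 1/T, and the ratio of lifetimes at two temperatures T_1 and T_2 no longer depends on A:

```latex
\begin{align*}
T_{fail} &= A \cdot e^{\frac{E_a}{k_b \cdot T}} \\
\ln T_{fail} &= \ln A + \frac{E_a}{k_b} \cdot \frac{1}{T} \\
\frac{T_{fail}(T_2)}{T_{fail}(T_1)} &= e^{\frac{E_a}{k_b}\left(\frac{1}{T_2} - \frac{1}{T_1}\right)}
\end{align*}
```

For T_2 > T_1 the exponent is negative, so the ratio is below 1, which is the exponential decrease of lifetime stated on the slide.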

Time Dependent Dielectric Breakdown (TDDB)

Inoperability of transistor through gate oxide breakdown (long-term)

Electromigration (EM)

Transport of material in conductors (i.e. wires)

Cause: ion movement induced by current flow (ions' mobility increases with temperature)

Effects:

• Hillocks → short circuits

• Voids → interruption of current paths

Intel Processors

Intel 8086:
• Year: 1978
• 29k transistors
• 33 mm²
• 879 Tr./mm²

Intel Bloomfield:
• Year: 2008
• 731 million transistors
• 263 mm²
• 2,779,467 Tr./mm²
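The density figures follow directly from transistor count divided by die area. The small check below reproduces them from the numbers on the slide; the function and variable names are illustrative only.

```python
def density_per_mm2(transistors: int, die_area_mm2: float) -> float:
    """Return the transistor density in transistors per square millimetre."""
    if die_area_mm2 <= 0:
        raise ValueError("die area must be positive")
    return transistors / die_area_mm2


# Transistor counts and die areas as given on the slide.
i8086 = density_per_mm2(29_000, 33)
bloomfield = density_per_mm2(731_000_000, 263)

print(f"Intel 8086:       {i8086:,.0f} Tr./mm²")
print(f"Intel Bloomfield: {bloomfield:,.0f} Tr./mm²")
print(f"Increase factor:  {bloomfield / i8086:,.0f}x")
```

Over these 30 years the density grew by a factor of a little over 3,000.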

Impact on system performance

Performance of monitoring & control

Synthesis results for monitoring & control

Item                                      Frequency [MHz]   Area [LUT/FF pairs]
Component: Event-driven probe             227               66
Component: Time-driven probe              338               80
Component: Central Control Unit           165               507
Integration method: Into IP core          122               1901
Integration method: Using IP core port    119               1896
Integration method: Extra port            112               2312

Unmodified NoC router: 1771 LUT/FF pairs, 122 MHz
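The area and frequency penalties can be derived from these synthesis results by comparing each modified router with the unmodified one. The sketch below assumes that a penalty is the relative change against the unmodified router, and that the three value pairs belong to the integration methods in the order in which the table lists them; both points are assumptions, since the slides do not spell them out.

```python
BASE_AREA = 1771  # LUT/FF pairs, unmodified NoC router
BASE_FREQ = 122   # MHz, unmodified NoC router

# (area, frequency) per integration method, in the order of the table.
# Assumption: the extracted column order matches the original table.
ROUTER_VARIANTS = {
    "Into IP core": (1901, 122),
    "Using IP core port": (1896, 119),
    "Extra port": (2312, 112),
}


def area_penalty_pct(area: int) -> float:
    """Relative area increase over the unmodified router, in percent."""
    return 100.0 * (area - BASE_AREA) / BASE_AREA


def freq_penalty_pct(freq: int) -> float:
    """Relative frequency loss against the unmodified router, in percent."""
    return 100.0 * (BASE_FREQ - freq) / BASE_FREQ


for name, (area, freq) in ROUTER_VARIANTS.items():
    print(f"{name}: area +{area_penalty_pct(area):.1f}%, "
          f"frequency -{freq_penalty_pct(freq):.1f}%")
```

Under these assumptions, 2312 LUT/FF pairs at 112 MHz give the 30.5% area and 8.2% frequency penalty quoted in section 4.3, and 1901 LUT/FF pairs at 122 MHz give the 7.3% area penalty with no frequency penalty.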