Date posted: 22-Dec-2015
Temperature Aware Microprocessor Floorplanning
Considering Application Dependent Power Load
*Chunta Chu, Xinyi Zhang, Lei He, and Tom Tong Jing
Electrical Engineering Department
University of California, Los Angeles, 90095, CA
This work was partially supported by NSF CAREER award and a UC MICRO grant sponsored by Altera and Intel
Chunta Chu is now with Apache Design Solutions
Outline
Motivation
Problem formulation and models
Experimental results
Conclusion
Motivation
Ever-increasing integration levels and clock rates lead to higher temperatures and larger temperature gradients
  Extra clock skew and performance degradation
  Excessive leakage
  Increased cooling cost
Higher clock rates require interconnect pipelining
A microprocessor floorplan should smooth the temperature gradient and also take interconnect pipelining into account
Existing Work
Quick but not accurate [Han: TACS’05]
  Models temperature with a deterministic heat diffusion model
  No consideration of interconnect pipelining
More accurate but far less efficient [Sankaranarayanan: JILP’05] and [Nookala: ISLPED’06]
  Calculate temperature for each potential floorplan
  No explicit interconnect pipelining
Primary Contribution
An efficient yet effective floorplanning approach
  Explicit modeling of interconnect pipelining by the TPWL model [Long:DAC’04]
  Stochastic heat diffusion model to avoid temperature calculation
Reduces the highest temperature by up to 3°C and runs up to 27x faster than the most accurate existing solution [Sankaranarayanan: JILP’05]
Problem Formulation
Find a floorplan for the given soft modules of a microprocessor
Minimize
$$Obj = W_{area}\,\frac{Area}{Area_{norm}} + W_{CPI}\,\frac{CPI}{CPI_{norm}} + W_{thermal}\,\frac{T_{max}}{T_{norm}}$$
where CPI is the average number of cycles per instruction
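As a rough illustration of how this weighted objective ranks candidate floorplans, the sketch below evaluates it for two made-up candidates. The normalization constants, weights, and candidate values are hypothetical and are not taken from the slides.

```python
def floorplan_objective(area, cpi, t_max, norms, weights):
    """Weighted sum of normalized area, CPI, and peak temperature."""
    return (weights["area"] * area / norms["area"]
            + weights["cpi"] * cpi / norms["cpi"]
            + weights["thermal"] * t_max / norms["thermal"])

# Hypothetical normalization constants and weights, for illustration only.
norms = {"area": 100.0, "cpi": 1.0, "thermal": 100.0}
weights = {"area": 1.0, "cpi": 1.0, "thermal": 1.0}

# Two made-up candidates: (area in mm2, CPI, Tmax in degrees C).
candidate_a = floorplan_objective(120.0, 0.9, 90.0, norms, weights)
candidate_b = floorplan_objective(118.0, 0.9, 98.0, norms, weights)
print(candidate_a, candidate_b)  # the lower value is the preferred floorplan
```

With equal weights, the slightly larger but cooler candidate scores lower, which is the kind of trade-off the weights are meant to control.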
CPI Model [He-Long, DAC’04]
Pre-calculate CPI for a number of floorplans based on predicted trajectory in the solution space
Table lookup to calculate CPI for a new floorplan by interpolation based on its distance to floorplans with known CPI
Less than 3% error compared to cycle accurate uArch simulation
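The table lookup described above can be sketched as follows. The slides do not say how a floorplan is encoded or which interpolation rule is used, so the feature vectors, the Euclidean distance, and the inverse-distance weighting here are all assumptions made for illustration.

```python
import math

def floorplan_distance(a, b):
    """Euclidean distance between two floorplan feature vectors.

    The encoding is hypothetical (for example, wire latencies in cycles
    between pairs of modules)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def interpolate_cpi(table, query, eps=1e-12):
    """Estimate CPI for `query` from (features, cpi) pairs in `table`
    using inverse-distance weighting (an assumed interpolation rule)."""
    weighted = []
    for features, cpi in table:
        d = floorplan_distance(features, query)
        if d < eps:
            return cpi  # exact hit: reuse the pre-calculated CPI
        weighted.append((1.0 / d, cpi))
    total = sum(w for w, _ in weighted)
    return sum(w * cpi for w, cpi in weighted) / total

# Pre-calculated CPI for three sample floorplans (made-up numbers).
table = [((1, 1), 0.80), ((3, 1), 0.90), ((1, 3), 1.00)]
exact = interpolate_cpi(table, (1, 1))
blended = interpolate_cpi(table, (2, 2))
print(exact, blended)
```

The query (2, 2) is equally far from all three stored floorplans, so its estimate is simply their average CPI.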
Deterministic Heat Diffusion Model [Han: TACS’05]
The heat diffusion between two modules $M_i$ and $M_j$
$$h(M_i, M_j) = (\bar{P}_{Di} - \bar{P}_{Dj}) \cdot shared\_length_{ij}$$
$\bar{P}_{Di}$ and $\bar{P}_{Dj}$ are the average power densities over time
The total heat diffusion for module $M_i$
$$H_i = \sum_{j \text{ adjacent to } i} h(M_i, M_j)$$
The bigger the heat diffusion is, the smaller the temperature gradient and $T_{max}$ are
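A minimal sketch of the deterministic model, assuming the pairwise term is the difference of average power densities multiplied by the shared boundary length. The module names, densities, and lengths are invented for the example.

```python
def pair_diffusion(pd_i, pd_j, shared_length):
    """h(Mi, Mj): density difference times shared boundary length."""
    return (pd_i - pd_j) * shared_length

def total_diffusion(module, avg_density, shared):
    """H_i: sum of h(Mi, Mj) over all modules adjacent to `module`.

    `shared` maps an ordered pair (i, j) to the boundary length they share."""
    return sum(
        pair_diffusion(avg_density[i], avg_density[j], length)
        for (i, j), length in shared.items()
        if i == module
    )

# Hypothetical average power densities and shared lengths.
avg_density = {"M1": 2.0, "M2": 0.5, "M3": 1.0}
shared = {("M1", "M2"): 3.0, ("M1", "M3"): 1.0}

h_m1 = total_diffusion("M1", avg_density, shared)
print(h_m1)  # (2.0 - 0.5) * 3.0 + (2.0 - 1.0) * 1.0 = 5.5
```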
Recast of Problem Formulation
Find a floorplan for the given soft modules of a microprocessor
Minimize the original objective
$$Obj = W_{area}\,\frac{Area}{Area_{norm}} + W_{CPI}\,\frac{CPI}{CPI_{norm}} + W_{thermal}\,\frac{T_{max}}{T_{norm}}$$
with the thermal term $T_{max}/T_{norm}$ replaced by a normalized heat diffusion term ($Heat\_Diffusion$); the area and CPI terms are unchanged
Primary Limitation of Deterministic Heat Diffusion
Average power density ignores power load correlation
(a) Transient temperature is higher when power is positively correlated
(b) Transient temperature is lower when power is negatively correlated
Power Correlation of Alpha-chip in SimpleScalar
(a) Positively correlated (b) Uncorrelated
Calculation of Power Correlation
Treat the power of each module as a stochastic process
Obtain samples of this stochastic process for each module from transient power simulated over the SPEC2000 benchmarks
Compute the power correlation between modules as the covariance between these stochastic processes
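The three steps above can be sketched with plain Python lists standing in for sampled power traces. The traces below are invented; in the slides the samples come from transient power simulated over SPEC2000, and the normalized correlation shown here is added only as a convenient way to read the covariance.

```python
import math

def mean(xs):
    return sum(xs) / len(xs)

def covariance(x, y):
    """cov(X, Y) = E(XY) - E(X)E(Y) over the sampled time steps."""
    return mean([a * b for a, b in zip(x, y)]) - mean(x) * mean(y)

def correlation(x, y):
    """Covariance normalized by the two standard deviations."""
    return covariance(x, y) / math.sqrt(covariance(x, x) * covariance(y, y))

# Hypothetical transient power samples for three modules.
traces = {
    "M1": [1.0, 2.0, 3.0, 4.0],
    "M2": [2.0, 4.0, 6.0, 8.0],   # rises and falls together with M1
    "M3": [4.0, 3.0, 2.0, 1.0],   # falls when M1 rises
}

corr_12 = correlation(traces["M1"], traces["M2"])
corr_13 = correlation(traces["M1"], traces["M3"])
print(corr_12, corr_13)
```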
Correlation between Modules
1 Decode 2 Branch 3 RAT 4 RUU
5 LSQ 6 IALU1 7 IALU2 8 IALU3
9 IntReg 10 IL1 11 DL1 12 IALU4
13 FPAdd 14 FPMul 15 FPReg 16 L2_1
17 L2_2 18 L2_3
The correlation between modules 3 (RAT) and 10 (IL1) is 0.9
Other Limitations: It Ignores Dead Space
Ignoring dead space may lead to a higher temperature.
A floorplan has dead spaces, and some modules can diffuse more heat into the dead space.
Example: M1’s temperature is lower in (a) than in (b)
Other Limitations: It Ignores Module Geometry
M1 has a higher temperature in (a) than in (b), since M2’s area is smaller than M3’s area
Power density: M1 >> M4 > M2 = M3
Besides the shared length between modules, the depth of the adjacent module also has to be considered.
Other Limitations: It Ignores Border Effect
A module can diffuse a different amount of heat to the border depending on the package design
Stochastic Heat Diffusion Model
Given m modules, n dead spaces, and power vector $P_i = [p_{i1}, \ldots, p_{iT}]$ over T time steps for module $M_i$
Mean power density for module $M_i$
$$\bar{P}_{Di} = E(P_{Di}) = \frac{1}{A_i} \cdot \frac{1}{T} \sum_{j=1}^{T} p_{ij}$$
$A_i$ is the area of module $M_i$; $P_{Di}$ is the transient power density vector, which equals $P_i / A_i$.
$E(X)$ is the expectation value of vector $X$
Stochastic Heat Diffusion Model (Cont.)
If the adjacent module $M_j$ or dead space $N_j$ is totally inside the window, we modify $P_{Dj}$ to
$$\tilde{P}_{Dj} = \frac{\sum_{k=1}^{K} P_{Dk}\, D_k\, (K - k + 1)}{\sum_{k=1}^{K} D_k\, (K - k + 1)}$$
Stochastic Heat Diffusion Model (Cont.)
Heat diffusion to the adjacent modules
$$H_{i\_adj} = \sum_{j=1}^{x} (P_{Di} - P_{Dj})\, L_{ij}$$
$L_{ij}$: shared length between $M_i$ and $M_j$
Heat diffusion to the adjacent dead spaces
$$H_{i\_dead} = \sum_{j=1}^{y} P_{Di}\, C_{ij}$$
$C_{ij}$: shared length between $M_i$ and $N_j$
Heat diffusion to the border
$$f(B_i) = P_{Di}\, B_i\, \frac{Con\_lateral}{Con\_adjacent}$$
$B_i$: shared length between $M_i$ and the border
Con_lateral and Con_adjacent: unit thermal conductance
Stochastic Heat Diffusion Model (Cont.)
Given m modules and n dead spaces, the power density covariance between $M_i$ and $M_j$
$$cov(P_{Di}, P_{Dj}) = E(P_{Di} P_{Dj}) - \bar{P}_{Di}\, \bar{P}_{Dj}$$
$E(P_{Di} P_{Dj})$ is the expectation value of $P_{Di} P_{Dj}$ over T time steps
The standard deviation of the total heat diffusion for module $M_i$
$$\sigma_i = \sqrt{E\big((H_{i\_adj} + H_{i\_dead} + f(B_i))^2\big) - \big(E(H_{i\_adj} + H_{i\_dead} + f(B_i))\big)^2}$$
Stochastic Heat Diffusion Model (Cont.)
The total stochastic heat diffusion for $M_i$
$$\tilde{H}_i = E(H_{i\_adj}) + E(H_{i\_dead}) + E(f(B_i)) + 3\sigma_i$$
where $\sigma_i$ is the standard deviation of the total heat diffusion for $M_i$
Given Z potential hottest modules, the total stochastic heat flow is
$$Stochastic\_HeatDiff = \sum_{i=1}^{Z} W_i\, \tilde{H}_i$$
$W_i$: weight proportional to $\bar{P}_{Di}$
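The sketch below shows one way the stochastic quantities could be evaluated from sampled power density traces: it forms the transient total heat diffusion at each time step, then combines its mean with three standard deviations. Adding (rather than subtracting) the 3-sigma term is an assumption, chosen because it makes negatively correlated neighbours count as better heat sinks. The traces, lengths, weights, and the `con_ratio` parameter (standing for Con_lateral / Con_adjacent) are hypothetical.

```python
import math

def stochastic_diffusion(pd_i, neighbors, dead_lengths=(), border_length=0.0,
                         con_ratio=1.0, k_sigma=3.0):
    """Mean plus k_sigma standard deviations of the total heat diffusion.

    pd_i:          transient power density of module Mi, one value per step
    neighbors:     list of (pd_j, L_ij) for adjacent modules
    dead_lengths:  shared lengths C_ij with adjacent dead spaces
    border_length: shared length B_i with the chip border
    con_ratio:     assumed value of Con_lateral / Con_adjacent
    """
    steps = len(pd_i)
    total = []
    for t in range(steps):
        h_adj = sum((pd_i[t] - pd_j[t]) * length for pd_j, length in neighbors)
        h_dead = sum(pd_i[t] * c for c in dead_lengths)
        f_border = pd_i[t] * border_length * con_ratio
        total.append(h_adj + h_dead + f_border)
    mean = sum(total) / steps
    variance = sum(h * h for h in total) / steps - mean * mean
    sigma = math.sqrt(max(variance, 0.0))  # guard against tiny negative rounding
    return mean + k_sigma * sigma

def stochastic_heat_diff(weights, h_tilde):
    """Weighted sum over the Z potential hottest modules."""
    return sum(w * h for w, h in zip(weights, h_tilde))

# Hypothetical traces: the same module next to an in-phase or out-of-phase neighbour.
pd_i = [1.0, 3.0, 1.0, 3.0]
in_phase = [1.0, 3.0, 1.0, 3.0]
out_of_phase = [3.0, 1.0, 3.0, 1.0]

h_pos = stochastic_diffusion(pd_i, [(in_phase, 1.0)])
h_neg = stochastic_diffusion(pd_i, [(out_of_phase, 1.0)])
print(h_pos, h_neg)                                    # 0.0 6.0
print(stochastic_heat_diff([0.5, 0.5], [h_pos, h_neg]))  # 3.0
```

Both neighbours have the same average power density, so a deterministic model based on averages would score the two placements identically; only the transient traces separate them.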
Implementation and Experiment

uP                   90nm
Issue Width          4
Die Area (mm2)       100
Die Thickness (mm)   0.5
Heat Spreader (mm2)  900
Heat Sink (mm2)      2500

The floorplanner uses sequence-pair-based simulated annealing [PARQUET]
Experiments consider
  SPEC2000 benchmarks
  One superscalar processor in 90nm technology
  Soft modules with aspect ratios between 0.33 and 3; L2 is partitioned into three modules
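For readers unfamiliar with sequence-pair annealing, here is a generic, self-contained sketch of the technique. It is not PARQUET's code or API: it uses hard (fixed-size) blocks, an area-only cost, swap moves, and a geometric cooling schedule, all chosen for brevity. The block names and sizes are made up; the floorplanner in the slides additionally handles soft modules and the CPI and thermal terms.

```python
import math
import random

def pack(seq_pos, seq_neg, sizes):
    """Turn a sequence pair into lower-left coordinates plus a bounding box."""
    pos_idx = {name: k for k, name in enumerate(seq_pos)}
    x, y = {}, {}
    for k, b in enumerate(seq_neg):
        x[b] = y[b] = 0.0
        for a in seq_neg[:k]:
            if pos_idx[a] < pos_idx[b]:
                # a precedes b in both sequences: a is left of b
                x[b] = max(x[b], x[a] + sizes[a][0])
            else:
                # a follows b in seq_pos but precedes it in seq_neg: a is below b
                y[b] = max(y[b], y[a] + sizes[a][1])
    width = max(x[b] + sizes[b][0] for b in seq_neg)
    height = max(y[b] + sizes[b][1] for b in seq_neg)
    return x, y, width, height

def area_cost(seq_pos, seq_neg, sizes):
    _, _, width, height = pack(seq_pos, seq_neg, sizes)
    return width * height

def anneal(sizes, steps=2000, t0=1.0, alpha=0.995, seed=0):
    """Simulated annealing over sequence pairs with random swap moves."""
    rng = random.Random(seed)
    pos, neg = list(sizes), list(sizes)
    cur = best = area_cost(pos, neg, sizes)
    best_pair = (pos[:], neg[:])
    temp = t0
    for _ in range(steps):
        seq = rng.choice((pos, neg))            # perturb one of the two sequences
        i, j = rng.sample(range(len(seq)), 2)
        seq[i], seq[j] = seq[j], seq[i]
        new = area_cost(pos, neg, sizes)
        if new <= cur or rng.random() < math.exp((cur - new) / temp):
            cur = new                           # accept the move
            if new < best:
                best, best_pair = new, (pos[:], neg[:])
        else:
            seq[i], seq[j] = seq[j], seq[i]     # reject: undo the swap
        temp *= alpha                           # geometric cooling
    return best, best_pair

# Hypothetical blocks given as (width, height).
sizes = {"A": (2.0, 1.0), "B": (1.0, 1.0), "C": (1.0, 2.0), "D": (2.0, 2.0)}
best_area, (best_pos, best_neg) = anneal(sizes)
print(best_area, best_pos, best_neg)
```

Swapping `area_cost` for a weighted objective that also includes CPI and heat diffusion terms would leave the annealing loop itself unchanged.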
Comparison with HotSpot tool [JILP’05]
[JILP’05] directly calculates temperature but ignores interconnect pipelining
Our model
  Reduces temperature by up to 3°C with a 1.34% increase in area
  Runs up to 27x faster

uP in 90nm
            Tmax(°C)   Area(mm2)(WS)   Runtime(s)
[JILP’05]   93.0       119.4 (4.7%)    2300
Ours        90.0       121.0 (5.6%)    85
Impact      -3.2%      +1.34%          1/27x
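The Impact row can be re-derived from the two data rows of the comparison table. The short check below uses only the numbers reported there; the dictionary keys are just labels chosen for this example.

```python
baseline = {"tmax": 93.0, "area": 119.4, "runtime": 2300.0}  # [JILP'05] row
ours = {"tmax": 90.0, "area": 121.0, "runtime": 85.0}        # "Ours" row

tmax_change = (ours["tmax"] - baseline["tmax"]) / baseline["tmax"] * 100
area_change = (ours["area"] - baseline["area"]) / baseline["area"] * 100
speedup = baseline["runtime"] / ours["runtime"]

print(f"Tmax: {tmax_change:+.1f}%")    # Tmax: -3.2%
print(f"Area: {area_change:+.2f}%")    # Area: +1.34%
print(f"Speedup: {speedup:.1f}x")      # Speedup: 27.1x
```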
Impact of Thermal Modeling
Our stochastic thermal model reduces temperature by up to 8.9°C compared to the thermal-oblivious floorplanner
Compared with the deterministic model, our model obtains up to 3.2°C reduction of the on-chip peak temperature and 1.13x better CPI performance.

uP in 90nm
Obj.   CPI                            Tmax(°C)                   Area(mm2) (WS %)
       Best           Avg             Best         Avg           Best                  Avg
AC     0.820          0.890           97.7         96.7          118.5 (3.05)          122.4 (6.89)
ACHd   0.995 +21.3%   1.000 +12.4%    92.0 -5.8%   92.2 -4.7%    122.0 (6.67) +2.9%    125.3 (9.08) +2.3%
ACHs   0.880 +7.3%    0.954 +7.2%     88.8 -9.1%   88.9 -8.1%    121.1 (6.10) +2.2%    123.2 (7.36) +0.6%

Percentages are relative to AC.
Obj: A: area; C: CPI; Hd: [Han:TACS’05]; Hs: Ours
Conclusions
We have developed a stochastic heat diffusion model to effectively capture correlation between transient power over workload
We have also developed an efficient yet effective thermal-aware uP floorplanning
In the future, we will extend to 3D integration and multi-core processors