ECE 510 Brendan Crowley Paper Review October 31, 2006.

“Processor Power Reduction Via Single-ISA Heterogeneous Multi-Core Architectures”

Rakesh Kumar, Keith Farkas, Norman P. Jouppi, Partha Ranganathan, Dean M. Tullsen

Presentation Overview

• Introduction
• The Architecture
• Modeling the Architecture
• Results
• Critical Analysis / Conclusion

Introduction

Background

• Processors continue to increase in speed and transistor count as transistor sizes decrease
• This leads to increased power consumption, which causes problems:
    • Heat dissipation
    • Chip failure
    • Battery life
• Designers are always searching for new ways to decrease power consumption

Introduction (2)

• Most work on reducing power consumption falls under one of two categories:
    • Voltage and frequency scaling
    • “Gating”: the ability to turn portions of the core on and off
• Some designs have included multiple identical (homogeneous) cores
• Others have included processors with co-processors that run a different instruction set

Introduction (3)

The Main Idea

• Different software applications have different resource requirements
• This leads the authors to believe that core diversity is of greater value than uniformity
• Therefore, the proposed design is a single-ISA heterogeneous multi-core architecture
• Each core runs the same instruction set, but has different abilities and performance characteristics

The Architecture

• One method is to take a family of previously designed cores, modify their interfaces, and combine them on one die
• Each core executes the same instruction set but contains different resources, and therefore achieves different performance and energy efficiency on the same application

The Architecture (2)

• The operating system determines the application’s requirements and decides which core is best to use (which core will be the most energy efficient)
• To accommodate a wide variety of applications, the cores should cover a wide range of performance levels

The Architecture (3)

• Authors chose a 5-core design, using existing cores with a few changes:
    • Hypothetical single-threaded version of the EV8 (Alpha 21464), which they call the “EV8-”
    • MIPS R4700
    • EV4 (Alpha 21064)
    • EV5 (Alpha 21164)
    • EV6 (Alpha 21264)

The Architecture (4)

Assumptions

• Each core has private L1 data and instruction caches
• All cores share an L2 cache, phase-locked-loop circuitry, and pins
• Implemented in 0.10 micron technology
• One application running at a time (one thread running)

The Architecture (5)

[Figure: relative core sizes]

The Architecture (6)

• Different parts of a program may require different resources
• To take full advantage of the core diversity, it is necessary to switch between cores in the middle of program execution
    • This is done at operating system timeslice intervals, with user-state already saved to memory
• If the OS decides to switch cores, the data is saved to the shared L2 cache, where the next core can retrieve it

The Architecture (7)

• The authors assume the unused cores are powered down to avoid static leakage and dynamic switching power
    • This means time must be spent powering up the cores
• Experimental results show that this doesn’t affect performance when core switching is done at OS timer intervals, even with pessimistic assumptions about power-up time and software overhead

Modeling the Architecture

• Data on the EV8 was based on some predictions and reported data
• Data on the other cores was from published literature
• Assume all of the Alpha cores run at 2.1 GHz (since they assume a 0.10 micron process), and the R4700 runs at 1 GHz

Modeling the Architecture (2)

• All architectures were modeled as accurately as possible on a highly detailed instruction-level simulator, using the configurations in the table below

[Table: simulated core configurations]

Modeling the Architecture (3)

• The table below shows the area and peak power statistics of the cores
• Areas were found from die photos
• Total die area is approximately 400 mm²

[Table: area and peak power of each core]

Modeling the Architecture (4)

• Benchmark execution was simulated using SMTSIM
    • The simulator was modified to simulate a multi-core processor with a shared L2 cache
• Assume a single thread running on one core at a time
• Switching cores requires flushing the active core’s pipeline and writing the L1 cache lines back to the L2 cache
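
The switching steps described on the architecture and modeling slides can be sketched as a toy model. This is only an illustration, not the simulator the authors used: the `Core` class, the `switch_core` function, and the dictionary standing in for the shared L2 cache are hypothetical names introduced here, and power-up latency is deliberately not modeled.

```python
from dataclasses import dataclass, field


@dataclass
class Core:
    """Hypothetical stand-in for one core on the die."""
    name: str
    powered: bool = False
    pipeline: list = field(default_factory=list)  # in-flight instructions
    l1_dirty: dict = field(default_factory=dict)  # address -> modified data


def switch_core(old: Core, new: Core, shared_l2: dict) -> int:
    """Move execution from `old` to `new`; return the number of lines written back."""
    new.powered = True              # power up the target core (latency not modeled)
    old.pipeline.clear()            # flush the active core's pipeline
    written_back = len(old.l1_dirty)
    shared_l2.update(old.l1_dirty)  # write the L1 lines back to the shared L2
    old.l1_dirty.clear()
    old.powered = False             # the now-unused core is powered down
    return written_back


# Example with made-up contents: switch from EV6 to EV4.
l2 = {}
ev6 = Core("EV6", powered=True, pipeline=["ld", "add"], l1_dirty={0x40: 7, 0x80: 9})
ev4 = Core("EV4")
lines_written = switch_core(ev6, ev4, l2)
```

After the call, the two modified lines sit in the shared L2, where the newly powered core can fetch them on demand.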

Results

• The following figure shows results for the SPEC application applu
• The Y-axis, IPS²/W, is basically the inverse of power-delay product
• Constraint:
    • Never choose a core that sacrifices more than 50% performance relative to EV8- over an interval
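
A short derivation may help show what the IPS²/W metric rewards. The symbols are introduced here as assumptions and do not appear on the slides: N is a fixed instruction count, t the execution time (delay), P the average power in watts, and E = Pt the energy.

```latex
\[
\mathrm{IPS} = \frac{N}{t},
\qquad
\frac{\mathrm{IPS}^2}{P} = \frac{N^2}{t^2 P} = \frac{N^2}{(P t)\, t} = \frac{N^2}{E \cdot t}
\]
```

Under these assumptions, for a fixed amount of work IPS²/W is proportional to the reciprocal of energy times delay, so a higher value means less energy, less delay, or both. Strictly, the product in the denominator is E·t; a plain power-times-delay product P·t would just be the energy E.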

Results (2)

[Figure: results for the SPEC application applu]

Results (3)

• Compared to a single-core architecture, this design could ideally reduce the PDP by 74%
    • Combination of 25% performance loss and 81% energy savings
• Could change the constraint to achieve greater PDP savings (sacrificing performance, of course)
• Another design point gives 36% energy savings with 4% performance loss
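
As a rough check on how these figures combine, the arithmetic can be worked through under two assumptions that are not stated on the slides: that the product being reduced is energy times delay, and that "performance loss" means a reduction in throughput, so that relative delay is the reciprocal of relative performance.

```latex
\[
E_{\mathrm{rel}} = 1 - 0.81 = 0.19,
\qquad
t_{\mathrm{rel}} = \frac{1}{1 - 0.25} \approx 1.33,
\qquad
E_{\mathrm{rel}} \cdot t_{\mathrm{rel}} \approx 0.25
\]
\[
\text{Second design point:}\quad
E_{\mathrm{rel}} \cdot t_{\mathrm{rel}} = \frac{1 - 0.36}{1 - 0.04} = \frac{0.64}{0.96} \approx 0.67
\]
```

The first product corresponds to a reduction of roughly 75%, close to the 74% quoted; the small gap is presumably due to rounding in the quoted percentages. Under the same assumptions, the second design point gives a reduction of about 33%.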

Results (4)

• Could optimize other metrics besides PDP, depending on the design goals
• Different power and performance tradeoffs can be made simply by changing the core switching algorithm (no need to change the hardware)
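
One way to picture how the tradeoff can change without touching the hardware is a per-interval selection routine in which both the metric and the performance constraint are parameters. This is a sketch only, not the authors' algorithm: `choose_core`, the two metric functions, and every number in `interval` are hypothetical and invented for illustration; none of the values come from the paper.

```python
from typing import Callable, Dict, Tuple

# (instructions per second, watts) observed for each core over ONE interval.
Sample = Tuple[float, float]


def ips2_per_watt(ips: float, watts: float) -> float:
    """The metric plotted on the Results slide."""
    return ips * ips / watts


def ips_per_watt(ips: float, watts: float) -> float:
    """An alternative metric that weights energy more heavily than delay."""
    return ips / watts


def choose_core(samples: Dict[str, Sample],
                metric: Callable[[float, float], float],
                baseline: str = "EV8-",
                max_perf_loss: float = 0.5) -> str:
    """Pick the best-scoring core that stays within the allowed performance loss."""
    floor = samples[baseline][0] * (1.0 - max_perf_loss)
    eligible = {name: s for name, s in samples.items() if s[0] >= floor}
    return max(eligible, key=lambda name: metric(*eligible[name]))


# Invented numbers for a single interval (NOT measurements from the paper).
interval = {
    "EV8-": (2.0e9, 80.0),
    "EV6": (1.6e9, 20.0),
    "EV5": (1.1e9, 10.0),
    "EV4": (0.6e9, 4.0),
}

pick_default = choose_core(interval, ips2_per_watt)                     # "EV6"
pick_energy = choose_core(interval, ips_per_watt)                       # "EV5"
pick_relaxed = choose_core(interval, ips_per_watt, max_perf_loss=0.75)  # "EV4"
```

With these made-up samples, swapping the metric or loosening the constraint moves the choice toward smaller cores, which is the kind of software-only tradeoff the slide describes.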

Critical Analysis / Conclusion

• A lot of assumptions are made about things like frequency scaling, power consumption of cores, etc.
• This paper only reports results for one benchmark application
• Multiple cores/threads running at the same time would likely be used in practice
    • How would this affect the core switching complexity and latency?

Critical Analysis / Conclusion (2)

• This technique seems like a very good one
• Homogeneous multi-core chips are already on the market
• Potential for significant energy savings