Configurational Workload Characterization Hashem H. Najaf-abadi Eric Rotenberg.


Page 1: Configurational Workload Characterization Hashem H. Najaf-abadi Eric Rotenberg.

Configurational Workload Characterization

Hashem H. Najaf-abadi

Eric Rotenberg

Page 2:

Heterogeneity

A Single-Core:

[Figure: Program 1 and Program 2 mapped onto one processor.]

Page 3:

Heterogeneity

Multiple Cores:

[Figure: Program 1 and Program 2 with three processors.]

Page 4:

Heterogeneity

Multiple Cores:

[Figure: Program 1 and Program 2 with three processors.]

Page 5:

Heterogeneity

Heterogeneous Cores:

[Figure: Program 1 and Program 2 with two processors.]

Page 6:

Heterogeneous CMP Design

Must determine:

1) Best processor configuration for a group of workloads.

2) Best way to group workloads together.

Page 7:

The Challenge:

Communal Customization

[Figure: workloads A through N in the workload space, and the best core configurations (Core 1, Core 2) they map to.]

Page 8:

Existing Approaches

• Regression models: Enable speedy exploration.

• Subsetting: Reduce workloads to a representative subset based on characteristics.
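As a rough illustration of subsetting, the sketch below picks a representative subset by greedy farthest-point selection over per-workload characteristic vectors. The workload names, the vectors and the selection rule are all hypothetical; the slides do not say which subsetting method is meant.

```python
import math

# Hypothetical characteristic vectors (values are made up for illustration).
workloads = {
    "w1": (2.0, 8.0, 5.0),
    "w2": (2.5, 7.5, 5.5),
    "w3": (9.0, 3.0, 1.0),
    "w4": (8.5, 2.5, 1.5),
}

def pick_representatives(vectors, k):
    """Greedy farthest-point selection: one simple way to subset."""
    names = sorted(vectors)
    chosen = [names[0]]  # arbitrary but deterministic starting point
    while len(chosen) < k:
        # Add the workload that is farthest from everything chosen so far.
        best = max(
            (n for n in names if n not in chosen),
            key=lambda n: min(math.dist(vectors[n], vectors[c]) for c in chosen),
        )
        chosen.append(best)
    return chosen

representatives = pick_representatives(workloads, 2)
```

With these made-up vectors, one workload from each of the two visibly similar pairs is kept. Note that the selection looks only at workload characteristics, never at the core each workload would prefer.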

Page 9:

The Argument

• Subsetting isn’t a valid substitute or facilitator for communal customization.

• Reason: complex interdependencies between different architectural units.

Page 10:

Ties that bind

1) The global clock intertwines the sizing of different architectural units.

2) The burden of compromise in one unit can be passed on to another.

Page 11:

Example: The Global Clock

[Figure: pipeline diagrams at clock periods of 1 ns and 0.66 ns, each with Issue Queue and Cache stages. Solid line: delay of the issue queue; dashed line: access delay of the cache. Annotations: "Slack", "Less slack", "Pipeline too deep", "Small issue queue", "Needlessly large cache".]

Page 12:

Example: The Global Clock

The clock period, issue-queue size and cache size cannot be optimized independently of each other.

[Figure: pipeline diagrams at clock periods of 1 ns and 0.66 ns, each with Issue Queue and Cache stages.]
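A toy model can make the coupling concrete. It assumes that a unit's delay grows with its size and that a unit pipelined over s stages needs s clock periods to cover that delay. The delay formulas and sizes below are invented for illustration; only the two clock periods (1 ns and 0.66 ns) come from the slide.

```python
import math

# Made-up delay models (ns) as a function of size; real numbers would come
# from a timing model such as CACTI.
def issue_queue_delay(entries):
    return 0.30 + 0.01 * entries

def cache_delay(kilobytes):
    return 0.50 + 0.25 * math.log2(kilobytes)

def stages_and_slack(delay_ns, clock_ns):
    """Stages needed to cover the delay, and the time left unused."""
    stages = math.ceil(delay_ns / clock_ns - 1e-9)
    return stages, stages * clock_ns - delay_ns

def report(clock_ns, iq_entries, cache_kb):
    iq = stages_and_slack(issue_queue_delay(iq_entries), clock_ns)
    ca = stages_and_slack(cache_delay(cache_kb), clock_ns)
    return {"issue_queue": iq, "cache": ca}

slow = report(1.00, 32, 64)  # clock period of 1 ns
fast = report(0.66, 32, 64)  # clock period of 0.66 ns
```

In this toy model the same cache fits in two stages at 1 ns but needs four stages at 0.66 ns, while the issue queue goes from ample slack to almost none. Which unit has room to grow therefore depends on the clock period chosen for all of them.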

Page 13:

Ties that bind

1) The global clock intertwines the sizing of different architectural units.

2) The burden of compromise in one unit can be passed on to another.

Page 14:

Example: Passing on the Burden

[Figure: three charts labeled α, β and γ, each plotting characteristics A to E on axes from 0 to 10.]

A) Working-set size
B) Branch predictability
C) Density of dependence chains
D) Frequency of loads
E) Frequency of conditional branches
* All normalized to a scale of 0~10

Page 15:

Example: Passing on the Burden

[Figure: three charts labeled α, β and γ, each plotting characteristics A to E on axes from 0 to 10.]

A) Working-set size
B) Branch predictability
C) Density of dependence chains
D) Frequency of loads
E) Frequency of conditional branches
* All normalized to a scale of 0~10

Customized Architectures:

[Figure: Core and Cache components, each placed on a Speed scale running from L to H.]

Page 16:

Example: Passing on the Burden

[Figure: three charts labeled α, β and γ, each plotting characteristics A to E on axes from 0 to 10.]

A) Working-set size
B) Branch predictability
C) Density of dependence chains
D) Frequency of loads
E) Frequency of conditional branches
* All normalized to a scale of 0~10

Customized Architectures:

[Figure: Core and Cache components, each placed on a Speed scale running from L to H.]

Page 17:

A More Accurate Solution

• Represent workloads by their customized architectural configurations.

• Allows for direct and accurate evaluation of how well different workloads do on customized configurations.

• We call this Configurational Workload Characterization.

Page 18:

Design Process Overview

How not to do it:
Important workloads -> [Select representative workloads based on workload behavior] -> Rep. workloads -> [Search for opt. core combination] -> Optimal core combination

How to do it:
Important workloads -> [Customize a core for each workload (configurational characterization)] -> Customized architectures -> [Search for opt. core combination] -> Optimal core combination

Page 19:

Pros & Cons

- more costly to determine

+ provides a more optimal design solution

+ provides a systematic approach

+ can be performed prior to the design phase that is critical for time-to-market

Page 20:

XP-SCALAR

• A superscalar design-space exploration framework

• www4.ncsu.edu/~hhashem/xpscalar.htm

• Uses SimpleScalar to perform cycle-accurate simulations

• Uses the CACTI model to approximate the access latency of the different units

Page 21:

XP-SCALAR

What parameters are varied:

• Clock period
• Processor width
• Size of the issue queue
• Size of the register file
• Size of the load-store queue
• Size of the L1 and L2 caches
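One way to picture a point in this design space is as a record with one field per varied parameter. The sketch below is only that; the class name, field names, units and values are assumptions and do not reflect XP-SCALAR's actual input format.

```python
from dataclasses import dataclass, asdict

@dataclass(frozen=True)
class CoreConfig:
    """One point in the design space; field names and units are assumed."""
    clock_period_ns: float
    width: int
    issue_queue_entries: int
    register_file_entries: int
    load_store_queue_entries: int
    l1_cache_kb: int
    l2_cache_kb: int

# Placeholder values, not taken from the slides.
example = CoreConfig(
    clock_period_ns=0.66, width=4, issue_queue_entries=32,
    register_file_entries=128, load_store_queue_entries=32,
    l1_cache_kb=32, l2_cache_kb=1024,
)
fields = list(asdict(example))
```

Making the record frozen (immutable and hashable) lets a configuration serve as a dictionary key, for example when caching simulation results per configuration.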

Page 22:

XP-SCALAR

How they are varied:

a) Clock period is varied, and architecture parameters are adjusted to make latencies fit within pipeline stages.

b) Number of pipeline stages of a unit is varied and its configuration appropriately adjusted.
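A minimal sketch of both steps, assuming a unit is resized to the largest candidate whose access latency fits in its allotted pipeline stages. The latency formula is an invented stand-in for a CACTI estimate, and the function names and candidate sizes are hypothetical.

```python
import math

def access_latency_ns(size_kb):
    # Stand-in for a CACTI-style estimate; the formula is invented.
    return 0.50 + 0.25 * math.log2(size_kb)

def largest_size_that_fits(candidate_sizes_kb, clock_ns, stages):
    """Largest candidate whose latency fits in `stages` clock periods."""
    budget = clock_ns * stages
    fitting = [s for s in candidate_sizes_kb
               if access_latency_ns(s) <= budget + 1e-9]
    return max(fitting) if fitting else None

sizes = [8, 16, 32, 64, 128]

# (a) Vary the clock period with the stage count fixed.
at_1ns = largest_size_that_fits(sizes, clock_ns=1.00, stages=2)
at_066ns = largest_size_that_fits(sizes, clock_ns=0.66, stages=2)

# (b) Vary the number of stages with the clock period fixed.
at_066ns_3_stages = largest_size_that_fits(sizes, clock_ns=0.66, stages=3)
```

With these invented numbers, shortening the clock period shrinks the largest unit that fits, and adding a pipeline stage recovers part of that capacity.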

Page 23:

Determining the Best Cores

• Execute all benchmarks on each other's customized configurations.

• From that, determine best grouping through a complete search.
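The two steps above can be sketched as an exhaustive search over a cross-configuration performance matrix. The benchmark names and numbers are invented, and the rule that each benchmark runs on whichever selected core suits it best is an assumption rather than something the slides state.

```python
from itertools import combinations
from statistics import mean, harmonic_mean

# perf[b][c]: performance of benchmark b on the core customized for c.
# All numbers are invented for illustration.
perf = {
    "a": {"a": 2.0, "b": 1.0, "c": 1.1},
    "b": {"a": 1.2, "b": 4.0, "c": 1.7},
    "c": {"a": 1.0, "b": 1.1, "c": 1.8},
}

def score(cores, aggregate):
    # Each benchmark is assumed to run on whichever chosen core suits it best.
    return aggregate([max(perf[b][c] for c in cores) for b in perf])

def best_cores(k, aggregate):
    """Exhaustively try every k-subset of the customized cores."""
    return max(combinations(sorted(perf), k),
               key=lambda cores: score(cores, aggregate))

best_avg = best_cores(2, mean)
best_har = best_cores(2, harmonic_mean)
```

With this made-up matrix, the arithmetic mean and the harmonic mean select different pairs of cores, because the harmonic mean penalizes a badly served benchmark more heavily. The search is exhaustive, so its cost grows with the number of k-subsets of the customized cores.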

Page 24:

Best Core Results

Selection                                           Customized core(s)            avg. IPT   har. IPT
Best config for avg. & har. IPT                     gcc                           2.06       1.57
2 best configs for avg. IPT                         parser, twolf                 2.27       1.76
2 best configs for har. IPT                         gcc, mcf                      2.12       1.88
3 best configs for avg. IPT                         crafty, parser, twolf         2.35       1.82
3 best configs for har. IPT                         crafty, mcf, twolf            2.27       2.05
4 best configs for avg. & har. IPT                  crafty, mcf, parser, twolf    2.32       2.08
Each benchmark on its own customized architecture   -                             2.38       2.12

Page 25:

The effect of subsetting

• Subsetting of a single pair of benchmarks results in the extraction of a totally different set of best cores.

Page 26:

Representation

• Dendogram are

Page 27:

Conclusions

• There are interdependencies between architectural units in how they are customized.

• In the design of a heterogeneous CMP, subsetting can lead to performance degradation.