Can Hardware Performance Counters Be Trusted? Vincent M. Weaver and Sally A. McKee Cornell University 16 September 2008

Slides: web.eece.maine.edu/~vweaver/papers/iiswc08/iiswc08_slides.pdf

Can Hardware Performance Counters Be Trusted?

Vincent M. Weaver and Sally A. McKee

Cornell University

16 September 2008


Motivation

• Gather Basic Block Vectors for SimPoint

• Attempt to validate

• Found variation


Hardware Performance Counters

• Available on all modern processors

• Used for validation

• Used for performance and workload characterization

Can they be trusted?


Related Work

• Black, Huang, Lipasti, Shen. ICCD 1996

◦ PowerPC, shorter benchmarks

• Korn, Teller, Castillo. IPCCC 2001

◦ MIPS, up to 25% error compared to sim

• Maxwell, Teller, Salayandia, Moore. LACSIS 2002

◦ Pentium III, <1% error, microbenchmarks

• Mytkowicz, Diwan, Hauswirth, Sweeney. NSF-NGS 2008

◦ VM effects can cause up to 5% run-time variation


Retired Instruction Count

• Universally available

• Should be same on all implementations of ISA

• Used extensively with sampled execution (SimPoint, etc.)

• Part of IPC/CPI metrics


Experimental Setup

• Linux 2.6.25.4 with perfmon2 patches

• Statically linked 32-bit SPEC CPU 2000 and 2006, full reference inputs

• Nine systems, seven runs per benchmark per system, 48 SPEC 2000, 55 SPEC 2006

• Only userspace instructions counted

• Pin, Qemu and Valgrind DBI tools also investigated


Experimental Systems

Processor     Speed     Memory   L1 I/D       L2 Cache
Pentium Pro   200MHz    256MB    8KB/8KB      512KB
Pentium II    400MHz    256MB    16KB/16KB    512KB
Pentium III   550MHz    512MB    16KB/16KB    512KB
Pentium 4     2.8GHz    2GB      12Kµ/16KB    512KB
Pentium D     3.46GHz   4GB      12Kµ/16KB    2MB
Athlon XP     1.73GHz   768MB    64KB/64KB    512KB
Phenom        2.2GHz    2GB      64KB/64KB    512KB
Core Duo      1.6GHz    1GB      32KB/32KB    1MB
Core2 Q6600   2.4GHz    2GB      32KB/32KB    4MB


Sources of Variation

• We found sources of variation:

◦ Inconsistent Instructions

◦ Virtual Memory Layout

◦ Hardware Effects

◦ System Issues

◦ DBI/Simulator Differences

• Can these be mitigated?


fldcw Instruction

• Pentium 4 instr_retired:nbogusntag counts fldcw as two instructions

• Can lead to a large overcount; 177.mesa has an additional 7 billion dynamic instructions, an overcount of 2.4%

• 12 of the benchmarks have an overcount of at least 100M

Mitigation:

• Use instr_completed:nbogus (Pentium D)

• Adjust using DBI-collected fldcw count
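
The second mitigation is simple arithmetic, sketched below under one assumption: because each dynamic fldcw is counted as two instructions, it contributes exactly one extra count. The function names and the numbers in the usage note are hypothetical and are not taken from the measurements in the talk.

```python
def adjust_for_fldcw(raw_retired, fldcw_count):
    """Remove the one extra count that each dynamic fldcw adds.

    raw_retired: value read from the hardware counter.
    fldcw_count: dynamic fldcw count collected with a DBI tool.
    """
    return raw_retired - fldcw_count


def overcount_percent(raw_retired, fldcw_count):
    """Overcount relative to the adjusted instruction count, in percent."""
    adjusted = adjust_for_fldcw(raw_retired, fldcw_count)
    return 100.0 * fldcw_count / adjusted
```

For example, a hypothetical raw count of 1,000,000 with 20,000 dynamic fldcw instructions adjusts to 980,000.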


x86 32-bit Virtual Memory Layout

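[Diagram: 32-bit address space from 0x0000 0000 up to 0xffff ffff. Regions from low to high addresses: Text (Executable), Data, BSS, Heap, Stack, Operating System. Inset of the area ending at 0xbfff ffff, labeled: Exe Name, Cmd Line Args, Env Vars, Stack.]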


Virtual Memory Randomization

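[Diagram: 32-bit address space from 0x0000 0000 up to 0xffff ffff with regions Text (Executable), Data, BSS, Heap, Stack, Operating System, and an inset of the area ending at 0xbfff ffff labeled Exe Name, Cmd Line Args, Env Vars, Stack. Annotation: Stack/Heap Randomization.]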


Stack Offset Changes

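[Diagram: 32-bit address space from 0x0000 0000 up to 0xffff ffff with regions Text (Executable), Data, BSS, Heap, Stack, Operating System, and an inset of the area ending at 0xbfff ffff labeled Stack, Cmd Line Args, Exe Name, Env Vars. Annotation: Environment Variables, Command Line, Executable Name.]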


64-bit Compatibility

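[Diagram: address space starting at 0x0000 0000 with regions Text (Executable), Data, BSS, Heap, Stack; the stack area (Exe Name, Cmd Line Args, Env Vars, Stack) reaches 0xffff ffff and no Operating System region is shown. Annotation: 64-bit Compatibility.]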


Virtual Memory Layout Impact

• Pointers as hash table keys:

◦ Heap — parser

◦ Stack — perlbench

• Optimized memory copies

Mitigation:

• linux32 -3 -R — enforce VM, disable randomization

• /proc/sys/kernel/randomize_va_space

• Enforce environment variable size
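
The mitigations above can be combined in a small launcher. This is only a sketch: the padding variable name COUNTER_PAD and any target size are hypothetical choices, and the helper pads the environment strings only (each stored as NAME=VALUE plus a NUL byte), which is an assumption about what "environment variable size" needs to cover. The linux32 -3 -R prefix is the one named on the slide.

```python
import os


def env_block_size(env):
    """Bytes taken by the environment strings: NAME=VALUE plus a NUL each."""
    return sum(
        len(os.fsencode(name)) + 1 + len(os.fsencode(value)) + 1
        for name, value in env.items()
    )


def pad_env(env, target_bytes, pad_name="COUNTER_PAD"):
    """Return a copy of env padded with one dummy variable so that the
    environment strings total exactly target_bytes."""
    padded = {k: v for k, v in env.items() if k != pad_name}
    overhead = len(pad_name) + 2  # "=" separator plus the trailing NUL
    slack = target_bytes - env_block_size(padded) - overhead
    if slack < 0:
        raise ValueError("environment already exceeds the target size")
    padded[pad_name] = "x" * slack
    return padded


def fixed_layout_command(argv):
    """Prefix a benchmark command with the linux32 wrapper from the slide."""
    return ["linux32", "-3", "-R", *argv]
```

A run would then pass `fixed_layout_command(argv)` and `pad_env(os.environ, target)` to `subprocess.run(..., env=...)`, keeping the executable name and command line identical across runs as well.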


Hardware Effects

• Processor Errata

• Hardware Interrupts — cause extra counts

Mitigation:

• Be aware of errata

• Count or estimate interrupts
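
One possible way to count interrupts on Linux is to snapshot /proc/interrupts before and after a run and take the difference. The sketch below is an assumption about how that could be done, not the method used in the talk; it counts system-wide interrupts, so it gives only an upper bound on those that hit the benchmark.

```python
def total_interrupts(proc_interrupts_text):
    """Sum every per-CPU counter in a /proc/interrupts snapshot."""
    lines = proc_interrupts_text.strip().splitlines()
    ncpus = len(lines[0].split())  # header row: CPU0 CPU1 ...
    total = 0
    for line in lines[1:]:
        fields = line.split()
        # fields[0] is the IRQ label ("0:", "NMI:"); per-CPU counts follow.
        for field in fields[1:1 + ncpus]:
            if field.isdigit():
                total += int(field)
    return total


def interrupts_during(before_text, after_text):
    """Interrupts raised between two snapshots of /proc/interrupts."""
    return total_interrupts(after_text) - total_interrupts(before_text)
```

The two snapshots would be read with `open("/proc/interrupts").read()` immediately before and after the benchmark.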


Operating System Effects

• Non-deterministic system calls: time, PID, thread synchronization, random numbers, network activity, IO

• Page faults

Mitigation:

• Modify benchmarks, use methods to reduce non-determinism

• Count page faults
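
Page faults for a benchmark run can be read from the operating system's own accounting. The sketch below uses Python's standard resource module on a Unix system; it is one possible approach, not necessarily how the counts in the talk were gathered, and the function name is hypothetical.

```python
import resource
import subprocess
import sys


def run_and_count_page_faults(argv):
    """Run argv to completion and return (minor, major) page-fault counts.

    RUSAGE_CHILDREN accumulates over all waited-for children, so the
    counts are taken as the difference around this one run.
    """
    before = resource.getrusage(resource.RUSAGE_CHILDREN)
    subprocess.run(argv, check=True)
    after = resource.getrusage(resource.RUSAGE_CHILDREN)
    minor = after.ru_minflt - before.ru_minflt
    major = after.ru_majflt - before.ru_majflt
    return minor, major
```

For example, `run_and_count_page_faults([sys.executable, "-c", "pass"])` reports the faults taken by a trivial child process.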


DBI Tool/Simulator Issues

• Instruction complexity: rep prefix

• Floating point rounding issues (art, dealII)

• Virtual Memory Layout

Mitigation:

• Fix simulator/DBI tool

• VM — same as with real hardware


Results

Two kinds of variation:

• Inter-machine (differences between systems)

• Intra-machine (differences on the same machine)
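
Both kinds of variation can be summarized with the two metrics used in the result plots: the coefficient of variation (standard deviation divided by mean) and each run's difference from the mean. The sketch below assumes the population standard deviation; the slides do not say which form was used, and the counts in the usage note are made up.

```python
from statistics import mean, pstdev


def coefficient_of_variation(counts):
    """Standard deviation of the counts divided by their mean."""
    return pstdev(counts) / mean(counts)


def differences_from_mean(counts):
    """Signed distance of each run's count from the mean of all runs."""
    m = mean(counts)
    return [c - m for c in counts]
```

For intra-machine variation, counts would hold the seven runs of one benchmark on one system; for inter-machine variation, one representative count per system. With the made-up counts [90, 110], the coefficient of variation is 0.1 and the differences from the mean are -10 and 10.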


Coefficient of Variation – SPEC CPU 2000

[Figure: coefficient of variation (log scale, 1e-9 to 1) for each of the 48 SPEC CPU 2000 benchmark/input pairs, in two panels: 33 integer benchmarks (256.bzip2 through 175.vpr) and 15 floating-point benchmarks (188.ammp through 168.wupwise). Legend: Original Standard Deviation, Updated Standard Deviation. Annotated value on the plot: 1.07%.]


Coefficient of Variation – SPEC CPU 2006

[Figure: coefficient of variation (log scale, 1e-9 to 1) for each of the 55 SPEC CPU 2006 benchmark/input pairs, in two panels: 35 integer benchmarks (473.astar through 483.xalancbmk) and 20 floating-point benchmarks (410.bwaves through 434.zeusmp). Legend: Original Standard Deviation, Updated Standard Deviation. Annotated value on the plot: 0.41%.]


Same Machine Results – SPEC CPU 2000

[Figure: difference from mean (log scale, 0 to ±100M) for SPEC CPU 2000 benchmarks on the same machine, one column per platform: Pentium Pro, Pentium II, Pentium III, Pentium 4, Pentium D, Athlon XP, Phenom 9500, Core Duo, Core2 Q6600, Pin, Qemu, Valgrind. Legend: Original Standard Deviation, Updated Standard Deviation. Labeled outliers: applu, apsi, crafty, equake, facerec, parser, perlbmk, swim; all other points are non-outlying benchmarks.]


Same Machine Results – SPEC CPU 2006

[Figure: difference from mean (log scale, 0 to ±100M) for SPEC CPU 2006 benchmarks on the same machine, one column per platform: Pentium Pro, Pentium II, Pentium III, Pentium 4, Pentium D, Athlon XP, Phenom 9500, Core Duo, Core2 Q6600, Pin, Qemu, Valgrind. Legend: Original Standard Deviation, Updated Standard Deviation. Labeled outliers: sjeng, gromacs, gcc.expr2, leslie3d, milc, povray, perlbench, soplex, zeusmp; all other points are non-outlying benchmarks.]


Cross Machine Results – SPEC CPU 2000

[Figure: difference from mean (log scale, 0 to ±10B) across machines for five SPEC CPU 2000 benchmarks: 256.bzip2.graphic, 252.eon.cook, 197.parser.default, 187.facerec.default, 177.mesa.default. Legend: Original Standard Deviation, Updated Standard Deviation. Point labels: 6 = Pentium Pro, 2 = Pentium II, 3 = Pentium III, 4 = Pentium 4, D = Pentium D, A = Athlon XP, 9 = Phenom 9500, C = Core Duo, T = Core2 Q6600, P = Pin, Q = Qemu, V = Valgrind.]


Cross Machine Results – SPEC CPU 2006

[Figure: difference from mean (log scale, 0 to ±10B) across machines for five SPEC CPU 2006 benchmarks: 401.bzip2.liberty, 403.gcc.scilab, 456.hmmer.retro, 483.xalancbmk.default, 482.sphinx3.default. Legend: Original Standard Deviation, Updated Standard Deviation. Point labels: 6 = Pentium Pro, 2 = Pentium II, 3 = Pentium III, 4 = Pentium 4, D = Pentium D, A = Athlon XP, 9 = Phenom 9500, C = Core Duo, T = Core2 Q6600, P = Pin, Q = Qemu, V = Valgrind.]


Conclusion

• The retired instruction counter can be trusted to have low variation both inter- and intra-machine

• These results hold across processor generations

• For best results, precautions must be taken


Future Work

• Track down remaining sources of variation

• Non-x86 platforms

• Investigate other counter types

• Parallel workloads


Tools

All code is available from our tools page:

http://fusion.csl.cornell.edu/tools/


Questions?

All code is available from our tools page:

http://fusion.csl.cornell.edu/tools/
