Prefetching On-time and When it Works

Sequential Prefetcher With Adaptive Distance (SPAD)

Ibrahim Burak Karsli (bkarsli@ele.uri.edu)
Mustafa Cavus (mcavus@my.uri.edu)
Resit Sendag

Department of Electrical, Computer, and Biomedical Engineering
University of Rhode Island

Outline

Motivation
Sequential Prefetcher with Adaptive Distance (SPAD)
Hardware Budget
Results

Motivation

Next-line prefetcher (offset: +1) is simple and performs quite well (score ~4.439).
But there is opportunity loss because it has no feedback mechanism:
  Timeliness: late prefetches are the most important problem
  Accuracy: no on/off mechanism
  No adaptivity to changes in program behavior

Basic idea: add an adaptive distance to the next-line prefetcher
  Start with +1, then increment/decrement the distance based on feedback

Motivation

Sequential Prefetcher Performance with FIXED distance (offset)

Distance 1 (next-line) score: 4.439
Distance 3 (best) score: 4.484

Terminology

Interval: a period of 512 L2 demand accesses
l2miss: number of L2 misses in an interval
Testing Queue (TQ):
  FIFO queue
  Every predicted address is inserted into TQ
  Also acts as a prefetch filter
tqhits: number of L2 demand accesses found in TQ in an interval
tqmhits: number of L2 demand access misses found in TQ in an interval
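The Testing Queue and its per-interval counters can be sketched in a few lines of Python. This is a minimal sketch and not the authors' implementation: the class name `TestingQueue`, the default capacity of 128 entries, and the method names are assumptions made for illustration. Only the FIFO behaviour, the filtering role, the 512-access interval, and the counter definitions come from the slide.

```python
from collections import deque

INTERVAL = 512  # L2 demand accesses per interval (from the slide)


class TestingQueue:
    """FIFO of predicted block addresses; the capacity is a hypothetical value."""

    def __init__(self, capacity=128):
        self.fifo = deque(maxlen=capacity)  # oldest entry drops out when full
        self.tqhits = 0    # demand accesses found in TQ this interval
        self.tqmhits = 0   # demand misses found in TQ this interval
        self.l2miss = 0    # L2 misses this interval
        self.accesses = 0  # total demand accesses seen so far

    def contains(self, block):
        return block in self.fifo

    def insert(self, block):
        """Insert a predicted address; return False if it was filtered as a duplicate."""
        if block in self.fifo:
            return False
        self.fifo.append(block)
        return True

    def record_access(self, block, is_miss):
        """Update the counters for one L2 demand access; return True when an interval ends."""
        self.accesses += 1
        if is_miss:
            self.l2miss += 1
        if block in self.fifo:
            self.tqhits += 1
            if is_miss:
                self.tqmhits += 1
        return self.accesses % INTERVAL == 0

    def reset_counters(self):
        """Clear the per-interval counters (the queue contents are kept)."""
        self.tqhits = self.tqmhits = self.l2miss = 0
```

A linear membership test on a `deque` is fine for a sketch; a hardware queue would compare all entries in parallel.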

SPAD Prefetcher Components

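[Block diagram. Recoverable elements: the accessed L2 memory address; a "Distance ≠ 0?" check that selects the test address (one branch is labeled "Addr + 1"); a "Distance ≠ 0 and not in TQ?" check that gates prefetching of the predicted address; the Testing Queue; the counters tqhits, tqmhits, and l2miss; and the Decision Engine, which is updated once per interval.]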
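One possible reading of the component diagram is sketched below in Python. The slides confirm only that a prefetch is issued when the distance is non-zero and the predicted address is not already in the TQ, that every predicted address enters the TQ, and that the test address depends on whether the distance is 0. Computing the predicted address as `block + distance`, using `block + 1` as the test address while prefetching is off, and the names `on_l2_access`, `tq`, and `issue_prefetch` are all assumptions for illustration.

```python
def on_l2_access(block, distance, tq, issue_prefetch):
    """Handle one L2 demand access at cache-block granularity.

    tq is a plain list standing in for the Testing Queue (hypothetical).
    issue_prefetch is a callback supplied by the caller (hypothetical).
    """
    if distance != 0:
        predicted = block + distance   # assumed: `distance` blocks ahead
        if predicted not in tq:        # the TQ doubles as a prefetch filter
            issue_prefetch(predicted)
            tq.append(predicted)       # predicted addresses enter the TQ
    else:
        test_addr = block + 1          # assumed: keep testing next-line while off
        if test_addr not in tq:
            tq.append(test_addr)


issued = []
tq = []
on_l2_access(100, 3, tq, issued.append)  # prefetches block 103
on_l2_access(100, 3, tq, issued.append)  # filtered: 103 is already in the TQ
on_l2_access(200, 0, tq, issued.append)  # prefetching off: only tests block 201
print(issued, tq)  # [103] [103, 201]
```

Tracking a test address even when the distance is 0 is what would let the counters keep measuring whether sequential prefetching has become useful again.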

SPAD Decision Engine: Distance Update Mechanism

[Flowchart. Once per interval the decision engine tests four conditions in sequence: tqhits < 16, l2miss < 10, l2miss - tqmhits > 300, and l2miss/tqmhits < 2. The possible outcomes are: set distance = 0, decrement the distance (only if distance > 1), increase the distance (only if distance < 6), or preserve the current behaviour. Three of the decision paths are labeled "3 Consecutive Intervals".]
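The update rule can be sketched in Python as follows. The thresholds (16, 10, 300, 2), the bounds (distance > 1 to decrement, distance < 6 to increase), the three-interval requirement, and the starting distance of +1 are taken from the slides. Which condition leads to which action is NOT recoverable with certainty from the flowchart, so the mapping in `_vote` is an assumption, as are the names `DecisionEngine`, `_vote`, and `end_of_interval`.

```python
MIN_DISTANCE, MAX_DISTANCE = 1, 6  # guards shown in the flowchart
STREAK = 3                         # "3 consecutive intervals"


class DecisionEngine:
    def __init__(self):
        self.distance = 1          # the slides say SPAD starts at +1
        self.streak = {"off": 0, "dec": 0, "inc": 0}

    def _vote(self, tqhits, tqmhits, l2miss):
        # ASSUMED mapping of conditions to actions; tested in the slide's order.
        if tqhits < 16 or l2miss < 10:
            return "off"           # predictions rarely used, or very few misses
        if l2miss - tqmhits > 300:
            return "dec"           # many misses were never predicted
        if tqmhits > 0 and l2miss / tqmhits < 2:
            return "inc"           # most misses were predicted (likely late)
        return "keep"

    def end_of_interval(self, tqhits, tqmhits, l2miss):
        """Consume one interval's counters and return the (possibly new) distance."""
        vote = self._vote(tqhits, tqmhits, l2miss)
        for name in self.streak:   # a streak survives only while its vote repeats
            self.streak[name] = self.streak[name] + 1 if name == vote else 0
        if vote != "keep" and self.streak[vote] >= STREAK:
            self.streak[vote] = 0
            if vote == "off":
                self.distance = 0
            elif vote == "inc" and self.distance < MAX_DISTANCE:
                self.distance += 1
            elif vote == "dec" and self.distance > MIN_DISTANCE:
                self.distance -= 1
        return self.distance
```

Requiring the same vote for three intervals in a row acts as hysteresis: a single noisy interval cannot move the distance.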

SPAD Adaptiveness

[Bar chart: Best Distance Sequential vs. SPAD, vertical axis from 0.00 to 2.00, for the traces 197.parser.100m, 400.perlbench.100m, 410.bwaves.100m, 434.zeusmp.100m, 436.cactusADM.100m, 459.GemsFDTD.100m, and 481.wrf.100m. Best-distance (BD) labels on the chart, in the order extracted: 3, 4, 6, 1, 1, 5, 1.]

Comparing the results of SPAD with the results of fixed distance sequential prefetcher using best distances (BD).

SPAD Hardware & Performance

SPAD Performance

Prefetcher                               Score
Sequential +1                            4.439
Sequential +3 (best performing offset)   4.483
Ampm lite                                4.511
Sandbox (+/- 16), 32 offsets             4.578
SPAD                                     4.584

SPAD Hardware Budget

Test Queue:             4103 bits
Registers & Counters:    160 bits
Total:                  4263 bits

IP-Stride and SPAD

SPAD scores significantly better than the ip stride prefetcher (4.584 vs. 4.300).

However, ip stride works significantly better than SPAD for some benchmarks, such as bzip2 and soplex.

Integrating SPAD with ip stride improves SPAD performance by 5.5%.

Submission Hardware Budget

SPAD (4263 bits)
  Test Queue (4103 bits)
  Registers & Counters (160 bits)
Ip Stride (67584 bits)
Global Prefetch Queue (4103 bits)
Total (75950 bits)
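The totals can be checked with simple arithmetic. In the Python sketch below, the dictionary keys are labels chosen for this example; the bit counts are the ones listed on the slide.

```python
# SPAD's own storage: test queue plus registers and counters
spad = {"test_queue": 4103, "registers_and_counters": 160}
spad_total = sum(spad.values())

# Full submission: SPAD, the ip stride prefetcher, and the global prefetch queue
submission = {
    "spad": spad_total,
    "ip_stride": 67584,
    "global_prefetch_queue": 4103,
}
submission_total = sum(submission.values())

print(spad_total)        # 4263
print(submission_total)  # 75950
```

The ip stride tables account for most of the submission budget; SPAD itself needs 4263 bits.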

Benchmarks

40 benchmarks from SPEC CPU2000, SPEC CPU2006 and Olden benchmark suites.

We used SimPoint 2.0 to generate representative 100M-instruction traces.
  10M instructions for warmup
  90M instructions for simulation

Results

[Bar chart: speedup (vertical axis from 1 to 1.18) of ip stride, SPAD, and the combined (submitted) prefetcher for Config 1 through Config 4.]

Results

Prefetcher                    Score
Sequential +1                 4.439
Sequential +3                 4.483
Ampm lite                     4.511
Sandbox                       4.578
Ip stride                     4.300
SPAD                          4.584
SPAD & IP Stride (Combined)   4.616

Conclusion

Adaptive distance in sequential prefetchers has significant benefits.

Our submitted version is not optimized. It can be significantly improved, as we observed in our later tests.

Combining SPAD with the ip stride prefetcher boosts performance.

Questions?

Thank You